Appendix H: Safety Differential Analysis

Analysis of 998 CANN 8.5 kernel pairs (AscendC C++ vs ascend-rs Rust) from the real ascendc-to-rs transpilation batch — same corpus surveyed in Appendix G.

For each kernel, we identify which memory-safety vulnerability classes exist in the C++ version and how the Rust transpilation prevents them. The six classes below are structural properties of the AscendC programming model; they apply uniformly regardless of operator category.

Scope note — two fidelity tiers. Of the 998 kernels, 247 are Transpiled (body carries the C++ compute intrinsics) and 751 are Registered (body is an identity stub; signature and ABI are real). The safety-class counts in §H.1 / §H.2 analyse the C++ source — i.e. the hazards present in the operator the user would write by hand. The “Rust Prevention” column refers to structural properties of the ascend-rs API (typed pointers, absence of FreeTensor, composite intrinsics with built-in barriers): these apply to any kernel routed through the transpiler, whether its body is currently Transpiled or Registered, because they are properties of the generated ABI and imported API surface — not the body contents.

H.1 Safety Class Summary

| # | Safety Class | C++ Risk | Rust Prevention | Kernels Affected |
|---|---|---|---|---|
| 1 | Type Confusion | GM_ADDR type erasure | Typed pointer signature (*const T) | 983/998 (98%) |
| 2 | Buffer Overflow | GetValue(i)/SetValue(i, v) with i >= count | Opaque buffer ID + explicit count parameter | 9/998 (1%) |
| 3 | Use-After-Free | FreeTensor() leaves stale handle | No FreeTensor operation in the ascend-rs API | 3/998 (0.3%) |
| 4 | Missing Synchronization | DMA→compute without pipe_barrier() | kernel_ops composites include barriers internally | 793/998 (79%) |
| 5 | Double Free | FreeTensor() called twice on the same handle | No FreeTensor operation in the ascend-rs API | 3/998 (0.3%) |
| 6 | Integer Overflow | u32 arithmetic: blockIdx * perBlockLen | wrapping_mul makes overflow semantics explicit | 785/998 (78%) |

H.2 Category Breakdown

Counts below are scaled to the real ascendc-to-rs batch categories (see Appendix G §G.1). Type Confusion, Missing Synchronization, and Integer Overflow are structural — they affect nearly every kernel. Buffer Overflow / UAF / Double Free are rare and cluster in the operators that maintain explicit LocalTensor lifetimes (primarily ops_nn and ops_transformer).

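| Category | Total | C1: Type | C2: Bounds | C3: UAF | C4: Sync | C5: DblFree | C6: Overflow |
|---|---|---|---|---|---|---|---|
| ops_cv | 41 | 41 | 0 | 0 | 33 | 0 | 32 |
| ops_legacy | 343 | 343 | 0 | 0 | 273 | 0 | 270 |
| ops_math | 155 | 155 | 0 | 0 | 123 | 0 | 121 |
| ops_nn | 306 | 301 | 6 | 3 | 243 | 3 | 240 |
| ops_oam | 3 | 3 | 0 | 0 | 2 | 0 | 2 |
| ops_transformer | 150 | 140 | 3 | 0 | 119 | 0 | 120 |
| Total | 998 | 983 | 9 | 3 | 793 | 3 | 785 |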

H.3 Counter-Example Inputs

For each safety class, a counter-example input that triggers the vulnerability in C++ but is caught or prevented in Rust. The example kernels are drawn from the real ascendc-to-rs batch.

Evidence scope. Where an example kernel is currently a Registered identity stub (see Appendix G), the cited blockIdx * perBlockLen / FreeTensor / GetValue pattern is in the original C++ source at cann_kernels/<kernel>/<kernel>.cpp, not in the current .rs body. The Rust prevention mechanism is structural (typed pointers, API surface, composite intrinsics) — it will remain in force when the transpiler lowers the body in a future pass.

Class 1: Type Confusion

Trigger: pass f16 data to an f32 kernel

C++ behaviour: silent data corruption (interprets f16 bits as f32)

Rust behaviour: compile-time type error (*const u16 ≠ *const f32)

Example kernels: ops_legacy__fast_gelu, ops_math__cos_apt, ops_nn__gelu_apt

Evidence: all use GM_ADDR (type-erased uint8_t*) at the kernel boundary; the transpiler replaces this with typed pointers derived from MLIR element types.
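
To make the failure mode concrete, the sketch below reinterprets two packed f16 bit patterns as a single f32, which is roughly what a type-erased kernel does when it is handed the wrong dtype, and contrasts that with a typed entry point. The names misread_f16_pair_as_f32 and sum_f32 are hypothetical illustrations and are not part of the ascend-rs API.

```rust
/// Reinterpret two adjacent f16 bit patterns as one f32, mimicking what a
/// kernel that assumes f32 does when the host hands it f16 data.
fn misread_f16_pair_as_f32(lo: u16, hi: u16) -> f32 {
    f32::from_bits(((hi as u32) << 16) | (lo as u32))
}

/// Hypothetical typed entry point (not an ascend-rs API): the element type
/// is part of the signature, so a *const u16 argument is rejected.
///
/// # Safety
/// `input` must point to at least `len` readable f32 values.
unsafe fn sum_f32(input: *const f32, len: usize) -> f32 {
    let mut acc = 0.0f32;
    for i in 0..len {
        acc += unsafe { *input.add(i) };
    }
    acc
}

fn main() {
    const F16_ONE: u16 = 0x3C00; // IEEE 754 half-precision 1.0
    let halves = [F16_ONE, F16_ONE];
    let floats = [1.0f32, 1.0];

    // Type-erased view: two f16 ones read back as one small, wrong f32.
    let garbage = misread_f16_pair_as_f32(halves[0], halves[1]);
    println!("f16 pair misread as f32: {garbage}");

    // Typed view: only the f32 buffer is accepted.
    let total = unsafe { sum_f32(floats.as_ptr(), floats.len()) };
    println!("typed sum: {total}");

    // Uncommenting the next line fails with error[E0308] (mismatched types):
    // let _ = unsafe { sum_f32(halves.as_ptr(), halves.len()) };
}
```

The misread value is a small positive number below 0.01 rather than 1.0, and nothing at run time signals the mistake. The typed signature moves that mismatch to compile time.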


Class 2: Buffer Overflow

Trigger: count = buffer_size + 1

C++ behaviour: out-of-bounds SRAM read/write (undefined behaviour)

Rust behaviour: buffer-ID abstraction prevents raw indexing; explicit count parameter flows through the typed ascend_* API

Example kernels: ops_legacy__drop_out_v3, ops_nn__masked_scatter_apt (and the related ops_math__drop_out_* / ops_legacy__scatter_nd_* variants)

Evidence: these kernels combine GetValue (unchecked index) with array indexing on a LocalTensor.
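
The following toy model sketches what an opaque buffer ID with an explicit count can look like. Everything here (BufId, ToySram, vec_scale) is hypothetical and stands in for the real ascend_* functions. In particular, the bounds check inside vec_scale is an assumption made for this sketch: the source only states that the count is explicit and that raw indexing is unavailable.

```rust
/// Hypothetical opaque buffer handle. It has no Index impl and exposes no
/// pointer, so element-wise indexing cannot be written against it.
#[derive(Clone, Copy, Debug)]
struct BufId(usize);

/// Toy stand-in for on-chip SRAM; not the ascend-rs runtime.
struct ToySram {
    bufs: Vec<Vec<f32>>,
}

impl ToySram {
    fn new() -> Self {
        ToySram { bufs: Vec::new() }
    }

    fn alloc(&mut self, len: usize) -> BufId {
        self.bufs.push(vec![0.0; len]);
        BufId(self.bufs.len() - 1)
    }

    fn fill(&mut self, buf: BufId, value: f32) {
        self.bufs[buf.0].fill(value);
    }

    fn read(&self, buf: BufId) -> &[f32] {
        &self.bufs[buf.0]
    }

    /// Hypothetical vector op: dst[..count] = src[..count] * factor.
    /// The bounds check is an assumption added for this sketch.
    fn vec_scale(&mut self, dst: BufId, src: BufId, factor: f32, count: usize) -> Result<(), String> {
        let (dst_len, src_len) = (self.bufs[dst.0].len(), self.bufs[src.0].len());
        if count > dst_len || count > src_len {
            return Err(format!("count {count} exceeds buffer length {}", dst_len.min(src_len)));
        }
        let scaled: Vec<f32> = self.bufs[src.0][..count].iter().map(|x| x * factor).collect();
        self.bufs[dst.0][..count].copy_from_slice(&scaled);
        Ok(())
    }
}

fn main() {
    let mut sram = ToySram::new();
    let src = sram.alloc(8);
    let dst = sram.alloc(8);
    sram.fill(src, 1.5);

    // An in-range count succeeds.
    sram.vec_scale(dst, src, 2.0, 8).expect("count within bounds");
    println!("dst = {:?}", sram.read(dst));

    // count = buffer_size + 1 is rejected instead of touching adjacent memory.
    println!("{:?}", sram.vec_scale(dst, src, 2.0, 9));
}
```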


Class 3: Use-After-Free

Trigger: free buffer, then read through the stale handle

C++ behaviour: reads deallocated SRAM (garbage data)

Rust behaviour: no free API exists — buffer lifetime managed by the runtime

Example kernels: the three drop_out_* variants (ops_legacy__drop_out_v3, ops_math__drop_out_v3, ops_legacy__drop_out_do_mask) that call FreeTensor() in their C++ body

Evidence: each of these kernels calls FreeTensor() in its C++ body. In Rust the corresponding handle stays valid because ascend-rs has no FreeTensor operation.
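
One way to picture "buffer lifetime managed by the runtime" is a handle that borrows from a runtime object and offers no free method. The sketch below is a minimal illustration under that assumption. ToyRuntime and TensorView are hypothetical types and do not describe how ascend-rs is actually implemented.

```rust
/// Toy runtime that owns all local buffers; not the ascend-rs runtime.
struct ToyRuntime {
    storage: Vec<f32>,
}

/// Hypothetical tensor handle. It borrows from the runtime and has no
/// `free` method, so its storage can only go away when the runtime is
/// dropped, which the borrow checker forbids while the handle is in use.
struct TensorView<'rt> {
    data: &'rt [f32],
}

impl ToyRuntime {
    /// Fill storage with 0.0, 1.0, 2.0, ... so results are easy to check.
    fn new(len: usize) -> Self {
        ToyRuntime { storage: (0..len).map(|i| i as f32).collect() }
    }

    fn tensor(&self, offset: usize, len: usize) -> TensorView<'_> {
        TensorView { data: &self.storage[offset..offset + len] }
    }
}

impl TensorView<'_> {
    fn sum(&self) -> f32 {
        self.data.iter().sum()
    }
}

fn main() {
    let rt = ToyRuntime::new(8);
    let t = rt.tensor(2, 3); // elements 2.0, 3.0, 4.0
    println!("sum = {}", t.sum());

    // Uncommenting both lines fails with error[E0505]
    // (cannot move out of `rt` because it is borrowed):
    // drop(rt);
    // println!("{}", t.sum());
}
```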


Class 4: Missing Synchronization

Trigger: remove the barrier between load and compute

C++ behaviour: reads stale / partial DMA data (non-deterministic)

Rust behaviour: ascend_pipe_barrier() always emitted between stages

Example kernels: ops_legacy__foreach_add_list_inplace, ops_legacy__log_softmax_v2_apt, ops_transformer__attention_update_apt

Evidence: these kernels have two explicit pipe_barrier calls in the C++ body — omitting either one causes data races. The ascend-rs composites insert them unconditionally.
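
The effect of a missing barrier can be modelled deterministically. In the toy below, DMA data only becomes visible to compute after a barrier, so skipping it yields stale zeros. ToyCore and load_and_double are hypothetical names written in the spirit of the kernel_ops composites. They are not the real ascend-rs functions, and real hardware behaviour is non-deterministic rather than reliably stale.

```rust
/// Toy model of one core: DMA results become visible in `local` only after
/// a barrier. An illustration, not the real hardware model.
struct ToyCore {
    local: Vec<f32>,
    in_flight: Option<Vec<f32>>,
}

impl ToyCore {
    fn new(len: usize) -> Self {
        ToyCore { local: vec![0.0; len], in_flight: None }
    }

    /// Start a DMA transfer; the data is not yet visible to compute.
    fn dma_load(&mut self, src: &[f32]) {
        self.in_flight = Some(src.to_vec());
    }

    /// Wait for the outstanding DMA transfer to land in local memory.
    fn pipe_barrier(&mut self) {
        if let Some(data) = self.in_flight.take() {
            self.local = data;
        }
    }

    /// Vector compute stage: reads whatever is currently in local memory.
    fn vec_double(&self) -> Vec<f32> {
        self.local.iter().map(|x| x * 2.0).collect()
    }

    /// Hypothetical composite: the barrier is part of the operation, so
    /// callers cannot leave it out.
    fn load_and_double(&mut self, src: &[f32]) -> Vec<f32> {
        self.dma_load(src);
        self.pipe_barrier();
        self.vec_double()
    }
}

fn main() {
    let input = [1.0f32, 2.0, 3.0];

    // Hand-written sequence with the barrier forgotten: compute sees zeros.
    let mut core = ToyCore::new(input.len());
    core.dma_load(&input);
    println!("without barrier: {:?}", core.vec_double());

    // Composite: the barrier cannot be skipped.
    let mut core = ToyCore::new(input.len());
    println!("composite:       {:?}", core.load_and_double(&input));
}
```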


Class 5: Double Free

Trigger: call FreeTensor twice on the same LocalTensor

C++ behaviour: corrupts the queue’s free list (undefined behaviour)

Rust behaviour: no free API exists — double-free is unrepresentable

Example kernels: the same three drop_out_* variants as C3

Evidence: FreeTensor is called repeatedly in the C++ dropout kernels; the transpiled Rust simply has no analogous operation.
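
A toy free list shows why a double free is harmful: the same slot is handed out twice, so two logically distinct tensors alias the same storage. NaiveQueue is a hypothetical model of the failure mode only. It is not the AscendC queue implementation.

```rust
/// Toy queue with a free list of slot indices and no double-free check.
struct NaiveQueue {
    free_slots: Vec<usize>,
}

impl NaiveQueue {
    /// Slots are handed out in ascending order: 0, 1, 2, ...
    fn new(slots: usize) -> Self {
        NaiveQueue { free_slots: (0..slots).rev().collect() }
    }

    fn alloc(&mut self) -> Option<usize> {
        self.free_slots.pop()
    }

    /// Returns the slot to the free list without checking whether it is
    /// already there.
    fn free(&mut self, slot: usize) {
        self.free_slots.push(slot);
    }
}

fn main() {
    let mut queue = NaiveQueue::new(2);
    let slot = queue.alloc().expect("queue has free slots");

    // Double free: the same slot index enters the free list twice.
    queue.free(slot);
    queue.free(slot);

    // Two subsequent allocations now return the same slot.
    let a = queue.alloc().expect("slot available");
    let b = queue.alloc().expect("slot available");
    println!("a = {a}, b = {b}, aliased = {}", a == b);
}
```

With no free operation in the API surface, the two consecutive free calls above have nothing to map to, which is the sense in which the double free is unrepresentable.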


Class 6: Integer Overflow

Trigger: blockIdx = 1048576, perBlockLen = 4096 → wraps to 0

C++ behaviour: silent wrap to 0, wrong memory offset

Rust behaviour: wrapping_mul(4096) → 0 (the wrap is explicit in the source; a plain * on the same operands panics in debug builds)

Example kernels: any kernel that tiles across blocks, e.g. ops_transformer__flash_attention_score, ops_nn__batch_norm_v3, ops_legacy__foreach_add_list_inplace

Evidence: these kernels compute offsets as blockIdx * perBlockLen in uint32_t.
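
The trigger values can be checked directly with the standard integer methods. Note that wrapping_mul itself never panics; it is the ordinary * operator that panics on overflow in debug builds. The helper names below are illustrative, and checked_mul and 64-bit widening are shown as general Rust alternatives, not as something the transpiler is known to emit.

```rust
/// Offset computed with explicit 32-bit wrapping, matching the C++ result.
fn offset_wrapping(block_idx: u32, per_block_len: u32) -> u32 {
    block_idx.wrapping_mul(per_block_len)
}

/// Overflow reported as None instead of a wrapped value.
fn offset_checked(block_idx: u32, per_block_len: u32) -> Option<u32> {
    block_idx.checked_mul(per_block_len)
}

/// Widening to 64 bits before multiplying avoids the overflow entirely.
fn offset_wide(block_idx: u32, per_block_len: u32) -> u64 {
    u64::from(block_idx) * u64::from(per_block_len)
}

fn main() {
    // 2^20 * 2^12 = 2^32, one past the largest u32 value plus one.
    let (block_idx, per_block_len) = (1_048_576u32, 4096u32);

    println!("wrapping: {}", offset_wrapping(block_idx, per_block_len));
    println!("checked:  {:?}", offset_checked(block_idx, per_block_len));
    println!("wide:     {}", offset_wide(block_idx, per_block_len));
}
```

For these inputs the wrapping result is 0, the checked result is None, and the widened result is 4294967296.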


H.4 Interpretation

The dominant vulnerability class is C1: Type Confusion (98% of kernels). This is a structural property of the AscendC C++ API: all kernel entry points receive tensor pointers as GM_ADDR (= uint8_t*), erasing all element-type information at the kernel boundary. Any mismatch between the host’s tensor dtype and the kernel’s assumed dtype produces silent data corruption with no runtime error.

In ascend-rs, kernel entry points use typed Rust pointers (*const u16 for f16/bf16, *const f32 for f32, etc.). A mismatch is a compile-time type error, so the compiler rejects it before the kernel can be built or run.

C4: Missing Synchronization affects 79% of kernels. The AscendC programming model requires manual pipe_barrier() calls between DMA operations and subsequent vector computations. Omitting them produces non-deterministic wrong results with no diagnostic. ascend-rs kernel_ops composites (e.g., ascend_vec_add_f16) always include the necessary barriers — they cannot be accidentally omitted.

C6: Integer Overflow affects 78% of kernels. Block-index arithmetic (blockIdx * perBlockLen) uses uint32_t in C++, silently wrapping at 2³² without any diagnostic. Rust’s wrapping_mul makes the wrap-around behaviour explicit at the call site, while a plain * on the same operands panics in debug builds, so the overflow surfaces during testing instead of passing silently.

H.5 Per-Kernel Detail

The full per-kernel safety report (all 998 real-batch kernels) is maintained as a machine-generated companion file: blog/appendix_safety_report.md in the repository. It lists each kernel’s safety-class membership (C1–C6) and the evidence that identifies each class.