Appendix H: Safety Differential Analysis
Analysis of 998 CANN 8.5 kernel pairs (AscendC C++ vs ascend-rs Rust) from the real ascendc-to-rs transpilation batch — same corpus surveyed in Appendix G.
For each kernel, we identify which memory-safety vulnerability classes exist in the C++ version and how the Rust transpilation prevents them. The six classes below are structural properties of the AscendC programming model; they apply uniformly regardless of operator category.
Scope note: two fidelity tiers. Of the 998 kernels, 247 are Transpiled (body carries the C++ compute intrinsics) and 751 are Registered (body is an identity stub; signature and ABI are real). The safety-class counts in §H.1 / §H.2 analyse the C++ source, i.e. the hazards present in the operator the user would write by hand. The “Rust Prevention” column refers to structural properties of the ascend-rs API (typed pointers, absence of FreeTensor, composite intrinsics with built-in barriers). These apply to any kernel routed through the transpiler, whether its body is currently Transpiled or Registered, because they are properties of the generated ABI and imported API surface, not of the body contents.
H.1 Safety Class Summary
| # | Safety Class | C++ Risk | Rust Prevention | Kernels Affected |
|---|---|---|---|---|
| 1 | Type Confusion | GM_ADDR type erasure | Typed pointer signature (*const T) | 983/998 (98%) |
| 2 | Buffer Overflow | GetValue(i)/SetValue(i, v) with i >= count | Opaque buffer ID + explicit count parameter | 9/998 (1%) |
| 3 | Use-After-Free | FreeTensor() leaves stale handle | No FreeTensor operation in the ascend-rs API | 3/998 (0.3%) |
| 4 | Missing Synchronization | DMA→compute without pipe_barrier() | kernel_ops composites include barriers internally | 793/998 (79%) |
| 5 | Double Free | FreeTensor() called twice on the same handle | No FreeTensor operation in the ascend-rs API | 3/998 (0.3%) |
| 6 | Integer Overflow | u32 arithmetic: blockIdx * perBlockLen | wrapping_mul makes overflow semantics explicit | 785/998 (78%) |
H.2 Category Breakdown
Counts below are scaled to the real ascendc-to-rs batch categories (see Appendix G §G.1). Type Confusion, Missing Synchronization, and Integer Overflow are structural — they affect nearly every kernel. Buffer Overflow / UAF / Double Free are rare and cluster in the operators that maintain explicit LocalTensor lifetimes (primarily ops_nn and ops_transformer).
| Category | Total | C1: Type | C2: Bounds | C3: UAF | C4: Sync | C5: DblFree | C6: Overflow |
|---|---|---|---|---|---|---|---|
| ops_cv | 41 | 41 | 0 | 0 | 33 | 0 | 32 |
| ops_legacy | 343 | 343 | 0 | 0 | 273 | 0 | 270 |
| ops_math | 155 | 155 | 0 | 0 | 123 | 0 | 121 |
| ops_nn | 306 | 301 | 6 | 3 | 243 | 3 | 240 |
| ops_oam | 3 | 3 | 0 | 0 | 2 | 0 | 2 |
| ops_transformer | 150 | 140 | 3 | 0 | 119 | 0 | 120 |
| Total | 998 | 983 | 9 | 3 | 793 | 3 | 785 |
H.3 Counter-Example Inputs
For each safety class, a counter-example input that triggers the vulnerability in C++ but is caught or prevented in Rust. The example kernels are drawn from the real ascendc-to-rs batch.
Evidence scope. Where an example kernel is currently a Registered identity stub (see Appendix G), the cited blockIdx * perBlockLen / FreeTensor / GetValue pattern is in the original C++ source at cann_kernels/<kernel>/<kernel>.cpp, not in the current .rs body. The Rust prevention mechanism is structural (typed pointers, API surface, composite intrinsics), so it will remain in force when the transpiler lowers the body in a future pass.
Class 1: Type Confusion
Trigger: pass f16 data to an f32 kernel
C++ behaviour: silent data corruption (interprets f16 bits as f32)
Rust behaviour: compile-time type error (*const u16 ≠ *const f32)
Example kernels: ops_legacy__fast_gelu, ops_math__cos_apt, ops_nn__gelu_apt
Evidence: all use GM_ADDR (type-erased uint8_t*) at the kernel boundary; the transpiler replaces this with typed pointers derived from MLIR element types.
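The contrast can be sketched in plain Rust. This is a minimal illustration under stated assumptions, not code from ascend-rs: sum_f32 is a hypothetical stand-in for a kernel entry point with a typed signature, and reinterpret_f16_pair_as_f32 is a hypothetical helper that mimics what a type-erased boundary does to the bits.

```rust
/// Hypothetical typed entry point (illustrative name, not an ascend-rs API).
/// The element type is part of the signature, so passing a `*const u16`
/// (raw f16 bits) is rejected by the compiler.
pub unsafe fn sum_f32(input: *const f32, count: usize) -> f32 {
    let mut acc = 0.0f32;
    for i in 0..count {
        // SAFETY: the caller guarantees `input` points to `count` readable f32 values.
        acc += unsafe { *input.add(i) };
    }
    acc
}

/// Hypothetical helper showing the type-erased failure mode:
/// two adjacent f16 bit patterns are read back as a single f32.
pub fn reinterpret_f16_pair_as_f32(lo: u16, hi: u16) -> f32 {
    f32::from_bits(((hi as u32) << 16) | lo as u32)
}

// The following call does not compile (error[E0308]: mismatched types):
//
//     let half_bits = [0x3C00u16; 4];           // four f16 values of 1.0
//     unsafe { sum_f32(half_bits.as_ptr(), 4) } // expected *const f32, found *const u16
```

With the f16 encoding of 1.0 (0x3C00) in both halves, reinterpret_f16_pair_as_f32(0x3C00, 0x3C00) yields a value near 0.0078 rather than 1.0, which is the kind of silent corruption the typed signature rules out.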
Class 2: Buffer Overflow
Trigger: count = buffer_size + 1
C++ behaviour: out-of-bounds SRAM read/write (undefined behaviour)
Rust behaviour: buffer-ID abstraction prevents raw indexing; explicit count parameter flows through the typed ascend_* API
Example kernels: ops_legacy__drop_out_v3, ops_nn__masked_scatter_apt (and the related ops_math__drop_out_* / ops_legacy__scatter_nd_* variants)
Evidence: uses GetValue (unchecked index) + array indexing on a LocalTensor.
Class 3: Use-After-Free
Trigger: free buffer, then read through the stale handle
C++ behaviour: reads deallocated SRAM (garbage data)
Rust behaviour: no free API exists — buffer lifetime managed by the runtime
Example kernels: the three drop_out_* variants (ops_legacy__drop_out_v3, ops_math__drop_out_v3, ops_legacy__drop_out_do_mask) that call FreeTensor() in their C++ body
Evidence: calls FreeTensor() — the corresponding handle remains valid in Rust because ascend-rs has no FreeTensor operation.
Class 4: Missing Synchronization
Trigger: remove the barrier between load and compute
C++ behaviour: reads stale / partial DMA data (non-deterministic)
Rust behaviour: ascend_pipe_barrier() always emitted between stages
Example kernels: ops_legacy__foreach_add_list_inplace, ops_legacy__log_softmax_v2_apt, ops_transformer__attention_update_apt
Evidence: these kernels have two explicit pipe_barrier calls in the C++ body — omitting either one causes data races. The ascend-rs composites insert them unconditionally.
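The idea of a composite that carries its own barriers can be modelled with a toy event log. Everything in this sketch is hypothetical (the Event enum, composite_vec_op, and is_synchronized are illustrative names, not ascend-rs APIs); it only shows why a caller of a composite has no opportunity to drop a barrier.

```rust
/// Pipeline stages recorded by this toy model (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    DmaLoad,
    Barrier,
    VecCompute,
    DmaStore,
}

/// Toy stand-in for a composite intrinsic: both barriers are part of the
/// body, so a caller cannot omit either one.
pub fn composite_vec_op(log: &mut Vec<Event>) {
    log.push(Event::DmaLoad);
    log.push(Event::Barrier);
    log.push(Event::VecCompute);
    log.push(Event::Barrier);
    log.push(Event::DmaStore);
}

/// Returns false if a compute follows a load, or a store follows a compute,
/// without a barrier in between.
pub fn is_synchronized(log: &[Event]) -> bool {
    // Last producing stage that has not yet been fenced by a barrier.
    let mut unfenced: Option<Event> = None;
    for &event in log {
        match event {
            Event::Barrier => unfenced = None,
            Event::DmaLoad => unfenced = Some(Event::DmaLoad),
            Event::VecCompute => {
                if unfenced == Some(Event::DmaLoad) {
                    return false; // compute would read partial DMA data
                }
                unfenced = Some(Event::VecCompute);
            }
            Event::DmaStore => {
                if unfenced == Some(Event::VecCompute) {
                    return false; // store would write an unfinished result
                }
            }
        }
    }
    true
}
```

A log produced by composite_vec_op always passes is_synchronized, whereas a hand-written sequence such as DmaLoad, VecCompute, Barrier, DmaStore fails it, mirroring the case where one of the two barriers is omitted.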
Class 5: Double Free
Trigger: call FreeTensor twice on the same LocalTensor
C++ behaviour: corrupts the queue’s free list (undefined behaviour)
Rust behaviour: no free API exists — double-free is unrepresentable
Example kernels: the same three drop_out_* variants as C3
Evidence: FreeTensor is called repeatedly in the C++ dropout kernels; the transpiled Rust simply has no analogous operation.
Class 6: Integer Overflow
Trigger: blockIdx = 1048576, perBlockLen = 4096 → wraps to 0
C++ behaviour: silent wrap to 0, wrong memory offset
Rust behaviour: wrapping_mul(4096) → 0 (the wrap is explicit at the call site and never panics; a plain * on the same operands would panic in debug builds)
Example kernels: any kernel that tiles across blocks, e.g. ops_transformer__flash_attention_score, ops_nn__batch_norm_v3, ops_legacy__foreach_add_list_inplace
Evidence: uses blockIdx * perBlockLen with uint32_t for offset calculation.
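The trigger values can be checked with the standard-library integer methods. In standard Rust, wrapping_mul wraps in every build profile, checked_mul reports the overflow as None, and the plain * operator panics on overflow when overflow checks are enabled (the default in debug builds). The function names below are hypothetical helpers for illustration, not part of ascend-rs.

```rust
/// Offset computation with the same 32-bit wrap as the C++ kernels,
/// but spelled out explicitly at the call site.
pub fn offset_wrapping(block_idx: u32, per_block_len: u32) -> u32 {
    block_idx.wrapping_mul(per_block_len)
}

/// Offset computation that reports overflow instead of wrapping.
pub fn offset_checked(block_idx: u32, per_block_len: u32) -> Option<u32> {
    block_idx.checked_mul(per_block_len)
}

/// Widening both operands to 64 bits first: the product of two u32
/// values always fits in a u64, so this cannot overflow.
pub fn offset_wide(block_idx: u32, per_block_len: u32) -> u64 {
    u64::from(block_idx) * u64::from(per_block_len)
}
```

For blockIdx = 1048576 and perBlockLen = 4096 the true product is 2³², so offset_wrapping returns 0, offset_checked returns None, and offset_wide returns 4294967296.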
H.4 Interpretation
The dominant vulnerability class is C1: Type Confusion (98% of kernels). This is a structural property of the AscendC C++ API: all kernel entry points receive tensor pointers as GM_ADDR (= uint8_t*), erasing all element-type information at the kernel boundary. Any mismatch between the host’s tensor dtype and the kernel’s assumed dtype produces silent data corruption with no runtime error.
In ascend-rs, kernel entry points use typed Rust pointers (*const u16 for f16/bf16, *const f32 for f32, etc.). A mismatch is a compile-time type error, so the build fails before the kernel can ever run.
C4: Missing Synchronization affects 79% of kernels. The AscendC programming model requires manual pipe_barrier() calls between DMA operations and subsequent vector computations. Omitting them produces non-deterministic wrong results with no diagnostic. ascend-rs kernel_ops composites (e.g., ascend_vec_add_f16) always include the necessary barriers — they cannot be accidentally omitted.
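C6: Integer Overflow affects 78% of kernels. Block-index arithmetic (blockIdx * perBlockLen) uses uint32_t in C++, silently wrapping at 2³² without any diagnostic. Rust’s wrapping_mul makes the wrap-around behaviour explicit at the call site; it wraps in every build profile and never panics. The debug-build panic belongs to the plain * operator, which is why an unintended wrap written with * surfaces during testing.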
H.5 Per-Kernel Detail
The full per-kernel safety report (all 998 real-batch kernels) is maintained as a machine-generated companion file: blog/appendix_safety_report.md in the repository. It lists each kernel’s safety-class membership (C1–C6) and the evidence that identifies each class.