English | 中文版
2. Hello World: Your First NPU Program
Installation
ascend-rs is distributed as a self-contained package with a pre-built compiler backend and Rust source crates for the host and kernel APIs.
Prerequisites:
- CANN toolkit (8.x or 9.x) installed on the target machine
- Rust nightly toolchain (auto-installed by
rustupfrom the includedrust-toolchain.toml)
Setup:
# 1. Extract the distribution
tar xzf ascend-rs-0.1.1-$(uname -m).tar.gz
cd ascend-rs-0.1.1
# 2. Source the CANN environment
source /usr/local/Ascend/ascend-toolkit/latest/bin/setenv.bash
# 3. Make the compiler backend discoverable
export LD_LIBRARY_PATH="$(pwd)/lib:$LD_LIBRARY_PATH"
# 4. Verify (compiles a kernel and runs it on the NPU)
bash test.sh --run
What’s in the package:
ascend-rs-0.1.1/
├── lib/librustc_codegen_mlir.so # Compiler backend (Rust → NPU binary)
├── crates/
│ ├── ascend_rs/ # Host API: device, stream, memory, kernel launch
│ ├── ascend_sys/ # FFI bindings (auto-generated from CANN headers)
│ ├── ascend_std/ # Kernel runtime: buffer ops, vector intrinsics
│ ├── ascend_std_macros/ # #[aiv_kernel] attribute macro
│ ├── ascend_rs_builder/ # Build-time kernel compiler (KernelBuilder)
│ └── ascend_rs_builder_config/ # CANN path detection
├── examples/vec_add/ # Working starter project
├── test.sh # Smoke test
└── rust-toolchain.toml # Pinned nightly version
The compiler backend (librustc_codegen_mlir.so) is loaded by rustc during kernel compilation. It translates Rust kernel code through MLIR to AscendC C++, then invokes the CANN bisheng compiler to produce NPU binaries. Users interact with it indirectly through KernelBuilder in their build.rs scripts.
Let’s start with the simplest possible example. This Hello World demonstrates the basics of the ascend-rs host API — safely initializing the NPU, creating execution contexts, and launching kernels from Rust.
Kernel Code (C++)
At this stage, Hello World uses a C++ kernel, which is the native approach for the CANN SDK:
// hello_world.cpp
#include "kernel_operator.h"
extern "C" __global__ __aicore__ void hello_world() {
AscendC::printf("Hello World!!!\n");
}
extern "C" void hello_world_do(uint32_t blockDim, void *stream) {
hello_world<<<blockDim, nullptr, stream>>>();
}
Here, __global__ marks the function as a host-callable entry point, and __aicore__ indicates it runs on the Ascend AI Core. The <<<...>>> syntax, similar to CUDA, specifies parallelism and execution stream.
Host Code (Rust)
The host code demonstrates ascend-rs’s most important design principle — RAII resource management and lifetime safety:
use ascend_rs::prelude::*;
use std::error::Error;
// Declare FFI interface to the C++ kernel
unsafe extern "C" {
fn hello_world_do(dim: u32, stream: *mut std::ffi::c_void);
}
fn main() -> Result<(), Box<dyn Error>> {
// Step 1: Initialize ACL runtime
let acl = Acl::new()?;
// Step 2: Select and initialize device
let device = Device::new(&acl)?;
// Step 3: Create execution context and stream
let context = AclContext::new(&device)?;
let stream = AclStream::new(&context)?;
// Step 4: Launch kernel (8 parallel blocks)
unsafe {
hello_world_do(8, stream.to_raw());
}
// Step 5: Synchronize and wait for kernel completion
stream.synchronize()?;
// Step 6: All resources automatically freed (RAII)
// Drop order: stream → context → device → acl
Ok(())
}
Key Design: Lifetime Chain
Notice the type signatures in this code:
Acl → Lifetime root
Device<'acl> → Must drop before Acl
AclContext<'d> → Must drop before Device
AclStream<'c> → Must drop before Context
If you try to use these resources in the wrong order, the code simply won’t compile. This is the power of Rust’s type system — guaranteeing correct resource management at compile time, whereas C++ can only rely on programmer discipline.
Comparison: Pitfalls in C++
The equivalent C++ code requires manual lifecycle management for every resource:
// C++ version: every resource requires manual cleanup
aclInit(nullptr);
aclrtSetDevice(0);
aclrtContext ctx;
aclrtCreateContext(&ctx, 0);
aclrtStream stream;
aclrtCreateStream(&stream);
hello_world_do(8, stream);
aclrtSynchronizeStream(stream);
// Must manually free in correct order, otherwise undefined behavior
aclrtDestroyStream(stream);
aclrtDestroyContext(ctx);
aclrtResetDevice(0);
aclFinalize();
If any step throws an exception or returns early, the subsequent cleanup code is skipped. In the Rust version, the Drop trait guarantees resources are always freed correctly, regardless of control flow changes.