2. Hello World: Your First NPU Program

Let’s start with the simplest possible example. This Hello World demonstrates the basics of the ascend-rs host API — safely initializing the NPU, creating execution contexts, and launching kernels from Rust.

Kernel Code (C++)

At this stage, Hello World uses a C++ kernel, which is the native approach for the CANN SDK:

// hello_world.cpp
#include "kernel_operator.h"

extern "C" __global__ __aicore__ void hello_world() {
    AscendC::printf("Hello World!!!\n");
}

extern "C" void hello_world_do(uint32_t blockDim, void *stream) {
    hello_world<<<blockDim, nullptr, stream>>>();
}

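Here, __global__ marks the function as a kernel entry point that can be launched from host code, and __aicore__ indicates that it runs on the Ascend AI Core. The <<<...>>> launch syntax, similar to CUDA's, takes three arguments: the number of blocks to run in parallel (blockDim), an L2 cache control argument (nullptr here), and the stream on which the kernel is queued.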

Host Code (Rust)

The host code demonstrates ascend-rs’s most important design principle — RAII resource management and lifetime safety:

use ascend_rs::prelude::*;
use std::error::Error;

// Declare FFI interface to the C++ kernel
unsafe extern "C" {
    fn hello_world_do(dim: u32, stream: *mut std::ffi::c_void);
}

fn main() -> Result<(), Box<dyn Error>> {
    // Step 1: Initialize ACL runtime
    let acl = Acl::new()?;

    // Step 2: Select and initialize device
    let device = Device::new(&acl)?;

    // Step 3: Create execution context and stream
    let context = AclContext::new(&device)?;
    let stream = AclStream::new(&context)?;

    // Step 4: Launch kernel (8 parallel blocks)
    unsafe {
        hello_world_do(8, stream.to_raw());
    }

    // Step 5: Synchronize and wait for kernel completion
    stream.synchronize()?;

    // Step 6: All resources automatically freed (RAII)
    // Drop order: stream → context → device → acl
    Ok(())
}
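The drop order noted in Step 6 follows from a general Rust rule: local variables are dropped in reverse order of declaration. The sketch below illustrates that rule with a hypothetical Handle type that only records its name when dropped. It is a stand-in for illustration and does not use ascend-rs or touch a real device.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a runtime handle (not an ascend-rs type).
/// It appends its name to a shared log when it is dropped.
struct Handle {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Handle {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Declares four handles in the same order as the host code above and
/// returns the order in which they were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _acl = Handle { name: "acl", log: Rc::clone(&log) };
        let _device = Handle { name: "device", log: Rc::clone(&log) };
        let _context = Handle { name: "context", log: Rc::clone(&log) };
        let _stream = Handle { name: "stream", log: Rc::clone(&log) };
        // End of scope: locals are dropped in reverse declaration order.
    }
    let order = log.borrow().clone();
    order
}

fn main() {
    // Prints: stream -> context -> device -> acl
    println!("{}", drop_order().join(" -> "));
}
```

Because the order is fixed by the declaration order, no explicit cleanup call is needed at the end of main.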

Key Design: Lifetime Chain

Each type in this code carries the lifetime of the resource it was created from:

Acl                    → Lifetime root
  Device<'acl>         → Must drop before Acl
    AclContext<'d>     → Must drop before Device
      AclStream<'c>    → Must drop before Context

If a resource could outlive the parent it borrows from (for example, a stream that is still alive after its context has been dropped), the code does not compile. Rust's borrow checker enforces the correct teardown order at compile time, whereas the equivalent C++ code relies on programmer discipline.
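One way to encode such a chain is for each child to hold a shared borrow of its parent. The following is a minimal sketch using mock types that reuse the names above. The fields, constructors, and the with_stream helper are assumptions made for illustration, and the real ascend-rs definitions may differ (for instance, by using PhantomData instead of a reference).

```rust
// Mock types only: each child stores a borrow of its parent, so the
// borrow checker will not let a child outlive the parent.
struct Acl;
struct Device<'acl> {
    _acl: &'acl Acl,
}
struct AclContext<'d> {
    _device: &'d Device<'d>,
}
struct AclStream<'c> {
    _context: &'c AclContext<'c>,
}

impl Acl {
    fn new() -> Self {
        Acl
    }
}
impl<'acl> Device<'acl> {
    fn new(acl: &'acl Acl) -> Self {
        Device { _acl: acl }
    }
}
impl<'d> AclContext<'d> {
    fn new(device: &'d Device<'d>) -> Self {
        AclContext { _device: device }
    }
}
impl<'c> AclStream<'c> {
    fn new(context: &'c AclContext<'c>) -> Self {
        AclStream { _context: context }
    }
}

/// Hypothetical helper: builds the chain in a valid order and lends the
/// stream to a closure. The stream cannot escape, since `R` is chosen by
/// the caller and cannot name the local borrows.
fn with_stream<R>(f: impl FnOnce(&AclStream<'_>) -> R) -> R {
    let acl = Acl::new();
    let device = Device::new(&acl);
    let context = AclContext::new(&device);
    let stream = AclStream::new(&context);
    f(&stream)
}

fn main() {
    let answer = with_stream(|_stream| 42);
    println!("{answer}");

    // Rejected at compile time if uncommented, because `device` would
    // outlive `acl` (error E0597: `acl` does not live long enough):
    //
    // let device;
    // {
    //     let acl = Acl::new();
    //     device = Device::new(&acl);
    // }
    // let _still_alive = &device;
}
```

The borrows cost nothing at runtime. They exist so the compiler can check that every child is gone before its parent is.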

Comparison: Pitfalls in C++

The equivalent C++ code requires manual lifecycle management for every resource:

// C++ version: every resource requires manual cleanup
// (return codes are left unchecked here for brevity)
aclInit(nullptr);
aclrtSetDevice(0);
aclrtContext ctx;
aclrtCreateContext(&ctx, 0);
aclrtStream stream;
aclrtCreateStream(&stream);

hello_world_do(8, stream);
aclrtSynchronizeStream(stream);

// Must manually free in correct order, otherwise undefined behavior
aclrtDestroyStream(stream);
aclrtDestroyContext(ctx);
aclrtResetDevice(0);
aclFinalize();

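The ACL calls report failure through return codes rather than exceptions, so correct C++ code must check every call and release whatever has already been acquired on each error path. If any path returns early without doing so, the cleanup code is skipped and the resources leak. In the Rust version, Drop runs whenever a value goes out of scope, including on early returns through the ? operator and during panic unwinding, so resources are released in the right order without explicit cleanup code.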
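The early-return case can be sketched without any NPU. In the example below, Guard, launch, run, and run_and_collect are hypothetical names invented for illustration. Guard stands in for a context or stream handle, and launch is a step that can fail.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<String>>>;

/// Hypothetical guard standing in for a context or stream handle.
struct Guard {
    name: &'static str,
    log: Log,
}

impl Drop for Guard {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("release {}", self.name));
    }
}

/// Hypothetical launch step that fails on request.
fn launch(fail: bool) -> Result<(), String> {
    if fail {
        Err("launch failed".to_string())
    } else {
        Ok(())
    }
}

fn run(fail: bool, log: &Log) -> Result<(), String> {
    let _context = Guard { name: "context", log: Rc::clone(log) };
    let _stream = Guard { name: "stream", log: Rc::clone(log) };
    // On error, `?` returns early; both guards are still dropped.
    launch(fail)?;
    log.borrow_mut().push("synchronized".to_string());
    Ok(())
}

/// Runs the sequence and returns the result together with the event log.
fn run_and_collect(fail: bool) -> (Result<(), String>, Vec<String>) {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let result = run(fail, &log);
    let events = log.borrow().clone();
    (result, events)
}

fn main() {
    let (result, events) = run_and_collect(true);
    println!("{result:?} {events:?}");
}
```

On the failing path the log still contains both release events, stream first and then context, even though run contains no cleanup code. Drop is skipped only when the process exits without unwinding (for example through std::process::exit or an abort) or when a value is deliberately leaked with std::mem::forget.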