Narsil: Rust-native inference serving engine

One Rust serving shell — Burn-native kernels and native torch (libtorch + FBGEMM) underneath, many transports on top.

What is Narsil?

Narsil is a Rust inference serving engine. Its thesis is to be the best-in-class serving shell — owning the transport, scheduling, and batching in safe Rust — while staying pluggable about the compute underneath:

  • Internally: native torch inference (tch-rs → libtorch + FBGEMM) and custom Rust/cuTile kernels (Burn).
  • Externally: many transports — gRPC today, with HTTP, QUIC, NCCL, and UCCL on the roadmap.
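The split between a fixed serving shell and swappable compute can be pictured as a trait boundary. The following is a minimal sketch under stated assumptions, not Narsil's actual API: the `Backend` trait, the `Shell` struct, and the `TorchLike` / `BurnLike` types are hypothetical names invented for illustration, and the "models" are trivial arithmetic stand-ins rather than real libtorch or Burn calls.

```rust
/// Hypothetical backend interface (an assumption, not Narsil's real trait).
trait Backend {
    fn name(&self) -> &'static str;
    /// Returns one score per input row.
    fn infer(&self, batch: &[Vec<f32>]) -> Vec<f32>;
}

/// Stand-in for a libtorch/FBGEMM path. The "model" here just sums each row.
struct TorchLike;

/// Stand-in for a Burn-kernel path. The "model" here just averages each row.
struct BurnLike;

impl Backend for TorchLike {
    fn name(&self) -> &'static str {
        "torch-like"
    }
    fn infer(&self, batch: &[Vec<f32>]) -> Vec<f32> {
        batch.iter().map(|row| row.iter().sum::<f32>()).collect()
    }
}

impl Backend for BurnLike {
    fn name(&self) -> &'static str {
        "burn-like"
    }
    fn infer(&self, batch: &[Vec<f32>]) -> Vec<f32> {
        batch
            .iter()
            .map(|row| row.iter().sum::<f32>() / row.len().max(1) as f32)
            .collect()
    }
}

/// The shell owns request handling and only ever sees `dyn Backend`.
struct Shell {
    backend: Box<dyn Backend>,
}

impl Shell {
    fn handle(&self, batch: &[Vec<f32>]) -> Vec<f32> {
        self.backend.infer(batch)
    }
}

fn main() {
    let batch: Vec<Vec<f32>> = vec![vec![1.0, 2.0, 3.0], vec![4.0, 6.0]];
    let backends: Vec<Box<dyn Backend>> = vec![Box::new(TorchLike), Box::new(BurnLike)];
    for backend in backends {
        let shell = Shell { backend };
        println!("{} -> {:?}", shell.backend.name(), shell.handle(&batch));
    }
}
```

In this sketch the shell code is identical for both backends. A transport layer (gRPC or otherwise) would sit in front of `Shell::handle` in the same way, without knowing which backend is plugged in.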

The current reference workload is DLRM (Deep Learning Recommendation Model), benchmarked against the official TorchRec C++ gRPC server.

The headline result

Serving the same INT8 DLRM artifact on an NVIDIA L4:

| Path | p50 latency (conc 1) | Throughput (conc 16, samples/s) |
|---|---|---|
| Narsil route A (libtorch/FBGEMM) | 1.02 ms | ~438,000 |
| TorchRec C++ gRPC (documented baseline) | 1.033 ms | ~215,000 |
| Narsil Burn CUDA DLRM (FP32) | 13.04 ms | ~7,900 |

The conc-1 p50 is measured in worker mode and the conc-16 throughput in batch mode. Route A matches the C++ baseline's single-request latency (1.02 ms vs 1.033 ms) and roughly doubles its throughput (~438,000 vs ~215,000 samples/s) via continuous batching. See Benchmarks for the per-mode tables and caveats, and the Decision Log for how we got here.
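This page does not describe how Narsil's scheduler implements continuous batching, so the following is only a generic illustration of one common reading of the term: whenever the worker becomes free, it drains whatever requests have queued up (to a cap) and runs them as a single batch, rather than waiting for a fixed-size batch to fill. The `Request` type, `next_batch`, and `infer` are hypothetical names, and the model is a trivial stand-in.

```rust
use std::collections::VecDeque;

/// Hypothetical request: an id plus one row of features.
struct Request {
    id: u64,
    features: Vec<f32>,
}

/// Take up to `max_batch` queued requests without waiting for the batch to fill.
fn next_batch(queue: &mut VecDeque<Request>, max_batch: usize) -> Vec<Request> {
    let n = queue.len().min(max_batch);
    queue.drain(..n).collect()
}

/// Stand-in model: one (id, score) pair per request, where score is the feature sum.
fn infer(batch: &[Request]) -> Vec<(u64, f32)> {
    batch
        .iter()
        .map(|r| (r.id, r.features.iter().sum::<f32>()))
        .collect()
}

fn main() {
    // Five requests are already waiting when the worker first becomes free.
    let mut queue: VecDeque<Request> = (0..5u64)
        .map(|id| Request { id, features: vec![id as f32, 1.0] })
        .collect();

    // Each loop iteration models the worker becoming free again.
    while !queue.is_empty() {
        let batch = next_batch(&mut queue, 4);
        let scores = infer(&batch);
        println!("batch of {}: {:?}", batch.len(), scores);
    }
}
```

The point of the sketch is that batch size adapts to load: under concurrency more requests share one model call, while a lone request is still dispatched immediately, which is consistent with matching single-request latency while gaining throughput.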

Apache-2.0 licensed.