language

Neutron Mojo

GPU kernels, quantized inference, and a training stack in a language that reads like Python and runs like CUDA. Preview-shipped today, stable when Mojo 1.0 lands.

GPU kernels that speak Python.

Available
SIMD Kernels
5 Quantization Formats
125 Test Suites
Tensor First-Class Type

GPU code that a Python programmer can read.

Mojo is Modular's language, designed as a superset of Python with the ergonomics of Python and the speed of CUDA. Neutron Mojo is the ML library on top: SIMD-accelerated kernels, five quantization formats (int4, int8, fp8, fp16, bf16), a tensor type you can differentiate through, and an inference pipeline that doesn't assume you brought PyTorch with you.

This is a preview. Mojo itself is pre-1.0, so the surface may shift when the language stabilizes. We ship against the current stable Mojo release and bump versions deliberately.

kernel/gemm.mojo
from neutron.tensor import Tensor, DType
from neutron.simd import vectorize, tile, simd_width

fn gemm[
    dtype: DType, M: Int, N: Int, K: Int
](C: Tensor[dtype, M, N], A: Tensor[dtype, M, K], B: Tensor[dtype, K, N]):
    @parameter
    fn row(m: Int):
        @parameter
        fn col[nelts: Int](n: Int):
            # Accumulate one SIMD strip of output row m.
            var acc = SIMD[dtype, nelts](0)
            for k in range(K):
                acc += A[m, k] * B[k, n:n+nelts]
            C[m, n:n+nelts] = acc
        vectorize[col, simd_width[dtype]()](N)
    tile[row](M, tile_size=64)
SIMD GEMM kernel. Vectorized at compile time, tiled for the L1 cache.
SIMD kernels
Vectorized matmul, softmax, layernorm, rotary embeddings, KV cache, attention. All parameterized on dtype and tile size, monomorphized at compile time.
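As a rough sketch of what one of these kernels computes, here is the numerically stable softmax in NumPy, the kind of reference implementation the test suites check against (the function name is illustrative, not Neutron's API):

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row max before exponentiating so large logits don't overflow.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)
```

The max-subtraction trick is the same one the SIMD kernel has to fold into its vectorized inner loop.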
Five quant formats
int4, int8, fp8 (e4m3 + e5m2), fp16, bf16. Packed and unpacked kernels for each. Mix formats per layer to tune the accuracy/size tradeoff.
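To make the quantization math concrete, here is a minimal NumPy sketch of symmetric int8 quantization and dequantization (names and the absmax scaling choice are illustrative, not Neutron's exact scheme):

```python
import numpy as np

def quantize_int8_symmetric(w: np.ndarray):
    # Map the largest |w| onto the int8 limit 127; one scale per tensor.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale
```

Asymmetric int8 adds a zero-point offset on top of this; the per-layer mixing mentioned above just means each layer carries its own format and scale.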
Tensor as a type
Shape, dtype, and device are part of the type. Shape mismatches fail to compile. No runtime shape checks on the hot path.
Inference pipeline
Load a GGUF or safetensors file, select a quant format, serve. Streaming token generation with paged KV cache.
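A paged KV cache stores attention keys/values in fixed-size blocks from a shared pool, with a per-sequence block table mapping logical positions to physical blocks. A toy Python sketch of the idea (class, method names, and block size are illustrative, not Neutron's API):

```python
import numpy as np

BLOCK = 16  # tokens per page (illustrative)

class PagedKVCache:
    def __init__(self, num_blocks: int, num_heads: int, head_dim: int):
        # One physical pool of KV blocks, shared across all sequences.
        self.k = np.zeros((num_blocks, BLOCK, num_heads, head_dim), np.float32)
        self.v = np.zeros_like(self.k)
        self.free = list(range(num_blocks))
        self.tables = {}  # seq_id -> list of physical block ids
        self.lens = {}    # seq_id -> tokens written so far

    def append(self, seq, k_tok, v_tok):
        n = self.lens.get(seq, 0)
        if n % BLOCK == 0:  # current page is full (or first token): grab a new one
            self.tables.setdefault(seq, []).append(self.free.pop())
        blk = self.tables[seq][n // BLOCK]
        self.k[blk, n % BLOCK] = k_tok
        self.v[blk, n % BLOCK] = v_tok
        self.lens[seq] = n + 1
```

Because sequences allocate pages on demand, memory scales with tokens actually generated rather than with a preallocated maximum context.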
Training stack
Autodiff, Adam/AdamW/Lion optimizers, gradient accumulation, mixed-precision training. Enough to fine-tune small models locally.
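For reference, the standard Adam update the optimizer family is built on, as a NumPy sketch (textbook Adam with bias correction; this is not Neutron's signature):

```python
import numpy as np

def adam_step(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and its square.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    # Bias correction: the averages start at zero and would otherwise be too small.
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    p = p - lr * m_hat / (np.sqrt(v_hat) + eps)
    return p, m, v
```

AdamW differs by applying weight decay directly to `p` instead of folding it into `g`; Lion replaces the second-moment scaling with the sign of an interpolated momentum.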
125 test suites
Each kernel verified against a reference NumPy implementation. Numeric tolerance asserted per dtype.
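In the same spirit, a minimal Python sketch of a per-dtype tolerance check (the tolerance values here are illustrative, not the suite's actual thresholds):

```python
import numpy as np

# Lower-precision dtypes get looser tolerances (illustrative values).
TOL = {"fp16": 1e-3, "bf16": 1e-2, "fp32": 1e-6}

def assert_close(actual, reference, dtype: str):
    tol = TOL[dtype]
    np.testing.assert_allclose(actual, reference, rtol=tol, atol=tol)
```

The point is that a bf16 kernel is held to a different numeric bar than an fp32 one, rather than a single global epsilon.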

Quantization formats

int4
Packed, 2× density vs int8
int8
Symmetric and asymmetric
fp8
e4m3 + e5m2
fp16
Half precision, IEEE 754
bf16
Brain float, training favorite
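The "2× density" of int4 comes from packing two 4-bit codes into each byte. A NumPy sketch of that packing for two's-complement int4 values (layout is illustrative; Neutron's packed format may differ):

```python
import numpy as np

def pack_int4(q: np.ndarray) -> np.ndarray:
    # q holds values in [-8, 7]; assumes even length. Two 4-bit codes per byte.
    u = (q.astype(np.int8) & 0x0F).astype(np.uint8)
    return (u[0::2] | (u[1::2] << 4)).astype(np.uint8)

def unpack_int4(packed: np.ndarray) -> np.ndarray:
    lo = (packed & 0x0F).astype(np.int8)
    hi = ((packed >> 4) & 0x0F).astype(np.int8)
    out = np.empty(packed.size * 2, np.int8)
    out[0::2], out[1::2] = lo, hi
    # Sign-extend 4-bit two's complement back to int8.
    return np.where(out >= 8, out - 16, out).astype(np.int8)
```

A "packed" kernel operates on these bytes directly, while an "unpacked" one pays the decode cost up front.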

Preview, then stable.

Mojo is still pre-1.0; the language is evolving every release. Neutron Mojo tracks the stable branch and bumps deliberately when breaking changes land. Expect surface changes until Mojo 1.0 ships — after that, we commit to semver.

What it's for

Model inference on the same machine as your application. Fine-tuning small models on customer data without an external GPU service. SIMD-heavy data transforms that outgrew NumPy. Anywhere you'd reach for CUDA C++ but would rather keep reading Python.

Why Mojo?

Because it's Python-shaped but compiles through MLIR to the same codegen path as CUDA. Because @parameter and vectorize replace a thousand lines of C++ templates. Because the same kernel definition runs on CPU SIMD, GPU, and TPU with no per-target rewrite.

Part of a bigger system

Train or fine-tune in Neutron Mojo. Expose the model through Neutron Python's MCP server. Consume from the edge in Neutron TypeScript. Persist training runs, metrics, and model artifacts in Nucleus — one database, one contract, whether you're shipping a web app or an inference service.