Neutron Mojo
GPU kernels, quantized inference, and a training stack in a language that reads like Python and runs like CUDA. Preview-shipped today, stable when Mojo 1.0 lands.
GPU kernels that speak Python.
GPU code that a Python programmer can read.
Mojo is Modular's language, designed as a superset of Python with the ergonomics of Python and the speed of CUDA. Neutron Mojo is the ML library on top of it: SIMD-accelerated kernels, five quantization formats (int4, int8, fp8, fp16, bf16), a tensor type you can differentiate through, and an inference pipeline that doesn't assume you brought PyTorch with you.
This is a preview. Mojo itself is pre-1.0, so the surface may shift when the language stabilizes. We ship against the current stable Mojo release and bump versions deliberately.
from neutron.tensor import Tensor, DType
from neutron.simd import vectorize, tile, simd_width

fn gemm[
    dtype: DType, M: Int, N: Int, K: Int
](C: Tensor[dtype, M, N], A: Tensor[dtype, M, K], B: Tensor[dtype, K, N]):
    @parameter
    fn row(m: Int):
        @parameter
        fn col[nelts: Int](n: Int):
            # Accumulate one SIMD-width strip of row m of the output.
            var acc = SIMD[dtype, nelts](0)
            for k in range(K):
                acc += A[m, k] * B[k, n:n+nelts]
            C[m, n:n+nelts] = acc
        # Sweep the columns in chunks of the native SIMD width for dtype.
        vectorize[col, simd_width[dtype]()](N)
    # Block the rows for cache locality.
    tile[row](M, tile_size=64)

Preview, then stable.
Mojo is still pre-1.0; the language is evolving every release. Neutron Mojo tracks the stable branch and bumps deliberately when breaking changes land. Expect surface changes until Mojo 1.0 ships — after that, we commit to semver.
What it's for
Model inference on the same machine as your application. Fine-tuning small models on customer data without an external GPU service. SIMD-heavy data transforms that outgrew NumPy. Anywhere you'd reach for CUDA C++ but would rather keep reading Python.
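Quantized inference is the headline use case, and int8 is the easiest of the five formats to picture: store each weight as a signed byte plus one shared scale. A plain-Python sketch of symmetric per-tensor int8 quantization (the helper names are illustrative, not Neutron's API):

```python
# Symmetric per-tensor int8 quantization, sketched in plain Python.
# These helpers are illustrative -- they are not Neutron Mojo's actual API.

def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Map floats onto [-127, 127] using a single shared scale factor."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Recover approximate floats; error is at most scale / 2 per element."""
    return [x * scale for x in q]

weights = [0.02, -1.3, 0.51, 0.97]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
```

The same idea extends to the other formats: int4 shrinks the integer range, while fp8/fp16/bf16 trade mantissa bits instead of using a scale-and-round scheme.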
Why Mojo?
Because it's Python-shaped but compiles through MLIR to the same codegen path as CUDA. Because @parameter and vectorize replace a thousand lines of C++ templates. Because the same kernel definition runs on CPU SIMD, GPU, and TPU with no per-target rewrite.
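The vectorize mentioned above is worth unpacking: it rewrites a loop over N elements into full-SIMD-width chunks plus a scalar tail. A plain-Python model of that transform (illustrative only; Mojo's real vectorize is a compile-time transform, not a runtime loop):

```python
# A runtime model of what `vectorize[body, width](n)` does:
# invoke the body on full `width`-sized chunks, then a scalar remainder.
# Illustrative sketch -- not Mojo's actual implementation.

def vectorize(body, width: int, n: int) -> None:
    i = 0
    while i + width <= n:
        body(i, width)   # full SIMD-width chunk
        i += width
    while i < n:
        body(i, 1)       # scalar tail, one element at a time
        i += 1

calls = []
vectorize(lambda i, w: calls.append((i, w)), width=4, n=10)
# calls == [(0, 4), (4, 4), (8, 1), (9, 1)]
```

In C++ you would express the same split with template machinery and hand-written remainder loops; in Mojo the compiler derives both from one kernel body.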
Part of a bigger system
Train or fine-tune in Neutron Mojo. Expose the model through Neutron Python's MCP server. Consume from the edge in Neutron TypeScript. Persist training runs, metrics, and model artifacts in Nucleus — one database, one contract, whether you're shipping a web app or an inference service.