NeMo-WM — Neuromodulated World Model for Robot Navigation

Open architecture · Drop-in compatible

Extend any world model
with NeMo-WM

The neuromodulator is not a replacement for JEPA — it is an interpretability and grounding layer. Add it to DINO-WM, JEPA-WM, or V-JEPA 2-AC without changing your architecture. Eight scalar computations per batch. Zero new hyperparameters.

your_training_loop.py

# ── your existing JEPA step ───────────────
z        = encoder(obs)
z_pred   = predictor(z, action)
z_target = encoder(obs_next)
L_jepa   = mse(z_pred, z_target)

# ── add NeMo-WM · 3 lines ─────────────────
from neuromodulator import NeuromodulatorState
neuro   = NeuromodulatorState()  # assumed stateful: create once, outside the loop
signals = neuro.update(z_pred, z_target, gps=gps)

# ── now fully auditable ───────────────────
L_total = L_jepa + signals.weighted_loss()
log(DA=signals.da, CORT=signals.cortisol,
    NE=signals.ne, SHT=signals.sht)
nav.set_goal("open area")     # text→GPS 25ms
action, _ = nav.step(z, gps)  # 0.09ms/step

LIVE NEUROMODULATOR SIGNALS

step 1,081,000

DA — Dopamine 0.003 ↑ PEAK

5HT — Serotonin 0.108 ✓ stable

NE — Norepinephrine GPS grounded

ACh — Acetylcholine 0.446 contact

CORT — Cortisol r=0.768 lag-1

eCB — Endocannabinoid habit suppress

E/I — Balance arousal ctrl

Ado — Adenosine fatigue reg

DA · DOPAMINE

Prediction auditability

L_jepa is a single opaque number. DA converts it into a named verdict: trivially clamped vs genuinely surprising. Detectable in 200 training steps — before you waste days training.

DA=0.000 → floor clamped, no gradient

DA=0.003 → peak arousal, amplify

CORT · CORTISOL

Distribution shift — 1 epoch early

Cortisol tracks rolling loss above baseline. Sprint 8d ablation: removing cortisol delays REOBSERVE onset more than 50× (step 28,500 vs step 500) and slows per-epoch compression 3.5×. The signal predicts future loss with Pearson r=0.768 at lag 1.

r=0.768 lag-1 · p<0.0001

Predictive through lag=5
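A minimal sketch of how a cortisol-style signal and its lag-1 correlation could be computed. It assumes the signal is an exponential moving average of loss excess over a slowly moving baseline. The function names, decay constants, and synthetic loss curve are illustrative assumptions, not the NeMo-WM implementation.

```python
from math import sqrt

def cortisol_trace(losses, baseline_decay=0.99, cort_decay=0.9):
    """Hypothetical cortisol-style signal: smoothed loss excess above a slow baseline."""
    baseline = losses[0]
    cort = 0.0
    trace = []
    for loss in losses:
        excess = max(0.0, loss - baseline)  # only loss above baseline counts
        cort = cort_decay * cort + (1 - cort_decay) * excess
        baseline = baseline_decay * baseline + (1 - baseline_decay) * loss
        trace.append(cort)
    return trace

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def lagged_correlation(signal, losses, lag=1):
    """Correlate signal[t] with losses[t + lag]."""
    return pearson(signal[:-lag], losses[lag:])

# Synthetic loss curve (made up): flat, then a ramp that stands in for a shift.
losses = [0.5] * 20 + [0.5 + 0.05 * i for i in range(1, 11)] + [1.0] * 10
trace = cortisol_trace(losses)
r_lag1 = lagged_correlation(trace, losses, lag=1)
```

On real training logs, `losses` would be the per-epoch (or per-step) loss history, and `r_lag1` is the quantity the r=0.768 figure refers to.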

NE · NOREPINEPHRINE

Spatial grounding supervision

JEPA learns visual dynamics but ignores GPS. NE adds a GPS prediction loss gated on spatial error — no encoder changes, no extra architecture. GPS displacement encodes at p=5.9e-05.

NE spike → spatial failure, now

GPS p=5.9e-05 *** encoded

5HT · SEROTONIN

Collapse prevention — no VICReg

5HT penalises particle distribution collapse by measuring embedding diversity directly. Detects and corrects collapse before L_jepa reflects it. Replaces VICReg or SIGReg with a single biological signal.

5HT<0.05 → collapse risk flag

No extra hyperparameters
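A minimal sketch of a diversity check in the spirit of 5HT, assuming diversity is measured as the mean per-dimension standard deviation across a batch of embeddings. That proxy, and the names `embedding_diversity` and `collapse_risk`, are assumptions for illustration. Only the 0.05 threshold comes from the text above.

```python
from statistics import pstdev

COLLAPSE_THRESHOLD = 0.05  # the "5HT<0.05" flag quoted above

def embedding_diversity(batch):
    """Mean per-dimension standard deviation across a batch (assumed 5HT proxy)."""
    dims = list(zip(*batch))  # transpose: one tuple per embedding dimension
    return sum(pstdev(d) for d in dims) / len(dims)

def collapse_risk(batch, threshold=COLLAPSE_THRESHOLD):
    """True when the batch's embeddings are nearly identical."""
    return embedding_diversity(batch) < threshold

# Toy batches (made up): spread-out embeddings vs nearly identical ones.
healthy = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [0.2, 0.8]]
collapsed = [[0.50, 0.50], [0.51, 0.49], [0.50, 0.51], [0.49, 0.50]]
```

Here `collapse_risk(healthy)` is false and `collapse_risk(collapsed)` is true, because the second batch has almost no spread in either dimension.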

DA · NON-SATURATION

Signal that responds to surprise, not time

Fixed cosine schedules decay to zero regardless of what the model still needs to learn. DA peaked at 0.003 at step 1,081,000, after 28 complete data passes, then held at 0.002 through the final six steps.

Peak at s1,081,000 (epoch 28)

Adapts to actual learning progress

SPRINT 6d · LANGUAGE

Text goals without LLMs

CLIP distilled into a 164K-parameter dual-head on the frozen encoder. Text-conditioned navigation at 4Hz. Works on any JEPA encoder — the backbone never changes.

9/9 queries STRONG · 10,906 nodes scored

0.09ms/step · 8,700× smaller than CLIP
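A minimal sketch of text-goal selection, assuming the text query and every map node are embedded in one shared space and the goal is the node with the highest cosine similarity. The node layout, toy vectors, toy coordinates, and the name `select_goal_node` are hypothetical. The real dual-head and node database are not shown on this page.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def select_goal_node(text_embedding, nodes):
    """Return (gps, score) for the node whose embedding best matches the text."""
    best = max(nodes, key=lambda n: cosine(text_embedding, n["embedding"]))
    return best["gps"], cosine(text_embedding, best["embedding"])

# Toy map with three nodes (made-up coordinates and embeddings).
nodes = [
    {"gps": (10.0, 20.0), "embedding": [0.9, 0.1, 0.0]},
    {"gps": (11.0, 21.0), "embedding": [0.1, 0.9, 0.2]},
    {"gps": (12.0, 22.0), "embedding": [0.0, 0.2, 0.9]},
]
query = [0.0, 0.8, 0.3]  # stand-in for an embedded phrase such as "open area"
goal_gps, goal_score = select_goal_node(query, nodes)
```

A linear scan like this is cheap even at 10,906 nodes, which is one plausible reason no LLM is needed at navigation time.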

NATIVE DEPLOYMENT

Currently running on CORTEX-PE and CORTEX World Model

NeMo-WM was designed for and runs natively on the CORTEX Perception Engine (multi-domain anomaly detection across six sensor domains) and the CORTEX World Model (neuromodulated navigation and planning). Fully compatible with any JEPA-based world model — and not limited to JEPA.

CORTEX-PE ✓ native · CORTEX-WM ✓ native · DINO-WM compatible · JEPA-WM compatible · LeWorldModel compatible · V-JEPA 2-AC compatible · + any encoder with latent predictions

COMPATIBILITY MATRIX

Which benefits apply to which systems

Minimum viable: DA + 5HT + Cortisol · always applicable

System	DA audit	Cortisol	NE+GPS	5HT	Language	Effort
DINO-WM	✓	✓	✓	✓	✓	2–3 days
JEPA-WM	✓	✓	✓	✓	✓	2–3 days
LeWorldModel	✓	✓	✓	✓ replaces SIGReg	✓	1–2 days
V-JEPA 2-AC	✓	✓	✓ stage 2	✓	✓	1–2 weeks
Custom encoder	✓	✓	if GPS	✓	post-distil	1 day

NeMo-WM x V-JEPA 2 · April 2026

26K parameters.
Beats 1034M.

We tested NeMo-WM's 26,561-parameter proprioceptive encoder against Meta's V-JEPA 2 ViT-G (1034M parameters, internet-scale video pre-training) on the same RECON navigation benchmark. Visual scaling does not solve temporal self-localisation. Physics-grounded path integration does.

RECON Hard-Negative AUROC (n=500, k>=32): Zero-Shot Visual vs Trained Proprio

System	Size / window	AUROC	Pathway
V-JEPA 2 ViT-G	1034M params	0.883	visual
NeMo-WM Student	46K distilled	0.889	
V-JEPA 2 ViT-L	326M params	0.907	visual
NeMo-WM k=4	26K params · 1.0s	0.961	NeMo
NeMo-WM k=16	26K params · 4.0s	0.9974	NeMo
NeMo-WM k=32	26K params · 8.0s	0.999+	NeMo
NeMo-WM k=64	26K params · 32s	0.9999 ★	NeMo
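For readers less familiar with the metric: AUROC is the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair. The sketch below computes it by direct pairwise comparison. How positives and hard negatives are sampled in the RECON benchmark is not specified here, so the inputs are purely illustrative.

```python
def auroc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Toy scores (made up): perfect separation, and one misordered pair out of four.
perfect = auroc([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])
one_miss = auroc([0.9, 0.4], [0.5, 0.1])
```

An AUROC of 0.5 means the scores carry no ranking information. 1.0 means every positive outranks every negative.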

V-JEPA 2 ViT-G (1034M) scores 0.883. NeMo-WM proprio (26K) scores 0.9999. Scaling the visual encoder from ViT-L to ViT-G lowers AUROC by 0.024 (0.907 to 0.883).

Physics-grounded path integration outperforms internet-scale video pre-training by +0.114 AUROC from 39,000x fewer parameters. 1411x faster on identical hardware. Visual scaling does not solve temporal self-localisation.
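The headline ratios can be checked from the figures quoted on this page. This is a small worked calculation, assuming "1034M" means exactly 1,034,000,000 parameters. The constant names are illustrative.

```python
VJEPA2_VITG_PARAMS = 1_034_000_000  # "1034M" as quoted above (assumed exact)
NEMO_PROPRIO_PARAMS = 26_561        # proprioceptive encoder size quoted above

# How many times more parameters the ViT-G encoder has than the proprio encoder.
param_ratio = VJEPA2_VITG_PARAMS / NEMO_PROPRIO_PARAMS

# AUROC change when moving from ViT-L (0.907) to ViT-G (0.883).
vitl_to_vitg_delta = round(0.883 - 0.907, 3)
```

The ratio comes out just under 39,000, which matches the rounded "39,000x" figure.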

NeMo-WM adds to V-JEPA 2

+ Proprioceptive path integration (+0.090 AUROC)
+ Cortisol domain shift — no retraining needed
+ DA-gated GRASP planner — 30-40% latency cut
+ ~60% less robot fine-tune data required
+ Interpretability ablation framework

V-JEPA 2 adds to NeMo-WM

+ Stronger visual features (+0.018 zero-shot)
+ Fixes out-of-distribution failure (TwoRoom)
+ 1M+ hours video generalisation
+ No re-distillation required

Combined system

* V-JEPA 2 head AUROC ~0.93-0.95 fine-tuned
* Fusion AUROC ~0.998+ (orthogonal pathways)
* Dissociation confirmed at 1B-param scale
* NeurIPS 2026 target

ACh Temporal Window Sweep: Superlinear (No GPS, No Vision, 26K params)

Window	AUROC
k=2 · 0.5s	0.925
k=4 · 1.0s	0.961
k=8 · 2.0s	0.977
k=16 · 4.0s	0.9972
k=32 · 8.0s	0.9997
k=64 · 32.0s ★	0.9999 (top1_acc = 1.0000)

Hasselmo 1999 biological parallel: low-ACh = broad temporal integration = slow outdoor navigation. k=32 top1_acc=0.9957, AUROC=0.9997. k=64 top1_acc=1.0000, AUROC=0.9999. Saturation at 16-32 seconds for 4Hz navigation.

Empirical results

Paper-quality results.
Consumer hardware.

Evaluated on RECON outdoor robot navigation (Berkeley campus, Jackal robot, 4Hz, 545,866 samples) and five additional anomaly detection domains. All results on GMKtec EVO-X2 · AMD Ryzen AI MAX+ 395 · ~$2,000 · No GPU. Sprint 8d cortisol ablation confirmed: cortisol brings REOBSERVE onset forward more than 50× (step 500 vs step 28,500) and gives a 3.5× per-epoch compression advantage.

RECON Navigation — k=1

0.9837

Quasimetric AUROC at 0.25-second temporal gap. Temporal discrimination from single frames.

Cardiac Audio — 400 clips

0.8894

Anomaly detection on cardiac audio — competitive with PhysioNet challenge winners.

SMAP/MSL Telemetry

0.8427

Satellite telemetry anomaly detection. Within 3 points of specialised transformers.

MVTec AD — 15/15 PASS

0.8855

Visual inspection across 15 object categories. One shared backbone, no per-domain tuning.

Parameters

1.78M

8.4× fewer parameters than LeWorldModel. Runs entirely on CPU + AMD NPU.

Inference · NPU

0.34ms

StudentEncoder on AMD Ryzen AI NPU, XINT8 quantised. 0.09ms per navigation step end-to-end.

Language Nav · 10,906 nodes

0.09ms

Text-conditioned navigation per step. Language A* faster than GPS A*. No LLM. No retraining. Real GeoLatentDB confirmed.

RECON quasimetric AUROC by temporal gap k

k	AUROC
1	0.9837
2	0.9578
4	0.8959
8	0.8146
16	0.7546

How it works

Eight Signals.
Any World Model.

Most world models use a single prediction error to drive learning. NeMo-WM uses eight biologically-inspired neuromodulatory signals, each gating a different aspect of the loss. The JEPA prediction gradient contributed zero across 30 epochs (1.12 million steps). Cortisol, the eighth signal, detects distribution shift one epoch ahead (r=0.768 lag-1, p<0.0001). Sprint 8d ablation confirmed: cortisol brings REOBSERVE onset forward more than 50× and gives 3.5× faster per-epoch compression. DA peaked at 0.003 at step 1,081,000, late in training, and never saturated.

DA

Dopamine · Surprise amplification

Amplifies gradient on high-error batches. Fires when world model is genuinely surprised.

5HT

Serotonin · Representation diversity

Penalises particle distribution collapse. Maintains embedding diversity.

NE

Norepinephrine · Spatial grounding

Amplifies gradient when GPS prediction diverges. Keeps the model spatially oriented.

ACh

Acetylcholine · Contact detection

Gates inter-particle interaction signal. Selective attention to contact events.

eCB

Endocannabinoid · Context novelty

Suppresses gradient for familiar situations. Amplifies for novel contexts.

Ado

Adenosine · Fatigue regulation

Reduces learning rate under sustained high-gradient updates. Homeostatic balance.

E/I

E/I Balance · Arousal control

Controls overall gain of the neuromodulatory system. Global arousal state.

CORT

Cortisol (new in v16.12) · Sustained stress / distribution shift

Slow-timescale signal tracking rolling loss excess above baseline. Empirically validated: r=0.768 lag-1 prediction of future loss (p<0.0001). Detected Sprint 3 distribution shift one epoch ahead.

The predictor learns dynamics without the JEPA gradient ever flowing.

In NeMo-WM, L_jepa is clamped at a free-bits floor of 0.5 for all 30 training epochs — contributing zero gradient across 1.12 million steps. Yet the predictor achieves 0.003 MSE at 2-second prediction horizons. The eight neuromodulatory signals drove that learning through GPS, contact, and Gaussian supervision — no explicit prediction objective required.
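A minimal sketch of why a free-bits floor yields zero gradient, assuming the clamp is a plain `max(raw_loss, floor)`. The function names are illustrative. The 0.5 floor and the 0.003 raw MSE are the figures quoted above.

```python
FREE_BITS_FLOOR = 0.5  # floor quoted in the text

def clamped_loss(raw_loss, floor=FREE_BITS_FLOOR):
    """Loss as reported during training: never below the floor."""
    return max(raw_loss, floor)

def clamped_loss_grad(raw_loss, floor=FREE_BITS_FLOOR):
    """Derivative of the clamped loss w.r.t. the raw loss: 1 above the floor, else 0."""
    return 1.0 if raw_loss > floor else 0.0

reported = clamped_loss(0.003)         # what the log shows as L_jepa
gradient = clamped_loss_grad(0.003)    # what flows back to the predictor
```

With a raw MSE of 0.003, the reported loss sits at the 0.5 floor and the derivative is zero, so any learning in the predictor has to come from the other loss terms.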

The non-saturation property: dopamine peaked at DA=0.003 at step 1,081,000, after the system had seen the full dataset 28 times. Training closed near peak arousal (DA=0.002 over the final six steps). A fixed schedule would have decayed to zero. The biological reward responded to actual surprise regardless of training duration.

This is the central finding: biological reward signals are sufficient to teach temporal world dynamics. JEPA becomes the evaluation framework, not the mechanism.

[ep29 s1081000] loss=0.5663
  L_jepa=0.5000  ← zero gradient, step 1,081,000
  DA=0.003  ← peak DA in 30-epoch run
  regime=REOBSERVE  CORTISOL=0.014
# Final 6 steps, epoch 29:
  DA=0.002  ← training ended near peak arousal
  L_jepa_real = 0.003  ← learned via neuromodulator

Latent world probe

Particles encode
the physical world.

Using the AIM quantization framework (Liu, 2026), we converted NeMo-WM's K=16 particle embeddings to discrete symbol sequences and measured encoding of physical quantities via chi-squared tests. N=1,752 samples, 150 trajectories, 16 particles, 16 clusters. Ep28 canonical — eight physical signals confirmed.

Quantity	χ²	p	Score	Status
Linear velocity (cmd)	412.7	3.9×10⁻⁴⁸	0.133	ENCODED ★★★
Robot heading (yaw)	349.1	5.9×10⁻³⁷	0.096	ENCODED ★★★
GT linear vel (odometry)	236.5	1.3×10⁻¹⁸	0.068	ENCODED ★★★
Angular velocity (cmd)	165.7	1.47×10⁻⁴	0.047	ENCODED ★★★
GT angular vel (odometry)	152.3	3.4×10⁻⁷	0.044	ENCODED ★★★
GPS displacement (m)	131.6	5.88×10⁻⁵	0.039	ENCODED ★★★
Temporal gap k (emerges ep28)	64.3	0.031	0.019	ENCODED ★
Null control (random)	107.0	0.427	0.031	calibration ✓

Eight physical signals confirmed simultaneously. Ground-truth odometry (Jackal wheel encoders) is encoded independently of commanded velocity: the system encodes both what was asked and what actually happened. Temporal gap k is null at ep12 (p=0.345) and weakly encoded at ep28 (p=0.031), a training-dependent dissociation showing that representational equilibria shift with training duration. The null control (p=0.427) confirms calibration.
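A minimal sketch of the kind of chi-squared independence test the probe relies on, assuming each sample pairs a discrete particle symbol with a binned physical quantity. This is the textbook Pearson statistic, not the AIM framework's own code, and the toy sequences are made up.

```python
from collections import Counter

def chi_squared(symbols, bins):
    """Pearson chi-squared statistic and degrees of freedom for two categorical sequences."""
    n = len(symbols)
    joint = Counter(zip(symbols, bins))
    row_totals = Counter(symbols)
    col_totals = Counter(bins)
    stat = 0.0
    for s in row_totals:
        for b in col_totals:
            expected = row_totals[s] * col_totals[b] / n
            observed = joint.get((s, b), 0)
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

# Toy data: symbols that fully determine the bin vs symbols unrelated to it.
dependent = chi_squared([0, 0, 1, 1], ["slow", "slow", "fast", "fast"])
independent = chi_squared([0, 0, 1, 1], ["slow", "fast", "slow", "fast"])
```

A p-value then follows from the chi-squared survival function at `(stat, dof)`, for example via `scipy.stats.chi2.sf`. A null control like the one in the table is the check that random labels do not produce a small p-value.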

Trust & interpretability

Every decision
is readable.

Unlike scalar loss objectives, NeMo-WM's eight neuromodulatory signals provide a continuous, human-readable training and inference narrative. No black box. No post-hoc explanations. The signals are the explanation.

Named signals at every step

DA=0.001 means mild surprise. 5HT=0.112 means representation health is good. ACh=0.445 means contact events are active. Every training step produces a readable narrative — not just a loss number.

Empirically verifiable

The AIM probe independently confirms what the signals claim. High-DA batches correspond to measurable entropy increases in the quantized particle symbol distribution. The interpretability is verifiable, not assumed.

Regulatory-ready

FDA, DoD, and industrial safety regulators require explainability. A cardiac anomaly detector that can explain which signal flagged an event, and why, is deployable where black-box models are not.

# Inference step: readable narrative
frame → StudentEncoder → particles
DA  = 0.001  # mild surprise
5HT = 0.113  # diversity healthy
ACh = 0.444  # contact active
NE  = 0.021  # GPS grounded
eCB = 0.312  # novel context
Ado = 0.089  # not fatigued
E/I = 0.501  # balanced arousal
regime = REOBSERVE
anomaly_score = 0.041  # normal
# Every value has a biological name.
# Every name has a function.
# Every function is auditable.

Deployment

Where it runs.
What it solves.

One model. ~$2,000 edge hardware — or Raspberry Pi for inference. No internet required. Sovereign AI that runs where the data is — not where the servers are.

🚁

Defence & Military

Autonomous navigation in GPS-denied environments. Anomaly detection on surveillance feeds. Edge-deployed perception on drones and ground vehicles with zero cloud dependency. Auditable decisions for chain-of-command accountability.

GPS-denied nav · Edge inference · Auditable AI

🏭

Industrial

Real-time equipment fault detection across six sensor types simultaneously. Predictive maintenance without data leaving the facility. MIMII industrial audio, CWRU bearing, MVTec visual inspection — one model.

Fault detection · Six domains · On-premise

🏥

Medical

Cardiac anomaly detection at 0.8894 AUROC — competitive with PhysioNet challenge winners. Physiological signal monitoring with the same model that navigates outdoor robots. No PHI leaves the device.

Cardiac audio · On-device · HIPAA-ready

🤖

Language-Conditioned Navigation

Text goal → GPS waypoint in 25ms. Per-step navigation at 0.09ms. "Navigate to the open outdoor area near the road" selects the correct GPS target from 10,906 candidates with no LLM and no retraining. Language A* is faster than GPS A*.

No LLM · 0.09ms / step · 10,906 nodes

🚗

Automotive

Onboard world modelling for ADAS on consumer-grade processors. 0.34ms per frame leaves 97% of the compute budget for the rest of the perception stack. Real outdoor navigation data, real results.

ADAS · Real-time · Low power

🍓

Embedded & Edge

The neuromodulator adds eight scalar operations per batch — negligible on any hardware. Inference-only deployment targets Raspberry Pi 5, Coral Edge TPU, or any ARM device. Training requires ~$2,000 edge hardware; inference runs anywhere.

Raspberry Pi · Edge TPU · ARM compatible

🔒

Security

Physical anomaly detection at the edge. No video leaving the premises. Satellite telemetry monitoring at 0.8427 AUROC. The same neuromodulator that learns navigation dynamics learns security anomalies.

On-premise · Telemetry · Multi-sensor

🛰

Space & Remote

SMAP satellite telemetry anomaly detection at 0.8427 AUROC. Low-power edge deployment for environments where cloud connectivity is unavailable or too slow. Designed for real sensor constraints.

Satellite · Low bandwidth · Autonomous

No camera · No GPS · No light · AUROC 0.9999

It navigates
in the dark.

NeMo-WM's proprioceptive encoder achieves AUROC 0.9999 using only velocity, angular rate, heading, and contact — the same signals a mammal uses when its visual cortex is lesioned. No camera. No GPS. No radio. No light required.

Sensor requirements for blind operation

VEL

Wheel encoder

Linear velocity · ~1mW · <$5

IMU

IMU gyroscope + magnetometer

Angular rate + heading · ~2mW · <$5

CTL

Wheel current draw

Contact detection · 0mW · $0

Total additional cost ~$10 · 3mW

Camera required NO

GPS required NO

Light required NO

Where blind operation enables deployment

◆

Complete darkness

Caves, night ops, power outages

◆

Camera occlusion

Dust, mud, smoke, ice, water spray

◆

Underwater & underground

Subsea pipelines, mine shafts, tunnels

◆

Sensor failure fallback

Camera fails → continue on proprioception alone

◆

GPS-denied environments

Indoor, urban canyon, jamming

◆

Extreme lighting change

Day/night, welding flash, IR-only

The biological parallel — why this works

Mammalian path integration

Rodents with visual cortex lesions still navigate familiar mazes. Head direction cells (heading), velocity afferents (wheel encoder), and proprioception (contact) maintain a spatial map entirely without visual input. McNaughton et al. 2006; Moser et al. 2008.

Head direction cells → sin θ, cos θ, Δθ

Velocity afferents → wheel encoder (vel)

Vestibular system → IMU (ang rate)

Proprioception → contact signal

Entorhinal grid cells → attn pooling k=32
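The mapping above rests on path integration: accumulating position from velocity and heading alone. The sketch below is classical dead reckoning, shown only to illustrate the principle. It is not the learned 26K-parameter encoder, and `integrate_path` is a hypothetical name. The 0.25 s step matches the 4Hz sample rate quoted elsewhere on this page.

```python
from math import cos, sin, pi

def integrate_path(samples, dt=0.25):
    """Dead-reckon (x, y) in metres from (velocity m/s, heading rad) samples taken every dt seconds."""
    x = y = 0.0
    for velocity, heading in samples:
        x += velocity * cos(heading) * dt
        y += velocity * sin(heading) * dt
    return x, y

# One second at 1 m/s heading along +x, then one second heading along +y.
east = [(1.0, 0.0)] * 4
north = [(1.0, pi / 2)] * 4
x_end, y_end = integrate_path(east + north)
```

Because every step rotates the velocity by the heading, a heading error corrupts all later position estimates, while a velocity error only scales them. That is one intuition for why heading lesions hurt far more than velocity lesions.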

Heading dominance — timescale-invariant

Heading signal dominates velocity at every timescale tested. HD:vel ratio ranges from ∞:1 (fine scale, k_pos=1) to 9:1 (k_pos=4). Removing heading lowers AUROC by up to 0.228. Removing velocity alone lowers AUROC by at most 0.010.

HD lesion vs velocity lesion (k_ctx=4)

Lesion	AUROC change
HD lesion	−0.228
Vel lesion	−0.009

Configuration	AUROC
Proprio only · no vision	0.9997
Full (VLM + proprio)	0.9997
VLM contribution	+0.000

Adding the entire visual pathway, a 46K-parameter encoder distilled from DINOv2 (which was pre-trained on web-scale images), contributes no additional AUROC (+0.000) once the physics pathway is saturated. The two systems are orthogonal. Neither degrades the other.

Power & sustainability

Trains on a laptop.
Infers on a light bulb.

Most large world models in the literature were trained on GPU clusters drawing thousands of watts. NeMo-WM was trained at 45W, the power draw of a laptop, and infers at 8W on the AMD NPU. The entire training history consumed under 50 kWh, about what an 8× H100 cluster (~10,000W) draws in five hours.

45W

Training power draw

8W

NPU inference draw

<50kWh

Total training energy

$0.58/mo

Always-on inference cost

System	Power draw	Cost per month
8× H100 cluster	~10,000W	~$864
Single A100 workstation	~400W	~$35
NeMo-WM training	~45W	~$4
NeMo-WM inference (NPU)	~8W	~$0.58
Raspberry Pi 4 (inference)	~5W	~$0.36
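The always-on cost figures can be sanity-checked with a few lines of arithmetic. This sketch assumes a 30-day month and an electricity rate of $0.10/kWh. Neither is stated on this page, and both are labeled as assumptions in the code.

```python
HOURS_PER_MONTH = 24 * 30   # assumption: 30-day month, running continuously
RATE_USD_PER_KWH = 0.10     # assumption: electricity price, not given in the source

def monthly_cost_usd(watts, rate=RATE_USD_PER_KWH):
    """Electricity cost of running a constant load for one month."""
    kwh = watts * HOURS_PER_MONTH / 1000
    return kwh * rate

npu_cost = round(monthly_cost_usd(8), 2)   # NeMo-WM inference on the NPU
pi_cost = round(monthly_cost_usd(5), 2)    # Raspberry Pi inference
```

Under these assumptions the 8W and 5W rows come out at $0.58 and $0.36. The larger rows appear to use a slightly higher rate (about $0.12/kWh reproduces $864 for the 10,000W cluster).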

Why low power changes what's possible.

At 8W, NeMo-WM can run on a battery pack. That means drones, field robots, wearables, and remote sensors — anywhere a GPU is not just impractical but physically impossible. A world model that needs 400W can never go in a drone. One that needs 8W can.

At 45W training power, NeMo-WM can be trained anywhere with a standard wall outlet. No data centre access required. No institutional infrastructure. A researcher with a ~$2,000 machine and an idea can start reproducing these results tonight.

The entire training run — Sprints 1 through 3, hundreds of hours of continuous computation — consumed under 50 kWh. That is several orders of magnitude below comparable GPU-based world model training, and roughly equivalent to driving a car 15 miles.

🔋

Battery deployable

8W NPU inference runs on portable power. Persistent world modelling without mains power or connectivity.

🌱

Minimal carbon footprint

Under 50 kWh for the full training history. The environmental cost of training NeMo-WM is a rounding error compared to GPU clusters.

🌍

Democratic access

45W runs on a UPS battery backup. Researchers in locations with unreliable power infrastructure can train and deploy NeMo-WM.

💰

Always-on economics

$0.58/month for continuous inference. Industrial monitoring, cardiac surveillance, persistent navigation — previously cost-prohibitive use cases become trivial.

The builder

Fifty years in.
Still curious.

Started coding

1970s — TRS-80 with my dad

Education

BS Toy Design + AI Certifications

Industries

Software · Products · SEO · Clothing · TV & Film · AI

Currently building

NeMo-WM — Neuromodulated World Model

Hardware

GMKtec EVO-X2 · AMD Ryzen AI MAX+ 395 · ~$2,000 · Neuromodulator runs on Raspberry Pi

GitHub

github.com/taylorjohn

Constraints are the job, not the obstacle.

I started coding in the 1970s on a TRS-80, sitting next to my dad. Then an Amiga. Then everything that came after. Fifty years of watching compute get faster, cheaper, and more capable — and learning that the interesting problems don't get easier with more hardware. They get different.

Formally I studied Toy Design — which sounds like a detour but isn't. Toy design is systems thinking under tight constraints: how does something work, who uses it, what happens when it breaks, how do you make it do more with less. That framing has followed me through software, product, SEO, clothing, and TV & film production.

Now I'm building NeMo-WM, a neuromodulated world model for edge AI. Eight biologically-inspired reward signals. 1.78M parameters. No GPU. It learns temporal dynamics without the standard JEPA gradient ever firing. The whole project is a toy design problem: maximum capability, minimum resources, runs anywhere.

1970s

TRS-80 First code. Sitting with dad. Learning that machines do exactly what you tell them — for better and worse.

1980s

Amiga Graphics, audio, multitasking before Windows knew what that meant. First real sense of what compute could do.

Degree

BS Toy Design Systems thinking. Physical constraints. User empathy. The hidden CS degree no one talks about.

Career

Software · Products · SEO · Clothing · TV & Film Different industries. Same loop: understand the system, find the leverage, ship something.

AI certs

Formal AI training Structured grounding in ML, neural networks, and the modern stack on top of five decades of intuition.

2026

NeMo-WM Neuromodulated world model. 1.78M parameters. ~$2,000 hardware. No GPU. Real results.

Latest findings

What 30 epochs revealed

AUROC · K-SWEEP

The k=2 inversion

At epoch 12, k=1 peaks at 0.9837. By epoch 28, k=2 (0.9578) overtakes k=1 (0.9459): the system learned that 0.5-second horizons are more discriminable than 0.25-second ones. Mechanistically linked to temporal gap k becoming encoded in the particles at epoch 28.

k=1: 0.9459 · k=2: 0.9578 (peak)

AIM PROBE · N=1752

Temporal structure emerges

Temporal gap k is null at epoch 12 (p=0.345), encoded at epoch 28 (p=0.031). The expanded probe reveals eight simultaneous physical signals — including ground-truth odometry (p=1.3×10⁻¹⁸) distinct from commanded velocity. The system encodes both what was commanded and what actually happened.

ep12: p=0.345 null

ep28: p=0.031 ENCODED *

GT odometry: p=1.3e-18 NEW ***

DOPAMINE · NON-SATURATION

Peak DA at step 1,081,000

After 28 complete passes through training data, dopamine reached its run peak of DA=0.003. The final six steps sustained DA=0.002. Training ended near peak arousal and never saturated. A fixed schedule would have decayed to near-zero by this point.

Step 1,121,000: DA=0.002 — final step

CORTISOL · EIGHTH SIGNAL

Distribution shift detection

Slow-timescale signal tracking rolling loss excess above baseline. Empirically validated: r=0.768 lag-1 prediction of future loss (p<0.0001). Detected the Sprint 3 distribution shift one epoch ahead. Implemented in neuromodulator v16.12, all tests passing.

Pearson r lag-1: 0.768 (p<0.0001)

SPRINT 6 · CLIP DISTILLATION

Language grounding without LLMs

Dual-head architecture: frozen backbone, SemanticHead (98K) + CLIPBridge (65K). Sprint 6c InfoNCE: 9/9 navigation queries STRONG aligned (2.2–3.3×). Sprint 6d adds null repulsion to fix out-of-distribution rejection. 8,700× compression vs direct CLIP. No LLM required.

Sprint 6c: 9/9 STRONG (2.2–3.3×)

Sprint 6d: null repulsion — active

MOE ROUTER · SPECIALISATION

Expert routing sharpens

Training tracker shows 25/25/25/25 (aux loss enforces uniformity). Inference probe reveals genuine specialisation: by epoch 20, Expert 1 handles 77.8% of RECON decisions, Expert 3 handles 22.2%, Experts 0 and 2 completely excluded.

ep15: E2: 49.7%, E3: 41.6%

ep20: E1: 77.8%, E3: 22.2%
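A minimal sketch of the inference-time probe, assuming it simply counts which expert the router selects for each decision. The name `routing_shares` and the toy assignments are hypothetical. The toy list is chosen only to reproduce the quoted 77.8% / 22.2% split, and the real probe's sample count is not stated here.

```python
from collections import Counter

def routing_shares(expert_ids, num_experts=4):
    """Fraction of decisions routed to each expert, indexed 0..num_experts-1."""
    counts = Counter(expert_ids)
    total = len(expert_ids)
    return [counts.get(e, 0) / total for e in range(num_experts)]

# Toy assignments (made up): Expert 1 chosen 7 times, Expert 3 twice.
shares = routing_shares([1] * 7 + [3] * 2)
percentages = [round(100 * s, 1) for s in shares]
```

Training-time balance and inference-time specialisation can differ because the auxiliary loss constrains the average routing over training batches, not the argmax choice on any particular domain.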

Autonomous Coder

NeMo-WM Writes Code

Beyond robot navigation, NeMo-WM is an autonomous coding agent. It reads a problem, identifies the algorithm from 21 DSA patterns, selects the right template variant from 63 options, and generates working code in 8 languages — with zero LLM calls.

94%

Pattern accuracy on 35 hard LC problems

91%

Template selection (specific variant)

0

LLM / API calls required

The Process

1. Read — parse problem text, detect input/output types

2. Expand — synonym engine: "add up" → "two sum", "pair sum"

3. Detect — 20+ structural signals: needs_dp, needs_heap, is_sorted...

4. Match — score 21 patterns with boost/penalty system

5. Select — pick exact template: not just "DP" but "coin_change"

6. Generate — output working code in the target language
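The Expand and Match steps above can be sketched as keyword scoring over a synonym-expanded problem statement. Everything here is a toy: the two patterns, their keyword weights, and the function names are hypothetical stand-ins for the 21 patterns and the boost/penalty system, whose details are not shown on this page. Only the "add up" synonym example comes from the list above.

```python
# Synonym table: the "add up" entry is the example given in step 2.
SYNONYMS = {"add up": ["two sum", "pair sum"]}

# Hypothetical keyword weights for two of the patterns.
PATTERNS = {
    "hash_table": {"two sum": 3, "pair sum": 2, "lookup": 1},
    "dynamic_programming": {"minimum coins": 3, "ways to": 2},
}

def expand(text):
    """Append synonym phrases for every known phrase found in the text."""
    text = text.lower()
    extra = [syn for phrase, syns in SYNONYMS.items() if phrase in text for syn in syns]
    return " ".join([text] + extra)

def match(text):
    """Score every pattern by summed keyword weights; return (best pattern, all scores)."""
    expanded = expand(text)
    scores = {
        name: sum(weight for keyword, weight in keywords.items() if keyword in expanded)
        for name, keywords in PATTERNS.items()
    }
    return max(scores, key=scores.get), scores

best, scores = match("Find two numbers that add up to the target")
```

Here the phrase "add up" expands to "two sum" and "pair sum", which lifts the hash-table pattern above the alternatives without any model call.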


nemo_solve.py ● Two Sum → Hash Table

# NeMo-WM generated — 0 LLM calls
def two_sum(nums, target):
    seen = {}
    for i, num in enumerate(nums):
        comp = target - num
        if comp in seen:
            return [seen[comp], i]
        seen[num] = i
    return []
# O(n) time, O(n) space

🐍 Python 12

🟨 JS 9

☕ Java 8

⚡ C++ 6

🦀 Rust 4

🐹 Go 2

🔷 TS 1

💜 C# 1

The World Model That Teaches Itself


Watch the brain signals
in real time.
