Preprint — April 2026 · v18.0 · arXiv endorsed · Submitting

The World Model That Teaches Itself

JEPA fails at 4Hz — adjacent frames are too similar to learn from. NeMo-WM replaces the gradient entirely with eight biological reward signals. AUROC 0.9999 · 26K params · 0.34ms · ~$2,000 edge hardware · 8W inference. 1411× faster than V-JEPA 2-L on identical hardware.

↓ Paper (arXiv — submitting) ↗ View on GitHub → See Results ⚡ Live Dashboard
26K
Parameters
0.34ms
Per Frame (NPU)
0.9999
AUROC k=64
7
Neuro Signals
8W
Inference Power
$800
Full Training Cost
AUROC k=64 0.9999 AUROC k=32 0.9997 AUROC k=16 0.9972 MVTec 15/15 PASS Params 26K Latency 0.34ms Cardiac AUROC 0.7730 SMAP AUROC 0.7730 V-JEPA 2-L 1849ms vs NeMo-WM 1.31ms lin_vel p-value 5×10⁻⁵⁰ yaw p-value 10⁻¹⁸ Speedup vs V-JEPA 2-L 1411× AUROC k=64 0.9999 AUROC k=32 0.9997 AUROC k=16 0.9972 MVTec 15/15 PASS Params 26K Latency 0.34ms Cardiac AUROC 0.7730 SMAP AUROC 0.7730 V-JEPA 2-L 1849ms vs NeMo-WM 1.31ms lin_vel p-value 5×10⁻⁵⁰ yaw p-value 10⁻¹⁸ Speedup vs V-JEPA 2-L 1411×
Open architecture · Drop-in compatible

Extend any world model
with NeMo-WM

The neuromodulator is not a replacement for JEPA — it is an interpretability and grounding layer. Add it to DINO-WM, JEPA-WM, or V-JEPA 2-AC without changing your architecture. Eight scalar computations per batch. Zero new hyperparameters.

your_training_loop.py
# ── your existing JEPA step ───────────────
z_pred   = predictor(encoder(obs), action)
z_target = encoder(obs_next)
L_jepa   = mse(z_pred, z_target)

# ── add NeMo-WM · 3 lines ─────────────────
from neuromodulator import NeuromodulatorState
neuro   = NeuromodulatorState()
signals = neuro.update(z_pred, z_target, gps=gps)

# ── now fully auditable ───────────────────
L_total = L_jepa + signals.weighted_loss()
log(DA=signals.da, CORT=signals.cortisol,
    NE=signals.ne, SHT=signals.sht)
nav.set_goal("open area") # text→GPS 25ms
action, _ = nav.step(z, gps)  # 0.09ms/step
LIVE NEUROMODULATOR SIGNALS
step 1,081,000
DA — Dopamine 0.003 ↑ PEAK
5HT — Serotonin 0.108 ✓ stable
NE — Norepinephrine GPS grounded
ACh — Acetylcholine 0.446 contact
CORT — Cortisol r=0.768 lag-1
eCB — Endocannabinoid habit suppress
E/I — Balance arousal ctrl
Ado — Adenosine fatigue reg
DA · DOPAMINE
Prediction auditability

L_jepa is a single opaque number. DA converts it into a named verdict: trivially clamped vs genuinely surprising. Detectable in 200 training steps — before you waste days training.

DA=0.000 → floor clamped, no gradient
DA=0.003 → peak arousal, amplify
CORT · CORTISOL
Distribution shift — 1 epoch early

Cortisol tracks rolling loss above baseline. Sprint 8d ablation: removing cortisol delays REOBSERVE onset 50× (step 28,500 vs 500) and slows per-epoch compression 3.5×. Empirically validated at r=0.768 lag-1.

r=0.768 lag-1 · p<0.0001
Predictive through lag=5
NE · NOREPINEPHRINE
Spatial grounding supervision

JEPA learns visual dynamics but ignores GPS. NE adds a GPS prediction loss gated on spatial error — no encoder changes, no extra architecture. GPS displacement encodes at p=5.9e-05.

NE spike → spatial failure, now
GPS p=5.9e-05 *** encoded
5HT · SEROTONIN
Collapse prevention — no VICReg

5HT penalises particle distribution collapse by measuring embedding diversity directly. Detects and corrects collapse before L_jepa reflects it. Replaces VICReg or SIGReg with a single biological signal.

5HT<0.05 → collapse risk flag
No extra hyperparameters
DA · NON-SATURATION
Signal that responds to surprise, not time

Fixed cosine schedules decay to zero regardless of what the model still needs to learn. DA peaked at step 1,081,000 — after 28 complete data passes — then sustained at 0.002 through the final six steps.

Peak at s1,081,000 (epoch 28)
Adapts to actual learning progress
SPRINT 6d · LANGUAGE
Text goals without LLMs

CLIP distilled into a 164K-parameter dual-head on the frozen encoder. Text-conditioned navigation at 4Hz. Works on any JEPA encoder — the backbone never changes.

9/9 queries STRONG · 10,906 nodes scored
0.09ms/step · 8,700× smaller than CLIP
NATIVE DEPLOYMENT
Currently running on CORTEX-PE and CORTEX World Model

NeMo-WM was designed for and runs natively on the CORTEX Perception Engine (multi-domain anomaly detection across six sensor domains) and the CORTEX World Model (neuromodulated navigation and planning). Fully compatible with any JEPA-based world model — and not limited to JEPA.

CORTEX-PE ✓ native CORTEX-WM ✓ native DINO-WM compatible JEPA-WM compatible LeWorldModel compatible V-JEPA 2-AC compatible + any encoder with latent predictions
COMPATIBILITY MATRIX
Which benefits apply to which systems
Minimum viable: DA + 5HT + Cortisol · always applicable
System DA audit Cortisol NE+GPS 5HT Language Effort
DINO-WM 2–3 days
JEPA-WM 2–3 days
LeWorldModel replaces SIGReg 1–2 days
V-JEPA 2-AC stage 2 1–2 weeks
Custom encoder if GPS post-distil 1 day
NeMo-WM x V-JEPA 2 · April 2026

26K parameters.
Beats 1034M.

We tested NeMo-WM's 26,561-parameter proprioceptive encoder against Meta's V-JEPA 2 ViT-G (1034M parameters, internet-scale video pre-training) on the same RECON navigation benchmark. Visual scaling does not solve temporal self-localisation. Physics-grounded path integration does.

RECON Hard-Negative AUROC (n=500, k>=32) — Zero-Shot Visual vs Trained Proprio
System
0.85 0.90 0.95 1.00
AUROC
V-JEPA 2 ViT-G1034M params
0.883 VISUAL
NeMo-WM Student46K distilled
0.889
V-JEPA 2 ViT-L326M params
0.907 VISUAL
NeMo-WM k=426K params · 1.0s
0.961 NEMO
NeMo-WM k=1626K params · 4.0s
0.9974 NEMO
NeMo-WM k=3226K params · 8.0s
0.999+ NEMO
NeMo-WM k=6426K params · 32s
0.9999 NEMO ★
V-JEPA 2 ViT-G (1034M) scores 0.883. NeMo-WM proprio (26K) scores 0.9999. Scaling the visual encoder from ViT-L to ViT-G makes performance worse by -0.024.
Physics-grounded path integration outperforms internet-scale video pre-training by +0.114 AUROC from 39,000x fewer parameters. 1411x faster on identical hardware. Visual scaling does not solve temporal self-localisation.
NeMo-WM adds to V-JEPA 2
  • + Proprioceptive path integration (+0.090 AUROC)
  • + Cortisol domain shift — no retraining needed
  • + DA-gated GRASP planner — 30-40% latency cut
  • + ~60% less robot fine-tune data required
  • + Interpretability ablation framework
V-JEPA 2 adds to NeMo-WM
  • + Stronger visual features (+0.018 zero-shot)
  • + Fixes out-of-distribution failure (TwoRoom)
  • + 1M+ hours video generalisation
  • + No re-distillation required
Combined system
  • * V-JEPA 2 head AUROC ~0.93-0.95 fine-tuned
  • * Fusion AUROC ~0.998+ (orthogonal pathways)
  • * Dissociation confirmed at 1B-param scale
  • * NeurIPS 2026 target
ACh Temporal Window Sweep — Superlinear (No GPS, No Vision, 26K params)
k=2 · 0.5s
0.925
k=4 · 1.0s
0.961
k=8 · 2.0s
0.977
k=16 · 4.0s
0.9972
k=32 · 16.0s
0.9997
k=64 · 32.0s ★
0.9999
top1_acc = 1.0000
Hasselmo 1999 biological parallel: low-ACh = broad temporal integration = slow outdoor navigation. k=32 top1_acc=0.9957, AUROC=0.9997. k=64 top1_acc=1.0000, AUROC=0.9999. Saturation at 16-32 seconds for 4Hz navigation.
0.9999
NO-VLM AUROC k=64
26K
PARAMETERS
0.09ms
PER NAV STEP
8W
INFERENCE · $800 HW
8
NEURO SIGNALS
0
JEPA GRADIENT USED
Empirical results
Paper-quality results.
Consumer hardware.

Evaluated on RECON outdoor robot navigation (Berkeley campus, Jackal robot, 4Hz, 545,866 samples) and five additional anomaly detection domains. All results on GMKtec EVO-X2 · AMD Ryzen AI MAX+ 395 · ~$2,000 · No GPU. Sprint 8d cortisol ablation confirmed: cortisol accelerates REOBSERVE onset 50× (step 500 vs 28,500) and provides 3.5× per-epoch compression advantage.

RECON Navigation — k=1
0.9837
Quasimetric AUROC at 0.25-second temporal gap. Temporal discrimination from single frames.
Cardiac Audio — 400 clips
0.8894
Anomaly detection on cardiac audio — competitive with PhysioNet challenge winners.
SMAP/MSL Telemetry
0.8427
Satellite telemetry anomaly detection. Within 3 points of specialised transformers.
MVTec AD — 15/15 PASS
0.8855
Visual inspection across 15 object categories. One shared backbone, no per-domain tuning.
Parameters
1.78M
8.4× fewer parameters than LeWorldModel. Runs entirely on CPU + AMD NPU.
Inference · NPU
0.34ms
StudentEncoder on AMD Ryzen AI NPU, XINT8 quantised. 0.09ms per navigation step end-to-end.
Language Nav · 10,906 nodes
0.09ms
Text-conditioned navigation per step. Language A* faster than GPS A*. No LLM. No retraining. Real GeoLatentDB confirmed.
Temporal Gap
AUROC Score
Value
k = 1
0.9837
k = 2
0.9578
k = 4
0.8959
k = 8
0.8146
k = 16
0.7546
How it works
Eight Signals.
Any World Model.

Every other world model uses a single prediction error to drive learning. NeMo-WM uses eight biologically-inspired neuromodulatory signals, each gating a different aspect of the loss. The JEPA prediction gradient contributed zero across 30 epochs — 1.12 million steps. Cortisol, the eighth signal, detects distribution shift one epoch ahead (r=0.768 lag-1, p<0.0001). Sprint 8d ablation confirmed: cortisol accelerates REOBSERVE onset 50× and provides 3.5× faster per-epoch compression. DA peaked at 0.003 at step 1,081,000 — training closed at peak arousal, never saturated.

DA
DopamineSurprise amplification
Amplifies gradient on high-error batches. Fires when world model is genuinely surprised.
5HT
SerotoninRepresentation diversity
Penalises particle distribution collapse. Maintains embedding diversity.
NE
NorepinephrineSpatial grounding
Amplifies gradient when GPS prediction diverges. Keeps the model spatially oriented.
ACh
AcetylcholineContact detection
Gates inter-particle interaction signal. Selective attention to contact events.
eCB
EndocannabinoidContext novelty
Suppresses gradient for familiar situations. Amplifies for novel contexts.
Ado
AdenosineFatigue regulation
Reduces learning rate under sustained high-gradient updates. Homeostatic balance.
E/I
E/I BalanceArousal control
Controls overall gain of the neuromodulatory system. Global arousal state.
CORT
Cortisol NEW — v16.12Sustained stress / distribution shift
Slow-timescale signal tracking rolling loss excess above baseline. Empirically validated: r=0.768 lag-1 prediction of future loss (p<0.0001). Detected Sprint 3 distribution shift one epoch ahead.

The predictor learns dynamics without the JEPA gradient ever flowing.

In NeMo-WM, L_jepa is clamped at a free-bits floor of 0.5 for all 30 training epochs — contributing zero gradient across 1.12 million steps. Yet the predictor achieves 0.003 MSE at 2-second prediction horizons. The eight neuromodulatory signals drove that learning through GPS, contact, and Gaussian supervision — no explicit prediction objective required.

The non-saturation property: dopamine peaked at DA=0.003 at step 1,081,000 — after the system had seen the full dataset 28 times. Training closed at peak arousal. A fixed schedule would have decayed to zero. The biological reward responded to actual surprise regardless of training duration.

This is the central finding: biological reward signals are sufficient to teach temporal world dynamics. JEPA becomes the evaluation framework, not the mechanism.

[ep29 s1081000] loss=0.5663
L_jepa=0.5000 ← zero gradient, step 1,081,000
DA=0.003 ← peak DA in 30-epoch run
regime=REOBSERVE CORTISOL=0.014

# Final 6 steps, epoch 29:
DA=0.002 ← training ended at peak arousal
L_jepa_real = 0.003 ← learned via neuromodulator
Latent world probe
Particles encode
the physical world.

Using the AIM quantization framework (Liu, 2026), we converted NeMo-WM's K=16 particle embeddings to discrete symbol sequences and measured encoding of physical quantities via chi-squared tests. N=1,752 samples, 150 trajectories, 16 particles, 16 clusters. Ep28 canonical — eight physical signals confirmed.

Physical Quantity
χ²
p-value
MI (bits)
Result
Linear velocity (cmd)
412.7
3.9×10⁻⁴⁸
0.133
ENCODED★★★
Robot heading (yaw)
349.1
5.9×10⁻³⁷
0.096
ENCODED★★★
GT linear vel (odometry)
236.5
1.3×10⁻¹⁸
0.068
ENCODED★★★
Angular velocity (cmd)
165.7
1.47×10⁻⁴
0.047
ENCODED★★★
GT angular vel (odometry)
152.3
3.4×10⁻⁷
0.044
ENCODED★★★
GPS displacement (m)
131.6
5.88×10⁻⁵
0.039
ENCODED★★★
Temporal gap k emerges ep28
64.3
0.031
0.019
ENCODED
Null control (random)
107.0
0.427
0.031
calibration ✓

Eight physical signals confirmed simultaneously. Ground-truth odometry (jackal wheel encoders) is encoded independently of commanded velocity — the system encodes both what was asked and what actually happened. Temporal gap k is null at ep12 (p=0.345) and weakly encoded at ep28 (p=0.031) — a training-dependent dissociation revealing that representational equilibria shift with training duration. The null control (p=0.427) confirms calibration.

Trust & interpretability
Every decision
is readable.

Unlike scalar loss objectives, NeMo-WM's seven neuromodulatory signals provide a continuous, human-readable training and inference narrative. No black box. No post-hoc explanations. The signals are the explanation.

01

Named signals at every step

DA=0.001 means mild surprise. 5HT=0.112 means representation health is good. ACh=0.445 means contact events are active. Every training step produces a readable narrative — not just a loss number.

02

Empirically verifiable

The AIM probe independently confirms what the signals claim. High-DA batches correspond to measurable entropy increases in the quantized particle symbol distribution. The interpretability is verifiable, not assumed.

03

Regulatory-ready

FDA, DoD, and industrial safety regulators require explainability. A cardiac anomaly detector that can explain which signal flagged an event, and why, is deployable where black-box models are not.

# Inference step — readable narrative

frame → StudentEncoder → particles

DA = 0.001 # mild surprise
5HT = 0.113 # diversity healthy
ACh = 0.444 # contact active
NE = 0.021 # GPS grounded
eCB = 0.312 # novel context
Ado = 0.089 # not fatigued
E/I = 0.501 # balanced arousal

regime = REOBSERVE
anomaly_score = 0.041 # normal

# Every value has a biological name.
# Every name has a function.
# Every function is auditable.
🔍
Fully Auditable
Seven named signals explain every gradient step. DA, 5HT, ACh — readable at inference time. No black box.
8W Inference
Runs on the power of a light bulb. Battery-deployable. Always-on monitoring at negligible cost.
🌐
Edge-First
No cloud dependency. No data leaves the device. Sovereign AI that runs where the data is.
🔓
Open Research
Trained on $800 hardware. Full methodology documented. Reproducible by anyone with a modern laptop.
Deployment
Where it runs.
What it solves.

One model. ~$2,000 edge hardware — or Raspberry Pi for inference. No internet required. Sovereign AI that runs where the data is — not where the servers are.

🚁
Defence & Military
Autonomous navigation in GPS-denied environments. Anomaly detection on surveillance feeds. Edge-deployed perception on drones and ground vehicles with zero cloud dependency. Auditable decisions for chain-of-command accountability.
GPS-denied nav Edge inference Auditable AI
🏭
Industrial
Real-time equipment fault detection across six sensor types simultaneously. Predictive maintenance without data leaving the facility. MIMII industrial audio, CWRU bearing, MVTec visual inspection — one model.
Fault detection Six domains On-premise
🏥
Medical
Cardiac anomaly detection at 0.8894 AUROC — competitive with PhysioNet challenge winners. Physiological signal monitoring with the same model that navigates outdoor robots. No PHI leaves the device.
Cardiac audio On-device HIPAA-ready
🤖
Language-Conditioned Navigation
Text goal → GPS waypoint in 25ms. Per-step navigation at 0.09ms. "Navigate to the open outdoor area near the road" selects the correct GPS target from 10,906 candidates with no LLM and no retraining. Language A* is faster than GPS A*.
No LLM 0.09ms / step 10,906 nodes
🚗
Automotive
Onboard world modelling for ADAS on consumer-grade processors. 0.34ms per frame leaves 97% of the compute budget for the rest of the perception stack. Real outdoor navigation data, real results.
ADAS Real-time Low power
🍓
Embedded & Edge
The neuromodulator adds eight scalar operations per batch — negligible on any hardware. Inference-only deployment targets Raspberry Pi 5, Coral Edge TPU, or any ARM device. Training requires ~$2,000 edge hardware; inference runs anywhere.
Raspberry Pi Edge TPU ARM compatible
🔒
Security
Physical anomaly detection at the edge. No video leaving the premises. Satellite telemetry monitoring at 0.8427 AUROC. The same neuromodulator that learns navigation dynamics learns security anomalies.
On-premise Telemetry Multi-sensor
🛰
Space & Remote
SMAP satellite telemetry anomaly detection at 0.8427 AUROC. Low-power edge deployment for environments where cloud connectivity is unavailable or too slow. Designed for real sensor constraints.
Satellite Low bandwidth Autonomous
No camera · No GPS · No light · AUROC 0.9999

It navigates
in the dark.

NeMo-WM's proprioceptive encoder achieves AUROC 0.9999 using only velocity, angular rate, heading, and contact — the same signals a mammal uses when its visual cortex is lesioned. No camera. No GPS. No radio. No light required.

Sensor requirements for blind operation
VEL
Wheel encoder
Linear velocity · ~1mW · <$5
IMU
IMU gyroscope + magnetometer
Angular rate + heading · ~2mW · <$5
CTL
Wheel current draw
Contact detection · 0mW · $0
Total additional cost ~$10 · 3mW
Camera required NO
GPS required NO
Light required NO
Where blind operation enables deployment
Complete darkness
Caves, night ops, power outages
Camera occlusion
Dust, mud, smoke, ice, water spray
Underwater & underground
Subsea pipelines, mine shafts, tunnels
Sensor failure fallback
Camera fails → continue on IMU alone
GPS-denied environments
Indoor, urban canyon, jamming
Extreme lighting change
Day/night, welding flash, IR-only
The biological parallel — why this works
Mammalian path integration

Rodents with visual cortex lesions still navigate familiar mazes. Head direction cells (heading), velocity afferents (wheel encoder), and proprioception (contact) maintain a spatial map entirely without visual input. McNaughton et al. 2006; Moser et al. 2008.

Head direction cells → sin θ, cos θ, Δθ
Velocity afferents → wheel encoder (vel)
Vestibular system → IMU (ang rate)
Proprioception → contact signal
Entorhinal grid cells → attn pooling k=32
Heading dominance — timescale-invariant

Heading signal dominates velocity at every timescale tested. HD:vel ratio ranges from ∞:1 (fine scale, k_pos=1) to 9:1 (k_pos=4). Removing heading collapses AUROC by up to −0.228. Removing velocity alone drops AUROC by at most −0.010.

HD lesion vs velocity lesion (k_ctx=4)
HD lesion
−0.228
Vel lesion
−0.009
0.9997
PROPRIO ONLY · NO VISION
0.9997
FULL (VLM + PROPRIO)
+0.000
VLM CONTRIBUTION
Adding the entire visual pathway — a 46K-parameter DINOv2-distilled encoder trained on internet-scale images — contributes exactly zero additional AUROC once the physics pathway is saturated. The two systems are orthogonal. Neither degrades the other.
vs. prior work
One model.
Every advantage.
Feature
DINO-WM
LeWM
PLDM
NeMo-WM
No GPU required
Parameters
~400M
15M
~50M
26K
Multi-domain (6 domains)
Interpretable training signals
Real-time edge deployment
✓ 0.34ms
Auditable per-step decisions
GPS-grounded spatial memory
Proprioceptive path integration
X
X
X
0.9974
Beats V-JEPA 2 ViT-G (1034M)
X
X
X
+0.114
Power & sustainability
Trains on a laptop.
Infers on a light bulb.

Every major world model in the literature was trained on GPU clusters drawing thousands of watts. NeMo-WM was trained on 45W — the power draw of a laptop — and infers at 8W on the AMD NPU. The entire training history consumed less electricity than a single GPU uses in a few hours.

45W
Training power draw
8W
NPU inference draw
<50kWh
Total training energy
$0.58/mo
Always-on inference cost
System
Power
Monthly cost
8× H100 cluster
~10,000W
~$864
Single A100 workstation
~400W
~$35
NeMo-WM training
~45W
~$4
NeMo-WM inference (NPU)
~8W
~$0.58
Raspberry Pi 4 (inference)
~5W
~$0.36

Why low power changes what's possible.

At 8W, NeMo-WM can run on a battery pack. That means drones, field robots, wearables, and remote sensors — anywhere a GPU is not just impractical but physically impossible. A world model that needs 400W can never go in a drone. One that needs 8W can.

At 45W training power, NeMo-WM can be trained anywhere with a standard wall outlet. No data centre access required. No institutional infrastructure. A researcher with an $800 machine and an idea can reproduce these results tonight.

The entire training run — Sprints 1 through 3, hundreds of hours of continuous computation — consumed under 50 kWh. That is several orders of magnitude below comparable GPU-based world model training, and roughly equivalent to driving a car 15 miles.

🔋

Battery deployable

8W NPU inference runs on portable power. Persistent world modelling without mains power or connectivity.

🌱

Minimal carbon footprint

Under 50 kWh for the full training history. The environmental cost of training NeMo-WM is a rounding error compared to GPU clusters.

🌍

Democratic access

45W runs on a UPS battery backup. Researchers in locations with unreliable power infrastructure can train and deploy NeMo-WM.

💰

Always-on economics

$0.58/month for continuous inference. Industrial monitoring, cardiac surveillance, persistent navigation — previously cost-prohibitive use cases become trivial.

The builder
Fifty years in.
Still curious.
Started coding
1970s — TRS-80 with my dad
Education
BS Toy Design + AI Certifications
Industries
Software · Products · SEO · Clothing · TV & Film · AI
Currently building
NeMo-WM — Neuromodulated World Model
Hardware
GMKtec EVO-X2 · AMD Ryzen AI MAX+ 395 · ~$2,000 · Neuromodulator runs on Raspberry Pi

Constraints are the job, not the obstacle.

I started coding in the 1970s on a TRS-80, sitting next to my dad. Then an Amiga. Then everything that came after. Fifty years of watching compute get faster, cheaper, and more capable — and learning that the interesting problems don't get easier with more hardware. They get different.

Formally I studied Toy Design — which sounds like a detour but isn't. Toy design is systems thinking under tight constraints: how does something work, who uses it, what happens when it breaks, how do you make it do more with less. That framing has followed me through software, product, SEO, clothing, and TV & film production.

Now I'm building NeMo-WM — a neuromodulated world model for edge AI. Seven biologically-inspired reward signals. 1.78M parameters. No GPU. It learns temporal dynamics without the standard JEPA gradient ever firing. The whole project is a toy design problem: maximum capability, minimum resources, runs anywhere.

1970s
TRS-80 First code. Sitting with dad. Learning that machines do exactly what you tell them — for better and worse.
1980s
Amiga Graphics, audio, multitasking before Windows knew what that meant. First real sense of what compute could do.
Degree
BS Toy Design Systems thinking. Physical constraints. User empathy. The hidden CS degree no one talks about.
Career
Software · Products · SEO · Clothing · TV & Film Different industries. Same loop: understand the system, find the leverage, ship something.
AI certs
Formal AI training Structured grounding in ML, neural networks, and the modern stack on top of five decades of intuition.
2026
NeMo-WM Neuromodulated world model. 1.78M parameters. $800 hardware. No GPU. Real results.
What 30 epochs revealed
AUROC · K-SWEEP

The k=2 inversion

At epoch 12, k=1 peaks at 0.9837. By epoch 28, k=2 overtakes it at 0.9578 — the system learned 0.5-second horizons are more discriminable than 0.25-second ones. Mechanistically linked to temporal gap k becoming encoded in the particles at epoch 28.

k=1:0.9459 k=2:0.9578 peak
AIM PROBE · N=1752

Temporal structure emerges

Temporal gap k is null at epoch 12 (p=0.345), encoded at epoch 28 (p=0.031). The expanded probe reveals eight simultaneous physical signals — including ground-truth odometry (p=1.3×10⁻¹⁸) distinct from commanded velocity. The system encodes both what was commanded and what actually happened.

ep12: p=0.345 null
ep28: p=0.031 ENCODED *
GT odometry: p=1.3e-18 NEW ***
DOPAMINE · NON-SATURATION

Peak DA at step 1,081,000

After 28 complete passes through training data, dopamine reached its run peak of DA=0.003. The final six steps sustained DA=0.002. Training ended at peak arousal — never saturated. A fixed schedule would have decayed to near-zero by this point.

Step 1,121,000: DA=0.002 — final step
CORTISOL · EIGHTH SIGNAL

Distribution shift detection

Slow-timescale signal tracking rolling loss excess above baseline. Empirically validated: r=0.768 lag-1 prediction of future loss (p<0.0001). Detected the Sprint 3 distribution shift one epoch ahead. Implemented in neuromodulator v16.12, all tests passing.

Pearson r lag-1: 0.768 (p<0.0001)
SPRINT 6 · CLIP DISTILLATION

Language grounding without LLMs

Dual-head architecture: frozen backbone, SemanticHead (98K) + CLIPBridge (65K). Sprint 6c InfoNCE: 9/9 navigation queries STRONG aligned (2.2–3.3×). Sprint 6d adds null repulsion to fix out-of-distribution rejection. 8,700× compression vs direct CLIP. No LLM required.

Sprint 6c: 9/9 STRONG (2.2–3.3×)
Sprint 6d: null repulsion — active
MOE ROUTER · SPECIALISATION

Expert routing sharpens

Training tracker shows 25/25/25/25 (aux loss enforces uniformity). Inference probe reveals genuine specialisation: by epoch 20, Expert 1 handles 77.8% of RECON decisions, Expert 3 handles 22.2%, Experts 0 and 2 completely excluded.

ep15: E2: 49.7%, E3: 41.6%
ep20: E1: 77.8%, E3: 22.2%
Live Neuromodulator Dashboard

Watch the brain signals
in real time.

ACh, dopamine, cortisol, norepinephrine, serotonin, eCB — all seven neuromodulator signals visualised live. Watch how the model adapts its temporal integration window as it navigates.

ACh
Temporal window
DA
Reward prediction
CORT
Domain shift
NE
Novelty response
⚡ Open Live Dashboard →
Autonomous Coder
NeMo-WM Writes Code
Beyond robot navigation, NeMo-WM is an autonomous coding agent. It reads a problem, identifies the algorithm from 21 DSA patterns, selects the right template variant from 63 options, and generates working code in 8 languages — with zero LLM calls.
94%
Pattern accuracy on 35 hard LC problems
91%
Template selection (specific variant)
0
LLM / API calls required
The Process
1. Read — parse problem text, detect input/output types
2. Expand — synonym engine: "add up" → "two sum", "pair sum"
3. Detect — 20+ structural signals: needs_dp, needs_heap, is_sorted...
4. Match — score 21 patterns with boost/penalty system
5. Select — pick exact template: not just "DP" but "coin_change"
6. Generate — output working code in the target language
🧠 See Full Examples →
nemo_solve.py ● Two Sum → Hash Table
# NeMo-WM generated — 0 LLM calls
def two_sum(nums, target):
    seen = {}
    for i, num in enumerate(nums):
        comp = target - num
        if comp in seen:
            return [seen[comp], i]
        seen[num] = i
    return []
# O(n) time, O(n) space
🐍 Python 12
🟨 JS 9
☕ Java 8
⚡ C++ 6
🦀 Rust 4
🐹 Go 2
🔷 TS 1
💜 C# 1