What is LeWorldModel (LeWM)?

LeWorldModel, abbreviated LeWM, is a self-supervised world model in the Joint Embedding Predictive Architecture (JEPA) family. Its central innovation is replacing the seven separate regularizer terms used in earlier JEPA variants with one Gaussian-matching loss called SIGReg, controlled by a single hyperparameter. The architecture and training code are released publicly.

Where is the LeWorldModel GitHub code?

LeWorldModel's reference implementation is hosted as a public repository under the LeWM project. The released code covers the encoder-predictor architecture, the SIGReg loss, the training loop, and the evaluation harness. Search the LeWM organization on GitHub or follow the link from the LeWorldModel lab on jakecuth.com.

What is SIGReg?

SIGReg is the Gaussian-matching regularizer at the core of LeWorldModel. It pushes the empirical distribution of learned embeddings toward a unit Gaussian by penalizing deviations of the embedding moments (mean, covariance) from the corresponding Gaussian moments. As a result, collapse prevention and the variance/covariance/decorrelation regularizers from earlier JEPA training (VICReg-style) are folded into one objective with one tuning knob.

How does LeWorldModel relate to JEPA?

JEPA (Joint Embedding Predictive Architecture) is Yann LeCun's proposed framework for self-supervised representation learning in which one network predicts the embedding of a target from the embedding of a context. LeWorldModel is a specific implementation of JEPA aimed at world models (predicting future states from past states). The lineage runs from JEPA to I-JEPA to V-JEPA to LeWM, with LeWM's contribution being the SIGReg simplification rather than a new architectural family.

Why replace seven loss terms with one?

Two reasons. First, hyperparameter search across seven coupled regularizer weights is expensive and rarely converges to a stable recipe. Second, the seven terms all try to enforce one underlying property: that embeddings span the space without collapsing. SIGReg targets that property directly via Gaussian moment matching, which expresses the variance, covariance, and decorrelation constraints as a single objective.
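A back-of-the-envelope illustration of the first point, assuming a naive grid search with the same number of candidate values for every weight. The figure of 5 candidates per weight is an arbitrary choice for illustration, not a number from the LeWM release:

```python
# Naive grid search: every combination of candidate weights is one training run.
# Assumption: 5 candidate values per regularizer weight (arbitrary, for illustration).
candidates_per_weight = 5

runs_seven_weights = candidates_per_weight ** 7  # seven coupled weights
runs_one_weight = candidates_per_weight ** 1     # a single regularizer weight

print(runs_seven_weights)  # 78125
print(runs_one_weight)     # 5
```

Real searches are rarely full grids, but the exponent is what makes coupled weights expensive.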

LeWorldModel (LeWM) Explained · Yann LeCun's JEPA with SIGReg

LeWorldModel, LeWM for short, is a deliberately minimal JEPA-family world model. The headline simplification: it replaces the seven regularizer terms that prior JEPA recipes carried with one Gaussian-matching loss called SIGReg, controlled by one hyperparameter. This note is the text version of what the LeWorldModel lab covers visually.

The JEPA family in one paragraph

Joint Embedding Predictive Architecture (JEPA) is Yann LeCun's framework for self-supervised representation learning. The setup is simple: two networks, an encoder that maps inputs into embeddings, and a predictor that takes an embedding of a context and produces an embedding of a target. The loss is computed in embedding space, not pixel space. Predicting in embedding space is the reason JEPA can ignore the rendering details that distract pixel-space models from learning useful structure.
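A minimal sketch of that setup, assuming linear maps as stand-ins for the encoder and predictor and NumPy in place of a deep-learning framework. The sizes, weight names, and the choice to reuse one encoder for both context and target are hypothetical simplifications:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 32-dim inputs, 8-dim embeddings, batch of 16.
IN_DIM, EMB_DIM, BATCH = 32, 8, 16

# Stand-ins for the two networks: a linear encoder and a linear predictor.
W_enc = rng.normal(scale=0.1, size=(IN_DIM, EMB_DIM))
W_pred = rng.normal(scale=0.1, size=(EMB_DIM, EMB_DIM))

def encoder(x):
    """Map raw inputs of shape (B, IN_DIM) to embeddings of shape (B, EMB_DIM)."""
    return x @ W_enc

def predictor(z_ctx):
    """Map context embeddings to predicted target embeddings."""
    return z_ctx @ W_pred

context = rng.normal(size=(BATCH, IN_DIM))
target = rng.normal(size=(BATCH, IN_DIM))

z_ctx = encoder(context)
z_tgt = encoder(target)
pred = predictor(z_ctx)

# The loss compares embeddings with embeddings; raw inputs never enter it.
loss_embedding_space = np.mean((pred - z_tgt) ** 2)
print(pred.shape, z_tgt.shape)  # (16, 8) (16, 8)
```

A pixel-space model would instead decode `pred` back to the input dimension and compare it with `target` directly.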

The lineage: JEPA → I-JEPA → V-JEPA → LeWM. Each successor either widened the application (I-JEPA for images, V-JEPA for video) or simplified the recipe. LeWM is in the simplification branch.

The collapse problem

Any encoder that produces an embedding can cheat by producing the same embedding for everything. The prediction loss is then zero, and the representation is useless. This is called representation collapse, and it is the central failure mode that joint-embedding self-supervised methods have to guard against.
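A small numerical illustration of the cheat, assuming NumPy and a simple moment-matching penalty of the kind this note discusses. The helper name `moment_penalty` and the sizes are hypothetical:

```python
import numpy as np

def moment_penalty(z):
    """Squared deviation of the batch mean from 0 and the batch covariance from I."""
    b, d = z.shape
    mu = z.mean(axis=0)
    zc = z - mu
    cov = zc.T @ zc / b
    return float((mu ** 2).sum() + ((cov - np.eye(d)) ** 2).sum())

rng = np.random.default_rng(0)
B, D = 4096, 4

# Collapsed encoder: every input maps to the same embedding (all zeros here),
# so context and target embeddings coincide and the prediction error vanishes.
z_ctx_collapsed = np.zeros((B, D))
z_tgt_collapsed = np.zeros((B, D))
pred_loss_collapsed = float(np.mean((z_ctx_collapsed - z_tgt_collapsed) ** 2))

# Healthy embeddings for comparison: samples from a unit Gaussian.
z_healthy = rng.normal(size=(B, D))

print(pred_loss_collapsed)              # 0.0
print(moment_penalty(z_ctx_collapsed))  # 4.0
print(moment_penalty(z_healthy))        # small compared with 4.0
```

The prediction loss alone cannot tell the collapsed encoder from a good one; a penalty on the embedding statistics can.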

The solutions historically:

Contrastive learning: push negatives away from positives. Works, but needs large batch sizes and careful negative mining.
Stop-gradient + EMA: BYOL-style asymmetry between the two encoders. Works, but is finicky.
Explicit regularizers: VICReg-style losses that enforce variance, covariance, and decorrelation properties on the embeddings. Works, but introduces multiple hyperparameters that interact.

Earlier JEPA recipes leaned on the third option, accumulating seven coupled regularizer terms over successive papers. SIGReg replaces all seven with one.
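To make the "multiple hyperparameters that interact" point concrete, here is a simplified sketch of two VICReg-style terms combined with separate weights. It is an illustration in NumPy, not the seven-term recipe of any specific JEPA paper, and the weight values are placeholders:

```python
import numpy as np

def variance_term(z, gamma=1.0, eps=1e-4):
    """Hinge on the per-dimension standard deviation: penalize std below gamma."""
    std = np.sqrt(z.var(axis=0) + eps)
    return float(np.mean(np.maximum(0.0, gamma - std)))

def covariance_term(z):
    """Sum of squared off-diagonal covariance entries, divided by the dimension."""
    b, d = z.shape
    zc = z - z.mean(axis=0)
    cov = zc.T @ zc / (b - 1)
    off_diag = cov - np.diag(np.diag(cov))
    return float((off_diag ** 2).sum() / d)

def explicit_regularizer(z, w_var, w_cov):
    """Weighted sum: each weight is a separate hyperparameter to tune."""
    return w_var * variance_term(z) + w_cov * covariance_term(z)

rng = np.random.default_rng(0)
z_spread = rng.normal(size=(512, 8))   # well-spread embeddings
z_collapsed = np.ones((512, 8))        # every row identical

reg_spread = explicit_regularizer(z_spread, w_var=25.0, w_cov=1.0)
reg_collapsed = explicit_regularizer(z_collapsed, w_var=25.0, w_cov=1.0)
print(reg_spread < reg_collapsed)  # True
```

Even in this two-term version, `w_var`, `w_cov`, `gamma`, and `eps` all shape the optimum, and each additional term brings its own weight.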

SIGReg, the Gaussian-matching loss

The insight: the seven regularizer terms were all approximations to one underlying property. They wanted the embedding distribution to look like a unit Gaussian. SIGReg targets that directly.

Sketch of the loss:

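import torch

def sigreg(z):
    # z: batch of embeddings, shape (B, D)
    B, D = z.shape
    mu  = z.mean(dim=0)                # should match Gaussian mean (0)
    zc  = z - mu                       # centered embeddings
    cov = zc.T @ zc / B                # should match identity covariance
    I   = torch.eye(D, device=z.device, dtype=z.dtype)
    return (mu ** 2).sum() + ((cov - I) ** 2).sum()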

First and second moment matching, scaled by one coefficient (lambda_sigreg in the training loop). That's the whole regularizer. The mean term prevents drift; the covariance term prevents both collapse (cov → 0) and feature redundancy (off-diagonal entries ≠ 0). Variance, covariance, and decorrelation all fall out of one term because they were always one property in disguise. Note that this sketch pins down only the mean and covariance of a unit Gaussian, not its higher-order shape.
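Written out, the sketch above corresponds to the following penalty, where the hatted quantities are the batch mean and the (biased) batch covariance. The symbols are notation introduced here for illustration, not taken from the LeWM release:

```latex
\hat{\mu} = \frac{1}{B}\sum_{i=1}^{B} z_i, \qquad
\hat{\Sigma} = \frac{1}{B}\sum_{i=1}^{B} (z_i - \hat{\mu})(z_i - \hat{\mu})^{\top}, \qquad
\mathcal{L}_{\text{SIGReg}} = \lVert \hat{\mu} \rVert_2^2 + \lVert \hat{\Sigma} - I_D \rVert_F^2
```

The Frobenius term splits into diagonal entries, which pull each variance toward 1, and off-diagonal entries, which pull each pairwise covariance toward 0.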

The training loop in code

for batch in loader:
    ctx, tgt = batch.context, batch.target

    z_ctx = encoder(ctx)
    z_tgt = ema_encoder(tgt).detach()   # EMA target: no gradient flows here
    pred  = predictor(z_ctx)

    loss_pred   = ((pred - z_tgt) ** 2).mean()
    loss_reg    = sigreg(z_ctx) + sigreg(pred)
    loss        = loss_pred + lambda_sigreg * loss_reg

    optimizer.zero_grad()               # clear gradients from the previous step
    loss.backward()
    optimizer.step()
    ema_update(ema_encoder, encoder, m=0.999)

One prediction loss, one regularizer, one weight to tune (lambda_sigreg), one EMA target. The structural reduction from "tune seven knobs to find the corner of phase space that does not collapse" to "tune one knob" is the headline.
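The loop calls `ema_update` without defining it. One plausible reading is a standard exponential moving average over parameters, sketched below in NumPy. The dictionary-of-arrays representation is an assumption made to keep the example framework-free; the released code may implement this differently:

```python
import numpy as np

def ema_update(target_params, online_params, m=0.999):
    """In-place EMA: target <- m * target + (1 - m) * online, per parameter."""
    for name, online_value in online_params.items():
        target_params[name] *= m
        target_params[name] += (1.0 - m) * online_value

# Hypothetical parameter dictionaries standing in for encoder weights.
online = {"w": np.array([1.0, 2.0]), "b": np.array([0.5])}
target = {"w": np.zeros(2), "b": np.zeros(1)}

ema_update(target, online, m=0.9)
print(target["w"])  # [0.1 0.2]
print(target["b"])  # [0.05]
```

With m close to 1, as in the loop's `m=0.999`, the target encoder trails the online encoder slowly, which keeps the prediction targets stable between steps.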

Why this matters for world models

World models predict future states from past states. The signal is dense but the prediction targets are abstract: not pixels, but the latent state of the environment. JEPA's embedding-space prediction is the right substrate for this, but the recipe complexity made world-model training a research-team specialty. SIGReg lowers the barrier to entry enough that smaller groups can train working JEPA world models. That is the practical lever.

What to read and run

The reference implementation is public under the LeWM organization on GitHub. The training code, evaluation harness, and pretrained checkpoints are all there. The interactive walkthrough with diagrams of the encoder/predictor split, the SIGReg loss surface, and the comparison to V-JEPA is at the LeWorldModel lab.

For the long-context language-model side of the same broader question (how do we build models that represent extended state efficiently), see the DeepSeek-V4 architecture note and the Subquadratic explained note.

jepa world-models self-supervised lewm

NOTE 007 2026-05-17 · world models · jepa
