Mezzanine is a small research toolkit for distilling symmetry‑marginalized invariants (“distill the expectation”), and for measuring instability (the warrant gap) and distilling symmetry‑marginalized invariant world models into a single forward pass.
World models ≠ pixel prediction.
Warranted inference ≠ maximum likelihood on a single realization.
- Adapters registry: HuggingFace Datasets, LeRobot (HF robotics datasets), Gymnasium, I‑PHYRE
- Symmetries registry: view / order / factorization / action‑shuffle
- Backbone registry: HF vision (I‑JEPA/ViT/DINOv2/…), CLIP vision, DINOv2, HF language encoders
- Recipes registry: runnable end‑to‑end presets (start with I‑PHYRE)
Latents are cached to disk keyed by:
- world fingerprint (dataset config + deterministic subsampling)
- encoder fingerprint (checkpoint + pooling/layer config)
This makes experiments cheap to re-run and easy to reproduce/share.
A generic AutoTuner is included (used by recipes as needed) to search for regimes that:
- are not trivially easy (no signal)
- are not impossible (dead)
- maximize the effect size you care about (e.g., action helps vs no‑action)
- YAML/JSON config loading
- deterministic subsampling
- global seeding helpers
--log wandb(if installed)--log tensorboard(if installed)- default is no-op logging
Minimal:
pip install -e .With YAML configs:
pip install -e ".[yaml]"With i‑PHYRE:
pip install -e ".[iphyre]"With quark/gluon jets (EnergyFlow):
pip install -e ".[qg]"
pip install pillow # only needed for --make_gifWith NeuralGCM / WeatherBench2 (Zarr IO):
pip install -e ".[weather]"Everything:
pip install -e ".[all]"List runnable recipes:
mezzanine listList built-in components:
mezzanine list-adapters
mezzanine list-encoders
mezzanine list-symmetriesRun the I‑PHYRE latent dynamics distillation (physics puzzles):
mezzanine run iphyre_latent_dynamics --out out_iphyre \
--games hole,seesaw \
--n_train 3000 --n_test 768 \
--delta_seconds 4.0 \
--embed_mode mean_std --embed_layer -4Add caching:
mezzanine run iphyre_latent_dynamics --out out_iphyre \
--cache_dir ~/.cache/mezzanine_latents \
--games hole,seesawUse a config file as defaults, override on CLI:
mezzanine run iphyre_latent_dynamics --out out_iphyre \
--config configs/iphyre.yml \
--delta_seconds 2.0Enable W&B logging (optional):
mezzanine run iphyre_latent_dynamics --out out_iphyre \
--log wandb --wandb_project mezzanine \
--games hole,seesawOutputs (in --out):
results.jsondiagnostics.pngmontage.png
These small numerical experiments make the compute story concrete:
- A symmetry-corrected teacher often needs K forward passes per example (orbit-averaging over symmetry views).
- A distilled student matches that orbit-averaged target in one forward pass (lower latency / cost at inference).
Generate toy datasets:
python examples/kepler_generate_dataset.py --out data/kepler_root_toy.npz --n_train 50000 --n_test 10000
python examples/linear_system_generate_dataset.py --out data/linear_system_toy.npz --n_train 50000 --n_test 10000
python examples/ode_generate_dataset.py --out data/ode_lorenz_toy.npz --system lorenz --n_traj 400 --t_max 40 --dt 0.01
python examples/integration_generate_dataset.py --out data/integration_toy.npz --n_train 50000 --n_test 10000 --n_grid 128
python examples/eigen_generate_dataset.py --out data/eigen_toy.npz --n_train 20000 --n_test 5000 --n 64 --density 0.05 --k 5Run a recipe:
mezzanine run kepler_root_distill --out out_kepler --dataset data/kepler_root_toy.npz --k_train 4 --k_test 16Symmetry-aware kernel demo (Triton):
python examples/cg_fused_kernel_demo.py --B 4096 --I 256 --J 256 --O 256 --P 32 --dtype bf16 --outdir cg_demo_outMore details: docs/physics_addons.md.
This recipe measures the warrant gap under particle permutation + internal SO(2) rotations and distills a single‑pass student that approximates the symmetry‑marginalized predictor.
Run:
mezzanine run qg_jets_distill --out out_qg \
--encoder qg_flatten \
--num_data 50000 --n_train 20000 --n_test 5000 \
--max_particles 64 \
--k_train 8 --k_test 16 \
--theta_max 6.283185307179586 \
--hard_label_weight 0.1 \
--make_gif --gif_bins 180 --gif_extent 3.2 --gif_ms 90Notes:
--cache_diris for the latent cache; dataset caching is--ef_cache_dir(defaults to~/.energyflow).- If the GIF looks empty, increase
--gif_extent(typicalphirange is about[-π, π]).
Outputs (in --out):
results.jsonprobs_test_views.npzjet_nuisance.gif(only if--make_gif)
This recipe measures a regression warrant gap under ensemble-member exchangeability (and optionally a lightweight field “codec” symmetry), then distills the ensemble mean into a single-pass head.
mezzanine run neuralgcm_ens_warrant_distill --out out_neuralgcm \
--lead_hours 24 \
--variables temperature,geopotential \
--num_members 50 --k_train 8 --k_test 16 \
--steps 2000 --batch 8192Notes:
- The default
--members_zarr/--mean_zarrpoint at publicgs://weatherbench2/...Zarr stores. - Use
--heroto emithero.gifand per-lead diagnostics.
Outputs (in --out):
results.jsondiagnostics.pnghead_plain.pt,head_sym.pt
This recipe mirrors the same latent-dynamics distillation + action-shuffle counterfactual
as iphyre_latent_dynamics, and adds a lightweight downstream demo:
goal-conditioned action retrieval (select the action that reaches a target latent state).
It also includes an optional V-JEPA 2-AC-style 1-step CEM planning objective in latent space as an offline proxy.
mezzanine run lerobot_latent_dynamics --out out_pusht \
--repo_id lerobot/pusht_image \
--camera_key observation.image \
--action_key action \
--n_train 4000 --n_test 2000 --delta_steps 1 \
--do_planning --plan_candidates 32 --plan_eval 512 \
--do_cem --cem_eval 128mezzanine run lerobot_latent_dynamics --out out_aloha_sim \
--repo_id lerobot/aloha_sim_transfer_cube_scripted_image \
--camera_key observation.images.top \
--action_key action \
--n_train 4000 --n_test 2000 --delta_steps 1 \
--do_planningmezzanine run lerobot_latent_dynamics --out out_libero \
--repo_id lerobot/libero_10_image_subtask \
--camera_key observation.images.image \
--action_key action \
--n_train 4000 --n_test 2000 --delta_steps 1 \
--do_planningMake/Break condition: all latent-dynamics recipes use the same criterion: action must improve retrieval (vs no-action) and action-shuffle must hurt, using mean-rank/R@10 thresholds.
Create mezzanine/worlds/my_world.py and register:
from mezzanine.registry import ADAPTERS
@ADAPTERS.register("my_world")
class MyWorldAdapter(WorldAdapter):
...Create mezzanine/symmetries/my_symmetry.py and register:
from mezzanine.registry import SYMMETRIES
@SYMMETRIES.register("my_symmetry")
class MySymmetry(Symmetry):
...Create mezzanine/encoders/my_encoder.py and register:
from mezzanine.registry import ENCODERS
@ENCODERS.register("my_encoder")
class MyEncoder(Encoder):
...This recipe demonstrates the same "distill the expectation" pattern in language:
mezzanine run hf_text_order_distill --out out_text \
--dataset ag_news --n_train 5000 --n_test 2000 \
--model_name distilbert-base-uncased \
--k_train 8 --k_test 16It measures how much a classifier's predictions change when you shuffle the order of sentences, and then distills the symmetry-marginalized teacher into a single-pass student head.
See also:
-
docs/plugin_api.md(minimal plugin API + templates) -
docs/llm_examples.md(LLM-oriented usage patterns) -
hf_llm_hiddenstate_order_distill— logits→hidden-state distillation for order-robust BoolQ.
This recipe uses a simple tabular head on return-window features, measures the bar-offset warrant gap, then distills the symmetry-marginalized teacher back into a single-pass student:
The repo includes a small public OHLCV sample CSV for offline runs/tests:
examples/data/spy_daily_ohlcv.csv— SPY.US daily bars (Stooq export), columnsDate,Open,High,Low,Close,Volume(2000 rows)
mezzanine run finance_csv_bar_offset_distill --out out_finance \
--path examples/data/spy_daily_ohlcv.csv \
--timestamp_col Date --close_col Close \
--n_train 1400 --n_test 400 \
--lookback 32 --max_offset 1 --trend_lookback 128 \
--k_train 8 --k_test 16Outputs (in --out):
results.json(includes amake_breakverdict)diagnostics.png