diff --git a/.gitignore b/.gitignore
index 3501d70e6..072bd184a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -125,3 +125,12 @@ uv.lock
 # venvstacks build artifact: regenerated from venvstacks.toml + pyproject.toml
 # by packaging/build.py:_generate_venvstacks_toml() on every Swift app build.
 packaging/_venvstacks_resolved.toml
+.claude/
+
+# Video generation artifacts (job outputs live under {base_path}, never in-repo;
+# these guard against measurement/test runs writing into the worktree)
+*.mp4
+*.mp4.metadata.json
+video-jobs/
+video-artifacts/
+p0_video/
diff --git a/docs/upstream-sync.md b/docs/upstream-sync.md
index 810ee4226..33f2763e8 100644
--- a/docs/upstream-sync.md
+++ b/docs/upstream-sync.md
@@ -427,3 +427,29 @@ Qwen-Gemma / oQ)的相关度。
   破坏部分 HTTP 客户端 / Copilot CLI
 
 > 下次 review 上游 open PR 时,把结论(引入 / 跳过)回填到对应小节。
+
+---
+
+## 2026-06-11 divergence marker: video generation engine (fmlx-owned, never upstreamed)
+
+feat/video-engine introduces a text-to-video engine (Wan2.2 T2V A14B via mlx-gen; design in
+docs/video-generation-engine-spec.md). This is a deliberate divergence between fmlx and upstream
+and is not sent upstream as a PR. Patch surface on files shared with upstream (for reference when a cherry-pick conflicts):
+
+- model_discovery.py: ModelType/EngineType Literal + model_index.json
+  detection branch + video arm and skip filter in _register_model
+- engine_pool.py: Literal + mapping + video rejection arm at the get_engine entry +
+  defensive arm in _load_engine
+- server.py: video router mount / pre-pool 400 / chat-capable filter for the default model /
+  ModelInfo.model_type / VideoJobManager construction and shutdown in lifespan
+- process_memory_enforcer.py: video memory lease (acquire/set pid/release +
+  ceiling deduction + dynamic-ceiling add-back)
+- settings.py: VideoSettings section + huggingface.disable_xet
+- admin/routes.py: valid_types/type_to_engine + relaxed list and delete gates +
+  global-settings video fields
+- cli.py: HF_HUB_DISABLE_XET injection
+- exceptions.py: ModelTypeNotLoadableError
+
+New files (no conflict surface): omlx/video/*, omlx/api/video_models.py,
+omlx/api/video_routes.py, tests/test_video_*.py,
+scripts/video_p0_measure.py.
diff --git a/docs/video-generation-engine-spec.md b/docs/video-generation-engine-spec.md
new file mode 100644
index 000000000..2c37cfc27
--- /dev/null
+++ b/docs/video-generation-engine-spec.md
@@ -0,0 +1,627 @@
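+# fmlx video generation engine spec (Wan2.2 T2V, mlx-gen runtime)
+
+Status: design draft v2 (2026-06-10), not yet implemented, awaiting sign-off.
+v2 = v1 revised after a 6-perspective adversarial review: all 22 blocker/major findings were confirmed and absorbed
+(rejection-arm placement, OpenAI SDK multipart compatibility, dual-process Metal wired governance, lease double
+counting, venv pollution, arithmetic errors in the A/B protocol, and others). The review record is in the session ticket.
+Positioning: a Flyto MLX (fmlx) feature of its own that is not upstreamed (a soft-fork divergence, see §10).
+All code facts in this document were verified line by line by subagents (checkable as file:line); mlx-gen facts were
+verified against its source and pyproject (2026-06-10, v0.18.14).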
+
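+## 0. Background and positioning
+
+fmlx is currently an LLM/VLM/audio inference engine. The strategic direction is shifting: single-machine unified memory
+on Apple Silicon (128GB class) is a structural advantage for local multimedia generation (text-to-video, text-to-image),
+because large weights and large activations all live in UMA and need no multi-GPU sharding. fmlx intends to make "single-machine multimedia" its core difference from upstream oMLX.
+
+First landing point: Wan2.2 T2V A14B (MLX 8-bit quantized, diffusers layout, 42.4GB) has been fully downloaded
+and verified file by file; it lives on m5max at `~/.fmlx/models/AbstractFramework/wan2.2-t2v-a14b-diffusers-8bit`.
+These weights were built for the MLX-native runtime mlx-gen (safetensors dtype = U32+scales+biases,
+i.e. the mx.quantize format), and the example command in the mlx-gen docs cites this repo verbatim.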
+
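+## 1. Goals and non-goals
+
+Goals (MVP, P1):
+
+1. fmlx discovers video models in the diffusers layout (model_index.json), types them as `video`,
+   shows them correctly in /v1/models and the admin list, lets them be deleted, keeps them out of the chat model list,
+   and never lets them become the implicit default model.
+2. A new OpenAI-shaped asynchronous job API: POST /v1/videos to submit, GET to poll, list to enumerate,
+   content to download. The official openai SDK's client.videos.* works against it directly (including its
+   multipart submission form).
+3. Generation runs in a subprocess worker with its own venv, isolated from the LLM server process; the worker
+   pins itself inside the lease with a Metal wired limit (preventive, not reactive).
+4. A video job holds a memory lease that propagates through the existing ProcessMemoryEnforcer single
+   chokepoint. It truly co-resides with small and mid-size LLMs (weights + working set fit alongside the lease under the ceiling);
+   with very large LLMs (e.g. glm4.5 at 85GB) it is mutually exclusive by design: the job queues for memory, does not force its way in,
+   and does not repeat the m5max kernel panic.
+5. The full path may merge only after A/B validation on the real m5max machine (project rule: passing unit tests != passing on hardware).
+
+Non-goals (not in the MVP; some move to P2):
+
+- SSE progress streaming (polling suffices), image-to-video (I2V) input upload, text-to-image (FLUX family), TI2V-5B,
+  a dedicated admin video page, concurrent generation (Semaphore(1), one at a time), a distributed queue,
+  downloading video models from ModelScope (flat-symlink trap, see §4.1), training/LoRA,
+  and actively evicting loaded LLMs for a video job (the MVP only queues passively; eviction policy is P2).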
+
+## 2. 关键事实 (设计依据)
+
+以下两小节分别是外部运行时与本仓代码的核实结论, 全部影响 §3 的架构取舍.
+
+### 2.1 mlx-gen 运行时
+
+| 维度 | 事实 |
+|---|---|
+| 真身 | filipstrand/mflux 的 fork; Python 包名是 `mflux`, `import mlxgen` 只是 sys.modules 别名 |
+| 视频类 | `mflux.models.wan.variants.Wan2_2_TI2V` 一个类管全部 Wan 变体; `ModelConfig.wan2_2_t2v_a14b()` + `model_path=<本地目录>` 即可加载我们已下载的目录 (路径解析规则 1 "exists_locally" 短路一切) |
+| 生成 API | `generate_video(seed, prompt, steps, height, width, num_frames, fps, ..., progress_callback)` 阻塞同步, batch=1; ProgressEvent 带 phase/step/total_steps |
+| 取消 | 无一等取消 API; callback 里抛异常可中断但实例报废 -- 健壮取消 = 杀 subprocess |
+| 依赖 | 不是纯 MLX: torch 是硬依赖 (UMT5 text encoder 走 torch/CPU), 另有 transformers>=5, huggingface-hub>=1.1.6,<2, opencv, matplotlib, av (PyAV 自带 ffmpeg wheel); twine 混在 runtime deps 里 (供应链卫生信号, 计入 §9 风险评级) |
+| 输出 | GeneratedVideo (PIL frames + 元数据), .save() 写 MP4 + 健康校验 + metadata sidecar |
+| license / 版本 | MIT; v0.18.14 (2026-06-08); 两周内 15 个 release, bus factor 1 (lpalbou) |
+| 实测内存 (官方, M5 Max 128GB) | T2V A14B q8: 物理峰值 20.7 GiB, MLX 峰值 15.5 GiB, 154.8s @ 384x224, 33 帧, 12 步; 生产分辨率 (480x240, 101 帧, 25 步) 约 30 分钟. 注意: 这是唯一公开测点, 是小 profile |
+| 多线程 | 文档明言 model 实例有状态, 必须串行访问 |
+
+依赖结论: mlx-gen 的 transformers>=5 / hf-hub<2 / torch 与 fmlx 主 venv 共装冲突
+风险高且无必要 -- 这是 subprocess + 独立 venv 方案的第一推力.
+
+### 2.2 fmlx 代码侧事实 (全部 file:line 已核实)
+
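+- Discovery only recognizes a root config.json (`_is_model_dir`, model_discovery.py:697-699).
+  Under the two-level owner/repo layout the Wan2.2 directory is invisible as a whole; but under the FLAT layout (exactly
+  the symlink form the ModelScope downloader produces, ms_downloader.py:665) it is treated as an org
+  folder and descended into. transformer/ transformer_2/ vae/ text_encoder/ each carry a config.json,
+  so they register as 4 phantom "llm" models and may even become the default model (server.py:1279-1292).
+  This is an existing hazard; the discovery rework must come first.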
+- pool.get_engine 在调用 _load_engine 之前就跑内存准入循环 (engine_pool.py:
+  359-396): projected = current + entry.estimated_size, 不够就 LRU 驱逐已加载
+  模型. 任何 "在 _load_engine 里拒绝" 的方案都晚了 -- 拒绝必须在准入循环
+  之前 (§3, §4.1).
+- 隐式默认模型 = available_models[0] (server.py:1279-1292), 不分类型; model
+  fallback (server.py:698-711) 会重试默认模型. video 条目必须从两处排除.
+- 全局 MLX executor 是 max_workers=1 的单线程池 (engine_core.py:106-120),
+  全部非 batched 引擎的 GPU 操作串行其上 (Metal command-buffer race, #85).
+  in-process 跑分钟级 diffusion 会头部阻塞 audio/embedding/unload. 第二推力.
+- mlx-gen 无取消 API + audio 模板全链路无超时 (audio_routes.py 零 asyncio.wait_for).
+  第三推力: subprocess kill 是唯一可靠的取消/超时手段.
+- 内存防护活体只有 phys-based 链路: ProcessMemoryEnforcer (1s tick) ->
+  `_get_hard_limit_bytes` (process_memory_enforcer.py:495-517) 是单一咽喉点,
+  pool 准入/软硬水位/prefill gate cap 全部从它派生; estimate/monitor 那套在
+  生产 inert (memory_monitor 永远 None), 不得依赖.
+- 动态 ceiling 在 safe/balanced/aggressive 档 (生产默认 balanced) 是
+  系统级感知的 (own_phys + free + inactive + active*ratio, process_memory_
+  enforcer.py:483-493): worker 真实占用会经由 free 下降自动压低父进程
+  ceiling -- 与显式租约叠加就是双重计数, 必须修正 (§4.4).
+- `get_phys_footprint(pid)` 接受任意 pid (utils/proc_memory.py:94-118), 父进程
+  可测 worker 足迹; 失败返回 0, 必须按错误处理. 同文件 :63 已声明
+  `ri_lifetime_max_phys_footprint` (进程生命期峰值 ledger) 但全仓未读 --
+  P0 测量的正确仪器 (§7).
+- Metal wired limit 是 per-process 可设的 (mx.set_wired_limit; enforcer 对
+  自己进程已这么做, process_memory_enforcer.py:407-446). worker 子进程默认
+  继承机器级 cap (~107.5GB), 不主动设限就没有任何 Metal 级约束 -- 这是
+  双进程 wired-sum panic 的根源, 也是 §4.4 预防性方案的依据.
+- 后台 job 的成熟范式在 admin 侧: HFDownloader (task dict + status enum +
+  asyncio.create_task + 协作取消) 与 OQManager (Semaphore(1) + is_quantizing
+  被推理端点 503 联动). /v1/responses 的 `background` 字段是死的 (零消费者),
+  ResponseStore 只存终态无生命周期, 都不能直接复用.
+- 非 LLM 引擎接入范式: audio 三件套 (BaseNonStreamingEngine + api/audio_routes.py
+  直连 pool). 注意 audio 路由的条件挂载 (server.py:439-448) 发生在模块 import
+  时, 彼时 settings 尚未初始化 (init_server 才注入) -- 它能工作只因为它的门
+  是 "mlx_audio 可 import". settings 驱动的门不能放在那里 (§4.3).
+
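+### 2.3 Co-residency memory risk (the m5max lesson)
+
+On a 128GB machine the Metal wired cap is about 107.5GB; crossing it means a whole-machine kernel panic (this has happened).
+The danger is not steady state but transient spikes; neither 1s polling nor chunk-boundary readings can see sub-poll transients.
+M5 Max memory bandwidth is on the order of 0.5TB/s, so a single mx.eval can materialize tens of GB in a window far
+shorter than 2s. No reactive "poll + kill the process" mechanism is therefore a panic backstop; it can only serve as
+secondary cleanup. Only two kinds of preventive measure exist: the Metal wired limit (per-process; exceeding it degrades to
+non-resident pages or a failed allocation without crossing the machine cap) and a headroom constant (the methodology behind
+the 12GB margin on the prefill side, settings.py:404-412). The video side needs both (§4.4).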
+
+## 3. 总体架构
+
+决策: subprocess worker + 独立 venv + job manager, 视频模型注册进 pool 名册
+但被 typed 拒绝在加载链路之外 (拒绝点在准入循环之前).
+
+```
+fmlx server 进程 (主 venv, 无 mlx-gen 依赖)
+  |- model_discovery: 认出 video 模型 (model_index.json), 列表/删除/设置
+  |- server.get_engine: alias 解析后 entry.model_type=="video" -> 400 + 指引
+  |- pool.get_engine: 入口处 (准入循环之前) video -> ModelTypeNotLoadableError
+  |- /v1/videos 路由 (api/video_routes.py, 无条件挂载, handler 内按设置门控)
+  |- VideoJobManager (omlx/video/manager.py, lifespan 内构造, 注入 enforcer)
+  |    |- queue + Semaphore(1) + job 持久化 (JSON per job)
+  |    |- 内存租约: enforcer.acquire_video_lease(bytes) / set_video_worker_pid
+  |    |    / release_video_lease
+  |    |- spawn: <video_venv>/bin/python -I <omlx>/video/worker.py --spec job.json
+  |    |- 监控: stdout JSONL (进度 + 相位心跳) + 足迹 watchdog + 停滞超时
+  |    |- 取消/超时: SIGTERM -> 5s -> SIGKILL
+  |- ProcessMemoryEnforcer: ceiling -= lease; 动态 ceiling 加回 min(worker, lease)
+  |
+  +-- worker 子进程 (video venv: 锁定依赖集, 不 import omlx)
+       |- 进场即 mx.set_wired_limit(lease 内值) -- Metal 级自缚 (预防性)
+       |- mflux Wan2_2_TI2V(model_config=..., model_path=registry 提供的本地目录)
+       |- generate_video(progress_callback -> stdout JSONL)
+       |- video.save(<artifacts>/<job_id>/output.mp4) -> exit 0
+```
+
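+Why not in-process (ordered by strength of the veto):
+
+1. Dependency conflict: installing torch/transformers>=5/hf-hub<2 into the main venv is an uncontrollable risk.
+2. Cancellation and timeout: mlx-gen has no cancel API, so a hung in-process denoise holds the global
+   MLX executor forever with no way to kill it; killing a subprocess reclaims everything (including Metal memory).
+3. Executor head-of-line blocking: minutes-long jobs serialize on the global executor with max_workers=1,
+   stalling audio/embedding entirely.
+4. Crash isolation: NaN/Metal errors in the video pipeline do not take down the LLM service.
+5. Deterministic memory reclamation: process exit returns everything, with no accumulated fragmentation or leaks.
+
+Costs and countermeasures:
+
+- Worker memory is invisible to the parent's phys_footprint but visible to the dynamic ceiling (through the drop
+  in free) -> lease + add-back correction, counted once rather than twice (§4.4).
+- Every job cold-loads the weights (42GB read from disk) -> accepted for the MVP; P2 may add a resident worker + idle TTL.
+- Dual-process wired-sum -> the worker caps its own wired limit (prevention) + watchdog (cleanup), §4.4.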
+
+视频模型与 engine pool 的关系: 发现机制注册 entry (model_type=engine_type=
+"video"), 使列表/设置/删除/类型护栏全部生效. 但 video 条目永不可加载:
+
+- pool.get_engine 在 entry 查到后, already-loaded 快路径与准入循环之前,
+  对 video 抛新 typed 异常 `ModelTypeNotLoadableError` (子类 EnginePoolError,
+  消息携带 "use POST /v1/videos"). 这保证零驱逐/零 settle barrier/零 507
+  副作用 -- 若拒绝放在 _load_engine 里, 一次误指 video 模型的 chat 请求
+  就会先按 42GB 跑准入, 驱逐在驻的生产 LLM 再被拒 (评审 blocker, 已核实).
+- server.get_engine 在 alias 解析后, 进 pool 之前, 查 entry.model_type ==
+  "video" -> HTTPException 400 + /v1/videos 指引 (chat/embeddings/rerank 全部
+  流经此函数, 一处护全). 异常映射链在 EnginePoolError->500 之前加
+  ModelTypeNotLoadableError->400 臂. 原 v1 计划的 _suggest_endpoint_for_engine
+  加提示是死代码 (该函数只对成功返回的 engine 实例 isinstance, video 永远
+  没有实例), 撤销.
+- /v1/models/{id}/load 与 admin load 端点各自加 pre-pool 类型检查 -> 400
+  (公共 load 端点的 blanket except Exception->500 会吞 typed 异常, 必须在
+  进 pool 前查).
+- _load_engine 的 dispatch 链里保留防御性 raise (同 typed 异常), 护住其他
+  pool.get_engine 调用方.
+- 默认模型卫生: 隐式默认选择 (server.py:1279-1292) 过滤到 model_type in
+  {"llm","vlm"}, 无候选则 default=None (落到现有干净 400); model_fallback
+  (server.py:698-711) 重试前校验默认模型类型; admin 默认模型设置器
+  (admin/routes.py:2171-2173) 拒绝 video 条目.
+
+权重生命周期完全归 worker 子进程; pool 的 42GB 准入与卸载 settle barrier
+对 video 条目因前置拒绝而永不触发.
+
+## 4. 模块设计
+
+按改动面从发现层到 API 层再到内存与配置依次展开.
+
+### 4.1 模型发现与类型系统
+
+改动点 (全部小而集中):
+
+- `_is_model_dir` (model_discovery.py:697-699): `config.json 存在` 或
+  `model_index.json 存在` 均算模型根. 后者必须先于 org-folder 下钻判定,
+  这同时修掉 §2.2 的幽灵组件隐患 (flat 布局不再下钻 transformer/ 等子目录).
+- `detect_model_type` (model_discovery.py:385-549): 在 "config.json 缺失 ->
+  llm" 早退 (404-406) 之前加 model_index.json 分支: 读 `_class_name`,
+  在允许清单内 (MVP = {"WanPipeline"}) -> "video"; 不在清单 -> 跳过哨兵,
+  `_register_model` 据此跳过并 log warning (不注册不可跑的管线, 也不产
+  幽灵). 契约说明: 所有 config.json 路径保持现有 str 返回契约不变, 哨兵
+  只出现在 "model_index.json 存在且 _class_name 不在清单" 的新分支 --
+  现有测试零破坏.
+- Literal 与映射五处同改 + 一致性测试: model_discovery.py:26-27,
+  engine_pool.py:56-57, `_MODEL_TYPE_TO_ENGINE` (engine_pool.py:203-211),
+  `_register_model` if/elif (model_discovery.py:737-751), admin valid_types +
+  type_to_engine (admin/routes.py:1860, 1870-1878). 三份重复映射已是现存
+  债务, 加断言测试防 silent "batched" 降级.
+- 加载链路拒绝 (位置是要害, 见 §3): pool.get_engine 入口 typed 拒绝 +
+  server.get_engine pre-pool 400 + 两个 load 端点 pre-pool 400 +
+  _load_engine 防御臂. 新异常类入 engine_pool 异常族.
+- 默认模型与 fallback 卫生 (见 §3 末尾): 隐式默认过滤 / fallback 校验 /
+  admin setter 拒绝, 配 "video 模型按字典序排第一" 的发现 fixture 单测.
+  这顺带修掉 embedding/audio 模型当默认的同款现存隐患.
+- `estimate_model_size`: 递归 **/*.safetensors 分支 (679-681) 已覆盖 diffusers
+  布局 (42GB), 不改; 该值对 video 只作展示 -- 准确表述: video 条目因前置
+  拒绝永不进入会消费 estimated_size 的准入循环.
+- /v1/models 卫生: ModelInfo (api/openai_models.py:409-415) 增加 `model_type`
+  字段并在 server.py:1717-1722 填充 (对 OpenAI 客户端是 additive);
+  这同时激活 cli.py:349 现成但 inert 的 llm/vlm 过滤. admin chat picker
+  (dashboard.js:2081) 已天然排除未知类型.
+- admin DELETE / 本地列表的 config.json 门 (admin/routes.py:4538/4547/4495/
+  4511) 放宽为 config.json|model_index.json, 否则 42GB 模型在 UI 不可见
+  不可删. 共享一个 is_model_root() helper, 不再三处发散.
+
+### 4.2 VideoJobManager 与 worker 协议
+
+构造与接线: VideoJobManager 在 lifespan 启动序里构造, 紧跟 enforcer 块之后
+(server.py:367 后), 构造器注入 `enforcer: ProcessMemoryEnforcer | None`
+(镜像 server.py:365-366 给 pool 注入 enforcer 的先例); 实例存
+`_server_state.video_job_manager`, 路由经 `_get_video_job_manager()` 懒访问器
+取用 (audio_routes.py:68-80 范式, 单测可 patch). 不得在 init_server 里构造
+(init_server 先于 lifespan, 彼时 enforcer 不存在); "仿 OQManager" 只指
+job/队列/持久化形态, 不指构造位置.
+
+job 模型:
+
+- id (uuid4, 前缀 "video_"), 对外 status 严格四值 `queued|in_progress|
+  completed|failed` (与 openai SDK Video.status Literal 完全一致; 取消不是
+  wire 状态, 内部记日志/metrics 即可, to_dict() 永不输出 cancelled).
+- progress 0-100, phase 字符串, created_at / started_at / completed_at,
+  `expires_at` (nullable; 产物被保留策略清除时置为清除时刻, 记录本身保留
+  且 status 不变), 请求参数回显, 产物路径, error.
+- error: null 或结构化 `{code, message}` (对齐 openai SDK Video.error 形态).
+  稳定 code 集: `worker_crashed` (非零退出), `worker_stalled` (停滞超时),
+  `job_timeout` (单次运行超时), `memory_lease_exceeded` (watchdog 足迹超租约),
+  `monitor_failed` (连续 3 次足迹读 0), `server_restarted` (启动回放),
+  `output_invalid` (exit 0 但 mp4 健康校验失败). worker 的 failure manifest
+  用同一 {code, message, detail?} schema, manager 透传.
+
+队列与时钟:
+
+- FIFO + asyncio.Semaphore(1); 队列深度上限 (settings, 默认 4), 超限提交
+  直接 503. 一次只有一个 worker 子进程.
+- 内存准入只在 dispatch (spawn 前) 评估 (判据见 §4.4): 不满足 -> job 留在
+  队头, 乘 enforcer 1s tick 节奏每 ~5s 重查, 永不对已接收的 job 503.
+  饱和的 LLM 服务可以让视频 job 长等 -- 这是接受的取舍 (§9), 用户可 DELETE
+  取消排队中的 job.
+- job_timeout_seconds (默认 7200) 的时钟从 worker spawn 起算 (per-run),
+  排队等待不计时. 停滞超时见下.
+
+持久化与产物:
+
+- 每 job 一个 JSON, 原子写 (tmp+replace, 仿 responses_utils.py:447-454),
+  目录 {base_path}/video-jobs/; 产物 {base_path}/video-artifacts/{job_id}/.
+  启动时回放: in_progress/queued 的标记为 failed (code=server_restarted).
+- 保留策略: 数量 + 总字节双上限, LRU 清产物; 清除只删 blob 并置 expires_at,
+  job 记录保留 (list 与 GET 仍可见历史).
+
+worker (omlx/video/worker.py, 只 import mflux + mlx + 标准库, 不 import omlx):
+
+- spawn 形态: `<video_venv>/bin/python -I <omlx>/video/worker.py --spec
+  <job_spec.json>`. `-I` (isolated) 隔离 sys.path/PYTHONPATH/用户 site,
+  防 worker 误 import 主仓 omlx; env 由 manager 白名单构造 (PATH, HOME,
+  TMPDIR + 刻意选择的 HF 变量), 不整体继承.
+- spec 内 model dir 只能取自 registry entry.model_path (discovery 扫描产物,
+  server 自有根目录下); request.model 字符串在任何分支都不得参与路径构造,
+  resolve 失败一律 404.
+- 进场顺序: 先 `mx.set_wired_limit(lease_bytes - wired_margin)` (lease 经
+  spec 传入; mlx 本来就是 mflux 依赖, 不破 "不 import omlx" 规则), 再加载
+  模型. 这是 wired-sum 治理的承重墙 (§4.4).
+- 进度协议: stdout 每行一个 JSON. 两类行: 相位转换心跳
+  `{"phase": "loading"|"text_encoding"|"denoise"|"vae_decode"|"saving"}`
+  (静默长相位 -- 42GB 权重加载/torch 文本编码/VAE decode -- 也有活性信号)
+  与步进行 `{"phase": "denoise", "step": n, "total_steps": m}` (接
+  ProgressCallback).
+- 结束: video.save(输出路径, validate_health=True) + metadata sidecar;
+  异常时写 failure manifest JSON 后 exit 非零.
+
+监控与终止 (manager 侧, 统一在 2s watchdog tick 里):
+
+- 足迹: get_phys_footprint(worker_pid); 连续 3 次读 0 -> 杀,
+  code=monitor_failed; 足迹 > lease -> SIGKILL, code=memory_lease_exceeded.
+  注意 watchdog 定位是次级清理/泄漏检测, 不是 panic 兜底 (§2.3, §4.4).
+- 停滞: 追踪 last_jsonl_line_at, in_progress 且静默超过
+  progress_stall_timeout_seconds (settings, 默认 600) -> SIGTERM -> 5s ->
+  SIGKILL, code=worker_stalled. 相位心跳的存在使该阈值在生产分辨率
+  (单步可 ~70s+) 下既不误杀也不失效.
+- 单次运行超时 job_timeout_seconds 同终止路径, code=job_timeout.
+- DELETE 取消: SIGTERM -> 5s -> SIGKILL, 释放租约, 删记录与产物.
+
+mlx-gen 演进风险的真实缓解次序 (v1 的 "CLI 兜底" 评审降级): 第一道 = 锁定
+依赖集的精确 pin (§4.5 lockfile) -- 依赖冻结后 API/CLI 都不会在运行期破裂,
+破裂只能经显式升级 PR 进来, 是可 review 的代码变更; 第二道 = vendor wan
+子树 (MIT 允许; 真实规模约 130 文件含 models/wan + models/common + utils +
+callbacks, 且 torch/transformers 依赖不因 vendor 消失 -- 诚实代价见 §9);
+CLI 形态切换只是第三道且会破坏 JSONL 进度协议, 不作为设计依赖.
+
+### 4.3 /v1/videos API (OpenAI 形态)
+
+路由文件 api/video_routes.py. 挂载: 无条件 include_router (mcp_router
+先例, server.py:435-437) -- 不能用 audio 的条件挂载范式, 因为那发生在
+import 时而 settings 彼时未初始化 (§2.2). 门控全部在 handler 内:
+settings.video.enabled 为 false 或 manager 未初始化 -> 503 + 指引;
+venv 探测失败 -> 503 + 安装指引 (指引文本用 §4.5 修正后的命令).
+router 级 Depends(verify_api_key).
+
+| 端点 | 行为 |
+|---|---|
+| POST /v1/videos | 见下方提交语义 |
+| GET /v1/videos | MVP 必做 (LRU 清产物后这是唯一枚举手段). 参数 limit (默认 20, 上限 100) / after (游标 = job id) / order (asc|desc, 默认 desc, 按 created_at). 响应 {"object": "list", "data": [...], "has_more": bool, "last_id": ...} -- openai SDK 游标分页所需字段 |
+| GET /v1/videos/{id} | job 对象 (status, progress, phase, error, expires_at, ...) |
+| GET /v1/videos/{id}/content | FileResponse mp4 (media_type=video/mp4, 支持 Range); 未完成 -> 409; completed 但产物已被保留策略清除 -> 404 + code=artifact_expired (响应体指向 expires_at); handler 必须先查文件存在 (FileResponse 对缺失路径会 500) |
+| DELETE /v1/videos/{id} | queued/in_progress: 杀 worker (SIGTERM->5s->SIGKILL) + 释放租约 + 删记录与产物; completed/failed: 删记录与产物. 返回 {"id", "object": "video.deleted", "deleted": true} (openai SDK VideoDeleteResponse 形态); 之后 GET -> 404 |
+
+提交语义 (POST /v1/videos):
+
+- 兼容要害 (评审 blocker): openai SDK 的 client.videos.create 发送的是
+  multipart/form-data (为 input_reference 文件域), 纯 JSON pydantic body
+  会对官方 SDK 一律 422. handler 收原始 Request, 按 Content-Type 分支:
+  multipart -> await request.form(); JSON/缺失 -> await request.json();
+  两路归一进同一个内部 pydantic 模型 (video_models.py 保留). FastAPI 不能
+  按 content type 在同路径派发两个 handler, 必须单 handler 手工解析 --
+  与 audio_routes 的 pydantic-body 范式刻意不同, 此处注明原因.
+- 字段: model, prompt, 可选 size "WxH", seconds (SDK 发的是字符串字面量
+  "4"|"8"|"12"; multipart 下所有字段都是字符串, 数值字段必须走 pydantic
+  lax 强转), 以及 fmlx 扩展 negative_prompt/steps/fps/seed/guidance/
+  guidance_2 (扩展字段碰撞政策: 若未来 OpenAI 占用同名字段, fmlx 语义让位,
+  扩展迁移到 fmlx_ 前缀; MVP 不预先加前缀).
+- seconds 按 fps 折算帧数, 强制 4n+1; size 向上取整到 16 的倍数.
+- 准入即拒 (400/413): 参数越静态上限 (max_frames/max_steps/max_pixels,
+  settings); 或按 §4.4 的逐请求峰值预测器 predicted_peak(W,H,frames) +
+  margin > lease -- 响应体带 预测值 vs lease 数字. 静态上限是 UX 边界,
+  预测器才是内存边界.
+- 接受 -> 立即返回 job 对象 (status=queued).
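+- Beyond the handler gates at the top of this section (video disabled or manager not initialized), a submission gets 503 in only three cases: queue full / venv missing / memory guard off (all persistent
+  conditions at submission time, each with an actionable reason). Memory pressure does not produce a 503; the job queues and waits (§4.2).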
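+
+The frame and size rounding above can be written out as follows. This is a sketch under an assumption: the spec only
+says frame counts are forced to 4n+1 and sizes are rounded up to multiples of 16, so rounding the frame count upward
+(rather than to the nearest 4n+1) is a guess made here for illustration. The symbols s, f, W and H are shorthand for seconds, fps and the requested size.
+
+```latex
+\begin{aligned}
+N_{\text{frames}} &= 4\left\lceil \frac{s \cdot f}{4} \right\rceil + 1,
+\qquad W' = 16\left\lceil \frac{W}{16} \right\rceil,
+\qquad H' = 16\left\lceil \frac{H}{16} \right\rceil \\
+% Example: s = 3, f = 16 and a requested size of 480x270
+N_{\text{frames}} &= 4\left\lceil \frac{48}{4} \right\rceil + 1 = 49,
+\qquad W' = 480,
+\qquad H' = 16 \cdot 17 = 272
+\end{aligned}
+```
+
+The result matches the default profile quoted in §6 (480x272, 49 frames at fps=16).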
+
+错误映射: 模型不存在 404 (带 available 提示); 模型非 video 类型 400.
+每请求计入 metrics (record_request_complete, 0 token, 仿 audio_routes.py:
+426-436).
+
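+### 4.4 Memory co-residency: three layers of governance (wired self-cap / lease / watchdog)
+
+Layer one, prevention (the load-bearing wall): the Metal wired limit pins each of the two processes.
+
+- On entry the worker calls `mx.set_wired_limit(lease_bytes - wired_margin)` (§4.2).
+  Exceeding it degrades to non-resident pages (slower) or a failed allocation (the job fails and the manager reports failed);
+  it never grows toward the machine cap. This is the same mechanism the enforcer already applies to the parent process
+  (process_memory_enforcer.py:407-446).
+- On acquire_video_lease the parent resets its own wired limit to (static_ceiling -
+  lease) and restores it on release. If the parent's resident wired memory already exceeds the new limit (e.g. an 85GB model is loaded),
+  MLX degrades to non-resident pages (slower decode) rather than panicking. This is acceptable, and the dispatch admission
+  criterion later in this section makes the case rare.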
+
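+Layer two, budget (the lease; about 50 changed lines in process_memory_enforcer.py):
+
+- `_video_lease_bytes` is deducted at the end of `_get_hard_limit_bytes` (495-517):
+  `ceiling = max(0, ceiling - lease)`. Because this is the single chokepoint, pool admission, the soft/hard watermarks,
+  admission_paused and the prefill gate cap all tighten automatically on the next 1s tick, with zero
+  scheduler changes.
+- Double-counting correction (review major): the dynamic ceiling (483-493, non-custom tiers) drops a second time
+  because worker usage lowers free. Fix: the non-custom branch adds back
+  `min(get_phys_footprint(worker_pid), lease)`, so the worker is counted exactly once.
+  Clamping to the lease ensures a runaway worker (in the window before the watchdog kills it) cannot push the
+  parent ceiling back up; when the footprint reads 0 this falls back to today's double counting, which is fail-conservative.
+- API: `acquire_video_lease(bytes)` (before spawn; there is no pid yet, so the add-back term is 0,
+  which is correct because nothing is allocated), `set_video_worker_pid(pid)` (immediately after spawn),
+  `release_video_lease()` (clears both). A change of value calls `_propagate_memory_limit()`
+  (the existing runtime-setter pattern, 372-400).
+- Guard disabled (get_final_ceiling()==0) -> video jobs are refused (503 at submission, §4.3),
+  so that no panic source is introduced on an unprotected machine.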
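+
+A worked form of the double-counting correction may help. It is a sketch under an idealization that the code does not state:
+the dynamic ceiling is assumed to fall by exactly the worker's true footprint F, written C_dyn = C_0 - F, where C_0 is the
+ceiling with no worker running and L is the lease. The symbols C_0, F, L and \hat{F} (the footprint as read by the parent) are introduced here for illustration only.
+
+```latex
+\begin{aligned}
+C &= \max\bigl(0,\; C_{\text{dyn}} + \min(\hat{F}, L) - L\bigr), \qquad C_{\text{dyn}} = C_0 - F \\
+\hat{F} = F \le L &: \quad C = C_0 - F + F - L = C_0 - L       % worker counted exactly once \\
+\hat{F} = F > L   &: \quad C = C_0 - F + L - L = C_0 - F < C_0 - L   % an overrun never raises the ceiling \\
+\hat{F} = 0       &: \quad C = C_0 - F - L \le C_0 - L          % double count, fail-conservative
+\end{aligned}
+```
+
+In all three cases the parent ceiling is at most C_0 - L, which is the property the clamp is there to protect.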
+
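+Dispatch admission criterion (review fix: never admit in a window where landing the lease immediately triggers hard pressure):
+
+- The enforcer exists, the guard is enabled, and
+  `recent_peak_bytes() <= min(soft_ratio * (ceiling - lease), (ceiling - lease) - prefill_transient_margin)`.
+  This uses the rolling peak rather than the instantaneous value and requires that, once the lease lands, the system is directly in an
+  "ok pressure + resident load below the prefill gate" state. If not satisfied -> stay queued and recheck (§4.2).
+- Residual case of a long prefill in flight (a growing load that arrives after the criterion passes but before the lease lands):
+  landing the lease tightens the gate cap, so the prefill's next chunk is cleanly rejected by the gate
+  (a 503-class error, no panic). This is in-design behavior, recorded in the §9 trade-off table. The MVP does no
+  drain (waiting for prefill to empty before landing the lease); P2 will revisit based on measurements.
+- Mutual-exclusion arithmetic with very large models (fix for a review blocker): 107.5 ceiling - 28 lease =
+  79.5, so glm4.5 (85GB of weights) does not fit at all; with the P0-calibrated 36GB lease (§6.1) the budget is 71.5GB and the conclusion is unchanged. By design, video and >=80GB LLMs are
+  mutually exclusive, and the job queues until the large model is unloaded by TTL or by hand. True co-residency applies when "LLM weights +
+  working set <= ceiling - lease - headroom", which on a 128GB machine means roughly <=50GB-class models. Goal 4 in §1
+  and the A/B protocol in §7 are both worded accordingly.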
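+
+The criterion and the mutual-exclusion arithmetic can be written symbolically. C is the ceiling, L the lease, P the rolling peak,
+\rho the soft ratio and M the prefill transient margin; these symbols are shorthand introduced here, and the numbers are the ones quoted in this spec (28 in this section, 36 in §4.5 and §6.1).
+
+```latex
+\begin{aligned}
+&\text{admit} \iff P \le \min\bigl(\rho\,(C - L),\; (C - L) - M\bigr) \\
+&C - L = 107.5 - 28 = 79.5\ \text{GB} < 85\ \text{GB} \quad \text{(initial lease)} \\
+&C - L = 107.5 - 36 = 71.5\ \text{GB} < 85\ \text{GB} \quad \text{(P0-calibrated lease, \S 6.1)}
+\end{aligned}
+```
+
+Either way an 85GB model cannot pass the criterion, assuming the rolling peak is at least the model's resident weights.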
+
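+Layer three, cleanup (watchdog, §4.2): a 2s footprint poll kills on lease overrun, on stall, and on timeout.
+Its role is leak detection and secondary cleanup; sub-2s wired bursts are stopped by layer one, not by the watchdog.
+
+Lease size: settings.video.memory_lease_gb. The pre-measurement initial value was 28; the P0 measurement calibrated
+the default to 36 (§4.5, §6.1; the true peak comes from the lifetime-max ledger plus the worst single-step transient). The calibrated value is bound to the dependency lock
+digest (§4.5, §9.1).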
+
+### 4.5 settings (new VideoSettings section)
+
+Wired in following the four-piece pattern (settings.py:789-817 / 879-912 / 1136-1154 / 1376-1397) +
+admin GET/POST + flattened GlobalSettingsRequest fields + the _settings.html form.
+
+| Field | Default | Description |
+|---|---|---|
+| enabled | false | Master switch; when false the handlers always return 503 (the router stays mounted, §4.3) |
+| worker_python | "" | Path to the video venv's python; empty = {base_path}/venvs/video/bin/python |
+| memory_lease_gb | 36 | Calibrated by P0 (low-RAM upper corner 27.9 + 6 margin, §6), bound to the lock digest |
+| max_queued_jobs | 4 | Submissions beyond this get 503 |
+| job_timeout_seconds | 7200 | Per-run timeout (counted from spawn); queueing is not counted |
+| progress_stall_timeout_seconds | 600 | Kill on JSONL silence (§4.2) |
+| default_steps / default_fps | 20 / 16 | Generation defaults when the request gives none (P0-calibrated) |
+| max_frames / max_steps / max_pixels_per_frame | 121 / 50 / 1280x720 | Request UX caps; the memory boundary is guarded by the peak predictor (§4.3/§4.4) |
+| artifacts_max_count / artifacts_max_gb | 50 / 50 | Artifact retention (LRU purges blobs, records are kept) |
+
+venv 管理 (评审 blocker 修正: v1 的裸命令会从仓库 cwd 装进生产主 venv):
+
+- 锁定: 仓库提交 `omlx/video/requirements.in` (一行 `mlx-gen==0.18.14`) 与
+  `omlx/video/requirements.lock` (`uv pip compile --generate-hashes`, 必须在
+  macOS arm64 + 与 worker venv 相同 Python minor 上生成 -- mlx 只有 darwin
+  轮子, 理想在 m5max 上生成).
+- 创建 (文档化命令, 也是 503 指引文本):
+
+```
+uv venv -p 3.12 {base_path}/venvs/video
+uv pip sync --python {base_path}/venvs/video/bin/python omlx/video/requirements.lock
+```
+
+- 警告: 裸 `uv pip install` 按 VIRTUAL_ENV / 最近 .venv 解析目标, 从仓库根
+  执行就是生产 fmlx venv -- 该形态永远不得用于此用途.
+- 启动探测: 跑 `<worker_python> -c "import mflux"`, 且断言 worker_python
+  与主进程 sys.executable 不是同一解释器 (防误配); 失败 -> 提交一律 503
+  带安装指引. admin 一键安装是 P2.
+
+### 4.6 admin 面 (MVP 最小)
+
+- 模型列表自动获得 video 条目 (get_status 透传 model_type, 零改动);
+  类型下拉 (_modal_model_settings.html:272-280) 加 video 选项; 删除可用
+  (§4.1 的门放宽).
+- job 可见性 MVP 靠 GET /v1/videos (已升必做) 与日志; admin 视频页 P2.
+
+### 4.7 与下载链路的关系 (顺带修复, 建议拆独立小 PR)
+
+- HF 下载器对 diffusers repo 零改动可用 (snapshot_download 全树落
+  <model_dir>/<owner>/<repo>, on_complete 触发再发现).
+- 中国网络 Xet 墙: `HF_HUB_DISABLE_XET` 在 huggingface_hub import 时冻结,
+  只能进程级注入 -- 加到 cli.py:115-140 的 serve 启动 env 块, 由
+  settings.huggingface.disable_xet 驱动 (默认 false, 文档建议国内开).
+  本次 42GB 下载即是被 Xet 卡死 6.5 小时, 换 LFS 链路后 8.8MB/s 拉完.
+- ModelScope 下载视频模型 MVP 明确不支持 (flat symlink 触发幽灵组件,
+  §4.1 的发现修复使其不再产幽灵, 但 MS 路径的正式支持等 P2).
+
+## 5. 文件清单
+
+| 路径 | 新/改 | 预估 LOC | 内容 |
+|---|---|---|---|
+| omlx/video/__init__.py | 新 | 10 | 导出 |
+| omlx/video/manager.py | 新 | ~650 | job 模型/队列/持久化/spawn/watchdog/停滞/租约/保留策略 |
+| omlx/video/worker.py | 新 | ~180 | 子进程脚本 (wired 自缚 + JSONL + manifest), 只依赖 mflux/mlx |
+| omlx/video/requirements.in + .lock | 新 | -- | 依赖锁 (§4.5) |
+| omlx/api/video_routes.py | 新 | ~300 | 5 端点 + 双 content-type 解析 + 门控 |
+| omlx/api/video_models.py | 新 | ~110 | pydantic 内部模型/响应/error code 枚举 |
+| omlx/model_discovery.py | 改 | ~60 | _is_model_dir / detect_model_type / 注册臂 / Literal |
+| omlx/engine_pool.py | 改 | ~30 | Literal / 映射 / get_engine 入口拒绝 + 新异常 / _load_engine 防御臂 |
+| omlx/server.py | 改 | ~40 | 路由挂载 / pre-pool 400 / 异常映射臂 / 默认模型与 fallback 卫生 / ModelInfo.model_type / manager 构造接线 |
+| omlx/process_memory_enforcer.py | 改 | ~50 | 租约三 API + 扣减 + 动态 ceiling 加回 + 父进程 wired 重设 |
+| omlx/settings.py | 改 | ~120 | VideoSettings 四件套 |
+| omlx/admin/routes.py | 改 | ~45 | valid_types / 映射 / DELETE 与列表门 / global-settings / 默认 setter 拒绝 video |
+| omlx/cli.py | 改 | ~6 | disable_xet env 注入 |
+| templates/static | 改 | ~20 | 类型下拉 + settings 表单 |
+| tests/ (多文件) | 新 | ~1500 | 见 §7 |
+
+合计新增约 1.25k (业务) + 1.5k (测试), 修改约 370, 分布在 8 个上游同源
+文件的小补丁 (§10).
+
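+## 6. Initial defaults and P0 measurement record
+
+Generation parameter defaults (server-side fallback; clients may override, UX caps are clamped by settings, and the memory
+boundary is guarded by the predictor): size 480x272 (multiples of 16), seconds 3 (49 frames at fps=16,
+4n+1), steps 20, guidance 4.0 / guidance_2 3.0 (the mlx-gen A14B model defaults),
+random seed. Measured: the default profile produces a clip in about 8 minutes (including ~60s of cold load).
+
+### 6.1 P0 measurements (m5max, M5 Max 128GB, mlx-gen==0.18.14, A14B q8, 2026-06-11)
+
+| Profile | Parameters | True peak (lifetime ledger) | Time | Worst 0.5s transient |
+|---|---|---|---|---|
+| default (natural) | 480x272, 49f, 20 steps | 49.32 GB | 537s | 10.98 GB |
+| steps40 (natural) | 480x272, 49f, 40 steps | 49.32 GB | 861s | -- |
+| mid_spatial (natural) | 832x480, 49f, 20 steps | 75.46 GB | 2560s | -- |
+| frames101 (natural) | 480x272, 101f, 20 steps | 49.44 GB | 1278s | -- |
+| default (low RAM) | 480x272, 49f, 20 steps | 18.83 GB | 491s | 3.15 GB |
+| mid_spatial (low RAM) | 832x480, 49f, 20 steps | 21.88 GB | 2566s | 5.29 GB |
+
+Conclusions (all carried into the implementation):
+1. The peak is independent of step count (49.32 == 49.32, byte for byte) and essentially independent of frame count (49.32 vs 49.44);
+   it grows linearly only with per-frame spatial tokens (W/16 x H/16). The predictor formula is therefore
+   peak = BASE + COEF x spatial_tokens, and frame count is left out of it.
+2. Low-RAM mode (the worker default): memory drops 62% with no slowdown (491s vs 537s), and spatial scaling
+   is flattened as well (3.06x the tokens adds only +3GB). Calibration: BASE=17.5, COEF=0.0029 GB/token,
+   margin=6 (the worst transient, 5.29, padded). The upper corner 1280x720 predicts 27.9+6=33.9,
+   which the default lease of 36 can hold.
+3. Natural mode is a needless luxury (75GB already at mid resolution) and is kept only as a worker option.
+4. Co-residency arithmetic: 107.5 ceiling - 36 lease = 71.5GB left for the LLM, so on a 128GB machine true co-residency
+   with <=50GB-class models holds; with the 85GB class (glm4.5) it is mutually exclusive and the job queues.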
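+
+The calibrated predictor can be checked against the low-RAM rows of the table. This is a worked example that uses only the
+constants quoted above; T denotes per-frame spatial tokens, a symbol introduced here for brevity.
+
+```latex
+\begin{aligned}
+\text{peak}(W, H) &= 17.5 + 0.0029\,T, \qquad T = \frac{W}{16} \cdot \frac{H}{16} \\
+\text{peak}(480, 272) &= 17.5 + 0.0029 \cdot (30 \cdot 17) = 17.5 + 0.0029 \cdot 510 \approx 18.98\ \text{GB} \quad (\text{measured } 18.83) \\
+\text{peak}(832, 480) &= 17.5 + 0.0029 \cdot (52 \cdot 30) = 17.5 + 0.0029 \cdot 1560 \approx 22.02\ \text{GB} \quad (\text{measured } 21.88) \\
+\text{peak}(1280, 720) &= 17.5 + 0.0029 \cdot (80 \cdot 45) = 17.5 + 0.0029 \cdot 3600 = 27.94\ \text{GB}, \qquad 27.94 + 6 = 33.94 \le 36
+\end{aligned}
+```
+
+Both measured points sit slightly below the prediction, so the fit errs on the conservative side at the two resolutions that were actually measured; the 1280x720 corner is an extrapolation.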
+
+## 7. 测试计划
+
+单测 (CI 无 GPU, 全部不碰真权重):
+
+- discovery: diffusers 布局 fixture (空权重文件) -> 认出 video / 不产幽灵
+  组件 / 未知 pipeline 跳过 + log / flat 与 owner-repo 两种布局 / video
+  模型按字典序第一时不成为默认模型.
+- 类型映射一致性断言 (三份映射 + valid_types 同步).
+- 加载拒绝: pool.get_engine 对 video 条目零驱逐零加载直接 typed 异常;
+  server 侧 chat/embeddings/load 端点 400 + 指引.
+- manager 状态机: 提交/排队/取消/超时/停滞/worker 非零退出/manifest 透传/
+  启动回放 (code=server_restarted), worker 用假脚本 (输出 JSONL + touch
+  mp4) 替身; enforcer 经构造注入假实现 (§4.2 接线即测试缝).
+- 并发与竞态: asyncio.gather 多提交 -> 恰一 running + 队列上限 503, 永不
+  双 worker; watchdog 足迹读 0 路径; worker 退出与 watchdog tick 竞态.
+- 租约: acquire/release 对 ceiling 的影响, 动态 ceiling 加回 clamp,
+  guard 关闭拒绝, 准入判据 (滚动峰值口径).
+- API: 双 content-type 解析 (multipart 字符串字段强转 + JSON), schema/
+  错误码/Range 下载/越限 400/预测器 413/list 分页游标/expires_at 与
+  artifact_expired/DELETE 全状态语义.
+- 保留策略: 数量与字节双上限 LRU, blob 删除后记录可见且 expires_at 置位.
+- 回归保护: /v1/models payload 增量字段不破坏现有断言 (排查既有精确匹配
+  测试), settings 章节枚举类测试同步.
+
+P0 真机测量 (m5max, 无 fmlx 代码, 用户 go-ahead 后执行):
+
+- 仪器: 外部足迹轮询曲线 (相位归因, 接 JSONL 流) + 内核 lifetime-max
+  ledger (`ri_lifetime_max_phys_footprint`, proc_memory.py:63 已声明未读,
+  加一个 get_phys_footprint_lifetime_max 变体在 video.save 返回后, 进程
+  退出前自读) -- worker 每 job 一进程, lifetime max == 含加载/VAE/全部
+  sub-poll 尖峰的真峰值. 轮询曲线只定相位形状, 真峰值用 ledger.
+- 测量矩阵: 默认档 (480x272, 49f) / 中档 (832x480, 81f) / 上限角
+  (1280x720, 121f) / 一个 steps 变体 (验证 steps 不动峰值). 拟合
+  peak ~= W + a*latent_tokens (若非融合 SDPA 注意叠加二次项), 同时记录
+  每档最差单步瞬时 (sub-poll delta) -- 它而非稳态峰决定 lease 内 margin
+  (settings.py:404 方法论).
+- 产出: lease 默认值 + 峰值预测器系数 (回填 §4.3/§4.4/§4.5) + 默认参数
+  档位 + lock digest 绑定记录.
+
+P1 真机 A/B (评审修正: v1 的 "glm4.5 共驻" 算术不可能, 85+28 > 107.5):
+
+- 场景 A, 互斥语义 (glm4.5 85GB): 大模型在载且保持活跃 (pin 或持续流量,
+  防 TTL 中途卸载), POST /v1/videos. 断言: job 停留 queued 且 GET 可见
+  内存原因; 无 worker 进程 (ps 验证); 零 OVER_HARD; 整机存活. 然后卸载
+  glm4.5, 断言 job 在 ~2 个 enforcer tick 内转 in_progress. 测试用短超时,
+  不用 2h 默认.
+- 场景 B, 真共驻 (<=50GB 级模型, 如 gemma4-26b 量化档): 视频 job 运行中
+  发 LLM 长 prefill. 断言: prefill gate 在 (ceiling - lease) 收紧后的 cap
+  下干净拒绝或正常完成 (按预算算术预期), admission pause 行为符合水位,
+  零 OVER_HARD, 零 panic; 视频 job 正常完成且 mp4 健康.
+- 场景 C, 释放与恢复: job 结束 (完成与 DELETE 两路) 后断言租约释放,
+  父进程 wired limit 恢复, LLM 满额服务恢复, 产物可 Range 下载.
+- 回归: 完整 pytest 套件零回归 (基线见 docs/upstream-sync.md).
+
+## 8. 阶段划分
+
+- P0 真机测量 (先行, 零集成代码): §7 P0. 产出校准数据回填本 spec.
+- P1 MVP: §4 全部 + §7 单测 + §7 A/B 三场景. 单分支 feat/video-engine,
+  人审人合.
+- P2 (按需排期): admin 视频页 + SSE 进度, I2V (图上传), TI2V-5B 与 bf16
+  变体, 文生图 (FLUX 系同运行时, /v1/images), 常驻 worker + idle TTL,
+  per-model 生成默认, ModelScope 正式支持, admin 一键装 venv, 视频任务
+  主动驱逐 LLM 的策略, drain 式租约落地.
+
+## 9. 风险与对策
+
+| 风险 | 等级 | 对策 |
+|---|---|---|
+| mlx-gen 高速 0.x 演进 + bus factor 1 (twine 混 runtime deps 的卫生信号) | 高 | 第一道: hash 锁全依赖集 (§4.5), 破裂只能经升级 PR 进来; 第二道: vendor wan 子树 (诚实规模 ~130 文件, torch/transformers 不因 vendor 消失); CLI 切换仅第三道. 升级程序见 §9.1 |
+| 生产分辨率内存未实测 (官方数是小 profile, 上限角约 39x 测点 latent 量) | 高 | P0 测量矩阵 + lifetime-max ledger 定真峰; 逐请求峰值预测器把内存边界从静态 caps 解耦 (§4.3); worker wired 自缚保底 |
+| 双进程 Metal wired-sum 越机器 cap | 中 | 预防层: worker 进场 wired 自缚 + 父进程 acquire 时 wired 重设 (§4.4 第一层); watchdog 仅作次级清理; A/B 场景 B 专项验证 |
+| 租约落地瞬间触发硬压力误伤在途 LLM 请求 | 中 | 准入判据用滚动峰值且要求落地后即处 ok 压力 (§4.4); 残余: 落地后才增长的在途 prefill 被 gate 干净拒绝 (设计内, 无 panic) |
+| 与 >=80GB LLM 互斥导致视频 job 长等 | 低 | 设计内取舍, 排队原因对 GET 可见, 可 DELETE; 主动驱逐策略 P2 |
+| worker 卡死 (不出步进也不退出) | 中 | 相位心跳 + progress_stall_timeout_seconds 停滞杀 + 单次运行超时, 双层杀 (§4.2, 已入 settings 与状态机) |
+| 队列任务跨重启丢失 | 低 | 持久化 + 启动标记 failed (code=server_restarted), 不静默消失; MVP 不做断点续跑 |
+| 产物盘占用 | 低 | 双上限 LRU 清 blob, 记录保留 + expires_at |
+| settings 旧版本降级丢字段 | 低 | 已知 from_dict 行为, 文档注明 |
+
+### 9.1 升级与依赖漂移程序
+
+1. 锁整个 venv 而非顶层包: requirements.lock (hash) 进仓, venv 创建/重建
+   一律 `uv pip sync`. 一切依赖变更 (顶层 bump 或传递漂移) 必须经 PR.
+2. 每次 lock 变更的合并门: 在 m5max 重跑 P0 测量矩阵 (至少默认档 + 上限角),
+   PR body 携带数字 (真峰/最差瞬时); 新峰值 + margin 逼近在配 lease 时,
+   同 PR 重校准 memory_lease_gb 与预测器系数.
+3. lease 默认与预测器系数的有效性与 lock digest 绑定 (spec 与 settings
+   注释双处记 digest); digest 不匹配时启动 log warning.
+4. 输出质量回归: 升级 PR 附固定 seed 的 golden 短片对比 (人工目检即可,
+   MVP 不做自动指标).
+
+## 10. 与上游 soft-fork 的关系
+
+本功能是 fmlx 自有分化, 永不回流. 冲突面控制策略: 业务全部在新文件
+(omlx/video/, api/video_*), 对上游同源文件只做小而可 grep 的补丁
+(§5 清单中 8 个 "改" 文件, 约 370 行, 其中 cli.py/templates 各 <=20 行).
+上游 cherry-pick 撞到这些文件时, 冲突块小且语义独立, 解决成本可控.
+docs/upstream-sync.md 记一条分化标记.
+
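+## 11. Open questions awaiting sign-off
+
+1. Lease default / predictor coefficients / default parameter profile: backfilled from the P0 measurements (§6.1), closed.
+2. settings.video.enabled defaulting to false (must be switched on by hand) vs true (503 guidance when the venv
+   is missing): leaning false, a gradual-rollout mindset.
+3. Whether the Xet fix (§4.7) should go first as a separate small PR (no coupling to video, immediate operational value):
+   leaning toward splitting it out.
+4. Product wording for the true co-residency domain: should the docs give explicit guidance such as "on a 128GB machine, pair video
+   with an LLM of <=50GB"? Leaning yes, written into the video chapter of the README.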
diff --git a/omlx/admin/i18n/en.json b/omlx/admin/i18n/en.json
index 512804277..811753370 100644
--- a/omlx/admin/i18n/en.json
+++ b/omlx/admin/i18n/en.json
@@ -767,5 +767,22 @@
   "cluster.router.unknown": "unknown",
   "cluster.healthy.yes": "yes",
   "cluster.healthy.no": "no",
-  "cluster.save.failed": "Save failed"
+  "cluster.save.failed": "Save failed",
+  "settings": {
+    "video": {
+      "title": "Video Generation",
+      "enabled": "Enable video generation",
+      "enabled_desc": "Serve POST /v1/videos via the subprocess worker. Requires the video worker venv.",
+      "memory_lease": "Memory lease (GB)",
+      "memory_lease_desc": "Reserved against the memory ceiling while a job runs; co-resident LLMs throttle accordingly.",
+      "default_steps": "Default denoise steps",
+      "default_fps": "Default FPS",
+      "max_queued_jobs": "Max queued jobs",
+      "job_timeout": "Job timeout (seconds)",
+      "artifacts_max_gb": "Artifact storage cap (GB)",
+      "artifacts_max_gb_desc": "Oldest video files are purged beyond this; job records are kept.",
+      "worker_python": "Worker python path",
+      "worker_python_desc": "Python of the isolated video venv. Empty = default path."
+    }
+  }
 }
diff --git a/omlx/admin/i18n/ru.json b/omlx/admin/i18n/ru.json
index 6d68461be..061dd7d40 100644
--- a/omlx/admin/i18n/ru.json
+++ b/omlx/admin/i18n/ru.json
@@ -742,5 +742,22 @@
   "cluster.router.unknown": "неизвестно",
   "cluster.healthy.yes": "да",
   "cluster.healthy.no": "нет",
-  "cluster.save.failed": "Не удалось сохранить"
+  "cluster.save.failed": "Не удалось сохранить",
+  "settings": {
+    "video": {
+      "title": "Генерация видео",
+      "enabled": "Включить генерацию видео",
+      "enabled_desc": "Обслуживать POST /v1/videos через подпроцесс-воркер. Требуется venv видео-воркера.",
+      "memory_lease": "Резерв памяти (ГБ)",
+      "memory_lease_desc": "Резервируется из лимита памяти на время задачи; LLM соответственно замедляются.",
+      "default_steps": "Шаги диффузии по умолчанию",
+      "default_fps": "FPS по умолчанию",
+      "max_queued_jobs": "Макс. задач в очереди",
+      "job_timeout": "Таймаут задачи (сек)",
+      "artifacts_max_gb": "Лимит хранения (ГБ)",
+      "artifacts_max_gb_desc": "Старые видеофайлы удаляются сверх лимита; записи задач сохраняются.",
+      "worker_python": "Путь к python воркера",
+      "worker_python_desc": "Python изолированного видео-venv. Пусто = путь по умолчанию."
+    }
+  }
 }
diff --git a/omlx/admin/i18n/zh.json b/omlx/admin/i18n/zh.json
index 964cfe02b..2719f4d46 100644
--- a/omlx/admin/i18n/zh.json
+++ b/omlx/admin/i18n/zh.json
@@ -765,5 +765,22 @@
   "cluster.router.unknown": "未知",
   "cluster.healthy.yes": "是",
   "cluster.healthy.no": "否",
-  "cluster.save.failed": "保存失败"
+  "cluster.save.failed": "保存失败",
+  "settings": {
+    "video": {
+      "title": "视频生成",
+      "enabled": "启用视频生成",
+      "enabled_desc": "通过子进程 worker 提供 POST /v1/videos。需要先安装视频 worker venv。",
+      "memory_lease": "内存租约 (GB)",
+      "memory_lease_desc": "任务运行期间从内存上限中预留;共驻的 LLM 会相应限流。",
+      "default_steps": "默认去噪步数",
+      "default_fps": "默认帧率",
+      "max_queued_jobs": "最大排队任务数",
+      "job_timeout": "任务超时 (秒)",
+      "artifacts_max_gb": "产物存储上限 (GB)",
+      "artifacts_max_gb_desc": "超限时清除最旧的视频文件;任务记录保留。",
+      "worker_python": "Worker python 路径",
+      "worker_python_desc": "独立视频 venv 的 python。留空用默认路径。"
+    }
+  }
 }
diff --git a/omlx/admin/routes.py b/omlx/admin/routes.py
index a0c5d8e8d..97145037e 100644
--- a/omlx/admin/routes.py
+++ b/omlx/admin/routes.py
@@ -281,6 +281,21 @@ class GlobalSettingsRequest(BaseModel):
     # Idle timeout settings. null disables the global fallback.
     idle_timeout_seconds: int | None = Field(default=None, ge=60)
 
+    # Video generation settings (docs/video-generation-engine-spec.md 4.5)
+    video_enabled: bool | None = None
+    video_worker_python: str | None = None
+    video_memory_lease_gb: float | None = Field(default=None, gt=0)
+    video_max_queued_jobs: int | None = Field(default=None, ge=1)
+    video_job_timeout_seconds: int | None = Field(default=None, ge=60)
+    video_progress_stall_timeout_seconds: int | None = Field(default=None, ge=30)
+    video_default_steps: int | None = Field(default=None, ge=1)
+    video_default_fps: int | None = Field(default=None, ge=1)
+    video_max_frames: int | None = Field(default=None, ge=5)
+    video_max_steps: int | None = Field(default=None, ge=1)
+    video_max_pixels_per_frame: int | None = Field(default=None, ge=256)
+    video_artifacts_max_count: int | None = Field(default=None, ge=1)
+    video_artifacts_max_gb: float | None = Field(default=None, gt=0)
+
     # Auth settings
     api_key: str | None = None
     skip_api_key_verification: bool | None = None
@@ -1857,7 +1872,7 @@ async def update_model_settings(
                     )
         current_settings.model_alias = alias_value
     if "model_type_override" in sent:
-        valid_types = {"llm", "vlm", "embedding", "reranker", "audio_stt", "audio_tts", "audio_sts"}
+        valid_types = {"llm", "vlm", "embedding", "reranker", "audio_stt", "audio_tts", "audio_sts", "video"}
         # Treat empty string as None (auto-detect)
         override_value = request.model_type_override or None
         if override_value is not None and override_value not in valid_types:
@@ -1875,6 +1890,7 @@ async def update_model_settings(
             "audio_stt": "audio_stt",
             "audio_tts": "audio_tts",
             "audio_sts": "audio_sts",
+            "video": "video",
         }
         if override_value:
             entry.model_type = override_value
@@ -2849,6 +2865,7 @@ async def get_global_settings(is_admin: bool = Depends(require_admin)):
         "idle_timeout": {
             "idle_timeout_seconds": global_settings.idle_timeout.idle_timeout_seconds,
         },
+        "video": global_settings.video.to_dict(),
     }
 
 
@@ -3268,6 +3285,30 @@ async def update_global_settings(
         else:
             logger.info("Idle timeout disabled")
 
+    # Apply video settings. Caps/timeouts apply live; `enabled` flips
+    # per-request gating at once (handlers read settings.video each call);
+    # worker_python/memory_lease_gb affect the NEXT job dispatch.
+    _video_fields = {
+        "video_enabled": "enabled",
+        "video_worker_python": "worker_python",
+        "video_memory_lease_gb": "memory_lease_gb",
+        "video_max_queued_jobs": "max_queued_jobs",
+        "video_job_timeout_seconds": "job_timeout_seconds",
+        "video_progress_stall_timeout_seconds": "progress_stall_timeout_seconds",
+        "video_default_steps": "default_steps",
+        "video_default_fps": "default_fps",
+        "video_max_frames": "max_frames",
+        "video_max_steps": "max_steps",
+        "video_max_pixels_per_frame": "max_pixels_per_frame",
+        "video_artifacts_max_count": "artifacts_max_count",
+        "video_artifacts_max_gb": "artifacts_max_gb",
+    }
+    for req_field, attr in _video_fields.items():
+        value = getattr(request, req_field, None)
+        if value is not None:
+            setattr(global_settings.video, attr, value)
+            runtime_applied.append(req_field)
+
     # Apply auth settings (API key change)
     if request.api_key is not None:
         from ..server import _server_state
@@ -4465,7 +4506,7 @@ async def list_hf_models(is_admin: bool = Depends(require_admin)):
 
     model_dirs = global_settings.model.get_model_dirs(global_settings.base_path)
 
-    from ..model_discovery import _resolve_hf_cache_entry
+    from ..model_discovery import _is_model_dir, _resolve_hf_cache_entry
 
     def _add_model(model_path: Path, model_name: str) -> None:
         if model_name in seen_names:
@@ -4492,7 +4533,9 @@ def _add_model(model_path: Path, model_name: str) -> None:
             if not subdir.is_dir() or subdir.name.startswith("."):
                 continue
 
-            if (subdir / "config.json").exists():
+            # _is_model_dir accepts config.json or model_index.json roots
+            # (diffusers-layout video models) and excludes adapters.
+            if _is_model_dir(subdir):
                 # Level 1: direct model folder
                 _add_model(subdir, subdir.name)
             else:
@@ -4500,7 +4543,7 @@ def _add_model(model_path: Path, model_name: str) -> None:
                 hf_resolved = _resolve_hf_cache_entry(subdir)
                 if hf_resolved is not None:
                     snapshot_path, model_name = hf_resolved
-                    if (snapshot_path / "config.json").exists():
+                    if _is_model_dir(snapshot_path):
                         _add_model(snapshot_path, model_name)
                     continue
 
@@ -4508,7 +4551,7 @@ def _add_model(model_path: Path, model_name: str) -> None:
                 for child in sorted(subdir.iterdir()):
                     if not child.is_dir() or child.name.startswith("."):
                         continue
-                    if (child / "config.json").exists():
+                    if _is_model_dir(child):
                         _add_model(child, child.name)
 
     return {"models": models}
@@ -4528,14 +4571,18 @@ async def delete_hf_model(
 
     model_dirs = global_settings.model.get_model_dirs(global_settings.base_path)
 
-    # Search for model across all directories in both flat and org-folder layouts
+    # Search for model across all directories in both flat and org-folder
+    # layouts. _is_model_dir accepts config.json or model_index.json roots
+    # (diffusers-layout video models must be deletable too).
+    from ..model_discovery import _is_model_dir
+
     model_path = None
     parent_model_dir = None
     for model_dir in model_dirs:
         if not model_dir.exists():
             continue
         candidate = model_dir / model_name
-        if candidate.is_dir() and (candidate / "config.json").exists():
+        if candidate.is_dir() and _is_model_dir(candidate):
             model_path = candidate
             parent_model_dir = model_dir
             break
@@ -4544,7 +4591,7 @@ async def delete_hf_model(
             if not subdir.is_dir() or subdir.name.startswith("."):
                 continue
             candidate = subdir / model_name
-            if candidate.is_dir() and (candidate / "config.json").exists():
+            if candidate.is_dir() and _is_model_dir(candidate):
                 model_path = candidate
                 parent_model_dir = model_dir
                 break
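
The video-settings loop in update_global_settings treats None as "field not sent", so a partial request only touches the attributes it names. A minimal sketch of that pattern follows; VideoSettings, FIELD_MAP and apply_video_fields are stand-ins written for this example and are much smaller than the real objects.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class VideoSettings:  # stand-in, not the real omlx settings class
    enabled: bool = False
    memory_lease_gb: float = 36.0
    default_steps: int = 20


# Request field name -> settings attribute (a subset, for illustration).
FIELD_MAP = {
    "video_enabled": "enabled",
    "video_memory_lease_gb": "memory_lease_gb",
    "video_default_steps": "default_steps",
}


def apply_video_fields(request, settings):
    """Copy every non-None request field onto settings; return applied names."""
    applied = []
    for req_field, attr in FIELD_MAP.items():
        value = getattr(request, req_field, None)
        if value is not None:  # False and 0 are real values; only None is skipped
            setattr(settings, attr, value)
            applied.append(req_field)
    return applied


settings = VideoSettings(enabled=True)
request = SimpleNamespace(
    video_enabled=False, video_memory_lease_gb=None, video_default_steps=30
)
applied = apply_video_fields(request, settings)
print(applied, settings)
```

One consequence of the None convention is that a field cannot be reset to empty through this path. The dashboard sends `worker_python || null`, so clearing the input appears to leave a previously saved path in place.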
diff --git a/omlx/admin/static/js/dashboard.js b/omlx/admin/static/js/dashboard.js
index a493b9d04..d6baf91b4 100644
--- a/omlx/admin/static/js/dashboard.js
+++ b/omlx/admin/static/js/dashboard.js
@@ -44,6 +44,7 @@
                 integrations: { copilot_model: null, codex_model: null, opencode_model: null, openclaw_model: null, pi_model: null, openclaw_tools_profile: 'full' },
                 ui: { language: 'en' },
                 idle_timeout: { idle_timeout_seconds: null },
+                video: { enabled: false, worker_python: '', memory_lease_gb: 36, max_queued_jobs: 4, job_timeout_seconds: 7200, progress_stall_timeout_seconds: 600, default_steps: 20, default_fps: 16, max_frames: 121, max_steps: 50, max_pixels_per_frame: 921600, artifacts_max_count: 50, artifacts_max_gb: 50 },
                 system: { total_memory_bytes: 0, total_memory: '', auto_model_memory: '', ssd_total_bytes: 0, ssd_total: '' },
             },
 
@@ -781,6 +782,7 @@
                             claude_code: { ...this.globalSettings.claude_code, ...data.claude_code },
                             integrations: { ...this.globalSettings.integrations, ...data.integrations },
                             idle_timeout: { ...this.globalSettings.idle_timeout, ...data.idle_timeout },
+                            video: { ...this.globalSettings.video, ...data.video },
                             system: { ...this.globalSettings.system, ...data.system },
                         };
                         this.globalSettings.ui = data.ui || { language: 'en' };
@@ -884,6 +886,14 @@
                             ...(this.globalSettings.auth.api_key ? { api_key: this.globalSettings.auth.api_key } : {}),
                             skip_api_key_verification: this.globalSettings.auth.skip_api_key_verification,
                             idle_timeout_seconds: this.globalSettings.idle_timeout?.idle_timeout_seconds ?? null,
+                            video_enabled: this.globalSettings.video?.enabled ?? null,
+                            video_worker_python: this.globalSettings.video?.worker_python || null,
+                            video_memory_lease_gb: this.globalSettings.video?.memory_lease_gb ?? null,
+                            video_max_queued_jobs: this.globalSettings.video?.max_queued_jobs ?? null,
+                            video_job_timeout_seconds: this.globalSettings.video?.job_timeout_seconds ?? null,
+                            video_default_steps: this.globalSettings.video?.default_steps ?? null,
+                            video_default_fps: this.globalSettings.video?.default_fps ?? null,
+                            video_artifacts_max_gb: this.globalSettings.video?.artifacts_max_gb ?? null,
                         }),
                     });
 
diff --git a/omlx/admin/templates/dashboard/_modal_model_settings.html b/omlx/admin/templates/dashboard/_modal_model_settings.html
index 0121801ac..4fdad8b8c 100644
--- a/omlx/admin/templates/dashboard/_modal_model_settings.html
+++ b/omlx/admin/templates/dashboard/_modal_model_settings.html
@@ -277,6 +277,7 @@ <h3 class="text-xs font-bold uppercase tracking-widest text-neutral-400 mb-5">{{
                                         <option value="audio_stt">Audio STT</option>
                                         <option value="audio_tts">Audio TTS</option>
                                         <option value="audio_sts">Audio STS</option>
+                                        <option value="video">Video</option>
                                     </select>
                                 </div>
                                 <div x-show="reasoningParsers.length > 0">
diff --git a/omlx/admin/templates/dashboard/_settings.html b/omlx/admin/templates/dashboard/_settings.html
index 87ba1d5ae..b192ca715 100644
--- a/omlx/admin/templates/dashboard/_settings.html
+++ b/omlx/admin/templates/dashboard/_settings.html
@@ -575,6 +575,83 @@ <h3 class="text-2xl font-bold tracking-tight text-neutral-900">{{ t('settings.gl
                             </div>
                         </div>
 
+                        <!-- Video Generation Section -->
+                        <div class="bg-white rounded-2xl border border-neutral-200 overflow-hidden">
+                            <div class="flex items-center justify-between px-6 py-4 bg-neutral-100 border-b border-neutral-200">
+                                <div class="flex items-center gap-3">
+                                    <i data-lucide="clapperboard" class="w-4 h-4 text-neutral-500"></i>
+                                    <h3 class="text-sm font-bold uppercase tracking-wider text-neutral-500">{{ t('settings.video.title') }}</h3>
+                                </div>
+                            </div>
+                            <div class="divide-y divide-neutral-100">
+                                <div class="flex flex-col sm:flex-row sm:items-center sm:justify-between gap-2 px-4 sm:px-6 py-4">
+                                    <div>
+                                        <label class="text-sm text-neutral-700">{{ t('settings.video.enabled') }}</label>
+                                        <p class="text-xs text-neutral-400 mt-0.5">{{ t('settings.video.enabled_desc') }}</p>
+                                    </div>
+                                    <button @click="globalSettings.video.enabled = !globalSettings.video.enabled; saveGlobalSettings()"
+                                            :class="globalSettings.video.enabled ? 'bg-neutral-900' : 'bg-neutral-200'"
+                                            class="relative w-11 h-6 rounded-full transition-colors duration-300 focus:outline-none focus:ring-2 focus:ring-offset-2 focus:ring-black">
+                                        <span :class="globalSettings.video.enabled ? 'translate-x-5' : 'translate-x-0'"
+                                              class="block w-5 h-5 bg-white rounded-full shadow-sm transform transition-transform duration-300 absolute top-0.5 left-0.5"></span>
+                                    </button>
+                                </div>
+                                <div class="flex flex-col sm:flex-row sm:items-center sm:justify-between gap-2 px-4 sm:px-6 py-4">
+                                    <div>
+                                        <label class="text-sm text-neutral-700">{{ t('settings.video.memory_lease') }}</label>
+                                        <p class="text-xs text-neutral-400 mt-0.5">{{ t('settings.video.memory_lease_desc') }}</p>
+                                    </div>
+                                    <input type="number" min="1" x-model.number="globalSettings.video.memory_lease_gb"
+                                           class="w-full sm:w-48 px-3 py-2 text-sm text-right border border-neutral-200 rounded-lg focus:ring-2 focus:ring-neutral-900 focus:border-transparent transition-all">
+                                </div>
+                                <div class="flex flex-col sm:flex-row sm:items-center sm:justify-between gap-2 px-4 sm:px-6 py-4">
+                                    <div>
+                                        <label class="text-sm text-neutral-700">{{ t('settings.video.default_steps') }}</label>
+                                    </div>
+                                    <input type="number" min="1" x-model.number="globalSettings.video.default_steps"
+                                           class="w-full sm:w-48 px-3 py-2 text-sm text-right border border-neutral-200 rounded-lg focus:ring-2 focus:ring-neutral-900 focus:border-transparent transition-all">
+                                </div>
+                                <div class="flex flex-col sm:flex-row sm:items-center sm:justify-between gap-2 px-4 sm:px-6 py-4">
+                                    <div>
+                                        <label class="text-sm text-neutral-700">{{ t('settings.video.default_fps') }}</label>
+                                    </div>
+                                    <input type="number" min="1" x-model.number="globalSettings.video.default_fps"
+                                           class="w-full sm:w-48 px-3 py-2 text-sm text-right border border-neutral-200 rounded-lg focus:ring-2 focus:ring-neutral-900 focus:border-transparent transition-all">
+                                </div>
+                                <div class="flex flex-col sm:flex-row sm:items-center sm:justify-between gap-2 px-4 sm:px-6 py-4">
+                                    <div>
+                                        <label class="text-sm text-neutral-700">{{ t('settings.video.max_queued_jobs') }}</label>
+                                    </div>
+                                    <input type="number" min="1" x-model.number="globalSettings.video.max_queued_jobs"
+                                           class="w-full sm:w-48 px-3 py-2 text-sm text-right border border-neutral-200 rounded-lg focus:ring-2 focus:ring-neutral-900 focus:border-transparent transition-all">
+                                </div>
+                                <div class="flex flex-col sm:flex-row sm:items-center sm:justify-between gap-2 px-4 sm:px-6 py-4">
+                                    <div>
+                                        <label class="text-sm text-neutral-700">{{ t('settings.video.job_timeout') }}</label>
+                                    </div>
+                                    <input type="number" min="60" x-model.number="globalSettings.video.job_timeout_seconds"
+                                           class="w-full sm:w-48 px-3 py-2 text-sm text-right border border-neutral-200 rounded-lg focus:ring-2 focus:ring-neutral-900 focus:border-transparent transition-all">
+                                </div>
+                                <div class="flex flex-col sm:flex-row sm:items-center sm:justify-between gap-2 px-4 sm:px-6 py-4">
+                                    <div>
+                                        <label class="text-sm text-neutral-700">{{ t('settings.video.artifacts_max_gb') }}</label>
+                                        <p class="text-xs text-neutral-400 mt-0.5">{{ t('settings.video.artifacts_max_gb_desc') }}</p>
+                                    </div>
+                                    <input type="number" min="1" x-model.number="globalSettings.video.artifacts_max_gb"
+                                           class="w-full sm:w-48 px-3 py-2 text-sm text-right border border-neutral-200 rounded-lg focus:ring-2 focus:ring-neutral-900 focus:border-transparent transition-all">
+                                </div>
+                                <div class="flex flex-col sm:flex-row sm:items-center sm:justify-between gap-2 px-4 sm:px-6 py-4">
+                                    <div>
+                                        <label class="text-sm text-neutral-700">{{ t('settings.video.worker_python') }}</label>
+                                        <p class="text-xs text-neutral-400 mt-0.5">{{ t('settings.video.worker_python_desc') }}</p>
+                                    </div>
+                                    <input type="text" x-model="globalSettings.video.worker_python"
+                                           placeholder="{base_path}/venvs/video/bin/python"
+                                           class="w-full sm:w-64 px-3 py-2 text-sm text-right border border-neutral-200 rounded-lg focus:ring-2 focus:ring-neutral-900 focus:border-transparent transition-all font-mono text-xs">
+                                </div>
+                            </div>
+                        </div>
+
                         <!-- Sampling Defaults Section -->
                         <div class="bg-white rounded-2xl border border-neutral-200 overflow-hidden">
                             <div class="flex items-center justify-between px-6 py-4 bg-neutral-100 border-b border-neutral-200">
diff --git a/omlx/api/openai_models.py b/omlx/api/openai_models.py
index 900d3a50f..3e32003a1 100644
--- a/omlx/api/openai_models.py
+++ b/omlx/api/openai_models.py
@@ -413,6 +413,10 @@ class ModelInfo(BaseModel):
     object: str = "model"
     created: int = Field(default_factory=get_unix_timestamp)
     owned_by: str = "omlx"
+    # fmlx extension (additive; OpenAI clients ignore unknown fields).
+    # Lets clients filter non-chat models (video/embedding/audio) out of
+    # chat pickers; the CLI's llm/vlm filter consumes it.
+    model_type: str = "llm"
 
 
 class ModelsResponse(BaseModel):
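
A client-side sketch of the filtering that the model_type field added to ModelInfo is meant to enable. The payload is made up for illustration; only the model_type key and its "llm" default come from the diff, and chat_capable is a hypothetical helper.

```python
CHAT_TYPES = {"llm", "vlm"}


def chat_capable(models):
    """Return ids of models usable in a chat picker.

    Entries without model_type are treated as "llm", matching the field default.
    """
    return [m["id"] for m in models if m.get("model_type", "llm") in CHAT_TYPES]


# Hypothetical /v1/models payload.
models_response = {
    "object": "list",
    "data": [
        {"id": "example-chat", "object": "model", "model_type": "llm"},
        {"id": "example-vision", "object": "model", "model_type": "vlm"},
        {"id": "example-video", "object": "model", "model_type": "video"},
        {"id": "example-legacy", "object": "model"},
    ],
}
print(chat_capable(models_response["data"]))
```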
diff --git a/omlx/api/video_models.py b/omlx/api/video_models.py
new file mode 100644
index 000000000..e64cd3443
--- /dev/null
+++ b/omlx/api/video_models.py
@@ -0,0 +1,40 @@
+# SPDX-License-Identifier: Apache-2.0
+"""Request/response models for the /v1/videos API.
+
+POST /v1/videos accepts BOTH application/json and multipart/form-data --
+the official openai SDK sends multipart (all fields as strings), so the
+route normalizes either body into VideoCreateParams here. Pydantic v2 lax
+coercion handles the string-to-number conversion ("4" -> 4).
+Design: docs/video-generation-engine-spec.md section 4.3.
+"""
+
+from __future__ import annotations
+
+from typing import Optional
+
+from pydantic import BaseModel, Field
+
+
+class VideoCreateParams(BaseModel):
+    """Normalized create-video parameters (JSON or multipart source).
+
+    OpenAI-compatible core: model, prompt, size ("WxH"), seconds (the SDK
+    sends string literals like "4"). fmlx extensions: negative_prompt,
+    frames/steps/fps/seed/guidance/guidance_2. Extension collision policy:
+    if OpenAI later claims an extension name, fmlx semantics yield and the
+    extension moves to an fmlx_ prefix (spec 4.3).
+    """
+
+    model: str
+    prompt: str = Field(min_length=1)
+    size: Optional[str] = None  # "WxH", e.g. "480x272"
+    seconds: Optional[float] = None
+    negative_prompt: Optional[str] = None
+    width: Optional[int] = None  # Explicit override beats size
+    height: Optional[int] = None
+    frames: Optional[int] = None  # Explicit override beats seconds*fps
+    steps: Optional[int] = Field(default=None, ge=1)  # reject zero/negative
+    fps: Optional[int] = Field(default=None, ge=1)  # reject zero/negative
+    seed: Optional[int] = None
+    guidance: Optional[float] = None
+    guidance_2: Optional[float] = None
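
The override comments on width/height and frames can be sketched as below. The rules mirror what omlx/api/video_routes.py does with these fields (480x272 fallback, dimensions rounded up to a multiple of 16, 3-second default, frames snapped up to 4n+1); the function names are hypothetical, and the fps default of 16 is taken from the dashboard defaults.

```python
import math


def round_up(value, multiple):
    """Round value up to the next multiple."""
    return ((value + multiple - 1) // multiple) * multiple


def resolve_dimensions(size=None, width=None, height=None):
    """Explicit width/height win over size; fall back to 480x272; snap to 16."""
    if (width is None or height is None) and size:
        w_str, h_str = size.lower().split("x", 1)
        width = width or int(w_str)
        height = height or int(h_str)
    return round_up(width or 480, 16), round_up(height or 272, 16)


def resolve_frames(frames=None, seconds=None, fps=16):
    """Explicit frames win over seconds*fps; the result is snapped up to 4n+1."""
    if frames is None:
        frames = int(round((3.0 if seconds is None else seconds) * fps))
    return max(5, 4 * math.ceil((frames - 1) / 4) + 1)


print(resolve_dimensions(size="500x300"))
print(resolve_dimensions(size="500x300", width=640))
print(resolve_frames(seconds=4))
print(resolve_frames(frames=50, seconds=4))
```

Because both snaps round upward, the job that actually runs can be slightly larger than what the client asked for.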
diff --git a/omlx/api/video_routes.py b/omlx/api/video_routes.py
new file mode 100644
index 000000000..a9a56a77f
--- /dev/null
+++ b/omlx/api/video_routes.py
@@ -0,0 +1,331 @@
+# SPDX-License-Identifier: Apache-2.0
+"""/v1/videos -- OpenAI-style async video generation job API.
+
+Endpoints (design: docs/video-generation-engine-spec.md section 4.3):
+- POST   /v1/videos               submit, returns job object immediately
+- GET    /v1/videos               cursor-paginated list
+- GET    /v1/videos/{id}          poll job object
+- GET    /v1/videos/{id}/content  download the mp4 (Range supported)
+- DELETE /v1/videos/{id}          cancel/delete job + artifacts
+
+The router is mounted UNCONDITIONALLY at import time (settings are not
+initialized yet at that point); all gating happens per-request:
+settings.video.enabled off -> 503, manager missing -> 503, worker venv
+unusable -> 503 with install guidance.
+"""
+
+from __future__ import annotations
+
+import logging
+import math
+import random
+import uuid
+from pathlib import Path
+from typing import Any
+
+from fastapi import APIRouter, HTTPException, Request
+from fastapi.responses import FileResponse
+from pydantic import ValidationError
+
+from .video_models import VideoCreateParams
+
+logger = logging.getLogger(__name__)
+
+router = APIRouter()
+
+GB = 1024**3
+
+# Per-request peak predictor, calibrated from the P0 low-RAM measurement
+# matrix on m5max (2026-06-11, mlx-gen==0.18.14 lock; recalibrate on every
+# lock bump, spec 9.1). Empirical findings: peak scales with PER-FRAME
+# spatial latent tokens (W/16 * H/16) and is invariant to frame count and
+# step count (measured: 480x272 49f==101f within 0.2GB; 20 vs 40 steps
+# byte-identical). Low-RAM mode (the worker default): 510 tok -> 18.83GB,
+# 1560 tok -> 21.88GB => COEF 0.0029 GB/token, BASE 17.5 (fit: ~17.35).
+# Margin pads the worst observed sub-poll transient (5.29GB per 0.5s).
+_PEAK_BASE_GB = 17.5
+_PEAK_COEF_GB_PER_SPATIAL_TOKEN = 0.0029
+_PEAK_MARGIN_GB = 6.0
+
+
+def _get_video_manager():
+    """Active VideoJobManager from server state (test-patchable)."""
+    from omlx.server import _server_state
+
+    settings = getattr(_server_state, "global_settings", None)
+    video_settings = getattr(settings, "video", None) if settings else None
+    if video_settings is None or not video_settings.enabled:
+        raise HTTPException(
+            status_code=503,
+            detail=(
+                "Video generation is disabled. Enable settings.video.enabled "
+                "and configure the worker venv "
+                "(docs/video-generation-engine-spec.md)."
+            ),
+        )
+    manager = getattr(_server_state, "video_job_manager", None)
+    if manager is None:
+        raise HTTPException(
+            status_code=503, detail="Video job manager not initialized"
+        )
+    return manager
+
+
+def _get_engine_pool():
+    from omlx.server import _server_state
+
+    pool = _server_state.engine_pool
+    if pool is None:
+        raise HTTPException(status_code=503, detail="Server not initialized")
+    return pool
+
+
+def _resolve_model(model_id: str) -> str:
+    from omlx.server import resolve_model_id
+
+    return resolve_model_id(model_id) or model_id
+
+
+def _record_video_request(model_id: str) -> None:
+    """Record request count without treating anything as tokens."""
+    try:
+        from omlx.server import get_server_metrics
+
+        get_server_metrics().record_request_complete(
+            prompt_tokens=0,
+            completion_tokens=0,
+            cached_tokens=0,
+            model_id=model_id,
+        )
+    except Exception as exc:  # noqa: BLE001
+        logger.warning("Failed to record video metrics for %s: %s", model_id, exc)
+
+
+def _round_up(value: int, multiple: int) -> int:
+    return ((value + multiple - 1) // multiple) * multiple
+
+
+def _normalize_params(
+    params: VideoCreateParams, video_settings: Any
+) -> dict[str, Any]:
+    """Apply defaults, dimension rules (W/H multiples of 16, frames 4n+1)
+    and UX caps. Raises HTTPException 400 on violations, 413 if over lease."""
+    width = params.width
+    height = params.height
+    if (width is None or height is None) and params.size:
+        try:
+            w_str, h_str = params.size.lower().split("x", 1)
+            width = width or int(w_str)
+            height = height or int(h_str)
+        except ValueError:
+            raise HTTPException(
+                status_code=400,
+                detail=f"Invalid size '{params.size}', expected 'WxH'",
+            )
+    width = width or 480
+    height = height or 272
+    if width <= 0 or height <= 0:
+        raise HTTPException(status_code=400, detail="size must be positive")
+    width = _round_up(width, 16)
+    height = _round_up(height, 16)
+
+    fps = params.fps or int(video_settings.default_fps)
+    steps = params.steps or int(video_settings.default_steps)
+
+    frames = params.frames
+    if frames is None:
+        seconds = params.seconds if params.seconds is not None else 3.0
+        if seconds <= 0:
+            raise HTTPException(status_code=400, detail="seconds must be positive")
+        frames = int(round(seconds * fps))
+    # Wan requires 4n+1 frames
+    frames = max(5, 4 * math.ceil((frames - 1) / 4) + 1)
+
+    if frames > int(video_settings.max_frames):
+        raise HTTPException(
+            status_code=400,
+            detail=f"frames {frames} exceeds max_frames "
+                   f"{video_settings.max_frames}",
+        )
+    if steps > int(video_settings.max_steps):
+        raise HTTPException(
+            status_code=400,
+            detail=f"steps {steps} exceeds max_steps {video_settings.max_steps}",
+        )
+    if width * height > int(video_settings.max_pixels_per_frame):
+        raise HTTPException(
+            status_code=400,
+            detail=f"{width}x{height} exceeds max_pixels_per_frame "
+                   f"{video_settings.max_pixels_per_frame}",
+        )
+
+    # Memory bound: predicted peak must fit the lease (spec 4.3/4.4). The
+    # static caps above are UX bounds only. Peak is frame-count-invariant
+    # (P0 measured), so only per-frame spatial tokens enter the formula.
+    spatial_tokens = (width / 16) * (height / 16)
+    predicted_gb = _PEAK_BASE_GB + _PEAK_COEF_GB_PER_SPATIAL_TOKEN * spatial_tokens
+    lease_gb = float(video_settings.memory_lease_gb)
+    if predicted_gb + _PEAK_MARGIN_GB > lease_gb:
+        raise HTTPException(
+            status_code=413,
+            detail=(
+                f"Predicted memory peak {predicted_gb:.1f}GB (+{_PEAK_MARGIN_GB}GB "
+                f"margin) exceeds video.memory_lease_gb {lease_gb:.0f}GB. "
+                "Reduce resolution or raise the lease."
+            ),
+        )
+
+    seed = params.seed if params.seed is not None else random.randint(0, 2**31 - 1)
+    normalized: dict[str, Any] = {
+        "prompt": params.prompt,
+        "width": width,
+        "height": height,
+        "frames": frames,
+        "steps": steps,
+        "fps": fps,
+        "seed": int(seed),
+        "seconds": round(frames / fps, 2),
+    }
+    if params.negative_prompt:
+        normalized["negative_prompt"] = params.negative_prompt
+    if params.guidance is not None:
+        normalized["guidance"] = float(params.guidance)
+    if params.guidance_2 is not None:
+        normalized["guidance_2"] = float(params.guidance_2)
+    return normalized
+
+
+async def _parse_create_body(request: Request) -> VideoCreateParams:
+    """Accept JSON or multipart (openai SDK sends multipart, all-string
+    fields; pydantic lax coercion converts them)."""
+    content_type = (request.headers.get("content-type") or "").lower()
+    try:
+        if "multipart/form-data" in content_type:
+            form = await request.form()
+            data = {k: v for k, v in form.items() if isinstance(v, str)}
+        else:
+            data = await request.json()
+    except Exception:
+        raise HTTPException(status_code=400, detail="Malformed request body")
+    try:
+        return VideoCreateParams.model_validate(data)
+    except ValidationError as e:
+        raise HTTPException(status_code=400, detail=str(e))
+
+
+@router.post("/v1/videos")
+async def create_video(request: Request):
+    manager = _get_video_manager()
+    params = await _parse_create_body(request)
+
+    pool = _get_engine_pool()
+    resolved = _resolve_model(params.model)
+    entry = pool.get_entry(resolved) if hasattr(pool, "get_entry") else None
+    if entry is None:
+        raise HTTPException(
+            status_code=404,
+            detail=f"Model '{params.model}' not found",
+        )
+    if getattr(entry, "model_type", "") != "video":
+        raise HTTPException(
+            status_code=400,
+            detail=(
+                f"Model '{params.model}' is not a video generation model "
+                f"(model_type={getattr(entry, 'model_type', '?')})"
+            ),
+        )
+
+    ok, reason = manager.guard_available()
+    if not ok:
+        raise HTTPException(status_code=503, detail=reason)
+    venv_ok, venv_reason = await manager.probe_worker_venv()
+    if not venv_ok:
+        raise HTTPException(status_code=503, detail=venv_reason)
+
+    from omlx.server import _server_state
+
+    video_settings = _server_state.global_settings.video
+    normalized = _normalize_params(params, video_settings)
+
+    from omlx.video.manager import QueueFullError, VideoJob
+
+    job = VideoJob(
+        id=f"video_{uuid.uuid4().hex}",
+        model_id=resolved,
+        model_dir=str(entry.model_path),
+        params=normalized,
+    )
+    try:
+        await manager.submit(job)
+    except QueueFullError as e:
+        raise HTTPException(status_code=503, detail=str(e))
+    _record_video_request(resolved)
+    return job.to_dict()
+
+
+@router.get("/v1/videos")
+async def list_videos(
+    limit: int = 20, after: str | None = None, order: str = "desc"
+):
+    manager = _get_video_manager()
+    limit = max(1, min(int(limit), 100))
+    if order not in ("asc", "desc"):
+        raise HTTPException(status_code=400, detail="order must be asc|desc")
+    page, has_more = manager.list_jobs(limit=limit, after=after, order=order)
+    data = [j.to_dict() for j in page]
+    return {
+        "object": "list",
+        "data": data,
+        "has_more": has_more,
+        "first_id": data[0]["id"] if data else None,
+        "last_id": data[-1]["id"] if data else None,
+    }
+
+
+@router.get("/v1/videos/{video_id}")
+async def get_video(video_id: str):
+    manager = _get_video_manager()
+    job = manager.get(video_id)
+    if job is None:
+        raise HTTPException(status_code=404, detail=f"Video '{video_id}' not found")
+    return job.to_dict()
+
+
+@router.get("/v1/videos/{video_id}/content")
+async def get_video_content(video_id: str):
+    manager = _get_video_manager()
+    job = manager.get(video_id)
+    if job is None:
+        raise HTTPException(status_code=404, detail=f"Video '{video_id}' not found")
+    if job.status != "completed":
+        raise HTTPException(
+            status_code=409,
+            detail=f"Video '{video_id}' is {job.status}, content not available",
+        )
+    if not job.artifact_path or not Path(job.artifact_path).exists():
+        # Artifact purged by retention (spec 4.3): record outlives the blob.
+        raise HTTPException(
+            status_code=404,
+            detail={
+                "code": "artifact_expired",
+                "message": (
+                    f"The artifact for '{video_id}' was purged by the "
+                    "retention policy"
+                ),
+                "expires_at": int(job.expires_at) if job.expires_at else None,
+            },
+        )
+    return FileResponse(
+        job.artifact_path,
+        media_type="video/mp4",
+        filename=f"{video_id}.mp4",
+    )
+
+
+@router.delete("/v1/videos/{video_id}")
+async def delete_video(video_id: str):
+    manager = _get_video_manager()
+    deleted = await manager.delete(video_id)
+    if not deleted:
+        raise HTTPException(status_code=404, detail=f"Video '{video_id}' not found")
+    return {"id": video_id, "object": "video.deleted", "deleted": True}
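Reviewer note on the admission check in `_normalize_params`: the rule is a linear peak model in the number of 16x16 spatial tokens, compared against the configured lease after adding a fixed margin. A minimal sketch of that rule follows. The three constants are placeholder values chosen for illustration (the patch references `_PEAK_BASE_GB`, `_PEAK_COEF_GB_PER_SPATIAL_TOKEN` and `_PEAK_MARGIN_GB`, but their values are not visible in this section), and `fits_lease` is a hypothetical helper name.

```python
# Placeholder constants, NOT the values used by the patch.
PEAK_BASE_GB = 30.0
PEAK_COEF_GB_PER_SPATIAL_TOKEN = 0.002
PEAK_MARGIN_GB = 2.0


def predicted_peak_gb(width: int, height: int) -> float:
    """Linear peak model: base cost plus a per-spatial-token cost."""
    spatial_tokens = (width / 16) * (height / 16)
    return PEAK_BASE_GB + PEAK_COEF_GB_PER_SPATIAL_TOKEN * spatial_tokens


def fits_lease(width: int, height: int, lease_gb: float) -> bool:
    """True when predicted peak plus margin stays within the lease.

    The route raises HTTP 413 in the opposite case.
    """
    return predicted_peak_gb(width, height) + PEAK_MARGIN_GB <= lease_gb
```

With these placeholder numbers a 1280x720 request has 80 * 45 = 3600 spatial tokens, so whether it is admitted depends only on the lease; the frame count does not enter the model as written.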
diff --git a/omlx/cli.py b/omlx/cli.py
index 75669ff6b..e4d5aea5d 100644
--- a/omlx/cli.py
+++ b/omlx/cli.py
@@ -116,6 +116,14 @@ def serve_command(args):
     if settings.huggingface.endpoint:
         os.environ["HF_ENDPOINT"] = settings.huggingface.endpoint
 
+    # Disable the Xet transfer backend if configured. huggingface_hub reads
+    # HF_HUB_DISABLE_XET into constants at import time, so it must be set
+    # here -- before any huggingface_hub import -- and cannot be toggled
+    # per-download. Xet (cas-bridge.xethub.hf.co) is unreachable from some
+    # networks (observed: mainland China); the plain LFS path works.
+    if settings.huggingface.disable_xet:
+        os.environ["HF_HUB_DISABLE_XET"] = "1"
+
     # Apply ModelScope endpoint if configured. The modelscope SDK builds its URL
     # as https://<MODELSCOPE_DOMAIN>, so this must be a BARE host -- a full URL
     # like "https://modelscope.cn" becomes "https://https://modelscope.cn" and
diff --git a/omlx/engine_pool.py b/omlx/engine_pool.py
index 02719a633..ef89bde74 100644
--- a/omlx/engine_pool.py
+++ b/omlx/engine_pool.py
@@ -38,6 +38,7 @@
     ModelLoadingError,
     ModelNotFoundError,
     ModelTooLargeError,
+    ModelTypeNotLoadableError,
 )
 from .model_discovery import DiscoveredModel, discover_models, format_size
 from .engine_core import get_mlx_executor
@@ -53,8 +54,8 @@ class EngineEntry:
 
     model_id: str  # Directory name (e.g., "llama-3b")
     model_path: str  # Full path to model directory
-    model_type: Literal["llm", "vlm", "embedding", "reranker", "audio_stt", "audio_tts", "audio_sts"]  # Model type
-    engine_type: Literal["batched", "simple", "embedding", "reranker", "vlm", "audio_stt", "audio_tts", "audio_sts"]  # Engine type to use
+    model_type: Literal["llm", "vlm", "embedding", "reranker", "audio_stt", "audio_tts", "audio_sts", "video"]  # Model type
+    engine_type: Literal["batched", "simple", "embedding", "reranker", "vlm", "audio_stt", "audio_tts", "audio_sts", "video"]  # Engine type to use
     estimated_size: int  # Pre-calculated from safetensors (bytes)
     config_model_type: str = ""  # Raw model_type from config.json (e.g., "deepseekocr_2")
     thinking_default: bool | None = None  # True if model thinks by default, False if not, None if unknown
@@ -208,6 +209,7 @@ def discover_models(
         "audio_stt": "audio_stt",
         "audio_tts": "audio_tts",
         "audio_sts": "audio_sts",
+        "video": "video",
     }
 
     def apply_settings_overrides(
@@ -336,6 +338,14 @@ async def get_engine(
             if not entry:
                 raise ModelNotFoundError(model_id, list(self._entries.keys()))
 
+            # Video models are job-managed (POST /v1/videos) and never
+            # pool-loaded. Reject BEFORE the admission loop below -- letting
+            # a 42GB video entry into admission would evict resident LLM
+            # engines before failing (docs/video-generation-engine-spec.md
+            # section 3).
+            if entry.model_type == "video":
+                raise ModelTypeNotLoadableError(model_id, entry.model_type)
+
             # Already loaded - just update access time
             if entry.engine is not None:
                 # If force_lm requested but current engine is VLM, unload and reload
@@ -661,6 +671,11 @@ async def _load_engine(self, model_id: str, force_lm: bool = False) -> None:
                         model_name=entry.model_path,
                         config_model_type=entry.config_model_type,
                     )
+                elif entry.engine_type == "video":
+                    # Defense in depth: get_engine rejects video entries
+                    # before admission; this arm catches any other caller
+                    # so a diffusers dir never falls into BatchedEngine.
+                    raise ModelTypeNotLoadableError(model_id, entry.model_type)
                 else:
                     engine = BatchedEngine(
                         model_name=entry.model_path,
diff --git a/omlx/exceptions.py b/omlx/exceptions.py
index 71aeea083..160e8e1c8 100644
--- a/omlx/exceptions.py
+++ b/omlx/exceptions.py
@@ -426,6 +426,25 @@ def __init__(self, model_id: str):
         super().__init__(f"Model '{model_id}' is already being loaded")
 
 
+class ModelTypeNotLoadableError(EnginePoolError):
+    """Raised when a model type is not pool-loadable (e.g. video models).
+
+    Video generation models are job-managed by the VideoJobManager and are
+    never loaded into the engine pool. Raised by EnginePool.get_engine
+    BEFORE the memory-admission loop so a misrouted request cannot evict
+    resident LLM engines (docs/video-generation-engine-spec.md section 3).
+    The server layer maps this to HTTP 400 with an endpoint hint.
+    """
+
+    def __init__(self, model_id: str, model_type: str):
+        self.model_id = model_id
+        self.model_type = model_type
+        super().__init__(
+            f"Model '{model_id}' is a {model_type} generation model and "
+            "cannot be loaded as an inference engine. Use POST /v1/videos."
+        )
+
+
 # =============================================================================
 # MCP Errors
 # =============================================================================
diff --git a/omlx/model_discovery.py b/omlx/model_discovery.py
index 731ecf2f5..6637cbb3d 100644
--- a/omlx/model_discovery.py
+++ b/omlx/model_discovery.py
@@ -23,8 +23,15 @@
 
 logger = logging.getLogger(__name__)
 
-ModelType = Literal["llm", "vlm", "embedding", "reranker", "audio_stt", "audio_tts", "audio_sts"]
-EngineType = Literal["batched", "vlm", "embedding", "reranker", "audio_stt", "audio_tts", "audio_sts"]
+ModelType = Literal["llm", "vlm", "embedding", "reranker", "audio_stt", "audio_tts", "audio_sts", "video"]
+EngineType = Literal["batched", "vlm", "embedding", "reranker", "audio_stt", "audio_tts", "audio_sts", "video"]
+
+# Diffusers pipeline classes (model_index.json "_class_name") that fmlx can
+# serve via the video engine (docs/video-generation-engine-spec.md). Unknown
+# pipeline classes are skipped at discovery -- registering them would produce
+# unloadable entries, and historically the org-folder descent turned their
+# component subdirs into phantom "llm" models.
+VIDEO_PIPELINE_CLASSES = {"WanPipeline"}
 
 # Known VLM (Vision-Language Model) types from mlx-vlm
 VLM_MODEL_TYPES = {
@@ -401,6 +408,14 @@ def detect_model_type(model_path: Path) -> ModelType:
     Returns:
         Model type: "llm", "vlm", "embedding", "reranker", "audio_stt", "audio_tts", or "audio_sts"
     """
+    # Diffusers-layout video models: model_index.json at the root, no root
+    # config.json. Must run before the missing-config.json fallback below.
+    # Unknown pipeline classes never reach this point for registration --
+    # _register_model skips them outright.
+    pipeline_class = read_model_index_pipeline_class(model_path)
+    if pipeline_class in VIDEO_PIPELINE_CLASSES:
+        return "video"
+
     config_path = model_path / "config.json"
     if not config_path.exists():
         return "llm"
@@ -694,9 +709,35 @@ def _is_adapter_dir(path: Path) -> bool:
     return (path / "adapter_config.json").exists()
 
 
+def read_model_index_pipeline_class(path: Path) -> str | None:
+    """Return the "_class_name" from a diffusers model_index.json, else None.
+
+    Diffusers-layout models (e.g. Wan2.2 T2V) have model_index.json at the
+    root and no root config.json.
+    """
+    index_path = path / "model_index.json"
+    if not index_path.exists():
+        return None
+    try:
+        with open(index_path) as f:
+            value = json.load(f).get("_class_name")
+        return value if isinstance(value, str) else None
+    except (json.JSONDecodeError, OSError):
+        return None
+
+
 def _is_model_dir(path: Path) -> bool:
-    """Check if a directory contains a valid model (has config.json)."""
-    return (path / "config.json").exists() and not _is_adapter_dir(path)
+    """Check if a directory contains a valid model.
+
+    A model root has either config.json (transformers layout) or
+    model_index.json (diffusers layout). The model_index.json check must
+    live here -- it is what stops the org-folder descent in
+    discover_models() from registering diffusers component subdirs
+    (transformer/, vae/, ...) as phantom standalone models.
+    """
+    if _is_adapter_dir(path):
+        return False
+    return (path / "config.json").exists() or (path / "model_index.json").exists()
 
 
 def _resolve_hf_cache_entry(path: Path) -> tuple[Path, str] | None:
@@ -734,6 +775,24 @@ def _register_model(
             logger.info(f"Skipping unsupported model: {model_id}")
             return
 
+        # Diffusers-layout dirs whose pipeline class fmlx cannot serve are
+        # skipped outright -- registering them would produce unloadable
+        # entries (docs/video-generation-engine-spec.md section 4.1). This
+        # includes model_index.json files with a missing/unreadable
+        # _class_name (pipeline_class None): without a root config.json
+        # such a dir would otherwise register as an unloadable llm entry.
+        pipeline_class = read_model_index_pipeline_class(model_dir)
+        if (
+            (model_dir / "model_index.json").exists()
+            and not (model_dir / "config.json").exists()
+            and pipeline_class not in VIDEO_PIPELINE_CLASSES
+        ):
+            logger.warning(
+                "Skipping unsupported diffusers pipeline "
+                f"'{pipeline_class}': {model_id}"
+            )
+            return
+
         model_type = detect_model_type(model_dir)
         if model_type == "embedding":
             engine_type: EngineType = "embedding"
@@ -747,18 +806,25 @@ def _register_model(
             engine_type = "audio_tts"
         elif model_type == "audio_sts":
             engine_type = "audio_sts"
+        elif model_type == "video":
+            engine_type = "video"
         else:
             engine_type = "batched"
         estimated_size = estimate_model_size(model_dir)
 
-        # Read raw config model_type for sub-type detection (e.g., OCR models)
+        # Read raw config model_type for sub-type detection (e.g., OCR models).
+        # Video models have no root config.json; surface the diffusers
+        # pipeline class instead so the admin UI shows something meaningful.
         config_model_type = ""
-        try:
-            import json
-            with open(model_dir / "config.json") as f:
-                config_model_type = json.load(f).get("model_type", "")
-        except Exception:
-            pass
+        if model_type == "video":
+            config_model_type = pipeline_class or ""
+        else:
+            try:
+                import json
+                with open(model_dir / "config.json") as f:
+                    config_model_type = json.load(f).get("model_type", "")
+            except Exception:
+                pass
 
         thinking_default = detect_thinking_default(model_dir)
         preserve_thinking_default = detect_preserve_thinking(model_dir)
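Reviewer note on the discovery changes: the combined effect of `_is_model_dir`, `read_model_index_pipeline_class` and the skip in `_register_model` can be summarised as a three-way decision for a candidate directory. The sketch below is a simplified stand-in, assuming only the file layout described in the patch; `classify` and its return labels are hypothetical names, and adapter directories are ignored.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path

SUPPORTED_PIPELINES = {"WanPipeline"}


def pipeline_class(path: Path) -> str | None:
    """Return "_class_name" from model_index.json, or None if unreadable."""
    index_path = path / "model_index.json"
    if not index_path.exists():
        return None
    try:
        value = json.loads(index_path.read_text()).get("_class_name")
    except (json.JSONDecodeError, OSError):
        return None
    return value if isinstance(value, str) else None


def classify(path: Path) -> str:
    """Label a directory "video", "skip", "other" or "not_a_model"."""
    has_index = (path / "model_index.json").exists()
    has_config = (path / "config.json").exists()
    if not has_index and not has_config:
        return "not_a_model"
    if has_index and not has_config:
        # Diffusers layout: only known pipeline classes are registered.
        return "video" if pipeline_class(path) in SUPPORTED_PIPELINES else "skip"
    return "other"  # transformers layout, handled by the existing detection
```

A directory that has both files falls through to the existing config.json detection here, which matches the `not (model_dir / "config.json").exists()` condition in the skip.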
diff --git a/omlx/process_memory_enforcer.py b/omlx/process_memory_enforcer.py
index 7d11fee31..6b9cdc781 100644
--- a/omlx/process_memory_enforcer.py
+++ b/omlx/process_memory_enforcer.py
@@ -33,6 +33,7 @@
 import logging
 import subprocess
 import sys
+from collections import deque
 from typing import TYPE_CHECKING, Any
 
 import mlx.core as mx
@@ -291,6 +292,7 @@ def __init__(
         hard_threshold: float = 0.95,
         prefill_safe_zone_ratio: float = 0.80,
         prefill_min_chunk_tokens: int = 32,
+        prefill_transient_margin_gb: float = 0.0,
     ):
         """
         Initialize the process memory enforcer.
@@ -317,6 +319,11 @@ def __init__(
             prefill_safe_zone_ratio: Fraction of hard cap below which prefill
                 runs at full chunk size; above triggers adaptive shrink.
             prefill_min_chunk_tokens: Floor for adaptive shrink.
+            prefill_transient_margin_gb: Conservative margin added to the
+                modelled per-chunk prefill peak by the scheduler's
+                forward-front gate, covering the MoE expert-dequant activation
+                spike that estimate_prefill_peak_bytes does not model.
+                Propagated to each scheduler. 0 = no extra margin.
         """
         self._engine_pool = engine_pool
         self._memory_guard_tier = self._normalize_tier(memory_guard_tier)
@@ -331,6 +338,9 @@ def __init__(
         self._hard_threshold = hard_threshold
         self._prefill_safe_zone_ratio = prefill_safe_zone_ratio
         self._prefill_min_chunk_tokens = prefill_min_chunk_tokens
+        self._prefill_transient_margin_bytes = max(
+            0, int(prefill_transient_margin_gb * 1024**3)
+        )
         self._task: asyncio.Task | None = None
         self._running = False
         # Most recently observed pressure level, consumed by scheduler /
@@ -340,6 +350,21 @@ def __init__(
         # or the call failed). Used by the admin dashboard to surface a
         # warning when the kernel iogpu.wired_limit_mb is below this.
         self._metal_wired_limit_request: int = 0
+        # Rolling window of recent usage readings + their high-water mark.
+        # Prefill memory dips into a trough between chunks, so the instant
+        # reading can read low mid-prefill; preflight admission consults this
+        # peak instead so it does not wave through a request that will wall
+        # the next chunk. Updated on every poll iteration.
+        self._usage_window: deque[int] = deque(maxlen=5)
+        self._recent_peak_bytes: int = 0
+        # Video job memory lease (docs/video-generation-engine-spec.md 4.4).
+        # While held, the lease is subtracted from the final ceiling so pool
+        # admission, watermarks and the prefill gate all tighten coherently.
+        # The worker pid lets the dynamic ceiling count the subprocess
+        # exactly once (its real usage drains system free pages, which
+        # would otherwise stack on top of the explicit lease).
+        self._video_lease_bytes: int = 0
+        self._video_worker_pid: int | None = None
 
     @staticmethod
     def _normalize_tier(tier: str) -> str:
@@ -463,17 +488,31 @@ def _get_dynamic_ceiling(self) -> int:
         if self._memory_guard_tier == "custom":
             return max(0, self._memory_guard_custom_ceiling_bytes)
 
+        # Video worker correction: the worker's real usage drains system
+        # free pages, shrinking this ceiling -- but the lease is ALREADY
+        # subtracted in _get_hard_limit_bytes. Add the worker's footprint
+        # back (clamped to the lease) so it is counted exactly once. A
+        # footprint read of 0 (failure) degrades to double-counting, which
+        # is fail-conservative.
+        worker_extra = 0
+        if self._video_worker_pid is not None and self._video_lease_bytes > 0:
+            worker = get_phys_footprint(self._video_worker_pid)
+            if worker > 0:
+                worker_extra = min(worker, self._video_lease_bytes)
+
         omlx_usage = get_phys_footprint()
         stats = get_macos_vm_stats()
         if stats is None:
-            return max(0, omlx_usage + psutil.virtual_memory().available)
+            return max(
+                0, omlx_usage + worker_extra + psutil.virtual_memory().available
+            )
         ratio = _ACTIVE_RECLAIM_RATIO[self._memory_guard_tier]
         reclaimable = (
             stats["free"]
             + stats["inactive"]
             + int(stats["active"] * ratio)
         )
-        return max(0, omlx_usage + reclaimable)
+        return max(0, omlx_usage + worker_extra + reclaimable)
 
     def _get_hard_limit_bytes(self) -> int:
         """Final hard ceiling = min(static, dynamic, metal_cap).
@@ -497,12 +536,85 @@ def _get_hard_limit_bytes(self) -> int:
         metal_cap = get_effective_metal_cap_bytes()
         if metal_cap > 0:
             candidates.append(metal_cap)
-        return min(candidates)
+        ceiling = min(candidates)
+        if self._video_lease_bytes > 0:
+            # Clamp to >= 1, never 0: every consumer treats ceiling 0 as
+            # "guard disabled", which would drop all protection exactly
+            # while a video job holds memory. A 1-byte ceiling instead
+            # pauses admission and trips the gate -- the safe direction.
+            return max(1, ceiling - self._video_lease_bytes)
+        return ceiling
 
     def get_final_ceiling(self) -> int:
         """Public accessor used by engine_pool pre-load admission."""
         return self._get_hard_limit_bytes()
 
+    def recent_peak_bytes(self) -> int:
+        """Recent high-water memory usage over the last few poll ticks."""
+        return self._recent_peak_bytes
+
+    @property
+    def video_lease_bytes(self) -> int:
+        """Currently held video memory lease (0 when none)."""
+        return self._video_lease_bytes
+
+    def acquire_video_lease(self, lease_bytes: int) -> None:
+        """Reserve memory for a video worker job.
+
+        Subtracts the lease from the final ceiling (single choke point:
+        pool admission, soft/hard watermarks, admission_paused and the
+        prefill gate cap all derive from it) and lowers this process's
+        Metal wired limit so parent + worker wired sets cannot stack
+        toward the machine cap. One lease at a time -- the VideoJobManager
+        serializes jobs (docs/video-generation-engine-spec.md 4.4).
+
+        Raises:
+            RuntimeError: If a lease is already held.
+            ValueError: If lease_bytes is not positive.
+        """
+        if lease_bytes <= 0:
+            raise ValueError(f"lease_bytes must be positive, got {lease_bytes}")
+        if self._video_lease_bytes > 0:
+            raise RuntimeError(
+                "A video memory lease is already held "
+                f"({_format_gb(self._video_lease_bytes)})"
+            )
+        self._video_lease_bytes = int(lease_bytes)
+        if self._prefill_memory_guard:
+            target = max(1, self._get_static_ceiling() - self._video_lease_bytes)
+            _apply_metal_wired_limit(target)
+            self._metal_wired_limit_request = target
+        if self._running:
+            self._propagate_memory_limit()
+        logger.info(
+            "[videolease] acquired %s (ceiling now %s)",
+            _format_gb(self._video_lease_bytes),
+            _format_gb(self._get_hard_limit_bytes()),
+        )
+
+    def set_video_worker_pid(self, pid: int | None) -> None:
+        """Bind the running video worker pid for dynamic-ceiling correction."""
+        self._video_worker_pid = pid
+
+    def release_video_lease(self) -> None:
+        """Release the video memory lease and restore the Metal wired limit."""
+        if self._video_lease_bytes <= 0:
+            return
+        released = self._video_lease_bytes
+        self._video_lease_bytes = 0
+        self._video_worker_pid = None
+        if self._prefill_memory_guard:
+            static_ceiling = self._get_static_ceiling()
+            _apply_metal_wired_limit(static_ceiling)
+            self._metal_wired_limit_request = static_ceiling
+        if self._running:
+            self._propagate_memory_limit()
+        logger.info(
+            "[videolease] released %s (ceiling now %s)",
+            _format_gb(released),
+            _format_gb(self._get_hard_limit_bytes()),
+        )
+
     def _soft_bytes(self) -> int:
         """Soft watermark: ceiling * soft_threshold."""
         ceiling = self._get_hard_limit_bytes()
@@ -589,6 +701,10 @@ def _propagate_memory_limit(self) -> None:
                 scheduler._admission_paused = admission_paused
                 scheduler._prefill_safe_zone_ratio = self._prefill_safe_zone_ratio
                 scheduler._prefill_min_chunk_tokens = self._prefill_min_chunk_tokens
+                scheduler._prefill_transient_margin_bytes = (
+                    self._prefill_transient_margin_bytes
+                )
+                scheduler._memory_recent_peak_bytes = self._recent_peak_bytes
                 bg = getattr(scheduler, "batch_generator", None)
                 if bg is not None and hasattr(bg, "_memory_limit_bytes"):
                     bg._memory_limit_bytes = soft_limit
@@ -671,6 +787,8 @@ async def _check_and_enforce(self) -> None:
             return
 
         current = self._current_usage_bytes()
+        self._usage_window.append(current)
+        self._recent_peak_bytes = max(self._usage_window) if self._usage_window else current
         soft = int(ceiling * self._soft_threshold)
         hard = int(ceiling * self._hard_threshold)
         prev_level = self._pressure_level
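Reviewer note on the enforcer changes: two small mechanisms carry most of the behaviour, the rolling high-water mark over the last few poll readings and the lease subtraction that clamps the ceiling to at least 1 byte. The sketch below isolates both under simplifying assumptions; `PeakTracker` and `ceiling_with_lease` are hypothetical names, not part of the patch, and the real enforcer reads usage from the process rather than taking it as an argument.

```python
from collections import deque


class PeakTracker:
    """Rolling high-water mark over the last `window` usage readings."""

    def __init__(self, window: int = 5) -> None:
        self._window: deque[int] = deque(maxlen=window)
        self.recent_peak_bytes = 0

    def observe(self, current: int) -> int:
        # Old readings fall out automatically once maxlen is reached.
        self._window.append(current)
        self.recent_peak_bytes = max(self._window)
        return self.recent_peak_bytes


def ceiling_with_lease(ceiling: int, lease_bytes: int) -> int:
    """Subtract a held lease, never returning 0 (0 means "guard disabled")."""
    if lease_bytes > 0:
        return max(1, ceiling - lease_bytes)
    return ceiling
```

The clamp matters when the lease is larger than the computed ceiling: the result is 1 rather than 0, so consumers pause admission instead of treating the guard as switched off.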
diff --git a/omlx/scheduler.py b/omlx/scheduler.py
index 253845bf9..148986c12 100644
--- a/omlx/scheduler.py
+++ b/omlx/scheduler.py
@@ -796,10 +796,26 @@ def __init__(
         # soft_threshold. Schedulers stop admitting new prefills while this is
         # set; in-flight requests proceed.
         self._admission_paused: bool = False
+        # Recent high-water memory usage, propagated from ProcessMemoryEnforcer.
+        # Preflight admission maxes the instant reading against this so it does
+        # not wave through a request during a prefill trough that would wall
+        # the next chunk. 0 until the enforcer sets it.
+        self._memory_recent_peak_bytes: int = 0
         # Adaptive prefill throttle params, propagated from enforcer.
         # Until set, _adaptive_chunk_size is a no-op (returns requested as-is).
         self._prefill_safe_zone_ratio: float = 0.80
         self._prefill_min_chunk_tokens: int = 32
+        # Conservative transient margin (bytes) added to the modelled per-chunk
+        # prefill peak by the forward-front gate (_prefill_forward_gate).
+        # estimate_prefill_peak_bytes only models KV + SDPA; it does NOT model
+        # the MoE expert-dequant activation spike, which on glm4.5-air-106b
+        # (MoE) is the dominant single-step transient. Sized from the observed
+        # worst-case single-step current jump on m5max (see MemorySettings
+        # .prefill_transient_margin_gb). 0 until the enforcer propagates it.
+        self._prefill_transient_margin_bytes: int = 0
+        # One-shot guard for _log_prefill_gate_state_once (loud resolved-state
+        # log so a mis-propagated margin can't ship silently inert again).
+        self._prefill_gate_state_logged: bool = False
         # EWMA estimator of per-token chunk transient bytes, used by
         # _adaptive_chunk_size in the caution zone. Owned per-scheduler.
         _tracker_model_id = ""
@@ -1729,6 +1745,153 @@ def _apply_turboquant_kv_convert(self, prompt_cache: list[Any]) -> None:
                 f"cache layers to {bits}-bit{skip_msg}"
             )
 
+    def _prefill_forward_gate(
+        self,
+        chunk_tokens: int,
+        *,
+        request_id: str,
+        loop_label: str,
+    ) -> None:
+        """Forward-FRONT memory gate: refuse a prefill chunk BEFORE it runs.
+
+        PRIMARY protection against a single request's prefill transient breaching
+        the Metal cap and kernel-panicking the box. The legacy chunk-END check
+        (after self.model(...) + mx.eval) only fires once the transient has
+        already been allocated -- on Apple Silicon a chunk that overshoots the
+        cap panics the whole machine, so the after-the-fact check never runs.
+        This predicts the next chunk's peak and raises BEFORE the forward when it
+        would exceed the cap; the call-site handler in _schedule_waiting catches
+        the RuntimeError, _sync_and_clear_cache()s the accumulated KV, and emits
+        a finish_reason="error" output instead of crashing.
+
+        predicted_peak = current(phys high-water) + estimate(optional) + margin
+          - current: max(active, phys_footprint, recent_peak) -- ALL three are
+            LIVE production signals (the same readings [memcheck:external] and
+            the enforcer use). recent_peak is the enforcer's rolling high-water,
+            so a mid-prefill trough in the instant reading does not mask the
+            real footprint.
+          - estimate: OPTIONAL. memory_monitor.estimate_prefill_peak_bytes models
+            this chunk's KV + SDPA. In production scheduler.memory_monitor is
+            never wired (see estimate-guards-inert finding), so estimate is 0 and
+            the gate is phys+margin only -- which is correct, because at chunk
+            granularity the KV+SDPA term is tiny and the margin dominates anyway.
+          - margin: _prefill_transient_margin_bytes is the real safety mechanism.
+            It covers the un-modelled MoE expert-dequant spike (the dominant
+            single-step transient on glm4.5-air). CRITICALLY it is propagated from
+            the ENFORCER (live), not the memory_monitor (dead), so unlike the
+            estimate this gate's safety actually fires in production. Sized so
+            margin >= worst-case single-step transient (see settings).
+
+        Trip point: with estimate~0, the gate fires once current > cap - margin.
+        Functional residual: on a model that fills most of the cap (glm4.5-air,
+        85GB on 128GB), a long prompt is refused cleanly (503-class) once its
+        accumulated KV approaches the headroom -- the correct behaviour (refuse
+        the request, do not crash the box); fit a longer context by using a
+        smaller quant. See MemorySettings.prefill_transient_margin_gb.
+
+        No-op (returns) only when the guard is off, the hard limit is unset, or
+        chunk_tokens <= 0. NOT a no-op when the monitor/estimate is missing --
+        that is the whole point of being phys-based.
+
+        Raises:
+            RuntimeError: when the predicted peak exceeds the hard limit.
+        """
+        if not self._prefill_memory_guard:
+            return
+        if self._memory_hard_limit_bytes <= 0:
+            return
+        if chunk_tokens <= 0:
+            return
+
+        # Emit the resolved gate state ONCE, before it can matter, so a
+        # mis-propagated margin (the exact silent failure that made the prior
+        # monitor-based gate inert) is visible in the log instead of shipping
+        # blind. See _log_prefill_gate_state_once.
+        self._log_prefill_gate_state_once()
+
+        # Estimate is OPTIONAL (monitor is unwired in production). Phys reading
+        # + the enforcer-propagated margin carry the guarantee.
+        estimate = 0
+        if self.memory_monitor is not None:
+            estimate = self.memory_monitor.estimate_prefill_peak_bytes(
+                chunk_tokens, self.config.prefill_step_size
+            )
+
+        predicted_transient = estimate + self._prefill_transient_margin_bytes
+        current = max(
+            mx.get_active_memory(),
+            get_phys_footprint(),
+            self._memory_recent_peak_bytes,
+        )
+        predicted_peak = current + predicted_transient
+
+        if predicted_peak > self._memory_hard_limit_bytes:
+            logger.warning(
+                "[memgate:%s] rid=%s refusing prefill chunk (n=%d) BEFORE "
+                "forward: predicted peak %.3fGB = current %.3fGB + transient "
+                "%.3fGB (estimate %.3fGB + margin %.3fGB) exceeds hard cap "
+                "%.3fGB. Aborting request to avoid a Metal-cap kernel panic.",
+                loop_label,
+                request_id,
+                chunk_tokens,
+                predicted_peak / 1024**3,
+                current / 1024**3,
+                predicted_transient / 1024**3,
+                estimate / 1024**3,
+                self._prefill_transient_margin_bytes / 1024**3,
+                self._memory_hard_limit_bytes / 1024**3,
+            )
+            raise RuntimeError(
+                "Prefill refused before forward: predicted peak "
+                f"{predicted_peak / 1024**3:.1f}GB (current "
+                f"{current / 1024**3:.1f}GB + transient "
+                f"{predicted_transient / 1024**3:.1f}GB) would exceed the "
+                f"memory ceiling {self._memory_hard_limit_bytes / 1024**3:.1f}GB. "
+                "Reduce context length or increase --max-process-memory."
+            )
+
+    def _log_prefill_gate_state_once(self) -> None:
+        """Log the resolved prefill-gate configuration exactly once.
+
+        The prior monitor-based gate shipped INERT and SILENT (its memory_monitor
+        was never wired, so it no-op'd with no signal -- found only on hardware).
+        This makes the resolved state loud, the first time the gate runs, so the
+        one dependency that still matters -- the margin propagated from the
+        enforcer -- is visible instead of shipping blind. A margin of 0 degrades
+        the gate to the bare cap check; that is surfaced as a WARNING, not a
+        silent no-op.
+        """
+        if getattr(self, "_prefill_gate_state_logged", False):
+            return
+        self._prefill_gate_state_logged = True
+
+        estimator_live = False
+        if self.memory_monitor is not None:
+            try:
+                estimator_live = (
+                    self.memory_monitor.estimate_prefill_peak_bytes(
+                        self.config.prefill_step_size,
+                        self.config.prefill_step_size,
+                    )
+                    > 0
+                )
+            except Exception:
+                estimator_live = False
+
+        margin_bytes = self._prefill_transient_margin_bytes
+        emit = logger.warning if margin_bytes <= 0 else logger.info
+        emit(
+            "[memgate] prefill forward gate ACTIVE (phys-based): "
+            "margin=%.1fGB, cap=%.1fGB, model-dim estimator=%s%s",
+            margin_bytes / 1024**3,
+            self._memory_hard_limit_bytes / 1024**3,
+            "active" if estimator_live
+            else "DISABLED (phys+margin only)",
+            "  -- WARNING: margin=0, gate degraded to the bare cap check"
+            if margin_bytes <= 0
+            else "",
+        )
+
     def _do_external_prefill(
         self,
         request: "Request",
@@ -1885,6 +2048,15 @@ def _do_external_prefill(
                         extra_kwargs, n_to_process
                     )
 
+            # Forward-FRONT gate: predict this chunk's peak and refuse BEFORE
+            # the forward if it would breach the Metal cap (post-forward checks
+            # cannot save us -- the overshoot kernel-panics the machine).
+            self._prefill_forward_gate(
+                n_to_process,
+                request_id=request.request_id,
+                loop_label="external",
+            )
+
             _throttle_pre = get_phys_footprint()
             self.model(input_arr[:, :n_to_process], cache=prompt_cache, **model_kwargs)
             mx.eval([c.state for c in prompt_cache])
@@ -2223,6 +2395,17 @@ def _step_prefill_chunk(self, state: _PrefillState) -> bool:
 
         chunk = state.tokens_remaining[:, :n]
         state.tokens_remaining = state.tokens_remaining[:, n:]
+
+        # Forward-FRONT gate: predict this chunk's peak and refuse BEFORE the
+        # forward if it would breach the Metal cap. Mirrors the external loop;
+        # raises RuntimeError that _advance_chunked_prefills converts into a
+        # finish_reason="error" output without crashing the machine.
+        self._prefill_forward_gate(
+            n,
+            request_id=state.request.request_id,
+            loop_label="chunked_step",
+        )
+
         _throttle_pre = get_phys_footprint()
         self.model(chunk, cache=state.cache)
         mx.eval([c.state for c in state.cache])
@@ -4510,6 +4693,16 @@ def _preflight_memory_check(self, request: "Request") -> str | None:
         """
         Estimate whether prefill would exceed memory limits.
 
+        NOTE: this guard is monitor-DEPENDENT and therefore INERT in production
+        -- scheduler.memory_monitor is never wired, so estimate==0 and this
+        returns None (no rejection) for every request (see the
+        estimate-guards-inert finding). It is kept for the test-injected-monitor
+        path and as documentation; the live single-request protection is the
+        phys-based _prefill_forward_gate. A phys-only version here would buy
+        nothing: at admission time current is the idle baseline (~weights), well
+        below cap - margin, so it would never reject. Do not mistake this for an
+        active guard.
+
         Computes worst-case peak memory for the last prefill chunk
         (model weights + KV cache + SDPA attention matrix) and rejects
         if it would exceed the hard limit.
@@ -4541,7 +4734,11 @@ def _preflight_memory_check(self, request: "Request") -> str | None:
         if peak == 0:
             return None  # can't estimate, skip
 
-        current = max(mx.get_active_memory(), get_phys_footprint())
+        current = max(
+            mx.get_active_memory(),
+            get_phys_footprint(),
+            self._memory_recent_peak_bytes,
+        )
 
         if current + peak > self._memory_hard_limit_bytes:
             from .utils.hardware import format_bytes
diff --git a/omlx/server.py b/omlx/server.py
index 0d5df6745..2db9f33dd 100644
--- a/omlx/server.py
+++ b/omlx/server.py
@@ -165,6 +165,7 @@
     ModelLoadingError,
     ModelNotFoundError,
     ModelTooLargeError,
+    ModelTypeNotLoadableError,
     SchedulerQueueFullError,
 )
 from .model_discovery import format_size
@@ -227,6 +228,7 @@ class ServerState:
     responses_store: ResponseStore = field(default_factory=ResponseStore)
     oq_manager: Optional[object] = None  # OQManager
     hf_uploader: Optional[object] = None  # HFUploader
+    video_job_manager: Optional[object] = None  # VideoJobManager
 
 
 # Global server state instance
@@ -359,12 +361,29 @@ async def lifespan(app: FastAPI):
             hard_threshold=mem_cfg.hard_threshold,
             prefill_safe_zone_ratio=mem_cfg.prefill_safe_zone_ratio,
             prefill_min_chunk_tokens=mem_cfg.prefill_min_chunk_tokens,
+            prefill_transient_margin_gb=mem_cfg.prefill_transient_margin_gb,
         )
         _server_state.process_memory_enforcer = enforcer
         _server_state.engine_pool._process_memory_enforcer = enforcer
         _server_state.engine_pool._get_final_ceiling = enforcer.get_final_ceiling
         enforcer.start()
 
+    # Video job manager -- constructed AFTER the enforcer so the memory
+    # lease can be constructor-injected (testability seam, spec 4.2).
+    # Cheap when video is disabled: no worker spawns until a job arrives.
+    if _server_state.global_settings is not None:
+        from .video.manager import VideoJobManager
+
+        try:
+            _server_state.video_job_manager = VideoJobManager(
+                settings=_server_state.global_settings.video,
+                base_path=_server_state.global_settings.base_path,
+                enforcer=_server_state.process_memory_enforcer,
+            )
+        except Exception as e:  # noqa: BLE001 -- never block serving on video
+            logger.warning(f"Video job manager unavailable: {e}")
+            _server_state.video_job_manager = None
+
     # Start TTL-only checker if process memory enforcer is not running
     # (enforcer already includes TTL checks in its polling loop)
     ttl_task = None
@@ -398,6 +417,13 @@ async def _ttl_check_loop():
 
     # Shutdown: Save all-time stats, stop TTL task, process memory enforcer, etc.
     get_server_metrics().save_alltime()
+    # isinstance (not None-check): tests patch _server_state wholesale and
+    # auto-created mock attributes are not awaitable.
+    from .video.manager import VideoJobManager as _VideoJobManager
+    if isinstance(_server_state.video_job_manager, _VideoJobManager):
+        await _server_state.video_job_manager.shutdown()
+        _server_state.video_job_manager = None
+        logger.info("Video job manager stopped")
     if ttl_task is not None:
         ttl_task.cancel()
         try:
@@ -446,6 +472,13 @@ async def _ttl_check_loop():
 except ImportError:
     pass
 
+# Video routes are mounted unconditionally -- a settings-driven gate cannot
+# live here because settings are not initialized at import time (the audio
+# gate above only works because it tests import availability). Each handler
+# gates on settings.video.enabled / manager presence / worker venv instead.
+from .api.video_routes import router as video_router
+app.include_router(video_router, dependencies=[Depends(verify_api_key)])
+
 # Include admin routes
 from .admin.routes import router as admin_router, set_admin_getters
 from .admin.auth import _RedirectToLogin
@@ -690,15 +723,41 @@ async def get_engine(
     # Resolve alias to real model_id
     model_id = pool.resolve_model_id(model_id, _server_state.settings_manager)
 
+    # Video models are job-managed; reject BEFORE pool.get_engine so the
+    # 42GB entry never enters the admission/eviction loop (a misrouted
+    # chat request must not evict resident LLMs). Spec section 3.
+    _entry = pool.get_entry(model_id)
+    if _entry is not None and getattr(_entry, "model_type", "") == "video":
+        raise HTTPException(
+            status_code=400,
+            detail=(
+                f"Model '{model_id}' is a video generation model. "
+                "Use POST /v1/videos."
+            ),
+        )
+
     try:
         engine = await pool.get_engine(model_id)
     except ModelNotFoundError as e:
-        # Fallback to default model if enabled (LLM only)
+        # Fallback to default model if enabled (LLM only). The default can
+        # still be set to a non-chat model via admin/settings; verify its
+        # type before retrying (spec 4.1 fallback hygiene).
+        _default_entry = (
+            pool.get_entry(_server_state.default_model)
+            if _server_state.default_model
+            else None
+        )
+        _default_type = getattr(_default_entry, "model_type", None)
         if (
             engine_type == EngineType.LLM
             and _server_state.global_settings
             and _server_state.global_settings.model.model_fallback
             and _server_state.default_model
+            # Block fallback only onto a KNOWN non-chat type; unknown
+            # entries (or non-string types from test doubles) preserve
+            # the old fallback behavior.
+            and (not isinstance(_default_type, str)
+                 or _default_type in ("llm", "vlm"))
         ):
             logger.info(
                 f"Model '{model_id}' not found, falling back to "
@@ -729,6 +788,9 @@ async def get_engine(
         raise HTTPException(status_code=507, detail=str(e))
     except ModelLoadingError as e:
         raise HTTPException(status_code=409, detail=str(e))
+    except ModelTypeNotLoadableError as e:
+        # Defense in depth: the pre-pool check above normally catches this
+        raise HTTPException(status_code=400, detail=str(e))
     except EnginePoolError as e:
         raise HTTPException(status_code=500, detail=str(e))
 
@@ -1274,8 +1336,16 @@ def init_server(
             f"No models found in {', '.join(dir_list)}. Add models to serve them."
         )
 
-    # Set default model (from settings file, fallback to first model)
+    # Set default model (from settings file, fallback to first model).
+    # Implicit selection filters to chat-capable types so a video (or
+    # embedding/audio) model that sorts first never becomes the target of
+    # model-less chat requests (spec 4.1 default-model hygiene).
     available_models = _server_state.engine_pool.get_model_ids()
+
+    def _chat_capable(mid: str) -> bool:
+        entry = _server_state.engine_pool.get_entry(mid)
+        return entry is not None and entry.model_type in ("llm", "vlm")
+
     if available_models:
         if settings_default:
             if settings_default in available_models:
@@ -1284,9 +1354,13 @@ def init_server(
                 logger.warning(
                     f"Default model '{settings_default}' not found, using first model"
                 )
-                _server_state.default_model = available_models[0]
+                _server_state.default_model = next(
+                    (m for m in available_models if _chat_capable(m)), None
+                )
         else:
-            _server_state.default_model = available_models[0]
+            _server_state.default_model = next(
+                (m for m in available_models if _chat_capable(m)), None
+            )
     else:
         _server_state.default_model = None
 
@@ -1717,6 +1791,7 @@ async def list_models(_: bool = Depends(verify_api_key)) -> ModelsResponse:
                 ModelInfo(
                     id=display_id,
                     owned_by="omlx",
+                    model_type=m.get("model_type", "llm"),
                 )
             )
 
@@ -1773,6 +1848,16 @@ async def load_model_public(model_id: str, _: bool = Depends(verify_api_key)):
     entry = _server_state.engine_pool.get_entry(model_id)
     if entry is None:
         raise HTTPException(status_code=404, detail=f"Model not found: {model_id}")
+    if entry.model_type == "video":
+        # Pre-pool check: the blanket except below would swallow the typed
+        # rejection into a 500 (spec 4.1).
+        raise HTTPException(
+            status_code=400,
+            detail=(
+                f"Model '{model_id}' is a video generation model and is "
+                "not pool-loaded. Use POST /v1/videos."
+            ),
+        )
     if entry.engine is not None:
         return {"status": "ok", "model_id": model_id, "message": f"Already loaded: {model_id}"}
 
diff --git a/omlx/settings.py b/omlx/settings.py
index adba2a6d8..b818d835e 100644
--- a/omlx/settings.py
+++ b/omlx/settings.py
@@ -391,6 +391,33 @@ class MemorySettings:
     # aborted via the same cleanup path the hard-limit RuntimeError uses.
     prefill_safe_zone_ratio: float = 0.80
     prefill_min_chunk_tokens: int = 32
+    # Conservative transient margin used by the scheduler's forward-FRONT memory
+    # gate (_prefill_forward_gate). The gate is PHYS-based: it refuses a prefill
+    # chunk before its forward when current (max of active/phys/recent peak) + this
+    # margin would breach the hard cap, so the transient never lands on the Metal
+    # ceiling (which would kernel-panic the whole machine -- an after-the-fact
+    # Python check cannot catch it). The model-dim estimate is optional and, in
+    # production, absent (scheduler.memory_monitor is never wired), so this margin
+    # IS the safety mechanism -- it is propagated from the ENFORCER (live), unlike
+    # the dead monitor.
+    #
+    # The load-bearing guarantee: margin >= the worst-case single-step transient.
+    # That transient (chiefly MoE expert-dequant on glm4.5-air) is SUB-POLL --
+    # faster than the enforcer's 1s sample -- so it is invisible to every memory
+    # read and MUST be carried here, not by reading the footprint more cleverly.
+    # On 2026-06-06 m5max a single glm4.5-air-106b prefill peaked at 110.4GB vs a
+    # 107.5GB cap, an effective transient up to ~10.6GB above the pre-step
+    # baseline; margin 10 was too small. 12 = ceil(10.6) + 1GB padding. Extra
+    # cushion: the box only panics nearer ~110GB, so an admitted chunk needs a
+    # transient > ~14.5GB above the trip point to crash -- margin 12 clears that.
+    #
+    # Trip point: gate fires once current > cap - margin. Functional residual: a
+    # model that fills most of the cap (85GB on 128GB) gets long prompts refused
+    # cleanly (503-class) -- correct (refuse the request, do not crash the box);
+    # fit longer contexts with a smaller quant. Watch [memgate]/[memcheck] on
+    # hardware. Set to 0 only to disable the margin (gate degrades to the bare cap
+    # check; logged as a WARNING the first time the gate runs).
+    prefill_transient_margin_gb: float = 12.0
 
     def to_dict(self) -> dict[str, Any]:
         """Convert to dictionary."""
@@ -402,6 +429,7 @@ def to_dict(self) -> dict[str, Any]:
             "hard_threshold": self.hard_threshold,
             "prefill_safe_zone_ratio": self.prefill_safe_zone_ratio,
             "prefill_min_chunk_tokens": self.prefill_min_chunk_tokens,
+            "prefill_transient_margin_gb": self.prefill_transient_margin_gb,
         }
 
     @classmethod
@@ -440,6 +468,9 @@ def from_dict(cls, data: dict[str, Any]) -> MemorySettings:
             prefill_min_chunk_tokens=int(
                 data.get("prefill_min_chunk_tokens", 32)
             ),
+            prefill_transient_margin_gb=float(
+                data.get("prefill_transient_margin_gb", 12.0)
+            ),
         )
 
 
@@ -539,15 +570,24 @@ class HuggingFaceSettings:
     """HuggingFace Hub configuration settings."""
 
     endpoint: str = ""  # Empty string = use HF default (https://huggingface.co)
+    # Disable the Xet chunk-CAS transfer backend (cas-bridge.xethub.hf.co).
+    # huggingface_hub freezes HF_HUB_DISABLE_XET at import time, so this can
+    # only be applied process-wide at serve startup (cli.py env block) --
+    # never per-download. Xet is unreachable from some networks (observed:
+    # mainland China); the plain LFS path works.
+    disable_xet: bool = False
 
     def to_dict(self) -> dict[str, Any]:
         """Convert to dictionary."""
-        return {"endpoint": self.endpoint}
+        return {"endpoint": self.endpoint, "disable_xet": self.disable_xet}
 
     @classmethod
     def from_dict(cls, data: dict[str, Any]) -> HuggingFaceSettings:
         """Create from dictionary."""
-        return cls(endpoint=data.get("endpoint", ""))
+        return cls(
+            endpoint=data.get("endpoint", ""),
+            disable_xet=bool(data.get("disable_xet", False)),
+        )
 
 
 @dataclass
@@ -566,6 +606,77 @@ def from_dict(cls, data: dict[str, Any]) -> ModelScopeSettings:
         return cls(endpoint=data.get("endpoint", ""))
 
 
+@dataclass
+class VideoSettings:
+    """Video generation engine settings (docs/video-generation-engine-spec.md).
+
+    The video engine runs mlx-gen in a subprocess worker from its own venv
+    ({base_path}/venvs/video by default); these settings gate the /v1/videos
+    API and bound its resource use. memory_lease_gb is reserved against the
+    process memory enforcer ceiling for the duration of a job so co-resident
+    LLM serving throttles instead of stacking toward the Metal cap.
+    """
+
+    enabled: bool = False  # Master switch; handlers return 503 when off
+    worker_python: str = ""  # Empty = {base_path}/venvs/video/bin/python
+    memory_lease_gb: float = 36.0  # Reserved against the enforcer ceiling per job (P0-calibrated)
+    max_queued_jobs: int = 4  # Submissions beyond this limit return 503
+    job_timeout_seconds: int = 7200  # Per-run clock, starts at worker spawn
+    progress_stall_timeout_seconds: int = 600  # Kill when worker JSONL goes silent
+    default_steps: int = 20
+    default_fps: int = 16
+    max_frames: int = 121  # UX bound; memory bound is the peak predictor
+    max_steps: int = 50
+    max_pixels_per_frame: int = 1280 * 720
+    artifacts_max_count: int = 50  # LRU-purge artifact blobs beyond this
+    artifacts_max_gb: float = 50.0
+
+    def get_worker_python(self, base_path: Path) -> Path:
+        """Resolve the worker venv python path."""
+        if self.worker_python:
+            return Path(self.worker_python).expanduser()
+        return base_path / "venvs" / "video" / "bin" / "python"
+
+    def to_dict(self) -> dict[str, Any]:
+        """Convert to dictionary."""
+        return {
+            "enabled": self.enabled,
+            "worker_python": self.worker_python,
+            "memory_lease_gb": self.memory_lease_gb,
+            "max_queued_jobs": self.max_queued_jobs,
+            "job_timeout_seconds": self.job_timeout_seconds,
+            "progress_stall_timeout_seconds": self.progress_stall_timeout_seconds,
+            "default_steps": self.default_steps,
+            "default_fps": self.default_fps,
+            "max_frames": self.max_frames,
+            "max_steps": self.max_steps,
+            "max_pixels_per_frame": self.max_pixels_per_frame,
+            "artifacts_max_count": self.artifacts_max_count,
+            "artifacts_max_gb": self.artifacts_max_gb,
+        }
+
+    @classmethod
+    def from_dict(cls, data: dict[str, Any]) -> VideoSettings:
+        """Create from dictionary."""
+        return cls(
+            enabled=bool(data.get("enabled", False)),
+            worker_python=data.get("worker_python", ""),
+            memory_lease_gb=float(data.get("memory_lease_gb", 36.0)),
+            max_queued_jobs=int(data.get("max_queued_jobs", 4)),
+            job_timeout_seconds=int(data.get("job_timeout_seconds", 7200)),
+            progress_stall_timeout_seconds=int(
+                data.get("progress_stall_timeout_seconds", 600)
+            ),
+            default_steps=int(data.get("default_steps", 20)),
+            default_fps=int(data.get("default_fps", 16)),
+            max_frames=int(data.get("max_frames", 121)),
+            max_steps=int(data.get("max_steps", 50)),
+            max_pixels_per_frame=int(data.get("max_pixels_per_frame", 1280 * 720)),
+            artifacts_max_count=int(data.get("artifacts_max_count", 50)),
+            artifacts_max_gb=float(data.get("artifacts_max_gb", 50.0)),
+        )
+
+
 @dataclass
 class NetworkSettings:
     """Network proxy and TLS trust settings."""
@@ -784,6 +895,7 @@ class GlobalSettings:
     integrations: IntegrationSettings = field(default_factory=IntegrationSettings)
     ui: UISettings = field(default_factory=UISettings)
     idle_timeout: ModelIdleTimeoutSettings = field(default_factory=ModelIdleTimeoutSettings)
+    video: VideoSettings = field(default_factory=VideoSettings)
 
     @classmethod
     def load(
@@ -879,6 +991,8 @@ def _load_from_file(self, path: Path) -> None:
                 self.ui = UISettings.from_dict(data["ui"])
             if "idle_timeout" in data:
                 self.idle_timeout = ModelIdleTimeoutSettings.from_dict(data["idle_timeout"])
+            if "video" in data:
+                self.video = VideoSettings.from_dict(data["video"])
 
         except json.JSONDecodeError as e:
             logger.warning(f"Failed to parse settings file {path}: {e}")
@@ -1120,6 +1234,7 @@ def save(self) -> None:
             "integrations": self.integrations.to_dict(),
             "ui": self.ui.to_dict(),
             "idle_timeout": self.idle_timeout.to_dict(),
+            "video": self.video.to_dict(),
         }
 
         try:
@@ -1363,6 +1478,7 @@ def to_dict(self) -> dict[str, Any]:
             "integrations": self.integrations.to_dict(),
             "ui": self.ui.to_dict(),
             "idle_timeout": self.idle_timeout.to_dict(),
+            "video": self.video.to_dict(),
         }
 
 
diff --git a/omlx/video/__init__.py b/omlx/video/__init__.py
new file mode 100644
index 000000000..f50b05583
--- /dev/null
+++ b/omlx/video/__init__.py
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: Apache-2.0
+"""Video generation engine: job manager + subprocess worker.
+
+The video engine runs mlx-gen (Wan2.2 text-to-video) in a subprocess worker
+from its own venv, coordinated by VideoJobManager with a memory lease held
+against the ProcessMemoryEnforcer ceiling. Design:
+docs/video-generation-engine-spec.md.
+
+Note: worker.py is NOT imported here -- it runs under the video venv python
+and must stay importable without omlx on sys.path.
+"""
+
+from .manager import VideoJob, VideoJobManager
+
+__all__ = ["VideoJob", "VideoJobManager"]
diff --git a/omlx/video/manager.py b/omlx/video/manager.py
new file mode 100644
index 000000000..e1065d8b7
--- /dev/null
+++ b/omlx/video/manager.py
@@ -0,0 +1,666 @@
+# SPDX-License-Identifier: Apache-2.0
+"""VideoJobManager: async job queue for subprocess video generation.
+
+Job shape follows the admin downloader/OQ patterns (task dict + status enum
++ cooperative cancel) with persistence (one JSON per job, atomic write) and
+a memory lease held against the ProcessMemoryEnforcer for the duration of
+each run. Design: docs/video-generation-engine-spec.md sections 4.2/4.4.
+
+Wire status is exactly the OpenAI four-value enum: queued | in_progress |
+completed | failed. Cancellation is not a wire state -- DELETE kills the
+worker and removes the record entirely.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import json
+import logging
+import os
+import shutil
+import time
+from dataclasses import dataclass, field
+from pathlib import Path
+from typing import Any, Optional
+
+from ..utils.proc_memory import get_phys_footprint
+
+logger = logging.getLogger(__name__)
+
+GB = 1024**3
+
+# Stable error codes (spec 4.2). The worker failure manifest uses the same
+# {code, message, detail?} schema and is passed through.
+ERR_WORKER_CRASHED = "worker_crashed"
+ERR_WORKER_STALLED = "worker_stalled"
+ERR_JOB_TIMEOUT = "job_timeout"
+ERR_LEASE_EXCEEDED = "memory_lease_exceeded"
+ERR_MONITOR_FAILED = "monitor_failed"
+ERR_SERVER_RESTARTED = "server_restarted"
+ERR_OUTPUT_INVALID = "output_invalid"
+
+_WATCHDOG_INTERVAL_S = 2.0
+_ADMISSION_RECHECK_S = 5.0
+_SIGTERM_GRACE_S = 5.0
+
+
+@dataclass
+class VideoJob:
+    """One video generation job."""
+
+    id: str
+    model_id: str
+    model_dir: str
+    params: dict[str, Any]  # prompt, width, height, frames, steps, fps, seed, ...
+    status: str = "queued"  # queued | in_progress | completed | failed
+    progress: int = 0  # 0-100
+    phase: str = ""
+    error: Optional[dict[str, str]] = None  # {code, message} when failed
+    created_at: float = field(default_factory=time.time)
+    started_at: Optional[float] = None
+    completed_at: Optional[float] = None
+    expires_at: Optional[float] = None  # Set when the artifact blob is purged
+    artifact_path: Optional[str] = None
+    wall_seconds: Optional[float] = None
+    peak_memory_gb: Optional[float] = None  # Worker lifetime-max, for records
+
+    def to_dict(self) -> dict[str, Any]:
+        """Wire shape: OpenAI video object fields + fmlx extensions."""
+        return {
+            "id": self.id,
+            "object": "video",
+            "model": self.model_id,
+            "status": self.status,
+            "progress": self.progress,
+            "created_at": int(self.created_at),
+            "completed_at": int(self.completed_at) if self.completed_at else None,
+            "expires_at": int(self.expires_at) if self.expires_at else None,
+            "error": self.error,
+            "seconds": str(self.params.get("seconds", "")),
+            "size": f"{self.params.get('width')}x{self.params.get('height')}",
+            # fmlx extensions
+            "phase": self.phase,
+            "prompt": self.params.get("prompt", ""),
+            "frames": self.params.get("frames"),
+            "fps": self.params.get("fps"),
+            "steps": self.params.get("steps"),
+            "seed": self.params.get("seed"),
+            "wall_seconds": self.wall_seconds,
+        }
+
+    def to_persist(self) -> dict[str, Any]:
+        return {
+            "id": self.id,
+            "model_id": self.model_id,
+            "model_dir": self.model_dir,
+            "params": self.params,
+            "status": self.status,
+            "progress": self.progress,
+            "phase": self.phase,
+            "error": self.error,
+            "created_at": self.created_at,
+            "started_at": self.started_at,
+            "completed_at": self.completed_at,
+            "expires_at": self.expires_at,
+            "artifact_path": self.artifact_path,
+            "wall_seconds": self.wall_seconds,
+            "peak_memory_gb": self.peak_memory_gb,
+        }
+
+    @classmethod
+    def from_persist(cls, data: dict[str, Any]) -> "VideoJob":
+        return cls(
+            id=str(data["id"]),
+            model_id=str(data.get("model_id", "")),
+            model_dir=str(data.get("model_dir", "")),
+            params=dict(data.get("params") or {}),
+            status=str(data.get("status", "failed")),
+            progress=int(data.get("progress") or 0),
+            phase=str(data.get("phase", "") or ""),
+            error=data.get("error"),
+            created_at=float(data.get("created_at") or time.time()),
+            started_at=data.get("started_at"),
+            completed_at=data.get("completed_at"),
+            expires_at=data.get("expires_at"),
+            artifact_path=data.get("artifact_path"),
+            wall_seconds=data.get("wall_seconds"),
+            peak_memory_gb=data.get("peak_memory_gb"),
+        )
+
+
+class VideoJobManager:
+    """Serializes video generation jobs against a memory lease.
+
+    Constructed in server lifespan AFTER the ProcessMemoryEnforcer so the
+    enforcer can be constructor-injected (testability seam, spec 4.2).
+    """
+
+    def __init__(
+        self,
+        *,
+        settings: Any,  # VideoSettings
+        base_path: Path,
+        enforcer: Any | None,  # ProcessMemoryEnforcer | None
+        worker_script: Path | None = None,
+    ):
+        self._settings = settings
+        self._base_path = Path(base_path)
+        self._enforcer = enforcer
+        self._worker_script = worker_script or (
+            Path(__file__).parent / "worker.py"
+        )
+        self._jobs: dict[str, VideoJob] = {}
+        self._queue: list[str] = []  # FIFO of queued job ids
+        self._dispatcher: asyncio.Task | None = None
+        self._current_proc: asyncio.subprocess.Process | None = None
+        self._current_job_id: str | None = None
+        self._wake = asyncio.Event()
+        self._shutdown = False
+        self._venv_probe_result: tuple[bool, str] | None = None
+
+        self.jobs_dir.mkdir(parents=True, exist_ok=True)
+        self.artifacts_dir.mkdir(parents=True, exist_ok=True)
+        self._replay_persisted()
+
+    # -- paths ---------------------------------------------------------------
+
+    @property
+    def jobs_dir(self) -> Path:
+        return self._base_path / "video-jobs"
+
+    @property
+    def artifacts_dir(self) -> Path:
+        return self._base_path / "video-artifacts"
+
+    def worker_python(self) -> Path:
+        return self._settings.get_worker_python(self._base_path)
+
+    # -- persistence ---------------------------------------------------------
+
+    def _persist(self, job: VideoJob) -> None:
+        path = self.jobs_dir / f"{job.id}.json"
+        tmp = path.with_suffix(".tmp")
+        try:
+            with open(tmp, "w") as f:
+                json.dump(job.to_persist(), f, indent=1)
+            os.replace(tmp, path)
+        except OSError as e:
+            logger.error(f"[video] failed to persist job {job.id}: {e}")
+
+    def _replay_persisted(self) -> None:
+        """Reload job records at startup; in-flight jobs become failed."""
+        for path in sorted(self.jobs_dir.glob("video_*.json")):
+            try:
+                with open(path) as f:
+                    job = VideoJob.from_persist(json.load(f))
+            except Exception as e:
+                logger.warning(f"[video] skipping unreadable job file {path}: {e}")
+                continue
+            if job.status in ("queued", "in_progress"):
+                job.status = "failed"
+                job.error = {
+                    "code": ERR_SERVER_RESTARTED,
+                    "message": "Server restarted while the job was active",
+                }
+                job.completed_at = time.time()
+                self._persist(job)
+            self._jobs[job.id] = job
+
+    # -- venv probe ----------------------------------------------------------
+
+    async def probe_worker_venv(self, force: bool = False) -> tuple[bool, str]:
+        """Check the worker venv is usable (cached after first success)."""
+        if self._venv_probe_result and self._venv_probe_result[0] and not force:
+            return self._venv_probe_result
+        py = self.worker_python()
+        install_hint = (
+            "Install with: uv venv -p 3.12 {base}/venvs/video && "
+            "uv pip sync --python {base}/venvs/video/bin/python "
+            "omlx/video/requirements.lock".format(base=self._base_path)
+        )
+        if not py.exists():
+            self._venv_probe_result = (
+                False, f"Video worker python not found at {py}. {install_hint}"
+            )
+            return self._venv_probe_result
+        try:
+            proc = await asyncio.create_subprocess_exec(
+                str(py), "-c", "import mflux",
+                stdout=asyncio.subprocess.DEVNULL,
+                stderr=asyncio.subprocess.PIPE,
+            )
+            _, stderr = await asyncio.wait_for(proc.communicate(), timeout=60)
+            if proc.returncode != 0:
+                self._venv_probe_result = (
+                    False,
+                    f"Video worker venv at {py} cannot import mflux: "
+                    f"{(stderr or b'').decode()[-300:]}. {install_hint}",
+                )
+                return self._venv_probe_result
+        except Exception as e:
+            self._venv_probe_result = (False, f"Video worker venv probe failed: {e!r}")
+            return self._venv_probe_result
+        self._venv_probe_result = (True, "")
+        return self._venv_probe_result
+
+    # -- memory admission ----------------------------------------------------
+
+    def _lease_bytes(self) -> int:
+        return int(float(self._settings.memory_lease_gb) * GB)
+
+    def guard_available(self) -> tuple[bool, str]:
+        """Submission-time check: refuse jobs without a live memory guard."""
+        enf = self._enforcer
+        if enf is None or not getattr(enf, "is_running", False):
+            return False, (
+                "Video jobs require the process memory guard, which is not "
+                "running on this server"
+            )
+        if enf.get_final_ceiling() <= 0:
+            return False, (
+                "Video jobs require memory.prefill_memory_guard to be "
+                "enabled (the guard is currently disabled)"
+            )
+        return True, ""
+
+    def _memory_admission(self) -> tuple[bool, str]:
+        """Dispatch-time predicate (spec 4.4): the lease must land with the
+        system already at ok pressure and resident load below both the
+        post-lease soft watermark and the post-lease prefill-gate trip."""
+        enf = self._enforcer
+        ok, reason = self.guard_available()
+        if not ok:
+            return False, reason
+        assert enf is not None  # guard_available() established this
+        ceiling = enf.get_final_ceiling()
+        lease = self._lease_bytes()
+        post = ceiling - lease
+        if post <= 0:
+            return False, (
+                f"memory lease {lease / GB:.0f}GB does not fit under the "
+                f"ceiling {ceiling / GB:.1f}GB"
+            )
+        soft_ratio = float(getattr(enf, "_soft_threshold", 0.85) or 0.85)
+        margin = int(getattr(enf, "_prefill_transient_margin_bytes", 12 * GB)
+                     or 12 * GB)
+        budget = min(int(post * soft_ratio), post - margin)
+        if budget <= 0:
+            budget = int(post * soft_ratio)
+        peak = int(enf.recent_peak_bytes() or 0)
+        if peak > budget:
+            return False, (
+                f"waiting for memory: resident usage {peak / GB:.1f}GB above "
+                f"post-lease budget {budget / GB:.1f}GB "
+                f"(ceiling {ceiling / GB:.1f}GB, lease {lease / GB:.0f}GB)"
+            )
+        return True, ""
+
+    # -- public API ----------------------------------------------------------
+
+    def get(self, job_id: str) -> VideoJob | None:
+        return self._jobs.get(job_id)
+
+    def list_jobs(
+        self, limit: int = 20, after: str | None = None, order: str = "desc"
+    ) -> tuple[list[VideoJob], bool]:
+        jobs = sorted(
+            self._jobs.values(),
+            key=lambda j: j.created_at,
+            reverse=(order != "asc"),
+        )
+        if after:
+            ids = [j.id for j in jobs]
+            try:
+                start = ids.index(after) + 1
+                jobs = jobs[start:]
+            except ValueError:
+                pass
+        page = jobs[:limit]
+        return page, len(jobs) > limit
+
+    def queue_depth(self) -> int:
+        return len(self._queue)
+
+    async def submit(self, job: VideoJob) -> VideoJob:
+        """Accept a job into the queue (caller validates params + caps)."""
+        if len(self._queue) >= int(self._settings.max_queued_jobs):
+            raise QueueFullError(
+                f"Video queue is full ({len(self._queue)}/"
+                f"{self._settings.max_queued_jobs})"
+            )
+        self._jobs[job.id] = job
+        self._queue.append(job.id)
+        self._persist(job)
+        self._ensure_dispatcher()
+        self._wake.set()
+        logger.info(f"[video] queued {job.id} ({job.model_id})")
+        return job
+
+    async def delete(self, job_id: str) -> bool:
+        """DELETE semantics (spec 4.3): kill if running, drop record+blobs."""
+        job = self._jobs.get(job_id)
+        if job is None:
+            return False
+        if job_id in self._queue:
+            self._queue.remove(job_id)
+        if self._current_job_id == job_id and self._current_proc is not None:
+            await self._terminate_proc(self._current_proc)
+        self._jobs.pop(job_id, None)
+        try:
+            (self.jobs_dir / f"{job_id}.json").unlink(missing_ok=True)
+        except OSError:
+            pass
+        blob_dir = self.artifacts_dir / job_id
+        if blob_dir.exists():
+            shutil.rmtree(blob_dir, ignore_errors=True)
+        logger.info(f"[video] deleted {job_id}")
+        return True
+
+    async def shutdown(self) -> None:
+        self._shutdown = True
+        self._wake.set()
+        if self._current_proc is not None:
+            await self._terminate_proc(self._current_proc)
+        if self._dispatcher is not None:
+            self._dispatcher.cancel()
+            try:
+                await self._dispatcher
+            except (asyncio.CancelledError, Exception):
+                pass
+
+    # -- dispatcher ----------------------------------------------------------
+
+    def _ensure_dispatcher(self) -> None:
+        if self._dispatcher is None or self._dispatcher.done():
+            self._dispatcher = asyncio.create_task(self._dispatch_loop())
+
+    async def _dispatch_loop(self) -> None:
+        while not self._shutdown:
+            if not self._queue:
+                self._wake.clear()
+                try:
+                    await asyncio.wait_for(self._wake.wait(), timeout=60)
+                except asyncio.TimeoutError:
+                    pass
+                continue
+            job_id = self._queue[0]
+            job = self._jobs.get(job_id)
+            if job is None or job.status != "queued":
+                self._queue.pop(0)
+                continue
+            ok, reason = self._memory_admission()
+            if not ok:
+                if job.phase != reason:
+                    job.phase = reason
+                    self._persist(job)
+                await asyncio.sleep(_ADMISSION_RECHECK_S)
+                continue
+            self._queue.pop(0)
+            try:
+                await self._run_job(job)
+            except Exception as e:  # noqa: BLE001 -- dispatcher must survive
+                logger.exception(f"[video] job {job.id} runner crashed: {e}")
+                if job.status in ("queued", "in_progress"):
+                    self._finish(job, "failed", ERR_WORKER_CRASHED, str(e))
+
+    # -- job execution -------------------------------------------------------
+
+    def _finish(
+        self, job: VideoJob, status: str, code: str | None = None,
+        message: str | None = None,
+    ) -> None:
+        job.status = status
+        job.completed_at = time.time()
+        if job.started_at:
+            job.wall_seconds = round(job.completed_at - job.started_at, 1)
+        if status == "failed":
+            job.error = {"code": code or ERR_WORKER_CRASHED,
+                         "message": message or ""}
+        else:
+            job.progress = 100
+            job.phase = "done"
+        self._persist(job)
+        logger.info(
+            f"[video] {job.id} -> {status}"
+            + (f" ({code}: {message})" if code else "")
+        )
+
+    async def _terminate_proc(self, proc: asyncio.subprocess.Process) -> None:
+        if proc.returncode is not None:
+            return
+        try:
+            proc.terminate()
+        except ProcessLookupError:
+            return
+        try:
+            await asyncio.wait_for(proc.wait(), timeout=_SIGTERM_GRACE_S)
+        except asyncio.TimeoutError:
+            try:
+                proc.kill()
+            except ProcessLookupError:
+                pass
+            await proc.wait()
+
+    def _worker_env(self) -> dict[str, str]:
+        """Explicit whitelist -- never inherit the full server env."""
+        env = {}
+        for key in ("PATH", "HOME", "TMPDIR", "USER", "LANG"):
+            if key in os.environ:
+                env[key] = os.environ[key]
+        # The worker loads everything from the local model dir; forbid
+        # accidental network fetches.
+        env["HF_HUB_OFFLINE"] = "1"
+        env["HF_HUB_DISABLE_TELEMETRY"] = "1"
+        return env
+
+    async def _run_job(self, job: VideoJob) -> None:
+        enf = self._enforcer
+        lease = self._lease_bytes()
+        blob_dir = self.artifacts_dir / job.id
+        blob_dir.mkdir(parents=True, exist_ok=True)
+        output_path = blob_dir / "output.mp4"
+        manifest_path = blob_dir / "manifest.json"
+        spec_path = blob_dir / "spec.json"
+
+        spec = dict(job.params)
+        spec.update(
+            model_dir=job.model_dir,
+            output_path=str(output_path),
+            manifest_path=str(manifest_path),
+            lease_bytes=lease,
+        )
+        with open(spec_path, "w") as f:
+            json.dump(spec, f, indent=1)
+
+        job.status = "in_progress"
+        job.started_at = time.time()
+        job.phase = "starting"
+        self._persist(job)
+
+        if enf is not None:
+            enf.acquire_video_lease(lease)
+        try:
+            proc = await asyncio.create_subprocess_exec(
+                str(self.worker_python()), "-I", str(self._worker_script),
+                "--spec", str(spec_path),
+                stdout=asyncio.subprocess.PIPE,
+                stderr=asyncio.subprocess.DEVNULL,
+                env=self._worker_env(),
+            )
+            self._current_proc = proc
+            self._current_job_id = job.id
+            if enf is not None:
+                enf.set_video_worker_pid(proc.pid)
+
+            kill_reason: list[tuple[str, str]] = []
+            last_line_at = time.time()
+
+            async def watchdog() -> None:
+                zero_reads = 0
+                while proc.returncode is None:
+                    await asyncio.sleep(_WATCHDOG_INTERVAL_S)
+                    if proc.returncode is not None:
+                        return
+                    now = time.time()
+                    # Per-run timeout (clock starts at spawn, spec 4.2)
+                    if (now - (job.started_at or now)
+                            > int(self._settings.job_timeout_seconds)):
+                        kill_reason.append((
+                            ERR_JOB_TIMEOUT,
+                            f"exceeded job_timeout_seconds="
+                            f"{self._settings.job_timeout_seconds}",
+                        ))
+                        await self._terminate_proc(proc)
+                        return
+                    # Stall detection: no JSONL line for too long
+                    if (now - last_line_at
+                            > int(self._settings.progress_stall_timeout_seconds)):
+                        kill_reason.append((
+                            ERR_WORKER_STALLED,
+                            "no progress output for "
+                            f"{int(now - last_line_at)}s",
+                        ))
+                        await self._terminate_proc(proc)
+                        return
+                    # Footprint vs lease (secondary cleanup; layer 1 is the
+                    # worker's own Metal wired limit)
+                    footprint = get_phys_footprint(proc.pid)
+                    if footprint <= 0:
+                        zero_reads += 1
+                        if zero_reads >= 3:
+                            kill_reason.append((
+                                ERR_MONITOR_FAILED,
+                                "cannot read worker memory footprint",
+                            ))
+                            await self._terminate_proc(proc)
+                            return
+                        continue
+                    zero_reads = 0
+                    if footprint > lease:
+                        kill_reason.append((
+                            ERR_LEASE_EXCEEDED,
+                            f"worker footprint {footprint / GB:.1f}GB "
+                            f"exceeded lease {lease / GB:.1f}GB",
+                        ))
+                        try:
+                            proc.kill()  # immediate, no grace
+                        except ProcessLookupError:
+                            pass
+                        return
+
+            wd_task = asyncio.create_task(watchdog())
+            try:
+                assert proc.stdout is not None
+                async for raw in proc.stdout:
+                    last_line_at = time.time()
+                    try:
+                        ev = json.loads(raw.decode().strip())
+                    except (ValueError, UnicodeDecodeError):
+                        continue
+                    self._apply_progress(job, ev if isinstance(ev, dict) else {})
+                await proc.wait()
+            finally:
+                wd_task.cancel()
+                try:
+                    await wd_task
+                except (asyncio.CancelledError, Exception):
+                    pass
+
+            self._conclude(job, proc.returncode or 0, kill_reason,
+                           output_path, manifest_path)
+        finally:
+            self._current_proc = None
+            self._current_job_id = None
+            if enf is not None:
+                enf.set_video_worker_pid(None)
+                enf.release_video_lease()
+            self._retention_sweep()
+
+    def _apply_progress(self, job: VideoJob, ev: dict[str, Any]) -> None:
+        phase = str(ev.get("phase", "") or "")
+        if phase:
+            job.phase = phase
+        total = int(ev.get("total_steps") or 0)
+        step = int(ev.get("step") or 0)
+        if phase == "loading":
+            job.progress = max(job.progress, 2)
+        elif phase == "loaded":
+            job.progress = max(job.progress, 5)
+        elif total > 0 and step > 0:
+            job.progress = max(job.progress, 5 + int(90 * step / total))
+        elif phase == "saving":
+            job.progress = max(job.progress, 97)
+        # Persist on named phases or when progress is a multiple of 5; skip
+        # jobs deleted mid-run so their record file is not recreated
+        named = phase in ("loading", "loaded", "saving", "done", "failed")
+        if job.id in self._jobs and (named or job.progress % 5 == 0):
+            self._persist(job)
+
+    def _conclude(
+        self, job: VideoJob, returncode: int,
+        kill_reason: list[tuple[str, str]],
+        output_path: Path, manifest_path: Path,
+    ) -> None:
+        # Job may have been deleted mid-run (DELETE endpoint kills the proc)
+        if job.id not in self._jobs:
+            return
+        if kill_reason:
+            code, message = kill_reason[0]
+            self._finish(job, "failed", code, message)
+            return
+        manifest: dict[str, Any] = {}
+        try:
+            with open(manifest_path) as f:
+                manifest = json.load(f)
+        except (OSError, ValueError):
+            pass
+        if returncode != 0:
+            self._finish(
+                job, "failed",
+                str(manifest.get("code", ERR_WORKER_CRASHED)),
+                str(manifest.get("message",
+                                 f"worker exited with code {returncode}")),
+            )
+            return
+        if not output_path.exists() or output_path.stat().st_size == 0:
+            self._finish(job, "failed", ERR_OUTPUT_INVALID,
+                         "worker exited 0 but produced no output file")
+            return
+        job.artifact_path = str(output_path)
+        if isinstance(manifest.get("lifetime_max_phys_gb"), (int, float)):
+            job.peak_memory_gb = float(manifest["lifetime_max_phys_gb"])
+        self._finish(job, "completed")
+
+    # -- retention -----------------------------------------------------------
+
+    def _retention_sweep(self) -> None:
+        """LRU-purge artifact blobs beyond count/bytes caps. Records stay;
+        expires_at marks the purge time (spec 4.2)."""
+        max_count = int(self._settings.artifacts_max_count)
+        max_bytes = int(float(self._settings.artifacts_max_gb) * GB)
+        holders = [
+            j for j in self._jobs.values()
+            if j.artifact_path and j.expires_at is None
+            and Path(j.artifact_path).exists()
+        ]
+        holders.sort(key=lambda j: j.completed_at or j.created_at)  # oldest first
+        total = sum(
+            Path(j.artifact_path).stat().st_size for j in holders  # type: ignore[arg-type]
+        )
+        while holders and (len(holders) > max_count or total > max_bytes):
+            victim = holders.pop(0)
+            blob_dir = self.artifacts_dir / victim.id
+            try:
+                size = Path(victim.artifact_path).stat().st_size  # type: ignore[arg-type]
+            except OSError:
+                size = 0
+            shutil.rmtree(blob_dir, ignore_errors=True)
+            victim.artifact_path = None
+            victim.expires_at = time.time()
+            self._persist(victim)
+            total -= size
+            logger.info(f"[video] purged artifact of {victim.id} (retention)")
+
+
+class QueueFullError(Exception):
+    """Submission rejected: queue depth cap reached (HTTP 503)."""
diff --git a/omlx/video/requirements.in b/omlx/video/requirements.in
new file mode 100644
index 000000000..338b0f41b
--- /dev/null
+++ b/omlx/video/requirements.in
@@ -0,0 +1 @@
+mlx-gen==0.18.14
diff --git a/omlx/video/requirements.lock b/omlx/video/requirements.lock
new file mode 100644
index 000000000..5c8dcb21f
--- /dev/null
+++ b/omlx/video/requirements.lock
@@ -0,0 +1,1437 @@
+# This file was autogenerated by uv via the following command:
+#    uv pip compile --python /Users/yuanwei/.fmlx/venvs/video/bin/python --generate-hashes -o /tmp/video-req.lock /tmp/video-req.in
+annotated-doc==0.0.4 \
+    --hash=sha256:571ac1dc6991c450b25a9c2d84a3705e2ae7a53467b5d111c24fa8baabbed320 \
+    --hash=sha256:fbcda96e87e9c92ad167c2e53839e57503ecfda18804ea28102353485033faa4
+    # via typer
+anyio==4.13.0 \
+    --hash=sha256:08b310f9e24a9594186fd75b4f73f4a4152069e3853f1ed8bfbf58369f4ad708 \
+    --hash=sha256:334b70e641fd2221c1505b3890c69882fe4a2df910cba14d97019b90b24439dc
+    # via httpx
+av==17.1.0 \
+    --hash=sha256:1284addf3c0dd939887a9722dc30df2241a97471ad52c3c507e31583ae22ff02 \
+    --hash=sha256:1370b11a697eb3f2555906f8ab3519b0cfe48425d7830a3996ad42e6bffafda5 \
+    --hash=sha256:19264c9bb4bee404accc7ce9ec461f2044b7f577a70234d29aafde31ed17de46 \
+    --hash=sha256:19c84fd72af5ef81a20f18fbc6f9aedff9e1455e53a7062c1d4c95926d73da4e \
+    --hash=sha256:22dff0ae582d10ef08c75c2150a4fd27cfc26653b54930c7c27b9f7b3aa20723 \
+    --hash=sha256:3453b06075c7bb973fdb6de52563f7692ff05cbc64c0bb45f4fd6e8709131f2f \
+    --hash=sha256:3dcd41e53f53f9a3260751d9c3c11d34e93d70d61e506c81f13dbc1e3606e07b \
+    --hash=sha256:43ebbe977f19a7f2d2bd1a4e119675a0b15e05852cf7309846b6ab922ba7ffe9 \
+    --hash=sha256:5327807c1219293803ef0c5d1578ff3ae1cf638c09e5998962026e1a554ec240 \
+    --hash=sha256:58f7593726437cda5bd19793027e027768450b5c4a594777bf487798a33db702 \
+    --hash=sha256:5df5c1172ef1cf65a1529d612f7da7798ce2cf82c1ff7212466b538a6cc7214c \
+    --hash=sha256:6a20658ec7d96a70e14b1196eff00b7cdd8831ac3b99868e16b8ba8b24090847 \
+    --hash=sha256:6c9b71fe5c0c5a8d303b1588d4d8ce9397d6b023f467cfef95000ba1f75507fa \
+    --hash=sha256:7f1e71ff621b66253333926f948e00faae11d855b2442133c65128bca64cdeb3 \
+    --hash=sha256:90c49bc9608377d01e82e747377505419a229464873341db18202d5dddecce5a \
+    --hash=sha256:9514cfda85180554c430695282faf4be3ffdf95775d8519733821244eecb58e0 \
+    --hash=sha256:ad7b4aa011093324b7118245f50ac6db244cfe9900d4072508a5245a2b0d3f41 \
+    --hash=sha256:b41647e42884bf543b8e8d0a1dabd4d1b006c99183eb1a2d7afc5b01f73eeff4 \
+    --hash=sha256:bbab058bd965309f39962e53caac8126987c68c0be094fc4f9427e5615b0218f \
+    --hash=sha256:bff8896454b38fcb785a70e5ae0485d7021cb776303a5849393128a30b8f850b \
+    --hash=sha256:cc5a5247622cb77e24c342364eb68f88c1442ddfaab60c1f1f483359d3cc7879 \
+    --hash=sha256:e1c90f85cd7431ede95b11e8e711571a896ebea433f298849c2c0f1594c8d86e \
+    --hash=sha256:ec630be6321b04e317862f6082e84812bbd801e55a3c2298312e3fc8a0a4af4f \
+    --hash=sha256:ee98534242a74da847af78624779ac5a3177dc7c69f956a4da9e6f0fdb37d7f6 \
+    --hash=sha256:efe9b1397300b67b644ad220c89df4892a76f2debe70f16bae1749fa20526e63 \
+    --hash=sha256:f997e3351bdf51127c07a74e21741a2996e9230cbeb2d81c14acde761b116c9c \
+    --hash=sha256:f9a65d1f48b818323fb411e80358f89d77dec340b01d27c6b2dfbb9cbf4b779f \
+    --hash=sha256:fa64e1f1500d01c4a98e7a41dc1a9a35fb4dfe71f5de0389264ec1192200c76a \
+    --hash=sha256:ff457ed419348e5b8e8c811d341389b052c5e4d5839da3794d019b125b9fe830 \
+    --hash=sha256:ffbd78d73d2c9bf31e9a007c992faec3991428b2941a3b085b84fb82e8c32d19
+    # via mlx-gen
+certifi==2026.5.20 \
+    --hash=sha256:3c52e209ba0a4ad7aebe60436a4ab349c39e1e602e8c134221e546902ad25897 \
+    --hash=sha256:69dea482ab64caa7b9f6aba1c6bf48bb6a5448d1c0f1b17ab42ad8c763a5344d
+    # via
+    #   httpcore
+    #   httpx
+    #   requests
+charset-normalizer==3.4.7 \
+    --hash=sha256:007d05ec7321d12a40227aae9e2bc6dca73f3cb21058999a1df9e193555a9dcc \
+    --hash=sha256:03853ed82eeebbce3c2abfdbc98c96dc205f32a79627688ac9a27370ea61a49c \
+    --hash=sha256:07d9e39b01743c3717745f4c530a6349eadbfa043c7577eef86c502c15df2c67 \
+    --hash=sha256:08e721811161356f97b4059a9ba7bafb23ea5ee2255402c42881c214e173c6b4 \
+    --hash=sha256:0c96c3b819b5c3e9e165495db84d41914d6894d55181d2d108cc1a69bfc9cce0 \
+    --hash=sha256:0ea948db76d31190bf08bd371623927ee1339d5f2a0b4b1b4a4439a65298703c \
+    --hash=sha256:0f7eb884681e3938906ed0434f20c63046eacd0111c4ba96f27b76084cd679f5 \
+    --hash=sha256:12a6fff75f6bc66711b73a2f0addfc4c8c15a20e805146a02d147a318962c444 \
+    --hash=sha256:12d8baf840cc7889b37c7c770f478adea7adce3dcb3944d02ec87508e2dcf153 \
+    --hash=sha256:14265bfe1f09498b9d8ec91e9ec9fa52775edf90fcbde092b25f4a33d444fea9 \
+    --hash=sha256:16d971e29578a5e97d7117866d15889a4a07befe0e87e703ed63cd90cb348c01 \
+    --hash=sha256:177a0ba5f0211d488e295aaf82707237e331c24788d8d76c96c5a41594723217 \
+    --hash=sha256:1a87ca9d5df6fe460483d9a5bbf2b18f620cbed41b432e2bddb686228282d10b \
+    --hash=sha256:1c2a768fdd44ee4a9339a9b0b130049139b8ce3c01d2ce09f67f5a68048d477c \
+    --hash=sha256:1c2aed2e5e41f24ea8ef1590b8e848a79b56f3a5564a65ceec43c9d692dc7d8a \
+    --hash=sha256:1dc8b0ea451d6e69735094606991f32867807881400f808a106ee1d963c46a83 \
+    --hash=sha256:1efde3cae86c8c273f1eb3b287be7d8499420cf2fe7585c41d370d3e790054a5 \
+    --hash=sha256:202389074300232baeb53ae2569a60901f7efadd4245cf3a3bf0617d60b439d7 \
+    --hash=sha256:203104ed3e428044fd943bc4bf45fa73c0730391f9621e37fe39ecf477b128cb \
+    --hash=sha256:2257141f39fe65a3fdf38aeccae4b953e5f3b3324f4ff0daf9f15b8518666a2c \
+    --hash=sha256:298930cec56029e05497a76988377cbd7457ba864beeea92ad7e844fe74cd1f1 \
+    --hash=sha256:2cd4a60d0e2fb04537162c62bbbb4182f53541fe0ede35cdf270a1c1e723cc42 \
+    --hash=sha256:2d6eb928e13016cea4f1f21d1e10c1cebd5a421bc57ddf5b1142ae3f86824fab \
+    --hash=sha256:2fe249cb4651fd12605b7288b24751d8bfd46d35f12a20b1ba33dea122e690df \
+    --hash=sha256:30b8d1d8c52a48c2c5690e152c169b673487a2a58de1ec7393196753063fcd5e \
+    --hash=sha256:320ade88cfb846b8cd6b4ddf5ee9e80ee0c1f52401f2456b84ae1ae6a1a5f207 \
+    --hash=sha256:3534e7dcbdcf757da6b85a0bbf5b6868786d5982dd959b065e65481644817a18 \
+    --hash=sha256:36836d6ff945a00b88ba1e4572d721e60b5b8c98c155d465f56ad19d68f23734 \
+    --hash=sha256:38c0109396c4cfc574d502df99742a45c72c08eff0a36158b6f04000043dbf38 \
+    --hash=sha256:3946fa46a0cf3e4c8cb1cc52f56bb536310d34f25f01ca9b6c16afa767dab110 \
+    --hash=sha256:3bec022aec2c514d9cf199522a802bd007cd588ab17ab2525f20f9c34d067c18 \
+    --hash=sha256:3c9a494bc5ec77d43cea229c4f6db1e4d8fe7e1bbffa8b6f0f0032430ff8ab44 \
+    --hash=sha256:3dce51d0f5e7951f8bb4900c257dad282f49190fdbebecd4ba99bcc41fef404d \
+    --hash=sha256:3dedcc22d73ec993f42055eff4fcfed9318d1eeb9a6606c55892a26964964e48 \
+    --hash=sha256:4042d5c8f957e15221d423ba781e85d553722fc4113f523f2feb7b188cc34c5e \
+    --hash=sha256:481551899c856c704d58119b5025793fa6730adda3571971af568f66d2424bb5 \
+    --hash=sha256:4dc1e73c36828f982bfe79fadf5919923f8a6f4df2860804db9a98c48824ce8d \
+    --hash=sha256:4e5163c14bffd570ef2affbfdd77bba66383890797df43dc8b4cc7d6f500bf53 \
+    --hash=sha256:511ef87c8aec0783e08ac18565a16d435372bc1ac25a91e6ac7f5ef2b0bff790 \
+    --hash=sha256:532bc9bf33a68613fd7d65e4b1c71a6a38d7d42604ecf239c77392e9b4e8998c \
+    --hash=sha256:54523e136b8948060c0fa0bc7b1b50c32c186f2fceee897a495406bb6e311d2b \
+    --hash=sha256:5649fd1c7bade02f320a462fdefd0b4bd3ce036065836d4f42e0de958038e116 \
+    --hash=sha256:56be790f86bfb2c98fb742ce566dfb4816e5a83384616ab59c49e0604d49c51d \
+    --hash=sha256:5b77459df20e08151cd6f8b9ef8ef1f961ef73d85c21a555c7eed5b79410ec10 \
+    --hash=sha256:5ed6ab538499c8644b8a3e18debabcd7ce684f3fa91cf867521a7a0279cab2d6 \
+    --hash=sha256:6178f72c5508bfc5fd446a5905e698c6212932f25bcdd4b47a757a50605a90e2 \
+    --hash=sha256:6370e8686f662e6a3941ee48ed4742317cafbe5707e36406e9df792cdb535776 \
+    --hash=sha256:64f02c6841d7d83f832cd97ccf8eb8a906d06eb95d5276069175c696b024b60a \
+    --hash=sha256:65bcd23054beab4d166035cabbc868a09c1a49d1efe458fe8e4361215df40265 \
+    --hash=sha256:66671f93accb62ed07da56613636f3641f1a12c13046ce91ffc923721f23c008 \
+    --hash=sha256:6696b7688f54f5af4462118f0bfa7c1621eeb87154f77fa04b9295ce7a8f2943 \
+    --hash=sha256:6785f414ae0f3c733c437e0f3929197934f526d19dfaa75e18fdb4f94c6fb374 \
+    --hash=sha256:67f6279d125ca0046a7fd386d01b311c6363844deac3e5b069b514ba3e63c246 \
+    --hash=sha256:6c114670c45346afedc0d947faf3c7f701051d2518b943679c8ff88befe14f8e \
+    --hash=sha256:6e0d51f618228538a3e8f46bd246f87a6cd030565e015803691603f55e12afb5 \
+    --hash=sha256:6ed74185b2db44f41ef35fd1617c5888e59792da9bbc9190d6c7300617182616 \
+    --hash=sha256:708838739abf24b2ceb208d0e22403dd018faeef86ddac04319a62ae884c4f15 \
+    --hash=sha256:715479b9a2802ecac752a3b0efa2b0b60285cf962ee38414211abdfccc233b41 \
+    --hash=sha256:733784b6d6def852c814bce5f318d25da2ee65dd4839a0718641c696e09a2960 \
+    --hash=sha256:750e02e074872a3fad7f233b47734166440af3cdea0add3e95163110816d6752 \
+    --hash=sha256:752a45dc4a6934060b3b0dab47e04edc3326575f82be64bc4fc293914566503e \
+    --hash=sha256:7579e913a5339fb8fa133f6bbcfd8e6749696206cf05acdbdca71a1b436d8e72 \
+    --hash=sha256:7641bb8895e77f921102f72833904dcd9901df5d6d72a2ab8f31d04b7e51e4e7 \
+    --hash=sha256:7804338df6fcc08105c7745f1502ba68d900f45fd770d5bdd5288ddccb8a42d8 \
+    --hash=sha256:80d04837f55fc81da168b98de4f4b797ef007fc8a79ab71c6ec9bc4dd662b15b \
+    --hash=sha256:813c0e0132266c08eb87469a642cb30aaff57c5f426255419572aaeceeaa7bf4 \
+    --hash=sha256:82b271f5137d07749f7bf32f70b17ab6eaabedd297e75dce75081a24f76eb545 \
+    --hash=sha256:84c018e49c3bf790f9c2771c45e9313a08c2c2a6342b162cd650258b57817706 \
+    --hash=sha256:8751d2787c9131302398b11e6c8068053dcb55d5a8964e114b6e196cf16cb366 \
+    --hash=sha256:8778f0c7a52e56f75d12dae53ae320fae900a8b9b4164b981b9c5ce059cd1fcb \
+    --hash=sha256:87fad7d9ba98c86bcb41b2dc8dbb326619be2562af1f8ff50776a39e55721c5a \
+    --hash=sha256:8d828b6667a32a728a1ad1d93957cdf37489c57b97ae6c4de2860fa749b8fc1e \
+    --hash=sha256:8e385e4267ab76874ae30db04c627faaaf0b509e1ccc11a95b3fc3e83f855c00 \
+    --hash=sha256:92a0a01ead5e668468e952e4238cccd7c537364eb7d851ab144ab6627dbbe12f \
+    --hash=sha256:94e1885b270625a9a828c9793b4d52a64445299baa1fea5a173bf1d3dd9a1a5a \
+    --hash=sha256:a180c5e59792af262bf263b21a3c49353f25945d8d9f70628e73de370d55e1e1 \
+    --hash=sha256:a277ab8928b9f299723bc1a2dabb1265911b1a76341f90a510368ca44ad9ab66 \
+    --hash=sha256:a5fe03b42827c13cdccd08e6c0247b6a6d4b5e3cdc53fd1749f5896adcdc2356 \
+    --hash=sha256:a6c5863edfbe888d9eff9c8b8087354e27618d9da76425c119293f11712a6319 \
+    --hash=sha256:a89c23ef8d2c6b27fd200a42aa4ac72786e7c60d40efdc76e6011260b6e949c4 \
+    --hash=sha256:adb2597b428735679446b46c8badf467b4ca5f5056aae4d51a19f9570301b1ad \
+    --hash=sha256:ae196f021b5e7c78e918242d217db021ed2a6ace2bc6ae94c0fc596221c7f58d \
+    --hash=sha256:ae89db9e5f98a11a4bf50407d4363e7b09b31e55bc117b4f7d80aab97ba009e5 \
+    --hash=sha256:aed52fea0513bac0ccde438c188c8a471c4e0f457c2dd20cdbf6ea7a450046c7 \
+    --hash=sha256:aef65cd602a6d0e0ff6f9930fcb1c8fec60dd2cfcb6facaf4bdb0e5873042db0 \
+    --hash=sha256:af21eb4409a119e365397b2adbaca4c9ccab56543a65d5dbd9f920d6ac29f686 \
+    --hash=sha256:b14b2d9dac08e28bb8046a1a0434b1750eb221c8f5b87a68f4fa11a6f97b5e34 \
+    --hash=sha256:bb6d88045545b26da47aa879dd4a89a71d1dce0f0e549b1abcb31dfe4a8eac49 \
+    --hash=sha256:bb8cc7534f51d9a017b93e3e85b260924f909601c3df002bcdb58ddb4dc41a5c \
+    --hash=sha256:bc17a677b21b3502a21f66a8cc64f5bfad4df8a0b8434d661666f8ce90ac3af1 \
+    --hash=sha256:bd6c2a1c7573c64738d716488d2cdd3c00e340e4835707d8fdb8dc1a66ef164e \
+    --hash=sha256:bd9b23791fe793e4968dba0c447e12f78e425c59fc0e3b97f6450f4781f3ee60 \
+    --hash=sha256:c03a41a8784091e67a39648f70c5f97b5b6a37f216896d44d2cdcb82615339a0 \
+    --hash=sha256:c0f081d69a6e58272819b70288d3221a6ee64b98df852631c80f293514d3b274 \
+    --hash=sha256:c35abb8bfff0185efac5878da64c45dafd2b37fb0383add1be155a763c1f083d \
+    --hash=sha256:c36c333c39be2dbca264d7803333c896ab8fa7d4d6f0ab7edb7dfd7aea6e98c0 \
+    --hash=sha256:c45e9440fb78f8ddabcf714b68f936737a121355bf59f3907f4e17721b9d1aae \
+    --hash=sha256:c593052c465475e64bbfe5dbd81680f64a67fdc752c56d7a0ae205dc8aeefe0f \
+    --hash=sha256:cdd68a1fb318e290a2077696b7eb7a21a49163c455979c639bf5a5dcdc46617d \
+    --hash=sha256:ce3412fbe1e31eb81ea42f4169ed94861c56e643189e1e75f0041f3fe7020abe \
+    --hash=sha256:cf1493cd8607bec4d8a7b9b004e699fcf8f9103a9284cc94962cb73d20f9d4a3 \
+    --hash=sha256:cf29836da5119f3c8a8a70667b0ef5fdca3bb12f80fd06487cfa575b3909b393 \
+    --hash=sha256:d4a48e5b3c2a489fae013b7589308a40146ee081f6f509e047e0e096084ceca1 \
+    --hash=sha256:d560742f3c0d62afaccf9f41fe485ed69bd7661a241f86a3ef0f0fb8b1a397af \
+    --hash=sha256:d6038d37043bced98a66e68d3aa2b6a35505dc01328cd65217cefe82f25def44 \
+    --hash=sha256:d61f00a0869d77422d9b2aba989e2d24afa6ffd552af442e0e58de4f35ea6d00 \
+    --hash=sha256:d635aab80466bc95771bb78d5370e74d36d1fe31467b6b29b8b57b2a3cd7d22c \
+    --hash=sha256:dca4bbc466a95ba9c0234ef56d7dd9509f63da22274589ebd4ed7f1f4d4c54e3 \
+    --hash=sha256:dd915403e231e6b1809fe9b6d9fc55cf8fb5e02765ac625d9cd623342a7905d7 \
+    --hash=sha256:e044c39e41b92c845bc815e5ae4230804e8e7bc29e399b0437d64222d92809dd \
+    --hash=sha256:e060d01aec0a910bdccb8be71faf34e7799ce36950f8294c8bf612cba65a2c9e \
+    --hash=sha256:e1421b502d83040e6d7fb2fb18dff63957f720da3d77b2fbd3187ceb63755d7b \
+    --hash=sha256:e17b8d5d6a8c47c85e68ca8379def1303fd360c3e22093a807cd34a71cd082b8 \
+    --hash=sha256:e5f4d355f0a2b1a31bc3edec6795b46324349c9cb25eed068049e4f472fb4259 \
+    --hash=sha256:e712b419df8ba5e42b226c510472b37bd57b38e897d3eca5e8cfd410a29fa859 \
+    --hash=sha256:e74327fb75de8986940def6e8dee4f127cc9752bee7355bb323cc5b2659b6d46 \
+    --hash=sha256:e80c8378d8f3d83cd3164da1ad2df9e37a666cdde7b1cb2298ed0b558064be30 \
+    --hash=sha256:e8ac484bf18ce6975760921bb6148041faa8fef0547200386ea0b52b5d27bf7b \
+    --hash=sha256:eca9705049ad3c7345d574e3510665cb2cf844c2f2dcfe675332677f081cbd46 \
+    --hash=sha256:ed065083d0898c9d5b4bbec7b026fd755ff7454e6e8b73a67f8c744b13986e24 \
+    --hash=sha256:edac0f1ab77644605be2cbba52e6b7f630731fc42b34cb0f634be1a6eface56a \
+    --hash=sha256:effc3f449787117233702311a1b7d8f59cba9ced946ba727bdc329ec69028e24 \
+    --hash=sha256:f22dec1690b584cea26fade98b2435c132c1b5f68e39f5a0b7627cd7ae31f1dc \
+    --hash=sha256:f495a1652cf3fbab2eb0639776dad966c2fb874d79d87ca07f9d5f059b8bd215 \
+    --hash=sha256:f496c9c3cc02230093d8330875c4c3cdfc3b73612a5fd921c65d39cbcef08063 \
+    --hash=sha256:f59099f9b66f0d7145115e6f80dd8b1d847176df89b234a5a6b3f00437aa0832 \
+    --hash=sha256:f59ad4c0e8f6bba240a9bb85504faa1ab438237199d4cce5f622761507b8f6a6 \
+    --hash=sha256:fbccdc05410c9ee21bbf16a35f4c1d16123dcdeb8a1d38f33654fa21d0234f79 \
+    --hash=sha256:fea24543955a6a729c45a73fe90e08c743f0b3334bbf3201e6c4bc1b0c7fa464
+    # via requests
+click==8.4.1 \
+    --hash=sha256:482be17c6991b8c19c5429a1e995d9b0efdbb63172824c41f99965dc0ade8ec2 \
+    --hash=sha256:918b5633eddf6b41c32d4f454bf0de810065c74e3f7dbf8ee5452f8be88d3e96
+    # via
+    #   huggingface-hub
+    #   typer
+contourpy==1.3.3 \
+    --hash=sha256:023b44101dfe49d7d53932be418477dba359649246075c996866106da069af69 \
+    --hash=sha256:07ce5ed73ecdc4a03ffe3e1b3e3c1166db35ae7584be76f65dbbe28a7791b0cc \
+    --hash=sha256:083e12155b210502d0bca491432bb04d56dc3432f95a979b429f2848c3dbe880 \
+    --hash=sha256:0bf67e0e3f482cb69779dd3061b534eb35ac9b17f163d851e2a547d56dba0a3a \
+    --hash=sha256:0c1fc238306b35f246d61a1d416a627348b5cf0648648a031e14bb8705fcdfe8 \
+    --hash=sha256:13b68d6a62db8eafaebb8039218921399baf6e47bf85006fd8529f2a08ef33fc \
+    --hash=sha256:15ff10bfada4bf92ec8b31c62bf7c1834c244019b4a33095a68000d7075df470 \
+    --hash=sha256:177fb367556747a686509d6fef71d221a4b198a3905fe824430e5ea0fda54eb5 \
+    --hash=sha256:1cadd8b8969f060ba45ed7c1b714fe69185812ab43bd6b86a9123fe8f99c3263 \
+    --hash=sha256:1fd43c3be4c8e5fd6e4f2baeae35ae18176cf2e5cced681cca908addf1cdd53b \
+    --hash=sha256:22e9b1bd7a9b1d652cd77388465dc358dafcd2e217d35552424aa4f996f524f5 \
+    --hash=sha256:23416f38bfd74d5d28ab8429cc4d63fa67d5068bd711a85edb1c3fb0c3e2f381 \
+    --hash=sha256:283edd842a01e3dcd435b1c5116798d661378d83d36d337b8dde1d16a5fc9ba3 \
+    --hash=sha256:2a2a8b627d5cc6b7c41a4beff6c5ad5eb848c88255fda4a8745f7e901b32d8e4 \
+    --hash=sha256:2b7e9480ffe2b0cd2e787e4df64270e3a0440d9db8dc823312e2c940c167df7e \
+    --hash=sha256:322ab1c99b008dad206d406bb61d014cf0174df491ae9d9d0fac6a6fda4f977f \
+    --hash=sha256:33c82d0138c0a062380332c861387650c82e4cf1747aaa6938b9b6516762e772 \
+    --hash=sha256:348ac1f5d4f1d66d3322420f01d42e43122f43616e0f194fc1c9f5d830c5b286 \
+    --hash=sha256:3519428f6be58431c56581f1694ba8e50626f2dd550af225f82fb5f5814d2a42 \
+    --hash=sha256:3c30273eb2a55024ff31ba7d052dde990d7d8e5450f4bbb6e913558b3d6c2301 \
+    --hash=sha256:3d1a3799d62d45c18bafd41c5fa05120b96a28079f2393af559b843d1a966a77 \
+    --hash=sha256:451e71b5a7d597379ef572de31eeb909a87246974d960049a9848c3bc6c41bf7 \
+    --hash=sha256:459c1f020cd59fcfe6650180678a9993932d80d44ccde1fa1868977438f0b411 \
+    --hash=sha256:4d00e655fcef08aba35ec9610536bfe90267d7ab5ba944f7032549c55a146da1 \
+    --hash=sha256:4debd64f124ca62069f313a9cb86656ff087786016d76927ae2cf37846b006c9 \
+    --hash=sha256:4feffb6537d64b84877da813a5c30f1422ea5739566abf0bd18065ac040e120a \
+    --hash=sha256:50ed930df7289ff2a8d7afeb9603f8289e5704755c7e5c3bbd929c90c817164b \
+    --hash=sha256:51e79c1f7470158e838808d4a996fa9bac72c498e93d8ebe5119bc1e6becb0db \
+    --hash=sha256:556dba8fb6f5d8742f2923fe9457dbdd51e1049c4a43fd3986a0b14a1d815fc6 \
+    --hash=sha256:598c3aaece21c503615fd59c92a3598b428b2f01bfb4b8ca9c4edeecc2438620 \
+    --hash=sha256:5ed3657edf08512fc3fe81b510e35c2012fbd3081d2e26160f27ca28affec989 \
+    --hash=sha256:626d60935cf668e70a5ce6ff184fd713e9683fb458898e4249b63be9e28286ea \
+    --hash=sha256:644a6853d15b2512d67881586bd03f462c7ab755db95f16f14d7e238f2852c67 \
+    --hash=sha256:655456777ff65c2c548b7c454af9c6f33f16c8884f11083244b5819cc214f1b5 \
+    --hash=sha256:66c8a43a4f7b8df8b71ee1840e4211a3c8d93b214b213f590e18a1beca458f7d \
+    --hash=sha256:6afc576f7b33cf00996e5c1102dc2a8f7cc89e39c0b55df93a0b78c1bd992b36 \
+    --hash=sha256:6c3d53c796f8647d6deb1abe867daeb66dcc8a97e8455efa729516b997b8ed99 \
+    --hash=sha256:709a48ef9a690e1343202916450bc48b9e51c049b089c7f79a267b46cffcdaa1 \
+    --hash=sha256:70f9aad7de812d6541d29d2bbf8feb22ff7e1c299523db288004e3157ff4674e \
+    --hash=sha256:8153b8bfc11e1e4d75bcb0bff1db232f9e10b274e0929de9d608027e0d34ff8b \
+    --hash=sha256:87acf5963fc2b34825e5b6b048f40e3635dd547f590b04d2ab317c2619ef7ae8 \
+    --hash=sha256:88df9880d507169449d434c293467418b9f6cbe82edd19284aa0409e7fdb933d \
+    --hash=sha256:929ddf8c4c7f348e4c0a5a3a714b5c8542ffaa8c22954862a46ca1813b667ee7 \
+    --hash=sha256:92d9abc807cf7d0e047b95ca5d957cf4792fcd04e920ca70d48add15c1a90ea7 \
+    --hash=sha256:95b181891b4c71de4bb404c6621e7e2390745f887f2a026b2d99e92c17892339 \
+    --hash=sha256:9e999574eddae35f1312c2b4b717b7885d4edd6cb46700e04f7f02db454e67c1 \
+    --hash=sha256:a15459b0f4615b00bbd1e91f1b9e19b7e63aea7483d03d804186f278c0af2659 \
+    --hash=sha256:a22738912262aa3e254e4f3cb079a95a67132fc5a063890e224393596902f5a4 \
+    --hash=sha256:ab2fd90904c503739a75b7c8c5c01160130ba67944a7b77bbf36ef8054576e7f \
+    --hash=sha256:ab3074b48c4e2cf1a960e6bbeb7f04566bf36b1861d5c9d4d8ac04b82e38ba20 \
+    --hash=sha256:afe5a512f31ee6bd7d0dda52ec9864c984ca3d66664444f2d72e0dc4eb832e36 \
+    --hash=sha256:b08a32ea2f8e42cf1d4be3169a98dd4be32bafe4f22b6c4cb4ba810fa9e5d2cb \
+    --hash=sha256:b20c7c9a3bf701366556e1b1984ed2d0cedf999903c51311417cf5f591d8c78d \
+    --hash=sha256:b2e8faa0ed68cb29af51edd8e24798bb661eac3bd9f65420c1887b6ca89987c8 \
+    --hash=sha256:b7301b89040075c30e5768810bc96a8e8d78085b47d8be6e4c3f5a0b4ed478a0 \
+    --hash=sha256:b7448cb5a725bb1e35ce88771b86fba35ef418952474492cf7c764059933ff8b \
+    --hash=sha256:ca0fdcd73925568ca027e0b17ab07aad764be4706d0a925b89227e447d9737b7 \
+    --hash=sha256:ca658cd1a680a5c9ea96dc61cdbae1e85c8f25849843aa799dfd3cb370ad4fbe \
+    --hash=sha256:cbedb772ed74ff5be440fa8eee9bd49f64f6e3fc09436d9c7d8f1c287b121d77 \
+    --hash=sha256:cd5dfcaeb10f7b7f9dc8941717c6c2ade08f587be2226222c12b25f0483ed497 \
+    --hash=sha256:cf9022ef053f2694e31d630feaacb21ea24224be1c3ad0520b13d844274614fd \
+    --hash=sha256:d002b6f00d73d69333dac9d0b8d5e84d9724ff9ef044fd63c5986e62b7c9e1b1 \
+    --hash=sha256:d06bb1f751ba5d417047db62bca3c8fde202b8c11fb50742ab3ab962c81e8216 \
+    --hash=sha256:d304906ecc71672e9c89e87c4675dc5c2645e1f4269a5063b99b0bb29f232d13 \
+    --hash=sha256:e4e6b05a45525357e382909a4c1600444e2a45b4795163d3b22669285591c1ae \
+    --hash=sha256:e74a9a0f5e3fff48fb5a7f2fd2b9b70a3fe014a67522f79b7cca4c0c7e43c9ae \
+    --hash=sha256:ea37e7b45949df430fe649e5de8351c423430046a2af20b1c1961cae3afcda77 \
+    --hash=sha256:f64836de09927cba6f79dcd00fdd7d5329f3fccc633468507079c829ca4db4e3 \
+    --hash=sha256:fd6ec6be509c787f1caf6b247f0b1ca598bef13f4ddeaa126b7658215529ba0f \
+    --hash=sha256:fd907ae12cd483cd83e414b12941c632a969171bf90fc937d0c9f268a31cafff \
+    --hash=sha256:fd914713266421b7536de2bfa8181aa8c699432b6763a0ea64195ebe28bff6a9 \
+    --hash=sha256:fde6c716d51c04b1c25d0b90364d0be954624a0ee9d60e23e850e8d48353d07a
+    # via matplotlib
+cycler==0.12.1 \
+    --hash=sha256:85cef7cff222d8644161529808465972e51340599459b8ac3ccbac5a854e0d30 \
+    --hash=sha256:88bb128f02ba341da8ef447245a9e138fae777f6a23943da4540077d3601eb1c
+    # via matplotlib
+docutils==0.23 \
+    --hash=sha256:25d013af9bf23bc1c7b2b093dff4208166c53a94786c9e447808335ef1185fea \
+    --hash=sha256:746f5060322511280a1e50eb76846ed6bf2342984b2ac04dc42caa1a8d78799e
+    # via readme-renderer
+filelock==3.29.2 \
+    --hash=sha256:779d2f5443b584750c6b90457abffd49235bfb0e66ce82ef5a680867e518ca1c \
+    --hash=sha256:f5d3feb44b2b8824832587543af5226822fe86baf086678ede47aa177fe47ca5
+    # via
+    #   huggingface-hub
+    #   mlx-gen
+    #   torch
+fonttools==4.63.0 \
+    --hash=sha256:032038247a96c1690f9f31e377c389383c902531b085aa4e4dabd6f57f870e69 \
+    --hash=sha256:063e08bd17bd5a90127a14123de0d6a952dbc847695fd98b63c043d58057f90c \
+    --hash=sha256:0c18358a155d75034911c5ee397a5b44cd19dd325dbb8b35fb60bf421d6a72ac \
+    --hash=sha256:0eac00b9118c3c2f87d272e45341871c5b3066baa3c86897fa634a7c3fb59096 \
+    --hash=sha256:1e874792a8212b44583ea02189d9e693906b2f78b261f372f95d6c563210ac1d \
+    --hash=sha256:22135da48a348785c5e2d5d2d9d6bec5ed44adacbaeb9db12d9493bf6c6bfa68 \
+    --hash=sha256:22693918177bd9ceabec4736d338045f357769416fc6b0b2508eefef75b08616 \
+    --hash=sha256:27fdc65af8da6f88b9c6121c47a464cbe359fcfff7ff6fc2d37a1f395d755b78 \
+    --hash=sha256:2b8ae05d9eacf6081414d759c0a352769ac28ce31280d6bb8e77b03f9e3c449f \
+    --hash=sha256:2c14b4fd138c4bafcca294765c547914e1aa431ae1ca94ab99d8db08c958bd3b \
+    --hash=sha256:308f957cdeaf8abe4e5f2f124902ef405448af92c90f80e302a3b771c2e6116b \
+    --hash=sha256:37dd23e621e3b0aef1baa70a303b80aaf38449632cfc8fd2a55fb285bbccfc02 \
+    --hash=sha256:445af2eab030a16b9171ea8bdda7ebf7d96bda2df88ee182a464252f6e05e20d \
+    --hash=sha256:51394295f1a51de8b5f30bdb1e1b9a4231536c7064ef5c6e211eec19fa36036f \
+    --hash=sha256:58dc6bb86a78d782f00f9190ca02c119cf5bbe2807536e361e18d42019f877d8 \
+    --hash=sha256:59ac449f8cca9b4ffa08d2e7bbadad87ce710d69d1eda5c3c1ce579baa987272 \
+    --hash=sha256:6b2248c5decb223562f7902ff6325077a073f608ee8e33e88ad88db734eb9f49 \
+    --hash=sha256:6d4741eb179121cab9eea4cb2393d24492373a260d7945006358c08cfbf45419 \
+    --hash=sha256:6db5140a60a5d731d21ec076745b40a310607731b0a565b50776393188649001 \
+    --hash=sha256:6e528da43bc3791085f8cb6141b1d13e459226790240340fcbb4625649238b03 \
+    --hash=sha256:796f27556dbe094c4824f75ca85267e4df776c79036c8441469a4df37038c196 \
+    --hash=sha256:79cdc9f567aec74a72918fd060283911406750cbc9fd28c1316023deb6ce31a9 \
+    --hash=sha256:7d76edbff9014094dbf03bd2d074709dfa6ec7aba13d838c937a2b33d2d6a86e \
+    --hash=sha256:7d782fac32985914c351556f68ac0855391572bcd87de50e05970d3cd4c96fc5 \
+    --hash=sha256:7dd683fef0663e9f0f45cf541d788d24caa3ec9db50796b588e1757d8b3bc007 \
+    --hash=sha256:85be818f5506e8a7753153def2c9550178f0ecae6a47b5e0e8dbb23f7cc90380 \
+    --hash=sha256:948428a275741f0b64b113c955425a953314f4b9ab9997f73a72c83e68e569c8 \
+    --hash=sha256:9ced0bd02ac751dd6319b0da88aaef24414e3b0dbc32bb4f24944821a3741a27 \
+    --hash=sha256:9e12f105d2b6342c559c298afb674006bb2893afc7102dcf8a1b55b0486b4e40 \
+    --hash=sha256:a8b33a82979e0a6a34ff435cc81317be1f95ec1ebb7a3a2d1c8a6a54f02ae44e \
+    --hash=sha256:a9faff9e0c1f76f9fd55899d2ce785832efebab37eb8ae13995853aef178bef0 \
+    --hash=sha256:af2fd1664d00a397d75f806985ddb36282091c2131a73a6485c23b4a34722263 \
+    --hash=sha256:afefc1ed0a59785a7fb06ea7e1678e849c193e1e387db783579bc7b3056fcfcb \
+    --hash=sha256:b1cd75a03ad8cb5bc40c90bfde68c0c47de423aa19e5c0f362b43520645eea94 \
+    --hash=sha256:ba04cb5891d4c0c21b6da95eda8d7b090021508a294fff33464fc7d241e0856b \
+    --hash=sha256:bf00f21eb5fb721dbaf73d1e9da6d02a1af7768f2ebcf9798be98beab8ba90f6 \
+    --hash=sha256:c0425b277a59cff3d80ca42162a8de360f318438a2ac83570842a678d826d579 \
+    --hash=sha256:c1aaa4b9c75798400ac043ce04d74e7830376c85095a5a6ed7cba2f17a266bf4 \
+    --hash=sha256:c2a2a42198b696a6f48fad91709afb55176e66a5e566131219dba372fb7f8c59 \
+    --hash=sha256:caeb583deeb5168e694b65cda8b4ee62abedfa66cf88488734466f2366b9c4e0 \
+    --hash=sha256:cb014d58140a38135f16064c74c652ed57aa0b75cbf8bb59cac821f7edb5334e \
+    --hash=sha256:ccf41f2efdf56994d22d73bef4ced1052161958169428d06ba9724ea9e9a64be \
+    --hash=sha256:cd7e9857e5e63738b9d9fd707bc1f59c8b09e5177726d23664db393c59bb08bd \
+    --hash=sha256:d76ac49f929aecaf82d83250b8347e099d7aecba0f4726c1d9b6df3b8bb5fe18 \
+    --hash=sha256:d7e5c9973aa04c95650c96e5f5ad865fbf42d62079163ecfab1e01cbc2504c22 \
+    --hash=sha256:dcf076a4474fe0d7367e5bbf5b052c7284fa1feca729c04176ce513521afd8a0 \
+    --hash=sha256:e3297a6a4059b4acc3a1e9a8b04741f240a80044eef08ebd32e8b5bcdddce75b \
+    --hash=sha256:ee08ebfa58f6e1aeff5697ab9582105bb620008c1caafb681e4c557e7483027b \
+    --hash=sha256:ef3048ef05dbb552b89817713d9cac912e00d0fde4a3105c00d29e52e10c89af \
+    --hash=sha256:fd1e3094f42d806d3d7c79162fc59e5910fcbe3a7360c385b8da969bc4493745
+    # via
+    #   matplotlib
+    #   mlx-gen
+fsspec==2026.4.0 \
+    --hash=sha256:11ef7bb35dab8a394fde6e608221d5cf3e8499401c249bebaeaad760a1a8dec2 \
+    --hash=sha256:301d8ac70ae90ef3ad05dcf94d6c3754a097f9b5fe4667d2787aa359ec7df7e4
+    # via
+    #   huggingface-hub
+    #   torch
+h11==0.16.0 \
+    --hash=sha256:4e35b956cf45792e4caa5885e69fba00bdbc6ffafbfa020300e549b208ee5ff1 \
+    --hash=sha256:63cf8bbe7522de3bf65932fda1d9c2772064ffb3dae62d55932da54b31cb6c86
+    # via httpcore
+hf-transfer==0.1.9 \
+    --hash=sha256:035572865dab29d17e783fbf1e84cf1cb24f3fcf8f1b17db1cfc7fdf139f02bf \
+    --hash=sha256:0d991376f0eac70a60f0cbc95602aa708a6f7c8617f28b4945c1431d67b8e3c8 \
+    --hash=sha256:16f208fc678911c37e11aa7b586bc66a37d02e636208f18b6bc53d29b5df40ad \
+    --hash=sha256:1a6bd16c667ebe89a069ca163060127a794fa3a3525292c900b8c8cc47985b0d \
+    --hash=sha256:2c7fc1b85f4d0f76e452765d7648c9f4bfd0aedb9ced2ae1ebfece2d8cfaf8e2 \
+    --hash=sha256:3a736dfbb2c84f5a2c975478ad200c0c8bfcb58a25a35db402678fb87ce17fa4 \
+    --hash=sha256:3ebc4ab9023414880c8b1d3c38174d1c9989eb5022d37e814fa91a3060123eb0 \
+    --hash=sha256:435cc3cdc8524ce57b074032b8fd76eed70a4224d2091232fa6a8cef8fd6803e \
+    --hash=sha256:504b8427fd785dd8546d53b9fafe6e436bd7a3adf76b9dce556507650a7b4567 \
+    --hash=sha256:57fd9880da1ee0f47250f735f791fab788f0aa1ee36afc49f761349869c8b4d9 \
+    --hash=sha256:5828057e313de59300dd1abb489444bc452efe3f479d3c55b31a8f680936ba42 \
+    --hash=sha256:5d561f0520f493c66b016d99ceabe69c23289aa90be38dd802d2aef279f15751 \
+    --hash=sha256:6e94e8822da79573c9b6ae4d6b2f847c59a7a06c5327d7db20751b68538dc4f6 \
+    --hash=sha256:8669dbcc7a3e2e8d61d42cd24da9c50d57770bd74b445c65123291ca842a7e7a \
+    --hash=sha256:8674026f21ed369aa2a0a4b46000aca850fc44cd2b54af33a172ce5325b4fc82 \
+    --hash=sha256:89a23f58b7b7effbc047b8ca286f131b17728c99a9f972723323003ffd1bb916 \
+    --hash=sha256:8fd0167c4407a3bc4cdd0307e65ada2294ec04f1813d8a69a5243e379b22e9d8 \
+    --hash=sha256:a5b366d34cd449fe9b20ef25941e6eef0460a2f74e7389f02e673e1f88ebd538 \
+    --hash=sha256:cdca9bfb89e6f8f281890cc61a8aff2d3cecaff7e1a4d275574d96ca70098557 \
+    --hash=sha256:d2fde99d502093ade3ab1b53f80da18480e9902aa960dab7f74fb1b9e5bc5746 \
+    --hash=sha256:dc7fff1345980d6c0ebb92c811d24afa4b98b3e07ed070c8e38cc91fd80478c5 \
+    --hash=sha256:e66acf91df4a8b72f60223059df3003062a5ae111757187ed1a06750a30e911b \
+    --hash=sha256:e6ac4eddcd99575ed3735ed911ddf9d1697e2bd13aa3f0ad7e3904dd4863842e \
+    --hash=sha256:ee8b10afedcb75f71091bcc197c526a6ebf5c58bbbadb34fdeee6160f55f619f \
+    --hash=sha256:fc6bd19e1cc177c66bdef15ef8636ad3bde79d5a4f608c158021153b4573509d
+    # via mlx-gen
+hf-xet==1.5.1 \
+    --hash=sha256:0c97106032ef70467b4f6bc2d0ccc266d7613ee076afc56516c502f87ce1c4a6 \
+    --hash=sha256:3474760d10e3bb6f92ff3f024fcb00c0b3e4001e9b035c7483e49a5dd17aa70f \
+    --hash=sha256:4f561cbbb92f80960772059864b7fb07eae879adde1b2e781ec6f86f6ac26c59 \
+    --hash=sha256:51ef4500dab3764b41135ee1381a4b62ce56fc54d4c92b719b59e597d6df5bf6 \
+    --hash=sha256:6071d5ccb4d8d2cbd5fea5cc798da4f0ba3f44e25369591c4e89a4987050e61d \
+    --hash=sha256:6208adb15d192b90e4c2ad2a27ed864359b2cb0f2494eb6d7c7f3699ac02e2bf \
+    --hash=sha256:6762d89b9e3267dfd502b29b2a327b4525f33b17e7b509a78d94e2151a30ce30 \
+    --hash=sha256:6abd35c3221eff63836618ddfb954dcf84798603f71d8e33e3ed7b04acfdbe6e \
+    --hash=sha256:6f7a04a8ad962422e225bc49fbbac99dc1806764b1f3e54dbd154bffa7593947 \
+    --hash=sha256:8298485c1e36e7e67cbd01eeb1376619b7af43d4f1ec245caae306f890a8a32d \
+    --hash=sha256:892e3a3a3aecc12aded8b93cf4f9cd059282c7de0732f7d55026f3abdf474350 \
+    --hash=sha256:93d090b57b211133f6c0dab0205ef5cb6d89162979ba75a74845045cc3063b8e \
+    --hash=sha256:94e761bbd266bf4c03cee73753916062665ce8365aa40ed321f45afcb934b41e \
+    --hash=sha256:97f212a88d14bbf573619a74b7fecb238de77d08fc702e54dec6f78276ca3283 \
+    --hash=sha256:a93df2039190502835b1db8cd7e178b0b7b889fe9ab51299d5ced26e0dd879a4 \
+    --hash=sha256:bf67e6ed10260cef62e852789dc91ebb03f382d5bdc4b1dbeb64763ea275e7d6 \
+    --hash=sha256:c6b6cd08ca095058780b50b8ce4d6cbf6787bcf27841705d58a9d32246e3e47a \
+    --hash=sha256:d48199c2bf4f8df0adc55d31d1368b6ec0e4d4f45bc86b08038089c23db0bed8 \
+    --hash=sha256:dbf48c0d02cf0b2e568944330c60d9120c272dabe013bd892d48e25bc6797577 \
+    --hash=sha256:e1af0de8ca6f190d4294a28b88023db64a1e2d1d719cab044baf75bec569e7a9 \
+    --hash=sha256:e78e4e5192ad2b674c2e1160b651cb9134db974f8ae1835bdfbfb0166b894a43 \
+    --hash=sha256:e7dbb40617410f432182d918e37c12303fe6700fd6aa6c5964e30a535a4461d6 \
+    --hash=sha256:f4ad3ebd4c32dd2b27099d69dc7b2df821e30767e46fb6ee6a0713778243b8ff \
+    --hash=sha256:f61e3665892a6c8c5e765395838b8ddf36185da835253d4bc4509a81e49fb342 \
+    --hash=sha256:f7b3002f95d1c13e24bcb4537baa8f0eb3838957067c91bb4959bc004a6435f5
+    # via huggingface-hub
+httpcore==1.0.9 \
+    --hash=sha256:2d400746a40668fc9dec9810239072b40b4484b640a8c38fd654a024c7a1bf55 \
+    --hash=sha256:6e34463af53fd2ab5d807f399a9b45ea31c3dfa2276f15a2c3f00afff6e176e8
+    # via httpx
+httpx==0.28.1 \
+    --hash=sha256:75e98c5f16b0f35b567856f597f06ff2270a374470a5c2392242528e3e3e42fc \
+    --hash=sha256:d909fcccc110f8c7faf814ca82a9a4d816bc5a6dbfea25d6591d6985b8ba59ad
+    # via huggingface-hub
+huggingface-hub==1.18.0 \
+    --hash=sha256:729be4a976fb706dcc02d176bcda8a3f32bdf21a294e8f4b3dda6fbcbc9c1ab1 \
+    --hash=sha256:f0c5ecd1ef8c6a60f86f61ee278f2c1570ba9e279c9f54de9094210723b3613b
+    # via
+    #   mlx-gen
+    #   tokenizers
+    #   transformers
+id==1.6.1 \
+    --hash=sha256:d0732d624fb46fd4e7bc4e5152f00214450953b9e772c182c1c22964def1a069 \
+    --hash=sha256:f5ec41ed2629a508f5d0988eda142e190c9c6da971100612c4de9ad9f9b237ca
+    # via twine
+idna==3.18 \
+    --hash=sha256:7f952cbe720b688055e3f87de14f5c3e5fdaa8bc3928985c4077ca689de849a2 \
+    --hash=sha256:ffb385a7e039654cef1ab9ef32c6fafe283c0c0467bba1d9029738ce4a14a848
+    # via
+    #   anyio
+    #   httpx
+    #   requests
+jaraco-classes==3.4.0 \
+    --hash=sha256:47a024b51d0239c0dd8c8540c6c7f484be3b8fcf0b2d85c13825780d3b3f3acd \
+    --hash=sha256:f662826b6bed8cace05e7ff873ce0f9283b5c924470fe664fff1c2f00f581790
+    # via keyring
+jaraco-context==6.1.2 \
+    --hash=sha256:bf8150b79a2d5d91ae48629d8b427a8f7ba0e1097dd6202a9059f29a36379535 \
+    --hash=sha256:f1a6c9d391e661cc5b8d39861ff077a7dc24dc23833ccee564b234b81c82dfe3
+    # via keyring
+jaraco-functools==4.5.0 \
+    --hash=sha256:3bb5665ea4a020cf78a7040e89154c77edadb3ca74f366479669c5999aa70b03 \
+    --hash=sha256:79ce39246eddbde4b3a03b77ea5f0f7878dc669b166a66cf3fa8e266aa3fa2f4
+    # via keyring
+jinja2==3.1.6 \
+    --hash=sha256:0137fb05990d35f1275a587e9aee6d56da821fc83491a0fb838183be43f66d6d \
+    --hash=sha256:85ece4451f492d0c13c5dd7c13a64681a86afae63a5f347908daf103ce6d2f67
+    # via torch
+keyring==25.7.0 \
+    --hash=sha256:be4a0b195f149690c166e850609a477c532ddbfbaed96a404d4e43f8d5e2689f \
+    --hash=sha256:fe01bd85eb3f8fb3dd0405defdeac9a5b4f6f0439edbb3149577f244a2e8245b
+    # via twine
+kiwisolver==1.5.0 \
+    --hash=sha256:012b1eb16e28718fa782b5e61dc6f2da1f0792ca73bd05d54de6cb9561665fc9 \
+    --hash=sha256:01808c6d15f4c3e8559595d6d1fe6411c68e4a3822b4b9972b44473b24f4e679 \
+    --hash=sha256:0255a027391d52944eae1dbb5d4cc5903f57092f3674e8e544cdd2622826b3f0 \
+    --hash=sha256:0b85aad90cea8ac6797a53b5d5f2e967334fa4d1149f031c4537569972596cb8 \
+    --hash=sha256:0bf3acf1419fa93064a4c2189ac0b58e3be7872bf6ee6177b0d4c63dc4cea276 \
+    --hash=sha256:0c50b89ffd3e1a911c69a1dd3de7173c0cd10b130f56222e57898683841e4f96 \
+    --hash=sha256:0cbe94b69b819209a62cb27bdfa5dc2a8977d8de2f89dfd97ba4f53ed3af754e \
+    --hash=sha256:0df54df7e686afa55e6f21fb86195224a6d9beb71d637e8d7920c95cf0f89aac \
+    --hash=sha256:0e3aafb33aed7479377e5e9a82e9d4bf87063741fc99fc7ae48b0f16e32bdd6f \
+    --hash=sha256:12e91c215a96e39f57989c8912ae761286ac5a9584d04030ceb3368a357f017a \
+    --hash=sha256:1465387ac63576c3e125e5337a6892b9e99e0627d52317f3ca79e6930d889d15 \
+    --hash=sha256:16b85d37c2cbb3253226d26e64663f755d88a03439a9c47df6246b35defbdfb7 \
+    --hash=sha256:1b0feb50971481a2cc44d94e88bdb02cdd497618252ae226b8eb1201b957e368 \
+    --hash=sha256:1d49a49ac4cbfb7c1375301cd1ec90169dfeae55ff84710d782260ce77a75a02 \
+    --hash=sha256:1d9daea4ea6b9be74fe2f01f7fbade8d6ffab263e781274cffca0dba9be9eec9 \
+    --hash=sha256:1dd9b0b119a350976a6d781e7278ec7aca0b201e1a9e2d23d9804afecb6ca681 \
+    --hash=sha256:1f1489f769582498610e015a8ef2d36f28f505ab3096d0e16b4858a9ec214f57 \
+    --hash=sha256:2517e24d7315eb51c10664cdb865195df38ab74456c677df67bb47f12d088a27 \
+    --hash=sha256:295d9ffe712caa9f8a3081de8d32fc60191b4b51c76f02f951fd8407253528f4 \
+    --hash=sha256:2a075bd7bd19c70cf67c8badfa36cf7c5d8de3c9ddb8420c51e10d9c50e94920 \
+    --hash=sha256:32cc0a5365239a6ea0c6ed461e8838d053b57e397443c0ca894dcc8e388d4374 \
+    --hash=sha256:332b4f0145c30b5f5ad9374881133e5aa64320428a57c2c2b61e9d891a51c2f3 \
+    --hash=sha256:377815a8616074cabbf3f53354e1d040c35815a134e01d7614b7692e4bf8acfa \
+    --hash=sha256:38f4a703656f493b0ad185211ccfca7f0386120f022066b018eb5296d8613e23 \
+    --hash=sha256:3ac2360e93cb41be81121755c6462cff3beaa9967188c866e5fce5cf13170859 \
+    --hash=sha256:3c4923e404d6bcd91b6779c009542e5647fef32e4a5d75e115e3bbac6f2335eb \
+    --hash=sha256:3cdcb35dc9d807259c981a85531048ede628eabcffb3239adf3d17463518992d \
+    --hash=sha256:41024ed50e44ab1a60d3fe0a9d15a4ccc9f5f2b1d814ff283c8d01134d5b81bc \
+    --hash=sha256:413b820229730d358efd838ecbab79902fe97094565fdc80ddb6b0a18c18a581 \
+    --hash=sha256:4432b835675f0ea7414aab3d37d119f7226d24869b7a829caeab49ebda407b0c \
+    --hash=sha256:4db576bb8c3ef9365f8b40fe0f671644de6736ae2c27a2c62d7d8a1b4329f099 \
+    --hash=sha256:4e7f886f47ab881692f278ae901039a234e4025a68e6dfab514263a0b1c4ae05 \
+    --hash=sha256:4e9750bc21b886308024f8a54ccb9a2cc38ac9fa813bf4348434e3d54f337ff9 \
+    --hash=sha256:5060731cc3ed12ca3a8b57acd4aeca5bbc2f49216dd0bec1650a1acd89486bcd \
+    --hash=sha256:50847dca5d197fcbd389c805aa1a1cf32f25d2e7273dc47ab181a517666b68cc \
+    --hash=sha256:5092eb5b1172947f57d6ea7d89b2f29650414e4293c47707eb499ec07a0ac796 \
+    --hash=sha256:5124d1ea754509b09e53738ec185584cc609aae4a3b510aaf4ed6aa047ef9303 \
+    --hash=sha256:51e8c4084897de9f05898c2c2a39af6318044ae969d46ff7a34ed3f96274adca \
+    --hash=sha256:530a3fd64c87cffa844d4b6b9768774763d9caa299e9b75d8eca6a4423b31314 \
+    --hash=sha256:56fa888f10d0f367155e76ce849fa1166fc9730d13bd2d65a2aa13b6f5424489 \
+    --hash=sha256:58f812017cd2985c21fbffb4864d59174d4903dd66fa23815e74bbc7a0e2dd57 \
+    --hash=sha256:59cd8683f575d96df5bb48f6add94afc055012c29e28124fcae2b63661b9efb1 \
+    --hash=sha256:5ae8e62c147495b01a0f4765c878e9bfdf843412446a247e28df59936e99e797 \
+    --hash=sha256:5b233ea3e165e43e35dba1d2b8ecc21cf070b45b65ae17dd2747d2713d942021 \
+    --hash=sha256:6176c1811d9d5a04fa391c490cc44f451e240697a16977f11c6f722efb9041db \
+    --hash=sha256:62f59da443c4f4849f73a51a193b1d9d258dcad0c41bc4d1b8fb2bcc04bfeb22 \
+    --hash=sha256:6783e069732715ad0c3ce96dbf21dbc2235ab0593f2baf6338101f70371f4028 \
+    --hash=sha256:6ab8ba9152203feec73758dad83af9a0bbe05001eb4639e547207c40cfb52083 \
+    --hash=sha256:70d593af6a6ca332d1df73d519fddb5148edb15cd90d5f0155e3746a6d4fcc65 \
+    --hash=sha256:72ec46b7eba5b395e0a7b63025490d3214c11013f4aacb4f5e8d6c3041829588 \
+    --hash=sha256:7a32f72973f0f950c1920475d5c5ea3d971b81b6f0ec53b8d0a956cc965f22e0 \
+    --hash=sha256:7a4aa69609f40fce3cbc3f87b2061f042eee32f94b8f11db707b66a26461591a \
+    --hash=sha256:7c60d3c9b06fb23bd9c6139281ccbdc384297579ae037f08ae90c69f6845c0b1 \
+    --hash=sha256:800ee55980c18545af444d93fdd60c56b580db5cc54867d8cbf8a1dc0829938c \
+    --hash=sha256:80aa065ffd378ff784822a6d7c3212f2d5f5e9c3589614b5c228b311fd3063ac \
+    --hash=sha256:86e0287879f75621ae85197b0877ed2f8b7aa57b511c7331dce2eb6f4de7d476 \
+    --hash=sha256:893ff3a711d1b515ba9da14ee090519bad4610ed1962fbe298a434e8c5f8db53 \
+    --hash=sha256:89fc958c702ee9a745e4700378f5d23fddbc46ff89e8fdbf5395c24d5c1452a3 \
+    --hash=sha256:8c63c91f95173f9c2a67c7c526b2cea976828a0e7fced9cdcead2802dc10f8a4 \
+    --hash=sha256:8df31fe574b8b3993cc61764f40941111b25c2d9fea13d3ce24a49907cd2d615 \
+    --hash=sha256:8f9baf6f0a6e7571c45c8863010b45e837c3ee1c2c77fcd6ef423be91b21fedb \
+    --hash=sha256:9027d773c4ff81487181a925945743413f6069634d0b122d0b37684ccf4f1e18 \
+    --hash=sha256:9190426b7aa26c5229501fa297b8d0653cfd3f5a36f7990c264e157cbf886b3b \
+    --hash=sha256:940dda65d5e764406b9fb92761cbf462e4e63f712ab60ed98f70552e496f3bf1 \
+    --hash=sha256:94eff26096eb5395136634622515b234ecb6c9979824c1f5004c6e3c3c85ccd2 \
+    --hash=sha256:9eed0f7edbb274413b6ee781cca50541c8c0facd3d6fd289779e494340a2b85c \
+    --hash=sha256:ad4ae4ffd1ee9cd11357b4c66b612da9888f4f4daf2f36995eda64bd45370cac \
+    --hash=sha256:b0f172dc8ffaccb8522d7c5d899de00133f2f1ca7b0a49b7da98e901de87bf2d \
+    --hash=sha256:b2af221f268f5af85e776a73d62b0845fc8baf8ef0abfae79d29c77d0e776aaf \
+    --hash=sha256:b7d335370ae48a780c6e6a6bbfa97342f563744c39c35562f3f367665f5c1de2 \
+    --hash=sha256:b83af57bdddef03c01a9138034c6ff03181a3028d9a1003b301eb1a55e161a3f \
+    --hash=sha256:bb5136fb5352d3f422df33f0c879a1b0c204004324150cc3b5e3c4f310c9049f \
+    --hash=sha256:bc4d8e252f532ab46a1de9349e2d27b91fce46736a9eedaa37beaca66f574ed4 \
+    --hash=sha256:bdd3e53429ff02aa319ba59dfe4ceeec345bf46cf180ec2cf6fd5b942e7975e9 \
+    --hash=sha256:be12f931839a3bdfe28b584db0e640a65a8bcbc24560ae3fdb025a449b3d754e \
+    --hash=sha256:be4a51a55833dc29ab5d7503e7bcb3b3af3402d266018137127450005cdfe737 \
+    --hash=sha256:beb7f344487cdcb9e1efe4b7a29681b74d34c08f0043a327a74da852a6749e7b \
+    --hash=sha256:bf4679a3d71012a7c2bf360e5cd878fbd5e4fcac0896b56393dec239d81529ed \
+    --hash=sha256:c0e1403fd7c26d77c1f03e096dc58a5c726503fa0db0456678b8668f76f521e3 \
+    --hash=sha256:c31c13da98624f957b0fb1b5bae5383b2333c2c3f6793d9825dd5ce79b525cb7 \
+    --hash=sha256:c438f6ca858697c9ab67eb28246c92508af972e114cac34e57a6d4ba17a3ac08 \
+    --hash=sha256:c8277104ded0a51e699c8c3aff63ce2c56d4ed5519a5f73e0fd7057f959a2b9e \
+    --hash=sha256:c95cab08d1965db3d84a121f1c7ce7479bdd4072c9b3dafd8fecce48a2e6b902 \
+    --hash=sha256:cc0b66c1eec9021353a4b4483afb12dfd50e3669ffbb9152d6842eb34c7e29fd \
+    --hash=sha256:cdee07c4d7f6d72008d3f73b9bf027f4e11550224c7c50d8df1ae4a37c1402a6 \
+    --hash=sha256:ce9bf03dad3b46408c08649c6fbd6ca28a9fce0eb32fdfffa6775a13103b5310 \
+    --hash=sha256:cff8e5383db4989311f99e814feeb90c4723eb4edca425b9d5d9c3fefcdd9537 \
+    --hash=sha256:d168fda2dbff7b9b5f38e693182d792a938c31db4dac3a80a4888de603c99554 \
+    --hash=sha256:d1ffeb80b5676463d7a7d56acbe8e37a20ce725570e09549fe738e02ca6b7e1e \
+    --hash=sha256:d36ca54cb4c6c4686f7cbb7b817f66f5911c12ddb519450bbe86707155028f87 \
+    --hash=sha256:d4193f3d9dc3f6f79aaed0e5637f45d98850ebf01f7ca20e69457f3e8946b66a \
+    --hash=sha256:d5cd5189fc2b6a538b75ae45433140c4823463918f7b1617c31e68b085c0022c \
+    --hash=sha256:d618fd27420381a4f6044faa71f46d8bfd911bd077c555f7138ed88729bfbe79 \
+    --hash=sha256:d76e2d8c75051d58177e762164d2e9ab92886534e3a12e795f103524f221dd8e \
+    --hash=sha256:daae526907e262de627d8f70058a0f64acc9e2641c164c99c8f594b34a799a16 \
+    --hash=sha256:db485b3847d182b908b483b2ed133c66d88d49cacf98fd278fadafe11b4478d1 \
+    --hash=sha256:dd952e03bfbb096cfe2dd35cd9e00f269969b67536cb4370994afc20ff2d0875 \
+    --hash=sha256:dda366d548e89a90d88a86c692377d18d8bd64b39c1fb2b92cb31370e2896bbd \
+    --hash=sha256:e315e5ec90d88e140f57696ff85b484ff68bb311e36f2c414aa4286293e6dee0 \
+    --hash=sha256:e4415a8db000bf49a6dd1c478bf70062eaacff0f462b92b0ba68791a905861f9 \
+    --hash=sha256:e7a116ae737f0000343218c4edf5bd45893bfeaff0993c0b215d7124c9f77646 \
+    --hash=sha256:e7c4c09a490dc4d4a7f8cbee56c606a320f9dc28cf92a7157a39d1ce7676a657 \
+    --hash=sha256:ebae99ed6764f2b5771c522477b311be313e8841d2e0376db2b10922daebbba4 \
+    --hash=sha256:ec4c85dc4b687c7f7f15f553ff26a98bfe8c58f5f7f0ac8905f0ba4c7be60232 \
+    --hash=sha256:ed3a984b31da7481b103f68776f7128a89ef26ed40f4dc41a2223cda7fb24819 \
+    --hash=sha256:f18c2d9782259a6dc132fdc7a63c168cbc74b35284b6d75c673958982a378384 \
+    --hash=sha256:f1f9f4121ec58628c96baa3de1a55a4e3a333c5102c8e94b64e23bf7b2083309 \
+    --hash=sha256:f42c23db5d1521218a3276bb08666dcb662896a0be7347cba864eca45ff64ede \
+    --hash=sha256:f443b4825c50a51ee68585522ab4a1d1257fac65896f282b4c6763337ac9f5d2 \
+    --hash=sha256:f6764a4ccab3078db14a632420930f6186058750df066b8ea2a7106df91d3203 \
+    --hash=sha256:f7c7553b13f69c1b29a5bde08ddc6d9d0c8bfb84f9ed01c30db25944aeb852a7 \
+    --hash=sha256:fa6248cd194edff41d7ea9425ced8ca3a6f838bfb295f6f1d6e6bb694a8518df \
+    --hash=sha256:fa8eb9ecdb7efb0b226acec134e0d709e87a909fa4971a54c0c4f6e88635484c \
+    --hash=sha256:fc20894c3d21194d8041a28b65622d5b86db786da6e3cfe73f0c762951a61167 \
+    --hash=sha256:fc4d3f1fb9ca0ae9f97b095963bc6326f1dbfd3779d6679a1e016b9baaa153d3 \
+    --hash=sha256:fd40bb9cd0891c4c3cb1ddf83f8bbfa15731a248fdc8162669405451e2724b09 \
+    --hash=sha256:ff710414307fefa903e0d9bdf300972f892c23477829f49504e59834f4195398
+    # via matplotlib
+markdown-it-py==4.2.0 \
+    --hash=sha256:04a21681d6fbb623de53f6f364d352309d4094dd4194040a10fd51833e418d49 \
+    --hash=sha256:9f7ebbcd14fe59494226453aed97c1070d83f8d24b6fc3a3bcf9a38092641c4a
+    # via rich
+markupsafe==3.0.3 \
+    --hash=sha256:0303439a41979d9e74d18ff5e2dd8c43ed6c6001fd40e5bf2e43f7bd9bbc523f \
+    --hash=sha256:068f375c472b3e7acbe2d5318dea141359e6900156b5b2ba06a30b169086b91a \
+    --hash=sha256:0bf2a864d67e76e5c9a34dc26ec616a66b9888e25e7b9460e1c76d3293bd9dbf \
+    --hash=sha256:0db14f5dafddbb6d9208827849fad01f1a2609380add406671a26386cdf15a19 \
+    --hash=sha256:0eb9ff8191e8498cca014656ae6b8d61f39da5f95b488805da4bb029cccbfbaf \
+    --hash=sha256:0f4b68347f8c5eab4a13419215bdfd7f8c9b19f2b25520968adfad23eb0ce60c \
+    --hash=sha256:1085e7fbddd3be5f89cc898938f42c0b3c711fdcb37d75221de2666af647c175 \
+    --hash=sha256:116bb52f642a37c115f517494ea5feb03889e04df47eeff5b130b1808ce7c219 \
+    --hash=sha256:12c63dfb4a98206f045aa9563db46507995f7ef6d83b2f68eda65c307c6829eb \
+    --hash=sha256:133a43e73a802c5562be9bbcd03d090aa5a1fe899db609c29e8c8d815c5f6de6 \
+    --hash=sha256:1353ef0c1b138e1907ae78e2f6c63ff67501122006b0f9abad68fda5f4ffc6ab \
+    --hash=sha256:15d939a21d546304880945ca1ecb8a039db6b4dc49b2c5a400387cdae6a62e26 \
+    --hash=sha256:177b5253b2834fe3678cb4a5f0059808258584c559193998be2601324fdeafb1 \
+    --hash=sha256:1872df69a4de6aead3491198eaf13810b565bdbeec3ae2dc8780f14458ec73ce \
+    --hash=sha256:1b4b79e8ebf6b55351f0d91fe80f893b4743f104bff22e90697db1590e47a218 \
+    --hash=sha256:1b52b4fb9df4eb9ae465f8d0c228a00624de2334f216f178a995ccdcf82c4634 \
+    --hash=sha256:1ba88449deb3de88bd40044603fafffb7bc2b055d626a330323a9ed736661695 \
+    --hash=sha256:1cc7ea17a6824959616c525620e387f6dd30fec8cb44f649e31712db02123dad \
+    --hash=sha256:218551f6df4868a8d527e3062d0fb968682fe92054e89978594c28e642c43a73 \
+    --hash=sha256:26a5784ded40c9e318cfc2bdb30fe164bdb8665ded9cd64d500a34fb42067b1c \
+    --hash=sha256:2713baf880df847f2bece4230d4d094280f4e67b1e813eec43b4c0e144a34ffe \
+    --hash=sha256:2a15a08b17dd94c53a1da0438822d70ebcd13f8c3a95abe3a9ef9f11a94830aa \
+    --hash=sha256:2f981d352f04553a7171b8e44369f2af4055f888dfb147d55e42d29e29e74559 \
+    --hash=sha256:32001d6a8fc98c8cb5c947787c5d08b0a50663d139f1305bac5885d98d9b40fa \
+    --hash=sha256:3524b778fe5cfb3452a09d31e7b5adefeea8c5be1d43c4f810ba09f2ceb29d37 \
+    --hash=sha256:3537e01efc9d4dccdf77221fb1cb3b8e1a38d5428920e0657ce299b20324d758 \
+    --hash=sha256:35add3b638a5d900e807944a078b51922212fb3dedb01633a8defc4b01a3c85f \
+    --hash=sha256:38664109c14ffc9e7437e86b4dceb442b0096dfe3541d7864d9cbe1da4cf36c8 \
+    --hash=sha256:3a7e8ae81ae39e62a41ec302f972ba6ae23a5c5396c8e60113e9066ef893da0d \
+    --hash=sha256:3b562dd9e9ea93f13d53989d23a7e775fdfd1066c33494ff43f5418bc8c58a5c \
+    --hash=sha256:457a69a9577064c05a97c41f4e65148652db078a3a509039e64d3467b9e7ef97 \
+    --hash=sha256:4bd4cd07944443f5a265608cc6aab442e4f74dff8088b0dfc8238647b8f6ae9a \
+    --hash=sha256:4e885a3d1efa2eadc93c894a21770e4bc67899e3543680313b09f139e149ab19 \
+    --hash=sha256:4faffd047e07c38848ce017e8725090413cd80cbc23d86e55c587bf979e579c9 \
+    --hash=sha256:509fa21c6deb7a7a273d629cf5ec029bc209d1a51178615ddf718f5918992ab9 \
+    --hash=sha256:5678211cb9333a6468fb8d8be0305520aa073f50d17f089b5b4b477ea6e67fdc \
+    --hash=sha256:591ae9f2a647529ca990bc681daebdd52c8791ff06c2bfa05b65163e28102ef2 \
+    --hash=sha256:5a7d5dc5140555cf21a6fefbdbf8723f06fcd2f63ef108f2854de715e4422cb4 \
+    --hash=sha256:69c0b73548bc525c8cb9a251cddf1931d1db4d2258e9599c28c07ef3580ef354 \
+    --hash=sha256:6b5420a1d9450023228968e7e6a9ce57f65d148ab56d2313fcd589eee96a7a50 \
+    --hash=sha256:722695808f4b6457b320fdc131280796bdceb04ab50fe1795cd540799ebe1698 \
+    --hash=sha256:729586769a26dbceff69f7a7dbbf59ab6572b99d94576a5592625d5b411576b9 \
+    --hash=sha256:77f0643abe7495da77fb436f50f8dab76dbc6e5fd25d39589a0f1fe6548bfa2b \
+    --hash=sha256:795e7751525cae078558e679d646ae45574b47ed6e7771863fcc079a6171a0fc \
+    --hash=sha256:7be7b61bb172e1ed687f1754f8e7484f1c8019780f6f6b0786e76bb01c2ae115 \
+    --hash=sha256:7c3fb7d25180895632e5d3148dbdc29ea38ccb7fd210aa27acbd1201a1902c6e \
+    --hash=sha256:7e68f88e5b8799aa49c85cd116c932a1ac15caaa3f5db09087854d218359e485 \
+    --hash=sha256:83891d0e9fb81a825d9a6d61e3f07550ca70a076484292a70fde82c4b807286f \
+    --hash=sha256:8485f406a96febb5140bfeca44a73e3ce5116b2501ac54fe953e488fb1d03b12 \
+    --hash=sha256:8709b08f4a89aa7586de0aadc8da56180242ee0ada3999749b183aa23df95025 \
+    --hash=sha256:8f71bc33915be5186016f675cd83a1e08523649b0e33efdb898db577ef5bb009 \
+    --hash=sha256:915c04ba3851909ce68ccc2b8e2cd691618c4dc4c4232fb7982bca3f41fd8c3d \
+    --hash=sha256:949b8d66bc381ee8b007cd945914c721d9aba8e27f71959d750a46f7c282b20b \
+    --hash=sha256:94c6f0bb423f739146aec64595853541634bde58b2135f27f61c1ffd1cd4d16a \
+    --hash=sha256:9a1abfdc021a164803f4d485104931fb8f8c1efd55bc6b748d2f5774e78b62c5 \
+    --hash=sha256:9b79b7a16f7fedff2495d684f2b59b0457c3b493778c9eed31111be64d58279f \
+    --hash=sha256:a320721ab5a1aba0a233739394eb907f8c8da5c98c9181d1161e77a0c8e36f2d \
+    --hash=sha256:a4afe79fb3de0b7097d81da19090f4df4f8d3a2b3adaa8764138aac2e44f3af1 \
+    --hash=sha256:ad2cf8aa28b8c020ab2fc8287b0f823d0a7d8630784c31e9ee5edea20f406287 \
+    --hash=sha256:b8512a91625c9b3da6f127803b166b629725e68af71f8184ae7e7d54686a56d6 \
+    --hash=sha256:bc51efed119bc9cfdf792cdeaa4d67e8f6fcccab66ed4bfdd6bde3e59bfcbb2f \
+    --hash=sha256:bdc919ead48f234740ad807933cdf545180bfbe9342c2bb451556db2ed958581 \
+    --hash=sha256:bdd37121970bfd8be76c5fb069c7751683bdf373db1ed6c010162b2a130248ed \
+    --hash=sha256:be8813b57049a7dc738189df53d69395eba14fb99345e0a5994914a3864c8a4b \
+    --hash=sha256:c0c0b3ade1c0b13b936d7970b1d37a57acde9199dc2aecc4c336773e1d86049c \
+    --hash=sha256:c47a551199eb8eb2121d4f0f15ae0f923d31350ab9280078d1e5f12b249e0026 \
+    --hash=sha256:c4ffb7ebf07cfe8931028e3e4c85f0357459a3f9f9490886198848f4fa002ec8 \
+    --hash=sha256:ccfcd093f13f0f0b7fdd0f198b90053bf7b2f02a3927a30e63f3ccc9df56b676 \
+    --hash=sha256:d2ee202e79d8ed691ceebae8e0486bd9a2cd4794cec4824e1c99b6f5009502f6 \
+    --hash=sha256:d53197da72cc091b024dd97249dfc7794d6a56530370992a5e1a08983ad9230e \
+    --hash=sha256:d6dd0be5b5b189d31db7cda48b91d7e0a9795f31430b7f271219ab30f1d3ac9d \
+    --hash=sha256:d88b440e37a16e651bda4c7c2b930eb586fd15ca7406cb39e211fcff3bf3017d \
+    --hash=sha256:de8a88e63464af587c950061a5e6a67d3632e36df62b986892331d4620a35c01 \
+    --hash=sha256:df2449253ef108a379b8b5d6b43f4b1a8e81a061d6537becd5582fba5f9196d7 \
+    --hash=sha256:e1c1493fb6e50ab01d20a22826e57520f1284df32f2d8601fdd90b6304601419 \
+    --hash=sha256:e1cf1972137e83c5d4c136c43ced9ac51d0e124706ee1c8aa8532c1287fa8795 \
+    --hash=sha256:e2103a929dfa2fcaf9bb4e7c091983a49c9ac3b19c9061b6d5427dd7d14d81a1 \
+    --hash=sha256:e56b7d45a839a697b5eb268c82a71bd8c7f6c94d6fd50c3d577fa39a9f1409f5 \
+    --hash=sha256:e8afc3f2ccfa24215f8cb28dcf43f0113ac3c37c2f0f0806d8c70e4228c5cf4d \
+    --hash=sha256:e8fc20152abba6b83724d7ff268c249fa196d8259ff481f3b1476383f8f24e42 \
+    --hash=sha256:eaa9599de571d72e2daf60164784109f19978b327a3910d3e9de8c97b5b70cfe \
+    --hash=sha256:ec15a59cf5af7be74194f7ab02d0f59a62bdcf1a537677ce67a2537c9b87fcda \
+    --hash=sha256:f190daf01f13c72eac4efd5c430a8de82489d9cff23c364c3ea822545032993e \
+    --hash=sha256:f34c41761022dd093b4b6896d4810782ffbabe30f2d443ff5f083e0cbbb8c737 \
+    --hash=sha256:f3e98bb3798ead92273dc0e5fd0f31ade220f59a266ffd8a4f6065e0a3ce0523 \
+    --hash=sha256:f42d0984e947b8adf7dd6dde396e720934d12c506ce84eea8476409563607591 \
+    --hash=sha256:f71a396b3bf33ecaa1626c255855702aca4d3d9fea5e051b41ac59a9c1c41edc \
+    --hash=sha256:f9e130248f4462aaa8e2552d547f36ddadbeaa573879158d721bbd33dfe4743a \
+    --hash=sha256:fed51ac40f757d41b7c48425901843666a6677e3e8eb0abcff09e4ba6e664f50
+    # via jinja2
+matplotlib==3.10.9 \
+    --hash=sha256:09218df8a93712bd6ea133e83a153c755448cf7868316c531cffcc43f69d1cc9 \
+    --hash=sha256:10cc5ce06d10231c36f40e875f3c7e8050362a4ee8f0ee5d29a6b3277d57bb42 \
+    --hash=sha256:172db52c9e683f5d12eaf57f0f54834190e12581fe1cc2a19595a8f5acb4e77d \
+    --hash=sha256:1872fb212a05b729e649754a72d5da61d03e0554d76e80303b6f83d1d2c0552b \
+    --hash=sha256:1aa972116abb4c9d201bf245620b433726cb6856f3bef6a78f776a00f5c92d37 \
+    --hash=sha256:1e7698ac9868428e84d2c967424803b2472ff7167d9d6590d4204ed775343c3b \
+    --hash=sha256:2dc9477819ffd78ad12a20df1d9d6a6bd4fec6aaa9072681465fddca052f1456 \
+    --hash=sha256:3225f4e1edcb8c86c884ddf79ebe20ecd0a67d30188f279897554ccd8fded4dc \
+    --hash=sha256:336b9acc64d309063126edcdaca00db9373af3c476bb94388fe9c5a53ad13e6f \
+    --hash=sha256:345f6f68ecc8da0ca56fad2ea08fde1a115eda530079eca185d50a7bc3e146c6 \
+    --hash=sha256:34cf8167e023ad956c15f36302911d5406bd99a9862c1a8499ea6f7c0e015dc2 \
+    --hash=sha256:3fc0364dfbe1d07f6d15c5ebd0c5bf89e126916e5a8667dd4a7a6e84c36653d4 \
+    --hash=sha256:41cb28c2bd769aa3e98322c6ab09854cbcc52ab69d2759d681bba3e327b2b320 \
+    --hash=sha256:42fb814efabe95c06c1994d8ab5a8385f43a249e23badd3ba931d4308e5bca20 \
+    --hash=sha256:4e42042d54db34fda4e95a7bd3e5789c2a995d2dad3eb8850232ee534092fbbf \
+    --hash=sha256:4edcfbd8565339aa62f1cd4012f7180926fdbe71850f7b0d3c379c175cd6b66c \
+    --hash=sha256:51bf0ddbdc598e060d46c16b5590708f81a1624cefbaaf62f6a81bf9285b8c80 \
+    --hash=sha256:56fc0bd271b00025c6edfdc7c2dcd247372c8e1544971d62e1dc7c17367e8bf9 \
+    --hash=sha256:59476c6d29d612b8e9bb6ce8c5b631be6ba8f9e3a2421f22a02b192c7dd28716 \
+    --hash=sha256:6640f75af2c6148293caa0a2b39dd806a492dd66c8a8b04035813e33d0fd2585 \
+    --hash=sha256:68cfdcede415f7c8f5577b03303dd94526cdb6d11036cecdc205e08733b2d2bb \
+    --hash=sha256:6b63d9c7c769b88ab81e10dc86e4e0607cf56817b9f9e6cf24b2a5f1693b8e38 \
+    --hash=sha256:6be157fe17fc37cb95ac1d7374cf717ce9259616edec911a78d9d26dae8522d4 \
+    --hash=sha256:6c63ebcd8b4b169eb2f5c200552ae6b8be8999a005b6b507ed76fb8d7d674fe2 \
+    --hash=sha256:77210dce9cb8153dffc967efaae990543392563d5a376d4dd8539bebcb0ed217 \
+    --hash=sha256:7a8d66a55def891c33147ba3ba9bfcabf0b526a43764c818acbb4525e5ed0838 \
+    --hash=sha256:82368699727bfb7b0182e1aa13082e3c08e092fa1a25d3e1fd92405bff96f6d4 \
+    --hash=sha256:82834c3c292d24d3a8aae77cd2d20019de69d692a34a970e4fdb8d33e2ea3dda \
+    --hash=sha256:8e436d155fa8a3399dc62683f8f5d0e2e50d25d0144a73edd73f82eec8f4abfb \
+    --hash=sha256:8f3bcac1ca5ed000a6f4337d47ba67dfddf37ed6a46c15fd7f014997f7bf865f \
+    --hash=sha256:97e35e8d39ccc85859095e01a53847432ba9a53ddf7986f7a54a11b73d0e143f \
+    --hash=sha256:985f2238880e2e69093f588f5fe2e46771747febf0649f3cf7f7b7480875317f \
+    --hash=sha256:a49f1eadc84ca85fd72fa4e89e70e61bf86452df6f971af04b12c60761a0772c \
+    --hash=sha256:a5a6104ed666402ba5106d7f36e0e0cdca4e8d7fa4d39708ca88019e2835a2eb \
+    --hash=sha256:aba1615dabe83188e19d4f75a253c6a08423e04c1425e64039f800050a69de6b \
+    --hash=sha256:ae20801130378b82d647ff5047c07316295b68dc054ca6b3c13519d0ea624285 \
+    --hash=sha256:ae2f11957b27ce53497dd4d7b235c4d4f1faf383dfb39d0c5beb833bff883294 \
+    --hash=sha256:b049278ddce116aaa1c1377ebf58adea909132dfce0281cf7e3a1ea9fc2e2c65 \
+    --hash=sha256:b1b745c489cd1a77a0dc1120a05dc87af9798faebc913601feb8c73d89bf2d1e \
+    --hash=sha256:b2b9516251cb89ff618d757daec0e2ed1bf21248013844a853d87ef85ab3081d \
+    --hash=sha256:b580440f1ff81a0e34122051a3dfabb7e4b7f9e380629929bde0eff9af72165f \
+    --hash=sha256:ba7b3b8ef09eab7df0e86e9ae086faa433efbfbdb46afcb3aa16aabf779469a8 \
+    --hash=sha256:c27df8b3848f32a83d1767566595e43cfaa4460380974da06f4279a7ec143c39 \
+    --hash=sha256:d091f9d758b34aaaaa6331d13574bf01891d903b3dec59bfff458ef7551de5d6 \
+    --hash=sha256:d730e984eddf56974c3e72b6129c7ca462ac38dc624338f4b0b23eb23ecba00f \
+    --hash=sha256:d75d11c949914165976c621b2324f9ef162af7ebf4b057ddf95dd1dba7e5edcf \
+    --hash=sha256:d843374407c4017a6403b59c6c81606773d136f3259d5b6da3131bc814542cc2 \
+    --hash=sha256:da4e09638420548f31c354032a6250e473c68e5a4e96899b4844cf39ddea23fe \
+    --hash=sha256:de2445a0c6690d21b7eb6ce071cebad6d40a2e9bdf10d039074a96ba19797b99 \
+    --hash=sha256:dfca0129678bd56379db26c52b5d77ed7de314c047492fbdc763aa7501710cfb \
+    --hash=sha256:e9fae004b941b23ff2edcf1567a857ed77bafc8086ffa258190462328434faf8 \
+    --hash=sha256:f0c3c28d9fbcc1fe7a03be236d73430cf6409c41fb2383a7ac52fe932b072cb1 \
+    --hash=sha256:f4399f64b3e94cd500195490972ae1ee81170df1636fa15364d157d5bdd7b921 \
+    --hash=sha256:f76e640a5268850bfda54b5131b1b1941cc685e42c5fa98ed9f2d64038308cba \
+    --hash=sha256:fd66508e8c6877d98e586654b608a0456db8d7e8a546eb1e2600efd957302358
+    # via mlx-gen
+mdurl==0.1.2 \
+    --hash=sha256:84008a41e51615a49fc9966191ff91509e3c40b939176e643fd50a5c2196b8f8 \
+    --hash=sha256:bb413d29f5eea38f31dd4754dd7377d4465116fb207585f97bf925588687c1ba
+    # via markdown-it-py
+mlx==0.31.2 \
+    --hash=sha256:117c7583cae0ca107cd53c591cc34f8e75f97a505aa47088844b7dc0fc69dc67 \
+    --hash=sha256:1b3fb0dda955b0d552ce57bdd6f42b3309ab21b067e40587d6848443d307e91f \
+    --hash=sha256:2a64db61b2840f28bae08354e6f999698e30381af201cc12354290673c96213b \
+    --hash=sha256:34b0171cd9eb5c43fdd82091f6135d6ccc5a065363a4a3e68fac64fb4e53d37c \
+    --hash=sha256:4a3f181b367d404e44a6bd68ef5eb573930809ac60cacd51d0c851c629b1b651 \
+    --hash=sha256:51ca102db641b01e7cb083ce8ecb580e281530a141a7ca12544bb370641630ae \
+    --hash=sha256:53c8d57ffa9ce77f8355663be05014c0dd37280e57f19126fb0a24389a30684b \
+    --hash=sha256:59ccbd0f0044d4f97f11ebcbf0c480bc9e962935fd96275f120954afea65be8a \
+    --hash=sha256:69fbc94bf53607a75af9eb3e22c354738a6fe4e25aa4e2b20934b009a4bba1f3 \
+    --hash=sha256:70297cbef7479429f69c966bfed10da20a6f0c2aa997eec2b4f6ba1a07caf2ef \
+    --hash=sha256:99572133181481640a8bf8d449daf083816d0af3ee050c8adfc5bf45ceca91c6 \
+    --hash=sha256:a13c9ce23c3deef6aa5a09315e7953e1a5dc311e851fa16fc74c81fb2509c0b9 \
+    --hash=sha256:b0764bf11fc3a71dee988e19275eef67775cab63112d8bb7ef173ca8b2a1247c \
+    --hash=sha256:b29cf940f34205f09bb552ac60465ae833c4ae640b52777c6d725ddbad8461ca \
+    --hash=sha256:b368f7ede4238cc44076e4843820338c453c21ee50bd3ee26d4b182c179fd8e1 \
+    --hash=sha256:c05981684279a8935d58b0dde3ea5b02d210c3bad3319aa0e9934ec2df165752 \
+    --hash=sha256:c0ff158b7ac93a4b5659adbc70053498b30a5964fc45f78596398e056a96c36a \
+    --hash=sha256:c71dff00cc1b363d542f111d9e8b7b59dadb65b29d027f798b71ea34da75b665 \
+    --hash=sha256:cd1f4189e5f1bc68735f44eb63ce98ae09d66ac75d7ab5b15a41afae7e9f0513 \
+    --hash=sha256:cd5d42b0b2bee7efe1b0680a7e302943dd33b92c879cffa0358ffdb5a4a8d27b \
+    --hash=sha256:e3e2818157371501de097887f371784227f9dd9c91e177f986db7b25319c55d7 \
+    --hash=sha256:e5067aaf2be1f3d7bba5be52348775804f111173c1ed04639618fd713b1a530f \
+    --hash=sha256:e81798c610f95a09c642c89214ba5c23b72ce18ce4728184aceabe7eddca33d7 \
+    --hash=sha256:ebdc47b87b4b0216ceab3b5961716804bba3107c16454b65ae51d0e0c059f298 \
+    --hash=sha256:edb9797db7d852477ca1c99708058654ee860d4148fe5765f0d55528e2b1aa22
+    # via mlx-gen
+mlx-gen==0.18.14 \
+    --hash=sha256:0f1e9f473e712f740e8f082af33facc16c398d221dfb4de49d200906cb537a62 \
+    --hash=sha256:f5400df55fe6a611cc47df638816a29228f9fe6c2e1ef5ad2224f4d16dfe77ae
+    # via -r /tmp/video-req.in
+mlx-metal==0.31.2 \
+    --hash=sha256:84ffb60ee503f03eb684f5fb168d5cff31e2a16b7f27c1731eaf7662bd6e9b46 \
+    --hash=sha256:b25385bcee18fc194092255b8b53b9a3d8489eb650e59160f1b57aadd07aa2dc \
+    --hash=sha256:e9d4e5fce6ca10a87a0e388597f99519ad594d09e674708b5312bd8bd4f5997d
+    # via mlx
+more-itertools==11.1.0 \
+    --hash=sha256:48e8f4d9e7e5878571ecf6f2b4e57634f93cd474cc8cfbd2376f2d11b396e30d \
+    --hash=sha256:4b65538ae22f6fed0ce4874efd317463a7489796a0939fa66824dd542125a192
+    # via
+    #   jaraco-classes
+    #   jaraco-functools
+mpmath==1.3.0 \
+    --hash=sha256:7a28eb2a9774d00c7bc92411c19a89209d5da7c4c9a9e227be8330a23a25b91f \
+    --hash=sha256:a0b2b9fe80bbcd81a6647ff13108738cfb482d481d826cc0e02f5b35e5c88d2c
+    # via sympy
+networkx==3.6.1 \
+    --hash=sha256:26b7c357accc0c8cde558ad486283728b65b6a95d85ee1cd66bafab4c8168509 \
+    --hash=sha256:d47fbf302e7d9cbbb9e2555a0d267983d2aa476bac30e90dfbe5669bd57f3762
+    # via torch
+nh3==0.3.5 \
+    --hash=sha256:0a09f51806fd51b4fedbf9ea2b61fef388f19aef0d62fe51199d41648be14588 \
+    --hash=sha256:207c01801d3e9bb8ec08f08689346bdd30ce15b8bf60013a925d08b5388962a4 \
+    --hash=sha256:23a312224875f72cd16bde417f49071451877e29ef646a60e50fcb69407cc18a \
+    --hash=sha256:2c069570b06aa848457713ad7af4a9905691291548c4466a9ad78ee95808382b \
+    --hash=sha256:38748140bf76383ab7ce2dce0ad4cb663855d8fbc9098f7f3483673d09616a17 \
+    --hash=sha256:387abd011e81959d5a35151a11350a0795c6edeb53ebfa02d2e882dc01299263 \
+    --hash=sha256:3bb854485c9b33e5bb143ff3e49e577073bc6bc320f0ff8fc316dd89c0d3c101 \
+    --hash=sha256:45855e14ff056064fec77133bfcf7cd691838168e5e17bbef075394954dc9dc8 \
+    --hash=sha256:45e6a65dc88a300a2e3502cb9c8e6d1d6b831d6fba7470643333609c6aab1f30 \
+    --hash=sha256:488928988caad25ba14b1eb5bc74e25e21f3b5e40341d956f3ce4a8bc19460dc \
+    --hash=sha256:48f45e3e914be93a596431aa143dedf1582557bf41a58153c296048d6e3798c9 \
+    --hash=sha256:50d401ab2d8e86d59e2126e3ab2a2f45840c405842b626d9a51624b3a33b6878 \
+    --hash=sha256:52d877980d7ca01dc3baf3936bf844828bc6f332962227a684ed79c18cce14c3 \
+    --hash=sha256:559e4c73b689e9a7aa97ac9760b1bc488038d7c1a575aa4ab5a0e19ee9630c0f \
+    --hash=sha256:6ea58cc44d274c643b83547ca9654a0b1a817609b160601356f76a2b744c49ad \
+    --hash=sha256:72c5bdedec27fa33de6a5326346ea8aa3fe54f6ac294d54c4b204fb66a9f1e79 \
+    --hash=sha256:84bdeb082544fbcb77a12c034dd77d7da0556fdc0727b787eb6214b958c15e29 \
+    --hash=sha256:8f85285700a18e9f3fc5bff41fe573fa84f81542ef13b48a89f9fecca0474d3b \
+    --hash=sha256:acfd354e61accbe4c74f8017c6e397a776916dfe47c48643cf7fd84ade826f93 \
+    --hash=sha256:c357f1d042c67f135a5e6babb2b0e3b9d9224ff4a3543240f597767b01384ffd \
+    --hash=sha256:c3aae321f67ae66cff2a627115f106a377d4475d10b0e13d97959a13486b9a88 \
+    --hash=sha256:c88605d8d468f7fc1b31e06129bc91d6c96f6c621776c9b504a0da9beac9df5f \
+    --hash=sha256:de8e8621853b6470fe928c684ee0d3f39ea8086cebafe4c416486488dea7b68d \
+    --hash=sha256:e49c9b564e6bcb03ecd2f057213df9a0de15a95812ac9db9600b590db23d3ae9 \
+    --hash=sha256:ea232933394d1d58bf7c4bb348dc4660eae6604e1ae81cd2ba6d9ed80d390f3b \
+    --hash=sha256:eeedc90ed8c42c327e8e10e621ccfa314fc6cce35d5929f4297ff1cdb89667c4 \
+    --hash=sha256:fe3a787dc76b50de6bee54ef242f26c41dfe47654428e3e94f0fae5bb6dd2cc1
+    # via readme-renderer
+numpy==2.4.6 \
+    --hash=sha256:001fbb8e08d942dd57599e781f2472269ee7f2755fae407b4f67b2f0b17da3f1 \
+    --hash=sha256:0280e0356c0829a18d9de1cb7eee50ec22ca639878d7240307ca0943d73cd2c4 \
+    --hash=sha256:043191bfa8eab18c776647b62723ac9dddece59743b13f49b2016094129c2b3f \
+    --hash=sha256:06ca2f61ec4385a07a6977c55ba998a4466c123642b4a32694d3128fce18c079 \
+    --hash=sha256:0a041d3d761dc3c35cc56ce0351506a02bcbc25f7b169f652435141a17db9096 \
+    --hash=sha256:0ab0a9c4ffb1a6d95ef519fe4247dba8eb6b18ad93999f76b7f657039acabd47 \
+    --hash=sha256:0c9136e14ed34a9e343a31c533d78a9813a69a3148332bce5e9821cb2f996e66 \
+    --hash=sha256:110f8b71aacb688ec69062bb7f6938a0f8acb01b7c1c4beb453c65b6d234584d \
+    --hash=sha256:112b06a867b235ef466ed3508ddf0238050df9c727cafb5301ac385b899189a1 \
+    --hash=sha256:17f9ade344e7d9b464a084d69bcf18fc691cb1db67c62ed80820bf4926d78f0e \
+    --hash=sha256:1e254a00cdf42b1e4d5b3d68d33af63268d41340d8885df2ab6470f2e1500147 \
+    --hash=sha256:1e978ec1e8bd0e0e4de6bb75de9d30cbb74db6b6a2bb727618613703ca0167dd \
+    --hash=sha256:25c692919ac5a01f170a3bfcd62d745b24fd095c353d50812637d6fcab442e75 \
+    --hash=sha256:260a5d70215b61ab4fadf5c7baacd64821842975eea312125ed3c39a6391b063 \
+    --hash=sha256:2803abfebfc990042cd494d8ce2d5f82e9d847af6d35ec486923aa19dbad5e73 \
+    --hash=sha256:29a287e0cf63ff528da061de6b9f64a4618da591ca1046aafc54062e40ca7eab \
+    --hash=sha256:29cb7f67d10b479ff07c17d33e39f78c07f71c40ef30d63c153d340e96cd3fb4 \
+    --hash=sha256:3213d622a0283a39a93d188f3cf72b26862df52fbb4ca3697f51705016523d41 \
+    --hash=sha256:33111801a01c12a8a1e3721f0a9232f8cfc8ae2c6b7098167e6f623c6073f402 \
+    --hash=sha256:357cc07a6d7b0b182ff02249616a03742827ebb1277546b5c7cd7f7620a45698 \
+    --hash=sha256:38efbc8de75c7a0fc1ac190162d892787f3f47b57cc291231aafee36b80982b7 \
+    --hash=sha256:4081eb135ac24158bd51cdfbef16f1c64df7063b1143f24731387137c092bec8 \
+    --hash=sha256:40fdc1ae7125e518ea98e53e69a4ebc27e1fd50510c47b7ea130cf21e5e1d42b \
+    --hash=sha256:4cfe66903cc32a9921a6733d96b19bb6abf310397581bbad89c228f5abaf0ee8 \
+    --hash=sha256:511dbaf848decaaaf4b4ca48032619fb3138710c4bf7da7617765edad1ef96b0 \
+    --hash=sha256:55cced7c52e981362f708ad635198e97a752dfba412cc03c23bbf3bd8d5cd662 \
+    --hash=sha256:56b39e5e0622a09a25bf5baf62f4bcf0cb8a41ae6e2819cf49bbc5a74c083f91 \
+    --hash=sha256:5dbbdb29840ca3d91ee0fece42fc29278886d908280bfec0a5846c6f901a3eb0 \
+    --hash=sha256:5f9fb9157b4ce2971008323afe46053787b526ef624fea915b261468a8421a0f \
+    --hash=sha256:6180d8b35af935aed8ece3a85e0a43f87393ae0ac87c8d2c8bd2c993f7270ef3 \
+    --hash=sha256:68a5124b13fa6cc2086764a20005d30bc0548146f7f5322f02fce212ca14317f \
+    --hash=sha256:68bb27509ac1b9a3443094260f6326150663b06abe40b73a2f81160623da5b67 \
+    --hash=sha256:6f41ae150c4e32db4f3310cdaf64b1593a03dbabe29eec77fc9b50fe64061df6 \
+    --hash=sha256:7265a2f3d436e54ef9f2b52b5c937e6be778781bd97a590319d7348f1c1ca997 \
+    --hash=sha256:72fbe16c6fac95aedf5937fa873445cec2110be35d8a4e9433d7501fd98dae6b \
+    --hash=sha256:7d92c3819208a60205a12a245c91ad70cb0a85336659b19b834205573ac8456e \
+    --hash=sha256:8155154c7c691289fe18f510b5d4657c68c67989f293f0535a91360392ff6538 \
+    --hash=sha256:81a1cca95ed5bb92aa8b10dd2cdc9a0d3853a50fad926c28b5d7e8ea54389627 \
+    --hash=sha256:89cd468399cfd2504718f0ba50e410dca55a170b61a02ad92bb18c8a65186e93 \
+    --hash=sha256:8ad03c0965fb3c692200e74d458ca28c1dbb4ce96f9a479a8aa041ad5fabca02 \
+    --hash=sha256:90f9849678c75fe7afa2d348ac842c168b0a4d3d61919687216dfc547976d853 \
+    --hash=sha256:948424b06129ce883307e8cff868c31396d8dc7630a59c61d70d98dbe70f222c \
+    --hash=sha256:9cd5ffd25db4e7ba6a375693b3fc0fc1791ec636c17db3720da19bde7180ec43 \
+    --hash=sha256:a0df0043bdb289bde1f62da130d20df23d58b45429f752bc7a8fc5325a225ecd \
+    --hash=sha256:a2c306dea656c12c68f51f4cea133cbe78ca7435eb28c735eac1d3ebe73be6e8 \
+    --hash=sha256:a7830bab239b79cda9c08c2da014761cafb48da6150e1da17ac06283f43b6089 \
+    --hash=sha256:a7c711e21628b52034bb5ab8d1bce291f752fcc5e92accc615778acee1ff4778 \
+    --hash=sha256:aaf159caa35993cb1f56fb9b8e4610d35758e7ca005412eb1daa856a78c9c4b1 \
+    --hash=sha256:ae506e6902902557576a26ff33eda8695e7ecb3cb36c3b573a0765dee114ebdb \
+    --hash=sha256:b507f5c4c1d508876d1819b6bf9a49d365b96320b5d4993426b33a23ca4b8261 \
+    --hash=sha256:bf162abab1c1a736333192707cef898e735a5ca00f38f27eeedf44b39d9e85eb \
+    --hash=sha256:c1a2af6c6ef86344a6b0db6b97834208bf598db514f2b155042439b62605601a \
+    --hash=sha256:c2d37ab77531417474168eb79d6d80b14f821a966818505d03013d0833edb7a8 \
+    --hash=sha256:c4fc99836233ea196540b17ab0983aff60ed07941751930f5f4d05bc3b3b7359 \
+    --hash=sha256:d581b735e177fdcdce6fed8e7e8880a3fb6ee4e3653a3ac6af01c6f4c03effc5 \
+    --hash=sha256:d6da64deb6b8ed903e7560180a92f2d804ee1ba5eeb849ac2748b8c1aba1f6d7 \
+    --hash=sha256:d8e8286dd7cea7895157318d1b91cdacac64c479f3cbc8dce548331728484751 \
+    --hash=sha256:ddea102b48f9e339f3948bf22040944184627a30fdf7f858667673b9c5f033c8 \
+    --hash=sha256:dfa20cc6ca228e6b155b11da03825975ce66aea520985dbbddf0f2a5a495c605 \
+    --hash=sha256:e3e5193ef5a3dc73bceee50f7fdc2c90dbb76c42df8d8fae3d1067a583df579e \
+    --hash=sha256:e3eeb0aabd6bd5ce64faae67e9935203a6991b4bc2a485a767fbafb2c5125f45 \
+    --hash=sha256:e5805d5a22fd19c8ccff10a9561f9df94436b0545619ea579db2d3c35294bce2 \
+    --hash=sha256:e85b752a1e912b70eaad4fafbd4d1238007ab221de2009b9a2f5ae7461239895 \
+    --hash=sha256:eaf7fa2de5c0be8ae6ff8e9bea2ccd725e980541244521d8d4b5f3354a27babe \
+    --hash=sha256:ebfb099f8dcf083deef3ac1ca4c1503f387cf76296fcb3816b66f5ecb5f54fdb \
+    --hash=sha256:ece3d2cfe132e7d51f44a832b303895e6f2d499c5e74dfbdb06ee246147a304a \
+    --hash=sha256:ed9749eef4cbd126da3dc1d6bcb3a57f5eb7ac6a6484146bdbf743f552dfc577 \
+    --hash=sha256:ede83e07a75dd06bc501566c1eca2afc0d61677c1472ac9ad93fdee6e638a48d \
+    --hash=sha256:ef4aea96ce4d3b074422cb4f2f64e216bf9e213004bb58ecfdf50ea02ea8eb9a \
+    --hash=sha256:f3a3570c4a2a16746ac2c31a7c7c7b0c186b95ce902e33db6f28094ed7387dda \
+    --hash=sha256:f407cb6b8e9d6d8c626bc73c945db1706035af8fd632295547bf1c9e46d092d6 \
+    --hash=sha256:f74a575920ab21fe304421a3fc28793d82e299cae9eccb37084e9fc7f3617c20
+    # via
+    #   contourpy
+    #   matplotlib
+    #   mlx-gen
+    #   opencv-python
+    #   transformers
+opencv-python==4.13.0.92 \
+    --hash=sha256:0bc2596e68f972ca452d80f444bc404e08807d021fbba40df26b61b18e01838a \
+    --hash=sha256:372fe164a3148ac1ca51e5f3ad0541a4a276452273f503441d718fab9c5e5f59 \
+    --hash=sha256:402033cddf9d294693094de5ef532339f14ce821da3ad7df7c9f6e8316da32cf \
+    --hash=sha256:423d934c9fafb91aad38edf26efb46da91ffbc05f3f59c4b0c72e699720706f5 \
+    --hash=sha256:5868a8c028a0b37561579bfb8ac1875babdc69546d236249fff296a8c010ccf9 \
+    --hash=sha256:620d602b8f7d8b8dab5f4b99c6eb353e78d3fb8b0f53db1bd258bb1aa001c1d5 \
+    --hash=sha256:bccaabf9eb7f897ca61880ce2869dcd9b25b72129c28478e7f2a5e8dee945616 \
+    --hash=sha256:caf60c071ec391ba51ed00a4a920f996d0b64e3e46068aac1f646b5de0326a19
+    # via mlx-gen
+packaging==26.2 \
+    --hash=sha256:5fc45236b9446107ff2415ce77c807cee2862cb6fac22b8a73826d0693b0980e \
+    --hash=sha256:ff452ff5a3e828ce110190feff1178bb1f2ea2281fa2075aadb987c2fb221661
+    # via
+    #   huggingface-hub
+    #   matplotlib
+    #   transformers
+    #   twine
+piexif==1.1.3 \
+    --hash=sha256:3bc435d171720150b81b15d27e05e54b8abbde7b4242cddd81ef160d283108b6 \
+    --hash=sha256:83cb35c606bf3a1ea1a8f0a25cb42cf17e24353fd82e87ae3884e74a302a5f1b
+    # via mlx-gen
+pillow==12.2.0 \
+    --hash=sha256:00a2865911330191c0b818c59103b58a5e697cae67042366970a6b6f1b20b7f9 \
+    --hash=sha256:01afa7cf67f74f09523699b4e88c73fb55c13346d212a59a2db1f86b0a63e8c5 \
+    --hash=sha256:03e7e372d5240cc23e9f07deca4d775c0817bffc641b01e9c3af208dbd300987 \
+    --hash=sha256:03f6fab9219220f041c74aeaa2939ff0062bd5c364ba9ce037197f4c6d498cd9 \
+    --hash=sha256:042db20a421b9bafecc4b84a8b6e444686bd9d836c7fd24542db3e7df7baad9b \
+    --hash=sha256:0538bd5e05efec03ae613fd89c4ce0368ecd2ba239cc25b9f9be7ed426b0af1f \
+    --hash=sha256:0a34329707af4f73cf1782a36cd2289c0368880654a2c11f027bcee9052d35dd \
+    --hash=sha256:0c838a5125cee37e68edec915651521191cef1e6aa336b855f495766e77a366e \
+    --hash=sha256:144748b3af2d1b358d41286056d0003f47cb339b8c43a9ea42f5fea4d8c66b6e \
+    --hash=sha256:1610dd6c61621ae1cf811bef44d77e149ce3f7b95afe66a4512f8c59f25d9ebe \
+    --hash=sha256:1e1757442ed87f4912397c6d35a0db6a7b52592156014706f17658ff58bbf795 \
+    --hash=sha256:22db17c68434de69d8ecfc2fe821569195c0c373b25cccb9cbdacf2c6e53c601 \
+    --hash=sha256:25373b66e0dd5905ed63fa3cae13c82fbddf3079f2c8bf15c6fb6a35586324c1 \
+    --hash=sha256:2bb4a8d594eacdfc59d9e5ad972aa8afdd48d584ffd5f13a937a664c3e7db0ed \
+    --hash=sha256:2c727a6d53cb0018aadd8018c2b938376af27914a68a492f59dfcaca650d5eea \
+    --hash=sha256:2d192a155bbcec180f8564f693e6fd9bccff5a7af9b32e2e4bf8c9c69dbad6b5 \
+    --hash=sha256:2e589959f10d9824d39b350472b92f0ce3b443c0a3442ebf41c40cb8361c5b97 \
+    --hash=sha256:2e5a76d03a6c6dcef67edabda7a52494afa4035021a79c8558e14af25313d453 \
+    --hash=sha256:325ca0528c6788d2a6c3d40e3568639398137346c3d6e66bb61db96b96511c98 \
+    --hash=sha256:34c0d99ecccea270c04882cb3b86e7b57296079c9a4aff88cb3b33563d95afaa \
+    --hash=sha256:390ede346628ccc626e5730107cde16c42d3836b89662a115a921f28440e6a3b \
+    --hash=sha256:394167b21da716608eac917c60aa9b969421b5dcbbe02ae7f013e7b85811c69d \
+    --hash=sha256:3997232e10d2920a68d25191392e3a4487d8183039e1c74c2297f00ed1c50705 \
+    --hash=sha256:3adc9215e8be0448ed6e814966ecf3d9952f0ea40eb14e89a102b87f450660d8 \
+    --hash=sha256:3e080565d8d7c671db5802eedfb438e5565ffa40115216eabb8cd52d0ecce024 \
+    --hash=sha256:4a6c9fa44005fa37a91ebfc95d081e8079757d2e904b27103f4f5fa6f0bf78c0 \
+    --hash=sha256:4bfd07bc812fbd20395212969e41931001fd59eb55a60658b0e5710872e95286 \
+    --hash=sha256:4e6c62e9d237e9b65fac06857d511e90d8461a32adcc1b9065ea0c0fa3a28150 \
+    --hash=sha256:50d8520da2a6ce0af445fa6d648c4273c3eeefbc32d7ce049f22e8b5c3daecc2 \
+    --hash=sha256:51c4167c34b0d8ba05b547a3bb23578d0ba17b80a5593f93bd8ecb123dd336a3 \
+    --hash=sha256:56a3f9c60a13133a98ecff6197af34d7824de9b7b38c3654861a725c970c197b \
+    --hash=sha256:56b25336f502b6ed02e889f4ece894a72612fe885889a6e8c4c80239ff6e5f5f \
+    --hash=sha256:57850958fe9c751670e49b2cecf6294acc99e562531f4bd317fa5ddee2068463 \
+    --hash=sha256:58f62cc0f00fd29e64b29f4fd923ffdb3859c9f9e6105bfc37ba1d08994e8940 \
+    --hash=sha256:5c0a9f29ca8e79f09de89293f82fc9b0270bb4af1d58bc98f540cc4aedf03166 \
+    --hash=sha256:5cdfebd752ec52bf5bb4e35d9c64b40826bc5b40a13df7c3cda20a2c03a0f5ed \
+    --hash=sha256:5d04bfa02cc2d23b497d1e90a0f927070043f6cbf303e738300532379a4b4e0f \
+    --hash=sha256:5d2fd0fa6b5d9d1de415060363433f28da8b1526c1c129020435e186794b3795 \
+    --hash=sha256:62f5409336adb0663b7caa0da5c7d9e7bdbaae9ce761d34669420c2a801b2780 \
+    --hash=sha256:632ff19b2778e43162304d50da0181ce24ac5bb8180122cbe1bf4673428328c7 \
+    --hash=sha256:6562ace0d3fb5f20ed7290f1f929cae41b25ae29528f2af1722966a0a02e2aa1 \
+    --hash=sha256:673aa32138f3e7531ccdbca7b3901dba9b70940a19ccecc6a37c77d5fdeb05b5 \
+    --hash=sha256:6a6e67ea2e6feda684ed370f9a1c52e7a243631c025ba42149a2cc5934dec295 \
+    --hash=sha256:6a9adfc6d24b10f89588096364cc726174118c62130c817c2837c60cf08a392b \
+    --hash=sha256:6bb77b2dcb06b20f9f4b4a8454caa581cd4dd0643a08bacf821216a16d9c8354 \
+    --hash=sha256:6e6b2a0c538fc200b38ff9eb6628228b77908c319a005815f2dde585a0664b60 \
+    --hash=sha256:71cde9a1e1551df7d34a25462fc60325e8a11a82cc2e2f54578e5e9a1e153d65 \
+    --hash=sha256:7371b48c4fa448d20d2714c9a1f775a81155050d383333e0a6c15b1123dda005 \
+    --hash=sha256:766cef22385fa1091258ad7e6216792b156dc16d8d3fa607e7545b2b72061f1c \
+    --hash=sha256:7b14cc0106cd9aecda615dd6903840a058b4700fcb817687d0ee4fc8b6e389be \
+    --hash=sha256:7f84204dee22a783350679a0333981df803dac21a0190d706a50475e361c93f5 \
+    --hash=sha256:8023abc91fba39036dbce14a7d6535632f99c0b857807cbbbf21ecc9f4717f06 \
+    --hash=sha256:80b2da48193b2f33ed0c32c38140f9d3186583ce7d516526d462645fd98660ae \
+    --hash=sha256:8297651f5b5679c19968abefd6bb84d95fe30ef712eb1b2d9b2d31ca61267f4c \
+    --hash=sha256:88d387ff40b3ff7c274947ed3125dedf5262ec6919d83946753b5f3d7c67ea4c \
+    --hash=sha256:88ddbc66737e277852913bd1e07c150cc7bb124539f94c4e2df5344494e0a612 \
+    --hash=sha256:8bd7903a5f2a4545f6fd5935c90058b89d30045568985a71c79f5fd6edf9b91e \
+    --hash=sha256:8be29e59487a79f173507c30ddf57e733a357f67881430449bb32614075a40ab \
+    --hash=sha256:8c984051042858021a54926eb597d6ee3012393ce9c181814115df4c60b9a808 \
+    --hash=sha256:8cbeb542b2ebc6fcdacabf8aca8c1a97c9b3ad3927d46b8723f9d4f033288a0f \
+    --hash=sha256:8e9c4f5b3c546fa3458a29ab22646c1c6c787ea8f5ef51300e5a60300736905e \
+    --hash=sha256:90e6f81de50ad6b534cab6e5aef77ff6e37722b2f5d908686f4a5c9eba17a909 \
+    --hash=sha256:975385f4776fafde056abb318f612ef6285b10a1f12b8570f3647ad0d74b48ec \
+    --hash=sha256:9a8a34cc89c67a65ea7437ce257cea81a9dad65b29805f3ecee8c8fe8ff25ffe \
+    --hash=sha256:9aba9a17b623ef750a4d11b742cbafffeb48a869821252b30ee21b5e91392c50 \
+    --hash=sha256:9f08483a632889536b8139663db60f6724bfcb443c96f1b18855860d7d5c0fd4 \
+    --hash=sha256:a4e8f36e677d3336f35089648c8955c51c6d386a13cf6ee9c189c5f5bd713a9f \
+    --hash=sha256:a52edc8bfff4429aaabdf4d9ee0daadbbf8562364f940937b941f87a4290f5ff \
+    --hash=sha256:a830b1a40919539d07806aa58e1b114df53ddd43213d9c8b75847eee6c0182b5 \
+    --hash=sha256:aa88ccfe4e32d362816319ed727a004423aab09c5cea43c01a4b435643fa34eb \
+    --hash=sha256:af73337013e0b3b46f175e79492d96845b16126ddf79c438d7ea7ff27783a414 \
+    --hash=sha256:b1c1fbd8a5a1af3412a0810d060a78b5136ec0836c8a4ef9aa11807f2a22f4e1 \
+    --hash=sha256:b85f66ae9eb53e860a873b858b789217ba505e5e405a24b85c0464822fe88032 \
+    --hash=sha256:b86024e52a1b269467a802258c25521e6d742349d760728092e1bc2d135b4d76 \
+    --hash=sha256:bd9c0c7a0c681a347b3194c500cb1e6ca9cab053ea4d82a5cf45b6b754560136 \
+    --hash=sha256:bfa9c230d2fe991bed5318a5f119bd6780cda2915cca595393649fc118ab895e \
+    --hash=sha256:d362d1878f00c142b7e1a16e6e5e780f02be8195123f164edf7eddd911eefe7c \
+    --hash=sha256:d5d38f1411c0ed9f97bcb49b7bd59b6b7c314e0e27420e34d99d844b9ce3b6f3 \
+    --hash=sha256:dac8d77255a37e81a2efcbd1fc05f1c15ee82200e6c240d7e127e25e365c39ea \
+    --hash=sha256:dd025009355c926a84a612fecf58bb315a3f6814b17ead51a8e48d3823d9087f \
+    --hash=sha256:deede7c263feb25dba4e82ea23058a235dcc2fe1f6021025dc71f2b618e26104 \
+    --hash=sha256:e74473c875d78b8e9d5da2a70f7099549f9eb37ded4e2f6a463e60125bccd176 \
+    --hash=sha256:ee3120ae9dff32f121610bb08e4313be87e03efeadfc6c0d18f89127e24d0c24 \
+    --hash=sha256:eedf4b74eda2b5a4b2b2fb4c006d6295df3bf29e459e198c90ea48e130dc75c3 \
+    --hash=sha256:efd8c21c98c5cc60653bcb311bef2ce0401642b7ce9d09e03a7da87c878289d4 \
+    --hash=sha256:f1c943e96e85df3d3478f7b691f229887e143f81fedab9b20205349ab04d73ed \
+    --hash=sha256:f278f034eb75b4e8a13a54a876cc4a5ab39173d2cdd93a638e1b467fc545ac43 \
+    --hash=sha256:f3f40b3c5a968281fd507d519e444c35f0ff171237f4fdde090dd60699458421 \
+    --hash=sha256:f490f9368b6fc026f021db16d7ec2fbf7d89e2edb42e8ec09d2c60505f5729c7 \
+    --hash=sha256:fb043ee2f06b41473269765c2feae53fc2e2fbf96e5e22ca94fb5ad677856f06 \
+    --hash=sha256:fc3d34d4a8fbec3e88a79b92e5465e0f9b842b628675850d860b8bd300b159f5
+    # via
+    #   matplotlib
+    #   mlx-gen
+platformdirs==4.10.0 \
+    --hash=sha256:31e761a6a0ca04faf7353ea759bdba55652be214725111e5aac52dfa29d4bef7 \
+    --hash=sha256:fb516cdb12eb0d857d0cd85a7c57cea4d060bee4578d6cf5a14dfdf8cbf8784a
+    # via mlx-gen
+protobuf==7.35.0 \
+    --hash=sha256:4c4617b83ade0e279d1d2bfe04025a1adb87f9ed657de038620dc0ff959357f6 \
+    --hash=sha256:4cbf5cc286130e06a6c9bbefac442431173906dfcc979712183d4adcc01b37ee \
+    --hash=sha256:66be6c513931c794fa92c080ffee41671390da3d79da219cf9c0c0907f035dda \
+    --hash=sha256:6c0f98f10c8a05ea30f8993dfef2de093d27b490fdae78bb60c8343795d55011 \
+    --hash=sha256:a2efd84605f41e559f1881b0912b44099d0a2ac9bf46b3474823f10fb393b0e6 \
+    --hash=sha256:c13f325cf242bad135c350629eeb5d54b24228eb472fb3e2e9ebbd4c5dc20ca0 \
+    --hash=sha256:f05bcadf9a2a6b8dda047007075135fb7d08c73d9177aabc067e1be46881a201 \
+    --hash=sha256:fcbe42a4ac09d3ec9c987ddfcd956afd0b15f1ff613bd8371bde9405ffd5c8e5
+    # via mlx-gen
+pygments==2.20.0 \
+    --hash=sha256:6757cd03768053ff99f3039c1a36d6c0aa0b263438fcab17520b30a303a82b5f \
+    --hash=sha256:81a9e26dd42fd28a23a2d169d86d7ac03b46e2f8b59ed4698fb4785f946d0176
+    # via
+    #   readme-renderer
+    #   rich
+pyparsing==3.3.2 \
+    --hash=sha256:850ba148bd908d7e2411587e247a1e4f0327839c40e2e5e6d05a007ecc69911d \
+    --hash=sha256:c777f4d763f140633dcb6d8a3eda953bf7a214dc4eff598413c070bcdc117cbc
+    # via matplotlib
+python-dateutil==2.9.0.post0 \
+    --hash=sha256:37dd54208da7e1cd875388217d5e00ebd4179249f90fb72437e91a35459a0ad3 \
+    --hash=sha256:a8b2bc7bffae282281c8140a97d3aa9c14da0b136dfe83f850eea9a5f7470427
+    # via matplotlib
+pyyaml==6.0.3 \
+    --hash=sha256:00c4bdeba853cc34e7dd471f16b4114f4162dc03e6b7afcc2128711f0eca823c \
+    --hash=sha256:0150219816b6a1fa26fb4699fb7daa9caf09eb1999f3b70fb6e786805e80375a \
+    --hash=sha256:02893d100e99e03eda1c8fd5c441d8c60103fd175728e23e431db1b589cf5ab3 \
+    --hash=sha256:02ea2dfa234451bbb8772601d7b8e426c2bfa197136796224e50e35a78777956 \
+    --hash=sha256:0f29edc409a6392443abf94b9cf89ce99889a1dd5376d94316ae5145dfedd5d6 \
+    --hash=sha256:10892704fc220243f5305762e276552a0395f7beb4dbf9b14ec8fd43b57f126c \
+    --hash=sha256:16249ee61e95f858e83976573de0f5b2893b3677ba71c9dd36b9cf8be9ac6d65 \
+    --hash=sha256:1d37d57ad971609cf3c53ba6a7e365e40660e3be0e5175fa9f2365a379d6095a \
+    --hash=sha256:1ebe39cb5fc479422b83de611d14e2c0d3bb2a18bbcb01f229ab3cfbd8fee7a0 \
+    --hash=sha256:214ed4befebe12df36bcc8bc2b64b396ca31be9304b8f59e25c11cf94a4c033b \
+    --hash=sha256:2283a07e2c21a2aa78d9c4442724ec1eb15f5e42a723b99cb3d822d48f5f7ad1 \
+    --hash=sha256:22ba7cfcad58ef3ecddc7ed1db3409af68d023b7f940da23c6c2a1890976eda6 \
+    --hash=sha256:27c0abcb4a5dac13684a37f76e701e054692a9b2d3064b70f5e4eb54810553d7 \
+    --hash=sha256:28c8d926f98f432f88adc23edf2e6d4921ac26fb084b028c733d01868d19007e \
+    --hash=sha256:2e71d11abed7344e42a8849600193d15b6def118602c4c176f748e4583246007 \
+    --hash=sha256:34d5fcd24b8445fadc33f9cf348c1047101756fd760b4dacb5c3e99755703310 \
+    --hash=sha256:37503bfbfc9d2c40b344d06b2199cf0e96e97957ab1c1b546fd4f87e53e5d3e4 \
+    --hash=sha256:3c5677e12444c15717b902a5798264fa7909e41153cdf9ef7ad571b704a63dd9 \
+    --hash=sha256:3ff07ec89bae51176c0549bc4c63aa6202991da2d9a6129d7aef7f1407d3f295 \
+    --hash=sha256:41715c910c881bc081f1e8872880d3c650acf13dfa8214bad49ed4cede7c34ea \
+    --hash=sha256:418cf3f2111bc80e0933b2cd8cd04f286338bb88bdc7bc8e6dd775ebde60b5e0 \
+    --hash=sha256:44edc647873928551a01e7a563d7452ccdebee747728c1080d881d68af7b997e \
+    --hash=sha256:4a2e8cebe2ff6ab7d1050ecd59c25d4c8bd7e6f400f5f82b96557ac0abafd0ac \
+    --hash=sha256:4ad1906908f2f5ae4e5a8ddfce73c320c2a1429ec52eafd27138b7f1cbe341c9 \
+    --hash=sha256:501a031947e3a9025ed4405a168e6ef5ae3126c59f90ce0cd6f2bfc477be31b7 \
+    --hash=sha256:5190d403f121660ce8d1d2c1bb2ef1bd05b5f68533fc5c2ea899bd15f4399b35 \
+    --hash=sha256:5498cd1645aa724a7c71c8f378eb29ebe23da2fc0d7a08071d89469bf1d2defb \
+    --hash=sha256:5cf4e27da7e3fbed4d6c3d8e797387aaad68102272f8f9752883bc32d61cb87b \
+    --hash=sha256:5e0b74767e5f8c593e8c9b5912019159ed0533c70051e9cce3e8b6aa699fcd69 \
+    --hash=sha256:5ed875a24292240029e4483f9d4a4b8a1ae08843b9c54f43fcc11e404532a8a5 \
+    --hash=sha256:5fcd34e47f6e0b794d17de1b4ff496c00986e1c83f7ab2fb8fcfe9616ff7477b \
+    --hash=sha256:5fdec68f91a0c6739b380c83b951e2c72ac0197ace422360e6d5a959d8d97b2c \
+    --hash=sha256:6344df0d5755a2c9a276d4473ae6b90647e216ab4757f8426893b5dd2ac3f369 \
+    --hash=sha256:64386e5e707d03a7e172c0701abfb7e10f0fb753ee1d773128192742712a98fd \
+    --hash=sha256:652cb6edd41e718550aad172851962662ff2681490a8a711af6a4d288dd96824 \
+    --hash=sha256:66291b10affd76d76f54fad28e22e51719ef9ba22b29e1d7d03d6777a9174198 \
+    --hash=sha256:66e1674c3ef6f541c35191caae2d429b967b99e02040f5ba928632d9a7f0f065 \
+    --hash=sha256:6adc77889b628398debc7b65c073bcb99c4a0237b248cacaf3fe8a557563ef6c \
+    --hash=sha256:79005a0d97d5ddabfeeea4cf676af11e647e41d81c9a7722a193022accdb6b7c \
+    --hash=sha256:7c6610def4f163542a622a73fb39f534f8c101d690126992300bf3207eab9764 \
+    --hash=sha256:7f047e29dcae44602496db43be01ad42fc6f1cc0d8cd6c83d342306c32270196 \
+    --hash=sha256:8098f252adfa6c80ab48096053f512f2321f0b998f98150cea9bd23d83e1467b \
+    --hash=sha256:850774a7879607d3a6f50d36d04f00ee69e7fc816450e5f7e58d7f17f1ae5c00 \
+    --hash=sha256:8d1fab6bb153a416f9aeb4b8763bc0f22a5586065f86f7664fc23339fc1c1fac \
+    --hash=sha256:8da9669d359f02c0b91ccc01cac4a67f16afec0dac22c2ad09f46bee0697eba8 \
+    --hash=sha256:8dc52c23056b9ddd46818a57b78404882310fb473d63f17b07d5c40421e47f8e \
+    --hash=sha256:9149cad251584d5fb4981be1ecde53a1ca46c891a79788c0df828d2f166bda28 \
+    --hash=sha256:93dda82c9c22deb0a405ea4dc5f2d0cda384168e466364dec6255b293923b2f3 \
+    --hash=sha256:96b533f0e99f6579b3d4d4995707cf36df9100d67e0c8303a0c55b27b5f99bc5 \
+    --hash=sha256:9c57bb8c96f6d1808c030b1687b9b5fb476abaa47f0db9c0101f5e9f394e97f4 \
+    --hash=sha256:9c7708761fccb9397fe64bbc0395abcae8c4bf7b0eac081e12b809bf47700d0b \
+    --hash=sha256:9f3bfb4965eb874431221a3ff3fdcddc7e74e3b07799e0e84ca4a0f867d449bf \
+    --hash=sha256:a33284e20b78bd4a18c8c2282d549d10bc8408a2a7ff57653c0cf0b9be0afce5 \
+    --hash=sha256:a80cb027f6b349846a3bf6d73b5e95e782175e52f22108cfa17876aaeff93702 \
+    --hash=sha256:b30236e45cf30d2b8e7b3e85881719e98507abed1011bf463a8fa23e9c3e98a8 \
+    --hash=sha256:b3bc83488de33889877a0f2543ade9f70c67d66d9ebb4ac959502e12de895788 \
+    --hash=sha256:b865addae83924361678b652338317d1bd7e79b1f4596f96b96c77a5a34b34da \
+    --hash=sha256:b8bb0864c5a28024fac8a632c443c87c5aa6f215c0b126c449ae1a150412f31d \
+    --hash=sha256:ba1cc08a7ccde2d2ec775841541641e4548226580ab850948cbfda66a1befcdc \
+    --hash=sha256:bdb2c67c6c1390b63c6ff89f210c8fd09d9a1217a465701eac7316313c915e4c \
+    --hash=sha256:c1ff362665ae507275af2853520967820d9124984e0f7466736aea23d8611fba \
+    --hash=sha256:c2514fceb77bc5e7a2f7adfaa1feb2fb311607c9cb518dbc378688ec73d8292f \
+    --hash=sha256:c3355370a2c156cffb25e876646f149d5d68f5e0a3ce86a5084dd0b64a994917 \
+    --hash=sha256:c458b6d084f9b935061bc36216e8a69a7e293a2f1e68bf956dcd9e6cbcd143f5 \
+    --hash=sha256:d0eae10f8159e8fdad514efdc92d74fd8d682c933a6dd088030f3834bc8e6b26 \
+    --hash=sha256:d76623373421df22fb4cf8817020cbb7ef15c725b9d5e45f17e189bfc384190f \
+    --hash=sha256:ebc55a14a21cb14062aa4162f906cd962b28e2e9ea38f9b4391244cd8de4ae0b \
+    --hash=sha256:eda16858a3cab07b80edaf74336ece1f986ba330fdb8ee0d6c0d68fe82bc96be \
+    --hash=sha256:ee2922902c45ae8ccada2c5b501ab86c36525b883eff4255313a253a3160861c \
+    --hash=sha256:efd7b85f94a6f21e4932043973a7ba2613b059c4a000551892ac9f1d11f5baf3 \
+    --hash=sha256:f7057c9a337546edc7973c0d3ba84ddcdf0daa14533c2065749c9075001090e6 \
+    --hash=sha256:fa160448684b4e94d80416c0fa4aac48967a969efe22931448d853ada8baf926 \
+    --hash=sha256:fc09d0aa354569bc501d4e787133afc08552722d3ab34836a80547331bb5d4a0
+    # via
+    #   huggingface-hub
+    #   transformers
+readme-renderer==45.0 \
+    --hash=sha256:030a8fac74904f8fba11ad1bb6964e3f76e896dc7e5e71f16af190c9056696d1 \
+    --hash=sha256:3385ed220117104a2bceb4a9dac8c5fdf6d1f96890d7ea2a9c7174fd5c84091f
+    # via twine
+regex==2026.5.9 \
+    --hash=sha256:002205cafd2a9e78c6290c7d1df277bf3277b3b7a30e0b4bb0dac2e2e3f7cb2d \
+    --hash=sha256:01f0f5f55f4b64dacec85dc116d3c05fd23ad3ff037bbc73a2085775953c2611 \
+    --hash=sha256:01f28d868834624c934b8d2e0aa1c8341337e37831f4a012f18a5afcba4cbaf3 \
+    --hash=sha256:075160bf16658e16d35233300b8453aac25de4cbea808d22348b6979668e924d \
+    --hash=sha256:0de5cf193997384ed2ca6f1cd4f78055b255d93d82d5a8cd6ba0d11c10b167e4 \
+    --hash=sha256:0e1b1b4e496afbb24f4a62aba855ee4f88f25578927697b340702e48c9ee6bc2 \
+    --hash=sha256:0f03aa6898aaaac4592479821df16e68e8d0e29e903e65d8f2dfb2f19028a989 \
+    --hash=sha256:0f9eede6a5cbdc02d4978090186390936e1776a7d1359b21e41014c609880bcf \
+    --hash=sha256:1268eddd8486dc561d08eee1156e40aa3a8fe10f4bdec8fa653b455fcbffd12c \
+    --hash=sha256:15ee42209947f4ca045412eae98416317238163618ace2a8e54f99586a466733 \
+    --hash=sha256:164eba9b755ea6f244b0d881196fbc1fac09714e9782c9e2732b813142033c8e \
+    --hash=sha256:19c16ceb4a267a8789e25733e583983eeab9f0f8664e66b0bd1c5d21f14c2d4b \
+    --hash=sha256:1bd7587a2948b4085195d5a3374eaf4a425dc3e55784c038175355ecf3bbbf8a \
+    --hash=sha256:1e6da47d679b7010ef27556b6e0f99771b744936db1792a10ceac6547ae1503e \
+    --hash=sha256:205109e96b3cf5adf8f4cd62bedde9487feb282b9497a3535451e5a24cd706a0 \
+    --hash=sha256:2099f7e7ff7b6aa3192312650a56e91cc091e49d50b04e4f6f8b6e28b3b27f1c \
+    --hash=sha256:246de9d60aa3f8538b519834dd95cbf276ea263d6a7bd5a3666dc3fa0230505b \
+    --hash=sha256:24b2355ef5cc9aa5b8f07d17704face1c166fdcc2290fa7bd6e6c925655a8346 \
+    --hash=sha256:2a661a7d270a61f7cf460caee8b9fa2d5ef9e5c681234bcb9e0fe14f488e7dfc \
+    --hash=sha256:2acfb48634f64996b57f90f39afa692ff362162722581921fe92239a59960f3c \
+    --hash=sha256:2efa205e6d98b24d1f3ab395c11aa15cdf10935bca283d0285e0499c284fba21 \
+    --hash=sha256:31037c82eccb44b7ea2e9e221d7c01429430e989a1f4b91ea5a855f6017b509a \
+    --hash=sha256:3527bb4942d2c14552155406cdedd906567456821848aed1cb4933a391bf5eca \
+    --hash=sha256:39617fb0cde9c0e6306dc70e3bfc096f3da793219879f7ae7aa341a69fbdcf6d \
+    --hash=sha256:398c521292f4c7fb807001dcd54694d3a1fcafc179a36ad9cc56f98df85930b6 \
+    --hash=sha256:3b1e39888c5e0c7d92cea4fc777396c4a90363b05de75d02eb459a4752200808 \
+    --hash=sha256:3dd4a3ff360dfb836fecdb93a4598f9d6e2ac81e3e397125145c6221bf58cf4c \
+    --hash=sha256:3ddd90103f9e5c471c49c7852ecc1fe27c7e45eb99e977aefe7caa4e779f4f58 \
+    --hash=sha256:446ddd671e43ab535810c4b21cff7104945c701d4a14d1e6d1cd6f4e445a8bea \
+    --hash=sha256:45375819235558a4ff1c4971dc32881f022613abdb180128f5cb4768c1765a1c \
+    --hash=sha256:46f1326ca6e65b0879d23ca302c0f2415aad42ff0309b9c818e7949fe19a41d8 \
+    --hash=sha256:48036f6374aaa79eb3b754ec29c61d1c6b1606749d705a13f8854fa2539671f6 \
+    --hash=sha256:4ebe8f0b5ec5a5024dc4a4c59f444c4e9afc5f2abdbb8962065b75d27fb971f9 \
+    --hash=sha256:4eeb011098fcb77af513dcef521a3dbecbf8849b1e38940759d293b7a93f5026 \
+    --hash=sha256:508f56a89ba9cb26e4168cbc37dbd60a28d82430a9e18ad1d25fe0883c314ca2 \
+    --hash=sha256:5604dfd046dc37eca90250fc3be938b076c8059fa772ac0ed6f499b0f0fb0415 \
+    --hash=sha256:56a33f191f17d8c417f99945ebdc1e691d3af9605d86ec68c7e54a57e3e17af6 \
+    --hash=sha256:57e8915c7986aa33d25e4d3629cef711cd2863f2961b10409f0c04cb8b7d9020 \
+    --hash=sha256:57eeeb05db7979413dec5438f2db21d7ecbba787cde7a711df1a6f6df672aa06 \
+    --hash=sha256:5b73ab8afcf66c622db143d1c6fda4e58e4d537ee4f125229ad47b1ab80f34c0 \
+    --hash=sha256:5e41809d2683fcde7d5a8c87a6567ba1fb1ce0de9f31bff578de00a4b2d76daa \
+    --hash=sha256:6351571c8a42b505eb555c0dc47d740d0fb66977dc142919eea6f4325b7c56a0 \
+    --hash=sha256:6441cc660d76107934a09c22167200839a0e89604a6297f78a974e66e931d2c0 \
+    --hash=sha256:65c8c8c37377794bd5b2f3ebe51919042bf17aec802e23c833d89782ed0c78af \
+    --hash=sha256:6ba42b2e7e7f46cf68cc6a5ca36fa07959f9bbd9c6bdcc47b6ee76549a590248 \
+    --hash=sha256:71b61c5bfe1c806332defc42ad6c780b3c55f661986d7f40283a3a88274b4c00 \
+    --hash=sha256:728d8bfd28a8845c8b6bc5dc7ce010453d206396786c0765c2740cb65f37791e \
+    --hash=sha256:7b92817338591505f282cf3864c145244b1edcf5381d237038df955001091538 \
+    --hash=sha256:7e30b874d341fac767d7df5a0870540541c2c054b80cfaac116e8d367a8a7ff2 \
+    --hash=sha256:7e87577720152d2caae19fe2baaf1f8d5ca12091e9e229f03915c37d1e4b9178 \
+    --hash=sha256:83d0ee4a57d1c87cb549e195ec300b8f0ec3a82eba66d835e4e2ed8634fe4499 \
+    --hash=sha256:8676474c07469d6f33dd1085ca2cd45f65785f32518f2b20e36d9953ca07f994 \
+    --hash=sha256:86f40a5d6444db30a125c9c9177e6b25dad981cbc37451fd838f145e6edac92e \
+    --hash=sha256:872acc074bd29ffc9913ecdfedf6ea77502312ca44a4aa0d3779089c6069d8de \
+    --hash=sha256:8abd33fef90b2a9efac5557d6033ca82d1195ed3a15fea5af15ba7b463c6a63b \
+    --hash=sha256:8c6e4218fbdfbcd4f6c19efca40930d24a621bf4b48cb76bc6640543bd28ef20 \
+    --hash=sha256:8e76e8161ad00694cfce6767d5dea860c6391ac5b83e5c3a39661e696f11fc7e \
+    --hash=sha256:8f3af7a4903c5c04a11a196a5aa75cdd7dd3f8508132f9fb3259d9f5908e3b88 \
+    --hash=sha256:91328f1c23d47595ca3ef0a7557fa129c5a23404b775c770697d2f35b33e0107 \
+    --hash=sha256:916714069da19329ef7de197dcbc77bb3104145c7c2c864dbfbe318f46b88b14 \
+    --hash=sha256:93a7860539414dddaefba2b40f8771765ae17949d4c7182b876ce429e11a8309 \
+    --hash=sha256:954cc214c04663ee6d266fc61739cad83054683048de65c5bd1d640ad28098ac \
+    --hash=sha256:96f5f58b54a063d7ea9dca08e1cf57bfe10499c4d579ee672da284f57f5f0070 \
+    --hash=sha256:97cf3bc1b7d7d2306772ec07366c80d9df00ff79e79cea32898883a646d2fae2 \
+    --hash=sha256:98bd73080e8756255137e1bd3f3f00295bbc5aa383c0e0f973920e9134d7c4ad \
+    --hash=sha256:992604d02e6d9c6d786c24a706a71ecffe1020fc1ef264044474cd81fa2c3919 \
+    --hash=sha256:a24852d3c29ad9e47593593d8a247c44ccc3d0548ef12c822d6ed0810affe676 \
+    --hash=sha256:a6a563446a41adc451393dc6b8e6ad87979efaee3c8738690a8d1b08ebead1b4 \
+    --hash=sha256:a8234aa23ec39894bfe4a3f1b85616a7032481964a13ac6fc9f10de4f6fca270 \
+    --hash=sha256:a8820737949116ffff55fe18f9fc644530063ba6ebfcb8314239416e78f1347c \
+    --hash=sha256:a9e1328e17c84c1a5d22ec9f785ecef4a967fab9a42b6a8dc3bcbebd0a0c9e44 \
+    --hash=sha256:aa0fbdbac82cb3e4450d0ccde7d7a35607f4cb2dd9fba4b8b69bfaf8c9fa6aed \
+    --hash=sha256:b310768746dd314ea6e2ff4cc89ef215426813396ff4e94ee8e6f7096c8b6e03 \
+    --hash=sha256:b46b0f094dc1d3b90356c85a0bd2c9bafc4a6a190b9d6f8ddd5a033b6e088ed4 \
+    --hash=sha256:b4bb445ff3f725f59df8f6014edb547ee928ec7023a774f6a39a3f953038cbb2 \
+    --hash=sha256:b6d189041f15691cfa2b6c4290448ec221244d225b3f5fe9e7771b34ffcdf6e2 \
+    --hash=sha256:b96350aa424e79d4fd6b567b344dcbe2b2d6bfc48dfe7717587e1fa6d43da6ff \
+    --hash=sha256:be3372b9df6ddecff6486d37e19095a7b4973137caf5512407a89f4455361f41 \
+    --hash=sha256:bfe1ce50cbfb569d74e1e4337da6468961f31dbea55fd85aa5de59c0947a805a \
+    --hash=sha256:c010eb8caca74bdb40c07498d7ece26b4428fd3f04aa8a72c9ac6f79e8faaac6 \
+    --hash=sha256:c8b9b9d294cfea3cd19c718ade7cc93492b2c4991abd9a68d0b3477ae6d8e100 \
+    --hash=sha256:c9411dd64ca95477225734a93dfc8583b51916b8d5942f99d6cac21e09965451 \
+    --hash=sha256:ca518ed29c46eecba6010b15f1b9a479314d2de409536e71b6a13aa04e3b8a77 \
+    --hash=sha256:ccf5249114cc3e772ecdd88a98a86eca0fd74c61ce32a94743758c083fc05d48 \
+    --hash=sha256:cd2846168eb9ee3c513902bc8225409cb1caab31d04728b145171fa1625d9621 \
+    --hash=sha256:d29eebfc9525db68cad3c97eedd7f754fa265aa5cd0cf4f863b2421e1b48fc9f \
+    --hash=sha256:d3d7eb5c9a7f6df82ed3cfac9beb93882a5cbcb5b8b157b56cb2b3b276574ac1 \
+    --hash=sha256:d626b84406444b165fc0ba981604edea39f0588ff1f92baa23fe50799ea9afdb \
+    --hash=sha256:d641a8c9a61618047796d572a39a79b26167b0411d2c3031937b2fe2d081e2cf \
+    --hash=sha256:d659eee77986549c9ea45b861c7567e44d6287c3dc9a4565478853f7b9fe2ff6 \
+    --hash=sha256:d6b8a143aca6c39b446ea8092cde25cc8fe9304d4f5fecfbc1a9dbb0282703c2 \
+    --hash=sha256:d726ca3f0d76969bf1e8e477d160d3d666bbf999f6860bd314889e5345782046 \
+    --hash=sha256:d7bdc0ab8f3dd7e1b4f9ab88634e13374669db86bb3c72e8292f07ae313f539f \
+    --hash=sha256:daff2bdbaf1d23e52fdff7c0b7bc2048b68f978df6a4d107ac981f94caef2e66 \
+    --hash=sha256:dd2810d22146b6d838acc5ec15602cb6b47920aa4e33015df3868eedfd20bab8 \
+    --hash=sha256:ddda5340e6c01a293027dd46232fa79eaff1b48058ce7a98f572b6445b088041 \
+    --hash=sha256:dea2e88e1cce4522496cce630e11e67b98b7076620bc4336c3f674bc21a375f4 \
+    --hash=sha256:debb893095e944091c16e641a6e33c1b0f4cb61ab945ec5afbf53ce7068834d8 \
+    --hash=sha256:dfbe4579b9f08036aa7d101d1835437a20783574ac66327e6b29b4018a138081 \
+    --hash=sha256:e1d93bf647916292e8edcec150c07ddf3dc50179ccaf770c04a7f9e452155372 \
+    --hash=sha256:e82db382b44d0111b22601c509c89f64434816c9e0eef9d1989cda8cc6ff1c04 \
+    --hash=sha256:ea9c8ecfa1b73c73b626534d6626e5340d429630943672b8480724f44e84b962 \
+    --hash=sha256:ead4b163ac30a29574510cd4b3e2e985ac5290c05fc7095557d6a5f403fc31b5 \
+    --hash=sha256:ecd353045824e4477562a2ac718c25799cdaaa41f7aa925a806a8a3e6848a5b9 \
+    --hash=sha256:ed2c9e8068b614c574d8d30e543d617cf5379b0535d46f97ef00e904745a08b5 \
+    --hash=sha256:ed457d8e98ae812ed7732bef7bf78de78e834eae0372a74e23ca90ef21d910f9 \
+    --hash=sha256:ef31cbfe458e21c6122ba8150ff060e0c7789ed0d26eb423f25472584920b555 \
+    --hash=sha256:f079e50a0d3cc3cd5091fa9ff45869a2e6b2cd35895731edafb0327901a8d86d \
+    --hash=sha256:f3844f134e834076677dd369976e9f5068679fcb8e50102fdf6b7ac96a3ec127 \
+    --hash=sha256:f7a7c26137296beba7784de6eba69c6a93a63ccebc385e4962fe67e267a91225 \
+    --hash=sha256:fa411799ca8da32a8d38d020a88faa5b6f91657d284761352940ecf9f7c3bbdd \
+    --hash=sha256:fd03c4f0e33280d15cae17159b899245d6b7c53d21def19b263b39655061f5ce \
+    --hash=sha256:fd190e88a895a8901325fad284a3f74ea52b1da8525b76cc811fa9b1edf0ce2b \
+    --hash=sha256:ff8d372ac2acdc048d1c19916f27ee61bc5722728458ba6ca5052f2c72d51763
+    # via
+    #   mlx-gen
+    #   transformers
+requests==2.34.2 \
+    --hash=sha256:2a0d60c172f83ac6ab31e4554906c0f3b3588d37b5cb939b1c061f4907e278e0 \
+    --hash=sha256:f288924cae4e29463698d6d60bc6a4da69c89185ad1e0bcc4104f584e960b9ed
+    # via
+    #   mlx-gen
+    #   requests-toolbelt
+    #   twine
+requests-toolbelt==1.0.0 \
+    --hash=sha256:7681a0a3d047012b5bdc0ee37d7f8f07ebe76ab08caeccfc3921ce23c88d5bc6 \
+    --hash=sha256:cccfdd665f0a24fcf4726e690f65639d272bb0637b9b92dfd91a5568ccf6bd06
+    # via twine
+rfc3986==2.0.0 \
+    --hash=sha256:50b1502b60e289cb37883f3dfd34532b8873c7de9f49bb546641ce9cbd256ebd \
+    --hash=sha256:97aacf9dbd4bfd829baad6e6309fa6573aaf1be3f6fa735c8ab05e46cecb261c
+    # via twine
+rich==15.0.0 \
+    --hash=sha256:33bd4ef74232fb73fe9279a257718407f169c09b78a87ad3d296f548e27de0bb \
+    --hash=sha256:edd07a4824c6b40189fb7ac9bc4c52536e9780fbbfbddf6f1e2502c31b068c36
+    # via
+    #   twine
+    #   typer
+safetensors==0.8.0 \
+    --hash=sha256:040070828e36dc8e122178bbbd5830ff9e97920affb84cbe0f46442497bed358 \
+    --hash=sha256:096ec1a98435df7beb08853bb5aa9081a84f23d0adc67ed1a0a10550f608373f \
+    --hash=sha256:2ddf52eac562eda224f99acfa7889d02968c1fd59a5b011ae7d8137c37e9c02d \
+    --hash=sha256:3ae091f16662658bdc019a4ff6cb4c085bb7d725eb5978b183ffd265863b6d2d \
+    --hash=sha256:4124502b78f03534117c848f87a39b8f31e577b15eff423bf8bfb95f2a8c30d0 \
+    --hash=sha256:4a95ae2b05d7726d751da4ebf626a2ca782b706e101bd894c95bc2450b1cffcc \
+    --hash=sha256:7a46e5ff292c356d6991e60942ba7f79817682d3a2cef0702136448cb9c4d235 \
+    --hash=sha256:7bc0a787ba8a35be368ee3574edfa2b1ad389eebd0a72e482ae275490e3f6c98 \
+    --hash=sha256:87eec7ffed2b809f05a398a8becb7d013f19f7837cd15d9748580d6cf30dbaf4 \
+    --hash=sha256:8e080062fcde23be189565e1c3305d16751a218ecf9412c8601e64204eb6f846 \
+    --hash=sha256:8e9f537aa183a38ace122d27303dcd986b26bd2a7591f9181d7f0c396f4677ca \
+    --hash=sha256:c554f85858e05226d3c2828e32395e677434685d6d94594a41643361c5e837f0 \
+    --hash=sha256:c80201d22cbf405b80647a60ada77bba06c8fba2da2743ba1e89cdcc39a81f25 \
+    --hash=sha256:f7838e5135a406ad3e02efdcb8cf2e5397d368b0154537c4fec682dbc544d452 \
+    --hash=sha256:fabaf3e0f18a6618d9b36560682562157f77c2b71fcffc7b432be2baed9d753d \
+    --hash=sha256:fcdd41ec4628fee5799f807c73c353629130fbd942aa23d83c623dd6c9d52d78 \
+    --hash=sha256:fd6f3f93c9a0a7cc2788ee63fb763353d4bd2e89b0751bc78fcf7dda00bea774
+    # via
+    #   mlx-gen
+    #   transformers
+sentencepiece==0.2.1 \
+    --hash=sha256:010f025a544ef770bb395091d57cb94deb9652d8972e0d09f71d85d5a0816c8c \
+    --hash=sha256:017f97b274d4b0baa84b2dc743bf4517be81156f413bb24f12aacacde378e5ab \
+    --hash=sha256:01e6912125cb45d3792f530a4d38f8e21bf884d6b4d4ade1b2de5cf7a8d2a52b \
+    --hash=sha256:02593eca45440ef39247cee8c47322a34bdcc1d8ae83ad28ba5a899a2cf8d79a \
+    --hash=sha256:097f3394e99456e9e4efba1737c3749d7e23563dd1588ce71a3d007f25475fff \
+    --hash=sha256:0a0d15781a171d188b661ae4bde1d998c303f6bd8621498c50c671bd45a4798e \
+    --hash=sha256:0a81799d0a68d618e89063fb423c3001a034c893069135ffe51fee439ae474d6 \
+    --hash=sha256:0c0f672da370cc490e4c59d89e12289778310a0e71d176c541e4834759e1ae07 \
+    --hash=sha256:0cdfecef430d985f1c2bcbfff3defd1d95dae876fbd0173376012d2d7d24044b \
+    --hash=sha256:105e36e75cbac1292642045458e8da677b2342dcd33df503e640f0b457cb6751 \
+    --hash=sha256:10ed3dab2044c47f7a2e7b4969b0c430420cdd45735d78c8f853191fa0e3148b \
+    --hash=sha256:1855f57db07b51fb51ed6c9c452f570624d2b169b36f0f79ef71a6e6c618cd8b \
+    --hash=sha256:2005242a16d2dc3ac5fe18aa7667549134d37854823df4c4db244752453b78a8 \
+    --hash=sha256:22c4ebcb3c6ab1496ab1c37c79ef7bb563b8726f29548c30773b7a4cb152df1a \
+    --hash=sha256:251874d720ac7f28024a168501f3c7bb15d1802245f6e66de565f18bbb9b5eaa \
+    --hash=sha256:27e38eee653abc3d387862e67bc5c8b6f428cd604e688b85d29170b7e725c26c \
+    --hash=sha256:2af5a1fb05013332ad94343b8b5f3973e006a2dde2dfba55a819549e054e2f0f \
+    --hash=sha256:2f27ae6deea72efdb6f361750c92f6c21fd0ad087445082770cc34015213c526 \
+    --hash=sha256:33f068c9382dc2e7c228eedfd8163b52baa86bb92f50d0488bf2b7da7032e484 \
+    --hash=sha256:39f8651bd10974eafb9834ce30d9bcf5b73e1fc798a7f7d2528f9820ca86e119 \
+    --hash=sha256:3d165fbb9bf8fba35f1946ba2617c3f9995679f07438325f07c026d53f33e746 \
+    --hash=sha256:477c81505db072b3ab627e7eab972ea1025331bd3a92bacbf798df2b75ea86ec \
+    --hash=sha256:4cdc7c36234fda305e85c32949c5211faaf8dd886096c7cea289ddc12a2d02de \
+    --hash=sha256:4f5a3e0d9f445ed9d66c0fec47d4b23d12cfc858b407a03c194c1b26c2ac2a63 \
+    --hash=sha256:56dd39a3c4d6493db3cdca7e8cc68c6b633f0d4195495cbadfcf5af8a22d05a6 \
+    --hash=sha256:57cae326c8727de58c85977b175af132a7138d84c764635d7e71bbee7e774133 \
+    --hash=sha256:5d0350b686c320068702116276cfb26c066dc7e65cfef173980b11bb4d606719 \
+    --hash=sha256:5e4366c97b68218fd30ea72d70c525e6e78a6c0a88650f57ac4c43c63b234a9d \
+    --hash=sha256:60937c959e6f44159fdd9f56fbdd302501f96114a5ba436829496d5f32d8de3f \
+    --hash=sha256:6356d0986b8b8dc351b943150fcd81a1c6e6e4d439772e8584c64230e58ca987 \
+    --hash=sha256:6d297a1748d429ba8534eebe5535448d78b8acc32d00a29b49acf28102eeb094 \
+    --hash=sha256:733e59ff1794d26db706cd41fc2d7ca5f6c64a820709cb801dc0ea31780d64ab \
+    --hash=sha256:8138cec27c2f2282f4a34d9a016e3374cd40e5c6e9cb335063db66a0a3b71fad \
+    --hash=sha256:814978ac05130dd5812b4b03215c766bc6abaef13e7bd72bc534e4d1e12e9a4c \
+    --hash=sha256:82d9ead6591015f009cb1be1cb1c015d5e6f04046dbb8c9588b931e869a29728 \
+    --hash=sha256:881b2e44b14fc19feade3cbed314be37de639fc415375cefaa5bc81a4be137fd \
+    --hash=sha256:891ade6503dd93d418c03993f7d6a8aa20260c422cefff5096b9068185e67642 \
+    --hash=sha256:89a3ea015517c42c0341d0d962f3e6aaf2cf10d71b1932d475c44ba48d00aa2b \
+    --hash=sha256:8dd4b477a7b069648d19363aad0cab9bad2f4e83b2d179be668efa672500dc94 \
+    --hash=sha256:8f8ba89a3acb3dc1ae90f65ec1894b0b9596fdb98ab003ff38e058f898b39bc7 \
+    --hash=sha256:9076430ac25dfa7147d9d05751dbc66a04bc1aaac371c07f84952979ea59f0d0 \
+    --hash=sha256:92b3816aa2339355fda2c8c4e021a5de92180b00aaccaf5e2808972e77a4b22f \
+    --hash=sha256:99f955df238021bf11f0fc37cdb54fd5e5b5f7fd30ecc3d93fb48b6815437167 \
+    --hash=sha256:a19adcec27c524cb7069a1c741060add95f942d1cbf7ad0d104dffa0a7d28a2b \
+    --hash=sha256:a483fd29a34c3e34c39ac5556b0a90942bec253d260235729e50976f5dba1068 \
+    --hash=sha256:ac650534e2251083c5f75dde4ff28896ce7c8904133dc8fef42780f4d5588fcd \
+    --hash=sha256:ad8493bea8432dae8d6830365352350f3b4144415a1d09c4c8cb8d30cf3b6c3c \
+    --hash=sha256:afefe50a0cdcb4f2fd9733cb52001a2c164181ee2d82c32d38f5b1b326a8528c \
+    --hash=sha256:b3616ad246f360e52c85781e47682d31abfb6554c779e42b65333d4b5f44ecc0 \
+    --hash=sha256:b81a24733726e3678d2db63619acc5a8dccd074f7aa7a54ecd5ca33ca6d2d596 \
+    --hash=sha256:c415c9de1447e0a74ae3fdb2e52f967cb544113a3a5ce3a194df185cbc1f962f \
+    --hash=sha256:c6c8f42949f419ff8c7e9960dbadcfbc982d7b5efc2f6748210d3dd53a7de062 \
+    --hash=sha256:c7f0fd2f2693309e6628aeeb2e2faf6edd221134dfccac3308ca0de01f8dab47 \
+    --hash=sha256:c7f54a31cde6fa5cb030370566f68152a742f433f8d2be458463d06c208aef33 \
+    --hash=sha256:c83b85ab2d6576607f31df77ff86f28182be4a8de6d175d2c33ca609925f5da1 \
+    --hash=sha256:caa4e560c72c151da80036aecc2159e51a7fd8ae9efebefd96860460ce6bd025 \
+    --hash=sha256:d3233770f78e637dc8b1fda2cd7c3b99ec77e7505041934188a4e7fe751de3b0 \
+    --hash=sha256:d7b670879c370d350557edabadbad1f6561a9e6968126e6debca4029e5547820 \
+    --hash=sha256:d8b1d91545578852f128650b8cce4ec20f93d39b378ff554ebe66290f2dabb92 \
+    --hash=sha256:d9381351182ff9888cc80e41c632e7e274b106f450de33d67a9e8f6043da6f76 \
+    --hash=sha256:daeb5e9e9fcad012324807856113708614d534f596d5008638eb9b40112cd9e4 \
+    --hash=sha256:dcd8161eee7b41aae57ded06272905dbd680a0a04b91edd0f64790c796b2f706 \
+    --hash=sha256:e10fa50bdbaa5e2445dbd387979980d391760faf0ec99a09bd7780ff37eaec44 \
+    --hash=sha256:e37e4b4c4a11662b5db521def4e44d4d30ae69a1743241412a93ae40fdcab4bb \
+    --hash=sha256:e52144670738b4b477fade6c2a9b6af71a8d0094514c9853ac9f6fc1fcfabae7
+    # via mlx-gen
+setuptools==81.0.0 \
+    --hash=sha256:487b53915f52501f0a79ccfd0c02c165ffe06631443a886740b91af4b7a5845a \
+    --hash=sha256:fdd925d5c5d9f62e4b74b30d6dd7828ce236fd6ed998a08d81de62ce5a6310d6
+    # via torch
+shellingham==1.5.4 \
+    --hash=sha256:7ecfff8f2fd72616f7481040475a65b2bf8af90a56c89140852d1120324e8686 \
+    --hash=sha256:8dbca0739d487e5bd35ab3ca4b36e11c4078f3a234bfce294b0a0291363404de
+    # via typer
+six==1.17.0 \
+    --hash=sha256:4721f391ed90541fddacab5acf947aa0d3dc7d27b2e1e8eda2be8970586c3274 \
+    --hash=sha256:ff70335d468e7eb6ec65b95b99d3a2836546063f63acc5171de367e834932a81
+    # via python-dateutil
+sympy==1.14.0 \
+    --hash=sha256:d3d3fe8df1e5a0b42f0e7bdf50541697dbe7d23746e894990c030e2b05e72517 \
+    --hash=sha256:e091cc3e99d2141a0ba2847328f5479b05d94a6635cb96148ccb3f34671bd8f5
+    # via torch
+tokenizers==0.22.2 \
+    --hash=sha256:143b999bdc46d10febb15cbffb4207ddd1f410e2c755857b5a0797961bbdc113 \
+    --hash=sha256:1a62ba2c5faa2dd175aaeed7b15abf18d20266189fb3406c5d0550dd34dd5f37 \
+    --hash=sha256:1c774b1276f71e1ef716e5486f21e76333464f47bece56bbd554485982a9e03e \
+    --hash=sha256:1e418a55456beedca4621dbab65a318981467a2b188e982a23e117f115ce5001 \
+    --hash=sha256:1e50f8554d504f617d9e9d6e4c2c2884a12b388a97c5c77f0bc6cf4cd032feee \
+    --hash=sha256:2249487018adec45d6e3554c71d46eb39fa8ea67156c640f7513eb26f318cec7 \
+    --hash=sha256:25b85325d0815e86e0bac263506dd114578953b7b53d7de09a6485e4a160a7dd \
+    --hash=sha256:29c30b83d8dcd061078b05ae0cb94d3c710555fbb44861139f9f83dcca3dc3e4 \
+    --hash=sha256:319f659ee992222f04e58f84cbf407cfa66a65fe3a8de44e8ad2bc53e7d99012 \
+    --hash=sha256:369cc9fc8cc10cb24143873a0d95438bb8ee257bb80c71989e3ee290e8d72c67 \
+    --hash=sha256:37ae80a28c1d3265bb1f22464c856bd23c02a05bb211e56d0c5301a435be6c1a \
+    --hash=sha256:38337540fbbddff8e999d59970f3c6f35a82de10053206a7562f1ea02d046fa5 \
+    --hash=sha256:473b83b915e547aa366d1eee11806deaf419e17be16310ac0a14077f1e28f917 \
+    --hash=sha256:544dd704ae7238755d790de45ba8da072e9af3eea688f698b137915ae959281c \
+    --hash=sha256:64d94e84f6660764e64e7e0b22baa72f6cd942279fdbb21d46abd70d179f0195 \
+    --hash=sha256:753d47ebd4542742ef9261d9da92cd545b2cacbb48349a1225466745bb866ec4 \
+    --hash=sha256:791135ee325f2336f498590eb2f11dc5c295232f288e75c99a36c5dbce63088a \
+    --hash=sha256:9ce725d22864a1e965217204946f830c37876eee3b2ba6fc6255e8e903d5fcbc \
+    --hash=sha256:a6bf3f88c554a2b653af81f3204491c818ae2ac6fbc09e76ef4773351292bc92 \
+    --hash=sha256:bfb88f22a209ff7b40a576d5324bf8286b519d7358663db21d6246fb17eea2d5 \
+    --hash=sha256:c9ea31edff2968b44a88f97d784c2f16dc0729b8b143ed004699ebca91f05c48 \
+    --hash=sha256:df6c4265b289083bf710dff49bc51ef252f9d5be33a45ee2bed151114a56207b \
+    --hash=sha256:e10bf9113d209be7cd046d40fbabbaf3278ff6d18eb4da4c500443185dc1896c \
+    --hash=sha256:f01a9c019878532f98927d2bacb79bbb404b43d3437455522a00a30718cdedb5
+    # via transformers
+toml==0.10.2 \
+    --hash=sha256:806143ae5bfb6a3c6e736a764057db0e6a0e05e338b5630894a5f779cabb4f9b \
+    --hash=sha256:b3bda1d108d5dd99f4a20d24d9c348e91c4db7ab1b749200bded2f839ccbe68f
+    # via mlx-gen
+torch==2.12.0 \
+    --hash=sha256:10802fd383bbfed646212e765a72c37d2185205d4f26eb197a254e8ac7ddcb25 \
+    --hash=sha256:10ee1448a9f304d3b987eb4656f664ba6e4d7b410ca7a5a7c642199777a2cf88 \
+    --hash=sha256:1834bd984f8a2f4f16bdfbeecca9146184b220aa46276bf5756735b5dae12812 \
+    --hash=sha256:2140e373e9a51a3e22ef62e8d14366d0b470d18f0adf19fdc757368077133a34 \
+    --hash=sha256:3fee918902090ade827643e758e98363278815de583c75d111fdd665ebffde9f \
+    --hash=sha256:415c1b8d0412f67551c8e89a2daca0fb3e56694af0281ba155eaa9da481f58b4 \
+    --hash=sha256:4b4f64c2c2b11f7510d93dd6412b87025ff6eddd6bb61c3b5a3d892ea20c4756 \
+    --hash=sha256:5d6b560dfa7d56291c07d615c3bb73e8d9943d9b6d87f76cd0d9d570c4797fa6 \
+    --hash=sha256:5f96b63f8287f66a005dd1b5a6abba2920f11156c5e5c4d815f3e2050fd1aa16 \
+    --hash=sha256:6a7512adfdd7f6732e40de1c620831e3c75b39b98cef60b11d0c5f0a76473ec5 \
+    --hash=sha256:864392c73b7654f4d2b3ae712f607937d0dbb1101c4555fbb41848106b297f39 \
+    --hash=sha256:891c769072637c74e9a5a77a3bc782894696d8ffec83b938df8536dee7f0ba78 \
+    --hash=sha256:8b958caff4a14d3a3b0b2dfc6a378f64dda9728a9dad28c08a0db9ce4dafb549 \
+    --hash=sha256:8fbef9f108a863e7722a73740998967e3b074742a834fc5be3a535a2befa7057 \
+    --hash=sha256:90dd587a5f61bfe1307148b581e2084fc5bc4a06e2b90a20e9a36b81087ff16b \
+    --hash=sha256:a43ac605a5e13116c72b64c359644cce0229f213dde48d2ae0ae5eb5becf7feb \
+    --hash=sha256:a6a2eebb237d3b1d9ad3b378e86d9b9e0782afdea8b1e0eba6a13646b9b49c07 \
+    --hash=sha256:af68dbf403439cae9ceaeaaf92f8352b460787dcd27b92aa05c40dd4a19c0f1e \
+    --hash=sha256:b41339df93d491435e790ff8bcbae1c0ce777175889bfd1281d119862793e6a2 \
+    --hash=sha256:b4556715c8572758625d62b6e0ae3b1f76c440221913a6fb5e100f321fb4fb02 \
+    --hash=sha256:c12592630aef72feaf18bd3f197ef587bbfa21131b31c38b23ab2e55fce92e36 \
+    --hash=sha256:c66696857e987efb8bc1777a37357ec4f60ab5e8af6250b83d6034437fa2d8f3 \
+    --hash=sha256:cf9839790285dd472e7a16aafcb4a4e6bf58ec1b494045044b0eefb0eb4bd1f2 \
+    --hash=sha256:d47e7dee68ac4cd7a068b26bcd6b989935427709fae1c8f7bd0019978f829e15 \
+    --hash=sha256:d4d029801cb7b6df858804a2a21b00cc2aa0bf0ee5d2ab18d343c9e9e5681f35 \
+    --hash=sha256:dd37188ea325042cb1f6cafa56822b11ada2520c04791a52629b0af25bdfbfd9 \
+    --hash=sha256:e2ad3eb85d39c3cab62dfa93ed5a73516e6a53c6713cb97d004004fe089f0f1f \
+    --hash=sha256:f7dfae4a519197dfa050e98d8e36378a0fb5899625a875c2b54445005a2e404e
+    # via mlx-gen
+tqdm==4.68.2 \
+    --hash=sha256:89c230e8dbc67c7615c142487111222f878c77427ea09549960f62389e258add \
+    --hash=sha256:d4240441fb5353290b87d6a85968c9decc131a99b8c7faa28269d829de669ede
+    # via
+    #   huggingface-hub
+    #   mlx-gen
+    #   transformers
+transformers==5.10.2 \
+    --hash=sha256:8a669db546f82c7c3618cb46ceb0f0afd89292bc70f319c058f8332ec63e268d \
+    --hash=sha256:f9a44b9c8ca9ab1156b467f574d832ea066284299c2fd0ed84641ccb592751fc
+    # via mlx-gen
+twine==6.2.0 \
+    --hash=sha256:418ebf08ccda9a8caaebe414433b0ba5e25eb5e4a927667122fbe8f829f985d8 \
+    --hash=sha256:e5ed0d2fd70c9959770dce51c8f39c8945c574e18173a7b81802dab51b4b75cf
+    # via mlx-gen
+typer==0.25.1 \
+    --hash=sha256:75caa44ed46a03fb2dab8808753ffacdbfea88495e74c85a28c5eefcf5f39c89 \
+    --hash=sha256:9616eb8853a09ffeabab1698952f33c6f29ffdbceb4eaeecf571880e8d7664cc
+    # via
+    #   huggingface-hub
+    #   transformers
+typing-extensions==4.15.0 \
+    --hash=sha256:0cea48d173cc12fa28ecabc3b837ea3cf6f38c6d1136f85cbaaf598984861466 \
+    --hash=sha256:f0fa19c6845758ab08074a0cfa8b7aecb71c999ca73d62883bc25cc018c4e548
+    # via
+    #   anyio
+    #   huggingface-hub
+    #   torch
+urllib3==2.7.0 \
+    --hash=sha256:231e0ec3b63ceb14667c67be60f2f2c40a518cb38b03af60abc813da26505f4c \
+    --hash=sha256:9fb4c81ebbb1ce9531cce37674bbc6f1360472bc18ca9a553ede278ef7276897
+    # via
+    #   id
+    #   mlx-gen
+    #   requests
+    #   twine
diff --git a/omlx/video/worker.py b/omlx/video/worker.py
new file mode 100644
index 000000000..9e6ce3dfa
--- /dev/null
+++ b/omlx/video/worker.py
@@ -0,0 +1,209 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: Apache-2.0
+"""Video generation subprocess worker.
+
+Runs ONE generation job and exits. Spawned by VideoJobManager as:
+
+    <video_venv>/bin/python -I <repo>/omlx/video/worker.py --spec job_spec.json
+
+HARD RULE: this script must not import omlx. It runs under the video venv
+(mlx-gen + its deps); only mflux, mlx and the standard library are
+available. See docs/video-generation-engine-spec.md section 4.2.
+
+Protocol:
+- stdout: one JSON object per line. Phase heartbeats ({"phase": ...}) are
+  emitted on every phase transition so silent long phases (42GB weight
+  load, torch text encoding, VAE decode) still show liveness; denoise
+  steps additionally carry step/total_steps. The manager tracks the last
+  line timestamp for stall detection.
+- Exit 0 + the output mp4 present and healthy = success. A result manifest
+  with timings and the kernel lifetime-max memory peak is written to
+  spec["manifest_path"] for calibration records.
+- Any failure: a failure manifest {status, code, message, detail} is written
+  to the same spec["manifest_path"] and the exit code is non-zero.
+
+Memory: before loading anything the worker pins its own Metal wired limit
+inside the lease (spec 4.4 layer 1) -- overshoot degrades to non-resident
+pages or an in-process allocation failure, never wired-sum growth toward
+the machine cap.
+"""
+
+from __future__ import annotations
+
+import argparse
+import ctypes
+import json
+import os
+import sys
+import time
+import traceback
+
+GB = 1024**3
+_T0 = time.time()
+
+
+def _emit(**kw) -> None:
+    kw["t"] = round(time.time() - _T0, 1)
+    try:
+        print(json.dumps(kw), flush=True)
+    except Exception:
+        # Never let progress reporting kill the generation (a raising
+        # progress callback aborts mlx-gen's denoise loop).
+        pass
+
+
+def _lifetime_max_phys() -> int:
+    """Own-process lifetime-max phys_footprint via libproc (best effort).
+
+    rusage_info_v4 layout from sys/resource.h: ri_uuid (16 bytes), then 28
+    c_uint64 fields, then ri_lifetime_max_phys_footprint. Standalone copy --
+    this script cannot import omlx/utils/proc_memory.py.
+    """
+    try:
+        class _RusageInfoV4(ctypes.Structure):
+            _fields_ = (
+                [("ri_uuid", ctypes.c_uint8 * 16)]
+                + [(f"_u{i}", ctypes.c_uint64) for i in range(28)]
+                + [("ri_lifetime_max_phys_footprint", ctypes.c_uint64)]
+                + [("_tail", ctypes.c_uint64 * 6)]
+            )
+
+        libproc = ctypes.CDLL("/usr/lib/libproc.dylib", use_errno=True)
+        fn = libproc.proc_pid_rusage
+        fn.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_void_p]
+        fn.restype = ctypes.c_int
+        info = _RusageInfoV4()
+        if fn(os.getpid(), 4, ctypes.byref(info)) != 0:
+            return 0
+        return int(info.ri_lifetime_max_phys_footprint)
+    except Exception:
+        return 0
+
+
+def _write_manifest(path: str, payload: dict) -> None:
+    try:
+        tmp = path + ".tmp"
+        with open(tmp, "w") as f:
+            json.dump(payload, f, indent=1)
+        os.replace(tmp, path)
+    except Exception:
+        pass
+
+
+def run(spec: dict) -> int:
+    manifest_path = spec["manifest_path"]
+    output_path = spec["output_path"]
+
+    # Layer-1 memory containment: pin our Metal wired limit inside the
+    # lease BEFORE any weights load.
+    lease = int(spec.get("lease_bytes", 0))
+    margin = int(spec.get("wired_margin_bytes", 2 * GB))
+    if lease > 0:
+        import mlx.core as mx
+
+        limit = max(1 * GB, lease - margin)
+        try:
+            mx.set_wired_limit(limit)
+            _emit(phase="wired_limit_set", limit_gb=round(limit / GB, 1))
+        except Exception as e:
+            _emit(phase="wired_limit_failed", error=str(e))
+
+    # Low-RAM mode (default ON): release the inactive/high-noise denoiser
+    # after the boundary step, free both transformers before VAE decode and
+    # clear the MLX cache per step. P0 measurement showed the natural-mode
+    # peak at ~49GB even for small profiles; the low-RAM knobs are what the
+    # official benchmarks (20.7GB) use. Cost: the model instance is dead
+    # after one generation -- irrelevant here, one process per job.
+    low_ram = bool(spec.get("low_ram", True))
+    if low_ram:
+        import mlx.core as mx
+
+        try:
+            mx.set_cache_limit(1 * GB)
+        except Exception:
+            pass
+
+    _emit(phase="loading")
+    from mflux.models.common.config.model_config import ModelConfig
+    from mflux.models.wan.variants import Wan2_2_TI2V
+
+    model = Wan2_2_TI2V(
+        model_config=ModelConfig.wan2_2_t2v_a14b(),
+        model_path=spec["model_dir"],
+    )
+    _emit(phase="loaded")
+
+    def cb(ev) -> None:
+        _emit(
+            phase=str(getattr(ev, "phase", "denoise")),
+            step=int(getattr(ev, "step", 0) or 0),
+            total_steps=int(getattr(ev, "total_steps", 0) or 0),
+        )
+
+    kwargs = dict(
+        seed=int(spec["seed"]),
+        prompt=spec["prompt"],
+        num_inference_steps=int(spec["steps"]),
+        height=int(spec["height"]),
+        width=int(spec["width"]),
+        num_frames=int(spec["frames"]),
+        fps=int(spec["fps"]),
+        progress_callback=cb,
+    )
+    if low_ram:
+        kwargs["release_inactive_denoiser"] = True
+        kwargs["release_denoisers_before_decode"] = True
+        kwargs["clear_cache_each_step"] = True
+    if spec.get("negative_prompt"):
+        kwargs["negative_prompt"] = spec["negative_prompt"]
+    if spec.get("guidance") is not None:
+        kwargs["guidance"] = float(spec["guidance"])
+    if spec.get("guidance_2") is not None:
+        kwargs["guidance_2"] = float(spec["guidance_2"])
+
+    video = model.generate_video(**kwargs)
+
+    _emit(phase="saving")
+    os.makedirs(os.path.dirname(os.path.abspath(output_path)), exist_ok=True)
+    video.save(output_path)
+
+    wall = round(time.time() - _T0, 1)
+    _write_manifest(
+        manifest_path,
+        {
+            "status": "completed",
+            "wall_seconds": wall,
+            "lifetime_max_phys_gb": round(_lifetime_max_phys() / GB, 2),
+            "output_bytes": (
+                os.path.getsize(output_path) if os.path.exists(output_path) else 0
+            ),
+        },
+    )
+    _emit(phase="done", wall_seconds=wall)
+    return 0
+
+
+def main() -> int:
+    ap = argparse.ArgumentParser()
+    ap.add_argument("--spec", required=True)
+    args = ap.parse_args()
+    with open(args.spec) as f:
+        spec = json.load(f)
+    try:
+        return run(spec)
+    except Exception as e:
+        _write_manifest(
+            spec.get("manifest_path", args.spec + ".manifest.json"),
+            {
+                "status": "failed",
+                "code": "worker_crashed",
+                "message": f"{type(e).__name__}: {e}",
+                "detail": traceback.format_exc()[-4000:],
+            },
+        )
+        _emit(phase="failed", error=f"{type(e).__name__}: {e}")
+        return 1
+
+
+if __name__ == "__main__":
+    sys.exit(main())
diff --git a/scripts/video_p0_measure.py b/scripts/video_p0_measure.py
new file mode 100644
index 000000000..5bd3728c3
--- /dev/null
+++ b/scripts/video_p0_measure.py
@@ -0,0 +1,302 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: Apache-2.0
+"""Standalone P0 measurement harness for the fmlx video engine spec.
+
+Runs Wan2.2 T2V generation profiles under the video venv and measures the
+true per-run memory peak via the kernel lifetime-max phys_footprint ledger
+(ri_lifetime_max_phys_footprint). Each profile runs in a fresh child process
+so the lifetime max is exact for that run: model load + text encoding +
+denoise + VAE decode + every sub-poll spike.
+
+Must run under the video venv python (needs mflux). Does NOT import omlx
+(see docs/video-generation-engine-spec.md section 4.2: worker venv isolation).
+
+Parent mode (default): spawns one child per profile, samples the child's
+phys_footprint every 0.5s, writes per-profile samples + results and a
+summary.json.
+
+Child mode (--single): loads the model, generates, saves the mp4, then reads
+its OWN lifetime-max ledger and writes a result JSON.
+
+Usage:
+  video_p0_measure.py --model DIR --out DIR [--profiles default,steps40,...]
+"""
+
+from __future__ import annotations
+
+import argparse
+import ctypes
+import json
+import os
+import subprocess
+import sys
+import threading
+import time
+
+# ---------------------------------------------------------------------------
+# phys_footprint via libproc (standalone copy of omlx/utils/proc_memory.py
+# layout; this script must not import omlx)
+# ---------------------------------------------------------------------------
+
+
+class _RusageInfoV4(ctypes.Structure):
+    _fields_ = [
+        ("ri_uuid", ctypes.c_uint8 * 16),
+        ("ri_user_time", ctypes.c_uint64),
+        ("ri_system_time", ctypes.c_uint64),
+        ("ri_pkg_idle_wkups", ctypes.c_uint64),
+        ("ri_interrupt_wkups", ctypes.c_uint64),
+        ("ri_pageins", ctypes.c_uint64),
+        ("ri_wired_size", ctypes.c_uint64),
+        ("ri_resident_size", ctypes.c_uint64),
+        ("ri_phys_footprint", ctypes.c_uint64),
+        ("ri_proc_start_abstime", ctypes.c_uint64),
+        ("ri_proc_exit_abstime", ctypes.c_uint64),
+        ("ri_child_user_time", ctypes.c_uint64),
+        ("ri_child_system_time", ctypes.c_uint64),
+        ("ri_child_pkg_idle_wkups", ctypes.c_uint64),
+        ("ri_child_interrupt_wkups", ctypes.c_uint64),
+        ("ri_child_pageins", ctypes.c_uint64),
+        ("ri_child_elapsed_abstime", ctypes.c_uint64),
+        ("ri_diskio_bytesread", ctypes.c_uint64),
+        ("ri_diskio_byteswritten", ctypes.c_uint64),
+        ("ri_cpu_time_qos_default", ctypes.c_uint64),
+        ("ri_cpu_time_qos_maintenance", ctypes.c_uint64),
+        ("ri_cpu_time_qos_background", ctypes.c_uint64),
+        ("ri_cpu_time_qos_utility", ctypes.c_uint64),
+        ("ri_cpu_time_qos_legacy", ctypes.c_uint64),
+        ("ri_cpu_time_qos_user_initiated", ctypes.c_uint64),
+        ("ri_cpu_time_qos_user_interactive", ctypes.c_uint64),
+        ("ri_billed_system_time", ctypes.c_uint64),
+        ("ri_serviced_system_time", ctypes.c_uint64),
+        ("ri_logical_writes", ctypes.c_uint64),
+        ("ri_lifetime_max_phys_footprint", ctypes.c_uint64),
+        ("ri_instructions", ctypes.c_uint64),
+        ("ri_cycles", ctypes.c_uint64),
+        ("ri_billed_energy", ctypes.c_uint64),
+        ("ri_serviced_energy", ctypes.c_uint64),
+        ("ri_interval_max_phys_footprint", ctypes.c_uint64),
+        ("ri_runnable_time", ctypes.c_uint64),
+    ]
+
+
+_RUSAGE_INFO_V4 = 4
+_libproc = ctypes.CDLL("/usr/lib/libproc.dylib", use_errno=True)
+_proc_pid_rusage = _libproc.proc_pid_rusage
+_proc_pid_rusage.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_void_p]
+_proc_pid_rusage.restype = ctypes.c_int
+
+
+def _rusage(pid: int) -> _RusageInfoV4 | None:
+    info = _RusageInfoV4()
+    if _proc_pid_rusage(pid, _RUSAGE_INFO_V4, ctypes.byref(info)) != 0:
+        return None
+    return info
+
+
+def phys_footprint(pid: int) -> int:
+    info = _rusage(pid)
+    return info.ri_phys_footprint if info else 0
+
+
+def lifetime_max_phys(pid: int) -> int:
+    info = _rusage(pid)
+    return info.ri_lifetime_max_phys_footprint if info else 0
+
+
+# ---------------------------------------------------------------------------
+# profiles
+# ---------------------------------------------------------------------------
+
+PROMPT = "A red fox running through a snowy forest at dawn, cinematic, soft light"
+SEED = 42
+
+PROFILES: dict[str, dict] = {
+    # name -> width, height, frames, steps, fps. frames must be 4n+1 and both
+    # dimensions divisible by 16. lowram=True mirrors the production worker
+    # defaults (mx cache limit 1GB + release denoisers + clear cache per
+    # step): the numbers that calibrate the shipped lease/predictor.
+    # Natural-mode profiles measure the unconstrained envelope.
+    "default": dict(width=480, height=272, frames=49, steps=20, fps=16),
+    "steps40": dict(width=480, height=272, frames=49, steps=40, fps=16),
+    "mid_spatial": dict(width=832, height=480, frames=49, steps=20, fps=16),
+    "frames101": dict(width=480, height=272, frames=101, steps=20, fps=16),
+    "default_lowram": dict(
+        width=480, height=272, frames=49, steps=20, fps=16, lowram=True
+    ),
+    "mid_spatial_lowram": dict(
+        width=832, height=480, frames=49, steps=20, fps=16, lowram=True
+    ),
+    "frames101_lowram": dict(
+        width=480, height=272, frames=101, steps=20, fps=16, lowram=True
+    ),
+}
+
+GB = 1024**3
+
+
+# ---------------------------------------------------------------------------
+# child mode: run one profile, report own lifetime max
+# ---------------------------------------------------------------------------
+
+
+def run_single(model_dir: str, out_dir: str, name: str) -> int:
+    p = PROFILES[name]
+    t0 = time.time()
+    lowram = bool(p.get("lowram", False))
+
+    def emit(**kw):
+        kw["t"] = round(time.time() - t0, 1)
+        print(json.dumps(kw), flush=True)
+
+    if lowram:
+        import mlx.core as mx
+
+        try:
+            mx.set_cache_limit(1 * GB)
+        except Exception:
+            pass
+
+    emit(phase="loading")
+    from mflux.models.common.config.model_config import ModelConfig
+    from mflux.models.wan.variants import Wan2_2_TI2V
+
+    model = Wan2_2_TI2V(
+        model_config=ModelConfig.wan2_2_t2v_a14b(), model_path=model_dir
+    )
+    emit(phase="loaded")
+
+    def cb(ev):
+        emit(
+            phase=str(getattr(ev, "phase", "?")),
+            step=int(getattr(ev, "step", 0) or 0),
+            total_steps=int(getattr(ev, "total_steps", 0) or 0),
+        )
+
+    gen_kwargs = dict(
+        seed=SEED,
+        prompt=PROMPT,
+        num_inference_steps=p["steps"],
+        height=p["height"],
+        width=p["width"],
+        num_frames=p["frames"],
+        fps=p["fps"],
+        progress_callback=cb,
+    )
+    if lowram:
+        gen_kwargs.update(
+            release_inactive_denoiser=True,
+            release_denoisers_before_decode=True,
+            clear_cache_each_step=True,
+        )
+    video = model.generate_video(**gen_kwargs)
+    emit(phase="saving")
+    out_mp4 = os.path.join(out_dir, f"{name}.mp4")
+    video.save(out_mp4)
+    wall = time.time() - t0
+    # read own ledger BEFORE exit (proc_pid_rusage fails on a reaped pid)
+    result = {
+        "profile": name,
+        "params": p,
+        "wall_seconds": round(wall, 1),
+        "lifetime_max_phys_gb": round(lifetime_max_phys(os.getpid()) / GB, 2),
+        "final_phys_gb": round(phys_footprint(os.getpid()) / GB, 2),
+        "output": out_mp4,
+        "output_bytes": os.path.getsize(out_mp4) if os.path.exists(out_mp4) else 0,
+        "seed": SEED,
+    }
+    with open(os.path.join(out_dir, f"{name}.result.json"), "w") as f:
+        json.dump(result, f, indent=1)
+    emit(phase="done", wall_seconds=result["wall_seconds"])
+    return 0
+
+
+# ---------------------------------------------------------------------------
+# parent mode: spawn child per profile, sample its footprint
+# ---------------------------------------------------------------------------
+
+
+def run_parent(model_dir: str, out_dir: str, names: list[str], timeout_s: int) -> int:
+    os.makedirs(out_dir, exist_ok=True)
+    summary = {"profiles": {}, "started_at": time.strftime("%Y-%m-%d %H:%M:%S")}
+    for name in names:
+        print(f"=== profile {name} ===", flush=True)
+        log_path = os.path.join(out_dir, f"{name}.events.jsonl")
+        samples_path = os.path.join(out_dir, f"{name}.samples.jsonl")
+        child = subprocess.Popen(
+            [sys.executable, os.path.abspath(__file__), "--single", name,
+             "--model", model_dir, "--out", out_dir],
+            stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
+        )
+        stop = threading.Event()
+        peak = {"sampled_max": 0, "max_delta_per_sample": 0}
+
+        def sampler():
+            last = 0
+            with open(samples_path, "w") as sf:
+                while not stop.is_set():
+                    b = phys_footprint(child.pid)
+                    if b:
+                        t = round(time.time(), 1)
+                        sf.write(json.dumps({"t": t, "gb": round(b / GB, 3)}) + "\n")
+                        sf.flush()
+                        peak["sampled_max"] = max(peak["sampled_max"], b)
+                        if last:
+                            peak["max_delta_per_sample"] = max(
+                                peak["max_delta_per_sample"], b - last
+                            )
+                        last = b
+                    stop.wait(0.5)
+
+        th = threading.Thread(target=sampler, daemon=True)
+        th.start()
+        deadline = time.time() + timeout_s
+        with open(log_path, "w") as lf:
+            for line in child.stdout:  # type: ignore[union-attr]
+                lf.write(line)
+                lf.flush()
+                print(f"  [{name}] {line.rstrip()}", flush=True)
+                if time.time() > deadline:
+                    child.kill()
+                    print(f"  [{name}] TIMEOUT after {timeout_s}s, killed", flush=True)
+                    break
+        rc = child.wait()
+        stop.set()
+        th.join(timeout=2)
+        entry = {
+            "exit_code": rc,
+            "sampled_max_gb": round(peak["sampled_max"] / GB, 2),
+            "max_delta_per_0p5s_gb": round(peak["max_delta_per_sample"] / GB, 2),
+        }
+        rpath = os.path.join(out_dir, f"{name}.result.json")
+        if os.path.exists(rpath):
+            with open(rpath) as f:
+                entry.update(json.load(f))
+        summary["profiles"][name] = entry
+        with open(os.path.join(out_dir, "summary.json"), "w") as f:
+            json.dump(summary, f, indent=1)
+        print(f"=== {name} done: {json.dumps(entry)} ===", flush=True)
+    print("=== ALL DONE ===", flush=True)
+    return 0
+
+
+def main() -> int:
+    ap = argparse.ArgumentParser()
+    ap.add_argument("--model", required=True)
+    ap.add_argument("--out", required=True)
+    ap.add_argument("--profiles", default="default,steps40,mid_spatial,frames101")
+    ap.add_argument("--single", default=None, help="internal: run one profile in-process")
+    ap.add_argument("--timeout", type=int, default=10800)
+    args = ap.parse_args()
+    if args.single:
+        return run_single(args.model, args.out, args.single)
+    names = [n.strip() for n in args.profiles.split(",") if n.strip()]
+    for n in names:
+        if n not in PROFILES:
+            print(f"unknown profile {n}; known: {list(PROFILES)}", file=sys.stderr)
+            return 2
+    return run_parent(args.model, args.out, names, args.timeout)
+
+
+if __name__ == "__main__":
+    sys.exit(main())
diff --git a/tests/test_process_memory_enforcer.py b/tests/test_process_memory_enforcer.py
index b456ecc66..64ae81c0b 100644
--- a/tests/test_process_memory_enforcer.py
+++ b/tests/test_process_memory_enforcer.py
@@ -965,6 +965,63 @@ async def test_check_and_enforce_walks_caps_on_soft(self, enforcer):
         scheduler.adjust_store_cache_cap.assert_called_with("soft")
 
 
+class TestRecentPeakTracking:
+    """Tests for recent-peak high-water tracking across poll ticks."""
+
+    @pytest.mark.asyncio
+    async def test_recent_peak_is_window_max(self, enforcer):
+        """After several poll ticks, recent_peak == max over the window.
+
+        The window update runs only past the early return that
+        _check_and_enforce takes when the ceiling is not positive, so the
+        fixture's positive ceiling (10 GB) is required for it to run at all.
+        """
+        readings = [3 * 1024**3, 5 * 1024**3, 2 * 1024**3, 4 * 1024**3]
+        with patch("omlx.process_memory_enforcer.mx") as mock_mx, patch(
+            "omlx.process_memory_enforcer.get_phys_footprint", return_value=0
+        ):
+            mock_mx.get_active_memory.side_effect = _cycling(readings)
+            for _ in readings:
+                await enforcer._check_and_enforce()
+
+        assert enforcer.recent_peak_bytes() == 5 * 1024**3
+
+    @pytest.mark.asyncio
+    async def test_recent_peak_drops_after_window_slides(self, enforcer):
+        """Old high readings age out once they leave the maxlen=5 window."""
+        # Feed one big reading, then enough small ones to push it out of the
+        # 5-slot window.
+        big = 9 * 1024**3
+        small = 1 * 1024**3
+        readings = [big, small, small, small, small, small]
+        with patch("omlx.process_memory_enforcer.mx") as mock_mx, patch(
+            "omlx.process_memory_enforcer.get_phys_footprint", return_value=0
+        ):
+            mock_mx.get_active_memory.side_effect = _cycling(readings)
+            for _ in readings:
+                await enforcer._check_and_enforce()
+
+        # After 6 ticks the first (big) reading has slid out of the window,
+        # leaving only small readings.
+        assert enforcer.recent_peak_bytes() == small
+
+    def test_propagates_recent_peak_to_scheduler(self, enforcer):
+        """_propagate_memory_limit pushes recent_peak onto each scheduler."""
+        scheduler = MagicMock(spec=[])
+        scheduler._memory_limit_bytes = 0
+        scheduler._memory_hard_limit_bytes = 0
+        scheduler._memory_recent_peak_bytes = 0
+        engine = MagicMock(spec=[])
+        engine.scheduler = scheduler
+        entry = _make_entry("model-a", engine=engine)
+        enforcer._engine_pool._entries = {"model-a": entry}
+
+        enforcer._recent_peak_bytes = 7 * 1024**3
+        enforcer._propagate_memory_limit()
+
+        assert scheduler._memory_recent_peak_bytes == 7 * 1024**3
+
+
 class TestProperties:
     """Tests for enforcer properties."""
 
diff --git a/tests/test_scheduler_admission.py b/tests/test_scheduler_admission.py
index 0ed0c6458..1640ef8d3 100644
--- a/tests/test_scheduler_admission.py
+++ b/tests/test_scheduler_admission.py
@@ -2,13 +2,15 @@
 """Tests for scheduler admission control (queue depth cap + admission_paused)."""
 
 from collections import deque
-from unittest.mock import MagicMock
+from unittest.mock import MagicMock, patch
 
 import pytest
 
 from omlx.exceptions import SchedulerQueueFullError
 from omlx.scheduler import Scheduler
 
+GB = 1024**3
+
 
 @pytest.fixture
 def scheduler():
@@ -100,3 +102,77 @@ def test_default_false(self):
         s._prefill_memory_guard = False
         s._admission_paused = False
         assert s._admission_paused is False
+
+
+def _preflight_scheduler(hard_limit: int, recent_peak: int, peak: int):
+    """Build a bare Scheduler wired for _preflight_memory_check.
+
+    `peak` is the value the (mocked) memory_monitor estimates for the
+    prefill chunk; `recent_peak` is the propagated high-water mark.
+    """
+    s = Scheduler.__new__(Scheduler)
+    s._prefill_memory_guard = True
+    s._memory_hard_limit_bytes = hard_limit
+    s._memory_recent_peak_bytes = recent_peak
+    s.config = MagicMock(prefill_step_size=2048)
+    s.memory_monitor = MagicMock()
+    s.memory_monitor.estimate_prefill_peak_bytes = MagicMock(return_value=peak)
+    return s
+
+
+def _preflight_request():
+    r = MagicMock()
+    r.num_prompt_tokens = 8192
+    r.cached_tokens = 0
+    return r
+
+
+class TestPreflightRecentPeak:
+    """_preflight_memory_check uses the recent high-water mark, not just the
+    instant reading, so it does not admit a request during a prefill trough
+    when the next chunk would push memory past the hard limit."""
+
+    def test_rejects_on_recent_peak_when_instant_is_low(self):
+        """Instant active/phys low but recent_peak high -> reject.
+
+        Picks numbers so that low + peak fits (pre-change behaviour would
+        admit) but recent_peak + peak exceeds the hard limit. This pins the
+        fix.
+        """
+        hard_limit = 100 * GB
+        peak = 20 * GB
+        low = 10 * GB
+        high = 85 * GB
+        # Sanity: old code (low + peak) would have passed.
+        assert low + peak <= hard_limit
+        # New code (high + peak) must exceed the limit.
+        assert high + peak > hard_limit
+
+        s = _preflight_scheduler(
+            hard_limit=hard_limit, recent_peak=high, peak=peak
+        )
+        with patch("omlx.scheduler.mx") as mock_mx, patch(
+            "omlx.scheduler.get_phys_footprint", return_value=low
+        ):
+            mock_mx.get_active_memory.return_value = low
+            result = s._preflight_memory_check(_preflight_request())
+
+        assert result is not None
+        assert "Prefill would require" in result
+
+    def test_admits_when_recent_peak_also_low(self):
+        """Control: when recent_peak is low too, the request passes."""
+        hard_limit = 100 * GB
+        peak = 20 * GB
+        low = 10 * GB
+
+        s = _preflight_scheduler(
+            hard_limit=hard_limit, recent_peak=low, peak=peak
+        )
+        with patch("omlx.scheduler.mx") as mock_mx, patch(
+            "omlx.scheduler.get_phys_footprint", return_value=low
+        ):
+            mock_mx.get_active_memory.return_value = low
+            result = s._preflight_memory_check(_preflight_request())
+
+        assert result is None
diff --git a/tests/test_scheduler_prefill_forward_gate.py b/tests/test_scheduler_prefill_forward_gate.py
new file mode 100644
index 000000000..ff6932645
--- /dev/null
+++ b/tests/test_scheduler_prefill_forward_gate.py
@@ -0,0 +1,427 @@
+# SPDX-License-Identifier: Apache-2.0
+"""Tests for the forward-FRONT prefill memory gate (P0c).
+
+The gate (_prefill_forward_gate) predicts a prefill chunk's peak memory
+BEFORE running self.model(...) and raises RuntimeError when it would breach
+the hard cap, so the request is aborted cleanly instead of the transient
+landing on the Metal ceiling and kernel-panicking the machine. The legacy
+chunk-END check only fires after the allocation has already happened, which
+on Apple Silicon is too late.
+
+Strategy: pure mocks, no model load. The discriminating assertion is that
+when the predicted peak exceeds the cap the model forward is NOT called --
+on pre-change code (no forward-front gate) the forward WOULD run.
+"""
+
+import logging
+from unittest.mock import MagicMock, patch
+
+import mlx.core as mx
+import pytest
+
+from omlx.request import Request, RequestStatus, SamplingParams
+from omlx.scheduler import Scheduler, SchedulerConfig, _PrefillState
+
+GB = 1024**3
+
+
+# ---------------------------------------------------------------------------
+# Direct unit tests of _prefill_forward_gate
+# ---------------------------------------------------------------------------
+
+
+def _gate_scheduler(
+    *,
+    hard_limit: int,
+    recent_peak: int,
+    estimate: int,
+    margin: int,
+    guard: bool = True,
+    monitor: bool = True,
+):
+    """Build a bare Scheduler wired only for _prefill_forward_gate."""
+    s = Scheduler.__new__(Scheduler)
+    s._prefill_memory_guard = guard
+    s._memory_hard_limit_bytes = hard_limit
+    s._memory_recent_peak_bytes = recent_peak
+    s._prefill_transient_margin_bytes = margin
+    s.config = MagicMock(prefill_step_size=2048)
+    if monitor:
+        s.memory_monitor = MagicMock()
+        s.memory_monitor.estimate_prefill_peak_bytes = MagicMock(
+            return_value=estimate
+        )
+    else:
+        s.memory_monitor = None
+    return s
+
+
+def _call_gate(s, chunk_tokens, *, instant):
+    """Invoke the gate with patched instant memory probes."""
+    with patch("omlx.scheduler.mx") as mock_mx, patch(
+        "omlx.scheduler.get_phys_footprint", return_value=instant
+    ):
+        mock_mx.get_active_memory.return_value = instant
+        s._prefill_forward_gate(
+            chunk_tokens, request_id="rid-1", loop_label="external"
+        )
+
+
+class TestPrefillForwardGateUnit:
+    """Direct tests of the gate predicate."""
+
+    def test_raises_when_predicted_peak_exceeds_cap(self):
+        """current(high-water) + estimate + margin > cap -> RuntimeError.
+
+        Numbers chosen so the instant reading alone (low) + estimate would
+        fit, but the high-water recent_peak + estimate + margin overflow.
+        """
+        hard = 107 * GB
+        estimate = 2 * GB
+        margin = 10 * GB
+        instant = 50 * GB
+        recent_peak = 96 * GB
+        # Instant + estimate (no margin) fits; this is the trough the legacy
+        # check could read.
+        assert instant + estimate <= hard
+        # High-water + estimate + margin overflows -> must refuse.
+        assert recent_peak + estimate + margin > hard
+
+        s = _gate_scheduler(
+            hard_limit=hard,
+            recent_peak=recent_peak,
+            estimate=estimate,
+            margin=margin,
+        )
+        with pytest.raises(RuntimeError, match="refused before forward"):
+            _call_gate(s, 256, instant=instant)
+
+    def test_passes_when_predicted_peak_fits(self):
+        """current + estimate + margin <= cap -> no raise."""
+        hard = 107 * GB
+        s = _gate_scheduler(
+            hard_limit=hard,
+            recent_peak=80 * GB,
+            estimate=2 * GB,
+            margin=10 * GB,
+        )
+        # 80 + 2 + 10 = 92 < 107.
+        _call_gate(s, 256, instant=80 * GB)  # must not raise
+
+    def test_margin_is_what_tips_it_over(self):
+        """Without the margin it would pass; the margin alone forces refusal.
+
+        Pins that the margin term is actually applied (not dropped).
+        """
+        hard = 100 * GB
+        estimate = 1 * GB
+        instant = 90 * GB
+        recent_peak = 90 * GB
+        # current + estimate (no margin) = 91 < 100 -> would pass.
+        assert recent_peak + estimate < hard
+        # current + estimate + margin = 101 > 100 -> must refuse.
+        margin = 10 * GB
+        assert recent_peak + estimate + margin > hard
+
+        s = _gate_scheduler(
+            hard_limit=hard,
+            recent_peak=recent_peak,
+            estimate=estimate,
+            margin=margin,
+        )
+        with pytest.raises(RuntimeError):
+            _call_gate(s, 256, instant=instant)
+
+        # Same setup, margin=0 -> passes (control).
+        s0 = _gate_scheduler(
+            hard_limit=hard,
+            recent_peak=recent_peak,
+            estimate=estimate,
+            margin=0,
+        )
+        _call_gate(s0, 256, instant=instant)  # must not raise
+
+    def test_uses_recent_peak_high_water_not_just_instant(self):
+        """A mid-prefill trough in the instant reading must not mask the
+        real footprint: recent_peak high + low instant still refuses."""
+        hard = 107 * GB
+        s = _gate_scheduler(
+            hard_limit=hard,
+            recent_peak=100 * GB,  # real footprint
+            estimate=2 * GB,
+            margin=10 * GB,
+        )
+        # Instant reads a trough at 50GB; without recent_peak it would pass.
+        assert 50 * GB + 2 * GB + 10 * GB < hard
+        with pytest.raises(RuntimeError):
+            _call_gate(s, 256, instant=50 * GB)
+
+    def test_noop_when_guard_off(self):
+        s = _gate_scheduler(
+            hard_limit=107 * GB,
+            recent_peak=200 * GB,
+            estimate=200 * GB,
+            margin=10 * GB,
+            guard=False,
+        )
+        _call_gate(s, 256, instant=200 * GB)  # guard off -> never raises
+
+    def test_noop_when_hard_limit_unset(self):
+        s = _gate_scheduler(
+            hard_limit=0,
+            recent_peak=200 * GB,
+            estimate=200 * GB,
+            margin=10 * GB,
+        )
+        _call_gate(s, 256, instant=200 * GB)  # no limit -> never raises
+
+    def test_fires_without_monitor_phys_based(self):
+        """THE fix: in production scheduler.memory_monitor is never wired, so
+        the gate must still fire on current(phys) + margin. estimate is treated
+        as 0 and the margin carries the guarantee."""
+        s = _gate_scheduler(
+            hard_limit=107 * GB,
+            recent_peak=100 * GB,
+            estimate=0,
+            margin=10 * GB,
+            monitor=False,
+        )
+        # current 100 + estimate 0 + margin 10 = 110 > cap 107 -> refuse.
+        with pytest.raises(RuntimeError, match="refused before forward"):
+            _call_gate(s, 256, instant=100 * GB)
+
+    def test_passes_without_monitor_when_fits(self):
+        """Phys-based gate does not false-fire: current + margin <= cap passes
+        even with no monitor."""
+        s = _gate_scheduler(
+            hard_limit=107 * GB,
+            recent_peak=90 * GB,
+            estimate=0,
+            margin=10 * GB,
+            monitor=False,
+        )
+        # current 90 + margin 10 = 100 <= cap 107 -> no raise.
+        _call_gate(s, 256, instant=90 * GB)
+
+    def test_fires_with_zero_estimate_margin_carries(self):
+        """Monitor present but estimate==0 (model can't be dim-estimated): the
+        gate still fires on current + margin -- the margin, not the estimate, is
+        the safety mechanism."""
+        s = _gate_scheduler(
+            hard_limit=107 * GB,
+            recent_peak=100 * GB,
+            estimate=0,
+            margin=10 * GB,
+        )
+        with pytest.raises(RuntimeError, match="refused before forward"):
+            _call_gate(s, 256, instant=100 * GB)
+
+    def test_noop_when_chunk_zero(self):
+        s = _gate_scheduler(
+            hard_limit=107 * GB,
+            recent_peak=200 * GB,
+            estimate=2 * GB,
+            margin=10 * GB,
+        )
+        _call_gate(s, 0, instant=200 * GB)  # nothing to process -> never raises
+
+
+# ---------------------------------------------------------------------------
+# Integration: gate fires BEFORE the model forward in the real chunked loop
+# ---------------------------------------------------------------------------
+
+
+def _integration_scheduler(*, hard_gb: float, estimate_bytes: int, margin_gb: float):
+    """Scheduler with a mock model, hard cap on but soft off (so the adaptive
+    throttle passes through and only the forward-front gate can fire)."""
+    model = MagicMock()
+    model.layers = []
+    tokenizer = MagicMock()
+    tokenizer.eos_token_id = 2
+    config = SchedulerConfig(
+        max_num_seqs=8,
+        prefill_step_size=256,
+        chunked_prefill=True,
+        paged_cache_block_size=0,
+    )
+    s = Scheduler(model=model, tokenizer=tokenizer, config=config)
+    s.batch_generator = MagicMock()
+    # Soft limit 0 -> _adaptive_chunk_size is a pure passthrough.
+    s._memory_limit_bytes = 0
+    s._memory_hard_limit_bytes = int(hard_gb * GB)
+    s._prefill_memory_guard = True
+    s._prefill_transient_margin_bytes = int(margin_gb * GB)
+    s.memory_monitor = MagicMock()
+    s.memory_monitor.estimate_prefill_peak_bytes = MagicMock(
+        return_value=estimate_bytes
+    )
+    return s, model
+
+
+def _prefill_state(n_tokens: int) -> _PrefillState:
+    req = Request(
+        request_id="rid-int",
+        prompt=list(range(n_tokens + 1)),
+        sampling_params=SamplingParams(max_tokens=8),
+    )
+    req.prompt_token_ids = list(range(n_tokens + 1))
+    req.num_prompt_tokens = n_tokens + 1
+    req.status = RequestStatus.WAITING
+    return _PrefillState(
+        request=req,
+        cache=[],
+        tokens_remaining=mx.array(list(range(n_tokens)))[None],
+        last_token=[n_tokens],
+        tokens_processed=0,
+        base_size=0,
+        emitted_boundaries={},
+        boundary_enabled=False,
+        block_size=0,
+        total_length=n_tokens + 1,
+    )
+
+
+class TestForwardGateBlocksForward:
+    """The gate must abort the chunk BEFORE self.model(...) runs."""
+
+    def test_over_cap_does_not_call_model_forward(self):
+        """Predicted peak over cap -> RuntimeError raised and model NOT called.
+
+        This is the discriminating assertion that pins the fix: pre-change
+        code (no forward-front gate) reaches self.model(chunk, ...) and the
+        transient lands on the cap (kernel panic on real hardware). With the
+        gate, the forward never runs.
+        """
+        # recent_peak high (set via instant probes) + estimate + margin > cap.
+        s, model = _integration_scheduler(
+            hard_gb=107.0, estimate_bytes=2 * GB, margin_gb=10.0
+        )
+        state = _prefill_state(n_tokens=200)
+
+        high = int(100 * GB)
+        with patch(
+            "omlx.scheduler.mx.get_active_memory", return_value=high
+        ), patch("omlx.scheduler.get_phys_footprint", return_value=high), patch(
+            "omlx.scheduler.mx.eval"
+        ) as mock_eval:
+            with pytest.raises(RuntimeError, match="refused before forward"):
+                s._step_prefill_chunk(state)
+
+        # The whole point: the model forward must not have executed.
+        model.assert_not_called()
+        mock_eval.assert_not_called()
+
+    def test_under_cap_runs_model_forward(self):
+        """Predicted peak under cap -> forward runs as normal (control)."""
+        s, model = _integration_scheduler(
+            hard_gb=107.0, estimate_bytes=1 * GB, margin_gb=2.0
+        )
+        state = _prefill_state(n_tokens=200)
+
+        low = int(50 * GB)  # 50 + 1 + 2 = 53 < 107
+        with patch(
+            "omlx.scheduler.mx.get_active_memory", return_value=low
+        ), patch("omlx.scheduler.get_phys_footprint", return_value=low), patch(
+            "omlx.scheduler.mx.eval"
+        ), patch("omlx.scheduler._sync_and_clear_cache"), patch(
+            "omlx.scheduler.get_prefill_tracker"
+        ):
+            done = s._step_prefill_chunk(state)
+
+        # Forward ran exactly once; prefill consumed the only chunk.
+        assert model.call_count == 1
+        assert done is True
+
+
+class TestForwardGateExternalLoopWiring:
+    """Sanity check that the external loop wiring calls the gate before the forward.
+
+    Patch _prefill_forward_gate to raise; the model forward must not run.
+    Uses a tiny text-only request through _do_external_prefill.
+    """
+
+    def test_external_loop_calls_gate_before_forward(self):
+        model = MagicMock()
+        model.layers = []
+        tokenizer = MagicMock()
+        tokenizer.eos_token_id = 2
+        config = SchedulerConfig(
+            max_num_seqs=8,
+            prefill_step_size=256,
+            chunked_prefill=False,
+            paged_cache_block_size=0,
+        )
+        s = Scheduler(model=model, tokenizer=tokenizer, config=config)
+
+        req = Request(
+            request_id="rid-ext",
+            prompt=[1, 2, 3, 4, 5],
+            sampling_params=SamplingParams(max_tokens=8),
+        )
+        req.prompt_token_ids = [1, 2, 3, 4, 5]
+        req.num_prompt_tokens = 5
+
+        with patch.object(
+            s,
+            "_prefill_forward_gate",
+            side_effect=RuntimeError("Prefill refused before forward"),
+        ) as mock_gate, patch(
+            "omlx.scheduler.make_prompt_cache", return_value=[]
+        ):
+            with pytest.raises(RuntimeError, match="refused before forward"):
+                s._do_external_prefill(req, [1, 2, 3, 4, 5], None)
+
+        mock_gate.assert_called_once()
+        # Gate raised -> forward must not have run.
+        model.assert_not_called()
+
+
+class TestGateStateLog:
+    """_log_prefill_gate_state_once surfaces the resolved gate config loudly --
+    the prior monitor-based gate shipped inert and SILENT, found only on metal."""
+
+    def test_logs_resolved_margin_once(self, caplog):
+        s = _gate_scheduler(
+            hard_limit=107 * GB, recent_peak=0, estimate=2 * GB, margin=12 * GB
+        )
+        with caplog.at_level(logging.INFO, logger="omlx.scheduler"):
+            s._log_prefill_gate_state_once()
+            s._log_prefill_gate_state_once()  # second call must be a no-op
+        hits = [
+            r for r in caplog.records
+            if "prefill forward gate ACTIVE" in r.getMessage()
+        ]
+        assert len(hits) == 1
+        msg = hits[0].getMessage()
+        assert "margin=12.0GB" in msg
+        assert "cap=107.0GB" in msg
+        assert "estimator=active" in msg  # monitor returns >0 here
+
+    def test_warns_when_margin_zero(self, caplog):
+        s = _gate_scheduler(
+            hard_limit=107 * GB, recent_peak=0, estimate=2 * GB, margin=0
+        )
+        with caplog.at_level(logging.INFO, logger="omlx.scheduler"):
+            s._log_prefill_gate_state_once()
+        rec = [
+            r for r in caplog.records
+            if "prefill forward gate" in r.getMessage()
+        ][0]
+        assert rec.levelno == logging.WARNING
+        assert "margin=0" in rec.getMessage().lower()
+
+    def test_reports_estimator_disabled_without_monitor(self, caplog):
+        s = _gate_scheduler(
+            hard_limit=107 * GB,
+            recent_peak=0,
+            estimate=0,
+            margin=12 * GB,
+            monitor=False,
+        )
+        with caplog.at_level(logging.INFO, logger="omlx.scheduler"):
+            s._log_prefill_gate_state_once()
+        rec = [
+            r for r in caplog.records
+            if "prefill forward gate ACTIVE" in r.getMessage()
+        ][0]
+        assert "DISABLED" in rec.getMessage()
diff --git a/tests/test_settings.py b/tests/test_settings.py
index a16cde3e1..665cda404 100644
--- a/tests/test_settings.py
+++ b/tests/test_settings.py
@@ -444,13 +444,13 @@ def test_to_dict(self):
         """Test conversion to dictionary."""
         settings = HuggingFaceSettings(endpoint="https://hf-mirror.com")
         result = settings.to_dict()
-        assert result == {"endpoint": "https://hf-mirror.com"}
+        assert result == {"endpoint": "https://hf-mirror.com", "disable_xet": False}
 
     def test_to_dict_empty(self):
         """Test conversion to dictionary with empty endpoint."""
         settings = HuggingFaceSettings()
         result = settings.to_dict()
-        assert result == {"endpoint": ""}
+        assert result == {"endpoint": "", "disable_xet": False}
 
     def test_from_dict(self):
         """Test creation from dictionary."""
diff --git a/tests/test_video_discovery.py b/tests/test_video_discovery.py
new file mode 100644
index 000000000..125cdef04
--- /dev/null
+++ b/tests/test_video_discovery.py
@@ -0,0 +1,409 @@
+# SPDX-License-Identifier: Apache-2.0
+"""Tests for video (diffusers-layout) model discovery.
+
+Covers the discovery-layer changes from docs/video-generation-engine-spec.md
+section 4.1: model_index.json as a model root, WanPipeline -> "video",
+unknown-pipeline skip, no phantom component entries from org-folder descent,
+and regression guards for the existing config.json (LLM) path.
+"""
+
+import json
+from pathlib import Path
+
+import pytest
+
+from omlx.model_discovery import (
+    VIDEO_PIPELINE_CLASSES,
+    _is_model_dir,
+    detect_model_type,
+    discover_models,
+    estimate_model_size,
+    read_model_index_pipeline_class,
+)
+
+# Component subdirs of a Wan2.2-style diffusers repo. Each carries its own
+# config.json + weights, so pre-fix org-folder descent would have registered
+# them as phantom standalone "llm" models.
+_WAN_COMPONENTS = ("transformer", "transformer_2", "vae", "text_encoder")
+
+
+def make_diffusers_dir(
+    parent: Path,
+    name: str = "Wan2.2-T2V-A14B",
+    class_name: str | None = "WanPipeline",
+    component_weight_bytes: int = 1024,
+) -> Path:
+    """Create a diffusers-layout model dir with fake component weights."""
+    model_dir = parent / name
+    model_dir.mkdir(parents=True)
+
+    index: dict = {"_diffusers_version": "0.35.0"}
+    if class_name is not None:
+        index["_class_name"] = class_name
+    (model_dir / "model_index.json").write_text(json.dumps(index))
+
+    for comp in _WAN_COMPONENTS:
+        comp_dir = model_dir / comp
+        comp_dir.mkdir()
+        # Component config.json is registerable on its own (would detect as
+        # "llm") -- exactly what made the phantom-entry bug dangerous.
+        (comp_dir / "config.json").write_text(
+            json.dumps({"model_type": "llama", "architectures": ["LlamaForCausalLM"]})
+        )
+        (comp_dir / "model.safetensors").write_bytes(b"\0" * component_weight_bytes)
+
+    return model_dir
+
+
+def make_llm_dir(parent: Path, name: str = "llama-3b", weight_bytes: int = 512) -> Path:
+    """Create a plain transformers-layout LLM model dir."""
+    model_dir = parent / name
+    model_dir.mkdir(parents=True)
+    (model_dir / "config.json").write_text(
+        json.dumps({"model_type": "llama", "architectures": ["LlamaForCausalLM"]})
+    )
+    (model_dir / "model.safetensors").write_bytes(b"\0" * weight_bytes)
+    return model_dir
+
+
+class TestReadModelIndexPipelineClass:
+    """Unit tests for read_model_index_pipeline_class."""
+
+    def test_valid_wan_pipeline(self, tmp_path):
+        (tmp_path / "model_index.json").write_text(
+            json.dumps({"_class_name": "WanPipeline", "_diffusers_version": "0.35.0"})
+        )
+        assert read_model_index_pipeline_class(tmp_path) == "WanPipeline"
+
+    def test_unknown_pipeline_class_still_returned(self, tmp_path):
+        """The reader returns the raw class; the allowlist filter lives elsewhere."""
+        (tmp_path / "model_index.json").write_text(
+            json.dumps({"_class_name": "FluxPipeline"})
+        )
+        assert read_model_index_pipeline_class(tmp_path) == "FluxPipeline"
+
+    def test_missing_model_index(self, tmp_path):
+        assert read_model_index_pipeline_class(tmp_path) is None
+
+    def test_missing_class_name_key(self, tmp_path):
+        (tmp_path / "model_index.json").write_text(
+            json.dumps({"_diffusers_version": "0.35.0"})
+        )
+        assert read_model_index_pipeline_class(tmp_path) is None
+
+    def test_non_string_class_name(self, tmp_path):
+        (tmp_path / "model_index.json").write_text(json.dumps({"_class_name": 123}))
+        assert read_model_index_pipeline_class(tmp_path) is None
+
+    def test_invalid_json(self, tmp_path):
+        (tmp_path / "model_index.json").write_text("{not valid json")
+        assert read_model_index_pipeline_class(tmp_path) is None
+
+    def test_wan_pipeline_in_allowlist(self):
+        assert "WanPipeline" in VIDEO_PIPELINE_CLASSES
+
+
+class TestDetectModelTypeVideo:
+    """Tests for detect_model_type video branch + LLM regression."""
+
+    def test_wan_pipeline_dir_is_video(self, tmp_path):
+        model_dir = make_diffusers_dir(tmp_path)
+        assert detect_model_type(model_dir) == "video"
+
+    def test_wan_pipeline_index_alone_is_video(self, tmp_path):
+        """model_index.json alone (no components yet) already types as video."""
+        (tmp_path / "model_index.json").write_text(
+            json.dumps({"_class_name": "WanPipeline"})
+        )
+        assert detect_model_type(tmp_path) == "video"
+
+    def test_unknown_pipeline_falls_through_to_llm(self, tmp_path):
+        """Unknown pipeline class does not type as video. detect_model_type
+        falls back to "llm" (the skip happens in _register_model)."""
+        model_dir = make_diffusers_dir(tmp_path, class_name="FluxPipeline")
+        assert detect_model_type(model_dir) == "llm"
+
+    def test_config_json_llm_unchanged(self, tmp_path):
+        """Regression guard: plain config.json LLM detection is unaffected."""
+        model_dir = make_llm_dir(tmp_path)
+        assert detect_model_type(model_dir) == "llm"
+
+    def test_video_branch_runs_before_missing_config_fallback(self, tmp_path):
+        """A WanPipeline dir has no root config.json; without the video branch
+        the missing-config early-exit would have returned "llm"."""
+        model_dir = make_diffusers_dir(tmp_path)
+        assert not (model_dir / "config.json").exists()
+        assert detect_model_type(model_dir) == "video"
+
+
+class TestIsModelDir:
+    """_is_model_dir accepts model_index.json as a model root."""
+
+    def test_model_index_json_is_model_root(self, tmp_path):
+        (tmp_path / "model_index.json").write_text(
+            json.dumps({"_class_name": "WanPipeline"})
+        )
+        assert _is_model_dir(tmp_path) is True
+
+    def test_config_json_is_model_root(self, tmp_path):
+        (tmp_path / "config.json").write_text("{}")
+        assert _is_model_dir(tmp_path) is True
+
+    def test_empty_dir_is_not_model_root(self, tmp_path):
+        assert _is_model_dir(tmp_path) is False
+
+    def test_adapter_wins_over_model_index(self, tmp_path):
+        """adapter_config.json + model_index.json -> adapter check wins."""
+        (tmp_path / "model_index.json").write_text(
+            json.dumps({"_class_name": "WanPipeline"})
+        )
+        (tmp_path / "adapter_config.json").write_text("{}")
+        assert _is_model_dir(tmp_path) is False
+
+
+class TestDiscoverVideoOwnerRepoLayout:
+    """Owner/repo (organized two-level) layout."""
+
+    def test_single_video_entry_no_phantoms(self, tmp_path):
+        make_diffusers_dir(tmp_path / "Wan-AI", name="Wan2.2-T2V-A14B")
+
+        models = discover_models(tmp_path)
+
+        assert set(models.keys()) == {"Wan2.2-T2V-A14B"}
+        entry = models["Wan2.2-T2V-A14B"]
+        assert entry.model_type == "video"
+        assert entry.engine_type == "video"
+        assert entry.config_model_type == "WanPipeline"
+        # No phantom component entries
+        for comp in _WAN_COMPONENTS:
+            assert comp not in models
+
+    def test_entry_paths_and_size(self, tmp_path):
+        model_dir = make_diffusers_dir(
+            tmp_path / "Wan-AI", component_weight_bytes=1000
+        )
+
+        models = discover_models(tmp_path)
+        entry = models["Wan2.2-T2V-A14B"]
+        assert Path(entry.model_path) == model_dir
+        # 4 components x 1000 bytes, 5% runtime overhead
+        assert entry.estimated_size == int(4 * 1000 * 1.05)
+
+    def test_video_alongside_llm_in_same_org(self, tmp_path):
+        org = tmp_path / "Wan-AI"
+        make_diffusers_dir(org)
+        make_llm_dir(org, name="some-llm")
+
+        models = discover_models(tmp_path)
+        assert set(models.keys()) == {"Wan2.2-T2V-A14B", "some-llm"}
+        assert models["Wan2.2-T2V-A14B"].model_type == "video"
+        assert models["some-llm"].model_type == "llm"
+
+
+class TestDiscoverVideoFlatLayout:
+    """Flat layout: the diffusers dir sits directly under model_dir.
+
+    This is the org-folder-descent fix: pre-fix, a dir without root
+    config.json was treated as an organization folder and its component
+    subdirs (transformer/, vae/, ...) were registered as phantom llm models.
+    """
+
+    def test_single_video_entry_no_phantoms(self, tmp_path):
+        make_diffusers_dir(tmp_path, name="Wan2.2-T2V-A14B")
+
+        models = discover_models(tmp_path)
+
+        assert set(models.keys()) == {"Wan2.2-T2V-A14B"}
+        entry = models["Wan2.2-T2V-A14B"]
+        assert entry.model_type == "video"
+        assert entry.engine_type == "video"
+        assert entry.config_model_type == "WanPipeline"
+        for comp in _WAN_COMPONENTS:
+            assert comp not in models
+
+    def test_flat_and_owner_repo_give_same_result(self, tmp_path):
+        flat_root = tmp_path / "flat"
+        flat_root.mkdir()
+        make_diffusers_dir(flat_root)
+
+        org_root = tmp_path / "org"
+        org_root.mkdir()
+        make_diffusers_dir(org_root / "Wan-AI")
+
+        flat = discover_models(flat_root)
+        org = discover_models(org_root)
+
+        assert set(flat.keys()) == set(org.keys()) == {"Wan2.2-T2V-A14B"}
+        for key in ("model_type", "engine_type", "config_model_type", "estimated_size"):
+            assert getattr(flat["Wan2.2-T2V-A14B"], key) == getattr(
+                org["Wan2.2-T2V-A14B"], key
+            )
+
+
+class TestUnknownPipelineSkipped:
+    """Unknown diffusers pipelines are skipped at registration -- no entry,
+    no phantom component entries."""
+
+    def test_flux_pipeline_flat_not_registered(self, tmp_path):
+        make_diffusers_dir(tmp_path, name="FLUX.2-dev", class_name="FluxPipeline")
+
+        models = discover_models(tmp_path)
+        assert models == {}
+
+    def test_flux_pipeline_owner_repo_not_registered(self, tmp_path):
+        make_diffusers_dir(
+            tmp_path / "black-forest-labs", name="FLUX.2-dev", class_name="FluxPipeline"
+        )
+
+        models = discover_models(tmp_path)
+        assert models == {}
+
+    def test_flux_skip_logs_warning(self, tmp_path, caplog):
+        make_diffusers_dir(tmp_path, name="FLUX.2-dev", class_name="FluxPipeline")
+
+        with caplog.at_level("WARNING", logger="omlx.model_discovery"):
+            discover_models(tmp_path)
+
+        assert any(
+            "FluxPipeline" in rec.message and "FLUX.2-dev" in rec.message
+            for rec in caplog.records
+        )
+
+    def test_flux_does_not_block_sibling_models(self, tmp_path):
+        make_diffusers_dir(tmp_path, name="FLUX.2-dev", class_name="FluxPipeline")
+        make_llm_dir(tmp_path, name="llama-3b")
+        make_diffusers_dir(tmp_path, name="Wan2.2-T2V-A14B")
+
+        models = discover_models(tmp_path)
+        assert set(models.keys()) == {"Wan2.2-T2V-A14B", "llama-3b"}
+
+
+class TestMalformedModelIndex:
+    """model_index.json with no _class_name or broken JSON: never video,
+    discovery does not crash."""
+
+    def test_missing_class_name_not_video(self, tmp_path):
+        make_diffusers_dir(tmp_path, name="no-class-name", class_name=None)
+
+        models = discover_models(tmp_path)
+
+        # A model_index.json without a readable _class_name (and no root
+        # config.json) is skipped entirely: registering it would produce
+        # an unloadable llm entry. No phantoms either.
+        assert "no-class-name" not in models
+        for comp in _WAN_COMPONENTS:
+            assert comp not in models
+
+    def test_invalid_json_not_video(self, tmp_path):
+        model_dir = make_diffusers_dir(tmp_path, name="bad-json", class_name=None)
+        (model_dir / "model_index.json").write_text("{definitely not json")
+
+        models = discover_models(tmp_path)
+
+        for entry in models.values():
+            assert entry.model_type != "video"
+        for comp in _WAN_COMPONENTS:
+            assert comp not in models
+
+    def test_malformed_index_without_weights_not_registered(self, tmp_path):
+        """No weights anywhere -> estimate_model_size raises -> entry dropped
+        gracefully (no exception escapes discover_models)."""
+        model_dir = tmp_path / "empty-index"
+        model_dir.mkdir()
+        (model_dir / "model_index.json").write_text(json.dumps({"foo": "bar"}))
+
+        models = discover_models(tmp_path)
+        assert models == {}
+
+
+class TestAdapterExclusion:
+    """adapter_config.json wins over model_index.json."""
+
+    def test_adapter_with_model_index_excluded_flat(self, tmp_path):
+        model_dir = make_diffusers_dir(tmp_path, name="wan-lora")
+        (model_dir / "adapter_config.json").write_text("{}")
+
+        models = discover_models(tmp_path)
+
+        assert "wan-lora" not in models
+        # Adapter dirs are skipped wholesale -- no descent, no phantoms.
+        for comp in _WAN_COMPONENTS:
+            assert comp not in models
+
+    def test_adapter_with_model_index_excluded_in_org(self, tmp_path):
+        model_dir = make_diffusers_dir(tmp_path / "Wan-AI", name="wan-lora")
+        (model_dir / "adapter_config.json").write_text("{}")
+
+        models = discover_models(tmp_path)
+        assert models == {}
+
+
+class TestEstimateModelSizeDiffusers:
+    """estimate_model_size sums recursive **/*.safetensors for diffusers
+    layouts (no root-level weight files)."""
+
+    def test_recursive_sum_with_overhead(self, tmp_path):
+        model_dir = tmp_path / "Wan2.2-T2V-A14B"
+        model_dir.mkdir()
+        (model_dir / "model_index.json").write_text(
+            json.dumps({"_class_name": "WanPipeline"})
+        )
+        sizes = {
+            "transformer/diffusion_pytorch_model-00001-of-00002.safetensors": 3000,
+            "transformer/diffusion_pytorch_model-00002-of-00002.safetensors": 2000,
+            "transformer_2/diffusion_pytorch_model.safetensors": 1500,
+            "vae/diffusion_pytorch_model.safetensors": 700,
+            "text_encoder/model.safetensors": 300,
+        }
+        for rel, size in sizes.items():
+            f = model_dir / rel
+            f.parent.mkdir(exist_ok=True)
+            f.write_bytes(b"\0" * size)
+
+        expected = int(sum(sizes.values()) * 1.05)
+        assert estimate_model_size(model_dir) == expected
+
+    def test_no_weights_raises(self, tmp_path):
+        model_dir = tmp_path / "wan-empty"
+        model_dir.mkdir()
+        (model_dir / "model_index.json").write_text(
+            json.dumps({"_class_name": "WanPipeline"})
+        )
+        with pytest.raises(ValueError):
+            estimate_model_size(model_dir)
+
+
+class TestLLMRegressionGuard:
+    """Normal LLM dirs (config.json) still discover exactly as before."""
+
+    def test_flat_llm(self, tmp_path):
+        make_llm_dir(tmp_path, name="llama-3b", weight_bytes=2048)
+
+        models = discover_models(tmp_path)
+
+        assert set(models.keys()) == {"llama-3b"}
+        entry = models["llama-3b"]
+        assert entry.model_type == "llm"
+        assert entry.engine_type == "batched"
+        assert entry.config_model_type == "llama"
+        assert entry.estimated_size == int(2048 * 1.05)
+
+    def test_org_folder_llm_descent_still_works(self, tmp_path):
+        org = tmp_path / "mlx-community"
+        make_llm_dir(org, name="llama-3b")
+        make_llm_dir(org, name="qwen-7b")
+
+        models = discover_models(tmp_path)
+        assert set(models.keys()) == {"llama-3b", "qwen-7b"}
+        assert all(m.model_type == "llm" for m in models.values())
+
+    def test_mixed_llm_and_video(self, tmp_path):
+        make_llm_dir(tmp_path, name="llama-3b")
+        make_diffusers_dir(tmp_path / "Wan-AI")
+
+        models = discover_models(tmp_path)
+        assert set(models.keys()) == {"llama-3b", "Wan2.2-T2V-A14B"}
+        assert models["llama-3b"].model_type == "llm"
+        assert models["llama-3b"].engine_type == "batched"
+        assert models["Wan2.2-T2V-A14B"].model_type == "video"
+        assert models["Wan2.2-T2V-A14B"].engine_type == "video"
diff --git a/tests/test_video_manager.py b/tests/test_video_manager.py
new file mode 100644
index 000000000..a71e7b7c2
--- /dev/null
+++ b/tests/test_video_manager.py
@@ -0,0 +1,513 @@
+# SPDX-License-Identifier: Apache-2.0
+"""Tests for VideoJobManager (omlx/video/manager.py) with a fake worker.
+
+The manager spawns [worker_python, -I, worker_script, --spec, spec.json];
+these tests point worker_python at sys.executable and worker_script at a
+tiny stdlib-only script written into tmp_path, so no model / mflux / venv
+is needed. Spec reference: docs/video-generation-engine-spec.md section 4.2.
+"""
+
+import asyncio
+import json
+import sys
+import time
+from pathlib import Path
+
+import pytest
+
+import omlx.video.manager as vm
+from omlx.settings import VideoSettings
+from omlx.video.manager import QueueFullError, VideoJob, VideoJobManager
+
+GB = 1024**3
+
+
+# ---------------------------------------------------------------------------
+# Fake worker scripts (stdlib only -- they run under python -I)
+# ---------------------------------------------------------------------------
+
+_PRELUDE = """\
+import json, sys, time
+
+def emit(obj):
+    sys.stdout.write(json.dumps(obj) + "\\n")
+    sys.stdout.flush()
+
+spec_path = sys.argv[sys.argv.index("--spec") + 1]
+with open(spec_path) as f:
+    spec = json.load(f)
+"""
+
+_SUCCESS_BODY = """\
+emit({"phase": "loading"})
+emit({"phase": "loaded"})
+emit({"phase": "denoise", "step": 1, "total_steps": 2})
+emit({"phase": "denoise", "step": 2, "total_steps": 2})
+emit({"phase": "saving"})
+with open(spec["output_path"], "wb") as f:
+    f.write(b"FAKE-MP4-BYTES")
+with open(spec["manifest_path"], "w") as f:
+    json.dump({"status": "completed", "lifetime_max_phys_gb": 1.5}, f)
+sys.exit(0)
+"""
+
+_CRASH_BODY = """\
+emit({"phase": "loading"})
+with open(spec["manifest_path"], "w") as f:
+    json.dump({"status": "failed", "code": "worker_crashed",
+               "message": "boom"}, f)
+sys.exit(1)
+"""
+
+_NO_OUTPUT_BODY = """\
+emit({"phase": "loading"})
+emit({"phase": "saving"})
+sys.exit(0)
+"""
+
+_STALL_BODY = """\
+emit({"phase": "loading"})
+time.sleep(60)
+sys.exit(0)
+"""
+
+# Prints a heartbeat every 0.5s "forever" (bounded so a leaked process
+# cannot outlive the test session by much)
+_CHATTY_BODY = """\
+for _ in range(240):
+    emit({"phase": "denoise"})
+    time.sleep(0.5)
+sys.exit(0)
+"""
+
+
+def _write_worker(tmp_path: Path, name: str, body: str) -> Path:
+    script = tmp_path / name
+    script.write_text(_PRELUDE + body)
+    return script
+
+
+# ---------------------------------------------------------------------------
+# Fake enforcer
+# ---------------------------------------------------------------------------
+
+
+class FakeEnforcer:
+    """Records lease-related calls so tests can assert order + release."""
+
+    def __init__(self, ceiling_gb: float = 100.0, peak_bytes: int = 0):
+        self.is_running = True
+        self._ceiling = int(ceiling_gb * GB)
+        self.peak = peak_bytes
+        self._soft_threshold = 0.85
+        self._prefill_transient_margin_bytes = 0
+        self.calls: list[tuple] = []
+
+    def get_final_ceiling(self) -> int:
+        return self._ceiling
+
+    def recent_peak_bytes(self) -> int:
+        return self.peak
+
+    def acquire_video_lease(self, lease_bytes: int) -> None:
+        self.calls.append(("acquire", lease_bytes))
+
+    def set_video_worker_pid(self, pid) -> None:
+        self.calls.append(("set_pid", pid))
+
+    def release_video_lease(self) -> None:
+        self.calls.append(("release",))
+
+    # assertion helpers ----------------------------------------------------
+
+    def call_names(self) -> list[str]:
+        return [c[0] for c in self.calls]
+
+    def assert_lease_cycle(self, lease_bytes: int) -> None:
+        """One acquire -> set_pid(real) -> set_pid(None) -> release cycle."""
+        names = self.call_names()
+        assert names.count("acquire") == names.count("release") == 1
+        assert ("acquire", lease_bytes) in self.calls
+        assert names.index("acquire") < names.index("release")
+        pids = [c[1] for c in self.calls if c[0] == "set_pid"]
+        assert pids[-1] is None  # cleared before release
+        assert isinstance(pids[0], int) and pids[0] > 0
+        # acquire happens before the pid is registered
+        assert names.index("acquire") < names.index("set_pid")
+        # release is the very last lease call
+        assert names[-1] == "release"
+
+
+# ---------------------------------------------------------------------------
+# Construction helpers
+# ---------------------------------------------------------------------------
+
+
+def _make_settings(**overrides) -> VideoSettings:
+    kwargs = dict(
+        enabled=True,
+        worker_python=sys.executable,
+        memory_lease_gb=1.0,
+        max_queued_jobs=4,
+        job_timeout_seconds=60,
+        progress_stall_timeout_seconds=30,
+        artifacts_max_count=50,
+        artifacts_max_gb=50.0,
+    )
+    kwargs.update(overrides)
+    return VideoSettings(**kwargs)
+
+
+def _make_job(job_id: str = "video_t1", **param_overrides) -> VideoJob:
+    params = dict(prompt="a cat", width=256, height=256, frames=5,
+                  steps=2, fps=16, seed=7)
+    params.update(param_overrides)
+    return VideoJob(id=job_id, model_id="wan-test",
+                    model_dir="/nonexistent/model", params=params)
+
+
+def _make_manager(tmp_path: Path, worker_body: str,
+                  settings: VideoSettings | None = None,
+                  enforcer: FakeEnforcer | None = None,
+                  ) -> tuple[VideoJobManager, FakeEnforcer]:
+    enforcer = enforcer or FakeEnforcer()
+    script = _write_worker(tmp_path, "fake_worker.py", worker_body)
+    manager = VideoJobManager(
+        settings=settings or _make_settings(),
+        base_path=tmp_path,
+        enforcer=enforcer,
+        worker_script=script,
+    )
+    return manager, enforcer
+
+
+async def _wait_until(cond, timeout: float = 12.0, interval: float = 0.05) -> bool:
+    deadline = time.monotonic() + timeout
+    while time.monotonic() < deadline:
+        if cond():
+            return True
+        await asyncio.sleep(interval)
+    return False
+
+
+async def _wait_terminal(job: VideoJob, timeout: float = 12.0) -> None:
+    ok = await _wait_until(
+        lambda: job.status in ("completed", "failed"), timeout=timeout
+    )
+    assert ok, (
+        f"job did not reach a terminal state within {timeout}s "
+        f"(status={job.status}, phase={job.phase!r})"
+    )
+
+
+# ---------------------------------------------------------------------------
+# (1) success path
+# ---------------------------------------------------------------------------
+
+
+async def test_success_completes_with_artifact_and_lease_cycle(tmp_path):
+    manager, enforcer = _make_manager(tmp_path, _SUCCESS_BODY)
+    try:
+        job = await manager.submit(_make_job("video_ok1"))
+        await _wait_terminal(job)
+
+        assert job.status == "completed"
+        assert job.error is None
+        assert job.progress == 100
+        assert job.phase == "done"
+        assert job.artifact_path is not None
+        artifact = Path(job.artifact_path)
+        assert artifact.exists() and artifact.stat().st_size > 0
+        assert artifact == manager.artifacts_dir / job.id / "output.mp4"
+        assert job.peak_memory_gb == 1.5
+        assert job.wall_seconds is not None
+
+        # wire shape
+        wire = job.to_dict()
+        assert wire["object"] == "video"
+        assert wire["status"] == "completed"
+        assert wire["progress"] == 100
+        assert wire["error"] is None
+        assert wire["size"] == "256x256"
+
+        # lease acquired AND released, in order, pid registered then cleared
+        enforcer.assert_lease_cycle(lease_bytes=1 * GB)
+
+        # persisted record reflects completion
+        with open(manager.jobs_dir / f"{job.id}.json") as f:
+            persisted = json.load(f)
+        assert persisted["status"] == "completed"
+    finally:
+        await manager.shutdown()
+
+
+# ---------------------------------------------------------------------------
+# (2) crash with failure manifest
+# ---------------------------------------------------------------------------
+
+
+async def test_crash_propagates_manifest_error_and_releases_lease(tmp_path):
+    manager, enforcer = _make_manager(tmp_path, _CRASH_BODY)
+    try:
+        job = await manager.submit(_make_job("video_crash1"))
+        await _wait_terminal(job)
+
+        assert job.status == "failed"
+        assert job.error == {"code": "worker_crashed", "message": "boom"}
+        assert job.artifact_path is None
+        # lease released even on failure
+        enforcer.assert_lease_cycle(lease_bytes=1 * GB)
+    finally:
+        await manager.shutdown()
+
+
+# ---------------------------------------------------------------------------
+# (3) exit 0 but no output file
+# ---------------------------------------------------------------------------
+
+
+async def test_exit_zero_without_output_is_output_invalid(tmp_path):
+    manager, enforcer = _make_manager(tmp_path, _NO_OUTPUT_BODY)
+    try:
+        job = await manager.submit(_make_job("video_noout1"))
+        await _wait_terminal(job)
+
+        assert job.status == "failed"
+        assert job.error is not None
+        assert job.error["code"] == vm.ERR_OUTPUT_INVALID
+        enforcer.assert_lease_cycle(lease_bytes=1 * GB)
+    finally:
+        await manager.shutdown()
+
+
+# ---------------------------------------------------------------------------
+# (4) stall: silent worker killed by the watchdog
+# ---------------------------------------------------------------------------
+
+
+async def test_stalled_worker_is_killed(tmp_path):
+    settings = _make_settings(progress_stall_timeout_seconds=2)
+    manager, enforcer = _make_manager(tmp_path, _STALL_BODY,
+                                      settings=settings)
+    try:
+        job = await manager.submit(_make_job("video_stall1"))
+        # one heartbeat then 60s of silence; watchdog ticks every 2s so the
+        # kill should land well within ~8s
+        await _wait_terminal(job, timeout=12.0)
+
+        assert job.status == "failed"
+        assert job.error is not None
+        assert job.error["code"] == vm.ERR_WORKER_STALLED
+        enforcer.assert_lease_cycle(lease_bytes=1 * GB)
+    finally:
+        await manager.shutdown()
+
+
+# ---------------------------------------------------------------------------
+# (5) per-run timeout
+# ---------------------------------------------------------------------------
+
+
+async def test_job_timeout_kills_chatty_worker(tmp_path):
+    settings = _make_settings(job_timeout_seconds=2)
+    manager, enforcer = _make_manager(tmp_path, _CHATTY_BODY,
+                                      settings=settings)
+    try:
+        job = await manager.submit(_make_job("video_timeout1"))
+        await _wait_terminal(job, timeout=12.0)
+
+        assert job.status == "failed"
+        assert job.error is not None
+        assert job.error["code"] == vm.ERR_JOB_TIMEOUT
+        enforcer.assert_lease_cycle(lease_bytes=1 * GB)
+    finally:
+        await manager.shutdown()
+
+
+# ---------------------------------------------------------------------------
+# (6) queue depth cap
+# ---------------------------------------------------------------------------
+
+
+async def test_queue_full_raises_when_cap_reached(tmp_path):
+    settings = _make_settings(max_queued_jobs=1)
+    manager, _ = _make_manager(tmp_path, _CHATTY_BODY, settings=settings)
+    try:
+        job_a = await manager.submit(_make_job("video_qa"))
+        # wait until the dispatcher picks A up (queue drains)
+        ok = await _wait_until(
+            lambda: job_a.status == "in_progress" and manager.queue_depth() == 0
+        )
+        assert ok, "first job never started"
+
+        await manager.submit(_make_job("video_qb"))  # fills the queue
+        assert manager.queue_depth() == 1
+        with pytest.raises(QueueFullError):
+            await manager.submit(_make_job("video_qc"))
+        assert manager.get("video_qc") is None
+    finally:
+        await manager.shutdown()
+
+
+# ---------------------------------------------------------------------------
+# (7) DELETE a running job
+# ---------------------------------------------------------------------------
+
+
+async def test_delete_running_job_kills_worker_and_removes_record(tmp_path):
+    manager, _ = _make_manager(tmp_path, _CHATTY_BODY)
+    try:
+        job = await manager.submit(_make_job("video_del1"))
+        ok = await _wait_until(
+            lambda: job.status == "in_progress"
+            and manager._current_proc is not None
+        )
+        assert ok, "job never started"
+        proc = manager._current_proc
+
+        assert await manager.delete(job.id) is True
+
+        assert proc.returncode is not None  # worker terminated
+        assert manager.get(job.id) is None
+        assert not (manager.jobs_dir / f"{job.id}.json").exists()
+        assert not (manager.artifacts_dir / job.id).exists()
+        # deleting again reports not found
+        assert await manager.delete(job.id) is False
+    finally:
+        await manager.shutdown()
+
+
+# ---------------------------------------------------------------------------
+# (8) startup replay marks in-flight jobs as failed
+# ---------------------------------------------------------------------------
+
+
+async def test_restart_replay_fails_inflight_jobs(tmp_path):
+    jobs_dir = tmp_path / "video-jobs"
+    jobs_dir.mkdir(parents=True)
+    inflight = _make_job("video_replay1")
+    inflight.status = "in_progress"
+    inflight.started_at = time.time()
+    with open(jobs_dir / "video_replay1.json", "w") as f:
+        json.dump(inflight.to_persist(), f)
+    done = _make_job("video_replay2")
+    done.status = "completed"
+    done.progress = 100
+    done.completed_at = time.time()
+    with open(jobs_dir / "video_replay2.json", "w") as f:
+        json.dump(done.to_persist(), f)
+
+    manager, _ = _make_manager(tmp_path, _SUCCESS_BODY)
+    try:
+        replayed = manager.get("video_replay1")
+        assert replayed is not None
+        assert replayed.status == "failed"
+        assert replayed.error is not None
+        assert replayed.error["code"] == vm.ERR_SERVER_RESTARTED
+        assert replayed.completed_at is not None
+        # the failure is persisted back to disk
+        with open(jobs_dir / "video_replay1.json") as f:
+            assert json.load(f)["status"] == "failed"
+        # terminal jobs replay unchanged
+        survivor = manager.get("video_replay2")
+        assert survivor is not None and survivor.status == "completed"
+    finally:
+        await manager.shutdown()
+
+
+# ---------------------------------------------------------------------------
+# (9) retention: LRU purge beyond artifacts_max_count
+# ---------------------------------------------------------------------------
+
+
+async def test_retention_purges_oldest_artifact_but_keeps_record(tmp_path):
+    settings = _make_settings(artifacts_max_count=1)
+    manager, _ = _make_manager(tmp_path, _SUCCESS_BODY, settings=settings)
+    try:
+        job1 = await manager.submit(_make_job("video_ret1"))
+        await _wait_terminal(job1)
+        assert job1.status == "completed"
+        assert job1.artifact_path is not None
+
+        job2 = await manager.submit(_make_job("video_ret2"))
+        await _wait_terminal(job2)
+        assert job2.status == "completed"
+
+        ok = await _wait_until(lambda: job1.artifact_path is None)
+        assert ok, "retention sweep did not purge the older artifact"
+        assert job1.expires_at is not None
+        assert job1.status == "completed"  # record kept, status unchanged
+        assert manager.get(job1.id) is not None
+        assert not (manager.artifacts_dir / job1.id).exists()
+        # newest artifact survives
+        assert job2.artifact_path is not None
+        assert Path(job2.artifact_path).exists()
+        assert job2.expires_at is None
+    finally:
+        await manager.shutdown()
+
+
+# ---------------------------------------------------------------------------
+# (10) memory admission deferral
+# ---------------------------------------------------------------------------
+
+
+async def test_admission_defers_then_proceeds(tmp_path, monkeypatch):
+    monkeypatch.setattr(vm, "_ADMISSION_RECHECK_S", 0.2)
+    enforcer = FakeEnforcer(ceiling_gb=100.0, peak_bytes=200 * GB)
+    manager, _ = _make_manager(tmp_path, _SUCCESS_BODY, enforcer=enforcer)
+    try:
+        job = await manager.submit(_make_job("video_adm1"))
+        ok = await _wait_until(lambda: "waiting for memory" in job.phase)
+        assert ok, f"job never reported memory wait (phase={job.phase!r})"
+        assert job.status == "queued"
+        assert enforcer.call_names() == []  # no lease while deferred
+
+        enforcer.peak = 0  # pressure clears
+        await _wait_terminal(job)
+        assert job.status == "completed"
+        enforcer.assert_lease_cycle(lease_bytes=1 * GB)
+    finally:
+        await manager.shutdown()
+
+
+# ---------------------------------------------------------------------------
+# (11) watchdog: footprint over lease
+# ---------------------------------------------------------------------------
+
+
+async def test_watchdog_kills_worker_over_lease(tmp_path, monkeypatch):
+    lease = 1 * GB
+    monkeypatch.setattr(vm, "get_phys_footprint", lambda pid=None: lease + GB)
+    manager, enforcer = _make_manager(tmp_path, _CHATTY_BODY)
+    try:
+        job = await manager.submit(_make_job("video_lease1"))
+        await _wait_terminal(job, timeout=12.0)
+
+        assert job.status == "failed"
+        assert job.error is not None
+        assert job.error["code"] == vm.ERR_LEASE_EXCEEDED
+        enforcer.assert_lease_cycle(lease_bytes=lease)
+    finally:
+        await manager.shutdown()
+
+
+# ---------------------------------------------------------------------------
+# (12) watchdog: footprint monitor failure (3x zero reads)
+# ---------------------------------------------------------------------------
+
+
+async def test_watchdog_kills_worker_when_monitor_fails(tmp_path, monkeypatch):
+    monkeypatch.setattr(vm, "get_phys_footprint", lambda pid=None: 0)
+    manager, enforcer = _make_manager(tmp_path, _CHATTY_BODY)
+    try:
+        job = await manager.submit(_make_job("video_mon1"))
+        # 3 zero reads at 2s watchdog cadence -> killed around t=6s
+        await _wait_terminal(job, timeout=14.0)
+
+        assert job.status == "failed"
+        assert job.error is not None
+        assert job.error["code"] == vm.ERR_MONITOR_FAILED
+        enforcer.assert_lease_cycle(lease_bytes=1 * GB)
+    finally:
+        await manager.shutdown()
diff --git a/tests/test_video_pool_and_lease.py b/tests/test_video_pool_and_lease.py
new file mode 100644
index 000000000..6f8e1012e
--- /dev/null
+++ b/tests/test_video_pool_and_lease.py
@@ -0,0 +1,332 @@
+# SPDX-License-Identifier: Apache-2.0
+"""Tests for video-model pool rejection and the enforcer video memory lease.
+
+Part A: EnginePool.get_engine must reject model_type == "video" entries with
+ModelTypeNotLoadableError BEFORE the memory-admission loop, so a misrouted
+chat request can never evict resident LLM engines
+(docs/video-generation-engine-spec.md section 3).
+
+Part B: ProcessMemoryEnforcer video lease (spec section 4.4): the lease is
+subtracted from the final ceiling at a single choke point, the dynamic
+ceiling adds back min(worker_footprint, lease) so the worker is counted
+exactly once, and acquire/release move the Metal wired-limit request.
+"""
+
+import asyncio
+import time
+from unittest.mock import MagicMock
+
+import pytest
+
+import omlx.process_memory_enforcer as pme
+from omlx.engine_pool import EngineEntry, EnginePool
+from omlx.exceptions import EnginePoolError, ModelTypeNotLoadableError
+
+GB = 1024**3
+
+# Deterministic static ceiling patched onto enforcer instances so the
+# wired-limit math does not depend on the host machine's RAM.
+STATIC_CEILING = 100 * GB
+CUSTOM_GB = 20.0
+
+
+# =========================================================================
+# Part A -- pool rejection of video entries
+# =========================================================================
+
+
+class FakeLLMEngine:
+    """Loaded-engine stand-in that records eviction attempts."""
+
+    def __init__(self):
+        self.stop_called = False
+
+    def has_active_requests(self) -> bool:
+        return False
+
+    async def stop(self) -> None:
+        self.stop_called = True
+
+
+def _make_pool_with_video_and_llm():
+    pool = EnginePool(scheduler_config=None)
+    fake_engine = FakeLLMEngine()
+    pool._entries["llm-id"] = EngineEntry(
+        model_id="llm-id",
+        model_path="/nonexistent/llm-id",
+        model_type="llm",
+        engine_type="batched",
+        estimated_size=4 * GB,
+        engine=fake_engine,
+        last_access=time.time(),
+    )
+    pool._entries["video-id"] = EngineEntry(
+        model_id="video-id",
+        model_path="/nonexistent/video-id",
+        model_type="video",
+        engine_type="video",
+        estimated_size=42 * GB,
+    )
+    return pool, fake_engine
+
+
+class TestVideoPoolRejection:
+    def test_model_type_not_loadable_is_engine_pool_error(self):
+        assert issubclass(ModelTypeNotLoadableError, EnginePoolError)
+        exc = ModelTypeNotLoadableError("video-id", "video")
+        assert exc.model_id == "video-id"
+        assert exc.model_type == "video"
+        assert "/v1/videos" in str(exc)
+
+    def test_model_type_map_has_video_engine(self):
+        assert EnginePool._MODEL_TYPE_TO_ENGINE["video"] == "video"
+
+    async def test_get_engine_rejects_video_before_admission(self, monkeypatch):
+        """Video rejection fires before admission: no LLM eviction happens.
+
+        Memory is mocked so that, had the 42 GB video entry reached the
+        admission loop, projected usage (20 GB + 42 GB) > ceiling (50 GB)
+        would have evicted the idle llm entry. The rejection must fire first.
+        """
+        pool, fake_engine = _make_pool_with_video_and_llm()
+        pool._get_final_ceiling = lambda: 50 * GB
+        # Make current usage high enough that admission WOULD evict.
+        monkeypatch.setattr(
+            "omlx.engine_pool.get_phys_footprint", lambda pid=None: 20 * GB
+        )
+
+        with pytest.raises(ModelTypeNotLoadableError) as excinfo:
+            await pool.get_engine("video-id")
+
+        assert excinfo.value.model_id == "video-id"
+        assert excinfo.value.model_type == "video"
+        assert "/v1/videos" in str(excinfo.value)
+        # The resident llm engine must be untouched -- not stopped, not
+        # unloaded.
+        assert pool._entries["llm-id"].engine is fake_engine
+        assert fake_engine.stop_called is False
+
+
+# =========================================================================
+# Part B -- enforcer video memory lease
+# =========================================================================
+
+
+@pytest.fixture
+def wired_calls(monkeypatch):
+    """Replace _apply_metal_wired_limit with a recorder; no mx side effects."""
+    calls: list[int] = []
+
+    def _recorder(desired_bytes):
+        calls.append(desired_bytes)
+        return desired_bytes, None
+
+    monkeypatch.setattr(pme, "_apply_metal_wired_limit", _recorder)
+    return calls
+
+
+def _make_pool_stub():
+    pool = MagicMock()
+    pool._entries = {}
+    pool._lock = asyncio.Lock()
+    return pool
+
+
+def _make_enforcer(monkeypatch, tier="custom", custom_gb=CUSTOM_GB, **kwargs):
+    """Enforcer with deterministic ceilings; never started (no loop).
+
+    custom tier -> dynamic ceiling == custom_gb verbatim. The static
+    ceiling is pinned to STATIC_CEILING and the Metal cap mocked away so
+    get_final_ceiling() == min(STATIC_CEILING, custom) == custom on any
+    machine.
+    """
+    monkeypatch.setattr(pme, "get_effective_metal_cap_bytes", lambda: 0)
+    enforcer = pme.ProcessMemoryEnforcer(
+        engine_pool=_make_pool_stub(),
+        memory_guard_tier=tier,
+        memory_guard_custom_ceiling_gb=custom_gb,
+        **kwargs,
+    )
+    enforcer._get_static_ceiling = lambda: STATIC_CEILING
+    return enforcer
+
+
+class TestVideoLeaseCeiling:
+    def test_acquire_reduces_final_ceiling_by_lease(self, monkeypatch, wired_calls):
+        enforcer = _make_enforcer(monkeypatch)
+        base = enforcer.get_final_ceiling()
+        assert base == int(CUSTOM_GB * GB)
+
+        enforcer.acquire_video_lease(8 * GB)
+        assert enforcer.get_final_ceiling() == base - 8 * GB
+
+    def test_huge_lease_clamps_ceiling_to_one(self, monkeypatch, wired_calls):
+        enforcer = _make_enforcer(monkeypatch)
+        enforcer.acquire_video_lease(1024 * GB)
+        # Never 0: consumers treat ceiling 0 as "guard disabled".
+        assert enforcer.get_final_ceiling() == 1
+
+    def test_lease_equal_to_ceiling_clamps_to_one(self, monkeypatch, wired_calls):
+        enforcer = _make_enforcer(monkeypatch)
+        enforcer.acquire_video_lease(int(CUSTOM_GB * GB))
+        assert enforcer.get_final_ceiling() == 1
+
+    def test_double_acquire_raises_runtime_error(self, monkeypatch, wired_calls):
+        enforcer = _make_enforcer(monkeypatch)
+        enforcer.acquire_video_lease(8 * GB)
+        with pytest.raises(RuntimeError):
+            enforcer.acquire_video_lease(1 * GB)
+
+    def test_non_positive_lease_raises_value_error(self, monkeypatch, wired_calls):
+        enforcer = _make_enforcer(monkeypatch)
+        with pytest.raises(ValueError):
+            enforcer.acquire_video_lease(0)
+        with pytest.raises(ValueError):
+            enforcer.acquire_video_lease(-5)
+        # Failed acquires must not leave a partial lease behind.
+        assert enforcer.video_lease_bytes == 0
+
+    def test_release_restores_ceiling(self, monkeypatch, wired_calls):
+        enforcer = _make_enforcer(monkeypatch)
+        base = enforcer.get_final_ceiling()
+        enforcer.acquire_video_lease(8 * GB)
+        assert enforcer.get_final_ceiling() == base - 8 * GB
+        enforcer.release_video_lease()
+        assert enforcer.get_final_ceiling() == base
+
+    def test_release_when_not_held_is_noop(self, monkeypatch, wired_calls):
+        enforcer = _make_enforcer(monkeypatch)
+        before = list(wired_calls)
+        enforcer.release_video_lease()  # must not raise
+        assert enforcer.video_lease_bytes == 0
+        # Early return: no Metal wired-limit churn either.
+        assert wired_calls == before
+
+    def test_video_lease_bytes_property_tracks(self, monkeypatch, wired_calls):
+        enforcer = _make_enforcer(monkeypatch)
+        assert enforcer.video_lease_bytes == 0
+        enforcer.acquire_video_lease(8 * GB)
+        assert enforcer.video_lease_bytes == 8 * GB
+        enforcer.release_video_lease()
+        assert enforcer.video_lease_bytes == 0
+
+    def test_release_clears_worker_pid(self, monkeypatch, wired_calls):
+        enforcer = _make_enforcer(monkeypatch)
+        enforcer.acquire_video_lease(8 * GB)
+        enforcer.set_video_worker_pid(12345)
+        assert enforcer._video_worker_pid == 12345
+        enforcer.release_video_lease()
+        assert enforcer._video_worker_pid is None
+
+    def test_guard_disabled_ceiling_stays_zero(self, monkeypatch, wired_calls):
+        """Guard off: ceiling is 0 (= disabled) and acquire skips Metal calls."""
+        enforcer = _make_enforcer(monkeypatch, prefill_memory_guard=False)
+        assert enforcer.get_final_ceiling() == 0
+        enforcer.acquire_video_lease(8 * GB)
+        assert enforcer.get_final_ceiling() == 0
+        assert wired_calls == []
+
+
+class TestVideoLeaseWiredLimit:
+    def test_acquire_and_release_move_wired_limit_request(
+        self, monkeypatch, wired_calls
+    ):
+        enforcer = _make_enforcer(monkeypatch)
+        assert enforcer._metal_wired_limit_request == 0
+        assert wired_calls == []
+
+        enforcer.acquire_video_lease(8 * GB)
+        assert wired_calls[-1] == STATIC_CEILING - 8 * GB
+        assert enforcer._metal_wired_limit_request == STATIC_CEILING - 8 * GB
+
+        enforcer.release_video_lease()
+        assert wired_calls[-1] == STATIC_CEILING
+        assert enforcer._metal_wired_limit_request == STATIC_CEILING
+
+    def test_oversized_lease_clamps_wired_target_to_one(
+        self, monkeypatch, wired_calls
+    ):
+        enforcer = _make_enforcer(monkeypatch)
+        enforcer.acquire_video_lease(STATIC_CEILING + 5 * GB)
+        assert wired_calls[-1] == 1
+        assert enforcer._metal_wired_limit_request == 1
+
+
+class TestDynamicCeilingWorkerAddBack:
+    """Non-custom tier: dynamic ceiling adds back min(worker_footprint, lease).
+
+    Inputs are fully mocked: get_macos_vm_stats returns fixed numbers and
+    get_phys_footprint is a per-pid fake. balanced tier -> active ratio 0.5,
+    so the base dynamic ceiling is own + free + inactive + active * 0.5.
+    """
+
+    OWN = 5 * GB
+    WORKER_PID = 4242
+    VM = {"free": 10 * GB, "inactive": 4 * GB, "active": 8 * GB, "wired": 0}
+    # own 5 + free 10 + inactive 4 + active 8 * 0.5 = 23 GB
+    BASE = 23 * GB
+
+    def _setup(self, monkeypatch, worker_footprint):
+        monkeypatch.setattr(pme, "get_macos_vm_stats", lambda: dict(self.VM))
+
+        def fake_phys(pid=None):
+            if pid is None:
+                return self.OWN
+            if pid == self.WORKER_PID:
+                return worker_footprint
+            return 0
+
+        monkeypatch.setattr(pme, "get_phys_footprint", fake_phys)
+        return _make_enforcer(monkeypatch, tier="balanced")
+
+    def test_no_pid_no_add_back(self, monkeypatch, wired_calls):
+        enforcer = self._setup(monkeypatch, worker_footprint=3 * GB)
+        enforcer.acquire_video_lease(8 * GB)
+        # Lease held but no worker pid bound yet (pre-spawn): add-back 0.
+        assert enforcer._get_dynamic_ceiling() == self.BASE
+
+    def test_add_back_equals_worker_footprint_under_lease(
+        self, monkeypatch, wired_calls
+    ):
+        enforcer = self._setup(monkeypatch, worker_footprint=3 * GB)
+        enforcer.acquire_video_lease(8 * GB)
+        enforcer.set_video_worker_pid(self.WORKER_PID)
+        assert enforcer._get_dynamic_ceiling() == self.BASE + 3 * GB
+
+    def test_add_back_clamped_to_lease(self, monkeypatch, wired_calls):
+        # Runaway worker (50 GB footprint) must not raise the parent
+        # ceiling beyond the 8 GB lease.
+        enforcer = self._setup(monkeypatch, worker_footprint=50 * GB)
+        enforcer.acquire_video_lease(8 * GB)
+        enforcer.set_video_worker_pid(self.WORKER_PID)
+        assert enforcer._get_dynamic_ceiling() == self.BASE + 8 * GB
+
+    def test_zero_footprint_read_no_add_back(self, monkeypatch, wired_calls):
+        # A failed footprint read (0) means no add-back, so the worker is
+        # double-counted; that errs on the conservative side.
+        enforcer = self._setup(monkeypatch, worker_footprint=0)
+        enforcer.acquire_video_lease(8 * GB)
+        enforcer.set_video_worker_pid(self.WORKER_PID)
+        assert enforcer._get_dynamic_ceiling() == self.BASE
+
+    def test_no_lease_no_add_back_even_with_pid(self, monkeypatch, wired_calls):
+        enforcer = self._setup(monkeypatch, worker_footprint=3 * GB)
+        enforcer.set_video_worker_pid(self.WORKER_PID)
+        assert enforcer._get_dynamic_ceiling() == self.BASE
+
+    def test_lease_then_release_round_trip_final_ceiling(
+        self, monkeypatch, wired_calls
+    ):
+        """End to end on a non-custom tier: final ceiling tightens by the
+        lease minus the worker add-back, then restores after release."""
+        enforcer = self._setup(monkeypatch, worker_footprint=3 * GB)
+        base = enforcer.get_final_ceiling()
+        assert base == self.BASE  # min(static 100 GB, dynamic 23 GB)
+
+        enforcer.acquire_video_lease(8 * GB)
+        enforcer.set_video_worker_pid(self.WORKER_PID)
+        # dynamic = BASE + 3 GB add-back; final = dynamic - 8 GB lease.
+        assert enforcer.get_final_ceiling() == self.BASE + 3 * GB - 8 * GB
+
+        enforcer.release_video_lease()
+        assert enforcer.get_final_ceiling() == base
diff --git a/tests/test_video_routes.py b/tests/test_video_routes.py
new file mode 100644
index 000000000..26bffff65
--- /dev/null
+++ b/tests/test_video_routes.py
@@ -0,0 +1,608 @@
+# SPDX-License-Identifier: Apache-2.0
+"""Tests for the /v1/videos API routes (omlx/api/video_routes.py).
+
+A minimal FastAPI app mounts the video router; the module-level accessors
+(_get_video_manager / _get_engine_pool / _resolve_model) are monkeypatched.
+get/list/delete semantics run against a REAL VideoJobManager constructed on
+tmp_path with enforcer=None; only submit and the guard/venv probes are
+stubbed per test. create_video also reads omlx.server._server_state
+.global_settings.video inside the handler, so a settings stub is patched
+onto the real ServerState instance (monkeypatch restores it afterwards).
+
+No real model dirs or ~/.fmlx are touched; no worker subprocess is spawned.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+from types import SimpleNamespace
+
+import pytest
+from fastapi import FastAPI
+from fastapi.testclient import TestClient
+
+import omlx.api.video_routes as video_routes
+import omlx.server as omlx_server
+from omlx.settings import VideoSettings
+from omlx.video.manager import QueueFullError, VideoJob, VideoJobManager
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+VIDEO_MODEL = "wan-t2v"
+LLM_MODEL = "llama-llm"
+
+
+def _video_settings(**overrides) -> VideoSettings:
+    """Enabled settings; lease defaults to the dataclass default (36GB,
+    P0-calibrated to admit the caps corner under the spatial-token
+    predictor). Tests that exercise the 413 boundary pass a smaller
+    explicit lease."""
+    params = dict(enabled=True)
+    params.update(overrides)
+    return VideoSettings(**params)
+
+
+def _make_manager(
+    tmp_path: Path, settings: VideoSettings, stub_submit: bool = True
+) -> VideoJobManager:
+    """Real manager (real get/list_jobs/delete) with probe seams stubbed."""
+    manager = VideoJobManager(
+        settings=settings, base_path=tmp_path, enforcer=None
+    )
+    manager.guard_available = lambda: (True, "")  # type: ignore[method-assign]
+
+    async def _probe(force: bool = False):
+        return True, ""
+
+    manager.probe_worker_venv = _probe  # type: ignore[method-assign]
+
+    if stub_submit:
+        submitted: list[VideoJob] = []
+
+        async def _submit(job: VideoJob) -> VideoJob:
+            # Record without waking the real dispatcher (no admission loop,
+            # no subprocess)
+            manager._jobs[job.id] = job
+            submitted.append(job)
+            return job
+
+        manager.submit = _submit  # type: ignore[method-assign]
+        manager.test_submitted = submitted  # type: ignore[attr-defined]
+    return manager
+
+
+def _seed_job(
+    manager: VideoJobManager,
+    job_id: str,
+    created_at: float = 100.0,
+    status: str = "queued",
+    **kwargs,
+) -> VideoJob:
+    job = VideoJob(
+        id=job_id,
+        model_id=VIDEO_MODEL,
+        model_dir="/nonexistent/model-dir",
+        params={
+            "prompt": "a cat",
+            "width": 480,
+            "height": 272,
+            "frames": 49,
+            "steps": 20,
+            "fps": 16,
+            "seed": 7,
+            "seconds": 3.06,
+        },
+        status=status,
+        created_at=created_at,
+        **kwargs,
+    )
+    manager._jobs[job_id] = job
+    return job
+
+
+@pytest.fixture
+def video_env(monkeypatch, tmp_path):
+    """Builder returning (TestClient, manager) with accessors patched."""
+
+    def build(
+        settings: VideoSettings | None = None,
+        stub_submit: bool = True,
+        patch_manager_accessor: bool = True,
+    ):
+        vs = settings or _video_settings()
+        manager = _make_manager(tmp_path, vs, stub_submit=stub_submit)
+
+        entries = {
+            VIDEO_MODEL: SimpleNamespace(
+                model_path=tmp_path / "models" / "wan", model_type="video"
+            ),
+            LLM_MODEL: SimpleNamespace(
+                model_path=tmp_path / "models" / "llama", model_type="llm"
+            ),
+        }
+        pool = SimpleNamespace(get_entry=lambda mid: entries.get(mid))
+
+        if patch_manager_accessor:
+            monkeypatch.setattr(
+                video_routes, "_get_video_manager", lambda: manager
+            )
+        monkeypatch.setattr(video_routes, "_get_engine_pool", lambda: pool)
+        monkeypatch.setattr(video_routes, "_resolve_model", lambda m: m)
+        # create_video reads _server_state.global_settings.video directly
+        monkeypatch.setattr(
+            omlx_server._server_state,
+            "global_settings",
+            SimpleNamespace(video=vs),
+        )
+
+        app = FastAPI()
+        app.include_router(video_routes.router)
+        return TestClient(app), manager
+
+    return build
+
+
+def _post(client: TestClient, **fields):
+    body = {"model": VIDEO_MODEL, "prompt": "a cat"}
+    body.update(fields)
+    return client.post("/v1/videos", json=body)
+
+
+# ---------------------------------------------------------------------------
+# POST /v1/videos -- happy paths
+# ---------------------------------------------------------------------------
+
+
+class TestCreateVideo:
+    def test_post_json_happy_path(self, video_env):
+        client, manager = video_env()
+        r = _post(client, size="480x272", seconds=3)
+        assert r.status_code == 200
+        body = r.json()
+        assert body["id"].startswith("video_")
+        assert body["object"] == "video"
+        assert body["status"] == "queued"
+        assert body["model"] == VIDEO_MODEL
+        assert body["size"] == "480x272"
+        # seconds=3 at default fps=16 -> 48 frames, rounded up to 4n+1 -> 49
+        assert body["frames"] == 49
+        # Derived seconds string: round(49/16, 2) = 3.06
+        assert body["seconds"] == "3.06"
+        assert body["progress"] == 0
+        assert body["error"] is None
+        # Job actually reached the manager
+        assert manager.get(body["id"]) is not None
+        assert len(manager.test_submitted) == 1
+
+    def test_post_multipart_all_string_fields(self, video_env):
+        """openai SDK shape: multipart/form-data, every field a string."""
+        client, manager = video_env()
+        r = client.post(
+            "/v1/videos",
+            data={
+                "model": VIDEO_MODEL,
+                "prompt": "a cat",
+                "seconds": "4",
+                "steps": "10",
+            },
+            # File part forces multipart encoding; non-str form values are
+            # filtered out by the handler
+            files={"input_reference": ("ref.png", b"\x89PNG", "image/png")},
+        )
+        assert r.status_code == 200
+        body = r.json()
+        assert body["status"] == "queued"
+        assert body["steps"] == 10
+        # seconds="4" at fps=16 -> 64 frames, rounded up to 4n+1 -> 65
+        assert body["frames"] == 65
+        assert body["seconds"] == str(round(65 / 16, 2))
+        # Defaults applied when size omitted
+        assert body["size"] == "480x272"
+
+    def test_seed_and_explicit_params_pass_through(self, video_env):
+        client, manager = video_env()
+        r = _post(client, width=480, height=272, frames=49, seed=1234, fps=8)
+        assert r.status_code == 200
+        body = r.json()
+        assert body["seed"] == 1234
+        assert body["fps"] == 8
+        job = manager.get(body["id"])
+        assert job.params["seed"] == 1234
+
+
+# ---------------------------------------------------------------------------
+# POST /v1/videos -- model resolution and request body errors
+# ---------------------------------------------------------------------------
+
+
+class TestCreateVideoModelErrors:
+    def test_unknown_model_404(self, video_env):
+        client, _ = video_env()
+        r = _post(client, model="no-such-model")
+        assert r.status_code == 404
+        assert "not found" in r.json()["detail"]
+
+    def test_non_video_model_400(self, video_env):
+        client, _ = video_env()
+        r = _post(client, model=LLM_MODEL)
+        assert r.status_code == 400
+        detail = r.json()["detail"]
+        assert "not a video generation model" in detail
+        assert "model_type=llm" in detail
+
+    def test_missing_prompt_400(self, video_env):
+        client, _ = video_env()
+        r = client.post("/v1/videos", json={"model": VIDEO_MODEL})
+        assert r.status_code == 400
+
+    def test_malformed_body_400(self, video_env):
+        client, _ = video_env()
+        r = client.post(
+            "/v1/videos",
+            content=b"not json",
+            headers={"content-type": "application/json"},
+        )
+        assert r.status_code == 400
+        assert "Malformed request body" in r.json()["detail"]
+
+
+# ---------------------------------------------------------------------------
+# POST /v1/videos -- normalization
+# ---------------------------------------------------------------------------
+
+
+class TestNormalization:
+    def test_dimensions_round_up_to_multiple_of_16(self, video_env):
+        client, _ = video_env()
+        r = _post(client, width=470, height=270)
+        assert r.status_code == 200
+        assert r.json()["size"] == "480x272"
+
+    def test_frames_from_seconds_times_fps(self, video_env):
+        client, _ = video_env()
+        r = _post(client, seconds=3, fps=16)
+        assert r.status_code == 200
+        assert r.json()["frames"] == 49  # round(3*16)=48 -> 4n+1 -> 49
+
+    def test_explicit_frames_rounded_to_4n_plus_1(self, video_env):
+        client, _ = video_env()
+        r = _post(client, frames=50)
+        assert r.status_code == 200
+        body = r.json()
+        assert body["frames"] == 53  # 4*ceil((50-1)/4)+1 = 4*13+1
+        assert body["seconds"] == str(round(53 / 16, 2))
+
+    def test_invalid_size_string_400(self, video_env):
+        client, _ = video_env()
+        r = _post(client, size="480by272")
+        assert r.status_code == 400
+        assert "Invalid size" in r.json()["detail"]
+
+    def test_nonpositive_seconds_400(self, video_env):
+        client, _ = video_env()
+        r = _post(client, seconds=0)
+        assert r.status_code == 400
+        assert "seconds must be positive" in r.json()["detail"]
+
+
+# ---------------------------------------------------------------------------
+# POST /v1/videos -- static caps (400) and peak predictor (413)
+# ---------------------------------------------------------------------------
+
+
+class TestCapsAndPredictor:
+    def test_steps_over_max_400(self, video_env):
+        client, _ = video_env()  # default max_steps=50
+        r = _post(client, steps=51)
+        assert r.status_code == 400
+        assert "max_steps" in r.json()["detail"]
+
+    def test_pixels_over_max_400(self, video_env):
+        client, _ = video_env()  # default cap 1280*720
+        r = _post(client, width=1280, height=736)
+        assert r.status_code == 400
+        assert "max_pixels_per_frame" in r.json()["detail"]
+
+    def test_frames_over_max_400(self, video_env):
+        client, _ = video_env()  # default max_frames=121
+        r = _post(client, frames=125)
+        assert r.status_code == 400
+        assert "max_frames" in r.json()["detail"]
+
+    def test_peak_predictor_413_when_over_lease(self, video_env):
+        # P0-calibrated formula (frame-count-invariant):
+        #   predicted_gb = 17.5 + 0.0029 * (W/16 * H/16)
+        # 1280x720 -> 80*45 = 3600 spatial tokens -> 17.5 + 10.44 = 27.94GB;
+        # +6GB margin = 33.94GB > 30GB lease -> 413
+        client, _ = video_env(settings=_video_settings(memory_lease_gb=30.0))
+        r = _post(client, width=1280, height=720, frames=81)
+        assert r.status_code == 413
+        detail = r.json()["detail"]
+        assert "memory_lease_gb" in detail
+        assert "Predicted memory peak" in detail
+
+    def test_peak_predictor_small_request_fits_same_lease(self, video_env):
+        # Same 30GB lease: 480x272 -> 30*17 = 510 spatial tokens ->
+        # 17.5 + 1.48 = 18.98GB; +6GB margin = 24.98GB < 30GB -> ok
+        client, _ = video_env(settings=_video_settings(memory_lease_gb=30.0))
+        r = _post(client, width=480, height=272, frames=49)
+        assert r.status_code == 200
+
+    def test_peak_predictor_frame_count_invariant(self, video_env):
+        # Frames do not enter the memory formula (P0: 49f == 101f peaks);
+        # a long video at modest resolution must NOT 413.
+        client, _ = video_env(settings=_video_settings(memory_lease_gb=30.0))
+        r = _post(client, width=480, height=272, frames=121)
+        assert r.status_code == 200
+
+    def test_default_lease_admits_cap_corner(self, video_env):
+        # Out-of-the-box settings must admit the caps corner: 1280x720 needs
+        # 33.94GB including margin, under the 36GB default lease. (The v1
+        # bug: a default lease below the predictor floor 413'd every request.)
+        client, _ = video_env(settings=_video_settings())
+        r = _post(client, width=1280, height=720)
+        assert r.status_code == 200
+
+
+# ---------------------------------------------------------------------------
+# POST /v1/videos -- 503 gates
+# ---------------------------------------------------------------------------
+
+
+class TestServiceGates:
+    def test_queue_full_503(self, video_env):
+        # Real submit with max_queued_jobs=0 raises QueueFullError before
+        # the dispatcher would start
+        client, _ = video_env(
+            settings=_video_settings(max_queued_jobs=0), stub_submit=False
+        )
+        r = _post(client)
+        assert r.status_code == 503
+        assert "queue is full" in r.json()["detail"].lower()
+
+    def test_queue_full_error_importable_and_raised_by_submit(
+        self, video_env
+    ):
+        _, manager = video_env(
+            settings=_video_settings(max_queued_jobs=0), stub_submit=False
+        )
+        job = VideoJob(id="video_x", model_id="m", model_dir="d", params={})
+        import asyncio
+
+        with pytest.raises(QueueFullError):
+            asyncio.run(manager.submit(job))
+
+    def test_guard_unavailable_503(self, video_env):
+        client, manager = video_env()
+        manager.guard_available = lambda: (False, "guard is not running")
+        r = _post(client)
+        assert r.status_code == 503
+        assert r.json()["detail"] == "guard is not running"
+
+    def test_venv_probe_failure_503(self, video_env):
+        client, manager = video_env()
+
+        async def _probe(force: bool = False):
+            return False, "Video worker python not found at /x"
+
+        manager.probe_worker_venv = _probe
+        r = _post(client)
+        assert r.status_code == 503
+        assert "worker python not found" in r.json()["detail"]
+
+    def test_video_disabled_503(self, video_env, monkeypatch):
+        # Do NOT patch _get_video_manager: the real accessor must gate on
+        # settings.video.enabled via _server_state.global_settings
+        client, _ = video_env(
+            settings=_video_settings(enabled=False),
+            patch_manager_accessor=False,
+        )
+        monkeypatch.setattr(
+            omlx_server._server_state, "video_job_manager", None
+        )
+        r = _post(client)
+        assert r.status_code == 503
+        assert "disabled" in r.json()["detail"]
+        # Every endpoint shares the gate
+        assert client.get("/v1/videos").status_code == 503
+        assert client.get("/v1/videos/video_x").status_code == 503
+        assert client.delete("/v1/videos/video_x").status_code == 503
+
+    def test_manager_missing_503(self, video_env, monkeypatch):
+        # Enabled but lifespan never built the manager -> 503
+        client, _ = video_env(patch_manager_accessor=False)
+        monkeypatch.setattr(
+            omlx_server._server_state, "video_job_manager", None
+        )
+        r = _post(client)
+        assert r.status_code == 503
+        assert "not initialized" in r.json()["detail"]
+
+
+# ---------------------------------------------------------------------------
+# GET /v1/videos/{id}
+# ---------------------------------------------------------------------------
+
+
+class TestGetVideo:
+    def test_get_unknown_404(self, video_env):
+        client, _ = video_env()
+        r = client.get("/v1/videos/video_doesnotexist")
+        assert r.status_code == 404
+
+    def test_get_known_returns_wire_shape(self, video_env):
+        client, manager = video_env()
+        job = _seed_job(manager, "video_aaa", status="in_progress")
+        job.progress = 42
+        job.phase = "denoising"
+        r = client.get("/v1/videos/video_aaa")
+        assert r.status_code == 200
+        assert r.json() == job.to_dict()
+        body = r.json()
+        assert body["object"] == "video"
+        assert body["status"] == "in_progress"
+        assert body["progress"] == 42
+        assert body["phase"] == "denoising"
+        assert body["size"] == "480x272"
+
+
+# ---------------------------------------------------------------------------
+# GET /v1/videos/{id}/content
+# ---------------------------------------------------------------------------
+
+
+class TestGetContent:
+    def test_content_not_completed_409(self, video_env):
+        client, manager = video_env()
+        _seed_job(manager, "video_q", status="queued")
+        r = client.get("/v1/videos/video_q/content")
+        assert r.status_code == 409
+        assert "queued" in r.json()["detail"]
+
+    def test_content_unknown_404(self, video_env):
+        client, _ = video_env()
+        assert client.get("/v1/videos/video_nope/content").status_code == 404
+
+    def test_content_artifact_expired_404_detail_dict(self, video_env):
+        client, manager = video_env()
+        job = _seed_job(manager, "video_purged", status="completed")
+        job.artifact_path = None
+        job.expires_at = 1750000000.5
+        r = client.get("/v1/videos/video_purged/content")
+        assert r.status_code == 404
+        detail = r.json()["detail"]
+        assert isinstance(detail, dict)
+        assert detail["code"] == "artifact_expired"
+        # Fractional expires_at is reported as whole seconds
+        assert detail["expires_at"] == 1750000000
+        assert "purged" in detail["message"]
+
+    def test_content_completed_serves_mp4(self, video_env, tmp_path):
+        client, manager = video_env()
+        payload = b"\x00\x00\x00\x18ftypmp42" + b"\x00" * 64
+        mp4 = tmp_path / "out.mp4"
+        mp4.write_bytes(payload)
+        job = _seed_job(manager, "video_done", status="completed")
+        job.artifact_path = str(mp4)
+        r = client.get("/v1/videos/video_done/content")
+        assert r.status_code == 200
+        assert r.headers["content-type"].startswith("video/mp4")
+        assert r.content == payload
+        assert "video_done.mp4" in r.headers.get("content-disposition", "")
+
+
+# ---------------------------------------------------------------------------
+# DELETE /v1/videos/{id}
+# ---------------------------------------------------------------------------
+
+
+class TestDeleteVideo:
+    def test_delete_known(self, video_env):
+        client, manager = video_env()
+        _seed_job(manager, "video_del")
+        r = client.delete("/v1/videos/video_del")
+        assert r.status_code == 200
+        assert r.json() == {
+            "id": "video_del",
+            "object": "video.deleted",
+            "deleted": True,
+        }
+        # Record is gone afterwards
+        assert manager.get("video_del") is None
+        assert client.get("/v1/videos/video_del").status_code == 404
+
+    def test_delete_unknown_404(self, video_env):
+        client, _ = video_env()
+        assert client.delete("/v1/videos/video_nope").status_code == 404
+
+
+# ---------------------------------------------------------------------------
+# GET /v1/videos -- list envelope + pagination (real list_jobs semantics)
+# ---------------------------------------------------------------------------
+
+
+@pytest.fixture
+def listing_env(video_env):
+    client, manager = video_env()
+    _seed_job(manager, "video_a", created_at=100.0)
+    _seed_job(manager, "video_b", created_at=200.0)
+    _seed_job(manager, "video_c", created_at=300.0)
+    return client, manager
+
+
+class TestListVideos:
+    def test_envelope_default_desc(self, listing_env):
+        client, _ = listing_env
+        r = client.get("/v1/videos")
+        assert r.status_code == 200
+        body = r.json()
+        assert body["object"] == "list"
+        assert [j["id"] for j in body["data"]] == [
+            "video_c", "video_b", "video_a",
+        ]
+        assert body["has_more"] is False
+        assert body["first_id"] == "video_c"
+        assert body["last_id"] == "video_a"
+
+    def test_limit_and_has_more(self, listing_env):
+        client, _ = listing_env
+        r = client.get("/v1/videos", params={"limit": 2})
+        body = r.json()
+        assert [j["id"] for j in body["data"]] == ["video_c", "video_b"]
+        assert body["has_more"] is True
+        assert body["first_id"] == "video_c"
+        assert body["last_id"] == "video_b"
+
+    def test_after_cursor(self, listing_env):
+        client, _ = listing_env
+        r = client.get("/v1/videos", params={"after": "video_c"})
+        body = r.json()
+        assert [j["id"] for j in body["data"]] == ["video_b", "video_a"]
+        assert body["has_more"] is False
+
+    def test_after_cursor_with_limit(self, listing_env):
+        client, _ = listing_env
+        r = client.get("/v1/videos", params={"after": "video_c", "limit": 1})
+        body = r.json()
+        assert [j["id"] for j in body["data"]] == ["video_b"]
+        assert body["has_more"] is True
+
+    def test_order_asc(self, listing_env):
+        client, _ = listing_env
+        r = client.get("/v1/videos", params={"order": "asc"})
+        body = r.json()
+        assert [j["id"] for j in body["data"]] == [
+            "video_a", "video_b", "video_c",
+        ]
+
+    def test_bad_order_400(self, listing_env):
+        client, _ = listing_env
+        assert client.get(
+            "/v1/videos", params={"order": "sideways"}
+        ).status_code == 400
+
+    def test_limit_clamped_to_minimum_1(self, listing_env):
+        client, _ = listing_env
+        r = client.get("/v1/videos", params={"limit": 0})
+        body = r.json()
+        assert len(body["data"]) == 1
+        assert body["has_more"] is True
+
+    def test_unknown_after_cursor_ignored(self, listing_env):
+        # Manager semantics: unknown cursor falls through to the full list
+        client, _ = listing_env
+        r = client.get("/v1/videos", params={"after": "video_ghost"})
+        body = r.json()
+        assert len(body["data"]) == 3
+
+    def test_empty_list_envelope(self, video_env):
+        client, _ = video_env()
+        r = client.get("/v1/videos")
+        body = r.json()
+        assert body == {
+            "object": "list",
+            "data": [],
+            "has_more": False,
+            "first_id": None,
+            "last_id": None,
+        }