p-doom/inverse-dynamics-model
Inverse Dynamics Model

Train an inverse dynamics model that predicts per-frame actions from screen-recording sequences.

Install

uv sync
# optional test deps
uv sync --extra dev
uv run pre-commit install

Data

For screen-recording data, we use crowd-cast. The data is expected in the following form:

crowd_cast_root/
  .../<user_id>/recordings/recording_<session-id>_seg####.mp4
  .../<user_id>/keylogs/input_<session-id>_seg####.msgpack
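Each recording segment has a matching keylog segment, identified by session id and segment index in the filename. As an illustrative sketch (not the repo's actual loader), pairing can be done with two regexes over the filenames:

```python
import re
from pathlib import Path

# Matches recording_<session-id>_seg####.mp4 and input_<session-id>_seg####.msgpack
VIDEO_RE = re.compile(r"recording_(?P<session>.+)_seg(?P<seg>\d{4})\.mp4$")
KEYLOG_RE = re.compile(r"input_(?P<session>.+)_seg(?P<seg>\d{4})\.msgpack$")

def pair_segments(recordings, keylogs):
    """Pair video segments with keylog segments by (session id, segment index)."""
    logs = {}
    for p in keylogs:
        m = KEYLOG_RE.search(Path(p).name)
        if m:
            logs[(m["session"], m["seg"])] = p
    pairs = []
    for p in recordings:
        m = VIDEO_RE.search(Path(p).name)
        if m and (m["session"], m["seg"]) in logs:
            pairs.append((p, logs[(m["session"], m["seg"])]))
    return pairs
```

Segments without a matching keylog are simply dropped in this sketch; the preprocessing script may handle them differently.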

Preprocess the crowd-cast data into ArrayRecord format for IDM training:

uv run python -m data.video_to_array_records \
  --input-path /path/to/crowd_cast_root \
  --output-path /path/to/idm_data \
  --target-width 160 \
  --target-height 90 \
  --target-fps 10 \
  --top-bar-fraction 0.15 \
  --black-ratio 0.95 \
  --chunk-size 160 \
  --chunks-per-file 100 \
  --num-workers 16
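One way to picture the --target-fps and --chunk-size flags: source frames are resampled down to 10 fps, then grouped into fixed-size chunks of 160 frames (16 seconds of video at 10 fps). A rough sketch of that arithmetic, assuming a uniform-stride resampler and whole chunks only (the script's actual implementation may differ):

```python
def resample_indices(n_frames, src_fps, target_fps):
    """Pick source-frame indices approximating a uniform target_fps stream."""
    step = src_fps / target_fps
    out, t = [], 0.0
    while round(t) < n_frames:
        out.append(round(t))
        t += step
    return out

def chunk(indices, chunk_size):
    """Split into fixed-size chunks; a trailing partial chunk is dropped."""
    return [indices[i:i + chunk_size]
            for i in range(0, len(indices) - chunk_size + 1, chunk_size)]
```

For a 30 fps source, this keeps every third frame; with --chunks-per-file 100, each output file then holds 100 such 160-frame chunks.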

The generated data directory looks like:

idm_data/
  metadata.json
  train/*.array_record
  val/*.array_record
  test/*.array_record
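The metadata.json schema is not documented here, so the sketch below only counts shards per split and loads whatever metadata is present (illustrative, not the repo's data loader):

```python
import json
from pathlib import Path

def summarize(data_root):
    """Count .array_record shards per split and load metadata.json if present."""
    root = Path(data_root)
    counts = {split: len(list((root / split).glob("*.array_record")))
              for split in ("train", "val", "test")}
    meta_path = root / "metadata.json"
    meta = json.loads(meta_path.read_text()) if meta_path.exists() else {}
    return counts, meta
```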

Actions use a Lumine-inspired per-frame format; each frame's action is one of:

  • NO_OP
  • MOUSE:dx,dy,dz
  • MOUSE:dx,dy,dz ; <pressed_keys>

dx,dy,dz are per-frame relative mouse deltas (quantized/clipped during preprocessing), and pressed keys are appended as a sorted space-separated list when present.
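The action strings above are easy to render and parse. This is an illustrative sketch of the described format, assuming the ` ; ` separator shown in the listing and that a zero-delta, no-key frame collapses to NO_OP (neither is confirmed by the repo):

```python
def format_action(dx=0, dy=0, dz=0, keys=()):
    """Render one per-frame action string (assumed NO_OP collapse for zero input)."""
    if dx == dy == dz == 0 and not keys:
        return "NO_OP"
    s = f"MOUSE:{dx},{dy},{dz}"
    if keys:
        s += " ; " + " ".join(sorted(keys))  # sorted, space-separated key list
    return s

def parse_action(s):
    """Inverse of format_action: returns (dx, dy, dz, keys)."""
    if s == "NO_OP":
        return 0, 0, 0, []
    mouse, _, keys = s.partition(" ; ")
    dx, dy, dz = (int(v) for v in mouse.removeprefix("MOUSE:").split(","))
    return dx, dy, dz, keys.split() if keys else []
```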

Train

Single GPU (baseline):

torchrun --nproc_per_node=1 -m idm.train \
  --model-id Qwen/Qwen3-VL-2B-Instruct \
  --data-root /path/to/idm_data \
  --image-h 90 --image-w 160 --image-c 3 \
  --seq-len 32 \
  --global-batch-size 8 \
  --grad-accum 1 \
  --max-steps 3000 \
  --lr 2e-5 \
  --lr-schedule wsd \
  --warmup-steps 200 \
  --wsd-decay-steps 600 \
  --precision bf16 \
  --use-lora True \
  --wandb-enable True \
  --wandb-project idm \
  --wandb-run-name idm_qwen2b_baseline \
  --out-dir ./runs/idm_qwen2b

If you are not using wandb, set --wandb-enable False.
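The --lr-schedule wsd flag names a warmup-stable-decay schedule: linear warmup over --warmup-steps, a constant plateau, then decay over the final --wsd-decay-steps. A sketch of one common variant with linear decay to zero (the trainer's exact shape may differ):

```python
def wsd_lr(step, max_steps=3000, warmup=200, decay=600, peak=2e-5):
    """Warmup-stable-decay: linear warmup, flat plateau, linear decay to 0."""
    if step < warmup:
        return peak * (step + 1) / warmup
    if step < max_steps - decay:
        return peak
    # linear decay over the final `decay` steps
    return peak * (max_steps - step) / decay
```

With the defaults above, the learning rate ramps up over steps 0-199, holds at 2e-5 until step 2400, then decays to zero by step 3000.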

Multi-GPU (example: 8 GPUs):

torchrun --nproc_per_node=8 -m idm.train \
  --data-root /path/to/idm_data \
  --global-batch-size 64 \
  --out-dir ./runs/idm_8gpu \
  --wandb-enable False
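With a fixed --global-batch-size, the per-GPU micro-batch follows from the world size and --grad-accum. A sketch of that relationship, assuming the trainer requires an even split (the actual code may handle remainders differently):

```python
def micro_batch_size(global_batch, world_size, grad_accum=1):
    """Per-GPU, per-step batch so that global = micro * world_size * grad_accum."""
    micro, rem = divmod(global_batch, world_size * grad_accum)
    if rem:
        raise ValueError("global batch must divide by world_size * grad_accum")
    return micro
```

So the 8-GPU example above (--global-batch-size 64) runs a micro-batch of 8 per GPU per step.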

Resume:

torchrun --nproc_per_node=8 -m idm.train --data-root /path/to/idm_data --resume-from latest

Checkpoints are written under out_dir/checkpoints/.
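A plausible reading of --resume-from latest is picking the highest-numbered checkpoint under out_dir/checkpoints/. The sketch below assumes step-suffixed directory names (a hypothetical convention, not confirmed by the repo):

```python
import re
from pathlib import Path

def latest_checkpoint(ckpt_dir):
    """Return the checkpoint path with the highest trailing step number, or None.

    Assumes hypothetical names like step_000100.
    """
    best, best_step = None, -1
    for p in Path(ckpt_dir).iterdir():
        m = re.search(r"(\d+)$", p.name)
        if m and int(m[1]) > best_step:
            best, best_step = p, int(m[1])
    return best
```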

Test

uv run pytest tests/
