JPEG-LM autoregressively generates image file bytes like language. Data preprocessing and inference of JPEG-LM is very simple and can be done with run.py
.
Example command: python run.py --query_vllm_server "local" --prefix_ratio 0.375 --temp 1.0 --topp 0.9 --topk 50 --test_image_path 'example_image_input/*.png' --repeat_generation 10 --seed 42 --output_dir "out"
.
Note that we use pillow==10.2.0
(lower versions won't work). torch (2.1.2)
, transformers (4.38.2)
, and vllm (0.3.3)
should be installed as well. Different sampling hyperparameters can also be tried further (e.g., removing top-p for landscape images).