JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence
- 2025-10-07: Initial release of our technical report, code, data samples, and 🌐 Project Page. Check it out! 🚀
This release represents the public implementation; the full implementation and data will be made available after internal company policy requirements are met.
JanusCoder is a suite of open models that establish a unified visual–programmatic interface for multimodal code intelligence. The models (JanusCoder and JanusCoderV) handle text-centric and vision-centric tasks in a single framework—from chart-to-code and web UI generation/editing to dynamic theorem visualizations—and show strong results across public benchmarks, approaching or even surpassing proprietary systems.
Note
Due to company policy, we need some additional time to release all datasets and checkpoints. If you require access to more data, please feel free to send [email protected] an email.
We provide a versatile data synthesis toolkit that generates multimodal code data across heterogeneous domains, ranging from charts and web UIs to visual artifacts and code-driven animations, while greatly reducing the engineering effort required for large-scale corpus creation.
Since the process of building JanusCode data involves a variety of synthesis pipelines, we provide a few examples here:
Extend, refine and derive new text-centric data for chart tasks
python data_synthesis/viscode_extend_synthesis_pipeline.py \
--input raw_data/viscode \
--output processed/viscode_extended
Extend and derive new text-centric data for visual editing tasks
python data_synthesis/viscode_edit_synthesis_pipeline.py \
--input processed/viscode_extended \
--output processed/viscode_edited
Build data for generating dynamic animations with Manim
python data_synthesis/recontext_manim_data.py \
--input raw_data/manim \
--output processed/manim_recontext
Extend scientific visualizations with Mathematica
python data_synthesis/mathematica_extend_synthesis_pipeline.py \
--input raw_data/mathematica \
--output processed/mathematica_extended

More scripts will be released soon.
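For convenience, the example pipelines above can also be composed into a single driver script. The sketch below simply runs them in order, reusing the example paths shown above (note that the visual-editing step consumes the output of the chart-extension step); adapt the input and output locations to your own data layout.

#!/usr/bin/env bash
# Minimal driver that chains the example synthesis pipelines above.
# Paths mirror the examples in this README; adjust them as needed.
set -euo pipefail
mkdir -p processed

# Chart tasks: extend, refine, and derive new text-centric data
python data_synthesis/viscode_extend_synthesis_pipeline.py \
    --input raw_data/viscode \
    --output processed/viscode_extended

# Visual editing tasks: builds on the extended chart data above
python data_synthesis/viscode_edit_synthesis_pipeline.py \
    --input processed/viscode_extended \
    --output processed/viscode_edited

# Code-driven animations with Manim
python data_synthesis/recontext_manim_data.py \
    --input raw_data/manim \
    --output processed/manim_recontext

# Scientific visualizations with Mathematica
python data_synthesis/mathematica_extend_synthesis_pipeline.py \
    --input raw_data/mathematica \
    --output processed/mathematica_extended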
Data Samples:
- We provide text-centric data samples at this link.
- We provide vision-centric data samples at this link.
We primarily follow the official training pipelines provided by the respective upstream base models. Users can refer directly to the repositories linked below for detailed SFT instructions.
| Our Model | Upstream Base | Training Pipelines |
|---|---|---|
| JanusCoder-8B | Qwen/Qwen3-8B | Qwen3 GitHub |
| JanusCoder-14B | Qwen/Qwen3-14B | Qwen3 GitHub |
| JanusCoderV-7B | Qwen/Qwen2.5-VL-7B-Instruct | Qwen-VL GitHub |
| JanusCoderV-8B | OpenGVLab/InternVL3_5-8B | InternVL GitHub |
We also provide typical training configuration files for LLaMA-Factory users in training_files.
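As a rough guide for LLaMA-Factory users, an SFT run can be launched with the standard llamafactory-cli entry point; the YAML filename below is a placeholder for whichever configuration you pick from training_files, not a file we ship under that exact name.

# Launch supervised fine-tuning with LLaMA-Factory
# (replace the placeholder YAML with an actual config from training_files/)
llamafactory-cli train training_files/<sft_config>.yaml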
All our experiments were conducted on interconnected 8× H800 GPUs.
We provide several ready-to-use scripts to quickly reproduce our experimental results. You can replace them with other scripts under the evaluation directory to evaluate different tasks, for example:
bash DesignBench/scripts/designbench_vllm-januscoderv.sh
bash evaluation/ArtifactBench/artifactbench-januscoder.sh
bash evaluation/InteractScience/interactscience-januscoderv.sh

For evaluations on LiveCodeBench-v6 and MBPP+, we directly adopt the evaluation scripts provided by OpenCompass.
Evaluations on ArtifactBench and InteractScience can also be run by following the instructions in their original repositories; you can fork or clone those repos to reuse their official environments and configs.
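The vLLM-based scripts above handle model inference internally. If you prefer to host a checkpoint yourself, for example to inspect generations before launching a full benchmark run, a minimal sketch using vLLM's OpenAI-compatible server is shown below; the checkpoint path and port are placeholders rather than values taken from our scripts.

# Serve a JanusCoderV checkpoint through vLLM's OpenAI-compatible API
# (the checkpoint path and port are placeholders; adjust to your setup)
vllm serve /path/to/JanusCoderV-7B --port 8000

# Send a quick test request to the served endpoint
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "/path/to/JanusCoderV-7B", "messages": [{"role": "user", "content": "Write matplotlib code for a simple bar chart."}]}'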
This project is licensed under the Apache 2.0 License. See the LICENSE file for details.
If you are interested in our work or find this repository or our data helpful, please consider citing our paper:
@article{sun2025januscoder,
title={JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence},
author={Sun, Qiushi and Gong, Jingyang and Liu, Yang and Chen, Qiaosheng and Li, Lei and Chen, Kai and Guo, Qipeng and Kao, Ben and Yuan, Fei},
journal={Preprint},
year={2025}
}
