Releases: xorbitsai/inference
v1.3.1.post1
What's new in 1.3.1.post1 (2025-03-11)
These are the changes in inference v1.3.1.post1.
Bug fixes
- BUG: Fix reasoning content parser for qwq-32b by @amumu96 in #3024
- BUG: Failed to download model 'QwQ-32B' (size: 32, format: ggufv2) after multiple retries by @Jun-Howie in #3031
Full Changelog: v1.3.1...v1.3.1.post1
v1.3.1
What's new in 1.3.1 (2025-03-09)
These are the changes in inference v1.3.1.
New features
- FEAT: Support qwen2.5-instruct-1m by @Jun-Howie in #2928
- FEAT: Support moonlight-16b-a3b by @Jun-Howie in #2963
- FEAT: create_embedding add field model_replica by @zhoudelong in #2779
- FEAT: [UI] add the reasoning_content parameter. by @yiboyasss in #2980
- FEAT: Support QwQ-32B by @cyhasuka in #3005
- FEAT: all engine support reasoning_content by @amumu96 in #3013
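With every engine now returning reasoning_content (#3013), the separated reasoning trace can be inspected through the OpenAI-compatible chat endpoint. The following is a minimal sketch, not an official example: the server address, the model UID "qwq-32b", and the presence of a reasoning_content field in the response are assumptions for illustration.

```python
import requests

# Hedged sketch: call a locally running Xinference server through its
# OpenAI-compatible chat endpoint and read the separated reasoning trace.
# The host/port and the model UID "qwq-32b" are illustrative assumptions.
resp = requests.post(
    "http://localhost:9997/v1/chat/completions",
    json={
        "model": "qwq-32b",
        "messages": [{"role": "user", "content": "How many primes are below 20?"}],
        "max_tokens": 512,
    },
    timeout=600,
)
resp.raise_for_status()
message = resp.json()["choices"][0]["message"]

# With reasoning parsing enabled, the chain of thought is expected in
# `reasoning_content`, separate from the final answer in `content`.
print(message.get("reasoning_content"))
print(message["content"])
```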
Enhancements
- ENH: InternVL2.5-MPO by @Minamiyama in #2913
- ENH: [UI] add copy button by @Minamiyama in #2920
- ENH: [UI] add model ability filtering feature to the audio model. by @yiboyasss in #2986
- ENH: Support xllamacpp by @codingl2k1 in #2997
- BLD: Install ffmpeg 6 for audio & video models by @phuchoang2603 in #2946
- BLD: fix ffprobe library not imported by @phuchoang2603 in #2971
- BLD: fix docker requirements for sglang by @qinxuye in #3015
- REF: [UI] move featureModels to data.js by @yiboyasss in #3008
Bug fixes
- BUG: fix bug where qwen2.5-vl-7b cannot chat by @amumu96 in #2944
- BUG: Fix modelscope model id for Qwen2.5-VL; add support for AWQ quantization format in Qwen2.5-VL by @Jun-Howie in #2943
- BUG: fix error when using Langchain-chatchat because the passed max_tokens parameter is None by @William533036 in #2962
- BUG: fix attribute error in jina-clip-v2 when only text or image is passed in by @Minamiyama in #2974
- BUG: fix compatibility of mlx-lm v0.21.5 by @qinxuye in #2993
- BUG: Fix tokenizer error in create_embedding by @shuaiqidezhong in #2992
- BUG: fix wrong kwargs passed to the encode method when using jina-clip-v2 by @Minamiyama in #2991
- BUG: [UI] fix the white screen bug. by @yiboyasss in #3014
New Contributors
- @phuchoang2603 made their first contribution in #2946
- @William533036 made their first contribution in #2962
- @zhoudelong made their first contribution in #2779
Full Changelog: v1.3.0.post2...v1.3.1
v1.3.0.post2
What's new in 1.3.0.post2 (2025-02-22)
These are the changes in inference v1.3.0.post2.
Full Changelog: v1.3.0.post1...v1.3.0.post2
v1.3.0.post1
What's new in 1.3.0.post1 (2025-02-21)
These are the changes in inference v1.3.0.post1.
New features
- FEAT: Support qwen-2.5-instruct-1m by @Jun-Howie in #2841
- FEAT: support deepseek-v3 and deepseek-r1 by @qinxuye in #2864
- FEAT: [UI] additional parameter tip function. by @yiboyasss in #2876
- FEAT: [UI] add featured models filtering function. by @yiboyasss in #2871
- FEAT: [UI] support form parameters and command line conversion. by @yiboyasss in #2850
- FEAT: support distributed inference for sglang by @qinxuye in #2877
- FEAT: [UI] add n_worker parameter for model launch. by @yiboyasss in #2889
- FEAT: InternVL 2.5 by @Minamiyama in #2776
- FEAT: support vllm reasoning content by @amumu96 in #2905
Enhancements
- ENH: add GPU utilization info by @amumu96 in #2852
- ENH: Update Kokoro model by @codingl2k1 in #2843
- ENH: cmdline supports --n-worker, add --model-path and make it compatible with --model_path by @qinxuye in #2890
- BLD: update sglang to v0.4.2.post4 and vllm to v0.7.2 by @qinxuye in #2838
- BLD: fix flashinfer installation in dockerfile by @qinxuye in #2844
Bug fixes
- BUG: Fix whisper CI by @codingl2k1 in #2822
- BUG: fix FLUX when an incompatible scheduler is specified by @shuaiqidezhong in #2897
- BUG: [UI] fix the bug of missing hint during model running. by @yiboyasss in #2904
- BUG: Clear dependency by @codingl2k1 in #2910
Tests
- TST: Pin CI transformers<4.49 by @codingl2k1 in #2883
- TST: fix lint error by @amumu96 in #2911
Others
- CHORE: Xavier now supports vLLM >= 0.7.0, drops support for older versions by @ChengjieLi28 in #2886
New Contributors
- @shuaiqidezhong made their first contribution in #2897
Full Changelog: v1.2.2...v1.3.0.post1
v1.2.2
What's new in 1.2.2 (2025-02-08)
These are the changes in inference v1.2.2.
New features
- FEAT: support qwen2.5-vl-instruct by @qinxuye in #2788
- FEAT: Support internlm3 by @Jun-Howie in #2789
- FEAT: support deepseek-r1-distill-llama by @qinxuye in #2811
- FEAT: Support Kokoro-82M by @codingl2k1 in #2790
- FEAT: vllm support for qwen2.5-vl-instruct by @qinxuye in #2821
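With vLLM support for qwen2.5-vl-instruct, a vision request can go through the same OpenAI-compatible route as text models. The sketch below is illustrative only: the base URL, the dummy API key, the model UID "qwen2.5-vl-instruct", and the image URL are assumptions, not confirmed values.

```python
from openai import OpenAI

# Hedged sketch: send an image to a vision-language model served by
# Xinference through the OpenAI-compatible API. Base URL, API key, and
# the model UID "qwen2.5-vl-instruct" are illustrative assumptions.
client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-used")

response = client.chat.completions.create(
    model="qwen2.5-vl-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/cat.png"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```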
Bug fixes
- BUG: fix llama-cpp when some quantizations have multiple parts by @qinxuye in #2786
- BUG: Use Cache class instead of raw tuple for transformers continuous batching, compatible with latest transformers by @ChengjieLi28 in #2820
Documentation
- DOC: Update multimodal doc by @codingl2k1 in #2785
- DOC: update model docs by @qinxuye in #2792
- DOC: fix docs by @qinxuye in #2793
- DOC: Fix a couple of typos by @Paleski in #2817
Full Changelog: v1.2.1...v1.2.2
v1.2.1
What's new in 1.2.1 (2025-01-24)
These are the changes in inference v1.2.1.
New features
- FEAT: Support MeloTTS by @codingl2k1 in #2760
- FEAT: support deepseek-r1-distill-qwen by @qinxuye in #2781
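The distilled reasoning models added here are launched like any other built-in LLM. A minimal sketch with the Xinference Python client follows; the endpoint, model size, engine choice, and the messages-style chat interface of recent releases are assumptions to check against the current model registry, not a definitive recipe.

```python
from xinference.client import Client

# Hedged sketch: launch a newly added distilled reasoning model and run one
# chat turn. Endpoint, size, format, and engine are illustrative assumptions.
client = Client("http://localhost:9997")

model_uid = client.launch_model(
    model_name="deepseek-r1-distill-qwen",
    model_engine="transformers",
    model_size_in_billions=7,
    model_format="pytorch",
)
model = client.get_model(model_uid)

# Assumes the messages-based chat interface used by recent client versions.
result = model.chat(messages=[{"role": "user", "content": "Hello!"}])
print(result["choices"][0]["message"]["content"])
```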
Enhancements
- ENH: add model config for Whisper by @fonsc in #2755
- ENH: support cline style messages for all backend engines by @liunux4odoo in #2763
- ENH: CosyVoice2 support SFT speakers by @codingl2k1 in #2770
- ENH: Some improvements for Xavier by @ChengjieLi28 in #2777
Bug fixes
- BUG: Compat with openai extra body by @codingl2k1 in #2759
Documentation
- DOC: update new models in README and doc by @qinxuye in #2761
- DOC: using discord instead of slack & updating model to qwen2.5 in getting started doc by @qinxuye in #2775
Others
- FIX: [UI] normalize language input to ensure consistent array format. by @yiboyasss in #2771
Full Changelog: v1.2.0...v1.2.1
v1.2.0
What's new in 1.2.0 (2025-01-10)
These are the changes in inference v1.2.0.
New features
- FEAT: support HunyuanVideo by @qinxuye in #2721
- FEAT: support hunyuan-dit text2image by @qinxuye in #2727
- FEAT: support cline for vllm engine by @hwzhuhao in #2734
- FEAT: [UI] theme switch by @Minamiyama in #1335
- FEAT: support qwen2vl run on ascend npu by @Xu-pixel in #2741
- FEAT: [UI] Add language toggle for i18n support. by @yiboyasss in #2744
- FEAT: Support cogagent-9b by @amumu96 in #2740
- FEAT: Xavier: Share KV cache between VLLM replicas by @ChengjieLi28 in #2732
- FEAT: [UI] Add gguf_quantization, gguf_model_path, and cpu_offload for image models. by @yiboyasss in #2753
- FEAT: Support Marco-o1 by @Jun-Howie in #2749
Enhancements
- ENH: [UI] Update Button Style and Interaction Logic for Editing Cache in Model Card. by @yiboyasss in #2746
- ENH: Improve error message by @codingl2k1 in #2738
Bug fixes
- BUG: adapt mlx-vlm v0.1.7 by @qinxuye in #2724
- BUG: pin mlx<0.22.0 to prevent qwen2_vl failing in mlx-vlm by @qinxuye in #2752
Others
- FIX: [UI] Resolve bug preventing '/' input in model_path. by @yiboyasss in #2747
- FIX: [UI] Fix dark mode background bug. by @yiboyasss in #2748
- CHORE: Update new models in readme by @codingl2k1 in #2713
Full Changelog: v1.1.1...v1.2.0
v1.1.1
What's new in 1.1.1 (2024-12-27)
These are the changes in inference v1.1.1.
New features
- FEAT: support F5-TTS-MLX by @qinxuye in #2671
- FEAT: Support qwen2.5-coder-instruct model for tool calls by @Timmy-web in #2681
- FEAT: Support minicpm-4B on vllm by @Jun-Howie in #2697
- FEAT: support scheduling-policy for vllm by @hwzhuhao in #2700
- FEAT: Support QvQ-72B-Preview by @Jun-Howie in #2712
- FEAT: support SD3.5 series model by @qinxuye in #2706
Enhancements
- ENH: Guided Decoding OpenAIClient compatibility by @wxiwnd in #2673
- ENH: resample f5-tts-mlx reference audio when the sample rate does not match by @qinxuye in #2678
- ENH: support no images for MLX vlm by @qinxuye in #2670
- ENH: Update fish speech 1.5 by @codingl2k1 in #2672
- ENH: Update cosyvoice 2 by @codingl2k1 in #2684
- REF: Reduce code redundancy by setting default values by @pengjunfeng11 in #2711
Bug fixes
- BUG: Fix f5tts audio ref by @codingl2k1 in #2680
- BUG: glm4-chat cannot apply for continuous batching with transformers backend by @ChengjieLi28 in #2695
New Contributors
- @Timmy-web made their first contribution in #2681
Full Changelog: v1.1.0...v1.1.1
v1.1.0
What's new in 1.1.0 (2024-12-13)
These are the changes in inference v1.1.0.
New features
- FEAT: Support F5 TTS by @codingl2k1 in #2626
- FEAT: [UI] Add a hint for model running. by @yiboyasss in #2657
- FEAT: support VL models for MLX by @qinxuye in #2638
- FEAT: Add support for CLIP model by @Second222None in #2637
- FEAT: support llama-3.3-instruct by @qinxuye in #2661
Enhancements
- ENH: Optimize error message when user parameters are passed incorrectly by @namecd in #2623
- ENH: bypass the sampling parameter skip_special_tokens to vLLM backend by @zjuyzj in #2655
- ENH: unify prompt_text as cosyvoice for fish speech by @qinxuye in #2658
- ENH: Update glm4 chat model to new weights by @codingl2k1 in #2660
- ENH: upgrade sglang in Docker by @amumu96 in #2668
Bug fixes
- BUG: Cleanup Isolation tasks by @codingl2k1 in #2603
- BUG: fix qwq gguf download hub for modelscope by @redreamality in #2647
- BUG: fix ImportError when optional dependency FlagEmbedding is not installed by @zjuyzj in #2649
- BUG: use stream_generate in MLX by @qinxuye in #2635
- BUG: stop parameter leads to failure with transformers backend by @ChengjieLi28 in #2663
- BUG: fix FishSpeech Negative code found by @themanforfree in #2667
Documentation
- DOC: update new models by @qinxuye in #2632
- DOC: add doc about offline usage for SenseVoiceSmall by @qinxuye in #2654
Others
- FIX: fix launching bge-m3 with hybrid mode by @pengjunfeng11 in #2641
New Contributors
- @namecd made their first contribution in #2623
- @redreamality made their first contribution in #2647
- @Second222None made their first contribution in #2637
- @themanforfree made their first contribution in #2667
Full Changelog: v1.0.1...v1.1.0
v1.0.1
What's new in 1.0.1 (2024-11-29)
These are the changes in inference v1.0.1.
New features
- FEAT: Fish speech stream by @codingl2k1 in #2562
- FEAT: support sparse vector for bge-m3 by @pengjunfeng11 in #2540
- FEAT: whisper support for Mac MLX by @qinxuye in #2576
- FEAT: support guided decoding for vllm async engine by @wxiwnd in #2391
- FEAT: support QwQ-32B-Preview by @qinxuye in #2602
- FEAT: support glm-edge-chat model by @amumu96 in #2582
Enhancements
- ENH: Support fish speech reference audio by @codingl2k1 in #2542
Bug fixes
- BUG: GTE-qwen2 Embedding Dimension error by @cyhasuka in #2565
- BUG: request_limits does not work with streaming interfaces by @ChengjieLi28 in #2571
- BUG: Fix Codestral v0.1 URI for Pytorch Format by @danialcheung in #2590
- BUG: Correct the input bytes data sent by langchain_openai (#2589) by @xiyuan-lee in #2600
New Contributors
- @pengjunfeng11 made their first contribution in #2540
- @danialcheung made their first contribution in #2590
- @xiyuan-lee made their first contribution in #2600
Full Changelog: v1.0.0...v1.0.1