Commit d47acb2

Full support for ASR models; release version 0.3.9

1 parent 523e45c commit d47acb2

File tree

8 files changed: +189 −52 lines


README.md

Lines changed: 45 additions & 21 deletions
````diff
@@ -33,12 +33,19 @@
 7. The only project worldwide to extend the **openai** library with a Reranker endpoint (rerank, /v1/rerank). (Sample code: gpt_server/tests/test_openai_rerank.py)
 8. The only project worldwide to support the **openai** library's text-moderation endpoint (text-moderation, /v1/moderations). (Sample code: gpt_server/tests/test_openai_moderation.py)
 9. The only project worldwide to support the **openai** library's TTS endpoint (tts, /v1/audio/speech), with built-in edge-tts (a free TTS). (Sample code: gpt_server/tests/test_openai_tts.py)
-10. Supports multimodal large models
-11. Same distributed architecture as FastChat
+10. The only project worldwide to support the **openai** library's ASR endpoint (asr, /v1/audio/transcriptions), backed by funasr. (Sample code: gpt_server/tests/test_openai_transcriptions.py)
+11. Supports multimodal large models
+12. Same distributed architecture as FastChat
+## Configuration guide
+This sample file is the quickest way to learn how the project is configured.
+<br>
+**Detailed notes on the config file: [config_example.yaml](https://github.com/shell-nlp/gpt_server/blob/main/gpt_server/script/config_example.yaml "config file")**
 
 ## Update history
 
 ```plaintext
+2025-4-2 Added support for the OpenAI ASR endpoint /v1/audio/transcriptions
+2025-4-1 Added support for the internvl2.5 model
 2025-2-9 Added support for QVQ
 2024-12-22 Added support for tts, /v1/audio/speech TTS models
 2024-12-21 Added support for text-moderation, /v1/moderations text-moderation models
````
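The new item 10 exposes an OpenAI-compatible ASR endpoint. As a rough sketch, this is the shape of a /v1/audio/transcriptions request a client would assemble — the base URL, port, and model name are assumptions; the project's own sample lives in gpt_server/tests/test_openai_transcriptions.py:

```python
import io

# Assumed local gpt_server address; adjust to your deployment.
BASE_URL = "http://localhost:8082/v1"

def build_transcription_request(model: str, audio: bytes) -> dict:
    """Assemble the parts of a multipart /v1/audio/transcriptions call.

    With the official openai client this is roughly:
        client.audio.transcriptions.create(model=model, file=audio_file)
    """
    return {
        "url": f"{BASE_URL}/audio/transcriptions",
        "data": {"model": model},                              # form field
        "files": {"file": ("speech.wav", io.BytesIO(audio))},  # audio payload
    }

req = build_transcription_request("SenseVoiceSmall", b"RIFF....WAVE")
print(req["url"])  # http://localhost:8082/v1/audio/transcriptions
```

The request body mirrors OpenAI's audio API, so existing OpenAI client code should work by pointing `base_url` at the gpt_server instance.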
```diff
@@ -209,19 +216,19 @@ Chat UI interface:
 | Deepseek | deepseek |||||
 | Llama-3 | llama |||||
 | Baichuan-2 | baichuan |||||
-| QWQ-32B | qwen |||||
+| QWQ-32B | qwen |||||
 | Phi-4 | phi ||| × | × |
 ### **VLM** (vision-LLM leaderboard: https://rank.opencompass.org.cn/leaderboard-multimodal)
 
 | Models / BackEnd | model_type | HF | vllm | LMDeploy TurboMind | LMDeploy PyTorch |
 | :--------------: | :--------: | :---: | :---: | :----------------: | :--------------: |
 | glm-4v-9b | chatglm | × | × | × ||
-| InternVL2 | internvl | × | × |||
-| InternVL2.5 | internvl | × | × |||
+| InternVL2 | internvl | × | × |||
+| InternVL2.5 | internvl | × | × |||
 | MiniCPM-V-2_6 | minicpmv | × ||| × |
 | Qwen2-VL | qwen | × || × ||
-| Qwen2.5-VL | qwen | × | × | × ||
-| QVQ | qwen | × || × | × |
+| Qwen2.5-VL | qwen | × | × | × ||
+| QVQ | qwen | × || × | × |
 <br>
 
 ### Embedding/Rerank/Classify models
```
```diff
@@ -232,24 +239,41 @@ Chat UI interface:
 
 The following models have been tested and can be used with confidence:
 
-| Embedding/Rerank/Classify | HF | Infinity |
-| --------------------------------------------- | --- | -------- |
-| bge-reranker |||
-| bce-reranker |||
-| bge-embedding |||
-| bce-embedding |||
-| puff |||
-| piccolo-base-zh-embedding |||
-| acge_text_embedding |||
-| Yinka |||
-| zpoint_large_embedding_zh |||
-| xiaobu-embedding |||
-| Conan-embedding-v1 |||
-| KoalaAI/Text-Moderation (text moderation / multi-class; flags violence, pornography, etc.) | × ||
+| Embedding/Rerank/Classify | HF | Infinity |
+| ----------------------------------------------------------------------------------- | --- | -------- |
+| bge-reranker |||
+| bce-reranker |||
+| bge-embedding |||
+| bce-embedding |||
+| puff |||
+| piccolo-base-zh-embedding |||
+| acge_text_embedding |||
+| Yinka |||
+| zpoint_large_embedding_zh |||
+| xiaobu-embedding |||
+| Conan-embedding-v1 |||
+| KoalaAI/Text-Moderation (text moderation / multi-class; flags violence, pornography, etc.) | × ||
 | protectai/deberta-v3-base-prompt-injection-v2 (prompt injection / binary classification; flags prompt-injection text) | × ||
 
 TencentBAC's **Conan-embedding-v1** currently ranks first on the C-MTEB leaderboard (MTEB: https://huggingface.co/spaces/mteb/leaderboard)
 
+<br>
+
+### **ASR** (supports FunASR non-realtime models: https://github.com/modelscope/FunASR/blob/main/README_zh.md)
+Only the SenseVoiceSmall model (the best-performing one) has been tested so far; support for the other models is copied from the official docs and may not work as-is. Testing and issue reports are welcome.
+
+| Models / BackEnd | model_type |
+| :--------------------: | :--------: |
+| SenseVoiceSmall | funasr |
+| paraformer-zh | funasr |
+| paraformer-en | funasr |
+| conformer-en | funasr |
+| Whisper-large-v3 | funasr |
+| Whisper-large-v3-turbo | funasr |
+| Qwen-Audio | funasr |
+| Qwen-Audio-Chat | funasr |
+
+<br>
 
 ## Architecture
```

gpt_server/model_worker/base/model_worker_base.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -214,6 +214,8 @@ def run(cls):
         parser.add_argument("--gpu_memory_utilization", type=str, default="0.8")
         # kv_cache_quant_policy
         parser.add_argument("--kv_cache_quant_policy", type=str, default="0")
+        # vad_model
+        parser.add_argument("--vad_model", type=str, default="")
         args = parser.parse_args()
         os.environ["num_gpus"] = str(args.num_gpus)
         if args.backend == "vllm":
@@ -231,6 +233,8 @@ def run(cls):
         os.environ["lora"] = args.lora
         if args.max_model_len:
             os.environ["max_model_len"] = args.max_model_len
+        if args.vad_model:
+            os.environ["vad_model"] = args.vad_model
 
         os.environ["enable_prefix_caching"] = args.enable_prefix_caching
         os.environ["gpu_memory_utilization"] = args.gpu_memory_utilization
```
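Isolated from the rest of run(), the new plumbing reduces to a small pattern: parse --vad_model with an empty-string default, and export it to the environment only when non-empty. A sketch with the argument list trimmed to just the new flag:

```python
import argparse
import os

# Sketch of the --vad_model plumbing in the diff above: the flag defaults
# to "", and only a non-empty value is exported to the worker's environment.
parser = argparse.ArgumentParser()
parser.add_argument("--vad_model", type=str, default="")
args = parser.parse_args(["--vad_model", "/models/fsmn-vad"])

if args.vad_model:  # empty string -> flag absent -> env var stays unset
    os.environ["vad_model"] = args.vad_model

print(os.environ["vad_model"])  # /models/fsmn-vad
```

The empty-string default means a config with no vad_model leaves the environment untouched, which the funasr worker later interprets as "no VAD model".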

gpt_server/model_worker/funasr.py

Lines changed: 3 additions & 1 deletion
```diff
@@ -7,6 +7,7 @@
 from funasr.utils.postprocess_utils import rich_transcription_postprocess
 from io import BytesIO
 
+
 class FunASRWorker(ModelWorkerBase):
     def __init__(
         self,
@@ -33,9 +34,10 @@ def __init__(
         else:
             device = "cuda"
         logger.info(f"Loading with {device}...")
+        vad_model = os.environ.get("vad_model", None)
         self.model = AutoModel(
             model=model_path,
-            vad_model="fsmn-vad",
+            vad_model=vad_model,
             vad_kwargs={"max_single_segment_time": 30000},
             device="cuda",
         )
```

gpt_server/script/config_example.yaml

Lines changed: 13 additions & 0 deletions
```diff
@@ -117,3 +117,16 @@ models:
     workers:
     - gpus:
       - 2
+  - SenseVoiceSmall:
+      ## Newly supported ASR model
+      alias: null
+      enable: true
+      model_config:
+        model_name_or_path: /home/dev/model/iic/SenseVoiceSmall # model path
+        vad_model: /home/dev/model/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch/ # VAD model; may be omitted
+      model_type: funasr # the type must be funasr
+      work_mode: hf
+      device: gpu
+      workers:
+      - gpus:
+        - 2
```
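Downstream, the launcher reads this model_config block and forwards vad_model only when present. A sketch of that lookup over an in-memory copy of the entry above (the dict literal stands in for the parsed YAML; paths are the example values from config_example.yaml):

```python
# The SenseVoiceSmall entry from the config, as it looks after YAML parsing.
model_entry = {
    "SenseVoiceSmall": {
        "alias": None,
        "enable": True,
        "model_config": {
            "model_name_or_path": "/home/dev/model/iic/SenseVoiceSmall",
            "vad_model": "/home/dev/model/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch/",
        },
        "model_type": "funasr",
        "work_mode": "hf",
        "device": "gpu",
    }
}

engine_config = model_entry["SenseVoiceSmall"]["model_config"]
# Same default as the diff in utils.py: a missing vad_model yields "",
# so the --vad_model flag is simply omitted from the worker command.
vad_model = engine_config.get("vad_model", "")
print(bool(vad_model))  # True
```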

gpt_server/utils.py

Lines changed: 3 additions & 0 deletions
```diff
@@ -160,6 +160,7 @@ def start_model_worker(config: dict):
             kv_cache_quant_policy = engine_config.get(
                 "kv_cache_quant_policy", 0
             )
+            vad_model = engine_config.get("vad_model", "")
 
         else:
             logger.error(
@@ -242,6 +243,8 @@ def start_model_worker(config: dict):
             cmd += f" --lora '{json.dumps(lora)}'"
         if max_model_len:
             cmd += f" --max_model_len '{max_model_len}'"
+        if vad_model:
+            cmd += f" --vad_model '{vad_model}'"
         p = Process(target=run_cmd, args=(cmd,))
         p.start()
         process.append(p)
```
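The launcher builds the worker's shell command by string concatenation, appending one flag per configured option. The new branch, condensed into a standalone helper (the command prefix here is illustrative; the real cmd carries many more flags):

```python
# Condensed sketch of start_model_worker's new flag-assembly branch.
# Single quotes around the value guard paths containing spaces.
def append_vad_flag(cmd: str, engine_config: dict) -> str:
    vad_model = engine_config.get("vad_model", "")  # "" when not configured
    if vad_model:
        cmd += f" --vad_model '{vad_model}'"
    return cmd

base = "python -m gpt_server.model_worker.funasr"  # illustrative prefix
print(append_vad_flag(base, {"vad_model": "/models/fsmn-vad"}))
print(append_vad_flag(base, {}))  # unchanged: flag omitted
```

With the flag appended, the worker process parses it via the --vad_model argument added in model_worker_base.py, closing the loop from YAML config to FunASR's AutoModel.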

pyproject.toml

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 [project]
 name = "gpt_server"
-version = "0.3.8"
+version = "0.3.9"
 description = "gpt_server is an open-source framework for production-grade deployment of LLMs or Embeddings."
 readme = "README.md"
 license = { text = "Apache 2.0" }
```

0 commit comments