During inference, why does model_input_ids consist of only a single token? #129

Open
Fuyubai opened this issue Dec 10, 2024 · 1 comment

Comments

Fuyubai commented Dec 10, 2024

While reading the inference-stage source code (inference.py), I ran into two questions:

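  1. Why is no audio generated from the first output audio token, with audio generation starting only from the second output audio token?
  2. Why is model_input_ids built only from the token output at the previous step? Why aren't the tokens output at earlier steps added to model_input_ids? And why is the audio feature None?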
@mini-omni (Contributor)

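  1. A one-token delay is added before the first audio token is generated, so that audio generation has a text token to guide it.
  2. The model has a KV cache, so after the first step none of the earlier tokens need to be fed in again. For the audio feature, see the forward function in model.py.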
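The one-token delay in the first answer can be pictured as a per-step schedule in which audio lags text by one step. Below is a minimal sketch, assuming one text token and one audio token per decoding step (a simplification); `interleave_with_delay`, `text_visible_to_audio`, and the use of `None` as an empty audio slot are hypothetical names for illustration, not code from inference.py.

```python
from typing import List, Optional, Tuple

Step = Tuple[Optional[str], Optional[str]]  # (text token, audio token) emitted at one step


def interleave_with_delay(
    text_tokens: List[str], audio_tokens: List[str], delay: int = 1
) -> List[Step]:
    """Hypothetical schedule: audio token i is emitted at step i + delay."""
    steps = max(len(text_tokens), len(audio_tokens) + delay)
    schedule: List[Step] = []
    for t in range(steps):
        text = text_tokens[t] if t < len(text_tokens) else None
        a = t - delay  # index of the audio token due at step t, if any
        audio = audio_tokens[a] if 0 <= a < len(audio_tokens) else None
        schedule.append((text, audio))
    return schedule


def text_visible_to_audio(schedule: List[Step], step: int) -> List[str]:
    # Text tokens emitted at earlier steps; these are already inputs when `step` runs.
    return [txt for txt, _ in schedule[:step] if txt is not None]


schedule = interleave_with_delay(["Hel", "lo", "!"], ["a0", "a1", "a2"])
print(schedule)  # [('Hel', None), ('lo', 'a0'), ('!', 'a1'), (None, 'a2')]
print(text_visible_to_audio(schedule, 1))  # ['Hel']
```

In this toy schedule the audio slot at step 0 is empty, so the first usable audio token appears at step 1, and by then the text token from step 0 is already available as input to condition it.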
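The second answer relies on KV caching. Below is a minimal sketch, assuming a single toy attention head in plain Python; `ToyCausalAttention`, `forward_full`, `forward_step`, and the dict-based cache are hypothetical names for illustration, not mini-omni's model.py. It shows that once the keys and values of earlier tokens are cached, feeding only the newest token yields the same outputs as recomputing attention over the whole prefix.

```python
import math
from typing import List

Vec = List[float]
Mat = List[Vec]  # list of rows


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def project(x: Vec, w: Mat) -> Vec:
    # Matrix-vector product: one output entry per row of w.
    return [dot(row, x) for row in w]


def attend(q: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    # Scaled dot-product attention of one query over the given keys/values.
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    dim = len(values[0])
    return [sum(e / total * v[i] for e, v in zip(exps, values)) for i in range(dim)]


class ToyCausalAttention:
    """Hypothetical one-head causal self-attention layer (not mini-omni's code)."""

    def __init__(self, wq: Mat, wk: Mat, wv: Mat):
        self.wq, self.wk, self.wv = wq, wk, wv

    def forward_full(self, xs: List[Vec]) -> List[Vec]:
        # No cache: every call re-projects the whole sequence.
        keys = [project(x, self.wk) for x in xs]
        values = [project(x, self.wv) for x in xs]
        return [
            attend(project(x, self.wq), keys[: t + 1], values[: t + 1])
            for t, x in enumerate(xs)
        ]

    def forward_step(self, x: Vec, cache: dict) -> Vec:
        # With a cache: only the newest token is projected; earlier K/V are reused.
        cache.setdefault("k", []).append(project(x, self.wk))
        cache.setdefault("v", []).append(project(x, self.wv))
        return attend(project(x, self.wq), cache["k"], cache["v"])


attn = ToyCausalAttention(
    wq=[[1.0, 0.5], [0.0, 1.0]],
    wk=[[0.5, 1.0], [1.0, 0.0]],
    wv=[[1.0, 0.0], [0.3, 1.0]],
)
tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]]  # toy embeddings

reference = attn.forward_full(tokens)

cache: dict = {}
# First step: the prompt tokens fill the cache (one at a time here for brevity;
# a real prefill would process them in one batched call).
incremental = [attn.forward_step(x, cache) for x in tokens[:2]]
# Later steps: feed only the single newest token.
for x in tokens[2:]:
    incremental.append(attn.forward_step(x, cache))
```

Each step appends one key/value pair and reuses the rest, so per-step work no longer includes re-projecting the whole history; this is the usual reason a decoding loop passes only the latest token after the first call.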
