
feat(translator): remove LLM <think>xxx</think> #609

Closed
missyoueveryday opened this issue Feb 13, 2025 · 9 comments · Fixed by #637
Labels
bug Something isn't working enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed Normal priority

Comments

@missyoueveryday

Problem description

When translating with a local Ollama model, using a reasoning model such as DeepSeek R1 causes the thinking part to be written into the translation output as well, which also breaks the layout.

[Image]

PDF for reproducing the issue:

test.pdf

Important

Please provide the PDF document used to reproduce the issue.

@missyoueveryday missyoueveryday added the bug Something isn't working label Feb 13, 2025
@awwaawwa
Collaborator

awwaawwa commented Feb 13, 2025

One open problem: if the original text itself contains <think>xxx</think>, the regex will mistakenly strip it as well.

@missyoueveryday
Author

An easy way would be: after getting the response, apply a filter that deletes the 'think' part, then proceed with the normal output.
Is this a good solution?

@awwaawwa
Collaborator

The current difficulty is how to distinguish the <think>xxx</think> generated by the model from a <think>xxx</think> contained in the original text.

@awwaawwa
Collaborator

If we cannot distinguish the two, the filter may accidentally remove a <think>xxx</think> that belongs to the original text.

@awwaawwa awwaawwa changed the title 使用ollama本地思考模型导致翻译格式错误 feat(translator): remove LLM <think>xxx</think> Feb 13, 2025
@awwaawwa awwaawwa added the enhancement New feature or request label Feb 13, 2025
@awwaawwa
Collaborator

Anchoring the pattern as ^<think>.+</think> should reduce the probability of false positives to a very low level. Looking forward to someone implementing this.
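A minimal sketch of the suggested approach (the helper name `strip_leading_think` is hypothetical, not part of pdf2zh): because the pattern is anchored at the start of the response, a <think> block appearing mid-text, i.e. one that came from the original document, is left untouched.

```python
import re

# Match a <think>...</think> block only at the very start of the response,
# plus any whitespace that follows it. re.DOTALL lets '.' span newlines,
# since reasoning output is usually multi-line; the non-greedy '+?' stops
# at the first closing tag.
THINK_RE = re.compile(r"^<think>.+?</think>\s*", re.DOTALL)

def strip_leading_think(response: str) -> str:
    """Remove a single leading <think>...</think> block from an LLM response."""
    return THINK_RE.sub("", response, count=1)

# A leading think block is stripped:
print(strip_leading_think("<think>reasoning...</think>Translated text"))
# A <think> tag inside the original text survives:
print(strip_leading_think("Original has <think>x</think> mid-text"))
```

The anchor `^` is what keeps the false-positive rate low, as discussed above: only model output places the block at the start of the string.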

@awwaawwa awwaawwa added Normal priority help wanted Extra attention is needed good first issue Good for newcomers labels Feb 13, 2025
@missyoueveryday
Author

How about simply adding a checkbox to let users indicate whether the model in use is a reasoning model? 😎
The default could be a normal model.

@awwaawwa
Collaborator

Since the false-positive rate of ^<think>.+</think> should be low enough, there's no need for a checkbox. Besides, the current configuration system of pdf2zh makes adding parameters rather inconvenient...

@awwaawwa
Collaborator

After the rewrite in #586, adding parameters will become much more convenient.

@hellofinch
Contributor

Ollama has no switch to turn off the think output, so the only option is to cut the knot and handle it with a blunt regex.

@awwaawwa awwaawwa linked a pull request Feb 16, 2025 that will close this issue