
feat(translator): remove LLM <think>xxx</think> #609

Closed
missyoueveryday opened this issue Feb 13, 2025 · 9 comments · Fixed by #637
Labels
bug Something isn't working enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed Normal priority

Comments

@missyoueveryday

Problem description

When translating with a local Ollama model, using a reasoning model such as DeepSeek R1 causes the thinking part to be written into the translation output as well, which also breaks the layout.

[Image]

PDF for reproducing the issue:

test.pdf

Important

Please provide the PDF document used to reproduce the issue.

@missyoueveryday missyoueveryday added the bug Something isn't working label Feb 13, 2025
@awwaawwa
Collaborator

awwaawwa commented Feb 13, 2025

One open problem: if the original text itself contains <think>xxx</think>, the regex will mistakenly strip it as well.

@missyoueveryday
Author

An easy way would be: after getting the response, apply a filter that deletes the 'think' part, then proceed with the normal output.
Is this a good solution?

@awwaawwa
Collaborator

The current difficulty is how to distinguish the <think>xxx</think> generated by the model from a <think>xxx</think> contained in the original text.

@awwaawwa
Collaborator

If we cannot distinguish the two, the filter may accidentally remove a <think>xxx</think> that belongs to the original text.

@awwaawwa awwaawwa changed the title 使用ollama本地思考模型导致翻译格式错误 feat(translator): remove LLM <think>xxx</think> Feb 13, 2025
@awwaawwa awwaawwa added the enhancement New feature or request label Feb 13, 2025
@awwaawwa
Collaborator

Anchoring the pattern as ^<think>.+</think> should reduce the probability of false positives to a very low level. Looking forward to someone implementing this.
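A minimal sketch of the suggested approach (the helper name `strip_leading_think` is hypothetical, not part of pdf2zh): because the pattern is anchored at the start of the response, a <think> block appearing mid-text, i.e. one that came from the original document, is left untouched.

```python
import re

# Match a <think>...</think> block only at the very start of the response,
# plus any whitespace that follows it. re.DOTALL lets '.' span newlines,
# since reasoning output is usually multi-line; the non-greedy '+?' stops
# at the first closing tag.
THINK_RE = re.compile(r"^<think>.+?</think>\s*", re.DOTALL)

def strip_leading_think(response: str) -> str:
    """Remove a single leading <think>...</think> block from an LLM response."""
    return THINK_RE.sub("", response, count=1)

# A leading think block is stripped:
print(strip_leading_think("<think>reasoning...</think>Translated text"))
# A <think> tag inside the original text survives:
print(strip_leading_think("Original has <think>x</think> mid-text"))
```

The anchor `^` is what keeps the false-positive rate low, as discussed above: only model output places the block at the start of the string.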

@awwaawwa awwaawwa added Normal priority help wanted Extra attention is needed good first issue Good for newcomers labels Feb 13, 2025
@missyoueveryday
Author

How about simply adding a checkbox to let users indicate whether the model in use is a reasoning model? 😎
The default could be a normal model.

@awwaawwa
Collaborator

Since the false-positive rate of ^<think>.+</think> should be low enough, there's no need for a checkbox. Besides, the current configuration system of pdf2zh makes adding parameters rather inconvenient...

@awwaawwa
Collaborator

After the rewrite in #586, adding parameters will become much more convenient.

@hellofinch
Contributor

Ollama has no switch to turn off the think output, so the only option is to cut the knot and handle it with a blunt regex.

@awwaawwa awwaawwa linked a pull request Feb 16, 2025 that will close this issue