The translation process runs, but files are not translated correctly #583

Open
sainnodel opened this issue Feb 8, 2025 · 13 comments

Comments

@sainnodel

Expression of Coxsackie adenovirus receptor and alphav-integrin does not correlate with adenovector targeting in vivo indicating anatomical vector barriers-mono.pdf

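The uploaded file is a Chinese-English bilingual version. The English part is fine, but the translated part is almost entirely missing.
The same error occurs when translating any file. I suspect a missing font, but the translation process reported no errors, so I cannot tell what actually went wrong.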

@awwaawwa
Collaborator

awwaawwa commented Feb 9, 2025

Please upload the original input file so we can reproduce it

@sainnodel
Author

Here is the terminal output:
lijunxiong@SainnodelMacbookPro ~ % pdf2zh -i

To create a public link, set share=True in launch().
WARNING:python_multipart.multipart:Skipping data after last boundary
Files before translation: ['Isolation of a Common Receptor for Coxsackie B Viruses and Adenoviruses 2 and 5.pdf', 'Expression of Coxsackie adenovirus receptor and alphav-integrin does not correlate with adenovector targeting in vivo indicating anatomical vector barriers.pdf', 'test1-dual.pdf', 'Expression of Coxsackie adenovirus receptor and alphav-integrin does not correlate with adenovector targeting in vivo indicating anatomical vector barriers-mono.pdf', 'Expression of Coxsackie adenovirus receptor and alphav-integrin does not correlate with adenovector targeting in vivo indicating anatomical vector barriers-dual.pdf', 'test1.pdf', 'test1-mono.pdf', 'IR 20250206 Original IND 31276.pdf', 'IR 20250206 Original IND 31276-dual.pdf', 'Isolation of a Common Receptor for Coxsackie B Viruses and Adenoviruses 2 and 5-mono.pdf', 's10238-005-0076-1.pdf', 'science.275.5304.1320 1.pdf', 'Isolation of a Common Receptor for Coxsackie B Viruses and Adenoviruses 2 and 5-dual.pdf', 'The Coxsackie-adenovirus receptor has elevated expression in human breast cancer.pdf', 'IR 20250206 Original IND 31276-mono.pdf']
100%|███████████████████████████████████████████| 16/16 [00:03<00:00, 4.09it/s]
../../Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pymupdf/__init__.py:276:exception_info(): exception_info:
../../Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pymupdf/__init__.py:277:exception_info(): Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pymupdf/utils.py", line 5698, in build_subset
    fts.main(args)
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/fontTools/misc/loggingTools.py", line 375, in wrapper
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/fontTools/subset/__init__.py", line 3786, in main
    font = load_font(
           ^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/fontTools/misc/loggingTools.py", line 375, in wrapper
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/fontTools/subset/__init__.py", line 3628, in load_font
    f = font["post"]
        ~~~~^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/fontTools/ttLib/ttFont.py", line 461, in __getitem__
    table = self._readTable(tag)
            ^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/fontTools/ttLib/ttFont.py", line 468, in _readTable
    data = self.reader[tag]
           ~~~~~~~~~~~^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/fontTools/ttLib/sfnt.py", line 110, in __getitem__
    data = entry.loadData(self.file)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/fontTools/ttLib/sfnt.py", line 508, in loadData
    assert len(data) == self.length
           ^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

../../Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pymupdf/__init__.py:276:exception_info(): exception_info:
../../Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pymupdf/__init__.py:277:exception_info(): Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pymupdf/utils.py", line 5698, in build_subset
    fts.main(args)
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/fontTools/misc/loggingTools.py", line 375, in wrapper
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/fontTools/subset/__init__.py", line 3786, in main
    font = load_font(
           ^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/fontTools/misc/loggingTools.py", line 375, in wrapper
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/fontTools/subset/__init__.py", line 3628, in load_font
    f = font["post"]
        ~~~~^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/fontTools/ttLib/ttFont.py", line 461, in __getitem__
    table = self._readTable(tag)
            ^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/fontTools/ttLib/ttFont.py", line 468, in _readTable
    data = self.reader[tag]
           ~~~~~~~~~~~^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/fontTools/ttLib/sfnt.py", line 110, in __getitem__
    data = entry.loadData(self.file)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/fontTools/ttLib/sfnt.py", line 508, in loadData
    assert len(data) == self.length
           ^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

Files after translation: ['Isolation of a Common Receptor for Coxsackie B Viruses and Adenoviruses 2 and 5.pdf', 'Expression of Coxsackie adenovirus receptor and alphav-integrin does not correlate with adenovector targeting in vivo indicating anatomical vector barriers.pdf', 'test1-dual.pdf', 'Expression of Coxsackie adenovirus receptor and alphav-integrin does not correlate with adenovector targeting in vivo indicating anatomical vector barriers-mono.pdf', 'Expression of Coxsackie adenovirus receptor and alphav-integrin does not correlate with adenovector targeting in vivo indicating anatomical vector barriers-dual.pdf', 'test1.pdf', 'test1-mono.pdf', 'IR 20250206 Original IND 31276.pdf', 'IR 20250206 Original IND 31276-dual.pdf', 'Isolation of a Common Receptor for Coxsackie B Viruses and Adenoviruses 2 and 5-mono.pdf', 's10238-005-0076-1.pdf', 'science.275.5304.1320 1.pdf', 'Isolation of a Common Receptor for Coxsackie B Viruses and Adenoviruses 2 and 5-dual.pdf', 'The Coxsackie-adenovirus receptor has elevated expression in human breast cancer.pdf', 'IR 20250206 Original IND 31276-mono.pdf']
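
The AssertionError in the log above comes from fontTools checking that the number of bytes it read for a font table (here `post`) equals the length declared in the font's table directory. One possible cause, though not a confirmed one for this issue, is a truncated or corrupted font file. The following is a minimal sketch of a standalone check for that situation; the function name `find_truncated_tables` is hypothetical, and it handles only plain sfnt files (TTF/OTF), not WOFF or TTC containers.

```python
import struct


def find_truncated_tables(font_bytes: bytes) -> list[str]:
    """Return tags of sfnt tables whose declared extent runs past the end of the data."""
    if len(font_bytes) < 12:
        raise ValueError("too short to be an sfnt font")
    # sfnt header: 4-byte version, then numTables as a big-endian uint16
    _version, num_tables = struct.unpack(">4sH", font_bytes[:6])
    truncated = []
    for i in range(num_tables):
        # Table records start after the 12-byte header and are 16 bytes each
        start = 12 + 16 * i
        record = font_bytes[start:start + 16]
        if len(record) < 16:
            raise ValueError("table directory itself is truncated")
        tag, _checksum, offset, length = struct.unpack(">4sIII", record)
        # A table is unreadable in full if it extends beyond the available bytes
        if offset + length > len(font_bytes):
            truncated.append(tag.decode("latin-1"))
    return truncated


# Usage (the path is a placeholder, not a location used by pdf2zh):
#   with open("some-font.ttf", "rb") as fh:
#       print(find_truncated_tables(fh.read()))
```

An empty result means every table fits inside the file, in which case the failure would have to come from somewhere else.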

@sainnodel
Author

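I still suspect a missing font. Which fonts does pdf2zh need? I would like to download them as system fonts and see whether that solves the problem.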

@hellofinch
Contributor

https://github.com/Byaidu/PDFMathTranslate/blob/main/pdf2zh/high_level.py#L382
You can refer to this; the required fonts are downloaded here.

@sainnodel
Author

https://github.com/Byaidu/PDFMathTranslate/blob/main/pdf2zh/high_level.py#L382
Here, the URL in the line URL_PREFIX = "https://github.com/timelic/source-han-serif/releases/download/main/"
returns a 404 when I open it? I don't understand what is going on.

@awwaawwa
Collaborator

This is only the URL prefix, so opening it on its own returns a 404. Other strings are appended to this prefix, and only after appending does it become a complete URL.
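
To illustrate the point, here is a minimal sketch of how a prefix and file names combine into full download URLs. `URL_PREFIX` is the value quoted in this thread; the helper `build_font_urls` and the file names are hypothetical and are not taken from pdf2zh's code.

```python
URL_PREFIX = "https://github.com/timelic/source-han-serif/releases/download/main/"


def build_font_urls(prefix: str, file_names: list[str]) -> dict[str, str]:
    """Map each file name to the complete URL formed by appending it to the prefix."""
    return {name: prefix + name for name in file_names}


# The file names below are placeholders for illustration only
urls = build_font_urls(URL_PREFIX, ["example-font-a.ttf", "example-font-b.ttf"])
for name, url in urls.items():
    print(name, "->", url)
```

Only the joined strings are meant to be requested; the bare prefix does not name a file.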

@sainnodel
Author

I manually downloaded GoNotoKurrent-Regular.ttf and added it to Font Book, but the result is still the same. Either way, thank you very much for your help.

@hellofinch
Contributor

https://github.com/Byaidu/PDFMathTranslate/blob/main/pdf2zh/high_level.py#L405
The URL is assembled here; what finally gets requested is a set of files.

@awwaawwa
Collaborator

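This is not a missing font; font subsetting is failing. Try changing fallback to false here, which should fix it. @hellofinch, could you try it when you have time?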
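
A minimal sketch of that suggestion, assuming "fallback" refers to the `fallback` parameter of PyMuPDF's `Document.subset_fonts`. The traceback earlier in this thread passes through PyMuPDF's fontTools-based `build_subset`, which fits that reading, but the thread does not confirm it. The helper name and file names are hypothetical, and this is not pdf2zh's actual code.

```python
def save_with_native_subsetting(doc, out_path: str) -> None:
    """Subset fonts without the fontTools-based fallback, then save the document.

    `doc` is expected to be an open PyMuPDF Document.
    """
    # fallback=False asks PyMuPDF to use MuPDF's built-in subsetter
    # instead of the fontTools code path seen in the traceback
    doc.subset_fonts(fallback=False)
    doc.save(out_path, garbage=3, deflate=True)


# Usage (requires the third-party PyMuPDF package; file names are placeholders):
#   import pymupdf
#   doc = pymupdf.open("translated.pdf")
#   save_with_native_subsetting(doc, "translated-subset.pdf")
#   doc.close()
```

Whether this resolves the missing text depends on the installed PyMuPDF version and on how pdf2zh calls the subsetter, so it is something to try rather than a guaranteed fix.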

@sainnodel
Author

sainnodel commented Feb 14, 2025

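Is there still a way to solve my problem? @awwaawwa, I would appreciate your advice.
I don't really understand how to use YADT, so I would still prefer to solve this within pdf2zh.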

@awwaawwa
Collaborator

awwaawwa commented Feb 14, 2025

Let's wait until this is finished: #586

pdf2zh 2.0 will use YADT as its core, focusing on self-deployment related matters.
