for mathematical unicode characters, treat as latex #607

Open: yliu000 wants to merge 2 commits into base: main

@yliu000 yliu000 commented Feb 13, 2025

Some PDFs use mathematical Unicode characters to carry mathematical meaning, for example:
𝒜, 𝑎

These characters should be treated as LaTeX.
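
For illustration, the kind of check this implies could be sketched as below. This is not the patch itself: the function names are hypothetical, and the sketch assumes that "mathematical Unicode characters" means the Unicode Mathematical Alphanumeric Symbols block (U+1D400 to U+1D7FF), which contains both examples above.

```python
# Hypothetical sketch; names are illustrative and not taken from pdf2zh.
MATH_ALNUM_START = 0x1D400  # first code point of Mathematical Alphanumeric Symbols
MATH_ALNUM_END = 0x1D7FF    # last code point of the block


def is_math_alnum(ch: str) -> bool:
    """Return True if ch lies in the Mathematical Alphanumeric Symbols block."""
    return MATH_ALNUM_START <= ord(ch) <= MATH_ALNUM_END


def contains_math_alnum(text: str) -> bool:
    """Return True if any character of text is a mathematical alphanumeric symbol."""
    return any(is_math_alnum(ch) for ch in text)


print(contains_math_alnum("\U0001D49C"))  # mathematical script capital A
print(contains_math_alnum("plain text"))
```

A text run for which `contains_math_alnum` returns True would then be handled as a formula instead of being sent for translation.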

@awwaawwa
Collaborator

When I have time, I will look into this and then merge it. Or could another maintainer take a look?

@yliu000
Author

yliu000 commented Feb 13, 2025

Unfortunately, I found that some other PDFs use these Unicode characters for normal text. So if my patch is applied, it will leave a lot of content untranslated.

So, maintainers, please use your expertise to decide whether to accept this patch.
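
One possible way to handle the normal-text case, offered only as a sketch and not something the patch does: NFKC normalization maps mathematical alphanumeric letters back to their plain counterparts, so a styled run could be recovered as ordinary text before translation. The helper name is hypothetical, and NFKC also folds other compatibility characters (ligatures, full-width forms), so it would need care in practice.

```python
import unicodedata


def to_plain_text(text: str) -> str:
    """Map styled mathematical letters to plain letters via NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


styled = "\U0001D44E\U0001D44F\U0001D450"  # mathematical italic a, b, c
print(to_plain_text(styled))  # abc
```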

@awwaawwa
Collaborator

I prefer not to merge this PR for now. When there's time later, we'll work on dedicated optimization for mathematical content and investigate in detail where these characters occur.

@awwaawwa added the "Won't Merge" label (Further research is needed) on Feb 14, 2025
@awwaawwa
Collaborator

awwaawwa commented Feb 16, 2025

https://github.com/funstory-ai/BabelDOC

It is recommended to make modifications in the new backend.

@awwaawwa
Collaborator

According to #586, pdf2zh 2.0 will use BabelDOC as the new backend, and the current version of the backend will be deprecated.
