Support MDX or other non-PDF formats in export (PDF -> MDX) #654
Comments
This is a very long-term goal, so please be patient.
The internal representation we currently use contains many PDF implementation details and is highly unstable, so for now it is not suitable for other data analysis scenarios.
Just to be clear, this is only about the export format: can we export PDF to other formats such as MDX, HTML, etc.?
No.
Exporting PDF to other formats requires a lot of work and is not that simple.
For this type of task, I suggest you consider other projects. There should be many such projects available now. The core focus of this project at the current stage is to maintain the layout while translating PDFs, rather than converting PDFs to other formats.
A PDF records that XX glyphs are drawn with XX font at XX coordinates. It does not record high-level paragraph relationships. To convert a PDF to other formats, you need layout OCR, reading order recognition, and a good deal of other work.
This is not "just export", but rather a massive undertaking...
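To illustrate the point about glyphs and coordinates, here is a minimal sketch in plain Python. It is not this project's internal representation: the `Glyph` class, the `place` helper, and the coordinates are all hypothetical, chosen only to show why sorting glyphs by position does not recover reading order on a two-column page.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical stand-in for what a PDF stores: one character at one position."""
    char: str
    x: float
    y: float
    font: str = "Helvetica"


def place(text, x, y, advance=6.0):
    """Lay out text as individual glyphs, one per character (illustrative helper)."""
    return [Glyph(c, x + i * advance, y) for i, c in enumerate(text)]


def naive_lines(glyphs, y_tol=2.0):
    """Group glyphs into lines by baseline, top to bottom, then left to right."""
    lines = []
    # PDF user space has its origin at the bottom-left, so a larger y is higher up.
    for g in sorted(glyphs, key=lambda g: -g.y):
        if lines and abs(lines[-1][0].y - g.y) <= y_tol:
            lines[-1].append(g)
        else:
            lines.append([g])
    return ["".join(g.char for g in sorted(line, key=lambda g: g.x)) for line in lines]


# A two-column page: the intended reading order is Left1, Left2, Right1, Right2.
page = (
    place("Left1", 50, 700) + place("Right1", 300, 700)
    + place("Left2", 50, 688) + place("Right2", 300, 688)
)

print(naive_lines(page))
```

The naive pass merges the two columns into the same lines, because nothing in the glyph data says where one column or paragraph ends and another begins. Recovering that structure is the layout and reading-order work described above.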
Is your feature request related to a problem?
Would it be possible to support different export formats? That would be very useful in data extraction pipelines.
Describe the solution you'd like
No response
Additional context
No response