Currently, Docling processes all images on a page uniformly during PDF-to-Markdown conversion. However, images that sit inside tables do not keep their original positions during table recognition. As a result, the Markdown output no longer matches the layout of the source PDF, which is most noticeable when images are converted to URLs.
I propose enhancing Docling's table recognition functionality to embed images (or their URLs) within the corresponding table cells in the Markdown output. This will ensure that the structure of the original PDF table, including images, is faithfully reproduced.
Expected Behavior:
- Images within tables in a PDF should be correctly identified as part of the table.
- When converting PDF to Markdown:
  - If an image is in a table cell, its URL should appear within the corresponding table cell in the Markdown output.
  - If an image is outside a table, it should be handled as it is currently (positioned relative to its original location).
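To make the expected behavior concrete, here is a minimal sketch of one way it could work. It does not use Docling's API: `BBox`, `Cell`, `Picture`, `assign_pictures`, and `table_to_markdown` are hypothetical names invented for illustration, and the rule "a picture belongs to the cell that contains its center point" is an assumption, not a description of how Docling's table recognition works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned box in page coordinates (hypothetical helper type)."""
    left: float
    top: float
    right: float
    bottom: float

    def center(self):
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)

    def contains(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass
class Cell:
    row: int
    col: int
    bbox: BBox
    text: str = ""


@dataclass
class Picture:
    bbox: BBox
    url: str


def assign_pictures(cells, pictures):
    """Split pictures into those inside a table cell and those outside.

    Assumption: a picture belongs to the first cell containing its center.
    Returns (placed, loose), where placed maps (row, col) to pictures.
    """
    placed, loose = {}, []
    for pic in pictures:
        cx, cy = pic.bbox.center()
        owner = next((c for c in cells if c.bbox.contains(cx, cy)), None)
        if owner is None:
            loose.append(pic)  # handled as today, outside the table
        else:
            placed.setdefault((owner.row, owner.col), []).append(pic)
    return placed, loose


def table_to_markdown(cells, placed):
    """Render a Markdown table, appending image links to their cells."""
    n_rows = max(c.row for c in cells) + 1
    n_cols = max(c.col for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        parts = [c.text] if c.text else []
        parts += [f"![]({p.url})" for p in placed.get((c.row, c.col), [])]
        # Escape pipes so cell content cannot break the table syntax.
        grid[c.row][c.col] = " ".join(parts).replace("|", "\\|")
    lines = ["| " + " | ".join(grid[0]) + " |",
             "| " + " | ".join(["---"] * n_cols) + " |"]
    lines += ["| " + " | ".join(row) + " |" for row in grid[1:]]
    return "\n".join(lines)


# Example: a 2x2 table with one image in a cell and one image elsewhere.
cells = [
    Cell(0, 0, BBox(0, 0, 100, 20), "Name"),
    Cell(0, 1, BBox(100, 0, 200, 20), "Logo"),
    Cell(1, 0, BBox(0, 20, 100, 60), "Acme"),
    Cell(1, 1, BBox(100, 20, 200, 60)),
]
pictures = [
    Picture(BBox(120, 25, 180, 55), "images/acme.png"),   # inside cell (1, 1)
    Picture(BBox(0, 300, 200, 400), "images/chart.png"),  # outside the table
]
placed, loose = assign_pictures(cells, pictures)
print(table_to_markdown(cells, placed))
```

In this sketch the first picture ends up as an image link inside the "Logo" cell, while the second stays in `loose` and would be emitted wherever the converter places standalone images today. A real implementation would also need to handle merged cells and pictures that overlap several cells.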
Benefits:
- Improves the fidelity of PDF-to-Markdown conversion.
- Ensures that image URLs maintain their original context, especially in complex documents with tables.