Releases: DS4SD/docling
v2.17.0
Feature
- CLI: Expose code and formula models in the CLI (#820) (6882e6c)
- Add platform info to CLI version printout (#816) (95b293a)
- ocr: Expose rec_keys_path in RapidOcrOptions to support custom dictionaries (#786) (5332755)
- Introduce automatic language detection in TesseractOcrCliModel (#800) (3be2fb5)
Fix
- Fix single newline handling in MD backend (#824) (5aed9f8)
- Use file extension if filetype fails with PDF (#827) (adf6353)
- Parse html with omitted body tag (#818) (a112d7a)
Documentation
- Document Docling JSON parsing (#819) (6875913)
- Add SSL verification error mitigation (#821) (5139b48)
- backend XML: Do not delete temp file in notebook (#817) (4d41db3)
- Typo (#814) (8a4ec77)
- Added markdown headings to enable TOC in github pages (#808) (b885b2f)
- Description of supported formats and backends (#788) (c2ae1cc)
v2.16.0
Feature
- New document picture classifier (#805) (16a218d)
- Add Docling JSON ingestion (#783) (88a0e66)
- Code and equation model for PDF and code blocks in markdown (#752) (3213b24)
- Add "auto" language for TesseractOcr (#759) (8543c22)
Fix
- Added extraction of byte-images in excel (#804) (a458e29)
- Update docling-parse-v2 backend version with new parsing fixes (#769) (670a08b)
Documentation
v2.15.1
v2.15.0
v2.14.0
v2.13.0
v2.12.0
v2.11.0
v2.10.0
v2.9.0
Feature
- Expose new hybrid chunker, update docs (#384) (c8ecdd9)
- MS Word backend: Make detection of headers and other styles localization agnostic (#534) (3e073df)
Fix
- Correcting DefaultText ID for MS Word backend (#537) (eb7ffcd)
- Add py.typed marker file (#531) (9102fe1)
- Enable HTML export in CLI and add options for image mode (#513) (0d11e30)
- Missing text in docx (t tag) when embedded in a table (#528) (b730b2d)
- Restore pydantic version pin after fixes (#512) (c830b92)
- Folder input in cli (#511) (8ada0bc)