diff --git a/README.md b/README.md
index c3af0f79..a82208cd 100644
--- a/README.md
+++ b/README.md
@@ -19,19 +19,22 @@
 
 Docling parses documents and exports them to the desired format with ease and speed.
 
-
 ## Features
 
 * 🗂️ Reads popular document formats (PDF, DOCX, PPTX, Images, HTML, AsciiDoc, Markdown) and exports to Markdown and JSON
 * 📑 Advanced PDF document understanding including page layout, reading order & table structures
 * 🧩 Unified, expressive [DoclingDocument](https://ds4sd.github.io/docling/concepts/docling_document/) representation format
-* 📝 Metadata extraction, including title, authors, references & language
-* 🤖 Seamless LlamaIndex 🦙 & LangChain 🦜🔗 integration for powerful RAG / QA applications
+* 🤖 Easy integration with LlamaIndex 🦙 & LangChain 🦜🔗 for powerful RAG / QA applications
 * 🔍 OCR support for scanned PDFs
 * 💻 Simple and convenient CLI
 
 Explore the [documentation](https://ds4sd.github.io/docling/) to discover plenty examples and unlock the full power of Docling!
 
+### Coming soon
+
+* ♾️ Equation & code extraction
+* 📝 Metadata extraction, including title, authors, references & language
+* 🦜🔗 Native LangChain extension
 
 ## Installation
 
@@ -57,7 +60,6 @@ result = converter.convert(source)
 print(result.document.export_to_markdown())  # output: "## Docling Technical Report[...]"
 ```
 
-
 Check out [Getting started](https://ds4sd.github.io/docling/).
 You will find lots of tuning options to leverage all the advanced capabilities.
 
@@ -66,7 +68,6 @@ You will find lots of tuning options to leverage all the advanced capabilities.
 
 Please feel free to connect with us using the [discussion section](https://github.com/DS4SD/docling/discussions).
 
-
 ## Technical report
 
 For more details on Docling's inner workings, check out the [Docling Technical Report](https://arxiv.org/abs/2408.09869).
@@ -95,5 +96,5 @@ If you use Docling in your projects, please consider citing the following:
 
 ## License
 
-The Docling codebase is under MIT license.
+The Docling codebase is under MIT license. For individual model usage, please refer to the model licenses found in the original packages.
 
diff --git a/docs/index.md b/docs/index.md
index 68cdd12a..ebef79a0 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -22,7 +22,12 @@ Docling parses documents and exports them to the desired format with ease and sp
 * 🗂️ Reads popular document formats (PDF, DOCX, PPTX, Images, HTML, AsciiDoc, Markdown) and exports to Markdown and JSON
 * 📑 Advanced PDF document understanding incl. page layout, reading order & table structures
 * 🧩 Unified, expressive [DoclingDocument](./concepts/docling_document.md) representation format
-* 📝 Metadata extraction, including title, authors, references & language
-* 🤖 Seamless LlamaIndex 🦙 & LangChain 🦜🔗 integration for powerful RAG / QA applications
+* 🤖 Easy integration with LlamaIndex 🦙 & LangChain 🦜🔗 for powerful RAG / QA applications
 * 🔍 OCR support for scanned PDFs
 * 💻 Simple and convenient CLI
+
+### Coming soon
+
+* ♾️ Equation & code extraction
+* 📝 Metadata extraction, including title, authors, references & language
+* 🦜🔗 Native LangChain extension