Commit
docs: add DocETL, Kotaemon, spaCy integrations; minor docs improvements (#408)

Signed-off-by: Panos Vagenas <[email protected]>
Showing 9 changed files with 56 additions and 17 deletions.
```diff
@@ -1 +1 @@
-Use the navigation on the left to browse some core Docling concepts.
+Use the navigation on the left to browse through some core Docling concepts.
```
```diff
@@ -0,0 +1,9 @@
+Docling is available as a plugin for [EXAMPLE](https://example.com).
+
+- 💻 [GitHub][github]
+- 📖 [Docs][docs]
+- 📦 [PyPI][pypi]
+
+[github]: https://github.com/...
+[docs]: https://...
+[pypi]: https://pypi.org/project/...
```
```diff
@@ -1,13 +1,13 @@
 ## Get started
 
-Docling is used by the [Data Prep Kit \[↗\]](https://ibm.github.io/data-prep-kit/) open-source toolkit for preparing unstructured data for LLM application development ranging from laptop scale to datacenter scale.
+Docling is used by the [Data Prep Kit](https://ibm.github.io/data-prep-kit/) open-source toolkit for preparing unstructured data for LLM application development ranging from laptop scale to datacenter scale.
 
 Below you find the Data Prep Kit modules powered by Docling.
 
 ## PDF ingestion to Parquet
-- 💻 [GitHub \[↗\]](https://github.com/IBM/data-prep-kit/tree/dev/transforms/language/pdf2parquet)
-- 📖 [API docs \[↗\]](https://ibm.github.io/data-prep-kit/transforms/language/pdf2parquet/python/)
+- 💻 [PDF-to-Parquet GitHub](https://github.com/IBM/data-prep-kit/tree/dev/transforms/language/pdf2parquet)
+- 📖 [PDF-to-Parquet Docs](https://ibm.github.io/data-prep-kit/transforms/language/pdf2parquet/python/)
 
 ## Document chunking
-- 💻 [GitHub \[↗\]](https://github.com/IBM/data-prep-kit/tree/dev/transforms/language/doc_chunk)
-- 📖 [API docs \[↗\]](https://ibm.github.io/data-prep-kit/transforms/language/doc_chunk/python/)
+- 💻 [Doc Chunking GitHub](https://github.com/IBM/data-prep-kit/tree/dev/transforms/language/doc_chunk)
+- 📖 [Doc Chunking Docs](https://ibm.github.io/data-prep-kit/transforms/language/doc_chunk/python/)
```
```diff
@@ -0,0 +1,9 @@
+Docling is available as a file conversion method in [DocETL](https://github.com/ucbepic/docetl):
+
+- 💻 [DocETL GitHub][github]
+- 📖 [DocETL Docs][docs]
+- 📦 [DocETL PyPI][pypi]
+
+[github]: https://github.com/ucbepic/docetl
+[docs]: https://ucbepic.github.io/docetl/
+[pypi]: https://pypi.org/project/docetl/
```
```diff
@@ -0,0 +1,9 @@
+Docling is available in [Kotaemon](https://cinnamon.github.io/kotaemon/) as the `DoclingReader` loader:
+
+- 💻 [Kotaemon GitHub][github]
+- 📖 [DoclingReader Docs][docs]
+- ⚙️ [Docling Setup in Kotaemon][setup]
+
+[github]: https://github.com/Cinnamon/kotaemon
+[docs]: https://cinnamon.github.io/kotaemon/reference/loaders/docling_loader/
+[setup]: https://cinnamon.github.io/kotaemon/development/?h=docling#setup-multimodal-document-parsing-ocr-table-parsing-figure-extraction
```
```diff
@@ -1,25 +1,23 @@
 ## Get started
 
-Docling is available as an official [LlamaIndex \[↗\]](https://docs.llamaindex.ai/) extension.
+Docling is available as an official [LlamaIndex](https://docs.llamaindex.ai/) extension.
 
-To get started, check out the [step-by-step guide in LlamaIndex \[↗\]](https://docs.llamaindex.ai/en/stable/examples/data_connectors/DoclingReaderDemo/)<!--{target="_blank"}-->.
+To get started, check out the [step-by-step guide in LlamaIndex](https://docs.llamaindex.ai/en/stable/examples/data_connectors/DoclingReaderDemo/).
 
 ## Components
 
 ### Docling Reader
 
 Reads document files and uses Docling to populate LlamaIndex `Document` objects — either serializing Docling's data model (losslessly, e.g. as JSON) or exporting to a simplified format (lossily, e.g. as Markdown).
 
-- 💻 [GitHub \[↗\]](https://github.com/run-llama/llama_index/tree/main/llama-index-integrations/readers/llama-index-readers-docling)<!--{target="_blank"}-->
-- 📖 [API docs \[↗\]](https://docs.llamaindex.ai/en/stable/api_reference/readers/docling/)<!--{target="_blank"} -->
-- 📦 [PyPI \[↗\]](https://pypi.org/project/llama-index-readers-docling/)<!--{target="_blank"}-->
-- 🦙 [LlamaHub \[↗\]](https://llamahub.ai/l/readers/llama-index-readers-docling)<!--{target="_blank"}-->
+- 💻 [Docling Reader GitHub](https://github.com/run-llama/llama_index/tree/main/llama-index-integrations/readers/llama-index-readers-docling)
+- 📖 [Docling Reader Docs](https://docs.llamaindex.ai/en/stable/api_reference/readers/docling/)
+- 📦 [Docling Reader PyPI](https://pypi.org/project/llama-index-readers-docling/)
 
 ### Docling Node Parser
 
 Reads LlamaIndex `Document` objects populated in Docling's format by Docling Reader and, using its knowledge of the Docling format, parses them to LlamaIndex `Node` objects for downstream usage in LlamaIndex applications, e.g. as chunks for embedding.
 
-- 💻 [GitHub \[↗\]](https://github.com/run-llama/llama_index/tree/main/llama-index-integrations/node_parser/llama-index-node-parser-docling)<!--{target="_blank"}-->
-- 📖 [API docs \[↗\]](https://docs.llamaindex.ai/en/stable/api_reference/node_parser/docling/)<!--{target="_blank"} -->
-- 📦 [PyPI \[↗\]](https://pypi.org/project/llama-index-node-parser-docling/)<!--{target="_blank"}-->
-- 🦙 [LlamaHub \[↗\]](https://llamahub.ai/l/node_parser/llama-index-node-parser-docling)<!--{target="_blank"}-->
+- 💻 [Docling Node Parser GitHub](https://github.com/run-llama/llama_index/tree/main/llama-index-integrations/node_parser/llama-index-node-parser-docling)
+- 📖 [Docling Node Parser Docs](https://docs.llamaindex.ai/en/stable/api_reference/node_parser/docling/)
+- 📦 [Docling Node Parser PyPI](https://pypi.org/project/llama-index-node-parser-docling/)
```
```diff
@@ -0,0 +1,9 @@
+Docling is available in [spaCy](https://spacy.io/) as the "SpaCy Layout" plugin:
+
+- 💻 [SpacyLayout GitHub][github]
+- 📖 [SpacyLayout Docs][docs]
+- 📦 [SpacyLayout PyPI][pypi]
+
+[github]: https://github.com/explosion/spacy-layout
+[docs]: https://github.com/explosion/spacy-layout?tab=readme-ov-file#readme
+[pypi]: https://pypi.org/project/spacy-layout/
```