docs: extend integration docs & README (#456)
Signed-off-by: Panos Vagenas <[email protected]>
vagenas authored Nov 28, 2024
1 parent 211f4f7 commit 84c46fd
Showing 11 changed files with 71 additions and 10 deletions.
24 changes: 20 additions & 4 deletions README.md
@@ -4,7 +4,7 @@
</a>
</p>

# Docling
# 🦆 Docling

<p align="center">
<a href="https://trendshift.io/repositories/12132" target="_blank"><img src="https://trendshift.io/api/badge/repositories/12132" alt="DS4SD%2Fdocling | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
@@ -29,7 +29,7 @@ Docling parses documents and exports them to the desired format with ease and sp
* 🗂️ Reads popular document formats (PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown) and exports to Markdown and JSON
* 📑 Advanced PDF document understanding including page layout, reading order & table structures
* 🧩 Unified, expressive [DoclingDocument](https://ds4sd.github.io/docling/concepts/docling_document/) representation format
* 🤖 Easy integration with LlamaIndex 🦙 & LangChain 🦜🔗 for powerful RAG / QA applications
* 🤖 Easy integration with 🦙 LlamaIndex & 🦜🔗 LangChain for powerful RAG / QA applications
* 🔍 OCR support for scanned PDFs
* 💻 Simple and convenient CLI

@@ -65,8 +65,24 @@ result = converter.convert(source)
print(result.document.export_to_markdown()) # output: "## Docling Technical Report[...]"
```

Check out [Getting started](https://ds4sd.github.io/docling/).
You will find lots of tuning options to leverage all the advanced capabilities.
More [advanced usage options](https://ds4sd.github.io/docling/usage/) are available in
the docs.

## Documentation

Check out Docling's [documentation](https://ds4sd.github.io/docling/), for details on
installation, usage, concepts, recipes, extensions, and more.

## Examples

Go hands-on with our [examples](https://ds4sd.github.io/docling/examples/),
demonstrating how to address different application use cases with Docling.

## Integrations

To further accelerate your AI application development, check out Docling's native
[integrations](https://ds4sd.github.io/docling/integrations/) with popular frameworks
and tools.

## Get help and support

Binary file added docs/assets/docling_ecosystem.png
Binary file added docs/assets/docling_ecosystem.pptx
Binary file not shown.
4 changes: 1 addition & 3 deletions docs/index.md
@@ -1,5 +1,3 @@
# Docling

<p align="center">
<img loading="lazy" alt="Docling" src="assets/docling_processing.png" width="100%" />
<a href="https://trendshift.io/repositories/12132" target="_blank"><img src="https://trendshift.io/api/badge/repositories/12132" alt="DS4SD%2Fdocling | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
@@ -23,7 +21,7 @@ Docling parses documents and exports them to the desired format with ease and sp
* 🗂️ Reads popular document formats (PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown) and exports to Markdown and JSON
* 📑 Advanced PDF document understanding incl. page layout, reading order & table structures
* 🧩 Unified, expressive [DoclingDocument](./concepts/docling_document.md) representation format
* 🤖 Easy integration with LlamaIndex 🦙 & LangChain 🦜🔗 for powerful RAG / QA applications
* 🤖 Easy integration with 🦙 LlamaIndex & 🦜🔗 LangChain for powerful RAG / QA applications
* 🔍 OCR support for scanned PDFs
* 💻 Simple and convenient CLI

9 changes: 9 additions & 0 deletions docs/integrations/bee.md
@@ -0,0 +1,9 @@
Docling is available as an extraction backend in the [Bee][github] framework.

- 💻 [Bee GitHub][github]
- 📖 [Bee Docs][docs]
- 📦 [Bee NPM][package]

[github]: https://github.com/i-am-bee
[docs]: https://i-am-bee.github.io/bee-agent-framework/
[package]: https://www.npmjs.com/package/bee-agent-framework
5 changes: 5 additions & 0 deletions docs/integrations/index.md
@@ -1 +1,6 @@
Use the navigation on the left to browse through Docling integrations with popular frameworks and tools.


<p align="center">
<img loading="lazy" alt="Docling" src="../assets/docling_ecosystem.png" width="100%" />
</p>
17 changes: 17 additions & 0 deletions docs/integrations/instructlab.md
@@ -0,0 +1,17 @@
Docling is powering document processing in [InstructLab](https://instructlab.ai/),
enabling users to unlock the knowledge hidden in documents and present it to
InstructLab's fine-tuning for aligning AI models to the user's specific data.

More details can be found in this [blog post][blog].

- 🏠 [InstructLab Home][home]
- 💻 [InstructLab GitHub][github]
- 🧑🏻‍💻 [InstructLab UI][ui]
- 📖 [InstructLab Docs][docs]
<!-- - 📝 [Blog post]() -->

[home]: https://instructlab.ai
[github]: https://github.com/instructlab
[ui]: https://ui.instructlab.ai/
[docs]: https://docs.instructlab.ai/
[blog]: https://www.redhat.com/en/blog/docling-missing-document-processing-companion-generative-ai
9 changes: 9 additions & 0 deletions docs/integrations/prodigy.md
@@ -0,0 +1,9 @@
Docling is available in [Prodigy][home] as a [Prodigy-PDF plugin][plugin] recipe.

- 🌐 [Prodigy Home][home]
- 🔌 [Prodigy-PDF Plugin][plugin]
- 🧑🏽‍🍳 [pdf-spans.manual Recipe][recipe]

[home]: https://prodi.gy/
[plugin]: https://prodi.gy/docs/plugins#pdf
[recipe]: https://prodi.gy/docs/plugins#pdf-spans.manual
2 changes: 2 additions & 0 deletions docs/integrations/spacy.md
@@ -1,3 +1,5 @@
# spaCy

Docling is available in [spaCy](https://spacy.io/) as the "SpaCy Layout" plugin:

- 💻 [SpacyLayout GitHub][github]
2 changes: 2 additions & 0 deletions docs/overrides/main.html
@@ -1,5 +1,7 @@
{% extends "base.html" %}

{#
{% block announce %}
<p>🎉 Docling has gone v2! <a href="{{ 'v2' | url }}">Check out</a> what's new and how to get started!</p>
{% endblock %}
#}
9 changes: 6 additions & 3 deletions mkdocs.yml
@@ -52,8 +52,8 @@ theme:
- search.suggest
- toc.follow
nav:
- Get started:
- Home: index.md
- Home:
- "🦆 Docling": index.md
- Installation: installation.md
- Usage: usage.md
- CLI: cli.md
@@ -85,10 +85,13 @@ nav:
# - CLI: examples/cli.md
- Integrations:
- Integrations: integrations/index.md
- "🐝 Bee": integrations/bee.md
- "Data Prep Kit": integrations/data_prep_kit.md
- "DocETL": integrations/docetl.md
- "🐢 InstructLab": integrations/instructlab.md
- "Kotaemon": integrations/kotaemon.md
- "LlamaIndex 🦙": integrations/llamaindex.md
- "🦙 LlamaIndex": integrations/llamaindex.md
- "Prodigy": integrations/prodigy.md
- "spaCy": integrations/spacy.md
# - "LangChain 🦜🔗": integrations/langchain.md
# - API reference: