Commit

Update docs
Signed-off-by: Christoph Auer <[email protected]>
cau-git committed Sep 19, 2024
1 parent d8163b0 commit 6ea6f29
Showing 2 changed files with 18 additions and 2 deletions.
16 changes: 16 additions & 0 deletions README.md
@@ -110,6 +110,8 @@ This can improve output quality if you find that multiple columns in extracted t


```python
from docling.datamodel.pipeline_options import PipelineOptions

pipeline_options = PipelineOptions(do_table_structure=True)
pipeline_options.table_structure_options.do_cell_matching = False # uses text cells predicted from table structure model

@@ -119,6 +121,20 @@ doc_converter = DocumentConverter(
)
```

Since docling 1.14.0, you can choose which TableFormer mode to use: `TableFormerMode.FAST` (default) or `TableFormerMode.ACCURATE` (slower, but gives better results on difficult table structures).

```python
from docling.datamodel.pipeline_options import PipelineOptions, TableFormerMode

pipeline_options = PipelineOptions(do_table_structure=True)
pipeline_options.table_structure_options.mode = TableFormerMode.ACCURATE # use more accurate TableFormer model

doc_converter = DocumentConverter(
    artifacts_path=artifacts_path,
    pipeline_options=pipeline_options,
)
```
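
To illustrate how the mode option sits on the nested table-structure options, here is a minimal, self-contained sketch. It does not import docling: `TableFormerMode`, `TableStructureOptions`, and `PipelineOptions` below are stand-in classes written for this example, and their string values and defaults (apart from `FAST` being the default mode, as stated above) are assumptions. `options_for` is a hypothetical helper, not part of docling.

```python
from dataclasses import dataclass, field
from enum import Enum


class TableFormerMode(str, Enum):
    """Stand-in enum; the string values are assumptions for this sketch."""
    FAST = "fast"
    ACCURATE = "accurate"


@dataclass
class TableStructureOptions:
    """Stand-in holding only the two fields used in the snippets above."""
    do_cell_matching: bool = True  # assumed default
    mode: TableFormerMode = TableFormerMode.FAST  # FAST is the documented default


@dataclass
class PipelineOptions:
    """Stand-in for the real options class (not docling's implementation)."""
    do_table_structure: bool = True  # assumed default
    table_structure_options: TableStructureOptions = field(
        default_factory=TableStructureOptions
    )


def options_for(mode_name: str) -> PipelineOptions:
    """Hypothetical helper: build options from a user-supplied mode name."""
    try:
        mode = TableFormerMode(mode_name.strip().lower())
    except ValueError:
        valid = ", ".join(m.value for m in TableFormerMode)
        raise ValueError(
            f"unknown mode {mode_name!r}; expected one of: {valid}"
        ) from None
    opts = PipelineOptions(do_table_structure=True)
    opts.table_structure_options.mode = mode
    return opts
```

With the real library, the same assignment (`pipeline_options.table_structure_options.mode = ...`) would be applied to docling's own `PipelineOptions`, as in the README snippet above.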

### Impose limits on the document size

You can limit the file size and the number of pages allowed per document:
4 changes: 2 additions & 2 deletions examples/custom_convert.py
@@ -83,9 +83,9 @@ def main():
     # PyPdfium with OCR
     # -----------------
     # pipeline_options = PipelineOptions()
-    # pipeline_options.do_ocr=False
+    # pipeline_options.do_ocr=True
     # pipeline_options.do_table_structure=True
-    # pipeline_options.table_structure_options.do_cell_matching = True
+    # pipeline_options.table_structure_options.do_cell_matching = False

     # doc_converter = DocumentConverter(
     #     pipeline_options=pipeline_options,
