Initial Docling support #413

Merged
ppinchuk merged 101 commits into main from pp/docling
May 6, 2026

Collaborator

@ppinchuk ppinchuk commented May 3, 2026

Initial Docling support. More features around this will be added in future PRs.

@ppinchuk ppinchuk self-assigned this May 3, 2026
@ppinchuk ppinchuk added the enhancement (Update to logic or general code improvements) label May 3, 2026
Copilot AI review requested due to automatic review settings May 3, 2026 21:08
@ppinchuk ppinchuk added the dependencies (Issues/pull requests related to a dependency) and p-high (Priority: high) labels May 3, 2026
@ppinchuk ppinchuk requested a review from castelao as a code owner May 3, 2026 21:08

codecov-commenter commented May 3, 2026

Codecov Report

❌ Patch coverage is 39.68872% with 155 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.20%. Comparing base (6ef5ad4) to head (3691459).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| compass/services/cpu.py | 31.53% | 89 Missing ⚠️ |
| compass/web/file_loader.py | 35.41% | 30 Missing and 1 partial ⚠️ |
| compass/scripts/download.py | 14.70% | 29 Missing ⚠️ |
| compass/utilities/logs.py | 90.00% | 4 Missing ⚠️ |
| compass/_cli/process.py | 0.00% | 1 Missing ⚠️ |
| compass/validation/location.py | 50.00% | 1 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (39.68%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #413      +/-   ##
==========================================
- Coverage   55.57%   55.20%   -0.37%     
==========================================
  Files          61       62       +1     
  Lines        5681     5847     +166     
  Branches      531      543      +12     
==========================================
+ Hits         3157     3228      +71     
- Misses       2476     2570      +94     
- Partials       48       49       +1     

| Flag | Coverage Δ |
|---|---|
| unittests | 55.20% <39.68%> (-0.37%) ⬇️ |
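
The headline numbers in this report can be cross-checked against the per-file table and the coverage diff. The sketch below does that arithmetic in Python. One value is an assumption rather than something the report states: the patch size of 257 lines is inferred as the only line count for which 155 uncovered lines yields 39.68872%. The two-decimal figures also appear to be truncated rather than rounded, which the helper below assumes.

```python
import math

# Figures copied from the Codecov coverage diff.
BASE = {"hits": 3157, "misses": 2476, "partials": 48, "lines": 5681}
HEAD = {"hits": 3228, "misses": 2570, "partials": 49, "lines": 5847}

# Per-file uncovered counts from the "Files with missing lines" table.
# The single partial line in file_loader.py is counted as uncovered.
UNCOVERED = {
    "compass/services/cpu.py": 89,
    "compass/web/file_loader.py": 30 + 1,
    "compass/scripts/download.py": 29,
    "compass/utilities/logs.py": 4,
    "compass/_cli/process.py": 1,
    "compass/validation/location.py": 1,
}

# ASSUMPTION: the report never states the patch size. 257 is inferred as the
# only line count for which 155 uncovered lines gives 39.68872%.
PATCH_LINES = 257


def coverage_pct(hits, lines):
    """Coverage as a percentage of covered lines."""
    return 100.0 * hits / lines


def truncate(value, places=2):
    """Drop digits beyond ``places`` decimals instead of rounding."""
    factor = 10 ** places
    return math.floor(value * factor) / factor


patch_uncovered = sum(UNCOVERED.values())
patch_pct = coverage_pct(PATCH_LINES - patch_uncovered, PATCH_LINES)
base_pct = truncate(coverage_pct(BASE["hits"], BASE["lines"]))
head_pct = truncate(coverage_pct(HEAD["hits"], HEAD["lines"]))

print(patch_uncovered, round(patch_pct, 5), base_pct, head_pct)
```

Under these assumptions the per-file counts sum to the reported 155 lines, and the computed percentages agree with the reported 39.68872% patch coverage and the 55.57% and 55.20% project figures.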

Contributor

Copilot AI left a comment

Pull request overview

This PR introduces an initial Docling-backed web file loading path for COMPASS, wiring it into web crawling, jurisdiction-site validation, and process-pool logging so remote documents can be parsed through either the existing ELM loader or a new Docling backend.

Changes:

  • Added a new compass.web.file_loader module with a Docling-based web loader and switched web crawl / validation code to use a COMPASSWebFileLoader alias.
  • Refactored CPU/process-pool services and logging so worker output and exceptions can be forwarded through the centralized log listener.
  • Updated dependency/configuration files and adjusted tests to match renamed services and new loader entry points.
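
The second change above, forwarding worker output through a centralized log listener, follows a pattern that the Python standard library supports directly. The sketch below is not the code from this PR; it is a minimal illustration using `logging.handlers.QueueHandler` and `QueueListener`. The function names `init_worker_logging`, `start_listener`, `parse_document`, and `demo` are all hypothetical.

```python
import logging
import logging.handlers
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def init_worker_logging(log_queue, level=logging.INFO):
    """Route every record emitted in this process onto ``log_queue``.

    QueueHandler merges exception text into the message before enqueueing,
    so records stay picklable when they cross the process boundary.
    """
    root = logging.getLogger()
    root.handlers.clear()
    root.addHandler(logging.handlers.QueueHandler(log_queue))
    root.setLevel(level)


def start_listener(log_queue, *handlers):
    """Start a background thread that drains ``log_queue`` into ``handlers``."""
    listener = logging.handlers.QueueListener(
        log_queue, *handlers, respect_handler_level=True
    )
    listener.start()
    return listener


def parse_document(name):
    """Hypothetical stand-in for a CPU-bound parsing task."""
    logging.getLogger("worker").info("parsing %s", name)
    return name.upper()


def demo():
    """Run two tasks in a process pool and print their logs in the parent."""
    log_queue = multiprocessing.Queue()
    listener = start_listener(log_queue, logging.StreamHandler())
    try:
        with ProcessPoolExecutor(
            max_workers=2,
            initializer=init_worker_logging,
            initargs=(log_queue,),
        ) as pool:
            return list(pool.map(parse_document, ["a.pdf", "b.pdf"]))
    finally:
        listener.stop()
```

To try it, call `demo()` from a script under an `if __name__ == "__main__":` guard, which the spawn-based start methods require. The pool initializer runs once per worker, so each subprocess gets its queue handler before any task logs.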

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| tests/python/unit/web/test_web_crawl.py | Updates crawler unit test to patch the new loader alias. |
| tests/python/unit/utilities/test_utilities_logs.py | Adjusts logging queue tests for the new lazy multiprocessing queue and exception-safe records. |
| tests/python/unit/utilities/test_utilities_io.py | Reworks IO tests to use elm.web.search.run.load_docs and renamed CPU service classes. |
| tests/python/unit/services/test_services_cpu.py | Adds tests for subprocess log forwarding and redirected stdout/stderr. |
| tests/python/unit/scripts/test_process.py | Minor log-reading cleanup in process script tests. |
| tests/python/integration/test_integrated.py | Small integration-test fixture updates for response headers and lint cleanup. |
| pyproject.toml | Adds Docling-related deps and narrows Python/Pixi dependency constraints. |
| docs/source/conf.py | Adds intersphinx mappings and formatting cleanup for docs config. |
| compass/web/website_crawl.py | Switches crawler internals to the new COMPASSWebFileLoader. |
| compass/web/file_loader.py | Introduces the new Docling-capable web loader and backend selection alias. |
| compass/validation/location.py | Uses the new loader alias when validating jurisdiction websites. |
| compass/utilities/logs.py | Replaces the global simple queue with a lazy multiprocessing queue and queue-safe exception formatting. |
| compass/utilities/io.py | Removes the local-doc convenience loader from utilities. |
| compass/utilities/__init__.py | Stops re-exporting load_local_docs. |
| compass/services/cpu.py | Renames the PDF process service, adds Docling parsing helpers, and configures subprocess logging. |
| compass/scripts/process.py | Swaps the base process service from PDFLoader to FileLoader. |
| compass/scripts/download.py | Refactors document download/search flows to use explicit loader instances and new helper structure. |
| compass/_cli/process.py | Includes docling logs at higher CLI verbosity levels. |
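
The summary describes compass/web/file_loader.py as providing a "backend selection alias" but does not show how the backend is chosen. The sketch below is one way such an alias could work, not the PR's implementation. It assumes the choice depends on whether the `docling` package is importable. The names `ElmBackedLoader`, `DoclingBackedLoader`, `select_loader`, and `WebFileLoader` are hypothetical.

```python
import importlib.util


class ElmBackedLoader:
    """Hypothetical stand-in for the existing ELM-based loader."""

    backend = "elm"


class DoclingBackedLoader:
    """Hypothetical stand-in for the new Docling-based loader."""

    backend = "docling"


_BACKENDS = {"elm": ElmBackedLoader, "docling": DoclingBackedLoader}


def docling_available():
    """Return True when a top-level ``docling`` package can be found."""
    return importlib.util.find_spec("docling") is not None


def select_loader(name=None):
    """Return the loader class for ``name``.

    ASSUMPTION: with no explicit name, prefer Docling when it is installed
    and fall back to the ELM-style loader otherwise.
    """
    if name is None:
        name = "docling" if docling_available() else "elm"
    try:
        return _BACKENDS[name]
    except KeyError:
        raise ValueError(f"Unknown loader backend: {name!r}") from None


# Module-level alias: callers import this single name and never need to
# know which backend was selected.
WebFileLoader = select_loader()
```

A single alias keeps call sites such as the crawler and the validation code unchanged when the backend changes, and gives tests one name to patch.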

Comment thread compass/scripts/download.py
Comment thread compass/web/file_loader.py
Comment thread pyproject.toml
Comment thread compass/web/file_loader.py
Comment thread compass/web/file_loader.py
ppinchuk and others added 17 commits May 3, 2026 15:16
Co-authored-by: Copilot <copilot@github.com>
@ppinchuk ppinchuk merged commit c577f03 into main May 6, 2026
30 checks passed
@ppinchuk ppinchuk deleted the pp/docling branch May 6, 2026 16:32

Labels

  • dependencies: Issues/pull requests related to a dependency
  • enhancement: Update to logic or general code improvements
  • p-high: Priority: high

3 participants