Codecov Report
❌ Your patch status has failed because the patch coverage (39.68%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## main #413 +/- ##
==========================================
- Coverage 55.57% 55.20% -0.37%
==========================================
Files 61 62 +1
Lines 5681 5847 +166
Branches 531 543 +12
==========================================
+ Hits 3157 3228 +71
- Misses 2476 2570 +94
- Partials 48 49 +1
Pull request overview
This PR introduces an initial Docling-backed web file loading path for COMPASS, wiring it into web crawling, jurisdiction-site validation, and process-pool logging so remote documents can be parsed through either the existing ELM loader or a new Docling backend.
Changes:
- Added a new `compass.web.file_loader` module with a Docling-based web loader and switched web crawl / validation code to use a `COMPASSWebFileLoader` alias.
- Refactored CPU/process-pool services and logging so worker output and exceptions can be forwarded through the centralized log listener.
- Updated dependency/configuration files and adjusted tests to match renamed services and new loader entry points.
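For readers unfamiliar with the log-forwarding pattern mentioned in the second bullet, here is a minimal standard-library sketch. It is not the code from this PR: `get_log_queue`, `configure_worker_logging`, and `start_listener` are hypothetical names chosen for illustration, and the sketch assumes the usual `QueueHandler` / `QueueListener` pairing from `logging.handlers`.

```python
import logging
import logging.handlers
import multiprocessing

_LOG_QUEUE = None  # hypothetical module-level slot, filled on first use


def get_log_queue():
    """Create the shared queue lazily instead of at import time."""
    global _LOG_QUEUE
    if _LOG_QUEUE is None:
        _LOG_QUEUE = multiprocessing.Queue()
    return _LOG_QUEUE


def configure_worker_logging(queue, logger_name=None, level=logging.INFO):
    """Send every record from the named logger (root by default) to ``queue``."""
    logger = logging.getLogger(logger_name)
    logger.handlers.clear()
    logger.setLevel(level)
    # In current CPython versions, QueueHandler.prepare() formats the record
    # (traceback included) and clears exc_info before enqueueing, so only
    # picklable text crosses the process boundary.
    logger.addHandler(logging.handlers.QueueHandler(queue))
    return logger


def start_listener(queue, *handlers):
    """Start a background thread that feeds queued records to ``handlers``."""
    listener = logging.handlers.QueueListener(queue, *handlers)
    listener.start()
    return listener
```

In a process pool, a function like `configure_worker_logging` could be passed as the `initializer` of `concurrent.futures.ProcessPoolExecutor`, with the queue in `initargs`, so each worker forwards its records to the listener in the parent process.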
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `tests/python/unit/web/test_web_crawl.py` | Updates crawler unit test to patch the new loader alias. |
| `tests/python/unit/utilities/test_utilities_logs.py` | Adjusts logging queue tests for the new lazy multiprocessing queue and exception-safe records. |
| `tests/python/unit/utilities/test_utilities_io.py` | Reworks IO tests to use `elm.web.search.run.load_docs` and renamed CPU service classes. |
| `tests/python/unit/services/test_services_cpu.py` | Adds tests for subprocess log forwarding and redirected stdout/stderr. |
| `tests/python/unit/scripts/test_process.py` | Minor log-reading cleanup in process script tests. |
| `tests/python/integration/test_integrated.py` | Small integration-test fixture updates for response headers and lint cleanup. |
| `pyproject.toml` | Adds Docling-related deps and narrows Python/Pixi dependency constraints. |
| `docs/source/conf.py` | Adds intersphinx mappings and formatting cleanup for docs config. |
| `compass/web/website_crawl.py` | Switches crawler internals to the new `COMPASSWebFileLoader`. |
| `compass/web/file_loader.py` | Introduces the new Docling-capable web loader and backend selection alias. |
| `compass/validation/location.py` | Uses the new loader alias when validating jurisdiction websites. |
| `compass/utilities/logs.py` | Replaces the global simple queue with a lazy multiprocessing queue and queue-safe exception formatting. |
| `compass/utilities/io.py` | Removes the local-doc convenience loader from utilities. |
| `compass/utilities/__init__.py` | Stops re-exporting `load_local_docs`. |
| `compass/services/cpu.py` | Renames the PDF process service, adds Docling parsing helpers, and configures subprocess logging. |
| `compass/scripts/process.py` | Swaps the base process service from `PDFLoader` to `FileLoader`. |
| `compass/scripts/download.py` | Refactors document download/search flows to use explicit loader instances and new helper structure. |
| `compass/_cli/process.py` | Includes `docling` logs at higher CLI verbosity levels. |
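The summary mentions redirected stdout/stderr in worker processes. As a rough illustration of how such a redirect can work with only the standard library, the sketch below routes printed output into a logger. `LoggerWriter` and `capture_output` are hypothetical names, and this is an assumption about the general technique, not the implementation in `compass/services/cpu.py`.

```python
import contextlib
import logging


class LoggerWriter:
    """Minimal file-like object that forwards complete lines to a logger."""

    def __init__(self, logger, level):
        self._logger = logger
        self._level = level
        self._buffer = ""

    def write(self, text):
        # Buffer partial writes and emit one log record per finished line.
        self._buffer += text
        *lines, self._buffer = self._buffer.split("\n")
        for line in lines:
            if line.strip():
                self._logger.log(self._level, line.rstrip())
        return len(text)

    def flush(self):
        # Emit whatever is left when no trailing newline was written.
        if self._buffer.strip():
            self._logger.log(self._level, self._buffer.rstrip())
        self._buffer = ""


@contextlib.contextmanager
def capture_output(logger):
    """Route stdout to INFO and stderr to WARNING for the enclosed block."""
    out = LoggerWriter(logger, logging.INFO)
    err = LoggerWriter(logger, logging.WARNING)
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        try:
            yield
        finally:
            out.flush()
            err.flush()
```

Note that `contextlib.redirect_stdout` only replaces the Python-level `sys.stdout`; output written directly to the file descriptor by native code would need a lower-level redirect.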
Co-authored-by: Copilot <copilot@github.com>
Initial Docling support. More features around this will be added in future PRs.