Releases: alephdata/ingest-file
3.19.3-rc1
What's Changed
- Bump pantomime from 0.6.0 to 0.6.1 by @dependabot in #501
- Bump ruff from 0.0.269 to 0.0.282 by @dependabot in #500
- Bump cryptography from 39.0.1 to 41.0.3 by @dependabot in #502
- Bump sentry-sdk from 1.26.0 to 1.29.2 by @dependabot in #499
- Bump black from 23.3.0 to 23.7.0 by @dependabot in #496
- Bump pytest from 7.2.2 to 7.4.0 by @dependabot in #485
- Bump lxml from 4.9.2 to 4.9.3 by @dependabot in #495
- Bump google-cloud-vision from 3.4.1 to 3.4.4 by @dependabot in #497
- Bump tesserocr from 2.6.0 to 2.6.1 by @dependabot in #493
- Bump spacy from 3.5.1 to 3.6.1 by @dependabot in #509
- Bump pillow from 9.5.0 to 10.0.0 by @dependabot in #483
- Bump pytest-cov from 4.0.0 to 4.1.0 by @dependabot in #476
- Bump requests[security] from 2.28.2 to 2.31.0 by @dependabot in #477
- Bump followthemoney-store[postgresql] from 3.0.5 to 3.0.6 by @dependabot in #513
- Bump followthemoney from 3.4.4 to 3.5.2 by @dependabot in #507
- Bump icalendar from 5.0.4 to 5.0.7 by @dependabot in #469
- Bump pyicu from 2.10.2 to 2.11 by @dependabot in #459
- Bump click from 8.1.3 to 8.1.7 by @dependabot in #508
- Lower click version to avoid mismatch by @stchris in #514
- Bump ruff from 0.0.282 to 0.0.286 by @dependabot in #516
- Add merge_group trigger by @stchris in #521
- GitHub Actions: Update checkout action to v3 by @stchris in #522
- Bump sentry-sdk from 1.29.2 to 1.30.0 by @dependabot in #519
- Bump fingerprints from 1.1.0 to 1.1.1 by @dependabot in #518
- Bump servicelayer[amazon,google] from 1.21.0 to 1.21.2 by @dependabot in #520
Full Changelog: 3.19.2...3.19.3-rc1
3.19.2
What's Changed
- Fix handling of multipart emails by @tillprochaska in #488
- Send ProcessingExceptions to Sentry by @stchris in #487
New Contributors
- @tillprochaska made their first contribution in #488
Full Changelog: 3.18.4...3.19.2
3.19.2-rc1
What's Changed
- Fix handling of multipart emails by @tillprochaska in #488
- Send ProcessingExceptions to Sentry by @stchris in #487
Full Changelog: 3.18.4...3.19.2-rc1
3.19.1
3.19.0
What's Changed
- Add support for linting with ruff by @stchris in #468
- Bump versions of FTM and servicelayer by @catileptic
- Add ingest-file version to Document by @catileptic
- Lint with black by @stchris
- Bump versions: followthemoney==3.4.3, followthemoney-store[postgresql]==3.0.5, servicelayer[google,amazon]==1.21.0 by @stchris and @catileptic
Full Changelog: 3.18.4...3.19.0
3.18.4
What's Changed
Major PDF library change
We are hereby deprecating pdflib and replacing it with a well-maintained, performant library: pymupdf. This enables local development on hardware with Apple Silicon CPUs and adds support for JBIG2 images in PDF files.
License change
Because of the above dependency, as of this release ingest-file is licensed under the terms of the AGPLv3+ license.
Integrating convert-document into ingest-file
- Merge convert-document into ingest-file by @stchris in #395
- Better logging when converting documents to pdf by @Rosencrantz in #376
Smaller changes
- Fix PDF ingest bug by @catileptic in #430
- Do full page OCR for PDF pages with Type3 fonts by @stchris in #449
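
The Type3 change above can be pictured with a small sketch. Type3 fonts describe glyphs as drawing procedures and often lack a usable mapping to Unicode, so extracted text may be unreliable and OCR of the rendered page is a reasonable fallback. The helper names below (`needs_full_page_ocr`, `plan_pages`) are hypothetical and not taken from ingest-file, and the tuple layout is an assumption modelled on what PyMuPDF's `Page.get_fonts()` returns.

```python
from typing import Iterable, List, Sequence


def needs_full_page_ocr(fonts: Iterable[Sequence]) -> bool:
    """Return True if any font entry on the page is a Type3 font.

    Assumption: each entry looks like a PyMuPDF font tuple,
    (xref, ext, type, basefont, name, encoding), so the font type
    sits at index 2. This is an illustration, not ingest-file's code.
    """
    return any(len(font) > 2 and font[2] == "Type3" for font in fonts)


def plan_pages(pages_fonts: Iterable[Iterable[Sequence]]) -> List[str]:
    """Pick a strategy per page: 'ocr' for Type3 pages, else 'text'."""
    return [
        "ocr" if needs_full_page_ocr(fonts) else "text"
        for fonts in pages_fonts
    ]


if __name__ == "__main__":
    # Hand-written sample data standing in for two pages of a PDF.
    sample = [
        [(5, "n/a", "Type3", "F1", "F1", "")],
        [(7, "ttf", "TrueType", "Arial", "F2", "WinAnsiEncoding")],
    ]
    print(plan_pages(sample))
```

Keeping the decision in a pure function makes it easy to test without a PDF at hand; the actual rendering and OCR steps are left out of this sketch.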
Dependency upgrades
- Bump pikepdf from 6.2.8.post1 to 7.1.1 by @dependabot in #434
- Bump google-cloud-vision from 3.3.0 to 3.4.0 by @dependabot in #439
- Bump pantomime from 0.5.3 to 0.6.0 by @dependabot in #436
- Bump cryptography from 38.0.4 to 39.0.1 by @dependabot in #431
- Bump pytest from 7.2.0 to 7.2.1 by @dependabot in #424
- Bump openpyxl from 3.0.10 to 3.1.1 by @dependabot in #435
- Bump spacy from 3.4.4 to 3.5.1 by @dependabot in #440
- Bump fingerprints from 1.0.3 to 1.1.0 by @dependabot in #438
- Bump pillow from 9.4.0 to 9.5.0 by @dependabot in #448
- Bump google-cloud-vision from 3.4.0 to 3.4.1 by @dependabot in #447
- Bump openpyxl from 3.1.1 to 3.1.2 by @dependabot in #446
- Bump pytest from 7.2.1 to 7.2.2 by @dependabot in #444
- Bump tesserocr from 2.5.2 to 2.6.0 by @dependabot in #445
Full Changelog: 3.18.2...3.18.4
3.18.4-rc4
- Hotfix for the path that full page images are extracted to (when ingesting PDFs with Type3 fonts)
Full Changelog: 3.18.4-rc3...3.18.4-rc4
3.18.4-rc3
What's Changed
Dependency upgrades
- Bump pillow from 9.4.0 to 9.5.0 by @dependabot in #448
- Bump google-cloud-vision from 3.4.0 to 3.4.1 by @dependabot in #447
- Bump openpyxl from 3.1.1 to 3.1.2 by @dependabot in #446
- Bump pytest from 7.2.1 to 7.2.2 by @dependabot in #444
- Bump tesserocr from 2.5.2 to 2.6.0 by @dependabot in #445
Full Changelog: 3.18.4-rc1...3.18.4-rc3
3.18.4-rc1
What's Changed
- Use PyMuPDF instead of pikepdf + pdfminer.six for PDF ingestion (text and image extraction). #441
Dependency upgrades
- Bump google-cloud-vision from 3.3.0 to 3.4.0 by @dependabot in #439
- Bump pantomime from 0.5.3 to 0.6.0 by @dependabot in #436
- Bump cryptography from 38.0.4 to 39.0.1 by @dependabot in #431
- Bump pytest from 7.2.0 to 7.2.1 by @dependabot in #424
- Bump openpyxl from 3.0.10 to 3.1.1 by @dependabot in #435
- Bump spacy from 3.4.4 to 3.5.1 by @dependabot in #440
- Bump fingerprints from 1.0.3 to 1.1.0 by @dependabot in #438
Full Changelog: 3.18.2...3.18.4-rc1
3.18.3-rc2
What's Changed
- Fix PDF ingest bug by @catileptic in #430
Dependency upgrades
- Bump pikepdf from 6.2.8.post1 to 7.1.1 by @dependabot in #434
- Bump google-cloud-vision from 3.3.0 to 3.4.0 by @dependabot in #439
- Bump pantomime from 0.5.3 to 0.6.0 by @dependabot in #436
- Bump cryptography from 38.0.4 to 39.0.1 by @dependabot in #431
- Bump pytest from 7.2.0 to 7.2.1 by @dependabot in #424
- Bump openpyxl from 3.0.10 to 3.1.1 by @dependabot in #435
- Bump spacy from 3.4.4 to 3.5.1 by @dependabot in #440
- Bump fingerprints from 1.0.3 to 1.1.0 by @dependabot in #438
Full Changelog: 3.18.2...3.18.3-rc2