PLAYA 0.2.8: 2025-01-22

PLAYA 0.2.7: 2025-01-07

Remove excessive debug logging
Add rendering matrix to GlyphObject
Fix ToUnicode CMaps for CID fonts
Optimize text extraction
Support slices and lists in PageList.__getitem__
Remove remaining dangerous cast usage
Make text extraction less lazy so that the graphics state is correct (slightly breaking change)
Correct the handling of marked content sections
Be robust to junk before the header
Deliberately break the CLI (ZeroVer FTW YOLO ROTFL)
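
To illustrate the slice and list support in PageList.__getitem__ mentioned above, here is a minimal sketch using a stand-in class. `ToyPageList` is hypothetical and is not PLAYA's implementation. It only shows the kind of indexing behaviour the entry describes, under the assumption that slices and lists of indices both return a new list-like object.

```python
from typing import Iterable, List, Union


class ToyPageList:
    """Hypothetical stand-in for a page list; not PLAYA's actual class."""

    def __init__(self, pages: Iterable[str]) -> None:
        self._pages: List[str] = list(pages)

    def __len__(self) -> int:
        return len(self._pages)

    def __getitem__(self, key: Union[int, slice, Iterable[int]]):
        if isinstance(key, int):
            # A single integer index returns a single page.
            return self._pages[key]
        if isinstance(key, slice):
            # A slice returns a new list-like object over the selected pages.
            return ToyPageList(self._pages[key])
        # Any other iterable is treated as a collection of integer indices.
        return ToyPageList(self._pages[i] for i in key)


pages = ToyPageList(["p1", "p2", "p3", "p4"])
first = pages[0]        # one page
middle = pages[1:3]     # pages at indices 1 and 2
picked = pages[[0, 3]]  # pages at indices 0 and 3
```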

Fix various bugs in the lazy API
- Add specialized __len__ methods to ContentObject classes
- Clarify iteration over ContentObject
Fix installation of playa-pdf[crypto]
Fix attribute classes in structure tree elements
Deprecate "user" device space to avoid confusion with user space
Parse arbitrary Encoding CMaps
Update pdfplumber support
Add parser for object streams and iterator over all indirect objects in a document

fix more embarrassing bugs, largely regarding the creation of empty ContentObjects
these are not actually all fixed because (surprise!) sometimes we neglect to map the characters in fonts correctly
oh, and also lots and lots of robustness fixes thanks to the pdf.js test suite of PDF horrors

release early and often
fix some embarrassing bugs, again:
- CMap parser did not recognize bfrange correctly (regression)
- corner cases of inline images caused endless woe
- documentation said document.structtree exists but nope it didn't

fix serious bug on malformed stream_length
report actual bounding box for rotated glyphs
- eager API is no longer faster than pdfminer :( but it is more correct
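
For context on why a rotated glyph needs its "actual" bounding box computed, the sketch below shows the general technique: transform all four corners of the box by a PDF-style matrix `[a b c d e f]`, then take the extremes. The function name `transform_bbox` and the type aliases are assumptions for illustration, not necessarily PLAYA's API or the code it uses.

```python
from typing import Tuple

Matrix = Tuple[float, float, float, float, float, float]
Rect = Tuple[float, float, float, float]


def transform_bbox(matrix: Matrix, bbox: Rect) -> Rect:
    """Axis-aligned bounding box of a rectangle after an affine transform.

    Hypothetical helper; uses the PDF convention
    x' = a*x + c*y + e, y' = b*x + d*y + f.
    """
    a, b, c, d, e, f = matrix
    x0, y0, x1, y1 = bbox
    # Transforming only two opposite corners is wrong under rotation,
    # so all four corners are mapped and the extremes are taken.
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    xs = [a * x + c * y + e for x, y in corners]
    ys = [b * x + d * y + f for x, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))


# 90 degree counter-clockwise rotation maps (x, y) to (-y, x).
rotated = transform_bbox((0, 1, -1, 0, 0, 0), (0, 0, 2, 1))
```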

expose form XObjects on Page to allow getting only their contents
expose form XObject IDs in LayoutDict
make TextState conform to PDF spec (leading and line matrix) and document it
expose more of TextState in LayoutDict (render mode in particular - OCRmyPDF)
do not try to map characters with no ToUnicode and no Encoding (OCRmyPDF)
properly support Pattern color space (uncolored tiling patterns) the way pdfplumber expects it to work
support marked content points as ContentObjects
document ContentObjects
make a proper schema for LayoutDict, document it, and communicate it to Polars
separate color values and patterns in LayoutDict