PLAYA 0.1.2: Initial release
Here's a first release, in case you want to use this. Reasons you might do so include:
- Faster than `pdfminer.six` (about 20% or so)
- Much friendlier APIs than `PDFPageAggregator`, `PDFResourceManager`, `PDFPage`, etc, etc.
- Many outstanding `pdfminer.six` bugs have been fixed
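
If you'd rather check the speed figure on your own files than take it on faith, a rough timing harness along these lines should do. This is only a sketch: `best_time` and `percent_faster` are hypothetical helpers, not part of PLAYA or `pdfminer.six`, and reading "20% faster" as "takes 20% less wall-clock time" is an assumption. You would pass in your own zero-argument callables that parse the same PDF with each library.

```python
import time
from typing import Callable


def best_time(func: Callable[[], object], repeats: int = 5) -> float:
    """Return the fastest wall-clock time (seconds) over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    # The minimum is less sensitive to background noise than the mean.
    return min(timings)


def percent_faster(baseline: float, candidate: float) -> float:
    """Percentage of time saved by `candidate` relative to `baseline`.

    Assumption: "faster" means less elapsed time, not higher throughput.
    """
    return 100.0 * (baseline - candidate) / baseline
```

Results will vary a lot with the kind of PDF you feed it, so try a few representative documents before drawing conclusions.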
Why would you not want to use this?
- PyPI package name is not actually `playa` because somebody else took that name 13 years ago.
- May be more or less tolerant of broken PDFs than `pdfminer.six`, and has no "strict mode" to be absolutely intolerant.
- Doesn't let you extract image data (this is not always useful since PDFs tend to use compositing and thus you should use a real PDF renderer like pypdfium2 if you want to reliably extract images)
- Is not (or ain't) a layout analyzer, so no `LAParams`, `TextBox`, and so on.
- API subject to change and refinement.
- Does not have abstractions. You do not have the flexibility to subclass everything and build a PDF renderer on top of PLAYA.
- Probably contains bugs.
- Definitely lacks documentation.