tests: look for regressions when converting PDFs #1089

Draft: wants to merge 8 commits into main
Conversation

@almet (Member) commented Mar 5, 2025

I convert all the documents we have in our test suite and store them in a reference folder, and then compare the new outputs bit for bit against these references, using pymupdf pixel buffers.
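For illustration, here is a minimal sketch of that kind of pixel-buffer comparison with pymupdf (the function name and structure are mine, not necessarily what this PR implements):

```
import fitz  # pymupdf

def pdfs_look_identical(reference_path: str, candidate_path: str) -> bool:
    """Render both PDFs page by page and compare the raw pixel buffers."""
    reference = fitz.open(reference_path)
    candidate = fitz.open(candidate_path)
    if reference.page_count != candidate.page_count:
        return False
    for ref_page, new_page in zip(reference, candidate):
        # get_pixmap() rasterizes the page; .samples is the raw pixel buffer.
        if ref_page.get_pixmap().samples != new_page.get_pixmap().samples:
            return False
    return True
```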

We could use another diffing tool and tell it what level of difference is acceptable (if such a tool exists), but I believe that in the end what matters most is the developer experience when the output changes.

Two things come to mind:

  1. Inspecting the differences
  2. Updating the reference version(s)

Inspecting the diff

There are multiple tools that allow doing that, but I found diff-pdf to be good and able to generate an output we can look at without having to run a GUI.

```
diff-pdf /tmp/pytest-of-alexis/pytest-current/sample-docx0.pdf ./tests/test_docs/reference/sample-docx.pdf -m --output-diff=diff.pdf
```

This produces a diff.pdf file showing the changes between the 0.8.1 release and this commit.

Update the reference version

We should have a command to bump all the reference documents (or a specific one).
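As a sketch of what that could look like, a pytest flag added in conftest.py would probably be enough (the flag name below is hypothetical, not necessarily what this PR uses):

```
# conftest.py -- hypothetical flag name
import pytest

def pytest_addoption(parser):
    parser.addoption(
        "--regenerate-reference-pdfs",
        action="store_true",
        default=False,
        help="Overwrite the stored reference PDFs with the current conversion output",
    )

@pytest.fixture
def regenerate_reference_pdfs(request):
    return request.config.getoption("--regenerate-reference-pdfs")
```

A developer could then run something like `pytest --regenerate-reference-pdfs tests/` to bump every reference document, or point pytest at a single test to bump only one.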


Status:

This PR currently only fails tests when there is a change in the output. I plan to do the following:

  • Check that PDF outputs are the same (pixel comparison) in our tests
  • Collect all differences and publish them as an artifact so we can inspect them, probably as part of the CI (see the sketch after this list).
  • Add a tool to update all the reference documents.
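For the artifact part, one possible shape (the directory name and helper are hypothetical) is to dump the differing pages as images into a folder that the CI job then uploads, e.g. with actions/upload-artifact:

```
import os
from pathlib import Path

import fitz  # pymupdf

# Hypothetical output directory; CI would upload it as a build artifact.
DIFF_DIR = Path(os.environ.get("PDF_DIFF_DIR", "artifacts/pdf-diffs"))

def save_differing_pages(reference_path: str, candidate_path: str, name: str) -> None:
    """Save a PNG of every page whose pixels differ, for later inspection."""
    DIFF_DIR.mkdir(parents=True, exist_ok=True)
    reference = fitz.open(reference_path)
    candidate = fitz.open(candidate_path)
    for index, (ref_page, new_page) in enumerate(zip(reference, candidate)):
        ref_pix = ref_page.get_pixmap()
        new_pix = new_page.get_pixmap()
        if ref_pix.samples != new_pix.samples:
            ref_pix.save(str(DIFF_DIR / f"{name}-page{index}-reference.png"))
            new_pix.save(str(DIFF_DIR / f"{name}-page{index}-new.png"))
```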

Fixes #321

This stores a reference version of the converted PDFs and diffs it against the newly converted documents during the tests.
@almet almet changed the title tests: test for regressions when converting PDFs when running the tests tests: look for regressions when converting PDFs when running the tests Mar 6, 2025
@almet almet changed the title tests: look for regressions when converting PDFs when running the tests tests: look for regressions when converting PDFs Mar 6, 2025
almet added 7 commits March 10, 2025 15:42
This is useful to reduce the computation time when creating PDF visual diffs. Here is a comparison of the same operation using plain Python arrays and NumPy arrays with lookups:

Python arrays:
```
diff took 5.094218431997433 seconds
diff took 3.1553626069980965 seconds
diff took 3.3721952960004273 seconds
diff took 3.2134646750018874 seconds
diff took 3.3410625500000606 seconds
diff took 3.2893160990024626 seconds
```

Numpy:
```
diff took 0.13705662599750212 seconds
diff took 0.05698924000171246 seconds
diff took 0.15319590600120137 seconds
diff took 0.06126453700198908 seconds
diff took 0.12916332699751365 seconds
diff took 0.05839455900058965 seconds
```
This makes it easier to inspect after CI run failures.
This leverages a new flag that can be passed during the tests to
regenerate the PDFs if needed.
This is to see what a failing CI run would look like.
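Regarding the NumPy-based comparison from the commit message above, here is a minimal sketch of the idea (the actual implementation in the commit may differ):

```
import fitz  # pymupdf
import numpy as np

def diff_mask(ref_pix: fitz.Pixmap, new_pix: fitz.Pixmap) -> np.ndarray:
    """Return a boolean mask of the pixels that differ between two rendered pages."""
    ref = np.frombuffer(ref_pix.samples, dtype=np.uint8).reshape(
        ref_pix.height, ref_pix.width, ref_pix.n
    )
    new = np.frombuffer(new_pix.samples, dtype=np.uint8).reshape(
        new_pix.height, new_pix.width, new_pix.n
    )
    # One vectorized comparison over the whole buffer instead of a per-pixel
    # Python loop, which is where the speedup in the timings above comes from.
    return np.any(ref != new, axis=-1)
```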