
feat: optimize PDF saving to prevent file bloat during metadata updates #28

Merged
balazs-szucs merged 2 commits into grimmory-tools:main from balazs-szucs:bloat-fix on Apr 1, 2026

Conversation

@balazs-szucs
Member

balazs-szucs commented on Apr 1, 2026

Summary by CodeRabbit

  • New Features

    • Optimized PDF saving so that edits touching only metadata or XMP produce much smaller output instead of a bloated file; a full native save is still used after structural edits.
  • Tests

    • Added tests for page import/merge, size bounds on metadata-only and metadata+XMP saves, save behavior after structural changes, and in-memory save/reopen verification.
  • Chores

    • Version bumped to 0.14.0.

@coderabbitai

coderabbitai bot commented Apr 1, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ddab79af-9771-4a55-90ae-32c089377574

📥 Commits

Reviewing files that changed from the base of the PR and between 6ece310 and 7835f59.

📒 Files selected for processing (3)
  • src/main/java/org/grimmory/pdfium4j/PdfDocument.java
  • src/main/java/org/grimmory/pdfium4j/PdfSaver.java
  • src/test/java/org/grimmory/pdfium4j/PdfDocumentTest.java

📝 Walkthrough

Walkthrough

The PR adds a structurallyModified flag and a getOriginalBytes() helper so that metadata-only saves can reuse the original file bytes, and adds a PdfSaver.saveToBytes overload that takes an originalBytes parameter. It also bumps the project version to 0.14.0 and adds tests covering page import and metadata-save size behavior.
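A minimal sketch of the document-side bookkeeping described above, assuming the document keeps either the in-memory bytes it was opened from (sourceBytes) or the path it was opened from (sourcePath). Only structurallyModified and getOriginalBytes() come from the summary; DocumentSaveState, markStructurallyModified() and originalBytesForSave() are hypothetical names, not the PdfDocument API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical, simplified model of the state a document could track so a
 * save can reuse the original bytes. Not the actual PdfDocument code.
 */
final class DocumentSaveState {
    private final byte[] sourceBytes; // non-null when opened from memory
    private final Path sourcePath;    // non-null when opened from a file
    private boolean structurallyModified;

    DocumentSaveState(byte[] sourceBytes, Path sourcePath) {
        this.sourceBytes = sourceBytes;
        this.sourcePath = sourcePath;
    }

    /** Call after any native delete/insert/import page operation. */
    void markStructurallyModified() {
        structurallyModified = true;
    }

    /** Returns sourceBytes, else the current contents of sourcePath, else null. */
    byte[] getOriginalBytes() {
        if (sourceBytes != null) {
            return sourceBytes;
        }
        if (sourcePath != null) {
            try {
                return Files.readAllBytes(sourcePath);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return null;
    }

    /** Base bytes for the saver: the original bytes, or null to force a full native save. */
    byte[] originalBytesForSave() {
        return structurallyModified ? null : getOriginalBytes();
    }
}
```

Reading sourcePath lazily at save time avoids keeping a second copy of a large file in memory, but in this sketch it assumes the file has not changed on disk since it was opened.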

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Version Update | build.gradle.kts | Project version changed from 0.13.0 to 0.14.0. |
| Document state & IO | src/main/java/org/grimmory/pdfium4j/PdfDocument.java | Added a structurallyModified flag, set after native structural operations (delete/insert/import); added getOriginalBytes(), which returns sourceBytes or reads sourcePath; saves reuse the original bytes when no structural changes have been made. |
| Saver logic | src/main/java/org/grimmory/pdfium4j/PdfSaver.java | Added a saveToBytes(..., byte[] originalBytes) overload; when originalBytes is non-null and metadata/XMP changes are pending, it is used as the base instead of calling nativeSave; otherwise the existing flow and validation are preserved. |
| Tests | src/test/java/org/grimmory/pdfium4j/PdfDocumentTest.java | Added an importPages test and four tests covering metadata-save size and persistence (Info/XMP, structural vs. metadata-only, in-memory bytes path). |
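The saver-side overload can be sketched as a simple dispatch on originalBytes. Here nativeSave and applyMetadata are hypothetical stand-ins for the native serialization call and the Info/XMP writer, SaverDispatchSketch is an illustrative name, and the real overload's validation is omitted.

```java
import java.util.Objects;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

/**
 * Hypothetical sketch of how a saveToBytes(..., byte[] originalBytes)
 * overload could choose its base bytes. Not the actual PdfSaver code.
 */
final class SaverDispatchSketch {
    private SaverDispatchSketch() {}

    static byte[] saveToBytes(Supplier<byte[]> nativeSave,
                              UnaryOperator<byte[]> applyMetadata,
                              boolean metadataPending,
                              byte[] originalBytes) {
        Objects.requireNonNull(nativeSave, "nativeSave");
        Objects.requireNonNull(applyMetadata, "applyMetadata");
        if (originalBytes != null && metadataPending) {
            // Metadata-only path: build on the untouched original file, no native save.
            return applyMetadata.apply(originalBytes);
        }
        // Existing flow: full native serialization, then metadata/XMP if pending.
        byte[] base = nativeSave.get();
        return metadataPending ? applyMetadata.apply(base) : base;
    }
}
```

Passing null for originalBytes (as the document does after a structural edit) always falls back to the full native save.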

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant PdfDocument
    participant PdfSaver
    participant NativeLib as Native PDFium
    participant FileSystem

    rect rgba(100,150,200,0.5)
    Note over Client,FileSystem: Metadata-only save (structurallyModified = false)
    Client->>PdfDocument: saveToBytes(SaveOptions)
    PdfDocument->>PdfDocument: check structurallyModified & pending metadata
    PdfDocument->>FileSystem: getOriginalBytes() (read if needed)
    FileSystem-->>PdfDocument: original bytes
    PdfDocument->>PdfSaver: saveToBytes(..., originalBytes)
    PdfSaver->>PdfSaver: use originalBytes as base
    PdfSaver->>NativeLib: apply incremental metadata update
    NativeLib-->>PdfSaver: updated bytes
    PdfSaver-->>PdfDocument: return bytes
    PdfDocument-->>Client: return bytes
    end

    rect rgba(200,150,100,0.5)
    Note over Client,NativeLib: Structural save (structurallyModified = true)
    Client->>PdfDocument: saveToBytes(SaveOptions)
    PdfDocument->>PdfDocument: check structurallyModified
    PdfDocument->>PdfSaver: saveToBytes(..., null)
    PdfSaver->>NativeLib: nativeSave(docHandle)
    NativeLib-->>PdfSaver: full serialized bytes
    PdfSaver->>PdfSaver: apply metadata/XMP updates
    PdfSaver-->>PdfDocument: return bytes
    PdfDocument-->>Client: return bytes
    end
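The metadata-only path in the first diagram stays small because an incremental update appends only the changed objects, a new cross-reference section, and a trailer whose /Prev points at the previous section (ISO 32000-1, section 7.5.6), leaving the original bytes untouched. The sketch below illustrates that general technique; it is not PdfSaver's code. It assumes an unencrypted file with a classic xref table and trailer (not an xref stream) and an ASCII title, writes a fresh Info dictionary containing only /Title, and carries over only /Root from the old trailer, whereas a real writer would also preserve keys such as /ID. IncrementalInfoUpdate and appendInfoTitle are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: append a new Info dictionary as a PDF incremental update. */
final class IncrementalInfoUpdate {
    private static final Pattern STARTXREF = Pattern.compile("startxref\\s+(\\d+)");
    private static final Pattern SIZE = Pattern.compile("/Size\\s+(\\d+)");
    private static final Pattern ROOT = Pattern.compile("/Root\\s+(\\d+\\s+\\d+\\s+R)");

    private IncrementalInfoUpdate() {}

    static byte[] appendInfoTitle(byte[] original, String title) {
        // ISO-8859-1 maps each byte to one char, so string indices equal byte offsets.
        String text = new String(original, StandardCharsets.ISO_8859_1);

        int sxAt = text.lastIndexOf("startxref");
        Matcher sx = STARTXREF.matcher(text);
        if (sxAt < 0 || !sx.find(sxAt)) {
            throw new IllegalArgumentException("no startxref found");
        }
        long prevXref = Long.parseLong(sx.group(1));

        int trailerAt = text.lastIndexOf("trailer");
        if (trailerAt < 0) {
            throw new IllegalArgumentException("no classic trailer (xref streams not handled)");
        }
        String trailer = text.substring(trailerAt);
        Matcher size = SIZE.matcher(trailer);
        Matcher root = ROOT.matcher(trailer);
        if (!size.find() || !root.find()) {
            throw new IllegalArgumentException("trailer lacks /Size or /Root");
        }
        int infoObj = Integer.parseInt(size.group(1)); // /Size is the next free object number

        String obj = infoObj + " 0 obj\n<< /Title (" + escape(title) + ") >>\nendobj\n";
        long objOffset = original.length + 1L;   // object starts after the leading "\n"
        long xrefOffset = objOffset + obj.length();

        String update = "\n" + obj
                + "xref\n" + infoObj + " 1\n"
                + String.format("%010d 00000 n \n", objOffset) // fixed 20-byte xref entry
                + "trailer\n<< /Size " + (infoObj + 1)
                + " /Root " + root.group(1)
                + " /Info " + infoObj + " 0 R"
                + " /Prev " + prevXref + " >>\n"
                + "startxref\n" + xrefOffset + "\n%%EOF\n";

        ByteArrayOutputStream out = new ByteArrayOutputStream(original.length + update.length());
        out.writeBytes(original); // every original byte is kept as-is
        out.writeBytes(update.getBytes(StandardCharsets.ISO_8859_1));
        return out.toByteArray();
    }

    /** Escapes the characters that are special inside a PDF literal string. */
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
    }
}
```

The output grows only by the appended section, which is the kind of property the metadata-only save-size tests bound; a full rewrite, by contrast, re-serializes every object.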

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Poem

🐰 I nibbled bytes beneath the moonlit code,
Saved only stories, left the pages stowed,
When structure stays, I hop — no heavy write,
Tiny metadata breezes keep the file light,
A little rabbit cheer for bytes made right.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 58.33%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped because CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The PR title follows the conventional commit format with a 'feat:' prefix and clearly describes the main optimization (preventing file bloat during metadata updates), which aligns with the core changes: structural-modification tracking and original-bytes retrieval for metadata-only saves. |

balazs-szucs merged commit 4fa68c7 into grimmory-tools:main on Apr 1, 2026
1 check passed