Skip to content

refactor(file): improve path handling and validation#82

Open
balazs-szucs wants to merge 6 commits intogrimmory-tools:developfrom
balazs-szucs:uncontrolled-paths
Open

refactor(file): improve path handling and validation#82
balazs-szucs wants to merge 6 commits intogrimmory-tools:developfrom
balazs-szucs:uncontrolled-paths

Conversation

@balazs-szucs
Copy link
Copy Markdown
Member

@balazs-szucs balazs-szucs commented Mar 20, 2026

📝 Description

This pull request focuses on improving file path validation and security throughout the codebase, especially to prevent directory traversal attacks and ensure safe file operations. It also includes some bug fixes and minor improvements to the OPDS feed and related services.

These changes collectively harden the application against path traversal vulnerabilities.

Linked Issue: Fixes #

Summary by CodeRabbit

  • Bug Fixes

    • OPDS feeds now reliably return UTF‑8 encoded payloads and improved XML escaping.
    • Migration skips suspicious resource names to avoid unsafe writes.
    • Path validation now blocks directory traversal and malformed paths.
  • Improvements

    • Upload/download flows enforce files remain within expected filesystem roots.
    • Pagination URLs are rebuilt from a whitelist and safely URL‑encoded.
    • File/folder hashing and related utilities now validate and normalize paths.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 20, 2026

📝 Walkthrough

Walkthrough

Adds filesystem path normalization and containment checks across uploads, downloads, migrations, and hashing; converts OPDS feed endpoints to return UTF-8 bytes; and tightens pagination query handling and XML escaping in feed generation.

Changes

Cohort / File(s) Summary
Path Validation Utilities
booklore-api/src/main/java/org/booklore/util/FileUtils.java
Added normalizeAbsolutePath, containsParentTraversal, requirePathWithinBase, and resolvePathWithinBase to normalize paths, detect .. segments, and enforce that candidate paths remain within a specified base (includes real-path validation).
File Operation Safety
booklore-api/src/main/java/org/booklore/service/book/BookDownloadService.java, booklore-api/src/main/java/org/booklore/service/file/FileFingerprint.java, booklore-api/src/main/java/org/booklore/service/file/PathService.java
Integrated FileUtils validators: BookDownloadService constrains primary file paths to library root via requirePathWithinBase; FileFingerprint validates/normalizes paths before hashing; PathService.getFoldersAtPath rejects null/blank, handles InvalidPathException, blocks parent traversal, and uses normalized absolute paths.
Upload Path Safety
booklore-api/src/main/java/org/booklore/service/upload/FileUploadService.java
Refactored upload/move flows to resolve and validate destinations within expected roots, introduced safe-relative path derivation (toSafeRelativePath/buildSafeRelativePath), tightened move validation and temp-file cleanup checks.
Migration File Safety
booklore-api/src/main/java/org/booklore/service/migration/migrations/MoveIconsToDataFolderMigration.java
Normalizes target directory, validates resource filenames (rejects separators or ..), and resolves/copies icon files only when resulting path is contained within the normalized target directory; skips and warns on escapes.
OPDS Response & Feed Handling
booklore-api/src/main/java/org/booklore/controller/OpdsController.java, booklore-api/src/main/java/org/booklore/service/opds/OpdsFeedService.java
OpdsController endpoints getCatalog/getRecentBooks now return ResponseEntity<byte[]> sending UTF‑8 feed bytes. OpdsFeedService adds a pagination whitelist, URL-encodes whitelisted query params, and uses StringEscapeUtils.escapeXml11 for feed ID escaping.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

backend, enhancement

Poem

🐰 I hopped through paths both wild and wide,

I trimmed the .. that tried to hide.
Feeds now sing in UTF‑8 light,
Encoded safe, escaped just right.
Roots held firm — I bound them tight. 🥕


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 2.63% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive PR description is vague and lacks required details. Template sections are incomplete: no specific explanation of changes, no actual linked issue number, and missing implementation details. Provide a concrete linked issue number, explain specific changes made to each file, and describe the security vulnerabilities being addressed with concrete examples.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The pull request title follows the conventional commit format with type 'refactor' and scope 'file', and accurately describes the main change: improving path handling and validation.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
booklore-api/src/main/java/org/booklore/service/book/BookDownloadService.java (1)

107-107: ⚠️ Potential issue | 🟡 Minor

Inconsistent path validation: downloadBookFile lacks containment check.

Unlike downloadBook (lines 61-62) and downloadKoboBook (lines 261-263) which use FileUtils.requirePathWithinBase, downloadBookFile only normalizes the path without validating it stays within the library root. For consistency and defense-in-depth, consider adding the same containment check.

🛡️ Suggested fix for consistency
+            BookEntity bookEntity = bookFileEntity.getBook();
+            Path libraryRoot = Path.of(bookEntity.getLibraryPath().getPath());
-            Path file = bookFileEntity.getFullFilePath().toAbsolutePath().normalize();
+            Path file = FileUtils.requirePathWithinBase(bookFileEntity.getFullFilePath(), libraryRoot);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@booklore-api/src/main/java/org/booklore/service/book/BookDownloadService.java`
at line 107, The downloadBookFile method currently only normalizes the path
returned by bookFileEntity.getFullFilePath() without verifying it remains inside
the library root; update downloadBookFile to call
FileUtils.requirePathWithinBase (same pattern used in downloadBook and
downloadKoboBook) after normalization to enforce containment within the library
root and throw/handle the same error if the check fails so path validation is
consistent across methods.
🧹 Nitpick comments (3)
booklore-api/src/main/java/org/booklore/util/FileUtils.java (1)

34-44: Consider consistent null handling across path utilities.

containsParentTraversal returns false for null input (line 36), while normalizeAbsolutePath throws IllegalArgumentException. This inconsistency could mask null inputs in call sites that check traversal before normalization.

♻️ Optional: throw for null to align with normalizeAbsolutePath
     public boolean containsParentTraversal(Path path) {
         if (path == null) {
-            return false;
+            throw new IllegalArgumentException("Path cannot be null");
         }
         for (Path part : path) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@booklore-api/src/main/java/org/booklore/util/FileUtils.java` around lines 34
- 44, containsParentTraversal currently returns false for a null Path but
normalizeAbsolutePath throws IllegalArgumentException for null; change
containsParentTraversal(Path path) to validate input the same way (throw
IllegalArgumentException when path is null) so callers get consistent null
handling. Update the method (containsParentTraversal) to check for null at the
start and throw IllegalArgumentException with a clear message, ensuring behavior
aligns with normalizeAbsolutePath.
booklore-api/src/main/java/org/booklore/controller/OpdsController.java (1)

140-146: Inconsistent response types across OPDS endpoints.

Only /catalog and /recent were updated to return byte[] with explicit UTF-8 encoding, while other endpoints (/, /libraries, /shelves, /magic-shelves, /authors, /series, /surprise) still return String. Since all endpoints declare charset=utf-8 in their content type, consider applying the same pattern consistently, or reverting to String if the explicit byte conversion isn't strictly necessary.

Also applies to: 151-157

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@booklore-api/src/main/java/org/booklore/controller/OpdsController.java`
around lines 140 - 146, The OPDS endpoints are inconsistent: getCatalog (and
recent) convert the feed String to byte[] with UTF-8 while other handlers (/,
/libraries, /shelves, /magic-shelves, /authors, /series, /surprise) still return
String; make them consistent by updating those handler methods to return
ResponseEntity<byte[]> and change their bodies to call
opdsFeedService.generateXFeed(...), convert the resulting String to byte[] using
StandardCharsets.UTF_8, and set the same
MediaType.parseMediaType(OPDS_ACQUISITION_MEDIA_TYPE) content type as in
getCatalog (or alternatively change getCatalog/recent back to returning
ResponseEntity<String> if you prefer Strings); pick one approach and apply it to
all methods (refer to getCatalog, opdsFeedService.generateCatalogFeed, and the
other endpoint handler method names) so all OPDS endpoints declare and deliver
UTF-8 consistently.
booklore-api/src/main/java/org/booklore/service/upload/FileUploadService.java (1)

352-354: Overly broad exception catch may mask bugs.

Catching RuntimeException could hide unexpected errors like NullPointerException. Consider catching only InvalidPathException which is what Path.of() throws for malformed paths.

♻️ Narrow exception type
+        } catch (java.nio.file.InvalidPathException e) {
-        } catch (RuntimeException e) {
             throw ApiError.GENERIC_BAD_REQUEST.createException("Invalid upload target path");
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@booklore-api/src/main/java/org/booklore/service/upload/FileUploadService.java`
around lines 352 - 354, The catch block in FileUploadService that currently
catches RuntimeException around the Path.of(...) call should be narrowed to
catch java.nio.file.InvalidPathException only; replace "catch (RuntimeException
e)" with "catch (InvalidPathException e)" (importing it) and pass the caught
exception as the cause when creating the ApiError (e.g.,
ApiError.GENERIC_BAD_REQUEST.createException("Invalid upload target path", e))
so only malformed-path errors are handled while other runtime errors still
surface.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@booklore-api/src/main/java/org/booklore/service/upload/FileUploadService.java`:
- Around line 341-355: The toSafeRelativePath method currently calls
Path.of(relativePath).normalize() before checking
FileUtils.containsParentTraversal, which defeats the traversal detection; update
to check for parent-traversal on the raw parsed Path elements before normalizing
(or inspect the original string for ".."), e.g., obtain Path parsed =
Path.of(relativePath) then verify parsed.isAbsolute() is false and iterate
parsed.getName(i) (or call FileUtils.containsParentTraversal(parsed) if it
inspects raw segments) to detect ".." segments, only then call
parsed.normalize() and return it; keep throwing
ApiError.GENERIC_BAD_REQUEST.createException("Invalid upload target path") on
any detection or exception.

In `@booklore-api/src/main/java/org/booklore/util/FileUtils.java`:
- Around line 70-73: The current check runs containsParentTraversal against
normalizedRelative which will hide parent-traversal like ".."; instead create a
raw Path from relativePath (e.g., rawRelative = Path.of(relativePath)), perform
the security checks against rawRelative (check rawRelative.isAbsolute() ||
containsParentTraversal(rawRelative)) and throw if invalid, then call
normalizedRelative = rawRelative.normalize() for further processing; update the
checks around containsParentTraversal, normalizedRelative, and relativePath in
FileUtils so traversal is detected before normalization.

In `@CHANGELOG.md`:
- Around line 1-11: The top-level release section "## [2.2.2] (2026-03-19)" is
placed above the document title; move that entire release block (the "##
[2.2.2]" heading and its following "### Bug Fixes" and "### Chores" entries) so
it appears below the main "# Changelog" heading instead, ensuring the file
starts with "# Changelog" and then lists releases like "## [2.2.2]" in
chronological order.

---

Outside diff comments:
In
`@booklore-api/src/main/java/org/booklore/service/book/BookDownloadService.java`:
- Line 107: The downloadBookFile method currently only normalizes the path
returned by bookFileEntity.getFullFilePath() without verifying it remains inside
the library root; update downloadBookFile to call
FileUtils.requirePathWithinBase (same pattern used in downloadBook and
downloadKoboBook) after normalization to enforce containment within the library
root and throw/handle the same error if the check fails so path validation is
consistent across methods.

---

Nitpick comments:
In `@booklore-api/src/main/java/org/booklore/controller/OpdsController.java`:
- Around line 140-146: The OPDS endpoints are inconsistent: getCatalog (and
recent) convert the feed String to byte[] with UTF-8 while other handlers (/,
/libraries, /shelves, /magic-shelves, /authors, /series, /surprise) still return
String; make them consistent by updating those handler methods to return
ResponseEntity<byte[]> and change their bodies to call
opdsFeedService.generateXFeed(...), convert the resulting String to byte[] using
StandardCharsets.UTF_8, and set the same
MediaType.parseMediaType(OPDS_ACQUISITION_MEDIA_TYPE) content type as in
getCatalog (or alternatively change getCatalog/recent back to returning
ResponseEntity<String> if you prefer Strings); pick one approach and apply it to
all methods (refer to getCatalog, opdsFeedService.generateCatalogFeed, and the
other endpoint handler method names) so all OPDS endpoints declare and deliver
UTF-8 consistently.

In
`@booklore-api/src/main/java/org/booklore/service/upload/FileUploadService.java`:
- Around line 352-354: The catch block in FileUploadService that currently
catches RuntimeException around the Path.of(...) call should be narrowed to
catch java.nio.file.InvalidPathException only; replace "catch (RuntimeException
e)" with "catch (InvalidPathException e)" (importing it) and pass the caught
exception as the cause when creating the ApiError (e.g.,
ApiError.GENERIC_BAD_REQUEST.createException("Invalid upload target path", e))
so only malformed-path errors are handled while other runtime errors still
surface.

In `@booklore-api/src/main/java/org/booklore/util/FileUtils.java`:
- Around line 34-44: containsParentTraversal currently returns false for a null
Path but normalizeAbsolutePath throws IllegalArgumentException for null; change
containsParentTraversal(Path path) to validate input the same way (throw
IllegalArgumentException when path is null) so callers get consistent null
handling. Update the method (containsParentTraversal) to check for null at the
start and throw IllegalArgumentException with a clear message, ensuring behavior
aligns with normalizeAbsolutePath.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6175d295-d43f-4e84-9675-66d0842549ab

📥 Commits

Reviewing files that changed from the base of the PR and between 6ef4448 and 79e0efd.

📒 Files selected for processing (9)
  • CHANGELOG.md
  • booklore-api/src/main/java/org/booklore/controller/OpdsController.java
  • booklore-api/src/main/java/org/booklore/service/book/BookDownloadService.java
  • booklore-api/src/main/java/org/booklore/service/file/FileFingerprint.java
  • booklore-api/src/main/java/org/booklore/service/file/PathService.java
  • booklore-api/src/main/java/org/booklore/service/migration/migrations/MoveIconsToDataFolderMigration.java
  • booklore-api/src/main/java/org/booklore/service/opds/OpdsFeedService.java
  • booklore-api/src/main/java/org/booklore/service/upload/FileUploadService.java
  • booklore-api/src/main/java/org/booklore/util/FileUtils.java

@imajes imajes force-pushed the develop branch 2 times, most recently from 89113d4 to 37ca101 Compare March 20, 2026 22:20
# Conflicts:
#	booklore-api/src/main/java/org/booklore/service/file/FileFingerprint.java
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
booklore-api/src/main/java/org/booklore/service/file/FileFingerprint.java (1)

124-128: Consider reusing FileUtils.isAudioFile to reduce duplication.

This method duplicates FileUtils.isAudioFile (lines 235-239), and FileUtils is already imported.

♻️ Proposed refactor
-    private static boolean isAudioFile(String fileName) {
-        if (fileName == null) return false;
-        String lower = fileName.toLowerCase();
-        return lower.endsWith(".mp3") || lower.endsWith(".m4a") || lower.endsWith(".m4b") || lower.endsWith(".opus");
-    }

Then update line 63 to use FileUtils.isAudioFile(...) instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@booklore-api/src/main/java/org/booklore/service/file/FileFingerprint.java`
around lines 124 - 128, Replace the duplicated audio check in FileFingerprint by
delegating to the existing utility: remove or collapse the private static
boolean isAudioFile(String fileName) implementation and call
FileUtils.isAudioFile(fileName) wherever FileFingerprint.isAudioFile is
currently used; update callers (e.g., the place that currently invokes
FileFingerprint.isAudioFile) to use FileUtils.isAudioFile(...) directly so the
duplicated logic in FileFingerprint is eliminated and the shared FileUtils
implementation is reused.
booklore-api/src/main/java/org/booklore/util/FileUtils.java (1)

86-91: Consider walking up ancestors for real-path validation.

When the candidate path doesn't exist, only its immediate parent is checked. If the parent also doesn't exist, validation is skipped. For deeper defense-in-depth against symlink attacks on non-existent intermediate directories, consider walking up the tree to find the first existing ancestor.

♻️ Proposed enhancement
-            Path existingAnchor = Files.exists(normalizedCandidate)
-                    ? normalizedCandidate
-                    : normalizedCandidate.getParent();
-            if (existingAnchor == null || !Files.exists(existingAnchor)) {
-                return;
-            }
+            Path existingAnchor = normalizedCandidate;
+            while (existingAnchor != null && !Files.exists(existingAnchor)) {
+                existingAnchor = existingAnchor.getParent();
+            }
+            if (existingAnchor == null) {
+                return;
+            }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@booklore-api/src/main/java/org/booklore/util/FileUtils.java` around lines 86
- 91, The current validation in FileUtils.java only checks the candidate or its
immediate parent; update the logic in the method that computes existingAnchor so
it walks up the ancestor chain from normalizedCandidate until it finds the first
existing ancestor (loop using Path.getParent()), set existingAnchor to that
path, and if none found return; ensure you still preserve normalization and use
Files.exists(existingAnchor) for the check (adjust the existingAnchor variable
and its use accordingly).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@booklore-api/src/main/java/org/booklore/service/file/FileFingerprint.java`:
- Around line 55-57: Remove the redundant existence check that follows
validateReadableFolderPath: delete the if-block that tests
Files.exists(normalizedFolderPath) || Files.isDirectory(normalizedFolderPath)
and its RuntimeException throw, because validateReadableFolderPath(folderPath)
already validates existence and directory-ness; keep using normalizedFolderPath
afterwards and rely on validateReadableFolderPath and its RuntimeException for
error handling (references: validateReadableFolderPath and
normalizedFolderPath).

---

Nitpick comments:
In `@booklore-api/src/main/java/org/booklore/service/file/FileFingerprint.java`:
- Around line 124-128: Replace the duplicated audio check in FileFingerprint by
delegating to the existing utility: remove or collapse the private static
boolean isAudioFile(String fileName) implementation and call
FileUtils.isAudioFile(fileName) wherever FileFingerprint.isAudioFile is
currently used; update callers (e.g., the place that currently invokes
FileFingerprint.isAudioFile) to use FileUtils.isAudioFile(...) directly so the
duplicated logic in FileFingerprint is eliminated and the shared FileUtils
implementation is reused.

In `@booklore-api/src/main/java/org/booklore/util/FileUtils.java`:
- Around line 86-91: The current validation in FileUtils.java only checks the
candidate or its immediate parent; update the logic in the method that computes
existingAnchor so it walks up the ancestor chain from normalizedCandidate until
it finds the first existing ancestor (loop using Path.getParent()), set
existingAnchor to that path, and if none found return; ensure you still preserve
normalization and use Files.exists(existingAnchor) for the check (adjust the
existingAnchor variable and its use accordingly).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9f454891-4df9-4d64-b644-76ecf6464bd3

📥 Commits

Reviewing files that changed from the base of the PR and between 79e0efd and 5e4e98d.

📒 Files selected for processing (3)
  • booklore-api/src/main/java/org/booklore/service/book/BookDownloadService.java
  • booklore-api/src/main/java/org/booklore/service/file/FileFingerprint.java
  • booklore-api/src/main/java/org/booklore/util/FileUtils.java
🚧 Files skipped from review as they are similar to previous changes (1)
  • booklore-api/src/main/java/org/booklore/service/book/BookDownloadService.java

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
booklore-api/src/main/java/org/booklore/service/upload/FileUploadService.java (1)

342-355: ⚠️ Potential issue | 🟠 Major

Check raw .. segments before normalize(), and reject empty results.

Path.of(relativePath).normalize() strips embedded parent segments before containsParentTraversal(...) runs, so inputs like foo/../bar are accepted. It can also collapse dir/.. to an empty path, which makes the later safeRelativePath.getFileName().toString() dereference at Line 133 / Line 146 blow up. Parse first, validate the raw segments, then normalize and require at least one name element.

🛠️ Proposed fix
     private Path toSafeRelativePath(String relativePath) {
         if (!StringUtils.hasText(relativePath)) {
             throw ApiError.GENERIC_BAD_REQUEST.createException("Invalid upload target path");
         }

         try {
-            Path parsed = Path.of(relativePath).normalize();
-            if (parsed.isAbsolute() || FileUtils.containsParentTraversal(parsed)) {
+            Path parsed = Path.of(relativePath);
+            if (parsed.isAbsolute() || FileUtils.containsParentTraversal(parsed)) {
                 throw ApiError.GENERIC_BAD_REQUEST.createException("Invalid upload target path");
             }
-            return parsed;
+            Path normalized = parsed.normalize();
+            if (normalized.getNameCount() == 0) {
+                throw ApiError.GENERIC_BAD_REQUEST.createException("Invalid upload target path");
+            }
+            return normalized;
         } catch (RuntimeException e) {
             throw ApiError.GENERIC_BAD_REQUEST.createException("Invalid upload target path");
         }
     }

If containsParentTraversal(...) is a raw-segment check, the current ordering still misses embedded traversals. Please verify it with:

#!/bin/bash
set -euo pipefail

echo "== FileUploadService.toSafeRelativePath =="
sed -n '342,375p' booklore-api/src/main/java/org/booklore/service/upload/FileUploadService.java

echo
echo "== FileUtils.containsParentTraversal =="
rg -n -C4 'containsParentTraversal' booklore-api/src/main/java/org/booklore/util/FileUtils.java
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@booklore-api/src/main/java/org/booklore/service/upload/FileUploadService.java`
around lines 342 - 355, The toSafeRelativePath method currently normalizes the
input before checking for parent-traversal which lets inputs like "foo/../bar"
slip through and can collapse to an empty path; update to first parse the raw
Path (e.g., Path.of(relativePath) or use relativePath.split(File.separator) to
inspect raw name elements), run FileUtils.containsParentTraversal against the
raw segments (or the raw Path) to reject any ".." segments before calling
normalize(), then normalize and verify the resulting Path has at least one name
element (reject empty results) and is not absolute; keep existing
ApiError.GENERIC_BAD_REQUEST exceptions for invalid inputs and preserve the
try/catch around parsing to handle runtime errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@booklore-api/src/main/java/org/booklore/service/opds/OpdsFeedService.java`:
- Around line 480-495: buildPaginationUrl currently re-emits every repeated
query value from PAGINATION_QUERY_WHITELIST even though generateCatalogFeed only
uses a single value per param; update buildPaginationUrl to canonicalize by
taking only the first non-blank value for each key (e.g. use
request.getParameter(key) or values[0] after skipping null/blank entries)
instead of iterating all values, then URL-encode and append that single
key=value pair; reference PAGINATION_QUERY_WHITELIST, buildPaginationUrl, and
generateCatalogFeed when locating and changing the logic.
- Line 713: The code path in OpdsFeedService that returns the escaped XML string
currently calls StringEscapeUtils.escapeXml11 (return input == null ? "" :
StringEscapeUtils.escapeXml11(input)); replace that call with
StringEscapeUtils.escapeXml10 to match the XML 1.0 declaration used by the
feeds; keep the same null-to-empty handling and update the call site in the
helper method in OpdsFeedService so all output conforms to XML 1.0 escaping
rules.

In
`@booklore-api/src/main/java/org/booklore/service/upload/FileUploadService.java`:
- Around line 299-304: FileUtils.requirePathWithinBase() currently allows broken
symlinks to bypass root checks; update validateRealPathAnchorWithinBase() so it
validates symlink targets even when they don't exist: if path is a symbolic link
(Files.isSymbolicLink(path)), resolve the link target with
Files.readSymbolicLink(path) and resolve that target against path.getParent() to
produce an absolute normalized candidate, then compare that candidate against
the expectedRoot (use expectedRoot.toRealPath() or normalize both sides) to
ensure it is contained within the base; for non-symlink or existing targets
continue using toRealPath() as before. Ensure the same validation logic is
applied for both existing and broken symlinks so callers of
FileUtils.requirePathWithinBase() (and callers like moveFileToFinalLocation)
cannot escape expectedRoot via a symlink.

---

Duplicate comments:
In
`@booklore-api/src/main/java/org/booklore/service/upload/FileUploadService.java`:
- Around line 342-355: The toSafeRelativePath method currently normalizes the
input before checking for parent-traversal which lets inputs like "foo/../bar"
slip through and can collapse to an empty path; update to first parse the raw
Path (e.g., Path.of(relativePath) or use relativePath.split(File.separator) to
inspect raw name elements), run FileUtils.containsParentTraversal against the
raw segments (or the raw Path) to reject any ".." segments before calling
normalize(), then normalize and verify the resulting Path has at least one name
element (reject empty results) and is not absolute; keep existing
ApiError.GENERIC_BAD_REQUEST exceptions for invalid inputs and preserve the
try/catch around parsing to handle runtime errors.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 36933a6e-acf0-4b1d-ba2c-b9cc5d97be17

📥 Commits

Reviewing files that changed from the base of the PR and between 5e4e98d and 6b1f38d.

📒 Files selected for processing (3)
  • booklore-api/src/main/java/org/booklore/service/book/BookDownloadService.java
  • booklore-api/src/main/java/org/booklore/service/opds/OpdsFeedService.java
  • booklore-api/src/main/java/org/booklore/service/upload/FileUploadService.java
✅ Files skipped from review due to trivial changes (1)
  • booklore-api/src/main/java/org/booklore/service/book/BookDownloadService.java

Comment on lines +480 to +495
for (String key : PAGINATION_QUERY_WHITELIST) {
String[] values = request.getParameterValues(key);
if (values == null) {
continue;
}

for (String value : values) {
if (value == null || value.isBlank()) {
continue;
}

result.append(URLEncoder.encode(key, StandardCharsets.UTF_8))
.append("=")
.append(URLEncoder.encode(value, StandardCharsets.UTF_8))
.append("&");
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Canonicalize pagination links to the same filter shape the endpoint executes.

generateCatalogFeed() only consumes one value for each of these params, but buildPaginationUrl() re-emits every repeated value. A request like author=A&author=B will return page data for the first value while rel=self and rel=next advertise both, so the feed URL stops being a canonical representation of the executed query.

🔧 Proposed fix
         for (String key : PAGINATION_QUERY_WHITELIST) {
-            String[] values = request.getParameterValues(key);
-            if (values == null) {
-                continue;
-            }
-
-            for (String value : values) {
-                if (value == null || value.isBlank()) {
-                    continue;
-                }
-
-                result.append(URLEncoder.encode(key, StandardCharsets.UTF_8))
-                        .append("=")
-                        .append(URLEncoder.encode(value, StandardCharsets.UTF_8))
-                        .append("&");
-            }
+            String value = request.getParameter(key);
+            if (value == null || value.isBlank()) {
+                continue;
+            }
+
+            result.append(URLEncoder.encode(key, StandardCharsets.UTF_8))
+                    .append("=")
+                    .append(URLEncoder.encode(value, StandardCharsets.UTF_8))
+                    .append("&");
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@booklore-api/src/main/java/org/booklore/service/opds/OpdsFeedService.java`
around lines 480 - 495, buildPaginationUrl currently re-emits every repeated
query value from PAGINATION_QUERY_WHITELIST even though generateCatalogFeed only
uses a single value per param; update buildPaginationUrl to canonicalize by
taking only the first non-blank value for each key (e.g. use
request.getParameter(key) or values[0] after skipping null/blank entries)
instead of iterating all values, then URL-encode and append that single
key=value pair; reference PAGINATION_QUERY_WHITELIST, buildPaginationUrl, and
generateCatalogFeed when locating and changing the logic.

.replace(">", "&gt;")
.replace("\"", "&quot;")
.replace("'", "&apos;");
return input == null ? "" : StringEscapeUtils.escapeXml11(input);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Does Apache Commons Text StringEscapeUtils.escapeXml11 generate output that is valid in XML 1.0 documents, and how does it differ from escapeXml10?

💡 Result:

No. StringEscapeUtils.escapeXml11() is intended for XML 1.1 output and should not be used for XML 1.0 documents; Commons Text explicitly says its result “can be inserted into a valid XML 1.1 document” and “Do not use it for XML 1.0 documents.” [1]

How it differs from escapeXml10():

  • Allowed/removed characters

    • escapeXml10() (XML 1.0 rules) removes characters outside XML 1.0’s allowed character ranges (it cannot represent most control characters even via escaping). It returns text that “can be inserted into a valid XML 1.0 or XML 1.1 document.” [1]
    • escapeXml11() (XML 1.1 rules) removes only what XML 1.1 still forbids (notably the null byte and unpaired surrogates), and it allows additional control characters by escaping them. [1]
  • What gets escaped

    • escapeXml10() additionally escapes some C1 control-range characters ([#x7F-#x84] | [#x86-#x9F]) “though not strictly necessary.” [1]
    • escapeXml11() escapes a broader set of control characters ([#x1-#x8] | [#xB-#xC] | [#xE-#x1F] plus the same C1 ranges). [1]

Bottom line: Use escapeXml10() for XML 1.0; use escapeXml11() only when you are generating XML 1.1. [1]

[1] https://commons.apache.org/proper/commons-text/apidocs/org/apache/commons/text/StringEscapeUtils.html


🏁 Script executed:

# First, let's find the OpdsFeedService.java file
find . -name "OpdsFeedService.java" -type f

Repository: grimmory-tools/grimmory

Length of output: 142


🏁 Script executed:

# Once we find it, we'll check:
# 1. XML declaration/version
# 2. All uses of StringEscapeUtils.escapeXml* methods
cat booklore-api/src/main/java/org/booklore/service/opds/OpdsFeedService.java | grep -n -i "xml" | head -30

Repository: grimmory-tools/grimmory

Length of output: 3651


🏁 Script executed:

# Also check for escapeXml* usage across the entire file
grep -n "escapeXml" booklore-api/src/main/java/org/booklore/service/opds/OpdsFeedService.java

Repository: grimmory-tools/grimmory

Length of output: 3195


Use the XML 1.0 escaper here.

All feeds in this class declare XML 1.0 (e.g., line 46: <?xml version="1.0" encoding="UTF-8"?>), but escapeXml11() is intended for XML 1.1 only and emits character references that violate XML 1.0 spec. Replace with escapeXml10().

🔧 Proposed fix
-        return input == null ? "" : StringEscapeUtils.escapeXml11(input);
+        return input == null ? "" : StringEscapeUtils.escapeXml10(input);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
return input == null ? "" : StringEscapeUtils.escapeXml11(input);
return input == null ? "" : StringEscapeUtils.escapeXml10(input);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@booklore-api/src/main/java/org/booklore/service/opds/OpdsFeedService.java` at
line 713, The code path in OpdsFeedService that returns the escaped XML string
currently calls StringEscapeUtils.escapeXml11 (return input == null ? "" :
StringEscapeUtils.escapeXml11(input)); replace that call with
StringEscapeUtils.escapeXml10 to match the XML 1.0 declaration used by the
feeds; keep the same null-to-empty handling and update the call site in the
helper method in OpdsFeedService so all output conforms to XML 1.0 escaping
rules.

Comment on lines +299 to +304
private void moveFileToFinalLocation(Path sourcePath, Path targetPath, Path expectedRoot) throws IOException {
Path safeSourcePath = requirePathWithinSystemTemp(sourcePath);
Path safeTargetPath = requirePathWithinRoot(targetPath, expectedRoot, "Invalid upload target path");

Files.createDirectories(safeTargetPath.getParent());
Files.move(safeSourcePath, safeTargetPath);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== moveFileToFinalLocation / requirePathWithinRoot =="
sed -n '299,305p;377,388p' booklore-api/src/main/java/org/booklore/service/upload/FileUploadService.java

echo
echo "== FileUtils.requirePathWithinBase =="
rg -n -C8 'requirePathWithinBase|normalizeAbsolutePath' booklore-api/src/main/java/org/booklore/util/FileUtils.java

Repository: grimmory-tools/grimmory

Length of output: 3690


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Full FileUtils.requirePathWithinBase and validateRealPathAnchorWithinBase =="
sed -n '46,100p' booklore-api/src/main/java/org/booklore/util/FileUtils.java

Repository: grimmory-tools/grimmory

Length of output: 2563


Fix symlink validation for non-existent targets in FileUtils.requirePathWithinBase().

The current implementation of validateRealPathAnchorWithinBase() correctly uses toRealPath() to resolve symlinks for paths that exist, but it has a gap: if a symlink's target does not yet exist (broken symlink), the validation falls back to checking only the parent directory. This allows a symlink pointing outside expectedRoot to pass validation and potentially route Files.move() outside the intended base path.

For example, a symlink /expectedRoot/link/outside/somewhere with a non-existent target would pass the check (parent /expectedRoot is within base), but the move operation would follow it to /outside/somewhere.

The fix should be applied in FileUtils.requirePathWithinBase() (specifically in validateRealPathAnchorWithinBase()) to ensure all symlink paths—including broken ones—are validated against their real or intended destinations, so all callers get consistent protection.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@booklore-api/src/main/java/org/booklore/service/upload/FileUploadService.java`
around lines 299 - 304, FileUtils.requirePathWithinBase() currently allows
broken symlinks to bypass root checks; update validateRealPathAnchorWithinBase()
so it validates symlink targets even when they don't exist: if path is a
symbolic link (Files.isSymbolicLink(path)), resolve the link target with
Files.readSymbolicLink(path) and resolve that target against path.getParent() to
produce an absolute normalized candidate, then compare that candidate against
the expectedRoot (use expectedRoot.toRealPath() or normalize both sides) to
ensure it is contained within the base; for non-symlink or existing targets
continue using toRealPath() as before. Ensure the same validation logic is
applied for both existing and broken symlinks so callers of
FileUtils.requirePathWithinBase() (and callers like moveFileToFinalLocation)
cannot escape expectedRoot via a symlink.

# Conflicts:
#	booklore-api/src/main/java/org/booklore/service/book/BookDownloadService.java
#	booklore-api/src/main/java/org/booklore/util/FileUtils.java
if (Files.exists(finalPath)) {
private void validateFinalPath(Path finalPath, Path expectedRoot) {
Path safeFinalPath = requirePathWithinRoot(finalPath, expectedRoot, "Invalid upload target path");
if (Files.exists(safeFinalPath)) {

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
Path safeSourcePath = requirePathWithinSystemTemp(sourcePath);
Path safeTargetPath = requirePathWithinRoot(targetPath, expectedRoot, "Invalid upload target path");

Files.createDirectories(safeTargetPath.getParent());

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
Path safeTargetPath = requirePathWithinRoot(targetPath, expectedRoot, "Invalid upload target path");

Files.createDirectories(safeTargetPath.getParent());
Files.move(safeSourcePath, safeTargetPath);

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
}

Path realBase = normalizedBase.toRealPath();
Path existingAnchor = Files.exists(normalizedCandidate)

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
Path existingAnchor = Files.exists(normalizedCandidate)
? normalizedCandidate
: normalizedCandidate.getParent();
if (existingAnchor == null || !Files.exists(existingAnchor)) {

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
booklore-api/src/main/java/org/booklore/util/FileUtils.java (1)

69-72: ⚠️ Potential issue | 🟠 Major

Check relativePath before calling normalize().

Path.normalize() collapses inputs like subdir/../file.epub to file.epub, so Line 71 only inspects the already-normalized path and no longer enforces the “reject traversal segments” contract. Validate the raw Path first, then normalize it after the guard.

🐛 Proposed fix
-        Path normalizedBase = normalizeAbsolutePath(basePath);
-        Path normalizedRelative = Path.of(relativePath).normalize();
-        if (normalizedRelative.isAbsolute() || containsParentTraversal(normalizedRelative)) {
+        Path normalizedBase = normalizeAbsolutePath(basePath);
+        Path rawRelative = Path.of(relativePath);
+        if (rawRelative.isAbsolute() || containsParentTraversal(rawRelative)) {
             throw new IllegalArgumentException("Relative path contains invalid traversal or absolute segments");
         }
+        Path normalizedRelative = rawRelative.normalize();
 
         Path resolved = normalizedBase.resolve(normalizedRelative).normalize();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@booklore-api/src/main/java/org/booklore/util/FileUtils.java` around lines 69
- 72, The current code calls Path.of(relativePath).normalize() before validating
traversal, which lets constructs like "subdir/../file.epub" bypass
containsParentTraversal; change the order in the method that builds
normalizedBase/normalizedRelative (use Path rawRelative =
Path.of(relativePath);) and run the guards against rawRelative (check
rawRelative.isAbsolute() and containsParentTraversal(rawRelative)) first, then
set normalizedRelative = rawRelative.normalize(); keep normalizedBase =
normalizeAbsolutePath(basePath) and reuse the existing
containsParentTraversal(...) helper for the pre-normalization check.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@booklore-api/src/main/java/org/booklore/util/FileUtils.java`:
- Around line 69-72: The current code calls Path.of(relativePath).normalize()
before validating traversal, which lets constructs like "subdir/../file.epub"
bypass containsParentTraversal; change the order in the method that builds
normalizedBase/normalizedRelative (use Path rawRelative =
Path.of(relativePath);) and run the guards against rawRelative (check
rawRelative.isAbsolute() and containsParentTraversal(rawRelative)) first, then
set normalizedRelative = rawRelative.normalize(); keep normalizedBase =
normalizeAbsolutePath(basePath) and reuse the existing
containsParentTraversal(...) helper for the pre-normalization check.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2e6fd6dc-762c-4b61-bae9-7f7330fe36e3

📥 Commits

Reviewing files that changed from the base of the PR and between 6b1f38d and 2235404.

📒 Files selected for processing (1)
  • booklore-api/src/main/java/org/booklore/util/FileUtils.java
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test Suite / Backend Tests
  • GitHub Check: Analyze (java-kotlin)
🧰 Additional context used
📓 Path-based instructions (1)
booklore-api/src/**/*.java

📄 CodeRabbit inference engine (AGENTS.md)

booklore-api/src/**/*.java: Use 4-space indentation in Java files and match surrounding Java style
Prefer constructor injection via Lombok patterns already used in the codebase. Do not introduce @Autowired field injection
Use MapStruct for entity/DTO mapping

Files:

  • booklore-api/src/main/java/org/booklore/util/FileUtils.java
🧠 Learnings (1)
📚 Learning: 2026-03-23T18:36:05.843Z
Learnt from: imnotjames
Repo: grimmory-tools/grimmory PR: 146
File: booklore-api/src/main/java/org/booklore/util/FileService.java:151-168
Timestamp: 2026-03-23T18:36:05.843Z
Learning: In `booklore-api/src/main/java/org/booklore/util/FileService.java`, the `"bin"` search path in `getSystemSearchPath()` is intentionally left as a relative path (not derived from `appProperties.getPathConfig()`), so binary lookup resolves relative to the process working directory. Do not flag this as an issue.

Applied to files:

  • booklore-api/src/main/java/org/booklore/util/FileUtils.java
🔇 Additional comments (1)
booklore-api/src/main/java/org/booklore/util/FileUtils.java (1)

86-95: [Your rewritten review comment text here]
[Exactly ONE classification tag]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants