Skip to content

Conversation

@KashiwalHarsh
Copy link

@KashiwalHarsh KashiwalHarsh commented Jan 16, 2026

Summary by CodeRabbit

  • New Features

    • Runtime file-type validation against UI-specified allowed types before upload.
    • UI spec parsing added so accepted file types are honored during uploads.
  • Bug Fixes

    • Clearer error handling for invalid file types and attempts to re-upload existing files.
  • Tests

    • Updated tests to cover UI spec handling and validation behavior.

✏️ Tip: You can customize this high-level summary in your review settings.

Signed-off-by: KashiwalHarsh <harsh.kashiwal@infosys.com>
@coderabbitai
Copy link

coderabbitai bot commented Jan 16, 2026

Walkthrough

Adds runtime file-type validation to the upload flow: the service reads uploaded bytes, retrieves a UI spec to determine allowed MIME types, detects the file MIME with Apache Tika, and rejects uploads with disallowed types. Also adds a UiSpec utility, two error constants, a Tika dependency, and a test update.

Changes

Cohort / File(s) Summary
Upload validation logic
signup-service/src/main/java/io/mosip/signup/services/RegistrationService.java
Added runtime file-type validation: read file bytes, obtain UI spec, determine allowed MIME types, detect actual MIME via Apache Tika, and throw INVALID_FILE_TYPE when detected type is not allowed. Introduced a Tika instance and related imports; validation performed before existing Base64/storage flow.
Error constants
signup-service/src/main/java/io/mosip/signup/util/ErrorConstants.java
Added INVALID_FILE_TYPE and FILE_ALREADY_EXISTS string constants.
UI spec utility
signup-service/src/main/java/io/mosip/signup/util/UiSpecUtils.java
New utility class with public static Set<String> findAcceptedFileTypes(JsonNode root) to extract accepted file types from a UI spec JSON.
Build dependency
signup-service/pom.xml
Added Maven dependency org.apache.tika:tika-core:2.9.2.
Tests
signup-service/src/test/java/io/mosip/signup/services/RegistrationServiceTest.java
Test updated to set up a UI spec JSON and to declare throws JsonProcessingException; verifies UISpec retrieval in the test.

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client
  participant Service as RegistrationService
  participant Plugin as ProfileRegistryPlugin
  participant UiUtil as UiSpecUtils
  participant Tika as ApacheTika

  Client->>Service: POST /upload (file)
  Service->>Service: read bytes from uploaded file
  Service->>Plugin: getUISpecification()
  Plugin-->>Service: UI spec JSON
  Service->>UiUtil: findAcceptedFileTypes(UI spec)
  UiUtil-->>Service: Set of allowed MIME types
  Service->>Tika: detect(file bytes)
  Tika-->>Service: detected MIME type
  alt detected in allowed set
    Service->>Service: continue existing Base64 encoding & storage flow
    Service-->>Client: 200 OK / success
  else detected not allowed
    Service-->>Client: 4xx INVALID_FILE_TYPE
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • ase-101

Poem

🐰 I hopped in bytes and sniffed the type,
Tika’s nose kept uploads right.
UI specs told me what to cheer,
I nibbled bad files—kept the rest near. 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main changes: adding runtime file type validation and error handling for file uploads in the signup service.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@signup-service/src/main/java/io/mosip/signup/services/RegistrationService.java`:
- Around line 333-337: The current check in RegistrationService uses
file.getContentType() which can be spoofed; instead implement server-side
content validation by reading the file's initial bytes (from the MultipartFile
input stream) to detect magic signatures for allowed types (JPEG: FF D8 FF, PNG:
89 50 4E 47, GIF: 47 49 46) and only accept those exact types (reject
image/svg+xml and any other image/*). Replace or augment the existing
contentType check on the MultipartFile variable file with: (1) a small helper
(e.g., isAllowedImageByMagicBytes(InputStream) or detectImageType(byte[])) that
reads the first 8 bytes, identifies PNG/JPEG/GIF signatures, and returns the
canonical MIME (image/png, image/jpeg, image/gif); (2) validate the detected
MIME against an allowlist and log the detected signature/MIME before throwing
ErrorConstants.INVALID_FILE_TYPE; and (3) ensure the MultipartFile stream is
handled safely (reset/close) and that the rest of the code uses the trusted
detected MIME rather than file.getContentType().
🧹 Nitpick comments (1)
signup-service/src/main/java/io/mosip/signup/services/RegistrationService.java (1)

341-354: Consider hashing raw bytes and caching the hash for efficiency.

Two observations:

  1. Hash computation on Base64 string: Computing the hash on the Base64-encoded string (line 342) works but is less efficient than hashing the raw bytes directly, as Base64 encoding increases size by ~33%.

  2. Re-computing existing file hash: The hash of the existing file is recalculated on every upload (line 348). For large files, storing the hash alongside the file in RegistrationFiles would improve performance.

♻️ Suggested improvement
-            String newFileBase64 = Base64.getEncoder().encodeToString(file.getBytes());
-            String newFileHash = IdentityProviderUtil.generateB64EncodedHash(IdentityProviderUtil.ALGO_SHA3_256, newFileBase64);
+            byte[] fileBytes = file.getBytes();
+            String newFileHash = IdentityProviderUtil.generateB64EncodedHash(IdentityProviderUtil.ALGO_SHA3_256, new String(fileBytes));
+            String newFileBase64 = Base64.getEncoder().encodeToString(fileBytes);

Additionally, consider extending RegistrationFiles to store hashes alongside file content to avoid recomputation.

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ea965f and 3d5a3f1.

📒 Files selected for processing (2)
  • signup-service/src/main/java/io/mosip/signup/services/RegistrationService.java
  • signup-service/src/main/java/io/mosip/signup/util/ErrorConstants.java
🧰 Additional context used
🧬 Code graph analysis (1)
signup-service/src/main/java/io/mosip/signup/services/RegistrationService.java (1)
signup-service/src/main/java/io/mosip/signup/util/ErrorConstants.java (1)
  • ErrorConstants (8-67)
🔇 Additional comments (2)
signup-service/src/main/java/io/mosip/signup/util/ErrorConstants.java (1)

65-66: LGTM!

The new error constants follow the existing naming conventions and are appropriately placed with other upload-related constants.

signup-service/src/main/java/io/mosip/signup/services/RegistrationService.java (1)

357-371: LGTM on the file storage and error handling logic.

The null-safe initialization of RegistrationFiles and uploadedFiles map is correctly handled. The exception flow properly distinguishes between validation errors (which propagate as SignUpException) and I/O errors (which are caught and re-thrown as UPLOAD_FAILED).

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Signed-off-by: KashiwalHarsh <kashiwalharsh1234@gmail.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@signup-service/pom.xml`:
- Around line 83-87: Dependency org.apache.tika:tika-core currently pinned to
version 2.9.2 which is vulnerable; update the dependency declaration for
artifactId "tika-core" (groupId "org.apache.tika") to use at least version 3.2.2
(preferably 3.2.3) in the pom.xml, run mvn -U dependency:resolve or update
project to verify the new version is picked, and run the build/tests to ensure
compatibility after the upgrade.

In `@signup-service/src/main/java/io/mosip/signup/util/UiSpecUtils.java`:
- Around line 17-27: In UiSpecUtils, the array branch that iterates over
acceptedFileTypes can add empty or whitespace-only MIME entries to result;
update the loop that handles acceptedFileTypes.isArray() (inside the method
where result is populated) to trim each type string (use
typeNode.asText().trim()) and only add it when non-empty (e.g., check
!trimmed.isEmpty()) so the array branch mirrors the comma-separated branch's
filtering behavior.
🧹 Nitpick comments (1)
signup-service/src/main/java/io/mosip/signup/util/UiSpecUtils.java (1)

5-6: Make the utility class non-instantiable.
Consider marking it final with a private constructor to prevent accidental instantiation.

♻️ Proposed refactor
-public class UiSpecUtils {
+public final class UiSpecUtils {
+    private UiSpecUtils() {
+    }
     public static Set<String> findAcceptedFileTypes(JsonNode root) {

Comment on lines +83 to +87
<dependency>
<groupId>org.apache.tika</groupId>
<artifactId>tika-core</artifactId>
<version>2.9.2</version>
</dependency>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check latest version on Maven Central
echo "=== Checking Maven Central for tika-core versions ==="
curl -s https://repo1.maven.org/maven2/org/apache/tika/tika-core/maven-metadata.xml | python3 <<'PY'
import sys, xml.etree.ElementTree as ET
try:
    root = ET.fromstring(sys.stdin.read())
    # Register namespace
    ns = {'mvn': 'http://maven.apache.org/metadata/1.0.0'}
    
    # Try with namespace first
    latest = root.findtext('.//{http://maven.apache.org/metadata/1.0.0}latest')
    release = root.findtext('.//{http://maven.apache.org/metadata/1.0.0}release')
    
    # Fallback to without namespace
    if not latest:
        latest = root.findtext('./versioning/latest')
    if not release:
        release = root.findtext('./versioning/release')
    
    print(f"Latest: {latest}")
    print(f"Release: {release}")
    
    # Get all versions
    versions = []
    for v in root.findall('.//{http://maven.apache.org/metadata/1.0.0}version'):
        if v.text:
            versions.append(v.text)
    
    if not versions:
        for v in root.findall('./versioning/versions/version'):
            if v.text:
                versions.append(v.text)
    
    # Sort and show recent versions
    versions_sorted = sorted(versions, key=lambda x: tuple(map(int, x.split('.')[:3])))
    recent = versions_sorted[-10:] if len(versions_sorted) > 10 else versions_sorted
    print(f"Recent versions: {recent}")
    print(f"Contains 2.9.2: {'2.9.2' in versions}")
except Exception as e:
    print(f"Error: {e}")
PY

# Check OSV for known vulnerabilities
echo -e "\n=== Checking OSV database for vulnerabilities ==="
curl -s https://api.osv.dev/v1/query -X POST -H "Content-Type: application/json" \
  -d '{"package":{"ecosystem":"Maven","name":"org.apache.tika:tika-core"},"version":"2.9.2"}' | python3 <<'PY'
import json, sys
try:
    data = json.load(sys.stdin)
    vulns = data.get("vulns", [])
    if vulns:
        print(f"Found {len(vulns)} vulnerabilities affecting 2.9.2:")
        for v in vulns:
            print(f"  - {v.get('id')}: {v.get('summary', 'No summary')}")
    else:
        print("No vulnerabilities found for 2.9.2")
except Exception as e:
    print(f"Error: {e}")
PY

Repository: mosip/esignet-signup

Length of output: 150


🌐 Web query:

tika-core 2.9.2 CVE vulnerabilities security advisory

💡 Result:

Apache Tika tika-core 2.9.2 (released Apr 2, 2024) is in the affected version range for a critical XXE issue related to parsing XFA inside PDFs. [1][2]

Security advisories / CVEs impacting org.apache.tika:tika-core 2.9.2

  • CVE-2025-66516 (Critical XXE / CWE-611)
    Affected: tika-core 1.13 through 3.2.1 (so 2.9.2 is affected)
    Fixed: upgrade tika-core to >= 3.2.2 (or newer). [2][3]
  • CVE-2025-54988 (XXE when parsing XFA via PDFParser)
    Tika project’s security page lists this as affecting 1.13–3.2.1. (This is the original report; CVE-2025-66516 expands/clarifies scope, emphasizing the fix is in tika-core.) [4][3]

What to do

  • Upgrade: move off 2.9.2 to Apache Tika >= 3.2.2 (latest available is newer). [1][2]
  • If you can’t upgrade immediately: avoid/untrust PDF parsing paths involving XFA, and ensure XML parsing is configured to disallow external entities (defense-in-depth). [2][4]

Sources:
[1] Maven Repository – org.apache.tika:tika-core:2.9.2 (release date + vulnerability flagging)
[2] NVD – CVE-2025-66516 (affected versions + fix)
[3] oss-security post expanding scope to tika-core and advising tika-core >= 3.2.2
[4] Apache Tika Security page (CVE-2025-54988 affected versions)


🌐 Web query:

Apache tika-core latest version 2024 2025

💡 Result:

Apache Tika tika-core latest versions (2024–2025):

Last 2.x line (Java 8 era):

Maven coordinates:

<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-core</artifactId>
  <version>3.2.3</version>
</dependency>

Citations:


🌐 Web query:

org.apache.tika:tika-core Maven Central latest release

💡 Result:

Latest release of org.apache.tika:tika-core on Maven Central is 3.2.3 (released Sep 11, 2025). [1][2]

<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-core</artifactId>
  <version>3.2.3</version>
</dependency>

Upgrade tika-core to >= 3.2.2 (latest: 3.2.3) to fix critical XXE vulnerability.

Version 2.9.2 is affected by CVE-2025-66516 (Critical XXE/CWE-611) when parsing XFA inside PDFs. This vulnerability affects all tika-core versions 1.13–3.2.1 and is fixed in 3.2.2 and later. The latest available version is 3.2.3 (released Sep 11, 2025).

🤖 Prompt for AI Agents
In `@signup-service/pom.xml` around lines 83 - 87, Dependency
org.apache.tika:tika-core currently pinned to version 2.9.2 which is vulnerable;
update the dependency declaration for artifactId "tika-core" (groupId
"org.apache.tika") to use at least version 3.2.2 (preferably 3.2.3) in the
pom.xml, run mvn -U dependency:resolve or update project to verify the new
version is picked, and run the build/tests to ensure compatibility after the
upgrade.

Comment on lines +17 to +27
if (acceptedFileTypes.isArray()) {
for (JsonNode typeNode : acceptedFileTypes) {
if (typeNode.isTextual()) {
result.add(typeNode.asText().trim());
}
}
} else if (acceptedFileTypes.isTextual()) {
Arrays.stream(acceptedFileTypes.asText().split(","))
.map(String::trim)
.filter(s -> !s.isEmpty())
.forEach(result::add);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Filter out empty MIME entries in array-form acceptedFileTypes.
Empty/whitespace-only entries can slip into the set in the array branch, which is inconsistent with the comma-separated branch and may cause confusing allowlist behavior.

🐛 Suggested fix
             if (acceptedFileTypes.isArray()) {
                 for (JsonNode typeNode : acceptedFileTypes) {
                     if (typeNode.isTextual()) {
-                        result.add(typeNode.asText().trim());
+                        String type = typeNode.asText().trim();
+                        if (!type.isEmpty()) {
+                            result.add(type);
+                        }
                     }
                 }
             } else if (acceptedFileTypes.isTextual()) {
🤖 Prompt for AI Agents
In `@signup-service/src/main/java/io/mosip/signup/util/UiSpecUtils.java` around
lines 17 - 27, In UiSpecUtils, the array branch that iterates over
acceptedFileTypes can add empty or whitespace-only MIME entries to result;
update the loop that handles acceptedFileTypes.isArray() (inside the method
where result is populated) to trim each type string (use
typeNode.asText().trim()) and only add it when non-empty (e.g., check
!trimmed.isEmpty()) so the array branch mirrors the comma-separated branch's
filtering behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant