-
Notifications
You must be signed in to change notification settings - Fork 60
[ES-2641] [ES-2646] Resolved the file upload bugs #838
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
Signed-off-by: KashiwalHarsh <harsh.kashiwal@infosys.com>
WalkthroughAdds runtime file-type validation to the upload flow: the service reads uploaded bytes, retrieves a UI spec to determine allowed MIME types, detects the file MIME with Apache Tika, and rejects uploads with disallowed types. Also adds a UiSpec utility, two error constants, a Tika dependency, and a test update. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client
participant Service as RegistrationService
participant Plugin as ProfileRegistryPlugin
participant UiUtil as UiSpecUtils
participant Tika as ApacheTika
Client->>Service: POST /upload (file)
Service->>Service: read bytes from uploaded file
Service->>Plugin: getUISpecification()
Plugin-->>Service: UI spec JSON
Service->>UiUtil: findAcceptedFileTypes(UI spec)
UiUtil-->>Service: Set of allowed MIME types
Service->>Tika: detect(file bytes)
Tika-->>Service: detected MIME type
alt detected in allowed set
Service->>Service: continue existing Base64 encoding & storage flow
Service-->>Client: 200 OK / success
else detected not allowed
Service-->>Client: 4xx INVALID_FILE_TYPE
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@signup-service/src/main/java/io/mosip/signup/services/RegistrationService.java`:
- Around line 333-337: The current check in RegistrationService uses
file.getContentType() which can be spoofed; instead implement server-side
content validation by reading the file's initial bytes (from the MultipartFile
input stream) to detect magic signatures for allowed types (JPEG: FF D8 FF, PNG:
89 50 4E 47, GIF: 47 49 46) and only accept those exact types (reject
image/svg+xml and any other image/*). Replace or augment the existing
contentType check on the MultipartFile variable file with: (1) a small helper
(e.g., isAllowedImageByMagicBytes(InputStream) or detectImageType(byte[])) that
reads the first 8 bytes, identifies PNG/JPEG/GIF signatures, and returns the
canonical MIME (image/png, image/jpeg, image/gif); (2) validate the detected
MIME against an allowlist and log the detected signature/MIME before throwing
ErrorConstants.INVALID_FILE_TYPE; and (3) ensure the MultipartFile stream is
handled safely (reset/close) and that the rest of the code uses the trusted
detected MIME rather than file.getContentType().
🧹 Nitpick comments (1)
signup-service/src/main/java/io/mosip/signup/services/RegistrationService.java (1)
341-354: Consider hashing raw bytes and caching the hash for efficiency.Two observations:
Hash computation on Base64 string: Computing the hash on the Base64-encoded string (line 342) works but is less efficient than hashing the raw bytes directly, as Base64 encoding increases size by ~33%.
Re-computing existing file hash: The hash of the existing file is recalculated on every upload (line 348). For large files, storing the hash alongside the file in
RegistrationFileswould improve performance.♻️ Suggested improvement
- String newFileBase64 = Base64.getEncoder().encodeToString(file.getBytes()); - String newFileHash = IdentityProviderUtil.generateB64EncodedHash(IdentityProviderUtil.ALGO_SHA3_256, newFileBase64); + byte[] fileBytes = file.getBytes(); + String newFileHash = IdentityProviderUtil.generateB64EncodedHash(IdentityProviderUtil.ALGO_SHA3_256, new String(fileBytes)); + String newFileBase64 = Base64.getEncoder().encodeToString(fileBytes);Additionally, consider extending
RegistrationFilesto store hashes alongside file content to avoid recomputation.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
signup-service/src/main/java/io/mosip/signup/services/RegistrationService.javasignup-service/src/main/java/io/mosip/signup/util/ErrorConstants.java
🧰 Additional context used
🧬 Code graph analysis (1)
signup-service/src/main/java/io/mosip/signup/services/RegistrationService.java (1)
signup-service/src/main/java/io/mosip/signup/util/ErrorConstants.java (1)
ErrorConstants(8-67)
🔇 Additional comments (2)
signup-service/src/main/java/io/mosip/signup/util/ErrorConstants.java (1)
65-66: LGTM!The new error constants follow the existing naming conventions and are appropriately placed with other upload-related constants.
signup-service/src/main/java/io/mosip/signup/services/RegistrationService.java (1)
357-371: LGTM on the file storage and error handling logic.The null-safe initialization of
RegistrationFilesanduploadedFilesmap is correctly handled. The exception flow properly distinguishes between validation errors (which propagate asSignUpException) and I/O errors (which are caught and re-thrown asUPLOAD_FAILED).
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
signup-service/src/main/java/io/mosip/signup/services/RegistrationService.java
Outdated
Show resolved
Hide resolved
Signed-off-by: KashiwalHarsh <kashiwalharsh1234@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@signup-service/pom.xml`:
- Around line 83-87: Dependency org.apache.tika:tika-core currently pinned to
version 2.9.2 which is vulnerable; update the dependency declaration for
artifactId "tika-core" (groupId "org.apache.tika") to use at least version 3.2.2
(preferably 3.2.3) in the pom.xml, run mvn -U dependency:resolve or update
project to verify the new version is picked, and run the build/tests to ensure
compatibility after the upgrade.
In `@signup-service/src/main/java/io/mosip/signup/util/UiSpecUtils.java`:
- Around line 17-27: In UiSpecUtils, the array branch that iterates over
acceptedFileTypes can add empty or whitespace-only MIME entries to result;
update the loop that handles acceptedFileTypes.isArray() (inside the method
where result is populated) to trim each type string (use
typeNode.asText().trim()) and only add it when non-empty (e.g., check
!trimmed.isEmpty()) so the array branch mirrors the comma-separated branch's
filtering behavior.
🧹 Nitpick comments (1)
signup-service/src/main/java/io/mosip/signup/util/UiSpecUtils.java (1)
5-6: Make the utility class non-instantiable.
Consider marking itfinalwith a private constructor to prevent accidental instantiation.♻️ Proposed refactor
-public class UiSpecUtils { +public final class UiSpecUtils { + private UiSpecUtils() { + } public static Set<String> findAcceptedFileTypes(JsonNode root) {
| <dependency> | ||
| <groupId>org.apache.tika</groupId> | ||
| <artifactId>tika-core</artifactId> | ||
| <version>2.9.2</version> | ||
| </dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check latest version on Maven Central
echo "=== Checking Maven Central for tika-core versions ==="
curl -s https://repo1.maven.org/maven2/org/apache/tika/tika-core/maven-metadata.xml | python3 <<'PY'
import sys, xml.etree.ElementTree as ET
try:
root = ET.fromstring(sys.stdin.read())
# Register namespace
ns = {'mvn': 'http://maven.apache.org/metadata/1.0.0'}
# Try with namespace first
latest = root.findtext('.//{http://maven.apache.org/metadata/1.0.0}latest')
release = root.findtext('.//{http://maven.apache.org/metadata/1.0.0}release')
# Fallback to without namespace
if not latest:
latest = root.findtext('./versioning/latest')
if not release:
release = root.findtext('./versioning/release')
print(f"Latest: {latest}")
print(f"Release: {release}")
# Get all versions
versions = []
for v in root.findall('.//{http://maven.apache.org/metadata/1.0.0}version'):
if v.text:
versions.append(v.text)
if not versions:
for v in root.findall('./versioning/versions/version'):
if v.text:
versions.append(v.text)
# Sort and show recent versions
versions_sorted = sorted(versions, key=lambda x: tuple(map(int, x.split('.')[:3])))
recent = versions_sorted[-10:] if len(versions_sorted) > 10 else versions_sorted
print(f"Recent versions: {recent}")
print(f"Contains 2.9.2: {'2.9.2' in versions}")
except Exception as e:
print(f"Error: {e}")
PY
# Check OSV for known vulnerabilities
echo -e "\n=== Checking OSV database for vulnerabilities ==="
curl -s https://api.osv.dev/v1/query -X POST -H "Content-Type: application/json" \
-d '{"package":{"ecosystem":"Maven","name":"org.apache.tika:tika-core"},"version":"2.9.2"}' | python3 <<'PY'
import json, sys
try:
data = json.load(sys.stdin)
vulns = data.get("vulns", [])
if vulns:
print(f"Found {len(vulns)} vulnerabilities affecting 2.9.2:")
for v in vulns:
print(f" - {v.get('id')}: {v.get('summary', 'No summary')}")
else:
print("No vulnerabilities found for 2.9.2")
except Exception as e:
print(f"Error: {e}")
PYRepository: mosip/esignet-signup
Length of output: 150
🌐 Web query:
tika-core 2.9.2 CVE vulnerabilities security advisory
💡 Result:
Apache Tika tika-core 2.9.2 (released Apr 2, 2024) is in the affected version range for a critical XXE issue related to parsing XFA inside PDFs. [1][2]
Security advisories / CVEs impacting org.apache.tika:tika-core 2.9.2
- CVE-2025-66516 (Critical XXE / CWE-611)
Affected:tika-core1.13 through 3.2.1 (so 2.9.2 is affected)
Fixed: upgradetika-coreto >= 3.2.2 (or newer). [2][3] - CVE-2025-54988 (XXE when parsing XFA via PDFParser)
Tika project’s security page lists this as affecting 1.13–3.2.1. (This is the original report; CVE-2025-66516 expands/clarifies scope, emphasizing the fix is intika-core.) [4][3]
What to do
- Upgrade: move off 2.9.2 to Apache Tika >= 3.2.2 (latest available is newer). [1][2]
- If you can’t upgrade immediately: avoid/untrust PDF parsing paths involving XFA, and ensure XML parsing is configured to disallow external entities (defense-in-depth). [2][4]
Sources:
[1] Maven Repository – org.apache.tika:tika-core:2.9.2 (release date + vulnerability flagging)
[2] NVD – CVE-2025-66516 (affected versions + fix)
[3] oss-security post expanding scope to tika-core and advising tika-core >= 3.2.2
[4] Apache Tika Security page (CVE-2025-54988 affected versions)
🌐 Web query:
Apache tika-core latest version 2024 2025
💡 Result:
Apache Tika tika-core latest versions (2024–2025):
- Latest in 2025 (and latest overall as of Sep 9, 2025):
3.2.3(tika.apache.org) - Latest in 2024:
3.0.0(released Oct 19, 2024) (tika.apache.org)
Last 2.x line (Java 8 era):
2.9.4(released May 5, 2025) — final 2.x release (tika.apache.org)
Maven coordinates:
<dependency>
<groupId>org.apache.tika</groupId>
<artifactId>tika-core</artifactId>
<version>3.2.3</version>
</dependency>Citations:
- 1: https://tika.apache.org/?utm_source=openai
- 2: https://tika.apache.org/?utm_source=openai
- 3: https://tika.apache.org/?utm_source=openai
🌐 Web query:
org.apache.tika:tika-core Maven Central latest release
💡 Result:
Latest release of org.apache.tika:tika-core on Maven Central is 3.2.3 (released Sep 11, 2025). [1][2]
<dependency>
<groupId>org.apache.tika</groupId>
<artifactId>tika-core</artifactId>
<version>3.2.3</version>
</dependency>Upgrade tika-core to >= 3.2.2 (latest: 3.2.3) to fix critical XXE vulnerability.
Version 2.9.2 is affected by CVE-2025-66516 (Critical XXE/CWE-611) when parsing XFA inside PDFs. This vulnerability affects all tika-core versions 1.13–3.2.1 and is fixed in 3.2.2 and later. The latest available version is 3.2.3 (released Sep 11, 2025).
🤖 Prompt for AI Agents
In `@signup-service/pom.xml` around lines 83 - 87, Dependency
org.apache.tika:tika-core currently pinned to version 2.9.2 which is vulnerable;
update the dependency declaration for artifactId "tika-core" (groupId
"org.apache.tika") to use at least version 3.2.2 (preferably 3.2.3) in the
pom.xml, run mvn -U dependency:resolve or update project to verify the new
version is picked, and run the build/tests to ensure compatibility after the
upgrade.
| if (acceptedFileTypes.isArray()) { | ||
| for (JsonNode typeNode : acceptedFileTypes) { | ||
| if (typeNode.isTextual()) { | ||
| result.add(typeNode.asText().trim()); | ||
| } | ||
| } | ||
| } else if (acceptedFileTypes.isTextual()) { | ||
| Arrays.stream(acceptedFileTypes.asText().split(",")) | ||
| .map(String::trim) | ||
| .filter(s -> !s.isEmpty()) | ||
| .forEach(result::add); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filter out empty MIME entries in array-form acceptedFileTypes.
Empty/whitespace-only entries can slip into the set in the array branch, which is inconsistent with the comma-separated branch and may cause confusing allowlist behavior.
🐛 Suggested fix
if (acceptedFileTypes.isArray()) {
for (JsonNode typeNode : acceptedFileTypes) {
if (typeNode.isTextual()) {
- result.add(typeNode.asText().trim());
+ String type = typeNode.asText().trim();
+ if (!type.isEmpty()) {
+ result.add(type);
+ }
}
}
} else if (acceptedFileTypes.isTextual()) {🤖 Prompt for AI Agents
In `@signup-service/src/main/java/io/mosip/signup/util/UiSpecUtils.java` around
lines 17 - 27, In UiSpecUtils, the array branch that iterates over
acceptedFileTypes can add empty or whitespace-only MIME entries to result;
update the loop that handles acceptedFileTypes.isArray() (inside the method
where result is populated) to trim each type string (use
typeNode.asText().trim()) and only add it when non-empty (e.g., check
!trimmed.isEmpty()) so the array branch mirrors the comma-separated branch's
filtering behavior.
Summary by CodeRabbit
New Features
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.