Skip to content

Conversation

@nishika26
Copy link
Collaborator

@nishika26 nishika26 commented Dec 9, 2025

Summary

Target issue is #426

Checklist

Before submitting a pull request, please ensure that you mark these task.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and test.
  • If you've fixed a bug or added code that is tested and has test cases.

Notes

  • Enhanced the send_callback function by adding callback url validation to check if the url is a proper public https url and not one of these - private, multicast, link-local (which covers cloud metadata endpoints), localhost/loopback and reserved
  • Also added session.trust_env=false for extra protection as it ignores environment proxies
  • Disables automatic redirect following in callback requests to prevent redirect-based SSRF attacks and ensure all redirect destinations are validated before connection.
  • Put a limit on callback response size which is configurable from env

Summary by CodeRabbit

  • New Features

    • Callback handling now validates URLs before use and reporting success/failure for send operations.
  • Security Improvements

    • Enforced HTTPS-only callbacks; DNS/IP checks block private, loopback, link-local, and multicast destinations. Callbacks disable redirects, stream responses, and enforce size/time limits.
  • Bug Fixes

    • Request paths that accept callback URLs now validate them at runtime before proceeding.
  • Tests

    • Added comprehensive tests for URL validation, SSRF protections, and callback send behavior.

✏️ Tip: You can customize this high-level summary in your review settings.

@nishika26 nishika26 self-assigned this Dec 9, 2025
@coderabbitai
Copy link

coderabbitai bot commented Dec 9, 2025

Walkthrough

Adds SSRF protections and stricter callback handling: new IP classification, hostname/DNS-resolved IP checks, HTTPS-only enforcement, runtime validation calls from API routes, updated send_callback to pre-validate, stream responses, disable redirects and return success status, removal of an old post_callback helper, and new tests covering these behaviors.

Changes

Cohort / File(s) Summary
API route validation
backend/app/api/routes/collections.py, backend/app/api/routes/documents.py, backend/app/api/routes/llm.py
Import and invoke validate_callback_url(...) when a callback URL is provided at runtime; documents.py also adds a HttpUrl import.
Callback security & delivery
backend/app/utils.py
Add _is_private_ip(ip), validate_callback_url(url) (HTTPS-only, hostname required, DNS-resolved IP checks for loopback/private/link-local/multicast), and update send_callback(callback_url, data) to pre-validate URLs, disable redirects, stream responses, enforce a max response size, and return a boolean status.
Core utilities removal
backend/app/core/util.py
Remove the previous post_callback function and related HTTP request imports.
Configuration & env example
backend/app/core/config.py, .env.example
Update comments to reference callback timeouts and size limits (no functional changes to defaults or new env variables).
Tests
backend/app/tests/utils/test_callback_ssrf.py
New comprehensive tests for _is_private_ip, validate_callback_url, and send_callback (scheme checks, DNS-mocked IP checks, redirects disabled, streaming, timeout tuple, response-size enforcement, and error/HTTP handling).
sequenceDiagram
    participant API as API Endpoint
    participant Validator as validate_callback_url
    participant DNS as DNS Resolver
    participant IPCheck as _is_private_ip
    participant Sender as send_callback
    participant Remote as External Service

    API->>Validator: validate_callback_url(url)
    Validator->>Validator: ensure HTTPS & hostname present
    Validator->>DNS: resolve hostname
    DNS-->>Validator: return IP list
    loop per resolved IP
        Validator->>IPCheck: classify IP (loopback/private/link-local/multicast)
        IPCheck-->>Validator: blocked? / reason
    end
    alt any IP blocked
        Validator-->>API: raise ValueError (SSRF blocked)
    else
        Validator-->>API: validation passed
        API->>Sender: send_callback(url, payload)
        Sender->>Validator: re-validate URL
        Sender->>Remote: POST (stream=True, allow_redirects=False, timeout=(..,..))
        Remote-->>Sender: response stream
        Sender->>Sender: enforce max response size while streaming
        alt within limit
            Sender-->>API: True (success)
        else exceeded
            Sender-->>API: False (closed early)
        end
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Inspect backend/app/utils.py closely: DNS resolution, ipaddress logic (RFC1918/link-local/loopback/multicast), exact ValueError messages, and response streaming/truncation/cleanup.
  • Verify send_callback uses stream=True, allow_redirects=False, a 2-tuple timeout, enforces the response-size limit, and properly closes/releases responses.
  • Confirm API routes call validate_callback_url where appropriate and no references to the removed post_callback remain.
  • Review the new tests for correctness of mocks and coverage of edge cases.

Possibly related issues

  • Improve send_callback Function #426 — Matches SSRF mitigations and hardening of send_callback (HTTPS-only, private IP checks, disable redirects, timeouts, size limits); this PR appears to implement the same objectives.

Poem

🐰 I hopped through HTTPS fields with care,

I sniffed the names and chased each IP lair.
No private burrows let me through the gate,
I sip the stream and cap the byte-sized plate.
Safe callbacks now — hop, secure, and great! 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main enhancements: improved send_callback function with callback URL validation, redirection restrictions, and response size limits, matching the core changes across the codebase.
Docstring Coverage ✅ Passed Docstring coverage is 87.10% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch enhancement/callback_ssrf

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@nishika26 nishika26 linked an issue Dec 9, 2025 that may be closed by this pull request
@codecov
Copy link

codecov bot commented Dec 9, 2025

Codecov Report

❌ Patch coverage is 96.83258% with 7 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
backend/app/utils.py 86.84% 5 Missing ⚠️
backend/app/api/routes/collections.py 80.00% 1 Missing ⚠️
backend/app/api/routes/documents.py 75.00% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (6)
backend/app/core/config.py (1)

121-124: Centralized callback response size limit looks good (consider future configurability).

Defining CALLBACK_MAX_RESPONSE_SIZE here and using it in send_callback gives a clear, global control over max response size. If you expect different deployments to need different limits, consider wiring this through env settings later, but the current 1 MiB default is a solid starting point.

backend/app/api/routes/llm.py (1)

8-8: Callback URL validation adds SSRF protection; consider surfacing a clean HTTP error to clients.

Using validate_callback_url before start_job is the right place to block SSRF-style callbacks. Currently, a ValueError from validate_callback_url will bubble up; unless you already have a global ValueError handler, FastAPI will likely turn this into a 500. To keep the API surface predictable, you may want to wrap this in a small try/except and raise an HTTPException (e.g., 400/422) or convert to an APIResponse.failure_response instead.

Example:

 from app.utils import APIResponse, validate_callback_url
+from fastapi import HTTPException

 ...

-    if request.callback_url:
-        validate_callback_url(str(request.callback_url))
+    if request.callback_url:
+        try:
+            validate_callback_url(str(request.callback_url))
+        except ValueError as e:
+            raise HTTPException(status_code=400, detail=str(e))

Also applies to: 47-49

backend/app/api/routes/collections.py (1)

29-29: Consistent SSRF checks on collection callbacks; align error semantics with the rest of the API.

Adding validate_callback_url to both create_collection and delete_collection gives consistent SSRF protection for collection callbacks. As with the LLM route, these calls currently raise ValueError directly; without a global handler this will show up as a 500 to clients. Consider normalizing this to a 4xx via HTTPException or your standard error envelope so invalid callback URLs are clearly reported as client errors.

Pattern sketch:

-from app.utils import APIResponse, load_description, validate_callback_url
+from app.utils import APIResponse, load_description, validate_callback_url
+from fastapi import HTTPException

 ...

-    if request.callback_url:
-        validate_callback_url(str(request.callback_url))
+    if request.callback_url:
+        try:
+            validate_callback_url(str(request.callback_url))
+        except ValueError as e:
+            raise HTTPException(status_code=400, detail=str(e))

 ...

-    if request and request.callback_url:
-        validate_callback_url(str(request.callback_url))
+    if request and request.callback_url:
+        try:
+            validate_callback_url(str(request.callback_url))
+        except ValueError as e:
+            raise HTTPException(status_code=400, detail=str(e))

Also applies to: 84-86, 136-138

backend/app/utils.py (2)

342-392: send_callback hardening is good; consider minor polish and future extensibility.

The updated send_callback now:

  • Re-validates the URL via validate_callback_url.
  • Uses a requests.Session context manager.
  • Disables redirects and enables streaming.
  • Applies connect/read timeouts from settings.
  • Enforces a max response size via CALLBACK_MAX_RESPONSE_SIZE.

This is a solid improvement for SSRF and DoS resistance. A couple of small follow‑ups you might consider (not blocking):

  • If you want invalid callback URLs (from validate_callback_url) to be logged distinctly from network failures, you could catch ValueError separately and log a clearer message before re‑raising or returning False.
  • If you anticipate needing different size limits per deployment, wiring CALLBACK_MAX_RESPONSE_SIZE from env (as mentioned in Settings) would make tuning easier.

No changes required for behaviour; these are purely polish/observability ideas.


33-37: Modernize APIResponse type hints to use built‑in generics.

Ruff’s hint about typing.Dict being deprecated is valid here:

metadata: Optional[Dict[str, Any]] = None

In Python 3.11+, you can simplify this to:

-    metadata: Optional[Dict[str, Any]] = None
+    metadata: dict[str, Any] | None = None

and drop Dict/Optional imports if they become unused. Not urgent, but keeps types idiomatic.

backend/app/tests/utils/test_callback_ssrf.py (1)

1-383: Comprehensive SSRF and callback tests; only minor nit on patch scope.

This test module gives very thorough coverage of _is_private_ip, validate_callback_url, and send_callback (including DNS round‑robin, IPv6, metadata endpoints, timeouts, redirects, streaming, and size limits), which is exactly what this change needs.

One small optional improvement: for the validate_callback_url tests you currently patch socket.getaddrinfo:

@patch("socket.getaddrinfo")
def test_reject_localhost_by_name(self, mock_getaddrinfo):
    ...

Patching app.utils.socket.getaddrinfo instead would narrow the monkey‑patch to the code under test and reduce the chance of surprising interactions with other tests that also use socket:

@patch("app.utils.socket.getaddrinfo")
def test_reject_localhost_by_name(self, mock_getaddrinfo):
    ...

Not required, but a bit safer in a larger test suite.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 002b191 and d4e774b.

📒 Files selected for processing (7)
  • backend/app/api/routes/collections.py (3 hunks)
  • backend/app/api/routes/documents.py (4 hunks)
  • backend/app/api/routes/llm.py (2 hunks)
  • backend/app/core/config.py (1 hunks)
  • backend/app/core/util.py (0 hunks)
  • backend/app/tests/utils/test_callback_ssrf.py (1 hunks)
  • backend/app/utils.py (2 hunks)
💤 Files with no reviewable changes (1)
  • backend/app/core/util.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/api/routes/llm.py
  • backend/app/api/routes/collections.py
  • backend/app/api/routes/documents.py
  • backend/app/core/config.py
  • backend/app/tests/utils/test_callback_ssrf.py
  • backend/app/utils.py
backend/app/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Expose FastAPI REST endpoints under backend/app/api/ organized by domain

Files:

  • backend/app/api/routes/llm.py
  • backend/app/api/routes/collections.py
  • backend/app/api/routes/documents.py
backend/app/core/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Place core functionality (config, DB session, security, exceptions, middleware) in backend/app/core/

Files:

  • backend/app/core/config.py
🧬 Code graph analysis (5)
backend/app/api/routes/llm.py (1)
backend/app/utils.py (1)
  • validate_callback_url (291-339)
backend/app/api/routes/collections.py (1)
backend/app/utils.py (1)
  • validate_callback_url (291-339)
backend/app/api/routes/documents.py (1)
backend/app/utils.py (2)
  • load_description (399-404)
  • validate_callback_url (291-339)
backend/app/tests/utils/test_callback_ssrf.py (1)
backend/app/utils.py (4)
  • _is_private_ip (268-288)
  • validate_callback_url (291-339)
  • send_callback (342-395)
  • _ (408-409)
backend/app/utils.py (1)
backend/app/api/routes/threads.py (1)
  • send_callback (36-48)
🪛 Ruff (0.14.8)
backend/app/utils.py

9-9: typing.Dict is deprecated, use dict instead

(UP035)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)

@nishika26 nishika26 changed the title improve send_callback function improve send_callback function with callback url validation, restricting redirection and limiting response size Dec 9, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
.env.example (1)

83-86: Fix .env formatting to follow best practices.

The new environment variable has several formatting issues flagged by static analysis:

  1. Spaces around equal signs (lines 84-86 all have this issue)
  2. Unquoted value on line 86
  3. Variable ordering (CALLBACK_MAX_RESPONSE_SIZE should come before CALLBACK_READ_TIMEOUT alphabetically)

While lines 84-85 also have spaces (existing code), it's a good opportunity to fix all three lines to follow .env best practices.

Apply this diff to fix the formatting:

-# Callback Timeouts and size limit(in seconds and MB respectively)
-CALLBACK_CONNECT_TIMEOUT = 3
-CALLBACK_READ_TIMEOUT = 10
-CALLBACK_MAX_RESPONSE_SIZE = 1048576  #(1*1024*1024)
+# Callback Timeouts and size limit (in seconds and MB respectively)
+CALLBACK_CONNECT_TIMEOUT=3
+CALLBACK_MAX_RESPONSE_SIZE="1048576"
+CALLBACK_READ_TIMEOUT=10

Note: Added space after "limit" in the comment, removed spaces around =, quoted the numeric value, and reordered alphabetically.

backend/app/utils.py (1)

364-364: Remove redundant type cast.

The str() cast is unnecessary since callback_url is already typed as str in the function signature.

-        validate_callback_url(str(callback_url))
+        validate_callback_url(callback_url)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4e774b and c95f148.

📒 Files selected for processing (3)
  • .env.example (1 hunks)
  • backend/app/api/routes/documents.py (3 hunks)
  • backend/app/utils.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/api/routes/documents.py
🧰 Additional context used
📓 Path-based instructions (2)
.env.example

📄 CodeRabbit inference engine (CLAUDE.md)

Provide .env.example as the template for .env

Files:

  • .env.example
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/utils.py
🧬 Code graph analysis (1)
backend/app/utils.py (1)
backend/app/api/routes/threads.py (1)
  • send_callback (36-48)
🪛 dotenv-linter (4.0.0)
.env.example

[warning] 84-84: [SpaceCharacter] The line has spaces around equal sign

(SpaceCharacter)


[warning] 85-85: [SpaceCharacter] The line has spaces around equal sign

(SpaceCharacter)


[warning] 86-86: [SpaceCharacter] The line has spaces around equal sign

(SpaceCharacter)


[warning] 86-86: [UnorderedKey] The CALLBACK_MAX_RESPONSE_SIZE key should go before the CALLBACK_READ_TIMEOUT key

(UnorderedKey)


[warning] 86-86: [ValueWithoutQuotes] This value needs to be surrounded in quotes

(ValueWithoutQuotes)

🪛 Ruff (0.14.8)
backend/app/utils.py

9-9: typing.Dict is deprecated, use dict instead

(UP035)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (2)
backend/app/utils.py (2)

268-289: LGTM! Return type annotation has been corrected.

The function correctly implements comprehensive IP blocking checks (loopback, link-local, multicast, private, reserved). The return type annotation tuple[bool, str] now matches the actual return values, addressing the past review comment.


366-389: Excellent SSRF protection measures!

The implementation demonstrates strong security practices:

  • Pre-request URL validation prevents SSRF attacks
  • allow_redirects=False blocks redirect-based SSRF
  • stream=True with chunked reading prevents memory exhaustion
  • Response size limit (enforced incrementally) protects against DoS via large responses

Note: The response body is read and discarded solely to enforce the size limit. While this consumes bandwidth, it's necessary to prevent attackers from exploiting unlimited response sizes.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
backend/app/utils.py (1)

291-339: Add DNS resolution timeout and normalize IPv6 addresses before IP checks.

The HTTPS-only enforcement and per-IP classification are solid, but there are two gaps:

  1. No timeout on socket.getaddrinfo (DoS risk).
    A slow/malicious DNS server can cause validate_callback_url to hang indefinitely. This was already flagged in a previous review and is still unresolved.

  2. IPv6 scope IDs may bypass blocking.
    getaddrinfo can return addresses like fe80::1%eth0; passing this directly to ipaddress.ip_address raises ValueError, and _is_private_ip then treats it as “not blocked”.

You can address both with something along these lines:

+from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeoutError
@@
-        addr_info = socket.getaddrinfo(
-            parsed.hostname,
-            parsed.port or 443,
-            socket.AF_UNSPEC,
-            socket.SOCK_STREAM,
-        )
+        # Resolve DNS with a bounded timeout to avoid hanging validation
+        with ThreadPoolExecutor(max_workers=1) as executor:
+            future = executor.submit(
+                socket.getaddrinfo,
+                parsed.hostname,
+                parsed.port or 443,
+                socket.AF_UNSPEC,
+                socket.SOCK_STREAM,
+            )
+            try:
+                addr_info = future.result(timeout=5.0)  # consider making this configurable
+            except FuturesTimeoutError:
+                raise ValueError("DNS resolution timeout for callback URL")
@@
-        for info in addr_info:
-            ip_address = info[4][0]
-            is_blocked, reason = _is_private_ip(ip_address)
+        for info in addr_info:
+            raw_ip = info[4][0]
+            # Strip IPv6 scope IDs like "fe80::1%eth0" before classification
+            ip_for_check = raw_ip.split("%", 1)[0]
+            is_blocked, reason = _is_private_ip(ip_for_check)
             if is_blocked:
                 raise ValueError(
-                    f"Callback URL resolves to {reason} IP address: {ip_address}. "
+                    f"Callback URL resolves to {reason} IP address: {raw_ip}. "
                     f"This IP type is not allowed for callbacks."
                 )

This keeps the existing validation semantics but makes the function resilient to DNS-based DoS and correctly handles IPv6 link-local addresses with scope IDs.

🧹 Nitpick comments (2)
backend/app/utils.py (2)

2-10: Use builtin dict[...] instead of typing.Dict (Py3.11 style).

New imports look good. To satisfy the Ruff hint and modern typing style, you can drop Dict and use builtin generics:

-from typing import Any, Dict, Generic, Optional, TypeVar
+from typing import Any, Generic, Optional, TypeVar
@@
 class APIResponse(BaseModel, Generic[T]):
@@
-    metadata: Optional[Dict[str, Any]] = None
+    metadata: Optional[dict[str, Any]] = None

Also applies to: 34-37


268-289: Consider also blocking “unspecified” addresses.

The checks for loopback, link-local, multicast, private, and reserved ranges look solid. You may also want to treat ip_obj.is_unspecified (e.g. 0.0.0.0, ::) as blocked to avoid callbacks to non-routable/placeholder addresses:

-        checks = [
+        checks = [
             (ip_obj.is_loopback, "loopback/localhost"),
             (ip_obj.is_link_local, "link-local"),
             (ip_obj.is_multicast, "multicast"),
             (ip_obj.is_private, "private"),
             (ip_obj.is_reserved, "reserved"),
+            (ip_obj.is_unspecified, "unspecified"),
         ]
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c95f148 and 83a8e99.

📒 Files selected for processing (1)
  • backend/app/utils.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/utils.py
🧬 Code graph analysis (1)
backend/app/utils.py (1)
backend/app/api/routes/threads.py (1)
  • send_callback (36-48)
🪛 Ruff (0.14.8)
backend/app/utils.py

9-9: typing.Dict is deprecated, use dict instead

(UP035)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
backend/app/utils.py (2)

268-289: _is_private_ip logic and typing look solid; consider optionally treating “unspecified” as blocked.

The function now returns tuple[bool, str] consistently with its callers, and the set of checks (loopback, link-local, multicast, private, reserved) is a good baseline for callback/IP classification. If you want to be extra strict, you could also treat ip_obj.is_unspecified as blocked (e.g., 0.0.0.0, ::) so that clearly non-routable addresses are rejected up front, but this is optional and not a blocker.


9-9: Optional: modernize Dict usage to built‑in dict for type hints.

Ruff’s UP035 hint applies here: for Python 3.11+ you can drop Dict entirely and rely on PEP‑585 style built‑ins, which you’re already using elsewhere (dict[str, Any] in send_callback):

-from typing import Any, Dict, Generic, Optional, TypeVar
+from typing import Any, Generic, Optional, TypeVar
@@
-    metadata: Optional[Dict[str, Any]] = None
+    metadata: Optional[dict[str, Any]] = None
@@
-        cls, data: T, metadata: Optional[Dict[str, Any]] = None
+        cls, data: T, metadata: Optional[dict[str, Any]] = None
@@
-        metadata: Optional[Dict[str, Any]] = None,
+        metadata: Optional[dict[str, Any]] = None,

This is non‑functional and can be done whenever you next touch this class.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 83a8e99 and 1ba5150.

📒 Files selected for processing (1)
  • backend/app/utils.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/utils.py
🧬 Code graph analysis (1)
backend/app/utils.py (1)
backend/app/api/routes/threads.py (1)
  • send_callback (36-48)
🪛 Ruff (0.14.8)
backend/app/utils.py

9-9: typing.Dict is deprecated, use dict instead

(UP035)


394-394: f-string without any placeholders

Remove extraneous f prefix

(F541)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (2)
backend/app/utils.py (2)

2-10: New networking imports are appropriate and used correctly.

ipaddress, socket, and urlparse are all exercised in the new SSRF helpers and keep concerns local to this module; no issues from a dependency or usage standpoint.


291-339: validate_callback_url provides a good HTTPS-only, IP-based SSRF guardrail.

The URL parsing + HTTPS enforcement + hostname requirement + getaddrinfo over all resolved addresses, combined with _is_private_ip checks, gives you reasonable coverage against private/loopback/link-local/multicast/reserved targets and basic DNS‑rebind scenarios. The exception handling is also clear: callers see ValueError with a useful message for both policy rejections and unexpected resolution errors.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
backend/app/utils.py (2)

9-9: Use lowercase dict instead of deprecated typing.Dict.

For Python 3.9+, the built-in dict can be used directly in type annotations. The Dict import is unused after you've already updated send_callback to use dict[str, Any].

-from typing import Any, Dict, Generic, Optional, TypeVar
+from typing import Any, Generic, Optional, TypeVar

Then update remaining usages in APIResponse:

-    metadata: Optional[Dict[str, Any]] = None
+    metadata: Optional[dict[str, Any]] = None

363-367: Remove unnecessary str() conversion.

The callback_url parameter is already typed as str, so the str() conversion is redundant.

     try:
-        validate_callback_url(str(callback_url))
+        validate_callback_url(callback_url)
     except ValueError as ve:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ba5150 and 29cbaea.

📒 Files selected for processing (1)
  • backend/app/utils.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/utils.py
🧬 Code graph analysis (1)
backend/app/utils.py (2)
backend/app/api/routes/threads.py (1)
  • send_callback (36-48)
backend/app/core/cloud/storage.py (2)
  • stream (125-127)
  • stream (179-195)
🪛 Ruff (0.14.8)
backend/app/utils.py

9-9: typing.Dict is deprecated, use dict instead

(UP035)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (3)
backend/app/utils.py (3)

268-289: LGTM!

The IP validation logic is well-structured with comprehensive checks covering loopback, link-local, multicast, private, and reserved IP ranges. The return type annotation is correct, and gracefully handling ValueError for invalid IP strings is appropriate.


291-339: LGTM!

The SSRF validation is thorough: enforces HTTPS, validates hostname presence, resolves DNS, and checks all returned IP addresses against the private/reserved ranges. Checking all IPs from getaddrinfo provides protection against DNS round-robin attacks.


369-401: LGTM on the SSRF hardening!

The implementation addresses all key SSRF vectors:

  • session.trust_env = False prevents proxy-based SSRF
  • allow_redirects=False blocks redirect-based attacks
  • stream=True with chunked size checking prevents response-based DoS
  • Pre-request URL validation with IP resolution

@nishika26 nishika26 added enhancement New feature or request ready-for-review labels Dec 9, 2025
@nishika26 nishika26 requested review from Prajna1999 and removed request for kartpop December 9, 2025 09:28
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/api/routes/llm.py (1)

1-63: Fix Black formatting failure reported by CI

The CI pipeline reports that Black reformatted at least one file. Please run your formatting hooks locally (for example, pre-commit run --all-files or black .) and commit the resulting changes so the AI Platform CI check passes.

🧹 Nitpick comments (2)
backend/app/api/routes/llm.py (2)

49-51: Good SSRF guard; consider surfacing invalid callback URLs as 4xx instead of 500

Validating request.callback_url with validate_callback_url before starting the job is the right place to enforce SSRF protections and HTTPS-only callbacks.

However, validate_callback_url raises ValueError, which in a FastAPI route will typically bubble up as a 500 unless you already map it to a client error globally. To make invalid callbacks clearly a client-side issue and avoid generic 500s, consider catching ValueError here and translating it to an HTTPException (400/422), e.g.:

from fastapi import HTTPException

def llm_call(...):
    ...
    if request.callback_url is not None:
        try:
            validate_callback_url(str(request.callback_url))
        except ValueError as exc:
            raise HTTPException(status_code=400, detail=str(exc))

Adjust status code and error shaping to match your existing API conventions and any global exception handlers.


22-23: Add explicit return type hints for route handlers (optional)

To align with the project guideline of using type hints, you may want to add explicit return types, for example:

def llm_callback_notification(body: APIResponse[LLMCallResponse]) -> None:  # or appropriate type

def llm_call(
    _current_user: AuthContextDep,
    _session: SessionDep,
    request: LLMCallRequest,
) -> APIResponse[Message]:
    ...

This helps static analysis and keeps the API surface self-documenting.

Also applies to: 40-41

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 29cbaea and abc2612.

📒 Files selected for processing (2)
  • backend/app/api/routes/documents.py (3 hunks)
  • backend/app/api/routes/llm.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/api/routes/documents.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/api/routes/llm.py
backend/app/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Expose FastAPI REST endpoints under backend/app/api/ organized by domain

Files:

  • backend/app/api/routes/llm.py
🧬 Code graph analysis (1)
backend/app/api/routes/llm.py (1)
backend/app/utils.py (1)
  • validate_callback_url (291-339)
🪛 GitHub Actions: AI Platform CI
backend/app/api/routes/llm.py

[error] 1-1: Black formatting failed: 1 file reformatted. Run 'pre-commit run --all-files' or 'black .' to fix formatting.

🔇 Additional comments (1)
backend/app/api/routes/llm.py (1)

8-8: Import updates align with new callback validation utility

Importing APIResponse, validate_callback_url, and load_description from app.utils matches their usage in this module and keeps SSRF-related logic centralized. No issues here.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
backend/app/api/routes/llm.py (2)

8-8: Import usage looks good; consider aligning load_description call with its Path-typed signature

Bringing validate_callback_url into this module fits the new SSRF validation flow and matches its use below.

One minor thing: load_description in app.utils is annotated to accept a Path and calls filename.exists() / filename.read_text(), while here it’s invoked with a string literal:

description = load_description("llm/llm_call.md")

If load_description really expects a Path, consider either:

  • Passing a Path here:
+from pathlib import Path
@@
-    description=load_description("llm/llm_call.md"),
+    description=load_description(Path("llm/llm_call.md")),

or

  • Updating load_description to accept str | Path and normalize internally.

Please double-check the current load_description implementation to ensure a plain str won’t raise at import time.

Also applies to: 35-35


48-50: Callback URL pre-validation is correct; consider surfacing a clear 4xx instead of a generic 500 on failure

The new guard:

if request.callback_url:
    validate_callback_url(str(request.callback_url))

is in the right place and correctly reuses the central SSRF protections (HTTPS-only, DNS/IP checks, etc.) before enqueueing the job.

One behavior to check: validate_callback_url raises ValueError on invalid URLs. Unless you already have a global exception handler that maps ValueError to a 4xx response, FastAPI will treat this as an unhandled exception and return a 500.

If you don’t have that handler, consider wrapping this locally, e.g.:

+from fastapi import HTTPException
@@
-    if request.callback_url:
-        validate_callback_url(str(request.callback_url))
+    if request.callback_url:
+        try:
+            validate_callback_url(str(request.callback_url))
+        except ValueError as e:
+            raise HTTPException(status_code=400, detail=str(e))

This keeps the SSRF protection while giving clients a clear, client-error response when they submit a bad callback URL.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between abc2612 and 1b5c8d9.

📒 Files selected for processing (1)
  • backend/app/api/routes/llm.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/api/routes/llm.py
backend/app/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Expose FastAPI REST endpoints under backend/app/api/ organized by domain

Files:

  • backend/app/api/routes/llm.py
🧬 Code graph analysis (1)
backend/app/api/routes/llm.py (1)
backend/app/utils.py (2)
  • validate_callback_url (291-339)
  • load_description (405-410)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
backend/app/utils.py (2)

268-289: Consider simplifying tuple return syntax.

The function logic is solid and provides comprehensive SSRF protection. However, the explicit parentheses in tuple returns (e.g., return (True, reason)) are optional in Python and commonly omitted for cleaner code.

Apply this diff for a more idiomatic style:

         for is_blocked, reason in checks:
             if is_blocked:
-                return (True, reason)
+                return True, reason
 
-        return (False, "")
+        return False, ""
 
     except ValueError:
-        return (False, "")
+        return False, ""

291-340: Strong SSRF protection implementation.

The validation logic is comprehensive:

  • HTTPS-only enforcement prevents protocol smuggling
  • DNS resolution with getaddrinfo covers both IPv4 and IPv6
  • All resolved IPs are checked against private/reserved ranges
  • Proper exception handling with clear error messages

The implementation effectively blocks private IPs, localhost, link-local (including 169.254.169.254 cloud metadata), multicast, and reserved ranges as documented.

Minor suggestion: Consider renaming ip_address (line 328) to resolved_ip to avoid confusion with the ipaddress module:

         for info in addr_info:
-            ip_address = info[4][0]
-            is_blocked, reason = _is_private_ip(ip_address)
+            resolved_ip = info[4][0]
+            is_blocked, reason = _is_private_ip(resolved_ip)
             if is_blocked:
                 raise ValueError(
-                    f"Callback URL resolves to {reason} IP address: {ip_address}. "
+                    f"Callback URL resolves to {reason} IP address: {resolved_ip}. "
                     f"This IP type is not allowed for callbacks."
                 )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b5c8d9 and 366e86e.

📒 Files selected for processing (4)
  • .env.example (1 hunks)
  • backend/app/core/config.py (1 hunks)
  • backend/app/tests/utils/test_callback_ssrf.py (1 hunks)
  • backend/app/utils.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • backend/app/core/config.py
  • .env.example
  • backend/app/tests/utils/test_callback_ssrf.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/utils.py
🧬 Code graph analysis (1)
backend/app/utils.py (1)
backend/app/api/routes/threads.py (1)
  • send_callback (36-48)
🪛 Ruff (0.14.8)
backend/app/utils.py

9-9: typing.Dict is deprecated, use dict instead

(UP035)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (2)
backend/app/utils.py (2)

2-2: LGTM! Import additions are appropriate for SSRF protection.

The new imports (ipaddress, socket, urlparse) are necessary and correctly used for the URL validation and IP address checking functionality.

Also applies to: 8-8, 10-10


342-389: Strong security improvements, but response size limiting mentioned in PR objectives is not visible in implementation.

The SSRF protections are excellent:

  • Pre-validation via validate_callback_url blocks malicious URLs
  • trust_env=False prevents environment proxy hijacking
  • allow_redirects=False prevents redirect-based SSRF
  • Proper error handling and logging

However, the PR summary states: "Adds a configurable limit on callback response size (controlled via an environment variable)," but I don't see explicit response size limiting in the code. While the function doesn't read the response body (it only checks the status code), requests may still buffer the response.

Consider one of these approaches:

Option 1: Stream response and limit size (if you need to verify response is reasonable)

             response = session.post(
                 callback_url,
                 json=data,
                 timeout=(
                     settings.CALLBACK_CONNECT_TIMEOUT,
                     settings.CALLBACK_READ_TIMEOUT,
                 ),
                 allow_redirects=False,
+                stream=True,
             )
 
             response.raise_for_status()
+            
+            # Limit response size to prevent memory exhaustion
+            max_size = getattr(settings, 'CALLBACK_MAX_RESPONSE_SIZE', 1024 * 1024)  # 1MB default
+            total_size = 0
+            for chunk in response.iter_content(chunk_size=8192):
+                total_size += len(chunk)
+                if total_size > max_size:
+                    logger.warning("[send_callback] Response size exceeded limit")
+                    break
+            response.close()

Option 2: Simply stream without reading (if you don't care about response body)

             response = session.post(
                 callback_url,
                 json=data,
                 timeout=(
                     settings.CALLBACK_CONNECT_TIMEOUT,
                     settings.CALLBACK_READ_TIMEOUT,
                 ),
                 allow_redirects=False,
+                stream=True,
             )
 
             response.raise_for_status()
+            response.close()  # Close without reading body

Can you clarify if response size limiting was intentionally omitted or if it needs to be added per the PR objectives?

@nishika26 nishika26 merged commit 6b20d12 into main Dec 12, 2025
3 checks passed
@nishika26 nishika26 deleted the enhancement/callback_ssrf branch December 12, 2025 18:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request ready-for-review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve send_callback Function

4 participants