-
Notifications
You must be signed in to change notification settings - Fork 7
improve send_callback function with callback url validation, restricting redirection and limiting response size #474
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds SSRF protections and stricter callback handling: new IP classification, hostname/DNS-resolved IP checks, HTTPS-only enforcement, runtime validation calls from API routes, updated send_callback to pre-validate, stream responses, disable redirects and return success status, removal of an old post_callback helper, and new tests covering these behaviors. Changes
sequenceDiagram
participant API as API Endpoint
participant Validator as validate_callback_url
participant DNS as DNS Resolver
participant IPCheck as _is_private_ip
participant Sender as send_callback
participant Remote as External Service
API->>Validator: validate_callback_url(url)
Validator->>Validator: ensure HTTPS & hostname present
Validator->>DNS: resolve hostname
DNS-->>Validator: return IP list
loop per resolved IP
Validator->>IPCheck: classify IP (loopback/private/link-local/multicast)
IPCheck-->>Validator: blocked? / reason
end
alt any IP blocked
Validator-->>API: raise ValueError (SSRF blocked)
else
Validator-->>API: validation passed
API->>Sender: send_callback(url, payload)
Sender->>Validator: re-validate URL
Sender->>Remote: POST (stream=True, allow_redirects=False, timeout=(..,..))
Remote-->>Sender: response stream
Sender->>Sender: enforce max response size while streaming
alt within limit
Sender-->>API: True (success)
else exceeded
Sender-->>API: False (closed early)
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related issues
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (6)
backend/app/core/config.py (1)
121-124: Centralized callback response size limit looks good (consider future configurability).Defining
CALLBACK_MAX_RESPONSE_SIZEhere and using it insend_callbackgives a clear, global control over max response size. If you expect different deployments to need different limits, consider wiring this through env settings later, but the current 1 MiB default is a solid starting point.backend/app/api/routes/llm.py (1)
8-8: Callback URL validation adds SSRF protection; consider surfacing a clean HTTP error to clients.Using
validate_callback_urlbeforestart_jobis the right place to block SSRF-style callbacks. Currently, aValueErrorfromvalidate_callback_urlwill bubble up; unless you already have a globalValueErrorhandler, FastAPI will likely turn this into a 500. To keep the API surface predictable, you may want to wrap this in a small try/except and raise anHTTPException(e.g., 400/422) or convert to anAPIResponse.failure_responseinstead.Example:
from app.utils import APIResponse, validate_callback_url +from fastapi import HTTPException ... - if request.callback_url: - validate_callback_url(str(request.callback_url)) + if request.callback_url: + try: + validate_callback_url(str(request.callback_url)) + except ValueError as e: + raise HTTPException(status_code=400, detail=str(e))Also applies to: 47-49
backend/app/api/routes/collections.py (1)
29-29: Consistent SSRF checks on collection callbacks; align error semantics with the rest of the API.Adding
validate_callback_urlto bothcreate_collectionanddelete_collectiongives consistent SSRF protection for collection callbacks. As with the LLM route, these calls currently raiseValueErrordirectly; without a global handler this will show up as a 500 to clients. Consider normalizing this to a 4xx viaHTTPExceptionor your standard error envelope so invalid callback URLs are clearly reported as client errors.Pattern sketch:
-from app.utils import APIResponse, load_description, validate_callback_url +from app.utils import APIResponse, load_description, validate_callback_url +from fastapi import HTTPException ... - if request.callback_url: - validate_callback_url(str(request.callback_url)) + if request.callback_url: + try: + validate_callback_url(str(request.callback_url)) + except ValueError as e: + raise HTTPException(status_code=400, detail=str(e)) ... - if request and request.callback_url: - validate_callback_url(str(request.callback_url)) + if request and request.callback_url: + try: + validate_callback_url(str(request.callback_url)) + except ValueError as e: + raise HTTPException(status_code=400, detail=str(e))Also applies to: 84-86, 136-138
backend/app/utils.py (2)
342-392:send_callbackhardening is good; consider minor polish and future extensibility.The updated
send_callbacknow:
- Re-validates the URL via
validate_callback_url.- Uses a
requests.Sessioncontext manager.- Disables redirects and enables streaming.
- Applies connect/read timeouts from settings.
- Enforces a max response size via
CALLBACK_MAX_RESPONSE_SIZE.This is a solid improvement for SSRF and DoS resistance. A couple of small follow‑ups you might consider (not blocking):
- If you want invalid callback URLs (from
validate_callback_url) to be logged distinctly from network failures, you could catchValueErrorseparately and log a clearer message before re‑raising or returningFalse.- If you anticipate needing different size limits per deployment, wiring
CALLBACK_MAX_RESPONSE_SIZEfrom env (as mentioned inSettings) would make tuning easier.No changes required for behaviour; these are purely polish/observability ideas.
33-37: ModernizeAPIResponsetype hints to use built‑in generics.Ruff’s hint about
typing.Dictbeing deprecated is valid here:metadata: Optional[Dict[str, Any]] = NoneIn Python 3.11+, you can simplify this to:
- metadata: Optional[Dict[str, Any]] = None + metadata: dict[str, Any] | None = Noneand drop
Dict/Optionalimports if they become unused. Not urgent, but keeps types idiomatic.backend/app/tests/utils/test_callback_ssrf.py (1)
1-383: Comprehensive SSRF and callback tests; only minor nit on patch scope.This test module gives very thorough coverage of
_is_private_ip,validate_callback_url, andsend_callback(including DNS round‑robin, IPv6, metadata endpoints, timeouts, redirects, streaming, and size limits), which is exactly what this change needs.One small optional improvement: for the
validate_callback_urltests you currently patchsocket.getaddrinfo:@patch("socket.getaddrinfo") def test_reject_localhost_by_name(self, mock_getaddrinfo): ...Patching
app.utils.socket.getaddrinfoinstead would narrow the monkey‑patch to the code under test and reduce the chance of surprising interactions with other tests that also usesocket:@patch("app.utils.socket.getaddrinfo") def test_reject_localhost_by_name(self, mock_getaddrinfo): ...Not required, but a bit safer in a larger test suite.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
backend/app/api/routes/collections.py(3 hunks)backend/app/api/routes/documents.py(4 hunks)backend/app/api/routes/llm.py(2 hunks)backend/app/core/config.py(1 hunks)backend/app/core/util.py(0 hunks)backend/app/tests/utils/test_callback_ssrf.py(1 hunks)backend/app/utils.py(2 hunks)
💤 Files with no reviewable changes (1)
- backend/app/core/util.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/api/routes/llm.pybackend/app/api/routes/collections.pybackend/app/api/routes/documents.pybackend/app/core/config.pybackend/app/tests/utils/test_callback_ssrf.pybackend/app/utils.py
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Expose FastAPI REST endpoints under backend/app/api/ organized by domain
Files:
backend/app/api/routes/llm.pybackend/app/api/routes/collections.pybackend/app/api/routes/documents.py
backend/app/core/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Place core functionality (config, DB session, security, exceptions, middleware) in backend/app/core/
Files:
backend/app/core/config.py
🧬 Code graph analysis (5)
backend/app/api/routes/llm.py (1)
backend/app/utils.py (1)
validate_callback_url(291-339)
backend/app/api/routes/collections.py (1)
backend/app/utils.py (1)
validate_callback_url(291-339)
backend/app/api/routes/documents.py (1)
backend/app/utils.py (2)
load_description(399-404)validate_callback_url(291-339)
backend/app/tests/utils/test_callback_ssrf.py (1)
backend/app/utils.py (4)
_is_private_ip(268-288)validate_callback_url(291-339)send_callback(342-395)_(408-409)
backend/app/utils.py (1)
backend/app/api/routes/threads.py (1)
send_callback(36-48)
🪛 Ruff (0.14.8)
backend/app/utils.py
9-9: typing.Dict is deprecated, use dict instead
(UP035)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
.env.example (1)
83-86: Fix .env formatting to follow best practices.The new environment variable has several formatting issues flagged by static analysis:
- Spaces around equal signs (lines 84-86 all have this issue)
- Unquoted value on line 86
- Variable ordering (CALLBACK_MAX_RESPONSE_SIZE should come before CALLBACK_READ_TIMEOUT alphabetically)
While lines 84-85 also have spaces (existing code), it's a good opportunity to fix all three lines to follow .env best practices.
Apply this diff to fix the formatting:
-# Callback Timeouts and size limit(in seconds and MB respectively) -CALLBACK_CONNECT_TIMEOUT = 3 -CALLBACK_READ_TIMEOUT = 10 -CALLBACK_MAX_RESPONSE_SIZE = 1048576 #(1*1024*1024) +# Callback Timeouts and size limit (in seconds and MB respectively) +CALLBACK_CONNECT_TIMEOUT=3 +CALLBACK_MAX_RESPONSE_SIZE="1048576" +CALLBACK_READ_TIMEOUT=10Note: Added space after "limit" in the comment, removed spaces around
=, quoted the numeric value, and reordered alphabetically.backend/app/utils.py (1)
364-364: Remove redundant type cast.The
str()cast is unnecessary sincecallback_urlis already typed asstrin the function signature.- validate_callback_url(str(callback_url)) + validate_callback_url(callback_url)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.env.example(1 hunks)backend/app/api/routes/documents.py(3 hunks)backend/app/utils.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/api/routes/documents.py
🧰 Additional context used
📓 Path-based instructions (2)
.env.example
📄 CodeRabbit inference engine (CLAUDE.md)
Provide .env.example as the template for .env
Files:
.env.example
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/utils.py
🧬 Code graph analysis (1)
backend/app/utils.py (1)
backend/app/api/routes/threads.py (1)
send_callback(36-48)
🪛 dotenv-linter (4.0.0)
.env.example
[warning] 84-84: [SpaceCharacter] The line has spaces around equal sign
(SpaceCharacter)
[warning] 85-85: [SpaceCharacter] The line has spaces around equal sign
(SpaceCharacter)
[warning] 86-86: [SpaceCharacter] The line has spaces around equal sign
(SpaceCharacter)
[warning] 86-86: [UnorderedKey] The CALLBACK_MAX_RESPONSE_SIZE key should go before the CALLBACK_READ_TIMEOUT key
(UnorderedKey)
[warning] 86-86: [ValueWithoutQuotes] This value needs to be surrounded in quotes
(ValueWithoutQuotes)
🪛 Ruff (0.14.8)
backend/app/utils.py
9-9: typing.Dict is deprecated, use dict instead
(UP035)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (2)
backend/app/utils.py (2)
268-289: LGTM! Return type annotation has been corrected.The function correctly implements comprehensive IP blocking checks (loopback, link-local, multicast, private, reserved). The return type annotation
tuple[bool, str]now matches the actual return values, addressing the past review comment.
366-389: Excellent SSRF protection measures!The implementation demonstrates strong security practices:
- Pre-request URL validation prevents SSRF attacks
allow_redirects=Falseblocks redirect-based SSRFstream=Truewith chunked reading prevents memory exhaustion- Response size limit (enforced incrementally) protects against DoS via large responses
Note: The response body is read and discarded solely to enforce the size limit. While this consumes bandwidth, it's necessary to prevent attackers from exploiting unlimited response sizes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
backend/app/utils.py (1)
291-339: Add DNS resolution timeout and normalize IPv6 addresses before IP checks.The HTTPS-only enforcement and per-IP classification are solid, but there are two gaps:
No timeout on
socket.getaddrinfo(DoS risk).
A slow/malicious DNS server can causevalidate_callback_urlto hang indefinitely. This was already flagged in a previous review and is still unresolved.IPv6 scope IDs may bypass blocking.
getaddrinfocan return addresses likefe80::1%eth0; passing this directly toipaddress.ip_addressraisesValueError, and_is_private_ipthen treats it as “not blocked”.You can address both with something along these lines:
+from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeoutError @@ - addr_info = socket.getaddrinfo( - parsed.hostname, - parsed.port or 443, - socket.AF_UNSPEC, - socket.SOCK_STREAM, - ) + # Resolve DNS with a bounded timeout to avoid hanging validation + with ThreadPoolExecutor(max_workers=1) as executor: + future = executor.submit( + socket.getaddrinfo, + parsed.hostname, + parsed.port or 443, + socket.AF_UNSPEC, + socket.SOCK_STREAM, + ) + try: + addr_info = future.result(timeout=5.0) # consider making this configurable + except FuturesTimeoutError: + raise ValueError("DNS resolution timeout for callback URL") @@ - for info in addr_info: - ip_address = info[4][0] - is_blocked, reason = _is_private_ip(ip_address) + for info in addr_info: + raw_ip = info[4][0] + # Strip IPv6 scope IDs like "fe80::1%eth0" before classification + ip_for_check = raw_ip.split("%", 1)[0] + is_blocked, reason = _is_private_ip(ip_for_check) if is_blocked: raise ValueError( - f"Callback URL resolves to {reason} IP address: {ip_address}. " + f"Callback URL resolves to {reason} IP address: {raw_ip}. " f"This IP type is not allowed for callbacks." )This keeps the existing validation semantics but makes the function resilient to DNS-based DoS and correctly handles IPv6 link-local addresses with scope IDs.
🧹 Nitpick comments (2)
backend/app/utils.py (2)
2-10: Use builtindict[...]instead oftyping.Dict(Py3.11 style).New imports look good. To satisfy the Ruff hint and modern typing style, you can drop
Dictand use builtin generics:-from typing import Any, Dict, Generic, Optional, TypeVar +from typing import Any, Generic, Optional, TypeVar @@ class APIResponse(BaseModel, Generic[T]): @@ - metadata: Optional[Dict[str, Any]] = None + metadata: Optional[dict[str, Any]] = NoneAlso applies to: 34-37
268-289: Consider also blocking “unspecified” addresses.The checks for loopback, link-local, multicast, private, and reserved ranges look solid. You may also want to treat
ip_obj.is_unspecified(e.g.0.0.0.0,::) as blocked to avoid callbacks to non-routable/placeholder addresses:- checks = [ + checks = [ (ip_obj.is_loopback, "loopback/localhost"), (ip_obj.is_link_local, "link-local"), (ip_obj.is_multicast, "multicast"), (ip_obj.is_private, "private"), (ip_obj.is_reserved, "reserved"), + (ip_obj.is_unspecified, "unspecified"), ]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/utils.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/utils.py
🧬 Code graph analysis (1)
backend/app/utils.py (1)
backend/app/api/routes/threads.py (1)
send_callback(36-48)
🪛 Ruff (0.14.8)
backend/app/utils.py
9-9: typing.Dict is deprecated, use dict instead
(UP035)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
backend/app/utils.py (2)
268-289:_is_private_iplogic and typing look solid; consider optionally treating “unspecified” as blocked.The function now returns
tuple[bool, str]consistently with its callers, and the set of checks (loopback, link-local, multicast, private, reserved) is a good baseline for callback/IP classification. If you want to be extra strict, you could also treatip_obj.is_unspecifiedas blocked (e.g.,0.0.0.0,::) so that clearly non-routable addresses are rejected up front, but this is optional and not a blocker.
9-9: Optional: modernizeDictusage to built‑indictfor type hints.Ruff’s UP035 hint applies here: for Python 3.11+ you can drop
Dictentirely and rely on PEP‑585 style built‑ins, which you’re already using elsewhere (dict[str, Any]insend_callback):-from typing import Any, Dict, Generic, Optional, TypeVar +from typing import Any, Generic, Optional, TypeVar @@ - metadata: Optional[Dict[str, Any]] = None + metadata: Optional[dict[str, Any]] = None @@ - cls, data: T, metadata: Optional[Dict[str, Any]] = None + cls, data: T, metadata: Optional[dict[str, Any]] = None @@ - metadata: Optional[Dict[str, Any]] = None, + metadata: Optional[dict[str, Any]] = None,This is non‑functional and can be done whenever you next touch this class.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/utils.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/utils.py
🧬 Code graph analysis (1)
backend/app/utils.py (1)
backend/app/api/routes/threads.py (1)
send_callback(36-48)
🪛 Ruff (0.14.8)
backend/app/utils.py
9-9: typing.Dict is deprecated, use dict instead
(UP035)
394-394: f-string without any placeholders
Remove extraneous f prefix
(F541)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (2)
backend/app/utils.py (2)
2-10: New networking imports are appropriate and used correctly.
ipaddress,socket, andurlparseare all exercised in the new SSRF helpers and keep concerns local to this module; no issues from a dependency or usage standpoint.
291-339:validate_callback_urlprovides a good HTTPS-only, IP-based SSRF guardrail.The URL parsing + HTTPS enforcement + hostname requirement +
getaddrinfoover all resolved addresses, combined with_is_private_ipchecks, gives you reasonable coverage against private/loopback/link-local/multicast/reserved targets and basic DNS‑rebind scenarios. The exception handling is also clear: callers seeValueErrorwith a useful message for both policy rejections and unexpected resolution errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
backend/app/utils.py (2)
9-9: Use lowercasedictinstead of deprecatedtyping.Dict.For Python 3.9+, the built-in
dictcan be used directly in type annotations. TheDictimport is unused after you've already updatedsend_callbackto usedict[str, Any].-from typing import Any, Dict, Generic, Optional, TypeVar +from typing import Any, Generic, Optional, TypeVarThen update remaining usages in
APIResponse:- metadata: Optional[Dict[str, Any]] = None + metadata: Optional[dict[str, Any]] = None
363-367: Remove unnecessarystr()conversion.The
callback_urlparameter is already typed asstr, so thestr()conversion is redundant.try: - validate_callback_url(str(callback_url)) + validate_callback_url(callback_url) except ValueError as ve:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/utils.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/utils.py
🧬 Code graph analysis (1)
backend/app/utils.py (2)
backend/app/api/routes/threads.py (1)
send_callback(36-48)backend/app/core/cloud/storage.py (2)
stream(125-127)stream(179-195)
🪛 Ruff (0.14.8)
backend/app/utils.py
9-9: typing.Dict is deprecated, use dict instead
(UP035)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (3)
backend/app/utils.py (3)
268-289: LGTM!The IP validation logic is well-structured with comprehensive checks covering loopback, link-local, multicast, private, and reserved IP ranges. The return type annotation is correct, and gracefully handling
ValueErrorfor invalid IP strings is appropriate.
291-339: LGTM!The SSRF validation is thorough: enforces HTTPS, validates hostname presence, resolves DNS, and checks all returned IP addresses against the private/reserved ranges. Checking all IPs from
getaddrinfoprovides protection against DNS round-robin attacks.
369-401: LGTM on the SSRF hardening!The implementation addresses all key SSRF vectors:
session.trust_env = Falseprevents proxy-based SSRFallow_redirects=Falseblocks redirect-based attacksstream=Truewith chunked size checking prevents response-based DoS- Pre-request URL validation with IP resolution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/api/routes/llm.py (1)
1-63: Fix Black formatting failure reported by CIThe CI pipeline reports that Black reformatted at least one file. Please run your formatting hooks locally (for example,
pre-commit run --all-filesorblack .) and commit the resulting changes so the AI Platform CI check passes.
🧹 Nitpick comments (2)
backend/app/api/routes/llm.py (2)
49-51: Good SSRF guard; consider surfacing invalid callback URLs as 4xx instead of 500Validating
request.callback_urlwithvalidate_callback_urlbefore starting the job is the right place to enforce SSRF protections and HTTPS-only callbacks.However,
validate_callback_urlraisesValueError, which in a FastAPI route will typically bubble up as a 500 unless you already map it to a client error globally. To make invalid callbacks clearly a client-side issue and avoid generic 500s, consider catchingValueErrorhere and translating it to anHTTPException(400/422), e.g.:from fastapi import HTTPException def llm_call(...): ... if request.callback_url is not None: try: validate_callback_url(str(request.callback_url)) except ValueError as exc: raise HTTPException(status_code=400, detail=str(exc))Adjust status code and error shaping to match your existing API conventions and any global exception handlers.
22-23: Add explicit return type hints for route handlers (optional)To align with the project guideline of using type hints, you may want to add explicit return types, for example:
def llm_callback_notification(body: APIResponse[LLMCallResponse]) -> None: # or appropriate type def llm_call( _current_user: AuthContextDep, _session: SessionDep, request: LLMCallRequest, ) -> APIResponse[Message]: ...This helps static analysis and keeps the API surface self-documenting.
Also applies to: 40-41
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backend/app/api/routes/documents.py(3 hunks)backend/app/api/routes/llm.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/api/routes/documents.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/api/routes/llm.py
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Expose FastAPI REST endpoints under backend/app/api/ organized by domain
Files:
backend/app/api/routes/llm.py
🧬 Code graph analysis (1)
backend/app/api/routes/llm.py (1)
backend/app/utils.py (1)
validate_callback_url(291-339)
🪛 GitHub Actions: AI Platform CI
backend/app/api/routes/llm.py
[error] 1-1: Black formatting failed: 1 file reformatted. Run 'pre-commit run --all-files' or 'black .' to fix formatting.
🔇 Additional comments (1)
backend/app/api/routes/llm.py (1)
8-8: Import updates align with new callback validation utilityImporting
APIResponse,validate_callback_url, andload_descriptionfromapp.utilsmatches their usage in this module and keeps SSRF-related logic centralized. No issues here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
backend/app/api/routes/llm.py (2)
8-8: Import usage looks good; consider aligningload_descriptioncall with itsPath-typed signatureBringing
validate_callback_urlinto this module fits the new SSRF validation flow and matches its use below.One minor thing:
load_descriptioninapp.utilsis annotated to accept aPathand callsfilename.exists()/filename.read_text(), while here it’s invoked with a string literal:description = load_description("llm/llm_call.md")If
load_descriptionreally expects aPath, consider either:
- Passing a
Pathhere:+from pathlib import Path @@ - description=load_description("llm/llm_call.md"), + description=load_description(Path("llm/llm_call.md")),or
- Updating
load_descriptionto acceptstr | Pathand normalize internally.Please double-check the current
load_descriptionimplementation to ensure a plainstrwon’t raise at import time.Also applies to: 35-35
48-50: Callback URL pre-validation is correct; consider surfacing a clear 4xx instead of a generic 500 on failureThe new guard:
if request.callback_url: validate_callback_url(str(request.callback_url))is in the right place and correctly reuses the central SSRF protections (HTTPS-only, DNS/IP checks, etc.) before enqueueing the job.
One behavior to check:
validate_callback_urlraisesValueErroron invalid URLs. Unless you already have a global exception handler that mapsValueErrorto a 4xx response, FastAPI will treat this as an unhandled exception and return a 500.If you don’t have that handler, consider wrapping this locally, e.g.:
+from fastapi import HTTPException @@ - if request.callback_url: - validate_callback_url(str(request.callback_url)) + if request.callback_url: + try: + validate_callback_url(str(request.callback_url)) + except ValueError as e: + raise HTTPException(status_code=400, detail=str(e))This keeps the SSRF protection while giving clients a clear, client-error response when they submit a bad callback URL.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/api/routes/llm.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/api/routes/llm.py
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Expose FastAPI REST endpoints under backend/app/api/ organized by domain
Files:
backend/app/api/routes/llm.py
🧬 Code graph analysis (1)
backend/app/api/routes/llm.py (1)
backend/app/utils.py (2)
validate_callback_url(291-339)load_description(405-410)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
backend/app/utils.py (2)
268-289: Consider simplifying tuple return syntax.The function logic is solid and provides comprehensive SSRF protection. However, the explicit parentheses in tuple returns (e.g.,
return (True, reason)) are optional in Python and commonly omitted for cleaner code.Apply this diff for a more idiomatic style:
for is_blocked, reason in checks: if is_blocked: - return (True, reason) + return True, reason - return (False, "") + return False, "" except ValueError: - return (False, "") + return False, ""
291-340: Strong SSRF protection implementation.The validation logic is comprehensive:
- HTTPS-only enforcement prevents protocol smuggling
- DNS resolution with
getaddrinfocovers both IPv4 and IPv6- All resolved IPs are checked against private/reserved ranges
- Proper exception handling with clear error messages
The implementation effectively blocks private IPs, localhost, link-local (including 169.254.169.254 cloud metadata), multicast, and reserved ranges as documented.
Minor suggestion: Consider renaming
ip_address(line 328) toresolved_ipto avoid confusion with theipaddressmodule:for info in addr_info: - ip_address = info[4][0] - is_blocked, reason = _is_private_ip(ip_address) + resolved_ip = info[4][0] + is_blocked, reason = _is_private_ip(resolved_ip) if is_blocked: raise ValueError( - f"Callback URL resolves to {reason} IP address: {ip_address}. " + f"Callback URL resolves to {reason} IP address: {resolved_ip}. " f"This IP type is not allowed for callbacks." )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
.env.example(1 hunks)backend/app/core/config.py(1 hunks)backend/app/tests/utils/test_callback_ssrf.py(1 hunks)backend/app/utils.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- backend/app/core/config.py
- .env.example
- backend/app/tests/utils/test_callback_ssrf.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/utils.py
🧬 Code graph analysis (1)
backend/app/utils.py (1)
backend/app/api/routes/threads.py (1)
send_callback(36-48)
🪛 Ruff (0.14.8)
backend/app/utils.py
9-9: typing.Dict is deprecated, use dict instead
(UP035)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (2)
backend/app/utils.py (2)
2-2: LGTM! Import additions are appropriate for SSRF protection.The new imports (
ipaddress,socket,urlparse) are necessary and correctly used for the URL validation and IP address checking functionality.Also applies to: 8-8, 10-10
342-389: Strong security improvements, but response size limiting mentioned in PR objectives is not visible in implementation.The SSRF protections are excellent:
- Pre-validation via
validate_callback_urlblocks malicious URLstrust_env=Falseprevents environment proxy hijackingallow_redirects=Falseprevents redirect-based SSRF- Proper error handling and logging
However, the PR summary states: "Adds a configurable limit on callback response size (controlled via an environment variable)," but I don't see explicit response size limiting in the code. While the function doesn't read the response body (it only checks the status code), requests may still buffer the response.
Consider one of these approaches:
Option 1: Stream response and limit size (if you need to verify response is reasonable)
response = session.post( callback_url, json=data, timeout=( settings.CALLBACK_CONNECT_TIMEOUT, settings.CALLBACK_READ_TIMEOUT, ), allow_redirects=False, + stream=True, ) response.raise_for_status() + + # Limit response size to prevent memory exhaustion + max_size = getattr(settings, 'CALLBACK_MAX_RESPONSE_SIZE', 1024 * 1024) # 1MB default + total_size = 0 + for chunk in response.iter_content(chunk_size=8192): + total_size += len(chunk) + if total_size > max_size: + logger.warning("[send_callback] Response size exceeded limit") + break + response.close()Option 2: Simply stream without reading (if you don't care about response body)
response = session.post( callback_url, json=data, timeout=( settings.CALLBACK_CONNECT_TIMEOUT, settings.CALLBACK_READ_TIMEOUT, ), allow_redirects=False, + stream=True, ) response.raise_for_status() + response.close() # Close without reading bodyCan you clarify if response size limiting was intentionally omitted or if it needs to be added per the PR objectives?
Summary
Target issue is #426
Checklist
Before submitting a pull request, please ensure that you mark these task.
fastapi run --reload app/main.pyordocker compose upin the repository root and test.Notes
send_callbackfunction by adding callback url validation to check if the url is a proper public https url and not one of these - private, multicast, link-local (which covers cloud metadata endpoints), localhost/loopback and reservedsession.trust_env=falsefor extra protection as it ignores environment proxiesSummary by CodeRabbit
New Features
Security Improvements
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.