security: stricter sanitizers for partial-ssrf and log-injection follow-ups#174
Merged
Conversation
…ow-ups Add safe_agent_segment() helper that validates caller-id and session-id via a strict regex allowlist (alphanum + _.+@- only). CodeQL recognizes regex-anchored allowlists as proper sanitizers; the previous quote(safe='') approach only percent-encoded, which CodeQL still considered tainted for partial-ssrf because the value still flowed into the URL. Also sanitize model_id before logger.exception() in docgrok/admin.py to address the two new py/log-injection alerts. Follow-up to PR #173. Resolves 5 alerts opened by that PR. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Follow-up to #173. The CodeQL re-scan on
mainafter #173 merged closed all 15 original alerts but opened 5 new ones at the same locations because the fix patterns weren't strict enough:py/partial-ssrfatapi/api.py:1146/1183/1195—urllib.parse.quote(value, safe='')percent-encodes but CodeQL doesn't treat it as a sanitizer (the value still flows into the URL).py/log-injectionatdocgrok/admin.py:328/340— newlogger.exception(...)calls passmodel_id(user-controlled) directly into the log format string.Fix
api/security_utils.py: newsafe_agent_segment(value)helper that validates against a strict regex allowlist (^[A-Za-z0-9_.+@-]{1,256}$) and raises ValueError otherwise. Accepts UPNs/emails/GUIDs/IDs but rejects/?#and control chars. Returns the value unchanged — the regex itself is the sanitizer CodeQL recognizes.api/api.py: all 4 agent-proxy endpoints (sessions list/get/delete/approvals) now usesafe_agent_segmentwrapped in try/except → HTTP 400 on bad input.docgrok/admin.py:model_idis sanitized via.replace('\r',' ').replace('\n',' ')[:128]before being passed tologger.exception.Verification
Will be confirmed by the post-merge CodeQL re-scan: expected = 0 open alerts on
main.