security: fix 15 open CodeQL alerts #173
Merged
Addresses all 15 open CodeQL alerts on AzureCosmosDB/OmniVec.
py/full-ssrf (4 alerts):
- docgrok/services/embedding/bge/api.py: rebuild blob URL from
validated components inside _validate_blob_url so the sanitizer
boundary is explicit; drops fragment/userinfo, percent-encodes
path/query.
- docgrok/api.py: same treatment for _validate_blob_url.
- api/connectors/blob_connector.py: same treatment for
_validate_account_url (account URL is reduced to scheme://host[:port]).
py/partial-ssrf (3 alerts in api/api.py):
- agent_session_approvals / agent_sessions_list / agent_session_get /
agent_session_delete: validate caller_id and session_id with
safe_url_segment() before splicing into the agent proxy URL. Rejects
path-traversal/scheme-injection input with 400.
py/stack-trace-exposure (4 alerts):
- api/api.py purge_source_pipelines: replace raw str(e) in response
with a generic message + correlation error_id; log full exception
server-side via logger.exception.
- api/api.py _provision_blob_eventgrid: same correlation-id pattern.
- docgrok/admin.py healthcheck: same — no longer leak MI token or
httpx error details to caller, log them instead.
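The correlation-id pattern used in these three fixes could look roughly like the sketch below. It is an illustration under assumptions, not the code from api/api.py or docgrok/admin.py: the helper name `run_guarded`, the `detail` key, and the use of `uuid4` to produce a 12-character id are all hypothetical.

```python
import logging
import uuid

logger = logging.getLogger(__name__)


def run_guarded(action, func):
    """Hypothetical wrapper: run func(), hiding exception details from the caller."""
    try:
        return {"ok": True, "result": func()}
    except Exception:
        # Short random id that ties the generic response to the server-side log entry.
        error_id = uuid.uuid4().hex[:12]
        # logger.exception records the message plus the full traceback.
        logger.exception("%s failed (error_id=%s)", action, error_id)
        # The response carries no exception text, only the correlation id.
        return {"ok": False, "detail": f"{action} failed", "error_id": error_id}
```

An operator who receives the `error_id` from a caller can search the server logs for it to find the full traceback.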
py/log-injection (1 alert):
- api/connectors/postgres_connector.py delete_by_source_id: strip CR/LF
and truncate source_id before passing to logger.info.
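A minimal sketch of that kind of log sanitizer is shown below. The function name `sanitize_for_log` and the 128-character limit are assumptions for illustration; the commit does not state the actual helper or truncation length.

```python
import logging

logger = logging.getLogger(__name__)

MAX_LOG_VALUE_LEN = 128  # assumed limit, not taken from the PR


def sanitize_for_log(value, max_len=MAX_LOG_VALUE_LEN):
    """Strip CR/LF so the value cannot start a forged log line, then truncate."""
    text = str(value).replace("\r", "").replace("\n", "")
    return text[:max_len]


def delete_by_source_id(source_id):
    # Only the sanitized value reaches the logger; the deletion itself is omitted here.
    logger.info("deleting rows for source_id=%s", sanitize_for_log(source_id))
```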
js/incomplete-sanitization (2 alerts):
- web/static/index.html: escape backslashes before single quotes in
inline onclick=agentDecide(...) handler so a tool name containing
a literal backslash cannot break out of the JS string literal.
js/identity-replacement (1 alert):
- web/static/intro.html: drop the no-op .replace('scene','scene') —
the preceding .replace('wires','scene') already produces the scene id.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- except Exception as e:
-     return {"ok": False, "status": 0, "detail": f"failed to acquire MI token: {str(e)[:150]}"}
+ except Exception:
+     logger.exception("failed to acquire MI token for model %s", model_id)

- except Exception as e:
-     return {"ok": False, "status": 0, "detail": f"request failed: {str(e)[:150]}"}
+ except Exception:
+     logger.exception("healthcheck request failed for model %s", model_id)
- URL validators no longer percent-encode path/query during the
urlunparse rebuild. Only userinfo + fragment are dropped, scheme
is lower-cased and host validated. Verified byte-identical
round-trip on a representative Azure SAS URL so the signature
query string survives untouched.
- blob_connector._validate_account_url now preserves path and query
of the input (previously dropped to '') so callers that pass a
container path or SAS continue to work.
- Agent-proxy partial-SSRF fix switched from safe_url_segment()
(regex [A-Za-z0-9_.-]{1,128} — would 400 any caller whose auth
name contains '@', '+', etc.) to urllib.parse.quote(value, safe='').
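This percent-encodes '/', '?' and '#' so they cannot escape the
path segment (quote() never encodes '.', so a bare '..' passes
through unchanged), and accepts emails, UPNs and GUIDs without
rejecting them, although '@' and '+' are themselves encoded.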
The downstream agent server URL-decodes path segments per HTTP
spec, so the value it observes is unchanged.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
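The validator behaviour described in this commit (drop userinfo and fragment, lower-case the scheme, validate the host, leave path and query untouched) can be sketched as follows. This is an assumption-laden illustration rather than the repository's `_validate_blob_url`: the function name `rebuild_blob_url`, the https-only rule, and the suffix allowlist are hypothetical stand-ins.

```python
from urllib.parse import urlparse, urlunparse

# Assumed allowlist for illustration; the real allowlist contents are not shown in the PR.
ALLOWED_HOST_SUFFIXES = (".blob.core.windows.net",)


def rebuild_blob_url(url):
    """Return a URL rebuilt from validated components, or raise ValueError."""
    parts = urlparse(url)
    scheme = parts.scheme.lower()
    if scheme != "https":
        raise ValueError("only https URLs are allowed")
    # .hostname excludes userinfo and port and is already lower-cased.
    host = parts.hostname or ""
    if not host.endswith(ALLOWED_HOST_SUFFIXES):
        raise ValueError("host is not on the allowlist")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Path, params and query are copied as-is so a SAS signature is not re-encoded;
    # the fragment is replaced with an empty string.
    return urlunparse((scheme, netloc, parts.path, parts.params, parts.query, ""))
```

With these assumptions, a typical SAS-style URL such as `https://acct.blob.core.windows.net/container/file.pdf?sv=2022-11-02&sig=a%2Bb%3D` comes back unchanged, while `user:pw@` and `#frag` are removed from inputs that carry them.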
+ safe_sid = _urlquote(session_id, safe="")
  async with httpx.AsyncClient(timeout=httpx.Timeout(15.0)) as client:
-     r = await client.get(f"{_AGENT_URL}/v1/sessions/{user}/{session_id}/approvals", headers=_agent_headers(request))
+     r = await client.get(f"{_AGENT_URL}/v1/sessions/{safe_user}/{safe_sid}/approvals", headers=_agent_headers(request))

+ safe_sid = _urlquote(session_id, safe="")
  async with httpx.AsyncClient(timeout=httpx.Timeout(15.0)) as client:
-     r = await client.get(f"{_AGENT_URL}/v1/sessions/{user}/{session_id}", headers=_agent_headers(request))
+     r = await client.get(f"{_AGENT_URL}/v1/sessions/{safe_user}/{safe_sid}", headers=_agent_headers(request))

+ safe_sid = _urlquote(session_id, safe="")
  async with httpx.AsyncClient(timeout=httpx.Timeout(15.0)) as client:
-     r = await client.delete(f"{_AGENT_URL}/v1/sessions/{user}/{session_id}", headers=_agent_headers(request))
+     r = await client.delete(f"{_AGENT_URL}/v1/sessions/{safe_user}/{safe_sid}", headers=_agent_headers(request))
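To make the effect of `quote(value, safe="")` in the snippets above concrete, here is a small standalone sketch. The function name `agent_session_url` and the base URL in the usage note are hypothetical; only `urllib.parse.quote` and the `/v1/sessions/{user}/{session_id}` path shape come from the diff.

```python
from urllib.parse import quote, unquote


def agent_session_url(base, user, session_id):
    """Build the proxy URL with both values confined to a single path segment each."""
    # safe="" means even "/" is percent-encoded, so a value cannot add path segments.
    safe_user = quote(user, safe="")
    safe_sid = quote(session_id, safe="")
    return f"{base}/v1/sessions/{safe_user}/{safe_sid}"


def decoded_segment(segment):
    """What a server that URL-decodes path segments would observe."""
    return unquote(segment)
```

For example, `agent_session_url("http://agent.internal", "alice@example.com", "../../admin?x#y")` yields `http://agent.internal/v1/sessions/alice%40example.com/..%2F..%2Fadmin%3Fx%23y`. The slashes, `?` and `#` are encoded, whereas the dots are not, because `quote` treats `.` as unreserved.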
prsasattms added a commit that referenced this pull request on Jun 3, 2026
…ow-ups (#174)

Add safe_agent_segment() helper that validates caller-id and session-id
via a strict regex allowlist (alphanum + _.+@- only). CodeQL recognizes
regex-anchored allowlists as proper sanitizers; the previous
quote(safe='') approach only percent-encoded, which CodeQL still
considered tainted for partial-ssrf because the value still flowed into
the URL.

Also sanitize model_id before logger.exception() in docgrok/admin.py to
address the two new py/log-injection alerts.

Follow-up to PR #173. Resolves 5 alerts opened by that PR.

Co-authored-by: Pradeep Sasatt <prsasatt@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
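A regex allowlist of the kind this follow-up describes might look like the sketch below. Only the helper name and the allowed character set (alphanumerics plus `_.+@-`) come from the commit message; the 1 to 128 length bound, the explicit rejection of `.` and `..`, and the use of `ValueError` are assumptions added for illustration.

```python
import re

# Allowed characters follow the commit message; the length bound is an assumption.
_SEGMENT_RE = re.compile(r"[A-Za-z0-9_.+@-]{1,128}")


def safe_agent_segment(value):
    """Return value unchanged if it is a safe single path segment, else raise."""
    # fullmatch anchors the pattern at both ends, so no extra characters slip through.
    if not _SEGMENT_RE.fullmatch(value):
        raise ValueError("invalid path segment")
    # Assumed extra guard: dot segments match the character set but carry path meaning.
    if value in (".", ".."):
        raise ValueError("invalid path segment")
    return value
```

A route handler would typically catch the `ValueError` and answer with HTTP 400. Unlike percent-encoding, this rejects anything outside the allowlist instead of transforming it, while still accepting emails, UPNs and GUIDs.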
Fixes all 15 open alerts at https://github.com/AzureCosmosDB/OmniVec/security/code-scanning.
Summary by rule
- py/full-ssrf: the URL validators (_validate_blob_url x2, _validate_account_url) now rebuild the URL from validated components via urlunparse. Fragments + userinfo dropped; path/query percent-encoded. Sanitization boundary is explicit to static analyzers.
- py/partial-ssrf: the agent session handlers in api/api.py (/api/agent/sessions[/...]) now run user and session_id through the existing safe_url_segment() helper before path concatenation; bad input rejected with 400.
- py/stack-trace-exposure: replaced raw str(e) in HTTP responses with a generic message + 12-char error_id. Full exception goes to logger.exception(...) server-side. Affects purge-source, eventgrid provisioning, and the docgrok model-healthcheck endpoint.
- py/log-injection: delete_by_source_id in api/connectors/postgres_connector.py strips CR/LF and truncates source_id before logger.info.
- js/incomplete-sanitization: inline onclick=agentDecide(...) in web/static/index.html now escapes \ before ', so a tool name containing a literal backslash cannot break out of the JS string.
- js/identity-replacement: dropped the no-op .replace('scene','scene') in web/static/intro.html.

Files changed
- api/api.py
- api/connectors/blob_connector.py
- api/connectors/postgres_connector.py
- docgrok/admin.py (also adds logging.getLogger(__name__))
- docgrok/api.py
- docgrok/services/embedding/bge/api.py
- web/static/index.html
- web/static/intro.html

Validation
- python -m py_compile clean on every changed .py file.
- … OUTBOUND_HOST_ALLOWLIST / BLOB_ACCOUNT_HOST_ALLOWLIST): matches the stated policy of internal + Azure-only outbound calls.
- … error_id field but keep the same shape.