security: fix 15 open CodeQL alerts #173

Merged
prsasattms merged 2 commits into main from fix/code-scanning-alerts
Jun 3, 2026

@prsasattms

Collaborator

Fixes all 15 open alerts at https://github.com/AzureCosmosDB/OmniVec/security/code-scanning.

Summary by rule

| Rule | Count | Fix |
| --- | --- | --- |
| py/full-ssrf | 4 | URL validators (_validate_blob_url x2, _validate_account_url) now rebuild the URL from validated components via urlunparse. Fragments + userinfo dropped; path/query percent-encoded. Sanitization boundary is explicit to static analyzers. |
| py/partial-ssrf | 3 | Agent-proxy endpoints in api/api.py (/api/agent/sessions[/...]) now run user and session_id through the existing safe_url_segment() helper before path concatenation; bad input rejected with 400. |
| py/stack-trace-exposure | 4 | Replaced str(e) in HTTP responses with a generic message + 12-char error_id. Full exception goes to logger.exception(...) server-side. Affects purge-source, eventgrid provisioning, and the docgrok model-healthcheck endpoint. |
| py/log-injection | 1 | delete_by_source_id in api/connectors/postgres_connector.py strips CR/LF and truncates source_id before logger.info. |
| js/incomplete-sanitization | 2 | Inline onclick=agentDecide(...) in web/static/index.html now escapes \ before ', so a tool name containing a literal backslash cannot break out of the JS string. |
| js/identity-replacement | 1 | Dropped the no-op .replace('scene','scene') in web/static/intro.html. |

Files changed

  • api/api.py
  • api/connectors/blob_connector.py
  • api/connectors/postgres_connector.py
  • docgrok/admin.py (also adds logging.getLogger(__name__))
  • docgrok/api.py
  • docgrok/services/embedding/bge/api.py
  • web/static/index.html
  • web/static/intro.html

Validation

  • python -m py_compile clean on every changed .py file.
  • Outbound allowlist behavior preserved: only Azure storage suffixes (and any host in OUTBOUND_HOST_ALLOWLIST / BLOB_ACCOUNT_HOST_ALLOWLIST) — matches the stated policy of internal + Azure-only outbound calls.
  • No public API contract changes; error responses gain an error_id field but keep the same shape.

Addresses all 15 open CodeQL alerts on AzureCosmosDB/OmniVec.

py/full-ssrf (4 alerts):
- docgrok/services/embedding/bge/api.py: rebuild blob URL from
  validated components inside _validate_blob_url so the sanitizer
  boundary is explicit; drops fragment/userinfo, percent-encodes
  path/query.
- docgrok/api.py: same treatment for _validate_blob_url.
- api/connectors/blob_connector.py: same treatment for
  _validate_account_url (account URL is reduced to scheme://host[:port]).
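
The account-URL reduction described above could look roughly like the sketch below. It is not the repository's code: the function name, the https-only rule, and the single suffix in `ALLOWED_SUFFIXES` are assumptions made for illustration.

```python
from urllib.parse import urlparse, urlunparse

# Assumed allowlist; the PR only says "Azure storage suffixes".
ALLOWED_SUFFIXES = (".blob.core.windows.net",)


def validate_account_url(raw: str) -> str:
    """Return scheme://host[:port] rebuilt from validated components."""
    parts = urlparse(raw)
    scheme = parts.scheme.lower()
    if scheme != "https":  # assumption: only https is accepted
        raise ValueError("only https URLs are allowed")
    # .hostname excludes userinfo and port, so credentials are dropped here.
    host = (parts.hostname or "").lower()
    if not host.endswith(ALLOWED_SUFFIXES):
        raise ValueError("host is not on the allowlist")
    port = parts.port  # raises ValueError if the port is not numeric
    netloc = host if port is None else f"{host}:{port}"
    # Path, params, query and fragment are deliberately left empty.
    return urlunparse((scheme, netloc, "", "", "", ""))
```

Returning a freshly built string, rather than the input after a check, is what makes the sanitization boundary visible to a static analyzer.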

py/partial-ssrf (3 alerts in api/api.py):
- agent_session_approvals / agent_sessions_list / agent_session_get /
  agent_session_delete: validate caller_id and session_id with
  safe_url_segment() before splicing into the agent proxy URL. Rejects
  path-traversal/scheme-injection input with 400.
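
A minimal sketch of a segment validator in the spirit of safe_url_segment() follows. The pattern is the one quoted later in this thread; the `BadSegment` type and the explicit rejection of "." and ".." are assumptions, since the real helper's body is not shown here.

```python
import re

# Pattern quoted later in this thread for safe_url_segment().
_SEGMENT_RE = re.compile(r"[A-Za-z0-9_.-]{1,128}")


class BadSegment(ValueError):
    """Hypothetical error type; an HTTP handler would map it to a 400."""


def safe_url_segment(value: str) -> str:
    # fullmatch anchors the pattern at both ends of the string.
    # "." and ".." match the pattern, so they are rejected explicitly
    # (an assumption; the source does not say how the helper treats them).
    if not _SEGMENT_RE.fullmatch(value) or value in {".", ".."}:
        raise BadSegment("invalid path segment")
    return value
```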

py/stack-trace-exposure (4 alerts):
- api/api.py purge_source_pipelines: replace raw str(e) in response
  with a generic message + correlation error_id; log full exception
  server-side via logger.exception.
- api/api.py _provision_blob_eventgrid: same correlation-id pattern.
- docgrok/admin.py healthcheck: same — no longer leak MI token or
  httpx error details to caller, log them instead.
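
The correlation-id pattern might be sketched as below. The `error_response` and `purge` names, the response fields other than error_id, and the use of uuid4 to generate the id are assumptions; the PR states only that the id is 12 characters long.

```python
import logging
import uuid

logger = logging.getLogger(__name__)


def error_response(action: str) -> dict:
    """Build a generic error body; call this from inside an except block."""
    error_id = uuid.uuid4().hex[:12]  # 12-char id; generation scheme assumed
    # logger.exception records the active traceback server-side only.
    logger.exception("%s failed (error_id=%s)", action, error_id)
    return {"ok": False, "detail": f"{action} failed", "error_id": error_id}


def purge(source_id: str) -> dict:
    try:
        raise RuntimeError("secret connection string in message")
    except Exception:
        return error_response("purge")
```

The caller sees only the generic message and the id, which an operator can then search for in the server logs.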

py/log-injection (1 alert):
- api/connectors/postgres_connector.py delete_by_source_id: strip CR/LF
  and truncate source_id before passing to logger.info.
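
A small sketch of the strip-and-truncate step, assuming a helper named `sanitize_for_log` and a 200-character cap (neither the name nor the limit appears in the PR):

```python
def sanitize_for_log(value: object, max_len: int = 200) -> str:
    """Remove CR/LF so a value cannot forge extra log lines, then truncate."""
    text = str(value).replace("\r", "").replace("\n", "")
    return text[:max_len]
```

A call site would then read something like `logger.info("deleting source %s", sanitize_for_log(source_id))`.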

js/incomplete-sanitization (2 alerts):
- web/static/index.html: escape backslashes before single quotes in
  inline onclick=agentDecide(...) handler so a tool name containing
  a literal backslash cannot break out of the JS string literal.
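
The ordering issue can be illustrated with plain string replacement. The sketch below is Python that mirrors the JavaScript logic rather than the actual handler code, and both function names are hypothetical. It covers only the JS string literal, not HTML attribute escaping.

```python
def escape_quote_only(value: str) -> str:
    # Incomplete: a backslash already in the input is left unescaped.
    return value.replace("'", "\\'")


def escape_for_js_string(value: str) -> str:
    # Escape backslashes first, then single quotes, so the backslash
    # added for a quote is never confused with one from the input.
    return value.replace("\\", "\\\\").replace("'", "\\'")
```

With the input `a\'b`, the quote-only version yields `a\\'b`, which a JS parser reads as an escaped backslash followed by a string-terminating quote. The two-step version yields `a\\\'b`, which stays inside the literal.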

js/identity-replacement (1 alert):
- web/static/intro.html: drop the no-op .replace('scene','scene') —
  the preceding .replace('wires','scene') already produces the scene id.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-advanced-security

Copy link
Copy Markdown

You are seeing this message because GitHub Code Scanning has recently been set up for this repository, or this pull request contains the workflow file for the Code Scanning tool.

What Enabling Code Scanning Means:

  • The 'Security' tab will display more code scanning analysis results (e.g., for the default branch).
  • Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results.
  • You will be able to see the analysis results for the pull request's branch on this overview once the scans have completed and the checks have passed.

For more information about GitHub Code Scanning, check out the documentation.

Comment thread docgrok/admin.py
-     except Exception as e:
-         return {"ok": False, "status": 0, "detail": f"failed to acquire MI token: {str(e)[:150]}"}
+     except Exception:
+         logger.exception("failed to acquire MI token for model %s", model_id)
Comment thread docgrok/admin.py
-     except Exception as e:
-         return {"ok": False, "status": 0, "detail": f"request failed: {str(e)[:150]}"}
+     except Exception:
+         logger.exception("healthcheck request failed for model %s", model_id)

- URL validators no longer percent-encode path/query during the
  urlunparse rebuild. Only userinfo + fragment are dropped, scheme
  is lower-cased and host validated. Verified byte-identical
  round-trip on a representative Azure SAS URL so the signature
  query string survives untouched.

- blob_connector._validate_account_url now preserves path and query
  of the input (previously dropped to '') so callers that pass a
  container path or SAS continue to work.

- Agent-proxy partial-SSRF fix switched from safe_url_segment()
  (regex [A-Za-z0-9_.-]{1,128} — would 400 any caller whose auth
  name contains '@', '+', etc.) to urllib.parse.quote(value, safe='').
  This percent-encodes '/', '?' and '#' so they cannot escape
  the path segment (quote() never escapes '.', so a bare '..' passes
  through unchanged), but accepts emails, UPNs and GUIDs verbatim.
  The downstream agent server URL-decodes path segments per HTTP
  spec, so the value it observes is unchanged.
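
The quoting approach can be sketched as follows. The path layout matches the review threads below, while the `agent_session_path` wrapper is a hypothetical name used only for illustration.

```python
from urllib.parse import quote, unquote


def agent_session_path(user: str, session_id: str) -> str:
    # safe="" means "/" is encoded too, so a value cannot add path segments.
    # "." is never escaped by quote(), so ".." stays literal.
    safe_user = quote(user, safe="")
    safe_sid = quote(session_id, safe="")
    return f"/v1/sessions/{safe_user}/{safe_sid}"


path = agent_session_path("alice@example.com", "../admin?x=1#f")
# Decoding the last segment recovers the original value unchanged.
original = unquote(path.rsplit("/", 1)[1])
```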

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread api/api.py
      safe_sid = _urlquote(session_id, safe="")
      async with httpx.AsyncClient(timeout=httpx.Timeout(15.0)) as client:
-         r = await client.get(f"{_AGENT_URL}/v1/sessions/{user}/{session_id}/approvals", headers=_agent_headers(request))
+         r = await client.get(f"{_AGENT_URL}/v1/sessions/{safe_user}/{safe_sid}/approvals", headers=_agent_headers(request))
Comment thread api/api.py
      safe_sid = _urlquote(session_id, safe="")
      async with httpx.AsyncClient(timeout=httpx.Timeout(15.0)) as client:
-         r = await client.get(f"{_AGENT_URL}/v1/sessions/{user}/{session_id}", headers=_agent_headers(request))
+         r = await client.get(f"{_AGENT_URL}/v1/sessions/{safe_user}/{safe_sid}", headers=_agent_headers(request))
Comment thread api/api.py
      safe_sid = _urlquote(session_id, safe="")
      async with httpx.AsyncClient(timeout=httpx.Timeout(15.0)) as client:
-         r = await client.delete(f"{_AGENT_URL}/v1/sessions/{user}/{session_id}", headers=_agent_headers(request))
+         r = await client.delete(f"{_AGENT_URL}/v1/sessions/{safe_user}/{safe_sid}", headers=_agent_headers(request))
prsasattms merged commit 96195c6 into main on Jun 3, 2026
9 of 10 checks passed
prsasattms deleted the fix/code-scanning-alerts branch on June 3, 2026, 12:12
prsasattms added a commit that referenced this pull request Jun 3, 2026
…ow-ups (#174)

Add safe_agent_segment() helper that validates caller-id and session-id
via a strict regex allowlist (alphanum + _.+@- only). CodeQL recognizes
regex-anchored allowlists as proper sanitizers; the previous quote(safe='')
approach only percent-encoded, which CodeQL still considered tainted for
partial-ssrf because the value still flowed into the URL.
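
A sketch of what a regex-allowlist helper of this kind might look like. The character set follows the commit message; the 1 to 128 length bound and the ValueError are assumptions, and this is not the helper's actual body.

```python
import re

# Character set from the commit message; the length bound is assumed.
_AGENT_SEGMENT_RE = re.compile(r"[A-Za-z0-9_.+@-]{1,128}")


def safe_agent_segment(value: str) -> str:
    """Return value unchanged if it matches the allowlist, else raise."""
    # fullmatch anchors the pattern at both ends of the string.
    if not _AGENT_SEGMENT_RE.fullmatch(value):
        raise ValueError("invalid agent path segment")
    return value
```

As written, this pattern also accepts dot-only values such as "..", so a real helper may need a separate check for those.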

Also sanitize model_id before logger.exception() in docgrok/admin.py to
address the two new py/log-injection alerts.

Follow-up to PR #173. Resolves 5 alerts opened by that PR.

Co-authored-by: Pradeep Sasatt <prsasatt@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

3 participants