security: fix 15 open CodeQL alerts by prsasattms · Pull Request #173 · AzureCosmosDB/OmniVec

prsasattms · 2026-06-03T09:25:51Z

Fixes all 15 open alerts at https://github.com/AzureCosmosDB/OmniVec/security/code-scanning.

Summary by rule

| Rule | Count | Fix |
|---|---|---|
| `py/full-ssrf` | 4 | URL validators (`_validate_blob_url` x2, `_validate_account_url`) now rebuild the URL from validated components via `urlunparse`. Fragments + userinfo dropped; path/query percent-encoded. Sanitization boundary is explicit to static analyzers. |
| `py/partial-ssrf` | 3 | Agent-proxy endpoints in `api/api.py` (`/api/agent/sessions[/...]`) now run `user` and `session_id` through the existing `safe_url_segment()` helper before path concatenation; bad input rejected with 400. |
| `py/stack-trace-exposure` | 4 | Replaced `str(e)` in HTTP responses with a generic message + 12-char `error_id`. Full exception goes to `logger.exception(...)` server-side. Affects purge-source, eventgrid provisioning, and the docgrok model-healthcheck endpoint. |
| `py/log-injection` | 1 | `delete_by_source_id` in `api/connectors/postgres_connector.py` strips CR/LF and truncates `source_id` before `logger.info`. |
| `js/incomplete-sanitization` | 2 | Inline `onclick=agentDecide(...)` in `web/static/index.html` now escapes `\` before `'`, so a tool name containing a literal backslash cannot break out of the JS string. |
| `js/identity-replacement` | 1 | Dropped the no-op `.replace('scene','scene')` in `web/static/intro.html`. |
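The `js/incomplete-sanitization` row hinges on escaping order. The sketch below models only the JavaScript string layer in Python (not the HTML attribute layer of the inline `onclick`), and the scanner understands just the `\\` and `\'` escapes. `naive_escape`, `escape_js_single_quoted` and `js_literal_end` are hypothetical names, not code from this PR.

```python
def naive_escape(s: str) -> str:
    # Escapes only the single quote: the pattern CodeQL flags as incomplete.
    return s.replace("'", "\\'")


def escape_js_single_quoted(s: str) -> str:
    # Escape backslashes first, then quotes, so a backslash already in the
    # input cannot pair up with the backslash added in front of a quote.
    return s.replace("\\", "\\\\").replace("'", "\\'")


def js_literal_end(body: str) -> int:
    """Index where a JS '...' literal whose contents start at body[0] would
    close, or -1 if it stays open. Models only the \\\\ and \\' escapes."""
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            i += 2  # skip the escaped character
            continue
        if ch == "'":
            return i
        i += 1
    return -1


payload = "x\\');alert(1);//"  # a tool name containing a literal backslash
print(js_literal_end(naive_escape(payload)))             # 3: literal closes early
print(js_literal_end(escape_js_single_quoted(payload)))  # -1: stays inside the string
```

With the naive version the input backslash escapes the added backslash, leaving the quote free to close the literal; escaping `\` first removes that pairing.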

Files changed

- api/api.py
- api/connectors/blob_connector.py
- api/connectors/postgres_connector.py
- docgrok/admin.py (also adds logging.getLogger(__name__))
- docgrok/api.py
- docgrok/services/embedding/bge/api.py
- web/static/index.html
- web/static/intro.html

Validation

- `python -m py_compile` runs clean on every changed `.py` file.
- Outbound allowlist behavior is preserved: only hosts with Azure Storage suffixes (plus any host in OUTBOUND_HOST_ALLOWLIST / BLOB_ACCOUNT_HOST_ALLOWLIST) are permitted, which matches the stated policy of internal and Azure-only outbound calls.
- No public API contract changes: error responses gain an `error_id` field but otherwise keep the same shape.
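A minimal sketch of the suffix-based host check the Validation notes describe, assuming `OUTBOUND_HOST_ALLOWLIST` holds a comma-separated list of hostnames. The suffix tuple and `host_is_allowed` are illustrative assumptions, not the repository's code; `BLOB_ACCOUNT_HOST_ALLOWLIST` would be read the same way.

```python
import os
from urllib.parse import urlsplit

# Assumed set of Azure Storage endpoint suffixes (public cloud only).
# The leading dot stops look-alikes such as "xblob.core.windows.net".
AZURE_STORAGE_SUFFIXES = (
    ".blob.core.windows.net",
    ".dfs.core.windows.net",
    ".queue.core.windows.net",
    ".table.core.windows.net",
    ".file.core.windows.net",
)


def extra_allowed_hosts() -> set:
    # Assumption: a comma-separated list of exact hostnames.
    raw = os.environ.get("OUTBOUND_HOST_ALLOWLIST", "")
    return {h.strip().lower() for h in raw.split(",") if h.strip()}


def host_is_allowed(url: str) -> bool:
    # .hostname drops userinfo and port and lower-cases the host.
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return False
    if host in extra_allowed_hosts():
        return True
    return host.endswith(AZURE_STORAGE_SUFFIXES)


print(host_is_allowed("https://acct.blob.core.windows.net/container/blob"))
print(host_is_allowed("https://blob.core.windows.net.evil.example/"))
```

Matching on the parsed hostname rather than on the raw string is what keeps tricks like a trailing `.evil.example` from passing a naive `in` or `startswith` test.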

Addresses all 15 open CodeQL alerts on AzureCosmosDB/OmniVec.

py/full-ssrf (4 alerts):
- docgrok/services/embedding/bge/api.py: rebuild the blob URL from validated components inside _validate_blob_url so the sanitizer boundary is explicit; drops fragment/userinfo, percent-encodes path/query.
- docgrok/api.py: same treatment for _validate_blob_url.
- api/connectors/blob_connector.py: same treatment for _validate_account_url (the account URL is reduced to scheme://host[:port]).

py/partial-ssrf (3 alerts in api/api.py):
- agent_session_approvals / agent_sessions_list / agent_session_get / agent_session_delete: validate caller_id and session_id with safe_url_segment() before splicing them into the agent proxy URL. Path-traversal and scheme-injection input is rejected with 400.

py/stack-trace-exposure (4 alerts):
- api/api.py purge_source_pipelines: replace the raw str(e) in the response with a generic message plus a correlation error_id; log the full exception server-side via logger.exception.
- api/api.py _provision_blob_eventgrid: same correlation-id pattern.
- docgrok/admin.py healthcheck: same; no longer leaks MI token or httpx error details to the caller, logs them instead.

py/log-injection (1 alert):
- api/connectors/postgres_connector.py delete_by_source_id: strip CR/LF and truncate source_id before passing it to logger.info.

js/incomplete-sanitization (2 alerts):
- web/static/index.html: escape backslashes before single quotes in the inline onclick=agentDecide(...) handler so a tool name containing a literal backslash cannot break out of the JS string literal.

js/identity-replacement (1 alert):
- web/static/intro.html: drop the no-op .replace('scene','scene'); the preceding .replace('wires','scene') already produces the scene id.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
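The `py/log-injection` fix amounts to neutralizing line breaks before a user-controlled value reaches the log. A minimal sketch, assuming a 200-character cap (the PR does not state the truncation length); `sanitize_for_log` and `log_delete` are hypothetical helpers, not the connector's actual code.

```python
import logging

logger = logging.getLogger(__name__)


def sanitize_for_log(value: object, max_len: int = 200) -> str:
    # Remove CR/LF so the value cannot start a forged log line, then
    # truncate so a huge value cannot flood the log.
    text = str(value).replace("\r", "").replace("\n", "")
    return text[:max_len]


def log_delete(source_id: str) -> None:
    # Hypothetical stand-in for the logging step inside delete_by_source_id.
    logger.info("deleting documents for source_id=%s", sanitize_for_log(source_id))


print(sanitize_for_log("abc\r\nINFO fake entry"))  # abcINFO fake entry
```

Passing the cleaned value as a `%s` argument (rather than f-string formatting) also keeps formatting deferred until the record is actually emitted.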

github-advanced-security · 2026-06-03T09:28:08Z

You are seeing this message because GitHub Code Scanning has recently been set up for this repository, or this pull request contains the workflow file for the Code Scanning tool.

What Enabling Code Scanning Means:

The 'Security' tab will display more code scanning analysis results (e.g., for the default branch).
Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results.
You will be able to see the analysis results for the pull request's branch on this overview once the scans have completed and the checks have passed.

For more information about GitHub Code Scanning, check out the documentation.

```diff
-        except Exception as e:
-            return {"ok": False, "status": 0, "detail": f"failed to acquire MI token: {str(e)[:150]}"}
+        except Exception:
+            logger.exception("failed to acquire MI token for model %s", model_id)
```

```diff
-        except Exception as e:
-            return {"ok": False, "status": 0, "detail": f"request failed: {str(e)[:150]}"}
+        except Exception:
+            logger.exception("healthcheck request failed for model %s", model_id)
```


- URL validators no longer percent-encode path/query during the urlunparse rebuild. Only userinfo and fragment are dropped; the scheme is lower-cased and the host validated. Verified a byte-identical round-trip on a representative Azure SAS URL, so the signature query string survives untouched.
- blob_connector._validate_account_url now preserves the path and query of the input (previously dropped to '') so callers that pass a container path or SAS continue to work.
- The agent-proxy partial-SSRF fix switched from safe_url_segment() (regex [A-Za-z0-9_.-]{1,128}, which would return 400 for any caller whose auth name contains '@', '+', etc.) to urllib.parse.quote(value, safe=''). This percent-encodes '/', '?' and '#' so they cannot escape the path segment, but accepts emails, UPNs and GUIDs instead of rejecting them. quote() never encodes '.', so a value of exactly '..' passes through unchanged. The downstream agent server URL-decodes path segments per the HTTP spec, so the value it observes is the original one.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
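The first point of this commit can be sketched with `urllib.parse` alone: the scheme is lower-cased, userinfo and fragment are dropped, and path, params and query are passed through untouched so a SAS signature survives. `rebuild_url` is a hypothetical name, the http/https restriction is an assumption, and the host allowlist check is left out.

```python
from urllib.parse import urlparse, urlunparse


def rebuild_url(raw: str) -> str:
    p = urlparse(raw)
    scheme = p.scheme.lower()
    if scheme not in ("http", "https") or not p.hostname:
        raise ValueError("unsupported URL")
    host = p.hostname  # lower-cased by urllib, userinfo already stripped
    if ":" in host:
        host = f"[{host}]"  # re-bracket IPv6 literals
    netloc = f"{host}:{p.port}" if p.port is not None else host
    # Empty fragment drops '#...'; path, params and query are not re-encoded.
    return urlunparse((scheme, netloc, p.path, p.params, p.query, ""))


sas = "https://acct.blob.core.windows.net/c/b.pdf?sv=2022-11-02&sr=b&sp=r&sig=abc%2Bdef%3D"
print(rebuild_url(sas) == sas)  # True: byte-identical round-trip
print(rebuild_url("HTTPS://user:pw@Acct.blob.core.windows.net:443/c?x=1#frag"))
# https://acct.blob.core.windows.net:443/c?x=1
```

Because every output component comes from the parsed result rather than the raw string, a static analyzer can see where the untrusted URL stops and the rebuilt one starts.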

```diff
+    safe_sid = _urlquote(session_id, safe="")
     async with httpx.AsyncClient(timeout=httpx.Timeout(15.0)) as client:
-        r = await client.get(f"{_AGENT_URL}/v1/sessions/{user}/{session_id}/approvals", headers=_agent_headers(request))
+        r = await client.get(f"{_AGENT_URL}/v1/sessions/{safe_user}/{safe_sid}/approvals", headers=_agent_headers(request))
```

```diff
+    safe_sid = _urlquote(session_id, safe="")
     async with httpx.AsyncClient(timeout=httpx.Timeout(15.0)) as client:
-        r = await client.get(f"{_AGENT_URL}/v1/sessions/{user}/{session_id}", headers=_agent_headers(request))
+        r = await client.get(f"{_AGENT_URL}/v1/sessions/{safe_user}/{safe_sid}", headers=_agent_headers(request))
```

```diff
+    safe_sid = _urlquote(session_id, safe="")
     async with httpx.AsyncClient(timeout=httpx.Timeout(15.0)) as client:
-        r = await client.delete(f"{_AGENT_URL}/v1/sessions/{user}/{session_id}", headers=_agent_headers(request))
+        r = await client.delete(f"{_AGENT_URL}/v1/sessions/{safe_user}/{safe_sid}", headers=_agent_headers(request))
```


…ow-ups (#174)

Add a safe_agent_segment() helper that validates caller-id and session-id via a strict regex allowlist (alphanum + _.+@- only). CodeQL recognizes regex-anchored allowlists as proper sanitizers; the previous quote(safe='') approach only percent-encoded, which CodeQL still considered tainted for partial-ssrf because the value still flowed into the URL.

Also sanitize model_id before logger.exception() in docgrok/admin.py to address the two new py/log-injection alerts.

Follow-up to PR #173. Resolves 5 alerts opened by that PR.

Co-authored-by: Pradeep Sasatt <prsasatt@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
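A sketch of a regex-allowlist segment validator along the lines #174 describes. The 1 to 128 length bound is borrowed from the earlier `safe_url_segment()` pattern and is an assumption, as is the explicit rejection of `.` and `..` (the character class alone would accept them). This version raises `ValueError`; an endpoint would translate that into a 400 response.

```python
import re

# Letters, digits and _ . + @ - only; 1-128 characters (length bound assumed).
_AGENT_SEGMENT_RE = re.compile(r"[A-Za-z0-9_.+@-]{1,128}")


def safe_agent_segment(value: str) -> str:
    """Return value unchanged if it is a safe single path segment, else raise."""
    # fullmatch anchors both ends, so a trailing newline cannot slip through
    # the way it can with a `$` anchor.
    if not _AGENT_SEGMENT_RE.fullmatch(value):
        raise ValueError("invalid path segment")
    if value in (".", ".."):
        # Dot segments pass the character class but act as path navigation.
        raise ValueError("dot segments are not allowed")
    return value


for candidate in ["alice+test@contoso.com", "3f2504e0-4f89-11d3-9a0c-0305e82c3301",
                  "../admin", "..", "bob\n"]:
    try:
        print(repr(candidate), "accepted:", safe_agent_segment(candidate))
    except ValueError as exc:
        print(repr(candidate), "rejected:", exc)
```

Unlike percent-encoding, an allowlist either passes the value through verbatim or refuses it, which is the shape CodeQL treats as a sanitizer.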

github-advanced-security AI found potential problems Jun 3, 2026


prsasattms merged commit 96195c6 into main Jun 3, 2026
9 of 10 checks passed

prsasattms deleted the fix/code-scanning-alerts branch June 3, 2026 12:12

prsasattms mentioned this pull request Jun 3, 2026

security: stricter sanitizers for partial-ssrf and log-injection follow-ups #174

Merged

prsasattms merged 2 commits into main from fix/code-scanning-alerts

3 participants
