Add opt-in observed URL domain audit pipeline for tool responses and write sinks #7987

Merged
lpcox merged 6 commits into main from copilot/gateway-surface-observed-url-domains
Jun 23, 2026

Conversation

Copilot AI commented Jun 23, 2026

Safe-outputs currently redacts URLs for domains not explicitly allowlisted, which makes open-web research output unusable and provides no runtime visibility into what domains were actually encountered. This change adds an opt-in audit path in the gateway to surface observed domains as a session artifact that downstream safe-outputs logic can consume.

  • Config + runtime toggle

    • Added --url-domain-audit and MCP_GATEWAY_URL_DOMAIN_AUDIT.
    • Wired toggle into runtime config (Gateway.URLDomainAudit) and logger audit enablement.
  • Observed domain artifact

    • Added a dedicated logger that maintains observed-url-domains.json in the log directory.
    • Stores unique domains per source key (server ID / write-sink), updates atomically, and initializes alongside existing gateway loggers.
  • Middleware domain extraction (tool responses)

    • In jqschema middleware, when audit mode is enabled, extracts URL hostnames from response payload string leaves and records them under the backend server ID inferred from server___tool.
  • Write-sink body audit (safe-output writes)

    • In WriteSinkGuard.LabelResource, when audit mode is enabled, extracts URL hostnames from tool args/body, emits debug audit logs, and records domains to the same observed-domains artifact.
    • No enforcement behavior was added here; this is observability-only.
  • Shared URL extraction utility

    • Added internal/urlutil for recursive value-walk + hostname extraction to avoid duplicated parsing logic across middleware and guard paths.
  • Docs

    • Documented MCP_GATEWAY_URL_DOMAIN_AUDIT in environment variable reference.

    // CLI/env-controlled opt-in audit mode
    cmd.Flags().BoolVar(
        &urlDomainAudit,
        "url-domain-audit",
        envutil.GetEnvBool("MCP_GATEWAY_URL_DOMAIN_AUDIT", false),
        "Observe and persist URL domains seen in tool responses and safe-output writes",
    )

    // Runtime wiring
    applyFlagOrEnv(cmd, "url-domain-audit", &cfg.Gateway.URLDomainAudit, urlDomainAudit, false)
    logger.SetURLDomainAuditEnabled(cfg.Gateway.URLDomainAudit)
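
The observed-domain artifact described above (unique domains per source key, updated atomically) could look roughly like the sketch below. This is not the PR's logger: the `observedDomains` type, its methods, the JSON layout, and the `demo` helper are all hypothetical, and atomicity is assumed to come from writing a temp file in the same directory and renaming it over the target.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"sync"
)

// observedDomains is a hypothetical stand-in for the PR's logger. It keeps a
// set of domains per source key and rewrites the JSON artifact on change.
type observedDomains struct {
	mu      sync.Mutex
	path    string
	domains map[string]map[string]struct{}
}

func newObservedDomains(logDir string) *observedDomains {
	return &observedDomains{
		path:    filepath.Join(logDir, "observed-url-domains.json"),
		domains: make(map[string]map[string]struct{}),
	}
}

// Record adds hosts under sourceKey and persists only when a new host appears.
func (o *observedDomains) Record(sourceKey string, hosts []string) error {
	o.mu.Lock()
	defer o.mu.Unlock()
	if o.domains[sourceKey] == nil {
		o.domains[sourceKey] = make(map[string]struct{})
	}
	changed := false
	for _, h := range hosts {
		if _, seen := o.domains[sourceKey][h]; !seen {
			o.domains[sourceKey][h] = struct{}{}
			changed = true
		}
	}
	if !changed {
		return nil
	}
	return o.flushLocked()
}

// flushLocked writes a temp file next to the target and renames it into
// place, so readers should never see a partially written artifact.
func (o *observedDomains) flushLocked() error {
	out := make(map[string][]string, len(o.domains))
	for key, set := range o.domains {
		list := make([]string, 0, len(set))
		for h := range set {
			list = append(list, h)
		}
		sort.Strings(list)
		out[key] = list
	}
	data, err := json.MarshalIndent(out, "", "  ")
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(o.path), "observed-url-domains-*.tmp")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // cleans up on failure; fails harmlessly after rename
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), o.path)
}

// demo records domains for two source keys in a temp log directory and
// returns the resulting file contents.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "gateway-logs-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	store := newObservedDomains(dir)
	if err := store.Record("github", []string{"example.com", "api.github.com"}); err != nil {
		return "", err
	}
	if err := store.Record("write-sink", []string{"example.com"}); err != nil {
		return "", err
	}
	data, err := os.ReadFile(store.path)
	return string(data), err
}

func main() {
	fmt.Println(demo())
}
```

Sorting each list before marshalling keeps the file stable across runs, which makes it easier for a downstream consumer to diff. Skipping the write when nothing new was recorded avoids rewriting the file on every tool call.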

Copilot AI changed the title [WIP] Add support for safe-outputs URL domain surfacing Add opt-in observed URL domain audit pipeline for tool responses and write sinks Jun 23, 2026
Copilot AI requested a review from lpcox June 23, 2026 19:00
@lpcox lpcox marked this pull request as ready for review June 23, 2026 19:58
Copilot AI review requested due to automatic review settings June 23, 2026 19:58

Copilot AI left a comment

Pull request overview

This PR adds an opt-in URL domain audit pipeline to the MCP Gateway so operators can see which URL domains were encountered at runtime (from tool responses and write-sink bodies) and persist them as a log-directory artifact (observed-url-domains.json) for downstream safe-outputs consumption.

Changes:

  • Adds --url-domain-audit / MCP_GATEWAY_URL_DOMAIN_AUDIT, wires it into runtime config, and gates auditing via a global logger toggle.
  • Introduces an observed-url-domains.json logger that records unique domains per source key (server ID and write-sink) using atomic file updates.
  • Implements domain extraction from tool response payloads (middleware) and write-sink request bodies (guard) using a shared internal/urlutil helper, with unit/integration test coverage.
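
The recursive value-walk mentioned in the last bullet can be illustrated with a minimal sketch. It assumes the payload has been decoded by `encoding/json` into `any` values; `collectDomains`, `domainsInJSON`, and `urlPattern` are hypothetical names, and the pattern is illustrative rather than the one in `internal/urlutil`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"regexp"
	"sort"
	"strings"
)

// urlPattern is an illustrative pattern, not the one used in the PR.
var urlPattern = regexp.MustCompile(`https?://[^\s"'<>]+`)

// collectDomains walks a decoded JSON value and adds the hostname of every
// URL found in a string leaf to seen. Numbers, booleans, and nulls are skipped.
func collectDomains(v any, seen map[string]struct{}) {
	switch t := v.(type) {
	case string:
		for _, m := range urlPattern.FindAllString(t, -1) {
			u, err := url.Parse(m)
			if err != nil || u.Hostname() == "" {
				continue
			}
			seen[strings.ToLower(u.Hostname())] = struct{}{}
		}
	case []any:
		for _, item := range t {
			collectDomains(item, seen)
		}
	case map[string]any:
		for _, item := range t {
			collectDomains(item, seen)
		}
	}
}

// domainsInJSON decodes payload and returns the sorted unique hostnames.
func domainsInJSON(payload []byte) ([]string, error) {
	var v any
	if err := json.Unmarshal(payload, &v); err != nil {
		return nil, err
	}
	seen := make(map[string]struct{})
	collectDomains(v, seen)
	out := make([]string, 0, len(seen))
	for h := range seen {
		out = append(out, h)
	}
	sort.Strings(out)
	return out, nil
}

func main() {
	payload := []byte(`{"items":[{"html_url":"https://GitHub.com/o/r"},` +
		`{"body":"see http://docs.example.org/a and https://github.com/x"}],"count":2}`)
	domains, err := domainsInJSON(payload)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(domains)
}
```

Keeping the walk in one helper means the middleware and the guard only differ in which value they pass in and which source key they record under.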
Summary per file:

| File | Description |
| --- | --- |
| internal/urlutil/domains.go | New shared helper to extract URL hostnames from strings and recursively-walked JSON-like values. |
| internal/middleware/jqschema.go | Hooks audit into tool response handling and infers server ID from server___tool names. |
| internal/middleware/jqschema_test.go | Adds unit tests for URL domain extraction helpers. |
| internal/middleware/jqschema_integration_test.go | Adds integration test ensuring observed-url-domains.json is written when audit is enabled. |
| internal/logger/registry.go | Registers the observed-domains logger in gateway init/close sequences. |
| internal/logger/observed_url_domains_logger.go | New logger implementation that persists observed domains per server ID to JSON. |
| internal/logger/init_test.go | Updates registry/init expectations to include the new logger and file. |
| internal/logger/global_helpers.go | Extends the generic logger constraint to include the new logger type. |
| internal/guard/write_sink.go | Audits URL domains found in write-sink tool args/body when enabled. |
| internal/guard/write_sink_test.go | Adds tests validating write-sink auditing writes to observed-url-domains.json. |
| internal/config/config_core.go | Adds a runtime-only Gateway.URLDomainAudit flag (not loaded from config files). |
| internal/cmd/root.go | Wires the CLI/env flag into runtime config and enables auditing in the logger package. |
| internal/cmd/flags_test.go | Ensures --url-domain-audit flag is registered. |
| internal/cmd/flags_logging.go | Registers the new CLI flag with env-default behavior. |
| docs/ENVIRONMENT_VARIABLES.md | Documents MCP_GATEWAY_URL_DOMAIN_AUDIT and the output artifact file. |
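
As an illustration of how a server ID might be inferred from a `server___tool` name, here is a small sketch. The function name, the choice to split on the first separator, and the rejection of empty halves are assumptions; the PR's middleware may handle edge cases differently.

```go
package main

import (
	"fmt"
	"strings"
)

// toolSeparator joins the backend server ID and the tool name.
const toolSeparator = "___"

// serverIDFromTool is a hypothetical helper. It returns the part before the
// first "___" and reports false when the name does not have both halves.
func serverIDFromTool(name string) (string, bool) {
	serverID, tool, found := strings.Cut(name, toolSeparator)
	if !found || serverID == "" || tool == "" {
		return "", false
	}
	return serverID, true
}

func main() {
	for _, name := range []string{"github___search_code", "search_code", "___orphan"} {
		id, ok := serverIDFromTool(name)
		fmt.Printf("%q -> %q %v\n", name, id, ok)
	}
}
```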

Copilot's findings

  • Files reviewed: 15/15 changed files
  • Comments generated: 3

Comment thread internal/urlutil/domains.go Outdated
Comment on lines +60 to +70
    for _, match := range matches {
        parsed, err := url.Parse(match)
        if err != nil {
            continue
        }
        host := strings.ToLower(parsed.Hostname())
        if host == "" {
            continue
        }
        domainSet[host] = struct{}{}
    }
Comment on lines 54 to 58
      // closableLogger is a constraint for types that have a Close method.
      // This is satisfied by *FileLogger, *JSONLLogger, *MarkdownLogger, *ServerFileLogger, and *ToolsLogger.
      type closableLogger interface {
    -     *FileLogger | *JSONLLogger | *MarkdownLogger | *ServerFileLogger | *ToolsLogger
    +     *FileLogger | *JSONLLogger | *MarkdownLogger | *ServerFileLogger | *ToolsLogger | *ObservedURLDomainsLogger
          Close() error

lpcox commented Jun 23, 2026

@copilot address review feedback

…from URLs, and update closableLogger comment

Copilot AI commented Jun 23, 2026

> @copilot address review feedback

Addressed all three issues in the latest commit:

  1. Case-insensitive scheme (domains.go:13) — added (?i) to urlPattern so HTTPS://, HTTP://, etc. are matched.
  2. Trailing punctuation (domains.go:60-70) — strings.TrimRight strips ,, ., ;, :, !, ?, ), ], }, ", ' from each regex match before url.Parse, preventing keys like example.com, in observed-url-domains.json.
  3. Stale comment (global_helpers.go:54-58) — added *ObservedURLDomainsLogger to the closableLogger comment.

Also added test cases covering the upper-case scheme and trailing-punctuation scenarios in TestExtractURLDomains.
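
The first two fixes can be sketched together as follows. This is a hedged reconstruction from the description above, not the merged code: `extractDomainsSketch`, `urlPattern`, and `trailingPunct` are hypothetical names, and the regular expression is an assumption apart from the `(?i)` flag and the list of trimmed characters.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"sort"
	"strings"
)

// (?i) makes the scheme match case-insensitively, so HTTPS:// is found too.
var urlPattern = regexp.MustCompile(`(?i)https?://[^\s<>"']+`)

// trailingPunct lists characters that commonly follow a URL in prose and
// should not become part of the hostname.
const trailingPunct = ",.;:!?)]}\"'"

// extractDomainsSketch returns the sorted unique lowercase hostnames in text.
func extractDomainsSketch(text string) []string {
	set := make(map[string]struct{})
	for _, match := range urlPattern.FindAllString(text, -1) {
		// Trim before parsing; otherwise "https://example.com," yields
		// the host "example.com,".
		match = strings.TrimRight(match, trailingPunct)
		parsed, err := url.Parse(match)
		if err != nil {
			continue
		}
		host := strings.ToLower(parsed.Hostname())
		if host == "" {
			continue
		}
		set[host] = struct{}{}
	}
	out := make([]string, 0, len(set))
	for h := range set {
		out = append(out, h)
	}
	sort.Strings(out)
	return out
}

func main() {
	text := "See HTTPS://Example.com, then (http://docs.example.org/guide)."
	fmt.Println(extractDomainsSketch(text))
}
```

One trade-off of trimming with a fixed character set is that a URL which legitimately ends in one of those characters, such as a closing parenthesis in a path, loses it. That does not affect the hostname, which is all the audit records.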

@lpcox lpcox merged commit 481a598 into main Jun 23, 2026
27 checks passed
@lpcox lpcox deleted the copilot/gateway-surface-observed-url-domains branch June 23, 2026 22:13