Design proposal: Token exchange to acquire tokens for external auth #2063

jhrozek · 2025-10-02T14:17:41Z

Adds a design for #2041

JAORMX · 2025-10-02T14:21:22Z

@claude please review this with extra focus on security. Do the review in English. Please provide some jokes in Czech at the end of the review.

claude · 2025-10-02T14:21:38Z

Claude finished @JAORMX's task

Security Review of Token Exchange Design Proposal

I've completed a thorough security analysis of the token exchange middleware design. Here's my comprehensive review:

Todo List:

Read the token exchange design proposal document
Analyze security implications and potential vulnerabilities
Review authentication and authorization flows
Check for proper token handling and storage practices
Evaluate attack vectors and mitigation strategies
Provide comprehensive security feedback
Add Czech jokes as requested

Security Analysis

🛡️ Strengths

RFC 8693 Compliance: Following the standard RFC 8693 token exchange specification provides a solid security foundation with well-defined behaviors.
Proper Middleware Ordering: Placing token exchange middleware after auth middleware ensures incoming tokens are validated before exchange attempts.
Audience Validation: The design properly validates token audiences (aud=proxy → aud=upstream), which is critical for preventing token misuse.
Secret Management: Using SecretKeyRef in Kubernetes and proper client secret handling shows awareness of credential security.
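
For readers less familiar with RFC 8693, here is a minimal sketch of the form body a token exchange request carries. The grant type and token type URNs come from the RFC. The helper name, the choice of the access_token subject type, and the placeholder token are assumptions for illustration and are not taken from the proposal.

```go
package main

import (
	"fmt"
	"net/url"
)

// Values defined by RFC 8693.
const (
	grantTypeTokenExchange = "urn:ietf:params:oauth:grant-type:token-exchange"
	tokenTypeAccessToken   = "urn:ietf:params:oauth:token-type:access_token"
)

// buildExchangeForm is a hypothetical helper that assembles the form
// parameters for exchanging an already-validated incoming token for one
// scoped to the upstream audience.
func buildExchangeForm(subjectToken, audience string) url.Values {
	form := url.Values{}
	form.Set("grant_type", grantTypeTokenExchange)
	form.Set("subject_token", subjectToken)
	// Assumption: the incoming token is presented as an access token.
	form.Set("subject_token_type", tokenTypeAccessToken)
	form.Set("audience", audience)
	return form
}

func main() {
	form := buildExchangeForm("incoming-token-placeholder", "backend-service")
	// The encoded form is POSTed to the OAuth server's token endpoint
	// as application/x-www-form-urlencoded.
	fmt.Println(form.Encode())
}
```

The audience parameter is what moves the token from aud=proxy to aud=upstream; the OAuth server decides whether the client is allowed to request that audience.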

⚠️ Security Concerns & Recommendations

Token Caching Security Gap (docs/proposals/token-exchange-middleware.md:67)
- Issue: "run the token exchange for each request" creates performance issues and potential rate limiting
- Risk: No mention of secure token caching mechanisms
- Recommendation: Implement secure in-memory token cache with:
  - Encrypted token storage
  - Proper cache invalidation based on token expiry
  - Memory clearing on process termination
Client Secret Exposure Risk (docs/proposals/token-exchange-middleware.md:74-79)
- Issue: Client secrets in CLI arguments could be exposed in process lists
- Risk: Secrets visible via ps commands, shell history, logs
- Recommendation: Support environment variables and secret files only, never CLI args for secrets
Token Storage in Transit (docs/proposals/token-exchange-middleware.md:35-39)
- Issue: No explicit mention of token handling security during exchange
- Risk: Tokens in memory, logs, or temporary storage
- Recommendation:
  - Ensure all HTTP requests use TLS
  - Clear sensitive data from memory after use
  - Implement structured logging that redacts tokens
Missing Token Validation (docs/proposals/token-exchange-middleware.md:33)
- Issue: Token extraction lacks validation before exchange
- Risk: Malformed or expired tokens sent to OAuth server
- Recommendation: Pre-validate token format and expiry before exchange attempt
OAuth Server Trust Boundary (docs/proposals/token-exchange-middleware.md:35)
- Issue: No certificate pinning or validation requirements mentioned
- Risk: MITM attacks on token exchange endpoint
- Recommendation: Implement certificate validation and consider certificate pinning
Error Information Disclosure (General concern)
- Issue: OAuth errors might leak sensitive information
- Risk: Token details, internal service info exposed in error messages
- Recommendation: Sanitize error responses and log detailed errors server-side only
Replay Attack Prevention (Missing consideration)
- Issue: No mention of replay attack prevention
- Risk: Intercepted requests could be replayed
- Recommendation: Consider implementing nonce or timestamp validation

🔒 Additional Security Recommendations

Rate Limiting: Implement rate limiting on token exchange to prevent abuse
Audit Logging: Log all token exchange attempts with correlation IDs
Token Revocation: Handle token revocation scenarios gracefully
Circuit Breaker: Implement circuit breaker pattern for OAuth server failures
Metrics & Monitoring: Add security metrics for token exchange failures/anomalies

✅ Implementation Security Checklist

Use secure HTTP client with proper TLS verification
Implement token cache with encryption and expiry
Add comprehensive audit logging
Sanitize all error messages before client response
Use secure random generators for any nonces/IDs
Implement proper cleanup of sensitive data from memory
Add rate limiting and circuit breaker patterns

📋 Testing Requirements

Security testing with invalid/malformed tokens
MITM attack simulation tests
Token cache security validation
Error message sanitization verification
Performance testing under high load

Czech Jokes 🇨🇿

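Vtip 1 (Joke 1):
Proč programátoři nemají rádi token exchange?
Protože pořád vyměňují tokeny, ale nikdy nevymění svou kávu za spánek! ☕
(Why don't programmers like token exchange? Because they keep exchanging tokens, but never exchange their coffee for sleep!)

Vtip 2 (Joke 2):
Jak se řekne "bezpečnostní chyba" česky?
"Hele, ta autentikace se rozbila, ale aspoň máme hezký error message!" 😅
(How do you say "security bug" in Czech? "Hey, the authentication broke, but at least we have a nice error message!")

Vtip 3 (Joke 3):
Proč OAuth servery nespí?
Protože pořád někdo klepe na jejich /token endpoint! 🔑
(Why don't OAuth servers sleep? Because someone is always knocking on their /token endpoint!)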

codecov · 2025-10-02T14:22:31Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 48.28%. Comparing base (5285ed4) to head (18b0fb8).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2063      +/-   ##
==========================================
+ Coverage   48.20%   48.28%   +0.08%     
==========================================
  Files         233      236       +3     
  Lines       29229    29690     +461     
==========================================
+ Hits        14089    14337     +248     
- Misses      14106    14277     +171     
- Partials     1034     1076      +42


jhrozek · 2025-10-03T08:54:14Z

Thank you for the thorough security review! We've updated the proposal to address all concerns raised. Responses inline below:

Security Concerns & Recommendations

Token Caching Security Gap (docs/proposals/token-exchange-middleware.md:67)

Issue: "run the token exchange for each request" creates performance issues and potential rate limiting

Risk: No mention of secure token caching mechanisms

Recommendation: Implement secure in-memory token cache with encrypted token storage, proper cache invalidation based on token expiry, memory clearing on process termination

Addressed: The proposal already mentioned using oauth2.ReuseTokenSource for caching. We've now created #2067 with detailed implementation notes and added a link in the proposal (line 67). This follows ToolHive's existing pattern in pkg/auth/oauth/flow.go:453-454. The Go oauth2.ReuseTokenSource provides thread-safe caching and automatic token refresh - no custom encryption needed for short-lived tokens in container-isolated environments.

Client Secret Exposure Risk (docs/proposals/token-exchange-middleware.md:74-79)

Issue: Client secrets in CLI arguments could be exposed in process lists

Risk: Secrets visible via ps commands, shell history, logs

Recommendation: Support environment variables and secret files only, never CLI args for secrets

Addressed: Updated the proposal to document secure secret handling (lines 71-114):

Added --token-exchange-client-secret-file flag (follows existing --remote-auth-client-secret-file pattern in cmd/thv/app/auth_flags.go)
Added environment variable support: TOOLHIVE_TOKEN_EXCHANGE_CLIENT_SECRET
Updated examples to show file-based (recommended) and environment variable approaches
Added explicit security warning about inline secrets

The Kubernetes operator already uses ClientSecretRef for secure secret management.

Token Storage in Transit (docs/proposals/token-exchange-middleware.md:35-39)

Issue: No explicit mention of token handling security during exchange

Risk: Tokens in memory, logs, or temporary storage

Recommendation: Ensure all HTTP requests use TLS, clear sensitive data from memory after use, implement structured logging that redacts tokens

Addressed: Added comprehensive "Security Considerations" section (lines 116-153) documenting ToolHive's existing security infrastructure:

HTTPS enforcement: All OAuth endpoints validated for HTTPS (pkg/auth/oauth/oidc.go:69-72)
TLS 1.2+ minimum: Enforced by default (pkg/networking/http_client.go:150-151)
Token memory handling: Using Go's standard oauth2 library patterns - secure for short-lived tokens in containers
Logging: Documented token redaction requirements (SHA256 hash prefix only, never full tokens)
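
A minimal sketch of the hash-prefix redaction idea follows. The `redactToken` name and the 8-character prefix length are assumptions; the proposal only states that a SHA256 hash prefix is logged instead of the token.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// redactToken returns a short SHA-256 prefix that lets log lines for the
// same token be correlated without revealing the token itself.
// The 8-character prefix length is an assumption.
func redactToken(tok string) string {
	sum := sha256.Sum256([]byte(tok))
	return "sha256:" + hex.EncodeToString(sum[:])[:8]
}

func main() {
	tok := "placeholder-token-value"
	fmt.Printf("token exchange failed token=%s\n", redactToken(tok))
}
```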

Missing Token Validation (docs/proposals/token-exchange-middleware.md:33)

Issue: Token extraction lacks validation before exchange

Risk: Malformed or expired tokens sent to OAuth server

Recommendation: Pre-validate token format and expiry before exchange attempt

Addressed: Added explicit statement at line 63:

"This ensures that only tokens with valid signatures, non-expired timestamps, and correct audiences are sent to the OAuth server for exchange. Malformed or invalid tokens are rejected before any exchange attempt occurs."

The architecture already ensures this through middleware ordering - auth middleware validates tokens before token exchange middleware runs.

OAuth Server Trust Boundary (docs/proposals/token-exchange-middleware.md:35)

Issue: No certificate pinning or validation requirements mentioned

Risk: MITM attacks on token exchange endpoint

Recommendation: Implement certificate validation and consider certificate pinning

Addressed: Security Considerations section documents:

TLS certificate validation enabled by default
Custom CA bundle support (existing ToolHive infrastructure in pkg/networking/http_client.go:137-154)
Certificate pinning considered but deemed unnecessary for enterprise deployments

Standard TLS validation is sufficient for this use case.

Error Information Disclosure (General concern)

Issue: OAuth errors might leak sensitive information

Risk: Token details, internal service info exposed in error messages

Recommendation: Sanitize error responses and log detailed errors server-side only

Addressed: Added error handling documentation with example (lines 134-148):

Generic errors to clients: {"error": "token_exchange_failed"}
Detailed server-side logging with token redaction
Example showing proper error sanitization pattern

Replay Attack Prevention (Missing consideration)

Issue: No mention of replay attack prevention

Risk: Intercepted requests could be replayed

Recommendation: Consider implementing nonce or timestamp validation

Addressed: Added section at lines 150-152 explaining RFC 8693/OAuth2 standard protections:

Short-lived tokens (1 hour expiry) minimize attack window
OAuth server validates tokens on every exchange
Expired/revoked tokens automatically rejected

No additional replay protection needed beyond RFC 8693 standard mechanisms.

Additional Security Recommendations

Rate Limiting: Implement rate limiting on token exchange to prevent abuse

Audit Logging: Log all token exchange attempts with correlation IDs

Token Revocation: Handle token revocation scenarios gracefully

Circuit Breaker: Implement circuit breaker pattern for OAuth server failures

Metrics & Monitoring: Add security metrics for token exchange failures/anomalies

These are valuable production-hardening features. The initial implementation focuses on core RFC 8693 compliance. These enhancements can be added in future iterations based on operational experience. ToolHive has an existing audit event system in pkg/events that token exchange should integrate with.

Summary

The proposal now includes:

Link to #2067 (Create a caching layer using ReuseTokenSource for exchanged tokens) for the token caching implementation
Complete CLI flags documentation including file-based secrets and environment variables
New "Security Considerations" section with code references to existing infrastructure
Explicit token pre-validation statement
Error handling examples with sanitization patterns
Replay attack prevention explanation

Most security concerns were already addressed through ToolHive's existing infrastructure (HTTPS enforcement, TLS 1.2+, certificate validation, oauth2.ReuseTokenSource patterns). The proposal now explicitly documents these existing protections with code references.

JAORMX · 2025-10-03T10:52:32Z

docs/proposals/token-exchange-middleware.md

+        clientSecretRef:
+          name: token-exchange-creds
+          key: client-secret
+        audience: backend-service


I wonder if at this point we should have a dedicated CRD for the authentication pieces. I know that's out of scope of this PR, but I think we should think about it sooner rather than later. I can see folks wanting to share these configurations.

yeah, the CRD is getting quite big. So the MCPServer CRD would have a reference to some MCPServerAuthConfig CRD that would be shared across servers?

We'd have to template the client names though

I actually wonder if this /should/ be a pre-requisite though. Once we add something to our CRDs it's hard to back off

@JAORMX Structurally I can see how the authentication and authorization pieces are complex enough to warrant their own controller (i.e. auth CRD reconciler). Especially in light of incoming requirements around SPIFFE and SPIRE where token lifetimes and privilege will be scoped as narrowly as possible and expire/be rotated quite regularly. If we consider security more broadly there's also certificate expiry, rotation, and management that is likely to be encountered in the future as well with use of something like cert-manager. I think there's quite a bit here with respect to the responsibilities of such an independent security manager/controller.

@jhrozek While I don't think it's a pre-requisite by definition, doing it now rather than later would make life easier in a number of ways. The current CRD has already hit sizing constraints, which is a good technical reason to do it now, on top of the organization and maintenance work involved in separating it later.

The counterpoint here would be if the authentication/authorization information is not used by the operator itself but only passed through to the proxy and/or MCP server operands that the operator launches and operates. In that case, is there a need for the operator to serve as a middleman? I don't think that's the case (see the first paragraph above), but it's worth putting the question out there for consideration.

Design proposal: Token exchange to acquire tokens for external auth

18b0fb8

JAORMX reviewed Oct 3, 2025


RoddieKieley Oct 3, 2025
