
jhrozek (Contributor) commented Oct 2, 2025

Adds a design for #2041

JAORMX (Collaborator) commented Oct 2, 2025

@claude please review this with extra focus on security. Do the review in English. Please provide some jokes in Czech at the end of the review.

claude bot (Contributor) commented Oct 2, 2025

Claude finished @JAORMX's task.


Security Review of Token Exchange Design Proposal

I've completed a thorough security analysis of the token exchange middleware design. Here's my comprehensive review:

Todo List:

  • Read the token exchange design proposal document
  • Analyze security implications and potential vulnerabilities
  • Review authentication and authorization flows
  • Check for proper token handling and storage practices
  • Evaluate attack vectors and mitigation strategies
  • Provide comprehensive security feedback
  • Add Czech jokes as requested

Security Analysis

🛡️ Strengths

  1. RFC 8693 Compliance: Following the standard RFC 8693 token exchange specification provides a solid security foundation with well-defined behaviors.

  2. Proper Middleware Ordering: Placing token exchange middleware after auth middleware ensures incoming tokens are validated before exchange attempts.

  3. Audience Validation: The design properly validates token audiences (aud=proxy on the incoming token, aud=upstream on the exchanged token), which is critical for preventing token misuse.

  4. Secret Management: Using SecretKeyRef in Kubernetes and proper client secret handling shows awareness of credential security.
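
For reference, the request that strengths 1 and 3 describe can be sketched as a form body. The grant type and token type URNs and the parameter names come from RFC 8693; the function name `buildExchangeForm` and the sample values are illustrative assumptions, not taken from the proposal.

```go
package main

import (
	"fmt"
	"net/url"
)

// Identifiers defined by RFC 8693.
const (
	grantTypeTokenExchange = "urn:ietf:params:oauth:grant-type:token-exchange"
	tokenTypeAccessToken   = "urn:ietf:params:oauth:token-type:access_token"
)

// buildExchangeForm (hypothetical helper) builds the form body for a token
// exchange request. subjectToken is the incoming token that the auth
// middleware has already validated; audience names the upstream service.
func buildExchangeForm(subjectToken, audience string) url.Values {
	form := url.Values{}
	form.Set("grant_type", grantTypeTokenExchange)
	form.Set("subject_token", subjectToken)
	form.Set("subject_token_type", tokenTypeAccessToken)
	form.Set("audience", audience)
	return form
}

func main() {
	form := buildExchangeForm("incoming-token", "backend-service")
	fmt.Println(form.Get("grant_type"))
	fmt.Println(form.Get("audience"))
}
```

The form would be POSTed to the OAuth server's token endpoint with client authentication; that part is omitted here.
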

⚠️ Security Concerns & Recommendations

  1. Token Caching Security Gap (docs/proposals/token-exchange-middleware.md:67)

    • Issue: "run the token exchange for each request" creates performance issues and potential rate limiting
    • Risk: No mention of secure token caching mechanisms
    • Recommendation: Implement secure in-memory token cache with:
      • Encrypted token storage
      • Proper cache invalidation based on token expiry
      • Memory clearing on process termination
  2. Client Secret Exposure Risk (docs/proposals/token-exchange-middleware.md:74-79)

    • Issue: Client secrets in CLI arguments could be exposed in process lists
    • Risk: Secrets visible via ps commands, shell history, logs
    • Recommendation: Support environment variables and secret files only, never CLI args for secrets
  3. Token Storage in Transit (docs/proposals/token-exchange-middleware.md:35-39)

    • Issue: No explicit mention of token handling security during exchange
    • Risk: Tokens in memory, logs, or temporary storage
    • Recommendation:
      • Ensure all HTTP requests use TLS
      • Clear sensitive data from memory after use
      • Implement structured logging that redacts tokens
  4. Missing Token Validation (docs/proposals/token-exchange-middleware.md:33)

    • Issue: Token extraction lacks validation before exchange
    • Risk: Malformed or expired tokens sent to OAuth server
    • Recommendation: Pre-validate token format and expiry before exchange attempt
  5. OAuth Server Trust Boundary (docs/proposals/token-exchange-middleware.md:35)

    • Issue: No certificate pinning or validation requirements mentioned
    • Risk: MITM attacks on token exchange endpoint
    • Recommendation: Implement certificate validation and consider certificate pinning
  6. Error Information Disclosure (General concern)

    • Issue: OAuth errors might leak sensitive information
    • Risk: Token details, internal service info exposed in error messages
    • Recommendation: Sanitize error responses and log detailed errors server-side only
  7. Replay Attack Prevention (Missing consideration)

    • Issue: No mention of replay attack prevention
    • Risk: Intercepted requests could be replayed
    • Recommendation: Consider implementing nonce or timestamp validation

🔒 Additional Security Recommendations

  1. Rate Limiting: Implement rate limiting on token exchange to prevent abuse
  2. Audit Logging: Log all token exchange attempts with correlation IDs
  3. Token Revocation: Handle token revocation scenarios gracefully
  4. Circuit Breaker: Implement circuit breaker pattern for OAuth server failures
  5. Metrics & Monitoring: Add security metrics for token exchange failures/anomalies

Implementation Security Checklist

  • Use secure HTTP client with proper TLS verification
  • Implement token cache with encryption and expiry
  • Add comprehensive audit logging
  • Sanitize all error messages before client response
  • Use secure random generators for any nonces/IDs
  • Implement proper cleanup of sensitive data from memory
  • Add rate limiting and circuit breaker patterns

📋 Testing Requirements

  • Security testing with invalid/malformed tokens
  • MITM attack simulation tests
  • Token cache security validation
  • Error message sanitization verification
  • Performance testing under high load

Czech Jokes 🇨🇿

Joke 1:
Why don't programmers like token exchange?
Because they keep exchanging tokens, but never exchange their coffee for sleep! ☕

Joke 2:
How do you say "security bug" in Czech?
"Hey, the authentication broke, but at least we have a nice error message!" 😅

Joke 3:
Why don't OAuth servers sleep?
Because someone is always knocking on their /token endpoint! 🔑


codecov bot commented Oct 2, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 48.28%. Comparing base (5285ed4) to head (18b0fb8).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2063      +/-   ##
==========================================
+ Coverage   48.20%   48.28%   +0.08%     
==========================================
  Files         233      236       +3     
  Lines       29229    29690     +461     
==========================================
+ Hits        14089    14337     +248     
- Misses      14106    14277     +171     
- Partials     1034     1076      +42     

jhrozek (Contributor, Author) commented Oct 3, 2025

Thank you for the thorough security review! We've updated the proposal to address all concerns raised. Responses inline below:

Security Concerns & Recommendations

  1. Token Caching Security Gap (docs/proposals/token-exchange-middleware.md:67)
    • Issue: "run the token exchange for each request" creates performance issues and potential rate limiting
    • Risk: No mention of secure token caching mechanisms
    • Recommendation: Implement secure in-memory token cache with encrypted token storage, proper cache invalidation based on token expiry, memory clearing on process termination

Addressed: The proposal already mentioned using oauth2.ReuseTokenSource for caching. We've now created #2067 with detailed implementation notes and added a link in the proposal (line 67). This follows ToolHive's existing pattern in pkg/auth/oauth/flow.go:453-454. The Go oauth2.ReuseTokenSource provides thread-safe caching and automatic token refresh - no custom encryption needed for short-lived tokens in container-isolated environments.

  2. Client Secret Exposure Risk (docs/proposals/token-exchange-middleware.md:74-79)
    • Issue: Client secrets in CLI arguments could be exposed in process lists
    • Risk: Secrets visible via ps commands, shell history, logs
    • Recommendation: Support environment variables and secret files only, never CLI args for secrets

Addressed: Updated the proposal to document secure secret handling (lines 71-114):

  • Added --token-exchange-client-secret-file flag (follows existing --remote-auth-client-secret-file pattern in cmd/thv/app/auth_flags.go)
  • Added environment variable support: TOOLHIVE_TOKEN_EXCHANGE_CLIENT_SECRET
  • Updated examples to show file-based (recommended) and environment variable approaches
  • Added explicit security warning about inline secrets

The Kubernetes operator already uses ClientSecretRef for secure secret management.

  3. Token Storage in Transit (docs/proposals/token-exchange-middleware.md:35-39)
    • Issue: No explicit mention of token handling security during exchange
    • Risk: Tokens in memory, logs, or temporary storage
    • Recommendation: Ensure all HTTP requests use TLS, clear sensitive data from memory after use, implement structured logging that redacts tokens

Addressed: Added comprehensive "Security Considerations" section (lines 116-153) documenting ToolHive's existing security infrastructure:

  • HTTPS enforcement: All OAuth endpoints validated for HTTPS (pkg/auth/oauth/oidc.go:69-72)
  • TLS 1.2+ minimum: Enforced by default (pkg/networking/http_client.go:150-151)
  • Token memory handling: Using Go's standard oauth2 library patterns - secure for short-lived tokens in containers
  • Logging: Documented token redaction requirements (SHA256 hash prefix only, never full tokens)
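
The logging requirement above (SHA256 hash prefix only) could look roughly like this. The helper name `redactToken`, the `sha256:` label, and the 8-character prefix length are assumptions; the proposal only states that a hash prefix is logged instead of the token.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// redactToken (hypothetical helper) returns a short identifier derived from
// the token's SHA-256 digest. It lets log lines be correlated without
// writing the token itself to the log.
func redactToken(token string) string {
	sum := sha256.Sum256([]byte(token))
	return "sha256:" + hex.EncodeToString(sum[:])[:8]
}

func main() {
	fmt.Println("exchanging token", redactToken("example-token"))
}
```
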

  4. Missing Token Validation (docs/proposals/token-exchange-middleware.md:33)
    • Issue: Token extraction lacks validation before exchange
    • Risk: Malformed or expired tokens sent to OAuth server
    • Recommendation: Pre-validate token format and expiry before exchange attempt

Addressed: Added explicit statement at line 63:

"This ensures that only tokens with valid signatures, non-expired timestamps, and correct audiences are sent to the OAuth server for exchange. Malformed or invalid tokens are rejected before any exchange attempt occurs."

The architecture already ensures this through middleware ordering - auth middleware validates tokens before token exchange middleware runs.

  5. OAuth Server Trust Boundary (docs/proposals/token-exchange-middleware.md:35)
    • Issue: No certificate pinning or validation requirements mentioned
    • Risk: MITM attacks on token exchange endpoint
    • Recommendation: Implement certificate validation and consider certificate pinning

Addressed: Security Considerations section documents:

  • TLS certificate validation enabled by default
  • Custom CA bundle support (existing ToolHive infrastructure in pkg/networking/http_client.go:137-154)
  • Certificate pinning considered but deemed unnecessary for enterprise deployments

Standard TLS validation is sufficient for this use case.

  6. Error Information Disclosure (General concern)
    • Issue: OAuth errors might leak sensitive information
    • Risk: Token details, internal service info exposed in error messages
    • Recommendation: Sanitize error responses and log detailed errors server-side only

Addressed: Added error handling documentation with example (lines 134-148):

  • Generic errors to clients: {"error": "token_exchange_failed"}
  • Detailed server-side logging with token redaction
  • Example showing proper error sanitization pattern

  7. Replay Attack Prevention (Missing consideration)
    • Issue: No mention of replay attack prevention
    • Risk: Intercepted requests could be replayed
    • Recommendation: Consider implementing nonce or timestamp validation

Addressed: Added section at lines 150-152 explaining RFC 8693/OAuth2 standard protections:

  • Short-lived tokens (1 hour expiry) minimize attack window
  • OAuth server validates tokens on every exchange
  • Expired/revoked tokens automatically rejected

No additional replay protection needed beyond RFC 8693 standard mechanisms.

Additional Security Recommendations

  1. Rate Limiting: Implement rate limiting on token exchange to prevent abuse
  2. Audit Logging: Log all token exchange attempts with correlation IDs
  3. Token Revocation: Handle token revocation scenarios gracefully
  4. Circuit Breaker: Implement circuit breaker pattern for OAuth server failures
  5. Metrics & Monitoring: Add security metrics for token exchange failures/anomalies

These are valuable production-hardening features. The initial implementation focuses on core RFC 8693 compliance. These enhancements can be added in future iterations based on operational experience. ToolHive has an existing audit event system in pkg/events that token exchange should integrate with.

Summary

The proposal now includes:

  1. Link to #2067 (Create a caching layer using ReuseTokenSource for exchanged tokens) for the token caching implementation
  2. Complete CLI flags documentation including file-based secrets and environment variables
  3. New "Security Considerations" section with code references to existing infrastructure
  4. Explicit token pre-validation statement
  5. Error handling examples with sanitization patterns
  6. Replay attack prevention explanation

Most security concerns were already addressed through ToolHive's existing infrastructure (HTTPS enforcement, TLS 1.2+, certificate validation, oauth2.ReuseTokenSource patterns). The proposal now explicitly documents these existing protections with code references.

clientSecretRef:
  name: token-exchange-creds
  key: client-secret
audience: backend-service

Collaborator:

I wonder if at this point we should have a dedicated CRD for the authentication pieces. I know that's out of scope of this PR, but I think we should think about it sooner rather than later. I can see folks wanting to share these configurations.

Contributor Author:

yeah, the CRD is getting quite big. So the MCPServer CRD would have a reference to some MCPServerAuthConfig CRD that would be shared across servers?

We'd have to template the client names though

Contributor Author:

I actually wonder if this /should/ be a pre-requisite though. Once we add something to our CRDs it's hard to back off

Collaborator:

@JAORMX Structurally I can see how the authentication and authorization pieces are complex enough to warrant their own controller (i.e. auth CRD reconciler). Especially in light of incoming requirements around SPIFFE and SPIRE where token lifetimes and privilege will be scoped as narrowly as possible and expire/be rotated quite regularly. If we consider security more broadly there's also certificate expiry, rotation, and management that is likely to be encountered in the future as well with use of something like cert-manager. I think there's quite a bit here with respect to the responsibilities of such an independent security manager/controller.

@jhrozek While I don't think it's a pre-requisite by definition, doing it now rather than later would make life easier in a number of ways. The current CRD has already hit size constraints, which is a good technical reason to split it out now, in addition to the organizational and maintenance cost of separating it later.

The counterpoint would be if the authentication/authorization information is not used by the operator itself but only passed through to the proxy and/or MCP server operands that the operator launches and operates. In that case, does the operator need to serve as a middleman at all? I don't think that's the situation (see the first paragraph above), but it's worth putting the question out there for consideration.
