feat(registry): implement Docker registry v2 token authentication #74
hiroTamada wants to merge 7 commits into main
The two-tier build cache PR (#70) introduced a new token format using a `repo_access` field with per-repository scopes, but the middleware wasn't updated to parse it. This caused 401 Unauthorized errors when builder VMs tried to push images to the registry, as the middleware only checked for the legacy `repos` field, which is empty in new tokens.

Changes:
- Add RepoPermission struct and RepoAccess field to RegistryTokenClaims
- Update validateRegistryToken to check both RepoAccess (new) and Repositories (legacy) formats
- Add per-repo scope checking for write operations
- Add comprehensive tests for both token formats

Fixes build failures in production where the new token format was being used but not recognized by the registry auth middleware.
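To make the two token formats concrete, here is a minimal sketch of a claims type that accepts both. The type names `RepoPermission` and `RegistryTokenClaims` and the JSON keys `repo_access` and `repos` come from the commit message; the field layout, the `"push"`/`"pull"` scope values, the helper `canAccess`, and the choice to let legacy entries grant writes are assumptions made for illustration, not the PR's actual code.

```go
package main

import "fmt"

// RepoPermission is named in the PR; its fields here are assumed.
type RepoPermission struct {
	Repo  string `json:"repo"`
	Scope string `json:"scope"` // assumed values: "pull" or "push"
}

// RegistryTokenClaims carries both formats so old and new tokens validate.
type RegistryTokenClaims struct {
	RepoAccess   []RepoPermission `json:"repo_access,omitempty"` // new format
	Repositories []string         `json:"repos,omitempty"`       // legacy format
}

// canAccess is a hypothetical helper: it checks the new per-repo scopes
// first, then falls back to the legacy repository list.
func canAccess(c RegistryTokenClaims, repo string, write bool) bool {
	for _, p := range c.RepoAccess {
		if p.Repo != repo {
			continue
		}
		// Reads are allowed by any matching entry; writes need "push".
		if !write || p.Scope == "push" {
			return true
		}
	}
	// Assumption: a legacy entry grants both read and write on that repo.
	for _, r := range c.Repositories {
		if r == repo {
			return true
		}
	}
	return false
}

func main() {
	tok := RegistryTokenClaims{RepoAccess: []RepoPermission{
		{Repo: "builds/abc", Scope: "push"},
		{Repo: "cache/t1", Scope: "pull"},
	}}
	fmt.Println(canAccess(tok, "builds/abc", true))
	fmt.Println(canAccess(tok, "cache/t1", true))
}
```

A middleware that only consulted `Repositories` would reject the token above, which matches the failure described in this commit.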
The IP fallback for registry authentication only allowed 10.100.x.x and 10.102.x.x subnets (staging/dev), but production uses 172.30.x.x. This caused builds to fail in production while working in staging.

Changes:
- Add 172.30.x.x to allowed subnets in isInternalVMRequest()
- Add logger injection to /v2 routes for debug logging
- Add tests for internal VM request subnet matching
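The subnet check can be sketched with CIDR prefixes from the standard library. The three ranges are the ones listed in this commit, written as /16 prefixes; the helper name `isInternalAddr` and the use of `net/netip` are assumptions, and the real `isInternalVMRequest()` may match addresses differently.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
)

// internalPrefixes mirrors the subnets named in the commit message.
var internalPrefixes = []netip.Prefix{
	netip.MustParsePrefix("10.100.0.0/16"), // staging
	netip.MustParsePrefix("10.102.0.0/16"), // dev
	netip.MustParsePrefix("172.30.0.0/16"), // production
}

// isInternalAddr is a hypothetical helper that accepts either "ip" or
// "ip:port" (the form of http.Request.RemoteAddr).
func isInternalAddr(remoteAddr string) bool {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		host = remoteAddr // no port present
	}
	addr, err := netip.ParseAddr(host)
	if err != nil {
		return false
	}
	addr = addr.Unmap() // treat IPv4-mapped IPv6 addresses as IPv4
	for _, p := range internalPrefixes {
		if p.Contains(addr) {
			return true
		}
	}
	return false
}

func main() {
	for _, a := range []string{"10.100.3.4:5000", "172.30.0.9", "8.8.8.8:443"} {
		fmt.Println(a, isInternalAddr(a))
	}
}
```

Keeping the ranges in one table makes a missing environment, such as the production subnet here, easier to spot in review and to cover in a table-driven test.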
Add comprehensive tests for JwtAuth middleware on /v2/ registry paths:
- Valid token access (both legacy and RepoAccess formats)
- IP fallback for staging (10.100.x.x, 10.102.x.x) and production (172.30.x.x)
- External IP rejection without valid token
- Invalid token fallback behavior
- Bearer and Basic auth support

These tests would have caught the production subnet issue earlier.
Root cause: Registry was returning 401 without WWW-Authenticate header, so BuildKit didn't know to send credentials from Docker config.

Changes:
- Add WWW-Authenticate: Basic realm="registry" to 401 responses
- Remove production subnet (172.30.x.x) from IP fallback (staging 10.100.x.x still has fallback as safety net)
- Builder agent: write Docker config to both /home/builder/.docker and /root/.docker to ensure BuildKit finds it
- Add tests for BuildKit auth flow simulation

This enables proper token-based registry auth in production.
// knows to send credentials from the Docker config
if statusCode == http.StatusUnauthorized {
w.Header().Set("WWW-Authenticate", `Basic realm="registry"`)
}
WWW-Authenticate header incorrectly added to all API 401s
Low Severity
The OapiErrorHandler function now adds WWW-Authenticate: Basic realm="registry" to all 401 responses, not just registry endpoints. This affects non-registry API endpoints like /instances/{id}/exec, /instances/{id}/cp, and all OpenAPI-validated endpoints. These endpoints expect Bearer authentication, so returning a Basic auth challenge with a "registry" realm is misleading and could confuse API clients about how to authenticate.
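One way to address this, sketched below, is to emit the challenge only for registry paths. The helper names `challengeFor` and `writeUnauthorized` are hypothetical, and the sketch assumes the error path has access to the request, which the three-argument `OapiErrorHandler` shown in the diff does not.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// challengeFor returns the WWW-Authenticate value for a request path, or
// "" when the path is not a registry endpoint. Hypothetical helper.
func challengeFor(path string) string {
	if path == "/v2" || strings.HasPrefix(path, "/v2/") {
		return `Basic realm="registry"`
	}
	return ""
}

// writeUnauthorized writes a 401 and adds the challenge only where a
// registry client expects one.
func writeUnauthorized(w http.ResponseWriter, r *http.Request, message string) {
	if c := challengeFor(r.URL.Path); c != "" {
		w.Header().Set("WWW-Authenticate", c)
	}
	http.Error(w, message, http.StatusUnauthorized)
}

func main() {
	for _, path := range []string{"/v2/", "/instances/abc/exec"} {
		rec := httptest.NewRecorder()
		req := httptest.NewRequest(http.MethodGet, path, nil)
		writeUnauthorized(rec, req, "unauthorized")
		fmt.Printf("%s -> %d, challenge=%q\n", path, rec.Code, rec.Header().Get("WWW-Authenticate"))
	}
}
```

With this split, API clients that use Bearer authentication still receive a plain 401, while registry clients get the challenge they need.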
BuildKit with registry.insecure=true doesn't do WWW-Authenticate challenge-response flow - it just fails on 401 without retrying with credentials. Re-enable IP fallback for production (172.30.x.x) until we find a way to make BuildKit send auth proactively.
BuildKit's containerd resolver doesn't send credentials to the token endpoint proactively - it tries anonymous first. This PR implements the standard Docker registry v2 token authentication flow to work with BuildKit's behavior.

Changes:
- Add /v2/token endpoint that grants tokens for builds/* paths
- Return WWW-Authenticate: Bearer challenge instead of Basic
- Validate Bearer access tokens in middleware
- Support HTTPS with self-signed certs via CA cert configuration
- Builder agent: strip scheme from registry URLs for image refs
- Builder agent: install CA certs system-wide for token endpoint TLS

Security model: Build IDs are cryptographically random and short-lived, serving as implicit authentication (similar to pre-signed URLs). The IP fallback remains as a safety net for older builder images.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Name: repoName,
Actions: actions,
})
}
Anonymous token endpoint grants unauthenticated push access to cache paths
Medium Severity
The returnAnonymousToken function grants push access to any cache/* path without authentication. An attacker who knows or guesses a tenant's cache scope (e.g., cache/tenant-x) can request GET /v2/token?scope=repository:cache/tenant-x:push,pull without credentials and receive a valid token with write access. This enables cache poisoning attacks where malicious layers are pushed to a victim tenant's cache, affecting subsequent builds that import from it. While builds/* paths use cryptographically random IDs making them hard to guess, cache/* scopes are tenant-provided values that may be predictable.
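A possible mitigation is to parse the requested scope and grant anonymous actions only on `builds/*`, leaving `cache/*` to authenticated callers. The sketch below assumes the standard `repository:<name>:<actions>` scope syntax; `parseScope` and `anonymousActions` are hypothetical helpers, and the policy is one option among several rather than the fix chosen in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// parseScope splits a scope such as "repository:builds/abc:push,pull"
// into the repository name and the requested actions.
func parseScope(scope string) (repo string, actions []string, ok bool) {
	const prefix = "repository:"
	if !strings.HasPrefix(scope, prefix) {
		return "", nil, false
	}
	rest := strings.TrimPrefix(scope, prefix)
	// The actions follow the last colon, so names containing ':' still parse.
	i := strings.LastIndex(rest, ":")
	if i <= 0 || i == len(rest)-1 {
		return "", nil, false
	}
	return rest[:i], strings.Split(rest[i+1:], ","), true
}

// anonymousActions returns the subset of requested actions that an
// unauthenticated caller may receive. Assumed policy: only builds/*.
func anonymousActions(repo string, requested []string) []string {
	if !strings.HasPrefix(repo, "builds/") {
		return nil
	}
	var granted []string
	for _, a := range requested {
		if a == "pull" || a == "push" {
			granted = append(granted, a)
		}
	}
	return granted
}

func main() {
	for _, s := range []string{
		"repository:builds/abc123:push,pull",
		"repository:cache/tenant-x:push,pull",
	} {
		repo, actions, ok := parseScope(s)
		fmt.Println(repo, ok, anonymousActions(repo, actions))
	}
}
```

Under this policy a request for `cache/tenant-x` still receives a token, but one with no granted actions, so a push to the tenant's cache would be rejected.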
// OapiErrorHandler creates a custom error handler for nethttp-middleware
// that returns consistent error responses.
func OapiErrorHandler(w http.ResponseWriter, message string, statusCode int) {
OapiErrorHandlerWithHost(w, message, statusCode, "")
Access tokens bypass API authentication rejection checks
Medium Severity
The new access tokens issued by /v2/token can be used to authenticate to API endpoints. The token rejection logic in OapiAuthenticationFunc and JwtAuth checks for repos, scope, build_id claims, and builder- subject prefix. However, access tokens have none of these - they have an access claim and sub: "anonymous-builder" which starts with anonymous-, not builder-. An attacker can call /v2/token anonymously, obtain a signed JWT, then use it as a Bearer token for API endpoints, authenticating as user anonymous-builder.
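The gap can be illustrated with a claim check that also recognizes the new access tokens. The claim names (`repos`, `scope`, `build_id`, `access`) and subject prefixes (`builder-`, `anonymous-`) are taken from the comment above; the function `isRegistryToken` and the representation of claims as a decoded map are assumptions, not the code in `OapiAuthenticationFunc` or `JwtAuth`.

```go
package main

import (
	"fmt"
	"strings"
)

// isRegistryToken reports whether decoded JWT claims look like a registry
// or builder token, which API endpoints should refuse. Hypothetical helper.
func isRegistryToken(claims map[string]any) bool {
	// "access" is the claim carried by tokens issued from /v2/token; the
	// other three are the ones the existing checks already look for.
	for _, key := range []string{"repos", "scope", "build_id", "access"} {
		if _, ok := claims[key]; ok {
			return true
		}
	}
	sub, _ := claims["sub"].(string)
	return strings.HasPrefix(sub, "builder-") || strings.HasPrefix(sub, "anonymous-")
}

func main() {
	accessToken := map[string]any{"sub": "anonymous-builder", "access": []any{}}
	userToken := map[string]any{"sub": "user-123"}
	fmt.Println(isRegistryToken(accessToken), isRegistryToken(userToken))
}
```

A deny-list like this has to be extended every time a new token type appears; giving API tokens and registry tokens distinct audiences or signing keys would be a sturdier separation, though that goes beyond what the comment proposes.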
// Docker/OCI image references don't include http:// or https://
registryHost := config.RegistryURL
registryHost = strings.TrimPrefix(registryHost, "https://")
registryHost = strings.TrimPrefix(registryHost, "http://")
Case-sensitive scheme stripping causes build failures with uppercase URLs
Low Severity
The scheme detection uses strings.ToLower() for case-insensitive matching (e.g., isHTTP := !strings.HasPrefix(strings.ToLower(config.RegistryURL), "https://")), but the scheme stripping uses case-sensitive strings.TrimPrefix(). If a URL has an uppercase scheme like HTTPS://myregistry:5000, the detection correctly identifies it as HTTPS, but the stripping fails to remove the scheme. This produces an invalid image reference like HTTPS://myregistry:5000/builds/abc123, causing BuildKit to fail with an invalid reference format error.
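A case-insensitive strip that agrees with the detection logic could look like the sketch below. The helper name `stripScheme` is hypothetical; it compares the leading bytes with `strings.EqualFold` so the original casing of the host is preserved.

```go
package main

import (
	"fmt"
	"strings"
)

// stripScheme removes a leading http:// or https:// regardless of case,
// leaving the rest of the registry URL untouched.
func stripScheme(registryURL string) string {
	for _, scheme := range []string{"https://", "http://"} {
		if len(registryURL) >= len(scheme) &&
			strings.EqualFold(registryURL[:len(scheme)], scheme) {
			return registryURL[len(scheme):]
		}
	}
	return registryURL
}

func main() {
	fmt.Println(stripScheme("HTTPS://myregistry:5000") + "/builds/abc123")
	fmt.Println(stripScheme("myregistry:5000") + "/builds/abc123")
}
```

Both calls yield `myregistry:5000/builds/abc123`, so the image reference stays valid whichever casing the configuration uses.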
Summary
BuildKit's containerd resolver doesn't send credentials to the token endpoint proactively - it tries anonymous first. This PR implements the standard Docker registry v2 token authentication flow to work with BuildKit's behavior.
Problem
When pushing images, BuildKit follows this flow:
- The registry returns `401 Unauthorized` with `WWW-Authenticate: Bearer realm="/v2/token"...`
- BuildKit calls `/v2/token` without credentials (anonymous)

This is a known BuildKit limitation - it doesn't read credentials from `config.json` when calling the token endpoint.

Solution
Implement the standard Docker registry v2 token authentication with a pragmatic security model:
- Add a `/v2/token` endpoint that grants tokens for `builds/*` paths
- Return a `WWW-Authenticate: Bearer` challenge instead of Basic

Changes
- `lib/registry/token.go` - New token endpoint handler
- `lib/middleware/oapi_auth.go` - Bearer token validation, WWW-Authenticate header
- `lib/builds/builder_agent/main.go` - Strip scheme from URLs, CA cert support
- `cmd/api/main.go` - Mount token handler at `/v2/token`

Security Model
To exploit this, an attacker would need to:
Test plan