feat(registry): implement Docker registry v2 token authentication #74

Closed
hiroTamada wants to merge 7 commits into main from fix/registry-token-repo-access

@hiroTamada hiroTamada commented Jan 27, 2026

Summary

BuildKit's containerd resolver doesn't send credentials to the token endpoint proactively; it tries an anonymous request first. This PR implements the standard Docker registry v2 token authentication flow to work with BuildKit's behavior.

Problem

When pushing images, BuildKit follows this flow:

  1. Push to registry → gets 401 Unauthorized
  2. Registry returns WWW-Authenticate: Bearer realm="/v2/token"...
  3. BuildKit requests token from /v2/token without credentials (anonymous)
  4. BuildKit expects a token back, but our server had no token endpoint

This is a known BuildKit limitation - it doesn't read credentials from config.json when calling the token endpoint.

Solution

Implement the standard Docker registry v2 token authentication with a pragmatic security model:

  • Add /v2/token endpoint that grants tokens for builds/* paths
  • Build ID as implicit auth - Build IDs are cryptographically random and short-lived, serving as a capability token (similar to pre-signed URLs in S3)
  • Bearer token flow - Return WWW-Authenticate: Bearer challenge instead of Basic
  • IP fallback preserved - Safety net for older builder images
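
As a rough illustration of the Bearer challenge and the token response named in the list above, here is a minimal Go sketch. The helper names (`bearerChallenge`, `tokenResponse`), the realm URL, the service name, and the 600-second lifetime are assumptions made for illustration and are not taken from this PR. The header shape and the `token` / `expires_in` JSON fields follow the Docker registry token authentication specification.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bearerChallenge formats a WWW-Authenticate value for a 401 response.
// All three parameters are supplied by the caller; nothing here is
// specific to this PR's handler.
func bearerChallenge(realm, service, scope string) string {
	return fmt.Sprintf("Bearer realm=%q,service=%q,scope=%q", realm, service, scope)
}

// tokenResponse is the JSON body a token endpoint returns to the client.
type tokenResponse struct {
	Token     string `json:"token"`
	ExpiresIn int    `json:"expires_in"`
}

func main() {
	// Hypothetical values: registry.example and builds/abc123 are placeholders.
	fmt.Println(bearerChallenge(
		"https://registry.example/v2/token",
		"registry",
		"repository:builds/abc123:pull,push",
	))

	body, err := json.Marshal(tokenResponse{Token: "signed-jwt-placeholder", ExpiresIn: 600})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The client is expected to retry the original request with `Authorization: Bearer` followed by the returned token.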

Changes

  • lib/registry/token.go - New token endpoint handler
  • lib/middleware/oapi_auth.go - Bearer token validation, WWW-Authenticate header
  • lib/builds/builder_agent/main.go - Strip scheme from URLs, CA cert support
  • cmd/api/main.go - Mount token handler at /v2/token
  • Supporting config changes for HTTPS with self-signed certs

Security Model

| Approach | Security | Trade-off |
|---|---|---|
| IP fallback | Weak | Anyone on VM network can push anywhere |
| Build ID as auth | Medium | Must guess random ID within short window |
| Full JWT validation | Strong | Requires BuildKit to send credentials (doesn't work) |

To exploit this, an attacker would need to:

  1. Know a valid build ID (random, not enumerable)
  2. Act within the build's short lifetime (~10 min)
  3. Have network access to the registry (internal only)

Test plan

  • Unit tests for middleware auth flow
  • Local end-to-end test with HTTPS + self-signed cert
  • Deploy to staging and verify builds complete
  • Deploy to production

The two-tier build cache PR (#70) introduced a new token format using
`repo_access` field with per-repository scopes, but the middleware
wasn't updated to parse it.

This caused 401 Unauthorized errors when builder VMs tried to push
images to the registry, as the middleware only checked for the legacy
`repos` field which is empty in new tokens.

Changes:
- Add RepoPermission struct and RepoAccess field to RegistryTokenClaims
- Update validateRegistryToken to check both RepoAccess (new) and
  Repositories (legacy) formats
- Add per-repo scope checking for write operations
- Add comprehensive tests for both token formats
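
A minimal sketch of what checking both token formats could look like. The type names and the `repos` / `repo_access` JSON keys come from the commit message; the fields inside `RepoPermission`, the `allowsRepo` helper, and the rule that a legacy entry grants both read and write are assumptions for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RepoPermission is one entry of the new per-repository format.
type RepoPermission struct {
	Repo  string `json:"repo"`  // assumed field name
	Scope string `json:"scope"` // assumed values: "pull" or "push"
}

// RegistryTokenClaims carries both the legacy and the new format.
type RegistryTokenClaims struct {
	Repositories []string         `json:"repos"`       // legacy format
	RepoAccess   []RepoPermission `json:"repo_access"` // new format
}

// allowsRepo prefers the new format when present and falls back to the
// legacy list otherwise. Writes require an explicit "push" scope.
func allowsRepo(c RegistryTokenClaims, repo string, write bool) bool {
	if len(c.RepoAccess) > 0 {
		for _, p := range c.RepoAccess {
			if p.Repo == repo {
				return !write || p.Scope == "push"
			}
		}
		return false
	}
	for _, r := range c.Repositories {
		if r == repo {
			return true // assumption: legacy entries allow read and write
		}
	}
	return false
}

func main() {
	raw := `{"repo_access":[{"repo":"builds/abc123","scope":"push"},{"repo":"cache/base","scope":"pull"}]}`
	var c RegistryTokenClaims
	if err := json.Unmarshal([]byte(raw), &c); err != nil {
		panic(err)
	}
	fmt.Println(allowsRepo(c, "builds/abc123", true)) // push to a push-scoped repo
	fmt.Println(allowsRepo(c, "cache/base", true))    // push to a pull-only repo
}
```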

Fixes build failures in production where the new token format was
being used but not recognized by the registry auth middleware.

The IP fallback for registry authentication only allowed 10.100.x.x and
10.102.x.x subnets (staging/dev), but production uses 172.30.x.x. This
caused builds to fail in production while working in staging.
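
The subnet check described here can be sketched with `net/netip`. This is an illustration under assumptions: the helper name `isInternalVMAddr` is hypothetical (the PR's function is `isInternalVMRequest()`, whose signature is not shown), and the `x.x` ranges are read as /16 prefixes.

```go
package main

import (
	"fmt"
	"net/netip"
)

// internalPrefixes lists the subnets named in the commit message,
// interpreted as /16 networks (an assumption).
var internalPrefixes = []netip.Prefix{
	netip.MustParsePrefix("10.100.0.0/16"), // staging/dev
	netip.MustParsePrefix("10.102.0.0/16"), // staging/dev
	netip.MustParsePrefix("172.30.0.0/16"), // production
}

// isInternalVMAddr reports whether a "host:port" remote address falls in
// one of the allowed subnets. Unparseable input is treated as external.
func isInternalVMAddr(remoteAddr string) bool {
	ap, err := netip.ParseAddrPort(remoteAddr)
	if err != nil {
		return false
	}
	addr := ap.Addr().Unmap() // normalize IPv4-mapped IPv6 addresses
	for _, p := range internalPrefixes {
		if p.Contains(addr) {
			return true
		}
	}
	return false
}

func main() {
	for _, a := range []string{"172.30.4.5:41234", "10.101.0.1:80", "8.8.8.8:443"} {
		fmt.Println(a, isInternalVMAddr(a))
	}
}
```

Keeping the prefixes in one table makes a missing environment, such as the production subnet here, easier to spot in review and in tests.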

Changes:
- Add 172.30.x.x to allowed subnets in isInternalVMRequest()
- Add logger injection to /v2 routes for debug logging
- Add tests for internal VM request subnet matching

Add comprehensive tests for JwtAuth middleware on /v2/ registry paths:
- Valid token access (both legacy and RepoAccess formats)
- IP fallback for staging (10.100.x.x, 10.102.x.x) and production (172.30.x.x)
- External IP rejection without valid token
- Invalid token fallback behavior
- Bearer and Basic auth support

These tests would have caught the production subnet issue earlier.

Root cause: Registry was returning 401 without WWW-Authenticate header,
so BuildKit didn't know to send credentials from Docker config.

Changes:
- Add WWW-Authenticate: Basic realm="registry" to 401 responses
- Remove production subnet (172.30.x.x) from IP fallback
  (staging 10.100.x.x still has fallback as safety net)
- Builder agent: write Docker config to both /home/builder/.docker
  and /root/.docker to ensure BuildKit finds it
- Add tests for BuildKit auth flow simulation

This enables proper token-based registry auth in production.

// knows to send credentials from the Docker config
if statusCode == http.StatusUnauthorized {
	w.Header().Set("WWW-Authenticate", `Basic realm="registry"`)
}

WWW-Authenticate header incorrectly added to all API 401s

Low Severity

The OapiErrorHandler function now adds WWW-Authenticate: Basic realm="registry" to all 401 responses, not just registry endpoints. This affects non-registry API endpoints like /instances/{id}/exec, /instances/{id}/cp, and all OpenAPI-validated endpoints. These endpoints expect Bearer authentication, so returning a Basic auth challenge with a "registry" realm is misleading and could confuse API clients about how to authenticate.
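
One way to address this, sketched under assumptions, is to attach the challenge only when the request path is under `/v2`. The helpers `challengeFor` and `writeUnauthorized` are hypothetical; the real `OapiErrorHandler` does not receive the request, so the actual fix would need the path passed in some other way.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// challengeFor returns the WWW-Authenticate value for registry paths and
// an empty string for every other API path.
func challengeFor(path string) string {
	if path == "/v2" || strings.HasPrefix(path, "/v2/") {
		return `Basic realm="registry"`
	}
	return ""
}

// writeUnauthorized writes a 401 and adds the challenge only when
// challengeFor returns one for the request path.
func writeUnauthorized(w http.ResponseWriter, r *http.Request, message string) {
	if c := challengeFor(r.URL.Path); c != "" {
		w.Header().Set("WWW-Authenticate", c)
	}
	http.Error(w, message, http.StatusUnauthorized)
}

func main() {
	// Example paths are placeholders for a registry and a non-registry route.
	for _, p := range []string{"/v2/builds/abc123/manifests/latest", "/instances/i-1/exec"} {
		rec := httptest.NewRecorder()
		writeUnauthorized(rec, httptest.NewRequest(http.MethodGet, p, nil), "unauthorized")
		fmt.Printf("%s -> %d, WWW-Authenticate=%q\n", p, rec.Code, rec.Header().Get("WWW-Authenticate"))
	}
}
```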

BuildKit with registry.insecure=true doesn't do WWW-Authenticate
challenge-response flow - it just fails on 401 without retrying
with credentials.

Re-enable IP fallback for production (172.30.x.x) until we find
a way to make BuildKit send auth proactively.

BuildKit's containerd resolver doesn't send credentials to the token endpoint
proactively - it tries anonymous first. This PR implements the standard Docker
registry v2 token authentication flow to work with BuildKit's behavior.

Changes:
- Add /v2/token endpoint that grants tokens for builds/* paths
- Return WWW-Authenticate: Bearer challenge instead of Basic
- Validate Bearer access tokens in middleware
- Support HTTPS with self-signed certs via CA cert configuration
- Builder agent: strip scheme from registry URLs for image refs
- Builder agent: install CA certs system-wide for token endpoint TLS

Security model:
Build IDs are cryptographically random and short-lived, serving as implicit
authentication (similar to pre-signed URLs). The IP fallback remains as a
safety net for older builder images.
@hiroTamada hiroTamada changed the title from "fix(middleware): support RepoAccess token format in registry auth" to "feat(registry): implement Docker registry v2 token authentication" Jan 28, 2026

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

		Name:    repoName,
		Actions: actions,
	})
}

Anonymous token endpoint grants unauthenticated push access to cache paths

Medium Severity

The returnAnonymousToken function grants push access to any cache/* path without authentication. An attacker who knows or guesses a tenant's cache scope (e.g., cache/tenant-x) can request GET /v2/token?scope=repository:cache/tenant-x:push,pull without credentials and receive a valid token with write access. This enables cache poisoning attacks where malicious layers are pushed to a victim tenant's cache, affecting subsequent builds that import from it. While builds/* paths use cryptographically random IDs making them hard to guess, cache/* scopes are tenant-provided values that may be predictable.
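
One possible tightening, shown here as a sketch and not as the author's fix, is to parse the requested scope and grant anonymous tokens only for `builds/*` repositories. The helpers `parseScope` and `anonymousActions` are hypothetical names; the `type:name:actions` scope layout follows the Docker registry token specification.

```go
package main

import (
	"fmt"
	"strings"
)

// parseScope splits "type:name:actions" into its parts. The name is taken
// between the first and last colon so that names containing ':' survive.
func parseScope(scope string) (typ, name string, actions []string, ok bool) {
	first := strings.Index(scope, ":")
	last := strings.LastIndex(scope, ":")
	if first <= 0 || last <= first+1 || last == len(scope)-1 {
		return "", "", nil, false
	}
	return scope[:first], scope[first+1 : last], strings.Split(scope[last+1:], ","), true
}

// anonymousActions returns the actions an unauthenticated caller may get.
// Only repositories under builds/ qualify; cache/* and everything else
// receive no actions and would need real credentials.
func anonymousActions(scope string) []string {
	typ, name, actions, ok := parseScope(scope)
	if !ok || typ != "repository" {
		return nil
	}
	if !strings.HasPrefix(name, "builds/") || len(name) == len("builds/") {
		return nil
	}
	var granted []string
	for _, a := range actions {
		if a == "pull" || a == "push" {
			granted = append(granted, a)
		}
	}
	return granted
}

func main() {
	fmt.Println(anonymousActions("repository:builds/abc123:push,pull"))
	fmt.Println(anonymousActions("repository:cache/tenant-x:push,pull"))
}
```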

// OapiErrorHandler creates a custom error handler for nethttp-middleware
// that returns consistent error responses.
func OapiErrorHandler(w http.ResponseWriter, message string, statusCode int) {
	OapiErrorHandlerWithHost(w, message, statusCode, "")

Access tokens bypass API authentication rejection checks

Medium Severity

The new access tokens issued by /v2/token can be used to authenticate to API endpoints. The token rejection logic in OapiAuthenticationFunc and JwtAuth checks for repos, scope, build_id claims, and builder- subject prefix. However, access tokens have none of these - they have an access claim and sub: "anonymous-builder" which starts with anonymous-, not builder-. An attacker can call /v2/token anonymously, obtain a signed JWT, then use it as a Bearer token for API endpoints, authenticating as user anonymous-builder.
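
A sketch of how the rejection check could be extended to cover the new access tokens. The claim names `repos`, `scope`, `build_id`, and `access` and the `builder-` / `anonymous-` prefixes come from the comment above; the helper `isRegistryToken` and the use of a plain `map[string]any` for claims are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// isRegistryToken reports whether a set of JWT claims looks like a
// registry or builder token that must not authenticate API endpoints.
func isRegistryToken(claims map[string]any) bool {
	// Any registry-specific claim marks the token as registry-only.
	for _, key := range []string{"repos", "scope", "build_id", "access"} {
		if _, ok := claims[key]; ok {
			return true
		}
	}
	// Subjects minted for builders, including the anonymous variant.
	sub, _ := claims["sub"].(string)
	return strings.HasPrefix(sub, "builder-") || strings.HasPrefix(sub, "anonymous-")
}

func main() {
	accessToken := map[string]any{
		"sub":    "anonymous-builder",
		"access": []any{map[string]any{"type": "repository", "name": "builds/abc123"}},
	}
	userToken := map[string]any{"sub": "user-123"} // hypothetical API user

	fmt.Println(isRegistryToken(accessToken))
	fmt.Println(isRegistryToken(userToken))
}
```

A deny-list like this has to be updated for every new token type; an allow-list, such as requiring a dedicated audience claim on API tokens, would fail closed instead.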

// Docker/OCI image references don't include http:// or https://
registryHost := config.RegistryURL
registryHost = strings.TrimPrefix(registryHost, "https://")
registryHost = strings.TrimPrefix(registryHost, "http://")

Case-sensitive scheme stripping causes build failures with uppercase URLs

Low Severity

The scheme detection uses strings.ToLower() for case-insensitive matching (e.g., isHTTP := !strings.HasPrefix(strings.ToLower(config.RegistryURL), "https://")), but the scheme stripping uses case-sensitive strings.TrimPrefix(). If a URL has an uppercase scheme like HTTPS://myregistry:5000, the detection correctly identifies it as HTTPS, but the stripping fails to remove the scheme. This produces an invalid image reference like HTTPS://myregistry:5000/builds/abc123, causing BuildKit to fail with an invalid reference format error.
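
A minimal sketch of case-insensitive scheme stripping that would keep detection and stripping consistent. The helper name `stripScheme` is hypothetical; it relies only on `strings.EqualFold` from the standard library.

```go
package main

import (
	"fmt"
	"strings"
)

// stripScheme removes a leading http:// or https:// regardless of case,
// so the result can be used as the host part of an image reference.
func stripScheme(registryURL string) string {
	for _, scheme := range []string{"https://", "http://"} {
		if len(registryURL) >= len(scheme) &&
			strings.EqualFold(registryURL[:len(scheme)], scheme) {
			return registryURL[len(scheme):]
		}
	}
	return registryURL
}

func main() {
	for _, u := range []string{"HTTPS://myregistry:5000", "http://myregistry:5000", "myregistry:5000"} {
		fmt.Println(stripScheme(u) + "/builds/abc123")
	}
}
```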

@hiroTamada hiroTamada closed this Jan 29, 2026