Summary
applyWebhookPolicy and applyWasmPolicy in
pkg/fleetautoscalers/fleetautoscalers.go both dereference
Response.Scale without first checking whether Response is nil.
When a webhook or Wasm policy returns a JSON body that omits the
response field (e.g., {}), json.Unmarshal succeeds but leaves
Response as a nil pointer. The subsequent dereference panics and
crashes the controller.
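The decoding behavior described above can be shown in isolation. This is a minimal sketch that assumes simplified stand-in types (fleetAutoscaleReview and fleetAutoscaleResponse are hypothetical names here; the real Agones review types carry more fields). It decodes {} and then performs the same unguarded Response.Scale access, recovering the panic so the demo can report it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical, simplified stand-ins for the Agones review types:
// only the fields needed to show the nil-pointer behavior.
type fleetAutoscaleResponse struct {
	Scale    bool  `json:"scale"`
	Replicas int32 `json:"replicas"`
}

type fleetAutoscaleReview struct {
	Response *fleetAutoscaleResponse `json:"response"`
}

// decodeReview unmarshals a policy reply. A body without a "response"
// key decodes without error and leaves Response nil.
func decodeReview(body []byte) (fleetAutoscaleReview, error) {
	var r fleetAutoscaleReview
	err := json.Unmarshal(body, &r)
	return r, err
}

// readScale mimics the unguarded access. The deferred recover turns the
// nil-pointer panic into an error so this demo keeps running; the
// controller code has no such recover at this point.
func readScale(r fleetAutoscaleReview) (scale bool, err error) {
	defer func() {
		if p := recover(); p != nil {
			err = fmt.Errorf("panic: %v", p)
		}
	}()
	return r.Response.Scale, nil
}

func main() {
	r, err := decodeReview([]byte(`{}`))
	fmt.Println(err == nil, r.Response == nil) // true true
	_, err = readScale(r)
	fmt.Println(err)
}
```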
Both HA controller replicas pull the same FleetAutoscaler from the
shared workqueue and crash on the same item, producing CrashLoopBackOff
on both replicas. FleetAutoscaler reconciliation halts cluster-wide
until the resource is removed by an admin.
Affected code
pkg/fleetautoscalers/fleetautoscalers.go:
- applyWebhookPolicy: faResp.Response.Scale accessed without nil check
- applyWasmPolicy: review.Response.Scale accessed without nil check
Reproduction
Tested against Agones release 1.57.0, installed with helm install agones, in a kind cluster.
- Deploy any minimal Fleet.
- Deploy an HTTP server that returns {} to all POST requests:
from http.server import HTTPServer, BaseHTTPRequestHandler

class H(BaseHTTPRequestHandler):
    # Reply to every POST with an empty JSON object (no "response" field).
    def do_POST(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(b'{}')

HTTPServer(('', 8888), H).serve_forever()
- Apply a FleetAutoscaler pointing at that server:
apiVersion: autoscaling.agones.dev/v1
kind: FleetAutoscaler
metadata:
  name: panic-poc
spec:
  fleetName: <fleet-name>
  policy:
    type: Webhook
    webhook:
      url: "http://<server-service>.<namespace>.svc.cluster.local:8888/"
- Both agones-controller pods enter CrashLoopBackOff within 30s.
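The same condition can also be reproduced without a cluster. This is a minimal sketch, assuming Go's net/http/httptest in place of the Python server and the same hypothetical simplified review types; it POSTs a placeholder {} body (not the full review the controller sends) and checks that the decoded reply has a nil Response.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Hypothetical, simplified stand-ins for the Agones review types.
type fleetAutoscaleResponse struct {
	Scale    bool  `json:"scale"`
	Replicas int32 `json:"replicas"`
}

type fleetAutoscaleReview struct {
	Response *fleetAutoscaleResponse `json:"response"`
}

// newEmptyBodyServer answers every POST with {}, like the Python server.
func newEmptyBodyServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_, _ = w.Write([]byte(`{}`))
	}))
}

// postReview sends a placeholder review to url and decodes the reply.
func postReview(url string) (*fleetAutoscaleReview, error) {
	resp, err := http.Post(url, "application/json", bytes.NewReader([]byte(`{}`)))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var review fleetAutoscaleReview
	if err := json.NewDecoder(resp.Body).Decode(&review); err != nil {
		return nil, err
	}
	return &review, nil
}

func main() {
	srv := newEmptyBodyServer()
	defer srv.Close()

	review, err := postReview(srv.URL)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	// Decoding succeeded, yet there is nothing behind Response to read Scale from.
	fmt.Println("response is nil:", review.Response == nil)
}
```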
Stack trace
panic: runtime error: invalid memory address or nil pointer dereference
agones.dev/agones/pkg/fleetautoscalers.applyWebhookPolicy(...)
.../pkg/fleetautoscalers/fleetautoscalers.go:355 +0x974
[recovered, repanicked]
Fix
Add a nil check on Response before dereferencing it at both sites.
A PR with the fix is forthcoming.