panic when benchmarking schedule plugin v2 #1227

@YaoZengzeng

Description

What happened:

I tried to benchmark plugin v2 with sglang's benchmark tool (ref: #768).

The benchmark config is as follows:

"--host", "10.247.244.186", "--backend", "vllm", "--port", "80", "--model", "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B", "--tokenizer", "/root/.cache/huggingface/hub/models--deepseek-ai--DeepSeek-R1-Distill-Qwen-7B/snapshots/916b56a44061fd5cd7d6a8fb632557ed4f724f60/", "--dataset-name", "generated-shared-prefix", "--gsp-num-groups", "256", "--gsp-prompts-per-group", "32", "--gsp-system-prompt-len", "4096", "--gsp-question-len", "256", "--gsp-output-len", "128", "--request-rate", "800", "--max-concurrency", "300"

The panic log is:

fatal error: concurrent map iteration and map write

goroutine 1902 [running]:
internal/runtime/maps.fatal({0x22f3337?, 0x47a1cf?})
	/usr/local/go/src/runtime/panic.go:1058 +0x18
internal/runtime/maps.(*Iter).Next(0xc00074b560?)
	/usr/local/go/src/internal/runtime/maps/table.go:683 +0x86
sigs.k8s.io/gateway-api-inference-extension/pkg/epp/scheduling/framework/plugins/multi/prefix.(*Plugin).matchLongestPrefix(0xc00024fea0, {0x2634990?, 0xc00571e690?}, {0xc005676000, 0x101, 0x0?})
	/src/pkg/epp/scheduling/framework/plugins/multi/prefix/plugin.go:226 +0x28f
sigs.k8s.io/gateway-api-inference-extension/pkg/epp/scheduling/framework/plugins/multi/prefix.(*Plugin).Score(0xc00024fea0, {0x2634990, 0xc00571e690}, 0xc00704e7e0, 0xc004f20080, {0xc00704e540, 0x3, 0x2?})
	/src/pkg/epp/scheduling/framework/plugins/multi/prefix/plugin.go:174 +0xe5
sigs.k8s.io/gateway-api-inference-extension/pkg/epp/scheduling/framework.(*SchedulerProfile).runScorerPlugins(0xc00075bec0, {0x2634990, 0xc00571e690}, 0xc004f20080, 0xc00704e7e0, {0xc00704e540, 0x3, 0x3})
	/src/pkg/epp/scheduling/framework/scheduler_profile.go:156 +0x374
sigs.k8s.io/gateway-api-inference-extension/pkg/epp/scheduling/framework.(*SchedulerProfile).Run(0xc00075bec0, {0x2634990, 0xc00571e690}, 0xc004f20080, 0xc00704e7e0, {0xc00704e540?, 0x2?, 0x2?})
	/src/pkg/epp/scheduling/framework/scheduler_profile.go:115 +0xca
sigs.k8s.io/gateway-api-inference-extension/pkg/epp/scheduling.(*Scheduler).Schedule(0xc00000f3f8, {0x2634990, 0xc00571e690}, 0xc004f20080, {0xc00704e540, 0x3, 0x3})
	/src/pkg/epp/scheduling/scheduler.go:76 +0x92c
sigs.k8s.io/gateway-api-inference-extension/pkg/epp/requestcontrol.(*Director).HandleRequest(0xc00077c000, {0x2634990, 0xc005b9fb90}, 0xc003c26a00)
	/src/pkg/epp/requestcontrol/director.go:150 +0x794
sigs.k8s.io/gateway-api-inference-extension/pkg/epp/handlers.(*StreamingServer).Process(0xc000d328c0, {0x2640488, 0xc002560580})
	/src/pkg/epp/handlers/server.go:218 +0xb6d
github.com/envoyproxy/go-control-plane/envoy/service/ext_proc/v3._ExternalProcessor_Process_Handler({0x2152f60?, 0xc000d328c0}, {0x263c528, 0xc0056c6620})
	/go/pkg/mod/github.com/envoyproxy/go-control-plane/[email protected]/service/ext_proc/v3/external_processor_grpc.pb.go:106 +0xd8
google.golang.org/grpc.(*Server).processStreamingRPC(0xc00014ea00, {0x2634990, 0xc0008fd6e0}, 0xc000d99860, 0xc000a00030, 0x3978a20, 0x0)
	/go/pkg/mod/google.golang.org/[email protected]/server.go:1695 +0x1252
google.golang.org/grpc.(*Server).handleStream(0xc00014ea00, {0x2634f20, 0xc000d8e000}, 0xc000d99860)
	/go/pkg/mod/google.golang.org/[email protected]/server.go:1819 +0xb47
google.golang.org/grpc.(*Server).serveStreams.func2.1()
	/go/pkg/mod/google.golang.org/[email protected]/server.go:1035 +0x7f
created by google.golang.org/grpc.(*Server).serveStreams.func2 in goroutine 52
	/go/pkg/mod/google.golang.org/[email protected]/server.go:1046 +0x11d

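The error is raised while matchLongestPrefix (plugin.go:226) iterates over a map, so another goroutine appears to be writing to the same map at the same time without synchronization. Note that Go reports this as a fatal error rather than a regular panic, so it cannot be caught with recover and it terminates the process.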
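To illustrate the kind of fix this usually calls for, here is a minimal sketch that guards a shared map with a sync.RWMutex so that iteration and writes can no longer overlap. It is not the plugin's actual code: the prefixIndex type, its fields, and the add method are hypothetical stand-ins, and only the name matchLongestPrefix is borrowed from the stack trace. The real cache layout may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// prefixIndex is a HYPOTHETICAL stand-in for the plugin's cache. It maps a
// prompt-block hash to the set of servers assumed to have cached that block.
type prefixIndex struct {
	mu     sync.RWMutex
	blocks map[uint64]map[string]struct{}
}

func newPrefixIndex() *prefixIndex {
	return &prefixIndex{blocks: make(map[uint64]map[string]struct{})}
}

// add records that server holds the given block hashes.
// It takes the write lock, so no reader can iterate while the maps change.
func (p *prefixIndex) add(hashes []uint64, server string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, h := range hashes {
		if p.blocks[h] == nil {
			p.blocks[h] = make(map[string]struct{})
		}
		p.blocks[h][server] = struct{}{}
	}
}

// matchLongestPrefix returns, per server, how many leading blocks of hashes
// that server has cached. It holds the read lock for the whole iteration.
func (p *prefixIndex) matchLongestPrefix(hashes []uint64) map[string]int {
	p.mu.RLock()
	defer p.mu.RUnlock()
	res := make(map[string]int)
	for i, h := range hashes {
		servers := p.blocks[h]
		if len(servers) == 0 {
			break // no server has this block, so no prefix can extend further
		}
		for s := range servers {
			if res[s] == i { // only extend servers that matched every earlier block
				res[s] = i + 1
			}
		}
	}
	return res
}

func main() {
	idx := newPrefixIndex()
	var wg sync.WaitGroup
	// Several goroutines read and write concurrently, loosely mimicking
	// parallel requests hitting the scorer.
	for w := 0; w < 8; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				idx.add([]uint64{1, 2, uint64(i)}, fmt.Sprintf("pod-%d", w))
				idx.matchLongestPrefix([]uint64{1, 2, 3})
			}
		}(w)
	}
	wg.Wait()
	fmt.Println(idx.matchLongestPrefix([]uint64{1, 2, 3}))
}
```

Removing the two lock calls from this sketch and running it under the same load is likely to reproduce the fatal error above. Running with go run -race should flag the unsynchronized access even when the crash does not trigger.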

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Inference extension version (use git describe --tags --dirty --always):
  • Cloud provider or hardware configuration:
  • Install tools:
  • Others:

Metadata

Assignees

No one assigned

    Labels

    kind/bug: Categorizes issue or PR as related to a bug.
    needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
