Make topKMetrics thread-safe in metrics limiter #1738

Open, wants to merge 1 commit into base: main

lukeina2z

The shared topKMetrics data structure was not protected against concurrent access, which led to crashes with "fatal error: concurrent map writes". Below is a sample log illustrating the issue.

This code change introduces a read/write mutex to ensure thread safety and prevent data races.

All unit tests pass, and manual end-to-end tests have been successfully completed.


--------------
fatal error: concurrent map writes
goroutine 180863 [running]:
internal/runtime/maps.fatal({0x552d503?, 0x22?})
    runtime/panic.go:1058 +0x18
github.com/aws/amazon-cloudwatch-agent/plugins/processors/awsapplicationsignals/internal/cardinalitycontrol.(*topKMetrics).Push(0xc000bd56e0, 0x5d2e590?, 0xc00145f780)
    github.com/aws/amazon-cloudwatch-agent/plugins/processors/awsapplicationsignals/internal/cardinalitycontrol/metrics_limiter.go:310 +0x1c5
github.com/aws/amazon-cloudwatch-agent/plugins/processors/awsapplicationsignals/internal/cardinalitycontrol.(*service).InsertMetricDataToSecondary(0xc000a8e240, 0xc00145f700)
    github.com/aws/amazon-cloudwatch-agent/plugins/processors/awsapplicationsignals/internal/cardinalitycontrol/metrics_limiter.go:209 +0xec
github.com/aws/amazon-cloudwatch-agent/plugins/processors/awsapplicationsignals/internal/cardinalitycontrol.(*MetricsLimiter).Admit(0xc000ad4050, {0xc000df1fc9, 0x5}, {0xc00081c1c0?, 0xc000df1f08?}, {0xc001156618?, 0xc000df1f08?})
--------------

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@lukeina2z lukeina2z requested a review from a team as a code owner June 19, 2025 00:04
@bjrara
Collaborator

bjrara commented Jun 19, 2025

Can we add unit tests for the concurrency scenario?

@lukeina2z
Author

> Can we add unit tests for the concurrency scenario?

Could you please elaborate on your question? I didn’t quite understand it. How can we use unit tests to deterministically capture concurrency issues? Is there a sample unit test in this repository that I could learn from?
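
For context on the question above, one common shape for such a test is a stress routine rather than a deterministic check: many goroutines are released at the same moment to hammer the shared structure, and the suite is run with `go test -race` so the race detector can report any unsynchronized access it observes. The sketch below is hypothetical and not taken from this repository; `store`, `push`, `total`, and `stress` are illustrative names. It is written as a standalone program so it can be run directly; in a `_test.go` file the body of `main` would sit inside a `TestXxx(t *testing.T)` function.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical mutex-guarded map standing in for the
// structure under test.
type store struct {
	mu sync.RWMutex
	m  map[string]int
}

func (s *store) push(k string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[k]++
}

func (s *store) total() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	sum := 0
	for _, v := range s.m {
		sum += v
	}
	return sum
}

// stress starts the given number of goroutines, holds them on a start
// channel so they begin together, and has each one interleave n writes
// with n reads over a small, overlapping key set.
func stress(s *store, workers, n int) {
	start := make(chan struct{})
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-start // wait until every worker has been launched
			for i := 0; i < n; i++ {
				s.push(fmt.Sprintf("key-%d", i%16))
				_ = s.total()
			}
		}()
	}
	close(start)
	wg.Wait()
}

func main() {
	s := &store{m: make(map[string]int)}
	stress(s, 8, 500)
	// Every push adds exactly one, so the total must equal workers * n.
	fmt.Println(s.total() == 8*500) // prints true
}
```

A test like this cannot guarantee that a race is caught on every run, since the race detector only reports races that actually occur during execution, but it makes the unsafe interleaving far more likely to be exercised than single-goroutine tests do.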

@lukeina2z
Author

Run these commands and address all the warnings. We should get a good sense of where the race conditions are. I don't think we need to invest in new unit tests specifically for race conditions.

go test -race -parallel 7 -timeout 7m github.com/aws/amazon-cloudwatch-agent/plugins/processors/awsapplicationsignals

go test -race -parallel 7 -timeout 7m github.com/aws/amazon-cloudwatch-agent/plugins/processors/awsapplicationsignals/common

go test -race -parallel 7 -timeout 7m github.com/aws/amazon-cloudwatch-agent/plugins/processors/awsapplicationsignals/config

go test -race -parallel 7 -timeout 7m github.com/aws/amazon-cloudwatch-agent/plugins/processors/awsapplicationsignals/internal/attributes

go test -race -parallel 7 -timeout 7m github.com/aws/amazon-cloudwatch-agent/plugins/processors/awsapplicationsignals/internal/cardinalitycontrol

go test -race -parallel 7 -timeout 7m github.com/aws/amazon-cloudwatch-agent/plugins/processors/awsapplicationsignals/internal/metrichandlers

go test -race -parallel 7 -timeout 7m github.com/aws/amazon-cloudwatch-agent/plugins/processors/awsapplicationsignals/internal/normalizer

go test -race -parallel 7 -timeout 7m github.com/aws/amazon-cloudwatch-agent/plugins/processors/awsapplicationsignals/internal/resolver

go test -race -parallel 7 -timeout 7m github.com/aws/amazon-cloudwatch-agent/plugins/processors/awsapplicationsignals/rules

Contributor

This PR was marked stale due to lack of activity.

@github-actions github-actions bot added the Stale label Jun 28, 2025