Add store performance grafana alerts #5255
jagathweerasinghe-da wants to merge 13 commits into main from
Conversation
```yaml
conditions:
  - evaluator:
      type: gt
      params: [300]
```
My main worry is, with an example:
- Ingestion test now takes ~200s
- A day later somebody does a change that makes ingestion 25% slower (that's very significant)
- Now the test takes 250s, which would still be < 300, we wouldn't alert, and likely nobody would notice
This is fine as an approach (per the design doc, "the CI run fails if the performance is below expected throughputs"), but then:
- These thresholds need to be tight enough that we'd detect big swings, but not flake (too often, I guess). They might be already, I cannot tell from this PR only
- We need to document our expectations to contributors wrt these tests (DEVELOPMENT.md/CONTRIBUTING.md, maybe?). From the design doc: "We expect that most changes won’t affect ingestion performance and for the ones that do the PR author runs the performance test. We can still switch to a more restrictive approach if we see that it is necessary."
- Likewise we should document for us what to do if these alerts get triggered (debug/check commits/bump the threshold here...)
- We should document (a comment here is fine) how/when these thresholds were set, for reference. E.g. "performance tests take ~280s as of yyyy-mm-dd"
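To make the last point concrete, the threshold provenance could be recorded right next to the evaluator in the Grafana provisioning YAML. This is a sketch only; the baseline figure and date below are illustrative placeholders, not measured values:

```yaml
# Baseline: ingestion test took ~X s as of yyyy-mm-dd (fill in when measured).
# Threshold = baseline + headroom; revisit if data size/batch size changes.
conditions:
  - evaluator:
      type: gt
      params: [300]  # seconds
```

Keeping the comment adjacent to the number makes "bump the threshold" decisions auditable later.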
Let's add the documentation to PERFORMANCE.md as suggested by the design doc.
Working on determining the threshold. We need to take into account the data size/batch size, as well as how the dataset changes over time.
btw, do you know where the alerts from the splice cluster end up?
It ends up in team-canton-network-internal-ci:

```
# ./cluster/deployment/splice/.envrc.vars
export SLACK_ALERT_NOTIFICATION_CHANNEL_FULL_NAME="team-canton-network-internal-ci"
```
pretty sure that channel is mostly ignored in favor of the test-failures dashboard; you might want to ask for opinions on #internal
As per the internal discussion we had, let's go with the GH issue. Thanks for bringing up this point.
There are two routes to create a GH issue:
- GH Action -> Prometheus -> Grafana Alerts -> GH Issue
- GH Action -> GH Issue
I would like to go ahead with the second approach, which makes the integration clean and simple. WDYT?
yeah, 2 seems easiest
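The second route (GH Action -> GH Issue) could look roughly like the workflow step below. This is a hypothetical sketch, not the PR's actual workflow: the step name, label, and the assumption that the performance step fails the job on regression are all illustrative; `gh issue create` with `--title`/`--body`/`--label` is the standard GitHub CLI invocation:

```yaml
# Hypothetical step in the performance workflow: file an issue directly
# when an earlier step has failed (assumed to mean a perf regression).
- name: Open issue on performance regression
  if: ${{ failure() }}
  env:
    GH_TOKEN: ${{ github.token }}
  run: |
    gh issue create \
      --title "Performance regression detected in ${{ github.workflow }}" \
      --body "See run: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}" \
      --label performance
```

This keeps the whole loop inside GitHub, with no Prometheus/Grafana hop in between.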
```yaml
to: 0
datasourceUid: prometheus
model:
  expr: splice_perf_ingestion_total_time_ns{test="ScanStoreIngestionPerformanceTest"} / 1e9
```
is there no easy way to have a single alert instead of a separate one per test? I'd like to not have to add an alert when we add a test, which is easy to forget
I can imagine that having a different threshold per test makes this difficult, but maybe there's a way to do it
Simplified the rules, so the changes needed when adding a new test are minimal.
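One way such a single rule can be written is with PromQL vector matching on the `test` label, assuming a per-test threshold series is exported alongside the timing metric. The `splice_perf_threshold_seconds` name is an illustrative assumption, not something from this PR:

```yaml
model:
  # Fires for any test whose runtime exceeds its own threshold, so adding
  # a test means exporting one more threshold series, not a new alert rule.
  # (splice_perf_threshold_seconds is a hypothetical per-test metric.)
  expr: >
    splice_perf_ingestion_total_time_ns / 1e9
      > on(test)
    splice_perf_threshold_seconds
```

The comparison drops all series on the left that stay under their matching threshold, which is exactly the filtering an alert condition needs.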
Force-pushed from 3395bfa to 2d0d565
[static] Signed-off-by: Jagath Weerasinghe <[email protected]>
Force-pushed from 85264ee to e96334e
No description provided.