
feat(vector): implement spooling logic #229

Open · JoshFerge wants to merge 14 commits into main from jferg/vector-retries
Conversation

@JoshFerge JoshFerge (Member) commented Jan 31, 2025

Right now, when a vector request fails, we simply drop the uptime check result. This is not acceptable. This PR does the following:

  • Introduces two new config items, including retry_vector_errors_forever.

If retry_vector_errors_forever is set to true, we will retry the vector request for a batch indefinitely whenever it fails with a non-2XX response or another error. Messages will spool on the channel in the meantime. Given the current machine size and the requests per second a pod handles, this should give us roughly 2.5 days of spooling before the service dies from an OOM (assuming a 500-byte check result size and a 4 GB pod memory limit); see the back-of-envelope sketch below.

We also assume that we will produce results to vector faster than we perform uptime checks. I believe this is a safe assumption, although if vector is having problems, the queue could still grow even while we're producing to vector successfully.
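
For reference, here is a rough sketch of the 2.5-day arithmetic. The result size and memory limit are the numbers from above; the ~40 results/second per-pod throughput is an assumed figure chosen to match the estimate, not a value taken from this PR.

// Back-of-envelope spooling capacity (illustrative only, not code in this PR).
const RESULT_SIZE_BYTES: u64 = 500; // assumed average serialized check result
const POD_MEMORY_LIMIT_BYTES: u64 = 4 * 1024 * 1024 * 1024; // 4 GiB pod limit
const ASSUMED_RESULTS_PER_SECOND: u64 = 40; // hypothetical per-pod check rate

fn main() {
    // How many results fit in memory before the pod OOMs.
    let max_buffered = POD_MEMORY_LIMIT_BYTES / RESULT_SIZE_BYTES; // ~8.6M
    // How long we can spool at the assumed throughput.
    let seconds = max_buffered / ASSUMED_RESULTS_PER_SECOND;
    println!(
        "~{} results buffered, ~{:.1} days of spooling",
        max_buffered,
        seconds as f64 / 86_400.0 // ~2.5 days
    );
}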

  • Introduces metrics for the vector request worker queue size and the vector response time.

We can create monitors on both of these metrics; an aggressive queue-size monitor should give the on-call plenty of time to take action before the queue fills up and the process dies.
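
As an illustration only, the queue-depth metric could look something like the sketch below, mirroring the metrics-handle style already used in this module; the metric name, labels, and the receiver.len() call are assumptions rather than the PR's actual code.

// Hypothetical sketch: report the worker queue depth as a gauge so a monitor
// can alert well before the channel (and pod memory) fills up.
// `receiver` is assumed to be a tokio::sync::mpsc::Receiver of check results.
let queue_len = receiver.len();
metrics::gauge!(
    "vector_producer.worker_queue_size",
    "uptime_region" => config.region.clone()
)
.set(queue_len as f64);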


In a future PR, we could consider a strategy such as dropping all successful uptime checks if the queue gets too full. We can evaluate options with the team.

@JoshFerge JoshFerge requested a review from a team as a code owner January 31, 2025 22:20
@wedamija wedamija (Member) left a comment


To some extent I wonder if it's even worth having a max number of retries, given that we have days of spooling buffer available.

@JoshFerge JoshFerge force-pushed the jferg/vector-retries branch from d95dd3d to d215f4a on February 4, 2025 16:38
@JoshFerge JoshFerge changed the title feat(vector): implement spooling / retry logic feat(vector): implement spooling logic Feb 4, 2025
@JoshFerge JoshFerge force-pushed the jferg/vector-retries branch from 35f7731 to 2cff19b on February 11, 2025 18:36
Comment on lines +81 to +92
if let Err(e) = {
let start = std::time::Instant::now();
let result = send_batch(
batch_to_send,
client.clone(),
config.endpoint.clone(),
config.retry_vector_errors_forever,
)
.await;
metrics::histogram!("vector_producer.send_batch.duration", "uptime_region" => config.region.clone(), "histogram" => "timer").record(start.elapsed().as_secs_f64());
result
} {

nit: Having this whole block inside the if is getting a bit unwieldy; I think we could just do the if on the result itself:

if let Err(e) = result {
   // ...
}
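
For concreteness, the restructuring could look roughly like this, reusing the variables from the quoted diff and eliding the error-handling body:

// Time the send, record the duration, then branch on the result so the
// `if let` only wraps the error-handling path.
let start = std::time::Instant::now();
let result = send_batch(
    batch_to_send,
    client.clone(),
    config.endpoint.clone(),
    config.retry_vector_errors_forever,
)
.await;
metrics::histogram!(
    "vector_producer.send_batch.duration",
    "uptime_region" => config.region.clone(),
    "histogram" => "timer"
)
.record(start.elapsed().as_secs_f64());

if let Err(e) = result {
    // ... existing error handling ...
}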

@evanpurkhiser evanpurkhiser (Member) left a comment

Looks good. I think we could clean up some of the metrics and logs in this module afterwards to make them all more consistent.
