-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: add metrics for wf engine #2076
base: 02-22-fix_various_bug_fixes
Are you sure you want to change the base?
fix: add metrics for wf engine #2076
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Summary
This PR adds comprehensive metrics tracking to the workflow engine. Here's a summary of the key changes:
- Added timing metrics for workflow operations including signal dispatch, message handling, and sub-workflow execution using
Instant::now()
measurements - Added new metrics in
metrics.rs
for tracking workflow states, signal handling latencies, and message processing durations with proper labeling - Added SQLite performance testing module with worker tasks to measure database operation latencies
- Changed SQLite synchronization mode from Normal to Full for better data consistency, trading some performance for reliability
- Added proper error handling and metric recording across workflow operations with standardized metric naming conventions
The changes appear well-structured and improve observability of the workflow engine's performance. However, there are a few potential concerns:
- The SQLite synchronization mode change could impact performance and should be carefully monitored
- Some metric labels use empty strings which may affect metric cardinality and querying
- The large number of changed files suggests this PR may be mixing multiple concerns beyond just metrics
- Some debug logging statements look like temporary development code and should be cleaned up
64 file(s) reviewed, 19 comment(s)
Edit PR Review Bot Settings | Greptile
[dev-dependencies] | ||
anyhow = "1.0.82" | ||
rand = "0.8" No newline at end of file | ||
rand = "0.8" | ||
statrs = "0.18" | ||
dirs = "5.0.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Duplicate dependency anyhow
defined in both dependencies and dev-dependencies sections. Consider removing from dev-dependencies since it's already available through main dependencies.
[dev-dependencies] | |
anyhow = "1.0.82" | |
rand = "0.8" | |
\ No newline at end of file | |
rand = "0.8" | |
statrs = "0.18" | |
dirs = "5.0.1" | |
[dev-dependencies] | |
rand = "0.8" | |
statrs = "0.18" | |
dirs = "5.0.1" |
.with_label_values(&["", T::NAME]) | ||
.observe(dt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Empty string label seems unnecessary and inconsistent. Consider removing the empty string or documenting why it's needed.
let dt = start_instant.elapsed().as_secs_f64(); | ||
metrics::SIGNAL_SEND_DURATION | ||
.with_label_values(&["", T::NAME]) | ||
.observe(dt); | ||
metrics::SIGNAL_PUBLISHED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Duration metric should be recorded before incrementing the published counter in case of failures
.with_label_values(&["", M::NAME]) | ||
.observe(dt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Empty string label prefix seems incorrect. Should document why this empty string is needed or remove it if unnecessary.
let dt = start_instant.elapsed().as_secs_f64(); | ||
metrics::MESSAGE_SEND_DURATION | ||
.with_label_values(&["", M::NAME]) | ||
.observe(dt); | ||
metrics::MESSAGE_PUBLISHED | ||
.with_label_values(&[M::NAME]) | ||
.with_label_values(&["", M::NAME]) | ||
.inc(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Metrics are recorded even if message send fails. Consider moving metrics recording into the success path only.
pub struct DbDataKey { | ||
db_name_segment: Arc<Vec<u8>> | ||
db_name_segment: Arc<Vec<u8>>, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: The struct field visibility is private but the struct itself is public. Consider documenting why this design choice was made or making the field public if it needs to be accessed externally.
root.server.as_mut().unwrap().foundationdb = Some(Default::default()); | ||
let config = rivet_config::Config::from_root(root); | ||
let mut root = rivet_config::config::Root::default(); | ||
root.server.as_mut().unwrap().foundationdb = Some(Default::default()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: unwrap() on Option without error handling could panic if server is None
root.server.as_mut().unwrap().foundationdb = Some(Default::default()); | |
root.server.get_or_insert_with(Default::default).foundationdb = Some(Default::default()); |
let date = if print_ts > 1 { | ||
datetime.format("%Y-%m-%d %H:%M:%S%.3f") | ||
} else { | ||
datetime.format("%Y-%m-%d %H:%M:%S") | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Consider extracting timestamp formatting logic into a helper function since it's repeated in multiple places
|
||
pub static ref INSERT_COMMANDS_ACQUIRE_DURATION: HistogramVec = register_histogram_vec_with_registry!( | ||
"pegboard_client_insert_commands_acquire_duration", | ||
"TODO REMOVE", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: The description 'TODO REMOVE' suggests this metric is temporary but doesn't explain why. Either remove the metric now or document its actual purpose.
"pegboard_client_insert_commands_acquire_duration", | ||
"TODO REMOVE", | ||
&["workflow_id"], | ||
BUCKETS.to_vec(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: BUCKETS.to_vec() creates a new vector allocation for each histogram. Consider using BUCKETS directly if possible to avoid unnecessary allocations.
ef7f175
to
51a9774
Compare
51a9774
to
dba8d32
Compare
4eff98d
to
ed680a5
Compare
ed680a5
to
2051a35
Compare
dba8d32
to
dbc2e30
Compare
2051a35
to
ed680a5
Compare
Deploying rivet-hub with
|
Latest commit: |
dba8d32
|
Status: | ✅ Deploy successful! |
Preview URL: | https://63951b92.rivet-hub-7jb.pages.dev |
Branch Preview URL: | https://02-25-fix-add-metrics-for-wf.rivet-hub-7jb.pages.dev |
dbc2e30
to
dba8d32
Compare
ed680a5
to
2051a35
Compare
dba8d32
to
dbc2e30
Compare
dbc2e30
to
dba8d32
Compare
2051a35
to
ed680a5
Compare
ed680a5
to
2051a35
Compare
dba8d32
to
dbc2e30
Compare
2051a35
to
ed680a5
Compare
dbc2e30
to
dba8d32
Compare
ed680a5
to
2051a35
Compare
dba8d32
to
dbc2e30
Compare
Changes