Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -1,4 +1,7 @@
use std::sync::{Arc, Mutex};
use std::{
sync::{Arc, Mutex},
time::{Duration, Instant},
};

use futures_util::{stream::FuturesUnordered, StreamExt};
use log::{error, info, trace, warn};
Expand Down Expand Up @@ -242,23 +245,27 @@ where
execution_permit: OwnedSemaphorePermit,
result_sender: broadcast::Sender<BroadcastedIntentExecutionResult>,
) {
let instant = Instant::now();
let result = executor
.execute(intent.inner.clone(), persister)
.await
.inspect_err(|err| {
error!(
"Failed to execute BaseIntent. id: {}. {:?}",
"Failed to execute BaseIntent. id: {}. {}",
intent.id, err
)
})
});

// Metrics
Self::execution_metrics(instant.elapsed(), &intent, &result);

let result = result
.map(|output| ExecutionOutputWrapper {
id: intent.id,
trigger_type: intent.trigger_type,
output,
})
.map_err(|err| {
// Increase failed intents metric as well
metrics::inc_committor_failed_intents_count();
(intent.inner.id, intent.trigger_type, Arc::new(err))
});
Comment on lines +248 to 270
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

LGTM: Timing instrumentation properly integrated.

The execution timing is correctly captured with Instant::now() before the async operation and metrics collection occurs after execution completes. The metrics are collected before result mapping, which correctly preserves the original error types for labeling.

Minor observation: Line 254 changed error logging from {:?} (Debug) to {} (Display). This simplifies the log output but may reduce detail if the error's Display implementation is less verbose than Debug.

🤖 Prompt for AI Agents
In
magicblock-committor-service/src/intent_execution_manager/intent_execution_engine.rs
around lines 248 to 270, the error logging was changed from using the Debug
formatter ({:?}) to the Display formatter ({}), which can lose valuable
diagnostic detail; update the error logging to use Debug formatting (e.g.,
"{:?}") or explicitly include both Display and Debug to ensure full error
details are logged for troubleshooting.


Expand All @@ -280,6 +287,47 @@ where
// Free worker
drop(execution_permit);
}

/// Records metrics related to intent execution
fn execution_metrics(
execution_time: Duration,
intent: &ScheduledBaseIntentWrapper,
result: &IntentExecutorResult<ExecutionOutput>,
) {
const EXECUTION_TIME_THRESHOLD: f64 = 2.0;

let intent_execution_secs = execution_time.as_secs_f64();
metrics::observe_committor_intent_execution_time_histogram(
intent_execution_secs,
intent,
result,
);
if let Err(ref err) = result {
metrics::inc_committor_failed_intents_count(intent, err);
}

// Loki alerts
if intent_execution_secs >= EXECUTION_TIME_THRESHOLD {
info!(
"Intent took too long to execute: {}s. {}",
intent_execution_secs,
result
.as_ref()
.map(|_| "succeeded".to_string())
.unwrap_or_else(|err| format!("{err:?}"))
);
} else {
trace!("Seconds took to execute intent: {}", intent_execution_secs);
}

// Alert
if intent.is_undelegate() && result.is_err() {
warn!(
"Intent execution resulted in stuck accounts: {:?}",
intent.get_committed_pubkeys()
);
}
}
}

/// Worker tests
Expand Down
12 changes: 12 additions & 0 deletions magicblock-committor-service/src/intent_executor/error.rs
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
use log::error;
use magicblock_metrics::metrics;
use magicblock_rpc_client::{
utils::TransactionErrorMapper, MagicBlockRpcClientError,
};
Expand Down Expand Up @@ -100,6 +101,17 @@ impl IntentExecutorError {
}
}

impl metrics::LabelValue for IntentExecutorError {
fn value(&self) -> &str {
match self {
IntentExecutorError::ActionsError(_, _) => "actions_failed",
IntentExecutorError::CpiLimitError(_, _) => "cpi_limit_failed",
IntentExecutorError::CommitIDError(_, _) => "commit_nonce_failed",
_ => "failed",
}
}
}

/// Those are the errors that may occur during Commit/Finalize stages on Base layer
#[derive(thiserror::Error, Debug)]
pub enum TransactionStrategyExecutionError {
Expand Down
78 changes: 74 additions & 4 deletions magicblock-committor-service/src/intent_executor/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ use std::{ops::ControlFlow, sync::Arc, time::Duration};
use async_trait::async_trait;
use futures_util::future::try_join_all;
use log::{error, trace, warn};
use magicblock_metrics::metrics;
use magicblock_program::{
magic_scheduled_base_intent::ScheduledBaseIntent,
validator::validator_authority,
Expand All @@ -18,8 +19,8 @@ use magicblock_rpc_client::{
decide_rpc_error_flow, map_magicblock_client_error,
send_transaction_with_retries, SendErrorMapper, TransactionErrorMapper,
},
MagicBlockSendTransactionConfig, MagicBlockSendTransactionOutcome,
MagicblockRpcClient,
MagicBlockRpcClientError, MagicBlockSendTransactionConfig,
MagicBlockSendTransactionOutcome, MagicblockRpcClient,
};
use solana_pubkey::Pubkey;
use solana_sdk::{
Expand Down Expand Up @@ -68,6 +69,18 @@ pub enum ExecutionOutput {
},
}

impl metrics::LabelValue for ExecutionOutput {
fn value(&self) -> &str {
match self {
Self::SingleStage(_) => "single_stage_succeeded",
Self::TwoStage {
commit_signature: _,
finalize_signature: _,
} => "two_stage_succeeded",
}
}
}

#[async_trait]
pub trait IntentExecutor: Send + Sync + 'static {
/// Executes Message on Base layer
Expand Down Expand Up @@ -663,6 +676,59 @@ where
)
.await
}

async fn intent_metrics(
rpc_client: MagicblockRpcClient,
execution_outcome: ExecutionOutput,
) {
use solana_transaction_status_client_types::EncodedTransactionWithStatusMeta;
fn extract_cu(tx: EncodedTransactionWithStatusMeta) -> Option<u64> {
let cu = tx.meta?.compute_units_consumed;
cu.into()
}

let cu_metrics = || async {
match execution_outcome {
ExecutionOutput::SingleStage(signature) => {
let tx =
rpc_client.get_transaction(&signature, None).await?;
Ok::<_, MagicBlockRpcClientError>(extract_cu(
tx.transaction,
))
}
ExecutionOutput::TwoStage {
commit_signature,
finalize_signature,
} => {
let commit_tx = rpc_client
.get_transaction(&commit_signature, None)
.await?;
let finalize_tx = rpc_client
.get_transaction(&finalize_signature, None)
.await?;
let commit_cu = extract_cu(commit_tx.transaction);
let finalize_cu = extract_cu(finalize_tx.transaction);
let (Some(commit_cu), Some(finalize_cu)) =
(commit_cu, finalize_cu)
else {
return Ok(None);
};
Ok(Some(commit_cu + finalize_cu))
}
}
};

match cu_metrics().await {
Ok(Some(cu)) => metrics::set_commmittor_intent_cu_usage(
i64::try_from(cu).unwrap_or(i64::MAX),
),
Err(err) => warn!(
"Failed to fetch CUs for intent: {:?}. {:?}",
err, execution_outcome
),
_ => {}
}
}
}

#[async_trait]
Expand Down Expand Up @@ -692,8 +758,12 @@ where
// Write result of intent into Persister
Self::persist_result(&persister, &result, message_id, &pubkeys);
}

result
result.inspect(|output| {
tokio::spawn(Self::intent_metrics(
self.rpc_client.clone(),
*output,
));
})
}
}

Expand Down
15 changes: 14 additions & 1 deletion magicblock-committor-service/src/types.rs
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@
use std::ops::Deref;

use magicblock_program::magic_scheduled_base_intent::ScheduledBaseIntent;
use magicblock_metrics::metrics;
use magicblock_program::magic_scheduled_base_intent::{
MagicBaseIntent, ScheduledBaseIntent,
};

// TODO: should be removed once cranks are supported
// Ideally even now OffChain/"Manual" commits should be triggered via Tx
Expand All @@ -16,6 +19,16 @@ pub struct ScheduledBaseIntentWrapper {
pub trigger_type: TriggerType,
}

impl metrics::LabelValue for ScheduledBaseIntentWrapper {
fn value(&self) -> &str {
match &self.inner.base_intent {
MagicBaseIntent::BaseActions(_) => "actions",
MagicBaseIntent::Commit(_) => "commit",
MagicBaseIntent::CommitAndUndelegate(_) => "commit_and_undelegate",
}
}
}

impl Deref for ScheduledBaseIntentWrapper {
type Target = ScheduledBaseIntent;

Expand Down
50 changes: 45 additions & 5 deletions magicblock-metrics/src/metrics/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,8 @@ use prometheus::{
Histogram, HistogramOpts, HistogramVec, IntCounter, IntCounterVec,
IntGauge, IntGaugeVec, Opts, Registry,
};
pub use types::{AccountClone, AccountCommit, Outcome};
pub use types::{AccountClone, AccountCommit, LabelValue, Outcome};

mod types;

// -----------------
Expand Down Expand Up @@ -181,13 +182,32 @@ lazy_static::lazy_static! {
"committor_intent_backlog_count", "Number of intents in backlog",
).unwrap();

static ref COMMITTOR_FAILED_INTENTS_COUNT: IntCounter = IntCounter::new(
"committor_failed_intents_count", "Number of failed to be executed intents",
static ref COMMITTOR_FAILED_INTENTS_COUNT: IntCounterVec = IntCounterVec::new(
Opts::new("committor_failed_intents_count", "Number of failed to be executed intents"),
&["intent_kind", "error_kind"]
).unwrap();

static ref COMMITTOR_EXECUTORS_BUSY_COUNT: IntGauge = IntGauge::new(
"committor_executors_busy_count", "Number of busy intent executors"
).unwrap();

static ref COMMITTOR_INTENT_EXECUTION_TIME_HISTOGRAM: HistogramVec = HistogramVec::new(
HistogramOpts::new(
"committor_intent_execution_time_histogram",
"Time in seconds spent on intent execution"
)
.buckets(
MILLIS_1_9.iter()
.chain(MILLIS_10_90.iter())
.chain(MILLIS_100_900.iter())
.chain(SECONDS_1_9.iter()).cloned().collect(),
),
&["intent_kind", "outcome_kind"],
).unwrap();
Comment on lines +194 to +206
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider adding microsecond buckets for sub-millisecond executions.

The histogram starts at 1ms buckets, which may not capture fast-executing intents that complete in microseconds. If intent execution can be faster than 1ms, consider adding MICROS_10_90 and MICROS_100_900 buckets to the chain.

If you're unsure whether intents can execute in sub-millisecond time, you could:

  1. Check existing logs or traces for execution times
  2. Monitor the new histogram initially to see if values cluster below 1ms
  3. Add the microsecond buckets preemptively if the overhead is acceptable
🤖 Prompt for AI Agents
In magicblock-metrics/src/metrics/mod.rs around lines 194 to 206, the histogram
buckets start at 1ms and may miss sub-millisecond executions; update the bucket
chain to include MICROS_10_90 and MICROS_100_900 before the millisecond buckets
(and add the necessary imports/constants if not present) so the chain becomes
micros then millis, ensure .cloned().collect() still applies, and run
tests/build to verify no import or type errors.


static ref COMMITTOR_INTENT_CU_USAGE: IntGauge = IntGauge::new(
"committor_intent_cu_usage", "Compute units used for Intent"
).unwrap();
}

pub(crate) fn register() {
Expand Down Expand Up @@ -223,6 +243,7 @@ pub(crate) fn register() {
register!(COMMITTOR_INTENTS_BACKLOG_COUNT);
register!(COMMITTOR_FAILED_INTENTS_COUNT);
register!(COMMITTOR_EXECUTORS_BUSY_COUNT);
register!(COMMITTOR_INTENT_EXECUTION_TIME_HISTOGRAM);
register!(ENSURE_ACCOUNTS_TIME);
Comment on lines 244 to 247
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Register the CU usage gauge.

COMMITTOR_INTENT_CU_USAGE is never registered, so Prometheus will never expose the values you’re setting. Please register it alongside the other committor collectors.

         register!(COMMITTOR_FAILED_INTENTS_COUNT);
         register!(COMMITTOR_EXECUTORS_BUSY_COUNT);
+        register!(COMMITTOR_INTENT_CU_USAGE);
         register!(COMMITTOR_INTENT_EXECUTION_TIME_HISTOGRAM);
🤖 Prompt for AI Agents
In magicblock-metrics/src/metrics/mod.rs around lines 244 to 247, the
COMMITTOR_INTENT_CU_USAGE gauge is not being registered with Prometheus; add a
register!(COMMITTOR_INTENT_CU_USAGE); call alongside the other committor
collectors (e.g., after COMMITTOR_INTENT_EXECUTION_TIME_HISTOGRAM) so the gauge
is exposed and its values are scraped.

register!(RPC_REQUEST_HANDLING_TIME);
register!(TRANSACTION_PROCESSING_TIME);
Expand Down Expand Up @@ -318,10 +339,29 @@ pub fn set_committor_intents_backlog_count(value: i64) {
COMMITTOR_INTENTS_BACKLOG_COUNT.set(value)
}

pub fn inc_committor_failed_intents_count() {
COMMITTOR_FAILED_INTENTS_COUNT.inc()
pub fn inc_committor_failed_intents_count(
intent_kind: &impl LabelValue,
error_kind: &impl LabelValue,
) {
COMMITTOR_FAILED_INTENTS_COUNT
.with_label_values(&[intent_kind.value(), error_kind.value()])
.inc()
}

pub fn set_committor_executors_busy_count(value: i64) {
COMMITTOR_EXECUTORS_BUSY_COUNT.set(value)
}

pub fn observe_committor_intent_execution_time_histogram(
seconds: f64,
kind: &impl LabelValue,
outcome: &impl LabelValue,
) {
COMMITTOR_INTENT_EXECUTION_TIME_HISTOGRAM
.with_label_values(&[kind.value(), outcome.value()])
.observe(seconds);
}

pub fn set_commmittor_intent_cu_usage(value: i64) {
COMMITTOR_INTENT_CU_USAGE.set(value)
Comment on lines +365 to +366
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Fix the typo in the public setter.

The new API is exposed as set_commmittor_intent_cu_usage (three m’s); everything else—metric name, other helpers, call sites—uses committor. Please rename it now (and update the call site in intent_executor/mod.rs) before we publish the misspelt API.

-pub fn set_commmittor_intent_cu_usage(value: i64) {
+pub fn set_committor_intent_cu_usage(value: i64) {
     COMMITTOR_INTENT_CU_USAGE.set(value)
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
pub fn set_commmittor_intent_cu_usage(value: i64) {
COMMITTOR_INTENT_CU_USAGE.set(value)
pub fn set_committor_intent_cu_usage(value: i64) {
COMMITTOR_INTENT_CU_USAGE.set(value)
}
🤖 Prompt for AI Agents
In magicblock-metrics/src/metrics/mod.rs around lines 365-366 the public setter
is misspelt as set_committor_intent_cu_usage (it contains an extra 'm'); rename
the function to the correct set_committor_intent_cu_usage, update the function
declaration and all references (notably the call site in intent_executor/mod.rs)
to the corrected name, then run cargo check/tests to ensure no remaining
references to the old name.

}
17 changes: 17 additions & 0 deletions magicblock-metrics/src/metrics/types.rs
Original file line number Diff line number Diff line change
Expand Up @@ -68,3 +68,20 @@ pub enum AccountCommit<'a> {
CommitOnly { pubkey: &'a str, outcome: Outcome },
CommitAndUndelegate { pubkey: &'a str, outcome: Outcome },
}

pub trait LabelValue {
fn value(&self) -> &str;
}

impl<T, E> LabelValue for Result<T, E>
where
T: LabelValue,
E: LabelValue,
{
fn value(&self) -> &str {
match self {
Ok(ok) => ok.value(),
Err(err) => err.value(),
}
}
}