Use tower trace layer #698
base: next
Conversation
crates/block-producer/src/server.rs (Outdated)
    )
    .on_failure(
        |_error: GrpcFailureClass, _latency: Duration, _span: &tracing::Span| todo!(),
    );
@Mirko-von-Leipzig I agree that this is probably the correct abstraction to be using here (as you mentioned in this comment).
Any suggestions on how to test this locally? TY
It's definitely a bit awkward to test. There is the text exporter which at least lets you debug/inspect the spans manually.
If we do want to unit/integration test these things, we'll probably have to write our own exporter - e.g. something to aggregate the data which we can then assert against. I'm unsure why this isn't already available tbh.
What I've been doing in the meantime is using https://www.honeycomb.io/ with a free account to test things out. There's some setup info in the wip guide here.
I essentially start a local node, configure otel to use the honeycomb endpoint and then generate traffic by running the miden-client integration tests against my local node.
This might be a pita though - let me know if you run into issues, or if you can think of something smarter :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually here we go: https://docs.rs/opentelemetry_sdk/latest/opentelemetry_sdk/testing/trace/index.html
They do actually have test infra :)
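For reference, a minimal sketch of what using that test infra could look like (assuming the opentelemetry_sdk testing feature and the InMemorySpanExporter type from the linked module; illustrative only, not code from this PR):

use opentelemetry::trace::{Tracer, TracerProvider as _};
use opentelemetry_sdk::{testing::trace::InMemorySpanExporter, trace::TracerProvider};

#[test]
fn spans_are_captured_in_memory() {
    // Collect finished spans in memory instead of exporting them anywhere.
    let exporter = InMemorySpanExporter::default();
    let provider = TracerProvider::builder()
        .with_simple_exporter(exporter.clone())
        .build();

    let tracer = provider.tracer("test");
    tracer.in_span("example-span", |_cx| {
        // code under test would run here
    });

    let spans = exporter.get_finished_spans().unwrap();
    assert_eq!(spans.len(), 1);
    assert_eq!(spans[0].name, "example-span");
}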
Thanks, just getting back to this now. Will have a crack at integrating those trace test capabilities in the unit tests.
> unit tests

Or rather integration tests, which would involve allowing the node to be configured with different SpanExporters so that the test one can be used.
Yeah, I imagine this may take some more consideration, given that the exporters are (always?) global.
Might be good to simply explore the options and then discuss what is actually testable without much fuss.
Integration tests are a sore point at the moment - and likely will get worse before getting better, e.g. this thread. It's not quite trivial to just spin up a node at the moment, but depending on what you discover here maybe that's something we should aim at.
@Mirko-von-Leipzig This is how I think the testing would have to work:
- Update a few functions to take in an impl SpanExporter.
- Update node and faucet main.rs to construct either a test SpanExporter or an actual one based on flags or toml.
- Implement tests that call the main functionality and assert on the SpanData coming out of the test SpanExporter. These would probably be the existing integration tests but I haven't had a look yet.
But I don't think it is worthwhile. I have seen Rust projects do similar things, where they run unit or integration tests and assert against log lines produced by the stack. I have generally avoided this kind of thing in the past, but I can imagine situations where it's worth the maintenance cost and brittleness (tests being coupled to log lines).
I think it would become practical to just test trace output changes on the dev deployment/cluster once the trace scaffolding/impl stabilizes for the node, rather than testing every change locally or in CI.
I have connected to honeycomb successfully and eye-balled the info! output coming from my latest changes, e.g.:

2025-02-21T06:22:59.610587Z INFO block-producer.rpc/SubmitProvenTransaction: miden_node_block_producer::server: crates/block-producer/src/server.rs:218: request: POST /block_producer.Api/SubmitProvenTransaction {"te": "trailers", "content-type": "application/grpc", "traceparent": "00-20616281d08ea7970f0701e71dd2ef80-453f6f8340af0947-01", "tracestate": "", "user-agent": "tonic/0.12.3"}
Still getting my head around exactly what we want to achieve and whether the approach I've added so far fulfils that. The example above only has grpc-related headers, so I think I'll need to swap

let trace_layer = TraceLayer::new_for_grpc()

to

let trace_layer = TraceLayer::new_for_http()
> But I don't think it is worthwhile. I have seen Rust projects do similar things where they run unit or integration tests and assert against log lines produced by the stack. I have generally avoided this kind of thing in the past, but I can imagine situations where it's worth the maintenance cost and brittleness (tests being coupled to log lines).

I agree; it isn't worth it, especially not at our current project maturity level. I think I was hoping we could do something like:
#[test]
async fn block_builder_trace() {
    let store = mock_store();
    let exporter = TestSpanExporter::new()...;
    exporter.register_for_this_test_only();

    BlockBuilder::build_block(...).await;

    assert_eq!(exporter.spans, expected_spans);
}
But I suspect it's not trivially possible. Though maybe there is a way to have the global exporter registered as TestSpanExporter and somehow associate the spans we get from this test only.
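One hedged idea for the "this test only" part (a sketch, not code from this PR, assuming tracing-opentelemetry and the opentelemetry_sdk testing feature): scope a subscriber with its own in-memory exporter to the body of the test via tracing::subscriber::with_default. This only helps where the node's spans are created through tracing rather than the global OTel tracer directly:

use opentelemetry::trace::TracerProvider as _;
use opentelemetry_sdk::{testing::trace::InMemorySpanExporter, trace::TracerProvider};
use tracing_subscriber::layer::SubscriberExt;

/// Runs `f` with a test-local subscriber and returns the spans it produced.
fn capture_spans(f: impl FnOnce()) -> Vec<opentelemetry_sdk::export::trace::SpanData> {
    let exporter = InMemorySpanExporter::default();
    let provider = TracerProvider::builder()
        .with_simple_exporter(exporter.clone())
        .build();
    let subscriber = tracing_subscriber::registry()
        .with(tracing_opentelemetry::layer().with_tracer(provider.tracer("test")));

    // Only spans created inside `f` (on this thread) reach this exporter,
    // so parallel tests don't see each other's spans.
    tracing::subscriber::with_default(subscriber, f);

    exporter.get_finished_spans().unwrap_or_default()
}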
> The example above only has grpc related headers so I think I'll need to swap

Yeah. I think the important things to get in are the server/client IP addresses, which will (probably) only be available for http I imagine? Long term we'll also want to add interesting headers and/or CORS information maybe, but I'm unsure.
The http fn only differs from the grpc one w.r.t. the error type handled by the error callback:

/// Create a new [`TraceLayer`] using [`ServerErrorsAsFailures`] which supports classifying
/// regular HTTP responses based on the status code.
pub fn new_for_http() -> Self {
    Self {
        make_classifier: SharedClassifier::new(ServerErrorsAsFailures::default()),
        // vs: make_classifier: SharedClassifier::new(GrpcErrorsAsFailures::default()),
Think the best we can do is uri.host. We won't be able to see the source/client IP unless it's put into a header by a proxy or some other part of this stack.
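Purely as an illustration of that proxy case (hypothetical, not something the node does today): if a reverse proxy injected the client address via the conventional X-Forwarded-For header, it could be pulled out of the http::Request like this:

/// Returns the left-most entry of X-Forwarded-For, i.e. the original client,
/// if a proxy in front of the server populated the header.
fn client_ip_from_headers<B>(request: &http::Request<B>) -> Option<&str> {
    request
        .headers()
        .get("x-forwarded-for")
        .and_then(|value| value.to_str().ok())
        // the header may contain a comma-separated proxy chain
        .and_then(|value| value.split(',').next())
        .map(str::trim)
}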
I suppose there is still value in the tower_http::trace::TraceLayer because we can register callbacks here:
    .on_request(|request: &http::Request<_>, _span: &tracing::Span| {
        tracing::info!(
            "request: {} {} {} {:?}",
            request.method(),
            request.uri().host().unwrap_or("NOHOST"),
            request.uri().path(),
            request.headers()
        );
    })
    .on_response(
        |response: &http::Response<_>, latency: Duration, _span: &tracing::Span| {
            tracing::info!("response: {} {:?}", response.status(), latency);
        },
    )
    .on_failure(|error: GrpcFailureClass, _latency: Duration, _span: &tracing::Span| {
        tracing::error!("error: {}", error);
    });
Here is the diff in error type for the on_failure callback:

/// The failure class for [`ServerErrorsAsFailures`].
#[derive(Debug)]
pub enum ServerErrorsFailureClass {
    /// A response was classified as a failure with the corresponding status.
    StatusCode(StatusCode),
    /// A response was classified as an error with the corresponding error description.
    Error(String),
}
...
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct StatusCode(NonZeroU16);
...

vs

/// The failure class for [`GrpcErrorsAsFailures`].
#[derive(Debug)]
pub enum GrpcFailureClass {
    /// A gRPC response was classified as a failure with the corresponding status.
    Code(std::num::NonZeroI32),
    /// A gRPC response was classified as an error with the corresponding error description.
    Error(String),
}
So might as well go with the http one I guess.
I'll look at the unit test approach you mentioned above and see if it's possible.
            .count(),
        1,
    );
}
@Mirko-von-Leipzig I added this unit test to illustrate the otel trace test capability. It doesn't relate to the functionality added in this PR. This was just the easiest unit test I could find to impl. The block producer rpc stack doesn't have mocks etc atm.
LMK if you want to keep/rm this test or move it to another PR with other tests etc.
Let's leave it for a separate PR.
Is it possible to separate the spans per test - or do we need to run these kinds of tests sequentially to ensure we get the spans recorded that we expect? As in, running tests in parallel probably muddles the span exporter unless we somehow mark the spans with a test ID?
But this looks promising; we can figure out the specifics in the other PR/issue.
    })
    .on_failure(|error: ServerErrorsFailureClass, latency: Duration, _span: &Span| {
        error!("error: {} {:?}", error, latency);
    });
@Mirko-von-Leipzig I have checked this locally and on honeycomb. LMK any changes you would make here.
Right now you're emitting events, but I'd like to attach whatever we can as attributes on the span.
For context, the open-telemetry specification has a bunch of (experimental) suggestions:
- https://opentelemetry.io/docs/specs/semconv/rpc/rpc-spans/#server-attributes
- https://opentelemetry.io/docs/specs/semconv/rpc/grpc/
We have access to only some of them here I think.
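To make this concrete, a hedged sketch of what setting some of those attributes on the span (rather than emitting events) could look like as a make_span_with callback. The function name and exact field set are illustrative, and only a subset of the semconv attributes is derivable from the http::Request:

use tracing::field::Empty;

/// Illustrative make_span_with callback: record rpc semconv attributes on the
/// request span instead of logging them as separate events.
fn make_grpc_span<B>(request: &http::Request<B>) -> tracing::Span {
    // A gRPC path looks like "/<package>.<Service>/<Method>".
    let path = request.uri().path();
    let mut parts = path.trim_start_matches('/').splitn(2, '/');
    let service = parts.next().unwrap_or_default();
    let method = parts.next().unwrap_or_default();

    tracing::info_span!(
        "rpc.request",
        otel.kind = "server",
        rpc.system = "grpc",
        rpc.service = service,
        rpc.method = method,
        // to be filled in later, e.g. from on_response / on_failure
        rpc.grpc.status_code = Empty
    )
}

Something of this shape could then be handed to .make_span_with(...) as in the snippets above, assuming the body type works out.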
I just can't find a way to surface lower-level details like client addr to the relevant stacks here.
The tonic::Request struct has ::remote_addr() -> Option<SocketAddr>, but we can't get that type wired through into our trace stack here AFAICT, only http::Request. And I can't find a way to get headers to contain the remote IP address via http::Request in any way.
Relevant links for reference:
- hyperium/tonic#430
- https://github.com/tower-rs/tower-http/blob/main/examples/tonic-key-value-store/src/main.rs#L191
I also don't see where those server attributes you linked above can be accessed.
Should be able to get the socket addr via this when building the tonic server:
https://docs.rs/tonic/0.2.1/tonic/transport/server/trait.Connected.html
But that again relies on tonic::Request:
https://github.com/hyperium/tonic/blob/master/examples/src/uds/server.rs#L32
Is it possible for us to alter our code gen to use tonic::Request rather than http::Request? E.g. this part:

impl<T, B> tonic::codegen::Service<http::Request<B>> for ApiServer<T>
where
    T: Api,
    B: Body + std::marker::Send + 'static,
    B::Error: Into<StdError> + std::marker::Send + 'static,
{
Wow this was crazy hard to find: tower-rs/tower-http#428.
Only by chance did I go check in the discussions - no other google/search-kungfu found it.
I believe this is what we need? It may also mean that we have to just do our own layer to extract some of the info before handling the routes/request/responses.
Ah, well done! This looks to have worked, e.g.:

    .on_request(|request: &http::Request<_>, _span: &Span| {
        info!(
            "request: {} {} {} {} {:?}",
            request
                .extensions()
                .get::<tonic::transport::server::TcpConnectInfo>() // as per axum example above
                .unwrap()
                .remote_addr()
                .unwrap(),
I'll put together all the changes from your comments and put this in review next.
let trace_layer = TraceLayer::new_for_grpc()
    .make_span_with(miden_node_utils::tracing::grpc::block_producer_trace_fn)
We can consider moving this entire layer into the utils crate since it's likely going to be identical.
Yeah, the reason I haven't done that is because I don't think we can avoid returning a type that involves like 7 generics or so (because the fn that uses it doesn't take in a trait, iirc). Will have a look at doing it as cleanly as possible.
Potentially the way to do it:
https://stackoverflow.com/questions/71178212/how-to-configure-tower-http-tracelayer-in-a-separate-function
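If we go that route, the gist of the StackOverflow approach is to name the callback types (e.g. as plain fn pointers) so the fully-configured layer can be written as a return type. A rough sketch, assuming the request body type seen by the layer can be named (here a hypothetical alias to tonic::body::BoxBody) and that only make_span_with is customised:

use tower_http::{
    classify::{GrpcErrorsAsFailures, SharedClassifier},
    trace::TraceLayer,
};
use tracing::Span;

// Hypothetical body alias; use whatever body type the node's server stack actually sees.
type BoxBody = tonic::body::BoxBody;

type MakeSpanFn = fn(&http::Request<BoxBody>) -> Span;

/// Builds the shared gRPC trace layer; the explicit fn-pointer type is what
/// lets us spell out the return type instead of an unnameable closure type.
pub fn grpc_trace_layer(
    make_span: MakeSpanFn,
) -> TraceLayer<SharedClassifier<GrpcErrorsAsFailures>, MakeSpanFn> {
    TraceLayer::new_for_grpc().make_span_with(make_span)
}

A caller would then pass a compatible fn (e.g. the block_producer_trace_fn above), assuming its signature coerces to the fn-pointer type.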
Oh damn, I see. Little corner cases everywhere.
Thanks, yeah I think let's go with the stackoverflow answer - and just document on the function why we're doing it this way.
Relates to #681.
WIP