logging, tracing and OpenTelemetry design
#1949
Replies: 2 comments
Runtime telemetry control: notes & remarks

Separate log level from span level

The current pain may be that one filter is trying to serve two jobs. Local logs should be quiet enough for a person to read. Exported spans should keep enough detail to debug work that crosses RPC, store, mempool, prover, validator, and other paths. The node should split those controls:

```rust
let log_level = config.log_string.unwrap_or_else(|| "info".into());
let env_filter =
    EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new(log_level));
let (log_filter, reload_handle) = reload::Layer::new(env_filter);

let span_level = config.span_level.unwrap_or(Level::INFO);
let span_filter = filter::filter_fn(move |metadata| {
    metadata.is_span() && *metadata.level() <= span_level
});
```

That points to separate knobs:
- Expose a `FilterHandle` backed by `tracing_subscriber::reload`, with `update()` and `get()`.

Runtime filter reload should be small. Keep a handle, validate the new filter, install it, and return the active value.

```rust
#[derive(Clone, Debug)]
pub struct FilterHandle(reload::Handle<EnvFilter, Registry>);

impl FilterHandle {
    pub fn update<S: AsRef<str>>(&self, directives: S) -> Result<(), BoxError> {
        let filter = EnvFilter::try_new(directives)?;
        self.0.reload(filter)?;
        Ok(())
    }

    pub fn get(&self) -> Result<String, BoxError> {
        self.0
            .with_current(|filter| filter.to_string())
            .map_err(Into::into)
    }
}
```

The admin API can stay blunt: the endpoint should reject invalid filters before changing state.

The operator workflow should be:

Do we have metrics?

Span-derived latency can be useful, but only with low-cardinality labels. A subscriber layer can also record the span duration in a metrics DB when the span closes. We may want to consider a Prometheus instance for proper metrics in general; they're not a substitute for spans, but a useful complement.

Follow-up questions

Tracing design in the abstract is hard to slice through for me. I think the main question is what an operator should learn or change without restarting the node. If we want to approach this bottom-up (which I prefer), the cases to test are concrete:
The gap may not be "better levels", but rather missing debug modes. Possible presets:

These presets would show whether the target tree, span names, and fields are good enough.

Two questions should stay separate: if a span is not created, no collector can recover it. If it is created but not exported, volume can still be controlled later.

Questions that I'm asking myself — but it may just be that I'm uneducated about our setup:
A fast way to expose the real gap is to solve three incidents on paper:

For each one, I'd think about the filter change, expected spans, and fields needed. That might show the source of the issue.
In the past, I accomplished this by having all services emit everything (at least everything that I wanted to be dynamically configurable) based on the presence of a specific request header, and then sampling of traces was configured at the service mesh level (this was a k8s cluster, which I had set up with istio + embassy for the mesh). I was using Grafana's Tempo for traces (since that integrated nicely with the other Grafana products in a way that Jaeger itself did not, though Tempo was Jaeger-compatible).

Each service in the mesh had a sidecar that handled mesh communication, and would record trace spans for inbound/outbound requests. To determine whether or not to trace, a request that entered the mesh would have a request header (containing the trace id) conditionally set based on the sampling configuration. When that request header was present, everything related to that request would be traced, so long as the request header was propagated by a service when it made requests to other services in the mesh. The sidecar containers would record basic trace spans for the mesh-level parts, but it was up to each individual service to record additional spans for its internal behavior, if supported.

The nice thing about this is that things were very simple for individual services, and all the dynamic configuration was handled by the mesh (i.e. what requests would be sampled, based on what criteria). It also ensured that for a traced request, we had everything for that request, not just parts of it. The configuration I used was a combination of "trace N% of requests" and "trace request R if R meets some criteria (i.e. contains some request header, comes from a specific user, etc.)" - and these could be controlled on the fly. We'd frequently use this in conjunction with some client side functionality we had to enable tracing of everything a specific user was doing, either app-wide or in specific areas.

Storage is always the tricky element here.
If you don't sample enough requests, then when something goes wrong, the odds are that you won't have a trace for that request. On the other hand, if you sample too much, then the system itself can degrade due to tracing overhead, and you need a lot of storage for all that data. So this aspect really requires experimentation and regular maintenance to observe whether or not tweaks are needed.

Now, in my case, all of this was being surfaced in Grafana, which has rich features for querying traces, so the issue of navigating all the data was basically solved out of the box. We also weren't emitting traces to stdout/stderr as logs, so that just never affected us when running things locally (while we could run things in a local configuration with tracing enabled, our default development configuration did not do this; tracing was just disabled by default).

I don't know if that is useful or not. I mean, our infra is obviously not what I outlined above, but one doesn't need a k8s cluster + service mesh to accomplish more or less the same goals, so long as you have some kind of service proxy/smart load balancer (e.g. embassy) to handle the sampling configuration bit when requests enter the system. Ultimately though, the approach you take really needs to be tailored around how you plan to surface traces for observation (i.e. in my case I knew I was using Grafana, so I could make choices based on its strengths/weaknesses).

On the topic of log

The compiler has a lot of instrumentation output, so using
So basically we have two types of hierarchy:
And we want to be able to filter on both simultaneously, with additional refinements to declutter the output of a specific run. With

To impose this structure a bit more rigidly, we also define a
I'd like some input on managing our tracing (and therefore telemetry) levels and targets. It's a bit murky to me, and while I've got some ideas, I don't know if there is something I'm missing.
I'll give an outline of what I want and then some ideas. Maybe what I want is dumb, either way -- input please :)
Desires
Goal 1: Configurable production trace volumes and levels
I'd like to have an admin endpoint where we can configure the tracing level dynamically. This is possible, we just need to create it. A problem is that we currently have `target=COMPONENT`, which means we don't have much fine-grained control here.

This is coupled with distributed OpenTelemetry traces. We essentially have three "knobs" which we can turn:

1. `tracing` (aka `RUST_LOG`).

We currently do (1) only, and that's set at startup.
Goal 2: Useful local node logs
Our current info logs are what we consider necessary for production, and they're traces. This is waaaaay too much information for local node users who don't need 99% of this.
Ideally we have a better default setting here for normal users. This can be done somewhat trivially if we have better info/debug/trace level standards. Currently we set info for everything we care about in production. Better might be if info is reserved for main operations only, and then for production we set the levels to include debug where we want.
This isn't a big problem because we can simply set the docker compose up with a nicer default `RUST_LOG` (or whatever configuration we choose) for the local node.

What is an issue is if users want more logs in a specific area for their domain, e.g. trace network transaction logs.
Proposed solutions
Tracing targets and levels
Add an admin endpoint which allows setting the trace level string. Even nicer would be a UI populated with all known targets, somehow gathered at compile time... that's an advanced thing though, and not required now.
We also want more fine-grained hierarchical target names, maybe similar to the span names instead. Hierarchical so we can specify broadly, but also be more specific as needed.
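For illustration, hierarchical targets would let a directive string go from broad to narrow without listing every module; the target names below are made up:

```shell
# Quiet default, one subtree at debug, one leaf at trace. EnvFilter matches
# target prefixes, so "node::mempool" also covers its submodules unless a
# more specific directive overrides it.
export RUST_LOG="info,node::mempool=debug,node::mempool::batch=trace"
```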
Something to also be aware of is that one can target specific span names with `target[name]=<level>`, but that only works with exact matches, which is not ergonomic.

We should also come up with a better standard, e.g. the root level span is `info`, children are `debug`, and events / loops are `trace`. For example, adding a transaction would be `info` at the RPC level, and possibly at the `mempool` and `validator` ingress. `debug` would be the internal methods called, and `trace` events might be things that occur within the `mempool`, e.g. `account[N].state A -> B`, `transaction %ID selected for batch`, etc.

Telemetry volume control
In-process sampling is probably not desirable, at least insofar as performance allows. This is because of the distributed nature of the system: we want to collect a single "trace" across multiple services, and they cannot communicate which trace to sample collectively, since useful sampling is only possible after the trace is completed (tail sampling). We can of course use head sampling (predetermine which trace to sample before completion), but this isn't terribly useful if we want to sample all traces which contain errors. I also think head sampling is fairly similar to what `tracing` level control already gives us.

Out-of-process sampling can be done by setting up a separate instance which receives all traces and determines which to keep, before passing these onward. This is fairly simple to set up with basic YAML, e.g. keep all errors, 1% of RPC calls, etc.
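For instance, an OpenTelemetry Collector placed in front of the backend can do the tail sampling; a sketch of its `tail_sampling` processor config for the two policies just mentioned (the names and values are illustrative):

```yaml
processors:
  tail_sampling:
    # Wait for the whole distributed trace before deciding.
    decision_wait: 10s
    policies:
      # Keep every trace that contains an error anywhere.
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      # Otherwise keep a 1% sample of everything else (e.g. RPC calls).
      - name: sample-rest
        type: probabilistic
        probabilistic:
          sampling_percentage: 1
```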
Telemetry standards
This has been mentioned in issues and PRs already. But we need a consistent property naming structure, i.e. `transaction.id` and not `tx_id`, `transaction_id`, `tx.id` (all of which are currently present).

It's an uphill review battle making these consistent. Part of the problem is that `tracing` is overly flexible and we want "restrictions". I think wrapping `tracing`/`otel` in a separate crate and exposing our own restricted macros is the solution, combined with some nice traits. This is mostly orthogonal to the other things though, just mentioning for completeness.

As an example:
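A dependency-free sketch of the idea (module, macro, and field names are invented; a real wrapper would forward to the `tracing` macros with the canonical dotted field name baked in, so call sites cannot drift):

```rust
/// Canonical property names, defined exactly once (names illustrative).
pub mod fields {
    pub const TRANSACTION_ID: &str = "transaction.id";
    pub const ACCOUNT_ID: &str = "account.id";
}

/// Restricted event macro: callers supply a value, never a key, so the
/// `tx_id` / `transaction_id` variants cannot reappear in review. Here it
/// just renders a string; the real one would call `tracing::info!`.
#[macro_export]
macro_rules! tx_event {
    ($id:expr, $msg:expr) => {
        format!("{}={} {}", $crate::fields::TRANSACTION_ID, $id, $msg)
    };
}
```

The win is that renaming a property is a one-line change in `fields`, and review only needs to police that nobody bypasses the wrapper.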
Open-ish questions