
Kafka autoinstrumentation trace propagator doesn't work out of the box #5382

Closed
NickEm opened this issue Feb 15, 2022 · 12 comments

Labels
bug Something isn't working

Comments

@NickEm

NickEm commented Feb 15, 2022

Describe the bug
When an application with otel auto-instrumentation publishes a message to a Kafka topic, it generates a trace (b3multi or tracecontext or both, it doesn't matter). Let's call it Trace A.
Then another application with otel auto-instrumentation picks up the message (the Kafka headers for the trace are present), processes it, and publishes it to another Kafka topic (or the same one, it doesn't matter).

Statement: Each publish to Kafka by a Java application with auto-instrumentation doesn't propagate the existing trace, but generates a new one.

Steps to reproduce
I have created a project which reproduces the issue, with all the details provided. Please ask if you need anything else.
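
Roughly, the forwarding step in the second application looks like this (a minimal sketch; the topic names and connection properties are illustrative, not taken from the linked project). Run both apps with the javaagent attached:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class Forwarder {
  public static void main(String[] args) {
    Properties consumerProps = new Properties();
    consumerProps.put("bootstrap.servers", "localhost:9092");
    consumerProps.put("group.id", "forwarder");
    consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

    Properties producerProps = new Properties();
    producerProps.put("bootstrap.servers", "localhost:9092");
    producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
         KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
      consumer.subscribe(Collections.singletonList("topic-a"));
      while (true) {
        // Each consumed record carries Trace A in its headers (injected by the first app's agent).
        for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
          // Expectation: the agent propagates Trace A into the headers of this outgoing record.
          producer.send(new ProducerRecord<>("topic-b", record.key(), record.value()));
        }
      }
    }
  }
}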

What did you expect to see?
Trace A, generated initially, is preserved in the last message. Context propagation is in place.

What did you see instead?
Trace A isn't preserved; another trace is generated and written to the headers of the last Kafka message in the chain.

What version are you using?
1.10.1

Environment
Compiler: (e.g., "Azul Java 17.0.0-zulu")
OS: (e.g., "macOS Monterey")

@NickEm NickEm added the bug Something isn't working label Feb 15, 2022
@laurit
Contributor

laurit commented Feb 15, 2022

@NickEm see this discussion: #4509. Does it work when you add -Dotel.instrumentation.common.experimental.suppress-messaging-receive-spans=true?
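
(For reference, one way to pass that flag when running with the javaagent; the jar path and app name below are placeholders:)

java -javaagent:path/to/opentelemetry-javaagent.jar \
     -Dotel.instrumentation.common.experimental.suppress-messaging-receive-spans=true \
     -jar my-kafka-app.jar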

@NickEm
Author

NickEm commented Feb 15, 2022

Well, yeah, it looks like it's working. Pretty confused that it doesn't work by default, but anyway. Thank you!

@anuraaga
Contributor

I think it'd be good for us to reconsider this default. I have gotten at least 5 reports from users about broken messaging tracing related to this flag now. I will likely change the default in the AWS distro, but I wonder if that would make sense here too, given the feedback I'm seeing. @trask @mateuszrzeszutek

@sukrit007wawa

Just ran into this issue. The workaround works. However, consumer processing is shown as a span of the producer service instead of the consumer service. In our case, the producer and consumer are 2 separate apps, running in 2 different JVMs.

@trask
Member

trask commented Feb 16, 2022

I think it'd be good for us to reconsider this default

ya, I tend to agree, opened #5385 to discuss

@trask
Member

trask commented Feb 16, 2022

@sukrit007wawa can you open a separate issue and provide some more details about your issue?

@NickEm
Author

NickEm commented Feb 18, 2022

I would like to raise another concern/issue I have. The flag fix
-Dotel.instrumentation.common.experimental.suppress-messaging-receive-spans=true works when the consumer and producer in the application run in the same thread. As soon as the producer runs in another thread, the TraceId changes again when producing messages. Please take a look at the commit which breaks the context propagation in Kafka.

@NickEm
Author

NickEm commented Feb 21, 2022

@laurit sorry for tagging you directly, but maybe you know how to work around this behavior?

@laurit
Contributor

laurit commented Feb 21, 2022

I believe the issue is that we don't automatically propagate context into the new thread. You'll have to do it manually. Before creating the new thread, capture the current context with Context context = Context.current(), and inside the new thread use

try (Scope ignored = context.makeCurrent()) {
  producer.send(produceRecord, callback);
}
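
For example, the complete hand-off could look like this (a sketch only; producer, produceRecord, and callback are assumed to be the ones already used in your code):

import io.opentelemetry.context.Context;
import io.opentelemetry.context.Scope;

// On the consumer thread, where the extracted trace context is current:
Context context = Context.current();

// Hand the captured context to the producing thread.
Thread producerThread = new Thread(() -> {
  try (Scope ignored = context.makeCurrent()) {
    // The producer instrumentation now sees the captured context and injects it
    // into the outgoing record's headers.
    producer.send(produceRecord, callback);
  }
});
producerThread.start();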

@trask
Member

trask commented Feb 22, 2022

linking to #2527, we don't instrument direct Thread creation currently, because it's hard to know automatically whether we should or should not propagate to it
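
If the send is submitted to an executor rather than a raw Thread, another option (a sketch using the io.opentelemetry.context API; producer and produceRecord are assumed from the application code) is to wrap the executor so each submitted task runs with the context that was current at submission time:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import io.opentelemetry.context.Context;

// Wrap the executor once; tasks submitted to it run with the context
// that was current on the submitting (consumer) thread.
ExecutorService contextAware = Context.taskWrapping(Executors.newSingleThreadExecutor());

// Submitted from the consumer thread, so the consumer's trace context travels with the task.
contextAware.submit(() -> producer.send(produceRecord));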

@trask
Member

trask commented May 3, 2022

@NickEm were you able to get this working with @laurit's suggestion above?

@laurit
Contributor

laurit commented Jun 9, 2022

Feel free to reopen if you have more questions.

@laurit laurit closed this as completed Jun 9, 2022