Add Opentelemetry Support #304

dkPranav · 2025-06-11T10:45:26Z

About

This is based on @AndrewWinterman's PR #272.

These changes have been tested and are currently running in our Production environment.
The changes from this PR provided out-of-the-box instrumentation for our publish operations. It also enabled seamless cross-language context propagation between our Go service and a NodeJS service.

On the consumer side, since AMQP uses a channel-based approach, we had to manually instrument the message processing logic. By adding spans around the relevant operations, we were able to achieve the desired tracing behavior.

Changes

Addressed an issue with publish span closure, as pointed out in this comment - feat(otel): add opentelemety utility functions #272 (comment)
Added another span attribute, messaging.rabbitmq.routing_key. This attribute is used by the NodeJS instrumentation (https://www.npmjs.com/package/@opentelemetry/instrumentation-amqplib), which follows an older semantic convention.
Added a function for generating attributes for consumption

Example for Consumption

Below is sample code showing how we instrumented message consumption from a RabbitMQ queue:

// Extract span context from AMQP headers
amqpCtx := amqp091.ExtractSpanContext(context.Background(), data.Headers)

// Create a new span for message processing
ctx, span := otel.Tracer(serviceName).Start(
    amqpCtx,
    "process-offer-message",
    trace.WithSpanKind(trace.SpanKindConsumer),
)
defer span.End()

// Add attributes for consumption tracing
amqp091.AddAttributesforConsumption(span, data, queueName)

This PR extracts opentelemetry utility functions from my private project and adds them to this project without calling them. It resolves rabbitmq#43.

I'd like a broader discussion about whether these should be automatically called by the library where possible, or if they should simply be provided to clients to use if they so wish. I did my best to follow OpenTelemetry semantic conventions as described here https://opentelemetry.io/docs/specs/semconv/messaging/messaging-spans/, but they are at times ambiguous for rabbitmq, e.g. is the destination for a message the Queue or the Consumer Tag the message was delivered to?

Given the channel based approaches of this library, it is impossible for the library to know the full execution of a consumer. Unless autoack=false, we cannot actually know when to end the span associated with a delivery, so at least in the consumer case, it's probably best to allow the client to manage spans for themselves. We *can* manage spans on the producer side, and at the very least extract span identifiers to include on published headers automatically, and provide utilities for pulling them back out again.

My intention with putting this PR up is to move the conversation forward. Because the PR *only* provides private methods (if I left members public please call them out), it can be safely merged while these questions are worked out.

AndrewWinterman · 2025-06-12T18:11:08Z

delivery.go

+    }
+}
+
+func (d Delivery) Settle(ctx context.Context, response DeliveryResponse, multiple, requeue bool) error {


A review comment on the old PR suggested swapping this for {Ack,Nack,Reject}Ctx methods instead.

Yes, thank you. I have updated the code.

AndrewWinterman · 2025-06-12T18:12:41Z

opentelemetry.go

+)
+
+// tracer is the tracer used by the package
+var tracer = otel.Tracer("amqp091")


There's actually an issue with instantiating it like this: the tracer instance will be created before configuration has been applied to it, which I think will make it a no-op.

Better to make this a function that retrieves a tracer with the right name.

Suggested change

-var tracer = otel.Tracer("amqp091")
+func tracer() trace.Tracer {
+	return otel.Tracer("amqp091")
+}

Working my way back through the otel code, the global variable will work as one would hope. Basically, the tracer is stored until a tracer delegate is loaded when otel is set up, at which point the tracer is updated with the delegate.

A quick check of GitHub suggests that defining the tracer once as a global is a common pattern.

Okay, I'll leave this as-is then.

opentelemetry.go

Co-authored-by: AndrewWinterman <[email protected]>

Zerpet · 2025-06-26T08:32:14Z

Thank you for this contribution! I've been busy with other responsibilities. I'm hoping to get to this PR today or tomorrow 👨‍💻

Thank you too to @AndrewWinterman and @adcharre for reviewing this PR 👀

Zerpet · 2025-07-01T08:55:23Z

channel.go

-func (ch *Channel) PublishWithContext(_ context.Context, exchange, key string, mandatory, immediate bool, msg Publishing) error {
-	return ch.Publish(exchange, key, mandatory, immediate, msg)
+func (ch *Channel) PublishWithContext(ctx context.Context, exchange, key string, mandatory, immediate bool, msg Publishing) error {
+	_, err := ch.PublishWithDeferredConfirmWithContext(ctx, exchange, key, mandatory, immediate, msg)
+	return err


Leaving a note to myself to revisit this change. I think it would be preferable to duplicate some code, instead of executing all the code for the deferred confirm and ignoring the return value.

Zerpet · 2025-07-01T09:32:06Z

go.mod

 module github.com/rabbitmq/amqp091-go

-go 1.20
+go 1.22.0


What features of Go 1.22 are we using that require this bump?

We are quite conservative with go directive bumps, especially since recent versions of Go treat this directive as a hard minimum-version requirement.

This appears to have come in as part of the otel-go package (https://github.com/open-telemetry/opentelemetry-go/releases/tag/v1.34.0), which requires a minimum Go version of 1.22.0.

We will revisit the approach for this PR, aiming to adopt a subpackage-middleware strategy without making major changes to the main package. (discussions around this in the other PR #272 (comment))

Does this make sense, @Zerpet?

@Zerpet / @AndrewWinterman - Just to flesh this out a bit more.

The OpenTelemetry packages have a requirement for the latest version of Go, for example the most recent release bumps the requirement to 1.23.

There are 2 possible solutions to including OpenTelemetry into this package:

1. Simply accept the Go version requirement from the Otel package and add the Otel packages to this package's dependencies.

2. Introduce a Middleware concept to break the dependency on Otel and maintain the existing Go version requirement. OpenTelemetry instrumentation would then be implemented as a new middleware module, within this repo but with its own dependencies.

Option 1 is what's implemented currently in this PR.

Option 2 would add a new interface and methods to set the middleware of a Connection, plus an otel/ directory with its own go.mod.

I envision that the Middleware would have pre and post functions for all the methods that we had instrumented so far.

type Middleware interface {
	PrePublishWithContext(ctx context.Context, exchange, key string, mandatory, immediate bool, msg Publishing)
	// err is the result of the call to PublishWithContext.
	PostPublishWithContext(ctx context.Context, exchange, key string, mandatory, immediate bool, msg Publishing, err error)
	// ... other Pre/Post methods
}

and then in the PublishWithContext function

func (ch *Channel) PublishWithContext(ctx context.Context, exchange, key string, mandatory, immediate bool, msg Publishing) error {
	if ch.middleware != nil {
		ch.middleware.PrePublishWithContext(ctx, exchange, key, mandatory, immediate, msg)
	}
	_, err := ch.PublishWithDeferredConfirmWithContext(ctx, exchange, key, mandatory, immediate, msg)
	if ch.middleware != nil {
		ch.middleware.PostPublishWithContext(ctx, exchange, key, mandatory, immediate, msg, err)
	}
	return err
}

If we think Option 2 is the way to go, we can open a new PR with the proposed Middleware implementation and agree on that before working on the OpenTelemetry middleware.

Comments?

I'm still drinking my coffee, so apologies in advance if I miss something obvious 😴

I'm not sure how option 2 solves the problem. I understand that we can move OTEL bits to its own module e.g. otel/go.mod, however, this module will still require OTEL version X, which forces a go directive of 1.22 or 1.23 to the middleware module. From the code snippet, I understand that the client module amqp091-go will import the middleware module in otel/go.mod, which will in turn force the client module to bump its Go directive. This is somewhat "recent" since the Go toolchain has become very picky about importing modules of higher go directive. In short, I don't think this middleware approach would solve the problem.

Am I missing something?

I think I wasn't clear enough....

The middleware would be an interface, and there would be no explicit dependency between the original amqp091-go module and the new otel module. The new otel module would define a new type that implements the methods of the Middleware interface.

Any code using the amqp091-go module that doesn't want otel would not need changing, nor would it see any change to the required Go version.

Code that wanted otel instrumentation would import the otel module, instantiate the Otel middleware and apply it to the client/channel.

Another idea to make option 1 acceptable would be to stay behind on OTEL releases. I personally think that the biggest version bump is to 1.21, because it introduces the toolchain directive, which is not interpreted by earlier versions.

I'm ok to bump to Go 1.22 as part of this PR, and bump to Go 1.23 in a few months time, when there's a valid justification (like supporting OTEL). I'm just on the lookout for Go version bumps for the sake of bumping it 👀

I actually like the middleware approach, but I thought we could likely drop down to the protocol level. Likely still need some way to get the span off of a delivery or a return, but that can live in a different package

Thanks for the explanation, I think I get the idea now. The "middleware" interface (I think the name should be more specific) will be part of the amqp091-go module, and the module with OTEL will implement this interface. Then any user who wants to use automatic instrumentation will initialise the middleware and inject/set it in the Connection or anywhere TBD that makes sense.

If my understanding is correct, I like this idea 👍 and I agree it is the right thing to do. FWIW, other data services like Redis also separate the OTEL module from the main library.

I don't really have bandwidth to make a middleware module, but I really think it's a good way to go for this sort of thing. Also would let libraries consistently set e.g. app id, user id, and timestamp headers

Zerpet · 2025-07-01T09:42:56Z

go.mod

+require (
+	github.com/go-logr/logr v1.4.2 // indirect
+	github.com/go-logr/stdr v1.2.2 // indirect
+	go.opentelemetry.io/auto/sdk v1.1.0 // indirect


I wonder what's causing this indirect import 🤔 I was under the impression that libraries should only import APIs and not the SDK, according to OpenTelemetry docs.

Zerpet · 2025-07-01T10:24:59Z

opentelemetry.go

+func (c amqpHeaderCarrier) Get(key string) string {
+	v, ok := c[key]
+	if !ok {
+		return ""
+	}
+	s, ok := v.(string)
+	if ok {
+		return s
+	}
+	return ""
+}


I'm not convinced by this implementation. It basically ignores any value type other than string. I'm not sure this is the right thing to do, because we are silently dropping information that could be relevant, for example re-delivery information, which IIRC is an integer. I'm ok with a text-map implementation, but it should define conversion rules from non-string types to strings, and document them.

TBH this was me being a bit lazy in my original implementation, justified by my understanding that this was basically only used for the tracestate, which is basically a big string: https://opentelemetry.io/docs/specs/otel/trace/tracestate-handling/

Zerpet · 2025-07-01T11:14:29Z

opentelemetry.go

+				// semconv.NetPeerIP("localhost")
+				// semconv.ServerAddress("localhost")
+			),
+			trace.WithNewRoot(),


I'm not sure about this 🤔 Couldn't we be inside a nested span here?

For example, consider an application that has its own app tracing and calls our library to send a message. I would expect to see our "library spans" nested under the application's spans. What's the reasoning to always treat this span as root?

Zerpet · 2025-07-01T13:04:13Z

opentelemetry.go

+// spanForDelivery creates a span for the delivered messages
+// returns a new context with the span headers and the span.
+func spanForDelivery(ctx context.Context, delivery *Delivery, options ...trace.SpanStartOption) (context.Context, trace.Span) {
+	spanName := fmt.Sprintf("consume %s %s", delivery.Exchange, delivery.RoutingKey)


I've been thinking about this for a bit. It's quite unfortunate that the delivery frame does not contain the queue name. I find it awkward to use the exchange name as part of the "consume" span. After reading the messaging guidelines from OTEL regarding subscriptions:

Subscriptions represent entities within messaging systems that allow multiple consumers to receive messages from the topic following subscription-specific consumption behavior that includes load balancing, durability, filtering, or other system-specific capabilities.

Named subscriptions and consumers groups are semantically different mechanisms messaging systems use for similar scenarios such as load-balancing or broadcasting.

I think the correct behaviour would be to use the "named subscription" name, which would be either the queue name, or the routing key, or perhaps both! Given that the queue name is not available, I think it's sensible to use the routing key.

What do you think?

I pretty much agree-- having the queue name would be best but we don't have it, so I'm doing the best with what we do have.

Zerpet · 2025-07-01T13:12:39Z

opentelemetry.go

+	exchange, routinKey string,
+	immediate bool,
+) (context.Context, Publishing, func(err error)) {
+	spanName := fmt.Sprintf("%s publish", routinKey)


The operation name i.e. publish should be the first word of the span name, according to OTEL guidelines.

sorry, assuming this was propagated from my initial attempt, this is my dyslexia.

Suggested change

-	spanName := fmt.Sprintf("%s publish", routinKey)
+	spanName := fmt.Sprintf("publish %s", routinKey)
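
The naming rule applied in this suggestion, operation first and then the destination, can be captured in a small helper. This is a hypothetical sketch: `spanName` is not part of the PR, and falling back to the bare operation name when no destination is known is an assumption made here for illustration.

```go
package main

import "fmt"

// spanName builds "<operation> <destination>", with the operation first.
// When the destination is unknown it returns the operation alone (assumed fallback).
func spanName(operation, destination string) string {
	if destination == "" {
		return operation
	}
	return fmt.Sprintf("%s %s", operation, destination)
}

func main() {
	fmt.Println(spanName("publish", "offers.created"))
	fmt.Println(spanName("consume", ""))
}
```

Keeping the format in one place would let the publish and consume paths stay consistent if the destination choice (routing key versus queue name) changes later.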

Zerpet · 2025-07-01T13:13:16Z

opentelemetry.go

+			semconv.MessagingMessageID(publishing.MessageId),
+			semconv.MessagingMessageConversationID(publishing.CorrelationId),
+			semconv.MessagingSystemRabbitmq,
+			semconv.MessagingClientIDKey.String(publishing.AppId),


Same comment as earlier regarding User ID vs App ID.

Zerpet · 2025-07-01T13:35:27Z

opentelemetry.go

+	return ctx, publishing, func(err error) {
+		if err != nil {
+			span.RecordError(err)
+			amqpErr := &Error{}
+			if errors.As(err, &amqpErr) {
+				span.SetAttributes(
+					semconv.ErrorTypeKey.String(amqpErr.Reason),
+				)
+			}
+			span.SetStatus(codes.Error, err.Error())
+		}
+		span.End()
+	}


I see the idea now, and I think this is going to be tricky. Have you tested what happens when you publish an un-routable message with mandatory flag set?

I foresee that this Span will be marked as successful, and eventually the server (rabbit) will send a return frame, stating that the message is undeliverable. This will only happen with publisher confirmations enabled, of course. In this situation, it's up for debate whether the "publish" has succeeded. From the application point of view, one could argue that publish did not succeed, because my message was not accepted by the server. Another could argue that the basic.publish frame was sent successfully to the wire, therefore the publish succeeded.

I think it's important to agree on the definition of "success" for publish, and write it down in the public function code doc.

I think it's just mirroring the protocol-- it reports the publish as succeeded even though it will eventually asynchronously report a failure. Without doing some sort of higher level wrapping I think this is the best we can do.

Co-authored-by: Aitor Pérez Cedres <[email protected]>

Zerpet · 2025-08-04T11:47:51Z

Since the 28th of July 2025, all external contributors to RabbitMQ will have to sign a CLA again.

Unfortunately this time around, we were asked to use a good old Word document because reasons [1][2].

Note that companies that contribute to RabbitMQ only need to sign the CLA once; there is no need for every
contributing employee to do so separately.

I apologize for how "simplistic" this CLA method is but our team does not have any control over this matter.

[1] https://github.com/rabbitmq/cla
[2] https://github.com/rabbitmq/rabbitmq-server/blob/main/CONTRIBUTING.md

AndrewWinterman and others added 18 commits June 28, 2024 15:11

Merge branch 'main' into feat/opentelemetry

d292598

feat(otel): take a stab at wiring otel up

75a6aeb

Merge branch 'feat/opentelemetry' of https://github.com/AndrewWinterman/amqp091-go into feat/opentelemetry

ccf814a

Merge branch 'main' into feat/opentelemetry

13a1894

fix: remove reference to outreach gobox lib

e0fa7c6

Merge branch 'feat/opentelemetry' of https://github.com/AndrewWinterman/amqp091-go into feat/opentelemetry

1aeb2d0

a smidge of polish

47aa58b

Add otel inst

135fda6

Update README.md

9790ae7

AndrewWinterman + fix

f7be3c5

Merge branch 'otel' of github.com:dkPranav/amqp091-go into otel

f154228

Go fmt

4d6ab91

Typo

11e87a8

Merge branch 'feat/opentelemetry' into otel

c42d97b

Fix Rebase issues

1a75eac

Attributes for consume operation

626c70d

Comments

4851162

AndrewWinterman reviewed Jun 12, 2025

AndrewWinterman reviewed Jun 12, 2025

Update opentelemetry.go

931eede

Co-authored-by: AndrewWinterman <[email protected]>

Zerpet self-assigned this Jun 26, 2025

dkPranav added 2 commits June 27, 2025 16:30

Refactor delivery attributes and telemetry integration for improved observability

3009c4c

Remove unused DeliveryResponse type and associated methods

3c9333e

dkPranav requested review from AndrewWinterman and adcharre June 27, 2025 11:04

Go fmt

cb91b98

Zerpet reviewed Jul 1, 2025

Update opentelemetry.go

a7e5515

Co-authored-by: Aitor Pérez Cedres <[email protected]>
