@cameron-p-m cameron-p-m commented Nov 28, 2025

How it works

Traces are triggered by the client. If a traceparent comment exists in the query

/* traceparent:00-456e5d656857f12406b34e87c8f63553-6a44c13332fba2c1-01 */ SELECT * FROM test6;

then a span is started from that parent. This lets the upstream client control the sample rate: if the application enables verbose tracing, so will Yugabyte, and the spans will match. Without this parent, no spans are emitted. A thread-local stack tracks the active span context.
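To make the trigger concrete, here is a minimal sketch of pulling a `traceparent` value out of a leading SQL comment and validating it against the W3C version-00 layout. It is not the parser from this PR: `TraceParent`, `ExtractTraceParent`, and the helper names are hypothetical, and the sketch assumes the value sits in the first `/* ... */` comment and accepts only version `00`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical holder for the four traceparent fields (not a type from this PR).
struct TraceParent {
  std::string version;    // 2 hex chars
  std::string trace_id;   // 32 hex chars
  std::string parent_id;  // 16 hex chars
  std::string flags;      // 2 hex chars

  // Bit 0 of the flags byte is the W3C "sampled" flag.
  bool sampled() const {
    assert(flags.size() == 2);
    return (std::stoi(flags, nullptr, 16) & 0x01) != 0;
  }
};

inline bool IsLowerHex(std::string_view s) {
  for (char c : s) {
    const bool ok = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f');
    if (!ok) return false;
  }
  return !s.empty();
}

inline bool IsAllZero(std::string_view s) {
  return s.find_first_not_of('0') == std::string_view::npos;
}

// Returns the traceparent found in the first /* ... */ comment, or nullopt when
// there is no comment or no valid version-00 value inside it.
inline std::optional<TraceParent> ExtractTraceParent(std::string_view query) {
  const std::size_t open = query.find("/*");
  if (open == std::string_view::npos) return std::nullopt;
  const std::size_t close = query.find("*/", open + 2);
  if (close == std::string_view::npos) return std::nullopt;
  const std::string_view comment = query.substr(open + 2, close - (open + 2));

  constexpr std::string_view kKey = "traceparent:";
  const std::size_t key = comment.find(kKey);
  if (key == std::string_view::npos) return std::nullopt;
  const std::string_view rest = comment.substr(key + kKey.size());

  // Version 00 is exactly 55 chars: 2 + '-' + 32 + '-' + 16 + '-' + 2.
  constexpr std::size_t kLen = 55;
  if (rest.size() < kLen) return std::nullopt;
  if (rest.size() > kLen &&
      !std::isspace(static_cast<unsigned char>(rest[kLen]))) {
    return std::nullopt;
  }
  if (rest[2] != '-' || rest[35] != '-' || rest[52] != '-') return std::nullopt;

  TraceParent tp{std::string(rest.substr(0, 2)), std::string(rest.substr(3, 32)),
                 std::string(rest.substr(36, 16)), std::string(rest.substr(53, 2))};
  if (tp.version != "00") return std::nullopt;
  if (!IsLowerHex(tp.trace_id) || !IsLowerHex(tp.parent_id) ||
      !IsLowerHex(tp.flags)) {
    return std::nullopt;
  }
  // All-zero trace and parent ids are invalid per the W3C spec.
  if (IsAllZero(tp.trace_id) || IsAllZero(tp.parent_id)) return std::nullopt;
  return tp;
}
```

A query without the comment, or with a malformed value, yields `std::nullopt`, which matches the behavior described above: no parent, no spans.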

Postgres provides DTrace probe points. This PR adds OTEL instrumentation beside those native points, which covers the query / parse / plan / execute lifecycle and should be easy to maintain long term.
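One way to picture how the lifecycle phases nest under a query span is an RAII guard over a thread-local stack. The sketch below is illustrative only and is not the code in this PR: `ScopedSpan`, `SpanRecord`, and the `tls_*` variables are hypothetical names, span ids are simple counters, and finished spans are collected in memory instead of being exported.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of a finished span.
struct SpanRecord {
  std::string name;
  uint64_t span_id = 0;
  uint64_t parent_span_id = 0;
};

// Per-thread state: ids of the currently open spans, innermost last.
thread_local std::vector<uint64_t> tls_active_spans;
thread_local std::vector<SpanRecord> tls_finished_spans;
thread_local uint64_t tls_next_span_id = 1;

class ScopedSpan {
 public:
  // Child span: started only if another span is already active on this thread.
  explicit ScopedSpan(std::string name) {
    if (tls_active_spans.empty()) return;  // no parent, so emit nothing
    Start(std::move(name), tls_active_spans.back());
  }

  // Root span for a query, parented to the span id from the client's traceparent.
  ScopedSpan(std::string name, uint64_t remote_parent_id) {
    Start(std::move(name), remote_parent_id);
  }

  ~ScopedSpan() {
    if (!active_) return;
    // Spans must close in reverse order of opening.
    assert(!tls_active_spans.empty() &&
           tls_active_spans.back() == record_.span_id);
    tls_active_spans.pop_back();
    tls_finished_spans.push_back(record_);
  }

  ScopedSpan(const ScopedSpan&) = delete;
  ScopedSpan& operator=(const ScopedSpan&) = delete;

 private:
  void Start(std::string name, uint64_t parent_id) {
    record_.name = std::move(name);
    record_.span_id = tls_next_span_id++;
    record_.parent_span_id = parent_id;
    tls_active_spans.push_back(record_.span_id);
    active_ = true;
  }

  SpanRecord record_;
  bool active_ = false;
};
```

With this shape, a probe-adjacent call site would open a `ScopedSpan("parse")` at the start of the phase and let scope exit close it; when the query carried no traceparent the guard is a no-op.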

Outbound RPCs are also instrumented, so the client can see which RPCs were sent for its query, how long they took, and what they were for.
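As a rough illustration of the "how long they took" part, the sketch below wraps a call in a timer based on `std::chrono::steady_clock`, which is monotonic and therefore unaffected by wall-clock adjustments. `RpcTimer`, `TimedRpc`, `RpcTiming`, and `tls_rpc_timings` are hypothetical names, not the instrumentation in this PR, and the lambda stands in for a real outbound RPC.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one outbound call and its duration.
struct RpcTiming {
  std::string method;
  std::chrono::nanoseconds elapsed{0};
};

thread_local std::vector<RpcTiming> tls_rpc_timings;

// Records the elapsed time between construction and destruction.
class RpcTimer {
 public:
  explicit RpcTimer(std::string method)
      : method_(std::move(method)), start_(std::chrono::steady_clock::now()) {}

  ~RpcTimer() {
    const auto elapsed = std::chrono::steady_clock::now() - start_;
    assert(elapsed.count() >= 0);  // steady_clock never goes backwards
    RpcTiming timing;
    timing.method = std::move(method_);
    timing.elapsed =
        std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed);
    tls_rpc_timings.push_back(std::move(timing));
  }

  RpcTimer(const RpcTimer&) = delete;
  RpcTimer& operator=(const RpcTimer&) = delete;

 private:
  std::string method_;
  std::chrono::steady_clock::time_point start_;
};

// Runs `call`, records its duration under `method`, and returns its result.
template <typename Fn>
decltype(auto) TimedRpc(std::string method, Fn&& call) {
  RpcTimer timer(std::move(method));
  return std::forward<Fn>(call)();
}
```

A real span would also need a wall-clock start timestamp for display; the monotonic clock is used here only for the duration.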

Current problems

  • Correctness. Both the time / clock measurements and the trace results need to be checked for accuracy. Timing should use a precise clock.
  • Performance. OTEL has an impact on performance. Client-controlled spans are less impactful because they are rare, but the overhead still needs to be measured.
  • Most of the code is very naive. I have NOT attempted to make it nice or remove unneeded parts, just tried to get it to work.
  • Currently there are no tests for span creation / propagation. The code also parses the traceparent manually instead of using the standard library.
  • There are some unneeded, unoptimized paths for collecting table metadata for span attributes. These do not have to make it into the final version. If they do, they should be thought about more.
  • There are comments everywhere to make the code easier to follow; they might not be wanted in the final version.
  • Instead of using https://github.com/yugabyte/yugabyte-db-thirdparty to build the dependencies, I only used a build script: https://github.com/Shopify/yugabyte-db/blob/cd0d79f82477cd8438a83d3a9acae2990043ad1d/build-support/build_opentelemetry.sh#L19 The production solution would build these deps as others in Yugabyte are built.
  • The OTEL library's HTTP or proto exporter could not be built with Yugabyte due to conflicting protobuf versions. There may be a way to do this; it should be explored. As an alternative I created a prototype HTTP exporter: https://github.com/Shopify/yugabyte-db/blob/cd0d79f82477cd8438a83d3a9acae2990043ad1d/src/yb/util/otel_http_exporter.cc#L25 This exporter is very basic: it does not support things we probably need, such as batching and handling of various failure modes, and it has no tests. Ideally we just use the built-in version.
  • The current version has a lot of debug logs. This is intentional for a prototype, but they are too frequent to be deployed at scale.
  • Forward declarations for Postgres C functions were added to get my x86 build to complete. There is probably a better way.
  • A production version should have a runtime flag to disable tracing, as well as an ENV variable to enable it at startup. The runtime flag would be important for shipping to production.
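The last item could look roughly like the sketch below: an environment variable sets the initial state at startup and an atomic flag allows tracing to be switched off at runtime. Everything here is an assumption for illustration. The variable name `YB_ENABLE_OTEL_TRACING` and the function names are hypothetical, and a production version would presumably use Yugabyte's own flag mechanism instead of a bare global.

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical environment variable name, not defined by this PR.
constexpr const char* kTracingEnvVar = "YB_ENABLE_OTEL_TRACING";

// Process-wide switch; off by default so tracing is opt-in.
inline std::atomic<bool> g_tracing_enabled{false};

// Treats "1" and "true" as enabled; anything else, including unset, as disabled.
inline bool ParseEnabled(const char* value) {
  return value != nullptr &&
         (std::strcmp(value, "1") == 0 || std::strcmp(value, "true") == 0);
}

// Called once at startup to pick up the initial state from the environment.
inline void InitTracingFromEnv() {
  g_tracing_enabled.store(ParseEnabled(std::getenv(kTracingEnvVar)),
                          std::memory_order_relaxed);
}

// Runtime toggle, for example wired to an admin command or a flag callback.
inline void SetTracingEnabled(bool enabled) {
  g_tracing_enabled.store(enabled, std::memory_order_relaxed);
}

// A span is started only when the query carried a traceparent AND tracing is on.
inline bool ShouldTrace(bool has_traceparent) {
  return has_traceparent && g_tracing_enabled.load(std::memory_order_relaxed);
}
```

Checking the flag before any parsing keeps the disabled path to a single relaxed atomic load per query.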

@cameron-p-m cameron-p-m changed the base branch from master to 2025.1.1 November 28, 2025 22:27

CLAassistant commented Nov 28, 2025

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
0 out of 2 committers have signed the CLA.

❌ cameron-p-m
❌ khosrow