Skip to content

Conversation

@fhanau
Copy link
Contributor

@fhanau fhanau commented Oct 10, 2025

Edge testing showed that the given error only happens after the return event timestamp has already been provided, which happens in one edge case when a DO request is aborted and spans are only reported as the IncomingRequest is being destructed. At that time, we can't look up the time anymore, but the return event timestamp will still be helpful.
As indicated in the TODO comment for the case where the IoContext itself is no longer available, we'll want to change span lifetimes at a later time so that they don't outlive the IncomingRequest/IoContext, but for now we'll provide the reasonable timestamp available to us.

Also perform some cleanup while we're at it (promote warnings to errors since we've always had the return event available in this case so if they still appear that's an actual issue, avoid superfluous timer lookups).

@fhanau fhanau requested review from a team as code owners October 10, 2025 23:05
// TODO(o11y): Once we report the user tracing spanOpen event as soon as a span is created, we
// should be able to fold this virtual call and just get the timestamp directly.
span.emplace(kj::mv(operationName), startTime.orDefault(obs->getTime()));
span.emplace(kj::mv(operationName), startTime.orDefault([&obs]() { obs->getTime() }));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this saves us a timer lookup whenever we already have a timestamp, something we want to get right for performance.

@github-actions
Copy link

github-actions bot commented Oct 10, 2025

The generated output of @cloudflare/workers-types matches the snapshot in types/generated-snapshot 🎉

@fhanau
Copy link
Contributor Author

fhanau commented Oct 10, 2025

As discussed with Mar in a call, it might be possible to work around this edge case by restructuring IoContext::IncomingRequest::~IoContext_IncomingRequest(), but I don't understand that code very well, it would be difficult to rule out that this introduces new bugs.

@fhanau fhanau force-pushed the felix/101025-stw-time branch from a8de521 to f6e2966 Compare October 10, 2025 23:11
@fhanau
Copy link
Contributor Author

fhanau commented Oct 10, 2025

@jmorrell-cloudflare this addresses a warning encountered in edge testing – in a rare edge case, we are unable to get a timestamp when closing a span following an exception for DO-based requests. This PR proposes just providing the return timestamp in that case (which is already available at that time) without reporting a warning, as it's the best time we have (unless we figure out a different way to manage these span lifetimes later). Does that approach work for you?

@mar-cf
Copy link
Contributor

mar-cf commented Oct 15, 2025

Missing tests or a link to a repro.

If this happens on abort only, is it that concerning that we wouldn't have a timestamp to use?

What would it take to change the spans not to outlive the aborted context and report early?

@fhanau
Copy link
Contributor Author

fhanau commented Oct 15, 2025

  1. Missing tests or a link to a repro.

  1. If this happens on abort only, is it that concerning that we wouldn't have a timestamp to use?

  1. What would it take to change the spans not to outlive the aborted context and report early?

Mar I thought we already discussed the rationale for this change including 1) and 2) at the Friday meeting. We also touched on 3) – might be possible but I'm not sure if it can be done safely as summarized in #5282 (comment) – we would want some more senior folks to weigh in and it would be hard to rule out that it would introduce other bugs. My understanding is that getting span lifetimes right 100% of the time is not in scope for our next milestone.
That being said, these questions are non-trivial and I'm happy to discuss them some more internally.

…lid incomingRequest

Edge testing showed that the given error only happens after the return event
timestamp has already been provided, which happens in one edge case when a DO
request is aborted and spans are only reported as the IncomingRequest is being
destructed. At that time, we can't look up the time anymore, but the return
event timestamp will still be helpful.
As indicated in the TODO comment for the case where the IoContext itself is no
longer available, we'll want to change span lifetimes at a later time so that
they don't outlive the IncomingRequest/IoContext, but for now we'll provide the
reasonable timestamp available to us.

Also perform some cleanup while we're at it (promote warnings to errors since
we've always had the return event available in this case so if they still appear
that's an actual issue, avoid superfluous timer lookups).
@fhanau fhanau force-pushed the felix/101025-stw-time branch from f6e2966 to b9c7d97 Compare October 15, 2025 13:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants