Trace (and all) exporters should have an export timeout #346
@toumorokoshi Will it be ok to add …
We probably want to have a general timeout mechanism in the export layer; hopefully we can configure this once for all exporters.
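One lightweight way to get a single, shared configuration point would be an environment variable that every exporter reads. A minimal sketch of the idea; the variable name `OTEL_EXPORTER_TIMEOUT` and the helper are hypothetical, not from the spec:

```python
import os


def export_timeout_millis(default: int = 10000) -> int:
    """Read a shared export timeout from the environment.

    Every exporter could call this instead of defining its own setting,
    so the timeout is configured once for all of them. The variable
    name below is illustrative only.
    """
    raw = os.environ.get("OTEL_EXPORTER_TIMEOUT")
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        return default  # fall back to the default on a malformed value
```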
I was checking #385 but I have some questions about the issue itself, so I'll comment here. I don't understand what the scope of this issue is. According to the specification: "Export() must not block indefinitely, there must be a reasonable upper limit after which the call must time out with an error result (typically FailedRetryable)." [1] The specification also states that … Is this issue about ensuring that …?
From reading this Zipkin review and the comments on this issue, I thought that the right scope for the PR was to: …

But now, from reading your message, I have some mixed feelings about the PR. Maybe @c24t and @toumorokoshi can shed some light on what the expected scope for this issue is. Also, thank you for pointing out that part of the specification; I hadn't seen it!
I think we can treat exporter timeout and processor timeout as separate issues. Our exporters don't currently support timeouts, which is a violation of the spec as written. Ideally each exporter would have its own (possibly globally-configurable) timeout, and the processors would handle …

Even if we had that, I think it's still a good idea for the span processor to have a hard timeout. We could, for example, keep retrying a …

The timeout interval for an exporter depends on the expected behavior of the exporter, since different backends will have different SLAs. The timeout interval for a span processor might only depend on the export interval, for example if we want to guarantee that it exports a batch of spans every X seconds.
I think it's fine to go ahead with a general-purpose timeout in span processors.
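A processor-level hard timeout along these lines can be sketched with a worker thread and a bounded wait. This is illustrative only (the function name and the result string are made up, not the SDK's API), and note the usual CPython limitation: on timeout the hung worker is abandoned, not killed.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=1)


def export_with_deadline(export_fn, batch, deadline_sec):
    """Call export_fn(batch), but give up after deadline_sec seconds.

    The processor gets a guaranteed upper bound even if the exporter
    itself misbehaves; a hung call is abandoned, not cancelled.
    """
    future = _pool.submit(export_fn, batch)
    try:
        return future.result(timeout=deadline_sec)
    except FutureTimeout:
        return "FAILURE"  # hypothetical result code
```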
I like this implementation because it puts the timeout mechanism in a place accessible to all the implementations of anything with a simple `export` method:

```python
def timeout(timeout):
    def inner_0(function):
        def inner_1(*args, timeout=timeout, **kwargs):
            print(timeout)
            function(*args, **kwargs)
        return inner_1
    return inner_0


@timeout(90)
def export_0(first, second):
    print(first)
    print(second)


@timeout(100)
def export_1(first, second):
    print(first)
    print(second)


export_0(1, 2, timeout=9)
export_1(1, 2)
```
In this way, different exporters can have different timeout values defined for them in their own code simply by changing the argument passed to the `timeout` decorator.
For exporters that use …, I think it's a good idea to have a general-purpose timeout decorator, and for exporters to use that decorator if there's not a better option. I just mean to say that a timeout on the span processor doesn't obviate the need for one on the exporter.
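For illustration, here is a sketch of the exporter-side alternative: when the exporter's transport already supports timeouts, it can honor the deadline natively instead of relying on an outer watchdog. The function, host, and payload below are placeholders, not code from any real exporter:

```python
import socket


def send_spans(payload: bytes, host: str, port: int, timeout: float = 10.0) -> bool:
    """Bound every blocking socket operation with the transport's own
    timeout instead of an external decorator or watchdog thread."""
    try:
        # create_connection applies the timeout to connect(); settimeout
        # keeps it applied to subsequent sends on the same socket.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(payload)
        return True
    except OSError:  # covers socket.timeout and connection errors
        return False
```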
Ok, I see your point. What I mean is that the specification seems to be pretty strong on the requirement that `Export()` must not block indefinitely. So, for example, let's consider a possible export implementation:

```python
from time import sleep
from socket_library import socket, SocketTimeout

...

# @timeout(60)  # The problem explained below would be fixed by uncommenting this line.
def export(self, timeout=60):
    ...
    try:
        socket.get(..., timeout)
    except SocketTimeout:
        while True:
            sleep(1)
```

I am exaggerating, of course 🙂. But that is my point: I think the specification requires …
Actually I think using timeout-decorator or pebble is a good idea, but I don't know how I feel about adding a dependency for this. Following the reviews in #385, I re-implemented the timeout functionality using decorators and `multiprocessing`. It could be helpful if this implementation path is chosen.
```python
import sys
from time import sleep, time
from multiprocessing import Queue, Process


# Implementation
def _target(queue, function, *args, **kwargs):
    # Run the wrapped function in the child process and report back
    # either (True, result) or (False, exception) through the queue.
    try:
        queue.put((True, function(*args, **kwargs)))
    except Exception:
        queue.put((False, sys.exc_info()[1]))


def timeout(timeout):
    def inner_0(function):
        def inner_1(*args, timeout=timeout, **kwargs):
            pqueue = Queue(1)
            process = Process(
                target=_target,
                args=(pqueue, function) + args,  # forward the call arguments
                kwargs=kwargs,
                daemon=True,
            )
            process.start()
            deadline = timeout + time()
            while process.is_alive():
                sleep(0.5)
                if deadline < time():
                    process.terminate()
                    raise TimeoutError
            flag, load = pqueue.get()
            if flag:
                return load
            raise load
        return inner_1
    return inner_0


# Usage
@timeout(5)
def export():
    for number in range(1, 5):
        print(number)
        sleep(0.5)
    return True


try:
    if export(timeout=2):
        print("Worked")
except TimeoutError:
    print("Failed")

if export():
    print("Worked")
```
Personally I think that … If we need any additional threads or processes to handle timeouts, the cost-benefit ratio is wrong IMHO. After all, a well-written exporter should be able to handle timeouts efficiently.

I'd implement FailedRetryable such that the spans are only retried in the next interval. Immediate retrying IMHO only makes sense for SimpleSpanProcessor.
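The retry-on-next-interval behavior described above can be sketched as follows. The class, the result strings, and the `tick()` method are hypothetical (they stand in for one iteration of a batch processor's worker loop), not SDK code:

```python
from collections import deque


class BatchRetrySketch:
    """A FAILED_RETRYABLE batch is put back at the front of the queue
    and sent again on the *next* export interval instead of being
    retried immediately."""

    def __init__(self, exporter):
        self.exporter = exporter  # callable: batch -> result string
        self.queue = deque()

    def tick(self):
        # One export interval: try to send the oldest batch exactly once.
        if not self.queue:
            return None
        batch = self.queue.popleft()
        result = self.exporter(batch)
        if result == "FAILED_RETRYABLE":
            self.queue.appendleft(batch)  # retry next interval, not now
        return result
```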
Ok, maybe another random idea here. How about, instead of blocking on …? So, yes, basically using some kind of … P.S.: this …
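One reading of this (partly garbled) comment is that export would return a future-like handle instead of blocking the caller, and the processor would collect results later. A speculative sketch under that assumption; none of these names come from the SDK:

```python
from concurrent.futures import ThreadPoolExecutor


class AsyncExportSketch:
    """export() submits the work and returns immediately with a Future;
    the caller decides when (and how long) to wait for the result."""

    def __init__(self, send_fn):
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._send = send_fn  # callable: batch -> result

    def export(self, batch):
        # Non-blocking: returns a concurrent.futures.Future right away.
        return self._pool.submit(self._send, batch)

    def shutdown(self):
        self._pool.shutdown(wait=True)
```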
I second @Oberon00. If the exporters honor the timeouts, we could base the timeout logic in the span processors on that.
@aabmass will follow up to determine if this is in the spec or not. |
As per the spec: …
@aabmass are you still looking at this issue? |
@codeboten sorry for the delay. I'm going to unassign myself for now in case anyone else has time to pick it up. cc @lzchen
Hi @srikanthccv, is someone working on this issue? I would like to pick it up. Thanks!
The OpenTelemetry SDK should have a timeout on exporters during the flush call, or some way to configure this behavior.
This would ensure that the flush thread doesn't get hung indefinitely: https://github.com/open-telemetry/opentelemetry-python/pull/320/files#r360471105
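The requested behavior, a bounded flush that cannot hang forever, can be sketched with a worker thread and a bounded wait. This is illustrative only and not taken from the SDK; the function name and signature are hypothetical:

```python
import threading


def force_flush_sketch(export_fn, batches, timeout_millis=30000):
    """Drain the pending batches in a worker thread, waiting at most
    timeout_millis; return False instead of hanging indefinitely."""
    done = threading.Event()

    def _drain():
        for batch in batches:
            export_fn(batch)
        done.set()

    threading.Thread(target=_drain, daemon=True).start()
    # Event.wait returns True if the flush finished in time, else False.
    return done.wait(timeout_millis / 1000.0)
```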