This is more of a niche feature, but I'm attempting to serve nnsight as an API using FastAPI. I'm accessing the model and invoking the trace using `loop.run_in_executor()`. This mostly works. However, when I run multiple requests at the same time I receive the error: `RuntimeError: trying to pop from empty mode stack`. I assume this is because nnsight uses some sort of global tracing state when building the compute graph, which, judging by the error, is not thread-safe? Not sure if that's the case — would love some insight on this.
I'm fairly certain `loop.run_in_executor()` should work fine, as this pattern works normally with plain Hugging Face models, and it's how vLLM handles async requests in their `AsyncLLMEngine`. I'm wondering if other people have noticed this error and whether there are any ways to circumvent it.
@lithafnium Hey, I'd love to know more about this if you could get me a small reproducible example. I don't think nnsight is anywhere close to thread-safe, although maybe there are some features in nnsight you could disable to get it working. On another note, 0.4 is going to have vLLM support, so potentially you can just use that?