
nnsight with multithreading #280

Open
lithafnium opened this issue Oct 25, 2024 · 2 comments

lithafnium commented Oct 25, 2024

This is more of a niche feature, but I'm attempting to serve nnsight as an API using FastAPI. I'm accessing the model and invoking the trace using loop.run_in_executor(). This mostly works. However, when I run multiple requests at the same time I receive the error: RuntimeError: trying to pop from empty mode stack. I assume this is because nnsight uses some sort of global tracing state when building the compute graph, which, judging by the error, is not thread-safe? Not sure if that's the case; would love some insight on this.

I'm fairly certain loop.run_in_executor() should work fine, as this works normally with regular Hugging Face models, and it's how vLLM handles async requests in its AsyncLLMEngine. I'm wondering if other people have noticed this error and whether there are any ways to circumvent it.
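
The setup described above looks roughly like the sketch below (a minimal sketch, not the reporter's actual code: the model name, endpoint shape, executor settings, and the `.value` access are assumptions):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

from fastapi import FastAPI
from nnsight import LanguageModel

app = FastAPI()
executor = ThreadPoolExecutor(max_workers=4)

# Shared model instance; the model name is just a placeholder.
model = LanguageModel("meta-llama/Llama-3.1-8B", device_map="auto")

def run_trace(prompt: str):
    # Entering the trace builds nnsight's intervention graph, which appears to
    # rely on process-wide tracing state rather than thread-local state.
    with model.trace(prompt):
        hidden = model.model.layers[-1].output[0].save()
    return hidden.value  # .value access as in nnsight 0.2/0.3; newer versions may differ

@app.post("/trace")
async def trace_endpoint(prompt: str):
    loop = asyncio.get_running_loop()
    # Concurrent requests each run run_trace on a worker thread, which is where
    # "RuntimeError: trying to pop from empty mode stack" shows up.
    hidden = await loop.run_in_executor(executor, run_trace, prompt)
    return {"hidden_shape": list(hidden.shape)}
```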

lithafnium (Author) commented

Additionally, I get the error: AttributeError: 'LlamaDecoderLayer' object has no attribute 'output'

JadenFiotto-Kaufman (Member) commented

@lithafnium Hey, I'd love to know more about this if you could get me a small reproducible example. I don't think nnsight is anywhere close to thread-safe, although there may be some features in nnsight you could disable to get it working. On another note, 0.4 is going to have vLLM support, so potentially you could just use that?
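
A minimal reproduction along these lines, stripped down to plain threads (a sketch assuming a small placeholder model such as gpt2, not the reporter's actual service code), might look like:

```python
import threading

from nnsight import LanguageModel

# Small placeholder model so the repro runs quickly; any LanguageModel should do.
model = LanguageModel("openai-community/gpt2", device_map="auto")

def worker(prompt: str):
    # Each thread opens its own trace on the same shared model.
    with model.trace(prompt):
        _ = model.transformer.h[0].output.save()

threads = [threading.Thread(target=worker, args=(p,)) for p in ("Hello", "World")]
for t in threads:
    t.start()
for t in threads:
    t.join()
# When the traces overlap, this intermittently raises
# "RuntimeError: trying to pop from empty mode stack" or
# "AttributeError: ... object has no attribute 'output'".
```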
