Description
OS
Windows
GPU Library
CUDA 12.x
Python version
3.12
PyTorch version
2.6.0
Model
No response
Describe the bug
With TabbyAPI on native Windows 11 Pro 64-bit (no WSL), the regular GPU split mode works fine, but enabling tensor parallelism reduces performance by roughly 25%.
On Debian 12 on the same machine, tensor parallelism increases performance by roughly 25%.
I tried WSL2 as a workaround, but performance there was less than half of what I get on native Windows 11.
There are no logs or errors that I can see.
Reproduction steps
Install the latest CUDA, Visual Studio, and Git. Then install exllamav2 and TabbyAPI.
Expected behavior
Tensor parallelism should improve performance on native Windows 11, as it does on Debian 12 on the same machine.
Logs
.
Additional context
.
Acknowledgements
- I have looked for similar issues before submitting this one.
- I understand that the developers have lives and my issue will be answered when possible.
- I understand the developers of this program are human, and I will ask my questions politely.