[BUG] Windows 11 Tensor Parallelism slow #760

Open
@frenzybiscuit

OS

Windows

GPU Library

CUDA 12.x

Python version

3.12

Pytorch version

2.6.0

Model

No response

Describe the bug

With TabbyAPI on Windows 11 Pro 64-bit (native, without WSL), regular GPU split mode works fine, but enabling tensor parallelism reduces performance by roughly 25%.

On Debian 12 on the same machine, tensor parallelism increases performance by roughly 25%.

I tried WSL2 as a workaround, but performance there was less than half of what I get on native Windows 11.

There are no logs or errors that I can see.

Reproduction steps

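Install the latest CUDA, Visual Studio, and Git. Then install exllamav2 and TabbyAPI. Load a model with tensor parallelism enabled and compare generation speed against regular GPU split mode.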
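To make the comparison between split modes reproducible, the slowdown can be expressed as a measured number rather than an estimate. Below is a minimal sketch of how that might be done. The helper names `tokens_per_second` and `percent_change` are hypothetical and are not part of TabbyAPI or exllamav2, and the `generate` callable is something you would supply yourself (for example, a function that sends one fixed prompt to the running server and returns the number of tokens it produced).

```python
import time
from typing import Callable


def tokens_per_second(generate: Callable[[], int], runs: int = 3) -> float:
    """Average throughput over several runs.

    `generate` is a hypothetical user-supplied callable that performs one
    generation and returns the number of tokens produced.
    """
    total_tokens = 0
    start = time.perf_counter()
    for _ in range(runs):
        total_tokens += generate()
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very coarse timers.
    return total_tokens / max(elapsed, 1e-9)


def percent_change(baseline: float, candidate: float) -> float:
    """Relative change of `candidate` versus `baseline`, in percent.

    Negative values mean the candidate configuration is slower.
    """
    return (candidate - baseline) / baseline * 100.0
```

With purely illustrative placeholder values (not measurements from this machine), `percent_change(20.0, 15.0)` returns `-25.0`, which is the kind of regression described above. Running the same prompt and model under regular GPU split and under tensor parallelism, on each OS, would give directly comparable figures.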

Expected behavior

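Tensor parallelism on native Windows 11 should improve performance over regular GPU split mode, as it does on Debian 12 on the same machine.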

Logs

.

Additional context

.

Acknowledgements

  • I have looked for similar issues before submitting this one.
  • I understand that the developers have lives and my issue will be answered when possible.
  • I understand the developers of this program are human, and I will ask my questions politely.

Metadata

Labels

bug (Something isn't working)