Open
Labels
Contributions Welcome, Cross Platform, Low Risk, Medium Priority, ROCm
Description
System Info
I am working on a Kubernetes deployment with Debian + PyTorch 2.4.0 + ROCm 6.1.
The deployment uses the multi-backend alpha release available in the parent bitsandbytes repo.
Reproduction
Loading a model with bitsandbytes fails because the rocminfo executable is not available in the container.
def get_rocm_gpu_arch() -> str:
    logger = logging.getLogger(__name__)
    try:
        if torch.version.hip:
            result = subprocess.run(["rocminfo"], capture_output=True, text=True)
            match = re.search(r"Name:\s+gfx([a-zA-Z\d]+)", result.stdout)
ERROR:bitsandbytes.cuda_specs:Could not detect ROCm GPU architecture: [Errno 2] No such file or directory: 'rocminfo'
WARNING:bitsandbytes.cuda_specs:
ROCm GPU architecture detection failed despite ROCm being available.
Expected behavior
I would prefer to set the architecture via an environment variable, with rocminfo
as the fallback when the variable is not set.
The related code is the get_rocm_gpu_arch function in bitsandbytes.cuda_specs, quoted in the reproduction above.
Happy to work on this if other people feel it is a good workaround.
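A minimal sketch of the proposed lookup order, to make the idea concrete. The variable name BNB_ROCM_GPU_ARCH and the function name get_rocm_gpu_arch_with_override are hypothetical and not part of bitsandbytes. The "unknown" return value on failure is also an assumption. The torch.version.hip check from the existing function is left out so the sketch runs without PyTorch.

```python
import logging
import os
import re
import subprocess

logger = logging.getLogger(__name__)

# Hypothetical variable name, for illustration only; bitsandbytes does not define it.
ROCM_ARCH_ENV_VAR = "BNB_ROCM_GPU_ARCH"


def get_rocm_gpu_arch_with_override() -> str:
    """Return the ROCm GPU architecture (e.g. "gfx90a"), or "unknown" (assumed sentinel)."""
    # 1. Prefer an explicit override from the environment.
    override = os.environ.get(ROCM_ARCH_ENV_VAR, "").strip()
    if override:
        return override

    # 2. Fall back to parsing rocminfo output, as the existing snippet does.
    try:
        result = subprocess.run(["rocminfo"], capture_output=True, text=True)
    except OSError as exc:
        # Covers FileNotFoundError, raised when rocminfo is not on PATH.
        logger.error("Could not detect ROCm GPU architecture: %s", exc)
        return "unknown"

    match = re.search(r"Name:\s+gfx([a-zA-Z\d]+)", result.stdout)
    if match:
        return "gfx" + match.group(1)

    logger.warning("rocminfo ran, but no gfx architecture was found in its output.")
    return "unknown"
```

With this ordering, a container that lacks rocminfo only needs the variable set in its pod spec, and hosts that do have rocminfo keep the current behavior.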