System Info
rocminfo:
$ rocminfo
ROCk module version 6.8.5 is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.14
Runtime Ext Version: 1.6
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
==========
*******
Agent 1
*******
Name: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
Uuid: CPU-XX
Marketing Name: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 4700
BDFID: 0
Internal Node ID: 0
Compute Unit: 8
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 65780304(0x3ebba50) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 65780304(0x3ebba50) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 65780304(0x3ebba50) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx1100
Uuid: GPU-85631fd855c9cea1
Marketing Name: Radeon RX 7900 XTX
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 6144(0x1800) KB
L3: 98304(0x18000) KB
Chip ID: 29772(0x744c)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2482
BDFID: 768
Internal Node ID: 1
Compute Unit: 96
SIMDs per CU: 2
Shader Engines: 6
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties:
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 342
SDMA engine uCode:: 21
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1100
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
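The relevant point of the rocminfo output is that the ROCm runtime exposes the Radeon RX 7900 XTX (gfx1100) as a GPU agent next to the CPU. Below is a small sketch for pulling that out of rocminfo text. It assumes the "Agent N" / "Key: value" layout shown above. list_agents and gpu_agents are hypothetical helpers, not part of any ROCm tooling.

```python
import re


def list_agents(rocminfo_text):
    """Return one dict per HSA agent with its Name, Marketing Name and Device Type."""
    wanted = {"Name", "Marketing Name", "Device Type"}
    agents = []
    current = None  # no agent seen yet, so the system attributes header is skipped
    for raw in rocminfo_text.splitlines():
        line = raw.strip()
        if re.fullmatch(r"Agent \d+", line):
            current = {}
            agents.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            # setdefault keeps the agent-level "Name", not the later ISA "Name" entry
            if key in wanted:
                current.setdefault(key, value.strip())
    return agents


def gpu_agents(rocminfo_text):
    """Keep only agents whose Device Type is GPU."""
    return [a for a in list_agents(rocminfo_text) if a.get("Device Type") == "GPU"]
```

On the affected machine, the input text could come from `subprocess.run(["rocminfo"], capture_output=True, text=True).stdout`. Note that a GPU agent here only shows that the ROCm runtime sees the card. It does not by itself show that PyTorch or bitsandbytes will place tensors on it.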
Reproduction
Followed the multi-backend installation instructions here: https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend
Built bitsandbytes from source.
Ran my code, which fails with the following traceback:
Traceback (most recent call last):
File "codes/Geneformer/examples/new/age_classification_95M.py", line 75, in <module>
all_metrics = cc.validate(model_directory="codes/Geneformer/Geneformer/gf-12L-95M-i4096",
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "codes/Geneformer/Geneformer/geneformer/classifier.py", line 800, in validate
trainer = self.hyperopt_classifier(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "codes/Geneformer/Geneformer/geneformer/classifier.py", line 1043, in hyperopt_classifier
model = pu.load_model(
^^^^^^^^^^^^^^
File "codes/Geneformer/Geneformer/geneformer/perturber_utils.py", line 171, in load_model
model = model_class.from_pretrained(**model_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "miniconda3/envs/geneformer/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4245, in from_pretrained
) = cls._load_pretrained_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "miniconda3/envs/geneformer/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4585, in _load_pretrained_model
set_module_tensor_to_device(model, key, "cpu", value)
File "miniconda3/envs/geneformer/lib/python3.11/site-packages/accelerate/utils/modeling.py", line 349, in set_module_tensor_to_device
new_value = param_cls(new_value, requires_grad=old_value.requires_grad, **kwargs).to(device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "miniconda3/envs/geneformer/lib/python3.11/site-packages/bitsandbytes/nn/modules.py", line 335, in to
return self._quantize(device)
^^^^^^^^^^^^^^^^^^^^^^
File "miniconda3/envs/geneformer/lib/python3.11/site-packages/bitsandbytes/nn/modules.py", line 297, in _quantize
w_4bit, quant_state = bnb.functional.quantize_4bit(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "miniconda3/envs/geneformer/lib/python3.11/site-packages/bitsandbytes/functional.py", line 991, in quantize_4bit
return backends[A.device.type].quantize_4bit(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "miniconda3/envs/geneformer/lib/python3.11/site-packages/bitsandbytes/backends/cpu.py", line 142, in quantize_4bit
return quantize_4bit_impl(A, absmax, out, blocksize, compress_statistics, quant_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "miniconda3/envs/geneformer/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 574, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "miniconda3/envs/geneformer/lib/python3.11/site-packages/bitsandbytes/backends/cpu_xpu_common.py", line 362, in quantize_4bit_impl
raise NotImplementedError("bnb_4bit_use_double_quant is not supported yet for CPU/XPU")
NotImplementedError: bnb_4bit_use_double_quant is not supported yet for CPU/XPU
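The last frames point to why this happens on a GPU machine. transformers calls `set_module_tensor_to_device(model, key, "cpu", value)`, so the weight is still a CPU tensor when `Params4bit.to(device)` quantizes it. `functional.quantize_4bit` then dispatches on `A.device.type` to `backends/cpu.py`, and that backend rejects double quantization. Below is a minimal toy model of that dispatch. It assumes `compress_statistics` carries the `bnb_4bit_use_double_quant` setting. The class and function names are hypothetical stand-ins, not the bitsandbytes API.

```python
# Toy model (hypothetical names, not the bitsandbytes API) of the dispatch in
# the traceback: quantize_4bit picks a backend from the tensor's device type.

class CpuBackend:
    def quantize_4bit(self, device_type, compress_statistics):
        # Mirrors the check that raises in cpu_xpu_common.quantize_4bit_impl;
        # compress_statistics is assumed to carry bnb_4bit_use_double_quant.
        if compress_statistics:
            raise NotImplementedError(
                "bnb_4bit_use_double_quant is not supported yet for CPU/XPU"
            )
        return f"quantized on {device_type}"


class GpuBackend:
    def quantize_4bit(self, device_type, compress_statistics):
        # Assumed to accept double quantization in this toy model.
        return f"quantized on {device_type}"


# ROCm builds of PyTorch expose AMD GPUs under the "cuda" device type.
backends = {"cpu": CpuBackend(), "cuda": GpuBackend()}


def quantize_4bit(device_type, compress_statistics=True):
    # Same shape as `backends[A.device.type].quantize_4bit(...)` in functional.py
    return backends[device_type].quantize_4bit(device_type, compress_statistics)


if __name__ == "__main__":
    print(quantize_4bit("cuda"))  # quantized on cuda
    try:
        quantize_4bit("cpu")  # the path taken in the traceback
    except NotImplementedError as err:
        print("cpu:", err)
```

In this toy model, the same call succeeds only when the tensor already reports the GPU device type at quantization time. That is why the `"cpu"` device in the `set_module_tensor_to_device` frame looks like the important detail.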
Expected behavior
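Loading the model with 4-bit quantization should work on the ROCm GPU. Instead, loading fails with the NotImplementedError above, so ROCm support appears to have bugs.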