These divisibility requirements come from the FP8 Tensor Cores. The simplest fix is to pad the dimensions to the nearest multiple of 32, but this workload also seems too small to achieve full GPU utilization. It may be better to disable FP8 for the small layers and avoid the extra overhead:
import torch
import transformer_engine.pytorch as te

# Construct model
layer1 = te.Linear(712, 896)
layer2 = te.Linear(896, 4096)
layer3 = te.Linear(4096, 4096)

# Forward pass: layer1 in FP32, layer2 and layer3 in FP8
x = torch.randn(4096, 712)
with te.fp8_autocast():
    with te.fp8_autocast(enabled=False):
        x = layer1(x)
    x = layer2(x)
    x = layer3(x)
loss = loss_fn(x)  # loss_fn defined elsewhere

# Backward pass
loss.backward()
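For reference, here is a minimal sketch of the padding alternative mentioned above (the rounding helper, the zero-padding of the input feature dimension via torch.nn.functional.pad, and the device placement are illustrative assumptions, not part of the original report):

import torch
import torch.nn.functional as F
import transformer_engine.pytorch as te

# Round the problematic feature dimension (712) up to the next multiple of 32 -> 736,
# which satisfies the FP8 GEMM shape requirements.
in_features = 712
padded_in_features = ((in_features + 31) // 32) * 32  # 736

# Build the layer with the padded input width; its weight is [896, 736] instead of [896, 712].
layer1 = te.Linear(padded_in_features, 896)

x = torch.randn(4096, in_features, device="cuda")
# Zero-pad the last (feature) dimension of the input to match the padded layer width.
x = F.pad(x, (0, padded_in_features - in_features))

with te.fp8_autocast():
    y = layer1(x)  # now runs in FP8 without the shape assertion

Because the padded input features are zero, the extra weight columns do not affect the output; whether padding or the FP32 fallback above is preferable depends on how much of the runtime these small GEMMs actually account for.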
AssertionError: FP8 execution requires 2D input matrices with height divisible by 8 and width divisible by 16, but got tensor with dims=[896, 712]