
FP8 execution requires 2D input matrices with height divisible by 8 and width divisible by 16 #1422

Open
Liufeiran123 opened this issue Jan 25, 2025 · 1 comment

Comments

@Liufeiran123

AssertionError: FP8 execution requires 2D input matrices with height divisible by 8 and width divisible by 16, but got tensor with dims=[896, 712]

@timmoon10 (Collaborator)

These divisibility requirements come from the FP8 Tensor Cores. The simplest fix is to pad both dimensions to the nearest multiple of 32, but a matrix this small is also unlikely to achieve full GPU utilization. It may be better to disable FP8 for small layers and avoid the extra overhead:

import torch
import transformer_engine.pytorch as te

# Construct model
layer1 = te.Linear(712, 896)
layer2 = te.Linear(896, 4096)
layer3 = te.Linear(4096, 4096)

# Forward pass: layer1 in FP32, layer2 and layer3 in FP8
x = torch.randn(4096, 712)
with te.fp8_autocast():
    with te.fp8_autocast(enabled=False):
        x = layer1(x)
    x = layer2(x)
    x = layer3(x)
loss = loss_fn(x)  # loss_fn: any scalar-valued loss of your choice

# Backward pass
loss.backward()
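If you do want to keep FP8 for these layers, the padding fix mentioned above can be sketched as follows. This is a minimal illustration, not part of Transformer Engine; the helper name `pad_for_fp8` is hypothetical, and it simply zero-pads a 2D tensor so its height is divisible by 8 and its width by 16 (the requirements from the error message):

```python
import torch
import torch.nn.functional as F

def pad_for_fp8(x: torch.Tensor) -> torch.Tensor:
    """Zero-pad a 2D tensor so height % 8 == 0 and width % 16 == 0."""
    h, w = x.shape
    pad_h = (-h) % 8    # rows to add
    pad_w = (-w) % 16   # columns to add
    # F.pad pads the last dimension first: (left, right, top, bottom)
    return F.pad(x, (0, pad_w, 0, pad_h))

x = torch.randn(896, 712)          # the shape from the assertion error
x_padded = pad_for_fp8(x)
print(x_padded.shape)              # height 896 is already divisible by 8;
                                   # width is padded from 712 up to 720
```

Note that any padded output columns or rows must be sliced away again after the matmul if downstream code expects the original shape.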


2 participants