
@Ahmed-Ali commented Aug 20, 2025

Summary

This PR fixes a runtime crash in the MLX Metal kernel when processing bfloat16 tensors: the kernel mixes bfloat16 and float types, which causes Metal compilation to fail at runtime.

Problem

The Metal kernel in outlines_core/kernels/mlx.py used -INFINITY (a float literal) directly in the kernel source. This caused a runtime crash whenever the input tensor had dtype bfloat16, because Metal does not implicitly convert between bfloat16 and float.
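
For illustration, here is a minimal standalone sketch of the root cause. This is not the outlines kernel itself; the kernel name, body, and shapes below are made up purely to demonstrate the bfloat16/float mismatch described above:

import mlx.core as mx

# Illustration only, NOT the outlines kernel: the ternary mixes a bfloat16
# element with the float literal -INFINITY, which Metal cannot reconcile
# when the input dtype is bfloat16.
_demo = mx.fast.metal_kernel(
    name="neg_inf_demo",
    input_names=["inp"],
    output_names=["out"],
    source="""
        uint i = thread_position_in_grid.x;
        out[i] = (i % 2 == 0) ? inp[i] : -INFINITY;  // bfloat16 vs float
    """,
)

x = mx.ones((8,), dtype=mx.bfloat16)
out = _demo(
    inputs=[x],
    grid=(8, 1, 1),
    threadgroup=(8, 1, 1),
    output_shapes=[x.shape],
    output_dtypes=[x.dtype],
)[0]
mx.eval(out)  # expected to fail at Metal compile time for bfloat16 inputs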

Solution

A minor change that follows the approach used in the LLGuidance package: provide the kernel with a negative-infinity tensor whose dtype is compatible with the input.

Changes

  • Modified the Metal kernel to accept neg_inf as an additional input parameter
  • Updated kernel invocation to pass a properly typed negative-infinity value (see the sketch after this list)
  • Maintained backward compatibility and performance characteristics
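
A minimal sketch of what the change looks like at the call site. Names follow the snippet in the review comment below; _KERNEL is the mx.fast.metal_kernel object already defined in mlx.py (inside the kernel source, the extra input is read as neg_inf[0]), and the exact code in this PR may differ:

import mlx.core as mx

# Sketch only: build a one-element -inf array with the same dtype as the
# logits and hand it to the kernel as an extra input, so the kernel never
# mixes bfloat16 values with a float literal.
@mx.compile
def _apply_token_bitmask_kernel(data: mx.array, mask: mx.array) -> mx.array:
    neg_inf = mx.array([float("-inf")], dtype=data.dtype)  # dtype-matched -inf
    return _KERNEL(
        inputs=[data, mask, neg_inf],
        grid=(data.shape[1], data.shape[0], 1),
        threadgroup=(256, 1, 1),
        output_shapes=[data.shape],
        output_dtypes=[data.dtype],
    )[0]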

Testing

All tests pass:

  • Ran the full test suite described in the README; all tests pass.
  • Also ran the benchmark tests; nothing stood out.

Type of Change

  • Bug fix for a runtime crash on the happy path

To the best of my knowledge, this is a minimal, focused fix that resolves the runtime crash without affecting the kernel's logic or performance characteristics.

@Ahmed-Ali marked this pull request as ready for review August 20, 2025 03:46
@unaidedelf8777 (Contributor) commented:

I would suggest instead using a template in the Metal kernel; that way there is less FFI pass-through, and the float("-inf") is not constructed on every run as it is in the current version. In the kernel you can add:

/// per IEEE 754 this is equivalent to -inf.
T neg_inf = -(T(1.0) / T(0.0));

and change

out[batch * inp_shape[1] + elem] = bit ? inp[batch * inp_shape[1] + elem] : neg_inf[0];
/// to
out[batch * inp_shape[1] + elem] = bit ? inp[batch * inp_shape[1] + elem] : neg_inf;

And when dispatching the kernel you will need to add the template argument:

@mx.compile
def _apply_token_bitmask_kernel(data: mx.array, mask: mx.array) -> mx.array:
    return _KERNEL(
        inputs=[data, mask],
        template=[("T", data.dtype)], # this
        grid=(data.shape[1], data.shape[0], 1),
        threadgroup=(256, 1, 1),
        output_shapes=[data.shape],
        output_dtypes=[data.dtype],
    )[0]
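
To complete the picture, here is a sketch of how the templated kernel could be declared with mx.fast.metal_kernel so that T is available in the source. The bitmask-decoding details below are illustrative and may not match the exact kernel in outlines_core/kernels/mlx.py:

import mlx.core as mx

# Illustrative declaration only: `T` is bound at dispatch time via
# template=[("T", data.dtype)], so the -inf constant takes the input dtype
# and no bfloat16/float mixing occurs.
_KERNEL = mx.fast.metal_kernel(
    name="apply_token_bitmask",
    input_names=["inp", "mask"],
    output_names=["out"],
    source="""
        uint elem = thread_position_in_grid.x;
        uint batch = thread_position_in_grid.y;
        if (elem >= (uint)inp_shape[1]) return;

        /// per IEEE 754 this is equivalent to -inf.
        T neg_inf = -(T(1.0) / T(0.0));

        // Illustrative bitmask lookup: one bit per vocabulary entry, packed into 32-bit words.
        uint word = (uint)mask[batch * mask_shape[1] + (elem / 32)];
        bool bit = (word >> (elem % 32)) & 1;

        out[batch * inp_shape[1] + elem] = bit ? inp[batch * inp_shape[1] + elem] : neg_inf;
    """,
)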
