Grammar API v2 by mhayes853 · Pull Request #628 · cactus-compute/cactus

mhayes853 · 2026-05-05T03:56:14Z

Introduces a Grammar API to enable structured generation in coming PRs.

For now, this isn't directly integrated with completion, but rather exposes all the necessary APIs to make that happen in the near future. The underlying state machine used is XGrammar, and for the most part the implementation directly wraps it.

There are 4 essential types:

- `Grammar`: a representation of a grammar, with conveniences for plain JSON, JSON schema, regex, empty, universal, structural tags, etc.
  - Also supports composition operations such as union and concatenation, as well as getting the underlying EBNF string.
- `GrammarMatcher`: the state machine type responsible for computing bitmasks.
  - `accept` and `next_bitmask` do most of the driving. There are also methods to roll back and to check the termination state.
  - `fork` is an efficient way to create a copy of a matcher (at its current state) without having to go through `GrammarEngine`.
- `GrammarEngine`: responsible for compiling `GrammarMatcher` instances from a `Grammar` instance.
  - Under the hood, this type wraps both `GrammarCompiler` and `CompiledGrammar` from XGrammar. I felt it was a conceptually simpler API to keep these concepts bound together.
    - I also felt exposing the concept of a "compiled" grammar was a bit unnecessary, since its only purpose is to be used in matcher construction, and the matcher already has `fork`, which can be used for efficient copies.
  - The class exposes a single `compile_matcher` method which takes a `Grammar` instance and returns a `GrammarMatcher`.
- `GrammarVocabulary`: tokenizer info required by the other grammar types.
  - This type merely wraps `TokenizerInfo` from XGrammar. There is also a new tokenizer method to produce it.
  - At the FFI level, this is constructed from a model instance/path directly.
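
To illustrate how the matcher methods named above might drive structured generation, here is a minimal Python sketch. It is not this PR's API: `ToyMatcher`, `constrained_decode`, `VOCAB`, and all signatures are hypothetical, and the toy "grammar" is just a fixed set of allowed strings. Only the method names (`accept`, `next_bitmask`, rollback, termination check, `fork`) come from the description.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical toy vocabulary; token ids are list indices.
VOCAB = ["t", "r", "ue", "fal", "se", "x"]


@dataclass
class ToyMatcher:
    """Hypothetical stand-in for a grammar matcher.

    The "grammar" is simply a set of allowed output strings.
    """

    vocab: list[str]
    allowed: list[str]
    history: list[int] = field(default_factory=list)

    def _prefix(self) -> str:
        return "".join(self.vocab[t] for t in self.history)

    def next_bitmask(self) -> list[bool]:
        # A token is permitted if appending it keeps the output a prefix
        # of at least one allowed string.
        prefix = self._prefix()
        return [
            any(s.startswith(prefix + tok) for s in self.allowed)
            for tok in self.vocab
        ]

    def accept(self, token_id: int) -> bool:
        # Advance the state only if the token is permitted.
        if not self.next_bitmask()[token_id]:
            return False
        self.history.append(token_id)
        return True

    def rollback(self, n: int = 1) -> None:
        # Undo the last n accepted tokens (clamped at the start state).
        del self.history[max(0, len(self.history) - n):]

    def is_terminated(self) -> bool:
        return self._prefix() in self.allowed

    def fork(self) -> ToyMatcher:
        # Independent copy at the current state.
        return ToyMatcher(self.vocab, self.allowed, list(self.history))


def constrained_decode(matcher: ToyMatcher, logits_per_step: list[list[float]]) -> list[int]:
    """Greedy decoding where the bitmask filters out disallowed tokens."""
    out: list[int] = []
    for logits in logits_per_step:
        if matcher.is_terminated():
            break
        mask = matcher.next_bitmask()
        candidates = [i for i, ok in enumerate(mask) if ok]
        if not candidates:
            break
        best = max(candidates, key=lambda i: logits[i])
        matcher.accept(best)
        out.append(best)
    return out


matcher = ToyMatcher(VOCAB, ["true", "false"])
logits = [
    [0.1, 0.2, 0.0, 0.5, 0.0, 0.9],  # "x" scores highest but is masked out
    [0.9, 0.0, 0.0, 0.0, 0.1, 0.0],  # only "se" is permitted after "fal"
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # never reached: matcher has terminated
]
token_ids = constrained_decode(matcher, logits)
decoded = "".join(VOCAB[i] for i in token_ids)
```

In this sketch the unconstrained argmax at the first step would be `"x"`, but the mask restricts the choice to `"t"` or `"fal"`, so the loop picks `"fal"` and then `"se"`. A fork taken at any point can accept or roll back tokens without affecting the original matcher, which is the property that makes it useful for cheap copies.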

Among other things, I extracted a helper for constructing the tokenizer from the model path and updated the tokenizer's parsing logic for the HF tokenizer file. The grammar matcher needs the tokenizer info, but previously the only way to load the tokenizer was to go through the entire model initialization, even when only the tokenizer is needed. XGrammar also brings in a transitive dependency on picojson for proper JSON parsing, which should be useful going forward.
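
As a rough illustration of loading only the vocabulary, the sketch below reads an id-ordered token list from a Hugging Face `tokenizer.json` without touching any model weights. It is written in Python rather than with picojson, and the function names `vocab_from_tokenizer_data` and `load_vocab` are hypothetical, not helpers from this PR. It assumes the common `tokenizer.json` layout, where `model.vocab` is either a token-to-id map (BPE, WordPiece) or a list of `[token, score]` pairs (Unigram), and `added_tokens` lists extra entries with `id` and `content`.

```python
import json


def vocab_from_tokenizer_data(data: dict) -> list[str]:
    """Return tokens ordered by id from parsed tokenizer.json contents."""
    raw = data["model"]["vocab"]
    if isinstance(raw, dict):
        # BPE / WordPiece style: {"token": id, ...}
        id_to_token = {idx: tok for tok, idx in raw.items()}
    else:
        # Unigram style: [["token", score], ...], where the id is the position
        id_to_token = {idx: entry[0] for idx, entry in enumerate(raw)}

    # Special and added tokens live outside model.vocab.
    for added in data.get("added_tokens", []):
        id_to_token[added["id"]] = added["content"]

    size = max(id_to_token) + 1 if id_to_token else 0
    # Ids missing from the file are filled with empty strings.
    return [id_to_token.get(i, "") for i in range(size)]


def load_vocab(tokenizer_json_path: str) -> list[str]:
    """Read tokenizer.json from disk; no model initialization involved."""
    with open(tokenizer_json_path, encoding="utf-8") as f:
        return vocab_from_tokenizer_data(json.load(f))
```

A vocabulary list like this, together with tokenizer metadata, is the kind of input a grammar matcher needs in order to map grammar constraints onto token ids.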

Signed-off-by: Matthew Hayes <[email protected]>

mhayes853 added 16 commits May 4, 2026 20:52

XGrammar

142384b

Signed-off-by: Matthew Hayes <[email protected]>

Basic Grammar Functionallity

6113a31

Signed-off-by: Matthew Hayes <[email protected]>

Grammar FFI

2b38e67

Signed-off-by: Matthew Hayes <[email protected]>

Bump vendored XGrammar

f0d16a9

Signed-off-by: Matthew Hayes <[email protected]>

Cleanup

d0afd07

Signed-off-by: Matthew Hayes <[email protected]>

Next bitmask regression tests

5c40a83

Signed-off-by: Matthew Hayes <[email protected]>

Expose more matcher APIs

04d483e

Signed-off-by: Matthew Hayes <[email protected]>

EBNF Grammar Method

a157f46

Signed-off-by: Matthew Hayes <[email protected]>

Expose fork on matcher

59387c4

Signed-off-by: Matthew Hayes <[email protected]>

Updates

6332121

Signed-off-by: Matthew Hayes <[email protected]>

Extract grammar compilation logic to reusable GrammarEngine

7938e57

Signed-off-by: Matthew Hayes <[email protected]>

Make GrammarVocabulary a TokenizerInfo wrapper

0c4e2d1

Signed-off-by: Matthew Hayes <[email protected]>

FFI Updates

d72b9c4

Signed-off-by: Matthew Hayes <[email protected]>

Add optional comment

38f0b3b

Signed-off-by: Matthew Hayes <[email protected]>

Ensure XGrammar path is linked on Linux

219d18c

Signed-off-by: Matthew Hayes <[email protected]>

Update Python CLI to include xgrammar path

99a81ab

Signed-off-by: Matthew Hayes <[email protected]>

mhayes853 force-pushed the grammar-api-3 branch from 5ab3f9e to 0be954a Compare May 5, 2026 06:08

Fix linux rust CI

ec0beff

Signed-off-by: Matthew Hayes <[email protected]>

mhayes853 force-pushed the grammar-api-3 branch from 0be954a to ec0beff Compare May 5, 2026 08:04

mhayes853 added 11 commits May 5, 2026 01:23

Update macOS XGrammar build

5e579b2

Signed-off-by: Matthew Hayes <[email protected]>

Export compiler commands

b98a85a

Signed-off-by: Matthew Hayes <[email protected]>

Cleanup

f6ec246

Signed-off-by: Matthew Hayes <[email protected]>

Epsilon Grammar

55f299b

Signed-off-by: Matthew Hayes <[email protected]>

Optional Grammar

f3c97f4

Signed-off-by: Matthew Hayes <[email protected]>

Repeat Grammar

59698ad

Signed-off-by: Matthew Hayes <[email protected]>

Star

32e8828

Signed-off-by: Matthew Hayes <[email protected]>

Cleanup

8dc76c1

Signed-off-by: Matthew Hayes <[email protected]>

Cleanup

6e4a4d0

Signed-off-by: Matthew Hayes <[email protected]>

Use HF metadata for grammar vocabulary

f53179c

Signed-off-by: Matthew Hayes <[email protected]>

FFI Updates

b4f2c7e

Signed-off-by: Matthew Hayes <[email protected]>

mhayes853 force-pushed the grammar-api-3 branch from 2f2dd72 to b4f2c7e Compare May 11, 2026 23:16

Merge v2

de23a02

Signed-off-by: Matthew Hayes <[email protected]>

mhayes853 changed the base branch from main to v2 May 12, 2026 04:41

mhayes853 changed the title ~~Grammar API~~ Grammar API v2 May 12, 2026

mhayes853 added 2 commits May 11, 2026 21:54

Export compile commands

1e844c3

Signed-off-by: Matthew Hayes <[email protected]>

Cleanup

fa4e753

Signed-off-by: Matthew Hayes <[email protected]>

mhayes853 force-pushed the grammar-api-3 branch from 88b6685 to fa4e753 Compare May 12, 2026 20:36

mhayes853 wants to merge 31 commits into cactus-compute:v2 from mhayes853:grammar-api-3
