
Conversation

@babusid (Contributor) commented Nov 24, 2025

Solves the issue of softmax assigning probability to reserved tokens and breaking the sampling kernel. This patch extracts the active vocabulary from the HF tokenizer and uses it to zero out the reserved tokens, so probabilities are generated only for valid "active" tokens.
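For intuition, here is a minimal PyTorch sketch of the masking idea. It is not the PR's actual kernel code; the helper name `masked_softmax` is mine, and it assumes reserved token ids occupy the tail of the logit dimension beyond the tokenizer's active vocabulary.

```python
import torch

def masked_softmax(logits: torch.Tensor, active_vocab_size: int) -> torch.Tensor:
    # Assumption: token ids >= active_vocab_size are reserved/padding slots
    # that the tokenizer can never emit. Setting their logits to -inf makes
    # softmax assign them exactly zero probability.
    masked = logits.clone()
    masked[..., active_vocab_size:] = float("-inf")
    return torch.softmax(masked, dim=-1)

# Usage sketch: derive the active size from the HF tokenizer.
# active_size = len(tokenizer.get_vocab())    # real HF API; the PR's exact
# probs = masked_softmax(logits, active_size) # extraction logic may differ
```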

babusid and others added 4 commits November 24, 2025 10:35
The issue is that the config read in the compile phase is a model-specific config object defined in each model's implementation. Maybe the solution is not to add another field, but rather to overload/overwrite the vocab size parameter.
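A hypothetical sketch of that overwrite idea (every name below is illustrative, not from the mlc-ai codebase):

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class SomeModelConfig:  # illustrative stand-in for a model-specific config
    vocab_size: int = 128256               # padded size baked into the weights
    kwargs: Dict[str, Any] = field(default_factory=dict)

def clamp_to_active_vocab(config: SomeModelConfig, active_size: int) -> SomeModelConfig:
    # Overwrite the existing vocab_size field rather than adding a new one,
    # so every downstream consumer of vocab_size sees the active size.
    config.vocab_size = min(config.vocab_size, active_size)
    return config
```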
Comment on lines +130 to 132:

```python
logger.info("TOP LEVEL MODEL CONFIG BEFORE OVERRIDES: %s", str(model_config))
_kwargs = getattr(model_config, "kwargs", {})
model_config = args.overrides.apply(model_config)
```
@babusid (Contributor Author) commented:
Just a note here: I noticed that this override wipes out any kwargs in the original model_config. This PR probably isn't the place to address it, but I wanted to call it out.
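If a stopgap were wanted, one possibility, continuing the snippet above, would be to stash and restore the kwargs around the override pass. This is a sketch, not the PR's fix, and it assumes `apply()` returns a config whose `kwargs` attribute is writable:

```python
# Preserve kwargs across the override pass.
_kwargs = getattr(model_config, "kwargs", {})
model_config = args.overrides.apply(model_config)
if _kwargs and not getattr(model_config, "kwargs", None):
    model_config.kwargs = _kwargs  # re-attach what apply() dropped
```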

@MasterJH5574 (Member) left a comment:

Thank you @babusid for the enhancement!

@MasterJH5574 changed the title Softmax Predicate patch → [Sampling] Softmax Predicate patch on Nov 25, 2025
@MasterJH5574 merged commit 8b2195c into mlc-ai:main on Nov 25, 2025 (1 check failed)
