
Fix tokenizer test failure #1287

Open
zcbenz wants to merge 1 commit into main from fix-tokenizer-test

Collaborator

zcbenz commented May 19, 2026

Fix the failing test:

======================================================================
FAIL [0.000s]: test_tokenizers (test_tokenizers.TestTokenizers) (tokenizer='mlx-community/Llama-3.2-1B-Instruct-4bit')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/runner/actions-runner/_work/mlx-lm/mlx-lm/tests/test_tokenizers.py", line 68, in test_tokenizers
    self.check_tokenizer(tokenizer)
  File "/Users/runner/actions-runner/_work/mlx-lm/mlx-lm/tests/test_tokenizers.py", line 40, in check_tokenizer
    check(tokens)
  File "/Users/runner/actions-runner/_work/mlx-lm/mlx-lm/tests/test_tokenizers.py", line 31, in check
    self.assertEqual(text, expected_text)
AssertionError: '<|begin_of_text|>a,b' != '<|begin_of_text|>a ,b'
- <|begin_of_text|>a,b
+ <|begin_of_text|>a ,b
?                   +

This PR fixes two problems:

  • Overriding TokenizerWrapper._detokenizer did not work.
  • The test no longer trims spaces. It expects "a ,b" to preserve its spaces under streaming decoding, which handles the text as ["a", " ,", "b"]; without setting clean_spaces = False, the space would be trimmed.
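
To illustrate the second point, here is a minimal toy sketch of how a space cleanup step can turn "a ,b" into "a,b" when segments are joined one at a time. The helpers `clean_up_spaces` and `stream_decode` are hypothetical and are not mlx-lm code; the cleanup rule is a simplified assumption, and the `clean_spaces` parameter only mirrors the name used in this PR.

```python
# Toy illustration only: these helpers are hypothetical, not mlx-lm code.

def clean_up_spaces(text: str) -> str:
    """Drop the space before common punctuation (a simplified cleanup rule)."""
    for mark in (",", ".", "?", "!"):
        text = text.replace(" " + mark, mark)
    return text


def stream_decode(segments, clean_spaces=True):
    """Join decoded segments one at a time, optionally cleaning up spaces."""
    text = ""
    for segment in segments:
        text += segment
        if clean_spaces:
            text = clean_up_spaces(text)
    return text


segments = ["a", " ,", "b"]
trimmed = stream_decode(segments, clean_spaces=True)
preserved = stream_decode(segments, clean_spaces=False)
print(repr(trimmed))    # 'a,b'
print(repr(preserved))  # 'a ,b'
```

With cleanup enabled the space before the comma is lost, which matches the mismatch in the assertion above; disabling it keeps the text intact.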

zcbenz requested a review from angeloskath May 19, 2026 01:09
zcbenz force-pushed the fix-tokenizer-test branch from 31caf37 to 419ceb9 May 19, 2026 01:28
zcbenz force-pushed the fix-tokenizer-test branch from 419ceb9 to a716f04 May 19, 2026 01:30