Summary
Replace ad-hoc / hard-coded token counting with a more correct, model-aware tokenization pipeline.
Problem
Current token counting is hard-coded and not aligned with how real models (e.g., those loaded via Hugging Face transformers) tokenize chat messages. This can produce inaccurate token counts and mislead users doing data budgeting or sanity checks.
Proposed Solution
- Design an abstraction that approximates tokenizer.apply_chat_template behavior without pulling the entire transformers stack into the extension. Options:
  - Lightweight re-implementation for key templates (OpenAI-style, Llama, etc.).
  - Pluggable tokenization strategies where users select a model format and we apply the corresponding template before counting.
- Continue using a fast token counting backend (e.g., tiktoken-like logic) but ensure the pre-serialization to text matches the model's expected template; a sketch combining this with pluggable templates follows this list.
- Make tokenizer selection explicit in the UI (e.g., dropdown for “OpenAI”, “Llama”, etc.).
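
A minimal sketch of the pluggable approach, assuming tiktoken as the fast counting backend. All names here (ChatTemplate, count_chat_tokens, the template strings) are hypothetical, and the templates only roughly approximate tokenizer.apply_chat_template; each would need validation against the real model's tokenizer:

```python
# Sketch of a pluggable tokenization strategy (names are illustrative only).
from dataclasses import dataclass
from typing import Callable

import tiktoken  # assumed fast counting backend


@dataclass
class ChatTemplate:
    """Serializes a chat message list the way one model family expects."""
    name: str
    serialize: Callable[[list[dict]], str]


def openai_style(messages: list[dict]) -> str:
    # Rough stand-in for the ChatML-style framing used by OpenAI chat models.
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )


def llama_style(messages: list[dict]) -> str:
    # Rough stand-in for Llama-2 [INST] framing; the real template also
    # handles system prompts and BOS/EOS placement.
    return "".join(
        f"[INST] {m['content']} [/INST]" if m["role"] == "user"
        else f" {m['content']} "
        for m in messages
    )


TEMPLATES = {
    "openai": ChatTemplate("openai", openai_style),
    "llama": ChatTemplate("llama", llama_style),
}


def count_chat_tokens(messages: list[dict], scheme: str) -> int:
    """Serialize with the selected template, then count with a fast BPE."""
    text = TEMPLATES[scheme].serialize(messages)
    enc = tiktoken.get_encoding("cl100k_base")  # counting backend, not model-exact
    # disallowed_special=() lets marker strings pass through as plain text.
    return len(enc.encode(text, disallowed_special=()))
```

Keeping templates as data rather than branching logic makes it straightforward to back the UI dropdown with the keys of TEMPLATES and to add new formats later.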
Acceptance Criteria
- For representative chat examples, token counts match (within a small tolerance) what the official tokenizer tooling reports for the following (an example check is sketched after this list):
  - At least one OpenAI-style model.
  - One Llama-style model.
- UI clearly shows which tokenization scheme is being used.
- No noticeable performance regression compared to the current implementation.
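
One way to phrase the first criterion as an automated check, assuming transformers is available in the test environment only (not shipped with the extension) and using the gated meta-llama/Llama-2-7b-chat-hf tokenizer as the reference; llama_style is the hypothetical template from the sketch above:

```python
# Hypothetical acceptance test: encode our approximate serialization with the
# model's own tokenizer and compare against apply_chat_template's token count.
from transformers import AutoTokenizer

MESSAGES = [
    {"role": "user", "content": "How many tokens is this?"},
    {"role": "assistant", "content": "Let's count and see."},
]


def test_llama_template_within_tolerance(tolerance: int = 5) -> None:
    tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
    reference = len(tok.apply_chat_template(MESSAGES, tokenize=True))
    approximate = len(tok.encode(llama_style(MESSAGES)))
    assert abs(approximate - reference) <= tolerance, (approximate, reference)
```

Running both counts through the same tokenizer isolates template drift from vocabulary mismatch; a separate check against tiktoken output would cover the OpenAI-style path.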