Conversation

@alec-flowers

No description provided.

Signed-off-by: alec-flowers <[email protected]>
- To enable KV Cache Routing, we need tokens in the frontend in order to calculate hashes to match against the indexer's hashes, which are created from KV events emitted by the backend.
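As a sketch of the matching this enables (block size, hash choice, and function name are illustrative assumptions, not from this PR): the frontend can hash fixed-size token blocks, chaining each block hash with its parent, so that requests sharing a prefix share their leading hashes.

```python
import hashlib
from typing import List

def prefix_block_hashes(tokens: List[int], block_size: int = 16) -> List[str]:
    """Hash fixed-size token blocks, chaining each hash with its parent so
    that two requests sharing a token prefix share their leading hashes."""
    hashes: List[str] = []
    parent = b""
    # Only full blocks are hashed; a trailing partial block is ignored.
    for start in range(0, len(tokens) - len(tokens) % block_size, block_size):
        block = tokens[start:start + block_size]
        digest = hashlib.sha256(parent + repr(block).encode()).hexdigest()
        hashes.append(digest)
        parent = digest.encode()
    return hashes
```

The router would then compare these hashes against the ones the indexer derived from backend KV events to estimate prefix overlap per worker.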

REQ 3: PostProcessing SHOULD move to the backend engine
- Post-processing is currently done on the engine side; however, the intention was to co-locate it with the backend, where there are many CPUs sitting idle alongside the GPUs.

Is this meant to be "currently done on the frontend side"?

Author:

Woops, yes.

"""Combine chat template + tokenization in one call."""
pass

class VllmTokenizer(TokenizerProtocol):

Would it conceptually allow the sglang tokenizer as well?

Author:

Yes, I was just providing this as a first example.
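The exchange above can be sketched with `typing.Protocol`: any engine tokenizer with the right method shape, vLLM's or sglang's, satisfies the protocol without inheriting from it. The method name `encode_chat` and the stand-in class are illustrative assumptions, not from this PR.

```python
from typing import List, Protocol, runtime_checkable

@runtime_checkable
class TokenizerProtocol(Protocol):
    def encode_chat(self, messages: List[dict]) -> List[int]:
        """Combine chat template + tokenization in one call."""
        ...

class FakeSglangTokenizer:
    """Stand-in for an sglang-backed tokenizer; only the method shape matters."""

    def encode_chat(self, messages: List[dict]) -> List[int]:
        text = "".join(m["content"] for m in messages)
        return [ord(ch) for ch in text]

# Structural typing: no inheritance needed to satisfy the protocol.
assert isinstance(FakeSglangTokenizer(), TokenizerProtocol)
```

Note that `runtime_checkable` `isinstance` checks only verify method presence, not signatures, so static type checking still carries most of the weight.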


**Benefits**: Offloads CPU work to backend servers, scales with engine instances

#### Mode 2: vLLM Native Processing (`--vllm-processing`)

Will this conceptually work with SGL and vLLM paths as well?

Author:

Yes


**Benefits**: Offloads CPU work to backend servers, scales with engine instances

#### Mode 2: vLLM Native Processing (`--vllm-processing`)


Can we think of this in terms of a "hello world backend" example, CLI args, and all - and see if it generalizes to vllm, sglang, trtllm, etc. that each implement the "interface" we defined out of the box for users of our officially supported backends?

For example, maybe any backend would accept args like --dyn-backend-preprocessor and --dyn-backend-postprocessor (subject to change, just following --dyn- prefix convention for dynamo specific args rather than framework args we passthrough), etc. and for vllm backend that happens to map to using vllm's tokenizer?

Author:

Yeah, I wasn't sure of the best UX, so I just tried to write something down. Open to what you are suggesting as well.
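The `--dyn-` prefix convention suggested above can be sketched with `argparse.parse_known_args`, which splits dynamo-specific flags from the framework flags to pass through. The flag names are the reviewer's hypothetical, not a settled interface.

```python
import argparse

def split_args(argv):
    """Parse --dyn-* flags; everything unrecognized passes through
    untouched to the underlying framework (vLLM, sglang, trtllm, ...)."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--dyn-backend-preprocessor", default=None)
    parser.add_argument("--dyn-backend-postprocessor", default=None)
    dyn, passthrough = parser.parse_known_args(argv)
    return dyn, passthrough

dyn, rest = split_args(
    ["--dyn-backend-preprocessor", "vllm", "--tensor-parallel-size", "2"]
)
# dyn.dyn_backend_preprocessor == "vllm"; rest == ["--tensor-parallel-size", "2"]
```

Each officially supported backend could then map the `--dyn-*` values to its own tokenizer and detokenizer while forwarding `rest` verbatim.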

@@ -0,0 +1,222 @@
# Native Framework Processing Fallback

**Status**: Draft


Suggestion: let's version it, e.g. V.Alpha.


REQ 1: User SHOULD be able to select between dynamo processing and backend engine processing.

REQ 2: Preprocessing MUST happen on the frontend.

What does 'frontend' refer to?

"""Combine chat template + tokenization in one call."""
pass

class VllmTokenizer(TokenizerProtocol):

Where this implementation lives, you will need to import vLLM and all its tokenization utilities, correct?
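One common way to soften that dependency (a sketch, not what the PR prescribes) is to defer the vLLM import to construction time, so the module containing `VllmTokenizer` stays importable on hosts without vLLM installed:

```python
class VllmTokenizer:
    """Keeps vLLM an optional dependency by importing it only when
    an instance is actually constructed."""

    def __init__(self, tokenizer_path: str):
        try:
            import vllm  # heavy import happens here, not at module load
        except ImportError as exc:
            raise RuntimeError(
                "VllmTokenizer requires vLLM to be installed"
            ) from exc
        self._vllm = vllm
        self._path = tokenizer_path
```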


```rust
// Rust passes tokenizer path to Python factory
let tokenizer_path = card.get_tokenizer_path()?;
```

By passing only the path here, we assume the factory can figure out the type of tokenizer from that path?

```rust
// Rust passes tokenizer path to Python factory
let tokenizer_path = card.get_tokenizer_path()?;
let adapter = Python::with_gil(|py| {
    let py_tokenizer = factory.call1(py, (tokenizer_path,))?;
```

Should there be a protocol for the factory too?
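A factory protocol could look like the sketch below, which also illustrates the path-sniffing question raised earlier. The sniffing heuristic, protocol name, and stand-in return values are hypothetical, not from this PR.

```python
import os
from typing import Any, Protocol

class TokenizerFactoryProtocol(Protocol):
    """Contract for the Python factory that Rust calls with a path."""

    def __call__(self, tokenizer_path: str) -> Any: ...

def sniffing_factory(tokenizer_path: str):
    """Hypothetical dispatch: inspect the files at the path to decide
    which tokenizer implementation to construct."""
    if os.path.exists(os.path.join(tokenizer_path, "tokenizer.json")):
        return ("huggingface", tokenizer_path)  # stand-in for a tokenizer object
    return ("sentencepiece", tokenizer_path)
```

Pinning the factory to a protocol would let the Rust side treat any callable with this shape interchangeably, the same way `TokenizerProtocol` does for tokenizers.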
