backend fallback draft #50

alec-flowers · 2025-11-11T05:46:21Z

No description provided.

Signed-off-by: alec-flowers <[email protected]>

itay · 2025-11-11T06:04:17Z

deps/0011-backend-tokenization-fallback.md

+-  To enable KV Cache Routing we need to have tokens in frontend in order to calculate hashes to match against the indexers hashes created from KV events emitted from the backend.
+
+REQ 3: PostProcessing SHOULD move to the backend engine
+- Post-processing is currently done on the engine side, however the intention was to co-located it with the backend where there are many CPU's sitting idle alongside the GPU's.


Is this meant to be "currently done on the frontend side"?

Woops, yes.
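
For context on the KV Cache Routing bullet in this hunk, here is a minimal sketch of why the frontend needs token IDs: it hashes the prompt's tokens block by block and compares them with hashes the indexer has already seen. The block size, the SHA-256 chaining scheme, and the function names are all assumptions made for illustration; this is not the hashing Dynamo actually uses.

```python
import hashlib
from typing import List, Sequence, Set

BLOCK_SIZE = 4  # hypothetical; real KV block sizes are engine-specific


def block_hashes(token_ids: Sequence[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """Hash each full block of tokens, chaining in the previous block's hash."""
    hashes: List[str] = []
    parent = b""
    # Only full blocks are hashed; a trailing partial block is ignored.
    full = len(token_ids) - len(token_ids) % block_size
    for start in range(0, full, block_size):
        block = token_ids[start:start + block_size]
        payload = parent + b"|" + ",".join(map(str, block)).encode()
        parent = hashlib.sha256(payload).digest()
        hashes.append(parent.hex())
    return hashes


def matched_prefix_blocks(request_hashes: Sequence[str], indexed: Set[str]) -> int:
    """Count the leading request blocks that the indexer already knows."""
    count = 0
    for block_hash in request_hashes:
        if block_hash not in indexed:
            break
        count += 1
    return count
```

Because each hash chains in its parent, two prompts only share a block hash when everything before that block is identical, which is what prefix matching against a worker's cache needs.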

itay · 2025-11-11T06:04:54Z

deps/0011-backend-tokenization-fallback.md

+        """Combine chat template + tokenization in one call."""
+        pass
+
+class VllmTokenizer(TokenizerProtocol):


Would it conceptually allow the sglang tokenizer as well?

Yes, I was just providing this as a first example.
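
To illustrate how one interface could cover several backends, here is a minimal sketch using structural typing. The method names `encode` and `decode` and the `WhitespaceTokenizer` class are hypothetical; the draft only shows `TokenizerProtocol` and `VllmTokenizer`, and it subclasses the protocol explicitly rather than relying on duck typing as this sketch does.

```python
from typing import Dict, List, Protocol, runtime_checkable


@runtime_checkable
class TokenizerProtocol(Protocol):
    """Assumed shape of the interface; the real method set may differ."""

    def encode(self, text: str) -> List[int]: ...

    def decode(self, token_ids: List[int]) -> str: ...


class WhitespaceTokenizer:
    """Toy stand-in for any backend tokenizer (vLLM, SGLang, ...)."""

    def __init__(self) -> None:
        self._vocab: Dict[str, int] = {}
        self._inverse: Dict[int, str] = {}

    def encode(self, text: str) -> List[int]:
        ids: List[int] = []
        for word in text.split():
            if word not in self._vocab:
                idx = len(self._vocab)
                self._vocab[word] = idx
                self._inverse[idx] = word
            ids.append(self._vocab[word])
        return ids

    def decode(self, token_ids: List[int]) -> str:
        return " ".join(self._inverse[i] for i in token_ids)


def round_trip(tokenizer: TokenizerProtocol, text: str) -> str:
    """Caller code depends only on the protocol, not on a backend."""
    return tokenizer.decode(tokenizer.encode(text))
```

Any class with matching methods satisfies the protocol, so an SGLang-backed adapter would slot in next to the vLLM one without changes to the caller.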

itay · 2025-11-11T06:06:28Z

deps/0011-backend-tokenization-fallback.md

+
+**Benefits**: Offloads CPU work to backend servers, scales with engine instances
+
+#### Mode 2: vLLM Native Processing (`--vllm-processing`)


Will this conceptually work with SGL and vLLM paths as well?

rmccorm4 · 2025-11-11T07:15:43Z

deps/0011-backend-tokenization-fallback.md

+
+**Benefits**: Offloads CPU work to backend servers, scales with engine instances
+
+#### Mode 2: vLLM Native Processing (`--vllm-processing`)


Can we think of this in terms of a "hello world backend" example, CLI args, and all - and see if it generalizes to vllm, sglang, trtllm, etc. that each implement the "interface" we defined out of the box for users of our officially supported backends?

For example, maybe any backend would accept args like --dyn-backend-preprocessor and --dyn-backend-postprocessor (subject to change, just following --dyn- prefix convention for dynamo specific args rather than framework args we passthrough), etc. and for vllm backend that happens to map to using vllm's tokenizer?

Yeah, I wasn't sure of the best UX, so I just tried to write something down. Open to what you are suggesting as well.
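
As a rough sketch of the "hello world backend" idea, the following shows how a backend could claim the `--dyn-` flags and pass everything else through to the framework. The flag names come from the comment above and are subject to change; the program name, help text, and `split_args` helper are hypothetical.

```python
import argparse
from typing import List, Tuple


def build_parser() -> argparse.ArgumentParser:
    # allow_abbrev=False keeps framework flags from matching --dyn- prefixes.
    parser = argparse.ArgumentParser(prog="hello-world-backend", allow_abbrev=False)
    parser.add_argument(
        "--dyn-backend-preprocessor",
        action="store_true",
        help="Run preprocessing (chat template + tokenization) in the backend.",
    )
    parser.add_argument(
        "--dyn-backend-postprocessor",
        action="store_true",
        help="Run postprocessing (detokenization) in the backend.",
    )
    return parser


def split_args(argv: List[str]) -> Tuple[argparse.Namespace, List[str]]:
    """Return (dynamo-specific options, framework args to pass through)."""
    return build_parser().parse_known_args(argv)
```

For example, `split_args(["--dyn-backend-preprocessor", "--max-model-len", "4096"])` would enable backend preprocessing and leave the remaining two arguments for the framework's own parser.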

kanchanawela · 2025-11-12T04:19:30Z

deps/0011-backend-tokenization-fallback.md

@@ -0,0 +1,222 @@
+# Native Framework Processing Fallback
+
+**Status**: Draft


Suggestion: let's version it, for example V.Alpha.

hypdeb · 2025-11-13T17:38:46Z

deps/0011-backend-tokenization-fallback.md

+
+REQ 1: User SHOULD be able to select between dynamo processing and the backend enginge processing.
+
+REQ 2: Preprocessing MUST happen on the frontend.


What does 'frontend' refer to?

hypdeb · 2025-11-13T17:41:37Z

deps/0011-backend-tokenization-fallback.md

+        """Combine chat template + tokenization in one call."""
+        pass
+
+class VllmTokenizer(TokenizerProtocol):


Where this implementation lives, you will need to import vLLM and all its tokenization utilities, correct?
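
One way to keep that import cost local is to load the framework lazily, only in the process that actually hosts the implementation. The sketch below uses the standard library's `importlib`; the helper names and the error message are hypothetical, and nothing here is specific to vLLM.

```python
import importlib
import importlib.util
from types import ModuleType


def backend_available(module_name: str) -> bool:
    """Check for a top-level module without importing it."""
    return importlib.util.find_spec(module_name) is not None


def load_backend(module_name: str) -> ModuleType:
    """Import the backend on demand, failing with a clear message."""
    if not backend_available(module_name):
        raise RuntimeError(
            f"backend module {module_name!r} is not installed in this environment"
        )
    return importlib.import_module(module_name)
```

With this shape, a frontend-only deployment never imports the framework, and a backend worker would call `load_backend("vllm")` when it constructs its tokenizer.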

hypdeb · 2025-11-13T17:46:22Z

deps/0011-backend-tokenization-fallback.md

+
+```rust
+// Rust passes tokenizer path to Python factory
+let tokenizer_path = card.get_tokenizer_path()?;


By passing only the path here, we assume the factory can figure out the type of tokenizer from that path?

hypdeb · 2025-11-13T17:48:01Z

deps/0011-backend-tokenization-fallback.md

+// Rust passes tokenizer path to Python factory
+let tokenizer_path = card.get_tokenizer_path()?;
+let adapter = Python::with_gil(|py| {
+    let py_tokenizer = factory.call1(py, (tokenizer_path,))?;


Should there be a protocol for the factory too?
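
A possible shape for such a factory protocol, sketched in Python under stated assumptions: the factory is any callable taking the tokenizer path, and the tokenizer type is selected by an explicit `kind` key instead of being inferred from the path. `Tokenizer`, `TokenizerFactory`, `CharTokenizer`, `register`, and `create_tokenizer` are all hypothetical names, not part of the draft.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


class Tokenizer(Protocol):
    def encode(self, text: str) -> List[int]: ...


class TokenizerFactory(Protocol):
    """Anything callable with a tokenizer path that returns a Tokenizer."""

    def __call__(self, tokenizer_path: str) -> Tokenizer: ...


@dataclass
class CharTokenizer:
    """Toy tokenizer; a real adapter would load files from `path`."""

    path: str

    def encode(self, text: str) -> List[int]:
        return [ord(c) for c in text]


_REGISTRY: Dict[str, TokenizerFactory] = {}


def register(kind: str, factory: TokenizerFactory) -> None:
    _REGISTRY[kind] = factory


def create_tokenizer(kind: str, tokenizer_path: str) -> Tokenizer:
    """Look up the factory by explicit kind instead of guessing from the path."""
    try:
        factory = _REGISTRY[kind]
    except KeyError:
        raise ValueError(f"no tokenizer factory registered for {kind!r}") from None
    return factory(tokenizer_path)


# A class is itself a callable factory, so it can be registered directly.
register("char", CharTokenizer)
```

Passing the kind alongside the path would also answer the earlier question about whether the path alone is enough to pick the tokenizer type.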

backend fallback draft

6b4b529

Signed-off-by: alec-flowers <[email protected]>

itay reviewed Nov 11, 2025

rmccorm4 reviewed Nov 11, 2025

kanchanawela reviewed Nov 12, 2025

hypdeb reviewed Nov 13, 2025

ai-dynamo deleted a comment from kanchanawela Dec 3, 2025

