Fix GPU dynamic batching reordering documents silently by raphaelsty · Pull Request #98 · lightonai/next-plaid

raphaelsty · 2026-05-18T10:24:02Z

`tokenize_documents_in_batches` sorts documents by token length and buckets them by shape on the GPU path, but the returned `Vec<PreparedDocumentBatch>` discards the original input order. Callers that map embeddings back to input positions (notably `colgrep::index::run_pool_stage` via `original_to_unique`) then pair every code unit with the wrong embedding, producing an unusable index. The symptom is silent: indexing succeeds and search runs, but results are unrelated to the query.

Reproduced on `axios` with `LateOn-Code-edge`: the query "request and response interceptors" returns `helpers/speedometer.js`, `utils.js`, `helpers/composeSignals.js` on GPU, versus `core/Axios.js`, `core/InterceptorManager.js`, `adapters/xhr.js` on CPU. The full semble code-search benchmark dropped from ~0.69 to ~0.16 NDCG@10 on the first repos.
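The silent mismatch described above can be illustrated with a minimal sketch. It is not the next-plaid code: `sort_by_token_len` and `pair_by_position` are hypothetical helpers, and whitespace splitting stands in for real tokenization. The point is only that once a batcher reorders documents by length, pairing outputs with inputs by position produces no error, just wrong pairs.

```rust
/// Hypothetical stand-in for a length-sorting batcher: returns the documents
/// ordered by token count (whitespace-split here, for illustration only).
fn sort_by_token_len<'a>(docs: &[&'a str]) -> Vec<&'a str> {
    let mut sorted = docs.to_vec();
    // Stable sort by number of whitespace-separated tokens.
    sorted.sort_by_key(|d| d.split_whitespace().count());
    sorted
}

/// Pairs each input with the output at the same position, which is what a
/// caller does when it assumes the encoder preserves input order.
fn pair_by_position<'a>(
    inputs: &[&'a str],
    outputs: &[&'a str],
) -> Vec<(&'a str, &'a str)> {
    inputs.iter().copied().zip(outputs.iter().copied()).collect()
}
```

With inputs `["a b c", "a", "a b"]`, the sorted output starts with `"a"`, so position 0 pairs `"a b c"` with the result computed for `"a"`. Nothing fails at runtime, which matches the silent symptom.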

Track the original input position alongside each tokenized document through sorting and bucketing, store it in `PreparedDocumentBatch`, and have `encode_prepared_document_batches` restore the caller's input order before returning. Batches produced through code paths that don't populate `original_input_indices` (e.g. `tokenize_documents`, `prepare_batch_from_tokenizer_encodings`) are passed through unchanged, so the public API stays backwards compatible.
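The approach described above could look roughly like the following sketch. It is a simplified illustration under stated assumptions, not the code from this PR: `Batch` is a hypothetical stand-in for `PreparedDocumentBatch`, token counts stand in for tokenized documents and embeddings, fixed-size chunks replace shape bucketing, and only the field name `original_input_indices` is taken from the description.

```rust
/// Hypothetical, simplified stand-in for `PreparedDocumentBatch`; the real
/// struct holds tokenized inputs and other fields.
struct Batch {
    /// Token counts stand in for the tokenized documents.
    token_lens: Vec<usize>,
    /// Input position of each document in this batch; `None` for batches
    /// built by code paths that never reorder.
    original_input_indices: Option<Vec<usize>>,
}

/// Sort documents by length while remembering where each one came from,
/// then cut the sorted order into fixed-size batches (a simplification of
/// bucketing by shape).
fn batch_sorted_by_len(token_lens: &[usize], batch_size: usize) -> Vec<Batch> {
    let mut order: Vec<usize> = (0..token_lens.len()).collect();
    order.sort_by_key(|&i| token_lens[i]);
    order
        .chunks(batch_size.max(1))
        .map(|chunk| Batch {
            token_lens: chunk.iter().map(|&i| token_lens[i]).collect(),
            original_input_indices: Some(chunk.to_vec()),
        })
        .collect()
}

/// Fake encoder: "embeds" each document as its token count. If every batch
/// carries indices, the outputs are scattered back to input order; if any
/// batch lacks them, the outputs are returned in the order produced.
/// Assumes the collected indices form a permutation of 0..n.
fn encode_batches(batches: &[Batch]) -> Vec<usize> {
    let flat: Vec<usize> = batches
        .iter()
        .flat_map(|b| b.token_lens.iter().copied())
        .collect();
    let mut positions = Vec::with_capacity(flat.len());
    for b in batches {
        match &b.original_input_indices {
            Some(idx) => positions.extend_from_slice(idx),
            None => return flat, // legacy batch: pass through unchanged
        }
    }
    let mut restored = vec![0usize; flat.len()];
    for (&embedding, &pos) in flat.iter().zip(&positions) {
        restored[pos] = embedding;
    }
    restored
}
```

For token counts `[5, 1, 3, 2]` and a batch size of 2, the batches are built in sorted order, yet `encode_batches` returns `[5, 1, 3, 2]`, so position `i` of the output again corresponds to input `i`. Making the indices optional is one way to keep batches from non-reordering paths working without changes.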

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

raphaelsty merged commit 548b760 into main May 18, 2026
20 checks passed

raphaelsty merged 1 commit into main from fix/gpu-dynamic-batch-doc-order
