perf: Add vector search stage metrics #5974

Draft
franciscojavierarceo wants to merge 6 commits into ogx-ai:main from franciscojavierarceo:codex/vector-search-stage-metrics

@franciscojavierarceo
Collaborator

What changed

  • Adds ogx.vector_io.query_stage_duration_seconds for retrieval stage timing.
  • Adds ogx.vector_io.query_result_count for returned chunk counts.
  • Emits stage metrics from the shared vector store query path for embedding generation, backend search, and neural reranking.
  • Adds unit coverage for metric instrument definitions and stage emission from VectorStoreWithIndex.query_chunks.
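The stage emission described above can be pictured with a minimal, dependency-free sketch. This is not the PR's implementation: `StageRecorder`, the stage label strings, and the `embed`/`search`/`rerank` callables are hypothetical stand-ins, and a real implementation would record into the telemetry instruments named in the bullets rather than into in-memory lists.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageRecorder:
    """Hypothetical in-memory stand-in for a metrics backend."""

    def __init__(self):
        self.durations = defaultdict(list)  # stage name -> list of seconds
        self.result_counts = []             # one entry per query

    @contextmanager
    def stage(self, name):
        # Time the wrapped block and record it even if the block raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.durations[name].append(time.perf_counter() - start)

    def record_result_count(self, count):
        self.result_counts.append(count)


def query_chunks(query, recorder, embed, search, rerank=None):
    """Run the three retrieval stages, timing each one separately."""
    with recorder.stage("embedding"):       # assumed stage label
        vector = embed(query)
    with recorder.stage("backend_search"):  # assumed stage label
        chunks = search(vector)
    if rerank is not None:
        with recorder.stage("rerank"):      # assumed stage label
            chunks = rerank(query, chunks)
    recorder.record_result_count(len(chunks))
    return chunks


# Usage with toy callables in place of a real embedder, backend, and reranker.
recorder = StageRecorder()
chunks = query_chunks(
    "what is rag?",
    recorder,
    embed=lambda q: [float(len(q))],
    search=lambda v: ["chunk-a", "chunk-b", "chunk-c"],
    rerank=lambda q, c: c[:2],
)
```

Recording in a `finally` block means a stage that raises still contributes a duration, which is one reasonable choice when the goal is to localize slow or failing stages.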

Why

The existing router-level retrieval duration tells us that search was slow, but not whether the regression came from embedding, backend lookup, reranking, or unexpectedly large result sets. These stage-level metrics make future RAG/search regressions easier to localize.

Review notes

This draft intentionally includes the same hot-path logging cleanup as #5972 because both changes touch src/ogx/providers/utils/memory/vector_store.py and the logging hook inspects that file. If #5972 lands first, this PR should shrink to the metric changes after updating.

Validation

  • git diff --check
  • Commit hooks passed, including ruff, ruff format, mypy, provider codegen, logging checks, and repository policy hooks.
  • Attempted the targeted tests with uv run pytest tests/unit/telemetry/test_vector_io_metrics.py tests/unit/rag/test_vector_store.py -q, but uv run did not start: the shell's uv is 0.5.29 and the repo requires >=0.7.0.

Add a configurable prefix-to-encoding mapping with sensible defaults for
common non-OpenAI model families (llama, mistral, claude, gemma, qwen,
phi, deepseek). This is the first step toward supporting compaction with
non-OpenAI models.

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Replace the 2-step resolution (admin config OR tiktoken) with a 5-step
chain: per-request extra_body override, admin default, tiktoken built-in,
model-family prefix mapping, character-based estimate. Explicit choices
fail hard with InvalidParameterError; automatic resolution falls through.

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
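
The 5-step chain in the commit message above might look roughly like the sketch below. Everything concrete here is an assumption for illustration: the function name `resolve_encoding`, the table contents, the `char_estimate` sentinel, and the local `InvalidParameterError` class, which only stands in for the error type the commit names. The lookup table for step 3 replaces a real tiktoken call so the example stays self-contained.

```python
class InvalidParameterError(ValueError):
    """Stand-in for the error type named in the commit message."""


# Hypothetical tables; the commit's actual defaults are not shown on this page.
KNOWN_ENCODINGS = {"cl100k_base", "o200k_base"}
BUILTIN_MODELS = {"gpt-4o": "o200k_base"}  # stand-in for tiktoken's model lookup
PREFIX_TO_ENCODING = {"llama": "cl100k_base", "mistral": "cl100k_base"}
CHAR_ESTIMATE = "char_estimate"            # sentinel for the character-based fallback


def resolve_encoding(model, request_override=None, admin_default=None):
    # Steps 1 and 2: explicit choices (per-request, then admin) fail hard.
    for source, choice in (("request", request_override), ("admin", admin_default)):
        if choice is not None:
            if choice not in KNOWN_ENCODINGS:
                raise InvalidParameterError(
                    f"unknown encoding {choice!r} from {source} config"
                )
            return choice
    # Step 3: built-in model-to-encoding lookup.
    if model in BUILTIN_MODELS:
        return BUILTIN_MODELS[model]
    # Step 4: model-family prefix mapping.
    lowered = model.lower()
    for prefix, encoding in PREFIX_TO_ENCODING.items():
        if lowered.startswith(prefix):
            return encoding
    # Step 5: automatic resolution falls through to a character-based estimate.
    return CHAR_ESTIMATE
```

The asymmetry is the point of the design: a caller who names an encoding explicitly gets an error if it is wrong, while automatic resolution always produces some answer.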
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

# Conflicts:
#	docs/docs/providers/responses/inline_builtin.mdx
#	src/ogx/providers/inline/responses/builtin/config.py
#	src/ogx/providers/inline/responses/builtin/responses/openai_responses.py
#	tests/unit/providers/inline/responses/builtin/responses/test_tokenizer_resolution.py
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
franciscojavierarceo added the RAG, python, and codex labels on May 27, 2026
franciscojavierarceo changed the title from "[codex] Add vector search stage metrics" to "perf: Add vector search stage metrics" on May 27, 2026
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
