
Query performance regression in V2 segment format #86

@tjgreen42

Description


Summary

Commit 15b8df3 (V2 segment format) introduced a significant query performance regression. Query execution time increased roughly 29x (estimated from profiling sample counts) because each segment was opened and closed O(T × S) times per query instead of O(S) times (where T = number of query terms, S = number of segments).

Symptoms

  • Query latency increased ~29x on the MS-MARCO dataset (estimated from profiling sample counts)
  • tp_segment_open accounts for 24% of query CPU time
  • Visible on the benchmark dashboard as a spike in query latency metrics

Root Cause

The scoring loop was structured as:

Phase 1 - Get doc_freq:
  for each term:
    for each segment: open → read doc_freq → close

Phase 2 - Score:
  for each term:
    for each segment: open → iterate postings → close
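To make the cost of this structure concrete, here is a minimal Python sketch that mirrors the two-phase loop against a stub store. `CountingSegmentStore`, `open`, `close`, and `score_two_phase` are hypothetical names for illustration only; the stub's `open` stands in for tp_segment_open and merely counts calls, and no actual doc_freq or postings logic is modeled.

```python
class CountingSegmentStore:
    """Hypothetical in-memory stand-in for the segment store.

    open() plays the role of tp_segment_open; here it only counts calls.
    """

    def __init__(self, num_segments: int) -> None:
        self.num_segments = num_segments
        self.opens = 0

    def open(self, seg_id: int) -> int:
        # The real open allocates a reader and reads the header and page
        # index from disk; this stub just records that an open happened.
        self.opens += 1
        return seg_id

    def close(self, reader: int) -> None:
        pass  # nothing to release in this stub


def score_two_phase(store: CountingSegmentStore, terms: list[str]) -> int:
    """Mirror of the regressed two-phase loop; returns the number of opens."""
    # Phase 1: gather doc_freq for each term from every segment.
    for term in terms:
        for seg in range(store.num_segments):
            reader = store.open(seg)
            # ... read doc_freq for `term` from `reader` ...
            store.close(reader)

    # Phase 2: score each term against every segment.
    for term in terms:
        for seg in range(store.num_segments):
            reader = store.open(seg)
            # ... iterate postings for `term` from `reader` ...
            store.close(reader)

    return store.opens
```

With 5 terms and 10 segments, `score_two_phase(CountingSegmentStore(10), ["t1", "t2", "t3", "t4", "t5"])` returns 100: every (term, segment) pair pays for an open in each phase.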

Each tp_segment_open call is expensive because it:

  1. Allocates a reader structure
  2. Reads the segment header from disk
  3. Reads the entire page index (potentially spanning multiple pages)
  4. For V2, may also preload the CTID table

For a query with 5 terms and 10 segments, this resulted in 100 segment opens (5 × 10 = 50 in each of the two phases) instead of 10.
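In general, with T query terms and S segments, the two-phase structure opens every segment once per term in each phase, while the restructured loop opens each segment once:

```math
\text{opens}_{\text{before}} = \underbrace{T \cdot S}_{\text{phase 1}} + \underbrace{T \cdot S}_{\text{phase 2}} = 2TS = 2 \cdot 5 \cdot 10 = 100,
\qquad
\text{opens}_{\text{after}} = S = 10
```

The ratio of opens is therefore 2T, so the overhead grows linearly with the number of query terms.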

Fix

Restructure the loop so that each segment is opened once per query:

for each segment:
  open
  for each term: get doc_freq + score
  close
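A minimal Python sketch of the restructured loop, using the same kind of hypothetical counting stub (`CountingSegmentStore` and `score_single_pass` are illustrative names, not the extension's API). It shows that the per-term work still happens T × S times, but the expensive open happens only S times. How doc_freq is combined across segments for scoring is not modeled here.

```python
class CountingSegmentStore:
    """Hypothetical stand-in for the segment store; open() only counts calls."""

    def __init__(self, num_segments: int) -> None:
        self.num_segments = num_segments
        self.opens = 0

    def open(self, seg_id: int) -> int:
        self.opens += 1  # stands in for the expensive tp_segment_open
        return seg_id

    def close(self, reader: int) -> None:
        pass  # nothing to release in this stub


def score_single_pass(store: CountingSegmentStore, terms: list[str]) -> tuple[int, int]:
    """Mirror of the restructured loop; returns (opens, term_visits)."""
    term_visits = 0
    for seg in range(store.num_segments):
        reader = store.open(seg)  # one open per segment
        try:
            for term in terms:
                # ... get doc_freq and score `term` using the already-open `reader` ...
                term_visits += 1
        finally:
            store.close(reader)  # release the segment even if scoring raises
    return store.opens, term_visits
```

For 5 terms and 10 segments this returns `(10, 50)`: ten opens, with all fifty (term, segment) visits served by readers that are already open.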

Fix is in PR #85.

Profiling Data

Baseline (d560f1e, before V2):

  • 104 samples total
  • Top function: kernel spin lock (9.6%)
  • tp_segment_open not in top functions

Regressed (15b8df3, with V2):

  • 3,022 samples total
  • Top function: tp_segment_open (24%)
  • ~29x more CPU time in queries (3,022 vs. 104 samples)
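The headline figures follow directly from the sample counts, assuming both profiles were taken at the same sampling rate over the same query workload:

```math
\frac{3022}{104} \approx 29.1,
\qquad
0.24 \times 3022 \approx 725 \text{ samples in } \texttt{tp\_segment\_open}
```

That is, time spent in tp_segment_open alone in the regressed build amounts to roughly 7x the baseline's total of 104 samples.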

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
