Fix search performance regression and TUI race condition#262
Conversation
The regexp_matches approach caused significant performance regression in TUI fast search (Parquet scans). ILIKE is natively optimized by DuckDB for Parquet columnar scans. FTS5 deep search path remains unchanged for body text accuracy. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
When starting an inline search, increment loadRequestID to prevent in-flight loadMessages responses from overwriting search results. Previously, pressing (a) to view all messages then quickly searching could cause the stale loadMessages response to replace search results with unfiltered messages. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- TestStaleLoadMessagesDoesNotOverwriteSearch: verifies that in-flight loadMessages responses (from sort toggle or "all messages" view) are ignored after a search starts, preventing stale results from overwriting search results - TestSearchClearsStaleMessages: verifies messages are cleared immediately when search starts so stale results never display - TestBuildSearchConditions_UsesILIKENotRegex: verifies the fast search path uses ILIKE (fast on Parquet) instead of regexp_matches (slow) - TestBuildAggregateSearchConditions_UsesILIKENotRegex: same check for the aggregate search path - Remove dead code: escapeRegex and startsWithWordChar (no longer used after reverting to ILIKE) Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
buildStatsSearchConditions used to call buildAggregateSearchConditions and slice off the text-term prefix using a hand-tracked countArgsForTextTerms helper. Any drift between the helper and the actual per-term arg count (subject + snippet + 2 sender = 4) would cause aggregate stats queries to fail with unmatched placeholders when a search mixed text terms with non-text operators like from:. Extract the non-text filter portion of buildAggregateSearchConditions into a new buildNonTextSearchConditions helper and call it directly from buildStatsSearchConditions, eliminating the arg-slicing workaround and the brittle countArgsForTextTerms constant entirely. Add regression tests that lock the invariant in place by counting "?" placeholders in the emitted WHERE conditions and comparing to len(args) across a matrix of query shapes (text only, text+from, text+to+subject, label view, date/size filters, etc.) for both builders.
roborev: Combined Review (
|
I still disagree with this recommendation. It is better to show nothing than stale search. I discussed this in #256 |
Benchmarks 6 hot-path operations (SearchFast, SearchFastWithStats, Aggregate, GetTotalStats, ListMessages, SubAggregate) against a 100K-message Parquet dataset with 500 participants across 50 domains. Data generated via DuckDB SQL for fast setup (~1s). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|
Confirmed. Adding a minimal benchmark so we can keep an eye on this |
roborev: Combined Review (
|
Tested and ran into a bunch of performance & context issues that surfaced in the TUI. Original approach using ILIKE is better it seems like for large archives. (Note this is a resubmission of #256)
Changes