Arena implementation #665

xav-db · 2025-10-20T15:10:01Z

Description

Related Issues

Closes #670 #666 #667 #672 #668 #661 #655 #654 #652 #436

Checklist when merging to main

No compiler warnings (if applicable)
Code is formatted with rustfmt
No useless or dead code (if applicable)
Code is easy to understand
Doc comments are used for all functions, enums, structs, and fields (where appropriate)
All tests pass
Performance has not regressed (assuming change was not to fix a bug)
Version number has been updated in helix-cli/Cargo.toml and helixdb/Cargo.toml

Additional Notes

…ix-db into arena-implementation

…implementation

This integrates the fix from PR #667 (#667) which adds an .id() method to the ID struct to resolve compilation errors when using AddE with borrowed IDs in FOR loops. Changes: - Added ID.id() method returning u128 in helix-db/src/utils/id.rs - Added test case for AddE with borrowed IDs from FOR loop destructuring The helixc code generator consistently uses .id() to access ID values. Before this change, calling .id() on &ID (borrowed ID) would fail. The new method leverages Rust's auto-deref so both ID.id() and &ID.id() work seamlessly. This fixes queries like: FOR {from_id, to_id, ...} IN edges { AddE<MyEdge>::From(from_id)::To(to_id) } Co-Authored-By: ishaksebsib <[email protected]>

…roperty handling ## Changes ### 1. Added Reserved Property Support - Created `ReservedProp` enum for built-in properties (id, label, version, etc.) - Added `Step::ReservedPropertyAccess` variant to generate direct field access - Reserved properties now return correct Value types (e.g., Value::Id for id field) ### 2. Updated Property Access Generator - Modified `gen_property_access()` to identify reserved vs user-defined properties - Reserved properties use direct struct field access - User-defined properties continue using `get_property()` HashMap lookup ### 3. Enhanced Boolean Operation Optimization - Updated `WhereRef` Display implementation for reserved property optimization - Generates `Value::Id(ID::from(val.id))` for id field in filters - Maintains `get_property()` optimization for user-defined properties ### 4. Fixed IS_IN Validation - Separated IS_IN from scalar boolean operations - IS_IN now correctly expects `Type::Array` arguments with scalar elements - Added `get_reserved_property_type()` helper for type validation - Updated Node/Edge/Vector validation to check reserved properties first ### 5. Testing - Added IS_IN test with both reserved (id) and user-defined (field) properties - Verified correct code generation and type safety ## Impact Resolves the issues from original PR #666: - IS_IN with arrays now validates at compile-time - Reserved property access returns correct Value types for IS_IN operations - Queries like `WHERE(_::{id}::IS_IN(node_ids))` now compile and execute correctly Co-Authored-By: ishaksebsib <[email protected]>

This fix resolves runtime errors when attempting to DROP non-existent nodes, edges, or vectors. Previously, empty traversals would throw "Conversion error: Incorrect Type: Empty" errors. Added TraversalValue::Empty => Ok(()) case to make DROP operations idempotent - dropping non-existent items now succeeds silently. Integrates PR #670 into the arena-implementation branch. Co-Authored-By: ishaksebsib

## Summary This PR integrates the fix from #670 into the `arena-implementation` branch. It resolves runtime errors when attempting to DROP non-existent nodes, edges, or vectors. ## Problem Queries attempting to drop non-existent items failed with: ``` Conversion error: Incorrect Type: Empty ``` This occurred because `drop_traversal` did not handle `TraversalValue::Empty` returned when querying for non-existent items. ## Solution Added `TraversalValue::Empty => Ok(())` case in the match statement of `drop_traversal` in `helix-db/src/helix_engine/traversal_core/ops/util/drop.rs:51`. This treats empty traversals as successful no-ops, making DROP operations idempotent. ## Testing - ✅ `cargo check` passes with no errors - ✅ All 109 `helix_engine` tests pass, including all DROP-related tests ## Related - Integrates fix from PR #670 - No changes needed to helixc generator - the fix automatically applies to generated code Co-Authored-By: ishaksebsib

## Summary This PR integrates the fix from #667 into the arena-implementation branch. It adds an .id() method to the ID struct to resolve compilation errors when using AddE with borrowed IDs in FOR loops. ## Changes - Added ID.id() method in helix-db/src/utils/id.rs returning u128 directly - Added test case in hql-tests/tests/add_e_borrowed_ids/ validating the fix ## Technical Details The helixc code generator consistently generates .id() calls to access ID values. Before this change, calling .id() on &ID (borrowed ID) would fail. This fix leverages Rust's auto-deref so both ID.id() and &ID.id() work seamlessly. ## Testing ✅ cargo check --workspace passes ✅ All 27 ID struct unit tests pass ✅ New test case compiles successfully Co-Authored-By: ishaksebsib

#674) ## Summary This PR integrates the fixes from PR #666 into the current `arena-implementation` branch. PR #666 originally fixed IS_IN validation and runtime handling for reserved properties, but the current implementation has diverged significantly from when that PR was created. ## Problem **Analyzer Issue**: The type validator incorrectly grouped `IS_IN` with other boolean operations, expecting scalar arguments instead of arrays. This caused compilation errors for queries like `nodes::WHERE(_::{id}::IS_IN(node_ids))` when `node_ids` is `[ID]`. **Runtime Issue**: The current implementation removed `filterable.rs` and its special handling for reserved properties (id, label, etc.). Reserved properties are now struct fields, not in the properties HashMap, so `get_property("id")` returns `None`. **Root Cause**: 1. IS_IN validation expects scalars instead of arrays 2. Reserved properties lost special handling during refactoring 3. Type mismatch between `Value::String` and `Value::Id` for ID operations ## Solution ### 1. Added Reserved Property Support - Created `ReservedProp` enum for built-in properties: Id, Label, Version, FromNode, ToNode, Deleted, Level, Distance, Data - Added `Step::ReservedPropertyAccess` variant to generate direct field access - Reserved properties now return correct Value types (e.g., `Value::Id` for id field) ### 2. Updated Property Access Generator - Modified `gen_property_access()` in `analyzer/utils.rs` to identify reserved vs user-defined properties - Reserved properties use direct struct field access: `Value::Id(ID::from(val.id))` - User-defined properties continue using `get_property()` HashMap lookup ### 3. Enhanced Boolean Operation Optimization - Updated `WhereRef` Display implementation in `traversal_steps.rs` for reserved properties - Generates optimized filter code with correct Value types - Example: `Value::Id(ID::from(val.id)).is_in(&data.node_ids)` ### 4. Fixed IS_IN Validation - Separated IS_IN from scalar boolean operations in `traversal_validation.rs` - IS_IN now correctly expects `Type::Array(Box<Type::Scalar(ft)))` arguments - Added `get_reserved_property_type()` helper for type-safe validation - Updated Node/Edge/Vector validation to check reserved properties first ## Files Changed - `helix-db/src/helixc/generator/traversal_steps.rs` - Added ReservedProp enum and Display logic - `helix-db/src/helixc/analyzer/utils.rs` - Updated gen_property_access() - `helix-db/src/helixc/analyzer/methods/traversal_validation.rs` - Fixed IS_IN validation and reserved property checking - `hql-tests/tests/is_in/` - Added test cases for IS_IN with reserved and user-defined properties ## Testing Added comprehensive IS_IN tests: ```hql QUERY GetNodes (fields: [String]) => node <- N<MyNode>::WHERE(_::{field}::IS_IN(fields)) RETURN node QUERY GetNodesByID (node_ids: [ID]) => node <- N<MyNode>::WHERE(_::{id}::IS_IN(node_ids)) RETURN node ``` **Verified**: - ✅ Queries compile successfully - ✅ Correct code generation for reserved properties (direct field access) - ✅ Correct code generation for user-defined properties (get_property) - ✅ Type safety maintained for array operations ## Generated Code Example For reserved property `id`: ```rust .filter_ref(|val, txn|{ if let Ok(val) = val { Ok(Value::Id(ID::from(val.id)).is_in(&data.node_ids)) } else { Ok(false) } }) ``` For user-defined property `field`: ```rust .filter_ref(|val, txn|{ if let Ok(val) = val { Ok(val .get_property("field") .map_or(false, |v| v.is_in(&data.fields))) } else { Ok(false) } }) ``` ## Impact - IS_IN operations now work correctly with arrays at compile-time and runtime - Reserved properties return proper Value types for IS_IN comparisons - Queries like `WHERE(_::{id}::IS_IN(node_ids))` compile and execute without panics - No breaking changes to existing functionality Co-Authored-By: ishaksebsib <[email protected]>

xav-db · 2025-10-29T12:08:19Z

@greptile

greptile-apps

_{181 files reviewed, 1 comment}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2025-10-29T12:11:39Z

helix-db/src/helix_gateway/worker_pool/mod.rs

+                    loop {
+                        // cont_rx.try_recv() then rx.recv()
+
+                        match cont_rx.try_recv() {
+                            Ok((ret_chan, cfn)) => {
+                                ret_chan.send(cfn().map_err(Into::into)).expect("todo")
+                            }
+                            Err(flume::TryRecvError::Disconnected) => {
+                                error!("Continuation Channel was dropped")
+                            }
+                            Err(flume::TryRecvError::Empty) => {}
+                        }
+
+                        match rx.recv() {
+                            Ok((req, ret_chan)) => request_mapper(
+                                req,
+                                ret_chan,
+                                graph_access.clone(),
+                                &router,
+                                &io_rt,
+                                &cont_tx,
+                            ),
+                            Err(flume::RecvError::Disconnected) => {
+                                error!("Request Channel was dropped")
+                            }
                        }
-                        Err(_) => error!("Continuation Channel was dropped"),
-                    })
-                    .recv(&rx, |m| match m {
-                        Ok((req, ret_chan)) => request_mapper(
-                            req,
-                            ret_chan,
-                            graph_access.clone(),
-                            &router,
-                            &io_rt,
-                            &cont_tx,
-                        ),
-                        Err(_) => error!("Request Channel was dropped"),
-                    })
-                    .wait();
+                    }


logic: Busy loop when cont_rx is empty will cause high CPU usage. After try_recv() returns Empty, the thread immediately blocks on rx.recv(), but this creates a busy-wait pattern where one channel is polled in a tight loop.

Consider adding a small yield or sleep when both channels are empty, or revert to using the selector which handles this efficiently.

Prompt To Fix With AI

This is a comment left during a code review. Path: helix-db/src/helix_gateway/worker_pool/mod.rs Line: 110:136 Comment: **logic:** Busy loop when `cont_rx` is empty will cause high CPU usage. After `try_recv()` returns `Empty`, the thread immediately blocks on `rx.recv()`, but this creates a busy-wait pattern where one channel is polled in a tight loop. Consider adding a small yield or sleep when both channels are empty, or revert to using the selector which handles this efficiently. How can I resolve this? If you propose a fix, please make it concise.

## Description  ## Related Issues  Closes # ## Checklist when merging to main  - [ ] No compiler warnings (if applicable) - [ ] Code is formatted with `rustfmt` - [ ] No useless or dead code (if applicable) - [ ] Code is easy to understand - [ ] Doc comments are used for all functions, enums, structs, and fields (where appropriate) - [ ] All tests pass - [ ] Performance has not regressed (assuming change was not to fix a bug) - [ ] Version number has been updated in `helix-cli/Cargo.toml` and `helixdb/Cargo.toml` ## Additional Notes   <h2>Greptile Overview</h2> Updated On: 2025-10-12 08:34:56 UTC <h3>Summary</h3> This PR adds MMR (Maximal Marginal Relevance) and RRF (Reciprocal Rank Fusion) rerankers to HelixDB, enabling diversity-aware and multi-source ranking fusion in search results. The implementation spans the full stack from core Rust implementations to HQL query language integration. **Major Changes:** - **Core Reranker Module** (`helix-db/src/helix_engine/reranker/`): Complete reranker infrastructure with trait-based design, enabling pluggable reranking strategies - **MMR Implementation**: Balances relevance vs. diversity with configurable lambda parameter (0.0-1.0) and multiple distance metrics (cosine, euclidean, dot product) - **RRF Implementation**: Combines multiple ranked lists using reciprocal rank fusion formula with configurable k parameter - **HQL Integration**: Full parser, analyzer, and code generator support for `::RerankRRF()` and `::RerankMMR()` query steps - **Traversal API**: Fluent `.rerank()` adapter enabling rerankers to chain naturally in traversal pipelines - **Comprehensive Tests**: 800+ lines of unit tests in core implementations plus HQL integration tests covering edge cases and parameter variations **Quality Indicators:** - Well-structured error handling with custom error types - Extensive test coverage including boundary conditions, empty inputs, and high-dimensional vectors - Clear documentation with usage examples and algorithm explanations - Consistent code style following project conventions <details><summary><h3>Important Files Changed</h3></summary> File Analysis | Filename | Score | Overview | |----------|-------|----------| | helix-db/src/helix_engine/reranker/fusion/mmr.rs | 5/5 | Implements MMR (Maximal Marginal Relevance) reranker with support for multiple distance metrics (cosine, euclidean, dot product), comprehensive test coverage, and proper error handling | | helix-db/src/helix_engine/reranker/fusion/rrf.rs | 5/5 | Implements RRF (Reciprocal Rank Fusion) reranker for combining multiple ranked lists, includes static method for multi-list fusion, comprehensive tests | | helix-db/src/helix_engine/reranker/reranker.rs | 5/5 | Defines core `Reranker` trait and helper functions (`extract_score`, `update_score`) for working with TraversalValue types across different reranker implementations | | helix-db/src/helix_engine/reranker/adapters/mod.rs | 5/5 | Provides `RerankAdapter` trait to integrate reranking into traversal iterator chains, enabling fluent API like `.rerank(&mmr_reranker, None)` | | helix-db/src/helixc/parser/graph_step_parse_methods.rs | 5/5 | Implements parser methods for `RerankRRF` and `RerankMMR` steps, handling optional parameters and distance metric parsing | | helix-db/src/helixc/analyzer/methods/traversal_validation.rs | 5/5 | Adds validation and code generation logic for reranker steps, validating parameters and generating appropriate Rust code | | helix-db/src/helixc/generator/traversal_steps.rs | 5/5 | Adds code generation structs for `RerankRRF` and `RerankMMR` steps with proper Display implementations that generate correct Rust method calls | | hql-tests/tests/rerankers/queries.hx | 5/5 | Comprehensive test queries covering RRF and MMR rerankers with various parameter combinations, chaining, and edge cases | </details> </details> <details><summary><h3>Sequence Diagram</h3></summary> ```mermaid sequenceDiagram participant User as User Query participant Parser as HQL Parser participant Analyzer as Type Analyzer participant Generator as Code Generator participant Runtime as Runtime Engine participant Reranker as Reranker (MMR/RRF) participant Storage as Vector Storage User->>Parser: SearchV<Document>(vec, 100)::RerankMMR(lambda: 0.7) Parser->>Parser: parse_step() → RerankMMR Parser->>Parser: parse_rerank_mmr() Parser->>Analyzer: AST with RerankMMR step Analyzer->>Analyzer: validate_traversal() Analyzer->>Analyzer: Validate lambda parameter Analyzer->>Analyzer: Validate distance metric Analyzer->>Generator: Generate RerankMMR code Generator->>Generator: Create MMRReranker::new() call Generator->>Generator: Generate .rerank() method Generator->>Runtime: Compiled query function User->>Runtime: Execute query with parameters Runtime->>Storage: search_v(query_vec, 100) Storage-->>Runtime: Vec<TraversalValue> (100 items) Runtime->>Reranker: .rerank(&MMRReranker, None) Reranker->>Reranker: Collect all items Reranker->>Reranker: extract_score() for each Reranker->>Reranker: Select first item (highest score) loop For remaining items Reranker->>Reranker: Calculate relevance (lambda * score) Reranker->>Reranker: Calculate max similarity to selected Reranker->>Reranker: MMR = λ*relevance - (1-λ)*similarity Reranker->>Reranker: Select item with highest MMR end Reranker->>Reranker: update_score() for reranked items Reranker-->>Runtime: Vec<TraversalValue> (reranked) Runtime-->>User: Query results (diversified) ``` </details>

xav-db added 30 commits October 7, 2025 10:16

wip

da565a5

wip initial rerankers

c706e00

finishing reranker implementaiton. just hql to go now

e14a928

wip for ql

70ef081

clippy fixing

8f0672d

fixing issue with imports

791967a

trying to fix imports

bc75958

fixing hql missing config

3c85674

fixing

a1083e0

testing phase 1. internals and storage.

5dc4a57

phase 2.1 implemented parser

9f0717a

phase 2.2, 2.3 analyzer and generator

eb3e4d3

phase 3

45bef6a

refactoring tests

edef1bd

removing todos and basic refactor

5828175

cli tests

7f8e224

clone removal

8abd3e5

adding bumpalo

256d498

making everything work properly with lifetimes and bumpalo

391f159

changing how it works with storage core

6e974b2

completed integration with bump alo

982c0bc

implementing native endianness and migration function

d2b726c

adding one hop

6cbf657

fixing

db5dbd5

adding

82bbe94

adding sindex

7c11b79

adding prints

2a6f23a

adding no input one hop

d52e1f3

fixing

a95f2fc

adding more prints

68621d9

matthewsanetra and others added 25 commits October 22, 2025 19:08

replace match with map_or_else

61948c0

fixing tests

90f0705

fix uuid test

677e3e1

correctly use options for backwards compatibility

ab82c9a

everything works

ac43b33

Merge branch 'arena-implementation' of https://github.com/HelixDB/hel…

5a3e73a

…ix-db into arena-implementation

compiler implementation working 90%

9a15d84

all tests pass

15040f8

fixing value

8c9a758

fixing n from index

a55280d

pushing

cac4ee9

fixing query language for return types.

54e61a7

fixing everything in hql

1a40376

Merge branch 'dev' of https://github.com/HelixDB/helix-db into arena-…

7705641

…implementation

removing queries

647ad2a

sorting issues with serialization

7d87da3

fixing clippy issues

b146b83

fixint is in

5476b26

removing queries

e5b2504

greptile-apps bot reviewed Oct 29, 2025

View reviewed changes

xav-db and others added 3 commits October 29, 2025 14:01

fixing reranking

3d8836c

fixing reranker tests

175dfed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Arena implementation #665

Arena implementation #665

Uh oh!

xav-db commented Oct 20, 2025 •

edited

Loading

Uh oh!

xav-db commented Oct 29, 2025

Uh oh!

greptile-apps bot left a comment

Uh oh!

greptile-apps bot Oct 29, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Arena implementation #665

Are you sure you want to change the base?

Arena implementation #665

Uh oh!

Conversation

xav-db commented Oct 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issues

Checklist when merging to main

Additional Notes

Uh oh!

xav-db commented Oct 29, 2025

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

xav-db commented Oct 20, 2025 •

edited

Loading