-
-
Notifications
You must be signed in to change notification settings - Fork 136
Arena implementation #665
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Arena implementation #665
Conversation
…ix-db into arena-implementation
…implementation
This integrates the fix from PR #667 (#667) which adds an .id() method to the ID struct to resolve compilation errors when using AddE with borrowed IDs in FOR loops. Changes: - Added ID.id() method returning u128 in helix-db/src/utils/id.rs - Added test case for AddE with borrowed IDs from FOR loop destructuring The helixc code generator consistently uses .id() to access ID values. Before this change, calling .id() on &ID (borrowed ID) would fail. The new method leverages Rust's auto-deref so both ID.id() and &ID.id() work seamlessly. This fixes queries like: FOR {from_id, to_id, ...} IN edges { AddE<MyEdge>::From(from_id)::To(to_id) } Co-Authored-By: ishaksebsib <[email protected]>
…roperty handling ## Changes ### 1. Added Reserved Property Support - Created `ReservedProp` enum for built-in properties (id, label, version, etc.) - Added `Step::ReservedPropertyAccess` variant to generate direct field access - Reserved properties now return correct Value types (e.g., Value::Id for id field) ### 2. Updated Property Access Generator - Modified `gen_property_access()` to identify reserved vs user-defined properties - Reserved properties use direct struct field access - User-defined properties continue using `get_property()` HashMap lookup ### 3. Enhanced Boolean Operation Optimization - Updated `WhereRef` Display implementation for reserved property optimization - Generates `Value::Id(ID::from(val.id))` for id field in filters - Maintains `get_property()` optimization for user-defined properties ### 4. Fixed IS_IN Validation - Separated IS_IN from scalar boolean operations - IS_IN now correctly expects `Type::Array` arguments with scalar elements - Added `get_reserved_property_type()` helper for type validation - Updated Node/Edge/Vector validation to check reserved properties first ### 5. Testing - Added IS_IN test with both reserved (id) and user-defined (field) properties - Verified correct code generation and type safety ## Impact Resolves the issues from original PR #666: - IS_IN with arrays now validates at compile-time - Reserved property access returns correct Value types for IS_IN operations - Queries like `WHERE(_::{id}::IS_IN(node_ids))` now compile and execute correctly Co-Authored-By: ishaksebsib <[email protected]>
This fix resolves runtime errors when attempting to DROP non-existent nodes, edges, or vectors. Previously, empty traversals would throw "Conversion error: Incorrect Type: Empty" errors. Added TraversalValue::Empty => Ok(()) case to make DROP operations idempotent - dropping non-existent items now succeeds silently. Integrates PR #670 into the arena-implementation branch. Co-Authored-By: ishaksebsib
## Summary This PR integrates the fix from #670 into the `arena-implementation` branch. It resolves runtime errors when attempting to DROP non-existent nodes, edges, or vectors. ## Problem Queries attempting to drop non-existent items failed with: ``` Conversion error: Incorrect Type: Empty ``` This occurred because `drop_traversal` did not handle `TraversalValue::Empty` returned when querying for non-existent items. ## Solution Added `TraversalValue::Empty => Ok(())` case in the match statement of `drop_traversal` in `helix-db/src/helix_engine/traversal_core/ops/util/drop.rs:51`. This treats empty traversals as successful no-ops, making DROP operations idempotent. ## Testing - ✅ `cargo check` passes with no errors - ✅ All 109 `helix_engine` tests pass, including all DROP-related tests ## Related - Integrates fix from PR #670 - No changes needed to helixc generator - the fix automatically applies to generated code Co-Authored-By: ishaksebsib
## Summary This PR integrates the fix from #667 into the arena-implementation branch. It adds an .id() method to the ID struct to resolve compilation errors when using AddE with borrowed IDs in FOR loops. ## Changes - Added ID.id() method in helix-db/src/utils/id.rs returning u128 directly - Added test case in hql-tests/tests/add_e_borrowed_ids/ validating the fix ## Technical Details The helixc code generator consistently generates .id() calls to access ID values. Before this change, calling .id() on &ID (borrowed ID) would fail. This fix leverages Rust's auto-deref so both ID.id() and &ID.id() work seamlessly. ## Testing ✅ cargo check --workspace passes ✅ All 27 ID struct unit tests pass ✅ New test case compiles successfully Co-Authored-By: ishaksebsib
#674) ## Summary This PR integrates the fixes from PR #666 into the current `arena-implementation` branch. PR #666 originally fixed IS_IN validation and runtime handling for reserved properties, but the current implementation has diverged significantly from when that PR was created. ## Problem **Analyzer Issue**: The type validator incorrectly grouped `IS_IN` with other boolean operations, expecting scalar arguments instead of arrays. This caused compilation errors for queries like `nodes::WHERE(_::{id}::IS_IN(node_ids))` when `node_ids` is `[ID]`. **Runtime Issue**: The current implementation removed `filterable.rs` and its special handling for reserved properties (id, label, etc.). Reserved properties are now struct fields, not in the properties HashMap, so `get_property("id")` returns `None`. **Root Cause**: 1. IS_IN validation expects scalars instead of arrays 2. Reserved properties lost special handling during refactoring 3. Type mismatch between `Value::String` and `Value::Id` for ID operations ## Solution ### 1. Added Reserved Property Support - Created `ReservedProp` enum for built-in properties: Id, Label, Version, FromNode, ToNode, Deleted, Level, Distance, Data - Added `Step::ReservedPropertyAccess` variant to generate direct field access - Reserved properties now return correct Value types (e.g., `Value::Id` for id field) ### 2. Updated Property Access Generator - Modified `gen_property_access()` in `analyzer/utils.rs` to identify reserved vs user-defined properties - Reserved properties use direct struct field access: `Value::Id(ID::from(val.id))` - User-defined properties continue using `get_property()` HashMap lookup ### 3. Enhanced Boolean Operation Optimization - Updated `WhereRef` Display implementation in `traversal_steps.rs` for reserved properties - Generates optimized filter code with correct Value types - Example: `Value::Id(ID::from(val.id)).is_in(&data.node_ids)` ### 4. Fixed IS_IN Validation - Separated IS_IN from scalar boolean operations in `traversal_validation.rs` - IS_IN now correctly expects `Type::Array(Box<Type::Scalar(ft)))` arguments - Added `get_reserved_property_type()` helper for type-safe validation - Updated Node/Edge/Vector validation to check reserved properties first ## Files Changed - `helix-db/src/helixc/generator/traversal_steps.rs` - Added ReservedProp enum and Display logic - `helix-db/src/helixc/analyzer/utils.rs` - Updated gen_property_access() - `helix-db/src/helixc/analyzer/methods/traversal_validation.rs` - Fixed IS_IN validation and reserved property checking - `hql-tests/tests/is_in/` - Added test cases for IS_IN with reserved and user-defined properties ## Testing Added comprehensive IS_IN tests: ```hql QUERY GetNodes (fields: [String]) => node <- N<MyNode>::WHERE(_::{field}::IS_IN(fields)) RETURN node QUERY GetNodesByID (node_ids: [ID]) => node <- N<MyNode>::WHERE(_::{id}::IS_IN(node_ids)) RETURN node ``` **Verified**: - ✅ Queries compile successfully - ✅ Correct code generation for reserved properties (direct field access) - ✅ Correct code generation for user-defined properties (get_property) - ✅ Type safety maintained for array operations ## Generated Code Example For reserved property `id`: ```rust .filter_ref(|val, txn|{ if let Ok(val) = val { Ok(Value::Id(ID::from(val.id)).is_in(&data.node_ids)) } else { Ok(false) } }) ``` For user-defined property `field`: ```rust .filter_ref(|val, txn|{ if let Ok(val) = val { Ok(val .get_property("field") .map_or(false, |v| v.is_in(&data.fields))) } else { Ok(false) } }) ``` ## Impact - IS_IN operations now work correctly with arrays at compile-time and runtime - Reserved properties return proper Value types for IS_IN comparisons - Queries like `WHERE(_::{id}::IS_IN(node_ids))` compile and execute without panics - No breaking changes to existing functionality Co-Authored-By: ishaksebsib <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
181 files reviewed, 1 comment
| loop { | ||
| // cont_rx.try_recv() then rx.recv() | ||
|
|
||
| match cont_rx.try_recv() { | ||
| Ok((ret_chan, cfn)) => { | ||
| ret_chan.send(cfn().map_err(Into::into)).expect("todo") | ||
| } | ||
| Err(flume::TryRecvError::Disconnected) => { | ||
| error!("Continuation Channel was dropped") | ||
| } | ||
| Err(flume::TryRecvError::Empty) => {} | ||
| } | ||
|
|
||
| match rx.recv() { | ||
| Ok((req, ret_chan)) => request_mapper( | ||
| req, | ||
| ret_chan, | ||
| graph_access.clone(), | ||
| &router, | ||
| &io_rt, | ||
| &cont_tx, | ||
| ), | ||
| Err(flume::RecvError::Disconnected) => { | ||
| error!("Request Channel was dropped") | ||
| } | ||
| } | ||
| Err(_) => error!("Continuation Channel was dropped"), | ||
| }) | ||
| .recv(&rx, |m| match m { | ||
| Ok((req, ret_chan)) => request_mapper( | ||
| req, | ||
| ret_chan, | ||
| graph_access.clone(), | ||
| &router, | ||
| &io_rt, | ||
| &cont_tx, | ||
| ), | ||
| Err(_) => error!("Request Channel was dropped"), | ||
| }) | ||
| .wait(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Busy loop when cont_rx is empty will cause high CPU usage. After try_recv() returns Empty, the thread immediately blocks on rx.recv(), but this creates a busy-wait pattern where one channel is polled in a tight loop.
Consider adding a small yield or sleep when both channels are empty, or revert to using the selector which handles this efficiently.
Prompt To Fix With AI
This is a comment left during a code review.
Path: helix-db/src/helix_gateway/worker_pool/mod.rs
Line: 110:136
Comment:
**logic:** Busy loop when `cont_rx` is empty will cause high CPU usage. After `try_recv()` returns `Empty`, the thread immediately blocks on `rx.recv()`, but this creates a busy-wait pattern where one channel is polled in a tight loop.
Consider adding a small yield or sleep when both channels are empty, or revert to using the selector which handles this efficiently.
How can I resolve this? If you propose a fix, please make it concise.## Description
<!-- Provide a brief description of the changes in this PR -->
## Related Issues
<!-- Link to any related issues using #issue_number -->
Closes #
## Checklist when merging to main
<!-- Mark items with "x" when completed -->
- [ ] No compiler warnings (if applicable)
- [ ] Code is formatted with `rustfmt`
- [ ] No useless or dead code (if applicable)
- [ ] Code is easy to understand
- [ ] Doc comments are used for all functions, enums, structs, and
fields (where appropriate)
- [ ] All tests pass
- [ ] Performance has not regressed (assuming change was not to fix a
bug)
- [ ] Version number has been updated in `helix-cli/Cargo.toml` and
`helixdb/Cargo.toml`
## Additional Notes
<!-- Add any additional information that would be helpful for reviewers
-->
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
Updated On: 2025-10-12 08:34:56 UTC
<h3>Summary</h3>
This PR adds MMR (Maximal Marginal Relevance) and RRF (Reciprocal Rank
Fusion) rerankers to HelixDB, enabling diversity-aware and multi-source
ranking fusion in search results. The implementation spans the full
stack from core Rust implementations to HQL query language integration.
**Major Changes:**
- **Core Reranker Module** (`helix-db/src/helix_engine/reranker/`):
Complete reranker infrastructure with trait-based design, enabling
pluggable reranking strategies
- **MMR Implementation**: Balances relevance vs. diversity with
configurable lambda parameter (0.0-1.0) and multiple distance metrics
(cosine, euclidean, dot product)
- **RRF Implementation**: Combines multiple ranked lists using
reciprocal rank fusion formula with configurable k parameter
- **HQL Integration**: Full parser, analyzer, and code generator support
for `::RerankRRF()` and `::RerankMMR()` query steps
- **Traversal API**: Fluent `.rerank()` adapter enabling rerankers to
chain naturally in traversal pipelines
- **Comprehensive Tests**: 800+ lines of unit tests in core
implementations plus HQL integration tests covering edge cases and
parameter variations
**Quality Indicators:**
- Well-structured error handling with custom error types
- Extensive test coverage including boundary conditions, empty inputs,
and high-dimensional vectors
- Clear documentation with usage examples and algorithm explanations
- Consistent code style following project conventions
<details><summary><h3>Important Files Changed</h3></summary>
File Analysis
| Filename | Score | Overview |
|----------|-------|----------|
| helix-db/src/helix_engine/reranker/fusion/mmr.rs | 5/5 | Implements
MMR (Maximal Marginal Relevance) reranker with support for multiple
distance metrics (cosine, euclidean, dot product), comprehensive test
coverage, and proper error handling |
| helix-db/src/helix_engine/reranker/fusion/rrf.rs | 5/5 | Implements
RRF (Reciprocal Rank Fusion) reranker for combining multiple ranked
lists, includes static method for multi-list fusion, comprehensive tests
|
| helix-db/src/helix_engine/reranker/reranker.rs | 5/5 | Defines core
`Reranker` trait and helper functions (`extract_score`, `update_score`)
for working with TraversalValue types across different reranker
implementations |
| helix-db/src/helix_engine/reranker/adapters/mod.rs | 5/5 | Provides
`RerankAdapter` trait to integrate reranking into traversal iterator
chains, enabling fluent API like `.rerank(&mmr_reranker, None)` |
| helix-db/src/helixc/parser/graph_step_parse_methods.rs | 5/5 |
Implements parser methods for `RerankRRF` and `RerankMMR` steps,
handling optional parameters and distance metric parsing |
| helix-db/src/helixc/analyzer/methods/traversal_validation.rs | 5/5 |
Adds validation and code generation logic for reranker steps, validating
parameters and generating appropriate Rust code |
| helix-db/src/helixc/generator/traversal_steps.rs | 5/5 | Adds code
generation structs for `RerankRRF` and `RerankMMR` steps with proper
Display implementations that generate correct Rust method calls |
| hql-tests/tests/rerankers/queries.hx | 5/5 | Comprehensive test
queries covering RRF and MMR rerankers with various parameter
combinations, chaining, and edge cases |
</details>
</details>
<details><summary><h3>Sequence Diagram</h3></summary>
```mermaid
sequenceDiagram
participant User as User Query
participant Parser as HQL Parser
participant Analyzer as Type Analyzer
participant Generator as Code Generator
participant Runtime as Runtime Engine
participant Reranker as Reranker (MMR/RRF)
participant Storage as Vector Storage
User->>Parser: SearchV<Document>(vec, 100)::RerankMMR(lambda: 0.7)
Parser->>Parser: parse_step() → RerankMMR
Parser->>Parser: parse_rerank_mmr()
Parser->>Analyzer: AST with RerankMMR step
Analyzer->>Analyzer: validate_traversal()
Analyzer->>Analyzer: Validate lambda parameter
Analyzer->>Analyzer: Validate distance metric
Analyzer->>Generator: Generate RerankMMR code
Generator->>Generator: Create MMRReranker::new() call
Generator->>Generator: Generate .rerank() method
Generator->>Runtime: Compiled query function
User->>Runtime: Execute query with parameters
Runtime->>Storage: search_v(query_vec, 100)
Storage-->>Runtime: Vec<TraversalValue> (100 items)
Runtime->>Reranker: .rerank(&MMRReranker, None)
Reranker->>Reranker: Collect all items
Reranker->>Reranker: extract_score() for each
Reranker->>Reranker: Select first item (highest score)
loop For remaining items
Reranker->>Reranker: Calculate relevance (lambda * score)
Reranker->>Reranker: Calculate max similarity to selected
Reranker->>Reranker: MMR = λ*relevance - (1-λ)*similarity
Reranker->>Reranker: Select item with highest MMR
end
Reranker->>Reranker: update_score() for reranked items
Reranker-->>Runtime: Vec<TraversalValue> (reranked)
Runtime-->>User: Query results (diversified)
```
</details>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Description
Related Issues
Closes #670 #666 #667 #672 #668 #661 #655 #654 #652 #436
Checklist when merging to main
rustfmthelix-cli/Cargo.tomlandhelixdb/Cargo.tomlAdditional Notes