Skip to content

Conversation

@xav-db
Copy link
Member

@xav-db xav-db commented Oct 20, 2025

Description

Related Issues

Closes #670 #666 #667 #672 #668 #661 #655 #654 #652 #436

Checklist when merging to main

  • No compiler warnings (if applicable)
  • Code is formatted with rustfmt
  • No useless or dead code (if applicable)
  • Code is easy to understand
  • Doc comments are used for all functions, enums, structs, and fields (where appropriate)
  • All tests pass
  • Performance has not regressed (assuming change was not to fix a bug)
  • Version number has been updated in helix-cli/Cargo.toml and helixdb/Cargo.toml

Additional Notes

matthewsanetra and others added 25 commits October 22, 2025 19:08
This integrates the fix from PR #667 (#667) which adds an .id() method to the ID struct to resolve compilation errors when using AddE with borrowed IDs in FOR loops.

Changes:
- Added ID.id() method returning u128 in helix-db/src/utils/id.rs
- Added test case for AddE with borrowed IDs from FOR loop destructuring

The helixc code generator consistently uses .id() to access ID values. Before this change, calling .id() on &ID (borrowed ID) would fail. The new method leverages Rust's auto-deref so both ID.id() and &ID.id() work seamlessly.

This fixes queries like:
  FOR {from_id, to_id, ...} IN edges {
    AddE<MyEdge>::From(from_id)::To(to_id)
  }

Co-Authored-By: ishaksebsib <[email protected]>
…roperty handling

## Changes

### 1. Added Reserved Property Support
- Created `ReservedProp` enum for built-in properties (id, label, version, etc.)
- Added `Step::ReservedPropertyAccess` variant to generate direct field access
- Reserved properties now return correct Value types (e.g., Value::Id for id field)

### 2. Updated Property Access Generator
- Modified `gen_property_access()` to identify reserved vs user-defined properties
- Reserved properties use direct struct field access
- User-defined properties continue using `get_property()` HashMap lookup

### 3. Enhanced Boolean Operation Optimization
- Updated `WhereRef` Display implementation for reserved property optimization
- Generates `Value::Id(ID::from(val.id))` for id field in filters
- Maintains `get_property()` optimization for user-defined properties

### 4. Fixed IS_IN Validation
- Separated IS_IN from scalar boolean operations
- IS_IN now correctly expects `Type::Array` arguments with scalar elements
- Added `get_reserved_property_type()` helper for type validation
- Updated Node/Edge/Vector validation to check reserved properties first

### 5. Testing
- Added IS_IN test with both reserved (id) and user-defined (field) properties
- Verified correct code generation and type safety

## Impact
Resolves the issues from original PR #666:
- IS_IN with arrays now validates at compile-time
- Reserved property access returns correct Value types for IS_IN operations
- Queries like `WHERE(_::{id}::IS_IN(node_ids))` now compile and execute correctly

Co-Authored-By: ishaksebsib <[email protected]>
This fix resolves runtime errors when attempting to DROP non-existent
nodes, edges, or vectors. Previously, empty traversals would throw
"Conversion error: Incorrect Type: Empty" errors.

Added TraversalValue::Empty => Ok(()) case to make DROP operations
idempotent - dropping non-existent items now succeeds silently.

Integrates PR #670 into the arena-implementation branch.

Co-Authored-By: ishaksebsib
## Summary

This PR integrates the fix from #670 into the `arena-implementation`
branch. It resolves runtime errors when attempting to DROP non-existent
nodes, edges, or vectors.

## Problem

Queries attempting to drop non-existent items failed with:
```
Conversion error: Incorrect Type: Empty
```

This occurred because `drop_traversal` did not handle
`TraversalValue::Empty` returned when querying for non-existent items.

## Solution

Added `TraversalValue::Empty => Ok(())` case in the match statement of
`drop_traversal` in
`helix-db/src/helix_engine/traversal_core/ops/util/drop.rs:51`.

This treats empty traversals as successful no-ops, making DROP
operations idempotent.

## Testing

- ✅ `cargo check` passes with no errors
- ✅ All 109 `helix_engine` tests pass, including all DROP-related tests

## Related

- Integrates fix from PR #670
- No changes needed to helixc generator - the fix automatically applies
to generated code

Co-Authored-By: ishaksebsib
## Summary

This PR integrates the fix from #667 into the arena-implementation
branch. It adds an .id() method to the ID struct to resolve compilation
errors when using AddE with borrowed IDs in FOR loops.

## Changes

- Added ID.id() method in helix-db/src/utils/id.rs returning u128
directly
- Added test case in hql-tests/tests/add_e_borrowed_ids/ validating the
fix

## Technical Details

The helixc code generator consistently generates .id() calls to access
ID values. Before this change, calling .id() on &ID (borrowed ID) would
fail. This fix leverages Rust's auto-deref so both ID.id() and &ID.id()
work seamlessly.

## Testing

✅ cargo check --workspace passes
✅ All 27 ID struct unit tests pass  
✅ New test case compiles successfully

Co-Authored-By: ishaksebsib
#674)

## Summary

This PR integrates the fixes from PR #666 into the current
`arena-implementation` branch. PR #666 originally fixed IS_IN validation
and runtime handling for reserved properties, but the current
implementation has diverged significantly from when that PR was created.

## Problem

**Analyzer Issue**: The type validator incorrectly grouped `IS_IN` with
other boolean operations, expecting scalar arguments instead of arrays.
This caused compilation errors for queries like
`nodes::WHERE(_::{id}::IS_IN(node_ids))` when `node_ids` is `[ID]`.

**Runtime Issue**: The current implementation removed `filterable.rs`
and its special handling for reserved properties (id, label, etc.).
Reserved properties are now struct fields, not in the properties
HashMap, so `get_property("id")` returns `None`.

**Root Cause**: 
1. IS_IN validation expects scalars instead of arrays
2. Reserved properties lost special handling during refactoring
3. Type mismatch between `Value::String` and `Value::Id` for ID
operations

## Solution

### 1. Added Reserved Property Support
- Created `ReservedProp` enum for built-in properties: Id, Label,
Version, FromNode, ToNode, Deleted, Level, Distance, Data
- Added `Step::ReservedPropertyAccess` variant to generate direct field
access
- Reserved properties now return correct Value types (e.g., `Value::Id`
for id field)

### 2. Updated Property Access Generator
- Modified `gen_property_access()` in `analyzer/utils.rs` to identify
reserved vs user-defined properties
- Reserved properties use direct struct field access:
`Value::Id(ID::from(val.id))`
- User-defined properties continue using `get_property()` HashMap lookup

### 3. Enhanced Boolean Operation Optimization
- Updated `WhereRef` Display implementation in `traversal_steps.rs` for
reserved properties
- Generates optimized filter code with correct Value types
- Example: `Value::Id(ID::from(val.id)).is_in(&data.node_ids)`

### 4. Fixed IS_IN Validation
- Separated IS_IN from scalar boolean operations in
`traversal_validation.rs`
- IS_IN now correctly expects `Type::Array(Box<Type::Scalar(ft)))`
arguments
- Added `get_reserved_property_type()` helper for type-safe validation
- Updated Node/Edge/Vector validation to check reserved properties first

## Files Changed

- `helix-db/src/helixc/generator/traversal_steps.rs` - Added
ReservedProp enum and Display logic
- `helix-db/src/helixc/analyzer/utils.rs` - Updated
gen_property_access()
- `helix-db/src/helixc/analyzer/methods/traversal_validation.rs` - Fixed
IS_IN validation and reserved property checking
- `hql-tests/tests/is_in/` - Added test cases for IS_IN with reserved
and user-defined properties

## Testing

Added comprehensive IS_IN tests:
```hql
QUERY GetNodes (fields: [String]) =>
    node <- N<MyNode>::WHERE(_::{field}::IS_IN(fields))
    RETURN node

QUERY GetNodesByID (node_ids: [ID]) =>
    node <- N<MyNode>::WHERE(_::{id}::IS_IN(node_ids))
    RETURN node
```

**Verified**:
- ✅ Queries compile successfully
- ✅ Correct code generation for reserved properties (direct field
access)
- ✅ Correct code generation for user-defined properties (get_property)
- ✅ Type safety maintained for array operations

## Generated Code Example

For reserved property `id`:
```rust
.filter_ref(|val, txn|{
    if let Ok(val) = val {
        Ok(Value::Id(ID::from(val.id)).is_in(&data.node_ids))
    } else {
        Ok(false)
    }
})
```

For user-defined property `field`:
```rust
.filter_ref(|val, txn|{
    if let Ok(val) = val {
        Ok(val
        .get_property("field")
        .map_or(false, |v| v.is_in(&data.fields)))
    } else {
        Ok(false)
    }
})
```

## Impact

- IS_IN operations now work correctly with arrays at compile-time and
runtime
- Reserved properties return proper Value types for IS_IN comparisons
- Queries like `WHERE(_::{id}::IS_IN(node_ids))` compile and execute
without panics
- No breaking changes to existing functionality

Co-Authored-By: ishaksebsib <[email protected]>
@xav-db
Copy link
Member Author

xav-db commented Oct 29, 2025

@greptile

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

181 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment on lines +110 to +136
loop {
// cont_rx.try_recv() then rx.recv()

match cont_rx.try_recv() {
Ok((ret_chan, cfn)) => {
ret_chan.send(cfn().map_err(Into::into)).expect("todo")
}
Err(flume::TryRecvError::Disconnected) => {
error!("Continuation Channel was dropped")
}
Err(flume::TryRecvError::Empty) => {}
}

match rx.recv() {
Ok((req, ret_chan)) => request_mapper(
req,
ret_chan,
graph_access.clone(),
&router,
&io_rt,
&cont_tx,
),
Err(flume::RecvError::Disconnected) => {
error!("Request Channel was dropped")
}
}
Err(_) => error!("Continuation Channel was dropped"),
})
.recv(&rx, |m| match m {
Ok((req, ret_chan)) => request_mapper(
req,
ret_chan,
graph_access.clone(),
&router,
&io_rt,
&cont_tx,
),
Err(_) => error!("Request Channel was dropped"),
})
.wait();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Busy loop when cont_rx is empty will cause high CPU usage. After try_recv() returns Empty, the thread immediately blocks on rx.recv(), but this creates a busy-wait pattern where one channel is polled in a tight loop.

Consider adding a small yield or sleep when both channels are empty, or revert to using the selector which handles this efficiently.

Prompt To Fix With AI
This is a comment left during a code review.
Path: helix-db/src/helix_gateway/worker_pool/mod.rs
Line: 110:136

Comment:
**logic:** Busy loop when `cont_rx` is empty will cause high CPU usage. After `try_recv()` returns `Empty`, the thread immediately blocks on `rx.recv()`, but this creates a busy-wait pattern where one channel is polled in a tight loop.

Consider adding a small yield or sleep when both channels are empty, or revert to using the selector which handles this efficiently.

How can I resolve this? If you propose a fix, please make it concise.

xav-db and others added 3 commits October 29, 2025 14:01
## Description
<!-- Provide a brief description of the changes in this PR -->

## Related Issues
<!-- Link to any related issues using #issue_number -->

Closes #

## Checklist when merging to main
<!-- Mark items with "x" when completed -->

- [ ] No compiler warnings (if applicable)
- [ ] Code is formatted with `rustfmt`
- [ ] No useless or dead code (if applicable)
- [ ] Code is easy to understand
- [ ] Doc comments are used for all functions, enums, structs, and
fields (where appropriate)
- [ ] All tests pass
- [ ] Performance has not regressed (assuming change was not to fix a
bug)
- [ ] Version number has been updated in `helix-cli/Cargo.toml` and
`helixdb/Cargo.toml`

## Additional Notes
<!-- Add any additional information that would be helpful for reviewers
-->

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

Updated On: 2025-10-12 08:34:56 UTC

<h3>Summary</h3>

This PR adds MMR (Maximal Marginal Relevance) and RRF (Reciprocal Rank
Fusion) rerankers to HelixDB, enabling diversity-aware and multi-source
ranking fusion in search results. The implementation spans the full
stack from core Rust implementations to HQL query language integration.

**Major Changes:**
- **Core Reranker Module** (`helix-db/src/helix_engine/reranker/`):
Complete reranker infrastructure with trait-based design, enabling
pluggable reranking strategies
- **MMR Implementation**: Balances relevance vs. diversity with
configurable lambda parameter (0.0-1.0) and multiple distance metrics
(cosine, euclidean, dot product)
- **RRF Implementation**: Combines multiple ranked lists using
reciprocal rank fusion formula with configurable k parameter
- **HQL Integration**: Full parser, analyzer, and code generator support
for `::RerankRRF()` and `::RerankMMR()` query steps
- **Traversal API**: Fluent `.rerank()` adapter enabling rerankers to
chain naturally in traversal pipelines
- **Comprehensive Tests**: 800+ lines of unit tests in core
implementations plus HQL integration tests covering edge cases and
parameter variations

**Quality Indicators:**
- Well-structured error handling with custom error types
- Extensive test coverage including boundary conditions, empty inputs,
and high-dimensional vectors
- Clear documentation with usage examples and algorithm explanations
- Consistent code style following project conventions

<details><summary><h3>Important Files Changed</h3></summary>



File Analysis



| Filename | Score | Overview |
|----------|-------|----------|
| helix-db/src/helix_engine/reranker/fusion/mmr.rs | 5/5 | Implements
MMR (Maximal Marginal Relevance) reranker with support for multiple
distance metrics (cosine, euclidean, dot product), comprehensive test
coverage, and proper error handling |
| helix-db/src/helix_engine/reranker/fusion/rrf.rs | 5/5 | Implements
RRF (Reciprocal Rank Fusion) reranker for combining multiple ranked
lists, includes static method for multi-list fusion, comprehensive tests
|
| helix-db/src/helix_engine/reranker/reranker.rs | 5/5 | Defines core
`Reranker` trait and helper functions (`extract_score`, `update_score`)
for working with TraversalValue types across different reranker
implementations |
| helix-db/src/helix_engine/reranker/adapters/mod.rs | 5/5 | Provides
`RerankAdapter` trait to integrate reranking into traversal iterator
chains, enabling fluent API like `.rerank(&mmr_reranker, None)` |
| helix-db/src/helixc/parser/graph_step_parse_methods.rs | 5/5 |
Implements parser methods for `RerankRRF` and `RerankMMR` steps,
handling optional parameters and distance metric parsing |
| helix-db/src/helixc/analyzer/methods/traversal_validation.rs | 5/5 |
Adds validation and code generation logic for reranker steps, validating
parameters and generating appropriate Rust code |
| helix-db/src/helixc/generator/traversal_steps.rs | 5/5 | Adds code
generation structs for `RerankRRF` and `RerankMMR` steps with proper
Display implementations that generate correct Rust method calls |
| hql-tests/tests/rerankers/queries.hx | 5/5 | Comprehensive test
queries covering RRF and MMR rerankers with various parameter
combinations, chaining, and edge cases |

</details>


</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant User as User Query
    participant Parser as HQL Parser
    participant Analyzer as Type Analyzer
    participant Generator as Code Generator
    participant Runtime as Runtime Engine
    participant Reranker as Reranker (MMR/RRF)
    participant Storage as Vector Storage

    User->>Parser: SearchV<Document>(vec, 100)::RerankMMR(lambda: 0.7)
    Parser->>Parser: parse_step() → RerankMMR
    Parser->>Parser: parse_rerank_mmr()
    Parser->>Analyzer: AST with RerankMMR step
    
    Analyzer->>Analyzer: validate_traversal()
    Analyzer->>Analyzer: Validate lambda parameter
    Analyzer->>Analyzer: Validate distance metric
    Analyzer->>Generator: Generate RerankMMR code
    
    Generator->>Generator: Create MMRReranker::new() call
    Generator->>Generator: Generate .rerank() method
    Generator->>Runtime: Compiled query function
    
    User->>Runtime: Execute query with parameters
    Runtime->>Storage: search_v(query_vec, 100)
    Storage-->>Runtime: Vec<TraversalValue> (100 items)
    
    Runtime->>Reranker: .rerank(&MMRReranker, None)
    Reranker->>Reranker: Collect all items
    Reranker->>Reranker: extract_score() for each
    Reranker->>Reranker: Select first item (highest score)
    
    loop For remaining items
        Reranker->>Reranker: Calculate relevance (lambda * score)
        Reranker->>Reranker: Calculate max similarity to selected
        Reranker->>Reranker: MMR = λ*relevance - (1-λ)*similarity
        Reranker->>Reranker: Select item with highest MMR
    end
    
    Reranker->>Reranker: update_score() for reranked items
    Reranker-->>Runtime: Vec<TraversalValue> (reranked)
    Runtime-->>User: Query results (diversified)
```
</details>


<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants