Summary
When running analyze_polarity_axis on graphs with 1,500+ concepts, auto-discovery can be slow because it computes embedding distances across the full concept space.
Observed Behavior
- Graph size: 1,550 concepts, 8,988 relationships
- Polarity axis with auto_discover: true, max_candidates: 30 completed, but took noticeable time
- The analysis itself returned valuable results (18 concepts projected onto the axis)
Proposed Optimizations
1. Bounded Neighborhood Discovery
Instead of searching all concepts, limit auto-discovery to:
- N-hop graph neighbors of the poles (e.g., 2-3 hops)
- Concepts with existing relationships to either pole
- Pre-filter by ontology if specified
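A minimal sketch of bounded discovery, assuming the graph can be loaded as an in-memory adjacency dict (concept ID to neighbor IDs, listed in both directions) and that each concept's ontology is available in a plain dict. The names n_hop_neighbors, bounded_candidates, adjacency and ontology_of are hypothetical; in the real system the traversal would more likely run as a graph query than in Python.

```python
from collections import deque


def n_hop_neighbors(adjacency, poles, max_hops=2):
    """Return concept IDs within max_hops edges of any pole (poles excluded).

    adjacency: hypothetical dict of concept ID -> iterable of neighbor IDs,
    assumed to list each edge in both directions (undirected traversal).
    """
    seen = set(poles)
    frontier = deque((pole, 0) for pole in poles)
    found = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                found.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return found


def bounded_candidates(adjacency, poles, max_hops=2, ontology_of=None, ontology=None):
    """Hop-limited candidates, optionally restricted to a single ontology."""
    candidates = n_hop_neighbors(adjacency, poles, max_hops)
    if ontology is not None:
        lookup = ontology_of or {}
        candidates = {c for c in candidates if lookup.get(c) == ontology}
    return candidates
```

Only the returned set, rather than every concept in the graph, would then need embedding comparisons against the poles.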
2. Early Termination
- Stop discovery once max_candidates is reached with high-confidence matches
- Use a similarity threshold to skip distant concepts early
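A sketch of early termination, assuming candidates arrive as (concept_id, embedding) pairs, ideally ordered so likely matches come first (for example, nearest graph neighbors first), and that relevance is the highest cosine similarity to either pole. The function names and the min_similarity and high_confidence thresholds are hypothetical placeholders, not existing parameters.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def discover_with_early_stop(pole_embeddings, candidates, max_candidates=30,
                             min_similarity=0.5, high_confidence=0.8):
    """Scan candidates lazily and stop after max_candidates confident matches."""
    accepted = []
    confident = 0
    for concept_id, embedding in candidates:
        score = max(cosine(embedding, pole) for pole in pole_embeddings)
        if score < min_similarity:
            continue  # skip distant concepts before any further projection work
        accepted.append((concept_id, score))
        if score >= high_confidence:
            confident += 1
            if confident >= max_candidates:
                break  # enough high-confidence matches: stop scanning
    accepted.sort(key=lambda item: item[1], reverse=True)
    return accepted[:max_candidates]
```

Because candidates are consumed as an iterable, a lazy source (such as a generator over a graph query) is never read past the stopping point.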
3. Batch Embedding Comparison
- If comparing against many candidates, batch the cosine similarity calculations
- Consider using an HNSW index for approximate nearest-neighbor search, if available
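A sketch of batched comparison using NumPy, assuming candidate embeddings can be stacked into one matrix so that a single matrix-vector product replaces a per-candidate Python loop. batch_cosine and top_k are illustrative names. An HNSW index (ADR-055) would replace this exact scan with an approximate search and is not shown here.

```python
import numpy as np


def batch_cosine(query, matrix):
    """Cosine similarity of one query vector against every row of matrix at once."""
    query = np.asarray(query, dtype=float)
    matrix = np.asarray(matrix, dtype=float)
    sims = matrix @ query
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    # All-zero rows get similarity 0.0 instead of a division by zero.
    return np.divide(sims, norms, out=np.zeros_like(sims), where=norms > 0)


def top_k(query, ids, matrix, k):
    """Return the k (id, similarity) pairs with the highest similarity, best first."""
    sims = batch_cosine(query, matrix)
    k = min(k, len(ids))
    if k == 0:
        return []
    idx = np.argpartition(-sims, k - 1)[:k]  # unordered top k in O(n)
    idx = idx[np.argsort(-sims[idx])]        # order only those k
    return [(ids[i], float(sims[i])) for i in idx]
```

argpartition avoids fully sorting every candidate when only max_candidates results are needed.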
4. Caching Pole Embeddings
- Cache the axis vector (positive_embedding - negative_embedding) for reuse
- Pre-compute axis projections for frequently used pole pairs
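A sketch of axis caching, assuming a hypothetical get_embedding(concept_id) lookup. The axis vector follows the issue's definition (positive_embedding - negative_embedding); the scalar projection in project is an assumed formula for illustration, and ADR-070 may normalize or center positions differently.

```python
import math
from functools import lru_cache


def make_axis_cache(get_embedding, maxsize=128):
    """Wrap a hypothetical get_embedding(concept_id) -> sequence of floats
    with a cache keyed by the (positive_id, negative_id) pole pair."""

    @lru_cache(maxsize=maxsize)
    def axis_for(positive_id, negative_id):
        pos = get_embedding(positive_id)
        neg = get_embedding(negative_id)
        # axis vector = positive_embedding - negative_embedding
        return tuple(p - n for p, n in zip(pos, neg))

    return axis_for


def project(embedding, axis):
    """Scalar projection of embedding onto the axis direction (assumed formula)."""
    axis_norm = math.sqrt(sum(a * a for a in axis))
    if axis_norm == 0:
        return 0.0  # identical poles define no direction
    return sum(e * a for e, a in zip(embedding, axis)) / axis_norm
```

Because lru_cache keys on the pole pair, repeated analyses of the same pair skip both embedding lookups; axis_for.cache_clear() would be needed if pole embeddings are regenerated.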
Context
Polarity axis analysis (ADR-070) is a powerful feature for exploring semantic dimensions in the graph. With the Architecture-ADRs ontology (115 files → 1550 concepts), we demonstrated:
- Binary Classification (-27% grounding) ↔ Probabilistic Values (+21% grounding)
- Neo4j (-1% to 0%) ↔ Apache AGE (+2%, 100 related concepts)
The epistemic grounding correctly captured architectural evolution - deprecated approaches have negative grounding, current approaches have positive grounding.
Acceptance Criteria
- Auto-discovery completes in <3 seconds for graphs up to 5,000 concepts
- Add discovery_strategy parameter: "neighborhood" (default), "global", "index"
- Document performance characteristics in ADR-070
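One way the proposed discovery_strategy parameter could be validated, sketched as a string-valued enum with "neighborhood" as the default. The class and function names are hypothetical, and the per-value comments describe the options as proposed in this issue, assuming "global" keeps today's full-space search.

```python
from enum import Enum


class DiscoveryStrategy(str, Enum):
    NEIGHBORHOOD = "neighborhood"  # proposed default: N-hop neighbors of the poles
    GLOBAL = "global"              # assumed: current search over all concepts
    INDEX = "index"                # approximate nearest neighbors via a vector index


def parse_discovery_strategy(value=None):
    """Map a request value to a strategy, defaulting to NEIGHBORHOOD."""
    if value is None:
        return DiscoveryStrategy.NEIGHBORHOOD
    try:
        return DiscoveryStrategy(value)
    except ValueError:
        allowed = ", ".join(s.value for s in DiscoveryStrategy)
        raise ValueError(
            f"unknown discovery_strategy {value!r}; expected one of: {allowed}"
        ) from None
```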
Related
- ADR-070: Polarity Axis Triangulation
- ADR-055: HNSW Vector Indexing (could be leveraged for candidate discovery)