Polarity axis analysis performance optimization for large graphs #180

@aaronsb

Summary

When running analyze_polarity_axis on graphs with 1,500+ concepts, auto-discovery can be slow because embedding distances are calculated across the full concept space.

Observed Behavior

  • Graph size: 1,550 concepts, 8,988 relationships
  • Polarity axis with auto_discover: true, max_candidates: 30 completed but took noticeable time
  • The analysis itself returned valuable results (18 concepts projected onto axis)

Proposed Optimizations

1. Bounded Neighborhood Discovery

Instead of searching all concepts, limit auto-discovery to:

  • N-hop graph neighbors of the poles (e.g., 2-3 hops)
  • Concepts with existing relationships to either pole
  • Pre-filter by ontology if specified
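A minimal sketch of the N-hop restriction, assuming the graph can be read as an adjacency mapping from concept ID to neighbor IDs. The function name `n_hop_neighborhood` and the toy graph are hypothetical and not part of the existing codebase; a real implementation would more likely push this traversal into a graph query.

```python
from collections import deque


def n_hop_neighborhood(adjacency, poles, max_hops=2):
    """Return the concept IDs within max_hops of any pole (poles excluded).

    adjacency: dict mapping concept ID -> iterable of neighbor IDs
               (hypothetical in-memory view of the graph).
    """
    seen = set(poles)
    frontier = deque((pole, 0) for pole in poles)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen - set(poles)


# Toy graph for illustration only.
adjacency = {
    "neo4j": ["graph_db"],
    "apache_age": ["graph_db", "postgres"],
    "graph_db": ["neo4j", "apache_age", "cypher"],
    "postgres": ["apache_age", "sql"],
    "cypher": ["graph_db"],
    "sql": ["postgres", "orm"],
    "orm": ["sql"],
}
candidates = n_hop_neighborhood(adjacency, ["neo4j", "apache_age"], max_hops=2)
```

Here "orm" sits three hops from the nearest pole, so it is never scored. An ontology pre-filter could be applied to the returned set before any embeddings are loaded.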

2. Early Termination

  • Stop discovery once max_candidates is reached with high-confidence matches
  • Use a similarity threshold to skip distant concepts early
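One way the two rules above could combine, as a sketch only. The thresholds `skip_below` and `confident_at` and the function name are assumptions for illustration. Note the trade-off: stopping early makes the result depend on iteration order, so it is not guaranteed to be the global top-k.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def discover_with_early_stop(candidates, pole_embedding, max_candidates=30,
                             skip_below=0.3, confident_at=0.8):
    """Score (concept_id, embedding) pairs until enough confident matches exist.

    skip_below and confident_at are hypothetical thresholds, not existing
    parameters of analyze_polarity_axis.
    """
    kept = []
    confident = 0
    for concept_id, embedding in candidates:
        similarity = cosine(embedding, pole_embedding)
        if similarity < skip_below:
            continue  # distant concept: drop it immediately
        kept.append((concept_id, similarity))
        if similarity >= confident_at:
            confident += 1
            if confident >= max_candidates:
                break  # enough high-confidence matches: stop scanning
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:max_candidates]


pole = [1.0, 0.0]
stream = [
    ("a", [1.0, 0.0]),
    ("b", [0.0, 1.0]),   # orthogonal to the pole, skipped
    ("c", [0.9, 0.1]),
    ("d", [1.0, 0.05]),  # never scored: the loop stops after "c"
]
result = discover_with_early_stop(stream, pole, max_candidates=2)
```

Ordering the candidate stream by graph distance from the poles would make the early stop less arbitrary.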

3. Batch Embedding Comparison

  • If comparing against many candidates, batch the cosine similarity calculations
  • Consider using the HNSW index (ADR-055) for approximate nearest-neighbor search, if available
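A sketch of the batched comparison, assuming candidate embeddings can be stacked into a single NumPy array. The function name is hypothetical. The HNSW path is not shown because the index interface from ADR-055 is not described in this issue.

```python
import numpy as np


def batch_cosine_similarity(query, candidates):
    """Cosine similarity of one query vector against every row of candidates.

    One matrix-vector product replaces a Python loop of per-pair
    comparisons. Zero-length rows get a similarity of 0.0.
    """
    query = np.asarray(query, dtype=np.float64)
    candidates = np.asarray(candidates, dtype=np.float64)
    dots = candidates @ query
    denom = np.linalg.norm(candidates, axis=1) * np.linalg.norm(query)
    return np.divide(dots, denom, out=np.zeros_like(dots), where=denom > 0)


# Toy embeddings for illustration only.
candidate_matrix = [[2.0, 0.0], [0.0, 3.0], [-1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
sims = batch_cosine_similarity([1.0, 0.0], candidate_matrix)
top = np.argsort(-sims)[:2]  # row indices of the two most similar candidates
```

For a few thousand concepts this exact comparison is usually cheap once it is vectorized, so an approximate index may only pay off on larger graphs. That would need to be measured.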

4. Caching Pole Embeddings

  • Cache the axis vector (positive_embedding - negative_embedding) for reuse
  • Pre-compute axis projections for frequently-used pole pairs
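A sketch of caching the axis vector per pole pair with `functools.lru_cache`. The `EMBEDDINGS` store, the function names, the unit-length normalization, and the plain dot-product projection are all assumptions here. ADR-070 may define the projection differently, for example relative to the midpoint of the poles.

```python
import math
from functools import lru_cache

# Hypothetical embedding store: concept ID -> embedding tuple.
EMBEDDINGS = {
    "pos": (1.0, 0.0),
    "neg": (-1.0, 0.0),
    "concept": (0.5, 0.5),
}


@lru_cache(maxsize=256)
def axis_vector(positive_id, negative_id):
    """Return (positive_embedding - negative_embedding), scaled to unit length."""
    pos, neg = EMBEDDINGS[positive_id], EMBEDDINGS[negative_id]
    diff = [p - n for p, n in zip(pos, neg)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("poles have identical embeddings; axis is undefined")
    return tuple(d / norm for d in diff)


def project(concept_id, positive_id, negative_id):
    """Signed position of a concept along the cached axis (dot product)."""
    axis = axis_vector(positive_id, negative_id)
    return sum(c * a for c, a in zip(EMBEDDINGS[concept_id], axis))


first = project("concept", "pos", "neg")   # computes and caches the axis
second = project("concept", "pos", "neg")  # reuses the cached axis
```

If pole embeddings are regenerated, the cache has to be invalidated, for instance with `axis_vector.cache_clear()`.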

Context

Polarity axis analysis (ADR-070) is a powerful feature for exploring semantic dimensions in the graph. With the Architecture-ADRs ontology (115 files → 1,550 concepts), we demonstrated:

  • Binary Classification (-27% grounding) ↔ Probabilistic Values (+21% grounding)
  • Neo4j (-1% to 0%) ↔ Apache AGE (+2%, 100 related concepts)

The epistemic grounding correctly captured architectural evolution: deprecated approaches have negative grounding, and current approaches have positive grounding.

Acceptance Criteria

  • Auto-discovery completes in <3 seconds for graphs up to 5,000 concepts
  • Add discovery_strategy parameter: "neighborhood" (default), "global", "index"
  • Document performance characteristics in ADR-070
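The proposed discovery_strategy parameter could be validated along these lines. This is a sketch only: the enum and helper names are hypothetical, and only the three string values and the default come from the criteria above.

```python
from enum import Enum


class DiscoveryStrategy(str, Enum):
    """The three strategy values proposed in this issue."""
    NEIGHBORHOOD = "neighborhood"
    GLOBAL = "global"
    INDEX = "index"


def resolve_strategy(value=None):
    """Map a raw parameter value to a strategy, defaulting to neighborhood."""
    if value is None:
        return DiscoveryStrategy.NEIGHBORHOOD
    try:
        return DiscoveryStrategy(value)
    except ValueError:
        allowed = ", ".join(s.value for s in DiscoveryStrategy)
        raise ValueError(
            f"unknown discovery_strategy {value!r}; expected one of: {allowed}"
        ) from None
```

Rejecting unknown values up front keeps a typo from silently falling back to the slower global scan.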

Related

  • ADR-070: Polarity Axis Triangulation
  • ADR-055: HNSW Vector Indexing (could be leveraged for candidate discovery)
