Skip to content

feather-store/feather

Repository files navigation

Feather DB

Embedded vector database + living context engine

Part of Hawky.ai — AI-Native Digital Marketing OS

PyPI Crates.io License: MIT Website

Feather DB is an embedded vector database and living context engine — zero-server, file-based, with a built-in knowledge graph and adaptive memory decay. No separate database server required.


What's Inside (v0.5.0)

Capability Description
ANN Search Sub-millisecond approximate nearest-neighbor search via HNSW
Multimodal Pockets Text, image, audio vectors stored per entity under a single ID
Context Graph Typed + weighted edges, reverse index, auto-link by similarity
Living Context Recall-count-based sticky memory — frequently accessed items resist decay
Namespace / Entity / Attributes Generic partition + subject + KV metadata for any domain
Graph Visualizer Self-contained D3 force-graph HTML — fully offline, no CDN
Single-file persistence .feather binary format (v5); v3/v4 files load transparently

Installation

pip install feather-db

CLI (Rust):

cargo install feather-db-cli

Build from source:

git clone https://github.com/feather-store/feather
cd feather
python setup.py build_ext --inplace

Quick Start

import feather_db
import numpy as np

# Open or create a database
db = feather_db.DB.open("context.feather", dim=768)

# Add a vector with metadata
meta = feather_db.Metadata()
meta.content = "User prefers dark mode"
meta.importance = 0.9
db.add(id=1, vec=np.random.rand(768).astype(np.float32), meta=meta)

# Semantic search
results = db.search(np.random.rand(768).astype(np.float32), k=5)
for r in results:
    print(r.id, r.score, r.metadata.content)

db.save()

Core Features

Multimodal Pockets

Each named modality gets its own independent HNSW index with its own dimensionality. A single entity ID can hold text, visual, and audio vectors simultaneously.

db.add(id=42, vec=text_vec,   modality="text")    # 768-dim
db.add(id=42, vec=image_vec,  modality="visual")  # 512-dim
db.add(id=42, vec=audio_vec,  modality="audio")   # 256-dim

results = db.search(query_vec, k=10, modality="visual")

Context Graph

Typed, weighted edges between records. Nine built-in relationship types plus free-form strings.

from feather_db import RelType

# Link records with typed relationships
db.link(from_id=1, to_id=2, rel_type=RelType.CAUSED_BY, weight=0.9)
db.link(from_id=1, to_id=3, rel_type=RelType.SUPPORTS,  weight=0.7)

# Query graph structure
edges    = db.get_edges(1)          # outgoing edges
incoming = db.get_incoming(2)       # reverse index

# Auto-create edges by vector similarity
db.auto_link(modality="text", threshold=0.85, rel_type=RelType.RELATED_TO)

Built-in relationship types: related_to, derived_from, caused_by, contradicts, supports, precedes, part_of, references, multimodal_of.

Context Chain (Vector Search + Graph Expansion)

One call that combines semantic vector search with n-hop BFS graph traversal:

result = db.context_chain(
    query=query_vec,
    k=5,           # seed nodes from vector search
    hops=2,        # BFS graph expansion depth
    modality="text"
)

for node in result.nodes:
    print(node.id, node.score, node.hop_distance)

for edge in result.edges:
    print(edge.source_id, "->", edge.target_id, edge.rel_type)

Score = similarity × hop_decay × importance × stickiness

Namespace / Entity / Attributes

Generic partitioning for multi-tenant, multi-domain use:

from feather_db import FilterBuilder, MarketingProfile

# Build metadata with domain profile
profile = feather_db.MarketingProfile()
profile.set_brand("nike")
profile.set_user("user_8821")
profile.set_channel("instagram")
profile.set_ctr(0.045)
meta = profile.to_metadata()

db.add(id=100, vec=vec, meta=meta)

# Filter by namespace + entity + attribute
f = FilterBuilder().namespace("nike").entity("user_8821").attribute("channel", "instagram").build()
results = db.search(query_vec, k=10, filter=f)

Works for any domain — healthcare, e-commerce, finance — by subclassing DomainProfile.

Living Context / Adaptive Decay

Records accessed more frequently resist temporal decay:

from feather_db import ScoringConfig

cfg = ScoringConfig(half_life=30.0, weight=0.3, min=0.0)
results = db.search(query_vec, k=10, scoring=cfg)

Formula:

stickiness    = 1 + log(1 + recall_count)
effective_age = age_in_days / stickiness
recency       = 0.5 ^ (effective_age / half_life_days)
final_score   = ((1 - time_weight) * similarity + time_weight * recency) * importance

touch() is called automatically on every search hit. Call db.touch(id) manually to boost salience.

Graph Visualization

Exports a self-contained, offline D3 force-graph HTML — no CDN, no server:

from feather_db.graph import visualize, export_graph

# Interactive HTML force graph
visualize(db, output_path="/tmp/graph.html")

# JSON for D3 / Cytoscape (namespace-filtered)
data = export_graph(db, namespace_filter="nike")

Import / Export

# D3 / Cytoscape-compatible JSON
json_str = db.export_graph_json(namespace_filter="nike", entity_filter="user_8821")

# Raw vector retrieval
vec   = db.get_vector(id=42, modality="text")
ids   = db.get_all_ids(modality="visual")

# Metadata update without touching HNSW index
db.update_metadata(id=42, meta=new_meta)
db.update_importance(id=42, importance=0.95)

Filtered Search

from feather_db import FilterBuilder

results = db.search(
    query_vec, k=10,
    filter=FilterBuilder()
        .namespace("nike")
        .entity("user_8821")
        .attribute("channel", "instagram")
        .source("pipeline-v1")
        .importance_gte(0.5)
        .build()
)

Metadata Fields

meta = feather_db.Metadata()
meta.timestamp      = int(time.time())    # Unix timestamp
meta.importance     = 0.9                 # [0.0–1.0]
meta.type           = feather_db.ContextType.FACT  # FACT | PREFERENCE | EVENT | CONVERSATION
meta.source         = "pipeline-v1"
meta.content        = "Human-readable content"
meta.tags_json      = '["tag1","tag2"]'
meta.namespace_id   = "nike"             # partition key
meta.entity_id      = "user_8821"        # subject key
meta.set_attribute("channel", "instagram")   # safe KV setter (use this, not meta.attributes['k']=v)
val = meta.get_attribute("channel")

Rust CLI

# Add a record
feather add --db my.feather --id 1 --vec "0.1,0.2,0.3" --modality text

# Search
feather search --db my.feather --vec "0.1,0.2,0.3" --k 5

# Link two records
feather link --db my.feather --from 1 --to 2

Performance

Metric Value
Add rate 2,000–5,000 vectors/sec
Search latency (k=10) 0.5–1.5 ms
Max vectors per modality 1,000,000 (configurable)
HNSW params M=16, ef_construction=200
File format Binary .feather v5

SIMD (AVX2/AVX512) optimizations are available in space_l2.h. Enable with -DUSE_AVX -march=native in setup.py.


File Format

[magic: 4B = "FEAT"] [version: 4B = 5]
--- Metadata Section ---
[meta_count: 4B]
  for each record:
    [id: 8B] [serialized Metadata including namespace/entity/attributes/edges]
--- Modality Indices Section ---
[modal_count: 4B]
  for each modality:
    [name_len: 2B] [name: N bytes]
    [dim: 4B] [element_count: 4B]
    for each element:
      [id: 8B] [float32 vector: dim * 4 bytes]

v3 and v4 files load transparently — missing fields default to empty.


Examples

File Description
examples/context_graph_demo.py Full context graph demo — auto-link, context_chain, D3 HTML export
examples/marketing_living_context.py Multi-brand namespace/entity/attribute filtering + importance feedback
examples/feather_inspector.py Local HTTP inspector — force graph, PCA scatter, edit, delete

Run any example:

python setup.py build_ext --inplace
python3 examples/context_graph_demo.py

Architecture

[Generic Core — C++17]
feather::DB
  ├── modality_indices_  (unordered_map<string, ModalityIndex>)  — one HNSW per modality
  ├── metadata_store_    (unordered_map<uint64_t, Metadata>)     — shared metadata by ID
  └── Methods: add, search, link, context_chain, auto_link, export_graph_json ...

[Python Layer]
feather_db (pybind11)
  ├── DB, Metadata, ContextType, ScoringConfig
  ├── Edge, IncomingEdge, ContextNode, ContextEdge, ContextChainResult
  ├── FilterBuilder       — fluent search filter helper
  ├── DomainProfile       — generic namespace/entity/attributes base class
  ├── MarketingProfile    — digital marketing typed adapter
  ├── RelType             — standard relationship type constants
  └── graph.visualize()   — D3 force-graph HTML exporter

[Rust CLI]
feather-db-cli (FFI via extern "C" from src/feather_core.cpp)

Known Limitations

Issue Detail
No concurrent writes HNSW is not thread-safe for simultaneous adds
No vector deletion HNSW marks deletions; data stays until compaction
Max 1M vectors/modality Hardcoded in get_or_create_index; increase max_elements to raise
meta.attributes['k'] = v silent no-op pybind11 map copy; use meta.set_attribute(k, v)
tags_json is raw string Tag filtering uses substring search, not proper JSON parsing

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with tests
  4. Submit a pull request

See CONTRIBUTING.md for details.


License

MIT — see LICENSE


Acknowledgments

About

Embedded vector database + living context engine Part of Hawky.ai — AI-Native Digital Marketing OS

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors