Embedded vector database + living context engine
Part of Hawky.ai — AI-Native Digital Marketing OS
Feather DB is an embedded vector database and living context engine — zero-server, file-based, with a built-in knowledge graph and adaptive memory decay. No separate database server required.
| Capability | Description |
|---|---|
| ANN Search | Approximate nearest-neighbor search via HNSW, typically 0.5–1.5 ms at k=10 |
| Multimodal Pockets | Text, image, audio vectors stored per entity under a single ID |
| Context Graph | Typed + weighted edges, reverse index, auto-link by similarity |
| Living Context | Recall-count-based sticky memory — frequently accessed items resist decay |
| Namespace / Entity / Attributes | Generic partition + subject + KV metadata for any domain |
| Graph Visualizer | Self-contained D3 force-graph HTML — fully offline, no CDN |
| Single-file persistence | .feather binary format (v5); v3/v4 files load transparently |
pip install feather-db

CLI (Rust):
cargo install feather-db-cli

Build from source:
git clone https://github.com/feather-store/feather
cd feather
python setup.py build_ext --inplace

import feather_db
import numpy as np
# Open or create a database
db = feather_db.DB.open("context.feather", dim=768)
# Add a vector with metadata
meta = feather_db.Metadata()
meta.content = "User prefers dark mode"
meta.importance = 0.9
db.add(id=1, vec=np.random.rand(768).astype(np.float32), meta=meta)
# Semantic search
results = db.search(np.random.rand(768).astype(np.float32), k=5)
for r in results:
    print(r.id, r.score, r.metadata.content)
db.save()

Each named modality gets its own independent HNSW index with its own dimensionality. A single entity ID can hold text, visual, and audio vectors simultaneously.
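As a rough mental model of that layout, the sketch below keeps one independent store per modality name, each with its own fixed dimensionality, and lets a single ID appear in several of them. It is not Feather DB's implementation: `PocketStore` and its methods are hypothetical, a brute-force squared-L2 scan stands in for HNSW, and fixing the dimension from the first vector added is an assumption.

```python
from typing import Dict, List, Sequence, Tuple


class PocketStore:
    """Hypothetical stand-in: one brute-force store per modality name."""

    def __init__(self) -> None:
        # modality name -> (dimension, {record id -> vector})
        self._pockets: Dict[str, Tuple[int, Dict[int, List[float]]]] = {}

    def add(self, record_id: int, vec: Sequence[float], modality: str = "text") -> None:
        if modality not in self._pockets:
            # Assumption: the first vector fixes the modality's dimension.
            self._pockets[modality] = (len(vec), {})
        dim, vectors = self._pockets[modality]
        if len(vec) != dim:
            raise ValueError(f"{modality!r} expects dim {dim}, got {len(vec)}")
        vectors[record_id] = list(vec)

    def search(self, query: Sequence[float], k: int = 5,
               modality: str = "text") -> List[Tuple[int, float]]:
        dim, vectors = self._pockets[modality]
        if len(query) != dim:
            raise ValueError(f"{modality!r} expects dim {dim}, got {len(query)}")
        # Exhaustive squared-L2 scan; a real index would use HNSW instead.
        scored = sorted(
            (sum((q - v) ** 2 for q, v in zip(query, vec)), rid)
            for rid, vec in vectors.items()
        )
        return [(rid, dist) for dist, rid in scored[:k]]

    def modalities_of(self, record_id: int) -> List[str]:
        # Every modality in which this ID has a vector.
        return sorted(
            name for name, (_, vectors) in self._pockets.items()
            if record_id in vectors
        )


store = PocketStore()
store.add(42, [0.1, 0.2, 0.3], modality="text")   # 3-dim text pocket
store.add(42, [0.5, 0.5], modality="visual")      # 2-dim visual pocket, same ID
store.add(7, [0.4, 0.6], modality="visual")
nearest = store.search([0.5, 0.4], k=1, modality="visual")
```

Searching one modality never touches the others, which is why each can use a different vector size.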
db.add(id=42, vec=text_vec, modality="text") # 768-dim
db.add(id=42, vec=image_vec, modality="visual") # 512-dim
db.add(id=42, vec=audio_vec, modality="audio") # 256-dim
results = db.search(query_vec, k=10, modality="visual")

Typed, weighted edges between records. Nine built-in relationship types plus free-form strings.
from feather_db import RelType
# Link records with typed relationships
db.link(from_id=1, to_id=2, rel_type=RelType.CAUSED_BY, weight=0.9)
db.link(from_id=1, to_id=3, rel_type=RelType.SUPPORTS, weight=0.7)
# Query graph structure
edges = db.get_edges(1) # outgoing edges
incoming = db.get_incoming(2) # reverse index
# Auto-create edges by vector similarity
db.auto_link(modality="text", threshold=0.85, rel_type=RelType.RELATED_TO)

Built-in relationship types: related_to, derived_from, caused_by, contradicts, supports, precedes, part_of, references, multimodal_of.
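To make the idea behind similarity-based auto-linking concrete, here is a minimal sketch that creates an edge for every pair of vectors whose similarity meets a threshold. It rests on assumptions: the similarity measure is taken to be cosine, pairs are compared exhaustively, and `auto_link_sketch` is a hypothetical helper rather than anything exported by `feather_db`. The library's own metric and search strategy may differ.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def auto_link_sketch(
    vectors: Dict[int, Sequence[float]],
    threshold: float = 0.85,
    rel_type: str = "related_to",
) -> List[Tuple[int, int, str, float]]:
    """Return (from_id, to_id, rel_type, weight) for each similar pair."""
    edges = []
    # Exhaustive O(n^2) comparison, for illustration only.
    for (id_a, vec_a), (id_b, vec_b) in combinations(sorted(vectors.items()), 2):
        similarity = cosine(vec_a, vec_b)
        if similarity >= threshold:
            # Assumption: the similarity doubles as the edge weight.
            edges.append((id_a, id_b, rel_type, similarity))
    return edges


vectors = {1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.0, 1.0]}
edges = auto_link_sketch(vectors, threshold=0.85)
```

Here only records 1 and 2 point in nearly the same direction, so they are the only pair linked.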
One call that combines semantic vector search with n-hop BFS graph traversal:
result = db.context_chain(
query=query_vec,
k=5, # seed nodes from vector search
hops=2, # BFS graph expansion depth
modality="text"
)
for node in result.nodes:
    print(node.id, node.score, node.hop_distance)
for edge in result.edges:
    print(edge.source_id, "->", edge.target_id, edge.rel_type)

Score = similarity × hop_decay × importance × stickiness
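The graph-expansion half of this can be sketched as a bounded breadth-first search from the seed nodes, followed by the score product given above. This is an illustration under assumptions, not the library's code: `expand_hops` and `chain_score` are hypothetical names, traversal follows outgoing edges only, and `hop_decay` is assumed to be a constant base raised to the hop distance (the exact form is not stated here).

```python
from collections import deque
from typing import Dict, Iterable, List


def expand_hops(adjacency: Dict[int, List[int]], seeds: Iterable[int],
                hops: int) -> Dict[int, int]:
    """Map each node reachable within `hops` edges to its hop distance."""
    distance = {seed: 0 for seed in seeds}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        if distance[node] == hops:
            continue  # depth limit reached; do not expand further
        for neighbor in adjacency.get(node, []):
            if neighbor not in distance:
                # BFS guarantees the first visit is the shortest path.
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


def chain_score(similarity: float, hop_distance: int, importance: float,
                stickiness: float, hop_decay_base: float = 0.5) -> float:
    """similarity x hop_decay x importance x stickiness."""
    # Assumed form: hop_decay = hop_decay_base ** hop_distance.
    hop_decay = hop_decay_base ** hop_distance
    return similarity * hop_decay * importance * stickiness


adjacency = {1: [2], 2: [3], 3: [4]}
reached = expand_hops(adjacency, seeds=[1], hops=2)
one_hop = chain_score(similarity=0.8, hop_distance=1, importance=1.0, stickiness=1.0)
```

With `hops=2`, node 4 sits three edges from the seed and is left out of `reached`.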
Generic partitioning for multi-tenant, multi-domain use:
from feather_db import FilterBuilder, MarketingProfile
# Build metadata with domain profile
profile = MarketingProfile()
profile.set_brand("nike")
profile.set_user("user_8821")
profile.set_channel("instagram")
profile.set_ctr(0.045)
meta = profile.to_metadata()
db.add(id=100, vec=vec, meta=meta)
# Filter by namespace + entity + attribute
f = FilterBuilder().namespace("nike").entity("user_8821").attribute("channel", "instagram").build()
results = db.search(query_vec, k=10, filter=f)

Works for any domain (healthcare, e-commerce, finance) by subclassing DomainProfile.
Records accessed more frequently resist temporal decay:
from feather_db import ScoringConfig
cfg = ScoringConfig(half_life=30.0, weight=0.3, min=0.0)
results = db.search(query_vec, k=10, scoring=cfg)

Formula:
stickiness = 1 + log(1 + recall_count)
effective_age = age_in_days / stickiness
recency = 0.5 ^ (effective_age / half_life_days)
final_score = ((1 - time_weight) * similarity + time_weight * recency) * importance
touch() is called automatically on every search hit. Call db.touch(id) manually to boost salience.
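The formula above can be worked through in a few lines of plain Python to see how recall count slows decay. This is a sketch only: `living_score` is a hypothetical helper, `log` is assumed to be the natural logarithm, and the defaults mirror the `half_life=30.0` and `weight=0.3` values from the `ScoringConfig` example.

```python
import math


def stickiness(recall_count: int) -> float:
    # Assumption: "log" in the formula means the natural logarithm.
    return 1.0 + math.log(1.0 + recall_count)


def living_score(similarity: float, age_in_days: float, recall_count: int,
                 importance: float, half_life_days: float = 30.0,
                 time_weight: float = 0.3) -> float:
    """Blend similarity with stickiness-adjusted recency, scaled by importance."""
    effective_age = age_in_days / stickiness(recall_count)
    recency = 0.5 ** (effective_age / half_life_days)
    blended = (1.0 - time_weight) * similarity + time_weight * recency
    return blended * importance


# Two records with the same similarity and age, differing only in recall count.
cold = living_score(similarity=0.8, age_in_days=60.0, recall_count=0, importance=1.0)
warm = living_score(similarity=0.8, age_in_days=60.0, recall_count=20, importance=1.0)
```

The never-recalled record has aged two full half-lives, while the record recalled 20 times has an effective age of roughly 15 days, so `warm` scores higher than `cold`.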
Exports a self-contained, offline D3 force-graph HTML — no CDN, no server:
from feather_db.graph import visualize, export_graph
# Interactive HTML force graph
visualize(db, output_path="/tmp/graph.html")
# JSON for D3 / Cytoscape (namespace-filtered)
data = export_graph(db, namespace_filter="nike")

# D3 / Cytoscape-compatible JSON
json_str = db.export_graph_json(namespace_filter="nike", entity_filter="user_8821")
# Raw vector retrieval
vec = db.get_vector(id=42, modality="text")
ids = db.get_all_ids(modality="visual")
# Metadata update without touching HNSW index
db.update_metadata(id=42, meta=new_meta)
db.update_importance(id=42, importance=0.95)

from feather_db import FilterBuilder
results = db.search(
query_vec, k=10,
filter=FilterBuilder()
.namespace("nike")
.entity("user_8821")
.attribute("channel", "instagram")
.source("pipeline-v1")
.importance_gte(0.5)
.build()
)

import time

meta = feather_db.Metadata()
meta.timestamp = int(time.time()) # Unix timestamp
meta.importance = 0.9 # [0.0–1.0]
meta.type = feather_db.ContextType.FACT # FACT | PREFERENCE | EVENT | CONVERSATION
meta.source = "pipeline-v1"
meta.content = "Human-readable content"
meta.tags_json = '["tag1","tag2"]'
meta.namespace_id = "nike" # partition key
meta.entity_id = "user_8821" # subject key
meta.set_attribute("channel", "instagram") # safe KV setter (use this, not meta.attributes['k']=v)
val = meta.get_attribute("channel")

# Add a record
feather add --db my.feather --id 1 --vec "0.1,0.2,0.3" --modality text
# Search
feather search --db my.feather --vec "0.1,0.2,0.3" --k 5
# Link two records
feather link --db my.feather --from 1 --to 2

| Metric | Value |
|---|---|
| Add rate | 2,000–5,000 vectors/sec |
| Search latency (k=10) | 0.5–1.5 ms |
| Max vectors per modality | 1,000,000 (hardcoded default; raise max_elements in get_or_create_index) |
| HNSW params | M=16, ef_construction=200 |
| File format | Binary .feather v5 |
SIMD (AVX2/AVX512) optimizations are available in space_l2.h. Enable with -DUSE_AVX -march=native in setup.py.
[magic: 4B = "FEAT"] [version: 4B = 5]
--- Metadata Section ---
[meta_count: 4B]
for each record:
  [id: 8B] [serialized Metadata including namespace/entity/attributes/edges]
--- Modality Indices Section ---
[modal_count: 4B]
for each modality:
  [name_len: 2B] [name: N bytes]
  [dim: 4B] [element_count: 4B]
  for each element:
    [id: 8B] [float32 vector: dim * 4 bytes]
v3 and v4 files load transparently — missing fields default to empty.
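For a quick sanity check of a file, the first three fields of the layout above can be read with the standard `struct` module. The sketch assumes little-endian unsigned integers and a magic stored as the literal ASCII bytes `FEAT`; neither is stated in the layout, so verify both against a real file. `read_header` is a hypothetical helper, and it stops before the serialized Metadata records because their encoding is not described here.

```python
import io
import struct
from typing import BinaryIO, Tuple

MAGIC = b"FEAT"
# Assumed encoding: little-endian, 4-byte magic, uint32 version, uint32 meta_count.
HEADER = struct.Struct("<4sII")


def read_header(stream: BinaryIO) -> Tuple[int, int]:
    """Return (version, meta_count) from the first 12 bytes of a stream."""
    raw = stream.read(HEADER.size)
    if len(raw) < HEADER.size:
        raise ValueError("stream too short to hold a .feather header")
    magic, version, meta_count = HEADER.unpack(raw)
    if magic != MAGIC:
        raise ValueError(f"unexpected magic {magic!r}")
    return version, meta_count


# Synthetic header built in memory so the example runs without a real file.
sample = io.BytesIO(HEADER.pack(MAGIC, 5, 3))
version, meta_count = read_header(sample)
```

For a file on disk, pass `open("context.feather", "rb")` in place of the `BytesIO` object.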
| File | Description |
|---|---|
| examples/context_graph_demo.py | Full context graph demo: auto-link, context_chain, D3 HTML export |
| examples/marketing_living_context.py | Multi-brand namespace/entity/attribute filtering + importance feedback |
| examples/feather_inspector.py | Local HTTP inspector: force graph, PCA scatter, edit, delete |
Run any example:
python setup.py build_ext --inplace
python3 examples/context_graph_demo.py

[Generic Core - C++17]
feather::DB
├── modality_indices_ (unordered_map<string, ModalityIndex>) — one HNSW per modality
├── metadata_store_ (unordered_map<uint64_t, Metadata>) — shared metadata by ID
└── Methods: add, search, link, context_chain, auto_link, export_graph_json ...
[Python Layer]
feather_db (pybind11)
├── DB, Metadata, ContextType, ScoringConfig
├── Edge, IncomingEdge, ContextNode, ContextEdge, ContextChainResult
├── FilterBuilder — fluent search filter helper
├── DomainProfile — generic namespace/entity/attributes base class
├── MarketingProfile — digital marketing typed adapter
├── RelType — standard relationship type constants
└── graph.visualize() — D3 force-graph HTML exporter
[Rust CLI]
feather-db-cli (FFI via extern "C" from src/feather_core.cpp)
| Issue | Detail |
|---|---|
| No concurrent writes | HNSW is not thread-safe for simultaneous adds |
| No vector deletion | HNSW marks deletions; data stays until compaction |
| Max 1M vectors/modality | Hardcoded in get_or_create_index; increase max_elements to raise |
| meta.attributes['k'] = v silent no-op | pybind11 map copy; use meta.set_attribute(k, v) |
| tags_json is raw string | Tag filtering uses substring search, not proper JSON parsing |
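The `tags_json` limitation is easiest to see with a small comparison. The sketch below assumes the filter performs a plain substring test on the raw string, which is how the table describes it; the exact matching rule inside the library is not shown here, and both helper names are hypothetical.

```python
import json


def substring_tag_match(tags_json: str, tag: str) -> bool:
    # Assumed behavior of the built-in filter: raw substring test.
    return tag in tags_json


def parsed_tag_match(tags_json: str, tag: str) -> bool:
    # Stricter client-side check: decode the JSON array, compare whole tags.
    return tag in json.loads(tags_json)


tags = '["tag10","summer"]'
loose = substring_tag_match(tags, "tag1")   # True: "tag1" is a prefix of "tag10"
strict = parsed_tag_match(tags, "tag1")     # False: no tag equals "tag1"
```

If false positives like this matter, one option is to post-filter search results with a parsed check, or to choose tag names that are never prefixes of one another.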
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Submit a pull request
See CONTRIBUTING.md for details.
MIT — see LICENSE