misakanet-core is a zero-dependency BM25 search engine with RRF fusion, extracted from MisakaNet.

- Pure Python, stdlib only
- BM25 ranking with configurable k1/b
- Metadata-weighted scoring
- RRF (Reciprocal Rank Fusion) for multi-query fusion
- CJK-aware tokenization
- Works in air-gapped environments
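
As a rough illustration of what the BM25 and RRF bullets above refer to, the sketch below implements the textbook Okapi BM25 formula and standard reciprocal rank fusion using only the stdlib. It is not misakanet-core's own code: the names `bm25_scores` and `rrf_fuse`, the defaults `k1=1.5`, `b=0.75`, and `k=60`, and the whitespace tokenization are all assumptions made for the example.

```python
import math
from collections import Counter


def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score each tokenized document in `corpus` against the `query` tokens.

    Hypothetical helper for illustration; not part of misakanet-core.
    """
    if not corpus:
        return []
    n_docs = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / n_docs
    # Document frequency: how many documents contain each term.
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    scores = []
    for doc in corpus:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            if term not in term_freq:
                continue
            df = doc_freq[term]
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = term_freq[term]
            # b scales length normalization; k1 caps term-frequency gains.
            norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + norm)
        scores.append(score)
    return scores


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc ids; each id gains 1 / (k + rank) per list.

    Hypothetical helper for illustration; k=60 is an assumed default.
    """
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs are friends".split(),
]
scores = bm25_scores(["cat"], corpus)
fused = rrf_fuse([["doc1", "doc3"], ["doc2", "doc1"]])
```

Here only the first document contains the exact token `cat`, so it is the only one with a non-zero score, and `doc1` comes first in the fused list because it appears in both rankings.
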

Install:

    pip install misakanet-core

Usage:

    from misakanet_core import BM25, ScoredDocument, tokenize, rrf

    # Prepare corpus
    docs = [
        ScoredDocument("doc1", tokenize("the cat sat on the mat")),
        ScoredDocument("doc2", tokenize("the dog sat on the log")),
        ScoredDocument("doc3", tokenize("cats and dogs are friends")),
    ]

    # Build index and search
    engine = BM25(docs)
    results = engine.search("cat dog", top_k=5)
    for result in results:
        print(f"{result.doc_id}: {result.score:.4f}")

    # Multi-query fusion with RRF
    from misakanet_core import SearchResult, rrf
    query1 = engine.search("cat")
    query2 = engine.search("dog")
    fused = rrf([query1, query2], top_k=3)

| | misakanet-core | elasticsearch | tantivy | whoosh |
|---|---|---|---|---|
| Dependencies | Zero | JVM | Rust toolchain | Pure Python |
| Install time | 0.5s | 5min+ | 2min+ | 2s |
| Air-gapped | ✅ | ❌ | ❌ | ✅ |
| CJK support | ✅ | ✅ | | |

License: MIT