Ikalus1988/misakanet-core

misakanet-core

Zero-dependency BM25 search engine with RRF fusion — extracted from MisakaNet.

  • Pure Python, stdlib only
  • BM25 ranking with configurable k1/b
  • Metadata-weighted scoring
  • RRF (Reciprocal Rank Fusion) for multi-query fusion
  • CJK-aware tokenization
  • Works in air-gapped environments
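
For readers unfamiliar with the k1/b parameters mentioned above, here is a minimal sketch of the textbook Okapi BM25 formula in plain Python. It is not the misakanet-core implementation: the helper name `bm25_scores`, the smoothed IDF variant, and the defaults `k1=1.5`, `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, corpus, k1=1.5, b=0.75):
    """Score every document in `corpus` (a list of token lists) against the query.

    Hypothetical helper for illustration; not part of misakanet-core.
    """
    n_docs = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in corpus for term in set(doc))

    scores = []
    for doc in corpus:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # k1 controls term-frequency saturation; b controls length normalization.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs are friends".split(),
]
print(bm25_scores(["cat", "dog"], corpus))
```

Raising `k1` lets repeated occurrences of a term keep adding to the score for longer; setting `b=0` switches off document-length normalization entirely.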

Installation

pip install misakanet-core

Usage

from misakanet_core import BM25, ScoredDocument, tokenize, rrf

# Prepare corpus
docs = [
    ScoredDocument("doc1", tokenize("the cat sat on the mat")),
    ScoredDocument("doc2", tokenize("the dog sat on the log")),
    ScoredDocument("doc3", tokenize("cats and dogs are friends")),
]

# Build index and search
engine = BM25(docs)
results = engine.search("cat dog", top_k=5)

for result in results:
    print(f"{result.doc_id}: {result.score:.4f}")

# Multi-query fusion with RRF (rrf is already imported above)
query1 = engine.search("cat")
query2 = engine.search("dog")
fused = rrf([query1, query2], top_k=3)
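
To show what Reciprocal Rank Fusion computes, the sketch below fuses two ranked lists of document IDs using the standard formula, where each list contributes `1 / (k + rank)` per document. This is an illustration only, not the library's `rrf` function: the helper name `rrf_fuse`, the conventional constant `k=60`, and the tuple return shape are assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_k=None):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Hypothetical helper for illustration; not the misakanet-core `rrf` function.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit of each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    ordered = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_k]


cat_hits = ["doc1", "doc3"]  # hypothetical ranking for the query "cat"
dog_hits = ["doc2", "doc1"]  # hypothetical ranking for the query "dog"
for doc_id, score in rrf_fuse([cat_hits, dog_hits], top_k=3):
    print(f"{doc_id}: {score:.4f}")
```

Because only ranks enter the formula, RRF can combine result lists whose raw scores are not comparable with each other.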

Why not use elasticsearch / tantivy / whoosh?

|  | misakanet-core | elasticsearch | tantivy | whoosh |
| --- | --- | --- | --- | --- |
| Dependencies | Zero | JVM | Rust toolchain | Pure Python |
| Install time | 0.5s | 5min+ | 2min+ | 2s |
Air-gapped
CJK support ⚠️ ⚠️
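
As a rough idea of what CJK-aware tokenization can involve, the sketch below splits Latin text into lowercase words and turns each run of CJK characters into overlapping bigrams, a common approach for languages written without spaces. It is not the library's `tokenize` function: the name `cjk_tokenize`, the character ranges, and the bigram strategy are all assumptions for illustration.

```python
import re

# CJK Unified Ideographs, Hiragana/Katakana, and Hangul syllables (assumed ranges).
_CJK = r"\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af"
_TOKEN_RE = re.compile(rf"[{_CJK}]+|[a-z0-9]+")
_CJK_RE = re.compile(rf"[{_CJK}]")


def cjk_tokenize(text):
    """Split text into lowercase Latin words and overlapping CJK bigrams.

    Hypothetical helper for illustration; not the misakanet-core `tokenize`.
    """
    tokens = []
    for run in _TOKEN_RE.findall(text.lower()):
        if _CJK_RE.match(run):
            if len(run) == 1:
                # A lone CJK character is kept as a single token.
                tokens.append(run)
            else:
                # Overlapping bigrams, since CJK text has no word delimiters.
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        else:
            tokens.append(run)
    return tokens


print(cjk_tokenize("BM25 搜索引擎"))
```

Bigrams trade a larger index for recall without needing a dictionary-based segmenter, which fits a stdlib-only design.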

License

MIT
