Releases: irfanalidv/ragfallback
v2.2.0 — Native async retrieval + CacheMonitor
What's new
Native async retrieval
AdaptiveRAGRetriever.aquery_with_fallback() — real coroutine using LangChain ainvoke(). Enables true concurrent eval in GoldenRunner and production FastAPI backends. Falls back to thread pool automatically if the underlying model doesn't implement ainvoke.
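The "native coroutine first, thread pool otherwise" behaviour described above can be illustrated with a small sketch. This is not the library's implementation: `ainvoke_or_thread`, `AsyncModel`, and `SyncOnlyModel` are hypothetical names used only to show the pattern, and the sketch assumes the model exposes either a blocking `invoke()` or an async `ainvoke()`.

```python
import asyncio
import inspect


async def ainvoke_or_thread(model, query):
    """Await model.ainvoke(query) if present, else run model.invoke in a worker thread."""
    ainvoke = getattr(model, "ainvoke", None)
    if ainvoke is not None and inspect.iscoroutinefunction(ainvoke):
        return await ainvoke(query)
    # Fallback: run the blocking call in a thread so the event loop stays free.
    return await asyncio.to_thread(model.invoke, query)


class AsyncModel:
    """Hypothetical model with a native ainvoke() coroutine."""

    async def ainvoke(self, query):
        return f"async:{query}"


class SyncOnlyModel:
    """Hypothetical model that only offers a blocking invoke()."""

    def invoke(self, query):
        return f"sync:{query}"


async def run_all():
    # Both calls are scheduled concurrently; gather preserves argument order.
    return await asyncio.gather(
        ainvoke_or_thread(AsyncModel(), "q1"),
        ainvoke_or_thread(SyncOnlyModel(), "q2"),
    )


results = asyncio.run(run_all())
print(results)
```

Note that `asyncio.to_thread` requires Python 3.9 or later, which matches the lowest version listed under Numbers.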
CacheMonitor
ragfallback.tracking.CacheMonitor wraps any LangChain retriever and tracks hit rate, per-category latency (hit vs miss), TTL expiry, and LRU eviction. Zero new dependencies — stdlib only. Pass to GoldenRunner via cache_monitor= and cache stats appear in GoldenReport alongside RAGAS scores and P95 latency.
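To make the tracked quantities concrete, here is a stdlib-only sketch of a cache that counts hits, misses, TTL expiries, and LRU evictions, and records latency separately for hits and misses. `TinyCacheMonitor` and its parameters are hypothetical; the real `CacheMonitor` wraps a LangChain retriever and its interface may differ.

```python
import time
from collections import OrderedDict


class TinyCacheMonitor:
    """Hypothetical stand-in: LRU + TTL cache that records basic statistics."""

    def __init__(self, fetch, max_size=2, ttl_seconds=60.0):
        self.fetch = fetch            # function called on a cache miss
        self.max_size = max_size      # LRU capacity
        self.ttl = ttl_seconds        # entry lifetime in seconds
        self.entries = OrderedDict()  # key -> (stored_at, value)
        self.stats = {"hits": 0, "misses": 0, "expired": 0, "evicted": 0}
        self.latency = {"hit": [], "miss": []}

    def get(self, key):
        start = time.perf_counter()
        entry = self.entries.get(key)
        if entry is not None and time.monotonic() - entry[0] >= self.ttl:
            # Entry outlived its TTL: drop it and treat the lookup as a miss.
            del self.entries[key]
            self.stats["expired"] += 1
            entry = None
        if entry is not None:
            self.entries.move_to_end(key)  # mark as most recently used
            self.stats["hits"] += 1
            kind, value = "hit", entry[1]
        else:
            value = self.fetch(key)
            self.entries[key] = (time.monotonic(), value)
            if len(self.entries) > self.max_size:
                self.entries.popitem(last=False)  # evict least recently used
                self.stats["evicted"] += 1
            self.stats["misses"] += 1
            kind = "miss"
        self.latency[kind].append(time.perf_counter() - start)
        return value

    def hit_rate(self):
        total = self.stats["hits"] + self.stats["misses"]
        return self.stats["hits"] / total if total else 0.0


monitor = TinyCacheMonitor(fetch=lambda q: q.upper(), max_size=2)
for q in ["a", "b", "a", "c", "b"]:
    monitor.get(q)
print(monitor.stats, monitor.hit_rate())
```

With a capacity of 2, the sequence above produces one hit (the second "a") and two evictions, which is the kind of signal that suggests the cache is too small for the query mix.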
GoldenRunner upgrade
run_async() now uses native aquery_with_fallback() — 75 queries run concurrently instead of serializing through a thread pool.
Install
pip install ragfallback==2.2.0
Numbers
- 102 unit tests passing (Python 3.9 / 3.10 / 3.11)
- CI regression gate green on SQuAD golden dataset
- 21 new tests added this release
Full changelog
See CHANGELOG.md for complete details.
v2.1.0 — MLOps evaluation layer (RAGAS + CI regression gate + GoldenRunner)
What's new
ragfallback now ships a complete MLOps evaluation layer — something most RAG libraries don't include at all.
ragfallback/mlops/ — new package
GoldenRunner
Runs your retrieval pipeline against a labeled golden dataset (JSON file or list[dict]), tracks per-sample latency, and computes recall@3, recall@5, and P95 latency across all samples. Fully async via asyncio.gather.
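As a rough illustration of per-sample timing with `asyncio.gather` and a P95 summary, consider the sketch below. The names `timed_query`, `run_golden`, and `fake_retrieve` are hypothetical, and the nearest-rank percentile used here is only one common definition; the library may compute P95 differently.

```python
import asyncio
import time


def p95(values):
    """Nearest-rank 95th percentile (one of several common definitions)."""
    if not values:
        raise ValueError("p95 needs at least one value")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]


async def timed_query(retrieve, sample):
    # Measure wall-clock latency for a single golden sample.
    start = time.perf_counter()
    docs = await retrieve(sample["question"])
    return {"docs": docs, "latency": time.perf_counter() - start}


async def run_golden(retrieve, samples):
    # gather runs all samples concurrently and keeps results in input order.
    return await asyncio.gather(*(timed_query(retrieve, s) for s in samples))


async def fake_retrieve(question):
    """Hypothetical retriever that simulates a little I/O wait."""
    await asyncio.sleep(0.01)
    return [f"doc-for-{question}"]


samples = [{"question": f"q{i}"} for i in range(20)]
results = asyncio.run(run_golden(fake_retrieve, samples))
print(len(results), p95([r["latency"] for r in results]))
```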
RagasHook
Wraps RAGAS evaluation — faithfulness, answer relevance, context precision, context recall. Falls back to heuristic scoring if ragas is not installed. No crash, logged warning only.
BaselineRegistry
Stores metric snapshots per dataset in a committed JSON file. compare_or_fail() raises RegressionError if any quality metric drops more than 5%, or P95 latency spikes more than 12% vs the stored baseline.
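The gating rule can be sketched as follows. This is an assumption-laden illustration rather than the library's code: it treats both thresholds as relative changes against the baseline, uses hypothetical metric keys (`recall@5`, `p95_latency`), and defines a local `RegressionError` and a hypothetical `quality_threshold` parameter for self-containment.

```python
class RegressionError(Exception):
    """Local stand-in for the library's exception of the same name."""


def compare_or_fail(baseline, current, quality_threshold=0.05, latency_threshold=0.12):
    """Raise RegressionError if quality drops or P95 latency rises past the thresholds."""
    failures = []
    for name, base in baseline.items():
        new = current[name]
        if name == "p95_latency":
            # Latency regresses when it grows beyond the allowed relative increase.
            if new > base * (1 + latency_threshold):
                failures.append(f"{name}: {base} -> {new}")
        elif new < base * (1 - quality_threshold):
            # Quality metrics regress when they fall beyond the allowed relative drop.
            failures.append(f"{name}: {base} -> {new}")
    if failures:
        raise RegressionError("; ".join(failures))


baseline = {"recall@5": 0.80, "p95_latency": 1.00}

# Within tolerance: recall down 2.5%, latency up 10%.
compare_or_fail(baseline, {"recall@5": 0.78, "p95_latency": 1.10})

# Outside tolerance: recall down 12.5%.
try:
    compare_or_fail(baseline, {"recall@5": 0.70, "p95_latency": 1.00})
    regressed = False
except RegressionError as exc:
    regressed = True
    print("regression:", exc)
```

Keeping latency on a looser threshold than quality reflects that timing is noisier than retrieval metrics on shared CI runners.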
QuerySimulator
Generates adversarial query mixes from any base query set:
- short_keyword: first 2 content words only
- long_nl: expanded with verbose instruction prefix
- ambiguous: proper nouns stripped
- out_of_domain: completely unrelated topic injection
simulate_unhappy_paths() produces all 4 types for every input query (4× expansion).
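A toy version of the four transformations is sketched below to show the 4× expansion. Every heuristic here is an assumption made for illustration (the stopword list, the capitalization test for proper nouns, the instruction prefix, the off-topic sentence); the library's `QuerySimulator` may implement each type quite differently.

```python
# Hypothetical stopword list used to pick "content words".
STOPWORDS = {"what", "was", "is", "the", "of", "in", "a", "an", "for", "to", "did"}


def short_keyword(query):
    # Keep only the first 2 content words, with trailing punctuation removed.
    words = [w.strip("?.,!") for w in query.split()]
    content = [w for w in words if w and w.lower() not in STOPWORDS]
    return " ".join(content[:2])


def long_nl(query):
    # Prepend a verbose instruction prefix (wording is made up).
    return "Please answer the following question in as much detail as possible: " + query


def ambiguous(query):
    # Crude proxy for proper nouns: capitalized words that are not the first word.
    words = query.split()
    return " ".join(w for i, w in enumerate(words) if i == 0 or not w[0].isupper())


def out_of_domain(query):
    # Inject an unrelated topic (the sentence is made up).
    return query + " Also, how often should I water a cactus?"


def simulate_unhappy_paths(queries):
    transforms = [short_keyword, long_nl, ambiguous, out_of_domain]
    return [{"type": t.__name__, "query": t(q)} for q in queries for t in transforms]


variants = simulate_unhappy_paths(["What was the revenue of Acme in 2023?"])
for v in variants:
    print(v["type"], "->", v["query"])
```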
MLflowLogger
Logs all GoldenReport fields as MLflow metrics and params. No-op if mlflow is not installed.
generate_locustfile(output_path, endpoint)
Writes a ready-to-run Locust load test file simulating realistic RAG traffic — short keyword (40%), long NL (20%), out-of-domain (10%).
CI regression gate
A new mlops-regression-gate job runs on every push to main:
- Builds golden dataset from SQuAD (CC BY-SA 4.0, no API key needed)
- Indexes passages in ChromaDB using all-MiniLM-L6-v2 (local, no API key)
- Runs GoldenRunner async across 20 samples
- Calls compare_or_fail() against committed examples/baselines.json
- Exits 0 (pass) or 1 (regression detected)
Bug fixes
- recall_at_k now counts distinct relevant docs in top-k so duplicates cannot push recall above 1.0
- BaselineRegistry.compare_or_fail accepts a separate latency_threshold parameter (default 0.12) for looser P95 gating in noisy CI environments
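The recall fix above can be illustrated with a small comparison. The sketch assumes documents are identified by string IDs; the function names and signatures are illustrative and may not match the library's `recall_at_k`.

```python
def naive_recall_at_k(retrieved_ids, relevant_ids, k):
    # Counts every relevant hit, so a duplicated doc is counted more than once.
    relevant = set(relevant_ids)
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant)
    return hits / len(relevant)


def recall_at_k(retrieved_ids, relevant_ids, k):
    # Counts distinct relevant docs in the top k, so the result stays in [0, 1].
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    found = set(retrieved_ids[:k]) & relevant
    return len(found) / len(relevant)


retrieved = ["d1", "d1", "d2", "d9"]  # "d1" was returned twice
relevant = ["d1", "d2"]

print(naive_recall_at_k(retrieved, relevant, k=3))  # 1.5
print(recall_at_k(retrieved, relevant, k=3))        # 1.0
```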
Install
pip install ragfallback[mlops]
python examples/build_golden_dataset.py
python examples/ci_regression_gate.py
Full changelog
See CHANGELOG.md
v2.0.2 — PyPI Description Fix
ragfallback v2.0.2
What's Changed
- PyPI package description updated — The long description on PyPI now accurately reflects what the library does: intelligent fallback mechanisms for RAG pipelines, including query variation, confidence scoring, cost tracking, and silent failure prevention.
No API changes. Safe to upgrade from v2.0.1.
Install
pip install ragfallback==2.0.2
Or with open-source extras (recommended):
pip install ragfallback[huggingface,sentence-transformers,faiss]==2.0.2
Quick Example
from ragfallback import AdaptiveRAGRetriever
# vector_store, llm, and embeddings are assumed to have been created beforehand
retriever = AdaptiveRAGRetriever(
    vector_store=vector_store,
    llm=llm,
    embedding_model=embeddings,
    fallback_strategy="query_variations",
    max_attempts=3
)
result = retriever.query_with_fallback(question="What is the revenue?")
print(result.answer)      # The answer
print(result.confidence)  # 0.92
print(result.cost)        # $0.0000 (with open-source stack)
Full Changelog: https://github.com/irfanalidv/ragfallback/blob/main/CHANGELOG.md
PyPI: https://pypi.org/project/ragfallback/2.0.2/