konjoai/kyro

Kyro

Production RAG pipeline with hybrid retrieval, reranking, and RAGAS evaluation. No vendor lock-in: plug in OpenAI, Anthropic, or a local Squish server.

Planning Docs

  • PLAN.md: implementation checklist and sprint gates
  • KORE_PLAN.md: strategic roadmap and market analysis
  • kyro_production_plan.md: production rollout and operations plan

Architecture

Documents (PDF/MD/HTML/code)
        │
        ▼
    Ingest & Chunk      RecursiveChunker | SentenceWindowChunker
        │
        ▼
    Embed               sentence-transformers → float32 (384–1536d)
        │
        ▼
    Qdrant Store        cosine similarity index
        │
    ┌───┴───┐
 Dense   Sparse         HNSW search + BM25 (rank-bm25)
    └───┬───┘
        │  Reciprocal Rank Fusion (α=0.7)
        ▼
    Rerank              cross-encoder/ms-marco-MiniLM-L-6-v2
        │
        ▼
    Generate            OpenAI | Anthropic | Squish
        │
        ▼
    Evaluate            RAGAS: faithfulness / relevancy / precision / recall
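The diagram fuses the dense (HNSW) and sparse (BM25) result lists with Reciprocal Rank Fusion at α=0.7, but does not say how α enters the formula. A minimal sketch, assuming α is a convex weight on the two RRF terms, score(d) = α/(k + rank_dense(d)) + (1 − α)/(k + rank_sparse(d)), with the conventional k = 60. The helper name weighted_rrf is hypothetical; this is not Kyro's implementation.

```python
from typing import Dict, List, Sequence, Tuple


def weighted_rrf(
    dense: Sequence[str],
    sparse: Sequence[str],
    alpha: float = 0.7,
    k: int = 60,
) -> List[Tuple[str, float]]:
    """Fuse two ranked lists of doc IDs (best first) into a single ranking.

    Assumption: alpha weights the dense list and (1 - alpha) the sparse list.
    A document missing from one list simply gets no contribution from it.
    """
    scores: Dict[str, float] = {}
    for weight, ranking in ((alpha, dense), (1.0 - alpha, sparse)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    dense_hits = ["d3", "d1", "d7"]   # e.g. top hits from HNSW search
    sparse_hits = ["d1", "d9", "d3"]  # e.g. top hits from BM25
    for doc_id, score in weighted_rrf(dense_hits, sparse_hits):
        print(f"{doc_id}  {score:.5f}")
```

Because RRF only looks at ranks, the cosine and BM25 scores never have to be normalized onto a common scale; the fused list is what would then be passed to the cross-encoder reranker.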

Quickstart

git clone https://github.com/konjoai/kyro.git
cd kyro
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
pip install -e .

cp .env.example .env
# edit .env: set OPENAI_API_KEY and QDRANT_URL

# Start Qdrant (Docker)
docker compose -f docker/docker-compose.yml up qdrant -d

# Ingest a directory
konjoai ingest docs/

# Ask a question
konjoai query "What is the main architecture?"

# Start the API server
konjoai serve

CLI

konjoai ingest <path>     Ingest files/dirs into vector store
konjoai query  <question> Retrieve and answer using indexed documents
konjoai serve             Start FastAPI server (default :8000)
konjoai status            Show collection stats

API

Method  Path          Description
POST    /ingest       Ingest a file or directory
POST    /query        RAG query with optional decomposition + CRAG + Self-RAG reflective critique
POST    /agent/query  Bounded ReAct-style agent query with step trace (steps[])
POST    /eval         RAGAS evaluation over QA samples
GET     /health       Collection health + document count
GET     /metrics      Prometheus exposition (requires otel_enabled=true + pip install prometheus-client)

Multi-tenancy is off by default. Enable it with MULTI_TENANCY_ENABLED=true and JWT_SECRET_KEY=<secret>. When enabled, every request must include an Authorization: Bearer <jwt> header signed with that secret; the token's sub claim is used as tenant_id to scope Qdrant reads and writes.

# Install the JWT dependency (quote the specifier so the shell does not treat >= as a redirect)
pip install "PyJWT>=2.8"

# Example: query as tenant "acme-corp" (the signing key must equal JWT_SECRET_KEY; 'my-secret' here)
TOKEN=$(python3 -c "import jwt, time; print(jwt.encode({'sub':'acme-corp','exp':int(time.time())+3600}, 'my-secret', algorithm='HS256'))")
curl -s -X POST http://localhost:8000/query \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"question":"What is the refund policy?"}'
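To make the sub → tenant_id mapping concrete, here is a minimal standard-library sketch of what a server-side check could look like: verify the HS256 signature and exp, return sub as the tenant_id, and turn it into a Qdrant payload filter. It is not Kyro's code: the helper names are hypothetical, and so is the assumption that chunks carry a tenant_id payload key. In practice PyJWT's jwt.decode(token, key, algorithms=["HS256"]) performs the verification step.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def make_hs256_token(claims: Dict[str, Any], secret: str) -> str:
    """Hypothetical helper equivalent to the PyJWT one-liner above."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret.encode(), f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def tenant_from_authorization(header_value: str, secret: str, now: Optional[float] = None) -> str:
    """Return the tenant_id (the JWT sub claim) or raise PermissionError."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Bearer" or not token:
        raise PermissionError("expected 'Authorization: Bearer <jwt>'")
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(sig_b64)
    except ValueError as exc:  # wrong segment count, bad base64, or bad JSON
        raise PermissionError("malformed JWT") from exc
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise PermissionError("only HS256 is accepted")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        raise PermissionError("invalid signature")
    # Claims are only trusted after the signature check passes.
    if not isinstance(claims, dict) or not claims.get("sub"):
        raise PermissionError("missing sub claim")
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise PermissionError("token expired")
    return claims["sub"]


def tenant_filter(tenant_id: str) -> Dict[str, Any]:
    # Qdrant payload filter in REST/JSON form; the "tenant_id" key is an assumption.
    return {"must": [{"key": "tenant_id", "match": {"value": tenant_id}}]}
```

Every search and upsert for the request would then carry tenant_filter(tenant_id), so one tenant's chunks are never returned to another.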

Interactive API docs are served at http://localhost:8000/docs while konjoai serve is running.

Python SDK

from konjoai.sdk import KonjoClient

client = KonjoClient("http://localhost:8000", api_key="sk-...")

# RAG query
response = client.query("What is the refund policy?", top_k=5)
print(response.answer)
for src in response.sources:
    print(f"  {src.source}  score={src.score:.3f}")

# Streaming tokens
for chunk in client.query_stream("Summarise the onboarding guide"):
    print(chunk.text, end="", flush=True)

# Ingest a directory
result = client.ingest("/path/to/docs")
print(f"Indexed {result.chunks_indexed} chunks from {result.sources_processed} sources")

# Health check
health = client.health()
print(health.status, health.vector_count)

# ReAct agent
agent_result = client.agent_query("Find all compliance requirements", max_steps=5)
print(agent_result.answer)

MCP Server

Expose Kyro to any MCP-compatible agent, such as Claude Desktop:

# Install optional MCP transport
pip install mcp

# Run the MCP server over stdio
python -m konjoai.mcp --base-url http://localhost:8000 --api-key sk-...

Available tools: kyro_query, kyro_ingest, kyro_health, kyro_agent_query.

Query decomposition, CRAG, and Self-RAG can be enabled per request with request-body flags or with headers; the example below sets both:

curl -s -X POST http://localhost:8000/query \
    -H 'Content-Type: application/json' \
    -H 'use_decomposition: true' \
    -H 'use_crag: true' \
    -H 'use_self_rag: true' \
    -d '{"question":"Compare return policy and exchange policy updates by owner","top_k":5,"use_decomposition":true,"use_crag":true,"use_self_rag":true}'
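For Python callers not using the SDK, the same request can be built with only the standard library. This sketch sets the flags in the JSON body, with field names taken from the curl example; build_query_request is a hypothetical helper, not part of konjoai.sdk.

```python
import json
import urllib.request
from typing import Optional


def build_query_request(
    question: str,
    base_url: str = "http://localhost:8000",
    top_k: int = 5,
    use_decomposition: bool = False,
    use_crag: bool = False,
    use_self_rag: bool = False,
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /query request with per-request flags."""
    body = {
        "question": question,
        "top_k": top_k,
        "use_decomposition": use_decomposition,
        "use_crag": use_crag,
        "use_self_rag": use_self_rag,
    }
    headers = {"Content-Type": "application/json"}
    if token is not None:
        # Only needed when MULTI_TENANCY_ENABLED=true.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/query",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Usage against a running server:
#   req = build_query_request("Compare return policy and exchange policy updates by owner",
#                             use_decomposition=True, use_crag=True, use_self_rag=True)
#   with urllib.request.urlopen(req, timeout=30) as resp:
#       print(json.loads(resp.read()))
```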

When decomposition is enabled, the /query response includes:

  • decomposition_used
  • decomposition_sub_queries
  • decomposition_synthesis_hint

When Self-RAG is enabled, /query telemetry includes:

  • self_rag_iteration_scores (ISREL/ISSUP/ISUSE per iteration)
  • self_rag_total_tokens (cumulative generation tokens across iterations)
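The Self-RAG reflection signals (ISREL: is the retrieved context relevant; ISSUP: is the answer supported by it; ISUSE: is the answer useful) suggest a bounded generate-and-critique loop. A hypothetical sketch of how per-iteration scores and a cumulative token count could be collected; the generate/critique callables, the 0–1 score scale, the 0.7 threshold, and the "stop when every score clears the bar" rule are all assumptions, not Kyro's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Scores = Dict[str, float]  # keys "ISREL", "ISSUP", "ISUSE"; the 0-1 scale is an assumption


@dataclass
class SelfRagResult:
    answer: str
    self_rag_iteration_scores: List[Scores] = field(default_factory=list)
    self_rag_total_tokens: int = 0


def self_rag_loop(
    question: str,
    generate: Callable[[str, str], Tuple[str, int]],  # (question, feedback) -> (answer, tokens used)
    critique: Callable[[str, str], Scores],           # (question, answer) -> reflection scores
    max_iterations: int = 3,
    threshold: float = 0.7,
) -> SelfRagResult:
    result = SelfRagResult(answer="")
    feedback = ""
    for _ in range(max_iterations):
        answer, tokens = generate(question, feedback)
        scores = critique(question, answer)
        # Keep the latest answer and accumulate telemetry for every iteration.
        result.answer = answer
        result.self_rag_total_tokens += tokens
        result.self_rag_iteration_scores.append(scores)
        if min(scores.values()) >= threshold:
            break  # every reflection score clears the bar
        weakest = min(scores, key=scores.get)
        feedback = f"previous answer scored low on {weakest}"
    return result
```

Capping iterations keeps the token cost bounded, which is why a cumulative counter like self_rag_total_tokens is worth reporting alongside the scores.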

/agent/query is bounded by request_timeout_seconds (REQUEST_TIMEOUT_SECONDS in .env); requests that exceed this ceiling return HTTP 504.
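A hypothetical sketch of the two bounds on an agent query: a step budget (max_steps) inside a ReAct-style loop, and a wall-clock ceiling applied with asyncio.wait_for, where a timeout maps to HTTP 504. The step callable, the (status, body) return shape, and handle_agent_query are illustrative assumptions; the README does not describe the actual route's internals.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple

# A step receives the trace so far and returns (thought/action note, final answer or None).
Step = Callable[[List[str]], Awaitable[Tuple[str, Optional[str]]]]


async def run_agent(step: Step, max_steps: int) -> Tuple[Optional[str], List[str]]:
    trace: List[str] = []
    for _ in range(max_steps):
        note, answer = await step(trace)
        trace.append(note)
        if answer is not None:
            return answer, trace
    return None, trace  # step budget exhausted without a final answer


async def handle_agent_query(
    step: Step, max_steps: int = 5, request_timeout_seconds: float = 30.0
) -> Tuple[int, dict]:
    """Return (HTTP status, JSON body)."""
    try:
        answer, trace = await asyncio.wait_for(
            run_agent(step, max_steps), timeout=request_timeout_seconds
        )
    except asyncio.TimeoutError:
        # wait_for cancels the running agent before raising.
        return 504, {"detail": "agent query exceeded request_timeout_seconds"}
    return 200, {"answer": answer, "steps": trace}
```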

Configuration

All settings are read from .env (see .env.example):

Variable                  Default                                  Description
QDRANT_URL                http://localhost:6333                    Qdrant instance URL
EMBED_MODEL               sentence-transformers/all-MiniLM-L6-v2   HuggingFace embedding model
EMBED_DEVICE              cpu                                      Set to mps on Apple Silicon
CHUNK_STRATEGY            recursive                                recursive | sentence_window
GENERATOR_BACKEND         openai                                   openai | anthropic | squish
OPENAI_API_KEY            (none)                                   Required for the OpenAI backend
SQUISH_BASE_URL           http://localhost:11434/v1                Local Squish/Ollama endpoint
REQUEST_TIMEOUT_SECONDS   30.0                                     Per-request timeout ceiling for API routes
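A minimal sketch of loading these settings with the defaults from the table and validating the enumerated values. It reads from a mapping such as os.environ; loading the .env file itself is out of scope. Settings and load_settings are hypothetical names, not Kyro's config module.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Allowed values copied from the configuration table.
_CHUNK_STRATEGIES = {"recursive", "sentence_window"}
_BACKENDS = {"openai", "anthropic", "squish"}


@dataclass(frozen=True)
class Settings:
    qdrant_url: str
    embed_model: str
    embed_device: str
    chunk_strategy: str
    generator_backend: str
    openai_api_key: Optional[str]
    squish_base_url: str
    request_timeout_seconds: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    s = Settings(
        qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),
        embed_model=env.get("EMBED_MODEL", "sentence-transformers/all-MiniLM-L6-v2"),
        embed_device=env.get("EMBED_DEVICE", "cpu"),
        chunk_strategy=env.get("CHUNK_STRATEGY", "recursive"),
        generator_backend=env.get("GENERATOR_BACKEND", "openai"),
        openai_api_key=env.get("OPENAI_API_KEY") or None,  # treat empty as unset
        squish_base_url=env.get("SQUISH_BASE_URL", "http://localhost:11434/v1"),
        request_timeout_seconds=float(env.get("REQUEST_TIMEOUT_SECONDS", "30.0")),
    )
    if s.chunk_strategy not in _CHUNK_STRATEGIES:
        raise ValueError(f"CHUNK_STRATEGY must be one of {sorted(_CHUNK_STRATEGIES)}")
    if s.generator_backend not in _BACKENDS:
        raise ValueError(f"GENERATOR_BACKEND must be one of {sorted(_BACKENDS)}")
    if s.generator_backend == "openai" and not s.openai_api_key:
        raise ValueError("OPENAI_API_KEY is required when GENERATOR_BACKEND=openai")
    return s
```

Failing at startup on a bad enum value or a missing key is cheaper than discovering it on the first /query call.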

Evaluation

Kyro ships with RAGAS evaluation gates out of the box:

konjoai serve &
curl -s -X POST http://localhost:8000/eval \
  -H 'Content-Type: application/json' \
  -d '{"samples": [{"question": "...", "answer": "...", "contexts": ["..."], "ground_truth": "..."}]}'

Target benchmarks (Weeks 3–7 gate):

  • faithfulness ≥ 0.75
  • answer_relevancy ≥ 0.80
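One way to enforce these targets in CI is to compare aggregate scores against the thresholds and fail the job on any miss. A minimal sketch, assuming the /eval results have already been reduced to a dict of metric name to aggregate score (the response shape is not documented here); failed_gates is a hypothetical helper.

```python
from typing import Dict, List

# Thresholds from the Weeks 3–7 gate.
GATES: Dict[str, float] = {"faithfulness": 0.75, "answer_relevancy": 0.80}


def failed_gates(scores: Dict[str, float], gates: Dict[str, float] = GATES) -> List[str]:
    """Return the metrics below their threshold; a missing metric counts as a failure."""
    return [metric for metric, minimum in gates.items() if scores.get(metric, float("-inf")) < minimum]
```

A CI step could then exit non-zero whenever failed_gates(scores) is non-empty, e.g. {"faithfulness": 0.81, "answer_relevancy": 0.77} fails only on answer_relevancy.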

License

MIT

About

Kyro: a production RAG pipeline with hybrid retrieval, reranking, and RAGAS evals. Local-first with Squish + Vectro: plug into OpenAI or Anthropic, or go fully offline.
