A focused RAG (Retrieval-Augmented Generation) system for programming books that lets you query and explore your technical library in natural language.
flowchart LR
PDFs[Technical Books] --> Chunker[Text Chunker]
Chunker --> Embedder[Ollama Embedder]
Embedder --> VectorDB[(Vector DB)]
Query[User Query] --> Semantic[Semantic Search]
VectorDB --> Semantic
Semantic --> LLM[Ollama LLM]
LLM --> Response[Response]
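The flow in the diagram above can be sketched in a few lines of Python. This is an illustration only, not kbol's implementation: `embed` is a toy bag-of-words stand-in for the Ollama embedder, and `retrieve` and `build_prompt` are hypothetical names chosen for this example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for the Ollama embedder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (the semantic search step)."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the text that would be handed to the LLM."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


chunks = [
    "a monad wraps a value and sequences computations",
    "microservices communicate through events and apis",
    "immutability avoids shared mutable state",
]
top = retrieve("what is a monad", chunks, k=1)
prompt = build_prompt("what is a monad", top)
print(prompt)
```

A real embedder returns dense vectors from a model, so the ranking would reflect meaning rather than word overlap, but the retrieve-then-prompt shape stays the same.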
graph TD
CLI[Command Line Interface] --> Core[Core Engine]
Core --> Indexer[Document Indexer]
Core --> Search[Search Engine]
Core --> Embedder[Embedding Service]
Core --> LLM[LLM Service]
Indexer --> Chunker[Text Chunker]
Indexer --> DocTracker[Document Tracker]
Search --> DB[(Vector Database)]
subgraph Processing Pipeline
Chunker --> Embedder
Embedder --> DB
end
subgraph Query Pipeline
Search --> LLM
DB --> Search
end
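For the Search to Vector Database edge, a nearest-neighbour lookup against the `book_chunks` table from this README's schema might look like the sketch below. It assumes pgvector's `<=>` cosine-distance operator and psycopg-style `%s` placeholders; `to_vector_literal`, `SEARCH_SQL`, and `search_params` are hypothetical names, and how kbol actually issues its queries is not shown in the source.

```python
def to_vector_literal(embedding: list[float]) -> str:
    """Format a Python list as a pgvector text literal such as '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# pgvector's <=> operator returns cosine distance, so smaller means closer;
# 1 - distance gives a similarity score for display.
SEARCH_SQL = """
SELECT book_title, page_number, content,
       1 - (embedding <=> %s::vector) AS similarity
FROM book_chunks
ORDER BY embedding <=> %s::vector
LIMIT %s
"""


def search_params(embedding: list[float], k: int = 5) -> tuple[str, str, int]:
    """Build the parameter tuple matching the three placeholders in SEARCH_SQL."""
    literal = to_vector_literal(embedding)
    return (literal, literal, k)


# With a database cursor (assumed, not created here) this would be run as:
#   cursor.execute(SEARCH_SQL, search_params(query_embedding, k=5))
print(search_params([0.1, 0.2, 0.3], k=3))
```

Ordering by the same `vector_cosine_ops` distance the index was built with is what lets PostgreSQL use the `ivfflat` index for this query.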
import asyncio
from pathlib import Path

from kbol.indexer import BookIndexer


async def main() -> None:
    indexer = BookIndexer()
    books_dir = Path("data/books")
    # process_books is a coroutine, so it has to be awaited inside an event loop
    results = await indexer.process_books(books_dir)
    print(f"Processed {len(results)} chunks")


asyncio.run(main())

The example queries below draw on the range of technical and theoretical material in the book collection:
# Technical Architecture Patterns
poetry run kbol query "Compare event-driven, flow-based, and microservice architectural patterns"
poetry run kbol query "What are the key metrics for evaluating microservices architecture success?"
poetry run kbol query "How does Domain-Driven Design evolve when moving from monoliths to microservices?"
# Functional Programming Concepts
poetry run kbol query "Explain monads using examples from Clojure and Scala"
poetry run kbol query "Compare approaches to immutability and state management in Python vs Clojure"
poetry run kbol query "Show patterns for combining functional and reactive programming"
# Cross-Domain Knowledge Synthesis
poetry run kbol query "How do behavioral science insights inform better API design?"
poetry run kbol query "What principles from critical thinking apply to software architecture?"
poetry run kbol query "How do concepts from systems thinking apply across microservices and macroeconomics?"
# Practical Implementation Patterns
poetry run kbol query "Show me Clojure examples of map and reduce with real-world use cases"
poetry run kbol query "What are the best practices for testing event-driven microservices?"
poetry run kbol query "Compare Python and Clojure approaches to handling concurrency"
# API Design and Evolution
poetry run kbol query "What patterns emerge for managing API evolution in microservices?"
poetry run kbol query "How do micro-frontends impact API design and management?"
poetry run kbol query "Compare REST, GraphQL, and event-driven API patterns"

These queries demonstrate the system’s ability to synthesize knowledge across:
- Software architecture and systems design
- Functional and reactive programming paradigms
- Cross-disciplinary insights from philosophy and economics
- Practical development patterns and best practices
- Modern API and integration approaches
In Emacs Org mode, C-c C-v t (org-babel-tangle) tangles these examples into scripts/example_queries.sh, producing a ready-to-run script of the queries above.
# Check ML Engineering fundamentals
time poetry run kbol query "What does the book 'Machine Learning Engineering with Python' cover about MLOps and production deployment?" | tee data/answers/foundations-mlp.md | head
# Verify feature engineering coverage
poetry run kbol query "What practical examples are shown in 'Python Feature Engineering Cookbook'?" | tee data/answers/foundations-fe.md | head
# Compare LLM books
poetry run kbol query "How do 'Building LLM Powered Applications' and the 'LLM Engineer's Handbook' differ in their coverage of LLM implementation?"
# Explore NLP progression
poetry run kbol query "How does 'Mastering NLP from Foundations to LLMs' structure the learning path from basic NLP to advanced LLMs?"
# XGBoost applications
poetry run kbol query "What are the key concepts covered in 'XGBoost for Regression Predictive Modeling' regarding time series analysis?"
# Genetic algorithms
poetry run kbol query "What practical Python examples are provided in 'Hands-On Genetic Algorithms with Python'?"
# PyTorch mastery
poetry run kbol query "What advanced PyTorch concepts are covered in 'Mastering PyTorch' versus basic implementations?"
# Compare implementations
poetry run kbol query "How does 'Machine Learning with PyTorch and Scikit-Learn' approach model development differently from 'Mastering PyTorch'?"
# RAG implementation
poetry run kbol query "How does 'RAG-Driven Generative AI' approach vector databases and embedding strategies?"
# Architecture considerations
poetry run kbol query "What are the main architectural patterns discussed in 'RAG-Driven Generative AI' for building production RAG systems?"
# Compare approaches
poetry run kbol query "Compare the approaches to causal inference between 'Causal Inference and Discovery in Python' and 'Causal Inference in R'"
# Implementation details
poetry run kbol query "What practical examples are provided in 'Causal Inference and Discovery in Python' for causal discovery?"
# Security applications
poetry run kbol query "What cybersecurity use cases are covered in 'Artificial Intelligence for Cybersecurity'?"
# Implementation patterns
poetry run kbol query "What are the main security patterns and frameworks discussed in 'Artificial Intelligence for Cybersecurity'?"
# Math concepts
poetry run kbol query "What statistical concepts from '15 Math Concepts Every Data Scientist Should Know' are applied in 'Bayesian Analysis with Python'?"
# Bayesian applications
poetry run kbol query "How does 'Bayesian Analysis with Python' implement MCMC sampling in practice?"
# Forecasting techniques
poetry run kbol query "How does 'Modern Time Series Forecasting with Python' handle different forecasting techniques?"
# Implementation patterns
poetry run kbol query "What are the main architectural patterns for time series prediction discussed in 'Modern Time Series Forecasting with Python'?"
# Implementation examples
poetry run kbol query "What practical implementations are covered in 'Deep Reinforcement Learning Hands-On'?"
# Advanced concepts
poetry run kbol query "How does 'Deep Reinforcement Learning Hands-On' approach advanced topics like multi-agent systems?"

CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS book_chunks (
id SERIAL PRIMARY KEY,
book_title TEXT NOT NULL,
content TEXT NOT NULL,
embedding vector(384),
page_number INTEGER,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS book_chunks_embedding_idx ON book_chunks
USING ivfflat (embedding vector_cosine_ops);

sequenceDiagram
participant PDF as PDF Books
participant Chunker as Text Chunker
participant Embedder as Ollama Embedder
participant DB as Vector DB
PDF->>Chunker: Raw Text
Chunker->>Chunker: Split into Chunks
loop Each Chunk
Chunker->>Embedder: Text Chunk
Embedder->>Embedder: Generate Embedding
Embedder->>DB: Store Chunk + Embedding
end
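The chunk, embed, and store loop in the sequence diagram can be sketched as follows. The chunk size, the overlap, and the word-based splitting are assumptions made for illustration, `fake_embed` is a placeholder rather than a real model call, and the records are collected in a list instead of being written to the database. The names `chunk_text` and `index_book` are hypothetical and are not kbol's API.

```python
from typing import Callable, Iterator


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> Iterator[str]:
    """Yield chunks of `size` words, each sharing `overlap` words with the previous one."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    for start in range(0, len(words), step):
        yield " ".join(words[start:start + size])
        if start + size >= len(words):
            break  # the last chunk already reached the end of the text


def index_book(
    title: str,
    text: str,
    embed: Callable[[str], list[float]],
    size: int = 200,
    overlap: int = 50,
) -> list[dict]:
    """Embed each chunk and collect rows shaped like the book_chunks table."""
    records = []
    for chunk in chunk_text(text, size, overlap):
        records.append({"book_title": title, "content": chunk, "embedding": embed(chunk)})
    return records


def fake_embed(chunk: str) -> list[float]:
    """Placeholder embedding (NOT a real model): character count and word count."""
    return [float(len(chunk)), float(len(chunk.split()))]


text = " ".join(f"w{i}" for i in range(10))
records = index_book("Sample Book", text, fake_embed, size=4, overlap=1)
for record in records:
    print(record["content"])
```

Overlapping chunks are a common choice because a sentence that straddles a boundary still appears whole in at least one chunk; whether kbol overlaps its chunks is not stated in the source.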
- Set up your environment:
make setup
- Run the complete demo with a sample book:
make demo
- Try some example queries:
# Query about specific topics
poetry run kbol query "Explain monads from the functional programming books"
# Find code examples
poetry run kbol query "Show me Clojure examples of map and reduce"
# Compare concepts
poetry run kbol query "Compare Python and Clojure approaches to immutability"
| Command | Description |
|---|---|
| make setup | Initial setup of development environment |
| make demo | Run complete demo pipeline |
| make load-books | Link books from your collection |
| make process-books | Process books into chunks with embeddings |
| make stats | Show statistics about processed books |
| make clean | Clean generated files and directories |
The system uses a PostgreSQL database with vector similarity search capabilities:
CREATE TABLE book_chunks (
id SERIAL PRIMARY KEY,
book_title TEXT NOT NULL,
content TEXT NOT NULL,
embedding vector(384),
page_number INTEGER,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

License: MIT

Author: Jason Walsh (https://wal.sh)