A powerful semantic search system for log files that enables natural language queries over your logs using vector embeddings and local LLM integration.
- Multiple Embedding Strategies: Choose among a local SentenceTransformer, Ollama embeddings, or a remote embedding server
- Flexible Storage: Uses ChromaDB for persistent vector storage with configurable paths
- Local LLM Integration: Generates AI responses using Ollama with customizable models
- Interactive Query Interface: Rich terminal interface with markdown rendering
- GPU Acceleration: Optional GPU support for faster embedding generation
- Automatic File Detection: Intelligently detects and indexes all text-based files by content analysis
- Security-First Design: Client-side trust_remote_code management with consent prompts and persistent tracking
- Environment Configuration: Fully configurable via `.env` files
I advise you to run in a venv:

```bash
python -m venv venv
source venv/bin/activate
```

or on Windows:

```bash
python.exe -m venv venv
.\venv\Scripts\Activate
```

then:

```bash
pip install -r requirements.txt
```

Copy the example environment file and customize it:

```bash
cp .env.example .env
```

Edit `.env` to configure your setup:

```
# Ollama Configuration
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=qwen3:8b
OLLAMA_EMBEDDING_MODEL=nomic-embed-text:latest
# Embedding Configuration
EMBEDDING_SERVER=http://localhost:5000
EMBEDDING_MODEL=nomic-ai/nomic-embed-text-v1.5
# Storage Configuration
CHROMA_PATH=./chroma_db
DEFAULT_CHUNK_SIZE=2000
# Default Settings
USE_LOCAL_EMBEDDINGS=true
USE_LOCAL_OLLAMA=true
```

Index a directory with automatic file detection:
```bash
python index.py /path/to/your/files
```

The system will:
- Automatically detect all text-based files by content analysis (see the sketch below)
- Skip binary files and common build/cache directories
- Prompt for trust_remote_code consent if needed for the embedding model
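Content-based detection commonly checks a leading sample of each file; here is a minimal sketch of the idea (the function name is illustrative, and index.py's exact heuristic may differ):

```python
def looks_like_text(path, sample_size=8192):
    """Heuristic: treat a file as text if a leading sample contains
    no NUL bytes and decodes as UTF-8."""
    try:
        with open(path, "rb") as f:
            sample = f.read(sample_size)
    except OSError:
        return False
    if b"\x00" in sample:            # NUL bytes almost always mean binary
        return False
    try:
        sample.decode("utf-8")       # a multibyte char split at the sample
    except UnicodeDecodeError:       # boundary can cause a false negative
        return False
    return True
```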
Or specify the embedding type:

```bash
# Use local SentenceTransformer embeddings (default)
python index.py /path/to/files --local-embeddings

# Use Ollama embeddings
python index.py /path/to/files --ollama-embeddings

# Use remote embedding server
python index.py /path/to/files --remote-embeddings
```

Additional options:
```bash
# Custom model and chunk size
python index.py /path/to/logs --model custom-model --chunk-size 1500

# Custom ChromaDB path
python index.py /path/to/logs --chroma-path ./my_custom_db
```

Start the interactive query interface:
```bash
python ask.py
```

The system will:
- Auto-detect the embedding type used during indexing
- Apply the same trust_remote_code settings for consistency
- Generate responses using Ollama's local LLM
- Generate an output file ask-YYYY-Month-DD-HH-MM.md
Or specify a custom output file:
```bash
python ask.py my_queries.md
```

The system is organized into four main components:

- Unified Indexer (`index.py`)
  - Processes repositories with automatic file detection
  - Supports multiple embedding strategies via handler classes
  - Chunks content into configurable segments (set via DEFAULT_CHUNK_SIZE; see the sketch after this list)
  - Client-side trust_remote_code management
  - Stores embeddings in ChromaDB (collection: `vectors`) with metadata tracking
- Query Interface (`ask.py`)
  - Interactive CLI for natural language queries
  - Auto-detects embedding type and trust settings from metadata
  - Generates responses using Ollama's local LLM
  - Consistent security model with the indexing phase
  - Saves all Q&A pairs with timestamps as markdown files
- Embedding Server (`embedding_server.py`)
  - Optional remote embedding service with GPU support
  - Respects client-side trust_remote_code decisions
  - RESTful API with health checks and server info
  - Dynamic model loading with trust setting caching
  - Supports batch processing and multiple model variants
- Trust Manager (`trust_manager.py`)
  - Centralized security management for trust_remote_code
  - Auto-detection of models requiring remote code execution
  - Interactive consent prompts with risk/benefit explanations
  - Persistent approval tracking in `.env` files
  - CLI tools for managing trust settings
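For illustration, here is a minimal character-based chunking sketch (the indexer's actual splitter may differ, e.g. in overlap or token counting):

```python
import os

def chunk_text(text, chunk_size=None):
    """Split text into fixed-size character chunks.

    chunk_size falls back to the DEFAULT_CHUNK_SIZE environment
    variable (2000 by default), matching the indexer's configuration.
    """
    if chunk_size is None:
        chunk_size = int(os.getenv("DEFAULT_CHUNK_SIZE", "2000"))
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```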
- LocalEmbeddingHandler: Uses SentenceTransformer with automatic GPU detection
- OllamaEmbeddingHandler: Leverages Ollama's embedding API
- RemoteEmbeddingHandler: Connects to remote embedding server with retry logic
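All three handlers expose the same basic shape. Here is a sketch of that interface, with an illustrative local implementation (the `embed` method name and constructor signature are assumptions, not the project's confirmed API):

```python
from typing import List, Protocol

class EmbeddingHandler(Protocol):
    """Shape shared by LocalEmbeddingHandler, OllamaEmbeddingHandler,
    and RemoteEmbeddingHandler (method name is illustrative)."""
    def embed(self, texts: List[str]) -> List[List[float]]: ...

class LocalSentenceTransformerHandler:
    """Illustrative local handler, roughly what LocalEmbeddingHandler does."""
    def __init__(self, model_name: str, trust_remote_code: bool = False):
        from sentence_transformers import SentenceTransformer
        # SentenceTransformer selects a GPU automatically when available;
        # trust_remote_code should come from the trust manager's decision.
        self.model = SentenceTransformer(model_name, trust_remote_code=trust_remote_code)

    def embed(self, texts: List[str]) -> List[List[float]]:
        return self.model.encode(texts).tolist()
```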
| Variable | Description | Default |
|---|---|---|
| `OLLAMA_HOST` | Ollama server URL | `http://localhost:11434` |
| `OLLAMA_MODEL` | LLM model for responses | `qwen3:8b` |
| `OLLAMA_EMBEDDING_MODEL` | Ollama embedding model | `nomic-embed-text:latest` |
| `EMBEDDING_SERVER` | Remote embedding server URL | `http://localhost:5000` |
| `EMBEDDING_MODEL` | Embedding model name | `nomic-ai/nomic-embed-text-v1.5` |
| `CHROMA_PATH` | ChromaDB storage path | `./chroma_db` |
| `DEFAULT_CHUNK_SIZE` | Default text chunk size | `2000` |
| `DEFAULT_OUTPUT_FILE_PREFIX` | Prefix for auto-generated output files | `ask` |
| `DEFAULT_TOP_K` | Default number of results to retrieve | `5` |
| `USE_LOCAL_EMBEDDINGS` | Default embedding strategy | `true` |
| `USE_LOCAL_OLLAMA` | Use local Ollama instance | `true` |
| `TRUST_REMOTE_CODE_*` | Model-specific trust settings | Auto-managed |
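These variables are typically read at startup with python-dotenv; a sketch of the pattern, using the defaults from the table above:

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "qwen3:8b")
CHROMA_PATH = os.getenv("CHROMA_PATH", "./chroma_db")
DEFAULT_CHUNK_SIZE = int(os.getenv("DEFAULT_CHUNK_SIZE", "2000"))
DEFAULT_TOP_K = int(os.getenv("DEFAULT_TOP_K", "5"))
USE_LOCAL_EMBEDDINGS = os.getenv("USE_LOCAL_EMBEDDINGS", "true").lower() == "true"
```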
```
python index.py /path/to/files/ [options]
Options:
--local-embeddings Use local SentenceTransformer (default)
--ollama-embeddings Use Ollama embedding API
--remote-embeddings Use remote embedding server
--model MODEL Override embedding model
--chunk-size SIZE Text chunk size (default from DEFAULT_CHUNK_SIZE env var)
--chroma-path PATH     ChromaDB storage path
```

```
python embedding_server.py [options]
Options:
--host HOST Bind host (default: 0.0.0.0)
--port PORT Bind port (default: 5000)
--model MODEL SentenceTransformer model
--max-length LENGTH Max sequence length (default: 512)
--batch-size SIZE Encoding batch size (default: 32)
--debug                Enable debug mode
```

The system includes a comprehensive security framework for models that require `trust_remote_code=True`. This client-side security system:
- Auto-detects which models likely need remote code execution based on known patterns
- Prompts for informed consent with detailed security warnings
- Persists decisions in `.env` with model-specific hash tracking
- Client-side control: trust decisions are made locally, not on remote servers
- Cross-component consistency: the same security model applies to indexing, querying, and serving
- Detection: System analyzes model names against known patterns
- User Consent: Interactive prompts with clear risk/benefit explanations
- Persistence: Decisions saved locally with model identification hashes (see the sketch below)
- Communication: Client sends trust settings to remote embedding servers
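For illustration, the model-specific keys could be derived like this (the exact hashing scheme used by trust_manager.py is an assumption; the 8-character uppercase suffix matches the `.env` examples further below):

```python
import hashlib

def trust_key(model_name: str) -> str:
    """Derive a stable, model-specific .env key for a trust decision."""
    digest = hashlib.sha256(model_name.encode("utf-8")).hexdigest()
    return f"TRUST_REMOTE_CODE_{digest[:8].upper()}"
```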
```bash
# List all approved/denied models
python trust_manager.py --list

# Check if a specific model needs trust_remote_code
python trust_manager.py --check "nomic-ai/nomic-embed-text-v1.5"
```

When you first use a model requiring remote code execution:
```
==============================================================
SECURITY WARNING: Remote Code Execution
==============================================================
Model: nomic-ai/nomic-embed-text-v1.5
This model may require 'trust_remote_code=True' which allows
the model to execute arbitrary code during loading.
RISKS:
- The model could execute malicious code
- Your system could be compromised
- Data could be stolen or corrupted
BENEFITS:
- Access to newer/specialized models
- Better embedding quality for some models
Your choice will be saved for this model.
==============================================================
Allow remote code execution for this model? [y/N]:
```
Approval decisions are stored in your `.env` file:

```
# Example entries (automatically managed)
# TRUST_REMOTE_CODE_A1B2C3D4_MODEL=nomic-ai/nomic-embed-text-v1.5
TRUST_REMOTE_CODE_A1B2C3D4=true
# TRUST_REMOTE_CODE_E5F6G7H8_MODEL=sentence-transformers/all-MiniLM-L6-v2
TRUST_REMOTE_CODE_E5F6G7H8=false
```

Start the embedding server for distributed setups:
```bash
# Start with GPU acceleration
python embedding_server.py --host 0.0.0.0 --port 5000
# Custom model and settings
python embedding_server.py \
  --model sentence-transformers/all-MiniLM-L6-v2 \
  --max-length 256 \
  --batch-size 64
```

Index multiple folders to the same database:
```bash
# Index first folder
python index.py /path/to/files1 --chroma-path ./shared_db
# Add second folder to same database
python index.py /path/to/files2 --chroma-path ./shared_db
```

Ensure Ollama is running with required models:
```bash
# Install Ollama models
ollama pull qwen3:8b
ollama pull nomic-embed-text:latest
# Start Ollama (usually runs as service)
ollama serve
```

Ollama environment variables can be set in the environment you are running in. On Windows, search for "Environment Variables" and add each name to the "Variable name" field and each value to the "Variable value" field. On most Linux systems, add them to your `.bashrc` (as shown below) or to your systemctl `ollama.service` file as `Environment="OLLAMA_HOST=0.0.0.0"` etc. macOS users can edit their `.zprofile` and add them as below.

```
OLLAMA_CONTEXT_LENGTH=8192
OLLAMA_DEBUG=INFO
OLLAMA_HOST=http://0.0.0.0:11434
OLLAMA_INTEL_GPU=false
OLLAMA_KEEP_ALIVE=3h
OLLAMA_MAX_LOADED_MODELS=1
OLLAMA_MAX_QUEUE=512
```
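In `.bashrc` (or `.zprofile`) form these become export statements, for example:

```bash
export OLLAMA_HOST=http://0.0.0.0:11434
export OLLAMA_KEEP_ALIVE=3h
export OLLAMA_MAX_LOADED_MODELS=1
```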
The embedding server provides RESTful endpoints:

- `POST /embed` - Generate embeddings for text arrays
- `GET /health` - Health check with server status
- `GET /info` - Detailed server and GPU information
Example request:
```bash
curl -X POST http://localhost:5000/embed \
  -H "Content-Type: application/json" \
  -d '{"texts": ["def hello_world():", "print(\"Hello, World!\")"]}'
```
- CUDA/GPU Issues: The system automatically falls back to CPU if GPU is unavailable
- Model Loading: Ensure sufficient RAM/VRAM for embedding models
- Ollama Connection: Verify Ollama is running and accessible at configured host
- ChromaDB Permissions: Ensure write permissions for ChromaDB storage path
The system uses a single standardized configuration:

- Database path: `./chroma_db` (configurable via CHROMA_PATH)
- Collection name: `vectors`
- Metadata tracking for indexing configuration
- chromadb: Vector database for embeddings
- sentence-transformers: Local embedding generation (optional, only needed for local embeddings)
- ollama: LLM client for local inference
- rich: Enhanced terminal output and markdown rendering
- flask: Web server for embedding API
- python-dotenv: Environment configuration management
- tiktoken: Token counting utilities
- einops: Tensor operations for advanced models
- requests: HTTP client for remote services
- pypdf: PDF parsing library
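Taken together, a requirements.txt mirroring this list would look roughly like the following (unpinned; pin versions as needed):

```
chromadb
sentence-transformers
ollama
rich
flask
python-dotenv
tiktoken
einops
requests
pypdf
```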
```
├── index.py            # Unified indexing script
├── ask.py              # Interactive query interface
├── embedding_server.py # Remote embedding server
├── trust_manager.py    # Security: trust_remote_code management
├── requirements.txt    # Python dependencies
├── .env.example        # Environment configuration template
└── chroma_db/          # Default ChromaDB storage (created after indexing)
```
I welcome any assistance on this project, especially around trying new models for better performance and testing against more logs than I have at my disposal!

Please just fork off of dev and then submit a PR.