Filesystem Finder Agent

An AI agent that takes a keyword, natural-language query, or file, then searches across all folders/drives you point it to and returns:

Exact locations (paths) that match
Ranked results (keyword + semantic)
A markdown report (auto-generated) summarizing what it found
Optional JSON with structured hits (path, snippet, score, tags)

Goal: Make “where is X in my mess of files?” effortless—whether X is a filename, phrase inside documents, a particular code construct, or a concept.

Core Features

Hybrid Search
- Literal: filename, extension, exact-phrase, regex
- Semantic: embeddings + vector search for concepts (e.g., "performance tuning" -> finds perf_notes.md, relevant PRs)
- Metadata filters: size, modified date, owner, extension, repo name
Multi-Location Scanning
- Local paths (multiple roots)
- External mounts (e.g., /Volumes/*)
- (Optional later) Cloud connectors (S3, GDrive) via provider SDKs
Agentic Planning (Claude)
- Planner agent (interprets the user’s intent, decides strategies: literal, semantic, code-aware)
- Executor tools (filesystem walker, indexer, vector DB, previewer)
- Synthesizer (writes a readable report.md and a machine-friendly results.json)
Code-Aware Search (optional)
- Language-aware heuristics (function/class names, imports, dependency graphs)
- Parse ASTs for precise matches (e.g., TypeScript, Python, Java)
Auto Markdown Report
- For each query, produce reports/<timestamp>_<slug>.md with:
  - Summary of intent
  - What was searched (locations, filters)
  - Top N hits with context snippets
  - Reasoning notes from the agent
  - Next-step suggestions
Privacy & Safety
- Explicit allowlist of roots
- Ignore patterns (.gitignore-style)
- Dry-run & preview modes

Architecture (High-level)

User -> CLI/UI -> Orchestrator -> Planner (Claude) -> Tool Router -> {Indexer, FS Walker, VectorSearch, Preview}
                                                    -> Synthesizer (Claude) -> report.md + results.json

Orchestrator: Manages a run; feeds context to Claude; caches partial results
Planner (Claude): Converts intent to steps & tool calls
Tools:
- FS Walker: walks directories, respects ignore rules, yields file paths, basic metadata
- Content Extractor: text from files (txt, md, code, pdf via OCR/lib later)
- Indexer: builds an index (keyword + embeddings) and stores in a local vector DB (e.g., LanceDB, SQLite+FAISS, Chroma)
- VectorSearch: nearest-neighbor semantic lookup
- Previewer: gathers snippets around matches
Synthesizer (Claude): Given raw hits + user prompt, writes a clean markdown summary and optional JSON

Data Flow

Ingest (one-time or on demand)
- Walk roots -> extract text -> chunk -> embed -> upsert to vector DB
Plan
- Claude decides which search modes to use based on the prompt/file
Search
- Literal + semantic queries
- Combine & rerank
Synthesize
- Claude writes report.md and saves results.json

Example Prompts

"Find any mention of JWT secrets or .env files modified last month in /Users/sneh/projects and /Volumes/WorkArchive."
"Where did I store the doc about Stripe webhook retries?"
"Given this PDF (upload), find related meeting notes and summarize decisions."

CLI Sketch

finder \
  --roots "/Users/sneh/projects,/Volumes/WorkArchive" \
  --query "stripe webhook retries" \
  --filters "ext:md,txt;mtime:30d" \
  --out ./runs

# Or with a seed file:
finder --roots "/data" --file ./docs/contract.pdf --out ./runs

Sample Output Tree

runs/
  2025-08-26_stripe-webhook-retries/
    report.md
    results.json
    logs/

Tech Stack

Language: Python or TypeScript (choose your comfort)
LLM: Claude (Anthropic API)
Embeddings: Claude embeddings or open-source (e.g., BGE-small) if offline is needed
Vector DB: LanceDB / FAISS / Chroma (local)
Parsing: textract/pypdf (pdf), python-magic (mime), tree-sitter (code, optional)
Config: .finderagent.yaml for roots, ignores, db path

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
config		config
docs		docs
src		src
tests		tests
.gitignore		.gitignore
activate.sh		activate.sh
pyproject.toml		pyproject.toml
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Filesystem Finder Agent

Core Features

Architecture (High-level)

Data Flow

Example Prompts

CLI Sketch

Sample Output Tree

Tech Stack

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Filesystem Finder Agent

Core Features

Architecture (High-level)

Data Flow

Example Prompts

CLI Sketch

Sample Output Tree

Tech Stack

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages