kno-sdk

A Python library for cloning, indexing, and semantically searching Git repositories using embeddings (OpenAI or SBERT) and Chroma, plus a high-level agent_query function that runs an autonomous code-analysis agent.

🚀 Features

Clone or update any Git repository with a single call
Extract semantic code chunks via Tree-Sitter grammars (functions, classes, methods, etc.)
Fallback to line-based chunking for unsupported languages or large files
Embed code or text with your choice of:
- OpenAI's text-embedding-ada-002 via OpenAIEmbeddings
- Local SBERT model (e.g. microsoft/graphcodebert-base) via SBERTEmbeddings
Persist vector store in a .kno/ folder using Chroma
Auto-commit & push the embedding database back to your repo
Fast similarity search over indexed code chunks
Autonomous agent for code analysis via agent_query()

📦 Installation

pip install kno-sdk

🏁 Quickstart

from kno_sdk import clone_and_index, search, EmbeddingMethod

# 1. Clone (or pull) and index a repository
repo_index = clone_and_index(
    repo_url="https://github.com/SyedGhazanferAnwar/NestJs-MovieApp",
    branch="master",
    embedding=EmbeddingMethod.SBERT,   # or EmbeddingMethod.OPENAI
    cloned_repo_base_dir="repos"       # where to clone locally
)
print("Indexed at:", repo_index.path)
print("Directory snapshot:\n", repo_index.digest)

# 2. Perform semantic search
results = search(
    repo_url="https://github.com/SyedGhazanferAnwar/NestJs-MovieApp",
    branch="master",
    embedding=EmbeddingMethod.SBERT,
    cloned_repo_base_dir="repos",
    query="NestFactory",
    k=5
)
for i, chunk in enumerate(results, 1):
    print(f"--- Result #{i} ---\n{chunk}\n")

# 3. Autonomous Code-Analysis Agent
from kno_sdk import agent_query, EmbeddingMethod, LLMProvider

# First create a repo index
repo_index = clone_and_index(
    repo_url="https://github.com/WebGoat/WebGoat",
    branch="main",
    embedding=EmbeddingMethod.SBERT,
    cloned_repo_base_dir="repos"
)

# Then use the index with agent_query
result = agent_query(
    repo_index=repo_index,
    llm_provider=LLMProvider.ANTHROPIC,
    llm_model="claude-3-haiku-20240307",
    llm_temperature=0.0,
    llm_max_tokens=4096,
    llm_system_prompt="You are a senior code-analysis agent.",
    prompt="Find issues, bugs and vulnerabilities in this repo, and explain each with exact code locations.",
    MODEL_API_KEY="your_api_key_here"
)

print(result)

📖 API Reference

clone_and_index(...) → RepoIndex

Clone (or pull) a repository, embed its files, and persist a Chroma database in the .kno/ folder. Finally, commit and push the .kno/ folder back to the original repo.

def clone_and_index(
    repo_url: str,
    branch: str = "main",
    embedding: EmbeddingMethod = EmbeddingMethod.SBERT,
    cloned_repo_base_dir: str = "."
) -> RepoIndex

repo_url — Git HTTPS/SSH URL
branch — branch to clone or update (default: main)
embedding — EmbeddingMethod.OPENAI or EmbeddingMethod.SBERT
cloned_repo_base_dir — local directory to clone into (default: current working dir)

Returns a RepoIndex object with:

path: pathlib.Path — local clone directory
digest: str — textual snapshot of the directory tree
vector_store: Chroma — the Chroma collection instance

search(...) → List[str]

Run a similarity search on an existing .kno/ Chroma database.

def search(
    repo_url: str,
    branch: str = "main",
    embedding: EmbeddingMethod = EmbeddingMethod.SBERT,
    query: str = "",
    k: int = 8,
    cloned_repo_base_dir: str = "."
) -> List[str]

query — your natural-language or code search prompt
k — number of top results to return

Returns a list of the top-k matching code/text chunks.
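
To make "top-k matching chunks" concrete, here is a minimal, self-contained sketch of ranking by cosine similarity. It is only an illustration of the idea: the real search is delegated to Chroma, whose distance metric and indexing are not described in this README, and the names cosine_similarity and top_k are hypothetical helpers, not part of kno-sdk.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int = 8,
) -> List[str]:
    """Return the text of the k chunks whose embeddings are most similar to query_vec."""
    ranked = sorted(
        chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]
```

Each entry in chunks pairs a chunk's text with its embedding vector; a larger k trades precision for recall.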

agent_query(...) → str

High-level agent that clones, indexes, and then iteratively uses tools (search_code, read_file, etc.) plus an LLM to fulfill your prompt.

def agent_query(
    repo_url: str,
    branch: str = "main",
    embedding: EmbeddingMethod = EmbeddingMethod.SBERT,
    cloned_repo_base_dir: str = str(Path.cwd()),
    llm_provider: LLMProvider = LLMProvider.ANTHROPIC,
    llm_model: str = "claude-3-haiku-20240307",
    llm_temperature: float = 0.0,
    llm_max_tokens: int = 4096,
    llm_system_prompt: str = "",
    prompt: str = "",
    MODEL_API_KEY: str = "",
) -> str

repo_url, branch, embedding, cloned_repo_base_dir — same as above
llm_provider — LLMProvider.OPENAI or LLMProvider.ANTHROPIC
llm_model — model name (e.g. "gpt-4" or "claude-3-haiku-20240307")
llm_temperature, llm_max_tokens — sampling params
llm_system_prompt — initial system message for the agent
prompt — your user query/task description
MODEL_API_KEY — sets OPENAI_API_KEY or ANTHROPIC_API_KEY

Returns the agent's Final Answer as a string.
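
The iterate-until-"Final Answer:" behaviour described above can be sketched as a small loop. This is a hedged illustration, not the library's implementation: run_agent, the "tool_name: argument" reply format, and the max_iterations default are all assumptions made for the example, and the LLM is passed in as a plain callable so the sketch runs without any API key.

```python
from typing import Callable, Dict, List

FINAL_MARKER = "Final Answer:"


def run_agent(
    llm: Callable[[List[str]], str],
    tools: Dict[str, Callable[[str], str]],
    prompt: str,
    max_iterations: int = 5,
) -> str:
    """Alternate LLM replies and tool calls until a final answer or the iteration cap."""
    transcript = [prompt]
    for _ in range(max_iterations):
        reply = llm(transcript)
        transcript.append(reply)
        if FINAL_MARKER in reply:
            # Everything after the marker is treated as the answer.
            return reply.split(FINAL_MARKER, 1)[1].strip()
        # Assumed (hypothetical) tool-call format: "<tool_name>: <argument>"
        name, _, arg = reply.partition(":")
        tool = tools.get(name.strip())
        if tool is None:
            observation = f"Unknown tool: {name.strip()}"
        else:
            observation = tool(arg.strip())
        transcript.append(f"Observation: {observation}")
    return "Stopped: reached max iterations without a final answer."
```

Feeding tool results back into the transcript lets the next LLM call plan with the new evidence, and the iteration cap bounds cost when the model never emits the marker.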

EmbeddingMethod

class EmbeddingMethod(str, Enum):
    OPENAI = "OpenAIEmbeddings"
    SBERT  = "SBERTEmbeddings"

Choose between OpenAI's hosted embeddings or a local SBERT model.
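
Because EmbeddingMethod mixes in str, its members compare equal to their string values, which is convenient when the choice comes from a config file or CLI flag. The sketch below repeats the enum definition from above so that it runs on its own; parse_embedding is a hypothetical helper, not something kno-sdk is known to export.

```python
from enum import Enum


class EmbeddingMethod(str, Enum):
    OPENAI = "OpenAIEmbeddings"
    SBERT = "SBERTEmbeddings"


def parse_embedding(name: str) -> EmbeddingMethod:
    """Hypothetical helper: accept a member name ("sbert") or a value ("SBERTEmbeddings")."""
    try:
        return EmbeddingMethod[name.upper()]  # lookup by member name
    except KeyError:
        return EmbeddingMethod(name)  # lookup by value; raises ValueError if unknown
```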

🔍 How It Works

Clone or Pull
- Uses GitPython to clone depth-1 or pull the latest changes.
Directory Snapshot
- Builds a small "digest" of files/folders (up to ~1 K tokens).
Chunk Extraction
- Tree-sitter for language-aware extraction of functions, classes, etc.
- Fallback to fixed-size line chunks for unknown languages or large files.
Embedding
- Streams each chunk into your chosen embedding backend.
- Respects a 16 000-token cap per chunk.
Vector Store
- Persists embeddings in a namespaced Chroma collection under .kno/.
- Only indexes files once (skips already-populated collections).
Commit & Push
- Automatically stages, commits, and pushes .kno/ back to your remote.
Autonomous Agent
- RAG prompt
- Tool calls (search_code, read_file, …)
- Iterative LLM planning & execution
- Stops on "Final Answer:" or max iterations
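
The line-based fallback from the Chunk Extraction step can be sketched as follows. This is a minimal illustration under stated assumptions: chunk_by_lines and the default of 40 lines per chunk are hypothetical, since the README does not state the actual chunk size or function name.

```python
from typing import List


def chunk_by_lines(text: str, lines_per_chunk: int = 40) -> List[str]:
    """Split text into consecutive chunks of at most lines_per_chunk lines each."""
    if lines_per_chunk <= 0:
        raise ValueError("lines_per_chunk must be positive")
    lines = text.splitlines()
    return [
        "\n".join(lines[start:start + lines_per_chunk])
        for start in range(0, len(lines), lines_per_chunk)
    ]
```

Fixed-size chunks ignore code structure, which is why this is only a fallback when no Tree-sitter grammar applies or the file is too large.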

⚙️ Configuration

Skip directories: .git, node_modules, build, dist, target, .vscode, .kno
Skip files: package-lock.json, yarn.lock, .prettierignore
Binary extensions: common image, audio, video, archive, font, and binary file types

All of the above can be modified by forking the source and adjusting the skip_dirs, skip_files, and BINARY_EXTS sets.
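
As a rough picture of how such skip sets might be applied, the sketch below filters repository-relative paths. The directory and file names are the ones listed above; the BINARY_EXTS entries are an illustrative subset chosen for the example, and should_index is a hypothetical function, not the library's own.

```python
from pathlib import PurePosixPath

SKIP_DIRS = {".git", "node_modules", "build", "dist", "target", ".vscode", ".kno"}
SKIP_FILES = {"package-lock.json", "yarn.lock", ".prettierignore"}
# Assumption: a small illustrative subset; the real set covers many more types.
BINARY_EXTS = {".png", ".jpg", ".zip", ".woff", ".mp3", ".mp4"}


def should_index(relative_path: str) -> bool:
    """Return True if a repo-relative POSIX path passes all three skip filters."""
    path = PurePosixPath(relative_path)
    # Reject files that live under any skipped directory.
    if any(part in SKIP_DIRS for part in path.parts[:-1]):
        return False
    if path.name in SKIP_FILES:
        return False
    return path.suffix.lower() not in BINARY_EXTS
```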

🔧 Dependencies

🤝 Contributing

Fork this repo
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

Please run pytest before submitting and follow the existing code style.
