# Add comprehensive Copilot instructions with Python environment, server testing, and git clang-format #15286
**Merged**: CISC merged 17 commits into `master` from `copilot/fix-f1657426-6764-497e-98bd-bd55dd7cf87a` on Aug 21, 2025.
**Commits (17):**
- `ac7a930` Initial plan (Copilot)
- `5a604ea` Initialize copilot instructions exploration (Copilot)
- `d67115e` Add comprehensive .github/copilot-instructions.md file (Copilot)
- `ffcbda3` Update Python environment and tools directory documentation (Copilot)
- `edd92b5` add more python dependencies to .venv (CISC)
- `ac144ad` Update copilot instructions: add backend hardware note and server tes… (Copilot)
- `45dd1a0` Apply suggestions from code review (CISC)
- `552963a` Merge branch 'master' into copilot/fix-f1657426-6764-497e-98bd-bd55dd… (CISC)
- `4f933db` Apply suggestions from code review (CISC)
- `eb2cd93` Replace clang-format with git clang-format to format only changed code (Copilot)
- `761b40f` Minor formatting improvements: remove extra blank line and add traili… (Copilot)
- `6a713a5` try installing git-clang-format (CISC)
- `5f4a0a4` try just clang-format (CISC)
- `743e041` Remove --binary flag from git clang-format and add git-clang-format i… (Copilot)
- `4d26e9f` download 18.x release (CISC)
- `dbf0433` typo-- (CISC)
- `8e0cbf1` remove --binary flag (CISC)
`.github/copilot-instructions.md` (new file):

---

# Copilot Instructions for llama.cpp

## Repository Overview

llama.cpp is a large-scale C/C++ project for efficient LLM (Large Language Model) inference with minimal setup and dependencies. The project enables running language models on diverse hardware with state-of-the-art performance.

**Key Facts:**
- **Primary language**: C/C++ with Python utility scripts
- **Size**: over 200k lines of code across more than 1000 files
- **Architecture**: modular design with the main library (`libllama`) and 40+ executable tools/examples
- **Core dependency**: ggml tensor library (vendored in the `ggml/` directory)
- **Backends supported**: CPU (AVX/NEON optimized), CUDA, Metal, Vulkan, SYCL, ROCm, MUSA
- **License**: MIT
## Build Instructions

### Prerequisites
- CMake 3.14+ (primary build system)
- C++17-compatible compiler (GCC 13.3+, Clang, MSVC)
- Optional: ccache for faster compilation

### Basic Build (CPU-only)
**ALWAYS run these commands in sequence:**
```bash
cmake -B build
cmake --build build --config Release -j $(nproc)
```

**Build time**: ~10 minutes on a 4-core system with ccache enabled, ~25 minutes without ccache.

**Important Notes:**
- The Makefile is deprecated - always use CMake
- ccache is automatically detected and used if available
- Built binaries are placed in `build/bin/`
- Parallel builds (`-j`) significantly reduce build time
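Note that `$(nproc)` is Linux-specific. A hedged, portable sketch for picking a parallel job count (falling back to `sysctl` on macOS, then to a fixed default):

```shell
# Portable parallel-job count: nproc (Linux), hw.ncpu (macOS), else a safe default of 4.
JOBS=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)
echo "parallel jobs: $JOBS"
# Then, for example: cmake --build build --config Release -j "$JOBS"
```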
### Backend-Specific Builds
For CUDA support:
```bash
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j $(nproc)
```

For Metal (macOS):
```bash
cmake -B build -DGGML_METAL=ON
cmake --build build --config Release -j $(nproc)
```

**Important Note**: While any backend can be built as long as its requirements are installed, it cannot be run without the corresponding hardware. The only backend that can be run for testing and validation is the CPU backend.
### Debug Builds
Single-config generators:
```bash
cmake -B build -DCMAKE_BUILD_TYPE=Debug
cmake --build build
```

Multi-config generators:
```bash
cmake -B build -G "Xcode"
cmake --build build --config Debug
```
### Common Build Issues
- **Issue**: Network tests fail in isolated environments
  **Solution**: Expected behavior - core functionality tests will still pass
## Testing

### Running Tests
```bash
ctest --test-dir build --output-on-failure -j $(nproc)
```

- **Test suite**: 38 tests covering tokenizers, grammar parsing, sampling, backends, and integration
- **Expected failures**: 2-3 tests may fail if network access is unavailable (they download models)
- **Test time**: ~30 seconds for passing tests
### Server Unit Tests
Run server-specific unit tests after building the server:
```bash
# Build the server first
cmake --build build --target llama-server

# Navigate to server tests and run
cd tools/server/tests
source ../../../.venv/bin/activate
./tests.sh
```

**Server test dependencies**: The `.venv` environment includes the required dependencies for server unit tests (pytest, aiohttp, etc.). Tests can be run individually or with various options as documented in `tools/server/tests/README.md`.
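The server tests are pytest-based Python. As a rough, hypothetical illustration of the kind of check such tests perform (the helper name and payload below are invented for illustration, not taken from the real suite), a test might validate the shape of an OpenAI-compatible completion response:

```python
# Hypothetical sketch: validate the minimal shape of an OpenAI-style
# completion response. The real tests in tools/server/tests use their
# own fixtures and utilities.

def check_completion_shape(resp: dict) -> bool:
    """Return True if resp looks like a minimal completion response."""
    choices = resp.get("choices")
    return (
        isinstance(choices, list)
        and len(choices) > 0
        and isinstance(choices[0], dict)
        and "text" in choices[0]
    )

# Illustrative payload in the shape the server might return
sample = {"choices": [{"index": 0, "text": "Hello"}], "model": "llama"}
print(check_completion_shape(sample))  # → True
```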
### Test Categories
- Tokenizer tests: various model tokenizers (BERT, GPT-2, LLaMA, etc.)
- Grammar tests: GBNF parsing and validation
- Backend tests: core ggml operations across different backends
- Integration tests: end-to-end workflows
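For context on the grammar tests: GBNF is llama.cpp's BNF-like format for constraining generation. A minimal illustrative grammar (written for this note, not taken from the test suite) that restricts output to a yes/no answer:

```
root   ::= answer
answer ::= "yes" | "no"
```

Such a grammar can be passed to `llama-cli` via a grammar option; see the `grammars/` directory in the repository for real examples and the full syntax.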
### Manual Testing Commands
```bash
# Check that the build produced a working binary
./build/bin/llama-cli --version

# Test model loading and basic inference (requires a model file)
./build/bin/llama-cli -m path/to/model.gguf -p "Hello" -n 10
```
## Code Quality and Linting

### C++ Code Formatting
**ALWAYS format C++ code before committing:**
```bash
git clang-format
```

Configuration is in `.clang-format` with these key rules:
- 4-space indentation
- 120-column limit
- Braces on the same line for functions
- Pointer alignment: `void * ptr` (middle)
- Reference alignment: `int & ref` (middle)
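The repository's `.clang-format` file is authoritative; as a hedged sketch, the rules listed above correspond to option settings along these lines:

```yaml
# Illustrative .clang-format fragment - the real file contains many more options
IndentWidth: 4
ColumnLimit: 120
BreakBeforeBraces: Attach      # braces on the same line
PointerAlignment: Middle       # void * ptr
ReferenceAlignment: Middle     # int & ref
```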
### Python Code
**ALWAYS activate the Python environment in `.venv` and use tools from that environment:**
```bash
# Activate the virtual environment
source .venv/bin/activate
```

Configuration files:
- `.flake8`: flake8 settings (max-line-length=125, excludes examples/tools)
- `pyrightconfig.json`: pyright type-checking configuration
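A hedged sketch of what the `.flake8` settings described above look like in INI form (the real file may differ in detail):

```ini
[flake8]
max-line-length = 125
exclude = examples, tools
```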
### Pre-commit Hooks
Run before committing:
```bash
pre-commit run --all-files
```
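As a rough illustration only (the repository's actual hook set and pinned versions will differ), a pre-commit configuration wiring up basic hygiene checks and flake8 looks like:

```yaml
# Illustrative .pre-commit-config.yaml sketch - not the repository's actual file
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: end-of-file-fixer
      - id: trailing-whitespace
  - repo: https://github.com/PyCQA/flake8
    rev: 7.0.0
    hooks:
      - id: flake8
```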
## Continuous Integration

### GitHub Actions Workflows
Key workflows that run on every PR:
- `.github/workflows/build.yml`: Multi-platform builds
- `.github/workflows/server.yml`: Server functionality tests
- `.github/workflows/python-lint.yml`: Python code quality
- `.github/workflows/python-type-check.yml`: Python type checking
### Local CI Validation
**Run the full CI locally before submitting PRs:**
```bash
mkdir tmp

# CPU-only build
bash ./ci/run.sh ./tmp/results ./tmp/mnt
```

**CI runtime**: 30-60 minutes depending on backend configuration
### Triggering CI
Add `ggml-ci` to the commit message to trigger heavy CI workloads on the custom CI infrastructure.
## Project Layout and Architecture

### Core Directories
- **`src/`**: Main llama library implementation (`llama.cpp`, `llama-*.cpp`)
- **`include/`**: Public API headers, primarily `include/llama.h`
- **`ggml/`**: Core tensor library (vendored copy of the ggml project)
- **`examples/`**: 30+ example applications and tools
- **`tools/`**: Additional development and utility tools (server benchmarks, tests)
- **`tests/`**: Comprehensive test suite with CTest integration
- **`docs/`**: Detailed documentation (build guides, API docs, etc.)
- **`scripts/`**: Utility scripts for CI, data processing, and automation
- **`common/`**: Shared utility code used across examples

### Key Files
- **`CMakeLists.txt`**: Primary build configuration
- **`include/llama.h`**: Main C API header (~2000 lines)
- **`src/llama.cpp`**: Core library implementation (~8000 lines)
- **`CONTRIBUTING.md`**: Coding guidelines and PR requirements
- **`.clang-format`**: C++ formatting rules
- **`.pre-commit-config.yaml`**: Git hook configuration
||
### Built Executables (in `build/bin/`) | ||
Primary tools: | ||
- **`llama-cli`**: Main inference tool | ||
- **`llama-server`**: OpenAI-compatible HTTP server | ||
- **`llama-quantize`**: Model quantization utility | ||
- **`llama-perplexity`**: Model evaluation tool | ||
- **`llama-bench`**: Performance benchmarking | ||
- **`llama-convert-llama2c-to-ggml`**: Model conversion utilities | ||
|
||
### Configuration Files
- **CMake**: `CMakeLists.txt`, `cmake/` directory
- **Linting**: `.clang-format`, `.clang-tidy`, `.flake8`
- **CI**: `.github/workflows/`, `ci/run.sh`
- **Git**: `.gitignore` (covers build artifacts, models, caches)

### Dependencies
- **System**: OpenMP, libcurl (for model downloading)
- **Optional**: CUDA SDK, Metal framework, Vulkan SDK, Intel oneAPI
- **Bundled**: httplib, json (vendored header-only libraries)
## Common Validation Steps

### After Making Changes
1. **Format code**: `git clang-format`
2. **Build**: `cmake --build build --config Release`
3. **Test**: `ctest --test-dir build --output-on-failure`
4. **Server tests** (if modifying the server): `cd tools/server/tests && source ../../../.venv/bin/activate && ./tests.sh`
5. **Manual validation**: test the relevant tools in `build/bin/`
### Performance Validation
```bash
# Benchmark inference performance
./build/bin/llama-bench -m model.gguf

# Evaluate model perplexity
./build/bin/llama-perplexity -m model.gguf -f dataset.txt
```
### Backend Validation
```bash
# Test core ggml operations on the available backends
./build/bin/test-backend-ops
```
## Environment Setup

### Required Tools
- CMake 3.14+ (install via your system package manager)
- A modern C++ compiler with C++17 support
- Git (for submodule management)
- Python 3.9+ with a virtual environment (`.venv` is provided)

### Optional but Recommended
- ccache: `apt install ccache` or `brew install ccache`
- clang-format 15+: usually included with an LLVM/Clang installation
- pre-commit: `pip install pre-commit`

### Backend-Specific Requirements
- **CUDA**: NVIDIA CUDA Toolkit 11.2+
- **Metal**: Xcode command line tools (macOS only)
- **Vulkan**: Vulkan SDK
- **SYCL**: Intel oneAPI toolkit
## Important Guidelines

### Code Changes
- **Minimal dependencies**: avoid adding new external dependencies
- **Cross-platform compatibility**: test on Linux, macOS, and Windows when possible
- **Performance focus**: this is a performance-critical inference library
- **API stability**: changes to `include/llama.h` require careful consideration

### Git Workflow
- Always create feature branches from `master`
- **Never** commit build artifacts (`build/`, `.ccache/`, `*.o`, `*.gguf`)
- Use descriptive commit messages following project conventions

### Trust These Instructions
Only search for additional information if these instructions are incomplete or found to be incorrect. This document contains validated build and test procedures that work reliably across different environments.
`.gitignore` (diff excerpt):

```diff
@@ -147,3 +147,4 @@ poetry.toml
 # Local scripts
 /run-vim.sh
 /run-chat.sh
+.ccache/
```
Review discussion:
- "I don't think we have any."
- Reply: "It's the `.pre-commit-config.yaml` file, i.e. basically EditorConfig and flake8."