
Commit c97f01c

vvhg1 and Cebtenzzre authored

infill : add new example + extend server API (ggml-org#3296)

* vvhg-code-infill (ggml-org#1)
* infill in separate example (ggml-org#2)
* reverted changes to main and added infill example
* cleanup
* naming improvement
* make : add missing blank line
* fix missing semicolon
* brought infill up to current main code
* cleanup

---------

Co-authored-by: Cebtenzzre <[email protected]>

1 parent f5ef5cf · commit c97f01c

11 files changed (+1067, -1 lines)

.gitignore (+1)

```diff
@@ -40,6 +40,7 @@ models-mnt
 /embedding
 /gguf
 /gguf-llama-simple
+/infill
 /libllama.so
 /llama-bench
 /main
```

Makefile (+4, -1)

```diff
@@ -1,5 +1,5 @@
 # Define the default target now so that it is always the first target
-BUILD_TARGETS = main quantize quantize-stats perplexity embedding vdot q8dot train-text-from-scratch convert-llama2c-to-ggml simple batched save-load-state server embd-input-test gguf llama-bench baby-llama beam-search speculative benchmark-matmult parallel finetune export-lora tests/test-c.o
+BUILD_TARGETS = main quantize quantize-stats perplexity embedding vdot q8dot train-text-from-scratch convert-llama2c-to-ggml simple batched save-load-state server embd-input-test gguf llama-bench baby-llama beam-search speculative infill benchmark-matmult parallel finetune export-lora tests/test-c.o
 
 # Binaries only useful for tests
 TEST_TARGETS = tests/test-llama-grammar tests/test-grammar-parser tests/test-double-float tests/test-grad0 tests/test-opt tests/test-quantize-fns tests/test-quantize-perf tests/test-sampling tests/test-tokenizer-0-llama tests/test-tokenizer-0-falcon tests/test-tokenizer-1-llama
@@ -543,6 +543,9 @@ main: examples/main/main.cpp build-info.h ggml.
 	@echo '==== Run ./main -h for help. ===='
 	@echo
 
+infill: examples/infill/infill.cpp build-info.h ggml.o llama.o common.o console.o grammar-parser.o $(OBJS)
+	$(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS)
+
 simple: examples/simple/simple.cpp build-info.h ggml.o llama.o common.o $(OBJS)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS)
```

common/common.cpp (+2)

```diff
@@ -389,6 +389,8 @@ bool gpt_params_parse(int argc, char ** argv, gpt_params & params) {
             params.interactive_first = true;
         } else if (arg == "-ins" || arg == "--instruct") {
             params.instruct = true;
+        } else if (arg == "--infill") {
+            params.infill = true;
         } else if (arg == "--multiline-input") {
             params.multiline_input = true;
         } else if (arg == "--simple-io") {
```

common/common.h (+1)

```diff
@@ -120,6 +120,7 @@ struct gpt_params {
     bool use_mlock      = false; // use mlock to keep model in memory
     bool numa           = false; // attempt optimizations that help on some NUMA systems
     bool verbose_prompt = false; // print prompt tokens before generation
+    bool infill         = false; // use infill mode
 };
 
 bool gpt_params_parse(int argc, char ** argv, gpt_params & params);
```

examples/infill/CMakeLists.txt (new file, +8)

```cmake
set(TARGET infill)
add_executable(${TARGET} infill.cpp)
install(TARGETS ${TARGET} RUNTIME)
target_link_libraries(${TARGET} PRIVATE common llama ${CMAKE_THREAD_LIBS_INIT})
target_compile_features(${TARGET} PRIVATE cxx_std_11)
if(TARGET BUILD_INFO)
    add_dependencies(${TARGET} BUILD_INFO)
endif()
```
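For CMake users, the new target slots into the usual out-of-tree build. A quick sketch, assuming a reasonably recent CMake and that binaries land under `build/bin/` as with the other examples:

```bash
# configure an out-of-tree build, then compile just the infill example
cmake -B build
cmake --build build --target infill

# run the resulting binary
./build/bin/infill -h
```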

examples/infill/README.md (new file, +41)

# llama.cpp/example/infill

This example shows how to use infill mode with Code Llama models that support it.
Currently the 7B and 13B models support infill mode.

Infill supports most of the options available in the main example.

For further information, have a look at the main README.md in llama.cpp/example/main/README.md.

## Common Options

In this section, we cover the most commonly used options for running the `infill` program with the LLaMA models:

- `-m FNAME, --model FNAME`: Specify the path to the LLaMA model file (e.g., `models/7B/ggml-model.bin`).
- `-i, --interactive`: Run the program in interactive mode, allowing you to provide input directly and receive real-time responses.
- `-n N, --n-predict N`: Set the number of tokens to predict when generating text. Adjusting this value can influence the length of the generated text.
- `-c N, --ctx-size N`: Set the size of the prompt context. The default is 512, but LLaMA models were built with a context of 2048, which will provide better results on longer input/inference.
## Input Prompts

The `infill` program provides several ways to interact with the LLaMA models using input prompts; a short sketch follows the list below:

- `--in-prefix PROMPT_BEFORE_CURSOR`: Provide the prefix (the text before the cursor) directly as a command-line option.
- `--in-suffix PROMPT_AFTER_CURSOR`: Provide the suffix (the text after the cursor) directly as a command-line option.
- `--interactive-first`: Run the program in interactive mode and wait for input right away. (More on this below.)
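As an illustration of how the two pieces fit together, a hedged sketch: the model is asked to fill in a function body between a given prefix and suffix (the model filename and token count here are placeholders):

```bash
# infill the body of a Python function: everything before the cursor goes in
# --in-prefix, everything after it goes in --in-suffix
./infill -m models/codellama-7b.Q5_K_S.gguf -n 32 \
    --in-prefix "def add(a, b):\n    " \
    --in-suffix "\n\nprint(add(1, 2))\n"
```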
## Interaction

The `infill` program offers a seamless way to interact with LLaMA models, allowing users to receive real-time infill suggestions. Interactive mode can be triggered with `--interactive` or `--interactive-first`.

### Interaction Options

- `-i, --interactive`: Run the program in interactive mode, allowing users to get real-time code suggestions from the model.
- `--interactive-first`: Run the program in interactive mode and immediately wait for user input before starting the text generation.
- `--color`: Enable colorized output to visually distinguish between prompts, user input, and generated text.
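A hedged sketch of an interactive session; how the interactive flags compose with the other options is assumed to mirror the main example:

```bash
# start in interactive mode with colorized output; the program waits for
# user input before generating anything
./infill -m models/codellama-7b.Q5_K_S.gguf --interactive-first --color -n 32
```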
### Example

```bash
./infill -t 10 -ngl 0 -m models/codellama-13b.Q5_K_S.gguf -c 4096 --temp 0.7 --repeat_penalty 1.1 -n 20 \
    --in-prefix "def helloworld():\n print(\"hell" \
    --in-suffix "\n print(\"goodbye world\")\n "
```
