Showing 269 changed files, with 52 additions and 61,413 deletions.
`.gitmodules` (deleted):

```diff
@@ -1,9 +0,0 @@
-[submodule "pybind11"]
-	path = pybind11
-	url = https://github.com/pybind/pybind11.git
-	branch = master
-
-[submodule "llama.cpp"]
-	path = llama.cpp
-	url = https://github.com/ggerganov/llama.cpp.git
-	branch = master
```
`README.md`:

````diff
@@ -1,125 +1,59 @@
-# PyLLaMACpp
-Official supported Python bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) + gpt4all
+# PyGPT4All
+Official Python CPU inference for [GPT4All](https://github.com/nomic-ai/gpt4all) language models based on [llama.cpp](https://github.com/ggerganov/llama.cpp) and [ggml](https://github.com/ggerganov/ggml).
 
 [](https://opensource.org/licenses/MIT)
 [](https://pypi.org/project/pyllamacpp/)
 
-For those who don't know, `llama.cpp` is a port of Facebook's LLaMA model in pure C/C++:
-
-<blockquote>
-
-- Without dependencies
-- Apple silicon first-class citizen - optimized via ARM NEON
-- AVX2 support for x86 architectures
-- Mixed F16 / F32 precision
-- 4-bit quantization support
-- Runs on the CPU
-
-</blockquote>
-
-# Table of contents
-<!-- TOC -->
-* [Installation](#installation)
-* [Usage](#usage)
-* [Supported model](#supported-model)
-* [GPT4All](#gpt4all)
-* [Discussions and contributions](#discussions-and-contributions)
-* [License](#license)
-<!-- TOC -->
+**NB: Under active development**
 
 # Installation
 1. The easy way is to use the prebuilt wheels
 ```bash
-pip install pyllamacpp
+pip install pygpt4all
 ```
 
-However, the compilation process of `llama.cpp` is taking into account the architecture of the target `CPU`,
-so you might need to build it from source:
+2. Build it from source:
 
 ```shell
-git clone --recursive https://github.com/nomic-ai/pyllamacpp && cd pyllamacpp
+git clone --recursive https://github.com/nomic-ai/pygpt4all && cd pygpt4all
 pip install .
 ```
````
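A quick way to confirm that either install route picked up the renamed package is to import the model classes the updated README relies on. This is only a sanity-check sketch based on the import paths shown in this diff; it is not something the commit itself adds.

```python
# Post-install smoke test (import paths taken from the README shown in this diff).
from pygpt4all.models.gpt4all import GPT4All
from pygpt4all.models.gpt4all_j import GPT4All_J

print(GPT4All.__name__, GPT4All_J.__name__)  # both should import without errors
```

The usage section of the README diff follows: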
````diff
 
 # Usage
 
-A simple `Pythonic` API is built on top of `llama.cpp` C/C++ functions. You can call it from Python as follows:
-
-```python
-from pyllamacpp.model import Model
-
-def new_text_callback(text: str):
-    print(text, end="", flush=True)
-
-model = Model(ggml_model='./models/gpt4all-model.bin', n_ctx=512)
-model.generate("Once upon a time, ", n_predict=55, new_text_callback=new_text_callback, n_threads=8)
-```
-If you don't want to use the `callback`, you can get the results from the `generate` method once the inference is finished:
-
-```python
-generated_text = model.generate("Once upon a time, ", n_predict=55)
-print(generated_text)
-```
-
-## Interactive Mode
-
-If you want to run the program in interactive mode you can add the `grab_text_callback` function and set `interactive` to True in the generate function. `grab_text_callback` should always return a string unless you wish to signal EOF in which case you should return None.
-
-```py
-from pyllamacpp.model import Model
-
-def new_text_callback(text: str):
-    print(text, end="", flush=True)
-
-def grab_text_callback():
-    inpt = input()
-    # To signal EOF, return None
-    if inpt == "END":
-        return None
-    return inpt
-
-model = Model(ggml_model='./models/gpt4all-model.bin', n_ctx=512)
-
-# prompt from https://github.com/ggerganov/llama.cpp/blob/master/prompts/chat-with-bob.txt
-prompt = """
-Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision. To do this, Bob uses a database of information collected from many different sources, including books, journals, online articles, and more.
-User: Hello, Bob.
-Bob: Hello. How may I help you today?
-User: Please tell me the largest city in Europe.
-Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
-User:"""
-
-model.generate(prompt, n_predict=256, new_text_callback=new_text_callback, grab_text_callback=grab_text_callback, interactive=True, repeat_penalty=1.0, antiprompt=["User:"])
-```
-
-* You can pass any `llama context` [parameter](https://nomic-ai.github.io/pyllamacpp/#pyllamacpp.constants.LLAMA_CONTEXT_PARAMS_SCHEMA) as a keyword argument to the `Model` class
-* You can pass any `gpt` [parameter](https://nomic-ai.github.io/pyllamacpp/#pyllamacpp.constants.GPT_PARAMS_SCHEMA) as a keyword argument to the `generarte` method
-* You can always refer to the [short documentation](https://nomic-ai.github.io/pyllamacpp/) for more details.
+### GPT4All model
+
+Download a GPT4All model from https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/. The easiest approach is download a file whose name ends in ggml.bin
+
+```python
+from pygpt4all.models.gpt4all import GPT4All
+
+def new_text_callback(text):
+    print(text, end="")
+
+model = GPT4All('./models/ggml-gpt4all-j.bin')
+model.generate("Once upon a time, ", n_predict=55, new_text_callback=new_text_callback)
+```
````
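The updated README streams tokens only through `new_text_callback` and drops the old "collect the result from `generate`" snippet. If the full string is still wanted, a small accumulator callback works with the new class too. This is a hedged sketch that relies only on the arguments demonstrated above; the model path is a placeholder.

```python
from pygpt4all.models.gpt4all import GPT4All

chunks = []

def collect_text(text):
    # Accumulate streamed tokens while still echoing them to the terminal.
    chunks.append(text)
    print(text, end="", flush=True)

# Placeholder path: point this at a downloaded ggml GPT4All model file.
model = GPT4All('./models/gpt4all-model.bin')
model.generate("Once upon a time, ", n_predict=55, new_text_callback=collect_text)

generated_text = "".join(chunks)
```

The remainder of the README diff covers the GPT4All-J model and the project boilerplate: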
````diff
 
+### GPT4All-J model
+
+Download the GPT4All-J model from https://gpt4all.io/models/ggml-gpt4all-j.bin
+
+```python
+from pygpt4all.models.gpt4all_j import GPT4All_J
+
+def new_text_callback(text):
+    print(text, end="")
+
+model = GPT4All_J('./models/ggml-gpt4all-j.bin')
+model.generate("Once upon a time, ", n_predict=55, new_text_callback=new_text_callback)
+```
+
+[//]: # (* You can always refer to the [short documentation](https://nomic-ai.github.io/pyllamacpp/) for more details.)
-
-# Supported model
-
-### GPT4All
-
-Download a GPT4All model from https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/.
-The easiest approach is download a file whose name ends in `ggml.bin`--older model versions require conversion.
-
-If you have an older model downloaded that you want to convert, in your terminal run:
-```shell
-pyllamacpp-convert-gpt4all path/to/gpt4all_model.bin path/to/llama_tokenizer path/to/gpt4all-converted.bin
-```
-
-# FAQs
-* Where to find the llama tokenizer? [#5](https://github.com/nomic-ai/pyllamacpp/issues/5)
 
 # Discussions and contributions
 If you find any bug, please open an [issue](https://github.com/nomic-ai/pyllamacpp/issues).
 
 If you have any feedback, or you want to share how you are using this project, feel free to use the [Discussions](https://github.com/nomic-ai/pyllamacpp/discussions) and open a new topic.
 
 # License
-
-This project is licensed under the same license as [llama.cpp](https://github.com/ggerganov/llama.cpp/blob/master/LICENSE) (MIT [License](./LICENSE)).
+This project is licensed under the MIT [License](./LICENSE).
````
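The interactive-mode example removed above used a chat-style prompt, and nothing in the new README suggests that `grab_text_callback` or `antiprompt` carry over to the new classes. A conservative way to get similar behaviour is to pass the whole transcript as the prompt and stick to the arguments the new examples demonstrate. A sketch, with the prompt adapted from the removed example and the model path taken from the README:

```python
from pygpt4all.models.gpt4all_j import GPT4All_J

def new_text_callback(text):
    print(text, end="", flush=True)

# Chat-style prompt adapted from the interactive-mode example removed in this commit.
prompt = """Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob:"""

model = GPT4All_J('./models/ggml-gpt4all-j.bin')
model.generate(prompt, n_predict=128, new_text_callback=new_text_callback)
```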
API reference docs page:

```diff
@@ -1,10 +1,5 @@
-# PyLLaMaCpp API Reference
+# PyGPT-J API Reference
 
 
-::: pyllamacpp.model
+::: pygpt4all.models
 
-::: pyllamacpp.constants
-    options:
-      show_if_no_docstring: true
-
-::: pyllamacpp.utils
```