A Python client based on alpaca.cpp.
The most important change with respect to the original code is that context is not maintained between calls. That is, there is no state, so by default it does not behave like a chatbot. That said, chat behavior is easy to add by keeping track of all user and system utterances and replaying them with each request, as sketched below.
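For example, here is a minimal sketch of a stateful chat wrapper. It assumes the Alpaca and InferenceRequest interface shown in the Python example further down; ChatSession itself and the "Speaker: text" prompt format are hypothetical illustrations, not part of this repo.

from alpaca import Alpaca, InferenceRequest

class ChatSession:
    """Hypothetical helper: replays the full conversation on every call."""

    def __init__(self, alpaca: Alpaca):
        self.alpaca = alpaca
        self.history = []  # list of (speaker, utterance) pairs

    def say(self, user_text: str) -> str:
        self.history.append(("User", user_text))
        # The backend is stateless, so pass the whole transcript each time.
        prompt = "\n".join(f"{who}: {text}" for who, text in self.history)
        output = self.alpaca.run_simple(InferenceRequest(input_text=prompt))["output"]
        self.history.append(("Assistant", output))
        return output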
Build cpp binary:
cd cpp
mkdir -p build
make
cd ..
Set up python environment:
conda create -n alpaca python=3.8
conda activate alpaca
pip install -r requirements.txt
pip install streamlit==1.20.0 # For the streamlit demo below.
Run the CLI demo:
python demo_cli.py --alpaca-cli-path cpp/build/alpaca --model-path $MODEL_DIR/ggml-alpaca-7b-q4.bin
Run the JSON API. Navigate to http://127.0.0.1:8080/docs for the docs after starting it:
export ALPACA_CLI_PATH="$PWD/cpp/build/alpaca"
export ALPACA_MODEL_PATH="$MODEL_DIR/ggml-alpaca-7b-q4.bin"
uvicorn alpaca_api:app --port 8080
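Once the server is up, you can call it from Python with the requests library. This is only a hedged sketch: the /inference endpoint path and the payload shape below are assumptions for illustration; check http://127.0.0.1:8080/docs for the actual routes and schema.

import requests

# Hypothetical endpoint and payload; consult /docs for the real schema.
response = requests.post(
    "http://127.0.0.1:8080/inference",
    json={"input_text": "Are alpacas afraid of snakes?"},
)
response.raise_for_status()
print(response.json())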
You can also call the API using the Streamlit demo (this requires installing Streamlit; see above):
streamlit run demo_st.py
Use the client directly from Python:

from alpaca import Alpaca, InferenceRequest

# alpaca_cli_path points at the compiled binary, model_path at the weights.
alpaca = Alpaca(alpaca_cli_path, model_path)
try:
    output = alpaca.run_simple(InferenceRequest(input_text="Are alpacas afraid of snakes?"))["output"]
finally:
    alpaca.stop()
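Wrapping the call in try/finally ensures alpaca.stop() always runs, so the underlying alpaca.cpp process is shut down even if inference raises.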
You can also launch the JSON API via Docker. Navigate to http://127.0.0.1:8080/docs for the docs after starting it:
docker build . -f docker/Dockerfile -t alpaca_api
docker run --name alpaca_api --mount type=bind,source="$MODEL_DIR",target=/models -p 8080:8080 -d alpaca_api
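The container runs detached (-d); use docker logs -f alpaca_api to follow its output and docker stop alpaca_api to shut it down.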
This is a fork of alpaca.cpp, which itself gives the following credit:
This combines Facebook's LLaMA, Stanford Alpaca, alpaca-lora and corresponding weights by Eric Wang (which uses Jason Phang's implementation of LLaMA on top of Hugging Face Transformers), and llama.cpp by Georgi Gerganov. The chat implementation is based on Matvey Soloviev's Interactive Mode for llama.cpp. Inspired by Simon Willison's getting started guide for LLaMA. Andy Matuschak's thread on adapting this to 13B, using fine tuning weights by Sam Witteveen.