
Python client and REST API for calling an Instruction-Tuned Chat-Style LLM


muelletm/alpaca.py

 
 


Alpaca.py

A Python client based on alpaca.cpp.

The most important change with respect to the original code is that context is not maintained between calls. There is no state, so by default it does not behave like a chatbot. That said, chat behavior is easy to add by keeping track of all user and system utterances.
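A minimal sketch of how such history tracking could look. The `ChatSession` class, its `generate` callback, and the `User:`/`Assistant:` prompt labels are hypothetical and not part of this repository; the prompt format that works best with the model may differ.

```python
from typing import Callable, List, Tuple


class ChatSession:
    """Keeps the conversation history and replays it on every call.

    `generate` is any function that maps a prompt string to a reply string;
    it stands in for a stateless model call.
    """

    def __init__(self, generate: Callable[[str], str], max_turns: int = 8):
        self._generate = generate
        self._max_turns = max_turns  # cap the history so the prompt stays bounded
        self._turns: List[Tuple[str, str]] = []  # (user, assistant) pairs

    def build_prompt(self, user_text: str) -> str:
        # Replay the most recent turns, then append the new user utterance.
        recent = self._turns[-self._max_turns:] if self._max_turns > 0 else []
        lines = []
        for user, assistant in recent:
            lines.append(f"User: {user}")
            lines.append(f"Assistant: {assistant}")
        lines.append(f"User: {user_text}")
        lines.append("Assistant:")
        return "\n".join(lines)

    def ask(self, user_text: str) -> str:
        # Send the full transcript, then remember the new exchange.
        reply = self._generate(self.build_prompt(user_text)).strip()
        self._turns.append((user_text, reply))
        return reply
```

With the Python module described further down, `generate` could be something like `lambda prompt: alpaca.run_simple(InferenceRequest(input_text=prompt))["output"]`.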

Build

Build cpp binary:

cd cpp
mkdir -p build
make
cd ..

Set up python environment:

conda create -n alpaca python=3.8
conda activate alpaca
pip install -r requirements.txt
pip install streamlit==1.20.0  # For the streamlit demo below.

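Try it out

The examples below assume that $MODEL_DIR is set to the directory containing ggml-alpaca-7b-q4.bin.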

Command line

python demo_cli.py --alpaca-cli-path cpp/build/alpaca --model-path $MODEL_DIR/ggml-alpaca-7b-q4.bin 

JSON REST API (FastAPI)

After starting the server, navigate to http://127.0.0.1:8080/docs for the API docs.

export ALPACA_CLI_PATH="$PWD/cpp/build/alpaca"
export ALPACA_MODEL_PATH="$MODEL_DIR/ggml-alpaca-7b-q4.bin"
uvicorn alpaca_api:app --port 8080
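The available routes are not listed here, but they can be discovered programmatically. The sketch below assumes the app keeps FastAPI's default schema location, `/openapi.json`; the helper names `fetch_openapi` and `list_endpoints` are illustrative and not part of this repository.

```python
import json
from typing import Dict, List
from urllib.request import urlopen

# HTTP methods that may appear as keys of an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_openapi(base_url: str = "http://127.0.0.1:8080") -> Dict:
    """Download the OpenAPI schema (assumes FastAPI's default /openapi.json)."""
    with urlopen(f"{base_url}/openapi.json", timeout=10) as response:
        return json.load(response)


def list_endpoints(schema: Dict) -> List[str]:
    """Return sorted 'METHOD path' strings for every operation in the schema."""
    endpoints = []
    for path, path_item in schema.get("paths", {}).items():
        for key in path_item:
            # Skip non-operation keys such as "parameters" or "summary".
            if key.lower() in HTTP_METHODS:
                endpoints.append(f"{key.upper()} {path}")
    return sorted(endpoints)


# Example (requires the server to be running):
#     print("\n".join(list_endpoints(fetch_openapi())))
```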

You can also call the API using the streamlit demo. This requires streamlit to be installed (see above).

streamlit run demo_st.py 

Python module

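import os

from alpaca import Alpaca, InferenceRequest

# Assumption: reuse the environment variables from the REST API section; any valid paths work.
alpaca_cli_path = os.environ["ALPACA_CLI_PATH"]
model_path = os.environ["ALPACA_MODEL_PATH"]

alpaca = Alpaca(alpaca_cli_path, model_path)
try:
    output = alpaca.run_simple(InferenceRequest(input_text="Are alpacas afraid of snakes?"))["output"]
finally:
    # Always stop the client, even if inference fails.
    alpaca.stop()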
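The try/finally pattern can be wrapped in a small context manager so that `stop()` is never forgotten. This is a sketch only: the `running` helper is hypothetical and not part of the `alpaca` module, and it works with any object that has a `stop()` method.

```python
from contextlib import contextmanager


@contextmanager
def running(client):
    """Yield `client` and always call its stop() method on exit."""
    try:
        yield client
    finally:
        client.stop()


# Example (requires the alpaca module, the binary, and the model):
#     with running(Alpaca(alpaca_cli_path, model_path)) as alpaca:
#         result = alpaca.run_simple(InferenceRequest(input_text="Hello"))
```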

Docker

Launch the JSON API via Docker. After starting the container, navigate to http://127.0.0.1:8080/docs for the API docs.

docker build . -f docker/Dockerfile -t alpaca_api
docker run --name alpaca_api --mount type=bind,source="$MODEL_DIR",target=/models -p 8080:8080 -d alpaca_api
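Because the container starts in the background (`-d`), the API may not answer immediately. A minimal sketch of polling the docs URL until the server responds; `http_ok` and `wait_until_ready` are hypothetical helpers, and the timeout values are arbitrary.

```python
import time
from typing import Callable
from urllib.request import urlopen


def http_ok(url: str) -> bool:
    """Return True if `url` answers with an HTTP 2xx status."""
    try:
        with urlopen(url, timeout=2) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


def wait_until_ready(
    url: str,
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Poll `url` until `probe` succeeds or `timeout_s` seconds have passed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Example (requires the container to be starting or running):
#     wait_until_ready("http://127.0.0.1:8080/docs")
```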

Credit

This is a fork of alpaca.cpp, which itself gives the following credit:

This combines Facebook's LLaMA, Stanford Alpaca, alpaca-lora and corresponding weights by Eric Wang (which uses Jason Phang's implementation of LLaMA on top of Hugging Face Transformers), and llama.cpp by Georgi Gerganov. The chat implementation is based on Matvey Soloviev's Interactive Mode for llama.cpp. Inspired by Simon Willison's getting started guide for LLaMA. Andy Matuschak's thread on adapting this to 13B, using fine tuning weights by Sam Witteveen.
