AI Leadership & Strategy Lifecoach

Deploy a lightweight AI leadership chat assistant using a small language model on CPU-based infrastructure.

Detailed description

The AI Leadership & Strategy Lifecoach is a lightweight quickstart designed to give AI leaders a trusted sounding board for key decisions. Chat with this lifecoach for quick, strategic insights and actionable advice.

This quickstart was designed for environments where GPUs are not available or necessary, making it ideal for lightweight inference use cases, prototyping, or constrained environments. By making the most of vLLM on CPU-based infrastructure, this AI lifecoach can be deployed to almost any OpenShift AI environment.

This quickstart includes a Helm chart for deploying:

  • An OpenShift AI Project.
  • vLLM with CPU support running an instance of TinyLlama.
  • AnythingLLM, a versatile chat interface, running as a workbench and connected to the vLLM server.
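
If you want to see exactly what the chart will create before installing it, you can render its manifests locally. A quick sanity check, run from the repository root (where the helm/ directory lives):

# Render the chart's templates without touching the cluster
helm template ai-lifecoach helm/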

Use this project to quickly spin up a minimal vLLM instance and start serving models like TinyLlama on CPU—no GPU required. 🚀

Architecture diagrams

architecture.png

Requirements

Minimum hardware requirements

  • No GPU needed! 🤖
  • CPU: 2 cores
  • Memory: 4 Gi
  • Storage: 5 Gi

Recommended hardware requirements

  • No GPU needed! 🤖
  • CPU: 8 cores
  • Memory: 8 Gi
  • Storage: 5 Gi

Note: This version is compiled for Intel CPUs (AVX-512 support is optional but recommended, since it also enables running compressed models).
Here's an example machine from AWS that works well: https://instances.vantage.sh/aws/ec2/m6i.4xlarge
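
If you're not sure whether a node's CPU supports AVX-512, you can inspect its feature flags. A minimal check, assuming shell access to a Linux node:

# List any AVX-512 feature flags the CPU exposes (empty output means none)
grep -o 'avx512[a-z0-9_]*' /proc/cpuinfo | sort -u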

Minimum software requirements

  • Red Hat OpenShift 4.16.24 or later
  • Red Hat OpenShift AI 2.16.2 or later
  • Dependencies for Single-model server:
    • Red Hat OpenShift Service Mesh
    • Red Hat OpenShift Serverless
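
You can check whether both dependencies are already installed by listing the operator ClusterServiceVersions. A sketch (the grep pattern is illustrative, and listing CSVs across all namespaces may need broader read access than the quickstart itself):

# Look for the Service Mesh and Serverless operators
oc get csv -A | grep -Ei 'servicemesh|serverless'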

Required user permissions

  • Standard user. No elevated cluster permissions required.
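
To confirm your account can create projects before you start, you can ask the API server directly:

# Prints "yes" if your user is allowed to request new projects
oc auth can-i create projectrequests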

Deploy

Follow the steps below to deploy and test the AI Lifecoach.

Clone

git clone https://github.com/rh-ai-quickstart/llm-cpu-serving.git && \
    cd llm-cpu-serving/  

Create the project

PROJECT="ai-lifecoach"

oc new-project ${PROJECT}

Install with Helm

helm install ${PROJECT} helm/ --namespace ${PROJECT}
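
Once the install returns, you can confirm the release deployed as expected:

# Show the release status and the notes printed by the chart
helm status ${PROJECT} --namespace ${PROJECT}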

Wait for pods

oc -n ${PROJECT} get pods -w
(Output)
NAME                                          READY   STATUS      RESTARTS   AGE
anythingllm-0                                 3/3     Running     0          76s
anythingllm-seed-lchf6                        0/1     Completed   0          76s
tinyllama-1b-cpu-predictor-544bdf75f9-x9fwh   2/2     Running     0          75s
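
If you'd rather block until everything is up instead of watching interactively, you can wait on the underlying workloads. A sketch, assuming the workload names match the pod names above (anythingllm as a StatefulSet, tinyllama-1b-cpu-predictor as a Deployment; adjust if the chart names them differently):

# Wait for the chat workbench and the model server to finish rolling out
oc -n ${PROJECT} rollout status statefulset/anythingllm
oc -n ${PROJECT} rollout status deployment/tinyllama-1b-cpu-predictor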

Test

You can get the OpenShift AI Dashboard URL with:

oc get routes rhods-dashboard -n redhat-ods-applications
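
If you just want the hostname, for example to use in a script, a jsonpath query works too:

# Print only the dashboard URL
echo "https://$(oc get route rhods-dashboard -n redhat-ods-applications -o jsonpath='{.spec.host}')"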

Once inside the dashboard, navigate to Data Science Projects -> ai-lifecoach (or whatever you set ${PROJECT} to, if you changed it from the default).

OpenShift AI Projects

Inside the project you can see the Workbenches; open the one for AnythingLLM.

OpenShift AI Projects

Finally, click on the AI Director Lifecoach Workspace that's pre-created for you, and you can start chatting with your AI Leadership & Strategy Lifecoach! :)
Try asking it, for example:

Hi, I'm trying to keep up with all the AI changes while also balancing my life but getting overwhelmed, how can I deal with this?

It will give you a reply and some citations related to the question.

AnythingLLM
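
You can also exercise the model server directly, since vLLM exposes an OpenAI-compatible HTTP API. A sketch, assuming the predictor Service is named tinyllama-1b-cpu-predictor, listens on port 80, and serves the model under the name tinyllama; check oc get inferenceservice -n ${PROJECT} for the actual names and URL in your deployment:

# Forward the predictor's port to localhost (service name and port are assumptions)
oc -n ${PROJECT} port-forward svc/tinyllama-1b-cpu-predictor 8080:80 &

# Ask the OpenAI-compatible completions endpoint for a short reply
curl -s http://localhost:8080/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "tinyllama", "prompt": "One tip for avoiding burnout:", "max_tokens": 64}'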

Delete

helm uninstall ${PROJECT} --namespace ${PROJECT} 
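
If you also want to remove the project itself once the release is uninstalled:

# Delete the project and anything left inside it
oc delete project ${PROJECT}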


Tags

  • Product: OpenShift AI
  • Use case: Productivity
  • Business challenge: Adopt and scale AI
