The LLM inference engine can run a model locally, so inference works fully offline without any external server.
It can also run as a server, exposing the host's IP address and port so that other clients can query the model over the network.
Users on the internet send their text to the model and wait for its response.
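For remote use, a client only needs the host's IP address and port. Below is a minimal client sketch, assuming the server listens on port 8080 and accepts a JSON body with a `prompt` field at a hypothetical `/completion` endpoint; the actual host, port, and request format depend on how the engine's server is configured.

```python
# Minimal client sketch. The host, port, endpoint path, and payload shape
# below are assumptions for illustration; adjust them to match the server.
import json
import urllib.request


def ask_model(prompt: str, host: str = "127.0.0.1", port: int = 8080) -> str:
    """Send a prompt to the inference server and wait for its response."""
    url = f"http://{host}:{port}/completion"  # hypothetical endpoint
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Blocks until the model finishes generating and the server replies.
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(ask_model("Hello, what can you do?"))
```

The same call works against a remote host by passing its IP address instead of `127.0.0.1`.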