The LLM inference engine can run a model locally, so inference works fully offline without any external server.
It can also run as a server, exposing the host's IP address and port so that other clients can query the model over the network.
Users on the internet send their text to the model and wait for its response.
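For remote use, a client only needs the host's IP address and port. Below is a minimal client sketch, assuming the server listens on port 8080 and accepts a JSON body with a `prompt` field at a hypothetical `/completion` endpoint; the actual host, port, and request format depend on how the engine's server is configured.

```python
# Minimal client sketch. The host, port, endpoint path, and payload shape
# below are assumptions for illustration; adjust them to match the server.
import json
import urllib.request


def ask_model(prompt: str, host: str = "127.0.0.1", port: int = 8080) -> str:
    """Send a prompt to the inference server and wait for its response."""
    url = f"http://{host}:{port}/completion"  # hypothetical endpoint
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Blocks until the model finishes generating and the server replies.
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(ask_model("Hello, what can you do?"))
```

The same call works against a remote host by passing its IP address instead of `127.0.0.1`.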