Executable: `mlx_lm`

Commands:

- `mlx_lm generate`
- `mlx_lm chat`
- `mlx_lm server`
Each command uses the same `OptionParser`, so `--help` lists the same options for `generate`, `chat`, and `server`.
Examples:

```sh
bundle exec ruby -Ilib exe/mlx_lm generate --help
mlx_lm generate --model /path/to/model --prompt "Hello" --max-tokens 64 --temp 0.7 --top-p 0.9
mlx_lm chat --model /path/to/model --system-prompt "You are concise." --max-tokens 256 --temp 0.7
```

In chat mode, enter `exit`, `quit`, empty input, or EOF (Ctrl-D) to stop.
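The chat stop conditions just described, combined with the facts that `--system-prompt` is prepended as a system message and that chat prompts are built from chat turns, suggest a loop shaped like the one below. This is a hypothetical sketch, not the `mlx_lm` implementation: `chat_loop` and `respond` are illustrative names, `respond` stands in for rendering with `MlxLm::ChatTemplate.apply` plus generation (whose signatures are not shown here), and treating whitespace-only input as empty is an assumption.

```ruby
require "stringio"

# Hypothetical sketch of the chat REPL behavior; not the mlx_lm code.
# `respond` is a stand-in for "apply the chat template, then generate".
STOP_WORDS = %w[exit quit].freeze

def chat_loop(input: $stdin, output: $stdout, system_prompt: nil, respond:)
  messages = []
  # --system-prompt, if given, is prepended once as a system message.
  messages << { role: "system", content: system_prompt } if system_prompt

  loop do
    output.print "> " # prompt marker is an assumption
    line = input.gets
    break if line.nil? # EOF (Ctrl-D)

    text = line.strip
    # Empty (or, by assumption, whitespace-only) input and stop words end the session.
    break if text.empty? || STOP_WORDS.include?(text)

    messages << { role: "user", content: text }
    reply = respond.call(messages) # the full turn history is passed every time
    messages << { role: "assistant", content: reply }
    output.puts reply
  end
  messages
end

# Example run with scripted input instead of a terminal:
transcript = chat_loop(
  input: StringIO.new("Hello\nexit\n"),
  output: StringIO.new,
  system_prompt: "You are concise.",
  respond: ->(msgs) { "(reply to #{msgs.last[:content]})" }
)
```

Passing the whole `messages` array on each turn mirrors building the prompt from chat turns rather than from `--prompt`.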
```sh
mlx_lm server --model /path/to/model --host 127.0.0.1 --port 8080
```

The server exposes:

- `GET /v1/models`
- `POST /v1/chat/completions`
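A client can call the chat endpoint with any HTTP library. The sketch below uses Ruby's standard `net/http` and `json` and only builds the request. `build_chat_request` is a hypothetical helper, and the body fields (`messages`, `max_tokens`, `temperature`) assume the OpenAI chat completions format implied by "OpenAI-style"; which other fields this server honors is not documented here.

```ruby
require "net/http"
require "json"
require "uri"

# Hypothetical helper: builds (but does not send) an OpenAI-style chat
# completion request. Per-request max_tokens/temperature come from this
# body, not from the server's CLI flags.
def build_chat_request(messages:, host: "127.0.0.1", port: 8080, max_tokens: 64, temperature: 0.7)
  uri = URI("http://#{host}:#{port}/v1/chat/completions")
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  request.body = JSON.generate({
    messages: messages,
    max_tokens: max_tokens,
    temperature: temperature
  })
  [uri, request]
end

uri, request = build_chat_request(
  messages: [
    { role: "system", content: "You are concise." },
    { role: "user", content: "Hello" }
  ]
)

# To send it (requires a running `mlx_lm server`; the response shape is
# assumed to follow the OpenAI format):
#   response = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
#   puts JSON.parse(response.body).dig("choices", 0, "message", "content")
```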
Current defaults (`MlxLm::CLI.default_args`):

- `prompt: ""`
- `max_tokens: 256`
- `temp: 0.0`
- `top_p: 1.0`
- `seed: nil`
- `repetition_penalty: nil`
- `repetition_context_size: 20`
- `host: "127.0.0.1"`
- `port: 8080`
- `system_prompt: nil`
- `verbose: false`
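Because one parser serves all three commands, every flag is accepted everywhere and each command reads only the keys it needs. The following is a hypothetical sketch of that pattern with Ruby's standard `OptionParser`, not the actual `MlxLm::CLI` code; `DEFAULT_ARGS` and `parse_cli` are illustrative names, and `model: nil` comes from the option table rather than from `default_args`.

```ruby
require "optparse"

# Hypothetical sketch: one shared OptionParser fills a copy of the defaults,
# so generate, chat, and server all accept (and --help lists) the same flags.
DEFAULT_ARGS = {
  model: nil,
  prompt: "",
  max_tokens: 256,
  temp: 0.0,
  top_p: 1.0,
  seed: nil,
  repetition_penalty: nil,
  repetition_context_size: 20,
  host: "127.0.0.1",
  port: 8080,
  system_prompt: nil,
  verbose: false
}.freeze

def parse_cli(argv)
  args = DEFAULT_ARGS.dup
  argv = argv.dup
  command = argv.shift # "generate", "chat", or "server"
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: mlx_lm #{command} [options]"
    opts.on("--model MODEL", String) { |v| args[:model] = v }
    opts.on("--prompt PROMPT", String) { |v| args[:prompt] = v }
    opts.on("--max-tokens N", Integer) { |v| args[:max_tokens] = v }
    opts.on("--temp T", Float) { |v| args[:temp] = v }
    opts.on("--top-p P", Float) { |v| args[:top_p] = v }
    opts.on("--seed N", Integer) { |v| args[:seed] = v }
    opts.on("--repetition-penalty X", Float) { |v| args[:repetition_penalty] = v }
    opts.on("--repetition-context-size N", Integer) { |v| args[:repetition_context_size] = v }
    opts.on("--host HOST", String) { |v| args[:host] = v }
    opts.on("--port N", Integer) { |v| args[:port] = v }
    opts.on("--system-prompt TEXT", String) { |v| args[:system_prompt] = v }
    opts.on("--verbose") { args[:verbose] = true }
  end
  parser.parse!(argv)
  [command, args]
end
```

With this shape, `mlx_lm generate --host 0.0.0.0` parses without error even though `generate` never reads `:host`, which is the "Parsed only" case.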
In the table below, **Used** means the command's logic consumes the parsed value; **Parsed only** means the option is accepted by the parser and listed in `--help`, but that command's implementation ignores it.
| Option | Default | generate | chat | server | Notes |
|---|---|---|---|---|---|
| `--model` | `nil` | Used | Used | Used | Required in practice; passed to `LoadUtils.load` / `Server.start`. |
| `--prompt` | `""` | Used | Parsed only | Parsed only | Chat reads from stdin instead of `--prompt`. |
| `--max-tokens` | `256` | Used | Used | Parsed only | Server request body provides per-request `max_tokens`. |
| `--temp` | `0.0` | Used | Used | Parsed only | Server request body provides `temperature`. |
| `--top-p` | `1.0` | Used | Parsed only | Parsed only | Chat sampler currently ignores `top_p`. |
| `--seed` | `nil` | Parsed only | Parsed only | Parsed only | Parsed but never applied in current command paths. |
| `--repetition-penalty` | `nil` | Parsed only | Parsed only | Parsed only | Parsed but never wired into `make_logits_processors`. |
| `--repetition-context-size` | `20` | Parsed only | Parsed only | Parsed only | Parsed but never wired into `make_logits_processors`. |
| `--host` | `127.0.0.1` | Parsed only | Parsed only | Used | Used only by server. |
| `--port` | `8080` | Parsed only | Parsed only | Used | Used only by server. |
| `--system-prompt` | `nil` | Parsed only | Used | Parsed only | Prepended as a system message in chat. |
| `--verbose` | `false` | Used | Parsed only | Parsed only | Affects only generate output mode and stats. |
- The `--help` text says `--model` can be a model path or a HuggingFace ID, but current runtime loading expects a local model directory (`config.json`, tokenizer files, safetensors weights).
- The seed and repetition flags (`--seed`, `--repetition-penalty`, `--repetition-context-size`) are parser-level only and do not change runtime generation behavior.
- `chat` always builds prompts from chat turns via `MlxLm::ChatTemplate.apply`; `--prompt` is ignored.
- `server` serves OpenAI-style endpoints but does not consume CLI generation flags (`--max-tokens`, `--temp`, `--top-p`, and so on).
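Since loading currently expects a local directory rather than a HuggingFace ID, a pre-flight check can catch a bad `--model` value before the CLI starts. This is a minimal sketch: `local_model_problems` is a hypothetical helper, and the tokenizer file names it looks for are assumptions based on common Hugging Face layouts, not names taken from `LoadUtils.load`.

```ruby
require "pathname"

# Hypothetical pre-flight check for --model. The tokenizer file names are
# assumptions (common Hugging Face layouts), not read from LoadUtils.load.
TOKENIZER_FILES = %w[tokenizer.json tokenizer.model tokenizer_config.json].freeze

# Returns a list of human-readable problems; an empty list means the
# directory has the files current loading expects.
def local_model_problems(path)
  dir = Pathname.new(path)
  return ["#{path} is not a local directory (HuggingFace IDs are not resolved)"] unless dir.directory?

  problems = []
  problems << "missing config.json" unless (dir / "config.json").file?
  problems << "no tokenizer files" if TOKENIZER_FILES.none? { |f| (dir / f).file? }
  problems << "no *.safetensors weights" if dir.glob("*.safetensors").empty?
  problems
end
```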