
Add model inspect cli #776

Merged
merged 4 commits into from
Feb 10, 2025
Conversation

engelmi
Member

@engelmi engelmi commented Feb 10, 2025

Added show CLI command to display model info

AI models ship with a lot of (meta) information, such as the architecture used, the chat template they require, and so on. To make this information available to the user, a new CLI command, show, has been implemented with support for the --all and --json options.
At the moment the GGUF file format - which bundles the model and its (meta) information in a single file - is fully supported. Other formats, where the model and its information are stored in separate files, are not (yet) supported and only display basic information such as the model name, path and registry.
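Detecting whether a file is GGUF boils down to a magic-byte check; this is a minimal sketch of that idea, not the PR's actual implementation (the function name and return convention are illustrative):

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def is_gguf(path: str) -> bool:
    """Heuristic check: a GGUF file starts with the magic b'GGUF'
    followed by a uint32 format version (read here as little-endian)."""
    try:
        with open(path, "rb") as f:
            header = f.read(8)
    except OSError:
        return False
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return False
    (version,) = struct.unpack("<I", header[4:8])
    return version > 0
```
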

Signed-off-by: Michael Engel [email protected]

Example: smollm:135m (ollama)
$ ramalama show smollm:135m
smollm:135m
   Path: /home/mengel/.local/share/ramalama/models/ollama/smollm:135m
   Registry: ollama
   Format: GGUF
   Version: 3
   Endianness: little
   Metadata: 39 entries
   Tensors: 272 entries
Example: granite-7b-lab-Q4_K_M.gguf (huggingface)
$ ramalama show huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf
granite-7b-lab-Q4_K_M.gguf
   Path: /home/mengel/.local/share/ramalama/models/huggingface/instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf
   Registry: huggingface
   Format: GGUF
   Version: 3
   Endianness: little
   Metadata: 25 entries
   Tensors: 291 entries
Example: smollm:135m with --all
$ ramalama show smollm:135m --all
smollm:135m
   Path: /home/mengel/.local/share/ramalama/models/ollama/smollm:135m
   Registry: ollama
   Format: GGUF
   Version: 3
   Endianness: little
   Metadata: 
      general.architecture: llama
      general.base_model.0.name: SmolLM 135M
      general.base_model.0.organization: HuggingFaceTB
      general.base_model.0.repo_url: https://huggingface.co/HuggingFaceTB/SmolLM-135M
      general.base_model.count: 1
      general.basename: SmolLM
      general.file_type: 2
      general.finetune: Instruct
      general.languages: ['en']
      general.license: apache-2.0
      general.name: SmolLM 135M
      general.organization: HuggingFaceTB
      general.quantization_version: 2
      general.size_label: 135M
      general.tags: ['alignment-handbook', 'trl', 'sft']
      general.type: model
      llama.attention.head_count: 9
      llama.attention.head_count_kv: 3
      llama.attention.layer_norm_rms_epsilon: 9.999999747378752e-06
      llama.block_count: 30
      llama.context_length: 2048
      llama.embedding_length: 576
      llama.feed_forward_length: 1536
      llama.rope.dimension_count: 64
      llama.rope.freq_base: 10000.0
      llama.vocab_size: 49152
      tokenizer.ggml.add_bos_token: False
      tokenizer.ggml.add_space_prefix: False
      tokenizer.ggml.bos_token_id: 1
      tokenizer.ggml.eos_token_id: 2
      tokenizer.ggml.model: gpt2
      tokenizer.ggml.padding_token_id: 2
      tokenizer.ggml.pre: smollm
      tokenizer.ggml.unknown_token_id: 0
   Tensors: 
      0: ('token_embd.weight', 'GGML_TYPE_Q8_0', 2, 0)
      1: ('blk.0.attn_norm.weight', 'GGML_TYPE_F32', 1, 30081024)
      2: ('blk.0.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 30083328)
      3: ('blk.0.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 30580992)
      4: ('blk.0.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 31078656)
      5: ('blk.0.ffn_norm.weight', 'GGML_TYPE_F32', 1, 31576320)
      6: ('blk.0.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 31578624)
      7: ('blk.0.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 31640832)
      8: ('blk.0.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 31827456)
      9: ('blk.0.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 32014080)
      10: ('blk.1.attn_norm.weight', 'GGML_TYPE_F32', 1, 32076288)
      11: ('blk.1.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 32078592)
      12: ('blk.1.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 32576256)
      13: ('blk.1.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 33073920)
      14: ('blk.1.ffn_norm.weight', 'GGML_TYPE_F32', 1, 33571584)
      15: ('blk.1.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 33573888)
      16: ('blk.1.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 33636096)
      17: ('blk.1.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 33822720)
      18: ('blk.1.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 34009344)
      19: ('blk.10.attn_norm.weight', 'GGML_TYPE_F32', 1, 34071552)
      20: ('blk.10.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 34073856)
      21: ('blk.10.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 34571520)
      22: ('blk.10.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 35069184)
      23: ('blk.10.ffn_norm.weight', 'GGML_TYPE_F32', 1, 35566848)
      24: ('blk.10.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 35569152)
      25: ('blk.10.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 35631360)
      26: ('blk.10.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 35817984)
      27: ('blk.10.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 36004608)
      28: ('blk.11.attn_norm.weight', 'GGML_TYPE_F32', 1, 36066816)
      29: ('blk.11.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 36069120)
      30: ('blk.11.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 36566784)
      31: ('blk.11.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 37064448)
      32: ('blk.11.ffn_norm.weight', 'GGML_TYPE_F32', 1, 37562112)
      33: ('blk.11.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 37564416)
      34: ('blk.11.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 37626624)
      35: ('blk.11.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 37813248)
      36: ('blk.11.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 37999872)
      37: ('blk.12.attn_norm.weight', 'GGML_TYPE_F32', 1, 38062080)
      38: ('blk.12.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 38064384)
      39: ('blk.12.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 38562048)
      40: ('blk.12.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 39059712)
      41: ('blk.12.ffn_norm.weight', 'GGML_TYPE_F32', 1, 39557376)
      42: ('blk.12.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 39559680)
      43: ('blk.12.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 39621888)
      44: ('blk.12.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 39808512)
      45: ('blk.12.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 39995136)
      46: ('blk.13.attn_norm.weight', 'GGML_TYPE_F32', 1, 40057344)
      47: ('blk.13.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 40059648)
      48: ('blk.13.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 40557312)
      49: ('blk.13.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 41054976)
      50: ('blk.13.ffn_norm.weight', 'GGML_TYPE_F32', 1, 41552640)
      51: ('blk.13.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 41554944)
      52: ('blk.13.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 41617152)
      53: ('blk.13.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 41803776)
      54: ('blk.13.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 41990400)
      55: ('blk.14.attn_norm.weight', 'GGML_TYPE_F32', 1, 42052608)
      56: ('blk.14.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 42054912)
      57: ('blk.14.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 42552576)
      58: ('blk.14.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 43050240)
      59: ('blk.14.ffn_norm.weight', 'GGML_TYPE_F32', 1, 43547904)
      60: ('blk.14.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 43550208)
      61: ('blk.14.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 43612416)
      62: ('blk.14.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 43799040)
      63: ('blk.14.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 43985664)
      64: ('blk.15.attn_norm.weight', 'GGML_TYPE_F32', 1, 44047872)
      65: ('blk.15.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 44050176)
      66: ('blk.15.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 44547840)
      67: ('blk.15.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 45045504)
      68: ('blk.15.ffn_norm.weight', 'GGML_TYPE_F32', 1, 45543168)
      69: ('blk.15.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 45545472)
      70: ('blk.15.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 45607680)
      71: ('blk.15.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 45794304)
      72: ('blk.15.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 45980928)
      73: ('blk.16.attn_norm.weight', 'GGML_TYPE_F32', 1, 46043136)
      74: ('blk.16.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 46045440)
      75: ('blk.16.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 46543104)
      76: ('blk.16.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 47040768)
      77: ('blk.16.ffn_norm.weight', 'GGML_TYPE_F32', 1, 47538432)
      78: ('blk.16.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 47540736)
      79: ('blk.16.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 47602944)
      80: ('blk.16.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 47789568)
      81: ('blk.16.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 47976192)
      82: ('blk.17.attn_norm.weight', 'GGML_TYPE_F32', 1, 48038400)
      83: ('blk.17.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 48040704)
      84: ('blk.17.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 48538368)
      85: ('blk.17.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 49036032)
      86: ('blk.17.ffn_norm.weight', 'GGML_TYPE_F32', 1, 49533696)
      87: ('blk.17.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 49536000)
      88: ('blk.17.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 49598208)
      89: ('blk.17.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 49784832)
      90: ('blk.17.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 49971456)
      91: ('blk.18.attn_norm.weight', 'GGML_TYPE_F32', 1, 50033664)
      92: ('blk.18.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 50035968)
      93: ('blk.18.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 50533632)
      94: ('blk.18.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 51031296)
      95: ('blk.18.ffn_norm.weight', 'GGML_TYPE_F32', 1, 51528960)
      96: ('blk.18.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 51531264)
      97: ('blk.18.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 51593472)
      98: ('blk.18.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 51780096)
      99: ('blk.18.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 51966720)
      100: ('blk.19.attn_norm.weight', 'GGML_TYPE_F32', 1, 52028928)
      101: ('blk.19.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 52031232)
      102: ('blk.19.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 52528896)
      103: ('blk.19.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 53026560)
      104: ('blk.19.ffn_norm.weight', 'GGML_TYPE_F32', 1, 53524224)
      105: ('blk.19.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 53526528)
      106: ('blk.19.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 53588736)
      107: ('blk.19.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 53775360)
      108: ('blk.19.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 53961984)
      109: ('blk.2.attn_norm.weight', 'GGML_TYPE_F32', 1, 54024192)
      110: ('blk.2.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 54026496)
      111: ('blk.2.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 54524160)
      112: ('blk.2.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 55021824)
      113: ('blk.2.ffn_norm.weight', 'GGML_TYPE_F32', 1, 55519488)
      114: ('blk.2.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 55521792)
      115: ('blk.2.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 55584000)
      116: ('blk.2.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 55770624)
      117: ('blk.2.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 55957248)
      118: ('blk.20.attn_norm.weight', 'GGML_TYPE_F32', 1, 56019456)
      119: ('blk.20.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 56021760)
      120: ('blk.20.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 56519424)
      121: ('blk.20.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 57017088)
      122: ('blk.20.ffn_norm.weight', 'GGML_TYPE_F32', 1, 57514752)
      123: ('blk.20.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 57517056)
      124: ('blk.20.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 57579264)
      125: ('blk.20.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 57765888)
      126: ('blk.20.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 57952512)
      127: ('blk.21.attn_norm.weight', 'GGML_TYPE_F32', 1, 58014720)
      128: ('blk.21.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 58017024)
      129: ('blk.21.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 58514688)
      130: ('blk.21.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 59012352)
      131: ('blk.21.ffn_norm.weight', 'GGML_TYPE_F32', 1, 59510016)
      132: ('blk.21.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 59512320)
      133: ('blk.21.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 59574528)
      134: ('blk.21.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 59761152)
      135: ('blk.21.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 59947776)
      136: ('blk.22.attn_norm.weight', 'GGML_TYPE_F32', 1, 60009984)
      137: ('blk.22.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 60012288)
      138: ('blk.22.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 60509952)
      139: ('blk.22.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 61007616)
      140: ('blk.22.ffn_norm.weight', 'GGML_TYPE_F32', 1, 61505280)
      141: ('blk.22.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 61507584)
      142: ('blk.22.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 61569792)
      143: ('blk.22.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 61756416)
      144: ('blk.22.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 61943040)
      145: ('blk.23.attn_norm.weight', 'GGML_TYPE_F32', 1, 62005248)
      146: ('blk.23.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 62007552)
      147: ('blk.23.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 62505216)
      148: ('blk.23.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 63002880)
      149: ('blk.23.ffn_norm.weight', 'GGML_TYPE_F32', 1, 63500544)
      150: ('blk.23.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 63502848)
      151: ('blk.23.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 63565056)
      152: ('blk.23.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 63751680)
      153: ('blk.23.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 63938304)
      154: ('blk.24.attn_norm.weight', 'GGML_TYPE_F32', 1, 64000512)
      155: ('blk.24.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 64002816)
      156: ('blk.24.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 64500480)
      157: ('blk.24.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 64998144)
      158: ('blk.24.ffn_norm.weight', 'GGML_TYPE_F32', 1, 65495808)
      159: ('blk.24.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 65498112)
      160: ('blk.24.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 65560320)
      161: ('blk.24.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 65746944)
      162: ('blk.24.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 65933568)
      163: ('blk.25.attn_norm.weight', 'GGML_TYPE_F32', 1, 65995776)
      164: ('blk.25.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 65998080)
      165: ('blk.25.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 66495744)
      166: ('blk.25.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 66993408)
      167: ('blk.25.ffn_norm.weight', 'GGML_TYPE_F32', 1, 67491072)
      168: ('blk.25.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 67493376)
      169: ('blk.25.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 67555584)
      170: ('blk.25.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 67742208)
      171: ('blk.25.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 67928832)
      172: ('blk.26.attn_norm.weight', 'GGML_TYPE_F32', 1, 67991040)
      173: ('blk.26.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 67993344)
      174: ('blk.26.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 68491008)
      175: ('blk.26.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 68988672)
      176: ('blk.26.ffn_norm.weight', 'GGML_TYPE_F32', 1, 69486336)
      177: ('blk.26.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 69488640)
      178: ('blk.26.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 69550848)
      179: ('blk.26.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 69737472)
      180: ('blk.26.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 69924096)
      181: ('blk.27.attn_norm.weight', 'GGML_TYPE_F32', 1, 69986304)
      182: ('blk.27.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 69988608)
      183: ('blk.27.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 70486272)
      184: ('blk.27.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 70983936)
      185: ('blk.27.ffn_norm.weight', 'GGML_TYPE_F32', 1, 71481600)
      186: ('blk.27.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 71483904)
      187: ('blk.27.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 71546112)
      188: ('blk.27.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 71732736)
      189: ('blk.27.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 71919360)
      190: ('blk.28.attn_norm.weight', 'GGML_TYPE_F32', 1, 71981568)
      191: ('blk.28.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 71983872)
      192: ('blk.28.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 72481536)
      193: ('blk.28.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 72979200)
      194: ('blk.28.ffn_norm.weight', 'GGML_TYPE_F32', 1, 73476864)
      195: ('blk.28.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 73479168)
      196: ('blk.28.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 73541376)
      197: ('blk.28.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 73728000)
      198: ('blk.28.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 73914624)
      199: ('blk.29.attn_norm.weight', 'GGML_TYPE_F32', 1, 73976832)
      200: ('blk.29.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 73979136)
      201: ('blk.29.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 74476800)
      202: ('blk.29.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 74974464)
      203: ('blk.29.ffn_norm.weight', 'GGML_TYPE_F32', 1, 75472128)
      204: ('blk.29.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 75474432)
      205: ('blk.29.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 75536640)
      206: ('blk.29.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 75723264)
      207: ('blk.29.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 75909888)
      208: ('blk.3.attn_norm.weight', 'GGML_TYPE_F32', 1, 75972096)
      209: ('blk.3.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 75974400)
      210: ('blk.3.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 76472064)
      211: ('blk.3.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 76969728)
      212: ('blk.3.ffn_norm.weight', 'GGML_TYPE_F32', 1, 77467392)
      213: ('blk.3.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 77469696)
      214: ('blk.3.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 77531904)
      215: ('blk.3.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 77718528)
      216: ('blk.3.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 77905152)
      217: ('blk.4.attn_norm.weight', 'GGML_TYPE_F32', 1, 77967360)
      218: ('blk.4.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 77969664)
      219: ('blk.4.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 78467328)
      220: ('blk.4.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 78964992)
      221: ('blk.4.ffn_norm.weight', 'GGML_TYPE_F32', 1, 79462656)
      222: ('blk.4.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 79464960)
      223: ('blk.4.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 79527168)
      224: ('blk.4.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 79713792)
      225: ('blk.4.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 79900416)
      226: ('blk.5.attn_norm.weight', 'GGML_TYPE_F32', 1, 79962624)
      227: ('blk.5.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 79964928)
      228: ('blk.5.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 80462592)
      229: ('blk.5.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 80960256)
      230: ('blk.5.ffn_norm.weight', 'GGML_TYPE_F32', 1, 81457920)
      231: ('blk.5.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 81460224)
      232: ('blk.5.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 81522432)
      233: ('blk.5.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 81709056)
      234: ('blk.5.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 81895680)
      235: ('blk.6.attn_norm.weight', 'GGML_TYPE_F32', 1, 81957888)
      236: ('blk.6.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 81960192)
      237: ('blk.6.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 82457856)
      238: ('blk.6.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 82955520)
      239: ('blk.6.ffn_norm.weight', 'GGML_TYPE_F32', 1, 83453184)
      240: ('blk.6.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 83455488)
      241: ('blk.6.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 83517696)
      242: ('blk.6.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 83704320)
      243: ('blk.6.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 83890944)
      244: ('blk.7.attn_norm.weight', 'GGML_TYPE_F32', 1, 83953152)
      245: ('blk.7.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 83955456)
      246: ('blk.7.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 84453120)
      247: ('blk.7.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 84950784)
      248: ('blk.7.ffn_norm.weight', 'GGML_TYPE_F32', 1, 85448448)
      249: ('blk.7.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 85450752)
      250: ('blk.7.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 85512960)
      251: ('blk.7.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 85699584)
      252: ('blk.7.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 85886208)
      253: ('blk.8.attn_norm.weight', 'GGML_TYPE_F32', 1, 85948416)
      254: ('blk.8.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 85950720)
      255: ('blk.8.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 86448384)
      256: ('blk.8.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 86946048)
      257: ('blk.8.ffn_norm.weight', 'GGML_TYPE_F32', 1, 87443712)
      258: ('blk.8.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 87446016)
      259: ('blk.8.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 87508224)
      260: ('blk.8.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 87694848)
      261: ('blk.8.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 87881472)
      262: ('blk.9.attn_norm.weight', 'GGML_TYPE_F32', 1, 87943680)
      263: ('blk.9.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 87945984)
      264: ('blk.9.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 88443648)
      265: ('blk.9.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 88941312)
      266: ('blk.9.ffn_norm.weight', 'GGML_TYPE_F32', 1, 89438976)
      267: ('blk.9.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 89441280)
      268: ('blk.9.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 89503488)
      269: ('blk.9.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 89690112)
      270: ('blk.9.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 89876736)
      271: ('output_norm.weight', 'GGML_TYPE_F32', 1, 89938944)

Fix: Set directory and filename in Model base class

The directory and filename of a model are determined by the respective model implementation, e.g. Ollama or Huggingface. If, however, these two fields are not defined in the Model base class, then accessing them on a specific model instance might fail because the attributes do not exist.

Signed-off-by: Michael Engel [email protected]

Summary by Sourcery

New Features:

  • Added "show" CLI command to display model information, including metadata and tensors, with optional --all and --json flags.

Contributor

sourcery-ai bot commented Feb 10, 2025

Reviewer's Guide by Sourcery

This pull request introduces a new 'show' CLI command that displays model information, with the --all option for full details and the --json option for machine-readable output. The changes include GGUF file format support, refactoring of the model base class to properly set and retrieve directory and filename information, and cleanup of duplicate implementations in model subclasses. Additionally, new modules for parsing GGUF data and serializing model metadata have been added.

Sequence diagram for the 'show' CLI command

sequenceDiagram
    actor User as User
    participant CLI as Show CLI
    participant Model as New(Model)
    participant Parser as GGUFInfoParser
    participant BaseInfo as ModelInfoBase

    User->>CLI: Execute 'ramalama show MODEL [--json] [--all]'
    CLI->>Model: Instantiate New(args.MODEL, args)
    CLI->>Parser: is_model_gguf(model, args)?
    alt GGUF model supported
      Parser-->>CLI: true
      CLI->>Parser: parse(model, args)
      Parser-->>CLI: gguf_info
      CLI->>CLI: gguf_info.serialize(json, all)
      CLI-->>User: Display full model info output
    else Not a GGUF model
      Parser-->>CLI: false
      CLI->>Model: get filename, get_model_path(args), get_model_registry(args)
      CLI->>BaseInfo: Create ModelInfoBase instance
      BaseInfo-->>CLI: Instance created
      CLI->>BaseInfo: serialize(json)
      CLI-->>User: Display base model info output
    end
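The dispatch shown in the sequence diagram can be sketched in simplified form. The class and field names follow the diagram, but the bodies below are illustrative stand-ins, not ramalama's actual code:

```python
import json as jsonlib
from dataclasses import dataclass, asdict


@dataclass
class ModelInfoBase:
    # Fields as named in the class diagram; a stand-in, not the real class.
    Name: str
    Registry: str
    Path: str

    def serialize(self, json: bool = False) -> str:
        if json:
            return jsonlib.dumps(asdict(self))
        return "\n".join(f"{k}: {v}" for k, v in asdict(self).items())


def show_cli(model_path, registry, name, is_gguf, parse_gguf,
             use_json=False, show_all=False):
    """Mirror of the dispatch in the diagram: full GGUF info when the
    file is GGUF, otherwise only the basic ModelInfoBase fields."""
    if is_gguf(model_path):
        return parse_gguf(model_path).serialize(json=use_json, all=show_all)
    return ModelInfoBase(name, registry, model_path).serialize(json=use_json)
```
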

Class diagram for Model, GGUFInfoParser, and ModelInfo classes

classDiagram
    class Model {
        - model: string
        - directory: string
        - filename: string
        + __init__(model)
        + get_model_path(args)
        + get_model_registry(args)
        + login(args)
        + build_exec_args_bench(args, model_path)
    }

    class ModelInfoBase {
        + Name: string
        + Registry: string
        + Path: string
        + serialize(json: bool): string
        + to_json(): string
    }

    class GGUFModelInfo {
        + Format: string
        + Version: number
        + Metadata: Map
        + Tensors: list
        + LittleEndian: bool
        + serialize(json: bool, all: bool): string
        + to_json(all: bool): string
        <<constructor>>
    }

    class GGUFInfoParser {
        <<static>> + is_model_gguf(model, cli_args): bool
        <<static>> + parse(model, cli_args): GGUFModelInfo
    }

    ModelInfoBase <|-- GGUFModelInfo

File-Level Changes

Change Details Files
Implemented new CLI command 'show' for displaying AI model information.
  • Added a new 'show_parser' function to register the command and its arguments in the CLI.
  • Created the 'show_cli' function to handle the command logic, checking for GGUF format and printing model info in either plain text or JSON format.
ramalama/cli.py
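The argument registration described above could look roughly like this; the flag names match the PR description, while the parser wiring is an assumption:

```python
import argparse


def show_parser(subparsers):
    """Sketch of registering the 'show' subcommand."""
    parser = subparsers.add_parser("show", help="display model information")
    parser.add_argument("MODEL")
    parser.add_argument("--all", dest="all", action="store_true",
                        help="include full metadata and tensor listings")
    parser.add_argument("--json", dest="json", action="store_true",
                        help="emit machine-readable JSON instead of text")
    return parser


# usage example
root = argparse.ArgumentParser(prog="ramalama")
sub = root.add_subparsers(dest="command")
show_parser(sub)
args = root.parse_args(["show", "smollm:135m", "--all"])
```
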
Refactored the model base class to set directory and filename properly.
  • Updated the init method in the base Model class to split the model string into directory and filename.
  • Added a get_model_registry method to determine and return the model registry from the model path.
ramalama/model.py
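The base-class change can be sketched as deriving directory and filename once in __init__; the splitting rule shown here is an assumption based on the description, not the actual patch:

```python
class Model:
    """Sketch: directory and filename are derived from the model string
    once in the base class, instead of in each subclass."""

    def __init__(self, model: str):
        self.model = model
        # Everything before the last '/' is the directory, the rest the file.
        split = model.rsplit("/", 1)
        self.directory = split[0] if len(split) > 1 else ""
        self.filename = split[1] if len(split) > 1 else split[0]
```
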
Removed duplicate logic for setting directory and filename from model subclass implementations.
  • Removed the duplicated code in the Huggingface implementation that previously split the model string.
  • Removed initialization of filename in the URL-based model implementation to centralize behavior in the base class.
ramalama/huggingface.py
ramalama/url.py
Added GGUF parser to support models with embedded metadata and tensors.
  • Introduced a new module that defines GGML type enums and provides parsing logic for GGUF formatted model files.
  • Implemented functions to read strings, numbers, booleans, and values of various types according to the GGUF specification.
ramalama/gguf_parser.py
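Per the GGUF specification, strings are stored as a uint64 byte length followed by UTF-8 data, so a reader for them might look like this (a minimal sketch without error handling; the parameter names are illustrative):

```python
import io
import struct


def read_string(reader: io.BufferedIOBase, little_endian: bool = True) -> str:
    """Read one GGUF string: a uint64 byte length, then that many
    UTF-8 bytes. Endianness matches the file's declared byte order."""
    fmt = "<Q" if little_endian else ">Q"
    (length,) = struct.unpack(fmt, reader.read(8))
    return reader.read(length).decode("utf-8")
```
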
Defined model info serialization for displaying model metadata.
  • Added a new module that defines ModelInfoBase and GGUFModelInfo along with a Tensor dataclass.
  • Implemented methods to serialize the model information into human-readable plain text and JSON formats, and added support for detailed display using the '--all' flag.
ramalama/model_info.py
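The --all behaviour described above could be sketched as follows; function and parameter names are illustrative, not the module's actual API:

```python
import json as jsonlib


def serialize_model_info(base: dict, metadata: dict, tensors: list,
                         json: bool = False, all: bool = False) -> str:
    """Without --all only entry counts are printed; with --all every
    metadata key and tensor is listed."""
    if json:
        body = dict(base)
        body["Metadata"] = metadata if all else len(metadata)
        body["Tensors"] = tensors if all else len(tensors)
        return jsonlib.dumps(body)
    lines = [f"{k}: {v}" for k, v in base.items()]
    if all:
        lines.append("Metadata:")
        lines += [f"   {k}: {v}" for k, v in sorted(metadata.items())]
        lines.append("Tensors:")
        lines += [f"   {i}: {t}" for i, t in enumerate(tensors)]
    else:
        lines.append(f"Metadata: {len(metadata)} entries")
        lines.append(f"Tensors: {len(tensors)} entries")
    return "\n".join(lines)
```
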


Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey @engelmi - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider adding a try-except block around the GGUF parsing logic to handle potential file errors more gracefully.
  • The ModelInfoBase.serialize method could use a more robust string formatting approach.
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good


ramalama/gguf_parser.py (review comment, resolved)
@engelmi
Member Author

engelmi commented Feb 10, 2025

In a follow-up PR we can introduce downloading the metadata stored in separate files for non-GGUF-encoded models, such as the tokenizer_config.json used by granite-8b-code-instruct-4k on huggingface, for example.
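As a sketch of that follow-up idea: Hugging Face serves raw repository files under the /resolve/&lt;revision&gt;/&lt;path&gt; route, so fetching such a metadata file could start from a URL helper like this (the helper name is hypothetical, not part of this PR):

```python
def metadata_url(repo_id: str, filename: str = "tokenizer_config.json") -> str:
    """Build the raw-file URL for a metadata file in a Hugging Face repo.
    'main' is the default revision; callers could make it a parameter."""
    return f"https://huggingface.co/{repo_id}/resolve/main/{filename}"
```
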

@ericcurtin
Collaborator

ericcurtin commented Feb 10, 2025

New files must be added to install.sh, looks like our CI didn't catch that here

@ericcurtin
Collaborator

ericcurtin commented Feb 10, 2025

Looks like a good idea to me! CI needs to be fixed up

@engelmi
Member Author

engelmi commented Feb 10, 2025

New files must be added to install.sh, looks like our CI didn't catch that here

Ah, forgot about that... let me check if I can fix the CI and the CI check as well.

@ericcurtin
Collaborator

ericcurtin commented Feb 10, 2025

We did run this as part of CI in the past; it would have picked up many of these things:

https://github.com/containers/ramalama/blob/main/test/ci.sh

it's an option to re-add that. Our CI got reworked a few times and it likely slipped through the cracks. The files check was just this grep:

check_packaging() {
  cd ramalama
  # every Python module must be referenced in install.sh
  for i in *.py; do
    grep -q "$i" ../install.sh
  done

  cd -
}

which seems kinda basic, but meh, it worked. Slightly better would be an install and smoke test, just execute something very simple on a properly installed system.

@ericcurtin
Collaborator

ericcurtin commented Feb 10, 2025

I'm assuming we considered the:

llama-batched                  llama-embedding                llama-gguf-hash                llama-llava-clip-quantize-cli  llama-minicpmv-cli             llama-qwen2vl-cli              llama-simple-chat
llama-batched-bench            llama-eval-callback            llama-gguf-split               llama-lookahead                llama-parallel                 llama-retrieval                llama-speculative
llama-bench                    llama-export-lora              llama-gritlm                   llama-lookup                   llama-passkey                  llama-run                      llama-speculative-simple
llama-cli                      llama-gbnf-validator           llama-imatrix                  llama-lookup-create            llama-perplexity               llama-save-load-state          llama-tokenize
llama-convert-llama2c-to-ggml  llama-gen-docs                 llama-infill                   llama-lookup-merge             llama-quantize                 llama-server                   llama-tts
llama-cvector-generator        llama-gguf                     llama-llava-cli                llama-lookup-stats             llama-quantize-stats           llama-simple

binaries; some of them might already implement this... I'm ok with a from-scratch python3 solution though 😄

@engelmi
Member Author

engelmi commented Feb 10, 2025

I'm assuming we considered the:

llama-batched                  llama-embedding                llama-gguf-hash                llama-llava-clip-quantize-cli  llama-minicpmv-cli             llama-qwen2vl-cli              llama-simple-chat
llama-batched-bench            llama-eval-callback            llama-gguf-split               llama-lookahead                llama-parallel                 llama-retrieval                llama-speculative
llama-bench                    llama-export-lora              llama-gritlm                   llama-lookup                   llama-passkey                  llama-run                      llama-speculative-simple
llama-cli                      llama-gbnf-validator           llama-imatrix                  llama-lookup-create            llama-perplexity               llama-save-load-state          llama-tokenize
llama-convert-llama2c-to-ggml  llama-gen-docs                 llama-infill                   llama-lookup-merge             llama-quantize                 llama-server                   llama-tts
llama-cvector-generator        llama-gguf                     llama-llava-cli                llama-lookup-stats             llama-quantize-stats           llama-simple

binaries; some of them might already implement this... I'm ok with a from-scratch python3 solution though 😄

Haven't thought about this, to be honest.
Quickly checking, it seems that llama-gguf would also provide the GGUF information. However, considering that we probably want to combine this with non-GGUF models, provide different output formats (e.g. --json) and maybe use this further inside ramalama, I think the Python implementation suits better. But up to you :) @ericcurtin
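For reference, the Version, Metadata and Tensors fields shown in the PR description come straight from the fixed-size GGUF header. A minimal sketch of reading it (based on the public GGUF spec, assuming a little-endian file; this is not the actual ramalama implementation) could look like:

```python
import struct

def read_gguf_header(path):
    """Parse the fixed-size GGUF header that precedes the metadata section.

    Layout per the public GGUF spec (little-endian assumed here):
    4-byte magic "GGUF", uint32 version, uint64 tensor count,
    uint64 metadata key/value count.
    """
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: {magic!r}")
        version, = struct.unpack("<I", f.read(4))
        tensor_count, metadata_kv_count = struct.unpack("<QQ", f.read(16))
    return {"version": version, "tensors": tensor_count, "metadata": metadata_kv_count}
```

A full implementation would additionally decode each metadata key/value pair and detect big-endian files, but this is enough to produce the summary lines shown above.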

@ericcurtin
Collaborator

SGTM

@engelmi engelmi force-pushed the add-model-info-cli branch 3 times, most recently from 98aead8 to a82bcd6 Compare February 10, 2025 13:32
@rhatdan
Member

rhatdan commented Feb 10, 2025

@engelmi Great job, I really like this feature.

BUT:

These need tests.
I really prefer inspect over show.

We potentially will need to add something like a --format option to allow users to apply Go templates to the output, like podman inspect does.

@rhatdan rhatdan changed the title Add model info cli Add model inspect cli Feb 10, 2025
@rhatdan
Member

rhatdan commented Feb 10, 2025

On next pass please fix the commit message show->inspect.

@engelmi
Member Author

engelmi commented Feb 10, 2025

@engelmi Great job, I really like this feature.

BUT:

These need tests. I really prefer inspect over show.

We potentially will need to add something like a --format option to allow users to apply Go templates to the output, like podman inspect does.

Thanks!
Yes, changed show -> inspect.
If I see it correctly, there are only system tests and no unit tests (yet). Should I start adding unit tests or are some system tests fine? @rhatdan

@rhatdan
Member

rhatdan commented Feb 10, 2025

I would love to start adding unit tests, but for now system tests are fine.

I just want you to add a new test/system/100-inspect.bats or something and add a few tests on the tiny models.

@rhatdan
Member

rhatdan commented Feb 10, 2025

Rename ramalama/model_info.py to ramalama/model_inspect.py

@engelmi engelmi force-pushed the add-model-info-cli branch 3 times, most recently from 66627ee to 64843f4 Compare February 10, 2025 16:35
AI models are shipped with a lot of (meta) information, such as
the architecture used, the chat template they require and so on.
In order to make this available to the user, the new CLI command
inspect with support for the --all and --json options has been
implemented.
At the moment the GGUF file format - which bundles the model as
well as the (meta) information in one file - is fully supported.
Other formats, where the model and its information are stored in
separate files, are not (yet) supported and only display basic
information such as the model name, path and registry.

Signed-off-by: Michael Engel <[email protected]>
The directory and filename of a model are determined by the
respective model implementation, e.g. Ollama or Huggingface.
If, however, these two fields are not defined in the model
base class, accessing them on a specific model instance
might fail since they do not exist.

Signed-off-by: Michael Engel <[email protected]>
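The guard described in this commit message can be sketched roughly as follows (hypothetical class and attribute names, not the actual ramalama code):

```python
class Model:
    """Base class; subclasses such as an Ollama or Huggingface model
    are expected to set directory and filename, but may not."""

    def get_model_path(self):
        # The attributes may not exist on the base class, so access them
        # defensively instead of assuming they are always present.
        directory = getattr(self, "directory", None)
        filename = getattr(self, "filename", None)
        if directory is None or filename is None:
            raise NotImplementedError(
                f"{type(self).__name__} does not define directory/filename"
            )
        return f"{directory}/{filename}"


class OllamaModel(Model):
    def __init__(self, name):
        self.directory = "ollama"
        self.filename = name
```

Using getattr with a default keeps a plain Model instance from raising AttributeError and turns the failure into an explicit, descriptive error instead.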
@rhatdan rhatdan merged commit 0d841ec into containers:main Feb 10, 2025
12 of 16 checks passed