
Add model inspect cli #776

Merged
merged 4 commits into from
Feb 10, 2025
Conversation

engelmi
Member

@engelmi engelmi commented Feb 10, 2025

Added show CLI command to display model info

AI models ship with a lot of (meta) information, such as the architecture used, the chat template they require, and so on. To make this information available to the user, a new CLI command, show, has been implemented with support for the --all and --json options.
At the moment the GGUF file format - which bundles the model and its (meta) information in a single file - is fully supported. Other formats, where the model and its information are stored in separate files, are not (yet) supported and only display basic information such as the model name, path and registry.
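Detecting whether a file is GGUF boils down to a magic-byte check; this is a minimal sketch of that idea, not the PR's actual implementation (the function name and return convention are illustrative):

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def is_gguf(path: str) -> bool:
    """Heuristic check: a GGUF file starts with the magic b'GGUF'
    followed by a uint32 format version (read here as little-endian)."""
    try:
        with open(path, "rb") as f:
            header = f.read(8)
    except OSError:
        return False
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return False
    (version,) = struct.unpack("<I", header[4:8])
    return version > 0
```
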

Signed-off-by: Michael Engel [email protected]

Example: smollm:135m (ollama)
$ ramalama show smollm:135m
smollm:135m
   Path: /home/mengel/.local/share/ramalama/models/ollama/smollm:135m
   Registry: ollama
   Format: GGUF
   Version: 3
   Endianness: little
   Metadata: 39 entries
   Tensors: 272 entries
Example: granite-7b-lab-Q4_K_M.gguf (huggingface)
$ ramalama show huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf
granite-7b-lab-Q4_K_M.gguf
   Path: /home/mengel/.local/share/ramalama/models/huggingface/instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf
   Registry: huggingface
   Format: GGUF
   Version: 3
   Endianness: little
   Metadata: 25 entries
   Tensors: 291 entries
Example: smollm:135m with --all
$ ramalama show smollm:135m --all
smollm:135m
   Path: /home/mengel/.local/share/ramalama/models/ollama/smollm:135m
   Registry: ollama
   Format: GGUF
   Version: 3
   Endianness: little
   Metadata: 
      general.architecture: llama
      general.base_model.0.name: SmolLM 135M
      general.base_model.0.organization: HuggingFaceTB
      general.base_model.0.repo_url: https://huggingface.co/HuggingFaceTB/SmolLM-135M
      general.base_model.count: 1
      general.basename: SmolLM
      general.file_type: 2
      general.finetune: Instruct
      general.languages: ['en']
      general.license: apache-2.0
      general.name: SmolLM 135M
      general.organization: HuggingFaceTB
      general.quantization_version: 2
      general.size_label: 135M
      general.tags: ['alignment-handbook', 'trl', 'sft']
      general.type: model
      llama.attention.head_count: 9
      llama.attention.head_count_kv: 3
      llama.attention.layer_norm_rms_epsilon: 9.999999747378752e-06
      llama.block_count: 30
      llama.context_length: 2048
      llama.embedding_length: 576
      llama.feed_forward_length: 1536
      llama.rope.dimension_count: 64
      llama.rope.freq_base: 10000.0
      llama.vocab_size: 49152
      tokenizer.ggml.add_bos_token: False
      tokenizer.ggml.add_space_prefix: False
      tokenizer.ggml.bos_token_id: 1
      tokenizer.ggml.eos_token_id: 2
      tokenizer.ggml.model: gpt2
      tokenizer.ggml.padding_token_id: 2
      tokenizer.ggml.pre: smollm
      tokenizer.ggml.unknown_token_id: 0
   Tensors: 
      0: ('token_embd.weight', 'GGML_TYPE_Q8_0', 2, 0)
      1: ('blk.0.attn_norm.weight', 'GGML_TYPE_F32', 1, 30081024)
      2: ('blk.0.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 30083328)
      3: ('blk.0.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 30580992)
      4: ('blk.0.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 31078656)
      5: ('blk.0.ffn_norm.weight', 'GGML_TYPE_F32', 1, 31576320)
      6: ('blk.0.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 31578624)
      7: ('blk.0.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 31640832)
      8: ('blk.0.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 31827456)
      9: ('blk.0.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 32014080)
      10: ('blk.1.attn_norm.weight', 'GGML_TYPE_F32', 1, 32076288)
      11: ('blk.1.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 32078592)
      12: ('blk.1.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 32576256)
      13: ('blk.1.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 33073920)
      14: ('blk.1.ffn_norm.weight', 'GGML_TYPE_F32', 1, 33571584)
      15: ('blk.1.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 33573888)
      16: ('blk.1.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 33636096)
      17: ('blk.1.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 33822720)
      18: ('blk.1.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 34009344)
      19: ('blk.10.attn_norm.weight', 'GGML_TYPE_F32', 1, 34071552)
      20: ('blk.10.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 34073856)
      21: ('blk.10.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 34571520)
      22: ('blk.10.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 35069184)
      23: ('blk.10.ffn_norm.weight', 'GGML_TYPE_F32', 1, 35566848)
      24: ('blk.10.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 35569152)
      25: ('blk.10.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 35631360)
      26: ('blk.10.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 35817984)
      27: ('blk.10.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 36004608)
      28: ('blk.11.attn_norm.weight', 'GGML_TYPE_F32', 1, 36066816)
      29: ('blk.11.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 36069120)
      30: ('blk.11.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 36566784)
      31: ('blk.11.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 37064448)
      32: ('blk.11.ffn_norm.weight', 'GGML_TYPE_F32', 1, 37562112)
      33: ('blk.11.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 37564416)
      34: ('blk.11.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 37626624)
      35: ('blk.11.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 37813248)
      36: ('blk.11.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 37999872)
      37: ('blk.12.attn_norm.weight', 'GGML_TYPE_F32', 1, 38062080)
      38: ('blk.12.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 38064384)
      39: ('blk.12.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 38562048)
      40: ('blk.12.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 39059712)
      41: ('blk.12.ffn_norm.weight', 'GGML_TYPE_F32', 1, 39557376)
      42: ('blk.12.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 39559680)
      43: ('blk.12.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 39621888)
      44: ('blk.12.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 39808512)
      45: ('blk.12.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 39995136)
      46: ('blk.13.attn_norm.weight', 'GGML_TYPE_F32', 1, 40057344)
      47: ('blk.13.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 40059648)
      48: ('blk.13.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 40557312)
      49: ('blk.13.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 41054976)
      50: ('blk.13.ffn_norm.weight', 'GGML_TYPE_F32', 1, 41552640)
      51: ('blk.13.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 41554944)
      52: ('blk.13.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 41617152)
      53: ('blk.13.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 41803776)
      54: ('blk.13.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 41990400)
      55: ('blk.14.attn_norm.weight', 'GGML_TYPE_F32', 1, 42052608)
      56: ('blk.14.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 42054912)
      57: ('blk.14.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 42552576)
      58: ('blk.14.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 43050240)
      59: ('blk.14.ffn_norm.weight', 'GGML_TYPE_F32', 1, 43547904)
      60: ('blk.14.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 43550208)
      61: ('blk.14.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 43612416)
      62: ('blk.14.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 43799040)
      63: ('blk.14.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 43985664)
      64: ('blk.15.attn_norm.weight', 'GGML_TYPE_F32', 1, 44047872)
      65: ('blk.15.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 44050176)
      66: ('blk.15.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 44547840)
      67: ('blk.15.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 45045504)
      68: ('blk.15.ffn_norm.weight', 'GGML_TYPE_F32', 1, 45543168)
      69: ('blk.15.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 45545472)
      70: ('blk.15.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 45607680)
      71: ('blk.15.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 45794304)
      72: ('blk.15.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 45980928)
      73: ('blk.16.attn_norm.weight', 'GGML_TYPE_F32', 1, 46043136)
      74: ('blk.16.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 46045440)
      75: ('blk.16.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 46543104)
      76: ('blk.16.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 47040768)
      77: ('blk.16.ffn_norm.weight', 'GGML_TYPE_F32', 1, 47538432)
      78: ('blk.16.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 47540736)
      79: ('blk.16.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 47602944)
      80: ('blk.16.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 47789568)
      81: ('blk.16.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 47976192)
      82: ('blk.17.attn_norm.weight', 'GGML_TYPE_F32', 1, 48038400)
      83: ('blk.17.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 48040704)
      84: ('blk.17.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 48538368)
      85: ('blk.17.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 49036032)
      86: ('blk.17.ffn_norm.weight', 'GGML_TYPE_F32', 1, 49533696)
      87: ('blk.17.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 49536000)
      88: ('blk.17.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 49598208)
      89: ('blk.17.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 49784832)
      90: ('blk.17.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 49971456)
      91: ('blk.18.attn_norm.weight', 'GGML_TYPE_F32', 1, 50033664)
      92: ('blk.18.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 50035968)
      93: ('blk.18.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 50533632)
      94: ('blk.18.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 51031296)
      95: ('blk.18.ffn_norm.weight', 'GGML_TYPE_F32', 1, 51528960)
      96: ('blk.18.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 51531264)
      97: ('blk.18.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 51593472)
      98: ('blk.18.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 51780096)
      99: ('blk.18.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 51966720)
      100: ('blk.19.attn_norm.weight', 'GGML_TYPE_F32', 1, 52028928)
      101: ('blk.19.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 52031232)
      102: ('blk.19.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 52528896)
      103: ('blk.19.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 53026560)
      104: ('blk.19.ffn_norm.weight', 'GGML_TYPE_F32', 1, 53524224)
      105: ('blk.19.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 53526528)
      106: ('blk.19.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 53588736)
      107: ('blk.19.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 53775360)
      108: ('blk.19.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 53961984)
      109: ('blk.2.attn_norm.weight', 'GGML_TYPE_F32', 1, 54024192)
      110: ('blk.2.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 54026496)
      111: ('blk.2.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 54524160)
      112: ('blk.2.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 55021824)
      113: ('blk.2.ffn_norm.weight', 'GGML_TYPE_F32', 1, 55519488)
      114: ('blk.2.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 55521792)
      115: ('blk.2.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 55584000)
      116: ('blk.2.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 55770624)
      117: ('blk.2.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 55957248)
      118: ('blk.20.attn_norm.weight', 'GGML_TYPE_F32', 1, 56019456)
      119: ('blk.20.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 56021760)
      120: ('blk.20.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 56519424)
      121: ('blk.20.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 57017088)
      122: ('blk.20.ffn_norm.weight', 'GGML_TYPE_F32', 1, 57514752)
      123: ('blk.20.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 57517056)
      124: ('blk.20.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 57579264)
      125: ('blk.20.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 57765888)
      126: ('blk.20.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 57952512)
      127: ('blk.21.attn_norm.weight', 'GGML_TYPE_F32', 1, 58014720)
      128: ('blk.21.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 58017024)
      129: ('blk.21.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 58514688)
      130: ('blk.21.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 59012352)
      131: ('blk.21.ffn_norm.weight', 'GGML_TYPE_F32', 1, 59510016)
      132: ('blk.21.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 59512320)
      133: ('blk.21.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 59574528)
      134: ('blk.21.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 59761152)
      135: ('blk.21.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 59947776)
      136: ('blk.22.attn_norm.weight', 'GGML_TYPE_F32', 1, 60009984)
      137: ('blk.22.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 60012288)
      138: ('blk.22.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 60509952)
      139: ('blk.22.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 61007616)
      140: ('blk.22.ffn_norm.weight', 'GGML_TYPE_F32', 1, 61505280)
      141: ('blk.22.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 61507584)
      142: ('blk.22.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 61569792)
      143: ('blk.22.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 61756416)
      144: ('blk.22.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 61943040)
      145: ('blk.23.attn_norm.weight', 'GGML_TYPE_F32', 1, 62005248)
      146: ('blk.23.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 62007552)
      147: ('blk.23.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 62505216)
      148: ('blk.23.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 63002880)
      149: ('blk.23.ffn_norm.weight', 'GGML_TYPE_F32', 1, 63500544)
      150: ('blk.23.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 63502848)
      151: ('blk.23.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 63565056)
      152: ('blk.23.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 63751680)
      153: ('blk.23.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 63938304)
      154: ('blk.24.attn_norm.weight', 'GGML_TYPE_F32', 1, 64000512)
      155: ('blk.24.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 64002816)
      156: ('blk.24.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 64500480)
      157: ('blk.24.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 64998144)
      158: ('blk.24.ffn_norm.weight', 'GGML_TYPE_F32', 1, 65495808)
      159: ('blk.24.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 65498112)
      160: ('blk.24.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 65560320)
      161: ('blk.24.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 65746944)
      162: ('blk.24.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 65933568)
      163: ('blk.25.attn_norm.weight', 'GGML_TYPE_F32', 1, 65995776)
      164: ('blk.25.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 65998080)
      165: ('blk.25.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 66495744)
      166: ('blk.25.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 66993408)
      167: ('blk.25.ffn_norm.weight', 'GGML_TYPE_F32', 1, 67491072)
      168: ('blk.25.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 67493376)
      169: ('blk.25.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 67555584)
      170: ('blk.25.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 67742208)
      171: ('blk.25.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 67928832)
      172: ('blk.26.attn_norm.weight', 'GGML_TYPE_F32', 1, 67991040)
      173: ('blk.26.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 67993344)
      174: ('blk.26.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 68491008)
      175: ('blk.26.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 68988672)
      176: ('blk.26.ffn_norm.weight', 'GGML_TYPE_F32', 1, 69486336)
      177: ('blk.26.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 69488640)
      178: ('blk.26.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 69550848)
      179: ('blk.26.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 69737472)
      180: ('blk.26.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 69924096)
      181: ('blk.27.attn_norm.weight', 'GGML_TYPE_F32', 1, 69986304)
      182: ('blk.27.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 69988608)
      183: ('blk.27.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 70486272)
      184: ('blk.27.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 70983936)
      185: ('blk.27.ffn_norm.weight', 'GGML_TYPE_F32', 1, 71481600)
      186: ('blk.27.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 71483904)
      187: ('blk.27.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 71546112)
      188: ('blk.27.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 71732736)
      189: ('blk.27.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 71919360)
      190: ('blk.28.attn_norm.weight', 'GGML_TYPE_F32', 1, 71981568)
      191: ('blk.28.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 71983872)
      192: ('blk.28.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 72481536)
      193: ('blk.28.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 72979200)
      194: ('blk.28.ffn_norm.weight', 'GGML_TYPE_F32', 1, 73476864)
      195: ('blk.28.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 73479168)
      196: ('blk.28.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 73541376)
      197: ('blk.28.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 73728000)
      198: ('blk.28.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 73914624)
      199: ('blk.29.attn_norm.weight', 'GGML_TYPE_F32', 1, 73976832)
      200: ('blk.29.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 73979136)
      201: ('blk.29.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 74476800)
      202: ('blk.29.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 74974464)
      203: ('blk.29.ffn_norm.weight', 'GGML_TYPE_F32', 1, 75472128)
      204: ('blk.29.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 75474432)
      205: ('blk.29.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 75536640)
      206: ('blk.29.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 75723264)
      207: ('blk.29.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 75909888)
      208: ('blk.3.attn_norm.weight', 'GGML_TYPE_F32', 1, 75972096)
      209: ('blk.3.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 75974400)
      210: ('blk.3.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 76472064)
      211: ('blk.3.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 76969728)
      212: ('blk.3.ffn_norm.weight', 'GGML_TYPE_F32', 1, 77467392)
      213: ('blk.3.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 77469696)
      214: ('blk.3.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 77531904)
      215: ('blk.3.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 77718528)
      216: ('blk.3.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 77905152)
      217: ('blk.4.attn_norm.weight', 'GGML_TYPE_F32', 1, 77967360)
      218: ('blk.4.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 77969664)
      219: ('blk.4.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 78467328)
      220: ('blk.4.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 78964992)
      221: ('blk.4.ffn_norm.weight', 'GGML_TYPE_F32', 1, 79462656)
      222: ('blk.4.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 79464960)
      223: ('blk.4.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 79527168)
      224: ('blk.4.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 79713792)
      225: ('blk.4.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 79900416)
      226: ('blk.5.attn_norm.weight', 'GGML_TYPE_F32', 1, 79962624)
      227: ('blk.5.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 79964928)
      228: ('blk.5.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 80462592)
      229: ('blk.5.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 80960256)
      230: ('blk.5.ffn_norm.weight', 'GGML_TYPE_F32', 1, 81457920)
      231: ('blk.5.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 81460224)
      232: ('blk.5.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 81522432)
      233: ('blk.5.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 81709056)
      234: ('blk.5.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 81895680)
      235: ('blk.6.attn_norm.weight', 'GGML_TYPE_F32', 1, 81957888)
      236: ('blk.6.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 81960192)
      237: ('blk.6.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 82457856)
      238: ('blk.6.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 82955520)
      239: ('blk.6.ffn_norm.weight', 'GGML_TYPE_F32', 1, 83453184)
      240: ('blk.6.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 83455488)
      241: ('blk.6.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 83517696)
      242: ('blk.6.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 83704320)
      243: ('blk.6.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 83890944)
      244: ('blk.7.attn_norm.weight', 'GGML_TYPE_F32', 1, 83953152)
      245: ('blk.7.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 83955456)
      246: ('blk.7.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 84453120)
      247: ('blk.7.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 84950784)
      248: ('blk.7.ffn_norm.weight', 'GGML_TYPE_F32', 1, 85448448)
      249: ('blk.7.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 85450752)
      250: ('blk.7.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 85512960)
      251: ('blk.7.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 85699584)
      252: ('blk.7.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 85886208)
      253: ('blk.8.attn_norm.weight', 'GGML_TYPE_F32', 1, 85948416)
      254: ('blk.8.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 85950720)
      255: ('blk.8.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 86448384)
      256: ('blk.8.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 86946048)
      257: ('blk.8.ffn_norm.weight', 'GGML_TYPE_F32', 1, 87443712)
      258: ('blk.8.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 87446016)
      259: ('blk.8.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 87508224)
      260: ('blk.8.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 87694848)
      261: ('blk.8.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 87881472)
      262: ('blk.9.attn_norm.weight', 'GGML_TYPE_F32', 1, 87943680)
      263: ('blk.9.ffn_down.weight', 'GGML_TYPE_Q4_0', 2, 87945984)
      264: ('blk.9.ffn_gate.weight', 'GGML_TYPE_Q4_0', 2, 88443648)
      265: ('blk.9.ffn_up.weight', 'GGML_TYPE_Q4_0', 2, 88941312)
      266: ('blk.9.ffn_norm.weight', 'GGML_TYPE_F32', 1, 89438976)
      267: ('blk.9.attn_k.weight', 'GGML_TYPE_Q4_0', 2, 89441280)
      268: ('blk.9.attn_output.weight', 'GGML_TYPE_Q4_0', 2, 89503488)
      269: ('blk.9.attn_q.weight', 'GGML_TYPE_Q4_0', 2, 89690112)
      270: ('blk.9.attn_v.weight', 'GGML_TYPE_Q4_0', 2, 89876736)
      271: ('output_norm.weight', 'GGML_TYPE_F32', 1, 89938944)

Fix: Set directory and filename in Model base class

The directory and filename of a model are determined by the respective model implementation, e.g. Ollama or Huggingface. If, however, these two fields are not defined in the Model base class, then accessing them on a specific model instance might fail because the attributes do not exist.

Signed-off-by: Michael Engel [email protected]

Summary by Sourcery

New Features:

  • Added "show" CLI command to display model information, including metadata and tensors, with optional --all and --json flags.

Contributor

sourcery-ai bot commented Feb 10, 2025

Reviewer's Guide by Sourcery

This pull request introduces a new 'show' CLI command that displays model information, with the --all option for full details and the --json option for machine-readable output. The changes include GGUF file format support, refactoring of the model base class to properly set and retrieve directory and filename information, and cleanup of duplicate implementations in model subclasses. Additionally, new modules for parsing GGUF data and serializing model metadata have been added.

Sequence diagram for the 'show' CLI command

sequenceDiagram
    actor User as User
    participant CLI as Show CLI
    participant Model as New(Model)
    participant Parser as GGUFInfoParser
    participant BaseInfo as ModelInfoBase

    User->>CLI: Execute 'ramalama show MODEL [--json] [--all]'
    CLI->>Model: Instantiate New(args.MODEL, args)
    CLI->>Parser: is_model_gguf(model, args)?
    alt GGUF model supported
      Parser-->>CLI: true
      CLI->>Parser: parse(model, args)
      Parser-->>CLI: gguf_info
      CLI->>CLI: gguf_info.serialize(json, all)
      CLI-->>User: Display full model info output
    else Not a GGUF model
      Parser-->>CLI: false
      CLI->>Model: get filename, get_model_path(args), get_model_registry(args)
      CLI->>BaseInfo: Create ModelInfoBase instance
      BaseInfo-->>CLI: Instance created
      CLI->>BaseInfo: serialize(json)
      CLI-->>User: Display base model info output
    end
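The dispatch shown in the sequence diagram can be sketched in simplified form. The class and field names follow the diagram, but the bodies below are illustrative stand-ins, not ramalama's actual code:

```python
import json as jsonlib
from dataclasses import dataclass, asdict


@dataclass
class ModelInfoBase:
    # Fields as named in the class diagram; a stand-in, not the real class.
    Name: str
    Registry: str
    Path: str

    def serialize(self, json: bool = False) -> str:
        if json:
            return jsonlib.dumps(asdict(self))
        return "\n".join(f"{k}: {v}" for k, v in asdict(self).items())


def show_cli(model_path, registry, name, is_gguf, parse_gguf,
             use_json=False, show_all=False):
    """Mirror of the dispatch in the diagram: full GGUF info when the
    file is GGUF, otherwise only the basic ModelInfoBase fields."""
    if is_gguf(model_path):
        return parse_gguf(model_path).serialize(json=use_json, all=show_all)
    return ModelInfoBase(name, registry, model_path).serialize(json=use_json)
```
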

Class diagram for Model, GGUFInfoParser, and ModelInfo classes

classDiagram
    class Model {
        - model: string
        - directory: string
        - filename: string
        + __init__(model)
        + get_model_path(args)
        + get_model_registry(args)
        + login(args)
        + build_exec_args_bench(args, model_path)
    }

    class ModelInfoBase {
        + Name: string
        + Registry: string
        + Path: string
        + serialize(json: bool): string
        + to_json(): string
    }

    class GGUFModelInfo {
        + Format: string
        + Version: number
        + Metadata: Map
        + Tensors: list
        + LittleEndian: bool
        + serialize(json: bool, all: bool): string
        + to_json(all: bool): string
        <<constructor>>
    }

    class GGUFInfoParser {
        <<static>> + is_model_gguf(model, cli_args): bool
        <<static>> + parse(model, cli_args): GGUFModelInfo
    }

    ModelInfoBase <|-- GGUFModelInfo

File-Level Changes

Change Details Files
Implemented new CLI command 'show' for displaying AI model information.
  • Added a new 'show_parser' function to register the command and its arguments in the CLI.
  • Created the 'show_cli' function to handle the command logic, checking for GGUF format and printing model info in either plain text or JSON format.
ramalama/cli.py
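The argument registration described above could look roughly like this; the flag names match the PR description, while the parser wiring is an assumption:

```python
import argparse


def show_parser(subparsers):
    """Sketch of registering the 'show' subcommand."""
    parser = subparsers.add_parser("show", help="display model information")
    parser.add_argument("MODEL")
    parser.add_argument("--all", dest="all", action="store_true",
                        help="include full metadata and tensor listings")
    parser.add_argument("--json", dest="json", action="store_true",
                        help="emit machine-readable JSON instead of text")
    return parser


# usage example
root = argparse.ArgumentParser(prog="ramalama")
sub = root.add_subparsers(dest="command")
show_parser(sub)
args = root.parse_args(["show", "smollm:135m", "--all"])
```
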
Refactored the model base class to set directory and filename properly.
  • Updated the init method in the base Model class to split the model string into directory and filename.
  • Added a get_model_registry method to determine and return the model registry from the model path.
ramalama/model.py
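The base-class change can be sketched as deriving directory and filename once in __init__; the splitting rule shown here is an assumption based on the description, not the actual patch:

```python
class Model:
    """Sketch: directory and filename are derived from the model string
    once in the base class, instead of in each subclass."""

    def __init__(self, model: str):
        self.model = model
        # Everything before the last '/' is the directory, the rest the file.
        split = model.rsplit("/", 1)
        self.directory = split[0] if len(split) > 1 else ""
        self.filename = split[1] if len(split) > 1 else split[0]
```
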
Removed duplicate logic for setting directory and filename from model subclass implementations.
  • Removed the duplicated code in the Huggingface implementation that previously split the model string.
  • Removed initialization of filename in the URL-based model implementation to centralize behavior in the base class.
ramalama/huggingface.py
ramalama/url.py
Added GGUF parser to support models with embedded metadata and tensors.
  • Introduced a new module that defines GGML type enums and provides parsing logic for GGUF formatted model files.
  • Implemented functions to read strings, numbers, booleans, and values of various types according to the GGUF specification.
ramalama/gguf_parser.py
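Per the GGUF specification, strings are stored as a uint64 byte length followed by UTF-8 data, so a reader for them might look like this (a minimal sketch without error handling; the parameter names are illustrative):

```python
import io
import struct


def read_string(reader: io.BufferedIOBase, little_endian: bool = True) -> str:
    """Read one GGUF string: a uint64 byte length, then that many
    UTF-8 bytes. Endianness matches the file's declared byte order."""
    fmt = "<Q" if little_endian else ">Q"
    (length,) = struct.unpack(fmt, reader.read(8))
    return reader.read(length).decode("utf-8")
```
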
Defined model info serialization for displaying model metadata.
  • Added a new module that defines ModelInfoBase and GGUFModelInfo along with a Tensor dataclass.
  • Implemented methods to serialize the model information into human-readable plain text and JSON formats, and added support for detailed display using the '--all' flag.
ramalama/model_info.py
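The --all behaviour described above could be sketched as follows; function and parameter names are illustrative, not the module's actual API:

```python
import json as jsonlib


def serialize_model_info(base: dict, metadata: dict, tensors: list,
                         json: bool = False, all: bool = False) -> str:
    """Without --all only entry counts are printed; with --all every
    metadata key and tensor is listed."""
    if json:
        body = dict(base)
        body["Metadata"] = metadata if all else len(metadata)
        body["Tensors"] = tensors if all else len(tensors)
        return jsonlib.dumps(body)
    lines = [f"{k}: {v}" for k, v in base.items()]
    if all:
        lines.append("Metadata:")
        lines += [f"   {k}: {v}" for k, v in sorted(metadata.items())]
        lines.append("Tensors:")
        lines += [f"   {i}: {t}" for i, t in enumerate(tensors)]
    else:
        lines.append(f"Metadata: {len(metadata)} entries")
        lines.append(f"Tensors: {len(tensors)} entries")
    return "\n".join(lines)
```
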


Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey @engelmi - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider adding a try-except block around the GGUF parsing logic to handle potential file errors more gracefully.
  • The ModelInfoBase.serialize method could use a more robust string formatting approach.
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good


ramalama/gguf_parser.py (review comment, resolved)
@engelmi
Member Author

engelmi commented Feb 10, 2025

In a follow-up PR we can introduce downloading the metadata stored in separate files for non-GGUF-encoded models, such as the tokenizer_config.json used by granite-8b-code-instruct-4k on huggingface, for example.
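As a sketch of that follow-up idea: Hugging Face serves raw repository files under the /resolve/&lt;revision&gt;/&lt;path&gt; route, so fetching such a metadata file could start from a URL helper like this (the helper name is hypothetical, not part of this PR):

```python
def metadata_url(repo_id: str, filename: str = "tokenizer_config.json") -> str:
    """Build the raw-file URL for a metadata file in a Hugging Face repo.
    'main' is the default revision; callers could make it a parameter."""
    return f"https://huggingface.co/{repo_id}/resolve/main/{filename}"
```
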

@ericcurtin
Collaborator

ericcurtin commented Feb 10, 2025

New files must be added to install.sh, looks like our CI didn't catch that here

@ericcurtin
Collaborator

ericcurtin commented Feb 10, 2025

Looks like a good idea to me! CI needs to be fixed up

@engelmi
Member Author

engelmi commented Feb 10, 2025

New files must be added to install.sh, looks like our CI didn't catch that here

Ah, forgot about that... let me check if I can fix the CI and the CI check as well.

@ericcurtin
Collaborator

ericcurtin commented Feb 10, 2025

We did run this as part of CI in the past; it would have picked up many of these things:

https://github.com/containers/ramalama/blob/main/test/ci.sh

it's an option to re-add that. Our CI got reworked a few times and it likely slipped through the cracks. The files check was just this grep:

check_packaging() {
  cd ramalama
  # every Python module must be referenced in install.sh
  for i in *.py; do
    grep -q "$i" ../install.sh
  done

  cd -
}

which seems kinda basic, but meh, it worked. Slightly better would be an install and smoke test, just execute something very simple on a properly installed system.

@ericcurtin
Collaborator

ericcurtin commented Feb 10, 2025

I'm assuming we considered the:

llama-batched                  llama-embedding                llama-gguf-hash                llama-llava-clip-quantize-cli  llama-minicpmv-cli             llama-qwen2vl-cli              llama-simple-chat
llama-batched-bench            llama-eval-callback            llama-gguf-split               llama-lookahead                llama-parallel                 llama-retrieval                llama-speculative
llama-bench                    llama-export-lora              llama-gritlm                   llama-lookup                   llama-passkey                  llama-run                      llama-speculative-simple
llama-cli                      llama-gbnf-validator           llama-imatrix                  llama-lookup-create            llama-perplexity               llama-save-load-state          llama-tokenize
llama-convert-llama2c-to-ggml  llama-gen-docs                 llama-infill                   llama-lookup-merge             llama-quantize                 llama-server                   llama-tts
llama-cvector-generator        llama-gguf                     llama-llava-cli                llama-lookup-stats             llama-quantize-stats           llama-simple

binaries; some of them might already implement this... I'm ok with a from-scratch python3 solution though 😄

@engelmi
Member Author

engelmi commented Feb 10, 2025

I'm assuming we considered the:

llama-batched                  llama-embedding                llama-gguf-hash                llama-llava-clip-quantize-cli  llama-minicpmv-cli             llama-qwen2vl-cli              llama-simple-chat
llama-batched-bench            llama-eval-callback            llama-gguf-split               llama-lookahead                llama-parallel                 llama-retrieval                llama-speculative
llama-bench                    llama-export-lora              llama-gritlm                   llama-lookup                   llama-passkey                  llama-run                      llama-speculative-simple
llama-cli                      llama-gbnf-validator           llama-imatrix                  llama-lookup-create            llama-perplexity               llama-save-load-state          llama-tokenize
llama-convert-llama2c-to-ggml  llama-gen-docs                 llama-infill                   llama-lookup-merge             llama-quantize                 llama-server                   llama-tts
llama-cvector-generator        llama-gguf                     llama-llava-cli                llama-lookup-stats             llama-quantize-stats           llama-simple

binaries; some of them might already implement this... I'm ok with a from-scratch python3 solution though 😄

Haven't thought about this, to be honest.
Quickly checking, it seems that llama-gguf would also provide the GGUF information. However, considering that we probably want to combine this with non-GGUF models, provide different output formats (e.g. --json) and maybe use this further inside ramalama, I think the Python implementation suits better. But up to you :) @ericcurtin
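For reference, the Version, Metadata and Tensors fields shown in the PR description come straight from the fixed-size GGUF header. A minimal sketch of reading it (based on the public GGUF spec, assuming a little-endian file; this is not the actual ramalama implementation) could look like:

```python
import struct

def read_gguf_header(path):
    """Parse the fixed-size GGUF header that precedes the metadata section.

    Layout per the public GGUF spec (little-endian assumed here):
    4-byte magic "GGUF", uint32 version, uint64 tensor count,
    uint64 metadata key/value count.
    """
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: {magic!r}")
        version, = struct.unpack("<I", f.read(4))
        tensor_count, metadata_kv_count = struct.unpack("<QQ", f.read(16))
    return {"version": version, "tensors": tensor_count, "metadata": metadata_kv_count}
```

A full implementation would additionally decode each metadata key/value pair and detect big-endian files, but this is enough to produce the summary lines shown above.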

@ericcurtin
Collaborator

SGTM

@engelmi engelmi force-pushed the add-model-info-cli branch 3 times, most recently from 98aead8 to a82bcd6 Compare February 10, 2025 13:32
@rhatdan
Member

rhatdan commented Feb 10, 2025

@engelmi Great job, I really like this feature.

BUT:

These need tests.
I really prefer inspect over show.

We potentially will need to add something like a --format option to allow users to apply Go templates to the output, like podman inspect does.

@rhatdan rhatdan changed the title Add model info cli Add model inspect cli Feb 10, 2025
@rhatdan
Member

rhatdan commented Feb 10, 2025

On next pass please fix the commit message show->inspect.

@engelmi
Member Author

engelmi commented Feb 10, 2025

@engelmi Great job, I really like this feature.

BUT:

These need tests. I really prefer inspect over show.

We potentially will need to add something like a --format option to allow users to apply Go templates to the output, like podman inspect does.

Thanks!
Yes, changed show -> inspect.
If I see it correctly, there are only system tests and no unit tests (yet). Should I start adding unit tests or are some system tests fine? @rhatdan

@rhatdan
Member

rhatdan commented Feb 10, 2025

I would love to start adding unit tests, but for now system tests are fine.

I just want you to add a new test/system/100-inspect.bats or something and add a few tests on the tiny models.

@rhatdan
Member

rhatdan commented Feb 10, 2025

Rename ramalama/model_info.py to ramalama/model_inspect.py

@engelmi engelmi force-pushed the add-model-info-cli branch 3 times, most recently from 66627ee to 64843f4 Compare February 10, 2025 16:35
AI models are shipped with a lot of (meta) information, such as
the architecture used, the chat template they require and so on.
In order to make this available to the user, the new CLI command
inspect with support for the --all and --json options has been
implemented.
At the moment the GGUF file format - which bundles the model as
well as the (meta) information in one file - is fully supported.
Other formats, where the model and its information are stored in
separate files, are not (yet) supported and only display basic
information such as the model name, path and registry.

Signed-off-by: Michael Engel <[email protected]>
The directory and filename of a model are determined by the
respective model implementation, e.g. Ollama or Huggingface.
If, however, these two fields are not defined in the model
base class, accessing them on a specific model instance
might fail since they do not exist.

Signed-off-by: Michael Engel <[email protected]>
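The guard described in this commit message can be sketched roughly as follows (hypothetical class and attribute names, not the actual ramalama code):

```python
class Model:
    """Base class; subclasses such as an Ollama or Huggingface model
    are expected to set directory and filename, but may not."""

    def get_model_path(self):
        # The attributes may not exist on the base class, so access them
        # defensively instead of assuming they are always present.
        directory = getattr(self, "directory", None)
        filename = getattr(self, "filename", None)
        if directory is None or filename is None:
            raise NotImplementedError(
                f"{type(self).__name__} does not define directory/filename"
            )
        return f"{directory}/{filename}"


class OllamaModel(Model):
    def __init__(self, name):
        self.directory = "ollama"
        self.filename = name
```

Using getattr with a default keeps a plain Model instance from raising AttributeError and turns the failure into an explicit, descriptive error instead.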
@rhatdan rhatdan merged commit 0d841ec into containers:main Feb 10, 2025
12 of 16 checks passed