needle finetune <data.jsonl>. Do you have an example for this? I'm curious about the expected format.
Is the format below correct? I can run finetuning with 12 epochs, but GPU utilization is very low. The finetuning results looked good, yet afterwards the model still doesn't understand the query. By the way, I did copy the resulting checkpoint to needle.pkl.
{"query":"List my Pulumi stacks","tools":"[{\"name\":\"pulumi_ls\",\"description\":\"List Pulumi stacks.\",\"parameters\":{}}]","answers":"[{\"name\":\"pulumi_ls\",\"arguments\":{}}]"}
{"query":"Show Pulumi stack list","tools":"[{\"name\":\"pulumi_ls\",\"description\":\"List Pulumi stacks.\",\"parameters\":{}}]","answers":"[{\"name\":\"pulumi_ls\",\"arguments\":{}}]"}
─────────────────────────────────────
Epoch 18/20
─────────────────────────────────────
Text loss nan
Text val ppl 1.00
Quant val ppl 1.00 (INT4)
─── Single-Call (2 samples) ──
JSON parse 100.0%
Name F1 100.0%
Param haluc 0.0%
Param miss 0.0%
Value acc 0.0%
Args acc 100.0%
Call F1 100.0%
Exact match 100.0%
#tools  n  name_f1  nTP  nFP  nFN  call_f1  cTP  cFP  cFN  exact   parse
1       2  100.0%   2    0    0    100.0%   2    0    0    100.0%  100.0%
─── Retrieval (2 queries) ─────
Recall@1 100.0%
Recall@2 100.0%
Recall@3 100.0%
Recall@4 100.0%
Recall@5 100.0%
MRR 1.000
─────────────────────────────────────
Throughput 56.4 tok/s (4 samples, 0.9s, full INT4)
─── Samples (2) ───────────────────
[1] Query: Can you list the Pulumi stacks?
Tools: [{"name":"pulumi_ls","description":"List Pulumi stacks.","parameters":{}}]
Ref: [{"name":"pulumi_ls","arguments":{}}]
Text: [{"name":"pulumi_ls","arguments":{}}]
[2] Query: Get Pulumi stack list
Tools: [{"name":"pulumi_ls","description":"List Pulumi stacks.","parameters":{}}]
Ref: [{"name":"pulumi_ls","arguments":{}}]
Text: [{"name":"pulumi_ls","arguments":{}}]
─────────────────────────────────────
Checkpoint: checkpoints/needle_finetuned_20260513191815_414198_12_512_0.pkl