When training on my experimental fitness dataset (~100 variants, binary labels), the Spearman correlation coefficient is sometimes reported as nan.
In the meta-training stage, evaluation always reports nan for spearmanr. The training loss declines as expected, but the evaluation metrics barely change during meta-training.
Oddly, when running CV in meta-transfer, the evaluation becomes normal without the nan issue (see my meta-transfer logs).
How can I tell whether the model is trained well, or do I need to use other metrics?
Here is my shell script for meta-training and meta-transfer:
protein=AHCY_Human
echo "Meta-train PLMs on the auxiliary tasks for ${protein}"
python main.py -md esm2 -m meta -ts 40 -tb 1 -r 16 -ls 5 -mi 5 -mtb 16 -meb 64 -alr 5e-3 -as 5 -p ${protein}
echo "Transfer the meta-trained model to the target task"
python main.py -md esm2 -m meta-transfer -ts 40 -tb 16 -r 16 -ls 5 -mi 5 -mtb 16 -meb 64 -alr 5e-3 -as 5 -p ${protein}
echo "This may take several minutes, and the trained model will be saved to checkpoints/meta-transfer"
python main.py -md esm2 -m meta-transfer -ts 40 -tb 16 -r 16 -ls 5 -mi 5 -mtb 16 -meb 64 -alr 5e-3 -as 5 -p ${protein} -t
Here are my training logs:
Training epoch 13: 100%|| 6/6 [00:35<00:00, 6.00s/it, loss=4.64]
train_loss: 4.667
lr: 1.0e-04
Evaluating: 40%| | 2/5 [00:06<00:09, 3.26s/it, ndcg=0.431]
/mnt/data2024/zhqin/01.protein_design/Pro-FSFP/fsfp/trainer.py:182: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
logs[metric] = spearmanr(predicts, targets).statistic
Evaluating: 80%|| 4/5 [00:12<00:03, 3.23s/it, ndcg=0.92]
/mnt/data2024/zhqin/01.protein_design/Pro-FSFP/fsfp/trainer.py:182: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
logs[metric] = spearmanr(predicts, targets).statistic
Evaluating: 100%|| 5/5 [00:16<00:00, 3.25s/it, ndcg=0]
spearmanr: nan
ndcg: 0.396
topk_pr: 0.040
Training epoch 14: 100%|| 6/6 [00:36<00:00, 6.06s/it, loss=4.62]
train_loss: 4.560
lr: 1.0e-04
train_loss: 4.327
lr: 1.0e-04
Evaluating: 40%|| 2/5 [00:02<00:03, 1.10s/it, ndcg=0.431]
/mnt/data2024/zhqin/01.protein_design/Pro-FSFP/fsfp/trainer.py:182: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
logs[metric] = spearmanr(predicts, targets).statistic
Evaluating: 80%|| 4/5 [00:04<00:01, 1.09s/it, ndcg=0.877]
/mnt/data2024/zhqin/01.protein_design/Pro-FSFP/fsfp/trainer.py:182: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
logs[metric] = spearmanr(predicts, targets).statistic
Evaluating: 100%|| 5/5 [00:05<00:00, 1.10s/it, ndcg=0]
spearmanr: nan
ndcg: 0.388
topk_pr: 0.040
Early stopped at epoch 16
Best validating ndcg reached at epoch 1: 0.510
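The ConstantInputWarning in the logs above means that one of the two arrays passed to spearmanr held a single repeated value, so its rank variance is zero and the coefficient is undefined. A minimal sketch of that mechanism in plain Python, assuming Spearman is computed as the Pearson correlation of average ranks (the textbook definition, not code taken from fsfp/trainer.py); `average_ranks` and `spearman` are hypothetical helper names:

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(predicts, targets):
    """Pearson correlation of the ranks; nan when either input is constant."""
    rx, ry = average_ranks(predicts), average_ranks(targets)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # zero rank variance: the coefficient is undefined
    return cov / math.sqrt(var_x * var_y)


# Made-up scores; binary targets with both classes present give a defined value.
mixed = spearman([0.1, 0.7, 0.3, 0.9], [0, 1, 0, 1])
# Same scores, but every target is 0: the result is nan.
constant = spearman([0.1, 0.7, 0.3, 0.9], [0, 0, 0, 0])
print(mixed, constant)
```

With binary labels, any evaluation batch or task whose targets are all 0 or all 1 (or whose predictions are all identical) would hit this case. Whether that is what happens in the meta-training evaluation here is a guess, not something confirmed from the code.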
Here are my meta-transfer training logs:
======================Cross validation: Split 1======================
spearmanr: 0.338
ndcg: 0.631
topk_pr: 0.050
Early stopped at epoch 16
Best validating spearmanr reached at epoch 1: 0.378
======================Cross validation: Split 2======================
spearmanr: 0.462
ndcg: 0.877
topk_pr: 0.100
Early stopped at epoch 16
Best validating spearmanr reached at epoch 1: 0.462
======================Cross validation: Split 3======================
spearmanr: 0.298
ndcg: 0.500
topk_pr: 0.050
Early stopped at epoch 16
Best validating spearmanr reached at epoch 1: 0.378
======================Cross validation: Split 4======================
spearmanr: nan
ndcg: 0.000
topk_pr: 0.000
Early stopped at epoch 15
Best validating spearmanr reached at epoch 0: -inf
======================Cross validation: Split 5======================
spearmanr: 0.338
ndcg: 0.631
topk_pr: 0.050
CV-estimated best validating spearmanr reached at epoch 1: nan
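The nan in the CV summary is consistent with a single nan split poisoning an average. A small illustration, assuming the CV estimate is a plain arithmetic mean over splits (an assumption about the trainer, not confirmed from its code) and using the final per-split spearmanr values from the logs above purely as example inputs; `nan_aware_mean` is a hypothetical helper:

```python
import math

# Final spearmanr per split as printed in the logs above (split 4 is nan).
split_scores = [0.338, 0.462, 0.298, float("nan"), 0.338]

# Any arithmetic involving nan yields nan, so one bad split hides the rest.
plain_mean = sum(split_scores) / len(split_scores)


def nan_aware_mean(scores):
    """Average only the defined scores; nan if none are defined."""
    valid = [s for s in scores if not math.isnan(s)]
    return sum(valid) / len(valid) if valid else float("nan")


robust_mean = nan_aware_mean(split_scores)
print(plain_mean, robust_mean)
```

Skipping nan splits changes what the summary measures (four splits instead of five), so it is a way to inspect the remaining splits rather than a replacement for fixing the constant-input case.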
And here are the final testing results:
======================Breakdown results======================
              size  spearmanr  ndcg   topk_pr
single_local  3     0.866      1.000  0.333
single_cross  84    0.173      0.487  0.100
single_rest   87    0.222      0.527  0.100
all_rest      87    0.222      0.527  0.100