nan spearmanr in training log #8

@Gabriel-QIN

When training on my experimental fitness dataset (~100 variants, binary labels), the Spearman correlation coefficient is sometimes reported as nan.
In the meta-training stage, evaluation always reports a nan spearmanr. The training loss declines as expected, but the evaluation metrics barely change during meta-training.
Oddly, during cross-validation in meta-transfer, evaluation is mostly normal, with nan appearing in only one split (see my meta-transfer logs).

How can I tell whether the model is trained well? Should I use other metrics?

Here is my shell script for meta-training and meta-transfer:

protein=AHCY_Human
echo "Meta-train PLMs on the auxiliary tasks for ${protein}"
python main.py -md esm2 -m meta -ts 40 -tb 1 -r 16 -ls 5 -mi 5 -mtb 16 -meb 64 -alr 5e-3 -as 5 -p ${protein}
echo "Transfer the meta-trained model to the target task"
python main.py -md esm2 -m meta-transfer -ts 40 -tb 16 -r 16 -ls 5 -mi 5 -mtb 16 -meb 64 -alr 5e-3 -as 5 -p ${protein}
echo "This may take several minutes, and the trained model will be saved to checkpoints/meta-transfer"
python main.py -md esm2 -m meta-transfer -ts 40 -tb 16 -r 16 -ls 5 -mi 5 -mtb 16 -meb 64 -alr 5e-3 -as 5 -p ${protein} -t

Here are my training logs:

Training epoch 13: 100%|| 6/6 [00:35<00:00,  6.00s/it, loss=4.64]
train_loss: 4.667
lr: 1.0e-04
Evaluating:  40%| | 2/5 [00:06<00:09,  3.26s/it, ndcg=0.431]
/mnt/data2024/zhqin/01.protein_design/Pro-FSFP/fsfp/trainer.py:182: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  logs[metric] = spearmanr(predicts, targets).statistic
Evaluating:  80%|██████████████████████████████████████████████████████████████████████████████████████████████████████▍                         | 4/5 [00:12<00:03,  3.23s/it, ndcg=0.92]/mnt/data2024/zhqin/01.protein_design/Pro-FSFP/fsfp/trainer.py:182: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  logs[metric] = spearmanr(predicts, targets).statistic
Evaluating: 100%|| 5/5 [00:16<00:00,  3.25s/it, ndcg=0]
spearmanr: nan
ndcg: 0.396
topk_pr: 0.040
Training epoch 14: 100%|| 6/6 [00:36<00:00,  6.06s/it, loss=4.62]
train_loss: 4.560
lr: 1.0e-04

train_loss: 4.327
lr: 1.0e-04
Evaluating:  40%|| 2/5 [00:02<00:03,  1.10s/it, ndcg=0.431]/mnt/data2024/zhqin/01.protein_design/Pro-FSFP/fsfp/trainer.py:182: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  logs[metric] = spearmanr(predicts, targets).statistic
Evaluating:  80%|| 4/5 [00:04<00:01,  1.09s/it, ndcg=0.877]/mnt/data2024/zhqin/01.protein_design/Pro-FSFP/fsfp/trainer.py:182: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  logs[metric] = spearmanr(predicts, targets).statistic
Evaluating: 100%|| 5/5 [00:05<00:00,  1.10s/it, ndcg=0]
spearmanr: nan
ndcg: 0.388
topk_pr: 0.040
Early stopped at epoch 16
Best validating ndcg reached at epoch 1: 0.510
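
For context on the ConstantInputWarning above: Spearman correlation is the Pearson correlation of the ranks, so it is undefined (0/0) whenever either the predictions or the targets in an evaluated batch are all identical. With binary labels and small evaluation batches, a batch containing a single class would be enough to trigger this. The sketch below is a stdlib-only illustration of that behaviour, not SciPy's implementation and not the code in `fsfp/trainer.py`; the function names and the toy data are hypothetical.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(predicts, targets):
    """Spearman correlation as Pearson correlation of ranks; nan if undefined."""
    n = len(predicts)
    if n < 2:
        return math.nan
    rx, ry = average_ranks(predicts), average_ranks(targets)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        # A constant input has zero rank variance, so the ratio is 0/0.
        return math.nan
    return cov / math.sqrt(var_x * var_y)

scores = [0.9, 0.2, 0.5, 0.7]              # toy model outputs (made up)
print(spearman(scores, [1, 0, 0, 1]))      # both classes present: defined
print(spearman(scores, [0, 0, 0, 0]))      # single-class batch: nan
```

If this is what happens here, checking whether each evaluation batch contains both label values (or whether the model outputs have collapsed to one value) would show which input is constant.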

Here are my meta-transfer training logs:

======================Cross validation: Split 1======================
spearmanr: 0.338
ndcg: 0.631
topk_pr: 0.050
Early stopped at epoch 16
Best validating spearmanr reached at epoch 1: 0.378
======================Cross validation: Split 2======================
spearmanr: 0.462
ndcg: 0.877
topk_pr: 0.100
Early stopped at epoch 16
Best validating spearmanr reached at epoch 1: 0.462
======================Cross validation: Split 3======================
spearmanr: 0.298
ndcg: 0.500
topk_pr: 0.050
Early stopped at epoch 16
Best validating spearmanr reached at epoch 1: 0.378
======================Cross validation: Split 4======================
spearmanr: nan
ndcg: 0.000
topk_pr: 0.000
Early stopped at epoch 15
Best validating spearmanr reached at epoch 0: -inf
======================Cross validation: Split 5======================
spearmanr: 0.338
ndcg: 0.631
topk_pr: 0.050
CV-estimated best validating spearmanr reached at epoch 1: nan
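
The nan in the CV-estimated line is what one would expect if the per-split scores are combined with an ordinary mean, since a single nan (Split 4) makes the whole average nan. That the tool averages over splits is an assumption here, not something confirmed from the source code. The sketch below uses the last spearmanr value logged for each split above; `nan_mean` is a hypothetical helper.

```python
import math

def nan_mean(values):
    """Mean over finite entries only; nan if no entry is finite."""
    finite = [v for v in values if math.isfinite(v)]
    return sum(finite) / len(finite) if finite else math.nan

# Last logged spearmanr per split, copied from the log above (Split 4 is nan).
split_scores = [0.338, 0.462, 0.298, math.nan, 0.338]

plain_mean = sum(split_scores) / len(split_scores)
print(plain_mean)                  # nan: one undefined split poisons the mean
print(nan_mean(split_scores))      # mean over the four defined splits
```

Skipping undefined splits keeps the summary readable, but it also hides that one split could not be scored, so the number of skipped splits is worth reporting alongside it.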

And the final test results:

======================Breakdown results======================
              size  spearmanr  ndcg  topk_pr
single_local     3      0.866 1.000    0.333
single_cross    84      0.173 0.487    0.100
single_rest     87      0.222 0.527    0.100
all_rest        87      0.222 0.527    0.100
