Jeonghwan Cheon*, Se-Bum Paik†
* First author: [email protected]
† Corresponding author: [email protected]
This repository contains the implementation and demo codes for the manuscript "Brain-Inspired Warm-Up Training with Random Noise for Uncertainty Calibration" (currently under review). The preprint is available on arXiv.
Uncertainty calibration, the alignment of predictive confidence with accuracy, is essential for the reliable deployment of machine learning systems in real-world applications. However, current models often fail to achieve this goal, generating responses that are overconfident, inaccurate or even fabricated. Here, we show that the widely adopted initialization method in deep learning, long regarded as standard practice, is in fact a primary source of overconfidence. To address this problem, we introduce a neurodevelopment-inspired warm-up strategy that inherently resolves uncertainty-related issues without requiring pre- or post-processing. In our approach, networks are first briefly trained on random noise and random labels before being exposed to real data. This warm-up phase yields optimal calibration, ensuring that confidence remains well aligned with accuracy throughout subsequent training. Moreover, the resulting networks demonstrate high proficiency in the identification of “unknown” inputs, providing a robust solution for uncertainty calibration in both in-distribution and out-of-distribution contexts.
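For intuition, the warm-up idea can be sketched in a few lines of PyTorch. This is a minimal illustration only (the repository's actual implementation is in /src); the function name and hyperparameters here are placeholders. The network is briefly optimized on Gaussian noise inputs paired with uniformly random labels before training on real data begins.

```python
# Minimal sketch of the random-noise warm-up idea (illustrative only;
# see /src for the actual implementation used in the paper).
import torch
import torch.nn as nn

def warmup_on_random_noise(model, num_classes=10, steps=100,
                           batch_size=256, lr=0.1, device="cpu"):
    """Briefly train the model on Gaussian noise with random labels."""
    model.to(device).train()
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(steps):
        # Inputs: pure Gaussian noise shaped like CIFAR-10 images.
        x = torch.randn(batch_size, 3, 32, 32, device=device)
        # Labels: drawn uniformly at random, i.e. they carry no information.
        y = torch.randint(0, num_classes, (batch_size,), device=device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    return model
```

After this warm-up phase, training proceeds on the real dataset with a standard supervised objective.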
The core implementation of the model can be found in the /src directory. We provide demonstration code and the corresponding pretrained models for experiments with CIFAR-10 on ResNet-18 in the /demo directory.
.
├── demo
│ ├── evaluate_calibration.py
│ ├── evaluate_ood_detection.py
│ ├── pretrained
│ └── train_model.py
├── src
│ ├── data
│ ├── evaluation
│ ├── models
│ ├── training
│ └── utils
└── README.md
Follow these steps to reproduce the results comparing models trained with and without random noise warm-up.
Running train_model.py saves two networks: one with random warm-up and one without. You can skip Step 1 and proceed directly to Steps 2 and 3; in this case, the provided pretrained models will be used.
Step 0: Environment Setup
You can set up the environment using either venv with requirements.txt or poetry with pyproject.toml.
Option 1: Using venv
- Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows use: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
Option 2: Using Poetry
- Install Poetry.
- Install dependencies:
poetry install
- Activate the shell:
poetry shell
Step 1: Train models
Train both the control model (w/o random warm-up) and the pretrained model (w/ random warm-up). You can adjust hyperparameters using command-line arguments.
python demo/train_model.py --lr 0.1 --momentum 0.9 --weight_decay 1e-4 --epochs_train 50 --epochs_pretrain 5 --batch_size 256

The following table lists the hyperparameters used for training. These default values correspond to the parameters used to train the provided pretrained models.
| Argument | Description | Default Value |
|---|---|---|
| `--lr` | Learning rate | 0.1 |
| `--momentum` | Momentum | 0.9 |
| `--weight_decay` | Weight decay | 1e-4 |
| `--epochs_train` | Number of training epochs | 50 |
| `--epochs_pretrain` | Number of pretraining epochs | 5 |
| `--batch_size` | Batch size | 256 |
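As a rough sketch of how the arguments in the table map onto a standard argparse setup (illustrative only; consult demo/train_model.py for the script's actual interface):

```python
# Hypothetical argument parsing mirroring the defaults listed above;
# the real parser lives in demo/train_model.py.
import argparse

parser = argparse.ArgumentParser(description="Train models with and without random noise warm-up")
parser.add_argument("--lr", type=float, default=0.1, help="Learning rate")
parser.add_argument("--momentum", type=float, default=0.9, help="Momentum")
parser.add_argument("--weight_decay", type=float, default=1e-4, help="Weight decay")
parser.add_argument("--epochs_train", type=int, default=50, help="Number of training epochs")
parser.add_argument("--epochs_pretrain", type=int, default=5, help="Number of pretraining (warm-up) epochs")
parser.add_argument("--batch_size", type=int, default=256, help="Batch size")
args = parser.parse_args()
```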
Step 2: Evaluate model calibration
Compare the Expected Calibration Error (ECE) of the two models. You can select which calibration methods to evaluate using flags.
To run all calibration evaluations:
python demo/evaluate_calibration.py --baseline --temp_scaling --vec_scaling --isotonic

To run only specific evaluations (e.g., baseline and temperature scaling):

python demo/evaluate_calibration.py --baseline --temp_scaling

Flags:

- `--baseline`: Run calibration evaluation using raw softmax confidence scores.
- `--temp_scaling`: Run calibration evaluation after applying temperature scaling.
- `--vec_scaling`: Run calibration evaluation after applying vector scaling.
- `--isotonic`: Run calibration evaluation after applying isotonic regression.
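For reference, ECE bins predictions by confidence and averages the gap between accuracy and confidence across bins. A minimal sketch, assuming logits and labels are given as PyTorch tensors (this is not the evaluation code in /src):

```python
# Minimal ECE sketch: bin by confidence, then average |accuracy - confidence|
# weighted by the fraction of samples falling in each bin.
import torch

def expected_calibration_error(logits, labels, n_bins=15):
    probs = torch.softmax(logits, dim=1)
    confidences, predictions = probs.max(dim=1)
    accuracies = predictions.eq(labels).float()
    bin_edges = torch.linspace(0, 1, n_bins + 1)
    ece = torch.zeros(1)
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        prop = in_bin.float().mean()
        if prop > 0:
            ece += prop * (accuracies[in_bin].mean() - confidences[in_bin].mean()).abs()
    return ece.item()
```

Post-hoc methods such as temperature scaling rescale the logits (dividing by a scalar T fitted on a held-out set) before the softmax, which changes the confidence scores fed into this metric.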
Step 3: Evaluate out-of-distribution (OOD) detection
Compare the AUROC scores for OOD detection. You can select which methods to evaluate using flags.
To run all OOD detection evaluations:
python demo/evaluate_ood_detection.py --baseline --temp_scaling --odin --energy_score

To run only specific evaluations (e.g., baseline and ODIN):

python demo/evaluate_ood_detection.py --baseline --odin

Flags:

- `--baseline`: Run OOD detection evaluation using raw softmax confidence scores.
- `--temp_scaling`: Run OOD detection evaluation after applying temperature scaling.
- `--odin`: Run OOD detection evaluation with ODIN.
- `--energy_score`: Run OOD detection evaluation with the energy score.
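As a point of reference, here is a minimal sketch of energy-score-based OOD scoring and the AUROC summary (illustrative only, assuming scikit-learn is available; the evaluation used here is in demo/evaluate_ood_detection.py and /src):

```python
# Minimal sketch: score in-distribution (ID) vs. OOD inputs with the negative
# energy -E(x) = T * logsumexp(logits / T), then summarize separability with AUROC.
import numpy as np
import torch
from sklearn.metrics import roc_auc_score

def negative_energy(logits, temperature=1.0):
    # Higher values indicate more "in-distribution" inputs.
    return temperature * torch.logsumexp(logits / temperature, dim=1)

def ood_auroc(logits_id, logits_ood):
    scores = torch.cat([negative_energy(logits_id),
                        negative_energy(logits_ood)]).detach().cpu().numpy()
    # Label ID samples 1 and OOD samples 0; AUROC measures how well the
    # score separates the two sets.
    labels = np.concatenate([np.ones(len(logits_id)), np.zeros(len(logits_ood))])
    return roc_auc_score(labels, scores)
```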
To cite this work:

@article{cheon2024pretraining,
  title={Pretraining with random noise for uncertainty calibration},
  author={Cheon, Jeonghwan and Paik, Se-Bum},
  journal={arXiv preprint arXiv:2412.17411},
  year={2024}
}