This repository trains ensemble-based predictive credal sets and evaluates credal fusion strategies on multi-view image classification benchmarks.
The practical workflow is:
- Install dependencies.
- Pick a dataset config.
- Train checkpoints.
- Point the config at those checkpoints.
- Run the evaluation script.
- Read the metrics and diagnostic artifacts written to outputs/.
Repository layout:
configs/                    Dataset-specific experiment configs
scripts/train_ensemble.py   Train deterministic ensembles or CreBNN prior sets
scripts/run_experiment.py   Main multi-view fusion evaluation
scripts/eval_clean.py       Clean-set sanity check for deterministic ensembles
scripts/example_crebnn.py   Export a predictive credal set tensor from a CreBNN model
src/credal/                 Credal set, fusion, and uncertainty code
src/evaluation/             Metrics and diagnostic exports
checkpoints/                Saved trained models
outputs/                    Experiment results and plots
Install the Python dependencies:
pip install -r requirements.txt
The code uses PyTorch and runs on a GPU when CUDA is available, otherwise on the CPU.
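The GPU-or-CPU fallback described above is usually written with the standard PyTorch idiom sketched here. The helper name `pick_device` is hypothetical and not taken from the repository, and the import guard exists only so the snippet also runs where PyTorch is not installed.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA device is usable, otherwise "cpu"."""
    try:
        import torch  # expected to be provided by requirements.txt
    except ImportError:
        # Assumption for this sketch only: without PyTorch, report the CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

Running this before a long training job is a quick way to confirm which device the scripts are likely to pick up.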
The main configs are:
- configs/default.yaml: CIFAR-10 deterministic ensemble workflow
- configs/cifar100.yaml: CIFAR-100 deterministic ensemble workflow
- configs/svhn.yaml: SVHN deterministic ensemble workflow
- configs/crebnn_svhn.yaml: SVHN CreBNN workflow
Each config controls:
- dataset and dataloader settings
- view generation for the multi-view test setup
- model type and checkpoint directory
- credal representation and fusion settings
- number of experiment runs and output location
Before evaluation, make sure model.checkpoint_dir in the selected config points to the checkpoint directory you want to use. run_experiment.py loads checkpoints from the config file; it does not take a separate checkpoint path argument.
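Because the checkpoint location comes only from the config, a small pre-flight check can catch a stale path before a long evaluation starts. The sketch below is not part of the repository: `checkpoint_dir_from_config` is a hypothetical helper that takes an already parsed config as a plain dict and assumes `checkpoint_dir` is nested under a `model` key, as the dotted name model.checkpoint_dir suggests.

```python
from pathlib import Path


def checkpoint_dir_from_config(cfg: dict) -> Path:
    """Return model.checkpoint_dir from a parsed config, or raise if unusable."""
    try:
        raw = cfg["model"]["checkpoint_dir"]
    except KeyError as exc:
        raise ValueError("config has no model.checkpoint_dir entry") from exc
    path = Path(raw)
    if not path.is_dir():
        raise FileNotFoundError(f"model.checkpoint_dir does not exist: {path}")
    return path
```

In practice `cfg` would be the parsed YAML file, for example the result of PyYAML's `yaml.safe_load` on the chosen config.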
CredRO is the default deterministic training method:
python scripts/train_ensemble.py --config configs/default.yaml --method credro
Standard deep ensemble baseline:
python scripts/train_ensemble.py --config configs/default.yaml --method standard
The same command pattern works for the other deterministic configs:
python scripts/train_ensemble.py --config configs/cifar100.yaml --method credro
python scripts/train_ensemble.py --config configs/svhn.yaml --method credro
Useful overrides:
python scripts/train_ensemble.py \
    --config configs/default.yaml \
    --method credro \
    --n_members 5 \
    --epochs 200 \
    --delta_G 0.5 \
    --architectures resnet20,resnet32,resnet56
Training writes checkpoints into:
- the directory passed with --output_dir, or
- model.checkpoint_dir/<method> when --output_dir is omitted
The directory will contain member_*.pt files plus training_info.json.
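The output-directory rule and the expected directory contents can be sketched as follows. Both helpers are hypothetical illustrations rather than repository code, and the contents of training_info.json are not documented here, so the sketch only loads the file without assuming any fields.

```python
import json
from pathlib import Path
from typing import Optional


def resolve_output_dir(checkpoint_dir: str, method: str,
                       output_dir: Optional[str] = None) -> Path:
    """Mirror the documented rule: --output_dir wins, else checkpoint_dir/<method>."""
    return Path(output_dir) if output_dir else Path(checkpoint_dir) / method


def summarize_checkpoints(directory: Path) -> dict:
    """List member_*.pt files and load training_info.json if it is present."""
    members = sorted(p.name for p in directory.glob("member_*.pt"))
    info_path = directory / "training_info.json"
    info = json.loads(info_path.read_text()) if info_path.is_file() else None
    return {"members": members, "training_info": info}
```

Calling `summarize_checkpoints(resolve_output_dir("checkpoints", "credro"))` after training is one way to confirm that every ensemble member was written before pointing a config at the directory.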
Train a CreBNN prior set with the crebnn method:
python scripts/train_ensemble.py --config configs/crebnn_svhn.yaml --method crebnn
Optional CreBNN-specific overrides:
python scripts/train_ensemble.py \
    --config configs/crebnn_svhn.yaml \
    --method crebnn \
    --prior_scales 0.5,1.0,2.0 \
    --posterior_samples_per_prior 50
CreBNN training saves the learned prior-set model into model.checkpoint_dir/<method> by default and also writes crebnn_history.json.
For deterministic ensembles, you can check clean-set performance before running the full multi-view experiment:
python scripts/eval_clean.py \
    --config configs/default.yaml \
    --checkpoint_dir /path/to/checkpoints
You can also restrict evaluation to a small subset:
python scripts/eval_clean.py \
    --config configs/default.yaml \
    --checkpoint_dir /path/to/checkpoints \
    --subset_size 1000
Run the main multi-view fusion evaluation:
python scripts/run_experiment.py --config configs/default.yaml
Examples for the other bundled configs:
python scripts/run_experiment.py --config configs/cifar100.yaml
python scripts/run_experiment.py --config configs/svhn.yaml
python scripts/run_experiment.py --config configs/crebnn_svhn.yaml
Debug mode forces a 100-sample subset:
python scripts/run_experiment.py --config configs/default.yaml --debug
Override the number of repeated runs:
python scripts/run_experiment.py --config configs/default.yaml --n_runs 3
Run on a custom subset size:
python scripts/run_experiment.py --config configs/default.yaml --subset_size 1000
Each evaluation call creates a timestamped experiment directory:
outputs/<experiment_name>_<YYYYMMDD_HHMMSS>/
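Since every call produces a new directory, it helps to locate the most recent one programmatically. The sketch below follows the naming pattern shown above; `experiment_dir_name` and `latest_experiment_dir` are hypothetical helpers, not functions from the repository.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional

STAMP_FORMAT = "%Y%m%d_%H%M%S"  # corresponds to <YYYYMMDD_HHMMSS>


def experiment_dir_name(experiment_name: str, when: datetime) -> str:
    """Build a directory name of the form <experiment_name>_<YYYYMMDD_HHMMSS>."""
    return f"{experiment_name}_{when.strftime(STAMP_FORMAT)}"


def latest_experiment_dir(outputs_root: Path, experiment_name: str) -> Optional[Path]:
    """Return the newest timestamped directory for one experiment name, if any."""
    pattern = re.compile(re.escape(experiment_name) + r"_(\d{8}_\d{6})")
    stamped = []
    for child in outputs_root.iterdir():
        match = pattern.fullmatch(child.name)
        if match and child.is_dir():
            stamped.append((match.group(1), child))
    # Zero-padded timestamps sort chronologically as plain strings.
    return max(stamped, key=lambda item: item[0])[1] if stamped else None
```

For example, `latest_experiment_dir(Path("outputs"), "credal_fusion_cifar10")` would pick the newest CIFAR-10 run directory when several exist.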
Inside that directory you will find:
- config.yaml: exact config used for the run
- run_0.json, run_1.json, ...: per-run summary metrics
- run_0_detailed.npz, ...: per-sample detailed outputs
- aggregate_results.json: mean and standard deviation across runs
- <experiment_name>_<timestamp>.log: execution log
- run_<k>_diagnostics/: CSV, LaTeX, PNG, and PDF diagnostics for run k
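To make the relationship between the per-run files and the aggregate concrete, the sketch below recomputes a mean and standard deviation from the run_<k>.json files. It rests on assumptions that are not confirmed by the repository: each run file is treated as a flat mapping from metric name to number, and the population standard deviation is used, which may differ from what aggregate_results.json actually stores. `aggregate_runs` is a hypothetical helper.

```python
import json
import re
import statistics
from pathlib import Path


def aggregate_runs(experiment_dir: Path) -> dict:
    """Compute mean and standard deviation per metric over run_<k>.json files."""
    per_metric = {}
    for path in sorted(experiment_dir.glob("run_*.json")):
        # Skip anything that is not exactly run_<k>.json.
        if not re.fullmatch(r"run_\d+\.json", path.name):
            continue
        for name, value in json.loads(path.read_text()).items():
            # Assumption: only top-level numeric entries are metrics.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                per_metric.setdefault(name, []).append(float(value))
    return {
        name: {"mean": statistics.fmean(values), "std": statistics.pstdev(values)}
        for name, values in per_metric.items()
    }
```

Comparing this output with aggregate_results.json is a cheap consistency check when a number in a table looks off.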
The diagnostic directory includes files such as:
- fusion_method_comparison.csv
- conflict_binned_evaluation.csv
- conflict_vs_performance_curves.csv
- corruption_vs_performance_curves.csv
- table_a_overall_metrics.tex
- table_b_conflict_summary.tex
- table_c_conflict_binned.tex
- conflict_vs_au_gap.png
- conflict_vs_nll_diff.png
- conflict_vs_selection_frequency.png
- plot_conflict_bin_vs_accuracy.pdf
- plot_conflict_bin_vs_selection_and_feasibility.pdf
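The CSV diagnostics can be inspected without extra dependencies. The column names of these files are not documented here, so the sketch below makes no assumption about them and simply returns each row as a dict keyed by whatever header the file contains. `read_diagnostic_csv` is a hypothetical helper.

```python
import csv
from pathlib import Path


def read_diagnostic_csv(path: Path) -> list:
    """Load a diagnostics CSV as a list of row dicts keyed by the header row."""
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle))
```

A typical call would be `read_diagnostic_csv(run_dir / "fusion_method_comparison.csv")`, where `run_dir` points at one run_<k>_diagnostics/ directory. All values come back as strings, so numeric columns need an explicit `float(...)` conversion.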
Train:
python scripts/train_ensemble.py --config configs/default.yaml --method credro
If training saved into a new subdirectory, update model.checkpoint_dir in configs/default.yaml to that directory.
Evaluate:
python scripts/run_experiment.py --config configs/default.yaml
Read:
outputs/credal_fusion_cifar10_<timestamp>/aggregate_results.json
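When the train-then-evaluate sequence is scripted, for instance in a sweep over configs, the commands can be assembled in Python. The sketch below uses only the flags shown in this README (--config, --method, --n_runs, --subset_size, --debug); the function names are hypothetical, and whether --debug and --subset_size interact when combined is not specified here.

```python
import sys
from typing import List, Optional


def train_command(config: str, method: str) -> List[str]:
    """Argument list for scripts/train_ensemble.py."""
    return [sys.executable, "scripts/train_ensemble.py",
            "--config", config, "--method", method]


def eval_command(config: str, n_runs: Optional[int] = None,
                 subset_size: Optional[int] = None, debug: bool = False) -> List[str]:
    """Argument list for scripts/run_experiment.py with optional overrides."""
    cmd = [sys.executable, "scripts/run_experiment.py", "--config", config]
    if n_runs is not None:
        cmd += ["--n_runs", str(n_runs)]
    if subset_size is not None:
        cmd += ["--subset_size", str(subset_size)]
    if debug:
        cmd.append("--debug")
    return cmd
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the repository root; `check=True` stops the sequence if training fails, so evaluation never runs against missing checkpoints.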
The same sequence for CIFAR-100:
python scripts/train_ensemble.py --config configs/cifar100.yaml --method credro
python scripts/run_experiment.py --config configs/cifar100.yaml
And for the SVHN CreBNN workflow:
python scripts/train_ensemble.py --config configs/crebnn_svhn.yaml --method crebnn
python scripts/run_experiment.py --config configs/crebnn_svhn.yaml
To export an example predictive credal tensor from a trained CreBNN model:
python scripts/example_crebnn.py \
    --config configs/crebnn_svhn.yaml \
    --max_samples 16 \
    --output_npz outputs/crebnn_predictive_set_example.npz
When results do not match expectations, these are the first fields to verify in the chosen config:
- experiment.name: controls the output directory prefix
- experiment.n_runs: number of repeated runs
- experiment.debug_subset: subset mode when set
- data.dataset: cifar10, cifar100, or svhn
- model.type: ensemble or crebnn
- model.checkpoint_dir: where evaluation loads checkpoints from
- model.architectures and model.ensemble_size: deterministic ensemble layout
- credal.representation: box or convex_hull
- fusion.conjunctive_epsilon: relaxed feasibility tolerance for conjunctive fusion
- prediction.methods: point predictions extracted from each credal set
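The fields with a fixed set of allowed values can be checked mechanically. The sketch below covers only the three enumerated fields from the list above and assumes the dotted names map to nested keys in the parsed config; `config_problems` is a hypothetical helper, not part of the repository.

```python
# Allowed values copied from the field list above.
ALLOWED = {
    ("data", "dataset"): {"cifar10", "cifar100", "svhn"},
    ("model", "type"): {"ensemble", "crebnn"},
    ("credal", "representation"): {"box", "convex_hull"},
}


def config_problems(cfg: dict) -> list:
    """Return a human-readable message for each enumerated field that is invalid."""
    problems = []
    for (section, key), allowed in ALLOWED.items():
        # A missing section or key yields None, which is reported as invalid.
        value = (cfg.get(section) or {}).get(key)
        if value not in allowed:
            problems.append(
                f"{section}.{key}={value!r} is not one of {sorted(allowed)}"
            )
    return problems
```

An empty list means the enumerated fields look sane; the remaining fields, such as model.checkpoint_dir and experiment.n_runs, still need a manual look.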
Run the unit tests with:
pytest tests/ -v
References:
- Caprio and Restuccia, "Credal Information Fusion"
- Wang et al., "CreDRO: Learning Credal Ensembles via DRO"
- Credal Bayesian Deep Learning / CreBNN