Credal Fusion Experiments

This repository trains ensemble-based predictive credal sets and evaluates credal fusion strategies on multi-view image classification benchmarks.

The practical workflow is:

1. Install dependencies.
2. Pick a dataset config.
3. Train checkpoints.
4. Point the config at those checkpoints.
5. Run the evaluation script.
6. Read the metrics and diagnostic artifacts written to outputs/.

Repository Layout

configs/                   Dataset-specific experiment configs
scripts/train_ensemble.py  Train deterministic ensembles or CreBNN prior sets
scripts/run_experiment.py  Main multi-view fusion evaluation
scripts/eval_clean.py      Clean-set sanity check for deterministic ensembles
scripts/example_crebnn.py  Export a predictive credal set tensor from a CreBNN model
src/credal/                Credal set, fusion, and uncertainty code
src/evaluation/            Metrics and diagnostic exports
checkpoints/               Saved trained models
outputs/                   Experiment results and plots

Setup

Install the Python dependencies:

pip install -r requirements.txt

The code uses PyTorch and will run on GPU if CUDA is available, otherwise on CPU.
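The device choice usually comes down to a single check. A minimal sketch, assuming the scripts rely on the standard torch.cuda.is_available() call; the helper name pick_device is hypothetical and not taken from this repository:

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA-capable GPU is usable, otherwise "cpu"."""
    try:
        import torch  # PyTorch, which the repository code uses
    except ImportError:
        # Fallback so this sketch still runs where PyTorch is not installed.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"running on: {device}")
```

Running this before a long training job is a quick way to confirm that the GPU is actually visible to PyTorch.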

Configs

The main configs are:

configs/default.yaml: CIFAR-10 deterministic ensemble workflow
configs/cifar100.yaml: CIFAR-100 deterministic ensemble workflow
configs/svhn.yaml: SVHN deterministic ensemble workflow
configs/crebnn_svhn.yaml: SVHN CreBNN workflow

Each config controls:

dataset and dataloader settings
view generation for the multi-view test setup
model type and checkpoint directory
credal representation and fusion settings
number of experiment runs and output location

Before evaluation, make sure model.checkpoint_dir in the selected config points to the checkpoint directory you want to use. run_experiment.py loads checkpoints from the config file; it does not take a separate checkpoint path argument.
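Because the checkpoint path comes only from the config, a quick pre-flight check can catch a stale model.checkpoint_dir before a long evaluation starts. The sketch below assumes the config has already been parsed into a nested dict and that deterministic ensembles are stored as member_*.pt files; the helper name check_checkpoint_dir is hypothetical, and CreBNN checkpoints may use different file names:

```python
from pathlib import Path


def check_checkpoint_dir(config: dict) -> list[Path]:
    """Return the member checkpoints found under model.checkpoint_dir.

    `config` is the parsed YAML config as a nested dict. Raises
    FileNotFoundError when the directory or its checkpoints are missing.
    """
    ckpt_dir = Path(config["model"]["checkpoint_dir"])
    if not ckpt_dir.is_dir():
        raise FileNotFoundError(f"checkpoint_dir does not exist: {ckpt_dir}")
    # Assumption: deterministic ensembles are saved as member_*.pt files.
    members = sorted(ckpt_dir.glob("member_*.pt"))
    if not members:
        raise FileNotFoundError(f"no member_*.pt files in {ckpt_dir}")
    return members
```

The dict can come from any YAML loader, for example yaml.safe_load from PyYAML if that package is installed.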

Training

1. Train a deterministic ensemble

CreDRO (--method credro) is the default deterministic training method:

python scripts/train_ensemble.py --config configs/default.yaml --method credro

Standard deep ensemble baseline:

python scripts/train_ensemble.py --config configs/default.yaml --method standard

The same command pattern works for the other deterministic configs:

python scripts/train_ensemble.py --config configs/cifar100.yaml --method credro
python scripts/train_ensemble.py --config configs/svhn.yaml --method credro

Useful overrides:

python scripts/train_ensemble.py \
  --config configs/default.yaml \
  --method credro \
  --n_members 5 \
  --epochs 200 \
  --delta_G 0.5 \
  --architectures resnet20,resnet32,resnet56

Training writes checkpoints into:

the directory passed with --output_dir, or
model.checkpoint_dir/<method> when --output_dir is omitted

The directory will contain member_*.pt files plus training_info.json.
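The output location rule above can be written down in a few lines. This is only an illustration of the documented behaviour, not the code in train_ensemble.py; the function name and the example path checkpoints/cifar10 are hypothetical:

```python
from pathlib import Path
from typing import Optional


def resolve_output_dir(output_dir: Optional[str], checkpoint_dir: str, method: str) -> Path:
    """Pick the directory that training writes checkpoints into."""
    if output_dir:
        # An explicit --output_dir always wins.
        return Path(output_dir)
    # Otherwise fall back to model.checkpoint_dir/<method>.
    return Path(checkpoint_dir) / method


# Hypothetical config value; check your own config for the real one.
print(resolve_output_dir(None, "checkpoints/cifar10", "credro").as_posix())
```

Note that the fallback adds a <method> subdirectory, so the path training writes to can differ from the model.checkpoint_dir value in the config.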

2. Train a CreBNN prior-set model

python scripts/train_ensemble.py --config configs/crebnn_svhn.yaml --method crebnn

Optional CreBNN-specific overrides:

python scripts/train_ensemble.py \
  --config configs/crebnn_svhn.yaml \
  --method crebnn \
  --prior_scales 0.5,1.0,2.0 \
  --posterior_samples_per_prior 50

CreBNN training saves the learned prior-set model into model.checkpoint_dir/<method> by default and also writes crebnn_history.json.

Sanity Check Trained Checkpoints

For deterministic ensembles, you can check clean-set performance before running the full multi-view experiment:

python scripts/eval_clean.py \
  --config configs/default.yaml \
  --checkpoint_dir /path/to/checkpoints

You can also restrict evaluation to a small subset:

python scripts/eval_clean.py \
  --config configs/default.yaml \
  --checkpoint_dir /path/to/checkpoints \
  --subset_size 1000

Run the Main Fusion Evaluation

Full evaluation

python scripts/run_experiment.py --config configs/default.yaml

Examples for the other bundled configs:

python scripts/run_experiment.py --config configs/cifar100.yaml
python scripts/run_experiment.py --config configs/svhn.yaml
python scripts/run_experiment.py --config configs/crebnn_svhn.yaml

Debug or partial runs

Debug mode forces a 100-sample subset:

python scripts/run_experiment.py --config configs/default.yaml --debug

Override the number of repeated runs:

python scripts/run_experiment.py --config configs/default.yaml --n_runs 3

Run on a custom subset size:

python scripts/run_experiment.py --config configs/default.yaml --subset_size 1000

Result Structure

Each evaluation call creates a timestamped experiment directory:

outputs/<experiment_name>_<YYYYMMDD_HHMMSS>/
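Since every call creates a new timestamped directory, it helps to locate the most recent one programmatically. A small sketch that relies only on the naming pattern above; the helper name latest_experiment_dir is hypothetical:

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def latest_experiment_dir(outputs: Path, experiment_name: str) -> Optional[Path]:
    """Return the newest <experiment_name>_<YYYYMMDD_HHMMSS> directory, if any."""
    prefix = experiment_name + "_"
    stamped = []
    for path in outputs.glob(prefix + "*"):
        if not path.is_dir():
            continue
        try:
            # Everything after the prefix must parse as YYYYMMDD_HHMMSS.
            stamp = datetime.strptime(path.name[len(prefix):], "%Y%m%d_%H%M%S")
        except ValueError:
            continue  # the name does not follow the timestamp pattern
        stamped.append((stamp, path))
    if not stamped:
        return None
    return max(stamped, key=lambda item: item[0])[1]
```

For example, latest_experiment_dir(Path("outputs"), "credal_fusion_cifar10") would return the directory of the last CIFAR-10 run, or None if no run exists yet.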

Inside that directory you will find:

config.yaml: exact config used for the run
run_0.json, run_1.json, ...: per-run summary metrics
run_0_detailed.npz, ...: per-sample detailed outputs
aggregate_results.json: mean and standard deviation across runs
<experiment_name>_<timestamp>.log: execution log
run_<k>_diagnostics/: CSV, LaTeX, PNG, and PDF diagnostics for run k
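To cross-check aggregate_results.json, or to aggregate only some of the runs, the per-run summaries can be combined by hand. The sketch below assumes each run_<k>.json holds a flat mapping from metric name to number and uses the population standard deviation; the actual files may be nested differently and the repository may use a different estimator:

```python
import json
import statistics
from pathlib import Path


def aggregate_runs(exp_dir: Path) -> dict:
    """Mean and standard deviation per metric over the run_<k>.json files."""
    # The .json suffix excludes run_<k>_detailed.npz and the diagnostics folders.
    runs = [json.loads(p.read_text()) for p in sorted(exp_dir.glob("run_*.json"))]
    summary = {}
    if not runs:
        return summary
    for metric in runs[0]:
        values = [run.get(metric) for run in runs]
        # Keep only metrics that are plain numbers in every run.
        if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            continue
        summary[metric] = {
            "mean": statistics.mean(values),
            "std": statistics.pstdev(values),  # assumption: population std
        }
    return summary
```

If the numbers disagree with aggregate_results.json, the first things to check are the nesting of the run files and whether the script reports a sample rather than a population standard deviation.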

The diagnostic directory includes files such as:

fusion_method_comparison.csv
conflict_binned_evaluation.csv
conflict_vs_performance_curves.csv
corruption_vs_performance_curves.csv
table_a_overall_metrics.tex
table_b_conflict_summary.tex
table_c_conflict_binned.tex
conflict_vs_au_gap.png
conflict_vs_nll_diff.png
conflict_vs_selection_frequency.png
plot_conflict_bin_vs_accuracy.pdf
plot_conflict_bin_vs_selection_and_feasibility.pdf

Minimal Reproduction Recipes

CIFAR-10 deterministic pipeline

Train:

python scripts/train_ensemble.py --config configs/default.yaml --method credro

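By default, training saves into model.checkpoint_dir/credro (see Training). If model.checkpoint_dir in configs/default.yaml does not already point at the directory holding the new checkpoints, update it before evaluating.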

Evaluate:

python scripts/run_experiment.py --config configs/default.yaml

Read:

outputs/credal_fusion_cifar10_<timestamp>/aggregate_results.json

CIFAR-100 deterministic pipeline

python scripts/train_ensemble.py --config configs/cifar100.yaml --method credro
python scripts/run_experiment.py --config configs/cifar100.yaml

SVHN CreBNN pipeline

python scripts/train_ensemble.py --config configs/crebnn_svhn.yaml --method crebnn
python scripts/run_experiment.py --config configs/crebnn_svhn.yaml

To export an example predictive credal tensor from a trained CreBNN model:

python scripts/example_crebnn.py \
  --config configs/crebnn_svhn.yaml \
  --max_samples 16 \
  --output_npz outputs/crebnn_predictive_set_example.npz

Key Config Fields To Check

When results do not match expectations, these are the first fields to verify in the chosen config:

experiment.name: controls the output directory prefix
experiment.n_runs: number of repeated runs
experiment.debug_subset: when set, evaluation runs on a subset of the data
data.dataset: cifar10, cifar100, or svhn
model.type: ensemble or crebnn
model.checkpoint_dir: where evaluation loads checkpoints from
model.architectures and model.ensemble_size: deterministic ensemble layout
credal.representation: box or convex_hull
fusion.conjunctive_epsilon: relaxed feasibility tolerance for conjunctive fusion
prediction.methods: methods used to extract point predictions from each credal set
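To make the box representation and the conjunctive tolerance concrete, here is a toy sketch in plain Python. It treats a box credal set as per-class lower and upper probability bounds taken over ensemble members, and conjunctive fusion as the intersection of those boxes. How epsilon enters is an assumption made for illustration (each bound is widened by epsilon before intersecting); the code in src/credal/ may define the relaxation differently, and the function names are hypothetical:

```python
from typing import List, Optional, Tuple

# (lower, upper) probability bounds, one entry per class
Box = Tuple[List[float], List[float]]


def box_from_members(member_probs: List[List[float]]) -> Box:
    """Box credal set: per-class min and max over the members' predictions."""
    columns = list(zip(*member_probs))
    return [min(c) for c in columns], [max(c) for c in columns]


def conjunctive_fusion(boxes: List[Box], epsilon: float = 0.0) -> Optional[Box]:
    """Intersect boxes after widening each bound by epsilon (an assumption).

    Returns None when the relaxed intersection contains no distribution.
    """
    n_classes = len(boxes[0][0])
    lower = [max(max(lo[k] - epsilon, 0.0) for lo, _ in boxes) for k in range(n_classes)]
    upper = [min(min(up[k] + epsilon, 1.0) for _, up in boxes) for k in range(n_classes)]
    tol = 1e-9  # guards against floating-point round-off
    intervals_ok = all(lo <= up + tol for lo, up in zip(lower, upper))
    # A probability vector fits inside the box only if the bounds bracket 1.
    mass_ok = sum(lower) <= 1.0 + tol and sum(upper) >= 1.0 - tol
    return (lower, upper) if intervals_ok and mass_ok else None


# Two views, each with two ensemble members and three classes (made-up numbers).
view_a = box_from_members([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]])
view_b = box_from_members([[0.5, 0.4, 0.1], [0.55, 0.35, 0.1]])

print(conjunctive_fusion([view_a, view_b]))                # None: the intervals do not overlap
print(conjunctive_fusion([view_a, view_b], epsilon=0.05))  # a non-empty relaxed box
```

In this toy case the strict intersection is empty because the two views disagree, while a tolerance of 0.05 is just enough to make fusion feasible. When results look odd, this is the kind of effect to look for when changing fusion.conjunctive_epsilon.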

Testing

Run the unit tests with:

pytest tests/ -v

References

Caprio and Restuccia, "Credal Information Fusion"
Wang et al., "CreDRO: Learning Credal Ensembles via DRO"
Credal Bayesian Deep Learning / CreBNN
