Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 29 additions & 0 deletions docs/model_cards/MVE_lossattenuation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
# MVE/Loss Attenuation
## Abstract
Mean Variance Estimation (MVE) is a technique where a neural network is modified to output both a mean prediction μ(x) and a variance estimate σ²(x), modeling the target as a Gaussian distribution. By training with negative log-likelihood (NLL) loss instead of standard MSE for the example of regression, the network learns heteroscedastic (input-dependent) aleatoric uncertainty. A key property is "loss attenuation": the model automatically downweights high-uncertainty samples during training, making it robust to label noise.

* **Original Paper:** [Estimating the Mean and Variance of the Target Probability Distribution (Nix & Weigend, ICNN 1994)](https://ieeexplore.ieee.org/document/374138)
* **Key Work:** [What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? (Kendall & Gal, NeurIPS 2017)](https://arxiv.org/abs/1703.04977)
* **Implementation:** Straightforward in PyTorch/TensorFlow, double the output size to account for variance and use Gaussian NLL loss.

## Easy to apply ⭐️⭐️⭐️⭐️★
MVE is conceptually simple and requires only minor architectural modifications: extending the output and changing the loss function to Gaussian NLL. No specialized libraries are needed. However, training can be sensitive to hyperparameters in more complex deep learning tasks such as object detection where you have to assign a weight to the loss, and a "warm-up" period (training only the mean first) is often necessary to avoid convergence issues where the variance inflates to explain poor mean estimates.

## Data Compatibility ⭐️⭐️⭐️⭐️★
MVE works with any continuous regression target and is compatible with various input types (images, tabular, time series). It naturally handles heteroscedastic data where noise varies across inputs. Not directly applicable to classification without modifications. Requires sufficient data to reliably learn the variance function.

## Task Compatibility ⭐️⭐️⭐️★★
Primarily designed for regression tasks. It has been successfully extended to localization in object detection. The method captures aleatoric (data) uncertainty but does not capture epistemic (model) uncertainty. For that, MVE is typically combined with ensembles or MC Dropout. OOD detection is limited as the model may predict low variance for OOD inputs if they resemble training data. Calibration of variance estimates may require post-hoc adjustment, especially for small datasets or complex architectures.

## Ease of integration ⭐️⭐️⭐️⭐️★
MVE integrates easily into existing pipelines. No retraining of the full architecture is needed. One can often fine-tune an existing model. Works with any architecture where the user has access to the final layer. Can be combined with other UQ methods (ensembles, dropout) for comprehensive uncertainty estimation. It cannot be combined with evidential deep learning directly.

## Computationally Cheap ⭐️⭐️⭐️⭐️⭐️
Very efficient. It works in real-time and only doubles the parameters in the last layer. It only requires a single forward pass during inference, adding no overhead compared to standard networks. Training cost is essentially identical to standard training. This makes MVE one of the most computationally attractive UQ methods available.

## Caveats
* **Only captures aleatoric uncertainty:** MVE estimates data noise, not model uncertainty. Must be combined with other methods (e.g., ensembles, MC Dropout) to capture epistemic uncertainty.
* **Training instability:** Without proper initialization or warm-up periods, the variance can inflate to "explain away" prediction errors, compromising mean accuracy.
* **Overconfidence on OOD data:** The model may predict low variance for out-of-distribution inputs that superficially resemble training data.
* **Gaussian assumption:** Assumes normally distributed errors, which may not hold for all datasets. Heavy-tailed or multimodal noise distributions are not well captured.
* **Calibration required:** Raw variance estimates may need post-hoc calibration to provide reliable uncertainty quantification.