Outliers #47

stweb75 · 2025-02-16T21:49:11Z

Rbeast::beast can accommodate outliers with:

hasOutlier [default FALSE]
boolean; if true, fit a model with an outlier component that refers to potential spikes or dips at isolated data points: Y = trend + outlier + error if season='none',and Y = trend + season + outlier + error if season ~= 'none'.

ocp.minmax [default c(0,10)]
a vector of 2 integers (>=0); the min and max numbers of outlier-type changepoints (ocp) allowed in the time seriestrend component. Ocp refers to spikes or dips at isolated times that can't be modeled as trends or seasonal terms.

There is also the undocumented:

prior$outlierSigFactor = 2.5 # above which datapoints considered outliers

In my experience:

outliers 'absorb' variance and can result in overfitting of the trend and season components.
there often are many more outliers than there appear to be in a plot of the time series; often ncp_mode = ncp_max = ocp.minmax[2].

Below are three examples comparing Rbeast:beast and forecast::tsoutliers. The latter function is explained at Detecting time series in outliers. It is similarly based on a decomposition but identifies fewer outliers than does the former. I have also tried increasing outlierSigFactor.

I don't know how Rbeast::beast identifies outliers but suspect that prior$outlierSigFactor acts similarly to the 3IQR(r) tolerance interval in forecast::tsoutliers. For a symmetric distribution:
Q3(r) = Q2(r) + 0.5IQR(r) ~ mean(r) + 0.5IQR(r); IQR(r)~1.349sd(r); which gives a tolerance interval +/-1.349(0.5 + 3)IQR(r) ~ 4.7sd(r). That's a far more strict criteria than is 2.5sd(r). A further concern is that IQR(r) or mad(r) are robust estimators of scale and sd(r) is not.

Anyhow, my current strategy to avoid overfitting outliers is to set ocp.minmax[2] = length(tsoutliers(data)$index) or, being generous, 1.5length(tsoutliers(data)$index).

library(forecast)
library(Rbeast)
## Example 1
Data(Nile)
# Rbeast::beast(outlierSigFactor = 2.5)
# mode 7 outliers and Pr(ncp=7)=0.54
# 4 ocp with cpPr > 0.95
mod <- beast(Nile, hasOutlier=TRUE)
plot(mod)
mod
# no outliers by forecast::tsoutliers()
tsoutliers(Nile)
# Rbeast::beast(outlierSigFactor = 4.7)
# mode 1 outliers and and Pr(ncp=1)=0.95
# 1 ocp with cpPr > 0.95
mod <- beast123(Y = as.numeric(Nile),
  season="none",
  metadata = list(time = time(Nile),
    deltaTime = 1,
    hasOutlier = TRUE),
  prior = list(outlierSigFactor = 4.7),
  extra = list(dumpInputData=TRUE,
    computeTrendOrder=TRUE,
    computeTrendSlope = TRUE))
plot(mod)
mod
## Example 2
# Rbeast::beast(outlierSigFactor = 2.5)
# mode 10 outliers and Pr(ncp=10)=0.83
# 7 ocp with cpPr > 0.95
mod <- beast(googletrend_beach, hasOutlier=TRUE)
plot(mod)
mod
# five outliers by forecast::tsoutliers()
out <- tsoutliers(googletrend_beach)
# these match the top 5 outliers detected by Rbeast::beast()
out
time(googletrend_beach)[out$index]
# the ts is complex and the outliers are not obvious by visual inspection
plot(googletrend_beach)
points(time(googletrend_beach)[out$index], googletrend_beach[out$index], col="red")
# Rbeast::beast(outlierSigFactor = 4.7)
# mode 2 outliers and and Pr(ncp=2)=0.94
# 1 ocp with cpPr > 0.95
mod <- beast123(Y = googletrend_beach,
  season="harmonic",
  metadata = list(time = time(googletrend_beach),
    deltaTime = 1/12,
    period = 1.0,    
    hasOutlier = TRUE),
  prior = list(outlierSigFactor = 4.7),
  extra = list(dumpInputData=TRUE,
    computeTrendOrder=TRUE,
    computeTrendSlope = TRUE))
plot(mod)
mod
# Rbeast::beast(ocp.minmax=c(0,5))
# mode 5 outliers and and Pr(ncp=5)=0.84
# 3 ocp with cpPr > 0.95
mod <- beast(googletrend_beach, hasOutlier=TRUE, ocp.minmax=c(0,5))
plot(mod)
mod
## Example 3
data(co2)
# mode 10 outliers by Rbeast::beast() however Pr(ncp=10)=0.30
# zero ocp with cpPr > 0.95
mod <- beast(co2, hasOutlier=TRUE)
plot(mod)
mod
# no outliers by forecast::tsoutliers()
tsoutliers(co2)
# Rbeast::beast(outlierSigFactor = 4.7)
# mode 0 outliers and and Pr(ncp=0)=0.579
# zero ocp with cpPr > 0.95
mod <- beast123(Y = co2,
  season="harmonic",
  metadata = list(time = time(co2),
    deltaTime = 1/12,
    period = 1.0,    
    hasOutlier = TRUE),
  prior = list(outlierSigFactor = 4.7),
  extra = list(dumpInputData=TRUE,
    computeTrendOrder=TRUE,
    computeTrendSlope = TRUE))
plot(mod)
mod

The text was updated successfully, but these errors were encountered:

stweb75 · 2025-02-17T01:07:43Z

I extended the example to compare fixed ncp_max or larger outlierSigFactor.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Outliers #47

Outliers #47

stweb75 commented Feb 16, 2025 •

edited

Loading

stweb75 commented Feb 17, 2025

Outliers #47

Outliers #47

Comments

stweb75 commented Feb 16, 2025 • edited Loading

stweb75 commented Feb 17, 2025

stweb75 commented Feb 16, 2025 •

edited

Loading