You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
hasOutlier [default FALSE]
boolean; if true, fit a model with an outlier component that refers to potential spikes or dips at isolated data points: Y = trend + outlier + error if season='none',and Y = trend + season + outlier + error if season ~= 'none'.
ocp.minmax [default c(0,10)]
a vector of 2 integers (>=0); the min and max numbers of outlier-type changepoints (ocp) allowed in the time seriestrend component. Ocp refers to spikes or dips at isolated times that can't be modeled as trends or seasonal terms.
There is also the undocumented:
prior$outlierSigFactor = 2.5 # above which datapoints considered outliers
In my experience:
outliers 'absorb' variance and can result in overfitting of the trend and season components.
there often are many more outliers than there appear to be in a plot of the time series; often ncp_mode = ncp_max = ocp.minmax[2].
Below are three examples comparing Rbeast:beast and forecast::tsoutliers. The latter function is explained at Detecting time series in outliers. It is similarly based on a decomposition but identifies fewer outliers than does the former. I have also tried increasing outlierSigFactor.
I don't know how Rbeast::beast identifies outliers but suspect that prior$outlierSigFactor acts similarly to the 3IQR(r) tolerance interval in forecast::tsoutliers. For a symmetric distribution:
Q3(r) = Q2(r) + 0.5IQR(r) ~ mean(r) + 0.5IQR(r); IQR(r)~1.349sd(r); which gives a tolerance interval +/-1.349(0.5 + 3)IQR(r) ~ 4.7sd(r). That's a far more strict criteria than is 2.5sd(r). A further concern is that IQR(r) or mad(r) are robust estimators of scale and sd(r) is not.
Anyhow, my current strategy to avoid overfitting outliers is to set ocp.minmax[2] = length(tsoutliers(data)$index) or, being generous, 1.5length(tsoutliers(data)$index).
library(forecast)
library(Rbeast)
## Example 1
Data(Nile)
# Rbeast::beast(outlierSigFactor = 2.5)
# mode 7 outliers and Pr(ncp=7)=0.54
# 4 ocp with cpPr > 0.95
mod <- beast(Nile, hasOutlier=TRUE)
plot(mod)
mod
# no outliers by forecast::tsoutliers()
tsoutliers(Nile)
# Rbeast::beast(outlierSigFactor = 4.7)
# mode 1 outliers and and Pr(ncp=1)=0.95
# 1 ocp with cpPr > 0.95
mod <- beast123(Y = as.numeric(Nile),
season="none",
metadata = list(time = time(Nile),
deltaTime = 1,
hasOutlier = TRUE),
prior = list(outlierSigFactor = 4.7),
extra = list(dumpInputData=TRUE,
computeTrendOrder=TRUE,
computeTrendSlope = TRUE))
plot(mod)
mod
## Example 2
# Rbeast::beast(outlierSigFactor = 2.5)
# mode 10 outliers and Pr(ncp=10)=0.83
# 7 ocp with cpPr > 0.95
mod <- beast(googletrend_beach, hasOutlier=TRUE)
plot(mod)
mod
# five outliers by forecast::tsoutliers()
out <- tsoutliers(googletrend_beach)
# these match the top 5 outliers detected by Rbeast::beast()
out
time(googletrend_beach)[out$index]
# the ts is complex and the outliers are not obvious by visual inspection
plot(googletrend_beach)
points(time(googletrend_beach)[out$index], googletrend_beach[out$index], col="red")
# Rbeast::beast(outlierSigFactor = 4.7)
# mode 2 outliers and and Pr(ncp=2)=0.94
# 1 ocp with cpPr > 0.95
mod <- beast123(Y = googletrend_beach,
season="harmonic",
metadata = list(time = time(googletrend_beach),
deltaTime = 1/12,
period = 1.0,
hasOutlier = TRUE),
prior = list(outlierSigFactor = 4.7),
extra = list(dumpInputData=TRUE,
computeTrendOrder=TRUE,
computeTrendSlope = TRUE))
plot(mod)
mod
# Rbeast::beast(ocp.minmax=c(0,5))
# mode 5 outliers and and Pr(ncp=5)=0.84
# 3 ocp with cpPr > 0.95
mod <- beast(googletrend_beach, hasOutlier=TRUE, ocp.minmax=c(0,5))
plot(mod)
mod
## Example 3
data(co2)
# mode 10 outliers by Rbeast::beast() however Pr(ncp=10)=0.30
# zero ocp with cpPr > 0.95
mod <- beast(co2, hasOutlier=TRUE)
plot(mod)
mod
# no outliers by forecast::tsoutliers()
tsoutliers(co2)
# Rbeast::beast(outlierSigFactor = 4.7)
# mode 0 outliers and and Pr(ncp=0)=0.579
# zero ocp with cpPr > 0.95
mod <- beast123(Y = co2,
season="harmonic",
metadata = list(time = time(co2),
deltaTime = 1/12,
period = 1.0,
hasOutlier = TRUE),
prior = list(outlierSigFactor = 4.7),
extra = list(dumpInputData=TRUE,
computeTrendOrder=TRUE,
computeTrendSlope = TRUE))
plot(mod)
mod
The text was updated successfully, but these errors were encountered:
Rbeast::beast can accommodate outliers with:
There is also the undocumented:
In my experience:
Below are three examples comparing Rbeast:beast and forecast::tsoutliers. The latter function is explained at Detecting time series in outliers. It is similarly based on a decomposition but identifies fewer outliers than does the former. I have also tried increasing outlierSigFactor.
I don't know how Rbeast::beast identifies outliers but suspect that prior$outlierSigFactor acts similarly to the 3IQR(r) tolerance interval in forecast::tsoutliers. For a symmetric distribution:
Q3(r) = Q2(r) + 0.5IQR(r) ~ mean(r) + 0.5IQR(r); IQR(r)~1.349sd(r); which gives a tolerance interval +/-1.349(0.5 + 3)IQR(r) ~ 4.7sd(r). That's a far more strict criteria than is 2.5sd(r). A further concern is that IQR(r) or mad(r) are robust estimators of scale and sd(r) is not.
Anyhow, my current strategy to avoid overfitting outliers is to set ocp.minmax[2] = length(tsoutliers(data)$index) or, being generous, 1.5length(tsoutliers(data)$index).
The text was updated successfully, but these errors were encountered: