This repository is a scikit-learn extension for time series cross-validation. It introduces gaps between the training set and the test set, which mitigates the temporal dependence of time series and prevents information leakage.
pip install tscv
pip install tscv --upgrade
I recommend updating it often.
This extension defines three cross-validator classes and one function:
GapLeavePOut
GapKFold
GapWalkForward
gap_train_test_split
The three classes can all be passed, as the cv argument, to the cross_val_score function in scikit-learn, just like the native cross-validator classes in scikit-learn.
The one function is an alternative to the train_test_split function in scikit-learn.
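To give a feel for what a walk-forward scheme with a gap looks like, here is a minimal sketch in plain Python. It is not the implementation behind GapWalkForward: the helper name walk_forward_indices and its parameters are hypothetical, and the layout (an expanding training window, then a gap, then a fixed-size test window) is only one common reading of walk-forward validation.

```python
def walk_forward_indices(n_samples, n_splits, test_size, gap_size=0):
    """Yield (train, test) index lists for an expanding-window scheme.

    Hypothetical helper for illustration only; not tscv's implementation.
    """
    # The test windows occupy the last n_splits * test_size samples
    first_test_start = n_samples - n_splits * test_size
    if first_test_start - gap_size <= 0:
        raise ValueError("not enough samples for the requested splits and gap")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        # Training data stops gap_size samples before the test window
        train = list(range(0, test_start - gap_size))
        test = list(range(test_start, test_start + test_size))
        yield train, test


for train, test in walk_forward_indices(n_samples=10, n_splits=3,
                                        test_size=2, gap_size=1):
    print("train:", train, "test:", test)
```

In every split, the last training index and the first test index are separated by gap_size discarded samples.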
The following example uses GapKFold instead of KFold as the cross-validator.
import numpy as np
from sklearn import datasets
from sklearn import svm
from sklearn.model_selection import cross_val_score
from tscv import GapKFold
iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)
# use GapKFold as the cross-validator
cv = GapKFold(n_splits=5, gap_before=5, gap_after=5)
scores = cross_val_score(clf, iris.data, iris.target, cv=cv)
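The effect of the two gap parameters can be illustrated without the library. The sketch below is an assumption about the general idea, not tscv's code: the helper gap_kfold_indices is hypothetical, and it simply drops gap_before samples ahead of each contiguous test fold and gap_after samples behind it from the training indices.

```python
def gap_kfold_indices(n_samples, n_splits, gap_before=0, gap_after=0):
    """Yield (train, test) index lists for contiguous folds with gaps.

    Hypothetical helper for illustration only; not tscv's implementation.
    """
    # Spread the samples over the folds as evenly as possible
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        # Keep only samples outside the test fold and outside both gaps
        train = [i for i in range(n_samples)
                 if i < start - gap_before or i >= stop + gap_after]
        yield train, test
        start = stop


for train, test in gap_kfold_indices(n_samples=10, n_splits=5,
                                     gap_before=1, gap_after=1):
    print("train:", train, "test:", test)
```

Larger gaps leave fewer training samples per fold, so the gap sizes trade leakage protection against training-set size.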
The following example uses gap_train_test_split to split the data set into the training set and the test set.
import numpy as np
from tscv import gap_train_test_split
X, y = np.arange(20).reshape((10, 2)), np.arange(10)
X_train, X_test, y_train, y_test = gap_train_test_split(X, y, test_size=2, gap_size=2)
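As a rough mental model of a split with a gap, the following plain-Python sketch assumes the training set comes first, then the discarded gap, then the test set at the end. The helper split_with_gap is hypothetical and is not how gap_train_test_split is implemented; the real function may handle its arguments differently.

```python
def split_with_gap(samples, test_size, gap_size):
    """Return (train, test), discarding gap_size samples between them.

    Hypothetical helper for illustration only; not tscv's implementation.
    """
    n = len(samples)
    if test_size + gap_size >= n:
        raise ValueError("test_size + gap_size must be smaller than len(samples)")
    # Training data ends where the gap begins
    train_end = n - test_size - gap_size
    return samples[:train_end], samples[n - test_size:]


y = list(range(10))
y_train, y_test = split_with_gap(y, test_size=2, gap_size=2)
print(y_train)  # [0, 1, 2, 3, 4, 5]
print(y_test)   # [8, 9]
```

Samples 6 and 7 fall in the gap and belong to neither set.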
See the documentation here.
If you need any further help, please use the issue tracker.
- Report bugs in the issue tracker
- Express your use cases in the issue tracker
- Support me at scikit-learn/scikit-learn#13761 if you want to see this extension merged into scikit-learn
This extension is developed mainly by Wenjie Zheng.
The GapWalkForward cross-validator is adapted from the TimeSeriesSplit of scikit-learn (see Kyle Kosic's PR scikit-learn/scikit-learn#13204).
- I would like to thank Christoph Bergmeir, Prabir Burman, and Jeffrey Racine for the helpful discussion.
BSD-3-Clause