Preprocessing Data
Simone Carsey edited this page Aug 14, 2025
For functions that prepare data for modeling (e.g., train-test split, group shuffle-split), see modeling_prep.py. Many of these functions expect a .csv data file, such as the one saved in the Importing Data to Python step above. Copies of these .csv files are stored in the .\processed_data directory of this repo, and the example code reads from those copies.
Example usage for a train-test split with county as the grouping variable:

```python
import pandas as pd
from modeling_prep import *

# Forward slashes also work on Windows and avoid backslash escape sequences in the string
train_data = pd.read_csv('./processed_data/oregon_train_timeseries.csv', header=0, index_col=1)
split_county_data = county_grouped_shufflesplit(train_data)
```
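The point of grouping by county is that every row from a given county lands entirely in either the training set or the test set, so the model is evaluated on counties it never saw. The sketch below illustrates that idea only; it is not the implementation of county_grouped_shufflesplit. It assumes rows are plain dicts (a stand-in for DataFrame rows), and the function name `grouped_shuffle_split` and the `"county"` field are hypothetical.

```python
import random


def grouped_shuffle_split(rows, group_key, test_fraction=0.2, seed=0):
    """Split rows so that each group falls entirely in train or entirely in test.

    rows: list of dicts, a stand-in for DataFrame rows (assumption)
    group_key: name of the grouping field, e.g. a hypothetical "county" field
    test_fraction: share of *groups* (not rows) held out for testing
    """
    # Sort first so the shuffle is reproducible for a given seed
    groups = sorted({row[group_key] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(groups)

    # Hold out at least one group for testing
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])

    train = [row for row in rows if row[group_key] not in test_groups]
    test = [row for row in rows if row[group_key] in test_groups]
    return train, test
```

With DataFrames, scikit-learn's `GroupShuffleSplit` offers the same behavior, where a float `test_size` is interpreted as the proportion of groups to hold out.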
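For a train-test split without a grouping variable:

```python
import pandas as pd
from modeling_prep import *

# Forward slashes also work on Windows and avoid backslash escape sequences in the string
train_data = pd.read_csv('./processed_data/oregon_train_timeseries.csv', header=0, index_col=1)
split_data_dict = train_test_split_default(train_data)
```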
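Without a grouping variable, rows are assigned to train or test individually, so rows from the same county can appear on both sides of the split. The sketch below shows a plain random split under stated assumptions; it is not the code of train_test_split_default. The function name `random_train_test_split` and the `"train"`/`"test"` keys of the returned dict are hypothetical, and rows are plain Python values rather than a DataFrame.

```python
import random


def random_train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle row positions and hold out a fraction of rows as a test set.

    Returns a dict with hypothetical "train" and "test" keys.
    """
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)

    # Number of individual rows (not groups) to hold out
    n_test = int(round(len(rows) * test_fraction))
    test_idx = set(indices[:n_test])

    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return {"train": train, "test": test}
```

Comparing this with the grouped version makes the design choice visible: the grouped split trades some randomness in set sizes for a guarantee that no county leaks between training and test data.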
To examine a single county, use single_oregon_county. It requires a dictionary of DataFrames that you have already loaded, e.g. the output of oregon_import().
Parameters of single_oregon_county:
| Inputs | Type | Notes | Default Value(s) |
|---|---|---|---|
| data_dict | dict | A dictionary of DataFrames; typically the result of oregon_import() | None |
| county_code | int | FIPS code of the desired county; must be present in every table of the data_dict provided | None |
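Conceptually, selecting one county means filtering every table in the dictionary down to the rows for that FIPS code. The sketch below illustrates that idea and is not the implementation of single_oregon_county. It assumes each table is a list of row dicts (a stand-in for a DataFrame) and that the FIPS code lives in a field named `"fips"`; both the field name and the function name `single_county` are hypothetical. Raising an error when the county is missing reflects the requirement in the table that the code be present in every table.

```python
def single_county(tables, county_code, key="fips"):
    """Return a new dict holding only the rows for one county from every table.

    tables: dict mapping table name -> list of row dicts (stand-in for DataFrames)
    county_code: county FIPS code, e.g. 41067
    key: hypothetical name of the field that stores the FIPS code
    """
    result = {}
    for name, rows in tables.items():
        selected = [row for row in rows if row[key] == county_code]
        # The county must be present in every table
        if not selected:
            raise KeyError(f"county {county_code} not found in table {name!r}")
        result[name] = selected
    return result
```

With pandas, the per-table step would typically be a boolean mask such as `df[df[column] == county_code]`, where `column` is whichever column holds the FIPS code.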
Example usage:

```python
from data_import import *

oregon_data_dict = oregon_import()
# 41067 is the FIPS code for Washington County, Oregon
wa_dict = single_oregon_county(oregon_data_dict, 41067)
```
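A county FIPS code has five digits: a two-digit state code followed by a three-digit county code. Oregon's state code is 41, so 41067 splits into state 41 and county 067 (Washington County). The helper below is a hypothetical illustration of that structure, not part of this repo. Zero-padding matters because codes for states numbered below 10 lose their leading zero when stored as integers.

```python
def split_county_fips(code):
    """Split a county FIPS code into (state, county) string parts.

    Accepts an int or a string; integers are zero-padded to five digits
    so codes whose leading zero was dropped are still handled.
    """
    text = f"{int(code):05d}"
    if len(text) != 5:
        raise ValueError(f"expected a 5-digit county FIPS code, got {code!r}")
    return text[:2], text[2:]
```

For example, `split_county_fips(41067)` returns `("41", "067")`.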