Preprocessing Data

For functions that prepare data for modeling (e.g. train-test split, group shuffle-split), see modeling_prep.py. Many of these functions expect a .csv data file, such as the ones saved in the Importing Data to Python step above. These .csv files are available in the .\processed_data directory of this repo, and the example code below references the copies there.

Example usage, for a train-test split with county as the grouping variable:

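import pandas as pd
from modeling_prep import *

# Raw string so the Windows-style backslashes are not read as escape sequences
train_data = pd.read_csv(r'.\processed_data\oregon_train_timeseries.csv', header=0, index_col=1)
split_county_data = county_grouped_shufflesplit(train_data)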
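Grouping by county means every row from a given county lands on the same side of the split, so evaluation happens on counties the model has not seen. The internals of county_grouped_shufflesplit are not shown on this page, so the following is only a minimal sketch of the idea using pandas and NumPy. The helper name `grouped_split_sketch`, the column name `county`, and the toy data are all assumptions made for illustration. scikit-learn's GroupShuffleSplit provides a ready-made version of the same technique.

```python
import numpy as np
import pandas as pd


def grouped_split_sketch(df, group_col="county", test_frac=0.25, seed=0):
    """Hypothetical helper: split rows so that no group straddles train and test."""
    rng = np.random.default_rng(seed)
    # Shuffle the distinct group labels, not the rows themselves
    shuffled_groups = rng.permutation(df[group_col].unique())
    n_test = max(1, int(round(len(shuffled_groups) * test_frac)))
    is_test = df[group_col].isin(shuffled_groups[:n_test])
    return df[~is_test], df[is_test]


# Toy data standing in for the real time series: 4 counties, 3 rows each
toy = pd.DataFrame({
    "county": [41001, 41003, 41005, 41067] * 3,
    "value": range(12),
})
train_part, test_part = grouped_split_sketch(toy)
```

Because the shuffle is applied to county labels rather than rows, the test fraction refers to the share of counties, not the share of rows.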

For a train-test split without a grouping variable:

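import pandas as pd
from modeling_prep import *

# Raw string so the Windows-style backslashes are not read as escape sequences
train_data = pd.read_csv(r'.\processed_data\oregon_train_timeseries.csv', header=0, index_col=1)
split_data_dict = train_test_split_default(train_data)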
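For contrast with the grouped case, an ungrouped split shuffles individual rows, so rows from the same county can appear in both train and test. The sketch below illustrates that behavior only; it is not the code of train_test_split_default. The helper name `ungrouped_split_sketch` and the dictionary keys `train` and `test` are assumptions.

```python
import numpy as np
import pandas as pd


def ungrouped_split_sketch(df, test_frac=0.25, seed=0):
    """Hypothetical helper: shuffle row positions and split, ignoring any grouping."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(df))  # positional shuffle, safe with duplicate index labels
    n_test = int(round(len(df) * test_frac))
    return {
        "train": df.iloc[order[n_test:]],  # key names are assumptions
        "test": df.iloc[order[:n_test]],
    }


# Toy data: 4 counties, 3 rows each
toy_rows = pd.DataFrame({
    "county": [41001, 41003, 41005, 41067] * 3,
    "value": range(12),
})
split_sketch = ungrouped_split_sketch(toy_rows)
```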

To examine a single county, use single_oregon_county. It requires an existing dictionary of DataFrames, such as the one returned by oregon_import().

Inputs to single_oregon_county:

| Inputs | Type | Notes | Default Value(s) |
| --- | --- | --- | --- |
| data_dict | `dict` | A dictionary of DataFrames; typically the result of `oregon_import` | None |
| county_code | `int` | FIPS code of the desired county; must be present in each table of the `data_dict` provided | None |

Example usage:

from data_import import *

oregon_data_dict = oregon_import()
# 41067 is the FIPS code for Washington County, Oregon
wa_dict = single_oregon_county(oregon_data_dict, 41067)
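As an illustration of what filtering a dictionary of DataFrames down to one county can look like, here is a minimal sketch. It is not the implementation of single_oregon_county, whose return type is not stated on this page. The helper name `single_county_sketch`, the column name `county_fips`, and the toy tables are assumptions.

```python
import pandas as pd


def single_county_sketch(data_dict, county_code, fips_col="county_fips"):
    """Hypothetical helper: keep only the rows for one county in every table."""
    return {
        name: table[table[fips_col] == county_code]
        for name, table in data_dict.items()
    }


# Toy tables with made-up values; every table must contain the FIPS column
toy_dict = {
    "population": pd.DataFrame({"county_fips": [41051, 41067], "pop": [1, 2]}),
    "housing": pd.DataFrame({"county_fips": [41067, 41067, 41005], "units": [3, 4, 5]}),
}
wa_toy = single_county_sketch(toy_dict, 41067)
```

The sketch raises a KeyError if any table lacks the FIPS column, which mirrors the requirement that the county code be present in each table of the dictionary.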
