
Commit d42569a
Adding README

1 parent 7a16669 commit d42569a

7 files changed: +541 -94 lines changed

JMPB-2021/README.md (+187)

@@ -0,0 +1,187 @@
# Table of Contents

- [Table of Contents](#table-of-contents)
- [Pre-Requisites](#pre-requisites)
- [Data](#data)
- [Pre-Processing Data](#pre-processing-data)
- [Generating Predictions](#generating-predictions)
- [Training Your Own Model](#training-your-own-model)

## Pre-Requisites

You must be running Python 3 with the following Python packages installed. We also recommend using a machine that has GPU support.

~~~
pip install "tensorflow-gpu>=1.13.0,<2.0"  # for CPU-only machines use "tensorflow>=1.13.0,<2.0"
pip install pandas
pip install numpy
~~~
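
A quick sanity check that a compatible TensorFlow is installed and, if applicable, that it can see the GPU (a minimal sketch; works on any TF 1.x install):

~~~
import tensorflow as tf

print(tf.__version__)              # expect >= 1.13.0 and < 2.0
print(tf.test.is_gpu_available())  # True if the GPU build and drivers are usable
~~~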

## Data

- **Accelerometer Data**: We assume the input data is obtained from an ActiGraph GT3X device and converted into single .csv files. The files should be named **<subject_id>.csv**, and the files for all subjects should be put in the same directory. The first few lines of a sample csv file are as follows (a loading sketch for both file types follows this list):
~~~
------------ Data File Created By ActiGraph GT3X+ ActiLife v6.13.3 Firmware v3.2.1 date format M/d/yyyy at 30 Hz Filter Normal -----------
Serial Number: NEO1F18120387
Start Time 00:00:00
Start Date 5/7/2014
Epoch Period (hh:mm:ss) 00:00:00
Download Time 10:31:05
Download Date 5/20/2014
Current Memory Address: 0
Current Battery Voltage: 4.07 Mode = 12
--------------------------------------------------
Accelerometer X,Accelerometer Y,Accelerometer Z
-0.182,-0.182,0.962
-0.182,-0.176,0.959
-0.179,-0.182,0.959
-0.179,-0.182,0.959
~~~
- **(Optional) Events Data**: Optionally, you can also provide ActivPal events data for each subject as a single .csv file, especially if you wish to train your own models. These files should also be named in the **<subject_id>.csv** format, and the files for all subjects should be put in the same directory. The first few lines of a sample csv file are as follows:
~~~
StartTime,EndTime,Behavior
2014-05-07 09:47:23,2014-05-07 09:48:21,standingStill
2014-05-07 09:48:22,2014-05-07 09:48:26,walking/running
2014-05-07 09:48:27,2014-05-07 09:49:03,standingStill
2014-05-07 09:49:04,2014-05-07 09:49:04,walking/running
2014-05-07 09:49:05,2014-05-07 09:49:11,standingStill
2014-05-07 09:49:12,2014-05-07 09:49:15,walking/running
~~~
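
Both file types load directly with pandas. A minimal sketch (subject_01.csv is a placeholder name; the ActiLife export carries 10 meta-data lines before the column header, and the +1 assumes event end times are inclusive, as the sample suggests):

~~~
import pandas as pd

# GT3X accelerometer file: skip the 10 meta-data lines that ActiLife
# writes before the column header row.
acc = pd.read_csv("gt3x/subject_01.csv", skiprows=10)
print(acc.columns.tolist())  # ['Accelerometer X', 'Accelerometer Y', 'Accelerometer Z']

# ActivPal events file: the timestamp columns parse directly.
events = pd.read_csv("activpal/subject_01.csv", parse_dates=["StartTime", "EndTime"])
events["duration_s"] = (events["EndTime"] - events["StartTime"]).dt.total_seconds() + 1
print(events.groupby("Behavior")["duration_s"].sum())  # seconds spent in each behavior
~~~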

## Pre-Processing Data

First, you need to create pre-processed data from the source data. To do this, invoke the `pre_process_data.py` script as follows:

~~~
python pre_process_data.py --gt3x-dir <gt3x_data_dir> --activpal-dir <activpal_data_dir> --pre-processed-dir <output_dir>
~~~

Complete usage details of this script are as follows:

~~~
usage: pre_process_data.py [-h] --gt3x-dir GT3X_DIR --pre-processed-dir
                           PRE_PROCESSED_DIR [--activpal-dir ACTIVPAL_DIR]
                           [--window-size WINDOW_SIZE]
                           [--gt3x-frequency GT3X_FREQUENCY]
                           [--activpal-label-map ACTIVPAL_LABEL_MAP]
                           [--silent]

Argument parser for preprocessing the input data.

required arguments:
  --gt3x-dir GT3X_DIR   GT3X data directory
  --pre-processed-dir PRE_PROCESSED_DIR
                        Pre-processed data directory

optional arguments:
  -h, --help            show this help message and exit
  --activpal-dir ACTIVPAL_DIR
                        ActivPAL data directory
  --window-size WINDOW_SIZE
                        Window size in seconds on which the predictions are
                        to be made
  --gt3x-frequency GT3X_FREQUENCY
                        GT3X device frequency in Hz
  --activpal-label-map ACTIVPAL_LABEL_MAP
                        ActivPal label vocabulary
  --silent              Whether to hide info messages
~~~
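
For example, a run over raw data in ./data/gt3x and ./data/activpal (hypothetical paths) with a 3-second window at 30 Hz would look like:

~~~
python pre_process_data.py --gt3x-dir ./data/gt3x --activpal-dir ./data/activpal \
    --pre-processed-dir ./pre-processed --window-size 3 --gt3x-frequency 30
~~~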

## Generating Predictions

You can use the released pre-trained models to generate predictions using your own data. To do so, invoke the `make_predictions.py` script as follows:

~~~
python make_predictions.py --pre-processed-dir <pre-processed-dir> --predictions-dir <predictions-dir>
~~~

Complete usage details of this script are as follows:

~~~
usage: make_predictions.py [-h] --pre-processed-dir PRE_PROCESSED_DIR
                           [--predictions-dir PREDICTIONS_DIR]
                           [--batch-size BATCH_SIZE]
                           [--num-classes NUM_CLASSES]
                           [--window-size WINDOW_SIZE]
                           [--gt3x-frequency GT3X_FREQUENCY] [--no-label]
                           [--model-checkpoint-path MODEL_CHECKPOINT_PATH]
                           [--remove-gravity] [--silent]

Argument parser for generating model predictions.

required arguments:
  --pre-processed-dir PRE_PROCESSED_DIR
                        Pre-processed data directory

optional arguments:
  -h, --help            show this help message and exit
  --predictions-dir PREDICTIONS_DIR
                        Predictions output directory
  --batch-size BATCH_SIZE
                        Training batch size
  --num-classes NUM_CLASSES
                        Number of classes in the training dataset
  --window-size WINDOW_SIZE
                        Window size in seconds on which the predictions are
                        to be made
  --gt3x-frequency GT3X_FREQUENCY
                        GT3X device frequency in Hz
  --no-label            Whether to not output the label
  --model-checkpoint-path MODEL_CHECKPOINT_PATH
                        Path to the trained model checkpoint
  --remove-gravity      Whether to remove gravity from accelerometer data
  --silent              Whether to hide info messages
~~~
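
For example, to generate predictions with the gravity component removed from the accelerometer signal (directory names are placeholders):

~~~
python make_predictions.py --pre-processed-dir ./pre-processed \
    --predictions-dir ./predictions --remove-gravity
~~~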

## Training Your Own Model

To train your own model, invoke the `train_model.py` script as follows:

~~~
python train_model.py --pre-processed-dir <pre-processed-dir> --model-checkpoint-path <checkpoint-dir>
~~~

Complete usage details of this script are as follows:

~~~
usage: train_model.py [-h] --pre-processed-dir PRE_PROCESSED_DIR
                      [--learning-rate LEARNING_RATE]
                      [--num-epochs NUM_EPOCHS] [--batch-size BATCH_SIZE]
                      [--dropout-rate DROPOUT_RATE]
                      [--shuffle-buffer-size SHUFFLE_BUFFER_SIZE]
                      [--training-data-fraction TRAINING_DATA_FRACTION]
                      [--validation-data-fraction VALIDATION_DATA_FRACTION]
                      [--testing-data-fraction TESTING_DATA_FRACTION]
                      [--model-checkpoint-path MODEL_CHECKPOINT_PATH]
                      [--window-size WINDOW_SIZE]
                      [--gt3x-frequency GT3X_FREQUENCY]
                      [--num-classes NUM_CLASSES]
                      [--class-weights CLASS_WEIGHTS] [--remove-gravity]
                      [--silent]

Argument parser for training CNN model.

required arguments:
  --pre-processed-dir PRE_PROCESSED_DIR
                        Pre-processed data directory

optional arguments:
  -h, --help            show this help message and exit
  --learning-rate LEARNING_RATE
                        Learning rate for training the model
  --num-epochs NUM_EPOCHS
                        Number of epochs to train the model
  --batch-size BATCH_SIZE
                        Training batch size
  --dropout-rate DROPOUT_RATE
                        Dropout rate during training
  --shuffle-buffer-size SHUFFLE_BUFFER_SIZE
                        Training data shuffle buffer size in terms of number
                        of records
  --training-data-fraction TRAINING_DATA_FRACTION
                        Percentage of subjects to be used for training
  --validation-data-fraction VALIDATION_DATA_FRACTION
                        Percentage of subjects to be used for validation
  --testing-data-fraction TESTING_DATA_FRACTION
                        Percentage of subjects to be used for testing
  --model-checkpoint-path MODEL_CHECKPOINT_PATH
                        Path where the trained model will be saved
  --window-size WINDOW_SIZE
                        Window size in seconds on which the predictions are
                        to be made
  --gt3x-frequency GT3X_FREQUENCY
                        GT3X device frequency in Hz
  --num-classes NUM_CLASSES
                        Number of classes in the training dataset
  --class-weights CLASS_WEIGHTS
                        Class weights for loss aggregation
  --remove-gravity      Whether to remove gravity from accelerometer data
  --silent              Whether to hide info messages
~~~

Notice that this script relies on several training hyperparameters, such as the learning rate, batch size, and number of training epochs. The script comes with a set of default values for these parameters; however, you may need to tune them for your dataset to get the best performance.
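
For example, a run with explicit hyperparameters and a 60/20/20 subject split might look like this (all values are illustrative; the split arguments are assumed to take fractions in [0, 1]):

~~~
python train_model.py --pre-processed-dir ./pre-processed \
    --model-checkpoint-path ./my-model \
    --learning-rate 1e-4 --num-epochs 20 --batch-size 256 \
    --training-data-fraction 0.6 --validation-data-fraction 0.2 \
    --testing-data-fraction 0.2
~~~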

After training your own model, you can use it to generate predictions by passing the model checkpoint path to the `make_predictions.py` script as follows:

~~~
python make_predictions.py --pre-processed-dir <pre-processed-dir> --predictions-dir <predictions-dir> --model-checkpoint-path <checkpoint-dir>
~~~

JMPB-2021/commons.py (+31 -15)

@@ -1,3 +1,18 @@
+# Copyright 2020 Supun Nakandala. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
 import os
 import numpy as np
 import tensorflow as tf
@@ -22,11 +37,12 @@ def remove_gravity(acc, gt3x_frequency):
     return np.expand_dims(acc, axis=0)


-def data_generator(pre_processed_dir, subjects, gt3x_frequency, no_remove_gravity, include_time=False):
+def data_generator(pre_processed_dir, subjects, gt3x_frequency, gravity_removal, include_time=False):
     for i in subjects:
         temp = pd.read_pickle(os.path.join(pre_processed_dir, str(i)+".bin"))
+
         acc = temp[["Accelerometer"]].values.tolist()
-        if not no_remove_gravity:
+        if gravity_removal:
             acc = [remove_gravity(x, gt3x_frequency) for x in acc]
         timestamps = pd.to_datetime(temp.Time).dt.strftime('%Y-%m-%d %H:%M:%S').values.tolist()
         if 'Behavior' in temp.columns:
@@ -48,27 +64,27 @@ def cnn_model(x, num_classes, training, keep_prob=None):
     data_format = 'channels_last'
     x = tf.transpose(x, [0, 2, 3, 1])

-    conv1 = tf.compat.v1.layers.conv2d(inputs=x, filters=32, kernel_size=[5, 3], data_format=data_format, padding="valid", activation=tf.nn.relu)
-    pool1 = tf.compat.v1.layers.max_pooling2d(conv1, [2, 1], 2, padding='same', data_format=data_format)
+    conv1 = tf.layers.conv2d(inputs=x, filters=32, kernel_size=[5, 3], data_format=data_format, padding="valid", activation=tf.nn.relu)
+    pool1 = tf.layers.max_pooling2d(conv1, [2, 1], 2, padding='same', data_format=data_format)

-    conv2 = tf.compat.v1.layers.conv2d(inputs=pool1, filters=64, kernel_size=[5, 1], data_format=data_format, padding="same", activation=tf.nn.relu)
-    pool2 = tf.compat.v1.layers.max_pooling2d(conv2, [2, 1], 2, padding='same', data_format=data_format)
+    conv2 = tf.layers.conv2d(inputs=pool1, filters=64, kernel_size=[5, 1], data_format=data_format, padding="same", activation=tf.nn.relu)
+    pool2 = tf.layers.max_pooling2d(conv2, [2, 1], 2, padding='same', data_format=data_format)

-    conv3 = tf.compat.v1.layers.conv2d(inputs=pool2, filters=128, kernel_size=[5, 1], data_format=data_format, padding="same", activation=tf.nn.relu)
-    pool3 = tf.compat.v1.layers.max_pooling2d(conv3, [2, 1], 2, padding='same', data_format=data_format)
+    conv3 = tf.layers.conv2d(inputs=pool2, filters=128, kernel_size=[5, 1], data_format=data_format, padding="same", activation=tf.nn.relu)
+    pool3 = tf.layers.max_pooling2d(conv3, [2, 1], 2, padding='same', data_format=data_format)

-    conv4 = tf.compat.v1.layers.conv2d(inputs=pool3, filters=256, kernel_size=[5, 1], data_format=data_format, padding="same", activation=tf.nn.relu)
+    conv4 = tf.layers.conv2d(inputs=pool3, filters=256, kernel_size=[5, 1], data_format=data_format, padding="same", activation=tf.nn.relu)
     if keep_prob is not None:
-        conv4 = tf.compat.v1.layers.dropout(conv4, rate=keep_prob, training=training)
-    pool4 = tf.compat.v1.layers.max_pooling2d(conv4, [2, 1], 2, padding='same', data_format=data_format)
+        conv4 = tf.layers.dropout(conv4, rate=keep_prob, training=training)
+    pool4 = tf.layers.max_pooling2d(conv4, [2, 1], 2, padding='same', data_format=data_format)

-    conv5 = tf.compat.v1.layers.conv2d(inputs=pool4, filters=512, kernel_size=[5, 1], data_format=data_format, padding="same", activation=tf.nn.relu)
+    conv5 = tf.layers.conv2d(inputs=pool4, filters=512, kernel_size=[5, 1], data_format=data_format, padding="same", activation=tf.nn.relu)
     if keep_prob is not None:
-        conv5 = tf.compat.v1.layers.dropout(conv5, rate=keep_prob, training=training)
-    pool5 = tf.compat.v1.layers.max_pooling2d(conv5, [2, 1], 2, padding='same', data_format=data_format)
+        conv5 = tf.layers.dropout(conv5, rate=keep_prob, training=training)
+    pool5 = tf.layers.max_pooling2d(conv5, [2, 1], 2, padding='same', data_format=data_format)

     num_features = np.prod(pool5.get_shape().as_list()[1:])

-    logits = tf.compat.v1.layers.dense(inputs=tf.reshape(pool5, (-1, num_features)), units=num_classes)
+    logits = tf.layers.dense(inputs=tf.reshape(pool5, (-1, num_features)), units=num_classes)

     return logits

JMPB-2021/make_predictions.py (+31 -13)

@@ -1,3 +1,18 @@
+# Copyright 2020 Supun Nakandala. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
 import os
 import sys
 import pandas as pd
@@ -18,12 +33,13 @@
 required_arguments.add_argument('--pre-processed-dir', help='Pre-processed data directory', required=True)

 optional_arguments.add_argument('--predictions-dir', help='Predictions output directory', default='./predictions', required=False)
-optional_arguments.add_argument('--batch-size', help='Training batch size', default=256, required=False)
-optional_arguments.add_argument('--num-classes', help='Number of classes in the training dataset', default=3, required=False)
-optional_arguments.add_argument('--window-size', help='Window size in seconds on which the predictions are to be made', default=5, required=False)
-optional_arguments.add_argument('--gt3x-frequency', help='GT3X device frequency in Hz', default=30, required=False)
+optional_arguments.add_argument('--batch-size', help='Training batch size', default=256, type=int, required=False)
+optional_arguments.add_argument('--num-classes', help='Number of classes in the training dataset', default=3, type=int, required=False)
+optional_arguments.add_argument('--window-size', help='Window size in seconds on which the predictions are to be made', default=3, type=int, required=False)
+optional_arguments.add_argument('--gt3x-frequency', help='GT3X device frequency in Hz', default=30, type=int, required=False)
 optional_arguments.add_argument('--no-label', help='Whether to not output the label', default=False, required=False, action='store_true')
 optional_arguments.add_argument('--model-checkpoint-path', help='Path to the trained model checkpoint', default='./pre-trained-model', required=False)
+optional_arguments.add_argument('--remove-gravity', help='Whether to remove gravity from accelerometer data', default=False, required=False, action='store_true')
 optional_arguments.add_argument('--silent', help='Whether to hide info messages', default=False, required=False, action='store_true')
 parser._action_groups.append(optional_arguments)
 args = parser.parse_args()
@@ -34,32 +50,34 @@
 subject_ids = [fname.split('.')[0] for fname in os.listdir(args.pre_processed_dir) if fname.endswith('.bin')]

 in_size = args.gt3x_frequency * args.window_size
-iterator = tf.compat.v1.data.Iterator.from_structure((tf.float32, tf.int32, tf.string), ((None, 1, in_size, 3), (None, 1), (None)))
+iterator = tf.data.Iterator.from_structure((tf.float32, tf.int32, tf.string), ((None, 1, in_size, 3), (None, 1), (None)))
 iterator_init_ops = []

 for subject_id in subject_ids:
-    dataset = tf.compat.v1.data.Dataset.from_generator(lambda: data_generator(args.pre_processed_dir, [subject_id], include_time=True), output_types=(tf.float32, tf.int32, tf.string),
+    dataset = tf.data.Dataset.from_generator(lambda: data_generator(args.pre_processed_dir, [subject_id], args.gt3x_frequency, args.remove_gravity, include_time=True), output_types=(tf.float32, tf.int32, tf.string),
                                              output_shapes=((1, in_size, 3), (1,), ())).batch(args.batch_size)
     iterator_init_ops.append(iterator.make_initializer(dataset))

 x, y, t = iterator.get_next()
-p = tf.argmax(cnn_model(x, args.num_classes), axis=1)

-saver = tf.compat.v1.train.Saver()
-with tf.compat.v1.Session() as sess:
+training = tf.placeholder(tf.bool)
+keep_prob = tf.placeholder(tf.float32)
+p = tf.argmax(cnn_model(x, args.num_classes, training), axis=1)
+
+saver = tf.train.Saver()
+with tf.Session() as sess:
     saver.restore(sess, os.path.join(args.model_checkpoint_path, 'model'))

-    for subject_id, init_op in zip(subject_ids, iterator_init_ops):
+    for subject_id, init_op in zip(subject_ids, iterator_init_ops):
         if not args.silent:
-            print('Generating predictions for: {}'.format(subject_ids))
-
+            print('Generating predictions for: {}'.format(subject_id))
         sess.run(init_op)
         ts = []
         ys = []
         ps = []
         while True:
             try:
-                temp = [v.flatten().tolist() for v in sess.run([t, y, p])]
+                temp = [v.flatten().tolist() for v in sess.run([t, y, p], feed_dict={training: False})]
                 ts.extend(temp[0])
                 ys.extend(temp[1])
                 ps.extend(temp[2])