
Commit d42569a
Adding README

1 parent 7a16669 commit d42569a

7 files changed: +541 -94 lines changed

JMPB-2021/README.md (+187)

@@ -0,0 +1,187 @@
# Table of Contents

- [Table of Contents](#table-of-contents)
- [Pre-Requisites](#pre-requisites)
- [Data](#data)
- [Pre-Processing Data](#pre-processing-data)
- [Generating Predictions](#generating-predictions)
- [Training Your Own Model](#training-your-own-model)

## Pre-Requisites

You must be running Python 3 with the following Python packages installed. We also recommend using a machine that has GPU support.

~~~
pip install "tensorflow-gpu>=1.13.0,<2.0"  # for CPU-only machines use "tensorflow>=1.13.0,<2.0"
pip install pandas
pip install numpy
~~~
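
A quick sanity check that a compatible TensorFlow is installed and, if applicable, that it can see the GPU (a minimal sketch; works on any TF 1.x install):

~~~
import tensorflow as tf

print(tf.__version__)              # expect >= 1.13.0 and < 2.0
print(tf.test.is_gpu_available())  # True if the GPU build and drivers are usable
~~~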

## Data

- **Accelerometer Data**: We assume the input data is obtained from an ActiGraph GT3X device and converted into single .csv files. The files should be named **<subject_id>.csv**, and the files for all subjects should be put in the same directory. The first few lines of a sample csv file are as follows (a loading sketch for both file types follows this list):
~~~
------------ Data File Created By ActiGraph GT3X+ ActiLife v6.13.3 Firmware v3.2.1 date format M/d/yyyy at 30 Hz Filter Normal -----------
Serial Number: NEO1F18120387
Start Time 00:00:00
Start Date 5/7/2014
Epoch Period (hh:mm:ss) 00:00:00
Download Time 10:31:05
Download Date 5/20/2014
Current Memory Address: 0
Current Battery Voltage: 4.07 Mode = 12
--------------------------------------------------
Accelerometer X,Accelerometer Y,Accelerometer Z
-0.182,-0.182,0.962
-0.182,-0.176,0.959
-0.179,-0.182,0.959
-0.179,-0.182,0.959
~~~
- **(Optional) Events Data**: Optionally, you can also provide ActivPal events data for each subject as a single .csv file, especially if you wish to train your own models. These files should also be named in the **<subject_id>.csv** format, and the files for all subjects should be put in the same directory. The first few lines of a sample csv file are as follows:
~~~
StartTime,EndTime,Behavior
2014-05-07 09:47:23,2014-05-07 09:48:21,standingStill
2014-05-07 09:48:22,2014-05-07 09:48:26,walking/running
2014-05-07 09:48:27,2014-05-07 09:49:03,standingStill
2014-05-07 09:49:04,2014-05-07 09:49:04,walking/running
2014-05-07 09:49:05,2014-05-07 09:49:11,standingStill
2014-05-07 09:49:12,2014-05-07 09:49:15,walking/running
~~~
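
Both file types load directly with pandas. A minimal sketch (subject_01.csv is a placeholder name; the ActiLife export carries 10 meta-data lines before the column header, and the +1 assumes event end times are inclusive, as the sample suggests):

~~~
import pandas as pd

# GT3X accelerometer file: skip the 10 meta-data lines that ActiLife
# writes before the column header row.
acc = pd.read_csv("gt3x/subject_01.csv", skiprows=10)
print(acc.columns.tolist())  # ['Accelerometer X', 'Accelerometer Y', 'Accelerometer Z']

# ActivPal events file: the timestamp columns parse directly.
events = pd.read_csv("activpal/subject_01.csv", parse_dates=["StartTime", "EndTime"])
events["duration_s"] = (events["EndTime"] - events["StartTime"]).dt.total_seconds() + 1
print(events.groupby("Behavior")["duration_s"].sum())  # seconds spent in each behavior
~~~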

## Pre-Processing Data

First, you need to create pre-processed data from the source data. To do this, invoke the `pre_process_data.py` script as follows:

~~~
python pre_process_data.py --gt3x-dir <gt3x_data_dir> --activpal-dir <activpal_data_dir> --pre-processed-dir <output_dir>
~~~

Complete usage details of this script are as follows:

~~~
usage: pre_process_data.py [-h] --gt3x-dir GT3X_DIR --pre-processed-dir
                           PRE_PROCESSED_DIR [--activpal-dir ACTIVPAL_DIR]
                           [--window-size WINDOW_SIZE]
                           [--gt3x-frequency GT3X_FREQUENCY]
                           [--activpal-label-map ACTIVPAL_LABEL_MAP]
                           [--silent]

Argument parser for preprocessing the input data.

required arguments:
  --gt3x-dir GT3X_DIR   GT3X data directory
  --pre-processed-dir PRE_PROCESSED_DIR
                        Pre-processed data directory

optional arguments:
  -h, --help            show this help message and exit
  --activpal-dir ACTIVPAL_DIR
                        ActivPAL data directory
  --window-size WINDOW_SIZE
                        Window size in seconds on which the predictions are
                        to be made
  --gt3x-frequency GT3X_FREQUENCY
                        GT3X device frequency in Hz
  --activpal-label-map ACTIVPAL_LABEL_MAP
                        ActivPal label vocabulary
  --silent              Whether to hide info messages
~~~
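
For example, a run over raw data in ./data/gt3x and ./data/activpal (hypothetical paths) with a 3-second window at 30 Hz would look like:

~~~
python pre_process_data.py --gt3x-dir ./data/gt3x --activpal-dir ./data/activpal \
    --pre-processed-dir ./pre-processed --window-size 3 --gt3x-frequency 30
~~~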

## Generating Predictions

You can use the released pre-trained models to generate predictions using your own data. To do so, invoke the `make_predictions.py` script as follows:

~~~
python make_predictions.py --pre-processed-dir <pre-processed-dir> --predictions-dir <predictions-dir>
~~~

Complete usage details of this script are as follows:

~~~
usage: make_predictions.py [-h] --pre-processed-dir PRE_PROCESSED_DIR
                           [--predictions-dir PREDICTIONS_DIR]
                           [--batch-size BATCH_SIZE]
                           [--num-classes NUM_CLASSES]
                           [--window-size WINDOW_SIZE]
                           [--gt3x-frequency GT3X_FREQUENCY] [--no-label]
                           [--model-checkpoint-path MODEL_CHECKPOINT_PATH]
                           [--remove-gravity] [--silent]

Argument parser for generating model predictions.

required arguments:
  --pre-processed-dir PRE_PROCESSED_DIR
                        Pre-processed data directory

optional arguments:
  -h, --help            show this help message and exit
  --predictions-dir PREDICTIONS_DIR
                        Predictions output directory
  --batch-size BATCH_SIZE
                        Training batch size
  --num-classes NUM_CLASSES
                        Number of classes in the training dataset
  --window-size WINDOW_SIZE
                        Window size in seconds on which the predictions are
                        to be made
  --gt3x-frequency GT3X_FREQUENCY
                        GT3X device frequency in Hz
  --no-label            Whether to not output the label
  --model-checkpoint-path MODEL_CHECKPOINT_PATH
                        Path to the trained model checkpoint
  --remove-gravity      Whether to remove gravity from accelerometer data
  --silent              Whether to hide info messages
~~~
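
For example, to generate predictions with the gravity component removed from the accelerometer signal (directory names are placeholders):

~~~
python make_predictions.py --pre-processed-dir ./pre-processed \
    --predictions-dir ./predictions --remove-gravity
~~~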

## Training Your Own Model

To train your own model, invoke the `train_model.py` script as follows:

~~~
python train_model.py --pre-processed-dir <pre-processed-dir> --model-checkpoint-path <checkpoint-dir>
~~~

Complete usage details of this script are as follows:

~~~
usage: train_model.py [-h] --pre-processed-dir PRE_PROCESSED_DIR
                      [--learning-rate LEARNING_RATE]
                      [--num-epochs NUM_EPOCHS] [--batch-size BATCH_SIZE]
                      [--dropout-rate DROPOUT_RATE]
                      [--shuffle-buffer-size SHUFFLE_BUFFER_SIZE]
                      [--training-data-fraction TRAINING_DATA_FRACTION]
                      [--validation-data-fraction VALIDATION_DATA_FRACTION]
                      [--testing-data-fraction TESTING_DATA_FRACTION]
                      [--model-checkpoint-path MODEL_CHECKPOINT_PATH]
                      [--window-size WINDOW_SIZE]
                      [--gt3x-frequency GT3X_FREQUENCY]
                      [--num-classes NUM_CLASSES]
                      [--class-weights CLASS_WEIGHTS] [--remove-gravity]
                      [--silent]

Argument parser for training CNN model.

required arguments:
  --pre-processed-dir PRE_PROCESSED_DIR
                        Pre-processed data directory

optional arguments:
  -h, --help            show this help message and exit
  --learning-rate LEARNING_RATE
                        Learning rate for training the model
  --num-epochs NUM_EPOCHS
                        Number of epochs to train the model
  --batch-size BATCH_SIZE
                        Training batch size
  --dropout-rate DROPOUT_RATE
                        Dropout rate during training
  --shuffle-buffer-size SHUFFLE_BUFFER_SIZE
                        Training data shuffle buffer size in terms of number
                        of records
  --training-data-fraction TRAINING_DATA_FRACTION
                        Percentage of subjects to be used for training
  --validation-data-fraction VALIDATION_DATA_FRACTION
                        Percentage of subjects to be used for validation
  --testing-data-fraction TESTING_DATA_FRACTION
                        Percentage of subjects to be used for testing
  --model-checkpoint-path MODEL_CHECKPOINT_PATH
                        Path where the trained model will be saved
  --window-size WINDOW_SIZE
                        Window size in seconds on which the predictions are
                        to be made
  --gt3x-frequency GT3X_FREQUENCY
                        GT3X device frequency in Hz
  --num-classes NUM_CLASSES
                        Number of classes in the training dataset
  --class-weights CLASS_WEIGHTS
                        Class weights for loss aggregation
  --remove-gravity      Whether to remove gravity from accelerometer data
  --silent              Whether to hide info messages
~~~

Notice that this script relies on several training hyperparameters, such as the learning rate, batch size, and number of training epochs. The script comes with a set of default values for these parameters; however, you may need to tune them for your dataset to get the best performance.
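
For example, a run with explicit hyperparameters and a 60/20/20 subject split might look like this (all values are illustrative; the split arguments are assumed to take fractions in [0, 1]):

~~~
python train_model.py --pre-processed-dir ./pre-processed \
    --model-checkpoint-path ./my-model \
    --learning-rate 1e-4 --num-epochs 20 --batch-size 256 \
    --training-data-fraction 0.6 --validation-data-fraction 0.2 \
    --testing-data-fraction 0.2
~~~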

After training your own model, you can use it to generate predictions by passing the model checkpoint path to the `make_predictions.py` script as follows:

~~~
python make_predictions.py --pre-processed-dir <pre-processed-dir> --predictions-dir <predictions-dir> --model-checkpoint-path <checkpoint-dir>
~~~

JMPB-2021/commons.py (+31 -15)

@@ -1,3 +1,18 @@
+# Copyright 2020 Supun Nakandala. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
 import os
 import numpy as np
 import tensorflow as tf
@@ -22,11 +37,12 @@ def remove_gravity(acc, gt3x_frequency):
     return np.expand_dims(acc, axis=0)


-def data_generator(pre_processed_dir, subjects, gt3x_frequency, no_remove_gravity, include_time=False):
+def data_generator(pre_processed_dir, subjects, gt3x_frequency, gravity_removal, include_time=False):
     for i in subjects:
         temp = pd.read_pickle(os.path.join(pre_processed_dir, str(i)+".bin"))
+
         acc = temp[["Accelerometer"]].values.tolist()
-        if not no_remove_gravity:
+        if gravity_removal:
             acc = [remove_gravity(x, gt3x_frequency) for x in acc]
         timestamps = pd.to_datetime(temp.Time).dt.strftime('%Y-%m-%d %H:%M:%S').values.tolist()
         if 'Behavior' in temp.columns:
@@ -48,27 +64,27 @@ def cnn_model(x, num_classes, training, keep_prob=None):
     data_format = 'channels_last'
     x = tf.transpose(x, [0, 2, 3, 1])

-    conv1 = tf.compat.v1.layers.conv2d(inputs=x, filters=32, kernel_size=[5, 3], data_format=data_format, padding="valid", activation=tf.nn.relu)
-    pool1 = tf.compat.v1.layers.max_pooling2d(conv1, [2, 1], 2, padding='same', data_format=data_format)
+    conv1 = tf.layers.conv2d(inputs=x, filters=32, kernel_size=[5, 3], data_format=data_format, padding="valid", activation=tf.nn.relu)
+    pool1 = tf.layers.max_pooling2d(conv1, [2, 1], 2, padding='same', data_format=data_format)

-    conv2 = tf.compat.v1.layers.conv2d(inputs=pool1, filters=64, kernel_size=[5, 1], data_format=data_format, padding="same", activation=tf.nn.relu)
-    pool2 = tf.compat.v1.layers.max_pooling2d(conv2, [2, 1], 2, padding='same', data_format=data_format)
+    conv2 = tf.layers.conv2d(inputs=pool1, filters=64, kernel_size=[5, 1], data_format=data_format, padding="same", activation=tf.nn.relu)
+    pool2 = tf.layers.max_pooling2d(conv2, [2, 1], 2, padding='same', data_format=data_format)

-    conv3 = tf.compat.v1.layers.conv2d(inputs=pool2, filters=128, kernel_size=[5, 1], data_format=data_format, padding="same", activation=tf.nn.relu)
-    pool3 = tf.compat.v1.layers.max_pooling2d(conv3, [2, 1], 2, padding='same', data_format=data_format)
+    conv3 = tf.layers.conv2d(inputs=pool2, filters=128, kernel_size=[5, 1], data_format=data_format, padding="same", activation=tf.nn.relu)
+    pool3 = tf.layers.max_pooling2d(conv3, [2, 1], 2, padding='same', data_format=data_format)

-    conv4 = tf.compat.v1.layers.conv2d(inputs=pool3, filters=256, kernel_size=[5, 1], data_format=data_format, padding="same", activation=tf.nn.relu)
+    conv4 = tf.layers.conv2d(inputs=pool3, filters=256, kernel_size=[5, 1], data_format=data_format, padding="same", activation=tf.nn.relu)
     if keep_prob is not None:
-        conv4 = tf.compat.v1.layers.dropout(conv4, rate=keep_prob, training=training)
-    pool4 = tf.compat.v1.layers.max_pooling2d(conv4, [2, 1], 2, padding='same', data_format=data_format)
+        conv4 = tf.layers.dropout(conv4, rate=keep_prob, training=training)
+    pool4 = tf.layers.max_pooling2d(conv4, [2, 1], 2, padding='same', data_format=data_format)

-    conv5 = tf.compat.v1.layers.conv2d(inputs=pool4, filters=512, kernel_size=[5, 1], data_format=data_format, padding="same", activation=tf.nn.relu)
+    conv5 = tf.layers.conv2d(inputs=pool4, filters=512, kernel_size=[5, 1], data_format=data_format, padding="same", activation=tf.nn.relu)
     if keep_prob is not None:
-        conv5 = tf.compat.v1.layers.dropout(conv5, rate=keep_prob, training=training)
-    pool5 = tf.compat.v1.layers.max_pooling2d(conv5, [2, 1], 2, padding='same', data_format=data_format)
+        conv5 = tf.layers.dropout(conv5, rate=keep_prob, training=training)
+    pool5 = tf.layers.max_pooling2d(conv5, [2, 1], 2, padding='same', data_format=data_format)

     num_features = np.prod(pool5.get_shape().as_list()[1:])

-    logits = tf.compat.v1.layers.dense(inputs=tf.reshape(pool5, (-1, num_features)), units=num_classes)
+    logits = tf.layers.dense(inputs=tf.reshape(pool5, (-1, num_features)), units=num_classes)

     return logits

JMPB-2021/make_predictions.py (+31 -13)

@@ -1,3 +1,18 @@
+# Copyright 2020 Supun Nakandala. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
 import os
 import sys
 import pandas as pd
@@ -18,12 +33,13 @@
 required_arguments.add_argument('--pre-processed-dir', help='Pre-processed data directory', required=True)

 optional_arguments.add_argument('--predictions-dir', help='Predictions output directory', default='./predictions', required=False)
-optional_arguments.add_argument('--batch-size', help='Training batch size', default=256, required=False)
-optional_arguments.add_argument('--num-classes', help='Number of classes in the training dataset', default=3, required=False)
-optional_arguments.add_argument('--window-size', help='Window size in seconds on which the predictions are to be made', default=5, required=False)
-optional_arguments.add_argument('--gt3x-frequency', help='GT3X device frequency in Hz', default=30, required=False)
+optional_arguments.add_argument('--batch-size', help='Training batch size', default=256, type=int, required=False)
+optional_arguments.add_argument('--num-classes', help='Number of classes in the training dataset', default=3, type=int, required=False)
+optional_arguments.add_argument('--window-size', help='Window size in seconds on which the predictions are to be made', default=3, type=int, required=False)
+optional_arguments.add_argument('--gt3x-frequency', help='GT3X device frequency in Hz', default=30, type=int, required=False)
 optional_arguments.add_argument('--no-label', help='Whether to not output the label', default=False, required=False, action='store_true')
 optional_arguments.add_argument('--model-checkpoint-path', help='Path to the trained model checkpoint', default='./pre-trained-model', required=False)
+optional_arguments.add_argument('--remove-gravity', help='Whether to remove gravity from accelerometer data', default=False, required=False, action='store_true')
 optional_arguments.add_argument('--silent', help='Whether to hide info messages', default=False, required=False, action='store_true')
 parser._action_groups.append(optional_arguments)
 args = parser.parse_args()
@@ -34,32 +50,34 @@
 subject_ids = [fname.split('.')[0] for fname in os.listdir(args.pre_processed_dir) if fname.endswith('.bin')]

 in_size = args.gt3x_frequency * args.window_size
-iterator = tf.compat.v1.data.Iterator.from_structure((tf.float32, tf.int32, tf.string), ((None, 1, in_size, 3), (None, 1), (None)))
+iterator = tf.data.Iterator.from_structure((tf.float32, tf.int32, tf.string), ((None, 1, in_size, 3), (None, 1), (None)))
 iterator_init_ops = []

 for subject_id in subject_ids:
-    dataset = tf.compat.v1.data.Dataset.from_generator(lambda: data_generator(args.pre_processed_dir, [subject_id], include_time=True), output_types=(tf.float32, tf.int32, tf.string),
+    dataset = tf.data.Dataset.from_generator(lambda: data_generator(args.pre_processed_dir, [subject_id], args.gt3x_frequency, args.remove_gravity, include_time=True), output_types=(tf.float32, tf.int32, tf.string),
                                              output_shapes=((1, in_size, 3), (1,), ())).batch(args.batch_size)
     iterator_init_ops.append(iterator.make_initializer(dataset))

 x, y, t = iterator.get_next()
-p = tf.argmax(cnn_model(x, args.num_classes), axis=1)

-saver = tf.compat.v1.train.Saver()
-with tf.compat.v1.Session() as sess:
+training = tf.placeholder(tf.bool)
+keep_prob = tf.placeholder(tf.float32)
+p = tf.argmax(cnn_model(x, args.num_classes, training), axis=1)
+
+saver = tf.train.Saver()
+with tf.Session() as sess:
     saver.restore(sess, os.path.join(args.model_checkpoint_path, 'model'))

-    for subject_id, init_op in zip(subject_ids, iterator_init_ops):
+    for subject_id, init_op in zip(subject_ids, iterator_init_ops):
         if not args.silent:
-            print('Generating predictions for: {}'.format(subject_ids))
-
+            print('Generating predictions for: {}'.format(subject_id))
         sess.run(init_op)
         ts = []
         ys = []
         ps = []
         while True:
             try:
-                temp = [v.flatten().tolist() for v in sess.run([t, y, p])]
+                temp = [v.flatten().tolist() for v in sess.run([t, y, p], feed_dict={training: False})]
                 ts.extend(temp[0])
                 ys.extend(temp[1])
                 ps.extend(temp[2])