gwl2108
diff --git a/video_prediction/README.md b/video_prediction/README.md
diff --git a/video_prediction/download_data.sh b/video_prediction/download_data.sh
diff --git a/video_prediction/lstm_ops.py b/video_prediction/lstm_ops.py
diff --git a/video_prediction/prediction_input.py b/video_prediction/prediction_input.py
@@ -0,0 +1,99 @@
+# Video Prediction with Neural Advection
+
+*A TensorFlow implementation of the models described in
+[Finn et al. (2016)](http://arxiv.org/abs/1605.07157).*
+
+This video prediction model, which is optionally conditioned on actions,
+predicts future video by internally predicting how to transform the last
+image (which may itself have been predicted) into the next image. As a result,
+it can reuse appearance information from previous frames and can better
+generalize to objects not seen in the training set. Some example predictions on
+novel objects are shown below:
+
+![Animation](https://storage.googleapis.com/push_gens/novelgengifs9/16_70.gif)
+![Animation](https://storage.googleapis.com/push_gens/novelgengifs9/2_96.gif)
+![Animation](https://storage.googleapis.com/push_gens/novelgengifs9/1_38.gif)
+![Animation](https://storage.googleapis.com/push_gens/novelgengifs9/11_10.gif)
+![Animation](https://storage.googleapis.com/push_gens/novelgengifs9/3_34.gif)
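+
+The idea of predicting a transformation of the previous frame, rather than new
+pixels, can be illustrated with a toy sketch (not code from this repository).
+Here a hand-picked, normalized 3x3 kernel moves a one-pixel object one step to
+the right by copying it from the last frame, and splitting the kernel's weight
+between two shifts yields a two-position, blurred prediction. In the paper, CDNA
+predicts such kernels with a network, DNA predicts a separate kernel for every
+pixel, and STP predicts affine transformations instead; `apply_kernel` is a
+hypothetical helper.
+
+```python
+import numpy as np
+
+
+def apply_kernel(image, kernel):
+    """Hypothetical helper: each output pixel is a kernel-weighted sum of its
+    neighborhood in `image`, with zero padding outside the border."""
+    k = kernel.shape[0]
+    pad = k // 2
+    padded = np.pad(image, pad, mode="constant")
+    height, width = image.shape
+    out = np.zeros((height, width))
+    for dy in range(k):
+        for dx in range(k):
+            out += kernel[dy, dx] * padded[dy:dy + height, dx:dx + width]
+    return out
+
+
+previous = np.zeros((5, 5))
+previous[2, 2] = 1.0  # a one-pixel "object" in the previous frame
+
+shift_right = np.zeros((3, 3))
+shift_right[1, 0] = 1.0  # every pixel copies its left neighbor
+moved = apply_kernel(previous, shift_right)  # object now at row 2, column 3
+
+uncertain = np.zeros((3, 3))
+uncertain[1, 0] = uncertain[1, 2] = 0.5  # half left shift, half right shift
+blurred = apply_kernel(previous, uncertain)  # 0.5 at columns 1 and 3 of row 2
+print(moved[2], blurred[2])
+```
+
+Because the kernel weights sum to one, the transformed frame keeps the total
+intensity of the previous frame; it only redistributes existing appearance.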
+
+When the model is conditioned on actions, its predictions change with the
+passed-in action. Here we show the model's predictions in response to varying
+the magnitude of the passed-in actions, from small to large:
+
+![Animation](https://storage.googleapis.com/push_gens/webgifs/0xact_0.gif)
+![Animation](https://storage.googleapis.com/push_gens/05xact_0.gif)
+![Animation](https://storage.googleapis.com/push_gens/webgifs/1xact_0.gif)
+![Animation](https://storage.googleapis.com/push_gens/webgifs/15xact_0.gif)
+
+![Animation](https://storage.googleapis.com/push_gens/webgifs/0xact_17.gif)
+![Animation](https://storage.googleapis.com/push_gens/webgifs/05xact_17.gif)
+![Animation](https://storage.googleapis.com/push_gens/webgifs/1xact_17.gif)
+![Animation](https://storage.googleapis.com/push_gens/webgifs/15xact_17.gif)
+
+
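+Because the model is trained with an L2 objective, it represents uncertainty as
+blur: when several futures are plausible, the prediction that minimizes the
+expected squared error is their probability-weighted pixel-wise average.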
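+
+The toy example below (not code from this repository) makes this concrete: with
+two equally likely futures, predicting their pixel-wise average halves the
+expected L2 loss compared with committing to either sharp future, so L2 training
+favors the blurry average.
+
+```python
+import numpy as np
+
+# Two equally likely future frames, as hypothetical 1-D "images": an object
+# ends up either at pixel 2 or at pixel 5.
+future_a = np.zeros(8)
+future_a[2] = 1.0
+future_b = np.zeros(8)
+future_b[5] = 1.0
+futures = [future_a, future_b]
+
+
+def expected_l2(prediction):
+    """Expected sum of squared errors of `prediction` over the equal-odds futures."""
+    return np.mean([np.sum((prediction - f) ** 2) for f in futures])
+
+
+sharp = future_a  # commit to one outcome
+blurry = (future_a + future_b) / 2  # pixel-wise average of both outcomes
+print("sharp :", expected_l2(sharp))  # 1.0
+print("blurry:", expected_l2(blurry))  # 0.5
+```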
+
+## Requirements
+* TensorFlow (see tensorflow.org for installation instructions)
+* The spatial_transformer model in tensorflow/models, for the spatial transformer
+  predictor (STP).
+
+## Data
+The data used to train this model is located
+[here](https://sites.google.com/site/brainrobotdata/home/push-dataset).
+
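+To download the robot data, run the following command. By default, the script
+reads the list of files to fetch from `push_datafiles.txt` and saves them under
+the current directory; a different listing file and output directory can be
+passed as its first and second arguments.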
+```shell
+./download_data.sh
+```
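+
+Each line of the listing file is a path relative to the `brain-robotics-data`
+bucket, and the script mirrors that path under the output directory. A
+hypothetical Python restatement of its `download_file` mapping (the example path
+is made up, and unlike the script this sketch skips blank lines):
+
+```python
+import os
+
+BUCKET = "https://storage.googleapis.com/brain-robotics-data"
+
+
+def plan_downloads(listing_lines, output_dir="./"):
+    """Map each listed path to (source URL, local destination), like download_file."""
+    plan = []
+    for line in listing_lines:
+        path = line.strip()
+        if not path:
+            continue  # this sketch skips blank lines; the shell script does not
+        plan.append((BUCKET + "/" + path, os.path.join(output_dir, path)))
+    return plan
+
+
+for url, dest in plan_downloads(["push/push_train/example.tfrecord"], "./data"):
+    print(url, "->", dest)
+```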
+
+## Training the model
+
+To train the model, run `prediction_train.py`:
+```shell
+python prediction_train.py
+```
+
+Several flags control the model that is trained, as in the example below:
+```shell
+python prediction_train.py \
+  --data_dir=push/push_train \
+  --model=CDNA \
+  --output_dir=./checkpoints \
+  --event_log_dir=./summaries \
+  --num_iterations=100000 \
+  --pretrained_model=model \
+  --sequence_length=10 \
+  --context_frames=2 \
+  --use_state=1 \
+  --num_masks=10 \
+  --schedsamp_k=900.0 \
+  --train_val_split=0.95 \
+  --batch_size=32 \
+  --learning_rate=0.001
+```
+
+* `--data_dir`: path to the training set.
+* `--model`: the model type to use: DNA, CDNA, or STP.
+* `--output_dir`: where to save model checkpoints.
+* `--event_log_dir`: where to save training statistics.
+* `--num_iterations`: number of training iterations.
+* `--pretrained_model`: path to a model to initialize from; random initialization if empty.
+* `--sequence_length`: the total number of frames in a sequence.
+* `--context_frames`: the number of ground truth frames to pass in at the start.
+* `--use_state`: whether or not to condition on actions and the initial state.
+* `--num_masks`: the number of transformations and corresponding masks.
+* `--schedsamp_k`: the constant used for scheduled sampling, or -1 for no scheduled sampling.
+* `--train_val_split`: the fraction of data files used for training; the remaining files are used for validation.
+* `--batch_size`: the training batch size.
+* `--learning_rate`: the initial learning rate for the Adam optimizer.
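+
+The `--schedsamp_k` value sets how quickly training switches from feeding the
+model ground-truth frames to feeding it its own predictions (scheduled
+sampling). A minimal sketch, assuming the inverse sigmoid decay
+k / (k + exp(i / k)) of Bengio et al. (2015); check `prediction_train.py` for
+the exact schedule it uses. `ground_truth_fraction` is a hypothetical helper:
+
+```python
+import math
+
+
+def ground_truth_fraction(iteration, k=900.0):
+    """Hypothetical helper: probability of feeding a ground-truth frame.
+
+    Assumes the inverse sigmoid decay k / (k + exp(iteration / k)).
+    """
+    # Clamp the exponent so very long runs do not overflow a float.
+    return k / (k + math.exp(min(iteration / k, 700.0)))
+
+
+# The fraction starts near 1 and decays toward 0; a larger k delays the switch.
+for iteration in (0, 3000, 6000, 9000, 12000):
+    print(iteration, round(ground_truth_fraction(iteration), 3))
+```
+
+Under this assumption, with k = 900 almost every input is ground truth for the
+first few thousand iterations, and only a few percent are by iteration 9,000.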
+
+If the dynamic neural advection (DNA) model is being used, the `--num_masks`
+option should be set to one.
+
+The `--context_frames` option defines both the number of initial ground truth
+frames to pass in, as well as when to start penalizing the model's predictions.
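+
+A sketch of which predictions that rule penalizes, assuming the model outputs
+one frame per step and that prediction `t` estimates ground-truth frame `t + 1`
+(an assumption about the training script; `penalized_pairs` is a hypothetical
+helper):
+
+```python
+def penalized_pairs(sequence_length, context_frames):
+    """Return (prediction_index, target_frame) pairs that contribute to the loss.
+
+    Hypothetical helper: assumes prediction t estimates frame t + 1, so the
+    first context_frames ground-truth frames are only used as inputs.
+    """
+    return [(t - 1, t) for t in range(context_frames, sequence_length)]
+
+
+pairs = penalized_pairs(sequence_length=10, context_frames=2)
+print(len(pairs), pairs[0], pairs[-1])  # 8 (1, 2) (8, 9)
+```
+
+With `--sequence_length=10` and `--context_frames=2`, frames 0 and 1 are only
+inputs under this assumption, and the loss covers the eight frames 2 through 9.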
+
+The data directory `--data_dir` should contain tfrecord files with the format
+used in the released push dataset. See
+[here](https://sites.google.com/site/brainrobotdata/home/push-dataset) for
+details. If the `--use_state` option is not set, then the data only needs to
+contain image sequences, not states and actions.
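+
+For reference, the per-step feature keys that `prediction_input.py` parses, and
+the way it divides the files in `--data_dir` between training and validation,
+can be restated in plain Python. These are hypothetical helpers that mirror that
+file; there, each action and state is a vector of `STATE_DIM` = 5 floats and
+each image is a JPEG-encoded 640x512 frame.
+
+```python
+import math
+
+
+def feature_keys(sequence_length, use_state=True):
+    """tf.train.Example keys read for one sequence (mirrors prediction_input.py)."""
+    keys = []
+    for i in range(sequence_length):
+        keys.append('move/%d/image/encoded' % i)  # JPEG-encoded frame
+        if use_state:
+            keys.append('move/%d/commanded_pose/vec_pitch_yaw' % i)  # action
+            keys.append('move/%d/endeffector/vec_pitch_yaw' % i)  # state
+    return keys
+
+
+def split_files(filenames, train_val_split=0.95, training=True):
+    """The first floor(split * n) files train; the rest are for validation."""
+    index = int(math.floor(train_val_split * len(filenames)))
+    return filenames[:index] if training else filenames[index:]
+
+
+print(feature_keys(2, use_state=False))
+print(split_files(['a', 'b', 'c', 'd'], train_val_split=0.5, training=False))
+```
+
+The split is taken over whole files in the order they are listed, so every
+sequence in a given file lands on the same side of the split.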
+
+
+## Contact
+
+To ask questions or report issues please open an issue on the tensorflow/models
+[issues tracker](https://github.com/tensorflow/models/issues).
+Please assign issues to @cbfinn.
+
+## Credits
+
+This code was written by Chelsea Finn.
@@ -0,0 +1,55 @@
+#!/bin/bash
+# Copyright 2016 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+
+# Example:
+#
+#   download_data.sh datafiles.txt ./tmp
+#
+# will download all of the files listed in the file, datafiles.txt, into
+# a directory, "./tmp".
+#
+# Each line of the datafiles.txt file should contain the path from the
+# bucket root to a file.
+
+ARGC="$#"
+LISTING_FILE=push_datafiles.txt
+if [ "${ARGC}" -ge 1 ]; then
+  LISTING_FILE=$1
+fi
+OUTPUT_DIR="./"
+if [ "${ARGC}" -ge 2 ]; then
+  OUTPUT_DIR=$2
+fi
+
+echo "OUTPUT_DIR=$OUTPUT_DIR"
+
+mkdir -p "${OUTPUT_DIR}"
+
+function download_file {
+  FILE=$1
+  BUCKET="https://storage.googleapis.com/brain-robotics-data"
+  URL="${BUCKET}/${FILE}"
+  OUTPUT_FILE="${OUTPUT_DIR}/${FILE}"
+  DIRECTORY="$(dirname "${OUTPUT_FILE}")"
+  echo "DIRECTORY=${DIRECTORY}"
+  mkdir -p "${DIRECTORY}"
+  curl --output "${OUTPUT_FILE}" "${URL}"
+}
+
+while read -r filename; do
+  download_file "${filename}"
+done < "${LISTING_FILE}"
@@ -0,0 +1,110 @@
+# Copyright 2016 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Convolutional LSTM implementation."""
+
+import tensorflow as tf
+
+from tensorflow.contrib.slim import add_arg_scope
+from tensorflow.contrib.slim import layers
+
+
+def init_state(inputs,
+               state_shape,
+               state_initializer=tf.zeros_initializer,
+               dtype=tf.float32):
+  """Helper function to create an initial state given inputs.
+
+  Args:
+    inputs: input Tensor, at least 2D, the first dimension being batch_size
+    state_shape: the shape of the state.
+    state_initializer: Initializer(shape, dtype) for state Tensor.
+    dtype: Optional dtype, needed when inputs is None.
+  Returns:
+    A tensor representing the initial state.
+  """
+  if inputs is not None:
+    # Handle both the dynamic shape as well as the inferred shape.
+    inferred_batch_size = inputs.get_shape().with_rank_at_least(1)[0]
+    batch_size = tf.shape(inputs)[0]
+    dtype = inputs.dtype
+  else:
+    inferred_batch_size = 0
+    batch_size = 0
+
+  initial_state = state_initializer(
+      tf.pack([batch_size] + state_shape),
+      dtype=dtype)
+  initial_state.set_shape([inferred_batch_size] + state_shape)
+
+  return initial_state
+
+
+@add_arg_scope
+def basic_conv_lstm_cell(inputs,
+                         state,
+                         num_channels,
+                         filter_size=5,
+                         forget_bias=1.0,
+                         scope=None,
+                         reuse=None):
+  """Basic LSTM recurrent network cell, with 2D convolution connections.
+
+  We add forget_bias (default: 1) to the biases of the forget gate in order to
+  reduce the scale of forgetting in the beginning of the training.
+
+  It does not support cell clipping or a projection layer, and does not use
+  peephole connections: it is the basic baseline.
+
+  Args:
+    inputs: input Tensor, 4D, batch x height x width x channels.
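+    state: state Tensor, 4D, batch x height x width x (2 * num_channels), the
+      cell state and hidden state concatenated along the channel axis, or None
+      to start from a zero state.
+    num_channels: the number of output channels in the layer.
+    filter_size: the height and width of each square convolution filter.
+    forget_bias: a constant added to the forget gate pre-activations.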
+    scope: Optional scope for variable_scope.
+    reuse: whether or not the layer and the variables should be reused.
+
+  Returns:
+     a tuple of tensors representing output and the new state.
+  """
+  spatial_size = inputs.get_shape()[1:3]
+  if state is None:
+    state = init_state(inputs, list(spatial_size) + [2 * num_channels])
+  with tf.variable_scope(scope,
+                         'BasicConvLstmCell',
+                         [inputs, state],
+                         reuse=reuse):
+    inputs.get_shape().assert_has_rank(4)
+    state.get_shape().assert_has_rank(4)
+    c, h = tf.split(3, 2, state)
+    inputs_h = tf.concat(3, [inputs, h])
+    # Parameters of gates are concatenated into one conv for efficiency.
+    i_j_f_o = layers.conv2d(inputs_h,
+                            4 * num_channels, [filter_size, filter_size],
+                            stride=1,
+                            activation_fn=None,
+                            scope='Gates')
+
+    # i = input_gate, j = new_input, f = forget_gate, o = output_gate
+    i, j, f, o = tf.split(3, 4, i_j_f_o)
+
+    new_c = c * tf.sigmoid(f + forget_bias) + tf.sigmoid(i) * tf.tanh(j)
+    new_h = tf.tanh(new_c) * tf.sigmoid(o)
+
+    return new_h, tf.concat(3, [new_c, new_h])
+
+
+
@@ -0,0 +1,119 @@
+# Copyright 2016 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Code for building the input for the prediction model."""
+
+import os
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.platform import flags
+from tensorflow.python.platform import gfile
+
+
+FLAGS = flags.FLAGS
+
+# Original image dimensions
+ORIGINAL_WIDTH = 640
+ORIGINAL_HEIGHT = 512
+COLOR_CHAN = 3
+
+# Default image dimensions.
+IMG_WIDTH = 64
+IMG_HEIGHT = 64
+
+# Dimension of the state and action.
+STATE_DIM = 5
+
+
+def build_tfrecord_input(training=True):
+  """Create input tfrecord tensors.
+
+  Args:
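+    training: if True, read the training split of the files in FLAGS.data_dir;
+      otherwise read the validation split.
+  Returns:
+    A tuple of image, action, and state tensors. The images tensor is 5D,
+    batch x time x height x width x channels. The action and state tensors are
+    3D, batch x time x dimension, and are all zeros if FLAGS.use_state is false.
+  Raises:
+    RuntimeError: if no files found.
+    ValueError: if IMG_HEIGHT and IMG_WIDTH differ.
+  """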
+  filenames = gfile.Glob(os.path.join(FLAGS.data_dir, '*'))
+  if not filenames:
+    raise RuntimeError('No data files found.')
+  index = int(np.floor(FLAGS.train_val_split * len(filenames)))
+  if training:
+    filenames = filenames[:index]
+  else:
+    filenames = filenames[index:]
+  filename_queue = tf.train.string_input_producer(filenames, shuffle=True)
+  reader = tf.TFRecordReader()
+  _, serialized_example = reader.read(filename_queue)
+
+  image_seq, state_seq, action_seq = [], [], []
+
+  for i in range(FLAGS.sequence_length):
+    image_name = 'move/' + str(i) + '/image/encoded'
+    action_name = 'move/' + str(i) + '/commanded_pose/vec_pitch_yaw'
+    state_name = 'move/' + str(i) + '/endeffector/vec_pitch_yaw'
+    if FLAGS.use_state:
+      features = {image_name: tf.FixedLenFeature([1], tf.string),
+                  action_name: tf.FixedLenFeature([STATE_DIM], tf.float32),
+                  state_name: tf.FixedLenFeature([STATE_DIM], tf.float32)}
+    else:
+      features = {image_name: tf.FixedLenFeature([1], tf.string)}
+    features = tf.parse_single_example(serialized_example, features=features)
+
+    image_buffer = tf.reshape(features[image_name], shape=[])
+    image = tf.image.decode_jpeg(image_buffer, channels=COLOR_CHAN)
+    image.set_shape([ORIGINAL_HEIGHT, ORIGINAL_WIDTH, COLOR_CHAN])
+
+    if IMG_HEIGHT != IMG_WIDTH:
+      raise ValueError('Unequal height and width unsupported')
+
+    crop_size = min(ORIGINAL_HEIGHT, ORIGINAL_WIDTH)
+    image = tf.image.resize_image_with_crop_or_pad(image, crop_size, crop_size)
+    image = tf.reshape(image, [1, crop_size, crop_size, COLOR_CHAN])
+    image = tf.image.resize_bicubic(image, [IMG_HEIGHT, IMG_WIDTH])
+    image = tf.cast(image, tf.float32) / 255.0
+    image_seq.append(image)
+
+    if FLAGS.use_state:
+      state = tf.reshape(features[state_name], shape=[1, STATE_DIM])
+      state_seq.append(state)
+      action = tf.reshape(features[action_name], shape=[1, STATE_DIM])
+      action_seq.append(action)
+
+  image_seq = tf.concat(0, image_seq)
+
+  if FLAGS.use_state:
+    state_seq = tf.concat(0, state_seq)
+    action_seq = tf.concat(0, action_seq)
+    [image_batch, action_batch, state_batch] = tf.train.batch(
+        [image_seq, action_seq, state_seq],
+        FLAGS.batch_size,
+        num_threads=FLAGS.batch_size,
+        capacity=100 * FLAGS.batch_size)
+    return image_batch, action_batch, state_batch
+  else:
+    image_batch = tf.train.batch(
+        [image_seq],
+        FLAGS.batch_size,
+        num_threads=FLAGS.batch_size,
+        capacity=100 * FLAGS.batch_size)
+    zeros_batch = tf.zeros([FLAGS.batch_size, FLAGS.sequence_length, STATE_DIM])
+    return image_batch, zeros_batch, zeros_batch
+