initial commit
anuragranj committed Mar 11, 2019
1 parent 986b38d commit afd4078
Showing 60 changed files with 17,564 additions and 8 deletions.
22 changes: 22 additions & 0 deletions .gitignore
@@ -0,0 +1,22 @@
*.pyc
*.pth
*.tar
*.sub
*.npy
*.jpg
*.png
*.zip
main.sh
checkpoints*
visualize/*
!visualize/*.py
log/*
config/*
models/spynet_models/*
dockers/*
test_script.sh
kitti_data/*
datasets/mnist/
datasets/svhn/
results/*
pretrained/*
133 changes: 125 additions & 8 deletions README.md
@@ -1,8 +1,125 @@
# Adversarial Collaboration
This is an official repository of
**Adversarial Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation**

[[Project Page]](http://research.nvidia.com/publication/2018-05_Adversarial-Collaboration-Joint)
[[Arxiv]](https://arxiv.org/abs/1805.09806)

### We will release the code soon.
# Competitive Collaboration
This is an official repository of
**Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation**. The project was formerly referred to as **Adversarial Collaboration**. We recently ported the entire code to `pytorch-1.0`, so if you discover bugs, please file an issue.

[[Project Page]](http://research.nvidia.com/publication/2018-05_Adversarial-Collaboration-Joint)
[[Arxiv]](https://arxiv.org/abs/1805.09806)

Skip to:
- [Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation](#jointcc)
- [Mixed Domain Learning using MNIST+SVHN](#mnist)
- [Downloads](#downloads)

### Prerequisites
Python 3 and PyTorch are required. Third-party libraries can be installed (in a `python3` virtualenv) using:

```bash
pip3 install -r requirements.txt
```
<a name="jointcc"></a>
## Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation

### Preparing training data

#### KITTI
For [KITTI](http://www.cvlibs.net/datasets/kitti/raw_data.php), first download the dataset using this [script](http://www.cvlibs.net/download.php?file=raw_data_downloader.zip) provided on the official website, and then run the following command.

```bash
python3 data/prepare_train_data.py /path/to/raw/kitti/dataset/ --dataset-format 'kitti' --dump-root /path/to/resulting/formatted/data/ --width 832 --height 256 --num-threads 1 --static-frames data/static_frames.txt --with-gt
```

For testing optical flow ground truths on KITTI, download the [KITTI2015](http://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=flow) dataset. You need to download 1) the `stereo 2015/flow 2015/scene flow 2015` data set (2 GB), 2) the `multi-view extension` (14 GB), and 3) the `calibration files` (1 MB). In addition, download the semantic labels from [here](https://keeper.mpdl.mpg.de/f/239c2dda94e54c449401/?dl=1). You should have the following directory structure:
```
kitti2015
| data_scene_flow
| data_scene_flow_calib
| data_scene_flow_multiview
| data_stereo_flow
| semantic_labels
```
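
Before running the evaluation scripts, it can help to confirm that this layout is in place. The snippet below is a minimal sketch, assuming only the five folder names listed above; `missing_kitti2015_dirs` is a hypothetical helper and not part of this repository.

```python
from pathlib import Path

# Sub-folders listed in the directory structure above.
EXPECTED_DIRS = [
    "data_scene_flow",
    "data_scene_flow_calib",
    "data_scene_flow_multiview",
    "data_stereo_flow",
    "semantic_labels",
]


def missing_kitti2015_dirs(root):
    """Return the expected sub-folders that are absent under `root`.

    Hypothetical helper for illustration; not part of this repository.
    """
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

Calling `missing_kitti2015_dirs("/path/to/kitti2015")` returns an empty list when every folder is present, and otherwise the names that still need to be downloaded or unpacked.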

#### Cityscapes

For [Cityscapes](https://www.cityscapes-dataset.com/), download the following packages: 1) `leftImg8bit_sequence_trainvaltest.zip`, 2) `camera_trainvaltest.zip`. You will probably need to contact the administrators to be able to get them.

```bash
python3 data/prepare_train_data.py /path/to/cityscapes/dataset/ --dataset-format 'cityscapes' --dump-root /path/to/resulting/formatted/data/ --width 832 --height 342 --num-threads 1
```

Note that for Cityscapes the height (`--height`) is set to 342 because we crop out the bottom part of the image, which contains the car logo; the resulting image has height 256.
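
The height arithmetic can be illustrated with a short NumPy sketch. It assumes the crop simply keeps the top 256 rows of the 342-row resized frame; the actual cropping logic lives in the data preparation code and may differ, and `crop_bottom` is a hypothetical helper.

```python
import numpy as np


def crop_bottom(image, out_height=256):
    """Keep the top `out_height` rows of an H x W x C image (hypothetical helper)."""
    return image[:out_height]


# Dummy frame with the resized Cityscapes shape (height 342, width 832, 3 channels).
frame = np.zeros((342, 832, 3), dtype=np.uint8)
cropped = crop_bottom(frame)
print(cropped.shape)  # (256, 832, 3)
```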

### Training an experiment

Once the data are formatted following the above instructions, you should be able to run a training experiment. Every experiment you run gets logged in `experiment_recorder.md`.

```bash
python3 train.py /path/to/formatted/data --dispnet DispResNet6 --posenet PoseNetB6 \
--masknet MaskNet6 --flownet Back2Future --pretrained-disp /path/to/pretrained/dispnet \
--pretrained-pose /path/to/pretrained/posenet --pretrained-flow /path/to/pretrained/flownet \
--pretrained-mask /path/to/pretrained/masknet -b4 -m0.1 -pf 0.5 -pc 1.0 -s0.1 -c0.3 \
--epoch-size 1000 --log-output -f 0 --nlevels 6 --lr 1e-4 -wssim 0.997 --with-flow-gt \
--with-depth-gt --epochs 100 --smoothness-type edgeaware --fix-masknet --fix-flownet \
--log-terminal --name EXPERIMENT_NAME
```


You can then start a `tensorboard` session in this folder by
```bash
tensorboard --logdir=checkpoints/
```
and visualize the training progress by opening [http://localhost:6006](http://localhost:6006) in your browser.

### Evaluation

Disparity evaluation
```bash
python3 test_disp.py --dispnet DispResNet6 --pretrained-dispnet /path/to/dispnet --pretrained-posenet /path/to/posenet --dataset-dir /path/to/KITTI_raw --dataset-list /path/to/test_files_list
```

The test file list is available in the kitti eval folder. For a fair comparison with the [original paper evaluation code](https://github.com/tinghuiz/SfMLearner/blob/master/kitti_eval/eval_depth.py), don't specify a posenet. If you do specify one, it will be used to resolve the scale factor ambiguity. The only ground truth used in that case is the vehicle speed, which is a far more realistic condition for measuring quality, but you will obviously get worse results.
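
For context, SfMLearner-style depth evaluation commonly resolves the scale ambiguity by rescaling each prediction with the ratio of median ground-truth depth to median predicted depth. The sketch below illustrates that convention on synthetic data. It is an assumption about the general technique rather than a copy of `test_disp.py`, and `median_scale` and `abs_rel` are hypothetical names.

```python
import numpy as np


def median_scale(pred, gt):
    """Rescale `pred` so its median matches the median of `gt` (valid pixels only)."""
    valid = gt > 0  # assumption: zero marks pixels without ground truth
    factor = np.median(gt[valid]) / np.median(pred[valid])
    return pred * factor


def abs_rel(pred, gt):
    """Mean absolute relative error over pixels with ground truth."""
    valid = gt > 0
    return np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid])


rng = np.random.default_rng(0)
gt = rng.uniform(1.0, 80.0, size=(4, 5))   # synthetic ground-truth depth in metres
pred = gt / 3.7                             # prediction correct up to an unknown scale

print(abs_rel(pred, gt))                    # large error before scaling
print(abs_rel(median_scale(pred, gt), gt))  # close to zero after scaling
```

Median scaling uses the ground-truth depth itself to fix the scale, which is why the posenet-based alternative described above gives a stricter, and typically worse, score.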

For pose evaluation, you need to download [KITTI Odometry](http://www.cvlibs.net/datasets/kitti/eval_odometry.php) dataset.
```bash
python3 test_pose.py pretrained/pose_model_best.pth.tar --img-width 832 --img-height 256 --dataset-dir /path/to/kitti/odometry/ --sequences 09 --posenet PoseNetB6
```

Optical Flow evaluation
```bash
python3 test_flow.py --pretrained-disp /path/to/dispnet --pretrained-pose /path/to/posenet --pretrained-mask /path/to/masknet --pretrained-flow /path/to/flownet --kitti-dir /path/to/kitti2015/dataset
```

Mask evaluation
```bash
python3 test_mask.py --pretrained-disp /path/to/dispnet --pretrained-pose /path/to/posenet --pretrained-mask /path/to/masknet --pretrained-flow /path/to/flownet --kitti-dir /path/to/kitti2015/dataset
```

<a name="mnist"></a>
## Mixed Domain Learning using MNIST+SVHN

#### Training
To learn classification using Competitive Collaboration with two agents, Alice and Bob, run:
```bash
python3 mnist.py path/to/download/mnist/svhn/datasets/ --name EXP_NAME --log-output --log-terminal --epoch-size 1000 --epochs 400 --wr 1000
```

#### Evaluation
To evaluate the performance of Alice, Bob and Moderator trained using CC, run:
```bash
python3 mnist_eval.py path/to/mnist/svhn/datasets --pretrained-alice pretrained/mnist_svhn/alice.pth.tar --pretrained-bob pretrained/mnist_svhn/bob.pth.tar --pretrained-mod pretrained/mnist_svhn/mod.pth.tar
```

<a name="downloads"></a>
## Downloads
#### Pretrained Models
- [DispNet, PoseNet, MaskNet and FlowNet](https://keeper.mpdl.mpg.de/f/72e946daa4e0481fb735/?dl=1) in joint unsupervised learning of depth, camera motion, optical flow and motion segmentation.
- [Alice, Bob and Moderator](https://keeper.mpdl.mpg.de/f/d0c7d4ebd0d74b84bf10/?dl=1) in Mixed Domain Classification

#### Evaluation Data
- [Semantic Labels for KITTI](https://keeper.mpdl.mpg.de/f/239c2dda94e54c449401/?dl=1)

## Acknowledgements
We thank Frederik Kunstner for verifying the convergence proofs. We are grateful to Clement Pinard for his [github repository](https://github.com/ClementPinard/SfmLearner-Pytorch), which we use as our initial code base. We thank Georgios Pavlakos for helping us with several revisions of the paper. We thank Joel Janai for preparing optical flow visualizations, and Clement Godard for his Make3d evaluation code.


## References
*Anurag Ranjan, Varun Jampani, Lukas Balles, Deqing Sun, Kihwan Kim, Jonas Wulff and Michael J. Black.* **Competitive Collaboration: Joint unsupervised learning of depth, camera motion, optical flow and motion segmentation.** CVPR 2019.
137 changes: 137 additions & 0 deletions custom_transforms.py
@@ -0,0 +1,137 @@
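from __future__ import division
import torch
import random
import numpy as np
# NOTE: imresize and imrotate were deprecated in SciPy 1.0 and removed in later
# releases, so this import requires an older SciPy with Pillow installed.
from scipy.misc import imresize, imrotate

'''Set of random transform routines that take a list of inputs as arguments,
in order to have random but coherent transformations.'''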


class Compose(object):
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, images, intrinsics):
        for t in self.transforms:
            images, intrinsics = t(images, intrinsics)
        return images, intrinsics


class Normalize(object):
    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def __call__(self, images, intrinsics):
        for tensor in images:
            for t, m, s in zip(tensor, self.mean, self.std):
                t.sub_(m).div_(s)
        return images, intrinsics


class NormalizeLocally(object):

    def __call__(self, images, intrinsics):
        image_tensor = torch.stack(images)
        assert(image_tensor.size(1)==3)   #3 channel image
        mean = image_tensor.transpose(0,1).contiguous().view(3, -1).mean(1)
        std = image_tensor.transpose(0,1).contiguous().view(3, -1).std(1)

        for tensor in images:
            for t, m, s in zip(tensor, mean, std):
                t.sub_(m).div_(s)
        return images, intrinsics


class ArrayToTensor(object):
    """Converts a list of numpy.ndarray (H x W x C) along with an intrinsics matrix to a list of torch.FloatTensor of shape (C x H x W) with an intrinsics tensor."""

    def __call__(self, images, intrinsics):
        tensors = []
        for im in images:
            # put it from HWC to CHW format
            im = np.transpose(im, (2, 0, 1))
            # handle numpy array
            tensors.append(torch.from_numpy(im).float()/255)
        return tensors, intrinsics


class RandomHorizontalFlip(object):
    """Randomly horizontally flips the given numpy array with a probability of 0.5"""

    def __call__(self, images, intrinsics):
        assert intrinsics is not None
        if random.random() < 0.5:
            output_intrinsics = np.copy(intrinsics)
            output_images = [np.copy(np.fliplr(im)) for im in images]
            w = output_images[0].shape[1]
            output_intrinsics[0,2] = w - output_intrinsics[0,2]
        else:
            output_images = images
            output_intrinsics = intrinsics
        return output_images, output_intrinsics

class RandomRotate(object):
    """Randomly rotates images up to 10 degrees and crop them to keep same size as before."""
    def __call__(self, images, intrinsics):
        if np.random.random() > 0.5:
            return images, intrinsics
        else:
            assert intrinsics is not None
            rot = np.random.uniform(0,10)
            rotated_images = [imrotate(im, rot) for im in images]

            return rotated_images, intrinsics




class RandomScaleCrop(object):
    """Randomly zooms images up to 10% and crops them to (h, w) if given, else to the input size."""
    def __init__(self, h=0, w=0):
        self.h = h
        self.w = w

    def __call__(self, images, intrinsics):
        assert intrinsics is not None
        output_intrinsics = np.copy(intrinsics)

        in_h, in_w, _ = images[0].shape
        x_scaling, y_scaling = np.random.uniform(1,1.1,2)
        scaled_h, scaled_w = int(in_h * y_scaling), int(in_w * x_scaling)

        # scale focal lengths and principal point along with the image
        output_intrinsics[0] *= x_scaling
        output_intrinsics[1] *= y_scaling
        scaled_images = [imresize(im, (scaled_h, scaled_w)) for im in images]

        if self.h and self.w:
            in_h, in_w = self.h, self.w

        offset_y = np.random.randint(scaled_h - in_h + 1)
        offset_x = np.random.randint(scaled_w - in_w + 1)
        cropped_images = [im[offset_y:offset_y + in_h, offset_x:offset_x + in_w] for im in scaled_images]

        # shift the principal point by the crop offset
        output_intrinsics[0,2] -= offset_x
        output_intrinsics[1,2] -= offset_y

        return cropped_images, output_intrinsics

class Scale(object):
    """Scales images to a particular size"""
    def __init__(self, h, w):
        self.h = h
        self.w = w

    def __call__(self, images, intrinsics):
        assert intrinsics is not None
        output_intrinsics = np.copy(intrinsics)

        in_h, in_w, _ = images[0].shape
        scaled_h, scaled_w = self.h , self.w

        output_intrinsics[0] *= (scaled_w / in_w)
        output_intrinsics[1] *= (scaled_h / in_h)
        scaled_images = [imresize(im, (scaled_h, scaled_w)) for im in images]

        return scaled_images, output_intrinsics