A repository where you learn the basics for detecting objects, keypoints and making segmentation on images

SerenaTetart/Object_Detection

Object_Detection

General info

In this repository you will learn the basics of detecting objects, predicting keypoints and segmenting images with models such as EfficientDet, MobileNet-SSD and U-Net.

Requirements

For the first project you only need a Google account with access to Colab and Drive. (I use Colab Pro for training.)

To train locally instead, you need to install the TensorFlow Object Detection API manually; you'll find a good tutorial at this link.

For inference you need the inference.py script shared in this repository, the saved_model directory and the label_map.pbtxt file. (You can download the model directory and the label map automatically by running the code in Colab.)

Project 1 - Object Detection using EfficientDet

What is EfficientDet ?

I will assume you have some knowledge of computer vision and CNNs; if not, you can skip this part.

EfficientDet is a family of deep learning models designed for object detection. EfficientDet-D7 achieved state-of-the-art results on the COCO dataset when it was published. The family is both scalable and efficient: it can recognize objects at vastly different scales while needing fewer parameters and less computation than comparable detectors.

link to EfficientDet paper

To understand EfficientDet we need to understand two key improvements made:

  1. Bi-directional Feature Pyramid Network (BiFPN)
  2. Compound Scaling

Feature Pyramid Network:

Recognizing objects at different scales is a fundamental challenge in computer vision.

Different authors have tried to solve it in different ways. Before the introduction of FPN there were three main categories of solutions: the featurized image pyramid, the single feature map and the pyramidal feature hierarchy.

But they all have some issues:

  • Featurized image pyramid: slow to train and infeasible in terms of memory, because features must be computed separately for every scale of the image.
  • Single feature map: used by Faster R-CNN. Predictions are made from a single, coarse feature map at the end of the network, so the finer detail of the earlier layers, which helps with small objects, is lost.
  • Pyramidal feature hierarchy: used by SSD. To avoid the low-level features of the first layers, it builds its pyramid starting from the high-level features at the end of the CNN and then adds several new layers. By doing so it misses the opportunity to reuse the earlier, higher-resolution layers, which are important for detecting small objects.

What a Feature Pyramid Network does is combine the low-resolution, semantically strong features of the later layers with the high-resolution, semantically weak features of the earlier layers via a top-down pathway and lateral connections. This results in multi-scale feature fusion.
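The top-down pathway can be illustrated with a toy example. This is only a minimal sketch in plain Python, assuming single-channel feature maps stored as nested lists; a real FPN also applies 1x1 convolutions on the lateral connections and a 3x3 convolution after each sum, which are left out here. The function names are made up for the illustration.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D list of floats."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def add_maps(a, b):
    """Element-wise sum of two equally sized 2-D lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fpn_top_down(backbone_maps):
    """Fuse backbone maps ordered from finest (first) to coarsest (last)."""
    fused = [backbone_maps[-1]]          # the coarsest level is used as-is
    for lateral in reversed(backbone_maps[:-1]):
        top_down = upsample2x(fused[0])  # bring the coarser map to this resolution
        fused.insert(0, add_maps(lateral, top_down))
    return fused


c3 = [[1.0] * 4 for _ in range(4)]   # 4x4: high resolution, weak semantics
c4 = [[10.0] * 2 for _ in range(2)]  # 2x2
c5 = [[100.0]]                       # 1x1: low resolution, strong semantics
p3, p4, p5 = fpn_top_down([c3, c4, c5])
print(p3[0][0], p4[0][0], p5[0][0])
```

Every cell of the finest output map now carries a contribution from all three levels, which is the point of the fusion.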

It is somewhat similar to the U-Net architecture when you think about it.

Bi-directional Feature Pyramid Network (BiFPN):

Feature network design and evolution of FPN

What EfficientDet, and BiFPN in particular, did was to:

  1. Add a bottom-up path aggregation network, because a conventional top-down FPN is limited by its one-way information flow. (This makes it bidirectional.)
  2. Remove all nodes that have only one input edge. The intuition is that a node with only one input edge performs no feature fusion, so it contributes less to the feature network.
  3. Add an extra edge from the original input to the output node when they are at the same level, in order to fuse more features without adding much cost.
  4. Treat each bidirectional path as a single layer and repeat this layer several times to enable more high-level feature fusion.
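To make the four points above concrete, here is a small sketch that only builds the connection graph of one BiFPN layer (which node receives which inputs), without computing any features. The node naming (`P4_in`, `P4_td`, `P4_out`) and the function name are assumptions made for this illustration, and the sketch assumes at least three pyramid levels.

```python
def bifpn_layer_edges(levels):
    """Return {node: [input nodes]} for one BiFPN layer over the given levels."""
    lowest, highest = levels[0], levels[-1]
    edges = {}

    # Top-down pathway. Only the middle levels get an intermediate node:
    # at the highest and lowest levels such a node would have a single
    # input edge, so it is removed (point 2).
    for lvl in reversed(levels[1:-1]):
        above = f"P{lvl + 1}_in" if lvl + 1 == highest else f"P{lvl + 1}_td"
        edges[f"P{lvl}_td"] = [f"P{lvl}_in", above]

    # Bottom-up pathway (point 1), with the extra edge from the original
    # input to the output node at the same level (point 3).
    for lvl in levels:
        if lvl == lowest:
            inputs = [f"P{lvl}_in", f"P{lvl + 1}_td"]
        elif lvl == highest:
            inputs = [f"P{lvl}_in", f"P{lvl - 1}_out"]
        else:
            inputs = [f"P{lvl}_in", f"P{lvl}_td", f"P{lvl - 1}_out"]
        edges[f"P{lvl}_out"] = inputs
    return edges


edges = bifpn_layer_edges([3, 4, 5, 6, 7])
for node, inputs in edges.items():
    print(node, "<-", inputs)
```

Point 4 then corresponds to feeding the `_out` nodes of one layer as the `_in` nodes of the next and repeating the same graph.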

Compound Scaling:

The second key improvement is compound scaling, which was introduced by EfficientNet (the backbone of EfficientDet).

Previous work mostly scaled up a baseline detector by employing bigger backbone networks (ResNets, AmoebaNet...), using larger input images or stacking more FPN layers. These methods are usually ineffective since they focus on only one or a few scaling dimensions.

The authors proposed to use a single compound coefficient to jointly scale up all three dimensions (depth, width and input resolution) while maintaining a balance between them.
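As a rough illustration of the idea, the sketch below applies the EfficientNet form of compound scaling: depth, width and resolution are multiplied by alpha^phi, beta^phi and gamma^phi for a single coefficient phi. The constants 1.2, 1.1 and 1.15 are the ones reported in the EfficientNet paper; the function name is made up, and EfficientDet itself uses additional rules for the BiFPN and the prediction networks that are not shown here.

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Multipliers for depth, width and resolution at compound coefficient phi."""
    return {
        "depth": alpha ** phi,        # more layers
        "width": beta ** phi,         # more channels
        "resolution": gamma ** phi,   # larger input image
        # FLOPs grow roughly with depth * width^2 * resolution^2
        "flops": (alpha * beta ** 2 * gamma ** 2) ** phi,
    }


for phi in range(4):
    scale = compound_scale(phi)
    print(phi, {name: round(value, 2) for name, value in scale.items()})
```

Because alpha * beta^2 * gamma^2 is close to 2, each increment of phi roughly doubles the computational cost while keeping the three dimensions balanced.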

Model Scaling

EfficientDet final architecture:

To conclude, by combining an EfficientNet backbone, a Bi-directional Feature Pyramid Network and convolutional prediction heads we get this:

EfficientDet model architecture

Training the model:

  1. First, open the file ObjectDetection.ipynb of this repository in Colab.
  2. Then you need images in JPG format and annotations in Pascal VOC format (XML files). (You can use FastAnnotations, a framework that I made 😄)
  3. Once you have them, simply put them in a zip file named data.zip. Don't bother making train/test or annotation folders; everything is handled automatically to make the process easier.
  4. Now upload data.zip to your Drive.
  5. Finally, just run the code. It will train an EfficientDet-D0 model on the data you uploaded to Drive.
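If you prefer to build data.zip with a script, something like the following could work. It is only a sketch: it assumes that every image `name.jpg` sits next to its annotation `name.xml` in one folder and that the notebook accepts a flat archive, which matches the description above but is not checked against the notebook itself. The function name is made up.

```python
import zipfile
from pathlib import Path


def build_data_zip(src_dir, zip_path="data.zip"):
    """Zip every .jpg that has a matching Pascal VOC .xml, flat at the archive root.

    Returns the names of the images that were skipped because they have no annotation.
    """
    src = Path(src_dir)
    skipped = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for image in sorted(src.glob("*.jpg")):
            annotation = image.with_suffix(".xml")
            if not annotation.exists():
                skipped.append(image.name)  # image without annotation
                continue
            archive.write(image, arcname=image.name)
            archive.write(annotation, arcname=annotation.name)
    return skipped
```

Calling `build_data_zip("my_dataset")` (a hypothetical folder name) writes data.zip in the current directory. Note that the `*.jpg` pattern is case-sensitive on Linux, so files ending in `.JPG` would be ignored.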

If you want to switch to another model, say EfficientDet-D5 or MobileNet-SSD, you need to download it from the TensorFlow Object Detection Model Zoo. For instance, to use EfficientDet-D5, change:

!wget http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d0_coco17_tpu-32.tar.gz
!tar -xzvf efficientdet_d0_coco17_tpu-32.tar.gz

to

!wget http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d5_coco17_tpu-32.tar.gz
!tar -xzvf efficientdet_d5_coco17_tpu-32.tar.gz

and

ModelName = 'efficientdet_d0_coco17_tpu-32'

to

ModelName = 'efficientdet_d5_coco17_tpu-32'
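Since the download lines and ModelName always change together, they can be derived from a single value. The sketch below only generalises the pattern of the two EfficientDet examples above; whether another model follows the same URL pattern is an assumption, so check its entry in the Model Zoo. `BASE_URL` and `download_commands` are names made up for this illustration.

```python
# Prefix shared by the two download URLs shown above.
BASE_URL = "http://download.tensorflow.org/models/object_detection/tf2/20200711/"


def download_commands(model_name):
    """Return the two notebook lines that fetch and unpack a pretrained model."""
    archive = model_name + ".tar.gz"
    return ["!wget " + BASE_URL + archive, "!tar -xzvf " + archive]


ModelName = "efficientdet_d5_coco17_tpu-32"
for line in download_commands(ModelName):
    print(line)
```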

After training, this is the result that I get on the Stanford Dogs dataset (with 9 classes only):

And these are some tests made with the newly trained model:

Testing the model locally:

To use the model locally there are a few steps:

I - Download the trained model by running this code in the notebook in Colab:

II - Copy and paste the files wherever you want:

This is where your files are downloaded.

III - Modify the PATH variables, run inference.py or your own code, and enjoy.
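The repository's inference.py is not reproduced here. As a small illustration of one piece that custom inference code needs, the sketch below reads a label_map.pbtxt into a dictionary from class id to class name using only the standard library. It assumes the usual `item { id: ... name: '...' }` layout; the class names in the sample are hypothetical.

```python
import re


def parse_label_map(text):
    """Map class ids to class names from the contents of a label_map.pbtxt file."""
    labels = {}
    for block in re.findall(r"item\s*\{(.*?)\}", text, flags=re.DOTALL):
        id_match = re.search(r"\bid:\s*(\d+)", block)
        # \b keeps "display_name:" from being matched as "name:"
        name_match = re.search(r"\bname:\s*['\"]([^'\"]+)['\"]", block)
        if id_match and name_match:
            labels[int(id_match.group(1))] = name_match.group(1)
    return labels


sample = """
item {
  id: 1
  name: 'beagle'
}
item {
  id: 2
  name: 'pug'
}
"""
labels = parse_label_map(sample)
print(labels)
```

With such a dictionary, the numeric class ids returned by the detector can be turned into readable names when drawing the boxes.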

Project 2 - Water Segmentation using U-Net

What is segmentation ?

In this project we will try to identify water in images, using a dataset from Kaggle and a technique called segmentation.

Segmentation is done here with an encoder-decoder network, which is built like an autoencoder: it encodes the image by compressing it into a lower-dimensional representation (the bottleneck layer, or code) and then decodes that representation to construct the target mask. Unlike a classic autoencoder, which is unsupervised and reconstructs its own input, a segmentation network is trained in a supervised way against annotated masks.

A mask is an image made of numbers or colors corresponding to the different classes present in the image.
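A tiny example may help to picture this. The sketch below stores a mask as a grid of class ids and converts it to colors for display; the two classes and the palette are assumptions chosen for the water example, not values taken from the dataset.

```python
# Hypothetical palette: class 0 = background (black), class 1 = water (blue).
PALETTE = {0: (0, 0, 0), 1: (0, 0, 255)}


def colorize(mask, palette=PALETTE):
    """Turn a grid of class ids into a grid of RGB tuples."""
    return [[palette[class_id] for class_id in row] for row in mask]


def class_fraction(mask, class_id):
    """Fraction of pixels in the mask that belong to class_id."""
    total = sum(len(row) for row in mask)
    hits = sum(row.count(class_id) for row in mask)
    return hits / total


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
]
print(colorize(mask)[2][0])
print(class_fraction(mask, 1))
```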

Example of a mask

Here we are going to use U-Net as our network, a model widely used in medical image segmentation, for example to detect diseases or to locate certain parts of the body before surgery.

Below you can see the U-Net architecture:

U-Net architecture


(From 2015 U-Net paper)

In addition, if you want to improve the performance of your model, you can use a pretrained network such as ResNet50 or VGG19 as the encoder of your U-Net and attach the decoder after it.

Training the model:

  1. First, open the file Segmentation.ipynb of this repository in Colab.
  2. Then you need images and masks (JPG, PNG or another image format).
  3. Once you have them, simply upload them to your Drive.
  4. Then modify the DATASET_PATH variable and the paths used where the dataset is built:

for img in os.listdir(DATASET_PATH+'/Annotations/ADE20K'):
  for img2 in os.listdir(DATASET_PATH+'/JPEGImages/ADE20K'):

  5. Finally, run the code and enjoy.
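When adapting these paths to your own dataset, it can help to check first that every image has a mask. The sketch below is one possible way to do it, not the notebook's code: it assumes the same two sub-folders as above and that an image and its mask share the same file name apart from the extension. The function name is made up.

```python
import os


def pair_images_and_masks(dataset_path):
    """Return (image_path, mask_path) pairs for files that share the same stem."""
    image_dir = os.path.join(dataset_path, "JPEGImages", "ADE20K")
    mask_dir = os.path.join(dataset_path, "Annotations", "ADE20K")

    # Index the masks by file name without extension for a direct lookup,
    # instead of comparing every image with every mask.
    masks = {os.path.splitext(name)[0]: name for name in os.listdir(mask_dir)}

    pairs = []
    for name in sorted(os.listdir(image_dir)):
        stem = os.path.splitext(name)[0]
        if stem in masks:
            pairs.append((os.path.join(image_dir, name),
                          os.path.join(mask_dir, masks[stem])))
    return pairs
```

Comparing `len(pair_images_and_masks(DATASET_PATH))` with the number of images quickly shows whether some masks are missing.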

And these are some results I got after running the algorithm:

The model has some difficulties around the edges of the water, but it gets the idea. The reason is that I didn't use a pre-trained model and trained everything from scratch for only a few epochs.

Testing the model locally:

Just as for the object detection project above, we need to repeat the same steps:

I - Download the trained model by running this code in the notebook in Colab:

II - Copy and paste the files wherever you want:

This is where your files are downloaded.

III - Modify the PATH variables, run inference_segmentation.py or your own code, and enjoy.

Project 3 - Face key-point recognition using CNN

How does it work ?

Key points can be used for a variety of tasks:

  • Apply a filter on a face
  • Detect emotions on a face
  • Identify someone based on their traits
  • Identify what action a person is doing... (used for sports, or to spot thieves in a supermarket)

To achieve that we need a neural network composed of a CNN followed by fully connected layers that predict the (x, y) coordinates of each key point. For instance, with 5 key points we need a final linear layer with 10 outputs (x1, y1, x2, y2, ..., x5, y5).
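The layout of that output vector can be sketched as follows. The example assumes the network predicts coordinates normalised to the range 0 to 1, which is a common choice but not something stated for this project; the numbers and function names are made up for the illustration.

```python
def to_keypoints(flat_output):
    """Group a flat [x1, y1, x2, y2, ...] vector into (x, y) pairs."""
    if len(flat_output) % 2 != 0:
        raise ValueError("expected an even number of values (x, y pairs)")
    return list(zip(flat_output[0::2], flat_output[1::2]))


def to_pixels(keypoints, width, height):
    """Scale normalised (x, y) pairs to pixel coordinates."""
    return [(x * width, y * height) for x, y in keypoints]


# Hypothetical output of the last linear layer for 5 key points (10 values).
prediction = [0.30, 0.40, 0.70, 0.40, 0.50, 0.55, 0.35, 0.75, 0.65, 0.75]
points = to_keypoints(prediction)
print(len(points), points[0])
```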

Neural network architecture (CNN + fully connected layers)

Training the model:

  • First, open the file FacekeyPoint.ipynb of this repository in Colab.
  • Then you need images and annotations (JPG images; you can use FastAnnotations to annotate them).
  • Once you have them, simply upload them to your Drive.
  • Finally, modify the paths used where the dataset is built and run the code.

Testing the model locally:
