Most of the colour images we see are RGB (3-channel) images. An RGB image is typically 8 bits per channel (the industry standard).
A grayscale image has 1 channel with 8 bits per pixel, so intensity values range from 0 to 255.
A binary image uses a single bit per pixel, so it has only two colours: black and white.
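A minimal NumPy sketch (the 4x4 size and array names are made up, purely for illustration) of how these three kinds of images are typically stored:

```python
import numpy as np

# 8-bit RGB image: Height x Width x 3 channels, values 0-255 per channel
rgb = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

# 8-bit grayscale image: a single channel, values 0-255
gray = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)

# Binary image: only two intensity levels, black (0) and white (1)
binary = (gray > 127).astype(np.uint8)

print(rgb.shape, rgb.dtype)    # (4, 4, 3) uint8
print(gray.shape, gray.dtype)  # (4, 4) uint8
print(np.unique(binary))       # values drawn only from {0, 1}
```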
A color is defined by one specific wavelength or a mixture of wavelengths. There is a spectrum of colors that consist of only a single wavelength, like a specific kind of Red, Blue, Yellow, Green, etc. Mixing these wavelengths at specific ratios gives us colors like White, Brown, and Pink. This is the physical view of colors. However, our eyes have receptors for 3 different colors/wavelengths. The combination of the neural signal strength on these three receptors is interpreted by our brain as a specific color. Simply speaking, our cone cells can receive Blue, Green, and Red light. If you see something that is pure Yellow, the Red and Green cones are stimulated. If you instead stimulate the cones with the right combination of a pure Red and a pure Green wavelength, your brain will also perceive this as Yellow. Your eye cannot tell the difference. Based on this assumption about how the eye works, the previous answers are correct. This is why we use RGB for additive colors and CMYK for subtractive colors. This is just the natural result of how we perceive physics (through our eyes). The alternative would be to describe each color (when we want to store it) as a combination of an infinite number of wavelengths. This is not feasible at all. Why use an infinite number of values when 3 numbers in RGB will do the trick for our eyes?
Additive vs. subtractive colour
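As a quick illustration of additive (light) vs. subtractive (ink) colour, here is a minimal sketch of the naive complement mapping from RGB to CMY. The function name rgb_to_cmy is made up, and real CMYK printing uses more involved colour profiles:

```python
import numpy as np

def rgb_to_cmy(rgb):
    """Naive RGB -> CMY conversion: each subtractive value is the complement
    of the corresponding additive one (real print workflows are more complex)."""
    rgb = np.asarray(rgb, dtype=np.float64) / 255.0
    return 1.0 - rgb

# Pure yellow light (R + G) needs no cyan and no magenta ink, only yellow.
print(rgb_to_cmy([255, 255, 0]))  # [0. 0. 1.]
```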
Images are usually represented as Height x Width x #Channels, where #Channels is 3 for RGB images and 1 for grayscale images. Sometimes you see Width x Height x #Channels instead, but either way the third dimension is the channels.
Note: in deep learning libraries like Caffe, the fundamental “activation” tensor is an N x C x H x W quantity. This rearrangement makes sense for deep learning, and in that layout the second dimension is the channels.
From a DNN perspective, channels are where similar features are gathered together.
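A minimal sketch (the 224x224 image size is just an assumption) of converting a single H x W x C image into the N x C x H x W layout mentioned above:

```python
import numpy as np

# Hypothetical RGB image in H x W x C layout (values are arbitrary).
hwc = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)

# Move channels first, then add a batch dimension: N x C x H x W,
# the activation layout used by Caffe (and PyTorch).
nchw = hwc.transpose(2, 0, 1)[np.newaxis, ...]

print(hwc.shape)   # (224, 224, 3)
print(nchw.shape)  # (1, 3, 224, 224)
```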
Facebook’s Yann LeCun helped establish how we use CNNs today: as multiple layers of neurons that process progressively more complex features at deeper layers of the network. Their properties differ from traditional feed-forward neural networks in some important ways that make them effective at image-based problems. For example, images are frequently best represented in multiple dimensions: height, width, and depth, where depth corresponds to the number of channels used for each pixel. If an image is encoded with a depth of three, for example, those channels would likely correspond to the red, green, and blue channels. In some workflows, using a traditional fully-connected network would lead to a drastic increase in the number of model parameters: a 670 x 1040 RGB image has 670 x 1040 x 3 = 2,090,400 input values, so even a single neuron in the first fully-connected layer would need over two million weights. This would almost certainly lead to something we in the data sciences refer to as “over-fitting”: building a model that is really good at classifying the images that were used to estimate the weight parameters, such as puppies, but really bad at classifying images it hasn't already been exposed to, such as a puppy wearing a hat. CNNs minimize the number of parameters by allowing different parts of the network to specialize in high-level features like a texture or a repeating pattern.
Sudden changes or discontinuities in an image are called edges; they mark significant transitions in intensity.
Types of edges
Generally edges are of three types:
Horizontal edges
Vertical Edges
Diagonal Edges
Why detect edges?
Most of the shape information in an image is contained in its edges. So we first detect the edges in an image using edge-detection filters, and then, by enhancing the regions of the image that contain edges, the sharpness of the image increases and the image becomes clearer.
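A minimal sketch of detecting horizontal and vertical edges by convolving with Sobel kernels, one common choice of edge-detection filter. The toy 8x8 image (dark on the left, bright on the right, i.e. a single vertical edge) is made up for illustration:

```python
import numpy as np
from scipy.signal import convolve2d

# Sobel kernels: one responds to horizontal edges, the other to vertical edges.
sobel_horizontal = np.array([[-1, -2, -1],
                             [ 0,  0,  0],
                             [ 1,  2,  1]], dtype=np.float64)
sobel_vertical = sobel_horizontal.T

# Toy grayscale image with one vertical edge down the middle.
img = np.zeros((8, 8), dtype=np.float64)
img[:, 4:] = 255.0

horiz_response = convolve2d(img, sobel_horizontal, mode='same', boundary='symm')
vert_response = convolve2d(img, sobel_vertical, mode='same', boundary='symm')

print(np.abs(horiz_response).max())  # ~0: no horizontal edges in this image
print(np.abs(vert_response).max())   # large: strong response at the vertical edge
```

A diagonal edge would produce a response in both filters at once.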
very good resource for convolution visualization
- Bigger kernels have more parameters (see the parameter-count sketch after this list), so we need more computationally expensive hardware.
- Since Nvidia started using 3x3 accelerators in their chips, 3x3 kernels have become an industry practice.
- Bigger kernels are slower but may give better accuracy at test time. It all comes down to the accuracy-vs-time trade-off: you don't want a cancer test to be fast at the expense of accuracy, but a self-driving car has to produce its output fast.
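A quick back-of-the-envelope sketch of why kernel size matters for parameter count (the 64-channel counts are assumptions, purely for illustration): two stacked 3x3 convolutions cover the same 5x5 receptive field as a single 5x5 convolution, but with noticeably fewer weights.

```python
def conv_params(k, in_ch, out_ch, bias=True):
    """Number of learnable parameters in a k x k convolution layer."""
    return k * k * in_ch * out_ch + (out_ch if bias else 0)

in_ch = out_ch = 64  # assumed channel counts

single_5x5 = conv_params(5, in_ch, out_ch)
two_3x3 = conv_params(3, in_ch, out_ch) + conv_params(3, out_ch, out_ch)

print(single_5x5)  # 102464 parameters
print(two_3x3)     # 73856 parameters: same 5x5 receptive field, roughly 28% fewer weights
```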