Most of the colour images we see are RGB (3-channel) images. An RGB image is typically 8 bits per channel (the industry standard).
A grayscale image has 1 channel with 8 bits per pixel, so intensity values range from 0 to 255.
A binary image uses a single bit per pixel, so it has only two colours: black and white.
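A minimal NumPy sketch (the 4x4 size and array names are made up, purely for illustration) of how these three kinds of images are typically stored:

```python
import numpy as np

# 8-bit RGB image: Height x Width x 3 channels, values 0-255 per channel
rgb = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

# 8-bit grayscale image: a single channel, values 0-255
gray = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)

# Binary image: only two intensity levels, black (0) and white (1)
binary = (gray > 127).astype(np.uint8)

print(rgb.shape, rgb.dtype)    # (4, 4, 3) uint8
print(gray.shape, gray.dtype)  # (4, 4) uint8
print(np.unique(binary))       # values drawn only from {0, 1}
```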
A color is defined by one specific wavelength or a mixture of wavelengths. There is a spectrum of colors that consist of only a single wavelength, like a specific kind of Red, Blue, Yellow, Green, etc. Mixing these wavelengths at specific ratios gives us colors like White, Brown, and Pink. This is the physical view of colors. However, our eyes have receptors for 3 different colors/wavelengths. The combination of the neural signal strength on these three receptors is interpreted by our brain as a specific color. Simply speaking, our cone cells can receive Blue, Green, and Red light. If you see something that is pure Yellow, the Red and Green cones are stimulated. If you instead stimulate the cones with the right combination of a pure Red and a pure Green wavelength, your brain will also perceive this as Yellow. Your eye cannot tell the difference. Based on this assumption about how the eye works, the previous answers are correct. This is why we use RGB for additive colors and CMYK for subtractive colors. This is just the natural result of how we perceive physics (through our eyes). The alternative would be to describe each color (when we want to store it) as a combination of an infinite number of wavelengths. This is not feasible at all. Why use an infinite number of values when 3 numbers in RGB will do the trick for our eyes?
Additive vs. subtractive colour
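As a quick illustration of additive (light) vs. subtractive (ink) colour, here is a minimal sketch of the naive complement mapping from RGB to CMY. The function name rgb_to_cmy is made up, and real CMYK printing uses more involved colour profiles:

```python
import numpy as np

def rgb_to_cmy(rgb):
    """Naive RGB -> CMY conversion: each subtractive value is the complement
    of the corresponding additive one (real print workflows are more complex)."""
    rgb = np.asarray(rgb, dtype=np.float64) / 255.0
    return 1.0 - rgb

# Pure yellow light (R + G) needs no cyan and no magenta ink, only yellow.
print(rgb_to_cmy([255, 255, 0]))  # [0. 0. 1.]
```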
Images are usually represented as Height x Width x #Channels, where #Channels is 3 for RGB images and 1 for grayscale images. Sometimes you see Width x Height x #Channels instead, but either way the third dimension is the channels.
Note: in deep learning libraries like Caffe, the fundamental “activation” tensor is an N x C x H x W quantity. This rearrangement makes sense for deep learning, and in that layout the second dimension is the channels.
From a DNN perspective, channels are where similar features are gathered together.
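A minimal sketch (the 224x224 image size is just an assumption) of converting a single H x W x C image into the N x C x H x W layout mentioned above:

```python
import numpy as np

# Hypothetical RGB image in H x W x C layout (values are arbitrary).
hwc = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)

# Move channels first, then add a batch dimension: N x C x H x W,
# the activation layout used by Caffe (and PyTorch).
nchw = hwc.transpose(2, 0, 1)[np.newaxis, ...]

print(hwc.shape)   # (224, 224, 3)
print(nchw.shape)  # (1, 3, 224, 224)
```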
Facebook’s Yann LeCun helped establish how we use CNNs today: as multiple layers of neurons that process progressively more complex features at deeper layers of the network. Their properties differ from traditional feed-forward neural networks in some important ways that make them effective at image-based problems. For example, images are frequently best represented in multiple dimensions: height, width, and depth, where depth corresponds to the number of channels used for each pixel. If an image is encoded with a depth of three, for example, those channels would likely correspond to the red, green, and blue channels. In some workflows, using a traditional fully-connected network would lead to a drastic increase in the number of model parameters: a 670 x 1040 RGB image has 670 x 1040 x 3 = 2,090,400 input values, so even a single neuron in the first fully-connected layer would need over two million weights. This would almost certainly lead to something we in the data sciences refer to as “over-fitting”: building a model that is really good at classifying the images that were used to estimate the weight parameters, such as puppies, but really bad at classifying images it hasn't already been exposed to, such as a puppy wearing a hat. CNNs minimize the number of parameters by allowing different parts of the network to specialize in high-level features like a texture or a repeating pattern.
Sudden changes or discontinuities in an image are called edges; they mark significant transitions in intensity.
Types of edges
Generally edges are of three types:
Horizontal edges
Vertical Edges
Diagonal Edges
Why detect edges?
Most of the shape information in an image is contained in its edges. So we first detect the edges in an image using edge-detection filters, and then, by enhancing the regions of the image that contain edges, the sharpness of the image increases and the image becomes clearer.
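A minimal sketch of detecting horizontal and vertical edges by convolving with Sobel kernels, one common choice of edge-detection filter. The toy 8x8 image (dark on the left, bright on the right, i.e. a single vertical edge) is made up for illustration:

```python
import numpy as np
from scipy.signal import convolve2d

# Sobel kernels: one responds to horizontal edges, the other to vertical edges.
sobel_horizontal = np.array([[-1, -2, -1],
                             [ 0,  0,  0],
                             [ 1,  2,  1]], dtype=np.float64)
sobel_vertical = sobel_horizontal.T

# Toy grayscale image with one vertical edge down the middle.
img = np.zeros((8, 8), dtype=np.float64)
img[:, 4:] = 255.0

horiz_response = convolve2d(img, sobel_horizontal, mode='same', boundary='symm')
vert_response = convolve2d(img, sobel_vertical, mode='same', boundary='symm')

print(np.abs(horiz_response).max())  # ~0: no horizontal edges in this image
print(np.abs(vert_response).max())   # large: strong response at the vertical edge
```

A diagonal edge would produce a response in both filters at once.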
very good resource for convolution visualization
- Bigger kernels have more parameters (see the parameter-count sketch after this list), so we need more computationally expensive hardware.
- Since Nvidia started using 3x3 accelerators in their chips, 3x3 kernels have become an industry practice.
- Bigger kernels are slower but may give better accuracy at test time. It all comes down to the accuracy-vs-time trade-off: you don't want a cancer test to be fast at the expense of accuracy, but a self-driving car has to produce its output fast.
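A quick back-of-the-envelope sketch of why kernel size matters for parameter count (the 64-channel counts are assumptions, purely for illustration): two stacked 3x3 convolutions cover the same 5x5 receptive field as a single 5x5 convolution, but with noticeably fewer weights.

```python
def conv_params(k, in_ch, out_ch, bias=True):
    """Number of learnable parameters in a k x k convolution layer."""
    return k * k * in_ch * out_ch + (out_ch if bias else 0)

in_ch = out_ch = 64  # assumed channel counts

single_5x5 = conv_params(5, in_ch, out_ch)
two_3x3 = conv_params(3, in_ch, out_ch) + conv_params(3, out_ch, out_ch)

print(single_5x5)  # 102464 parameters
print(two_3x3)     # 73856 parameters: same 5x5 receptive field, roughly 28% fewer weights
```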