I have added the math behind SVMs in the svm.pdf file in this repo, please do give it a read. It helped me get a better understanding of SVMs.
Support Vector Machines (SVMs for short) are powerful supervised machine learning algorithms used for classification, regression and outlier detection. An SVM classifier builds a model that assigns new data points to one of the given categories, so it can be viewed as a non-probabilistic binary linear classifier.
The original SVM algorithm was developed by Vladimir N. Vapnik and Alexey Ya. Chervonenkis in 1963. At that time the algorithm was in its early stages, and the only possibility was to draw hyperplanes for a linear classifier. In 1992, Bernhard E. Boser, Isabelle M. Guyon and Vladimir N. Vapnik suggested a way to create non-linear classifiers by applying the kernel trick to maximum-margin hyperplanes. The current standard (soft margin) formulation was proposed by Corinna Cortes and Vapnik in 1993 and published in 1995.
SVMs can be used for linear classification. In addition, SVMs can efficiently perform non-linear classification using the kernel trick, which enables us to implicitly map the inputs into high-dimensional feature spaces.
Now, let us get familiar with some basic SVM terminology.
A hyperplane is a decision boundary that separates a given set of data points having different class labels. The SVM classifier separates data points using the hyperplane with the maximum amount of margin. This hyperplane is known as the maximum margin hyperplane, and the linear classifier it defines is known as the maximum margin classifier.
Support vectors are the sample data points that lie closest to the hyperplane. These points determine the position of the separating line or hyperplane, because the margin is calculated from them.
A margin is the separation gap between the two lines drawn through the closest data points of each class. It is calculated as the perpendicular distance from the hyperplane to the support vectors (the closest data points). In SVMs, we try to maximize this separation gap so that we get the maximum margin.
The following diagram illustrates these concepts visually.
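In addition to the diagram, the following minimal scikit-learn sketch (my own addition, not from the original material) makes these terms concrete: it fits a linear SVM on a toy blobs dataset and prints the hyperplane, the support vectors and the margin width. The dataset and the parameter C=1000 are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well separated blobs, so a linear hyperplane exists.
X, y = make_blobs(n_samples=40, centers=2, random_state=6)

# A large C approximates a hard margin classifier.
clf = SVC(kernel='linear', C=1000).fit(X, y)

# The separating hyperplane is w . x + b = 0.
w, b = clf.coef_[0], clf.intercept_[0]
print("hyperplane: %.3f*x1 + %.3f*x2 + %.3f = 0" % (w[0], w[1], b))

# Support vectors: the training points closest to the hyperplane.
print("support vectors:\n", clf.support_vectors_)

# Margin: the width of the gap between the two classes, equal to 2 / ||w||.
print("margin width:", 2 / np.linalg.norm(w))
```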
In SVMs, our main objective is to select a hyperplane with the maximum possible margin between the support vectors in the given dataset. SVM searches for the maximum margin hyperplane in the following two-step process:
- Generate hyperplanes which segregate the classes in the best possible way. There are many hyperplanes that might classify the data, but we should look for the hyperplane that represents the largest separation, or margin, between the two classes.
- Choose the hyperplane so that the distance from it to the support vectors on each side is maximized. If such a hyperplane exists, it is known as the maximum margin hyperplane, and the linear classifier it defines is known as a maximum margin classifier.
The following diagram illustrates the concept of maximum margin and maximum margin hyperplane in a clear manner.
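As a rough illustration of this selection criterion (my own addition, not from the original text), the sketch below compares the smallest point-to-boundary distance obtained by an ordinary linear classifier (logistic regression, used here just as a stand-in for "some separating hyperplane") with that of a linear SVM trained on the same toy data.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_blobs(n_samples=40, centers=2, random_state=6)

def smallest_margin(model, X):
    # Perpendicular distance from the closest training point to the hyperplane w.x + b = 0.
    w, b = model.coef_[0], model.intercept_[0]
    return np.min(np.abs(X @ w + b) / np.linalg.norm(w))

logreg = LogisticRegression().fit(X, y)        # some separating hyperplane
svm = SVC(kernel='linear', C=1000).fit(X, y)   # the maximum margin hyperplane

print("logistic regression margin:", smallest_margin(logreg, X))
print("linear SVM margin:         ", smallest_margin(svm, X))
# The SVM value is expected to be the larger one, since it explicitly maximizes this distance.
```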
Sometimes, the sample data points are so dispersed that it is not possible to separate them using a linear hyperplane.
In such a situation, SVMs use a kernel trick to transform the input space into a higher dimensional space, as shown in the diagram below. A mapping function transforms the 2-D input space into a 3-D feature space, in which the data points can easily be segregated by a linear separator.
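Alongside the diagram, here is a small numerical sketch of the same idea (my addition, assuming scikit-learn); the mapping phi(x1, x2) = (x1, x2, x1^2 + x2^2) is just one illustrative choice of lift into 3-D.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric circles: no straight line in 2-D separates them.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
print("linear SVM in 2-D:", SVC(kernel='linear').fit(X, y).score(X, y))

# Lift the points into 3-D with phi(x1, x2) = (x1, x2, x1^2 + x2^2).
# In that space the inner circle lies below the outer one, so a plane separates them.
X_3d = np.c_[X, (X ** 2).sum(axis=1)]
print("linear SVM in 3-D:", SVC(kernel='linear').fit(X_3d, y).score(X_3d, y))
```

The first score should be close to chance, while the second should be close to 1.0, since the lifted data is (almost) linearly separable.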
In practice, the SVM algorithm is implemented using a kernel, via a technique called the kernel trick. In simple words, a kernel is just a function that maps the data into a higher dimension where it becomes separable. A kernel transforms a low-dimensional input space into a higher dimensional space, and so converts problems that are not linearly separable into linearly separable ones by adding more dimensions. Thus, the kernel trick helps us build a more accurate classifier, which makes it useful for non-linear separation problems.
We can define a kernel function as $K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$, where $\phi$ is the mapping into the higher dimensional feature space.
In a linear kernel, the kernel function takes the form of a linear function as follows:

Linear kernel: $K(x_i, x_j) = x_i^T x_j$
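As a quick sanity check (my own addition, assuming scikit-learn), the sketch below builds the linear kernel matrix explicitly as the matrix of pairwise dot products and confirms that an SVM trained on it matches SVC(kernel='linear').

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=40, centers=2, random_state=6)

# Gram matrix of pairwise dot products: K[i, j] = x_i^T x_j.
K = X @ X.T

clf_precomputed = SVC(kernel='precomputed').fit(K, y)
clf_linear = SVC(kernel='linear').fit(X, y)

# Both models should learn (numerically) the same dual coefficients.
print(np.allclose(clf_precomputed.dual_coef_, clf_linear.dual_coef_))
```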
The linear kernel is used when the data is linearly separable, i.e. when it can be separated using a single line. It is one of the most commonly used kernels, and it is mostly used when there is a large number of features in a dataset. For example, the linear kernel is often used for text classification purposes.
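A typical text classification setup with a linear SVM might look like the following sketch (my addition; the 20 newsgroups dataset and the two categories are illustrative choices, and LinearSVC is used as the fast linear-kernel implementation).

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

categories = ['sci.space', 'rec.autos']
train = fetch_20newsgroups(subset='train', categories=categories)
test = fetch_20newsgroups(subset='test', categories=categories)

# TF-IDF features are high-dimensional and sparse, which suits the linear kernel well.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train.data, train.target)
print("test accuracy:", model.score(test.data, test.target))
```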
Training with a linear kernel is usually faster, because we only need to optimize the C regularization parameter. When training with other kernels, we also need to optimize the γ parameter. So, performing a grid search will usually take more time.
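The sketch below (again my own addition, with an illustrative synthetic dataset and parameter grids) shows why: the linear kernel searches over C only, while the RBF kernel searches over both C and gamma, so its grid is several times larger and the search takes correspondingly longer.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Linear kernel: one hyperparameter to search.
linear_grid = GridSearchCV(SVC(kernel='linear'),
                           {'C': [0.1, 1, 10, 100]}, cv=5).fit(X, y)

# RBF kernel: two hyperparameters, so this grid is four times larger.
rbf_grid = GridSearchCV(SVC(kernel='rbf'),
                        {'C': [0.1, 1, 10, 100],
                         'gamma': [0.001, 0.01, 0.1, 1]}, cv=5).fit(X, y)

print("best linear params:", linear_grid.best_params_)
print("best RBF params:   ", rbf_grid.best_params_)
```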
Linear kernel can be visualized with the following figure.