Support Vector Machines

Classifiers

SVMs Clearly Explained (StatQuest)

Classifiers divide a sample space using a line or plane (more generally, a hyperplane), classifying data that appears on one side as class A, and data that appears on the other as class B.

A classifier is found by choosing the line or plane with the largest margin, i.e. the largest distance to the closest data points of each class. The choice of which data points define that margin is important:

A soft margin classifier (from StatQuest)

A support vector classifier uses a soft margin: observations within the margin (or even on the wrong side of the classifier) are allowed to be misclassified. In the above example, the red point on the green side is within the soft margin, so its misclassification is allowed. A soft margin is created using slack variables.
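As a sketch of how the slack variables \(\xi_i\) enter the optimization (using labels \(t_i \in \{-1, +1\}\), as in the prediction equation below), the soft-margin problem can be written as:

\[ \min_{w,\, b,\, \xi} \;\; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i \quad \text{subject to} \quad t_i(w^T x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0 \]

Each \(\xi_i\) measures how far observation \(i\) is allowed to sit inside the margin (or past the classifier), and the constant \(C\) controls how heavily those violations are penalized.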

Example of a Support Vector Classifier

In the above example, the dotted line represents the support vector classifier, separating the dots from the plusses.
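A minimal sketch of fitting a soft-margin support vector classifier with scikit-learn; the toy data and variable names here are made up for illustration:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two slightly overlapping clusters (hypothetical example).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-2.0, size=(20, 2)),
               rng.normal(loc=2.0, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)

# kernel="linear" gives a support vector classifier.  C controls the
# softness of the margin: a small C tolerates more observations inside
# the margin (softer margin), a large C tolerates fewer.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# Only the support vectors determine the decision boundary.
print(clf.support_vectors_)
print(clf.predict([[0.0, 0.0]]))
```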

Making a Prediction

To make a prediction for a new data point \(z\), we use the equation:

\[ w^{*T}z + b^* = \Bigg(\sum_{i=1}^{n} \lambda_i t_i x_i\Bigg)^T z+b^* \]

We are computing inner products between the new data point and the training points; only the support vectors contribute to the sum, since \(\lambda_i = 0\) for every other point.
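A minimal sketch of this computation, assuming the multipliers \(\lambda_i\) (lam), labels \(t_i\) (t), training points (X), and bias \(b^*\) (b) are already available from training; the function name is hypothetical:

```python
import numpy as np

def predict(z, lam, t, X, b):
    """Dual-form prediction: sign((sum_i lam_i * t_i * x_i)^T z + b)."""
    # X @ z gives the inner product of z with every training point x_i;
    # terms with lam_i = 0 (non-support vectors) drop out of the sum.
    return np.sign(np.sum(lam * t * (X @ z)) + b)
```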

Support Vector Machine

A Support Vector Machine tries to classify data that is not linearly separable.

An example of data that is not linearly separable

We do this by transforming the data with a function \(\phi(x)\) that maps it into a space where it becomes linearly separable.

For a new data point \(z\), we compute \(\phi(z)\), and for every training point \(x_i\), we compute \(\phi(x_i)\). We can then plug these into the above equation:

\[ w^{*T}\phi(z) + b^* = \Bigg(\sum_{i=1}^{n} \lambda_i t_i \phi(x_i)\Bigg)^T \phi(z)+b^* \]
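Notice that \(\phi\) only ever appears inside the inner product \(\phi(x_i)^T\phi(z)\). We therefore never need to compute \(\phi\) explicitly; it is enough to evaluate the kernel \(k(x_i, z) = \phi(x_i)^T\phi(z)\) directly (the kernel trick), so the prediction becomes:

\[ \sum_{i=1}^{n} \lambda_i t_i\, k(x_i, z) + b^* \]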

Values for the Kernel Function

Polynomial

We let \(\phi_d(x) = x^d\), which can make a class that is surrounded by another class linearly separable:

The transformation for the polynomial \(\phi_2(x) = x^2\)
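A minimal sketch of this transformation in code, using made-up 1-D points where one class surrounds the other:

```python
import numpy as np

# 1-D data where the plusses surround the dots: no single threshold
# on x can separate them (hypothetical points for illustration).
x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
t = np.array([1, 1, -1, -1, -1, 1, 1])

# Map each point to (x, x^2).  In the new plane, the horizontal line
# x^2 = 2 separates the classes, so they are now linearly separable.
phi = np.column_stack([x, x ** 2])
print(phi)
```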

Practice Questions

Question 1

Explain why minimizing the norm of the weight vector maximizes the margin.
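A sketch of one way to argue this: scale \(w\) and \(b\) so that the closest points satisfy \(t_i(w^T x_i + b) = 1\). The distance from such a point to the plane \(w^T x + b = 0\) is \(1/\|w\|\), so the full margin between the two classes is

\[ \frac{2}{\|w\|} \]

and maximizing it is therefore equivalent to minimizing \(\|w\|\) (or \(\frac{1}{2}\|w\|^2\), which is easier to optimize).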

Glossary