A fairly new contender in the machine learning space
A generalization of the maximal margin classifier
We will talk about separating hyperplanes, the maximal margin classifier, support vector classifiers, and support vector machines
We know that a line can separate a 2-dimensional space, and a plane can separate a 3-dimensional space
A hyperplane in p dimensions is a flat subspace of dimension p-1
This will generalize to any number of dimensions but can be hard to visualize for p>3
A hyperplane separates a space into two regions, one for each side (technically three, since a point can lie directly on the hyperplane)
In two dimensions a hyperplane is defined by the equation
\beta_0 + \beta_1 X_1 + \beta_2 X_2 = 0
Any point X = (X_1, X_2)^T that satisfies this equation lies on the hyperplane
The two regions formed by this hyperplane contain the points that satisfy
\beta_0 + \beta_1 X_1 + \beta_2 X_2 > 0
and
\beta_0 + \beta_1 X_1 + \beta_2 X_2 < 0
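The sign of \beta_0 + \beta_1 X_1 + \beta_2 X_2 tells us which region a point falls in. A minimal R sketch, with made-up coefficients that are not from the slides:

```r
# illustrative coefficients (assumed for the example)
beta0 <- -1
beta1 <- 2
beta2 <- 3

# which side of the hyperplane beta0 + beta1 * x1 + beta2 * x2 = 0 is (x1, x2) on?
side <- function(x1, x2) sign(beta0 + beta1 * x1 + beta2 * x2)

side(1, 1)     #  1: in the "> 0" region
side(-1, -1)   # -1: in the "< 0" region
side(0.5, 0)   #  0: exactly on the hyperplane
```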
Idea:
Given some data, we can find a hyperplane that separates the data
and then use that hyperplane to classify new observations
There might be many different hyperplanes that can separate the two classes, but we would ideally like to settle on just one
The Maximal Margin Classifier finds the separating hyperplane that maximizes the margin: the smallest perpendicular distance from the training observations to the hyperplane
The vectors from the border points to the hyperplane are the support vectors
These are the only points that directly have any influence on the model
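As a concrete (if artificial) illustration, a maximal margin classifier can be approximated by fitting a linear support vector machine with a very large cost, so that essentially no margin violations are allowed. A minimal R sketch assuming the e1071 package and simulated data, neither of which comes from the slides:

```r
library(e1071)
set.seed(1)

# simulated toy data, shifted so the two classes separate
x <- matrix(rnorm(40), ncol = 2)
y <- rep(c(-1, 1), each = 10)
x[y == 1, ] <- x[y == 1, ] + 3
dat <- data.frame(x1 = x[, 1], x2 = x[, 2], y = as.factor(y))

# a huge cost leaves essentially no budget for violations,
# so the fit approximates the maximal margin hyperplane
fit <- svm(y ~ ., data = dat, kernel = "linear", cost = 1e5, scale = FALSE)

fit$index        # row indices of the support vectors
plot(fit, dat)   # decision boundary with support vectors marked
```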
The idea of a Maximal Margin Classifier is great, but it will rarely work in practice since it requires the classes to be perfectly separable
Create an extension that allows for hyperplanes that "almost" separate the regions
The margin around this hyperplane is called a soft margin, since some observations are allowed to violate it
This is once again a trade-off
How do we create hyperplanes that "almost" separate our two classes of observations?
We solve the optimization problem
\text{maximize}_{\beta_0, \ldots, \beta_p,\, \epsilon_1, \ldots, \epsilon_n,\, M} \; M
subject to \sum_{j=1}^p \beta_j^2 = 1,
y_i(\beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip}) \geq M(1 - \epsilon_i),
\epsilon_i \geq 0, \quad \sum_{i=1}^n \epsilon_i \leq C
where C is a non-negative tuning parameter and M is the width of the margin
\epsilon_1, \ldots, \epsilon_n are slack variables
They allow individual observations to be on the wrong side of the margin or the hyperplane:
if \epsilon_i > 0 the i-th observation is on the wrong side of the margin, and if \epsilon_i > 1 it is on the wrong side of the hyperplane
We can think of C as a budget of violations
When C increase we become more tolerant of violations and the margin widens
When C decreases we become less tolerant of violations and the margin narrows
Note:
SVMs are typically fitted iteratively; if C is chosen too low there may be no solution that satisfies the constraints
C is essentially a tuning parameter that controls the bias-variance trade-off
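A common way to pick this parameter is cross-validation. A minimal R sketch using e1071::tune on simulated data (the package, the grid, and the data are assumptions; note that e1071's cost argument penalizes violations, so it moves in the opposite direction of the budget C described above):

```r
library(e1071)
set.seed(1)

# toy two-class data with some overlap
x <- matrix(rnorm(100), ncol = 2)
y <- rep(c(-1, 1), each = 25)
x[y == 1, ] <- x[y == 1, ] + 1.5
dat <- data.frame(x1 = x[, 1], x2 = x[, 2], y = as.factor(y))

# 10-fold cross-validation over a grid of cost values
tuned <- tune(svm, y ~ ., data = dat, kernel = "linear",
              ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100)))

summary(tuned)      # cross-validation error for each cost
tuned$best.model    # the classifier refit at the best cost
```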
Only observations that lie on the margin or violate it (the support vectors) affect the hyperplane
Support vector classifiers are therefore quite robust to observations far from the hyperplane, since those points have no effect on the fit
Support vector classifiers work well when the boundary between the two classes is linear
We saw in earlier chapters how to handle non-linear relationships by enlarging the feature space
We can do this in (at least) two ways, using polynomials and kernels
Without going into too many details, the algorithm at work ends up calculating similarities between pairs of observations
K(x_i, x_{i'})
where K is some function called a kernel
Depending on what K is we get different results.
K(x_i, x_{i'}) = \sum_{j=1}^p x_{ij}x_{i'j}
is known as a linear kernel
K(x_i, x_{i'}) = \left(1 + \sum_{j=1}^p x_{ij}x_{i'j}\right)^d
is known as a polynomial kernel of degree d
K(x_i, x_{i'}) = \exp\left(-\gamma\sum_{j=1}^p (x_{ij} - x_{i'j})^2\right)
is known as a radial kernel
Where \gamma is a positive constant
Training observations far from a test point contribute a kernel value near zero, so they play essentially no role in its predicted class: the radial kernel has very local behavior
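A minimal R sketch comparing a polynomial and a radial kernel on data with a circular class boundary, again assuming the e1071 package and made-up parameter values:

```r
library(e1071)
set.seed(1)

# toy data whose true class boundary is a circle
x <- matrix(rnorm(400), ncol = 2)
y <- as.factor(ifelse(x[, 1]^2 + x[, 2]^2 > 1.5, "outer", "inner"))
dat <- data.frame(x1 = x[, 1], x2 = x[, 2], y = y)

# same interface, different kernels
fit_poly <- svm(y ~ ., data = dat, kernel = "polynomial", degree = 3, cost = 1)
fit_rbf  <- svm(y ~ ., data = dat, kernel = "radial", gamma = 1, cost = 1)

plot(fit_rbf, dat)                    # the radial kernel picks up the circular boundary
table(predict(fit_rbf, dat), dat$y)   # training confusion matrix
```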
Handling more than two classes is actually a more general question
How do we extend a binary classifier to multi-class classification?
If we have K>2 classes
We construct \binom{K}{2} binary classification models, each comparing 2 classes
An observation is classified by running it through each of the \binom{K}{2} models and tallying up the results
The observation is assigned the class that was predicted most often across the \binom{K}{2} models
If we have K>2 classes
We fit K models, each comparing 1 class against the K-1 remaining classes
The observation is assigned to the class whose model is most confident, i.e. produces the largest fitted value
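In practice, libsvm-based implementations such as e1071::svm apply the one-versus-one approach automatically when the response has more than two levels, so a multi-class fit looks no different from a binary one. A minimal sketch on the built-in iris data, with assumed parameter values:

```r
library(e1071)

# iris has K = 3 classes; one-versus-one is handled internally
fit <- svm(Species ~ ., data = iris, kernel = "radial", cost = 1)

table(predicted = predict(fit, iris), truth = iris$Species)
```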