# Linear SVMs
$
y(\mathbf{x})=\mathbf{w}^T\phi(\mathbf{x})+b
$
![[svm.png]]
Given a dataset of $N$ points $\{\mathbf{x}_n,t_n\}_{n=1}^N,\ t_n\in\{-1,1\}$, we want to find the maximum-margin hyperplane that separates the points $\mathbf{x}_n$ with $t_n=1$ from those with $t_n=-1$.
For a correctly classified point $\mathbf{x}_n$, its absolute distance to the decision boundary is:
$
\frac{t_ny(\mathbf{x}_n)}{||\mathbf{w}||}=\frac{t_n(\mathbf{w}^T\phi(\mathbf{x}_n)+b)}{||\mathbf{w}||}
$
The ***margin*** is then defined as the minimum distance from the decision boundary to any training point:
$
\min_n\frac{t_n(\mathbf{w}^T\phi(\mathbf{x}_n)+b)}{||\mathbf{w}||}
$
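A minimal numerical sketch of both quantities, taking $\phi$ as the identity and using toy data invented for illustration:
```python
import numpy as np

# Toy data (invented for illustration): 2-D points with labels t_n in {-1, +1}.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
t = np.array([1, 1, -1, -1])

# A candidate hyperplane y(x) = w^T x + b, with phi taken as the identity.
w = np.array([1.0, 1.0])
b = -1.0

# Distance of each point to the decision boundary: t_n * y(x_n) / ||w||.
distances = t * (X @ w + b) / np.linalg.norm(w)

print(distances)        # all positive => every point is correctly classified
print(distances.min())  # the margin of this particular hyperplane
```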
The optimization problem is then:
$
\arg\max_{\mathbf{w},b}\left\{\frac{1}{||\mathbf{w}||}\min_nt_n(\mathbf{w}^T\phi(\mathbf{x}_n)+b)\right\}
$
This is difficult to solve directly. However, rescaling $\mathbf{w}\to\kappa\mathbf{w}$ and $b\to\kappa b$ leaves the distances unchanged, so $\mathbf{w}$ and $b$ can be scaled such that the point closest to the decision boundary satisfies:
$
\min_nt_ny(\mathbf{x}_n)=\min_nt_n(\mathbf{w}^T\phi(\mathbf{x}_n)+b)=1
$
The optimization problem is then simplified to:
$
\arg\max_{\mathbf{w},b}\left\{\frac{1}{||\mathbf{w}||}\cancelto{1}{\min_nt_n(\mathbf{w}^T\phi(\mathbf{x}_n)+b)}\right\}=\arg\max_{\mathbf{w},b}\left\{\frac{1}{||\mathbf{w}||}\right\}
$
subject to the constraints, for all $n$:
$
t_n(\mathbf{w}^T\phi(\mathbf{x}_n)+b)\geq 1
$
Notice that
$
\arg\max_{\mathbf{w},b}\left\{\frac{1}{||\mathbf{w}||}\right\}=\arg\min_{\mathbf{w},b}||\mathbf{w}||^2
$
so, including the conventional factor of $\frac{1}{2}$ for ease of differentiation, the canonical representation is:
$
\boxed{\begin{aligned}\arg\min_{\mathbf{w},b} \quad&\frac{1}{2}||\mathbf{w}||^2\\\text{s.t.}\quad&t_n(\mathbf{w}^T\phi(\mathbf{x}_n)+b)\geq 1\end{aligned}}
$
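Before moving to the dual, here is a minimal sketch solving this primal problem directly with a generic constrained optimizer (SciPy's SLSQP; $\phi$ taken as the identity, toy separable data invented for illustration):
```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (invented for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
t = np.array([1.0, 1.0, -1.0, -1.0])
d = X.shape[1]

# Decision variables packed as theta = [w_1, ..., w_d, b].
def objective(theta):
    w = theta[:d]
    return 0.5 * w @ w  # (1/2) ||w||^2

# One inequality constraint per point: t_n (w^T x_n + b) - 1 >= 0.
constraints = [
    {"type": "ineq",
     "fun": lambda theta, x=x, tn=tn: tn * (x @ theta[:d] + theta[d]) - 1.0}
    for x, tn in zip(X, t)
]

res = minimize(objective, x0=np.zeros(d + 1), method="SLSQP", constraints=constraints)
w, b = res.x[:d], res.x[d]
print("w =", w, "b =", b)
print("margin =", 1.0 / np.linalg.norm(w))  # active constraints sit at distance 1/||w||
```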
This will be solved using the dual representation.
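Introducing a Lagrange multiplier $a_n\geq 0$ for each constraint and eliminating $\mathbf{w}$ and $b$ yields the standard dual form, in which the data appear only through inner products $k(\mathbf{x}_n,\mathbf{x}_m)=\phi(\mathbf{x}_n)^T\phi(\mathbf{x}_m)$:
$
\begin{aligned}\arg\max_{\mathbf{a}}\quad&\sum_{n=1}^N a_n-\frac{1}{2}\sum_{n=1}^N\sum_{m=1}^N a_na_mt_nt_mk(\mathbf{x}_n,\mathbf{x}_m)\\\text{s.t.}\quad&a_n\geq 0,\quad\sum_{n=1}^N a_nt_n=0\end{aligned}
$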
# Nonlinear SVMs
Employ kernels $k(\mathbf{x},\mathbf{z})=\phi(\mathbf{x})^T\phi(\mathbf{z})$ in place of explicit feature maps, for example (see the sketch after this list):
- Gaussian: $k(\mathbf{x},\mathbf{z})=\exp(-\frac{1}{2\sigma^2}||\mathbf{x}-\mathbf{z}||^2)$
- Polynomial: $k(\mathbf{x},\mathbf{z})=(\mathbf{x}^T\mathbf{z}+C)^M$
- Sigmoidal: $k(\mathbf{x},\mathbf{z})=\tanh(a\mathbf{x}^T\mathbf{z}+c)$
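A minimal sketch of these three kernels as plain functions (the hyperparameter values for $\sigma$, $C$, $M$, $a$, $c$ are illustrative):
```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    # exp(-||x - z||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def polynomial_kernel(x, z, C=1.0, M=3):
    # (x^T z + C)^M
    return (x @ z + C) ** M

def sigmoidal_kernel(x, z, a=1.0, c=0.0):
    # tanh(a x^T z + c)
    return np.tanh(a * (x @ z) + c)

x = np.array([1.0, 2.0])
z = np.array([0.5, -1.0])
print(gaussian_kernel(x, z), polynomial_kernel(x, z), sigmoidal_kernel(x, z))
```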