We can use a linear model (as in linear regression) to perform classification.
Binary Classification - classify into {-1, 1}
Loss function
- we cannot use the least squares loss, because it penalizes predictions that are "too correct" (far on the right side of the decision boundary)
- we can use the 0-1 loss, but it is neither smooth nor convex
- we can use Hinge Loss
Hinge Loss
If $y_i = +1$, we get the label right if $w^Tx_i > 0$; if $y_i = -1$, we get the label right if $w^Tx_i < 0$.
This means that if $y_iw^Tx_i > 0$, the error is 0; if $y_iw^Tx_i < 0$, the error is $-y_iw^Tx_i$.
So the error is $\max(0, -y_iw^Tx_i)$.
However, this loss is degenerate: $w = 0$ gives $\max(0, 0) = 0$ for every example, so the trivial solution always achieves the lowest possible value.
Adding a margin of 1 fixes this: $\max(0, -y_iw^Tx_i) \Rightarrow \max(0, 1-y_iw^Tx_i)$. Now $w = 0$ incurs a loss of 1 per example, so the classifier must separate examples with a margin to reach zero loss.
SVM is the hinge loss with L2 regularization, summed over all examples: $f(w) = \sum_{i=1}^{n}\max\{0, 1-y_iw^Tx_i\} + \frac{\lambda}{2}\|w\|^2$
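As a rough sketch (assuming a NumPy feature matrix `X` of shape `(n, d)` and labels `y` in {-1, +1}; the function names here are made up for illustration), the SVM objective and a plain subgradient-descent trainer could look like:

```python
import numpy as np

def svm_objective(w, X, y, lam):
    """Hinge loss summed over examples plus L2 regularization."""
    margins = y * (X @ w)                      # y_i * w^T x_i for each example
    hinge = np.maximum(0.0, 1.0 - margins)     # max{0, 1 - y_i w^T x_i}
    return hinge.sum() + 0.5 * lam * (w @ w)

def svm_subgradient(w, X, y, lam):
    """Subgradient of the objective (the hinge loss is non-smooth at the kink)."""
    margins = y * (X @ w)
    active = margins < 1.0                     # only examples inside the margin contribute
    return -(X[active].T @ y[active]) + lam * w

def train_svm(X, y, lam=1.0, lr=1e-3, iters=1000):
    """Plain subgradient descent; a minimal sketch, not a tuned solver."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w -= lr * svm_subgradient(w, X, y, lam)
    return w
```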
Logistic Loss
We can smooth the max in the degenerate loss with log-sum-exp: $\max\{0, -y_iw^Tx_i\} \approx \log(\exp(0) + \exp(-y_iw^Tx_i)) = \log(1 + \exp(-y_iw^Tx_i))$, which is the logistic loss.
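A minimal sketch of this loss (same assumed `X` and `y` as above), using `np.logaddexp` for numerical stability:

```python
import numpy as np

def logistic_loss(w, X, y):
    """Sum of log(exp(0) + exp(-y_i w^T x_i)) over all examples.
    np.logaddexp(0, z) computes log(1 + exp(z)) stably."""
    margins = y * (X @ w)
    return np.logaddexp(0.0, -margins).sum()
```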
Probability - Sigmoid
For prediction, we use the sign function to map $w^Tx_i$ to $\{-1, 1\}$.
For probabilities, we want to map $w^Tx_i$ into the range $[0, 1]$; the sigmoid function $\sigma(z) = \frac{1}{1 + \exp(-z)}$ does exactly this.
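A small sketch of both mappings (the function names are illustrative, not from the notes):

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_label(w, x):
    """Hard prediction in {-1, +1} via the sign of the score."""
    return np.sign(w @ x)

def predict_proba(w, x):
    """Probability that the label is +1: P(y = +1 | x) = sigmoid(w^T x)."""
    return sigmoid(w @ x)
```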
Multi-class Linear Classification
One-vs-All classification turns a binary classifier into a multiclass classifier.
- Training phase: for each class 'c', train a binary classifier to predict whether an example belongs to class 'c'
- Prediction phase: apply all the classifiers to get a score for each class 'c', and predict the class with the highest score, as in the sketch below
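A minimal One-vs-All sketch, assuming any binary trainer such as the `train_svm` sketch above (all names here are hypothetical):

```python
import numpy as np

def train_one_vs_all(X, y, classes, train_binary):
    """Train one binary classifier per class.
    `train_binary` maps (X, labels in {-1, +1}) to a weight vector."""
    W = np.stack([train_binary(X, np.where(y == c, 1.0, -1.0)) for c in classes])
    return W  # shape (num_classes, d)

def predict_one_vs_all(W, X, classes):
    """Score every class for every example; predict the highest-scoring class."""
    scores = X @ W.T                              # shape (n, num_classes)
    return np.asarray(classes)[scores.argmax(axis=1)]
```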