We need to find the features that are important for predicting $y$.
- Association approach - for each feature $j$, compute the correlation between feature $x^j$ and $y$; however, this ignores interactions between variables (see the first sketch after this list)
- Regression Weight approach - fit $w$ using all features, take the features where $|w_j|$ is large
    - has a major problem with collinearity: with two copies of the same relevant feature, the weight can be split between them so neither $w_j$ looks large, and an irrelevant copy of a relevant feature can end up with a large weight (L15, p.13); see the second sketch after this list
- Search and Score
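As a concrete illustration of the association approach, here is a minimal sketch in Python; the function name `association_screen` and the `threshold` value are illustrative assumptions, not from the notes:

```python
import numpy as np

def association_screen(X, y, threshold=0.5):
    """For each feature j, compute corr(x^j, y) and keep features above a cutoff.
    The name and threshold are illustrative, not from the notes."""
    # Pearson correlation between each column of X and the target y
    corrs = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.where(np.abs(corrs) > threshold)[0]  # indices of "relevant" features
```

Because each feature is scored in isolation, a feature that only helps in combination with others would be missed, which is exactly the interaction problem noted above.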
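And a small demo of the collinearity problem with regression weights, on assumed synthetic data: duplicating the single relevant feature makes least squares split its weight between the copies, so thresholding $|w_j|$ misjudges both.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 2 * x + rng.normal(scale=0.1, size=n)   # y depends only on x, with weight 2

# Two exact copies of the one relevant feature
X = np.column_stack([x, x])

# lstsq returns the minimum-norm solution for this rank-deficient problem:
# the true weight 2 is split evenly, w ~ [1, 1], so neither w_j stands out
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)
```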
Search and Score
- Define a score function $f(S)$ measuring the quality of a feature set $S$
- Search for the set of features $S$ with the best score
Score Function
The score shouldn't be training error - training error only goes down as you add features, so it would always favour using all of them.
Validation error? Yes! (a sketch of such an $f(S)$ follows)
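A minimal sketch of a validation-error score function, assuming a train/validation split and least-squares fitting; the name `f` and the convention of predicting 0 for the empty set are assumptions:

```python
import numpy as np

def f(S, X_train, y_train, X_val, y_val):
    """f(S): fit least squares on the features in S, score by validation error."""
    S = list(S)
    if not S:                        # empty set: predict 0 for every example
        return 0.5 * np.sum(y_val ** 2)
    # Fit w on the training rows, restricted to the selected columns
    w, *_ = np.linalg.lstsq(X_train[:, S], y_train, rcond=None)
    # Squared error on the validation rows
    return 0.5 * np.sum((X_val[:, S] @ w - y_val) ** 2)
```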
“Number of Features” Penalties
$\text{score}(S) = \frac{1}{2}\sum_{i=1}^n(w_S^Tx_{iS} - y_i)^2 + \text{size}(S)$
We can use the L0-norm (the number of nonzero weights) in place of $\text{size}(S)$: $\text{score}(S) = \frac{1}{2}\sum_{i=1}^n(w_S^Tx_{iS} - y_i)^2 + \lambda\|w\|_0$
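A sketch of this penalized score, assuming $w_S$ is fit by least squares on the selected columns; the name `score_l0` and the default `lam` are illustrative:

```python
import numpy as np

def score_l0(S, X, y, lam=1.0):
    """score(S) = 1/2 * sum_i (w_S^T x_iS - y_i)^2 + lam * |S|."""
    S = list(S)
    if not S:
        return 0.5 * np.sum(y ** 2)  # empty set: residual is y itself
    w, *_ = np.linalg.lstsq(X[:, S], y, rcond=None)
    # For w fit only on S, ||w||_0 equals the number of selected features
    # (barring exact-zero fitted weights)
    return 0.5 * np.sum((X[:, S] @ w - y) ** 2) + lam * len(S)
```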
How to handle the $2^d$ possible choices for $S$ - Forward Selection (see the sketch after these steps)
- Start with the empty set $S = \{\}$
- Compute the score of $S \cup \{j\}$ for each feature $j$ not in $S$
- Find the feature $j$ whose addition gives the best score
- Check if adding it improves on the current best score
- If yes, add it to $S$ and go back to the second step, else stop.
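Putting the steps together, a sketch of forward selection that reuses the (assumed) `score_l0` from the previous sketch:

```python
def forward_selection(X, y, lam=1.0):
    """Greedily add the feature that most improves score_l0; stop when none helps."""
    S = set()
    best = score_l0(S, X, y, lam)            # score of the empty set
    while True:
        candidates = [j for j in range(X.shape[1]) if j not in S]
        if not candidates:
            break
        # Score adding each remaining feature to the current set
        trial = {j: score_l0(S | {j}, X, y, lam) for j in candidates}
        j_best = min(trial, key=trial.get)
        if trial[j_best] < best:             # improvement: keep it and repeat
            S.add(j_best)
            best = trial[j_best]
        else:                                # no single addition helps: stop
            break
    return S
```

Greedy search checks at most $O(d^2)$ subsets instead of all $2^d$, at the cost of possibly missing the globally best $S$.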