Feature Selection
General feature selection problem: Find the features (columns) of ‘X’ that are important for predicting ‘y’.
- Association Approach: for each feature ‘j’, compute the correlation between the feature values $x_j$ and ‘y’, and keep the features with the strongest correlation (a sketch of this appears after this list).
- a simple approach
- but it ignores interactions between variables
- Regression Weight Approach: fit a model with all features, then take the features ‘j’ where $|w_j|$ is larger than a threshold (also sketched after this list).
- major problem: collinearity; if an irrelevant variable always moves in line with a relevant feature, the weight can end up on the irrelevant one.
- Search and Score: the most common approach
- Define a score function f(S) that measures the quality of a feature set ‘S’
- Search for the ‘S’ with the best score
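A minimal sketch of the association approach, assuming a data matrix `X` (rows are examples, columns are features) and target vector `y`; the threshold value is an illustrative choice, not part of the method:

```python
import numpy as np

def association_select(X, y, threshold=0.3):
    """Keep features whose absolute correlation with y exceeds a threshold."""
    n, d = X.shape
    corr = np.zeros(d)
    for j in range(d):
        # Pearson correlation between column j and the target
        corr[j] = np.corrcoef(X[:, j], y)[0, 1]
    # Each feature is scored in isolation, so interactions between
    # variables are ignored (the main weakness of this approach).
    return np.flatnonzero(np.abs(corr) > threshold)
```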
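And a sketch of the regression weight approach, fitting least squares with all features and keeping the large-weight ones; the least-squares model and the threshold are again illustrative assumptions:

```python
import numpy as np

def regression_weight_select(X, y, threshold=1e-2):
    """Keep features whose fitted weight magnitude exceeds a threshold."""
    # Fit with all features at once
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Collinear features can trade weight between them, so an irrelevant
    # copy of a relevant feature may still pass this test.
    return np.flatnonzero(np.abs(w) > threshold)
```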
Search and Score
Score function
- training error? - we would end up selecting all features, since adding a feature can never increase the training error
- validation error? - this is closer to what we want, but it has some problems (a sketch of such a score function follows this list):
- ‘d’ variables means $2^d$ possible sets of variables to score
- false positives: among so many candidate sets, some irrelevant variables can look helpful by chance
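One way to make the validation-error score concrete is to fit least squares on a candidate set ‘S’ and return the error on a held-out split; the `Xtrain`/`Xvalid` names and the least-squares model are assumptions for illustration:

```python
import numpy as np

def validation_score(S, Xtrain, ytrain, Xvalid, yvalid):
    """Score a feature set S (lower is better) by its validation error."""
    S = sorted(S)
    if len(S) == 0:
        # With no features we predict 0 for every example
        return np.mean(yvalid ** 2)
    # Fit least squares using only the columns in S
    w, *_ = np.linalg.lstsq(Xtrain[:, S], ytrain, rcond=None)
    residual = Xvalid[:, S] @ w - yvalid
    return np.mean(residual ** 2)
```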
Number of Features Penalty - $L_0$ norm
The $L_0$ “norm” is the number of non-zero values ($\|w\|_0 = \text{size}(S)$):
$f(w) = \frac{1}{2}\|Xw - y\|^2 + \lambda\|w\|_0$
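A sketch of scoring a feature set with this penalty, under the assumption that `w` is non-zero only on the selected set ‘S’ (so $\|w\|_0$ equals the size of ‘S’); the value of `lam` is illustrative:

```python
import numpy as np

def l0_penalized_score(S, X, y, lam=1.0):
    """f(w) = 1/2 * ||Xw - y||^2 + lam * ||w||_0, with w supported on S."""
    S = sorted(S)
    if len(S) == 0:
        return 0.5 * np.sum(y ** 2)
    w, *_ = np.linalg.lstsq(X[:, S], y, rcond=None)
    squared_error = 0.5 * np.sum((X[:, S] @ w - y) ** 2)
    # The penalty counts non-zero weights, i.e. the number of selected features
    return squared_error + lam * len(S)
```

Larger values of $\lambda$ make each extra feature more expensive, so the best-scoring sets are smaller.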
Forward Selection
- Start with an empty set of features, S = [ ].
- For each possible feature ‘j’:
- Compute the score of the features in ‘S’ combined with feature ‘j’.
- Find the ‘j’ that has the best score when added to ‘S’.
- Check if {S ∪ j} improves on the best score found so far; if so, add ‘j’ to ‘S’ and repeat the scoring step.
- A variation is to stop if no ‘j’ improves the score over just using ‘S’ (as in the sketch below).
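A minimal sketch of this greedy loop, assuming a `score(S)` function where lower is better (for example the validation or $L_0$-penalized scores sketched above):

```python
def forward_selection(d, score):
    """Greedily add features to S while doing so improves the score."""
    S = set()                     # start with the empty feature set
    best_score = score(S)
    while True:
        best_j, best_j_score = None, best_score
        for j in range(d):
            if j in S:
                continue
            candidate = score(S | {j})   # score S with feature j added
            if candidate < best_j_score:
                best_j, best_j_score = j, candidate
        if best_j is None:        # no feature improves the score: stop
            return sorted(S)
        S.add(best_j)             # otherwise keep the best feature and repeat
        best_score = best_j_score
```

This evaluates $O(d^2)$ candidate sets instead of all $2^d$, at the cost of possibly missing the best set.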