Stable Variable Performance

Mehmet Ugurbil

University of Minnesota

02-Jul-2018

Contents

- Aim
- Null Hypothesis
- Experiment Design
- Observations
- Dataset Descriptions
- Stable Variable Performance

Aim

Show that filtering according to a stability criterion degrades classification performance.
Note: Filtering is done by keeping only the features whose stability is above a threshold.
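As a minimal sketch of this filtering step (assuming per-feature stability scores are already available, e.g. selection frequencies across cross-validation folds; all names and values here are illustrative):

```python
import numpy as np

def filter_by_stability(feature_names, stability_scores, threshold):
    """Keep only the features whose stability score is at or above the threshold."""
    keep = np.asarray(stability_scores) >= threshold
    return [name for name, k in zip(feature_names, keep) if k]

# Example with made-up stability scores: three of five features survive a 0.6 threshold.
names = ["f1", "f2", "f3", "f4", "f5"]
scores = [0.9, 0.4, 0.7, 0.2, 0.6]
print(filter_by_stability(names, scores, threshold=0.6))  # ['f1', 'f3', 'f5']
```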

Null Hypothesis

Filtering features according to their stability increases model performance.

Experiment Design

1. Feature selection on cross-validation sets of the small sample. This data is taken from the performance experiments (steps 1-3 are sketched in code after this list).
   - Calculation of stability metrics.
2. Filter out the features below a stability threshold; keep only the ones above it.
3. Model validation using SVM classification on the entire small sample with the filtered features.
   - Model validation performance assessed on a hold-out testing set.
4. Feature validation using SVM classification on the large-sample training set with the filtered features.
   - Feature validation performance assessed on a hold-out testing set.
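The sketch below is one possible reading of steps 1-3, not the code used in the experiments: stability is taken to be the selection frequency of each feature across cross-validation feature-selection runs, filtering keeps the features at or above a threshold, and an SVM (scikit-learn's SVC) trained on the filtered small sample is scored on a hold-out set. The data, the univariate selector, and the threshold are all placeholders.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import KFold
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Placeholder "small sample" and hold-out test set; the real data comes from
# the performance experiments and the TIE-Net simulations.
X_small = rng.normal(size=(100, 50))
y_small = (X_small[:, 0] + X_small[:, 1] > 0).astype(int)
X_test = rng.normal(size=(200, 50))
y_test = (X_test[:, 0] + X_test[:, 1] > 0).astype(int)

# Step 1: feature selection on cross-validation folds of the small sample,
# then a simple stability metric: the selection frequency of each feature.
selections = []
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X_small):
    selector = SelectKBest(f_classif, k=10).fit(X_small[train_idx], y_small[train_idx])
    selections.append(selector.get_support())
stability = np.mean(selections, axis=0)

# Step 2: keep only the features at or above the stability threshold.
threshold = 0.6
kept = stability >= threshold

# Step 3: model validation -- SVM on the entire small sample with the
# filtered features, assessed on the hold-out testing set.
model = SVC().fit(X_small[:, kept], y_small)
print("features kept:", int(kept.sum()),
      "hold-out accuracy:", model.score(X_test[:, kept], y_test))
```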

Observations

In most cases, a monotonic decrease in performance is observed as features are filtered more aggressively, i.e. as the stability threshold is raised.
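One hypothetical way to trace out such a curve, continuing the pipeline sketch above (it reuses the `stability`, `X_small`, `y_small`, `X_test`, and `y_test` placeholders defined there), is to sweep the threshold and record the hold-out accuracy at each level:

```python
# Continues the pipeline sketch above; `stability`, `X_small`, `y_small`,
# `X_test`, and `y_test` are defined there.
for threshold in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    kept = stability >= threshold
    if kept.sum() == 0:  # an aggressive filter may leave no features at all
        break
    acc = SVC().fit(X_small[:, kept], y_small).score(X_test[:, kept], y_test)
    print(f"threshold={threshold:.1f}  kept={int(kept.sum())}  accuracy={acc:.3f}")
```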

Dataset Descriptions

TIE-Net = Original simulated data: a TIE near-faithful causal network.
TIE-Net-Reduced1 = TIE-Net with multiplicity removed according to the original graph.
TIE-Net-Reduced2 = TIE-Net with multiplicity removed using the TIE* algorithm.
   - Note that this is not one dataset, but one per repeat per sample size (550 in total).
   - This also implies that feature stability is not meaningful for this dataset, but it is included for completeness.
TIE-Net-Weak1 = TIE-Net with the weak variables multiplied 50 times.
TIE-Net-Weak2 = TIE-Net-Weak1 with Gaussian noise added, using a uniformly random deviation (see the construction sketch below).
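For illustration only, here is one way the weak variants could be generated (an assumption about the construction, not the actual simulation code: "multiplied 50 times" is read as each weak variable being replicated 50 times, and the "uniformly random deviation" as a per-copy noise standard deviation drawn uniformly at random; `X` and `weak_idx` are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder base data (standing in for TIE-Net) and the indices of its weak variables.
X = rng.normal(size=(500, 20))
weak_idx = [3, 7, 11]

# TIE-Net-Weak1: append 50 copies of each weak variable to the data matrix.
copies = np.tile(X[:, weak_idx], (1, 50))
X_weak1 = np.hstack([X, copies])

# TIE-Net-Weak2: the same copies, corrupted with Gaussian noise whose standard
# deviation is drawn uniformly at random for each copied column.
sigmas = rng.uniform(0.0, 1.0, size=copies.shape[1])
X_weak2 = np.hstack([X, copies + rng.normal(size=copies.shape) * sigmas])
```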

Stable Variable Performance

Frame 1 and Frame 2: results figures (not reproduced here).

The End