Stable Variable Performance
Mehmet Ugurbil
University of Minnesota
02-Jul-2018
Contents
Aim
Null Hypothesis
Experiment Design
Observations
Dataset Descriptions
Stable Variable Performance
Aim
Show that filtering features according to a stability criterion degrades classification performance.
Note: Filtering is done by keeping only the features whose stability is above a threshold, as sketched below.
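A minimal sketch of this filtering step, assuming stability is measured as each feature's selection frequency across the cross-validation folds; the function and variable names are hypothetical, not taken from the original analysis.

import numpy as np

def stability_filter(selected_per_fold, n_features, threshold):
    """Keep features whose selection frequency across CV folds exceeds the threshold."""
    counts = np.zeros(n_features)
    for selected in selected_per_fold:           # one array of selected feature indices per fold
        counts[selected] += 1
    stability = counts / len(selected_per_fold)  # fraction of folds that selected each feature
    return np.where(stability > threshold)[0]    # indices of the "stable" features that are kept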
Null Hypothesis
Filtering features according to their stability increases model performance.
Experiment Design
1. Feature selection on cross-validation sets of the small sample. This data is taken from the performance experiments.
- - Calculation of stability metrics.
2. Filter out features below a stability threshold, keeping only the ones above it.
3. Model validation using SVM classification on the entire small sample using the filtered features (see the sketch after this list).
- - Model validation performance assessed on a hold-out testing set.
4. Feature validation using SVM classification on the large-sample training set using the filtered features.
- - Feature validation performance assessed on a hold-out testing set.
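A hedged sketch of the two validation steps, assuming the filtered feature indices come from a stability filter as in the note above and that the small and large samples with their hold-out test sets are already loaded; the variable names and the use of AUC as the performance metric are assumptions, not taken from the original experiments.

from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

def holdout_performance(X_train, y_train, X_test, y_test, feature_idx):
    """Train an SVM on the given feature subset and score it on the hold-out test set."""
    clf = SVC(kernel="linear", probability=True)
    clf.fit(X_train[:, feature_idx], y_train)
    scores = clf.predict_proba(X_test[:, feature_idx])[:, 1]
    return roc_auc_score(y_test, scores)

# Model validation: entire small sample, evaluated on its hold-out test set.
model_perf = holdout_performance(X_small, y_small, X_small_test, y_small_test, stable_idx)

# Feature validation: large-sample training set, evaluated on its hold-out test set.
feature_perf = holdout_performance(X_large_train, y_large_train, X_large_test, y_large_test, stable_idx)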
Observations
In most cases, a monotonic decrease in performance is observed as features are filtered more aggressively.
Dataset Descriptions
TIE-Net = Original simulated data - TIE near-faithful causal network.
TIE-Net-Reduced1 = TIE-Net with multiplicity removed according to the original graph.
TIE-Net-Reduced2 = TIE-Net with multiplicity removed using the TIE* algorithm.
- - Note that this is not one dataset, but one for each repeat per sample size (550 total).
- - This also implies that feature stability is not meaningful for this dataset, but it is included for completeness.
TIE-Net-Weak1 = TIE-Net with weak variables multiplied 50 times.
TIE-Net-Weak2 = TIE-Net-Weak1 with Gaussian noise added (uniformly random deviation).
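An illustrative sketch (not the original data generator) of how the two Weak variants could be constructed from TIE-Net data; which variables count as weak (weak_idx), the noise being applied only to the duplicated columns, and the range of the random deviation are all assumptions.

import numpy as np

rng = np.random.default_rng(0)

def make_weak1(X, weak_idx, copies=50):
    """TIE-Net-Weak1: append 50 copies of each weak variable to the original data."""
    duplicated = np.tile(X[:, weak_idx], (1, copies))
    return np.hstack([X, duplicated])

def make_weak2(X_weak1, n_original):
    """TIE-Net-Weak2: add Gaussian noise with a uniformly random deviation to the duplicated columns."""
    X_noisy = X_weak1.copy()
    dup = X_noisy[:, n_original:]
    sigma = rng.uniform(0.0, 1.0, size=dup.shape[1])           # one deviation per duplicated column (assumed range)
    X_noisy[:, n_original:] = dup + sigma * rng.standard_normal(dup.shape)
    return X_noisy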
Stable Variable Performance