Signal Degradation on MNIST
Mehmet Ugurbil
University of Minnesota
06-Jun-2018
Aim
Investigate the effect of signal degradation for various data sizes.
Null Hypothesis
Signal degradation is equal to the introduced error.
That is, the drop in AUC should equal the percentage of labels swapped.
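The null hypothesis can be made concrete with a small sketch. It assumes the drop is measured in absolute AUC units (a 10% swap predicts a drop of 0.10), which is one reading of the statement above and not something the report confirms. The function names are hypothetical and the code is not taken from the project's MATLAB scripts.

```python
def auc(scores, labels):
    """Rank-based AUC: fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def null_predicted_auc(baseline_auc, swap_fraction):
    """AUC predicted by the null hypothesis.
    Assumption: the drop is in absolute AUC units."""
    return baseline_auc - swap_fraction


def deviation_from_null(baseline_auc, observed_auc, swap_fraction):
    """Observed AUC drop minus the swap fraction.
    Zero means the null hypothesis holds exactly; a negative value means
    the classifier degraded less than the null predicts."""
    return (baseline_auc - observed_auc) - swap_fraction
```

Here `swap_fraction` is the swap percentage divided by 100, so the 20% setting corresponds to 0.20.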
Experiment Design
Access source code at UMN GitHub: Source Code
Task: Classify the digits 4 vs 9, with 9 as the positive class (9->1, 4->0).
Experiment setup:
i. Hold-out test set: 2000 samples with equal proportions of the two targets
ii. Classifiers: SVM (polynomial and RBF kernels), random forest
iii. Feature selection: all features, svm_rfe, Hiton
iv. Percent label swap: {0%, 10%, 20%, 50%}.
v. 100 random swaps of the target labels for each percentage > 0%.
vi. Sample sizes of 100, 200, 500, and 1000, each with equal proportions of 4s and 9s.
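Items iv to vi can be sketched as follows. This is a minimal illustration, not the project's MATLAB code: it assumes the labels to swap are chosen uniformly at random across both classes (the report does not say whether swapping is stratified), and the names `balanced_sample` and `swap_labels` are hypothetical.

```python
import random


def balanced_sample(labels, sample_size, rng):
    """Return indices of a sample with equal numbers of 0s and 1s."""
    if sample_size % 2 != 0:
        raise ValueError("sample_size must be even for equal proportions")
    half = sample_size // 2
    zeros = [i for i, y in enumerate(labels) if y == 0]
    ones = [i for i, y in enumerate(labels) if y == 1]
    # rng.sample draws without replacement, so no index repeats.
    return rng.sample(zeros, half) + rng.sample(ones, half)


def swap_labels(labels, swap_fraction, rng):
    """Return a copy of binary labels with a fixed fraction flipped.
    Assumption: positions are drawn uniformly, ignoring class."""
    n_swap = round(len(labels) * swap_fraction)
    flipped = list(labels)
    for i in rng.sample(range(len(labels)), n_swap):
        flipped[i] = 1 - flipped[i]
    return flipped


def swapped_label_sets(labels, swap_fraction, n_repeats=100, seed=0):
    """Generate n_repeats independent random swaps of the same labels."""
    rng = random.Random(seed)
    return [swap_labels(labels, swap_fraction, rng) for _ in range(n_repeats)]
```

Seeding the generator keeps the 100 swaps reproducible across runs, which makes the resulting AUC distributions comparable between classifiers.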
The experiment steps:
1. Preprocess the data:
* run pre.pbs -> pre.m
2. Run the experiments:
* run all.sh -> unit.pbs -> block.m -> fs.m & cl.m
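The full set of configurations implied by the setup can be enumerated as a grid. The sketch below only lists the cells from items ii to vi. It does not reproduce how all.sh, unit.pbs, or block.m actually split the work, and the identifiers `svm_poly`, `svm_rbf`, and `random_forest` are placeholder names.

```python
from itertools import product

CLASSIFIERS = ["svm_poly", "svm_rbf", "random_forest"]  # placeholder names
FEATURE_SELECTORS = ["all", "svm_rfe", "Hiton"]
SWAP_PERCENTS = [0, 10, 20, 50]
SAMPLE_SIZES = [100, 200, 500, 1000]
N_SWAP_REPEATS = 100


def experiment_grid():
    """Yield one dict per (sample size, feature selection, classifier, swap) cell."""
    for size, selector, classifier, swap in product(
        SAMPLE_SIZES, FEATURE_SELECTORS, CLASSIFIERS, SWAP_PERCENTS
    ):
        # The setup lists 100 random swaps only for percentages above 0%.
        repeats = N_SWAP_REPEATS if swap > 0 else 1
        yield {
            "sample_size": size,
            "feature_selection": selector,
            "classifier": classifier,
            "swap_percent": swap,
            "repeats": repeats,
        }
```

With three classifiers, three feature selection options, four swap percentages, and four sample sizes, the grid has 144 cells.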
Report for each sample size:
1. Mean and standard deviation of the AUC without label swap.
2. The percentiles (5% to 95% in steps of 5%) of the AUC distribution for each label swap percentage.
3. Histogram of the mean AUC distribution for each label swap percentage, with the mean AUC without label swap shown for reference.
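The summary statistics in items 1 and 2 can be computed along these lines. This is a sketch using only the Python standard library. The interpolation rule (`method="inclusive"`) is an assumption and may differ slightly from the percentile routine used in the original MATLAB scripts, and `summarize_aucs` is a hypothetical name.

```python
import statistics


def summarize_aucs(aucs):
    """Summarize a list of AUC values from repeated runs.

    Returns the mean, the sample standard deviation, and the
    5%, 10%, ..., 95% percentiles keyed by percentage."""
    # n=20 splits the data into 20 equal-probability intervals,
    # which yields 19 cut points at 5% steps.
    cuts = statistics.quantiles(aucs, n=20, method="inclusive")
    percentiles = {5 * (k + 1): cut for k, cut in enumerate(cuts)}
    return {
        "mean": statistics.mean(aucs),
        "std": statistics.stdev(aucs),
        "percentiles": percentiles,
    }
```

Calling `summarize_aucs` once per swap percentage, on the 100 AUC values from the random swaps, gives one row of the percentile report.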
Observations
1. Performance is always lower with feature selection.
2. For swap percentages below 50%, the performance drop is about half of the swap percentage.
3. Performance drops as sample size decreases.
Experiment Results