Signal Degradation on MNIST

Mehmet Ugurbil

University of Minnesota

06-Jun-2018

Aim
Null Hypothesis
Experiment Design
Observations
Experiment Results
The End

Aim

Investigate how signal degradation, introduced by randomly swapping target labels, affects classification performance at various sample sizes.

Null Hypothesis

The decrease caused by signal degradation is equal to the introduced error. That is, the drop in AUC should equal the percentage of labels swapped.
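As a rough illustration of what the null hypothesis predicts, the sketch below computes the expected AUC for each swap percentage. It assumes the drop is measured in absolute AUC points (a 10% swap lowers AUC by 0.10), which is one reading of the statement above. The baseline of 0.95 and the function name `null_expected_auc` are hypothetical and not taken from the experiment.

```python
def null_expected_auc(baseline_auc, swap_fraction):
    """AUC predicted by the null hypothesis.

    Assumption: the drop is in absolute AUC points and equals the
    fraction of labels swapped.
    """
    return baseline_auc - swap_fraction


# Hypothetical baseline AUC, for illustration only (not a measured result).
baseline = 0.95
for swap in (0.0, 0.10, 0.20, 0.50):
    print(f"swap={swap:.0%}  null-hypothesis AUC={null_expected_auc(baseline, swap):.2f}")
```

An observed mean AUC well above the value printed for a given swap percentage would count as evidence against the null hypothesis.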

Experiment Design

Source code: available on UMN GitHub.
Task: Classify digit 4 vs. digit 9 (labels: 9 -> 1, 4 -> 0).
Experiment setup:
i. Hold-out test set: 2000 samples with equal proportions of the two targets.
ii. Classifiers: SVM (polynomial and RBF kernels), random forest.
iii. Feature selection: all features, svm_rfe, HITON.
iv. Percent label swap: {0%, 10%, 20%, 50%}.
v. 100 random swaps of the target labels for each percentage > 0%.
vi. Sample sizes of 100, 200, 500, and 1000, with equal proportions of 4s and 9s.
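The balanced sampling and label swapping in the setup above can be sketched as follows. This is a minimal Python illustration, not the MATLAB code used in the experiment. It assumes that a "swap" flips the binary label of a uniformly random subset of the training samples, without regard to class. The helper names and the stand-in label pool are hypothetical.

```python
import random


def balanced_sample(labels, size, rng):
    """Return indices of a sample containing size/2 zeros and size/2 ones."""
    per_class = size // 2
    zeros = [i for i, y in enumerate(labels) if y == 0]
    ones = [i for i, y in enumerate(labels) if y == 1]
    return rng.sample(zeros, per_class) + rng.sample(ones, per_class)


def swap_labels(labels, fraction, rng):
    """Flip the binary label of a randomly chosen fraction of the samples."""
    n_swap = round(fraction * len(labels))
    flipped = set(rng.sample(range(len(labels)), n_swap))
    return [1 - y if i in flipped else y for i, y in enumerate(labels)]


rng = random.Random(0)            # fixed seed so the sketch is repeatable
pool = [0] * 3000 + [1] * 3000    # stand-in labels: 4 -> 0, 9 -> 1
idx = balanced_sample(pool, 200, rng)
train = [pool[i] for i in idx]

# One of the 100 random swaps at the 10% level.
noisy = swap_labels(train, 0.10, rng)
n_changed = sum(a != b for a, b in zip(train, noisy))
print(len(train), sum(train), n_changed)
```

Repeating `swap_labels` 100 times per swap percentage, and training a classifier on each noisy label vector, would give the AUC distribution described in the setup.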
The experiment steps:
1. Preprocess the data:
* run pre.pbs -> pre.m
2. Run the experiments:
* run all.sh -> unit.pbs -> block.m -> fs.m & cl.m
Report for each sample size:
1. Mean and standard deviation of the AUC for classification without label swap.
2. The percentiles (5%:5%:95%) of the AUC distribution for each label-swap percentage.
3. Histogram of the mean AUC distribution for each label-swap percentage, together with the mean AUC of classification without label swap.
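The reported statistics can be computed along the following lines. This is a sketch using Python's standard library; the original analysis was presumably done in MATLAB, whose percentile interpolation may differ slightly from the "inclusive" method used here. The AUC values are synthetic placeholders, not experimental results, and the function names are hypothetical.

```python
import statistics


def auc_percentiles(aucs):
    """Return the 5th, 10th, ..., 95th percentiles of a list of AUC values."""
    cuts = statistics.quantiles(aucs, n=20, method="inclusive")  # 19 cut points
    return {5 * (k + 1): value for k, value in enumerate(cuts)}


def auc_summary(aucs):
    """Return the mean and sample standard deviation of the AUC values."""
    return statistics.mean(aucs), statistics.stdev(aucs)


# Synthetic placeholder: 100 AUC values, one per random label swap.
aucs = [0.80 + 0.001 * i for i in range(100)]

pct = auc_percentiles(aucs)
mean_auc, std_auc = auc_summary(aucs)
for level in sorted(pct):
    print(f"{level:2d}%: {pct[level]:.4f}")
print(f"mean={mean_auc:.4f} std={std_auc:.4f}")
```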

Observations

1. Performance is always lower with feature selection.
2. For swap percentages below 50%, the performance drop is about half of the swap percentage.
3. Performance drops as sample size decreases.
