Signal Degradation on MNIST

Mehmet Ugurbil

University of Minnesota

06-Jun-2018

Aim

Investigate the effect of signal degradation (random swapping of target labels) on classification performance for various sample sizes.

Null Hypothesis

The loss of signal is equal to the introduced error. That is, the drop in AUC should be equal to the percentage of labels swapped.
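As a worked illustration of what the null hypothesis predicts, the sketch below reads "drop in AUC equals percent label swap" as an absolute drop in AUC points. That reading, the function name `null_expected_auc`, and the baseline AUC of 0.98 are assumptions made for illustration only; they are not taken from the experiment.

```python
def null_expected_auc(baseline_auc: float, swap_fraction: float) -> float:
    """AUC predicted by the null hypothesis: the drop equals the swap fraction."""
    if not 0.0 <= swap_fraction <= 1.0:
        raise ValueError("swap_fraction must be in [0, 1]")
    return baseline_auc - swap_fraction


# Hypothetical baseline AUC, chosen only to make the arithmetic concrete.
BASELINE_AUC = 0.98

for swap in (0.0, 0.10, 0.20, 0.50):
    predicted = null_expected_auc(BASELINE_AUC, swap)
    print(f"swap={swap:.0%}  AUC under null={predicted:.2f}")
```

Under this reading, a 20% swap would take a baseline of 0.98 down to 0.78.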

Experiment Design

Source code: available on UMN GitHub.
Task: Classify the digit 4 vs. 9 (labels: 9 -> 1, 4 -> 0).
Experiment setup:
  i. Hold-out test set: 2000 samples with equal proportions of the two targets.
  ii. Classifiers: SVM (polynomial and RBF kernels), random forest.
  iii. Feature selection: all features, svm_rfe, Hiton.
  iv. Percent label swap: {0%, 10%, 20%, 50%}.
  v. 100 random swaps of the target labels for each percentage > 0%.
  vi. Sample sizes of 100, 200, 500, and 1000, with equal proportions of 4s and 9s.
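The label encoding, balanced sampling, and label swap in the setup above can be sketched in a few lines of Python. This is a minimal sketch and not the project's MATLAB code: the helper names are hypothetical, and it assumes that a swap flips exactly the requested fraction of labels, chosen uniformly at random without regard to class. The source does not say how the swapped labels are chosen.

```python
import random


def encode_label(digit: int) -> int:
    """Map digits to binary targets as in the task definition: 9 -> 1, 4 -> 0."""
    return {4: 0, 9: 1}[digit]


def balanced_indices(labels, n_total, rng):
    """Pick n_total indices with equal counts of class 0 and class 1."""
    half = n_total // 2
    zeros = [i for i, y in enumerate(labels) if y == 0]
    ones = [i for i, y in enumerate(labels) if y == 1]
    return rng.sample(zeros, half) + rng.sample(ones, half)


def swap_labels(labels, fraction, rng):
    """Flip round(fraction * n) randomly chosen binary labels (assumed scheme)."""
    n_swap = round(fraction * len(labels))
    flipped = list(labels)
    for i in rng.sample(range(len(labels)), n_swap):
        flipped[i] = 1 - flipped[i]
    return flipped


rng = random.Random(0)
digits = [4, 9] * 600  # stand-in for the MNIST digit labels
labels = [encode_label(d) for d in digits]

sample = [labels[i] for i in balanced_indices(labels, 100, rng)]
noisy = swap_labels(sample, 0.10, rng)
print(sum(a != b for a, b in zip(sample, noisy)))  # 10 labels differ
```

Flipping a fixed count rather than flipping each label independently keeps the realized swap rate equal to the nominal percentage in every repetition.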
The experiment steps:
  1. Preprocess the data:
  * run pre.pbs -> pre.m
  2. Run the experiments:
  * run all.sh -> unit.pbs -> block.m -> fs.m & cl.m
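To give a sense of the size of the experiment, the sketch below enumerates the grid of runs implied by the setup. It does not reproduce all.sh, unit.pbs, or block.m, whose contents are not shown here. The identifier strings are placeholders, and the choice of a single run at 0% swap is an assumption.

```python
from itertools import product

SAMPLE_SIZES = (100, 200, 500, 1000)
CLASSIFIERS = ("svm_poly", "svm_rbf", "random_forest")
FEATURE_SELECTORS = ("all", "svm_rfe", "hiton")
SWAP_PERCENTS = (0, 10, 20, 50)
N_REPEATS = 100  # random swaps per setting when the swap percentage is > 0


def experiment_grid():
    """Yield one dict per (sample size, classifier, selector, swap, repeat)."""
    for n, clf, fs, pct in product(
        SAMPLE_SIZES, CLASSIFIERS, FEATURE_SELECTORS, SWAP_PERCENTS
    ):
        # Assumption: one run at 0% swap, since no labels are randomized there.
        repeats = N_REPEATS if pct > 0 else 1
        for rep in range(repeats):
            yield {
                "n": n,
                "classifier": clf,
                "selector": fs,
                "swap_pct": pct,
                "repeat": rep,
            }


runs = list(experiment_grid())
print(len(runs))  # 36 settings x (1 + 3 * 100) runs = 10836
```

A count of this size is one reason to dispatch the work as independent units on a batch scheduler.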
Report for each sample size:
  1. AUC mean and std of classification without label swap.
  2. The percentiles (5% to 95% in steps of 5%) of the AUC distribution for each label swap percentage.
  3. Histogram of the mean AUC distribution for each label swap percentage, together with the mean AUC of classification without label swap.
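The summary statistics in the report can be computed as in the sketch below, which uses only the Python standard library. The function name `summarize_auc` and the synthetic AUC values are illustrative. The "inclusive" interpolation method is an assumption and may differ slightly from the percentile routine used in the original MATLAB code.

```python
import statistics


def summarize_auc(aucs):
    """Return mean, sample std, and the 5%..95% percentiles (step 5%) of AUCs."""
    # n=20 splits the data into 20 equal-probability bins: 19 cut points.
    cuts = statistics.quantiles(aucs, n=20, method="inclusive")
    percentiles = {5 * (k + 1): value for k, value in enumerate(cuts)}
    return statistics.mean(aucs), statistics.stdev(aucs), percentiles


# Synthetic AUC values standing in for 100 label swap repetitions.
aucs = [0.80 + 0.001 * i for i in range(100)]

mean_auc, std_auc, pct = summarize_auc(aucs)
print(f"mean={mean_auc:.4f}  std={std_auc:.4f}")
for level in (5, 50, 95):
    print(f"{level}th percentile: {pct[level]:.4f}")
```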

Observations

1. Performance is always lower with feature selection than with all features.
2. For swap percentages below 50%, the drop in performance is about half of the percent swap.
3. Performance drops as sample size decreases.
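One way to compare the second observation with the null hypothesis is to divide the observed AUC drop by the swap fraction: a ratio near 1.0 would agree with the null hypothesis, while a ratio near 0.5 matches the observation. The helper name and the AUC values in this sketch are hypothetical and are not results from the experiment.

```python
def drop_to_swap_ratio(baseline_auc, degraded_auc, swap_fraction):
    """Observed AUC drop divided by the swap fraction (1.0 = null hypothesis)."""
    if swap_fraction <= 0:
        raise ValueError("swap_fraction must be positive")
    return (baseline_auc - degraded_auc) / swap_fraction


# Hypothetical numbers: AUC falls from 0.98 to 0.88 at a 20% label swap.
ratio = drop_to_swap_ratio(0.98, 0.88, 0.20)
print(f"drop/swap ratio = {ratio:.2f}")
```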

Experiment Results

[Figures: Frame 1 and Frame 2]

The End