name: inverse
layout: true
class: center, middle, inverse

---
name: cover

# A Comparison of Classification Methods

QING Pei
[edwardtoday@gmail.com](mailto:edwardtoday@gmail.com)
.footnote[January 16, 2013 @ PolyU BRC]

---
name: agenda
layout: false

.left-column[
## Agenda
]
.right-column[
1. Data Set
2. Experiment
3. Discussion
4. Future Work
]

---
.left-column[
## Agenda
## Data Set
]
.right-column[
<table>
  <tr>
    <th></th>
    <th colspan="2">Hong Kong</th>
    <th colspan="2">Guangzhou</th>
    <th>Total</th>
  </tr>
  <tr>
    <th></th>
    <th>healthy</th><th>diabetes</th>
    <th>healthy</th><th>diabetes</th>
    <th></th>
  </tr>
  <tr><td>ultrasound</td><td>14</td><td>257</td><td>605</td><td>0</td><td>876</td></tr>
  <tr><td>photoelectric</td><td>14</td><td>295</td><td>322</td><td>0</td><td>631</td></tr>
  <tr><td>pressure</td><td>19</td><td>320</td><td>269</td><td>0</td><td>608</td></tr>
  <tr><td>odor</td><td>131</td><td>297</td><td>310</td><td>117</td><td>855</td></tr>
  <tr><td>face</td><td>0</td><td>284</td><td>142</td><td>0</td><td>426</td></tr>
  <tr><td>tongue</td><td>0</td><td>296</td><td>130</td><td>0</td><td>426</td></tr>
</table>
]

---
.left-column[
## Agenda
## Data Set
]
.right-column[
### Features

* Ultrasound
> 15 (EMD) + 14 (WAVEP) + 8 (wavelet) + 1 (ApEn) = 38
* Photo-electric
> 1 (max) + 1 (max gap) + 1 (std) + 1 (Eigen value) = 4
* Pressure
> 3 (period) + 6 (entropy) = 9
* Odor
> 707 (geometry) + 196 (wavelet) + 28 (phase) = 931
* Face
> 6 (color) * 4 (block) = 24
* Tongue
> 12 (color) + 9 (texture) + 13 (geometry) + 7 (others) = 41
]

---
.left-column[
## Agenda
## Data Set
## Experiment
]
.right-column[
### Experiment Settings

* 22 algorithms
* Cross-validation
  - 10-fold
* 7 data sets
  - 6 independent + merged pulse features
]

---
.left-column[
## Agenda
## Data Set
## Experiment
]
.right-column[
### Tested Algorithms
<table>
  <tr><th>ID</th><th>Algorithm</th><th>ID</th><th>Algorithm</th></tr>
  <tr><td>1</td><td>Simple Logistic</td><td>12</td><td>Ridor</td></tr>
  <tr><td>2</td><td>SMO</td><td>13</td><td>ADTree</td></tr>
  <tr><td>3</td><td>Voted Perceptron</td><td>14</td><td>FT</td></tr>
  <tr><td>4</td><td>IBk</td><td>15</td><td>J48</td></tr>
  <tr><td>5</td><td>LWL</td><td>16</td><td>Random Forest</td></tr>
  <tr><td>6</td><td>AdaBoostM1</td><td>17</td><td>REPTree</td></tr>
  <tr><td>7</td><td>Attribute Selected Classifier</td><td>18</td><td>SimpleCart</td></tr>
  <tr><td>8</td><td>Random Committee</td><td>19</td><td>Bayesian Logistic Regression</td></tr>
  <tr><td>9</td><td>Conjunctive Rule</td><td>20</td><td>Naive Bayes</td></tr>
  <tr><td>10</td><td>JRip</td><td>21</td><td>Bayes Net</td></tr>
  <tr><td>11</td><td>PART</td><td>22</td><td>LibSVM</td></tr>
</table>
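The 10-fold protocol applied to every algorithm above can be sketched in a few lines. A minimal, self-contained Python illustration (the actual experiments presumably ran inside a toolkit such as WEKA, which these algorithm names match), using a 1-nearest-neighbour classifier in the spirit of IBk:

```python
import random

def ten_fold_cv(xs, ys, train_and_predict, k=10, seed=0):
    """Shuffle once, split the indices into k folds, return per-fold accuracy."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train_x = [xs[i] for i in idx if i not in held_out]
        train_y = [ys[i] for i in idx if i not in held_out]
        preds = train_and_predict(train_x, train_y, [xs[i] for i in fold])
        truth = [ys[i] for i in fold]
        accuracies.append(sum(p == t for p, t in zip(preds, truth)) / len(fold))
    return accuracies

def one_nn(train_x, train_y, test_x):
    """1-nearest-neighbour (IBk with k=1): label of the closest training point."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [train_y[min(range(len(train_x)),
                        key=lambda i: sq_dist(x, train_x[i]))]
            for x in test_x]
```

The mean of the ten returned accuracies corresponds to the "percent correct" figure reported per algorithm and data set.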
]

---
.left-column[
## Agenda
## Data Set
## Experiment
]
.right-column[
### Comparison

* Accuracy
  - Percent Correct
  - False Positive Rate
  - False Negative Rate
* Time Consumption
  - Training Time
  - Testing Time
* Model Size
]

---
.left-column[
## Agenda
## Data Set
## Experiment
]
.right-column[
![](percent_correct.svg)
]

---
.left-column[
## Agenda
## Data Set
## Experiment
]
.right-column[
![](false_positive.svg)
]

---
.left-column[
## Agenda
## Data Set
## Experiment
]
.right-column[
![](false_negative.svg)
]

---
.left-column[
## Agenda
## Data Set
## Experiment
]
.right-column[
![](cpu_time_training.svg)
]

---
.left-column[
## Agenda
## Data Set
## Experiment
]
.right-column[
![](cpu_time_testing.svg)
]

---
.left-column[
## Agenda
## Data Set
## Experiment
]
.right-column[
![](model_size.svg)
]

---
.left-column[
## Agenda
## Data Set
## Experiment
## Discussion
]
.right-column[
### Algorithm Matters

* Different algorithms lead to different
  - Accuracy
  - Speed
  - Time consumption
  - Storage space
* And the difference can be dramatic
]

---
.left-column[
## Agenda
## Data Set
## Experiment
## Discussion
]
.right-column[
### Features Are Most Important

* The most important factor in classification performance is
  - the input feature vectors
* Length makes a difference
  - Not too long (odor: 931, over-fitting)
  - Not too short either (photo-electric: 4, too little information to make decisions)
]

---
.left-column[
## Agenda
## Data Set
## Experiment
## Discussion
]
.right-column[
### Over-fitting
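The odor set (931 features for 855 samples) is the example of this risk: with far more features than informative structure, a classifier can memorise its training data while learning nothing that generalises. A minimal synthetic sketch of the effect (purely random data, not the actual odor features):

```python
import random

rng = random.Random(42)

def one_nn_accuracy(train, test):
    """Accuracy of a 1-nearest-neighbour classifier (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    hits = 0
    for x, y in test:
        j = min(range(len(train)), key=lambda i: sq_dist(x, train[i][0]))
        hits += (train[j][1] == y)
    return hits / len(test)

def sample():
    # 931 random features with a random label: there is nothing to learn.
    return ([rng.gauss(0.0, 1.0) for _ in range(931)], rng.randint(0, 1))

train = [sample() for _ in range(60)]
test = [sample() for _ in range(60)]

train_acc = one_nn_accuracy(train, train)  # every point is its own neighbour
test_acc = one_nn_accuracy(train, test)    # stays near chance level
```

The memorising classifier scores perfectly on its own training data yet no better than a coin flip on fresh samples, which is exactly the gap cross-validation is designed to expose.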
]

---
.left-column[
## Agenda
## Data Set
## Experiment
## Discussion
]
.right-column[
### Current Data Set Is Biased

* Guangzhou samples are mostly healthy.
* Hong Kong samples are mostly unhealthy.
* Does the difference come from other uncontrolled factors?
  - We cannot tell with the currently available data.
]

---
.left-column[
## Agenda
## Data Set
## Experiment
## Discussion
## Future Work
]
.right-column[
* Collect more unbiased samples.
* Find an algorithm that works best with our type of data.
  - Explore why it fits so well.
  - Find a way to further improve the results of our specific application.
]

---
name: last-page
template: inverse

## Thank you.

.footnote[Slideshow created using [remark](http://github.com/gnab/remark).]