PPT - Fordham University Computer and Information Sciences



Machine Learning with WEKA

• WEKA: A Machine Learning Toolkit
• The Explorer
  • Classification and Regression
  • Clustering
  • Association Rules
  • Attribute Selection
  • Data Visualization
• The Experimenter
• The Knowledge Flow GUI
• Conclusions

Eibe Frank
Department of Computer Science,
University of Waikato, New Zealand
WEKA: the bird
Copyright: Martin Kramer
4/10/2015
University of Waikato
2
WEKA: the software
• Machine learning/data mining software written in Java (distributed under the GNU General Public License)
• Used for research, education, and applications
• Complements “Data Mining” by Witten & Frank
• Main features:
  • Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
  • Graphical user interfaces (incl. data visualization)
  • Environment for comparing learning algorithms

WEKA: versions
• There are several versions of WEKA:
  • WEKA 3.0: “book version” compatible with the description in the data mining book
  • WEKA 3.2: “GUI version” adds graphical user interfaces (the book version is command-line only)
  • WEKA 3.3: “development version” with lots of improvements
• This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)
WEKA only deals with “flat” files
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
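The header declares each attribute, either as numeric or as a set of nominal values. The @data section then lists one instance per line, with “?” marking a missing value. The Java sketch below shows one way to map such a file onto a table of numbers. It is not WEKA's own ARFF loader. The class ArffSketch and its fields are hypothetical, and the sketch handles only a subset of the format: numeric and nominal attributes, unquoted values and dense (non-sparse) rows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal ARFF reader (not WEKA's loader): numeric and nominal
// attributes only, unquoted values, dense rows.
class ArffSketch {
    static final class Attribute {
        final String name;
        final List<String> nominalValues; // null for a numeric attribute

        Attribute(String name, List<String> nominalValues) {
            this.name = name;
            this.nominalValues = nominalValues;
        }

        boolean isNumeric() { return nominalValues == null; }
    }

    final List<Attribute> attributes = new ArrayList<>();
    // One double[] per instance: a nominal value becomes its index in the
    // declared value list, and "?" (missing) becomes Double.NaN.
    final List<double[]> rows = new ArrayList<>();

    static ArffSketch parse(String text) {
        ArffSketch arff = new ArffSketch();
        boolean inData = false;
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) continue; // blank line or comment
            String lower = line.toLowerCase();
            if (lower.startsWith("@relation")) continue;
            if (lower.startsWith("@data")) { inData = true; continue; }
            if (lower.startsWith("@attribute")) {
                // Split "name type" once, so "{ female, male}" stays together.
                String[] parts = line.substring("@attribute".length()).trim().split("\\s+", 2);
                String type = parts[1].trim();
                if (type.startsWith("{")) {
                    List<String> values = new ArrayList<>();
                    for (String v : type.substring(1, type.lastIndexOf('}')).split(",")) {
                        values.add(v.trim());
                    }
                    arff.attributes.add(new Attribute(parts[0], values));
                } else if (type.equalsIgnoreCase("numeric") || type.equalsIgnoreCase("real")
                        || type.equalsIgnoreCase("integer")) {
                    arff.attributes.add(new Attribute(parts[0], null));
                } else {
                    throw new IllegalArgumentException("Unsupported type: " + type);
                }
            } else if (inData) {
                String[] cells = line.split(",");
                if (cells.length != arff.attributes.size()) {
                    throw new IllegalArgumentException("Wrong number of values: " + line);
                }
                double[] row = new double[cells.length];
                for (int i = 0; i < cells.length; i++) {
                    String cell = cells[i].trim();
                    Attribute att = arff.attributes.get(i);
                    if (cell.equals("?")) {
                        row[i] = Double.NaN;
                    } else if (att.isNumeric()) {
                        row[i] = Double.parseDouble(cell);
                    } else {
                        int index = att.nominalValues.indexOf(cell);
                        if (index < 0) throw new IllegalArgumentException("Undeclared value: " + cell);
                        row[i] = index;
                    }
                }
                arff.rows.add(row);
            }
        }
        return arff;
    }
}
```

Take the first data line above, 63,male,typ_angina,233,no,not_present. With the header shown, this sketch turns it into the row {63, 1, 0, 233, 0, 1}, where each nominal value is replaced by its position in the declared list.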
Explorer: pre-processing the data
• Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
• Data can also be read from a URL or from an SQL database (using JDBC)
• Pre-processing tools in WEKA are called “filters”
• WEKA contains filters for:
  • Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
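The sketch below applies two of the listed filter types to a single numeric column:

- **Min-max normalization** rescales the values into [0, 1].
- **Equal-width discretization** splits the range [min, max] into a fixed number of equally wide intervals.

It illustrates the kind of transformation these filters perform and is not WEKA's filter code. The class FilterSketch is hypothetical, and so are two of its conventions: missing values are skipped when computing the range, and a missing value gets bin index -1.

```java
// Hypothetical sketches of two unsupervised filters on one numeric column.
// Missing values (NaN) are ignored when computing min/max.
class FilterSketch {
    // Min-max normalization: linearly rescale the column into [0, 1].
    static double[] normalize(double[] column) {
        double[] range = minMax(column);
        double width = range[1] - range[0];
        double[] out = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            double v = column[i];
            // Missing values pass through; a constant column maps to 0.
            out[i] = Double.isNaN(v) ? v : (width == 0 ? 0.0 : (v - range[0]) / width);
        }
        return out;
    }

    // Equal-width discretization: split [min, max] into `bins` intervals of the
    // same width and replace each value by its interval index (-1 if missing).
    static int[] discretize(double[] column, int bins) {
        double[] range = minMax(column);
        double binWidth = (range[1] - range[0]) / bins;
        int[] out = new int[column.length];
        for (int i = 0; i < column.length; i++) {
            double v = column[i];
            if (Double.isNaN(v)) { out[i] = -1; continue; }
            int bin = binWidth == 0 ? 0 : (int) ((v - range[0]) / binWidth);
            out[i] = Math.min(bin, bins - 1); // the maximum belongs to the last bin
        }
        return out;
    }

    private static double[] minMax(double[] column) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double v : column) {
            if (Double.isNaN(v)) continue;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new double[] {min, max};
    }
}
```

For example, the cholesterol values 229, 233, 286 and 260 span 57 units, so three bins are 19 units wide and the values fall into bins 0, 0, 2 and 1.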
Explorer: building “classifiers”
• Classifiers in WEKA are models for predicting nominal or numeric quantities
• Implemented learning schemes include:
  • Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
• “Meta”-classifiers include:
  • Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
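Instance-based classifiers are the simplest of the listed schemes to sketch. They store the training instances and predict the majority class among the k nearest ones. The Java sketch below assumes:

- numeric attributes only,
- Euclidean distance,
- ties in the vote going to the lowest class index.

The class KnnSketch is hypothetical and is not WEKA's instance-based learner. In practice numeric attributes are usually normalized first so that no single attribute dominates the distance.

```java
import java.util.Arrays;

// Hypothetical k-nearest-neighbour classifier: numeric attributes,
// Euclidean distance, majority vote (ties go to the lowest class index).
class KnnSketch {
    private final double[][] train;
    private final int[] labels;
    private final int numClasses;
    private final int k;

    KnnSketch(double[][] train, int[] labels, int numClasses, int k) {
        this.train = train;
        this.labels = labels;
        this.numClasses = numClasses;
        this.k = k;
    }

    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    int classify(double[] query) {
        // Compute every distance once, then sort instance indices by distance.
        double[] dist = new double[train.length];
        Integer[] order = new Integer[train.length];
        for (int i = 0; i < train.length; i++) {
            dist[i] = distance(train[i], query);
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(dist[a], dist[b]));
        // Let the k nearest training instances vote.
        int[] votes = new int[numClasses];
        for (int j = 0; j < Math.min(k, order.length); j++) votes[labels[order[j]]]++;
        int best = 0;
        for (int c = 1; c < numClasses; c++) if (votes[c] > votes[best]) best = c;
        return best;
    }
}
```

Suppose the training points are (0, 0) and (0, 1) labelled 0, and (5, 5) and (6, 5) labelled 1. With k = 3, the query (0.5, 0.5) is classified as 0 and the query (5.5, 5) as 1.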
Explorer: clustering data
• WEKA contains “clusterers” for finding groups of similar instances in a dataset
• Implemented schemes are:
  • k-Means, EM, Cobweb, X-means, FarthestFirst
• Clusters can be visualized and compared to “true” clusters (if given)
• Evaluation is based on log-likelihood if the clustering scheme produces a probability distribution
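k-Means, the first scheme in the list, alternates two steps: assign each instance to its nearest centroid, then move each centroid to the mean of its instances. The Java sketch below is not WEKA's clusterer code, and the class KMeansSketch is hypothetical. It assumes:

- numeric attributes and Euclidean distance,
- the first k instances as starting centroids (implementations typically seed them randomly instead),
- stopping as soon as no assignment changes.

```java
import java.util.Arrays;

// Hypothetical k-means sketch. Requires data.length >= k; the first k
// instances serve as the initial centroids.
class KMeansSketch {
    static int[] cluster(double[][] data, int k, int maxIterations) {
        int dims = data[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = data[c].clone();
        int[] assignment = new int[data.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: each instance joins its nearest centroid.
            for (int i = 0; i < data.length; i++) {
                int best = 0;
                double bestDist = Double.POSITIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double d = squaredDistance(data[i], centroids[c]);
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                if (assignment[i] != best) { assignment[i] = best; changed = true; }
            }
            if (!changed) break; // converged
            // Update step: move each centroid to the mean of its members.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int i = 0; i < data.length; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dims; d++) sums[assignment[i]][d] += data[i][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // an empty cluster keeps its old centroid
                for (int d = 0; d < dims; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        return assignment;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```

For example, clustering the points (1, 1), (9, 9), (1.5, 1), (9, 8.5) and (0.5, 1.5) with k = 2 yields the assignment [0, 1, 0, 1, 0].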
Explorer: finding associations
• WEKA contains an implementation of the Apriori algorithm for learning association rules
  • Works only with discrete data
• Can identify statistical dependencies between groups of attributes:
  • milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
• Apriori can compute all rules that have a given minimum support and exceed a given confidence
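The Java sketch below computes the two quantities that Apriori works with and the level-wise search behind it:

- **Support** of an item set is the number of transactions that contain all of its items.
- **Confidence** of a rule premise ⇒ conclusion is support(premise ∪ conclusion) / support(premise).
- **The search** keeps only frequent item sets at each size. It builds candidates one item larger from them and discards any candidate with an infrequent subset.

It assumes support is an absolute count of transactions, which is how the example's support of 2000 reads. The class AprioriSketch is hypothetical and is not WEKA's Apriori implementation. Rule generation from the frequent sets is left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical Apriori sketch over transactions of discrete items.
class AprioriSketch {
    // Support: number of transactions containing every item of the set.
    static int support(List<Set<String>> transactions, Set<String> itemset) {
        int count = 0;
        for (Set<String> t : transactions) if (t.containsAll(itemset)) count++;
        return count;
    }

    // Confidence of premise => conclusion.
    static double confidence(List<Set<String>> transactions, Set<String> premise, Set<String> conclusion) {
        Set<String> union = new HashSet<>(premise);
        union.addAll(conclusion);
        int premiseSupport = support(transactions, premise);
        return premiseSupport == 0 ? 0.0 : (double) support(transactions, union) / premiseSupport;
    }

    // Level-wise search for all item sets with support >= minSupport.
    static List<Set<String>> frequentItemsets(List<Set<String>> transactions, int minSupport) {
        List<Set<String>> result = new ArrayList<>();
        Set<Set<String>> level = new HashSet<>();
        for (Set<String> t : transactions)
            for (String item : t) level.add(Set.of(item)); // candidate 1-item sets
        while (!level.isEmpty()) {
            Set<Set<String>> frequent = new HashSet<>();
            for (Set<String> candidate : level)
                if (support(transactions, candidate) >= minSupport) frequent.add(candidate);
            result.addAll(frequent);
            // Join frequent sets that differ in one item; prune by the Apriori property.
            Set<Set<String>> next = new HashSet<>();
            for (Set<String> a : frequent)
                for (Set<String> b : frequent) {
                    Set<String> union = new HashSet<>(a);
                    union.addAll(b);
                    if (union.size() == a.size() + 1 && allSubsetsFrequent(union, frequent)) next.add(union);
                }
            level = next;
        }
        return result;
    }

    // Every subset with one item removed must itself be frequent.
    static boolean allSubsetsFrequent(Set<String> candidate, Set<Set<String>> frequent) {
        for (String item : candidate) {
            Set<String> subset = new HashSet<>(candidate);
            subset.remove(item);
            if (!frequent.contains(subset)) return false;
        }
        return true;
    }
}
```

Consider four baskets: {milk, butter, bread, eggs}, {milk, butter, bread}, {milk, bread} and {butter, eggs}. The rule milk, butter ⇒ bread then has support 2 and confidence 1.0, and butter ⇒ eggs has confidence 2/3.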
Explorer: attribute selection
• Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
• Attribute selection methods contain two parts:
  • A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  • An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
• Very flexible: WEKA allows (almost) arbitrary combinations of these two
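The Java sketch below pairs two of the listed options: a ranking search with an information-gain evaluator. Information gain measures how much knowing attribute A reduces the entropy of the class C:

IG(A) = H(C) − Σ_v (n_v / n) · H(C | A = v)

Here n_v is the number of instances with value v, and entropy is measured in bits. The class InfoGainSketch is hypothetical and is not WEKA's evaluator code. It assumes nominal attributes and class values encoded as integer indices, with the data stored column by column.

```java
import java.util.Arrays;

// Hypothetical information-gain evaluator plus ranking search.
// attributes[a][i] is the value index of attribute a for instance i.
class InfoGainSketch {
    // Entropy (in bits) of a class distribution given as counts.
    static double entropy(int[] counts, int total) {
        double h = 0;
        for (int c : counts) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    static double infoGain(int[] attribute, int numValues, int[] cls, int numClasses) {
        int n = cls.length;
        int[] classCounts = new int[numClasses];
        int[] valueCounts = new int[numValues];
        int[][] joint = new int[numValues][numClasses];
        for (int i = 0; i < n; i++) {
            classCounts[cls[i]]++;
            valueCounts[attribute[i]]++;
            joint[attribute[i]][cls[i]]++;
        }
        // Expected class entropy after splitting on the attribute.
        double conditional = 0;
        for (int v = 0; v < numValues; v++) {
            if (valueCounts[v] == 0) continue;
            conditional += (double) valueCounts[v] / n * entropy(joint[v], valueCounts[v]);
        }
        return entropy(classCounts, n) - conditional;
    }

    // Ranking search: attribute indices ordered by decreasing information gain.
    static Integer[] rank(int[][] attributes, int[] numValues, int[] cls, int numClasses) {
        double[] gains = new double[attributes.length];
        Integer[] order = new Integer[attributes.length];
        for (int a = 0; a < attributes.length; a++) {
            gains[a] = infoGain(attributes[a], numValues[a], cls, numClasses);
            order[a] = a;
        }
        Arrays.sort(order, (x, y) -> Double.compare(gains[y], gains[x]));
        return order;
    }
}
```

Take a two-class problem with equally frequent classes. An attribute that matches the class exactly gains 1 bit, while an attribute whose values are spread evenly across both classes gains 0 bits and is ranked last.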
Explorer: data visualization
• Visualization is very useful in practice: e.g., it helps to determine the difficulty of the learning problem
• WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
  • To do: rotating 3-d visualizations (XGobi-style)
  • Color-coded class values
  • “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
  • “Zoom-in” function
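Jitter addresses a plotting problem. Nominal values are drawn at a few fixed positions, so many instances land on exactly the same point and hide one another. Adding a small random offset to each plotted coordinate spreads them out. The Java sketch below assumes a uniform offset of at most ±amount and a fixed random seed, so the same plot is reproducible. The class JitterSketch and its parameters are hypothetical and do not describe how WEKA's plot implements the option.

```java
import java.util.Random;

// Hypothetical jitter: displace each plotted coordinate by a uniform
// random offset in [-amount, amount) so coinciding points separate.
class JitterSketch {
    static double[] jitter(double[] positions, double amount, long seed) {
        Random random = new Random(seed);
        double[] out = new double[positions.length];
        for (int i = 0; i < positions.length; i++) {
            // nextDouble() lies in [0, 1), so 2 * nextDouble() - 1 lies in [-1, 1).
            out[i] = positions[i] + (random.nextDouble() * 2 - 1) * amount;
        }
        return out;
    }
}
```

In a 2-d plot both coordinates would be jittered independently; the offset only changes where points are drawn, never the underlying data.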
Conclusion: try it yourself!
• WEKA is available at http://www.cs.waikato.ac.nz/ml/weka
• Also has a list of projects based on WEKA
• WEKA contributors:
  Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang