WEKA Explorer Presentation
Machine Learning with WEKA
Eibe Frank
Department of Computer Science,
University of Waikato, New Zealand
• WEKA: A Machine Learning Toolkit
• The Explorer
  • Classification and Regression
  • Clustering
  • Association Rules
  • Attribute Selection
  • Data Visualization
• The Experimenter
• The Knowledge Flow GUI
• Conclusions
WEKA: the bird
Copyright: Martin Kramer
4/13/2015
University of Waikato
2
WEKA: the software
• Machine learning/data mining software written in Java (distributed under the GNU General Public License)
• Used for research, education, and applications
• Complements “Data Mining” by Witten & Frank
• Main features:
  • Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
  • Graphical user interfaces (incl. data visualization)
  • Environment for comparing learning algorithms
WEKA: versions
There are several versions of WEKA:
• WEKA 3.0: “book version”, compatible with the description in the data mining book
• WEKA 3.2: “GUI version”, adds graphical user interfaces (the book version is command-line only)
• WEKA 3.3: “development version” with lots of improvements
This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)
WEKA only deals with “flat” files
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
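To make the structure of the file above concrete, here is a minimal sketch of a reader for it, written in plain Python rather than with WEKA's own loaders. The function name `parse_arff` is hypothetical, and the sketch assumes only numeric and nominal attributes, unquoted names, and `?` for missing values, which is all the example uses. The trailing `...` of the slide is left out of the sample text.

```python
ARFF_TEXT = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""


def parse_arff(text):
    """Parse a simple ARFF document into (relation, attributes, rows).

    Hypothetical helper: handles numeric and nominal attributes only.
    A '?' in the data section becomes None (missing value).
    """
    relation = None
    attributes = []  # list of (name, "numeric") or (name, [nominal values])
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        keyword = line.split(None, 1)[0].lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (_, kind), value in zip(attributes, values):
                if value == "?":
                    row.append(None)
                elif kind == "numeric":
                    row.append(float(value))
                else:
                    row.append(value)
            rows.append(row)
        elif keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
```

The result is one "flat" table: every row has exactly one value (or a missing marker) per declared attribute, with no nesting or links between tables.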
Explorer: pre-processing the data
• Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
• Data can also be read from a URL or from an SQL database (using JDBC)
• Pre-processing tools in WEKA are called “filters”
• WEKA contains filters for: discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
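As an illustration of what two of these filters do, the following sketch implements min-max normalization and equal-width discretization for a single numeric attribute in plain Python. It is not WEKA's implementation; the function names and the choice of three bins are assumptions made for the example, and the cholesterol values are the ones from the ARFF sample.

```python
def normalize(values):
    """Rescale numeric values linearly to [0, 1]; None (missing) stays None."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    span = hi - lo
    return [
        None if v is None else (0.0 if span == 0 else (v - lo) / span)
        for v in values
    ]


def discretize(values, bins):
    """Equal-width binning: map each value to a bin index in 0..bins-1."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins
    result = []
    for v in values:
        if v is None:
            result.append(None)      # missing values pass through
        elif width == 0:
            result.append(0)         # constant attribute: a single bin
        else:
            # The maximum value would fall just past the last bin, so clamp it.
            result.append(min(int((v - lo) / width), bins - 1))
    return result


cholesterol = [233.0, 286.0, 229.0, None]
scaled = normalize(cholesterol)
binned = discretize(cholesterol, bins=3)
```

Both functions take a column and return a column of the same length, which is the general shape of an attribute filter: the data set goes in, a transformed data set comes out.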
Explorer: building “classifiers”
• Classifiers in WEKA are models for predicting nominal or numeric quantities
• Implemented learning schemes include: decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
• “Meta”-classifiers include: bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
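To give one of the listed scheme families a concrete shape, here is a minimal sketch of an instance-based classifier (k-nearest neighbours) predicting a nominal class. It is a generic textbook version, not WEKA's code; the function name, the Euclidean distance, and the toy training data are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a class by majority vote among the k nearest training instances.

    train: list of (feature_vector, label) pairs; distance is Euclidean.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical two-class toy data, invented for this example.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
near_a = knn_predict(train, (1.1, 1.0))
near_b = knn_predict(train, (5.0, 5.1))
```

An instance-based scheme stores the training data and defers all work to prediction time, in contrast to schemes such as decision trees that build an explicit model first.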
Explorer: clustering data
• WEKA contains “clusterers” for finding groups of similar instances in a dataset
• Implemented schemes are: k-Means, EM, Cobweb, X-means, FarthestFirst
• Clusters can be visualized and compared to “true” clusters (if given)
• Evaluation is based on log-likelihood if the clustering scheme produces a probability distribution
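The first scheme in the list, k-means, can be sketched in a few lines. The version below is the standard assign-then-update loop in plain Python, not WEKA's implementation; the function name, the explicit initial centroids, and the sample points are assumptions chosen so the run is deterministic.

```python
import math


def k_means(points, centroids, iterations=10):
    """Standard k-means loop from given initial centroids.

    Returns (centroids, assignment), where assignment[i] is the index of
    the centroid closest to points[i].
    """
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:     # converged
            break
        centroids = new_centroids
    return centroids, assignment


# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, assignment = k_means(points, [(1.0, 1.0), (8.0, 8.0)])
```

When “true” cluster labels are available, the returned assignment is what would be compared against them.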
Explorer: finding associations
• WEKA contains an implementation of the Apriori algorithm for learning association rules
  • Works only with discrete data
• Can identify statistical dependencies between groups of attributes:
  • milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
• Apriori can compute all rules that have a given minimum support and exceed a given confidence
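The notions of support and confidence can be made concrete with a small level-wise search in the style of Apriori. This is a simplified sketch in plain Python, not WEKA's implementation: the function names and the five transactions are invented for the example, support is counted as a number of transactions, and a rule is kept when its confidence is at least the given threshold.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search for itemsets contained in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support]
    frequent = {}
    while level:
        for s in level:
            frequent[s] = support(s)
        size = len(level[0]) + 1
        # Join step: combine frequent sets into candidates one item larger.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every (size-1)-subset must itself be frequent.
        level = [
            c for c in candidates
            if all(frozenset(sub) in frequent
                   for sub in combinations(c, size - 1))
            and support(c) >= min_support
        ]
    return frequent


def rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules lhs => rhs."""
    found = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = supp / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, supp, confidence))
    return found


# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "eggs"},
    {"bread", "eggs"},
]
frequent = frequent_itemsets(baskets, min_support=2)
found = rules(frequent, min_confidence=0.6)
```

In this toy data the rule milk, butter ⇒ bread, eggs has support 2 (two baskets contain all four items) and confidence 2/3, because three baskets contain milk and butter.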
Explorer: attribute selection
• Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
• Attribute selection methods contain two parts:
  • A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  • An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
• Very flexible: WEKA allows (almost) arbitrary combinations of these two
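One simple pairing of the two parts is “ranking” as the search method with information gain as the evaluation method. The sketch below shows that combination for nominal attributes in plain Python; it is not WEKA's code, the function names are hypothetical, and the columns are the four complete nominal values from the heart-disease sample.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in class entropy obtained by splitting on a nominal attribute."""
    total = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [lab for v, lab in zip(column, labels) if v == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def rank_attributes(columns, labels):
    """Evaluate each attribute on its own and sort by decreasing gain."""
    scored = [(information_gain(col, labels), name)
              for name, col in columns.items()]
    return sorted(scored, reverse=True)


columns = {
    "sex": ["male", "male", "male", "female"],
    "exercise_induced_angina": ["no", "yes", "yes", "no"],
}
labels = ["not_present", "present", "present", "not_present"]
ranking = rank_attributes(columns, labels)
```

On these four rows exercise_induced_angina separates the classes perfectly and therefore ranks above sex. A subset search such as best-first would score whole attribute subsets with the evaluator instead of single attributes.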
Explorer: data visualization
• Visualization is very useful in practice: e.g., it helps to determine the difficulty of the learning problem
• WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
  • To do: rotating 3-d visualizations (Xgobi-style)
• Color-coded class values
• “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
• “Zoom-in” function
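The idea behind jitter can be shown in a few lines: a small random offset is added to each plotted coordinate so that points with identical values no longer sit exactly on top of each other. This is a generic sketch, not how WEKA implements the option; the function name, the uniform noise, and the fixed seed are assumptions.

```python
import random


def jitter(values, amount, seed=0):
    """Add uniform noise in [-amount, +amount] to each plotting coordinate."""
    rng = random.Random(seed)  # fixed seed so the plot is reproducible
    return [v + rng.uniform(-amount, amount) for v in values]


# A nominal attribute plotted by value index: many points share a position.
sex_index = [1, 1, 1, 0]  # hypothetical encoding: female = 0, male = 1
plotted = jitter(sex_index, amount=0.2)
```

Only the plotted positions change; the underlying data values stay as they are.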
Performing experiments
• The Experimenter makes it easy to compare the performance of different learning schemes
• For classification and regression problems
• Results can be written to a file or database
• Evaluation options: cross-validation, learning curve, hold-out
• Can also iterate over different parameter settings
• Significance testing is built in!
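Two ingredients of such a comparison, cross-validation splits and a significance statistic over per-fold results, can be sketched as follows. This is plain Python with hypothetical function names; the statistic is an ordinary paired t statistic, which is not necessarily the exact test the Experimenter applies, and the accuracies are invented.

```python
import math


def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k-fold cross-validation."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


def paired_t_statistic(a, b):
    """t statistic for paired samples, e.g. per-fold accuracies of two schemes.

    Assumes the differences are not all identical (non-zero variance).
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(variance / n)


splits = list(k_fold_indices(6, 3))

# Hypothetical per-fold accuracies of two learning schemes on the same folds.
scheme_a = [0.90, 0.80, 0.85]
scheme_b = [0.80, 0.70, 0.80]
t = paired_t_statistic(scheme_a, scheme_b)
```

Each instance appears in exactly one test fold, and both schemes are evaluated on the same folds so that their results can be compared pairwise.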
The Knowledge Flow GUI
• New graphical user interface for WEKA
• JavaBeans-based interface for setting up and running machine learning experiments
• Data sources, classifiers, etc. are beans and can be connected graphically
• Data “flows” through components: e.g., “data source” -> “filter” -> “classifier” -> “evaluator”
• Layouts can be saved and loaded again later
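The flow of data through connected components can be imitated with chained generators. The sketch below is only an analogy in plain Python and does not use JavaBeans or WEKA; all component names are hypothetical, and the “classifier” is a fixed rule standing in for a learned model. The rows are the four instances from the ARFF sample.

```python
def data_source(rows):
    """Source component: emits instances one at a time."""
    for row in rows:
        yield row


def drop_missing(instances):
    """Filter component: discards instances with a missing value (None)."""
    for inst in instances:
        if None not in inst:
            yield inst


def rule_classifier(instances):
    """Classifier component: a fixed, hypothetical rule, not a learned model.

    Emits (actual, predicted) pairs; the class is the last field.
    """
    for *features, actual in instances:
        predicted = "present" if features[-1] == "yes" else "not_present"
        yield actual, predicted


def evaluator(pairs):
    """Evaluator component: fraction of correct predictions."""
    pairs = list(pairs)
    return sum(1 for actual, predicted in pairs if actual == predicted) / len(pairs)


rows = [
    (63, "male", "typ_angina", 233, "no", "not_present"),
    (67, "male", "asympt", 286, "yes", "present"),
    (67, "male", "asympt", 229, "yes", "present"),
    (38, "female", "non_anginal", None, "no", "not_present"),
]

# "data source" -> "filter" -> "classifier" -> "evaluator"
accuracy = evaluator(rule_classifier(drop_missing(data_source(rows))))
```

Each component consumes the output of the one before it, so swapping the filter or the classifier changes one link in the chain without touching the rest.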
Conclusion: try it yourself!
• WEKA is available at http://www.cs.waikato.ac.nz/ml/weka
  • The site also has a list of projects based on WEKA
• WEKA contributors: Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang