Machine Learning with WEKA
Eibe Frank
Department of Computer Science, University of Waikato, New Zealand

WEKA: A Machine Learning Toolkit
The Explorer
• Classification and Regression
• Clustering
• Association Rules
• Attribute Selection
• Data Visualization
The Experimenter
The Knowledge Flow GUI
Conclusions
WEKA: the bird
WEKA 3.3
5/6/2020
Copyright: Martin Kramer
University of Waikato 2
WEKA: the software
Machine learning/data mining software written in Java (distributed under the GNU General Public License)
Used for research, education, and applications
Complements “Data Mining” by Witten & Frank
Main features:
• Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
• Graphical user interfaces (incl. data visualization)
• Environment for comparing learning algorithms
WEKA: versions
There are several versions of WEKA:
• WEKA 3.0: “book version”, compatible with the description in the data mining book
• WEKA 3.2: “GUI version”, adds graphical user interfaces (the book version is command-line only)
• WEKA 3.3: “development version” with lots of improvements
This talk is based on the latest snapshot of WEKA 3.5.6
Launching WEKA (1)
Program
LogWindow
Opens a log window that captures all that is printed to stdout or stderr. Useful for environments like MS Windows, where WEKA is normally not started from a terminal.
Exit
Closes WEKA.
Launching WEKA (2)
Applications
Lists the main applications within WEKA
Explorer
An environment for exploring data with WEKA (the rest of this documentation deals with this application in more detail).
Experimenter
An environment for performing experiments and conducting statistical tests between learning schemes.
KnowledgeFlow
This environment supports essentially the same functions as the Explorer but with a drag-and-drop interface. One advantage is that it supports incremental learning.
SimpleCLI
Provides a simple command-line interface that allows direct execution of WEKA commands for operating systems that do not provide their own command line interface.
Launching WEKA (3)
Tools
Other useful applications.
ArffViewer
An MDI (“multiple document interface”) application for viewing ARFF files in spreadsheet format.
SqlViewer
Represents an SQL worksheet, for querying databases via JDBC.
EnsembleLibrary
An interface for generating setups for Ensemble Selection [5] (a contribution by Robert Jung and David Michael from Cornell University, Ithaca, NY, USA).
Launching WEKA (4)
Visualization
Ways of visualizing data with WEKA
Plot
For plotting a 2D plot of a dataset.
ROC
Displays a previously saved ROC (receiver operating characteristic) curve (true positive rate vs false positive rate).
TreeVisualizer
For displaying directed graphs, e.g., a decision tree.
GraphVisualizer
Visualizes XML BIF or DOT format graphs, e.g., for Bayesian networks.
BoundaryVisualizer
Allows the visualization of classifier decision boundaries in two dimensions.
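The ROC entry above describes a curve of true positive rate against false positive rate. As a rough illustration of what such a curve contains, the following Python sketch computes its points from scored predictions. It is not WEKA code and does not use any WEKA API; the function name `roc_points` and the example scores are made up for illustration.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    scores: predicted score for the positive class, one per instance.
    labels: True for actual positives, False for actual negatives.
    (Hypothetical helper for illustration; not part of WEKA.)
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative instance")

    # Sort by descending score: lowering the threshold admits instances in this order.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, is_positive) in enumerate(ranked):
        if is_positive:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last instance of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


# Made-up example: four instances, two positive and two negative.
example = roc_points([0.9, 0.8, 0.4, 0.3], [True, False, True, False])
print(example)
```

Each point corresponds to one decision threshold; a classifier that ranks every positive above every negative would reach the point (0.0, 1.0).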
Launching WEKA (5)
Windows
All open windows are listed here.
Minimize
Minimizes all current windows.
Restore
Restores all minimized windows again.
Launching WEKA (6)
Help
Online resources for WEKA can be found here.
Weka homepage
Opens a browser window with WEKA’s homepage.
Online documentation
Directs to the WekaDoc Wiki.
HOWTOs, code snippets, etc.
The general WekaWiki, containing lots of examples and HOWTOs around the development and use of WEKA.
Weka on Sourceforge
WEKA’s project homepage on Sourceforge.net.
SystemInfo
Lists some internals about the Java/WEKA environment, e.g., the CLASSPATH.
About
The infamous “About” box.
Applications – WEKA Explorer
Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
Data can also be read from a URL or from an SQL database (using JDBC)
Pre-processing tools in WEKA are called “filters”
WEKA contains filters for:
• Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
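Two of the filter types named above, normalization and discretization, can be illustrated with a short Python sketch. This is not WEKA's filter implementation and uses no WEKA API. It assumes min-max scaling to [0, 1] and equal-width binning, and the function names `normalize` and `discretize` are hypothetical.

```python
def normalize(values):
    """Rescale numeric values linearly to the range [0, 1] (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to scale, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, bins=3):
    """Map each numeric value to an equal-width bin index in 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would land in bin `bins`, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


# Made-up numeric column for illustration.
column = [233, 286, 229]
print(normalize(column))
print(discretize(column, bins=3))
```

Missing values are not handled here; a real pre-processing tool has to decide how to treat them before scaling or binning.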
Section Tabs
Preprocess. Choose and modify the data being acted on.
Classify. Train and test learning schemes that classify or perform regression.
Cluster. Learn clusters for the data.
Associate. Learn association rules for the data.
Select attributes. Select the most relevant attributes in the data.
Visualize. View an interactive 2D plot of the data.
Status Box
Right-clicking the mouse anywhere inside the status box brings up a little menu. The menu gives two options:
Memory information. Display in the log box the amount of memory available to WEKA.
Run garbage collector. Force the Java garbage collector to search for memory that is no longer needed and free it up, allowing more memory for new tasks. Note that the garbage collector is constantly running as a background task anyway.
Log Button
Clicking on this button brings up a separate window containing a scrollable text field. Each line of text is stamped with the time it was entered into the log. As you perform actions in WEKA, the log keeps a record of what has happened.
For people using the command line or the SimpleCLI, the log now also contains the full setup strings for classification, clustering, attribute selection, etc., so that it is possible to copy/paste them elsewhere.
WEKA Status Icon
When no processes are running, the bird sits down and takes a nap. The number beside the × symbol gives the number of concurrent processes running. When the system is idle it is zero, but it increases as the number of processes increases. When any process is started, the bird gets up and starts moving around. If it’s standing but stops moving for a long time, it’s sick: something has gone wrong! In that case you should restart the WEKA Explorer.
Graphical output
Most graphical displays in WEKA, e.g., the GraphVisualizer or the TreeVisualizer, support saving the output to a file. A dialog for saving the output can be brought up with Alt+Shift+left-click. Supported formats are currently Windows Bitmap, JPEG, PNG and EPS (encapsulated Postscript). The dialog also allows you to specify the dimensions of the generated image.
WEKA deals with “flat” files
@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
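To make the structure of the flat file above concrete, here is a minimal Python sketch that reads a small subset of ARFF: a relation name, numeric and nominal attributes, and comma-separated data rows with "?" marking a missing value. It is not WEKA's ARFF loader. The function name `parse_arff` is hypothetical, and quoting, sparse rows, and date attributes are deliberately ignored. The trailing "..." from the slide is left out of the sample because it is not a data row.

```python
def parse_arff(text):
    """Parse a small ARFF subset (hypothetical helper; not WEKA's loader).

    Returns (relation, attributes, rows), where attributes is a list of
    (name, kind) and kind is either a type name such as "numeric" or a
    list of nominal values.
    """
    relation = None
    attributes = []
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            cells = [c.strip() for c in line.split(",")]
            row = []
            for (name, kind), cell in zip(attributes, cells):
                if cell == "?":
                    row.append(None)          # "?" marks a missing value
                elif kind in ("numeric", "real", "integer"):
                    row.append(float(cell))
                else:
                    row.append(cell)          # nominal (or other) value kept as text
            rows.append(row)
        elif line.lower().startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif line.lower().startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif line.lower().startswith("@data"):
            in_data = True
    return relation, attributes, rows


SAMPLE = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation, len(attributes), len(rows))
```

The header declares each attribute's name and type once, and every data row then lists one value per attribute in the same order, which is what makes the file "flat".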
Simple examples – weather data
Simple examples – iris data
Simple examples – CPU performance data
Simple examples – Labor
Simple examples – Soybean Data
Conclusion: try it yourself!
WEKA is available at
http://www.cs.waikato.ac.nz/ml/weka
Also has a list of projects based on WEKA
WEKA contributors: Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang