
Machine Learning with WEKA

Eibe Frank
Department of Computer Science, University of Waikato, New Zealand

WEKA: A Machine Learning Toolkit

• The Explorer
  • Classification and Regression
  • Clustering
  • Association Rules
  • Attribute Selection
  • Data Visualization
• The Experimenter
• The Knowledge Flow GUI
• Conclusions

WEKA: the bird

WEKA 3.3

5/6/2020

Copyright: Martin Kramer ([email protected])

University of Waikato 2

WEKA: the software

• Machine learning/data mining software written in Java (distributed under the GNU General Public License)
• Used for research, education, and applications
• Complements “Data Mining” by Witten & Frank
• Main features:
  • Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
  • Graphical user interfaces (incl. data visualization)
  • Environment for comparing learning algorithms

WEKA: versions

• There are several versions of WEKA:
  • WEKA 3.0: “book version” compatible with description in data mining book
  • WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only)
  • WEKA 3.3: “development version” with lots of improvements
• This talk is based on the latest snapshot of WEKA 3.5.6

Launching WEKA (1)

Program

• LogWindow: Opens a log window that captures everything printed to stdout or stderr. Useful for environments like MS Windows, where WEKA is normally not started from a terminal.
• Exit: Closes WEKA.

Launching WEKA (2)

Applications

Lists the main applications within WEKA.

• Explorer: An environment for exploring data with WEKA (the rest of this documentation deals with this application in more detail).
• Experimenter: An environment for performing experiments and conducting statistical tests between learning schemes.
• KnowledgeFlow: This environment supports essentially the same functions as the Explorer but with a drag-and-drop interface. One advantage is that it supports incremental learning.
• SimpleCLI: Provides a simple command-line interface that allows direct execution of WEKA commands for operating systems that do not provide their own command line interface.

Launching WEKA (3)

Tools

Other useful applications.

• ArffViewer: An MDI (“multiple document interface”) application for viewing ARFF files in spreadsheet format.
• SqlViewer: Represents an SQL worksheet, for querying databases via JDBC.
• EnsembleLibrary: An interface for generating setups for Ensemble Selection [5] (a contribution by Robert Jung and David Michael from Cornell University, Ithaca, NY, USA).

Launching WEKA (4)

Visualization

Ways of visualizing data with WEKA.

• Plot: For plotting a 2D plot of a dataset.
• ROC: Displays a previously saved ROC (receiver operating characteristic) curve (true positive rate vs false positive rate).
• TreeVisualizer: For displaying directed graphs, e.g., a decision tree.
• GraphVisualizer: Visualizes XML BIF or DOT format graphs, e.g., for Bayesian networks.
• BoundaryVisualizer: Allows the visualization of classifier decision boundaries in two dimensions.
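The ROC item above plots true positive rate against false positive rate. As a rough sketch of how such a curve can be derived from classifier scores, the plain-Python example below sweeps a decision threshold over ranked predictions. It does not use WEKA and is not WEKA's implementation; the function names `roc_points` and `area_under_curve` and the scores and labels are made up for illustration.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points of a ROC curve.

    `scores` are classifier confidences for the positive class and
    `labels` are the true classes (True means positive).
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative instance")
    # Sweep the decision threshold from the highest score downwards.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, is_positive) in enumerate(ranked):
        if is_positive:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last instance of a tied score.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical scores and labels, made up for illustration.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [True, True, False, True, False, False]
curve = roc_points(scores, labels)
auc = area_under_curve(curve)
```

The curve always starts at (0, 0) and ends at (1, 1); a curve that hugs the top-left corner indicates a ranking that places most positives ahead of most negatives.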

Launching WEKA (5)

Windows

All open windows are listed here.

• Minimize: Minimizes all current windows.
• Restore: Restores all minimized windows again.

Launching WEKA (6)

Help

Online resources for WEKA can be found here.

• Weka homepage: Opens a browser window with WEKA’s homepage.
• Online documentation: Directs to the WekaDoc Wiki.
• HOWTOs, code snippets, etc.: The general WekaWiki, containing lots of examples and HOWTOs around the development and use of WEKA.
• Weka on Sourceforge: WEKA’s project homepage on Sourceforge.net.
• SystemInfo: Lists some internals about the Java/WEKA environment, e.g., the CLASSPATH.
• About: The infamous “About” box.


Applications – WEKA Explorer

• Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
• Data can also be read from a URL or from an SQL database (using JDBC)
• Pre-processing tools in WEKA are called “filters”
• WEKA contains filters for:
  • Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
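Two of the filter types listed above, normalization and discretization, can be sketched in a few lines of plain Python. This illustrates the general techniques (min-max scaling and equal-width binning) and is not WEKA's own filter code; the function names `normalize` and `discretize` are hypothetical, and the sample values are the ages from the heart-disease example used in these slides.

```python
def normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, bins):
    """Map each numeric value to the index of an equal-width bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), bins - 1) for v in values]


ages = [63.0, 67.0, 67.0, 38.0]
scaled = normalize(ages)           # smallest age maps to 0.0, largest to 1.0
binned = discretize(ages, bins=3)  # bin indices 0, 1 or 2
```

Equal-width binning is only one possible discretization strategy; it was chosen here because it needs no class labels and fits in a few lines.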

Section Tabs

• Preprocess. Choose and modify the data being acted on.
• Classify. Train and test learning schemes that classify or perform regression.
• Cluster. Learn clusters for the data.
• Associate. Learn association rules for the data.
• Select attributes. Select the most relevant attributes in the data.
• Visualize. View an interactive 2D plot of the data.

Status Box

Right-clicking the mouse anywhere inside the status box brings up a little menu. The menu gives two options:

• Memory information. Display in the log box the amount of memory available to WEKA.
• Run garbage collector. Force the Java garbage collector to search for memory that is no longer needed and free it up, allowing more memory for new tasks. Note that the garbage collector is constantly running as a background task anyway.

Log Button

Clicking on this button brings up a separate window containing a scrollable text field. Each line of text is stamped with the time it was entered into the log. As you perform actions in WEKA, the log keeps a record of what has happened.

For people using the command line or the SimpleCLI, the log now also contains the full setup strings for classification, clustering, attribute selection, etc., so that it is possible to copy/paste them elsewhere.

WEKA Status Icon

When no processes are running, the bird sits down and takes a nap. The number beside the × symbol gives the number of concurrent processes running. When the system is idle it is zero, but it increases as the number of processes increases. When any process is started, the bird gets up and starts moving around. If it’s standing but stops moving for a long time, it’s sick: something has gone wrong! In that case you should restart the WEKA Explorer.

Graphical output

• Most graphical displays in WEKA, e.g., the GraphVisualizer or the TreeVisualizer, support saving the output to a file.
• A dialog for saving the output can be brought up with Alt+Shift+left-click.
• Supported formats are currently Windows Bitmap, JPEG, PNG and EPS (Encapsulated PostScript).
• The dialog also allows you to specify the dimensions of the generated image.

WEKA deals with “flat” files

@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
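To make the structure of the ARFF example above concrete, here is a minimal sketch of reading such a file in plain Python. It assumes only numeric and nominal attributes, unquoted values, and `?` for a missing value, as in the example; it is an illustration, not WEKA's ARFF loader, and the function name `parse_arff` is hypothetical.

```python
import re

ARFF_TEXT = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""


def parse_arff(text):
    """Parse a simple dense ARFF text into (relation, attributes, rows)."""
    relation = None
    attributes = []   # (name, "numeric") or (name, [nominal values])
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (name, kind), value in zip(attributes, values):
                if value == "?":
                    row.append(None)            # missing value
                elif kind == "numeric":
                    row.append(float(value))
                else:
                    row.append(value)           # nominal value kept as text
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            match = re.match(r"@attribute\s+(\S+)\s+(.*)", line, re.IGNORECASE)
            name, spec = match.group(1), match.group(2).strip()
            if spec.startswith("{"):
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
```

The header declares one attribute per column, and each line after `@data` is one instance with its values in that same column order, which is what makes the file “flat”.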

Simple examples - weather data

Simple examples - iris data

Simple examples - CPU performance data

Simple examples - Labor

Simple examples - Soybean data

Conclusion: try it yourself!

• WEKA is available at http://www.cs.waikato.ac.nz/ml/weka
  • The site also has a list of projects based on WEKA
• WEKA contributors: Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang