Introduction to Weka
CS4705 – Natural Language Processing
Thursday, September 28
What is Weka?
● Java-based machine learning tool
● 3 modes of operation
– GUI
– Command line
– API (not discussed here)
● To run:
– java -Xmx1024M -jar ~cs4705/bin/weka.jar &
Weka homepage
● http://www.cs.waikato.ac.nz/ml/weka/
.arff file format
● http://www.cs.waikato.ac.nz/~ml/weka/arff.html
@relation name
@attribute attrName {numeric, string, <nominal>, date}
...
@data
a,b,c,d,e
● <nominal> := {class1,class2,...,classN}
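As a concrete illustration of the layout above, the following Python sketch holds a tiny ARFF document and reads back its declarations. The relation name, attribute names, and data rows are invented for illustration, and the parser covers only the simple header lines shown on this slide, not the full ARFF syntax (no quoting, comments, or sparse data).

```python
# Hypothetical toy dataset; names and values are invented for illustration.
ARFF_TEXT = """\
@relation weather
@attribute temperature numeric
@attribute outlook {sunny,rainy}
@attribute play {yes,no}
@data
30,sunny,no
18,rainy,yes
"""


def parse_arff(text):
    """Parse a minimal ARFF string into (relation, attributes, rows).

    Simplified sketch: assumes one declaration per line, no quoted
    names, no comments, and dense comma-separated data rows.
    """
    relation = None
    attributes = []  # list of (name, type) pairs
    rows = []
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        lower = line.lower()
        if in_data:
            rows.append(line.split(","))
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, attr_type = line.split(None, 2)
            if attr_type.startswith("{"):
                # Nominal attribute: keep the list of allowed values.
                attr_type = [v.strip() for v in attr_type.strip("{}").split(",")]
            attributes.append((name, attr_type))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
print(relation)
for name, attr_type in attributes:
    print(name, attr_type)
print(len(rows), "instances")
```

The last attribute is usually the one treated as the class, which is why the nominal `play` attribute comes last in this toy example.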
Example ARFF files
● http://sourceforge.net/projects/weka
● iris.arff
● cmc.arff
To classify with the Weka GUI
1. Run the Weka GUI
2. Click 'Explorer'
3. 'Open file...'
4. Select the 'Classify' tab
5. 'Choose' a classifier
6. Confirm options
7. Click 'Start'
8. Wait...
9. Right-click on the Result list entry
   a. 'Save result buffer'
   b. 'Save model'
Classify
● Some classifiers to start with:
– NaiveBayes
– JRip
– J48
– SMO
● Find references by selecting a classifier
● Use cross-validation!
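To make the cross-validation advice concrete, here is a rough Python sketch of how N-fold cross-validation partitions a dataset: each instance lands in the test set exactly once and in the training set for every other fold. The function name is hypothetical, and the folds here are contiguous and unshuffled for readability; real toolkits typically shuffle (and often stratify) the data first.

```python
def k_fold_splits(n_instances, k):
    """Yield (train_indices, test_indices) for each of k folds.

    Simplified sketch: folds are contiguous and unshuffled.
    """
    indices = list(range(n_instances))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_instances // k + (1 if i < n_instances % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for train, test in k_fold_splits(10, 5):
    print("train:", train, "test:", test)
```

The reported accuracy is then computed over the predictions from all held-out folds, so it reflects performance on data the model did not see during training.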
Analyzing Results
● Important tools for Homework 2
– Accuracy
  ● “Correctly classified instances”
– Confusion matrix
– Save model
– Visualization
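The link between the confusion matrix and the accuracy figure can be sketched in a few lines of Python: correctly classified instances sit on the diagonal, so accuracy is the diagonal sum divided by the total. The matrix values below are invented, and the sketch assumes rows are actual classes and columns are predicted classes.

```python
def accuracy(confusion):
    """Fraction of correctly classified instances.

    Assumes confusion[i][j] counts instances of actual class i
    that were predicted as class j.
    """
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical 3-class confusion matrix (values invented for illustration).
confusion = [
    [50, 0, 0],
    [0, 47, 3],
    [0, 2, 48],
]
print(f"{accuracy(confusion):.2%}")  # 145 of 150 correct
```

The off-diagonal cells show which classes get confused with each other, which the single accuracy number hides.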
Running Weka from the Command Line
● Running an N-fold cross-validation experiment
– java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -x N
● Using a predefined test set
– java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -T testingdata.arff
● Saving the model
– java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -d output.model
● Classifying a test set
– java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -l input.model -T testingdata.arff
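When running many experiments, it can help to assemble these command lines in a script. The Python sketch below builds the argument list for each of the four invocations above using only the flags shown on the slide (-t, -T, -x, -d, -l). The helper name and its parameters are hypothetical, and the jar path is passed in by the caller; nothing is executed unless you hand the list to `subprocess.run` yourself.

```python
import os


def weka_command(classifier, jar, train=None, test=None, folds=None,
                 save_model=None, load_model=None):
    """Build a Weka classifier command as an argument list.

    Hypothetical helper: only wraps the flags shown on the slide.
    """
    cmd = ["java", "-cp", jar, classifier]
    if train:
        cmd += ["-t", train]          # training data
    if test:
        cmd += ["-T", test]           # predefined test set
    if folds:
        cmd += ["-x", str(folds)]     # N-fold cross-validation
    if save_model:
        cmd += ["-d", save_model]     # write the trained model
    if load_model:
        cmd += ["-l", load_model]     # read a previously saved model
    return cmd


# The shell expands "~cs4705" for you; from Python, expand it explicitly.
jar = os.path.expanduser("~cs4705/bin/weka.jar")
nb = "weka.classifiers.bayes.NaiveBayes"

print(" ".join(weka_command(nb, jar, train="trainingdata.arff", folds=10)))
print(" ".join(weka_command(nb, jar, load_model="input.model",
                            test="testingdata.arff")))
# To actually run one: subprocess.run(cmd, check=True) after "import subprocess".
```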
● Analyzing results
– Get predictions from test data
  ● java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -l input.model -T testingdata.arff -p range
– Then DIY with scripts
  ● awk and sed will be your friends
● Getting predictions from cross-validation
– “Output Predictions” doesn't cut it.
– export CLASSPATH=~cs4705/bin/:~cs4705/bin/weka.jar
– java callClassifier weka.classifiers.bayes.NaiveBayes -t trainingdata.arff
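The "DIY with scripts" step can be done in awk and sed, or equivalently in a short Python script like the sketch below, which tallies (actual, predicted) pairs from saved prediction output. The column layout of the prediction lines depends on the Weka version and is an assumption here: the sample lines are invented, and the column positions are parameters so you can match them to whatever your output actually looks like.

```python
from collections import Counter


def tally_predictions(lines, actual_col, predicted_col):
    """Count (actual, predicted) label pairs from whitespace-separated lines.

    Column indices are zero-based and must be set to match the real
    output format; lines with too few fields are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) <= max(actual_col, predicted_col):
            continue  # blank line or something too short to be a prediction
        counts[(fields[actual_col], fields[predicted_col])] += 1
    return counts


# Hypothetical prediction lines. Assumed columns:
# instance index, predicted class, confidence, actual class.
SAMPLE = """\
0 yes 0.91 yes
1 no 0.75 yes
2 no 0.88 no
"""

counts = tally_predictions(SAMPLE.splitlines(), actual_col=3, predicted_col=1)
correct = sum(n for (actual, predicted), n in counts.items() if actual == predicted)
total = sum(counts.values())
print(correct, "of", total, "correct")
for (actual, predicted), n in sorted(counts.items()):
    print(f"actual={actual} predicted={predicted}: {n}")
```

The resulting counts are the cells of a confusion matrix, so the same tally supports per-class error analysis as well as overall accuracy.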