
Machine Learning
in Practice
Carolyn Penstein Rosé
Language Technologies Institute/
Human-Computer Interaction
Institute
Machine Learning?
Why should we care?
Overwhelmed with data…
www.powerfulinformation.org
What do we do with all of it?
Machine learning is about
automatically finding meaningful
patterns in data
Example for credit history data:
Rule predicts who is more likely to have problems
paying off credit.
But what can
machine learning
do for me
personally?
How I got interested…
Would you believe I started dating
the man who became my husband
because of a common interest in
machine learning?
Apprentices of Wonder: Inside the Neural Network Revolution, by William F. Allman
Your TA

Kai-min Kevin Chang
- Language Technologies Institute Ph.D.
- [email protected]
- http://www.cs.cmu.edu/~kkchang/
- Office hour: 2:00pm – 3:00pm @ NSH 2507
Machine Learning in Practice
– Mind Reading

Predicting Human Brain Activity Associated with the
Meanings of Nouns (Mitchell et al., 2008)


In an object-contemplation task, participants were presented with
60 objects and were instructed to think of the same properties of
the stimulus object consistently while being scanned by fMRI
machines.
Given the evoked neural activity signatures, we can
correctly guess what the participants were thinking 70% of the
time!
Help
Machine learning in my work….
Processing conversational data
Student1: I don’t
understand what to
do next.
Time
Student2:
Let me do it.
Support Agent: Student2,
it looks like your partner
could use some help.
Triggering feedback for
collaborative idea generation…
Student 1: People stole sand and stones to use for construction.
Feedback Agent: Yes, stealing sand and stones may destroy the balance and thus make mountain areas unstable. Thinking about development of mountain areas, can you think of a kind of development that may cause a problem?
Student 2: Development of mountain areas often causes problems.
Student 1: It is okay to develop, but there must be some constraints.
Unique Ideas
[Bar chart: #Unique Ideas (axis 0 to 12) for four conditions: Individuals+Feedback, Individuals+NoFeedback, Pairs+Feedback, Pairs+NoFeedback]
Process Analysis
[Line chart over Time Stamp (axis 0 to 30) with series Nom+N, Nom+F, Real+N, Real+F]
Negative effect of Pairs vs Individuals:
F(1,24)=12.22, p<.005, 1 st. dev.
Negative effect of Feedback:
F(1,24)=7.23, p<.05, -1.03 st. dev.
Negative effect of Pairs vs Individuals:
F(1,24)=4.61, p<.05, .61 st. dev.
Positive effect of feedback:
F(1,24)=16.43, p<.0005, 1.37 st. dev.
Why do we care?
If we can understand how our design is affecting behavior differently over time, we can get more insight into what is the most fruitful direction for a redesign.
Conclusion
Don't offer feedback during the first 5 minutes.
How does machine learning work?
The simplest rule learner will learn to predict whatever is the most frequent result class. This is called the majority class.
What will the rule be in this case?
It will always predict yes.
A slightly more sophisticated rule learner will find the feature that gives the most information about the result class.
What do you think that would be in this case?
<Feature Name>:
<value> -> <prediction>
<value> -> <prediction>
Outlook:
Sunny -> No
Overcast -> Yes
Rainy -> Yes
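The two rule learners described on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Weka's implementation: the 14 rows are the nominal weather data as commonly distributed with Weka (the slides do not list them), "most information" is interpreted here as information gain, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Assumed copy of the nominal weather data; the last column is the class (play).
ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]
DATA = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def majority_class(rows):
    """Simplest rule learner: always predict the most frequent class."""
    return Counter(row[-1] for row in rows).most_common(1)[0][0]


def entropy(rows):
    """Entropy (in bits) of the class distribution in rows."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def group_by_value(rows, col):
    """Split rows by the value they have in column col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row)
    return groups


def information_gain(rows, col):
    """Reduction in class entropy obtained by splitting on column col."""
    groups = group_by_value(rows, col)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(rows) - remainder


def one_feature_rule(rows):
    """Pick the most informative attribute, then predict the majority class per value."""
    best = max(range(len(ATTRIBUTES)), key=lambda c: information_gain(rows, c))
    groups = group_by_value(rows, best)
    return ATTRIBUTES[best], {v: majority_class(g) for v, g in groups.items()}


baseline = majority_class(DATA)
feature, rule = one_feature_rule(DATA)
print("Majority class:", baseline)
print(feature + ":")
for value, prediction in rule.items():
    print(" ", value, "->", prediction)
```

On this data the majority class is "yes", and the selected attribute is outlook with the same three value-to-prediction pairs shown on the slide.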
What is machine learning?

- Automatically or semi-automatically
  - Inducing concepts (i.e., rules) from data
  - Finding patterns in data
  - Explaining data
  - Making predictions
[Diagram: Data -> Learning Algorithm -> Model; New Data -> Classification Engine -> Prediction]
What will be the prediction?
Model
Outlook:
Sunny -> No
Overcast -> Yes
Rainy-> Yes
New Data
Yes
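As a rough sketch of the classification step, the model on this slide can be stored as a lookup table and applied to a new instance. The dictionary layout, the `classify` function, and the new instance are all hypothetical; the transcript does not preserve the actual new data row shown on the slide.

```python
# Model from the slide: one rule per value of the Outlook attribute.
MODEL = {
    "attribute": "outlook",
    "rules": {"sunny": "No", "overcast": "Yes", "rainy": "Yes"},
}


def classify(model, instance):
    """Classification engine: look up the instance's value for the model's attribute."""
    value = instance[model["attribute"]]
    return model["rules"][value]


# Hypothetical new instance, chosen only for illustration.
new_instance = {
    "outlook": "overcast",
    "temperature": "cool",
    "humidity": "high",
    "windy": "true",
}
prediction = classify(MODEL, new_instance)
print(prediction)
```

Only the outlook value matters to this model, so any overcast or rainy instance receives the prediction "Yes".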
Terminology



- Concept: the rule you want to learn
- Instance: one data point from your training or testing data (row in table)
- Attribute: one of the features that an instance is composed of (column in table)
* Compute the predicted value.
What do concepts look like?
Clarification: Concepts as Lines
[Figure: diagram with labels R, S, T, B, C and X markers]
Styles of Learning
- Classification – learn rules from labeled instances that allow you to assign new instances to a class
- Association – look for relationships between features, not just rules that predict a class from an instance (more general)
- Clustering – look for instances that are similar (involves comparisons of multiple features)
- Numeric Prediction (regression models)
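To make the last style concrete: in numeric prediction the target is a number rather than a class label. The following is a minimal sketch of one regression model, a least-squares line through one numeric predictor. The data points and the `fit_line` helper are hypothetical and chosen so the arithmetic is easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training instances: one numeric attribute and a numeric target.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
predicted = slope * 5.0 + intercept  # prediction for a new instance with x = 5
print(slope, intercept, predicted)
```

The learned "concept" here is a pair of numbers (slope and intercept) instead of a set of value-to-class rules.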

6 Data sets that come with Weka
- The weather problem: tiny fictitious data set
  - Supposedly helps you predict whether you should go outside to play based on features of the weather
- Contact lenses: still fake but slightly more realistic
  - Data for telling you what type of contact lenses a person should have based on information about the patient
- Irises: numeric predictors, nominal target attribute
  - Famous data set from the 1930s
  - 50 examples each of 3 types of irises
  - Learn rules for determining which type of iris you have
6 Data sets that come with Weka

- CPU Performance: both predictors and target are numeric
  - Predict CPU performance based on computer configuration information
- Labor negotiations: predict whether the outcome of negotiations was good or not (nominal predictor)
  - Real data from labor negotiations in Canada
  - Both nominal and numeric predictors
  - Some missing and noisy data
- Soybean classification: classic machine learning problem
  - Rules for diagnosing soybean diseases
  - Data taken from questionnaires about soybean diseases
Why is this course different from
typical machine learning courses?

- Machine learning researchers focus on general purpose algorithms
  - Data simply provides a level playing field
    - Data "cleaned up" for evaluations
    - Data selected for showing off the algorithm's strengths
  - Focus on relative results on standard data sets
  - Strive for generalizability across applications
Why is this course different from
typical machine learning courses?

- Applied machine learning researchers focus on doing something practical using machine learning
  - Focus on the data representation/feature construction
  - Focus on understanding the data
  - Focus on usable results
    - Data used "as is"
    - Data selected based on task
Dual Focus: Machine Learning and
Language

- Language interactions are important for many types of applications
  - Computer mediated communication
  - Computer supported education
  - Speech based applications
- My expertise is in computational linguistics
  - Computational linguistics is a good example of an applied machine learning field
  - Rich space of possible data representations
Course Objectives


- Gain an appreciation for what machine learning is and is not
- Gain competence in applying machine learning technology in a purposeful manner to your research
  - Basic data manipulation skills
  - Data structure design skills
  - Means-ends analysis
  - Solid corpus based experimentation methodology
  - Evaluating and reporting your results
Course Objectives

- Learn problem solving skills for moving forward in the face of difficulties
  - Data interpretation
  - Error analysis skills
  - Hypothesis formation skills
- Gain accessibility to the primary literature
Readings



- Witten, I. H. & Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques, second edition, Elsevier: San Francisco
- Jackson, P. & Moulinier, I. (2002). Natural Language Processing for Online Applications: Text Retrieval, Extraction, and Categorization, John Benjamins Publishing Company: Philadelphia
- Selected readings
  - Will be posted to Blackboard in Course Documents Folder
Software Tools

- Weka
  - Open source machine learning toolkit
  - Includes Java API
  - We will do most of our work in this package
- TagHelper
  - Weka add-on for text processing
  - Developed at CMU
  - We will introduce this tool in Lecture 10
- Data manipulation tools
  - Whatever you are comfortable with
  - Scripting language like Perl
  - Excel
Course Projects
- Problem of interest in the real world
- Lots of data in a machine readable format
- Well defined question with an unknown answer
- Should be room for improvement over baseline
- Deliverables: Proposal, Final Report (Substantial implementation not required!)

Some previous projects…
- Predicting which neighborhoods would become more or less ethnically diverse
- Identifying people by their handwriting
- Predicting political affiliation from demographic information
- Predicting music genre from lyrics
- Predicting level of grammatical competence from writing samples
- Assessing level of cultural sensitivity from a newsgroup post
- Predicting whether someone will be late to their next meeting
- Predicting whether a student is contributing productively to their working group in a project class
- Predicting whether a newsgroup post will get a response
- Predicting whether a newsgroup post was written by a male or female
Browsable Summaries
Predicting student’s tendency to behave like a “slacker”
based on behavior in a message board environment.
Predict likelihood of reply
For messages posted to online discussion groups. Based on features of the message and target group, such as:
- message length
- on-topicness
- self-disclosing language
- requests
- usual group traffic
- usual group response rate
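As an illustration of what feature construction for this project might look like, here is a rough sketch that turns one message into a few of the features listed above. Everything in it is an assumption made for the example: the word lists, the `extract_features` name, and the way each feature is operationalized. The group-level features (usual traffic and response rate) would need data about the target group and are left out.

```python
import re

# Hypothetical cue lists; a real project would derive these from the data.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
REQUEST_CUES = ("please", "can anyone", "could someone", "does anyone")


def extract_features(message, topic_words):
    """Map one message to a dictionary of message-level features."""
    text = message.lower()
    words = re.findall(r"[a-z']+", text)
    n = len(words)
    on_topic = sum(1 for w in words if w in topic_words)
    return {
        "message_length": n,  # length in words
        "on_topicness": on_topic / n if n else 0.0,  # share of topic words
        "self_disclosure": sum(1 for w in words if w in FIRST_PERSON),
        "has_request": "?" in message or any(cue in text for cue in REQUEST_CUES),
    }


features = extract_features(
    "I just bought my first telescope. Can anyone recommend a star chart?",
    {"telescope", "star", "chart"},
)
print(features)
```

Each message becomes one instance, and each dictionary key becomes one attribute that a learner could use to predict whether the message gets a reply.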

By juxtaposing raw data with model predictions, the application would allow you to:
1. View the model's predictions of destination
2. Compare those predictions with raw sensor data
3. Interactively label a ground truth, for accuracy estimates
Grading


- Quizzes (10%)
- Weekly assignments (20%)
  - To give you practice on the skills of the week
  - You will get credit for doing these
- 2 Midterms (10% each)
  - These will be practical exercises meant to test your competence, not focused on memorization
- Course project (50%)
  - Evaluated based on demonstration of competence, not accuracy of the technology
  - Project proposals: includes a write up of preliminary work (defining the problem, building baseline approach, evaluating baseline performance, error analysis of baseline approach)
  - Final project report will discuss your experimentation process and final results in comparison with baseline approach
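The preliminary work named for the project proposal (baseline, baseline performance, error analysis) could look roughly like the sketch below. The labels, the "model" predictions, and the helper names are all hypothetical; the baseline used here is the majority class taken from the training labels, which is one common choice and not necessarily the one expected for every project.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Fraction of instances where the prediction matches the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def confusion(gold, predicted):
    """Count each (gold, predicted) pair; off-diagonal pairs are the errors."""
    return Counter(zip(gold, predicted))


# Hypothetical labels for a reply-prediction task.
train_labels = ["reply", "reply", "no_reply", "reply"]
test_gold = ["reply", "no_reply", "reply", "reply", "no_reply", "reply"]

# Baseline: always predict the majority class seen in training.
majority = Counter(train_labels).most_common(1)[0][0]
baseline_pred = [majority] * len(test_gold)

# Hypothetical predictions from some improved approach.
model_pred = ["reply", "no_reply", "reply", "no_reply", "no_reply", "reply"]

baseline_acc = accuracy(test_gold, baseline_pred)
model_acc = accuracy(test_gold, model_pred)
errors = {pair: n for pair, n in confusion(test_gold, model_pred).items()
          if pair[0] != pair[1]}

print("baseline accuracy:", baseline_acc)
print("model accuracy:", model_acc)
print("model errors (gold, predicted):", errors)
```

Reporting the final result next to the baseline number, together with the kinds of errors that remain, is what makes the comparison in the final report meaningful.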
Questions?