CoCQA: Co-Training Over Questions and Answers
with an Application to Predicting Question Subjectivity
Orientation
Baoli Li, Yandong Liu, and Eugene Agichtein
Emory University
Community Question Answering
- An effective way of seeking information from other users
- Can be searched for resolved questions
Community Question Answering (CQA): Yahoo! Answers
- Users
  - Asker: posts questions
  - Answerer: posts answers
  - Voter: votes for existing answers
- Questions: subject and detail
- Answers: answer text and votes
- Archive: millions of questions and answers
Lifecycle of a Question in CQA
[Diagram] Choose a category -> compose the question -> open question -> users post answers and rate them (+/-) -> if the asker finds the answer, she closes the question, chooses the best answer, and gives ratings; otherwise the question is closed by the system and the best answer is chosen by voters.
Problem Statement
- How can we exploit the structure of CQA to improve question classification?
- Case study: question subjectivity prediction
  - Subjective questions seek answers containing private states such as personal opinions, judgments, and experiences.
  - Objective questions are expected to be answered with reliable or authoritative information.
Example Questions
- Subjective: Has anyone got one of those home blood pressure monitors? and if so what make is it and do you think they are worth getting?
- Objective: What is the difference between chemotherapy and radiation treatments?
Motivation
Guiding the CQA engine to process questions more intelligently. Some applications:
- Ranking/filtering answers
- Improving question archive search
- Evaluating answers provided by users
- Inferring user intent
Challenges
Some challenges in analyzing real online questions:
- Typically complex and subjective
- Can be ill-phrased and vague
- Not enough annotated data
Key Observations
Can we utilize the inherent structure of CQA interactions, and use the unlimited amounts of unlabeled data, to improve classification performance?
Natural Approach: Co-Training
- Introduced in "Combining labeled and unlabeled data with co-training", Blum and Mitchell, 1998
- Two views of the data
  - E.g., content and hyperlinks in web pages
  - The views provide complementary information for each other
- Iteratively constructs additional labeled data
- Can often significantly improve accuracy
Questions and Answers: Two Views
- Answers usually match/fit the question
  - Q: Has anyone got one of those home blood pressure monitors? and if so what make is it and do you think they are worth getting?
  - A: My mom has one as she is diabetic so its important for her to monitor it she finds it useful. ("My mom... she finds...")
- Askers can usually identify matching answers by selecting the "best answer"
CoCQA: A Co-Training Framework over Questions and Answers
[Diagram] The labeled data is split into a question view (Q) and an answer view (A), which train two classifiers, CQ and CA. Both classifiers label the unlabeled data; the most confidently labeled examples are added to the labeled set, performance is checked on validation data (holdout training data), and the process repeats until a stopping criterion is met.
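The loop in the framework diagram can be sketched roughly as follows. This is a minimal toy version, not the authors' implementation: `train` and `predict` are hypothetical stand-ins (a bag-of-words score counter) for the LibSVM classifier and its margin-based confidence described on the next slide.

```python
def train(examples):
    """Toy stand-in for the base classifier: per-class word counts.
    (The paper uses LibSVM; this keeps the sketch self-contained.)"""
    model = {}
    for text, label in examples:
        counts = model.setdefault(label, {})
        for w in text.lower().split():
            counts[w] = counts.get(w, 0) + 1
    return model

def predict(model, text):
    """Return (label, confidence); the score gap between the top two
    classes plays the role of the SVM margin."""
    scores = {label: sum(counts.get(w, 0) for w in text.lower().split())
              for label, counts in model.items()}
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    return ranked[0][0], ranked[0][1] - runner_up

def co_train(labeled, unlabeled, k=2, max_iter=5):
    """labeled: list of ((question, answer), label);
    unlabeled: list of (question, answer) pairs."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(max_iter):
        if not unlabeled:
            break
        cq = train([(q, y) for (q, a), y in labeled])  # question view
        ca = train([(a, y) for (q, a), y in labeled])  # answer view
        # Score the unlabeled pool with both views; promote the k examples
        # whose most confident view gives the largest margin.
        scored = []
        for q, a in unlabeled:
            lq, mq = predict(cq, q)
            la, ma = predict(ca, a)
            label, conf = (lq, mq) if mq >= ma else (la, ma)
            scored.append((conf, (q, a), label))
        scored.sort(key=lambda t: -t[0])
        for conf, pair, label in scored[:k]:
            labeled.append((pair, label))
            unlabeled.remove(pair)
    # Final classifier over the enlarged labeled set (both views combined).
    return train([(q + " " + a, y) for (q, a), y in labeled])
```

In CoCQA proper, the stopping check against the holdout data and the SVM margin replace the crude counting heuristics above; the overall control flow is the same.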
Details of CoCQA Implementation
- Base classifier: LibSVM
- Term frequency (TF) as term weight (also tried binary and TF*IDF)
- Select top K examples with highest confidence, measured by the margin value in the SVM
Feature Set
- Character 3-grams: has, any, nyo, yon, one...
- Words: Has, anyone, got, mom, she, finds...
- Words with character 3-grams
- Word n-grams (n<=3, i.e. Wi, WiWi+1, WiWi+1Wi+2): Has anyone got, anyone got one, she finds it...
- Word and POS n-grams (n<=3, i.e. Wi, WiWi+1, WiPOSi+1, POSiWi+1, POSiPOSi+1, etc.): NP VBP, She PRP, VBP finds...
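The first and fourth feature types above can be sketched in a few lines; the function names are mine, and the exact tokenization the authors used is not specified, so treat this as an illustration of the feature shapes rather than their exact extractor.

```python
def char_ngrams(text, n=3):
    """Per-word character n-grams: 'Has anyone' -> has, any, nyo, yon, one."""
    grams = []
    for w in text.lower().split():
        # Keep words shorter than n whole rather than dropping them.
        grams += [w[i:i + n] for i in range(len(w) - n + 1)] or [w]
    return grams

def word_ngrams(text, max_n=3):
    """Word n-grams up to length max_n: Wi, WiWi+1, WiWi+1Wi+2."""
    words = text.split()
    grams = []
    for n in range(1, max_n + 1):
        grams += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return grams
```

The word/POS n-grams would follow the same pattern, interleaving tokens with their part-of-speech tags before forming the n-grams.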
Overview of Experimental Setup
- Datasets: from Yahoo! Answers, manually labeled via Amazon Mechanical Turk
- Metrics: macro-averaged F1
- Compare CoCQA to a state-of-the-art semi-supervised method
Dataset
- 1,000 labeled questions from Yahoo! Answers
  - 5 categories (Arts, Education, Science, Health, and Sports)
  - 200 questions from each category
- 10,000 unlabeled questions from Yahoo! Answers
  - 2,000 questions from each category
- Data available at http://ir.mathcs.emory.edu/shared
Manual Labeling
- Annotated using Amazon's Mechanical Turk service
- Each question was judged by 5 Mechanical Turk workers
- 25 questions included in each HIT
- Workers needed to pass a qualification test
- Majority vote to derive the gold standard
- Discarded a small fraction (22 out of 1,000) of nonsensical questions, such as "Upward Soccer Shorts?" and "1+1=?fdgdgdfg", by manual inspection
Example HIT task
Subjectivity Statistics by Category
[Charts of the subjective/objective split per category:]
- Education: 30% subjective / 70% objective
- Arts: 36% / 64%
- Science: 48% / 52%
- Sports: 34% / 66%
- Health: 21% / 79%
Evaluation Metric
- Macro-averaged F1
  - Prediction performance on both subjective questions and objective questions is equally important
  - F1 = (2 x Precision x Recall) / (Precision + Recall)
  - Averaged over the subjective and objective classes
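The metric above is straightforward to compute; a minimal sketch, with function names of my choosing:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_f1(per_class_prf):
    """Macro-averaged F1: each class (here subjective and objective)
    contributes equally, regardless of class frequency."""
    return sum(f1(p, r) for p, r in per_class_prf) / len(per_class_prf)
```

Because the classes are weighted equally, a classifier that only does well on the majority class is penalized, which is the point of choosing the macro average here.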
Experimental Settings
- 5-fold cross-validation
- Methods compared:
  - Supervised: LibSVM (Chang and Lin, 2001)
  - Generalized Expectation (GE): (Mann and McCallum, 2007)
  - CoCQA: our method
    - Base classifier: LibSVM
    - View 1: question text; View 2: answer text
F1 for Supervised Learning
F1 with different sets of features:

Features   | Char 3-gram | Word  | Word + Char 3-gram | Word POS n-gram (n<=3)
question   | 0.700       | 0.717 | 0.694              | 0.720
best_ans   | 0.587       | 0.597 | 0.578              | 0.565
q_bestans  | 0.681       | 0.695 | 0.662              | 0.712

Naive (majority class) baseline: 0.398
Semi-Supervised Learning: Adding Unlabeled Data
Comparison between Supervised, GE, and CoCQA:

Method     | Question       | Question + Best Answer
Supervised | 0.717          | 0.695
GE         | 0.712 (-0.7%)  | 0.717 (+3.2%)
CoCQA      | 0.731 (+1.9%)  | 0.745 (+7.2%)
CoCQA with Varying K (# new examples added in each iteration)
[Plot: F1 (0.64-0.76) vs. K = 20 to 200 labeled examples added per co-training iteration, comparing CoCQA (Question and Best Answer), Supervised Q_bestans, CoCQA (Question and All Answers), and Supervised Q_allans.]
CoCQA for Varying # Iterations
[Plot: F1 (left axis, 0.71-0.75) and total # unlabeled examples added (right axis, 0-3500) vs. the number of co-training iterations, comparing CoCQA (Question + Best Answer) against the supervised baseline.]
CoCQA for Varying Amount of Labeled Data
[Plot: F1 (0.52-0.72) vs. # of labeled examples used (50-400), comparing CoCQA (Question + Best Answer) against Supervised Q_Best Ans.]
Conclusions and Future Work
- Problem: non-topical text classification in CQA
- CoCQA: a co-training framework that can exploit information from both questions and answers
- Case study: subjectivity classification for real questions in CQA
- We plan to explore:
  - more sophisticated features;
  - related variants of semi-supervised learning;
  - other applications (e.g., sentiment classification)
Thank you!
Baoli Li
[email protected]
Yandong Liu
[email protected]
Eugene Agichtein
[email protected]
Performance of Subjective vs. Objective Classes
- Subjective class: ~80%
- Objective class: ~60%
Related Work
- Question classification: (Zhang and Lee, 2003), (Tri et al., 2006)
- Sentiment analysis: (Pang and Lee, 2004), (Yu and Hatzivassiloglou, 2003), (Somasundaran et al., 2007)
Important Words for Subjective, Objective Classes by Information Gain