Learning to Predict Readability using Diverse Linguistic Features
Rohit J. Kate, Xiaoqiang Luo, Siddharth Patwardhan, Martin Franz, Radu Florian, Raymond J. Mooney, Salim Roukos, Chris Welty
Presented by: Young-Suk Lee
The University of Texas at Austin
IBM T. J. Watson Research Center
© 2010 IBM Corporation

Outline
Problem definition and motivations
Data
System and Features
Experimental Results

Readability
DARPA machine reading program (MRP)
“Readability is defined as a subjective judgment of how easily a reader can extract the information the writer or the speaker intended to convey.”
Task: given a general document, assign a readability score (1 to 5)

Sample Passage: High Readability
Industrial agriculture has grown increasingly paradoxical, replacing natural processes with synthetic practices and treating farms as factories. Consequently, food has become a marketing entity rather than a necessity to sustain life. …

Sample Passage: Low Readability
The word of the prince of believers may Allah God him Talk of gold this at present Reflections on the word of the prince of believers may Allah pleased with him, Prince of Believers May Allah be pleased with him: …

Readability: Motivations
Remove less readable documents from web-search
Filter out less readable documents before extracting knowledge
Select reading materials

Contrast With Other Work
Predicting readability: conveying message
  – vs. reading difficulty (grade 1 to 12)
Document sources: multiple genres
  – vs. single domain, genre or reader group

Outline
Problem definition and motivations
Data
System and Features
Experimental Results

Data
390 training documents
Each document:
  – 8 expert ratings: [1,…,5]
  – 6-10 “novice” ratings: [1,…,5]
Ratings differ by genre
  – Nwire and wiki documents: high
  – MT documents: low

Genre      #Docs   Expert Rating   Novice Rating
nwire      56      4.93            4.23
wiki       56      4.83            4.13
weblog     55      4.46            3.75
q-trans    56      4.47            3.83
news-grp   55      4.26            3.34
ccap       56      4.13            3.53
mt         56      2.38            1.92

Data
[Figure: Histogram of Novice Ratings. x-axis: Rating (1 to 5); y-axis: Count (0 to 250); series: nw, wk, wl, qt, ng, cc, mt; annotation: MT docs; notes: ng: newsgroup; Speech: closed caption]

Outline
Problem definition and motivations
Data
System and Features
Experimental Results

System Overview
[Diagram: Training Docs / Test Doc → Preprocessing → LM score, …, Parser score → Regression (WEKA) → Sys. Rating]

Syntactical Features
Using Sundance [Riloff & Phillips 04] and English Slot Grammar (ESG) parsers
  – Ratio of sentences without verbs
  – Avg. # clauses per sentence
  – Avg. #NPs, #VPs, #PPs, #Phrases per sentence
  – Failure rate of ESG parser
  – …

Language Model (LM) Features
Normalized document probability:
  – by a 5-gram generic LM
Genre-specific LMs
  – Data readily available for those genres
  – Certain genres are strong predictors of readability

Genre-based Language Model Features
Perplexity of genre-specific LM (Mj):
  [Equation not recovered; its annotations label the document, each word, and the word's history words]
Genre posterior perplexity (relative probability compared to all G genres):
  [Equation not recovered]

Lexical Features
Fraction of known words using dictionary and gazetteer of names
Out-of-vocabulary (OOV) rates using genre-based corpora
Ratio of function words (“the”, “of” etc.)
Ratio of pronouns

Experiments: Evaluation Metric
Pearson correlation coefficient
  – Mean expert judge rating as the gold standard
To compare with novice judges:
  – A sampling distribution representing performance of novice judges was generated
  – Distribution mean and upper critical value were computed
Correlation between system and mean expert ratings
  – If above the upper critical value: system is statistically significantly better than novice judges

Outline
Problem definition and motivations
Data
System and Features
Experimental Results

Experiments: Methodology
Compared regression algorithms
Feature ablation experiments
Results: 13-fold cross-validation
  – Balanced genre representation
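
The evaluation setup on the two slides above (Pearson correlation against mean expert ratings, 13-fold cross-validation with balanced genre representation) can be sketched roughly as follows. This is an illustrative sketch only: the slides do not say how the folds were built or which tooling computed the correlation, so the function names `pearson` and `genre_balanced_folds`, the round-robin dealing scheme, and the `seed` parameter are assumptions. The novice sampling distribution and its upper critical value are not reproduced here.

```python
import math
import random
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def genre_balanced_folds(genres, k=13, seed=0):
    """Split document indices into k folds with each genre spread evenly.

    Hypothetical scheme: shuffle the documents of each genre, then deal
    them to the folds round-robin, continuing the rotation from one genre
    to the next so that overall fold sizes also stay even.
    """
    rng = random.Random(seed)
    by_genre = defaultdict(list)
    for idx, genre in enumerate(genres):
        by_genre[genre].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for genre in sorted(by_genre):
        indices = by_genre[genre]
        rng.shuffle(indices)
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)
    return folds
```

In use, each fold would serve once as the held-out set: a regressor is trained on the other 12 folds, its predictions for the held-out documents are collected, and `pearson(predictions, mean_expert_ratings)` is computed over the pooled predictions. With the genre counts from the Data slide (five genres of 56 documents and two of 55), this dealing scheme gives every fold 4 or 5 documents per genre.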

Results: Regression Algorithms
[Figure: bar chart. y-axis: Correlation (0 to 1); bars: Bagged Decision Tree, Linear Regression, SVM Regression, Gaussian Process Regression, Decision Trees; reference lines: Upper Critical Value, Distribution Mean]
Choice of regression algorithm is not critical.

Results: Feature Sets
[Figure: bar chart. y-axis: Correlation (0 to 1); bars: All, Lexical, Syntactical, Lexical + Syntactical, LM Based; reference lines: Upper Critical Value, Distribution Mean]
Each feature set contributes; the LM-based feature set is the most useful.

Results: Genre-based Feature Sets
[Figure: bar chart. y-axis: Correlation (0 to 1); bars: All, Genre-independent, Genre-based; reference lines: Upper Critical Value, Distribution Mean]
Genre-independent features: better than novice mean; genre-specific features: significantly improve performance.

Results: Individual Feature Sets
[Figure: bar chart. y-axis: Correlation (0 to 1); x-axis: All, Sundance, ESG, Perp., Post. Perp., OOV rates; legend: By itself, Ablated from all, System using all features, Upper Critical Value, Distribution Mean]
Posterior perplexities: best feature set, but no single feature set is indispensable.

Official Evaluation
Conducted by SAIC on behalf of DARPA
Three teams participated
Evaluation task: predict readability of 150 test documents using the 390 documents for training

Official Evaluation Results
[Figure: bar chart. y-axis: Correlation (0 to 1); bars: Our System, System B, System C; reference lines: Upper Critical Value, Novice mean; annotation: Sig. better than human at p<0.0001]
Our system performed favorably and scored better than the upper critical value.

Conclusions
Readability system
  – Regression over syntactical, lexical and language model features
All features contribute, but LM features are most useful
System is statistically significantly better than novice human judges

Thank You!
Questions?