Transcript Document
CROSS-DOCUMENT RELATION DISCOVERY, TRUTH FINDING
Heng Ji ([email protected])
Nov 12, 2014

Outline
• Task Definition
• Supervised Models: Basic Features, World Knowledge, Learning Models, Joint Inference
• Semi-supervised Learning
• Domain-independent Relation Extraction

Relation Extraction: Task
A relation is a semantic relationship between two entities. ACE relation types, with examples:
• Agent-Artifact: "Rubin Military Design, the makers of the Kursk"
• Discourse: "each of whom"
• Employment/Membership: "Mr. Smith, a senior programmer at Microsoft"
• Place-Affiliation: "Salzburg Red Cross officials"
• Person-Social: "relatives of the dead"
• Physical: "a town some 50 miles south of Salzburg"
• Other-Affiliation: "Republican senators"

A Simple Baseline with K-Nearest-Neighbor (KNN)
[Figure: a test sample surrounded by training samples; with K=3, the label is decided by the three nearest training samples.]

Relation Extraction with KNN
[Figure: the test sample "the president of the United States" compared against five training samples: "the previous president of the United States" (Employment), "the secretary of NIST" (Employment), "US forces in Bahrain" (Physical), "his ranch in Texas" (Physical), and "Connecticut's governor" (Employment); the distances shown are 0, 26, 36, 46, 46.]
The distance function sums mismatch penalties:
• If the heads of the mentions don't match: +8
• If the entity types of the heads of the mentions don't match: +20
• If the intervening words don't match: +10
(A runnable sketch of this distance and the K=3 vote appears after the feature inventory below.)

Typical Relation Extraction Features
• Lexical: heads of the mentions and their context words, POS tags
• Entity: entity and mention type of the heads of the mentions; entity positional structure; entity context
• Chunking: premodifier, possessive, preposition, formulaic; the sequence of the heads of the constituents/chunks between the two mentions
• Syntactic: the syntactic relation path between the two mentions; dependent words of the mentions
• Gazetteers: synonyms in WordNet; name gazetteers; personal relative trigger word list
• Wikipedia: whether the head extent of a mention is found (via simple string matching) in the predicted Wikipedia article of another mention
References: Kambhatla, 2004; Zhou et al., 2005; Jiang and Zhai, 2007; Chan and Roth, 2010, 2011
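The mismatch-penalty distance lends itself to a compact implementation. Below is a minimal sketch in Python; the Mention container and its fields (head, head_type, between) are hypothetical packaging for illustration, not structures from the original system.

from collections import Counter
from dataclasses import dataclass

@dataclass
class Mention:
    head: str          # head word of the mention pair (hypothetical field)
    head_type: str     # entity type of the head (PER, ORG, GPE, ...)
    between: tuple     # intervening words between the two mentions
    label: str = None  # relation label (known for training samples)

def distance(a: Mention, b: Mention) -> int:
    """Sum of the mismatch penalties from the slide."""
    d = 0
    if a.head != b.head:
        d += 8    # heads of the mentions don't match
    if a.head_type != b.head_type:
        d += 20   # entity types of the heads don't match
    if a.between != b.between:
        d += 10   # intervening words don't match
    return d

def knn_label(test: Mention, train: list, k: int = 3) -> str:
    """Majority label among the k nearest training samples."""
    nearest = sorted(train, key=lambda t: distance(test, t))[:k]
    return Counter(t.label for t in nearest).most_common(1)[0][0]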
Using Background Knowledge (Chan and Roth, 2010)
• Features employed are usually restricted to being defined on the various representations of the target sentences
• Humans, in contrast, rely on background knowledge to recognize relations
• Overall aim of this work:
  • Propose methods of using knowledge or resources that exist beyond the sentence: Wikipedia, word clusters, a hierarchy of relations, entity type constraints, and coreference
  • Use them as additional features, or under the Constrained Conditional Model (CCM) framework with Integer Linear Programming (ILP)

Running example: "David Cone, a Kansas City native, was originally signed by the Royals and broke into the majors with the team." What relation holds between "David Cone" and "the Royals"? Background text about the entities helps. From Wikipedia:
"David Brian Cone (born January 2, 1963) is a former Major League Baseball pitcher. He compiled an 8–3 postseason record over 21 postseason starts and was a part of five World Series championship teams (1992 with the Toronto Blue Jays and 1996, 1998, 1999 & 2000 with the New York Yankees). He had a career postseason ERA of 3.80. He is the subject of the book A Pitcher's Story: Innings With David Cone by Roger Angell. Fans of David are known as 'Cone-Heads.' Cone lives in Stamford, Connecticut, and is formerly a color commentator for the Yankees on the YES Network."
"Partly because of the resulting lack of leadership, after the 1994 season the Royals decided to reduce payroll by trading pitcher David Cone and outfielder Brian McRae, then continued their salary dump in the 1995 season. In fact, the team payroll, which was always among the league's highest, was sliced in half from $40.5 million in 1994 (fourth-highest in the major leagues) to $18.5 million in 1996 (second-lowest in the major leagues)."

A fine-grained relation classifier predicts: Employment:Staff 0.20, Employment:Executive 0.15, Personal:Family 0.10, Personal:Business 0.10, Affiliation:Citizen 0.20, Affiliation:Based-in 0.25. A separate coarse-grained classifier predicts: Employment 0.35, Personal 0.40, Affiliation 0.25. Taken alone, the fine-grained argmax is the wrong label Affiliation:Based-in, and the two levels disagree; in the example, combining the two levels with background knowledge pushes the correct label Employment:Staff to a joint score of 0.55.

Knowledge1: Wikipedia (as additional features)
• We use a Wikifier system (Ratinov et al., 2010) which performs context-sensitive mapping of mentions to Wikipedia pages, and introduce a new feature:
\[ w_1(m_i, m_j) = \begin{cases} 1 & \text{if } A_{m_i}(m_j) \text{ or } A_{m_j}(m_i) \\ 0 & \text{otherwise} \end{cases} \]
where A_{m_i}(m_j) means the head extent of m_j appears (via simple string matching) in the Wikipedia article predicted for m_i. We also introduce the conjunction of this feature with the coarse-grained entity types of m_i and m_j.
• Given m_i and m_j, we use a Parent-Child system (Do and Roth, 2010) to predict whether they have a parent-child relation, and introduce:
\[ w_2(m_i, m_j) = \begin{cases} 1 & \text{if parent-child}(m_i, m_j) \\ 0 & \text{otherwise} \end{cases} \]
again combined with the coarse-grained entity types of m_i and m_j.
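A minimal sketch of the containment feature w1 in Python, assuming hypothetical helpers wikify() (returning the text of the predicted Wikipedia article for a mention) and head() (returning a mention's head extent); neither name comes from the original systems.

def w1(mi, mj, wikify, head):
    """1 if either mention's head extent appears, by simple string
    matching, in the predicted Wikipedia article of the other."""
    return int(head(mj) in wikify(mi) or head(mi) in wikify(mj))

def w1_typed(mi, mj, wikify, head, etype):
    """Conjunction of w1 with the coarse-grained entity types,
    expressed as a sparse feature name."""
    return f"w1={w1(mi, mj, wikify, head)}|{etype(mi)}|{etype(mj)}"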
Knowledge2: Word Class Information (as additional features)
[Figure: a Brown-cluster binary tree; leaves are words such as apple, pear, Apple, IBM, bought, run, of, in, and the sequence of 0/1 branching decisions from the root to a leaf (e.g. 011) is that word's bit string.]
• Supervised systems face an issue of data sparseness in their lexical features
• Use class information of words to support better generalization, instantiated in this work as word clusters
• Clusters are automatically generated from unlabeled texts using the algorithm of (Brown et al., 1992)
• All lexical features consisting of single words are duplicated with the word's corresponding bit-string representation, so that words sharing a cluster (e.g. Apple and IBM) activate a common feature

Constrained Conditional Models (CCMs) (Roth and Yih, 2007; Chang et al., 2008)
• A CCM scores an assignment with a collection of "local" classifiers, via a weight vector over local models, minus a penalty for violating each declarative constraint, weighted by how far the assignment y is from a "legal" assignment
• The five knowledge sources (Wikipedia, word clusters, the hierarchy of relations, entity type constraints, coreference) enter either as additional features or as constraints
• Under this framework, the fine- and coarse-grained distributions from the David Cone example are reconciled jointly
• Key steps:
  • Write down a linear objective function
  • Write down constraints as linear inequalities
  • Solve using integer linear programming (ILP) packages
A schematic form of the objective is sketched below.
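For reference, the CCM scoring function as published in (Chang et al., 2008) has roughly this shape; the notation here is reconstructed from the slide's annotations ("weight vector for local models", "penalty for violating the constraint", "how far y is from a legal assignment"), not copied from the slide itself:

\[
\mathbf{y}^{*} \;=\; \arg\max_{\mathbf{y}}\;
\underbrace{\mathbf{w} \cdot \Phi(\mathbf{x}, \mathbf{y})}_{\substack{\text{weight vector for the ``local'' models} \\ \text{(collection of classifiers)}}}
\;-\; \sum_{k} \underbrace{\rho_k}_{\substack{\text{penalty for violating} \\ \text{constraint } C_k}}\,
\underbrace{d\big(\mathbf{y}, \mathbf{1}_{C_k(\mathbf{x})}\big)}_{\substack{\text{how far } \mathbf{y} \text{ is from} \\ \text{a ``legal'' assignment}}}
\]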
Knowledge3: Hierarchy of Relations
The target relations form a two-level hierarchy: coarse-grained labels (personal, employment, ...) each dominate fine-grained labels (family, biz, ... under personal; staff, executive, ... under employment). A coarse-grained classifier and a fine-grained classifier are trained for the two levels, and for each mention pair (m_i, m_j) we want their predictions to agree with the hierarchy.

Write down a linear objective function:
\[ \max \sum_{R \in \mathcal{R}} \sum_{r_c \in L_{Rc}} p_R(r_c)\, x_{R,r_c} \;+\; \sum_{R \in \mathcal{R}} \sum_{r_f \in L_{Rf}} p_R(r_f)\, y_{R,r_f} \]
where p_R(r_c) and p_R(r_f) are the coarse- and fine-grained prediction probabilities, and x_{R,r_c}, y_{R,r_f} are binary indicator variables; an indicator variable set to 1 corresponds to the relation assignment chosen.

Write down constraints:
• If a relation R is assigned a coarse-grained label r_c, then we must also assign to R a fine-grained relation r_f which is a child of r_c:
\[ x_{R,r_c} \le y_{R,r_{f_1}} + y_{R,r_{f_2}} + \cdots + y_{R,r_{f_n}} \]
• (Capturing the inverse relationship) If we assign r_f to R, then we must also assign to R the parent of r_f, which is the corresponding coarse-grained label:
\[ y_{R,r_f} \le x_{R,\mathrm{parent}(r_f)} \]
(A minimal ILP sketch of these constraints appears after the coreference knowledge source below.)

Knowledge4: Entity Type Constraints (Roth and Yih, 2004, 2007)
Entity types are useful for constraining the possible labels that a relation R can assume:
• Employment:Staff, Employment:Executive: per-org
• Personal:Family, Personal:Business: per-per
• Affiliation:Citizen: per-gpe
• Affiliation:Based-in: org-gpe
We gather information on entity type constraints from the ACE-2004 documentation and impose them on the coarse-grained relations. By improving the coarse-grained predictions and combining them with the hierarchical constraints defined earlier, the improvements propagate to the fine-grained predictions.

Knowledge5: Coreference
If m_i and m_j are coreferent, no relation should hold between them, so the pair is forced to the null label. In this work, we assume that we are given the coreference information, which is available from the ACE annotation.
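A minimal sketch of the hierarchy constraints as an ILP, using the PuLP package; the label names and probabilities are the toy values from the David Cone example, and only a single relation instance is modeled (the paper's model couples many decisions at once).

import pulp

# Toy coarse- and fine-grained distributions for one mention pair.
p_coarse = {"employment": 0.35, "personal": 0.40, "affiliation": 0.25}
p_fine = {"staff": 0.20, "executive": 0.15, "family": 0.10,
          "biz": 0.10, "citizen": 0.20, "based-in": 0.25}
parent = {"staff": "employment", "executive": "employment",
          "family": "personal", "biz": "personal",
          "citizen": "affiliation", "based-in": "affiliation"}

prob = pulp.LpProblem("hierarchy", pulp.LpMaximize)
x = {rc: pulp.LpVariable(f"x_{rc}", cat="Binary") for rc in p_coarse}
y = {rf: pulp.LpVariable(f"y_{rf}", cat="Binary") for rf in p_fine}

# Linear objective: summed coarse- and fine-grained probabilities.
prob += pulp.lpSum(p_coarse[rc] * x[rc] for rc in x) + \
        pulp.lpSum(p_fine[rf] * y[rf] for rf in y)

# Exactly one label at each granularity.
prob += pulp.lpSum(x.values()) == 1
prob += pulp.lpSum(y.values()) == 1

# If rc is chosen, one of its children must be chosen, and vice versa.
for rc in x:
    prob += x[rc] <= pulp.lpSum(y[rf] for rf in y if parent[rf] == rc)
for rf in y:
    prob += y[rf] <= x[parent[rf]]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([rc for rc in x if x[rc].value() == 1],
      [rf for rf in y if y[rf].value() == 1])

On these toy numbers the ILP selects Employment with Staff (combined score 0.35 + 0.20 = 0.55), matching the 0.55 shown for Employment:Staff in the running example.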
Experiment Results
BasicRE F1: 50.5% when trained on all newswire (nwire) data, 31.0% when trained on 10% of it.
[Figure: F1 improvement over BasicRE from using each knowledge source.]

Most Successful Learning Methods: Kernel-based
• Consider different levels of syntactic information: deep processing of text produces structural but less reliable results; simple surface information is less structural, but more reliable
• A generalization of feature-based solutions: a kernel (kernel function) defines a similarity metric Ψ(x, y) on objects, with no need to enumerate features
• Efficient extension of normal features into higher-order spaces: it becomes possible to solve a linearly non-separable problem in a higher-order space
• Nice combination properties: closed under linear combination, under polynomial extension, and under direct sum/product on different domains
References: Zelenko et al., 2002, 2003; Culotta and Sorensen, 2004; Bunescu and Mooney, 2005; Zhao and Grishman, 2005; Che et al., 2005; Zhang et al., 2006; Qian et al., 2007; Zhou et al., 2007; Khayyamian et al., 2009; Reichartz et al., 2009

Kernel Examples for Relation Extraction (Zhao and Grishman, 2005)
1) Argument kernel:
\[ \Phi_1(R_1, R_2) = \sum_{i=1,2} K_E(R_1.\mathrm{arg}_i,\, R_2.\mathrm{arg}_i) \]
\[ K_E(E_1, E_2) = K_T(E_1.tk, E_2.tk) + I(E_1.type, E_2.type) + I(E_1.subtype, E_2.subtype) + I(E_1.role, E_2.role) \]
where I is an indicator of equality and K_T is a token kernel:
\[ K_T(T_1, T_2) = I(T_1.word, T_2.word) + I(T_1.pos, T_2.pos) + I(T_1.base, T_2.base) \]
2) Local dependency kernel, comparing the dependency arcs around each argument:
\[ \Phi_2(R_1, R_2) = \sum_{i=1,2} K_D(R_1.\mathrm{arg}_i.dseq,\, R_2.\mathrm{arg}_i.dseq) \]
\[ K_D(dseq, dseq') = \sum_{0 \le i < dseq.len}\; \sum_{0 \le j < dseq'.len} \big( I(arc_i.label, arc'_j.label) \cdot K_T(arc_i.dw, arc'_j.dw) \big) \]
3) Path kernel, over the dependency path between the two arguments:
\[ \Phi_3(R_1, R_2) = K_{path}(R_1.path,\, R_2.path) \]
with K_path defined analogously to K_D as a sum over pairs of arcs along the two paths.
Composite kernels combine the base kernels, e.g.
\[ \Phi(R_1, R_2) = (\Phi_1 + \Phi_2) + \frac{(\Phi_1 + \Phi_2)^2}{4} \]
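These kernels are simple enough to state directly in code. A minimal sketch, assuming tokens and arguments arrive as plain dicts whose keys (word, pos, base, tk, type, subtype, role, args) are placeholder names, not the original system's data structures:

def I(a, b):
    """Identity kernel: 1 if the two values match, else 0."""
    return int(a == b)

def K_T(t1, t2):
    """Token kernel: word, POS tag, and base (lemma) matches."""
    return (I(t1["word"], t2["word"]) + I(t1["pos"], t2["pos"])
            + I(t1["base"], t2["base"]))

def K_E(e1, e2):
    """Entity kernel over an argument's head token and ACE attributes."""
    return (K_T(e1["tk"], e2["tk"]) + I(e1["type"], e2["type"])
            + I(e1["subtype"], e2["subtype"]) + I(e1["role"], e2["role"]))

def Phi1(r1, r2):
    """Argument kernel: sum K_E over the two aligned arguments."""
    return sum(K_E(r1["args"][i], r2["args"][i]) for i in (0, 1))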
Bootstrapping for Relation Extraction
The bootstrapping loop: start from initial seed tuples; find occurrences of the seed tuples in the corpus; generate extraction patterns from those occurrences; apply the patterns to generate new seed tuples; augment the table and start a new iteration. A runnable sketch of this loop follows the example below.

Initial seed tuples (ORGANIZATION, LOCATION): (MICROSOFT, REDMOND), (IBM, ARMONK), (BOEING, SEATTLE), (INTEL, SANTA CLARA)

Occurrences of seed tuples:
• "Computer servers at Microsoft's headquarters in Redmond…"
• "In mid-afternoon trading, shares of Redmond-based Microsoft fell…"
• "The Armonk-based IBM introduced a new line…"
• "The combined company will operate from Boeing's headquarters in Seattle."
• "Intel, Santa Clara, cut prices of its Pentium processor."

Learned patterns:
• <STRING1>'s headquarters in <STRING2>
• <STRING2>-based <STRING1>
• <STRING1>, <STRING2>

Generate new seed tuples and start a new iteration. The harvested table is noisy, e.g.: (AG EDWARDS, ST LUIS), (157TH STREET, MANHATTAN), (7TH LEVEL, RICHARDSON), (3COM CORP, SANTA CLARA), (3DO, REDWOOD CITY), (JELLIES, APPLE), (MACWEEK, SAN FRANCISCO); errors such as (JELLIES, APPLE) show how overly generic patterns introduce noise.
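A minimal sketch of the loop in the spirit of DIPRE/Snowball-style bootstrapping; representing a pattern as the literal middle context between the two strings, and matching new tuples with a naive capitalized-token regex, are deliberate simplifications of what real systems use.

import re

def contexts(corpus, org, loc):
    """Find the middle strings that connect a seed pair in text."""
    pat = re.compile(re.escape(org) + r"(.{1,30}?)" + re.escape(loc), re.I)
    return [m.group(1) for m in pat.finditer(corpus)]

def bootstrap(corpus, seeds, iterations=2):
    table = set(seeds)
    for _ in range(iterations):
        # 1. Occurrences of seed tuples -> extraction patterns.
        patterns = {mid for org, loc in table
                    for mid in contexts(corpus, org, loc)}
        # 2. Patterns -> new tuples (naive capitalized-word matching).
        for mid in patterns:
            pat = re.compile(r"([A-Z][\w']+)" + re.escape(mid)
                             + r"([A-Z][\w']+)")
            table.update(pat.findall(corpus))   # 3. Augment the table.
    return table

corpus = ("Computer servers at Microsoft's headquarters in Redmond fell. "
          "The combined company will operate from Boeing's headquarters "
          "in Seattle.")
print(bootstrap(corpus, {("Microsoft", "Redmond")}))
# The learned middle "'s headquarters in " also extracts (Boeing, Seattle).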
State-of-the-art and Remaining Challenges
• State-of-the-art: about 71% F-score on perfect mentions, and 50% F-score on system mentions; a single human annotator reaches 84% F-score on perfect mentions
• Remaining challenges:
  • Context generalization to reduce data sparsity. Test: "ABC's Sam Donaldson has recently been to Mexico to see him"; training instances of the PHY relation use different wording ("arrived in", "was traveling to", …)
  • Long context: "Davies is leaving to become chairman of the London School of Economics, one of the best-known parts of the University of London"
  • Disambiguating fine-grained types: "U.S. citizens" and "U.S. businessman" indicate a GPE-AFF relation while "U.S. president" indicates an EMP-ORG relation
  • Parsing errors

Knowledge Base Population (Slot Filling)
Example query:
<query id="SF114">
  <name>Jim Parsons</name>
  <docid>eng-WL-11-174592-12943233</docid>
  <enttype>PER</enttype>
  <nodeid>E0300113</nodeid>
  <ignore>per:date_of_birth per:age per:country_of_birth per:city_of_birth</ignore>
</query>
Expected filler, e.g. school attended: University of Houston.

KB slots:
• Person: per:alternate_names, per:date_of_birth, per:age, per:country_of_birth, per:stateorprovince_of_birth, per:city_of_birth, per:origin, per:date_of_death, per:country_of_death, per:stateorprovince_of_death, per:city_of_death, per:cause_of_death, per:countries_of_residence, per:stateorprovinces_of_residence, per:cities_of_residence, per:schools_attended, per:title, per:member_of, per:employee_of, per:religion, per:spouse, per:children, per:parents, per:siblings, per:other_family, per:charges
• Organization: org:alternate_names, org:political/religious_affiliation, org:top_members/employees, org:number_of_employees/members, org:members, org:member_of, org:subsidiaries, org:parents, org:founded_by, org:founded, org:dissolved, org:country_of_headquarters, org:stateorprovince_of_headquarters, org:city_of_headquarters, org:shareholders, org:website

Slot Filling & Slot Filler Validation
Slot Filling (SF): search a document collection to fill in values for predefined slots (attributes) for a given entity, to populate a reference KB.
• Queries: 50 person queries and 50 organization queries, such as "Marc Bolland" and "Public Library of Science"
• Response: claim + evidence
• 41 slot types, with single or multiple attribute values
Slot Filling Validation (SFV): given 52 runs from 18 SF teams, extract the true claims from multiple sources.
Problems: different information sources may generate claims with varied trustworthiness, and various SF systems may generate erroneous, conflicting, redundant, complementary, ambiguously worded, or interdependent claims from the same set of documents.

Example (conflicting claims about where Ronnie James Dio died):
• System A, Agence France-Presse (news): filler "Los Angeles"; evidence: "The statement was confirmed by publicist Maureen O'Connor, who said Dio died in Los Angeles."
• System B, New York Times (news): filler "Los Angeles"; evidence: "Ronnie James Dio, a singer with the heavy-metal bands Rainbow, died on Sunday in Los Angeles."
• System C, discussion forum: filler "Atlantic City"; evidence: "Dio revealed last summer that he was suffering from stomach cancer shortly after wrapping up a tour in Atlantic City."
• System D, Associated Press Worldstream (news): filler "Los Angeles"; evidence: "LOS ANGELES 2010-05-16 20:31:18 UTC Ronnie James Dio ... has died, according to his wife."

Solution: Truth Finding. Determine the veracity of multiple conflicting claims from various sources and providers (i.e. systems or humans).

Truth Finding Problem
• We require not only high-confidence claims but also trustworthy evidence to verify them; deep understanding is needed
• Previous truth-finding work assumed most claims are likely to be true, and most of it relied on the "wisdom of the crowd"; in SF, 72.02% of responses are false
• Certain truths might only be discovered by a minority of systems or from a few sources (62% are found by only 1 or 2 systems)
Hence a multi-dimensional truth-finding model (MTM) over sources, systems, and responses.

Heuristics Explored in MTM
• Heuristic 1: a response is more likely to be true if derived from many trustworthy sources; a source is more likely to be trustworthy if many responses derived from it are true
• Heuristic 2: a response is more likely to be true if it is extracted by many trustworthy systems; a system is more likely to be trustworthy if many responses generated by it are true

Credibility Initialization
• Source (S): a combination of publication venue and genre, initialized uniformly as 1/n (n is the number of sources)
• System (T = {t_1, …, t_l}): each system t_i generates a set of responses R_{t_i}. Similarity between systems t_i and t_j is
\[ \mathrm{similarity}(t_i, t_j) = \frac{|R_{t_i} \cap R_{t_j}|}{\log|R_{t_i}| + \log|R_{t_j}|} \]
(Mihalcea, 2004). Construct a weighted undirected graph G = <T, E> with one node per system and edge weights given by the similarity, then apply TextRank to obtain the initial scores.
• Response (R): rely on deep linguistic analysis of the evidence sentences and semantic clues (introduced later).

Credibility Propagation
An extension of Co-HITS (Deng et al., 2009). Given the initial credibility scores c^0(r), c^0(s), and c^0(t), we aim to obtain refined credibility scores c(r), c(s), and c(t).
• Sources: consider both the initial score of the source and the propagation from connected responses:
\[ c(s_i) = (1 - \lambda_{rs})\, c^0(s_i) + \lambda_{rs} \sum_{r_j \in R} p^{rs}_{ji}\, c(r_j) \]
• Systems: consider both the initial score of the system and the propagation from responses to systems:
\[ c(t_k) = (1 - \lambda_{rt})\, c^0(t_k) + \lambda_{rt} \sum_{r_j \in R} p^{rt}_{jk}\, c(r_j) \]
• Responses: each response's score is influenced by both its linked sources and its linked systems:
\[ c(r_j) = (1 - \lambda_{sr} - \lambda_{tr})\, c^0(r_j) + \lambda_{sr} \sum_{s_i \in S} p^{sr}_{ij}\, c(s_i) + \lambda_{tr} \sum_{t_k \in T} p^{tr}_{kj}\, c(t_k) \]
The iteration converges, with a proof similar to that for HITS (Peserico and Pretto, 2009).
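A minimal sketch of the propagation sweeps in numpy, assuming each response-source and response-system link is encoded once in a matrix that is reused (transposed) for both directions; the paper normalizes each direction separately, and the λ values here are illustrative.

import numpy as np

def mtm_propagate(c0_r, c0_s, c0_t, P_rs, P_rt, lam_rs=0.5, lam_rt=0.5,
                  lam_sr=0.3, lam_tr=0.3, iters=50):
    """Co-HITS-style credibility propagation between responses (r),
    sources (s), and systems (t). P_rs[j, i] links response j to
    source i; P_rt[j, k] links response j to system k. Reusing one
    matrix per pair (transposed) is a simplification."""
    c_r, c_s, c_t = c0_r.copy(), c0_s.copy(), c0_t.copy()
    for _ in range(iters):
        c_s = (1 - lam_rs) * c0_s + lam_rs * P_rs.T @ c_r
        c_t = (1 - lam_rt) * c0_t + lam_rt * P_rt.T @ c_r
        c_r = ((1 - lam_sr - lam_tr) * c0_r
               + lam_sr * P_rs @ c_s + lam_tr * P_rt @ c_t)
    return c_r, c_s, c_t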
Bottleneck: Low Coverage of Patterns
• Manually crafted/edited patterns: low coverage; expensive
• Bootstrapping: hard to generalize; long-tail distribution

Typical dependency patterns for per:place_of_birth:
• <Query_PER> nsubjpass-1 born prep_in <Filler_LOC>
• <Query_PER> partmod born prep_in <Filler_LOC>
• <Query_PER> nsubjpass-1 born prep_on <Filler_LOC>
• <Query_PER> rcmod born prep_in <Filler_LOC>
These still miss some simple cases, e.g. "Charles Gwathmey [1] was born on June 19, 1938, in Charlotte [2], N.C." The dependency path between [1] and [2] is ['nsubjpass', 'born', 'prep_on', 'June', 'prep_in', 'N.C', 'nn'], which none of the patterns matches.

Typical dependency patterns for per:place_of_death:
• <Q_PER> nsubj-1 dies prep_in <A_LOC>
• <Q_PER> nsubj-1 died prep_in <A_LOC>
• <Q_PER> nsubj-1 died prep_on <A_LOC>
• <Q_PER> nsubj-1 died prep_in hospital nn <A_LOC>
Again missing some simple cases, e.g. "'60 Minutes' was the brainchild of Don Hewitt [1], the show's longtime executive producer who died Wednesday of pancreatic cancer at his home in Bridgehampton, N.Y. [2], at age 86." The dependency path between [1] and [2] is ['appos', 'producer', 'nsubj', 'died', 'who', 'rcmod', 'died', 'prep_at', 'home', 'prep_in'].
A pattern-matching sketch follows.
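A minimal sketch of matching such path patterns against an extracted dependency path, treating both as token sequences; real systems match over dependency graphs, so this contiguous-subsequence test is a deliberate simplification.

PATTERNS = {
    "per:place_of_birth": [
        ["nsubjpass-1", "born", "prep_in"],
        ["partmod", "born", "prep_in"],
        ["nsubjpass-1", "born", "prep_on"],
        ["rcmod", "born", "prep_in"],
    ],
}

def matches(path, pattern):
    """True if the pattern occurs as a contiguous subsequence."""
    n = len(pattern)
    return any(path[i:i + n] == pattern for i in range(len(path) - n + 1))

def slot_types(path):
    return [slot for slot, pats in PATTERNS.items()
            if any(matches(path, p) for p in pats)]

# The Gwathmey example from the slide: none of the patterns matches
# this path, illustrating the coverage gap.
path = ["nsubjpass", "born", "prep_on", "June", "prep_in", "N.C", "nn"]
print(slot_types(path))   # -> []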
Knowledge Gap 1. Deep Knowledge Acquisition: Nominal Coreference
• "Almost overnight, he became fabulously rich, with a $3-million book deal, a $100,000 speech-making fee, and a lucrative multifaceted consulting business, Giuliani Partners. As a celebrity rainmaker and lawyer, his income last year exceeded $17 million. His consulting partners included seven of those who were with him on 9/11, and in 2002 Alan Placa, his boyhood pal, went to work at the firm."
• "After a successful karting career in Europe, Perera became part of the Toyota F1 Young Drivers Development Program and was a Formula One test driver for the Japanese company in 2006."
• "Alexandra Burke is out with the video for her second single … taken from the British artist's debut album"
• "a woman charged with running a prostitution ring … her business, Pamela Martin and Associates"
Our solution: online knowledge graph construction; enrich paths with semantic annotations and Information Extraction (coreference/relation/event).

Knowledge Gap 2. Deep Knowledge Acquisition: Implicit Paraphrases and the Long-tail Distribution
"employee/member" is expressed in many ways:
• "Sutil, a trained pianist, tested for Midland in 2006 and raced for Spyker in 2007, where he scored one point in the Japanese Grand Prix."
• "Daimler Chrysler reports 2004 profits of $3.3 billion; Chrysler earns $1.9 billion."
• "In her second term, she received a seat on the powerful Ways and Means Committee"
• "Jennifer Dunn was the face of the Washington state Republican Party for more than two decades"
• (also state of residence) "Davis became Virginia's first Republican woman elected to Congress in 2000, and she was a member of the House Armed Services Committee and the Foreign Affairs Committee"
• "Buchwald lied about his age and escaped into the Marine Corps."
• "By 1942, Peterson was performing with one of Canada's leading big bands, the Johnny Holmes Orchestra."
Even more: "would join", "would be appointed", "will start at", "went to work", "was transferred to", "was recruited by", "took over as", "succeeded PERSON", "began to teach piano", …
"spouse": "Buchwald's 1952 wedding -- Lena Horne arranged for it to be held in London's Westminster Cathedral -- was attended by Gene Kelly, John Huston, Jose Ferrer, Perle Mesta and Rosemary Clooney, to name a few"

Linguistic Indicators: Knowledge Graph Construction
[Figure: a knowledge graph built from a sentence about Billy Mays, with nodes for the query person (Mays, PER.Individual), his age (50, per:age), the death trigger "died", his home (FAC.Building-Grounds), Tampa, and June 28, linked by dependency edges such as nsubj, poss, prep_at, prep_in, prep_of, and a located_in relation.]

Linguistic Indicators
Linguistic indicators (used as binary classification signals) make use of linguistic features on varying levels: surface form, sentential syntax, semantics, and pragmatics. They fall into node indicators, path indicators, and indicators over interdependent claims.

Node Indicators
• Surface: stop words; lowercased fillers for org:top_members/employees; fillers for org:website
• Entity type, subtype, and mention type
• Entity attributes mined by the NELL system (Carlson et al., 2010)

Path Indicators
• Trigger phrases, e.g. for "top employees": chief executive officer, chief financial officer, chief operating officer, chief strategy and development officer, chief information officer, e-commerce and security officer, …; for "headquarters": based, headquarter, headquarters, 's; a disease list from a medical ontology
• Relations and events, e.g. a "Start-Position" event indicates the slot type per:employee_or_member_of
• Path length, e.g. the path length for per:title is usually 1

Indicators over Interdependent Claims
• Conflicting slot fillers
• Inter-dependent slot types: after computing initial credibility scores for each response, we check whether evidence exists for any implied claims; e.g. given that A is B's son and C is A's sibling, it should follow that B is C's parent

Inter-dependent Slots
Example: local structure for death-related slots. We already know that Beverly Sills, 78, died on Monday in Brooklyn, NY.
[Figure: local knowledge graph around the query Beverly Sills, with nodes including Belle Miriam Silverman, Bubbles, Peter Greenough, Meredith, 78, May 25, 1929, Monday, U.S., Manhattan, New York, Brooklyn.]
Given the knowledge graph of Paul Gillmor (68, Wednesday, Arlington, Virginia) and a similar local structure, we can predict the slot types of its nodes.

Truth Finding Overall Performance
Methods | Precision | Recall | F-measure | Accuracy | MAP*
1. Random | 28.64% | 50.48% | 36.54% | 50.54% | 34%
2. Voting | 42.16% | 70.18% | 52.68% | 62.54% | 62%
3. Linguistic Indicators | 50.24% | 70.69% | 58.73% | 72.29% | 60%
4. SVM (3 + system + source) | 56.59% | 48.72% | 52.36% | 75.86% | 56%
5. MTM (3 + system + source) | 53.94% | 72.11% | 61.72% | 81.57% | 70%
*MAP: Mean Average Precision

Truth Finding Efficiency
[Figure: #truths (0-14,000) vs. #total responses assessed (0-40,000) for six curves: (1) Baseline, (2) Voting, (3) Linguistic Indicator, (4) SVM, (5) MTM, (6) Oracle.]

Enhance Individual SF Systems
[Figure: F-measure (%) of each of about 20 individual SF systems before and after validation.]
Remaining Challenges
• Name tagging errors
• Coreference resolution errors:
  • "He worked his way up the organization under founder Ted Arison and his son Micky, who now leads Carnival Corp. and called Dickinson 'one of the most influential people in the development of the modern-day cruise industry.'"
  • "Indiana Muslim running for Congress wants to combat ignorance about his [Andre Carson] faith. INDIANAPOLIS -- A convert to Islam stands an election victory away from becoming the second Muslim elected to Congress and a role model for a faith community seeking to make its mark in national politics."
• Vague justification: "It was in December 1970 that Anderson criticized Hoover's pretrial attack on two Roman Catholic priests, Daniel J. and Philip F. Berrigan, who were later convicted of destroying draft board records." (a religion filler?)
• Fuzzy slot definitions: "She and Russell Simmons, 50, have two daughters: 8-year-old Ming Lee and 5-year-old Aoki Lee."
• Distinguishing slot directions: organization parent/subsidiary; members/member_of
• Implicit relations:
  • "He [Pascal Yoadimnadji] has been evacuated to France on Wednesday after falling ill and slipping into a coma in Chad, Ambassador Moukhtar Wawa Dahab told The Associated Press. His wife, who accompanied Yoadimnadji to Paris, will repatriate his body to Chad." (is he dead? in Paris?)
  • "Until last week, Palin was relatively unknown outside Alaska, and as facts have dribbled out about her, the McCain campaign has insisted that its examination of her background was thorough and that nothing that has come out about her was a surprise." (does she live in Alaska?)
  • "The list says that the state is owed $2,665,305 in personal income taxes by singer Dionne Warwick of South Orange, N.J., with the tax lien dating back to 1997." (does she live in N.J.?)
  • "Vernon Bellecourt -- whose Ojibwe name, WaBun-Inini, means 'Man of Dawn' or 'Daybreak' -- was born on the White Earth Indian Reservation in Minnesota. He left home at 15 after finding work in a carnival." (did he live in Minnesota?)