Open Information Extraction From The Web

Rani Qumsiyeh
What Is Information Extraction?
• This article surveys a range of Information Extraction (IE) methods, particularly Open IE.
• IE is a venerable technology that maps natural-language text into structured relational data.
• Open Information Extraction is the setting where the identities of the relations to be extracted are unknown in advance, and the billions of documents found on the Web necessitate highly scalable processing.
Most Common Ways to Do IE
• Direct knowledge-based encoding: a human enters regular expressions or rules.
• Supervised learning: a human provides labeled training examples.
• Self-supervised learning: the system automatically finds and labels its own examples.
Direct Knowledge
• Not efficient: the rules have to be altered for each new domain.
• Example: the term "bank" maps to the class PhysicalTarget in the terrorism domain, but to the class Corporation in the joint-ventures domain.
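
The domain dependence described here can be illustrated with a small sketch. The class names PhysicalTarget and Corporation come from the slide; the regular expressions, the lexicon layout, and the function name `extract` are hypothetical and only show why hand-written rules have to be rewritten for each domain.

```python
import re

# Hypothetical hand-built lexicons: the same term maps to a different
# semantic class depending on the domain (class names are from the slide).
DOMAIN_LEXICONS = {
    "terrorism": {"bank": "PhysicalTarget"},
    "joint-ventures": {"bank": "Corporation"},
}

# Hypothetical hand-written rules, one regular expression per domain.
DOMAIN_RULES = {
    "terrorism": re.compile(r"\b(?:bombed|attacked) the (\w+)"),
    "joint-ventures": re.compile(r"\bjoint venture with the (\w+)"),
}


def extract(sentence, domain):
    """Return (term, semantic class) pairs found by the domain's rule."""
    lexicon = DOMAIN_LEXICONS[domain]
    results = []
    for match in DOMAIN_RULES[domain].finditer(sentence):
        term = match.group(1).lower()
        if term in lexicon:
            results.append((term, lexicon[term]))
    return results
```

A rule written for one domain finds nothing when applied under another domain's rule set, so every new domain needs a new rule and a new lexicon entry.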
Example of Supervised Learning
Self-Supervised Knowledge
• A system that labels its own training examples (example: KnowItAll).
• For a given relation: use a generic pattern → instantiate relation-specific extraction rules → learn domain-specific extraction rules → apply the rules to web pages and assign probabilities to the resulting extractions.
• Example: the generic pattern "X is a Y", instantiated as "X is a country":
  - China is a country. (correct match)
  - Garth Brooks is a country singer. (spurious match, which the assigned probabilities should down-weight)
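
A minimal sketch of the first two steps (generic pattern, then a relation-specific rule) follows. It is not KnowItAll's implementation: the regular expression and the names `instantiate` and `label_examples` are assumptions made for illustration, and the probability-assignment step is left out.

```python
import re

# Generic, domain-independent pattern from the slide: "X is a Y".
# {cls} is filled in when the pattern is instantiated for one class.
GENERIC_PATTERN = r"(?P<x>[A-Z][\w ]*?) is an? {cls}\b"


def instantiate(class_name):
    """Turn the generic pattern into a rule for one class, e.g. 'country'."""
    return re.compile(GENERIC_PATTERN.format(cls=re.escape(class_name)))


def label_examples(sentences, class_name):
    """Collect candidate instances of the class from raw sentences.

    The candidates act as self-labeled examples; no human labels them.
    """
    rule = instantiate(class_name)
    candidates = []
    for sentence in sentences:
        match = rule.search(sentence)
        if match:
            candidates.append(match.group("x"))
    return candidates
```

Applied to the two example sentences, the rule proposes both "China" and "Garth Brooks" as countries, which illustrates why a purely pattern-based step needs a later scoring step.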
Open Information Extraction
• The challenge of Web extraction is to be able to do Open Information Extraction:
  - an unbounded number of relations;
  - a Web corpus containing billions of documents.
How Open IE Systems Work
• Learn a general model of how relations are expressed (in a particular language), based on unlexicalized features such as part-of-speech tags (for example, identifying a verb).
• Learn domain-independent regular expressions (for example, over punctuation and commas).
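
As a rough illustration of an unlexicalized model, the sketch below matches a pattern over part-of-speech tags only, never over specific words. It assumes the input is already tagged with Penn Treebank style tags (supplied by hand here); the coarse tag mapping, the pattern, and the function names are hypothetical choices, not what any particular Open IE system uses.

```python
import re


def coarse(tag):
    """Map a Penn Treebank style tag to a one-character coarse class."""
    if tag.startswith("NN"):
        return "N"  # noun
    if tag.startswith("VB"):
        return "V"  # verb
    if tag == "IN":
        return "P"  # preposition
    if tag == "DT":
        return "D"  # determiner
    if tag.startswith("JJ"):
        return "J"  # adjective
    return "O"      # anything else, including punctuation


# Noun phrase, verb group with optional preposition, noun phrase.
RELATION = re.compile(r"(D?J*N+)(V+P?)(D?J*N+)")


def extract_triples(tagged):
    """Return (arg1, relation, arg2) triples from (word, tag) pairs."""
    # One character per token, so regex offsets are token offsets.
    symbols = "".join(coarse(tag) for _, tag in tagged)
    words = [word for word, _ in tagged]
    triples = []
    for m in RELATION.finditer(symbols):
        triples.append(tuple(
            " ".join(words[m.start(g):m.end(g)]) for g in (1, 2, 3)
        ))
    return triples
```

Because the pattern refers only to tag classes, the same rule applies to any relation whose surface form is noun phrase, verb, noun phrase, regardless of domain.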
Is There a General Model of Relationships in English?
TextRunner
Works in two phases:
1. Using a conditional random field, the extractor learns to assign labels to each of the words in a sentence.
2. The extractor then extracts one or more textual triples that aim to capture (some of) the relationships in each sentence.
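
The second phase can be sketched as turning a labeled word sequence into triples. The sketch assumes the labels have already been predicted (they are supplied by hand here, no conditional random field is trained), and the label set ENT / REL / O is a hypothetical simplification rather than TextRunner's actual scheme.

```python
from itertools import groupby


def spans(tokens, labels):
    """Group consecutive tokens that share a label into (label, text) spans."""
    out = []
    index = 0
    for label, group in groupby(labels):
        length = len(list(group))
        out.append((label, " ".join(tokens[index:index + length])))
        index += length
    return out


def triples(tokens, labels):
    """Emit (entity, relation, entity) for each ENT, REL, ENT run of spans.

    Spans labeled O are dropped before the runs are examined.
    """
    kept = [span for span in spans(tokens, labels) if span[0] != "O"]
    result = []
    for left, rel, right in zip(kept, kept[1:], kept[2:]):
        if (left[0], rel[0], right[0]) == ("ENT", "REL", "ENT"):
            result.append((left[1], rel[1], right[1]))
    return result
```

For example, the tokens `Einstein was born in Ulm .` with labels `ENT REL REL REL ENT O` yield the single triple (Einstein, was born in, Ulm).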
Additional Tasks to Accomplish
• Opinion mining: Open IE can extract opinion information about particular objects (including products, political candidates, and more) contained in blog posts, reviews, and other texts.
• Fact checking: Open IE can identify assertions that directly or indirectly conflict with the body of knowledge extracted from the Web and various other knowledge bases.