Oregon State University – CS539 PRMs
Learning Probabilistic Models of Link Structure
Getoor, Friedman, Koller, Taskar

Example Application: WebKB
- Classify each web page as course, student, professor, project, or none, using:
  - the words on the web page
  - links from other web pages (and the class of those pages, recursively)
  - the words in the "anchor text" on the linking page: <a href="url">anchor text</a>
- Web pages were obtained from Cornell, Texas, Washington, and Wisconsin
Example Application: CORA
- Classify documents according to topic (7 levels) using:
  - the words in the document
  - the papers cited by the document
  - the papers citing the document
Standard PRM
- parents(Doc.class) = {MODE(Doc.citers.class), MODE(Doc.cited.class)}
[Figure: a Document with attributes class and words; its class has two MODE parents, one aggregating the classes of the citing Documents (citers) and one aggregating the classes of the cited Documents (cited). Each linked Document also has class and words.]
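The parent specification above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the citation graph is assumed to be a set of (citing, cited) id pairs, ties in MODE go to the first-listed value, a document with no linked documents gets None, and the CPT numbers are made up.

```python
from collections import Counter


def mode(values):
    """MODE aggregator: most frequent value; ties go to the value listed first (assumed)."""
    if not values:
        return None  # assumed placeholder when a document has no linked documents
    counts = Counter(values)
    best = max(counts.values())
    return next(v for v in values if counts[v] == best)


def doc_class_parents(doc, cites, doc_class):
    """Return (MODE(Doc.citers.class), MODE(Doc.cited.class)) for one document.

    cites: set of (citing_id, cited_id) pairs (hypothetical representation)
    doc_class: dict mapping document id -> class label
    """
    links = sorted(cites)  # sort so tie-breaking is deterministic
    citers = [a for a, b in links if b == doc]
    cited = [b for a, b in links if a == doc]
    return (mode([doc_class[d] for d in citers]),
            mode([doc_class[d] for d in cited]))


# Hypothetical CPT P(Doc.class | MODE(citers.class), MODE(cited.class)); illustrative numbers only.
CPT = {
    ("Theory", "Learning"): {"Theory": 0.6, "Learning": 0.4},
}

if __name__ == "__main__":
    doc_class = {"d1": "Theory", "d2": "Theory", "d3": "Learning", "d4": "Theory"}
    cites = {("d2", "d1"), ("d3", "d1"), ("d4", "d1"), ("d1", "d3")}
    parents = doc_class_parents("d1", cites, doc_class)
    print(parents)       # ('Theory', 'Learning')
    print(CPT[parents])  # {'Theory': 0.6, 'Learning': 0.4}
```

Dropping the ("d3", "d1") citation leaves both parent values unchanged: an individual link matters only insofar as it shifts the MODE.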
Problem: The Citation Structure is Fixed
- The existence (or non-existence) of a link cannot serve as evidence
- Individually linked papers influence the class only through the MODE
Possible Solution: Link Uncertainty
- Model the existence of links as random variables
- Create a Link instance for each pair of possibly linked objects
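One way to see why link uncertainty lets links act as evidence is a small likelihood computation. The sketch below assumes, for illustration only, that P(Exists = true) for a candidate citation depends on the classes of the citing and cited documents (made-up probabilities), a uniform prior over classes, and no word attributes. Every ordered pair of distinct documents gets a Link instance, so both present and absent links enter the likelihood. The function names are hypothetical.

```python
import math
from itertools import permutations


def link_log_likelihood(doc_class, cites, p_exists):
    """Log-probability of the whole link structure under a link-uncertainty model.

    doc_class: dict doc id -> class label
    cites: set of (citing, cited) pairs observed to exist
    p_exists: dict (citing class, cited class) -> P(Exists = true)  (hypothetical CPT)
    Every ordered pair of distinct documents has its own Exists variable, so an
    absent link contributes log(1 - p): non-existence is evidence too.
    """
    total = 0.0
    for a, b in permutations(sorted(doc_class), 2):
        p = p_exists[(doc_class[a], doc_class[b])]
        total += math.log(p if (a, b) in cites else 1.0 - p)
    return total


def class_posterior(doc, candidates, doc_class, cites, p_exists):
    """P(doc.class | other classes, links), assuming a uniform prior over candidates."""
    scores = {c: link_log_likelihood({**doc_class, doc: c}, cites, p_exists)
              for c in candidates}
    top = max(scores.values())  # subtract the max before exponentiating for stability
    z = sum(math.exp(s - top) for s in scores.values())
    return {c: math.exp(s - top) / z for c, s in scores.items()}


if __name__ == "__main__":
    # Made-up CPT: links are far more likely between documents of the same class.
    p_exists = {("T", "T"): 0.5, ("T", "L"): 0.05, ("L", "T"): 0.05, ("L", "L"): 0.5}
    doc_class = {"a": "T", "b": "T", "x": "T"}  # x's class is re-inferred below
    cites = {("x", "a"), ("x", "b")}
    print(class_posterior("x", ["T", "L"], doc_class, cites, p_exists))
```

With these numbers, x, which cites two T documents and is cited by neither, ends up with a posterior of about 0.965 on T.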
Unrolled Network
[Figure: three Document objects, each with class and words attributes, connected by Cites objects; each Cites object has its own Exists variable]
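The unrolled network can be written down mechanically. This sketch assumes that Doc.words depends only on Doc.class and that each Cites object's Exists variable has the classes of its two endpoints as parents; these dependencies and the helper names are illustrative, not taken from the slides. Creating a Cites object for every ordered pair makes the number of Exists nodes grow as n(n - 1) for n documents.

```python
from itertools import permutations


def unroll(doc_ids):
    """Map each node of an unrolled link-uncertainty network to its parent nodes.

    Assumed dependencies (for illustration only):
      Doc.words <- Doc.class
      Cites(a, b).Exists <- a.class, b.class   for every ordered pair a != b
    """
    parents = {}
    for d in doc_ids:
        parents[f"{d}.class"] = []
        parents[f"{d}.words"] = [f"{d}.class"]
    for a, b in permutations(doc_ids, 2):
        parents[f"Cites({a},{b}).Exists"] = [f"{a}.class", f"{b}.class"]
    return parents


def network_size(parents):
    """Return (number of nodes, number of edges) of the unrolled network."""
    return len(parents), sum(len(ps) for ps in parents.values())


if __name__ == "__main__":
    for n in (3, 10, 100):
        print(n, network_size(unroll([f"d{i}" for i in range(n)])))
```

For 3, 10, and 100 documents this gives (12, 15), (110, 190), and (10100, 19900) nodes and edges.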
Getoor’s Diagram
- Entity classes (Paper)
- Relation classes (Cites)
- Technically, every instance has an Exists variable, which is true for all Entity instances
Semantics
- P is the basic CPT
- P* is the equivalent unrolled CPT
- Require that an object does not exist if any of the objects it points to does not exist
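The existence requirement can be expressed by wrapping P. This is a sketch under assumptions: P is modeled as a Python callable returning P(Exists = true | parents), and the hypothetical make_unrolled_cpt adds the referenced objects' Exists values as extra inputs. P* matches P when every referenced object exists and puts zero probability on existence otherwise.

```python
from typing import Callable, Mapping, Sequence


def make_unrolled_cpt(p: Callable[[Mapping[str, str]], float]):
    """Wrap a basic CPT P for an Exists variable into the unrolled CPT P*.

    p(parent_values) returns P(Exists = true | parent_values)  (hypothetical interface).
    P* takes the Exists values of the referenced objects as extra parents and
    forces Exists = false whenever any of them is false; otherwise it agrees with P.
    """
    def p_star(parent_values: Mapping[str, str], referenced_exist: Sequence[bool]) -> float:
        if not all(referenced_exist):
            return 0.0  # cannot exist if an object it points to does not exist
        return p(parent_values)
    return p_star


def base_cpt(pv: Mapping[str, str]) -> float:
    """Made-up basic CPT: same-class citations are more likely."""
    return 0.3 if pv["citing.class"] == pv["cited.class"] else 0.01


if __name__ == "__main__":
    p_star = make_unrolled_cpt(base_cpt)
    same = {"citing.class": "Theory", "cited.class": "Theory"}
    print(p_star(same, [True, True]))   # 0.3
    print(p_star(same, [True, False]))  # 0.0
```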
WebKB Network
Experimental Results
- Cora and WebKB
WebKB with various features
A Second Approach: Reference Uncertainty
- Treat reference attributes as random variables
  - Each reference attribute takes as its value an object of the indicated class
- Citation
  - Citing: reference attribute; its value is a Paper
  - Cited: reference attribute; its value is a Paper
Problems
- How many Citation objects exist? Consequently, how many reference random variables exist?
- How do we represent P(Citation.cites | …)? Citation.cites could take on thousands of possible values, leading to:
  - a huge conditional probability table
  - costly inference at run time
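The table-size problem is simple arithmetic. Assuming a plain tabular CPT, each parent configuration needs a full distribution over all candidate Papers, so the parameter count grows linearly with the number of Papers. The sizes and the helper name below are hypothetical.

```python
def full_cpt_params(num_parent_configs: int, num_papers: int) -> int:
    """Free parameters of a tabular CPT P(Citation.cites | parents).

    Each parent configuration needs a distribution over every candidate Paper:
    num_papers - 1 free numbers, since the last is fixed by normalization.
    """
    return num_parent_configs * (num_papers - 1)


if __name__ == "__main__":
    # Hypothetical sizes: 7 citing-paper topics, 3000 candidate Papers.
    print(full_cpt_params(7, 3000))  # 20993
```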
Solutions
Problem 1: How many citations?
- Fix the number of Citation objects
- This gives the “object skeleton”
Problem 2: Too many potential values for a reference attribute
- Attach to each reference attribute a set of partition attributes
  - The reference attribute chooses a partition
  - A Paper is then chosen uniformly at random from the partition
[Figure: a Citation object whose Citing and Cited references point into a set of Paper objects partitioned by topic into Theory, Learning, and Graphics]
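The two-stage choice described above can be sketched as follows, assuming the partition attribute is a paper's topic and using made-up Papers and partition probabilities; the function names are hypothetical. Because the Paper is uniform within its partition, P(reference = p) is the probability of p's partition divided by that partition's size, so the reference needs one probability per partition rather than one per Paper.

```python
import random
from collections import defaultdict


def build_partitions(paper_topic):
    """Group Paper ids by their partition attribute (here, a topic label)."""
    parts = defaultdict(list)
    for paper, topic in sorted(paper_topic.items()):
        parts[topic].append(paper)
    return dict(parts)


def sample_reference(partition_dist, parts, rng):
    """Two-stage choice: draw a partition, then a Paper uniformly from that partition."""
    names = sorted(partition_dist)
    topic = rng.choices(names, weights=[partition_dist[t] for t in names])[0]
    return rng.choice(parts[topic])


def reference_prob(paper, paper_topic, partition_dist, parts):
    """P(reference = paper) = P(paper's partition) / (number of Papers in it)."""
    topic = paper_topic[paper]
    return partition_dist.get(topic, 0.0) / len(parts[topic])


if __name__ == "__main__":
    # Made-up Papers and partition distribution.
    paper_topic = {"p1": "Theory", "p2": "Theory", "p3": "Learning",
                   "p4": "Graphics", "p5": "Graphics", "p6": "Graphics"}
    dist = {"Theory": 0.5, "Learning": 0.3, "Graphics": 0.2}
    parts = build_partitions(paper_topic)
    print(reference_prob("p1", paper_topic, dist, parts))  # 0.25
    print(sample_reference(dist, parts, random.Random(0)))
```

Here p1 gets 0.5 / 2 = 0.25, and the probabilities over all six Papers sum to 1. A partition with no Papers would need special handling, which this sketch omits.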
Representing Constraints Between Citing and Cited Papers
Parents(Cites.Cited) = {Cites.Citing.Topic}
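With Parents(Cites.Cited) = {Cites.Citing.Topic}, the partition chosen for the cited Paper is conditioned on the citing Paper's topic. A simple way to estimate that conditional distribution from observed citations is to count topic pairs. The sketch below does maximum-likelihood counting without smoothing and assumes the Cited reference is partitioned by Topic; the data and the function name are made up for illustration.

```python
from collections import Counter, defaultdict


def fit_selector_cpt(citations, paper_topic):
    """Maximum-likelihood estimate of P(partition of Cited = t | Cites.Citing.Topic = s).

    citations: list of (citing_id, cited_id) pairs (toy data)
    paper_topic: dict paper id -> topic; the topic is both the parent value
    and the partition attribute used for the Cited reference (assumed).
    No smoothing is applied, so unseen (s, t) combinations get probability 0.
    """
    counts = defaultdict(Counter)
    for citing, cited in citations:
        counts[paper_topic[citing]][paper_topic[cited]] += 1
    return {s: {t: n / sum(row.values()) for t, n in row.items()}
            for s, row in counts.items()}


if __name__ == "__main__":
    paper_topic = {"a": "Theory", "b": "Theory", "c": "Learning", "d": "Learning"}
    citations = [("a", "b"), ("a", "c"), ("b", "a"), ("c", "d")]
    print(fit_selector_cpt(citations, paper_topic))
```

Combined with the uniform choice within a partition, P(Cited = p | Citing.Topic = s) would be the estimated probability of p's topic given s, divided by the number of Papers with that topic.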
Details
- Each reference attribute has a selector attribute S that chooses the partition.
[Figure: a Citation object with selector attributes S_citing and S_cited; S_citing chooses the partition for Citing and S_cited the partition for Cited, among Paper objects partitioned into Theory, Learning, and Graphics]
Class-level Dependency Graph
- Five types of edges
  - Type I: edges within a single object
  - Type II: edges between objects
  - Type III: edges from every reference attribute along a reference path to the attribute that depends on that path
  - Type IV: edges from every partition attribute to the selector attributes that use those partition attributes to choose a partition
  - Type V: edges from selector attributes to their corresponding reference attributes
Movie Theater Example
- Type I: Genre → Popularity
- Type II: Shows.Movie.Genre → Shows.Profit; Shows.Theater.Type → S_Movie
- Type III: Movie → Profit; Theater → S_Movie
- Type IV: Genre → S_Movie
- Type V: S_Theater → Theater; S_Movie → Movie
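The five edge types can be derived mechanically from a schema description. The sketch below encodes the Movie Theater example as hypothetical Python dictionaries (reference attributes, selectors, partition attributes, and parent slot chains) and reproduces the edges listed above. No partition attribute is given for Theater, so S_Theater gets no Type IV edge. The acyclicity check, using graphlib from the Python 3.9+ standard library, is an added sanity check, not something the slides state.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical encoding of the Movie Theater schema.
REFS = {("Shows", "Movie"): "Movie", ("Shows", "Theater"): "Theater"}  # reference attr -> target class
SELECTORS = {("Shows", "Movie"): "S_Movie", ("Shows", "Theater"): "S_Theater"}
PARTITIONS = {("Shows", "Movie"): ["Genre"]}  # partition attributes of the target class
# child (class, attr) -> parents given as (start class, slot chain, attr)
DEPS = {
    ("Movie", "Popularity"): [("Movie", [], "Genre")],
    ("Shows", "Profit"): [("Shows", ["Movie"], "Genre")],
    ("Shows", "S_Movie"): [("Shows", ["Theater"], "Type")],
}


def dependency_edges():
    """Return the set of (source, target, type) edges of the class-level graph."""
    edges = set()
    for (cls, attr), parent_specs in DEPS.items():
        child = f"{cls}.{attr}"
        for start, chain, pattr in parent_specs:
            cur = start
            for ref in chain:
                edges.add((f"{cur}.{ref}", child, "III"))  # reference attr on the path
                cur = REFS[(cur, ref)]
            edges.add((f"{cur}.{pattr}", child, "II" if chain else "I"))
    for (owner, ref), sel in SELECTORS.items():
        target = REFS[(owner, ref)]
        for part in PARTITIONS.get((owner, ref), []):
            edges.add((f"{target}.{part}", f"{owner}.{sel}", "IV"))
        edges.add((f"{owner}.{sel}", f"{owner}.{ref}", "V"))
    return edges


def is_acyclic(edges):
    """Check the graph for cycles with a topological sort."""
    preds = {}
    for src, dst, _ in edges:
        preds.setdefault(dst, set()).add(src)
        preds.setdefault(src, set())
    try:
        list(TopologicalSorter(preds).static_order())
        return True
    except CycleError:
        return False


if __name__ == "__main__":
    for edge in sorted(dependency_edges(), key=lambda e: (e[2], e[0], e[1])):
        print(edge)
    print("acyclic:", is_acyclic(dependency_edges()))
```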
Unrolled Graph?
- The unrolled graph can have a huge number of edges
- Are learning and inference really feasible?
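A rough count shows where the edges come from. Assume, as in the Movie Theater example, that Shows.Profit depends on Shows.Movie.Genre and that Shows.Movie is uncertain. In the unrolled network each Profit node then needs an edge from its Movie reference variable and from the Genre of every Movie it might point to. The sizes and the helper name below are hypothetical.

```python
def profit_parent_edges(num_shows: int, num_movies: int) -> int:
    """Edges into all Shows.Profit nodes of an unrolled network (rough sketch).

    Assumes Shows.Profit depends on Shows.Movie.Genre and Shows.Movie is uncertain,
    so each Profit node needs an edge from the Movie reference variable and from
    the Genre of every Movie it might point to.
    """
    return num_shows * (num_movies + 1)


if __name__ == "__main__":
    # Hypothetical sizes: 10,000 Shows objects and 2,000 candidate Movies.
    print(profit_parent_edges(10_000, 2_000))  # 20010000
```

Under these assumptions, 10,000 Shows and 2,000 Movies already give about 20 million edges into the Profit nodes alone.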
Homework Exercise
- Construct the dependency graph for the citation example
- Construct an unrolled network for a reference uncertainty example