Oregon State University – CS539 PRMs
Learning Probabilistic Models of Link Structure
Getoor, Friedman, Koller, Taskar
Example Application: WebKB
Classify each web page as course, student, professor, project, or none, using:
Words on the web page
Links from other web pages (and the class of those pages, recursively)
Words in the "anchor text" of links on other pages: <a href="url">anchor text</a>
Web pages obtained from Cornell, Texas, Washington, and Wisconsin
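The anchor-text feature can be made concrete with a small sketch. It assumes pages are available as raw HTML strings and uses Python's standard-library `html.parser`; the helper names `AnchorTextParser` and `anchor_words` and the example URL are hypothetical. Each link's anchor words are filed under the link's target URL, because they describe the page being linked to, not the page they appear on.

```python
from html.parser import HTMLParser


class AnchorTextParser(HTMLParser):
    """Collects (href, anchor text) pairs from one HTML page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []      # list of (href, anchor_text) tuples
        self._href = None    # href of the <a> tag currently open, if any
        self._chunks = []    # text pieces seen inside the current <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._chunks = []

    def handle_data(self, data):
        if self._href is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalize whitespace inside the anchor text
            text = " ".join("".join(self._chunks).split())
            self.links.append((self._href, text))
            self._href = None


def anchor_words(html):
    """Return {target_url: [lower-cased anchor words]} for every link on the page."""
    parser = AnchorTextParser()
    parser.feed(html)
    parser.close()
    words = {}
    for href, text in parser.links:
        words.setdefault(href, []).extend(text.lower().split())
    return words


page = '<p>See <a href="http://cs.example.edu/cs539">CS 539 course page</a>.</p>'
print(anchor_words(page))  # {'http://cs.example.edu/cs539': ['cs', '539', 'course', 'page']}
```

Aggregating these dictionaries over all crawled pages gives, for each target page, the bag of words other pages use when linking to it.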
Example Application: CORA
Classify documents by topic (one of 7 topics) using:
Words in the document
Papers cited by the document
Papers citing the document
Standard PRM
parents(Doc.class) = {MODE(Doc.citers.class), MODE(Doc.cited.class)}
[Figure: a Document (class, words) whose class depends, through two MODE nodes, on the classes of the Documents that cite it (citers) and of the Documents it cites (cited).]
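The parent set above can be sketched in a few lines of Python, assuming the fixed citation structure is given as (citing, cited) pairs and the neighbors' classes are known. `mode`, `class_parents`, and the toy corpus are hypothetical, and the tie-breaking rule is an assumption, since the slide does not specify one.

```python
from collections import Counter


def mode(values):
    """MODE aggregate: most frequent value, or None when there are no values.

    Ties go to the value seen first (Counter.most_common keeps
    first-encountered order among equal counts).
    """
    if not values:
        return None
    return Counter(values).most_common(1)[0][0]


def class_parents(doc, cites, labels):
    """Values of the two parents of doc.class in the standard PRM.

    cites: list of (citing_doc, cited_doc) pairs, the fixed citation structure
    labels: dict mapping each document to its class
    """
    citers = [labels[src] for src, dst in cites if dst == doc]
    cited = [labels[dst] for src, dst in cites if src == doc]
    return mode(citers), mode(cited)


# Hypothetical toy corpus
labels = {"d1": "theory", "d2": "learning", "d3": "learning", "d4": "theory"}
cites = [("d2", "d1"), ("d3", "d1"), ("d1", "d4")]

print(class_parents("d1", cites, labels))  # ('learning', 'theory')
print(class_parents("d4", cites, labels))  # ('theory', None)
```

The returned pair indexes one row of the CPT for Doc.class, so the table size depends only on the number of classes, not on how many papers a document cites or is cited by.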
Problem: The Citation Structure Is Fixed
The existence (or non-existence) of a link cannot serve as evidence.
Individually linked papers influence the class only through the MODE.
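To see why the MODE hides individual links, the sketch below compares three hypothetical documents whose citing papers differ in number and class mix. The standard PRM gives all of them the same parent value, and therefore the same CPT row. The `mode` helper and the neighborhoods are illustrative assumptions.

```python
from collections import Counter


def mode(values):
    # Most frequent value; None if the document has no such neighbors
    return Counter(values).most_common(1)[0][0] if values else None


# Classes of the papers citing three hypothetical documents
neighborhoods = {
    "one_citer": ["theory"],
    "many_citers": ["theory", "theory", "theory", "learning", "learning"],
    "one_link_removed": ["theory", "theory", "theory", "learning"],
}

parent_values = {name: mode(classes) for name, classes in neighborhoods.items()}
print(parent_values)
# {'one_citer': 'theory', 'many_citers': 'theory', 'one_link_removed': 'theory'}
```

Adding or removing the "learning" citation leaves the parent value unchanged, which is the sense in which a single link's presence or absence carries no evidence in this model.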
Possible Solution: Link Uncertainty
Model the existence of links as random variables
Create a Link instance for each pair of possibly linked objects
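A minimal sketch of the link-uncertainty construction, assuming links are directed citations and self-citations are excluded: one Exists variable is created per ordered pair of documents, and pairs missing from the observed citation list become `False` observations rather than being dropped. `link_variables` and the document names are hypothetical.

```python
from itertools import permutations


def link_variables(docs, observed_cites):
    """Map every ordered (citing, cited) pair of distinct docs to its Exists value.

    Pairs listed in observed_cites are True; all other pairs are False.
    In the link-uncertainty model both kinds of value are evidence.
    """
    observed = set(observed_cites)
    return {pair: pair in observed for pair in permutations(docs, 2)}


docs = ["d1", "d2", "d3"]
exists = link_variables(docs, [("d2", "d1")])
print(len(exists))           # 6 = 3 * 2 ordered pairs
print(exists[("d2", "d1")])  # True
print(exists[("d1", "d2")])  # False
```

The number of variables grows as n(n-1): 1,000 documents already give 999,000 Exists variables.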
Unrolled Network
[Figure: three Documents, each with class and words attributes, connected pairwise by Cites objects, each carrying an Exists variable.]
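The payoff of the unrolled network is that an observed Exists value, true or false, becomes evidence about the documents' classes. The sketch assumes, for illustration, that each Exists variable has the classes of the citing and cited documents as parents, and uses a hypothetical two-class CPT in which same-topic citations are ten times as likely. Bayes' rule then shows that a link's absence also shifts the belief about the cited document's class. All names and numbers are hypothetical.

```python
# Hypothetical two-topic example: d1's class is observed, d2's class is unknown
CLASSES = ["theory", "learning"]
PRIOR = {"theory": 0.5, "learning": 0.5}  # P(d2.class)

# Hypothetical CPT P(Cites.Exists = true | citing.class, cited.class)
P_EXISTS = {
    ("theory", "theory"): 0.10,
    ("theory", "learning"): 0.01,
    ("learning", "theory"): 0.01,
    ("learning", "learning"): 0.10,
}


def posterior_cited_class(citing_class, exists):
    """P(d2.class | d1.class = citing_class, Exists(d1 -> d2) = exists) by Bayes' rule."""
    joint = {}
    for c in CLASSES:
        p = P_EXISTS[(citing_class, c)]
        joint[c] = PRIOR[c] * (p if exists else 1.0 - p)
    total = sum(joint.values())
    return {c: v / total for c, v in joint.items()}


print(posterior_cited_class("theory", exists=True))   # theory ~0.909, learning ~0.091
print(posterior_cited_class("theory", exists=False))  # theory ~0.476, learning ~0.524
```

The effect of a single absent link is small, but across many document pairs these observations add up.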
Getoor’s Diagram
Entity classes (Paper)
Relation classes (Cites)
Technically, every instance has an Exists variable, which is always true for Entity instances.
Semantics
P is the basic CPT
P* is the equivalent CPT in the unrolled network
Require that an object does not exist if any of the objects it points to does not exist
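One way to read the existence constraint is as a wrapper that turns the basic CPT P into the unrolled CPT P*: if any referenced object does not exist, the object's Exists variable is forced to false; otherwise P applies unchanged. The sketch assumes P is given as a Python function of the parent values; `make_p_star` and the example CPT `p_cites` are hypothetical.

```python
from typing import Callable, Sequence, Tuple

ParentValues = Tuple[str, ...]


def make_p_star(p: Callable[[ParentValues], float]):
    """Build P*(X.Exists = true | parents, existence of referenced objects) from P."""

    def p_star(parent_values: ParentValues, referenced_exist: Sequence[bool]) -> float:
        if not all(referenced_exist):
            return 0.0           # cannot exist if something it points to does not
        return p(parent_values)  # otherwise defer to the basic CPT

    return p_star


# Hypothetical basic CPT for Cites.Exists given (citing.class, cited.class)
def p_cites(parent_values: ParentValues) -> float:
    citing_class, cited_class = parent_values
    return 0.10 if citing_class == cited_class else 0.01


p_star = make_p_star(p_cites)
print(p_star(("theory", "theory"), [True, True]))   # 0.1
print(p_star(("theory", "theory"), [True, False]))  # 0.0
```

For a Cites object between two Paper entities the referenced objects always exist, so P* and P coincide; the constraint only matters when an object points to objects whose own existence is uncertain.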
WebKB Network
Experimental Results
Cora and WebKB
WebKB with various features
A Second Approach: Reference Uncertainty
Treat reference attributes as random variables
Each reference attribute takes as its value an object of the indicated class
Citation
Citing: reference attribute, value is a Paper
Cited: reference attribute, value is a Paper
Problems
How many Citation objects exist?
Consequently, how many reference random variables exist?
How do we represent P(Citation.cites | …)? Citation.cites could take on thousands of possible values:
Huge conditional probability table
Costly inference at run time
Solutions
Problem 1: How many citations?
Fix the number of Citation objects
This gives the “object skeleton”
Problem 2: Too many potential values for a reference attribute
Attach to each reference attribute a set of partition attributes
The reference attribute chooses a partition
A Paper is then chosen uniformly at random from the partition
[Figure: the Citing and Cited reference attributes of a Citation point into the set of Papers, which is split into Theory, Learning, and Graphics partitions.]
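Under this scheme the probability that a reference attribute takes a particular Paper p as its value is P(partition of p) divided by the number of Papers in that partition. A minimal sketch, assuming a single partition attribute (topic) and a hypothetical distribution over partitions; `p_reference` and the data are hypothetical.

```python
from collections import Counter

# Hypothetical papers with their partition attribute (topic)
PAPER_TOPIC = {
    "p1": "theory", "p2": "theory",
    "p3": "learning", "p4": "learning", "p5": "learning",
    "p6": "graphics",
}
# Hypothetical distribution over partitions for the reference attribute
P_PARTITION = {"theory": 0.5, "learning": 0.3, "graphics": 0.2}


def p_reference(paper, paper_topic, p_partition):
    """P(reference = paper) = P(paper's partition) / size of that partition."""
    sizes = Counter(paper_topic.values())
    part = paper_topic[paper]
    return p_partition[part] / sizes[part]


dist = {p: p_reference(p, PAPER_TOPIC, P_PARTITION) for p in PAPER_TOPIC}
print(dist["p1"], dist["p6"])          # 0.25 0.2
print(round(sum(dist.values()), 10))   # 1.0
```

The distribution over partitions needs one number per partition (three here) for each parent configuration, instead of one per Paper, which is what keeps the CPT small when there are thousands of papers.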
Representing Constraints Between Citing and Cited Papers
Parents(Cites.Cited) = {Cites.Citing.Topic}
Details
Each reference attribute has a selector attribute S that chooses the partition.
[Figure: the Citation object carries selector attributes S_citing and S_cited, which choose the partition (Theory, Learning, or Graphics) from which Citing and Cited are drawn.]
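Putting the pieces together gives a generative sketch. The object skeleton fixes the number of Citation objects. For each one, the selector S_citing picks the citing paper's partition, the selector S_cited picks a partition conditioned on Citing.Topic (the dependency Parents(Cites.Cited) = {Cites.Citing.Topic}), and a Paper is drawn uniformly from each chosen partition. All probabilities, paper names, and the functions `draw` and `sample_citations` are hypothetical. S_citing is assumed to have no parents, and self-citations are not excluded.

```python
import random
from collections import defaultdict

PAPER_TOPIC = {
    "p1": "theory", "p2": "theory",
    "p3": "learning", "p4": "learning", "p5": "learning",
    "p6": "graphics",
}
# Hypothetical P(S_citing): no parents
P_S_CITING = {"theory": 0.3, "learning": 0.5, "graphics": 0.2}
# Hypothetical P(S_cited | Citing.Topic): citations mostly stay within a topic
P_S_CITED = {
    "theory": {"theory": 0.7, "learning": 0.2, "graphics": 0.1},
    "learning": {"theory": 0.2, "learning": 0.7, "graphics": 0.1},
    "graphics": {"theory": 0.1, "learning": 0.2, "graphics": 0.7},
}


def draw(dist, rng):
    """Sample a key of dist with probability proportional to its value."""
    return rng.choices(list(dist), weights=list(dist.values()))[0]


def sample_citations(num_citations, paper_topic, p_s_citing, p_s_cited, rng):
    # Group papers into partitions by their topic (the partition attribute)
    partitions = defaultdict(list)
    for paper, topic in sorted(paper_topic.items()):
        partitions[topic].append(paper)

    citations = []
    for _ in range(num_citations):  # object skeleton: fixed number of Citations
        s_citing = draw(p_s_citing, rng)
        citing = rng.choice(partitions[s_citing])
        s_cited = draw(p_s_cited[paper_topic[citing]], rng)  # depends on Citing.Topic
        cited = rng.choice(partitions[s_cited])
        citations.append((citing, cited))
    return citations


print(sample_citations(5, PAPER_TOPIC, P_S_CITING, P_S_CITED, random.Random(0)))
```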
Class-level Dependency Graph
Five types of edges
Type I: edges within a single object
Type II: edges between objects
Type III: edges from every reference attribute along a parent's reference path to the dependent attribute
Type IV: edges from every partition attribute to the selector attributes that use it to choose a partition
Type V: edges from selector attributes to their corresponding reference attributes
Movie Theater Example
Type I: Genre → Popularity
Type II: Shows.Movie.Genre → Shows.Profit; Shows.Theater.Type → S_Movie
Type III: Movie → Profit; Theater → S_Movie
Type IV: Genre → S_Movie
Type V: S_Theater → Theater; S_Movie → Movie
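The class-level dependency graph for the movie example can be written down as edge lists and checked mechanically. In the PRM literature, acyclicity of this graph is the usual sufficient condition for the model to define a coherent distribution. The sketch below assumes Python 3.9+ for the standard `graphlib` module. It uses hypothetical dotted node names, builds the graph from the five edge types, and topologically sorts it.

```python
from graphlib import CycleError, TopologicalSorter

# Movie theater example; each edge is (parent, child)
EDGES = {
    "I": [("Movie.Genre", "Movie.Popularity")],
    "II": [("Movie.Genre", "Shows.Profit"), ("Theater.Type", "Shows.S_Movie")],
    "III": [("Shows.Movie", "Shows.Profit"), ("Shows.Theater", "Shows.S_Movie")],
    "IV": [("Movie.Genre", "Shows.S_Movie")],
    "V": [("Shows.S_Theater", "Shows.Theater"), ("Shows.S_Movie", "Shows.Movie")],
}


def topological_order(edges_by_type):
    """Return a topological order of the class-level graph, or None if it has a cycle."""
    predecessors = {}
    for edges in edges_by_type.values():
        for parent, child in edges:
            predecessors.setdefault(parent, set())
            predecessors.setdefault(child, set()).add(parent)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None


order = topological_order(EDGES)
print(order is not None)  # True: the graph is acyclic
```

The order also gives a valid sequence for sampling: selectors come before the reference attributes they control, and those come before attributes that depend on the referenced objects.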
Unrolled Graph?
The unrolled graph can have a huge number of edges.
Are learning and inference really feasible?
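A rough count shows where the edges come from. Assume, for illustration, that in the unrolled graph each reference attribute depends on the partition attribute of every object it could point to, since which objects fall in the chosen partition depends on all of their partition attributes. Edges then grow as (number of reference attributes) × (number of candidate objects). The function `reference_edges` and the corpus sizes are hypothetical.

```python
def reference_edges(num_citations, num_papers, refs_per_citation=2, partition_attrs=1):
    """Edges into reference attributes of the unrolled graph, assuming every
    reference attribute has every candidate Paper's partition attributes as parents."""
    return num_citations * refs_per_citation * num_papers * partition_attrs


# Hypothetical corpus: 10,000 citations among 5,000 papers
print(reference_edges(10_000, 5_000))  # 100000000
```

At a hundred million edges for a modest corpus, working directly with the full unrolled graph is clearly expensive, which is the concern the slide raises.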
Homework Exercise
Construct the dependency graph for the citation example
Construct an unrolled network for a reference-uncertainty example