CCMs - University of Illinois at Urbana–Champaign


Introductory Notes about Constrained Conditional Models and ILP for NLP
Dan Roth
Department of Computer Science
University of Illinois at Urbana-Champaign
CS546-11 Notes
Page 1
Goal: Learning and Inference
 Global decisions in which several local decisions play a role, but there are mutual dependencies on their outcome.
 E.g., Structured Output Problems – multiple dependent output variables
 (Learned) models/classifiers for different sub-problems
 In some cases, not all local models can be learned simultaneously
 Engineering issues:
   We don't have data annotated with all aspects of the problem
   Distributed development
 In these cases, constraints may appear only at evaluation time
 In other cases, we may prefer to learn independent models
 Incorporate the models' information, along with prior knowledge/constraints, in making coherent decisions – decisions that respect the local models as well as domain- and context-specific knowledge/constraints.
Page 2
ILP & Constrained Conditional Models (CCMs)
 Making global decisions in which several local interdependent decisions play a role.
 Informally:
   Everything that has to do with constraints (and learning models)
 Formally: we typically make decisions based on models such as:
   Argmax_y  w^T φ(x, y)
 CCMs (specifically, ILP formulations) make decisions based on models such as:
   Argmax_y  w^T φ(x, y) − Σ_{c ∈ C} ρ_c d(y, 1_C)
 CCMs make predictions in the presence of / guided by constraints
Issues to attend to:
 While we formulate the problem as an ILP problem, inference can be done in multiple ways: SAT; ILP; search; sampling; dynamic programming
 The focus is on joint global inference; learning may or may not be joint
 Decomposing models is often beneficial
 We do not define the learning method, but we'll discuss it and make suggestions
Page 3
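To make the CCM decision rule above concrete, here is a minimal sketch that maximizes w^T φ(x, y) minus a weighted constraint penalty by exhaustive enumeration. All specifics (three binary output variables, the weight values, the single implication constraint, and ρ) are illustrative assumptions, not values from the notes; real CCM inference would use ILP or another of the methods listed.

```python
from itertools import product

# Hypothetical toy instance: 3 binary output variables.
w = [2.0, -1.0, 0.5]   # assumed learned weights
rho = 1.5              # assumed penalty weight for the single constraint

def model_score(y):
    # w^T phi(x, y): here phi simply exposes each output bit
    return sum(wi * yi for wi, yi in zip(w, y))

def violation(y):
    # d(y, 1_C): degree to which y violates the (illustrative) constraint
    # "y[0] = 1 implies y[2] = 1"
    return 1 if y[0] == 1 and y[2] == 0 else 0

def ccm_decision():
    # argmax_y  w^T phi(x,y) - rho * d(y, 1_C), by brute-force search
    return max(product([0, 1], repeat=3),
               key=lambda y: model_score(y) - rho * violation(y))

print(ccm_decision())  # -> (1, 0, 1): turning y[2] on avoids the penalty
```

Note that with a hard version of the constraint (ρ → ∞) infeasible assignments are simply excluded; the soft version lets a high enough model score override the constraint.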
The Space of Problems
 Examples: New Applications
 How to solve? [Inference]: An Integer Linear Program; exact (ILP packages) or approximate solutions
 How to train? [Learning]: Training is learning the objective function [a lot of work on this]
 Decouple? Joint Learning vs. Joint Inference
 Difficulty of Annotating Data: Indirect Supervision; Semi-supervised Learning; Constraint Driven Learning
Page 4
Introductory Examples:
Constrained Conditional Models (aka ILP for NLP)
 CCMs can be viewed as a general interface to easily combine domain knowledge with data-driven statistical models
 Formulate NLP problems as ILP problems (inference may be done otherwise)
   1. Sequence tagging (HMM/CRF + global constraints)
   2. Sentence Compression (Language Model + global constraints)
   3. SRL (independent classifiers + global constraints)
 Sequential prediction, HMM/CRF based: Argmax Σ λ_ij x_ij
 Sentence Compression/Summarization, Language Model based: Argmax Σ λ_ijk x_ijk
 Linguistic constraints:
   Cannot have both A states and B states in an output sequence.
   If a modifier is chosen, include its head.
   If a verb is chosen, include its arguments.
Page 5
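The sequence-tagging example above can be sketched in a few lines: maximize a sum of transition scores λ_ij subject to the slide's global constraint that A states and B states never co-occur in an output sequence. The state set and score values are made up for illustration, and exhaustive search stands in for constrained Viterbi or an ILP solver.

```python
from itertools import product

# Hypothetical states and additive transition scores (stand-ins for
# learned HMM/CRF potentials lambda_ij).
STATES = ["A", "B", "O"]
trans = {("A", "A"): 1.0, ("A", "B"): 2.0, ("B", "A"): 2.0, ("B", "B"): 1.0,
         ("A", "O"): 0.5, ("O", "A"): 0.5, ("B", "O"): 0.6, ("O", "B"): 0.6,
         ("O", "O"): 0.2}

def score(seq):
    # sum of lambda_ij over adjacent tag pairs (the active x_ij indicators)
    return sum(trans[pair] for pair in zip(seq, seq[1:]))

def satisfies(seq):
    # Global constraint from the slide: no sequence with both A and B states
    return not ("A" in seq and "B" in seq)

def constrained_argmax(n):
    # exhaustive stand-in for ILP / constrained-Viterbi inference
    feasible = (s for s in product(STATES, repeat=n) if satisfies(s))
    return max(feasible, key=score)

print(constrained_argmax(3))
```

With these scores the unconstrained optimum alternates A and B; the constraint forces inference to settle for the best single-state run instead, which is exactly the kind of model/knowledge interaction CCMs are about.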
Next Few Meetings (I)
 How to pose the inference problem
   Introduction to ILP
   Posing NLP problems as ILP problems
 Detailed examples
   1. Sequence tagging (HMM/CRF + global constraints)
   2. SRL (independent classifiers + global constraints)
   3. Sentence Compression (Language Model + global constraints)
   1. Co-reference
   2. A bunch more ...
 Inference Algorithms (ILP & Search)
   Compiling knowledge to linear inequalities
   Inference algorithms
Page 6
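"Compiling knowledge to linear inequalities" refers to translating Boolean statements over 0/1 indicator variables into linear constraints an ILP solver can use. A small sketch of three standard translations (the function names are mine; the constraints echo the examples on the earlier slide):

```python
# Boolean knowledge compiled into 0/1 linear inequalities over indicators.

def implies(x_mod, x_head):
    # "If a modifier is chosen, include its head": x_mod -> x_head
    # compiles to the linear inequality  x_mod - x_head <= 0
    return x_mod - x_head <= 0

def mutually_exclusive(x_a, x_b):
    # "Cannot have both A and B": not(x_a and x_b)
    # compiles to  x_a + x_b <= 1
    return x_a + x_b <= 1

def exactly_one(xs):
    # unique-label / one-hot constraint: sum_i x_i = 1
    return sum(xs) == 1

# The linear forms agree with the Boolean semantics on all 0/1 inputs:
assert implies(1, 1) and not implies(1, 0)
assert mutually_exclusive(1, 0) and not mutually_exclusive(1, 1)
assert exactly_one([0, 1, 0]) and not exactly_one([1, 1, 0])
```

In an actual ILP these appear as constraint rows rather than Python checks, but the left-hand sides are identical.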
Next Few Meetings (Part II)
 Training Issues
   Learning models
     Independently of constraints (L+I); jointly with constraints (IBT)
     Decomposed to simpler models
   Learning constraints' penalties
     Independently of learning the model
     Jointly, along with learning the model
   Dealing with lack of supervision
     Constraints Driven Semi-Supervised learning (CODL)
     Indirect Supervision
     Learning Constrained Latent Representations
 Markov Logic Networks
   Relations and Differences
Page 7
Summary: Constrained Conditional Models
[Figure: a Conditional Markov Random Field over output variables y1…y8, shown alongside a Constraints Network over the same variables]

y* = argmax_y Σ_i w_i φ_i(x, y) − Σ_i ρ_i d_C(x, y)

 Linear objective functions
 Often φ(x, y) will be local functions, or φ(x, y) = φ(x)
 Expressive constraints over output variables
 Soft, weighted constraints
 Specified declaratively as FOL formulae
 Clearly, there is a joint probability distribution that represents this mixed model. The key difference from MLNs: MLNs provide a concise definition of a model, but of the whole joint one.
 We would like to:
   Learn a simple model or several simple models
   Make decisions with respect to a complex model: regularize in the posterior rather than in the prior.
Page 8
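The summary leaves the soft-constraint distance d abstract. One common concrete choice, sketched below, is the Hamming distance from the assignment y to the nearest assignment satisfying constraint C; the particular constraint used here is an illustrative FOL-style implication, not one from the notes.

```python
from itertools import product

def hamming(y, z):
    # number of output variables on which y and z disagree
    return sum(a != b for a, b in zip(y, z))

def d(y, constraint):
    # d(y, 1_C): Hamming distance from y to the closest satisfying assignment
    # (brute force over 0/1 assignments; fine for a toy-sized y)
    satisfying = [z for z in product([0, 1], repeat=len(y)) if constraint(z)]
    return min(hamming(y, z) for z in satisfying)

def c(z):
    # illustrative constraint: y0 -> (y1 and y2)
    return (not z[0]) or (z[1] and z[2])

print(d((1, 0, 0), c))  # -> 1: one flip (turn y0 off) satisfies C
```

Plugged into the objective as ρ_i · d, this penalizes assignments in proportion to how far they are from respecting each constraint, rather than rejecting them outright.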