CCMs - University of Illinois at Urbana–Champaign
Introductory Notes
about
Constrained Conditional Models
and
ILP for NLP
Dan Roth
Department of Computer Science
University of Illinois at Urbana-Champaign
Notes
CS546-11
Page 1
Goal: Learning and Inference
Global decisions in which several local decisions play a role but
there are mutual dependencies on their outcome.
E.g. Structured Output Problems – multiple dependent output variables
(Learned) models/classifiers for different sub-problems
In some cases, not all local models can be learned simultaneously
Engineering issues:
We don’t have data annotated with all aspects of the problem
Distributed development
In these cases, constraints may appear only at evaluation time
In other cases, we may prefer to learn independent models
Incorporate models’ information, along with prior
knowledge/constraints, in making coherent decisions
decisions that respect the local models as well as domain & context
specific knowledge/constraints.
Page 2
ILP & Constrained Conditional Models (CCMs)
Making global decisions in which several local interdependent decisions play a role.
Informally:
Everything that has to do with constraints (and learning models)
Formally:
We typically make decisions based on models such as:
  y* = Argmax_y w^T φ(x,y)
CCMs (specifically, ILP formulations) make decisions based on models such as:
  y* = Argmax_y w^T φ(x,y) − Σ_{c ∈ C} ρ_c d(y, 1_C)
Issues to attend to:
While we formulate the problem as an ILP problem, inference can be done in multiple ways: SAT; ILP; search; sampling; dynamic programming
The focus is on joint global inference
Learning may or may not be joint
We do not define the learning method, but we’ll discuss it and make suggestions
Decomposing models is often beneficial
CCMs make predictions in the presence of / guided by constraints
Page 3
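The gap between the two decision rules can be made concrete with a minimal sketch (the weights, feature map, and the single constraint below are illustrative assumptions, not from the slides): the same Argmax is taken with and without the constraint-penalty term ρ_c d(y, 1_C).

```python
# Minimal CCM sketch: compare the unconstrained decision rule
#   argmax_y  w . phi(x, y)
# with the CCM decision rule
#   argmax_y  w . phi(x, y) - rho * d(y, 1_C)
# over a tiny enumerable output space.

from itertools import product

def phi(x, y):
    # Hypothetical feature map: one indicator feature per (token, label) pair.
    return [(xi, yi) for xi, yi in zip(x, y)]

def score(w, x, y):
    return sum(w.get(f, 0.0) for f in phi(x, y))

def violation(y):
    # One hypothetical constraint C: the label "B" may appear at most once.
    # d(y, 1_C) = 1 if y violates C, else 0.
    return 1.0 if y.count("B") > 1 else 0.0

def ccm_argmax(w, x, labels, rho):
    best, best_val = None, float("-inf")
    for y in product(labels, repeat=len(x)):
        val = score(w, x, y) - rho * violation(y)
        if val > best_val:
            best, best_val = y, val
    return best

w = {("t1", "B"): 2.0, ("t2", "B"): 2.0, ("t2", "A"): 1.0}
x = ["t1", "t2"]
print(ccm_argmax(w, x, ["A", "B"], rho=0.0))   # local scores alone -> ('B', 'B')
print(ccm_argmax(w, x, ["A", "B"], rho=10.0))  # penalty active -> ('B', 'A')
```

With rho=0 the rule reduces to the plain linear model; a large rho makes the soft constraint effectively hard, which is the "regularize the decision" reading of CCMs.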
The Space of Problems
How to solve? [Inference]
  An Integer Linear Program
  Exact (ILP packages) or approximate solutions
How to train? [Learning]
  Training is learning the objective function [A lot of work on this]
Decouple?
  Joint Learning vs. Joint Inference
Difficulty of Annotating Data
  Indirect Supervision
  Semi-supervised Learning
  Constraint Driven Learning
Examples
  New Applications
Page 4
Introductory Examples:
Constrained Conditional Models (aka ILP for NLP)
CCMs can be viewed as a general interface to easily combine domain knowledge with data-driven statistical models
Formulate NLP Problems as ILP problems (inference may be done otherwise)
1. Sequence tagging (HMM/CRF + global constraints)
2. Sentence Compression (Language Model + global constraints)
3. SRL (Independent classifiers + global constraints)

Sequential Prediction
HMM/CRF based: Argmax Σ_{ij} λ_ij x_ij
Linguistic constraint: cannot have both A states and B states in an output sequence.

Sentence Compression/Summarization
Language Model based: Argmax Σ_{ijk} λ_ijk x_ijk
Linguistic constraints: if a modifier is chosen, include its head; if a verb is chosen, include its arguments.
Page 5
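The sequential-prediction example can be sketched as follows (the scores and labels are made up; a real system would hand the 0/1 indicators x_ij to an ILP solver rather than enumerate): maximize Σ λ_ij x_ij subject to the global constraint that A states and B states cannot co-occur in the output.

```python
# Brute-force stand-in for the ILP formulation of sequence tagging:
#   maximize sum_ij lam[i][j] * x_ij, x_ij = 1 iff token i takes label j,
# subject to the global constraint "no output contains both A and B".

from itertools import product

LABELS = ["A", "B", "O"]
# lam[i][j]: local HMM/CRF-style score for assigning token i the label
# LABELS[j] (illustrative numbers; these play the role of the lambda_ij
# coefficients on the indicator variables).
lam = [
    [2.0, 0.0, 1.0],
    [0.0, 1.5, 1.0],
    [0.0, 0.0, 0.5],
]

def satisfies(y):
    # Global constraint: cannot have both A states and B states.
    return not ("A" in y and "B" in y)

def constrained_argmax(lam):
    best, best_val = None, float("-inf")
    for idx in product(range(len(LABELS)), repeat=len(lam)):
        y = [LABELS[j] for j in idx]
        if not satisfies(y):
            continue
        val = sum(lam[i][j] for i, j in enumerate(idx))
        if val > best_val:
            best, best_val = y, val
    return best, best_val

print(constrained_argmax(lam))  # the unconstrained optimum [A, B, O] is ruled out
```

The locally best sequence [A, B, O] (score 4.0) violates the constraint, so the constrained argmax falls back to [A, O, O] (score 3.5); an ILP solver would find the same optimum from the linear formulation directly.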
Next Few Meetings: (I)
How to pose the inference problem
Introduction to ILP
Posing NLP Problems as ILP problems
Detailed examples:
1. Sequence tagging (HMM/CRF + global constraints)
2. SRL (Independent classifiers + global constraints)
3. Sentence Compression (Language Model + global constraints)
Also:
1. Co-reference
2. A bunch more ...
Inference Algorithms (ILP & Search)
Compiling knowledge to linear inequalities
Inference algorithms
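"Compiling knowledge to linear inequalities" can be previewed with a small sketch (the constraint choices are illustrative): standard encodings of Boolean statements as linear inequalities over 0/1 variables, verified against the logic by enumerating all assignments.

```python
# Standard Boolean-to-ILP encodings over 0/1 indicator variables:
#   "not (a and b)"          ->  a + b <= 1
#   "a implies b"            ->  a <= b
#   "exactly one of a, b, c" ->  a + b + c == 1
# Enumeration over {0,1}^n checks each inequality matches its logical reading.

from itertools import product

def check_pairwise():
    for a, b in product([0, 1], repeat=2):
        assert (a + b <= 1) == (not (a and b))  # mutual exclusion
        assert (a <= b) == ((not a) or b)       # implication
    return True

def check_exactly_one():
    for a, b, c in product([0, 1], repeat=3):
        lhs = (a + b + c == 1)
        rhs = (a or b or c) and not (a and b) and not (a and c) and not (b and c)
        assert lhs == rhs
    return True

print(check_pairwise() and check_exactly_one())
```

These are the building blocks: once each FOL clause over output indicators is rewritten this way, the whole constraint set becomes rows of an ILP.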
Page 6
Next Few Meetings (Part II)
Training Issues
Learning models
Learning constraints’ penalties
Independently of learning the model
Jointly, along with learning the model
Dealing with lack of supervision
Independently of constraints (L+I); Jointly with constraints (IBT)
Decomposed to simpler models
Constraints Driven Semi-Supervised learning (CODL)
Indirect Supervision
Learning Constrained Latent Representations
Markov Logic Networks
Relations and Differences
Page 7
Summary: Constrained Conditional Models
[Figure: a Conditional Markov Random Field over output variables y1–y8, shown alongside a Constraints Network over the same variables]
y* = argmax_y Σ_i w_i φ_i(x, y) − Σ_i ρ_i d_C(x, y)
Linear objective functions
Often φ(x,y) will be local functions, or φ(x,y) = φ(x)
Expressive constraints over output variables
Soft, weighted constraints
Specified declaratively as FOL formulae
Clearly, there is a joint probability distribution that represents this mixed model. Key difference from MLNs, which provide a concise definition of a model, but the whole joint one.
We would like to:
Learn a simple model or several simple models
Make decisions with respect to a complex model: regularize in the posterior rather than in the prior.
Page 8
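One way to read the d_C(x,y) term in the summary objective is as a distance from y to the nearest constraint-satisfying assignment. The sketch below uses Hamming distance and a made-up BIO-style validity constraint; the slides do not commit to a particular choice of d, so treat both as assumptions.

```python
# Soft-constraint distance d_C as minimal Hamming distance from y to any
# assignment satisfying constraint C (one possible instantiation of the
# d_C(x, y) term in the CCM objective).

from itertools import product

def hamming(y1, y2):
    return sum(a != b for a, b in zip(y1, y2))

def d_C(y, labels, satisfies):
    # Minimal Hamming distance from y to any constraint-satisfying y'.
    return min(
        hamming(y, yp)
        for yp in product(labels, repeat=len(y))
        if satisfies(yp)
    )

def satisfies(y):
    # Hypothetical constraint: "B" may only appear immediately after
    # "A" or "B" (BIO-style validity).
    for i, lab in enumerate(y):
        if lab == "B" and (i == 0 or y[i - 1] not in ("A", "B")):
            return False
    return True

print(d_C(("O", "B", "B"), ("A", "B", "O"), satisfies))  # invalid y -> distance 1
print(d_C(("A", "B", "O"), ("A", "B", "O"), satisfies))  # valid y -> distance 0
```

A graded d_C like this penalizes outputs in proportion to how far they are from legality, rather than all-or-nothing, which is what makes the constraints "soft" in the objective.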