Semantic association in humans and machines


Topic Models for Semantic Memory
and Information Extraction
Mark Steyvers
Department of Cognitive Sciences
University of California, Irvine
Joint work with:
Tom Griffiths, UC Berkeley
Padhraic Smyth, UC Irvine
Dave Newman, UC Irvine
Chaitanya Chemudugunta, UC Irvine
Human Memory vs. Information Retrieval
• Finding relevant memories ↔ finding relevant documents
• Extracting meaning ↔ extracting content
Semantic Representations
• What are suitable representations?
• How can they be learned from experience?
Two approaches to semantic representation

Semantic networks (e.g., Collins & Quillian, 1969)
[Figure: network linking BILL, CASH, LOAN, MONEY, BANK, RIVER, STREAM]
• Structured representations
• Encoding of propositions
• Hand-coded examples
• Not inferred from data

Semantic spaces (e.g., Landauer & Dumais, 1997)
[Figure: spatial arrangement of BILL, CASH, LOAN, MONEY, BANK, RIVER]
• Relatively unstructured representations
• Can be learned from data
Overview
I. Probabilistic Topic Models
II. Associative Memory
III. Episodic Memory
IV. Applications
V. Conclusion
Probabilistic Topic Models
• Extract topics from large text collections
– unsupervised
– Bayesian statistical techniques
• Topics provide quick summary of content / gist
What are topics?
• A topic represents a probability distribution over words
– Related words get high probability in same topic
• Example topics extracted from psychology grant applications (each is a probability distribution over words; most likely words listed first):

Topic 1: brain, fmri, imaging, functional, mri, subjects, magnetic, resonance, neuroimaging, structural
Topic 2: schizophrenia, patients, deficits, schizophrenic, psychosis, subjects, psychotic, dysfunction, abnormalities, clinical
Topic 3: memory, working, memories, tasks, retrieval, encoding, cognitive, processing, recognition, performance
Topic 4: disease, ad, alzheimer, diabetes, cardiovascular, insulin, vascular, blood, clinical, individuals
Model Input
• Matrix of counts: number of times words occur in documents

             Doc1   Doc2   Doc3  …
PIZZA         34      0      3
PASTA         12      0      2
ITALIAN        0     19      6
FOOD           0     16      1
…              …      …      …

• Note:
– word order is lost: “bag of words” approach
– some function words are deleted: “the”, “a”, “in”
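Building such a word-document count matrix takes only a few lines; here is a toy Python sketch (the corpus and stop-word list are invented for illustration):

```python
from collections import Counter

# Toy corpus; stop words like "the", "a", "in" are removed before counting.
docs = [
    "pizza pasta pizza",
    "italian food italian",
    "pizza pasta italian food",
]
stopwords = {"the", "a", "in"}

# Vocabulary over all documents (word order is discarded).
vocab = sorted({w for d in docs for w in d.split() if w not in stopwords})

# counts[i][j] = number of times word i occurs in document j.
counts = [[Counter(d.split())[w] for d in docs] for w in vocab]

for w, row in zip(vocab, counts):
    print(w, row)
```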
Model Assumptions
• Each topic is a probability distribution over words
• Each document is modeled as a mixture of topics
Generative Model for Documents
1. For each document, choose a mixture of topics
2. Sample a topic [1..T] from the mixture
3. Sample a word from that topic

(“LDA” by Blei, Ng, & Jordan, 2002; “pLSI” by Hofmann, 1999; Griffiths & Steyvers, 2002, 2004)
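The three steps above can be sketched directly as a sampling procedure (a toy illustration; the two topics and their word distributions below are invented, not taken from the talk's corpus):

```python
import random
random.seed(0)

# Two toy topics (hypothetical word distributions for illustration).
topics = {
    0: {"brain": 0.5, "imaging": 0.5},
    1: {"memory": 0.5, "retrieval": 0.5},
}

def generate_document(theta, n_words):
    """theta: this document's mixture of topics (step 1)."""
    words = []
    for _ in range(n_words):
        # Step 2: sample a topic from the document's mixture.
        z = random.choices(list(theta), weights=theta.values())[0]
        # Step 3: sample a word from that topic's distribution.
        w = random.choices(list(topics[z]), weights=topics[z].values())[0]
        words.append(w)
    return words

doc = generate_document({0: 0.8, 1: 0.2}, n_words=10)
print(doc)
```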
Document = mixture of topics
[Figure: the four example topics from the previous slide (brain imaging, schizophrenia, memory, disease) and two example documents; one document mixes two topics (20% / 80%), the other is assigned 100% to a single topic.]
The Generative Model
• Conditional probability of a word in a document:

P(w | d) = Σ_{j=1..T} P(w | z = j) P(z = j | d)

where P(w | z = j) is the word probability in topic j, and P(z = j | d) is the probability of topic j in the document.
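The sum over topics is easy to verify numerically (the two-topic probabilities below are hypothetical illustrations):

```python
# P(w|d) = sum_j P(w|z=j) P(z=j|d), illustrated with two topics.
p_w_given_z = {  # word probability in each topic (hypothetical numbers)
    "brain":  {1: 0.10, 2: 0.00},
    "memory": {1: 0.01, 2: 0.12},
}
p_z_given_d = {1: 0.8, 2: 0.2}  # topic mixture for one document

def p_word_in_doc(w):
    return sum(p_w_given_z[w][j] * p_z_given_d[j] for j in p_z_given_d)

print(p_word_in_doc("brain"))   # 0.10*0.8 + 0.00*0.2 = 0.08
print(p_word_in_doc("memory"))  # 0.01*0.8 + 0.12*0.2 = 0.032
```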
Inverting the generative model
• Generative model gives procedure to obtain corpus from topics
and mixing proportions
• Inverting the model involves extracting topics and mixing
proportions per document from corpus
Inverting the generative model
• Estimate topic assignments
– Each occurrence of a word is assigned to one topic [ 1..T ]
• Large state space
– With T = 300 topics and 6,000,000 word tokens, the size of the discrete state space is 300^6,000,000
• Need efficient sampling techniques
– Markov Chain Monte Carlo (MCMC) with Gibbs sampling
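A collapsed Gibbs sampler for this model can be written in a few lines (a minimal toy sketch, not the presenters' MATLAB toolbox; the two tiny "documents" over a 4-word vocabulary are invented):

```python
import random
random.seed(0)

def gibbs_lda(docs, T, vocab_size, iters=200, alpha=0.1, beta=0.01):
    """Minimal collapsed Gibbs sampler for LDA. docs: lists of word ids."""
    # z[d][i]: topic assigned to the i-th word token of document d
    z = [[random.randrange(T) for _ in doc] for doc in docs]
    ndt = [[0] * T for _ in docs]               # document-topic counts
    ntw = [[0] * vocab_size for _ in range(T)]  # topic-word counts
    nt = [0] * T                                # word tokens per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            ndt[d][t] += 1; ntw[t][w] += 1; nt[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove the current assignment from the counts.
                ndt[d][t] -= 1; ntw[t][w] -= 1; nt[t] -= 1
                # P(z_i = k | rest) ∝ (n_dk + alpha)(n_kw + beta)/(n_k + V*beta)
                weights = [(ndt[d][k] + alpha) * (ntw[k][w] + beta) /
                           (nt[k] + vocab_size * beta) for k in range(T)]
                t = random.choices(range(T), weights=weights)[0]
                z[d][i] = t
                ndt[d][t] += 1; ntw[t][w] += 1; nt[t] += 1
    return z, ntw

# Words 0,1 co-occur in one document, words 2,3 in the other.
docs = [[0, 1, 0, 1, 0, 1], [2, 3, 2, 3, 2, 3]]
z, ntw = gibbs_lda(docs, T=2, vocab_size=4)
print(ntw)  # topic-word counts; the sampler tends to separate the two themes
```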
Summary
INPUT: word-document counts (word order is irrelevant)
OUTPUT:
– topic assignments to each word: z_i
– likely words in each topic: P(w | z)
– likely topics in each document (“gist”): P(z | d)
A selection from 500 topics [P(w | z = j)], each shown as its 20 most likely words:

THEORY topic: THEORY, SCIENTISTS, EXPERIMENT, OBSERVATIONS, SCIENTIFIC, EXPERIMENTS, HYPOTHESIS, EXPLAIN, SCIENTIST, OBSERVED, EXPLANATION, BASED, OBSERVATION, IDEA, EVIDENCE, THEORIES, BELIEVED, DISCOVERED, OBSERVE, FACTS
BRAIN topic: BRAIN, NERVE, SENSE, SENSES, ARE, NERVOUS, NERVES, BODY, SMELL, TASTE, TOUCH, MESSAGES, IMPULSES, CORD, ORGANS, SPINAL, FIBERS, SENSORY, PAIN, IS
CURRENT topic: CURRENT, ELECTRICITY, ELECTRIC, CIRCUIT, IS, ELECTRICAL, VOLTAGE, FLOW, BATTERY, WIRE, WIRES, SWITCH, CONNECTED, ELECTRONS, RESISTANCE, POWER, CONDUCTORS, CIRCUITS, TUBE, NEGATIVE
ART topic: ART, PAINT, ARTIST, PAINTING, PAINTED, ARTISTS, MUSEUM, WORK, PAINTINGS, STYLE, PICTURES, WORKS, OWN, SCULPTURE, PAINTER, ARTS, BEAUTIFUL, DESIGNS, PORTRAIT, PAINTERS
STUDENTS topic: STUDENTS, TEACHER, STUDENT, TEACHERS, TEACHING, CLASS, CLASSROOM, SCHOOL, LEARNING, PUPILS, CONTENT, INSTRUCTION, TAUGHT, GROUP, GRADE, SHOULD, GRADES, CLASSES, PUPIL, GIVEN
SPACE topic: SPACE, EARTH, MOON, PLANET, ROCKET, MARS, ORBIT, ASTRONAUTS, FIRST, SPACECRAFT, JUPITER, SATELLITE, SATELLITES, ATMOSPHERE, SPACESHIP, SURFACE, SCIENTISTS, ASTRONAUT, SATURN, MILES
Words can have high probability in multiple topics (e.g., FIELD):

FIELD (magnetism) topic: FIELD, MAGNETIC, MAGNET, WIRE, NEEDLE, CURRENT, COIL, POLES, IRON, COMPASS, LINES, CORE, ELECTRIC, DIRECTION, FORCE, MAGNETS, BE, MAGNETISM, POLE, INDUCED
JOB topic: JOB, WORK, JOBS, CAREER, EXPERIENCE, EMPLOYMENT, OPPORTUNITIES, WORKING, TRAINING, SKILLS, CAREERS, POSITIONS, FIND, POSITION, FIELD, OCCUPATIONS, REQUIRE, OPPORTUNITY, EARN, ABLE
SCIENCE topic: SCIENCE, STUDY, SCIENTISTS, SCIENTIFIC, KNOWLEDGE, WORK, RESEARCH, CHEMISTRY, TECHNOLOGY, MANY, MATHEMATICS, BIOLOGY, FIELD, PHYSICS, LABORATORY, STUDIES, WORLD, SCIENTIST, STUDYING, SCIENCES
BALL topic: BALL, GAME, TEAM, FOOTBALL, BASEBALL, PLAYERS, PLAY, FIELD, PLAYER, BASKETBALL, COACH, PLAYED, PLAYING, HIT, TENNIS, TEAMS, GAMES, SPORTS, BAT, TERRY
Disambiguation
• Example: the topic assigned to “field” depends on the surrounding context
Topics versus LSA
• Latent Semantic Analysis (LSI/LSA)
– Projects words into a K-dimensional hidden space
– Less interpretable
– Not generalizable
– Not as accurate
[Figure: BILL, CASH, LOAN, MONEY, BANK, RIVER placed in a high-dimensional space]
Modeling Word Association

Word Association (norms from Nelson et al., 1998)
CUE: PLANET
Associates produced by people, in order: 1. EARTH, 2. STARS, 3. SPACE, 4. SUN, 5. MARS, 6. UNIVERSE, 7. SATURN, 8. GALAXY
(vocabulary = 5000+ words)
Topic model for Word Association
• Association as a problem of prediction:
– Given that a single word is observed, predict what other words might occur in that context
• Under a single-topic assumption:

P(w2 | w1) = Σ_z P(w2 | z) P(z | w1)

where w1 is the cue and w2 is the response.
Word Association (norms from Nelson et al., 1998)
CUE: PLANET

associate number: people / model
1: EARTH / STARS
2: STARS / STAR
3: SPACE / SUN
4: SUN / EARTH
5: MARS / SPACE
6: UNIVERSE / SKY
7: SATURN / PLANET
8: GALAXY / UNIVERSE

First associate “EARTH” is in the set of 8 associates from the model.

[Figure: P(set contains first associate) as a function of set size (10^0 to 10^3), for the topic model and for LSA; the topic model curve lies above the LSA curve.]
Why would LSA perform worse?
• Cosine similarity measure imposes unnecessary
constraints on representations, e.g.
– Symmetry
– Triangle inequality
Violation of triangle inequality
• In a metric space, AC ≤ AB + BC for points A, B, C
• Can find associations that violate this:
SOCCER → FIELD, FIELD → MAGNETIC
No Triangle Inequality with Topics
[Figure: SOCCER and FIELD share TOPIC 1; FIELD and MAGNETIC share TOPIC 2]
Topic structure easily explains violations of the triangle inequality
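The violation can be reproduced numerically with the single-topic association rule P(w2 | w1) = Σ_z P(w2 | z) P(z | w1) (the two topics and all probabilities below are invented for illustration):

```python
# Two hypothetical topics; "field" has high probability in both.
topics = {
    "sports":    {"soccer": 0.5, "field": 0.5, "magnetic": 0.0},
    "magnetism": {"soccer": 0.0, "field": 0.5, "magnetic": 0.5},
}
prior = {"sports": 0.5, "magnetism": 0.5}

def assoc(cue, response):
    # P(response | cue) = sum_z P(response | z) P(z | cue), where
    # P(z | cue) ∝ P(cue | z) P(z)  (single-topic assumption)
    post = {z: topics[z][cue] * prior[z] for z in topics}
    total = sum(post.values())
    return sum(topics[z][response] * post[z] / total for z in topics)

print(assoc("soccer", "field"))     # strong association
print(assoc("field", "magnetic"))   # moderate association
print(assoc("soccer", "magnetic"))  # zero: no shared topic
```

With distances defined as decreasing in association strength, SOCCER is close to FIELD and FIELD is close to MAGNETIC, yet SOCCER is maximally far from MAGNETIC, which a metric space forbids.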
Modeling Episodic Memory

Semantic Isolation Effect
Study this list: PEAS, CARROTS, BEANS, SPINACH, LETTUCE, HAMMER, TOMATOES, CORN, CABBAGE, SQUASH
Typical recall output: HAMMER, PEAS, CARROTS, ...
Semantic Isolation Effect
• Verbal explanations:
– Attention, surprise, distinctiveness
• Our approach: memory system is trading off two encoding
resources
– Storing specific words (e.g. “hammer”)
– Storing general theme of list (e.g. “vegetables”)
Computational Problem
• How to trade off specificity and generality?
– Remembering detail and gist
→ Special word topic model
Special word topic (SW) model
• Each word can be generated via one of two routes:
– Topics
– Special-words distribution (unique to a document)
• Conditional probability of a word under a document:

P(w | d) = P(x = 0 | d) P_topics(w | d) + P(x = 1 | d) P_special(w | d)
[Figure: ENCODING and RETRIEVAL for the study list PEAS, CARROTS, BEANS, SPINACH, LETTUCE, HAMMER, TOMATOES, CORN, CABBAGE, SQUASH. Encoding panels show the inferred topic probabilities (VEGETABLES high, around 0.2; FURNITURE and TOOLS near 0.0) and the switch probability for each study word (HAMMER encoded mainly via the special/verbatim route, the vegetables via the topic route). Retrieval panels show retrieval probabilities: HAMMER has the highest special-word probability of all list items.]
Hunt & Lamb (2001, Exp. 1)

OUTLIER LIST: PEAS, CARROTS, BEANS, SPINACH, LETTUCE, HAMMER, TOMATOES, CORN, CABBAGE, SQUASH
CONTROL LIST: SAW, SCREW, CHISEL, DRILL, SANDPAPER, HAMMER, NAILS, BENCH, RULER, ANVIL

[Figure: DATA. Probability of recall (0.0–1.0) for the target (HAMMER) and background items, for the outlier list vs. the pure list; the target is recalled much better in the outlier list.]
Model Predictions
[Figure: DATA (probability of recall) and PREDICTED (need probability) for target and background items, outlier list vs. pure list; the model reproduces the target advantage in the outlier list.]
Robinson & Roediger (1997)
False memory effects

Study lists contain 3, 6, 9, 12, or 15 associates of a nonstudied lure (lure = ANGER), filled out with unrelated words:
3 associates: MAD, FEAR, HATE, SMOOTH, NAVY, HEAT, SALAD, TUNE, COURTS, CANDY, PALACE, PLUSH, TOOTH, BLIND, WINTER
6 associates: MAD, FEAR, HATE, RAGE, TEMPER, FURY, SALAD, TUNE, COURTS, CANDY, PALACE, PLUSH, TOOTH, BLIND, WINTER
9 associates: MAD, FEAR, HATE, RAGE, TEMPER, FURY, WRATH, HAPPY, FIGHT, CANDY, PALACE, PLUSH, TOOTH, BLIND, WINTER

[Figure: DATA (probability of recall, 0.0–1.0) and PREDICTED (probability of retrieval) for studied associates and the nonstudied lure, plotted as a function of the number of associates studied (3, 6, 9, 12, 15).]
Relation to Dual Process Models
• Gist/verbatim distinction (e.g., Reyna & Brainerd, 1995)
– Maps onto topics and special words
• Our approach specifies both encoding and retrieval
representations and processes
– Routes are not independent
– Model explains performance for actual word lists
Applications I
Topics provide quick summary of content
• What is in this corpus?
• What is in this document?
• What are the topical trends over time?
• Who writes on this topic?
Analyzing the New York Times
• 330,000 articles, 2000–2002

Extracted Named Entities
Three investigations began Thursday into the
securities and exchange_commission's choice
of william_webster to head a new board
overseeing the accounting profession. house and
senate_democrats called for the resignations of
both judge_webster and harvey_pitt, the
commission's chairman.
The white_house
expressed support for judge_webster as well as
for harvey_pitt, who was harshly criticized
Thursday for failing to inform other
commissioners before they approved the choice
of judge_webster that he had led the audit
committee of a company facing fraud
accusations. “The president still has confidence
in harvey_pitt,” said dan_bartlett, bush's
communications director …
• Used standard
algorithms to extract
named entities:
- People
- Places
- Organizations
Standard Topic Model with Entities

Basketball topic:
words: team 0.028, play 0.015, game 0.013, season 0.012, final 0.011, games 0.011, point 0.011, series 0.011, player 0.010, coach 0.009, playoff 0.009, championship 0.007, playing 0.006, win 0.006
entities: LAKERS 0.062, SHAQUILLE-O-NEAL 0.028, KOBE-BRYANT 0.028, PHIL-JACKSON 0.019, NBA 0.013, SACRAMENTO 0.007, RICK-FOX 0.007, PORTLAND 0.006, ROBERT-HORRY 0.006, DEREK-FISHER 0.006
Tour de France topic:
words: tour 0.039, rider 0.029, riding 0.017, bike 0.016, team 0.016, stage 0.014, race 0.013, won 0.012, bicycle 0.010, road 0.009, hour 0.009, scooter 0.008, mountain 0.008, place 0.008
entities: LANCE-ARMSTRONG 0.021, FRANCE 0.011, JAN-ULLRICH 0.003, LANCE 0.003, U-S-POSTAL-SERVICE 0.002, MARCO-PANTANI 0.002, PARIS 0.002, ALPS 0.002, PYRENEES 0.001, SPAIN 0.001
Holidays topic:
words: holiday 0.071, gift 0.050, toy 0.023, season 0.019, doll 0.014, tree 0.011, present 0.008, giving 0.008, special 0.007, shopping 0.007, family 0.007, celebration 0.007, card 0.007, tradition 0.006
entities: CHRISTMAS 0.058, THANKSGIVING 0.018, SANTA-CLAUS 0.009, BARBIE 0.004, HANUKKAH 0.003, MATTEL 0.003, GRINCH 0.003, HALLMARK 0.002, EASTER 0.002, HASBRO 0.002

Oscars topic:
words: award 0.026, film 0.020, actor 0.020, nomination 0.019, movie 0.015, actress 0.011, won 0.011, director 0.010, nominated 0.010, supporting 0.010, winner 0.008, picture 0.008, performance 0.007, nominees 0.007
entities: OSCAR 0.035, ACADEMY 0.020, HOLLYWOOD 0.009, DENZEL-WASHINGTON 0.006, JULIA-ROBERT 0.005, RUSSELL-CROWE 0.005, TOM-HANK 0.005, STEVEN-SODERBERGH 0.004, ERIN-BROCKOVICH 0.003, KEVIN-SPACEY 0.003
Computers topic:
words: computer 0.069, technology 0.026, system 0.015, digital 0.014, chip 0.013, software 0.013, machine 0.011, devices 0.010, machines 0.010, video 0.009
entity classes: Companies 1.000
Companies: IBM 0.074, APPLE 0.061, INTEL 0.059, MICROSOFT 0.053, COMPAQ 0.041, SONY 0.029, DELL 0.019, HP 0.018
Arts topic:
words: play 0.030, show 0.029, stage 0.022, theater 0.022, director 0.017, production 0.017, performance 0.016, dance 0.014, audience 0.014, festival 0.013
entity classes: Theater 0.960, Music 0.040
Theater: BROADWAY 0.119, NEW_YORK 0.044, SHAKESPEARE 0.029, THEATER 0.022, LONDON 0.019, GUINNESS 0.018, TONY 0.016, LINCOLN_CTR 0.015
Music: BACH 0.035, BEETHOVEN 0.026, LOUIS_ARMSTRONG 0.019, MOZART 0.019, CARNEGIE_HALL 0.017, LATIN 0.017
Topic Trends
[Figure: proportion of words assigned to a topic in each time slice, Jan 2000 – Jan 2003, for three topics: Tour-de-France (y-axis 0–15), Quarterly Earnings (0–30), and Anthrax (0–100).]
Example of Extracted Entity-Topic Network
[Figure: network linking topics (FBI_Investigation, Pakistan_Indian_War, Detainees, US_Military, Terrorist_Attacks, Muslim_Militance, Mid_East_Conflict, Afghanistan_War, Palestinian_Territories, Mid_East_Peace, Religion) and entities (AL_HAZMI, MOHAMMED_ATTA, ZAWAHIRI, TALIBAN, AL_QAEDA, HAMAS, BIN_LADEN, ARIEL_SHARON, MOHAMMED, KING_ABDULLAH, HAMID_KARZAI, NORTHERN_ALLIANCE, YASSER_ARAFAT, KING_HUSSEIN, EHUD_BARAK).]
Prediction of Missing Entities in Text

Test article with entities removed:
Shares of XXXX slid 8 percent, or $1.10, to $12.65 Tuesday, as major credit agencies said the conglomerate would still be challenged in repaying its debts, despite raising $4.6 billion Monday in taking its finance group public. Analysts at XXXX Investors service in XXXX said they were keeping XXXX and its subsidiaries under review for a possible debt downgrade, saying the company “will continue to face a significant debt burden,” with large slices of debt coming due over the next 18 months. XXXX said …

Actual missing entities: fitch, goldman-sachs, lehman-brother, moody, morgan-stanley, new-york-stock-exchange, standard-and-poor, tyco, tyco-international, wall-street, worldco

Predicted entities given observed words (matches in blue): wall-street, new-york, nasdaq, securities-exchange-commission, sec, merrill-lynch, new-york-stock-exchange, goldman-sachs, standard-and-poor
Applications II

Faculty Browser
• System collects PDF files from faculty websites (UCI, UCSD)
• Applies topic model to text extracted from the PDFs
• Displays faculty research with topics
• Demo: http://yarra.calit2.uci.edu/calit2/

[Screenshots: one topic with the most prolific researchers for that topic; one researcher with the topics that researcher works on and other researchers with similar topical interests.]
Conclusions
Human Memory vs. Information Retrieval
• Finding relevant memories ↔ finding relevant documents
Software
Public-domain MATLAB toolbox for topic modeling on the Web:
http://psiexp.ss.uci.edu/research/programs_data/toolbox.htm
Hidden Markov Topic Model
• Syntactic dependencies → short-range dependencies
• Semantic dependencies → long-range dependencies
[Graphical model: a document-level topic mixture; topic variables z1…z4 generate words w1…w4 when the model is in the semantic state, and syntactic state variables s1…s4 form a Markov chain over the word positions.]
Semantic state: generate words from topic model
Syntactic states: generate words from HMM
(Griffiths, Steyvers, Blei, & Tenenbaum, 2004)
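A sketch of how the composite model generates text, alternating between a semantic state (emitting from topics) and syntactic HMM states (all states, distributions, and transition probabilities below are invented toy values in the spirit of the slides' example):

```python
import random
random.seed(1)

# Emission distributions: one semantic state backed by the topic model,
# two syntactic states emitting function words (hypothetical values).
states = {
    "sem":  {"love": 0.5, "research": 0.5},   # words from the topic model
    "det":  {"the": 0.7, "a": 0.3},           # syntactic: determiners
    "prep": {"of": 0.7, "for": 0.3},          # syntactic: prepositions
}
# Transition probabilities of the HMM over states (hypothetical values).
transitions = {
    "det":  {"sem": 1.0},
    "sem":  {"prep": 0.5, "det": 0.5},
    "prep": {"det": 1.0},
}

def sample(dist):
    return random.choices(list(dist), weights=dist.values())[0]

# Generate four words, e.g. sequences like "the love of the".
s, words = "det", []
for _ in range(4):
    words.append(sample(states[s]))  # emit a word from the current state
    s = sample(transitions[s])       # move to the next state
print(" ".join(words))
```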
Transition between semantic state and syntactic states

[Diagram: three states with emission distributions and transition probabilities (0.1–0.9) linking them. Semantic state (topic model): z = 1 (prob. 0.4): HEART 0.2, LOVE 0.2, SOUL 0.2, TEARS 0.2, JOY 0.2; z = 2 (prob. 0.6): SCIENTIFIC 0.2, KNOWLEDGE 0.2, WORK 0.2, RESEARCH 0.2, MATHEMATICS 0.2. Syntactic state (prepositions): OF 0.6, FOR 0.3, BETWEEN 0.1. Syntactic state (determiners): THE 0.6, A 0.3, MANY 0.1.]
Combining topics and syntax

[The same three-state diagram (x = 1: topic model with z = 1 and z = 2; x = 2: prepositions OF/FOR/BETWEEN; x = 3: determiners THE/A/MANY) generates a sentence one word at a time:]
THE …
THE LOVE …
THE LOVE OF …
THE LOVE OF RESEARCH …
Semantic topics

MAP topic: MAP, NORTH, EARTH, SOUTH, POLE, MAPS, EQUATOR, WEST, LINES, EAST, AUSTRALIA, GLOBE, POLES, HEMISPHERE, LATITUDE, PLACES, LAND, WORLD, COMPASS, CONTINENTS
FOOD topic: FOOD, FOODS, BODY, NUTRIENTS, DIET, FAT, SUGAR, ENERGY, MILK, EATING, FRUITS, VEGETABLES, WEIGHT, FATS, NEEDS, CARBOHYDRATES, VITAMINS, CALORIES, PROTEIN, MINERALS
GOLD topic: GOLD, IRON, SILVER, COPPER, METAL, METALS, STEEL, CLAY, LEAD, ADAM, ORE, ALUMINUM, MINERAL, MINE, STONE, MINERALS, POT, MINING, MINERS, TIN
CELLS topic: CELLS, CELL, ORGANISMS, ALGAE, BACTERIA, MICROSCOPE, MEMBRANE, ORGANISM, FOOD, LIVING, FUNGI, MOLD, MATERIALS, NUCLEUS, CELLED, STRUCTURES, MATERIAL, STRUCTURE, GREEN, MOLDS
BEHAVIOR topic: BEHAVIOR, SELF, INDIVIDUAL, PERSONALITY, RESPONSE, SOCIAL, EMOTIONAL, LEARNING, FEELINGS, PSYCHOLOGISTS, INDIVIDUALS, PSYCHOLOGICAL, EXPERIENCES, ENVIRONMENT, HUMAN, RESPONSES, BEHAVIORS, ATTITUDES, PSYCHOLOGY, PERSON
DOCTOR topic: DOCTOR, PATIENT, HEALTH, HOSPITAL, MEDICAL, CARE, PATIENTS, NURSE, DOCTORS, MEDICINE, NURSING, TREATMENT, NURSES, PHYSICIAN, HOSPITALS, DR, SICK, ASSISTANT, EMERGENCY, PRACTICE
BOOK topic: BOOK, BOOKS, READING, INFORMATION, LIBRARY, REPORT, PAGE, TITLE, SUBJECT, PAGES, GUIDE, WORDS, MATERIAL, ARTICLE, ARTICLES, WORD, FACTS, AUTHOR, REFERENCE, NOTE
PLANTS topic: PLANTS, PLANT, LEAVES, SEEDS, SOIL, ROOTS, FLOWERS, WATER, FOOD, GREEN, SEED, STEMS, FLOWER, STEM, LEAF, ANIMALS, ROOT, POLLEN, GROWING, GROW
Syntactic classes

SAID, ASKED, THOUGHT, TOLD, SAYS, MEANS, CALLED, CRIED, SHOWS, ANSWERED, TELLS, REPLIED, SHOUTED, EXPLAINED, LAUGHED, MEANT, WROTE, SHOWED, BELIEVED, WHISPERED
THE, HIS, THEIR, YOUR, HER, ITS, MY, OUR, THIS, THESE, A, AN, THAT, NEW, THOSE, EACH, MR, ANY, MRS, ALL
MORE, SUCH, LESS, MUCH, KNOWN, JUST, BETTER, RATHER, GREATER, HIGHER, LARGER, LONGER, FASTER, EXACTLY, SMALLER, SOMETHING, BIGGER, FEWER, LOWER, ALMOST
ON, AT, INTO, FROM, WITH, THROUGH, OVER, AROUND, AGAINST, ACROSS, UPON, TOWARD, UNDER, ALONG, NEAR, BEHIND, OFF, ABOVE, DOWN, BEFORE
GOOD, SMALL, NEW, IMPORTANT, GREAT, LITTLE, LARGE, *, BIG, LONG, HIGH, DIFFERENT, SPECIAL, OLD, STRONG, YOUNG, COMMON, WHITE, SINGLE, CERTAIN
ONE, SOME, MANY, TWO, EACH, ALL, MOST, ANY, THREE, THIS, EVERY, SEVERAL, FOUR, FIVE, BOTH, TEN, SIX, MUCH, TWENTY, EIGHT
HE, YOU, THEY, I, SHE, WE, IT, PEOPLE, EVERYONE, OTHERS, SCIENTISTS, SOMEONE, WHO, NOBODY, ONE, SOMETHING, ANYONE, EVERYBODY, SOME, THEN
BE, MAKE, GET, HAVE, GO, TAKE, DO, FIND, USE, SEE, HELP, KEEP, GIVE, LOOK, COME, WORK, MOVE, LIVE, EAT, BECOME
NIPS Semantics

IMAGE, IMAGES, OBJECT, OBJECTS, FEATURE, RECOGNITION, VIEWS, #, PIXEL, VISUAL
DATA, GAUSSIAN, MIXTURE, LIKELIHOOD, POSTERIOR, PRIOR, DISTRIBUTION, EM, BAYESIAN, PARAMETERS
STATE, POLICY, VALUE, FUNCTION, ACTION, REINFORCEMENT, LEARNING, CLASSES, OPTIMAL, *
MEMBRANE, SYNAPTIC, CELL, *, CURRENT, DENDRITIC, POTENTIAL, NEURON, CONDUCTANCE, CHANNELS
EXPERTS, EXPERT, GATING, HME, ARCHITECTURE, MIXTURE, LEARNING, MIXTURES, FUNCTION, GATE
KERNEL, SUPPORT, VECTOR, SVM, KERNELS, #, SPACE, FUNCTION, MACHINES, SET
NETWORK, NEURAL, NETWORKS, OUTPUT, INPUT, TRAINING, INPUTS, WEIGHTS, #, OUTPUTS
NIPS Syntax

IN, WITH, FOR, ON, FROM, AT, USING, INTO, OVER, WITHIN
IS, WAS, HAS, BECOMES, DENOTES, BEING, REMAINS, REPRESENTS, EXISTS, SEEMS
SEE, SHOW, NOTE, CONSIDER, ASSUME, PRESENT, NEED, PROPOSE, DESCRIBE, SUGGEST
USED, TRAINED, OBTAINED, DESCRIBED, GIVEN, FOUND, PRESENTED, DEFINED, GENERATED, SHOWN
MODEL, ALGORITHM, SYSTEM, CASE, PROBLEM, NETWORK, METHOD, APPROACH, PAPER, PROCESS
HOWEVER, ALSO, THEN, THUS, THEREFORE, FIRST, HERE, NOW, HENCE, FINALLY
#, *, I, X, T, N, C, F, P
Random sentence generation
LANGUAGE:
[S] RESEARCHERS GIVE THE SPEECH
[S] THE SOUND FEEL NO LISTENERS
[S] WHICH WAS TO BE MEANING
[S] HER VOCABULARIES STOPPED WORDS
[S] HE EXPRESSLY WANTED THAT BETTER VOWEL
Nested Chinese Restaurant Process

Topic Hierarchies
• In the regular topic model, there are no relations between topics
• Nested Chinese Restaurant Process (Blei, Griffiths, Jordan, & Tenenbaum, 2004)
– Learns hierarchical structure, as well as topics within the structure
[Diagram: a tree of topics 1–7, with topic 1 at the root.]
Example: Psych Review Abstracts
[Topic hierarchy learned from Psych Review abstracts. Root topic: THE, OF, AND, TO, IN, A, IS. Second level: A, MODEL, MEMORY, FOR, MODELS, TASK, INFORMATION, RESULTS, ACCOUNT. Leaf topics include: RESPONSE, SPEECH, STIMULUS, READING, REINFORCEMENT, WORDS, RECOGNITION, STIMULI, RECALL, CHOICE, WORD, CONDITIONING; MOVEMENT, MOTOR, VISUAL, SEMANTIC, ACTION; SELF, SOCIAL, EXPERIENCE, EMOTION, GOALS, EMOTIONAL, THINKING, PSYCHOLOGY, RESEARCH; RISK, STRATEGIES, INTERPERSONAL, PERSONALITY, SAMPLING; GROUP, IQ, INTELLIGENCE, SOCIAL, RATIONAL, INDIVIDUAL, GROUPS, MEMBERS; SEX, EMOTIONS, GENDER, EMOTION, STRESS, WOMEN, HEALTH, HANDEDNESS; MOTION, VISUAL, SURFACE, BINOCULAR, RIVALRY, CONTOUR, DIRECTION, CONTOURS, SURFACES; DRUG, FOOD, BRAIN, AROUSAL, ACTIVATION, AFFECTIVE, HUNGER, EXTINCTION, PAIN; REASONING, ATTITUDE, CONSISTENCY, SITUATIONAL, INFERENCE, JUDGMENT, PROBABILITIES, STATISTICAL; IMAGE, COLOR, MONOCULAR, LIGHTNESS, GIBSON, SUBMOVEMENT, ORIENTATION, HOLOGRAPHIC; CONDITIONING, STRESS, EMOTIONAL, BEHAVIORAL, FEAR, STIMULATION, TOLERANCE, RESPONSES.]
Generative Process
[The slide repeats the Psych Review topic hierarchy shown above.]