Transcript Slide 1

Alexander Gelbukh
Moscow, Russia
1
Mexico
2
Computing Research Center (CIC),
Mexico
3
Chung-Ang University, Korea
Electronic Commerce and
Internet Application Lab
4
Natural Language
Processing
Alexander Gelbukh
www.Gelbukh.com
What language is
[Figure: Text (voice, OCR), Linguistic module, Meaning, Language, Linguistic module, Expert system. Filler text in the figure: "This is an example of the output text of the system."]
6
Better communication with computers
[Figure: a column of binary code (0101011, 1010100, ...) vs. a second picture]
People are more productive when they
speak their own language
7
Accessibility of computers for all
vs.
It is easier to teach one computer to speak
than to teach generations of people
to use computers
8
Better knowledge management
vs.
Computers are better than people at
managing information
9
Solution:
Language understanding by computers
10
Applications
Information retrieval (Internet search, Google)
Question answering (Internet)
Information extraction (fill a DB from newspapers)
Automatic translation
OCR, speech recognition
Natural language interfaces (robots, computers)
Interaction of agents
  - Thinking computers?
  - Think = speak
11
Source of language complexity: 1-D
[Figure: Meaning in Brain 1 and Meaning in Brain 2, connected through Language by a one-dimensional Text (speech). Filler text in the figure: "This is a text that represents the meaning shown in the right part of the picture."]
12
Source of language complexity: 1-D
[Figure: Knowledge, Language, Text, Language, Knowledge. Filler text in the figure: "This is a text that represents the meaning shown in the right part of the picture."]
13
Linguistic processor
translates between representations
[Figure: Texts, Linguistic module, Meanings, Applied system, Linguistic module. Filler text in the figure: "This is an example of the output text of the system."]
14
General scheme of text processing
[Figure: Input, Linguistic processor, (Semantic) representation, Applied system (e.g., Expert system), Output]
- Linguistic processor uses linguistic knowledge
- Applied system uses other types of knowledge
  (e.g., Artificial Intelligence)
15
Language levels
Morphological: words
Syntactic: sentences
Semantic: meaning
Pragmatic: intention
...?
16
Fine structure of linguistic processor
[Figure: Text (surface representation), Morphological transformer, Morphological representation, Syntactic transformer, Syntactic representation, Semantic transformer, Semantic representation (Meaning). The chain of transformers is labeled "Language". Filler text in the figure: "This is a text that represents the meaning shown in the right part of the picture."]
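The chain of transformers on this slide can be sketched as function composition. This is an illustration only: the three stage functions are hypothetical placeholders that record the level they produce, not real analyzers, and all names are assumptions.

```python
from functools import reduce

# Hypothetical placeholder stages: each one only records the level it produces.
def morphological_transformer(rep):
    return {"level": "morphological", "source": rep}

def syntactic_transformer(rep):
    return {"level": "syntactic", "source": rep}

def semantic_transformer(rep):
    return {"level": "semantic", "source": rep}

STAGES = [morphological_transformer, syntactic_transformer, semantic_transformer]

def linguistic_processor(text):
    """Pass the surface text through every stage, in order."""
    surface = {"level": "surface", "source": text}
    return reduce(lambda rep, stage: stage(rep), STAGES, surface)

def levels_visited(rep):
    """Unwind the nested result and list the levels from surface to semantic."""
    levels = []
    while isinstance(rep, dict):
        levels.append(rep["level"])
        rep = rep["source"]
    return levels[::-1]

result = linguistic_processor("Science is important for our country.")
print(levels_visited(result))
```

Each stage sees only the output of the previous one, which is the point of the layered design: a stage can be replaced without touching the others.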
17
Example of text
“Science is important
for our country.
The Government pays it
much attention.”
18
Textual representation
Text is a sequence of letters.
[Figure: the example text spelled out letter by letter]
S c i e n c e   i s   i m p o r t a n t   f o r   o u r   c o u n t r y .
T h e   G o v e r n m e n t   p a y s   i t   m u c h   a t t e n t i o n .
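At this level the program sees nothing but characters. A minimal sketch using the first example sentence:

```python
text = "Science is important for our country."

# One item per character; spaces and punctuation are characters too.
letters = list(text)

print(letters[:7])
print("".join(letters) == text)
```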
19
Morphological analysis
[Figure: Linguistic processor: Morphological analyzer (current stage: morphological analysis), Syntactic parser, Semantic analyzer]
20
Morphological representation
A sequence of words.
Word form   Lemma      Part of speech  Features
The         THE        article         definite, plural/singular
science     SCIENCE    noun            singular
is          BE         verb            present, 3rd person, sing.
important   IMPORTANT  adjective
for         FOR        preposition
our         WE         pronoun         possessive
country     COUNTRY    noun            singular
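The entries on this slide can be read as a lookup table from word form to lemma, part of speech, and features. A minimal sketch, assuming a dictionary lexicon and a naive whitespace tokenizer; the function names are illustrative, and a real morphological analyzer would handle inflection rather than list every form.

```python
# Toy lexicon copied from this slide: word form -> (lemma, part of speech, features).
LEXICON = {
    "the": ("THE", "article", "definite, plural/singular"),
    "science": ("SCIENCE", "noun", "singular"),
    "is": ("BE", "verb", "present, 3rd person, sing."),
    "important": ("IMPORTANT", "adjective", ""),
    "for": ("FOR", "preposition", ""),
    "our": ("WE", "pronoun", "possessive"),
    "country": ("COUNTRY", "noun", "singular"),
}

def tokenize(text):
    """Split on whitespace and strip surrounding punctuation (a simplification)."""
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return [w for w in words if w]

def analyze(text):
    """Return one (form, lemma, pos, features) tuple per word.

    Words missing from the lexicon get None in the last three fields.
    """
    rows = []
    for form in tokenize(text):
        lemma, pos, features = LEXICON.get(form, (None, None, None))
        rows.append((form, lemma, pos, features))
    return rows

for row in analyze("Science is important for our country."):
    print(row)
```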
21
Syntactic parsing
[Figure: Linguistic processor: Morphological analyzer, Syntactic parser (current stage: syntactic parsing), Semantic analyzer]
22
Syntactic representation
A sequence of syntactic trees.
[Figure: two syntactic trees over the nodes BE, SCIENCE, IMPORTANT, COUNTRY, WE and PAY, GOVERNMENT, ATTENTION, MUCH, IT; one arc is labeled "of"]
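One way to hold such trees in a program is as nested (head, children) pairs. A minimal sketch; the arcs below are an assumed reading of the figure, which the transcript does not preserve, and the helper names are illustrative.

```python
# Each tree is (head, [subtrees]). The arcs are an assumed reading of the figure.
TREES = [
    ("BE", [
        ("SCIENCE", []),
        ("IMPORTANT", [("COUNTRY", [("WE", [])])]),
    ]),
    ("PAY", [
        ("GOVERNMENT", []),
        ("ATTENTION", [("MUCH", [])]),
        ("IT", []),
    ]),
]

def render(tree, depth=0):
    """Return the tree as indented lines, one node per line."""
    head, children = tree
    lines = ["  " * depth + head]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines

def nodes(tree):
    """List all node labels, head first, then each subtree in order."""
    head, children = tree
    found = [head]
    for child in children:
        found.extend(nodes(child))
    return found

for tree in TREES:
    print("\n".join(render(tree)))
```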
23
Semantic analysis
[Figure: Linguistic processor: Morphological analyzer, Syntactic parser, Semantic analyzer (current stage: semantic analysis)]
24
Semantic representation
Complex structure of whole text
[Figure: semantic network. Nodes: SCIENCE, IMPORTANT, COUNTRY, WE, GOVERNMENT, ATTENTION, Organization, Money, Funding, Sector. Arc labels: is a, is, gives, needs, implies, for, of, main form.]
25
The meaning
[Figure: the English example overlaid with its Spanish counterpart at every level; Spanish labels are translated here]
Text:
“Science is important for our country. The Government pays it much attention.”
“La ciencia es importante para nuestro país. El Gobierno le pone mucha atención.”
Morphological representation (Spanish):
La          LA          article      definite, feminine
ciencia     CIENCIA     noun         feminine, singular
es          SER         verb         present, 3rd person, sing.
importante  IMPORTANTE  adjective    singular
para        PARA        preposition
nuestro     NOSOTROS    pronoun      possessive
país        PAIS        noun         masculine, singular
Syntactic representation (nodes): SER, CIENCIA, IMPORTANTE, PAIS, NOSOTROS, PONER, GOBIERNO, ATENCION, MUCHA, LE. Arc labels: para (for), de (of).
Semantic representation (nodes): CIENCIA, IMPORTANTE, PAIS, NOSOTROS, GOBIERNO, ATENCION, Organization, Money, Budget, Sector. Arc labels: is a, is, gives, needs, implies, for, of, main form.
“There are good conditions for development of science in our country.”
26
Example: Translation
[Figure: Language A and Language B, connected at the text level, morphological level, syntactic level, and semantic level. A further label reads "The Meaning, yet unreachable", with a question mark.]
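Transfer at the morphological level can be sketched as lemma-by-lemma dictionary lookup. A minimal sketch, assuming the English and Spanish lemma labels from the earlier slides pair up as listed; the pairing and the function name are illustrative assumptions. Word order, agreement, and ambiguity are ignored here, which is what the deeper levels are meant to handle.

```python
# Assumed pairing of the English and Spanish lemma labels shown on the slides.
EN_TO_ES = {
    "SCIENCE": "CIENCIA", "BE": "SER", "IMPORTANT": "IMPORTANTE",
    "FOR": "PARA", "WE": "NOSOTROS", "COUNTRY": "PAIS",
    "GOVERNMENT": "GOBIERNO", "PAY": "PONER", "IT": "LE",
    "MUCH": "MUCHA", "ATTENTION": "ATENCION",
}

def transfer(lemmas, table=EN_TO_ES):
    """Map each source lemma to its target lemma; mark unknown lemmas with '?'."""
    return [table.get(lemma, "?" + lemma) for lemma in lemmas]

print(transfer(["SCIENCE", "BE", "IMPORTANT", "FOR", "WE", "COUNTRY"]))
```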
27
Problems
Ambiguity of text
  - I see a cat with a telescope
Knowledge needed
  - Linguistic
  - About the world and life
Good news
Learning from texts
  - Plenty of texts on the Internet!
  - Good statistical methods
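The ambiguity in "I see a cat with a telescope" can be made explicit by listing both attachments of the prepositional phrase. A minimal sketch, assuming a dependency-style encoding in which every word points to the index of its head; the head assignments and candidate list are written by hand for illustration, not produced by a parser.

```python
WORDS = ["I", "see", "a", "cat", "with", "a", "telescope"]

# Head index for every word (None marks the root "see").
# The head of the preposition "with" is left open on purpose.
BASE_HEADS = {0: 1, 1: None, 2: 3, 3: 1, 5: 6, 6: 4}
WITH = 4
# "see": the telescope is the instrument; "cat": the cat has the telescope.
CANDIDATE_HEADS = [1, 3]

def readings():
    """Return one complete head assignment per possible attachment of 'with'."""
    parses = []
    for head in CANDIDATE_HEADS:
        parse = dict(BASE_HEADS)
        parse[WITH] = head
        parses.append(parse)
    return parses

for parse in readings():
    print(f"'with a telescope' attaches to '{WORDS[parse[WITH]]}'")
```

Both structures are grammatical, so choosing between them takes knowledge beyond syntax.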
28
29
30
31
32
33
Current state
Working...
34
Conclusions
Is it necessary?
Is it simple?
Is it possible?
Has anything been done?
Has everything been done?
Where are people working on it?
35
Thank you!
www.Gelbukh.com
36