Transcript

Slide 1
Alexander Gelbukh
Moscow, Russia

Slide 2
Mexico

Slide 3
Computing Research Center (CIC), Mexico

Slide 4
Chung-Ang University, Korea
Electronic Commerce and Internet Application Lab

Slide 5
Natural Language Processing
Alexander Gelbukh
www.Gelbukh.com

Slide 6
What language is
[Diagram labels: Text (voice, OCR); Linguistic module; Meaning; Language; Linguistic module; Expert system. Sample output: "This is an example of the output text of the system."]

Slide 7
Better communication with computers
[Figure: a comparison ("vs."); one side shows binary code]
People are more productive when speaking their own language.

Slide 8
Accessibility of computers for all
[Figure: a comparison ("vs.")]
It's easier to teach one computer how to speak than to teach generations of people how to use computers.

Slide 9
Better knowledge management
[Figure: a comparison ("vs.")]
Computers are better than people at managing information.

Slide 10
Solution: Language understanding by computers

Slide 11
Applications
Information retrieval (Internet search, Google)
Question answering (Internet)
Information extraction (fill a DB from newspapers)
Automatic translation
OCR, speech recognition
Natural language interfaces (robots, computers)
Interaction of agents
Thinking computers? Think = speak

Slide 12
Source of language complexity: 1-D
[Diagram labels: Brain 1; Meaning; Language; Text (speech); Meaning; Brain 2. Placeholder text: "This is a text that represents the meaning shown in the right part of the picture."]

Slide 13
Source of language complexity: 1-D
[Diagram labels: Knowledge; Language; Text; Language; Knowledge. Placeholder text: "This is a text that represents the meaning shown in the right part of the picture."]

Slide 14
Linguistic processor translates between representations
[Diagram labels: Texts; Linguistic module; Meanings; Applied system; Linguistic module. Sample output: "This is an example of the output text of the system."]

Slide 15
General scheme of text processing
[Diagram labels: Input; Linguistic processor; (Semantic) representation; Applied system (e.g., Expert system); Output]
The linguistic processor uses linguistic knowledge.
The applied system uses other types of knowledge (e.g., Artificial Intelligence).

Slide 16
Language levels
Morphological: words
Syntactic: sentences
Semantic: meaning
Pragmatic: intention
...?

Slide 17
Fine structure of linguistic processor
[Diagram labels: Text; Language; Meaning; Morphological transformer; Syntactic transformer; Semantic transformer; Surface representation; Morphological representation; Syntactic representation; Semantic representation. Placeholder text: "This is a text that represents the meaning shown in the right part of the picture."]

Slide 18
Example of text
"Science is important for our country. The Government pays it much attention."

Slide 19
Textual representation
Text is a sequence of letters.
[Figure: the individual letters of the example text]

Slide 20
Morphological analysis
[Diagram: Linguistic processor with three stages: Morphological analyzer, Syntactic parser, Semantic analyzer. The morphological stage is highlighted.]

Slide 21
Morphological representation
A sequence of words.
The        THE        article      definite, plural/singular
science    SCIENCE    noun         singular
is         BE         verb         present, 3rd person, singular
important  IMPORTANT  adjective
for        FOR        preposition
our        WE         pronoun      possessive
country    COUNTRY    noun         singular

Slide 22
Syntactic parsing
[Diagram: Linguistic processor with three stages: Morphological analyzer, Syntactic parser, Semantic analyzer. The syntactic stage is highlighted.]

Slide 23
Syntactic representation
A sequence of syntactic trees.
[Figure: two syntactic trees. Nodes of the first: BE, SCIENCE, IMPORTANT, COUNTRY, WE. Nodes of the second: PAY, GOVERNMENT, ATTENTION, IT, MUCH. Edge label: of.]

Slide 24
Semantic analysis
[Diagram: Linguistic processor with three stages: Morphological analyzer, Syntactic parser, Semantic analyzer. The semantic stage is highlighted.]

Slide 25
Semantic representation
Complex structure of whole text
[Figure: semantic network. Nodes: Money, Organization, ATTENTION, Funding, Sector, IMPORTANT, SCIENCE, GOVERNMENT, COUNTRY, WE. Relation labels: main form, is a, gives, needs, implies, is, for, of.]

Slide 26
The meaning
"Science is important for our country. The Government pays it much attention."
"La ciencia es importante para nuestro país. El Gobierno le pone mucha atención."
[Figure: the Spanish counterparts of the earlier representations, overlaid.
Morphological table:
La          LA          article      definite, feminine
ciencia     CIENCIA     noun         feminine, singular
es          SER         verb         present, 3rd person, singular
importante  IMPORTANTE  adjective    singular
para        PARA        preposition
nuestro     NOSOTROS    pronoun      possessive
país        PAIS        noun         masculine, singular
Syntactic tree nodes: SER, CIENCIA, IMPORTANTE, PAIS, NOSOTROS; PONER, GOBIERNO, ATENCION, LE, MUCHA.
Semantic network with Spanish labels: Organización, Dinero, Forma principal, Presupuesto, Sector, ATENCION, IMPORTANTE, CIENCIA, GOBIERNO, PAIS, NOSOTROS; relations: es un, da, implica, necesita, es, para, de.]
There are good conditions for development of science in our country.

Slide 27
Example: Translation
Language A, Language B
Text level
Morphological level
Syntactic level
Semantic level
The Meaning, yet unreachable?

Slide 28
Problems
Ambiguity of text
    I see a cat with a telescope
Knowledge needed
    Linguistic
    About the world and life
Good news
    Learning from texts
    Plenty of texts on the Internet!
    Good statistical methods

Slides 29 to 33
[No text in the transcript]

Slide 34
Current state
Working...

Slide 35
Conclusions
Is it necessary?
Is it simple?
Is it possible?
Has something been done?
Has everything been done?
Where are people working on it?

Slide 36
Thank you!
www.Gelbukh.com
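
Illustration (not part of the original slides): the "Morphological representation" slide describes the output of morphological analysis as a sequence of words, each with a base form, a part of speech, and features. A minimal sketch of that idea follows, assuming a toy dictionary lookup. The names `Morph`, `LEXICON`, and `analyze` are hypothetical, and the lexicon contains only the entries shown on the slide; a real analyzer would also handle inflection and unknown words.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Morph:
    """One morphological reading: base form, part of speech, features."""
    lemma: str
    pos: str
    features: str = ""


# Toy lexicon (an assumption for illustration); entries copied from the
# "Morphological representation" slide and nothing else.
LEXICON = {
    "the": Morph("THE", "article", "definite, plural/singular"),
    "science": Morph("SCIENCE", "noun", "singular"),
    "is": Morph("BE", "verb", "present, 3rd person, singular"),
    "important": Morph("IMPORTANT", "adjective"),
    "for": Morph("FOR", "preposition"),
    "our": Morph("WE", "pronoun", "possessive"),
    "country": Morph("COUNTRY", "noun", "singular"),
}


def analyze(text: str) -> List[Tuple[str, Optional[Morph]]]:
    """Split text on whitespace, strip punctuation, and look each word up.

    Words missing from the toy lexicon are paired with None.
    """
    words = [w.strip(".,!?\"").lower() for w in text.split()]
    return [(w, LEXICON.get(w)) for w in words if w]


if __name__ == "__main__":
    for word, morph in analyze("Science is important for our country."):
        print(word, morph)
```

Applied to the second sentence of the example text, the same lookup returns `None` for "Government", "pays", "it", "much", and "attention", because the slide gives no entries for them.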