CS 671 ICT For Development 19 Sep 2008 Vishal Vachhani


CS 671 ICT For Development, 19th Sep 2008

Vishal Vachhani, CFILT and DIL, IIT Bombay

Agro Explorer

A Meaning Based Multilingual Search Engine

- Web-site for Indian farmers
- Farmers can submit their problems related to their crops
- Queries are answered by agricultural experts at KVK, Baramati
- Languages supported: Marathi, Hindi, English

Why Need Multilingual Search

- Vast amount of information available on the Web
- Almost 70% of the information is in English
- The Indian rural populace is not English literate

“A Big Language Barrier”

- Information has to be made available to them in their local languages.

Why Need Meaning Based Search

- Most current search engines are keyword based.
- They do not consider the semantics of the query.
- The result set contains a large number of extraneous documents.
- Search based on the meaning of the query helps narrow down to the desired information quickly.

[Figure: a query in Hindi goes to the search system, which searches English and Marathi documents and returns the result in Hindi.]

Same Keywords, Different Semantics

- “Moneylenders exploit farmers”: found 1 result
- “Farmers exploit moneylenders”: found 0 results

Provides both
- Meaning Based Search
- Cross-Lingual Information Access

System Architecture


Conclusion

Provides two independent features
- Multi-Linguality
- Meaning Based Search

Because of UNL, the multilingual and meaning-based properties can be incorporated together, rather than by adding separate language translators to the search engine. The scheme lends itself to integrating multiple languages in a seamless, scalable manner.

UNL

Universal Networking Language


[Figure: UNL at the centre, linked to English, Marathi, Hindi, Tamil and French.]

- Direct translation: each language is translated directly into every other language, so N*(N-1) translators are needed for N languages.
- Intermediate language: translation goes through an intermediate language, so only 2*N translators are required.
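To make the two counts concrete, the short Python sketch below evaluates both formulas for a few values of N. The function names are illustrative only and do not come from any UNL tool.

```python
def direct_translators(n: int) -> int:
    """One translator per ordered pair of distinct languages: N*(N-1)."""
    return n * (n - 1)


def interlingua_translators(n: int) -> int:
    """One translator into and one out of the intermediate language per language: 2*N."""
    return 2 * n


if __name__ == "__main__":
    for n in (3, 5, 10):
        print(n, direct_translators(n), interlingua_translators(n))
```

For five languages, such as English, Marathi, Hindi, Tamil and French, this gives 20 direct translators against 10 with an intermediate language. The two schemes need the same number of translators at N = 3, and the intermediate language wins for every larger N.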

- UNL is an acronym for “Universal Networking Language”.
- UNL is a computer language that enables computers to process information and knowledge across language barriers.
- UNL is a language for representing information and knowledge provided by natural languages.
- Unlike natural languages, UNL expressions are unambiguous.

- Although UNL is a language for computers, it has all the components of a natural language.
- It is composed of Universal Words (UWs), Relations, and Attributes.
- Knowledge is represented as a semantic graph:
  ◦ Nodes: concepts
  ◦ Arcs: relations between concepts

- A UW represents a simple or compound concept. There are two classes of UWs:
  ◦ unit concepts
  ◦ compound structures of binary relations grouped together (indicated with Compound UW-IDs)
- A UW is made up of a character string (an English-language word) followed by a list of constraints:
  ◦ <UW> ::= <head word>[<constraint list>]
- Examples:
  ◦ state(icl>express)
  ◦ state(icl>country)

- A relation label is a string of 3 characters or less.
- The relations between UWs are binary:
  ◦ rel (UW1, UW2)
- Relations have different labels according to the roles they play.
- At present, there are 46 relations in UNL.
- For example: agt (agent), ins (instrument), pur (purpose), etc.

- Attribute labels express additional information about the Universal Words that appear in a sentence.
  ◦ They show what is said from the speaker’s point of view, that is, how the speaker views what is said (time, reference, emphasis, attitude, etc.).
  ◦ Examples: @entry, @present, @progressive, @topic, etc.

Example: Ram eats rice.

{unl}
agt(eat.@entry.@present, Ram)
obj(eat.@entry.@present, rice(icl>eatable))
{/unl}
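As a rough illustration of how such an expression maps to a graph, the Python sketch below reads the {unl} block above into (relation, UW1, UW2) triples and strips the attributes. The names Relation, parse_unl and strip_attributes are made up for this example and are not part of any UNL toolkit. Scope suffixes such as :01 are simply kept as part of the label.

```python
import re
from typing import List, NamedTuple, Tuple

EXAMPLE = """{unl}
agt(eat.@entry.@present, Ram)
obj(eat.@entry.@present, rice(icl>eatable))
{/unl}"""


class Relation(NamedTuple):
    label: str   # relation label, e.g. "agt"
    uw1: str     # first UW, attributes removed
    uw2: str     # second UW, attributes removed


def split_top_level(args: str) -> Tuple[str, str]:
    """Split 'a, b' on the first comma that is not nested inside parentheses."""
    depth = 0
    for i, ch in enumerate(args):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return args[:i].strip(), args[i + 1:].strip()
    raise ValueError(f"expected two arguments in {args!r}")


def strip_attributes(uw: str) -> str:
    """Remove attribute labels such as .@entry or .@present from a UW."""
    return re.sub(r"\.@\w+", "", uw)


def parse_unl(text: str) -> List[Relation]:
    relations = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line in ("{unl}", "{/unl}"):
            continue
        if "(" not in line or not line.endswith(")"):
            raise ValueError(f"not a binary relation: {line!r}")
        label, _, rest = line.partition("(")
        uw1, uw2 = split_top_level(rest[:-1])  # drop the closing ")"
        relations.append(Relation(label, strip_attributes(uw1), strip_attributes(uw2)))
    return relations


if __name__ == "__main__":
    for rel in parse_unl(EXAMPLE):
        print(rel)
```

Each triple is one arc of the semantic graph: the relation label names the arc, and the two UWs are the nodes it connects.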

[Figure: UNL graph for “Ram eats rice”, with eat as the entry node linked to Ram and to rice.]

Example: The boy who works here went to school.

{unl}
agt(go(icl>move).@entry.@past, :01)
plt(go(icl>occur).@entry.@past, school(icl>institution))
agt:01(work(icl>do), boy(icl>person).@entry)
plc:01(work(icl>do), here)
{/unl}

[Figure: UNL graph. go is linked to scope :01 by agt and to school by plt; inside :01, work is linked to boy by agt and to here by plc.]

Source language -> EnConverter -> Intermediate language -> DeConverter -> Target language

- The DeConverter is a language-independent generator.
- It can deconvert UNL expressions into a variety of native languages, using linguistic data for each language such as a word dictionary and grammatical rules.
- The DeConverter transforms a sentence represented by a UNL expression into a natural-language sentence.

[Figure: DeConverter architecture. UNL Doc -> UNL Parser -> Case Marking Module -> Morphology Module -> Syntax Planning Module -> Hindi Doc. Resources: Dictionary, Case Marking Rules, Morphology Rules, Syntax Planning Rules. The legend distinguishes language-dependent and language-independent modules.]

The UNL parser module performs the following tasks:

- Check the input format of the UNL document
- Separate attributes from UWs
- Separate attributes from dictionary entries
- Replace UWs with Hindi root words
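The last three tasks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary contents and its "root;ATTR" entry format are invented for the example, and the input format check is left out. Only the task list itself comes from the slide.

```python
from typing import Dict, List, Tuple

# Hypothetical dictionary: UW -> "Hindi root;lexical attributes".
# Both the entries and the ';' / ',' format are assumptions made for this sketch.
TOY_DICTIONARY: Dict[str, str] = {
    "eat": "खा;V",
    "rice(icl>eatable)": "चावल;N",
    "Ram": "राम;N",
}


def split_uw(token: str) -> Tuple[str, List[str]]:
    """Separate a UW from its UNL attributes, e.g. 'eat.@entry.@present'."""
    uw, *attrs = token.split(".@")
    return uw, ["@" + a for a in attrs]


def lookup(uw: str) -> Tuple[str, List[str]]:
    """Separate the Hindi root from the lexical attributes of a dictionary entry."""
    root, _, lexical = TOY_DICTIONARY[uw].partition(";")
    return root, [a for a in lexical.split(",") if a]


def parse_node(token: str) -> dict:
    """Replace the UW by its Hindi root and keep both kinds of attributes."""
    uw, unl_attrs = split_uw(token)
    root, lex_attrs = lookup(uw)
    return {"uw": uw, "root": root, "unl_attrs": unl_attrs, "lex_attrs": lex_attrs}


if __name__ == "__main__":
    print(parse_node("eat.@entry.@present"))
    print(parse_node("rice(icl>eatable)"))
```

Keeping the UNL attributes and the lexical attributes side by side matters because the later modules (case marking, morphology) consult both.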

   ◦ ◦

Case is the category of morpho-syntactic properties that distinguish the various relations a noun phrase may bear to a governing head.

Hindi case markers: ने, पर, के, से, पे, etc.

The rule base is built on:
- UNL attributes
- lexical attributes from the dictionary

- Case marking is implemented using rules.
- We analyze all UNL attributes as well as dictionary attributes to decide the next and previous case markers.
- We also use the relation with the parent to select the right case marker.

Example rule:

agt:null:null:null:ने:@past#V:VINT:N:null

Structure

relName : parent previous case marker : parent next case marker : child previous case marker : child next case marker : four further fields

The remaining four fields have the form attr'REL'relationname. Attributes are separated by #, and relation names are also separated by #.
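A rule in this colon-separated format can be read with a few lines of Python. The sketch below is an assumption-laden illustration, not the actual rule engine: the class and field names are invented, the four trailing fields are kept as plain '#'-separated lists, and the 'REL' part of those fields is not interpreted because the slide does not spell out its exact syntax.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CaseRule:
    relation: str
    parent_prev: Optional[str]     # parent previous case marker
    parent_next: Optional[str]     # parent next case marker
    child_prev: Optional[str]      # child previous case marker
    child_next: Optional[str]      # child next case marker
    conditions: List[List[str]]    # the four remaining fields, each split on '#'


def _marker(field: str) -> Optional[str]:
    """Map the literal 'null' to None, otherwise return the trimmed marker."""
    field = field.strip()
    return None if field == "null" else field


def parse_rule(line: str) -> CaseRule:
    fields = [f.strip() for f in line.split(":")]
    if len(fields) != 9:
        raise ValueError(f"expected 9 colon-separated fields, got {len(fields)}")
    conditions = [[] if f == "null" else f.split("#") for f in fields[5:]]
    return CaseRule(fields[0], *(_marker(f) for f in fields[1:5]), conditions)


if __name__ == "__main__":
    print(parse_rule("agt:null:null:null: ने :@past#V:VINT:N:null"))
```

For the example rule, the only marker that is set is the child's next case marker, ने, attached for the agt relation.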

What is Morphology

- Study of morphemes
- Their formation into words, including inflection, derivation and composition

- Noun, Verb and Adjective Morphology
  ◦ Depends on the phonetic properties of the Hindi word
- Noun Morphology
  ◦ Depends on gender, number and vowel ending of the noun
- Adjective Morphology
  ◦ अच्छा लडका, अच्छी लडकी, अच्छे लडके
  ◦ The adjective stem अच्छ changes; lexical attribute “AdjA”
- Verb Morphology
  ◦ Depends upon tense, gender, number, person, etc.
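The adjective case above can be illustrated with a tiny Python sketch. It assumes that the lexical attribute AdjA marks adjectives whose stem takes the endings ा, ी and े, and that adjectives without it stay unchanged; both points are assumptions, since the slide only shows the three forms. Only the direct (non-oblique) forms are covered, and the function name is invented for the example.

```python
from typing import Set

# Endings for inflecting adjectives in the direct case (assumed mapping).
ENDINGS = {
    ("male", "sg"): "ा",
    ("male", "pl"): "े",
    ("female", "sg"): "ी",
    ("female", "pl"): "ी",
}


def inflect_adjective(stem: str, lexical_attrs: Set[str], gender: str, number: str) -> str:
    """Attach the gender/number ending to an adjective stem marked AdjA."""
    if "AdjA" not in lexical_attrs:
        return stem  # assumed: adjectives without AdjA do not inflect
    return stem + ENDINGS[(gender, number)]


if __name__ == "__main__":
    for gender, number in (("male", "sg"), ("female", "sg"), ("male", "pl")):
        print(inflect_adjective("अच्छ", {"AdjA"}, gender, number))
```

The gender and number come from the noun the adjective modifies, which is why the same stem surfaces differently next to लडका, लडकी and लडके.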

- Verbs are categorized by
  ◦ Tense (past, present, future)
  ◦ Gender (male, female)
  ◦ Person (1st, 2nd, 3rd)
  ◦ Number (sg, pl)
- Example
  ◦ Ladaka khana kha raha hai.
  ◦ It is in the present continuous tense, male, singular, and 3rd person.

- Arranging words according to the language structure
- Rule-based module
- Priority-based graph traversal

Algorithm for Syntax Planning:

1) Start traversing the UNL graph from the entry node.

2) If a node has no children, add it to the final string.

3) If a node has more than one child, sort the children by the priority of their relations. The relation with the highest priority is traversed first.

4) Mark the node as visited.

5) Repeat steps 3 and 4 until all the children of that node have been visited.

6) Once all the children of a node have been visited, add that node to the final string.

7) Repeat steps 2 to 6 until all the nodes have been traversed.

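Example: Also, spray 5% Neemark solution.

[Figure: UNL graph. spray is linked to solution by obj and to also by man; solution is linked to percent and to Neemark by mod; percent is linked to 5 by qua.]

Relation priorities: obj:17, man:9, mod:5, qua:5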

[Figure sequence: step-by-step traversal of the example graph. Starting at the entry node spray, the obj child (priority 17) is expanded before the man child (priority 9); solution is expanded into its two mod children, and percent into its qua child 5. Each node is written to the output once all of its children have been visited.]

Output after each step:

5
5 percent
5 percent Neemark
5 percent Neemark solution
5 percent Neemark solution also
5 percent Neemark solution also spray

Output:

5 percent Neemark solution also spray

5 प्रतिशत नीमअर्क घोल भी छिड़क् |
5 प्रतिशत नीमअर्क घोल भी छिड़को |
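The traversal walked through above can be sketched in Python as a priority-ordered, children-first (post-order) walk. The graph and the relation priorities are the ones from the example. The function and variable names are invented for the sketch, and the tie between the two mod children is resolved by input order, which is an assumption because the slides do not say how ties are broken.

```python
from typing import Dict, List, Set, Tuple

# Relation priorities from the example.
PRIORITY: Dict[str, int] = {"obj": 17, "man": 9, "mod": 5, "qua": 5}

# parent -> list of (relation, child)
Graph = Dict[str, List[Tuple[str, str]]]

GRAPH: Graph = {
    "spray": [("man", "also"), ("obj", "solution")],
    "solution": [("mod", "percent"), ("mod", "Neemark")],
    "percent": [("qua", "5")],
}


def plan_syntax(graph: Graph, entry: str) -> List[str]:
    """Return the nodes in output order: children by priority, then the node itself."""
    output: List[str] = []
    visited: Set[str] = set()

    def visit(node: str) -> None:
        if node in visited:
            return
        visited.add(node)
        # Highest-priority relation first; sorted() is stable, so ties keep input order.
        children = sorted(graph.get(node, []), key=lambda rc: PRIORITY[rc[0]], reverse=True)
        for _, child in children:
            visit(child)
        output.append(node)  # emitted only after all children

    visit(entry)
    return output


if __name__ == "__main__":
    print(" ".join(plan_syntax(GRAPH, "spray")))
```

Emitting a node after its children is what places the verb last, matching the verb-final word order of the Hindi output.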

Input sentence: Its roots are affected by bacterial infection.

Module            Output
Input             Its roots are affected by bacterial infection.
UNL parser        जड़् प्रभावित जीवाण्विक संक्रमण्
Case marking      जड़् प्रभावित जीवाण्विक संक्रमण् से
Morphology        इसकी जड़ें जीवाण्विक प्रभावित होती हैं संक्रमण से |
Syntax Planning   जीवाण्विक संक्रमण से इसकी जड़ें प्रभावित होती हैं |

Output: जीवाण्विक संक्रमण से इसकी जड़ें प्रभावित होती हैं |

- UNL 2005 Specifications: http://www.undl.org/unlsys/unl/unl2005/
- S. Singh, M. Dalal, V. Vachhani, P. Bhattacharyya and O. Damani, “Hindi generation from interlingua”, MT Summit 2007 (www.cse.iitb.ac.in/~vishalv)
- Mrugank Surve, Sarvjeet Singh, Satish Kagathara, Venkatasivaramasastry K, Sunil Dubey, Gajanan Rane, Jaya Saraswati, Salil Badodekar, Akshay Iyer, Ashish Almeida, Roopali Nikam, Carolina Gallardo Perez, Pushpak Bhattacharyya, AgroExplorer Group, “AgroExplorer: a Meaning Based Multilingual Search Engine”, International Conference on Digital Libraries (ICDL), New Delhi, India, Feb 2004.
- Agro Explorer: http://agro.mlasia.iitb.ac.in
- aAQUA: http://www.aaqua.org