DICTIONARY OF TOPONYMS IN SERBIAN
Gordana Pavlović-Lažetić (1), Cvetana Krstev (2), Duško Vitas (1)
(1) Faculty of Mathematics, (2) Faculty of Philology
University of Belgrade, Serbia
DICTIONARY OF TOPONYMS: OVERVIEW
• AIMS
• SOURCES & CONTENTS
• STRUCTURE & FORMAT
• TAGS
• SPECIFICS
• EXAMPLES: daily newspapers
• FURTHER DEVELOPMENT

6th Intex Workshop, 28-30 May 2003
DICTIONARY OF TOPONYMS: AIMS
• Daily newspaper analysis: Intex
• Web browsers
• Unknown words: proper names (personal names and toponyms)
• Prolintex: Maurel, Piton
• Encyclopedic dictionary with morphological tags (Intex; experimental)
• Delas-top: toponyms, hydronyms, oronyms; size ~2600; simple words
DICTIONARY OF TOPONYMS: SOURCES
• YU / foreign geography
• Geographic atlas used in education in Serbia
• Official statistics register of inhabited places in former Yugoslavia
– Belgrade, Serbia, Montenegro
– Bosnia, Macedonia, Croatia, Slovenia
– South / Central / West / North Europe
– Near East, Asia, Africa
– North America (USA & Canada), South America
– Australia, Pacific islands
– Arctic & Antarctic, seas & oceans
– Earth, continents, cosmos, Solar System
DICTIONARY OF TOPONYMS: CHOICE
• Country
• Official languages
• Capital city
• Administrative divisions of common importance (e.g., US states)
• Cities (more than 10000 / 50000 / 100000 inhabitants)
• Hydronyms (rivers, lakes, swamps; associated with the country where the mouth lies)
• Mountains, volcanoes, etc., if of common importance (e.g., for newspaper text)
DICTIONARY OF TOPONYMS: CONTENTS
• Proper names
• Relational adjectives (-ski, -sxki, -cyki)
• Names of inhabitants, including pejoratives (if any exist; e.g., sxvaba, sxiptar)
• Possessive adjectives (-ov/-ev; -in)
• Examples:
Pariz,N - pariski,A+Rel
Parizxanin,Nm - Parizxaninov,A+Poss
Parizxanka,Nf - Parizxankin,A+Poss
parizxanski,A+Rel
DICTIONARY OF TOPONYMS: CONTENTS (cont.)
• YU geography: local official names
• Foreign: exonyms (rarely official / traditional)
• E.g., Becy, Rim, Solun, Bitolx, Skoplxe, Prag
• Transcription (adjusted): Cyrillic orthography (e.g., Bolonxa)
• Orthography transcription rules
• Phonetic / adjusted to writing (e.g., Cykago, Cyikagou) / declension
• Tradition (spontaneous: e.g., Peking)
• Latin: diacritic (Beč) / rarely transliteration
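The examples on these slides are written in an ASCII convention in which a two-letter sequence stands for one Serbian letter (Becy for Beč, Skoplxe for Skoplje). A minimal sketch of a decoder follows. The mapping table is inferred from the slide examples only, so it covers just the digraphs attested here; the function name `decode` is hypothetical and not part of Intex.

```python
# Digraphs attested in the slide examples, mapped to Serbian Latin letters.
# Assumption: the table is inferred from the examples, not taken from the
# authors' tools, and may be incomplete.
DIGRAPHS = {
    "sx": "š",
    "cy": "č",
    "zx": "ž",
    "dx": "đ",
    "lx": "lj",
    "nx": "nj",
}


def decode(word: str) -> str:
    """Replace each ASCII digraph with its Serbian Latin letter."""
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        repl = DIGRAPHS.get(pair.lower())
        if repl is not None:
            # Keep the capitalisation of the first letter of the digraph.
            if pair[0].isupper():
                repl = repl.capitalize()
            out.append(repl)
            i += 2
        else:
            out.append(word[i])
            i += 1
    return "".join(out)


print(decode("Becy"), decode("Skoplxe"), decode("Parizxanin"))
```

Plain letters pass through unchanged, so a form such as Rim or Prag is returned as is.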

DICTIONARY OF TOPONYMS: STRUCTURE & FORMAT
• Delas-top entry: syntactic & semantic attributes
• Syntactic: inflectional classes for simple words (Delas for Serbian)
• A part of the DELAF-top dictionary has been generated
• Semantics: Prolintex
DICTIONARY OF TOPONYMS: TAGS (codes)
• Der – derivative: Beogradxanka,.Nfs+Der(Beograd)
• Top – toponym, place: Beograd,.Nms+Top
• Hyd – hydronym: dunavski,.A+Rel+Hyd
• Oro – oronym: Gocy,.Nms+Oro
• Hum – human beings: Valxevka,.Nfs+Hum
• IsoXX – ISO country code for toponyms: Beograd,.Nms+Top+IsoYU
• LngXX – language code: grcyki,.A+LngEL
• PR – proper name: Beograd,.Nms+Top+PR
• PG – pejorative name: Sxvabica,.Nfs+PG
DICTIONARY OF TOPONYMS: TAGS
• PAut – Autonomous region: Vojvodina,.Nfs+Top+PAut+IsoYU
• PCen – Regional center: Nisx,.Nms+PCen
• PDgr – Part of a city or place: Autokomanda,.Nfs+PDgr
• PDrz – Country: Francuska,.NAfs+PDrz
• PGr1-PGr4, PGgr: Sofija,.Nfs+Top+PGgr+PGr4+IsoBG
• POps – Township: Cyukarica,.Nfs+Top+POps+IsoYU
• PKon – Continent: Evropa,.Nfs+PKon
• POst – Island: Elba,.Nfs+PR+Top+POst+IsoIT
• PPla – Mountain: Rila,.Nfs+PR+Oro+PPla+IsoBG
• PReg – Region: Metohija,.Nfs+Top+PReg+IsoYU
• PRgr – Regional capital city: Prisxtina,.Nfs+Top+PRgr+IsoYU
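The entries listed on the tag slides share one shape: a form, a comma, an optional lemma, a dot, then a grammatical code followed by `+`-separated tags. A rough sketch of a parser for that shape is given below. It is an illustration only: `Entry`, `parse_entry`, `iso_country` and `derived_from` are hypothetical names, and reading an empty lemma as "same as the form" is an assumption about the convention rather than something the slides state.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    form: str
    lemma: str
    code: str    # grammatical code, e.g. "Nfs"
    tags: list   # remaining tags, e.g. ["Top", "IsoYU"]


def parse_entry(line: str) -> Entry:
    """Split 'form,lemma.code+tag+tag' into its parts."""
    form, _, rest = line.partition(",")
    lemma, _, info = rest.partition(".")
    form = form.strip()
    # Assumption: an empty lemma means the lemma equals the form.
    lemma = lemma.strip() or form
    parts = [p.strip() for p in info.split("+") if p.strip()]
    return Entry(form, lemma, parts[0], parts[1:])


def iso_country(entry: Entry) -> Optional[str]:
    """Return the two-letter code carried by an IsoXX tag, if any."""
    for tag in entry.tags:
        if re.fullmatch(r"Iso[A-Z]{2}", tag):
            return tag[3:]
    return None


def derived_from(entry: Entry) -> Optional[str]:
    """Return the base word named in a Der(...) tag, if any."""
    for tag in entry.tags:
        match = re.fullmatch(r"Der\((.+)\)", tag)
        if match:
            return match.group(1)
    return None


sofija = parse_entry("Sofija, .Nfs+Top+PGgr+PGr4+IsoBG")
print(sofija.code, sofija.tags, iso_country(sofija))
```

The parser tolerates the stray spaces around `,` and `+` that appear in some of the slide examples.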
DICTIONARY OF TOPONYMS: SPECIFICS: nouns
• Fem.: Top, Hyd, Oro, Hum (inhab.)
• Size
– Total: 911
– YU: 146 (16%)
– Foreign: 765 (84%)
• Types
– Tops: 364 (40%)
– Hyds: 131 (14%)
– Oros: 33 (4%)
– Fem. inhab.: 383 (42%)
DICTIONARY OF TOPONYMS: SPECIFICS: nouns (cont.)
• Inhabitants (fem.):
a) -ka, 361 (94%), same syntactic class, e.g., Beogradxanka, Bugarka, Parizxanka, Kanadxanka
b) -ca, 7 (2%), different syntactic class, e.g., Sxvabica, Sremica
c) -nxa, 15 (4%), same syntactic class, e.g., Grkinxa, Polxakinxa, Turkinxa, Francuskinxa, Renkinxa
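The three-way split of feminine inhabitant names can be illustrated by a check on the word ending. This is only a sketch over the examples quoted on the slide, in the same ASCII convention; the function name `feminine_suffix` is hypothetical, and a pure ending test is an assumption that would not separate inhabitant names from other nouns with the same endings.

```python
from collections import Counter

# Examples quoted on the slide, in the slide's ASCII convention.
EXAMPLES = [
    "Beogradxanka", "Bugarka", "Parizxanka", "Kanadxanka",
    "Sxvabica", "Sremica",
    "Grkinxa", "Polxakinxa", "Turkinxa", "Francuskinxa", "Renkinxa",
]


def feminine_suffix(name: str) -> str:
    """Classify a feminine inhabitant name by its ending."""
    for suffix in ("nxa", "ka", "ca"):
        if name.endswith(suffix):
            return "-" + suffix
    return "other"


counts = Counter(feminine_suffix(name) for name in EXAMPLES)
print(counts)
```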
DICTIONARY OF TOPONYMS: SPECIFICS: adjectives
• Possessive, relational
• Size
– Total: 1673
– YU: 268 (16%)
– Foreign: 1405 (84%)
• Types
– Rel: 1002 (60%)
– Poss: 671 (40%)
DICTIONARY OF TOPONYMS: SPECIFICS: adjectives (cont.)
• Relational:
– -ski, -sxki, -cyki
– e.g., beogradski, prasxki, becyki
– orthography: lowercase initial letter
• Possessive:
– -in, e.g., Beogradxankin (f.), Becylijin (m.)
– -ov, -ev (m.), e.g., Beogradxaninov, Prisxtincyev
DICTIONARY OF TOPONYMS: SPECIFICS: homonymy
• Mostly nouns, same synt. cl., e.g.,
Barselona,.Nfs+Top+PGr2+IsoVE
Barselona,.Nfs+Top+PGr4+IsoES
Alabama,.Nfs+Top+PFed+IsoUS
Alabama,.Nfs+Hyd+IsoUS
Drenica,.Nfs+Top+PReg+IsoYU
Drenica,.Nfs+Hyd+IsoYU
• Some adjectives, e.g.,
kolumbijski,.A+Rel+Top+IsoUS
kolumbijski,.A+Rel+Top+PDrz+IsoCO
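Homonymous entries of this kind can be found mechanically by grouping entries on the form and keeping forms with more than one distinct analysis. The sketch below uses the noun entries from the slide plus one non-homonymous entry (Beograd) for contrast. `find_homonyms` is a hypothetical helper, not an Intex function.

```python
from collections import defaultdict

ENTRIES = [
    "Barselona,.Nfs+Top+PGr2+IsoVE",
    "Barselona,.Nfs+Top+PGr4+IsoES",
    "Alabama,.Nfs+Top+PFed+IsoUS",
    "Alabama,.Nfs+Hyd+IsoUS",
    "Drenica,.Nfs+Top+PReg+IsoYU",
    "Drenica,.Nfs+Hyd+IsoYU",
    "Beograd,.Nms+Top+IsoYU",
]


def find_homonyms(lines):
    """Map each form that has several distinct analyses to those analyses."""
    by_form = defaultdict(set)
    for line in lines:
        form, _, analysis = line.partition(",")
        # Normalise stray spaces and drop the leading dot before comparing.
        by_form[form.strip()].add(analysis.replace(" ", "").lstrip("."))
    return {
        form: sorted(analyses)
        for form, analyses in by_form.items()
        if len(analyses) > 1
    }


homonyms = find_homonyms(ENTRIES)
for form, analyses in homonyms.items():
    print(form, analyses)
```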
DICTIONARY OF TOPONYMS: EXAMPLES: “Politika”
• Bilateral relationships in former YU
• Bilateral relationships in former YU + AL
• Officials in former YU + US + GB
• Officials + bilateral officials
DICTIONARY OF TOPONYMS: FURTHER DEVELOPMENT
• Further development of DELAS-TOP (dictionary by continents)
• Description of compounds (DELAFC-top), e.g., Novi Sad
• DB system
• Local grammars describing relationships between toponyms / grouping, e.g., Jugoslavija: ex-YU + SRJ + SCG + ... Balkan part (+ normalization - body)
• Tag reduction (e.g., PGr1, PGr2, PGr3, PGr4, PVgr, PGgr -> PGrad): Prolintex2
• Information extraction
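The proposed tag reduction can be pictured as a rewrite of the tag string in which every city tag named on the slide collapses into PGrad. The following is a sketch of that idea only. `reduce_tags` is a hypothetical name, and dropping repeated tags after the rewrite is an assumption about the intended result.

```python
# City tags named on the slide as candidates for merging into PGrad.
CITY_TAGS = {"PGr1", "PGr2", "PGr3", "PGr4", "PVgr", "PGgr"}


def reduce_tags(analysis: str) -> str:
    """Rewrite city tags to PGrad and drop any tag that then repeats."""
    reduced = []
    for tag in analysis.split("+"):
        tag = "PGrad" if tag in CITY_TAGS else tag
        # Assumption: a tag should appear only once after the rewrite.
        if tag not in reduced:
            reduced.append(tag)
    return "+".join(reduced)


print(reduce_tags("Nfs+Top+PGgr+PGr4+IsoBG"))
```

Applied to the Sofija entry, the two city tags PGgr and PGr4 become a single PGrad, while Top and IsoBG are left untouched.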