INTERNATIONAL CONFERENCE
“COGNITIVE MODELING IN LINGUISTICS”
CML-2010, Dubrovnik (Croatia)
ANALYZING THE LOCALIZATION
OF LANGUAGE FEATURES WITH
COMPLEX SYSTEMS TOOLS AND
PREDICTING LANGUAGE VITALITY
Samuel Omlin
University of Lausanne, Switzerland
([email protected])
Romansh – an endangered language
“Allegra, miu num ei
Alfons Camiu. Jeu
vivel en la biala
Swizzera. Per discletg
sundel jeu in dils
davos 35'000 che
discuoren aunc bein
romontsch. Denton ei
il prighel fetg gronds,
che quei bi lungatg
sto murir.”
Romansh – an endangered language
“Hello. My name is
Alfons Camiu. I live
in Switzerland. I am
one of the 35'000
people who speak
Romansh as their
native language.
Unfortunately, the
language of my
people is in danger
of dying out.”
Half of the languages are endangered
Language competition
Business doctrine: location, location, location
Can this doctrine be applied to the survival of
languages?
Literature study
The role played by the geographic situation of a language in its ultimate survival, and in particular the role played by the linguistic structure of the languages neighboring it, is still unclear.
Literature study
Inevitable extinction of minority languages in
competition with stronger ones or possibility for
stable coexistence under certain circumstances?
UNESCO: assessing language vitality
None of the criteria directly involves geography.
Focus
Relation between the vitality of a minority language and the linguistic structure of the languages neighboring it?
Method
Adaptation of a mathematical method that originated in economics and has been used with empirical success to identify optimal locations for retail stores
Presentation Outline
Modeling and sample
Identifying optimal business locations
Measuring the spatial distribution of linguistic
features
Predicting language vitality
Results and conclusions
Modeling and sample
Sample summary
• 105 living languages
• 186 linguistic communities in Eurasia with
independent vitality
• 31 of these linguistic communities have an associated vitality grade
Identifying optimal business locations
M index
Quantifies the geographic aggregation and dispersion tendencies of pairs of categories of stores
Imaginary city: 16 stores
[Figure: map of the imaginary city. Legend: butcher, bakery, other store]
Do butcher stores “attract” bakeries?
Step 1: definition of neighborhood
Draw a disk of radius r (100 m) around each store (s) of category A.
[Figure: city map with the disks drawn. Legend: butcher (A), bakery (B), other store]
Step 2
Pick a store (s) of category A: s1.
Step 3
Count the total number of stores in its neighborhood, n(s):
n(s1) = 3
Step 4
…count the number of B stores in its neighborhood, nB(s):
nB(s1) = 2
Step 5
…compute the local concentration of B stores in its neighborhood:
nB(s1) / n(s1) = 2/3
Step 6
Then, count the total number of stores in the entire city, N:
N = 16
Step 7
…count the number of B stores in the entire city, NB:
NB = 4
Step 8
…compute the overall concentration of B stores in the entire city:
NB / N = 4/16 = 1/4
Step 9
Compare the local concentration of B stores with its overall concentration:
[nB(s1) / n(s1)] / [NB / N] = (2/3) / (1/4) = 8/3
Step 10
Compute this ratio also for all the other A stores in the city.
n(s2) = 6
nB(s2) = 2
nB(s2) / n(s2) = 1/3
[nB(s2) / n(s2)] / [NB / N] = (1/3) / (1/4) = 4/3
Step 11
Finally, compute the average of this ratio over all A stores in the city:
MAB = (1/NA) Σ_{s in A} [nB(s) / n(s)] / [NB / N], where NA is the number of A stores in the city.
For our example (A: butcher; B: bakery): MAB = 2
Imaginary city: answer
Do butcher stores (A) “attract” bakeries (B)?
MAB = 2: next to the butcher stores, the local concentration of bakeries is on average twice the overall concentration.
=> Butcher stores tend to “attract” bakeries.
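The eleven steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `m_index`, the coordinates, and the choice to exclude a store from its own neighborhood are all hypothetical. The layout is constructed only so that the counts match the slides (N = 16, NB = 4, n(s1) = 3, nB(s1) = 2, n(s2) = 6, nB(s2) = 2), and it assumes that s1 and s2 are the only two butcher stores, which is consistent with MAB = 2.

```python
from math import hypot


def m_index(stores, cat_a, cat_b, r):
    """Average over all A stores of (local share of B) / (overall share of B).

    stores: list of (x, y, category) tuples.
    Assumption: the neighborhood of a store is every OTHER store within
    distance r; the slides do not say whether the store itself is counted.
    """
    total = len(stores)
    total_b = sum(1 for _, _, cat in stores if cat == cat_b)
    if total_b == 0:
        raise ValueError("no store of category B in the city")
    overall = total_b / total

    ratios = []
    for i, (xa, ya, cat) in enumerate(stores):
        if cat != cat_a:
            continue
        neighbors = [c for j, (x, y, c) in enumerate(stores)
                     if j != i and hypot(x - xa, y - ya) <= r]
        if not neighbors:
            continue  # an isolated A store has no defined local concentration
        local = sum(1 for c in neighbors if c == cat_b) / len(neighbors)
        ratios.append(local / overall)
    if not ratios:
        raise ValueError("no A store with a non-empty neighborhood")
    return sum(ratios) / len(ratios)


# Hypothetical coordinates in meters, chosen to reproduce the slide counts.
stores = [
    # around s1: 3 neighbors, 2 of them bakeries
    (0, 0, "butcher"), (50, 0, "bakery"), (0, 50, "bakery"), (-50, 0, "other"),
    # around s2: 6 neighbors, 2 of them bakeries
    (1000, 0, "butcher"), (1050, 0, "bakery"), (1000, 50, "bakery"),
    (950, 0, "other"), (1000, -50, "other"), (1030, 30, "other"),
    (970, -30, "other"),
    # five stores far away from both butchers
    (500, 500, "other"), (500, 600, "other"), (500, 700, "other"),
    (500, 800, "other"), (500, 900, "other"),
]

m_ab = m_index(stores, "butcher", "bakery", r=100)
print(round(m_ab, 6))  # average of 8/3 and 4/3
```

With this layout the two per-store ratios are 8/3 and 4/3, so their average reproduces the value on the answer slide.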
M index interpretation
Under the hypothesis of pure randomness, E[MAB] = 1 for all r > 0.
=> MAB quantifies deviations from purely random configurations:
MAB > 1: A tends to “attract” B
MAB < 1: A tends to “repulse” B
Location quality
Location “quality” index for a commercial activity A at a point (x,y): essentially represents the sum of all quantified attraction and repulsion tendencies from the stores in the point’s neighborhood
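A rough sketch of how such a sum could be computed is given here. The slide does not spell out the formula, so this is only one possible reading: each store near the point contributes log(M) for its category, which is positive for attraction (M > 1) and negative for repulsion (M < 1). The use of the logarithm, the function name `location_quality`, and all M values are assumptions made for illustration.

```python
from math import hypot, log


def location_quality(point, stores, m_values, r):
    """Sum of signed tendencies from the stores within distance r of `point`.

    stores:   list of (x, y, category) tuples.
    m_values: hypothetical dict mapping a category B to M_AB for the
              activity A being placed. Every value must be > 0,
              otherwise log() raises ValueError.
    """
    x0, y0 = point
    quality = 0.0
    for x, y, category in stores:
        if hypot(x - x0, y - y0) <= r and category in m_values:
            quality += log(m_values[category])  # > 0 attracts, < 0 repulses
    return quality


# Made-up values: bakeries attract activity A, banks repulse it.
m_values = {"bakery": 2.0, "bank": 0.5}
stores = [(0, 0, "bakery"), (10, 0, "bank"), (500, 0, "bakery")]

mixed = location_quality((0, 0), stores, m_values, r=100)    # bakery and bank nearby
good = location_quality((500, 0), stores, m_values, r=100)   # only a bakery nearby
print(mixed < good)
```

A point surrounded by attracting categories scores higher than one where attraction and repulsion cancel out, which is the behavior the slide describes.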
Measuring the spatial distribution of linguistic features
Adapted M index
Quantifies tendencies of typological language features to aggregate or disperse
Adapted M index
A: “2.5.3.SIMPLE SENTENCE -> marginal constructions -> Affective”
B: “2.1.4.SYLLABLE -> the element following the vowel -> not more than one consonant”
Neighborhood of a linguistic community
Defined as: the set of communities overlapping its area,
enlarged by a buffer of size r (1 degree ≈ 110 km)
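As a simplified sketch of this definition, the areas can be approximated by bounding boxes in degrees. Real community areas are polygons and would normally be handled by GIS software; the box approximation, the function names, the community names, and the choice to leave a community out of its own neighborhood are all assumptions made for illustration.

```python
def boxes_overlap(a, b, r=0.0):
    """True if box b overlaps box a after a is enlarged by r on every side.

    Boxes are (min_lon, min_lat, max_lon, max_lat) in degrees.
    """
    a_min_lon, a_min_lat, a_max_lon, a_max_lat = a
    b_min_lon, b_min_lat, b_max_lon, b_max_lat = b
    return (a_min_lon - r <= b_max_lon and b_min_lon <= a_max_lon + r
            and a_min_lat - r <= b_max_lat and b_min_lat <= a_max_lat + r)


def neighborhood(name, areas, r):
    """Names of all other communities overlapping the buffered area of `name`."""
    return {other for other, box in areas.items()
            if other != name and boxes_overlap(areas[name], box, r)}


# Hypothetical areas: Y lies half a degree east of X, Z is far away.
areas = {
    "X": (0.0, 0.0, 1.0, 1.0),
    "Y": (1.5, 0.0, 2.5, 1.0),
    "Z": (5.0, 5.0, 6.0, 6.0),
}

print(neighborhood("X", areas, r=1.0))  # Y falls inside the 1-degree buffer
print(neighborhood("X", areas, r=0.0))  # without a buffer nothing overlaps
```

The buffer size r plays the same role as the disk radius in the store example: it decides how far away a community can be and still count as a neighbor.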
Particularity
The concentration of a feature is determined by adding up numbers of speakers rather than simply counting communities
Example
Does feature A “attract” feature B?
Example: answer
Does feature A “attract” feature B?
MAB ≈ 0.001: next to speakers manifesting feature A, the
local concentration of speakers using feature B is on average
about a thousandth of the overall concentration.
=> Feature A tends to “repulse” feature B.
Predicting language vitality
Location quality
Location quality of a feature: average ability of a feature to coexist with the features manifested by the communities in its neighborhood
Location quality of a linguistic community: aggregated location quality indexes of its features
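One possible reading of these two definitions is sketched here, as an illustration only. The slides do not state how the coexistence ability is averaged or how the feature scores are aggregated, so the logarithm, the plain unweighted means, the names `feature_quality` and `community_quality`, and every number in the `coexistence` table are assumptions.

```python
from math import log
from statistics import mean


def feature_quality(feature, neighbor_features, coexistence):
    """Mean log coexistence score of `feature` with the neighbors' features.

    neighbor_features: features manifested by the neighboring communities
                       (repeats allowed, one entry per occurrence).
    coexistence:       hypothetical dict {(feature, other): score > 0},
                       where a score above 1 means the pair coexists well.
    """
    return mean(log(coexistence[(feature, g)]) for g in neighbor_features)


def community_quality(features, neighbor_features, coexistence):
    """Aggregate (here: plain mean) of the quality of each own feature."""
    return mean(feature_quality(f, neighbor_features, coexistence)
                for f in features)


# Made-up scores for two own features (f1, f2) and two neighbor features.
coexistence = {
    ("f1", "g1"): 2.0, ("f1", "g2"): 0.5,
    ("f2", "g1"): 4.0, ("f2", "g2"): 1.0,
}

q_f1 = feature_quality("f1", ["g1", "g2"], coexistence)
q_f2 = feature_quality("f2", ["g1", "g2"], coexistence)
q_community = community_quality(["f1", "f2"], ["g1", "g2"], coexistence)
print(q_f1 < q_f2)
```

In this toy table f2 fits its surroundings better than f1, and the community score lies between the two feature scores.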
Predicting language vitality
For the 31 minority communities to which I could assign a vitality grade, I related that grade to the corresponding location quality.
Results and conclusions
Location quality and vitality
Spearman’s rank correlation: 0.62 (p-value: 0.00009)
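For readers unfamiliar with the statistic, a minimal pure-Python sketch follows: Spearman's coefficient is the Pearson correlation of the ranks, with tied values sharing their average rank. The data below are invented for illustration and are not the study's 31 communities. The numeric vitality grades (higher meaning safer) are an assumed encoding, and the p-value is not computed here.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: location quality and vitality grade of five communities.
location_quality = [0.10, 0.40, 0.35, 0.80, 0.55]
vitality_grade = [1, 2, 2, 4, 3]
rho = spearman(location_quality, vitality_grade)
print(rho)
```

Because only ranks enter the computation, the coefficient is insensitive to the scale on which location quality or vitality is expressed.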
Conclusions
• The degree of endangerment of the considered minority languages indeed appears to be related to the linguistic structure of their neighboring languages.
• It has been outlined how to combine
- Jazyki mira
- World Language Mapping System
- Atlas of the World’s Languages in Danger
in order to conduct quantitative linguistic studies in which geographic parameters are involved.
• This is the first study to integrate realistic linguistic features in order to describe languages in competition.
• The approach is a promising tool for learning more about the mechanisms that control the geographical distribution of linguistic features.
Acknowledgement
• Dr Vladimir Polyakov, organizing committee
• Professor Valery Solovyev, organizing committee
• Dr Søren Wichmann, Department of Linguistics of the Max Planck Institute for Evolutionary Anthropology, Germany
Support (1/2)
• Dr Aris Xanthos, section of Linguistics, section of Information Technologies and Mathematical Methods, University of Lausanne (UNIL), Switzerland
• Professor François Golay and Dr Stéphane Joost, Geographic Information Systems Laboratory (LASIG), Swiss Federal Institute of Technology Lausanne (EPFL), Switzerland
Support (2/2)
• Professor Pablo Jensen, Laboratory
of Physics, French National Center for
Scientific Research (CNRS), France
• Professor François Pellegrino and Dr Fermín Moscoso del Prado Martín, 'Dynamique Du Langage' Laboratory, French National Center for Scientific Research (CNRS), France
References (1/3)
Jazyki mira (Languages of the World) (1993-2004).
Moscow: Academia & Indrik. [Online]. Available:
http://ww.dblang.ru/en
Jensen, P. (2006). Network-based predictions of
retail store commercial categories and optimal
locations. Phys. Rev. E 74(3), 035101(R). [Online].
Available:
http://dx.doi.org/10.1103/PhysRevE.74.035101
References (2/3)
Jensen, P. (2009). Analyzing the Localization of Retail
Stores with Complex Systems Tools. In Adams, N. M.,
Robardet, C., Siebes, A. & Boulicaut, J-F. (Eds.), Advances
in Intelligent Data Analysis VIII: 8th International
Symposium on Intelligent Data Analysis, Lecture Notes in
Computer Science, Vol. 5772/2009, 10–20. Berlin
Heidelberg: Springer-Verlag. [Online]. Available:
http://dx.doi.org/10.1007/978-3-642-03915-7_2
References (3/3)
Moseley, C. (Ed.) (2009). Atlas of the World’s Languages in
Danger. Unesco. [Online]. Available:
http://www.unesco.org/culture/en/endangeredlanguages/atlas
World Language Mapping System (2010). Global Mapping
International & SIL International. [Online]. Available:
http://www.gmi.org/wlms
Additional slides
…
Step 1: definition of neighborhood
Draw the neighborhood of every community (c) manifesting feature A.
Step 2
Pick a community (c) manifesting feature A.
Step 3
Add up the number of speakers of all communities in its neighborhood, n(c):
n(c) ≈ 54 million
Step 4
…add up the number of speakers of the communities manifesting feature B in its neighborhood, nB(c):
nB(c) = 0
Step 5
…compute the local concentration of communities manifesting feature B in its neighborhood:
nB(c) / n(c) = 0
Step 6
Then, add up the number of speakers of all communities in the entire sample region, N:
N ≈ 931 million
Step 7
…add up the number of speakers of all communities manifesting feature B in the entire sample region, NB:
NB ≈ 93 million
Step 8
…compute the overall concentration of communities manifesting feature B in the entire sample region:
NB / N ≈ 1/10
Step 9
Compare the local concentration of communities manifesting feature B with its overall concentration:
[nB(c) / n(c)] / [NB / N] ≈ 0 / (1/10) = 0
Step 10
Compute this ratio also for all the other communities manifesting feature A in the sample region.
Computers work…
Step 11
Finally, compute the average of this ratio over all communities manifesting feature A in the sample region (the average is weighted by their number of speakers):
MAB = Σ_{c in A} w(c) · [nB(c) / n(c)] / [NB / N] / Σ_{c in A} w(c), where w(c) is the number of speakers of community c.
For our example:
MAB ≈ 0.001
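The speaker-weighted variant of the steps above can be sketched as follows. This is an illustration under stated assumptions rather than the code behind the reported value: the function name `adapted_m_index`, the data layout, the community names, and all speaker counts are invented, and the neighborhoods are passed in ready-made instead of being derived from geographic areas.

```python
def adapted_m_index(communities, neighbors, feat_a, feat_b):
    """Speaker-weighted average of (local B share) / (overall B share).

    communities: {name: {"speakers": int, "features": set of str}}
    neighbors:   {name: set of neighboring community names}
    Shares are computed from numbers of speakers, not community counts.
    """
    total = sum(c["speakers"] for c in communities.values())
    total_b = sum(c["speakers"] for c in communities.values()
                  if feat_b in c["features"])
    if total_b == 0:
        raise ValueError("feature B has no speakers in the sample")
    overall = total_b / total

    weighted_sum = 0.0
    weight = 0.0
    for name, community in communities.items():
        if feat_a not in community["features"]:
            continue
        nearby = [communities[n] for n in neighbors.get(name, ())]
        n_all = sum(m["speakers"] for m in nearby)
        if n_all == 0:
            continue  # no neighbors: local concentration undefined
        n_b = sum(m["speakers"] for m in nearby if feat_b in m["features"])
        weighted_sum += community["speakers"] * (n_b / n_all) / overall
        weight += community["speakers"]
    if weight == 0:
        raise ValueError("no community with feature A has neighbors")
    return weighted_sum / weight


# Invented sample: P and S manifest feature A, Q manifests feature B.
communities = {
    "P": {"speakers": 100, "features": {"A"}},
    "Q": {"speakers": 300, "features": {"B"}},
    "R": {"speakers": 600, "features": set()},
    "S": {"speakers": 200, "features": {"A"}},
}
neighbors = {"P": {"Q", "R"}, "S": {"R"}}

m_ab = adapted_m_index(communities, neighbors, "A", "B")
print(m_ab < 1)  # below 1: in this toy sample A tends to "repulse" B
```

In the toy sample the ratio is 4/3 for P and 0 for S, and S counts twice as much as P because it has twice as many speakers.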
Interpretation of the adapted M index
Under the hypothesis of pure randomness, E[MAB] = 1 for all r > 0.
=> Like Jensen’s MAB, the adapted MAB quantifies deviations from purely random configurations:
MAB > 1: A tends to “attract” B
MAB < 1: A tends to “repulse” B
Coexistence ability of features
The spatial distribution of commercial activities seems to reveal interactions that favor or disfavor the successful local coexistence of certain activities.
From the spatial distribution of language features alone, it cannot be directly determined which features can successfully coexist and which cannot.
Coexistence ability of features
We can quantify interactions favoring or disfavoring successful coexistence between features by considering only communities that are probably not endangered when computing the M index.
C index: coexistence ability of features
The C index is the adapted M index computed on a restricted sample, where the considered linguistic communities are only the ones that are probably not endangered.
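The restriction itself can be sketched as a simple filtering step applied before the index is computed. How a community is judged "probably not endangered" is not stated on the slides, so the boolean `endangered` flag, the function name `restrict_to_safe`, and the sample data are hypothetical.

```python
def restrict_to_safe(communities, neighbors):
    """Keep only communities not flagged as endangered.

    communities: {name: {"speakers": int, "features": set, "endangered": bool}}
    neighbors:   {name: set of neighboring community names}
    Returns the filtered communities and their neighborhoods, with
    endangered communities removed from every neighborhood as well.
    """
    safe = {name: c for name, c in communities.items() if not c["endangered"]}
    safe_neighbors = {
        name: {n for n in neighbors.get(name, ()) if n in safe}
        for name in safe
    }
    return safe, safe_neighbors


# Invented sample: Q is flagged as endangered and drops out everywhere.
communities = {
    "P": {"speakers": 100, "features": {"A"}, "endangered": False},
    "Q": {"speakers": 5, "features": {"B"}, "endangered": True},
    "R": {"speakers": 600, "features": {"B"}, "endangered": False},
}
neighbors = {"P": {"Q", "R"}, "Q": {"P"}, "R": {"P"}}

safe, safe_neighbors = restrict_to_safe(communities, neighbors)
print(sorted(safe), safe_neighbors["P"])
```

The filtered dictionaries can then be passed to whatever routine computes the adapted M index, so that the C index reuses the same computation on the smaller sample.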
Method
City: a heterogeneous geographic space (with parks, streams etc.) that is home to a network of commercial activities
World: a heterogeneous geographic space (with sea, mountains, lakes etc.) that is home to a network of languages, or more precisely, of linguistic features