Semantic Parameters of the Grammaticalization of Agreement

Advances in Automated Language Classification
ASJP Consortium
(Dik Bakker)
Overview
Project (MAY 2007 - ):
ASJP (Automated Similarity Judgment Program)
ASJP: Automatic Reconstruction
[Diagram: project overview. Labels: LANGUAGE NUMBERS; Data sources; Data bases; TOOLS; Results]
ASJP members:
Sören Wichmann (Germany; Netherlands)
Viveka Velupillai (Germany)
André Müller (Germany)
Robert Mailhammer (Germany)
Hagen Jung (Germany)
Eric Holman (US)
Anthony Grant (UK)
Dmitry Egorov (Russia)
Pamela Brown (US)
Cecil Brown (US)
Dik Bakker (UK; Netherlands)
Overview
Project:
ASJP (Automated Similarity Judgment Program)
Overall goal:
Automatic reconstruction of language relationships
Basis:
Distance matrices between individual languages on
the basis of linguistic features
Method:
Lexicostatistics: mass comparison of basic lexical items,
extended by all relevant data available
[Diagram: processing pipeline. Labels: Swadesh (2440); ASJP software (ASJP1, ASJP2); distance matrices; TREE SFTW; STAT SFTW; MAP SFTW; calibration; ETHN; WALS; EXPRT; GEO GRAPH; HIST FACTS; PHON INVENT (Jeff Mielke, 500+); LOANS]
Overview
OVERALL GOAL: Reconstruction of Language Relationships
Derived goals:
- Critical assessment and refinement of existing classifications
- Classify newly described and unclassified languages
- Search for (ir)regularities in phylogenies
- Test hypotheses (e.g. Atkinson et al. 2008; ‘elbow’ phenomenon)
- Experimentally find an optimal dating method
- Automatically detect borrowings
Overview
1. The list of basic lexical items
2. Comparing words & languages
3. Some results: genetic proximity
4. On Inheritance vs Borrowing
5. Imminent extensions
1. The list of basic lexical items
Lexical items
Word list: Swadesh 100 basic meanings
- Word coined in most languages
- Collected in field work lexicon / grammar
- Inherited rather than borrowed
- Culturally independent
- Stable over time
- Few synonyms
?
LWT
1. I        21. dog      41. nose     61. die    81. smoke
2. you      22. louse    42. mouth    62. kill   82. fire
3. we       23. tree     43. tooth    63. swim   83. ash
4. this     24. seed     44. tongue   64. fly    84. burn
5. that     25. leaf     45. claw     65. walk   85. path
6. who      26. root     46. foot     66. come   86. mountain
7. what     27. bark     47. knee     67. lie    87. red
8. not      28. skin     48. hand     68. sit    88. green
9. all      29. flesh    49. belly    69. stand  89. yellow
10. many    30. blood    50. neck     70. give   90. white
11. one     31. bone     51. breasts  71. say    91. black
12. two     32. grease   52. heart    72. sun    92. night
13. big     33. egg      53. liver    73. moon   93. hot
14. long    34. horn     54. drink    74. star   94. cold
15. small   35. tail     55. eat      75. water  95. full
16. woman   36. feather  56. bite     76. rain   96. new
17. man     37. hair     57. see      77. stone  97. good
18. person  38. head     58. hear     78. sand   98. round
19. fish    39. ear      59. know     79. earth  99. dry
20. bird    40. eye      60. sleep    80. cloud  100. name
[Annotation on the 100-item list: Otomi from Spanish]
Lexical items: further reduction
Early analyses have shown:
- Most stable 40/100 item subset gives the same results
  → Less work
  → Less missing data
  → Faster processing; combinatorial explosion:
    40 vs. 100 items: ~10^9 < 10^10 COMPARISONS
Lexical items: further reduction
Most stable (see references):
$S_{SM} = (R - U) / (1 - U)$
R = mean proportion of ‘same form’ for the same meaning $SM_i$, per genus
U = mean proportion of ‘same form’ for different meanings $SM_x$, per genus
N.B. $S_{SM}$ shows a high correlation between families
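The stability measure above can be illustrated with a minimal Python sketch. The function name `stability` and the sample proportions are hypothetical and are not taken from the ASJP data; the sketch only shows how subtracting U and dividing by (1 - U) corrects the raw proportion R for chance resemblance.

```python
def stability(r: float, u: float) -> float:
    """Chance-corrected stability: (R - U) / (1 - U).

    r: proportion of 'same form' judgments for the same meaning
    u: proportion of 'same form' judgments for different meanings (chance level)
    """
    if not 0.0 <= u < 1.0:
        raise ValueError("U must lie in [0, 1)")
    return (r - u) / (1.0 - u)


# Hypothetical proportions, for illustration only.
example = stability(0.40, 0.10)
```

An item whose same-meaning proportion equals the chance level gets a stability of 0; an item that always matches gets 1.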
[Figure: items ordered by stability (++ < Stability > --); evaluated against Ethnologue (Goodman-Kruskal) and WALS (Pearson)]
[Figure: the 100-item list with the 40 most stable items marked]
Lexical items: transcription
First phase of project (2007):
Problems with full IPA representation of words:
- data entry via keyboard
- simple programming language (Fortran; Pascal)
→ Recoding to simplified ASJPcode (ASCII only)
Lexical items: transcription
ASJPcode:
- 7 vowels
- 34 consonants
- ‘Closest sound’
- Operators for: nasalization, labialization, palatalization, aspiration, glottalization
Abaza (Caucasian):
Meaning   IPA            ASJPcode
PERSON    ʕʷɨʧʼʲʷʕʷɨs    Xw~3Cw"yXw~3s
LEAF      bɣʲɨ           bxy~3
SKIN      ʧʷazʲ          Cw~azy~
HORN      ʧʼʷɨʕʷa        Cw"~3Xw~a
NOSE      pɨnʦʼa         p3nc"a
TOOTH     pɨʦ            p3c
Lexical items
Collected to date:
- Close to 2500 languages (incl. dialects and proto-languages)
- Mean number of items per language: 35.8 (out of 40)
Lexical items
Areal distribution (not a sample!):
Americas:        27%
Eurasia:         23%
Australia/PNG:   18%
Austronesia:     15%
Africa:          14%
Creoles:          2%
Artificial:       1%
[Map: languages currently sampled]
2. Comparing words and languages
Comparing words
Two strategies:
1. ASJP context rules
a. between 2 words
SMi: WORDlg1 == WORDlg2
(pattern Wlg1 UNIFIES pattern Wlg2)
ASJP context rules (C/V=general; c/v=specific; X=*)
R1
R2
…
R12
R13
…
R22
#(V)cVcX#
#Xc(V)c(V)cX#
#XcVcX#
#Xc(V)c(V)cX#
#AVcvX#
#(V)ccVX#
#VcvX# A=hwy
#(V)ccVX#
#cv#
#(CV)cv#
#yapi
#opi
ASJP context rules
a. between 2 words:
value 0 or 1
b. between 2 languages:
RELATEDNESS = (number of matching words / total pairs) * 100
DISTANCE: LSP = 100 – ((matching words / total pairs) * 100)
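As a minimal sketch of the language-level distance above, the following Python function turns a list of 0/1 word judgments into a relatedness percentage and its complement, LSP. The function name `lsp` and the match counts are hypothetical; the rule-based matching that produces the 0/1 values is not reproduced here.

```python
from typing import Sequence


def lsp(matches: Sequence[int]) -> float:
    """Distance between two languages from 0/1 word judgments.

    matches: one entry per compared word pair, 1 if the rules judge
    the two words as matching, 0 otherwise.
    """
    if not matches:
        raise ValueError("no word pairs to compare")
    relatedness = sum(matches) / len(matches) * 100
    return 100 - relatedness


# Hypothetical counts: 16 matching words out of 35 compared pairs.
distance = lsp([1] * 16 + [0] * 19)
```

With these counts the relatedness is about 45.71 and the distance about 54.29.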
Comparing words
2. Levenshtein Distance
a. between 2 words:
number of transformations to get from the shorter form to the longer one (changes, additions)
min = 0 / max = length of longest word
b. between 2 languages:
mean LD over the total number of pairs
Two problems with simple LD:
1. Value depends on length of longest word
→ Normalize: $LDN = LD / L_{max}$
2. Differences between languages in phonological overlap
→ Eliminate ‘background noise’: $LDND = LDN / LDN_{\text{different pairs}}$, where the denominator is the mean LDN of word pairs with different meanings
Normalized Levenshtein Distance:
a. between 2 words:
LDND = 0 to 100 (+)
b. between 2 languages:
mean of all LDNDs of words in common
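The three steps above (LD, LDN, LDND) can be sketched in Python as follows. This is an illustration under stated assumptions, not the ASJP software: every character is treated as one segment (ASJPcode modifiers such as `~` and `"` would need proper tokenization), the result is scaled by 100 to match the 0 to 100 range on the slide, and the function names are hypothetical. The five word pairs are the Aguacatec and Mocho forms shown in the next example; which form belongs to which language is assumed from their order.

```python
from itertools import product
from statistics import mean


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def ldn(a: str, b: str) -> float:
    """LD normalized by the length of the longer word (words non-empty)."""
    return levenshtein(a, b) / max(len(a), len(b))


def ldnd(lang1: dict, lang2: dict) -> dict:
    """Per-meaning LDND: LDN divided by the mean LDN of different-meaning pairs."""
    shared = sorted(set(lang1) & set(lang2))
    background = mean(ldn(lang1[m], lang2[n])
                      for m, n in product(shared, repeat=2) if m != n)
    return {m: 100 * ldn(lang1[m], lang2[m]) / background for m in shared}


agu = {"ONE": "xun", "TWO": "kob", "BONE": "baq", "EAR": "SCin", "WATER": "a7"}
mhc = {"ONE": "hun", "TWO": "kabe7", "BONE": "baq", "EAR": "Cikin", "WATER": "ha7"}

per_word = ldnd(agu, mhc)
language_ldnd = mean(per_word.values())  # distance between the two languages
```

With only five meanings the background term differs from the one obtained on a full 40-item list, so the absolute values will not reproduce the slide figures; the ratios between words are unaffected by that term.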
Comparing languages
AGUACATEC (agu) <> MOCHO (mhc)
MAYAN (45) > MAYAN
[GeoD=97; GenD=1.86]
Meaning  Words        Rule  LDND
ONE      xun=hun      -     37.4
TWO      kob=kabe7    R1    67.3
BONE     baq=baq      R3     0.0
EAR      SCin=Cikin   -     67.3
WATER    a7=ha7       R10   37.4
TOTAL: LSP = 58.14; LDND = 51.68 (n=35)
Comparing languages
HIGH CORRELATION: LSP ~ LDND
MAYA (n=34):            0.93**
INDO-EUROPEAN (n=129):  0.97**
AMERINDIAN (n=511):     0.59**
Comparing languages
BEST PERFORMERS
Within families         Across families
1. EYE      0.496       1. I        0.072
2. LOUSE    0.480       2. DIE      0.065
3. DIE      0.469       3. WE       0.061
4. BREAST   0.415       4. YOU      0.057
5. STONE    0.364       5. BREAST   0.057
Across families:
- Shortness
- Sound Symbolism?
Comparing languages
WORST PERFORMERS
Within families           Across families
36. HORN      0.107       36. NIGHT   0.028
37. SEE       0.099       37. HEAR    0.027
38. KNEE      0.095       38. HORN    0.027
39. NIGHT     0.079       39. STAR    0.024
40. MOUNTAIN  0.075       40. KNEE    0.023
For 2440 languages: ~3,000,000 language pairs (× 36^2 ~ 3·10^9 comparisons)
LANG1      LANG2               FAM1   FAM2   LSP    LDND
AGUACATEC  CHICOMUCELTEC       MAYAN  MAYAN  96.55  94.75
AGUACATEC  CHOL_TILA           MAYAN  MAYAN  86.11  80.10
AGUACATEC  CHONTAL_TABASCO     MAYAN  MAYAN  90.00  83.97
AGUACATEC  IXIL_CHAJUL         MAYAN  MAYAN  47.50  49.25
AGUACATEC  KAQCHIKEL_NORTHERN  MAYAN  MAYAN  74.36  64.40
AGUACATEC  MAYA_YUCATAN        MAYAN  MAYAN  78.95  76.15
AGUACATEC  MOCHO               MAYAN  MAYAN  54.29  51.68
AGUACATEC  QANJOBAL_EASTERN    MAYAN  MAYAN  45.00  50.59
AGUACATEC  RABINAL_ACHI        MAYAN  MAYAN  70.00  59.03
AGUACATEC  SAKAPULTEKO         MAYAN  MAYAN  70.00  61.83
AGUACATEC  SIPAKAPENSE         MAYAN  MAYAN  66.67  54.97
AGUACATEC  TEKTITEKO           MAYAN  MAYAN  52.50  57.24
AGUACATEC  TZELTAL_OXCHUC      MAYAN  MAYAN  86.84  72.93
AGUACATEC  TZOTZIL_SAN_ANDRES  MAYAN  MAYAN  92.50  79.64
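The counts and the LSP/LDND agreement in the table above can be checked with a short Python sketch. It assumes that each language pair involves about 36 × 36 word comparisons (the mean list length of 35.8 rounded up); the variable names and the hand-written `pearson` helper are illustrative only. The correlation is computed over the 14 Aguacatec rows listed above, not over a full family, so it is not expected to equal the family-level figures.

```python
from math import comb, sqrt

# Number of unordered language pairs among 2440 languages.
n_languages = 2440
language_pairs = comb(n_languages, 2)

# Assumption: roughly 36 x 36 word comparisons per language pair.
word_comparisons = language_pairs * 36 ** 2

# (LSP, LDND) values for the 14 Aguacatec rows of the table.
rows = [
    (96.55, 94.75), (86.11, 80.10), (90.00, 83.97), (47.50, 49.25),
    (74.36, 64.40), (78.95, 76.15), (54.29, 51.68), (45.00, 50.59),
    (70.00, 59.03), (70.00, 61.83), (66.67, 54.97), (52.50, 57.24),
    (86.84, 72.93), (92.50, 79.64),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson([lsp for lsp, _ in rows], [ldnd for _, ldnd in rows])
```

The pair count comes to just under three million and the product to a few times 10^9, the same order of magnitude as the figure on the slide.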
3. Genetic proximity
[Diagram: Swadesh (2440); ASJP2; distance matrices; tree software: SplitsTree, MEGA4 (Neighbour Joining)]
RABINAL ACHI
SAKAPULTEKO
KAQCHIKEL NORTHERN
ASJP
SIPAKAPENSE
TZUTUJIL
QUICHE
USPANTEKO
KEKCHI
POQOMCHI WESTERN
POCOMAM
TEKTITEKO
AGUACATEC
MAM
IXIL CHAJUL
MOCHO
JACALTEC
QANJOBAL EASTERN
AKATEKO
CHUJ
CHICOMUCELTEC
HUASTEC
MAYA YUCATAN
ITZAJ
LACANDON
MOPAN
CHOL TILA
CHOL
CHONTAL TABASCO
CHORTI
TOJOLABAL
TZELTAL OXCHUC
TZELTAL
TZOTZIL SAN ANDRES
ZINACANTAN TZOTZIL
LSP
10
ASJP: Automatic Reconstruction
122
RABINAL ACHI
SAKAPULTEKO
KAQCHIKEL NORTHERN
SIPAKAPENSE
TZUTUJIL
QUICHE
USPANTEKO
KEKCHI
POQOMCHI WESTERN
POCOMAM
TEKTITEKO
AGUACATEC
Correlation:
MAM
IXIL CHAJUL
MOCHO
JACALTEC
QANJOBAL EASTERN
ETHN
AKATEKO
CHUJ
.325**
CHICOMUCELTEC
HUASTEC
MAYA YUCATAN
ITZAJ
LACANDON
MOPAN
CHOL TILA
CHOL
CHONTAL TABASCO
CHORTI
TOJOLABAL
TZELTAL OXCHUC
TZELTAL
TZOTZIL SAN ANDRES
ZINACANTAN TZOTZIL
LSP
10
ASJP: Automatic Reconstruction
123
RABINAL ACHI
SAKAPULTEKO
KAQCHIKEL NORTHERN
SIPAKAPENSE
TZUTUJIL
QUICHE
USPANTEKO
KEKCHI
POQOMCHI WESTERN
POCOMAM
TEKTITEKO
AGUACATEC
Correlation:
MAM
IXIL CHAJUL
MOCHO
JACALTEC
QANJOBAL EASTERN
ETHN
AKATEKO
CHUJ
.325**
CHICOMUCELTEC
HUASTEC
(n = 69)
MAYA YUCATAN
ITZAJ
LACANDON
MOPAN
CHOL TILA
CHOL
CHONTAL TABASCO
CHORTI
TOJOLABAL
TZELTAL OXCHUC
(n = 34)
TZELTAL
TZOTZIL SAN ANDRES
ZINACANTAN TZOTZIL
LSP
10
ASJP: Automatic Reconstruction
124
RABINAL ACHI
SAKAPULTEKO
KAQCHIKEL NORTHERN
SIPAKAPENSE
TZUTUJIL
QUICHE
USPANTEKO
KEKCHI
POQOMCHI WESTERN
POCOMAM
TEKTITEKO
AGUACATEC
Correlation:
MAM
IXIL CHAJUL
MOCHO
JACALTEC
QANJOBAL EASTERN
ETHN
AKATEKO
CHUJ
.325**
CHICOMUCELTEC
HUASTEC
MAYA YUCATAN
ITZAJ
LACANDON
MOPAN
CHOL TILA
CHOL
CHONTAL TABASCO
CHORTI
TOJOLABAL
TZELTAL OXCHUC
TZELTAL
TZOTZIL SAN ANDRES
More structure than ETHN
ZINACANTAN TZOTZIL
LSP
10
ASJP: Automatic Reconstruction
125
RABINAL ACHI
SAKAPULTEKO
KAQCHIKEL NORTHERN
SIPAKAPENSE
TZUTUJIL
QUICHE
USPANTEKO
KEKCHI
POQOMCHI WESTERN
POCOMAM
TEKTITEKO
AGUACATEC
Correlation:
MAM
IXIL CHAJUL
MOCHO
JACALTEC
QANJOBAL EASTERN
ETHN
AKATEKO
CHUJ
.325**
CHICOMUCELTEC
HUASTEC
MAYA YUCATAN
Separation
ITZAJ
LACANDON
MOPAN
CHOL TILA
CHOL
CHONTAL TABASCO
CHORTI
TOJOLABAL
TZELTAL OXCHUC
TZELTAL
TZOTZIL SAN ANDRES
ZINACANTAN TZOTZIL
LSP
10
ASJP: Automatic Reconstruction
126
RABINAL ACHI
SAKAPULTEKO
Levenshtein
SIPAKAPENSE
KAQCHIKEL NORTHERN
USPANTEKO
TZUTUJIL
QUICHE
POQOMCHI WESTERN
POCOMAM
KEKCHI
TEKTITEKO
AGUACATEC
MAM
IXIL CHAJUL
MOCHO
JACALTEC
QANJOBAL EASTERN
AKATEKO
CHUJ
TOJOLABAL
TZELTAL OXCHUC
TZELTAL
TZOTZIL SAN ANDRES
ZINACANTAN TZOTZIL
CHOL TILA
CHOL
CHONTAL TABASCO
CHORTI
MOPAN
LACANDON
MAYA YUCATAN
ITZAJ
CHICOMUCELTEC
HUASTEC
LDND
10
ASJP: Automatic Reconstruction
127
RABINAL ACHI
SAKAPULTEKO
Levenshtein
SIPAKAPENSE
KAQCHIKEL NORTHERN
USPANTEKO
TZUTUJIL
QUICHE
POQOMCHI WESTERN
POCOMAM
KEKCHI
TEKTITEKO
AGUACATEC
MAM
IXIL CHAJUL
Correlation:
MOCHO
JACALTEC
QANJOBAL EASTERN
ETHN
AKATEKO
CHUJ
.195**
TOJOLABAL
TZELTAL OXCHUC
TZELTAL
TZOTZIL SAN ANDRES
ZINACANTAN TZOTZIL
CHOL TILA
CHOL
CHONTAL TABASCO
CHORTI
MOPAN
LACANDON
MAYA YUCATAN
ITZAJ
CHICOMUCELTEC
HUASTEC
LDND
10
ASJP: Automatic Reconstruction
128
RABINAL ACHI
SAKAPULTEKO
Levenshtein
SIPAKAPENSE
KAQCHIKEL NORTHERN
USPANTEKO
TZUTUJIL
QUICHE
POQOMCHI WESTERN
POCOMAM
KEKCHI
TEKTITEKO
AGUACATEC
MAM
IXIL CHAJUL
Correlation:
MOCHO
JACALTEC
QANJOBAL EASTERN
ETHN
AKATEKO
CHUJ
.195**
TOJOLABAL
TZELTAL OXCHUC
TZELTAL
TZOTZIL SAN ANDRES
(LSP = .325)
ZINACANTAN TZOTZIL
CHOL TILA
CHOL
CHONTAL TABASCO
CHORTI
MOPAN
LACANDON
MAYA YUCATAN
ITZAJ
CHICOMUCELTEC
HUASTEC
LDND
10
ASJP: Automatic Reconstruction
129
RABINAL ACHI
RABINAL ACHI
SAKAPULTEKO
SAKAPULTEKO
SIPAKAPENSE
KAQCHIKEL NORTHERN
KAQCHIKEL NORTHERN
SIPAKAPENSE
USPANTEKO
TZUTUJIL
TZUTUJIL
QUICHE
QUICHE
USPANTEKO
POQOMCHI WESTERN
KEKCHI
POCOMAM
POQOMCHI WESTERN
KEKCHI
POCOMAM
TEKTITEKO
TEKTITEKO
AGUACATEC
AGUACATEC
MAM
MAM
IXIL CHAJUL
IXIL CHAJUL
MOCHO
MOCHO
JACALTEC
JACALTEC
QANJOBAL EASTERN
QANJOBAL EASTERN
AKATEKO
AKATEKO
CHUJ
CHUJ
TOJOLABAL
CHICOMUCELTEC
TZELTAL OXCHUC
HUASTEC
TZELTAL
MAYA YUCATAN
TZOTZIL SAN ANDRES
ITZAJ
ZINACANTAN TZOTZIL
LACANDON
CHOL TILA
MOPAN
CHOL
CHOL TILA
CHONTAL TABASCO
CHOL
CHORTI
CHONTAL TABASCO
MOPAN
CHORTI
LACANDON
TOJOLABAL
MAYA YUCATAN
TZELTAL OXCHUC
ITZAJ
TZELTAL
CHICOMUCELTEC
TZOTZIL SAN ANDRES
HUASTEC
ZINACANTAN TZOTZIL
ASJP
10
LDND
10
ASJP: Automatic Reconstruction
130
RABINAL ACHI
RABINAL ACHI
SAKAPULTEKO
SAKAPULTEKO
SIPAKAPENSE
KAQCHIKEL NORTHERN
KAQCHIKEL NORTHERN
SIPAKAPENSE
USPANTEKO
TZUTUJIL
TZUTUJIL
QUICHE
QUICHE
USPANTEKO
POQOMCHI WESTERN
KEKCHI
POCOMAM
POQOMCHI WESTERN
KEKCHI
POCOMAM
TEKTITEKO
TEKTITEKO
AGUACATEC
AGUACATEC
MAM
MAM
IXIL CHAJUL
IXIL CHAJUL
MOCHO
MOCHO
JACALTEC
JACALTEC
QANJOBAL EASTERN
QANJOBAL EASTERN
AKATEKO
AKATEKO
CHUJ
CHUJ
TOJOLABAL
CHICOMUCELTEC
TZELTAL OXCHUC
HUASTEC
TZELTAL
MAYA YUCATAN
TZOTZIL SAN ANDRES
ITZAJ
ZINACANTAN TZOTZIL
LACANDON
CHOL TILA
MOPAN
CHOL TILA
CHOL
cholan
CHOL
CHONTAL TABASCO
CHORTI
CHONTAL TABASCO
MOPAN
CHORTI
LACANDON
TOJOLABAL
MAYA YUCATAN
TZELTAL OXCHUC
ITZAJ
TZELTAL
CHICOMUCELTEC
TZOTZIL SAN ANDRES
HUASTEC
ZINACANTAN TZOTZIL
ASJP
10
LDND
10
ASJP: Automatic Reconstruction
131
RABINAL ACHI
RABINAL ACHI
SAKAPULTEKO
SAKAPULTEKO
SIPAKAPENSE
KAQCHIKEL NORTHERN
KAQCHIKEL NORTHERN
SIPAKAPENSE
USPANTEKO
TZUTUJIL
TZUTUJIL
QUICHE
QUICHE
USPANTEKO
POQOMCHI WESTERN
KEKCHI
POCOMAM
POQOMCHI WESTERN
KEKCHI
POCOMAM
TEKTITEKO
TEKTITEKO
AGUACATEC
AGUACATEC
MAM
MAM
IXIL CHAJUL
IXIL CHAJUL
MOCHO
MOCHO
JACALTEC
JACALTEC
QANJOBAL EASTERN
QANJOBAL EASTERN
AKATEKO
AKATEKO
CHUJ
CHUJ
TOJOLABAL
CHICOMUCELTEC
TZELTAL OXCHUC
HUASTEC
TZELTAL
MAYA YUCATAN
TZOTZIL SAN ANDRES
ITZAJ
ZINACANTAN TZOTZIL
LACANDON
CHOL TILA
MOPAN
CHOL TILA
CHOL
cholan
CHOL
CHONTAL TABASCO
CHORTI
CHONTAL TABASCO
MOPAN
CHORTI
LACANDON
TOJOLABAL
MAYA YUCATAN
TZELTAL OXCHUC
TZELTAL
TZOTZIL SAN ANDRES
tzeltalan
ITZAJ
CHICOMUCELTEC
HUASTEC
ZINACANTAN TZOTZIL
ASJP
10
LDND
10
ASJP: Automatic Reconstruction
132
RABINAL ACHI
RABINAL ACHI
SAKAPULTEKO
SAKAPULTEKO
SIPAKAPENSE
KAQCHIKEL NORTHERN
KAQCHIKEL NORTHERN
SIPAKAPENSE
USPANTEKO
TZUTUJIL
TZUTUJIL
QUICHE
QUICHE
USPANTEKO
POQOMCHI WESTERN
KEKCHI
POCOMAM
POQOMCHI WESTERN
KEKCHI
POCOMAM
TEKTITEKO
TEKTITEKO
AGUACATEC
AGUACATEC
MAM
MAM
IXIL CHAJUL
IXIL CHAJUL
MOCHO
MOCHO
JACALTEC
JACALTEC
QANJOBAL EASTERN
QANJOBAL EASTERN
AKATEKO
AKATEKO
CHUJ
CHUJ
TOJOLABAL
CHICOMUCELTEC
TZELTAL OXCHUC
HUASTEC
TZELTAL
MAYA YUCATAN
TZOTZIL SAN ANDRES
ITZAJ
ZINACANTAN TZOTZIL
LACANDON
CHOL TILA
MOPAN
CHOL TILA
CHOL
cholan
CHOL
CHONTAL TABASCO
CHORTI
CHONTAL TABASCO
MOPAN
CHORTI
LACANDON
TOJOLABAL
MAYA YUCATAN
TZELTAL OXCHUC
TZELTAL
TZOTZIL SAN ANDRES
tzeltalan
ITZAJ
CHICOMUCELTEC
HUASTEC
ZINACANTAN TZOTZIL
ASJP
10
LDND
10
ASJP: Automatic Reconstruction
133
RABINAL ACHI
RABINAL ACHI
SAKAPULTEKO
SAKAPULTEKO
SIPAKAPENSE
KAQCHIKEL NORTHERN
KAQCHIKEL NORTHERN
SIPAKAPENSE
USPANTEKO
TZUTUJIL
TZUTUJIL
QUICHE
QUICHE
USPANTEKO
POQOMCHI WESTERN
KEKCHI
POCOMAM
POQOMCHI WESTERN
KEKCHI
POCOMAM
TEKTITEKO
TEKTITEKO
AGUACATEC
AGUACATEC
MAM
MAM
IXIL CHAJUL
IXIL CHAJUL
MOCHO
MOCHO
JACALTEC
JACALTEC
QANJOBAL EASTERN
QANJOBAL EASTERN
AKATEKO
AKATEKO
CHUJ
CHUJ
TOJOLABAL
CHICOMUCELTEC
TZELTAL OXCHUC
HUASTEC
TZELTAL
MAYA YUCATAN
TZOTZIL SAN ANDRES
ITZAJ
ZINACANTAN TZOTZIL
LACANDON
CHOL TILA
MOPAN
CHOL
CHOL TILA
CHONTAL TABASCO
CHOL
CHORTI
CHONTAL TABASCO
MOPAN
CHORTI
MAYA YUCATAN
TZELTAL OXCHUC
ITZAJ
TZELTAL
CHICOMUCELTEC
TZOTZIL SAN ANDRES
HUASTEC
ZINACANTAN TZOTZIL
ASJP
10
yucatecan
LACANDON
TOJOLABAL
LDND
10
ASJP: Automatic Reconstruction
134
[Figure: tree diagrams of the same 34 Mayan languages, each leaf label occurring twice, no subgroup label. Panel labels: ASJP 10, LDND 10]
ASJP: Automatic Reconstruction
135
Family             NLGS   LSP    LDND
Altaic               30   .723   .688
Maya                 34   .325   .195
Afro-Asiatic        128   .147   .172
Trans New-Guinea    148   .294   .325
Niger-Congo         379   .089   .125
** all significant > 0.01
ASJP: Automatic Reconstruction
137
Improving the fit
Enrich lexical with typological data:
ASJP: Automatic Reconstruction
139
[Diagram: Swadesh (2440) ~ WALS (2580); ASJP; distance matrices; TREE SFTW]
ASJP: Automatic Reconstruction
140
[Diagram: SWALSH (2440); ASJP; distance matrices; TREE SFTW]
ASJP: Automatic Reconstruction
141
Improving the fit
Enrich lexical with typological data:
- NOT 1:1 with ASJP languages
ASJP: Automatic Reconstruction
143
[Diagram: SWALSH (550); ASJP; distance matrices; TREE SFTW]
ASJP: Automatic Reconstruction
144
Improving the fit
Enrich lexical with typological data:
- NOT 1:1 with ASJP languages
- WALS variables very unevenly spread
ASJP: Automatic Reconstruction
145
Improving the fit
Enrich lexical with typological data:
- NOT 1:1 with ASJP languages
- WALS variables very unevenly spread
- Maximum subset: 85 most stable
ASJP: Automatic Reconstruction
146
Most stable WALS variables
WALS Variable   Description                                   Stability Within Genus
31              Sex-based and Non-sex-based Gender Systems    0.81
118             Predicative Adjectives                        0.74
30              Number of Genders                             0.73
119             Nominal and Locational Predication            0.71
29              Syncretism in Verbal Person/Number Marking    0.71
ASJP: Automatic Reconstruction
147
Improving the fit
Enrich lexical with typological data:
- Maximum subset: 85 most stable
ASJP: Automatic Reconstruction
148
Improving the fit
Enrich lexical with typological data:
- Maximum subset: 85 most stable
- Correlation with Swadesh: 0.063 (> 0.001)
?
ASJP: Automatic Reconstruction
149
Improving the fit
Enrich lexical with typological data:
- Maximum subset: 85 most stable
- Correlation with Swadesh: 0.063 (> 0.001)
- Mantel Test: 10,000 simulations:
ASJP: Automatic Reconstruction
150
Improving the fit
Enrich lexical with typological data:
- Maximum subset: 85 most stable
- Correlation with Swadesh: 0.063 (> 0.001)
- Mantel Test: 10,000 simulations:
  best +0.050 <> -0.043 (mean 0.009)
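
The Mantel test named above can be sketched as a permutation test on two distance matrices. This is a minimal illustration, not the project's implementation: the function names, the toy matrices and the one-sided p-value are assumptions. It reports the same kind of summary as the slide (best, worst and mean correlation over the simulations).

```python
import random
from math import sqrt


def upper_triangle(matrix):
    """Flatten the cells above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(a, b, n_perm=10000, seed=0):
    """Correlate two symmetric distance matrices and compare the result
    with correlations obtained after randomly relabelling the languages
    (rows and columns together) of the second matrix."""
    rng = random.Random(seed)
    n = len(a)
    flat_a = upper_triangle(a)
    observed = pearson(flat_a, upper_triangle(b))
    order = list(range(n))
    sims = []
    for _ in range(n_perm):
        rng.shuffle(order)
        permuted = [[b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        sims.append(pearson(flat_a, upper_triangle(permuted)))
    # One-sided p-value: share of simulations at least as high as observed.
    p_value = (1 + sum(r >= observed for r in sims)) / (n_perm + 1)
    return {
        "observed": observed,
        "p": p_value,
        "best": max(sims),
        "worst": min(sims),
        "mean": sum(sims) / n_perm,
    }


# Toy data (hypothetical): distances between points on a line.
points_a = [0.0, 1.0, 3.0, 7.0, 12.0]
points_b = [0.0, 2.0, 3.0, 9.0, 10.0]
dist_a = [[abs(p - q) for q in points_a] for p in points_a]
dist_b = [[abs(p - q) for q in points_b] for p in points_b]
result = mantel(dist_a, dist_b, n_perm=1000)
```

An observed correlation that lies outside the range spanned by "worst" and "best" would be hard to attribute to chance under this test.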
ASJP: Automatic Reconstruction
151
Improving the fit
Enrich lexical with typological data:
- Database 40 most stable Swadesh +
85 most stable WALS features
ASJP: Automatic Reconstruction
152
Improving the fit
Enrich lexical with typological data:
- Database 40 most stable Swadesh +
85 most stable WALS features
- Optimal weight of both?
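
One way to explore the weighting question is to mix the two distance vectors at several percentages and correlate each mixture with a reference classification. The sketch below assumes that the percentage is the weight of the lexical (Swadesh) distances and that all three inputs are flattened distance vectors over the same language pairs; the names and the toy numbers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def scan_weights(lexical, typological, reference, percentages=(0, 25, 50, 75, 100)):
    """Return {percentage: correlation with the reference distances}."""
    scores = {}
    for pct in percentages:
        w = pct / 100
        # Weighted mixture of the two distance vectors, pair by pair.
        mixed = [w * l + (1 - w) * t for l, t in zip(lexical, typological)]
        scores[pct] = pearson(mixed, reference)
    return scores


# Toy vectors (hypothetical values for four language pairs).
lexical = [0.1, 0.5, 0.9, 0.3]
typological = [0.4, 0.2, 0.3, 0.8]
reference = [0.1, 0.5, 0.9, 0.3]   # here identical to the lexical distances
scores = scan_weights(lexical, typological, reference)
best_percentage = max(scores, key=scores.get)
```

With real data the reference vector would come from an expert classification, and the scan could use a finer grid than the five percentages chosen here.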
ASJP: Automatic Reconstruction
153
Improving the fit
[Plot: Corr (ticks .175000, .200000, .225000, .250000) against Percentage (ticks 0, 25, 50, 75, 100)]
ASJP: Automatic Reconstruction
154
4. On Inheritance vs Borrowing
ASJP: Automatic Reconstruction
157
Inherited or borrowed?
AVAR (AVA) / AGUL (AGL)
ASJP: Automatic Reconstruction
158
Inherited or borrowed?
AVAR (AVA) / AGUL (AGL)
I    : dun=zun        * LDND=36.6
YOU  : mun=wun        * LDND=36.6
HORN : tLar=k"arC     * LDND=66.0
FIRE : c"a=c"a        * LDND= 0.0
FULL : c"ura=ac"uf    * LDND=66.0
NEW  : c"iya=c"EyEr   * LDND=55.0
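
The per-item values above can be approximated with a short script. This is a minimal sketch under two assumptions: LDN is the Levenshtein distance divided by the length of the longer form, and LDND divides LDN by the mean LDN of word pairs with different meanings, expressed as a percentage. The function names are illustrative, and every character of the ASJP code (including the modifier ") is counted as one symbol.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def ldn(a, b):
    """Levenshtein distance normalised by the longer form."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def ldnd_per_item(words_a, words_b):
    """Per-meaning LDN divided by the mean LDN of non-matching meanings."""
    n = len(words_a)
    off_diagonal = [ldn(words_a[i], words_b[j])
                    for i in range(n) for j in range(n) if i != j]
    baseline = sum(off_diagonal) / len(off_diagonal)
    return [100 * ldn(a, b) / baseline for a, b in zip(words_a, words_b)]


# The six Avar / Agul pairs from the slide: I, YOU, HORN, FIRE, FULL, NEW.
avar = ["dun", "mun", "tLar", 'c"a', 'c"ura', 'c"iya']
agul = ["zun", "wun", 'k"arC', 'c"a', 'ac"uf', 'c"EyEr']
scores = ldnd_per_item(avar, agul)
```

The absolute numbers will differ from the slide, because the baseline here is computed from six items only. Under the assumptions above, the slide values (for example 36.6 for an LDN of 1/3) correspond to a baseline of about 0.91.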
ASJP: Automatic Reconstruction
159
Inherited or borrowed?
AVAR (AVA) / AGUL (AGL)
I    : dun=zun        * LDND=36.6
YOU  : mun=wun        * LDND=36.6
HORN : tLar=k"arC     * LDND=66.0
FIRE : c"a=c"a        * LDND= 0.0
FULL : c"ura=ac"uf    * LDND=66.0
NEW  : c"iya=c"EyEr   * LDND=55.0
→ 6 items < 70.0
ASJP: Automatic Reconstruction
160
Inherited or borrowed?
AVAR (AVA) / AGUL (AGL)
I    : dun=zun        * LDND=36.6
YOU  : mun=wun        * LDND=36.6
HORN : tLar=k"arC     * LDND=66.0
FIRE : c"a=c"a        * LDND= 0.0
FULL : c"ura=ac"uf    * LDND=66.0
NEW  : c"iya=c"EyEr   * LDND=55.0
→ 6 items < 70.0 → Genetically related !!
ASJP: Automatic Reconstruction
161
Inherited or borrowed?
SPANISH (SPA) / CHAMORRO (CHA)
ASJP: Automatic Reconstruction
162
Inherited or borrowed?
SPANISH (SPA) / CHAMORRO (CHA)
ONE    : uno=unu            * LDND=36.9
TWO    : dos=dos            * LDND= 0.0
PERSON : persona=petsona    * LDND=15.8
STAR   : estreya=estrecas   * LDND=27.6
NIGHT  : noCe=noces         * LDND=44.2
NEW    : nuevo=nueba        * LDND=44.2
ASJP: Automatic Reconstruction
163
Inherited or borrowed?
SPANISH (SPA) / CHAMORRO (CHA)
ONE    : uno=unu            * LDND=36.9
TWO    : dos=dos            * LDND= 0.0
PERSON : persona=petsona    * LDND=15.8
STAR   : estreya=estrecas   * LDND=27.6
NIGHT  : noCe=noces         * LDND=44.2
NEW    : nuevo=nueba        * LDND=44.2
→ 6 items < 70.0
ASJP: Automatic Reconstruction
164
Inherited or borrowed?
SPANISH (SPA) / CHAMORRO (CHA)
ONE    : uno=unu            * LDND=36.9
TWO    : dos=dos            * LDND= 0.0
PERSON : persona=petsona    * LDND=15.8
STAR   : estreya=estrecas   * LDND=27.6
NIGHT  : noCe=noces         * LDND=44.2
NEW    : nuevo=nueba        * LDND=44.2
NOT Related: Chance?
ASJP: Automatic Reconstruction
165
Inherited or borrowed?
SPANISH (SPA) / CHAMORRO (CHA)
ONE    : uno=unu            * LDND=36.9
TWO    : dos=dos            * LDND= 0.0
PERSON : persona=petsona    * LDND=15.8
STAR   : estreya=estrecas   * LDND=27.6
NIGHT  : noCe=noces         * LDND=44.2
NEW    : nuevo=nueba        * LDND=44.2
NOT Related: Chance? Or Borrowing?
ASJP: Automatic Reconstruction
166
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR
: estreya=estrecas
* LDND=27.6
ASJP: Automatic Reconstruction
167
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR : estreya=estrecas * LDND=27.6
SPA: f/g= 0.17/0.82 (= % < 0.70)
ASJP: Automatic Reconstruction
168
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR : estreya=estrecas * LDND=27.6
SPA <> CHA: f/g= 0.17/0.82 <> 0.00/0.00
ASJP: Automatic Reconstruction
169
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR : estreya=estrecas * LDND=27.6
SPA > CHA: f/g= 0.17/0.82 > 0.00/0.00
ASJP: Automatic Reconstruction
170
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR : estreya=estrecas * LDND=27.6
SPA > CHA: f/g= 0.17/0.82 > 0.00/0.00
SPA <> CHA: wwF=
ASJP: Automatic Reconstruction
171
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR
: estreya=estrecas
* LDND=27.6
SPA > CHA:
f/g=
0.17/0.82 > 0.00/0.00
SPA:
wwF= 83 (= mean LDND estreya in IE)
ASJP: Automatic Reconstruction
172
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR
: estreya=estrecas
* LDND=27.6
SPA > CHA:
f/g=
0.17/0.82 > 0.00/0.00
SPA:
wwF= 83-99 (= mean estreya in AU)
ASJP: Automatic Reconstruction
173
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR
: estreya=estrecas
* LDND=27.6
SPA > CHA:
f/g=
0.17/0.82 > 0.00/0.00
SPA <> CHA:
wwF= 83-99 <> 102 (= mn estrecas / AU)
ASJP: Automatic Reconstruction
174
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR
: estreya=estrecas
* LDND=27.6
SPA > CHA:
f/g=
0.17/0.82 > 0.00/0.00
SPA <> CHA:
wwF= 83-99 <> 102-85 (= estrecas / IE)
ASJP: Automatic Reconstruction
175
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR
: estreya=estrecas
* LDND=27.6
SPA > CHA:
f/g=
0.17/0.82 > 0.00/0.00
SPA <> CHA:
wwF= 83-99 <> 102-85
ASJP: Automatic Reconstruction
176
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR
: estreya=estrecas
* LDND=27.6
SPA > CHA:
f/g=
0.17/0.82 > 0.00/0.00
SPA > CHA:
wwF= 83-99 > 102-85
ASJP: Automatic Reconstruction
177
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR
: estreya=estrecas
* LDND=27.6
SPA > CHA:
f/g=
0.17/0.82 > 0.00/0.00
SPA > CHA:
wwF= 83-99 > 102-85
SPA <> CHA:
phwF=
ASJP: Automatic Reconstruction
178
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR
: estreya=estrecas
* LDND=27.6
SPA > CHA:
f/g=
0.17/0.82 > 0.00/0.00
SPA > CHA:
wwF= 83-99 > 102-85
SPA:
phwF=100.00 (phon estreya in IE / AU)
ASJP: Automatic Reconstruction
179
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR
: estreya=estrecas
* LDND=27.6
SPA > CHA:
f/g=
0.17/0.82 > 0.00/0.00
SPA > CHA:
wwF= 83-99 > 102-85
SPA<> CHA:
phwF=100.00 <> 0.52
(phon estrecas in AU/ IE )
ASJP: Automatic Reconstruction
180
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR
: estreya=estrecas
* LDND=27.6
SPA > CHA:
f/g=
0.17/0.82 > 0.00/0.00
SPA > CHA:
wwF= 83-99 > 102-85
SPA > CHA:
phwF=100.00 > 0.52
ASJP: Automatic Reconstruction
181
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE (12)
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
STAR
: estreya=estrecas
* LDND=27.6
SPA > CHA:
f/g=
0.17/0.82 > 0.00/0.00
SPA > CHA:
wwF= 83-99 > 102-85
SPA > CHA:
phwF=100.00 > 0.52
SYN: CHA= puti7on (f: 1.00)
ASJP: Automatic Reconstruction
182
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
ONE : uno=unu * LDND=36.9
SPA > CHA
f/g= 0.24/0.82 > 0.03/0.00
wwF= 97-106 > 110-97
phwF= 12.00 > 0.44
ASJP: Automatic Reconstruction
183
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
TWO : dos=dos * LDND= 0.0
SPA > CHA
f/g= 0.62/1.00 > 0.12/0.00
wwF= 78-99 > 102-78
phwF=100.00 > 0.22
ASJP: Automatic Reconstruction
184
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
NIGHT : noCe=noces * LDND=44.2
SPA > CHA
f/g= 0.23/0.55 > 0.04/0.00
wwF= 89-100 > 105-92
phwF=100.00 > 0.10
ASJP: Automatic Reconstruction
185
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
NEW : nuevo=nueba * LDND=44.2
SPA > CHA
f/g= 0.50/0.64 > 0.04/0.00
wwF= 68-104 > 105-80
phwF=4.27 > 0.03
ASJP: Automatic Reconstruction
186
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
PERSON : persona=petsona * LDND=15.8
SPA > CHA
f/g= 0.20/0.64 > 0.01/0.00
wwF= 89-98 > 98-90
phwF=32.40 > 0.13
SYN: CHA= taotao (f: 1.00)
ASJP: Automatic Reconstruction
187
Inherited or borrowed?
Further output filters:
ASJP: Automatic Reconstruction
188
Inherited or borrowed?
Further output filters:
1. Minimum N potential borrowings
ASJP: Automatic Reconstruction
189
Inherited or borrowed?
Further output filters:
1. Minimum N potential borrowings
2. All in the same direction
ASJP: Automatic Reconstruction
190
Inherited or borrowed?
Further output filters:
1. Minimum N potential borrowings
2. All in the same direction
3. Geographic information
ASJP: Automatic Reconstruction
191
Inherited or borrowed?
SPANISH (spa)
INDO-EUROPEAN (128) > ROMANCE (12)
EURASIA
SPAIN
VS.
CHAMORRO (cha)
AUSTRONESIAN (678) > CHAMORRO
OCEANIA
GUAM
[GEODIST=13244; GENDIST=3.00]
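
GEODIST in the bracketed line is presumably a great-circle distance in kilometres between the two languages' locations; that reading is an assumption, as are the coordinates below (rough values for central Spain and Guam) and the mean Earth radius. A minimal haversine sketch:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Hypothetical coordinates: roughly Madrid and Guam.
spain = (40.4, -3.7)
guam = (13.4, 144.8)
geodist = great_circle_km(*spain, *guam)
```

With these rough coordinates the result falls in the same range as the GEODIST value on the slide; the exact figure depends on which coordinates the database stores for each language.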
ASJP: Automatic Reconstruction
192
‘Spaniards in Pacific since 16th century’
[Diagram: HIST FACTS; GEO GRAPH; ETHN; WALS; EXPRT; Swadesh (2440); ASJP1; ASJP2; distance matrices; MAP SFTW; STAT SFTW; TREE SFTW]
ASJP: Automatic Reconstruction
193
Inherited or borrowed?
Further output filters:
1. Minimum N potential borrowings
2. All in the same direction
3. Geographic information
4. Role of form and meaning (?)
ASJP: Automatic Reconstruction
194
Inherited or borrowed?
Further output filters:
1. Minimum N potential borrowings
2. All in the same direction
3. Geographic information
4. Role of form and meaning (?)
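
The first two filters in the list above can be sketched as a simple check on a set of candidate borrowings. Everything here is hypothetical: the Candidate record, the default minimum of three items, and the function name. Filters 3 and 4 are left out because the slides do not say how they are applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    meaning: str
    donor: str      # language code of the presumed source
    recipient: str  # language code of the presumed borrower


def passes_filters(candidates, min_n=3):
    """Filter 1: at least min_n candidates.
    Filter 2: all candidates point in the same direction."""
    if len(candidates) < min_n:
        return False
    directions = {(c.donor, c.recipient) for c in candidates}
    return len(directions) == 1


# The six Spanish / Chamorro items discussed in this section.
meanings = ["ONE", "TWO", "PERSON", "STAR", "NIGHT", "NEW"]
spa_to_cha = [Candidate(m, "spa", "cha") for m in meanings]
accepted = passes_filters(spa_to_cha)

# Share of candidate borrowings among 40 shared list items.
share = 100 * len(spa_to_cha) / 40
```

Six candidates out of 40 shared items give a share of 15.0 percent.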
ASJP: Automatic Reconstruction
LWT
195
Borrowed!
BOR = spa TO cha 6 (=15.0%)
LDND = 76.63 (shared=40; crit=70.00 - U)
DATABASE:
unu(*spa)
dos(*spa)
petsona(*spa)
estrecas(*spa)
noces(*spa)
nueba(*spa)
ASJP: Automatic Reconstruction
196
5. Imminent extensions
ASJP: Automatic Reconstruction
197
GARBAGE IN
→
GARBAGE OUT
ASJP: Automatic Reconstruction
199
Lexical items: transcription
Second year of project (2008-9):
Replace ASJP code by full IPA representations
ASJP: Automatic Reconstruction
200
Lexical items: transcription
Second year of project (2008-9):
Replace ASJP code by full IPA representations
Juliette
Jeff
ASJP: Automatic Reconstruction
201
Lexical items: transcription
Second year of project (2008-9):
Problems with full IPA representation solved:
ASJP: Automatic Reconstruction
202
Lexical items: transcription
Second year of project (2008-9):
Problems with full IPA representation solved:
1. scan/download/… full IPA representations
ASJP: Automatic Reconstruction
203
Lexical items: transcription
Second year of project (2008-9):
Problems with full IPA representation solved:
1. scan/download/… full IPA representations
2. automatic conversion IPA to integer (Python)
ASJP: Automatic Reconstruction
204
Lexical items: transcription
Second year of project (2008-9):
Problems with full IPA representation solved:
1. scan/download/… full IPA representations
2. automatic conversion IPA to integer (Python)
3. (semi-)automatic recoding to ASJPcode:
transduction on the basis of a formal grammar
ASJP: Automatic Reconstruction
205
Lexical items: transcription
Abaza (Caucasian):
Meaning:
PERSON
ASJP: Automatic Reconstruction
206
Lexical items: transcription
Abaza (Caucasian):
Meaning:
PERSON
IPA:
ʕʷɨʧʼʲʷʕʷɨs
ASJP: Automatic Reconstruction
207
Lexical items: transcription
Abaza (Caucasian):
Meaning:
PERSON
IPA:
ʕʷɨʧʼʲʷʕʷɨs
Decimal:
661,695,616,679,700,690,695,661,695,616,115
ASJP: Automatic Reconstruction
208
Lexical items: transcription
Abaza (Caucasian):
Meaning:
PERSON
IPA:
ʕʷɨʧʼʲʷʕʷɨs
Decimal:
661,695,616,679,700,690,695,661,695,616,115
ASJPcode:
88,119,126,51,67,34,121,119,126,88,119,126,51,115
( = Xw~3Cw"y~Xw~3s)
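
The "Decimal" line above is the sequence of Unicode code points of the IPA string. A minimal Python sketch of that conversion step follows; the slides name Python for it, but this is an illustration rather than the project's code, and the function name is an assumption. The word is written with escapes so that the code points are unambiguous.

```python
def ipa_to_decimal(word):
    """Return the Unicode code point of every character in the string."""
    return [ord(ch) for ch in word]


# The Abaza form for PERSON from the slide.
abaza_person = "\u0295\u02b7\u0268\u02a7\u02bc\u02b2\u02b7\u0295\u02b7\u0268s"
decimal = ipa_to_decimal(abaza_person)
# decimal == [661, 695, 616, 679, 700, 690, 695, 661, 695, 616, 115]

# The reverse direction recovers the original string.
restored = "".join(chr(n) for n in decimal)
```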
ASJP: Automatic Reconstruction
209
Lexical items: transcription
Second year of project (2008-9):
1. automatic conversion IPA to integer (Python)
2. (semi-)automatic recoding to ASJPcode:
transduction on the basis of a formal grammar
Why not run on full IPA??
ASJP: Automatic Reconstruction
210
Lexical items: transcription
Second year of project (2008):
1. automatic conversion IPA to integer (Python)
2. (semi-)automatic recoding to ASJPcode:
transduction on the basis of a formal grammar
Caucasian: correlations IPA ~ ASJP > 0.9
ASJP: Automatic Reconstruction
211
Lexical items: transcription
Second year of project (2008):
1. automatic conversion IPA to integer (Python)
2. (semi-)automatic recoding to ASJPcode:
transduction on the basis of a formal grammar
- correlations IPA ~ ASJP > 0.9
- but: ASJP better fit with classifications
  → IPA too specific
ASJP: Automatic Reconstruction
212
Lexical items: transcription
IPA:
ʕʷɨʧʼʲʷʕʷɨs
Decimal:
661,695,616,679,700,690,695,661,695,616,115
‘a’ <- 661, 895, 416, …
formal grammar
ASJP++code: ( = any unicode subset )
ASJP: Automatic Reconstruction
213
Lexical items: transcription
IPA:
ʕʷɨʧʼʲʷʕʷɨs
Decimal:
661,695,616,679,700,690,695,661,695,616,115
‘a’ <- 661, 895, 416, …
…
C [-V] <- C [+V] / - #
C [+V] <- C [-V, +PL] / - C [+V]
formal grammar
ASJP++code: ( = any unicode subset )
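
The two context rules above can be read as rewrite rules of the form "output <- input / - context". Under that reading, which is an assumption, the first devoices a voiced consonant at the end of a word and the second voices a voiceless plosive before a voiced consonant. The sketch below applies both to plain strings; the small feature tables are hypothetical and cover only a handful of symbols.

```python
# Hypothetical feature tables for a few symbols.
DEVOICE = {"b": "p", "d": "t", "g": "k", "z": "s"}   # C[+V] -> C[-V]
VOICE_PLOSIVE = {"p": "b", "t": "d", "k": "g"}       # C[-V, +PL] -> C[+V]
VOICED_CONSONANTS = {"b", "d", "g", "z", "m", "n", "l", "r"}


def final_devoicing(word):
    """C [-V] <- C [+V] / - #  (voiced consonant before a word boundary)."""
    if word and word[-1] in DEVOICE:
        return word[:-1] + DEVOICE[word[-1]]
    return word


def voicing_assimilation(word):
    """C [+V] <- C [-V, +PL] / - C [+V]  (plosive before a voiced consonant)."""
    out = []
    for i, ch in enumerate(word):
        following = word[i + 1] if i + 1 < len(word) else ""
        if ch in VOICE_PLOSIVE and following in VOICED_CONSONANTS:
            out.append(VOICE_PLOSIVE[ch])
        else:
            out.append(ch)
    return "".join(out)


def transduce(word):
    """Apply the two rules in the order in which they are listed."""
    return voicing_assimilation(final_devoicing(word))


examples = {w: transduce(w) for w in ["tad", "apda", "akdab"]}
```

A full transducer would also need the many-to-one symbol table (the ‘a’ <- … rule) and a way to order or combine rules; neither is specified on the slides.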
ASJP: Automatic Reconstruction
214
Lexical items: transcription
IPA:
ʕʷɨʧʼʲʷʕʷɨs
Decimal:
661,695,616,679,700,690,695,661,695,616,115
‘a’ <- 661, 895, 416, …
…
C [-V] <- C [+V] / - #
C [+V] <- C [-V, +PL] / - C [+V]
optimal level
of abstraction
for historical
phonological
reconstruction?
ASJP++code: ( = any unicode subset )
ASJP: Automatic Reconstruction
215
[Diagram: HIST FACTS; GEO GRAPH; ETHN; WALS; EXP; ASJP1; ASJP2; distance matrices; MAP SFTW; STAT SFTW; Phon Invent; Swadesh; Borrowing!; TREE SFTW]
ASJP: Automatic Reconstruction
216
Lexical items: transcription
Family             NLGS   LSP    LDND
Altaic               30   .723   .688
Maya                 34   .325   .195
Afro-Asiatic        128   .147   .172
Trans New-Guinea    148   .294   .325
Niger-Congo         379   .089   .125
** all significant > 0.01
ASJP: Automatic Reconstruction
217
Lexical items: transcription
Family             NLGS   PHON
Altaic               30   .723+
Maya                 34   .325+
Afro-Asiatic        128   .172+
Trans New-Guinea    148   .325+
Niger-Congo         379   .125+
** all significant > 0.01
ASJP: Automatic Reconstruction
218
- Holman, Eric et al. (2008).
Advances in automated language classification.
In Arppe, Antti, Kaius Sinnemäki and Urpo
Nikanne (eds.), Quantitative Investigations in Theoretical
Linguistics, 40-43. Helsinki: University of Helsinki.
- Holman et al. (forthc. 2008)
Explorations in automated language classification.
Folia Linguistica
- Brown et al. (forthc. 2008)
Automated Classification of the World’s languages:
A description of the method and preliminary results
Sprachtypologie und Universalienforschung
- Bakker et al. (2009?)
Using WALS for the ASJP project
ASJP: Automatic Reconstruction
219
email.eva.mpg.de./~wichmann/ASJPHomePage
ASJP: Automatic Reconstruction
220
?
ASJP: Automatic Reconstruction
221