Semantic Parameters of the Grammaticalization of Agreement


Advances in
Automated
Language
Classification
ASJP Consortium
Dik Bakker, Lancaster
Overview
Project:
ASJP (Automated Similarity Judgment Program)
ASJP: Automatic Reconstruction
ASJP consortium members:
Sören Wichmann (Germany; Netherlands)
Viveka Velupillai (Germany)
André Müller (Germany)
Robert Mailhammer (Germany)
Hagen Jung (Germany)
Eric Holman (US)
Anthony Grant (UK)
Dmitry Egorov (Russia)
Pamela Brown (US)
Cecil Brown (US)
Dik Bakker (UK; Netherlands)
Overall goal:
Automatic reconstruction of language relationships
Basis:
Distance matrix between individual languages on the basis of linguistic features
Method:
Lexicostatistics: mass comparison of basic lexical items, extended by typological data
Overview
OVERALL GOAL: Reconstruction of Language Relationships
Derived goals:
- Critical assessment and refinement of existing classifications
- Classification of newly described and unclassified languages
- Search for (ir)regularities in phylogenies
- Testing of hypotheses (e.g. Atkinson et al. 2008; the ‘elbow’ phenomenon)
- Experimental search for the optimal dating method
- Automatic detection of borrowings
Overview
1. The list of basic lexical items
2. Comparing languages
3. Some results: genetic and areal proximity
4. On Inheritance vs Borrowing
5. Conclusions
1. The list of basic lexical items
Lexical items
Word list: Swadesh 100 basic meanings
- Lexicalized in most languages
- Recorded in fieldwork lexicons and grammars
- Inherited rather than borrowed
- Culturally independent
- Stable over time
- Few synonyms
1. I         21. dog      41. nose     61. die      81. smoke
2. you       22. louse    42. mouth    62. kill     82. fire
3. we        23. tree     43. tooth    63. swim     83. ash
4. this      24. seed     44. tongue   64. fly      84. burn
5. that      25. leaf     45. claw     65. walk     85. path
6. who       26. root     46. foot     66. come     86. mountain
7. what      27. bark     47. knee     67. lie      87. red
8. not       28. skin     48. hand     68. sit      88. green
9. all       29. flesh    49. belly    69. stand    89. yellow
10. many     30. blood    50. neck     70. give     90. white
11. one      31. bone     51. breasts  71. say      91. black
12. two      32. grease   52. heart    72. sun      92. night
13. big      33. egg      53. liver    73. moon     93. hot
14. long     34. horn     54. drink    74. star     94. cold
15. small    35. tail     55. eat      75. water    95. full
16. woman    36. feather  56. bite     76. rain     96. new
17. man      37. hair     57. see      77. stone    97. good
18. person   38. head     58. hear     78. sand     98. round
19. fish     39. ear      59. know     79. earth    99. dry
20. bird     40. eye      60. sleep    80. cloud    100. name
Lexical items: further reduction
Early analyses have shown:
- An optimal 40-item subset of the 100 gives the same results
  → Less work
  → Less missing data
  → Faster processing; combinatorial explosion:
    $40 : 100 \approx 3 \times 10^{7} : 2 \times 10^{10}$
Lexical items: stability
Determine the most stable items:
Iteratively discard the least stable item, measured by its variation within genera
(groups with a time depth of about 3500-4000 years; Dryer 2001; 2005)
E.g. Germanic, Romance, …, Mayan, …, Sino-T
Formula: $S = \frac{E - U}{100 - U}$
(E, U: weighted average % of matches for words with equal vs. unequal meanings, Eq vs Uneq)
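A minimal Python sketch of the stability index and the elimination loop described above. It assumes each item's E and U are already available as percentages and stay fixed while other items are dropped; the function names and the (E, U) values in the usage line are hypothetical, not ASJP data.

```python
from typing import Dict, List, Tuple


def stability(e: float, u: float) -> float:
    """S = (E - U) / (100 - U), with E and U given in percent."""
    if u >= 100:
        raise ValueError("U must be below 100")
    return (e - u) / (100 - u)


def most_stable(items: Dict[str, Tuple[float, float]], keep: int) -> List[str]:
    """Repeatedly drop the least stable item until `keep` items remain.

    items maps a meaning to its (E, U) pair. Because (E, U) is treated as fixed,
    this yields the same survivors as a single sort by S.
    """
    remaining = dict(items)
    while len(remaining) > keep:
        worst = min(remaining, key=lambda m: stability(*remaining[m]))
        del remaining[worst]
    # List the survivors from most to least stable
    return sorted(remaining, key=lambda m: stability(*remaining[m]), reverse=True)


# Hypothetical (E, U) values, for illustration only
scores = {"I": (60.0, 10.0), "water": (50.0, 10.0), "cloud": (15.0, 10.0)}
print(most_stable(scores, keep=2))  # ['I', 'water']
```

If E and U were recomputed after each removal (for example because weights depend on the remaining list), the loop would need to refresh `items` on each pass; the slides do not say whether that happens.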
[Figure: items ordered by stability (++ < Stability > -); panels: Ethnologue (Goodman-Kruskal), WALS (Pearson)]
[Figure: the 100 items, with the 40 most stable marked]
Lexical items: transcription
First phase of project (2007):
Problems with full IPA representation of words:
- data entry via keyboard
- simple programming languages (Fortran; Pascal)
→ Recoding to simplified ASJPcode (ASCII only)
ASJPcode:
- 7 vowels
- 34 consonants
  (each sound transcribed with the closest ASJPcode symbol)
- Operators for:
  Nasalization, Labialization, Palatalization, Aspiration, Glottalization
Abaza (Caucasian):
Meaning   IPA            ASJPcode
PERSON    ʕʷɨʧʼʲʷʕʷɨs    Xw~3Cw"yXw~3s
LEAF      bɣʲɨ           bxy~3
SKIN      ʧʷazʲ          Cw~azy~
HORN      ʧʼʷɨʕʷa        Cw"~3Xw~a
NOSE      pɨnʦʼa         p3nc"a
TOOTH     pɨʦ            p3c
Lexical items
Collected to date:
- Over 2100 languages, dialects and proto-languages
- Mean number of items per language: 36.2 (of 40)
Lexical items
Areal distribution (not a sample!):
Americas        27%
Eurasia         23%
Australia/PNG   18%
Austronesia     15%
Africa          14%
Creoles          2%
Artificial       1%
[Map: languages currently sampled]
Lexical items: transcription
Second phase of project (2008):
Problems with full IPA representation solved:
1. automatic conversion of IPA to integers (Python)
2. (semi-)automatic recoding to ASJPcode: transduction on the basis of a formal grammar
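The recoding step can be pictured as a rewriting system that maps IPA sequences onto ASJPcode. The sketch below is a much simpler stand-in, not the project's grammar: greedy longest-match substitution over a small, hypothetical rule table inferred from the Abaza examples (LEAF, SKIN, HORN, NOSE, TOOTH). IPA symbols are written as `\u` escapes so the exact code points are unambiguous.

```python
from typing import Dict

# Hypothetical, partial rule table inferred from the Abaza examples;
# keys are IPA sequences, values are ASJPcode strings.
RULES: Dict[str, str] = {
    "\u02a7\u02bc\u02b7": 'Cw"~',  # ʧʼʷ: multi-symbol rule, wins over single symbols
    "\u02a7": "C",   # ʧ
    "\u02a6": "c",   # ʦ
    "\u0295": "X",   # ʕ
    "\u0263": "x",   # ɣ
    "\u0268": "3",   # ɨ
    "\u02b7": "w~",  # ʷ labialization
    "\u02b2": "y~",  # ʲ palatalization
    "\u02bc": '"',   # ʼ glottalization
    "a": "a", "b": "b", "n": "n", "p": "p", "s": "s", "z": "z",
}


def ipa_to_asjp(word: str, rules: Dict[str, str] = RULES) -> str:
    """Rewrite an IPA string into ASJPcode, always applying the longest matching rule."""
    max_key = max(len(k) for k in rules)
    out, i = [], 0
    while i < len(word):
        for size in range(min(max_key, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += size
                break
        else:  # no rule matched at position i
            raise KeyError(f"no rule for {word[i]!r} in {word!r}")
    return "".join(out)


print(ipa_to_asjp("\u02a7\u02bc\u02b7\u0268\u0295\u02b7a"))  # Cw"~3Xw~a (HORN)
```

Longest match is what lets a context-sensitive rule such as ʧʼʷ override the symbol-by-symbol defaults; a real grammar would need many more such rules.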
Lexical items: transcription
Abaza (Caucasian):
Meaning: PERSON
IPA: ʕʷɨʧʼʲʷʕʷɨs
Decimal (Unicode code points): 661 695 616 679 700 690 695 661 695 616 115
ASJPcode (ASCII codes): 88 119 126 51 67 34 121 119 126 88 119 126 51 115
( = Xw~3Cw"y~Xw~3s)
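The "Decimal" row matches the Unicode code point of each IPA symbol, and the ASJPcode integers match ASCII codes (88 = X, 119 = w, 126 = ~, …). A short Python check of that reading:

```python
# Abaza PERSON, written with escapes so the exact code points are unambiguous
ipa = "\u0295\u02b7\u0268\u02a7\u02bc\u02b2\u02b7\u0295\u02b7\u0268s"  # ʕʷɨʧʼʲʷʕʷɨs

# "Decimal" row: one Unicode code point per IPA symbol
decimal = [ord(ch) for ch in ipa]
print(*decimal)  # 661 695 616 679 700 690 695 661 695 616 115

# ASJPcode is plain ASCII, so its integers decode with chr(); the first four give "Xw~3"
print("".join(chr(n) for n in [88, 119, 126, 51]))  # Xw~3
```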
Why not run on full IPA?
- correlation IPA ~ ASJP > 0.9
- but: ASJP gives a better fit with existing classifications
  → IPA is too specific
Lexical items: transcription
IPA: ʕʷɨʧʼʲʷʕʷɨs
Decimal: 661 695 616 679 700 690 695 661 695 616 115
  ↓ formal grammar
  A → n661, n695, n616, …
  …
  P → QABC
  …
  Z → PQZ
  ↓
ASJP++code (= any Unicode subset):
optimal level of abstraction for historical phonological reconstruction?
2. Comparing languages
Comparing words
LG       I       YOU     WE
ABAZA    sErE    w3rE    Sw~ErE
ABKHAZ   s3      w3      Sw~3
AGUL     zun     wun     cw~un
ABAZA vs ABKHAZ:
I: sErE / s3, LDi = 3
YOU: bErE / w3, LDj = 4
WE: Sw~ErE / Sw~3, LDk = 3
…
LDmean = 3.73

ABAZA vs AGUL:
I: sErE / zun, LDi = 4
YOU: bErE / wun, LDj = 4
WE: Sw~ErE / cw~un, LDk = 4
…
LDmean = 4.37
Comparing words
Levenshtein Distance
a. between 2 words:
Number of transformations (changes, additions) needed to get from the shorter form to the longer one
b. between 2 languages:
E.g. mean LD over the overlapping set of items (<= 40)
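A standard dynamic-programming Levenshtein distance in Python, treating every ASJPcode character (including modifiers such as `~`) as one symbol. Under that assumption it reproduces the per-word values LDi, LDj and LDk given for the Abaza, Abkhaz and Agul pronouns.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


# Pronoun pairs from the comparison slides (ABAZA vs ABKHAZ, then ABAZA vs AGUL)
pairs = [("sErE", "s3"), ("bErE", "w3"), ("Sw~ErE", "Sw~3"),
         ("sErE", "zun"), ("bErE", "wun"), ("Sw~ErE", "cw~un")]
print([levenshtein(x, y) for x, y in pairs])  # [3, 4, 3, 4, 4, 4]
```

Going from the shorter form to the longer one, an optimal edit path never needs deletions (a deletion plus an insertion can always be replaced by one substitution), which is why "changes, additions" suffices in the definition above.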
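Comparing words
Levenshtein Distance
Two problems with simple LD:
1. The value depends on the length of the longer word
   → Normalize: $\mathrm{LDN} = \mathrm{LD} / L_{\max}$
2. Languages differ in phonological overlap (chance similarity)
   → Eliminate ‘noise’: $\mathrm{LDND} = \mathrm{LDN} / \mathrm{LDN}_{\mathrm{different}}$,
   where $\mathrm{LDN}_{\mathrm{different}}$ is the mean LDN of word pairs with different meanings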
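A sketch of LDN and a language-level LDND in Python. It assumes LDN_different is the mean LDN over all pairs of words with different meanings, and it scales by 100 to match the percentage-style values on the slides; the function names and toy word lists are hypothetical.

```python
from itertools import permutations
from statistics import mean
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def ldn(a: str, b: str) -> float:
    """Problem 1: normalize LD by the length of the longer word."""
    return levenshtein(a, b) / max(len(a), len(b))


def ldnd(lang1: Dict[str, str], lang2: Dict[str, str]) -> float:
    """Problem 2: mean same-meaning LDN divided by mean different-meaning LDN, in percent.

    Needs at least two meanings in common.
    """
    common = [m for m in lang1 if m in lang2]
    same = mean(ldn(lang1[m], lang2[m]) for m in common)
    different = mean(ldn(lang1[m], lang2[n]) for m, n in permutations(common, 2))
    return 100 * same / different


# Toy word lists (hypothetical): meaning -> ASJPcode form
print(ldnd({"A": "ab", "B": "cd"}, {"A": "ab", "B": "cx"}))  # 25.0
```

Dividing by the different-meaning baseline means two languages that merely share a small phoneme inventory do not look related: their chance similarity raises the denominator as much as the numerator.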
Comparing languages
Levenshtein Distance for a language pair:
- Mean of all LDNDs of the words in common
- Synonyms (12%), experimental options:
  - take the minimum pair
  - take the mean
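The two synonym options can be sketched as follows. `synonym_ldn` and its `how` parameter are hypothetical names, the word forms are invented, and LDN is recomputed here so the block runs on its own.

```python
from itertools import product
from statistics import mean
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def ldn(a: str, b: str) -> float:
    return levenshtein(a, b) / max(len(a), len(b))


def synonym_ldn(forms1: Sequence[str], forms2: Sequence[str], how: str = "min") -> float:
    """LDN for one meaning when either language lists several forms (synonyms)."""
    scores = [ldn(a, b) for a, b in product(forms1, forms2)]
    if how == "min":
        return min(scores)   # closest synonym pair
    if how == "mean":
        return mean(scores)  # average over all synonym pairs
    raise ValueError("how must be 'min' or 'mean'")


# Hypothetical forms: language 2 lists two synonyms for the same meaning
print(synonym_ldn(["tam"], ["tam", "kolu"], "min"))   # 0.0
print(synonym_ldn(["tam"], ["tam", "kolu"], "mean"))  # 0.5
```

Taking the minimum rewards any shared synonym, while the mean dilutes a match with unrelated alternatives; which suits classification better is exactly what an experimental option would test.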
Comparing languages
AVAR (AVA: NAKH-DAGHESTANIAN > AVAR-ANDIC-TSEZIC) / AGUL (AGL: NAKH-DAGHESTANIAN > LEZGIC)
I      : dun=zun        * LDND=36.6
YOU    : mun=wun        * LDND=36.6
HORN   : tLar=k"arC     * LDND=66.0
FIRE   : c"a=c"a        * LDND= 0.0
FULL   : c"ura=ac"uf    * LDND=66.0   ALT: AGL= ac"ar
NEW    : c"iya=c"EyEr   * LDND=55.0   ALT: AGL= c"ayif
COMMON (LDND < 70) = AGL - AVA: 6 (=15.8% of 38)
LD = 4.01 / LDN = 81.76 / LDND = 89.87
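The COMMON figure is a simple share: meanings whose LDND falls below the threshold, divided by the number of meanings compared. Assuming the six words listed are exactly those below 70, the arithmetic is:

```python
# Per-word LDND values as listed on the Avar/Agul slide
ldnd_values = {"I": 36.6, "YOU": 36.6, "HORN": 66.0, "FIRE": 0.0, "FULL": 66.0, "NEW": 55.0}
compared = 38      # meanings available in both languages, per the slide
threshold = 70.0   # cut-off used on the slide

common = [m for m, v in ldnd_values.items() if v < threshold]
share = 100 * len(common) / compared
print(len(common), f"{share:.1f}%")  # 6 15.8%
```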
Comparing languages
LANG1   LANG2                FAM1           FAM2           LDND
FRENCH  ARPITAN              INDO-EUROPEAN  INDO-EUROPEAN  55.63
FRENCH  GALICIAN             INDO-EUROPEAN  INDO-EUROPEAN  74.49
FRENCH  ARAGONESE            INDO-EUROPEAN  INDO-EUROPEAN  76.16
FRENCH  FRIULIAN             INDO-EUROPEAN  INDO-EUROPEAN  74.64
FRENCH  ROMANSH_SURSILVAN    INDO-EUROPEAN  INDO-EUROPEAN  77.80
FRENCH  ROMANIAN             INDO-EUROPEAN  INDO-EUROPEAN  74.37
FRENCH  LATIN                INDO-EUROPEAN  INDO-EUROPEAN  80.07
FRENCH  CATALAN              INDO-EUROPEAN  INDO-EUROPEAN  71.69
FRENCH  ITALIAN              INDO-EUROPEAN  INDO-EUROPEAN  75.91
FRENCH  PORTUGUESE           INDO-EUROPEAN  INDO-EUROPEAN  74.38
FRENCH  SPANISH              INDO-EUROPEAN  INDO-EUROPEAN  80.91
FRENCH  DANISH               INDO-EUROPEAN  INDO-EUROPEAN  93.11
FRENCH  BERNESE_GERMAN       INDO-EUROPEAN  INDO-EUROPEAN  93.18
FRENCH  CIMBRIAN             INDO-EUROPEAN  INDO-EUROPEAN  94.43
FRENCH  BRABANTIC            INDO-EUROPEAN  INDO-EUROPEAN  95.18
FRENCH  NORTH_FRISIAN_AMRUM  INDO-EUROPEAN  INDO-EUROPEAN  95.30
FRENCH  JAMTLANDIC           INDO-EUROPEAN  INDO-EUROPEAN  94.58
FRENCH  LIMBURGISH           INDO-EUROPEAN  INDO-EUROPEAN  94.78
FRENCH  OLD_HIGH_GERMAN      INDO-EUROPEAN  INDO-EUROPEAN  92.70
FRENCH  PLAUTDIETSCH         INDO-EUROPEAN  INDO-EUROPEAN  95.35
FRENCH  NORTHERN_LOW_SAXON   INDO-EUROPEAN  INDO-EUROPEAN  90.87
FRENCH  STELLINGWERFS        INDO-EUROPEAN  INDO-EUROPEAN  92.85
FRENCH  FRANS_VLAAMS         INDO-EUROPEAN  INDO-EUROPEAN  94.08
3. Some results: genetic and areal proximity
Distance Matrix ($\frac{1}{2} N (N-1)$ language pairs)
       FRE    DUT    GAL    PRT    ENG    …
FRE
DUT    90.93
GAL    71.62  90.00
PRT    74.38  94.61  51.87
ENG    91.17  63.19  91.30  95.18
…
[Full matrix: Excel file]
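A distance matrix over N languages has one cell per unordered pair, N(N-1)/2 in total. The sketch fills such a lower triangle with `itertools.combinations`; `toy_distance` (percentage of shared meanings with non-identical forms) and the word lists are hypothetical stand-ins for LDND and real data, used only to keep the example short.

```python
from itertools import combinations
from typing import Callable, Dict, Tuple

WordList = Dict[str, str]  # meaning -> ASJPcode form


def distance_matrix(langs: Dict[str, WordList],
                    dist: Callable[[WordList, WordList], float]) -> Dict[Tuple[str, str], float]:
    """One distance per unordered language pair (the lower triangle)."""
    return {(a, b): dist(langs[a], langs[b]) for a, b in combinations(langs, 2)}


def toy_distance(x: WordList, y: WordList) -> float:
    """Hypothetical stand-in for LDND: % of shared meanings whose forms differ."""
    common = [m for m in x if m in y]
    return 100 * sum(x[m] != y[m] for m in common) / len(common)


langs = {
    "L1": {"I": "mi", "WATER": "agua"},
    "L2": {"I": "mi", "WATER": "aqua"},
    "L3": {"I": "ja", "WATER": "voda"},
}
matrix = distance_matrix(langs, toy_distance)
print(matrix)  # {('L1', 'L2'): 50.0, ('L1', 'L3'): 100.0, ('L2', 'L3'): 100.0}
```

The quadratic growth of N(N-1)/2 is what makes the number of compared items per pair matter for processing time once the sample reaches thousands of languages.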
Tools for Trees
- Run the data through phylogenetic software such as SplitsTree (www.splitstree.org)
- Choose the most appropriate algorithm (Neighbour Joining for distance data)
[Figure: NeighborJoining tree of Salishan languages (n=30), compared with existing classifications]
NeighborJoining:
- specifically designed for phylogenetic trees
- does NOT assume an equal rate of change
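For illustration, a compact textbook Neighbor-Joining (Q-criterion) in Python. It is not SplitsTree's implementation: ties are broken by input order, and the last edge is split at its midpoint only so that a Newick string can be printed. The five-taxon matrix is a toy example, not ASJP data.

```python
from itertools import combinations
from typing import List, Sequence


def neighbor_joining(labels: List[str], matrix: Sequence[Sequence[float]]) -> str:
    """Build a tree from a distance matrix and return it as a Newick string."""
    # d[x][y]: current distance between nodes; each node is named by its Newick fragment
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    nodes = list(labels)
    while len(nodes) > 2:
        n = len(nodes)
        r = {x: sum(d[x][y] for y in nodes if y != x) for x in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from i and j to their new parent u (rates may differ per branch)
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:g},{j}:{lj:g})"
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    half = d[a][b] / 2  # midpoint of the final edge, for display only
    return f"({a}:{half:g},{b}:{half:g});"


labels = ["a", "b", "c", "d", "e"]
toy = [[0, 5, 9, 9, 8],
       [5, 0, 10, 10, 9],
       [9, 10, 0, 8, 7],
       [9, 10, 8, 0, 3],
       [8, 9, 7, 3, 0]]
print(neighbor_joining(labels, toy))  # ((c:4,(a:2,b:3):3):1,(d:2,e:1):1);
```

The unequal branch lengths to a and b (2 vs 3) show the point made above: NJ does not force sister lineages to change at the same rate.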
Calibration of Method
Calibration: best options, parameters, factors:
A. for pure classification:
- existing classifications (Ethnologue; WALS; mainly the well-documented areas)
- expert knowledge of specific areas
  → divergence ±12% → if resistant: niche!
Calibration of Method
Calibration: best options, parameters, factors:
B. for dating:
- linguistically crucial historic events:
Linguistically crucial events
Date         Historical event                    Linguistic event
c. 250       Goths conquer Dacia                 split of E-W Romance
4th c.       Irish invade Scotland               split of Irish-Scottish Gaelic
5th c.       German kingdoms in W Roman Empire   breakup of W Romance
5th c.       Germans invade Britain              split of English-Frisian
5th-6th c.   Britons flee to Brittany            split of Welsh-Breton
400-600      Hieroglyphic evidence               Ch'olan begins to split
768-814      Name of Charlemagne attested        Proto-Slavic
ASJP: Automatic Reconstruction
125
Calibration of Method
Calibration: best options, parameters, factors:
B. for dating:
- linguistically crucial historic events
→ Standard formula (Swadesh):
TimeDepth = log(Similarity) / (2 log Retention)
→ Standard formula with LDND (Similarity = 1 − LDND):
TimeDepth = log(1 − LDND) / (2 log Retention)
Linguistically crucial events
Time (1000 yrs ago)   Linguistic event                  LDND     Ret
1.75                  split of E-W Romance              0.6753   0.73
1.65                  split of Irish-Scottish Gaelic    0.6687   0.72
1.55                  breakup of W Romance              0.6411   0.72
1.55                  split of English-Frisian          0.6574   0.71
1.50                  split of Welsh-Breton             0.5705   0.75
1.40                  Ch'olan begins to split           0.5369   0.76
1.21                  Proto-Slavic                      0.5877   0.69
MEAN                                                             0.73
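The Ret column can be recomputed by solving the dating formula for Retention. A minimal Python sketch, assuming the slide's Similarity is read as 1 − LDND: with that reading every Ret value in the table is reproduced to two decimals, whereas plugging in LDND itself would give about 0.89 for E-W Romance. The dictionary and function names are illustrative only.

```python
import math

# (time depth in millennia, LDND as a fraction) per calibration event,
# copied from the "Linguistically crucial events" table.
CALIBRATION = {
    "split of E-W Romance": (1.75, 0.6753),
    "split of Irish-Scottish Gaelic": (1.65, 0.6687),
    "breakup of W Romance": (1.55, 0.6411),
    "split of English-Frisian": (1.55, 0.6574),
    "split of Welsh-Breton": (1.50, 0.5705),
    "Ch'olan begins to split": (1.40, 0.5369),
    "Proto-Slavic": (1.21, 0.5877),
}


def retention(time_depth, ldnd):
    """Solve t = log(1 - LDND) / (2 log r) for the retention rate r."""
    similarity = 1.0 - ldnd  # assumption: similarity read as 1 - LDND
    return math.exp(math.log(similarity) / (2 * time_depth))


rates = {event: retention(t, d) for event, (t, d) in CALIBRATION.items()}
mean_rate = sum(rates.values()) / len(rates)

for event, r in rates.items():
    print(f"{event:32s} {r:.2f}")
print(f"mean retention: {mean_rate:.3f}")
```

Averaging the unrounded rates gives about 0.725, which agrees with the reported mean of 0.73 within rounding.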
Calibration of Method
Calibration: best options, parameters, factors:
B. for dating:
- linguistically crucial historic events:
- Standard formula, with the calibrated mean retention:
TimeDepth = log(1 − LDND) / (2 log 0.73)
Retention: 73% < 75% < 85%
→ Deeper!
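Once the retention rate is fixed at 0.73, dating a split is a one-line computation. A hedged sketch under the same assumptions as the calibration (Similarity = 1 − LDND, time depth in millennia as in the calibration table); `time_depth` is a hypothetical helper, not part of any ASJP software.

```python
import math


def time_depth(ldnd, retention=0.73):
    """Time depth in millennia from an LDND distance given as a fraction.

    Uses t = log(1 - LDND) / (2 log r); only defined for 0 <= LDND < 1.
    """
    if not 0.0 <= ldnd < 1.0:
        raise ValueError("LDND must lie in [0, 1) for this formula")
    if not 0.0 < retention < 1.0:
        raise ValueError("retention must lie in (0, 1)")
    return math.log(1.0 - ldnd) / (2 * math.log(retention))


# Example: the E-W Romance calibration pair (LDND = 0.6753).
print(f"{time_depth(0.6753):.2f} millennia")
```

For the E-W Romance pair this gives about 1.79 millennia, against the historical anchor of 1.75.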
Glottochronology only?
Calibration of method:
Glottochronology: all based on lexical distance
→ Add other linguistic domains …
→ WALS Typological database
Best result:
(75% × 40 lexical items) + (25% × 40 Ph/M/S features, i.e. phonological, morphological and syntactic)
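The best-performing mix combines a lexical and a typological distance with 75/25 weights. A minimal sketch, assuming both distances are scaled to [0, 1] and taking the typological distance as the share of differing values among the WALS features coded for both languages. That definition, the feature codes and the function names are assumptions for illustration, not the ASJP procedure.

```python
def typological_distance(feats_a, feats_b):
    """Share of features coded for both languages whose values differ."""
    shared = feats_a.keys() & feats_b.keys()
    if not shared:
        raise ValueError("the two languages share no coded features")
    return sum(feats_a[f] != feats_b[f] for f in shared) / len(shared)


def combined_distance(lexical, typological, lexical_weight=0.75):
    """Weighted mix of a lexical and a typological distance (both in [0, 1])."""
    return lexical_weight * lexical + (1.0 - lexical_weight) * typological


# Hypothetical feature codes and values, for illustration only.
lang_x = {"F1": "SOV", "F2": "large", "F3": "none"}
lang_y = {"F1": "SOV", "F2": "small", "F4": "yes"}
typ = typological_distance(lang_x, lang_y)  # 1 of 2 shared features differs
print(f"{combined_distance(0.8, typ):.3f}")
```

Comparing only the features coded for both languages keeps gaps in WALS coverage from counting as differences.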
4. On Inheritance vs Borrowing
Inherited or borrowed?
AVAR (AVA) / AGUL (AGL)
I      dun = zun         * LDND = 36.6
YOU    mun = wun         * LDND = 36.6
HORN   tLar = k"arC      * LDND = 66.0
FIRE   c"a = c"a         * LDND =  0.0
FULL   c"ura = ac"uf     * LDND = 66.0
NEW    c"iya = c"EyEr    * LDND = 55.0
→ 6 items < 70.0 → Genetically related!!
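The per-item values build on word-level Levenshtein distance. Assuming the usual ASJP definitions (LDN: Levenshtein distance divided by the length of the longer word; LDND: LDN further divided by the mean LDN of word pairs with different meanings), the sketch below computes LDN and then counts items under the 70.0 threshold using the LDND values reported on the slide. It treats every character, including ASJP modifier symbols such as `"`, as one segment, which is a simplification; the helper names are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-symbol insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def ldn(a, b):
    """Levenshtein distance normalised by the length of the longer word."""
    return levenshtein(a, b) / max(len(a), len(b))


def count_close_items(item_ldnd, threshold=70.0):
    """Number of list items whose LDND value falls below the threshold."""
    return sum(1 for v in item_ldnd.values() if v < threshold)


# Per-item LDND values reported on the AVAR / AGUL slide.
avar_agul = {"I": 36.6, "YOU": 36.6, "HORN": 66.0,
             "FIRE": 0.0, "FULL": 66.0, "NEW": 55.0}
print(count_close_items(avar_agul))
```

For dun/zun the LDN is 1/3 (one substitution over three symbols); for tLar/k"arC it is 3/5.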
Inherited or borrowed?
SPANISH (SPA) / CHAMORRO (CHA)
ONE      uno = unu            * LDND = 36.9
TWO      dos = dos            * LDND =  0.0
PERSON   persona = petsona    * LDND = 55.3
STAR     estreya = estrecas   * LDND = 61.2
NIGHT    noCe = noces         * LDND = 68.2
NEW      nuevo = nueba        * LDND = 44.2
→ 6 items < 70.0: RELATED???
NO!!! INDO-EUROPEAN <> AUSTRONESIAN
CHANCE? < 5% (i.e., 1–2 items)
→ BORROWING through LANGUAGE CONTACT
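How many of the 40 items could match by accident? A sketch, assuming a 40-item list, independent items, and a per-item chance-match probability of at most 5%. Under this binomial model about 2 accidental matches are expected, and 6 or more occur with a probability of roughly 1.4%. Because that probability grows with p, taking p = 0.05 gives an upper bound. The binomial model is an illustration and not necessarily the test ASJP applies; `prob_at_least` is a hypothetical helper.

```python
from math import comb


def prob_at_least(k, n=40, p=0.05):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more accidental matches."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


expected_matches = 40 * 0.05          # about 2 items out of 40
p_six_or_more = prob_at_least(6)      # well under 5%
print(f"expected chance matches: {expected_matches:.1f}")
print(f"P(>= 6 chance matches): {p_six_or_more:.3f}")
```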
Inherited or borrowed? → Borrowed!
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
ONE: uno = unu   * LDND = 36.9
SPA <?> CHA → SPA > CHA
fam/gen = 0.24/0.82 > 0.03/0.00
phon pattern fit = 12.00 > 0.67
… > … > …
Borrowing
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
Direction: SPA > CHA

Item     SPA = CHA             LDND   f/g (SPA > CHA)         swF (SPA > CHA)
TWO      dos = dos              0.0   0.62/1.00 > 0.12/0.00   100.00 > 0.22
STAR     estreya = estrecas    61.2   0.17/0.82 > 0.00/0.00   100.00 > 4.44
NIGHT    noCe = noces          68.2   0.23/0.55 > 0.04/0.00   100.00 > 0.10
NEW      nuevo = nueba         44.2   0.50/0.64 > 0.04/0.00     4.27 > 0.03
PERSON   persona = petsona     55.3   0.20/0.64 > 0.01/0.00    32.40 > 0.13
         ALT: CHA = taotao (0.41/0.00)
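The borrowing slides compare indicators for both sides and point the arrow away from the side with the higher values. A minimal sketch of that comparison as a unanimity rule: the donor is the side where the fam, gen and fit values are all higher; otherwise the case is left open. Reading fam/gen as the share of the family/genus with a similar form, and treating "phon pattern fit" and swF as the same score, are assumptions; the rule and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SideEvidence:
    """Indicators for one language's form (interpretations are assumptions)."""
    fam: float  # assumed: share of the language's family with a similar form
    gen: float  # assumed: share of the language's genus with a similar form
    fit: float  # phonological pattern fit (swF) of the form in that language


def likely_donor(name_a, a, name_b, b):
    """Return the side on which every indicator is higher, or None if mixed."""
    if a.fam > b.fam and a.gen > b.gen and a.fit > b.fit:
        return name_a
    if b.fam > a.fam and b.gen > a.gen and b.fit > a.fit:
        return name_b
    return None


# Values from the SPA / CHA borrowing slides: (fam, gen, fit) per side.
ITEMS = {
    "ONE": ((0.24, 0.82, 12.00), (0.03, 0.00, 0.67)),
    "TWO": ((0.62, 1.00, 100.00), (0.12, 0.00, 0.22)),
    "STAR": ((0.17, 0.82, 100.00), (0.00, 0.00, 4.44)),
    "NIGHT": ((0.23, 0.55, 100.00), (0.04, 0.00, 0.10)),
    "NEW": ((0.50, 0.64, 4.27), (0.04, 0.00, 0.03)),
    "PERSON": ((0.20, 0.64, 32.40), (0.01, 0.00, 0.13)),
}

for item, (spa, cha) in ITEMS.items():
    print(item, likely_donor("SPA", SideEvidence(*spa), "CHA", SideEvidence(*cha)))
```

On the values from the slides, all six items come out as SPA > CHA; a mixed profile would be returned as undecided rather than forced into one direction.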
5. Conclusions
Conclusions
- Method for automatic reconstruction of language relationships, using mass comparison of lexical and typological data
- Framework to discuss and correct existing classifications
- Test hypotheses about genetic distances in time
- Locate (and eliminate) potential borrowings
- CORE: incremental lexical database (> 35%)
→ One day soon: online
→ Join and cooperate!!!
Holman et al. (forthc. 2008). Explorations in automated language classification. Folia Linguistica.
Brown et al. (forthc. 2008). Automated classification of the world's languages: A description of the method and preliminary results. Sprachtypologie und Universalienforschung.
+ Several working papers
email.eva.mpg.de./~wichmann/ASJPHomePage
?