Prosody: When, where, why?

Processing with Prosody & Predicting Prosody
Taal- en spraaktechnologie
Fall 2005
Jennifer Spenader
1
Today and tomorrow
Today:
1. Why we need to be able to recognize prosody
2. Elements that correlate with prosody in synthetic
speech
Tomorrow
1. How do categories like new-given relate to
choice of lexical and syntactic form?
2. How do we determine the interpretation of
underspecified forms?
2
Structure of Today’s Lecture
1. What makes speech sound good?
2. What role does prosody play in language
understanding?
3. Categories that are relevant to generation of
prosody
• Defining, identifying, operationalizing, implementing, testing
4. How is the information used to generate
natural synthetic speech?
3
What makes good synthetic speech
good?
• Idealized synthetic
speech
• Good synthetic speech
(AT & T’s Crystal)
• BAD synthetic speech
4
Characteristics of good synthetic speech
• Intelligibility
– It should support the listener’s decoding of the
speaker’s message
• Naturalness
– It should follow the rules of discourse and
information structure
• Pleasant to listen to? Friendly sounding?
5
How do we evaluate synthetic
speech?
• Present listeners with samples and ask them
– For their opinion (give a rating, e.g. 1 to 5)
– To compare two samples
– To compare two samples to a third reference
– To ‘type what you hear’
6
Problems with evaluation
• Are all listeners informative subjects?
– consistency (do the scores make sense when taken together)
– reliability (do people’s scores have same range? same mean?)
– native language, experience, etc.?
• What are we judging anyway?
naturalness, understandability, likeableness, coverage,
intelligibility – how are these the same or different?
Slide slightly modified from Tina Bennett (2004)
7
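A minimal sketch of one way to handle the reliability problem above (listeners whose scores differ in range and mean): convert each listener's ratings to z-scores before averaging. The listener names, sample names and ratings are invented for illustration, and z-scoring is only one possible normalization, not a method prescribed by the lecture.

```python
from statistics import mean, pstdev

def normalize_per_listener(ratings):
    """ratings: {listener: {sample: score}} -> same shape, as z-scores."""
    normalized = {}
    for listener, scores in ratings.items():
        mu = mean(scores.values())
        sigma = pstdev(scores.values())
        # A listener who gives every sample the same score carries no
        # ranking information; map all of their scores to 0.0.
        normalized[listener] = {
            sample: 0.0 if sigma == 0 else (score - mu) / sigma
            for sample, score in scores.items()
        }
    return normalized

def mean_score(ratings, sample):
    """Average score for one sample over all listeners who rated it."""
    return mean(s[sample] for s in ratings.values() if sample in s)

# Hypothetical data: both listeners prefer the same sample,
# but they use different parts of the 1-to-5 scale.
ratings = {
    "listener_a": {"sample_good": 5, "sample_bad": 3},
    "listener_b": {"sample_good": 3, "sample_bad": 1},
}
z = normalize_per_listener(ratings)
print(mean_score(ratings, "sample_good"))  # raw mean opinion score
print(mean_score(z, "sample_good"))        # mean after normalization
```

After normalization the two listeners agree exactly, even though their raw scores for the same sample differ by two points.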
Prosody in synthetic speech
– Using the expected accentuation patterns makes
synthetic speech more predictable
• If applications used in real world, e.g. noisy
environments, then we need to have high intelligibility
• (Does it make the message more redundant?)
• Meaning is sometimes affected by prosody
– Important for analysis, for machine translation, etc.
– Ex. (Swedish)
» Jag behöver en biljett. (“I need a ticket.”)
» Jag behöver EN biljett. (“I need ONE ticket.”)
8
What role does prosody play in
language?
• Lexicon
– Some languages make meaningful lexical distinctions with
prosody, e.g. Chinese, even Japanese
• ame ‘candy’ vs. ame ‘rain’
• Syntactic Structure
– Identify constituents or phrases?
• Discourse structure
– Identifies referents, distinguishes given from new
– Identifies contrasts, emphasizes key points?
– Marks topic changes
• Aids in identifying rhetorical relations
9
Prosodic prominence aids
processing
• Word initial phonemes are recognized faster
in words with pitch accent
– (Shields et al. 1974; Cutler & Foss 1977)
– Phoneme identification tasks
• Mispronunciations are recognized faster if the
word has pitch accent
– (Cole et al. 1978, Cole & Jakimik, 1980)
– Words with pitch accent have clearer acoustics
10
Why not just give
everything prosodic
prominence?
– Information theory and coding & “speaker economy”:
• An efficient code has a low average length per message
compared to an inefficient code
• Giving everything prosodic prominence might be helpful
to the hearer but makes things harder for the speaker
• Language is already redundant, and speakers exploit this
11
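The coding argument above can be made concrete with a small worked example. This is a sketch with invented message probabilities, not data from the lecture: a code that spends short codewords on frequent messages has a lower average length than a code that spends the same effort on every message, which is the analogy to accenting everything.

```python
from math import log2

def average_length(probabilities, lengths):
    """Expected codeword length: sum of p_i * n_i."""
    return sum(p * n for p, n in zip(probabilities, lengths))

def entropy(probabilities):
    """Shannon entropy in bits: the lower bound on average length."""
    return -sum(p * log2(p) for p in probabilities if p > 0)

# Hypothetical distribution over four messages.
probs = [0.5, 0.25, 0.125, 0.125]

fixed_code = [2, 2, 2, 2]      # same effort for every message
variable_code = [1, 2, 3, 3]   # short codewords for frequent messages

print(average_length(probs, fixed_code))     # 2.0
print(average_length(probs, variable_code))  # 1.75
print(entropy(probs))                        # 1.75
```

For this distribution the variable-length code reaches the entropy bound, while the uniform code wastes a quarter of a bit per message.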
What does prosody tell us about the
message?
• So far we’ve just said something about
prosody being helpful in decoding and
recognizing words
12
Syntactic form
• Rising and falling fundamental frequency, together with
final lengthening, function as boundary tones
• For many years linguists assumed prosody
mirrored syntactic structure
13
Prosody not isomorphic with syntax
• No one-to-one prosodic correlates of syntactic
structure
– Accepted only fairly recently
• Major syntactic boundaries:
– show greater F0 movement and longer segmental durations
• Major syntactic boundaries may be accurately
located from prosodic information alone
– (Collier & ’t Hart, 1975)
• How good is Crystal?
– Note break before “during”
14
Prosody disambiguates local
ambiguities
Ex.
– John believes Mary implicitly.
– John believes Mary to be a professor.
• Prosody helps online processing
– Ex.
– Earlier my sister took a dip/in the pool/at the club/on the
hill.
– Grosjean (1983) Subjects could distinguish whether the
target word “dip” was followed by zero, three, or six more
words.
• Language specific: French listeners couldn’t do more
than recognize sentence finality of English sentences
15
Information structure
• From Eady and Cooper (1986) (a version of the “Question
Test” of Hajičová et al. 1995)
• Ex. George has flowers for Mary.
1. Who has flowers for Mary?
2. What does George have for Mary?
3. Who does George have flowers for?
• Depending on the question (= context), different
words will receive phonetic focus.
16
Listeners actively search
for sentence focus
Cutler (1976) Phoneme monitoring task
(listen for a particular phoneme, e.g. /d/)
1. That summer four years ago I ate roast DUCK for the first
time.
2. That summer four years ago I ate roast duck for EVERY
MEAL.
• “duck” edited out and replaced by neutral version
• Subjects faster in recognizing target word’s phoneme in
context where it would have been focused
17
Prosodic prominence also triggers
extra semantic processing
• Homophones (Dutch: “gelijkklinkend woord”)
– hart (“heart”) vs. hard (“hard”), (de) bal (“ball”, the object) vs. (het) bal (“ball”, the dance)
• Blutner & Sommer (1988)
– If a homophone (a word with several meanings) is
focused, its multiple meanings are activated
– Unaccented activates only the contextually correct
interpretation
18
Why deaccent and accent?
• Let your hearer know what’s important!
– New-given: new items receive accent, given items are
deaccented
– Receive accent: the stressed syllable is produced so that it
coincides with an F0 maximum…
• As well as longer duration, increased intensity?
– Be deaccented:
– Get cliticized. Clitic: an unstressed word that cannot
stand on its own and attaches in pronunciation to a
stressed word, with which it forms a single accentual unit.
• the pronoun 'em in I see 'em
• the definite article in French l'arme, “the weapon”
– (modified from Free Online Dictionary)
19
Sentence processing is sensitive to
new-given
• Response to comprehension task better with correct
new-given prosody
– (Bock & Mazzella, 1983)
• Simple definition of new-given
– First occurrence = NEW
– Second occurrence = GIVEN
20
Correct accenting
• Mark “new information” by using the question test
form (Hajičová et al. 1995)
– Ex.
– Who won the lottery?
– It was won by a phonologist.
(Target phonemes: /b/ or /f/)
• Cutler & Fodor (1979): phoneme identification
is faster when the word containing the target
phoneme is the focused word.
21
Correct deaccenting
• Verification of given information in pictures
faster when given information deaccented
– When this information was accented reaction
times became longer
– (Terken & Nooteboom 1987)
22
Do speakers deaccent to distinguish given information from
new, or do they deaccent because they can?
23
New-given: how defined
• Actually until now we just used a simple
definition, repetition of same word form
• This is also the type of data used in most
testing
• But surely there is more to new-given!
24
25
When is
something
“given”?
26
When is something given?
• Threshold and scope of givenness
– How does an item become given?
– Same word earlier?
– Reference to same referent earlier?
– Reference to same concept earlier?
– How much earlier? Is 6 pages/20 minutes earlier
too long ago? How long does something remain
given?
27
Theories: Threshold and scope
• Chafe (1976)
– Scope of givenness depends on number of
intervening concepts, number of words. Change of
topic might remove given items from
consciousness
• Grosz & Sidner (1981)
– Local focus: items that are now in focus, stored in a
stack; these are “popped” at topic change
– Global focus items are always given: references to
the topic of the article or conversation
28
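The local/global focus idea above can be sketched as a small data structure. This is an illustrative reading of the slide, not Grosz & Sidner's actual formalism: the class name, method names and the choice to keep one set of items per open topic segment are all assumptions.

```python
class FocusState:
    """Tracks which items count as given under a two-level focus model."""

    def __init__(self, global_items):
        # Global focus: always given (e.g. the topic of the article).
        self.global_focus = {item.lower() for item in global_items}
        # Local focus: a stack of item sets, one per open topic segment.
        self.local_stack = [set()]

    def mention(self, item):
        """Record a mention in the current topic segment."""
        self.local_stack[-1].add(item.lower())

    def push_topic(self):
        """Open a new (sub)topic segment."""
        self.local_stack.append(set())

    def pop_topic(self):
        """Close the current segment; its items stop being given."""
        if len(self.local_stack) > 1:
            self.local_stack.pop()
        else:
            self.local_stack[0].clear()

    def is_given(self, item):
        key = item.lower()
        return key in self.global_focus or any(
            key in segment for segment in self.local_stack
        )


state = FocusState(global_items=["prosody"])
state.mention("boat")
print(state.is_given("boat"))     # True: mentioned in the current segment
state.pop_topic()                 # topic change
print(state.is_given("boat"))     # False: popped with its segment
print(state.is_given("prosody"))  # True: global focus survives topic changes
```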
Experimental evidence threshold and
scope
• Terken & Nooteboom (1987) studied radio
program speech. Mentioning a word once
was enough for the item to be deaccented for
the rest of the program
• If deaccentuation in this situation corresponds
to givenness, then givenness is established
after one mention
29
Inheritance of givenness
• Can items be considered given even if the
same exact surface form wasn’t used before?
• Referents are given or new, not the words
used to refer to them!
– E.g. purse - handbag
30
Deaccentuation of given forms or given
concept?
• Donselaar (1995a)
– Ship-boat vs. boat-boat
– Subjects asked to make true-false judgements
about spoken sentences
Ex. The millionaire bought a surprise for his wife. He
gave her a boat/ship/mink.
The wife UNEXPECTEDLY got a BOAT/boat.
– BOAT: accented, or not accented.
– Sentences with unaccented synonyms verified
more quickly than sentences with accented synonyms
• No difference for same word
31
Chafe (1976) Inheritance patterns
• Generic concept → specific instance
– I don’t like Norwegians. I met a Norwegian
yesterday.
– I met a Norwegian yesterday. I don’t like
Norwegians.
• Specific concepts imply more general
concepts if the distance is not more than one
step
– Table → furniture
– Mentioning furniture does not make tables given
32
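The one-step, one-direction inheritance pattern (table makes furniture given, but not the reverse) can be sketched as follows. The `HYPERNYM` table is a hypothetical stand-in for a real lexical resource, and the sketch covers only the specific-to-general pattern, not the generic/instance examples.

```python
# Hypothetical one-step "is-a" links, invented for illustration.
HYPERNYM = {
    "table": "furniture",
    "chair": "furniture",
    "furniture": "artifact",
}

def is_given(word, mentioned):
    """A word is given if it was mentioned itself, or if it is the
    direct (one-step) hypernym of something that was mentioned."""
    if word in mentioned:
        return True
    return any(HYPERNYM.get(m) == word for m in mentioned)

print(is_given("furniture", {"table"}))  # True: one step up from "table"
print(is_given("table", {"furniture"}))  # False: inheritance does not go down
print(is_given("artifact", {"table"}))   # False: two steps is too far
```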
When is something new?
1. We just bought a new house. The roof
needs repairing.
2. We just bought a new house. The sauna is
fabulous!
33
Summary
• Experimental results show that correct
prosody aids in processing
• Incorrect prosody makes processing harder
• Getting the prosody right should greatly
increase the intelligibility and naturalness of
synthetic speech
34
Predicting prosody
What do we expect to be accented or
deaccented?
35
Development of TTS
• Original TTS systems: used one of two
strategies
– Accent all open class words, and deaccent all
closed class words
• This results in too many accents
– Accent the last open class word in a phrase
• Deaccent everything else
• This sounds terrible for many languages, though is “OK”
for English
36
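The two baseline strategies of early TTS systems can be sketched in a few lines. The closed-class word list here is a tiny invented sample, not the list used by any real system, and capital letters mark accent as in the examples in these slides.

```python
# Tiny illustrative closed-class list (an assumption, far from complete).
CLOSED_CLASS = {"the", "a", "an", "in", "on", "of", "to", "and",
                "is", "he", "she", "it"}

def accent_open_class(words):
    """Strategy 1: accent every open-class word, deaccent the rest."""
    return [w.lower() if w.lower() in CLOSED_CLASS else w.upper()
            for w in words]

def accent_last_open_class(words):
    """Strategy 2: accent only the last open-class word of the phrase."""
    out = [w.lower() for w in words]
    for i in range(len(out) - 1, -1, -1):
        if out[i] not in CLOSED_CLASS:
            out[i] = out[i].upper()
            break
    return out

phrase = "the fox walks to the kitchen".split()
print(" ".join(accent_open_class(phrase)))
# the FOX WALKS to the KITCHEN
print(" ".join(accent_last_open_class(phrase)))
# the fox walks to the KITCHEN
```

Even on a six-word phrase the first strategy accents half of the words, which illustrates the "too many accents" complaint.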
37
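Vos en Haas: 1
(Sylvia Vanden Heede, illustrations Thé Tjong-Khing)
• Koekboek
Haas is niet thuis. Vos hangt lui in de stoel. Hij
heeft nergens zin in. Of toch wel. Hij heeft zin
in iets lekkers. Koek of zo. Iets ZOETS. Is er
nog koek? Vast wel. Vos loopt naar de
keuken. Hij doet de kast open. Daar staat
de koektrommel. Maar er zit bijna niks meer
in. Drie kleine koekjes! En hoop kruimels.
(English gloss: Hare is not home. Fox lounges lazily in the chair. He
doesn’t feel like doing anything. Or maybe he does. He feels like
something tasty. Cake or something. Something SWEET. Is there any
cake left? Surely. Fox walks to the kitchen. He opens the cupboard.
There is the cookie tin. But there is almost nothing left in it.
Three small cookies! And a pile of crumbs.)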
38
Use other strategies
• Information structure
– Identify new-given information
– Accent new information, deaccent given
information
• Identify contrasted elements
– Emphasize them
• Identify most important part of message
– Focus this
39
Hirschberg (1993)
• Algorithm to assign pitch accent
– Implemented in NewSpeak, Bell Laboratories TTS system
– Input: unrestricted text, output: tagged text
– Used FM Radio texts, ATIS texts and ??? to predict accent
– Closed-open class word strategy gets 85% of accents right
in FM Radio texts
• Tendency for news readers to accent final phrase content
words even though most people would not
– E.g. TRIAL lawyer vs. TRIAL LAWYER
40
Not all function words deaccented
• “and” as a conjunction vs. “and” as discourse
particle (Example from Hirschberg, 1992)
1. They left after lunch AND landed in France in
time for dinner.
2. ?? They left after lunch. AND, they landed in France
in time for dinner.
41
NewSpeak’s treatment of closed-class items
• Three categories
1. Closed-class and frequently deaccented
• Possessive pronouns, definite and indefinite
articles, copulas, coordinating and subordinating
conjunctions, existential “there”, have, accusative
pronouns, most prepositions, positive modals,
positive do, as well as certain adverbials,
nominative they, nominative and accusative it,
some nominal pronouns (e.g. something)
42
Commonly accented closed class
items
• Negative article, negative modals, negative do, most
nominal pronouns, most nominative and all reflexive
pronouns, pre-quantifiers (e.g. all), post-determiners
(e.g. next), nominal adverbials (e.g. here),
interjections, particles, most wh-words, plus some
prepositions
43
Not all content words are accented
• Complex nominals
– CAMPAIGN promise
– MASSACHUSETTS BAR Association
– Semantico-syntactic structure maps to differences
in stress assignment
– Some stress to left, some to right.
44
Identifying new-given information
• Harder to “tag” for information structure than
it is to construct your own examples
• For each word
– Identify its root
– If root is already mentioned in the context, treat as
given
– If root isn’t mentioned in context, treat as new
– Context = local context, should coincide with
topics
45
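The per-word procedure above can be sketched as follows. The `crude_root` function is a hypothetical stand-in for a real stemmer and will misanalyse many words, and treating each topic segment as a separate list is an assumption about how "local context" is delimited.

```python
def crude_root(word):
    """Hypothetical stand-in for a real stemmer: lowercase, drop
    punctuation, strip a few English suffixes. Error-prone by design."""
    w = word.lower().strip(".,!?;:")
    for suffix in ("ation", "ing", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w

def tag_new_given(segments):
    """segments: list of topic segments, each a list of words.
    Returns (word, 'NEW' | 'GIVEN') pairs; the context is emptied
    at every segment boundary."""
    tagged = []
    for segment in segments:
        context = set()  # roots mentioned so far in this local context
        for word in segment:
            root = crude_root(word)
            tagged.append((word, "GIVEN" if root in context else "NEW"))
            context.add(root)
    return tagged

text = [
    ["We", "inform", "you", "with", "information"],  # one topic segment
    ["More", "information"],                         # new topic segment
]
for word, status in tag_new_given(text):
    print(word, status)
```

In the first segment "information" comes out as GIVEN because it shares the root of "inform"; in the second segment it is NEW again because the local context was reset. The sketch tags every word, so a real system would still need to combine this with the open/closed-class distinction.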
Vos en Haas: 2
(Sylvia Vanden Heede, illustrations Thé Tjong-Khing)
• Koekboek
Haas is niet thuis. Vos hangt lui in de stoel. Hij
heeft nergens zin in. Of toch wel. Hij heeft zin
in iets lekkers. Koek of zo. Iets ZOETS. Is er
nog koek? Vast wel. Vos loopt naar de
keuken. Hij doet de kast open. Daar staat de
koektrommel. Maar er zit bijna niks meer in.
Drie kleine koekjes! En hoop kruimels.
46
Content vs. form words
• Hirschberg (1992)
– If a word has the same root as a word in the local
focus stack, it is treated as given
• Ignores synonyms! Introduces errors because roots can’t
always be identified easily
• Horne et al. (1993) do the same thing but use a network
of synonyms and hyponyms to identify given concepts when
the referential form is different
– inform and information same root!
– Koek and koekjes same root!
47
Contrastive elements
• NewSpeak: contrastiveness within a complex
noun identified
– If part of the complex noun is given, while others
are new, then the new items are contrastive
– TRIAL\N lawyer\N vs. CRIMINAL\contrastive
lawyer\G
48
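The contrastiveness rule for complex nouns can be sketched as a small tagging function. The tag names N, G and contrastive follow the example above; the function name and the exact-match test for givenness are assumptions, not NewSpeak's implementation.

```python
def tag_complex_nominal(parts, given_words):
    """parts: the words of one complex noun, e.g. ["criminal", "lawyer"].
    given_words: lowercase words already given in the context.
    If some parts are given and others are new, the new parts are
    tagged as contrastive; otherwise parts are simply N or G."""
    is_given = [p.lower() in given_words for p in parts]
    mixed = any(is_given) and not all(is_given)
    tagged = []
    for part, given in zip(parts, is_given):
        if given:
            tagged.append((part, "G"))
        elif mixed:
            tagged.append((part, "contrastive"))
        else:
            tagged.append((part, "N"))
    return tagged

context = set()
first = tag_complex_nominal(["trial", "lawyer"], context)
context.update(["trial", "lawyer"])
second = tag_complex_nominal(["criminal", "lawyer"], context)
print(first)   # [('trial', 'N'), ('lawyer', 'N')]
print(second)  # [('criminal', 'contrastive'), ('lawyer', 'G')]
```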
Focused elements
• Something the speaker considers particularly
important
– ZOETS and LEEG in Koekboek?
49
Certain closed class words almost
always get focused
• Negative adverbials
Haas is niet thuis. Vos hangt lui in de stoel. Hij
heeft nergens zin in.
Maar er zit bijna niks meer in.
50
Vos en Haas: 3
(Sylvia Vanden Heede, illustrations Thé Tjong-Khing)
• Koekboek
Haas is niet thuis. Vos hangt lui in de stoel. Hij
heeft nergens zin in. Of toch wel. Hij heeft zin
in iets lekkers. Koek of zo. Iets ZOETS. Is er
nog koek? Vast wel. Vos loopt naar de
keuken. Hij doet de kast open. Daar staat
de koektrommel. Drie kleine koekjes! En
hoop kruimels.
51
Local and Global Focus
• Hirschberg considers all words in the first
sentence to be in Global focus
• Local focus: experimented with sentences
and paragraph boundaries
– Paragraph boundaries best
52
Remains: how to realize
prominence?
• Should all prominences be realized the same
way?
• Sentences also often have general pattern of
F0 movement…
53
Associate different statuses with
prosodic correlates
OPEN RESEARCH QUESTIONS:
• Is prosodic focus the same for each
category?
• Does prosodic focus differ in form if it is within
or at the end of a sentence?
• Is there a special type of contrastive focus
that is different from the focus on new items?
54
Tomorrow
1. How do categories like new-given relate to
choice of lexical and syntactic form?
• Definite vs. indefinite forms
• Marking with particles
• In depth look at one theory of new-given (Prince 1981)
2. How do we determine the interpretation of
underspecified forms?
• Resolution of anaphoric reference
• Interpretation of bridging NPs
– How lexical relationships help
55
References
• Hirschberg, J. (1992). Using discourse
context to guide pitch accent decisions in
synthetic speech. In G. Bailly, C. Benoît,
and T. R. Sawallis (Eds.), Talking Machines:
Theories, Models, and Designs. Elsevier
Science Publishing.
• Cutler, A., Dahan, D., & van Donselaar, W. (1997).
Prosody in the Comprehension of Spoken
Language: A Literature Review. Language and
Speech, 40(2), 141–201.
56