Paolo Baggia, Loquendo W3C SSML Workshop Beijing – 2-3 Nov 2005 Pronunciation Lexicon Background.


Paolo Baggia, Loquendo
W3C SSML Workshop
Beijing – 2-3 Nov 2005
Pronunciation Lexicon
Background
Overview
• Introduction to Pronunciation Lexicon
• Pronunciation Alphabets
• The PLS language
• Issues for the workshop
W3C SSML workshop
2-3 Nov 05 - Beijing
Introduction to Pronunciation Lexicon Specification
• The PLS spec is about “Pronunciation Lexicon”:
– How to pronounce words and phrases
– How to deal with the variability of pronunciations by country, region,
person, etc.
– How to spell abbreviations and acronyms
• Two main uses:
– Speech Synthesis (SSML documents)
– Speech Recognition (SRGS grammars)
– Other uses are possible (embedded or referenced in other mark-up)
The TTS perspective
• A TTS engine’s job is to transform an “input text” into speech; this involves many processing steps, including:
– Text normalization
– Word pronunciation (lexical stress, phonetic transcription)
– Sentence structure (intonation, rhythm)
– Sentence-level modification of the phonetic transcription (co-articulation)
– Computation of prosodic parameters
– Generation of the acoustic signal
• SSML documents enable TTS enhancement, acting on several
levels of processing through SSML markup elements
• PLS enhances SSML in two of these steps: text normalization and phonetic
transcription
An SSML example document
• This is a simple SSML document:
<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.0"
xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
The title of the movie is: "La vita è bella" (Life is beautiful),
which is directed by Roberto Benigni.
</speak>
• This is an enhancement of the same example:
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0"
xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
The title of the movie is:
<phoneme alphabet="ipa" ph="ˈlɑ ˈviːɾə ˈʔeɪ ˈbɛlə">La vita è bella</phoneme>
(Life is beautiful),
which is directed by
<phoneme alphabet="ipa" ph="ɹəˈbɛːɹɾoʊ bɛˈniːnji">Roberto Benigni</phoneme>
</speak>
An SSML example with PLS
• This is a simple SSML document that references an external
Pronunciation Lexicon:
<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.0"
xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
<lexicon uri="http://www.example.com/movie_lexicon.pls"/>
The title of the movie is: "La vita è bella" (Life is beautiful),
which is directed by Roberto Benigni.
</speak>
• With PLS, all the pronunciation changes are factored out into an external document
• The TTS engine loads the PLS document(s) and applies them
transparently to the SSML document
• An application may define contextual PLS documents to be
used at different points of the interaction
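An external lexicon does the same job as the inline <phoneme> markup of the previous example, but without touching the SSML text. As a rough illustration only, the Python sketch below rewrites plain text into inline SSML <phoneme> elements from a grapheme-to-IPA table. The function name `inline_pronunciations` is hypothetical, and it uses naive substring matching with no word-boundary checks, so it is not how a real TTS engine applies a PLS document.

```python
from xml.sax.saxutils import escape, quoteattr


def inline_pronunciations(text: str, lexicon: dict[str, str]) -> str:
    """Wrap each lexicon grapheme found in text in an SSML <phoneme> element.

    Hypothetical helper: longer graphemes are tried first, so a multi-word
    entry wins over its parts; all other text is copied XML-escaped.
    """
    # Skip empty keys (they would match everywhere) and try longest first.
    keys = sorted((g for g in lexicon if g), key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for grapheme in keys:
            if text.startswith(grapheme, i):
                ph = quoteattr(lexicon[grapheme])  # adds the surrounding quotes
                out.append(f'<phoneme alphabet="ipa" ph={ph}>{escape(grapheme)}</phoneme>')
                i += len(grapheme)
                break
        else:
            out.append(escape(text[i]))  # no entry starts here: copy one character
            i += 1
    return "".join(out)


if __name__ == "__main__":
    movie_lexicon = {
        "La vita è bella": "ˈlɑ ˈviːɾə ˈʔeɪ ˈbɛlə",
        "Roberto Benigni": "ɹəˈbɛːɹɾoʊ bɛˈniːnji",
    }
    print(inline_pronunciations("which is directed by Roberto Benigni.", movie_lexicon))
```

Keeping the pronunciations in a separate table mirrors the design point of the slide: the SSML text stays unchanged, and the lexicon can be swapped per context.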
The ASR perspective
• An ASR engine’s job is to transform an audio signal into a
textual or semantic representation of the meaning of the
sentence
• SRGS grammars constrain the sentences to be
recognized, which improves ASR performance
• PLS further improves ASR performance by providing multiple
pronunciations of words and phrases, expansions of abbreviations,
and text normalization
An SRGS example grammar
• This is a very simple SRGS grammar:
<?xml version="1.0" encoding="ISO-8859-1"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
xml:lang="en-US" version="1.0" root="city_state" mode="voice">
<rule id="city" scope="public">
<one-of> <item>Boston</item>
<item>Miami</item>
<item>Fargo</item> </one-of>
</rule>
<rule id="state" scope="public">
<one-of> <item>Florida</item>
<item>North Dakota</item>
<item>Massachusetts</item> </one-of>
</rule>
<rule id="city_state" scope="public">
<ruleref uri="#city"/> <ruleref uri="#state"/>
</rule>
</grammar>
• The grammar recognizes sentences like:
– “Boston Massachusetts” or “Miami Florida”
but also:
– “Boston Florida” or “Fargo Massachusetts”
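Nothing in the grammar links a city to its own state: the root rule is simply a sequence of two sets of alternatives. The minimal Python sketch below makes the consequence concrete by enumerating every sentence the `city_state` rule accepts. The item lists are transcribed by hand from the grammar rather than parsed from the SRGS XML.

```python
from itertools import product

# Items copied by hand from the "city" and "state" rules above.
CITIES = ["Boston", "Miami", "Fargo"]
STATES = ["Florida", "North Dakota", "Massachusetts"]


def city_state_sentences() -> list[str]:
    """Enumerate every sentence matched by the root rule "city_state".

    The rule is a sequence of two rule references, so the language it
    accepts is the Cartesian product of the two <one-of> lists.
    """
    return [f"{city} {state}" for city, state in product(CITIES, STATES)]


if __name__ == "__main__":
    sentences = city_state_sentences()
    print(len(sentences))                # 9
    print("Boston Florida" in sentences)  # True
```

All nine combinations are accepted, including the geographically wrong ones.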
An SRGS example with PLS
• This is a simple SRGS grammar that references an external
Pronunciation Lexicon:
<?xml version="1.0" encoding="ISO-8859-1"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
xml:lang="en-US" version="1.0" root="city_state" mode="voice">
<lexicon uri="http://www.example.com/city_state.pls"/>
<rule id="city" scope="public">
<one-of> <item>Boston</item>
<item>Miami</item>
<item>Fargo</item> </one-of>
</rule>
<rule id="state" scope="public">
<one-of> <item>Florida</item>
<item>North Dakota</item>
<item>Massachusetts</item> </one-of>
</rule>
<rule id="city_state" scope="public">
<ruleref uri="#city"/> <ruleref uri="#state"/>
</rule>
</grammar>
• The referenced lexicon can provide different pronunciations of the
grammar's words to accommodate many different speakers
PLS allows you…
• to create Pronunciation Lexicons to be used by both ASR
and TTS
• to take into account different usages:
– For TTS: to improve the reading of proper names
– For ASR: to give multiple pronunciations
– For TTS/ASR: to expand abbreviations and acronyms
• to exchange Pronunciation Lexicons between different
applications (interoperability)
• to use contextual Pronunciation Lexicons in different
points of the application
• PLS is being developed as a W3C standard language
PLS saves application developers time and money in
creating good speech applications
Phonetic Alphabets
• To describe the pronunciation of a word/phrase, you need a
phonetic alphabet
• An alphabet contains symbols to represent speech sounds,
as in a dictionary entry, e.g.:
Cracked /krakt/ adj. 1 having cracks. 2 (predic.) slang crazy
• The PLS spec suggests using either:
– a standard pronunciation alphabet, such as IPA
(defined by the International Phonetic Association,
see: http://www2.arts.gla.ac.uk/IPA/index.html)
– other alphabets:
• SAMPA, an ASCII encoding of IPA, and X-SAMPA
• Pinyin, JEITA, etc.
IPA – Chart
• The IPA (International Phonetic Association) was founded in 1886
• It is the major international association of phoneticians
• The IPA alphabet provides symbols that make possible the phonemic
transcription of all known languages
• IPA characters can be encoded in Unicode by supplementing ASCII with
characters from other ranges, particularly:
– IPA Extensions (0250-02AF)
– Latin Extended-A (0100-017F)
• See the detailed charts at:
http://www.unicode.org/charts
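To see which Unicode ranges a transcription actually draws on, a small Python sketch can classify each character by code point. The range table is hand-written and incomplete: besides the two ranges listed above, it adds Spacing Modifier Letters (02B0-02FF), where the IPA stress mark ˈ and length mark ː are encoded. Symbols such as θ come from yet another block (Greek and Coptic) and are reported as "other".

```python
# Hand-written, partial table of Unicode blocks used by IPA transcriptions.
BLOCKS = [
    (0x0250, 0x02AF, "IPA Extensions"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x02B0, 0x02FF, "Spacing Modifier Letters"),  # stress and length marks
]


def block_of(ch: str) -> str:
    """Return the name of the block containing ch, using the table above."""
    cp = ord(ch)
    if cp < 0x80:
        return "Basic Latin (ASCII)"
    for lo, hi, name in BLOCKS:
        if lo <= cp <= hi:
            return name
    return "other"  # any block not in the small table


if __name__ == "__main__":
    for ch in "θɪŋˈ":
        print(f"U+{ord(ch):04X} {ch} {block_of(ch)}")
```

A check like this can help spot characters that a font or word processor may not cover, one of the practical issues raised for phonetic alphabets.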
SAMPA – SAM Phonetic Alphabet
• Developed for phonetic transcription in an EU-funded project
called Speech Assessment Methods (SAM)
• It is ASCII-based (easy to write): an “ASCII-ization” of IPA
• Recently, Prof. John C. Wells proposed an alphabet called
“X-SAMPA”, which encodes all the IPA symbols in ASCII format
• A few examples:
– “thin”: IPA /θɪn/, X-SAMPA /TIn/
– “thing”: IPA /θɪŋ/, X-SAMPA /TIN/
– “flabbergasted”: IPA /ˈflæbəgɑːstɪd/, X-SAMPA /"fl{b@gA:stId/
– “Weltanschauung”: IPA /ˈvɛltʔanˌʃaʊʊŋ/, X-SAMPA /"vElt?an%SaUUN/
– en-GB “vice versa”: IPA /vaɪsə ˈvɜːsə/, X-SAMPA /vaIs@ "v3:s@/
– it-IT “vice versa”: IPA /ˈviʧe ˈvɛrsa/, X-SAMPA /"vitSe "vErsa/
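The correspondence in these examples is mechanical enough to automate. The Python sketch below converts X-SAMPA to IPA by greedy longest match. Its table is a partial, hand-built assumption that covers only the symbols used on this slide. Plain lower-case letters such as n, s, t and g are copied unchanged; strict IPA would write g as ɡ (U+0261), a detail this sketch ignores.

```python
# Partial X-SAMPA -> IPA table (assumption): only the symbols used above.
XSAMPA_TO_IPA = {
    "tS": "ʧ",  # affricate, written as a ligature as on the slide
    "T": "θ", "D": "ð", "S": "ʃ", "N": "ŋ", "?": "ʔ",
    "I": "ɪ", "U": "ʊ", "E": "ɛ", "{": "æ", "@": "ə", "A": "ɑ", "3": "ɜ",
    ":": "ː",   # length mark
    '"': "ˈ",   # primary stress
    "%": "ˌ",   # secondary stress
}
LONGEST = max(len(key) for key in XSAMPA_TO_IPA)


def xsampa_to_ipa(text: str) -> str:
    """Convert X-SAMPA to IPA, trying the longest table key first.

    Characters not in the table pass through unchanged.
    """
    out = []
    i = 0
    while i < len(text):
        for size in range(LONGEST, 0, -1):
            chunk = text[i:i + size]
            if len(chunk) == size and chunk in XSAMPA_TO_IPA:
                out.append(XSAMPA_TO_IPA[chunk])
                i += size
                break
        else:
            out.append(text[i])  # e.g. n, s, v: same symbol in both alphabets
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for word in ["TIn", "TIN", '"fl{b@gA:stId', '"vitSe "vErsa']:
        print(word, "->", xsampa_to_ipa(word))
```

Longest match matters for multi-character symbols: "tS" must become one affricate rather than t followed by ʃ.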
Phonetic Alphabets – Issues
• How to write pronunciations in a reliable and easy way?
• Problems with fonts, word processors, browsers
• There are very few tools that help with writing pronunciations and
let you listen to what you have written
• The standardization process may push the creation of tools and
improve coverage by word processors
• Is IPA useful for Asian languages?
• Are there standard phonetic alphabets for Asian languages,
such as Pinyin, Jyutping or JEITA?
• Should they be referenced in a standard way, like “ipa”?
The PLS language
• PLS is an XML language
<?xml version="1.0" encoding="UTF-8"?>
• The root element is <lexicon>, with these attributes:
– version (required):
"1.0"
– xmlns (required):
"http://www.w3.org/2005/pronunciation-lexicon"
– alphabet (optional):
"ipa" (default value)
– xml:lang (optional):
“en-US” or “zh-CN” or “ja”
Example:
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
xmlns="http://www.w3.org/2005/pronunciation-lexicon"
alphabet="ipa" xml:lang="zh-CN">
<!-- The lexicon for Chinese Mandarin! -->
</lexicon>
• The current PLS is monolingual: each lexicon document covers a single language
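Checking the root element is the first thing a consumer of a PLS document has to do. Below is a minimal Python sketch using the standard-library ElementTree parser. It assumes the namespace shown on this slide (check the one used by the version you target) and the defaults listed above, in particular alphabet defaulting to "ipa". The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace as shown on this slide.
PLS_NS = "http://www.w3.org/2005/pronunciation-lexicon"
# ElementTree expands the reserved xml: prefix to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_lexicon_header(xml_text: str) -> dict:
    """Validate the <lexicon> root and return its main attributes."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{PLS_NS}}}lexicon":
        raise ValueError(f"not a PLS <lexicon> root: {root.tag}")
    if root.get("version") is None:
        raise ValueError("missing required attribute: version")
    return {
        "version": root.get("version"),
        "alphabet": root.get("alphabet", "ipa"),  # default as listed above
        "lang": root.get(XML_LANG),               # None when absent
    }


if __name__ == "__main__":
    doc = ('<lexicon version="1.0" '
           'xmlns="http://www.w3.org/2005/pronunciation-lexicon" '
           'alphabet="ipa" xml:lang="zh-CN"/>')
    print(read_lexicon_header(doc))
```

Since the lexicon is monolingual, the single xml:lang value returned here is what a processor would compare against the language of the SSML document or SRGS grammar.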
The PLS language - metadata
• Metadata (annotation of the document for other uses, …)
can be of two varieties:
– <meta> element (for compatibility with other markup, like SRGS and SSML)
– <metadata> element (which contains the annotations in RDF or other formats)
Example of metadata:
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
alphabet="ipa" xml:lang="en-US">
<metadata>
<rdf:RDF xmlns:rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc = "http://purl.org/dc/elements/1.1/">

<rdf:Description
rdf:about=""
dc:title="Pronunciation lexicon for W3C terms"
dc:description="This lexicon contains common pronunciations for many
W3C acronyms and abbreviations, such as I18N, WSDL or WAI"
dc:publisher="W3C"
dc:language="en-US"
dc:date="2005-11-29"
dc:rights="Copyright 2002 W3C"
dc:format="application/pls+xml">
<dc:creator>The W3C Voice Browser Working Group</dc:creator>
</rdf:Description>
</rdf:RDF>
</metadata>
<!-- Add lexicon entries here!! -->
</lexicon>
The PLS language – <lexeme>
• The <lexeme> element is the container of a lexicon entry.
It is composed of:
– One or more <grapheme> elements
that indicate the words/phrases to be matched in the input
– One or more <phoneme> or <alias> elements
that indicate the possible pronunciations or expansions, respectively
• First considerations:
– Several <grapheme> elements may be present
→ all of them match the given pronunciations
– Several <phoneme> elements may be present
→ they are alternative pronunciations
– A mixture of <alias> and <phoneme> elements may be present
→ a preference mechanism selects a single one for TTS
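The first two considerations can be captured in a small in-memory model. This Python sketch is only an illustration, not the data model of the specification: the class `Lexeme` and the helpers `lookup` and `pronunciations` are hypothetical names. It shows that every grapheme of an entry matches, and that several phonemes are kept side by side as alternatives. The TTS preference mechanism is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Lexeme:
    """One lexicon entry: the spellings it matches and what they map to."""
    graphemes: list[str]                               # every one of these matches
    phonemes: list[str] = field(default_factory=list)  # alternative pronunciations
    aliases: list[str] = field(default_factory=list)   # alternative expansions


def lookup(lexicon: list[Lexeme], word: str) -> list[Lexeme]:
    """Return every entry that lists word among its graphemes."""
    return [entry for entry in lexicon if word in entry.graphemes]


def pronunciations(lexicon: list[Lexeme], word: str) -> list[str]:
    """Collect all alternative pronunciations of word, e.g. for an ASR engine."""
    return [ph for entry in lookup(lexicon, word) for ph in entry.phonemes]


if __name__ == "__main__":
    lexicon = [
        Lexeme(graphemes=["color", "colour"], phonemes=["ˈkʌlə"]),
        Lexeme(graphemes=["huge"], phonemes=["hjuːʤ", "juːʤ"]),
    ]
    print(pronunciations(lexicon, "colour"))  # ['ˈkʌlə']
    print(pronunciations(lexicon, "huge"))    # ['hjuːʤ', 'juːʤ']
```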
The PLS language – <grapheme>
• The <grapheme> element contains character data that represents an
orthography, for example:
– Regional spelling variations, e.g. "colour" and "color"
– Free spelling variations, e.g. "judgment" and "judgement"
– Traditional vs. modern spellings, e.g. in German it is common to
replace "ö" with "oe"
– Alternate writing systems, e.g. Japanese uses a mixture of Han ideographs
(Kanji) and syllabic scripts (Katakana or Hiragana) to represent
the orthography of a word or phrase
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
xml:lang="ja" alphabet="ipa">
<lexeme>
<grapheme orthography="Latn">nihongo</grapheme>
<grapheme orthography="Hani">日本語</grapheme>
<grapheme orthography="Hira">にほんご</grapheme>
<!-- Here you can insert the pronunciation of "nihongo";
in IPA it could be: "nɪhɒŋɒ" -->
</lexeme>
</lexicon>
• Is an explicit “orthography” attribute useful?
• Is it redundant?
The PLS language – <phoneme>
• The <phoneme> elements are contained inside <lexeme>
• <phoneme> contains character data specifying the pronunciation in
a given pronunciation alphabet:
– An “alphabet” attribute may be specified to override the alphabet of
the whole lexicon
– A “prefer” attribute may be present to indicate precedence among
pronunciations
Example of lexeme for Sepulveda:
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
xmlns="http://www.w3.org/2005/pronunciation-lexicon"
alphabet="ipa" xml:lang="en-US">
<lexeme>
<grapheme>Sepulveda</grapheme>
<phoneme>səˈpʌlvɪdə</phoneme>
<!-- In IPA it reads: "səˈpʌlvɪdə" -->
</lexeme>
</lexicon>
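The alphabet override rule can be sketched in Python: each <phoneme> inherits the lexicon's alphabet unless it declares its own. The helper name is hypothetical. The second pronunciation tagged "x-sampa" in the sample is also an assumption added for illustration, since the alphabet values a given engine accepts are engine-specific.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/pronunciation-lexicon"
NS = {"pls": PLS_NS}


def phonemes_with_alphabet(xml_text: str) -> list[tuple[str, str, str]]:
    """Return (grapheme, pronunciation, alphabet) triples for every lexeme.

    A <phoneme> uses the lexicon-wide alphabet unless its own "alphabet"
    attribute overrides it for that pronunciation only.
    """
    root = ET.fromstring(xml_text)
    lexicon_alphabet = root.get("alphabet", "ipa")
    result = []
    for lexeme in root.findall("pls:lexeme", NS):
        graphemes = [g.text for g in lexeme.findall("pls:grapheme", NS)]
        for phoneme in lexeme.findall("pls:phoneme", NS):
            alphabet = phoneme.get("alphabet", lexicon_alphabet)
            result.extend((g, phoneme.text, alphabet) for g in graphemes)
    return result


# Sample lexicon; the x-sampa line is a hypothetical override.
SAMPLE = """<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>Sepulveda</grapheme>
    <phoneme>səˈpʌlvɪdə</phoneme>
    <phoneme alphabet="x-sampa">s@"pVlvId@</phoneme>
  </lexeme>
</lexicon>"""

if __name__ == "__main__":
    for entry in phonemes_with_alphabet(SAMPLE):
        print(entry)
```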
The PLS language – <phoneme>
• Other examples
Example for more than one pronunciation of the word “huge”:
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
xml:lang="en-US" alphabet="ipa">
<lexeme>
<grapheme>huge</grapheme>
<phoneme prefer="true">hjuːʤ</phoneme>

<phoneme>juːʤ</phoneme>

</lexeme>
</lexicon>
Example for the Japanese word “nihongo” with different spellings:
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
xml:lang="ja" alphabet="ipa">
<lexeme>
<grapheme orthography="Latn">nihongo</grapheme>
<grapheme orthography="Hani">日本語</grapheme>
<grapheme orthography="Hira">にほんご</grapheme>
<phoneme>nɪhɒŋɒ</phoneme>

</lexeme>
</lexicon>
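A TTS engine can speak only one pronunciation, so it must pick one from the alternatives, whereas an ASR engine can use them all. Below is a Python sketch of the TTS choice under an assumed rule: the first <phoneme> marked prefer="true" wins, otherwise the first <phoneme> in document order. In the sample, the two phonemes of "huge" are swapped relative to the slide so that the preference visibly overrides document order. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

PLS_NS = "http://www.w3.org/2005/pronunciation-lexicon"
NS = {"pls": PLS_NS}


def tts_pronunciation(xml_text: str, word: str) -> Optional[str]:
    """Pick the single pronunciation a TTS engine would speak for word.

    Assumed rule: first prefer="true" phoneme, else first in document order.
    Returns None when no lexeme with a <phoneme> matches the word.
    """
    root = ET.fromstring(xml_text)
    for lexeme in root.findall("pls:lexeme", NS):
        spellings = [g.text for g in lexeme.findall("pls:grapheme", NS)]
        phonemes = lexeme.findall("pls:phoneme", NS)
        if word not in spellings or not phonemes:
            continue
        preferred = [p for p in phonemes if p.get("prefer") == "true"]
        return (preferred or phonemes)[0].text
    return None


HUGE = """<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/pronunciation-lexicon"
    xml:lang="en-US" alphabet="ipa">
  <lexeme>
    <grapheme>huge</grapheme>
    <phoneme>juːʤ</phoneme>
    <phoneme prefer="true">hjuːʤ</phoneme>
  </lexeme>
</lexicon>"""

if __name__ == "__main__":
    print(tts_pronunciation(HUGE, "huge"))
```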
The PLS language – <alias>
• The <alias> elements are contained inside <lexeme>
• <alias> is used to indicate the pronunciation of an acronym
or an abbreviated term in the form of other orthographies.
• <alias> may carry a “prefer” attribute to indicate precedence among pronunciations
• Both <phoneme> and <alias> may occur in a <lexeme>
Example of a lexeme with an <alias>:
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
alphabet="ipa" xml:lang="en">
<lexeme>
<grapheme>W3C</grapheme>
<alias>World Wide Web Consortium</alias>
</lexeme>
</lexicon>
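A TTS front end could apply <alias> entries as a text-normalization step before pronunciation lookup. Below is a minimal Python sketch with hypothetical helper names. It assumes whole-word matching with regular-expression word boundaries, which works for tokens like "W3C" but not for graphemes that start or end with punctuation (such as "Dr."). As with phonemes, it assumes that an alias marked prefer="true" wins and otherwise the first alias is used.

```python
import re
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/pronunciation-lexicon"
NS = {"pls": PLS_NS}


def load_aliases(xml_text: str) -> dict[str, str]:
    """Map each grapheme to one expansion (preferred alias, else the first)."""
    root = ET.fromstring(xml_text)
    table = {}
    for lexeme in root.findall("pls:lexeme", NS):
        aliases = lexeme.findall("pls:alias", NS)
        if not aliases:
            continue
        preferred = [a for a in aliases if a.get("prefer") == "true"]
        expansion = (preferred or aliases)[0].text
        for g in lexeme.findall("pls:grapheme", NS):
            table[g.text] = expansion
    return table


def expand_aliases(text: str, table: dict[str, str]) -> str:
    """Replace whole-word occurrences of each grapheme by its expansion."""
    if not table:
        return text
    # Longest graphemes first so overlapping entries prefer the longer one.
    alternatives = "|".join(map(re.escape, sorted(table, key=len, reverse=True)))
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], text)


SAMPLE = """<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>"""

if __name__ == "__main__":
    print(expand_aliases("The W3C met in Beijing.", load_aliases(SAMPLE)))
```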
Use Cases/Future Issues
The current version of PLS can deal with:
• Multiple Pronunciations for ASR
• Homographs
• Abbreviations
But it cannot deal with:
• Homophones
• Part of speech annotations (and other contextual
information)
• Grouping lexemes and external references
These tasks are too challenging to be solved in PLS
version 1.0
Issues for the workshop
• Monolingual lexicon?
• Orthography attribute: Useful or redundant?
• Mandate new phonetic alphabets?
Quick demo of SSML+PLS
• Mobile device (with embedded TTS)
• Via GPRS, the device connects to a server, which:
– Downloads news from a news site (RSS)
– Transforms it into SSML
– Returns it to the mobile device
• The device then:
– Shows the news on the screen
– Reads the SSML document (which includes a lexicon) using
the TTS engine
Use Cases – Multiple pronunciations
• More than one pronunciation for a word (very common for
ASR)
Example of two pronunciations for the word “Newton”:
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
xmlns="http://www.w3.org/2005/pronunciation-lexicon"
alphabet="ipa" xml:lang="en">
<lexeme>
<grapheme>Newton</grapheme>
<phoneme prefer="true">ˈnjuːtən</phoneme>

<phoneme>ˈnuːtən</phoneme>

</lexeme>
</lexicon>
Use Cases – Multiple orthographies
• More than one orthography for a word (common for ASR
and TTS)
Example of two orthographies for colour/color:
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
xmlns="http://www.w3.org/2005/pronunciation-lexicon"
alphabet="ipa" xml:lang="en">
<lexeme>
<grapheme>color</grapheme>
<grapheme>colour</grapheme>
<phoneme>ˈkʌlə</phoneme>

</lexeme>
</lexicon>
Final Remarks
• The usage of PLS:
– Simplifies the development of a speech application
– Improves the performance of speech recognition (in a
standard way)
– Enhances TTS output
• A standard language for PLS enables the exchange of
pronunciations between applications
