An Introduction to S3ML Beijing InfoQuick SinoVoice Speech Technology Corp. CHEN Ming, LV Shinan, LI Xiulin.
An Introduction to S3ML
Beijing InfoQuick SinoVoice Speech
Technology Corp.
CHEN Ming, LV Shinan, LI Xiulin
Outline
Background
PinYin Support
<say-as> Definition
Domain Support
Conclusion
Background
SSML
Speech Synthesis Markup Language
http://www.w3.org/TR/speech-synthesis/
Now a W3C Recommendation
SinoVoice
Famous Speech Technology and Service Provider
Leading Chinese TTS Technology and Product
Deployed in 1000+ Real Systems
Background
S3ML (SinoVoice SSML)
In use since the launch of jTTS 4.0 in March 2004
Based on SSML Specification
Defines some extensions aiming at Chinese TTS
Defines the details of some elements that SSML
does not define precisely
Provides maximum compatibility with the newest SSML
version
PinYin Support
PinYin
Phoneme annotation method for Chinese
characters
<phoneme> in SSML
The phoneme element provides a
phonemic/phonetic pronunciation for the
contained text.
Two attributes: alphabet and ph
PinYin Support
alphabet
The alphabet attribute is an optional attribute
that specifies the phonemic/phonetic alphabet.
Use ‘py’ as value of ‘alphabet’ to specify that
PinYin will be used
ph
The ph attribute is a required attribute that
specifies the phoneme/phone string.
Use PinYin string as value of ‘ph’
PinYin Support
Example
<phoneme alphabet="py" ph="zha1">查</phoneme>良镛
<phoneme alphabet="py" ph="zha1 liang2yong1">
查良镛</phoneme>先生
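The first example above can also be generated programmatically. The following is a minimal sketch using Python's standard xml.etree.ElementTree. The helper name `phoneme` and the enclosing `<speak>` root (the SSML document element) are illustrative assumptions, not part of S3ML or jTTS.

```python
import xml.etree.ElementTree as ET


def phoneme(text, pinyin):
    """Wrap text in <phoneme alphabet="py" ph="..."> (hypothetical helper)."""
    el = ET.Element("phoneme", {"alphabet": "py", "ph": pinyin})
    el.text = text
    return el


# Build: <speak><phoneme alphabet="py" ph="zha1">查</phoneme>良镛</speak>
speak = ET.Element("speak")
ph = phoneme("查", "zha1")
ph.tail = "良镛"  # plain text that follows the annotated character
speak.append(ph)

markup = ET.tostring(speak, encoding="unicode")
print(markup)
```

Only the annotated character goes inside the element; the rest of the name stays as ordinary text after it.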
More about PinYin string
Conforms to the "Chinese Mandarin PinYin Specification"
A sequence of PinYin syllables can cover several characters
Tone information
1~4: high level, rising, dipping (falling-rising) and falling tone
0, 5: neutral (light) tone
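The tone-digit convention above can be sketched as a small parser. This is an illustration only: the function name `parse_pinyin` is hypothetical, syllables are assumed to be plain ASCII letters followed by one tone digit, and whitespace between syllables is treated as optional (as in the "zha1 liang2yong1" example). It is not the jTTS implementation.

```python
import re

# One syllable: ASCII letters followed by a single tone digit 0-5 (assumption).
_SYLLABLE = re.compile(r"([a-z]+)([0-5])")


def parse_pinyin(ph):
    """Split a PinYin string into (syllable, tone) pairs.

    Raises ValueError if any part of the string is not a syllable
    followed by a tone digit in the range 0-5.
    """
    compact = "".join(ph.lower().split())  # drop all whitespace
    pairs = []
    pos = 0
    for match in _SYLLABLE.finditer(compact):
        if match.start() != pos:
            break  # unparseable characters between syllables
        pairs.append((match.group(1), int(match.group(2))))
        pos = match.end()
    if not pairs or pos != len(compact):
        raise ValueError(f"invalid PinYin string: {ph!r}")
    return pairs


print(parse_pinyin("zha1 liang2yong1"))
# [('zha', 1), ('liang', 2), ('yong', 1)]
```

A string with a missing tone digit or a digit outside 0-5 (for example "zha6") is rejected rather than silently accepted.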
PinYin Support
What if a PinYin string is embedded in normal text?
Next station is <say-as interpret-as="phoneme" format="py">
di4 tan2</say-as>
Comparing with CSSML
<phoneme lang="zh-cn">zha1</phoneme>良镛
他姓<phoneme py="zha1">查</phoneme>
We think <phoneme> is not intended for this purpose;
<say-as> is more suitable
We think the <phoneme> extension in S3ML is more
compatible with SSML
<say-as> Definition
The detail of <say-as> element
When SinoVoice defined S3ML, SSML did not define the
detailed attribute values of this element.
"SSML 1.0 say-as attribute values" has since been
proposed, but it is still in progress
http://www.w3.org/TR/2005/NOTE-ssml-sayas-20050526/
SinoVoice will support this proposal, so only the
additional values are covered here
<say-as> Definition
Names and addresses, especially person
names, because of polyphonic
Chinese characters
<say-as interpret-as="name" format="person">张朝阳</say-as>
<say-as interpret-as="address">朝阳区</say-as>
Math: some mathematical expressions can be
confused with other information (such as dates or phone numbers)
<say-as interpret-as="math">2005-12-13</say-as>
<say-as interpret-as="math">+8610-62972997</say-as>
<say-as> Definition
Net address
<say-as interpret-as="net" format="email">[email protected]</say-as>
<say-as interpret-as="net" format="url"> http://www.sinovoice.com.cn
</say-as>
Phoneme, useful for character/phoneme
mixed text
The pronunciation of ‘tomato’ is
<say-as interpret-as="phoneme" format="ipa">
tɒmɑtoʊ</say-as>
Next station is <say-as interpret-as="phoneme" format="py">
di4 tan2</say-as>
Domain Support
Important for real systems
Customized TTS is becoming more and more popular
Better voice quality than the general version
One possibility in SSML
Use <voice> element and define special values of
‘name’ attribute
But this is not natural, because it is normal to
support several different domains under the same
name (voice library)
Domain Support
<domain> element
The ‘name’ attribute is required to specify
the customized TTS package used
The value of ‘name’ attribute will be a
vendor-specific name
<domain> will not change voice
If a voice library does not support this
domain, the element is simply ignored.
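The "ignored if unsupported" rule can be sketched as a tree walk that tags each run of text with the domain package in effect. This is a hypothetical illustration: the function `segments`, the `supported` set and the "stock" domain name are assumptions, and the source does not describe how jTTS actually processes `<domain>`.

```python
import xml.etree.ElementTree as ET


def segments(elem, supported, active=None):
    """Yield (domain, text) pairs in document order.

    A <domain> whose name is not in `supported` is ignored: its content
    is still spoken, using whatever setting was already in effect.
    """
    if elem.tag == "domain":
        name = elem.get("name")
        if name in supported:
            active = name
    if elem.text and elem.text.strip():
        yield active, elem.text.strip()
    for child in elem:
        yield from segments(child, supported, active)
        # Text after a child belongs to the parent's setting, not the child's.
        if child.tail and child.tail.strip():
            yield active, child.tail.strip()


doc = ET.fromstring(
    '<speak>Hello <domain name="weather">sunny today</domain> and '
    '<domain name="stock">up two points</domain></speak>'
)
# Assume the current voice library ships only a "weather" package.
result = list(segments(doc, supported={"weather"}))
print(result)
```

Here the "stock" content is still produced, but with no domain package selected, which matches the rule that an unsupported `<domain>` changes nothing.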
Domain Support
If we want the TTS system to select the best
voice for this domain automatically
Extended 'domain' attribute of <voice>
'domain' still has the lowest priority
<domain name="weather">
今天白天,晴转多云,最高温度26度
</domain>
<voice domain="weather">
今天白天,晴转多云,最高温度26度
</voice>
Conclusion
Summary of S3ML extensions
<phoneme alphabet="py" ph="…">
<say-as interpret-as="...">
name / address / math / phoneme / net
<domain name="…">
<voice domain="…">
We hope this will help in defining a
standard for internationalizing SSML
Thank You!