An Introduction to S3ML
Beijing InfoQuick SinoVoice Speech Technology Corp.
CHEN Ming, LV Shinan, LI Xiulin

Outline
Background
PinYin Support
<say-as> Definition
Domain Support
Conclusion

Background

SSML
Speech Synthesis Markup Language
http://www.w3.org/TR/speech-synthesis/
Now a W3C Recommendation

SinoVoice
Well-known speech technology and service provider
Leading Chinese TTS technology and products
More than 1,000 real systems deployed

S3ML (SinoVoice SSML)
Introduced with jTTS 4.0 in March 2004
Based on the SSML specification
Defines extensions aimed at Chinese TTS
Specifies details of some elements that SSML does not define precisely
Provides maximum compatibility with the latest SSML version

PinYin Support

PinYin
Phonetic annotation system for Chinese characters

<phoneme> in SSML
The phoneme element provides a phonemic/phonetic pronunciation for the contained text.
Two attributes: alphabet and ph

alphabet
The alphabet attribute is an optional attribute that specifies the phonemic/phonetic alphabet.
Use ‘py’ as the value of ‘alphabet’ to specify that PinYin is used.

ph
The ph attribute is a required attribute that specifies the phoneme/phone string.
Use a PinYin string as the value of ‘ph’.

Example
<phoneme alphabet="py" ph="zha1">查</phoneme>良镛
<phoneme alphabet="py" ph="zha1 liang2yong1">查良镛</phoneme>先生
查 is a polyphonic character: it is normally read cha2, but as a surname it is read zha1.
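When S3ML is generated by a program rather than typed by hand, building the element with an XML library takes care of escaping. The Python sketch below is a minimal illustration using the standard library's xml.etree.ElementTree; the helper name phoneme_py is hypothetical and is not part of any SinoVoice API.

```python
import xml.etree.ElementTree as ET


def phoneme_py(text: str, pinyin: str) -> str:
    """Wrap `text` in an S3ML <phoneme> element using the 'py' (PinYin) alphabet.

    Hypothetical helper: `pinyin` becomes the required 'ph' attribute, and
    ElementTree escapes XML-special characters in both attribute and text.
    """
    elem = ET.Element("phoneme", {"alphabet": "py", "ph": pinyin})
    elem.text = text
    # encoding="unicode" returns a str and leaves Chinese characters unescaped
    return ET.tostring(elem, encoding="unicode")


# Annotate only the polyphonic surname, then the whole name.
print(phoneme_py("查", "zha1") + "良镛")
print(phoneme_py("查良镛", "zha1 liang2yong1") + "先生")
```

The plain text around the element is concatenated as an ordinary string, so only the annotated characters pass through the XML serializer.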

More about PinYin strings
Conforms to the “Chinese Mandarin PinYin Specification”
One string may give a series of PinYin syllables for several characters
Tone information, written as a digit after each syllable:
1 to 4: high level, rising, dipping (falling-rising) and falling tones
0 or 5: neutral (light) tone
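The tone convention above is regular enough to check mechanically. The following Python sketch splits a ph value into (syllable, tone) pairs, accepting syllables separated by spaces or written together as in "liang2yong1". It assumes lowercase letters followed by exactly one tone digit, treats 'v' as an accepted spelling of ü (an assumption), and does not check syllables against the full PinYin inventory; parse_pinyin, covers and TONE_NAMES are hypothetical names.

```python
import re

# One syllable: lowercase letters (ü, or 'v' as an assumed substitute)
# followed by one tone digit: 1-4 for the four tones, 0 or 5 for neutral.
_SYLLABLE = re.compile(r"([a-zü]+)([0-5])")

TONE_NAMES = {
    1: "high level", 2: "rising", 3: "dipping", 4: "falling",
    0: "neutral", 5: "neutral",
}


def parse_pinyin(ph: str) -> list[tuple[str, int]]:
    """Split a PinYin 'ph' value into (syllable, tone) pairs.

    Syllables may be separated by spaces or run together,
    as in "zha1 liang2yong1".
    """
    pairs = []
    for chunk in ph.split():
        pos = 0
        while pos < len(chunk):
            m = _SYLLABLE.match(chunk, pos)
            if m is None:
                raise ValueError(f"malformed PinYin near {chunk[pos:]!r}")
            pairs.append((m.group(1), int(m.group(2))))
            pos = m.end()
    return pairs


def covers(text: str, ph: str) -> bool:
    """Simple check: exactly one PinYin syllable per character of `text`."""
    return len(parse_pinyin(ph)) == len(text)


pairs = parse_pinyin("zha1 liang2yong1")
print(pairs)                               # [('zha', 1), ('liang', 2), ('yong', 1)]
print([TONE_NAMES[t] for _, t in pairs])   # ['high level', 'rising', 'high level']
print(covers("查良镛", "zha1 liang2yong1"))  # True
```

A syllable without a tone digit, or with a digit outside 0 to 5, raises ValueError instead of being passed silently to the synthesizer.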

What if a PinYin string appears within normal text?
Next station is <say-as interpret-as="phoneme" format="py">di4 tan2</say-as>

Compared with CSSML
<phoneme lang="zh-cn">zha1</phoneme>良镛
他姓<phoneme py="zha1">查</phoneme>

We think <phoneme> is not meant for PinYin written in the text itself, as in the first CSSML example; <say-as> is more suitable.
We think the <phoneme> extension in S3ML is more compatible with SSML, because it reuses the standard alphabet and ph attributes instead of adding a new py attribute.
<say-as> Definition

Details of the <say-as> element
When SinoVoice defined S3ML, SSML did not define the detailed values of this element's attributes.
Now “SSML 1.0 say-as attribute values” has been proposed, but it is still in progress:
http://www.w3.org/TR/2005/NOTE-ssml-sayas-20050526/
SinoVoice will support this proposal, so I will only talk about the additional values that S3ML defines.

Name and address: especially person names, because many Chinese characters are polyphonic (for example, 朝 can be read zhao1 or chao2)
<say-as interpret-as="name" format="person">张朝阳</say-as>
<say-as interpret-as="address">朝阳区</say-as>

Math: some mathematical expressions can be confused with other information, such as dates or telephone numbers
<say-as interpret-as="math">2005-12-13</say-as>
<say-as interpret-as="math">+8610-62972997</say-as>
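An authoring tool might check say-as attributes before emitting them. In this minimal Python sketch, the table of allowed interpret-as/format pairs is taken only from the examples on these slides (an assumption; a real S3ML engine may accept more values), and say_as is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# interpret-as values from these slides and the format values shown with
# them (None = no format attribute). Illustrative only, not the full S3ML set.
S3ML_SAY_AS = {
    "name": {"person"},
    "address": {None},
    "math": {None},
    "net": {"email", "url"},
    "phoneme": {"py", "ipa"},
}


def say_as(text: str, interpret_as: str, fmt: Optional[str] = None) -> str:
    """Build an S3ML <say-as> element, rejecting pairs not in the table."""
    if fmt not in S3ML_SAY_AS.get(interpret_as, set()):
        raise ValueError(f"unsupported say-as combination: {interpret_as!r}, {fmt!r}")
    attrs = {"interpret-as": interpret_as}
    if fmt is not None:
        attrs["format"] = fmt
    elem = ET.Element("say-as", attrs)
    elem.text = text
    return ET.tostring(elem, encoding="unicode")


print(say_as("张朝阳", "name", "person"))
print(say_as("朝阳区", "address"))
print(say_as("2005-12-13", "math"))
```

Rejecting unknown pairs early keeps typos such as interpret-as="maths" from reaching the engine, where they would otherwise be ignored or read with default rules.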

Network addresses
<say-as interpret-as="net" format="email">[email protected]</say-as>
<say-as interpret-as="net" format="url"> http://www.sinovoice.com.cn
</say-as>

Phoneme, useful for character/phoneme
mixed text
The pronunciation of ‘tomato’ is
<say-as interpret-as="phoneme" format="ipa">
t&#x252;m&#x251;to&#x28A;</say-as>
Next station is <say-as interpret-as="phoneme" format="py">
di4 tan2</say-as>
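The net and phoneme values matter most when such strings are embedded in ordinary text. As a rough illustration, this Python sketch scans plain text for URLs and e-mail addresses and wraps them in net say-as elements. The regular expressions are deliberately simple assumptions, not S3ML rules: they are ASCII-only so that adjacent Chinese characters are not absorbed, and trailing ASCII punctuation such as a final period would be taken as part of a URL. mark_net_addresses is a hypothetical name.

```python
import re
from xml.sax.saxutils import escape

# Simple, ASCII-only patterns (assumptions, not S3ML rules): a URL starts
# with http:// or https://; an e-mail address is local-part@domain.tld.
_NET = re.compile(
    r"(?P<url>https?://[\w\-.~:/?#\[\]@!$&'()*+,;=%]+)"
    r"|(?P<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)",
    re.ASCII,
)


def mark_net_addresses(text: str) -> str:
    """Escape plain text and wrap URLs and e-mail addresses in net <say-as>."""
    parts = []
    pos = 0
    for m in _NET.finditer(text):
        parts.append(escape(text[pos:m.start()]))  # text before the match
        fmt = "url" if m.group("url") else "email"
        parts.append(
            f'<say-as interpret-as="net" format="{fmt}">{escape(m.group())}</say-as>'
        )
        pos = m.end()
    parts.append(escape(text[pos:]))  # text after the last match
    return "".join(parts)


print(mark_net_addresses("Visit http://www.sinovoice.com.cn for details"))
```

Escaping the surrounding text as well keeps the result well-formed when the input contains characters such as < or &.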
Domain Support

Important for real-world systems
Customized TTS is increasingly popular
It gives better voice quality than the general-purpose version
One possibility within SSML:
Use the <voice> element and define special values for its ‘name’ attribute
But this is unnatural, because one voice library (one name) normally supports several different domains

<domain> element
The required ‘name’ attribute specifies the customized TTS package to use
The value of ‘name’ is vendor-specific
<domain> does not change the voice
If the current voice library does not support the domain, the element is simply ignored
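The fallback rule for <domain> can be modeled as a scoped setting that only takes effect when the current voice supports it. The Python sketch below walks a fragment and tags each text run with the voice and the active domain. VoiceLibrary, segments and the voice names are hypothetical, and the sketch ignores XML namespaces, so it expects a bare <speak> root without the SSML namespace declaration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class VoiceLibrary:
    """Hypothetical voice library: one voice plus the domains it was customized for."""
    name: str
    domains: Set[str] = field(default_factory=set)


Segment = Tuple[str, str, Optional[str]]  # (text, voice name, active domain)


def segments(s3ml: str, voice: VoiceLibrary) -> List[Segment]:
    """Split an S3ML fragment into text runs tagged with the active domain.

    <domain name="..."> selects a customized package only if the current
    voice supports it; otherwise the element is ignored and its text keeps
    the enclosing setting. The voice itself never changes.
    """
    out: List[Segment] = []

    def emit(text: Optional[str], domain: Optional[str]) -> None:
        if text and text.strip():
            out.append((text.strip(), voice.name, domain))

    def walk(elem: ET.Element, domain: Optional[str]) -> None:
        if elem.tag == "domain" and elem.get("name") in voice.domains:
            domain = elem.get("name")
        emit(elem.text, domain)
        for child in elem:
            walk(child, domain)
            emit(child.tail, domain)  # tail text belongs to the parent's scope

    walk(ET.fromstring(s3ml), None)
    return out


doc = '<speak><domain name="weather">今天白天，晴转多云</domain>最高温度26度</speak>'
print(segments(doc, VoiceLibrary("voice-a", {"weather"})))
print(segments(doc, VoiceLibrary("voice-b")))
```

With voice-b, which has no weather package, the <domain> element has no effect and both runs are read with the general package, matching the "simply ignored" rule.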

If we want the TTS system to select the best voice for a domain automatically:
Use the extended ‘domain’ attribute of <voice>
‘domain’ still has the lowest priority when the voice is selected
<domain name="weather">
今天白天，晴转多云，最高温度26度
</domain>

<voice domain="weather">
今天白天，晴转多云，最高温度26度
</voice>
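The slides say only that ‘domain’ has the lowest priority. One way to read this, assumed here, is a lexicographic comparison in which domain is consulted after every other selection criterion. This Python sketch uses gender as the only other criterion; in a fuller version, attributes such as xml:lang, age, variant or name would come before domain in the score tuple. Voice, select_voice and the voice names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Voice:
    """Hypothetical installed voice and the domains it has customized packages for."""
    name: str
    gender: str
    domains: FrozenSet[str] = frozenset()


def select_voice(voices: Sequence[Voice], gender: Optional[str] = None,
                 domain: Optional[str] = None) -> Voice:
    """Pick the best-matching voice, giving 'domain' the lowest priority."""
    def score(v: Voice):
        # Tuples compare left to right, so a gender match always outweighs a
        # domain match; domain only breaks ties between otherwise equal voices.
        return (
            gender is None or v.gender == gender,
            domain is None or domain in v.domains,
        )
    # max() returns the first of several equally good voices.
    return max(voices, key=score)


voices = [
    Voice("a", "female"),
    Voice("b", "female", frozenset({"weather"})),
    Voice("c", "male", frozenset({"weather", "news"})),
]
print(select_voice(voices, domain="weather").name)                # b
print(select_voice(voices, gender="female", domain="news").name)  # a
```

In the second call, voice a wins even though only c has a news package, because the gender match dominates; this is what keeping ‘domain’ at the lowest priority means under the assumed reading.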
Conclusion

Summary of the S3ML extensions
<phoneme alphabet="py" ph="…">
<say-as interpret-as="...">
interpret-as values: name / address / math / phoneme / net
<domain name="…">
<voice domain="…">

We hope this work will help define the standard for internationalizing SSML.
Thank you!