An Introduction to S3ML
Beijing InfoQuick SinoVoice Speech Technology Corp.
CHEN Ming, LV Shinan, LI Xiulin

Outline
  Background
  PinYin Support
  <say-as> Definition
  Domain Support
  Conclusion

Background
SSML: Speech Synthesis Markup Language
  http://www.w3.org/TR/speech-synthesis/
  Now a W3C Recommendation
SinoVoice
  Well-known speech technology and service provider
  Leading Chinese TTS technology and products
  Deployed in 1000+ real systems

Background
S3ML (SinoVoice SSML)
  In use since the launch of jTTS 4.0 in March 2004
  Based on the SSML specification
  Defines some extensions aimed at Chinese TTS
  Specifies the details of some elements that SSML does not define precisely
  Provides maximum compatibility with the newest SSML version

PinYin Support
PinYin
  Phoneme annotation method for Chinese characters
<phoneme> in SSML
  The phoneme element provides a phonemic/phonetic pronunciation for the contained text.
  Two attributes: alphabet and ph

PinYin Support
alphabet
  The alphabet attribute is an optional attribute that specifies the phonemic/phonetic alphabet.
  Use ‘py’ as the value of ‘alphabet’ to specify that PinYin is used
ph
  The ph attribute is a required attribute that specifies the phoneme/phone string.
  Use a PinYin string as the value of ‘ph’

PinYin Support
Example
  <phoneme alphabet="py" ph="zha1">查</phoneme>良镛
  <phoneme alphabet="py" ph="zha1 liang2yong1"> 查良镛</phoneme>先生
More about the PinYin string
  Conforms to the “Chinese Mandarin PinYin Specification”
  One string can hold a series of PinYin syllables for several characters
  Tone information
    1~4: high level, rising, dipping, and falling tones
    0, 5: neutral (light) tone

PinYin Support
What if a PinYin string is included in normal text?
  Next station is <say-as interpret-as="phoneme" format="py"> di4 tan2</say-as>
Comparison with CSSML
  <phoneme lang="zh-cn">zha1</phoneme>良镛
  他姓<phoneme py="zha1">查</phoneme>
We think <phoneme> is not meant for this purpose; <say-as> is more suitable
We think the <phoneme> extension in S3ML is more compatible with SSML

<say-as> Definition
The details of the <say-as> element
  When SinoVoice defined S3ML, SSML did not define the values of this element's attributes.
  “SSML 1.0 say-as attribute values” has since been proposed, but it is still in progress
  http://www.w3.org/TR/2005/NOTE-ssml-sayas20050526/
  SinoVoice will support this proposal, so only the additional values are covered here

<say-as> Definition
Names and addresses, especially person names, because of polyphonic Chinese characters
  <say-as interpret-as="name" format="person">张朝阳</say-as>
  <say-as interpret-as="address">朝阳区</say-as>
Math: some mathematical expressions can be confused with other kinds of information
  <say-as interpret-as="math">2005-12-13</say-as>
  <say-as interpret-as="math">+8610-62972997</say-as>

<say-as> Definition
Net address
  <say-as interpret-as="net" format="email">[email protected]</say-as>
  <say-as interpret-as="net" format="url"> http://www.sinovoice.com.cn </say-as>
Phoneme, useful for mixed character/phoneme text
  The pronunciation of ‘tomato’ is <say-as interpret-as="phoneme" format="ipa"> tɒmɑtoʊ</say-as>
  Next station is <say-as interpret-as="phoneme" format="py"> di4 tan2</say-as>

Domain Support
Important for real systems
  Customized TTS is becoming more and more popular
  Better voice quality than the general version
One possibility in SSML
  Use the <voice> element and define special values of the ‘name’ attribute
  But this is not natural, because several different domains are normally supported under the same name (voice library)

Domain Support
<domain> element
  The ‘name’ attribute is required and specifies the customized TTS package to use
  The value of the ‘name’ attribute is a vendor-specific name
  <domain> does not change the voice
  If a voice library does not support this domain, the element is simply ignored.

Domain Support
If we want the TTS system to select the best voice for this domain automatically
  Extended ‘domain’ attribute of <voice>
  ‘domain’ still has the lowest priority
  <domain name="weather"> 今天白天 ,晴转多云,最高温度26度 </domain>
  <voice domain="weather"> 今天白天 ,晴转多云,最高温度26度 </voice>

Conclusion
Summary of the S3ML extensions
  <phoneme alphabet="py" ph="…">
  <say-as interpret-as="...">
    name / address / math / phoneme / net
  <domain name="…">
  <voice domain="…">
We hope this will help in defining a standard for internationalizing SSML

Thank You!
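
Appendix sketch (not part of the original slides): the fragment below shows one way an application might generate the <phoneme alphabet="py"> and <domain> markup summarized above, using Python's standard xml.etree.ElementTree. The helper names (split_pinyin, phoneme, domain, to_markup) are hypothetical, and the syllable pattern is an assumption: lowercase letters followed by one tone digit 0-5, as on the PinYin slides. The actual rules of the "Chinese Mandarin PinYin Specification" may be stricter.

```python
import re
import xml.etree.ElementTree as ET

# Assumed syllable shape: letters followed by one tone digit.
# 1-4 are the four tones; 0 and 5 both mark the neutral (light) tone.
_SYLLABLE = re.compile(r"[a-zü]+[0-5]")


def split_pinyin(ph):
    """Split a tone-numbered PinYin string into syllables.

    Spaces between syllables are optional, so "zha1 liang2yong1"
    is accepted. Raises ValueError if anything is left unmatched.
    """
    compact = ph.replace(" ", "")
    syllables = _SYLLABLE.findall(compact)
    if not compact or "".join(syllables) != compact:
        raise ValueError(f"not a tone-numbered PinYin string: {ph!r}")
    return syllables


def phoneme(text, ph):
    """Build <phoneme alphabet="py" ph="...">text</phoneme>."""
    split_pinyin(ph)  # validation only; ph is stored unchanged
    element = ET.Element("phoneme", {"alphabet": "py", "ph": ph})
    element.text = text
    return element


def domain(name, text):
    """Build <domain name="...">text</domain> (name is vendor-specific)."""
    element = ET.Element("domain", {"name": name})
    element.text = text
    return element


def to_markup(element):
    """Serialize an element to a markup string, keeping Chinese text as-is."""
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    print(to_markup(phoneme("查", "zha1")))
    print(to_markup(domain("weather", "今天白天,晴转多云,最高温度26度")))
```

Validating the ph string before building the element catches a mistyped tone digit early, instead of leaving the synthesizer to decide what to do with it.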