Towards Synthesis of Focus in Mandarin Text-to-speech System Dr. Dezhi HUANG [email protected] SNLP Unit, FTRD Beijing 2005/11/2 V1.1


Towards Synthesis of Focus in Mandarin
Text-to-speech System
Dr. Dezhi HUANG
[email protected]
SNLP Unit, FTRD Beijing
2005/11/2 V1.1
Table of Contents
1 Synthesis of focus
2 Proposal for SSML
3 Examples with <focus>
2
Humans have a strong ability to reconstruct information
Evidence from music perception
The “Butterfly Lovers” violin concerto
3
Humans have a strong ability to reconstruct information (Cont.)
Evidence from human vision
4
Application model of Mandarin Text-to-speech (Cont.)
Spoken dialog system
Information query at the roadside
[Diagram: PSTN/Wireless, Mandarin Voice-enabled Service Gateway, Mandarin TTS Engine; labels: Angry, Environment Noise]
5
Why do we fail?
The important content is not as prominent as we expect
  Weaken the background noise (noise reduction)
  Improve the prominence of the information that we need
  Utilize the human ability to reconstruct information
6
What do we need in speech communication?
The key information is always contained in a phrase or word in a sentence
  Do you often see Prof. Zhao?
  No, I saw him only once.
The container of key information is called the focus.
  The semantic centre of a sentence
7
The value of synthesis of focus
It is helpful for
  Analyzing the syntactic structure of a sentence
  Understanding the meaning of an utterance
  Capturing turn-taking
  Comprehending the intent and emotion of the speaker
It improves the acceptance of TTS
8
Key challenges in synthesis of focus
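It is difficult to locate a focus in a sentence
  Some focuses can be found from the syntactic structure
    明天你准备去买什么?我要去买红色的帽子。 (What are you going to buy tomorrow? I am going to buy a red hat.)
  Other focuses are decided by the context of a sentence
    老王去年退休了。 (Lao Wang retired last year.) Depending on the context, the focus may fall on 老王 (Lao Wang), 去年 (last year), or 退休了 (retired).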
Markup Language for Focus
Lack of an appropriate acoustic model to realize a focus
  Pitch accent
  Duration
  Energy
  Pause
  Weakness
9
Table of Contents
1 Synthesis of focus
  Make the synthesized speech clear
  Improve the validity of speech communication with TTS
2 Proposal for SSML
3 Examples with <focus>
10
What is SSML?
It is designed to provide a rich, XML-based markup language for assisting the generation of synthetic speech in Web and other applications
[Diagram: SSML, Natural Language Processing and Understanding, Speech Synthesis]
11
<EMPHASIS> in SSML
The emphasis element requests that the contained text be spoken with emphasis (also referred to as prominence or stress)
  Level: strong, moderate, none and reduced
For the synthesizer, it is then easy to know which word carries the sentence stress
  老王买了车。 (Lao Wang bought a car.) The stress can be placed on different words of the same sentence.
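A minimal Python sketch of how an application might generate the emphasis markup for this sentence, using only the standard library. The helper name emphasize, the xml:lang value, and the choice of stressed words (老王 versus 车) are assumptions made for illustration; the SSML namespace declaration is left out to keep the output short.

```python
import xml.etree.ElementTree as ET

# Emphasis levels defined by SSML 1.0.
SSML_EMPHASIS_LEVELS = {"strong", "moderate", "none", "reduced"}


def emphasize(before: str, word: str, after: str, level: str = "strong") -> str:
    """Return a <speak> document in which `word` is wrapped in <emphasis>.

    Hypothetical helper for illustration; not part of any SSML toolkit.
    """
    if level not in SSML_EMPHASIS_LEVELS:
        raise ValueError(f"unknown emphasis level: {level!r}")
    speak = ET.Element("speak", {"version": "1.0", "xml:lang": "zh-CN"})
    speak.text = before                      # text preceding the stressed word
    emphasis = ET.SubElement(speak, "emphasis", {"level": level})
    emphasis.text = word                     # the stressed word itself
    emphasis.tail = after                    # text following the stressed word
    return ET.tostring(speak, encoding="unicode")


# The same sentence with the sentence stress on two different words.
stress_on_subject = emphasize("", "老王", "买了车。")
stress_on_object = emphasize("老王买了", "车", "。")
print(stress_on_subject)
print(stress_on_object)
```

The two outputs differ only in where the emphasis element sits, which is all the synthesizer needs in order to place the stress.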
12
The proposed <focus> element
The focus element indicates that the contained text is the semantic centre of a sentence and the carrier of its important information
From the perspective of pragmatics
  Contrastive focus (also referred to as identificational focus)
  Informational focus (also referred to as presentational focus or natural focus)
13
Samples of focus
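(1) 你经常见赵教授吗? (Do you often see Prof. Zhao?)
    我见过他一次。 (I have seen him once.)
(2) 昨天老张干什么了? (What did Lao Zhang do yesterday?)
    昨天老张去看病。 (Yesterday Lao Zhang went to see a doctor.)
(3) 是老张帮我修了车。 (It was Lao Zhang who helped me fix the car.)
(4) 他连我也不相信。 (He does not trust even me.)
(5) 他经常和我打球。 (He often plays ball with me.)
(6) 他居然卖了房子。 (He actually sold the house.)
(7) 我们去钓鱼吧。 (Let's go fishing.)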
14
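A focus in Mandarin does not correspond one-to-one with an emphasis
Most focuses are realized by stress
  是老张退休了。 (It is Lao Zhang who retired.)
  明天最高气温多少度?明天最高气温30度。 (What is tomorrow's highest temperature? Tomorrow's highest temperature is 30 degrees.)
Some of them are realized by pause or intonation
  你常常见赵老师吗?我见过他一次。 (Do you often see Teacher Zhao? I have seen him once.)
  我们下象棋吧。 (Let's play Chinese chess.)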
15
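Differences between focus and emphasis
Focus is a concept of semantics and pragmatics
  We can mark up the focus without the speech signal
  国家工商总局昨天发出紧急通知强调,全国大中城市、边境地区、发生过疫情的地区、养殖大省四类区域必须建立健全禽类产品“挂牌经营”制度,市场内禽类产品要标明禽类生产地、动物检验检疫证明及销售承诺。
  (The State Administration for Industry and Commerce issued an urgent notice yesterday stressing that four types of areas, namely large and medium-sized cities, border areas, areas that have had outbreaks, and major poultry-farming provinces, must establish and improve a "labeled sales" system for poultry products, and that poultry products in markets must show their place of production, animal inspection and quarantine certificate, and sales commitment.)
Emphasis is a concept of psychoacoustics
  Consistent emphasis labeling is relatively difficult to achieve without the speech signal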
16
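Differences between focus and emphasis (Cont.)
Focus always carries the purpose of the utterance
  We can know exactly what the sentence means
Emphasis is not directly linked to the purpose of the utterance
  The emphasized word may be trivial
  黄菊强调,认真学习贯彻五中全会精神,继续推进国有商业银行改革。 (Huang Ju stressed the need to earnestly study and implement the spirit of the Fifth Plenary Session and to continue advancing the reform of state-owned commercial banks.)
  他经常和我打球。 (He often plays ball with me.)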
17
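What can we gain from focus labeling?
Improve the intelligibility of synthesized speech, especially in noisy communication environments
Q:明天最晚一班到北京的飞机是几点? (What time is the last flight to Beijing tomorrow?)
A:在晚上9点钟有一班CZ8071的飞机飞往北京。 (Flight CZ8071 leaves for Beijing at 9 p.m.)
Q:几点钟? (What time?)
A:是9点。 (At 9.)
Q:哪一班? (Which flight?)
A:是CZ8071。 (CZ8071.)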
18
What can we gain from focus labeling? (Cont.)
Focus labeling can be directly applied to text information processing
  The next generation of search engines will need to know
    which is the topic of a paragraph
    which are the focuses of a sentence
  Text highlighting is an important step in information retrieval
  Keywords in an automatic digest are always the focuses
19
Table of Contents
1 Synthesis of focus
2 Proposal for SSML
  <focus> indicates what the semantic centre is
  <focus> solves the problem of focus location
3 Examples with <focus>
20
Attributes of <focus>
Type
  informational
  contrastive
Method
  StrongStress
  ModerateStress
  None
  Pause
  Intonation
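The attribute values above can be checked mechanically before any markup is produced. The following Python sketch assumes a small value class; the names Focus, focus_type and to_markup are hypothetical and not part of the proposal, while the permitted values are exactly those listed on this slide.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape

# Permitted attribute values, as listed on the slide.
FOCUS_TYPES = ("informational", "contrastive")
FOCUS_METHODS = ("StrongStress", "ModerateStress", "None", "Pause", "Intonation")


@dataclass(frozen=True)
class Focus:
    """One focused span of text with its proposed attributes (illustrative only)."""
    text: str
    focus_type: str
    method: str

    def __post_init__(self):
        # Reject values that the proposal does not list.
        if self.focus_type not in FOCUS_TYPES:
            raise ValueError(f"unknown focus type: {self.focus_type!r}")
        if self.method not in FOCUS_METHODS:
            raise ValueError(f"unknown focus method: {self.method!r}")

    def to_markup(self) -> str:
        """Serialize the span as a <focus> element with straight quotes."""
        return (f'<focus type="{self.focus_type}" method="{self.method}">'
                f'{escape(self.text)}</focus>')


print(Focus("赵教授", "informational", "StrongStress").to_markup())
```

Rejecting unknown values at construction time keeps misspelled methods such as a lowercase "pause" from reaching the synthesizer silently.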
21
Samples of <focus>
(1) 你经常见<focus type="informational" method="StrongStress">赵教授</focus>吗?
    我见过他<focus type="informational" method="Pause">一次</focus>。
(2) 昨天老张干什么了?
    昨天老张<focus type="informational" method="ModerateStress">去看病</focus>。
(3) 是<focus type="contrastive" method="StrongStress">老张</focus>帮我修了车。
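Because the proposed element is ordinary XML, a marked-up sentence can be read back with a standard parser. The sketch below is only an illustration: the function name extract_focuses and the <s> wrapper added around the sentence are assumptions, and the attribute values must be written with straight quotes for the parser to accept them.

```python
import xml.etree.ElementTree as ET


def extract_focuses(sentence_markup: str):
    """Return the plain sentence and a list of (text, type, method) per focus.

    The sentence is wrapped in a temporary <s> element so that it parses
    as a single XML fragment.
    """
    root = ET.fromstring(f"<s>{sentence_markup}</s>")
    focuses = [
        (element.text or "", element.get("type"), element.get("method"))
        for element in root.iter("focus")
    ]
    plain_text = "".join(root.itertext())    # markup removed, text kept in order
    return plain_text, focuses


plain, focuses = extract_focuses(
    '是<focus type="contrastive" method="StrongStress">老张</focus>帮我修了车。'
)
print(plain)
print(focuses)
```

The plain text can go to the normal text-analysis front end, while the focus list tells the prosody module which span to make prominent and how.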
22
Samples of <focus> (Cont.)
(4) 他连<focus type="contrastive" method="StrongStress">我</focus>也不相信。
(5) 他经常<focus type="informational" method="Pause">和我打球</focus>。
(6) 他居然<focus type="informational" method="ModerateStress">卖了房子</focus>。
(7) 我们<focus type="informational" method="Intonation">去钓鱼</focus>吧。
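For a synthesizer that only understands standard SSML, each method value could be approximated with existing elements. The Python sketch below shows one possible fallback; the mapping itself (Pause to a medium break, Intonation to a raised pitch) and the function names are assumptions made for illustration and are not part of the proposal.

```python
import re

# Matches the proposed element when written as in the samples:
# type first, method second, straight quotes.
FOCUS_RE = re.compile(
    r'<focus\s+type="(?P<type>\w+)"\s+method="(?P<method>\w+)"\s*>'
    r'(?P<text>.*?)</focus>',
    re.S,
)


def _realize(match: re.Match) -> str:
    """Replace one <focus> element with an assumed standard-SSML equivalent."""
    method, text = match["method"], match["text"]
    if method == "StrongStress":
        return f'<emphasis level="strong">{text}</emphasis>'
    if method == "ModerateStress":
        return f'<emphasis level="moderate">{text}</emphasis>'
    if method == "Pause":
        return f'<break strength="medium"/>{text}'      # pause before the focus
    if method == "Intonation":
        return f'<prosody pitch="high">{text}</prosody>'
    return text                                         # "None": no extra marking


def to_standard_ssml(sentence_markup: str) -> str:
    """Rewrite every <focus> element in the sentence using the mapping above."""
    return FOCUS_RE.sub(_realize, sentence_markup)


print(to_standard_ssml('他经常<focus type="informational" method="Pause">和我打球</focus>。'))
```

Such a rewrite discards the type attribute, so it keeps the acoustic realization but loses the semantic label that distinguishes focus from emphasis.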
23
Thank you!
24