A Weight Pushing Algorithm

Michael Güttinger
Overview
• Semirings
• Weighted finite-state acceptors (WFSAs)
• The Weight Pushing Algorithm
• Results
Semirings
• Semiring (K, ⊕, ⊗, 0, 1)
• K a set
• ⊕ associative and commutative
• ⊗ associative
• ⊗ distributes over ⊕: a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c)
• Identity 0 (⊕), Identity 1 (⊗)
• 0 ⊗ a = a ⊗ 0 = 0
Semirings Examples
• Probability semiring (ℝ₊, +, ×, 0, 1)
• Log semiring (ℝ ∪ {∞}, ⊕_l, +, ∞, 0)
  with ∀ a, b ∈ ℝ ∪ {∞}: a ⊕_l b = −log(exp(−a) + exp(−b))
• Tropical semiring (ℝ₊ ∪ {∞}, min, +, ∞, 0)
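The three example semirings can be written down as data, which may help when reading the later slides. This is a minimal sketch, not part of the original talk: the `Semiring` class, its field names, and the `log_plus` helper are hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]   # the ⊕ operation
    times: Callable[[float, float], float]  # the ⊗ operation
    zero: float                             # identity for ⊕
    one: float                              # identity for ⊗

def log_plus(a: float, b: float) -> float:
    """a ⊕_l b = -log(exp(-a) + exp(-b)), with ∞ as the ⊕ identity."""
    if math.isinf(a):
        return b
    if math.isinf(b):
        return a
    # Factor out the smaller argument for numerical stability.
    return min(a, b) - math.log1p(math.exp(-abs(a - b)))

PROBABILITY = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
LOG = Semiring(log_plus, lambda a, b: a + b, math.inf, 0.0)
TROPICAL = Semiring(min, lambda a, b: a + b, math.inf, 0.0)
```

With weights read as negative log probabilities, `LOG.plus` adds the underlying probabilities, while `TROPICAL.plus` keeps only the better of the two.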
Weighted Finite-State Acceptors (WFSAs)
• WFSA A = (Σ, Q, E, i, F, λ, ρ) over a semiring K
• Σ alphabet
• Q finite set of states
• E finite set of transitions, E ⊆ Q × (Σ ∪ {ε}) × K × Q
• Initial state i ∈ Q, set of final states F ⊆ Q
• Initial weight λ, final weight function ρ
WFSA Transitions
• Transition t = (p[t], l[t], w[t], n[t])
• Source state p[t]
• Destination state n[t]
• Label l[t]
• Weight w[t]
WFSA: Path
• A path in A is a sequence of consecutive transitions t₁, …, tₙ with:
  n[tᵢ] = p[tᵢ₊₁] (∀ i = 1, …, n−1)
• A successful path π = t₁, …, tₙ is a path from the initial state i to a final state f ∈ F
• The weight w[π] of path π is:
  w[π] = λ ⊗ w[t₁] ⊗ … ⊗ w[tₙ] ⊗ ρ(n[tₙ])
• Two WFSAs are equivalent when they associate the same weight with any given input string.
• Equivalent WFSAs may have weights distributed differently along their paths.
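The path-weight formula can be sketched in a few lines. The tuple layout `(source, label, weight, destination)` mirrors t = (p[t], l[t], w[t], n[t]); the function name `path_weight` and the example path are assumptions made for illustration only.

```python
from functools import reduce

def path_weight(path, lam, rho, times):
    """Return λ ⊗ w[t1] ⊗ ... ⊗ w[tn] ⊗ ρ(n[tn]) for a non-empty path.

    path: list of (source, label, weight, destination) transitions.
    rho:  dict mapping each final state to its final weight.
    """
    # Check that the transitions are consecutive: n[t_i] = p[t_{i+1}].
    for t, t_next in zip(path, path[1:]):
        if t[3] != t_next[0]:
            raise ValueError("transitions are not consecutive")
    w = reduce(times, (t[2] for t in path), lam)
    return times(w, rho[path[-1][3]])

# Hypothetical two-transition path in the tropical semiring (⊗ is +).
path = [(0, "a", 2.0, 1), (1, "b", 3.0, 2)]
total = path_weight(path, lam=0.0, rho={2: 1.5}, times=lambda a, b: a + b)
# total == 0.0 + 2.0 + 3.0 + 1.5 == 6.5
```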
Motivation for Weight Pushing
• The weight pushing algorithm builds an equivalent automaton whose weight distribution is better suited to pruning during speech recognition.
• This means pushing the weights as far towards the initial state as possible.
• After pushing, the weights assigned to all arcs leaving any given state sum to unity.
Weight Pushing
• Potential function V:Q K – {0}:
The weights are updated by:
• λ  λ ⊗ V(i)
• ∀ e E,w[e]  [V(p[e])] ⁻¹ ⊗ (w[e] ⊗V(n[e]))
• ∀ f F, ρ(f)  [V(f)] ⁻¹⊗ ρ[f]
• D[q] =  
w[
π
]
P(q)
• V(q) is equal to the shortest distance from q to any
final state.
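The three update rules translate almost line for line into code. The sketch below assumes the potential V has already been computed and that the semiring supplies a ⊗-inverse; the function name `push`, its parameters, and the four-state acceptor are hypothetical and are not the acceptor A from the slides.

```python
def push(arcs, lam, rho, V, initial, times, inverse):
    """Reweight an acceptor with the potential V.

    arcs: list of (p, label, w, n) transitions.
    rho:  dict mapping each final state to its final weight.
    """
    new_lam = times(lam, V[initial])                      # λ ← λ ⊗ V(i)
    new_arcs = [(p, label, times(inverse(V[p]), times(w, V[n])), n)
                for p, label, w, n in arcs]               # w[e] update
    new_rho = {f: times(inverse(V[f]), w) for f, w in rho.items()}
    return new_arcs, new_lam, new_rho

# Hypothetical acceptor over the tropical semiring: ⊗ is +, inverse is negation.
arcs = [(0, "a", 1.0, 1), (0, "b", 4.0, 2), (1, "c", 5.0, 3), (2, "d", 1.0, 3)]
rho = {3: 0.0}
V = {0: 5.0, 1: 5.0, 2: 1.0, 3: 0.0}   # shortest distance to final state 3
new_arcs, new_lam, new_rho = push(arcs, 0.0, rho, V, 0,
                                  times=lambda a, b: a + b,
                                  inverse=lambda a: -a)
# new_lam == 5.0; the arc weights become 1.0, 0.0, 0.0, 0.0
```

In this example the path a, c weighs 0 + 1 + 5 + 0 = 6 before pushing and 5 + 1 + 0 + 0 = 6 after, so the string weight is unchanged while the cost has moved to the initial weight.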
Weighted Acceptor A: Tropical Semiring
V(q) = min_{π ∈ P(q)} w[π]
V[0] = 0; V[1] = 0; V[2] = 10; V[3] = 0
w[e] ← [V(p[e])]⁻¹ + (w[e] + V(n[e]))
w[e] ← w[e] + V(n[e]) − V(p[e])
Result of Pushing A over the Tropical Semiring
Weighted Acceptor A: Log Semiring
V(q) = (⊕_l)_{π ∈ P(q)} w[π]
w[e] ← [V(p[e])]⁻¹ ⊗ (w[e] ⊗ V(n[e]))
λ ← λ ⊗ V(i)
Result of Pushing A over Log Semiring
Consequence of Pushing
• In the tropical semiring, the shortest path from each state to a final state has weight 0.
• In the automaton obtained from A by pushing over the log semiring, the outgoing weights at each state sum to 1.
• With classical minimization, the size of both automata can be reduced.
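The sum-to-one property of log-semiring pushing can be checked numerically on a small example. The sketch below uses a hypothetical three-state acyclic acceptor (not the acceptor A from the slides) and computes V in reverse topological order rather than with a general shortest-distance algorithm. It also assumes that a state's final weight counts toward the mass leaving that state.

```python
import math

def log_plus(a, b):
    """a ⊕_l b = -log(exp(-a) + exp(-b)); ∞ is the ⊕ identity."""
    if math.isinf(a):
        return b
    if math.isinf(b):
        return a
    return min(a, b) - math.log1p(math.exp(-abs(a - b)))

# Hypothetical acyclic acceptor; weights are negative log probabilities.
arcs = [(0, "a", -math.log(0.2), 1),
        (0, "b", -math.log(0.6), 2),
        (1, "c", -math.log(0.5), 2)]
rho = {2: 0.0}            # final weight of state 2 (probability 1)

# V(q): ⊕_l-sum of the weights of all paths from q to a final state,
# computed state by state in reverse topological order (2, 1, 0).
V = {}
for q in (2, 1, 0):
    v = rho.get(q, math.inf)
    for p, _, w, n in arcs:
        if p == q:
            v = log_plus(v, w + V[n])
    V[q] = v

# Push: w[e] <- w[e] + V(n[e]) - V(p[e]);  rho(f) <- rho(f) - V(f).
pushed = [(p, label, w + V[n] - V[p], n) for p, label, w, n in arcs]
pushed_rho = {f: w - V[f] for f, w in rho.items()}

def outgoing_mass(q):
    """Probability mass leaving q: outgoing arcs plus final weight, if any."""
    mass = sum(math.exp(-w) for p, _, w, _ in pushed if p == q)
    if q in pushed_rho:
        mass += math.exp(-pushed_rho[q])
    return mass
```

Here the two arcs leaving state 0 end up with probabilities 1/7 and 6/7, and `outgoing_mass(q)` is 1 up to rounding for every state.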
Conditions for Computing V(q)
• The semiring K is divisible when
  ∀ a, b ∈ K with a ⊕ b ≠ 0, ∃ a₁ ∈ K : a = (a ⊕ b) ⊗ a₁
• A k-closed semiring is a semiring for which there exists a k such that
  ∀ a ∈ K, ⊕_{n=0}^{k+1} aⁿ = ⊕_{n=0}^{k} aⁿ
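The k-closed condition can be probed on sample values. The sketch below only tests a finite list of elements, so it illustrates the definition and proves nothing; `power_sum` and `k_closed_on` are hypothetical helper names.

```python
import math
from operator import add

def power_sum(a, k, plus, times, one):
    """Return 1 ⊕ a ⊕ a^2 ⊕ ... ⊕ a^k in the given semiring."""
    total = power = one
    for _ in range(k):
        power = times(power, a)
        total = plus(total, power)
    return total

def k_closed_on(samples, k, plus, times, one):
    """Check the k-closed identity on the given sample elements only."""
    return all(
        power_sum(a, k + 1, plus, times, one) == power_sum(a, k, plus, times, one)
        for a in samples
    )

# Tropical semiring: ⊕ is min, ⊗ is +, and the ⊗ identity is 0.
nonnegative = [0.0, 0.5, 3.0, math.inf]
closed = k_closed_on(nonnegative, 0, min, add, 0.0)      # True
not_closed = k_closed_on([-1.0], 0, min, add, 0.0)       # False
```

With non-negative weights, min(0, a) = 0 for every a, which is the k = 0 case of the identity. A negative weight breaks it, since each extra power lowers the minimum further.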
Algorithm for Computing V(q)
Source-Shortest-Distance(q)  (tropical semiring; the generic semiring form is given in the comments)
d[j]: an estimate of the shortest distance from q to j
r[j]: the total weight added to d[j] since the last time j was extracted from S
for j = 1 to |Q|
    do d[j] = r[j] = ∞                                   // semiring 0
d[q] = r[q] = 0                                          // semiring 1
S = {q}
while S ≠ ∅
    do node = head(S)
       DEQUEUE(S)
       R = r[node]
       r[node] = ∞                                       // semiring 0
       for each e ∈ E[node]
           do if d[n[e]] ≠ min(d[n[e]], R + w[e])        // d[n[e]] ≠ d[n[e]] ⊕ (R ⊗ w[e])
              then d[n[e]] = min(d[n[e]], R + w[e])      // d[n[e]] = d[n[e]] ⊕ (R ⊗ w[e])
                   r[n[e]] = min(r[n[e]], R + w[e])      // r[n[e]] = r[n[e]] ⊕ (R ⊗ w[e])
                   if n[e] ∉ S
                      then ENQUEUE(S, n[e])
d[q] = 0                                                 // semiring 1
Algorithm to compute V(q)
Source-Shortest-Distance(q)  (generic semiring; 0 and 1 denote the identities of ⊕ and ⊗)
d[j]: an estimate of the shortest distance from q to j
r[j]: the total weight added to d[j] since the last time j was extracted from S
for j = 1 to |Q|
    do d[j] = r[j] = 0
d[q] = r[q] = 1
S = {q}
while S ≠ ∅
    do node = head(S)
       DEQUEUE(S)
       R = r[node]
       r[node] = 0
       for each e ∈ E[node]
           do if d[n[e]] ≠ d[n[e]] ⊕ (R ⊗ w[e])
              then d[n[e]] = d[n[e]] ⊕ (R ⊗ w[e])
                   r[n[e]] = r[n[e]] ⊕ (R ⊗ w[e])
                   if n[e] ∉ S
                      then ENQUEUE(S, n[e])
d[q] = 1
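The generic pseudocode can be turned into a short runnable sketch. The version below assumes a FIFO queue, a dict `arcs` mapping each state to its outgoing `(weight, next_state)` pairs, and semiring operations passed in as plain functions; all of these names are illustrative choices. It computes distances from the source state, so obtaining V(q), a distance toward the final states, would presumably mean running it on the reversed automaton, which is not shown here.

```python
import math
from collections import deque

def shortest_distance(num_states, arcs, source, plus, times, zero, one):
    """Generic single-source shortest distance from `source` to every state."""
    d = [zero] * num_states   # d[j]: current estimate of the distance to j
    r = [zero] * num_states   # r[j]: weight added to d[j] since j last left S
    d[source] = r[source] = one
    queue, queued = deque([source]), {source}
    while queue:
        node = queue.popleft()
        queued.discard(node)
        R, r[node] = r[node], zero
        for w, nxt in arcs.get(node, ()):
            candidate = plus(d[nxt], times(R, w))
            if d[nxt] != candidate:          # relax only if d[nxt] changes
                d[nxt] = candidate
                r[nxt] = plus(r[nxt], times(R, w))
                if nxt not in queued:
                    queue.append(nxt)
                    queued.add(nxt)
    d[source] = one
    return d

# Hypothetical three-state graph over the tropical semiring.
arcs = {0: [(1.0, 1), (4.0, 2)], 1: [(1.0, 2)]}
dist = shortest_distance(3, arcs, 0, min, lambda a, b: a + b, math.inf, 0.0)
# dist == [0.0, 1.0, 2.0]
```

The exact `!=` test is safe in the tropical semiring. Over the log semiring with cycles, floating-point rounding can prevent it from ever settling, so a tolerance would likely be needed there.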
Some facts about Weight Pushing
• Using tropical or log semiring pushing in the minimization step results in equivalent machines.
• But the distribution of the weights often differs radically.
• The log semiring benefits speech recognition pruning.
• Using the tropical semiring can be harmful in some cases.
Experiments and Results
40k-word NAB
• Task: North American Business News (NAB)
• Vocabulary size: 40K words
• Trigram language model with 4 000 000 transitions
• Triphonic acoustic model
• Compaq Alpha processor
[Figure: 40k-word NAB. Word accuracy (axis from 74 to 86) versus × real time (axis from 0 to 1.5) for three systems: unpushed, pushed (log), pushed (tropical).]
160k-word NAB
• Task: North American Business News (NAB)
• Vocabulary size: 160K words
• 6-gram language model with 40 000 000 transitions
• Triphonic acoustic model
• Compaq Alpha processor
[Figure: 160k-word NAB. Word accuracy (axis from 85 to 90) versus × real time (axis from 0 to 0.2) for three systems: unpushed, pushed (log), pushed (tropical).]
Summary
• The principle of weight pushing.
• The algorithm for the shortest distance.
• A consequence of weight pushing is that speech recognition becomes much faster.