Transcript Document

Language Model Grammar Conversion
[Figure: conversion diagram linking XML-SRGS, ABNF, BNF, JSGF, and IHD]
Wesley Holland, Julie Baca, Dhruva Duncan, Joseph Picone
Center for Advanced Vehicular Systems
Mississippi State University
Speech Recognition
• Acoustic Model
• Maps audio data to words or phonemes
• Language Model
• Specifies the order in which words or phonemes are likely to occur
• Described using a grammar
Language Model Grammar Conversion
Page 1 of 10
Grammar Specifications
• Backus-Naur Form (BNF)
• Augmented BNF (ABNF)
• JSpeech Grammar Format (JSGF)
• Speech Recognition Grammar Specification (SRGS)
• ISIP Hierarchical Digraph (IHD)
BNF
<A>::=aB
<B>::=bB
<B>::=ε
ABNF
<A>::=ab*
JSGF
<A>=a(b)*;
XML-SRGS
a
<item repeat="0-">
b
</item>
IHD
[Figure: IHD digraph]
Conversion Design
• Goals
• JSGF ↔ IHD
• XML-SRGS ↔ IHD
• Determination of equivalence
• Grammar minimization
• Final Architecture
[Figure: conversion architecture connecting XML-SRGS, ABNF, BNF, IHD, and JSGF]
JSGF/XML-SRGS → ABNF
• JSGF → ABNF
• Trivial
• Similar in syntax and structure to ABNF
• XML-SRGS → ABNF
• Harder than JSGF
• Different in syntax and structure from ABNF
• Requires enumeration of certain repeat attributes
XML-SRGS
<item repeat="1-2">
a
b
</item>
ABNF
<S>::=(ab)|(abab)
XML-SRGS
<item repeat="2-">
a
b
</item>
ABNF
<S>::=abab(ab)*
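The enumeration of repeat attributes shown above can be sketched in a few lines. This is a minimal illustration, not the converter's actual code: the function name `expand_repeat` and the string-based representation of the item body are assumptions made for the example.

```python
def expand_repeat(body: str, repeat: str) -> str:
    """Expand an SRGS repeat attribute ("n", "m-n", or "m-") into an
    ABNF-style right-hand side written in the notation of the slides."""
    if "-" in repeat:
        low_text, high_text = repeat.split("-", 1)
        low = int(low_text)
        high = int(high_text) if high_text else None  # "m-" has no upper bound
    else:
        low = high = int(repeat)

    if high is None:
        # Open-ended range: `low` mandatory copies, then a Kleene star.
        return body * low + f"({body})*"
    if high < low:
        raise ValueError(f"invalid repeat range: {repeat!r}")

    # Bounded range: enumerate every permitted repetition count.
    alternatives = [body * n if n else "ε" for n in range(low, high + 1)]
    if len(alternatives) == 1:
        return alternatives[0]
    return "|".join(f"({alt})" for alt in alternatives)


print(expand_repeat("ab", "1-2"))  # (ab)|(abab)
print(expand_repeat("ab", "2-"))   # abab(ab)*
```

Bounded ranges grow linearly with the upper bound, which is presumably why only "certain" repeat attributes need enumeration; open-ended ranges map directly onto a Kleene star.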
JSGF/XML-SRGS → ABNF
• XML-SRGS → ABNF (continued)
• Different weighting mechanisms (weight and repeat-prob attributes)
a
<item repeat="0-" repeat-prob=".45">
b
</item>
<one-of>
<item weight=".4">c</item>
<item weight=".6">d</item>
</one-of>
ABNF → BNF
• Normalized BNF
• Consists of rules of the following formats:
• (RULE_NAME)::=(TERMINAL),(NON_TERMINAL)
• (RULE_NAME)::=(NON_TERMINAL)
• (RULE_NAME)::=ε
• ABNF → BNF
• Complicated
• Accomplished using a recursive algorithm that extracts sets of normalized BNF rules from a set of ABNF rules
1. Break rule into multiple rules at each top-level alternation. Recurse on each rule.
2. For each concatenation, Kleene star, or Kleene plus, extract a set of left symbols and a set of right symbols.
3. For n left symbols and m right symbols, create n x m connecting rules.
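One way to picture the left-symbol / right-symbol idea is a position-based (Glushkov-style) construction. The sketch below is that substitute technique, not the authors' algorithm. It handles a single, non-recursive rule body given as a nested tuple; the tuple encoding, the function name `normalize`, and the `R<n>` rule names are all assumptions made for illustration.

```python
from itertools import count


def normalize(expr):
    """Turn one ABNF-like expression tree into normalized BNF rules.

    Tree nodes (hypothetical encoding): ("sym", t), ("alt", x, y),
    ("cat", x, y), ("star", x), ("plus", x).
    """
    terminals = {}   # position id -> terminal symbol
    follow = set()   # (p, q): position q may directly follow position p
    ids = count()

    def walk(node):
        """Return (left symbols, right symbols, nullable) for a subtree."""
        kind = node[0]
        if kind == "sym":
            pos = next(ids)
            terminals[pos] = node[1]
            return {pos}, {pos}, False
        if kind == "alt":
            f1, l1, n1 = walk(node[1])
            f2, l2, n2 = walk(node[2])
            return f1 | f2, l1 | l2, n1 or n2
        if kind == "cat":
            f1, l1, n1 = walk(node[1])
            f2, l2, n2 = walk(node[2])
            # n right symbols of the left part x m left symbols of the right part
            follow.update((p, q) for p in l1 for q in f2)
            first = f1 | f2 if n1 else f1
            last = l1 | l2 if n2 else l2
            return first, last, n1 and n2
        if kind in ("star", "plus"):
            f, l, n = walk(node[1])
            # Repetition: the right symbols loop back to the left symbols.
            follow.update((p, q) for p in l for q in f)
            return f, l, n or kind == "star"
        raise ValueError(f"unknown node kind: {kind!r}")

    first, last, nullable = walk(expr)
    rules = [f"RS→R{p}" for p in sorted(first)]
    if nullable:
        rules.append("RS→RT")
    for p in sorted(terminals):
        for q in sorted(q for (src, q) in follow if src == p):
            rules.append(f"R{p}→{terminals[p]},R{q}")
        if p in last:
            rules.append(f"R{p}→{terminals[p]},RT")
    rules.append("RT→ε")
    return rules


# (A | B) C+
tree = ("cat", ("alt", ("sym", "A"), ("sym", "B")), ("plus", ("sym", "C")))
for rule in normalize(tree):
    print(rule)
```

Every emitted rule has one of the three normalized shapes listed above. Rule references and weights are left out of the sketch.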
BNF ↔ IHD
• BNF ↔ IHD
• Each arc translates to a normalized BNF rule
• Terminals correspond to nodes; concatenations correspond to arcs
BNF
RS→R0
RS→R1
R0→A,R3
R1→B,R3
R3→C,R3
R3→C,RT
RT→ε
IHD
Nodes
1: A
2: B
3: C
Arcs
(S,1)
(S,2)
(1,3)
(2,3)
(3,3)
(3,T)
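The arc-to-rule correspondence can be illustrated for the IHD → BNF direction. This is a sketch under assumptions: the function name `ihd_to_bnf`, the dictionary and tuple representation of nodes and arcs, and the rule-naming scheme (`R<node id>`, which differs from the R0/R1/R3 names on the slide) are all hypothetical.

```python
def ihd_to_bnf(nodes, arcs):
    """Convert a digraph into normalized BNF rules.

    nodes: {node_id: terminal}
    arcs:  list of (src, dst) pairs; "S" and "T" mark the start and
           terminal pseudo-nodes, as in the slide's arc list.
    """
    def rule_name(node_id):
        return {"S": "RS", "T": "RT"}.get(node_id, f"R{node_id}")

    rules = []
    for src, dst in arcs:
        if src == "S":
            # Leaving the start node emits no terminal.
            rules.append(f"RS→{rule_name(dst)}")
        else:
            # Leaving a labelled node emits its terminal, then moves on.
            rules.append(f"{rule_name(src)}→{nodes[src]},{rule_name(dst)}")
    rules.append("RT→ε")
    return rules


nodes = {1: "A", 2: "B", 3: "C"}
arcs = [("S", 1), ("S", 2), (1, 3), (2, 3), (3, 3), (3, "T")]
for rule in ihd_to_bnf(nodes, arcs):
    print(rule)
```

Apart from the rule names, the output has the same shape as the BNF listing above: one rule per arc, plus the closing RT→ε rule.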
BNF → JSGF/XML-SRGS
• BNF → JSGF/XML-SRGS
• Rule-by-rule
• Trivial
BNF
<A>::=aB
<B>::=bB
<B>::=ε
JSGF
<A>=aB;
<B>=b*;
XML-SRGS
<rule id="a">
a
<ruleref uri="#b"/>
</rule>
<rule id="b">
<one-of>
<item>
b
<ruleref uri="#b"/>
</item>
<item>
<ruleref special="NULL"/>
</item>
</one-of>
</rule>
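A rule-by-rule translation to JSGF might look like the sketch below. The tuple representation `(lhs, terminal, nonterminal)` and the function name `bnf_to_jsgf` are assumptions for illustration. The output keeps the recursion explicit and uses JSGF's special `<NULL>` rule for ε, so it is more literal than the hand-simplified `<B>=b*;` shown on the slide.

```python
def bnf_to_jsgf(rules):
    """Emit one JSGF rule per BNF left-hand side.

    rules: iterable of (lhs, terminal, nonterminal) tuples, where
    terminal and/or nonterminal may be None (both None means ε).
    """
    alternatives = {}  # lhs -> list of alternatives, in first-seen order
    for lhs, terminal, nonterminal in rules:
        parts = []
        if terminal is not None:
            parts.append(terminal)
        if nonterminal is not None:
            parts.append(f"<{nonterminal}>")
        # An empty right-hand side becomes JSGF's special <NULL> rule.
        alternatives.setdefault(lhs, []).append(" ".join(parts) or "<NULL>")
    return [f"<{lhs}> = {' | '.join(alts)};" for lhs, alts in alternatives.items()]


bnf = [("A", "a", "B"), ("B", "b", "B"), ("B", None, None)]
for line in bnf_to_jsgf(bnf):
    print(line)
# <A> = a <B>;
# <B> = b <B> | <NULL>;
```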
Software Tools
• ISIP Network Converter
• Console tool to perform conversions to and from arbitrary grammar
formats
• ISIP Network Builder
• Java-based graphical tool to design grammars as finite state machines
• Can export grammars to JSGF, XML-SRGS, ABNF, BNF, and IHD
• ISIP Language Model Tester
• Console tool for testing of grammars
• Can generate valid sentences in a given grammar
• Can parse sentences and determine whether they are accepted by a given grammar
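As an illustration of the acceptance check described for the Language Model Tester, the sketch below tests a word sequence against a normalized BNF rule set. It is not the ISIP tool's implementation; the function name `accepts` and the `(lhs, terminal, nonterminal)` tuple encoding are hypothetical.

```python
def accepts(rules, start, words):
    """Return True if `words` can be derived from `start`.

    rules: list of (lhs, terminal, nonterminal) tuples in normalized form:
           (R, t, N) for R→t,N; (R, None, N) for R→N; (R, None, None) for R→ε.
    """
    def closure(states):
        # Follow R→N rules, which consume no input.
        seen, stack = set(states), list(states)
        while stack:
            state = stack.pop()
            for lhs, term, nxt in rules:
                if lhs == state and term is None and nxt is not None and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    current = closure({start})
    for word in words:
        # Advance along every rule that emits this word.
        current = closure({nxt for lhs, term, nxt in rules
                           if lhs in current and term == word})
    # Accept if some reachable nonterminal has an ε rule.
    return any(lhs in current and term is None and nxt is None
               for lhs, term, nxt in rules)


grammar = [
    ("RS", None, "R0"), ("RS", None, "R1"),
    ("R0", "A", "R3"), ("R1", "B", "R3"),
    ("R3", "C", "R3"), ("R3", "C", "RT"),
    ("RT", None, None),
]
print(accepts(grammar, "RS", ["A", "C", "C"]))  # True
print(accepts(grammar, "RS", ["A"]))            # False
```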
Minimization
• Minimization
• Happens in BNF
• Iterate over rule set, merging redundant rules
• Rules can be merged if the nonterminals of both rules reference the same terminal
• Example:
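A minimal sketch of one possible merging pass follows. It assumes a specific reading of the merge criterion, namely that two nonterminals are redundant when their sets of right-hand sides are identical; the slides do not spell the criterion out in this detail. The function name `minimize` and the tuple encoding are hypothetical.

```python
def minimize(rules, start="RS"):
    """Merge nonterminals whose right-hand sides are identical.

    rules: iterable of (lhs, terminal, nonterminal) tuples; None marks
    an absent terminal or nonterminal. Repeats until nothing changes.
    """
    rules = set(rules)
    while True:
        # Collect the full set of right-hand sides per nonterminal.
        bodies = {}
        for lhs, term, nxt in rules:
            bodies.setdefault(lhs, set()).add((term, nxt))

        # Nonterminals with the same body set are interchangeable.
        groups = {}
        for lhs, body in bodies.items():
            groups.setdefault(frozenset(body), []).append(lhs)

        rename = {}
        for names in groups.values():
            keep = start if start in names else min(names)
            for other in names:
                if other != keep:
                    rename[other] = keep

        if not rename:
            return rules
        # Rewrite every occurrence; duplicate rules collapse in the set.
        rules = {(rename.get(l, l), t, rename.get(n, n)) for l, t, n in rules}


grammar = [
    ("RS", None, "R0"), ("RS", None, "R1"),
    ("R0", "A", "R3"), ("R1", "A", "R3"),   # R0 and R1 are redundant
    ("R3", "C", "RT"), ("RT", None, None),
]
for rule in sorted(minimize(grammar), key=str):
    print(rule)
```

In this input, R0 and R1 both expand to A followed by R3, so they collapse into a single nonterminal and the two RS rules become one.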