Comparison of the SPHINX and HTK Frameworks Processing the AN4 Corpus

Arthur Kunkle
ECE 5526
Fall 2008

Framework Introduction

CMU Sphinx

Developed by Carnegie Mellon University.
Has been supported by sponsors such as DARPA, IBM, and Sun Microsystems.
Some notable applications that use Sphinx include:
    Roomline, a conference room reservation system at CMU.
    Let’s Go, a spoken dialog system in use at Pittsburgh’s transit system.

HTK

Originally developed in 1989 by the Speech Vision and Robotics Group of Cambridge University.
Purchased by Entropic Laboratories in 1993, then acquired by Microsoft when it bought Entropic in 1999.
The HTK source code was then licensed back to Cambridge University for continued development.
The source has been publicly available since then, although under a restrictive license (see Licensing).

Phase 1: Performance-Based Areas of Comparison

Training and decoding using the AN4 corpus.
Same procedure used in Homework #5.
Provides the following metrics:
    Decoder time to completion.
    Decoder accuracy on the sentence level.
    Decoder accuracy on the word level.
    Types and quantities of decoding errors encountered during the decoding process.
    Notable trends of errors.
    Memory requirements for the recognizer at runtime.

Phase 2: Other Notable Areas of Comparison

Coded data feature format support
Language modeling support
Overall ease of training and decoding corpora
Notable features of the software baseline for each toolkit
Operating system support
Available documentation and community support
Licensing and usage rights
Future toolkit development plans

Training and Testing Procedure for AN4 in HTK

The training procedure was developed in a “tutorial” format: HTKTrainingDecoding_tutorial.doc
An example directory containing the full results of working through the tutorial is also included on the CD: htktut

Training Results Comparison

Metric                      Sphinx3    HTK
Peak Memory Usage (MB)      8.2        5.9
Time to Completion (sec)    63         93
Sentence Error Rate (%)     59.2       69.0
Word Error Rate (%)         21.3       9.0
Word Substitution Errors    92         92
Word Insertion Errors       71         154
Word Deletion Errors        2          0

8 Gaussians per HMM state
Context-dependent triphone state models
Tied states
Finite state grammar language model
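
The word-level and sentence-level metrics in the table are conventionally derived from a minimum-edit-distance alignment between each reference transcript and the decoder's hypothesis. The following is a minimal Python sketch of that calculation, not the scoring code of either toolkit. The function names (`align_counts`, `score`) and the sample utterances are hypothetical, and ties between equally cheap alignments are broken arbitrarily, so individual error-type counts may differ from a toolkit's own scorer even when the total matches.

```python
from typing import Dict, List, Sequence, Tuple


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int]:
    """Return (substitutions, insertions, deletions) for one minimum-edit alignment."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # cost[i][j] holds (total_edits, subs, ins, dels) for ref[:i] versus hyp[:j].
    cost = [[(0, 0, 0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = (i, 0, 0, i)  # only deletions are possible
    for j in range(1, cols):
        cost[0][j] = (j, 0, j, 0)  # only insertions are possible
    for i in range(1, rows):
        for j in range(1, cols):
            t, s, n, d = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (t, s, n, d)          # correct word, no edit
            else:
                diagonal = (t + 1, s + 1, n, d)  # substitution
            t, s, n, d = cost[i][j - 1]
            insertion = (t + 1, s, n + 1, d)
            t, s, n, d = cost[i - 1][j]
            deletion = (t + 1, s, n, d + 1)
            cost[i][j] = min(diagonal, insertion, deletion, key=lambda c: c[0])
    _, subs, ins, dels = cost[-1][-1]
    return subs, ins, dels


def score(pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """Accumulate error counts over (reference, hypothesis) transcript pairs."""
    subs = ins = dels = ref_words = wrong_sentences = 0
    for ref_text, hyp_text in pairs:
        ref, hyp = ref_text.split(), hyp_text.split()
        s, i, d = align_counts(ref, hyp)
        subs, ins, dels = subs + s, ins + i, dels + d
        ref_words += len(ref)
        wrong_sentences += 1 if (s + i + d) > 0 else 0
    return {
        "substitutions": subs,
        "insertions": ins,
        "deletions": dels,
        # WER = (S + I + D) / N, with N the number of reference words.
        "wer_percent": 100.0 * (subs + ins + dels) / ref_words,
        "ser_percent": 100.0 * wrong_sentences / len(pairs),
    }


if __name__ == "__main__":
    # Hypothetical transcripts written in the style of AN4 prompts.
    sample = [
        ("ENTER SIX TWO", "ENTER SIX TWO"),
        ("GO", "GO NO"),
        ("RUBOUT G M E F", "RUBOUT G N F"),
    ]
    for name, value in score(sample).items():
        print(f"{name}: {value}")
```

Because insertions count as errors but do not add to the number of reference words, a decoder that inserts many words can have a high word error rate even when most reference words are recognized correctly.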

Front-End Data Feature Support

Sphinx provides wave2feat for limited conversion to MFCC (used in a previous homework). However, “Sphinx trainer and decoder are compatible with many other data formats”; more research is needed to determine which ones specifically.
HTK provides HCopy, which can perform many different conversions.
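
As a rough illustration of how an HCopy conversion is usually driven, the Python sketch below writes a configuration file and a script file, then builds the command line without running it. The parameter values follow the MFCC settings used in the HTK Book tutorial and are not taken from this project; `SOURCEFORMAT = NIST` assumes the AN4 audio carries NIST SPHERE headers, and all file names and paths are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# Assumed values modeled on the HTK Book tutorial, not this project's settings.
HCOPY_CONFIG: Dict[str, str] = {
    "SOURCEFORMAT": "NIST",    # assumption: input files have NIST SPHERE headers
    "TARGETKIND": "MFCC_0",    # MFCCs plus the 0th cepstral coefficient
    "TARGETRATE": "100000.0",  # frame period in 100 ns units (10 ms)
    "WINDOWSIZE": "250000.0",  # analysis window in 100 ns units (25 ms)
    "USEHAMMING": "T",
    "PREEMCOEF": "0.97",
    "NUMCHANS": "26",
    "CEPLIFTER": "22",
    "NUMCEPS": "12",
}


def render_config(params: Dict[str, str]) -> str:
    """Format parameters as 'NAME = VALUE' lines, the layout HTK config files use."""
    return "\n".join(f"{name} = {value}" for name, value in params.items()) + "\n"


def render_script(pairs: List[Tuple[str, str]]) -> str:
    """Format a script file: one 'source target' pair per line."""
    return "\n".join(f"{src} {dst}" for src, dst in pairs) + "\n"


def hcopy_command(config_path: str, script_path: str) -> List[str]:
    """Build the command: -T sets trace level, -C names the config, -S the script file."""
    return ["HCopy", "-T", "1", "-C", config_path, "-S", script_path]


if __name__ == "__main__":
    # Hypothetical file names and audio paths, for illustration only.
    work_dir = Path("hcopy_example")
    work_dir.mkdir(exist_ok=True)
    config_file = work_dir / "an4_hcopy.conf"
    script_file = work_dir / "an4_train.scp"
    config_file.write_text(render_config(HCOPY_CONFIG))
    script_file.write_text(render_script([
        ("wav/utt0001.sph", "mfc/utt0001.mfc"),
        ("wav/utt0002.sph", "mfc/utt0002.mfc"),
    ]))
    # Printed rather than executed, so the sketch runs without HTK installed.
    print(" ".join(hcopy_command(str(config_file), str(script_file))))
```

Changing `TARGETKIND` in the configuration is what selects a different output parameterization, so the same command line can serve several conversions.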

Language Modeling

Both frameworks support N-gram statistical language models as well as fixed, context-free grammars (defined by BNF-type networks).
HTK includes two separate modules, HLMLib and HLMTools, to provide N-gram language model training, class-based models, and perplexity calculations.
    NOTE: The HTK Book also includes a thorough tutorial on building and training such a model using phrases from Sherlock Holmes.
Sphinx relies on other tools for LM generation (see the CMU Statistical Language Model toolkit).
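
To make the terms "N-gram model" and "perplexity" concrete, here is a small Python sketch of a bigram model with add-one smoothing. It is an illustration only and does not reproduce HLMTools or the CMU toolkit, which use more sophisticated discounting and back-off. The class name `BigramModel` and the training sentences are hypothetical.

```python
import math
from collections import Counter
from typing import List


class BigramModel:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences: List[List[str]]) -> None:
        self.history_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        for words in sentences:
            tokens = ["<s>"] + words + ["</s>"]
            self.history_counts.update(tokens[:-1])          # every token that has a successor
            self.bigram_counts.update(zip(tokens, tokens[1:]))
        # Words the model can predict: the training vocabulary plus end-of-sentence.
        self.vocab = {w for words in sentences for w in words} | {"</s>"}

    def prob(self, prev: str, word: str) -> float:
        """P(word | prev), adding one to every bigram count over the vocabulary."""
        numerator = self.bigram_counts[(prev, word)] + 1
        denominator = self.history_counts[prev] + len(self.vocab)
        return numerator / denominator

    def perplexity(self, sentences: List[List[str]]) -> float:
        """Perplexity = 2 ** (negative average log2 probability per predicted token)."""
        log_sum = 0.0
        predicted = 0
        for words in sentences:
            tokens = ["<s>"] + words + ["</s>"]
            for prev, word in zip(tokens, tokens[1:]):
                log_sum += math.log2(self.prob(prev, word))
                predicted += 1
        return 2.0 ** (-log_sum / predicted)


if __name__ == "__main__":
    # Hypothetical training text in the style of AN4 prompts.
    training = [
        "ENTER SIX TWO".split(),
        "ENTER FOUR FIVE".split(),
        "RUBOUT SIX".split(),
    ]
    model = BigramModel(training)
    print("P(SIX | ENTER) =", model.prob("ENTER", "SIX"))
    print("perplexity =", model.perplexity(["ENTER SIX".split()]))
```

Lower perplexity on held-out text means the model assigns that text higher probability. Test words that never appeared in training receive the smoothed floor probability in this sketch rather than being mapped to an unknown-word token.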

Notable Software Baseline Characteristics

Sphinx

Organized across three components
Huge amount of code
Uses Unix-style directory organization
Source files average 1200 LOC
Includes automated tests

HTK

All in one distribution
Organized into HTKLib, HTKTools, HLMLib, and HLMTools
Source files average 1400 LOC
Only one level of dependency between *Tools and *Lib

Documentation

• HTK has an excellent wealth of information available through the HTKBook:
1. The first part of the book gives enough background theory to equip relatively unversed individuals with enough knowledge to understand the mechanics of the toolkit.
2. Section two of the book provides extensive details about the core architecture of HTK through the major phases of model training and testing.
3. Section three provides an in-depth look into the language modeling features that HTK provides as a part of its framework.
4. Section four provides a detailed reference to each application that is provided with the framework.
• No comparably detailed information exists for Sphinx (although it does have automatically maintained Doxygen and JavaDoc documentation).

Licensing (IMPORTANT!)

Major difference in the restrictions.
HTK: “The Licensed Software either in whole or in part can not be distributed or sublicensed to any third party in any form.”
This makes the intended application of HTK a very important question when choosing a toolkit.
Sphinx is licensed by CMU and may be redistributed.

Recent Release Activity and Future Plans

Sphinx

Last release of a major Sphinx component (Sphinx3) was in 06/2007.
PocketSphinx, an embedded decoder.
Sphinx-4, a pure Java implementation.

HTK

Last release of HTK3 was in 12/2006.
Lack of public announcements.

Comparison Matrix

Developed to summarize results across many areas of comparison: comparison_matrix.xls

References

Main HTK Website -- http://htk.eng.cam.ac.uk/
Sourceforge Sphinx -- http://cmusphinx.sourceforge.net/html/cmusphinx.php
Brief Sphinx/HTK Comparison -- http://lima.lti.cs.cmu.edu/moinmoin/SphinxHTK
HTKBook -- http://htk.eng.cam.ac.uk/prot-docs/htk_book.shtml
ASR System Review -- http://www.cis.hut.fi/Opinnot/T61.6040/pellom-2004/lecture-09.pdf
Arthur Chan Sphinx Presentation -- http://www.cs.cmu.edu/~archan/sphinxPresentation.html
Sphinx-3 Decoder Wiki -- http://cmusphinx.sourceforge.net/sphinx3/doc/s3_description.html#lm_dumpfile