The CMU Sphinx Group Open Source Speech Recognition Engines

Current uses of the several versions of Sphinx

This page describes current uses of the several versions of Sphinx. The descriptions may be used as guidelines, but keep in mind that there is no definitive answer as to which version to use.

PocketSphinx

PocketSphinx is CMU's fastest speech recognition system. It uses Hidden Markov Models (HMM) with semi-continuous output probability density functions (PDF). Even though it is not as accurate as Sphinx-3 or Sphinx-4, it runs in real time, and is therefore a good choice for live applications. You can find further documentation about PocketSphinx in the release documentation or in the online documentation.
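"Real time" is usually quantified with a real-time factor: decoding time divided by the duration of the audio, where a value at or below 1.0 means the recognizer keeps up with live input. This page does not define the term, so the sketch below only illustrates that common convention. The function names and the timings in the usage note are hypothetical and are not measurements of any Sphinx decoder.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return decoding time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def runs_in_real_time(processing_seconds: float, audio_seconds: float) -> bool:
    """True when the decoder takes no longer than the audio itself lasts."""
    return real_time_factor(processing_seconds, audio_seconds) <= 1.0
```

For example, with made-up timings, decoding a 10-second utterance in 8 seconds gives a factor of 0.8, which counts as real time; taking 25 seconds gives 2.5, which does not.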

Sphinx-3

Sphinx-3 is CMU's state-of-the-art large vocabulary speech recognition system. It uses Hidden Markov Models (HMM) with continuous output probability density functions (PDF). It supports several modes of operation. The more accurate mode, known as the "flat decoder", is descended from the original Sphinx-3 release (still available for reference purposes at https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/archive_s3/s3). The faster mode, known as the "tree decoder", was developed separately. The two decoders were merged in Sphinx 3.5, though the flat decoder was not fully functional until Sphinx 3.7. Further documentation can be found in the release documentation, or at the online documentation.
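The difference between semi-continuous and continuous output densities can be sketched in a few lines. In a semi-continuous model, every HMM state shares one codebook of Gaussians and differs only in its mixture weights, so the Gaussians are evaluated once per frame and reused. In a continuous model, each state owns its Gaussians. The one-dimensional features, state names, and parameter values below are invented for illustration and do not come from any Sphinx model or API.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


# Semi-continuous: one codebook of (mean, variance) pairs shared by all states.
SHARED_CODEBOOK = [(-1.0, 1.0), (0.0, 1.0), (1.0, 1.0)]
# Each state keeps only its own mixture weights over the shared codebook.
SEMI_WEIGHTS = {
    "state_a": [0.7, 0.2, 0.1],
    "state_b": [0.1, 0.2, 0.7],
}


def codebook_densities(x):
    """Evaluate the shared Gaussians once for a frame; every state reuses them."""
    return [gaussian_pdf(x, mean, var) for mean, var in SHARED_CODEBOOK]


def semi_continuous_likelihood(densities, state):
    """Weighted sum of the precomputed shared densities for one state."""
    return sum(w * d for w, d in zip(SEMI_WEIGHTS[state], densities))


# Continuous: each state owns its (weight, mean, variance) components.
CONTINUOUS_MIXTURES = {
    "state_a": [(0.6, -1.2, 0.5), (0.4, -0.3, 0.8)],
    "state_b": [(0.5, 0.9, 0.6), (0.5, 1.8, 1.2)],
}


def continuous_likelihood(x, state):
    """Evaluate the state's own Gaussians; nothing is shared across states."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in CONTINUOUS_MIXTURES[state])


if __name__ == "__main__":
    frame = -1.0
    shared = codebook_densities(frame)  # computed once for the frame
    for state in ("state_a", "state_b"):
        print(state,
              semi_continuous_likelihood(shared, state),
              continuous_likelihood(frame, state))
```

Sharing the codebook keeps the per-frame cost low, because the number of Gaussian evaluations does not grow with the number of states. Per-state Gaussians can fit each state's distribution more closely at a higher computational cost. This is in line with the speed and accuracy trade-off described for these decoders.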

Sphinx-4

Sphinx-4 is a state-of-the-art speech recognition system written entirely in the Java(tm) programming language. It uses Hidden Markov Models (HMM) with continuous output probability density functions (PDF).

For further detail, please check the Sphinx-4 page.

SphinxTrain

SphinxTrain is CMU Sphinx's training package. It trains models in Sphinx-3 format, which is also used by PocketSphinx. The Sphinx-3 format can also be converted to Sphinx-2 format under some conditions related to Sphinx-2's limitations. At this point, Sphinx-4 uses Sphinx-3 models.

Sphinx-2

Sphinx-2 is a fast speech recognition system, the predecessor of PocketSphinx. It is not being actively developed at this time, but is still widely used in interactive applications. It uses Hidden Markov Models (HMM) with semi-continuous output probability density functions (PDF). Even though it is not as accurate as Sphinx-3 or Sphinx-4, it runs in real time, and is therefore a good choice for live applications. You can find further documentation about Sphinx-2 in the release documentation or in the online documentation.

Comparison

We have some regression tests comparing Sphinx-4 to Sphinx-3 (flat decoder) and to Sphinx-3.3 (fast decoder) on several different tasks, ranging from digits only to medium-to-large vocabulary. Sphinx-3 (flat decoder) is often the most accurate, but Sphinx-4 is faster and more accurate than Sphinx-3.3 in some of these tests.

If you're familiar with ARPA evaluations (using databases available via the Linguistic Data Consortium (LDC) at U.Penn), you can find WSJ 5k and RM1 results in the "Medium vocab" table, and WSJ 20k in the "Large vocab" table.

The decision about which version to use depends on how familiar you are with C (Sphinx-3) or Java (Sphinx-4), and on how easily each can be integrated into your system.

This page is maintained by David Huggins-Daines.
CMUSphinx is a project within the Sphinx Group at Carnegie Mellon