The CMU Sphinx Group Open Source Speech Recognition Engines

External Links

Notice: if you have comments about the links below, please contact the authors directly.

Resources to build a complete system

Introduction

A complete speech recognition system will include data prepared using tools from outside sources, as well as programs available from this site.

At a minimum, such a system needs an acoustic model trainer and a decoder, together with audio data, a dictionary, and a language model, any of which may come from outside sources. This page gives you pointers to tools and data that will allow you to create a full speech recognition system. Keep in mind, though, that building a working system requires knowledge of speech processing that this site cannot provide.

Audio data

Most reported results in speech recognition use data made available through the Linguistic Data Consortium (LDC). There you will find audio/text data at several levels of complexity, but most of it is licensed, and you will need to pay for it.

CMU has made available the AN4 database, both in its original format and rerecorded through a microphone array. The database is publicly available. Note that it is a small database, which can be used to build a toy or test system, but which does not yield a system with high accuracy.

Open Source Models

If you prefer to skip the data preparation tools, you may retrieve acoustic models, language models, and dictionaries directly from the Open Source Models page. These models were trained on large databases, and may just work for your needs.

You will also find packages containing acoustic models on the Sphinx-4 release page.

Finally, you can find models for the Spanish language at ITESM, in Mexico, with a mirror at CMU.

Dictionary

A dictionary is a file that maps each word to be recognized to its phonetic transcription. The transcription is written in the phonetic unit used by the system. Most commonly, the system is designed to use phonemes as the phonetic unit, but it is also common for a system to use a word or even a whole phrase as the unit.

CMU has made available cmudict, a large dictionary that maps more than 100k words to their phonemes.
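As a rough illustration of the word-to-phoneme mapping described above, the following sketch parses a few dictionary-style lines into a Python structure. The sample entries, the `;;;` comment prefix, and the `(2)` marker for an alternate pronunciation are assumptions made for this example; check the dictionary file you actually download for its exact conventions. The function name `parse_dictionary` is hypothetical.

```python
import re
from collections import defaultdict

# Illustrative sample only; these lines are assumptions, not copied from a release.
SAMPLE = """\
;;; lines starting with three semicolons are treated as comments
HELLO  HH AH0 L OW1
HELLO(2)  HH EH0 L OW1
WORLD  W ER1 L D
"""

# Matches a trailing alternate-pronunciation marker such as "(2)".
_VARIANT = re.compile(r"\(\d+\)$")


def parse_dictionary(text):
    """Map each word to a list of pronunciations (each a list of phonemes)."""
    entries = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        word, *phones = line.split()
        word = _VARIANT.sub("", word)  # fold "HELLO(2)" into "HELLO"
        entries[word].append(phones)
    return dict(entries)


lexicon = parse_dictionary(SAMPLE)
print(lexicon["HELLO"])
```

Keeping a list of pronunciations per word, rather than a single one, lets alternate pronunciations of the same word sit side by side.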

Language Model

Language is commonly modeled through a statistical language model (SLM) or through a finite state grammar (FSG). Sphinx-2, Sphinx-3, and Sphinx-4 can handle both SLM and FSG. CMU provides tools for building statistical language models. FSGs have to be built by hand, or using tools not provided here.
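To give a feel for what a hand-built finite state grammar expresses, here is a minimal sketch that represents a toy grammar as a transition table and checks whether a word sequence is accepted. The grammar, the state numbering, and the `accepts` function are all hypothetical; this is not the grammar file format read by any Sphinx decoder.

```python
# Hypothetical toy grammar: state -> {word: next_state}.
TRANSITIONS = {
    0: {"turn": 1},
    1: {"the": 2},
    2: {"light": 3, "radio": 3},
    3: {"on": 4, "off": 4},
}
START_STATE = 0
FINAL_STATES = {4}


def accepts(words):
    """Return True if the word sequence follows a path from start to a final state."""
    state = START_STATE
    for word in words:
        next_state = TRANSITIONS.get(state, {}).get(word)
        if next_state is None:
            return False  # no arc for this word from the current state
        state = next_state
    return state in FINAL_STATES


print(accepts("turn the radio off".split()))
print(accepts("turn the light".split()))
```

A grammar like this only admits the sentences its author wrote down, which is why it suits small command-style tasks better than open-ended speech.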

To build a language model, you can use an online LM tool, or you can download and compile the CMU Statistical Language Model toolkit.
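For intuition about what a statistical language model estimates, the sketch below counts bigrams in a tiny corpus and computes unsmoothed maximum-likelihood probabilities. The corpus and the function names are assumptions made for this example, and this is not how the toolkit mentioned above works internally; practical language models typically add smoothing so that unseen word pairs do not get zero probability.

```python
from collections import Counter

# Hypothetical toy corpus, one sentence per entry.
CORPUS = ["go forward", "go back", "turn left", "turn right", "go forward"]


def train_bigrams(sentences):
    """Count histories and bigrams, padding each sentence with <s> and </s>."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(tokens[:-1])            # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return histories, bigrams


def bigram_prob(prev, word, histories, bigrams):
    """Unsmoothed maximum-likelihood estimate of P(word | prev)."""
    if histories[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / histories[prev]


histories, bigrams = train_bigrams(CORPUS)
print(bigram_prob("<s>", "go", histories, bigrams))
print(bigram_prob("go", "left", histories, bigrams))
```

In this corpus "go" is never followed by "left", so that pair gets probability zero, which illustrates why smoothing matters once the vocabulary grows.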

Acoustic Model Trainer

CMU provides an acoustic model trainer, SphinxTrain, that can be used to produce continuous or semi-continuous HMMs. It produces models compatible with PocketSphinx, Sphinx-2, Sphinx-3, and Sphinx-4. You have several options for retrieving SphinxTrain.

Decoder

CMU offers several versions of the Sphinx decoder. A quick comparison between the versions is available, as are download instructions for PocketSphinx, Sphinx-2, Sphinx-3, and Sphinx-4.

This page is maintained by David Huggins-Daines.
CMUSphinx is a project within the Sphinx Group at Carnegie Mellon