Chapter IV
Implementing the HMM-Based Recognition on FPGA
This chapter describes a speech recognizer based on hidden Markov models (HMMs)
and shows how to use the tools to implement it as an SoPC system on an FPGA board.
The board is the Altera DE2, a very basic FPGA board.
IV.1. Introduction to FPGA Technology [3]
The field-programmable gate array (FPGA) is a semiconductor device that can be programmed after manufacturing. Instead of being restricted to any predetermined
hardware function, an FPGA allows you to program product features and functions,
adapt to new standards, and reconfigure hardware for specific applications even
after the product has been installed in the field, hence the name
"field-programmable". You can use an FPGA to implement any logical function that an
application-specific integrated circuit (ASIC) could perform, but the ability to
update the functionality after shipping offers advantages for many applications.
Unlike previous generation FPGAs using I/Os with programmable logic and
interconnects, today's FPGAs consist of various mixes of configurable embedded
SRAM, high-speed transceivers, high-speed I/Os, logic blocks, and routing.
Specifically, an FPGA contains programmable logic components called logic
elements (LEs) and a hierarchy of reconfigurable interconnects that allow the LEs
to be physically connected. You can configure LEs to perform complex
combinational functions, or merely simple logic gates like AND and XOR. In most
FPGAs, the logic blocks also include memory elements, which may be simple
flip-flops or more complete blocks of memory.
As FPGAs continue to evolve, the devices have become more integrated. Hard
intellectual property (IP) blocks built into the FPGA fabric provide rich functions
while lowering power and cost and freeing up logic resources for product
differentiation. Newer FPGA families are being developed with hard embedded
processors, transforming the devices into systems on a chip (SoC). This level of
integration is what makes it possible to implement a speech recognizer on an FPGA.
IV.2. Speech Recognition Using HMM
Assume we need to build an isolated-word recognizer that can recognize W words.
This takes two steps:
1) Build the HMM parameters λw = (A, B, π)w for each word w. Each word model
is trained on L occurrences of the spoken word (spoken by one or more
talkers). We use the algorithms of Problem 3 to perform this task.
2) For each unknown word to be recognized, carry out the processing shown in
Figure IV.1. The Feature Extraction block produces the feature vectors, and
Vector Quantization maps each continuous feature vector to the nearest of
the finite set of vectors in the codebook, giving the observation sequence.
The codebook is created with the K-means algorithm. The probability of the
observation sequence is then computed for every word model (see Problem 1),
and the recognized word is the one whose model gives the highest probability.
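Step 2 can be illustrated with a minimal C sketch of the probability computation (the forward algorithm of Problem 1) followed by the selection of the most probable word. The type `Hmm`, the functions `hmm_forward` and `recognize_word`, and the fixed sizes `N_STATES` and `N_SYMBOLS` are hypothetical names chosen for this illustration; they are not taken from the project code.

```c
#include <assert.h>
#include <stddef.h>

#define N_STATES  2   /* hypothetical number of HMM states */
#define N_SYMBOLS 2   /* hypothetical codebook size */

/* Discrete HMM lambda = (A, B, pi); all sizes are illustrative. */
typedef struct {
    double A[N_STATES][N_STATES];   /* A[i][j]: transition i -> j */
    double B[N_STATES][N_SYMBOLS];  /* B[j][k]: P(symbol k | state j) */
    double pi[N_STATES];            /* initial state distribution */
} Hmm;

/* Forward algorithm: returns P(O | lambda) for obs[0..T-1].
   Each obs[t] must be a valid codebook index in [0, N_SYMBOLS). */
double hmm_forward(const Hmm *m, const int *obs, size_t T)
{
    double alpha[N_STATES], next[N_STATES];
    assert(m != NULL && obs != NULL && T > 0);

    /* Initialization: alpha_1(i) = pi_i * b_i(O_1) */
    for (int i = 0; i < N_STATES; i++)
        alpha[i] = m->pi[i] * m->B[i][obs[0]];

    /* Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] * b_j(O_{t+1}) */
    for (size_t t = 1; t < T; t++) {
        for (int j = 0; j < N_STATES; j++) {
            double sum = 0.0;
            for (int i = 0; i < N_STATES; i++)
                sum += alpha[i] * m->A[i][j];
            next[j] = sum * m->B[j][obs[t]];
        }
        for (int j = 0; j < N_STATES; j++)
            alpha[j] = next[j];
    }

    /* Termination: P(O | lambda) = sum_i alpha_T(i) */
    double p = 0.0;
    for (int i = 0; i < N_STATES; i++)
        p += alpha[i];
    return p;
}

/* Select Maximum: index of the model with the highest P(O | lambda_w). */
size_t recognize_word(const Hmm *models, size_t W, const int *obs, size_t T)
{
    size_t best = 0;
    double best_p = -1.0;
    assert(W > 0);
    for (size_t w = 0; w < W; w++) {
        double p = hmm_forward(&models[w], obs, T);
        if (p > best_p) { best_p = p; best = w; }
    }
    return best;
}
```

Calling `recognize_word(models, W, obs, T)` returns a zero-based word index. For long observation sequences the plain product of probabilities can underflow, so a practical version would normally scale the forward variables or work with logarithms.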
[Block diagram: the speech signal S(n) enters the Feature Extraction block, which
produces the feature vectors. Vector Quantization converts them into the
observation sequence O = {O1, O2, …, OT}. The sequence feeds W parallel
Probability Computation blocks, one per word model (λ1 for word 1, λ2 for word 2,
…, λW for word W), which output P(O | λ1), P(O | λ2), …, P(O | λW). A Select
Maximum block outputs the index of the recognized word,
i* = arg max_{1 ≤ i ≤ W} [P(O | λi)].]
Figure IV.1. Block diagram of an isolated word HMM recognizer
IV.3. SoPC-Based Speech Recognition
I decided to use a system-on-a-programmable-chip (SoPC) design on the FPGA for
speech recognition. A basic system consists of application programs running on a
customizable processor, together with custom digital hardware that can be added for
computationally intensive operations such as K-means, Viterbi decoding, etc. Using
a soft-core processor, I can implement and customize various interfaces, including
serial and parallel ones.
The Altera DE2 development board, the Quartus II software, SoPC Builder, and the
Nios II Integrated Development Environment (IDE) are used in this project to develop
the SoPC design. I perform hardware design and simulation in the Quartus II
software and use SoPC Builder to instantiate the readily available components. With
the Nios II IDE, I created the application software for the Nios II processor. SoPC
Builder's interface, together with the Nios II hardware abstraction layer (HAL),
makes the Nios II processor and an FPGA an ideal platform for implementing my
on-line speech recognizer.
Figure IV.2 presents the SoPC system for isolated word recognition on the DE2
board in this project, and Figure IV.3 shows the system in SoPC Builder.
The Nios II processor (a soft processor) and the interfaces needed to connect to the
other chips on the DE2 board are implemented in the Cyclone II FPGA chip. These
components are interconnected by the interconnection network called the Avalon
Switch Fabric. The memory blocks in the Cyclone II device can be used to provide
on-chip memory for the Nios II processor. The SRAM, SDRAM, Flash memory,
analog-to-digital converter (ADC) chip, and small LCD on the DE2 board are accessed
or controlled through the appropriate interfaces. Parallel and serial input/output
interfaces provide the typical I/O ports of a computer system. A special JTAG UART
interface connects to the circuitry that provides a Universal Serial Bus (USB) link
to the host computer to which the DE2 board is connected. All parts of the Nios II
system implemented on the FPGA chip are defined in a hardware description language
(I used Verilog).
The WM8731 ADC chip provides stereo line-level and mono microphone-level audio
inputs. It uses stereo 24-bit multi-bit sigma-delta ADCs and DACs with
oversampling digital interpolation and decimation filters. It supports digital audio
input word lengths from 16 to 32 bits and sampling rates from 8 kHz to 96 kHz.
The LCD is a 16x2 character display used to show the processing results.
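Software that reads 24-bit samples usually has to sign-extend them before any arithmetic. The sketch below shows one way to do this in C. It assumes that each sample arrives as a 24-bit two's-complement value in the low 24 bits of a 32-bit word; the text does not describe the data layout of the ADC interface, so this layout and the function names `sample24_to_int32` and `sample24_to_double` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 24-bit two's-complement sample held in the low 24 bits
   of a 32-bit word (assumed layout; not confirmed by the source). */
int32_t sample24_to_int32(uint32_t raw)
{
    raw &= 0x00FFFFFFu;               /* keep only the 24 data bits */
    if (raw & 0x00800000u)            /* sign bit set: negative value */
        return (int32_t)raw - 0x01000000;
    return (int32_t)raw;
}

/* Scale to the range [-1.0, 1.0) for later floating-point processing. */
double sample24_to_double(uint32_t raw)
{
    double value = (double)sample24_to_int32(raw) / 8388608.0;  /* 2^23 */
    assert(value >= -1.0 && value < 1.0);
    return value;
}
```

If the interface were configured for 16-bit words instead, the same pattern would apply with the mask, sign bit, and divisor adjusted accordingly.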
Figure IV.2. The Nios system for speech recognition on DE2 board
[Block diagram: inside the Cyclone II FPGA chip, the Nios II processor with its JTAG
debug module, the JTAG UART interface, the 32KB on-chip memory, and the SRAM, SDRAM,
Flash memory, LCD, and ADC interfaces are connected through the Avalon switch
fabric. Outside the FPGA, these interfaces connect to the 512KB SRAM, the 8MB SDRAM,
the 4MB Flash ROM, the 16x2 character display LCD, and the 24-bit WM8731 ADC chip.
The USB-Blaster interface links the JTAG blocks to the host computer.]
Figure IV.3. The Nios system on SoPC Builder 9.0.
IV.4. Implementing Isolated Speech Recognition on the SoPC System
Based on the SoPC system designed above, I wrote C code to implement the HMM
recognition. The flow chart of the main program is presented in Figure IV.4 below.
The program has two main modules: Training and Recognition.
Training
The training block trains the HMM for each word in the vocabulary. The feature
vectors of the speech samples for each word are compared with the codebook, and
the indices of their nearest codebook vectors are sent to the training algorithm
to train a model for each word.
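The comparison with the codebook can be sketched in C as a nearest-neighbour search. This is only an illustration: the function names `vq_nearest` and `vq_encode`, the row-major array layout, and the squared Euclidean distance are assumptions, since the text does not state which distance measure the project uses.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Return the index of the codebook vector closest to x.
   codebook holds K vectors of dim values each, stored row-major.
   Distance is squared Euclidean (an assumed choice); ties go to
   the lower index. */
size_t vq_nearest(const double *codebook, size_t K, size_t dim,
                  const double *x)
{
    size_t best = 0;
    double best_d = DBL_MAX;
    assert(K > 0 && dim > 0);
    for (size_t k = 0; k < K; k++) {
        double d = 0.0;
        for (size_t i = 0; i < dim; i++) {
            double diff = x[i] - codebook[k * dim + i];
            d += diff * diff;
        }
        if (d < best_d) { best_d = d; best = k; }
    }
    return best;
}

/* Quantize T frames (row-major, dim values each) into an observation
   sequence of codebook indices, written to obs[0..T-1]. */
void vq_encode(const double *codebook, size_t K, size_t dim,
               const double *frames, size_t T, int *obs)
{
    for (size_t t = 0; t < T; t++)
        obs[t] = (int)vq_nearest(codebook, K, dim, &frames[t * dim]);
}
```

The resulting `obs` array is the discrete observation sequence that a training or scoring routine for a discrete HMM would take as input.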
Recognition
This block recognizes an unknown word using maximum likelihood estimation. The
feature vectors of the speech sample are extracted. Then the nearest codebook vector
index for each frame is sent to the word models. The system chooses the model that
has the maximum likelihood.
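One common way to score a word model on a small processor is the Viterbi algorithm in the log domain, which needs only additions and comparisons once the model parameters are stored as logarithms. The text names Viterbi decoding among the computationally intensive operations but does not show how it is coded, so the following is a sketch under that assumption and not the project's implementation. The name `viterbi_log_score`, the bound `MAX_STATES`, and the row-major layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STATES 16  /* hypothetical upper bound on the number of states */

/* Log-domain Viterbi score: the log probability of the single best state
   path for obs[0..T-1]. log_A is N x N and log_B is N x M, row-major.
   All inputs are logarithms of the model probabilities (any fixed base);
   a zero probability can be represented by a large negative value.
   Each obs[t] must be a valid symbol index in [0, M). */
double viterbi_log_score(const double *log_A, const double *log_B,
                         const double *log_pi, size_t N, size_t M,
                         const int *obs, size_t T)
{
    double delta[MAX_STATES], next[MAX_STATES];
    assert(N > 0 && N <= MAX_STATES && M > 0 && T > 0);

    /* Initialization with the first observation. */
    for (size_t j = 0; j < N; j++)
        delta[j] = log_pi[j] + log_B[j * M + (size_t)obs[0]];

    /* Recursion: keep only the best predecessor for each state. */
    for (size_t t = 1; t < T; t++) {
        for (size_t j = 0; j < N; j++) {
            double best = delta[0] + log_A[0 * N + j];
            for (size_t i = 1; i < N; i++) {
                double cand = delta[i] + log_A[i * N + j];
                if (cand > best) best = cand;
            }
            next[j] = best + log_B[j * M + (size_t)obs[t]];
        }
        for (size_t j = 0; j < N; j++)
            delta[j] = next[j];
    }

    /* Termination: best final state. */
    double score = delta[0];
    for (size_t j = 1; j < N; j++)
        if (delta[j] > score) score = delta[j];
    return score;
}
```

The recognized word is then the model with the largest score. The Viterbi score approximates the full likelihood by its best path, which avoids the numerical underflow that long products of probabilities can cause.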
[Flow chart: after Start, the program branches on Recognition / Training.
Recognition branch: sample speech input to be recognized; preprocessing; find the
index of the nearest codebook vector for each frame of the input speech (using the
codebook input); find the probability of the input being word w = 1 to W (using the
trained models λ for all words); find the model with the maximum probability and the
corresponding word; output the recognized word.
Training branch: speech samples input for each word; preprocessing; make a codebook
of size K (using K-means); find the index of the nearest codebook vector for each
frame of the input speech; train the HMM parameters (A, B, π) for each word; save
the model (A, B, π) for each word to Flash ROM.]
Figure IV.4. Main HMM recognition flow chart
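The codebook step of the training branch in Figure IV.4 can be sketched as a plain K-means loop in C. This is a minimal illustration and not the project code. The function name `kmeans_codebook`, the limits `MAX_K` and `MAX_DIM`, the fixed iteration count, and the choice of the first K training vectors as initial centroids are all assumptions.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

#define MAX_K   64  /* hypothetical limits so no heap allocation is needed */
#define MAX_DIM 16

/* Train a codebook of K vectors from n training vectors (row-major, dim
   values each) with plain K-means. Initial centroids are the first K
   training vectors, which is an illustrative choice only. The static
   work buffer keeps the stack small but makes the function non-reentrant. */
void kmeans_codebook(const double *data, size_t n, size_t dim,
                     double *codebook, size_t K, int iterations)
{
    static double sum[MAX_K][MAX_DIM];
    size_t count[MAX_K];
    assert(K > 0 && K <= MAX_K && dim > 0 && dim <= MAX_DIM && n >= K);

    for (size_t i = 0; i < K * dim; i++)
        codebook[i] = data[i];

    for (int it = 0; it < iterations; it++) {
        for (size_t k = 0; k < K; k++) {
            count[k] = 0;
            for (size_t d = 0; d < dim; d++)
                sum[k][d] = 0.0;
        }
        /* Assignment step: attach each vector to its nearest centroid. */
        for (size_t v = 0; v < n; v++) {
            size_t best = 0;
            double best_d = DBL_MAX;
            for (size_t k = 0; k < K; k++) {
                double dist = 0.0;
                for (size_t d = 0; d < dim; d++) {
                    double diff = data[v * dim + d] - codebook[k * dim + d];
                    dist += diff * diff;
                }
                if (dist < best_d) { best_d = dist; best = k; }
            }
            count[best]++;
            for (size_t d = 0; d < dim; d++)
                sum[best][d] += data[v * dim + d];
        }
        /* Update step: move each non-empty centroid to its cluster mean. */
        for (size_t k = 0; k < K; k++)
            if (count[k] > 0)
                for (size_t d = 0; d < dim; d++)
                    codebook[k * dim + d] = sum[k][d] / (double)count[k];
    }
}
```

A production version would typically stop when the centroids no longer move instead of running a fixed number of iterations, and would choose the initial centroids more carefully.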