03_implementing the hmm based recognition on fpga

Upload: lotfi-grine

Post on 13-Apr-2015

Page 1: 03_Implementing the HMM Based Recognition on FPGA

Chapter IV

Implementing the HMM-Based Recognition on FPGA

This chapter describes a speech recognizer based on hidden Markov models (HMMs) and how to use the tools to implement it as an SoPC system on an FPGA board. The board is the DE2, a very basic FPGA board.

IV.1. Introduction to FPGA Technology [3]

The field-programmable gate array (FPGA) is a semiconductor device that can be programmed after manufacturing. Instead of being restricted to any predetermined hardware function, an FPGA allows you to program product features and functions, adapt to new standards, and reconfigure hardware for specific applications even after the product has been installed in the field, hence the name "field-programmable". You can use an FPGA to implement any logical function that an application-specific integrated circuit (ASIC) could perform, but the ability to update the functionality after shipping offers advantages for many applications.

Unlike previous-generation FPGAs, which combined I/Os with programmable logic and interconnects, today's FPGAs consist of various mixes of configurable embedded SRAM, high-speed transceivers, high-speed I/Os, logic blocks, and routing. Specifically, an FPGA contains programmable logic components called logic elements (LEs) and a hierarchy of reconfigurable interconnects that allow the LEs to be physically connected. You can configure LEs to perform complex combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory.

As FPGAs continue to evolve, the devices have become more integrated. Hard intellectual property (IP) blocks built into the FPGA fabric provide rich functions while lowering power and cost and freeing up logic resources for product differentiation. Newer FPGA families are being developed with hard embedded processors, transforming the devices into systems on a chip (SoC). This level of integration is why a speech recognizer can be implemented on an FPGA.

IV.2. Speech Recognition Using HMM

Assume we need to build an isolated-word recognizer that can recognize W words. This takes two steps:

1) Build the HMM parameters λw = (A, B, π)w for each word. Each word model can be trained from L occurrences of the spoken word (spoken by one or more talkers). We use the algorithms for Problem 3 to perform this task.

2) For each unknown word to be recognized, carry out the processing shown in Figure IV.1. The feature vectors are obtained from the Feature Extraction block, followed by Vector Quantization, which maps each continuous feature vector to a member of the finite set of vectors in the codebook and so produces the observation sequence. The codebook is created with the K-means algorithm. Then the probability of the observation sequence is computed for every word model (see Problem 1), and the recognized word is the one whose model gives the highest probability.
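The vector quantization step can be illustrated with a minimal C sketch. It assumes squared Euclidean distance and a codebook stored as a flat row-major array; the function names `quantize_frame` and `quantize_sequence` and the array layout are illustrative assumptions, not taken from the project code.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Return the index of the codeword nearest to `frame`.
 * `codebook` holds codebook_size vectors of length `dim`, row-major
 * (hypothetical layout chosen for this sketch). */
size_t quantize_frame(const double *frame, const double *codebook,
                      size_t codebook_size, size_t dim)
{
    assert(codebook_size > 0);
    size_t best = 0;
    double best_dist = DBL_MAX;

    for (size_t k = 0; k < codebook_size; k++) {
        /* Squared Euclidean distance to codeword k. */
        double dist = 0.0;
        for (size_t d = 0; d < dim; d++) {
            double diff = frame[d] - codebook[k * dim + d];
            dist += diff * diff;
        }
        if (dist < best_dist) {
            best_dist = dist;
            best = k;
        }
    }
    return best;
}

/* Map T feature vectors to an observation sequence of codeword indices. */
void quantize_sequence(const double *frames, size_t T, size_t dim,
                       const double *codebook, size_t codebook_size,
                       size_t *observations)
{
    for (size_t t = 0; t < T; t++)
        observations[t] = quantize_frame(&frames[t * dim], codebook,
                                         codebook_size, dim);
}
```

Each frame is reduced to a single integer, so the word models only need discrete emission tables indexed by codeword.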

[Figure IV.1: the speech signal S(n) passes through Feature Extraction and Vector Quantization (labels: feature vectors {O = O1, O2, …, OT}; observation sequence). Probability Computation blocks for the HMMs of word 1 (λ1), word 2 (λ2), …, word W (λW) output P(O, λ1), P(O, λ2), …, P(O, λW). A Select Maximum block outputs the index of the recognized word, $i^* = \arg\max_{1 \le i \le W} [P(O \mid \lambda_i)]$.]

Figure IV.1. Block diagram of an isolated word HMM recognizer
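The probability computation and Select Maximum blocks can be sketched in C as follows. This is a minimal sketch of the standard forward algorithm for a discrete HMM, not the project's code: the `Hmm` struct, the sizes `N_STATES` and `N_SYMBOLS`, and the function names are assumptions made for illustration. The probabilities are left unscaled, so a real implementation would need scaling or log arithmetic to avoid underflow on long observation sequences.

```c
#include <assert.h>
#include <stddef.h>

#define N_STATES  2   /* hypothetical number of HMM states */
#define N_SYMBOLS 2   /* hypothetical codebook size */

typedef struct {
    double A[N_STATES][N_STATES];   /* state transition probabilities */
    double B[N_STATES][N_SYMBOLS];  /* emission probabilities per codeword */
    double pi[N_STATES];            /* initial state probabilities */
} Hmm;

/* Forward algorithm: returns P(O | model) for observations obs[0..T-1]. */
double forward_probability(const Hmm *m, const size_t *obs, size_t T)
{
    assert(T > 0);
    double alpha[N_STATES], next[N_STATES];

    /* Initialization: alpha_1(j) = pi_j * b_j(O_1). */
    for (size_t j = 0; j < N_STATES; j++)
        alpha[j] = m->pi[j] * m->B[j][obs[0]];

    /* Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(O_{t+1}). */
    for (size_t t = 1; t < T; t++) {
        for (size_t j = 0; j < N_STATES; j++) {
            double sum = 0.0;
            for (size_t i = 0; i < N_STATES; i++)
                sum += alpha[i] * m->A[i][j];
            next[j] = sum * m->B[j][obs[t]];
        }
        for (size_t j = 0; j < N_STATES; j++)
            alpha[j] = next[j];
    }

    /* Termination: sum over the final forward variables. */
    double prob = 0.0;
    for (size_t j = 0; j < N_STATES; j++)
        prob += alpha[j];
    return prob;
}

/* Select Maximum: index of the word model with the highest probability. */
size_t recognize(const Hmm *models, size_t n_words,
                 const size_t *obs, size_t T)
{
    assert(n_words > 0);
    size_t best = 0;
    double best_prob = forward_probability(&models[0], obs, T);

    for (size_t w = 1; w < n_words; w++) {
        double p = forward_probability(&models[w], obs, T);
        if (p > best_prob) {
            best_prob = p;
            best = w;
        }
    }
    return best;
}
```

The cost per word model is on the order of N² T multiplications, which is why the probability computation is a natural candidate for hardware acceleration.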

IV.3. SoPC-Based Speech Recognition

I decided to use a system on a programmable chip (SoPC) on an FPGA for the speech recognition. A basic system requires application programs running on a customizable processor, which can be extended with custom digital hardware for computationally intensive operations such as K-means, Viterbi decoding, etc. Using a soft-core processor, I can implement and customize various interfaces, including serial, parallel,…

The Altera DE2 development board, the Quartus II software, SoPC Builder, and the Nios II Integrated Development Environment (IDE) are used in this project to develop an SoPC design. I can perform hardware design and simulation using the Quartus II software and use SoPC Builder to create the system from readily available components. With the Nios II IDE, I created application software for the Nios II processor. SoPC Builder's interface and the Nios II hardware abstraction layer (HAL) make the Nios II processor and an FPGA the ideal platform for implementing my on-line speech recognizer.

Figure IV.2 presents the SoPC system for isolated word recognition in this project on the DE2 board, and Figure IV.3 shows the system in SoPC Builder.

The Nios II processor (a soft processor) and the interfaces needed to connect to other chips on the DE2 board are implemented in the Cyclone II FPGA chip. These components are interconnected by means of the interconnection network called the Avalon Switch Fabric. The memory blocks in the Cyclone II device can be used to provide memory for the Nios II processor. The SRAM, SDRAM, Flash memory, analog-to-digital converter (ADC) chip, and small LCD on the DE2 board are accessed or controlled through the appropriate interfaces. Parallel and serial input/output interfaces provide the I/O ports typical of computer systems. A special JTAG UART interface is used to connect to circuitry that provides a Universal Serial Bus (USB) link to the host computer to which the DE2 board is connected. All parts of the Nios system implemented on the FPGA chip are defined by using a hardware description language (I used Verilog).

The WM8731 ADC chip provides stereo line-level and mono microphone-level audio inputs. It uses stereo 24-bit multi-bit sigma-delta ADCs and DACs with oversampling digital interpolation and decimation filters. Digital audio input word lengths from 16 to 32 bits and sampling rates from 8 kHz to 96 kHz are supported.

The LCD is a 16x2 character display that shows the processing results.

Figure IV.2. The Nios system for speech recognition on DE2 board

[Figure IV.2: inside the Cyclone II FPGA chip, the Avalon switch fabric connects the Nios II processor, the JTAG debug module, the JTAG UART interface, the 32 KB on-chip memory, and the SRAM, SDRAM, Flash memory, LCD, and ADC interfaces. Outside the FPGA: the host computer (through the USB-Blaster interface), 512 KB SRAM, 8 MB SDRAM, 4 MB Flash ROM, the 16x2 character LCD, and the 24-bit WM8731 ADC chip.]

Figure IV.3. The Nios system on SoPC Builder 9.0.

IV.4. Implementing Isolated Speech Recognition on the SoPC System

Based on the SoPC system designed above, I wrote C code to implement the HMM recognition. The flow chart of the main program is presented in Figure IV.4 below.

There are two main modules in this program: Training and Recognition.

Training

The training block trains the HMM for each word in the vocabulary. The feature vectors of the speech samples for each word are compared with the codebook, and the indices of their nearest codebook vectors are sent to the training algorithm to train a model for each word.
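The codebook used here has to be built before any word model is trained. A minimal C sketch of K-means codebook training (Lloyd iterations) is shown below. It assumes fixed-size vectors, an initial codebook supplied by the caller, and a stop rule based on total distortion; `DIM`, `MAX_K`, and the function names are hypothetical and not taken from the project code.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

#define DIM   2    /* hypothetical feature-vector length */
#define MAX_K 64   /* hypothetical upper bound on codebook size */

/* One Lloyd iteration: assign every vector to its nearest codeword, then
 * move each codeword to the mean of the vectors assigned to it.
 * Returns the total squared distortion measured during assignment. */
double kmeans_iteration(const double *vectors, size_t n_vectors,
                        double *codebook, size_t K)
{
    assert(K > 0 && K <= MAX_K);
    double sums[MAX_K][DIM] = {{0.0}};
    size_t counts[MAX_K] = {0};
    double distortion = 0.0;

    for (size_t n = 0; n < n_vectors; n++) {
        const double *v = &vectors[n * DIM];
        size_t best = 0;
        double best_dist = DBL_MAX;
        for (size_t k = 0; k < K; k++) {
            double dist = 0.0;
            for (size_t d = 0; d < DIM; d++) {
                double diff = v[d] - codebook[k * DIM + d];
                dist += diff * diff;
            }
            if (dist < best_dist) {
                best_dist = dist;
                best = k;
            }
        }
        for (size_t d = 0; d < DIM; d++)
            sums[best][d] += v[d];
        counts[best]++;
        distortion += best_dist;
    }

    /* Empty clusters keep their previous codeword. */
    for (size_t k = 0; k < K; k++)
        if (counts[k] > 0)
            for (size_t d = 0; d < DIM; d++)
                codebook[k * DIM + d] = sums[k][d] / (double)counts[k];

    return distortion;
}

/* Repeat until the distortion stops decreasing or max_iters is reached. */
void train_codebook(const double *vectors, size_t n_vectors,
                    double *codebook, size_t K, size_t max_iters)
{
    double prev = DBL_MAX;
    for (size_t it = 0; it < max_iters; it++) {
        double cur = kmeans_iteration(vectors, n_vectors, codebook, K);
        if (cur >= prev)
            break;
        prev = cur;
    }
}
```

How the initial codewords are chosen is left open here; the result of K-means depends on that choice.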

Recognition

This block recognizes an unknown word using maximum likelihood estimation. The feature vectors of the speech sample are extracted. Then the nearest codebook vector index for each frame is sent to the word models. The system chooses the model that has the maximum likelihood.
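Section IV.3 names Viterbi decoding as one of the computationally intensive operations, but the text does not state how the likelihood of each model is scored. One possibility, sketched below under that assumption, is a log-domain Viterbi score: with the logarithms of A, B, and π precomputed (for example when the model is saved), scoring needs only additions and comparisons. The `LogHmm` struct, the sizes, and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define N_STATES  2   /* hypothetical number of HMM states */
#define N_SYMBOLS 2   /* hypothetical codebook size */

/* Model with precomputed log-probabilities (assumption of this sketch).
 * A zero probability can be stored as a large negative constant. */
typedef struct {
    double logA[N_STATES][N_STATES];
    double logB[N_STATES][N_SYMBOLS];
    double logPi[N_STATES];
} LogHmm;

/* Log-probability of the single best state path for obs[0..T-1]. */
double viterbi_log_score(const LogHmm *m, const size_t *obs, size_t T)
{
    assert(T > 0);
    double delta[N_STATES], next[N_STATES];

    /* Initialization: delta_1(j) = log pi_j + log b_j(O_1). */
    for (size_t j = 0; j < N_STATES; j++)
        delta[j] = m->logPi[j] + m->logB[j][obs[0]];

    /* Recursion: keep only the best predecessor for each state. */
    for (size_t t = 1; t < T; t++) {
        for (size_t j = 0; j < N_STATES; j++) {
            double best = delta[0] + m->logA[0][j];
            for (size_t i = 1; i < N_STATES; i++) {
                double cand = delta[i] + m->logA[i][j];
                if (cand > best)
                    best = cand;
            }
            next[j] = best + m->logB[j][obs[t]];
        }
        for (size_t j = 0; j < N_STATES; j++)
            delta[j] = next[j];
    }

    /* Termination: best score over the final states. */
    double score = delta[0];
    for (size_t j = 1; j < N_STATES; j++)
        if (delta[j] > score)
            score = delta[j];
    return score;
}
```

The recognized word would then be the one whose model returns the largest score. Working in the log domain also avoids the numerical underflow that products of many small probabilities cause.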

[Figure IV.4: from Start, a Recognition / Training decision selects one of two branches.

Recognition branch: sample speech input to be recognized → preprocessing → find the index of the nearest codebook vector for each frame of the input speech (codebook input) → find the probability of the input being word w = 1 to W (trained models λ for all words) → find the model with the maximum probability and the corresponding word → recognized word.

Training branch: speech samples input for each word → preprocessing → make a codebook of size K (using K-means) → find the index of the nearest codebook vector for each frame of the input speech → train the HMM parameters (A, B, π) for each word → save the model (A, B, π) for each word to Flash ROM.]

Figure IV.4. Main HMM recognition flow chart