Report 123
7/21/2019 Report 123
http://slidepdf.com/reader/full/report-123-56d9d3a504823 1/47
1. INTRODUCTION
Automatic Speech Recognition
Automatic speech recognition is the process by which a computer maps an acoustic
speech signal to text. Automatic speech understanding is the process by which a computer
maps an acoustic speech signal to some form of abstract meaning of the speech.
What does speaker dependent / adaptive / independent mean?
A speaker dependent system is developed to operate for a single speaker. These systems
are usually easier to develop, cheaper to buy and more accurate, but not as flexible as
speaker adaptive or speaker independent systems.
A speaker independent system is developed to operate for any speaker of a particular type
(e.g. American English). These systems are the most difficult to develop, the most
expensive, and their accuracy is lower than that of speaker dependent systems. However,
they are more flexible.
A speaker adaptive system is developed to adapt its operation to the characteristics of
new speakers. Its difficulty lies somewhere between that of speaker independent and
speaker dependent systems.
What does continuous speech or isolated-word mean?
An isolated-word system operates on single words at a time, requiring a pause between
saying each word. This is the simplest form of recognition to perform because the end
points are easier to find and the pronunciation of a word tends not to affect others. Thus,
because the occurrences of words are more consistent, they are easier to recognize.
A continuous speech system operates on speech in which words are connected together,
i.e. not separated by pauses. Continuous speech is more difficult to handle because of a
variety of effects. First, it is difficult to find the start and end points of words. Another
problem is "coarticulation". The production of each phoneme is affected by the
production of surrounding phonemes, and similarly the start and end of words are
affected by the preceding and following words. The recognition of continuous speech is
also affected by the rate of speech (fast speech tends to be harder).
How is speech recognition performed?
A wide variety of techniques are used to perform speech recognition, and there are many
levels at which speech can be analysed and understood.
Typically speech recognition starts with the digital sampling of speech. The next stage is
acoustic signal processing. Most techniques include some form of spectral analysis, e.g.
LPC analysis (Linear Predictive Coding), MFCC (Mel Frequency Cepstral Coefficients),
cochlea modelling and many more.
The next stage is recognition of phonemes, groups of phonemes and words. This stage
can be achieved by many processes such as DTW (Dynamic Time Warping), HMM
(hidden Markov modelling), NNs (Neural Networks), expert systems and combinations
of techniques. HMM-based systems are currently the most commonly used and most
successful approach. Most systems utilize some knowledge of the language to aid the
recognition process.
Some systems try to "understand" speech. That is, they try to convert the words into a
representation of what the speaker intended to mean or achieve by what they said.
This is a simple recognizer that should give you 85%+ recognition accuracy. The
accuracy is a function of the words you have in your vocabulary. Long distinct words are
easy. Short similar words are hard. You can get 98%+ on the digits with this recognizer.
Overview:
Find the beginning and end of the utterance.
Filter the raw signal into frequency bands.
Cut the utterance into a fixed number of segments.
Average data for each band in each segment.
Store this pattern with its name.
Collect a training set of about 3 repetitions of each pattern (word).
Recognize an unknown by comparing its pattern against all patterns in the training set
and returning the name of the pattern closest to the unknown. Many variations upon the
theme can be made to improve the performance. Try different filtering of the raw signal
and different processing methods.
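The overview above amounts to nearest-template classification. As a rough sketch of the idea (in Python purely for illustration; the words and band-energy patterns below are invented stand-ins for the real averaged filter-bank outputs):

```python
import math

def distance(a, b):
    # Euclidean distance between two equal-length feature patterns
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(unknown, templates):
    # Return the name of the stored pattern closest to the unknown
    return min(templates, key=lambda name: distance(unknown, templates[name]))

# Toy training set: one averaged band-energy pattern per word
templates = {
    "yes": [0.9, 0.2, 0.1, 0.7],
    "no":  [0.1, 0.8, 0.6, 0.2],
}
print(recognize([0.8, 0.3, 0.2, 0.6], templates))  # closest to "yes"
```

With several stored repetitions per word, the same rule simply compares against all of them and keeps the single best match.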
Automatic speech recognition and speaker verification are among the most challenging
problems of modern man-machine interaction. Among their numerous useful
applications is a future "checkless" society in which all financial transactions are
executed over the telephone and "signed" by voice. Access to confidential data can be
made secure by speaker verification. Other applications include voice information and
reservation systems covering a wide spectrum of human activities, from travel and study
to purchasing and partner matching. In these applications, spoken requests (over the
telephone, say) are understood by machines and answered by synthesized voice. Voice
control of computers and spacecraft (and machines in general whose operators have
limited use of their hands) is an aspiration of long standing. Activation by voice could be
particularly beneficial for the severely handicapped who have lost one or several limbs.
The surgeon in the middle of an operation, needing the latest medical information, is
another instance where only the acoustic channels are still fully available for requesting
and receiving the urgently required advice. The sending of "manuscripts" by voice may
supplement much present paper pushing or mouse play at the graphics terminals.
The potential applications of speech and speaker recognition are boundless. As early as
1944, speaker identification was used successfully by the Allies to trace the movements
of German combat units by analyzing speech spectrograms of enemy voice traffic.
Remarkably, the human ear is often able to identify a telephone caller on the basis of a
simple "hello" or just the clearing of his throat. But the difficulties of recognition by
machine can be staggering. Even if we forego automatic accent classification, and
especially if we persuade the banks to live with less than perfection in voice signatures
(which they really do not need, considering the large number of unsigned or falsely
signed checks that clear the system every day), reliable voice recognition from large
pools of potential speakers on the basis of their speech
alone will remain problematic for years to come. And, as widely appreciated by now, the
automatic recognition of anything but isolated words from a limited vocabulary spoken
by known speakers presents formidable difficulties. Decades of painstaking (and painful)
research have shown that purely technical advances will yield, at best, limited
improvements, far short of what is child's play for the human mind.
2. Principles of Speaker Recognition
Speaker recognition can be classified into identification and verification. Speaker
identification is the process of determining which registered speaker provides a given
utterance. Speaker verification, on the other hand, is the process of accepting or rejecting
the identity claim of a speaker. Figure 1 shows the basic structures of speaker
identification and verification systems.
Speaker recognition methods can also be divided into text-independent and text-
dependent methods. In a text-independent system, speaker models capture characteristics
of somebody's speech which show up irrespective of what one is saying. In a text-
dependent system, on the other hand, the recognition of the speaker's identity is based on
his or her speaking one or more specific phrases, like passwords, card numbers, PIN
codes, etc. All technologies of speaker recognition, identification and verification, text-
independent and text-dependent, have their own advantages and disadvantages and may
require different treatments and techniques. The choice of which technology to use is
application-specific.
At the highest level, all speaker recognition systems contain two main modules (refer to
Figure 1): feature extraction and feature matching. Feature extraction is the process that
extracts a small amount of data from the voice signal that can later be used to represent
each speaker. Feature matching involves the actual procedure to identify the unknown
speaker by comparing extracted features from his/her voice input with the ones from a set
of known speakers. We will discuss each module in detail in later sections.
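Feature matching reduces both tasks to a comparison of match scores. The sketch below (Python used purely for illustration; the speaker names, scores and threshold are invented, and a lower score means a better match) shows the decision rule that separates identification from verification:

```python
def identify(scores):
    # Identification: pick the registered speaker whose model
    # matches the utterance best (lowest distortion score).
    return min(scores, key=scores.get)

def verify(scores, claimed_id, threshold):
    # Verification: accept the identity claim only if the claimed
    # speaker's model matches well enough (score below a threshold).
    return scores[claimed_id] < threshold

scores = {"alice": 3.2, "bob": 1.4, "carol": 5.0}  # hypothetical distortions
print(identify(scores))              # "bob"
print(verify(scores, "alice", 2.0))  # False: the claim is rejected
```

Note that verification needs the speaker-specific threshold mentioned below, while identification only needs a ranking of the registered models.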
All speaker recognition systems have to serve two distinct phases. The first one is
referred to as the enrollment or training phase, while the second one is referred to as the
operation or testing phase. In the training phase, each registered speaker has to provide
samples of their speech so that the system can build or train a reference model for that
speaker. In the case of speaker verification systems, a speaker-specific threshold is also
computed from the training samples. During the testing (operational) phase (see Figure
1), the input speech is matched with the stored reference model(s) and a recognition
decision is made.
Speaker recognition is a difficult task and it is still an active research area. Automatic
speaker recognition works on the premise that a person's speech exhibits characteristics
that are unique to the speaker. However, this task has been challenged by the high
variability of input speech signals. The principal source of variance is the speakers
themselves. Speech signals in training and testing sessions can differ greatly due to many
factors, such as voices changing with time, health conditions (e.g.
the speaker has a cold), speaking rates, etc. There are also other factors, beyond speaker
variability, that present a challenge to speaker recognition technology. Examples of these
are acoustical noise and variations in recording environments.
3. Speech Feature Extraction
The purpose of this module is to convert the speech waveform to some type of parametric
representation (at a considerably lower information rate) for further analysis and
processing. This is often referred to as the signal-processing front end.
The speech signal is a slowly time-varying signal (it is called quasi-stationary). An
example of a speech signal is shown in Figure 2. When examined over a sufficiently short
period of time (between 5 and 100 msec), its characteristics are fairly stationary.
However, over long periods of time (on the order of 1/5 seconds or more) the signal
characteristics change to reflect the different speech sounds being spoken. Therefore,
short-time spectral analysis is the most common way to characterize the speech signal.
A wide range of possibilities exist for parametrically representing the speech signal for
the speaker recognition task, such as Linear Prediction Coding (LPC), Mel-Frequency
Cepstrum Coefficients (MFCC), and others. MFCC is perhaps the best known and most
popular, and it will be used in this project.
Figure 2. An example of speech signal
MFCCs are based on the known variation of the human ear's critical bandwidths with
frequency; filters spaced linearly at low frequencies and logarithmically at high
frequencies have been used to capture the phonetically important characteristics of
speech. This is expressed in the mel-frequency scale, which is a linear frequency spacing
below 1000 Hz and a logarithmic spacing above 1000 Hz. The process of computing
MFCCs is described in more detail next.
4. Mel-frequency cepstrum coefficients processor
A block diagram of the structure of an MFCC processor is given in Figure 3. The speech
input is typically recorded at a sampling rate above 10000 Hz. This sampling frequency
was chosen to minimize the effects of aliasing in the analog-to-digital conversion. Such
sampled signals can capture all frequencies up to 5 kHz, which cover most of the energy
of sounds generated by humans. As discussed previously, the main purpose of the MFCC
processor is to mimic the behavior of the human ears. In addition, MFCCs are shown to
be less susceptible to the variations mentioned above than the speech waveforms
themselves.
Figure 3. Block diagram of the MFCC processor
4.1 Frame Blocking
In this step the continuous speech signal is blocked into frames of N samples, with
adjacent frames being separated by M (M < N). The first frame consists of the first N
samples. The second frame begins M samples after the first frame, and overlaps it by
N - M samples. Similarly, the third frame begins 2M samples after the first frame (or M
samples after the second frame) and overlaps it by N - 2M samples. This process
continues until all the speech is accounted for within one or more frames. Typical values
for N and M are N = 256 (which is equivalent to ~30 msec windowing and facilitates the
fast radix-2 FFT) and M = 100.
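The frame blocking step described above can be sketched as follows (in Python purely for illustration; the project itself does this in MATLAB, and the 1000-sample signal is an arbitrary example):

```python
def frame_block(signal, N, M):
    # Split a signal into overlapping frames of N samples, with
    # consecutive frames starting M samples apart (M < N), so each
    # frame overlaps the next by N - M samples.
    frames = []
    start = 0
    while start + N <= len(signal):
        frames.append(signal[start:start + N])
        start += M
    return frames

x = list(range(1000))                  # stand-in for 1000 speech samples
frames = frame_block(x, N=256, M=100)
# Frame 2 starts 100 samples after frame 1 and shares 156 samples with it.
```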
4.2 Windowing
The next step in the processing is to window each individual frame so as to minimize the
signal discontinuities at the beginning and end of each frame. The concept here is to
minimize the spectral distortion by using the window to taper the signal to zero at the
beginning and end of each frame. If we define the window as w(n), 0 <= n <= N-1, where
N is the number of samples in each frame, then the result of windowing is the signal

    y(n) = x(n) * w(n),    0 <= n <= N-1

Typically the Hamming window is used, which has the form:

    w(n) = 0.54 - 0.46 * cos(2*pi*n / (N-1)),    0 <= n <= N-1
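A minimal sketch of the Hamming window and its application to one frame (Python for illustration only; MATLAB's hamming function, described in the appendix, produces the same symmetric window):

```python
import math

def hamming(N):
    # Symmetric Hamming window: w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def window_frame(frame):
    # Multiply a frame sample-by-sample with the window
    w = hamming(len(frame))
    return [x * wn for x, wn in zip(frame, w)]

w = hamming(256)
# The window is 0.08 at both ends and close to 1.0 in the middle, so each
# frame fades in and out instead of being cut off abruptly.
```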
The next processing step is the Fast Fourier Transform, which converts each frame of N
samples from the time domain into the frequency domain. The FFT is a fast algorithm to
implement the Discrete Fourier Transform (DFT), which is defined on the set of N
samples {x_k} as follows:

    X_n = sum_{k=0..N-1} x_k * exp(-j*2*pi*k*n/N),    n = 0, 1, 2, ..., N-1

Note that we use j here to denote the imaginary unit, i.e. j = sqrt(-1). In general the X_n's
are complex numbers. The resulting sequence {X_n} is interpreted as follows: the zero
frequency corresponds to n = 0, positive frequencies 0 < f < Fs/2 correspond to values
1 <= n <= N/2 - 1, while negative frequencies -Fs/2 < f < 0 correspond to
N/2 + 1 <= n <= N - 1. Here, Fs denotes the sampling frequency. The result obtained after
this step is often referred to as the signal's spectrum or periodogram.
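The DFT above can be written out directly (a slow but transparent Python sketch for illustration; in practice the FFT computes the same result far faster). The test tone and frame length are invented for the example:

```python
import cmath
import math

def dft(x):
    # X[n] = sum_k x[k] * exp(-j*2*pi*k*n/N), exactly as in the text
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * n / N) for k in range(N))
            for n in range(N)]

def power_spectrum(x):
    # Squared magnitude of each DFT bin (the periodogram)
    return [abs(X) ** 2 for X in dft(x)]

# A pure tone completing 8 cycles in a 64-sample frame: its energy shows up
# in bin n = 8 (positive frequency) and bin n = 64 - 8 = 56 (its mirror,
# the negative frequency), matching the interpretation given above.
tone = [math.cos(2 * math.pi * 8 * k / 64) for k in range(64)]
P = power_spectrum(tone)
```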
4.3 Mel-frequency wrapping
As mentioned above, psychophysical studies have shown that human perception of the
frequency contents of sounds for speech signals does not follow a linear scale. Thus for
each tone with an actual frequency f, measured in Hz, a subjective pitch is measured on a
scale called the "mel" scale. The mel-frequency scale is a linear frequency spacing below
1000 Hz and a logarithmic spacing above 1000 Hz. As a reference point, the pitch of a
1 kHz tone, 40 dB above the perceptual hearing threshold, is defined as 1000 mels.
Therefore we can use the following approximate formula to compute the mels for a given
frequency f in Hz:

    mel(f) = 2595 * log10(1 + f/700)

One approach to simulating the subjective spectrum is to use a filter bank, one filter for
each desired mel-frequency component (see Figure 4). That filter bank has a triangular
bandpass frequency response, and the spacing as well as the bandwidth is determined by
a constant mel-frequency interval. The modified spectrum of S(w) thus consists of the
output power of these filters when S(w) is the input. The number of mel spectrum
coefficients, K, is typically chosen as 20.
Note that this filter bank is applied in the frequency domain; therefore it simply amounts
to taking those triangle-shaped windows in Figure 4 on the spectrum. A useful way of
thinking about this mel-wrapping filter bank is to view each filter as a histogram bin
(where bins have overlap) in the frequency domain.
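The mel formula from the text can be checked directly (a small Python sketch for illustration; the chosen test frequencies are arbitrary):

```python
import math

def mel(f):
    # Approximate mel scale: mel(f) = 2595 * log10(1 + f/700)
    return 2595 * math.log10(1 + f / 700.0)

# The reference point: a 1 kHz tone sits at (approximately) 1000 mels.
print(round(mel(1000)))  # -> 1000
# Above 1000 Hz the scale grows logarithmically, so the step from
# 1000 Hz to 2000 Hz covers fewer mels than the step from 0 to 1000 Hz.
# This is why filters spaced uniformly in mels get wider at high frequencies.
```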
Figure 4. An example of mel-spaced filter bank
4.4 Cepstrum
In this final step, we convert the log mel spectrum back to time. The result is called the
mel frequency cepstrum coefficients (MFCC). The cepstral representation of the speech
spectrum provides a good representation of the local spectral properties of the signal for
the given frame analysis. Because the mel spectrum coefficients (and so their logarithm)
are real numbers, we can convert them to the time domain using the Discrete Cosine
Transform (DCT). Therefore, if we denote those mel power spectrum coefficients that are
the result of the last step as S_k, k = 1, 2, ..., K, we can calculate the MFCCs as

    c_n = sum_{k=1..K} (log S_k) * cos( n*(k - 1/2)*pi/K ),    n = 1, 2, ..., K
Note that we exclude the first component, c_0, from the DCT since it represents the mean
value of the input signal, which carries little speaker-specific information.
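The DCT step above, with c_0 skipped, can be sketched as follows (Python for illustration; the flat 20-filter spectrum in the example is invented to make the expected result obvious):

```python
import math

def mel_cepstrum(S, n_coeffs):
    # c_n = sum_{k=1..K} log(S_k) * cos(n * (k - 1/2) * pi / K), n = 1..n_coeffs.
    # c_0 (the mean of the log spectrum) is deliberately skipped.
    K = len(S)
    log_S = [math.log(s) for s in S]
    return [sum(log_S[k] * math.cos(n * (k + 0.5) * math.pi / K) for k in range(K))
            for n in range(1, n_coeffs + 1)]

# A flat mel spectrum (all K = 20 filter outputs equal to 1) has
# log S_k = 0 everywhere, so every cepstral coefficient vanishes.
coeffs = mel_cepstrum([1.0] * 20, 12)
```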
5. Feature Matching
The problem of speaker recognition belongs to a much broader topic in science and
engineering, so-called pattern recognition. The goal of pattern recognition is to classify
objects of interest into one of a number of categories or classes. The objects of interest
are generically called patterns, and in our case are sequences of acoustic vectors that are
extracted from input speech using the techniques described in the previous section.
The classes here refer to individual speakers. Since the classification procedure in our
case is applied on extracted features, it can also be referred to as feature matching.
Furthermore, if there exists some set of patterns the individual classes of which are
already known, then one has a problem in supervised pattern recognition. This is exactly
our case, since during the training session we label each input speech with the ID of the
speaker (S1 to S8). These patterns comprise the training set and are used to derive a
classification algorithm. The remaining patterns are then used to test the classification
algorithm; these patterns are collectively referred to as the test set. If the correct classes
of the individual patterns in the test set are also known, then one can evaluate the
performance of the algorithm.
The state"of"the"art in feature matching techni#ues used in speaker recognition includes
ynamic Time /arping ( T/), idden *arkov *odeling ( **), and ?ector
Quanti$ation (?Q). !n this proBect, the ?Q approach will be used, due to ease of
implementation and high accuracy. ?Q is a process of mapping vectors from a large
vector space to a finite number of regions in that space. Each region is called a cluster
and can be represented by its center called a codeword. The collection of all codewords is
called a codebook.
Figure 5 shows a conceptual diagram illustrating this recognition process. In the figure,
only two speakers and two dimensions of the acoustic space are shown. The circles refer
to the acoustic vectors from speaker 1, while the triangles are from speaker 2. In the
training phase, a speaker-specific VQ codebook is generated for each known speaker by
clustering his/her training acoustic vectors. The resulting codewords (centroids) are
shown in Figure 5 by black circles and black triangles for speakers 1 and 2, respectively.
The distance from a vector to the closest codeword of a codebook is called a VQ-
distortion. In the recognition phase, an input utterance of an unknown voice is "vector-
quantized" using each trained codebook and the total VQ distortion is computed. The
speaker corresponding to the VQ codebook with the smallest total distortion is identified.
Figure 5. Conceptual diagram illustrating vector quantization codebook formation. One
speaker can be discriminated from another based on the location of centroids.
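The recognition rule just described, total distortion against each codebook followed by a minimum, can be sketched as follows (Python for illustration only; the two-dimensional codebooks and test vectors below are invented, whereas the real ones live in the MFCC space):

```python
import math

def vq_distortion(vectors, codebook):
    # Total VQ distortion: each input vector is matched to its nearest
    # codeword, and those nearest-codeword distances are summed.
    return sum(min(math.dist(v, c) for c in codebook) for v in vectors)

def identify_speaker(vectors, codebooks):
    # The speaker whose codebook yields the smallest total distortion wins.
    return min(codebooks, key=lambda s: vq_distortion(vectors, codebooks[s]))

codebooks = {
    "speaker1": [(0.0, 0.0), (1.0, 1.0)],   # hypothetical trained codewords
    "speaker2": [(5.0, 5.0), (6.0, 6.0)],
}
print(identify_speaker([(0.2, 0.1), (0.9, 1.1)], codebooks))  # "speaker1"
```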
5.1 Clustering the Training Vectors
After the enrollment session, the acoustic vectors extracted from the input speech of a
speaker provide a set of training vectors. As described above, the next important step is
to build a speaker-specific VQ codebook for this speaker using those training vectors.
There is a well-known algorithm for this, namely the LBG algorithm (see Figure 6),
which clusters the set of training vectors into a set of codebook vectors by iteratively
splitting the codewords and refining them with a nearest-neighbor search and centroid-
update
procedure. "Compute D (distortion)" sums the distances of all training vectors in the
nearest-neighbor search so as to determine whether the procedure has converged.
Figure 6. Flow diagram of the LBG algorithm (adapted from Rabiner and Juang, 1993)
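The split-and-refine loop of the LBG algorithm can be sketched as follows (a simplified Python illustration, not the project's vqlbg function; the splitting factor eps, the convergence tolerance and the tiny 2-D data set are invented, and the codebook size doubles on each split, so M is taken as a power of two):

```python
import math

def centroid(vectors):
    # Component-wise mean of a list of equal-dimension tuples
    d = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(d))

def lbg(training, M, eps=0.01, tol=1e-4):
    # Start from a 1-vector codebook (the global centroid), then repeatedly
    # split every codeword into two perturbed copies and refine them.
    codebook = [centroid(training)]
    while len(codebook) < M:
        codebook = [tuple(x * (1 + s) for x in c)
                    for c in codebook for s in (eps, -eps)]
        prev = float("inf")
        while True:
            # Nearest-neighbor search: assign each vector to its closest codeword
            clusters = [[] for _ in codebook]
            total = 0.0                     # "Compute D (distortion)"
            for v in training:
                dists = [math.dist(v, c) for c in codebook]
                i = dists.index(min(dists))
                clusters[i].append(v)
                total += dists[i]
            # Centroid update: move each codeword to the mean of its cluster
            codebook = [centroid(cl) if cl else c
                        for cl, c in zip(clusters, codebook)]
            if prev - total < tol * max(total, 1.0):
                break                       # distortion stopped decreasing
            prev = total
    return codebook

data = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0)]
cb = lbg(data, 2)   # two codewords, one near each cluster of points
```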
6. Implementation
As stated above, in this project we will experiment with building and testing an
automatic speaker recognition system. In order to implement such a system, one must go
through several steps, which were described in detail in previous sections. Note that
many of the above tasks are already implemented in Matlab. Furthermore, to ease the
development process, we supplied you with two utility functions, melfb and disteu, and
two main functions, train and test. Download all of those files into your working folder.
The first two files can be treated as a black box, but the latter two need to be thoroughly
understood. In fact, your tasks are to write two missing functions, mfcc and vqlbg, which
will be called from the given main functions. In order to accomplish that, follow each
step in this section carefully and answer all the questions.
Speech Data
Click here to download the ZIP file of the speech database. After unzipping the file
correctly, you will find two folders, TRAIN and TEST, each containing 8 files, named
S1.WAV, S2.WAV, ..., S8.WAV; each is labeled after the ID of the speaker. These files
were recorded in Microsoft WAV format. On Windows systems, you can listen to the
recorded sounds by double-clicking on the files.
Our goal is to train a voice model (or, more specifically, a VQ codebook in the MFCC
vector space) for each speaker S1-S8 using the corresponding sound file in the TRAIN
folder. After this training step, the system would have knowledge of the voice
characteristics of each (known) speaker. Next, in the testing phase, the system will be
able to identify the (assumed unknown) speaker of each sound file in the TEST folder.
Question 1: Play each sound file in the TRAIN folder. Can you distinguish the voices of
those eight speakers? Now play each sound in the TEST folder in a random order without
looking at the file names (pretending that you do not know the speakers) and try to
identify each speaker using your knowledge of their voices that you just learned from the
TRAIN folder. This is exactly what the computer will do in our system. What is your
(human performance) recognition rate? Record this result so that it can later be
compared against the computer performance of our system.
Speech Processing
In this phase you are required to write a Matlab function that reads a sound file and turns
it into a sequence of MFCCs (acoustic vectors) using the speech processing steps
described previously. Many of those tasks are already provided by either standard or our
supplied Matlab functions. The Matlab functions that you would need to use are:
wavread, hamming, fft, dct and melfb (supplied function). Type help followed by a
function name at the Matlab prompt for more information about that function.
Question 2: Read a sound file into Matlab. Check it by playing the sound file in Matlab
using the function sound. What is the sampling rate? What is the highest frequency that
the recorded sound can capture with fidelity? With that sampling rate, how many msecs
of actual speech are contained in a block of 256 samples?
Plot the signal to view it in the time domain. It should be obvious that the raw signal
contains a very large amount of data in the time domain, which makes it difficult to
analyze the voice characteristics directly. So the motivation for this step (speech feature
extraction) should be clear now!
Now cut the speech signal (a vector) into frames with overlap (refer to the frame-
blocking section in the theory part). The result is a matrix where each column is a frame
of N samples from the original speech signal. Apply the Windowing and FFT steps to
transform the signal into the frequency domain. This process is used in many different
applications and is referred to in the literature as the Windowed Fourier Transform
(WFT) or Short-Time Fourier Transform (STFT). The result is often called the spectrum
or periodogram.
Question 3: After successfully running the preceding process, what is the interpretation
of the result? Compute the power spectrum and plot it out using the imagesc command.
Note that it is better to view the power spectrum on the log scale. Locate the region in the
plot that contains most of the energy. Translate this location into the actual ranges in time
(msec) and frequency (in Hz) of the input speech signal.
Question 4: Compute and plot the power spectrum of a speech file using different frame
sizes: for example N = 128, 256 and 512. In each case, set the frame increment M to be
about N/3. Can you describe and explain the differences among those spectra?
The last step in speech processing is converting the power spectrum into mel-frequency
cepstrum coefficients. The supplied function melfb facilitates this task.
Question 5: Type help melfb at the Matlab prompt for more information about this
function. Follow the guidelines to plot out the mel-spaced filter bank. What is the
behavior of this filter bank? Compare it with the theoretical part.
Finally, put all the pieces together into a single Matlab function, mfcc, which performs
the MFCC processing.
Question 6: Compute and plot the spectrum of a speech file before and after the mel-
frequency wrapping step. Describe and explain the impact of the melfb program.
Vector Quantization
The result of the last section is that we have transformed speech signals into vectors in an
acoustic space. In this section, we will apply the VQ-based pattern recognition technique
to build speaker reference models from those vectors in the training phase, and then we
can identify any sequence of acoustic vectors uttered by unknown speakers.
Question 7: To inspect the acoustic space (MFCC vectors) we can pick any two
dimensions (say the 5th and the 6th) and plot the data points in a 2D plane. Use acoustic
vectors of two different speakers and plot the data points in two different colors. Do the
data regions from the two speakers overlap each other? Are they in clusters?
Now write a Matlab function, vqlbg, that trains a VQ codebook using the LBG algorithm
described before. Use the supplied utility function disteu to compute the pairwise
Euclidean distances between the codewords and training vectors in the iterative process.
Question 8: Plot the data points of the trained VQ codewords using the same two
dimensions over the plot from the last question. Compare this with Figure 5.
Simulation and Evaluation
Now comes the final part! Use the two supplied programs, train and test (which require
the two functions mfcc and vqlbg that you just wrote), to simulate the training and
testing procedures in a speaker recognition system, respectively.
Question 9: What recognition rate can our system achieve? Compare this with the
human performance. For the cases where the system makes errors, re-listen to the speech
files and try to come up with some explanations.
Question 10 (optional): You can also test the system with your own speech files. Use the
Windows program Sound Recorder to record more voices from yourself and your
friends. Each new speaker needs to provide one speech file for training and one for
testing. Can the system recognize your voice? Enjoy!
Figure 1: plot of signal s1.wav
Figure 2.a: power spectrum (M = 100, N = 256)
Figure 2.b: logarithmic power spectrum (M = 100, N = 256)
Figure 3.a: power spectrum (M = 43, N = 128, frames = 767)
Figure 3.b: power spectrum (M = 85, N = 256, frames = 387)
Figure 3.c: power spectrum (M = 171, N = 512, frames = 191)
Figure 4: mel-spaced filterbank
Figure 5.a: power spectrum, unmodified
Figure 5.b: power spectrum, modified through mel cepstrum filter
Figure 6: 2D plot of acoustic vectors
Figure 7: 2D plot of acoustic vectors
Figure 8.b: 2D plot of acoustic vectors
7. APPENDIX
INTRODUCTION TO MATLAB
MATLAB (short for Matrix Laboratory) is a special purpose computer program
optimized to perform engineering and scientific calculations. It started life as a program
designed to perform matrix mathematics, but over the years it has grown into a flexible
computing system capable of solving essentially any technical problem. The MATLAB
program implements the MATLAB programming language and provides an extensive
library of predefined functions that make technical programming tasks easier and more
efficient.
MATLAB is a huge program with an incredibly rich variety of functions. Even the basic
version of MATLAB without any toolkits is much richer than other technical
programming languages. There are more than 1000 functions in the basic MATLAB
product alone, and the toolkits extend this capability with many more functions in
various specialties.
Advantages of MATLAB:
MATLAB has many advantages compared with conventional computer languages for
technical problem solving. Among them are the following:
Ease of use: MATLAB is an interpreted language, like many versions of BASIC. The
program can be used as a scratch pad to evaluate expressions typed at the
command line, or it can be used to execute large prewritten programs. Programs may be
easily written and modified with the built-in integrated development environment and
debugged with the MATLAB debugger.
&"at'orm independence 9
*AT AC is a supported on much different computer system, providing a large measure
of platform independence.
Predefined functions: MATLAB comes complete with an extensive library of predefined
functions that provide tested and prepackaged solutions to many basic technical tasks. In
addition to the basic MATLAB language, many special purpose toolboxes are available
to help solve complex problems in specific areas.
Device independent plotting: Unlike most other computer languages, MATLAB has
many integral plotting and imaging commands. The plots and images can be displayed
on any graphical output device supported by the computer on which MATLAB is
running. This capability makes MATLAB an outstanding tool for visualizing technical
data.
Graphical user interface: MATLAB includes tools that allow a programmer to
interactively construct a GUI for his/her program.
MATLAB compiler: MATLAB's flexibility and platform independence are achieved by
compiling MATLAB programs into a device-independent p-code and then interpreting
the p-code instructions at run time. A separate MATLAB compiler is also available; it
can compile a MATLAB program into an executable that runs faster than the
interpreted code.
MATLAB commands used in the source code:

LENGTH: Length of vector. LENGTH(X) returns the length of vector X. It is equivalent to MAX(SIZE(X)) for non-empty arrays and 0 for empty ones.
FLOOR: Round towards minus infinity. FLOOR(X) rounds the elements of X to the nearest integers towards minus infinity.
HAMMING: Hamming window. HAMMING(N) returns the N-point symmetric Hamming window in a column vector. HAMMING(N,SFLAG) generates the N-point Hamming window using SFLAG window sampling. SFLAG may be either 'symmetric' or 'periodic'. By default, a symmetric window is returned.
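The symmetric/periodic distinction is easy to misread, so here is a minimal pure-Python sketch of the window formula (Python is used only for illustration; the `hamming` helper below is our own assumption, not the toolbox function):

```python
import math

def hamming(n, flag="symmetric"):
    """Hamming window; 'symmetric' divides by n-1, 'periodic' by n."""
    denom = (n - 1) if flag == "symmetric" else n
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / denom) for k in range(n)]

w = hamming(5)               # symmetric: endpoints are 0.54 - 0.46 = 0.08
wp = hamming(5, "periodic")  # first 5 points of a symmetric 6-point window
```

The 'periodic' N-point window is simply the first N points of a symmetric (N+1)-point window, which makes it the better choice for FFT-based spectral analysis.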
DIAG: Diagonal matrices and diagonals of a matrix. DIAG(V,K), when V is a vector with N components, is a square matrix of order N+ABS(K) with the elements of V on the K-th diagonal. K = 0 is the main diagonal, K > 0 is above the main diagonal and K < 0 is below the main diagonal. DIAG(V) is the same as DIAG(V,0) and puts V on the main diagonal. DIAG(X,K), when X is a matrix, is a column vector formed from the elements of the K-th diagonal of X. DIAG(X) is the main diagonal of X. DIAG(DIAG(X)) is a diagonal matrix.
Example:
m = 5;
diag(-m:m) + diag(ones(2*m,1),1) + diag(ones(2*m,1),-1)
produces a tridiagonal matrix of order 2*m+1.
FFT: Discrete Fourier transform. FFT(X) is the discrete Fourier transform (DFT) of vector X. For matrices, the FFT operation is applied to each column. For N-D arrays, the FFT operation operates on the first non-singleton dimension.
FFT(X,N) is the N-point FFT, padded with zeros if X has less than N points and truncated if it has more.
FFT(X,[],DIM) or FFT(X,N,DIM) applies the FFT operation across the dimension DIM.
For a length N input vector x, the DFT is a length N vector X, with elements
X(k) = sum_{n=1..N} x(n)*exp(-j*2*pi*(k-1)*(n-1)/N),  1 <= k <= N.
The inverse DFT (computed by IFFT) is given by
x(n) = (1/N) * sum_{k=1..N} X(k)*exp(+j*2*pi*(k-1)*(n-1)/N),  1 <= n <= N.
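The two formulas above can be checked directly. A pure-Python sketch of the definition follows (a direct O(N^2) evaluation for illustration only, not the fast algorithm FFT actually uses; zero-based indexing replaces MATLAB's 1-based k and n):

```python
import cmath

def dft(x):
    """Direct evaluation of X(k) = sum_n x(n)*exp(-j*2*pi*k*n/N), zero-based."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse: x(n) = (1/N) * sum_k X(k)*exp(+j*2*pi*k*n/N), zero-based."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

x = [1.0, 2.0, 0.0, -1.0]
X = dft(x)        # X[0] is the plain sum of the samples: 2.0
x_back = idft(X)  # the round trip recovers x
```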
WAVREAD: Read Microsoft WAVE ('.wav') sound file.
Y=WAVREAD(FILE) reads a WAVE file specified by the string FILE, returning the sampled data in Y. The '.wav' extension is appended if no extension is given. Amplitude values are in the range [-1,+1].
[Y,FS,NBITS]=WAVREAD(FILE) returns the sample rate (FS) in Hertz and the number of bits per sample (NBITS) used to encode the data in the file.
[...]=WAVREAD(FILE,N) returns only the first N samples from each channel in the file.
[...]=WAVREAD(FILE,[N1 N2]) returns only samples N1 through N2 from each channel in the file.
SIZ=WAVREAD(FILE,'size') returns the size of the audio data contained in the file in place of the actual audio data, returning the vector SIZ=[samples channels].
[Y,FS,NBITS,OPTS]=WAVREAD(...) returns a structure OPTS of additional information contained in the WAV file. The content of this structure differs from file to file. Typical structure fields include '.fmt' (audio format information) and '.info' (text which may describe subject title, copyright, etc.).
Supports multi-channel data, with up to 32 bits per sample.
DISP: Display array. DISP(X) displays the array, without printing the array name. In all other ways it is the same as leaving the semicolon off an expression, except that empty arrays don't display. If X is a string, the text is displayed.
AXIS: Control axis scaling and appearance.
AXIS([XMIN XMAX YMIN YMAX]) sets scaling for the x- and y-axes on the current plot.
AXIS([XMIN XMAX YMIN YMAX ZMIN ZMAX]) sets the scaling for the x-, y- and z-axes on the current 3-D plot.
AXIS([XMIN XMAX YMIN YMAX ZMIN ZMAX CMIN CMAX]) sets the scaling for the x-, y- and z-axes and the color scaling limits on the current axis (see CAXIS).
V = AXIS returns a row vector containing the scaling for the current plot. If the current view is 2-D, V has four components; if it is 3-D, V has six components.
AXIS AUTO returns the axis scaling to its default, automatic mode where, for each dimension, 'nice' limits are chosen based on the extents of all line, surface, patch, and image children.
AXIS MANUAL freezes the scaling at the current limits, so that if HOLD is turned on, subsequent plots will use the same limits.
AXIS TIGHT sets the axis limits to the range of the data.
AXIS IJ puts MATLAB into its 'matrix' axes mode. The coordinate system origin is at the upper left corner. The i axis is vertical and is numbered from top to bottom. The j axis is horizontal and is numbered from left to right.
AXIS XY puts MATLAB into its default 'Cartesian' axes mode. The coordinate system origin is at the lower left corner. The x axis is horizontal and is numbered from left to right. The y axis is vertical and is numbered from bottom to top.
AXIS EQUAL sets the aspect ratio so that equal tick mark increments on the x-, y- and z-axis are equal in size. This makes SPHERE(25) look like a sphere, instead of an ellipsoid.
AXIS IMAGE is the same as AXIS EQUAL except that the plot box fits tightly around the data.
AXIS SQUARE makes the current axis box square in size.
AXIS NORMAL restores the current axis box to full size and removes any restrictions on the scaling of the units. This undoes the effects of AXIS SQUARE and AXIS EQUAL.
AXIS VIS3D freezes aspect ratio properties to enable rotation of 3-D objects and overrides stretch-to-fill.
AXIS OFF turns off all axis labeling, tick marks and background.
AXIS ON turns axis labeling, tick marks and background back on.
IMAGESC: Scale data and display as image. IMAGESC(...) is the same as IMAGE(...) except the data is scaled to use the full colormap. IMAGESC(...,CLIM), where CLIM = [CLOW CHIGH], can specify the scaling.
COLORBAR: Display color bar (color scale).
COLORBAR('vert') appends a vertical color scale to the current axes.
COLORBAR('horiz') appends a horizontal color scale.
COLORBAR(H) places the colorbar in the axes H. The colorbar will be horizontal if the axes width > height (in pixels).
COLORBAR without arguments either adds a new vertical color scale or updates an existing colorbar. H = COLORBAR(...) returns a handle to the colorbar axes.
COLORBAR(...,'peer',AX) creates a colorbar associated with axes AX instead of the current axes.
GET: Get object properties.
V = GET(H,'PropertyName') returns the value of the specified property for the graphics object with handle H. If H is a vector of handles, then GET will return an M-by-1 cell array of values where M is equal to length(H). If 'PropertyName' is replaced by a 1-by-N or N-by-1 cell array of strings containing property names, then GET will return an M-by-N cell array of values.
GET(H) displays all property names and their current values for the graphics object with handle H.
V = GET(H), where H is a scalar, returns a structure where each field name is the name of a property of H and each field contains the value of that property.
V = GET(0,'Factory')
V = GET(0,'Factory<ObjectType>')
V = GET(0,'Factory<ObjectType><PropertyName>')
returns for all object types the factory values of all properties which have user-settable default values.
V = GET(H,'Default')
V = GET(H,'Default<ObjectType>')
V = GET(H,'Default<ObjectType><PropertyName>')
returns information about default property values (H must be scalar). 'Default' returns a list of all default property values currently set on H. 'Default<ObjectType>' returns only the defaults for properties of <ObjectType> set on H. 'Default<ObjectType><PropertyName>' returns the default value for the specific property, by searching the defaults set on H and its ancestors until that default is found. If no default value for this property has been set on H or any ancestor of H up through the root, then the factory value for that property is returned.
Defaults can not be queried on a descendant of the object, or on the object itself - for example, a value for 'DefaultAxesColor' can not be queried on an axes or an axes child, but can be queried on a figure or on the root. When using the 'Factory' or 'Default' GET, if PropertyName is omitted then the return value will take the form of a structure in which each field name is a property name and the corresponding value is the value of that property. If PropertyName is specified, then a matrix or string value will be returned.
SET: Set object properties.
SET(H,'PropertyName',PropertyValue) sets the value of the specified property for the graphics object with handle H. H can be a vector of handles, in which case SET sets the properties' values for all the objects. SET(H,a), where a is a structure whose field names are object property names, sets the properties named in each field name with the values contained in the structure.
SET(H,pn,pv) sets the named properties specified in the cell array of strings pn to the corresponding values in the cell array pv for all objects specified in H. The cell array pn must be 1-by-N, but the cell array pv can be M-by-N, where M is equal to length(H), so that each object will be updated with a different set of values for the list of property names contained in pn.
SET(H,'PropertyName1',PropertyValue1,'PropertyName2',PropertyValue2,...) sets multiple property values with a single statement. Note that it is permissible to use property/value string pairs, structures, and property/value cell array pairs in the same call to SET.
A = SET(H,'PropertyName')
SET(H,'PropertyName')
returns or displays the possible values for the specified property of the object with handle H. The returned array is a cell array of possible value strings, or an empty cell array if the property does not have a finite set of possible string values.
A = SET(H)
SET(H)
returns or displays all property names and their possible values for the object with handle H. The return value is a structure whose field names are the property names of H, and whose values are cell arrays of possible property values or empty cell arrays.
The default value for an object property can be set on any of an object's ancestors by setting the PropertyName formed by concatenating the string 'Default', the object type, and the property name. For example, to set the default color of text objects to red in the current figure window:
set(gcf,'DefaultTextColor','red')
Defaults can not be set on a descendant of the object, or on the object itself - for example, a value for 'DefaultAxesColor' can not be set on an axes or an axes child, but can be set on a figure or on the root.
Three strings have special meaning for PropertyValues:
'default' - use default value (from nearest ancestor)
'factory' - use factory default value
'remove' - remove default value.
ROUND: Round towards nearest integer. ROUND(X) rounds the elements of X to the nearest integers.
SIZE: Size of array. D = SIZE(X), for an M-by-N matrix X, returns the two-element row vector D = [M, N] containing the number of rows and columns in the matrix. For N-D arrays, SIZE(X) returns a 1-by-N vector of dimension lengths. Trailing singleton dimensions are ignored.
[M,N] = SIZE(X), for a matrix X, returns the number of rows and columns in X as separate output variables.
[M1,M2,M3,...,MN] = SIZE(X) returns the sizes of the first N dimensions of array X. If the number of output arguments N does not equal NDIMS(X), then for:
N > NDIMS(X), SIZE returns ones in the 'extra' variables, i.e., outputs NDIMS(X)+1 through N.
N < NDIMS(X), MN contains the product of the sizes of the remaining dimensions, i.e., dimensions N+1 through NDIMS(X).
M = SIZE(X,DIM) returns the length of the dimension specified by the scalar DIM. For example, SIZE(X,1) returns the number of rows. When SIZE is applied to a Java array, the number of rows returned is the length of the Java array and the number of columns is always 1. When SIZE is applied to a Java array of arrays, the result describes only the top level array in the array of arrays.
SPRINTF: Write formatted data to string.
[S,ERRMSG] = SPRINTF(FORMAT,A,...) formats the data in the real part of matrix A (and in any additional matrix arguments), under control of the specified FORMAT string, and returns it in the MATLAB string variable S. ERRMSG is an optional output argument that returns an error message string if an error occurred, or an empty matrix if an error did not occur. SPRINTF is the same as FPRINTF except that it returns the data in a MATLAB string variable rather than writing it to a file.
FORMAT is a string containing C language conversion specifications. Conversion specifications involve the character %, optional flags, optional width and precision fields, an optional subtype specifier, and the conversion characters d, i, o, u, x, X, f, e, E, g, G, c, and s.
The special formats \n, \r, \t, \b, \f can be used to produce linefeed, carriage return, tab, backspace, and formfeed characters respectively. Use \\ to produce a backslash character and %% to produce the percent character.
SPRINTF behaves like ANSI C with certain exceptions and extensions. These include the following:
ANSI C requires an integer cast of a double argument to correctly use an integer conversion specifier like d. A similar conversion is required when using such a specifier with non-integral MATLAB values. Use FIX, FLOOR, CEIL or ROUND on a double argument to explicitly convert non-integral MATLAB values to integral values if you plan to use an integer conversion specifier like d. Otherwise, any non-integral MATLAB values will be output using the format where the integer conversion specifier letter has been replaced by e.
The following non-standard subtype specifiers are supported for the conversion characters o, u, x, and X:
t - The underlying C datatype is a float rather than an unsigned integer.
b - The underlying C datatype is a double rather than an unsigned integer.
For example, to print out a double value in hex, use a format like '%bx'.
SPRINTF is 'vectorized' for the case when A is nonscalar. The format string is recycled through the elements of A (columnwise) until all the elements are used up. It is then recycled in a similar manner through any additional matrix arguments.
Examples:
sprintf('%0.5g',(1+sqrt(5))/2)      % 1.618
sprintf('%0.5g',1/eps)              % 4.5036e+15
sprintf('%15.5f',1/eps)             % 4503599627370496.00000
sprintf('%d',round(pi))             % 3
sprintf('%s','hello')               % hello
sprintf('The array is %dx%d.',2,3)  % The array is 2x3.
sprintf('\n') is the line termination character on all platforms.
MELFB: Determine the matrix for a mel-spaced filterbank.
Inputs: p  number of filters in the filterbank
        n  length of the fft
        fs sample rate in Hz
Output: x  a (sparse) matrix containing the filterbank amplitudes, with size(x) = [p, 1+floor(n/2)]
Usage: for example, to compute the mel-scale spectrum of a column-vector signal s, with length n and sample rate fs:
f = fft(s);
m = melfb(p, n, fs);
n2 = 1 + floor(n/2);
z = m * abs(f(1:n2)).^2;
z would then contain p samples of the desired mel-scale spectrum.
To plot the filterbanks, e.g.:
plot(linspace(0, (12500/2), 129), melfb(20, 256, 12500)'),
title('Mel-spaced filterbank'), xlabel('Frequency (Hz)');
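A mel-spaced filterbank is built on the mel scale, which is roughly linear below 1 kHz and logarithmic above it. Here is a pure-Python sketch of that mapping and of equally-mel-spaced filter center frequencies (the 2595*log10(1+f/700) formula is the common textbook variant; the exact spacing and filter shape used by melfb itself are assumptions here):

```python
import math

def hz_to_mel(f):
    """Common textbook mel-scale mapping (melfb's exact variant may differ)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_centers(p, fs):
    """p filter center frequencies, equally spaced on the mel scale up to fs/2."""
    top = hz_to_mel(fs / 2.0)
    return [mel_to_hz(top * (i + 1) / (p + 1)) for i in range(p)]

# 20 filters for a 12.5 kHz sample rate, matching the plotting example above.
centers = mel_centers(20, 12500)
```

Because the spacing is uniform in mel, the centers crowd together at low frequencies and spread out at high frequencies, which is what gives the filterbank its characteristic shape.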
ABS: Absolute value. ABS(X) is the absolute value of the elements of X. When X is complex, ABS(X) is the complex modulus (magnitude) of the elements of X.
PLOT: Linear plot. PLOT(X,Y) plots vector Y versus vector X. If X or Y is a matrix, then the vector is plotted versus the rows or columns of the matrix, whichever line up. If X is a scalar and Y is a vector, length(Y) disconnected points are plotted. PLOT(Y) plots the columns of Y versus their index. If Y is complex, PLOT(Y) is equivalent to PLOT(real(Y),imag(Y)). In all other uses of PLOT, the imaginary part is ignored.
Various line types, plot symbols and colors may be obtained with PLOT(X,Y,S), where S is a character string made from one element from any or all of the following 3 columns:
b  blue       .  point             -   solid
g  green      o  circle            :   dotted
r  red        x  x-mark            -.  dashdot
c  cyan       +  plus              --  dashed
m  magenta    *  star
y  yellow     s  square
k  black      d  diamond
              v  triangle (down)
              ^  triangle (up)
              <  triangle (left)
              >  triangle (right)
              p  pentagram
              h  hexagram
For example, PLOT(X,Y,'c+:') plots a cyan dotted line with a plus at each data point; PLOT(X,Y,'bd') plots a blue diamond at each data point but does not draw any line.
PLOT(X1,Y1,S1,X2,Y2,S2,X3,Y3,S3,...) combines the plots defined by the (X,Y,S) triples, where the X's and Y's are vectors or matrices and the S's are strings. For example, PLOT(X,Y,'y-',X,Y,'go') plots the data twice, with a solid yellow line interpolating green circles at the data points.
The PLOT command, if no color is specified, makes automatic use of the colors specified by the axes ColorOrder property. The default ColorOrder is listed in the table above for color systems, where the default is blue for one line, and for multiple lines, to cycle through the first six colors in the table. For monochrome systems, PLOT cycles over the axes LineStyleOrder property.
PLOT returns a column vector of handles to LINE objects, one handle per line.
SUBPLOT: Create axes in tiled positions.
H = SUBPLOT(m,n,p), or SUBPLOT(mnp), breaks the Figure window into an m-by-n matrix of small axes, selects the p-th axes for the current plot, and returns the axis handle. The axes are counted along the top row of the Figure window, then the second row, etc. For example,
SUBPLOT(2,1,1), PLOT(income)
SUBPLOT(2,1,2), PLOT(outgo)
plots income on the top half of the window and outgo on the bottom half.
SUBPLOT(m,n,p), if the axis already exists, makes it current.
SUBPLOT(m,n,p,'replace'), if the axis already exists, deletes it and creates a new axis.
SUBPLOT(m,n,P), where P is a vector, specifies an axes position that covers all the subplot positions listed in P.
SUBPLOT(H), where H is an axis handle, is another way of making an axis current for subsequent plotting commands.
SUBPLOT('position',[left bottom width height]) creates an axis at the specified position in normalized coordinates (in the range from 0.0 to 1.0).
If a SUBPLOT specification causes a new axis to overlap an existing axis, the existing axis is deleted - unless the position of the new and existing axis are identical. For example, the statement SUBPLOT(1,2,1) deletes all existing axes overlapping the left side of the Figure window and creates a new axis on that side - unless there is an axes there with a position that exactly matches the position of the new axes (and 'replace' was not specified), in which case all other overlapping axes will be deleted and the matching axes will become the current axes.
SUBPLOT(111) is an exception to the rules above, and is not identical in behavior to SUBPLOT(1,1,1). For reasons of backwards compatibility, it is a special case of subplot which does not immediately create an axes, but instead sets up the figure so that the next graphics command executes CLF RESET in the figure (deleting all children of the figure), and creates a new axes in the default position. This syntax does not return a handle, so it is an error to specify a return argument. The delayed CLF RESET is accomplished by setting the figure's NextPlot to 'replace'.
HOLD: Hold current graph.
HOLD ON holds the current plot and all axis properties so that subsequent graphing commands add to the existing graph.
HOLD OFF returns to the default mode, whereby PLOT commands erase the previous plots and reset all axis properties before drawing new plots.
HOLD, by itself, toggles the hold state.
HOLD does not affect axis autoranging properties.
Algorithm note:
HOLD ON sets the NextPlot property of the current figure and axes to 'add'. HOLD OFF sets the NextPlot property of the current axes to 'replace'.
VQLBG: Vector quantization using the Linde-Buzo-Gray algorithm.
Inputs: d contains the training data vectors (one per column)
        k is the number of centroids required
Output: r contains the resulting VQ codebook (k columns, one for each centroid)
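A minimal pure-Python sketch of the Linde-Buzo-Gray idea follows: split each centroid into two slightly perturbed copies, then refine with a few Lloyd (k-means) passes, repeating until k codewords exist. It is illustrative only - the report's vqlbg works on MFCC columns in MATLAB, and details such as the splitting factor eps and the fixed iteration count are assumptions here:

```python
def lbg(data, k, eps=0.01, iters=20):
    """LBG codebook sketch: data is a list of vectors, k a power of two."""
    dim = len(data[0])
    # Start from the global centroid of all training vectors.
    code = [[sum(v[d] for v in data) / len(data) for d in range(dim)]]
    while len(code) < k:
        # Split every centroid into a (1+eps) and a (1-eps) copy...
        code = [[c[d] * (1 + s) for d in range(dim)]
                for c in code for s in (eps, -eps)]
        # ...then refine with a few Lloyd (k-means) passes.
        for _ in range(iters):
            clusters = [[] for _ in code]
            for v in data:
                j = min(range(len(code)),
                        key=lambda i: sum((v[d] - code[i][d]) ** 2
                                          for d in range(dim)))
                clusters[j].append(v)
            code = [[sum(v[d] for v in cl) / len(cl) for d in range(dim)] if cl else c
                    for cl, c in zip(clusters, code)]
    return code

# Two well-separated 2-D clusters -> one codeword should land near each.
data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
codebook = lbg(data, 2)
```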
LEGEND: Graph legend.
LEGEND(string1,string2,string3,...) puts a legend on the current plot using the specified strings as labels. LEGEND works on line graphs, bar graphs, pie graphs, ribbon plots, etc. You can label any solid-colored patch or surface object. The fontsize and fontname for the legend strings match the axes fontsize and fontname.
LEGEND(H,string1,string2,string3,...) puts a legend on the plot containing the handles in the vector H, using the specified strings as labels for the corresponding handles.
LEGEND(M), where M is a string matrix or cell array of strings, and LEGEND(H,M), where H is a vector of handles to lines and patches, also work.
LEGEND(AX,...) puts a legend on the axes with handle AX.
LEGEND OFF removes the legend from the current axes.
LEGEND(AX,'off') removes the legend from the axis AX.
LEGEND HIDE makes the legend invisible.
LEGEND(AX,'hide') makes the legend on axis AX invisible.
LEGEND SHOW makes the legend visible.
LEGEND(AX,'show') makes the legend on axis AX visible.
LEGEND BOXOFF sets the appdata property legendboxon to 'off', making the legend background box invisible when the legend is visible.
LEGEND(AX,'boxoff') sets the appdata property legendboxon to 'off' for axis AX, making the legend background box invisible when the legend is visible.
LEGEND BOXON sets the appdata property legendboxon to 'on', making the legend background box visible when the legend is visible.
LEGEND(AX,'boxon') sets the appdata property legendboxon to 'on' for axis AX, making the legend background box visible when the legend is visible.
LEGH = LEGEND returns the handle to the legend on the current axes, or empty if none exists.
LEGEND with no arguments refreshes all the legends in the current figure (if any). LEGEND(LEGH) refreshes the specified legend.
LEGEND(...,Pos) places the legend in the specified location:
0 = Automatic 'best' placement (least conflict with data)
1 = Upper right-hand corner (default)
2 = Upper left-hand corner
3 = Lower left-hand corner
4 = Lower right-hand corner
-1 = To the right of the plot
To move the legend, press the left mouse button on the legend and drag to the desired location. Double clicking on a label allows you to edit the label.
[LEGH,OBJH,OUTH,OUTM] = LEGEND(...) returns a handle LEGH to the legend axes; a vector OBJH containing handles for the text, lines, and patches in the legend; a vector OUTH of handles to the lines and patches in the plot; and a cell array OUTM containing the text in the legend.
LEGEND will try to install a ResizeFcn on the figure if it hasn't been defined before. This resize function will try to keep the legend the same size.
Examples:
x = 0:.2:12;
plot(x,bessel(1,x),x,bessel(2,x),x,bessel(3,x));
legend('First','Second','Third');
legend('First','Second','Third',-1)
b = bar(rand(10,5),'stacked'); colormap(summer); hold on
x = plot(1:10,5*rand(10,1),'marker','square','markersize',12,...
    'markeredgecolor','y','markerfacecolor',[.6 0 .6],...
    'linestyle','-','color','r','linewidth',2);
hold off
legend([b,x],'Carrots','Peas','Peppers','Green Beans',...
    'Cucumbers','Eggplant')
Speaker Recognition: Testing Stage
Input:
testdir : string name of the directory containing all test sound files
n : number of test files in testdir
code : codebooks of all trained speakers
Note: the sound files in testdir are supposed to be named s1.wav, s2.wav, ..., sn.wav
Example:
>> test('C:\data\amintest\', 8, code);
MFCC: Compute mel-frequency cepstrum coefficients.
Inputs: s contains the signal to analyze
        fs is the sampling rate of the signal
Output: r contains the transformed signal
Speaker Recognition: Training Stage
Input:
traindir : string name of the directory containing all training sound files
n : number of training files in traindir
Output:
code : trained VQ codebooks, code{i} for the i-th speaker
Example:
>> code = train('C:\data\amintrain\', 8);
DISTEU: Pairwise Euclidean distances between the columns of two matrices.
Input:
x, y : two matrices, each of whose columns is a data vector
Output:
d : element d(i,j) is the Euclidean distance between the column vectors X(:,i) and Y(:,j)
Note: the Euclidean distance D between two vectors X and Y is:
D = sum((x-y).^2).^0.5
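The note above corresponds directly to this pure-Python sketch (illustrative only; each matrix is represented column-wise as a list of vectors):

```python
import math

def disteu(x, y):
    """d[i][j] = Euclidean distance between column vectors x[:,i] and y[:,j].
    Here x and y are lists of columns, each column a list of numbers."""
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, yj))) for yj in y]
            for xi in x]

# Two columns in x, one in y: d[1][0] is the 3-4-5 right-triangle distance.
d = disteu([[0, 0], [3, 4]], [[0, 0]])
```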
ZEROS: Zeros array. ZEROS(N) is an N-by-N matrix of zeros. ZEROS(M,N) or ZEROS([M,N]) is an M-by-N matrix of zeros. ZEROS(M,N,P,...) or ZEROS([M N P ...]) is an M-by-N-by-P-by-... array of zeros. ZEROS(SIZE(A)) is the same size as A and all zeros.
CEIL: Round towards plus infinity. CEIL(X) rounds the elements of X to the nearest integers towards infinity.
MIN: Smallest component. For vectors, MIN(X) is the smallest element in X. For matrices, MIN(X) is a row vector containing the minimum element from each column. For N-D arrays, MIN(X) operates along the first non-singleton dimension.
[Y,I] = MIN(X) returns the indices of the minimum values in vector I. If the values along the first non-singleton dimension contain more than one minimal element, the index of the first one is returned.
MIN(X,Y) returns an array the same size as X and Y with the smallest elements taken from X or Y. Either one can be a scalar.
[Y,I] = MIN(X,[],DIM) operates along the dimension DIM.
When X is complex, the magnitude MIN(ABS(X)) is used, and the angle ANGLE(X) is ignored. NaN's are ignored when computing the minimum.
SPARSE: Create sparse matrix.
S = SPARSE(X) converts a sparse or full matrix to sparse form by squeezing out any zero elements.
S = SPARSE(i,j,s,m,n,nzmax) uses the rows of [i,j,s] to generate an m-by-n sparse matrix with space allocated for nzmax nonzeros. The two integer index vectors, i and j, and the real or complex entries vector, s, all have the same length, nnz, which is the number of nonzeros in the resulting sparse matrix S. Any elements of s which have duplicate values of i and j are added together.
There are several simplifications of this six-argument call.
S = SPARSE(i,j,s,m,n) uses nzmax = length(s).
S = SPARSE(i,j,s) uses m = max(i) and n = max(j).
S = SPARSE(m,n) abbreviates SPARSE([],[],[],m,n,0). This generates the ultimate sparse matrix, an m-by-n all-zero matrix.
The argument s and one of the arguments i or j may be scalars, in which case they are expanded so that the first three arguments all have the same length. For example, this dissects and then reassembles a sparse matrix:
[i,j,s] = find(S);
[m,n] = size(S);
S = sparse(i,j,s,m,n);
So does this, if the last row and column have nonzero entries:
[i,j,s] = find(S);
S = sparse(i,j,s);
All of MATLAB's built-in arithmetic, logical and indexing operations can be applied to sparse matrices, or to mixtures of sparse and full matrices. Operations on sparse matrices return sparse matrices and operations on full matrices return full matrices. In most cases, operations on mixtures of sparse and full matrices return full matrices. The exceptions include situations where the result of a mixed operation is structurally sparse, e.g. A .* S is at least as sparse as S. Some operations, such as S >= 0, generate 'Big Sparse', or 'BS', matrices - matrices with sparse storage organization but few zero elements.
DCT: Discrete cosine transform. Y = DCT(X) returns the discrete cosine transform of X. The vector Y is the same size as X and contains the discrete cosine transform coefficients. Y = DCT(X,N) pads or truncates the vector X to length N before transforming. If X is a matrix, the DCT operation is applied to each column. This transform can be inverted using IDCT.
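MATLAB's dct computes the orthonormal DCT-II, so the transform preserves the energy (sum of squares) of the signal. A pure-Python sketch of the definition (a direct O(N^2) evaluation for illustration only; zero-based n and k replace MATLAB's 1-based indices):

```python
import math

def dct(x):
    """Orthonormal DCT-II: y[k] = w(k) * sum_n x[n]*cos(pi*(2n+1)*k/(2N)),
    with w(0) = sqrt(1/N) and w(k) = sqrt(2/N) otherwise (zero-based)."""
    N = len(x)
    y = []
    for k in range(N):
        w = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        y.append(w * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                         for n in range(N)))
    return y

x = [1.0, 2.0, 3.0, 4.0]
y = dct(x)  # y[0] = sqrt(1/4) * (1+2+3+4) = 5.0; energy of y equals energy of x
```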
MEAN: Average or mean value. For vectors, MEAN(X) is the mean value of the elements in X. For matrices, MEAN(X) is a row vector containing the mean value of each column. For N-D arrays, MEAN(X) is the mean value of the elements along the first non-singleton dimension of X. MEAN(X,DIM) takes the mean along the dimension DIM of X.
Example: If X = [0 1 2; 3 4 5]
then mean(X,1) is [1.5 2.5 3.5] and mean(X,2) is [1; 4].
Disadvantages of MATLAB:
MATLAB has two principal disadvantages. The first is that it is an interpreted language and therefore can execute more slowly than compiled languages. This problem can be mitigated by properly structuring the MATLAB program and by using the MATLAB compiler to compile the final MATLAB program before distribution and general use.
The second disadvantage is cost: a full copy of MATLAB is 5 to 10 times more expensive than a conventional C or Fortran compiler.
CONCLUSION
Speaker recognition is the process of automatically recognizing who is speaking on the basis of individual information included in speech waves. This technique makes it possible to use the speaker's voice to verify their identity and to control access to services such as voice dialing, banking by telephone, telephone shopping, etc.
This project dealt with a speaker independent system, which is developed to operate for any speaker of a particular type (e.g. American English). Such systems are the most difficult to develop and the most expensive, and their accuracy is lower than that of speaker dependent systems; however, they are more flexible.
By applying the procedure described above, a set of mel-frequency cepstrum coefficients is computed for each speech frame of around 30 msec, with overlap. These coefficients are the result of a cosine transform of the logarithm of the short-term power spectrum expressed on a mel-frequency scale, and the set is called an acoustic vector. Each input utterance is therefore transformed into a sequence of acoustic vectors. This project also discussed how those acoustic vectors can be used to represent and recognize the voice characteristic of the speaker.
BIBLIOGRAPHY
L.R. Rabiner and B.H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, N.J., 1993.
L.R. Rabiner and R.W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, Englewood Cliffs, N.J., 1978.
Y. Linde, A. Buzo and R. Gray, "An algorithm for vector quantizer design", IEEE Transactions on Communications, Vol. 28, pp. 84-95, 1980.
S. Furui, "Speaker independent isolated word recognition using dynamic features of speech spectrum", IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-34, No. 1, pp. 52-59, February 1986.
S. Furui, "An overview of speaker recognition technology", ESCA Workshop on Automatic Speaker Recognition, Identification and Verification, pp. 1-9, 1994.
F.K. Soong, A.E. Rosenberg and B.H. Juang, "A vector quantisation approach to speaker recognition", AT&T Technical Journal, Vol. 66-2, pp. 14-26, March 1987.
comp.speech Frequently Asked Questions WWW site, http://svr-www.eng.cam.ac.uk/comp.speech/