


Machine-Learning Based Classification Of Speech And Music

Khan, MKS; Al-Khatib, WG

SPRINGER, MULTIMEDIA SYSTEMS; pp: 55-67; Vol: 12

King Fahd University of Petroleum & Minerals

http://www.kfupm.edu.sa

Summary

The need to classify audio into categories such as speech or music is an important aspect of many multimedia document retrieval systems. In this paper, we investigate audio features that have not previously been used in music-speech classification, such as the mean and variance of the discrete wavelet transform, the variance of Mel-frequency cepstral coefficients, the root mean square of a lowpass signal, and the difference between the maximum and minimum zero-crossings. We then apply fuzzy C-means clustering to the problem of selecting a viable set of features that enables better classification accuracy. Three classification frameworks have been studied: Multi-Layer Perceptron (MLP) neural networks, Radial Basis Function (RBF) neural networks, and Hidden Markov Models (HMMs); the results of each framework have been reported and compared. Our extensive experimentation has identified a subset of features that contributes most to accurate classification and has shown that MLP networks are the most suitable classification framework for the problem at hand.
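Two of the time-domain features named in the summary, the root mean square of a lowpass signal and the difference of the maximum and minimum zero-crossing counts, can be sketched in pure Python. This is a minimal illustration, not the paper's implementation; the frame length, hop size, and the use of a simple moving average as the lowpass filter are assumptions made here for clarity.

```python
import math

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def zero_crossings(frame):
    """Count sign changes within one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)

def lowpass_rms(x, k=8):
    """RMS of a crudely lowpass-filtered signal (k-point moving average)."""
    smoothed = [sum(x[i:i + k]) / k for i in range(len(x) - k + 1)]
    return math.sqrt(sum(s * s for s in smoothed) / len(smoothed))

def zcr_spread(x, frame_len=256, hop=128):
    """Difference of the maximum and minimum per-frame zero-crossing counts."""
    counts = [zero_crossings(f) for f in frame_signal(x, frame_len, hop)]
    return max(counts) - min(counts)
```

The intuition behind the zero-crossing spread is that speech alternates between voiced segments (few crossings) and unvoiced fricatives (many crossings), so its per-frame counts vary widely, whereas music tends to produce a narrower spread.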

References:
1. BEIERHOLM T, 2004, P 17 INT C PATT REC, V2, P379
2. BEZDEK JC, 1981, PATTERN RECOGNITION
3. BUGATTI A, 2002, EURASIP J APPL SIG P, V4, P372
4. CAREY MJ, 1999, P IEEE INT C AC SPEE, V1, P149
5. CHOU W, 2001, P ICASSP 01 SALT LAK, V2, P865
6. CYBENKO G, 1989, MATH CONTROL SIGNAL, V2, P303
7. DELFS C, 1998, P INT C AC SPEECH SI, V3, P1569
8. DUDA RO, 2001, PATTERN CLASSIFICATI
9. ELMALEH K, 2000, P ICASSP2000 JUN, V4, P2445
10. HARB H, 2001, P 7 INT C DISTR MULT, P257
11. HARB H, 2003, P 7 INT C SIGN PROC, V2, P125
12. HOYT JD, 1994, P INT C NEUR NETW IE, V7, P4493

Copyright: King Fahd University of Petroleum & Minerals; http://www.kfupm.edu.sa


13. KARNEBACK S, 2001, P EUR C SPEECH COMM, P1891
14. KHAN MKS, 2005, THESIS KING FAHD U P
15. LAMBROU T, 1998, P INT C AC SPEECH SI, V6, P3621
16. LI DG, 2001, PATTERN RECOGN LETT, V22, P533
17. LIPP OV, 2004, EMOTION, V4, P233, DOI 10.1037/1528-3542.4.3.233
18. LU L, 2001, P 9 ACM INT C MULT, P203
19. LU L, 2002, IEEE T SPEECH AUDI P, V10, P504, DOI 10.1109/TSA.2002.804546
20. LU L, 2003, ACM MULTIMEDIA SYSTE, V8, P482
21. MAMMONE RJ, 1994, ARTIFICIAL NEURAL NE
22. PANAGIOTAKIS C, 2004, IEEE T MULTIMEDIA
23. PARRIS ES, 1999, P EUROSPEECH 99 BUD, P2191
24. PELTONEN V, 2001, THESIS TAMPERE U TEC
25. PINQUIER J, 2002, P ICSLP 02, V3, P2005
26. PINQUIER J, 2002, P INT C AC SPEECH SI, V4, P4164
27. PINQUIER J, 2003, P INT C AC SPEECH SI, V2, P17
28. RABINER LR, 1986, IEEE ASSP MAG, V3, P4
29. SAAD EM, 2002, P 19 NAT RAD SCI C N, P208
30. SAUNDERS J, 1996, P INT C AC SPEECH SI, V2, P993
31. SCHEIRER E, 1997, P ICASSP 97, V2, P1331
32. SHAO X, 2003, P 4 INT C INF COMM S, V3, P1823
33. SRINIVASAN SH, 2004, P INT C AC SPEECH SI, V4, P321
34. TZANETAKIS G, 1999, EUROMICRO WORKSH MUS, V2, P61
35. TZANETAKIS G, 2001, P INT S MUS INF RETR, P205
36. TZANETAKIS G, 2002, IEEE T SPEECH AUDI P, V10, P293
37. WANG WQ, 2003, P INF COMM SIGN PROC, V3, P1325

For pre-prints please write to: [email protected]
