Emotion Analysis of Japanese Drama Using Speech Recognition Technology

Seminar hosted by the Interdisciplinary Convergence Research Team, National Research Foundation of Korea


(Analysis of Japanese Drama using Emotional Speech Recognition)

June 10, 2011 (Fri.)

Sung-Ho Kim (김성호)

Department of Electronic Engineering, Yeungnam University

Outline

• Introduction to emotional speech recognition
• Related works and current status
• Standard emotional speech recognition system
  - MFCC features
  - Classification by SVM
• Experimental results
• Conclusions

Introduction

Speech: a sequence of elementary acoustic symbols

Information carried in speech: gender, age, accent, speaker identity, health, and emotion

Applications of emotional speech recognition (an area that has recently received increasing attention):
• Convergence project: supporting quantitative analysis of anti-Korean sentiment
• Human-robot interaction
• Smart call centers
• Computer tutoring systems

Related Works (2007-2008)

[J. Sidorova, 2007]
• Features: pitch, intensity, formants, harmonicity (116 dimensions)
• Classifier: MLP (neural network)
• Emotions: 7 (neutral, angry, disgusted, fear, joy, surprise, sad)
• Test DB: EMO-DB (German)
• Recognition rate: 80.67%

[T. Danisman, 2008]
• Features: MFCC, energy
• Classifier: SVM (support vector machine)
• Emotions: 5 (angry, happy, neutral, sad, surprise)
• Test DB: DES-DB (Danish)
• Recognition rate: 67.6%

Related Works (2009-2011)

[M. Vondra, 2009]
• Features: F0, intensity, MFCC
• Classifier: GMM (Gaussian mixture model)
• Emotions: 7
• Test DB: EMO-DB (German)
• Recognition rate: 71.63%

[M. El Ayadi, 2011] (survey)
• Features: the best feature set is still unknown
• Classifiers: HMM, GMM, SVM, neural networks, k-NN

Current performance
• Speaker-independent: around 50%
• Speaker-dependent: over 90%

Standard Method of Emotional Speech Recognition

Key algorithms
• Feature extractor: MFCC
• Classifier: SVM

(Block diagram: training and testing acoustic files → MFCC feature extraction → SVM or nearest class mean classifier → recognized emotions)

Feature for Emotional Speech Recognition

Mel Frequency Cepstral Coefficients (MFCCs): convey the short-time energy distribution in the frequency domain

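Computation: signal → Fourier transform (frequency domain) → map the power spectrum onto the mel scale → take the log of the power at each mel frequency → take the discrete cosine transform → final MFCCs: amplitudes of the resulting spectrum

Mel scale: a frequency scale whose intervals correspond to pitch differences that humans perceive as equal

(Figure: mel scale plotted against the Hertz scale)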
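The MFCC steps above can be sketched with NumPy as follows. This is a minimal illustration, not the presenter's implementation. Assumptions: non-overlapping 16 ms frames (echoing the 16 ms/feature figure in Exp. 4), a Hamming window, 26 triangular mel filters, the widely used mel formula mel = 2595·log10(1 + f/700), and 20 retained coefficients (echoing the 20-dim features in Exp. 4). Practical front ends usually also overlap frames and apply pre-emphasis.

```python
import numpy as np


def hz_to_mel(f):
    # Widely used mel formula (assumed): equal mel steps ~ equal perceived pitch steps
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters whose centers are evenly spaced on the mel scale (0 Hz .. sr/2)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):   # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def dct_ii(x, n_out):
    # Orthonormal DCT-II along the last axis, keeping the first n_out coefficients
    n_in = x.shape[-1]
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    basis[0] *= np.sqrt(1.0 / n_in)
    basis[1:] *= np.sqrt(2.0 / n_in)
    return x @ basis.T


def mfcc(signal, sr, frame_ms=16, n_filters=26, n_coeffs=20):
    # 1) Cut the signal into non-overlapping frames and apply a Hamming window
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = np.asarray(signal[: n_frames * frame_len], dtype=float)
    frames = frames.reshape(n_frames, frame_len) * np.hamming(frame_len)
    # 2) Fourier transform -> power spectrum of each frame
    n_fft = int(2 ** np.ceil(np.log2(frame_len)))
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # 3) Map the power spectrum onto the mel scale
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sr).T
    # 4) Log of the power in each mel band (small floor avoids log(0))
    log_mel = np.log(mel_energy + 1e-10)
    # 5) Discrete cosine transform -> one row of MFCCs per frame
    return dct_ii(log_mel, n_coeffs)
```

For one second of audio at 16 kHz, each 16 ms frame is 256 samples, so `mfcc` returns 62 rows of 20 coefficients.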

Classifier: Support Vector Machine

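(Figure: two classes separated in feature space)
• Learning: finding the optimal (maximum-margin) classifier
• Recognition: performed by the learned classifier
• Example of a linear decision boundary: y = ax + b

The original SVM is a binary classifier; a multiclass SVM combines multiple SVMs and decides by voting.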
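To illustrate how binary SVMs can be combined by voting, here is a sketch of a one-vs-one scheme: one linear SVM is trained for every pair of emotions, and each pairwise SVM casts one vote at recognition time. The solver (a Pegasos-style stochastic subgradient method on the hinge loss), the regularization constant `lam`, the epoch count, and the class name `OneVsOneSVM` are illustrative assumptions; the slides do not state which SVM library, kernel, or multiclass scheme was used.

```python
import itertools

import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Binary linear SVM (labels +1/-1) via Pegasos-style subgradient descent."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])  # constant feature acts as the bias b
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)              # decreasing step size
            margin = y[i] * (Xb[i] @ w)
            w *= 1.0 - eta * lam               # shrink: gradient of the regularizer
            if margin < 1.0:                   # hinge loss active: move toward the sample
                w += eta * y[i] * Xb[i]
    return w


class OneVsOneSVM:
    """Multiclass SVM: one binary SVM per class pair, prediction by majority vote."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.models = {}
        for a, b in itertools.combinations(self.classes, 2):
            mask = (y == a) | (y == b)
            yy = np.where(y[mask] == a, 1.0, -1.0)
            self.models[(a, b)] = train_linear_svm(X[mask], yy)
        return self

    def predict(self, X):
        Xb = np.hstack([X, np.ones((len(X), 1))])
        index = {c: i for i, c in enumerate(self.classes)}
        votes = np.zeros((len(X), len(self.classes)), dtype=int)
        for (a, b), w in self.models.items():
            a_wins = Xb @ w >= 0
            votes[a_wins, index[a]] += 1
            votes[~a_wins, index[b]] += 1
        # Ties go to the class that comes first in sorted order
        return self.classes[np.argmax(votes, axis=1)]
```

With K emotions this trains K(K-1)/2 binary SVMs (21 for the 7 EMO-DB emotions); one-vs-rest, with K SVMs, is a common alternative.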

Classifier: Nearest Class Mean

(Figure: class means in feature space)
• Learning: finding the mean of each class
• Recognition: assigning a sample to the class with the nearest mean
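A nearest class mean classifier needs only a few lines. In this sketch, Euclidean distance is an assumption (the slides do not name the distance measure), and the class name `NearestClassMean` is hypothetical.

```python
import numpy as np


class NearestClassMean:
    """Learning: store the mean feature vector of each class.
    Recognition: pick the class whose mean is closest (Euclidean distance)."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.means = np.stack([X[y == c].mean(axis=0) for c in self.classes])
        return self

    def predict(self, X):
        # Squared Euclidean distance from every sample to every class mean
        d2 = ((X[:, None, :] - self.means[None, :, :]) ** 2).sum(axis=2)
        return self.classes[np.argmin(d2, axis=1)]
```

For example, after fitting on a small cluster labeled "neutral" near the origin and one labeled "anger" near (5, 5), the point (0.1, 0.3) is recognized as "neutral" and (4.8, 5.1) as "anger". Learning reduces to one mean per class, so there is nothing to tune.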

Exp. 1: EMO Database

EMO-DB
• 7 emotions (happy, angry, anxious/fearful, bored, disgusted, sad, neutral)
• 10 different sentences
• 10 speakers (5 male, 5 female)
• Language: German

(Figure: recognition results)

Recognition using Nearest Class Mean Classifier

Learning: 150 (randomly selected), test: 150

Recognition rate: 47.0%

Recognition using SVM

Recognition rate: 38.0%

The nearest class mean classifier outperformed the SVM.
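The evaluation in Exp. 1 (randomly selecting a learning set, testing on held-out samples, and reporting the fraction recognized correctly) can be sketched with the hypothetical helpers below. The confusion matrix is an addition for inspecting which emotions get confused with each other; it is not part of the reported results.

```python
import numpy as np


def random_split(n_samples, n_train, seed=0):
    """Randomly choose n_train indices for learning; the rest are for testing."""
    perm = np.random.default_rng(seed).permutation(n_samples)
    return perm[:n_train], perm[n_train:]


def recognition_rate(y_true, y_pred):
    """Fraction of test samples whose recognized emotion matches the label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def confusion_matrix(y_true, y_pred, classes):
    """Rows: true emotion; columns: recognized emotion."""
    index = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    return cm
```

With 300 samples, `random_split(300, 150)` yields disjoint 150/150 index sets like the learning/test split above; the diagonal of the confusion matrix holds the correct recognitions, so its trace divided by the number of test samples equals the recognition rate.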

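Exp. 2: Training on German, testing on Japanese

(Figure: recognition results; labels shown: surprise, sad, happy)

Recognition is unstable because of the differences between German and Japanese.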

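Exp. 3: Training on Japanese, testing on Japanese

(Figure: sample clips labeled 'neutral', 'anger', 'happy', 'surprise')

Database: 5 emotions, 57 speech clips (only episode 4 of "Saka no Ue no Kumo" (Clouds Above the Hill) was used)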

Recognition results: nearest class mean classifier

(Figure: recognition results; labels shown: surprise, neutral)

Recognition results: SVM

The SVM performed better.

(Figure: recognition results; labels shown: surprise, neutral)

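Exp. 4: Extended experiment

• Training: 158 speech clips (episodes 1-4, 2 s per clip), 26,635 × 20-dim features
• 10 repetitions (cross-validation by random sampling of 5,000 features, 16 ms per feature)
• Average recognition rate: 92.85%

(Figure: recognition results; labels shown: surprise, neutral)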
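The slides describe the Exp. 4 protocol only briefly. The sketch below shows one plausible reading: in each of 10 repetitions, 5,000 feature vectors are drawn at random, split into learning and test halves, and the recognition rates are averaged. The 50/50 split, the function names, and the `fit_predict` callback interface are assumptions; `nearest_mean_fit_predict` is included only as an example callback.

```python
import numpy as np


def repeated_random_subsampling(X, y, fit_predict, n_repeats=10, n_samples=5000,
                                train_fraction=0.5, seed=0):
    """Average recognition rate over repeated random learning/test draws.

    fit_predict(X_train, y_train, X_test) -> predicted labels (hypothetical interface).
    """
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(n_repeats):
        # Draw feature vectors without replacement; the draw order is already random
        idx = rng.choice(len(X), size=min(n_samples, len(X)), replace=False)
        n_train = int(len(idx) * train_fraction)
        train, test = idx[:n_train], idx[n_train:]
        y_pred = fit_predict(X[train], y[train], X[test])
        rates.append(float(np.mean(y_pred == y[test])))
    return float(np.mean(rates)), rates


def nearest_mean_fit_predict(X_train, y_train, X_test):
    """Example callback: nearest class mean with Euclidean distance."""
    classes = np.unique(y_train)
    means = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    d2 = ((X_test[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d2, axis=1)]
```

Because every sample here is a single 16 ms frame, neighboring frames from the same clip can land in both the learning and test sets, which tends to make frame-level rates optimistic compared with splitting by clip.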

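Analysis of the full audio of episode 1

Childhood of the three protagonists: much narration, frequent background music.

(Figure: recognized emotions over the episode; labels shown: surprise, happy, neutral)

Scene annotations: fireworks, parting, English class

Admiration at a cruiser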

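School days of the three protagonists: much narration, frequent background music.

(Figure: recognized emotions; label shown: surprise)

Scene annotation: naval training

Just before the First Sino-Japanese War: much narration, frequent background music.

(Figure: recognized emotions; label shown: surprise)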

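Father's death, flashback, narration

Officials' conversation about dispatching troops to Joseon

First Sino-Japanese War: much narration, frequent background music.

Land battles, naval battles, war correspondent's narration

Talk of victory in the Sino-Japanese War, visit to the United States: much narration.

Introduction of the assassination of Queen Min (Empress Myeongseong) (surprise)

Ball in the United States, sightseeing at Niagara Falls (surprise)

Naval education (anger)

Death of a literary figure (sad)

Funeral (sad)

Departure, parting (sad)

Battle (anger)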

Conclusions

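Emotional speech recognition
• Developed MFCC feature extraction and recognizers (SVM, nearest class mean classifier)
• Recognition of 7 German emotions: at most 47%
• Training on German and recognizing Japanese emotions: very poor performance
• Training on and recognizing 5 Japanese emotions: best performance of 92.85%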

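Full-audio analysis of "Saka no Ue no Kumo"
• Applied to the complete audio of episodes 1-9 and analyzed statistically: in certain scenes the recognized emotions showed some correlation, but many segments were meaningless because of background music, narration, and the like.
• It was difficult to find anything phonetically distinctive related to anti-Korean sentiment.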
