
Page 1:

Aspiring Minds

www.aspiringminds.com

Spoken English Evaluation: Machine Learning with Crowd Intelligence

Varun Aggarwal

Presented at KDD 2015 and ACL 2015

Page 2:

Problem Statement & Motivation

Importance of spoken English

The English language has a high socio-economic impact: people who speak it fluently are reported to earn 30-50% more than peers who do not.

Companies, training organizations, and individuals need a scalable way to grade spoken English.

Problem Statement

Scalable grading of spontaneous English speech, as good as experts.

Page 3:

Why are automated methods not accurate?

Speaker-independent speech recognition for spontaneous speech is a hard problem!

Page 4:

Proposed system architecture

Crowdsourcing helps us get accurate transcriptions. Crowd grades also help improve accuracy!

[Architecture diagram; labelled components include Crowd Grades and FA Features]
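A minimal sketch of the data flow the diagram implies. Every helper below is a hypothetical stub standing in for a real component (crowd platform, forced aligner, trained model); none of these names come from the actual system.

```python
# Hypothetical stubs for the pipeline components; names and
# signatures are illustrative, not the system's actual API.
def crowd_transcribe(audio):
    """Crowd workers produce an accurate transcription of the audio."""
    return "the quick brown fox"

def fa_features(audio, transcript):
    """Force-align the audio against the transcript and derive features."""
    return [0.8, 0.1]  # e.g. rate of speech, pause statistics

def crowd_grades(audio):
    """Crowd workers also rate the sample directly."""
    return [4.0]  # composite crowd grade

def score(audio, model):
    transcript = crowd_transcribe(audio)
    features = fa_features(audio, transcript) + crowd_grades(audio)
    return model(features)  # supervised model maps features to a final grade

print(score(b"raw-audio-bytes", model=lambda f: sum(f) / len(f)))
```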

Page 5:

Crowdsourcing task

Page 6:

Crowdsourcing task

Worker quality control

• Each worker is assigned a risk level that reflects the quality of their past work.

• This risk level determines how many gold-standard tasks the worker receives and when they receive them (one plausible policy is sketched below).
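The slides do not spell out the scheduling rule, so the sketch below shows one plausible policy: riskier workers are probed with gold-standard tasks more often, and accuracy on those tasks moves them between risk levels. All rates and thresholds are illustrative assumptions, not the system's actual values.

```python
import random

# Hypothetical gold-task injection rates per risk level (assumed values).
GOLD_RATE = {"low": 0.05, "medium": 0.15, "high": 0.40}

class Worker:
    def __init__(self):
        self.risk = "high"  # new workers start untrusted
        self.gold_total = 0
        self.gold_correct = 0

    def should_get_gold_task(self):
        # Riskier workers are probed with gold-standard tasks more often.
        return random.random() < GOLD_RATE[self.risk]

    def record_gold_result(self, correct):
        # Update the worker's risk level from their gold-task accuracy.
        self.gold_total += 1
        self.gold_correct += correct
        accuracy = self.gold_correct / self.gold_total
        if self.gold_total >= 5:  # wait for enough evidence
            if accuracy >= 0.9:
                self.risk = "low"
            elif accuracy >= 0.7:
                self.risk = "medium"
            else:
                self.risk = "high"

w = Worker()
for answer_ok in [True, True, True, True, True]:
    w.record_gold_result(answer_ok)
print(w.risk)  # -> "low" after five correct gold tasks
```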

Page 7:

Supervised learning setup

Experiment details

• Sample size: 566 (319 from India, 247 from the Philippines)

Expert grading

• Two expert raters.
• Overall score based on pronunciation/fluency and content organization/grammar.
• Inter-rater correlation ~0.8.

The learning task

• Modelling was done separately for the Indian and Philippine sets.
• Linear ridge regression, neural networks, and SVM regression with different kernels were used to build the models (a minimal sketch follows).
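A compact sketch of the modelling step using scikit-learn. The feature matrix and "expert scores" are synthetic stand-ins, the hyperparameters are library defaults rather than tuned values, and neural networks are omitted for brevity.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_predict
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
X = rng.normal(size=(566, 20))  # stand-in feature matrix (FA + NLP + CG)
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=566)  # stand-in expert scores

# Linear ridge regression and SVM regression with different kernels,
# as on the slide; hyperparameters here are scikit-learn defaults.
models = {
    "ridge": Ridge(alpha=1.0),
    "svr_linear": SVR(kernel="linear"),
    "svr_rbf": SVR(kernel="rbf"),
}
for name, model in models.items():
    pred = cross_val_predict(model, X, y, cv=5)  # 5-fold cross-validation
    r, _ = pearsonr(y, pred)
    print(f"{name}: correlation with expert grades = {r:.2f}")
```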

Page 8:

Case study

• Studied the deployment of the proposed algorithm in the Philippines.

• The hiring event had 500 applicants for the role of a customer support executive. The scoring algorithm was tested on a subset of 150 of them.

• An internal expert graded each candidate's speech as hireable or not-hireable.

Page 9:

Features used

We use three classes of features:

• Forced Alignment (FA) features
  • The speech sample is force-aligned against the crowdsourced transcription.
  • Features such as rate of speech, position and length of pauses, log likelihood of recognition, posterior probability, hesitations, and repetitions are derived.

• Natural Language Processing (NLP) features
  • Surface-level features: number of words, complexity or difficulty of words, and the number of common words used.
  • Semantic features: coherence of the text, context of the words spoken, sentiment of the text, and grammatical correctness.

• Crowd Grades (CG)
  • The crowd provides scores on pronunciation, fluency, content organization, and grammar.
  • These grades are combined into a composite score (one possible combination is sketched below).
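The slide does not give the rule for combining the four crowd grades into the composite score; a weighted average over dimensions and raters is one plausible sketch. The equal weights and the averaging below are assumptions for illustration only.

```python
# Hypothetical combination of per-dimension crowd grades into one
# composite score; equal weights are an assumption, not the actual rule.
WEIGHTS = {"pronunciation": 0.25, "fluency": 0.25,
           "content_organization": 0.25, "grammar": 0.25}

def composite_crowd_grade(grades_per_rater):
    """grades_per_rater: list of dicts, one per crowd rater,
    each mapping the four dimensions to a 1-5 grade."""
    per_rater = [
        sum(WEIGHTS[dim] * g[dim] for dim in WEIGHTS)  # weighted sum per rater
        for g in grades_per_rater
    ]
    return sum(per_rater) / len(per_rater)  # average over raters

print(composite_crowd_grade([
    {"pronunciation": 4, "fluency": 3, "content_organization": 4, "grammar": 5},
    {"pronunciation": 5, "fluency": 4, "content_organization": 3, "grammar": 4},
]))  # -> 4.0
```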

Page 10:

Experiment and Results

Crowdsourced transcriptions + crowd grades outperform all other methods.

Accuracy nears inter-expert agreement (~0.8).
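Here "accuracy" is the correlation between machine scores and expert grades, benchmarked against the ~0.8 inter-expert agreement. A toy sketch of that comparison with made-up score vectors (the real evaluation uses the 566-sample dataset described earlier):

```python
from scipy.stats import pearsonr

# Made-up grades for five speech samples, purely for illustration.
expert_a = [3.0, 4.5, 2.0, 5.0, 3.5]
expert_b = [3.5, 4.0, 2.5, 5.0, 3.0]
machine = [3.2, 4.4, 2.1, 4.8, 3.6]

expert_mean = [(a + b) / 2 for a, b in zip(expert_a, expert_b)]
inter_expert, _ = pearsonr(expert_a, expert_b)      # the ~0.8 benchmark
machine_expert, _ = pearsonr(machine, expert_mean)  # the system's "accuracy"
print(f"inter-expert agreement:   {inter_expert:.2f}")
print(f"machine-expert agreement: {machine_expert:.2f}")
```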

Page 11:

Summing it up

• Svar provides an automated assessment of a candidate's pronunciation and fluency.

• Crowdsourcing, in addition to NLP features, renders reliable composite scores.

• Speech assessments can be made scalable, with accuracy nearly matching expert opinion.