
Page 1:

Aspiring Minds

www.aspiringminds.com

Spoken English Evaluation: Machine Learning with Crowd Intelligence

Varun Aggarwal

Presented at KDD 2015 and ACL 2015

Page 2:

Problem Statement & Motivation

Importance of spoken English

The English language has a high socio-economic impact: people who speak it fluently are reported to earn 30-50% more than peers who do not.

Companies, training organizations, and individuals need a scalable way to grade spoken English.

Problem Statement

Scalable grading of spontaneous English speech, as good as experts.

Page 3:

Why are automated methods not accurate?

Speaker-independent speech recognition for spontaneous speech is a hard problem!

Page 4:

Proposed system architecture

Crowdsourcing helps us get accurate transcriptions. Crowd grades also help improve accuracy!

[Architecture diagram; labelled components include Crowd Grades and FA Features]
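A minimal sketch of the data flow the diagram implies. Every helper below is a hypothetical stub standing in for a real component (crowd platform, forced aligner, trained model); none of these names come from the actual system.

```python
# Hypothetical stubs for the pipeline components; names and
# signatures are illustrative, not the system's actual API.
def crowd_transcribe(audio):
    """Crowd workers produce an accurate transcription of the audio."""
    return "the quick brown fox"

def fa_features(audio, transcript):
    """Force-align the audio against the transcript and derive features."""
    return [0.8, 0.1]  # e.g. rate of speech, pause statistics

def crowd_grades(audio):
    """Crowd workers also rate the sample directly."""
    return [4.0]  # composite crowd grade

def score(audio, model):
    transcript = crowd_transcribe(audio)
    features = fa_features(audio, transcript) + crowd_grades(audio)
    return model(features)  # supervised model maps features to a final grade

print(score(b"raw-audio-bytes", model=lambda f: sum(f) / len(f)))
```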

Page 5:

Crowdsourcing task

Page 6:

Crowdsourcing task

Worker quality control

• Each worker is assigned a risk level that reflects the quality of their past work.

• This risk level determines how many gold-standard tasks the worker receives and when they receive them (one plausible policy is sketched below).
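The slides do not spell out the scheduling rule, so the sketch below shows one plausible policy: riskier workers are probed with gold-standard tasks more often, and accuracy on those tasks moves them between risk levels. All rates and thresholds are illustrative assumptions, not the system's actual values.

```python
import random

# Hypothetical gold-task injection rates per risk level (assumed values).
GOLD_RATE = {"low": 0.05, "medium": 0.15, "high": 0.40}

class Worker:
    def __init__(self):
        self.risk = "high"  # new workers start untrusted
        self.gold_total = 0
        self.gold_correct = 0

    def should_get_gold_task(self):
        # Riskier workers are probed with gold-standard tasks more often.
        return random.random() < GOLD_RATE[self.risk]

    def record_gold_result(self, correct):
        # Update the worker's risk level from their gold-task accuracy.
        self.gold_total += 1
        self.gold_correct += correct
        accuracy = self.gold_correct / self.gold_total
        if self.gold_total >= 5:  # wait for enough evidence
            if accuracy >= 0.9:
                self.risk = "low"
            elif accuracy >= 0.7:
                self.risk = "medium"
            else:
                self.risk = "high"

w = Worker()
for answer_ok in [True, True, True, True, True]:
    w.record_gold_result(answer_ok)
print(w.risk)  # -> "low" after five correct gold tasks
```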

Page 7:

Supervised learning setup

Experiment details

• Sample size: 566 (319 from India, 247 from the Philippines)

Expert grading

• Two expert raters.
• Overall score based on pronunciation/fluency and content organization/grammar.
• Inter-rater correlation ~0.8.

The learning task

• Modelling was done separately for the Indian and Philippine sets.
• Linear ridge regression, neural networks, and SVM regression with different kernels were used to build the models (a minimal sketch follows).
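A compact sketch of the modelling step using scikit-learn. The feature matrix and "expert scores" are synthetic stand-ins, the hyperparameters are library defaults rather than tuned values, and neural networks are omitted for brevity.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_predict
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
X = rng.normal(size=(566, 20))  # stand-in feature matrix (FA + NLP + CG)
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=566)  # stand-in expert scores

# Linear ridge regression and SVM regression with different kernels,
# as on the slide; hyperparameters here are scikit-learn defaults.
models = {
    "ridge": Ridge(alpha=1.0),
    "svr_linear": SVR(kernel="linear"),
    "svr_rbf": SVR(kernel="rbf"),
}
for name, model in models.items():
    pred = cross_val_predict(model, X, y, cv=5)  # 5-fold cross-validation
    r, _ = pearsonr(y, pred)
    print(f"{name}: correlation with expert grades = {r:.2f}")
```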

Page 8:

Case study

• Studied the deployment of the proposed algorithm in the Philippines.

• The hiring event had 500 applicants for the role of a customer support executive. The scoring algorithm was tested on a subset of 150 of them.

• An internal expert graded each candidate's speech as hireable or not-hireable.

Page 9:

Features used

We use three classes of features:

• Forced Alignment (FA) features
  • The speech sample is force-aligned against the crowdsourced transcription.
  • Features such as rate of speech, position and length of pauses, log likelihood of recognition, posterior probability, hesitations, and repetitions are derived.

• Natural Language Processing (NLP) features
  • Surface-level features: number of words, complexity or difficulty of words, and the number of common words used.
  • Semantic features: coherence of the text, context of the words spoken, sentiment of the text, and grammatical correctness.

• Crowd Grades (CG)
  • The crowd provides scores on pronunciation, fluency, content organization, and grammar.
  • These grades are combined into a composite score (one possible combination is sketched below).
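The slide does not give the rule for combining the four crowd grades into the composite score; a weighted average over dimensions and raters is one plausible sketch. The equal weights and the averaging below are assumptions for illustration only.

```python
# Hypothetical combination of per-dimension crowd grades into one
# composite score; equal weights are an assumption, not the actual rule.
WEIGHTS = {"pronunciation": 0.25, "fluency": 0.25,
           "content_organization": 0.25, "grammar": 0.25}

def composite_crowd_grade(grades_per_rater):
    """grades_per_rater: list of dicts, one per crowd rater,
    each mapping the four dimensions to a 1-5 grade."""
    per_rater = [
        sum(WEIGHTS[dim] * g[dim] for dim in WEIGHTS)  # weighted sum per rater
        for g in grades_per_rater
    ]
    return sum(per_rater) / len(per_rater)  # average over raters

print(composite_crowd_grade([
    {"pronunciation": 4, "fluency": 3, "content_organization": 4, "grammar": 5},
    {"pronunciation": 5, "fluency": 4, "content_organization": 3, "grammar": 4},
]))  # -> 4.0
```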

Page 10:

Experiment and Results

Crowdsourced transcriptions + crowd grades outperform all other methods.

Accuracy nears inter-expert agreement (~0.8).
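Here "accuracy" is the correlation between machine scores and expert grades, benchmarked against the ~0.8 inter-expert agreement. A toy sketch of that comparison with made-up score vectors (the real evaluation uses the 566-sample dataset described earlier):

```python
from scipy.stats import pearsonr

# Made-up grades for five speech samples, purely for illustration.
expert_a = [3.0, 4.5, 2.0, 5.0, 3.5]
expert_b = [3.5, 4.0, 2.5, 5.0, 3.0]
machine = [3.2, 4.4, 2.1, 4.8, 3.6]

expert_mean = [(a + b) / 2 for a, b in zip(expert_a, expert_b)]
inter_expert, _ = pearsonr(expert_a, expert_b)      # the ~0.8 benchmark
machine_expert, _ = pearsonr(machine, expert_mean)  # the system's "accuracy"
print(f"inter-expert agreement:   {inter_expert:.2f}")
print(f"machine-expert agreement: {machine_expert:.2f}")
```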

Page 11:

Summing it up

• Svar provides an automated assessment of a candidate's pronunciation and fluency.

• Crowdsourcing, in addition to NLP features, renders reliable composite scores.

• Speech assessments can be made scalable, with accuracy nearly matching expert opinion.