QA Fest 2016. Roman Gorin. Introduction to recognition systems...
Kyiv 2016
The first testing festival in Ukraine
Introduction to Speech Recognition Software Testing
Roman Gorin
About me
• Senior Technical Leader – Testing @ Delphi LLC, http://udelphi.com
• 12+ years in Speech Recognition Testing
• 6+ years as QA Team Lead
• Main Product: Nuance Dragon Medical, http://www.nuance.com/for-healthcare/dragon-medical
• Telegram: https://telegram.me/DJ_ZX
• Facebook: rgorin.zx
What it is
Where used
• Nuance Dragon Family
  • Dragon Pro
  • Dragon Medical
  • Dragon for Mac
  • Dragon Anywhere
  • Etc.
• Windows Speech Recognition
• Google Voice Search
Where used
• Personal assistants
  • Siri
  • Cortana
  • Google Now
  • Facebook M, etc.
• Car systems
Where used
• Smart Home assistants
  • Amazon Echo
  • Google Home
  • Zenbo
  • Homer, etc.
• Automated Call Centers SW
• and more
Where used: ViV AI (unreleased)
Basic Principles
• Capture audio
• Separate speech from other types of sounds (esp. noise)
• Compare the speech audio with known text<->audio match patterns
• Analyze the result against the language-specific model
• Perform actions (type text, execute command) based on the collected data
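The "separate speech from other sounds" step can be sketched with a very simple energy-based voice activity detector. This is only an illustration: the frame length, the threshold and the synthetic signal are assumptions made for the example, and production engines use far more robust methods.

```python
import math

def frame_energies(samples, frame_len):
    """Split samples into non-overlapping frames and return the RMS energy of each."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies

def detect_speech(samples, frame_len=160, threshold=0.1):
    """Return one boolean per frame: True where RMS energy exceeds the threshold."""
    return [energy > threshold for energy in frame_energies(samples, frame_len)]

# Synthetic input (assumed 8 kHz sampling): two quiet frames,
# two frames of a 200 Hz tone standing in for speech, one quiet frame.
quiet = [0.01] * 160
loud = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
signal = quiet * 2 + loud * 2 + quiet

flags = detect_speech(signal)
print(flags)
```

A tester can use the same idea in reverse: feed recordings with known silent and noisy stretches and check that the product does not start recognizing on noise alone.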
Generic structure of how SR works
Main speech recognition models (based on Wikipedia)
• Hidden Markov models
• Dynamic time warping (DTW)-based speech recognition
• Neural networks
• Deep feedforward and recurrent neural networks
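Of the models listed above, dynamic time warping is the easiest to show in a few lines. The sketch below computes the textbook DTW distance between two one-dimensional sequences. The sequences are made-up stand-ins for audio features, and real recognizers compare multi-dimensional feature vectors rather than single numbers.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] also matches earlier b
                                 cost[i][j - 1],      # b[j-1] also matches earlier a
                                 cost[i - 1][j - 1])  # one-to-one step
    return cost[n][m]

template = [1, 2, 3, 2, 1]                # hypothetical reference pattern
slow = [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]     # same shape, spoken twice as slowly
other = [3, 1, 3, 1, 3]                   # a different pattern

print(dtw_distance(template, slow))
print(dtw_distance(template, other))
```

The slowed-down copy aligns with the template at no cost, while the different pattern does not. That is why DTW tolerates variations in speaking speed.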
Testing areas
• Engine and Language Modelling (usually on recognition server side)
• UI
• Hardware
• Deployment
• Adaptation
• Recognition and Text Editing
• Language specific
• etc.
Testing areas: Hardware
• Mobile HW
• Internal mic (notebooks/tablets)
• Noise cancelling mic
• Sound card and drivers compatibility
• System Requirements compliance
• HW Dependency
• Driver Dependency (WASAPI, DirectSound, ASIO, Kernel Streaming for Windows; ALSA, PulseAudio for Linux; Core Audio for Mac)
Testing areas: Hardware
• Mics and recorders (samples from nuance.com store)
• Special bundled HW for professional use
  • Nuance PowerMic
  • Philips SpeechMike
Testing areas: Deployment
• Platform
  • Client OS (Desktop/Mobile)
  • Server OS for Client app
  • Server OS for Cloud/Remote app
    • Azure Cloud
    • Amazon Cloud
    • Proprietary cloud hosts for server recognition (e.g. recognition servers for Siri, etc.)
• Support for virtualization platforms: VDI and App Virtualization (standalone recognition on remote access)
  • Citrix XenApp and XenDesktop / Thin and Thick clients
  • VMware Workstation and Horizon
  • Oracle VirtualBox
  • Microsoft Remote Desktop/Terminal Services
Testing areas: Adaptation
• Predefined language patterns
• Statistical models
A statistical language model is a probability distribution over sequences of words. Given such a sequence, say of length m, it assigns a probability $P(w_1, \ldots, w_m)$ to the whole sequence. Having a way to estimate the relative likelihood of different phrases is useful in many natural language processing applications. Language modeling is used in speech recognition, machine translation, part-of-speech tagging, parsing, handwriting recognition, information retrieval and other applications.
• “Part of speech” detection
• Sound-specific patterns
• Person-specific
  • How a person pronounces words and sounds
  • How a person constructs sentences
  • Pronunciation speed
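To make the statistical language model definition above concrete, here is a minimal bigram model with add-one smoothing. The tiny corpus, the function names and the smoothing choice are assumptions for illustration only. They are not how any particular product builds its model.

```python
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their left contexts over a list of sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
        vocab.update(tokens[1:])                 # everything that can be predicted
    return contexts, bigrams, vocab

def sentence_probability(sentence, contexts, bigrams, vocab, alpha=1.0):
    """P(w_1, ..., w_m) as a product of smoothed bigram probabilities."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= (bigrams[(prev, word)] + alpha) / (contexts[prev] + alpha * len(vocab))
    return p

# Hypothetical training text for the example.
corpus = ["write a letter", "write a note", "turn right"]
contexts, bigrams, vocab = train_bigram(corpus)

p_write = sentence_probability("write a letter", contexts, bigrams, vocab)
p_right = sentence_probability("right a letter", contexts, bigrams, vocab)
print(p_write > p_right)
```

"write" and "right" sound the same, so the audio alone cannot separate them. The model prefers the word order it has seen, and adaptation tests check that this preference follows the user's own texts.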
Testing areas: Recognition and Commands control
• Initial recognition tests
• Turn app into “listening mode”
• Basic commands (“what I can do”)
• Extended commands (app-type specific)
• Non-strict commands (pseudo-AI)
• Search commands
• 3rd party app-specific commands / 3rd party SW compatibility
• Dictating into the app's default text controls (if supported)
• Dictating into supported and unsupported 3rd party apps
• Transcribing prerecorded audio
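For dictation and prerecorded-audio tests, the recognized text has to be compared with a reference transcript somehow. One widely used measure is word error rate (WER). The slides do not name a metric, so treat this as one possible approach. The sample phrases are invented.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] = edit distance between the processed reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,                            # deletion
                            curr[j - 1] + 1,                        # insertion
                            prev[j - 1] + (ref_word != hyp_word)))  # substitution
        prev = curr
    return prev[-1] / len(ref)

expected = "open the file menu"    # what was dictated
recognized = "open a file menu"    # what the engine returned (made up)
print(word_error_rate(expected, recognized))
```

Tracking this number per build, per microphone or per accent turns "recognition got worse" into something a regression suite can assert on.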
Testing areas: Recognition and Text Editing (sample from PCWorld/Nuance)
Testing areas: Languages and Accents
• Different accents (UK English, US English, Australian English, etc.)
• Issues with speaking
• Language-specific sounds
  • Homophones (French)
  • Umlauts (German)
  • etc.
• Language-specific syntax (using commas, periods, exclamation marks, etc.)
• Words with similar or close pronunciation (fr. voux, voi, vu, etc.)
• Logographic scripts (Chinese, Japanese, etc.)
Testing areas: Other stuff
• Audio codecs
• Traffic consumption (for cloud or remote access apps)
• Memory and CPU consumption
• Response time and cancelling recognition
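Response time can be checked with a small timing harness around whatever call triggers recognition. In the sketch below `fake_recognize` is a hypothetical stand-in that just sleeps. A real test would call the product's own API or drive its UI instead.

```python
import statistics
import time

def measure_latency(recognize, audio_clips, repeats=3):
    """Call recognize(clip) repeatedly and return each call's duration in seconds."""
    latencies = []
    for clip in audio_clips:
        for _ in range(repeats):
            start = time.perf_counter()
            recognize(clip)
            latencies.append(time.perf_counter() - start)
    return latencies

def fake_recognize(clip):
    """Hypothetical engine call: pretends to work for about 10 ms."""
    time.sleep(0.01)
    return "recognized text"

latencies = measure_latency(fake_recognize, ["clip-a", "clip-b"])
summary = {"median": statistics.median(latencies), "max": max(latencies)}
print(summary)
```

Reporting the median together with the maximum helps separate a generally slow engine from occasional stalls, which matter most for cloud and remote-access setups.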
Enterprise Recognition (based on Nuance.com info)
• Supports major EHR platforms, including Epic®, Cerner®, eClinicalWorks, athenahealth®, MEDITECH®, and more. © Nuance.com
Links
• https://msdn.microsoft.com/en-us/library/hh378337(v=office.14).aspx
• http://www.explainthatstuff.com/voicerecognition.html
• http://scienceline.org/2014/08/ever-wondered-how-does-speech-to-text-software-work/
• http://www.nuance.com/for-healthcare/capture-anywhere/360-mobile-solutions/powermicmobile/index.htm
• http://www.nuance.com/for-individuals/by-product/dragon-accessories
• https://en.wikipedia.org/wiki/List_of_speech_recognition_software
• https://en.wikipedia.org/wiki/Dragon_NaturallySpeaking
• https://en.wikipedia.org/wiki/Speech_recognition
• https://en.wikipedia.org/wiki/Language_model
• http://www.pcmag.com/article2/0,2817,2464719,00.asp
• http://www.pcworld.com/article/2055599/control-your-pc-with-these-5-speech-recognition-programs.html
• http://www.oxygen.lcs.mit.edu/Speech.html
• http://copia.com.au/medical-speech-recognition/