Natural Language Processing and Speech Enabled Applications

by Pavlovic Nenad

Presentation Content

What is natural language processing
– Speech synthesis
– Speech recognition
– Natural language understanding

Basic concepts and terms

Types of speech recognition engines

Hardware requirements

How speech recognition/synthesis works

Speech enabled applications

Applications of speech enabled systems

Commercial & non-commercial software

Natural Language Processing

Natural Language Processing (NLP) or Computational Linguistics (CL) “is a discipline between linguistics and computer science which is concerned with the computational aspects of the human language faculty” [1].

“It belongs to the cognitive sciences and overlaps with the field of artificial intelligence (AI), a branch of computer science that is aiming at computational models of human cognition” [1].

Natural Language Processing

In other words, NLP is a discipline that aims to build computer systems that are able to analyze, understand, and generate human speech.

Therefore, the NLP sub-areas of research are:
– Speech Recognition (speech analysis),
– Speech Synthesis (speech generation), and
– Natural Language Understanding (NLU).

Speech Recognition & Synthesis

Speech recognition is the process of converting spoken language to written text or some similar form.

Speech synthesis is the process of converting text into spoken language.

Natural Language Understanding

Natural Language Understanding (NLU) is the process of analyzing recognized words and transforming them into data that is meaningful to a computer.

In other words, NLU is a computer-based system that “understands” human language.

NLU is used in combination with speech recognition.

Basic Terms and Concepts

Utterance is any stream of speech between two periods of silence.

Pronunciation is what the speech engine thinks a word should sound like.

Grammars define a domain (of words) within which the recognition engine works.

Vocabulary (dictionary) is a list of words (utterances) that can be recognized by the speech recognition engine.

Training is the process of adapting the recognition system to a speaker.
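The definition of an utterance can be illustrated with a minimal sketch that splits a sequence of per-frame energy values into utterances, treating any frame below a threshold as silence. The function name, the threshold, and the sample energies are assumptions made for illustration; they do not come from any particular speech engine.

```python
def split_utterances(frame_energies, silence_threshold=0.1):
    """Return (start, end) frame index pairs for each stretch of speech.

    A frame counts as silence when its energy is below silence_threshold
    (a hypothetical value). The `end` index is exclusive.
    """
    utterances = []
    start = None  # index where the current utterance began, if any
    for i, energy in enumerate(frame_energies):
        is_speech = energy >= silence_threshold
        if is_speech and start is None:
            start = i                      # silence -> speech: utterance begins
        elif not is_speech and start is not None:
            utterances.append((start, i))  # speech -> silence: utterance ends
            start = None
    if start is not None:                  # audio ended while still speaking
        utterances.append((start, len(frame_energies)))
    return utterances


# Made-up energies: two bursts of speech separated by silence.
energies = [0.0, 0.02, 0.6, 0.8, 0.5, 0.01, 0.0, 0.7, 0.9, 0.03]
print(split_utterances(energies))
```

A real engine would typically also require a minimum silence duration before closing an utterance, so that short pauses inside a word do not split it.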

Basic Terms and Concepts

Accuracy is the measure of a recognizer’s ability to correctly recognize utterances.

Speaker Dependence
– A speaker dependent system is designed for only one user (at a time).
– A speaker independent system is designed for a variety of speakers.

Types Of Speech Recognition

Speech recognizers are divided into several classes according to the type of utterance that they can recognize:
– Isolated words
– Connected words
– Continuous speech (computer dictation)
– Spontaneous speech
– Voice Verification
– Voice Identification

Hardware Requirements

Natural Language Processing requires strong systems in order to work accurately and with a minimum response time.

The important hardware parts are:
– Sound Card
– Microphone
– Processor/RAM

How speech synthesis works?

There are five major steps in the process of speech synthesis:
– Structure analysis: process the structure of the input text.
– Text pre-processing: analyze the input text for special constructs of the language.
– Text-to-phoneme conversion: convert each word to phonemes (e.g. “times” = “t ay m s”).
– Prosody analysis: determine the appropriate prosody for the sentence (e.g. pitch, timing, pausing, etc.).
– Waveform production: phoneme and prosody information is used to produce the audio waveform.
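The text pre-processing and text-to-phoneme steps can be sketched as a dictionary lookup. This is a minimal illustration only: the lexicon and the digit expansions are hypothetical (just the “times” entry is taken from the slide), and the prosody and waveform steps are left out.

```python
import re

# Hypothetical mini-lexicon; only the "times" entry comes from the slide.
LEXICON = {
    "three": ["th", "r", "iy"],
    "times": ["t", "ay", "m", "s"],
    "two": ["t", "uw"],
}

# Hypothetical expansions for special constructs (here: single digits).
EXPANSIONS = {"2": "two", "3": "three"}


def preprocess(text):
    """Text pre-processing: lower-case, tokenize, expand special constructs."""
    tokens = re.findall(r"[a-z]+|\d+", text.lower())
    return [EXPANSIONS.get(token, token) for token in tokens]


def to_phonemes(words):
    """Text-to-phoneme conversion: look each word up in the lexicon."""
    phonemes = []
    for word in words:
        if word not in LEXICON:
            raise KeyError(f"no pronunciation for {word!r}")
        phonemes.append(LEXICON[word])
    return phonemes


words = preprocess("3 times 2")
print(words)
print(to_phonemes(words))
```

A dictionary alone cannot cover every word, so a fuller system would need some fallback for words that are missing from the lexicon.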

How speech recognition works?

The basic characteristics of the most commonly used speech recognizers are:
– Mono-lingual,
– Process a single input at a time,
– Can optionally adapt to the voice of the speaker,
– Grammars can be dynamically updated, and
– Have a small, defined set of properties.

How speech recognition works?

[Diagram: Speech → Speech Recognition Engine (using Grammars and an Acoustic Model) → Text]

1. Grammar design: Defines the words that may be spoken by a user and the patterns in which they may be spoken.

2. Signal processing: Analyzes the spectrum (frequency) characteristics of the incoming audio. The user profile holds the knowledge of the environment (how the user pronounces phonemes).

3. Phoneme recognition: Compares the spectrum patterns to the patterns of the phonemes.

4. Word recognition: Compares the sequence of likely phonemes against the words and patterns of words specified by the grammar.

5. Result generation: Provides information about the words that the recognizer has detected.
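Step 4 (word recognition) can be sketched as picking the grammar word whose pronunciation is closest to the sequence of phonemes heard. The sketch below uses plain edit distance as a stand-in for the probabilistic scoring a real engine would use; the grammar, the pronunciations, and the function names are all hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # delete a phoneme
                            curr[j - 1] + 1,      # insert a phoneme
                            prev[j - 1] + cost))  # substitute a phoneme
        prev = curr
    return prev[-1]


# Hypothetical grammar: the only words the recognizer may return.
GRAMMAR = {
    "transfer": ["t", "r", "ae", "n", "s", "f", "er"],
    "status": ["s", "t", "ae", "t", "ah", "s"],
    "yes": ["y", "eh", "s"],
}


def recognize_word(phonemes):
    """Return the grammar word whose pronunciation best matches the input."""
    return min(GRAMMAR, key=lambda word: edit_distance(phonemes, GRAMMAR[word]))


# One phoneme was misheard ("m" instead of "n").
heard = ["t", "r", "ae", "m", "s", "f", "er"]
print(recognize_word(heard))
```

Because the search is limited to the words in the grammar, a small recognition error in the phoneme stage can still lead to the right word.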

Speech Enabled Applications - 1

The primary aim of speech enabled applications is to improve interaction between user and machine.

Speech recognition, speech synthesis, or both are used for this purpose. The choice mostly depends on the type of application and its purpose.

Speech Enabled Applications - 2

Speech synthesis is fairly easy to use. After setting up the “type” of voice, the speed of “speaking”, the duration of the pause between sentences, and so on, the speech synthesis engine is ready for use.
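As a rough illustration of the settings listed above, the sketch below collects a voice name, a speaking rate, and a between-sentence pause, and renders them as a simplified SSML fragment. SSML is used here only as an assumed target format; the slides do not name one. The class, the voice name, and the default values are hypothetical, and the `<speak>` element omits the version and namespace attributes a complete SSML document carries.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr


@dataclass
class VoiceSettings:
    voice: str = "female-1"        # hypothetical voice name
    rate: str = "medium"           # SSML prosody rate, e.g. "slow" or "fast"
    sentence_pause_ms: int = 400   # pause inserted between sentences


def to_ssml(sentences, settings):
    """Render sentences as a simplified SSML fragment using the settings."""
    pause = f'<break time="{settings.sentence_pause_ms}ms"/>'
    body = pause.join(f"<s>{escape(sentence)}</s>" for sentence in sentences)
    return (
        "<speak>"
        f"<voice name={quoteattr(settings.voice)}>"
        f"<prosody rate={quoteattr(settings.rate)}>{body}</prosody>"
        "</voice></speak>"
    )


settings = VoiceSettings(rate="slow", sentence_pause_ms=500)
print(to_ssml(["Welcome.", "Please say your name."], settings))
```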

Speech Enabled Applications - 3

Applying speech recognition requires careful analysis of the possible inputs to the system and of the way in which the user provides them.

The way in which the user provides input to the system, and the way the application responds to the user, is called the Natural Language Dialog.

Choosing the Natural Language Dialog is the first decision that the developer must make.

Natural Language Dialog - 1

Three essential types of interaction that are available to software applications are:
– Direct dialog,
– Mixed initiative dialog, and
– Natural dialog.

Natural Language Dialog - 2

Direct Dialog
The interaction directs the user to perform a specific task by asking for information at each turn and expecting specific words or phrases in response.

System: “Welcome to ABC bank customer services system. Please say your name.”
User: “Nenad Pavlovic”
System: “Please say your account number.”
User: “1234-123-12332-1233”
System: “Would you like to perform a transfer or to see the status on your account?”
User: “Transfer.”, etc…
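A direct dialog like the one above can be sketched as a fixed list of prompts, each asked in order and repeated until the reply is acceptable. The step table, slot names, and function name are assumptions for illustration; replies are passed in as text, standing in for the output of a speech recognizer.

```python
# Each step: (slot name, prompt, set of expected words or None for free-form).
STEPS = [
    ("name", "Please say your name.", None),
    ("account", "Please say your account number.", None),
    ("action",
     "Would you like to perform a transfer or to see the status on your account?",
     {"transfer", "status"}),
]


def run_direct_dialog(answers):
    """Ask every prompt in order, consuming one answer per prompt issued.

    Returns the filled slots and the list of prompts the system issued.
    A prompt is repeated when the reply is not one of the expected words.
    """
    answers = iter(answers)
    slots, prompts_issued = {}, []
    for slot, prompt, expected in STEPS:
        while slot not in slots:
            prompts_issued.append(prompt)
            reply = next(answers, None)
            if reply is None:              # the user stopped answering
                return slots, prompts_issued
            reply = reply.strip()
            if expected is None:
                slots[slot] = reply
            elif reply.rstrip(".").lower() in expected:
                slots[slot] = reply.rstrip(".").lower()
    return slots, prompts_issued


slots, prompts = run_direct_dialog(
    ["Nenad Pavlovic", "1234-123-12332-1233", "maybe", "Transfer."]
)
print(slots)
print(len(prompts))
```

Because each turn expects only a few words, the grammar active at each step can stay very small.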

Natural Language Dialog - 3

Mixed initiative dialog
Is similar to the direct dialog, but it gives the speaker some freedom: it allows the user to have as much or as little control as s/he desires.

System: “Welcome to ABC bank customer services system. Please say your name.”
User: “My name is Nenad Pavlovic, and my account number is: 1234-123-12332-1233”
System: “Would you like to perform a transfer or to see the status on your account?”
User: “Show me the status and then go to transfers.”, etc…
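The difference from a direct dialog can be sketched as slot filling: a single reply may fill several slots at once, and the system only prompts for what is still missing. The regular expressions, slot names, and prompts below are hypothetical and far simpler than what a real system would need.

```python
import re

# Hypothetical patterns for pulling slot values out of a free-form reply.
PATTERNS = {
    "name": re.compile(r"my name is ([A-Za-z ]+?)(?:,|\.|$)", re.IGNORECASE),
    "account": re.compile(r"account number is:?\s*([\d-]+)", re.IGNORECASE),
}

PROMPTS = {
    "name": "Please say your name.",
    "account": "Please say your account number.",
}


def fill_slots(utterance, slots):
    """Fill every slot mentioned in the utterance, not only the prompted one."""
    for slot, pattern in PATTERNS.items():
        match = pattern.search(utterance)
        if match and slot not in slots:
            slots[slot] = match.group(1).strip()
    return slots


def next_prompt(slots):
    """Return the prompt for the first unfilled slot, or None when done."""
    for slot, prompt in PROMPTS.items():
        if slot not in slots:
            return prompt
    return None


slots = fill_slots(
    "My name is Nenad Pavlovic, and my account number is: 1234-123-12332-1233", {}
)
print(slots)
print(next_prompt(slots))
```

Here the user answers two questions in one turn, so the account-number prompt is skipped.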

Natural Language Dialog - 4

Natural dialog
Allows the user to enjoy a more unstructured interaction with an application (as natural as possible).

System: “Welcome to City Directory Dialer, how can I help you?”
User: “I’d like to call Mr. George Eleftherakis in Tsimiski building.”
System: “George Eleftherakis – Tsimiski building. Is this correct?”
User: “Yes”
System: “George Eleftherakis is found in directory. Calling…”, etc…

Grammars vs. Statistical NLU

The more freedom the user is given to interact with the application, the more complex the processing of input data becomes.

Depending on the complexity of the possible user inputs and on the interaction dialog used, one of two implementation approaches is used:
– Grammar-based NLU
– Statistical NLU

Grammar-based NLU: relies on defining (creating) the grammar, which means constructing the phrases and stating all possible words that can be used.
– Advantages: fast, allows freedom in phrase construction.
– Disadvantages: used only for a small set of phrases and words; if a word or phrase is not defined, it will not be recognized.
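A minimal sketch of the grammar-based approach: each rule lists the alternatives allowed at every position, all accepted phrases are enumerated up front, and anything outside that set is rejected. The rule format, the intents, and the wording are hypothetical.

```python
from itertools import product

# Hypothetical grammar: each rule is a sequence of positions, each position a
# list of alternatives; the empty string marks an optional word.
GRAMMAR = {
    "transfer": [["", "please"], ["make", "perform"], ["a"], ["transfer"]],
    "status": [["show", "check"], ["", "me"], ["the", "my"], ["status"]],
}


def expand(rule):
    """Enumerate every phrase that a rule accepts."""
    return {" ".join(word for word in combo if word) for combo in product(*rule)}


# Lookup table from accepted phrase to intent, built once from the grammar.
PHRASES = {
    phrase: intent for intent, rule in GRAMMAR.items() for phrase in expand(rule)
}


def understand(utterance):
    """Return the intent of an utterance, or None if the grammar lacks it."""
    return PHRASES.get(" ".join(utterance.lower().split()))


print(understand("Show me the status"))
print(understand("please perform a transfer"))
print(understand("what is my balance"))
```

The lookup is fast, but the last call shows the disadvantage stated above: a phrase that was never defined is simply not recognized.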

Statistical NLU: relies on a statistical model of utterances derived from actual conversation data.
– Advantages: huge set of phrases and words.
– Disadvantages: slow, difficult to add new phrases.
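As one possible illustration of the statistical approach, the sketch below trains a small naive Bayes word-count model on example utterances and uses it to classify a phrase it has never seen. Naive Bayes is a technique chosen here for brevity; the slides do not say which statistical model is meant. The training sentences and intent names are made up and stand in for real conversation data.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data standing in for "actual conversation data".
TRAINING = [
    ("i want to transfer money", "transfer"),
    ("please make a transfer", "transfer"),
    ("send money to my savings", "transfer"),
    ("show me my account status", "status"),
    ("what is the status of my account", "status"),
    ("check my balance", "status"),
]


def train(examples):
    """Count words per intent and how often each intent occurs."""
    word_counts = defaultdict(Counter)
    intent_counts = Counter()
    for text, intent in examples:
        intent_counts[intent] += 1
        word_counts[intent].update(text.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, intent_counts, vocab


def classify(text, model):
    """Return the intent with the highest naive Bayes log-probability."""
    word_counts, intent_counts, vocab = model
    total = sum(intent_counts.values())
    best_intent, best_score = None, -math.inf
    for intent, count in intent_counts.items():
        score = math.log(count / total)  # prior probability of the intent
        denom = sum(word_counts[intent].values()) + len(vocab)
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[intent][word] + 1) / denom)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent


model = train(TRAINING)
print(classify("could you transfer some money", model))
print(classify("what is my balance", model))
```

Neither test phrase appears in the training data, yet both receive an intent. Covering a new kind of request, however, means collecting more examples and retraining the model.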

Uses of speech applications

Speech technology is mostly used in the following areas:
– Dictation
– Command and Control
– Telephony
– Wearables
– Medical Disabilities
– Embedded Applications

Speech Systems

Commercial
– IBM’s ViaVoice (Linux, Windows, MacOS)
– Dragon NaturallySpeaking (Windows)
– Microsoft’s Speech Engine (Windows)
– BaBear (Linux, Windows, MacOS)
– SpeechWorks (Linux, Sparc & x86 Solaris, Tru64, Unixware, Windows)

Non-commercial
– OpenMind Speech (Linux)
– XVoice (Linux)
– CVoiceControl/kVoiceControl (Linux)
– GVoice (Linux)

Conclusion

Developers’ perspective: developing a speech enabled application does not require redesigning or explicitly designing systems to support speech. Speech is treated as an “attached entity” and can be viewed as a separate module. Also, it does not require special linguistic or programming skills.

Business perspective: the use of speech enabled applications can noticeably improve the accuracy and effectiveness of employees who work with large amounts of data, many people, or both.

Thank you

Pavlovic Nenad

[email protected]

References

[1] Radev, D. R. (2001), “Natural Language Processing FAQ”, Columbia University, Dept. of Computer Science, NYC.
