Developing an Effective Wireless Middleware Strategy
TRANSCRIPT
Mobility
• Mobility is “the ability to move freely”
• Wireless access to enterprises presents many challenges:
– bandwidth
– device issues
– synchronization of data
• Middleware needed to address the challenges
Wireless Access Challenges
• Bandwidth
– The pervasive devices are still mobile phones
– 2.5/3G roll-out has been slow
– Networks are unreliable
• Wi-Fi Solution
– 802.11b provides hot spots (range of ~300 feet)
– High speed (up to 11 Mbps)
– Access to wired network backbone
Wireless Access Challenges
• Device Issues
– Thin client
• Limited display, text only in some cases
• Battery life and memory very limited
• Open programming environments limited
– Thick client
• PDAs support 256-color displays
• Improved battery life, CPU and memory
• Standard operating systems
• Mic and speaker access
Wireless Access Challenges
• Synchronizing Data
– Replication model
• Hot-sync of data to docking station
• Static applications
• Lowers user acceptance
– Real-time access
• Requires reliable network connectivity
• Allows heavy lifting to be done in-network
• Increases user acceptance
Extend the Web
• Use existing web infrastructure
• Getting from the wireless device to the applications requires middleware
• Why isn’t 802.11 connectivity to web applications enough?
– Entering data on user devices is too difficult
– Limited displays and stylus input are difficult
Middleware Architecture
[Architecture diagram: a Multimodal Client/Platform exchanges audio and commands with a Multimodal Gateway over an audio path and a platform channel. The gateway components include an Intelligent Control Module (ICM) Resource Broker and a Multimodal Service Runtime Environment (SRE). The gateway reaches the Application Server via HTTP and uses SIP/MRCP to reach pools of SoftServer ASR, TTS and Streaming Media Servers; the legend distinguishes 3rd-party components from SoftServer components.]
Middleware to Effectively Add Speech
• Speech can greatly improve the user experience
• Press to talk – user speaks input to the device
• Audio cue – prompts can be used for tutorial, accessibility, etc.
• The middleware needs to allow access to the speech resources in a web-centric way
Speech Recognition
• Input
– Speech from user
– Grammar or dictionary of expected words/phrases
• Processing
– Phonetic classification using acoustic models
– Pattern matching with grammar
• Output
– Recognition matches and confidence
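
To make the flow concrete, here is a minimal sketch of a recognition transaction from the application's point of view. The Recognizer class and its result fields are hypothetical stand-ins, not a real engine API; they simply mirror the input/processing/output steps above.

# Hypothetical recognition transaction mirroring the steps above.
# A real engine scores acoustic models and matches against the grammar;
# this stand-in only compares normalized text.
class Recognizer:
    def __init__(self, grammar):
        self.grammar = set(grammar)     # expected words/phrases (input)

    def recognize(self, utterance):
        matches = [p for p in self.grammar if p == utterance.lower()]
        confidence = 0.95 if matches else 0.0   # illustrative score
        return {"matches": matches, "confidence": confidence}

rec = Recognizer(["check balance", "transfer funds"])
print(rec.recognize("Check Balance"))
# {'matches': ['check balance'], 'confidence': 0.95}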
Speech Synthesis
• Input
– Text/ASCII or XML markup (SSML)
– Language models
• Processing
– Phonetic concatenation of audio
• Output
– Audio representation of text
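
For example, SSML input to a synthesizer is plain XML markup. The snippet below is a minimal, illustrative SSML 1.0 document held in a Python string; the synthesizer that would consume it is assumed, not shown.

# Minimal SSML (W3C Speech Synthesis Markup Language) document.
# The TTS endpoint that would consume it is an assumption.
ssml = """<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  Your flight departs at <say-as interpret-as="time">8:30am</say-as>.
</speak>"""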
Speech Ecosystem
• Built from IVR
– Speech has emerged as an extension of touch-tone IVR
• Hardware-Centric
– Telephony switch and board vendors have added speech technologies
• Wireless Access
– Data devices such as PDAs are NOT connecting over a voice channel!
Web Architecture for Speech
• Host-based Technologies
– The Speech Recognition and Speech Synthesis resources are software-based
– Can be deployed alongside web infrastructure
• Transactional Speech
– Client/server interaction for speech resources
– Each transaction defined as a Web Service
– Allows the developer to add speech where appropriate
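
As a sketch of what one such transaction might look like, the call below POSTs captured audio to a recognition web service. The endpoint URL, header names and response shape are illustrative assumptions, not a defined interface.

# Hypothetical web-service call for a single "recognize" transaction.
# The URL, headers and response format are assumptions for illustration.
import urllib.request

req = urllib.request.Request(
    "http://speech.example.com/recognize",        # hypothetical endpoint
    data=open("utterance.wav", "rb").read(),      # press-to-talk audio
    headers={
        "Content-Type": "audio/wav",
        "X-Grammar-URI": "http://app.example.com/grammars/main.grxml",
    },
)
with urllib.request.urlopen(req) as resp:
    print(resp.read())   # e.g. matches and confidence from the recognizer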
Role of the Middleware
• Resource Management
– Speech resources deployed on off-the-shelf hardware
– Requires a common session, media and control protocol
• Open Standards
– Session Initiation Protocol (SIP, RFC 3261)
– Real-time Transport Protocol (RTP, RFC 1889)
– Speech Services Control (SPEECHSC, IETF WG)
Session Control
• SIP
– Modeled after HTTP and SMTP
– Provides a client/server network programming model
– Transported via UDP
– REGISTER message allows dynamic binding of resources
– Includes proxy functionality for load balancing
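
For instance, a speech resource can announce itself to a SIP registrar with a REGISTER request like the sketch below. The addresses and tags are hypothetical, and a real request carries additional headers (Via branch parameter, authentication, etc.).

# Sketch: a speech resource registers itself over UDP so it can be
# dynamically bound. Addresses and tags are hypothetical.
import socket

register = (
    "REGISTER sip:registrar.example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP asr1.example.com:5060\r\n"
    "From: <sip:asr1@example.com>;tag=1234\r\n"
    "To: <sip:asr1@example.com>\r\n"
    "Call-ID: 42@asr1.example.com\r\n"
    "CSeq: 1 REGISTER\r\n"
    "Contact: <sip:asr1@asr1.example.com:5060>\r\n"
    "Expires: 3600\r\n"
    "Content-Length: 0\r\n\r\n"
)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(register.encode(), ("registrar.example.com", 5060))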
Media Transport
• RTP
– Currently used in streaming video/audio web applications
– Allows consumers/producers of audio to be connected via a network
– Transported via UDP
– Real-time packets reduce latency and the impact of lost packets
– Quality of service must be managed
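
The fixed RTP header (RFC 1889) is only 12 bytes, which is what keeps per-packet overhead low. The sketch below packs one header for 20 ms of 8 kHz mu-law audio; the field values are illustrative.

# Pack a minimal 12-byte RTP header (RFC 1889) for one audio packet.
import struct

version, padding, extension, csrc_count = 2, 0, 0, 0
marker, payload_type = 0, 0           # payload type 0 = PCMU (G.711 mu-law)
seq, timestamp, ssrc = 1, 160, 0x1234ABCD    # illustrative values

byte0 = (version << 6) | (padding << 5) | (extension << 4) | csrc_count
byte1 = (marker << 7) | payload_type
header = struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)
packet = header + b"\x00" * 160       # 20 ms of 8 kHz mu-law samples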
Resource Control
• SPEECHSC
– Working group chartered after the initial MRCP Internet drafts were submitted
– Defines a message set to control:
• Speech Recognizers
• Speech Synthesizers
• Streaming Prompt Servers
– Currently HTTP-style
– Web services being investigated
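
The control messages are HTTP-style plain text. The request below sketches how a client might start recognition against a previously loaded grammar; the exact request line and header names come from the evolving MRCP/SPEECHSC drafts, so treat the syntax here as illustrative.

# Sketch of an HTTP-style control message (MRCP-flavored) asking a
# recognizer to start. Header names and values are illustrative.
recognize_request = (
    "RECOGNIZE 10001 MRCP/1.0\r\n"
    "Content-Id: <main-menu@example.com>\r\n"   # grammar loaded earlier
    "Content-Length: 0\r\n"
    "\r\n"
)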
Component Servers for Speech
• Define each resource as a Component Server
– Resources acquired through SIP
– Media transmitted/received through RTP
– Control through MRCP:
• load grammar
• queue TTS
• play
• perform recognition
– Common resources pooled for resource management
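
Putting the pieces together, a client session against a recognition Component Server might follow the sequence below. SpeechSession is an imagined wrapper over SIP (session), RTP (media) and MRCP (control), not a real library.

# Hypothetical client flow against a pooled recognition Component Server.
class SpeechSession:
    def __init__(self, server):
        self.server = server          # acquired via SIP INVITE
    def load_grammar(self, uri): ...  # MRCP: load grammar
    def queue_tts(self, text): ...    # MRCP: queue TTS prompt
    def play(self): ...               # play queued audio over RTP
    def recognize(self): ...          # MRCP: perform recognition
    def close(self): ...              # SIP BYE returns resource to pool

session = SpeechSession("sip:asr-pool@example.com")
session.load_grammar("http://app.example.com/grammars/main.grxml")
session.queue_tts("Say a command after the tone.")
session.play()
result = session.recognize()
session.close()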
Media Session Framework (MSF)
[Diagram: three MSF servers, each fronted by a SIP User Agent Server (UAS), host an ASR Engine, a TTS Engine and a Prompt Cache. ASR, TTS and Prompt clients, each acting as a SIP User Agent Client (UAC), set up sessions via SIP messaging; RTP carries the media between each engine and the media source/target. UAC = SIP User Agent Client; UAS = SIP User Agent Server.]
Device Access to the Component Servers
• Devices must be able to use the resources in a standards-based way
[Diagram: a wired phone and a cell phone reach the Voice Service (with ASR/TTS) through a Media Gateway, while a handheld and a 2.5/3G cell phone reach it through the Multimodal Gateway; both gateways exchange SIP/RTP packets with the voice service.]
Conclusion
• Bandwidth for wireless networks is improving
• Middleware needs to address:
– resource management
– load balancing
– media resource integration
• Open standards-based approach