IBM Research
© 2002 IBM Corporation
See, Hear, Do: Language and Robots
Jonathan Connell (Exploratory Computer Vision Group)
Etienne Marcheret (Speech Algorithms & Engines Group)
Sharath Pankanti (ECVG)
Josef Vopicka (Speech)
Far Reaching Research (FRR) Project
© 2005 IBM Corporation
Challenge = Multi-modal instructional dialogs
Use speech, language, and vision to learn objects & actions
Innate perception abilities (objects / properties)
Innate action capabilities (navigation / grasping)
Easily acquire terms not knowable a priori
Example dialog:
User: Round up my mug.
Robot: I don’t know how to “round up” your mug.
User: Walk around the house and look for it. When you find it, bring it back to me.
Robot: I don’t know what your “mug” looks like.
User: It is like this <shows another mug> but sort of orange-ish.
Robot: OK … I could not find your mug.
User: Try looking on the table in the living room.
Robot: OK … Here it is!
Language Learning & Understanding is an AAAI Grand Challenge: http://www.aaai.org/aitopics/pmwiki/pmwiki.php/AITopics/GrandChallenges#language
Skills shown in the dialog: verb learning, command following, noun learning, advice taking
Eldercare as an application
Example tasks:
Pick up dropped phone
Get blanket from another room
Bring me the book I was reading yesterday
Large potential market
Many affluent societies have a demographic imbalance (Japan, EU, US)
Institutional care can be very expensive (to person, insurance, state)
A little help can go a long way
Can be supplied immediately (no waiting list for admission)
Allows person to stay at home longer (generally easier & less expensive)
Boosts independence and feeling of control (psychological advantage)
Note: We are not attempting to address the whole problem:
X Aggressive production cost containment
X Robust self-recharging and stair traversal
X Bathing and bathroom care, patient transfer, cooking
X OSHA, ADA, FDA, FCC, UL or CE certification
State of the art
Indoor navigation: Minerva from CMU, Jose from Univ. British Columbia
  Limitations: no object perception; no manipulation capability
Perception & manipulation: Herb from CMU / Intel (Kanade), PR2 from Willow Garage
  Limitations: off-line object model generation; no natural language interface
Language learning: Ripley from MIT (Deb Roy), HAM from KTH in Sweden
  Limitations: either fetch or carry; no procedural learning
Dialog and speech: Honda system from IBM, call center handling from IBM
  Limitations: no physical presence or action; no visual perception of objects
Business Model
[Diagram: IBM, customers, OEM (buy hardware), third party (add software and services); $70B / year]
Costs & revenue potential
OEM sales price for hardware: $6000
  Electromechanical parts: $1300
  Onboard computer: $500
  Assembly (15 hrs x $80 / hr): $1200
  + 30% sales & distribution + 20% profit: $3000
Value-added wholesale price (w/ software): $15,000
  10% continued R&D: $1500
  30% sales & distribution: $4500
  20% profit: $3000
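The price build-up above can be checked with a short Python sketch. It assumes each percentage is a share of the price it is listed under, and that the part of the $15,000 wholesale price not taken by the R&D, sales, and profit shares is the $6000 OEM hardware price; the slide implies but does not state this, and the helper `pct` and the variable names are illustrative only.

```python
# Sketch: check that the slide's line items add up to the quoted prices.
# Assumption: each percentage is a share of the price it sits under.

def pct(percent: int, base: int) -> int:
    """Integer share of base (exact for the round figures on the slide)."""
    return percent * base // 100

# OEM hardware price: parts + computer + assembly + markup.
parts = 1300
computer = 500
assembly = 15 * 80                                   # 15 hrs x $80 / hr
oem_price = 6000
markup = pct(30, oem_price) + pct(20, oem_price)     # sales & distribution + profit
oem_total = parts + computer + assembly + markup

# Value-added wholesale price. Assumption: hardware at the OEM price
# plus R&D, sales & distribution, and profit shares of the wholesale price.
wholesale_price = 15000
rnd = pct(10, wholesale_price)
sales = pct(30, wholesale_price)
profit = pct(20, wholesale_price)
wholesale_total = oem_price + rnd + sales + profit

print(assembly, markup, oem_total)            # 1200 3000 6000
print(rnd, sales, profit, wholesale_total)    # 1500 4500 3000 15000
```

Integer percentages are used instead of floats so the sums come out exact.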
Price = Less than a new car
Total cost of ownership: $8000 / yr
  Purchase amortized over 3-year lifetime: $5000 / yr
  Service (15 hrs / quarter x $50 / hr x 4 quarters): $3000 / yr
Effective wage (40 hrs / wk x 50 wks / yr = 2000 hrs / yr): $4 / hr
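The ownership-cost and effective-wage figures follow from a few multiplications, sketched below in Python. The sketch assumes straight-line amortization of the $15,000 purchase with no financing cost or resale value, which the slide does not state; the variable names are illustrative.

```python
# Sketch: total cost of ownership and the implied hourly "wage" of the robot.
# Assumption: straight-line amortization of the purchase price, with no
# financing cost or resale value.
purchase_price = 15_000                  # value-added wholesale price
lifetime_years = 3
amortized = purchase_price // lifetime_years   # $5000 / yr
service = 15 * 50 * 4                    # 15 hrs / quarter x $50 / hr x 4 quarters
tco = amortized + service                # $8000 / yr

working_hours = 40 * 50                  # 40 hrs / wk x 50 wks / yr
effective_wage = tco / working_hours     # dollars per hour of robot work

print(amortized, service, tco)           # 5000 3000 8000
print(effective_wage)                    # 4.0
```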
$24B / yr: resell robot + value-added software + field service
Eldercare market in US (x3 if EU and AP also): 3 million
  Total US population: 300 million
  Ages 75-85: 10%
  Suitable (ability level, desire, finances): 10%
Manufacturing business ($2000 / robot yr) $6B / yr
Services business ($3000 / robot yr) $9B / yr
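The market-size figures can be reproduced with the Python sketch below. It assumes one robot per suitable person, each bringing in the full $8000 / yr total cost of ownership as revenue; the helper `billions` is hypothetical and only formats the output.

```python
# Sketch: reproduce the market-size arithmetic from the slide.
# Assumption: one robot per suitable person, each generating the full
# $8000 / yr total cost of ownership as revenue.
us_population = 300_000_000
aged_75_85 = us_population * 10 // 100      # 10% are aged 75-85
robots = aged_75_85 * 10 // 100             # 10% of those are suitable -> 3 million

tco_per_robot_year = 8000                   # from the total-cost-of-ownership line
total = robots * tco_per_robot_year         # resell robot + software + field service
manufacturing = robots * 2000               # $2000 / robot-yr
services = robots * 3000                    # $3000 / robot-yr
remainder = total - manufacturing - services  # not broken out on the slide

def billions(dollars: int) -> str:
    """Format a yearly dollar amount in billions, as on the slide."""
    return f"${dollars / 1e9:.0f}B / yr"

print(robots)                               # 3000000
print(billions(total), billions(manufacturing), billions(services), billions(remainder))
# $24B / yr $6B / yr $9B / yr $9B / yr
print(billions(3 * total))                  # $72B / yr if EU and AP are included
```

The $9B remainder is $3000 / robot-yr, which matches the $9000 gap between the wholesale and OEM prices spread over the 3-year lifetime, so it is presumably the value-added software share; the slide does not say so explicitly. Tripling the US figure gives about $72B / yr, which appears to be where the $70B / year business-model figure comes from.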
Sample business case
Home eldercare now (employer costs): $25,000 / yr
1 aide from 8am to 6pm = 10 hrs
50wks x 5days / wk x 10hrs / day = 2500 hrs / yr
Federal min. wage = $7.25 / hr
+38% overhead (FICA + 401K + medical) = $10 / hr
Aide’s activities:Help with clothes, hygiene, meals
Odd tasks such as fetching objects
Sitting around watching TV
Alternative: Half-time aide + robot: $20,500 / yr
Human still helps with clothes, hygiene, meals
Robot potentially available after hours and on weekends
No problem with robot Training, Turnover, and Trust (stealing)
Value proposition (to client): 30% more hours @ 10% less cost
Split savings with customer ($50,000 to $45,000 per client)
Human 5 hrs + robot 8 hrs = 13 hrs / day during week
10% less revenue but 22% more profit (= $6.6B / yr extra profit if 100% market share)
Bill at $20,000 - $3000 service = $17,000 / yr revenue
10.6-month payback on $15,000 purchase
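The cost, coverage, and payback numbers in this business case can be traced with the Python sketch below. It assumes the robot's yearly cost to the provider is the $8000 / yr total cost of ownership and that $50,000 and $45,000 are the yearly amounts billed to the client; both are readings of the slide rather than stated facts, and the variable names are illustrative.

```python
# Sketch: reproduce the sample eldercare business case.
# Assumptions: the robot's annual cost is the $8000 / yr total cost of
# ownership, and $50,000 / $45,000 are the yearly amounts billed to the client.

# Today: one aide from 8am to 6pm, weekdays, 50 weeks a year.
aide_hours = 50 * 5 * 10                     # 2500 hrs / yr
loaded_wage = 7.25 * 1.38                    # federal minimum + 38% overhead
aide_cost = aide_hours * round(loaded_wage)  # slide rounds to $10 / hr -> $25,000

# Alternative: half-time aide plus a robot.
robot_cost = 8000
alt_cost = aide_cost // 2 + robot_cost       # $20,500 / yr

# What the client sees: coverage hours per weekday and yearly price.
hours_now, hours_alt = 10, 5 + 8             # aide only vs. aide + robot
price_now, price_alt = 50_000, 45_000
more_hours = hours_alt / hours_now - 1       # 30% more hours
less_cost = 1 - price_alt / price_now        # 10% less cost

# Payback: robot billed at $20,000 / yr, minus $3000 / yr service.
net_robot_revenue = 20_000 - 3_000
payback_months = 15_000 / net_robot_revenue * 12

print(aide_cost, alt_cost)                   # 25000 20500
print(f"{more_hours:.0%} more hours, {less_cost:.0%} less cost")
# 30% more hours, 10% less cost
print(f"{payback_months:.1f} months payback")
# 10.6 months payback
```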
What’s different and important
Speech-driven interface
No headset required (far field), can learn new nouns and verbs
Multi-modal dialog
Responds to gestures, exploits synergies between modalities
Manipulation as well as mobility
Not just a walking telephone, can do useful physical work also
One-shot learning
No turntable scanning, not 100s of examples, no trial-and-error experiments
Cost containment
Vision instead of special-purpose sensors and precise mechanicals