IBM Research
December 14, 2008
FIRE 2008, Kolkata © 2008 IBM Corporation
IBM India Research Laboratory Overview
with an effort to be in the context of FIRE
Debapriyo Majumdar ([email protected]), IBM India Research Lab, Bangalore
IBM Research - Overview
The largest private research institution in the world
Annual R&D budget of around $6B (includes development as well)
Over 3,000 researchers
Mathematics, Computer Science, Physics, Service Science, …
Over 40,000 US patents since 1993
−Most patents of all companies in the world in the last 15 years
Eight labs across the world
IBM Research Labs Worldwide
Watson: established 1945 (Columbia University) / 1961
San Jose / Almaden: established 1952 / 1986
Zürich: established 1956
Haifa: established 1972
Tokyo: established 1982
Austin: established 1995
Beijing: established 1995
India (Delhi/Bangalore): established 1998/2005
IBM India and India Research Lab
IBM India - Second largest population of IBM outside the US (over 75,000)
−Current technical population 40,000+
India Research Lab
−Delhi, since 1998
−Bangalore, since 2005
−About 150 technical people
[Map of India: Bangalore, Chennai, Hyderabad, Mumbai, Pune, Kolkata, Delhi]
IBM India Business Units
Application Services
Business Process Transformation Services
India Software Lab (ISL)
Global Service Delivery Center
India Research Lab (IRL)
Domestic Operations/Others
IRL Focus Areas
Technical Competencies
Computer Science
−Distributed Systems: systems management, middleware
−Information Management: IE, data mining
−Interaction Technologies: speech
−Programming Technologies: parallel and high-performance programming
−Software Engineering: model-driven, distributed development
Math Science
−Operations Research
−Algorithms
−Optimization
−Game Theory
Service Science
−Service Engineering
−Service Productivity
−Service Management
−Service Quality
−Service Supply Chains
Business Areas
Service Delivery
−Infrastructure Services
−Application Services
−Contact Center Services
Emerging Solutions
−Telecom
−Others (Banking, etc.)
Software
Systems
Why do we care? Services dominate the world’s GDP…
[Figure: charts with a 0% to 100% vertical axis and a time axis running from about 1800 to 2010; panel labels: Japan, United States, India, China]
In the context of FIRE
Information and Knowledge Management @ IRL
Speech recognition and synthesis
−Hindi, Indian English & Hinglish
Translation: Hindi → English and English → Hindi
UIMA Annotators (rule based)
−with IIT-Bombay
Linking structured and unstructured data
Learning attributes from noisy or incomplete information
−For example, customer transaction logs
More…
Challenges
Data
−Noisy
−Incomplete
−Could be ill-structured
Problem
−Defining the problem is often our job too
Focus on the application
−What you build must work
−Users must be satisfied
−Firefighting
Some Examples…
Speech - Core Technologies
Desktop speech recognition (Hindi & Indian English)
−More than 1100 speakers
−More than 250 hours of broadband speech data
−Vocabulary of 75000 words
−Accuracy: 90-95%
Telephony speech recognition
−500 speakers each for Hindi, English & Hinglish
−A prototype for movie booking system in Indian English
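The slide does not say how the 90-95% accuracy figure was measured. One common convention for speech recognition is word accuracy, defined as one minus the word error rate, where errors are counted as the minimum number of word substitutions, insertions and deletions between a reference transcript and the recognizer output. The sketch below illustrates that convention only; the function names and the sample sentences are made up for illustration and are not taken from the IBM system.

```python
def word_errors(reference, hypothesis):
    """Minimum number of word substitutions, insertions and deletions
    needed to turn the reference into the hypothesis (edit distance over words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Word accuracy = 1 - (word errors / number of reference words)."""
    n_ref = len(reference.split())
    if n_ref == 0:
        raise ValueError("reference transcript is empty")
    return 1.0 - word_errors(reference, hypothesis) / n_ref


if __name__ == "__main__":
    # Hypothetical reference transcript and recognizer output.
    ref = "please book two tickets for the evening show in bangalore"
    hyp = "please book two tickets for the morning show in bangalore"
    print(f"word accuracy: {word_accuracy(ref, hyp):.2f}")
```

With one substituted word out of ten, the example sentence pair gives a word accuracy of 0.90, the lower end of the range quoted on the slide.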
SENSEI: Voice and Accent Training for Call-Centers
Challenges
−Increase in the number of call centers in India
−Agents need to speak in foreign accent
−Very high attrition rates in call-centers
−Hiring involves evaluation and training
Solution: Sensei, a tool that is used for:
−Candidate Screening: evaluates a candidate’s pronunciation, grammar and fluency
−On-board Training: evaluates correctness of sounds produced, syllable stress, speaking rate and fluency
−Monitoring: analyzes pre-recorded calls to determine if the agent maintained the required quality of voice/accent
Application: Cost reduction by automation of Accent Training and Evaluation
The Hindu, 30 Oct. 2006
Machine Translation: Linguistic & Statistical
Linguistic Approach
−Linguistic experts provide dictionaries, lexical rules and grammar
−Source Text → Translation → Translated Text
Statistical Machine Translation
−Training on parallel corpora: statistical translation modeling
−Language model, alignment, translation probabilities, decoding
−Source Text → SMT → Translated Text
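To make the statistical side of the slide concrete, here is a minimal sketch of how a language model and translation probabilities can be combined at decoding time: each candidate translation is scored by language-model probability times translation probability, and the best-scoring candidate wins. Everything in it is an assumption for illustration: the romanized Hindi words, the probability values, the floor constant and the function names are invented, and the decoder assumes a monotone one-to-one word alignment, which real systems do not.

```python
import math
from itertools import product

# Hypothetical toy tables (values invented for illustration).
# TRANSLATION_PROB[(f, e)] = P(source word f | target word e)
TRANSLATION_PROB = {
    ("mera", "my"): 0.9,
    ("mera", "mine"): 0.8,
    ("ghar", "house"): 0.6,
    ("ghar", "home"): 0.7,
}
# BIGRAM_PROB[(prev, word)] = P(word | prev) in the target language
BIGRAM_PROB = {
    ("<s>", "my"): 0.5,
    ("<s>", "mine"): 0.1,
    ("my", "house"): 0.3,
    ("my", "home"): 0.4,
    ("mine", "house"): 0.01,
    ("mine", "home"): 0.01,
}
FLOOR = 1e-6  # probability assigned to unseen events


def candidates(source_word):
    """Target words that the toy table lists for a source word."""
    return [e for (f, e) in TRANSLATION_PROB if f == source_word]


def score(source_words, target_words):
    """Log of language-model probability times translation probability."""
    total, prev = 0.0, "<s>"
    for f, e in zip(source_words, target_words):
        total += math.log(BIGRAM_PROB.get((prev, e), FLOOR))
        total += math.log(TRANSLATION_PROB.get((f, e), FLOOR))
        prev = e
    return total


def decode(source_sentence):
    """Exhaustively search all word-by-word translations and keep the best."""
    source_words = source_sentence.split()
    # Unknown source words are passed through unchanged.
    options = [candidates(f) or [f] for f in source_words]
    best = max(product(*options), key=lambda t: score(source_words, t))
    return " ".join(best)


if __name__ == "__main__":
    print(decode("mera ghar"))
```

Exhaustive search is only feasible for toy inputs like this one; the number of candidates grows exponentially with sentence length, so practical decoders prune the search.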
English-Hindi Machine Translation system
Speech Recognition in IBM-IRL
Nitendra Rajput, “Statistical Language Modeling for Hindi Speech Recognition,” National Symposium on Modelling and Shallow Parsing of Indian Languages, MSPIL 2006.
M Kumar, N Rajput, A Verma, “Hybrid Baseform Builder for Phonetic Languages,” International Conference on Intelligent Sensor and Information Processing, Jan 2005, Chennai.
Mohit Kumar, Nitendra Rajput, Ashish Verma, “A large-vocabulary continuous speech recognition system for Hindi,” IBM Journal of Research and Development, Vol. 48, No. 5/6, 2004.
Nitendra Rajput, L. Venkata Subramaniam, Ashish Verma, “Adapting Phonetic Decision Trees Between Languages for Continuous Speech Recognition,” Proceedings: IEEE International Conference on Spoken Language Processing (ICSLP 2000), Beijing, China, Oct 16-20, 2000.
Niloy Mukherjee, Nitendra Rajput, L. Venkata Subramaniam, Ashish Verma, “On Deriving a Phoneme Model for a New Language,” Proceedings: IEEE International Conference on Spoken Language Processing (ICSLP 2000), Beijing, China, Oct 16-20, 2000.
Raghavendra Udupa U, Tanveer A Faruquie, Hemanta K Maji, “An algorithmic framework for the decoding problem in statistical machine translation,” COLING 2004.
R. Udupa and T. Faruquie, “An English-Hindi statistical machine translation system,” in Proceedings of the 1st IJCNLP, Sanya, Hainan Island, China.
Tanveer Faruquie, Nitendra Rajput, Vimal Raj, “Improving automatic call classification using machine translation,” IEEE ICASSP 2007, Honolulu, Hawaii, USA, Jan 2007.
EROCS: Entity RecOgnition in Context of Structured data
Exploit linked information analysis in core business
−Up-sell/Cross-sell, customer segmentation, campaign assessment, churn analysis.
Original text
Complaint: I have noticed in my statement that you have deducted Rs750 from my a/c (#20310284) as account maintenance fee. Can you please explain why you have charged this money?
Extracted entities and keywords/features
−CustID: 0205492
−SavingID: 20310284
−Unhappy, Simple Saving A/C
Complaint metadata + Customer/account data brought together by automatically linking the complaint with the customer/account
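The linking step in the complaint example can be sketched as follows. This is not the EROCS algorithm, which the slide does not describe; it is the simplest possible case, where the account number appears verbatim in the text and can be looked up directly. The `ACCOUNTS` table layout, the regular expression and the function name are assumptions; the identifiers and the complaint text are the ones shown on the slide.

```python
import re

# Hypothetical structured data: savings account number -> customer/account record.
ACCOUNTS = {
    "20310284": {"cust_id": "0205492", "account_type": "Simple Saving A/C"},
}

# Assumes account numbers are written as '#' followed by digits, as in the slide.
ACCOUNT_PATTERN = re.compile(r"#(\d+)")


def link_complaint(text):
    """Return the structured records whose account number is mentioned in the text."""
    links = []
    for saving_id in ACCOUNT_PATTERN.findall(text):
        record = ACCOUNTS.get(saving_id)
        if record is not None:
            links.append({"saving_id": saving_id, **record})
    return links


if __name__ == "__main__":
    complaint = (
        "I have noticed in my statement that you have deducted Rs750 from my "
        "a/c (#20310284) as account maintenance fee. Can you please explain "
        "why you have charged this money?"
    )
    for link in link_complaint(complaint):
        print(link)
```

Once the complaint is linked to a record, the complaint metadata and the customer/account data can be analyzed together, which is the point the slide makes.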
EROCS: Entity RecOgnition in Context of Structured data
Linkage Discovery
CallAssist
Transcript of Call
Customer to Agent: Hi, I am John ….….. status of a DVD player ….…….
Agent to Customer: …tell me the brand…?
Customer to Agent: …… I bought a Sony….
Customer | StoreId | Product | Brand
John Smith | S8976 | DVD Player | LG
John Parker | S8976 | DVD Player | Sony
[Figure: database schema with tables TRANSACTION, CUSTOMER, STORE, TRANSPROD, PRODUCT and MANUFACTURER, plus sample tables with columns C1, C2, C3]
Present relevant transaction data and follow-up question to the agent within seconds
Consistent, high-quality customer experience
Reduce agent training cost
Reduces privacy concerns
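The call example can be read as a matching problem: after the first customer turn, both transaction rows fit equally well, which is why a follow-up question about the brand helps. The sketch below illustrates that reading with a simple word-overlap score. It is an assumption about how such matching could work, not a description of CallAssist; the scoring function, the helper names and the simplified transcript strings are invented, while the two table rows come from the slide.

```python
import re

# Transaction rows as shown on the slide.
ROWS = [
    {"Customer": "John Smith", "StoreId": "S8976", "Product": "DVD Player", "Brand": "LG"},
    {"Customer": "John Parker", "StoreId": "S8976", "Product": "DVD Player", "Brand": "Sony"},
]


def tokens(text):
    """Lower-cased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_score(row, transcript_tokens):
    """Number of distinct row tokens that also occur in the transcript."""
    row_tokens = set()
    for value in row.values():
        row_tokens |= tokens(value)
    return len(row_tokens & transcript_tokens)


def rank_rows(transcript, rows):
    """Rows ordered from best to worst match against the transcript so far."""
    transcript_tokens = tokens(transcript)
    return sorted(rows, key=lambda r: match_score(r, transcript_tokens), reverse=True)


if __name__ == "__main__":
    # Simplified stand-ins for the elided transcript on the slide.
    first_turn = "Hi, I am John, status of a DVD player"
    second_turn = "I bought a Sony"

    t1 = tokens(first_turn)
    print([match_score(r, t1) for r in ROWS])   # both rows tie

    best = rank_rows(first_turn + " " + second_turn, ROWS)[0]
    print(best["Customer"], best["Brand"])      # the Sony row now ranks first
```

A tie after the first turn is the signal that more information is needed; the brand mentioned in the customer's next turn breaks the tie.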
That’s all for now…
IBM Research – India
Technical areas, applications, Services
Examples on:
−Speech-related work…
−Translation…
−Information Extraction…
FIRE: It has been a great start!
Thank you!