IBM Research December 14, 2008 FIRE 2008, Kolkata © 2008 IBM Corporation IBM India Research Laboratory Overview with an effort to be in the context of FIRE Debapriyo Majumdar ([email protected]) IBM India Research Lab, Bangalore

IBM Research

December 14, 2008

FIRE 2008, Kolkata © 2008 IBM Corporation

IBM India Research Laboratory Overview

with an effort to be in the context of FIRE

Debapriyo Majumdar ([email protected])
IBM India Research Lab, Bangalore


IBM Research - Overview

The largest private research institution in the world

Annual R&D budget of around $6B (includes development as well)

Over 3,000 researchers

Mathematics, Computer Science, Physics, Service Science, …

Over 40,000 US patents since 1993

−More patents than any other company in the world over the last 15 years

Eight labs across the world


IBM Research Labs Worldwide

Watson (Columbia University): established 1945
Watson: established 1961
San Jose / Almaden: established 1952 / 1986
Zürich: established 1956
Haifa: established 1972
Tokyo: established 1982
Austin: established 1995
Beijing: established 1995
India (Delhi/Bangalore): established 1998/2005


IBM India and India Research Lab

IBM India - Second largest population of IBM outside the US (over 75,000)

−Current technical population 40,000+

India Research Lab

−Delhi, since 1998

−Bangalore, since 2005

−About 150 technical people

Locations: Bangalore, Chennai, Hyderabad, Mumbai, Pune, Kolkata, Delhi

IBM India Business Units

Application Services

Business Process Transformation Services

India Software Lab (ISL)

Global Service Delivery Center

India Research Lab (IRL)

Domestic Operations/Others



IRL Focus Areas

Technical Competencies

Computer Science
−Distributed Systems: systems mgmt., middleware
−Information Management: IE, data mining
−Interaction Technologies: speech
−Programming Technologies: parallel and hi-perf. prog.
−Software Engineering: model-driven, distributed dev.

Math Science
−Operations Research
−Algorithms
−Optimization
−Game Theory

Service Science
−Service Engineering
−Service Productivity
−Service Management
−Service Quality
−Service Supply Chains

Business Areas
−Service Delivery
−Infrastructure Services
−Application Services
−Contact Center Services
−Emerging Solutions
−Telecom
−Others (Banking, etc.)
−Software
−Systems


Why do we care? Services dominate the world’s GDP…

[Figure: four charts, for Japan, the United States, India and China; vertical axis 0% to 100%, horizontal axis years from about 1800 to 2010]


IRL Focus Areas: in the context of FIRE


Information and Knowledge Management @ IRL

Speech recognition and synthesis

−Hindi, Indian English & Hinglish

Translation: Hindi to English and English to Hindi

UIMA Annotators (rule based)

−with IIT-Bombay

Linking structured and unstructured data

Learning attributes from noisy or incomplete information

−For example, customer transaction logs

More…


Challenges

Data

−Noisy

−Incomplete

−Could be ill-structured

Problem

−Defining the problem is often our job too

Focus on the application

−What you build must work

−Users must be satisfied

−Firefighting


Some Examples…


Speech - Core Technologies

Desktop speech recognition (Hindi & Indian English)

−More than 1100 speakers

−More than 250 hours of broadband speech data

−Vocabulary of 75000 words

−Accuracy: 90-95%

Telephony speech recognition

−500 speakers each for Hindi, English & Hinglish

−A prototype for a movie booking system in Indian English


SENSEI: Voice and Accent Training for Call-Centers

Challenges

−Increase in the number of call centers in India

−Agents need to speak in a foreign accent

−Very high attrition rates in call-centers

−Hiring involves evaluation and training

Solution: Sensei, a tool that is used for:

−Candidate Screening: evaluates a candidate’s pronunciation, grammar and fluency

−On-board Training: evaluates correctness of sounds produced, syllable stress, speaking rate and fluency

−Monitoring: analyzes pre-recorded calls to determine if the agent maintained the required quality of voice/accent

Application: Cost reduction by automation of Accent Training and Evaluation

The Hindu, 30 Oct. 2006


Machine Translation: Linguistic & Statistical

Linguistic Approach

−Linguistic Experts: Dictionaries, Lexical rules, Grammar

−Source Text, Translation, Translated Text

Statistical Machine Translation

−Parallel Corpora, Training, Statistical Translation Modeling

−Language model, alignment, translation probabilities, Decoding

−Source Text, SMT, Translated Text
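To make the statistical components named on this slide a little more concrete, here is a minimal sketch of how a language model and translation probabilities can be combined when decoding. It uses the standard noisy-channel formulation, picking the target sentence e that maximizes P(e) · P(f | e) for a source sentence f. Everything specific here is an assumption for illustration: the probability values are invented, the romanized Hindi words and function names are hypothetical, alignment is reduced to a one-to-one, same-position mapping, and "decoding" is an exhaustive search over a hand-written candidate list. It is not the system described in the talk.

```python
import math

# Hypothetical translation table P(f | e): probability of a (romanized) Hindi
# word given an English word. All values are invented for illustration.
TRANSLATION_PROB = {
    ("bada", "big"): 0.9,
    ("bada", "large"): 0.6,
    ("ghar", "house"): 0.8,
    ("ghar", "home"): 0.7,
}

# Hypothetical bigram language model P(e_i | e_{i-1}) for English.
BIGRAM_PROB = {
    ("<s>", "big"): 0.4,
    ("<s>", "large"): 0.2,
    ("big", "house"): 0.5,
    ("big", "home"): 0.1,
    ("large", "house"): 0.3,
    ("large", "home"): 0.1,
}

FLOOR = 1e-6  # small probability assigned to anything not in the tables


def language_model_logprob(target):
    """Log P(e) under the toy bigram model."""
    previous = "<s>"
    total = 0.0
    for word in target:
        total += math.log(BIGRAM_PROB.get((previous, word), FLOOR))
        previous = word
    return total


def translation_logprob(source, target):
    """Log P(f | e), assuming word i of the source aligns to word i of the target."""
    if len(source) != len(target):
        return math.log(FLOOR)
    return sum(
        math.log(TRANSLATION_PROB.get((f, e), FLOOR))
        for f, e in zip(source, target)
    )


def decode(source, candidates):
    """Return the candidate with the highest combined log score."""
    return max(
        candidates,
        key=lambda e: language_model_logprob(e) + translation_logprob(source, e),
    )


CANDIDATES = [
    ("big", "house"),
    ("big", "home"),
    ("large", "house"),
    ("large", "home"),
]
best = decode(("bada", "ghar"), CANDIDATES)
```

With these made-up numbers, `best` is `("big", "house")`: the language model prefers "big house" and the translation table agrees. A real decoder has to search a far larger space and handle reordering, which is where the alignment and decoding work listed on the slide comes in.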


English-Hindi Machine Translation system


Speech Recognition in IBM-IRL

Nitendra Rajput, “Statistical Language Modeling for Hindi Speech Recognition,” National Symposium on Modelling and Shallow Parsing of Indian Languages, MSPIL 2006.

M Kumar, N Rajput, A Verma, “Hybrid Baseform Builder for Phonetic Languages,” International Conference on Intelligent Sensor and Information Processing, Jan 2005, Chennai.

Mohit Kumar, Nitendra Rajput, Ashish Verma, “A large-vocabulary continuous speech recognition system for Hindi,” IBM Journal of Research and Development, Vol. 48, No. 5/6, 2004.

Nitendra Rajput, L. Venkata Subramaniam, Ashish Verma, “Adapting Phonetic Decision Trees Between Languages for Continuous Speech Recognition,” Proceedings: IEEE International Conference on Spoken Language Processing (ICSLP 2000), Beijing, China, Oct 16-20, 2000.

Niloy Mukherjee, Nitendra Rajput, L. Venkata Subramaniam, Ashish Verma, “On Deriving a Phoneme Model for a New Language,” Proceedings: IEEE International Conference on Spoken Language Processing (ICSLP 2000), Beijing, China, Oct 16-20, 2000.

Raghavendra Udupa U, Tanveer A Faruquie, Hemanta K Maji, “An algorithmic framework for the decoding problem in statistical machine translation,” COLING 2004.

R. Udupa and T. Faruquie, “An English-Hindi statistical machine translation system,” in Proceedings of the 1st IJCNLP, Sanya, Hainan Island, China.

Tanveer Faruquie, Nitendra Rajput, Vimal Raj, “Improving automatic call classification using machine translation,” IEEE ICASSP 2007, Honolulu, Hawaii, USA, Jan 2007.


EROCS: Entity RecOgnition in Context of Structured data

Exploit linked information analysis in core business

Up-sell/Cross-sell, customer segmentation, campaign assessment, churn analysis.

Original text

Complaint:

I have noticed in my statement that you have deducted Rs750 from my a/c (#20310284) as account maintenance fee. Can you please explain why you have charged this money?

Extracted entities and keywords/features

−CustID: 0205492

−SavingID: 20310284

−Unhappy, Simple Saving A/C

Complaint metadata + Customer/account data brought together by automatically linking the complaint with the customer/account
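The linking step in this example can be sketched in a few lines. This is only an illustration of the idea of joining free text to a structured record, not the EROCS method itself, which the slide does not specify. It assumes, hypothetically, that an account number appears in the text as "#" followed by digits, that accounts live in an in-memory dictionary, and that a handful of cue words are enough to flag an unhappy customer. The function name and the cue list are made up.

```python
import re

# Hypothetical structured data, keyed by savings account number.
# The IDs and product label are the ones shown in the slide's example.
ACCOUNTS = {
    "20310284": {"cust_id": "0205492", "product": "Simple Saving A/C"},
}

# Assumed surface form of an account reference: '#' followed by 6+ digits.
ACCOUNT_PATTERN = re.compile(r"#(\d{6,})")

# Assumed cue words for a complaint; a real system would need far more.
COMPLAINT_CUES = ("deducted", "charged", "explain why")


def link_complaint(text):
    """Link a complaint to an account record, or return None if no link is found."""
    match = ACCOUNT_PATTERN.search(text)
    if match is None:
        return None
    saving_id = match.group(1)
    account = ACCOUNTS.get(saving_id)
    if account is None:
        return None
    lowered = text.lower()
    unhappy = any(cue in lowered for cue in COMPLAINT_CUES)
    return {
        "CustID": account["cust_id"],
        "SavingID": saving_id,
        "Product": account["product"],
        "Sentiment": "Unhappy" if unhappy else "Unknown",
    }


COMPLAINT = (
    "I have noticed in my statement that you have deducted Rs750 from my a/c "
    "(#20310284) as account maintenance fee. Can you please explain why you "
    "have charged this money?"
)
linked = link_complaint(COMPLAINT)
```

For the complaint on the slide, `linked` carries CustID 0205492, SavingID 20310284, the product label and the "Unhappy" flag. The sketch covers only the easy case where an explicit identifier appears in the text; noisy or missing identifiers need something stronger than a regular expression.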


EROCS: Entity RecOgnition in Context of Structured data

Linkage Discovery


CallAssist

Transcript of Call

Customer to Agent: Hi, I am John … status of a DVD player …

Agent to Customer: … tell me the brand …?

Customer to Agent: … I bought a Sony …

Customer      StoreId   Product      Brand
John Smith    S8976     DVD Player   LG
John Parker   S8976     DVD Player   Sony

[Figure: database schema with tables TRANSACTION, CUSTOMER, STORE, TRANSPROD, PRODUCT and MANUFACTURER]

[Figure: animation over an example table with columns C1, C2, C3 and rows 1 to 8]

Present relevant transaction data and follow-up question to the agent within seconds

−Consistent, high-quality customer experience

−Reduce agent training cost

−Reduces privacy concerns
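The narrowing-down shown in this example (two matching customers named John until the brand is mentioned) can be sketched as follows. This is a toy illustration under stated assumptions, not the CallAssist implementation: the rows come from the table on the slide, while the scoring rule (count how many fields share a word with the transcript so far) and all function names are hypothetical.

```python
import re

# Transaction rows as shown in the slide's example table.
TRANSACTIONS = [
    {"customer": "John Smith", "store_id": "S8976", "product": "DVD Player", "brand": "LG"},
    {"customer": "John Parker", "store_id": "S8976", "product": "DVD Player", "brand": "Sony"},
]

# Fields compared against the transcript (an assumption of this sketch).
FIELDS = ("customer", "product", "brand")


def words(text):
    """Lower-cased set of alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def candidate_rows(transcript, rows):
    """Return the rows that share the most fields with the transcript so far."""
    spoken = words(transcript)
    scored = [
        (sum(1 for field in FIELDS if words(row[field]) & spoken), row)
        for row in rows
    ]
    best = max((score for score, _ in scored), default=0)
    if best == 0:
        return []
    return [row for score, row in scored if score == best]


first_turn = "Hi, I am John, status of a DVD player"
after_first = candidate_rows(first_turn, TRANSACTIONS)

second_turn = first_turn + " I bought a Sony"
after_second = candidate_rows(second_turn, TRANSACTIONS)
```

After the first turn both rows tie, which is the situation where a follow-up question such as the brand is useful. Once "Sony" is heard, only the John Parker row remains.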


That’s all for now…

IBM Research – India

Technical areas, applications, Services

Examples on:

−Speech related works…

−Translation…

−Information Extraction…

FIRE: It has been a great start!


Thank you!