INTRODUCING THE WEB INTELLIGENCE (WIT) GROUP Microsoft Research Asia

Upload: roland-tyler

Post on 02-Jan-2016


TALK OUTLINE

Introducing WIT – Web InTelligence Group

SQuAD

Summary

Mission Statement

Enable synergetic collaboration between people, and between people and computers, to enlighten them and enrich their lives

http://research.microsoft.com/en-us/groups/WIT/

Vision – a Web with Intelligence

Satisfy user needs, simplify key tasks, promote serendipitous discovery, and foster task-oriented social networks

Web InTelligence group (WIT)

I’m the manager!

Chin-Yew Lin

Tetsuya Sakai

Yunbo Cao

Wei Lai

Bo Wang

Young-In Song

I’m the SECOND Japanese researcher at MSRA!

I’m the FIRST Korean researcher at MSRA!

WIT spun off from the Natural Language Computing group in June 2009!

I joined MSRA in April 2009!

I joined MSRA in May 2009!

WIT research topics

• Social question answering and summarisation
• Sentiment analysis
• Expert and social search
• User intent/activity recognition and prediction
• Inarticulate user assistance
• Information access evaluation

TALK OUTLINE

Introducing WIT – Web InTelligence Group

SQuAD

Summary

Mining Community Knowledge: Social Q&A and Its Application

Web Intelligence (WIT), Microsoft Research Asia

Chin-Yew LIN

Search vs. Question Answering (QA)

Understanding what users want is difficult!

User intention

Search vs. Question Answering (QA)

QA Complements Search

              short queries        long queries
              high   mid   low     high   mid   low
Query           50    50    50       49    50    50
Question       134   122    94      136   119    67
Total          184   172   144      185   169   117

• short: length <= 2, long: length >= 3
• high: freq > 100K, mid: between 1K and 50K, low: freq < 300
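The bucket definitions in the notes can be written down as a small classifier. This is only a sketch: it assumes query length is counted in whitespace-separated terms (the slide does not state the unit), and it treats frequencies that fall between the stated bands as unassigned. The function names are hypothetical.

```python
from typing import Optional


def length_bucket(query: str) -> str:
    """Return "short" for at most 2 terms, "long" for 3 or more.

    Assumption: length is counted in whitespace-separated terms.
    """
    return "short" if len(query.split()) <= 2 else "long"


def frequency_bucket(freq: int) -> Optional[str]:
    """Map a query frequency to the bands given in the notes."""
    if freq > 100_000:
        return "high"
    if 1_000 <= freq <= 50_000:
        return "mid"
    if freq < 300:
        return "low"
    # Falls in a gap between the stated bands (assumed to be unsampled).
    return None


bucket = (length_bucket("forbidden city"), frequency_bucket(200_000))
```

Leaving the gaps unassigned keeps the three bands clearly separated, which appears to be the intent of the thresholds on the slide.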

Goal: Create a scalable question and answering service

Methods:
• Index all question and answer pairs (QnA) and their authors on the web
• Enrich QnA through summarization
• Expand QnA database by auto-posting questions to and acquiring answers from community QnA services
• Refine QnA through Wiki-style online collaboration

Motivations:
• Leverage and add value to search
• Leverage questions that already have been answered
• Leverage people’s knowledge and their networks

Scalable Question Answering & Distillation

CampusCS

Baidu Zhidao (百度知道)

• 17,012,767 resolved questions in two years’ operation
• 8,921,610 are knowledge related
• 96.7% of questions are resolved
• 10,000,000 daily visitors
• 71,308 new questions per day
• 3.14 answers per question

http://www.searchlab.com.cn (中国人搜索行为研究 / User Research Lab of Chinese Search)

A Traditional QA Architecture

A QA system gives direct answers to a question instead of documents

Falcon QA system (LCC)
• Moldovan et al. ACL 2000
• Surdeanu et al. IEEE Trans. PDS 2002
• Best QA system in TREC 8 & 9

Average question answering time
• TREC 8: 48 seconds
• TREC 9: 94 seconds

Module   TREC 8              TREC 9
QP        1.1%                1.2%
PR       44.4% (21.3 sec)    26.5% (24.9 sec)
PS        5.4%                2.2%
PO        0.1%                0.1%
AP       48.7% (23.4 sec)    69.7% (65.5 sec)

Falcon QA system module analysis: processing time
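The seconds shown next to PR and AP follow from the average answering time and each module's percentage share. A short sketch of that arithmetic, using only the numbers on the slide (the dictionary and function names are illustrative):

```python
# Average answering time per question, in seconds (from the slide).
AVG_SECONDS = {"TREC 8": 48.0, "TREC 9": 94.0}

# Share of processing time per module, in percent (from the table).
MODULE_SHARE = {
    "TREC 8": {"QP": 1.1, "PR": 44.4, "PS": 5.4, "PO": 0.1, "AP": 48.7},
    "TREC 9": {"QP": 1.2, "PR": 26.5, "PS": 2.2, "PO": 0.1, "AP": 69.7},
}


def module_seconds(track: str) -> dict:
    """Convert each module's percentage share into seconds, to one decimal."""
    total = AVG_SECONDS[track]
    return {
        module: round(total * pct / 100.0, 1)
        for module, pct in MODULE_SHARE[track].items()
    }


trec8 = module_seconds("TREC 8")
trec9 = module_seconds("TREC 9")
```

For example, 48 s × 44.4% ≈ 21.3 s for PR in TREC 8, and 94 s × 69.7% ≈ 65.5 s for AP in TREC 9. PR and AP together account for well over 90% of the time in both years.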

Traditional IR

http://weblogs.hitwise.com/leeann-prescott/2006/12/yahoo_answers_captures_96_of_q.html

Yahoo! Answers has 19,041,128 resolved questions in 26 categories adding about 48K questions per day. (August 24, 2007)

Community Question and Answering

Community QnA in Detail

Context 2

Topic

Context 1

topic

Online Discussion Forum

FAQ

About 28,424,184 results on Live Search using query: “FAQ travel”

(Google: about 64,200,000)

Context dependent

Challenges

List of Papers Accepted

Recommending Questions Using the MDL-based Tree Cut Model – Cao et al.; WWW 2008

Searching Questions by Identifying Question Topic and Question Focus – Duan et al.; ACL 2008

Using Conditional Random Fields to Extract Contexts and Answers of Questions from Online Forums – Ding et al.; ACL 2008

Finding Question Answer Pairs from Online Forums – Cong et al.; SIGIR 2008

Question Utility: A Novel Static Ranking of Question Search – Song et al.; AAAI 2008

Answer Summarization: Understanding and Summarizing Answers in Community-Based Question Answering Services – Liu et al.; COLING 2008

Automatic Question Generation from Queries – Lin; NSF Workshop on Question Generation Shared Task and Evaluation Challenge 2008

Question Mining & Answering (ACL 2008 & SIGIR 2008)

Extract question and answer pairs

Community QnA
• Create a resolved question list
• Extract & index question, best answer, and other answers
• Live QnA, Yahoo! Answers, Baidu Zhidao, …

Forum
• Extract and index threads and postings, find questions and their answers

QA Pairs in Online Forums

Question Search & Recommendation (ACL 2008 & WWW 2008)

Query
• We would like to know what will be available to see in the Forbidden City because we understand that it will be under repairs.

Question search
• Is it true that the Forbidden City is undergoing renovation & we won't be allow to enter?

Question recommendation
• Would you get a lower price by not needing a guide for the Forbidden City and etc?
• Can anybody recommend a budget hotel near Forbidden City?

Question = Topic + Focus + Others (TFO)
• Search: same topic, similar foci
• Recommend: same topic, different foci
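The search/recommend split can be illustrated with a toy sketch. It assumes each question already carries a topic and a set of focus terms (hand-labelled here; the slide does not show how they are obtained), and it uses Jaccard overlap with an arbitrary threshold as a stand-in for "similar foci". All names and the threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    text: str
    topic: str        # e.g. "forbidden city"
    focus: frozenset  # focus terms, hand-labelled for illustration


def jaccard(a: frozenset, b: frozenset) -> float:
    """Overlap of two term sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search(query: Question, pool, threshold: float = 0.3):
    """Question search: same topic, similar focus."""
    return [q for q in pool
            if q.topic == query.topic
            and jaccard(q.focus, query.focus) >= threshold]


def recommend(query: Question, pool, threshold: float = 0.3):
    """Question recommendation: same topic, different focus."""
    return [q for q in pool
            if q.topic == query.topic
            and jaccard(q.focus, query.focus) < threshold]
```

With the Forbidden City query labelled with a renovation focus, the renovation question would come back from `search`, while the guide-price and budget-hotel questions would come back from `recommend`.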

Identifying Topic and Focus

Specificity: the inverse of the entropy of the topic term's distribution over the sub-categories

Order topic terms by their specificity
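A minimal sketch of the specificity score, assuming each topic term comes with counts of how often it occurs in each sub-category. The small `epsilon` is an assumed smoothing constant that keeps the score finite when a term appears in only one sub-category (entropy 0); the counts used in the test are made up for illustration.

```python
import math


def specificity(subcategory_counts: dict, epsilon: float = 0.001) -> float:
    """Inverse entropy of a term's distribution over sub-categories.

    `subcategory_counts` maps sub-category name -> occurrence count and
    must contain at least one positive count.
    """
    total = sum(subcategory_counts.values())
    entropy = -sum(
        (count / total) * math.log(count / total)
        for count in subcategory_counts.values()
        if count > 0
    )
    return 1.0 / (entropy + epsilon)


def order_by_specificity(term_counts: dict) -> list:
    """Sort topic terms from most specific to least specific."""
    return sorted(
        term_counts,
        key=lambda term: specificity(term_counts[term]),
        reverse=True,
    )
```

A term concentrated in one sub-category (low entropy) scores high and sorts first; a term spread evenly across sub-categories scores low and sorts last.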

Travel @Yahoo! Answers

Asia Pacific

Europe

China

Japan

China
1. Anyone know where to see the Dragon Boat Festival in Beijing?
2. Where is a good (Less expensive) place to shop in Beijing?
3. What's the cheapest way to get from Beijing to Hong Kong?

Europe
1. How far is it from Berlin to Hamburg?
2. What is the cheapest way from Berlin to Hamburg?
3. Where to see between Hamburg and Berlin?
4. How long does it take from Hamburg to Berlin?

Question Utility (AAAI 2008)

Motivation
• How useful is a question?
• How should we rank questions without queries?

Definition
• How likely is it that a question would be asked again?

p(Q'|Q): the probability of generating query Q' from question Q (relevance score)

p(Q): the prior probability of question Q, reflecting a static rank of the question, i.e. question utility

$$\arg\max_{Q} p(Q \mid Q') = \arg\max_{Q} \frac{p(Q)\, p(Q' \mid Q)}{p(Q')} = \arg\max_{Q} p(Q)\, p(Q' \mid Q)$$

$$\arg\max_{Q} p(Q \mid Q') = \arg\max_{Q} p(Q) \prod_{w \in Q'} p(w \mid Q)$$
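The ranking rule above, the prior p(Q) times the product of p(w|Q) over the query words, can be sketched in log space as follows. The slide does not say how p(w|Q) or p(Q) are estimated, so this sketch assumes a unigram model with add-alpha smoothing for p(w|Q) and takes p(Q) as a caller-supplied dictionary; both choices, and all function names, are assumptions.

```python
import math
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase whitespace tokenizer (an assumption, kept deliberately simple)."""
    return text.lower().split()


def log_p_query_given_question(query, question, vocab_size, alpha=1.0):
    """Sum of log p(w|Q) over query words, with add-alpha smoothing."""
    counts = Counter(tokenize(question))
    length = sum(counts.values())
    return sum(
        math.log((counts[w] + alpha) / (length + alpha * vocab_size))
        for w in tokenize(query)
    )


def rank(query, questions, prior):
    """Order questions by log p(Q) + sum_w log p(w|Q), best first."""
    vocab = {w for q in questions for w in tokenize(q)} | set(tokenize(query))

    def score(question):
        return math.log(prior[question]) + log_p_query_given_question(
            query, question, len(vocab)
        )

    return sorted(questions, key=score, reverse=True)
```

With a uniform prior the order is driven by the relevance term alone; a larger prior for one question (higher utility) can move it ahead of an equally relevant competitor.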

Answer Summarization (COLING 2008)

Example: “Where to stay in Paris?”
• 2,645 answers (Yahoo! Answers 03/04/09)
• Is the “best answer” the best answer?

Question clustering
• Find similar questions

Answer summarization
• Aggregate answers for a question cluster

Answer Taxonomy

Question Taxonomy

Travel FAQ

Microsoft Travel Guide Http://travel.msra.cn

TALK OUTLINE

Introducing WIT – Web InTelligence Group

SQuAD

Summary

Knowledge Distillation & Dissemination

Knowledge Distillation and Dissemination

Mixed Mode Question Answering

Q&A = Knowledge = Power

Q&A is a complement to web keyword search

Q&A can enhance existing QnA and search services

Leverage existing knowledge in question-and-answer form, along with its authors

Acquire or elicit human knowledge automatically

Discussion