Multimodal Information Access and Synthesis A DHS Institute of Discrete Science Center @ UIUC Dan Roth Department of Computer Science University of Illinois at Urbana/Champaign


Page 1: Dan Roth Department of Computer Science  University of Illinois at Urbana/Champaign

Multimodal Information Access and Synthesis
A DHS Institute of Discrete Science Center @ UIUC

Dan Roth
Department of Computer Science
University of Illinois at Urbana/Champaign

Page 2: MIAS Mission

MIAS: Multi-Modal Information Access & Synthesis

Most of the data today is unstructured: books, newspaper articles, journal publications, field reports, images, and audio and video streams.

How can we deal with the huge amount of unstructured data as if it were organized in a database with a known schema?

How can we locate, access, organize, analyze, and synthesize unstructured data?

MIAS Mission: develop the theories, algorithms, and tools for analysts to
  access a variety of data formats and models,
  integrate them with existing resources, and
  transform raw data into useful and understandable information.

Page 3: Task Perspective

In the next decade, intelligence analysts will need to
  monitor a huge number of interesting events and entities, and
  formulate and evaluate hypotheses with respect to them.

Analysts must interact, at the appropriate level of semantic abstraction, with a system that can
  synthesize, summarize, and interpret vast amounts of multimodal information,
  integrate observed data with domain models and background information in multiple formats,
  propose hypotheses, and help verify them.

Page 4: Scenario

Consider an intelligence analyst researching a problem:
  Iranian nuclear program – generate a list of Iranian nuclear scientists, affiliations, specialties, biographies, photos, and notable recent activities.
  Medical treatment – what is known about it; who are the experts; what do users say about it; what side effects have been reported.

Current technologies have solved the problem of collecting and storing huge amounts of information, so it is reasonable to assume that the information she is after does exist.

However, multiple barriers stand in the way of successfully completing the analysis, each posing a significant research challenge.

Page 5: Multimodal Information Access & Synthesis

[Figure: overview diagram of Multimodal Information Access & Synthesis. The recoverable labels are listed below.]

Online Data Sources
  News articles, field reports, specific Web sites, text repositories, relational databases, surveillance videos, ...

Focused Multimodal Data Retrieval
  Web pages, text documents, relational data, images

Infer Metadata: Semantic entities
  Example: name = Elkhan Factory; loc = Northern Iran; topics = fertilizer, enrichment
  Semantic categories, temporal categories, subjectivity/opinions

Discover Relations Between Semantic Entities
  Example entities: U. of Tehran, Elkhan Factory, Saeed Zakeri
  Example relation labels: visited, attended

Semantic Disambiguation & Integration across multiple sources and modalities

Support Information Analysis, Knowledge Discovery, Monitoring
  Discover unusual events, entities, and associations
  Continuous monitoring of events, entities & associations
  Rapid retrieval of all info. about a particular entity
  Efficient search, querying, question answering, browsing, mining,
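The slide's example (an entity record for Elkhan Factory, plus the relation labels "visited" and "attended") can be illustrated with a minimal sketch of an entity-and-relation store that supports "rapid retrieval of all info. about a particular entity". This is not the MIAS system itself: the `Entity` and `EntityGraph` classes and their method names are hypothetical, and the pairing of each relation label with specific entities is an assumption, since the slide's diagram did not survive extraction.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A semantic entity with inferred metadata (attribute names are illustrative)."""
    name: str
    metadata: dict = field(default_factory=dict)


class EntityGraph:
    """Hypothetical in-memory store of entities and typed relations between them."""

    def __init__(self):
        self.entities = {}                    # entity name -> Entity
        self.relations = []                   # (subject, predicate, object) triples
        self._by_entity = defaultdict(list)   # entity name -> indices into relations

    def add_entity(self, name, **metadata):
        self.entities[name] = Entity(name, dict(metadata))

    def add_relation(self, subject, predicate, obj):
        # Both endpoints must already be known entities.
        for name in (subject, obj):
            if name not in self.entities:
                raise KeyError(f"unknown entity: {name}")
        index = len(self.relations)
        self.relations.append((subject, predicate, obj))
        self._by_entity[subject].append(index)
        if obj != subject:
            self._by_entity[obj].append(index)

    def everything_about(self, name):
        """Return the metadata and every relation that mentions the entity."""
        entity = self.entities[name]
        triples = [self.relations[i] for i in self._by_entity[name]]
        return {"metadata": entity.metadata, "relations": triples}


graph = EntityGraph()
graph.add_entity("Elkhan Factory", loc="Northern Iran",
                 topics=["fertilizer", "enrichment"])
graph.add_entity("U. of Tehran")
graph.add_entity("Saeed Zakeri")
# Assumption: which entities each relation label connects is guessed
# from the slide's labels, not stated in the source.
graph.add_relation("Saeed Zakeri", "visited", "Elkhan Factory")
graph.add_relation("Saeed Zakeri", "attended", "U. of Tehran")

info = graph.everything_about("Elkhan Factory")
print(info["metadata"]["loc"])
print(info["relations"])
```

Indexing relations by both endpoints keeps the per-entity lookup proportional to the number of relations that mention the entity, which is the property the "rapid retrieval" bullet asks for. A real system would also need the disambiguation step the slide mentions, so that different mentions of the same entity resolve to one record.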

Page 6: MIAS Processes

Focused data retrieval and integration
  Identify and collect relevant data from multiple sources

Semantic data enrichment
  Infer semantics from unstructured data and images
  Allow navigation and search across disparate data modalities; augment KBs

Entity identification and relations discovery
  Identify real-world entities and relations among them
  Relate them to existing institutional resources for information integration

Knowledge discovery and hypotheses generation and verification
  Construct the rich semantic structure and hidden networks of entity linkages

Foundations
  Machine learning, database and data mining, natural language processing, inference and optimization, and computer vision techniques
  Required for and driven by the aforementioned problems

Page 7: Integrated Mission: Research & Education

Develop diverse human resources to enhance the scientific research, educational, and governmental workforce in MIAS.

Educational and Outreach Initiatives:
  Target students from small research programs & minority-serving institutions
  Expose them to the national labs
  Open opportunities for bigger impact

A comprehensive education program designed to increase participation in the study and practice of MIAS topics:
  Provide substantive training for a new generation of experts in the field
  Serve as a tool for recruiting an experienced group of undergraduates into graduate study in one of the broad fields of information science
  Be an intellectual community center, where participants at all levels of expertise come together in an enriched environment of collaboration

Page 8: Data Science Summer Institute at UIUC

1st DSSI: May 2007
8-week course
25 students
Faculty from UIUC, Kansas State, UTSA, UTEP
Huge Success

[Figure: schedule diagram. Recoverable labels: Mathematical Foundations of Information Science; Research (4 weeks); Research (4 weeks); Tutorials 1 to 5; Speaker Series.]

Intensive Course in The Math of Data Sciences

1. Probability and Statistics

2. Linear Algebra

3. Data Structures and Algorithms

4. Optimization

5. Learning & Clustering

Research Projects
Led by co-PIs and Grad Students
Topics:

1. Virtual Web: Focused Crawling

2. Relations and Entities

3. Text and Images

Advanced MIAS Related Tutorials

Page 9: Data Science Summer Institute at UIUC: Advanced MIAS Related Tutorials

[Figure: schedule diagram. Recoverable labels: Mathematical Foundations of Information Science; Research (4 weeks); Research (4 weeks); Tutorials 1 to 5.]

Advanced MIAS Related Tutorials
  Machine Learning & Data Mining
  Information Retrieval & Web Information Access
  Computer Vision
  Natural Language Processing & Information Extraction
  Databases & Information Integration

Page 10: Data Science Summer Institute at UIUC

Page 11: Our Team

Leading researchers in intelligent information access & analysis and its foundations

A large number of affiliates/consultants covering all areas of interest to the MIAS center.

Page 12: Kevin C. Chang: Data Integration and Retrieval

Deep Web
MetaQuerier: Large-scale integration over deep Web
  Pioneered the “holistic integration” paradigm
  Widely published at SIGMOD, VLDB, ICDE
  System building and demo at CIDR, SIGMOD, VLDB, …

PC/editor/organizer of SIGMOD, ICDE, WWW, SIGKDD special issue, WIRI06, IIWeb06 workshops

Awards: NSF CAREER, IBM Faculty, NCSA Faculty, VLDB00 Best Paper Selection

Page 13: AnHai Doan: Data Integration

Monitoring People and Events
Data integration
Matching Schemas, Ontologies, Entities
Integrating databases and text

ACM Doctoral Dissertation Award 2004
Sloan Fellowship 2006
Edited special issues on Data Integration
Co-chaired workshops on data integration, Web technologies, machine learning
Co-writing a book titled “Data Integration”

Page 14: David Forsyth: Computer Vision and Learning

Linking Text and Images
Labeling images via (a lot of) caption text

Leading Computer Vision researcher
Over 110 papers on vision, graphics, learning applications
Program Chair, CVPR 2000; General Chair, CVPR 2006; regular member of PC in all major vision conferences
IEEE Technical Achievement Award, 2006
Lead author of main textbook, widely adopted

Page 15: Jiawei Han: Data Mining

Mining patterns and knowledge discovery from massive data
Research focus: Mining data streams, frequent patterns, sequential patterns, graph patterns, and their applications
Privacy preserving Data Mining

Developed many popular data mining algorithms, e.g., FPgrowth, PrefixSpan, gSpan, StarCubing, CrossMine, RankingCube, and CrossClus
Over 300 research papers published in conferences and journals
Editor-in-Chief, ACM Transactions on Knowledge Discovery from Data
Textbook, “Data mining: Concepts and Techniques,” adopted worldwide

Page 16: Cinda Heeren: Data Mining, Education

Summer School
MIAS Summer School Director
Mathematical Foundation of Data Science
Discrete

UIUC CS Department Director of Diversity Programs
Lecturer for Discrete Math, Data Structures courses at UIUC
One of the leading teachers and educational leaders at UIUC
Research in Data Mining and Databases

Speaker and regular presenter at conferences for young women, including:
  GAMES and WYSE summer camps
  Expanding Your Horizons careers conference

Grace Hopper Celebration of Women in Computing 2004 - panel on best practices for recruiting women into undergraduate programs in CS.

SIGCSE 2006 - workshop, “How to host your own Small Regional Celebration of Women in Computing.”

Page 17: ChengXiang Zhai: Information Retrieval and Text Mining

Probabilistic Paradigms for Information Retrieval
Personalized/Context Dependent Search
Relation Identification and Data Integration

Leading expert in information retrieval and search technologies
Recipient of the 2004 Presidential Early Career Award for Scientists and Engineers (PECASE)
Main architect and key contributor of the Lemur Toolkit (being used by many research groups and IR companies around the world)
ACM SIGIR’04 best paper award
Selected services include Program Chair of ACM CIKM 2004; HLT/NAACL 06; SIGIR’09

Page 18: Dan Roth: Machine Learning, NLP

Semantic Analysis and Data enrichment
Entity and Relation Identification and Integration
Textual Entailment
Machine Learning Methods for NLP and IE

Leading Researcher in Machine Learning, NLP, AI
Developed a popular machine learning system and machine-learning-based NLP tools used in industry and NLP classes
Program Chair, ACL’03, CoNLL’02; regular senior PC member in all major Machine Learning, NLP and AI conferences
Associate Editor: Journal of Artificial Intelligence Research; Machine Learning
Multiple paper awards

Page 19: Information Access & Synthesis Processes

Focused data retrieval and integration
  Identify and collect relevant data from multiple sources

Semantic data enrichment
  Infer semantics from unstructured data and images
  Allow navigation and search across disparate data modalities; augment KBs

Entity identification and relations discovery
  Identify real-world entities and relations among them
  Relate them to existing institutional resources for information integration

Knowledge discovery and hypotheses generation and verification
  Construct the rich semantic structure and hidden networks of entity linkages

Foundations
  Machine learning, database and data mining, natural language processing, inference and optimization, and computer vision techniques
  Called for and driven by the aforementioned problems

Tools
  Text Processing & Analysis
  Semantic Analysis & Information Extraction
  Information Integration
  Machine Learning & Data Mining
  Integrating Text & Images