Multimodal Information Access and Synthesis
A DHS Institute of Discrete Science Center @ UIUC
Dan Roth
Department of Computer Science
University of Illinois at Urbana-Champaign
M I A S Multi-Modal Information Access & Synthesis
MIAS Mission
Most of the data today is unstructured: books, newspaper articles, journal publications, field reports, images, and audio and video streams.
The challenge is how to deal with the huge amount of unstructured data as if it were organized in a database with a known schema; that is, how to locate, access, organize, analyze, and synthesize unstructured data.
MIAS Mission: develop the theories, algorithms, and tools for analysts to
  access a variety of data formats and models;
  integrate them with existing resources;
  transform raw data into useful and understandable information.
Task Perspective
In the next decade, intelligence analysts will need to
  monitor a huge number of interesting events and entities;
  formulate and evaluate hypotheses with respect to them.
Analysts must interact, at the appropriate level of semantic abstraction, with a system that can
  synthesize, summarize, and interpret vast amounts of multimodal information;
  integrate observed data with domain models and background information in multiple formats;
  propose hypotheses and help verify them.
Scenario
Consider an intelligence analyst researching a problem:
  Iranian nuclear program: generate a list of Iranian nuclear scientists, affiliations, specialties, biographies, photos, and notable recent activities.
  Medical treatment: what is known about it; who are the experts; what do users say about it; what side effects have been reported.
Current technologies have solved the problem of collecting and storing huge amounts of information, so it is reasonable to assume that the information she is after does exist.
However, multiple barriers stand in the way of a successful completion of the analysis, each posing a significant research challenge.
Multimodal Information Access & Synthesis
[Figure: overview of the MIAS pipeline. Recoverable labels follow.]
Focused Multimodal Data Retrieval
  Online Data Sources: News Articles, Field reports, Specific Web sites, Text Repositories, Relational databases, Surveillance Videos, ...
  Retrieved data: Web pages, Text documents, Relational data, Images
Infer Metadata: Semantic entities
  Example: name = Elkhan Factory; loc = Northern Iran; topics = fertilizer, enrichment
  Semantic categories, Temporal categories, Subjectivity/Opinions
Discover Relations Between Semantic Entities
  Example entities: Saeed Zakeri, U. of Tehran, Elkhan Factory; example relation labels: attended, visited
Semantic Disambiguation & Integration across multiple sources and modalities
Support Information Analysis, Knowledge Discovery, Monitoring
  Discover unusual events, entities, and associations
  Continuous monitoring of events, entities & associations
  Rapid retrieval of all info. about a particular entity
  Efficient search, querying, question answering, browsing, mining, ...
MIAS Processes
Focused data retrieval and integration
  Identify and collect relevant data from multiple sources
Semantic data enrichment
  Infer semantics from unstructured data and images; allow navigation and search across disparate data modalities; augment KBs
Entity identification and relations discovery
  Identify real-world entities and relations among them
  Relate them to existing institutional resources for information integration
Knowledge discovery and hypotheses generation and verification
  Construct the rich semantic structure and hidden networks of entity linkages
Foundations
  Machine learning, database and data mining, natural language processing, inference and optimization, and computer vision techniques
  Required for and driven by the aforementioned problems.
Integrated Mission: Research & Education
Develop diverse human resources to enhance the scientific research, educational, and governmental workforce in MIAS.
Educational and Outreach Initiatives:
  Target students from small research programs and minority-serving institutions
  Expose them to the national labs
  Open opportunities for bigger impact
A comprehensive education program designed to increase participation in the study and practice of MIAS topics:
  Provide substantive training for a new generation of experts in the field
  Serve as a tool for recruiting an experienced group of undergraduates into graduate study in one of the broad fields of information science
  Be an intellectual community center, where participants at all levels of expertise come together in an enriched environment of collaboration
Data Science Summer Institute at UIUC
1st DSSI: May 2007
Huge Success
8 weeks course
25 students
Faculty from UIUC, Kansas State, UTSA, UTEP
[Schedule diagram: Mathematical Foundations of Information Science; two Research blocks (4 weeks each); Tutorials 1 through 5; Speaker Series]
Intensive Course in The Math of Data Sciences
1. Probability and Statistics
2. Linear Algebra
3. Data Structures and Algorithms
4. Optimization
5. Learning & Clustering
Research Projects
Led by co-PIs and Grad Students
Topics:
1. Virtual Web: Focused Crawling
2. Relations and Entities
3. Text and Images
Advanced MIAS Related Tutorials
Data Science Summer Institute at UIUC
[Schedule diagram: Mathematical Foundations of Information Science; two Research blocks (4 weeks each); Tutorials 1 through 5]
Advanced MIAS Related Tutorials
  Machine Learning & Data Mining
  Information Retrieval & Web Information Access
  Computer Vision
  Natural Language Processing & Information Extraction
  Databases & Information Integration
Data Science Summer Institute at UIUC
Our Team
Leading researchers in intelligent information access & analysis and its foundations
A large number of affiliates/consultants covering all areas of interest to the MIAS center.
Kevin C. Chang: Data Integration and Retrieval
Deep Web
  MetaQuerier: Large-scale integration over the deep Web; pioneered the “holistic integration” paradigm
Widely published at SIGMOD, VLDB, ICDE
System building and demo at CIDR, SIGMOD, VLDB, …
PC/editor/organizer of SIGMOD, ICDE, WWW, SIGKDD special issue, WIRI06, IIWeb06 workshops
Awards: NSF CAREER, IBM Faculty, NCSA Faculty
VLDB00 Best Paper Selection
AnHai Doan: Data Integration
Monitoring People and Events
Data integration
  Matching Schemas, Ontologies, Entities
  Integrating databases and text
ACM Doctoral Dissertation Award 2004
Sloan Fellowship 2006
Edited special issues on Data Integration
Co-chaired workshops on data integration, Web technologies, machine learning
Co-writing a book titled “Data Integration”
David Forsyth: Computer Vision and Learning
Linking Text and Images
  Labeling images via (a lot of) caption text
Leading Computer Vision researcher,
Over 110 papers on vision, graphics, learning applications
Program Chair, CVPR 2000, General chair CVPR 2006, regular member of PC in all major vision conferences
IEEE Technical Achievement Award, 2006
Lead author of main textbook, widely adopted
Jiawei Han: Data Mining
Mining patterns and knowledge discovery from massive data
Research focus: Mining data streams, frequent patterns, sequential patterns, graph patterns, and their applications
Privacy-preserving data mining
Developed many popular data mining algorithms, e.g., FPgrowth, PrefixSpan, gSpan, StarCubing, CrossMine, RankingCube, and CrossClus
Over 300 research papers published in conferences and journals
Editor-in-Chief, ACM Transactions on Knowledge Discovery from Data
Textbook, “Data mining: Concepts and Techniques,” adopted worldwide
Cinda Heeren: Data Mining, Education
Summer School
  MIAS Summer School Director
  Mathematical Foundation of Data Science Discrete
UIUC CS Department Director of Diversity Programs
Lecturer for Discrete Math, Data Structures courses at UIUC
One of the leading teachers and educational leaders at UIUC
Research in Data Mining and Databases
Speaker and regular presenter at conferences for young women, including:
  GAMES and WYSE summer camps
  Expanding Your Horizons careers conference
  Grace Hopper Celebration of Women in Computing 2004: panel on best practices for recruiting women into undergraduate programs in CS
  SIGCSE 2006: workshop, “How to host your own Small Regional Celebration of Women in Computing”
ChengXiang Zhai: Information Retrieval and Text Mining
Probabilistic Paradigms for Information Retrieval
Personalized/Context Dependent Search
Relation Identification and Data Integration
Leading expert in information retrieval and search technologies
Recipient of the 2004 Presidential Early Career Award for Scientists and Engineers (PECASE)
Main architect and key contributor of the Lemur Toolkit (being used by many research groups and IR companies around the world)
ACM SIGIR’04 best paper award
Selected services include Program Chair of ACM CIKM 2004; HLT/NAACL 06; SIGIR’09
Dan Roth: Machine Learning, NLP
Semantic Analysis and Data Enrichment
Entity and Relation Identification and Integration
Textual Entailment
Machine Learning Methods for NLP and IE
Leading Researcher in Machine Learning, NLP, AI
Developed a popular machine learning system and machine-learning-based NLP tools used in industry and NLP classes
Program Chair, ACL’03, CoNLL’02; regular senior PC member in all major Machine Learning, NLP, and AI conferences
Associate Editor: Journal of Artificial Intelligence Research; Machine Learning
Multiple paper awards
Information Access & Synthesis Processes
Focused data retrieval and integration
  Identify and collect relevant data from multiple sources
Semantic data enrichment
  Infer semantics from unstructured data and images; allow navigation and search across disparate data modalities; augment KBs
Entity identification and relations discovery
  Identify real-world entities and relations among them
  Relate them to existing institutional resources for information integration
Knowledge discovery and hypotheses generation and verification
  Construct the rich semantic structure and hidden networks of entity linkages
Foundations
  Machine learning, database and data mining, natural language processing, inference and optimization, and computer vision techniques
  Called for and driven by the aforementioned problems.
Tools
  Text Processing & Analysis
  Semantic Analysis & Information Extraction
  Information Integration
  Machine Learning & Data Mining
  Integrating Text & Images