Data Mining


Data & Information System (DAIS) Research Lab
Department of Computer Science | University of Illinois at Urbana-Champaign

Data Mining Research Group
Spring 09

Preparing for the challenges in information access, retrieval, and management that lie ahead requires a coordinated and multi-faceted approach.

The Data Mining Research Group at the University of Illinois Department of Computer Science is proud of the successful research partnerships and research initiatives that are a part of our hallmark of excellence.

TABLE OF CONTENTS

About the Data Mining Research Group
Jiawei Han
Students, Alumni, and Visiting Scholars
Awards and Publications
Projects
Funding

The Data Mining Research Group in the Department of Computer Science, University of Illinois at Urbana-Champaign, conducts leading-edge research in the areas of data mining, data warehousing, database systems, and Web-based information systems.

Work conducted by the group is pioneering new directions in the field and pushing the boundaries of data mining techniques. The work aims to integrate and advance the knowledge produced in multiple disciplines, including database systems, statistics, machine learning, algorithms, information theory, spatial and multimedia databases, and Web technology, among others.

With more than 20 members, the group is characterized by its breadth and depth of excellence and its integrated approach to complex problem chains. The group is associated with the Data and Information System Laboratory.

Its research projects include:
• information network analysis
• OLAP and mining of multidimensional text databases
• graph mining
• privacy and trust validation by data mining
• mining moving objects, trajectories, RFID, and traffic data
• image and video mining
• multidimensional promotion and ranking analysis
• transfer learning, dimensionality reduction, and pattern-based classification
• stream data mining
• data mining in biomedical, software engineering, and cyberphysical system applications

Jiawei Han

Professor Han is a world-recognized leader in the data mining field. His ground-breaking work includes pioneering techniques on frequent, sequential, and graph pattern mining; heterogeneous information network analysis; spatiotemporal data mining; stream data mining; and text cube, ranking cube, and data cube computation. His contributions and discoveries have been characterized by an integrative approach, advancing knowledge produced in multiple disciplines.

Professor Han is one of the most cited authors in Data Mining, has written more than 400 papers for conferences and journals, organized a number of international conferences, and is the Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data.

Working with government funding agencies and industry partners, Professor Han has extensive experience in managing large-scale, complex projects that take a multi-disciplinary approach.

Awards:
• SIGKDD Innovations Award (2004)
• ACM Fellow (2004)
• IEEE CS Technical Achievement Award (2005)
• IEEE Fellow (2009)

Research Collaborators and Funders:

Students

Dustin Bortner
• Network mining

Deng Cai
• Machine learning, especially manifold learning and dimensionality reduction
• Information retrieval

Chen Chen
• Graph mining and related data management problems

Bolin Ding
• Pattern mining algorithms
• Theoretical aspects of data mining and database problems

Jing Gao
• Ensemble learning, transfer learning
• Data stream mining
• Anomaly detection

Chandrasekar Ramachandran
• Video/image mining
• Dimensionality reduction on sparse datasets
• Indexing and search

Sangkyum Kim
• Image/video mining
• High-dimensional indexing

Zhenhui Li
• Mining moving objects
• Spatiotemporal data mining

Cindy Xide Lin
• Graph mining
• Web mining
• Multidimensional analysis

Xin Jin
• Image/video mining and retrieval


Sebastian Seith
• Moving object and traffic mining

Yizhou Sun
• Link analysis and information network analysis
• Graph mining and Web mining
• Machine learning

LuAn Tang
• Spatial data mining
• Privacy-preserving data mining
• Data mining with biomedical applications

Tianyi Wu
• Ranking query processing
• Association analysis
• Information network analysis

Zhijun Yin
• Web mining
• Information retrieval
• Machine learning

Xiao Yu
• Anomaly detection
• Web mining

Feida Zhu
• Structural pattern mining
• Approximation and complexity analysis for data mining problems

Yintao Yu
• Information network and social network analysis
• Web mining

Bo Zhao
• Multidimensional text database systems
• Web mining, entity search and extraction
• Information network analysis

Peixiang Zhao
• Structural data mining
• Algorithms on massive data sets


Visiting Scholars

Min-Soo Kim
• Graph/network data mining
• Bioinformatics
• Indexing & query processing
• Information retrieval & search engines

Lu Liu
• Web video analysis and mining
• Topic modeling
• Social-network analysis

Recent Visiting Scholars

R. Alves (Portugal)
R. Angryk (Montana State U.)
F. Berzal (Spain)
Jianlin Feng (China)
Jae-Gil Lee (IBM Research)
Cuiping Li (China)

Alumni

Ph.D.

Hong Cheng, Ph.D. 2008, Chinese University of Hong Kong
Hector Gonzalez, Ph.D. 2008, Google Research
Xiaolei Li, Ph.D. 2008, Microsoft
Chao Liu, Ph.D. 2007, Microsoft Research
Dong Xin, Ph.D. 2007, Microsoft Research
Xiaoxin Yin, Ph.D. 2007, Microsoft Research
Xifeng Yan, Ph.D. 2006, University of California at Santa Barbara
Hwanjo Yu, Ph.D. 2004, POSTECH University, Korea

Recent Masters and Undergraduate Alumni

Luiz Mendes
Jacob Lee
Margaret Myslinska
Ricardo Redder
John Paul Sondag

Distinguished Honors: Jiawei Han

IEEE Fellow (2009)

IEEE Computer Society Technical Achievement Award (2005)

ACM SIGKDD Innovations Award (2004)

ACM Fellow (2004)

IBM Faculty Awards (2002, 2003, 2004)

The Outstanding Contribution Award (2002, IEEE Computer Society, International Conference on Data Mining)

UIUC Teachers Ranked as Excellent (2002-2007)

Distinguished Honors: Students

Microsoft Research Graduate Women’s Scholarship (2009): Cindy Xide Lin

ACM SIGKDD Dissertation Award (2008): Xiaoxin Yin

ACM SIGMOD Ph.D. Dissertation Runner-Up Award (2007): Xifeng Yan

IBM Scholarship (2007): Hong Cheng

Midwest Database Symp. Best Presentation Award (2007): Feida Zhu

Henry Ford II Award (2006): Deng Cai


Conference Awards

D. Zhang, C. Zhai, and J. Han, “Topic Cube: Topic Modeling for OLAP on Multidimensional Text Databases”, in Proc. 2009 SIAM Int. Conf. on Data Mining (SDM’09) (One of “Best of SDM’09”)

F. Zhu, X. Yan, J. Han, and P. S. Yu, “gPrune: A Constraint Pushing Framework for Graph Pattern Mining”, in Proc. 2007 Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD’07) (Best Student Paper Award)

X. Li, J. Han, S. Kim, and H. Gonzalez, “ROAM: Rule- and Motif-Based Anomaly Detection in Massive Moving Object Data Sets”, in Proc. 2007 SIAM Int. Conf. on Data Mining (SDM’07) (One of “Best of SDM’07”)

F. Zhu, X. Yan, J. Han, P. S. Yu, and H. Cheng, “Mining Colossal Frequent Patterns by Core Pattern Fusion”, in Proc. 2007 Int. Conf. on Data Engineering (ICDE’07) (Best Student Paper Award)

Q. Mei, D. Xin, H. Cheng, J. Han, and C. Zhai, “Generating Semantic Annotations for Frequent Patterns with Context Analysis”, in Proc. 2006 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD’06) (Best Student Paper Runner-Up Award)

H. Liu, J. Han, D. Xin, and Z. Shao, “Mining Interesting Patterns from Very High Dimensional Data: A Top-Down Row Enumeration Approach”, in Proc. 2006 SIAM Int. Conf. on Data Mining (SDM’06) (One of “Best of SDM’06”)

H. Gonzalez, J. Han, X. Li, and D. Klabjan, “Warehousing and Analysis of Massive RFID Data Sets”, in Proc. 2006 Int. Conf. on Data Engineering (ICDE’06) (Best Student Paper Award)

X. Yan, H. Cheng, J. Han, and D. Xin, “Summarizing Itemset Patterns: A Profile-Based Approach”, in Proc. 2005 Int. Conf. on Knowledge Discovery and Data Mining (KDD’05) (Best Student Paper Runner-Up Award)

Conference Tutorials

D. Cai, X. He, and J. Han, “A Geometric Perspective on Dimensionality Reduction”, SDM’09, Sparks, NV, April 2009

J. Pei, Y. Tao, and J. Han, “Preference Queries from OLAP and Data Mining Perspective”, ICDE’09, Shanghai, China, March 2009

J. Han, X. Yan, and P. S. Yu, “Scalable OLAP and Mining of Information Networks”, EDBT’09, St. Petersburg, Russia, March 2009

H. Cheng, J. Han, X. Yan, and P. S. Yu, “Integration of Classification and Pattern Mining: A Discriminative and Frequent Pattern-based Approach”, ICDM’08, Pisa, Italy, December 2008

J. Han, J.-G. Lee, H. Gonzalez, and X. Li, “Mining Massive RFID, Trajectory, and Traffic Data Sets”, ACM SIGKDD’08, Las Vegas, NV, August 2008

J. Han, X. Yin, and P. S. Yu, “Exploring the Power of Links in Data Mining”, ICDE’08, Cancun, Mexico, April 2008 (Also, ECML/PKDD’07, Warsaw, Poland, Sept. 2007)

C. Liu, T. Xie, and J. Han, “Mining for Software Reliability”, ICDM’07, Omaha, NE, Oct. 2007

J. Han, X. Yan, and P. S. Yu, “Mining and Searching Graphs and Structures”, KDD’06, Philadelphia, PA, August 2006 (Also, ICDE’06, Atlanta, GA, April 2006, and ICDM’05, Houston, TX, Nov. 2005)

Project List

Information Network Analysis
OLAP and Mining of Multidimensional Text Databases
Graph Mining
Privacy and Trust Validation by Data Mining
Mining Moving Objects, Trajectories, RFID, and Traffic Data
Image and Video Mining
Multidimensional Promotion and Ranking Analysis
Transfer Learning, Dimensionality Reduction, and Pattern-Based Classification
Stream Data Mining
Data Mining Applications

Information Network Analysis

description:

Information network analysis investigates effective discovery of patterns and knowledge from large-scale networks that consist of interconnected physical, technological, conceptual, and human/societal components. The major themes in our study include: (1) ranking-based clustering on different types of objects in heterogeneous information networks; (2) hierarchical network structure analysis for OLAP, multidimensional text database analysis, and ranking promotion; (3) query-based information network extraction and analysis; and (4) link-based veracity analysis for bibliographic networks and news information networks.

researchers: Yizhou Sun, Yintao Yu, Chen Chen, Cindy Xide Lin, Tianyi Wu, Bo Zhao, Dustin Bortner, and Jiawei Han

selected publications:

Y. Sun, Y. Yu, and J. Han, “Ranking-Based Clustering of Heterogeneous Information Networks with Star Network Schema”, KDD’09

Y. Sun, J. Han, P. Zhao, Z. Yin, H. Cheng, and T. Wu, “RankClus: Integrating Clustering with Ranking for Heterogeneous Information Network Analysis”, EDBT’09

Y. Sun, T. Wu, H. Cheng, J. Han, X. Yin, and P. Zhao, “BibNetMiner: Mining Bibliographic Information Networks”, (demo paper), SIGMOD’08

X. Yin, J. Han, and P. S. Yu, “LinkClus: Efficient Clustering via Heterogeneous Semantic Links”, VLDB’06

OLAP and Mining of Multidimensional Text Databases

description:

A multidimensional text database, such as a collection of customer reviews, flight reports, job descriptions, or service feedback, is a database that consists of both multidimensional categorical attributes and narrative text attributes. We investigate how to construct text or topic data cubes; how to perform effective information retrieval, OLAP, and text mining on such data cubes; and how textual and structured multidimensional information could work together to enhance information retrieval and knowledge discovery.
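As a rough illustration of the text cube idea described above, the following Python sketch aggregates term frequencies for every group-by over the categorical dimensions of a few made-up records. The record fields (`airline`, `route`, `text`), the sample data, and the function name `build_text_cube` are hypothetical, and raw term frequency is only one simple choice of measure; this is a sketch under those assumptions, not the group's implementation.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_text_cube(records, dimensions):
    """Aggregate term counts for every cell of every group-by.

    A cell is a tuple of (dimension, value) pairs. A dimension that is
    left out of the tuple is treated as aggregated over all its values,
    so the empty tuple () is the apex cell covering all records.
    """
    cube = defaultdict(Counter)
    for record in records:
        # Naive tokenization (an assumption): lowercase, split on whitespace.
        terms = Counter(record["text"].lower().split())
        for k in range(len(dimensions) + 1):
            for dims in combinations(dimensions, k):
                cell = tuple((d, record[d]) for d in dims)
                cube[cell] += terms
    return cube


if __name__ == "__main__":
    # Hypothetical flight-report records.
    reports = [
        {"airline": "A", "route": "ORD-SFO", "text": "engine noise on takeoff"},
        {"airline": "A", "route": "ORD-JFK", "text": "engine warning light"},
        {"airline": "B", "route": "ORD-SFO", "text": "smooth takeoff"},
    ]
    cube = build_text_cube(reports, ["airline", "route"])
    # Term counts for all reports of airline A, across every route.
    print(cube[(("airline", "A"),)].most_common(3))
```

Materializing every cell, as this sketch does, grows exponentially with the number of dimensions, so it is only practical for a handful of dimensions.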

researchers: Cindy Xide Lin, Bo Zhao, Bolin Ding, Duo Zhang, ChengXiang Zhai, and Jiawei Han

selected publications:

C. X. Lin, B. Ding, J. Han, F. Zhu, and B. Zhao, “Text Cube: Computing IR Measures for Multidimensional Text Database Analysis”, ICDM’08

D. Zhang, C. Zhai, and J. Han, “Topic Cube: Topic Modeling for OLAP on Multidimensional Text Databases”, SDM’09 (Best of SDM’09)

Graph Mining

description:

Graph mining aims to mine patterns, classification models, clusters, and other kinds of knowledge from massive graph data sets and to develop indexing, similarity search, and OLAP tools for graph data. Applications include bioinformatics, computer system diagnostics, social network analysis, and Web search and mining.

researchers: Chen Chen, Feida Zhu, Cindy Xide Lin, Peixiang Zhao, Xifeng Yan (Univ. of California at Santa Barbara), Jiawei Han, and Philip S. Yu (Univ. of Illinois at Chicago)

selected publications:

X. Yan, H. Cheng, J. Han, and P. S. Yu, “Mining Significant Graph Patterns by Scalable Leap Search”, SIGMOD’08

C. Chen, X. Yan, F. Zhu, J. Han, and P. S. Yu, “Graph OLAP: Towards Online Analytical Processing on Graphs”, ICDM’08

C. Chen, C. X. Lin, X. Yan, and J. Han, “On Effective Presentation of Graph Patterns: A Structural Representative Approach”, CIKM’08

C. Chen, X. Yan, P. S. Yu, J. Han, D. Zhang, and X. Gu, “Towards Graph Containment Search and Indexing”, VLDB’07

X. Yan, F. Zhu, P. S. Yu, and J. Han, “Feature-based Substructure Similarity Search”, ACM Transactions on Database Systems (TODS), 31: 1418-1453, 2006

Privacy and Trust Validation by Data Mining

description:

Can we trust pieces of information provided by other parties and other information providers, including newspapers, the Web, and TV?

We investigate this issue and develop techniques to provide trustworthy analysis of the truthfulness of information from multiple information providers and to automatically identify the trustworthy information. Alternatively, can we develop data mining mechanisms that find interesting information and still preserve the privacy required by information providers?

We study privacy-preserving data mining and have developed a constraint-based clustering approach for privacy-preserving data publishing. We are also working on privacy-preserving data cubes that may provide multidimensional aggregate information while preserving the privacy of sensitive data.
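One simple way to picture the trust analysis described above is an iterative vote: sources that agree with well-supported claims gain trust, and claims backed by trusted sources gain confidence. The Python sketch below is a deliberately simplified scheme written for illustration. It is not the algorithm from the publications listed in this section, and the source names, questions, values, and the function name `discover_truth` are all hypothetical.

```python
from collections import defaultdict


def discover_truth(claims, iterations=10):
    """Estimate the most credible value per object and a trust score per source.

    `claims` is a list of (source, obj, value) triples. Every source starts
    with the same trust (0.5, an arbitrary choice). Each round:
      1. a value's confidence is the share of total source trust behind it
         among all values claimed for the same object;
      2. a source's trust is the mean confidence of the values it claims.
    """
    sources = {s for s, _, _ in claims}
    trust = {s: 0.5 for s in sources}
    conf = {}
    for _ in range(iterations):
        score = defaultdict(float)
        for s, obj, value in claims:
            score[(obj, value)] += trust[s]
        total = defaultdict(float)
        for (obj, _), sc in score.items():
            total[obj] += sc
        conf = {key: sc / total[key[0]] for key, sc in score.items()}

        sums = defaultdict(float)
        counts = defaultdict(int)
        for s, obj, value in claims:
            sums[s] += conf[(obj, value)]
            counts[s] += 1
        trust = {s: sums[s] / counts[s] for s in sources}

    # Pick the highest-confidence value for each object.
    best = {}
    for (obj, value), c in conf.items():
        if obj not in best or c > best[obj][1]:
            best[obj] = (value, c)
    return {obj: value for obj, (value, _) in best.items()}, trust


if __name__ == "__main__":
    # Hypothetical providers A-D answering two questions, q1 and q2.
    claims = [
        ("A", "q1", "x"), ("A", "q2", "y"),
        ("B", "q1", "x"), ("B", "q2", "y"),
        ("C", "q1", "z"), ("C", "q2", "y"),
        ("D", "q1", "z"), ("D", "q2", "w"),
    ]
    truths, trust = discover_truth(claims)
    print(truths, trust)
```

In the sample data the providers split two against two on `q1`. The tie is broken only because provider D also disagrees with the majority on `q2`, which lowers its trust and, with it, the weight of its `q1` vote.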

researchers: Bolin Ding, Zhijun Yin, and Jiawei Han

selected publications:

A. K. H. Tung, J. Han, L. V. S. Lakshmanan, and R. T. Ng, “Privacy-Preserving Data Publishing: A Constraint-Based Clustering Approach”, in S. Basu, et al. (eds.), Constrained Clustering: Advances in Algorithms, Theory, and Applications, Taylor and Francis, 2008

X. Yin, J. Han, and P. S. Yu, “Truth Discovery with Multiple Conflicting Information Providers on the Web”, TKDE’08

X. Yin, J. Han, and P. S. Yu, “Object Distinction: Distinguishing Objects with Identical Names by Link Analysis”, ICDE’07

Mining Moving Objects, Trajectories, RFID, and Traffic Data

description:

The world is becoming increasingly mobile. We design and develop effective and scalable methods for mining massive moving-object data, trajectory data, RFID data, and traffic data to uncover clusters, classification models, frequent and sequential patterns, and outliers in large sets of moving objects, with applications in homeland security, law enforcement, traffic control, animal/bird migration analysis, and environmental studies.

researchers: Zhenhui Li, LuAn Tang, Sebastian Seith, and Jiawei Han

selected publications:

X. Li, Z. Li, J. Han, and J.-G. Lee, “Temporal Outlier Detection in Vehicle Traffic Data”, ICDE’09

J.-G. Lee, J. Han, X. Li, and H. Gonzalez, “TraClass: Trajectory Classification Using Hierarchical Region-Based and Trajectory-Based Clustering”, VLDB’08

J.-G. Lee, J. Han, and X. Li, “Trajectory Outlier Detection: A Partition-and-Detect Framework”, ICDE’08

H. Gonzalez, J. Han, X. Li, M. Myslinska, and J. P. Sondag, “Adaptive Fastest Path Computation on a Road Network: A Traffic Mining Approach”, VLDB’07

J.-G. Lee, J. Han, and K.-Y. Whang, “Trajectory Clustering: A Partition-and-Group Framework”, SIGMOD’07

Image and Video Mining

description:

We investigate efficient image and video pattern mining, clustering, classification, and indexing methods. This work includes SpIBag (Spatial Item Bag Mining), an algorithm for mining frequent spatial patterns in images; SpaRClus (Spatial Relationship Pattern-Based Hierarchical Clustering), an image clustering algorithm that is robust to shifting, scaling, and rotation transformations; and a multi-layer ring-based index structure for both r-range search and k-NN search.

researchers: Sangkyum Kim, Xin Jin, Chandrasekar Ramachandran, Liangliang Cao, and Klara Nahrstedt

selected publications:

X. Jin, S. Kim, J. Han, L. Cao, and Z. Yin, “GAD: General Activity Detection for Fast Clustering on Large Data”, SDM’09

R. Malik, S. Kim, X. Jin, C. Ramachandran, J. Han, I. Gupta, and K. Nahrstedt, “MLR-Index: An Index Structure for Fast and Scalable Similarity Search in High Dimensions”, SSDBM’09

S. Kim, X. Jin, and J. Han, “SpaRClus: Spatial Relationship Pattern-Based Hierarchical Clustering”, SDM’08

Multidimensional Promotion and Ranking Analysis

description:

As decision support and business intelligence applications become increasingly large-scale, it is critical to support effective and efficient search and knowledge discovery through online multidimensional analysis. Promotion and ranking are indispensable functions of such an analysis engine: ranking aims at enabling analysts to explore top-k interesting aggregate or non-aggregate answers at multiple resolutions; and promotion helps decision makers promote any given object of interest through discovering the best subspaces or data regions where the object becomes prominent, without manually navigating the data set. We have developed Ranking-Cube and PromotionCube methods that are efficient and scalable at processing flexible queries in multidimensional space.
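To make the promotion idea above concrete, the following Python sketch enumerates every subspace of a tiny, made-up sales table and reports where a chosen object ranks best. It is a brute-force illustration only: the column names (`object`, `region`, `year`, `sales`), the data, and the function name `promote` are hypothetical, and the exhaustive enumeration stands in for the far more efficient cube-based methods the description refers to.

```python
from collections import defaultdict
from itertools import combinations


def promote(rows, target, dimensions, measure="sales", obj="object"):
    """Return (rank, subspace) pairs for `target`, best rank first.

    A subspace is a tuple of (dimension, value) pairs; the empty tuple is
    the whole data set. Within a subspace, objects are ranked by the sum
    of `measure`, and rank 1 means no other object has a larger sum.
    """
    results = []
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):
            # Sum the measure per object inside each subspace of this group-by.
            totals = defaultdict(lambda: defaultdict(float))
            for row in rows:
                cell = tuple((d, row[d]) for d in dims)
                totals[cell][row[obj]] += row[measure]
            for cell, per_object in totals.items():
                if target not in per_object:
                    continue
                score = per_object[target]
                rank = 1 + sum(1 for v in per_object.values() if v > score)
                results.append((rank, cell))
    # Prefer better ranks, then more general (shorter) subspaces.
    results.sort(key=lambda r: (r[0], len(r[1])))
    return results


if __name__ == "__main__":
    # Hypothetical sales records for three products.
    sales = [
        {"object": "X", "region": "East", "year": 2008, "sales": 10},
        {"object": "Y", "region": "East", "year": 2008, "sales": 60},
        {"object": "X", "region": "West", "year": 2008, "sales": 50},
        {"object": "Y", "region": "West", "year": 2008, "sales": 20},
        {"object": "Z", "region": "West", "year": 2008, "sales": 40},
        {"object": "X", "region": "East", "year": 2009, "sales": 5},
        {"object": "Z", "region": "East", "year": 2009, "sales": 25},
    ]
    for rank, subspace in promote(sales, "X", ["region", "year"]):
        print(rank, subspace)
```

In this sample, product X is only second overall but ranks first when the data is restricted to the West region, which is the kind of subspace a promotion query is meant to surface.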

researchers: Tianyi Wu, Dong Xin (Microsoft Research), and Jiawei Han

selected publications:

T. Wu, D. Xin, and J. Han, “ARCube: Supporting Ranking Aggregate Queries in Partially Materialized Data Cubes”, SIGMOD’08

D. Xin and J. Han, “P-Cube: Answering Preference Queries in Multi-Dimensional Space”, ICDE’08

T. Wu, X. Li, D. Xin, J. Han, J. Lee, and R. Redder, “DataScope: Viewing Database Contents in Google Maps’ Way”, VLDB’08 (demo)

Transfer Learning, Dimensionality Reduction, and Pattern-Based Classification

description:

Classification is a core problem widely studied in machine learning, statistical learning, and data mining. Real-world applications, such as text, image, and Web categorization, gene prediction, and system and network intrusion detection, can be cast as classification problems. Although many learning algorithms, such as Support Vector Machines, logistic regression, and decision tree induction, have been developed, there are still numerous challenges in effective classification. We investigate methods for improving classification accuracy by exploring knowledge embedded in data, and we develop novel methods to construct discriminative and compact feature sets for complex structured data, explore manifold structure for learning, and combine multiple sources or learning models for better predictions.

researchers: Jing Gao, Deng Cai, Hong Cheng (Chinese Univ. of Hong Kong), and Jiawei Han

selected publications:

J. Gao, W. Fan, J. Jiang, and J. Han, “Knowledge Transfer via Multiple Model Local Structure Mapping”, KDD’08

D. Cai, X. He, and J. Han, “Training Linear Discriminant Analysis in Linear Time”, ICDE’08

H. Cheng, X. Yan, J. Han, and P. S. Yu, “Direct Discriminative Pattern Mining for Effective Classification”, ICDE’08

D. Cai, X. He, and J. Han, “SRDA: An Efficient Algorithm for Large Scale Discriminant Analysis”, TKDE’08

H. Cheng, X. Yan, J. Han, and C.-W. Hsu, “Discriminative Frequent Pattern Analysis for Effective Classification”, ICDE’07

Stream Data Mining

description:

In many real-time applications, such as network traffic monitoring, credit card fraud detection, and Web click streams, data arrive continuously and in large volumes, forming data streams. We investigate stream data mining principles and algorithms and develop effective and scalable methods for mining the dynamics of data streams in multidimensional space, including discovering changes, trends, and evolution characteristics in data streams; constructing clusters and classification models; and exploring frequent patterns and similarities among data streams.
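A basic building block behind several of the tasks above is keeping statistics over only the most recent part of a stream, since the full stream cannot be stored. The Python sketch below tracks item frequencies over a fixed-size sliding window; the class name `SlidingWindowCounter`, the window size, and the support threshold are hypothetical choices for illustration and are not taken from the group's methods.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Track item frequencies over the most recent `window_size` items."""

    def __init__(self, window_size):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.window_size = window_size
        self.window = deque()
        self.counts = Counter()

    def add(self, item):
        """Record a new item and expire the oldest one if the window is full."""
        self.window.append(item)
        self.counts[item] += 1
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self.counts[expired] -= 1
            if self.counts[expired] == 0:
                del self.counts[expired]

    def frequent(self, min_support):
        """Items whose share of the current window is at least `min_support`."""
        n = len(self.window)
        if n == 0:
            return set()
        return {item for item, c in self.counts.items() if c / n >= min_support}


if __name__ == "__main__":
    counter = SlidingWindowCounter(window_size=4)
    for event in "abacdddd":
        counter.add(event)
        print(event, sorted(counter.frequent(0.5)))
```

Comparing the frequent set between consecutive windows gives a crude signal of change in the stream: in the demo, the frequent item shifts from `a` to `d` as older events expire.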

researchers: Jing Gao, Wei Fan (IBM Research), and Jiawei Han

selected publications:

L. Mendes, B. Ding, and J. Han, “Stream Sequential Pattern Mining with Precise Error Bounds”, ICDM’08

J. Gao, W. Fan, and J. Han, “On Appropriate Assumptions to Mine Data Streams: Analysis and Practice”, ICDM’07

J. Gao, W. Fan, J. Han, and P. S. Yu, “A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions”, SDM’07

Data Mining Applications

Sequential pattern mining (Bolin Ding and Jiawei Han):

Motivated by long sequences in text data, biological data, software engineering, and sensor networks, we study mining repetitive gapped subsequences to capture the occurrences of sequential patterns repeating within each sequence of a large database and use them as features for classification or prediction.

Biological and medical data mining (Jing Gao, Xiao Yu, Min-Soo Kim, Zhijun Yin, and Jiawei Han):

We investigate medical classification problems, including gene prediction based on microarray data and cancer prediction based on medical images, and develop discriminative pattern-based methods to improve the accuracy of medical data classification and to provide useful discriminative patterns that help medical experts with their decisions.

Software engineering and sensor network mining (Xin Jin, Jiawei Han, and Tarek Abdelzaher):

We investigate statistical analysis and sequence/graph mining methods for software bug detection, failure indexing, troubleshooting, and root-cause analysis in sensor networks and data streams.

Cyberphysical systems (LuAn Tang and Jiawei Han):

A cyberphysical system consists of a large number of interacting physical and information components. For example, a patient-care system may link a patient monitoring system with a network of patients and associated medical information and an emergency handling system. We investigate data mining in cyberphysical networks, including real-time analysis of massive amounts of streaming data, reliable and trusted data analysis, and effective spatiotemporal data analysis.

selected publications:

D. Lo, H. Cheng, J. Han, S. Khoo, and C. Sun, “Classification of Software Behaviors for Failure Detection: A Discriminative Pattern Mining Approach”, KDD’09

B. Ding, D. Lo, J. Han, and S.-C. Khoo, “Efficient Mining of Closed Repetitive Gapped Subsequences from a Sequence Database”, ICDE’09

M. M. H. Khan, T. Abdelzaher, J. Han, and H. Ahmadi, “Finding Symbolic Bug Patterns in Sensor Networks”, DCOSS’09

M. M. H. Khan, H. Le, H. Ahmadi, T. Abdelzaher, and J. Han, “DustMiner: Troubleshooting Interactive Complexity Bugs in Sensor Networks”, SenSys’08

F. Zhu, X. Yan, J. Han, P. S. Yu, and H. Cheng, “Mining Colossal Frequent Patterns by Core Pattern Fusion”, ICDE’07 (Best Student Paper Award)

Research Funding

NASA (with ChengXiang Zhai, et al.): “Event Cube: An Organized Approach for Mining and Understanding Anomalous Aviation Events” (2008-2010)

Air Force (MURI, with Tim Finin as PI, et al.): “A Framework for Managing the Assured Information Sharing Lifecycle” (2008-2012)

NSF: “SGER: CS-BibCube: OLAPing and Mining of Computer Science Literature” (2008-2010)

NSF (with Roland Kays et al.): “BDI: Movebank: Integrated Database for Networked Organism Tracking” (2007-2010)

NSF: “SGER: DataScope: Viewing Database Contents in Multi-Resolution at Your Finger Tips” (2006-2007)

NSF (with Jasmine Zhou): “Collaborative Research: Endowing Biological Databases With Analytical Power: Indexing, Querying, and Mining of Complex Biological Structures” (2005-2009)

NSF (with Ouri Wolfson): “SEI(IIS): MotionEye: Querying and Mining Large Datasets of Moving Objects” (2005-2008)

NSF (with Xiaosong Ma): “Collaborative Research: Reusable, Observation-based Performance Prediction across Platforms” (2004-2005)

DHS (with Dan Roth as PI, et al.): Multimodal Information Access and Synthesis Center (2007-2010)

ONR/NCASSR (with Michael Welge): “Detection and Apprehension of Rare Events in Data Streams” (2008-2009)

Boeing: “On-Line Mining of Strange Moving Objects for Security Protection” (2007-2010)

U.S. Air Force (with IAI Inc): “Distributed High-Dimensional Mining Tool for Bioscience Data Analysis” (2006-2009)

NSF (with Josep Torrellas as PI, et al.): “ITR: Automatic On-the-fly Detection, Characterization, Recovery, and Correction of Software Bugs in Production Runs” (2003-2008)

NSF: “Mining Dynamics of Data Streams in Multi-Dimensional Space” (2003-2006)


ONR (with Michael Welge): “Mining Changes and Alarming Events in Streaming Data” (2003-2006)

NSF: “Mining Sequential and Structured Patterns: Scalability, Flexibility, Extensibility and Applicability” (2002-2006)

Research gifts and grants from industry: Microsoft Research, Intel, IBM (Faculty Award, Innovation Award), Google, Yahoo!, NCSA (Faculty Fellowship), HP-Labs.


Data Mining Research Group • Department of Computer Science, UIUC

http://dm1.cs.uiuc.edu