Will Data Mining Change the Functions of DBMS? Jiawei Han DAIS (Data And Information Systems) Lab University of Illinois at Urbana-Champaign


Page 1: Will Data Mining Change the Functions of DBMS?

Jiawei Han
DAIS (Data And Information Systems) Lab
University of Illinois at Urbana-Champaign

Page 2: Will DM Be Integrated with DB Functions?

- DM: already a functional component of DBMS
  - Microsoft SQL Server: Analysis Manager
  - IBM DB2 and Intelligent Miner
  - Oracle: Data Mining package
- But will DM be “intruding” into DBMS, i.e., be integrated with essential DBMS functions?
  - Indexing
  - Data integration
  - Data cleaning
  - Query processing

Page 3: Indexing by Data Mining

- Indexing graphs? The number of subgraphs is exponential!
  - Chemical informatics, bioinformatics, …
  - Discriminative frequent graph patterns (SIGMOD’04)
- Indexing subsequences?
  - Shopping sequences, DNA/protein sequences (SDM’05)
- When is discriminative frequent pattern indexing useful?
  - Complex objects, big (object) queries

[Figure: sample database of graphs (a), (b), (c), and a query graph]
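
The filter-then-verify idea behind pattern-based indexing can be sketched on plain strings. This is a simplified illustration, not the method of the SIGMOD’04 or SDM’05 papers: it indexes only frequent fixed-length substrings and skips the discriminative-pattern selection step. The names `ngrams`, `build_index`, and `search`, and the parameters `n` and `min_support`, are assumptions made for this example.

```python
from collections import defaultdict


def ngrams(seq, n):
    """Return the set of contiguous length-n substrings of seq."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def build_index(db, n=3, min_support=2):
    """Map each frequent n-gram to the ids of the sequences that contain it."""
    postings = defaultdict(set)
    for sid, seq in enumerate(db):
        for gram in ngrams(seq, n):
            postings[gram].add(sid)
    # Keep only patterns that occur in at least min_support sequences.
    return {g: ids for g, ids in postings.items() if len(ids) >= min_support}


def search(db, index, query, n=3):
    """Prune candidates with the index, then verify them by direct matching."""
    candidates = set(range(len(db)))
    for gram in ngrams(query, n):
        if gram in index:  # infrequent patterns are not indexed, so no pruning
            candidates &= index[gram]
    return sorted(sid for sid in candidates if query in db[sid])


db = ["ACGTAC", "TTACGT", "GGGCCC", "ACGTTT"]  # toy data, not from the slides
index = build_index(db)
print(search(db, index, "TACG"))  # [1]
print(search(db, index, "ACGT"))  # [0, 1, 3]
```

The index only narrows the candidate set; the final substring check keeps the answer exact even when a query pattern is too rare to be indexed.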

Page 4: Data Cleaning by Data Mining

- Load messy data into a structured database?
  - Inconsistent data: age = “1946”?
  - Field misalignments
  - Data glitches: completely messed-up inputs
  - Missing or unmatched delimiters: XML, HTML data
  - Big fields: BLOB, CLOB, multimedia, and text
- Data mining
  - Data cleaning by distribution/outlier analysis
  - Dependency/correlation analysis
  - Schema-directed or schema “discovery”
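
Distribution-based cleaning of a field such as age can be sketched with a robust outlier test. The following is one possible approach, assuming a modified z-score based on the median absolute deviation (MAD); the function name `flag_outliers`, the threshold of 3.5, and the sample values are choices made for this example, not something the slides specify.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values equal the median; flag anything else.
        return [v for v in values if v != med]
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the data are roughly normal.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


ages = [34, 41, 29, 57, 1946, 62, 38]  # toy data: a birth year in an age field
print(flag_outliers(ages))  # [1946]
```

A median-based score is used here because a single extreme value such as 1946 would inflate the mean and standard deviation enough to hide itself in a small sample.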

Page 5: Data Integration by Data Mining

- Linking and mining across multiple data relations
  - CrossMine (classification across multiple data relations: ICDE’04)
- Search across heterogeneous databases
  - Object identification/merging, reference reconciliation (Alon’s group)
- Mining across heterogeneous DBs
- Personalizing data from heterogeneous sources
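
Object identification across sources can be illustrated with a very simple similarity test. This sketch assumes token-set Jaccard similarity as the matching criterion, which is a stand-in for illustration and not the reference-reconciliation method cited on the slide; `tokens`, `jaccard`, `candidate_matches`, the 0.5 threshold, and the records are all hypothetical.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_matches(records, threshold=0.5):
    """Return index pairs (i, j), i < j, whose similarity meets threshold."""
    toks = [tokens(r) for r in records]
    return [(i, j) for i, j in combinations(range(len(records)), 2)
            if jaccard(toks[i], toks[j]) >= threshold]


records = [  # toy records, as if drawn from two different sources
    "John Smith, Acme Corp",
    "Smith, John (Acme Corp.)",
    "Mary Jones, Globex",
]
print(candidate_matches(records))  # [(0, 1)]
```

Comparing every pair is quadratic in the number of records, so a real system would need some form of blocking or indexing before this step.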

Page 6: Query Processing by Data Mining

- Query plan refinement based on query execution history
- Better query planning by investigating additional data statistics
  - Current optimizer: key/foreign key, cardinality, number of distinct values
  - Additional information:
    - Strong dependency/correlation
    - Histograms, dense vs. sparse regions, etc.
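
Why dependency information matters to a planner can be shown with a small cardinality estimate. The sketch below assumes an optimizer that multiplies per-column selectivities as if the columns were independent, and compares that estimate with the true row count on perfectly correlated columns. The function names and the car table are made up for this example.

```python
def selectivity(rows, predicate):
    """Fraction of rows that satisfy predicate."""
    return sum(1 for r in rows if predicate(r)) / len(rows)


def estimate_vs_actual(rows, pred_a, pred_b):
    """Return (independence-based estimate, actual count) for pred_a AND pred_b."""
    n = len(rows)
    estimate = selectivity(rows, pred_a) * selectivity(rows, pred_b) * n
    actual = sum(1 for r in rows if pred_a(r) and pred_b(r))
    return estimate, actual


# Toy table of (make, model) pairs in which model fully determines make.
rows = [("Honda", "Civic")] * 50 + [("Toyota", "Corolla")] * 50
estimate, actual = estimate_vs_actual(
    rows,
    lambda r: r[0] == "Honda",
    lambda r: r[1] == "Civic",
)
print(estimate, actual)  # 25.0 50
```

Here the independence assumption underestimates the result by a factor of two; a mined dependency between the two columns would let the planner correct the estimate.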

Page 7: Conclusions

- DBers have been “invading” DM and have made great contributions
- It is time to consider that DM may invade DBMS to enhance its functionality
- General philosophy
  - Invisible data mining
  - Google is doing this successfully for page ranking
  - Can we do it to enhance DBMS?
- You can do better if you know your data better!