www.infobright.org www.infobright.com slezak@infobright.com RSCTC 2008 Rough Sets in Data Warehousing Infobright Community Edition (ICE)

slezak@infobright.com

RSCTC 2008

Rough Sets in Data Warehousing

Infobright Community Edition (ICE)

Data Warehousing

Technology Layout

Two-Level Computing

Large Data (10TB) and Mixed Workloads

Rough Sets

#    Outlook    Temp.   Humid.   Wind     Sport?
1    Sunny      Hot     High     Weak     No
2    Sunny      Hot     High     Strong   No
3    Overcast   Hot     High     Weak     Yes
4    Rain       Mild    High     Weak     Yes
5    Rain       Cold    Normal   Weak     Yes
6    Rain       Cold    Normal   Strong   No
7    Overcast   Cold    Normal   Strong   Yes
8    Sunny      Mild    High     Weak     No
9    Sunny      Cold    Normal   Weak     Yes
10   Rain       Mild    Normal   Weak     Yes
11   Sunny      Mild    Normal   Strong   Yes
12   Overcast   Mild    High     Strong   Yes
13   Overcast   Hot     Normal   Weak     Yes
14   Rain       Mild    High     Strong   No

Sport? = Yes

Classes of records with the same values on a subset of the attributes
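The two labels above can be made concrete with a short Python sketch. It groups the 14 records into classes that agree on a chosen attribute subset, then computes the standard rough-set lower and upper approximations of the set Sport? = Yes. The function names and the choice of attribute subsets are assumptions made for illustration; only the table itself comes from the slide.

```python
from collections import defaultdict

COLUMNS = ("Outlook", "Temp", "Humid", "Wind", "Sport")

# The 14 records from the slide, keyed by record number.
ROWS = {
    1: ("Sunny", "Hot", "High", "Weak", "No"),
    2: ("Sunny", "Hot", "High", "Strong", "No"),
    3: ("Overcast", "Hot", "High", "Weak", "Yes"),
    4: ("Rain", "Mild", "High", "Weak", "Yes"),
    5: ("Rain", "Cold", "Normal", "Weak", "Yes"),
    6: ("Rain", "Cold", "Normal", "Strong", "No"),
    7: ("Overcast", "Cold", "Normal", "Strong", "Yes"),
    8: ("Sunny", "Mild", "High", "Weak", "No"),
    9: ("Sunny", "Cold", "Normal", "Weak", "Yes"),
    10: ("Rain", "Mild", "Normal", "Weak", "Yes"),
    11: ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    12: ("Overcast", "Mild", "High", "Strong", "Yes"),
    13: ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    14: ("Rain", "Mild", "High", "Strong", "No"),
}

# The target set: records with Sport? = Yes.
YES = {rid for rid, row in ROWS.items() if row[-1] == "Yes"}


def indiscernibility_classes(attributes):
    """Group record numbers that share the same values on `attributes`."""
    idx = [COLUMNS.index(a) for a in attributes]
    classes = defaultdict(set)
    for rid, row in ROWS.items():
        classes[tuple(row[i] for i in idx)].add(rid)
    return list(classes.values())


def approximations(attributes, target):
    """Return (lower, upper) approximations of `target`."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(attributes):
        if cls <= target:   # class lies fully inside the target set
            lower |= cls
        if cls & target:    # class overlaps the target set
            upper |= cls
    return lower, upper


lower, upper = approximations(["Outlook", "Humid"], YES)
print(sorted(lower))          # records certainly in Sport? = Yes
print(sorted(upper - lower))  # boundary: records that cannot be decided
```

With Outlook alone, only the Overcast class falls entirely inside Sport? = Yes; adding Humid. shrinks the boundary to the Rain records.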

Information Systems

Data-based knowledge models, classifiers...

Database indices, data partitioning, data sorting...

Difficulty with fast updates of structures...

#    Outlook    Temp.   Humid.   Wind     Sport?
1    Sunny      Hot     High     Weak     No
2    Sunny      Hot     High     Strong   No
3    Overcast   Hot     High     Weak     Yes
4    Rain       Mild    High     Weak     Yes
5    Rain       Cold    Normal   Weak     Yes
6    Rain       Cold    Normal   Strong   No
7    Overcast   Cold    Normal   Strong   Yes
8    Sunny      Mild    High     Weak     No
9    Sunny      Cold    Normal   Weak     Yes
10   Rain       Mild    Normal   Weak     Yes
11   Sunny      Mild    Normal   Strong   Yes
12   Overcast   Mild    High     Strong   Yes
13   Overcast   Hot     Normal   Weak     Yes
14   Rain       Mild    High     Strong   No

Rough Sets in Infobright

SELECT COUNT(*) FROM Employees WHERE Salary > $

Packs storing the values of records for column Salary

We can imagine the set of all records relevant to the given query, that is, those satisfying its SQL filter

Salary > $

Using the Knowledge Grid, we verify which packs are irrelevant (disjoint with the set), relevant (fully inside the set), and suspect (overlapping)

We do not need irrelevant packs. We do not need to decompress relevant ones: their local COUNT(*) is stored in the corresponding Data Pack Nodes
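A minimal Python sketch of this three-way classification follows. It assumes that each Data Pack Node keeps at least the minimum, maximum, and row count of its pack; the `PackNode` class, the function names, and the salary figures are hypothetical, and NULL handling is left out.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class PackNode:
    """Hypothetical stand-in for a Data Pack Node: per-pack statistics."""
    minimum: float
    maximum: float
    count: int


def build_node(values: Sequence[float]) -> PackNode:
    """Compute the statistics once, as would happen when data is loaded."""
    return PackNode(min(values), max(values), len(values))


def classify(node: PackNode, threshold: float) -> str:
    """Classify a pack against the filter `value > threshold`."""
    if node.maximum <= threshold:
        return "irrelevant"   # no row can satisfy the filter
    if node.minimum > threshold:
        return "relevant"     # every row satisfies the filter
    return "suspect"          # some rows may satisfy it


def count_greater(packs: List[Sequence[float]], threshold: float) -> Tuple[int, int]:
    """Return (COUNT(*), number of packs that had to be opened)."""
    total, opened = 0, 0
    for values in packs:
        node = build_node(values)  # precomputed in a real system
        kind = classify(node, threshold)
        if kind == "relevant":
            total += node.count    # answered from the node alone
        elif kind == "suspect":
            opened += 1            # only suspect packs are scanned
            total += sum(1 for v in values if v > threshold)
    return total, opened


# Invented salaries, three packs of three rows each.
salary_packs = [[30, 40, 45], [55, 60, 70], [48, 52, 49]]
print(count_greater(salary_packs, 50))
```

In this toy input one pack is skipped, one is answered from its node, and only the third is scanned row by row.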

Information Systems in Infobright

[Figure labels: Query; min, max, Nulls, sum, match, pattern, ???; OUT]

SELECT MAX(A) FROM T WHERE B>15;

[Figure panels: DATA, STEP 1, STEP 2, STEP 3]
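The slide names three steps without describing them. The Python sketch below is one plausible reading, not a confirmed account of the product's algorithm. It assumes that packs of A and B are row-aligned and that per-pack min/max statistics are available. Step 1 classifies packs by the statistics of B, step 2 derives a lower bound on MAX(A) from the relevant packs, and step 3 opens only those suspect packs that could still raise the bound. All names and data are invented, and NULLs are ignored.

```python
from typing import List, Optional, Sequence, Tuple


def rough_max(
    packs_a: List[Sequence[float]],
    packs_b: List[Sequence[float]],
    threshold: float,
) -> Tuple[Optional[float], List[int]]:
    """Evaluate SELECT MAX(A) FROM T WHERE B > threshold.

    Returns the answer and the indices of the packs that were opened.
    """
    # Per-pack statistics; a real system would have these precomputed.
    stats = [(min(b), max(b), max(a)) for a, b in zip(packs_a, packs_b)]

    best: Optional[float] = None
    suspects: List[int] = []

    # Step 1: classify each pack using the statistics of B only.
    for i, (b_min, b_max, a_max) in enumerate(stats):
        if b_max <= threshold:
            continue                      # irrelevant: no row qualifies
        if b_min > threshold:
            # Step 2: relevant pack, all rows qualify, so max(A) counts.
            best = a_max if best is None else max(best, a_max)
        else:
            suspects.append(i)

    # Step 3: open suspect packs, most promising first, and stop as soon
    # as no remaining pack can exceed the current bound.
    suspects.sort(key=lambda i: stats[i][2], reverse=True)
    opened: List[int] = []
    for i in suspects:
        if best is not None and stats[i][2] <= best:
            break
        opened.append(i)
        qualifying = [a for a, b in zip(packs_a[i], packs_b[i]) if b > threshold]
        if qualifying:
            local = max(qualifying)
            best = local if best is None else max(best, local)
    return best, opened


# Invented data: five row-aligned packs for columns A and B.
A = [[5, 9, 3], [18, 2, 7], [25, 1, 4], [12, 30, 6], [10, 2]]
B = [[1, 10, 15], [16, 20, 31], [14, 16, 3], [40, 2, 17], [20, 1]]
print(rough_max(A, B, 15))
```

Here the last pack is never opened: its largest A value cannot beat the bound already obtained from the relevant pack.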

Order Detail Table (assume many more rows)

Order Number   Order Date   Part ID   Quantity   $Amt
005            20070214     234       500        1500.00
005            20070214     334       125        250.25
006            20070215     334       100        212.50

Supplier/Part Table (assume many more rows)

Supplier ID   Effective Date   Expiry Date   Part ID   Description
A456          20050315         Null          234       Pre-measured coffee packets – gold blend
A456          20061201         Null          235       Pre-measured coffee packets – silver blend
A456          20060501         Null          334       4-cup Cone coffee filters; quantity 50

Advanced Knowledge Nodes

         Pack 1   Pack 2
Pack 1   0        1
Pack 2   1        0
Pack 3   0        0
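A binary pack-by-pack matrix like this can be read as a join map between the Part ID packs of the two tables. The Python sketch below assumes that an entry is 1 when the two packs share at least one Part ID value, so a join only needs to open the pairs marked 1. The pack contents are invented, chosen to yield a matrix of the same shape, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def pack_to_pack(
    left_packs: List[Sequence[int]],
    right_packs: List[Sequence[int]],
) -> List[List[int]]:
    """Build a binary matrix: 1 if the two packs share any join value."""
    right_sets = [set(p) for p in right_packs]
    return [
        [1 if set(left) & right else 0 for right in right_sets]
        for left in left_packs
    ]


def candidate_pairs(matrix: List[List[int]]) -> List[Tuple[int, int]]:
    """List the (left, right) pack pairs a join still has to examine."""
    return [
        (i, j)
        for i, row in enumerate(matrix)
        for j, bit in enumerate(row)
        if bit
    ]


# Invented Part ID packs: three packs on one side, two on the other.
ORDER_PART_ID_PACKS = [[334, 335], [234, 235], [500, 501]]
SUPPLIER_PART_ID_PACKS = [[234, 235, 236], [334, 400]]

matrix = pack_to_pack(ORDER_PART_ID_PACKS, SUPPLIER_PART_ID_PACKS)
for row in matrix:
    print(row)
print(candidate_pairs(matrix))
```

Under this reading, an all-zero row means the corresponding pack can be dropped from the join without being decompressed.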

Community Inspirations

Count Distinct
Count(*) on Self-Joins
Decision Trees
Contingencies

New Objectives
New Schemas
New Volumes
New Queries
New KNs

New Data Types
SQL Extensions
Feature Extraction
Data Compression

Conclusion

Technology based on interaction between rough and precise operations, open for adding new structures

Full product, simple framework, ad-hoc analytics, good load speed, 10:1 "all inclusive" compression

The core technology based on more data mining, rough sets, computing with rough values, et cetera

Infobright Community Edition (ICE) is ready for free use and study, and open to contributions

THANK YOU!!!

slezak@infobright.com

RSCTC 2008