Page 1: CS206 --- Electronic Commerce › ~cosmo › ISA › EM7033_01_introduzione.pdf · Class hours: Wednesday 14:00-15:00 Thursday 15:45-17:15 Friday 15:45-17:15 Office hours: In my office

EM7033 Data Management and Business Intelligence

Luca Cosmo

Database Systems: The Complete Book (2nd Edition). Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer Widom. Prentice Hall

Business Intelligence: Data Mining and Optimization for Decision Making. Carl Vercellis. Wiley

Class hours: Wednesday 14:00-15:00

Thursday 15:45-17:15

Friday 15:45-17:15

Office hours: in my office in via Torino; send me an email for an appointment!

EMAIL: [email protected]

Put [EM7033] as email subject prefix

Content of EM7033

Design of databases.

E/R model, relational model.

Database programming.

SQL, Relational algebra.

Introduction to Data Mining

Course Requirements

1. Exam type: written exam

2. Group activity: design of a database. Groups of (exactly) 4 students MUST register by sending an email by midnight on September 30th to [email protected] with

subject: EM7033 WG: GROUPNAME (write the actual name of your group, not the word GROUPNAME!)

content: ID (matricola) and name for each student of the group

Project will be assigned by October 5th

Presentations will be scheduled from October 13th

Each one will run about 20 minutes long

Up to 4 bonus points can be earned! Good luck ;-)

Example: Retention in the mobile phone industry

The marketing manager of a mobile phone company realizes that a large number of customers are discontinuing their service, leaving her company in favor of some competing provider.

Suppose that the marketing manager can rely on a budget adequate to pursue a customer retention campaign aimed at 2000 individuals out of a total customer base of 2 million people.

How should she go about choosing the customers to be contacted so as to optimize the effectiveness of the campaign?

The target group can be chosen as the 2000 people having the highest churn likelihood among the customers of high business value. (Not even that simple…)
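The selection rule above can be sketched as follows. This is a minimal illustration, assuming each customer already has an estimated churn probability and a business value. The field names, the value threshold and the sample figures are hypothetical, and the sketch says nothing about how the churn probabilities are estimated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: int
    churn_probability: float  # assumed to come from some predictive model
    business_value: float     # hypothetical measure, e.g. yearly revenue


def select_targets(customers, budget, min_value):
    """Return the `budget` customers most likely to churn among those
    whose business value is at least `min_value`."""
    high_value = [c for c in customers if c.business_value >= min_value]
    ranked = sorted(high_value, key=lambda c: c.churn_probability, reverse=True)
    return ranked[:budget]


# Illustrative data only (not from the course material).
customers = [
    Customer(1, 0.90, 50.0),
    Customer(2, 0.80, 900.0),
    Customer(3, 0.10, 1200.0),
    Customer(4, 0.65, 700.0),
    Customer(5, 0.95, 20.0),
]

targets = select_targets(customers, budget=2, min_value=500.0)
print([c.customer_id for c in targets])
```

Customers 1 and 5 have the highest churn probability but are filtered out by the value threshold, which is the point of restricting the ranking to high-value customers.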

Business Intelligence vs Intuitive approach

Business Intelligence

The main purpose of business intelligence systems is to provide knowledge workers with tools and methodologies that allow them to make effective and timely decisions.

Effective decisions. The application of rigorous analytical methods allows decision makers to rely on information and knowledge which are more dependable.

Timely decisions. The ability to rapidly react to the actions of competitors and to new market conditions is a critical factor in the success or even the survival of a company.

https://en.wikipedia.org/wiki/DIKW_Pyramid

DIKW Pyramid

Data

For a retailer, data refer to primary entities such as customers, points of sale and items, while sales receipts represent the commercial transactions.

Information

Information is the outcome of extraction and processing activities carried out on data, and it appears meaningful to those who receive it in a specific domain.

Knowledge

Information is transformed into knowledge when it is used to make decisions and develop the corresponding actions.

Wisdom

Wisdom is the ability to take a decision based on knowledge and previous experience.

What is Business Intelligence?

Business Intelligence is a set of methods, processes, architectures, applications, and technologies that gather and transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making (to drive business performance).

BI: A General Process

Data Gathering: the collection of raw data from different sources by different means.

Data Cleansing: the transformation of data into clean and standard models and formats.

Data Storage: the refined data are stored under a particular data model for quality management and easy, fast access.

Data Analysis: analytical components such as OLAP, data quality, data profiling, business rule analysis, and data mining are used to extract information and knowledge.

Data Presentation: results are presented and delivered in different human-readable formats to support decisions.
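The five steps above can be sketched end to end on a toy data set. This is only an illustration under stated assumptions: the sales records, the cleansing rules and the Sales table are hypothetical, and a real BI pipeline would use dedicated tools at each stage.

```python
import sqlite3

# 1. Data gathering: raw records as they might arrive from different sources.
raw_sales = [
    {"store": " Venice ", "amount": "120.50"},
    {"store": "venice", "amount": "79.50"},
    {"store": "Padua", "amount": "n/a"},   # unusable amount
    {"store": "PADUA", "amount": "200"},
]


# 2. Data cleansing: normalise store names, drop rows with invalid amounts.
def cleanse(records):
    clean = []
    for rec in records:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue
        clean.append((rec["store"].strip().title(), amount))
    return clean


# 3. Data storage: load the cleansed rows into a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (store TEXT, amount REAL)")
conn.executemany("INSERT INTO Sales VALUES (?, ?)", cleanse(raw_sales))

# 4. Data analysis: aggregate revenue per store.
totals = conn.execute(
    "SELECT store, SUM(amount) FROM Sales GROUP BY store ORDER BY store"
).fetchall()

# 5. Data presentation: a plain-text report.
for store, total in totals:
    print(f"{store:<10} {total:>8.2f}")
```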

GATHER AND ORGANIZE DATA

The first source of data is internal to the enterprise:

Customers, products, transactions, warehouse, ...

Historical data

It is very important to organize data so that it is easily accessible and contains all the useful information (but no more, to avoid data overload).

DATABASES

Do You Know SQL?

Explain the difference between:

SELECT a
FROM R
WHERE a<10 OR a>=10;

and

SELECT b
FROM R;

Relation R:

a | b
5 | 20
10 | 30
20 | 40
… | …
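Besides returning different columns, the two queries can return a different number of rows. A minimal sketch, assuming the exercise is about NULL values: the fourth row with a NULL in column a is an added assumption and is not part of the table shown on the slide. For that row the condition a<10 OR a>=10 evaluates to unknown rather than true, so the first query skips it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO R VALUES (?, ?)",
    # The last row (a is NULL) is an assumption added for illustration.
    [(5, 20), (10, 30), (20, 40), (None, 50)],
)

filtered = conn.execute("SELECT a FROM R WHERE a<10 OR a>=10").fetchall()
unfiltered = conn.execute("SELECT b FROM R").fetchall()

print(len(filtered))    # rows for which the condition is true
print(len(unfiltered))  # every row of R
```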

And How About These?

SELECT R.b
FROM R, S
WHERE R.b = S.b;

SELECT b
FROM R
WHERE b IN (SELECT b FROM S);
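One difference between the two queries is how many copies of a value they return. A minimal sketch, assuming hypothetical schemas R(a, b) and S(b, c) and made-up tuples: the join produces one output row per matching pair, while IN only tests membership and yields each R-tuple at most once. The join column is written as R.b here because a bare b would be ambiguous when both relations have an attribute named b.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (a INTEGER, b INTEGER)")
conn.execute("CREATE TABLE S (b INTEGER, c INTEGER)")
conn.executemany("INSERT INTO R VALUES (?, ?)", [(1, 20), (2, 30)])
# S contains b = 20 twice, so the R-tuple with b = 20 has two join partners.
conn.executemany("INSERT INTO S VALUES (?, ?)", [(20, 7), (20, 8), (40, 9)])

join_rows = conn.execute(
    "SELECT R.b FROM R, S WHERE R.b = S.b"
).fetchall()

in_rows = conn.execute(
    "SELECT b FROM R WHERE b IN (SELECT b FROM S)"
).fetchall()

print(join_rows)  # one row per matching (R, S) pair
print(in_rows)    # one row per qualifying R-tuple
```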

Interesting Stuff About Databases

It used to be about boring stuff: employee records, bank records, etc.

Today, the field covers all the largest sources of data, with many new ideas.

Web search.

Data mining.

Scientific and medical databases.

ERPs.

More Interesting Stuff

Database programming centers around limited programming languages.

One of the few areas where non-Turing-complete languages make sense.

Leads to very succinct programming, but also to unique query-optimization problems.

Still More …

You may not notice it, but databases are behind almost everything you do on the Web.

Google searches.

Queries at Amazon, eBay, etc.

And More…

Databases often have unique concurrency-control problems.

Many activities (transactions) at the database at all times.

Must not confuse actions, e.g., two withdrawals from the same account must each debit the account.

ACID properties (Atomicity, Consistency, Isolation, Durability).
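A small sketch of two of these ideas, using Python's sqlite3 module. The Accounts table and the helper functions are hypothetical. The code runs on a single connection, so it illustrates that each withdrawal debits the account and that a failed transaction leaves no partial effect (atomicity); it does not simulate truly concurrent transactions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [(1, 100), (2, 0)])
conn.commit()


def balance(conn, account_id):
    row = conn.execute(
        "SELECT balance FROM Accounts WHERE id = ?", (account_id,)
    ).fetchone()
    return row[0]


def withdraw(conn, account_id, amount):
    # The subtraction happens inside the database, so no stale balance
    # read earlier by the application is written back.
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE Accounts SET balance = balance - ? WHERE id = ?",
            (amount, account_id),
        )


def transfer(conn, src, dst, amount):
    with conn:
        conn.execute(
            "UPDATE Accounts SET balance = balance - ? WHERE id = ?",
            (amount, src),
        )
        cur = conn.execute(
            "UPDATE Accounts SET balance = balance + ? WHERE id = ?",
            (amount, dst),
        )
        if cur.rowcount != 1:
            # Raising here makes `with conn` roll back the debit as well.
            raise ValueError("unknown destination account")


withdraw(conn, 1, 30)
withdraw(conn, 1, 30)
after_withdrawals = balance(conn, 1)

try:
    transfer(conn, 1, 99, 10)  # account 99 does not exist
except ValueError:
    pass
after_failed_transfer = balance(conn, 1)

print(after_withdrawals, after_failed_transfer)
```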

What is a Data Model?

1. Mathematical representation of data.

Examples: relational model = tables; semistructured model = trees/graphs.

2. Operations on data.

3. Constraints.

A Relation is a Table

Relation name: Beers

name | manf
Kilkenny | Arthur Guinness
Bud Lite | Anheuser-Busch

Attributes are the column headers (name, manf); tuples are the rows.

Schemas

Relation schema = relation name and attribute list. Optionally: types of attributes.

Example: Beers(name, manf) or Beers(name: string, manf: string)

Database = collection of relations.

Database schema = set of all relation schemas in the database.

Why Relations?

Very simple model.

Often matches how we think about data.

Abstract model that underlies SQL, the most important database language today.

Our Running Example

Beers(name, manf)

Bars(name, addr, license)

Drinkers(name, addr, phone)

Likes(drinker, beer)

Sells(bar, beer, price)

Frequents(drinker, bar)

Underline = key (tuples cannot have the same value in all key attributes).

Excellent example of a constraint.
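A key constraint can be seen in action with a short sketch. It assumes that (bar, beer) is the key of Sells; this is an assumption, since the listing above does not show which attributes are underlined. The bar name and prices are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: the pair (bar, beer) is the key of Sells.
conn.execute(
    "CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL, "
    "PRIMARY KEY (bar, beer))"
)

insert = "INSERT INTO Sells VALUES (?, ?, ?)"
conn.execute(insert, ("Joe's Bar", "Bud Lite", 2.50))
# Same bar, different beer: allowed, because only part of the key repeats.
conn.execute(insert, ("Joe's Bar", "Kilkenny", 3.00))

try:
    # Same bar and same beer: all key attributes repeat, so this is rejected.
    conn.execute(insert, ("Joe's Bar", "Bud Lite", 2.75))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

count = conn.execute("SELECT COUNT(*) FROM Sells").fetchone()[0]
print(rejected, count)
```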

Database Schemas in SQL

SQL is primarily a query language, for getting information from a database.

But SQL also includes a data-definition component for describing database schemas, and data-manipulation instructions to insert, delete and modify tuples in tables.
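The three roles of SQL mentioned above (data definition, data manipulation, querying) can be sketched on the Beers relation. The column types, the choice of name as key, and the updated manufacturer value are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: declare the schema Beers(name, manf).
conn.execute("CREATE TABLE Beers (name TEXT PRIMARY KEY, manf TEXT)")

# Data manipulation: insert, modify and delete tuples.
conn.executemany(
    "INSERT INTO Beers VALUES (?, ?)",
    [("Kilkenny", "Arthur Guinness"), ("Bud Lite", "Anheuser-Busch")],
)
conn.execute(
    "UPDATE Beers SET manf = ? WHERE name = ?", ("Guinness", "Kilkenny")
)
conn.execute("DELETE FROM Beers WHERE name = ?", ("Bud Lite",))

# Query: get information back out of the database.
rows = conn.execute("SELECT name, manf FROM Beers").fetchall()
print(rows)
```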