Big Data

Big Data. Yasin Zamani, [email protected], PhD Candidate of Artificial Intelligence, Image Processing Lab (Prof. S. Kasaei), Sharif University of Technology

Upload: yasinzamani

Posted on 17-Jul-2015

Big Data

Yasin Zamani

[email protected]

PhD Candidate of Artificial Intelligence

Image Processing Lab (Prof. S. Kasaei)

Sharif University of Technology

What is Big Data?

"The number of transistors incorporated in a chip will approximately double every 24 months."

-- Gordon Moore, Intel co-founder
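
The quote above describes exponential growth: if a chip starts with N0 transistors, then after t years the count is roughly N0 * 2^(t / 2). The sketch below works through that arithmetic. The function name and the starting count of 1,000 are illustrative assumptions, not figures from the slides.

```python
def projected_transistors(initial_count, years, doubling_period_years=2.0):
    """Project a transistor count that doubles every `doubling_period_years`."""
    return initial_count * 2 ** (years / doubling_period_years)


# Hypothetical starting point of 1,000 transistors (an assumption for illustration).
for years in (0, 2, 10, 20):
    print(years, "years:", round(projected_transistors(1_000, years)))
```

Twenty years is ten doublings, so the count grows by a factor of 2^10 = 1,024.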

Excel’s Limits

Excel 2003

Feature          Maximum limit
Open workbooks   Limited by available memory and system resources
Worksheet size   65,536 rows by 256 columns
Column width     255 characters
Row height       409 points

Excel 2007

Feature          Maximum limit
Open workbooks   Limited by available memory and system resources
Worksheet size   1,048,576 rows by 16,384 columns
Column width     255 characters
Row height       409 points
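
To make the worksheet limits above concrete, here is a minimal sketch that checks whether a table of a given shape fits in one worksheet. The row and column limits come from the slide; the function names and the example shape are hypothetical.

```python
# Worksheet limits quoted on the slide: (max rows, max columns).
EXCEL_LIMITS = {
    "Excel 2003": (65_536, 256),
    "Excel 2007": (1_048_576, 16_384),
}


def fits_in_worksheet(n_rows, n_cols, version):
    """Return True if an n_rows x n_cols table fits in one worksheet."""
    max_rows, max_cols = EXCEL_LIMITS[version]
    return n_rows <= max_rows and n_cols <= max_cols


def max_cells(version):
    """Total number of cells in one worksheet for the given version."""
    max_rows, max_cols = EXCEL_LIMITS[version]
    return max_rows * max_cols


# Hypothetical dataset: 500,000 rows by 20 columns.
for version in EXCEL_LIMITS:
    print(version, fits_in_worksheet(500_000, 20, version), max_cells(version))
```

A single Excel 2007 worksheet holds 1,024 times as many cells as an Excel 2003 worksheet, yet both limits are fixed, which is one informal way to see when data stops being "spreadsheet sized".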

Iris Data

Petal

Sepal

Iris Data

• 6,000 tweets per second

• 500 million tweets per day

• 200 billion tweets per year
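
The three figures above are rounded versions of one rate. A quick check of the arithmetic, assuming a constant 6,000 tweets per second and a 365-day year (both simplifications):

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365

tweets_per_second = 6_000
per_day = tweets_per_second * SECONDS_PER_DAY   # 518,400,000
per_year = per_day * DAYS_PER_YEAR              # 189,216,000,000

print(f"{per_day:,} tweets per day")
print(f"{per_year:,} tweets per year")
```

So 6,000 per second gives about 518 million per day and about 189 billion per year, which the slide rounds to 500 million and 200 billion.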

Structured Data

Unstructured Data

NoSQL: Not Only SQL Data

Does big data need all three (volume, velocity, variety)?

How is Big Data Used?

Predictive Marketing

Predicts major life events

Looks at consumer behavior

Uses demographic info

Can purchase more data

Predict Trends

Fraud Detection

Point of sale

Geolocation and IP address

Login time

Biometrics

Big Data and Data Science

The three facets of data science

Sources of Big Data

Intentional Data

Photos, videos, audio

Text on social networks

Clicking “Like”

Web searches

Intentional Data

Webpages bookmarked

Emails and text messages

Cell phone calls

Online purchases

Metadata

Data about data

“Second order” data

Machine readable

Photograph Exif Metadata

Email Metadata

From, To, CC, and timestamp
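
As a rough illustration of how machine-readable this metadata is, the sketch below pulls the From, To, CC, and timestamp fields out of a raw message using Python's standard `email` package. The message text, the addresses, and the `extract_metadata` helper are made up for the example.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical raw message; only the headers matter here.
RAW_MESSAGE = """\
From: Alice <alice@example.com>
To: Bob <bob@example.com>, Carol <carol@example.com>
Cc: dave@example.com
Date: Fri, 17 Jul 2015 09:30:00 +0000
Subject: Hello

The body is never read by this sketch.
"""


def extract_metadata(raw):
    """Return sender, recipients, and timestamp without touching the body."""
    msg = message_from_string(raw)
    return {
        "from": getaddresses(msg.get_all("From", [])),
        "to": getaddresses(msg.get_all("To", [])),
        "cc": getaddresses(msg.get_all("Cc", [])),
        "timestamp": parsedate_to_datetime(msg["Date"]),
    }


metadata = extract_metadata(RAW_MESSAGE)
for field, value in metadata.items():
    print(field, value)
```

Each address comes back as a (display name, address) pair, so who wrote to whom and when can be tabulated without reading any message content.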

Immersion

Sources

Cell phones connecting to towers

Satellite radio and GPS connections

RFID readings

Readings from medical devices

The Internet of Things (IoT)

Storing Big Data

Local Storage

Cloud Storage

Cloud Computing

Preparing Data for Analysis

Challenges with Data Quality

Nearly 95% of spreadsheets have errors.

Possible Errors

Incomplete or corrupted data

Duplicate records

Typographical errors

Data that is missing context
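
Two of the error types listed above, incomplete records and duplicate records, can be detected mechanically. The sketch below shows one simple way to flag them. The records and function names are hypothetical, and typographical errors or missing context usually need domain knowledge that a check like this does not capture.

```python
# Hypothetical records; None or an empty string marks a missing value.
records = [
    {"id": 1, "name": "Alice", "city": "Tehran"},
    {"id": 2, "name": "Bob", "city": ""},
    {"id": 1, "name": "Alice", "city": "Tehran"},
    {"id": 3, "name": None, "city": "Shiraz"},
]


def find_incomplete(rows):
    """Indices of rows that have at least one missing value."""
    return [
        i for i, row in enumerate(rows)
        if any(value is None or value == "" for value in row.values())
    ]


def find_duplicates(rows):
    """Indices of rows that exactly repeat an earlier row."""
    seen = set()
    duplicates = []
    for i, row in enumerate(rows):
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates.append(i)
        else:
            seen.add(key)
    return duplicates


print("incomplete rows:", find_incomplete(records))
print("duplicate rows:", find_duplicates(records))
```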

Big Data Analysis

Visualization

Humans as Visual Animals

Computers excel at predictive models.

Computers excel at data mining.

Humans perceive and interpret better.

Human vision still plays an important role.

What Humans Do Well

Identifying visual patterns

Identifying anomalies

Seeing patterns across groups

Interpreting content of images

Gestalt Patterns
