Big Data Ecosystem for Data-Driven Decision Making


Uploaded by abzetdin-adamov on 19-Jul-2015


Page 1: Big Data Ecosystem for Data-Driven Decision Making

Assoc. Prof. Abzetdin ADAMOV

CeDAWI - Center for Data Analytics and Web Insights, Qafqaz University

http://ce.qu.edu.az/~aadamov

12 March 2015

Page 2: Digital Universe: Volume of Digital Data

IDC's Digital Universe Study

• 2003: 5 exabytes (EB) in total, from the beginning of civilization
• 2005: 130 EB
• 2008: 480,000 petabytes (PB)
• 2009: 800,000 PB
• 2010: 1,200,000 PB, or 1.2 zettabytes (ZB)
• 2011: 1.8 ZB
• 2012: 2.7 ZB
• 2014: ~6.2 ZB
• Expected to reach 44 ZB by 2020

Every day we now create as much information as we did from the dawn of civilization up until 2003.
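The growth implied by these figures can be checked with compound annual growth rate (CAGR) arithmetic. This is a minimal sketch that reuses only numbers from the list above (converted to ZB); the names `cagr` and `doubling_time` are illustrative, and a constant yearly rate is a simplification of real growth.

```python
import math

# Digital-universe sizes from the IDC figures above, in zettabytes (ZB).
# 1 ZB = 1,000,000 PB, so 1,200,000 PB in 2010 is 1.2 ZB.
SIZES_ZB = {2010: 1.2, 2011: 1.8, 2012: 2.7, 2014: 6.2, 2020: 44.0}


def cagr(start_value: float, end_value: float, years: float) -> float:
    """Constant yearly rate r such that start_value * (1 + r) ** years == end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def doubling_time(rate: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + rate)


if __name__ == "__main__":
    for start, end in [(2010, 2012), (2014, 2020)]:
        r = cagr(SIZES_ZB[start], SIZES_ZB[end], end - start)
        print(f"{start}-{end}: {r:.1%} per year, doubling every {doubling_time(r):.2f} years")
```

Run as a script, it reports 50.0% per year for 2010-2012 (doubling about every 1.7 years) and about 38.6% per year for the 2014-2020 projection.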

Page 3: Big Measures for Big Data

• kilobyte (kB): 10^3 (decimal) or 2^10 (binary) bytes
• megabyte (MB): 10^6 or 2^20 bytes
• gigabyte (GB): 10^9 or 2^30 bytes
• terabyte (TB): 10^12 or 2^40 bytes
• petabyte (PB): 10^15 or 2^50 bytes
• exabyte (EB): 10^18 or 2^60 bytes
• zettabyte (ZB): 10^21 or 2^70 bytes
• yottabyte (YB): 10^24 or 2^80 bytes
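Each step up this ladder multiplies by 1000 in the decimal reading and by 1024 in the binary one, so the two readings drift apart as the units grow. A small sketch that makes the gap visible and formats a raw byte count; `unit_sizes` and `human_readable` are illustrative names, and the formatter assumes decimal prefixes.

```python
# Unit prefixes in the order listed above; each step is x1000 (decimal) or x1024 (binary).
UNITS = ["kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def unit_sizes(unit: str) -> tuple[int, int]:
    """Return (decimal_bytes, binary_bytes) for a unit such as "TB"."""
    power = UNITS.index(unit) + 1  # kB -> 1, MB -> 2, ...
    return 10 ** (3 * power), 2 ** (10 * power)


def human_readable(num_bytes: float) -> str:
    """Format a byte count with decimal (powers of 1000) prefixes."""
    value = float(num_bytes)
    unit = "B"
    for candidate in UNITS:
        if value < 1000:
            break
        value /= 1000
        unit = candidate
    return f"{value:.1f} {unit}"


if __name__ == "__main__":
    for u in UNITS:
        dec, binary = unit_sizes(u)
        print(f"{u}: binary/decimal = {binary / dec:.3f}")
    # The 2010 figure from the Digital Universe slide: 1,200,000 PB.
    print(human_readable(1_200_000 * 10**15))
```

The ratio grows from 1.024 at kB to about 1.209 at YB, so at the largest scale the two readings differ by roughly 21%.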

Page 4: Why Does Data Grow So Fast?

Data sets are gathered by ubiquitous devices:

• Information-sensing mobile devices
• Aerial sensory technologies (remote sensing)
• Software logs
• Cameras
• Microphones
• Radio-frequency identification (RFID) readers
• Wireless sensor networks

Page 5: Internet Penetration

Note: Internet stats for December 2001. Average Internet usage in the world: 8% (500 million users), 2001.

Page 6: Foundations of the Web

Note: Internet stats for January 2014. Average Internet usage in the world: 42% (3.0 billion users), 2014.
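The 2001 and 2014 penetration figures can be cross-checked against each other: since users = penetration × world population, each pair implies a world population. A sketch using only the slide numbers; `implied_population` and `growth_multiple` are illustrative names.

```python
# Internet usage snapshots from the 2001 and 2014 notes above.
SNAPSHOTS = {
    2001: {"users": 500e6, "penetration": 0.08},
    2014: {"users": 3.0e9, "penetration": 0.42},
}


def implied_population(users: float, penetration: float) -> float:
    """users = penetration * population, so population = users / penetration."""
    return users / penetration


def growth_multiple(start_year: int, end_year: int) -> float:
    """How many times larger the user base is in end_year than in start_year."""
    return SNAPSHOTS[end_year]["users"] / SNAPSHOTS[start_year]["users"]


if __name__ == "__main__":
    for year, s in SNAPSHOTS.items():
        pop = implied_population(s["users"], s["penetration"])
        print(f"{year}: implied world population ~{pop / 1e9:.2f} billion")
    print(f"Users grew {growth_multiple(2001, 2014):.0f}x between 2001 and 2014")
```

The implied totals (about 6.25 and 7.14 billion) are close to commonly cited world-population estimates for those years, so the percentages and user counts are mutually consistent; the user base grew sixfold in between.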

Page 7: Social Networking

Top 15 Most Popular Social Networking Sites | January 2015

#    Estimated unique monthly visitors    Compete rank
1    1,310,000,000                        2
2    284,000,000                          24
3    347,000,000                          44
4    70,500,000                           51
5    343,000,000                          (not listed)
6    25,500,000                           346
7    20,500,000                           605
8    19,500,000                           447
9    17,500,000                           NA
10   12,500,000                           127
11   12,000,000                           617
12   7,500,000                            838
13   5,400,000                            122
14   3,000,000                            451
15   2,500,000                            1,596

Page 8: Problem with Moore’s Law

• The number of transistors that can be placed on an integrated circuit doubles every 18 months to two years
• It is predicted to reach its limit with existing technology around 2020
• Shrinking a transistor down to the size of a single atom may defeat that concept
• The Digital Universe is growing much faster than processing power
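Written as a formula, the doubling rule in the first bullet is exponential growth with a doubling period $T$. Taking the slide's own range of 18 months to two years for $T$, the growth over one decade works out as:

```latex
N(t) = N_0 \cdot 2^{t/T}, \qquad 1.5 \le T \le 2 \ \text{years}

\frac{N(10\,\text{yr})}{N_0} = 2^{10/T}, \qquad 2^{10/2} = 32 \;\le\; 2^{10/T} \;\le\; 2^{10/1.5} \approx 102
```

Over ten years this is a 32-fold to roughly 100-fold increase in transistor count.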

Page 9: What Big Data Is and Isn’t

Computing + Internet = Big Data

• Big Data is not a new technology
• Big Data is not just about size
• Big Data is not Business Intelligence (BI)
• Big Data is not a solution by itself!

Is it time to move from Big Data 1 to Big Data 2?

Page 10: Interdisciplinary Subfield of Computer Science

• Artificial Intelligence
• Machine Learning
• Statistics
• Applied Mathematics
• Text Mining
• Database Systems
• Business Intelligence
• Computational Linguistics
• Natural Language Processing (NLP)
• …

Page 11: Jobs Derived from Big Data

• Chief Data Officer
• Big Data Solution Architect
• Big Data Platform Engineer
• Big Data Analyst
• Big Data Analytics Business Consultant
• Big Data Software Designer
• Big Data Consultant
• Hadoop Architect
• Consultant Hadoop Developer
• Senior Analytics Manager
• Data & Reporting Analyst
• Analytics Analyst (Big Data)

Source: Forbes, "Where Big Data Jobs Will Be in 2015"

Page 12: Data-Driven Decision Making (DDD)

Data-driven decision making (DDD) refers to the practice of basing decisions on the analysis of data rather than purely on intuition.

Data alone won’t change the world. It’s the people who use data to make better decisions.

Page 13: Data Science Applications

• Direct Marketing
• Online Advertising
• Credit Scoring and Risk Management
• Help Desk Management
• Fraud Detection
• Search Ranking
• Product Recommendation
• Predicting Unusual Behavior
• Customer Retention in Telecom

Page 14: Big Data Management Life-Cycle

- Apache Hadoop, HDFS, Microsoft Azure, …
- Microsoft Analytics Platform System, Excel, R Programming, Python, …
- Web Crawling, Data Mining, Information Retrieval, …
- Parsing, Indexing, Searching, Ranking, NLP, …

Big Data management involves both Data Science and Data Engineering, which together implement Data Mining techniques.

Page 15: Big Data Infrastructure

Page 16: Google’s First Data Centers

Google’s first data center

Page 17: Google’s New Data Centers

Map of Google Data Centers Worldwide

Google’s data centers house 450,000 servers. Power consumption ranges upwards of 20 megawatts, which costs on the order of US$2 million per month in electricity charges.
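The power and cost figures can be cross-checked by converting the load into monthly energy and dividing the bill by it. This is a back-of-the-envelope sketch under two assumptions not stated on the slide: the load is a steady 20 MW, and a month is 730 hours (8,760 hours per year divided by 12).

```python
# Figures from the slide above.
POWER_MW = 20.0
MONTHLY_COST_USD = 2_000_000.0
# Assumption (not from the slide): an average month of 8760 / 12 = 730 hours.
HOURS_PER_MONTH = 8760 / 12


def monthly_energy_kwh(power_mw: float, hours: float = HOURS_PER_MONTH) -> float:
    """Energy drawn by a constant load: convert MW to kW, then multiply by hours."""
    return power_mw * 1000.0 * hours


def implied_price_per_kwh(cost_usd: float, energy_kwh: float) -> float:
    """Average electricity price that reconciles the bill with the energy used."""
    return cost_usd / energy_kwh


if __name__ == "__main__":
    energy = monthly_energy_kwh(POWER_MW)
    price = implied_price_per_kwh(MONTHLY_COST_USD, energy)
    print(f"{energy / 1e6:.1f} GWh per month -> about ${price:.3f} per kWh")
```

Under these assumptions, 20 MW for a month is 14.6 GWh, so the two figures are consistent with an average electricity price of roughly 14 US cents per kWh.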

Page 18: Big Data Terms and Components

• Microsoft Azure
• Red Hat GFS - Global File System
• GoogleFS or GFS - Google File System
• HDFS - Hadoop Distributed File System
• SAN - Storage Area Network
• Google BigTable
• VFS - Virtual File System
• IBM GPFS - General Parallel File System
• HPSS - High Performance Storage System

Page 19: Hadoop Distributed File System

Page 20: Web Crawlers for Web Analytics

• Indexing
• Searching
• Ranking
• Analysis
• Crawling is an essential job for all Internet giants: Google, Yahoo, Facebook, etc.

Some available open-source crawlers: Apache Nutch, Crawler4j, Bixo, Heritrix, etc.
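The crawl, index and search steps above can be sketched end to end on a toy scale. The pages and URLs below are hypothetical and live in memory (a real crawler fetches pages over HTTP); the crawl is breadth-first, the index is an inverted index, and the ranking is a deliberately naive term-count score, not what production search engines use.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "a.example/": ("big data ecosystem", ["a.example/hadoop", "b.example/"]),
    "a.example/hadoop": ("hadoop stores big data in hdfs", ["a.example/"]),
    "b.example/": ("web crawlers index the web", ["b.example/nutch", "a.example/hadoop"]),
    "b.example/nutch": ("apache nutch is an open source web crawler", []),
}


def crawl(seed: str, max_pages: int = 100) -> list[str]:
    """Breadth-first crawl: visit the seed, then the pages it links to, level by level."""
    frontier = deque([seed])
    seen = {seed}
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        _, links = WEB.get(url, ("", []))  # unknown URLs are treated as dead ends
        for link in links:
            if link not in seen:  # never enqueue the same URL twice
                seen.add(link)
                frontier.append(link)
    return visited


def build_index(urls: list[str]) -> dict[str, set[str]]:
    """Indexing: an inverted index mapping each word to the pages containing it."""
    index = defaultdict(set)
    for url in urls:
        text, _ = WEB.get(url, ("", []))
        for word in text.split():
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Searching and ranking: pages containing every query word, ordered by how
    often the query words occur on the page (ties broken by URL)."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))

    def score(url: str) -> int:
        page_words = WEB[url][0].split()
        return sum(page_words.count(w) for w in words)

    return sorted(hits, key=lambda url: (-score(url), url))


if __name__ == "__main__":
    idx = build_index(crawl("a.example/"))
    print(search(idx, "web"))
```

Searching for "web" returns `b.example/` ahead of `b.example/nutch`, because the word appears twice on the first page and once on the second.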

Page 21: Web Crawlers for Web Analytics

• Thanks to crawlers, any website can appear in search results without any extra work on the site owner’s part.
• Crawling can be customized with META tags and the robots.txt file.
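Both customization mechanisms can be checked from a crawler with Python's standard library: `urllib.robotparser.RobotFileParser` evaluates robots.txt rules, and `html.parser.HTMLParser` can pull directives out of a `<meta name="robots">` tag. The robots.txt content, the user-agent name `ExampleCrawler` and the example.com URLs are hypothetical, and the meta handling is simplified (it ignores crawler-specific tags such as `name="googlebot"`).

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler may fetch everything except /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(url: str, user_agent: str = "ExampleCrawler") -> bool:
    """Check a URL against the robots.txt rules above before fetching it."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(",") if d.strip())


def meta_directives(html: str) -> set[str]:
    """Return the robots META directives (e.g. "noindex") found in a page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    print(allowed("http://example.com/private/report.html"))
    print(meta_directives('<meta name="robots" content="noindex, nofollow">'))
```

A polite crawler would skip any URL for which `allowed` returns `False`, and would not index a page whose directives include `noindex`.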

Page 22: Natural Language Processing (NLP)

• Natural Language Processing (NLP)
• Computational Linguistics (CL)
• Machine Translation (MT)

Page 23: Data Mining and Knowledge Discovery

• Data collection
• Selection of useful data
• Data transformation: smoothing, aggregation, normalization
• Discovery of interesting patterns: classification, clustering, regression, anomaly detection, association
• Knowledge visualization

Some available open-source data mining tools: RapidMiner, RapidAnalytics, OpenNN, Carrot2, KNIME, etc.
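Three of the steps above (smoothing, normalization and anomaly detection) can be shown on a small series. The readings are hypothetical, and the techniques (moving average, min-max scaling, z-score thresholding) are common textbook choices picked for illustration, not methods prescribed by the slide or by the tools it lists.

```python
from statistics import mean, pstdev

# Hypothetical daily transaction counts; 95 is a planted outlier.
readings = [12, 15, 11, 14, 13, 95, 12, 16, 14, 13]


def moving_average(values: list[float], window: int = 3) -> list[float]:
    """Smoothing: average of each full window of consecutive values."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


def min_max_normalize(values: list[float]) -> list[float]:
    """Normalization: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_anomalies(values: list[float], threshold: float = 2.5) -> list[int]:
    """Anomaly detection: indices whose distance from the mean, measured in
    population standard deviations, exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


if __name__ == "__main__":
    print("smoothed:", [round(v, 1) for v in moving_average(readings)])
    print("normalized:", [round(v, 2) for v in min_max_normalize(readings)])
    print("anomalous indices:", z_score_anomalies(readings))
```

Only index 5 (the value 95) is flagged. Note that the outlier also inflates the standard deviation, which is why a modest threshold of 2.5 is used on such a short series.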

Page 24: Quotes on Big Data

“You can have data without information, but you cannot have information without data.” – Daniel Keys Moran

“War is ninety percent information.” – Napoleon Bonaparte

“If you torture the data long enough, it will confess.” – Ronald Coase, Economist

“He who would search for pearls must dive below.” – John Dryden

Page 25: Thank You

www.CeDAWI.qu.edu.az

[email protected]