Do You Have Big Data? (Most Likely!)


Upload: saptak-sen

Post on 08-Feb-2017


TRANSCRIPT

Page 1: Do You Have Big Data? (Most Likely!)
Page 2: Do You Have Big Data? (Most Likely!)

Do You Have Big Data? (Most Likely!)
Peter Myers – Bitwise Solutions
Saptak Sen – Microsoft

DBI-B325

Page 3: Do You Have Big Data? (Most Likely!)

Presenter Introduction
Peter Myers
BI Expert – Bitwise Solutions
BBus, SQL Server MCSE, MCT, SQL Server MVP
Experienced in designing, developing and maintaining Microsoft database and application solutions since 1997
Focuses on education and mentoring
Based in Melbourne
[email protected]
www.linkedin.com/in/peterjsmyers

Page 4: Do You Have Big Data? (Most Likely!)

Presenter Introduction
Saptak Sen
Senior Product Manager, Big Data, Microsoft Corporation

Focused on Big Data and NoSQL offerings for Microsoft customers. For the last 12 years at Microsoft, he has worked on various distributed computing platforms.

Twitter: @saptak

Page 5: Do You Have Big Data? (Most Likely!)

Session Objectives
To introduce:
  Big data
  Hadoop
  HDInsight
To describe big data processes
To demonstrate various big data scenarios
To describe and inspire you with big data capabilities and potential
To provide relevant resources for further investigation

Page 6: Do You Have Big Data? (Most Likely!)

Introducing Big Data
“Big data is a collection of data sets so large and complex that it becomes awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analysis, and visualization.” – Wikipedia

Page 7: Do You Have Big Data? (Most Likely!)

Introducing Big Data (Continued)
Big data solutions deal with complexities of:

VOLUME (Size)

VARIETY (Structure)

VELOCITY (Speed)

Page 8: Do You Have Big Data? (Most Likely!)

Introducing Big Data (Continued)
[Figure: data sources plotted by size against data complexity (variety and velocity)]
Size scale: Megabytes, Gigabytes, Terabytes, Petabytes
ERP/CRM: payables, payroll, inventory, contacts, deal tracking, sales pipeline
Web 2.0: web logs, digital marketing, search marketing, recommendations, advertising, mobile, collaboration, eCommerce
Big Data: log files, spatial & GPS coordinates, data market feeds, eGov feeds, weather, text/image, click stream, wikis/blogs, sensors/RFID/devices, social sentiment, audio/video

Page 9: Do You Have Big Data? (Most Likely!)

Introducing Big Data (Continued)

Page 10: Do You Have Big Data? (Most Likely!)

Introducing Big Data
Responding to New Questions

Social Analytics: What’s the social sentiment of my product?

Live Data Feed: How do I optimize my services based on patterns of weather, traffic, etc.?

Advanced Analytics: How do I better predict future outcomes?

Page 11: Do You Have Big Data? (Most Likely!)

Introducing Hadoop
Apache Hadoop is for big data
It is a set of open source projects that transform commodity hardware into a service that can:
  Store petabytes of data reliably
  Allow huge distributed computations

Key attributes:
  Open source
  Highly scalable
  Runs on commodity hardware
  Redundant and reliable (no data loss)
  Batch processing centric, using the “Map-Reduce” processing paradigm

Page 12: Do You Have Big Data? (Most Likely!)

Introducing the Hadoop Ecosystem
[Diagram: components of the Hadoop ecosystem]
Distributed Storage (HDFS)
Distributed Processing (Map Reduce)
Query (Hive)
Scripting (Pig)
NoSQL Database (HBase)
Metadata (HCatalog)
Data Integration (ODBC / SQOOP / REST)
Business Intelligence (Excel, PowerView…)
Machine Learning (Mahout)
Graph (Pegasus)
Stats processing (RHadoop)
Pipeline / workflow (Oozie)
Log file aggregation (Flume)
Also shown: PDW; World’s Data (Azure Data Marketplace); AD, System Center; Windows Azure Storage

Page 13: Do You Have Big Data? (Most Likely!)

Introducing HDInsight
HDInsight is Microsoft’s 100% Apache-compatible Hadoop distribution
Available as a Windows Azure service, presently as a developer preview
Empowers organizations with new insights on previously untouched unstructured data, while connecting to the most widely used BI tools on the planet

Page 14: Do You Have Big Data? (Most Likely!)

How it Works
FIRST, STORE THE DATA

[Diagram: files stored across four servers]
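To make the "store the data" step concrete, here is a minimal sketch of how a file might be cut into blocks and spread over several servers with replicas. It is only an illustration: `placeBlocks`, the server names, the sizes and the round-robin placement are all assumptions made for this example. HDFS itself uses a more elaborate placement policy, so this should not be read as its actual algorithm.

```javascript
// Hypothetical helper (not an HDFS API): split a file into fixed-size blocks
// and assign each block to `replication` distinct servers, round-robin.
function placeBlocks(fileSize, blockSize, servers, replication) {
  if (replication > servers.length) {
    throw new Error("replication cannot exceed the number of servers");
  }
  var placements = [];
  var blockCount = Math.ceil(fileSize / blockSize);
  for (var b = 0; b < blockCount; b++) {
    var replicas = [];
    for (var r = 0; r < replication; r++) {
      // Consecutive servers, wrapping around the end of the list.
      replicas.push(servers[(b + r) % servers.length]);
    }
    placements.push({ block: b, servers: replicas });
  }
  return placements;
}

// Illustrative values only: a 200-unit file, 64-unit blocks, 3 replicas.
var servers = ["server1", "server2", "server3", "server4"];
var placements = placeBlocks(200, 64, servers, 3);
console.log(placements);
```

Because every block lives on more than one server, losing a single server does not lose data, and later processing can run on whichever server already holds a copy of the block.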

Page 15: Do You Have Big Data? (Most Likely!)

How it Works
SECOND, TAKE THE PROCESSING TO THE DATA

// Map Reduce functions in JavaScript

// map: split each input line into words and emit (word, 1) for every word.
var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

// reduce: sum the counts emitted for a single word.
var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};

[Diagram: the runtime distributes the code to four servers]
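The word-count functions on this slide can be tried without a cluster. The sketch below repeats them with the same signatures and adds `runLocally`, a hypothetical single-process driver written for this example (it is not part of Hadoop or HDInsight). It imitates the three phases: map over every line, group the emitted values by key, then reduce each group. The sample sentences are made up.

```javascript
// Word-count map and reduce, following the signatures used on the slide.
function map(key, value, context) {
  var words = value.split(/[^a-zA-Z]/);
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== "") {
      context.write(words[i].toLowerCase(), 1);
    }
  }
}

function reduce(key, values, context) {
  var sum = 0;
  while (values.hasNext()) {
    sum += parseInt(values.next(), 10);
  }
  context.write(key, sum);
}

// Hypothetical local driver: stands in for the cluster runtime.
function runLocally(lines) {
  // Map phase: collect every emitted value under its key (the "shuffle").
  var groups = new Map();
  lines.forEach(function (line, index) {
    map(index, line, {
      write: function (k, v) {
        if (!groups.has(k)) {
          groups.set(k, []);
        }
        groups.get(k).push(v);
      }
    });
  });

  // Reduce phase: hand each key an iterator over its grouped values.
  var results = new Map();
  groups.forEach(function (values, k) {
    var position = 0;
    var iterator = {
      hasNext: function () { return position < values.length; },
      next: function () { return values[position++]; }
    };
    reduce(k, iterator, {
      write: function (word, total) { results.set(word, total); }
    });
  });
  return results;
}

var counts = runLocally(["Big data is big", "Data is everywhere"]);
console.log(counts);
```

On a real cluster the map calls run on the servers that hold the data and the grouping happens across the network. The driver above only mirrors that flow in one process.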

Page 16: Do You Have Big Data? (Most Likely!)

Demonstration

Peter Myers, Bitwise Solutions

1 – Word Count (The “Hello World” for Hadoop)

Page 17: Do You Have Big Data? (Most Likely!)

Traditional E-Commerce Data Flow
[Diagram labels]
Operational data: new user registry, new purchase, new product
Logs
ETL some data into the data warehouse
Excess data

Page 18: Do You Have Big Data? (Most Likely!)

New E-Commerce Big Data Flow
[Diagram labels]
Operational data: new user registry, new purchase, new product
Logs
Raw data: “Store it All” cluster
Data warehouse

Page 19: Do You Have Big Data? (Most Likely!)

Demonstration

Peter Myers, Bitwise Solutions

2 – Integration Services ETL with HIVE

Page 20: Do You Have Big Data? (Most Likely!)

The Hadoop Data Flow

[Diagram labels: Hadoop; Data Analytics]

Page 21: Do You Have Big Data? (Most Likely!)

Demonstration

Saptak Sen, Microsoft

3 – Self-Service BI with HIVE

Page 22: Do You Have Big Data? (Most Likely!)

Hadoop Capabilities

Machine Learning

Graph Processing

Distributed Compute

Extract Load Transform

Predictive Analysis

Page 23: Do You Have Big Data? (Most Likely!)

Common Big Data Algorithms

Mining Social-Network Graphs

Finding Similar Items

Mining Data Streams

Frequent Item Sets

Advertising on the Web

Link Analysis

Recommendation Systems

Clustering

Page 24: Do You Have Big Data? (Most Likely!)

Common Big Data Algorithms
Frequent Item Sets – Market Basket Analysis

Market Basket Analysis

Plagiarism

BioMarkers

Related Concepts
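As a rough illustration of the frequent item set idea behind market basket analysis, the sketch below counts how often each pair of items appears in the same basket and keeps the pairs that reach a minimum count. The function name `frequentPairs`, the baskets and the threshold are invented for this example. It is a brute-force toy, not the method used in the demonstration.

```javascript
// Hypothetical helper: count co-occurring item pairs across baskets and
// return the pairs seen in at least `minSupport` baskets.
function frequentPairs(baskets, minSupport) {
  var counts = new Map();
  baskets.forEach(function (basket) {
    // De-duplicate and sort so each pair has one canonical spelling.
    var items = Array.from(new Set(basket)).sort();
    for (var i = 0; i < items.length; i++) {
      for (var j = i + 1; j < items.length; j++) {
        var pair = items[i] + " + " + items[j];
        counts.set(pair, (counts.get(pair) || 0) + 1);
      }
    }
  });

  var frequent = [];
  counts.forEach(function (count, pair) {
    if (count >= minSupport) {
      frequent.push({ pair: pair, count: count });
    }
  });
  return frequent;
}

// Made-up shopping baskets.
var baskets = [
  ["bread", "milk"],
  ["bread", "butter", "milk"],
  ["butter", "milk"],
  ["bread", "butter", "eggs"]
];
var pairs = frequentPairs(baskets, 2);
console.log(pairs);
```

The pair counting maps naturally onto Map Reduce: the map step emits each pair found in a basket with a count of 1, and the reduce step sums the counts per pair.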

Page 25: Do You Have Big Data? (Most Likely!)

Demonstration

Peter Myers, Bitwise Solutions

4 – Analysis Services Data Mining with HIVE

Page 26: Do You Have Big Data? (Most Likely!)

Common Big Data Algorithms
Finding Similar or Complementary Items

Collaborative Filtering
Similar music tastes
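One simple way to picture collaborative filtering on music tastes is to find the listener whose set of artists overlaps most with yours, and suggest what they play that you do not. The sketch below does this with Jaccard similarity (shared items divided by all items). The names `jaccard` and `recommend`, the listeners and the artists are all hypothetical, and Jaccard is just one possible similarity measure chosen for brevity.

```javascript
// Jaccard similarity of two lists treated as sets: |A ∩ B| / |A ∪ B|.
function jaccard(a, b) {
  var setA = new Set(a);
  var setB = new Set(b);
  var shared = 0;
  setA.forEach(function (item) {
    if (setB.has(item)) {
      shared++;
    }
  });
  var union = setA.size + setB.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Hypothetical recommender: suggest artists from the most similar listener
// that the target listener does not already have.
function recommend(target, listeners) {
  var best = null;
  var bestScore = -1;
  Object.keys(listeners).forEach(function (name) {
    if (name === target) {
      return;
    }
    var score = jaccard(listeners[target], listeners[name]);
    if (score > bestScore) {
      bestScore = score;
      best = name;
    }
  });
  if (best === null) {
    return [];
  }
  var known = new Set(listeners[target]);
  return listeners[best].filter(function (artist) {
    return !known.has(artist);
  });
}

// Made-up listening data.
var listeners = {
  ana: ["artistA", "artistB", "artistC"],
  ben: ["artistA", "artistB", "artistD"],
  cara: ["artistE", "artistF"]
};
var suggestions = recommend("ana", listeners);
console.log(suggestions);
```

Comparing every listener with every other listener grows quickly with the number of listeners, which is why this kind of workload is usually distributed at scale.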

Page 27: Do You Have Big Data? (Most Likely!)

Demonstration

Saptak Sen, Microsoft

5 – Data Mining with Apache Mahout

Page 28: Do You Have Big Data? (Most Likely!)

Do You Have Big Data?
It is likely that you have big data: you’re definitely capturing outcome data, and probably capturing ambient data

All data – outcome or ambient – has value

Azure and SQL Server Data Platform can unleash insight from big data, small data, all data

Page 29: Do You Have Big Data? (Most Likely!)

[Diagram: from data to insight]
Find, combine, and manage
Form theories, analyze, and refine
Take action and operationalize

Complete. Powerful. Easy.

Page 30: Do You Have Big Data? (Most Likely!)

Resources
Microsoft Big Data
  http://www.microsoft.com/bigdata
Windows Azure HDInsight
  https://www.hadooponazure.com
HDInsight Services for Windows
  Includes an excellent set of BI specific resources in the section named “Using HDInsight with Other BI Technologies”
  http://social.technet.microsoft.com/wiki/contents/articles/6204.hadoop-based-services-for-windows-en-us.aspx
Blog: Big Data for Everyone: Using Microsoft’s Familiar BI Tools with Hadoop
  http://blogs.msdn.com/b/microsoft_business_intelligence1/archive/2012/02/24/big-data-for-everyone-using-microsoft-s-familiar-bi-tools-with-hadoop.aspx

Page 31: Do You Have Big Data? (Most Likely!)

Related content
Breakout Sessions

DBI-B366: Big Data Analytics with Microsoft Excel 2013 [Wed 8:30AM]
DBI-B340: Taking Your Application Design to the Next Level by Using SQL Server 2012 Data Mining [Thu 10:15AM]
DBI-B401: Enriching Big Data for Analysis [Fri 10:15AM]
DBI-B221: Data Management in Microsoft HDInsight: How to Move and Store Your Data [Fri 4:30PM]

Page 32: Do You Have Big Data? (Most Likely!)

Resources

MSDN: Resources for Developers
  http://microsoft.com/msdn

Learning: Microsoft Certification & Training Resources
  www.microsoft.com/learning

Sessions on Demand
  http://channel9.msdn.com/Events/TechEd

TechNet: Resources for IT Professionals
  http://microsoft.com/technet

Page 33: Do You Have Big Data? (Most Likely!)


Page 34: Do You Have Big Data? (Most Likely!)

© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.