Microsoft's Hadoop Story
DESCRIPTION
Presentation at the Seattle Hadoop Meetup 1/23 about Microsoft's Hadoop Story.
TRANSCRIPT
![Page 1: Microsoft's Hadoop Story](https://reader034.vdocuments.site/reader034/viewer/2022051323/54b6e73f4a7959f7708b4670/html5/thumbnails/1.jpg)
Hadoop and Microsoft.
Michael Rys | Principal Program Manager @SQLServerMike
![Page 2: Microsoft's Hadoop Story](https://reader034.vdocuments.site/reader034/viewer/2022051323/54b6e73f4a7959f7708b4670/html5/thumbnails/2.jpg)
Session Objectives
• What is Big Data?
• How does it fit into the Windows and Windows Azure environments?
• How do I program against it in the Microsoft environment?
![Page 3: Microsoft's Hadoop Story](https://reader034.vdocuments.site/reader034/viewer/2022051323/54b6e73f4a7959f7708b4670/html5/thumbnails/3.jpg)
What is Big Data?
• Traditionally:
  • Physics Experiments, Sensor data, Satellite data, …
• Now:
  • Operational Logs
  • Customer behavior
  • Social interactions online
  • …
• From Terabytes in the 1990s through Petabytes today to Zettabytes in the future
![Page 4: Microsoft's Hadoop Story](https://reader034.vdocuments.site/reader034/viewer/2022051323/54b6e73f4a7959f7708b4670/html5/thumbnails/4.jpg)
Big Data.
![Page 5: Microsoft's Hadoop Story](https://reader034.vdocuments.site/reader034/viewer/2022051323/54b6e73f4a7959f7708b4670/html5/thumbnails/5.jpg)
Big Data.
VOLUME (Size)
VARIETY (Structure)
VELOCITY (Speed)
![Page 6: Microsoft's Hadoop Story](https://reader034.vdocuments.site/reader034/viewer/2022051323/54b6e73f4a7959f7708b4670/html5/thumbnails/6.jpg)
New Questions.
• Social Analytics: What’s the social sentiment of my product?
• Live Data Feed: How do I optimize my services based on patterns of weather, traffic, etc.?
• Advanced Analytics: How do I better predict future outcomes?
![Page 7: Microsoft's Hadoop Story](https://reader034.vdocuments.site/reader034/viewer/2022051323/54b6e73f4a7959f7708b4670/html5/thumbnails/7.jpg)
Hadoop is for Big Data.
![Page 8: Microsoft's Hadoop Story](https://reader034.vdocuments.site/reader034/viewer/2022051323/54b6e73f4a7959f7708b4670/html5/thumbnails/8.jpg)
What is Hadoop (v1)?
• Processing Platform for Big Data
  • Using the “Map-Reduce” Processing Paradigm
• Characteristics:
  • Highly scalable (scaled out)
  • Commodity HW-based
  • Open Source
=> Very low acquisition and storage costs
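To make the “Map-Reduce” paradigm named on this slide concrete, here is a minimal single-process sketch in Python. It is not Hadoop code: the function names (`map_words`, `reduce_counts`, `run_map_reduce`) and the sample log lines are hypothetical and chosen only for illustration. It shows the three phases a Hadoop v1 job goes through: map each input record to key/value pairs, group the pairs by key (the shuffle), then reduce each group to a result.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum all the counts collected for one word."""
    return word, sum(counts)


def run_map_reduce(
    lines: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Run map, shuffle (group by key), and reduce in a single process.

    A real cluster would spread the map and reduce calls over many nodes;
    this sketch only mirrors the data flow.
    """
    grouped: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)  # shuffle: collect values per key
    return dict(reducer(key, values) for key, values in grouped.items())


# Hypothetical sample input standing in for operational logs.
logs = ["error disk full", "warning disk slow", "error network down"]
word_counts = run_map_reduce(logs, map_words, reduce_counts)
print(word_counts["error"], word_counts["disk"])  # prints: 2 2
```

Because the mapper only sees one record at a time and the reducer only sees one key at a time, both steps can be run in parallel on many machines, which is what lets the approach scale out on commodity hardware.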
![Page 9: Microsoft's Hadoop Story](https://reader034.vdocuments.site/reader034/viewer/2022051323/54b6e73f4a7959f7708b4670/html5/thumbnails/9.jpg)
Hadoop Data Flow
Hadoop Data Analytics
![Page 10: Microsoft's Hadoop Story](https://reader034.vdocuments.site/reader034/viewer/2022051323/54b6e73f4a7959f7708b4670/html5/thumbnails/10.jpg)
![Page 11: Microsoft's Hadoop Story](https://reader034.vdocuments.site/reader034/viewer/2022051323/54b6e73f4a7959f7708b4670/html5/thumbnails/11.jpg)
Hadoop Capabilities
Machine Learning
Graph Processing
Distributed Compute
Extract Load Transform
Predictive Analysis
![Page 12: Microsoft's Hadoop Story](https://reader034.vdocuments.site/reader034/viewer/2022051323/54b6e73f4a7959f7708b4670/html5/thumbnails/12.jpg)
HDInsight Ecosystem
• Distributed Storage (HDFS)
• Distributed Processing (Map Reduce)
• Query (Hive)
• Scripting (Pig)
• NoSQL Database (HBase)
• Metadata (HCatalog)
• Data Integration (ODBC / SQOOP / REST)
• Business Intelligence (Excel, PowerView …)
• Machine Learning (Mahout)
• Graph (Pegasus)
• Stats processing (RHadoop)
• Pipeline / workflow (Oozie)
• Log file aggregation (Flume)
• PDW
• World’s Data (Azure Data Marketplace)
• AD, System Center
• Windows Azure Storage
![Page 13: Microsoft's Hadoop Story](https://reader034.vdocuments.site/reader034/viewer/2022051323/54b6e73f4a7959f7708b4670/html5/thumbnails/13.jpg)
Data Knowledge Action
HDInsight
![Page 14: Microsoft's Hadoop Story](https://reader034.vdocuments.site/reader034/viewer/2022051323/54b6e73f4a7959f7708b4670/html5/thumbnails/14.jpg)
HDFS on Azure: Tale of two File Systems
HDFS API
• DFS (1 Data Node per Worker Role) and Compute Cluster: NameNode, Data Nodes
• Azure Storage Vault (ASV): Front end, Partition Layer, Stream Layer
…
• Containers on Azure Blob Storage
![Page 15: Microsoft's Hadoop Story](https://reader034.vdocuments.site/reader034/viewer/2022051323/54b6e73f4a7959f7708b4670/html5/thumbnails/15.jpg)
.NET Map/Reduce Support
• Install NuGet
• Use NuGet to get the Microsoft .NET MapReduce API for Hadoop
• Provide an implementation of a HadoopJob
• Execute the job via either
  • MRLib\MRRunner.exe -dll ConsoleAppHadoopJob.exe
  or
  • HadoopJobExecutor.ExecuteJob<HadoopJobClass>();
• Collect your result on HDFS
![Page 16: Microsoft's Hadoop Story](https://reader034.vdocuments.site/reader034/viewer/2022051323/54b6e73f4a7959f7708b4670/html5/thumbnails/16.jpg)
JavaScript Map/Reduce Support
• Provide a map and reduce function variable in a JS file
• Use the JavaScript console with
  • runJS('/user/myself/MRjob.js', '/path/to/inputfile', '/path/to/output/dir')
• Collect your result on HDFS
![Page 17: Microsoft's Hadoop Story](https://reader034.vdocuments.site/reader034/viewer/2022051323/54b6e73f4a7959f7708b4670/html5/thumbnails/17.jpg)
Invoking HiveQL Queries
• Run queries in Hadoop Command Shell after invoking hive
• Through the web console
• Programmatically through ODBC
• Coming soon: LINQ to Hive!
![Page 18: Microsoft's Hadoop Story](https://reader034.vdocuments.site/reader034/viewer/2022051323/54b6e73f4a7959f7708b4670/html5/thumbnails/18.jpg)
Polybase – Enhancing PDW query engine
• Unstructured data (Hadoop): Social Apps, Sensor & RFID, Mobile Apps, Web Apps
• Structured data (PDW V2): Traditional schema-based DW applications
• Enhanced PDW query engine: Regular T-SQL in, Results out
• Users: Data Scientists, BI Users, DB Admins
![Page 19: Microsoft's Hadoop Story](https://reader034.vdocuments.site/reader034/viewer/2022051323/54b6e73f4a7959f7708b4670/html5/thumbnails/19.jpg)
Microsoft Hadoop Vision
Microsoft Business Intelligence (BI)
• Hive ODBC Connectivity
• BI Tools for Big Data
Better on Windows and Azure
• Active Directory
• System Center
• .NET Programmability
Microsoft Data Connectivity
• SQL Server / SQL Parallel Data Warehouse
• Azure Storage / Azure Data Market
Collaborate with and Contribute to OSS
• Collaborate with Hortonworks
• Provide improvements and Windows support back to OSS
![Page 20: Microsoft's Hadoop Story](https://reader034.vdocuments.site/reader034/viewer/2022051323/54b6e73f4a7959f7708b4670/html5/thumbnails/20.jpg)
Getting started
• On prem: http://www.microsoft.com/bigdata/
  • Single node cluster (onebox) install
  • C:\hadoop
  • Starts local services
  • Can start/stop them with start-onebox.cmd/stop-onebox.cmd
  • Comes with:
    • Hadoop command line (shell)
    • Hadoop Status for name node and map-reduce cluster
    • HDInsight Dashboard
• On Windows Azure: http://HadoopOnAzure.com/
  • 3 node cluster running as a service in Azure
  • Can be used for 5 days
  • Provides samples and HDInsight Dashboard
• TAP Program
![Page 21: Microsoft's Hadoop Story](https://reader034.vdocuments.site/reader034/viewer/2022051323/54b6e73f4a7959f7708b4670/html5/thumbnails/21.jpg)
Related Content and Links
http://www.microsoft.com/bigdata
http://www.hadooponazure.com
NuGet: http://nuget.codeplex.com/
LinqPad: http://www.linqpad.net/
Linq to Hive (see http://hadoopsdk.codeplex.com)
Find Me Later At…
Twitter: @SQLServerMike
ACM SIGOPS Paper: Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency (Calder et al)
http://blogs.msdn.com/b/windowsazure/archive/2012/11/02/windows-azure-s-flat-network-storage-and-2012-scalability-targets.aspx
Developing Big Data Analytics Applications with JavaScript and .NET for Windows Azure and Windows: http://channel9.msdn.com/Events/Build/2012/3-038