HDInsight Hadoop on Windows Azure
DESCRIPTION
Introduction to HDInsight Hadoop on Windows Azure services, including using the interactive console with JavaScript and running WordCount via other methods (Streaming, Hive, etc.)
TRANSCRIPT
Hadoop on Azure
@LynnLangit
Data Expertise / Lynn Langit
Practicing Architect
Cloud Deployments (Azure, AWS, Google)
Technical author / trainer
Google Cloud Developer Series
SQL Server 2012 Developer Series
Cloudera Certified Developer
2 books on SQL Server BI
Industry awards
Microsoft – MVP for SQL Server
Google – GDE for Cloud Platform
10Gen – Master for MongoDB
Former MSFT FTE
4 years
What is Hadoop?
HUGE Hype factor in 2011 / 2012
Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license
• Uses HDFS storage to enable applications to work with thousands of nodes and petabytes of data
• Uses MapReduce to process the data
• Inspired by Google
  • MapReduce
  • Google File System
What is HDInsight?
• Hadoop on Windows Azure
• On-premises
Microsoft worked with Hortonworks to port Hadoop to Windows (from Linux)
Working with HDInsight
RDBMS vs. Hadoop
                    | RDBMS                   | Hadoop
Data Size           | Gigabytes (Terabytes)   | Petabytes (Exabytes)
Access              | Interactive and Batch   | Batch – NOT Interactive
Updates             | Read / Write many times | Write once, Read many times
Structure           | Static Schema           | Dynamic Schema
Integrity           | High (ACID)             | Low
Scaling             | Nonlinear               | Linear
Query Response Time | Can be near immediate   | Has latency (due to batch processing)
Setting Up Your Cluster
Configuration 1
Configuration 2
Pricing (during Preview)
Demo
Basic Administration
Connect via RDP
NameNode Utility – Top Level
NameNode Utility – Drill Down
Understanding Storage
Using the Azure Storage Viewer
What is MapReduce?
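As a rough illustration of the idea, the three stages of a MapReduce job (map, shuffle, reduce) can be imitated in plain Python on one machine. This is only a sketch: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are made up for this example and are not part of Hadoop or HDInsight, and a real cluster would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Shuffle: group every emitted value under its key.

    On a cluster the framework does this between the map and reduce steps.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reducer: collapse all values for one key into a single total."""
    return (word, sum(counts))


def word_count(lines):
    """Run map -> shuffle -> reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())
```

Calling `word_count(["the quick brown fox", "the lazy dog"])` returns a dictionary that maps each word to the number of times it appears.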
MapReduce using Java
WordCount example
MapReduce using C# Streaming
WordCount example
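Hadoop Streaming lets the mapper and reducer be any executable that reads lines from standard input and writes tab-separated key/value lines to standard output, which is what makes a C# WordCount possible. The sketch below shows that same contract in Python rather than C#, purely for illustration; `mapper`, `reducer` and `run_locally` are hypothetical names, and `run_locally` only imitates the sort that Hadoop performs between the two steps.

```python
from itertools import groupby


def mapper(lines):
    """Streaming mapper: turn each input line into 'word<TAB>1' output lines."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Streaming reducer: sum the counts of consecutive lines sharing a key.

    Relies on the input being sorted by key, which Hadoop guarantees
    for the lines it hands to a streaming reducer.
    """
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Imitate the streaming pipeline on one machine: map, sort, reduce."""
    return list(reducer(sorted(mapper(lines))))
```

In an actual streaming job each function would live in its own script, read from `sys.stdin`, and print its output lines; the framework supplies the sorting in between.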
MapReduce using JavaScript
WordCount example
Simple Output Graphing
WordCount example
Using HIVE
Understanding Pig
Load>Transform>Dump or Store
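The Load > Transform > Dump or Store flow can be pictured as three stages chained together. The sketch below is a plain Python stand-in, not Pig Latin: the stage functions and the comma-separated `user,action` records are assumptions invented for this example, chosen only to show how each stage hands its result to the next.

```python
from collections import Counter


def load(raw_lines):
    """Load: parse each 'user,action' line into a (user, action) record."""
    for line in raw_lines:
        user, action = line.strip().split(",")
        yield (user, action)


def transform(records):
    """Transform: keep only 'click' records, then count them per user."""
    clicks = (user for user, action in records if action == "click")
    return Counter(clicks)


def dump(result):
    """Dump: format the result as sorted 'user<TAB>count' lines for display."""
    return [f"{user}\t{count}" for user, count in sorted(result.items())]
```

Chaining the stages as `dump(transform(load(lines)))` mirrors the slide's pipeline; a Store step would write the same lines to a file instead of returning them for display.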
Monitoring Job Results
In the portal
• Main Console
  • Job icon (button) status summary
  • Job History
• Interactive Console
  • JS quick feedback
  • JS detailed feedback (log)
Using RDP
• Map/Reduce tool
• Hadoop command prompt
Monitoring Job Status
Download – ODBC for HIVE
Includes add-in for Excel
Hadoop Connector to Excel
Connecting to PowerPivot
Create an ODBC connection to HIVE
Connect to ‘other data source’ in PowerPivot
Connecting with PowerQuery
Pulling it Together - Klout
Hadoop To-Do List
BigData = Hadoop
• Use Hadoop when business needs dictate
• Use other NoSQL if a better fit
Hadoop on the cloud
• Quick and cheap
• Specialized use cases
• Behavioral data
• Dev, test, training environments
Hadoop access technologies
• Learn Map/Reduce
• Use HIVE via Excel
• Pay attention to Impala
www.TeachingKidsProgramming.org
Vote, Confirm, Share
Keep Learning
@LynnLangit
YouTube – SoCalDevGal
Hire Me: Architecture, Best Practices, Performance Tuning