big data & data management - glasspaper data and data management...using analytic engines like...


Post on 22-May-2020


Page 1: Big Data & Data Management - Glasspaper Data and Data Management...Using analytic engines like Hadoop Interactive queries Batch queries Machine Learning Data warehouse Real-time analytics

Big Data & Data Management

A Monday morning chat about Azure Data Lake, Azure SQL Data Warehouse, Azure HDInsight, and Azure Data Factory


https://azure.microsoft.com/en-us/services/data-factory/


A managed cloud service for building & operating data pipelines (a.k.a. data flows)

1. Orchestrate, monitor & schedule

• compose data processing, storage & movement services (on premises & cloud)

2. Automatic infrastructure management

• combine pipeline intent with resource allocation & management

• data movement as a service (global footprint & on premises)

3. Single pane of glass

• one place to manage your network of data flows


[Figure: example data pipeline with the stages Data Sources, Ingest, Transform & Analyze, and Publish. Labels: Call Log Files, Customer Table, Customer Call Details, Customer Churn Table, Customers Likely to Churn.]
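The stages named on this slide (Ingest, Transform & Analyze, Publish) can be illustrated with a small local sketch. This is not Data Factory code: the sample records, the function names, and the dropped-call threshold are all hypothetical, chosen only to show how call logs and a customer table might flow into a "likely to churn" result.

```python
from collections import Counter

# Hypothetical sample data standing in for the slide's two sources.
call_log_lines = [
    "c1,2020-05-01,dropped",
    "c1,2020-05-02,dropped",
    "c2,2020-05-01,ok",
    "c1,2020-05-03,dropped",
    "c2,2020-05-04,dropped",
]
customer_table = {"c1": "Alice", "c2": "Bob"}


def ingest(lines):
    """Ingest: parse raw call log lines into (customer_id, status) records."""
    records = []
    for line in lines:
        customer_id, _date, status = line.split(",")
        records.append((customer_id, status))
    return records


def transform(records, customers):
    """Transform & analyze: count dropped calls and join with the customer table."""
    dropped = Counter(cid for cid, status in records if status == "dropped")
    return [
        {"customer_id": cid, "name": name, "dropped_calls": dropped[cid]}
        for cid, name in customers.items()
    ]


def publish(call_details, threshold=3):
    """Publish: keep customers at or above an assumed dropped-call threshold."""
    return [row["name"] for row in call_details if row["dropped_calls"] >= threshold]


def run_pipeline():
    """Run the stages in order, as an orchestrator would schedule them."""
    return publish(transform(ingest(call_log_lines), customer_table))


print(run_pipeline())
```

In a managed service each stage would be a separately scheduled and monitored activity; here they are plain function calls so the data flow stays visible.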


https://azure.microsoft.com/en-us/services/sql-data-warehouse/


Broad SQL Server Partner Ecosystem

+ Leverage Azure ML, HDInsight, Power BI, ADF, and more.

+ Industry’s broadest ecosystem of DW partners, including Tableau, Informatica, Attunity, and SAP.

Streamlined deployment with Azure Portal.

Deep tool integration with top partners including:

• Single-click configuration

• Optimized data movement

• Logical pushdown

[Figure labels: Azure SQL DW, Azure ML, Azure Event Hub, Azure HDInsight.]


https://azure.microsoft.com/en-us/services/data-lake-store/


[Figure: data warehouse implementation process. Steps shown: Understand Corporate Strategy; Gather Requirements (Business Requirements, Technical Requirements); Dimension Modelling; ETL Design; Reporting & Analytics Design; Physical Design; Setup Infrastructure; Install and Tune; Implement Data Warehouse (ETL Development, Reporting & Analytics Development). Architecture shown: Data sources, ETL, Data warehouse, BI and analytics.]


Ingest: regardless of requirements

Store: in native format without schema definition

Analyze: using analytic engines like Hadoop

• Interactive queries

• Batch queries

• Machine Learning

• Data warehouse

• Real-time analytics

[Figure label: Devices]


Azure Data Lake: Store

• Distributed, parallel file system in the cloud

• Performance-tuned and optimized for analytics

• No fixed size limits

• Stores all data types

• Highly available with local & geo redundant storage

• WebHDFS REST API

• Supported by leading Hadoop distros

• Role-based security

• Low latency and high throughput workloads

[Figure labels: Store, HDFS, YARN, U-SQL, Analytics Service, HDInsight. Data sources: Clickstream, Sensors, Video, Social, Web, Devices, Relational, Applications.]
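The WebHDFS REST API mentioned on this slide addresses files through URLs of the form /webhdfs/v1/<path>?op=<OPERATION>. The sketch below only builds such URLs and sends no requests. The account name and file paths are hypothetical, the host pattern <account>.azuredatalakestore.net is an assumption about the store's endpoint, and a real call would also need an Azure AD bearer token, which is omitted here.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, **params):
    """Build a WebHDFS REST URL: https://<host>/webhdfs/v1/<path>?op=<OP>."""
    query = urlencode({"op": op, **params})
    return f"https://{host}/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"


# Hypothetical account name; LISTSTATUS and OPEN are standard WebHDFS operations.
host = "myaccount.azuredatalakestore.net"

# List a directory.
print(webhdfs_url(host, "/clickstream/2020/05", "LISTSTATUS"))

# Read the first kilobyte of a file.
print(webhdfs_url(host, "/clickstream/2020/05/events.csv", "OPEN", offset=0, length=1024))
```

Because the interface is plain HTTP plus a path, any Hadoop-compatible tool that already speaks WebHDFS can, in principle, be pointed at the store without a custom client.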


New big data thinking: All data has value

Gather data from all sources → Store indefinitely → Analyze → See results → Iterate

• All data has potential value

• Data hoarding

• No defined schema: data is stored in native format

• Schema is imposed and transformations are done at query time (schema-on-read).

• Apps and users interpret the data as they see fit
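Schema-on-read, as described on this slide, can be shown with a few lines of Python. This is a minimal sketch under stated assumptions: the raw events, the field names, and the read_with_schema helper are invented for illustration and are not part of any Azure API.

```python
import json

# Raw events kept exactly as they arrived: no schema was declared at write time.
raw_store = [
    '{"device": "d1", "temp": "21.5", "ts": "2020-05-22T08:00:00"}',
    '{"device": "d2", "temp": "19.0"}',
    '{"device": "d1", "temp": "22.1", "ts": "2020-05-22T09:00:00", "fw": "1.2"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at query time: pick the requested fields and cast them."""
    for line in lines:
        record = json.loads(line)
        yield {
            field: cast(record[field]) if field in record else None
            for field, cast in schema.items()
        }


# Two consumers interpret the same raw data differently.
temperature_view = list(read_with_schema(raw_store, {"device": str, "temp": float}))
firmware_view = list(read_with_schema(raw_store, {"device": str, "fw": str}))

average_temp = sum(r["temp"] for r in temperature_view) / len(temperature_view)
print(round(average_temp, 2))
print(firmware_view)
```

Nothing is rejected at write time, so a field that only some records carry (here "fw") is still available later to the one consumer that cares about it.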


Data Lake Store: Technical Requirements

Secure: Must be highly secure to prevent unauthorized access (especially as all data is in one place).

Native format: Must permit data to be stored in its ‘native format’ to track lineage & for data provenance.

Low latency: Must have low latency for high-frequency operations.

Multiple analytic frameworks: Must support multiple analytic frameworks (batch, real-time, streaming, ML, etc.). No one analytic framework can work for all data and all types of analysis.

Details: Must be able to store data with all details; aggregation may lead to loss of details.

Throughput: Must have high throughput for massively parallel processing via frameworks such as Hadoop and Spark.

Reliable: Must be highly available and reliable (no permanent loss of data).

Scalable: Must be highly scalable. When storing all data indefinitely, data volumes can quickly add up.

All sources: Must be able to ingest data from a variety of sources: LOB/ERP, logs, devices, social networks, etc.


What is Azure Data Lake Store?

A highly scalable, distributed, parallel file system in the cloud, specifically designed to work with multiple analytic frameworks

[Figure labels: data sources (LOB Applications, Social, Devices, Clickstream, Sensors, Video, Web, Relational); ADL Store; analytic frameworks (HDInsight, ADL Analytics, Machine Learning, Spark, R).]


https://azure.microsoft.com/en-us/services/data-lake-analytics/

https://channel9.msdn.com/Series/AzureDataLake


Azure Data Lake

• Analytics: HDInsight (“managed clusters”), Azure Data Lake Analytics

• Storage: Azure Data Lake Storage


Azure Data Lake Analytics Service

A new distributed analytics service

Built on Apache YARN

Scales dynamically with the turn of a dial

Pay by the query

Supports Azure AD for access control, roles, and integration with on-premises identity systems

Built with U-SQL to unify the benefits of SQL with the power of C#

Processes data across Azure



Work across all cloud data

Azure Data Lake Analytics

• Azure SQL DW

• Azure SQL DB

• Azure Storage Blobs

• Azure Data Lake Store

• SQL DB in an Azure VM


Analytics: Two form factors

HDInsight: Managed Hadoop clusters

ADLA: Analytics service

[Figure labels: HDInsight Cluster (n1, n2, n3, n4); Hive/Pig/etc. job; ADLA Account; U-SQL/Hive/Pig job; Lots of containers; YARN Layer; Storage (Blobs or ADLS): Input, Output.]


ADLA complements HDInsight

Target the same scenarios, tools, and customers

HDInsight

• For developers familiar with open source: Java, Eclipse, Hive, etc.

• Clusters offer customization, control, and flexibility in a managed Hadoop cluster

ADLA

• Enables customers to leverage existing experience with C#, SQL & PowerShell

• Offers convenience, efficiency, automatic scale, and management in a “job service” form factor


What is U-SQL?

• A hyper-scalable, highly extensible language for preparing, transforming and analyzing all data

• Allows users to focus on the what, not the how, of business problems

• Built on familiar languages (SQL and C#) and supported by a fully integrated development environment

• Built for data developers & scientists
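The idea of pairing a declarative SQL query with functions written in a general-purpose language can be illustrated by analogy. The sketch below is not U-SQL and does not run on Azure Data Lake Analytics: it uses Python's built-in sqlite3 module, where a Python function stands in for the C# part. The table, the sample URLs, and the domain_of helper are hypothetical.

```python
import sqlite3


def domain_of(url):
    """Procedural helper: extract the host part of a URL."""
    return url.split("//", 1)[-1].split("/", 1)[0]


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL statements can call it by name.
conn.create_function("domain_of", 1, domain_of)

conn.execute("CREATE TABLE clicks (url TEXT)")
conn.executemany(
    "INSERT INTO clicks VALUES (?)",
    [
        ("https://example.com/a",),
        ("https://example.com/b",),
        ("https://example.org/c",),
    ],
)

# The query states what is wanted; the engine decides how to execute it.
rows = conn.execute(
    "SELECT domain_of(url) AS domain, COUNT(*) AS hits "
    "FROM clicks GROUP BY domain ORDER BY hits DESC, domain"
).fetchall()
print(rows)
```

The query covers the "what" (group clicks by domain and count them), while the custom function supplies logic that is awkward to express in SQL alone.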


Developing big data apps

Author, debug, & optimize big data apps in Visual Studio

Multiple languages: U-SQL, Hive, & Pig

Seamlessly integrate .NET


https://channel9.msdn.com/Events/Cortana-Analytics-Suite/CA-Suite-Workshop-10-11SEP15

http://www.microsoft.com/en-us/server-cloud/cortana-analytics-suite/overview.aspx