how to use big data

Digicomp: The Microsoft BI Platform in the Cloud. Course instructor: Matthias Gessenay, 20 January 2016 / [email protected]

Upload: digicomp-academy-ag

Post on 14-Apr-2017


TRANSCRIPT

Page 1: How to use Big Data

Digicomp 1

The Microsoft BI Platform in the Cloud

Course instructor: Matthias Gessenay, 20 January 2016 / [email protected]

Page 2: How to use Big Data

Copyrights

Some slides are taken from Microsoft's Azure Readiness slide deck (https://github.com/Azure-Readiness/CloudDataCamp/blob/master/Presentation/HDInsight/Hadoop%20in%20Azure.pptx)

Some slides are taken from the MS Ignite session "PowerBI Overview" (http://www.google.ch/url?sa=t&rct=j&q=&esrc=s&source=web&cd=8&cad=rja&uact=8&ved=0ahUKEwiH3pygp7XKAhVBVRoKHQ9KCJwQFghcMAc&url=http%3A%2F%2Fvideo.ch9.ms%2Fsessions%2Fignite%2F2015%2Fdecks%2FBRK2556_Doyle.pptx&usg=AFQjCNHOr7Kb8pJEFnLKHvAMUho0AOBhjA)

Page 3: How to use Big Data

Introduction to Apache Hadoop

Page 4: How to use Big Data

Apache Hadoop

Page 5: How to use Big Data

Data volume

Hadoop stores files in a distributed file system

Spread across many servers

Files can be distributed across many nodes

Hadoop can store very large amounts of data

Scales from a few nodes to many thousands of nodes

Files can be larger than the capacity of a single node

Page 6: How to use Big Data

Data variety

Hadoop stores files in a non-relational format

Page 7: How to use Big Data

Hadoop vs. SQL

Comparison of a relational database and the Hadoop platform at scale (storage & processing):

Schema: required on write (relational database); required on read (Hadoop)

Speed: reads are fast (relational database); writes are fast (Hadoop)

Governance: standards and structured (relational database); loosely structured (Hadoop)

Processing: limited, no data processing (relational database); processing coupled with data (Hadoop)

Data types: structured (relational database); multi and unstructured (Hadoop)

Best fit use (relational database): interactive OLAP analytics, complex ACID transactions, operational data store

Best fit use (Hadoop): data discovery, processing unstructured data, massive storage/processing
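The difference between a schema required on write and a schema required on read can be illustrated with a minimal Python sketch. It is not taken from the slides: the `SCHEMA` columns, the helper names and the sample data are all hypothetical and only serve to show where validation happens in each model.

```python
import csv
import io

# Hypothetical schema for illustration: column name -> type converter.
SCHEMA = {"user_id": int, "amount": float}


def write_with_schema(table, row):
    """Schema on write: convert and validate before the row is stored."""
    typed = {name: convert(row[name]) for name, convert in SCHEMA.items()}
    table.append(typed)


def read_with_schema(raw_text):
    """Schema on read: raw text is stored untouched and parsed at query time."""
    for record in csv.DictReader(io.StringIO(raw_text)):
        try:
            yield {name: convert(record[name]) for name, convert in SCHEMA.items()}
        except (KeyError, ValueError, TypeError):
            # Malformed records only surface when the data is read.
            continue


# Relational style: the bad row is rejected at write time.
table = []
rejected_on_write = False
write_with_schema(table, {"user_id": "1", "amount": "9.50"})
try:
    write_with_schema(table, {"user_id": "oops", "amount": "1.00"})
except ValueError:
    rejected_on_write = True

# Hadoop style: the file is written as-is; structure is applied when reading.
raw_file = "user_id,amount\n1,9.50\noops,1.00\n2,3.25\n"
parsed = list(read_with_schema(raw_file))
```

In this sketch the write path pays the validation cost up front, which keeps reads simple, while the read path accepts any input cheaply and defers the parsing work to query time.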

Page 8: How to use Big Data

YARN: Next Generation Hadoop (Azure Data Lake is built on YARN)

1st Gen of Hadoop: Single Use System (Batch Apps)

HDFS (redundant, reliable storage)

MapReduce (cluster resource management & data processing)

2nd Gen of Hadoop: Multi Use Data Platform (Batch, Interactive, Online, Streaming, …)

HDFS (redundant, reliable storage)

YARN (efficient cluster resource management & shared services)

Flexible data processing: Hive, Pig, others…

Batch: MapReduce

Batch & Interactive: Tez

Online data processing: HBase, Accumulo

Stream processing: Storm

others…

Classic Hadoop Apps

Page 9: How to use Big Data

Hadoop 2.0: YARN

http://hortonworks.com/blog/introducing-apache-hadoop-yarn/

Page 10: How to use Big Data

HDFS: Hadoop Storage

Data nodes

Distributed

Local storage

Fault tolerant (3 copies per block)

Files are split into blocks

Name node

Stores no data

But knows where each block is located
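The division of labour between data nodes and the name node on this slide can be sketched in a few lines of Python. This is a toy model under stated assumptions, not HDFS code: the class names, the 4-byte block size and the round-robin replica placement are all invented for illustration (real HDFS uses much larger blocks and rack-aware placement).

```python
BLOCK_SIZE = 4    # bytes; deliberately tiny, real HDFS blocks are far larger
REPLICATION = 3   # "3 copies per block", as stated on the slide


class DataNode:
    """Stores block contents on its local storage."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # (filename, block index) -> bytes


class NameNode:
    """Stores no data, only the location of every block."""

    def __init__(self):
        self.block_locations = {}  # (filename, block index) -> node names


def split_into_blocks(data, block_size=BLOCK_SIZE):
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def store_file(filename, data, name_node, data_nodes):
    # Round-robin placement is an assumption made for this sketch.
    # It needs at least REPLICATION data nodes to pick distinct targets.
    count = len(data_nodes)
    for index, block in enumerate(split_into_blocks(data)):
        targets = [data_nodes[(index + k) % count] for k in range(REPLICATION)]
        for node in targets:
            node.blocks[(filename, index)] = block
        name_node.block_locations[(filename, index)] = [n.name for n in targets]


def read_file(filename, name_node, data_nodes, failed=()):
    """Reassemble a file, skipping data nodes listed in `failed`."""
    by_name = {node.name: node for node in data_nodes}
    result = b""
    index = 0
    while (filename, index) in name_node.block_locations:
        holders = [name for name in name_node.block_locations[(filename, index)]
                   if name not in failed]
        if not holders:
            raise IOError(f"block {index} of {filename} is lost")
        result += by_name[holders[0]].blocks[(filename, index)]
        index += 1
    return result


name_node = NameNode()
data_nodes = [DataNode(f"node{i}") for i in range(4)]
payload = b"hello hadoop!"
store_file("demo.txt", payload, name_node, data_nodes)
recovered = read_file("demo.txt", name_node, data_nodes, failed={"node0", "node1"})
```

With four nodes and three replicas per block, the file in this sketch can still be read after any two nodes fail, because every block keeps at least one reachable copy.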

Page 11: How to use Big Data

Hadoop MapReduce

[Diagram: the input is divided among parallel workers, each running Do work()]
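The idea of many workers each doing a piece of the work can be illustrated with the classic word count, written here as plain single-process Python. This is only a sketch of the map, shuffle and reduce steps; the function names are hypothetical and nothing here uses the Hadoop API or runs in parallel.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Sort and shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(splits):
    # On a cluster each split would be mapped by a separate worker.
    mapped = [pair for split in splits for pair in map_phase(split)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big", "data lake", "big"])
```

Because each split is mapped independently and each key is reduced independently, both phases can be spread over many nodes without changing the result.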

Page 12: How to use Big Data

Apache Hadoop in Azure

Page 13: How to use Big Data

HDInsight: What's Different?

Not that much …

HDP on Windows

HDP on Linux

Compute and storage are separated from each other

Azure Blob Storage

Page 14: How to use Big Data

HDInsight Storage Infrastructure

HDInsight Compute Nodes (Large VMs): map, sort, shuffle, reduce

Azure Blob Storage

Azure Flat Network Storage

Stream data to compute

Push data back to storage

http://dennyglee.com/2013/03/18/why-use-blob-storage-with-hdinsight-on-azure/
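Because the data lives in Azure Blob Storage rather than on the compute nodes, jobs address it through a storage URI instead of a local HDFS path. The helper below is a small sketch of the `wasb://` / `wasbs://` addressing scheme; the function name, the container `logs`, the account `examplestore` and the file path are hypothetical placeholders.

```python
def wasb_uri(container, account, path, secure=True):
    """Build a WASB URI for a blob in an Azure storage account.

    `wasbs` is the TLS-protected variant of the `wasb` scheme.
    """
    scheme = "wasbs" if secure else "wasb"
    host = f"{account}.blob.core.windows.net"
    return f"{scheme}://{container}@{host}/{path.lstrip('/')}"


# Hypothetical container, account and path, for illustration only.
uri = wasb_uri("logs", "examplestore", "/2016/01/20/events.csv")
```

A job would read its input from such a URI and write its output to another one, which matches the "stream data to compute, push data back to storage" flow on the slide.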

Page 15: How to use Big Data

HDInsight Demo

Page 16: How to use Big Data

Microsoft Self Service-BI

Page 17: How to use Big Data

Powerful self-service BI with Excel 2013

Page 18: How to use Big Data

Power Query

Suited for self-service data that fits in Excel

Data driven shaping – design while you drive

Ideal for sampling data

Partition data in Hadoop/Hive based on user workloads

No governors to prevent users from pulling "too much data"

Does not read compressed or binary files (yet)
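Partitioning by user workload matters because nothing stops a Power Query user from pulling an entire table. Hive stores each partition in its own directory named `key=value`, so a well-chosen partition key lets users fetch one small slice. The sketch below only mimics that directory layout in Python; the table root `/data/sales`, the partition keys and the records are hypothetical, and it assumes analysts mostly query by month.

```python
from collections import defaultdict


def partition_path(table_root, record, partition_keys):
    """Build a Hive-style partition directory such as root/year=2016/month=1."""
    parts = [f"{key}={record[key]}" for key in partition_keys]
    return "/".join([table_root.rstrip("/")] + parts)


def group_by_partition(table_root, records, partition_keys):
    """Assign every record to the partition directory it would be stored in."""
    layout = defaultdict(list)
    for record in records:
        layout[partition_path(table_root, record, partition_keys)].append(record)
    return dict(layout)


# Hypothetical sales records, for illustration only.
records = [
    {"year": 2016, "month": 1, "amount": 10.0},
    {"year": 2016, "month": 1, "amount": 5.0},
    {"year": 2015, "month": 12, "amount": 7.5},
]
layout = group_by_partition("/data/sales", records, ["year", "month"])
january = layout["/data/sales/year=2016/month=1"]
```

A user interested in January 2016 would then load only that one directory instead of the full table.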

Page 19: How to use Big Data

Demo - HDInsight

Page 20: How to use Big Data

Azure Data Lake

Based on Apache YARN

Practically unlimited data volumes and compute power

Pay per use

Currently still by invitation only

New language: U-SQL

Page 21: How to use Big Data

Demo

Page 22: How to use Big Data

PowerBI

Cloud dashboards

On-premises technology available (DataZen)

Connecting data via PowerBI is very easy

Hybrid setups are possible

Page 23: How to use Big Data

Demo

Page 24: How to use Big Data

Questions?