dld summer workshop big data

Big Data Workshop, DLD Summer 15, 21/06/15, @rjudas

Upload: roland-judas

Post on 15-Aug-2015


TRANSCRIPT

Big Data Workshop - DLD Summer 15

Big Data – Workshop, DLD Summer 15

21/06/15, DLD Summer 15, @rjudas


Understanding Big Data

And getting the right mindset


Agenda

- Syncing
- Defining Big Data
- Hype or Evolution
- Tech Drivers
- Big Data – Big Business?
- What’s it all about?
- How do we get there?


Syncing


Syncing

Please tell us your opinion about Big Data

Please tell us about your Big Data projects


Defining Big Data


Definition(s)

“Big Data describes datasets so large they become very difficult to manage with traditional database tools.”

“Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures.”

"Very pragmatically, it's about building net-new analytic applications based on new types of data that (an organization) wasn't previously tracking."


The 3 V’s

- Variety: Tables, Images, Videos, XML, Logs
- Velocity: Batch, Streams, Real-Time
- Volume: Lots of xBytes


Variety

- Mix of data types
- BLOBs and CLOBs: Images, Audio, Videos, Log Files
- Semi-structured, unstructured: Email, EDI messages, Transaction Logs, Sensor Data


Velocity

- Crucial: speed of the “feedback loop”
- Streaming data
- Complex event processing
- From batch to (near) real-time
- Different lifetime


Volume - Big?

KiloByte, MegaByte, GigaByte, TeraByte, PetaByte, ExaByte, ZettaByte, YottaByte


Figures

- “Digital Universe” according to the EMC/IDC study 2014: 4.4 zettabytes in 2013, 44 zettabytes in 2020
- All human speech ever spoken: 42 zettabytes (16 kHz, 16 bit)
- 2013: speculation about a 1 YB NSA data center; realistic estimates are 3-12 EB
- CERN / LHC data center passes 100 PB


Volume – Most famous quote

2.5 Exabytes of Data Created each Day (2,500,000,000,000,000,000 bytes) ≈ 1 ZB/Year

(with 90% of World Data created in the last two years)  

Source: IBM CMO Study 2011
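
The daily-to-yearly conversion on this slide can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, so 1 EB = 10^18 bytes and 1 ZB = 10^21 bytes; the constant and function names are illustrative only.

```python
# Decimal (SI) byte units are assumed here; binary units (EiB, ZiB) would differ slightly.
EXABYTE = 10**18
ZETTABYTE = 10**21


def yearly_volume_in_zb(exabytes_per_day: float, days: int = 365) -> float:
    """Convert a daily data volume in exabytes into a yearly volume in zettabytes."""
    bytes_per_year = exabytes_per_day * EXABYTE * days
    return bytes_per_year / ZETTABYTE


if __name__ == "__main__":
    per_year = yearly_volume_in_zb(2.5)
    print(f"2.5 EB/day is about {per_year:.2f} ZB/year")
```

Under these assumptions the result is roughly 0.91 ZB per year, which the slide rounds to 1 ZB/year.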


Even more V‘s

- Veracity: Uncertainty of data, trustworthiness, accountability
- Value: Big Data only if it generates value
- Visibility: Security, stitching together data from various sources
- Validity: Logic inference, correlation vs. causation


Hype or Evolution?


Old wine?

OLTP, OLAP, Data Warehouse

- Around since the 1970s
- ACID (Atomicity, Consistency, Isolation, Durability)
- Based on SQL
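
To make the "A" in ACID concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table, the account names, and the helper functions are hypothetical and exist only for illustration; the point is that a failed transfer leaves no partial update behind.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def balances(conn):
    """Return the current balances as a dict of name -> balance."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically: both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    db = make_demo_db()
    print(transfer(db, "alice", "bob", 30), balances(db))
    print(transfer(db, "alice", "bob", 500), balances(db))
```

The second transfer fails its balance check, so both `UPDATE` statements are rolled back together.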


Big Data 15 years ago

[Diagram: OLTP systems (Orders, Articles, Receiving, etc.) feed the Data Warehouse, which serves Decision Support Systems (OLAP)]


Business Intelligence


Enter Big Data

http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation

http://www.gartner.com/newsroom/id/1731916

http://chucksblog.emc.com/chucks_blog/2011/06/2011-idc-digital-universe-study-big-data-is-here-now-what.html


“New” Big Data

- New Paradigm: BASE (Basically Available, Soft state, Eventual consistency)
- New Data Model: data lifecycle and variability; data linking and referential integrity
- New Analytics: real-time/streaming analysis, interactive machine learning
- New Infrastructure and Tools: high-performance computing, storage, network; multi-provider services integration; new data-centric service models and security models


[Figure: Big Data landscape map. Labels recoverable from the graphic include Hadoop on Premise, Cluster Services, Management / Monitoring, NoSQL, NewSQL Databases, MPP Databases, GraphDB, Storage, Security, Transformation, App Dev, Cross Infrastructure / Cloud Services, Analytics Platform, BI Platforms, Data Science Platform, Data Visualization, Machine Learning, Statistical Computing, Search, Log Analytics, Social Analytics, Real-Time, Data Sources, Sensor Data, Data Markets, Gov / Regulation, Education / Learning, Health, Finance, Human Capital, Legal, Marketing, Publisher Tools, Ad Optimization]


Big Data

Hype AND Evolution

Some Vendors use it to remarket “old” stuff

Many “new” products/services


Tech Drivers


Drivers

- Vendors: Hardware, Storage, Network, Software
- Business: Mobile, Social, Customer Insights
- Technology: Open Source Technology, Cloud Computing


The Elephant in the Room


Hadoop

- Hadoop is an open source “Big Data” framework
- Distributed storage (HDFS) and processing (MapReduce)
- Reliable, fault tolerant
- Horizontal scalability from a single node to thousands of cluster nodes
- Cost: $2,500/TB vs. $250,000/TB in data warehouses


MapReduce

Programming Model/Framework for processing large Data Sets
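
The programming model can be illustrated with the classic word count. The following is a single-process sketch in plain Python, not the Hadoop API: the function names `map_phase`, `shuffle_phase`, and `reduce_phase` are made up for this example, and a real cluster would run the map and reduce steps in parallel on many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of text records."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big business"]))
```

Because each map call sees only one record and each reduce call sees only one key, both phases can be distributed without shared state, which is what makes the model scale horizontally.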


NoSQL Databases

Traditional RDBMS outdated for modern paradigms

- Big Data
- Connectivity
- Concurrency
- Diversity
- Cloud


The difference – SQL / Tables


The NoSQL difference

Document-based

{ _id: ObjectId("2341"), type: "Article", author: "Chris Boos", title: "Introduction AutoPilot", date: ISODate("2015-04-21T13:21:12.343Z") },
{ _id: ObjectId("2342"), type: "Book", author: "Roland Judas", title: "Big Data", isbn: "978-0-213434235-5-7" }

Key-Value

"User1", "Roland Judas"
"User2", "Chris Boos"
"User3", "Charly Brown"

Graph-Based

Columns
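
The four NoSQL data models named on this slide can be mimicked with plain Python structures. This is only a conceptual sketch that reuses the sample values from the slide; it is not how any particular database stores data, and the `wrote` edge label and the `titles_by_author` helper are made up for illustration.

```python
# Document-based: self-describing records; fields may differ between records.
documents = [
    {"_id": "2341", "type": "Article", "author": "Chris Boos", "title": "Introduction AutoPilot"},
    {"_id": "2342", "type": "Book", "author": "Roland Judas", "title": "Big Data"},
]

# Key-value: an opaque value that is looked up by its key only.
key_value = {"User1": "Roland Judas", "User2": "Chris Boos", "User3": "Charly Brown"}

# Graph-based: nodes connected by labelled edges (the "wrote" label is hypothetical).
edges = [
    ("Roland Judas", "wrote", "Big Data"),
    ("Chris Boos", "wrote", "Introduction AutoPilot"),
]

# Column-oriented: values are stored per column instead of per row.
columns = {"author": ["Chris Boos", "Roland Judas"], "type": ["Article", "Book"]}


def titles_by_author(docs, author):
    """Scan the documents by field; records lacking the field are simply skipped."""
    return [doc["title"] for doc in docs if doc.get("author") == author]


if __name__ == "__main__":
    print(titles_by_author(documents, "Roland Judas"))
    print(key_value["User2"])
```

Note that the second document has no `date` field and the first has no `isbn`: nothing enforces a schema, so the application code has to know which fields to expect.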


Pros/Cons Hadoop / NoSQL

Pro

- Highly flexible, agile, available, performant
- Scalable
- Modern, open technology with commercial support
- Support for very large datasets on commodity hardware

Cons

- Immature
- No standardization
- Schema-free means the application needs to know how to retrieve data


Even more tools

Search/Index

Business Intelligence

Analytical Programming

Visualisation


Machine Learning


Big Data – Big Business?


Big Data Market

Big Data market projected for 2015: $125bn* (for comparison, public cloud: $95bn**)

Big Funding

- Cloudera: $1.2bn
- MongoDB: $300m
- HortonWorks: $250m
- DataStax: $190m
- BIRST: $130m

* According to Forbes.co / 2014/12/11 / 6 Predictions for Big Data / IDC Research
** According to Forrester Research


Shares of Big Data Market

- Hardware ≈ 40%
- Services ≈ 40%
- Software ≈ 20%


Vendors love Big Data


Vendors REALLY love Big Data!

Latest in Corporate Tech: In-Memory

Oracle Exalytics

SAP HANA

“Has SAP Bet The House With The Biggest Update to its ERP in Two Decades?”

http://www.forbes.com/sites/greatspeculations/2015/03/04/has-sap-bet-the-house-with-the-biggest-update-to-its-erp-in-two-decades/


Even more Sales!!!


Best Practices DWH / BI / Big Data

- Analyze problem / data / quality
- Data cleaning
- Data quality initiatives
- Sync Business / IT
- Buy stuff
- Implement stuff
- Train users
- Use governance / strategic approaches


And the success?

- Through 2017, 60% of big data projects will fail to go beyond piloting and experimentation and will be abandoned.
- Through 2017, fewer than half of lagging organizations will have made cultural or business model adjustments sufficient to benefit from big data.
- Through 2018, 90% of deployed data lakes will be useless as they are overwhelmed with information assets captured for uncertain use cases.

Source: Gartner, “Predicts 2015: Big Data Challenges Move From Technology to the Organization”


Challenges

- Usage Scenarios: Goals
- Skills: Missing data scientists; need to understand the math
- Technical: Data integration
- Privacy: Main discussion in Germany


Syncing

What‘s your opinion?

Do you have experience with big vendors’ offerings?


What‘s it all about?


What‘s it all about?

Data contains information of great business value

If you can extract those insights you can make far better decisions

Ultimately - Predicting the future


Common Use Cases

Customer Insights

Market Basket/Pricing optimization

Fraud Detection / Security Analytics

(Proactive) Monitoring

Sensor Data (IoT)

Data Warehouse Optimization


Understanding is important

[Diagram: axes Connectedness and Understanding; the path runs from Data to Information to Knowledge to Intelligence/Wisdom, with the steps labelled understanding relations, understanding patterns, understanding principles]


How do we get there?


Syncing

Has anyone heard of the “Semantic Web” or “Ontology”?

Does anyone have experience with, or projects around, ontologies?


Mapping the territory

- Enterprise Architecture (traditional): “holistic” approach; many “best practices” and patterns
- Big Data Discovery: kind of self-service for Big Data; next big thing?
- Semantic Layer: should exist from the BI implementation (proprietary), or use the modern approach “Linked Data”


Data + Semantic = Knowledge


Key is getting machine readable Data

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:admin="http://webns.net/mvcb/">
  <foaf:PersonalProfileDocument rdf:about="">
    <foaf:maker rdf:resource="#me"/>
    <foaf:primaryTopic rdf:resource="#me"/>
  </foaf:PersonalProfileDocument>
  <foaf:Person rdf:ID="me">
    <foaf:name>Roland Judas</foaf:name>
    <foaf:title>Mr.</foaf:title>
    <foaf:givenname>Roland</foaf:givenname>
    <foaf:family_name>Judas</foaf:family_name>
    <foaf:homepage rdf:resource="http://about.me/rjudas"/>
    <foaf:workplaceHomepage rdf:resource="http://arago.co"/>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Chris Boos</foaf:name>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>
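
"Machine readable" means a program can pull facts out of such a document without guessing. Below is a minimal sketch that reads a shortened copy of the FOAF profile with Python's standard `xml.etree.ElementTree`. It only handles this particular nesting of `foaf:Person` and `foaf:knows`; a dedicated RDF parser would be the more robust choice for arbitrary RDF/XML. The names `FOAF_XML` and `knows_pairs` are made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the lookups below.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# Shortened copy of the FOAF profile from the slide.
FOAF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:ID="me">
    <foaf:name>Roland Judas</foaf:name>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Chris Boos</foaf:name>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>"""


def knows_pairs(xml_text):
    """Return (person, acquaintance) name pairs found in a FOAF document."""
    root = ET.fromstring(xml_text)
    pairs = []
    for person in root.findall("foaf:Person", NS):
        name = person.findtext("foaf:name", namespaces=NS)
        for friend in person.findall("foaf:knows/foaf:Person", NS):
            pairs.append((name, friend.findtext("foaf:name", namespaces=NS)))
    return pairs


if __name__ == "__main__":
    for person, friend in knows_pairs(FOAF_XML):
        print(f"{person} knows {friend}")
```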


Ontologies

“A Data Model that represents Knowledge as a set of concepts within a domain and the relationships between these concepts”

- FOAF
- Schema.org
- DBPedia Ontology
- Good Relations

http://www.w3.org/wiki/Good_Ontologies


Triples

Representation of facts

Subject                  Predicate         Object
Roland                   is a (has type)   Person
http://about.me/rjudas   rdf:type          foaf:Person
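
A triple maps naturally onto a three-field record, and a set of triples can be queried by pattern. The sketch below assumes a tiny in-memory list; the `Triple` class and `match` function are illustrative names and not part of any RDF library, and the second triple is taken from the FOAF profile shown earlier in the deck.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One fact: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


TRIPLES = [
    Triple("http://about.me/rjudas", "rdf:type", "foaf:Person"),
    Triple("http://about.me/rjudas", "foaf:name", "Roland Judas"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


if __name__ == "__main__":
    # "What type is this resource?"
    for triple in match(TRIPLES, subject="http://about.me/rjudas", predicate="rdf:type"):
        print(triple.obj)
```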


From Triples to Graphs

[Diagram: a graph whose vertices (nodes) are Roland, Person, DLD, and Songs, connected by the edges “is a”, “likes”, and “plays”]
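
The step from triples to a graph can be sketched as follows: every subject and object becomes a vertex, and every predicate becomes a labelled edge. The triples are read off the slide's diagram, and the exact pairing of edges to vertices is an assumption; `build_graph` and `neighbours` are illustrative names.

```python
from collections import defaultdict

# Triples read off the diagram; which edge connects which vertices is assumed.
TRIPLES = [
    ("Roland", "is a", "Person"),
    ("Roland", "likes", "DLD"),
    ("Roland", "plays", "Songs"),
]


def build_graph(triples):
    """Turn (subject, predicate, object) triples into an adjacency list of labelled edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
        graph.setdefault(obj, [])  # objects are vertices too, even without outgoing edges
    return dict(graph)


def neighbours(graph, vertex, label=None):
    """Vertices one edge away from `vertex`, optionally filtered by edge label."""
    return [target for edge, target in graph.get(vertex, []) if label is None or edge == label]


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    print(sorted(graph))
    print(neighbours(graph, "Roland", label="likes"))
```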


Famous Examples


A pragmatic Approach

From the Basement


Bringing Pieces together

Semantic Graphs

Big Data

APIs


http://github.com/arago/ogit


Semantic Data Platform


Visualization


Use Cases from/beyond the IT Department

- Ticket Statistics
- Provider Management
- Network Planning
- Comparing Architectures
- Forecasting Technological Trends
- Data Center Planning
- Application Migration
- Technical Analysis for Business Processes
- IT Organisation Insights
- User Ranking


The right Mindset

- Semantics
- Graphs
- APIs
- “New” Big Data Tools


www.autopilot.co www.graphit.co www.tabtab.co


Roland Judas

Frankfurt, Germany

Technical Evangelist, Product Manager at arago

Organizer Webmontag Frankfurt, Cloudcamp Frankfurt

Mail: [email protected]

Twitter: @rjudas (en), @rolandjudas (de)

http://about.me/rjudas


Image References and Licenses

Facebook Datacenter https://www.flickr.com/photos/intelfreepress/ License CC BY 2.0

Winery https://www.flickr.com/photos/joceykinghorn/ License CC BY-SA 2.0

BI Dashboard https://www.flickr.com/photos/ctsi-global/ License CC BY-SA 2.0

Dollars https://www.flickr.com/photos/amagill/ License CC BY 2.0

Old Timer Truck: https://www.flickr.com/photos/ell-r-brown/ License CC BY 2.0

SQL Designer https://www.flickr.com/photos/ejk/ License CC BY-SA 2.0

Crystal Ball https://www.flickr.com/photos/frogman2212/ License CC BY 2.0

MapReduce https://www.flickr.com/photos/lkaestner/ License CC BY-SA 2.0

Foaf https://www.flickr.com/photos/dullhunk/ License CC BY 2.0

Linked Open Data Richard Cyganiak and Anja Jentzsch License CC BY-SA 3.0

Rear-View Mirror https://www.flickr.com/photos/labyrinthx-2/ License CC BY-SA 2.0

Servers-8055_13.jpg https://commons.wikimedia.org/wiki/User:Victorgrigas License CC BY-SA 3.0

Watson https://commons.wikimedia.org/wiki/User:Clockready License CC BY-SA 3.0

Wolfram Alpha https://www.flickr.com/photos/morville/ License CC BY 2.0

Social_Network_Visualization MartinGrandjean http://www.martingrandjean.ch/wp-content/
