Big Data Workshop - DLD Summer 15
Understanding Big Data
And getting the right mindset
21/06/15, DLD Summer 15, @rjudas
Agenda
- Syncing
- Defining Big Data
- Hype or Evolution?
- Tech Drivers
- Big Data – Big Business?
- What's it all about?
- How do we get there?
Syncing
Please tell us your opinion about Big Data
Please tell us about your Big Data projects
Definition(s)
“Big Data describes datasets so large they become very difficult to manage with traditional database tools.”
“Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures.”
"Very pragmatically, it's about building net-new analytic applications based on new types of data that (an organization) wasn't previously tracking."
The 3 V's
- Variety: tables, images, videos, XML, logs
- Velocity: batch, streams, real-time
- Volume: lots of xBytes
Variety
- Mix of data types: BLOBs and CLOBs
- Images, audio, videos, log files
- Semi-structured and unstructured data: email, EDI messages, transaction logs, sensor data
Velocity
- Crucial: speed of the “feedback loop”
- Streaming data
- Complex event processing
- From batch to (near) real-time
- Different data lifetimes
Volume - Big?
KiloByte MegaByte GigaByte TeraByte PetaByte ExaByte ZettaByte YottaByte
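The ladder above uses decimal (SI) prefixes, where each step is a factor of 1,000. A small sketch that turns the ladder into byte counts and picks a readable unit for a raw number; the function names are illustrative, and binary units (KiB, MiB, ..., factors of 1,024) are deliberately left out.

```python
# Decimal (SI) byte units from the slide; each step is a factor of 1000.
# Binary units (KiB, MiB, ...) use factors of 1024 and are not modelled here.
UNITS = ["KiloByte", "MegaByte", "GigaByte", "TeraByte",
         "PetaByte", "ExaByte", "ZettaByte", "YottaByte"]


def unit_sizes() -> dict[str, int]:
    """Map each unit name to its size in bytes (10**3, 10**6, ...)."""
    return {name: 10 ** (3 * (i + 1)) for i, name in enumerate(UNITS)}


def humanize(n_bytes: float) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    label, size = "Byte", 1
    for name, unit_size in unit_sizes().items():
        if n_bytes >= unit_size:
            label, size = name, unit_size
    return f"{n_bytes / size:g} {label}s"


if __name__ == "__main__":
    for name, size in unit_sizes().items():
        print(f"1 {name} = 10^{len(str(size)) - 1} bytes")
    print(humanize(4.4e21))  # 4.4 ZettaBytes
```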
Figures
“Digital Universe” according to the EMC/IDC study (2014): 4.4 zettabytes in 2013, 44 zettabytes projected for 2020
All human speech ever spoken: 42 zettabytes (digitized at 16 kHz, 16-bit)
2013: speculation put the NSA data center at 1 YB; realistic estimates were 3-12 EB
CERN/LHC data center passes 100 PB
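Two of these figures can be sanity-checked with a little arithmetic. The sketch below, assuming decimal zettabytes (10^21 bytes) and mono audio at 2 bytes per 16-bit sample, computes the annual growth rate implied by the Digital Universe numbers and how much speech 42 ZB would correspond to at that sampling rate. The helper name `cagr` is illustrative.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by growing from start to end."""
    return (end / start) ** (1 / years) - 1


ZB = 10 ** 21  # one zettabyte in bytes (decimal prefix)

# Digital Universe figures from the slide: 4.4 ZB (2013) -> 44 ZB (2020).
growth = cagr(4.4 * ZB, 44 * ZB, 2020 - 2013)
print(f"Implied growth: {growth:.1%} per year")

# Speech figure from the slide: 42 ZB at 16 kHz, 16-bit (assumed mono, 2 bytes per sample).
bytes_per_second = 16_000 * 2
seconds = 42 * ZB / bytes_per_second
years = seconds / (365.25 * 24 * 3600)
print(f"42 ZB of 16 kHz/16-bit audio corresponds to about {years:.2e} years of speech")
```

A tenfold increase over seven years works out to roughly 39% growth per year.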
Volume – Most famous quote
2.5 exabytes of data created each day (2,500,000,000,000,000,000 bytes), ≈ 1 ZB per year
(with 90% of the world's data created in the last two years)
Source: IBM CMO Study 2011
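A quick check of the daily-to-yearly conversion, assuming a constant daily rate over a 365-day year and decimal prefixes: 2.5 EB × 365 = 912.5 EB, i.e. just under 1 ZB.

```python
EB = 10 ** 18  # exabyte in bytes (decimal prefix)
ZB = 10 ** 21  # zettabyte in bytes (decimal prefix)

daily_bytes = 5 * EB // 2          # 2.5 EB per day, kept as an exact integer
yearly_bytes = daily_bytes * 365   # assumes a 365-day year and a constant daily rate

print(f"{daily_bytes:,} bytes per day")        # 2,500,000,000,000,000,000 bytes per day
print(f"{yearly_bytes / ZB:.2f} ZB per year")  # 0.91 ZB per year
```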
Even more V's
- Veracity: uncertainty of data, trustworthiness, accountability
- Value: it is only Big Data if it generates value
- Visibility: security; stitching together data from various sources
- Validity: logical inference; correlation vs. causation
Old wine? OLTP, OLAP, Data Warehouse
- Around since the 1970s
- ACID (Atomicity, Consistency, Isolation, Durability)
- Based on SQL
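To make the ACID properties concrete, here is a minimal sketch of atomicity using Python's built-in `sqlite3` module (an assumption for illustration; any transactional SQL database behaves similarly). A transfer touches two rows; if the second update violates a constraint, the whole transaction is rolled back. The `account` table and `transfer` function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts inside one transaction (all or nothing)."""
    # As a context manager the connection commits on success
    # and rolls back if an exception escapes the block.
    with conn:
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))


transfer(conn, "alice", "bob", 30)       # succeeds: alice 70, bob 80
try:
    transfer(conn, "bob", "alice", 500)  # bob would go negative: CHECK fails, alice's credit is undone too
except sqlite3.IntegrityError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))
print(balances)  # {'alice': 70, 'bob': 80}
```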
Big Data 15 years ago
[Diagram: OLTP systems (orders, articles, receiving) feed a data warehouse (orders, articles, receiving, etc.), which serves decision support systems (OLAP)]
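The classic flow from operational (OLTP) tables to a warehouse that answers aggregate (OLAP) questions can be sketched in a few lines of SQL. This example uses an in-memory SQLite database with a hypothetical `orders` table and a hypothetical `fact_sales` summary table; a real warehouse would add dimension tables, history and scheduled ETL jobs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# OLTP side: one row per business event (hypothetical sample orders).
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, article TEXT, day TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO orders (article, day, qty) VALUES (?, ?, ?)",
    [("book", "2015-05-03", 2), ("book", "2015-06-11", 1),
     ("lamp", "2015-05-20", 4), ("lamp", "2015-05-21", 1)],
)

# ETL step (sketch): load a pre-aggregated fact table into the "warehouse".
conn.execute("""
    CREATE TABLE fact_sales AS
    SELECT article, substr(day, 1, 7) AS month, SUM(qty) AS qty
    FROM orders
    GROUP BY article, month
""")

# OLAP-style question for decision support: units sold per article per month.
rows = conn.execute("SELECT article, month, qty FROM fact_sales ORDER BY article, month").fetchall()
for row in rows:
    print(row)
```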
Enter Big Data
http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
http://www.gartner.com/newsroom/id/1731916
http://chucksblog.emc.com/chucks_blog/2011/06/2011-idc-digital-universe-study-big-data-is-here-now-what.html
“New” Big Data
- New paradigm: BASE (Basically Available, Soft state, Eventual consistency)
- New data model: data lifecycle and variability; data linking and referential integrity
- New analytics: real-time/streaming analysis; interactive machine learning
- New infrastructure and tools: high-performance computing, storage, network; multi-provider services integration; new data-centric service models and security models
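BASE trades the immediate consistency of ACID for availability: replicas accept writes independently and converge later. A minimal sketch of that idea, assuming a last-writer-wins rule based on write timestamps (one common conflict-resolution strategy; real systems may instead use vector clocks or application-level merges). The `Replica` class and `anti_entropy` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a key-value store; each value carries a write timestamp."""
    name: str
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def write(self, key, value, ts):
        self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a: Replica, b: Replica) -> None:
    """Exchange state so both replicas keep the newest write per key (last-writer-wins)."""
    for key in set(a.data) | set(b.data):
        newest = max(e for e in (a.data.get(key), b.data.get(key)) if e is not None)
        a.data[key] = b.data[key] = newest


r1, r2 = Replica("r1"), Replica("r2")
r1.write("cart:42", ["book"], ts=1)  # accepted immediately on r1 (basically available)
print(r2.read("cart:42"))            # None: r2 has not seen the write yet (soft state)
anti_entropy(r1, r2)                 # background synchronisation
print(r2.read("cart:42"))            # ['book']: the replicas have converged (eventual consistency)
```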
[Figure: Big Data landscape map. Recoverable category labels: Hadoop on premise, cluster management/monitoring, NoSQL, NewSQL databases, MPP databases, graph databases, crowdsourcing, transformation, security, storage, app development, cross-infrastructure/cloud services, analytics platforms, BI platforms, tools for business analysts, data science platforms, data visualization, unstructured data, AI, social analytics, analytic services, machine learning, location/people/events, search, statistical computing, log analytics, crowdsourced analytics, real-time, SMB, frameworks, query/data access, collaboration, workflow, statistical tools, ML, data sources, sensor data, data markets, incubators, cloud deployment, government/regulation, education/learning, health, finance, human capital, legal, marketing, publisher tools, ad optimization.]
Big Data
Hype AND Evolution
- Some vendors use it to re-market “old” stuff
- Many “new” products/services
Drivers
- Vendors: hardware, storage, network, software
- Business: mobile, social, customer insights
- Technology: open-source technology, cloud computing
Hadoop
- Hadoop is an open-source “Big Data” framework
- Distributed storage (HDFS) and processing (MapReduce)
- Reliable, fault-tolerant
- Horizontal scalability from a single node to thousands of cluster nodes
- Cost: $2,500/TB vs. $250,000/TB in data warehouses
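How HDFS spreads a file across a cluster can be estimated with simple arithmetic. A minimal sketch, assuming a block size of 128 MB (128 × 1024² bytes) and a replication factor of 3, which are the usual Hadoop 2.x defaults for `dfs.blocksize` and `dfs.replication`; both are configurable, and neither figure comes from the slide. The function name is hypothetical.

```python
BLOCK_SIZE = 128 * 1024 ** 2  # assumed HDFS block size (128 MB)
REPLICATION = 3               # assumed replication factor


def hdfs_footprint(file_bytes: int, block_size: int = BLOCK_SIZE, replication: int = REPLICATION):
    """Return (blocks, block replicas, raw bytes on disk) for one file."""
    blocks = -(-file_bytes // block_size)  # integer ceiling division
    # The last block only occupies the bytes it actually holds, so raw usage
    # is file size times replication, not blocks * block_size * replication.
    raw_bytes = file_bytes * replication
    return blocks, blocks * replication, raw_bytes


blocks, replicas, raw = hdfs_footprint(1024 ** 4)  # a 1 TiB file
print(blocks, replicas, raw / 1024 ** 4)           # 8192 24576 3.0
```

Replication is what makes the cluster fault-tolerant on commodity hardware: losing one node still leaves two copies of every block it held, at the price of roughly three times the raw storage.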
MapReduce
Programming model and framework for processing large data sets
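The canonical MapReduce example is a word count. The single-process sketch below mirrors the three phases (map, shuffle, reduce) in plain Python; it is not Hadoop's Java API, only an illustration of the programming model, and the function names are hypothetical. In Hadoop the map and reduce calls would run in parallel on many nodes and the framework would perform the shuffle.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str):
    """Map: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in sorted(shuffle(mapped).items()))


print(word_count(["big data is big", "data moves fast"]))
# {'big': 2, 'data': 2, 'fast': 1, 'is': 1, 'moves': 1}
```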
NoSQL Databases
Traditional RDBMSs are outdated for modern paradigms:
- Big Data
- Connectivity
- Concurrency
- Diversity
- Cloud
The NoSQL difference
Document-based:
{ _id: ObjectId("2341"), type: "Article", author: "Chris Boos", title: "Introduction AutoPilot", date: ISODate("2015-04-21T13:21:12.343Z") },
{ _id: ObjectId("2342"), type: "Book", author: "Roland Judas", title: "Big Data", isbn: "978-0-213434235-5-7" }
Key-value:
"User1": "Roland Judas"
"User2": "Chris Boos"
"User3": "Charly Brown"
Graph-based
Columns
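The key-value and document models can be mimicked with plain Python data structures to show the difference in access patterns. A sketch using the slide's sample data, with MongoDB shell types such as `ObjectId` and `ISODate` replaced by plain strings (an assumption for portability); `find` is a hypothetical query-by-example helper, not a database API.

```python
# Key-value model: opaque values, looked up by key only.
kv_store = {"User1": "Roland Judas", "User2": "Chris Boos", "User3": "Charly Brown"}

# Document model: self-describing records whose fields may differ per document.
documents = [
    {"_id": "2341", "type": "Article", "author": "Chris Boos",
     "title": "Introduction AutoPilot", "date": "2015-04-21T13:21:12.343Z"},
    {"_id": "2342", "type": "Book", "author": "Roland Judas",
     "title": "Big Data", "isbn": "978-0-213434235-5-7"},
]


def find(docs, **criteria):
    """Return documents whose fields match all given criteria (query by example)."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


print(kv_store["User2"])  # Chris Boos
for doc in find(documents, author="Roland Judas"):
    # No schema guarantees "date" or "isbn": the application must handle absent fields.
    print(doc["title"], doc.get("date", "no date"), doc.get("isbn", "no ISBN"))
```

The `.get(..., default)` calls are the code-level face of “schema-free”: the responsibility for knowing which fields exist moves from the database into the application.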
Pros/Cons Hadoop / NoSQL
Pro
- Highly flexible, agile, available, performant
- Scalable
- Modern, open technology with commercial support
- Support for very large datasets on commodity hardware
Cons
- Immature
- No standardization
- Schema-free means the application needs to know how to retrieve data
Even more tools
Search/Index
Business Intelligence
Analytical Programming
Visualisation
Big Data Market
Big Data market projected for 2015: $125bn* (for comparison, public cloud: $95bn**)
Big funding:
- Cloudera: $1.2bn
- MongoDB: $300m
- Hortonworks: $250m
- DataStax: $190m
- Birst: $130m
* According to Forbes.com, 2014/12/11, “6 Predictions for Big Data” (IDC research)
** According to Forrester Research
Shares of Big Data Market
Hardware: ≈ 40%
Services: ≈ 40%
Software: ≈ 20%
Vendors REALLY love Big Data!
Latest in Corporate Tech: In-Memory
Oracle Exalytics
SAP HANA
“Has SAP Bet The House With The Biggest Update to its ERP in Two Decades?”
http://www.forbes.com/sites/greatspeculations/2015/03/04/has-sap-bet-the-house-with-the-biggest-update-to-its-erp-in-two-decades/
Best Practices DWH / BI / Big Data
- Analyze problem / data / quality
- Data cleaning
- Data quality initiatives
- Sync business / IT
- Buy stuff
- Implement stuff
- Train users
- Use governance / strategic approaches
And the success?
Through 2017, 60% of big data projects will fail to go beyond piloting and experimentation and will be abandoned.
Through 2017, fewer than half of lagging organizations will have made cultural or business model adjustments sufficient to benefit from big data.
Through 2018, 90% of deployed data lakes will be useless as they are overwhelmed with information assets captured for uncertain use cases.
Source: Gartner, “Predicts 2015: Big Data Challenges Move From Technology to the Organization”
Challenges
- Usage scenarios: goals
- Skills: missing data scientists; need to understand the math
- Technical: data integration
- Privacy: main discussion in Germany
Syncing
What's your opinion?
Do you have experience with the big vendors' offerings?
What's it all about?
Data contains information of great business value
If you can extract those insights, you can make far better decisions
Ultimately: predicting the future
Common Use Cases
Customer Insights
Market Basket/Pricing optimization
Fraud Detection / Security Analytics
(Proactive) Monitoring
Sensor Data (IoT)
Data Warehouse Optimization
Understanding is important
[Figure: data → information → knowledge → intelligence/wisdom, plotted against understanding and connectedness; each step up requires understanding relations, then patterns, then principles]
Syncing
Has anyone heard about the “Semantic Web” or “ontologies”?
Does anyone have experience with, or projects around, ontologies?
Mapping the territory
- Enterprise architecture (traditional): “holistic” approach; many “best practices” and patterns
- Big Data discovery: a kind of self-service for Big Data; the next big thing?
- Semantic layer: should already exist from the BI implementation (proprietary), or use the modern “Linked Data” approach
Key is getting machine-readable data
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:admin="http://webns.net/mvcb/">
  <foaf:PersonalProfileDocument rdf:about="">
    <foaf:maker rdf:resource="#me"/>
    <foaf:primaryTopic rdf:resource="#me"/>
  </foaf:PersonalProfileDocument>
  <foaf:Person rdf:ID="me">
    <foaf:name>Roland Judas</foaf:name>
    <foaf:title>Mr.</foaf:title>
    <foaf:givenname>Roland</foaf:givenname>
    <foaf:family_name>Judas</foaf:family_name>
    <foaf:homepage rdf:resource="http://about.me/rjudas"/>
    <foaf:workplaceHomepage rdf:resource="http://arago.co"/>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Chris Boos</foaf:name>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>
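The profile above is RDF/XML using the FOAF vocabulary. Because it is plain XML, a machine can read it with any XML parser. A minimal sketch using Python's standard-library `xml.etree.ElementTree` on a trimmed copy of the profile; it only walks the XML tree and does not interpret RDF semantics (a dedicated RDF library would be the more robust choice), and `knows_edges` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
FOAF = "{http://xmlns.com/foaf/0.1/}"

# Trimmed copy of the FOAF profile above.
PROFILE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                     xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:ID="me">
    <foaf:name>Roland Judas</foaf:name>
    <foaf:homepage rdf:resource="http://about.me/rjudas"/>
    <foaf:knows><foaf:Person><foaf:name>Chris Boos</foaf:name></foaf:Person></foaf:knows>
  </foaf:Person>
</rdf:RDF>"""


def knows_edges(xml_text: str):
    """Return (person, acquaintance) name pairs found in foaf:knows elements."""
    root = ET.fromstring(xml_text)
    edges = []
    for person in root.iter(FOAF + "Person"):  # includes nested Person elements
        name = person.findtext(FOAF + "name")
        for friend in person.findall(f"{FOAF}knows/{FOAF}Person"):
            edges.append((name, friend.findtext(FOAF + "name")))
    return edges


root = ET.fromstring(PROFILE)
me = root.find(FOAF + "Person")
print(me.get(RDF + "ID"))                               # me
print(me.find(FOAF + "homepage").get(RDF + "resource"))  # http://about.me/rjudas
print(knows_edges(PROFILE))                             # [('Roland Judas', 'Chris Boos')]
```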
Ontologies
“A data model that represents knowledge as a set of concepts within a domain and the relationships between these concepts”
Examples: FOAF, Schema.org, DBpedia Ontology, GoodRelations
http://www.w3.org/wiki/Good_Ontologies
Triples
Representation of facts
Subject | Predicate | Object
Roland | Is a (has type) | Person
http://about.me/rjudas | rdf:type | foaf:Person
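A set of triples can be queried by pattern matching, where any position may be left open. A minimal in-memory sketch: the `foaf:name` and `foaf:knows` facts are taken from the FOAF profile earlier in this workshop, while the blank-node label `_:chris` and the `match` helper are hypothetical. SPARQL's basic graph patterns generalize the same idea.

```python
# Facts as (subject, predicate, object) triples.
triples = {
    ("http://about.me/rjudas", "rdf:type", "foaf:Person"),
    ("http://about.me/rjudas", "foaf:name", "Roland Judas"),
    ("http://about.me/rjudas", "foaf:knows", "_:chris"),
    ("_:chris", "foaf:name", "Chris Boos"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )


print(match(triples, p="rdf:type", o="foaf:Person"))
# [('http://about.me/rjudas', 'rdf:type', 'foaf:Person')]
print([o for _, _, o in match(triples, s="http://about.me/rjudas", p="foaf:name")])
# ['Roland Judas']
```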
From Triples to Graphs
[Figure: a graph whose vertices (nodes) Roland, Person, DLD and Songs are connected by edges labeled “is a”, “likes” and “plays”]
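Once facts are triples, turning them into a graph is a matter of indexing subjects as vertices and predicates as labeled edges. A minimal sketch with an adjacency list and a breadth-first traversal; the edge directions are assumed from the picture, and the extra edge ("Person", "subclass of", "Agent") is a hypothetical addition (FOAF does declare foaf:Person a subclass of foaf:Agent) included only to give a two-hop path.

```python
from collections import defaultdict, deque

triples = [
    ("Roland", "is a", "Person"),
    ("Roland", "likes", "DLD"),
    ("Roland", "plays", "Songs"),
    ("Person", "subclass of", "Agent"),  # hypothetical extra edge for illustration
]

graph = defaultdict(list)  # vertex -> list of (edge label, target vertex)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))


def neighbours(vertex, label=None):
    """Targets of outgoing edges, optionally filtered by edge label."""
    return [o for p, o in graph.get(vertex, []) if label is None or p == label]


def reachable(start):
    """Breadth-first search: every vertex reachable from start along edges."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in neighbours(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


print(neighbours("Roland", "likes"))  # ['DLD']
print(sorted(reachable("Roland")))    # ['Agent', 'DLD', 'Person', 'Songs']
```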
A pragmatic approach
From the basement
Bringing the pieces together
Semantic graphs, Big Data, APIs
Use Cases from/beyond the IT Department
- Ticket statistics
- Provider management
- Network planning
- Comparing architectures
- Forecasting technological trends
- Data center planning
- Application migration
- Technical analysis for business processes
- IT organisation insights
- User ranking
The right Mindset
Semantics, graphs, APIs, “new” Big Data tools
www.autopilot.co www.graphit.co www.tabtab.co
Roland Judas
Frankfurt, Germany
Technical Evangelist, Product Manager at arago
Organizer of Webmontag Frankfurt and Cloudcamp Frankfurt
Twitter: @rjudas (en), @rolandjudas (de)
http://about.me/rjudas
Image References and Licenses
Facebook Datacenter https://www.flickr.com/photos/intelfreepress/ License CC BY 2.0
Winery https://www.flickr.com/photos/joceykinghorn/ License CC BY-SA 2.0
BI Dashboard https://www.flickr.com/photos/ctsi-global/ License CC BY-SA 2.0
Dollars https://www.flickr.com/photos/amagill/ License CC BY 2.0
Old Timer Truck https://www.flickr.com/photos/ell-r-brown/ License CC BY 2.0
SQL Designer https://www.flickr.com/photos/ejk/ License CC BY-SA 2.0
Crystal Ball https://www.flickr.com/photos/frogman2212/ License CC BY 2.0
MapReduce https://www.flickr.com/photos/lkaestner/ License CC BY-SA 2.0
Foaf https://www.flickr.com/photos/dullhunk/ License CC BY 2.0
Linked Open Data Richard Cyganiak and Anja Jentzsch License CC BY-SA 3.0
Rear-View Mirror https://www.flickr.com/photos/labyrinthx-2/ License CC BY-SA 2.0
Servers-8055_13.jpg https://commons.wikimedia.org/wiki/User:Victorgrigas License CC BY-SA 3.0
Watson https://commons.wikimedia.org/wiki/User:Clockready License CC BY-SA 3.0
Wolfram Alpha https://www.flickr.com/photos/morville/ License CC BY 2.0
Social_Network_Visualization Martin Grandjean http://www.martingrandjean.ch/wp-content/