big data, hadoop - lunchtime talk 2015.02.26
Big Data Consulting
Hadoop, big data
Robert Gibbon - www.bigindustries.be
The information age
■ The “economic third wave” has badly hit many blue chip organisations
■ Manufacturing and retail are in rapid decline in Europe and the US
■ Tech, connectivity and information are restructuring our societies
■ Levels of political and social engagement have surged
■ Trading platforms are empowering small businesses
Innovation
■ Mass-production hates innovation
■ Innovation means change – a huge cost with little benefit for production-line economies
■ Continuous improvement mentality
■ Knowledge services need to innovate to differentiate
■ Change in a virtual world can be cheap and yield huge rewards
■ Continuous reinvention mentality
The rover bicycle, 1885
Big data viz. innovation
■ In a free market like the web, innovation can open up new opportunities
■ Consumer access to grid computing tech is a recent innovation
■ Grid computing opens up new opportunities that would otherwise not be viable
■ Ideal for ventures architected around the long-tail economic model
The future - thingternet
■ The internet of things is with us
■ Billions of connected devices, even digital tattoos
Big data viz. internet of things
■ Billions of connected devices create a huge amount of data
■ Until big data tech, Internet of Things was nearly impossible to monetize
The internet of things is a wild west
■ Many new, unsolved challenges
  ■ Privacy
  ■ Governance
  ■ Civil liberties
■ New challenges = new opportunities
let's get back to hadoop
Hadoop refresher
■ FOSS solution for processing terabytes to petabytes of data
■ Using arrays of regular servers
■ Hadoop core:
  ■ HDFS - a scale-out file system
  ■ YARN - a scale-out application resource manager
■ Runtimes:
  ■ Spark, Impala, Flink, MapReduce, Kafka, SolrCloud etc.
■ Components for data protection, access control and operational management
■ NoSQL databases
  ■ HBase, Accumulo, Cassandra etc.
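A minimal sketch of the map / shuffle / reduce model behind the MapReduce runtime mentioned in the refresher. This is plain single-process Python, not Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration. On a real cluster the map and reduce steps would run in parallel across servers.

```python
from collections import defaultdict


def map_phase(line):
    # Emit one (word, 1) pair per word in the input line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    # Group all values by key, as the framework would do between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Collapse the grouped values for one key into a single count.
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

The point of the split is that each phase only needs local information, which is what lets the work spread over arrays of regular servers.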
what can you do with hadoop?
Storage
■ Pure online data storage, with no other processing
■ Low cost per GB for petascale online storage
■ Option to directly query and analyse the data is available if required
Search
■ Example: huge, constantly changing catalogue of products – like eBay and Amazon
■ SolrCloud – an advanced search engine serving terabytes of content from Hadoop
Messaging
■ A distributed message queue backed by a Hadoop cluster - Apache Kafka
■ Elastically scalable
■ Messages are persisted and replicated for durability
■ TBs of messages per broker with predictable performance
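A toy sketch of the idea behind "persisted and replicated": messages are appended to a partitioned log, and every append is copied to each replica. This is an in-memory illustration only, not Kafka's API or storage format; the class `TinyLog` and its parameters are hypothetical.

```python
import zlib


class TinyLog:
    """Toy partitioned, replicated append-only log (illustrative, not Kafka)."""

    def __init__(self, partitions=3, replicas=2):
        self.partitions = partitions
        # replicas[r][p] holds the messages of partition p on replica r.
        self.replicas = [[[] for _ in range(partitions)] for _ in range(replicas)]

    def append(self, key, value):
        # A stable hash keeps all messages for one key in the same partition.
        p = zlib.crc32(key.encode("utf-8")) % self.partitions
        for replica in self.replicas:
            replica[p].append((key, value))
        offset = len(self.replicas[0][p]) - 1
        return p, offset

    def read(self, partition, offset, replica=0):
        # Any replica can serve the read because each holds a full copy.
        return self.replicas[replica][partition][offset]


if __name__ == "__main__":
    log = TinyLog()
    partition, offset = log.append("user-1", "click")
    print(log.read(partition, offset, replica=1))
```

Losing one replica in this sketch loses no messages, which is the durability property the slide refers to.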
Targeting
■ Personalised content for users
■ Generates and consumes a huge amount of log data
  ■ for reporting
  ■ for predictive analysis
■ Predictive analysis is compute intensive
■ Can be TBs of data per day
Self-service Business Intelligence
■ Enterprise Data Hub paradigm
■ A very popular emerging use case
■ Business users directly access raw datasets using specialised discovery tools built on top of Hadoop - Datameer, Platfora and others
Data warehousing
■ Migration of Enterprise Data Warehouse to Hadoop
■ Big cost savings versus traditional vendors like Oracle and Teradata
Machine learning
■ Predictive analytics with Spark MLlib or Revolution R Enterprise
■ Automatically predict component failures for proactive intervention
Big Database
■ Low latency, high throughput, high concurrency, high volume
■ Algorithmic trading
■ Real-time ad auctions
■ Volumes of 200 billion transactions per day reliably served in real time
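To put the 200 billion transactions per day figure in perspective, here is a quick back-of-the-envelope conversion to an average per-second rate. It assumes traffic is spread evenly over 24 hours, which real workloads are not, so peak rates would be higher.

```python
TRANSACTIONS_PER_DAY = 200_000_000_000  # figure quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400


def average_rate_per_second(per_day):
    # Simple average; ignores daily peaks and troughs.
    return per_day / SECONDS_PER_DAY


rate = average_rate_per_second(TRANSACTIONS_PER_DAY)
print(f"{rate:,.0f} transactions per second on average")
```

That works out to roughly 2.3 million transactions per second on average.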
Device management
■ Analysis and response to threats detected by SPI module on remote switch
■ Automated systems management – shut down heating when nobody is home to reduce heating bill and emissions
■ Monitor driver propensity to break the speed limit - offer lower insurance premiums to good drivers
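The driver-monitoring idea could be sketched roughly as below: take telemetry samples of speed against the local limit, measure how often the driver was over, and offer a discount below some threshold. The function names, the 5% threshold and the 10% discount are all assumptions made up for this illustration, not figures from the talk.

```python
def speeding_ratio(samples):
    """Fraction of samples where speed exceeded the limit.

    samples: iterable of (speed_kmh, limit_kmh) tuples from the vehicle.
    """
    samples = list(samples)
    if not samples:
        return 0.0
    over = sum(1 for speed, limit in samples if speed > limit)
    return over / len(samples)


def premium_discount(ratio, threshold=0.05, discount=0.10):
    # Hypothetical rule: drivers at or below the threshold get the discount.
    return discount if ratio <= threshold else 0.0


if __name__ == "__main__":
    trip = [(48, 50), (52, 50), (49, 50), (50, 50)]
    ratio = speeding_ratio(trip)
    print(ratio, premium_discount(ratio))
```

At the scale of billions of devices the same calculation would run over a stream of samples per driver rather than a small list.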
hadoop - mature?
Choice of vendors
Solid operational management
Impala v Teradata
Free grid computing
Free scale-out database
Growing commercial ecosystem
Secure and available
■ RPC authentication and encryption with PKI
■ Data encryption at rest and in transit
■ Kerberos resource access control - HDFS, YARN
■ Table cell level permissions - Accumulo
■ Online snapshot backups
■ No SPoF
thanks for listening
be.linkedin.com/in/robertgibbon