Why Big Data - The Data Rush
DESCRIPTION
A slide deck that I put together with my thoughts on the main economic drivers that have led to the new "data rush" and the commercialisation of grid computing, and some examples of the many diverse applications for Hadoop in the contemporary, third wave, knowledge economy.
TRANSCRIPT
Why Big Data
THE INFORMATION AGE
The so-called “economic third wave” has bankrupted or seriously damaged many blue chip organisations
Traditional manufacturing and retail are in rapid and heavy decline in Europe and the US
Technology, connectivity and access to information are restructuring our societies
Levels of political and social engagement have surged
Peer-to-peer lending platforms have revolutionized banking in many countries
NEW AGE NEW ORDER
Manufacturing is shifting from the mass-production model of the 20th century back to build-to-order production
High street stores are being used as showrooms while the actual sale is made online
Web based services are run with tiny profit margins on huge transaction volumes
Systems like Amazon Marketplace, Etsy and eBay are empowering small businesses, delivering globalised trade and driving socioeconomic change on a scale that has never been seen before
INNOVATION
Mass-production rarely benefits from innovation
Innovation drives change – a huge cost with little benefit for production-line driven economies
“Refinement of product” mentality
Knowledge services need to innovate to differentiate
Change in a virtual world can be cheap and yield huge rewards
“Reinvention of product” mentality
THE ROVER BICYCLE, 1885
A SHIFT IN DEMANDS
Shifting emphasis from mass-production to knowledge services and build-to-order production means shifting priorities
Innovation and change become more valued attributes than stability and reliability
LONG TAIL
Long-tail economics underpin the information age
[Figure: long-tail curve. Head: only the most popular / highest value (Walmart, Best Buy). Tail: everything else / lower value (Amazon, eBay, Netflix).]
BIG DATA VIZ LONG TAIL
Knowledge and information-driven services are following the “long-tail” paradigm in many ways, including processing huge amounts of low value data to yield profit
Google Now
Amazon recommendations
eBay search
Facebook Exchange
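As a rough illustration of the long-tail argument, the sketch below estimates what share of total demand sits outside the most popular items. It assumes demand falls off as 1/rank (a Zipf-like curve); that distribution is an assumption made for illustration, not something the slides state, and the names tail_share, catalogue_size and head_size are hypothetical.

```python
def tail_share(catalogue_size, head_size, exponent=1.0):
    """Share of total demand coming from items ranked below the head.

    Assumes (hypothetically) that demand for the item at a given rank is
    proportional to 1 / rank ** exponent.
    """
    weights = [1.0 / rank ** exponent for rank in range(1, catalogue_size + 1)]
    total = sum(weights)
    # Everything after the first head_size items counts as the tail.
    return sum(weights[head_size:]) / total


# A store limited to the top 1,000 items versus catalogues of growing size:
for size in (10_000, 100_000, 1_000_000):
    print(size, round(tail_share(size, head_size=1_000), 3))
```

Under this assumption the tail's share keeps growing as the catalogue grows, which is the sense in which processing large volumes of individually low-value items can still yield profit.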
BIG DATA VIZ INNOVATION
In a competitive, free market like the world-wide-web, innovation is valued because it can open up new opportunities
Consumer-grade access to grid computing technology is a recent innovation
Grid computing can open up new opportunities that would otherwise not be addressable
It is an excellent solution to the needs of ventures architected around the long-tail economic model
CURRENT TREND
Industrial economies and traditional production line manufacturing require stability, reliability and minimal change
Knowledge economies thrive on innovation, and process huge amounts of information
The US and Europe are transitioning from industrial to knowledge economies
Big Data concepts and technologies are a key enabler for the new economy
THE FUTURE - THINGTERNET
The internet of things is with us
Billions of connected devices, even e-tattoos
INTERNET OF THINGS AND BIG DATA
Billions of connected devices create a huge amount of data to process
Until grid computing, IoT was technically near impossible to implement
INTERNET OF THINGS IS A WILD WEST
The IoT poses many new, unsolved challenges
An internet alarm clock, monitoring how often you sleep late, could be accessed by HR for employee performance evaluations
But new challenges = new opportunities
CLASSIC BIG DATA APPLICATIONS
STORAGE
Hadoop can be used purely for online data storage, with no direct processing
Low cost per-GB for petascale online storage
The option of directly querying or analysing the data is available if required
PRODUCT SEARCH
A huge, constantly changing catalogue of products – like eBay and Amazon
Simple keyword search matching customer to product
SolrCloud – a full text search engine indexing and serving up terabytes of live content, running on Hadoop clusters
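The keyword matching described on this slide can be sketched with a toy in-memory inverted index. This is only an illustration of the idea, not how SolrCloud is implemented; the catalogue contents and the function names tokenize, build_index and search are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Lower-case a string and split it into alphanumeric keywords."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def build_index(catalogue):
    """Map each keyword to the set of product ids whose title contains it."""
    index = defaultdict(set)
    for product_id, title in catalogue.items():
        for token in tokenize(title):
            index[token].add(product_id)
    return index


def search(index, query):
    """Return ids of products whose title contains every query keyword."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical three-item catalogue, for illustration only.
catalogue = {
    1: "Red mountain bicycle",
    2: "Blue road bicycle",
    3: "Red wool scarf",
}
index = build_index(catalogue)
print(sorted(search(index, "red bicycle")))
```

A production search engine adds ranking, stemming and sharding across a cluster on top of this basic keyword-to-document mapping.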
BEHAVIOURAL TARGETING
Matching advertising content with users based on the user's demographic and interests – like Google AdWords
Behavioural Targeting can yield twice as many conversions (e.g. click-throughs) as untargeted advertising
Generates a huge amount of log data which is used for reporting and reprocessed for predictive analysis
Predictive analysis is compute intensive
TBs of data per day
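The reporting step over ad logs can be sketched in a map/reduce style, in plain Python rather than on a Hadoop cluster. The "campaign,event" log format and the sample lines are invented for illustration; they are not real measurements, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(log_lines):
    """Emit (campaign, (impressions, clicks)) pairs from raw log lines."""
    for line in log_lines:
        campaign, event = line.strip().split(",")
        if event == "impression":
            yield campaign, (1, 0)
        elif event == "click":
            yield campaign, (0, 1)


def reduce_phase(pairs):
    """Sum the counts per campaign and derive a click-through rate."""
    totals = defaultdict(lambda: [0, 0])
    for campaign, (impressions, clicks) in pairs:
        totals[campaign][0] += impressions
        totals[campaign][1] += clicks
    return {
        campaign: (imps, clicks, clicks / imps if imps else 0.0)
        for campaign, (imps, clicks) in totals.items()
    }


# Invented sample log: 4 impressions per campaign.
log_lines = (
    ["targeted,impression"] * 4 + ["targeted,click"] * 2
    + ["untargeted,impression"] * 4 + ["untargeted,click"] * 1
)
report = reduce_phase(map_phase(log_lines))
for campaign, (imps, clicks, ctr) in sorted(report.items()):
    print(campaign, imps, clicks, ctr)
```

On a real cluster the map and reduce phases would run in parallel across many nodes, which is what makes terabytes of daily log data tractable.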
PRODUCT RECOMMENDERS
Recommending products to the user based on their demographic and interests, other [similar] users' purchase history, and their current browsing pattern
Like Amazon and Zalando recommendations
A hybrid between Behavioural Ad Targeting and Product Search
Combines product catalogue, clickstream data and passive user profiling, possibly running live in-session
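One simple way to use other users' purchase history is item co-occurrence counting: recommend the items most often bought together with what the user already has. The sketch below is a minimal illustration under that assumption, not a description of how Amazon or Zalando actually work; the purchase histories and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(baskets):
    """Count how often each pair of items appears in the same purchase history."""
    counts = {}
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def recommend(counts, owned, top_n=3):
    """Rank items not yet owned by how often they co-occur with owned items."""
    scores = Counter()
    for item in owned:
        for other, n in counts.get(item, {}).items():
            if other not in owned:
                scores[other] += n
    # Sort by descending score, then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical purchase histories, for illustration only.
purchase_histories = [
    ["boots", "socks", "laces"],
    ["boots", "socks"],
    ["boots", "polish"],
    ["scarf", "socks"],
]
counts = co_occurrence(purchase_histories)
print(recommend(counts, {"boots"}))
```

A live recommender would blend scores like these with catalogue data, clickstream signals and the user's profile, as the slide describes.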
EMERGING BIG DATA APPLICATIONS
SELF SERVICE BIG DATA BUSINESS INTELLIGENCE
So-called “Enterprise Data Hub” paradigm
The fastest growing use case in 2014 on Yahoo's YGrid, a set of 16 clusters composed of 32,500 Hadoop nodes
Sales, accounting, executive and other business users run the data analysis jobs themselves on the available datasets using discovery tools like MicroStrategy, Tableau and Tibco Spotfire
DATA WAREHOUSING
Many migrations of classical Enterprise Data Warehousing applications to Hadoop
2-3x+ performance gains over Teradata on 3TB – 30TB workloads
Huge cost savings versus traditional enterprise technologies like Oracle and Teradata
Fraud detection – e.g. credit card, medical insurance, welfare
Credit risk appraisal – e.g. credit card application
Banking and Retail batch processes
OLTP DBMS
Many large scale OLTP DBMS implementations use HBase, Accumulo or other NoSQL grid databases
For low latency, high throughput, high concurrency, high volume
e.g. share dealing, real-time ad auctions
Volumes of 200bn transactions per day reliably served in real time
RESEARCH
Low cost solution for mapping the human genome
About 4TB of data per person
e.g. cancer research, personalised drugs etc.
DEVICE MANAGEMENT
Automated, managed service for analysis and response to threats detected by SPI module on remote switch
Central heating system management – shut down boiler when nobody is home to reduce heating bill and emissions – e.g. Nest
Monitor drivers' propensity to break the speed limit and apply lower insurance premiums to good drivers