heavy, messy, misleading: how big data is a human problem, not a tech one
Post on 02-Jul-2015
Francesco D’Orazio, @abc3d VP Product, PulsarPlatform.com
Heavy, Messy, Misleading: Why Big Data is a human problem, not a technology one
Every talk about Big Data should start with Twin Peaks
There’s more to big data than the technology behind it.
And the best way to find out what it is, is to start from the metaphors of Big Data:
stream of data
ocean of data
river of data
a data leak
data firehose
data flood
data tsunami
data “is” fluid
data “is” huge
data “is” powerful
data “is” unpredictable
data “is” uncontrollable
Data is the new oil(?!)
We are not going to war for it (yet)
Data is not a scarce resource
The abundance of data is the result of the instrumentation of the natural, industrial and social worlds
The Large Hadron Collider can record up to 40 million particle interactions per second
The Square Kilometre Array will collect data from deep space dating back more than 13 billion years
Wolfram Data Science on Facebook Data: how our topics of discussion change by age and gender
Carna Botnet: in just 60 seconds nearly 640 terabytes of IP data is transferred across the globe via the Internet
Machine Sensing
The sensors on the new Airbus A380 generate 10 terabytes of data every 30 minutes. That’s 120TB every LDN-NYC flight
And yet, another reason why data is not the new oil is that we are not actually using much of it…
99.5%: the percentage of newly created digital data that’s never analysed
But that’s not strictly true either…
0.5%: the percentage of newly created digital data that’s actually being used
A higher percentage of teenagers are having sex than the percentage of new data being analysed
Credit Scores have replaced the handshake with the bank manager
Fair and Isaac came along in 1956. Today they crunch around 10 billion scores each year
Buying advertising used to be about smiles and jokes over Martini lunches
Now it looks more like this…
11 seconds of trading in FB shares. Already in 2006, one third of all transactions in the EU and US were algorithmic
Walmart handles more than 1M customer transactions per hour, all affected by price elasticity
Price Discrimination based on log in info, browser history, device, A/B testing is common practice for most online retailers
75% of the content Netflix serves is chosen based on a Netflix recommendation
At Buzzfeed every item of content has its own dashboard showing how it spreads from ‘seed views’ to ‘social views’ and by what ‘viral lift’
Upworthy
Systematic experimentation: 15% of the top 10,000 websites use A/B testing
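The systematic experimentation the slide describes usually boils down to a significance test on two conversion rates. A minimal sketch, with hypothetical traffic numbers, using the standard pooled two-proportion z-test:

```python
import math

def ab_test_z(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test for an A/B experiment.
    |z| > 1.96 is roughly significant at the 5% level."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (conv_b / n_b - conv_a / n_a) / se

# Hypothetical 50/50 split: variant A converts 10%, variant B 11%
z = ab_test_z(1000, 10000, 1100, 10000)
print(round(z, 2))  # ≈ 2.31: B's lift is unlikely to be chance
```

With 10,000 visitors per arm, even a one-point lift clears the 1.96 threshold; at a few hundred visitors the same lift would be indistinguishable from noise, which is exactly why this kind of testing is a big-data practice.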
Crowdpac matches candidates and funders based on analysis of public speeches, contributions and other sources of public data on the candidate
The LAPD ran a pilot to predict where a crime will happen next (‘crime aftershocks’), based on 13 million crimes over 80 years
The Dubai police are equipping officers with face-recognition-enabled Google Glass to identify potential wanted criminals
LinkedIn had a student problem: so they re-arranged the data they already had for a student audience
99.5%: why, then, are we throwing away this much data?
We are still learning to recognise problems as data-problems
Big Data changes the very definition of how we produce knowledge
Less → More
Exact → Messy
Causation → Correlation
Significant correlation requires scale. And scale is hard to handle.
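The claim that significance requires scale can be made concrete. A minimal sketch with hypothetical numbers, using the standard t-test for a Pearson correlation: the t statistic grows with the square root of the sample size, so the same modest correlation is noise in a small sample and overwhelming evidence in a large one.

```python
import math

def corr_t(r, n):
    """t statistic for testing that a Pearson correlation r,
    measured on n samples, differs from zero (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# The same modest correlation, r = 0.3, at two sample sizes:
print(round(corr_t(0.3, 20), 2))    # ~1.33: below the ~2.1 cutoff, noise
print(round(corr_t(0.3, 2000), 2))  # ~14.06: overwhelmingly significant
```

This is the whole bargain of Big Data in one formula: patterns too weak to see in a survey become unmissable at scale, provided you can handle the scale at all.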
DNA research is a case in point: DNA data is hard to manipulate and there’s not enough sequenced DNA available to establish significant patterns
Big Data comes with Big Errors
Data is rarely normalised.
Data is siloed and not verifiable.
Big does not equal whole.
Big does not equal representative.
Data doesn’t speak for itself. We speak for it.
Big Data is still biased and the result of interpretation.
Correlation doesn’t imply causality.
Models are often too simple and not peer-reviewed.
Context is hard to interpret at scale. Traditional qual & quant have to work with big data.
Google Flu Trends: 3 billion queries/day; 50 million top keywords identified; matched against 5 years of data on flu spread. It overestimated flu prevalence by 50% and didn’t predict pandemics.
Big Data also means a big new digital divide.
Accessible doesn’t mean ethical.
The problems slowing down the adoption of Big Data are human problems
And that’s because the biggest innovation in Big Data is a human innovation
An innovation in decision-making: framing, solving and actioning a problem
“Data is just like crude. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc., to create a valuable entity that drives profitable activity; so must data be broken down, analyzed for it to have value”
Michael Palmer
The opportunity in Big Data is data middleware: turning crude into gas, plastic, chemicals
But until we invent the new plastic, the new gas, the new chemicals, we are stuck with the smokescreen. Or even the smoke monster.
Thank You