same problems, more zeroes: why the spreadsheet (and powerpivot) will dominate big data usage
DESCRIPTION
Presentation to NYC MSBIgData Group
TRANSCRIPT
Same Problems, More Zeroes: Why the Spreadsheet (and PowerPivot) Will Dominate Big Data Usage
Rob Collie
Me
13+ years at Microsoft in Redmond
Technical design, strategic direction, and project management
Office 97, Windows Installer (MSI) v1, Excel 2003, Excel 2007, Bing
Designed much of PowerPivot v1
CTO, Pivotstream.com
PowerPivotPro.com, PowerPivotFAQ.com
“Dominate” is a dramatic word
Back end storage and processing isn’t going anywhere (but it will change slightly)
Not a threat – an opportunity
Two Agendas
What the opportunity looks like & where Excel/PowerPivot fits
How Excel earned its stigmas and how PowerPivot dispels most of them
Will swap back and forth between them
Why Excel “Sucks”
Let’s be analytical! What are the precise problems?
Think of it as a BI environment…
1. 1M row capacity (and slow long before you max it out)
2. Only works on single flat tables – no dimensional modeling
3. Tempting place to perform “ETL” – and no autorefresh
Think of it as a programming environment…
1. Files are the source projects!
2. Files are the distribution method!
3. The runtime environment IS the development environment!
4. Self-obfuscating! “AvgSales?” You wish. You will call it H7 & like it!
5. No separation of presentation and logic. No “portability.”
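Items 4 and 5 of the programming-environment list can be illustrated with a small sketch. This is only an analogy written in Python, not PowerPivot syntax; the sales rows, the `cells` dictionary, and the `avg_sales` and `for_region` helpers are all hypothetical names invented for the example.

```python
from statistics import mean

# Hypothetical sales rows; in a worksheet these would sit in a cell range.
sales = [
    {"region": "Manhattan", "amount": 120.0},
    {"region": "Manhattan", "amount": 80.0},
    {"region": "Bronx", "amount": 50.0},
    {"region": "Bronx", "amount": 70.0},
]

# Worksheet style: the result is known only by its address, and the
# logic is welded to one specific range and one specific layout.
cells = {"H7": mean(row["amount"] for row in sales)}


# Measure style: the logic has a name and no fixed location, so the
# same definition can be reused on any slice of the data.
def avg_sales(rows):
    return mean(row["amount"] for row in rows)


def for_region(rows, region):
    return [row for row in rows if row["region"] == region]


overall = avg_sales(sales)
manhattan = avg_sales(for_region(sales, "Manhattan"))
bronx = avg_sales(for_region(sales, "Bronx"))
print(cells["H7"], overall, manhattan, bronx)
```

The named version is what "portability" is getting at here: one definition, many presentations, with nothing to rewrite when the layout changes.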
DEMO1: MILES AND MILES OF DATA
300 Million Rows in One Workbook
If Printed Out, Those 300M Rows Would Stretch 1,000 Miles!
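The 1,000-mile figure can be sanity-checked with simple arithmetic. The sketch below assumes each row prints at a height of 15 points, Excel's usual default; the slide does not state the row height, so that value is an assumption.

```python
# Assumption (not from the slide): each row prints at 15 points tall.
ROWS = 300_000_000
ROW_HEIGHT_POINTS = 15
POINTS_PER_INCH = 72
INCHES_PER_MILE = 63_360  # 5,280 feet x 12 inches

total_inches = ROWS * ROW_HEIGHT_POINTS / POINTS_PER_INCH
miles = total_inches / INCHES_PER_MILE
print(f"{miles:,.0f} miles")
```

Under that assumption the printout comes to a little under 1,000 miles, which is consistent with the slide's rounded claim.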
Want Billions of Rows? Import to Tabular BISM
Want Billions? Import to Tabular BISM
Import Results – Same Formulas and UX as PowerPivot, Just a Different Frame (VS vs. Excel)
Updating The Checklist – PowerPivot “Fixes” Excel
Let’s be analytical! What are the precise problems?
Think of it as a BI environment…
1. 1M row capacity (and slow long before you max it out)
2. Only works on single flat tables – no dimensional modeling
3. Tempting place to perform “ETL” – and no autorefresh
Think of it as a programming environment…
1. Files are the source projects!
2. Files are the distribution method!
3. The runtime environment IS the development environment!
4. Self-obfuscating! “AvgSales?” You wish. You will call it H7 & like it!
5. No separation of presentation and logic. No “portability.”
OPPORTUNITY
Trend #1: Data Explosion
Library of Congress:
530 miles of bookshelves
10 Terabytes (That’s it???)
[Chart: Worldwide Data Storage (EB) by year, 2006 to 2011; vertical axis from 0 to 2,000 EB]
Worldwide Data in Storage:
~180 Million TB in 2006
10x increase in 5 years!
~3 Libraries of Congress per US Household
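The two storage figures on this slide imply an annual growth rate, which can be worked out as a rough check. The sketch uses only the slide's numbers (about 180 million TB in 2006 and a 10x increase over 5 years) and assumes decimal units, where 1 EB equals 1,000,000 TB.

```python
TB_PER_EB = 1_000_000       # assumption: decimal units
start_tb = 180_000_000      # ~180 million TB in 2006 (from the slide)
growth_factor = 10          # 10x increase (from the slide)
years = 5

start_eb = start_tb / TB_PER_EB
end_eb = start_eb * growth_factor
# Compound annual growth rate implied by 10x over 5 years.
annual_growth = growth_factor ** (1 / years) - 1

print(f"{start_eb:,.0f} EB -> {end_eb:,.0f} EB")
print(f"implied growth: {annual_growth:.1%} per year")
```

That works out to roughly 180 EB growing to about 1,800 EB, which fits the chart's axis, at a compound rate of a bit under 60% per year.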
Trend #2: BI Spending ACCELERATES in Recessions
If Big Data is not accessible via the right tools, you might as well not even be storing it.
DEMO: WHAT NEW YORKERS DRINK
Demo Screenshot: Corona Dominates NYC Beer Sales
Note that this demo is running in my browser!
– No Excel or PowerPivot install required
– Even runs on Mac and iPad
Very “Fisher Price” UX, not scary like Excel – just a friendly website
But Stella Artois Rules Manhattan
Note that the report is sliced to Manhattan only, one click
Also note that the user cannot download the workbook, just interact with it – secure and controlled
VERY Different Bestseller list in the Bronx
“Cordina” brand holds spots 2, 3, 5, and 7
This report automatically refreshes itself with the latest data on a regular schedule – no human intervention required!
The Checklist
Let’s be analytical! What are the precise problems?
Think of it as a BI environment…
1. 1M row capacity (and slow long before you max it out)
2. Only works on single flat tables – no dimensional modeling
3. Tempting place to perform “ETL” – and no autorefresh
Think of it as a programming environment…
1. Files are the source projects, with no enforced “blessed” version
2. Files are the distribution method!
3. The runtime environment IS the development environment!
4. Self-obfuscating! “AvgSales?” You wish. You will call it H7 & like it!
5. No separation of presentation and logic. No “portability.”
Big Data is a Matter of Opinion
The v’s
<Went looking for supporting articles>
Confirmation!
Important Points/My Opinions
Decisionmakers don’t care how data is stored
Decisionmakers don’t care how big the data is
– Even 1,000 rows is bigger than they can digest
– Humans can digest one screen at most
– They need us to give them SMALL data
Decisionmakers don’t like to learn new tools
It is pointless and counterproductive to fight any of this
Opinion: At the place where it matters, there is no difference between Big Data and BI – it’s all Insight, consumed primarily by non-technical humans
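The "SMALL data" point amounts to reducing many detail rows to a short ranked list that fits on one screen, optionally sliced by one attribute. Here is a minimal sketch of that reduction in Python; the rows, the brand names, and the `top_brands` helper are hypothetical and are not taken from the demo's data.

```python
from collections import Counter

# Hypothetical detail rows: (borough, brand, units sold).
# A real source would hold millions of these.
rows = [
    ("Manhattan", "Brand A", 3),
    ("Manhattan", "Brand B", 5),
    ("Bronx", "Brand A", 4),
    ("Bronx", "Brand C", 6),
    ("Manhattan", "Brand A", 1),
    ("Bronx", "Brand C", 3),
]


def top_brands(rows, borough=None, n=3):
    """Total units per brand, optionally for one borough, largest first."""
    totals = Counter()
    for row_borough, brand, units in rows:
        if borough is None or row_borough == borough:
            totals[brand] += units
    return totals.most_common(n)


citywide = top_brands(rows)
manhattan = top_brands(rows, borough="Manhattan")
print(citywide)
print(manhattan)
```

However large the input, the decisionmaker only ever sees the few lines returned by `top_brands`, and changing the slice changes the ranking without exposing the underlying rows.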
But Decisionmakers are an Obstacle
Only they know what they know
Only they know what they need
– They don’t even know what they want til they see what they don’t!
They don’t know how to explain either of the above
They don’t understand your language at all – what’s easy, what’s difficult
They budget to spend about 10% of the time required with you
True Story: How a week became an hour
In 2006, I hired a top-notch BI pro for a project at MS
I was the domain expert (the “decisionmaker”) but knew nothing of the toolset.
He was the technical pro (the “doer”) and knew nothing of the domain.
Writing and debugging a single formula took a full week of iteration and communication.
In 2009 I revisited the same project
– But thanks to PowerPivot, this time I was both decisionmaker and doer
The same formula process now took LESS THAN ONE HOUR!
– This was true even though I had forgotten every last detail of the 2006 project
Why did a week become an hour? HOW???
Communication: The “Dark Matter” of BI Projects
Knowledge Worker
BI Pro
Internal Communication at “Broadband” Speed…
…but person to person communication at “2400 Baud Dialup” speed
Where the Time Gets Spent
Internal Communication at “Broadband” Speed…
Never budgeted or accounted or rewarded… so they don’t commit
Excel Pros – Data Pros’ New Allies
300M Users
Of which, 10% create PivotTables
30M Pros
- Every org has them
- ~7M Java Devs, 2M SQL Pros
- Each supports avg of 15 BDM’s
- Support majority of informed decisions in the biz world
But Even Better…
They intrinsically know the business as well as the decisionmakers (often, they ARE decisionmakers)
They share your (IT, development) mindset more than you’d expect
They can and will pick up PowerPivot quickly
They NEED you
They’re great teammates and are thrilled to cooperate with you
Traditional Model Bottlenecks on BI Pro, and Coming Soon to Big Data
Knowledge Worker / Analyst / Excel Pro
BI Pro
BI Pro Intensely Engaged with One Project at a Time
Everyone Else Waits
– Make uninformed decisions/guesses
– Burn time inefficiently with spreadsheets
– Make costly spreadsheet mistakes
– Leak sensitive information
– Become entrenched in spreadsheet process, resistant to improvement once BI resources available
BIgData Pro Now Can Address Multiple Projects
BUDGET VS ACTUALS
The Demo
See blog posts:
– http://ppvt.pro/BudgetActuals1
– http://ppvt.pro/BudgetActuals2
The Checklist
Let’s be analytical! What are the precise problems?
Think of it as a BI environment…
1. 1M row capacity (and slow long before you max it out)
2. Only works on single flat tables – no dimensional modeling
3. Tempting place to perform “ETL” – and no autorefresh
Think of it as a programming environment…
1. Files are the source projects, with no enforced “blessed” version
2. Files are the distribution method!
3. The runtime environment IS the development environment!
4. Self-obfuscating! “AvgSales?” You wish. You will call it H7 & like it!
5. No separation of presentation and logic. No “portability.”
Bonus Demos
Weather
Power View
Connection to Hadoop
UFO’s