WINTERCORP
THE LARGE SCALE DATA MANAGEMENT EXPERTS
TCOD: A Framework for the
Total Cost of Big Data
(key charts)
Richard Winter, WinterCorp
December 6, 2013, V17
Research Report: wintercorp.com/tcod-report
Spreadsheet: wintercorp.com/tcod-spreadsheet
Key Charts: wintercorp.com/tcod-charts
©2010 Winter Corporation. All Rights Reserved.
Total Cost of Data
(TCOD)
TCOD is the cost of storing, managing and using data over time for analytic purposes
* ETL is extract, transform and load (preparing data for analytic use)
© 2010, 2011, 2012, 2013 WINTER CORPORATION, CAMBRIDGE MA. ALL RIGHTS RESERVED.
Cost components shown in the diagram: System Admin, ETL*, Apps, Queries, Analytics, Software Development/Maintenance
Diagram not to scale.
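The definition on this slide can be read as a sum over cost components. The sketch below is one minimal way to express that, assuming the platform (system) cost is added to the components named in the diagram. The class name `TcodEstimate`, its field names, and the choice of units (millions of dollars over the analysis period) are illustrative assumptions, not part of the TCOD report or spreadsheet.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TcodEstimate:
    """Hypothetical container for one platform's cost estimate.

    All amounts are in millions of dollars over the analysis period.
    """
    system: float        # platform hardware/software (assumed component)
    system_admin: float  # system administration
    etl: float           # extract, transform and load
    apps: float          # application development/maintenance
    queries: float       # query development/maintenance
    analytics: float     # analytics development/maintenance

    def total(self) -> float:
        """Total cost of data: the sum of every component."""
        return sum(getattr(self, f.name) for f in fields(self))

    def non_system(self) -> float:
        """Everything except the platform itself."""
        return self.total() - self.system
```

Keeping the system cost as its own field makes it easy to compare the platform cost with the total, which is the distinction the later slides rely on.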
Data Refining Example: Data from Turbines
Data Refining Example: Data Management Requirements
1. Hundreds of TB of data per week – 500 TB data capacity
2. Raw data life: few hours to a few days
3. Challenge: find the important events or trends quickly
4. Massive analysis problem
5. When analyzing, read entire files
6. Keep only the significant data
Cost Comparison: Engineering Example – Data Refining
Chart (not to scale): total cost of data of $9.3 million on Hadoop vs. $30 million on the data warehouse appliance*

                      Data Warehouse Appliance*   Hadoop
Volume of Data        500 TB                      500 TB
System Cost           $23 million                 $1.3 million
Total Cost of Data    $30 million                 $9.3 million
* Performance class of DW Appliance – not the lowest price class
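The figures on this slide imply how much of each total lies outside the platform purchase. The short calculation below works that out using only the numbers shown here; the variable and function names are illustrative.

```python
# Figures from the slide, in millions of dollars.
costs = {
    "Data Warehouse Appliance": {"system": 23.0, "tcod": 30.0},
    "Hadoop": {"system": 1.3, "tcod": 9.3},
}


def breakdown(system: float, tcod: float) -> dict:
    """Split a total cost of data into system and non-system parts."""
    non_system = tcod - system
    return {
        "non_system": round(non_system, 1),
        "system_share_pct": round(100 * system / tcod),
    }


for platform, c in costs.items():
    print(platform, breakdown(c["system"], c["tcod"]))

# Ratio of total cost: appliance relative to Hadoop.
tcod_ratio = costs["Data Warehouse Appliance"]["tcod"] / costs["Hadoop"]["tcod"]
print(f"Appliance TCOD is about {tcod_ratio:.1f}x Hadoop TCOD")
```

With these inputs, the system accounts for roughly 77% of the appliance total but only about 14% of the Hadoop total, while the non-system costs are similar ($7.0 million vs. $8.0 million).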
Observations on Hadoop
1. Many examples of the data refining requirement in engineering, operations, business, science, healthcare
2. Cost equation is favorable to Hadoop in these applications even with a wide variety of data types
3. There are also many other excellent Hadoop use cases
– Data landing zone
– Archive
– Intensive batch processing of data
4. This example is one illustration of a Hadoop sweet spot
Business Example: Enterprise Data Warehouse
Business Example - EDW: Data Management Requirements
1. Data volume
a. 500 TB to start – all retained for at least five years
b. Continual growth of data and workload
2. Data sources: thousands
a. Data sources change their feeds frequently
b. New data sources are frequent
3. Challenges
a. Data must be correct
b. Data must be integrated
4. Typical enterprise data lifetime: decades
5. Analytic application lifetime: years
6. Many thousands of data users (10^4 – 10^6)
7. Hundreds of analytic applications
8. Thousands of one-time analyses
9. Tens of thousands of complex queries
Cost Comparison: Business Example – EDW
Chart (not to scale): total cost of data of $265 million on the EDW platform vs. $740 million on Hadoop
Chart legend: Total System Cost, System and Data Admin, ETL, Application Development, Complex Queries, Analysis

                      Data Warehouse Platform   Hadoop
Volume of Data        500 TB                    500 TB
System Cost           $45 million               $1.4 million
Total Cost of Data    $265 million              $740 million
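The gap between system cost and total cost on this slide can be made explicit with a short calculation. The sketch below uses only the figures shown here; the variable names are illustrative.

```python
# Figures from the slide, in millions of dollars.
edw = {"system": 45.0, "tcod": 265.0}
hadoop = {"system": 1.4, "tcod": 740.0}

# Cost outside the platform purchase (admin, ETL, development, queries, analysis).
edw_other = edw["tcod"] - edw["system"]
hadoop_other = hadoop["tcod"] - hadoop["system"]

# How many times larger each Hadoop figure is (or smaller, if below 1).
system_ratio = hadoop["system"] / edw["system"]
tcod_ratio = hadoop["tcod"] / edw["tcod"]

print(f"EDW non-system cost:    ${edw_other:.1f}M")
print(f"Hadoop non-system cost: ${hadoop_other:.1f}M")
print(f"Hadoop/EDW system cost ratio: {system_ratio:.3f}")
print(f"Hadoop/EDW total cost ratio:  {tcod_ratio:.2f}")
```

With these inputs, Hadoop's system cost is about 3% of the EDW platform's, yet its total cost of data is about 2.8 times higher, because roughly $738.6 million of the Hadoop total falls outside the system itself.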
Conclusions – Two TCOD Examples
1. TCOD is NOT platform cost
2. Each technology has large advantages in its sweet spot(s)
3. Neither platform is cost effective in the other’s sweet spot
4. Biggest differences for the data warehouse are the development of:
– Complex queries
– Analytics
Data Refining: Hadoop wins. Also: Landing Zone, Archive
[Chart: cost in $ millions (axis $0 to $35), On Hadoop vs. On Data Warehouse]
EDW: Data warehouse platform wins
[Chart: cost in $ millions (axis $0 to $800), On Hadoop vs. On Data Warehouse]
Chart legend: Total System Cost, System and Data Admin, Application Development, ETL, Complex Queries, Analysis
TCOD Framework: Additional Notes
Not taken into account:
– Actual system workloads, concurrency, availability requirements
– Cost of preparing simple queries
– Cost of query execution
– Workload management
– Vendor-supported distributions of Hadoop/Hadoop appliances
– ETL products available with Hadoop
New products should eventually decrease TCOD with Hadoop:
– Cloudera Impala, IBM BigSQL, Teradata SQL-H, EMC Pivotal
– New version of Hive supports a subset of SQL
Further analysis, evaluation and measurement are required
In Conclusion
1. TCOD estimates what your company will really spend to get to your business goal.
2. Total cost is extremely sensitive to technology choice.
3. Analytic architectures will require both Hadoop and data warehouse platforms.
4. Focus on total cost, not platform cost, in making your choice for a particular application or use.
5. Many analytic processes will use both Hadoop and data warehouse technology – so integration counts!
Questions and comments welcome at [email protected]