tcod a framework for the total cost of big data - december 6 2013 - winter corp - v17

WINTERCORP: THE LARGE SCALE DATA MANAGEMENT EXPERTS
TCOD: A Framework for the Total Cost of Big Data (key charts)
Richard Winter, WinterCorp
December 6, 2013, V17
Research Report: wintercorp.com/tcod-report
Spreadsheet: wintercorp.com/tcod-spreadsheet
Key Charts: wintercorp.com/tcod-charts

Upload: wintercorp

Post on 04-Jul-2015


Category:

Technology



DESCRIPTION

Big Data: What Does it Really Cost? The WinterCorp Real Cost of Big Data research compares the total cost of an analytic data solution on Hadoop and on a data warehouse. Learn about:
- The major cost components of an analytic big data project and how they are estimated in the total cost of data (TCOD) framework
- Why it is critical to consider total project cost, not just platform cost
- How the costs differ on a Hadoop platform and a data warehouse platform
- An example where the Hadoop platform is more cost effective
- An example where the data warehouse platform is more cost effective
- Why you need both Hadoop and data warehouse platforms in your analytic data architecture

Key charts are posted here. Full report is available at www.wintercorp.com/tcod-report

TRANSCRIPT

Page 1: Tcod   a framework for the total cost of big data  - december 6 2013  - winter corp - v17

WINTERCORP

THE LARGE SCALE DATA MANAGEMENT EXPERTS

TCOD: A Framework for the Total Cost of Big Data

(key charts)

Richard Winter, WinterCorp

December 6, 2013, V17

Research Report: wintercorp.com/tcod-report

Spreadsheet: wintercorp.com/tcod-spreadsheet

Key Charts: wintercorp.com/tcod-charts

Page 2: Tcod   a framework for the total cost of big data  - december 6 2013  - winter corp - v17

©2010 Winter Corporation. All Rights Reserved.


Total Cost of Data (TCOD)

TCOD is the cost of storing, managing and using data over time for analytic purposes

* ETL is extract, transform and load (preparing data for analytic use)

© 2010, 2011, 2012 WINTER CORPORATION, CAMBRIDGE MA. ALL RIGHTS RESERVED.© 2012, 2013 WINTER CORPORATION, CAMBRIDGE MA. ALL RIGHTS RESERVED.

Diagram labels: System Admin, ETL*, Apps, Queries, Analytics, Software Development/Maintenance.

Diagram not to scale.
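
As a minimal sketch of the framework's structure, TCOD can be treated as the sum of the cost components labeled in the diagram. The class name, the field names (including the split of "System" and "Admin" into separate fields), and the sample figures are illustrative assumptions, not values from the report.

```python
from dataclasses import dataclass, fields


@dataclass
class TcodEstimate:
    """Cost components in millions of dollars.

    Field names are hypothetical; they loosely mirror the diagram labels.
    """
    system: float
    admin: float
    etl: float
    apps: float
    queries: float
    analytics: float

    def total(self) -> float:
        # TCOD is the sum of every component, not just the platform cost.
        return sum(getattr(self, f.name) for f in fields(self))


# Placeholder figures for illustration only (not taken from the report).
example = TcodEstimate(system=1.0, admin=2.0, etl=3.0, apps=4.0,
                       queries=5.0, analytics=6.0)
print(example.total())  # 21.0
```

Keeping each component as its own field makes it harder to quote the system cost alone as if it were the total.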

Page 3: Tcod   a framework for the total cost of big data  - december 6 2013  - winter corp - v17


Data Refining Example: Data from Turbines


Page 4: Tcod   a framework for the total cost of big data  - december 6 2013  - winter corp - v17


Data Refining Example: Data Management Requirements

1. Hundreds of TB of data per week – 500 TB data capacity

2. Raw data life: a few hours to a few days

3. Challenge: find the important events or trends quickly

4. Massive analysis problem

5. When analyzing, read entire files

6. Keep only the significant data


Page 5: Tcod   a framework for the total cost of big data  - december 6 2013  - winter corp - v17


Cost Comparison: Engineering Example – Data Refining


Chart (not to scale): Total Cost of Data of $9.3m on Hadoop vs. $30m on Data Warehouse Appliance*

                      Data Warehouse Appliance    Hadoop
Volume of Data        500 TB                      500 TB
System Cost           $23 million                 $1.3 million
Total Cost of Data    $30 million                 $9.3 million

* Performance class of DW Appliance – not the lowest price class
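
The arithmetic behind this comparison can be sketched as follows: subtracting system cost from total cost of data gives the cost of everything else, and dividing gives the system's share of the total. The function name `split_cost` is a hypothetical helper; the dollar figures are the ones stated on this slide.

```python
from typing import Tuple


def split_cost(system_cost: float, total_cost: float) -> Tuple[float, float]:
    """Return (non-system cost, system cost as a fraction of the total)."""
    return total_cost - system_cost, system_cost / total_cost


# (system cost, total cost of data) in millions of dollars, from the slide.
platforms = {
    "Data Warehouse Appliance": (23.0, 30.0),
    "Hadoop": (1.3, 9.3),
}

for name, (system, total) in platforms.items():
    other, share = split_cost(system, total)
    print(f"{name}: non-system ${other:.1f}m, system share {share:.0%}")
```

Under these figures the costs beyond the system itself are similar on both platforms (about $7 million and $8 million), so most of the gap in this example comes from system cost.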

Page 6: Tcod   a framework for the total cost of big data  - december 6 2013  - winter corp - v17


Observations on Hadoop

1. Many examples of the data refining requirement in engineering, operations, business, science, healthcare

2. Cost equation is favorable to Hadoop in these applications even with a wide variety of data types

3. There are also many other excellent Hadoop use cases

– Data landing zone

– Archive

– Intensive batch processing of data

4. This example is one illustration of the Hadoop sweet spot


Page 7: Tcod   a framework for the total cost of big data  - december 6 2013  - winter corp - v17


Business Example: Enterprise Data Warehouse


Page 8: Tcod   a framework for the total cost of big data  - december 6 2013  - winter corp - v17


Business Example - EDW: Data Management Requirements

1. Data volume

a. 500 TB to start – all retained for at least five years

b. Continual growth of data and workload

2. Data sources: thousands

a. Data sources change their feeds frequently

b. New data sources are frequent

3. Challenges

a. Data must be correct

b. Data must be integrated

4. Typical enterprise data lifetime: decades

5. Analytic application lifetime: years

6. Many thousands of data users (10^4 – 10^6)

7. Hundreds of analytic applications

8. Thousands of one-time analyses

9. Tens of thousands of complex queries


Page 9: Tcod   a framework for the total cost of big data  - december 6 2013  - winter corp - v17


Cost Comparison: Business Example – EDW


Chart (not to scale): Total Cost of Data of $265 million on EDW Platform vs. $740 million on Hadoop. Cost categories: Total System Cost, System and Data Admin, ETL, Application Development, Complex Queries, Analysis.

                      Data Warehouse Platform     Hadoop
Volume of Data        500 TB                      500 TB
System Cost           $45 million                 $1.4 million
Total Cost of Data    $265 million                $740 million
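
A short worked calculation, using only the figures stated on this slide, shows how the system-cost savings on Hadoop compare with the additional cost of everything else in this example. The variable names are illustrative.

```python
# Figures from the EDW cost comparison, in millions of dollars.
dw_system, dw_total = 45.0, 265.0
hadoop_system, hadoop_total = 1.4, 740.0

# Amount saved on the platform itself by choosing Hadoop.
system_savings = dw_system - hadoop_system

# Cost of everything other than the system on each platform.
dw_other = dw_total - dw_system
hadoop_other = hadoop_total - hadoop_system

# Additional non-system cost incurred on Hadoop.
extra_other = hadoop_other - dw_other

# Net additional TCOD on Hadoop; equals hadoop_total - dw_total.
net_difference = extra_other - system_savings

print(f"System savings on Hadoop: ${system_savings:.1f}m")
print(f"Extra non-system cost on Hadoop: ${extra_other:.1f}m")
print(f"Net additional TCOD on Hadoop: ${net_difference:.1f}m")
```

With these inputs, roughly $43.6 million saved on the system is set against roughly $518.6 million of additional non-system cost, which is where the $475 million gap in total cost of data comes from.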

Page 10: Tcod   a framework for the total cost of big data  - december 6 2013  - winter corp - v17


Conclusions – Two TCOD Examples

1. TCOD is NOT platform cost

2. Each technology has large advantages in its sweet spot(s)

3. Neither platform is cost effective in the other’s sweet spot

4. Biggest differences for the data warehouse are the development of:

– Complex queries

– Analytics

Data Refining: Hadoop wins. Also: Landing Zone, Archive

EDW: Data Warehouse Platform wins
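
To illustrate the first conclusion, the sketch below ranks the two platforms for each example twice: once by system cost alone and once by total cost of data. The figures come from the two cost comparison slides; the dictionary layout and the helper name `cheapest` are assumptions made for illustration.

```python
# (system cost, total cost of data) in millions of dollars, per example.
examples = {
    "Data refining": {"Hadoop": (1.3, 9.3), "Data warehouse": (23.0, 30.0)},
    "EDW": {"Hadoop": (1.4, 740.0), "Data warehouse": (45.0, 265.0)},
}


def cheapest(costs, index):
    """Return the platform with the lowest value at the given tuple index."""
    return min(costs, key=lambda platform: costs[platform][index])


for name, costs in examples.items():
    by_system = cheapest(costs, 0)  # ranking by platform (system) cost only
    by_tcod = cheapest(costs, 1)    # ranking by total cost of data
    print(f"{name}: by system cost -> {by_system}, by TCOD -> {by_tcod}")
```

Ranking by system cost picks Hadoop in both examples, while ranking by total cost of data picks the data warehouse for the EDW example.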


Charts (Total Cost of Data, in millions): On Hadoop vs. On Data Warehouse, one chart with an axis from $0 to $35 and one with an axis from $0 to $800. Cost categories: Total System Cost, System and Data Admin, Application Development, ETL, Complex Queries, Analysis.

Page 11: Tcod   a framework for the total cost of big data  - december 6 2013  - winter corp - v17


TCOD Framework: Additional Notes

Not taken into account

– Actual system workloads, concurrency, availability requirements

– Cost of preparing simple queries

– Cost of query execution

– Workload management

– Vendor-supported distributions of Hadoop/Hadoop Appliances

– ETL products available with Hadoop

New Products Should Eventually Decrease TCOD with Hadoop

– Cloudera Impala, IBM BigSQL, Teradata SQL-H, EMC Pivotal

– New version of Hive supports subset of SQL

Further analysis, evaluation and measurement are required

Page 12: Tcod   a framework for the total cost of big data  - december 6 2013  - winter corp - v17


In Conclusion

1. TCOD estimates what your company will really spend to get to your business goal.

2. Total cost is extremely sensitive to technology choice

3. Analytic architectures will require both Hadoop and data warehouse platforms

4. Focus on total cost, not platform cost, in making your choice for a particular application or use.

5. Many analytic processes will use both Hadoop and data warehouse technology – so integration counts!

Questions and comments welcome at [email protected]
