The Economics of SQL on Hadoop
DESCRIPTION
Watch the recorded event at: http://info.datameer.com/Slideshare-Economics-SQL-Hadoop.html

As organizations clamor to use their new investments in Hadoop ecosystems and leverage their existing analytical infrastructures, many rush to integrate SQL as a data access layer to draw on existing skill sets and get started faster. However, this approach relegates Hadoop to a data management and processing platform rather than the storage and compute engine optimized for analytical workloads it was purpose-built to be. These slides by EMA and Datameer discuss the technical limitations of SQL on Hadoop and propose alternative ways to get the most from Hadoop investments. You will understand:
* how SQL negates the inherent benefits of Hadoop
* why technological paradigm changes can sometimes be good
* use cases when SQL on Hadoop makes sense

TRANSCRIPT
![Page 1: The Economics of SQL on Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022042614/5555c4f4d8b42afe5d8b547c/html5/thumbnails/1.jpg)
© 2013 Datameer, Inc. All rights reserved.
The Economics of SQL on Hadoop
![Page 2: The Economics of SQL on Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022042614/5555c4f4d8b42afe5d8b547c/html5/thumbnails/2.jpg)
Watch the Recording of this Webinar
View the entire recorded webinar at:
http://info.datameer.com/Slideshare-Economics-SQL-Hadoop.html
![Page 3: The Economics of SQL on Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022042614/5555c4f4d8b42afe5d8b547c/html5/thumbnails/3.jpg)
About our Speakers
John Myers
John Myers joined Enterprise Management Associates in 2011 as senior analyst of the business intelligence (BI) practice area. John has 10+ years of experience working in areas related to business analytics in professional services consulting and product development roles, as well as helping organizations solve their business analytics problems, whether they relate to operational platforms, such as customer care or billing, or applied analytical applications, such as revenue assurance or fraud management.
![Page 4: The Economics of SQL on Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022042614/5555c4f4d8b42afe5d8b547c/html5/thumbnails/4.jpg)
About our Speakers
Stefan Groschupf
Stefan Groschupf is the co-founder and CEO of Datameer. One of the original contributors to Nutch, the open source predecessor of Hadoop, Stefan has been at the forefront of the Hadoop and Big Data market. Prior to Datameer, Stefan was the co-founder and CEO of Scale Unlimited, which implemented custom Hadoop analytic solutions for HP, Sun, Deutsche Telekom, Nokia and others. Earlier, Stefan was CEO of 101Tec, a supplier of Hadoop and Nutch-based search and text classification software to industry-leading companies such as Apple, DHL and EMI Music. Stefan has also served as CTO at multiple companies, including Sproose, a social search engine company.
![Page 5: The Economics of SQL on Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022042614/5555c4f4d8b42afe5d8b547c/html5/thumbnails/5.jpg)
About our Speakers
Matt Schumpert
Matt has been working in enterprise software for over 10 years in various capacities, including sales engineering, strategic alliances and consulting.
Matt currently runs the pre-sales engineering team at Datameer, supporting all technical aspects of customer engagement through roll-out of customers into production.
Matt holds a BS in Computer Science from the University of Virginia.
![Page 6: The Economics of SQL on Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022042614/5555c4f4d8b42afe5d8b547c/html5/thumbnails/6.jpg)
Agenda
▪ EMA on Current State of the Big Data Industry
– Online Archiving in Practice
– SQL on NoSQL: Metadata
– Exploratory Use Cases
– Late Binding Schemas better for Discovery
– Economics of Hadoop
▪ Datameer on how to solve these problems
– Use Case #1: Semi-Structured Data
– Use Case #2: Text Analytics
– Use Case #3: Path Analysis
▪ Takeaways, and Question and Answer
![Page 7: The Economics of SQL on Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022042614/5555c4f4d8b42afe5d8b547c/html5/thumbnails/7.jpg)
State of Big Data Industry
![Page 8: The Economics of SQL on Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022042614/5555c4f4d8b42afe5d8b547c/html5/thumbnails/8.jpg)
Online Archiving is the majority use case for Big Data projects
© 2013 Enterprise Management Associates, Inc.
![Page 9: The Economics of SQL on Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022042614/5555c4f4d8b42afe5d8b547c/html5/thumbnails/9.jpg)
Moving Beyond SELECT * FROM tablename: SQL requires a managed set of metadata
![Page 10: The Economics of SQL on Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022042614/5555c4f4d8b42afe5d8b547c/html5/thumbnails/10.jpg)
Big Data Platforms have Multiple Uses: Discovery is a significant portion
![Page 11: The Economics of SQL on Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022042614/5555c4f4d8b42afe5d8b547c/html5/thumbnails/11.jpg)
Late Binding Schemas are good for Discovery
![Page 12: The Economics of SQL on Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022042614/5555c4f4d8b42afe5d8b547c/html5/thumbnails/12.jpg)
Free as a Free puppy…
![Page 13: The Economics of SQL on Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022042614/5555c4f4d8b42afe5d8b547c/html5/thumbnails/13.jpg)
Datameer Demos
![Page 18: The Economics of SQL on Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022042614/5555c4f4d8b42afe5d8b547c/html5/thumbnails/18.jpg)
Use Case #1: Semi-Structured Data
▪ Noisy, log-structured data → signal
▪ Extract, cast, & define fields on demand
▪ Painful/impossible without inspection
▪ “One-offs” are possible with SQL+UDFs
▪ But better to collaborate with shared “views”
▪ Examples:
– “User-agent” string
– URL Parameters
– JSON
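To make "extract, cast, & define fields on demand" concrete, here is a minimal sketch in plain Python. It is not taken from the webinar or from Datameer's product: the tab-separated log layout, the sample records, and the `extract_fields` helper are all assumptions made for illustration. It parses the three example sources named on the slide (user-agent string, URL parameters, JSON) at read time instead of requiring a schema up front.

```python
import json
from urllib.parse import urlsplit, parse_qs

# Hypothetical raw log records (assumed layout, tab-separated):
# timestamp, user-agent, request URL, JSON payload.
RAW_LINES = [
    '2013-05-01T10:00:00\tMozilla/5.0 (Windows NT 6.1) Chrome/26.0\t'
    '/search?q=hadoop&page=2\t{"user": "u1", "ms": 120}',
    '2013-05-01T10:00:05\tMozilla/5.0 (Macintosh) Safari/536.26\t'
    '/cart?item=42\t{"user": "u2"}',
]


def extract_fields(line):
    """Apply a schema at read time: parse and cast only the fields we need."""
    timestamp, user_agent, url, payload = line.split("\t")
    parts = urlsplit(url)
    # parse_qs returns a list per key; keep the first value of each parameter.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    data = json.loads(payload)
    return {
        "timestamp": timestamp,
        # Deliberately crude user-agent check, for illustration only.
        "browser": "Chrome" if "Chrome/" in user_agent else "Other",
        "path": parts.path,
        "page": int(params.get("page", 1)),  # cast on demand, default to 1
        "user": data["user"],
        "latency_ms": data.get("ms"),  # field may be absent in noisy data
    }


rows = [extract_fields(line) for line in RAW_LINES]
for row in rows:
    print(row)
```

Printing the rows after each change is the "inspection" step the slide stresses: with noisy data, the right fields and casts usually only become clear after looking at the parsed output.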
![Page 23: The Economics of SQL on Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022042614/5555c4f4d8b42afe5d8b547c/html5/thumbnails/23.jpg)
Use Case #2: Text Analytics
▪ Few/no known fields
▪ Notion of a record is nebulous / fluid
▪ Wrangling and mining
▪ “Bag-of-Words” is a sensible start
▪ Again, frequent inspection is key
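As a rough illustration of the "Bag-of-Words" starting point, the sketch below counts word occurrences per document and across a small corpus using only the Python standard library. The sample documents, the `TOKEN` pattern, and the `bag_of_words` helper are hypothetical; the slides do not specify a tokenizer, so lowercasing and splitting on non-alphanumeric characters is an assumption.

```python
import re
from collections import Counter

# Hypothetical free-text documents with no predefined fields.
DOCS = [
    "Hadoop stores raw text. Hadoop scales.",
    "SQL needs a schema; raw text has none.",
]

# Assumed tokenizer: lowercase runs of letters and digits.
TOKEN = re.compile(r"[a-z0-9]+")


def bag_of_words(text):
    """Return word counts for one document, ignoring order and punctuation."""
    return Counter(TOKEN.findall(text.lower()))


bags = [bag_of_words(doc) for doc in DOCS]
corpus = sum(bags, Counter())  # merge per-document counts into one bag

# Inspect the most frequent terms before deciding on further wrangling.
print(corpus.most_common(3))
```

Here each document is treated as the record, which is itself a choice: the same text could be split by sentence or paragraph, which is the "nebulous" notion of a record the slide refers to.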
![Page 28: The Economics of SQL on Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022042614/5555c4f4d8b42afe5d8b547c/html5/thumbnails/28.jpg)
Use Case #3: Path Analysis
▪ Key component of clickstream analysis
▪ Compares each record to the next/previous
▪ Defines/summarizes transitions, not events
▪ Supported by list/array types
▪ Requires multi-pass queries
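The points above can be sketched in two passes over a small clickstream. This is an illustrative Python example, not the method shown in the webinar: the `CLICKS` sample data and the `page_paths` and `transition_counts` helpers are assumptions. The first pass builds a time-ordered list of pages per user (the list/array type), and the second pairs each page with the next one to count transitions rather than individual events.

```python
from collections import Counter, defaultdict

# Hypothetical click events, deliberately out of order: (user, timestamp, page).
CLICKS = [
    ("u1", 3, "checkout"),
    ("u1", 1, "home"),
    ("u2", 1, "home"),
    ("u1", 2, "cart"),
    ("u2", 2, "search"),
    ("u2", 3, "cart"),
]


def page_paths(clicks):
    """Pass 1: collect each user's pages into a time-ordered list."""
    by_user = defaultdict(list)
    for user, _timestamp, page in sorted(clicks, key=lambda c: (c[0], c[1])):
        by_user[user].append(page)
    return dict(by_user)


def transition_counts(paths):
    """Pass 2: pair each page with the next one and count the transitions."""
    counts = Counter()
    for pages in paths.values():
        counts.update(zip(pages, pages[1:]))
    return counts


paths = page_paths(CLICKS)
transitions = transition_counts(paths)
for (source, target), count in sorted(transitions.items()):
    print(f"{source} -> {target}: {count}")
```

The second pass depends on the ordered lists produced by the first, which is one way to read the slide's "requires multi-pass queries".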
![Page 29: The Economics of SQL on Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022042614/5555c4f4d8b42afe5d8b547c/html5/thumbnails/29.jpg)
Takeaways
![Page 32: The Economics of SQL on Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022042614/5555c4f4d8b42afe5d8b547c/html5/thumbnails/32.jpg)
When NOT to use SQL on Hadoop
▪ Structured Schemas or “Schema on Write”
▪ “Realtime” Query SLAs for operational or reporting tasks
▪ Highly detailed SQL query requirements (SQL-2003)
![Page 35: The Economics of SQL on Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022042614/5555c4f4d8b42afe5d8b547c/html5/thumbnails/35.jpg)
When to use SQL on Hadoop
▪ Unstructured Datasets and “Schema on Read”
▪ Discovery tasks designed to find new connections and new business value
▪ Lower level SQL queries (SQL-99)
![Page 36: The Economics of SQL on Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022042614/5555c4f4d8b42afe5d8b547c/html5/thumbnails/36.jpg)
Summary
▪ EMA on Current State of the Big Data Industry
– Online Archiving in Practice
– SQL on NoSQL: Metadata
– Exploratory Use Cases
– Late Binding Schemas better for Discovery
▪ Datameer on how to solve these problems
– Use Case #1: Semi-Structured Data
– Use Case #2: Text Analytics
– Use Case #3: Path Analysis
![Page 37: The Economics of SQL on Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022042614/5555c4f4d8b42afe5d8b547c/html5/thumbnails/37.jpg)
Call To Action
■ Visit our website
– www.datameer.com
■ Download our Trial
– http://www.datameer.com/Datameer-trial.html
![Page 38: The Economics of SQL on Hadoop](https://reader034.vdocuments.site/reader034/viewer/2022042614/5555c4f4d8b42afe5d8b547c/html5/thumbnails/38.jpg)