strata conference nyc 2013
Taewook Eom Data Infrastructure Team SK planet 2014-01-28
Taewook Eom
http://www.flickr.com/photos/oreillyconf/10616622085/
Data Programmer
Plaster (Planet Master) of Big Data Infra
Pre-Assessor of Hiring Programmers
Mentor of 101 Startup Korea
Twitter: @taewooke LinkedIn: http://kr.linkedin.com/in/taewookeom
http://strataconf.com/
by O’Reilly
Web 2.0 : Open, Sharing, Participation
Santa Clara : Technical
New York with Cloudera : Financial, Business
Europe : Privacy, Government
Boston : Medical
Big Data : Making Data Work Change the World with Data.
Data
When hardware became commoditized, software was valuable. Now that software is being commoditized, data is valuable.
– Tim O’Reilly, 2011
Data is like the blood of the enterprise.
– Amr Awadallah, CTO at Cloudera, 2013
Big Data Architectural Patterns http://strataconf.com/stratany2013/public/schedule/detail/30397
What is Big Data?
All data that is not a fit for a traditional RDBMS, whether used for OLTP or Analytics purposes
http://blog.vitria.com/Portals/47881/images/3values-resized-600.png
Solving 'Big Data' Challenge Involves More Than Just Managing Volumes of Data - Gartner, 2011
http://image-store.slidesharecdn.com/ae63030a-3d9b-11e3-9cff-22000a970267-original.jpg
Defining your Big Data Arsenal: NoSQL, Hadoop, and RDBMS http://strataconf.com/stratany2013/public/schedule/detail/29968
Data Science
http://en.wikipedia.org/wiki/File:DataScienceDisciplines.png http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Big Data
http://mappingignorance.org/fx/media/2013/07/Figura-11.jpg
Open Mind!
Big Data
Gartner's 2013 Hype Cycle for Emerging Technologies (2013-08-19)
More than half of the technical sessions were presented by Chinese or Indian speakers
39 of 125 sessions are sponsored sessions
Big Data: 4 Approaches
Search-based Hadoop-based
RDB-based NoSQL
Real-time Processing
Real-time Recommendations for Retail: Architecture, Algorithms, and Design http://strataconf.com/stratany2013/public/schedule/detail/30217
Real-time Stream Processing
Apache Storm
Streaming
Apache Kafka Gathering
Processing
Querying Search-based
NoSQL
Stringer/Tez Shark SQL
… not yet Graph Processing
Big Data Space
No one tool is the right fit for all Big Data problems.
Do not be afraid to recommend the right solution for the problem over the popular solution.
To do this, you must be aware of the entire ecosystem.
Big Data Architectural Patterns http://strataconf.com/stratany2013/public/schedule/detail/30397
Practical Performance Analysis and Tuning for Cloudera Impala http://strataconf.com/stratany2013/public/schedule/detail/30551
Hadoop and the Relational Data Warehouse – When to Use Which? http://strataconf.com/stratany2013/public/schedule/detail/30964
Defining your Big Data Arsenal: NoSQL, Hadoop, and RDBMS http://strataconf.com/stratany2013/public/schedule/detail/29968
Ignite
Signal Detection Theory: Man vs Machine
Co-Founder @VividCortex Kyle Redinger
http://www.youtube.com/watch?v=Fg6mN-jevds
(5 minutes 6 seconds)
http://www.slideshare.net/realkyleredinger/man-vs-machine-signal-detection-theory-and-big-data
Signal Detection Theory: Man vs Machine
Remove the obvious and look at what is important Remember: Less is more.
Towards Strata 2014
Director of market research at O’Reilly Media Roger Magoulas
http://www.youtube.com/watch?v=Ytd5VkEgQf8
(5 minutes 26 seconds)
http://strataconf.com/stratany2013/public/schedule/detail/31935
Keynote
http://www.oreilly.com/data/free/files/stratasurvey.pdf
Towards Strata 2014
Beyond R and Ph.D.s: The Mythology of Data Science Debunked Douglas Merrill (ZestFinance)
http://www.youtube.com/watch?v=J2sgObXbIWY (8 minutes 9 seconds)
Science is fundamentally about data, but data is not fundamentally about science
People
A data scientist is a data analyst who lives in California. – George Roumeliotis (Intuit)
http://www.anlytcs.com/2014/01/data-science-venn-diagram-v20.html
http://cdn.oreillystatic.com/oreilly/radarreport/0636920029014/Analyzing_the_Analyzers.pdf
Data Businessperson: Business person, Leader, Entrepreneur
Data Creative: Artist, Jack-of-All-Trades, Hacker
Data Researcher: Scientist, Researcher, Statistician
Data Engineer: Engineer, Developer
http://datacommunitydc.org/blog/2012/08/data-scientists-survey-results-teaser/
Scientists think they can code, software engineers think they are scientists. Team them up so they collaborate.
– Scott Sorenson (Ancestry.com) Ancestry.com: Managing Big Data Reaching Back to the 11th Century with Hadoop
How Nordstrom Utilizes Human Intelligence to Blend Brick-and-Mortar with Online Commerce http://strataconf.com/stratany2013/public/schedule/detail/30707
Data scientists spend their lives as data janitors instead of leveraging their skills
– Wes McKinney (DataPad) Building More Productive Data Science and Analytics Workflows
Keynote
Is Bigger Really Better? Predictive Analytics with Fine-grained Behavior Data
Professor at the NYU Stern School of Business Foster Provost
http://www.youtube.com/watch?v=1jzMiAfLH2c
(10 minutes 16 seconds)
http://strataconf.com/stratany2013/public/schedule/detail/31685
Is Bigger Really Better? Predictive Analytics with Fine-grained Behavior Data
Predictive does not mean actionable. – Scott Sorenson (Ancestry.com)
Ancestry.com: Managing Big Data Reaching Back to the 11th Century with Hadoop
Is Bigger Really Better? Predictive Analytics with Fine-grained Behavior Data
More data gives you more precision, not more prediction. Using multiple datasets to reduce errors when measuring values.
- Ravi Iyer (Ranker.com) Using Graphs of Data to Understand your Customers, Users, and Employees
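The "more precision" half of that quote can be illustrated with a small simulation. This is only a sketch under assumed numbers: the true value, the noise level, and the sample sizes are hypothetical and do not come from the talk. It shows that the standard error of a measured mean shrinks as more observations are added, which narrows the estimate without saying anything new about future behavior.

```python
import math
import random
import statistics

def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))

# Hypothetical measurement setup (assumed values, not from the talk).
rng = random.Random(42)   # fixed seed so the run is repeatable
TRUE_VALUE = 10.0         # the quantity being measured
NOISE = 2.0               # standard deviation of the measurement noise

small = [rng.gauss(TRUE_VALUE, NOISE) for _ in range(100)]
large = [rng.gauss(TRUE_VALUE, NOISE) for _ in range(10_000)]

se_small = standard_error(small)
se_large = standard_error(large)

print(f"n=100:    mean={statistics.mean(small):.3f}  standard error={se_small:.3f}")
print(f"n=10000:  mean={statistics.mean(large):.3f}  standard error={se_large:.3f}")
```

With 100 times more observations the standard error drops by roughly a factor of 10, since it scales with 1/sqrt(n).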
Big Impact from Big Data
Head of Analytics at Facebook Ken Rudin
http://www.youtube.com/watch?v=RJFwsZwTBgg
(11 minutes 57 seconds)
http://strataconf.com/stratany2013/public/schedule/detail/31903
Keynote
Big Impact from Big Data
Designing Your Data-Centric Organization Josh Klahr (Pivotal)
http://www.youtube.com/watch?v=D86udfrVzrI (12 minutes)
Hadoop is a hammer, but you need other tools along with it.
Big Impact from Big Data
The way you organize information depends on the question you intend to ask of it.
- Richard Saul Wurman Building a Data Platform
HaDump : Loading data into Hadoop for no reason.
Data Science Without a Scientist http://strataconf.com/stratany2013/public/schedule/detail/31801
Big Impact from Big Data
Technical people still don't understand the business needs of business people! Business people don't know what a table is.
- Anurag Tandon (MicroStrategy) Inject Big Data into your Corporate DNA: Enable Every Employee to Make Data Driven Decisions
Ask the Right Questions
Organizations already have people who know their own data better than mystical data scientists.
Learning Hadoop is easier than learning the company’s business.
- Gartner, 2012
Defining your Big Data Arsenal: NoSQL, Hadoop, and RDBMS http://strataconf.com/stratany2013/public/schedule/detail/29968
Non-linear Storytelling: Towards New Methods and Aesthetics for Data Narrative http://strataconf.com/stratany2013/public/schedule/detail/30207
Every Soldier is a Sensor: Countering Corruption in Afghanistan http://strataconf.com/stratany2013/public/schedule/detail/30828
Big Impact from Big Data
< Actionable Usable < Useful
with Impact
If you can't answer "so what?", you only have facts, not insight.
- Baron Schwartz (VividCortex Inc) Making Big Data Small
Descriptive (Easy) What happened?
Predictive (Medium) What will happen?
Prescriptive (Hard) What should we do about it? Hadoop & Data Science for the Enterprise
Value of Data
Big Data is the first industry that was created by open source.
- Jack Norris (MapR Technologies) Separating Hadoop Myths from Reality
The Future of Hadoop: What Happened & What's Possible?
Co-Founder of Hadoop Doug Cutting
http://www.youtube.com/watch?v=_WwuZI6AhN8
(14 minutes 41 seconds) http://strataconf.com/stratany2013/public/schedule/detail/31591
Hadoop is the kernel of the OS for data.
Hadoop's Impact on the Future of Data Management Mike Olson (Cloudera)
http://www.youtube.com/watch?v=puHS2JNKgRM http://strataconf.com/stratany2013/public/schedule/detail/31380
Single : S/W & H/W system : security model : management model : metadata model : audit model : resource management model
Common : storage & schema
http://www.slideshare.net/cloudera/enterprise-data-hub-the-next-big-thing-in-big-data
Last generation of data management is not sufficient
More copies, representations, transformations increase risk
Index once and reuse across workloads, lifecycle
NoSQL: indexing and updates for interactive apps
Hadoop: staging, persistence, and analytics
Data Governance for Regulated Industries Using Hadoop http://strataconf.com/stratany2013/public/schedule/detail/30738
Rethink How You See Data Sharmila Shahani-Mulligan (ClearStory Data)
http://www.youtube.com/watch?v=07hGulTOZGk (9 minutes 6 seconds) http://strataconf.com/stratany2013/public/schedule/detail/31742
Data Intelligence
?
Question Analysis & Discovery
Access Sampling Modeling Presentation
The Data Availability Problem
Insight
Data Prep – too slow!
Loading
Introducing a New Way to Interact with Insight http://strataconf.com/stratany2013/public/schedule/detail/31743
Information Supply Chain
Running Non-MapReduce Big Data applications on Apache Hadoop http://strataconf.com/stratany2013/public/schedule/detail/30755
What’s Next for Apache HBase: Multi-tenancy, Predictability, and Extensions. http://strataconf.com/stratany2013/public/schedule/detail/30857
Apache HBase for Architects http://strataconf.com/stratany2013/public/schedule/detail/30619
Securing the Apache Hadoop Ecosystem http://strataconf.com/stratany2013/public/schedule/detail/30302
An Introduction to the Berkeley Data Analytics Stack With Spark, Spark Streaming, Shark, Tachyon, and BlinkDB http://strataconf.com/stratany2013/public/schedule/detail/30959
Schema
Information does not exist until a schema is defined and data is stored in a relational database
- anonymous
Building a Data Platform http://strataconf.com/stratany2013/public/schedule/detail/31400
Lessons Learned From A Decade’s Worth of Big Data At The U.S. National Security Agency (NSA) http://strataconf.com/stratany2013/public/schedule/detail/30913
Managing a Rapidly Evolving Analytics Pipeline http://strataconf.com/stratany2013/public/schedule/detail/30635
SQL on/in Hadoop/HBase Solutions
Stringer/Tez Shark
Perception is Key: Telescopes, Microscopes and Data http://strataconf.com/strataeu2013/public/schedule/detail/32351
All SQL on Hadoop Solutions are Missing the Point of Hadoop
Every solution makes you define a schema
- SQL (Structured Query Language) is expressed over an assumed schema
Major reasons why Hadoop has taken off include:
- Ability to load data without defining a schema
- Ability to process data using schema-on-read instead of first defining a schema
Hadoop contains a lot of:
- Raw, granular data sets with potentially inconsistent schemas
- Data sets in JSON, key-value, and other self-describing (non-relational) models designed for schema-on-read processing
SQL on Hadoop solutions that make you first define a schema are missing a major part of Hadoop’s usage patterns
Flexible Schema and the End of ETL http://strataconf.com/stratany2013/public/schedule/detail/31868
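The schema-on-read idea from this slide can be sketched in a few lines of plain Python. This is an illustration only, not the speaker's implementation and not a Hadoop API: the raw records, the field names, and the `read_with_schema` helper are all hypothetical. The point is that the raw data is stored untouched and each reader applies its own schema when it reads.

```python
import json

# Hypothetical raw records, stored as-is with no schema declared up front.
# Note the inconsistencies: a string-typed count and missing fields.
RAW_LINES = [
    '{"user": "a", "clicks": 3}',
    '{"user": "b", "clicks": "7", "country": "KR"}',
    '{"user": "c"}',
]

def read_with_schema(lines, schema):
    """Apply a schema at read time: select fields, fill defaults, coerce types."""
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, (cast, default) in schema.items():
            value = record.get(field, default)
            row[field] = cast(value) if value is not None else None
        yield row

# Two different readers impose two different schemas on the same raw data.
CLICK_SCHEMA = {"user": (str, None), "clicks": (int, 0)}
GEO_SCHEMA = {"user": (str, None), "country": (str, "unknown")}

clicks = list(read_with_schema(RAW_LINES, CLICK_SCHEMA))
geo = list(read_with_schema(RAW_LINES, GEO_SCHEMA))
total_clicks = sum(row["clicks"] for row in clicks)

print(total_clicks)
print(geo)
```

A schema-on-write system would instead reject or transform the second and third records at load time, before any question had been asked of them.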
Lessons Learned
Hadoop Adventures At Spotify http://strataconf.com/stratany2013/public/schedule/detail/30570
Prototyping is key to overcoming resistance to change
Technical architecture is heavily influenced by people organization
Developing a team of experienced Hadoop users can often be done using internal employees
A culture of experimentation and innovation yields the best result
Quick prototyping is the fastest way to internal advocacy.
Ship It!
Cloud == Speed
We don’t always need a complicated solution.
KISS
Play to your differentiating strengths.
Experience >> Data
Bias towards impact.
It Takes a Village
EASE!! (Emulate, Analyze, Scale, Evaluate)
Ancestry.com: Managing Big Data Reaching Back to the 11th Century with Hadoop http://strataconf.com/stratany2013/public/schedule/detail/30499
How Nordstrom Utilizes Human Intelligence to Blend Brick-and-Mortar with Online Commerce http://strataconf.com/stratany2013/public/schedule/detail/30707
Questions? SELECT questions FROM audience;
Strata Conference + Hadoop World 2013 Keynotes & Interviews http://www.youtube.com/playlist?list=PL055Epbe6d5ZtziVAooUC04i1hL_Z9Xvk
Slides & Video http://strataconf.com/stratany2013/public/schedule/proceedings
Tweets https://twitter.com/search?q=%23strataconf #strataconf
References