Observe… · • Substantial data volumes (e.g. 500GB-1TB/day ingest is common) • Long retention...

Post on 21-May-2020

Page 1

Page 2

Observe
FOUNDED: NOV 2017, SAN MATEO, CA
FOUNDERS: ARISTA, FACEBOOK, SNOWFLAKE, SPLUNK, WAVEFRONT
FUNDING: $15M SERIES A – SUTTER HILL VENTURES
TEAM SIZE: 15
VISION: TURN THE WORLD’S MACHINE DATA INTO INFORMATION
MISSION: BECOME THE MARKET LEADER IN OBSERVABILITY

Page 3

WE NOW LIVE IN A DIGITAL ECONOMY.

Page 4

IT PROBLEMS ARE BUSINESS PROBLEMS.
DIGITAL BUSINESSES LOSE ON AVERAGE $100,000 PER HOUR.

Page 5

DIAGNOSING PROBLEMS IS A NIGHTMARE
DEVOPS ENGINEER IS #2 ON THE LIST OF ‘BEST JOBS IN AMERICA’

150,000+ JOB VACANCIES AT THE END OF 2018.

Page 6

TOO MANY TOOLS. TOO MANY SILOS.
WITHOUT A SUPERHERO YOU CAN’T SEE THE BIG PICTURE.

Page 7

IT’S EXPENSIVE.
EXPENSIVE TO BUY. EXPENSIVE TO MAINTAIN.

Page 8

Introducing Observe

Page 9

DON’T MONITOR. OBSERVE.

Monitoring: Is The Application Running?

Observability: Why Is The Application Running This Way?

Page 10

IT’S ALL ABOUT THE CLUE.
TIME TO INVESTIGATE IS THE BIGGEST VARIABLE.

Page 11

OBSERVE – KEY COMPONENTS.

OBSERVABILITY TOOLS: UNDERSTAND & INVESTIGATE SYSTEM BEHAVIOR

DATA PLATFORM: INGEST LOGS, METRICS, TRACES… AND ANYTHING ELSE!

Page 12

DATA PLATFORM – KEY CONCEPTS.

OBSERVABILITY TOOLS: UNDERSTAND & INVESTIGATE SYSTEM BEHAVIOR

DATA PLATFORM:
OBSERVATION FIRE HOSE: ALL EVENTS THAT HAPPENED (LOGS, METRICS & TRACES).
EVENT STREAMS: IMPORTANT EVENTS THAT HAPPENED (CPU METRICS, CONTAINER LOGS…)
RESOURCES: THINGS YOU WANT TO ASK QUESTIONS ABOUT (SERVERS, PODS…)

Page 13

10X FASTER ‘MEAN TIME TO CLUE’.

OBSERVABILITY TOOLS: UNDERSTAND & INVESTIGATE SYSTEM BEHAVIOR
SEARCH, ANALYTICS, DASHBOARDS, ALERTS, TIME MACHINE

DATA PLATFORM: INGEST LOGS, METRICS, TRACES… AND ANYTHING ELSE!

Page 14

OUR BUSINESS MODEL IS BASED ON ANALYZING DATA.
NOT INGESTING DATA.

10X LOWER COST.

Page 15

10X LOWER COST.

[Chart: PRICE PER YEAR, labeled 50GB / DAY; axis ticks $0K to $30K; values shown: $81,000, $3000, $29,000, $400]

Page 16

Page 17

Data Management for Observability

• Most vendors build proprietary data stores for their primary use-case:
  • Time-series databases (e.g. Datadog, Wavefront)
  • Full-text search engines (e.g. Splunk, Elastic)

• Tend to be highly optimized for specific access patterns
  • Reading a range of a time-series (e.g. CPU on www1 for last 24 hours)
  • Needle in a haystack (e.g. “NullPointerException”)

Page 18

Snowflake for Observability

• Principal differentiator of Observe is navigation and correlation of previously siloed data

• Most queries are just generic relational queries!
  • “Count the number of NullPointerExceptions over last 24 hours, group by affected customer, and show me the top 10”

• Starting from first principles:
  • Want a massively scalable, columnar, scan-based OLAP database
  • Corroboration: Honeycomb and Scalyr both settled on this requirement too
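The quoted example question maps directly onto a filter, a GROUP BY, and a LIMIT. The following is a minimal sketch using Python's built-in sqlite3 as a stand-in for the OLAP database; the table name `log_events`, its columns, and the sample rows are all hypothetical and are not taken from the slides.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical schema: one row per log event. Table and column names are
# illustrative assumptions, not Observe's or Snowflake's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_events (ts TEXT, customer TEXT, message TEXT)")

# A fixed "now" keeps the example repeatable.
now = datetime(2020, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [
    (now - timedelta(hours=1), "acme", "java.lang.NullPointerException at Foo.bar"),
    (now - timedelta(hours=2), "acme", "java.lang.NullPointerException at Foo.baz"),
    (now - timedelta(hours=3), "globex", "java.lang.NullPointerException at Qux.run"),
    (now - timedelta(hours=5), "globex", "request completed in 12ms"),
    (now - timedelta(hours=30), "initech", "java.lang.NullPointerException at Old.run"),
]
conn.executemany(
    "INSERT INTO log_events VALUES (?, ?, ?)",
    [(ts.isoformat(), customer, msg) for ts, customer, msg in rows],
)


def top_customers_by_npe(conn, now, limit=10):
    """Count NullPointerExceptions in the last 24 hours, grouped by customer."""
    since = (now - timedelta(hours=24)).isoformat()
    return conn.execute(
        """
        SELECT customer, COUNT(*) AS npe_count
        FROM log_events
        WHERE ts >= ? AND message LIKE '%NullPointerException%'
        GROUP BY customer
        ORDER BY npe_count DESC, customer ASC
        LIMIT ?
        """,
        (since, limit),
    ).fetchall()


result = top_customers_by_npe(conn, now)
print(result)
```

The timestamps are stored as ISO 8601 strings with the same UTC offset, so plain string comparison orders them correctly. The 30-hour-old row falls outside the window and is excluded.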

Page 19

Build or Buy?

• We chose BUY. Focus our limited resources on customer value.

(circa Mar 2018)

Page 20

Properties of Observability Use-Case

• Substantial data volumes (e.g. 500GB-1TB/day ingest is common)
• Long retention required in some cases (e.g. 1yr for PCI compliance)
• Assume 100-1000s TB total storage per typical customer

• Recent data accessed way more often than historical (last 1h-24h)
• Machine data is a mess!!
  • Unstructured logs, unresolved entities, sloppy formats, etc.
  • Too much effort to structure up-front: “ELT”

• Troubleshooting use-case is highly ad-hoc and interactive
  • We target <2s time to first result for most interactions
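As a rough check on the storage figure, the quoted ingest rates can be multiplied by the one-year retention example. This is a back-of-the-envelope sketch only: it assumes all data is kept for the full 365 days, uses decimal units (1 TB = 1000 GB), and ignores compression and replication, none of which the slides specify.

```python
def total_storage_tb(ingest_gb_per_day: float, retention_days: int) -> float:
    """Raw retained volume in TB, assuming 1 TB = 1000 GB and no compression."""
    return ingest_gb_per_day * retention_days / 1000


low = total_storage_tb(500, 365)    # 500 GB/day kept for one year
high = total_storage_tb(1000, 365)  # 1 TB/day kept for one year
print(f"{low} TB to {high} TB")
```

Under these assumptions a single customer retains roughly 180 to 365 TB, which sits inside the 100-1000s TB range stated above.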

Page 21

How we leverage Snowflake today

• Snowflake is embedded, transparent to the customer
  • Our Snowflake account, one database per customer

• Separation of storage and compute is crucial to our business model
  • Cheap ingest, usage-based pricing

• Multi-tenant: Shared pool of warehouses for high utilization
• Tables generally clustered by TIME
  • Enables effective partition pruning for recent data (e.g. last 24 hours)
  • Conveniently, Observability data naturally arrives clustered by time

• Modeled datasets stored in distinct tables
  • e.g. Logs and metrics in different tables, less data to scan
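The benefit of clustering by time can be illustrated with a toy model of partition pruning. This is a sketch of the general technique, not Snowflake's implementation: the `Partition` class, the hourly integer timestamps, and the 24-rows-per-partition size are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Partition:
    """A hypothetical storage partition carrying min/max timestamp metadata."""
    min_ts: int
    max_ts: int
    rows: List[Tuple[int, str]]  # (timestamp, message)


def build_partitions(events, rows_per_partition):
    """Chunk events in arrival order; time-ordered arrival gives narrow time ranges."""
    parts = []
    for i in range(0, len(events), rows_per_partition):
        chunk = events[i:i + rows_per_partition]
        timestamps = [ts for ts, _ in chunk]
        parts.append(Partition(min(timestamps), max(timestamps), chunk))
    return parts


def scan(parts, start_ts, end_ts):
    """Return matching rows and the number of partitions actually read."""
    scanned = 0
    matched = []
    for part in parts:
        if part.max_ts < start_ts or part.min_ts > end_ts:
            continue  # pruned from metadata alone, rows never touched
        scanned += 1
        matched.extend(r for r in part.rows if start_ts <= r[0] <= end_ts)
    return matched, scanned


# Ten days of hourly events, arriving in time order.
events = [(hour, f"event at hour {hour}") for hour in range(240)]
parts = build_partitions(events, rows_per_partition=24)

# Query the most recent 24 hours.
rows, scanned = scan(parts, start_ts=216, end_ts=239)
print(len(rows), "rows from", scanned, "of", len(parts), "partitions")
```

Because the events arrive already ordered by time, each partition covers a narrow, non-overlapping range, and a query over the last 24 hours reads one partition out of ten in this toy setup.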

Page 22

How we leverage Snowflake today

• Heavy use of VARIANT type and support for semi-structured data
  • Lots of machine data arrives as JSON records (e.g. AWS CloudTrail Logs)

• Joins, joins, and more joins
  • Fundamentally, how we correlate disparate data sources

• Window analytic functions for temporal aggregation
  • How we materialize “Resources” from Event data

• UDFs for log parsing, string processing

• Future use: Data sharing for data import/export
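One way to picture materializing “Resources” from Event data is to turn point-in-time events into per-resource state intervals, which is what a window expression along the lines of LEAD(ts) OVER (PARTITION BY resource ORDER BY ts) would compute. The sketch below does the same in plain Python. The JSON field names (`ts`, `pod`, `status`) and the sample events are hypothetical, and the slides do not say which window functions Observe actually uses.

```python
import json
from collections import defaultdict

# Hypothetical JSON event records; field names are illustrative assumptions.
raw_events = [
    '{"ts": 100, "pod": "web-1", "status": "Pending"}',
    '{"ts": 105, "pod": "web-1", "status": "Running"}',
    '{"ts": 110, "pod": "web-2", "status": "Running"}',
    '{"ts": 160, "pod": "web-1", "status": "Failed"}',
]


def materialize_resources(raw_events):
    """Turn point-in-time events into per-resource state intervals.

    Each state is valid until the next event for the same resource;
    the latest state has an open end (None).
    """
    by_pod = defaultdict(list)
    for line in raw_events:
        event = json.loads(line)  # semi-structured record, parsed on read
        by_pod[event["pod"]].append(event)

    intervals = []
    for pod in sorted(by_pod):
        events = sorted(by_pod[pod], key=lambda e: e["ts"])
        # Pair each event with its successor, like LEAD() within a partition.
        for current, nxt in zip(events, events[1:] + [None]):
            valid_to = nxt["ts"] if nxt else None
            intervals.append((pod, current["status"], current["ts"], valid_to))
    return intervals


resources = materialize_resources(raw_events)
for row in resources:
    print(row)
```

Each output row reads as (resource, state, valid from, valid to), so a question such as “what state was web-1 in at time 120?” becomes a range lookup rather than a scan over raw events.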

Page 23

Page 24

Closing Thoughts

My eBay review of Snowflake: “A+++++++++ prompt shipping, product as described, would buy again”

• Snowflake has been exceptionally versatile for building a product for the Observability use-case
• Buy (vs. Build) decision has enabled us to focus on delivering an end-to-end solution rather than reinventing the wheel at every step

Page 25

Looking for Development Partners!

• If you are an Elastic/Sumo Logic/Splunk/Datadog user interested in migrating to Snowflake, come talk to us!

[email protected]

Page 26

THANK YOU

Page 27

Observe