![Page 1: Big Data Use Cases in Europe - BI Consultingbiconsulting.hu/letoltes/2017budapestdata/balassi_marton...Introduction •As a Solutions Architect I have worked with 20+ customers in](https://reader034.vdocuments.site/reader034/viewer/2022042219/5ec5461a470744059118784b/html5/thumbnails/1.jpg)
© Cloudera, Inc. All rights reserved.
Big Data Use Cases in Europe
Experiences from the field
Marton Balassi | Solutions Architect | Flink PMC
@MartonBalassi | [email protected]
Introduction
• As a Solutions Architect I have worked with 20+ customers in Europe during the last year
• Focused on architecture, but also involved in implementation
• My favorite topics are stream processing and data science
• Let me share some uplifting and some challenging lessons learned, from my colleagues' experience and my own
• Solutions from Telco, Finance, Retail, Gaming, Data Science
• Disclaimer: My view is my own, subjective and inherently partial.
Let us do our first Hadoop PoC
What is the most common first Hadoop use case?
Data warehouse offloading
• Reproduce an RDBMS-based report
• Easily comparable results
• Ingestion (Sqoop, Flume, Gobblin)
• Storage (HDFS, Kudu, HBase)
• Interactive Query (Impala, Spark SQL, Hive LLAP, Presto)
• User interface (Hue, Zeppelin)
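The "easily comparable results" point can be made concrete with a small check. The following is a minimal sketch, assuming both the original RDBMS report and its reproduction on the cluster have already been fetched as lists of rows; the function names and sample rows are hypothetical, and a real comparison would read from the two systems through their own clients.

```python
from collections import Counter


def normalize(row, ndigits=2):
    """Round floats so engine-specific rounding noise is not reported as a mismatch."""
    return tuple(round(v, ndigits) if isinstance(v, float) else v for v in row)


def diff_reports(rdbms_rows, hadoop_rows, ndigits=2):
    """Compare two reports as multisets of rows, ignoring row order.

    Returns (missing, extra): rows found only in the RDBMS report and rows
    found only in the reproduced report.
    """
    expected = Counter(normalize(r, ndigits) for r in rdbms_rows)
    actual = Counter(normalize(r, ndigits) for r in hadoop_rows)
    return expected - actual, actual - expected


# Hypothetical sample data: (country, year, revenue)
rdbms_report = [("HU", 2017, 1250.50), ("UK", 2017, 9800.00)]
hadoop_report = [("UK", 2017, 9800.0001), ("HU", 2017, 1250.50)]

missing, extra = diff_reports(rdbms_report, hadoop_report)
if not missing and not extra:
    print("reports match")
else:
    print(f"missing={dict(missing)}, extra={dict(extra)}")
```

Comparing multisets rather than ordered lists is a deliberate choice here: the two engines are free to return rows in a different order, and duplicates still count.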
Let us see some more interesting use cases
Syslog ingest @ Vodafone UK
• SIEM/Cybersecurity depends on the quality and quantity of the input data
• Facilitates fault monitoring, threat intelligence, incident response, and litigation
• Data is collected at the national level from TCP and UDP syslog
Tristan Stevens, https://blog.cloudera.com/blog/2016/03/building-benchmarking-and-tuning-syslog-ingest-architecture/
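To illustrate what "input data quality" means for syslog, here is a minimal sketch that parses a single BSD-style (RFC 3164) syslog line and rejects malformed input. It is not the pipeline described in the referenced post; the regular expression, the `parse_syslog` helper, and the sample line are illustrative assumptions. The only protocol fact relied on is that the priority value encodes facility * 8 + severity.

```python
import re

# Assumed layout of an RFC 3164 style line: "<PRI>Mmm dd hh:mm:ss host message"
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<message>.*)$"
)


def parse_syslog(line):
    """Return a dict of parsed fields, or None if the line is malformed."""
    match = SYSLOG_RE.match(line)
    if match is None:
        return None
    pri = int(match.group("pri"))
    return {
        "facility": pri // 8,  # PRI = facility * 8 + severity
        "severity": pri % 8,
        "timestamp": match.group("timestamp"),
        "host": match.group("host"),
        "message": match.group("message"),
    }


sample = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
event = parse_syslog(sample)
print(event["facility"], event["severity"], event["host"])
```

Returning `None` for lines that do not parse lets the caller count or quarantine bad input instead of silently dropping it, which is one simple way to keep an eye on data quality.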
Syslog ingest @ Vodafone UK
• Ingestion with Flume, Kafka
• Interactive queries with Impala
• Free-text search with Solr
• Machine Learning with Spark MLlib
Tristan Stevens, https://blog.cloudera.com/blog/2016/03/building-benchmarking-and-tuning-syslog-ingest-architecture/
Augmenting the log analytics pipeline
Michael Sun and Jeff Shmain, https://blog.cloudera.com/blog/2017/03/how-to-log-analytics-with-solr-spark-opentsdb-and-grafana/
• Error tracking (Solr/Hue)
• Custom monitoring (OpenTSDB/Grafana)
Search is not solely for text
• Search works on the distance between features
• The canonical example is searching for words in documents
• Searching dresses by color or shape is also possible (provided we can describe a shape)
• Implementation relies on Solr
Base implementation by Mathias Lux, https://github.com/dermotte/liresolr. Use case by Nihed Mbarek.
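The idea of searching by feature distance can be sketched in a few lines. This is a toy example, assuming each item is described by a hypothetical three-value color feature (average red, green, blue) and ranked by Euclidean distance; it does not reproduce the features or the index used by the Solr-based implementation referenced above.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(query, catalog, k=2):
    """Return the names of the k items whose features are closest to the query."""
    ranked = sorted(catalog.items(), key=lambda item: euclidean(query, item[1]))
    return [name for name, _ in ranked[:k]]


# Hypothetical catalog: item name -> (red, green, blue) averages in [0, 1]
catalog = {
    "red dress": (0.9, 0.1, 0.1),
    "crimson dress": (0.8, 0.1, 0.2),
    "navy dress": (0.1, 0.1, 0.6),
    "green dress": (0.1, 0.8, 0.2),
}

print(search((0.9, 0.1, 0.15), catalog))
```

A full scan like this does not scale to a large catalog, which is where an index such as Solr comes in; the ranking principle stays the same.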
Near real-time transactional analytics system @ Santander
• Bank card transactions data
• “Spendlytics” app
• Stored in HBase to serve the frontend
• Ingested through Flume/Kafka
• Enriched from local RocksDB instances
James Kinley, Ian Buss, and Rob Siwicki, http://blog.cloudera.com/blog/2015/08/inside-santanders-near-real-time-data-ingest-architecture/
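The enrichment step above follows a common pattern: each incoming event is joined against reference data held in a local key-value store, so no remote call is needed per event. Below is a minimal sketch of that pattern, with a plain Python dict standing in for the local RocksDB instance; the field names and sample records are hypothetical and not taken from the Santander system.

```python
def enrich(transactions, merchant_store):
    """Attach merchant details to each transaction using a local lookup.

    merchant_store is any mapping from merchant id to details; here a dict
    stands in for an embedded key-value store such as RocksDB.
    """
    for tx in transactions:
        merchant = merchant_store.get(tx["merchant_id"], {})
        yield {
            **tx,
            "merchant_name": merchant.get("name", "unknown"),
            "category": merchant.get("category", "uncategorized"),
        }


# Hypothetical reference data and transaction stream
merchant_store = {"m-1": {"name": "Corner Cafe", "category": "eating out"}}
transactions = [
    {"card": "c-42", "merchant_id": "m-1", "amount": 3.20},
    {"card": "c-42", "merchant_id": "m-9", "amount": 12.00},
]

enriched = list(enrich(transactions, merchant_store))
for record in enriched:
    print(record)
```

Unknown merchants get explicit placeholder values rather than being dropped, so downstream consumers still see every transaction.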
Scalable Real-Time Analytics Platform @ King.com
• Low-latency gaming analytics
• Analysts write Groovy scripts
• Deployed on Apache Flink
• 30 billion events/day
• RocksDB state at TB scale
• State is queryable from the outside
Gyula Fora, Mattias Andersson, https://data-artisans.com/blog/rbea-scalable-real-time-analytics-at-king
A new breed of Data Science libraries
• Hail is a genomics library
• Implemented in Python, on Spark
• Genome sequencing is feasible; today we are facing thousands of sequences
• Easy access to distributed computing is key
Tom White, Jonathan Keebler, https://blog.cloudera.com/blog/2017/05/hail-scalable-genomics-analysis-with-spark/
Data Science environments
• Notebook environments (Jupyter, Zeppelin)
• Great for storytelling
• Pain points:
  • Collaboration
  • Multi-tenancy
  • Security
• New solutions are emerging…
Tristan Zajonc, https://blog.cloudera.com/blog/2017/05/getting-started-with-cloudera-data-science-workbench/
We have some gotchas too…
Be mindful of…
• Educating your team
• Security
  • Authentication
  • Authorization
  • Encryption
  • Auditing, lineage
• Workflow management