real time business intelligence with cassandra, kafka and hadoop - a real story... (alexandra...
TRANSCRIPT
![Page 1: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/1.jpg)
Dominique Rondé (@talk2nerd)
Alexandra Klimova (@aklimova)
Real Time Business Intelligence with Cassandra, Kafka and Hadoop
A real story @ Allianz Deutschland AG
![Page 2: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/2.jpg)
© Copyright Allianz
Dominique Rondé, Big Data Pilot
Dipl. Wirt.-Inf. (FH)
128479 hrs with Java
40831 hrs with Big Data
14047 hrs Certified Datastax Cassandra Solution Architect
Twitter: @Talk2Nerd
Alexandra Klimova, Big Data Pilotesse
M.Sc. Informatik
75895 hrs with Big Data
40831 hrs with Hadoop
14047 hrs Certified Datastax Cassandra Solution Architect
Twitter: @Aklimova
![Page 3: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/3.jpg)
Agenda
We don't have an agenda.
We have some checklists!
3. Mai 2023 3
![Page 4: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/4.jpg)
Security Instructions
![Page 5: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/5.jpg)
Checklist
Before Engine Start
Define the destination
![Page 6: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/6.jpg)
Real Time Reporting
• Sold items for the current day
• Open tickets during the day
• Response time on consumer requests
• Sold items grouped by type
• Current errors
![Page 7: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/7.jpg)
Fraud Protection
• Prevent "fake accounts"
• Identify "data grabbers"
• Detect fraud patterns
![Page 8: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/8.jpg)
Helping decision makers to understand the market
• Risk Specialists
• Product Designers
• Marketing Experts
![Page 9: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/9.jpg)
Our destination
TTD: Reduce the Time-To-Data
![Page 10: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/10.jpg)
Definition of TTD
Time to Data (TTD) is the time required until a requester has received the data he or she needs to do his or her job.
Time to
• find the source of the required data
• get the needed aggregation
• clean up the data
• write the statistical scripts
• execute and refine these scripts
• get a visualized result
![Page 11: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/11.jpg)
Checklist
Before Taxi
Check if we know all we need
![Page 12: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/12.jpg)
Define functional requirements
• Decoupled from all other development work: changes in analytics should not require additional work in other applications
• Allows fast deployments: learn from the data and bring improvements into production quickly
• Highly available: no event should get lost after it was fired
• Very accurate: make sure that every event is processed
• Horizontally scalable: start small and grow with the data
![Page 13: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/13.jpg)
Define legal requirements
• Data Privacy
• Data Security
• Data Protection
![Page 14: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/14.jpg)
Checklist
Before Take Off
Do the first steps
![Page 15: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/15.jpg)
Picking measuring points
• Implement servlet filters to stay informed about HTTP headers, e.g. error code, referrer
• Implement interceptors for the O/R mapper to store the history of entities
• Instrument the web UI to send events about user interactions, e.g. changes between pages
• Instrument the Java code to send events with additional data at selected points, e.g. when a document is created
![Page 16: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/16.jpg)
Create some transfer objects
Each transfer object holds at least the
• current sessionId
• timestamp when the event occurred
• unique identifier of the event
• version identifier
In some cases
• current authenticated user
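A transfer object with the fields listed on this slide could look roughly like the sketch below. The class name, field names and the factory method are assumptions made for illustration; the talk does not show the actual classes.

```java
import java.io.Serializable;
import java.time.Instant;
import java.util.Optional;
import java.util.UUID;

/** Hypothetical event transfer object; names are illustrative, not taken from the talk. */
final class TrackingEvent implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String sessionId;         // current session
    private final Instant timestamp;        // when the event occurred
    private final UUID eventId;             // unique identifier of the event
    private final int version;              // version identifier of the event format
    private final String authenticatedUser; // null when no user is authenticated

    TrackingEvent(String sessionId, Instant timestamp, UUID eventId,
                  int version, String authenticatedUser) {
        if (sessionId == null || timestamp == null || eventId == null) {
            throw new IllegalArgumentException("sessionId, timestamp and eventId are mandatory");
        }
        this.sessionId = sessionId;
        this.timestamp = timestamp;
        this.eventId = eventId;
        this.version = version;
        this.authenticatedUser = authenticatedUser;
    }

    /** Convenience factory: stamps the event with the current time and a random id. */
    static TrackingEvent now(String sessionId, int version, String authenticatedUser) {
        return new TrackingEvent(sessionId, Instant.now(), UUID.randomUUID(),
                version, authenticatedUser);
    }

    String getSessionId() { return sessionId; }
    Instant getTimestamp() { return timestamp; }
    UUID getEventId() { return eventId; }
    int getVersion() { return version; }

    /** The user is optional because only some events carry it. */
    Optional<String> getAuthenticatedUser() { return Optional.ofNullable(authenticatedUser); }
}
```

Keeping the object immutable and serializable is a design choice of this sketch; it makes the event safe to hand over to a message queue or a stream processor.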
![Page 17: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/17.jpg)
Find an architecture
[Architecture diagram; recoverable labels: Web Application, Reports, Dashboards, R-Scripts]
![Page 18: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/18.jpg)
Design your first CF
• Design the conceptual model
• Specify the access patterns
• Choose a logical model
• Configure the physical model
• Write a CQL script
![Page 19: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/19.jpg)
Checklist
During Take-Off
Run everything up
![Page 20: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/20.jpg)
But mention the difference
Start small
Add nodes
Grow up
![Page 21: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/21.jpg)
Checklist
During Climb Out
Fill your speed-layer
![Page 22: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/22.jpg)
Monitor the Instruments
![Page 23: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/23.jpg)
Consume
DataStream<String> messageStream = env.addSource(
    new FlinkKafkaConsumer09<>(parameterTool.getRequired("topicName"), new SimpleStringSchema(), properties));
Map
DataStream<Tuple3<String, Date, Double>> clickMessageStream = messageStream.map(new ClickEventMapper());
Aggregate
DataStream<Tuple3<Date, Double, String>> aggregatedClickMessageStream = clickMessageStream
    .map(new KeyStreamMapper())
    .keyBy("f1")
    .timeWindow(Time.minutes(2))
    .apply(new KeyWindowFunktion());
Store
// Sink the aggregated stream so that the tuple order matches the INSERT columns
CassandraSink.addSink(aggregatedClickMessageStream)
    .setQuery("INSERT INTO itemssale_by_product (eventtime, price, product) values (?, ?, ?);")
    .setClusterBuilder(new ClusterBuilder() {
        @Override
        public Cluster buildCluster(Cluster.Builder builder) {
            return builder.addContactPoint("csn-node1.development.allianz.de").build();
        }
    })
    .build();
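To make the "Aggregate" step more tangible, here is a plain-Java sketch of what a 2-minute tumbling window per key computes. It is not Flink code and does not reproduce the talk's KeyWindowFunktion, which is not shown; summing prices, the SaleEvent class and the epoch-aligned windows are assumptions for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration of a 2-minute tumbling window; this is not Flink code. */
final class TumblingWindowSketch {

    /** Window size corresponding to Time.minutes(2) on the slide. */
    static final long WINDOW_MILLIS = 2 * 60 * 1000L;

    /** Hypothetical event shape: product, event time in epoch milliseconds, price. */
    static final class SaleEvent {
        final String product;
        final long timestampMillis;
        final double price;

        SaleEvent(String product, long timestampMillis, double price) {
            this.product = product;
            this.timestampMillis = timestampMillis;
            this.price = price;
        }
    }

    /** Start of the epoch-aligned window that contains the given timestamp. */
    static long windowStart(long timestampMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, WINDOW_MILLIS);
    }

    /** Sums prices per (product, window); keys have the form "product@windowStartMillis". */
    static Map<String, Double> sumPerWindow(List<SaleEvent> events) {
        Map<String, Double> totals = new TreeMap<>();
        for (SaleEvent e : events) {
            String key = e.product + "@" + windowStart(e.timestampMillis);
            totals.merge(key, e.price, Double::sum);
        }
        return totals;
    }
}
```

Two events for the same product that fall into the same 2-minute window end up in one total, while an event exactly on the window boundary opens the next window.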
![Page 24: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/24.jpg)
Write aggregated data
Use the Cassandra connector that ships with Apache Flink since v1.1.0
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-cassandra_2.11</artifactId>
    <version>1.1.1</version>
</dependency>
![Page 25: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/25.jpg)
Write aggregated data
@Table(keyspace = "allianz", name = "itemssale_by_product")
public class MyCustomSalesEvent implements Serializable {
    private static final long serialVersionUID = 1L;

    @Column(name = "product")
    private String product;
    @Column(name = "eventdate")
    private Date eventdate;
    @Column(name = "price")
    private double price;

    // Getters and setters
}
![Page 26: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/26.jpg)
Write aggregated data
DataStream<MyCustomSalesEvent> clickMessageStream = messageStream.map(new ClickEventMapper());
CassandraSink.addSink(clickMessageStream)
    .setClusterBuilder(new ClusterBuilder() {
        @Override
        public Cluster buildCluster(Cluster.Builder builder) {
            return builder.addContactPoint("csn-node1.development.allianz.de").build();
        }
    })
    .build();
![Page 27: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/27.jpg)
Checklist
At 10,000 Feet
Make it safe and fancy
![Page 28: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/28.jpg)
Privacy
[Architecture diagram; recoverable labels: Web Application, Reports, Dashboards, R-Scripts]
![Page 29: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/29.jpg)
Single gateway to the data
• Ad hoc queries
• Proof of thesis
• Quick lookups
• Periodic reports
• Web-based dashboard
• 3rd-party reporting
• Expert systems
![Page 30: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/30.jpg)
Encryption
[Cluster diagram: DC 1 with Node 1, Node 3, Node 5; DC 2 with Node 0, Node 4, Node 2]
![Page 31: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/31.jpg)
Encryption – Just easy to enable
server_encryption_options:
    internode_encryption: all
    keystore: nasmount/conf/keystore.node0
    keystore_password: changeme
    truststore: nasmount/conf/truststore.node0
    truststore_password: changeme
    require_client_auth: true
Possible values for internode_encryption:
• all: Cassandra encrypts all internode traffic.
• none: Cassandra does not encrypt internode traffic.
• dc: Cassandra encrypts the traffic between the data centers.
• rack: Cassandra encrypts the traffic between the racks.
![Page 32: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/32.jpg)
Encryption – With DSE
CREATE TABLE zzz …
    WITH compression_parameters:sstable_compression = 'Encryptor'
    AND compression_parameters:cipher_algorithm = 'AES/ECB/PKCS5Padding'
    AND compression_parameters:secret_key_strength = 128;
![Page 33: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/33.jpg)
Hard to find a visualization solution
• Zeppelin
    OK as a tool for developers or data scientists
    Not suitable for C-level reports
• MicroStrategy
    Supports only Cassandra 2.x
    Needs write permissions for the column family (?)
• Tableau
    Accesses Cassandra via Spark (?)
![Page 34: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/34.jpg)
Hard to find a visualization solution
• D3.js
    Great for visualization, with stunning features
    Needs an AngularJS developer to create a new report
• R
    Provides simple visualization
    Needs knowledge of R
![Page 35: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/35.jpg)
Limit read / write access
CREATE ROLE flink;
CREATE ROLE productsales;
CREATE ROLE riskanalyst;
GRANT SELECT ON allianz.solditems TO productsales;
GRANT SELECT ON allianz.riskdata TO riskanalyst;
GRANT MODIFY ON KEYSPACE allianz TO flink;
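The effect of these grants can be pictured with a small plain-Java model. This is only a sketch of the resulting access matrix, not how Cassandra implements authorization; the class and method names are made up, and the model only covers the simplified rule that a keyspace-level grant also applies to the tables inside that keyspace.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the grants on the slide; not Cassandra's authorizer. */
final class RoleGrantsSketch {
    private final Set<String> grants = new HashSet<>();

    /** Records "permission on resource to role"; resource is "keyspace" or "keyspace.table". */
    void grant(String permission, String resource, String role) {
        grants.add(role + "|" + permission + "|" + resource);
    }

    /** A table is accessible if the grant exists on the table itself or on its keyspace. */
    boolean isAllowed(String role, String permission, String table) {
        int dot = table.indexOf('.');
        String keyspace = dot < 0 ? table : table.substring(0, dot);
        return grants.contains(role + "|" + permission + "|" + table)
                || grants.contains(role + "|" + permission + "|" + keyspace);
    }

    /** The three grants from the slide. */
    static RoleGrantsSketch fromSlide() {
        RoleGrantsSketch g = new RoleGrantsSketch();
        g.grant("SELECT", "allianz.solditems", "productsales");
        g.grant("SELECT", "allianz.riskdata", "riskanalyst");
        g.grant("MODIFY", "allianz", "flink");
        return g;
    }
}
```

In this model the stream processor can write everywhere in the keyspace but cannot read, and each analyst role can read exactly one table.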
![Page 36: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/36.jpg)
Remove outdated events
The maximum period for storing some detailed information is limited by law.
We have to ensure that we meet this requirement.
TTL in Cassandra does this job well.
INSERT INTO proposal (id, date, product, price) VALUES ('p-4711', '2016-09-09', 'product-1', 50.00);
UPDATE proposal USING TTL 86400 SET firstname = 'Joe' WHERE id = 'p-4711';
UPDATE proposal USING TTL 86400 SET lastname = 'Doe' WHERE id = 'p-4711';
UPDATE proposal USING TTL 172800 SET city = 'Berlin' WHERE id = 'p-4711';
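The TTL values in the statements are given in seconds: 86400 is one day and 172800 is two days, and columns written without a TTL do not expire. The plain-Java sketch below models which columns of the proposal would still be readable at a given time. It is an illustration only, not driver code, and it assumes for simplicity that all columns were written at the same instant.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of per-column TTL expiry; not Cassandra or driver code. */
final class ColumnTtlSketch {

    /** TTLs from the slide; null means the column was written without a TTL. */
    static Map<String, Duration> proposalTtls() {
        Map<String, Duration> ttls = new LinkedHashMap<>();
        ttls.put("price", null);
        ttls.put("firstname", Duration.ofSeconds(86400));
        ttls.put("lastname", Duration.ofSeconds(86400));
        ttls.put("city", Duration.ofSeconds(172800));
        return ttls;
    }

    /** Returns the columns whose TTL has not yet elapsed at the given time. */
    static List<String> visibleColumns(Map<String, Duration> ttls, Instant writtenAt, Instant now) {
        List<String> visible = new ArrayList<>();
        for (Map.Entry<String, Duration> entry : ttls.entrySet()) {
            Duration ttl = entry.getValue();
            if (ttl == null || now.isBefore(writtenAt.plus(ttl))) {
                visible.add(entry.getKey());
            }
        }
        return visible;
    }
}
```

With these values the personal name fields disappear after one day and the city after two, while the non-personal price stays available for reporting.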
![Page 37: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/37.jpg)
Checklist
At cruising altitude
Work with it
![Page 38: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/38.jpg)
Circle of data
• Meet the experts
• Extract and enrich data
• Aggregate data
• Analyse the data
• Visualize
• Test hypothesis
• Discuss actions
![Page 39: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/39.jpg)
Recalculate the Speed-Layer
[Architecture diagram; recoverable label: Web Application]
![Page 40: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/40.jpg)
Bring the Data to R
# Load RJDBC
library(RJDBC)
# Load the Cassandra JDBC driver
cassdrv <- JDBC("org.apache.cassandra.cql.jdbc.CassandraDriver", list.files("/opt/cassandra/lib/", pattern = "jar$", full.names = TRUE))
# Connect to the Cassandra node and keyspace
casscon <- dbConnect(cassdrv, "jdbc:cassandra://localhost:9160/allianz")
![Page 41: Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Story... (Alexandra Klimova, Dominique Rond, Allianz Deutschland AG) | C* Summit 2016](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f75ed1a28ab10258b6283/html5/thumbnails/41.jpg)
Bring the Data to R
# Query time series data
res <- dbGetQuery(casscon, "select * from solditems limit 10")
# Transpose
tres <- t(res[2:10])
# Plot
boxplot(tres, names = res$KEY, col = topo.colors(length(res$KEY)))
title("BoxPlot of 10 Sold Items Price History")