Călin Andrei Burloiu - Connecting Hadoop with Couchbase: Engineering for performance

Upload: huguk

Post on 25-May-2015

Category: Technology


DESCRIPTION

We needed a bridge between the real-time tier, where we used Couchbase, and the batch tier, built on Hadoop. For lack of something better, we built our own: Couchdoop, an open-source Hadoop connector for Couchbase. Our presentation will discuss best practices for creating a Hadoop connector for a NoSQL database. We will talk about the challenges we encountered while developing Couchdoop and share how we tuned it for performance. Together with Bigstep, we ran performance benchmarks on our technology, which show how much throughput can be squeezed from a Hadoop connector.

TRANSCRIPT

Page 1: Călin Andrei Burloiu - Connecting Hadoop with Couchbase: Engineering for performance
Page 11: Two-tier Architecture

Real-time Tier (Couchbase)
• Detects user intent
• Gives next best recommendation or deal

Data Bridge (Couchdoop)

Batch Tier (Hadoop)
• Recommends products

Data crossing the bridge: user events, recommendations

Page 16: Importing Data

{ "user": "Rudy", "action": "view", "product": "Fender Guitar" }
{ "user": "Rudy", "action": "click", "product": "Guitar Amplifier" }
{ "user": "Emma", "action": "buy", "product": "Blue Skirt" }

Diagram labels: Couchdoop, IMPORT, HDFS, Hadoop, Machine Learning, Recommendations
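
The import step on this slide can be illustrated with a small sketch. It is not Couchdoop code (Couchdoop is a Hadoop connector; its API and output format are not shown in the transcript). It only assumes that event documents like the ones above are flattened into one tab-separated line per event, a layout that batch jobs commonly read from HDFS. The function name `events_to_tsv` and the column order are hypothetical.

```python
import json

# Event documents copied from the slide. In the real system they would be
# read from Couchbase; here they are plain strings (an assumption of this sketch).
EVENT_DOCS = [
    '{ "user": "Rudy", "action": "view", "product": "Fender Guitar" }',
    '{ "user": "Rudy", "action": "click", "product": "Guitar Amplifier" }',
    '{ "user": "Emma", "action": "buy", "product": "Blue Skirt" }',
]


def events_to_tsv(docs):
    """Parse JSON event documents and return one tab-separated line per event.

    The column order (user, action, product) is a hypothetical choice.
    """
    lines = []
    for doc in docs:
        event = json.loads(doc)
        lines.append("\t".join((event["user"], event["action"], event["product"])))
    return lines


tsv_lines = events_to_tsv(EVENT_DOCS)
for line in tsv_lines:
    print(line)
```

A line-oriented layout is used here because each line can be processed independently by a batch job; the actual file format written by the import is not stated in the source.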

Page 17: Exporting Data

{ "user": "Rudy", "recommendations": [ ["Ibanez Acoustic Guitar", 450], ["Guitar Tuner", 120], ["Sound Mixer", 30] ] }

Diagram labels: Hadoop, Machine Learning, Recommendations, Couchdoop, EXPORT
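
As a rough illustration of the export step, the sketch below groups batch output rows by user and writes one recommendation document per user into a key-value store. It is a sketch under stated assumptions, not Couchdoop's implementation: a Python dict stands in for the Couchbase bucket, the `recs::<user>` key scheme is hypothetical, and the number next to each product is treated as a score, which the slide does not confirm.

```python
import json

# Hypothetical batch output: (user, product, score) rows, mirroring the
# values in the document shown on the slide.
ROWS = [
    ("Rudy", "Guitar Tuner", 120),
    ("Rudy", "Ibanez Acoustic Guitar", 450),
    ("Rudy", "Sound Mixer", 30),
]


def build_documents(rows):
    """Group rows by user and return {user: JSON document string}."""
    by_user = {}
    for user, product, score in rows:
        by_user.setdefault(user, []).append([product, score])
    docs = {}
    for user, recs in by_user.items():
        # Highest number first, matching the ordering on the slide.
        recs.sort(key=lambda rec: rec[1], reverse=True)
        docs[user] = json.dumps({"user": user, "recommendations": recs})
    return docs


def export(docs, bucket):
    """Write each document into the bucket stand-in; return the number written."""
    for user, doc in docs.items():
        bucket["recs::" + user] = doc  # hypothetical key scheme
    return len(docs)


bucket = {}  # a dict standing in for a Couchbase bucket
written = export(build_documents(ROWS), bucket)
print(written, bucket["recs::Rudy"])
```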

Page 18: Updating Data

{ "user": "Rudy", "recommendations": [ ["Ibanez Acoustic Guitar", 450], ["Guitar Tuner", 120], ["Sound Mixer", 30] ] }

Diagram labels: Hadoop, Machine Learning, Recommendations, Couchdoop, Update
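
The transcript does not say how the update step combines new results with a document that already exists. One possible reading is a read-modify-write merge, sketched below. The merge policy (a new value replaces the old one for the same product, and unseen products are added) is an assumption, the dict again stands in for a Couchbase bucket, and the "Guitar Strap" entry is made-up illustration data.

```python
import json


def merge_recommendations(existing_doc, new_recs):
    """Merge new [product, score] pairs into an existing JSON document.

    Assumed policy: a new score overwrites the old one for the same product.
    """
    doc = json.loads(existing_doc)
    scores = {product: score for product, score in doc["recommendations"]}
    scores.update({product: score for product, score in new_recs})
    doc["recommendations"] = sorted(
        ([product, score] for product, score in scores.items()),
        key=lambda rec: rec[1],
        reverse=True,
    )
    return json.dumps(doc)


# Existing document, as shown on the slide, under a hypothetical key.
bucket = {
    "recs::Rudy": json.dumps({
        "user": "Rudy",
        "recommendations": [
            ["Ibanez Acoustic Guitar", 450],
            ["Guitar Tuner", 120],
            ["Sound Mixer", 30],
        ],
    })
}

# Made-up new batch results: one changed score and one new product.
new_recs = [["Guitar Tuner", 200], ["Guitar Strap", 15]]
bucket["recs::Rudy"] = merge_recommendations(bucket["recs::Rudy"], new_recs)
print(bucket["recs::Rudy"])
```

In a concurrent setting a read-modify-write like this would need protection against lost updates; how Couchdoop handles that is not covered by the surviving slide text.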
