Călin Andrei Burloiu - Connecting Hadoop with Couchbase: Engineering for Performance
DESCRIPTION
We needed a bridge between the real-time tier, where we used Couchbase, and the batch tier, built on Hadoop. For lack of something better, we built our own: Couchdoop, an open-source Hadoop connector for Couchbase. Our presentation will discuss best practices for creating a Hadoop connector for a NoSQL database. We will talk about the challenges we encountered while developing Couchdoop and share how we tuned it for performance. Together with Bigstep we worked on performance benchmarks for our technology, which show how much throughput can be squeezed from a Hadoop connector.
TRANSCRIPT
Two-tier Architecture
Real-time Tier (Couchbase)
• Detects user intent
• Gives next best recommendation or deal
Data Bridge (Couchdoop)
Batch Tier (Hadoop)
• Recommends products
User events
Recommendations
Importing Data
{ "user": "Rudy", "action": "view", "product": "Fender Guitar" }
{ "user": "Rudy", "action": "click", "product": "Guitar Amplifier" }
{ "user": "Emma", "action": "buy", "product": "Blue Skirt" }
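The sample events above are the kind of documents the batch tier receives on import. As a rough illustration only, the sketch below parses them and groups them by user in plain Python. It does not use Couchdoop or Hadoop; the names `EVENT_LINES` and `group_events_by_user` are hypothetical and chosen for this example.

```python
import json
from collections import defaultdict

# Sample events copied from the slide, one JSON document per line.
EVENT_LINES = [
    '{ "user": "Rudy", "action": "view", "product": "Fender Guitar" }',
    '{ "user": "Rudy", "action": "click", "product": "Guitar Amplifier" }',
    '{ "user": "Emma", "action": "buy", "product": "Blue Skirt" }',
]


def group_events_by_user(lines):
    """Parse JSON event lines and group (action, product) pairs by user.

    Hypothetical helper: it mimics what a batch job might do with imported
    events, but it is not part of Couchdoop.
    """
    grouped = defaultdict(list)
    for line in lines:
        event = json.loads(line)
        grouped[event["user"]].append((event["action"], event["product"]))
    return dict(grouped)


events_by_user = group_events_by_user(EVENT_LINES)
for user, events in events_by_user.items():
    print(user, events)
```

In a real batch job the grouping would be done by the framework's shuffle phase rather than an in-memory dictionary; the sketch only shows the shape of the data.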
Couchdoop
Machine Learning Recommendations Hadoop
IMPORT
HDFS
{ "user": "Rudy", "recommendations": [ ["Ibanez Acoustic Guitar", 450], ["Guitar Tuner", 120], ["Sound Mixer", 30] ] }
EXPORT
Exporting Data
Couchdoop
Machine Learning Recommendations Hadoop
{ "user": "Rudy", "recommendations": [ ["Ibanez Acoustic Guitar", 450], ["Guitar Tuner", 120], ["Sound Mixer", 30] ] }
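Exporting to a key-value store means turning each recommendation record into a key and a serialized document. The sketch below shows one way that mapping might look in plain Python. The `recs::` key prefix and the `to_key_value` helper are assumptions made for illustration; the slides do not say how Couchdoop derives keys, and no Couchbase client call is made here.

```python
import json

# Recommendation record copied from the slide.
recommendation = {
    "user": "Rudy",
    "recommendations": [
        ["Ibanez Acoustic Guitar", 450],
        ["Guitar Tuner", 120],
        ["Sound Mixer", 30],
    ],
}


def to_key_value(record, key_prefix="recs::"):
    """Return a (key, json_value) pair for one recommendation record.

    The key scheme (prefix + user name) is hypothetical, not taken from
    the talk or from Couchdoop.
    """
    key = key_prefix + record["user"]
    value = json.dumps(record)
    return key, value


key, value = to_key_value(recommendation)
print(key)
print(value)
```

Keying the document by user would let the real-time tier fetch a user's recommendations with a single lookup, which is the access pattern the two-tier architecture slide suggests.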
Update
Updating Data
Couchdoop
Machine Learning Recommendations Hadoop