![Page 1: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/1.jpg)
Lambda Architecture and Open Source Tools for Real-time Big Data
● Concepts & Techniques: “Thinking with Lambda”
● Case studies in Practice
Trieu Nguyen - http://nguyentantrieu.info or @tantrieuf31
Principal Engineer at eClick Data Analytics team, FPT Online
All contents and thoughts in these slides are my subjective ideas, compiled from communities
![Page 2: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/2.jpg)
Just a little introduction
● 2008: Java developer; built a Social Trading Network for a small startup (Yopco)
● 2011: software engineer at FPT Online on the Banbe Project, RESTful API for the VnExpress Mobile App
● 2012: joined Greengar Studios for 6 months, scaling backend APIs for mobile games (iOS, Android)
● 2013: back at FPT Online, R&D on Big Data & Analytics, developing the new core Analytics Platform (on the JVM)
![Page 3: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/3.jpg)
Contents for this talk
● The lessons from history
● Problems in Practice
● What is the Lambda Architecture?
● Why Lambda Architecture for real-time big data?
● Open Source Technology Stack
● Lambda in Practice (Mobile Data and Web Data)
● Lessons I have learned
● Questions & Answers
![Page 4: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/4.jpg)
History?
Is the best way to predict the future to look at the past and the present?
![Page 5: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/5.jpg)
Big data is a buzzword for old problems
![Page 6: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/6.jpg)
Explaining Big Data
http://www.youtube.com/watch?v=7D1CQ_LOizA
![Page 7: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/7.jpg)
Learning ?
![Page 8: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/8.jpg)
Working ?
![Page 9: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/9.jpg)
![Page 10: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/10.jpg)
Big Data + Old History
http://www.youtube.com/watch?v=tp4y-_VoXdA
![Page 11: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/11.jpg)
This is the most valuable thing!
This is Big DATA
![Page 12: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/12.jpg)
![Page 13: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/13.jpg)
“We can't solve problems by using the same kind of thinking we used when we created them.” - Albert Einstein
Think more with Lambda and Reactive
![Page 14: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/14.jpg)
![Page 15: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/15.jpg)
![Page 16: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/16.jpg)
Where Big Data can be used
![Page 17: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/17.jpg)
![Page 18: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/18.jpg)
BBC Horizon 2013 The Age of Big Data
http://www.youtube.com/watch?v=RE0ITQ7XQjM
![Page 19: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/19.jpg)
![Page 20: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/20.jpg)
![Page 21: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/21.jpg)
![Page 22: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/22.jpg)
![Page 23: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/23.jpg)
Google’s mission is to organize
the world’s information and make it
universally accessible and useful.
![Page 24: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/24.jpg)
![Page 25: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/25.jpg)
Organize the world’s information?
![Page 26: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/26.jpg)
![Page 27: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/27.jpg)
How did Google scale their search engine?
How does Hadoop really work?
![Page 28: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/28.jpg)
![Page 29: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/29.jpg)
http://stackoverflow.com/questions/6087834/how-scalable-is-mapreduce-in-the-original-functional-languages
![Page 30: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/30.jpg)
Trends of Now and the Future
● MapReduce Programming
● Reactive Programming
● Functional Programming
● Streaming Computation
=> All are just special cases of Lambda
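To make the "special cases of Lambda" point concrete, here is a minimal single-process sketch of MapReduce written as plain functions and lambdas. It is not Hadoop code: the phase names (`map_phase`, `shuffle_phase`, `reduce_phase`) and the sample log lines are assumptions made up for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_phase(records, mapper):
    """Apply the mapper to every record and flatten the (key, value) pairs."""
    return [pair for record in records for pair in mapper(record)]


def shuffle_phase(pairs):
    """Group values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer, initial):
    """Fold each key's values into one result with the reducer function."""
    return {key: reduce(reducer, values, initial) for key, values in groups.items()}


# Hypothetical sample input: one tracking log line per record.
lines = ["click impression click", "impression click"]

word_counts = reduce_phase(
    shuffle_phase(map_phase(lines, lambda line: [(word, 1) for word in line.split()])),
    lambda acc, n: acc + n,  # the "reduce" lambda: sum the ones
    0,
)
print(word_counts)
```

The whole job is described by two lambdas (the mapper and the reducer); everything else is plumbing that a framework can distribute across machines.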
![Page 31: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/31.jpg)
![Page 32: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/32.jpg)
So what is the λ (Lambda) Architecture ?
![Page 33: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/33.jpg)
![Page 34: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/34.jpg)
![Page 35: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/35.jpg)
The Lambda Architecture:
● applies the λ (Lambda) philosophy to designing big data systems
● rests on the equation “query = function(all data)”, which is the basis of all data systems
● was proposed by Nathan Marz (http://nathanmarz.com/), formerly a software engineer at Twitter, in his book “Big Data”
● is based on three main design principles:
○ human fault-tolerance – the system must not be susceptible to data loss or data corruption, because at scale such damage could be irreparable. (BUGS?)
○ data immutability – store data in its rawest form, immutable and in perpetuity. (INSERT / SELECT / DELETE but no UPDATE!)
○ recomputation – with the two principles above, it is always possible to (re)compute results by running a function on the raw data.
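The three principles can be sketched in a few lines. This is only an illustration under assumptions: `Event`, `MasterDataset`, `query` and both counting functions are hypothetical names, and an in-memory tuple stands in for a real distributed store.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)  # frozen: an event cannot be modified after creation
class Event:
    user_id: str
    action: str
    timestamp: int


class MasterDataset:
    """Append-only store: events are inserted and read, never updated."""

    def __init__(self) -> None:
        self._events: Tuple[Event, ...] = ()

    def append(self, event: Event) -> None:
        self._events = self._events + (event,)

    def all_data(self) -> Tuple[Event, ...]:
        return self._events


def query(function: Callable[[Tuple[Event, ...]], Dict[str, int]],
          dataset: MasterDataset) -> Dict[str, int]:
    """query = function(all data)"""
    return function(dataset.all_data())


def count_all_actions(events):
    """A buggy first attempt: it counts every action, not only clicks."""
    counts: Dict[str, int] = {}
    for event in events:
        counts[event.user_id] = counts.get(event.user_id, 0) + 1
    return counts


def clicks_per_user(events):
    """The corrected function: it counts clicks only."""
    counts: Dict[str, int] = {}
    for event in events:
        if event.action == "click":
            counts[event.user_id] = counts.get(event.user_id, 0) + 1
    return counts


dataset = MasterDataset()
dataset.append(Event("u1", "click", 1))
dataset.append(Event("u1", "view", 2))
dataset.append(Event("u2", "click", 3))

buggy_result = query(count_all_actions, dataset)
fixed_result = query(clicks_per_user, dataset)  # recomputed from the same raw data
```

Because the raw events were never changed, the human mistake in `count_all_actions` is repaired simply by running the fixed function over the same data again.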
![Page 36: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/36.jpg)
Lambda In Practice
2 case studies from my experience
![Page 37: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/37.jpg)
Case Study 1: Mobile Data
Monitor API Backend + System KPI
![Page 38: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/38.jpg)
Problem: Inside “mobile data”, what is the most valuable piece of information?
![Page 39: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/39.jpg)
Backend System for mobile app
I applied “Lambda” here
![Page 40: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/40.jpg)
Web vs Mobile App
● Web: Visitors, Visits, Pageviews, Events
● Mobile App: Users, Sessions, Events
![Page 41: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/41.jpg)
Metrics: Cause and Effect
● Screen Size => App Design, UI/UX, Usability
● App version => Deployment, Marketing
● Connectivity => Code, User Experience
● Location => Marketing, User Behaviour
● OS => Marketing, Cost, Development
● Memory => User Experience
● Feature Session => How to engage app users
![Page 42: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/42.jpg)
The data and the size, not too big for a small startup!
Where is the lambda?
I used Groovy + GPars (Groovy Parallel Systems) + MongoDB for fast parallel computation (actor model) on statistical data.
http://gpars.codehaus.org/
The GPars framework offers Java developers intuitive and safe ways to handle Java or Groovy tasks concurrently. It supports:
● Dataflow concurrency
● Actor programming model
● CSP
● Agent - a thread-safe reference to mutable state
● Concurrent collection processing
● Composable asynchronous functions
● Fork/Join
● STM (Software Transactional Memory)
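As a rough illustration of the "concurrent collection processing" idea from the list above, here is a sketch that uses Python's standard `concurrent.futures` module. It is not GPars and not the actor model, only a stand-in for the same pattern; the metric names and the sample session durations are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical sample data: session durations in seconds, grouped by app version.
sessions_by_app_version = {
    "1.0": [30, 45, 60],
    "1.1": [20, 40],
    "2.0": [90],
}


def summarize(item):
    """Pure function: turns one (version, durations) pair into summary statistics."""
    version, durations = item
    return version, {"sessions": len(durations), "avg_seconds": mean(durations)}


def parallel_summary(groups, workers=4):
    """Summarize every group concurrently; no state is shared between tasks."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(summarize, groups.items()))


summary = parallel_summary(sessions_by_app_version)
print(summary)
```

Each task receives its own input and returns a new value, so the groups can be processed in any order and on any number of workers.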
![Page 43: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/43.jpg)
Mobile Apps => Backend APIs => Statistics => Find the Trends & Insights?
![Page 44: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/44.jpg)
Reactive Data Analytics for Mobile Apps
It means real-time recommendation by:
➔ context (location, time)
➔ user profile (preferences, level, ...)
![Page 45: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/45.jpg)
Big Data on Small Devices: Data Science goes Mobile
http://strataconf.com/strata2013/public/schedule/detail/27605
![Page 46: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/46.jpg)
Case Study 2: Web Data
● Real-time Data Analytics
● Monitoring Stream Data (Reactive)
http://eclick.vn
![Page 47: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/47.jpg)
At eClick we have:
● 30~40 GB of logs in the stream
● 10~20 GB of bandwidth
just for tracking user actions (click, impression, ...) in ONE day!
At eClick we must check campaigns in near-real-time (seconds)!
At eClick we have many types of logs (video, web, mobile, system logs, ad-campaign, articles, ...)
![Page 48: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/48.jpg)
![Page 49: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/49.jpg)
![Page 51: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/51.jpg)
[Diagram: the open-source lambda architecture at eClick. Labels shown: Internet, Netty Http Server, Kafka, Storm, Redis, Akka Workers, TCP Connection, Hadoop Tools, KPI Report]
![Page 52: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/52.jpg)
The big-data technology stack
● Netty (http://netty.io/): an asynchronous, event-driven network framework that makes HTTP systems easier to scale, originally from JBoss (http://www.jboss.org)
● Kafka (http://kafka.apache.org/): publish-subscribe messaging rethought as a distributed commit log, open sourced by LinkedIn
● Storm (http://storm-project.net/): a framework for distributed real-time computation, open sourced by Twitter
● Redis (http://redis.io/): an advanced in-memory key-value NoSQL database; all the fast statistical computations happen here
● Groovy: the scripting layer on the JVM, used for ad-hoc queries on Redis
● Hadoop ecosystem: HDFS, Hive, HBase for batch processing
● RxJava (https://github.com/Netflix/RxJava): a library for composing asynchronous and event-based programs
● Hystrix (https://github.com/Netflix/Hystrix): latency and fault tolerance for distributed systems
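How a stack like this can answer a query may be easier to see in a toy model. The sketch below is an assumption-laden simplification, not the eClick implementation: a Python list stands in for the raw log, a `Counter` stands in for Redis, `batch_view` stands in for a Hadoop job, and all names are hypothetical.

```python
from collections import Counter


class SpeedLayer:
    """Incremental counters for recent events (a Counter stands in for Redis)."""

    def __init__(self):
        self.realtime_view = Counter()

    def on_event(self, campaign_id, action):
        self.realtime_view[(campaign_id, action)] += 1

    def reset(self):
        """Drop counts that a finished batch run has already absorbed."""
        self.realtime_view.clear()


def batch_view(master_log):
    """Recompute all counts from the full raw log (stand-in for a batch job)."""
    return Counter((event["campaign"], event["action"]) for event in master_log)


def serve(batch, speed, campaign_id, action):
    """Answer a query by merging the batch view with the real-time view."""
    key = (campaign_id, action)
    return batch[key] + speed.realtime_view[key]


master_log = []  # raw events, append-only
speed = SpeedLayer()


def ingest(event):
    """Every event goes to both the raw log and the speed layer."""
    master_log.append(event)
    speed.on_event(event["campaign"], event["action"])


for action in ["impression", "impression", "click"]:
    ingest({"campaign": "c1", "action": action})

# A batch run absorbs everything logged so far; the speed layer starts over.
batch = batch_view(master_log)
speed.reset()

# An event that arrives after the batch run is visible only through the speed layer.
ingest({"campaign": "c1", "action": "click"})

clicks = serve(batch, speed, "c1", "click")
impressions = serve(batch, speed, "c1", "impression")
```

The batch view is always rebuilt from the raw log, so an error in the speed layer lasts only until the next batch run.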
![Page 53: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/53.jpg)
My new ideas for the future
Connecting the active functor pattern + reactive programming + stream computation + in-memory computing to make:
● real-time data analytics easier
● better recommendation systems
● more profit from big data
More Information:
● http://activefunctor.blogspot.com/ (a special case of Lambda that actively searches for the best connections to form an optimal topology) - from ideas developed during my internship at DRD with my advisor.
● Can a function be persistent (stored as data), distributed in a cluster (cloud), and reactive to the right data (best value in network)?
● http://www.reactivemanifesto.org/ (reactive pattern)
![Page 54: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/54.jpg)
Lessons
What I have learned from Lambda and the Big Data World
![Page 55: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/55.jpg)
![Page 56: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/56.jpg)
What I have learned
● Study lambda and read some books
● Ask questions => analytics => Profit & Value
● Collect any data you can, and learn what is inside!
● Implement it! Just the right tools for the right jobs.
● Turn your data into things everyone can "look & feel"
![Page 57: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/57.jpg)
read papers
![Page 58: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/58.jpg)
Study the “lambda”
I studied Haskell in 2007 with Dr. Peter Gammie (http://peteg.org/) during my internship at DRD (a non-profit organization).
● Imperative programs are vulnerable to data races whenever threads share mutable variables.
● Purely functional code has no data races because it has no mutable variables to share.
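The same idea can be shown outside Haskell. The following sketch uses Python only as an illustration and assumes made-up names (`PageView`, `total_millis`): immutable tuples are handed to a pure function on several threads, and no lock is needed because nothing is ever modified.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import NamedTuple, Tuple


class PageView(NamedTuple):
    """Immutable record: its fields cannot be reassigned after creation."""
    url: str
    millis: int


def total_millis(views: Tuple[PageView, ...]) -> int:
    """Pure function: reads its input and returns a new value, nothing else."""
    return sum(view.millis for view in views)


# Hypothetical sample data: 1000 page views with durations 0..999 ms.
views = tuple(PageView(f"/page/{i}", i) for i in range(1000))
chunks = [views[i:i + 250] for i in range(0, len(views), 250)]

# Each thread works on its own immutable slice, so there is nothing to race on.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(total_millis, chunks))

parallel_total = sum(partial_sums)
sequential_total = total_millis(views)
```

Splitting the data differently or using more workers gives the same total, which is the property that makes this style easy to scale.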
![Page 59: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/59.jpg)
Reading some books
![Page 60: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/60.jpg)
![Page 61: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/61.jpg)
![Page 62: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/62.jpg)
Improve your business knowledge!
=> read the Behavioral Economics books
http://www.goodreads.com/shelf/show/behavioral-economics
![Page 63: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/63.jpg)
Collect the data ?
![Page 64: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/64.jpg)
Using your imagination matters more than just the knowledge you have
![Page 66: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/66.jpg)
“Logic will get you from A to Z;
imagination will get you
everywhere.” - Albert Einstein
Use your imagination with data analytics, not just logic
Learn Data Visualization
![Page 67: Lambda Architecture and open source technology stack for real time big data](https://reader034.vdocuments.site/reader034/viewer/2022042515/54c672484a7959f67d8b45ee/html5/thumbnails/67.jpg)
Questions & Answers
The link to these slides is here:
● http://nguyentantrieu.info/blog/lambda-architecture-and-open-source-tools-for-real-time-big-data/
More useful resources:
● http://nguyentantrieu.info/blog
● http://www.mc2ads.com