The evolution of the Viadeo BI infrastructure, by François Le Lay
http://fr.viadeo.com/fr/profile/francois.lelay
Techdays 22/11/2012
The evolution of Business Intelligence at Viadeo
Agenda
What is Business Intelligence?
Key Roles
Viadeo Data
Technical Solutions : a short history
What is Business Intelligence?
Data Warehouse & ETL : Plumbing of structured and unstructured data, logic to persist data
Application Stack : Metadata, KPIs, Visual Templates, Security, Information Dissemination, Scheduling
Awareness : Reports, Dashboards
Insights : Forecasting, Predicting, Statistics, Competitor Information, Analysis
Actions : Marketing Actions, Business Strategies, Operations
Feedback
Key Roles : the Business Analyst
BI Dashboards Specification
● Simple (Metrics)
● Complex (Data viz)
Information Access
● BI Dashboards (Scalars)
● Direct (SQL, Datameer)
Analysis
● Followup
● Proactive
Web Product Specification
● Functional (Challenge PO)
● Technical (Enforce data quality)
Key Roles : the Big Data Engineer
Data plumbing
● Real Time
● Batch
Expose to Apps
● REST/Scala/Java APIs
● JDBC/ODBC
Awareness
● Implement Data Visualization
● Enforce data quality
Viadeo data : The Dynamics
Usage
• 45 million members
• Worldwide presence
• China, India, Russia, Mexico, ...
• Mobile App, Web, API
• B2B / B2C
Mining
User Engagement
Viadeo data : Graph
Technical solutions : The Beginnings
MySQL, server name : Peach
Phase 1 : 2006-2008
Internal tool to allow C-Level, Sales, … to access data
Phase 2 : 2008-2010
MySQL, server name : Lakitu
Technical solutions : A better architecture
MySQL
Phase 3 : 2010-2012
Server name : « Unified ODS »
Server name : ODS LiveCluster 1
Server name : ODS LiveCluster 2
Server name : ODS LiveCluster 3
Server name : ODS LiveCluster 5
MySQL
Technical solutions : 2 new internal products
Scala-centric, Play! framework
Cross-channel messaging system
• Email, Mobile, Social
• Flexible content management
• Flexible targeting of recipients
• Content testing strategies : A/B, multivariate
• Event-driven : web app events, mobile events, ad hoc events
• Automation, scheduling, frequency capping
Analytics
• Data visualization : based on JavaScript (D3.js, processing.js, etc.)
• Tabular reports, OLAP navigation
• Pluggable alerts : business activity monitoring
A common requirement : scalability!!!
• Viadeo data is Big
• Processing performance is not an option, it is mandatory
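Two of the messaging features listed above, A/B content testing and frequency capping, can be illustrated with a minimal sketch. This is not Viadeo's implementation (the products described are Scala-centric). The names `assign_variant` and `FrequencyCap`, the hashing scheme and the sliding-window policy are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict, deque


def assign_variant(user_id, test_name, variants=("A", "B")):
    """Deterministically map a user to a variant by hashing (test, user).

    Hypothetical helper: the same user always gets the same variant
    for a given test, without storing any assignment.
    """
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


class FrequencyCap:
    """Allow at most `max_sends` messages per user within a sliding window.

    Hypothetical class: timestamps are plain numbers (seconds) supplied
    by the caller, which keeps the sketch deterministic.
    """

    def __init__(self, max_sends, window_seconds):
        self.max_sends = max_sends
        self.window_seconds = window_seconds
        self._sent = defaultdict(deque)  # user_id -> timestamps of past sends

    def try_send(self, user_id, now):
        history = self._sent[user_id]
        # Drop sends that have fallen out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_sends:
            return False  # cap reached, skip this message
        history.append(now)
        return True


cap = FrequencyCap(max_sends=2, window_seconds=3600)
if cap.try_send("member-42", now=0.0):
    variant = assign_variant("member-42", "welcome-email")
    # Here the message for `variant` would be handed to the email/mobile/social channel.
```

Hashing instead of random assignment keeps a member in the same test group across channels and runs, which matters for a cross-channel system.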
Technical solutions : a new architecture based on CQRS pattern
Technical solutions : a new architecture
• Master dataset :
  • Historical data stored in HBase
  • Provided as a service by architects team
• Datamarts :
  • Built on HDFS using MapReduce jobs
  • MapReduce eased by use of Cascading library and Scala DSL (Scalding)
  • Pushed to in-memory distributed storage
  • Elastic Search, Riak
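The split between the master dataset (write side) and the datamarts (read side) can be sketched in a few lines. Everything here runs in memory: `MasterDataset` stands in for HBase, the map and reduce functions stand in for a MapReduce or Scalding job, and `DatamartStore` stands in for Elastic Search or Riak. The class names, the event shape and the sample events are assumptions for illustration, not the actual Viadeo code.

```python
from collections import defaultdict


class MasterDataset:
    """Write side: append-only log of events (in-memory stand-in for HBase)."""

    def __init__(self):
        self._events = []

    def append(self, member_id, country, action):
        # Events are only ever appended, never updated in place.
        self._events.append((member_id, country, action))

    def scan(self):
        return iter(self._events)


def map_phase(events):
    """Emit one ((country, action), 1) pair per event."""
    for _member_id, country, action in events:
        yield (country, action), 1


def reduce_phase(pairs):
    """Sum the emitted values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def build_datamart(master):
    """Batch job: recompute the whole view from the master dataset."""
    return reduce_phase(map_phase(master.scan()))


class DatamartStore:
    """Read side: precomputed view served to applications."""

    def __init__(self):
        self._view = {}

    def load(self, view):
        self._view = dict(view)

    def count(self, country, action):
        return self._view.get((country, action), 0)


master = MasterDataset()
master.append("m1", "FR", "profile_view")
master.append("m2", "FR", "profile_view")
master.append("m3", "CN", "message_sent")

store = DatamartStore()
store.load(build_datamart(master))
print(store.count("FR", "profile_view"))  # prints 2
```

Queries never touch the master dataset. They read only the precomputed view, so the read store can be scaled or replaced independently of the write path.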
Technical solutions : A better architecture
MySQL
Sqoop
Conclusion
• Many scalable data storage solutions
• Rapid application development frameworks and low-risk programming languages on the JVM
• Custom analytics = what we implement is what we use
• Analytical needs are very well identified
• Blend data stream and batch processing to answer different needs
• Pluggable Data mining R&D
• Analytics for Viadeo members/recruiters/companies : Social Media Monitoring as a Complex Event Processing topic
Questions?