Brendan Haire, Atlassian, presentation at Chief Data & Analytics Officer Forum, Melbourne
TRANSCRIPT
Building a data lake in the sky
DATA LAKE ON AWS
AGILE LAKE DELIVERY
Who am I?
Data experience
Through my career I have built and managed:
• a reporting platform for an Australian university
• a data warehouse and BI solution for a telco in Europe
• a data warehouse and real-time data integration platform for a bank
... and finally, I led the Analytics and Data Integration team at Atlassian for the past year, delivering on our data strategy.
About myself
• Atlassian for over 4 years
• IT for 20 years
• Roles from developer, dev mgr, architect to project mgmt
• Software Engineering background
• Developer at heart
Brendan Haire
Context
Atlassian
• Software company
• Fast growing
• Data driven
• IPO
Data
• 200 TB of data
• ~1000 users per week (~800 reporting, ~200 ad hoc)
• 30k queries per day
Starting point
• Team of 4
• Legacy EDW
• Multiple data silos
• Emerging problem
The Problem
• Scale/Cost
• Data everywhere
• Slow analysis
• Duplication of effort
Data lake on AWS
“A lake in the clouds”
Vision
Principles
A data pipeline and analytic platform that:
• handles large and small data sets
• supports real-time and batch functions
Enabling Analytics
• makes it easy to add raw data for immediate use
• allows value to be progressively added through stages
• supports self-service analysis and integration functions
Scale
Friction
Conceptual
Source Systems
Data Applications
Business Intelligence
1 Data Lake
2 Data Stream
Solution
Good, Bad, Ugly
The Good
• New analytics capability
• Less ETL and moving data
• Performance
• AWS - flexibility
• Scaling - compute vs storage
• Cost - control + predictability
The Bad
• High learning curve
• New tooling
• Data Governance
The Ugly
• ‘Cutting edge’ hurts
Agile lake delivery
“From pond to lake”
by Henrik Kniberg
Minimum Viable Product (MVP)
Weekly Active Usage (WAU)
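As a rough illustration of how a usage measure like WAU could be derived, the sketch below counts distinct users per ISO week from a query log. The log layout (user id plus query date), the sample names, and the function name `weekly_active_users` are assumptions made for this example; the talk does not describe how Atlassian actually computed the metric.

```python
from collections import defaultdict
from datetime import date


def weekly_active_users(events):
    """Count distinct users per ISO week from (user_id, date) pairs."""
    users_by_week = defaultdict(set)
    for user_id, day in events:
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        users_by_week[(iso[0], iso[1])].add(user_id)
    # A user who runs many queries in one week is still counted once.
    return {week: len(users) for week, users in sorted(users_by_week.items())}


# Hypothetical query log: one row per query run on the platform.
query_log = [
    ("ana", date(2016, 3, 7)),   # Monday, ISO week 10
    ("ana", date(2016, 3, 8)),   # same user, same week
    ("ben", date(2016, 3, 9)),
    ("ana", date(2016, 3, 14)),  # Monday, ISO week 11
]

wau = weekly_active_users(query_log)
print(wau)
```

Counting distinct users rather than raw query volume keeps a single heavy user or a scheduled report from inflating the number.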
Enabling Innovation
Hypothesis
• Problem statement
• Vision
• Research
• Talk to people
Test
• ShipIT / Hackathons
• Spikes
• Minimum Viable Product
Feedback
• User Feedback
• Usage
Takeaways
Incremental, Self Service, Raw Data, Usage, Feedback
• Self service is key in reducing friction and enabling scale
• Giving analysts access to raw data is a game changer
• Incremental delivery and feedback drive innovation
• When building a platform, usage is a great proxy for value
Thank you!