I’m @stevenbeeckman - a digital dj!
mixcloud.com/gehorschade.kollektiv
Vienna
Poland
Estonia
Germany
UK
France
Spain
Italy
Greece
Pre-apply now at startupbus.com
Follow @TheStartupBus
Defence 101 (bis)
An army needs a very strong HR and logistics machine
Belgian government budget cuts usually hit the defence budget first
Need for integrated management
Calculating the cost of a training exercise took 4 people 4 weeks
to go bug 5 application owners
for data hidden in
• relational databases
• Excel sheets
• Business Objects reports
• Access databases
• (not so) shared drives
some logistics guy deployed in Afghanistan
I can’t access the shared drive, I wish I had my data locally!
Stone Age
I’m tired of these Excel files and Access databases saying
something contradictory.
Gimme the damn truth!
Requirements
1. Centralize data
2. But protect sensitive data (HR, medical privacy, …)
3. Make the data available offline
4. Nodes should be able to regain current state after loss of communication for 5 days
2009
First XML-based prototypes
XML-based prototypes
• Able to extract at most 40 tables from the logistics application in one night
• Slow
• Problems with identical rows
New team & new approach
New approach
Systems engineering: holistic view of the problem
Take into account the protection of sensitive data
Make it more stable than the prototype
Explicitly not real-time
Check out NASA’s course: http://www.saylor.org/sse101/
Conceptually
• lots of data sources with data owners
• 1 central data “warehouse”
• lots of nodes downloading the data they have access rights to
Extraction Engine (EE)
Based on open-source software:
Linux
MySQL
Talend (Eclipse based ETL workflow tool)
What does the EE do every night?
• Detect the metadata (store it in XML format)
• Take a full dump of each data source in CSV format
• Calculate the delta (deleted rows and inserted rows, in CSV format)
• Create two zip files:
  • One full copy
  • One delta for this day
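A minimal sketch of the delta step, assuming each nightly dump is a CSV file with a header row. It is written in Python purely for illustration (the EE itself is built on Talend), and the function names and sample data are hypothetical. Rows are counted as a multiset, so identical rows, which troubled the XML prototypes, are handled one copy at a time.

```python
import csv
import io
from collections import Counter


def read_rows(csv_text):
    """Parse CSV text into (header, multiset of data rows)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return header, Counter(tuple(row) for row in reader)


def compute_delta(yesterday_csv, today_csv):
    """Return (deleted, inserted) rows between two full dumps.

    A changed row shows up as one deletion plus one insertion.
    Counter subtraction keeps only positive counts, so duplicate
    rows are compared copy by copy.
    """
    _, old = read_rows(yesterday_csv)
    _, new = read_rows(today_csv)
    deleted = sorted((old - new).elements())
    inserted = sorted((new - old).elements())
    return deleted, inserted


# Hypothetical sample data: one quantity changes, one duplicate row disappears.
yesterday = "id,item,qty\n1,boots,10\n2,tents,4\n2,tents,4\n3,radios,7\n"
today = "id,item,qty\n1,boots,12\n2,tents,4\n3,radios,7\n"

deleted, inserted = compute_delta(yesterday, today)
print("deleted:", deleted)
print("inserted:", inserted)
```

Holding both dumps in memory would not scale to a billion rows a night; a real implementation would more likely sort the files and compare them in a streaming pass.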
File server
• Stores the zip files available for the nodes
• Full copy only for the current day (but we have a history for a month)
• Delta zip files for 14 days
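The retention rules above suggest how a node can catch up after losing communication. The sketch below is an assumption about that logic, not the actual loader: it supposes the delta for day D turns the state of day D-1 into the state of day D, and the function name `plan_sync` is hypothetical.

```python
from datetime import date, timedelta

DELTA_RETENTION_DAYS = 14  # from the slide: delta zip files are kept for 14 days


def plan_sync(last_sync, today):
    """Decide what a node should download.

    last_sync is the date of the last dump the node has loaded (None if
    it has never synced). Returns ("deltas", [dates]) when the missing
    days are still covered by the retained deltas, else ("full", [today]).
    """
    if last_sync is None:
        return "full", [today]
    missed = (today - last_sync).days
    if missed <= 0:
        return "deltas", []  # already up to date
    if missed > DELTA_RETENTION_DAYS:
        return "full", [today]  # too far behind: fall back to the full copy
    return "deltas", [last_sync + timedelta(days=i) for i in range(1, missed + 1)]


# Hypothetical dates: a node that was offline for 5 days, and one for 20 days.
today = date(2014, 7, 15)
kind_5, days_5 = plan_sync(today - timedelta(days=5), today)
kind_20, days_20 = plan_sync(today - timedelta(days=20), today)
print(kind_5, len(days_5))
print(kind_20, days_20)
```

Under these assumptions, a 14-day delta window comfortably covers the requirement that nodes regain the current state after 5 days without communication.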
Access control
• Data providers decide for themselves whether their data is
  • “public” within the organisation
  • “restricted” to a set of nodes
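The public/restricted rule can be sketched as a simple check. This is only an illustration of the rule as stated on the slide; the class, the function names, and the node and source identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    visibility: str = "public"            # "public" or "restricted"
    allowed_nodes: frozenset = frozenset()  # only used when restricted


def can_download(source, node_id):
    """Return True if the node may download this data source."""
    if source.visibility == "public":
        return True
    if source.visibility == "restricted":
        return node_id in source.allowed_nodes
    raise ValueError(f"unknown visibility: {source.visibility!r}")


def visible_sources(sources, node_id):
    """Names of all sources the given node has access rights to."""
    return [s.name for s in sources if can_download(s, node_id)]


# Hypothetical sources: logistics data is public, HR data is restricted.
sources = [
    DataSource("logistics_stock"),
    DataSource("hr_personnel", "restricted", frozenset({"node-hr-01"})),
]

print(visible_sources(sources, "node-log-07"))
print(visible_sources(sources, "node-hr-01"))
```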
The nodes
Custom XAMPP package for local report development, or JBoss for bigger nodes with validated reports
Custom loader contacting Access Control and filling the MySQL database
Custom “Local Reporting Framework” (XML + XSLT)
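A rough sketch of what the loader's delta step could look like, assuming a delta consists of deleted rows and inserted rows for one table. It uses Python's built-in `sqlite3` as a stand-in for the node's MySQL database so that it runs anywhere; the function name, table, and data are hypothetical.

```python
import sqlite3


def apply_delta(conn, table, columns, deleted, inserted):
    """Apply one day's delta to a local table in a single transaction.

    Each deleted row removes exactly one matching copy, so tables with
    identical rows stay consistent. Table and column names are
    interpolated into the SQL, so they must come from trusted metadata.
    Note: "col = ?" never matches NULL values; a real loader would need
    NULL-safe comparisons.
    """
    where = " AND ".join(f"{c} = ?" for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    with conn:  # commits on success, rolls back on error
        for row in deleted:
            conn.execute(
                f"DELETE FROM {table} WHERE rowid = "
                f"(SELECT rowid FROM {table} WHERE {where} LIMIT 1)",
                row,
            )
        conn.executemany(
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
            inserted,
        )


# Hypothetical local table holding yesterday's state, with one duplicate row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (id TEXT, item TEXT, qty TEXT)")
conn.executemany(
    "INSERT INTO stock VALUES (?, ?, ?)",
    [("1", "boots", "10"), ("2", "tents", "4"), ("2", "tents", "4")],
)

apply_delta(
    conn,
    "stock",
    ["id", "item", "qty"],
    deleted=[("1", "boots", "10"), ("2", "tents", "4")],
    inserted=[("1", "boots", "12")],
)
rows = sorted(conn.execute("SELECT id, item, qty FROM stock").fetchall())
print(rows)
```

The `rowid` subquery is SQLite-specific; on MySQL the same effect is usually obtained with `DELETE ... LIMIT 1`.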
2014
[Chart: Growth (values shown: 4090, 1000)]
Some statistics
• 400 users (nodes)
• > 1 billion rows processed each night
• ~ 75 gigabytes of data processed each night
• making the EE work requires > 2000 tables
What used to take my team 4 weeks now takes us one click on a button!
A major responsible for military training & exercises
Questions?
@stevenbeeckman #csvconf
Hackers, hipsters & hustlers should pre-apply at
www.startupbus.com
Image credits
http://www.photographersgallery.com/photo.asp?id=2411
Diagonal full of silos
http://www.pragmaticdevops.com/2014/04/management/hacking-management/devops-as-a-team-or-a-responsibility/
Two silos