Add A Billion Row Data Warehouse To Your App
![Page 1: Add A Billion Row Data Warehouse To Your App](https://reader036.vdocuments.site/reader036/viewer/2022081604/5681631d550346895dd39771/html5/thumbnails/1.jpg)
Add A Billion Row Data Warehouse To Your App
With Redshift, SQL and duct tape
James Crisp, Tech Principal @ GetUp
![Page 2: Add A Billion Row Data Warehouse To Your App](https://reader036.vdocuments.site/reader036/viewer/2022081604/5681631d550346895dd39771/html5/thumbnails/2.jpg)
Context
GetUp! is an independent movement to build a progressive Australia and bring participation back into our democracy.
Lots of online campaigns, field actions, social media. Supported by small donations. 600K+ members. Budget, Medicare, uni fees, Barrier Reef, forests etc.
![Page 3: Add A Billion Row Data Warehouse To Your App](https://reader036.vdocuments.site/reader036/viewer/2022081604/5681631d550346895dd39771/html5/thumbnails/3.jpg)
Big Rails App
• CMS + petitions, emailing MPs, donations etc.
• Email blasting & segmenting
• Back office & mini-CRM
• 2 × [3 app, 2 worker, 1 DB servers], Au and Sg
![Page 4: Add A Billion Row Data Warehouse To Your App](https://reader036.vdocuments.site/reader036/viewer/2022081604/5681631d550346895dd39771/html5/thumbnails/4.jpg)
Data Warehouse… why??
• Reporting & exports
• Experimental data science
• Stop locking up transactional DB!!
• More data sources (logs, CRM, ..) => customer
• Different schema & faster queries
![Page 5: Add A Billion Row Data Warehouse To Your App](https://reader036.vdocuments.site/reader036/viewer/2022081604/5681631d550346895dd39771/html5/thumbnails/5.jpg)
Options
• Read-only replica of transactional DB
• Mongo / Cassandra / ..
• Hadoop, Pig, Hive
• Elasticsearch
• BIG SQL, e.g. Redshift
![Page 6: Add A Billion Row Data Warehouse To Your App](https://reader036.vdocuments.site/reader036/viewer/2022081604/5681631d550346895dd39771/html5/thumbnails/6.jpg)
Why BIG SQL?
• Team skills: tech & data scientists
• Easy integration from SQL DB
• Good hosted options
• Fast performance, column based
• Sets and aggregations
• Can do JSON for less structured data
![Page 7: Add A Billion Row Data Warehouse To Your App](https://reader036.vdocuments.site/reader036/viewer/2022081604/5681631d550346895dd39771/html5/thumbnails/7.jpg)
Why Redshift?
• Fully hosted & managed multi-node
• Fast & column based, semi-compressed
• Relatively cheap and easy to try
• Good import options
• Massively expandable
• (Security & backup options)
![Page 8: Add A Billion Row Data Warehouse To Your App](https://reader036.vdocuments.site/reader036/viewer/2022081604/5681631d550346895dd39771/html5/thumbnails/8.jpg)
What is Redshift… really?
• Heavily modified fork of PostgreSQL 8
• Specialised data storage & query engine
• Can use normal ODBC/JDBC/Postgres clients to connect
![Page 9: Add A Billion Row Data Warehouse To Your App](https://reader036.vdocuments.site/reader036/viewer/2022081604/5681631d550346895dd39771/html5/thumbnails/9.jpg)
![Page 10: Add A Billion Row Data Warehouse To Your App](https://reader036.vdocuments.site/reader036/viewer/2022081604/5681631d550346895dd39771/html5/thumbnails/10.jpg)
![Page 11: Add A Billion Row Data Warehouse To Your App](https://reader036.vdocuments.site/reader036/viewer/2022081604/5681631d550346895dd39771/html5/thumbnails/11.jpg)
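// Requires: using System.Data; using System.Data.Odbc;
string connString = "Driver={PostgreSQL Unicode};" +
    String.Format("Server={0};Database={1};" +
        "UID={2};PWD={3};Port={4};SSL=true;Sslmode=Require",
        server, DBName, masterUsername, masterUserPassword, port);

// Open an ODBC connection to the cluster
OdbcConnection conn = new OdbcConnection(connString);
conn.Open();

// Run the query and load the result set into a DataTable
DataSet ds = new DataSet();
OdbcDataAdapter da = new OdbcDataAdapter(sql, conn);
da.Fill(ds);
DataTable dt = ds.Tables[0];
foreach (DataRow row in dt.Rows)
{
    // Do something useful
}
conn.Close();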
![Page 12: Add A Billion Row Data Warehouse To Your App](https://reader036.vdocuments.site/reader036/viewer/2022081604/5681631d550346895dd39771/html5/thumbnails/12.jpg)
# ODBC connection string: every key=value pair is separated by a semicolon
$DBConnectionString = "Driver={PostgreSQL UNICODE};Server=$MyServer;Port=$MyPort;Database=$MyDB;Uid=$MyUid;Pwd=$MyPass;"

# Open the connection
$DBConn = New-Object System.Data.Odbc.OdbcConnection
$DBConn.ConnectionString = $DBConnectionString
$DBConn.Open()

# Run a query, then close the connection
$DBCmd = $DBConn.CreateCommand()
$DBCmd.CommandText = "SELECT * FROM mytable;"
$DBCmd.ExecuteReader()
$DBConn.Close()
![Page 13: Add A Billion Row Data Warehouse To Your App](https://reader036.vdocuments.site/reader036/viewer/2022081604/5681631d550346895dd39771/html5/thumbnails/13.jpg)
psql.exe -h $DBSERVER -U $DBUSER -d $DBName -f script.sql
![Page 14: Add A Billion Row Data Warehouse To Your App](https://reader036.vdocuments.site/reader036/viewer/2022081604/5681631d550346895dd39771/html5/thumbnails/14.jpg)
How much does it cost?
• Min cluster size 2, leader is free
• 2 × 2TB HDD, 15GB RAM: Syd $2.50/h, US $1.70/h (reserved 1yr Syd $13.2K, US $8.8K)
• Up to 2.56TB flash or 16TB HDD and 244GB RAM per node. Cluster up to 1.6 PB.
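To put the hourly and reserved prices side by side, here is a small worked calculation in Python. It is only a sketch: it assumes the quoted hourly rates cover the whole 2-node cluster and that the cluster runs all year, neither of which the slide states explicitly. The `PRICES` table and the `compare` helper are illustrative names, not part of the talk.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years

# Prices as quoted on the slide; assumed to be for the whole 2-node cluster.
PRICES = {
    "Sydney": {"on_demand_per_hour": 2.50, "reserved_1yr": 13_200},
    "US": {"on_demand_per_hour": 1.70, "reserved_1yr": 8_800},
}


def compare(on_demand_per_hour, reserved_1yr):
    """Return (yearly on-demand cost, reserved saving as a fraction, break-even days)."""
    on_demand_year = on_demand_per_hour * HOURS_PER_YEAR
    saving = 1 - reserved_1yr / on_demand_year
    # Days of continuous on-demand use that cost as much as the reservation.
    break_even_days = reserved_1yr / on_demand_per_hour / 24
    return on_demand_year, saving, break_even_days


for region, price in PRICES.items():
    year, saving, days = compare(**price)
    print(f"{region}: on-demand ${year:,.0f}/yr, "
          f"reserved saves {saving:.0%}, break-even after {days:.0f} days")
```

Under those assumptions, running on demand all year costs about $21.9K in Sydney and about $14.9K in the US, so the 1-year reservation is roughly 40% cheaper and pays for itself after about seven months of continuous use.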
![Page 15: Add A Billion Row Data Warehouse To Your App](https://reader036.vdocuments.site/reader036/viewer/2022081604/5681631d550346895dd39771/html5/thumbnails/15.jpg)
Data Sources so far…
• Transactional DB
• Application request logs
![Page 16: Add A Billion Row Data Warehouse To Your App](https://reader036.vdocuments.site/reader036/viewer/2022081604/5681631d550346895dd39771/html5/thumbnails/16.jpg)
Hooking up DB Data
MySQL (DB Server) → Map & Dump CSVs → S3 → LOAD CSVs → Redshift
DB Server → Fire Drop/Create tables, Load data → Redshift
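The talk's own scripts are not reproduced in this transcript, so the following Python sketch only illustrates the shape of the flow on this slide: dump rows to CSV, then build the drop/create/load statements to fire at Redshift. The table name, column types, bucket path and IAM role ARN are all hypothetical, and the helpers `rows_to_csv` and `reload_statements` are invented for this example. Uploading the file to S3 and executing the SQL are left out because they need an AWS client and a Postgres driver.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialise rows to CSV text; the csv module writes None as an empty field."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


def reload_statements(table, columns, s3_path, iam_role):
    """Build the statements that rebuild one warehouse table from a CSV in S3.

    Names are interpolated directly, so they must come from trusted config.
    """
    column_sql = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return [
        f"DROP TABLE IF EXISTS {table};",
        f"CREATE TABLE {table} ({column_sql});",
        f"COPY {table} FROM '{s3_path}' IAM_ROLE '{iam_role}' CSV;",
    ]


# Hypothetical example data and names, for illustration only.
csv_text = rows_to_csv([(1, "Ada", "2014-08-01"), (2, 'Bob "B"', None)])
statements = reload_statements(
    "members",
    [("id", "BIGINT"), ("name", "VARCHAR(255)"), ("joined_on", "DATE")],
    "s3://example-bucket/warehouse/members.csv",
    "arn:aws:iam::123456789012:role/ExampleRedshiftLoad",
)
print(csv_text)
print("\n".join(statements))
```

Dropping and recreating each table on every run keeps the load simple and idempotent, at the cost of reloading all rows each time; an incremental load would need extra bookkeeping that the slide does not describe.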
![Page 17: Add A Billion Row Data Warehouse To Your App](https://reader036.vdocuments.site/reader036/viewer/2022081604/5681631d550346895dd39771/html5/thumbnails/17.jpg)
Hooking up Request Logs
App Servers → Upload JSON logs → S3 → Map & Load JSON → Redshift
App Servers → Fire data load → Redshift
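As a rough illustration of the "Map & Load JSON" step, the Python sketch below writes one JSON object per log line, builds a JSONPaths document that maps JSON keys to table columns in order, and generates the Redshift `COPY` statement that uses it. The log fields, table name, S3 paths and IAM role ARN are assumptions made for the example; the actual log format used in the talk is not shown in this transcript.

```python
import json


def log_line(entry):
    """Serialise one request log entry as a single-line JSON object."""
    return json.dumps(entry, separators=(",", ":"), sort_keys=True)


def jsonpaths(fields):
    """Build a JSONPaths document; the order of paths must match the column order."""
    return json.dumps({"jsonpaths": [f"$.{name}" for name in fields]})


def copy_json_statement(table, s3_prefix, jsonpaths_s3_path, iam_role):
    """Build the COPY statement that loads every JSON log file under a prefix."""
    return (f"COPY {table} FROM '{s3_prefix}' IAM_ROLE '{iam_role}' "
            f"JSON '{jsonpaths_s3_path}';")


# Hypothetical log entry and names, for illustration only.
line = log_line({"path": "/petitions/1", "status": 200, "user_agent": "Mozilla/5.0"})
mapping = jsonpaths(["path", "status", "user_agent"])
statement = copy_json_statement(
    "request_logs",
    "s3://example-bucket/logs/2014-08-01/",
    "s3://example-bucket/jsonpaths/request_logs.json",
    "arn:aws:iam::123456789012:role/ExampleRedshiftLoad",
)
print(line)
print(mapping)
print(statement)
```

An explicit JSONPaths file lets the table columns differ in name and order from the JSON keys; when the keys already match the column names, `JSON 'auto'` in the `COPY` statement is a simpler alternative.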
![Page 18: Add A Billion Row Data Warehouse To Your App](https://reader036.vdocuments.site/reader036/viewer/2022081604/5681631d550346895dd39771/html5/thumbnails/18.jpg)
Demos
• Scripts & SQL for hooking up
• AWS Redshift console
• Connect & query
![Page 19: Add A Billion Row Data Warehouse To Your App](https://reader036.vdocuments.site/reader036/viewer/2022081604/5681631d550346895dd39771/html5/thumbnails/19.jpg)
What have we used it for?
• Faster data science / reporting without locking up transactional DB
• Combining sharded tables
• Request logs (browser agent, params, ..)
• Marking/tracking segments + debugging info
![Page 20: Add A Billion Row Data Warehouse To Your App](https://reader036.vdocuments.site/reader036/viewer/2022081604/5681631d550346895dd39771/html5/thumbnails/20.jpg)
Other uses
• Data exploration with tools like Tableau
• BI tools
• Denormalised data
• Dump in more data for single view of user (FB info, CRM, etc.)
![Page 21: Add A Billion Row Data Warehouse To Your App](https://reader036.vdocuments.site/reader036/viewer/2022081604/5681631d550346895dd39771/html5/thumbnails/21.jpg)
Conclusion
• Easy to set up and use (for Win/.NET too)
• Super fast
• Reasonable price
• Met our needs well so far
We are hiring atm!