Ramūnas Balukonis. Research DWH

Posted on 10-May-2015


#BigDataBY


VISIT OUR BLOG: adform.com | TWITTER: adforminsider

Research of technologies for Big Data Analytics

(2013-2014)


Ramūnas Balukonis, Adform

Our impressions growth


Now 2 billion transactions, or 1.4 TB, per day (raw)
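As a rough sanity check, the daily figures above can be turned into an average record size and a sustained ingest rate. The sketch below is a back-of-the-envelope calculation, not Adform's method; it assumes decimal units (1 TB = 10^12 bytes) and load spread evenly over 24 hours, so real peak rates will be higher. The function name `ingest_profile` is hypothetical.

```python
# Back-of-the-envelope ingest figures from the slide:
# 2 billion transactions and 1.4 TB of raw data per day.
# Assumptions: decimal units (1 TB = 10**12 bytes) and a uniform daily load.

SECONDS_PER_DAY = 24 * 60 * 60


def ingest_profile(rows_per_day: float, tb_per_day: float) -> dict:
    """Return average bytes per row, rows per second and MB per second."""
    bytes_per_day = tb_per_day * 10**12
    return {
        "bytes_per_row": bytes_per_day / rows_per_day,
        "rows_per_second": rows_per_day / SECONDS_PER_DAY,
        "mb_per_second": bytes_per_day / SECONDS_PER_DAY / 10**6,
    }


if __name__ == "__main__":
    profile = ingest_profile(rows_per_day=2e9, tb_per_day=1.4)
    for name, value in profile.items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions the numbers work out to about 700 bytes per transaction, roughly 23,000 transactions per second and about 16 MB/s of raw data, before any peak-hour skew.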

In 2012 we started researching technologies to process, load and provide data for analytics

[Chart: Impressions per year, billions of rows, 2001 to 2014]

Where we are now


DWH – our needs for Big Data Analytics


Query results within moments

No downtime window

Short time to market

Near-real-time latency

No backups

Unattended scaling

Minor data loss and data discrepancies are acceptable


How we tested


Testing took up to 3 months for each technology

Test environment: 3 × (24 cores + 96 GB RAM + 800 GB RAID 10)

Loaded 5 TB of data (uncompressed)
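The aggregate resources of the test cluster, and how much compression the 5 TB test set needs to fit on it, follow from simple multiplication. This is a minimal sketch assuming the 800 GB figure is usable (post-RAID 10) capacity per server and ignoring replication, indexes and temporary space; `Node`, `cluster_totals` and `min_compression_to_fit` are hypothetical names.

```python
# Aggregate capacity of the test cluster described on the slide:
# 3 servers, each with 24 cores, 96 GB RAM and 800 GB of RAID 10 storage.
# Assumption: 800 GB is usable (post-RAID) capacity per server; replication,
# indexes and temp space are ignored.

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    cores: int
    ram_gb: int
    disk_gb: int


def cluster_totals(node: Node, count: int) -> Node:
    """Sum the resources of `count` identical nodes."""
    return Node(node.cores * count, node.ram_gb * count, node.disk_gb * count)


def min_compression_to_fit(raw_gb: float, disk_gb: float) -> float:
    """Smallest compression ratio that lets raw data fit on the given disk."""
    return raw_gb / disk_gb


if __name__ == "__main__":
    total = cluster_totals(Node(cores=24, ram_gb=96, disk_gb=800), count=3)
    print(total)  # Node(cores=72, ram_gb=288, disk_gb=2400)
    # 5 TB of uncompressed test data (decimal units: 1 TB = 1000 GB).
    print(f"{min_compression_to_fit(5000, total.disk_gb):.2f}x")  # 2.08x
```

Under these assumptions the cluster has 72 cores, 288 GB RAM and 2.4 TB of disk, so a candidate would have needed a little over 2X compression just to hold the uncompressed 5 TB.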

Candidates for Big Data Analytics


IBM Netezza


Appliance: no commodity hardware

No elastic scale out

Global presence, sales, delivery and support.

HP Vertica


Elastic scale out

Brilliant performance (Load/Select)

No stored procedures

No UI

Price per TB

SAP Sybase IQ


Scaling using shared disk

Similar to MS SQL Server (tools, logic, stored procedures, system views and system stored procedures; documentation similar to Books Online)

Concerns about ease of implementation and use

Price per core

Amazon Redshift


Price: the only vendor we tested that publishes its prices online

Filters badly impact query performance

Cluster resize/scaling

Unstable connection

Calpont InfiniDB


Shared-nothing architecture

MySQL as front end: tools, connectors, procedures, etc.

Community Edition (offers prebuilt solutions) or Enterprise Edition (EE)

Super-fast load

Relatively slow query performance

Slow insert/update/delete

Where we are now


What we learned

The number of suitable technologies drops as data volume (TBs) increases

Adapt technology to your requirements, not vice versa

No silver bullet:

Queries vs row store – 10X

Load speed vs row store – 4X

Compression vs row store – 4X

... and we'll learn much more after we run our first report
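The "vs row store" ratios can be applied to the volumes quoted earlier in the deck to get rough storage estimates. This is a hypothetical sketch: it reads each ratio as "N times better than the row store" (the slides do not state the direction explicitly) and assumes, for the example only, that the row store keeps data at its raw, uncompressed size. `columnar_estimate` is an illustrative name.

```python
# Translate the slide's "vs row store" ratios into rough estimates.
# Hypothetical assumptions (not stated in the slides):
#  - each ratio means "N times better than the row store";
#  - in the example, the row-store size equals the raw, uncompressed size.

RATIOS_VS_ROW_STORE = {"query": 10.0, "load": 4.0, "compression": 4.0}


def columnar_estimate(row_store_value: float, metric: str) -> float:
    """Divide a row-store figure (time or size) by the slide's ratio.

    Query time, load time and on-disk size all shrink by a factor of N
    under the "N times better" reading.
    """
    return row_store_value / RATIOS_VS_ROW_STORE[metric]


if __name__ == "__main__":
    daily_raw_tb = 1.4  # raw volume per day, from the deck
    test_raw_tb = 5.0   # uncompressed test data set, from the deck
    print(f"daily: {columnar_estimate(daily_raw_tb, 'compression'):.2f} TB")
    print(f"test set: {columnar_estimate(test_raw_tb, 'compression'):.2f} TB")
```

Under that assumption, 1.4 TB of raw data per day would take about 0.35 TB on disk, and the 5 TB test set about 1.25 TB.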
