a tool for a job

Post on 11-Jul-2015

Category:

Technology

© 2014 EXASOL AG

A Tool For A Job

Thomas Bestfleisch, Solution Engineer, EXASOL


@EXADude Loves His Lawnmower ...

... because it cuts his grass well


But ...

... he struggles quite a bit cutting his hedge


And ...

... it isn't good at making apple sauce


And don't even think about ...

... using it to cut hair


So ... how does this apply to Big Data?

You can run Analytical queries

On Multipurpose databases

But they stop running well

As data volumes increase

You could run them on Hadoop

but you'll wish you hadn't

Maybe you need a Tool for This Job


“Pimp My Database”

Add to a multipurpose database

Add to Hadoop

Build a specialist tool from scratch


Multi-Purpose Databases

Architecture dates back to the 1980s

Row-based

(usually) run on a single machine

Heavily reliant on Disk-based processing

Why this architecture?

Memory was expensive

Works well with a wide range of data problems

BUT: doesn't work well with big analytical queries


Enhancing a Multipurpose Database

ADD PARALLEL

Run the database on a cluster of machines

ADD COLUMNAR

Replace row-based with columnar

OR offer an additional column store

ADD IN-MEMORY


Multipurpose Database plus Parallel

Better to “scale out” than “scale up”.

Get many people to dig rather than using a bigger shovel.

There is a physical limit to the size of shovel one person can effectively use.
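The "many people digging" idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: threads stand in for the machines of a cluster, and the names `partition`, `partial_sum`, and `scale_out_sum` are hypothetical, not part of any database product.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(values, workers):
    """Split values into `workers` slices, dealing them out round-robin."""
    return [values[i::workers] for i in range(workers)]


def partial_sum(chunk):
    """Work done by one 'digger': aggregate only its own slice."""
    return sum(chunk)


def scale_out_sum(values, workers=4):
    """Sum values by combining partial results from several workers.

    Threads are an assumption standing in for cluster nodes.
    """
    chunks = partition(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # Final step: combine the small partial results into one answer.
    return sum(partials)


total = scale_out_sum(list(range(1, 101)))
```

Each worker only ever sees its own slice, so adding workers shrinks the slice per worker instead of requiring one ever-larger machine.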


Multipurpose Database plus Column Storage

Better compression

Faster query performance
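A minimal sketch of why a column layout helps with both points. The sample table and the helper `run_length_encode` are hypothetical, and run-length encoding is just one simple compression scheme chosen for illustration, not a claim about what any particular product uses.

```python
from itertools import groupby

# Hypothetical sample table, stored row by row: (order_id, country, amount)
rows = [
    (1, "DE", 10.0),
    (2, "DE", 12.5),
    (3, "DE", 7.0),
    (4, "UK", 3.0),
    (5, "UK", 9.5),
]

# Columnar layout: one list per column instead of one tuple per row.
columns = {
    "order_id": [r[0] for r in rows],
    "country": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Similar values sit next to each other in a column, so they compress well.
encoded_country = run_length_encode(columns["country"])

# An aggregate over one column reads only that column, not whole rows.
total_amount = sum(columns["amount"])
```

Here five country values shrink to two (value, count) pairs, and the sum over `amount` never touches `order_id` or `country`.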


Multipurpose Database plus In-Memory


A Multipurpose Database


Examples of Enhanced Multipurpose Databases

Netezza = Postgres + Parallel

Greenplum & ParAccel = Postgres + Parallel + Columnar

Redshift = ParAccel + Cloud

Aster Data = Postgres + Map/Reduce

InfiniDB = MySQL + Parallel

Oracle 12c in-memory = Oracle + Columnar + In-memory


My Opinion of Enhanced Multipurpose Databases

• Those add-ons are very clever, but not as good as purpose-built

• Often make it worse for the original purpose


Hadoop

Hadoop was invented to index the Internet on a cluster of machines.

Map/Reduce – distributed processing

HDFS – distributed file system

Analytical Queries are NOT like indexing the internet
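To make the contrast concrete, here is a toy, single-process sketch of the map / shuffle / reduce pattern applied to the indexing job it was designed for. It does not use the Hadoop API; the sample `documents` and the function names are hypothetical, and a real cluster would run each phase across many machines.

```python
from collections import defaultdict

# Hypothetical "web pages" to index.
documents = {
    "page1": "big data tool",
    "page2": "big job",
}


def map_phase(doc_id, text):
    """Emit one (word, doc_id) pair per word in a document."""
    return [(word, doc_id) for word in text.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, doc_ids):
    """Turn the grouped values for one word into its index entry."""
    return word, sorted(set(doc_ids))


def build_index(docs):
    """Run map, shuffle, and reduce in sequence to build an inverted index."""
    pairs = [p for doc_id, text in docs.items() for p in map_phase(doc_id, text)]
    return dict(reduce_phase(word, ids) for word, ids in shuffle(pairs).items())


index = build_index(documents)
```

This is a batch job that reads every document once and writes a complete result, which is a different workload from an interactive analytical query that needs a quick answer from a few columns.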


Potential “Fixes” for Hadoop

Forget Map/Reduce

Add another Execution Engine

e.g. Tez or Spark

OR

Forget HDFS

Add a columnar, in-memory file store

Tachyon in-memory file system

Columnar file formats like Parquet, RCFile


My Opinion of a Partly-Fixed Hadoop solution

The legacy bits don’t really fit with the new stuff


A complete replacement

Let’s ignore Map/Reduce AND HDFS for analytical queries.

But without Map/Reduce and HDFS, is it still Hadoop?

Or have we started from scratch and made something new …


Adapting Hadoop for Analytical Queries is not an option

All new SQL-on-Hadoop projects are starting from scratch

New columnar in-memory file formats / systems

New low latency execution frameworks

This is EXASOL over a decade ago

Catch us if you can!


A Tool For A Job

Use the appropriate tool for each job

Multipurpose databases are great for transactional processing

Hadoop is amazing with unstructured data

EXASolution is breathtaking on analytical queries

Why choose one when you can have them all?


Questions?

More details and a FREE community version of our database available at

www.exasol.com

Email:

thomas.bestfleisch@exasol.com

Twitter : @EXASOLAG, @EXADude
