A Tool For A Job


Upload: exasol-ag

Post on 11-Jul-2015

Page 1: A Tool For A Job

© 2014 EXASOL AG

A Tool For A Job

Thomas Bestfleisch, Solution Engineer, EXASOL

Page 2: A Tool For A Job

@EXADude Loves His Lawnmower ...

... because it cuts his grass well

Page 3: A Tool For A Job

But ...

... he struggles quite a bit cutting his hedge

Page 4: A Tool For A Job

And ...

... it isn’t good at making apple sauce

Page 5: A Tool For A Job

And don’t even think about ...

... using it to cut hair

Page 6: A Tool For A Job

So ... how does this apply to Big Data?

You can run analytical queries on multipurpose databases, but they stop running well as data volumes increase.

You could run them on Hadoop, but you’ll wish you hadn’t.

Maybe you need a tool for this job.

Page 7: A Tool For A Job

“Pimp My Database”

Add to a multipurpose database

Add to Hadoop

Build a specialist tool from scratch

Page 8: A Tool For A Job

Multi-Purpose Databases

Architecture dates back to the 1980s

Row-based

(usually) run on a single machine

Heavily reliant on Disk-based processing

Why this architecture?

Memory was expensive

Works well with a wide range of data problems

BUT: doesn’t work well with big analytical queries

Page 9: A Tool For A Job

Enhancing a Multipurpose Database

ADD PARALLEL

Run the database on a cluster of machines

ADD COLUMNAR

Replace row-based with columnar

OR offer an additional column store

ADD IN-MEMORY

Page 10: A Tool For A Job

Multipurpose Database plus Parallel

Better to “scale out” than “scale up”.

Get many people to dig rather than using a bigger shovel.

There is a physical limit to the size of shovel one person can effectively use.
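The "many people digging" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: threads stand in for the machines of a cluster, and the names `partition`, `partial_sum`, `scale_out_sum` and the `sales` data are hypothetical, not taken from any product mentioned in this deck.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, workers):
    """Split rows into `workers` roughly equal slices (round-robin)."""
    return [rows[i::workers] for i in range(workers)]


def partial_sum(chunk):
    """Work done independently by one worker: sum its own slice."""
    return sum(chunk)


def scale_out_sum(rows, workers=4):
    """Sum `rows` by combining per-worker partial sums."""
    chunks = partition(rows, workers)
    # Threads are a stand-in here; in a real cluster each chunk
    # would live on a separate machine.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # A final, cheap step merges the partial results.
    return sum(partials)


sales = list(range(1, 1001))  # hypothetical sales amounts
total = scale_out_sum(sales, workers=4)
```

Each worker only ever sees its own slice, so adding workers shrinks the slice per worker instead of requiring a bigger single machine.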

Page 11: A Tool For A Job

Multipurpose Database plus Column Storage

Better compression

Faster query performance
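A small Python sketch may help show where both benefits come from. It assumes a made-up three-column table and uses run-length encoding as one simple example of a compression scheme; real column stores use a range of encodings, and nothing here reflects a specific product's internals.

```python
from itertools import groupby

# Hypothetical table in row layout: (order_id, country, amount)
rows = [
    (1, "DE", 10.0),
    (2, "DE", 20.0),
    (3, "DE", 5.0),
    (4, "UK", 7.5),
    (5, "UK", 2.5),
]

# Column layout: one list per column instead of one tuple per row.
columns = {
    "order_id": [r[0] for r in rows],
    "country": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def run_length_encode(values):
    """Collapse runs of equal neighbouring values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


# Similar values sit next to each other in a column, so they compress well.
encoded_country = run_length_encode(columns["country"])

# An aggregate over one column touches only that column's values,
# not the order_id or country fields of every row.
total_amount = sum(columns["amount"])
```

In the row layout, the same aggregate would have to walk every tuple and skip over the fields it does not need.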

Page 12: A Tool For A Job

Multipurpose Database plus In-Memory

Page 13: A Tool For A Job

A Multipurpose Database

Page 14: A Tool For A Job

Examples of Enhanced Multipurpose Databases

Netezza = Postgres + Parallel

Greenplum & ParAccel = Postgres + Parallel + Columnar

Redshift = ParAccel + Cloud

Aster Data = Postgres + Map/Reduce

InfiniDB = MySQL + Parallel

Oracle 12c In-Memory = Oracle + Columnar + In-Memory

Page 15: A Tool For A Job

My Opinion of Enhanced Multipurpose Databases

• Those add-ons are very clever, but not as good as purpose-built

• Often make it worse for the original purpose

Page 16: A Tool For A Job

Hadoop

Hadoop was invented to index the Internet on a cluster of machines.

Map/Reduce – distributed processing

HDFS – distributed file system

Analytical queries are NOT like indexing the Internet
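For readers unfamiliar with the Map/Reduce model, here is a minimal single-process sketch in Python using the classic word-count example. It only mimics the three phases (map, shuffle, reduce); it does not use Hadoop's API, and the function names and the `documents` input are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values of each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


documents = ["big data big queries", "data on Hadoop"]  # hypothetical input
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle(pairs))
```

The model suits batch jobs that scan everything once, such as building an index. An interactive analytical query usually needs low latency, which this batch-oriented model was not designed around.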

Page 17: A Tool For A Job

Potential “Fixes” for Hadoop

Forget Map/Reduce

Add another Execution Engine

e.g. Tez or Spark

OR

Forget HDFS

Add a columnar, in-memory file store

Tachyon in-memory file system

Columnar file formats like Parquet and RCFile

Page 18: A Tool For A Job

My Opinion of a Partly-Fixed Hadoop solution

The legacy bits don’t really fit with the new stuff

Page 19: A Tool For A Job

A complete replacement

Let’s ignore Map/Reduce AND HDFS for analytical queries.

But without Map/Reduce and HDFS, is it still Hadoop?

Or have we started from scratch and made something new ...

Page 20: A Tool For A Job

Adapting Hadoop for Analytical Queries is not an option

All new SQL-on-Hadoop projects are starting from scratch

New columnar in-memory file formats / systems

New low latency execution frameworks

This is EXASOL over a decade ago

Catch us if you can!

Page 21: A Tool For A Job

A Tool For A Job

Use the appropriate tool for each job

Multipurpose databases are great for transactional processing

Hadoop is amazing with unstructured data

EXASolution is breathtaking on analytical queries

Why choose one when you can have them all?

Page 22: A Tool For A Job

Questions?

More details and a FREE community version of our database available at

www.exasol.com

Email:

[email protected]

Twitter: @EXASOLAG, @EXADude