Presentations from the Cloudera Impala meetup on Aug 20 2013


Upload: cloudera-inc

Post on 02-Jul-2015


DESCRIPTION

Presentations from the Cloudera Impala meetup on Aug 20 2013:
- Nong Li on Parquet+Impala and UDF support
- Henry Robinson on performance tuning for Impala

TRANSCRIPT

Page 1: Presentations from the Cloudera Impala meetup on Aug 20 2013

Parquet Update/UDFs in Impala
Nong Li, Software Engineer, Cloudera

Page 2

Agenda

• Parquet
  • File format description
  • Benchmark results in Impala
  • Parquet 2.0
• UDF/UDAs

Page 3

Parquet

Page 4

Page 5

Page 6

Page 7

Page 8

Data Pages

• Values are stored in data pages as a triple: definition level, repetition level, and value.
• These are stored contiguously on disk, so reading a column takes one seek regardless of nesting.
• Data pages are stored with different encodings:
  • Bit packing and run-length encoding (RLE)
  • Dictionary for strings
    • Extended to all types in Parquet 1.1
  • Plain (little-endian encoding) for native types
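As a rough illustration of the run-length idea named above, the sketch below collapses repeated values into (value, run length) pairs and expands them again. It is a simplified stand-in and not Parquet's actual on-disk encoding, which combines RLE with bit packing; the names `Run`, `RleEncode`, and `RleDecode` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One run: `count` consecutive copies of `value`.
struct Run {
  std::int32_t value;
  std::size_t count;
};

// Collapse consecutive duplicates into runs.
std::vector<Run> RleEncode(const std::vector<std::int32_t>& values) {
  std::vector<Run> runs;
  for (std::int32_t v : values) {
    if (!runs.empty() && runs.back().value == v) {
      ++runs.back().count;
    } else {
      runs.push_back(Run{v, 1});
    }
  }
  return runs;
}

// Expand runs back into the original sequence.
std::vector<std::int32_t> RleDecode(const std::vector<Run>& runs) {
  std::vector<std::int32_t> values;
  for (const Run& r : runs) {
    values.insert(values.end(), r.count, r.value);
  }
  return values;
}
```

A sequence such as 1, 1, 1, 0, 0, 1 becomes three runs, which is why this kind of encoding pays off when small values such as level numbers repeat often.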

Page 9

Parquet 2.0

• Additional encodings
  • Group VarInt (for small ints)
  • Improved string storage format
  • Delta encoding (for strings and ints)
• Additional metadata
  • Sorted files
  • Page/column/file statistics
• Expected to further reduce on-disk size and allow for skipping values on the read path.

Page 10

Hardware Setup

• 10 nodes
• 16-core Xeon
• 48 GB RAM
• 12 disks
• CDH4.3
• Impala 1.1

Page 11

TPC-H lineitem table @ 1TB scale factor

[Bar chart: size (GB), axis 0 to 800, for Text, Text w/ Lzo, Seq w/ Snappy, Avro w/ Snappy, RcFile w/ Snappy, Parquet w/ Snappy, Seq w/ Gzip]

Page 12

Query Times on TPC-H lineitem table

[Bar chart: axis 0 to 800; groups: 1 Column, 3 Columns, 5 Columns, 16 (all) Columns, 5 Columns with 3 Clients, Tpch Q1 (7 Columns), Bytes Read Q1 (GB); series: Text, Seq w/ Snappy, Avro w/ Snappy, RcFile w/ Snappy, Parquet w/ Snappy]

Page 13

Query Times on TPCDS Queries

[Bar chart: seconds, axis 0 to 500, for queries Q27, Q34, Q42, Q43, Q46, Q52, Q55, Q59, Q65, Q73, Q79, Q96; series: Text, Seq w/ Snappy, RC w/ Snappy, Parquet w/ Snappy]

Average Times (Geometric Mean)
• Text: 224 seconds
• Seq Snappy: 257 seconds
• RC Snappy: 150 seconds
• Parquet: 61 seconds
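For readers who want to redo the arithmetic behind these averages, here is a minimal sketch of a geometric mean and of the speedup ratio between two averages. The function names are hypothetical, and the per-query timings are not in the transcript, so only the four averages listed on the slide can be plugged in.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean via the mean of logarithms; inputs must be positive and non-empty.
double GeometricMean(const std::vector<double>& xs) {
  double log_sum = 0.0;
  for (double x : xs) log_sum += std::log(x);
  return std::exp(log_sum / static_cast<double>(xs.size()));
}

// How many times faster `candidate_seconds` is than `baseline_seconds`.
double Speedup(double baseline_seconds, double candidate_seconds) {
  return baseline_seconds / candidate_seconds;
}
```

With the slide's numbers, `Speedup(224.0, 61.0)` is about 3.7, so Parquet was roughly 3.7 times faster than text on this geometric-mean basis, and about 2.5 times faster than RC with Snappy (150 / 61).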

Page 14

Agenda

• Parquet
  • File format description
  • Benchmark results in Impala
  • What’s next
• UDF/UDAs (work in progress)

Page 15

Terminology

• UDF: Tuple -> Scalar user-defined function
  • E.g. Substring
• UDA/UDAF: {Tuple} -> Scalar user-defined aggregate function
  • E.g. Min
• UDTF: {Tuple} -> {Tuple} user-defined table function

Page 16

Impala 1.2

• Support Hive UDFs (Java)
  • Existing Hive jars will run without a recompile.
• Add Impala (native) UDFs and UDAs.
  • New interface designed to execute as efficiently as possible for Impala.
  • Similar interface to Postgres UDFs/UDAs
• UDF/UDA registered for the Impala service in the metadata catalog
  • i.e. CREATE FUNCTION/CREATE AGGREGATE

   

Page 17

Example UDF

// This UDF adds two ints and returns an int.
IntVal AddUdf(UdfContext* context,
              const IntVal& arg1,
              const IntVal& arg2) {
  if (arg1.is_null || arg2.is_null) return IntVal::null();
  return IntVal(arg1.val + arg2.val);
}
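To try the slide's function outside Impala, one option is to stub out the types it uses. The sketch below does that: `UdfContext` and `IntVal` here are hypothetical stand-ins written only for this example, not Impala's real UDF header, and they model just the members the slide touches (`is_null`, `val`, `null()`).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the slide's types; not Impala's real UDF header.
struct UdfContext {};

struct IntVal {
  bool is_null;
  std::int32_t val;

  IntVal() : is_null(false), val(0) {}
  explicit IntVal(std::int32_t v) : is_null(false), val(v) {}

  // Returns a value that represents SQL NULL.
  static IntVal null() {
    IntVal v;
    v.is_null = true;
    return v;
  }
};

// Same logic as the slide: NULL in, NULL out; otherwise add.
IntVal AddUdf(UdfContext* context, const IntVal& arg1, const IntVal& arg2) {
  (void)context;  // unused in this sketch
  if (arg1.is_null || arg2.is_null) return IntVal::null();
  return IntVal(arg1.val + arg2.val);
}
```

Carrying the null flag next to the value lets the function propagate SQL NULL without a separate null-handling wrapper.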

Page 18

DDL

The CREATE statement will need to specify the UDF/UDA signature, the location of the binary, and the symbol for the UDF function.

CREATE FUNCTION substring(string, int, int) RETURNS string
LOCATION "hdfs://path" "com.me.Substring"

CREATE FUNCTION log(anytype) RETURNS anytype
LOCATION "hdfs://path2" "Log"

Page 19

UDFs

• Support for variadic args
• Support for polymorphic types

Page 20

UDAs

• UDA must implement the typical state machine:
  • Init()
  • Update()
  • Serialize()
  • Merge()
  • Finalize()
• Data movement handled by Impala

Page 21

UDA Example

// This is a sample of implementing the COUNT aggregate function.
void Init(UdfContext* context, BigIntVal* val) {
  val->is_null = false;
  val->val = 0;
}

void Update(UdfContext* context, const AnyVal& input, BigIntVal* val) {
  if (input.is_null) return;
  ++val->val;
}

void Merge(UdfContext* context, const BigIntVal& src, BigIntVal* dst) {
  dst->val += src.val;
}

BigIntVal Finalize(UdfContext* context, const BigIntVal& val) {
  return val;
}
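The following sketch shows how these callbacks might be driven when rows are split across several nodes: each partition is aggregated separately, and the partial results are then merged. The types and the `CountAcrossPartitions` driver are hypothetical stand-ins for illustration; in Impala the engine performs this orchestration and the data movement, and Serialize() is left out here just as it is on the slide.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the slide's types; not Impala's real headers.
struct UdfContext {};
struct AnyVal {
  bool is_null;
};
struct BigIntVal {
  bool is_null = true;
  std::int64_t val = 0;
};

void Init(UdfContext*, BigIntVal* val) {
  val->is_null = false;
  val->val = 0;
}
void Update(UdfContext*, const AnyVal& input, BigIntVal* val) {
  if (input.is_null) return;  // COUNT skips NULL inputs
  ++val->val;
}
void Merge(UdfContext*, const BigIntVal& src, BigIntVal* dst) {
  dst->val += src.val;
}
BigIntVal Finalize(UdfContext*, const BigIntVal& val) { return val; }

// Simulates a distributed COUNT: aggregate each partition, then merge the partials.
std::int64_t CountAcrossPartitions(
    const std::vector<std::vector<AnyVal>>& partitions) {
  UdfContext ctx;
  BigIntVal total;
  Init(&ctx, &total);
  for (const auto& rows : partitions) {
    BigIntVal partial;
    Init(&ctx, &partial);
    for (const AnyVal& row : rows) Update(&ctx, row, &partial);
    Merge(&ctx, partial, &total);
  }
  return Finalize(&ctx, total).val;
}
```

Because Merge() only adds partial counts, the result does not depend on how rows are distributed across partitions.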

Page 22

Runtime Code-Generation

• Impala uses LLVM to generate code at runtime to run the query.
  • Takes into account constants that are only known after query analysis.
  • Greatly improves CPU efficiency
• Native UDFs/UDAs can benefit from this as well.
  • Instead of providing the UDF/UDA as a shared object, compile it (with Clang) with an additional flag to LLVM IR for Impala
  • IR will be integrated with the query execution.
  • No function call overhead for UDF/UDAs

Page 23

Limitations

• Hive UDAs/UDTFs not supported
• No UDTFs in native interface
• Can’t run out of process
  • Native interface is designed to support this, will be able to run without a recompile
• We’re planning to address this in Impala 1.3

   

Page 24

Thanks!

• We’d love your feedback for UDFs/UDAs
• Questions?

Page 25

Performance Considerations for Cloudera Impala

Henry Robinson [email protected] / @henryr
Impala Meetup 2013-08-20

Page 26

Agenda

● The basics: Performance Checklist
● Review: How does Impala execute queries?
● What makes queries fast (or slow)?
● How can I debug my queries?

Page 27

Impala Performance Checklist

● Verify – run a simple COUNT(*) query on a relatively big table and verify:
  ○ Data locality, block locality, and NO check-summing (“Testing Impala Performance”)
  ○ Optimal IO throughput of HDFS scans (typically ~100 MB/s per disk)
● Stats – BOTH table and column stats, especially for:
  ○ Joining two large tables
  ○ INSERT INTO ... SELECT through Impala
● Join table ordering – will be automatic in the Impala 2.0 wave. Until then:
  ○ Largest table first
  ○ Then most selective to least selective
● Monitor – monitor Impala queries to pinpoint slow queries and drill into potential issues
  ○ CM 4.6 adds query monitoring
  ○ CM 5.0 will have the next big enhancements

Page 28

Part 1: How does Impala execute queries?

Page 29

The basic idea

● Every Impala query runs across a cluster of multiple nodes, with lots of available CPU cores, memory and disk
● Best query speeds usually come when every node in the cluster has something to do
● Impala solves two basic problems:
  ○ Figure out what every node should do (compilation)
  ○ Make them do it really quickly! (execution)

Page 30

Query compilation

● a.k.a. ‘figuring out what every node should do’

● Impala compiles a SQL query into a plan describing what to execute, and where

● A plan is shaped like a tree. Data flows up from the leaves of the tree to the root.

● Each node in the tree is a query operator

● Impala chops this tree up into plan fragments

● Each node gets one or more plan fragments

Page 31

Query execution

● Once started, each query operator can run independently of any other operator

● Every operator can be doing something at the same time

● This is the not-so-secret sauce for all massively parallel query execution engines

Page 32

Part 2: What makes queries fast (or... slow)?

Page 33

What determines performance?

● Data size

● Per-operator execution efficiency

● Available parallelism

● Available concurrency

● Hardware

● Schema design and file format

Page 34

Data size

● More data means more work
● Not just the size of the disk-based data at plan leaves, but the size of internal data flowing into any operator
● How can you help?
  ○ Partition your data
  ○ SELECT with LIMIT in subqueries
  ○ Push predicates down
  ○ Use correct JOIN order
    ■ Gather table statistics
  ○ Use the right file format

Page 35

Table Ordering

● Tables are joined in the order listed in the FROM clause
● Impala uses left-deep trees for nested joins
● “Largest” table should be listed first
  ○ largest = returning most rows before join filtering
  ○ In a star schema, this is often the fact table
● Then list tables in order of most selective join filter to least selective
  ○ Filter the most rows as early as possible
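The ordering rule on this slide can be sketched as a small helper that puts the largest table first and sorts the rest from most to least selective. Everything here is a hypothetical illustration: the `TableInfo` fields are estimates a user would have to supply by hand, and the table names and numbers used with it are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-table estimates supplied by the user.
struct TableInfo {
  std::string name;
  std::int64_t rows_before_join;  // rows returned before join filtering
  double join_keep_fraction;      // fraction of rows surviving the join filter
};

// Largest table first, then the remaining tables from most to least selective.
std::vector<std::string> FromClauseOrder(std::vector<TableInfo> tables) {
  std::vector<std::string> order;
  if (tables.empty()) return order;
  auto largest = std::max_element(
      tables.begin(), tables.end(),
      [](const TableInfo& a, const TableInfo& b) {
        return a.rows_before_join < b.rows_before_join;
      });
  std::iter_swap(tables.begin(), largest);
  // A smaller keep fraction means a more selective join filter.
  std::sort(tables.begin() + 1, tables.end(),
            [](const TableInfo& a, const TableInfo& b) {
              return a.join_keep_fraction < b.join_keep_fraction;
            });
  for (const TableInfo& t : tables) order.push_back(t.name);
  return order;
}
```

For a star schema with one large fact table and two dimension tables, this returns the fact table first, followed by the dimension whose filter removes the most rows.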

Page 36

Join Types

● Two types of join strategy are supported
  ○ Broadcast
  ○ Shuffle/Partitioned
● Broadcast
  ○ Each node receives a full copy of the right table
  ○ Per node memory usage = size of right table
● Shuffle
  ○ Both sides of the join are partitioned
  ○ Matching partitions sent to same node
  ○ Per node memory usage = 1/nodes x size of right table
● Without column statistics, all joins are broadcast
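The two memory formulas on this slide can be written out directly. This is a minimal sketch with hypothetical function names; the shuffle case assumes the right table splits evenly across nodes, which skewed join keys would violate.

```cpp
#include <cassert>
#include <cstdint>

// Broadcast: every node holds a full copy of the right-hand table.
std::int64_t BroadcastBytesPerNode(std::int64_t right_table_bytes) {
  return right_table_bytes;
}

// Shuffle: the right-hand table is split across nodes (assumes an even split).
std::int64_t ShuffleBytesPerNode(std::int64_t right_table_bytes, int nodes) {
  return right_table_bytes / nodes;
}
```

For example, a 20 GiB right table on a 10-node cluster needs 20 GiB per node when broadcast but only about 2 GiB per node when shuffled.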

Page 37

Per-operator execution efficiency

● Impala is fast, and getting faster
● LLVM-based improvements
● More efficient disk scanners
● More modern algorithms from the DB literature
● How can you help?
  ○ Upgrade to the latest version

Page 38

Available parallelism

● Parallelism: number of resources available to use at once

● More hardware means more parallelism

● Impala will take advantage of more cores, disks and memory where possible

● Easiest (but most expensive!) way to improve performance of large class of queries

● You can scale up incrementally

Page 39

Available concurrency

● Concurrency: how well can a query take advantage of available parallelism?
● Impala will mostly take care of this for you
● But some operators naturally don’t parallelise well in certain conditions
● For example: joining two huge tables together.
  ○ The hash-node operators have to wait for one side to be read completely before reading much of the other side
● How you can help:
  ○ Read the profiles, look for obvious bottlenecks, rephrase if possible

Page 40

Hardware

● Designed for modern hardware
  ○ Leverages SSE 4.2 (Intel Nehalem or newer)
  ○ LLVM Compiler Infrastructure
  ○ Runtime Code Generation
  ○ In-memory execution pipelines
● Today’s hardware
  ○ 2 x Xeon E5 6 core CPUs
  ○ 12 x 3 TB HDD
  ○ 128 GB RAM
● How you can help:
  ○ Use the supported platforms, with Cloudera’s packages

Page 41

Schema design

● PARTITION BY is an easy win
● In general, string is slower than fixed-width types (particularly for aggregations etc.)
● File formats are crucial
  ○ Experiment with Parquet for performance
  ○ Avoid text

Page 42

Supported File Formats

● Various HDFS file formats
  ○ Text File (read/write)
  ○ Avro (read)
  ○ SequenceFile (read)
  ○ RCFile (read)
  ○ ParquetFile (read/write)
● Various compression codecs
  ○ Snappy (ParquetFile, RCFile, SequenceFile, Avro)
  ○ LZO (Text)
  ○ Bzip (ParquetFile, RCFile, SequenceFile, Avro)
  ○ Gzip (ParquetFile, RCFile, SequenceFile, Avro)
● HBase also supported

Page 43

Partitioning Considerations

● Single largest performance feature
  ○ Skips unnecessary data
  ○ Requires queries contain partition keys as filters
● Choose a reasonable number of partitions
  ○ Lots of small files becomes an issue
  ○ Metadata overhead on NameNode
  ○ Metadata overhead for Hive Metastore
  ○ Impala caches this, but first load may take long
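To make "skips unnecessary data" concrete, the sketch below models partition pruning: when a query filters on the partition key, only matching partitions are scanned. The `Partition` struct, the date-style key values, and the byte counts used with it are hypothetical; this shows the idea only and is not how Impala's planner is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical catalog entry: one set of files per partition key value.
struct Partition {
  std::string key_value;  // value of the partition column for this partition
  std::size_t bytes;      // size of the files in this partition
};

// Keep only partitions whose key matches the filter; the rest are never read.
std::vector<Partition> Prune(const std::vector<Partition>& partitions,
                             const std::string& wanted_key) {
  std::vector<Partition> kept;
  for (const Partition& p : partitions) {
    if (p.key_value == wanted_key) kept.push_back(p);
  }
  return kept;
}

// Total bytes a scan would have to read for the given partitions.
std::size_t BytesToScan(const std::vector<Partition>& partitions) {
  std::size_t total = 0;
  for (const Partition& p : partitions) total += p.bytes;
  return total;
}
```

Without a filter on the partition key every partition has to be scanned, which is why the slide requires queries to contain partition keys as filters.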

Page 44

Part 3: Debugging queries

Page 45

The Debug Pages

● Every impalad exports a lot of useful information on http://<impalad>:25000 (by default), including:
  ○ Last 25 queries
  ○ Active sessions
  ○ Known tables
  ○ Last 1MB of the log
  ○ System metrics
  ○ Query profiles
● Information-dense - not for the faint of heart!

Page 46

Thanks! Questions?

Try It Out!
● Apache-licensed open source
  ○ Impala 1.1 released 7/24/2013
  ○ Impala 1.0 GA released 4/30/2013
● Questions/comments?
  ○ Download: cloudera.com/impala
  ○ Email: [email protected]
  ○ Join: groups.cloudera.org
  ○ MeetUp: meetup.com/Bay-Area-Impala-Users-Group/