User-Defined Table-Generating Functions (UDTF)

By Paul Yang ([email protected])

Post on 07-Nov-2014

Page 1: User-Defined Table Generating Functions

User-Defined Table-Generating Functions (UDTF)

By Paul Yang ([email protected])

Page 2: User-Defined Table Generating Functions

Outline

• UDTF Description and Usage
• Execution Phase
• Compile Phase

Page 3: User-Defined Table Generating Functions

UDF vs UDAF vs UDTF

• User-Defined Functions (UDF)
  – One-to-one mapping
  – concat('foo', 'bar')
• User-Defined Aggregate Functions (UDAF)
  – Many-to-one mapping
  – sum(num_ads)
• User-Defined Table-Generating Functions (UDTF)
  – One-to-many mapping
  – explode([1,2,3])
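
The three mapping shapes can be sketched in plain Java. This is only an illustration of input and output cardinality, not Hive's UDF, UDAF, or UDTF APIs; the class name and method signatures are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MappingShapes {
    // UDF shape: one input row -> one output value.
    static String concat(String a, String b) {
        return a + b;
    }

    // UDAF shape: many input rows -> one output value.
    static long sum(List<Integer> values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // UDTF shape: one input row -> zero or more output rows
    // (here, one output row per array element).
    static List<Integer> explode(List<Integer> array) {
        return new ArrayList<>(array);
    }

    public static void main(String[] args) {
        System.out.println(concat("foo", "bar"));            // prints foobar
        System.out.println(sum(Arrays.asList(3, 4, 5)));     // prints 12
        System.out.println(explode(Arrays.asList(1, 2, 3))); // prints [1, 2, 3]
    }
}
```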

Page 4: User-Defined Table Generating Functions

UDTF Example (Transform)

• explode(Array<?> arg)
  – Converts an array into multiple rows, with one element per row
• Transform-like syntax
  – SELECT udtf(col0, col1, …) AS colAlias FROM srcTable

Page 5: User-Defined Table Generating Functions

UDTF Example (Transform)

• SELECT explode(group_ids) AS group_id FROM src

Table src:

  group_ids
  [1]
  [2, 3]

Output:

  group_id
  1
  2
  3

Page 6: User-Defined Table Generating Functions

UDTF (Lateral View)

• Transform syntax is limited to a single expression
  – SELECT pageid, explode(adid_list)… is not allowed
• Use lateral view
  – Creates a virtual table using the UDTF
• Lateral view syntax
  – …FROM baseTable LATERAL VIEW udtf(col0, col1…) tableAlias AS colAlias0, colAlias1…

Page 7: User-Defined Table Generating Functions

UDTF (Lateral View)

• Example query
  SELECT src.*, myTable.*
  FROM src LATERAL VIEW explode(group_ids) myTable AS group_id

Table src:

  user_id  group_ids
  100      [1]
  101      [2,3]

Page 8: User-Defined Table Generating Functions

UDTF (Lateral View)

Table src:

  user_id  group_ids
  100      [1]
  101      [2,3]

explode(group_ids) myTable AS group_id:

  group_id
  1
  2
  3

Result (join input rows to output rows):

  user_id  group_ids  group_id
  100      [1]        1
  101      [2,3]      2
  101      [2,3]      3
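
The join of input rows to generated rows can be emulated in a few lines of plain Java. This is a minimal sketch of the semantics only, assuming a hypothetical SrcRow type for the example table; it does not use Hive classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LateralViewSketch {
    // Hypothetical row type for the example table src(user_id, group_ids).
    static final class SrcRow {
        final int userId;
        final List<Integer> groupIds;

        SrcRow(int userId, List<Integer> groupIds) {
            this.userId = userId;
            this.groupIds = groupIds;
        }
    }

    // Emulates: SELECT src.*, myTable.* FROM src
    //           LATERAL VIEW explode(group_ids) myTable AS group_id
    static List<String> lateralViewExplode(List<SrcRow> src) {
        List<String> result = new ArrayList<>();
        for (SrcRow row : src) {
            // explode(): one generated row per array element
            for (int groupId : row.groupIds) {
                // join the input row to each row generated from it
                result.add(row.userId + " " + row.groupIds + " " + groupId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<SrcRow> src = Arrays.asList(
            new SrcRow(100, Arrays.asList(1)),
            new SrcRow(101, Arrays.asList(2, 3)));
        for (String line : lateralViewExplode(src)) {
            System.out.println(line);
        }
    }
}
```

In this sketch, a source row whose array is empty contributes no result rows, because the inner loop never runs for it.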

Page 9: User-Defined Table Generating Functions

Outline

• UDTF Description and Usage
• Execution Phase
• Compile Phase

Page 10: User-Defined Table Generating Functions

Execution Phase

• 2 new operators
  – UDTFOperator
  – LateralViewJoinOperator
• Different operator DAGs
  – For SELECT udtf(…)
  – For … FROM … LATERAL VIEW udtf(…)

Page 11: User-Defined Table Generating Functions

Execution Phase - UDTFOperator

[Diagram: call flow between UDTFOperator (processOp(), forwardUDTFOutput()), GenericUDTF (process()), and Collector (collect()), shown for the input row [1,2] and the rows generated from it.]
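
The call flow named in the diagram can be approximated with a small callback sketch. The classes below are simplified stand-ins written for illustration, not the Hive classes; only the method names processOp(), process(), collect(), and forwardUDTFOutput() come from the slide (the last adapted to Java naming style), and everything else is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UdtfFlowSketch {
    // Stand-in for the Collector box: receives each row the UDTF produces.
    interface Collector {
        void collect(Object row);
    }

    // Simplified stand-in for GenericUDTF (hypothetical, not the Hive class).
    abstract static class SimpleUdtf {
        private Collector collector;

        void setCollector(Collector c) {
            this.collector = c;
        }

        // Subclasses call this once per output row.
        protected void forward(Object row) {
            collector.collect(row);
        }

        abstract void process(Object[] args);
    }

    // explode(): forwards one row per element of the array argument.
    static class Explode extends SimpleUdtf {
        @Override
        void process(Object[] args) {
            for (Object element : (List<?>) args[0]) {
                forward(element);
            }
        }
    }

    // Simplified stand-in for UDTFOperator.
    static class SimpleUdtfOperator {
        private final SimpleUdtf udtf;
        final List<Object> forwarded = new ArrayList<>();

        SimpleUdtfOperator(SimpleUdtf udtf) {
            this.udtf = udtf;
            // The collector hands every generated row back to this operator.
            udtf.setCollector(this::forwardUdtfOutput);
        }

        // Called once per input row.
        void processOp(Object[] row) {
            udtf.process(row);
        }

        // Called once per generated row; a real operator would pass the row
        // on to its child operators instead of storing it.
        void forwardUdtfOutput(Object row) {
            forwarded.add(row);
        }
    }

    public static void main(String[] args) {
        SimpleUdtfOperator op = new SimpleUdtfOperator(new Explode());
        op.processOp(new Object[] { Arrays.asList(1, 2) });
        System.out.println(op.forwarded); // prints [1, 2]
    }
}
```

The point of the indirection is that the UDTF never needs to know which operator consumes its rows; it only talks to the collector.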

Page 12: User-Defined Table Generating Functions

GenericUDTF Interface

Page 13: User-Defined Table Generating Functions

Execution Phase - Transform

• SELECT explode(group_ids) AS group_id FROM src

Operator DAG, top to bottom:
  – TS: gets the rows of src
  – SELECT: selects group_ids
  – UDTF: calls explode
  – More operators

Page 14: User-Defined Table Generating Functions

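Execution Phase - LateralViewJoinOperator

[Diagram: the LVJ operator receives the row "foo" on one input and the rows "bar" and "baz" on the other, and outputs the joined rows "foo, bar" and "foo, baz".]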

Page 15: User-Defined Table Generating Functions

Execution Phase (Lateral View)

• FROM src LATERAL VIEW explode(group_ids) myTable AS group_id

Operator DAG:
  – TS: gets the rows of src
  – SELECT: selects other needed columns (user_id, group_ids)
  – SELECT: selects argument columns for the UDTF (group_ids)
  – UDTF: calls explode()
  – LVJ: joins input and output rows
  – More operators

Page 16: User-Defined Table Generating Functions

Outline

• UDTF Description and Usage
• Execution Phase
• Compile Phase

Page 17: User-Defined Table Generating Functions

Compile Phase - Transform

• SELECT explode(group_id_array) AS group_id FROM src

Page 18: User-Defined Table Generating Functions

Compile Time - Transform

• UDTF detected in genSelectPlan()

SemanticAnalyzer::genSelectPlan():

    ...
    if (udtfExpr.getType() == HiveParser.TOK_FUNCTION) {
      // Resolve the function name and look it up in the registry
      String funcName =
          TypeCheckProcFactory.DefaultExprProcessor.getFunctionText(udtfExpr, true);
      FunctionInfo fi = FunctionRegistry.getFunctionInfo(funcName);
      if (fi != null) {
        genericUDTF = fi.getGenericUDTF();
      }
      // Treated as a UDTF only when the lookup yielded a GenericUDTF
      isUDTF = (genericUDTF != null);
    }
    ...
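
The detection step boils down to a registry lookup followed by a null check. The following is a rough, self-contained sketch of that idea, assuming a hypothetical in-memory registry; it is not Hive's FunctionRegistry, and the Kind enum and REGISTRY map are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class UdtfDetectionSketch {
    // Hypothetical classification of registered functions.
    enum Kind { UDF, UDAF, UDTF }

    // Hypothetical stand-in for a function registry.
    static final Map<String, Kind> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("concat", Kind.UDF);
        REGISTRY.put("sum", Kind.UDAF);
        REGISTRY.put("explode", Kind.UDTF);
    }

    // Same shape as the slide's logic: an unknown function, or a function
    // of another kind, is simply not a UDTF.
    static boolean isUdtf(String funcName) {
        Kind kind = REGISTRY.get(funcName); // null when not registered
        return kind == Kind.UDTF;
    }

    public static void main(String[] args) {
        System.out.println(isUdtf("explode")); // prints true
        System.out.println(isUdtf("concat"));  // prints false
        System.out.println(isUdtf("missing")); // prints false
    }
}
```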

Page 19: User-Defined Table Generating Functions

Compile Time - Transform

• If a UDTF is present
  – Generate a select operator to get the columns needed for the UDTF
  – Attach UDTFOperator to SelectOperator

SemanticAnalyzer::genSelectPlan():

    ...
    if (isInTransform) {
      output = genScriptPlan(trfm, qb, output);
    }
    if (isUDTF) {
      // The UDTF plan is attached to the output of the select plan
      output = genUDTFPlan(genericUDTF, udtfTableAlias, udtfColAliases, qb, output);
    }

Page 20: User-Defined Table Generating Functions

Compile Phase - Lateral View

• SELECT * FROM src LATERAL VIEW explode(group_ids) myTable AS group_id

Partial AST for the above query: [diagram]

Page 21: User-Defined Table Generating Functions

Compile Phase - Lateral View

• In SemanticAnalyzer::doPhase1()
  – Make a mapping from the source table alias to TOK_LATERAL_VIEW

[Diagram: the alias "src" mapped to its TOK_LATERAL_VIEW node.]

Page 22: User-Defined Table Generating Functions

Compile Phase - Lateral View

• In SemanticAnalyzer::genPlan()
  – Iterate through Map<String, Operator> aliasToOpInfo
  – Attach SELECT/UDTF operator DAG with genLateralViewPlans()

[Diagram: the TS operator with the SELECT/UDTF operator DAG attached by genLateralViewPlans().]