Post on 05-Jun-2020

FUN WITH ANALYTIC FUNCTIONS

UTOUG TRAINING DAYS 2017

ABOUT ME

• Born and raised here in UT

• In IT for 10 years, DBA for the last 6

• Databases and data are my hobbies; I’m rather boring

• This isn’t why you’re here though

ANALYTIC FUNCTIONS… SAY WHAT?

• Analytic Functions compute a value based upon a subset of the rows in a query result

• The subset is referred to as “the partition” (unrelated to table partitioning)

• The best way to understand these functions is to compare them to standard Aggregate functions (SUM, MIN, MAX, etc.)

AGGREGATE VS. ANALYTIC

[Figure: three panels side by side: The Data, Aggregate AVG, Analytic Function AVG]
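The comparison on this slide can be sketched against the SCOTT schema that the later samples use. This is a minimal illustration, not the exact queries behind the slide's screenshots; the aliases are made up for the example.

```sql
-- Aggregate AVG: one row per department, the employee detail is collapsed.
select deptno,
       avg(sal) avg_sal
from   scott.emp
group  by deptno;

-- Analytic AVG: every employee row is kept, and the department
-- average is repeated on each row of that department's partition.
select ename,
       deptno,
       sal,
       avg(sal) over (partition by deptno) avg_sal_by_deptno
from   scott.emp;
```

The aggregate form returns as many rows as there are departments, while the analytic form returns as many rows as there are employees.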

41 FLAVORS

• 41 different Analytic Functions

• Positional (FIRST, LAST, ROW_NUMBER, LEAD, LAG, RANK, etc.)

• Statistical (CORR, the REGR_ family, NTILE, STDDEV, etc.)

• Aggregate (SUM, AVG, MIN, MAX, etc.)

• Pattern Matching (Find patterns, like V shaped dips in stock ticker data)

• LISTAGG

SAMPLES!

• Samples are based on the SCOTT schema

• View -> Snippets

THE SYNTAX

It’s not as complicated as it looks

QUICK EXAMPLES

[Figure: two panels: The Data, Analytic Function AVG]

select
    ename,
    job,
    deptno,
    avg(sal) over (partition by deptno) avg_sal_by_deptno,
    sal,
    sal / (avg(sal) over (partition by deptno)) pct_of_average
from scott.emp
order by deptno desc;

FUNCTION(<field a>) OVER (PARTITION by <field b>)

MIX ‘N MATCH

select
    ename,
    job,
    deptno,
    avg(sal) over (partition by deptno) avg_sal_by_deptno,
    sal,
    sal / (avg(sal) over (partition by deptno)) pct_of_average
from scott.emp
order by deptno desc;

select
    ename,
    job,
    deptno,
    min(sal) over (partition by deptno) min_sal_by_deptno,
    sal,
    sal / (min(sal) over (partition by deptno)) pct_of_min
from scott.emp
order by deptno desc;

REAL LIFE

C-LEVEL ASKS EASY QUESTION

“Can you tell me the order that accounts were opened in?”

“Can you give me an ordinal number (1st, 2nd, 3rd)?”

row_number() over (partition by acct order by acct_open_date)

WHAT ABOUT WHEN TWO SUB ACCOUNTS ARE OPENED ON THE SAME DAY, CAN YOU MAKE THOSE BE THE SAME?

dense_rank() over (partition by acct order by acct_open_date)

rank() over (partition by acct order by acct_open_date)

row_number() over (partition by acct order by acct_open_date)

Original Query
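To see how the three ranking functions differ on ties, they can be placed side by side in one query. This is a sketch only: the table name accounts and the column sub_acct are assumptions for illustration, since the slides show only the acct and acct_open_date columns.

```sql
-- Hypothetical table: accounts(acct, sub_acct, acct_open_date)
select acct,
       sub_acct,
       acct_open_date,
       row_number() over (partition by acct order by acct_open_date) row_num,
       rank()       over (partition by acct order by acct_open_date) rnk,
       dense_rank() over (partition by acct order by acct_open_date) dense_rnk
from   accounts;

-- For four sub accounts opened on days d1, d2, d2, d3 within one acct:
--   row_number gives 1, 2, 3, 4  (the tie is broken arbitrarily)
--   rank       gives 1, 2, 2, 4  (a gap follows the tie)
--   dense_rank gives 1, 2, 2, 3  (no gap)
```

DENSE_RANK is the one that answers the "make those the same" request without skipping an ordinal afterwards.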

CAN YOU TELL ME HOW LONG IT TAKES BETWEEN ONE ACCOUNT AND ANOTHER?

lag(acct_open_date) over (partition by acct order by acct_open_date)

acct_open_date - lag(acct_open_date) over (partition by acct order by acct_open_date)

LAG

LEAD
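A short sketch of LAG and LEAD together, assuming the same hypothetical accounts table (the table name and sub_acct column are not in the slides). Subtracting two Oracle DATE values yields a number of days.

```sql
-- Hypothetical table: accounts(acct, sub_acct, acct_open_date)
select acct,
       sub_acct,
       acct_open_date,
       -- previous and next open date within the same acct
       lag(acct_open_date)  over (partition by acct order by acct_open_date) prev_open_date,
       lead(acct_open_date) over (partition by acct order by acct_open_date) next_open_date,
       -- days elapsed since the previous sub account was opened
       acct_open_date
         - lag(acct_open_date) over (partition by acct order by acct_open_date) days_since_prev
from   accounts;
```

The first row of each partition has no previous row, so prev_open_date and days_since_prev are NULL there; likewise next_open_date is NULL on the last row.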

WHAT SHE REALLY WANTED…

• I just need the sequence patterns, in general

This uses LISTAGG

LISTAGG

• LISTAGG(<string to concatenate>, '<concatenator>') within group (order by <field>)

• LISTAGG(job, ' -> ') within group (order by hiredate)
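As a complete statement, the second bullet could look like the sketch below, run against the SCOTT schema. Grouping by deptno is an assumption made for the example; the slide does not say how the rows were grouped.

```sql
-- One row per department, with the jobs strung together in hire-date order.
select deptno,
       listagg(job, ' -> ') within group (order by hiredate) job_sequence
from   scott.emp
group  by deptno;
```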

NOT GOOD ENOUGH…

• “Can you order those by how common each pattern is?”

• Sure…?

SELECT
    DISTINCT listagg(acct_description, ' -> ') WITHIN GROUP (order by ACCT_OPEN_DATE),
    count(DISTINCT listagg(acct_description, ' -> ') WITHIN GROUP (order by ACCT_OPEN_DATE)) pattern_observance_count

Analytic Functions can’t go in a GROUP BY Clause
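One way to get the pattern counts is to build each account's pattern in an inner query and count the patterns in an outer one. This is a sketch under assumptions: the table name accounts and the grouping column acct are hypothetical, and this is not necessarily the query the presenter ended up with.

```sql
-- Hypothetical table: accounts(acct, acct_description, acct_open_date)
select pattern,
       count(*) pattern_observance_count
from (
    -- inner query: one pattern string per account
    select acct,
           listagg(acct_description, ' -> ')
             within group (order by acct_open_date) pattern
    from   accounts
    group  by acct
)
group  by pattern
order  by pattern_observance_count desc;
```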

DON’T PUT YOUR AF’S WHERE THEY DON’T BELONG

• Use a subquery to get around this

-- Baseline: every employee row with the department average
select
    deptno,
    avg(sal) over (partition by deptno) avg_sal_by_deptno,
    sal,
    sal / (avg(sal) over (partition by deptno)) pct_of_average
from scott.emp
order by deptno desc;

-- Fails: an analytic function is not allowed in the WHERE clause (ORA-30483)
select
    deptno,
    avg(sal) over (partition by deptno) avg_sal_by_deptno,
    sal,
    sal / (avg(sal) over (partition by deptno)) pct_of_average
from scott.emp
where sal / (avg(sal) over (partition by deptno)) > 1
order by deptno desc;

-- Works: compute the analytic function in a subquery, then filter on its alias
select
    deptno, avg_sal_by_deptno, sal, pct_of_average
from (
    select
        deptno,
        avg(sal) over (partition by deptno) avg_sal_by_deptno,
        sal,
        sal / (avg(sal) over (partition by deptno)) pct_of_average
    from scott.emp
)
where pct_of_average >= 1
order by deptno desc;

GETTING ROLLED…

Can you tell me the transactions an account has done? Can you sum the Amounts?

NO, COULD YOU SUM UP THE AMOUNTS FOR EACH MONTH, BUT DON'T HIDE THE TRANSACTION DETAILS?

Original Data

sum(amount)

sum(amount) over (partition by trunc(business_date,'MM'), acct_num) monthly_total
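Put into a full statement, the monthly total sits next to the untouched transaction rows. The table name transactions is an assumption for illustration; only the column names come from the slide.

```sql
-- Hypothetical table: transactions(acct_num, business_date, amount)
select acct_num,
       business_date,
       amount,
       -- same total repeated on every row of the same account and month
       sum(amount) over (partition by trunc(business_date, 'MM'), acct_num) monthly_total
from   transactions
order  by acct_num, business_date;
```

A plain sum(amount) with GROUP BY would return one row per account and month and hide the individual transactions, which is what the request ruled out.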

COULD YOU BREAK IT OUT BY THE TYPE OF TRANSACTION IT WAS? DEBIT VS. CREDIT?

sum(amount) over (partition by trunc(business_date,'MM'), acct_num, tran_type) monthly_total

sum(amount) over (partition by trunc(business_date,'MM'), acct_num) monthly_total

Nulls treated together

Same partition => same total

Different partition => different total

COULD YOU MAKE A ROLLING SUM TOO, BROKEN OUT THE SAME WAY?

sum(amount) over (partition by trunc(business_date,'MM'), acct_num, tran_type) monthly_total,

sum(amount) over (partition by trunc(business_date,'MM'), acct, suffix, tran_type order by acct_seq_num) rolling_monthly_total

PERFECT, BUT COULD YOU EXCLUDE THE CURRENT TRANSACTION FROM THE ROLLING MONTHLY TOTAL ?

sum(amount) over (partition by trunc(business_date,'MM'), acct_num, tran_type) monthly_total,

sum(amount) over (partition by trunc(business_date,'MM'), acct, suffix, tran_type order by acct_seq_num) rolling_monthly_total,

sum(amount) over (partition by trunc(business_date,'MM'), acct, suffix, tran_type order by acct_seq_num ROWS BETWEEN UNBOUNDED PRECEDING and 1 PRECEDING) roll_mnthly_tot_excl_cur_tran
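The three totals can be compared in one statement. This sketch assumes a hypothetical transactions table and uses acct_num in every partition for simplicity, whereas the slide's rolling expressions partition by acct and suffix.

```sql
-- Hypothetical table: transactions(acct_num, tran_type, business_date, acct_seq_num, amount)
select acct_num,
       tran_type,
       business_date,
       acct_seq_num,
       amount,
       -- no ORDER BY: the window is the whole partition
       sum(amount) over (partition by trunc(business_date, 'MM'), acct_num, tran_type) monthly_total,
       -- ORDER BY with no windowing clause: the window runs from the
       -- start of the partition through the current row
       sum(amount) over (partition by trunc(business_date, 'MM'), acct_num, tran_type
                         order by acct_seq_num) rolling_monthly_total,
       -- explicit window that stops one row before the current row
       sum(amount) over (partition by trunc(business_date, 'MM'), acct_num, tran_type
                         order by acct_seq_num
                         rows between unbounded preceding and 1 preceding) roll_mnthly_tot_excl_cur_tran
from   transactions
order  by acct_num, tran_type, acct_seq_num;
```

On the first row of each partition the last column is NULL because its window is empty; wrapping it in nvl(..., 0) is one option if a zero reads better. A windowing clause such as ROWS BETWEEN requires an ORDER BY inside the OVER clause.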

ROWS AND RANGE – SUB PARTITIONS

• ROWS BETWEEN UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING

• ROWS BETWEEN UNBOUNDED PRECEDING and X PRECEDING

• ROWS is a count of physical rows relative to the current row

• RANGE is a numeric or date range relative to the current row's ORDER BY value

• PRECEDING is before the current row

• FOLLOWING is after the current row
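The difference between ROWS and RANGE shows up when both are applied to the same ordering. A small sketch on the SCOTT schema; the window sizes (3 rows, 90 days) are arbitrary choices for the example.

```sql
select ename,
       hiredate,
       sal,
       -- ROWS: the current row plus the two physical rows before it
       sum(sal) over (order by hiredate
                      rows between 2 preceding and current row) sum_last_3_rows,
       -- RANGE: every row whose hiredate falls within the 90 days
       -- up to and including the current row's hiredate
       sum(sal) over (order by hiredate
                      range between 90 preceding and current row) sum_last_90_days
from   scott.emp
order  by hiredate;
```

With RANGE, rows that tie on the ORDER BY value always fall into the same window, so they share the same result; with ROWS they do not.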

SIMPLE EXAMPLE

lead(row_number) over (partition by 'X' order by row_number) next_number,

first_value(row_number) over (partition by 'X' order by row_number rows between 2 FOLLOWING and 3 FOLLOWING) number_after_the_next_number,

sum(row_number) over (partition by 'X' order by row_number rows between 1 FOLLOWING and 2 FOLLOWING) sum_of_next_2_nums,

sum(row_number) over (partition by 'X' order by row_number rows between 1 FOLLOWING and UNBOUNDED FOLLOWING) sum_nums_from_this_to_the_end,

sum(row_number) over (partition by 'X' order by row_number rows between 1 PRECEDING and 1 FOLLOWING) sum_nums_1_before_to_1_after
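These expressions can be tried without any tables by generating a few numbers from dual. This is a sketch: the column name n and the five-row generator are assumptions, and the partition by 'X' from the slide is dropped because a constant partition is the same as no partition.

```sql
select n,
       lead(n) over (order by n) next_number,
       first_value(n) over (order by n
                            rows between 2 following and 3 following) number_after_the_next_number,
       sum(n) over (order by n
                    rows between 1 following and 2 following) sum_of_next_2_nums,
       sum(n) over (order by n
                    rows between 1 preceding and 1 following) sum_nums_1_before_to_1_after
from  (select level n from dual connect by level <= 5)
order by n;

-- For the row n = 1:
--   next_number                  = 2
--   number_after_the_next_number = 3
--   sum_of_next_2_nums           = 5   (2 + 3)
--   sum_nums_1_before_to_1_after = 3   (1 + 2, there is no row before it)
```

Near the end of the set the FOLLOWING windows run out of rows, so those columns shrink and finally return NULL.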

FILLING HOLES

Can you tell me what a drawer’s end-of-day totals are each day?

Lots of missing days

How can we fill in those gaps?

LET’S GET THE NEXT USED DATE ON EACH ROW

lead(branch_date) over (partition by branch_code,cashbox_id order by branch_date) next_used_date

Let’s fix this null

AF’S CAN BE USED ALMOST ANYWHERE

case
    when lead(branch_date) over (partition by branch_code, cashbox_id order by branch_date) is null then
        branch_date
    else
        lead(branch_date) over (partition by branch_code, cashbox_id order by branch_date)
end next_used_date,
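The same NULL handling can be written more compactly. Both forms below are alternatives to the CASE expression, not what the slide shows; the table name drawer_totals is a hypothetical stand-in for the presenter's source table.

```sql
-- Hypothetical table: drawer_totals(branch_code, cashbox_id, branch_date, ...)
select branch_code,
       cashbox_id,
       branch_date,
       -- NVL around the analytic function
       nvl(lead(branch_date) over (partition by branch_code, cashbox_id order by branch_date),
           branch_date) next_used_date_nvl,
       -- LEAD's optional third argument supplies the default directly
       lead(branch_date, 1, branch_date)
         over (partition by branch_code, cashbox_id order by branch_date) next_used_date_default
from   drawer_totals;
```

LEAD and LAG take an offset as the second argument and a default as the third, which covers the case where no such row exists.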

NULLS FIXED!

Before After

But we still have gaps…

JOIN THIS TO A “CALENDAR”

Begin Date

Some big number larger than how far you want to go back.

This would calculate out the “End Date”

SELECT
    to_date('20161101','YYYYMMDD') + ROWNUM - 1 calendar_date
FROM (
    SELECT 1 just_a_column
    FROM dual
    CONNECT BY LEVEL <= (10000)
)

* 20161101 is shorthand for to_date('20161101','YYYYMMDD')

JOINING TO A CALENDAR

WHERE calendar_date BETWEEN branch_date and next_used_date-1

20161115 is between 20161115 and (20161116 - 1)

The 20th is missing, but 20161120 is between 20161119 and (20161121 - 1)
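Assembled end to end, the gap fill might look like the sketch below. The table name drawer_totals and the column eod_total are hypothetical. One deliberate deviation from the slides: the last row's next_used_date defaults to branch_date + 1 rather than branch_date, an assumption made here so that the final day still satisfies the BETWEEN condition.

```sql
-- Hypothetical table: drawer_totals(branch_code, cashbox_id, branch_date, eod_total)
with calendar as (
    -- one row per day, starting at the begin date
    select to_date('20161101', 'YYYYMMDD') + level - 1 calendar_date
    from   dual
    connect by level <= 10000
),
drawer as (
    select branch_code,
           cashbox_id,
           branch_date,
           eod_total,
           -- next date the drawer was used; assumed default keeps the last day
           lead(branch_date, 1, branch_date + 1)
             over (partition by branch_code, cashbox_id order by branch_date) next_used_date
    from   drawer_totals
)
select d.branch_code,
       d.cashbox_id,
       c.calendar_date,
       d.eod_total
from   drawer d
join   calendar c
  on   c.calendar_date between d.branch_date and d.next_used_date - 1
order  by d.branch_code, d.cashbox_id, c.calendar_date;
```

Each source row is repeated once for every calendar day up to the day before the drawer was next used, so a missing day carries the most recent end-of-day total forward.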

FILLED GAPS – THANKS TO AN AF

Before After

HOW BIG IS THAT CANYON?

• Department wanted to know details of accounts going negative

• They wanted to know how deep and how wide the “canyon” was when looking at a daily history of account balances

[Chart: daily account balance, y-axis from -2000 to 1500. Callouts: How deep? How wide? Start time? End time?]

USE PATTERN MATCHING (12C)

The Data

[Chart: account balance, y-axis from -500 to 1500]

The Result
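A MATCH_RECOGNIZE query for the "canyon" question could be sketched as follows. The slides do not show the presenter's actual query, so the table name daily_balances, its columns, and the pattern variable neg are all assumptions.

```sql
-- Hypothetical table: daily_balances(acct_num, balance_date, balance)
select *
from   daily_balances
match_recognize (
    partition by acct_num
    order by balance_date
    measures
        first(neg.balance_date) as start_date,       -- start time
        last(neg.balance_date)  as end_date,         -- end time
        count(*)                as days_wide,        -- how wide
        min(neg.balance)        as deepest_balance   -- how deep
    one row per match
    pattern (neg+)              -- one or more consecutive negative days
    define
        neg as neg.balance < 0
);
```

Each match is one unbroken run of negative-balance days for an account, returned as a single summary row.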

THINGS YOU CAN DO WITH IT:

• Find V, W and other patterns in Stock Prices

• Find timeframes of high database use

• Group clicks in web logs into sessions

• Detect traversal patterns of Finite State Machines

• We won’t go much deeper… but look into these, they’re neat!

NOT COMPLICATED, JUST INVOLVED

• Used wherever you can put data into a line graph, i.e. data is a log of events

• Lots of great resources:

• Ask Tom - http://www.oracle.com/technetwork/issue-archive/2013/13-nov/o63asktom-2034271.html

• GitHub - https://github.com/oracle/analytical-sql-examples/tree/master/pattern-matching

• Burleson - http://www.dba-oracle.com/t_sql_match_recognize.htm

• YouTube has some good demos too

AF PERFORMANCE?

• Keep an eye on performance – these do lots of sorts

• Try to use indexes, filter your data before applying analytic functions

• Sometimes AF’s can help improve performance, other times they can hurt it

• Tom Kyte says: In general, analytics are great for answering "really big" questions or questions against "small sets"

https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:1137250200346660664

QUESTIONS?
