FUN WITH ANALYTIC FUNCTIONS
UTOUG TRAINING DAYS 2017
ABOUT ME
• Born and raised here in UT
• In IT for 10 years, DBA for the last 6
• Databases and data are my hobbies; I’m rather boring
• This isn’t why you’re here though
ANALYTIC FUNCTIONS… SAY WHAT?
• Analytic functions compute a value for each row, based on a subset of the rows in the query result
• That subset is referred to as “the partition” (unrelated to table partitioning)
• The best way to understand these functions is to compare them to standard aggregate functions (SUM, MIN, MAX, etc.)
AGGREGATE VS. ANALYTIC
[Slide screenshots: the data, the aggregate AVG, and the analytic-function AVG]
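The difference the screenshots illustrate can be sketched in a few lines. This is an illustration under stated assumptions, not the deck's own demo: it uses Python's built-in sqlite3 module (SQLite 3.25+ supports the same OVER (PARTITION BY ...) syntax as Oracle) and a handful of hypothetical rows loosely modeled on SCOTT.EMP. The aggregate AVG returns one row per department; the analytic AVG returns every employee row with the department average attached.

```python
import sqlite3

# Hypothetical sample rows loosely modeled on SCOTT.EMP (not the real table).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER, sal REAL)")
con.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("KING", 10, 5000), ("CLARK", 10, 2450), ("MILLER", 10, 1300),
     ("SMITH", 20, 800), ("FORD", 20, 3000)],
)

# Aggregate AVG: GROUP BY collapses each department to a single row.
aggregate_rows = con.execute(
    "SELECT deptno, avg(sal) FROM emp GROUP BY deptno ORDER BY deptno"
).fetchall()

# Analytic AVG: every employee row survives, each carrying its department average.
analytic_rows = con.execute(
    "SELECT ename, deptno, sal, avg(sal) OVER (PARTITION BY deptno) "
    "FROM emp ORDER BY deptno, ename"
).fetchall()

for row in aggregate_rows:
    print(row)
for row in analytic_rows:
    print(row)
```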
41 FLAVORS
• 41 different Analytic Functions
• Positional (FIRST, LAST, ROW_NUMBER, LEAD, LAG, RANK, etc.)
• Statistical (CORR, REGR_*, NTILE, STDDEV, etc.)
• Aggregate (SUM, AVG, MIN, MAX, etc.)
• Pattern Matching (Find patterns, like V shaped dips in stock ticker data)
• ListAgg
SAMPLES!
• Samples are based on the SCOTT schema
• View -> Snippets
THE SYNTAX
It’s not as complicated as it looks
QUICK EXAMPLES
[Slide screenshots: the data and the analytic-function AVG]
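select
  ename,
  job,
  deptno,
  avg(sal) over (partition by deptno) avg_sal_by_deptno,        -- department average, repeated on every row
  sal,
  sal / (avg(sal) over (partition by deptno)) pct_of_average    -- salary as a fraction of that average (1 = average)
from scott.emp
order by deptno desc;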
FUNCTION(<field a>) OVER (PARTITION by <field b>)
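Here is a runnable approximation of the quick-example query, offered as a sketch rather than the deck's exact setup. The rows are hypothetical stand-ins for SCOTT.EMP, and SQLite (via Python's sqlite3) replaces Oracle because it accepts the same FUNCTION(...) OVER (PARTITION BY ...) form. Note that pct_of_average is really a ratio: values above 1 mean above the department average.

```python
import sqlite3

# Hypothetical rows standing in for SCOTT.EMP.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (ename TEXT, job TEXT, deptno INTEGER, sal REAL)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?, ?)", [
    ("SMITH", "CLERK", 20, 800), ("FORD", "ANALYST", 20, 3000),
    ("ADAMS", "CLERK", 20, 1100), ("ALLEN", "SALESMAN", 30, 1600),
    ("WARD", "SALESMAN", 30, 1250),
])

rows = con.execute("""
    SELECT ename, job, deptno,
           avg(sal) OVER (PARTITION BY deptno)         AS avg_sal_by_deptno,
           sal,
           sal / (avg(sal) OVER (PARTITION BY deptno)) AS pct_of_average
    FROM emp
    ORDER BY deptno DESC, ename
""").fetchall()

for ename, job, deptno, avg_sal, sal, pct in rows:
    print(f"{ename:6} {deptno} avg={avg_sal:.2f} sal={sal:.0f} ratio={pct:.3f}")
```

Swapping avg for min, max or count inside the OVER expression changes only the function; the partitioning stays the same.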
MIX ‘N MATCH
• Swap AVG for another aggregate, such as MIN, and the rest of the query stays the same:
select
  ename,
  job,
  deptno,
  min(sal) over (partition by deptno) min_sal_by_deptno,
  sal,
  sal / (min(sal) over (partition by deptno)) pct_of_min
from scott.emp
order by deptno desc;
REAL LIFE
C-LEVEL ASKS EASY QUESTION
“Can you tell me the order that accounts were opened in?”
“Can you give me an ordinal number (1st, 2nd, 3rd)?”
row_number() over (partition by acct order by acct_open_date)
WHAT ABOUT WHEN TWO SUB ACCOUNTS ARE OPENED ON THE SAME DAY, CAN YOU MAKE THOSE BE THE SAME?
dense_rank() over (partition by acct order by acct_open_date)
rank() over (partition by acct order by acct_open_date)
row_number() over (partition by acct order by acct_open_date)   -- the original query
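To see how the three ranking functions treat two sub-accounts opened on the same day, here is a small sketch. The accounts table and its rows are hypothetical, and SQLite stands in for Oracle. RANK leaves a gap after a tie, DENSE_RANK does not, and ROW_NUMBER still hands out distinct numbers, breaking the tie arbitrarily unless more ORDER BY columns are added.

```python
import sqlite3

# Hypothetical accounts: acct 1 has four sub-accounts, two opened on the same day.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accts (acct INTEGER, sub_acct TEXT, acct_open_date TEXT)")
con.executemany("INSERT INTO accts VALUES (?, ?, ?)", [
    (1, "CHECKING", "2016-01-04"),
    (1, "SAVINGS", "2016-03-15"),
    (1, "VISA", "2016-03-15"),        # same day as SAVINGS: a tie
    (1, "AUTO LOAN", "2016-07-01"),
])

rows = con.execute("""
    SELECT sub_acct,
           row_number() OVER (PARTITION BY acct ORDER BY acct_open_date) AS rn,
           rank()       OVER (PARTITION BY acct ORDER BY acct_open_date) AS rnk,
           dense_rank() OVER (PARTITION BY acct ORDER BY acct_open_date) AS drnk
    FROM accts
    ORDER BY acct_open_date, sub_acct
""").fetchall()

for row in rows:
    print(row)
```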
CAN YOU TELL ME HOW LONG IT TAKES BETWEEN ONE ACCOUNT AND ANOTHER?
lag(acct_open_date) over (partition by acct order by acct_open_date)
acct_open_date - lag(acct_open_date) over (partition by acct order by acct_open_date)
• LAG returns a value from an earlier row in the partition; LEAD returns one from a later row
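A minimal sketch of the gap calculation, assuming a hypothetical accounts table with ISO date strings. SQLite has no Oracle-style date subtraction, so julianday() differences stand in for acct_open_date - lag(acct_open_date). The first account in each partition has no previous row, so its gap is NULL (None in Python).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accts (acct INTEGER, sub_acct TEXT, acct_open_date TEXT)")
con.executemany("INSERT INTO accts VALUES (?, ?, ?)", [
    (1, "CHECKING", "2016-01-04"),
    (1, "SAVINGS", "2016-03-15"),
    (1, "AUTO LOAN", "2016-07-01"),
    (2, "CHECKING", "2016-02-10"),
])

rows = con.execute("""
    SELECT acct, sub_acct, acct_open_date,
           lag(acct_open_date)  OVER (PARTITION BY acct ORDER BY acct_open_date) AS prev_open_date,
           lead(acct_open_date) OVER (PARTITION BY acct ORDER BY acct_open_date) AS next_open_date,
           -- days between this opening and the previous one (NULL for the first account)
           julianday(acct_open_date)
             - julianday(lag(acct_open_date) OVER (PARTITION BY acct ORDER BY acct_open_date))
             AS days_since_prev
    FROM accts
    ORDER BY acct, acct_open_date
""").fetchall()

for row in rows:
    print(row)
```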
WHAT SHE REALLY WANTED…
• I just need the sequence patterns, in general
This uses LISTAGG
LISTAGG
• LISTAGG(<string to concatenate>, '<concatenator>') within group (order by <field>)
• LISTAGG(job, ' -> ') within group (order by hiredate)
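SQLite's group_concat does not reliably honor an ordering on older versions, so this sketch emulates LISTAGG in plain Python instead. It is a rough stand-in for Oracle's `select deptno, listagg(job, ' -> ') within group (order by hiredate) from scott.emp group by deptno`: sort by the group key and then the WITHIN GROUP column, and join each group's values with the delimiter. The `listagg` helper name and the employee rows are hypothetical.

```python
from itertools import groupby


def listagg(rows, group_key, order_key, value, delimiter):
    """Rough Python stand-in for LISTAGG(value, delimiter) WITHIN GROUP (ORDER BY order_key)
    with GROUP BY group_key. Returns {group: concatenated string}."""
    # Sort by the grouping column first, then by the WITHIN GROUP ordering column.
    ordered = sorted(rows, key=lambda r: (r[group_key], r[order_key]))
    return {
        key: delimiter.join(r[value] for r in members)
        for key, members in groupby(ordered, key=lambda r: r[group_key])
    }


# Hypothetical employees, loosely modeled on SCOTT.EMP.
emp = [
    {"deptno": 20, "job": "ANALYST", "hiredate": "1981-12-03"},
    {"deptno": 20, "job": "CLERK", "hiredate": "1980-12-17"},
    {"deptno": 30, "job": "SALESMAN", "hiredate": "1981-02-20"},
    {"deptno": 30, "job": "MANAGER", "hiredate": "1981-05-01"},
]

jobs_by_dept = listagg(emp, "deptno", "hiredate", "job", " -> ")
print(jobs_by_dept)
```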
NOT GOOD ENOUGH…
• “Can you order those by how common each pattern is?”
• Sure…?
SELECT
DISTINCT listagg(acct_description, ' -> ') WITHIN GROUP (order by ACCT_OPEN_DATE)
,
count(DISTINCT listagg(acct_description,' -> ') WITHIN GROUP (order by ACCT_OPEN_DATE))
pattern_observance_count
…
Analytic functions can’t go in a GROUP BY (or WHERE) clause; Oracle only allows them in the SELECT list and the ORDER BY clause
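The working shape for “order by how common each pattern is” is two steps: build one pattern string per account, then count the patterns. In Oracle that means an inline view holding the LISTAGG ... GROUP BY acct, with an outer GROUP BY on the pattern. The sketch below mirrors both steps in plain Python (itertools.groupby for the inner step, collections.Counter for the outer count). The account rows are hypothetical.

```python
from collections import Counter
from itertools import groupby

# Hypothetical (acct, acct_description, acct_open_date) rows.
rows = [
    (1, "CHECKING", "2016-01-04"), (1, "SAVINGS", "2016-03-15"),
    (2, "CHECKING", "2016-02-10"), (2, "SAVINGS", "2016-02-11"),
    (3, "SAVINGS", "2016-04-01"), (3, "CHECKING", "2016-05-20"),
    (4, "CHECKING", "2016-06-30"),
]

# Step 1 (inner query): one ordered pattern per account, like
# LISTAGG(acct_description, ' -> ') WITHIN GROUP (ORDER BY acct_open_date) ... GROUP BY acct
rows.sort(key=lambda r: (r[0], r[2]))
patterns = [
    " -> ".join(desc for _, desc, _ in members)
    for _, members in groupby(rows, key=lambda r: r[0])
]

# Step 2 (outer query): count each pattern, most common first, like
# COUNT(*) ... GROUP BY pattern ORDER BY pattern_observance_count DESC
pattern_counts = Counter(patterns).most_common()
for pattern, pattern_observance_count in pattern_counts:
    print(pattern_observance_count, pattern)
```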
DON’T PUT YOUR AF’S WHERE THEY DON’T BELONG
• Use a subquery to get around this
select
  deptno,
  avg(sal) over (partition by deptno) avg_sal_by_deptno,
  sal,
  sal / (avg(sal) over (partition by deptno)) pct_of_average
from scott.emp
order by deptno desc;
-- Fails with ORA-30483: window functions are not allowed here
select
  deptno,
  avg(sal) over (partition by deptno) avg_sal_by_deptno,
  sal,
  sal / (avg(sal) over (partition by deptno)) pct_of_average
from scott.emp
where sal / (avg(sal) over (partition by deptno)) > 1
order by deptno desc;
select
  deptno, avg_sal_by_deptno, sal, pct_of_average
from (
  select
    deptno,
    avg(sal) over (partition by deptno) avg_sal_by_deptno,
    sal,
    sal / (avg(sal) over (partition by deptno)) pct_of_average
  from scott.emp
)
where pct_of_average >= 1   -- the outer query can filter on the analytic result
order by deptno desc;
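The same rule holds outside Oracle, which makes it easy to try. In this sketch (hypothetical rows, SQLite via Python's sqlite3 in place of Oracle), the version with the analytic function in WHERE is rejected when the statement is prepared, while wrapping the analytic query in a subquery lets the outer WHERE filter on its computed column.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (deptno INTEGER, sal REAL)")
con.executemany("INSERT INTO emp VALUES (?, ?)",
                [(10, 5000), (10, 1300), (20, 800), (20, 3000), (20, 1100)])

# Analytic function in WHERE: SQLite rejects it, much as Oracle does.
try:
    con.execute("""
        SELECT deptno, sal FROM emp
        WHERE sal / (avg(sal) OVER (PARTITION BY deptno)) > 1
    """).fetchall()
    where_failed = False
except sqlite3.Error as exc:
    where_failed = True
    print("WHERE version failed:", exc)

# Subquery version: compute first, filter in the outer query.
above_average = con.execute("""
    SELECT deptno, avg_sal_by_deptno, sal, pct_of_average
    FROM (
        SELECT deptno,
               avg(sal) OVER (PARTITION BY deptno) AS avg_sal_by_deptno,
               sal,
               sal / (avg(sal) OVER (PARTITION BY deptno)) AS pct_of_average
        FROM emp
    )
    WHERE pct_of_average >= 1
    ORDER BY deptno DESC
""").fetchall()
print(above_average)
```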
GETTING ROLLED…
Can you tell me the transactions an account has done? Can you sum the Amounts?
NO, COULD YOU SUM UP THE AMOUNTS FOR EACH MONTH, BUT DON'T HIDE THE TRANSACTION DETAILS?
[Slide screenshots: the original data, the analytic monthly total, and a plain aggregate sum(amount)]
sum(amount) over (partition by trunc(business_date,'MM'), acct_num) monthly_total
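A small sketch of “sum by month but keep the details”, under these assumptions: a hypothetical transactions table, SQLite in place of Oracle, and strftime('%Y-%m', business_date) standing in for trunc(business_date,'MM'). Every transaction row is returned, and each carries the total for its account and month.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE trans (acct_num INTEGER, business_date TEXT, amount REAL)")
con.executemany("INSERT INTO trans VALUES (?, ?, ?)", [
    (100, "2016-11-03", 50.0), (100, "2016-11-17", -20.0),
    (100, "2016-12-01", 75.0), (200, "2016-11-05", 10.0),
])

rows = con.execute("""
    SELECT acct_num, business_date, amount,
           -- strftime('%Y-%m', ...) plays the role of Oracle's trunc(business_date,'MM')
           sum(amount) OVER (PARTITION BY strftime('%Y-%m', business_date), acct_num) AS monthly_total
    FROM trans
    ORDER BY acct_num, business_date
""").fetchall()

for row in rows:
    print(row)
```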
COULD YOU BREAK IT OUT BY THE TYPE OF TRANSACTION IT WAS? DEBIT VS. CREDIT?
sum(amount) over (partition by trunc(business_date,'MM'), acct_num, tran_type) monthly_total
compared with the previous version:
sum(amount) over (partition by trunc(business_date,'MM'), acct_num) monthly_total
• NULLs are treated together: rows with a NULL tran_type share one partition
• Same partition => same total
• Different partition => different total
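Adding tran_type to the partition, including the NULL case, can be checked with a sketch like this one (hypothetical rows, SQLite standing in for Oracle, strftime standing in for trunc(...,'MM')). The two rows with a NULL tran_type land in the same partition and share one total, just as NULLs group together in GROUP BY.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE trans (acct_num INTEGER, business_date TEXT, tran_type TEXT, amount REAL)")
con.executemany("INSERT INTO trans VALUES (?, ?, ?, ?)", [
    (100, "2016-11-03", "CREDIT", 50.0),
    (100, "2016-11-10", "DEBIT", -20.0),
    (100, "2016-11-17", "CREDIT", 25.0),
    (100, "2016-11-20", None, 5.0),    # missing tran_type
    (100, "2016-11-25", None, 7.0),    # missing tran_type
])

rows = con.execute("""
    SELECT business_date, tran_type, amount,
           sum(amount) OVER (PARTITION BY strftime('%Y-%m', business_date), acct_num, tran_type)
             AS monthly_total
    FROM trans
    ORDER BY business_date
""").fetchall()

for row in rows:
    print(row)
```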
COULD YOU MAKE A ROLLING SUM TOO, BROKEN OUT THE SAME WAY?
sum(amount)over (partition by trunc(business_date,'MM'),acct_num,tran_type) monthly_total,
sum(amount) over ( partition by trunc(business_date,'MM'),acct,suffix,tran_type
order by acct_seq_num) rolling_monthly_total
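Adding ORDER BY inside OVER turns the monthly total into a running total: by default the frame runs from the start of the partition up to the current row. A sketch with a hypothetical transactions table (acct, suffix and acct_seq_num mirror the slide's column names; the data are made up), again using SQLite and strftime in place of Oracle and trunc.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE trans (acct INTEGER, suffix INTEGER, acct_seq_num INTEGER, "
            "business_date TEXT, tran_type TEXT, amount REAL)")
con.executemany("INSERT INTO trans VALUES (?, ?, ?, ?, ?, ?)", [
    (100, 1, 1, "2016-11-03", "CREDIT", 50.0),
    (100, 1, 2, "2016-11-10", "DEBIT", -20.0),
    (100, 1, 3, "2016-11-17", "CREDIT", 25.0),
    (100, 1, 4, "2016-11-28", "CREDIT", 10.0),
    (100, 1, 5, "2016-12-02", "CREDIT", 40.0),
])

rows = con.execute("""
    SELECT acct_seq_num, tran_type, amount,
           -- whole-partition total (no ORDER BY inside OVER)
           sum(amount) OVER (PARTITION BY strftime('%Y-%m', business_date), acct, suffix, tran_type)
             AS monthly_total,
           -- running total: ORDER BY makes the frame end at the current row
           sum(amount) OVER (PARTITION BY strftime('%Y-%m', business_date), acct, suffix, tran_type
                             ORDER BY acct_seq_num) AS rolling_monthly_total
    FROM trans
    ORDER BY acct_seq_num
""").fetchall()

for row in rows:
    print(row)
```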
PERFECT, BUT COULD YOU EXCLUDE THE CURRENT TRANSACTION FROM THE ROLLING MONTHLY TOTAL?
sum(amount)over (partition by trunc(business_date,'MM'), acct_num,tran_type) monthly_total,
sum(amount) over ( partition by trunc(business_date,'MM'),acct,suffix,tran_type order by acct_seq_num)
rolling_monthly_total,
sum(amount) over ( partition by trunc(business_date,'MM'),acct,suffix,tran_type order by acct_seq_num
ROWS BETWEEN UNBOUNDED PRECEDING and 1 PRECEDING ) roll_mnthly_tot_excl_cur_tran
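A windowing clause needs an ORDER BY to know which rows are “preceding”. This sketch (hypothetical rows, SQLite in place of Oracle) shows that the “excluding current” column equals the rolling total minus the current amount. The first row's frame is empty, so its sum is NULL; wrap the expression in coalesce(..., 0) if zero reads better.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE trans (acct INTEGER, suffix INTEGER, acct_seq_num INTEGER, "
            "business_date TEXT, tran_type TEXT, amount REAL)")
con.executemany("INSERT INTO trans VALUES (?, ?, ?, ?, ?, ?)", [
    (100, 1, 1, "2016-11-03", "CREDIT", 50.0),
    (100, 1, 2, "2016-11-17", "CREDIT", 25.0),
    (100, 1, 3, "2016-11-28", "CREDIT", 10.0),
])

rows = con.execute("""
    SELECT acct_seq_num, amount,
           sum(amount) OVER (PARTITION BY strftime('%Y-%m', business_date), acct, suffix, tran_type
                             ORDER BY acct_seq_num) AS rolling_monthly_total,
           -- frame ends one row before the current row, so the current amount is left out
           sum(amount) OVER (PARTITION BY strftime('%Y-%m', business_date), acct, suffix, tran_type
                             ORDER BY acct_seq_num
                             ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
             AS roll_mnthly_tot_excl_cur_tran
    FROM trans
    ORDER BY acct_seq_num
""").fetchall()

for row in rows:
    print(row)
```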
ROWS AND RANGE – SUB PARTITIONS
• ROWS BETWEEN UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING
• ROWS BETWEEN UNBOUNDED PRECEDING and X PRECEDING
• ROWS counts physical rows before or after the current row
• RANGE is a logical offset: a numeric or date range on the ORDER BY value
• PRECEDING is before the current row
• FOLLOWING is after the current row
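The ROWS/RANGE difference shows up as soon as the ORDER BY value has ties or gaps. In this sketch (hypothetical deposits, SQLite 3.28+ needed for numeric RANGE offsets), ROWS 1 PRECEDING means “one physical row back”, while RANGE 1 PRECEDING means “any row whose day is within 1 of mine”, which pulls in every peer with the same day.

```python
import sqlite3

# Hypothetical deposits; two rows share day 2 to expose the ROWS/RANGE difference.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE deposits (id INTEGER, day INTEGER, amount REAL)")
con.executemany("INSERT INTO deposits VALUES (?, ?, ?)", [
    (1, 1, 10.0), (2, 2, 20.0), (3, 2, 30.0), (4, 5, 40.0),
])

rows = con.execute("""
    SELECT id, day, amount,
           -- ROWS: this row plus the single physical row before it (ties broken by id)
           sum(amount) OVER (ORDER BY day, id ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS rows_sum,
           -- RANGE: every row whose day lies in [day - 1, day], peers included
           sum(amount) OVER (ORDER BY day RANGE BETWEEN 1 PRECEDING AND CURRENT ROW) AS range_sum
    FROM deposits
    ORDER BY day, id
""").fetchall()

for row in rows:
    print(row)
```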
SIMPLE EXAMPLE
lead(row_number) over (partition by 'X' order by row_number) next_number,
first_value(row_number) over (partition by 'X' order by row_number rows between 2 FOLLOWING and 3 FOLLOWING)
  number_after_the_next_number,
sum(row_number) over (partition by 'X' order by row_number rows between 1 FOLLOWING and 2 FOLLOWING)
  sum_of_next_2_nums,
sum(row_number) over (partition by 'X' order by row_number rows between 1 FOLLOWING and UNBOUNDED FOLLOWING)
  sum_nums_after_this_to_the_end,
sum(row_number) over (partition by 'X' order by row_number rows between 1 PRECEDING and 1 FOLLOWING)
  sum_nums_1_before_to_1_after
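The same frames can be tried on the numbers 1 to 5. This is a sketch with SQLite standing in for Oracle and a hypothetical column named n in place of row_number. Because partition by 'X' puts every row in one partition, the sketch simply omits PARTITION BY. Frames that run past the last row are empty, so they yield NULL (None).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE nums (n INTEGER)")
con.executemany("INSERT INTO nums VALUES (?)", [(i,) for i in range(1, 6)])

rows = con.execute("""
    SELECT n,
           lead(n) OVER (ORDER BY n) AS next_number,
           first_value(n) OVER (ORDER BY n ROWS BETWEEN 2 FOLLOWING AND 3 FOLLOWING)
               AS number_after_the_next_number,
           sum(n) OVER (ORDER BY n ROWS BETWEEN 1 FOLLOWING AND 2 FOLLOWING)
               AS sum_of_next_2_nums,
           sum(n) OVER (ORDER BY n ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)
               AS sum_nums_after_this_to_the_end,
           sum(n) OVER (ORDER BY n ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
               AS sum_nums_1_before_to_1_after
    FROM nums
    ORDER BY n
""").fetchall()

for row in rows:
    print(row)
```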
FILLING HOLES
Can you tell me what a drawer’s end-of-day totals are each day?
• Lots of missing days
• How can we fill in those gaps?
LET’S GET THE NEXT USED DATE ON EACH ROW
lead(branch_date) over (partition by branch_code,cashbox_id order by branch_date) next_used_date
Let’s fix this NULL: the last row has no next used date
AF’S CAN BE USED ALMOST ANYWHERE
case
  when lead(branch_date) over (partition by branch_code, cashbox_id order by branch_date) is null then
    branch_date
  else
    lead(branch_date) over (partition by branch_code, cashbox_id order by branch_date)
end next_used_date,
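The CASE can also be written more compactly as COALESCE (or Oracle's NVL) around the LEAD. This sketch uses a hypothetical drawer table and SQLite in place of Oracle. It shows the raw LEAD result, NULL on the last row, next to the patched next_used_date.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE drawer (branch_code TEXT, cashbox_id INTEGER, "
            "branch_date TEXT, eod_total REAL)")
con.executemany("INSERT INTO drawer VALUES (?, ?, ?, ?)", [
    ("B1", 7, "2016-11-15", 500.0),
    ("B1", 7, "2016-11-16", 650.0),
    ("B1", 7, "2016-11-19", 300.0),
])

rows = con.execute("""
    SELECT branch_date,
           lead(branch_date) OVER (PARTITION BY branch_code, cashbox_id ORDER BY branch_date) AS raw_next,
           -- same effect as the CASE: fall back to the row's own date when LEAD is NULL
           coalesce(lead(branch_date) OVER (PARTITION BY branch_code, cashbox_id ORDER BY branch_date),
                    branch_date) AS next_used_date
    FROM drawer
    ORDER BY branch_date
""").fetchall()

for row in rows:
    print(row)
```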
NULLS FIXED!
[Slide screenshots: before and after]
But we still have gaps…
JOIN THIS TO A “CALENDAR”
• Begin date: to_date('20161101','YYYYMMDD')
• CONNECT BY LEVEL <= some big number larger than how far you want to go back
• Adding ROWNUM - 1 to the begin date calculates each date, out to the “end date”
SELECT
  to_date('20161101','YYYYMMDD') + ROWNUM - 1 calendar_date
FROM ( SELECT 1 just_a_column
       FROM dual
       CONNECT BY LEVEL <= 10000 )
JOINING TO A CALENDAR
WHERE calendar_date BETWEEN branch_date and next_used_date-1
• 20161115 is between 20161115 and (20161116 - 1)
• The 20th is missing, but 20161120 is between 20161119 and (20161121 - 1)
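The whole gap-filling technique fits in one query. This is a sketch under several assumptions: hypothetical drawer rows, SQLite instead of Oracle, and a recursive WITH clause swapped in for the CONNECT BY LEVEL calendar, because SQLite has no CONNECT BY. One deliberate change: if the last row's next_used_date defaults to its own branch_date, the condition BETWEEN branch_date AND next_used_date - 1 matches no calendar dates, and that day drops out. This sketch therefore defaults the last row to branch_date + 1 day so it survives the join.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE drawer (branch_code TEXT, cashbox_id INTEGER, "
            "branch_date TEXT, eod_total REAL)")
con.executemany("INSERT INTO drawer VALUES (?, ?, ?, ?)", [
    ("B1", 7, "2016-11-15", 500.0),
    ("B1", 7, "2016-11-16", 650.0),
    ("B1", 7, "2016-11-19", 300.0),
    ("B1", 7, "2016-11-21", 800.0),
])

rows = con.execute("""
    WITH RECURSIVE calendar(calendar_date) AS (
        -- stand-in for the CONNECT BY LEVEL generator: one row per day from the begin date
        SELECT date('2016-11-01')
        UNION ALL
        SELECT date(calendar_date, '+1 day') FROM calendar
        WHERE calendar_date < date('2016-11-30')
    ),
    used AS (
        SELECT branch_code, cashbox_id, branch_date, eod_total,
               -- assumption: the last row defaults to the next day so its own date is kept
               coalesce(lead(branch_date) OVER (PARTITION BY branch_code, cashbox_id
                                                ORDER BY branch_date),
                        date(branch_date, '+1 day')) AS next_used_date
        FROM drawer
    )
    SELECT c.calendar_date, u.eod_total
    FROM used u
    JOIN calendar c
      ON c.calendar_date BETWEEN u.branch_date AND date(u.next_used_date, '-1 day')
    ORDER BY c.calendar_date
""").fetchall()

for row in rows:
    print(row)
```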
FILLED GAPS – THANKS TO AN AF
[Slide screenshots: before and after]
HOW BIG IS THAT CANYON?
• The department wanted to know details of accounts going negative
• They wanted to know how deep and how wide the “canyon” was when looking at a daily history of account balances
[Chart: daily account balance dipping below zero, annotated “How deep?”, “How wide?”, “Start time?” and “End time?”]
USE PATTERN MATCHING (12C)
[Slide screenshots: the data (a chart of daily balances) and the pattern-matching result]
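To make the “canyon” idea concrete, here is a plain-Python stand-in for what a 12c MATCH_RECOGNIZE pattern like (NEG+) reports. It is not Oracle's implementation, and the helper name and daily balances are hypothetical. Each run of consecutive negative balances becomes one match, with its start, end, depth (lowest balance) and width (number of daily rows below zero).

```python
from itertools import groupby


def find_canyons(balances):
    """Return (start_day, end_day, depth, width) for each run of consecutive negative balances.

    `balances` is a list of (day_number, balance) tuples already ordered by day.
    """
    canyons = []
    # Split the history into alternating runs of negative / non-negative balances.
    for is_negative, run in groupby(balances, key=lambda row: row[1] < 0):
        run = list(run)
        if is_negative:
            canyons.append((
                run[0][0],                     # start: first negative day
                run[-1][0],                    # end: last negative day
                min(bal for _, bal in run),    # depth: lowest balance reached
                len(run),                      # width: number of daily rows below zero
            ))
    return canyons


# Hypothetical daily balances as (day number, balance).
daily = [(1, 500), (2, -200), (3, -1500), (4, -300), (5, 1000), (6, -50), (7, 200)]
print(find_canyons(daily))
```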
THINGS YOU CAN DO WITH IT:
• Find V, W and other patterns in Stock Prices
• Find timeframes of high database use
• Group clicks in web logs into sessions
• Detect traversal patterns of Finite State Machines
• We won’t go much deeper… but look into these, they’re neat!
NOT COMPLICATED, JUST INVOLVED
• Useful wherever your data could be plotted as a line graph, e.g. a log of events
• Lots of great resources:
• Ask Tom - http://www.oracle.com/technetwork/issue-archive/2013/13-nov/o63asktom-2034271.html
• GitHub - https://github.com/oracle/analytical-sql-examples/tree/master/pattern-matching
• Burleson - http://www.dba-oracle.com/t_sql_match_recognize.htm
• YouTube has some good demos too
AF PERFORMANCE?
• Keep an eye on performance – these do lots of sorts
• Try to use indexes, filter your data before applying analytic functions
• Sometimes AF’s can improve performance; other times they can hurt it
• Tom Kyte says: In general, analytics are great for answering "really big" questions or questions against "small sets" https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:1137250200346660664
QUESTIONS?