©2011 Oracle – All Rights Reserved
Oracle R Enterprise – Training Sessions
Session 5: Operationalizing R Scripts
Mark Hornick, Senior Manager, Development
Oracle Advanced Analytics
The following is intended to outline our general product direction.
It is intended for information purposes only, and may not be
incorporated into any contract. It is not a commitment to deliver
any material, code, or functionality, and should not be relied upon
in making purchasing decisions.
The development, release, and timing of any features or
functionality described for Oracle’s products remain at the sole
discretion of Oracle.
Oracle R Enterprise Training Sessions
Thursday, December 1, 2:00 PM ET: Getting Started with Oracle R Enterprise (ORE)
• Oracle R Enterprise Overview
• Installation of R
• Installation of Oracle R Enterprise
• Connecting to Exadata from R
Tuesday, December 6, 11:00 AM ET: Introduction to the R Language and Environment
• R Language Basics
• Producing Graphs in R
Thursday, December 8, 1:00 PM ET: ORE Transparency Layer
• Interacting with Database Tables
• Manipulating and transforming data through ORE
Tuesday, December 13, 11:00 AM ET: ORE Embedded R Script Execution
• Execution through R interface
• Execution through SQL interface
Thursday, December 15, 2:00 PM ET: Operationalizing R Scripts
• From Analyst to Production
• Integration with OBIEE
• XML graph generation using SQL
Tuesday, December 20, 11:00 AM ET: Advanced Topics
• Base SAS equivalent functionality
• ORE support for Hadoop and Map-Reduce
• Use of ORE in Exadata and BDA environments
Topics
• XML graph generation using SQL
• Integration with BI Publisher and OBIEE
• From analyst to production
XML Output Generation through SQL
rqEval – generate XML string for graphic output
set long 20000
set pages 1000
begin
sys.rqScriptCreate('Example6',
'function(){
res <- 1:10
plot( 1:100, rnorm(100), pch = 21,
bg = "red", cex = 2 )
res
}');
end;
/
select value
from table(rqEval( NULL,'XML','Example6'));
• Execute the function, which plots 100 random numbers
• The function returns a vector with values 1 to 10
• No parameters are specified
• Return the results as XML
• View the XML VALUE returned, which can be consumed by BI Publisher
VALUE
---------------------------------------------
<?xml version="1.0"?> <root> <RDataType type="Vector"><variable name="result" type="numeric"> <value>1</value> <value>2</value> <value>3</value> <value>4</value> <value>5</value> <value>6</value> <value>7</value> <value>8</value> <value>9</value> <value>10</value> </variable></RDataType> <img src="data:image/png;base64,
iVBORw0KGgoAAAANSUhEUgAAAeAAAAHgCAIAAADytinCAAAgAElEQVR4nOzdeVxN+f8H8Ndt
URRiCClLdtnX9n3VJm0SQ8gwZJ/BEMLY9yXrMDO2ilBZE7JlzRqRES1CKaXSfu/vj778onuO
lrv3fj78MXM+p8953Zb3Pfecz/l8ODweD4QQQiSPnLgDEEII4Y8KNCGESCgq0IQQIqGoQBNC
iISiAk0IIRKKCjQhhEgoKtCEECKhqEATQoiEogJNCCESigo0IYRIKCrQhBAioahAE0KIhKIC
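The VALUE above nests the R result in root, RDataType, variable and value elements and appends the plot as a base64 data:image/png URI. The base R sketch below is for illustration only and is not ORE's serializer. The hypothetical vectorToXml() mimics that nesting for a vector. The hypothetical extractPngBase64() pulls the image payload back out of such a string, assuming the src attribute is double-quoted as it is above.

```r
# Hypothetical helper (not part of ORE): serialize a vector using the same
# element nesting seen in the rqEval XML output above.
vectorToXml <- function(x, name = "result") {
  values <- paste0("<value>", x, "</value>", collapse = " ")
  paste0('<?xml version="1.0"?> <root> <RDataType type="Vector">',
         '<variable name="', name, '" type="', mode(x), '"> ',
         values, ' </variable></RDataType> </root>')
}

# Hypothetical helper: return the base64 payload of the first
# data:image/png URI in a string, with embedded line breaks removed.
# Assumes the URI sits in a double-quoted attribute, so the match stops at '"'.
extractPngBase64 <- function(xml) {
  m <- regmatches(xml, regexpr("data:image/png;base64,[A-Za-z0-9+/=[:space:]]+", xml))
  if (length(m) == 0) return(NA_character_)
  gsub("[[:space:]]", "", sub("data:image/png;base64,", "", m, fixed = TRUE))
}

xml <- vectorToXml(1:10)   # mode(1:10) is "numeric", matching type="numeric"
```

A payload that starts with iVBORw0KGgo is the base64 encoding of the PNG file signature. This gives a quick sanity check that the image survived the round trip through SQL.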
Of the 36 busiest airports, which are the best/worst for arrival delay?
rqEval – 36 busiest airports graph
begin
sys.rqScriptCreate('BusiestAirports',
' function(){
ontime <- ONTIME_S
n <- 36
aggdata <- aggregate(ontime$DEST, by = list(ontime$DEST), FUN = length)
minx <- min(head(sort(aggdata$x, decreasing = TRUE), n))
busiest_airports <- aggdata$Group.1[aggdata$x >= minx, drop = TRUE]
delay <- ontime$ARRDELAY[ontime$DEST %in% busiest_airports '||'&'||' ontime$YEAR == 2007]
dest <- ontime$DEST[ontime$DEST %in% busiest_airports '||'&'||' ontime$YEAR == 2007, drop = TRUE]
dest <- reorder(dest, delay, FUN = median, na.rm = TRUE)
bd <- split(delay, dest)
res <- boxplot(bd, notch = TRUE, col = "gold", cex = 0.5,
outline = FALSE, horizontal = TRUE, yaxt = "n",
main = paste("2007 Flight Delays by Airport, top ",n, " busiest",sep=""),
xlab = "Delay (minutes)", ylab = "Airport")
labels <- levels(dest)
text(par("usr")[1] - 3, 1:length(labels), srt = 0, adj = 1,
labels = labels, xpd = TRUE, cex = 0.75)
topbusiest <-
data.frame(UNIQUECARRIER=res$names,
MIN=res$stats[1,], Q1 =res$stats[2,], CONF_LOW=res$conf[1,], MEDIAN=res$stats[3,],
CONF_HIGH=res$conf[2,], Q3 =res$stats[4,], MAX=res$stats[5,])
topbusiest } ');
end;
/
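This script keeps every airport whose flight count is at least the n-th largest count, so ties can admit more than n airports. reorder() then sorts the airports by median delay. The minimal base R sketch below shows both effects locally on a small made-up data.frame. The name ontime_demo is hypothetical and stands in for ONTIME_S.

```r
# Made-up rows standing in for ONTIME_S (hypothetical data, for illustration).
ontime_demo <- data.frame(
  DEST     = c("ATL", "ATL", "ATL", "ORD", "ORD", "DFW", "DFW", "SFO"),
  ARRDELAY = c(5, 20, -3, 40, 10, 0, 2, 15),
  YEAR     = 2007,
  stringsAsFactors = FALSE
)
n <- 2

# Flights per destination, then the n-th largest count as the threshold.
aggdata <- aggregate(ontime_demo$DEST, by = list(ontime_demo$DEST), FUN = length)
minx <- min(head(sort(aggdata$x, decreasing = TRUE), n))
busiest_airports <- aggdata$Group.1[aggdata$x >= minx]   # DFW and ORD tie at 2

# Keep 2007 flights to those airports and order airports by median delay.
sel <- ontime_demo$DEST %in% busiest_airports & ontime_demo$YEAR == 2007
delay <- ontime_demo$ARRDELAY[sel]
dest <- reorder(factor(ontime_demo$DEST[sel]), delay, FUN = median, na.rm = TRUE)
bd <- split(delay, dest)
print(levels(dest))
```

With n = 2, three airports pass the threshold because of the tie. The split list comes back in ascending median order, which is the order in which boxplot() draws the boxes.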
rqEval – 36 busiest airports graph
set long 20000
set pages 1000
select value
from table(rqEval( NULL,'XML','BusiestAirports'));
begin
sys.rqScriptDrop('BusiestAirports');
end;
/
• Execute the function that graphs the top 36 busiest airports. No parameters are specified.
• Return the results as XML
• View the XML VALUE returned, which can be consumed by BI Publisher
• View the table data associated with the graph
• Drop the script with sys.rqScriptDrop when it is no longer needed
Integration with BI Publisher and OBIEE
Enabling ORE Results in OBIEE: Approaches
[Diagram: two approaches. Independent R script execution: an ORE R script runs in Oracle Database and produces a table that OBIEE dashboards use. R script execution from dashboard: OBIEE dashboards use BI Publisher (BIP), with templates built in the MS Word BIP Template Builder, to invoke an ORE R script in Oracle Database for graphics/table result generation.]
Integration with OBIEE
• Invoke R calculations from BI Publisher and OBIEE
– Define ORE-based SQL query in Data Model
– Embedded R script execution
– Reference R scripts from database table repository
• Retrieve results from R
– Statistical output only, as XML: both the OBIEE RPD and the BI Publisher online web template builder
– Structured tabular output, as a data.frame: both OBIEE and BIP data models
– Statistical output plus graphics, as XML: only BI Publisher via the Word RTF template builder
– Graphics only, as XML: only BI Publisher via the Word RTF template builder
[Diagram: the ORE R script repository in Oracle Database, returning XML and data.frame results through SQL.]
Getting started
• Install “Template Builder for Word”
Create a JDBC Connection
• Add a data source
Create a Data Model
• In BI Publisher, click New Data Model
• Specify your Data Source
• Save data model
• Click “Data Sets”
Setting up the Data Model
• Select SQL Query
• Specify Name, Data Source, SQL Query
• Click OK
• Click OK
• Change to XML
begin
sys.rqScriptCreate('BusiestAirports2',
'function(dat){
# rqTableEval passes the cursor row in as a data.frame; column VAL holds the TOP_BUSY value
n <- dat$VAL
ontime <- ONTIME_S
aggdata <- aggregate(ontime$DEST, by = list(ontime$DEST), FUN = length)
minx <- min(head(sort(aggdata$x, decreasing = TRUE), n))
busiest_airports <- aggdata$Group.1[aggdata$x >= minx, drop = TRUE]
delay <- ontime$ARRDELAY[ontime$DEST %in% busiest_airports]
dest <- ontime$DEST[ontime$DEST %in% busiest_airports, drop = TRUE]
dest <- reorder(dest, delay, FUN = median, na.rm = TRUE)
bd <- split(delay, dest)
res <- boxplot(bd, notch = TRUE, col = "gold", cex = 0.5,
outline = FALSE, horizontal = TRUE, yaxt = "n",
main = paste("Flight Delays by Airport for top", n, "busiest"),
xlab = "Delay (minutes)", ylab = "Airport")
labels <- levels(dest)
text(par("usr")[1] - 3, 1: length(labels), srt = 0, adj = 1, labels = labels, xpd = TRUE, cex = 0.75)
topbusiest <-data.frame(UNIQUECARRIER=res$names,
MIN=res$stats[1,], Q1 =res$stats[2,], CONF_LOW=res$conf[1,], MEDIAN=res$stats[3,],
CONF_HIGH=res$conf[2,], Q3 =res$stats[4,], MAX=res$stats[5,])
topbusiest } ');
end;
/
select value
from table(rqTableEval( cursor(select :top_busy VAL from dual),
NULL,'XML','BusiestAirports2'));
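rqTableEval hands the rows of its input cursor to the R function's first argument as a data.frame. The column names come from the SQL aliases, and unquoted Oracle aliases are upper case, so the column is VAL, not val. The small base R sketch below simulates that input locally. scalarArg() is a hypothetical helper, not an ORE function.

```r
# Local stand-in for the one-row input produced by
# cursor(select :top_busy VAL from dual) when TOP_BUSY is 36.
dat <- data.frame(VAL = 36)

# Hypothetical helper: fetch one named value from a one-row argument frame,
# failing loudly if the row count or column name is not what the script expects.
scalarArg <- function(dat, col) {
  stopifnot(is.data.frame(dat), nrow(dat) == 1, col %in% names(dat))
  dat[[col]][1]
}

n <- scalarArg(dat, "VAL")
```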
R script “gotchas” for SQL*Plus and BI Publisher / OBIEE
• Do not use “&”, since it is interpreted as a prompted (substitution) parameter
– Replace it with concatenation in the R script string: '||'&'||'
– Place the body of the R function in a file and source its contents dynamically
– Create the function via sys.rqScriptCreate and reference it by name in the rq*Eval function
– SQL*Plus also accepts the following settings:
SET DEFINE ~
SET ESCAPE '\' e.g., 'text \& text'
SET SCAN OFF -- to not scan for substitution variables
• Follow each colon “:” with a space; otherwise it is interpreted as a bind variable
– E.g., R uses 1:10 to create a sequence; write it as 1: 10 instead
• A double dash “--” starts a comment in JDBC and SQL and incorrectly terminates the string
– Do not use it in R strings, e.g., “Title for Graph -- follow on comment”
• R allows strings to span line breaks (<CR>s)
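These rules can be checked mechanically before an R function body is pasted into a SQL string. The base R sketch below is hypothetical; neither helper is part of ORE or SQL*Plus. checkSqlGotchas() reports the patterns listed above. toSqlLiteral() doubles single quotes and applies the '||'&'||' concatenation workaround.

```r
# Hypothetical pre-flight check: flag patterns that SQL*Plus, JDBC or
# BI Publisher may misread inside an R script string.
checkSqlGotchas <- function(script) {
  issues <- character(0)
  if (grepl("&", script, fixed = TRUE))
    issues <- c(issues, "'&' is read as a substitution variable")
  if (grepl(":[^[:space:]]", script))
    issues <- c(issues, "':' not followed by a space may be read as a bind variable")
  if (grepl("--", script, fixed = TRUE))
    issues <- c(issues, "'--' starts a SQL comment")
  issues
}

# Hypothetical helper: wrap R source as a SQL string expression, doubling
# single quotes and splitting the literal around each '&'.
toSqlLiteral <- function(script) {
  s <- gsub("'", "''", script, fixed = TRUE)
  s <- gsub("&", "'||'&'||'", s, fixed = TRUE)
  paste0("'", s, "'")
}

checkSqlGotchas("x <- 1:10")   # reports the colon issue
```

The checker only reports problems and does not rewrite colons automatically. A blind rewrite would also alter "::" and colons inside R string literals.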
Set up parameter “TOP_BUSY”
• Click “TOP_BUSY”
• Set Data Type to Integer
• Set Default Value to 36
• Set Parameter Type to Menu
• Deselect all items below
Specify List of Values
• Click List of Values
• Set Type to Fixed Data
• Add values at bottom
• Save Data Model
• Click Get XML Output
Save XML Data
• Click Run
• View output
• Export XML and save to local file system
Create the .rtf file using MS Word
• Click Sample XML and load saved file
Insert the image
• Click ab | Field
• Select “img”
• Click Insert
• Click Close
Specify image properties
• Double click “img” in Word document
• Select Advanced tab
• Paste in the following
<fo:instream-foreign-object content-type="image/png" width="600.0pt" height="400pt" xdofo:alt="An Image"><?img?></fo:instream-foreign-object>
• Click OK
• Save the Word file as RTF
• Click PDF to check result
View the PDF file
• Looks good!
• Switch to OBIEE
• Create a new report
• Upload rtf file
Upload template
• Specify Layout Name
• Specify Template rtf File
• Specify Type as RTF Template
• Specify Locale as English
• Click Upload
• Click Save As
• Click View Report
View the report, select different values and apply
Integration with an OBIEE RPD
Start with the RPD
• Install OBIEE client http://www.oracle.com/technetwork/middleware/bi-enterprise-edition/downloads/bus-intelligence-11g-165436.html
• Open RPD in offline edit mode
• Data Source Name
– A tnsnames.ora entry, or a full connect descriptor on a single line (no <CR>s):
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=adc211-228.us.oracle.com)(PORT=1521))(CONNECT_DATA=(SID=ore)))
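The descriptor must be entered without line breaks, so it can help to generate it rather than retype it. Below is a hypothetical base R helper that assembles the single-line form from host, port and SID. connectDescriptor() is illustrative only.

```r
# Hypothetical helper: build a one-line connect descriptor (no <CR>s).
connectDescriptor <- function(host, port = 1521L, sid) {
  sprintf("(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=%s)(PORT=%d))(CONNECT_DATA=(SID=%s)))",
          host, as.integer(port), sid)
}

dsn <- connectDescriptor("adc211-228.us.oracle.com", 1521, "ore")
```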
Create new physical table
• Example initialization string
select UNIQUECARRIER, MINI, Q1, CONF_LOW,
MEDIANI, CONF_HIGH, Q3, MAXI from
table(rqEval(NULL,'select cast(''a'' as varchar2(10))
UNIQUECARRIER, 1 MINI, 2 Q1, 3 CONF_LOW, 4
MEDIANI, 5 CONF_HIGH, 6 Q3, 7 MAXI from
dual','BusiestAirports'))
– Avoid the wrong type of line breaks (<CR>s)
– List the columns explicitly; do not use “select *”
– Do not end the query with a semicolon (;)
• Specify columns
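In this initialization string, the second argument of rqEval is a query whose columns define the shape of the returned table. It uses a varchar2 cast for the text column and numeric literals for the statistics. A hypothetical base R helper can derive such a query from a one-row prototype data.frame. Both shapeQuery() and its default width of 10 are assumptions, not ORE features. When the result is nested inside another SQL string literal, its single quotes must be doubled, as in the example above.

```r
# Hypothetical helper: derive an output-shape query from a prototype
# data.frame. Character/factor columns become cast('a' as varchar2(width));
# all other columns become the numeric literal 1.
shapeQuery <- function(proto, width = 10) {
  cols <- vapply(names(proto), function(nm) {
    if (is.character(proto[[nm]]) || is.factor(proto[[nm]]))
      sprintf("cast('a' as varchar2(%d)) %s", as.integer(width), nm)
    else
      sprintf("1 %s", nm)
  }, character(1))
  paste("select", paste(cols, collapse = ", "), "from dual")
}

proto <- data.frame(UNIQUECARRIER = "ATL", MINI = 0, Q1 = 0, stringsAsFactors = FALSE)
shapeQuery(proto)
```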
Access the data
• Update Row Count
• View Data…
Set up business layer
• Drag the table to the Business Model and Mapping layer
• Drag it again to create a reference table for the relationship
– Specify key columns
– Keep only the key column(s) in the ID table
– Select both tables and right-click for Business Model Diagram > Selected tables only
Business Layer to Presentation Layer
• Click the join tool and click/drag from the key table to the fact table
• Click OK on Logical Join dialog
• Drag fact table to Presentation Layer
A more demanding example
Target graph and statistics
Suggestions for getting started
• In R
– Define the function arguments to control filter and graph settings
– Implement the function for a single graph, then extend to two
– Define reusable sub-functions that factor out common functionality
• In SQL
– Convert the function so that it can be invoked via rqTableEval
– Use the input data to specify the function parameters in a single row of data
– Extract the values from the input data in the R script
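SQL has no boolean literal, so flags travel through the single input row as numbers, while base R's sort() expects a logical decreasing argument. The hypothetical base R helper below converts the row into typed R arguments. readArgs() is illustrative only. Its column names follow the rqTableEval call used later in this session. The 0/1 flags and the -1 "no second graph" value for START_YEAR_B are conventions of this example.

```r
# Hypothetical helper: convert the one-row argument data.frame into typed
# R values. Flags arrive as 0/1 numbers; START_YEAR_B = -1 means one graph.
readArgs <- function(dat) {
  stopifnot(is.data.frame(dat), nrow(dat) == 1)
  list(
    numAirports = as.integer(dat$NUM_AIRPORTS),
    busiestA    = as.logical(dat$BUSIEST_A),   # used as sort(decreasing = ...)
    varWidth    = as.logical(dat$VAR_WIDTH),
    twoGraphs   = dat$START_YEAR_B != -1
  )
}

args <- readArgs(data.frame(NUM_AIRPORTS = 5, BUSIEST_A = 1, VAR_WIDTH = 0, START_YEAR_B = -1))
```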
Three sub-functions
• Prepare the data
– Filter and split the data and provide it to the boxplot function
• Extract boxplot statistics
– Create a data.frame from the statistics results of boxplot
• Create graph title
– Create a title string based on the orientation (h/v) and other inputs
prepData ( )
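prepData <- function(numAirports,uniquecarriers,startYear,endYear,region,busiest) {
  # region is accepted but not used in this version
  # busiest must be TRUE or FALSE (it is passed to the decreasing argument of sort)
  n <- numAirports
  dat <- ONTIME_S
  dat <- dat[dat$YEAR>=startYear & dat$YEAR<=endYear,]
  if(uniquecarriers != "ALL") {
    dat <- dat[dat$UNIQUECARRIER %in% uniquecarriers, ]
  }
  # flight counts per destination airport
  aggdata <- aggregate(dat$DEST, by = list(dat$DEST), FUN = length)
  # threshold is the smallest of the n largest counts (busiest)
  # or the largest of the n smallest counts (least busy)
  topn <- head(sort(aggdata$x, decreasing = busiest), n)
  if(busiest){
    cutoff <- min(topn)
    busiest_airports <-aggdata$Group.1[aggdata$x >= cutoff, drop = TRUE]
  } else {
    cutoff <- max(topn)
    busiest_airports <-aggdata$Group.1[aggdata$x <= cutoff, drop = TRUE]
  }
  delay <- dat$ARRDELAY[dat$DEST %in% busiest_airports]
  dest <- dat$DEST[dat$DEST %in% busiest_airports, drop = TRUE]
  # order airports by median arrival delay, then split delays per airport
  dest <- reorder(dest, delay, FUN = median, na.rm = TRUE)
  bd <- split(delay, dest)
  list(bd=bd,dest=levels(dest))
}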
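Selecting airports by a count threshold works differently in each direction. For the n busiest destinations, the cutoff is the smallest of the n largest counts, and counts at or above it are kept. For the n least busy, it is the largest of the n smallest counts, and counts at or below it are kept. A hypothetical base R sketch on made-up counts:

```r
# Hypothetical helper: count threshold for the n busiest (busiest = TRUE)
# or n least busy (busiest = FALSE) destinations. Ties at the threshold
# are kept, so more than n airports can be selected.
countCutoff <- function(counts, n, busiest = TRUE) {
  topn <- head(sort(counts, decreasing = busiest), n)
  if (busiest) min(topn) else max(topn)
}

counts  <- c(ATL = 50, ORD = 40, DFW = 40, SFO = 10, BOS = 5)   # made-up counts
top2    <- names(counts)[counts >= countCutoff(counts, 2, TRUE)]
bottom2 <- names(counts)[counts <= countCutoff(counts, 2, FALSE)]
```

Using min() for the least-busy case would keep only the single smallest count, not the n smallest.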
extractBoxplotStats ( )
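extractBoxplotStats <- function (res) {
  # res$stats is a 5-row matrix with one column per group: lower whisker, Q1,
  # median, Q3, upper whisker. MIN and MAX are therefore whisker ends, not
  # necessarily the data extremes. res$conf holds the lower/upper notch limits.
  # The UNIQUECARRIER column holds the group names (airports in these examples).
  stats <-data.frame(UNIQUECARRIER=res$names,
    MIN=res$stats[1,], Q1 =res$stats[2,], CONF_LOW=res$conf[1,], MEDIAN=res$stats[3,],
    CONF_HIGH=res$conf[2,], Q3 =res$stats[4,], MAX=res$stats[5,])
  stats
}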
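boxplot() can return its statistics without drawing, which makes extractBoxplotStats() easy to check outside the database. The base R sketch below uses made-up delay values, and the airport codes AAA, BBB and CCC are placeholders. It also shows that MAX is the upper whisker end, so the outlying 40 for CCC is not reported.

```r
# Copy of extractBoxplotStats() so this sketch runs on its own.
extractBoxplotStats <- function(res) {
  data.frame(UNIQUECARRIER = res$names,
             MIN = res$stats[1, ], Q1 = res$stats[2, ], CONF_LOW = res$conf[1, ],
             MEDIAN = res$stats[3, ], CONF_HIGH = res$conf[2, ],
             Q3 = res$stats[4, ], MAX = res$stats[5, ],
             stringsAsFactors = FALSE)
}

# Made-up delays for three placeholder airports.
bd <- list(AAA = c(-5, 0, 3, 8, 12, 30),
           BBB = c(1, 2, 4, 9, 15, 22),
           CCC = c(0, 0, 5, 6, 7, 40))

# plot = FALSE computes the same statistics object without drawing anything.
res <- boxplot(bd, plot = FALSE)
stats <- extractBoxplotStats(res)
```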
createTitle ( )
createTitle <- function (uniquecarriers,busiest,numAirports,startYear,endYear,sideBySide){
if(sideBySide == 'h')
paste("Flight Delays by Airport \nfor",ifelse(uniquecarriers=="ALL","",uniquecarriers),
ifelse (busiest,"top","bottom"),
numAirports, "busiest \nfor years", startYear,"-",endYear)
else
paste("Flight Delays by Airport for",ifelse(uniquecarriers=="ALL","",uniquecarriers),
ifelse (busiest,"top","bottom"),
numAirports, "busiest for years", startYear,"-",endYear)
}
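When uniquecarriers is "ALL", createTitle() pastes an empty string between words, which leaves a double space in the title. The hypothetical variant below builds the same wording but omits empty parts and chooses the line break once. createTitleCompact() is illustrative and not from the original.

```r
# Hypothetical variant of createTitle(): same wording, but an "ALL" carrier
# is dropped instead of pasted as an empty string, and the horizontal
# layout ("h") inserts the same line breaks as the original.
createTitleCompact <- function(uniquecarriers, busiest, numAirports,
                               startYear, endYear, sideBySide) {
  br <- if (sideBySide == "h") "\n" else ""
  parts <- c(paste0("Flight Delays by Airport ", br, "for"),
             if (uniquecarriers != "ALL") uniquecarriers,
             if (busiest) "top" else "bottom",
             numAirports,
             paste0("busiest ", br, "for years"),
             startYear, "-", endYear)
  paste(parts, collapse = " ")
}

createTitleCompact("ALL", TRUE, 5, 2006, 2008, "v")
```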
BusiestAirports ( )
BusiestAirports <- function(numAirports,uniquecarriers,startYearA,endYearA,regionA,busiestA,colA="gold",
  startYearB=-1,endYearB=NULL,regionB=NULL,busiestB=NULL,colB="red",
  varWidth=FALSE,sideBySide='v'){
  if(startYearB != -1) { # want two graphs, else ignore sideBySide
    if (sideBySide == 'v') par(mfrow=c(2,1))
    else if (sideBySide == 'h') par(mfrow=c(1,2))
  }
prepA <- prepData(numAirports,uniquecarriers,startYearA,endYearA,regionA,busiestA)
titleA <- createTitle(uniquecarriers,busiestA,numAirports,startYearA,endYearA,sideBySide)
resA <- boxplot(prepA[["bd"]], notch = TRUE, col = colA, cex = 0.5,varwidth=varWidth,
outline = FALSE, horizontal = TRUE, yaxt = "n", main = titleA,
xlab = "Delay (minutes)", ylab = "Airport")
labels <- prepA[["dest"]]
text(par("usr")[1] - 3, 1: length(labels), srt = 0, adj = 1, labels = labels, xpd = TRUE, cex = 0.75)
busiestStatsA <-extractBoxplotStats(resA)
busiestStats <- cbind(group=1,busiestStatsA)
busiestStats
}
BusiestAirports ( ) – with two graphs
BusiestAirports <- function(numAirports,uniquecarriers,startYearA,endYearA,regionA,busiestA,colA="gold",
  startYearB=-1,endYearB=NULL,regionB=NULL,busiestB=NULL,colB="red",
  varWidth=FALSE,sideBySide='v'){
  if(startYearB != -1) { # want two graphs, else ignore sideBySide
    if (sideBySide == 'v') par(mfrow=c(2,1))
    else if (sideBySide == 'h') par(mfrow=c(1,2))
  }
  prepA <- prepData(numAirports,uniquecarriers,startYearA,endYearA,regionA,busiestA)
  titleA <- createTitle(uniquecarriers,busiestA,numAirports,startYearA,endYearA,sideBySide)
  resA <- boxplot(prepA[["bd"]], notch = TRUE, col = colA, cex = 0.5,varwidth=varWidth,
    outline = FALSE, horizontal = TRUE, yaxt = "n", main = titleA,
    xlab = "Delay (minutes)", ylab = "Airport")
  labels <- prepA[["dest"]]
  text(par("usr")[1] - 3, 1: length(labels), srt = 0, adj = 1, labels = labels, xpd = TRUE, cex = 0.75)
  busiestStatsA <-extractBoxplotStats(resA)
  busiestStats <- cbind(group=1,busiestStatsA)
  if(startYearB != -1) { # want second graph
    prepB <- prepData(numAirports,uniquecarriers,startYearB,endYearB,regionB,busiestB)
    titleB <- createTitle(uniquecarriers,busiestB,numAirports,startYearB,endYearB,sideBySide)
    resB <- boxplot(prepB[["bd"]], notch = TRUE, col = colB, cex = 0.5,varwidth=varWidth,
      outline = FALSE, horizontal = TRUE, yaxt = "n", main=titleB,
      xlab = "Delay (minutes)", ylab = "Airport")
    labels <- prepB[["dest"]]
    text(par("usr")[1] - 3, 1: length(labels), srt = 0, adj = 1, labels = labels, xpd = TRUE, cex = 0.75)
    busiestStatsB <-extractBoxplotStats(resB)
    busiestStats <- rbind(busiestStats,cbind(group=2,busiestStatsB))
  }
  busiestStats
}
Almost there…
BusiestAirports(5,'AA',startYearA=2006,endYearA=2008,regionA='ALL',busiestA=TRUE,
startYearB=2002,endYearB=2005,regionB='ALL',
busiestB=TRUE,varWidth=TRUE,sideBySide='v')
Revise function to accept parameters from rqTableEval
begin
sys.rqScriptCreate ('BusiestAirports-1',
'function(dat) {
numAirports<- dat$NUM_AIRPORTS
uniquecarriers <- dat$UNIQUE_CARRIERS
startYearA <- dat$START_YEAR_A
endYearA <- dat$END_YEAR_A
regionA <- dat$REGION_A
busiestA <- as.logical(dat$BUSIEST_A)
colA <-dat$COL_A
startYearB <- dat$START_YEAR_B
endYearB <- dat$END_YEAR_B
regionB <- dat$REGION_B
busiestB <- as.logical(dat$BUSIEST_B)
colB <- dat$COL_B
varWidth <- as.logical(dat$VAR_WIDTH)
sideBySide <- dat$SIDE_BY_SIDE
# include sub-functions created earlier
# along with main body of function
# (double any single quotes, escape ampersands and follow each colon with a space)
...}');
end;
/
select value
from table(rqTableEval(
cursor(select 5 NUM_AIRPORTS,
'ALL' UNIQUE_CARRIERS,
2007 START_YEAR_A,
2007 END_YEAR_A,
'ALL' REGION_A,
1 BUSIEST_A,
'red' COL_A,
2008 START_YEAR_B,
2008 END_YEAR_B,
'ALL' REGION_B,
1 BUSIEST_B,
'gold' COL_B,
1 VAR_WIDTH,
'h' SIDE_BY_SIDE from dual),
NULL,'XML','BusiestAirports-1'));
Create rqTableEval for BI Publisher Data Source
select value
from table(rqTableEval( cursor(select :numAirports NUM_AIRPORTS,
:uniqueCarriers UNIQUE_CARRIERS,
:startYearA START_YEAR_A,
:endYearA END_YEAR_A,
:regionA REGION_A,
:busiestA BUSIEST_A,
:colA COL_A,
:startYearB START_YEAR_B,
:endYearB END_YEAR_B,
:regionB REGION_B,
:busiestB BUSIEST_B,
:colB COL_B,
:varWidth VAR_WIDTH,
:sideBySide SIDE_BY_SIDE from dual)
,NULL,'XML','BusiestAirports-1'))
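The cursor's select list follows a fixed pattern: a camelCase bind variable followed by an UPPER_SNAKE column alias. Generating it instead of typing it avoids alias typos. Below is a hypothetical base R helper; bindSelect() is illustrative only.

```r
# Hypothetical helper: build "select :bindName ALIAS, ... from dual",
# deriving each alias by upper-casing the camelCase name with underscores.
bindSelect <- function(params) {
  aliases <- toupper(gsub("([a-z0-9])([A-Z])", "\\1_\\2", params))
  paste0("select ", paste0(":", params, " ", aliases, collapse = ",\n  "), " from dual")
}

params <- c("numAirports", "uniqueCarriers", "startYearA", "endYearA", "regionA",
            "busiestA", "colA", "startYearB", "endYearB", "regionB", "busiestB",
            "colB", "varWidth", "sideBySide")
cat(bindSelect(params))
```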
Data Model – Data Set SQL Query
Data Model – Parameters Specification
Data Model – List of Values
XML Generation
Finished OBIEE Dashboard
From Analyst to Production
Components of Analytic Solution with ORE
Stages: Data → Data Preparation → R Script → R Function → rq*Eval → Test → Deployment
Filtering
Cleaning
Joining
Aggregating
Visualization
Database Load
Transformations
Analytical functions
User-defined functions
R package functions
Graph generation
Parameterize
Enable re-use
Package for db
Assess parallelism
BI Publisher
OBIEE
Populate db tables
Generate graphs
Select rq*Eval
Populate data/args
Exec through R
Exec via BIP/OBIEE