© OpenBI, LLC 2009 1
OpenBI, LLC
The Open Source Business Intelligence Experts
Extending Open Source BI platforms with R analytics.
Who we are
• Professional Services Firm
  – Specializing in BI and open source
  – Expert at bridging business and technology
  – Unwavering commitment to our customers
• Seasoned BI Professionals
  – Partners average 20+ years of experience
  – Demonstrated BI thought leadership (e.g. DM Review, B-Eye Network)
  – Reputation for high-quality service, personal and professional integrity
  – Deep roots in consulting and training services
• Extensive Expertise with BI technologies
  – Databases
  – ETL
  – Query/Reporting/OLAP
  – Dashboards & Scorecards
  – Statistical Modeling/Data Mining
  – Analytical CRM
Our Point in a Nutshell
“If the decision is going to be made by the facts, then everyone’s facts, as long as they are relevant, are equal. If the decision is going to be made on the basis of people’s opinions, then mine count for a lot more.”
Jim Barksdale, then CEO of Netscape
The Case for BI and Super-Crunching
• Business Problem: Flawed Decision-Making
• Technical Solution
  – BI
  – Analytics
  – Experimentation
• Business Solution: Performance Management Focus
• Strategic Solution: Evidence or Fact-based Decision-Making
Evidence Based Management
A new philosophy for decision-making…
• Management Mindset Change
  – Downgrade conventional wisdom and ego
  – Upgrade test results and facts
• Attitude of Wisdom
  – Humbly appreciate what you don’t know
  – Constantly question what you do know
  – Act on best knowledge available
• Scientifically-Based
  – Generate Hypothesis -> Conduct Research & Test -> Assemble Evidence & Draw Conclusions -> Act!
• Strategy as Hypothesis
  – “The organization is an unfinished prototype requiring trial programs, pilot studies, experimentation, etc.”
… requires a performance based approach.
Performance Management & BI
Performance Management directs BI investments to business results
• How am I doing?
  – Standard QRA
  – Executives as Consumers
• Why am I doing well or poorly?
  – Analytical Apps
  – Managers as Consumers
• What opportunities exist to improve performance in the future?
  – Data Mining & Operational BI
  – Line Staff as Consumers
  – Automated Decision-making
[Diagram labels: Measure; Explore & Execute; Strategize; Performance Relative to Objectives; Understand Trends & Anomalies; Enable Effective Business Decisions]
The R Project for Statistical Computing
• www.r-project.org
• Derived from the award-winning S language developed at Bell Labs by John Chambers
• Object-based and readily extensible
• Open source (GPL)
• R provides:
  – language
  – storage
  – data manipulation
  – statistical/mathematical procedures
  – production-quality graphics
Adoption of R
• One of the most successful open source projects
• Lingua franca of academic statistical computing
– Over 1M users world-wide
• Unix, Linux, Windows, MacOS ports
• Enthusiastic world-wide support forums
• 1650+ contributed packages to extend the base platform
• Latest procedures submitted by originators well before they're available commercially
• Cadre of R users/developers coming to the business world
Support for R
• Incredibly stable core product
• Excellent documentation/manuals/tutorials
• Large and growing # of R texts (42 in 2006-2009)
• Wiki, Newsletter
• 18 mailing/support lists
• International user conference, UseR! 2009, Agrocampus-Ouest, Rennes, France
• Strong inroads into financial services and health sciences
The Case for OSBI/R Integration
• Extend OSBI’s core QRA business intelligence capabilities
  – Statistical modelling
  – Advanced data visualizations
• Bridge R and OSBI communities
  – Generate broader adoption
  – Advance mutual innovations
  – Engage statistics and BI communities together
Jaspersoft Business Intelligence Suite
Reporting, Analysis and Data Integration
[Diagram components: Interactive, Ad Hoc, and Managed Query and Reporting Server; Interactive OLAP Data Analysis; High Performance Data Integration; Report Development; Library. Audiences shown: Business Users; Developers; Developers & DBAs]
Jaspersoft BI Suite Architecture
[Diagram layers: Production Reporting; End-User Ad Hoc Query & Reporting; Dashboards; Data Analysis / Exploration; Advanced Reporting; Repository, Scheduling, Security, Integration. Data sources: Data Mart / Warehouse / ODS; Operational RDBMS or POJO, EJB, XML, Hibernate, MDX, Custom]
Demo
Pentaho/R Integration - Components
The following must be downloaded and installed in order to develop and deploy:
• Pentaho BI Suite 2.0
  – Note that R graphs are generated as image files that are returned via normal Pentaho action sequence processing. No “container” report is necessary.
• R v2.8.1 (http://cran.r-project.org)
  – The core R platform, which must be installed on your server
• RServe v0.5-3 (http://www.rforge.net/Rserve/files/)
  – TCP/IP server that allows clients to use the facilities of R
• REngine
  – Java class library that enables a Java client to interact with RServe
Pentaho Deployment Architecture
File Cache Approach
[Diagram components: Tomcat; Pentaho BI Server; Action Sequence; RConnection; RServe; R Function Code; R Datafile; Image File. Callouts 1 to 5 match the steps below.]
1. User requests execution of a Pentaho component to create an R graph
2. Server initiates action sequence, passing session-maintained RConnection object and all user entered execution parameters.
3. Action Sequence JavaScript step performs the following by using the REngine API and supplied RConnection object:
a. Source the R Function Code file
b. Source the cached R Datafile
c. Convert Pentaho parameters into R function parameters
d. Invoke the R function to process and generate an Image File
4. An image file (jpg, png, etc) is created in the Pentaho data file repository cache
5. The image file is referenced in the html response and rendered to the user’s browser
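Steps 3a to 3d of the file cache approach can be sketched as an ordered list of R commands. This is a minimal illustration rather than the presenters' code: it only builds the command strings and does not talk to RServe. In a real deployment each string would presumably be handed to the session's RConnection for evaluation. The class name, file paths, and parameter names are hypothetical; only `create_chart` comes from the slides.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FileCacheCommands {

    // Quote a Java string as an R string literal, escaping backslashes and double quotes.
    static String rString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Render one parameter value as an R literal. Finite numbers and booleans
    // pass through; everything else is treated as a string (an assumption).
    static String rLiteral(Object value) {
        if (value instanceof Number) return value.toString();
        if (value instanceof Boolean) return ((Boolean) value) ? "TRUE" : "FALSE";
        return rString(String.valueOf(value));
    }

    // Build the R commands for steps 3a-3d, in execution order.
    static List<String> buildCommands(String functionFile, String dataFile,
                                      String functionName, Map<String, Object> params) {
        List<String> commands = new ArrayList<>();
        commands.add("source(" + rString(functionFile) + ")");   // 3a: R function code
        commands.add("source(" + rString(dataFile) + ")");       // 3b: cached R data file
        StringBuilder call = new StringBuilder(functionName).append("(");
        boolean first = true;
        for (Map.Entry<String, Object> e : params.entrySet()) {  // 3c: parameters -> R arguments
            if (!first) call.append(", ");
            call.append(e.getKey()).append(" = ").append(rLiteral(e.getValue()));
            first = false;
        }
        commands.add(call.append(")").toString());               // 3d: the single function call
        return commands;
    }

    public static void main(String[] args) {
        // Hypothetical paths and parameters, for illustration only.
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("region", "EMEA");
        params.put("year", 2008);
        params.put("outfile", "/tmp/chart.png");
        for (String c : buildCommands("/opt/r/charts.R", "/opt/r/sales_data.R",
                                      "create_chart", params)) {
            System.out.println(c);
        }
    }
}
```

Keeping the command construction separate from the connection makes the quoting rules easy to check without a running RServe instance.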
Pentaho Design Tips
Goal: enable modular development and better runtime performance
• Place all R functionality within an R code file and provide a single function call for the Java client
  – Pentaho action sequence simply prepares parameters and calls an R function (e.g. create_chart) which generates a graph output file.
  – R programmer works in “R”, Pentaho developer works in Java/Pentaho
• Utilize the power of Pentaho session parameters to create and maintain an RConnection object for a user session
  – Could be created on session startup
• At present, we have not created the capability to pass a Pentaho-generated dataset to R as a data frame
  – REngine API to create data frames will not work with JavaScript
  – Requires development of a “Pentaho ResultSet -> R Data Frame” utility Java class
  – Once created, the Pentaho platform's ability to create datasets from SQL, MDX, XQuery, ETL Script, JavaScript, etc. would be enabled for data acquisition.
JasperSoft/R Integration - Components
The following must be downloaded and installed in order to develop and deploy:
• JasperServer v3
• iReport v3
  – To enable scriptlet development, copy tools.jar from your jdk/jre deployment directory to ~iReport-3.x.x\lib
  – Note that R graphs are generated as an image field inside a JasperReport
• R v2.8.1 (http://cran.r-project.org)
  – The core R platform, which must be installed on your server
• RServe v0.5-3 (http://www.rforge.net/Rserve/files/)
  – TCP/IP server that allows clients to use the facilities of R
• REngine
  – Java class library that enables a Java client to interact with RServe
JasperSoft Deployment Architecture
File Cache Approach
[Diagram components: Tomcat; JasperServer and JasperReport; Report Scriptlet; RConnection; RServe; R Function Code; R Datafile; Image File. Callouts 1 to 5 match the steps below.]
1. User requests report execution
2. Report execution causes afterReportInit() scriptlet method to fire
3. The code in the method uses the REngine API to:
a. Obtain an RServe connection
b. Source the R Function Code file
c. Source the cached R Datafile
d. Convert Jasper parameters into R function parameters
e. Invoke the R function to process and generate an Image File
f. Release the RServe connection
4. The report contains a single image field which is populated with the generated Image File
5. The report is rendered to the user
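Steps 3a and 3f bracket the R work with obtaining and releasing a connection. A minimal sketch of that lifecycle follows, assuming the release should happen even when an R command fails. `RSession` and `RSessionFactory` are hypothetical stand-ins invented for this example and are not part of the REngine API; the recording implementation exists only so the sketch runs without RServe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ScriptletLifecycle {

    // Hypothetical stand-in for an RServe connection (not the REngine API).
    interface RSession {
        void run(String rCommand) throws Exception;
        void release();
    }

    // Hypothetical factory that opens a session (step 3a).
    interface RSessionFactory {
        RSession open() throws Exception;
    }

    // Open a session, run the commands in order, and always release the session.
    static void runAndRelease(RSessionFactory factory, List<String> commands) throws Exception {
        RSession session = factory.open();            // 3a: obtain the connection
        try {
            for (String command : commands) {
                session.run(command);                 // 3b-3e: source files, call the function
            }
        } finally {
            session.release();                        // 3f: release, even if a command failed
        }
    }

    // Fake session that records what happened and fails on commands starting with "stop(".
    static class RecordingSession implements RSession {
        final List<String> log = new ArrayList<>();

        public void run(String rCommand) throws Exception {
            if (rCommand.startsWith("stop(")) {
                throw new Exception("R error: " + rCommand);
            }
            log.add("run: " + rCommand);
        }

        public void release() {
            log.add("released");
        }
    }

    public static void main(String[] args) {
        RecordingSession session = new RecordingSession();
        try {
            runAndRelease(() -> session,
                    Arrays.asList("source(\"charts.R\")", "stop(\"boom\")"));
        } catch (Exception e) {
            System.out.println("failed: " + e.getMessage());
        }
        System.out.println(session.log);
    }
}
```

Whether a connection is opened per report, as here, or kept for a whole user session, as in the Pentaho design, is a design choice the slides leave open.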
JasperSoft Deployment Architecture
Data Query Approach
[Diagram components: Tomcat; JasperServer and JasperReport; Report Scriptlet; RConnection; RServe; R Function Code; Image File. Callouts 1 to 5 match the steps below.]
1. User requests report execution
2. Report execution causes afterReportInit() scriptlet method to fire after JasperReport executes its SQL
3. The code in the method uses the REngine API to:
a. Obtain an RServe connection
b. Source the R Function Code file
c. Convert the SQL result set into an R data frame
d. Convert Jasper parameters into R function parameters
e. Invoke the R function to process and generate an Image File
f. Release the RServe connection
4. The report contains a single image field which is populated with the generated Image File
5. The report is rendered to the user
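Step 3c, turning a result set into an R data frame, can be illustrated with a small sketch. The slides do this through the REngine API; the version below swaps in a simpler technique, generating a textual `data.frame(...)` expression from rows already read into Java lists, so that it runs without RServe or a database. The class name, column names, and sample values are hypothetical, and the sketch assumes at least one column, at least one row, and column names that are valid R identifiers.

```java
import java.util.Arrays;
import java.util.List;

public class DataFrameExpr {

    // Render a cell as an R literal: null -> NA, finite numbers and booleans as-is,
    // anything else as a quoted string with backslashes and quotes escaped.
    static String rLiteral(Object v) {
        if (v == null) return "NA";
        if (v instanceof Number) return v.toString();
        if (v instanceof Boolean) return ((Boolean) v) ? "TRUE" : "FALSE";
        String s = v.toString().replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + s + "\"";
    }

    // Pivot row-oriented data into one R vector per column and wrap them in data.frame().
    static String toDataFrame(List<String> columns, List<List<Object>> rows) {
        StringBuilder sb = new StringBuilder("data.frame(");
        for (int c = 0; c < columns.size(); c++) {
            if (c > 0) sb.append(", ");
            sb.append(columns.get(c)).append(" = c(");
            for (int r = 0; r < rows.size(); r++) {
                if (r > 0) sb.append(", ");
                sb.append(rLiteral(rows.get(r).get(c)));
            }
            sb.append(")");
        }
        // Keep character columns as plain strings rather than factors.
        return sb.append(", stringsAsFactors = FALSE)").toString();
    }

    public static void main(String[] args) {
        // Hypothetical query result: two columns, two rows.
        List<String> columns = Arrays.asList("region", "sales");
        List<List<Object>> rows = Arrays.asList(
                Arrays.<Object>asList("EMEA", 120.5),
                Arrays.<Object>asList("APAC", 98));
        System.out.println(toDataFrame(columns, rows));
    }
}
```

A text expression like this is only practical for small result sets; for larger ones, building the data frame through the client library's own data structures is likely the better route.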
JasperSoft Design Tips
Goal: enable modular development and better runtime performance
• Place all R functionality within an R code file and provide a single function call for the Java client
  – JasperReport Scriptlet simply prepares parameters from the Jasper environment and calls an R function (e.g. create_chart) which generates a graph output file.
  – R programmer works in “R”, Jasper developer works in Java/Jasper
• When to utilize a prepared R data file vs dynamically creating an R Data Frame?
  – Still being studied…
• JasperReport is simply structured
  – Single Image Field that is populated with the generated image file
JasperForge
• We plan to sponsor a project on JasperForge which will contain:
  – Examples
  – How-to Instructions
  – Forums, etc.
• Goal is to have first iteration available in Feb 09
Thank You!
• Web: www.openbi.com
• Phone
  – Office: 312.863.8660
  – Kevin's Cell: 773-425-6010
  – Dave’s Cell: 630-405-8404
  – Steve’s Cell: 847-778-1145