TIBCO Advanced Analytics Meetup (TAAM) - June 2015
TRANSCRIPT
TIBCO Advanced Analytics Meetup
Michael O’Connell, Chief Data Scientist
[email protected]
@moc_tib
June 2015
• TIBCO Analytics & Data Science (MOC: 30 min)
  – Data Analysis Pipeline
  – Understand – Anticipate – Act
• Predictive Analytics (JM, IC: 25 + 20 min)
  – TERR Expressions and Data Functions
  – GeoLocation Analytics
• Real-Time Analytics (UK: 15 min)
  – Customer Analytics with Event Processing
• APIs (AB: 15 min)
  – IronPython for Data Write-Back
• Wrap-Up / Questions (MOC: 10 min)
Increase Productivity
Grow Revenue
Value
Reduce Risk
ROI
TIBCO Analytics – Insight to Action
© Copyright 2000-2015 TIBCO Software Inc.
Data → Insight → Action
Figure labels: Data Access & Prep; Exploratory Data Analysis; Features; Visual Dashboard; Model & Predict; Deploy Champion Model; Test & Learn; Channel; Social; Loyalty; Campaign; Filter; Map; Merge; Shape; Aggregate; Propensity; Affinity; Pricing; Promotion; Improve; Guided; Deploy; In-Line; Explore Data; Prepare Data; Segment; Visualize; Ensemble; Forest; Regression; Additive Models; Challenger Models; At Rest; In Motion; Dashboard Updates; Business Case; Value Theses; Increase Productivity; Grow Revenue; Reduce Risk; ROI; Value
Spotfire Platform
Spotfire Desktop
Spotfire Data Access
DATA SOURCES
XML, RDBMS, Flat Files
Cubes, Spreadsheets
Hadoop & Big Data stores
Analytical DWs, e.g. Exadata
Event Data Streams
Active Spaces
In-Memory: Load data from source into memory
In-Database: Leave data in DB. Dynamically load and discard data to visualize
On-Demand: Dynamically swap data in and out of memory
SQL, MDX
Custom GUI-driven data access via SDK
Enterprise Data Access
Local data sources (drag-and-drop): Access, Excel, STDF
Information Services (join, transform, reusable, parameterized, dynamic query for in-memory use)
Databases (JDBC/ODBC): MySQL, SQL Server, Oracle, Hadoop, SFDC, PostgreSQL, Teradata, Netezza, etc.
Data fabric: XML, RDBMS, Flat Files, Spreadsheets, Web Services, Siebel eBusiness, Oracle E-Business, SAP BW, SAP R/3, Salesforce
Direct connection: ODBC, OLE DB, SqlClient
Direct Query (dynamically query and retrieve data for visualization and analysis)
Databases: Oracle, Teradata Aster, MS SSAS, Teradata, MySQL, OBIEE, Netezza, Hadoop, etc.
Supported Data Sources
In-Memory, In-Database and Data-On-Demand
• Amazon Redshift
• Apache Hadoop/Hive
• Cloudera Hive CDH4.x, CDH5.x
• Cloudera Impala CDH4.x, CDH5.x, 0.6, 1.2.2, 1.2.3
• Composite Information Server 6.1.x, 6.2.x
• Hortonworks Data Platform 1.3, 2.0, 2.1.x, 2.2.x
• HP Vertica 5.0, 6.0, 6.1, 7
• IBM DB2 LUW 8, 9, 9.5, 10.x
• IBM Informix 9.4
• IBM Netezza 5, 6, 7
• JDBC
• Microsoft SQL Server 2000, 2005, 2008, 2012, 2014
• Oracle MySQL 4.1, 5.1, 5.5, 5.6
• Oracle and Oracle Exadata (Oracle 9i, 10g, 11gR1 and R2, RAC, 12c)
• Pivotal Greenplum 3.3, 4.1, 4.2, 4.3
• Pivotal HAWQ
• Pivotal HD 1.0.7
• PostgreSQL 8.4, 9.0, 9.1, 9.2
• SAP HANA SPS5, SPS6; AWS SAP HANA One
• SAP Sybase 12.5, 15, 15.5
• SAP Sybase IQ 15
• Teradata 12.00.12, 13.00, 13.10, 14.00, 14.10, 15.00
• Teradata Aster 5.0, 5.11, 6.0
In-Memory and In-Database
• Microsoft SQL Server Analysis Services 2008, 2012, 2014
• Oracle Essbase 9.3, 11.1
• SAP NetWeaver Business Warehouse 7.0.1 SP10, 7.3
In-Memory and Data-On-Demand
• Aurea Sonic 7.5
• Oracle E-Business Suite 11.5.8, 11.5.10
• Oracle Siebel 7.7, 7.8, 8.0
• Salesforce.com
• SAP R/3 4.7, mySAP 5.0, 6.0
• TIBCO ActiveMatrix BusinessWorks™
• TIBCO ActiveSpaces
• TIBCO StreamBase LiveView
• Web Services
In-Memory Only
• ADO.NET
• Comma-Separated Values (.csv)
• ESRI Shape Files (.shp)
• Microsoft Access Databases (.mdb, .mde)
• Microsoft Excel Workbooks (.xls, .xlsx, .xlsm)
• ODBC
• OData 1, 2, 3, 4
• SAS Data Files (.sas7bdat, .sd2)
• Spotfire DecisionSite Files (.sfs)
• Spotfire Text Data Format (.stdf)
• Spotfire Binary Data Format (.sbdf)
• Text (.txt)
• TIBCO Formvine
• Universal Data Link (.udl)
Extended Data Source Access with TIBCO TERR
Data – the Issues
Organic Data Quality Ladder
• Machines
• Sales
• Logistics
• Web
• Scanners
• Logs
• Email, text
• Social
Rigobono, 2015
Data and Features
April – 21 Customers
• Representativeness
  – Inference from Sample to Population
• Identification and Features
  – Data relevant for the Process
  – Q: Who is most likely to drown while swimming in the ocean?
  – A: Great swimmers!
  – Feature needed: Willingness to take risk beyond ability
  – Telco Churn Example: who is more likely to leave the plan?
  – Answer: people who spend more time talking to people who have already left the plan.
  – Raw (Big) Data: zillions of calls
  – Feature needed: time spent, prior to leaving the plan, speaking with other people who left the same plan
  – Feature not in any database!
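The telco churn feature is not stored anywhere; it has to be derived from the raw call records. A minimal sketch of that derivation follows, assuming a hypothetical call-record layout (caller, callee, call date, minutes) and a hypothetical lookup of plan-leave dates. None of these names or values come from the talk, and only the caller side of each call is counted.

```python
from collections import defaultdict
from datetime import date

# Hypothetical call records: (caller, callee, call_date, minutes).
calls = [
    ("ann", "bob", date(2015, 4, 2), 30.0),
    ("ann", "cat", date(2015, 4, 20), 12.0),
    ("dan", "bob", date(2015, 3, 1), 5.0),
    ("dan", "bob", date(2015, 4, 15), 8.0),
    ("eve", "cat", date(2015, 4, 25), 40.0),
]

# Hypothetical dates on which customers left the plan.
left_plan = {"bob": date(2015, 4, 1), "cat": date(2015, 5, 1)}


def minutes_with_leavers(calls, left_plan):
    """Minutes each caller spent talking to people who had already left."""
    feature = defaultdict(float)
    for caller, callee, call_date, minutes in calls:
        caller_left = left_plan.get(caller)
        # Ignore calls made after the caller's own departure.
        if caller_left is not None and call_date >= caller_left:
            continue
        callee_left = left_plan.get(callee)
        # Count the call only if the callee had left before it took place.
        if callee_left is not None and callee_left <= call_date:
            feature[caller] += minutes
    return dict(feature)


feature = minutes_with_leavers(calls, left_plan)
```

The resulting per-customer value could then be joined back to the customer table as one more candidate input to a churn model.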
June – 4 Deactivations
July – 7 Deactivations
Analytics Maturity Model
TIBCO is the only analytics platform that can provide value to the organization across the full spectrum of use cases
Value to the Organization: Immediate to Long-Term Competitive Advantage
Self-service Dashboards
Event Processing
Predictive and Prescriptive Analytics
Analytics Maturity: Measure, Diagnose, Predict, Optimize, Operationalize, Automate
Visual Analytics
Visual Analytics – Dashboards
Visual Analytics – d3 Community
Advanced Analytics Ecosystem
TIBCO Enterprise Runtime for R (TERR)
• TIBCO Enterprise Runtime for R (TERR)
  – Latest statistics scripting engine: S → S-PLUS® → R → TERR
  – Developer Edition: www.TIBCOmmunity.com
• Engine internals rebuilt from scratch at low level
  – Redesigned data objects, memory management
  – Addresses long-standing issues with S (R) language
• TERR addresses deployment issues with R
  – Performance
  – Big data, fast data
• TERR is commercially licensed from TIBCO
  – TERR installs (free) with Spotfire Analyst / Desktop and other TIBCO products (CEP, Stats)
  – Spotfire Server can manage all TERR / R scripts, artifacts for reuse
Spotfire-TERR Data Flows
• Build models on data using local TERR engine embedded in Spotfire
• Build models on big data directly in TERR on server and display results in Spotfire
• Run TERR as parallel sessions on Hadoop cluster, controlled and visualized in Spotfire
Diagram labels: Spotfire; Local TERR; TERR on server (TSSS); Data Source (ODBC, JDBC, SDC, File); Data; Larger Data; Data Function; Modeling; Results
Both Spotfire and TERR can load data from any ODBC or JDBC compliant source or from Spotfire Data Connections (SDC) or Spotfire Information Links stored in the Spotfire library.
Spotfire-TERR: Data Types, Analyses
Spotfire data functions support any type of data as input and output parameters to and from TERR.
TERR data functions are used for data prep, integration, predictive & prescriptive analytics, …
TERR data functions can output content metadata to Spotfire:
• formatting of fields
• handling of binary data, including images and geospatial objects
Diagram labels: Spotfire; TERR; Data Function; Rows; Columns; Values; Tables; Metadata; Blobs; Geometries; Images
Predictive & Prescriptive Analytics
• Forecasting Y
  – Performance – sales, revenues, value/volume share
• Summary statistics
  – Correlation, …
• Modeling Y = f (X, b)
  – Customer Analytics, e.g. propensity analysis
• Segmentation, Clustering X
  – Customer segmentation
• Optimization
  – Prescriptive analyses
• Simulation
  – Prescriptive analyses
TERR Performance
Model Fitting (5 Million Rows): TERR 7X faster
Model Scoring (20 Million Rows): TERR 84X faster
TERR in Spotfire
What does TERR do in Spotfire?
• Runs TERR Data Functions in Spotfire analyses
• Powers the Predictive Modeling Tools; the Forecast Tool; …
• Can be used directly in Expressions
• Runs on Hadoop nodes; called from Spotfire; runs in StreamBase
TERR is embedded in Spotfire Analyst/Desktop and StreamBase
• No other software required, no connection to server required
Spotfire-TERR Expression Functions
1. In-line Expressions
Type R code into the expression field in Spotfire, e.g.
- Color graph by clusters
- Smooth points on graph
Use TERR_* inbuilt expression functions
Many entry points for adding expressions
2. Expression Functions
Choose Expression Function from menu
- Inbuilt
- Extension (you or someone else) via R code
Use just like other expression functions in an expression
Many entry points for adding expressions
Spotfire-TERR Data Functions – 1, 2, 3
1. Develop and test R code in RStudio / Spotfire
R Programmer
- Set engine to TERR in options
- Graphs in Viewer
2. Map inputs and outputs in Spotfire
Regular Spotfire User
- Spotfire columns mapped to R inputs
3. Point-click to analyze and visualize
Any business or tech user
Spotfire Library
Manage data functions, templates, information links in Spotfire library
Manage permissions in library
Data functions import / export as .sfd files
TERR and R Packages & Spotfire
Packages Shipped with TERR 3.2
R Community
R is the lingua franca of Statistical Computing
Number of R- or SAS-related posts to Stack Overflow by week. (copyright by r4stats.com)
Number of contributed packages on CRAN (http://cran.r-project.org/)
Chart axes: Date (1/1/2002 to 1/1/2013) against R Packages (0 to 5000)
> 6,000 Packages
Big Data Community
Winner of 2014 Strata Cloudera Award For Best Advanced Analytics Application
Big Data Analytics with Spotfire and TERR
Big Data Analytics with TERR
TERR on the nodes of Hadoop Cluster
TERR in Action
• Hadoop cluster compute
• TIBCO Cloud Compute Grid
• TIBCO StreamBase
• TIBCO Business Events
• KNIME
• Lavastorm
• RStudio
• Teradata
• TIBCO Statistics Services
• TIBCO Spotfire
Advanced Geospatial Analytics
• Cluster customers by geography
• Trade area analysis
• Asset acquisition & divestiture
• Overlay maps with predictive metrics
• Compute optimal paths
• Library of geospatial functions
Example: Trade Areas
BIG DATA AT REST
FAST DATA IN MOTION
Insight to Action
Analyze And Act On “Critical Business Moments”
Optimize pricing
Check for fraud
Make offer to customer
Restock inventory
Reroute transport
Give customer service
Proactively maintain machines
Managing Industrial Equipment
Big Data
– Analysis of production
– Failure analytics
Fast Data
– Real-time sensor data
– Leading indicator for shutdowns
– Drilling: kick detection
– Flow monitoring
Benefits
– Reduced NPT: Big $$s
– System reliability
– Efficient drilling
Managing Industrial Equipment
1. Study Anomalies
2. Find Leading Indicators
3. Backtest Rules / Models
4. Push Rules / Models to Event Server
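Step 3 (backtest rules / models) can be illustrated with a small example. The sketch below assumes a hypothetical threshold rule on a single sensor reading and a hypothetical labelled history recording whether a shutdown followed each reading. It only shows how a candidate rule might be scored against history before it is pushed to an event server; the data and the threshold are invented for illustration.

```python
# Hypothetical history: (sensor_reading, shutdown_followed) pairs.
history = [
    (0.2, False), (0.9, True), (0.4, False), (0.8, True),
    (0.7, False), (0.3, False), (0.95, True), (0.5, True),
]


def backtest_threshold(history, threshold):
    """Score the rule 'alert when reading >= threshold' against history."""
    tp = fp = fn = 0
    for reading, shutdown in history:
        alert = reading >= threshold
        if alert and shutdown:
            tp += 1  # alert raised and a shutdown followed
        elif alert:
            fp += 1  # false alarm
        elif shutdown:
            fn += 1  # missed shutdown
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"alerts": tp + fp, "precision": precision, "recall": recall}


result = backtest_threshold(history, threshold=0.75)
```

Comparing precision and recall across several candidate thresholds is one simple way to choose which rule is worth deploying.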
Alerting In The Field
Industrial Equipment Management Improves Operations
Optimizing Manufacturing Processes
Big Data
– Analysis of product quality
– Models for yield
– Models for defects
Fast Data
– In-line QA/QC
Benefits
– Maximize productivity
– Improve quality
– Optimize machine operations
Customer Offers for Retailers
Big Data
– Customer propensity to purchase products
– Product affinity
– Customer segmentation
Fast Data
– In-line scoring on transactions
– Targeted offers to customers
Benefits
– Optimize inventory
– Enhance customer experience
Data - Information - Knowledge
Diagram labels: Store; Analyze; Monitor; Notify; Act
Spotfire APIs
• IronPython controls behavior of Spotfire
• We maintain a library of IronPython functions
• … toggling all zoom sliders
• Adding marker layers to a map
• … and many more
Today's Presenters: Jagrata Minardi
Jagrata Minardi is a Staff Solutions Consultant with TIBCO Software, supporting Financial Services and other industries. Previously, he worked for Insightful Corporation, a provider of analytic software and solutions. Since 1997, he has supported customers in the areas of portfolio construction, portfolio management, asset price forecasting, risk modeling, and risk aggregation.
Today's Presenters: Ian Cook
Ian Cook is a Data Scientist at TIBCO focused on applying the R statistical programming language to rapidly solve business problems across industry verticals. Ian founded and organizes the R users group in the Raleigh, North Carolina area. Prior to his role at TIBCO, Ian worked as a statistical software developer for the semiconductor company Advanced Micro Devices.
Interpolation
© Copyright 2000-2013 TIBCO Software Inc.
Contour Lines
Transforming Coordinate Reference Systems
Performing Spatial Overlay
Today's Presenters: Ujval Kamath
Ujval Kamath is a Data Scientist at TIBCO. He is focused on developing predictive models in R that are deployed in Spotfire and StreamBase for data in motion and data at rest. He has experience in a range of industries, including Oil and Gas/Energy, Consumer Packaged Goods, Manufacturing, and Computing.
Spotfire and StreamBase
• Spotfire is used to create and analyze customer segmentation and propensity
• StreamBase is used to score new transactions in real time
• Spotfire is used to understand the demographics of customers around stores
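The split between building a propensity model and scoring new transactions in real time can be sketched in a few lines. The example below is only an illustration in plain Python, not StreamBase or TERR code: the logistic model, its coefficients, the field names and the offer threshold are all hypothetical stand-ins for a model that would have been fitted offline.

```python
import math

# Hypothetical propensity model fitted offline: intercept and coefficients.
INTERCEPT = -2.0
COEFFICIENTS = {"basket_value": 0.03, "visits_last_month": 0.4}


def propensity(transaction):
    """Logistic score in (0, 1) for one incoming transaction."""
    z = INTERCEPT + sum(
        coef * transaction.get(name, 0.0) for name, coef in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def score_stream(transactions, threshold=0.5):
    """Yield (transaction id, score, make_offer) as each transaction arrives."""
    for tx in transactions:
        score = propensity(tx)
        yield tx["id"], score, score >= threshold


# Hypothetical incoming transactions.
stream = [
    {"id": "t1", "basket_value": 20.0, "visits_last_month": 1},
    {"id": "t2", "basket_value": 80.0, "visits_last_month": 3},
]
decisions = list(score_stream(stream))
```

Because `score_stream` is a generator, each transaction is scored as it is consumed, which loosely mirrors how an event-processing engine would apply the model to data in motion.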
Today's Presenters: Andrew Berridge
Andrew Berridge is a Sr Solution Consultant at TIBCO. He joined the Spotfire data science team in 2011 and has 15 years' experience working in pharmaceuticals and other industries. Andrew specializes in developing tools, extensions and integrations with other technology platforms for Spotfire using IronPython, C#, Java and JavaScript.
Extending and Customizing Spotfire
• Many ways of extending and customizing the Spotfire platform
• All APIs are publicly documented, e.g.
  – Spotfire .NET API: https://docs.tibco.com/pub/doc_remote/spotfire/7.0.0/doc/api/Index.aspx
• Extend functionality of desktop and web clients:
  – TERR scripting
  – Data functions
  – IronPython scripting
  – JavaScript in text areas for UI elements
  – C# extensions (tools, transformations, calculations, etc.)
  – JavaScript mashup API for embedding in web applications
• JavaScript Visualizations
  – Use any JavaScript visualization framework, e.g. D3, Highcharts
• Extend Automation Services
  – Custom tasks
• Custom authentication/Single Sign-on (SSO)
Example: Write-back to Database from Spotfire
• Why
  – Take action from within your analysis
  – Comment on data points
  – Update external systems
• How
  – SQL within Spotfire Information Link with parameters
  – Execute Information Link with IronPython, passing in marked data as parameters
  – Can use other methods - this is simple
SQL In Information Link
• Must return data to Spotfire – we return the data table
• INSERT then SELECT
INSERT INTO [SimpleDemo].[dbo].[UserActions]
    ([State], [CoC], [Username], [Comment])
VALUES
    (?State, ?CoC, %CURRENT_USER%, ?Comment);
SELECT
    U1."id" AS "ID", U1."DateTime" AS "DATETIME", U1."State" AS "STATE",
    U1."CoC" AS "COC", U1."Username" AS "USERNAME",
    U1."Comment" AS "COMMENT"
FROM
    "SimpleDemo"."dbo"."UserActions" U1
WHERE
    <conditions>
IronPython Code
• Iterate over the marked rows in the data table:
  – Set up the parameters for the Information Link
    • Name
    • Value
  – Call the Information Link for each marked row
    • Identified by its GUID in the Spotfire library
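The write-back loop can be sketched outside Spotfire to show its shape. The example below is an assumption-laden stand-in: it uses Python's built-in sqlite3 module and an in-memory table in place of the Information Link and the SQL Server database, and it does not use the Spotfire IronPython API. The marked rows, the username and the comment are hypothetical values.

```python
import sqlite3

# Stand-in for the UserActions table, held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UserActions ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, State TEXT, CoC TEXT, "
    "Username TEXT, Comment TEXT)"
)

# Hypothetical marked rows, as a user might select them in a visualization.
marked_rows = [
    {"State": "NC", "CoC": "NC-501"},
    {"State": "CA", "CoC": "CA-600"},
]


def write_back(conn, row, username, comment):
    """INSERT one action, then SELECT the table so data is returned."""
    # Parameters are passed by name, one name/value pair per column.
    params = {"State": row["State"], "CoC": row["CoC"],
              "Username": username, "Comment": comment}
    conn.execute(
        "INSERT INTO UserActions (State, CoC, Username, Comment) "
        "VALUES (:State, :CoC, :Username, :Comment)",
        params,
    )
    conn.commit()
    return conn.execute(
        "SELECT id, State, CoC, Username, Comment FROM UserActions ORDER BY id"
    ).fetchall()


# One call per marked row, mirroring the per-row Information Link call.
for row in marked_rows:
    table = write_back(conn, row, username="demo_user", comment="needs review")
```

Returning the refreshed table from every call follows the "INSERT then SELECT" pattern described in the slides, so the caller always receives current data.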
Next Steps with Spotfire
spotfire.tibco.com/trial
spotfire.tibco.com/learn/spotfire-desktop-quickstart
spotfire.tibco.com/learn/spotfire-cloud-quickstart
Register for a live Spotfire demonstration: spotfire.tibco.com/learn/live-demo
spotfire.tibco.com/demos
spotfire.tibco.com/tips/
tibco.com/blog/tag/trends-and-outliers/
www.tibcommunity.com
Resources: spotfire.tibco.com
Training: learn.spotfire.tibco.com
Monthly Knowledge Share Hosted by Quintus
LinkedIn
Books
Webcasts
Insight and Action - Analyzing Your OSIsoft PI System Data
Tuesday, July 7, 2015, 1 PM EST
Presenters: Michael O'Connell & Dave Leigh
Predictive Analytics in the Energy Sector: Asset Valuation
Tuesday, July 28, 2015, 1 PM EST
Presenters: Michael O'Connell & Peter Shaw with Haas Engineering and R Lacy
Seeing Stars: the Gartner BI Bakeoff
Recording, May 27, 2015
Presenters: Anna Nowakowska & Michael O'Connell
Events: spotfire.tibco.com/about-us/events
Fast Data
www.tibco.com
http://d2.tibco.com/fast-data-webinars#event-processing-ROI
useR!
Lou Bajuk-Yorgan – Spotfire Product Management
Ian Cook – Data Scientist
Difei Luo – Data Scientist
If you would like to set up a meeting, please contact Lou Bajuk-Yorgan at lbajuk@tibco.com or Lars Sveding at lsveding@tibco.com
Thank you!
Michael O’Connell, PhD
Chief Data Scientist, TIBCO Fellow
[email protected]
@moc_tib
http://about.me/moconnell
+1-919-7401560
First to Insight, First to Action