IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud
TRANSCRIPT
![Page 1: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/1.jpg)
dashDB
Advanced Warehouse Analytics in the Cloud
Torsten Steinbach, Armin Stegerer
© 2014 IBM Corporation
![Page 2: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/2.jpg)
Please Note
• IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.
• Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.
• The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract.
• The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
![Page 3: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/3.jpg)
Disclaimer
© Copyright IBM Corporation 2014. All rights reserved. U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM'S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR SOFTWARE.
IBM's statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM's sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
IBM, the IBM logo, ibm.com, Information Management, DB2, DB2 Connect, DB2 OLAP Server, pureScale, System Z, Cognos, solidDB, Informix, Optim, InfoSphere, and z/OS are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml
Other company, product, or service names may be trademarks or service marks of others.
![Page 4: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/4.jpg)
• The Analytics Challenge
• Bringing Analytics to the Data
• Analytics with dashDB
• Predictive Analytics with R
• GeoSpatial Analytics with ESRI
• Analytic Extension Framework
![Page 5: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/5.jpg)
Data is the Basis of New Competitive Advantage
• Increasing investment: 71% of analytics customers plan to increase their analytics budgets within the next 2 years¹
• Faster ROI: 13 to 1. Analytics pays back US$13.01 for every dollar spent, 1.2 times more than it did 3 years ago²
• Data warehouses will get you there: over 90% of big data implementations will augment, not replace, existing data warehouses³
¹ “2014 Analytics Market Survey,” research note, Nucleus Research, September 2014.
² “Analytics Pays Back $13.01 for Every Dollar Spent,” research note, Nucleus Research, September 2014.
³ "Predicts 2014: Why You Should Modernize Your Information Infrastructure", November 28, 2013. Gartner.
![Page 6: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/6.jpg)
The Analytics Challenge
Use cases: FRAUD DETECTION, BUYING BEHAVIOR, CROSS-SELLING, HEALTH RISK ASSESSMENT, PORTFOLIO MANAGEMENT, DIGITAL MARKETING, STORE PLACEMENT, ROUTE OPTIMIZATION, PRODUCT PRICING, NEAREST SHOP
Industries: Telco, Health, Banking, Insurance, Retail, Transportation, Government, Manufacturing
Big Data
Reading the data into the analytic tools
![Page 7: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/7.jpg)
Advanced Analytics Is Much More than OLAP or Calculating Statistics
Source: Wiki:: CRISP-DM Reference Model
![Page 8: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/8.jpg)
Most Time is Spent in Data Discovery and Preparation
Source: Rexer Analytics Data Miner Survey 2008
Some more recent sources claim this to be up to 60-70%
![Page 9: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/9.jpg)
Data + Data > 2 x Data
Public Data
• Weather
• News
• Stocks
• Social Media
• ...
Enterprise Data
• Orders
• CRM
• Master Data
• Operations
• ...
Systems of Engagement
• IoT
• Mobile Apps
• Cloud Apps
Correlation of Structured Data
Combining various data in a DW can be a fusion reactor for analytics
• Speed to market
• Improved accuracy
• Lower cost
Optimal ROI of in-db Analytics through overall reduction of systems, not data movements, improved utilization and the power of mature structured data processing
![Page 10: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/10.jpg)
In-db Analytics provides support for all phases of the analytical process
Transformations: Data Profiling / Descriptive Statistics+, General Diagnostics, Statistics+, Sampling, Data prep
Mathematical: Basic Math*, Permutation and Combination*, Greatest Common Divisor and Least Common Multiple*, Conversion of Values*, Exponential and Logarithm*, Gamma and Beta Functions, Matrix Algebra+, Area Under Curve*, Interpolation Methods*
Time Series: Autoregressive+, Forecasting*
Predictive: Linear Regression+, Logistic Regression+, Classification, Bayesian, Sampling, Model Testing
Geospatial: Geospatial Data Type, Geometric Functions, Geometric Analysis
Statistics: Descriptive Statistics+, Distance Measures*, Hypothesis Testing*, Chi-Square & Contingency Tables*, Univariate & Multivariate Distributions+, Monte Carlo Simulation*
Data Mining: Association Rules+, Clustering+, Feature Extraction+, Discriminant Analysis*
* Fuzzy Logix DB Lytix capabilities
+ Netezza Analytics and Fuzzy Logix DB Lytix capabilities
![Page 11: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/11.jpg)
• The Analytics Challenge
• Bringing Analytics to the Data
• Analytics with dashDB
• Predictive Analytics with R
• GeoSpatial Analytics with ESRI
• Analytic Extension Framework
![Page 12: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/12.jpg)
The One Big Reason for In-Database Analytics: Bring Analytics to the Data
• Scalable and high-performance analytics -> Analytics Accelerator
  – Shorten response times
  – Scale analyzed data volume
  – (both by two- to three-digit factors)
• The secret sauce
  – Data proximity: avoid moving data to analytic tools
  – Scale-out: run code on the MPP architecture of the WH engine
  – Talk the language of the user and the application developer: R, SQL, Java, Python, C++, Lua, etc.
  – Flexible runtime model: scalar, aggregate or table functions, external executables
  – Coverage: wide variety of algorithms and operators out of the box: Predictive, Statistical and GeoSpatial Analytics
• Complements analytic tools
  – ... because it allows them to accelerate and scale their analytics
  – SPSS, R, SAS, ESRI, FuzzyLogix, Zementis, Aginity, …
![Page 13: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/13.jpg)
IBM Netezza Analytics Ecosystem
PureData for Analytics AMPP Platform
Software Development Kit
3rd Party In-Database Analytics
Netezza In-Database Analytics
User-Defined Extensions (UDF, UDA, UDTF, UDAP)
Transformations
Mathematical
Geospatial
Predictive
Statistics
Time Series
Data Mining
Fuzzy Logix
SAS
Zementis
IBM SPSS
Language Support (Map/Reduce, Java, R, Python, Lua, Perl, C, C++, Fortran)
Mathworks
Open Source R
BI Tools
Visualization Tools
Eclipse
Open Source R
SAS
IBM SPSS
Apache Hadoop
Cloudera
IBM InfoSphere BigInsights
IBM InfoSphere Streams
Esri
Netezza Analytics is one of the leaders for in-database analytics, making Netezza an attractive platform for users and third-party vendors in the predictive analytics space
![Page 14: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/14.jpg)
Analytic Code & Algorithms:
Analytic Data:
Data pulled out and processed in analytic application
Analytic Applications
This is where we start from: All analytic processing done on application side
Analytics of Warehouse Data
![Page 15: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/15.jpg)
SQLs
Analytic Code & Algorithms:
Analytic Data:
Simple data lookup & massage operations pushed down as SQL operations
Analytic Applications
Benefit: Acceleration with no SQL skills required
SQLs
Push Down Step 1: BLU tables only logically represented in analytic application
Accelerate Analytics for Warehouse Data
![Page 16: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/16.jpg)
SQLs
Analytic Code & Algorithms:
Analytic Data:
Call built-in functions via SQL to execute typical algorithms inside db
Cloud Tooling
Analytic Applications
Benefit: Bring Standard Analytics to the Data
SQLs
Canned Algorithms
Push Down Step 2: Typical and popular algorithms pushed down to canned UDFs in the db
Accelerate Analytics for Warehouse Data
![Page 17: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/17.jpg)
Language Framework (UDX & AE)
Analytic Code & Algorithms:
Analytic Data:
Deploy customer code and call via special SQL function interfaces
SQLs
Canned Algorithms
Analytic Applications
Benefit: Bring Custom Analytics to the Data
Push Down Step 3: Execute entire customer analytic programs inside the db
Accelerate Analytics for Warehouse Data
![Page 18: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/18.jpg)
• The Analytics Challenge
• Bringing Analytics to the Data
• Analytics with dashDB
• Predictive Analytics with R
• GeoSpatial Analytics with ESRI
• Analytic Extension Framework
![Page 19: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/19.jpg)
Cloud is Essential to the Modern Data Warehouse
Modernize existing Data Warehousing with on-demand cloud agility
Embrace the concept of the logical data warehouse by combining cloud and on-premises deployments
Faster insight without the up-front infrastructure investment
Full support for hybrid “ground to cloud” deployments
Organizations gaining competitive advantage through cloud adoption are reporting 2x revenue growth and 2.5x higher gross profit as compared to peer companies who are more cautious about cloud computing¹
• 77% of enterprises are in the initial stages of cloud adoption²
• 84% of CIOs cut application costs by moving to the cloud²
• 58% of IT Decision Makers think cloud solutions give them better control of their data³
¹ http://www-03.ibm.com/press/us/en/pressrelease/42304.wss
² http://www.huffingtonpost.com/vala-afshar/the-top-100-cloud-computi_b_3756172.html
³ http://www.businesswire.com/news/home/20100722005325/en/Cloud-Computing-Delivering-Promise-Doubts-Hold-Adoption#.UufrRKX0B8Y
![Page 20: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/20.jpg)
IBM’s Analytics Cloud Service Ecosystem
DataWorks
• Enable self-service access & integration of multiple data sources
• Simplified tools to prepare, refine & secure data
• Open application programming interfaces for application development
• On-premise and cloud / internal & external data
dashDB
• Rapid deployment of large scale data warehouses
• Enables scaling of both volume and processing speed
• Unified architecture that enables hybrid data processing, on-premise & in the cloud
• In-database analytic capabilities for the best analytic performance
Watson Analytics
• Cloud-based predictive & cognitive analytics discovery platform
• Designed for business use
• Integrated social collaboration
• Freemium to enterprise versions
![Page 21: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/21.jpg)
dashDB
![Page 22: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/22.jpg)
dashDB – Available With Three Deployment Choices

• Enterprise Plan
• Dedicated infrastructure
• Terabyte-scale capacity
• Closed Beta for qualified accounts

• Deploy within Bluemix cloud-based environment for analytics and warehousing services
• Ingest data from a wide variety of sources
• In-database analytics included
• Pay as you go
• Rapid Deployment

• Auto-provisioning from Cloudant management GUI
• Built-in automated synchronization from Cloudant JSON data stores
• Built-in analytics for Cloudant data
• Pay as you go
• Rapid Deployment
![Page 23: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/23.jpg)
dashDB Entry Plan
Bare Metal: 4x8 core, 256GB RAM
2x 500GB HDD: /root, /opt, /etc
12x 200GB SSD: /mnt/bludata0
Legacy iSCSI drives (detachable): /mnt/blumeta0
Swift Object Storage: Backup and Metadata
Data Center 1Gbps connection
Guardium (Shared): Public Shared 8 Core, 16GB RAM, 100GB SAN
DSM (Shared): Public Shared 16 Core, 64GB RAM, 1000GB SAN
1TB local HDD has the OS installed with necessary binaries and scripts stored.
– /mnt/blutmp0 (16GB swap space)
– /opt, /etc, /usr, /bin …
1.2TB local SSD
– /mnt/bludata0 – used for database
Legacy iSCSI drives are used to store DB2 database and configuration.
– /mnt/blumeta0 – used for configuration
Backups are stored in Swift Object Storage
Run commands to back up and restore
– Backs up from iSCSI LUNs to Swift
– Restores from Swift to iSCSI LUNs
![Page 24: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/24.jpg)
dashDB Enterprise 1TB Plan
VM #1: Public Shared 16 core @ 2.0GHz, 64GB RAM
100GB SAN1 (OS): /root, /opt, /etc
1TB SAN2 (detachable): /mnt/bludata0, /mnt/bludata0/blumeta0
Swift Object Storage: Backup and Metadata
– Backs up from SAN2 to Swift
– Restores from Swift to SAN2
Data Center 1Gbps connection
Guardium (Shared): Public Shared 8 Core, 16GB RAM, 100GB SAN
DSM (Shared): Public Shared 16 Core, 64GB RAM, 1000GB SAN
100GB SAN1 has the OS installed with necessary binaries and scripts stored.
– /mnt/blutmp0 (16GB swap space)
– /opt, /etc, /usr, /bin …
1TB SAN2 holds the database and configuration for DB2.
– /mnt/bludata0 – used for database
– /mnt/blumeta0 -> /mnt/bludata0/blumeta0 – used for configuration
Backups are stored in Swift Object Storage
![Page 25: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/25.jpg)
dashDB Enterprise 4TB Plan – Compute Optimized
Bare Metal: 32 core, 256GB RAM
2x 500GB HDD: /root, /opt, /etc
4TB Consistent Perf Storage: /mnt/bludata0, /mnt/blumeta0
Swift Object Storage: Backup and Metadata
– Backs up from Consistent Perf Storage to Swift
– Restores from Swift to Consistent Perf Storage
Data Center 10 Gbps connection
Guardium (Shared): Public Shared 8 Core, 16GB RAM, 100GB SAN
DSM (Shared): Public Shared 16 Core, 64GB RAM, 1000GB SAN
1TB local HDD has the OS installed with necessary binaries and scripts stored.
– /mnt/blutmp0 (16GB swap space)
– /opt, /etc, /usr, /bin …
4TB Consistent Performance Storage 6K IOPs holds the database and configuration for DB2.
– /mnt/bludata0 – used for database
– /mnt/blumeta0 – used for configuration
Backups are stored in Swift Object Storage
![Page 26: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/26.jpg)
dashDB Enterprise 12TB Plan – Storage Optimized
Bare Metal: 32 core, 256GB RAM
2x 500GB HDD: /root, /opt, /etc
12TB Consistent Perf Storage: /mnt/bludata0, /mnt/blumeta0
Swift Object Storage: Backup and Metadata
– Backs up from Consistent Perf Storage to Swift
– Restores from Swift to Consistent Perf Storage
Data Center 10 Gbps connection
Guardium (Shared): Public Shared 8 Core, 16GB RAM, 100GB SAN
DSM (Shared): Public Shared 16 Core, 64GB RAM, 1000GB SAN
1TB local HDD has the OS installed with necessary binaries and scripts stored.
– /mnt/blutmp0 (16GB swap space)
– /opt, /etc, /usr, /bin …
4TB Consistent Performance Storage 6K IOPs holds the database and configuration for DB2.
– /mnt/bludata0 – used for database
– /mnt/blumeta0 – used for configuration
Backups are stored in Swift Object Storage
![Page 27: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/27.jpg)
Server Outage Availability Scenario for dashDB Enterprise 4TB & 12TB
Bare Metal #1 (Primary): 32 core, 256GB RAM
– 2x 500GB HDD: /root, /opt, /etc
Bare Metal #2 (Standby): 32 core, 256GB RAM
– 2x 500GB HDD: /root, /opt, /etc
4/12TB Consistent Perf Storage: /mnt/bludata0, /mnt/blumeta0
Swift Object Storage: Backup and Metadata
– Backs up from Consistent Perf Storage to Swift
– Restores from Swift to Perf Consistent iSCSI
Data Center 10 Gbps connection
Guardium (Shared): Public Shared 8 Core, 16GB RAM, 100GB SAN
DSM (Shared): Public Shared 16 Core, 64GB RAM, 1000GB SAN
When the primary server (BM #1) fails, its Perf Consistent iSCSI volume is re-mapped from the primary server (BM #1) to the standby server (BM #2)
![Page 28: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/28.jpg)
Coming Up Soon: Initial dashDB MPP Offering
• Probably 8 partitions per node of initial cluster
• 2 TB storage per cluster node
• One node comparable to the 4TB SMP offering
  – bare metal, 16 cores, 128 or 256 GB memory, local storage
• Smallest cluster offered: 3 nodes, i.e. 6 TB
• Grow in one-node steps, up to 10 nodes (i.e. 20 TB)
  – Distributing entire MLNs of initial cluster instead of redistributing data
• Larger MPP offerings are going to be rolled out in a second phase
• All this might still change until we release
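The sizing arithmetic on this slide can be sketched as a small C++ helper. The function names are hypothetical, and the constants (2 TB per node, 3 to 10 nodes) are the preliminary figures quoted on the slide, which the presenters note may still change before release.

```cpp
#include <cassert>

// Preliminary figures from the slide; they may change before release.
constexpr int kTbPerNode = 2;
constexpr int kMinNodes = 3;
constexpr int kMaxNodes = 10;

// Hypothetical helper: is this node count inside the announced range?
bool isOfferedClusterSize(int nodes) {
    return nodes >= kMinNodes && nodes <= kMaxNodes;
}

// Hypothetical helper: total storage in TB for an offered cluster size.
int clusterCapacityTb(int nodes) {
    assert(isOfferedClusterSize(nodes));  // caller must check the range first
    return nodes * kTbPerNode;
}
```

With these assumptions, the smallest cluster (3 nodes) yields 6 TB and the largest (10 nodes) yields 20 TB, matching the numbers on the slide.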
![Page 29: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/29.jpg)
We Bring The Same Compatible Analytic Platform from Netezza to the Cloud
Analytics Applications of ISVs and Customers
Application Integration: R Wrapper, Watson Analytics, ESRI ArcGIS Connector, …
Canned Analytics
• OLAP Functions: ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, STDDEV, COVAR, …
• Linear Regression, Kmeans Clustering, Decision Tree, Association Rules, Naive Bayes, …
• Spatial Operators: Contains, Touches, Within, Intersects, Crosses, Overlaps
Analytic Extension Framework
• UDX C++ API
• AE Framework: In-DB R, In-DB LUA, In-DB Python, In-DB Perl
![Page 30: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/30.jpg)
• The Analytics Challenge
• Bringing Analytics to the Data
• Analytics with dashDB
• Predictive Analytics with R
• GeoSpatial Analytics with ESRI
• Analytic Extension Framework
![Page 31: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/31.jpg)
Predictive Analytics With R
• Very popular language for statisticians and data miners
> val1 <- c(23,54,100,134,200,252,311)
> val2 <- sqrt(val1)
> lm_vals <- lm(val1~val2)
> summary(lm_vals)
Residuals:
      1       2       3       4       5       6       7
 23.480  -3.052 -16.814 -18.330 -10.170   2.785  22.102
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -108.570     20.645  -5.259   0.0033 **
val2          22.538      1.667  13.523 3.96e-05 ***
:
> plot(lm_vals)
• Built-in support for graphs and charting; large set of math and statistics packages due to extensibility and a very active community
• Data Frames: tables of data maintained in memory of R runtime
> col1 <- c(23,54,100)
> col2 <- c("xyz", "abc", "123")
> col3 <- c(TRUE, FALSE, TRUE)
> myDf <- data.frame(col1, col2, col3)
• Data frames can be populated from DB tables via RODBC package
> library(RODBC)
> myconn <- odbcConnect("mydsn", uid= "db2inst1", pwd= "secret")
> myDf <- sqlQuery(myconn, "select * from employees")
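To make explicit what the lm() call on this slide computes, the following is a minimal C++ sketch of ordinary least squares with a single predictor, applied to the same seven values. It is not how R implements lm(), and the names LinearFit, fitLine and fitSlideExample are illustrative assumptions. Under these assumptions it should reproduce the slide's coefficients (intercept about -108.570, slope about 22.538).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Result of fitting y = intercept + slope * x.
struct LinearFit {
    double intercept;
    double slope;
};

// Ordinary least squares for one predictor (closed-form solution).
LinearFit fitLine(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double sumX = 0.0, sumY = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sumX += x[i];
        sumY += y[i];
    }
    const double meanX = sumX / n;
    const double meanY = sumY / n;
    double sxy = 0.0, sxx = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxy += (x[i] - meanX) * (y[i] - meanY);
        sxx += (x[i] - meanX) * (x[i] - meanX);
    }
    assert(sxx > 0.0);  // the predictor must not be constant
    const double slope = sxy / sxx;
    return {meanY - slope * meanX, slope};
}

// Rebuilds the slide's data: val1 is the response, val2 = sqrt(val1) the predictor.
LinearFit fitSlideExample() {
    const std::vector<double> val1 = {23, 54, 100, 134, 200, 252, 311};
    std::vector<double> val2;
    for (double v : val1) {
        val2.push_back(std::sqrt(v));
    }
    return fitLine(val2, val1);
}
```

The point of the in-database variants shown later in the deck is that this kind of computation runs where the data lives instead of in the client's memory.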
![Page 32: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/32.jpg)
dashDB
Predictive Analytics With R In dashDB 1/3
• Built-in R runtime & R Studio
• ibmdbR package
  – Data frames logically representing data physically residing in Dynamite tables
> con <- idaConnect("BLUDB", "", "")
> idaAnalyticsInit(con)
> sysusage <- ida.data.frame('DB2INST1.SHOWCASE_SYSUSAGE')
> systems <- ida.data.frame('DB2INST1.SHOWCASE_SYSTEMS')
> systypes <- ida.data.frame('DB2INST1.SHOWCASE_SYSTYPES')
  – Push down of R data preparation to Dynamite
> sysusage2 <- sysusage[sysusage$MEMUSED>50000,c("MEMUSED","USERS")]
> mergedSys <- idaMerge(systems, systypes, by='TYPEID')
> mergedUsage <- idaMerge(sysusage2, mergedSys, by='SID')
  – Push down of analytic algorithms to in-db execution
> lm1 <- idaLm(MEMUSED~USERS, mergedUsage)
R Studio
Browser
Any R Runtime
ibmdbR
![Page 33: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/33.jpg)
Predictive Analytics With R In dashDB 2/3
• Dynamite-native implementation of statistical functions
  – colnames, cor, cov, dim, head, length, max, mean, min, names, print, sd, summary, var
• Logically derived columns pushed down to Dynamite
> myDF <- ida.data.frame('DB2INST1.SHOWCASE_SYSUSAGE')
> myDF$MemPerUser <- myDF$MEMUSED / myDF$USERS
• Sampling of tables in Dynamite
> idaSample(myDF, 3)
  SID DATE                       USERS MEMUSED ALERT MemPerUser
1   8 2014-02-14 23:39:00.000000    34    5015 f            147
2   5 2014-01-22 07:52:00.000000    96   11512 f            119
3   7 2013-09-12 05:17:00.000000    39    5592 t            143
• Statistics about tables in Dynamite
> summary(myDF)
 SID             USERS             MEMUSED              ALERT          MemPerUser
 Min.   :0.000   Min.   :  3.000   Min.   :  350.000    f   :3655563   Min.   :105.000
 1st Qu.:2.000   1st Qu.: 35.000   1st Qu.: 5113.000    t   :1344437   1st Qu.:135.000
 Median :4.500   Median : 64.000   Median : 9455.000    NA's:     NA   Median :150.000
 Mean   :   NA   Mean   :     NA   Mean   :       NA                   Mean   :     NA
 3rd Qu.:7.000   3rd Qu.:111.000   3rd Qu.:16517.000                   3rd Qu.:165.000
 Max.   :9.000   Max.   :347.000   Max.   :62379.000                   Max.   :209.000
• Statistics about categorical values
> idaTable(myDF)
ALERT
      f       t
3655563 1344437
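The three sampled MemPerUser values on this slide are consistent with the quotient being truncated to an integer (5015 / 34 is 147.5 but is shown as 147). This is presumably because both source columns are integer-typed in the database, which is an inference from the output and not something the slide states. A minimal C++ sketch of the derived column under that assumption, with hypothetical names UsageRow and memPerUser:

```cpp
#include <cassert>
#include <vector>

// One row of the usage table, reduced to the two columns used here.
struct UsageRow {
    int users;
    int memUsed;
};

// Derived column MEMUSED / USERS using integer division (fraction truncated).
std::vector<int> memPerUser(const std::vector<UsageRow>& rows) {
    std::vector<int> result;
    result.reserve(rows.size());
    for (const UsageRow& row : rows) {
        assert(row.users > 0);  // guard against division by zero
        result.push_back(row.memUsed / row.users);
    }
    return result;
}
```

Applied to the three sampled rows (34 users / 5015, 96 / 11512, 39 / 5592), this yields 147, 119 and 143, the same values as in the idaSample output.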
![Page 34: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/34.jpg)
Predictive Analytics With R In dashDB 3/3
• Store R objects in Dynamite database
> myPrivateObjects <- ida.list(type='private')
> myPrivateObjects['series100'] <- 1:100
> x <- myPrivateObjects['series100']
> x
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
[23] 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
[45] 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
[67] 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88
[89] 89 90 91 92 93 94 95 96 97 98 99 100
> names(myPrivateObjects)
[1] "series100"
> myPrivateObjects['series100'] <- NULL
• Manage Dynamite tables
> idaExistTable('DB2INST1.SHOWCASE_SYSUSAGE')
[1] TRUE
> idaShowTables()
  Schema   Name                   Owner    Type
1 BLUADMIN R_OBJECTS_PRIVATE      BLUADMIN T
2 BLUADMIN R_OBJECTS_PRIVATE_META BLUADMIN T
3 BLUADMIN R_OBJECTS_PUBLIC       BLUADMIN T
4 BLUADMIN R_OBJECTS_PUBLIC_META  BLUADMIN T
> myView <- idaCreateView(myDF)
> idaIsView(myView)
[1] TRUE
> idaDropView(myView)
> idaIsView(myView)
[1] FALSE
![Page 35: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/35.jpg)
• The Analytics Challenge
• Bringing Analytics to the Data
• Analytics with dashDB
• Predictive Analytics with R
• GeoSpatial Analytics with ESRI
• Analytic Extension Framework
![Page 36: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/36.jpg)
The Power of Place
• Spatial Awareness is a dramatically increasing property of big data due to mobile computing and Internet of Things
• Spatial Insight is directly available in dashDB through built-in spatial data type and operators, for instance:
  – WITHIN, e.g.: Show me the clients that are affected by a power outage!
  – OVERLAPS, e.g.: Which of my cell phone customers are at risk of a cell tower service outage due to upcoming tornadoes?
  – TOUCHES, e.g.: Give me the neighboring ZIP areas per customer for customized marketing campaigns!
  – DISTANCE, e.g.: List the top 5 closest stores!
  – DISJOINT, e.g.: Which claims are candidates for insurance fraud because the client submitted the claim from a different place than the one the case is for?
  – … and ~100 further operators
• Supported and leveraged by ESRI, a major spatial tooling vendor
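As an illustration of the DISTANCE use case ("top 5 closest stores"), here is a minimal C++ sketch that ranks stores by distance to a query point. It uses plain planar (Euclidean) distance on x/y coordinates, which is a simplifying assumption and not the geodetic computation a spatial database performs. The Store type and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Store {
    std::string name;
    double x;
    double y;
};

// Squared planar distance; sufficient for ranking and avoids a square root.
double squaredDistance(const Store& s, double px, double py) {
    const double dx = s.x - px;
    const double dy = s.y - py;
    return dx * dx + dy * dy;
}

// Returns the names of the n stores closest to (px, py), nearest first.
// If fewer than n stores exist, all of them are returned.
std::vector<std::string> closestStores(std::vector<Store> stores,
                                       double px, double py, std::size_t n) {
    assert(n >= 1);  // asking for zero stores is treated as a caller error
    n = std::min(n, stores.size());
    std::partial_sort(stores.begin(),
                      stores.begin() + static_cast<std::ptrdiff_t>(n),
                      stores.end(),
                      [px, py](const Store& a, const Store& b) {
                          return squaredDistance(a, px, py) <
                                 squaredDistance(b, px, py);
                      });
    std::vector<std::string> names;
    for (std::size_t i = 0; i < n; ++i) {
        names.push_back(stores[i].name);
    }
    return names;
}
```

In the in-database setting described on the slide, the equivalent ranking would be expressed through the spatial operators in SQL, so the store table never has to leave the warehouse.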
![Page 37: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/37.jpg)
GeoSpatial Analytics In dashDB
• Implements ISO SQL/MM standard for spatial
  – See http://www.iso.org/iso/catalogue_detail.htm?csnumber=38651
• Spatial data type ST_GEOMETRY (hierarchy)
• Enables spatial joins in database through spatial operators available as user defined functions
• Dedicated support in ESRI tools starting V 10.3
  – http://www.esri.com/software/arcgis/arcgis-for-desktop/free-trial
• GeoSpatial Applications Examples
  – Telco Location Data
  – Utilities Smart Grid
  – GPS Tracking in Transportation
  – Insurance Demographics
  – Cable Marketing Campaigns
  – Retail Store Placement
![Page 38: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/38.jpg)
Examples of using ESRI ArcGIS with dashDB 1/3
Load spatial data into dashDB
Discover & browse spatial data with ArcCatalog
Counties
Tornado paths over recent 50 years
![Page 39: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/39.jpg)
Examples of using ESRI ArcGIS with dashDB 2/3
Combine spatial data from dashDB into interactive maps with ArcMap
![Page 40: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/40.jpg)
Examples of using ESRI ArcGIS with dashDB 3/3
Perform spatial joins in dashDB using query layers and visualize results in ArcMap
Tornado risk per county
![Page 41: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/41.jpg)
Insurance Risk Analysis – Show case overview
Public spatial data sets available online
- Historical tornados from 1950s to today: http://www.spc.noaa.gov/gis/svrgis/
- Current tornado weather warnings: http://www.nws.noaa.gov/regsci/gis/shapefiles/
- US counties: https://www.census.gov/geo/maps-data/data/tiger-line.html
Mobile application generating spatial data for insurance claims for tornado damage (www.bluemix.net)
Cloud service for persistency of system of engagement (www.cloudant.com)
Insurance Master Data (customers)
Cloud warehouse service for analytics and correlation between customer data and public or third party data (dashDB)
Visualization and spatial analysis capabilities by Esri ArcGIS
![Page 42: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/42.jpg)
Twitter-dashDB Show Case (www.youtube.com/watch?v=9yVNwOs9L4c)
http://american-sniper-analysis.mybluemix.net
![Page 43: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/43.jpg)
The Analytics Challenge
Bringing Analytics to the Data
Analytics with dashDB
Predictive Analytics with R
GeoSpatial Analytics with ESRI
Analytic Extension Framework
![Page 44: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/44.jpg)
The Two Elements of Analytic Extension Framework
1. User Defined Extension – UDX – C++ API
Three types of UDXs:
• Scalar Functions
    SELECT MyXForm(Col1, Col2) FROM MyTab
• Aggregate Functions
    SELECT Col1, MyAgg(Col2) FROM MyTab GROUP BY Col1
• Table Functions
    SELECT b.MyCol1 FROM MyTab a, TABLE(MyTableFunc(a.Col1, a.Col2)) AS b
C++ code compiled and linked within dashDB service
Registered via DDL, e.g.
    CREATE FUNCTION MyXForm(VARCHAR(ANY), INTEGER) RETURNS VARCHAR(ANY) LANGUAGE CPP PARAMETER STYLE NPSGENERIC EXTERNAL NAME 'mylib.so!cMyFunc'
    CREATE FUNCTION MyAgg(INTEGER) LANGUAGE CPP RETURNS DOUBLE AGGREGATE WITH (SUM INTEGER) PARAMETER STYLE NPSGENERIC EXTERNAL NAME 'mylib.so!cMyAgg'
    CREATE FUNCTION MyTableFunc(VARARGS) RETURNS TABLE (Col1 INTEGER) LANGUAGE CPP PARAMETER STYLE NPSGENERIC EXTERNAL NAME 'mylib.so!cMyUDTF'
2. REST API & tooling for development & deployment:
pushFile, pullFile, executeCC, compile, link, promote, createPackage, deployPackage, getProjList, getFileList, executeDDL, executeSQL, dropUDX, ...
![Page 45: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/45.jpg)
User Defined Scalar Function API Example

    class cMyFunc : public nz::udx_ver2::Udf
    {
    public:
        cMyFunc(UdxInit *pInit) : Udf(pInit) { }
        static nz::udx_ver2::Udf* instantiate(UdxInit *pInit);

        virtual nz::udx_ver2::ReturnValue evaluate() {
            int int1 = int32Arg(0);
            int int2 = int32Arg(1);
            int retVal = int1 * int2;
            NZ_UDX_RETURN_INT32(retVal);
        }
    };

    nz::udx_ver2::Udf* cMyFunc::instantiate(UdxInit *pInit)
    {
        return new cMyFunc(pInit);
    }
![Page 46: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/46.jpg)
User Defined Aggregate Function API Example

    class cMyAgg : public nz::udx_ver2::Uda
    {
    public:
        cMyAgg(UdxInit *pInit) : Uda(pInit) { }
        static nz::udx_ver2::Uda* instantiate(UdxInit *pInit);

        void initializeState() {
            int64 *s = int64State(0);
            *s = 0;
            setStateNull(0, false);
        }
        // Accumulate data in states.
        virtual void accumulate() {
            if (isArgNull(0)) return;
            int64 *s = int64State(0);
            *s += int16Arg(0);
        }
        // States flowed in as input; merge back in state.
        virtual void merge() {
            accumulate();
        }
        // Merged data copied to input.
        virtual ReturnValue finalResult() {
            if (isArgNull(0)) NZ_UDX_RETURN_NULL();
            setReturnNull(false);
            NZ_UDX_RETURN_INT64(int64Arg(0));
        }
    };
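The accumulate / merge / finalResult split on this slide is easier to follow outside the engine. The following is a minimal, engine-free sketch of that lifecycle; it does not use the nz::udx_ver2 headers. The names `SumState` and `runAggregate` are hypothetical, and the idea that the engine keeps one state per data partition and then merges them is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for the aggregate's state (the slide uses int64State(0)).
struct SumState {
    std::int64_t sum = 0;  // zeroed, as initializeState() does on the slide
};

// accumulate(): fold one input value into a local state; NULL inputs are skipped.
inline void accumulate(SumState& state, const std::optional<std::int32_t>& value) {
    if (!value) return;
    state.sum += *value;
}

// merge(): fold the state of another partition into this one.
inline void merge(SumState& into, const SumState& other) {
    into.sum += other.sum;
}

// finalResult(): turn the fully merged state into the value returned to SQL.
inline std::int64_t finalResult(const SumState& state) {
    return state.sum;
}

// Assumed driver: accumulate per partition, merge the partial states, then finalize.
inline std::int64_t runAggregate(
        const std::vector<std::vector<std::optional<std::int32_t>>>& partitions) {
    SumState total;
    for (const auto& partition : partitions) {
        SumState local;
        for (const auto& value : partition) accumulate(local, value);
        merge(total, local);
    }
    return finalResult(total);
}
```

Calling `runAggregate` with the partitions `{1, 2, NULL}`, `{10}`, and an empty partition sums the non-NULL values across all three. The sketch requires C++17 for `std::optional`.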
![Page 47: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/47.jpg)
User Defined Table Function API Example

    class OneUdtf : public nz::udx_ver2::Udtf
    {
    private:
        int32 argInt, xcount;
    public:
        OneUdtf(UdxInit *pInit) : Udtf(pInit) { }
        static nz::udx_ver2::Udtf* instantiate(UdxInit *pInit);

        // Called once per input row: remember the INT32 argument and reset the counter.
        virtual void newInputRow() {
            argInt = 0;
            for (int i = 0; i < numArgs(); i++) {
                if (argType(i) == UDX_INT32) {
                    argInt = int32Arg(i);
                } else {
                    throwUdxException("Unknown type");
                }
            }
            xcount = 1;
        }

        // Called repeatedly: emits five output rows per input row, then reports Done.
        virtual DataAvailable nextOutputRow() {
            if (xcount > 5)
                return Done;
            for (int i = 0; i < numReturnColumns(); i++) {
                setReturnColumnNull(i, false);
                if (returnTypeColumn(i) == UDX_INT32) {
                    *int32ReturnColumn(i) = argInt + xcount;
                } else {
                    throwUdxException("Unknown type");
                }
            }
            xcount++;
            return MoreData;
        }
    };
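To show how newInputRow and nextOutputRow interact, here is a small stand-alone sketch of the calling pattern. It mirrors the slide's logic without the nz::udx_ver2 base class. `OneUdtfSketch` and `runTableFunction` are hypothetical names, and the driver loop is an assumption about how an engine would call the two methods, not the actual dashDB implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class DataAvailable { Done, MoreData };

// Hypothetical stand-in for the slide's OneUdtf, reduced to one INT32 input and output.
class OneUdtfSketch {
    std::int32_t argInt = 0;
    std::int32_t xcount = 0;
public:
    // Remember the argument of the current input row and reset the row counter.
    void newInputRow(std::int32_t arg) {
        argInt = arg;
        xcount = 1;
    }
    // Write the next output value, or report Done after five rows.
    DataAvailable nextOutputRow(std::int32_t& out) {
        if (xcount > 5) return DataAvailable::Done;
        out = argInt + xcount;
        ++xcount;
        return DataAvailable::MoreData;
    }
};

// Assumed driver: for each input row, pull output rows until the function reports Done.
inline std::vector<std::int32_t> runTableFunction(const std::vector<std::int32_t>& inputs) {
    std::vector<std::int32_t> rows;
    OneUdtfSketch udtf;
    for (std::int32_t input : inputs) {
        udtf.newInputRow(input);
        std::int32_t out = 0;
        while (udtf.nextOutputRow(out) == DataAvailable::MoreData) {
            rows.push_back(out);
        }
    }
    return rows;
}
```

With a single input row holding 10, the sketch yields the five rows 11 through 15, so every input row fans out into five output rows.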
![Page 48: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/48.jpg)
Analytic Extension Development Process
[Diagram: the dashDB developer uses the command line, 3rd party IDEs, or a cloud web IDE to drive the dashDB REST API. Labeled steps: Setup, pushFile (.cpp), compile (.o), link (.so), promote (release), createPackage (.zip), deployPackage, executeDDL, Run SQL, pullFile, Logs. Other labels: DRDA, BLUDB Catalog. Parts of the diagram are marked "under construction" and "under consideration".]
![Page 49: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/49.jpg)
Some Examples Highlighting the REST API
Login and keep a cookie for the session
    curl -d j_username=<User> -d j_password=<PW> https://<IP>:8443/services/loginService -c ck.dat
Upload source files
    curl -F cmd=pushFile -F proj=udsf1 -F subDir=src --form "file[0]=@./udsf1.cpp" --form "file[1]=@./opr.cpp" --form "file[2]=@./opr.h" https://<IP>:8443/ida -b ck.dat
Compile source files
    curl -d cmd=compile -d proj=udsf1 -d targetDir=bin -d "files={\"files\":[\"src/udsf1.cpp\"]}" https://<IP>:8443/ida -b ck.dat
Link object files
    curl -d cmd=link -d proj=udsf1 -d targetDir=bin -d "files={\"files\":[\"bin/udsf1.o\"]}" https://<IP>:8443/ida -b ck.dat
Alternatively: low-level cc invocation
    curl -d cmd=executeCC -d proj=udsf1 -d "args=-m64 -Wall -fPIC -c -D_CPLUSPLUS src/udsf1.cpp -I/mnt/blumeta0/home/db2inst1/sqllib/include -o udsf1.o" https://<IP>:8443/ida -b ck.dat
Promote linked binaries to release directory
    curl -d cmd=promote -d proj=udsf1 -d "files=lib*.so" https://<IP>:8443/ida -b ck.dat
Register UDX with DDL
    curl -d cmd=executeDDL -d profileName=BLUDB -d "ddl=CREATE FUNCTION udf1(INT) RETURNS INT LANGUAGE CPP PARAMETER STYLE NPSGENERIC FENCED EXTERNAL NAME '/mnt/blumeta0/home/bluadmin/projects/udsf1/release/libudsf1.so!CUdf';" https://<IP>:8443/blushiftservices/BluShiftHttp.do -b ck.dat
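The `files` value in the compile and link calls is a small JSON document, which is awkward to quote by hand. As a hedged sketch, a client could build that value as follows. `jsonEscape` and `buildFilesParam` are hypothetical helpers, not part of any dashDB tooling, and passing more than one path in the array is an assumption, since the slide only shows single-file lists.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Escape only double quotes and backslashes, the characters that would
// terminate or corrupt a JSON string literal; control characters are not handled.
inline std::string jsonEscape(const std::string& text) {
    std::string out;
    for (char c : text) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

// Build the form field in the shape shown on the slide:
//   files={"files":["src/udsf1.cpp"]}
inline std::string buildFilesParam(const std::vector<std::string>& paths) {
    std::string body = "files={\"files\":[";
    for (std::size_t i = 0; i < paths.size(); ++i) {
        if (i > 0) body += ',';
        body += '"' + jsonEscape(paths[i]) + '"';
    }
    body += "]}";
    return body;
}
```

The returned string is the raw field, matching what `curl -d` posts without further URL encoding; a client that uses a different HTTP library may need to encode it.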
![Page 50: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud](https://reader036.vdocuments.site/reader036/viewer/2022062822/5880dba21a28ab9c3a8b714f/html5/thumbnails/50.jpg)
A Proof Point of UDX Support in dashDB
We have a working prototype of the entire Netezza SQL Extension Toolkit for dashDB!