Introducing Microsoft
SQL Server 2016 R Services
Julian Lee
Advanced Analytics Lead
Global Black Belt – Asia Timezone
Consistent experience from on-premises to cloud
Self-service BI per user: Microsoft $120, Tableau $480, Oracle $2,230
In-memory across all workloads
at massive scale
[Chart: comparison of SQL Server, Oracle, MySQL, and SAP HANA; vertical axis 0 to 80, categories 1 to 6; individual data values are not recoverable]
TPC-H: SQL Server is #1, #2, and #3; Oracle is #5
SQL Server 2016: Everything built-in
From data to decisions and actions
Value
Data
$1.6 trillion
Decisions
Action
Microsoft Advanced Analytics Offerings
Cortana Analytics Suite
SQL Server 2016
Typical advanced analytics lifecycle
Stages: Ingest → Transform → Explore → Model → Deploy → Score → Visualize → Measure
Phases: Preparation, Modeling, Operationalization
Data Scientist should be creating / testing models
But the reality is different …
Data scientist focus time: 15%
Percentages shown on the lifecycle diagram: 80%, 5%, 15%
R – What is it?
Open source “lingua franca”
Analytics, computing, modeling
Global community
Millions of users
7,000+ packages
Big data
Ecosystem
Scalability
CRAN: The Comprehensive R Archive Network
In addition to CRAN, Bioconductor, GitHub, and others distribute R packages
Large talent base knows how to use it already
R – Why use it?
Scalability for ongoing computation of data
Protecting important data is much easier
Dividing use between roles creates efficiency
Challenges posed by open source R
Uncertain total cost of ownership and return on investment
Integrating R with existing and ever changing data infrastructures
Scale and Performance
Data movement restricts access for efficient data modeling
24/7 support, enterprise-grade platform
Included in SQL 2016
Single analytics platform on multiple data infrastructures
Smarter decisions, faster analysis and results
Hybrid cloud, reusable code, fewer limits
SQL Server 2016 R Services offers
In-database advanced analytics
Data Scientist
Interacts directly with data
SQL Developer/DBA
Manage data and analytics together
Extensibility
Example solutions
Sales forecasting
Warehouse efficiency
Predictive maintenance
Credit risk protection
Relational data
Analytics library
T-SQL interface
R integration
Built into SQL Server 2016
Real-time operational analytics without moving data
R with in-memory scalability
SQL Server R Services: Model, Deploy & Operationalize
In SQL 2016: Support Entire Analytics Lifecycle
Enable R Users to Run R Inside SQL 2016
Enable SQL Users to Extend BI Applications Using R Analytics
Advantages:
Scale By Eliminating Movement
Scale Using Parallelized Analytics
Reduced Security Exposure
SQL Skill Reuse for Data Engineering
SQL Skill Reuse for App development
Improved Operational Stability for Applications
Enterprise R Analytics
Typical Advanced Analytics Process
Prepare, Model, Operationalize
SQL 2016
Improving advanced analytics with R and SQL Server
Data Scientist
Publish algorithms and interact directly with data
Run R across entire data set
Execute and test in-database
Deploy to local database
SQL Developer
Better manage data and analytics together
Operationalize R script/model
Use T-SQL constructs
DBA
Better manage storage and analytics together
More easily maintain performance and stability
Secure and govern R runtime execution
Custom parallelization
PEMA-R API
rxDataStep
rxExec
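The chunk-and-combine idea behind parallel external memory algorithms can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the PEMA-R API or the implementation of rxDataStep or rxExec: the names chunks, process_chunk, combine, and chunked_mean are hypothetical, an in-memory list stands in for blocks read from disk, and a thread pool stands in for the parallel workers.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def chunks(rows, size):
    """Yield fixed-size chunks; a stand-in for blocks read from disk."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def process_chunk(chunk):
    """Reduce one chunk to a small intermediate result: (count, sum)."""
    return (len(chunk), sum(chunk))


def combine(a, b):
    """Merge two intermediate results into one."""
    return (a[0] + b[0], a[1] + b[1])


def chunked_mean(rows, size=4, workers=2):
    """Process chunks independently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_chunk, chunks(rows, size)))
    count, total = reduce(combine, partials, (0, 0.0))
    return total / count


data = [float(x) for x in range(1, 11)]  # illustrative values only
mean_value = chunked_mean(data)
```

Because each chunk is reduced to a small intermediate result before combining, the full data set never has to be held by a single worker.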
Data step
Data import – Delimited, fixed, SAS, SPSS, ODBC
Variable creation & transformation
Recode variables
Factor variables
Missing value handling
Sort, merge, split
Aggregate by category (means, sums)
Descriptive statistics
Min/max, mean, median (approx.)
Quantiles (approx.)
Standard deviation
Variance
Correlation
Covariance
Sum of squares (cross-product matrix for set variables)
Pairwise cross tabs
Risk ratio & odds ratio
Cross-tabulation of data (standard tables & long form)
Marginal summaries of cross tabulations
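As a worked example of the cross-tabulation, risk ratio, and odds ratio items above, the following Python sketch computes both ratios from a 2x2 table. The function names and the counts are invented for illustration and do not come from the deck or from any R Services function.

```python
from collections import Counter


def cross_tab(pairs):
    """Count occurrences of each (exposure, outcome) pair."""
    return Counter(pairs)


def risk_and_odds_ratio(table):
    """Return (risk ratio, odds ratio) for a 2x2 exposure/outcome table."""
    a = table[("exposed", "event")]
    b = table[("exposed", "no_event")]
    c = table[("unexposed", "event")]
    d = table[("unexposed", "no_event")]
    risk_ratio = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    return risk_ratio, odds_ratio


# Invented counts, for illustration only.
observations = (
    [("exposed", "event")] * 20
    + [("exposed", "no_event")] * 80
    + [("unexposed", "event")] * 10
    + [("unexposed", "no_event")] * 90
)
table = cross_tab(observations)
risk_ratio, odds_ratio = risk_and_odds_ratio(table)
```

With these counts the risk in the exposed group is 20/100 and in the unexposed group 10/100, so the risk ratio is 2 and the odds ratio is (20 × 90) / (80 × 10) = 2.25.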
Statistical tests
Chi Square Test
Kendall Rank Correlation
Fisher’s Exact Test
Student’s t-Test
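To make the chi-square test concrete, here is a minimal Python sketch of the Pearson chi-square statistic for a 2x2 table, without continuity correction. The function name and the counts are assumptions made for illustration; the p-value uses the identity that, for one degree of freedom, the upper tail probability equals erfc(sqrt(x / 2)).

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Invented counts, for illustration only.
counts = [[20, 80], [10, 90]]
statistic, p_value = chi_square_2x2(counts)
```

Larger tables need the general chi-square distribution for the p-value, which this sketch deliberately leaves out.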
Sampling
Subsample (observations & variables)
Random sampling
Predictive models
Sum of squares (cross-product matrix for set variables)
Multiple linear regression
Generalized linear models (GLM): exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie); standard link functions (cauchit, identity, log, logit, probit); user-defined distributions and link functions
Covariance & correlation matrices
Logistic regression
Classification & regression trees
Predictions/scoring for models
Residuals for all models
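The link between the cross-product matrix and linear regression can be illustrated as follows. This Python sketch accumulates the sums of squares and cross-products chunk by chunk and then solves for a one-predictor least-squares line. It is an assumption-laden illustration, not how R Services fits its models; all function names and data are hypothetical.

```python
def chunk_cross_products(chunk):
    """Sums that make up the cross-product matrix for columns (1, x, y)."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return (n, sx, sy, sxx, sxy)


def add_cross_products(a, b):
    """Cross-products are additive, so chunks can be merged term by term."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(totals):
    """Solve the normal equations for intercept and slope."""
    n, sx, sy, sxx, sxy = totals
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Invented data on the exact line y = 2x + 1, split into chunks of three rows.
rows = [(float(x), 2.0 * x + 1.0) for x in range(10)]
chunked = [rows[i:i + 3] for i in range(0, len(rows), 3)]

totals = (0, 0.0, 0.0, 0.0, 0.0)
for chunk in chunked:
    totals = add_cross_products(totals, chunk_cross_products(chunk))
intercept, slope = fit_line(totals)
```

Only five running totals are kept regardless of the number of rows, which is why this style of computation suits data that does not fit in memory.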
Simulation
Simulation (e.g., Monte Carlo)
Parallel random number generation
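A small Python sketch of Monte Carlo simulation with one random number stream per worker follows. It is illustrative only: the seeds, function names, and draw counts are assumptions, threads stand in for parallel workers, and giving each worker a different seed is a simplification that does not by itself guarantee statistically independent streams.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(seed, draws):
    """Count random points in the unit square that fall inside the quarter circle."""
    rng = random.Random(seed)  # private generator, so workers do not share state
    hits = 0
    for _ in range(draws):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(workers=4, draws_per_worker=20000, base_seed=2016):
    """Estimate pi from the fraction of points inside the quarter circle."""
    seeds = [base_seed + i for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = sum(pool.map(count_hits, seeds, [draws_per_worker] * workers))
    return 4.0 * hits / (workers * draws_per_worker)


pi_estimate = estimate_pi()
```

Because each worker owns its generator and seed, the result is reproducible from run to run regardless of how the workers are scheduled.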
Cluster analysis
K-Means
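For readers unfamiliar with K-Means, a minimal one-dimensional sketch of the standard assign-then-update iteration (Lloyd's algorithm) is shown here in Python. The function name, starting centers, and points are invented for illustration and do not reflect how R Services implements clustering.

```python
def kmeans_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and re-averaging."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster received no points.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers


# Invented points forming two obvious groups.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.5]
final_centers = kmeans_1d(points, centers=[0.0, 5.0])
```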
Classification
Decision trees
Decision forests
Gradient-boosted decision trees
Naïve Bayes
Parallelized, remote executing algorithms
Faster and more scalable
Appendix
Want to know more?
Find out how much R knowledge is already in-house
Contact your Microsoft Data Platform Specialist or local Black Belt
Learn more about the relationship between Microsoft and Revolution Analytics
http://microsoft.com/SQL
http://blogs.technet.com/b/machinelearning/archive/2016/01/12/making-r-the-enterprise-standard-for-cross-platform-analytics-both-on-premises-and-in-the-cloud.aspx
5 Key Hurdles for Advanced Analytics
Load Data, Re-Code, Build Insight
Innovation Rate
MapReduce, YARN, Spark, ? (< 3 years)
Rapid Big Data Evolution
Large Unknown Current Spend
Complex Infrastructure Decisions
Cloud
Security
Experience
Transition Cost
Risk
Agility
Elasticity
TCO
Fit
On-Prem
Data Scientist
Analyst
Business Executive
Developer
Data Steward
Data Engineer
Data Lake
Decisions
Applications
Expanding Communities
Demo Link
http://13.76.242.39/