Post on 05-Feb-2018


Page 1

Introducing Microsoft

SQL Server 2016 R Services

Julian Lee

Advanced Analytics Lead

Global Black Belt – Asia Timezone

Page 2

SQL Server 2016: Everything built-in

Consistent experience from on-premises to cloud

In-memory across all workloads, at massive scale

Self-service BI per user: Microsoft $120, Tableau $480, Oracle $2,230

TPC-H: #1 SQL Server, #2 SQL Server, #3 SQL Server; Oracle is #5

[Bar chart comparing SQL Server, Oracle, MySQL, and SAP HANA; axis 0 to 80, categories 1 to 6; bar values not recoverable]

Page 3

From data to decisions and actions

Value

Data

$1.6 trillion

Decisions

Action

Page 4

Microsoft Advanced Analytics Offerings

Cortana Analytics Suite

SQL Server 2016

Page 5

Typical advanced analytics lifecycle

Preparation: Ingest, Transform, Explore

Modeling: Model

Operationalization: Deploy, Score, Visualize, Measure

Page 6

Data Scientist should be creating / testing models


Page 7

But the reality is different …

Data scientist focus time: 15%

[Lifecycle diagram as on page 5 (Preparation, Modeling, Operationalization), annotated 80%, 5%, 15%]

Page 8

R – What is it?

Open source “lingua franca”

Analytics, computing, modeling

Global community

Millions of users

7,000+ packages

Big data

Ecosystem

Scalability

Page 9

CRAN: The Comprehensive R Archive Network

Open Source “lingua franca”

Analytics, Computing, Modeling

In addition to CRAN, Bioconductor, GitHub, and others distribute R packages

Page 10

R – Why use it?

Large talent base knows how to use it already

Scalability for ongoing computation of data

Protecting important data is much easier

Dividing use between roles creates efficiency

Page 11


Challenges posed by open source R

Uncertain total cost of ownership and return on investment

Integrating R with existing and ever-changing data infrastructures

Scale and Performance

Data movement restricts access for efficient data modeling

Page 12

SQL Server 2016 R Services offers

24/7 support, enterprise-grade platform

Included in SQL 2016

Single analytics platform on multiple data infrastructures

Smarter decisions, faster analysis and results

Hybrid cloud, reusable code, fewer limits

Page 13

In-database advanced analytics

Built into SQL Server 2016: R integration, analytics library, T-SQL interface, relational data

Data Scientist: Interacts directly with data

SQL Developer/DBA: Manage data and analytics together

Extensibility

Example solutions: Sales forecasting, Warehouse efficiency, Predictive maintenance, Credit risk protection

Real-time operational analytics without moving data

R with in-memory scalability

Page 14

SQL Server R Services: Model, Deploy & Operationalize

In SQL16:

Support Entire Analytics Lifecycle

Enable R Users to Run R Inside SQL 2016

Enable SQL Users to Extend BI Applications Using R Analytics

Advantages:

Scale By Eliminating Movement

Scale Using Parallelized Analytics

Reduced Security Exposure

SQL Skill Reuse for Data Engineering

SQL Skill Reuse for App Development

Improved Operational Stability for Applications

Enterprise R Analytics

Typical Advanced Analytics Process: Prepare, Model, Operationalize

SQL 2016

Page 15

Improving advanced analytics with R and SQL Server

Data Scientist

Publish algorithms and interact directly with data

Run R across entire data set

Execute and test in-database

Deploy to local database

SQL Developer

Better manage data and analytics together

Operationalize R script/model

Use T-SQL constructs

DBA

Better manage storage and analytics together

More easily maintain performance and stability

Secure and govern R runtime execution

Page 16

Custom parallelization

PEMA-R API

rxDataStep

rxExec

Data step

Data import: delimited, fixed, SAS, SPSS, ODBC

Variable creation & transformation

Recode variables

Factor variables

Missing value handling

Sort, merge, split

Aggregate by category (means, sums)

Descriptive statistics

Min/max, mean, median (approx.)

Quantiles (approx.)

Standard deviation

Variance

Correlation

Covariance

Sum of squares (cross-product matrix for set variables)

Pairwise cross tabs

Risk ratio & odds ratio

Cross-tabulation of data (standard tables & long form)

Marginal summaries of cross tabulations
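The summaries listed above (mean, variance, standard deviation) can be computed without holding the whole data set in memory. The sketch below assumes that PEMA refers to parallel external memory algorithms: each chunk is reduced to a small partial summary, and partial summaries are merged. It is an illustration in plain Python, not the ScaleR implementation; the class name `Moments` and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Moments:
    """Running count, mean, and sum of squared deviations (hypothetical helper)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, chunk):
        """Fold one in-memory chunk into the running summary (Welford update)."""
        for x in chunk:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return self

    def merge(self, other):
        """Combine two partial summaries, e.g. produced by different workers."""
        if other.n == 0:
            return self
        if self.n == 0:
            return Moments(other.n, other.mean, other.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Moments(n, mean, m2)

    @property
    def variance(self):
        """Sample variance; undefined (NaN) for fewer than two observations."""
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


# Three chunks stand in for blocks read from disk or from a database cursor.
chunks = [[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]]
partials = [Moments().update(chunk) for chunk in chunks]

total = Moments()
for partial in partials:
    total = total.merge(partial)

print(total.n, total.mean, total.variance)
```

Because `merge` only needs the three numbers in each partial summary, the per-chunk work could run on separate workers and be combined in any order.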

Statistical tests

Chi Square Test

Kendall Rank Correlation

Fisher’s Exact Test

Student’s t-Test

Sampling

Subsample (observations & variables)

Random sampling

Predictive models

Sum of squares (cross-product matrix for set variables)

Multiple linear regression

Generalized linear models (GLM): exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie); standard link functions (cauchit, identity, log, logit, probit); user-defined distributions and link functions.
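To make the link functions named above concrete, here is a minimal sketch of each one with its inverse, using only the Python standard library. A GLM relates the mean of the response to a linear predictor through the link; scoring applies the inverse link to the linear predictor. The names `LINKS` and `predict_mean` are hypothetical and are not part of any SQL Server or R API.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used by the probit link

# Each entry maps a link name to (link, inverse_link).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
    "cauchit": (
        lambda mu: math.tan(math.pi * (mu - 0.5)),
        lambda eta: 0.5 + math.atan(eta) / math.pi,
    ),
}


def predict_mean(link_name, coefficients, features):
    """Score one observation: inverse link applied to the linear predictor."""
    _, inverse_link = LINKS[link_name]
    eta = sum(b * x for b, x in zip(coefficients, features))
    return inverse_link(eta)


# Example: a logistic model with an intercept and one feature.
print(predict_mean("logit", [-1.0, 0.5], [1.0, 2.0]))
```

With the coefficients shown, the linear predictor is zero, so the logit model scores the observation at a probability of 0.5.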

Covariance & correlation matrices

Logistic regression

Classification & regression trees

Predictions/scoring for models

Residuals for all models

Simulation

Simulation (e.g., Monte Carlo)

Parallel random number generation
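A Monte Carlo simulation with per-worker random number streams might look like the sketch below. It assumes the simplest possible scheme: each worker gets its own generator seeded from a base seed plus the worker index, so no generator state is shared. Distinct seeds do not by themselves guarantee statistically independent streams, and the workers run one after another here; the function names are hypothetical.

```python
import random


def count_hits(seed, n_samples):
    """One worker: count random points that land inside the unit quarter circle."""
    rng = random.Random(seed)  # private generator, no state shared between workers
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_workers=4, n_samples_per_worker=50_000, base_seed=2016):
    """Combine the per-worker counts into a single estimate of pi."""
    hits = [
        count_hits(base_seed + worker, n_samples_per_worker)
        for worker in range(n_workers)
    ]
    return 4.0 * sum(hits) / (n_workers * n_samples_per_worker)


print(estimate_pi())
```

Since each worker only returns a count, the list comprehension could be replaced by a process pool without changing the result for a given base seed.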

Cluster analysis

K-Means
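As a rough illustration of what a K-Means routine does, the sketch below implements the basic Lloyd iteration in plain Python: assign each point to its nearest centroid, then move each centroid to the mean of its members, and stop when nothing changes. It is single-threaded, takes the starting centroids as an argument, and is not the parallelized implementation the slide refers to; the function name `kmeans` is hypothetical.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm on tuples of floats; returns (centroids, clusters)."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: math.dist(p, centroids[i])
            )
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = kmeans(points, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

The assignment step is independent per point, which is what makes the method a natural fit for chunked or parallel execution.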

Classification

Decision trees

Decision forests

Gradient-boosted decision trees

Naïve Bayes

Parallelized, remote executing algorithms

Page 17

Faster and more scalable

Page 18

Appendix

Page 19

Want to know more?

Find out how much R knowledge is already in-house

Contact your Microsoft Data Platform Specialist or local Black Belt

Learn more about the relationship between Microsoft and Revolution Analytics

http://microsoft.com/SQL

http://blogs.technet.com/b/machinelearning/archive/2016/01/12/making-r-the-enterprise-standard-for-cross-platform-analytics-both-on-premises-and-in-the-cloud.aspx

Page 20

5 Key Hurdles for Advanced Analytics

Innovation Rate: Load Data, Re-Code, Build Insight

Rapid Big Data Evolution: MapReduce, YARN, Spark, ? (< 3 years)

Large Unknown Current Spend

Complex Infrastructure Decisions: Cloud or On-Prem? Security, Experience, Transition Cost, Risk, Agility, Elasticity, TCO, Fit

Expanding Communities: Data Scientist, Analyst, Business Executive, Developer, Data Steward, Data Engineer; Data Lake, Decisions, Applications

Page 21

Demo Link

http://13.76.242.39/