website – facebook – ... · data governance is a methodology that represents quality, policies,...

Website – http://greenvillebig.org
Facebook – www.facebook.com/greenvillebig
Twitter - @GreenvilleBIG
LinkedIn – Greenville BIG User Group

R with SQL Server
Bill Fellows - Thursday, August 14, 2014 (6:00 PM - 8:30 PM)
ITT Technical Institute - 6 Independence Pointe, Greenville, SC, 29615



Upcoming Events

• August 19 - GSP Developers Guild

• August 20 - GSATC Technology Council

• August 20 - Tech After Five

• September 2 - SSIG Meeting

• September 6 - SQL Saturday Raleigh

• September 14 - SQL Saturday Kansas City

• October 4 - SQL Saturday Charlotte

• November 4-7 - PASS Summit - Seattle, WA ($150 discount code: USSES33)

Master Data Services, How Does It Apply to My Enterprise

Reeves Smith - Thursday, September 11, 2014 (6:00 PM - 8:30 PM)
ITT Technical Institute - 6 Independence Pointe, Greenville, SC, 29615

Understand how a Master Data Management (MDM) and Data Governance methodology can enable business clarity across the enterprise. Get introduced to Master Data Services (MDS), which is a Master Data Management solution on the Microsoft Platform. This solution enables the management of non-transactional data that defines a business entity within the enterprise. Get a good business and technical understanding of how MDS can help obtain better business clarity across the organization through a data governance strategy. Data governance is a methodology that represents quality, policies, and process management in relation to handling your enterprise data. The demo will walk through the basics of getting started with Master Data Services 2012, including the Excel Add-in for Master Data Services.


Getting started with R and SQL Server

An introduction to the R language

Bill Fellows

http://blog.billfellows.net/ @billinkc #New2R


Who are you?

• Developer

• Business analyst

• Manager

• Lost

"BouncingTownshend" by Jean-Luc - originally posted to Flickr as The WHO. Licensed under Creative Commons Attribution-Share Alike 2.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:BouncingTownshend.jpg#mediaviewer/File:BouncingTownshend.jpg


Why should you stay?

• Money

• Install R and RStudio

• Set up ODBC connection

• Pull data from SQL Server

• Perform analysis

"Stay Thirsty - 2011 Travers Stakes" by Mike L Photo's - Flickr: Stay Thirsty - 2011 Travers Stakes. Licensed under Creative Commons Attribution 2.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Stay_Thirsty_-_2011_Travers_Stakes.jpg#mediaviewer/File:Stay_Thirsty_-_2011_Travers_Stakes.jpg

Money: either spend less of the company's money by using open source tools, or make yourself more competitive.


Agenda

• What is R

• How do I get it

• Show me the basics

• Work with my database

• Profit


What is R

http://upload.wikimedia.org/wikipedia/commons/f/fe/R_cursiva.gif

It’s a letter, Wikipedia says so


R Language

• Statistical programming Language

• Implementation of S

• Free


http://upload.wikimedia.org/wikipedia/commons/c/c1/Rlogo.png

http://en.wikipedia.org/wiki/R_(programming_language) http://www.r-project.org/


Get it

• http://cran.rstudio.com/

• http://www.rstudio.com/products/rstudio/download/


Help!

• ?

• help(topic)

• ?? or help.search(topic)

• example

"Help" by The cover art of the Beatles' album Help!, taken from the Apple iTunes Store. Licensed under Fair use of copyrighted material in the context of Help! (album) via Wikipedia - http://en.wikipedia.org/wiki/File:Help.jpg#mediaviewer/File:Help.jpg

Launch RStudio and start getting help. See also vignette().
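
A minimal sketch of the help commands from this slide. Using mean and the search term "deviation" as topics is an assumption for illustration; neither comes from the talk.

```r
# Open the help page for a function you already know by name.
# These two lines are equivalent.
?mean
help("mean")

# Search the installed documentation when you do not know the name.
# `??deviation` is shorthand for a help.search() call like this one,
# limited here to the stats package to keep the search small.
hits <- help.search("deviation", package = "stats")

# Run the examples shown at the bottom of a help page.
example("mean")
```

Printing `hits` at the console lists the matching help topics.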


Data Types

• numeric

• integer

• complex

• logical

• character

http://www.trekmate.org.uk/wp-content/uploads/2014/05/Data_and_Lore_2364.jpg

http://www.r-tutor.com/r-introduction/basic-data-types
numeric is the default
character can use " or ' (underneath it's all ")
factor vs text
The type is also referred to as the mode
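
A short sketch of the five basic types on this slide. The variable names and values are made up for illustration.

```r
n <- 3.14     # numeric is the default for numbers
i <- 3L       # the L suffix asks for an integer
z <- 1 + 2i   # complex
b <- TRUE     # logical
s <- 'single or double quotes both work'   # character

# class() reports the type of each value
class(n); class(i); class(z); class(b); class(s)

# mode() is coarser than class(): integers report "numeric"
mode(i)

# Both quote styles produce the same string
identical('text', "text")
```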


Operations

• + - * / ^

• <- =

• == | > <

• c

• scan

http://www.flickr.com/photos/pernell/3296502

+ - * / ^ : addition, subtraction, multiplication, division, exponent
<- = : assignment (old vs new style)
== | > < : equality, or, greater than, less than
c : combine
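
The operators above, sketched with made-up values. Passing the input to scan through its text argument is an assumption made so the example runs without a keyboard or a file.

```r
2 + 3 * 4 / 2 ^ 2    # arithmetic with the usual precedence

x <- 5               # assignment, arrow style
y = 6                # assignment, equals style

x == y               # equality test
x < y | x > 10       # "or" of two comparisons

v <- c(1, 2, 3)      # c combines values into a vector
v * 2                # arithmetic applies to every element

# scan reads whitespace-separated values; text = supplies them inline
w <- scan(text = "4 5 6")
```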


what is is

• is.integer

• as.integer

is.* tests whether a value is of a given type; as.* is an explicit cast.
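
A small sketch of the is/as pair for integers. The values are illustrative only.

```r
x <- 3
is.integer(x)    # FALSE: a bare 3 is stored as numeric (double)
is.numeric(x)    # TRUE

y <- as.integer(x)
is.integer(y)    # TRUE after the explicit cast

as.integer(3.9)  # the cast truncates toward zero rather than rounding
as.integer("42") # character strings that look like numbers convert

# Strings that do not parse become NA (R also issues a warning)
bad <- suppressWarnings(as.integer("forty"))
is.na(bad)
```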


Free data

• data()

• demo()

http://www.duncanchannon.com/wp-content/uploads/2009/02/free-candy-van.jpg
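
A sketch of browsing the bundled data sets and demos. Picking mtcars as the data set to load is an assumption; the slide does not name one.

```r
# data() with a package argument returns a listing; $results is a
# character matrix with one row per data set.
available <- data(package = "datasets")$results
head(available[, c("Item", "Title")])

# Load one data set by name and take a first look
data(mtcars)
head(mtcars)
str(mtcars)

# demo() works the same way for the bundled demonstration scripts
demos <- demo(package = "base")$results
demos[, "Item"]
```

Calling data() or demo() with no arguments at the console shows the full listing for every attached package.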


Analyze this

• mean

• sd

• cor

• summary
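
The four functions above, sketched against the built-in mtcars data set. The choice of data set and columns is an assumption for illustration.

```r
data(mtcars)

mean(mtcars$mpg)             # average miles per gallon
sd(mtcars$mpg)               # standard deviation

# Correlation between two columns; negative here because heavier
# cars get fewer miles per gallon
cor(mtcars$mpg, mtcars$wt)

# summary gives min, quartiles, mean, and max for one column...
summary(mtcars$mpg)

# ...or for several columns of a data frame at once
summary(mtcars[, c("mpg", "wt", "hp")])
```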


Show me the data

• plot

• qplot - quick plots

• ggplot - kitchen sink


http://www.r-bloggers.com/basic-introduction-to-ggplot2/
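
A hedged sketch of a base plot next to the ggplot equivalent. Assumptions: mtcars as the data, a temporary PDF file as the output device so the script also runs without a screen, and ggplot2 being optional (the ggplot part is skipped when the package is not installed). qplot is left out because recent ggplot2 releases deprecate it.

```r
data(mtcars)

out <- tempfile(fileext = ".pdf")   # hypothetical output location
pdf(out)

# Base graphics: one call, sensible defaults
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "mtcars")

# ggplot2: build the plot from data, aesthetics, and layers
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(mtcars, ggplot2::aes(x = wt, y = mpg)) +
    ggplot2::geom_point()
  print(p)
}

invisible(dev.off())   # close the device so the file is written
```

In RStudio you can drop the pdf() and dev.off() lines and the plots appear in the Plots pane.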


Time to make the data

• c

• scan

• scan(file = '')

• read.csv

• NA

"Glazed-Donut" by Evan-Amos - Own work. Licensed under Public domain via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Glazed-Donut.jpg#mediaviewer/File:Glazed-Donut.jpg

scan(file.choose())
Demo: scan pulling internet data
read.csv(file, sep = ',', header = TRUE, row.names)
read.delim
read.table: all purpose
Rows = observations (row names); columns = variables (factors)
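
A minimal sketch of read.csv and NA handling. The CSV content is invented for illustration, and it is passed through the text argument (instead of a file path) so the example is self-contained.

```r
# Hypothetical data: one price is missing and recorded as NA
csv_text <- "drink,size,price
coffee,tall,2.25
tea,grande,NA
coffee,venti,3.10"

drinks <- read.csv(text = csv_text, header = TRUE,
                   stringsAsFactors = FALSE)
str(drinks)

# NA marks a missing value and propagates through calculations
is.na(drinks$price)
mean(drinks$price)                 # NA, because one value is missing

# Most summary functions can be told to skip missing values
mean(drinks$price, na.rm = TRUE)
```

For a real file, replace the text argument with the file path, or use file.choose() to pick it interactively.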


Data Types too

• vectors (1D)

• matrices (2D)

• arrays (nD)

• data frames

Everything's a vector
Matrix: innards are all the same type
str() reveals all, class() more succinctly
Variables: nominal (no order {coffee, tea}), ordinal (order without magnitude {venti, tall, grande}), continuous (order & magnitude {20, ?, ?})
Factors: categorical & ordinal
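
The four structures and an ordered factor, sketched with made-up values. The drink and size names echo the notes; the ounce figures and the level order tall < grande < venti are assumptions.

```r
v  <- c(10, 20, 30)                      # vector (1D)
m  <- matrix(1:6, nrow = 2)              # matrix (2D), one type throughout
a  <- array(1:24, dim = c(2, 3, 4))      # array (nD)
df <- data.frame(drink = c("coffee", "tea"),
                 oz    = c(12, 16))      # data frame: columns may differ in type

str(df)      # full structure
class(m)     # the short answer

# Nominal variable: categories with no order
drink <- factor(c("coffee", "tea", "coffee"))
levels(drink)

# Ordinal variable: an ordered factor supports comparisons
size <- factor(c("tall", "venti", "grande"),
               levels = c("tall", "grande", "venti"),
               ordered = TRUE)
size[1] < size[2]
```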


Vector

• (ordinal1, ordinal2)

• [ordinal1:ordinal3]

• Negative slice removes element - contrary to python

One-based, ish
Scalar values are really just one-element vectors
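
A sketch of vector indexing as described on this slide, using an invented five-element vector.

```r
v <- c(10, 20, 30, 40, 50)

v[1]         # first element: indexing starts at 1
v[c(1, 3)]   # pick elements by position
v[2:4]       # a range of positions, both ends included
v[-1]        # negative index drops that element (Python would return the last one)

v[0]         # the "ish": index 0 is allowed and returns an empty vector
length(5)    # a scalar is a vector of length 1
```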


Connect to SQL Server

• FreeTDS (for mac weenies)

• install.packages("RODBC") or install.packages("rsqlserver")

http://www.freetds.org/
http://blog.benjaminwalters.net/?p=10
http://www.r-bloggers.com/guide-to-accessing-ms-sql-server-and-mysql-server-on-mac-os-x/
http://stackoverflow.com/questions/24875169/trouble-connecting-tsql-to-sql-server
https://github.com/agstudy/rsqlserver/wiki/benchmarking


ODBC Interface

Function                                                      Description
odbcConnect(dsn, uid = "", pwd = "")                          Open a connection to an ODBC database
sqlFetch(channel, sqtable)                                    Read a table from an ODBC database into a data frame
sqlQuery(channel, query)                                      Submit a query to an ODBC database and return the results
sqlSave(channel, mydf, tablename = sqtable, append = FALSE)   Write or update (append = TRUE) a data frame to a table in the ODBC database
sqlDrop(channel, sqtable)                                     Remove a table from the ODBC database
close(channel)                                                Close the connection

http://www.statmethods.net/input/dbinterface.html
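
A hedged sketch of the connect, query, close cycle with RODBC. Everything specific here is an assumption: the DEMO_DSN environment variable is a hypothetical way to supply the DSN name, and the query against sys.tables is just a harmless example for SQL Server. The block does nothing unless a DSN is supplied and RODBC is installed.

```r
# Hypothetical: read the ODBC data source name from an environment variable
dsn <- Sys.getenv("DEMO_DSN", unset = "")

if (nzchar(dsn) && requireNamespace("RODBC", quietly = TRUE)) {
  # Open the connection; on failure odbcConnect returns -1 with a warning
  channel <- RODBC::odbcConnect(dsn)

  if (inherits(channel, "RODBC")) {
    # Submit a query; the result comes back as a data frame
    tables <- RODBC::sqlQuery(channel, "SELECT TOP 5 name FROM sys.tables")
    print(tables)

    # Always release the connection when done
    RODBC::odbcClose(channel)
  }
} else {
  message("No DSN configured or RODBC not installed; skipping the demo.")
}
```

With Windows authentication the uid and pwd arguments can be left at their defaults; otherwise pass them to odbcConnect.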


Resources

• Beginning R: The Statistical Programming Language

• R in Action: Data analysis and graphics with R

• http://swirlstats.com/students.html

• http://datascience101.wordpress.com/2014/07/07/statistical-programming-languages-infographic/

Beginning R: The Statistical Programming Language by Gardener, Mark. ISBN 111816430X

R in Action: Data analysis and graphics with R http://www.manning.com/kabacoff2/ ISBN 9781617291388

Resources

• http://gettinggeneticsdone.blogspot.ca/2014/07/introduction-to-r-for-life-scientists.html (cheat sheet)

• http://r-tutor.com/r-introduction/

• http://datacamp.com/

• http://www.r-bloggers.com/guide-to-accessing-ms-sql-server-and-mysql-server-on-mac-os-x/


Resources

• http://blog.sqltrainer.com/2011/12/statistical-analysis-with-r-and.html

• http://www.johndcook.com/R_language_for_programmers.html

• http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html

• http://cran.r-project.org/doc/manuals/R-data.pdf


DOOR PRIZES