SAS SQL SAS Seminar Series

Shamika Ketkar
July 14th, 2008

SQL

♦ Structured Query Language
♦ Developed by IBM in the early 1970s
♦ From the 1970s to the late 1980s there were different versions of SQL, each based on a different database.
♦ In 1986 the first unified SQL standard (SQL-86) was created.
♦ In 1987 a database interface for SQL was added to the Version 6 Base SAS package.
♦ A “language within a language”

SQL Nomenclature

♦ Tables (datasets)
♦ Rows (observations)
♦ Columns (variables)

Anatomy of a PROC SQL Statement

    PROC SQL;
      SELECT column list
      FROM table list
      WHERE condition list
      GROUP BY column list
      ORDER BY column list
      ;
    QUIT;
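As a concrete illustration of the clause order, here is a minimal sketch. It assumes the SASHELP.CLASS sample table that ships with SAS (it is not part of the original slides), and the column aliases are illustrative only.

```sas
/* Sketch only: SASHELP.CLASS is an assumed stand-in for your own table */
proc sql;
  select sex,
         count(*)    as n_students,
         avg(height) as mean_height format=6.1
  from sashelp.class
  where age >= 12        /* row filter, applied before grouping */
  group by sex           /* one output row per value of sex     */
  order by sex;          /* sort the result                     */
quit;
```

Only SELECT and FROM are required; the other clauses are optional but must appear in this order when used.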

Features

♦ SQL looks at datasets differently from SAS
  ◘ SAS (the DATA step) looks at a dataset one record at a time, using an implied loop that moves from the first record to the last
  ◘ SQL looks at all the records as a single object
  ◘ Because of this difference, SQL can easily do a few things that are more difficult to do in SAS
♦ SQL commands are available for creating tables, changing table structures, changing values in tables, functions and more…
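One task that shows the difference is comparing each row with a whole-table statistic. The sketch below assumes the SASHELP.CLASS sample table (not used in the original slides); the aliases are hypothetical. In a DATA step this would typically take a PROC MEANS run plus a merge, while PROC SQL does it in one query.

```sas
/* Sketch only: SASHELP.CLASS is an assumed example table */
proc sql;
  select name,
         height,
         avg(height)          as mean_height format=6.2,
         height - avg(height) as height_diff format=6.2
  from sashelp.class;   /* no GROUP BY: AVG is taken over all rows */
quit;
```

Because a summary function is mixed with a detail column, PROC SQL writes a note to the log that it is remerging summary statistics with the original data.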

Processing Large Datasets: Create View

♦ When a table is created, the query is executed and the resulting data is stored in a file.
♦ When a view is created, the query itself is stored in the file. The data is not accessed at all in the process of creating a view.
♦ By default, PROC SQL prints the result of a query (use the NOPRINT option to suppress this). But NO output is produced when a view is created.
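The table/view distinction can be sketched side by side. The example assumes the SASHELP.CLASS sample table, and the names WORK.TEENS_T and WORK.TEENS_V are hypothetical; none of these appear in the original slides.

```sas
/* Sketch only: table and view names are illustrative */
proc sql;
  /* Table: the query runs now and the rows are stored in WORK.TEENS_T */
  create table work.teens_t as
    select name, age, height
    from sashelp.class
    where age >= 13;

  /* View: only the query text is stored; no rows are read yet */
  create view work.teens_v as
    select name, age, height
    from sashelp.class
    where age >= 13;

  /* Write the stored query definition to the log */
  describe view work.teens_v;

  /* The view's query runs here, when the view is referenced */
  select * from work.teens_v;
quit;
```

Since a view re-runs its query on every reference, it always reflects the current contents of the underlying table, at the cost of repeating the work each time.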

Create View

    PROC SQL;
      CREATE VIEW out.c1data AS
        SELECT *
        FROM data.allgenostarc1 AS a, pheno.new_gtriplet AS b
        WHERE a.subject=b.subject   /* no semicolon here: the statement continues */
        ORDER BY a.subject;
    QUIT;

Log Snippet

    NOTE: SQL view ME.C1DATA has been defined.
    NOTE: PROCEDURE SQL used (Total process time):
          real time 0.86 seconds
          cpu time 0.01 seconds

Log Snippet

    The CONTENTS Procedure

    Data Set Name   out.c1data    Observations   .
    Member Type     VIEW          Variables      4
    Engine          SQLVIEW       Indexes        0
    Protection                    Compressed     NO
    Data Set Type                 Sorted         YES

    #  Variable  Type  Len  Format   Informat
    3  age       Num   8
    5  pedid     Num   8    BEST12.  F12.
    4  sex       Num   8    BEST12.  F12.
    1  subject   Num   8    11.      F11.

♦ SAS stores the view in a file with the extension ‘sas7bvew’

View from View

    PROC SQL;
      CREATE VIEW out.agecat AS
        SELECT *,
          CASE
            WHEN .  lt age le 18 THEN 1
            WHEN 18 lt age le 25 THEN 2
            WHEN 25 lt age le 40 THEN 3
            WHEN 40 lt age le 55 THEN 4
            WHEN 55 lt age le 70 THEN 5
            WHEN age gt 70 THEN 6
            ELSE .
          END AS agecat format=1.
        FROM out.c1data;
    QUIT;

SQL Functions

    PROC SQL;
      SELECT COUNT(DISTINCT subject), agecat, sex
        FROM out.agecat
        GROUP BY agecat, sex;
    QUIT;

           agecat  sex
    ---------------------
       1      1     0
      79      2     0
     118      2     1
     322      3     0
     380      3     1
     608      4     0
     741      4     1
     461      5     0
     452      5     1
      42      6     0
      32      6     1

Macro Variable

    PROC SQL NOPRINT;
      SELECT COUNT(DISTINCT subject)
        INTO :subj1-:subj2
        FROM out.agecat
        GROUP BY sex;
    QUIT;
    %PUT "Males=" &subj1 "Female =" &subj2;

SQL Functions

♦ PROC SQL supports the functions available to the SAS DATA step (with the exceptions noted below); they can be used in a PROC SQL SELECT statement
♦ Because of how SQL handles a dataset, summary functions work over the entire dataset
♦ Common Functions:
  ◘ COUNT
  ◘ DISTINCT
  ◘ MAX
  ◘ MIN
  ◘ SUM
  ◘ AVG
  ◘ VAR
  ◘ STD
  ◘ STDERR
  ◘ NMISS
  ◘ RANGE
  ◘ SUBSTR
  ◘ LENGTH
  ◘ UPPER
  ◘ LOWER
  ◘ CONCAT
  ◘ ROUND
  ◘ MOD
♦ PROC SQL does not support the LAG, DIF, and SOUND functions.
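A short sketch of several of these functions in use, assuming the SASHELP.CLASS sample table (not part of the original slides); all aliases are illustrative. The first query uses summary functions, the second uses row-level functions.

```sas
/* Sketch only: SASHELP.CLASS is an assumed example table */
proc sql;
  /* Summary functions: one output row per group */
  select sex,
         count(*)            as n,
         count(distinct age) as n_ages,
         nmiss(height)       as n_missing,
         min(height)         as min_height,
         max(height)         as max_height,
         range(height)       as range_height,
         avg(height)         as mean_height format=6.2,
         std(height)         as sd_height   format=6.2
  from sashelp.class
  group by sex;

  /* Row-level functions: one value per input row */
  select upper(name)        as name_upper,
         substr(name, 1, 3) as name_prefix,
         length(name)       as name_length,
         round(weight, 10)  as weight_rounded,
         mod(age, 2)        as age_mod_2
  from sashelp.class;
quit;
```

With a GROUP BY clause the summary functions are evaluated within each group rather than over the whole table.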

Creating Index

    PROC SQL;
      CREATE UNIQUE INDEX id ON data.goldn(id);
    QUIT;

♦ Indexes are auxiliary data structures that can be used to improve the performance of lookups in large data sources
♦ The index is stored in the same directory as the indexed table, in a separate file with the same name but a different extension

Why use Indexes?

♦ NO Index?
  ◘ Lookups must read the entire data portion of the table from start to finish to be certain of finding all matches
  ◘ This means a lot of CPU and I/O time is used to read records that are never needed
♦ Index?
  ◘ SAS will automatically detect and exploit the index if it can improve performance
  ◘ The index file contains a list of key variable values and their locations within the data table
  ◘ The index supplies a list of matching record positions, which is then used to interrogate the table itself
  ◘ Only the parts of the table that are needed are read, which means less CPU and I/O time
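To check whether SAS actually uses an index, one option is to raise the log message level. The sketch below assumes a WORK copy of the SASHELP.CLASS sample table (a hypothetical stand-in for a large table); on a table this small SAS may well decide that a full scan is cheaper.

```sas
/* Sketch only: WORK.CLASS is an illustrative copy of a sample table */
data work.class;
  set sashelp.class;   /* copy, so the shipped table is not modified */
run;

options msglevel=i;    /* ask SAS to log INFO messages about index usage */

proc sql;
  /* A simple (single-column) index must have the same name as its column */
  create index name on work.class(name);

  /* SAS decides whether the index is worthwhile for this lookup */
  select *
  from work.class
  where name = 'Alice';

  drop index name from work.class;
quit;

options msglevel=n;    /* restore the default message level */
```

When the index is chosen, the log reports it in an INFO line for the WHERE clause; when it is not, the query still returns the same rows by reading the whole table.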

Merge without Sort

    PROC SQL;
      CREATE TABLE goldndata AS
        SELECT *
        FROM goldn.gtriplet AS a, goldn.blood AS b
        WHERE a.id=b.id;
    QUIT;

♦ No presorting required
♦ No requirement for common variable names to join on (the join columns should have the same type and length)

    PROC SQL;
      CREATE TABLE goldndata AS
        SELECT *
        FROM goldn.gtriplet AS a, goldn.blood AS b
        WHERE a.myid=b.id;
    QUIT;

Combining Datasets: Joins

    SQL join      Equivalent DATA step MERGE condition
    Full Join     If a or b;
    Inner Join    If a and b;
    Left Join     If a;
    Right Join    If b;
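The four join types can be sketched with two tiny tables. The tables WORK.A and WORK.B and their columns are hypothetical, made up here for illustration; they are not from the original slides.

```sas
/* Sketch only: two small illustrative tables sharing the key ID */
data work.a;
  input id x;
  datalines;
1 10
2 20
3 30
;
run;

data work.b;
  input id y;
  datalines;
2 200
3 300
4 400
;
run;

proc sql;
  /* Inner join: ids found in both tables (like "if a and b") -> 2, 3 */
  select a.id, a.x, b.y
  from work.a as a inner join work.b as b
    on a.id = b.id;

  /* Left join: every id in A (like "if a") -> 1, 2, 3 */
  select a.id, a.x, b.y
  from work.a as a left join work.b as b
    on a.id = b.id;

  /* Right join: every id in B (like "if b") -> 2, 3, 4 */
  select b.id, a.x, b.y
  from work.a as a right join work.b as b
    on a.id = b.id;

  /* Full join: ids in either table (like "if a or b") -> 1, 2, 3, 4;
     COALESCE takes the id from whichever side has it */
  select coalesce(a.id, b.id) as id, a.x, b.y
  from work.a as a full join work.b as b
    on a.id = b.id;
quit;
```

The correspondence with MERGE holds when the key is unique in each table; with repeated key values an SQL join returns every matching pair of rows, which a DATA step merge does not.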

Changing the Order of Variables

♦ Changing the order of variables in your data set: some genetics software requires id as the first column…

Table 1. Order of variables before changing (oldfile)

    Age
    Sex
    Subject

Table 2. Order of variables after changing (newfile)

    Subject
    Sex
    Age

Changing the order…

    PROC SQL;
      CREATE TABLE newfile
        ( subject num,
          sex num,
          age num
        );
      INSERT INTO newfile
        SELECT subject, sex, age
        FROM me.c1data;
    QUIT;

    proc contents data=newfile; run;

Log Snippet…

    Alphabetic List of Variables and Attributes

    #  Variable  Type  Len
    3  age       Num   8
    2  sex       Num   8
    1  subject   Num   8
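A shorter variant of the same idea, offered as an alternative sketch rather than the method shown on the slide: CREATE TABLE ... AS SELECT stores the columns in the order they are listed. It assumes the SASHELP.CLASS sample table, and WORK.NEWFILE2 is a hypothetical name.

```sas
/* Sketch only: column order in SELECT becomes the variable order */
proc sql;
  create table work.newfile2 as
    select name, sex, age, height, weight
    from sashelp.class;
quit;

/* VARNUM lists the variables by position instead of alphabetically */
proc contents data=work.newfile2 varnum;
run;
```

This avoids declaring each column's type by hand, since the types and lengths are inherited from the source table.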

Matching, Sounds-Like…

♦ Phonetic Matching: Sounds-Like Operator =*
  ◘ A technique for finding names that sound alike or have variations in spelling. The sounds-like operator "=*" searches and selects character data based on two expressions: the search value and the matched value.
♦ Pattern Matching: % Wildcard character
  ◘ The % acts as a wildcard character representing any number of characters, including any combination of uppercase or lowercase characters. Combining the LIKE predicate with the % (percent sign) permits case-sensitive searches.

Matching, Sounds-Like…

    PROC SQL;
      CREATE VIEW map AS
        SELECT * FROM map.map;
    QUIT;

    PROC SQL;
      SELECT * FROM map
        WHERE GeneSymbol LIKE 'CYP%';
      * WHERE GeneSymbol =* "CYP19";
    QUIT;
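Both operators can be tried on a table that ships with SAS. This sketch assumes SASHELP.CLASS and its NAME column, which are not part of the original slides; the search values are arbitrary.

```sas
/* Sketch only: SASHELP.CLASS is an assumed example table */
proc sql;
  /* Pattern match: names that start with an uppercase J */
  select name
  from sashelp.class
  where name like 'J%';

  /* Sounds-like: names phonetically similar to "Jon" */
  select name
  from sashelp.class
  where name =* 'Jon';

  /* Case-insensitive pattern match: normalize case before comparing */
  select name
  from sashelp.class
  where upper(name) like 'J%';
quit;
```

LIKE compares characters exactly, so wrapping the column in UPPER is a common way to ignore case.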

Creating Macro Variables with Proc SQL

♦ Select ALL Unique Values Into a Macro Variable: the keyword DISTINCT eliminates duplicates.

    PROC SQL NOPRINT;
      SELECT DISTINCT genesymbol
        INTO :gene SEPARATED BY ', '
        FROM map.map;
    QUIT;
    %put &gene;

List file Snippet

    GIMAP4,GIMAP5,GIMAP6,GIMAP7,GIMAP8,GIOT-1,GIP,GIPC1,GIPC2,…

♦ Without the SEPARATED BY clause, a single macro variable in the INTO clause receives only the value from the first row of the result, so we would end up with one value instead of the full list.

Macro Variables with Proc SQL contd…

♦ Select ALL Unique Values Into a Macro Variable, but this time add double quotes using the QUOTE function and collapse consecutive blanks using the COMPBL function.

    PROC SQL NOPRINT;
      SELECT DISTINCT quote(compbl(genesymbol))
        INTO :gene SEPARATED BY ', '
        FROM map.map;
    QUIT;
    %put &gene;

List file Snippet…

    "GIMAP4 ","GIMAP5 ","GIMAP6 ","GIMAP7 ","GIMAP8 ","GIOT-1 ","GIP ","GIPC1 ","GIPC2
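A quoted, comma-separated list like this is typically built so it can be dropped into an IN (...) condition. The sketch below assumes the SASHELP.CLASS sample table, and the macro variable name NAME_LIST is hypothetical. It uses STRIP rather than COMPBL, a deliberate substitution so that the quoted values carry no leading or trailing blanks.

```sas
/* Sketch only: build a quoted list, then reuse it in a WHERE clause */
proc sql noprint;
  select distinct quote(strip(name))
    into :name_list separated by ', '
  from sashelp.class
  where age = 12;
quit;

/* SQLOBS holds the number of rows the last query returned */
%put Selected &sqlobs names: &name_list;

proc sql;
  select name, age, height
  from sashelp.class
  where name in (&name_list);
quit;
```

The same pattern works for numeric keys by leaving out QUOTE.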

CREATING MACRO ARRAYS USING PROC SQL

♦ Select all variable names and create a macro array: the simplest way is to start from the output of PROC CONTENTS:

    PROC CONTENTS DATA=mydata(KEEP = diabetes -- asthma)
                  OUT=vars(KEEP = name varNum) NOPRINT;
    RUN;

    PROC SQL NOPRINT;
      SELECT name
        INTO :row_1 - :row_&SysMaxLong
        FROM vars
        ORDER BY varnum;
    QUIT;
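Once the macro array exists, it is usually consumed in a %DO loop. This sketch assumes the SASHELP.CLASS sample table in place of mydata; the macro name LIST_ROWS and the macro variable N_ROWS are hypothetical.

```sas
/* Sketch only: build the macro array, then loop over it */
proc contents data=sashelp.class
              out=work.vars(keep=name varnum) noprint;
run;

proc sql noprint;
  select name
    into :row_1 - :row_&sysmaxlong
  from work.vars
  order by varnum;
quit;

/* SQLOBS still holds the row count of the last query after QUIT */
%let n_rows = &sqlobs;

%macro list_rows;
  %local i;
  %do i = 1 %to &n_rows;
    /* &&row_&i resolves in two passes: first to &row_1, then to its value */
    %put row_&i = &&row_&i;
  %end;
%mend list_rows;

%list_rows
```

Only as many ROW_ variables are created as there are rows in the result, so the large upper bound in the INTO clause is harmless.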


Finale

♦ PROC SQL is an additional tool with its own strengths and challenges

♦ Many times it is just another way to do the same thing

♦ BUT other times it might be much more efficient and may cut down the number of sorts, data steps & procedures or lines of code required.

Suggested Readings

♦ Papers
  ◘ SQL for People Who Don’t Think They Need SQL: Erin M. Christen (PharmaSUG 2003)
  ◘ Ten Great Reasons to Learn SAS® Software's SQL Procedure: Kirk Paul Lafler (SUGI 23)
♦ Books
  ◘ Proc SQL Beyond the Basics: Kirk Paul Lafler
  ◘ SAS Guide to the SQL Procedure


Thank you!