Introduction to the SAS LanguageData Management using SAS
Data Analysis
Basics of SAS
Taddesse Kassahun
Email: [email protected]
Department of Statistics, Addis Ababa University
December 16, 2015
Taddesse Kassahun Basics of SAS 1 / 71
Contents
1 Introduction to the SAS Language
   General overview of SAS
   SAS Help and Documentation
   SAS Programs
   SAS Data Libraries
2 Data Management using SAS
   Variable names
   Data and PROC
   Reading External Data
   Subsetting and Combining SAS data sets
   Commonly Used SAS Functions
3 Data Analysis
   The Chi-squared test of association
   Creating Tabular Reports
   Testing the Mean
   Regression
General overview of SAS
The Statistical Analysis System (SAS) is a computer program for performing statistical analysis of data.
Different releases of SAS software may be available for different operating systems.
Select Help ⇒ About SAS to display a window that contains release details.
SAS organizes data into a rectangular form, or table, called a SAS dataset.
The SAS system achieves its versatility by letting users write their own program statements.
A comment in SAS is written either as *comment statement; or as /* comment statement */.
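A minimal sketch of both comment styles inside a short program; the dataset name demo is only illustrative. The star form ends at the first semicolon, while the /* */ form can span lines and contain semicolons.

```sas
* This whole statement is a comment and ends at the semicolon;

/* This is a block comment; it may contain semicolons
   and may span several lines */
data demo;      /* a block comment can also follow code on the same line */
  x = 1;
run;
```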
General Overview (continued)
SAS lets users call up routines, called procedures, that perform various statistical analyses on specified datasets.
Syntax: there is just one basic syntax rule you must always follow: each SAS statement ends with a semicolon (;).
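A short sketch of the semicolon rule. It assumes the standard SASHELP.CLASS sample dataset is available, and the dataset name one is hypothetical: line breaks do not end a statement, only the semicolon does.

```sas
/* One statement may span several lines; the semicolon ends it */
proc print
    data=sashelp.class;
run;

/* Several statements may share one line */
data one; x = 1; run;
```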
SAS Window Environment
[Figures: screenshots of the SAS windowing environment, not reproduced]
Getting Help
The easiest way to get help is to click Help in the menu bar.
Alternatively, type HELP in the command bar and press ENTER.
Help about SAS is also available online at support.sas.com.
A SAS program is a sequence of steps that the user submits for execution.
DATA steps are typically used to create SAS data sets.
PROC steps are typically used to process SAS data sets.
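A minimal two-step program illustrating the division of labour described above; the dataset name scores is hypothetical and the values are borrowed from the grading example later in these notes.

```sas
/* DATA step: creates a SAS data set */
data scores;
  input name $ score;
  datalines;
Henok 75
Leah 40
;
run;

/* PROC step: processes the data set created above */
proc means data=scores;
  var score;
run;
```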
Data Libraries
A SAS data library is a collection of SAS files that SAS recognizes as a unit.
A SAS dataset is one type of SAS file stored in a data library.
To create a permanent SAS dataset, store it in a library of your own.
Identify each SAS data library by assigning it a library reference name (libref) with the LIBNAME statement:
LIBNAME libref 'file-folder-location';
Eg: LIBNAME readData 'C:/Users/User/Desktop/SAST';
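A sketch of working with the libref from the example, assuming the folder C:/Users/User/Desktop/SAST already exists: assign the libref, list what the library contains, then clear the libref. Clearing removes only the name; the files stay on disk.

```sas
/* Assign the libref (the folder must already exist) */
LIBNAME readData 'C:/Users/User/Desktop/SAST';

/* List the SAS files stored in the library */
proc contents data=readData._all_ nods;
run;

/* Remove the libref when it is no longer needed */
LIBNAME readData clear;
```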
Rules for Naming Libref
The name must be 8 characters or less.
The name must begin with a letter or underscore.
The remaining characters can be letters, numbers, or underscores.
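Examples of librefs that follow and break the rules above. The names proj1, _tmp, 1proj and projectdata are hypothetical, and the folder path is assumed to exist; the invalid forms are kept inside a comment so the block still runs.

```sas
/* Valid librefs: 8 characters or fewer, starting with a letter or underscore */
libname proj1 'C:\Users\User\Desktop\SAST';
libname _tmp  'C:\Users\User\Desktop\SAST';

/* Invalid librefs (would cause errors):
   libname 1proj ...        starts with a digit
   libname projectdata ...  11 characters, longer than 8
*/
```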
SAS work library
Work is a temporary library.
SAS datasets created in Work exist only during the SAS session.
Once the SAS session ends, these datasets are erased.
It is not necessary to assign a libref for Work.
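A sketch showing that a one-level dataset name goes to WORK, assuming no USER libref or USER= system option has been set (either would redirect one-level names). The dataset name Temp is illustrative.

```sas
/* Both steps write to the same temporary dataset WORK.TEMP;
   the second step replaces the first */
data Temp;
  x = 1;
run;

data work.Temp;
  x = 2;
run;
```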
Data Management
Data management refers to creating, formatting, and retrieving datasets.
SAS can subset, split, merge, concatenate, transpose, and aggregate datasets into formats appropriate for further analyses.
SAS can also import datasets from other software packages and export datasets to other software packages.
Datasets consist of observations and variables.
Observations = records or experimental units.
Variables are characteristics assuming different values.
SAS variables can be of two types: numeric or character.
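A small sketch of the two variable types, reusing two rows of the temperature data that appear later in these notes (the dataset name types is hypothetical). A $ after a name in the INPUT statement makes the variable character; PROC CONTENTS then reports each variable's type as Num or Char.

```sas
/* city is character (note the $); high is numeric */
data types;
  input city $ high;
  datalines;
AA 26
Jimma 28
;
run;

/* Lists each variable with its type and length */
proc contents data=types;
run;
```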
Variable names
SAS names can contain uppercase letters, lowercase letters, or a mix of the two.
SAS names must start with a letter or an underscore (_).
The remaining characters in a SAS name can be letters, numbers, or underscores.
SAS names can be up to 32 characters long.
SAS names cannot contain embedded blanks; e.g., DOIT is a valid name, but DO IT is not.
SAS reserves some names for internal use:
For datasets, do not use the names _NULL_, _DATA_, or _LAST_.
For variables, do not use the names _N_, _ERROR_, _NUMERIC_, _CHARACTER_, or _ALL_.
Words in SAS
Words in SAS may be written in uppercase, lowercase, ora mix of the two.
Character data values, however, are case sensitive: a value in the raw data must match the value you refer to in SAS letter for letter, including case (e.g., "Male" and "male" are different values).
Missing Values
For numeric variables, enter a period (.) for missing values.
For character variables, leave the data blank (" "). When data values are separated by blanks (list input), for example when several observations are placed on a single line, enter a period for a missing character value.
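A sketch of missing values with list input; the names and values are hypothetical. A period marks a missing numeric value (age for Sara) and, with list input, also a missing character value (sex for Kebede).

```sas
data miss;
  input name $ age sex $;
  datalines;
Abebe 25 M
Sara . F
Kebede 30 .
;
run;

/* Missing numeric values print as . and missing character values as blanks */
proc print data=miss;
run;
```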
SAS Statements
SAS programs are made up of two kinds of statements:
Statements that lead to the creation of SAS datasets (DATA step).
Statements that lead to the analysis of SAS datasets (PROC step).
The output from a SAS program consists of two parts:
The SAS Log: a running commentary on the results of executing each step of the program.
The SAS Output: the results produced by the statistical analysis.
DATA Step
The DATA step is used to create one or more SAS data sets.
It contains statements that read raw data, either typed in-line or stored in existing ASCII data files.
It is also used for tasks such as:
transforming, creating, and selecting variables;
labeling variables, and so on.
The DATA step begins with the word DATA followed by the name of a dataset, i.e., DATA data-set-name;
If a data-set-name is not provided, SAS chooses the automatic names DATA1, DATA2, and so on.
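A sketch of the DATA step tasks listed above; the dataset heights, its variables and values are hypothetical. The second step has no dataset name, so SAS assigns an automatic one (DATA1 if no earlier DATAn dataset exists in the session).

```sas
data heights;
  input name $ height_cm;
  height_m = height_cm / 100;           /* create a new variable     */
  label height_m = "Height in metres";  /* label a variable          */
  drop height_cm;                       /* select (drop) a variable  */
  datalines;
Abebe 172
Sara 165
;
run;

/* No dataset name: SAS names it automatically (DATA1, DATA2, ...) */
data;
  x = 1;
run;
```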
Codes in Data Step
Data data-set-name;
Input X1 X2 Y1 Y5;     /* if the variables are numeric */
Input C1 $ C2 $ Z $;   /* if the variables are character */
Input X1-X4;           /* reads four numeric variables X1, X2, X3, X4 */
Input X1 @@;           /* reads several values of X1 from a single data line */
Input X d.;            /* reads X from a field d columns wide (formatted input), e.g. Input X 5.; */
The INPUT statements above are alternatives; a DATA step normally contains one INPUT statement.
Cards or datalines: these statements tell SAS that the data lines follow.
Infile: reads data that already exist in an external ASCII (text) file.
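Runnable versions of three of the INPUT forms above; the dataset names and values are hypothetical. Note that formatted input with the 3. informat reads only the first three columns of the line.

```sas
/* X1-X4 reads four numeric variables */
data four;
  input X1-X4;
  datalines;
1 2 3 4
5 6 7 8
;
run;

/* @@ holds the line, so six observations come from one data line */
data several;
  input X1 @@;
  datalines;
3 7 9 12 15 20
;
run;

/* Formatted input: X is read from columns 1-3, so X = 123 */
data fixed;
  input X 3.;
  datalines;
12345
;
run;
```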
Proc Step
The PROC step (short for procedure) is used to run procedures such as:
PROC CONTENTS, PROC PRINT, PROC REG, PROC MEANS, PROC PLOT, PROC ANOVA, etc.
A PROC step that uses a dataset must come after the DATA step that creates it.
Example: To display the contents of a dataset "tad":
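A minimal sketch, assuming a dataset named tad has already been created in the WORK library: PROC CONTENTS describes the dataset (variables, types, number of observations), and PROC PRINT lists its observations.

```sas
/* Describe the dataset tad */
proc contents data=tad;
run;

/* List the observations in tad */
proc print data=tad;
run;
```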
Example
Enter the following datasets into SAS.
Q1.  X1: 12 10 23
     X2: 19 17 25
Q2.  Sex: M F M
     Age: 21 20 26
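One possible solution, assuming the values in each question pair up by position (the first X1 with the first X2, and so on); the dataset names q1 and q2 are hypothetical.

```sas
/* Q1: two numeric variables, one observation per data line */
data q1;
  input X1 X2;
  datalines;
12 19
10 17
23 25
;
run;

/* Q2: a character variable (note the $) and a numeric variable */
data q2;
  input Sex $ Age;
  datalines;
M 21
F 20
M 26
;
run;

proc print data=q1; run;
proc print data=q2; run;
```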
SAS datasets
In order to create a SAS dataset from a raw data file, you must:
Start a DATA step and name the SAS dataset being created (DATA statement).
Identify the location of the raw data file to read (INFILE statement).
Describe how to read the data fields from the raw data file (INPUT statement).
Data Libraries Example
Create a permanent SAS dataset in a library named T, located in a folder named SAST on your desktop.
City     HighT  LowT
AA       26     14
BD       31     17
Mekelle  27     15
Jimma    28     16
SAS Code
libname T 'C:\Users\User\Desktop\SAST';
data T.Temp;
  input city $ H L;
  cards;
AA 26 14
BD 31 17
Mekelle 27 15
Jimma 28 16
;
run;
Reading data from a text file
Example: Suppose that a text file named PG.txt, which contains results from an experiment on plant growth, is available in the folder SAST. Read the data into SAS:
data PlantGrowth;
  infile 'C:\Users\User\Desktop\SAST\PG.txt' firstobs=2;
  input weight group $;
run;
INFILE: where to find the data.
FIRSTOBS=2: start reading data at the 2nd line, skipping the header row.
INPUT: the variable names to associate with each data value.
Reading data from a csv file
dsd option (delimiter-sensitive data):
Changes the default delimiter from a blank to a comma.
If there are two delimiters in a row, assumes a missing value between them.
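A sketch of the DSD option, first with in-line data and then with an external file. The in-line values are hypothetical, and PG.csv is an assumed comma-separated version of PG.txt with a header row. In row 2 the two adjacent commas give a missing name.

```sas
/* In-line CSV data read with DSD */
data csvdemo;
  infile datalines dsd;
  input id name $ score;
  datalines;
1,Abebe,75
2,,58
3,Sara,80
;
run;

/* Same idea for an external file (hypothetical PG.csv) */
data PlantGrowthCSV;
  infile 'C:\Users\User\Desktop\SAST\PG.csv' dsd firstobs=2;
  input weight group $;
run;
```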
Reading PC Database Files with IMPORT
Code
PROC IMPORT DATAFILE='filename'
            OUT=dataset
            DBMS=csv
            REPLACE;
   GETNAMES=yes;
RUN;
OUT= name of the output SAS dataset.
DATAFILE= where to find the data (like INFILE).
DBMS= type of the incoming data file (e.g., CSV, TAB, EXCEL).
REPLACE overwrites an existing SAS dataset with the same name.
GETNAMES=yes uses the first record of the file to generate variable names.
Example
Code
PROC IMPORT DATAFILE='C:\Users\User\Desktop\SAST\PGEx.xls'
            OUT=Plant;
   GETNAMES=yes;
RUN;
IMPORT Procedure using Menu
Steps 1 and 2: [screenshots of the import wizard, not reproduced]
The Output Delivery System (ODS)
ODS allows output to be presented in multiple formats:
PDF (ods pdf)
Excel (ods html, saved with an .xls extension)
HTML (ods html)
Word (ods rtf)
Output can be opened in non-SAS applications.
ODS gives more control over the appearance of the output.
Procedure
ODS destination FILE='file-pathname.ext';
< ... procedures ... >
ODS destination CLOSE;
destination: the desired destination (PDF, HTML, etc.).
The file is not created until the ODS CLOSE statement runs.
ODS Example
Output to a pdf file
ods pdf file='C:\Users\User\Desktop\SAST\PG1.pdf' style=education;
proc print data=pg; run;
ods pdf close;
To see the list of available style templates:
proc template; list styles; run;
ODS Example
Output to an excel file
ods html file='C:\Users\User\Desktop\SAST\PG1.xls' style=Sasweb;
proc print data=pg; run;
ods html close;
Output to a Word file
ods rtf file='C:\Users\User\Desktop\SAST\PG1.doc' style=Science;
proc print data=pg; run;
ods rtf close;
Labeling
Labels can be applied to variables using the LABEL statement.
Example
data PG;
  infile 'C:\Users\User\Desktop\SAST\PG.txt' firstobs=2;
  input weight group $;
  label weight="Dried Weight of Trees"
        group="A Control and Treatments";
run;
proc print data=PG label; run;
Conditional Processing
This involves the IF, THEN and ELSE statements.
IF
IF <condition> THEN <X>; ELSE <Y>;
For example:
If Score >= 50 Then Grade = 'Pass';
Else Grade = 'Fail';
Student  Score  Grade
Henok    75     Pass
Goitom   58     Pass
Leah     40     Fail
Zekiya   70     Pass
Example
Input
Data score;
  input Stud $ Sc;
  if Sc >= 50 then Grade = "Pass";
  else Grade = "Fail";
  cards;
Henok 75
Goitom 58
Leah 40
Zekiya 68
;
run;
proc print data=score; run;
If..Else If
IF <condition1> THEN <X>;
ELSE IF <condition2> THEN <Y>;
ELSE <Z>;
Example
Data score1;
  Input Stud $ Sc;
  if Sc >= 80 then Grade = "Very Good";
  else if 60 <= Sc < 80 then Grade = "Good";
  else Grade = "Not Bad";
  cards;
Henok 75
Goitom 58
Leah 40
Zekiya 68
;
run;
Arithmetic operators
Example
Data Op;
  input x @@;
  y1 = x + 2;
  y2 = x / 2;
  y3 = x**2;
  cards;
2 3 4 5 6 7 8 9 10
;
run;
Proc print data=Op; run;
Comparison operators
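A sketch of SAS comparison operators in both symbol and mnemonic form; the dataset name compare and the values are hypothetical. A comparison evaluates to 1 when true and 0 when false, so it can be stored directly in a variable.

```sas
/* Symbol and mnemonic forms are interchangeable:
     =   EQ  equal to                   ^=  NE  not equal to
     >   GT  greater than               <   LT  less than
     >=  GE  greater than or equal to   <=  LE  less than or equal to
     IN      equal to one value in a list                              */
data compare;
  input x @@;
  big   = (x gt 5);          /* 1 if x > 5, else 0  */
  small = (x <= 2);          /* 1 if x <= 2, else 0 */
  five  = (x eq 5);          /* 1 if x = 5, else 0  */
  pick  = (x in (1, 3, 5));  /* 1 if x is 1, 3 or 5 */
  datalines;
1 2 3 4 5 6 7
;
run;

proc print data=compare; run;
```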
Subsetting datasets
We can use IF or WHERE statements to subset data.
Both IF and WHERE statements can be used within a DATA step if a SET statement is used to read in a SAS dataset.
The IF statement must be used within a DATA step if an INPUT statement is used to read raw data.
In a PROC step, only the WHERE statement can be used.
Example
Using the dataset AD available in the folder SAST:
data New;
  set SP;
  if Age lt 70;
run;
* OR;
data New1;
  set SP;
  where Age lt 70;
run;
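A sketch of the raw-data case, where a subsetting IF is required; names and ages are hypothetical. A WHERE statement in this step would be rejected, since WHERE only filters observations read from an existing SAS dataset (for example with SET).

```sas
data young;
  input name $ age;
  if age lt 70;     /* subsetting IF: keeps only rows with age < 70 */
  datalines;
Abebe 45
Sara 72
Kebede 68
;
run;

proc print data=young; run;
```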
WHERE
Use a WHERE statement in a PROC step to include selected observations only.
Example
Data Sub1;
  input Gender $ Age @@;
  cards;
M 22 M 33 F 26 F 29 F 35 M 40 F 28 M 32
;
run;
Proc print data=Sub1;
  where Age ge 30;
run;
Selecting Variables
By default, SAS keeps all variables of the input dataset.
Use DROP to exclude certain variables from the output dataset.
Use KEEP to include only certain variables in the output dataset.
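A sketch of DROP and KEEP; the dataset full and its values are hypothetical. The two DATA steps produce datasets with the same two variables, one by excluding and one by including. The KEEP= dataset option does the same job inside a PROC step.

```sas
data full;
  input id age height weight;
  datalines;
1 25 170 65
2 31 160 58
;
run;

/* DROP: everything except height and weight */
data no_hw;
  set full;
  drop height weight;
run;

/* KEEP: only id and age (same variables as no_hw) */
data only_id_age;
  set full;
  keep id age;
run;

/* KEEP= as a dataset option in a PROC step */
proc print data=full(keep=id weight); run;
```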
Merging
Merging refers to combining data from two or more datasets.
We can merge entire datasets or subsets of datasets.
We can produce one-to-one or one-to-many merges, but not many-to-many joins.
Input datasets must have a common identifying variable (primary key).
Input datasets must first be sorted by this key variable.
The key variable must have the same name and attributes in every dataset.
All other variables must have unique names, or their values will be overwritten by those from the last dataset listed in the MERGE statement.
Example
Merge the datasets Test 1 and Test 2.
Test 1
Data Test1;
  input Stud_Id $ T1;
  cards;
001 15
003 17
004 15
002 16
;
Run;
Test 2
Data Test2;
  input Stud_Id $ T2;
  cards;
001 16
003 15
002 18
004 14
;
Run;
First sort the datasets using PROC SORT.
Sorting in Ascending order
Proc sort data=Test1; by Stud_Id; run;
Proc sort data=Test2; by Stud_Id; run;
To sort the datasets in descending order:
Sorting in Descending order
Proc sort data=Test1 out=Test1S; by DESCENDING Stud_Id; run;
Proc sort data=Test2 out=Test2S; by DESCENDING Stud_Id; run;
Merged dataset
To merge the two datasets:
Data Final;
  merge Test1 Test2;
  by Stud_Id;
run;
proc print data=Final; run;
One-to-many
Example
Demo
Data Demo;
  input Stud_Id $ Gen $;
  cards;
001 F
002 M
003 M
004 F
;
Run;
Proc sort data=Demo; by Stud_Id; run;
Course
Data Course;
  input Stud_Id $ CCode $;
  cards;
001 Psy101
001 Phil105
001 Math212
002 EnLa222
002 Psy101
002 Stat173
;
Run;
Proc sort data=Course; by Stud_Id; Run;
To merge the two:
Data DemCo;
  merge Demo Course;
  by Stud_Id;
Run;
Proc print data=DemCo; Run;
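In the merge above, students 003 and 004 have no courses, so they appear with a missing CCode. A sketch using the IN= dataset option to keep only students found in Course; it assumes Demo and Course are sorted by Stud_Id as shown, and the names DemCoOnly and inC are hypothetical.

```sas
data DemCoOnly;
  merge Demo Course(in=inC);  /* inC = 1 when the row has a match in Course */
  by Stud_Id;
  if inC;                     /* keep only students with at least one course */
run;

proc print data=DemCoOnly; run;
```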
Some functions
ABS(variable) takes the absolute value of a numeric variable.
LOG(variable) takes the natural logarithm of a numeric variable.
ROUND(variable, unit) rounds the numeric variable to the nearest multiple of unit.
LOWCASE(variable) converts mixed-case text of a character variable to all lowercase.
UPCASE(variable) converts mixed-case text of a character variable to all uppercase.
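A sketch applying each function to a constant so the results are easy to check in the output; the dataset name funcs is hypothetical.

```sas
data funcs;
  a = abs(-2.5);                /* 2.5                        */
  b = log(1);                   /* 0 (natural logarithm of 1) */
  c = round(3.14159, 0.01);     /* 3.14                       */
  d = lowcase('Addis Ababa');   /* addis ababa                */
  e = upcase('Addis Ababa');    /* ADDIS ABABA                */
run;

proc print data=funcs; run;
```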
Summarizing Data
PROC UNIVARIATE gives an extensive summary.
PROC MEANS gives a brief summary.
UNIVARIATE
PROC UNIVARIATE DATA=data-set-name;
  VAR variables;
  ID variable;
RUN;
MEANS
PROC MEANS DATA=data-set-name;
  VAR variables;
RUN;
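A sketch of both procedures on the temperature data from the library example, re-created in WORK so the block runs on its own. By default PROC MEANS reports N, mean, standard deviation, minimum and maximum; in PROC UNIVARIATE the ID variable labels the extreme observations.

```sas
data Temp;
  input city $ H L;
  datalines;
AA 26 14
BD 31 17
Mekelle 27 15
Jimma 28 16
;
run;

/* Brief summary */
proc means data=Temp;
  var H L;
run;

/* Extensive summary; extremes identified by city */
proc univariate data=Temp;
  var H;
  id city;
run;
```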
Line Printer Plots for Continuous Variables
PLOTS
PROC UNIVARIATE DATA=data-set-name PLOT;
  VAR variables;
RUN;
Example
Proc univariate data=orange plot;
  Var age circumference;
run;
Histograms
PROC UNIVARIATE DATA=data-set-name noprint;
  VAR variables;
  HISTOGRAM variables;
RUN;
Frequency Tables and Bar Charts
PROC UNIVARIATE
PROC UNIVARIATE DATA=data-set-name FREQ;
   VAR variables;
RUN;
PROC FREQ
PROC FREQ DATA=data-set-name;
   TABLES variables;
RUN;
Bar Charts
PROC GCHART DATA=data-set-name;
   VBAR variables;
   HBAR variables;
RUN;
QUIT;
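A minimal sketch on SASHELP.CLASS (chosen only for illustration). PROC GCHART belongs to SAS/GRAPH, so it assumes that product is licensed. Because GCHART stays active after RUN, QUIT ends the procedure.

```sas
/* One-way frequency tables for Sex and Age */
proc freq data=sashelp.class;
   tables sex age;
run;

/* DISCRETE gives each distinct value of the numeric Age its own */
/* bar instead of grouping the values into midpoint bins.         */
proc gchart data=sashelp.class;
   vbar sex;
   hbar age / discrete;
run;
quit;
```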
The Chi-squared test of association
The null hypothesis: the two categorical variables are not associated.
The alternative hypothesis: the two categorical variables are associated.
                     Sporting Facility
                      A     B     C
Satisfied    Yes     17    14    13
             No       3     6     7
Is there evidence of different satisfaction levels in the three facilities?
Data
Data client;
   input Sat $ Facility $ count @@;
   datalines;
Yes A 17 Yes B 14 Yes C 13 No A 3 No B 6 No C 7
;
Run;
Chisq
Proc freq data=client;
   tables Sat*Facility / expected chisq norow nocol nopercent;
   weight count;
Run;
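As a hand check of the Pearson statistic that PROC FREQ reports for the satisfaction table, the expected counts under independence are E_ij = (row total × column total) / n with n = 60, row totals 44 (Yes) and 16 (No), and every column total equal to 20. The arithmetic below is a worked computation from the table, not copied SAS output.

```latex
\begin{aligned}
E_{\text{Yes},j} &= \frac{44 \times 20}{60} = \frac{44}{3} \approx 14.67,
\qquad E_{\text{No},j} = \frac{16 \times 20}{60} = \frac{16}{3} \approx 5.33, \\
X^2 &= \sum_{i,j} \frac{(O_{ij}-E_{ij})^2}{E_{ij}}
     = \frac{(7/3)^2 + (2/3)^2 + (5/3)^2}{44/3}
     + \frac{(7/3)^2 + (2/3)^2 + (5/3)^2}{16/3} \\
    &= \frac{26/3}{44/3} + \frac{26/3}{16/3}
     = \frac{13}{22} + \frac{13}{8} = \frac{195}{88} \approx 2.22, \\
\text{df} &= (2-1)(3-1) = 2,
\qquad p = P\left(\chi^2_2 > 2.22\right) = e^{-2.22/2} \approx 0.33 .
\end{aligned}
```

Because p ≈ 0.33 > 0.05 (equivalently 2.22 < 5.99, the 5% critical value of the chi-squared distribution with 2 df), these counts give no evidence that satisfaction differs among the three facilities. All expected counts exceed 5, so the chi-squared approximation is reasonable here.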
Creating Tabular Reports (PROC TABULATE)
TABULATE
PROC TABULATE DATA=data-set-name;
   CLASS class-variables;   /* list all classification variables */
   TABLE row-variable, column-variable;
RUN;
Adding total rows and columns
proc tabulate data=haireye;
   class Hair Eye;
   table Hair ALL, Eye ALL;
run;
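With only CLASS variables, PROC TABULATE counts observations (the N statistic). A sketch on SASHELP.CLASS (an illustrative choice) shows how a VAR statement and statistic keywords put summary statistics in the cells, with ALL adding a total row as on the slide.

```sas
proc tabulate data=sashelp.class;
   class sex;
   var height weight;
   /* Rows: each value of Sex plus an ALL (total) row.          */
   /* Columns: N and mean for each of the two analysis variables. */
   table sex all, (height weight)*(n mean);
run;
```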
Testing the Mean
PROC TTEST performs t-tests that determine whether
the mean of a group equals a specified value, or
the means of two groups differ.
One Sample T-Test
PROC TTEST DATA=data-set-name H0=hypothesized-mean;
   VAR measurement-variable;
RUN;
Two Sample T-Test
Paired Sample
PROC TTEST DATA=data-set-name;
   PAIRED first-variable*second-variable;
RUN;
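A one-sample sketch on SASHELP.CLASS: it tests whether the mean height equals 60. The value 60 (inches) is an arbitrary hypothesized mean chosen only for illustration, and ALPHA= sets the level of the reported confidence limits.

```sas
/* H0: mean Height = 60 (hypothetical value for illustration) */
proc ttest data=sashelp.class h0=60 alpha=0.05;
   var height;
run;
```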
Independent Sample T Test
PROC TTEST DATA=data-set-name;
   CLASS classification-variable;   /* must have exactly two levels */
   VAR measurement-variables;
RUN;
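A two-sample sketch on SASHELP.CLASS, comparing girls and boys (Sex has exactly two levels, F and M). The data set is an illustrative choice. The output reports both the pooled and the Satterthwaite t-tests, plus a folded F test for equal variances that helps decide which of the two to read.

```sas
/* Compare mean Height and Weight between the two levels of Sex */
proc ttest data=sashelp.class;
   class sex;
   var height weight;
run;
```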
Comparing More than Two Group Means
PROC ANOVA DATA=data-set-name;
   CLASS class-variable;
   MODEL response-variable = class-variable;
   MEANS class-variable / HOVTEST WELCH;           /* Levene's test and Welch's ANOVA */
   MEANS class-variable / BON TUKEY SCHEFFE LSD;   /* multiple comparisons */
RUN;
QUIT;
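A one-way sketch on SASHELP.IRIS, comparing mean sepal length across the three species (an illustrative choice of data). PROC ANOVA is intended for balanced data, and this data set has 50 flowers per species. PROC ANOVA is interactive, so QUIT ends it.

```sas
proc anova data=sashelp.iris;
   class species;
   model sepallength = species;
   means species / hovtest welch;   /* Levene's test, Welch's ANOVA */
   means species / tukey;           /* Tukey pairwise comparisons  */
run;
quit;
```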
Residual Analysis to check assumptions in ANOVA
Residual Analysis
PROC ANOVA has no OUTPUT or LSMEANS statement, so PROC GLM is used to save predicted values and residuals.
PROC GLM DATA=data-set-name;
   CLASS Factor;
   MODEL Response = Factor;
   LSMEANS Factor;
   MEANS Factor / HOVTEST;
   OUTPUT OUT=diagnost P=yhat R=resid;
RUN;
QUIT;
GPLOT
PROC GPLOT DATA=diagnost;
   PLOT resid*yhat / VREF=0;
RUN;
QUIT;
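Residual Analysis ...
UNIVARIATE
PROC UNIVARIATE DATA=diagnost NOPRINT;
   QQPLOT resid / NORMAL(MU=EST SIGMA=EST);
RUN;
Shapiro-Wilk Test
The NORMAL option prints the Shapiro-Wilk test together with the Kolmogorov-Smirnov, Cramér-von Mises and Anderson-Darling tests.
PROC UNIVARIATE DATA=diagnost NORMAL;
   VAR resid;
RUN;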
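Putting the residual-analysis steps together, a sketch on SASHELP.IRIS (one-way layout with response SepalLength and factor Species, chosen only for illustration) fits the model with PROC GLM, saves predicted values and residuals, plots residuals against predicted values, and checks their normality. PROC GPLOT assumes SAS/GRAPH is licensed.

```sas
/* 1. Fit the one-way model and save fitted values and residuals */
proc glm data=sashelp.iris;
   class species;
   model sepallength = species;
   output out=diagnost p=yhat r=resid;
run;
quit;

/* 2. Residuals vs. predicted values; look for unequal spread */
proc gplot data=diagnost;
   plot resid*yhat / vref=0;
run;
quit;

/* 3. Normality: tests (Shapiro-Wilk etc.) and a normal Q-Q plot */
proc univariate data=diagnost normal;
   var resid;
   qqplot resid / normal(mu=est sigma=est);
run;
```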
ANOVA
RCBD
Proc glm data=DatasetName;
   class trt rep;
   model response = trt rep;
   means trt / lsd cldiff alpha=0.05;
   /* assumes 5 treatment levels with the control sorted first */
   contrast 'Control vs Others' trt 4 -1 -1 -1 -1;
Run;
Quit;
Latin Square
PROC GLM data=latin;
   CLASS ColVar RowVar Trt;
   MODEL Milk = Trt ColVar RowVar;
   MEANS Trt / TUKEY;
RUN;
QUIT;
ANOVA
Factorial
PROC GLM DATA=DatasetName;
   CLASS Factor1 Factor2;
   MODEL Response = Factor1|Factor2;   /* main effects and their interaction */
RUN;
QUIT;
Split Plot
Proc Glm data=DatasetName;
   Class Block WPlot SPlot;
   Model Response = Block|WPlot|SPlot / ss3;
   TEST H=Block WPlot E=Block*WPlot;
   TEST H=SPlot E=Block*SPlot;
   TEST H=WPlot*SPlot E=Block*WPlot*SPlot;
Run;
Quit;
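A two-factor sketch on SASHELP.CARS, crossing Origin (Asia, Europe, USA) with DriveTrain for the response MPG_City. This is observational, unbalanced data chosen only to illustrate the | notation, which is why PROC GLM (with its Type III sums of squares) is used rather than PROC ANOVA.

```sas
/* origin|drivetrain expands to origin drivetrain origin*drivetrain */
proc glm data=sashelp.cars;
   class origin drivetrain;
   model mpg_city = origin|drivetrain;
run;
quit;
```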
Linear Regression
Simple Linear
Proc reg data=DatasetName;
   Model Response = Factor / p clb;   /* predicted values, confidence limits for the coefficients */
   Plot Response*Factor / nomodel nostat;
   plot r.*p. student.*nqq. / nomodel nostat;
Run;
Quit;
Multiple Linear
Proc reg data=DatasetName;
   model Response = Factor1 Factor2 ... Factork / vif;
Run;
Quit;
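A sketch of both models on SASHELP.CLASS, regressing Weight on Height and then on Height and Age (an illustrative choice of data and predictors). PROC REG is interactive, so QUIT ends each step.

```sas
/* Simple linear regression: predicted values (P) and 95% limits */
/* for the intercept and slope (CLB).                             */
proc reg data=sashelp.class;
   model weight = height / p clb;
run;
quit;

/* Multiple linear regression with variance inflation factors */
proc reg data=sashelp.class;
   model weight = height age / vif;
run;
quit;
```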
Model Diagnostics
Test for Normality of Residuals
Proc reg data=DatasetName;
   model Response = Factor1 Factor2 ... Factork;
   output out=diag(keep=r pr) residual=r predicted=pr;
Run;
Quit;
Proc univariate data=diag normal;
   var r;
   qqplot r / normal(mu=est sigma=est);
Run;
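The same two steps on SASHELP.CLASS, as a sketch with an illustrative model (Weight on Height and Age): PROC REG writes residuals and predicted values to a work data set, and PROC UNIVARIATE tests the residuals for normality and draws a normal Q-Q plot.

```sas
proc reg data=sashelp.class;
   model weight = height age;
   output out=diag residual=r predicted=pr;   /* save r and pr */
run;
quit;

/* Normality tests on the residuals plus a normal Q-Q plot */
proc univariate data=diag normal;
   var r;
   qqplot r / normal(mu=est sigma=est);
run;
```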
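Tests on Nonconstant Error Variance
Graphical Method
Proc reg data=DatasetName;
   model Response = Factor1 Factor2 ... Factork;
   plot r.*p.;
Run;
Quit;
White's Test
The SPEC option tests the first and second moment specification of the model (White's test).
Proc reg data=DatasetName;
   model Response = Factor1 Factor2 ... Factork / spec;
Run;
Quit;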
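Independence of Errors
The DW option prints the Durbin-Watson statistic, which is meaningful only when the observations are in time order.
Proc reg data=DatasetName;
   model Response = Factor1 Factor2 ... Factork / dw;
   plot r.*p.;
Run;
Quit;
Tests for Collinearity
Proc reg data=DatasetName;
   model Response = Factor1 Factor2 ... Factork / vif;
Run;
Quit;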
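Since the Durbin-Watson statistic assumes time-ordered observations, a sketch uses SASHELP.AIR (monthly airline passenger counts, variables DATE and AIR, already sorted by date). Regressing AIR on DATE fits a linear time trend; the model is chosen only to illustrate the DW option, not as a good model for these data.

```sas
/* Linear trend in passenger counts; DW prints the Durbin-Watson */
/* statistic and the first-order autocorrelation of residuals.   */
proc reg data=sashelp.air;
   model air = date / dw;
run;
quit;
```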
Simple Logistic
SAS CODE
PROC LOGISTIC DATA=datasetName DESCENDING;   /* models the higher-ordered response level, e.g. 1 for a 0/1 response */
   CLASS variables;
   MODEL response = predictors / LACKFIT;     /* Hosmer-Lemeshow goodness-of-fit test */
   OUTPUT OUT=SAS-data-set P=probability;
RUN;
Model Selection
Proc logistic data=DatasetName;
   Model Response = predictors / selection=stepwise;   /* or forward, or backward */
Run;
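A sketch on SASHELP.HEART, modelling the probability that Status is 'Dead'. The variable names (Status, Sex, AgeAtStart, Systolic, Diastolic, Cholesterol, Weight) are assumed from that sample data set, so check them with PROC CONTENTS first. EVENT='Dead' is used instead of DESCENDING to name the modelled outcome explicitly; both approaches are valid.

```sas
/* Logistic model with Hosmer-Lemeshow test; predicted         */
/* probabilities of death are saved as PHAT in data set PRED.  */
proc logistic data=sashelp.heart;
   class sex;
   model status(event='Dead') = ageatstart sex systolic cholesterol / lackfit;
   output out=pred p=phat;
run;

/* Stepwise selection with 0.05 entry and stay significance levels */
proc logistic data=sashelp.heart;
   class sex;
   model status(event='Dead') = ageatstart sex systolic diastolic cholesterol weight
         / selection=stepwise slentry=0.05 slstay=0.05;
run;
```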