Basics of SAS

Taddesse Kassahun
Email: [email protected]
Department of Statistics, Addis Ababa University
December 16, 2015

Upload: taddesse-kassahun

Post on 16-Apr-2017



Contents

1 Introduction to the SAS Language
    General overview of SAS
    SAS Help and Documentation
    SAS Programs
    SAS Data Libraries

2 Data Management using SAS
    Variable names
    Data and PROC
    Reading External Data
    Subsetting and Combining SAS data sets
    Commonly Used SAS Functions

3 Data Analysis
    The Chi-squared test of association
    Creating Tabular Reports
    Testing the Mean
    Regression


General overview of SAS

The Statistical Analysis System (SAS) is a computer program for performing statistical analysis of data.

Different releases of SAS software might be available for different operating systems.

Select Help ⇒ About SAS to display a window that contains release details.

SAS organizes data into a rectangular form or table called a SAS dataset.

The SAS system achieves its versatility by providing users with the ability to write their own program statements.

A comment in SAS is written as *comment statement; or /* comment statement */


General Overview . . .

SAS lets users call built-in routines, called procedures, to perform various statistical analyses on specified datasets.

Syntax: there is one basic rule you must always follow: each SAS statement ends with a semicolon (;).
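As a minimal sketch of the semicolon rule and the two comment styles (the message text is made up for illustration; the special dataset name _NULL_ means no dataset is created):

```sas
* A comment statement: starts with an asterisk and ends with a semicolon;
/* A block comment: it may contain semicolons; and span several lines */

data _null_;                 /* every statement below ends with a semicolon */
    put 'Hello from SAS';    /* writes the text to the SAS Log */
run;
```

Forgetting a semicolon usually makes SAS read two statements as one, which shows up as an error in the Log.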


SAS Window Environment


Getting Help

The easiest way to get help is to click Help in the menu bar.

Alternatively, type HELP in the command bar and press ENTER.

Online help about SAS is available at support.sas.com.


SAS Programs

A SAS program is a sequence of steps that the user submits for execution.

DATA steps are typically used to create SAS data sets.

PROC steps are typically used to process SAS data sets.
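A small sketch of a complete program with one DATA step and one PROC step. The dataset name demo and its values are hypothetical and chosen only for illustration:

```sas
/* DATA step: create a small dataset named demo */
data demo;
    input name $ score;
    datalines;
Abebe 72
Sara 85
;
run;

/* PROC step: process (here, print) the dataset created above */
proc print data=demo;
run;
```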


Data Libraries

A SAS data library is a collection of SAS files that are recognized as a unit by SAS.

A SAS dataset is one type of SAS file stored in a data library.

To create a permanent SAS dataset, store it in a library of your own.

Identify SAS data libraries by assigning each a library reference name (libref) with the LIBNAME statement.

LIBNAME libref 'file-folder-location';

E.g.: LIBNAME readData 'C:/Users/User/Desktop/SAST';


Rules for Naming Libref

The name must be 8 characters or less.

The name must begin with a letter or underscore.

The remaining characters can be letters, numbers, or underscores.


SAS work library

Work is a temporary library.

SAS datasets created in Work exist only during the SAS session.

Once the SAS session ends, these datasets are erased.

It is not necessary to assign a libref for Work.
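The difference between a temporary and a permanent dataset can be sketched as follows. The libref mylib and the dataset names are hypothetical; the folder is the one used elsewhere in these slides, and the sketch assumes the default setup in which one-level names go to Work:

```sas
/* Temporary: a one-level name is stored in Work (same as Work.Temp1) */
data Temp1;
    x = 1;
run;

/* Permanent: a two-level name (libref.dataset) is stored in the library folder */
libname mylib 'C:\Users\User\Desktop\SAST';   /* hypothetical libref */
data mylib.Perm1;
    x = 1;
run;
```

Temp1 disappears when the session ends, while Perm1 remains in the folder and can be read again after reissuing the LIBNAME statement.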


Data Management

Data management refers to creating, formatting, and retrieving datasets.

SAS can subset, split, merge, concatenate, transpose, and aggregate data sets into formats appropriate for further analyses.

SAS can also import data sets from other software packages and export datasets to them.

Datasets consist of observations and variables.

Observations are records or experimental units.

Variables are characteristics that take different values.

SAS variables can be of two types: numeric or character.


Variable names

SAS names can contain uppercase letters, lowercase letters, or a mix of the two.

SAS names must start with a letter or an underscore (_).

The remaining characters in a SAS name can be letters, numbers, or underscores.

SAS names can be up to 32 characters long.

SAS names cannot contain embedded blanks, e.g., DOIT is a valid name, but DO IT is not.

SAS reserves some names for internal use.

    For datasets, do not use the names _NULL_, _DATA_, or _LAST_.
    For variables, do not use the names _N_, _ERROR_, _NUMERIC_, _CHARACTER_, or _ALL_.


Words in SAS

Words in SAS (keywords and names) may be written in uppercase, lowercase, or a mix of the two.

Character data values, however, are case sensitive: text that a program compares against values in the raw data must match them letter for letter.

Missing Values

For numeric variables, enter a period (.) for missing values.

For character variables, leave the value blank (' '). When several observations are placed on a single line, enter a period for a missing character value.


SAS Statements

SAS programs are made up of two kinds of statements:

    Statements that lead to the creation of SAS datasets (DATA step).
    Statements that lead to the analysis of SAS datasets (PROC step).

The output from a SAS program consists of two parts:

    The SAS Log: a running commentary on the results of executing each step of the program.
    The SAS Output: the output produced by the statistical analysis.


DATA Step

The DATA step is used to create one or more SAS data sets.

The DATA step contains statements that read raw data, either entered in the program or stored in existing ASCII data files.

It also performs tasks such as:

    Transforming, creating, and selecting variables.
    Labeling variables, and so on.

The DATA step begins with the word DATA followed by the name of a dataset, i.e., DATA data-set-name;

If a data-set-name is not provided, SAS chooses the automatic names DATA1, DATA2, and so on.


Codes in Data Step

Data data-set-name;

    Input X1 X2 Y1 Y5;     /* if the variables are numeric */
    Input C1 $ C2 $ Z $;   /* if the variables are character */
    Input X1-X4;           /* reads four numeric variables, X1 to X4 */
    Input X1 @@;           /* reads several values of X1 from a single line */
    Input X d.;            /* reads X from a field d columns wide (informat d.) */

Cards or Datalines: these statements tell SAS that the data lines follow.

Infile: reads data that already exist in an ASCII file.


Proc Step

The PROC step (PROC is short for procedure) is used to run procedures such as:

    Proc Contents, Proc Print, Proc Reg, Proc Means, Proc Plot, Proc Anova, etc.

A PROC step must come after the DATA step that creates the dataset it uses.

Example: To display the contents of a dataset "tad":
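One way this could look is a single PROC CONTENTS step. This is a sketch that assumes a dataset named tad has already been created in the Work library:

```sas
/* Lists the variables of tad with their types, lengths, and other attributes */
proc contents data=tad;
run;
```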


Example

Enter the following datasets into SAS:

1. X1: 12 10 23
   X2: 19 17 25

2. Sex: M F M
   Age: 21 20 26
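A possible solution is sketched below, with one observation per data line. The dataset names Q1 and Q2 are an assumption made here, not something the exercise prescribes:

```sas
/* Dataset 1: two numeric variables */
data Q1;
    input X1 X2;
    datalines;
12 19
10 17
23 25
;
run;

/* Dataset 2: one character variable ($) and one numeric variable */
data Q2;
    input Sex $ Age;
    datalines;
M 21
F 20
M 26
;
run;

proc print data=Q1; run;
proc print data=Q2; run;
```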


SAS datasets

In order to create a SAS dataset from a raw data file, you must:

    Start a DATA step and name the SAS dataset being created (DATA statement).
    Identify the location of the raw data file to read (INFILE statement).
    Describe how to read the data fields from the raw data file (INPUT statement).


Data Libraries Example

Create a permanent SAS dataset in a library named T that points to the folder SAST on your desktop.

City     HighT  LowT
AA       26     14
BD       31     17
Mekelle  27     15
Jimma    28     16


SAS Code

libname T 'C:\Users\User\Desktop\SAST';

data T.Temp;
    input city $ H L;
    cards;
AA 26 14
BD 31 17
Mekelle 27 15
Jimma 28 16
;
run;


Reading data from a text file

Example: Suppose that a dataset named PG, which contains results from an experiment on plant growth, is available in the folder SAST. Read the dataset into SAS.

data PlantGrowth;
    infile 'C:\Users\User\Desktop\SAST\PG.txt' firstobs=2;
    input weight group $;
run;

INFILE: where to find the data.
FIRSTOBS: the line at which to start reading; firstobs=2 skips a header line.
INPUT: the variable names to associate with each data value.


Reading data from a csv file

The DSD (delimiter-sensitive data) option:

    Changes the default delimiter from blank to comma.
    If two delimiters appear in a row, assumes a missing value between them.
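A minimal sketch of reading a comma-separated file with DSD. The file PG.csv is hypothetical (a comma-separated copy of the plant growth data with a header line), as is the dataset name PlantCsv:

```sas
data PlantCsv;
    /* dsd: commas separate the values; firstobs=2 skips the header line */
    infile 'C:\Users\User\Desktop\SAST\PG.csv' dsd firstobs=2;
    input weight group $;
run;

proc print data=PlantCsv;
run;
```

With DSD, a line such as ",ctrl" would give a missing weight for that observation rather than shifting the remaining values to the left.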


Reading PC Database Files with IMPORT

Code

PROC IMPORT DATAFILE='filename'
    OUT=dataset
    REPLACE
    DBMS=csv;
    GETNAMES=yes;
RUN;

OUT= names the output SAS dataset.

DATAFILE= tells SAS where to find the data (like INFILE).

DBMS= gives the type of the incoming raw data (e.g., CSV, TAB, EXCEL).

REPLACE overwrites the existing dataset, if there is one.

GETNAMES=yes uses the first record to generate variable names.


Example

Code

PROC IMPORT DATAFILE='C:\Users\User\Desktop\SAST\PGEx.xls'
    OUT=Plant;
    GETNAMES=yes;
RUN;


IMPORT Procedure using Menu

Step 1


Step 2


The Output Delivery System (ODS)

ODS allows output to be presented in multiple formats:

    PDF (ods pdf)
    Excel (ods html)
    HTML (ods html)
    Word (ods rtf)

Output can be opened in non-SAS applications.
ODS gives more control over the appearance of the output.

Procedure

ODS destination FILE='file-pathname.ext';
    < ... procedures ... >
ODS destination CLOSE;

destination: the desired destination (PDF, HTML, etc.).
The file is not created until the ODS CLOSE statement runs.


ODS Example

Output to a PDF file

ods pdf file='C:\Users\User\Desktop\SAST\PG1.pdf' style=education;
proc print data=pg;
run;
ods pdf close;

To see the list of available style templates

proc template;
    list styles;
run;


ODS Example

Output to an Excel file

ods html file='C:\Users\User\Desktop\SAST\PG1.xls' style=Sasweb;
proc print data=pg;
run;
ods html close;

Output to a Word file

ods rtf file='C:\Users\User\Desktop\SAST\PG1.doc' style=Science;
proc print data=pg;
run;
ods rtf close;


Labeling

Labels can be applied to variables using the LABEL statement.

Example

data PG;
    infile 'C:\Users\User\Desktop\SAST\PG.txt' firstobs=2;
    input weight group $;
    label weight="Dried Weight of Trees"
          group="A Control and Treatments";
run;
proc print data=PG label;
run;


Conditional Processing

This involves IF, THEN, and ELSE.

IF

IF <condition> THEN <X>; ELSE <Y>;

If Score >= 50 Then Grade = 'Pass';
Else Grade = 'Fail';

Student  Score  Grade
Henok    75     Pass
Goitom   58     Pass
Leah     40     Fail
Zekiya   70     Pass


Example

Input

Data score;
    input Stud $ Sc;
    if Sc >= 50 then Grade = "Pass";
    else Grade = "Fail";
    cards;
Henok 75
Goitom 58
Leah 40
Zekiya 68
;
run;
proc print data=score;
run;


If..Else If

IF <condition> THEN <X>;
ELSE IF <condition2> THEN <Y>;
ELSE <Z>;

Example

Data score1;
    Input Stud $ Sc;
    if Sc >= 80 then Grade = "Very Good";
    else if 60 <= Sc < 80 then Grade = "Good";
    else Grade = "Not Bad";
    cards;
Henok 75
Goitom 58
Leah 40
Zekiya 68
;
run;


Arithmetic operators

Example

Data Op;
    input x @@;
    y1 = x + 2;
    y2 = x / 2;
    y3 = x**2;
    cards;
2 3 4 5 6 7 8 9 10
;
run;
Proc print data=Op;
run;


Comparison operators
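As a brief illustration, each symbolic comparison operator in SAS has a mnemonic equivalent, and a comparison evaluates to 1 (true) or 0 (false). The dataset and variable names below are hypothetical:

```sas
data Compare;
    input x @@;
    is_eq = (x =  5);    /* same as x eq 5 */
    is_ne = (x ne 5);    /* not equal; also written ^= */
    is_lt = (x <  5);    /* same as x lt 5 */
    is_le = (x <= 5);    /* same as x le 5 */
    is_gt = (x >  5);    /* same as x gt 5 */
    is_ge = (x >= 5);    /* same as x ge 5 */
    datalines;
3 5 8
;
run;

proc print data=Compare;
run;
```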


Subsetting datasets

We can use IF or WHERE statements to subset data.

    Both IF and WHERE statements can be used within a DATA step if a SET statement is used to read in SAS data.
    The IF statement must be used within a DATA step if an INPUT statement is used to read raw data.
    Within a PROC step, only the WHERE statement can be used.

Example

Using the dataset AD available in the folder SAST:

data New;
    set SP;
    if Age lt 70;
run;

/* OR */

data New1;
    set SP;
    where Age lt 70;
run;


WHERE

Use a WHERE statement in a PROC step to include selected observations only.

Example

Data Sub1;
    input Gender $ Age @@;
    cards;
M 22 M 33 F 26 F 29 F 35 M 40 F 28 M 32
;
Run;
Proc print data=Sub1;
    where Age ge 30;
run;


Selecting Variables

By default, SAS keeps all variables of the input dataset.

Use DROP to exclude certain variables from the output dataset.

Use KEEP to include only certain variables in the output dataset.
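A short sketch of both statements. All dataset names, variable names, and values here are hypothetical:

```sas
data Scores;
    input Stud $ T1 T2;
    Total = T1 + T2;
    datalines;
Henok 15 16
Leah 17 15
;
run;

/* KEEP: only Stud and Total are written to the output dataset */
data TotalsOnly;
    set Scores;
    keep Stud Total;
run;

/* DROP: every variable except T1 and T2 is written to the output dataset */
data NoTests;
    set Scores;
    drop T1 T2;
run;
```

In this case the two output datasets contain the same variables; which statement to use is mostly a matter of which list is shorter.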


Merging

It refers to combining data from two or more datasets.

We can merge entire datasets or subsets of datasets.

We can produce one-to-one or one-to-many joins, but not many-to-many joins.

Input datasets must have a common identifying variable (primary key).

Input datasets must first be sorted by this key variable.

The key variable must have the same name and attributes in each dataset.

All other variables must have unique names, or they will be overwritten by the last merged dataset.


Example

Merge the datasets in Test 1 and Test 2

Test 1

Data Test1;
    input Stud_Id $ T1;
    cards;
001 15
003 17
004 15
002 16
;
Run;

Test 2

Data Test2;
    input Stud_Id $ T2;
    cards;
001 16
003 15
002 18
004 14
;
Run;


First sort the datasets using PROC SORT.

Sorting in Ascending

Proc sort data=Test1; by Stud_Id; run;
Proc sort data=Test2; by Stud_Id; run;

To sort the datasets in descending order:

Sorting in Descending

Proc sort data=Test1 out=Test1S; by DESCENDING Stud_Id; run;
Proc sort data=Test2 out=Test2S; by DESCENDING Stud_Id; run;


Merged dataset

To merge the two datasets:

Data Final;
    merge Test1 Test2;
    by Stud_Id;
run;
proc print data=Final;
run;


One-to-many


Example

Demo

Data Demo;
    input Stud_Id $ Gen $;
    cards;
001 F
002 M
003 M
004 F
;
Run;
Proc sort data=Demo;
    by Stud_Id;
run;


Course

Data Course;
    input Stud_Id $ CCode $;
    cards;
001 Psy101
001 Phil105
001 Math212
002 EnLa222
002 Psy101
002 Stat173
;
Run;
Proc sort data=Course;
    by Stud_Id;
Run;


To merge the two

Data DemCo;
    merge Demo Course;
    by Stud_Id;
Run;
Proc print data=DemCo;
Run;


Some functions

ABS(variable) returns the absolute value of a numeric variable.

LOG(variable) returns the natural logarithm of a numeric variable.

ROUND(variable, unit) rounds the numeric variable to the nearest multiple of unit.

LOWCASE(variable) converts the text of a character variable to all lowercase.

UPCASE(variable) converts the text of a character variable to all uppercase.
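The functions above can be tried together in one DATA step. This is a sketch with made-up names and values; ABS is applied before LOG only so that the logarithm's argument stays positive:

```sas
data Func;
    input name $ x;
    abs_x   = abs(x);          /* absolute value */
    log_x   = log(abs(x));     /* natural logarithm of |x| */
    round_x = round(x, 0.1);   /* rounded to the nearest 0.1 */
    lower   = lowcase(name);   /* all lowercase */
    upper   = upcase(name);    /* all uppercase */
    datalines;
Henok -2.345
Leah 7.891
;
run;

proc print data=Func;
run;
```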


Summarizing Data

PROC UNIVARIATE gives an extensive summary

PROC MEANS gives a brief summary

UNIVARIATE

PROC UNIVARIATE DATA=data-set-name;
  VAR variables;
  ID variable;
RUN;

MEANS

PROC MEANS DATA=data-set-name;
  VAR variables;
RUN;
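As a concrete instance of the two templates, the sketch below uses SASHELP.CLASS, a sample data set shipped with SAS that is assumed to be available in your installation. The choice of variables and statistics is illustrative only.

```sas
/* Brief summary: selected statistics for two numeric variables */
Proc means data=sashelp.class n mean std min max;
  var height weight;
Run;

/* Extensive summary: moments, quantiles and extreme observations; */
/* the ID statement labels the extreme observations by name        */
Proc univariate data=sashelp.class;
  var height;
  id name;
Run;
```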


Line Printer Plots for Continuous Variables

PLOTS

PROC UNIVARIATE DATA=data-set-name PLOT;
  VAR variables;
RUN;

Example

Proc univariate data=orange plot;
  Var age circumference;
run;

Histograms

PROC UNIVARIATE DATA=data-set-name noprint;
  VAR variables;
  HISTOGRAM variables;
RUN;


Frequency table and Bar chart

PROC UNIVARIATE

PROC UNIVARIATE DATA=data-set-name FREQ;
  VAR variables;
Run;

PROC FREQ

PROC FREQ DATA=data-set-name;
  TABLES variables;
Run;

Bar Charts

Proc Gchart Data=data-set;
  Vbar variables;
  Hbar variables;
RUN;


The Chi-squared test of association

The null hypothesis: two categorical variables are not associated.

The alternative hypothesis: two categorical variables are associated.

              Sporting Facility
Satisfied     A     B     C
Yes          17    14    13
No            3     6     7

Is there evidence of different satisfaction levels in the three facilities?


Data

Data client;
  input Sat $ Facility $ count @@;
  datalines;
Yes A 17 Yes B 14 Yes C 13 No A 3 No B 6 No C 7
;
Run;

Chisq

Proc freq data=client;
  tables sat*facility / expected chisq norow nocol nopercent;
  weight count;
Run;
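As a check on what the EXPECTED and CHISQ options report, the statistic can be worked by hand from the satisfaction table. The row totals are 44 (Yes) and 16 (No), each facility has 20 clients, and the grand total is 60. The derivation below is a worked sketch, not SAS output.

```latex
\begin{aligned}
E_{\text{Yes},j} &= \frac{44 \times 20}{60} \approx 14.67, \qquad
E_{\text{No},j} = \frac{16 \times 20}{60} \approx 5.33, \qquad j = A, B, C \\[4pt]
\chi^2 &= \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
        = \frac{2.33^2 + 0.67^2 + 1.67^2}{14.67}
        + \frac{2.33^2 + 0.67^2 + 1.67^2}{5.33}
        \approx 0.59 + 1.63 = 2.22 \\[4pt]
\text{df} &= (2 - 1)(3 - 1) = 2, \qquad
p = e^{-\chi^2 / 2} \approx 0.33 \quad \text{(closed form valid for 2 degrees of freedom)}
\end{aligned}
```

With a p-value of about 0.33, these counts give no evidence at the 5% level that satisfaction differs between the three facilities.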


(PROC Tabulate)

TABULATE

Proc tabulate;
  class class-variables;   /* list all class variables */
  table Row-variable, Column-variable;
Run;

Adding total rows and columns

proc tabulate data=haireye;
  class Hair Eye;
  table Hair ALL, Eye ALL;
run;


Testing the Mean

We can use proc ttest to perform a t-test to determine whether the

mean of a group has some specified value,
mean of one group differs from the other.

One Sample T-Test

PROC TTEST DATA=data-set-name h0=mean;
  Var measurement-variable;
run;

Two Sample T-Test

Paired Sample

PROC TTEST DATA=data-set-name;
  PAIRED first-variable * second-variable;
run;


Independent Sample T Test

PROC TTEST DATA=data-set-name;
  CLASS classification-variable;
  VAR measurement-variables;
run;

Comparing More than Two Group Means

PROC ANOVA DATA=data-set-name;
  CLASS class-variable;
  MODEL Response-variable=class-variable;
  MEANS class-variable / HOVTEST WELCH;
  MEANS class-variable / BON TUKEY SCHEFFE LSD;
RUN;
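The t-test and one-way ANOVA templates can be filled in with sample data. The sketch below assumes the SASHELP.CLASS and SASHELP.CARS sample data sets shipped with SAS are available. The hypothesized mean of 60 and the choice of variables are arbitrary illustrations, not part of the original slides.

```sas
/* One-sample t-test: is the mean height equal to 60? */
Proc ttest data=sashelp.class h0=60;
  var height;
Run;

/* Independent-samples t-test: height compared between the two sexes */
Proc ttest data=sashelp.class;
  class sex;
  var height;
Run;

/* One-way ANOVA: city mileage compared across three regions of origin */
Proc anova data=sashelp.cars;
  class origin;
  model mpg_city = origin;
  means origin / hovtest welch;   /* homogeneity-of-variance test, Welch ANOVA */
  means origin / tukey;           /* pairwise comparisons */
Run;
Quit;
```

PROC ANOVA is interactive, so the QUIT statement ends the procedure explicitly.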


Residual Analysis to check assumptions in ANOVA

Residual Analysis

/* PROC GLM supports the LSMEANS and OUTPUT statements; PROC ANOVA does not */
Proc GLM Data=datasetName;
  Class Factor;
  Model Response = Factor;
  LSMeans Factor;
  Means Factor / hovtest;
  OUTPUT OUT=diagnost p=yhat r=resid;
Run;

GPLOT

PROC GPLOT data=diagnost;
  PLOT resid*yhat / vref=0;
Run;


Residual Analysis ...

UNIVARIATE

PROC UNIVARIATE DATA=diagnost noprint;
  QQPLOT resid / normal;
Run;

Shapiro-Wilk Test

Proc UNIVARIATE DATA=diagnost normal;
  Var resid;
Run;


ANOVA

RCBD

Proc glm data=DatasetName;
  class trt rep;
  model response = trt rep;
  means trt / lsd cldiff alpha=.05;
  contrast 'Control vs Others' trt 4 -1 -1 -1 -1;
Run;

Latin Square

PROC GLM data=latin;
  CLASS COLVar RowVar TRT;
  MODEL MILK = TRT COLVar RowVar;
  MEANS TRT / TUKEY;
RUN;


ANOVA

Factorial

PROC GLM;
  CLASS Factor1 Factor2;
  MODEL Response = Factor1|Factor2;
RUN;

Split Plot

Proc Glm;
  Class Block WPlot SPlot;
  Model Response = Block|WPlot|SPlot / ss3;
  TEST H = BLOCK WPlot E = BLOCK*WPlot;
  TEST H = SPlot E = BLOCK*SPlot;
  TEST H = WPlot*SPlot E = BLOCK*WPlot*SPlot;
Run;


Linear Regression

Simple Linear

Proc reg data=DatasetName;
  Model Response=Factor / p clb;
  Plot Response*Factor / nomodel nostat;
  plot r.*p. student.*nqq. / nomodel nostat;
Run;

Multiple Linear

Proc reg data=DatasetName;
  model Response=Factor1 Factor2 Factork / vif;
Run;


Model Diagnostics

Test for Normality of Residuals

Proc reg data=DatasetName;
  model Response=Factor1 Factor2 Factork;
  output out=diag (keep= r pr) residual=r predicted=pr;
Run;
Proc univariate data=diag normal;
  var r;
  qqplot r / normal(mu=est sigma=est);
Run;


Tests on Nonconstant Error Variance

Graphical Method

Proc reg data=DatasetName;
  model Response=Factor1 Factor2 Factork;
  plot r.*p.;
Run;

White’s Test

Proc reg data=DatasetName;
  model Response=Factor1 Factor2 Factork / spec;
Run;


Independence of Errors

Proc reg data=DatasetName;
  model Response=Factor1 Factor2 Factork / dw;
  plot r.*p.;
Run;

Tests for Collinearity

Proc reg data=DatasetName;
  model Response=Factor1 Factor2 Factork / vif;
Run;


Simple Logistic

SAS CODE

PROC LOGISTIC DATA=datasetName descending;
  CLASS variables;
  MODEL response=predictors / lackfit;
  OUTPUT OUT=SAS-data-set p=probability;
RUN;

Model Selection

Proc logistic data=DatasetName;
  /* selection= takes one method: stepwise, forward, or backward */
  Model Response=predictors / selection=stepwise;
Run;
