basics of sas

Post on 16-Apr-2017

Introduction to the SAS Language

Data Management using SAS

Data Analysis

Basics of SAS

Taddesse Kassahun

Email: tadestat@gmail.com
Department of Statistics
Addis Ababa University

December 16, 2015

Taddesse Kassahun Basics of SAS 1 / 71


Contents

1 Introduction to the SAS Language
   General overview of SAS
   SAS Help and Documentation
   SAS Programs
   SAS Data Libraries

2 Data Management using SAS
   Variable names
   Data and PROC
   Reading External Data
   Subsetting and Combining SAS data sets
   Commonly Used SAS Functions

3 Data Analysis
   The Chi-squared test of association
   Creating Tabular Reports
   Testing the Mean
   Regression


General overview of SAS

The Statistical Analysis System (SAS) is a computer program for performing statistical analysis of data.

Different releases of SAS software might be available for different operating systems.

Select Help ⇒ About SAS to display a window that contains release details.

SAS organizes data into a rectangular form or table called a SAS dataset.

The SAS system achieves its versatility by providing users with the ability to write their own program statements.

A comment in SAS is written either as *comment statement; or as /* comment statement */


General Overview . . .

SAS allows users to call up SAS routines, called procedures, for performing various statistical analyses on specified datasets.

Syntax: there is just one basic syntax rule you must always follow: each SAS statement ends with a semicolon (;).
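As a minimal sketch of the semicolon rule and the two comment styles (the dataset name demo and its single value are made up for illustration), each statement below ends with a semicolon:

```sas
* A comment statement starts with an asterisk and ends with a semicolon;
/* A block comment is delimited by slash-asterisk pairs */

data demo;               /* DATA statement */
  x = 1;                 /* assignment statement */
run;                     /* RUN statement ends the step */

proc print data=demo;    /* calls the PRINT procedure on demo */
run;
```

Leaving out any of these semicolons typically makes SAS read two statements as one and report an error in the log.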


SAS Window Environment


Getting Help

The easiest way to get help is by clicking Help in the menu bar.

Alternatively, type HELP in the command bar and press ENTER.

Further help about SAS is available at support.sas.com.


SAS Programs

A SAS program is a sequence of steps that the user submits for execution.

DATA steps are typically used to create SAS data sets.

PROC steps are typically used to process SAS data sets.
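The two kinds of step can be sketched in one short program. The dataset name heights and its values are hypothetical and serve only to give the PROC step something to process:

```sas
/* DATA step: creates the data set heights */
data heights;
  input name $ height;
  datalines;
Abebe 172
Sara 165
;
run;

/* PROC step: processes the data set created above */
proc means data=heights;
  var height;
run;
```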


Data Libraries

A SAS data library is a collection of SAS files that are recognized as a unit by SAS.

A SAS dataset is one type of SAS file stored in a data library.

To create a permanent SAS dataset, store it in your own library.

Identify each SAS data library by assigning it a library reference name (libref) with the LIBNAME statement:

LIBNAME libref 'file-folder-location';

E.g.: LIBNAME readData 'C:/Users/User/Desktop/SAST';


Rules for Naming Libref

The name must be 8 characters or fewer.

The name must begin with a letter or underscore.

The remaining characters can be letters, numbers or underscores.


SAS work library

Work is a temporary library.

SAS datasets created in Work exist only during the current SAS session.

Once the SAS session ends, these datasets are erased.

It is not necessary to assign a libref for Work.
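The difference between the temporary Work library and a permanent library can be sketched as follows. The libref mylib and the dataset names are hypothetical, and the folder is assumed to exist on your machine:

```sas
/* Assumed folder; replace it with one that exists on your machine */
libname mylib 'C:\Users\User\Desktop\SAST';

data temp1;            /* one-level name: stored as WORK.temp1, erased when the session ends */
  x = 1;
run;

data mylib.perm1;      /* two-level name: stored permanently in the folder behind mylib */
  set temp1;
run;
```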


Data Management

Data management refers to creating, formatting and retrieving datasets.

SAS can subset, split, merge, concatenate, transpose, and aggregate data sets into formats appropriate for further analyses.

SAS can also import data sets from other software packages and export datasets to them.

Datasets consist of observations and variables.

Observations are records or experimental units.

Variables are characteristics that take different values.

SAS variables can be of two types: numeric or character.


Variable names

SAS names can contain uppercase letters, lowercase letters, or a mix of the two.

SAS names must start with a letter or an underscore (_).

The remaining characters in a SAS name can be letters, numbers, or underscores.

SAS names can be up to 32 characters long.

SAS names cannot contain embedded blanks, e.g., DOIT is a valid name, but DO IT is not.

SAS reserves some names for internal use.

   For datasets, do not use the names _NULL_, _DATA_, or _LAST_.
   For variables, do not use the names _N_, _ERROR_, _NUMERIC_, _CHARACTER_, or _ALL_.


Words in SAS

Words in SAS may be written in uppercase, lowercase, or a mix of the two.

Character data values, however, are case sensitive: the letters of a value in the raw data must exactly match the letters used for that value in the SAS program.

Missing Values

For numeric variables, enter a period (.) for missing values.

For character variables, leave the data blank (" "). When several observations are placed on a single line, enter a period for a missing character value.
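A small sketch of how periods mark missing values when data are read with list input. The dataset name miss and its values are invented for illustration:

```sas
data miss;
  input name $ age;
  datalines;
Hana 21
. 30
Dawit .
;
run;

/* The second observation has a missing (blank) name, the third a missing age */
proc print data=miss;
run;
```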


SAS Statements

SAS programs are made up of two kinds of statements:

Statements that lead to the creation of SAS datasets (DATA step).
Statements that lead to the analysis of SAS datasets (PROC step).

The output from a SAS program consists of two parts:

The SAS Log: a running commentary on the results of executing each step of the program.
The SAS Output: the output produced by the statistical analysis.


DATA Step

The DATA step is used to create one or more SAS data sets.

The DATA step contains statements that read raw data or existing ASCII data files.

It also helps perform the following tasks:

Transforming, creating, and selecting variables.
Labeling variables, and so on.

The DATA step begins with the word DATA followed by the name of a dataset, i.e., DATA data-set-name;

If a data-set-name is not provided, SAS chooses the automatic names DATA1, DATA2, and so on.


Codes in Data Step

Data data-set-name;

Input X1 X2 Y1 Y5;     /* If the variables are numeric */
Input C1 $ C2 $ Z $;   /* If the variables are character */
Input X1-X4;           /* Enters four numeric variables */
Input X1 @@;           /* Enters several values of X1 on a single line */
Input X d.;            /* Reads X using a numeric informat of width d */

Cards or Datalines: these statements tell SAS that the data lines follow.

Infile: reads data that already exist in an ASCII file.
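Two of these INPUT forms can be sketched with in-line data. The dataset names and values below are hypothetical:

```sas
data scores;
  input x1-x4;           /* four numeric variables per data line */
  datalines;
12 10 23 7
19 17 25 9
;
run;

data several;
  input id $ x @@;       /* trailing @@ holds the line so several observations fit on it */
  datalines;
A 1 B 2 C 3
;
run;
```

The first step yields two observations of four variables each; the second yields three observations from a single data line.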


Proc Step

The PROC step (PROC is short for procedure) is used to run SAS procedures such as:

Proc Contents, Proc Print, Proc Reg, Proc Means, Proc Plot, Proc Anova, etc.

A PROC step must come after the DATA step that creates the dataset it uses.

Example: PROC CONTENTS displays the contents of a dataset such as "tad".
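A minimal sketch of displaying the contents of a dataset named tad. The variables and values placed in tad here are assumptions, included only so that the procedure has something to describe:

```sas
data tad;                  /* hypothetical contents */
  input x1 x2;
  datalines;
12 19
10 17
;
run;

proc contents data=tad;    /* lists variable names, types, lengths and the number of observations */
run;
```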


Example

Enter the following datasets into SAS:

1. X1: 12 10 23
   X2: 19 17 25

2. Sex: M F M
   Age: 21 20 26
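One possible way to enter the two datasets, offered as a sketch rather than the intended solution. The dataset names ex1 and ex2 are hypothetical, and each column of the exercise is treated as one variable:

```sas
data ex1;
  input X1 X2;
  datalines;
12 19
10 17
23 25
;
run;

data ex2;
  input Sex $ Age;         /* $ marks Sex as a character variable */
  datalines;
M 21
F 20
M 26
;
run;

proc print data=ex1; run;
proc print data=ex2; run;
```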


SAS datasets

In order to create a SAS dataset from a raw data file, you must:

Start a DATA step and name the SAS dataset being created (DATA statement).
Identify the location of the raw data file to read (INFILE statement).
Describe how to read the data fields from the raw data file (INPUT statement).


Data Libraries Example

Create a permanent SAS dataset from the following data, using a library named T that points to the folder SAST on your desktop.

City     HighT  LowT
AA       26     14
BD       31     17
Mekelle  27     15
Jimma    28     16


SAS Code

libname T 'C:\Users\User\Desktop\SAST';

data T.Temp;
  input city $ H L;
  cards;
AA 26 14
BD 31 17
Mekelle 27 15
Jimma 28 16
;
run;


Reading data from a text file

Example: Suppose that a dataset named PG, which comprises results from an experiment on plant growth, is available in the folder SAST. Read the dataset into SAS.

data PlantGrowth;
  infile 'C:\Users\User\Desktop\SAST\PG.txt' firstobs=2;
  input weight group $;
run;

INFILE – where to find the data.
FIRSTOBS – start reading data from the 2nd line if the data has a header.
INPUT – variable names to associate with each data value.


Reading data from a csv file

The DSD (delimiter-sensitive data) option of the INFILE statement:

Changes the default delimiter from blank to comma.
If there are two delimiters in a row, assumes a missing value between them.
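A minimal sketch of the DSD option, using in-line comma-separated data instead of an external file so that it runs on its own. The dataset name and values are made up:

```sas
data PlantCsv;
  infile datalines dsd;        /* DSD: comma-delimited, consecutive commas mean a missing value */
  input id weight group $;
  datalines;
1,4.17,ctrl
2,,trt1
3,5.58,trt2
;
run;

/* weight is missing for the observation with id 2 */
proc print data=PlantCsv;
run;
```

To read an external CSV file, the keyword datalines in the INFILE statement would be replaced by the quoted path of the file, typically together with firstobs=2 when the file has a header line.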


Reading PC Database Files with IMPORT

Code

PROC IMPORT DATAFILE='filename'
    OUT=dataset
    DBMS=csv
    REPLACE;
  GETNAMES=yes;
RUN;

OUT= name of the output SAS dataset.

DATAFILE= where to find the data (same as INFILE).

DBMS= type of incoming raw data (can be CSV, TAB, EXCEL).

REPLACE overwrites the existing dataset.

GETNAMES=yes uses the first record to generate variable names.


Example

Code

PROC IMPORT DATAFILE='C:\Users\User\Desktop\SAST\PGEx.xls'
    OUT=Plant;
  GETNAMES=yes;
RUN;


IMPORT Procedure using Menu

[Figures: Step 1 and Step 2 of importing data through the menu]


The Output Delivery System (ODS)

ODS allows output to be presented in multiple formats:

PDF (ods pdf)
Excel (ods html)
HTML (ods html)
Word (ods rtf)

Output can be opened in non-SAS applications.

ODS gives more control over the appearance of the output.

Procedure

ODS destination FILE='file-pathname.ext';
< . . . procedures . . . >
ODS destination CLOSE;

destination: the desired destination (PDF, HTML, etc.).

The file will not be created until the ODS CLOSE statement is executed.


ODS Example

Output to a pdf file

ods pdf file='C:\Users\User\Desktop\SAST\PG1.pdf' style=education;
proc print data=pg;
run;
ods pdf close;

To see the list of available style templates:

proc template;
  list styles;
run;


ODS Example

Output to an excel file

ods html file='C:\Users\User\Desktop\SAST\PG1.xls' style=Sasweb;
proc print data=pg;
run;
ods html close;

Output to a word file

ods rtf file='C:\Users\User\Desktop\SAST\PG1.doc' style=Science;
proc print data=pg;
run;
ods rtf close;


Labeling

Labels can be applied to variables using the LABEL statement.

Example

data PG;
  infile 'C:\Users\User\Desktop\SAST\PG.txt' firstobs=2;
  input weight group $;
  label weight="Dried Weight of Trees"
        group="A Control and Treatments";
run;

proc print data=PG label;
run;


Conditional Processing

Conditional processing uses the IF, THEN, and ELSE keywords.

IF

IF <condition> THEN <X>; ELSE <Y>;

If Score >= 50 Then Grade = 'Pass';
Else Grade = 'Fail';

Student  Score  Grade
Henok    75     Pass
Goitom   58     Pass
Leah     40     Fail
Zekiya   70     Pass


Example

Input

Data score;
  input Stud $ Sc;
  if Sc >= 50 then Grade = "Pass";
  else Grade = "Fail";
  cards;
Henok 75
Goitom 58
Leah 40
Zekiya 68
;
run;

proc print data=score;
run;


If..Else If

IF <condition> THEN <X>;
ELSE IF <condition2> THEN <Y>;
ELSE <Z>;

Example

Data score1;
  Input Stud $ Sc;
  if Sc >= 80 then Grade = "Very Good";
  else if 60 <= Sc < 80 then Grade = "Good";
  else Grade = "Not Bad";
  cards;
Henok 75
Goitom 58
Leah 40
Zekiya 68
;
run;


Arithmetic operators

Example

Data Op;
  input x @@;
  y1 = x + 2;
  y2 = x / 2;
  y3 = x ** 2;
  cards;
2 3 4 5 6 7 8 9 10
;
run;

Proc print data=Op;
run;


Comparison operators
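As a sketch, the standard SAS comparison operators and their mnemonic equivalents are = (EQ), ^= (NE), > (GT), < (LT), >= (GE) and <= (LE). A comparison evaluates to 1 when true and 0 when false, which the hypothetical dataset cmp below makes visible:

```sas
data cmp;
  input x @@;
  eq5 = (x = 5);       /* EQ */
  ne5 = (x ne 5);      /* NE, also written ^= */
  gt5 = (x > 5);       /* GT */
  lt5 = (x lt 5);      /* LT, also written <  */
  ge5 = (x >= 5);      /* GE */
  le5 = (x le 5);      /* LE, also written <= */
  datalines;
3 5 8
;
run;

proc print data=cmp;
run;
```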


Subsetting datasets

We can use IF or WHERE statements to subset data.

Both IF and WHERE statements can be used within a DATA step if a SET statement is used to read in SAS data.

The IF statement must be used within a DATA step if an INPUT statement is used to read raw data.

Only the WHERE statement can be used within a PROC step.

Example

Using the dataset AD available at the folder SAST:

data New;
  set SP;
  if AGe lt 70;
run;

* OR;

data New1;
  set SP;
  where AGe lt 70;
run;


WHERE

Use a WHERE statement in a PROC step to includeselected observations only.

Example

Data Sub1;
  input Gender $ Age @@;
  cards;
M 22 M 33 F 26 F 29 F 35 M 40 F 28 M 32
;
Run;

Proc print data=Sub1;
  where Age ge 30;
run;


Selecting Variables

By default, SAS keeps all variables of the input dataset.

Use DROP to exclude certain variables from the output dataset.

Use KEEP to include only certain variables in the output dataset.
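A short sketch of DROP and KEEP as DATA step statements. The dataset names, variables and values are hypothetical:

```sas
data full;
  input id $ age weight;
  datalines;
001 22 61
002 35 70
;
run;

data no_weight;
  set full;
  drop weight;         /* output contains id and age */
run;

data only_age;
  set full;
  keep id age;         /* same result, stated as what to keep */
run;
```

Which of the two to use is mostly a matter of which list is shorter.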


Merging

It refers to combining data from two or more datasets.

We can merge entire datasets or subsets of datasets.

We can produce one-to-one or one-to-many, but not many-to-many, joins.

Input datasets must have a common identifying variable (primary key).

Input datasets must first be sorted by this key variable.

The key variable must have the same name and attributes in every input dataset.

All other variables must have unique names, or they will be overwritten by the last merged dataset.


Example

Merge the datasets in Test 1 and Test 2

Test 1

Data Test1;
  input Stud_Id $ T1;
  cards;
001 15
003 17
004 15
002 16
;
Run;

Test 2

Data Test2;
  input Stud_Id $ T2;
  cards;
001 16
003 15
002 18
004 14
;
Run;


First sort the datasets using the PROC Sort.

Sorting in Ascending

Proc sort data=Test1; by Stud_Id; run;
Proc sort data=Test2; by Stud_Id; run;

To sort the datasets in descending order:

Sorting in Descending

Proc sort data=Test1 out=Test1S; by DESCENDING Stud_Id; run;
Proc sort data=Test2 out=Test2S; by DESCENDING Stud_Id; run;


Merged dataset

To merge the two datasets:

Data Final;
  merge Test1 Test2;
  by Stud_Id;
run;

proc print data=Final;
run;


One-to-many


Example

Demo

Data Demo;
  input Stud_Id $ Gen $;
  cards;
001 F
002 M
003 M
004 F
;
Run;

Proc sort data=demo;
  by Stud_Id;
run;


Course

Data Course;
  input Stud_Id $ CCode $;
  cards;
001 Psy101
001 Phil105
001 Math212
002 EnLa222
002 Psy101
002 Stat173
;
Run;

Proc sort data=Course;
  by Stud_Id;
Run;


To merge the two

Data DemCo;
  merge Demo Course;
  by Stud_Id;
Run;

Proc print data=DemCo;
Run;


Some functions

ABS(variable) takes the absolute value of a numeric variable.

LOG(variable) takes the natural logarithm of a numeric variable.

ROUND(variable, unit) rounds the numeric variable to the nearest multiple of the unit.

LOWCASE(variable) converts the text of a character variable to all lowercase.

UPCASE(variable) converts the text of a character variable to all uppercase.
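These functions can be tried in a DATA step without any input data. The dataset name fn and the values of x and city below are invented for illustration:

```sas
data fn;
  x = -2.718;
  city = 'Addis Ababa';
  a  = abs(x);           /* 2.718 */
  l  = log(abs(x));      /* natural log; the argument must be positive */
  r  = round(x, 0.1);    /* -2.7 */
  lo = lowcase(city);    /* addis ababa */
  up = upcase(city);     /* ADDIS ABABA */
run;

proc print data=fn;
run;
```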


Summarizing Data

PROC UNIVARIATE gives an extensive summary

PROC MEANS gives a brief summary

UNIVARIATE

PROC UNIVARIATE DATA=data-set-name;
  VAR variables;
  ID variable;
RUN;

MEANS

PROC MEANS DATA=data-set-name;
  VAR variables;
RUN;
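Both procedures can be compared on a small dataset. The dataset wt, its variables plant and weight, and the values are hypothetical:

```sas
data wt;
  input plant $ weight @@;
  datalines;
a 4.2 b 5.1 c 4.8 d 6.0
;
run;

proc means data=wt;          /* brief summary: N, mean, std dev, minimum, maximum by default */
  var weight;
run;

proc univariate data=wt;     /* extensive summary: moments, quantiles, extreme observations */
  var weight;
  id plant;                  /* labels the extreme observations with the plant value */
run;
```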


Line Printer Plots for Continuous Variables

PLOTS

PROC UNIVARIATE DATA=data-set-name PLOT;
  VAR variables;
RUN;

Example

Proc univariate data=orange plot;
  Var age circumference;
run;

Histograms

PROC UNIVARIATE DATA=data-set-name noprint;
  VAR variables;
  HISTOGRAM variables;
RUN;


Frequency table and Bar chart

PROC UNIVARIATE

PROC UNIVARIATE DATA=data-set-name FREQ;
  VAR variables;
Run;

PROC FREQ

PROC FREQ DATA=data-set-name;
  TABLES variables;
Run;

Bar Charts

Proc Gchart Data=data-set;
  Vbar variables;
  Hbar variables;
RUN;
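The templates above might be filled in as follows. This is a sketch that assumes the sample data set SASHELP.CLASS (variables Sex and Age), which is not part of these slides, and that SAS/GRAPH is licensed for PROC GCHART.

```sas
/* One-way frequency table for each listed variable */
proc freq data=sashelp.class;
  tables sex age;
run;

/* Vertical bar chart of a character variable, horizontal bar chart of a numeric one */
proc gchart data=sashelp.class;
  vbar sex;
  hbar age / discrete;   /* DISCRETE: one bar per distinct age instead of binned midpoints */
run;
quit;
```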

The Chi-squared test of association

The null hypothesis: the two categorical variables are not associated.

The alternative hypothesis: the two categorical variables are associated.

Satisfied   Sporting Facility
            A     B     C
Yes         17    14    13
No           3     6     7

Is there evidence of different satisfaction levels in the three facilities?

Data

Data client;
  input Sat $ Facility $ count @@;
  datalines;
Yes A 17 Yes B 14 Yes C 13 No A 3 No B 6 No C 7
;
Run;

Chisq

Proc freq data=client;
  tables sat*facility / expected chisq norow nocol nopercent;
  weight count;
Run;
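To make the EXPECTED and CHISQ output easier to read, the statistic can be worked by hand from the sporting-facility counts. The notation is an assumption introduced here: O_ij are the observed counts, R_i the row totals (44 Yes, 16 No), C_j the column totals (20 per facility), and n = 60 the grand total.

```latex
\[
E_{ij} = \frac{R_i\,C_j}{n}, \qquad
E_{\mathrm{Yes},j} = \frac{44 \times 20}{60} \approx 14.67, \qquad
E_{\mathrm{No},j} = \frac{16 \times 20}{60} \approx 5.33
\]
\[
\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
       = \frac{2.33^2 + 0.67^2 + 1.67^2}{14.67} + \frac{2.33^2 + 0.67^2 + 1.67^2}{5.33}
       \approx 0.59 + 1.63 \approx 2.22
\]
\[
\mathrm{df} = (2-1)(3-1) = 2, \qquad
p = P\left(\chi^2_2 > 2.22\right) = e^{-2.22/2} \approx 0.33
\]
```

With a p-value of about 0.33, these counts give no evidence at the usual 5% level that satisfaction differs across the three facilities.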

Creating Tabular Reports (PROC TABULATE)

TABULATE

Proc tabulate data=data-set-name;
  class class-variables;   /* list all class variables used in the table */
  table Row-variable, Column-variable;
Run;

Adding total rows and columns

proc tabulate data=haireye;
  class Hair Eye;
  table Hair ALL, Eye ALL;
run;
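The same pattern can be tried on any data set with two categorical variables. The sketch below assumes the sample data set SASHELP.CLASS (class variables Sex and Age), which is not part of these slides.

```sas
proc tabulate data=sashelp.class;
  class sex age;
  /* rows: each sex plus a total row; columns: each age plus a total column */
  table sex all, age all;   /* cells show counts (N), the default when no analysis variable is given */
run;
```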

Testing the Mean

We can use PROC TTEST to perform a t-test to determine whether

the mean of a group has some specified value,
the mean of one group differs from the mean of another.

One Sample T-Test

PROC TTEST DATA=data-set-name h0=mean;
  Var measurement-variable;
run;

Two Sample T-Test

Paired Sample

PROC TTEST DATA=data-set-name;
  PAIRED first-variable * second-variable;
run;

Independent Sample T Test

PROC TTEST DATA=data-set-name;
  CLASS classification-variable;
  VAR measurement-variables;
run;
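A short sketch of the one-sample and independent-sample forms, assuming the sample data set SASHELP.CLASS (variables Height and Sex), which is not part of these slides. The hypothesized mean of 60 is an arbitrary value chosen only for illustration.

```sas
/* One-sample test: is the mean height equal to 60? */
proc ttest data=sashelp.class h0=60;
  var height;
run;

/* Independent-sample test: does mean height differ between the two levels of Sex? */
proc ttest data=sashelp.class;
  class sex;     /* the CLASS variable must have exactly two levels */
  var height;
run;
```

For the two-group case the output reports both the pooled and the Satterthwaite (unequal-variance) results, together with a test for equality of variances that helps decide which one to read.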

Comparing More than Two Group Means

PROC ANOVA DATA=data-set-name;
  CLASS class-variable;
  MODEL Response-variable = class-variable;
  MEANS class-variable / HOVTEST WELCH;
  MEANS class-variable / BON TUKEY SCHEFFE LSD;
RUN;

Residual Analysis to check assumptions in ANOVA

Residual Analysis

/* LSMEANS and OUTPUT require PROC GLM; PROC ANOVA does not support them */
Proc GLM Data=datasetName;
  Class Factor;
  Model Response = Factor;
  LSMeans Factor;
  Means Factor / hovtest;
  OUTPUT OUT=diagnost p=yhat r=resid;
Run;

GPLOT

PROC GPLOT data=diagnost;
  PLOT resid*yhat / vref=0;
Run;
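The residual workflow can be sketched end to end as follows. It assumes the sample data set SASHELP.CLASS (not part of these slides), uses PROC GLM because that procedure provides the OUTPUT statement for saving residuals, and swaps in PROC SGPLOT as an alternative to PROC GPLOT for sites without SAS/GRAPH.

```sas
/* Fit a one-way model and save predicted values and residuals */
proc glm data=sashelp.class;
  class sex;
  model height = sex;
  output out=diagnost p=yhat r=resid;
run;
quit;   /* PROC GLM is interactive; QUIT ends it */

/* Residuals against fitted values, with a horizontal reference line at zero */
proc sgplot data=diagnost;
  scatter x=yhat y=resid;
  refline 0 / axis=y;
run;
```

A roughly even spread of residuals around the zero line in every group is consistent with the constant-variance assumption.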

Residual Analysis ...

UNIVARIATE

PROC UNIVARIATE DATA=diagnost noprint;
  QQPLOT resid / normal;
Run;

Shapiro-Wilk’s Test

Proc UNIVARIATE DATA=diagnost normal;
  Var resid;
Run;

ANOVA

RCBD

Proc glm data=DatasetName;
  class trt rep;
  model response = trt rep;
  means trt / lsd cldiff alpha=.05;
  contrast 'Control vs Others' trt 4 -1 -1 -1 -1;
Run;

Latin Square

PROC GLM data=latin;
  CLASS COLVar RowVar TRT;
  MODEL MILK = TRT COLVar RowVar;
  MEANS TRT / TUKEY;
RUN;

Factorial

PROC GLM;
  CLASS Factor1 Factor2;
  MODEL Response = Factor1|Factor2;
RUN;

Split Plot

Proc Glm;
  Class Block WPlot SPlot;
  Model Response = Block|WPlot|SPlot / ss3;
  TEST H = BLOCK WPlot E = BLOCK*WPlot;
  TEST H = SPlot E = BLOCK*SPlot;
  TEST H = WPlot*SPlot E = BLOCK*WPlot*SPlot;
Run;

Linear Regression

Simple Linear

Proc reg data=DatasetName;
  Model Response = Factor / p clb;
  Plot Response*Factor / nomodel nostat;
  plot r.*p. student.*nqq. / nomodel nostat;
Run;

Multiple Linear

Proc reg data=DatasetName;
  model Response = Factor1 Factor2 Factork / vif;
Run;
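As an illustration of the simple and multiple regression templates, the sketch below fits both models in one PROC REG step. It assumes the sample data set SASHELP.CLASS (variables Weight, Height, Age), which is not part of these slides; the labels simple and multiple are hypothetical names used only to tell the two sets of output apart.

```sas
proc reg data=sashelp.class;
  /* CLB adds confidence limits for the parameter estimates */
  simple:   model weight = height / clb;
  /* VIF adds variance inflation factors to check for collinearity */
  multiple: model weight = height age / vif;
run;
quit;   /* PROC REG is interactive; QUIT ends it */
```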

Model Diagnostics

Test for Normality of Residuals

Proc reg data=DatasetName;
  model Response = Factor1 Factor2 Factork;
  output out=diag (keep= r pr) residual=r predicted=pr;
Run;

Proc univariate data=diag normal;
  var r;
  qqplot r / normal(mu=est sigma=est);
Run;

Tests on Nonconstant Error Variance

Graphical Method

Proc reg data=DatasetName;
  model Response = Factor1 Factor2 Factork;
  plot r.*p.;
Run;

White’s Test

Proc reg data=DatasetName;
  model Response = Factor1 Factor2 Factork / spec;
Run;

Independence of Errors

Proc reg data=DatasetName;
  model Response = Factor1 Factor2 Factork / dw;
  plot r.*p.;
Run;

Tests for Collinearity

Proc reg data=DatasetName;
  model Response = Factor1 Factor2 Factork / vif;
Run;
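The diagnostic options shown on these slides can be combined on a single MODEL statement. The sketch below is one way to do that, assuming the sample data set SASHELP.CLASS (not part of these slides); the output data set name diag and the variable names r and pr follow the earlier normality example.

```sas
proc reg data=sashelp.class;
  /* SPEC: White's test for nonconstant error variance
     DW:   Durbin-Watson statistic for serial correlation
     VIF:  variance inflation factors for collinearity */
  model weight = height age / spec dw vif;
  output out=diag residual=r predicted=pr;
run;
quit;

/* NORMAL requests tests for normality, including Shapiro-Wilk, on the saved residuals */
proc univariate data=diag normal;
  var r;
run;
```

The Durbin-Watson statistic is only meaningful when the observations have a natural order, such as time; for data like this sample it is shown purely to illustrate the syntax.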

Simple Logistic

SAS CODE

PROC LOGISTIC DATA=datasetName descending;
  CLASS variables;
  MODEL response = predictors / lackfit;
  OUTPUT OUT=SAS-data-set p=probability;
RUN;

Model Selection

Proc logistic data=DatasetName;
  /* selection= accepts stepwise, forward, or backward */
  Model Response = predictors / selection=stepwise;
Run;
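A minimal sketch of a logistic model with stepwise selection, assuming the sample data set SASHELP.HEART (response Status with values Alive and Dead, and numeric predictors AgeAtStart, Cholesterol, Systolic, Smoking). Neither that data set nor the choice of predictors comes from these slides.

```sas
proc logistic data=sashelp.heart;
  /* event='Dead' states explicitly which response level is modeled,
     an alternative to the DESCENDING option */
  model status(event='Dead') = ageatstart cholesterol systolic smoking
        / selection=stepwise lackfit;
run;
```

LACKFIT adds the Hosmer-Lemeshow goodness-of-fit test for the selected model. Observations with missing values in any listed variable are excluded from the fit.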
