how to navigate the guide

250
1 How to Navigate the Guide To navigate this SAS Guide, use the PageDown and PageUp buttons on the keyboard. A copy of this PowerPoint document can be downloaded from http://www.biostat.ku.dk/~lts/varians_reg ression/sasguide.ppt

Upload: joelle-miller

Post on 02-Jan-2016

31 views

Category:

Documents


4 download

DESCRIPTION

How to Navigate the Guide. To navigate this SAS Guide, use the PageDown and PageUp buttons on the keyboard. A copy of this PowerPoint document can be downloaded from http://www.biostat.ku.dk/~lts/varians_regression/sasguide.ppt. Preface. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: How to Navigate the Guide

1

How to Navigate the Guide

To navigate this SAS Guide, use the PageDown and PageUp buttons on the keyboard.

A copy of this PowerPoint document can be downloaded from

http://www.biostat.ku.dk/~lts/varians_regression/sasguide.ppt

Page 2: How to Navigate the Guide

2

Preface

This is The Beginners’ Guide To SAS. The document was originally written by Anna Johansson, MEP, Stockholm.

It has been lightly edited by Peter Dalgaard and Lene Theil Skovgaard for the Ph.D. course on SAS at the Faculty of Health Sciences, University of Copenhagen, May 2002, and later by LTS for the Ph.D. Course in Analysis of Variance and Regression.

Page 3: How to Navigate the Guide

3

Introduction

What is SAS?

SAS is a software package for managing large amounts of data and performing statistical analyses.

It was created in the early 1960s by the Statistical Department at North Carolina State University. Today SAS is developed and marketed by SAS Institute Inc. with head office in Cary, North Carolina, U.S.A.

Page 4: How to Navigate the Guide

4

Introduction (cont.)

SAS in Denmark

The Danish subdivision of SAS Institute provides consulting and a wide range of courses. It is located in Copenhagen.

SAS Institute A/S

Købmagergade 7-9

1150 Kbh. K

Tel: 70 28 28 70

Fax: 70 28 29 91

Email: [email protected]

Page 5: How to Navigate the Guide

5

Introduction (cont.) The SAS SystemThe SAS System is mainly used for

- Data Management (about 80% of all users)

- Statistical Analysis (about 20% of all users)

The power of SAS lies in its ability to manage large data sets. It is fast and has many 5statistical and non-statistical features.

The disadvantage of SAS is its steep learning curve. It takes quite a bit of an effort to get started. User-friendly interfaces do exist, though.

Page 6: How to Navigate the Guide

6

Introduction (cont.)

Start af SAS på kursussalen:

- Flyt på musen (eller tænd maskinen)

- Login er kursusxx

- Password skifter

- Vælg START, efterfulgt af STATISTIK og SAS 8.2

Page 7: How to Navigate the Guide

7

Introduction (cont.) Getting StartedA very good start is to enter the SAS Online Training.

Choose in the menu Help + Getting Started with the SAS Software, then click on the “book”.

Page 8: How to Navigate the Guide

8

Introduction (cont.) SAS FilesIf your data is not yet in a SAS data set, you access the raw

data by creating a SAS data set from it.

Once you have made the SAS data set, you use SAS programs to analyse, manage and/or present the data.

SAS data sets can be permanent or temporary. A special library called WORK is created on start-up and deleted on exit.

Page 9: How to Navigate the Guide

9

Introduction (cont.) SAS ProgrammingSAS programming works in two steps:

Data Step

1. reads data from file

2. makes transformations and adds new variables

3. creates SAS Data Set

Proc Step

4. uses the SAS Data Set

5. produces the information we want, such as tables, statistics, graphs, web pages

Page 10: How to Navigate the Guide

10

Introduction (cont.) Data and Proc StepsExample of a SAS program:

data work.main;

set work.original;

age=1997-birthyr; Data Step bmi=weight/(height*height);

run;

proc print data=work.main;

var id age bmi;

run;

Proc Stepsproc means data=work.main;

var age bmi;

run;

Page 11: How to Navigate the Guide

11

Introduction (cont.) SAS ModulesThe SAS system is made up of several modules, each used

for different purposes.

This Guide deals only with the SAS BASE

and the GRAPH modules, giving knowledge on basic data management and simple statistical analyses.

Other modules are SAS/Stat (statistical analyses), SAS/Access (data base applications), SAS/Graph, SAS/Assist (menu-driven info system), SAS/FSP (data entry and retrieval), SAS/Connect (remote submit), etc.

Page 12: How to Navigate the Guide

12

Introduction (cont.) SAS at Biostat Dept.We primarily use SAS on a Unix server

whereas these notes assume that the programs are run locally on a PC

The basic programming is the same regardless of what platform you use. This is one of the big advantages of SAS.

We do tend to prefer running SAS non-interactively though.

Page 13: How to Navigate the Guide

13

The SAS Environment

WindowsThe main feature of SAS is its division of the main window

into two halves. The left part is a navigator of SAS libraries and Results (from the Output window).

The right part is divided into three separate windows:

- Program window or Enhanced Editor

- Log window

- Output window

Page 14: How to Navigate the Guide

14

The SAS Environment (cont.)

WindowsThe log and output windows are always opened by default

when you start SAS (although they may be hidden behind each other).

The program window and the Enhanced Editor are two different windows but they are used for the same purpose, i.e. writing code and executing it. One of them will open by default.

Other windows are also available and are opened on request

(use View), for instance the Graphics window.

Page 15: How to Navigate the Guide

15

The SAS Environment (cont.)

Windows(The program window is a reminiscent of the older SAS

version 6. The Enhanced Editor is a new feature of version 8, and is more user-friendly, since it colours the code and works more like an ordinary text editor.)

Page 16: How to Navigate the Guide

16

The SAS Environment (cont.)

WindowsTo check which windows are opened, choose Window in the

menu. At the bottom there is a list of opened windows.

The active window is indicated by a . A star * after the window name indicates that the file has not been saved since its latest alteration.

If you are missing any of the windows (Enhanced Editor, Log, Output), you can open it by choosing in the menu

View + window-name

Page 17: How to Navigate the Guide

17

The SAS Environment (cont.)

WindowsYou switch between the windows by choosing

Window + ENHANCED EDITOR

Window + OUTPUT

Window + LOG

in the menu.

Page 18: How to Navigate the Guide

18

The SAS Environment (cont.)

WindowsThe window location on the screen can be changed by

choosing

Window + TileWindow + Cascade

or by pulling the lower right corner of the window with the mouse.

When you exit SAS, the window setting will be kept for the next session (unless someone else ...).

Page 19: How to Navigate the Guide

19

The SAS Environment (cont.)

Enhanced Editor / Program WindowIn the Enhanced Editor you write the SAS programs.

The programs tell SAS to produce the data sets, tables, statistics, etc.

A program consists of data steps and proc steps.

A SAS program is executed (submitted) by choosing Run + Submit in the menu (or by clicking on the “Running Man” icon, fourth from the right in the menu).

Page 20: How to Navigate the Guide

20

The SAS Environment (cont.)

Output and Log WindowsThe result of a program execution is printed to the Output

window. There you will find the prints, tables and reports, etc.

A log file is printed to the Log window.

The log file contains information about the execution, whether it was successful or not. It usually points out your mistakes with warning and error messages so that you can correct them.

Page 21: How to Navigate the Guide

21

The SAS Environment (cont.) Example: SAS Log65 proc gplot data=work.influnce;66 plot di*pred / vaxis=axis1 haxis=axis1;ERROR: Variable DI not found.NOTE: The previous statement has been deleted.67 run;

Make a habit of checking the Log window after every execution.

Even if SAS has accepted and executed the program, you may have made a methodological error. Check the note on how many observations were read, and if there were any missing values.

Page 22: How to Navigate the Guide

22

The SAS Environment (cont.) Example: SAS Output

patientens alder

Cumulative ALDER Frequency Frequency __________________________________

0 - 24 41 41 25 - 44 176 217 45 - 64 77 294 65- 25 319

Page 23: How to Navigate the Guide

23

The SAS Environment (cont.)

File TypesThese files are created by SAS:

- .sas file (SAS program)

- .log file (Log)

- .lst file (Output)

The SAS data sets are saved as .sd7 or .sas7bdat files.

(Other file types, e.g. catalogs, are also used and created by SAS, but we will not pursue this any further.)

Page 24: How to Navigate the Guide

24

The SAS Environment (cont.)

Using the SAS SystemYou work with SAS using

- Menus and Toolbar

- Command Line

- Key Functions F1-F12

Page 25: How to Navigate the Guide

25

The SAS Environment (cont.)

ExampleThree different ways to Open a File in the Enhanced Editor:

1. Menus: choose File + Open

2. Toolbar: press the icon for “Open”

3. Command line: write include ‘N:\temp\bp.sas’

and press Enter.

Page 26: How to Navigate the Guide

26

The SAS Environment (cont.) Commands and Keys Command Key

Open File include ’P:\catalogue\filename.sas’ Ctrl-O

Save File file ’P:\catalogue\filename.sas’ Ctrl-S

Clear Window clear F12

Zoom On/Off Window zoom F11

Recall Submitted Programs recall F4

Go To Enhanced Editor wpgm F5

Go To Log Window log F6

Go To Output Window output F7

Submit a Program submit F8

Change Key Definitions keys F9

Page 27: How to Navigate the Guide

27

The SAS Environment (cont.)

Write and ReadIn the Enhanced Editor you can

- create new, or edit existing, programs

- submit programs

- save programs (an unsaved file is marked with * after the file name)

You can NOT edit the log file or the output file in their windows. They are only readable. If you wish to edit these files, save them and use the Enhanced Editor or Word.

Page 28: How to Navigate the Guide

28

SAS syntax

StatementsThe SAS code (syntax) consists of statements (sætninger).

Statements mostly begin with a keyword (nøgleord), and they ALWAYS end with a SEMICOLON.

data work.cohort;

set course.males98;

run;

proc print data=work.cohort;

run;

Examples of keywords: data, set, run, proc.

Page 29: How to Navigate the Guide

29

SAS syntax (cont.) StatementsSAS statements can begin and end anywhere on a line.

data work.cohort;

One or several blanks can be used between words.

data work.cohort;

One or several semicolons can be used between statements.

data work.cohort;;;

;

Page 30: How to Navigate the Guide

30

SAS syntax (cont.) StatementsThe statement can begin and end on different lines.

data

work.cohort;

SAS will not object to several statements on the same line. However, it is not considered good programming to have more than one statement per line. It makes the code difficult to read. Avoid this!

data work.cohort; set course.males98; run;

Page 31: How to Navigate the Guide

31

SAS syntax (cont.)

Indenting to improve readabilityImprove the readability of your program by adding more

space to the code (= indenting).

Begin data steps and proc steps in the first position, as far left as possible. The ending run statement should also be in the first position.

All statements in between should start a few blanks in from the left margin.

This creates blocks of data steps and proc steps, and you can easily see where one ends and another begins.

Page 32: How to Navigate the Guide

32

SAS syntax (cont.)

Example of Indenting

data work.height;

infile 'h:\mep\rawdata_height.txt';

input name $ 1-20

kon 21

alder 22-23

height 24-30;

if kon=0 and (height ne .) then

do;

if 0<height<81.75 then lnapprx=50;

else

if 81.75<=height then lnapprx=100;

end;

else lnapprx=.;

run;

Page 33: How to Navigate the Guide

33

SAS syntax (cont.)

IndentingWithin statements it is also VERY useful to use indenting.

Put similar syntactic words in the same position below each other.

Use blank lines a lot!

Markers of blocks should be placed in the same position below one another (e.g. data-run, proc-run, if-else, do-end).

Page 34: How to Navigate the Guide

34

SAS Data Sets

What is a SAS Data Set?A SAS data set is a special file type (.sas7bdat) which

consists of a descriptive part and a data part.

The DESCRIPTIVE part includes

- general information, such as data set name, date of creation, number of observations and variables etc.

- variable information, such as variable name, type (character or numeric), format, length, label etc.

Page 35: How to Navigate the Guide

35

SAS Data Sets (cont.)

The DataThe DATA part is the data values.

Data is organised with observations in the rows and variables in the columns.

Page 36: How to Navigate the Guide

36

SAS Data Sets (cont.)

Descriptive PartProc CONTENTS prints the descriptive part of a data set. The CONTENTS Procedure

Data Set Name: PPT_EX8.MAIN Observations: 64 Member Type: DATA Variables: 9 Engine: V8 Indexes: 0 Created: 17:17 Tuesday, August 7, 2001 Observation Length: 72 Last Modified: 17:17 Tuesday, August 7, 2001 Deleted Observations: 0 Protection: Compressed: NO Data Set Type: Sorted: NO Label: -----Alphabetic List of Variables and Attributes----- # Variable Type Len Pos Format Informat ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 2 BIRTHYR Num 8 0 BEST8. F8. 5 CASE_1 Num 8 24 1 ID Char 8 56 $8. $8. 4 LENGTH Num 8 16 BEST8. F8. 3 WEIGHT Num 8 8 BEST8. F8. 6 age Num 8 32 4. 8 bmi Num 8 48 9 generatn Char 5 64 7 height Num 8 40

Page 37: How to Navigate the Guide

37

SAS Data Sets (cont.)

Data PartProc PRINT prints the data part of a data set.

OBS ID BIRTHYR WEIGHT HEIGHT AGE BMI

1 001 1954 62 1.65 43 22.7732 2 002 1956 68 1.67 41 24.3824 3 003 1956 65 1.72 41 21.9713 4 004 1962 56 1.68 35 19.8413 5 005 1954 58 1.59 43 22.9421 6 006 1953 52 1.62 44 19.8141 7 007 1955 69 1.75 42 22.5306 8 008 1955 75 1.73 42 25.0593 9 009 1960 82 1.7 37 28.3737 10 010 1962 68 1.72 35 22.9854 11 011 1961 65 1.68 36 23.0300 12 012 1954 62 1.69 43 21.7079 13 013 1956 58 1.68 41 20.5499 14 014 1962 61 1.64 35 22.6800 15 015 1958 58 1.63 39 21.8300 16 016 1959 62 1.65 38 22.7732 17 017 1962 59 1.64 35 21.9363 18 018 1957 73 1.8 40 22.5309

Page 38: How to Navigate the Guide

38

SAS Data Sets (cont.)

Create a Data SetA SAS data set is created from

- SAS data set (.sas7bdat file)

- raw data file (.txt file)

- another external file through importing (EXCEL file, etc.)

or by

- manually entering the data

Page 39: How to Navigate the Guide

39

SAS Data Sets (cont.)

Create a Data SetTo use an existing data set, a .sas7bdat file, is the most

common way to create a SAS data set.

How to create a SAS data set from a raw data file is described in chapter Read Raw Data Into SAS.

Importing non-SAS data is not trivial. Use File + Import Data. Ask for help if you run into trouble.

- or use the program STAT-Transfer

Page 40: How to Navigate the Guide

40

SAS Data Sets (cont.)

Create a Data SetThe easiest way to manually enter data into SAS is via the

Viewtable facility (see later on in this chapter).

You can also use the CARDS or DATALINES statement (chapter Read Raw Data Into SAS).

Page 41: How to Navigate the Guide

41

SAS Data Sets (cont.)

Existing SAS Data Set (.sas7bdat)Create a SAS data set from an existing SAS data set:

data work.main;

set work.original;

statements;

run;

This will yield an exact copy of the old data set “original”. The name of the copy is “main”.

Usually we wish to change the new data set, by adding programming statements after the SET statement.

Page 42: How to Navigate the Guide

42

SAS Data Sets (cont.) Naming Data SetsPLEASE, use descriptive names for your data sets.

It is not considered clever to name your data sets: final1, final2, final3, etc.

Other names to avoid are: new, old, mydata, analys, your-name, etc.

More on this topic in the chapter Naming Data Sets and Variables.

Page 43: How to Navigate the Guide

43

SAS Data Sets (cont.) ViewtableThe Viewtable facility is a user-friendly tool to look at your

data set without using data steps or proc steps.

You enter the Viewtable window by issuing the “viewtable” command in the Command line.

This will yield a window very similar to EXCEL, with cells, rows and columns.

Page 44: How to Navigate the Guide

44

SAS Data Sets (cont.)

ViewtableIt is very easy to create a data set in the Viewtable window.

Just enter the data manually into the cells. The variable names are created by clicking on the column header and following the instructions.

When you click to save the data set, it is saved into a .sas7bdat file, which may then be used in any data step or proc steps in the Enhanced Editor.

Page 45: How to Navigate the Guide

45

SAS Data Sets (cont.)

ViewtableIf you wish to open an existing data set into the Viewtable

window, just issue the command

“viewtable name-of-data- set”

and it will open.

You can also open a data set from the Explorer window in the window area to the left. Just navigate to the right library and double click on the data set icon.

Page 46: How to Navigate the Guide

46

SAS Data Sets (cont.)

VariablesThere are two types of variables in SAS: character (char) and

numerical (num).

The type refers to the values the variable have.

Examples of a variable called MONTH:

A character variable: MONTH with values ‘Jan’, ‘Feb’, …, ‘Dec’.

A numerical variable: MONTH with values 1, 2, … , 12.

Page 47: How to Navigate the Guide

47

SAS Data Sets (cont.)

VariablesThe values of a character variable are between quotes ‘’.

When the value is printed, all characters within the quotes are printed.

Typical character values are letters, while numerical values always are digits.

Character variables may include digits as well.

Page 48: How to Navigate the Guide

48

SAS Data Sets (cont.)

VariablesMissing values of a character variable are represented by a

blank, while a period “.” (punktum) denotes missing values of a numeric variable.

Character values can be 32767 characters long at most.

(200 characters in version 6)

Good rule: Never use char variables to store numeric values. For example, always store Patient Number as a numeric.

Page 49: How to Navigate the Guide

49

SAS Data Sets (cont.) Naming VariablesVariable names (e.g. age, bmi) and data set names (e.g. main,

original)

- can be 32 characters (letters, underscores and digits) long at most

- can be uppercase or lowercase or mixed (mAiN = MaIn)

- must start with a letter (A-Z) or an underscore (_), not a digit

Page 50: How to Navigate the Guide

50

SAS Data Libraries

What is a SAS Data Library?A SAS data library is the catalogue where your data sets are

stored.

A data library is like a drawer in a filing cabinet. The cabinet may have several drawers representing several different libraries.

A data set is a file within a drawer. A drawer may contain several files.

Page 51: How to Navigate the Guide

51

SAS Data Libraries (cont.) Figure: SAS Data Libraries

Page 52: How to Navigate the Guide

52

SAS Data Libraries (cont.) WORK and SASUSERTwo data libraries are created automatically by SAS:

WORK and SASUSER.

The WORK library is a temporary library. All its contents are deleted when you exit SAS. If you wish to keep your data sets, do not put them in the WORK library.

The SASUSER library is a permanent library. All its contents are kept when you exit SAS.

Page 53: How to Navigate the Guide

53

SAS Data Libraries (cont.)

WORK and SASUSERThe physical location of the permanent SASUSER library is

under ‘C:\’.

It is not especially clever to save all the SAS programs and data sets in the same folder.

Generally we wish to store them in separate folders for separate projects or papers.

Therefore, it is possible to create your own permanent libraries.

Page 54: How to Navigate the Guide

54

SAS Data Libraries (cont.)

Libraries and FoldersA library is a physical folder anywhere on your hard disk or

server disk, or even floppy disk.

The physical folder for the SASUSER library on Anna’s computer is ‘C:\Winnt\Profiles\annaj\…\V8’.

Similarly, if you wish to store data sets in a particular folder, you can create a library by using a libname statement.

Page 55: How to Navigate the Guide

55

SAS Data Libraries (cont.)

LibnamesSAS data sets have names of the form

work.original

WORK is the library where the data set is stored.

ORIGINAL is the data file (.sas7bdat) in that library.

The two components of the name are separated by a period (punktum).

Page 56: How to Navigate the Guide

56

SAS Data Libraries (cont.)

LibnamesMore formalised, SAS data sets have names of the form

libref.filename

The libref (library reference) is the name of the library.

The filename is the name of data file (.sas7bdat).

Page 57: How to Navigate the Guide

57

SAS Data Libraries (cont.)

The WORK LibraryFor the WORK library you can omit the libref WORK, and

simply write “original” as the data set name.

If no other libname is used, all data sets and formats (see chapter Formats) created during a SAS session are saved to the WORK library.

Be aware that since WORK is a temporary library, all its contents are deleted when you exit SAS.

Page 58: How to Navigate the Guide

58

SAS Data Libraries (cont.)

Libname StatementA libref is defined by a libname statement, which links the

libref to the physical location of a folder.

Libname statements are of the form

libname libref ‘location-of-folder ’;

Libname statements are written outside data steps and proc steps, generally at the top of a program. Once it has been submitted the libref will remain defined until you exit SAS.

Page 59: How to Navigate the Guide

59

SAS Data Libraries (cont.)

Libname Statement

Example of SAS program

libname sas engine ‘.../phd/artikel1/sasdata’;

data sammenlign;

set sas.glostrup;

proc ...

Page 60: How to Navigate the Guide

60

SAS Data Libraries (cont.)

LibrefsThe LIBREF is just a link.

It makes the data set easier to access since you do not need to specify the complete location (P:\catalogue\sub-catalogue(s)...\filename.sas) in the data and proc steps.

When a LIBREF is deleted, i.e. the SAS session ends, the folder it refers to still exists.

Page 61: How to Navigate the Guide

61

Submitting a SAS Program

Submit a ProgramTo submit (= execute) a SAS program:

- Menus: choose Run + Submit, or

- Toolbar: press the icon with “the running man”, or

- Command line: issue “submit” command.

To halt a submitted program press CTRL + BREAK. You may have to press it a few times.

Page 62: How to Navigate the Guide

62

Submitting a SAS Program (cont.)

Submit a ProgramWhen a program is submitted each step (data or proc) is

executed one at a time.

The contents of a data step is performed on each observation one at a time (i.e. creation of new variables etc.).

Each step generates log to the Log window. Output (if any)

is generated to the Output window.

Page 63: How to Navigate the Guide

63

Submitting a SAS Program (cont.)

Check the Log

ALWAYS browse the Log window after a submission!

If the submission is stopped by errors the data set is unchanged and you might do incorrect analyses on an old data set. You will only see it has been stopped by looking at the log.

(We emphasise this, since we are too well experienced with the consequences of the opposite.)

Page 64: How to Navigate the Guide

64

Submitting a SAS Program (cont.)

Error Messages in the Log

Note (blue): information which do not indicate errors. They are usually informative to rule out any methodological errors in you programming.

Warning (green): points out errors which SAS could correct itself. The execution was performed with these changes. Still you should check whether it was done properly. Example: misspelled keywords.

Page 65: How to Navigate the Guide

65

Submitting a SAS Program (cont.)

Error Messages in the Log

Error (red): serious errors which SAS could not handle. The execution was stopped. These errors must be corrected by you. Example: forgotten semicolons, invalid options, misspelled variable names.

Especially if you are updating data sets, be aware that red errors mean NO updating!

Page 66: How to Navigate the Guide

66

Submitting a SAS Program (cont.)

Unbalanced QuotesA special type of syntactic error is unbalanced quotes (‘’).

Quotes must come in pairs. If they do not, the execution will keep on running forever. You halt it by submitting

’;

Page 67: How to Navigate the Guide

67

Submitting a SAS Program (cont.)

Enhanced EditorWhen you press submit, all the code in the Enhanced Editor

is executed.

If you only wish to submit a limited number of rows of the program code, mark it and press submit.

Page 68: How to Navigate the Guide

68

The Data Step

Data StatementData sets are created through a data step. The data step begins

with a DATA statement.

General form of the DATA statement:

data SAS-data-set;

The SAS-data-set is a name of the form

libref.filename

Page 69: How to Navigate the Guide

69

The Data Step (cont.)

Data StatementTo create a data set ORIGINAL in the temporary library

WORK:

data work.original;

When using the temporary WORK library it is possible to skip the work prefix and just write:

data original;

Page 70: How to Navigate the Guide

70

The Data Step (cont.)

Create VariablesA new variable is created by:

variable=expression ;

The expression may consist of numbers, other variables and operators such as

+ addition

- subtraction

* multiplication

/ division

** exponentiation (potensopløftning)

Page 71: How to Navigate the Guide

71

The Data Step (cont.)

Examples

data work.main;

set work.original;

age=1997-birthyr;

height=height/100;

bmi=weight/(height*height);

run;

Page 72: How to Navigate the Guide

72

The Data Step (cont.)

FunctionsUseful are the predefined functions in SAS, such as

exp(argument) exponential function

log(argument) natural logarithm

int(argument) the integer part of a numeric argument

There are also non-mathematical SAS specific functions. A list of useful functions may be found in the SAS manual SAS Language (pp 521-616).

Page 73: How to Navigate the Guide

73

The Data Step (cont.)

Examples

data work.main;

set work.original;

highest=max(height1, height2, height3);

birthyr=year(brthdate);

total=sum(x1,x2,x3,x4,x5);

run;

Page 74: How to Navigate the Guide

74

The Data Step (cont.)

Variable NamesIf you are creating a series of variables, such as repeated

measurements, put the order number at the end of the name, e.g. x1, x2, x3, since

total=sum(x1,x2,x3,x4,x5) total=sum(of x1-x5)

The use of the notation x1-x5 is widely accepted in many expressions and procedures. It will not work if the order number is in the middle of the name.

Also see the chapter Naming Data Sets and Variables.

Page 75: How to Navigate the Guide

75

The Proc Steps

ProceduresA procedure (proc) is a predefined function that operate on

data sets. By specifying the predefined statements in the procedure you can adapt it to your needs and wishes.

Examples of procedures:

proc contents prints the descriptive part of a data set

proc print prints the data part of a data set

proc freq creates frequency tables, etc.

proc means calculates means and other statistics

Page 76: How to Navigate the Guide

76

The Proc Steps (cont.)

Data Step Vs. Proc StepUsually, for beginners as well as among advanced users, the

data step is more comprehensible as a concept. Not seldom is extensive programming done in the data step, when the same result easily could have been obtained through a simple option in a procedure.

As a rule, operations on observations (within rows) are done in the data step, e.g. adding two variables together to make a third.

Operations on variables (within columns) are done in a proc step, e.g. taking the mean of a variable.

Page 77: How to Navigate the Guide

77

The Proc Steps (cont.) Proc CONTENTSThe CONTENTS procedure prints out the descriptive part of

a data set.

The descriptive part includes:

- General information: data set name, number of observations, number of variables, etc.

- Variable information: variable name, type, length, position, format, label, etc.

Page 78: How to Navigate the Guide

78

The Proc Steps (cont.) Proc CONTENTSThe general form of the CONTENTS procedure:

proc contents data=SAS-data-set;

run;

Example:

libname course ‘h:\SasAtMEP\Course’;

proc contents data=course.main;

run;

Page 79: How to Navigate the Guide

79

The Proc Steps (cont.) Proc PRINTThe PRINT procedure prints out the data part of a data set.

It is possible to choose which variables to print. If none are chosen, all variables will be printed.

The first column is the OBS column, which indicates observation. If there are more variables than will fit into the output window, the output is split and the exceeding variables printed on the following page. The OBS column is reprinted to indicate observation.

Page 80: How to Navigate the Guide

80

The Proc Steps (cont.) Proc PRINTThe general form of the PRINT procedure:

proc print data=SAS-data-set;

run;

OR to print a specified list of variables from the data set

proc print data=SAS-data-set;

var variable1 variable2 variable3 ... ;

run;

Page 81: How to Navigate the Guide

81

The Proc Steps (cont.) Proc PRINTExamples:

proc print data=course.main;

run;

proc print data=course.main;

var age bmi;

run;

Page 82: How to Navigate the Guide

82

The Proc Steps (cont.) Proc FREQThe FREQ procedure is mainly used to create frequency

tables, although it has a wide range of statistical features as well.

It creates both one-way and multiple-way tables.

(FREQ is pronounced “frek” in Danish and “freek” in English.)

Page 83: How to Navigate the Guide

83

The Proc Steps (cont.) Proc FREQThe general form of FREQ procedure:

proc freq data=SAS-data-set;

tables var1;

run;

OR for a two-way table

proc freq data=SAS-data-set;

tables var1 * var2 / nopercent norow nocol;

run;

Page 84: How to Navigate the Guide

84

The Proc Steps (cont.) Proc FREQExample one-way table:

proc freq data=course.main;

tables age;

run; The SAS System

Cumulative Cumulative

AGE Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

35 4 22.2 4 22.2

36 1 5.6 5 27.8

37 1 5.6 6 33.3

… 43 3 16.7 17 94.4

44 1 5.6 18 100.0

Page 85: How to Navigate the Guide

85

The Proc Steps (cont.) Proc FREQExample two-way table:

proc freq data=course.main;

tables case*age/nopercent nocol norow;

run;

The nopercent option suppresses the printing of cell percentages.

Nocol and norow suppress column and row cell percentages respectively.

Page 86: How to Navigate the Guide

86

The Proc Steps (cont.) Proc FREQ

The SAS System

TABLE OF AGE BY CASE_1

AGE CASE_1

Frequency‚ 0‚ 1‚ Total ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ 35 ‚ 3 ‚ 1 ‚ 4 ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ 36 ‚ 1 ‚ 0 ‚ 1 ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ 37 ‚ 0 ‚ 1 ‚ 1 ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ ... ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ 44 ‚ 0 ‚ 1 ‚ 1 ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ Total 9 9 18

Page 87: How to Navigate the Guide

87

The Proc Steps (cont.) Proc SORTThe SORT procedure sorts the data set according to a chosen

variable. The sorted data set replaces the unsorted data set, unless you define an OUT data set.

The SORT procedure can sort by several variables, in ascending (default) or descending (option) order.

Missing values are defined as minus infinity, i.e. less than all other numeric values.

Page 88: How to Navigate the Guide

88

The Proc Steps (cont.) Proc SORTThe general form of the SORT procedure:

proc sort data=SAS-data-set out=SAS-data-set;

by variables;

run;

OR if you want descending order

proc sort data=SAS-data-set out=SAS-data-set;

by descending variables;

run;

Page 89: How to Navigate the Guide

89

The Proc Steps (cont.) Proc SORTExample:

proc sort data=course.main out=course.sortage;

by case_1 age;

run;

This will yield a data set called course.sortage, where the observations are sorted by case_1 and within each category of case_1 by age.

Page 90: How to Navigate the Guide

90

The Proc Steps (cont.) Proc SORTThere is no result in the Output window from proc SORT, but a

proc PRINT of data set course.sortage gives:

OBS ID BIRTHYR WEIGHT HEIGHT AGE BMI CASE_1

1 010 1962 68 1.72 35 22.9854 0 2 014 1962 61 1.64 35 22.6800 0 3 017 1962 59 1.64 35 21.9363 0 4 011 1961 65 1.68 36 23.0300 0... 9 012 1954 62 1.69 43 21.7079 0 10 004 1962 56 1.68 35 19.8413 1 11 009 1960 82 1.7 37 28.3737 1 12 002 1956 68 1.67 41 24.3824 1 13 003 1956 65 1.72 41 21.9713 1 14 007 1955 69 1.75 42 22.5306 1... 17 005 1954 58 1.59 43 22.9421 1 18 006 1953 52 1.62 44 19.8141 1

Page 91: How to Navigate the Guide

91

The Proc Steps (cont.) Proc MEANSThe MEANS procedure calculates basic statistics. By default

the statistics are

N = number of non-missing observations

Mean = mean value, average

Std Dev = standard deviation

Minimum = minimum value

Maximum = maximum value

Optional statistics include Nmiss (number of missing observations), range (maximum-minimum), etc.

Page 92: How to Navigate the Guide

92

The Proc Steps (cont.) Proc MEANSThe general form of MEANS procedure:

proc means data=SAS-data-set;

run;

This will yield summary statistics on all variables in the data set.

Missing values are excluded from the analysis.

Page 93: How to Navigate the Guide

93

The Proc Steps (cont.) Proc MEANSExample:

proc means data=course.main;

run;

The MEANS Procedure Variable N Mean Std Dev Minimum Maximum ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ BIRTHYR 62 1959.35 3.3294354 1953.00 1967.00 WEIGHT 63 61.7460317 6.5129356 47.0000000 80.0000000 LENGTH 64 1.6675000 0.0617213 1.4800000 1.8000000 CASE_1 64 0.4687500 0.5029674 0 1.0000000 age 62 37.6451613 3.3294354 30.0000000 44.0000000 height 64 1.6675000 0.0617213 1.4800000 1.8000000 bmi 63 22.1982718 1.9262282 17.9591837 29.3847567 ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƑƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Naturally, the character variable ID is not displayed.

Page 94: How to Navigate the Guide

94

The Proc Steps (cont.) Proc MEANSYou can modify the proc MEANS code to suit your wishes:

- To specify the number of decimals used in the printout add the option MAXDEC.

- If you are only interested in a selection of variables, use a VAR statement.

Page 95: How to Navigate the Guide

95

The Proc Steps (cont.) Proc MEANSExample:

proc means data=course.main maxdec=2;

var age bmi;

run;

Variable N Mean Std Dev Minimum Maximumƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

AGE 18 39.44 3.24 35.00 44.00BMI 18 22.65 1.95 19.81 28.37ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Page 96: How to Navigate the Guide

96

The Proc Steps (cont.) Proc MEANSIn proc MEANS it is possible to calculate statistics on

subgroups of the data set, e.g. the mean bmi and age for cases and controls separately.

There are two different ways to deal with subgroup statistics, depending on what output you are interested in:

- BY statement

- CLASS statement

Page 97: How to Navigate the Guide

97

The Proc Steps (cont.) Proc MEANSThe BY statement:

proc means data=course.main maxdec=2;

var age bmi;

by case_1;

run;

The BY statement requires that the data set has previously been sorted according to the BY variable.

Page 98: How to Navigate the Guide

98

The Proc Steps (cont.) Proc MEANSResult from BY statement:

CASE_1=0 The MEANS Procedure

Variable N Mean Std Dev Minimum Maximumƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒage 32 37.50 3.08 33.00 44.00bmi 34 22.09 1.95 17.96 29.38ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

CASE_1=1

Variable N Mean Std Dev Minimum Maximumƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒage 30 37.80 3.62 30.00 44.00bmi 29 22.33 1.92 19.61 27.34ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Page 99: How to Navigate the Guide

99

The Proc Steps (cont.) Proc MEANSThe CLASS statement:

proc means data=course.main maxdec=2;

var age bmi;

class case_1;

run;

The CLASS statement does NOT require any sorting.

Page 100: How to Navigate the Guide

100

The Proc Steps (cont.) Proc MEANSResult from CLASS statement:

The MEANS Procedure

N CASE_1 Obs Variable N Mean Std Dev Minimum Maximum ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 0 34 age 32 37.50 3.08 33.00 44.00 bmi 34 22.09 1.95 17.96 29.38 1 30 age 30 37.80 3.62 30.00 44.00 bmi 29 22.33 1.92 19.61 27.34 ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Page 101: How to Navigate the Guide

101

The Online HELP

Quick Help on SyntaxSAS has a very good ONLINE HELP. In this help you can get

full information on syntax. For more theoretical issues you should use the paperback manuals or the Online Documentation (see later on how to use them).

The online help is accessed through the Command line and command “help”.

help print

help means

(Do not write “proc” before the process name.)

Page 102: How to Navigate the Guide

102

The Online HELP (cont.)

Using the Online HelpWhen a help command is issued the HELP Window will open

with topics:

Introduction

Syntax

Additional Topics (occasionally)

To get full information on how to write the code, choose SYNTAX.

Page 103: How to Navigate the Guide

103

The Online HELP (cont.)

ExampleAs an example, access online help for proc MEANS.

“help means” + choose SYNTAX

These are all the possible statements that proc MEANS accept. If you want to know more about a specific statement just click on it and read:

Page 104: How to Navigate the Guide

104

The Online HELP (cont.)

ExamplePROC MEANS: Syntax

PROC MEANS <option(s)> <statistic-keyword(s)>; BY <DESCENDING> variable-1 <... <DESCENDING> variable-n><NOTSORTED>;

CLASS variable(s) </ option(s)>; FREQ variable; ID variable(s); OUTPUT <OUT=SAS-data-set> <output-statistic-specification(s)> <id-group-specification(s)> <maximum-id-specification(s)> <minimum-id-specification(s)> </ option(s)> ; TYPES request(s); VAR variable(s) < / WEIGHT=weight-variable>; WAYS list; WEIGHT variable;

Page 105: How to Navigate the Guide

105

The Online HELP (cont.)

Explanation to the Online Help Text- underlined word = keyword referring to a statement

(statements within a procedure are optional, the PROC and the RUN statements are required)

- black word = required if the corresponding keyword is used

- words within “< >” = optional, not required

- words separated by “|” = possible choices of values for a specific option

Page 106: How to Navigate the Guide

106

The Online HELP (cont.)

ExampleIf you click on the “PROC MEANS”, a list of possible

options will be displayed.

Among them is the MAXDEC= option which we have already used. The equal sign is required. Next to MAXDEC= is the black word “number”. If you use the MAXDEC option you are required to fill in a number corresponding to the maximum number of decimals to be displayed.

(The exact conventions depend on which version of the help you use…. –pd)

Page 107: How to Navigate the Guide

107

Labels

What are Labels?Each variable has a variable name (e.g. birthyr) and a LABEL

(e.g. Year of Birth). The label is how the variable is written on the output. By default the label = variable name unless you specify it.

To define and assign a label, use the LABEL statement.

label variable1 = ‘label-name1’

variable2 = ‘label-name2’

...

;

Page 108: How to Navigate the Guide

108

Labels (cont.)

What are Labels?Labels can be 256 characters long at most.

The output from proc CONTENTS include a column with labels for all the variables in the data set.

To delete a label simply define the label equal to space:

label variable1 = ‘ ‘;

Page 109: How to Navigate the Guide

109

Labels (cont.)

Permanent or Temporary LabelsLabels can be assigned inside a data step or a proc step.

Labels assigned in a data step are permanent. They are also transferred to new data sets.

Labels assigned in a proc step are temporary. A temporary label replaces a permanent label throughout the execution of the procedure step.

Most common are permanent labels defined in the data step.

Page 110: How to Navigate the Guide

110

Labels (cont.)

ExampleAssigning permanent labels in a data step:

data course.main; set course.original;

age=1997-birthyr; height=height/100; bmi=weight/(height*height); label birthyr=‘Year of Birth’ age=‘Alder’ height=‘Højde’ bmi=‘BMI’;run;

Page 111: How to Navigate the Guide

111

Labels (cont.)

ExampleWith label:

Year ofOBS Birth

1 1954 2 1956 3 1956 4 1962 5 1954 6 1953 7 1955 ... 18 1957

Without label:

OBS BIRTHYR

1 1954 2 1956 3 1956 4 1962 5 1954 6 1953 7 1955 ... 18 1957

Page 112: How to Navigate the Guide

112

Formats

What are Formats?Formats are used on variable values to

- display the values differently from the raw values (e.g. with fewer decimals, or as dates)

- group the values (values 0-25=low, values 26-100=high)

There are predefined formats in SAS which you may use, but you can also create your own formats. The procedures are designed to handle formats and use them accordingly.

Page 113: How to Navigate the Guide

113

Formats (cont.)

Assign FormatsTo assign formats you use the FORMAT statement inside a

data step (permanently) or a proc step (temporarily).

The general form of the FORMAT statement is

format variable1 format1.;

The following yields a value with two digits, a decimal point and two decimals (5 positions, of which two are decimals):

format bmi 5.2;

Page 114: How to Navigate the Guide

114

Formats (cont.)

Example Permanent Assignment

data course.main;

set course.original;

age=1997-birthyr;

height=height/100;

bmi=weight/(height*height);

format age 4.0

bmi 4.2

birthyr best4.;

run;

Page 115: How to Navigate the Guide

115

Formats (cont.)

Example Temporary Assignment

proc print data=course.main;

var birthyr age bmi;

format age 4.0

bmi 4.2

birthyr best5.;

run;

Usually, the format statement is at the end of the data or proc step together with the label statement.

Page 116: How to Navigate the Guide

116

Formats (cont.)

Predefined SAS FormatsFormats are all of the form (where < > indicates optional and

is not to be typed in):

format-name<w>.<d>

w indicates maximum number of positions used to display the value

d indicates optional number of decimals in a numeric format

Page 117: How to Navigate the Guide

117

Formats (cont.)

Predefined SAS Formats

Formats for character variables need a $ sign in the first position:

$format-name<w>.<d>

All formats, numeric or character, MUST contain a period (. punktum), either at the end or before the d value.

See examples.

Page 118: How to Navigate the Guide

118

Formats (cont.)

Predefined SAS Formatsw.d = numeric values at most w positions long, and d of

these positions are decimals

$w. = character values w positions long

COMMAw.d = numeric values with commas and decimal points: 12,345.67

BESTw. = chooses the best notation with w positions for numeric values

The period (.) occupies one position in all of these formats.

Page 119: How to Navigate the Guide

119

Formats (cont.)

ExampleStored Value Format Displayed Value

12345.9876 10.4 12345.9876

12345.9876 10.2 12345.99

12345.9876 8.2 12345.99

12345.9876 COMMA11.4 12,345.9876

12345.9876 BEST8. 12345.99

12345.9876 BEST4. 12E3 (E3=103)

Page 120: How to Navigate the Guide

120

Formats (cont.)

User-defined Formats

There are situations when the predefined formats do not suffice.

An example, you wish to group the BMI values into three categories; underweight, normal weight, overweight.

There is no predefined format to meet your demands in this situation. The solution is to create your own format.

Page 121: How to Navigate the Guide

121

Formats (cont.)

User-defined Formats

To use your own formats you must

- define the format

- assign the format

Several variables may be assigned to the same format

and

A variable may be assigned to different formats in different procedures

Page 122: How to Navigate the Guide

122

Formats (cont.)

Proc FORMAT defines formatsFormats are defined through the FORMAT procedure.

proc format;

value format-name range1 = ‘label’

range2 = ‘label’

... ;

run;

The labels must be inside quotes (‘’).

Page 123: How to Navigate the Guide

123

Formats (cont.)

User-defined Formats

Format names are like any other SAS names, however they must not end in a number.

A format for a character variable must have a dollar sign $ as its first character.

Format names do NOT end with a period (.) in proc FORMAT. The period is only used when assigning the format in a data or proc step.

Page 124: How to Navigate the Guide

124

Formats (cont.)

Example: User-defined Formats

A case/control format (case_1f) and a BMI format (bmif).

proc format;

value case_1f 0=‘Case’

1=‘Control’

other=‘Other’;

value bmif low-20.0=‘Underweight’

20.0-25.0=‘Normal weight’

25.0-high=‘Overweight’

other=‘Other’;

run;

Page 125: How to Navigate the Guide

125

Formats (cont.)

Example: User-defined Formats

Above, a value of 20.0000 would fall into Underweight, but 20.0001 would fall into Normal weight.

The first true range alternative is used for a value of a variable assigned by the format.

Page 126: How to Navigate the Guide

126

Formats (cont.)

Special Format Valuesother = all other values, including missing values

low = the lowest value (minimum) of the variable assigned to the format, including missing values. (For character formats low does not include missing values.)

high = the highest value (maximum) of the variable assigned to the format

Page 127: How to Navigate the Guide

127

Formats (cont.)

Assigning User-defined Formats

User-defined formats are assigned by a FORMAT statement, exactly as with the predefined formats.

proc freq data=course.main;

tables bmi;

format bmi bmif.;

run; Cumulative Cumulative BMI Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Underweight 2 11.1 2 11.1 Normal weight 14 77.8 16 88.9 Overweight 2 11.1 18 100.0

Page 128: How to Navigate the Guide

128

Formats (cont.)

Assigning User-defined Formatsproc means data=course.main maxdec=1;

class bmi;

var age;

format bmi bmif.;

run; The MEANS Procedure

Analysis Variable : age

N bmi Obs N Mean Std Dev Minimum Maximum ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Underweight 7 7 37.9 3.3 35.0 44.0

Normal weight 51 49 37.6 3.4 30.0 44.0

Overweight 5 5 38.0 3.3 35.0 42.0 ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Page 129: How to Navigate the Guide

129

Formats (cont.)

Assigning User-defined Formats

As shown above, user-defined formats are assigned in the exact same way as the SAS formats:

format variable1 format1.;

Page 130: How to Navigate the Guide

130

Titles and Footnotes

TitlesYou can add titles to the output with a TITLE statement. A

TITLE statement is one of the global statements which do not have to be included in a data step or a proc step (other global statements are the LIBNAME and OPTIONS statements) .

The form of the TITLE statement is

title ‘here-you-write-the-title’;

The title must be surrounded by quotes (’).

Page 131: How to Navigate the Guide

131

Titles and Footnotes (cont.)

Exampletitle ‘BMI Body Mass Index’;

proc freq data=course.main;

tables bmi;

format bmi bmif.;

run; BMI Body Mass Index

Cumulative Cumulative BMI Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Underweight 2 11.1 2 11.1 Normal weight 14 77.8 16 88.9 Overweight 2 11.1 18 100.0

Page 132: How to Navigate the Guide

132

Titles and Footnotes (cont.)

Delete TitlesA title will stay defined, and be printed to all output, until it is

changed, or deleted.

To delete a title simply write

title;

Page 133: How to Navigate the Guide

133

Titles and Footnotes (cont.)

Several TitlesIt is also possible to have second titles below the main title.

A maximum of 10 titles can be used simultaneously.

title1 ‘here-you-write-the-first-title’;

title2 ‘here-you-write-the-second-title’;

...

title10 ‘here-you-write-the-tenth-title’;

Page 134: How to Navigate the Guide

134

Titles and Footnotes (cont.)

Several TitlesThe unnumbered title statement, is equal to the title1

statement.

It is possible to have, for example, title2 undefined or deleted while title3 is defined. It will result in a gap between title1 and title3 on the printout representing title2.

However, when you delete say title3, all titles beneath it (title4-title10) will also be deleted.

title3 ;

Page 135: How to Navigate the Guide

135

Titles and Footnotes (cont.)

Example

title ‘BMI Body Mass Index’;

title2 ‘Women 35-45 yrs’;

proc freq data=course.main;

tables bmi;

format bmi bmif.;

run;

Page 136: How to Navigate the Guide

136

Titles and Footnotes (cont.)

Example

BMI Body Mass Index Women 35-45 yrs

Cumulative Cumulative BMI Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Underweight 2 11.1 2 11.1 Normal weight 14 77.8 16 88.9 Overweight 2 11.1 18 100.0

Page 137: How to Navigate the Guide

137

Titles and Footnotes (cont.)

Titles WindowA shortcut to defining titles is the Titles window.

Issue the “title” command in the Command line.

The Titles window will open, with all your current title definitions. From here the titles can be changed directly by editing.

The disadvantage of this shortcut is that you can NOT save the title definitions, as you could have if you had written them in code. When a program is rerun later after many title changes, the titles will not be as originally.

Page 138: How to Navigate the Guide

138

Titles and Footnotes (cont.)

FootnotesFootnotes work in the exact same way as titles. The only

difference is that footnotes are written at the bottom of the printout.

footnote ‘here-you-write-the-footnote’;

To delete a footnote write

footnote;

Page 139: How to Navigate the Guide

139

Titles and Footnotes (cont.)

Example

footnote ‘BMI Body Mass Index’;

footnote2 ‘Women 35-45 yrs’;

proc freq data=course.main;

tables bmi;

format bmi bmif.;

run;

Page 140: How to Navigate the Guide

140

Titles and Footnotes (cont.)

Example

Cumulative Cumulative BMI Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Underweight 2 11.1 2 11.1 Normal weight 14 77.8 16 88.9 Overweight 2 11.1 18 100.0

BMI Body Mass Index Women 35-45 yrs

Page 141: How to Navigate the Guide

141

Titles and Footnotes (cont.)

Footnotes WindowTo open the Footnotes window and edit footnotes directly,

issue the command “footnote” in the Command line.

________________________________________________

There are lots of additional features to titles and footnotes available, such as fonts, sizes and orientation, etc. See the manual SAS/GRAPH Software Vol. I.

Page 142: How to Navigate the Guide

142

Subsetting a Data Set

Subsets of DataOften one wants to use only a subset of a data set, e.g.

persons older than 60 years, women, cases etc.

This is particularly useful when performing data cleaning, and you only want to print the observations with extreme values of a variable, say blood pressure > 200.

Page 143: How to Navigate the Guide

143

Subsetting a Data Set (cont.)

WHERE optionIn procedures you use the WHERE data set option to subset

the data set.

proc print data=SAS-data-set(where=(expression));

run;

The WHERE data set option may be used in any procedure. It can also be used in data steps, although it is less usual.

data course.cases;

set course.main(where=(case_1=1));

run;

Page 144: How to Navigate the Guide

144

Subsetting a Data Set (cont.)

WHERE optionThe expression must be a logical one, resulting in true or

false.

Only observations for which the expression is true will be used in the proc step.

Examples of expressions are:

where=(birthyr gt 1950)

where=(1947<birthyr<1950)

where=(birthyr=1947)

Page 145: How to Navigate the Guide

145

Subsetting a Data Set (cont.)

Conditional OperatorsPossible conditional operators (use sign or abbreviation):

= eq equal to

^= ne not equal to

> gt greater than

< lt less than

>= ge greater than or equal to

<= le less than or equal to

Page 146: How to Navigate the Guide

146

Subsetting a Data Set (cont.)

Logical OperatorsYou can also combine several expressions by using logical

operators:

AND if all expressions are true then the

compound expression will be true,

else it is false

OR if any of the expressions are true, then the

compound expression will be true,

else it is false

Page 147: How to Navigate the Guide

147

Subsetting a Data Set (cont.)

Examplesproc freq data=course.main(where=(birthyr<1950 and case_1=1));

... ;

run;

proc means data=course.main(where=((birthyr<1950 or

birthyr>1953)

and

(case_1 ne .)));

... ;

run;

We recommend that you use parentheses as much as you can. It will enhance the reading as well as reduce the errors. There is no limit to how many pairs you are allowed to use.

Page 148: How to Navigate the Guide

148

Subsetting a Data Set (cont.)

Special OperatorsSpecial operators to use with the WHERE option:

IS MISSING true if value is missing

BETWEEN - AND true if value falls within the range

Examples:

where=(birthyr is missing) where=(birthyr=.)where=(birthyr between 1950 and 1952) where=(1950<=birthyr<=1952)

Page 149: How to Navigate the Guide

149

Subsetting a Data Set (cont.)

Special OperatorsSpecial character operators to use with the WHERE option:

LIKE true if value is identical to the argument

where=(lastname like ‘Johnsson’);

Page 150: How to Navigate the Guide

150

Subsetting a Data Set (cont.)

Special OperatorsThe power of the LIKE operator is that you can use different

types of matching arguments:

‘Smi%’ matches all character values that begin

with “Smi”

‘%th’ matches all character values that end

with “th”

Page 151: How to Navigate the Guide

151

Subsetting a Data Set (cont.)

Special Operators‘Smi_’ matches all character values that are

four characters long and begin with

“Smi”

‘Smi_ _’ matches all character values that are

five characters long and begin with

“Smi”

Page 152: How to Navigate the Guide

152

Subsetting a Data Set (cont.)

Special OperatorsAnother special operator is CONTAIN.

CONTAIN yields true if any string of the value matches the argument.

where=(lastname contains ‘nss’);

All last names with the letters “nss” in them will be selected.

Page 153: How to Navigate the Guide

153

Subsetting a Data Set (cont.)

Create a SubsetSometimes you want to make a new data set from a subset.

These are two ways to accomplish this:

data course.cases(where=(case_1=1));

set course.main;

run;

which will delete the redundant observations at the end of the data step.

Page 154: How to Navigate the Guide

154

Subsetting a Data Set (cont.)

Create a Subset

data course.cases;

set course.main(where=(case_1=1));

run;

which will delete the redundant observations at the beginning of the data set.

However, a good advice is to minimise the number of data sets as much as possible. The fewer data sets the better! There is nearly always a procedure to use on the main data set instead of creating a new data set.

Page 155: How to Navigate the Guide

155

Subsetting a Data Set (cont.)

WHERE statementThere is a third way too, the WHERE statement:

data course.cases; set course.main; where (case_1=1);run;

Note that no equal sign is used in the WHERE statement!Apart from the equal sign, the WHERE statement works as

the WHERE option.

May also be written if case_1=1;

Page 156: How to Navigate the Guide

156

Subsetting a Data Set (cont.)

OBS and FIRSTOBSIn both data and proc steps you can use the OBS data option to

subset a specific range of observations.

To subset the ten first observations:

proc print data=course.main(obs=10);

To subset observations 5 to 10:

proc print data=course.main(firstobs=5 obs=10);

Note that no WHERE is used!

Page 157: How to Navigate the Guide

157

Subsetting a Data Set (cont.)

Subsetting IFThere is one occasion when a WHERE statement will not

work, namely if you wish to condition on a variable which you create in that same data step.

Instead you can use a subsetting IF statement:

if (your-expression);

Page 158: How to Navigate the Guide

158

Subsetting a Data Set (cont.)

Example

data course.less20yr;

set course.main;

age75=1975-birthyr;

if (age75 < 20);

run;

This will yield a data set where all the observations were younger than 20 yrs in 1975.

Page 159: How to Navigate the Guide

159

Subsetting a Data Set (cont.)

Delete ObservationsIt is possible to delete both observations and variables from a

data set.

Observations are deleted through the DELETE statement:

if expression then delete;

(The IF-THEN statement is explained in detail later on.)

Page 160: How to Navigate the Guide

160

Subsetting a Data Set (cont.)

Example

data course.after65;

set course.main;

if birthyr<1965 then delete;

run;

This will yield a data set where all observations with birthyr<1965 have been deleted.

Page 161: How to Navigate the Guide

161

Subsetting a Data Set (cont.)

DELETE vs. WHERE vs. IFThe same result could have been obtained through a WHERE

statement, or by a subsetting IF

if birthyr>=1965;

However, if variable BIRTHYR is created in the same data step, then you need to use IF or a WHERE data set option on the target data set:

data course.after65(where=(birthyr>65)); ...;run;

Page 162: How to Navigate the Guide

162

Subsetting a Data Set (cont.)

Delete VariablesTo delete variables use either of the KEEP and DROP

statements.

keep variable1 variable2 ... ;

drop variable1 variable2 ... ;

The KEEP statement will yield a data set where all variables NOT listed in the KEEP statement list are deleted.

The DROP statement will yield a data set where all variables listed in it are dropped.

Page 163: How to Navigate the Guide

163

Subsetting a Data Set (cont.)

Delete VariablesYou use KEEP and DROP in the same way as WHERE,

either as a data set option, or as a statement in a data step.

Thus KEEP and DROP data set options can be used in procedures as well.

Be aware that you can NOT use both KEEP and DROP in the same data step.

Page 164: How to Navigate the Guide

164

Subsetting a Data Set (cont.)

ExamplesKEEP statement:

data course.half;

set course.main;

keep birthyr age case_1;

run;

Page 165: How to Navigate the Guide

165

Subsetting a Data Set (cont.)

ExamplesKEEP data set option:

data course.half(keep=birthyr age case_1);

set course.main;

run;

or

data course.half;

set course.main(keep=birthyr age case_1);

run;

Page 166: How to Navigate the Guide

166

IF-THEN-ELSE

IF-THEN-ELSEQuite often one wishes to create a variable of the form

1 if age<30X = 0 else

The value of X depends on the value of variable age.

Page 167: How to Navigate the Guide

167

IF-THEN-ELSE (cont.)

IF-THEN-ELSE Statement

To define the X variable in SAS is simple. The code is as follows:

data course.groups;

set course.main;

if (age<30) then X=1;

else X=0;

run;

Note, that since a missing value is interpreted as negative infinity, it will fall under X=1. More about this problem shortly.

Page 168: How to Navigate the Guide

168

IF-THEN-ELSE (cont.)

IF-THEN-ELSE Statement

The general form of the IF-THEN-ELSE statement is

if expression1 then expression2 ;

else expression3 ;

Expression1 must be a logical expression yielding either true or false.

If expression1 is true than expression2 is calculated.

If expression1 is false then expression3 is calculated.

Page 169: How to Navigate the Guide

169

IF-THEN-ELSE (cont.)

ExampleIt is very useful to use several IF-THEN-ELSE statements in

a series.

if (age<30) then generatn=1;

else

if (30<=age<60) then generatn=2;

else

if (60<=age) then generatn=3;

else generatn=.;

Page 170: How to Navigate the Guide

170

IF-THEN-ELSE (cont.)

ExampleSay that the age=67, then (age<30) will yield false, and we

will go right to the ELSE line.

There the expression is a new IF-THEN-ELSE statement where we continue our execution.

(30<=age<60) is false too, and so we continue at the next ELSE line.

There the expression (60<=age) is true, and consequently we calculate the THEN-expression generatn=3.

Page 171: How to Navigate the Guide

171

IF-THEN-ELSE (cont.)

Good ProgrammingThis definition of the variable generatn was perfect in the

sense that it included all possible values of age in the IF-expressions.

A good IF-THEN-ELSE statement should!

If all the IF-expressions yield FALSE (that is, if you have not included all possible values of the conditioning variable), then the new variable will automatically be set to missing.

Page 172: How to Navigate the Guide

172

IF-THEN-ELSE (cont.)

Good ProgrammingThere should always be an IF-expression taking care of

missing values of the conditioning variable (age in our example).

Usually the missing value option is put at the end of a long series of IF-THEN-ELSE statements, as the final ELSE-statement.

However, this is not always appropriate. In our example age<30 will be true for a missing value of age (missing = negative infinity).

Page 173: How to Navigate the Guide

173

IF-THEN-ELSE (cont.)

Good ProgrammingThis problem is avoided if an extra IF-THEN-ELSE

statement is added at the beginning:

if (age eq .) then generatn=.;

else

if (age<30) then generatn=1;

else

if (30<=age<60) then generatn=2;

else generatn=3;

This will prevent any missing values from falling under the expression (age<30).

Page 174: How to Navigate the Guide

174

IF-THEN-ELSE (cont.)

ELSE StatementQuestion:

What happens if I don’t use the ELSE statement in the IF-THEN statement?

I.e.

if (age eq .) then generatn=.; if (0<age<30) then generatn=1;

if (30<=age<60) then generatn=2;

if (60<=age) then generatn=3;

Page 175: How to Navigate the Guide

175

IF-THEN-ELSE (cont.)

ELSE StatementAnswer:

Nothing. The result is the same as before.

However, the ELSE statement is a far better alternative when it comes to complicated IF-THEN statements (see the next picture).

Make a habit of using the ELSE statement.

Page 176: How to Navigate the Guide

176

IF-THEN-ELSE (cont.)

ELSE StatementThe difference between IF-THEN and IF-THEN-ELSE:

As soon as an IF statement is found to be true, the following IF statements in the series of IF-THEN-ELSE statements are not calculated.

If you only use IF-THEN statements, all IF statements will be checked, even after you have found the true one.

Have you by any chance not made them exclusive of each other, the latest assignment will be in effect.

Page 177: How to Navigate the Guide

177

IF-THEN-ELSE (cont.)

ELSE StatementExample of a faulty definition:

if (age eq .) then generatn=.;

if (age<30) then generatn=1;

if (age<60) then generatn=2;

if (60<=age) then generatn=3;

All persons younger than 60 years will be have generatn=2!

Page 178: How to Navigate the Guide

178

IF-THEN-ELSE (cont.)

DO-ENDIf you wish to assign several variables in a IF-THEN-ELSE

statement you can use a DO-END statement.

data course.main;

set course.original;

if (age eq .) then

do;

generatn=.;

credit=.;

end;

else

if (age<30) then

do;

generatn=1;

credit=1;

end;

...;

run;

Page 179: How to Navigate the Guide

179

IF-THEN-ELSE (cont.)

DO-ENDThe general form of the DO-END statement is

IF expression THEN

DO;

statements

END;

ELSE

DO;

statements

END;

Page 180: How to Navigate the Guide

180

Set and Merge

Joining Data SetsPreviously we have taken a subset of a data set. It is also

possible to join (concatenate and merge) data sets together.

Concatenate= adding observations (“lægge dataset efter hinanden”)

Merge = adding variables, match data sets (“lægge dataset ved siden af hinanden”)

Page 181: How to Navigate the Guide

181

Set and Merge (cont.)

SETTo concatenate two or several data sets, use the SET

statement.

data SAS-data-set;

set data-set1 data-set2 data-set3 ... ;

run;

The top observations in the SAS-data-set will be from data-set1, the next observations from data-set2 and so on.

Page 182: How to Navigate the Guide

182

Set and Merge (cont.)

SETIf there are different variables in the data sets, missing values

will be generated for the observations which did not have the variables before the concatenation..

Variables not present in all the data sets will be found to the far right of the concatenated data set.

Page 183: How to Navigate the Guide

183

Set and Merge (cont.)

ExampleWe wish to add three observations to our data set

course.main. The three observations are stored in data set course.extraobs.

data course.concatenate;

set course.main course.extraobs;

run;

Page 184: How to Navigate the Guide

184

Set and Merge (cont.)

ExampleConcatenate.sas7bdat:

OBS ID BIRTHYR WEIGHT AGE BMI ALDER

1 001 1954 62 43 22.77319 . 2 002 1956 68 41 24.38237 . 3 003 1956 65 41 21.97134 . 4 004 1962 56 35 19.84127 . 5 005 1954 58 43 22.94213 . 6 006 1953 52 44 19.81405 . 7 007 1955 69 42 22.53061 . 8 008 1955 75 42 25.05931 . 9 009 1960 82 37 28.3737 . course.main 10 010 1962 68 35 22.9854 . 11 011 1961 65 36 23.03005 . 12 012 1954 62 43 21.70792 . 13 013 1956 58 41 20.54989 . 14 014 1962 61 35 22.67995 . 15 015 1958 58 39 21.82995 . 16 016 1959 62 38 22.77319 . 17 017 1962 59 35 21.93635 . 18 018 1957 73 40 22.53086 . 19 029 . . . 28.59 48 20 030 . . . 22.14 51 course.extraobs 21 031 . . . 23.85 42

Page 185: How to Navigate the Guide

185

Set and Merge (cont.)

ExampleSince alder = age it would be nice to transform the three alder

values to age values. Use a RENAME data set option.

rename=(old-var-name=new-var-name);

In the example:

data course.concatenate;

set course.main

course.extraobs(rename=(alder=age));

run;

Page 186: How to Navigate the Guide

186

Set and Merge (cont.)

ExampleOBS ID BIRTHYR WEIGHT AGE BMI

1 001 1954 62 43 22.77319 2 002 1956 68 41 24.38237 3 003 1956 65 41 21.97134 4 004 1962 56 35 19.84127 5 005 1954 58 43 22.94213 6 006 1953 52 44 19.81405 7 007 1955 69 42 22.53061 8 008 1955 75 42 25.05931 9 009 1960 82 37 28.3737 course.main 10 010 1962 68 35 22.9854 11 011 1961 65 36 23.03005 12 012 1954 62 43 21.70792 13 013 1956 58 41 20.54989 14 014 1962 61 35 22.67995 15 015 1958 58 39 21.82995 16 016 1959 62 38 22.77319 17 017 1962 59 35 21.93635 18 018 1957 73 40 22.53086 19 029 . . 48 28.59 20 030 . . 51 22.14 course.extraobs 21 031 . . 42 23.85

Page 187: How to Navigate the Guide

187

Set and Merge (cont.)

MERGETo merge or match two or several data sets, use the MERGE

statement:

data SAS-data-set;

merge data-set1 data-set2 data-set3 ... ;

by variable;

run;

The first variables in the SAS-data-set will be from data-set1, the next variables from data-set2 etc.

Page 188: How to Navigate the Guide

188

Set and Merge (cont.)

MERGEThe BY variable (matching variable) is usually an

identification variable unique to all observations and present in all data sets to be merged.

The data sets must be sorted according to the matching variable before MERGE is used.

If there are different observations in the data sets, missing values will be generated for observations not present in other data sets. These observations will be found at the end of the merged data set.

Page 189: How to Navigate the Guide

189

Set and Merge (cont.)

ExampleWe wish to add a variable to our data set course.main.

The new variable is the number of children each woman has, variable CHILD.

This variable is stored together with the variable ID in a data set called course.extravar. ID is our matching variable, unique for all observations.

Page 190: How to Navigate the Guide

190

Set and Merge (cont.)

Example

proc sort data=course.main course.extravar;

by id;

run;

proc sort data=course.extravar;

by id;

run;

data course.merged;

merge course.main course.extravar;

by id;

run;

Page 191: How to Navigate the Guide

191

Set and Merge (cont.)

Examplemerged.sas7bdat:

OBS ID BIRTHYR WEIGHT AGE BMI CHILD

1 001 1954 62 43 22.7732 1 2 002 1956 68 41 24.3824 0 3 003 1956 65 41 21.9713 2 4 004 1962 56 35 19.8413 2 5 005 1954 58 43 22.9421 0 6 006 1953 52 44 19.8141 3 7 007 1955 69 42 22.5306 1 8 008 1955 75 42 25.0593 2 9 009 1960 82 37 28.3737 4 10 010 1962 68 35 22.9854 . 11 011 1961 65 36 23.0300 2... 17 017 1962 59 35 21.9363 2 18 018 1957 73 40 22.5309 0 19 029 . . 48 28.59 0 20 030 . . 51 22.14 3 21 031 . . 42 23.85 1

course.main course.extravar

Page 192: How to Navigate the Guide

192

Read Raw Data Into SAS

Raw DataSometimes you have a raw data file you want to read into

SAS. The data are not stored in a SAS data set, or .sas7bdat file.

Perhaps data are of the form:

001 Anderson 1954 62 43002 Askelund 1956 68 41003 Bengtson 1956 65 41004 Carner 1962 56 35005 Holmfors 1954 58 43006 Johnsson 1953 52 44...016 Torberg 1959 62 38017 Turner 1962 59 35018 Öhlund 1957 73 40

Page 193: How to Navigate the Guide

193

Read Raw Data Into SAS (cont.)

List Input FormatThis form of data is called list input format. It means that

each data value is separated by a blank.

List input formatted data have the following characteristics:

- all values contain eight or fewer characters

- values are separated by one or several blanks

- missing values (both numeric and character) must be written as a single period (.)

- numerical values must include all necessary decimal points

Page 194: How to Navigate the Guide

194

Read Raw Data Into SAS (cont.)

INPUT and CARDSTo create a data set using list input formatted data, write

data course.nameincl;

input id $ name $ birthyr weight age;

cards;

001 Anderson 1954 62 43

002 Askelund 1956 68 41

003 Bengtson 1956 65 41

...

016 Torberg 1959 62 38

017 Turner 1962 59 35

018 Öhlund 1957 73 40

;

run;

Page 195: How to Navigate the Guide

195

Read Raw Data Into SAS (cont.)

INPUT and CARDSIn the INPUT statement you list the variables in the same

order as they are presented.

If a variable is a character variable, put a dollar sign ($) after the name.

When there is no dollar sign, then SAS interprets the variable as a numeric.

Page 196: How to Navigate the Guide

196

Read Raw Data Into SAS (cont.)

INPUT and CARDSThe CARDS statement indicates that data lines are to follow.

It is written below the INPUT statement.

Note that no semicolons are present at the end of each data line, ONLY at the end of all the data. It marks the end of data entry.

Equivalent to CARDS is the keyword DATALINES.

Page 197: How to Navigate the Guide

197

Read Raw Data Into SAS (cont.)

Column Input FormatIf the data you wish to read into a data set is specified in

columns (= positions), you may use the column input format.

001 Andersson 1954 62 43

002 Askelund 1956 68 41

003 Bengtsson 1956 65 41

004 Carner 1962 56 35

005 Holmfors 1954 58 43

006 Johansson 1953 52 44

...

016 Torberg 1959 62 38

017 Wernersson 1962 59 35

018 Öhlund 1957 73 40

----+----1----+----2----+----3

Page 198: How to Navigate the Guide

198

Read Raw Data Into SAS (cont.)

Column Input FormatEach variable occupies a specified number of characters or

positions (1 position = 1 column).

The first variable ID is in columns 1-3, NAME in 5-12 (including blanks at the end), BIRTHYR in 13-16 etc.

The ruler below the data is a SAS ruler. It is often used in the Log window.

Each column is indicated by “-”. The digit 1 indicates column 10, 2 column 20 etc. The + indicates columns 5, 15, 25 etc.

Page 199: How to Navigate the Guide

199

Read Raw Data Into SAS (cont.)

Column Input FormatTo create a data set using column input formatted data, write

data course.nameincl;

input id $ 1-3 name $ 5-12

birthyr 13-16 weight 18-19 age 21-22;

cards;001 Andersson 1954 62 43

002 Askelund 1956 68 41

003 Bengtsson 1956 65 41

...

016 Torberg 1959 62 38

017 Wernersson 1962 59 35

018 Öhlund 1957 73 40

;

run;

Page 200: How to Navigate the Guide

200

Read Raw Data Into SAS (cont.)

Column vs. List Input FormatThe only difference from the code for list input formatted

data is that you must specify the columns in the INPUT statement. (ID $ 1-3, name $ 5-12, etc.)

Differences in data characteristics:

- values must not contain more characters than are specified in the input statement

- missing values are written as blank for character variable, period (.) for numerical

Page 201: How to Navigate the Guide

201

Read Raw Data Into SAS (cont.)

Data from .TXT FileTo read data (list or column formatted) from external text

files (.txt) use an INFILE statement.

data course.nameincl;

infile ‘h:\SAS-folder\data-file.txt’;

input id $ 1-3

name $ 5-12

birthyr 13-16

weight 18-19

age 21-22

;

run;

Page 202: How to Navigate the Guide

202

Read Raw Data Into SAS (cont.)

Data from .TXT FileThe INPUT statement is written as before, depending on

whether you have column or list formatted data in the text file.

No CARDS (or DATALINES) statement is used.

The INFILE statement must be written above the INPUT statement.

Page 203: How to Navigate the Guide

203

The SAS Manuals

The SAS ManualsThere is a heap of heavy manuals for SAS, often not easy to read

for a beginner but indispensable when you get more experienced.

These are the most useful of them:

SAS Language, Reference

SAS Procedures Guide

To lower the threshold of SAS manuals, you can begin by reading the thin (!) SAS Language and Procedures, Introduction. Most of its contents are presented in this guide, so it is easy to compare.

Page 204: How to Navigate the Guide

204

Importing and Exporting Files

It is also possible to import data from other systems file formats, including Excel and Access, as well as delimited file types.

Use the entries on the Files menu for this.

Or use the program STAT/transfer.

Page 205: How to Navigate the Guide

205

The SAS Manuals (cont.)

Manuals vs. Online HelpThe SAS manuals/Online Documentation work similarly to

the Online Help (see chapter The Online HELP).

The Online Help is easy to use when you know what you are looking for, a specific syntactic problem, options or keywords.

The manuals are best used when you are looking for a solution to a bigger SAS problem, such as which procedure to use, why does it not work as I thought it would, or what is SAS actually doing.

Page 206: How to Navigate the Guide

206

Comments

Commenting ProgramsComments in SAS programs are code that will not be

executed when the program is submitted.

It is often useful to comment out parts of a program.

Also it is good to add comments which explain the code, such as new variables, dates at changes, general purpose of the program, etc.

Page 207: How to Navigate the Guide

207

Comments (cont.)

General CommentComments in programs should be inside /* and */ .

/* comment */

It is NOT possible to have comments within comments

/* comment /* comment */ comment */

Page 208: How to Navigate the Guide

208

Comments (cont.)

Statement CommentAnother way to comment out text is to use a *.

*age=1996-birthyr;

All text after the * and until a semicolon is reached will be commented.

This will avoid the comment-inside-comment problem, but it is more cumbersome, since you must comment every statement.

Page 209: How to Navigate the Guide

209

Comments (cont.)

RUN CANCELIf you wish to prevent a data step or procedure from being

executed you can change the RUN statement to

RUN CANCEL;

Page 210: How to Navigate the Guide

210

Naming Data Sets and Variables

Data Set and Variable NamesThe name of a data set or a variable should reflect its

contents.

It is not obvious to an outsider that the variable “rel51yrd” refers to “relapse 5 years after first diagnosis”. A better name would perhaps be “relapse5”.

Add comments in the data step to clarify what the variables are!

Page 211: How to Navigate the Guide

211

Naming Data Sets and Variables (cont.) Data Set and Variable Names

- can be 32 characters (letters, underscores and digits) long at most

- can be uppercase (versaler) or lowercase (gemena) or mixed (OrIgInAl = oRiGiNaL)

- must start with a letter (A-Z) or an underscore (_), not a digit

Page 212: How to Navigate the Guide

212

Naming Data Sets and Variables (cont.)

Data Set and Variable NamesThe increased character limit to 32 characters in version 8.0 is

welcomed by many SAS users (formerly it was only 8 characters in version 6.12).

But a long name is not always to prefer either. It makes the code tiresome to read, not to mention write.

A good idea is therefore to try to keep to a maximum of 8 to 12 characters also in future. Use labels.

Page 213: How to Navigate the Guide

213

Naming Programs

Program File NamesContrary to the SAS data set and variable names, there is no

limit to 32 characters when you name the program files.

Do not hesitate to use long names, although the same rules as for the variables names apply.

Be consistent, no cryptic abbreviations.

Page 214: How to Navigate the Guide

214

Categorising Variables Dichotomous VariablesThe most common categorising is dichotomization, a variable

that only takes two values, i.e. woman/man, case/control, etc.

Dichotomised variables should take values 1 and 0

(not 1 and 2).

Example: 1 if overweight

0 if not

Page 215: How to Navigate the Guide

215

Categorising Variables (cont.) Dichotomous VariablesDichotomous variables are often called indicator variables,

since the 1 indicate a characteristic and the 0 does not.

It is therefore clever to name dichotomous variables after the “1” characteristic.

Example:

1 if BMI>25

fat =

0 otherwise

Page 216: How to Navigate the Guide

216

Categorising Variables (cont.) Defining Dichotomous VariablesThis is a how you may define the 1/0 variable in a data step:

data course.main;

set course.original;

if (bmi ne .) then

do;

if bmi>25 then fat=1;

else fat=0;

end;

else fat=.;

run;

Or simply fat=(bmi>25); ....but

Page 217: How to Navigate the Guide

217

Categorising Variables (cont.) Dummy VariablesSome procedures can not use variables with several

categories, only dichotomised, so you need to create 1/0 variables (dummy variables) for each category.

Example:

A BMI variable is categorised into three categories, using a format:

< 20

bmi = 20-25

> 25

Page 218: How to Navigate the Guide

218

Categorising Variables (cont.) Dummy VariablesThe three new dummy variables should have bmi as a prefix:

1 if bmi<20 1 if 20<bmi<25 1 if bmi>25

bmi1= bmi2= bmi3=

0 else 0 else 0 else

With the same prefix as the mother variable it will be very easy to understand their origin.

The suffices 1, 2 and 3 are much easier to comprehend than say bmilow, bmimiddl and bmihigh.

Page 219: How to Navigate the Guide

219

Categorising Variables (cont.) ExampleThis is one way to define dummy variables in a data step:

data course.main;

set course.original;

if (bmi ne .) then /* Checks for non-missing values */

do;

if bmi<20 then bmi1=1;

else bmi1=0;

if (20<=bmi<=25) then bmi2=1;

else bmi2=0;

if bmi>25 then bmi3=1;

else bmi3=0;

end;

else /* If missing value */

do;

bmi1=.;

bmi2=.;

bmi3=,;

end;

run;

Page 220: How to Navigate the Guide

220

Categorising Variables (cont.) Example to AvoidThis is BAD way to define dummy variables (well, Anna

doesn’t like it….):

data course.main;

set course.original;

if (bmi ne .) then

do;

bmi1=(bmi<20);

bmi2=(20<=bmi<=25);

bmi3=(bmi>25);

end;

run;

Page 221: How to Navigate the Guide

221

Categorising Variables (cont.) Example to AvoidFirst of all, what does it mean?

Expressions (bmi<20), (20<=bmi<=25) and (bmi>25) are logical expressions. They are either TRUE or FALSE.

SAS interprets the logical value TRUE as the numeric value 1, and FALSE as 0.

So, if bmi<20 say, then bmi1=1, bmi2=0 and bmi3=0.

Page 222: How to Navigate the Guide

222

Organise Your Programming

Structuring SAS Files

Here is a suggestion of how to organise your SAS programs.

It is not presumed to be The Ultimate Structure, but it is a good start to get organised.

All ideas to improve this structure are most welcome.

Page 223: How to Navigate the Guide

223

Organise Your Programming (cont.)

We Want To Avoid ThisThe most common SAS programming is:

You make a program file which begins with a data step reading original.sas7bdat. In the data step you do the changes and adds, create formats, and then the procs follow using this data set.

The reason why people write programs like this is probably because it is the natural way to do it. First data, then analysis.

Page 224: How to Navigate the Guide

224

Organise Your Programming (cont.)

We Want To Avoid ThisThe problem with this programming is:

All programs create new data sets.

So whenever you want to update the data you must update ALL the program files.

Eventually you will not know which ones you have updated, and which you have not.

Page 225: How to Navigate the Guide

225

Organise Your Programming (cont.)

Our IdeaTo avoid this time consuming dilemma, I suggest the

following file structure:

Separate the data steps from the procs by saving them to different files. Data files and analysis files.

Preferably, create only ONE main data set where you make ALL your changes, and then use it in every proc file.

Whenever you make a change to the data, all the analyses are changed accordingly (when rerun).

Page 226: How to Navigate the Guide

226

Organise Your Programming (cont.)

Data FilesFour files handles the data:

the original data file saved as a

1. .sas7bdat file

the data set programming is in

2. main.sas

which creates

3. main.sas7bdat

formats are created by code in the file

4. formats.sas

Page 227: How to Navigate the Guide

227

Organise Your Programming (cont.)

Analysis FilesOther files can be divided into three groups:

- data cleaning files

- analysis files

- published analysis files

However, these groups are not clearly defined. Some files could fall into several groups.

We have yet to come up with a clever structure of these files. Suggestions are kindly accepted.

Page 228: How to Navigate the Guide

228

Organise Your Programming (cont.)

Organising the Data FilesThe original data file (a SAS data set or an external data base

file), should NOT be used when you run SAS.

SAS executions may go wrong and create permanent changes to the data.

Start by making a copy of the original data file.

Page 229: How to Navigate the Guide

229

Organise Your Programming (cont.)

.sas7bdat FileIf the original data file is a .sas7bdat file, save the copy as

copy-of-name.sas7bdat (or a similar name).

Page 230: How to Navigate the Guide

230

Organise Your Programming (cont.)

Non-SAS FileIf the original data file is not a SAS data file (.sas7bdat) then

you should create a SAS data file (name.sas7bdat) from it.

This is not always trivial and you may need help from the Analysis group.

Golden Rule: NEVER try to import THE original file. Always make another COPY of it before you import it to SAS. Imports can crash.

Page 231: How to Navigate the Guide

231

Organise Your Programming (cont.)

Do Not Alter Original DataYou now have the original data stored as the SAS data file

name.sas7bdat.

Golden Rule No 2: The file name.sas7bdat should ONLY be altered whenever the original data file is altered!

It is an exact copy of the original data and should remain so.

Page 232: How to Navigate the Guide

232

Organise Your Programming (cont.)

Main.sasFrom name.sas7bdat we create the data set main.sas7bdat in the

program called main.sas. This is our version of main.sas:

/***

your-libname.main creates the main.sas7bdat data set for project ‘project-name’.

Created: 2001-08-07 by Anna Johansson

Used by: all analysis files, otherwise mentioned

***/

data your-libname.main;

set your-libname.name;

/* Changes and adds (new variables) */

continues ...

Page 233: How to Navigate the Guide

233

Organise Your Programming (cont.)

Main.sas /* Assign variable labels */

label var1=‘label1’

var2=‘label2’

...;

/* Assign formats */

format var1 format1.

Var2 format2.

...;

run;

proc contents data=your-libname.main;

run;

proc print data=your-libname.main;

run;

/* END of main.sas */

Page 234: How to Navigate the Guide

234

Organise Your Programming (cont.)

Main.sasThe comment header of main.sas includes some general

information. It is optional, but very useful.

The data set your-libname.main is created from your-libname.name in the data step that follows.

Changes and adds to the original data are made.

At the end, labels and formats are assigned to the variables. These lists become longer as the work progress.

Page 235: How to Navigate the Guide

235

Organise Your Programming (cont.)

Main.sasAfter the data step, two procedures, CONTENTS and PRINT,

are found.

The logs and outputs they generate are used to check whether the main.sas7bdat data set was created satisfactorily.

If you have a large data set it might be a good idea to put a restriction to the PRINT procedure, so that only a fraction of the data is printed.

proc print data=course.main(obs=20);

Page 236: How to Navigate the Guide

236

Organise Your Programming (cont.)

Formats.sasThe formats assigned at the end of the main data step are

created in a file we call formats.sas:

proc datasets library=your-libname memtype=(catalog);

delete formats / memtype=catalog;

run;

proc format library=your-libname fmtlib;

value bmif low-20.0=‘Underweight’

20.0-25.0=‘Normal weight’

25.0-high=‘Overweight’

other=‘Other’;

continues ...

Page 237: How to Navigate the Guide

237

Organise Your Programming (cont.)

Formats.sas

value case_1f 0=‘Case’ 1=‘Control’

other=‘Other’;

value yes10no 0=‘No’

1=‘Yes’

.=‘Missing’;

run;

/* END of formats.sas */

Page 238: How to Navigate the Guide

238

Organise Your Programming (cont.)

Formats.sasAll formats are saved permanently to a format catalogue (see

section on Formats).

The first part (proc datasets) will delete the existing SAS format catalogue.

Make sure the occurrences of your-libname are correct.

Page 239: How to Navigate the Guide

239

Organise Your Programming (cont.)

Formats.sasThe formats are created by proc FORMAT. Here the format

names are bmif, case_1f and yes10no.

The name should reflect the variable it refers to.

Occasionally there are more general formats like “yes10no” (0=‘No’, 1=‘Yes’). They may have a general name and be assigned to several variables.

A good idea is to place the format definitions in alphabetical order in formats.sas.

Page 240: How to Navigate the Guide

240

Organise Your Programming (cont.)

Working With Several FormatsOften you switch between several formats for the same

variable, for instance when you are checking different ways to divide categories.

Do not use one format which you change and redefine, but several formats and change the assignment of them (either in main.sas or in the proc you are using, see the next picture).

These formats should have similar names, like bmi3grp, bmi5grp, bmi7grp. (Note that a format name can not end with a digit.)

Page 241: How to Navigate the Guide

241

Organise Your Programming (cont.)

Working With Several FormatsThe permanent assignment in main.sas should reflect the

format you wish to keep as the permanent. It is possible to temporarily change a format in ANY procedure by adding a FORMAT statement.

Example:

proc freq data=course.main;

tables bmi*case_1;

format bmi bmi5grp.;

run;

Page 242: How to Navigate the Guide

242

Organise Your Programming (cont.)

Working With Several FormatsTo delete all format assignments simply assign no format:

proc freq data=course.main;

tables bmi*case_1;

format bmi ;

run;

Page 243: How to Navigate the Guide

243

Organise Your Programming (cont.)

When Must I Rerun Main.sas?Whenever you have

- made a change to the data step, or

- changed the permanent format assignment in main.sas

you must rerun main.sas.

You do not need to rerun main.sas when you have changed the formats.sas, UNLESS you have also changed the assignments in main.sas.

Page 244: How to Navigate the Guide

244

Organise Your Programming (cont.)

Organising the Analysis FilesOrganising the proc files (analysis files) is more complicated.

How are they naturally grouped?

By the tables presented in the article?

By the different procs (MEANS in one file, FREQ in another)?

The only advise I have is to give the files good names, and perhaps add dates in comments. And use comments, since the proc files are the documentation of your analysis work.

Page 245: How to Navigate the Guide

245

Organise Your Programming (cont.)

DocumentationRemember:

The SAS files are the documentation of your data analysis.

Documentation should be easy to access and understand.

You should be able to reproduce all published results from the original data file.

Page 246: How to Navigate the Guide

246

Appendix

List of Useful ProceduresPROC APPEND = concatenates data sets (compare SET

statement)

PROC CATALOG = administrates SAS Catalogues

PROC CHART = creates simple charts, bar charts, block charts, pie charts, etc.

PROC COMPARE = compares the contents of two data sets, or variables within a data set

PROC CONTENTS = prints the descriptive part of a data set

PROC COPY = copies a SAS Data Library or parts of it

PROC CORR = calculates correlation coefficients

PROC DATASETS = lists, renames, deletes data sets etc.

Page 247: How to Navigate the Guide

247

Appendix (cont.)

List of Useful Procedures (cont.)PROC FORMAT = defines formats and informats permanently

or temporarily

PROC FREQ = creates frequency tables and basic statistical tests

PROC MEANS = creates descriptive statistics

PROC OPTIONS = lists all current SAS System options values

PROC PLOT = creates simple Y-X plots

PROC PRINT = prints the data part of a data set

PROC PRINTTO = defines destination of output and log, e.g. when you wish to use an output as input for another procedure

Page 248: How to Navigate the Guide

248

Appendix (cont.)

List of Useful Procedures (cont.)PROC RANK = calculates ranking of different sorts

PROC REPORT = creates better layout for output

PROC SORT = sorts data sets by one or several variables

PROC SQL = implementing query language SQL, used to retrieve data sets

PROC STANDARD = standardises variables to a given mean and variance

PROC SUMMARY = descriptive statistics in form of summations

PROC TABULATE = descriptive statistics in tables

PROC TRANSPOSE = transposes data sets, i.e. transforms rows to columns, and vice versa

Page 249: How to Navigate the Guide

249

Appendix (cont.)

List of Useful Procedures (cont.)PROC UNIVARIATE = creates descriptive distribution

statistics and plots, histograms etc.

Page 250: How to Navigate the Guide

250

Acknowledgements

Anna wishes to thank the following MEP colleagues for their support.

Hanna Svensson, Statistician, Doctoral Student

Lena Brandt, Statistician

Paul Dickman, Senior Statistician

Gunnar Petersson, Programmer

-- and Peter and Lene wish to thank Anna!!