sas programming basics - university of north carolina at ...people.uncw.edu/blumj/stt305/ppt/sas...
SAS Programs
SAS Programs consist of three major components:
Global statements
Procedures
Data steps
Notes
Data steps and procedures are made up of one or more statements.
All statements end with a semicolon. This is the only restriction; SAS programs are otherwise free-form. For example, the following are equivalent:
proc sort data=mysas.fish out=work.fish2;
by lt hg;
run;
proc sort data=mysas.fish out=work.fish2;by lt hg;run;
Notes
A run statement after every step is optional (only the final one is required). The run statement indicates to SAS that the procedure or data step is complete.
Starting a new data step or procedure is also taken by SAS as an indication that the previous one is complete (nesting of data steps and/or procedures is not allowed).
Using the run statement is generally good practice…
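As a small illustration of step boundaries, the sketch below omits the first run statement. It assumes the sashelp.class sample data set that ships with SAS; the output name work.class2 is made up for this example.

```sas
/* No run statement here: the step is closed by the next proc statement */
proc sort data=sashelp.class out=work.class2;
  by age;

/* Starting proc print tells SAS that the proc sort step is complete */
proc print data=work.class2;
run;  /* the final run statement is the one that is required */
```

Writing a run statement after every step makes the boundaries explicit and the program easier to read.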
SAS Data Set Basics
SAS data set reference
Two levels: libref.data-set
libref is the SAS library (work library is default if libref is omitted)
data-set is the name of the SAS data set
SAS naming conventions
Names can be up to 32 characters in length
Not case sensitive
Must begin with a letter or underscore
Contain only letters, numbers or underscores
Applies to data sets, variables and libraries, but libraries are limited to 8 characters.
SAS Data Libraries
A SAS data library is a directory on your computer where SAS data sets are stored or will be stored.
A library reference name (libref) can be assigned using the libname statement:
libname libref 'path-specification';
The libref must follow naming conventions and the path specified must exist.
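A minimal sketch of assigning and releasing a libref. The libref mylib and the path are hypothetical; substitute a directory that actually exists on your machine.

```sas
/* Hypothetical libref and path: the directory must already exist */
libname mylib 'C:\SASData\stt305';

/* Data sets in that directory are now referenced as mylib.data-set */

/* Release the libref when it is no longer needed */
libname mylib clear;
```

Clearing a libref only removes the reference; the files in the directory are not deleted.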
SAS Data Set Basics
SAS data sets consist of two portions
Descriptor portion
Data portion
Descriptor portion
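Contains attribute information about the data set and its variables (e.g., data set name, number of observations, variable names and types)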
Can be viewed with PROC CONTENTS
The Contents Procedure
To view the descriptor portion of a data set:
proc contents data=libref.filename;
run;
To view data set listing for a particular library:
proc contents data=libref._all_ nods;
run;
_all_ is a keyword to display all data sets in the library.
nods suppresses the descriptor portion of each data set.
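Both forms can be tried without assigning a library first. The sketch below assumes the sashelp library of sample data sets that ships with SAS.

```sas
/* Descriptor portion of a single data set */
proc contents data=sashelp.class;
run;

/* Listing of every data set in the library, descriptor portions suppressed */
proc contents data=sashelp._all_ nods;
run;
```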
Quick Exercise
Assign a library reference to the Orion sub-folder of the SAS Programming Data folder; then run PROC CONTENTS on this library.
SAS Data Set Basics
Data portion
Contains variable names and values
Can be viewed with various procedures or via the explorer.
SAS variables and values are of two types
Numeric
Character
The Print Procedure
The print procedure is primarily used to display raw data.
In its most basic form, it simply sends the data to the output window.
Print does allow for some levels of customization and summarization.
General Syntax
proc print data=SAS-data-set options ;
var variable-list ;
by <descending> variable-list ;
pageby variable-list ;
id variable-list ;
sum variable-list ;
sumby variable-list ;
run;
proc print statement: invokes the procedure with options (including the data set).
var statement: selects the set of variables (columns) to display; the default is all.
by statement: groups output by particular variable(s); the data set must be sorted. descending is optional here. In SAS help, optional keywords are enclosed in < >; however, the brackets are not actually typed when the option is used.
pageby statement: if by variables are used, this statement can be included to get page breaks.
id statement: used in conjunction with by to alter the display of groups.
sum statement: totals the listed variables; only applicable to numeric variables.
sumby statement: if by variables are used, this will permit altering sub-totals.
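To see several of these statements working together, here is a hedged sketch using the sashelp.class sample data set that ships with SAS (variables name, sex, age, height, weight). The output data set name work.class_sorted is made up for this example.

```sas
/* BY processing needs sorted data; write the sorted copy to the work library */
proc sort data=sashelp.class out=work.class_sorted;
  by sex;
run;

proc print data=work.class_sorted;
  by sex;                      /* one group per value of sex */
  pageby sex;                  /* start a new page for each by group */
  id sex;                      /* use sex as the identifying column */
  var name age height weight;  /* columns to display */
  sum height weight;           /* totals and sub-totals (numeric only) */
run;
```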
Simple Examples
Try this:
proc print data=mysas.fish;
var name elv sa z hg;
run;
Vs. this:
proc print data=mysas.fish label;
var name elv sa z hg;
run;
Most SAS procedures will use labels whenever present; PRINT does not unless the label option is specified.
Vs. this:
proc print data=mysas.fish label noobs;
var name elv sa z hg;
run;
The observation column can be removed with the noobs option.
Using BY Processing (and SORT)
Any time a BY statement is used in a procedure, the data set must be sorted on the listed variables. The sort procedure can be used to sort a data set.
Syntax:
proc sort data=SAS-data-set <out=SAS-data-set> ;
by <descending> variable1 … <descending> variablek;
run;
Sorts by first variable listed, then by the second variable within each group of the first, and so on.
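A short sketch of a multi-variable sort, again assuming the sashelp.class sample data set; the output name work.class_by_sex_age is hypothetical.

```sas
/* Sort by sex (ascending), then by age from oldest to youngest within each sex */
proc sort data=sashelp.class out=work.class_by_sex_age;
  by sex descending age;
run;

proc print data=work.class_by_sex_age;
run;
```

Note that descending applies only to the variable that immediately follows it, so it must be repeated for each variable that should be sorted in descending order.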
An Example with BY Groups
proc sort data=mysas.fish out=work.fish_sort;
by lt;
run;
title 'Grouped by Lake Type';
proc print data=fish_sort label noobs;
by lt;
var name elv sa z hg;
run;
Normally the sorted data would replace the previous data set; here an output data set must be specified. Why?
Remember, as far as SAS is concerned, work.fish_sort and fish_sort are the same data set.
Modification
proc print data=fish_sort label noobs;
by lt;
id lt;
var name elv sa z hg;
sum hg;
run;
ID alters the display of by groups.
SUM gets totals and sub-totals for the hg variable.
Other Statements of Use
PRINT is not a very sophisticated procedure, and we will not use it much. But it is good for illustrating some general concepts.
Titles and Footnotes
General form
titlen 'title text';
footnoten 'footnote text';
Titles appear at the top of the page, footnotes at the bottom (the default title is The SAS System)
n can be any whole number between 1 and 10 (title is equivalent to title1).
More on Titles and Footnotes
Titles remain in effect for the SAS session unless they are cancelled or changed.
Submission of a titlen statement:
Replaces previous title with same number
Removes all titles with higher number
Submission of a null title statement (title;) cancels all titles.
The same rules apply to footnotes.
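The replacement rules can be traced with a short sketch. The title and footnote text is invented for illustration, and sashelp.class is the sample data set that ships with SAS.

```sas
title1 'Main Title';
title2 'Subtitle';
footnote 'Draft output';

proc print data=sashelp.class;   /* shows title1, title2 and the footnote */
run;

title2 'Revised Subtitle';       /* replaces title2; title1 is kept */

title1 'New Main Title';         /* replaces title1 and removes title2 */

proc print data=sashelp.class;   /* shows only the new title1 and the footnote */
run;

title;                           /* cancels all titles */
footnote;                        /* cancels all footnotes */
```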
The Label Statement
The label statement applies a label to be written in place of the variable name for display:
label variable1='label1' variable2='label2' …;
Labels assigned using the label statement in a procedure are temporary—only in effect for that procedure.
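A minimal sketch of a temporary label in PROC PRINT, assuming the sashelp.class sample data set; the label text is made up for this example.

```sas
/* The label option is needed for PRINT to display labels */
proc print data=sashelp.class label;
  var name height weight;
  label height='Student Height' weight='Student Weight';
run;

/* The labels above were temporary: this step shows the variable names again */
proc print data=sashelp.class;
  var name height weight;
run;
```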
Where Processing
One can also subset output using a where statement in any procedure (or data step)
where expression;
Procedure only processes records for which the expression is true
Example: Add the following to your last PRINT.
where hg ge 0.5;
Relational Operators
Relation              Symbol   Mnemonic
Equal                 =        eq
Not Equal             ^=       ne
Greater Than          >        gt
Less Than             <        lt
Greater or Equal To   >=       ge
Less or Equal To      <=       le
Compound Conditions
Ex.
proc print data=fish_sort label noobs;
where lt ne . and hg ge 0.5;
by lt;
id lt;
var name elv sa z hg;
run;
The keywords and, or are available along with parentheses to help you set compound conditions.
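Parentheses make the grouping of a compound condition explicit. The sketch below assumes the sashelp.class sample data set; the cutoff values are arbitrary.

```sas
/* Keep males aged 14 or older, plus anyone shorter than 55 */
proc print data=sashelp.class;
  where (age ge 14 and sex eq 'M') or height lt 55;
  var name sex age height;
run;
```

Without parentheses, and is evaluated before or, so this particular expression would give the same result; writing the parentheses avoids having to rely on that ordering.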
Other Operators
Between; example:
proc print data=fish_sort label noobs;
where hg between 0.3 and 0.5;
by lt;
id lt;
var name elv sa z hg;
run;
In; example:
proc print data=fish_sort label noobs;
where lt in (1,3);
by lt;
id lt;
var name elv sa z hg;
run;
General System Options
Options statement
options SAS-system-options;
Option                 Function
linesize=n (ls=n)      Sets the number of characters per line for output (and log).
pagesize=n (ps=n)      Sets the number of lines per page for output (and log).
date / nodate          Turns printing of date and time on/off.
number / nonumber      Turns printing of page numbers on/off.
pageno=n               Specifies beginning page number for output.
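Several options can be set in one options statement. The values below are arbitrary choices for illustration, and sashelp.class is the sample data set that ships with SAS.

```sas
/* Narrower lines, shorter pages, no date stamp, no page numbers */
options linesize=80 pagesize=60 nodate nonumber;

proc print data=sashelp.class;
run;

/* Turn date and page numbering back on and restart numbering at page 1 */
options date number pageno=1;
```

Like titles, system options stay in effect until they are changed or the SAS session ends.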
Exercises
Create the following output from the “fish” data set, grouped on both the dam and lt variables (first page shown):