SAS Functions
SAS provides a large library of functions for manipulating data during DATA step execution.
A SAS function is often categorized by the type of data manipulation performed:
   truncation
   character
   date and time
   mathematical
   trigonometric
   special
   sample statistics
   financial
   random number
   state and ZIP code

Syntax for SAS Functions
A SAS function is a routine that performs a computation or system manipulation and returns a value. Functions use arguments supplied by the user or by the operating environment.
General form of a SAS function:
   function-name(argument-1,argument-2,…,argument-n)

Using SAS Functions
You can use functions in executable DATA step statements anywhere that an expression can appear.
   data contrib;
      set prog2.donate;
      Total=sum(Qtr1,Qtr2,Qtr3,Qtr4);
      if Total ge 50;
   run;

   proc print data=contrib noobs;
   run;

Using SAS Functions
Partial PROC PRINT Output
   ID      Qtr1  Qtr2  Qtr3  Qtr4  Total
   E00224    12    33    22     .     67
   E00367    35    48    40    30    153
   E00441     .    63    89    90    242
   E00587    16    19    30    29     94
   E00621    10    12    15    25     62
What if you want to sum Qtr1 through Qtr400 instead of Qtr1 through Qtr4?
SAS Variable Lists
A SAS variable list is a shortcut method of referring to a list of variable names. SAS enables you to use the following variable lists:
   numbered range lists
   name range lists
   name prefix lists
   special SAS name lists
These methods can be used in many places where variable names are expected.

SAS Variable Lists - Numbered Range List
Syntax: x1-xn
A numbered range list specifies all variables from x1 to xn inclusive (including the variables named).
You can begin with any number and end with any number. You must follow the rules for user-supplied variable names and the numbers must be consecutive.
   proc print data=prog2.donate;
      var id Qtr2-Qtr4;
   run;

SAS Variable Lists - Numbered Range List
What would be the result of this program if Qtr3 were not in the data set?
   proc print data=prog2.donate;
      var id Qtr2-Qtr4;
   run;
Because the variable Qtr3 is not in the data set, you get an error in the log. The error message indicates that the variable does not exist.

SAS Variable Lists - Name Range List
Syntax: StartVarName--StopVarName
A name range list specifies all variables ordered as they are in the program data vector, from StartVarName to StopVarName, inclusive.
Note that the syntax uses two hyphens.
   proc print data=fakedata;
      var id Name--Salary;
   run;
Variable layout: ID $ 4 | Year 8 | Name $ 24 | State $ 2 | Salary 8 | Jobcode $ 3

SAS Variable Lists - Name Range List
What is the result of the following program?
   proc print data=fakedata;
      var id Name--Salary;
   run;
Variable layout: ID $ 4 | Year 8 | Name $ 24 | State $ 2 | Salary 8 | Jobcode $ 3
The output contains ID followed by Name, State, and Salary.

SAS Variable Lists - Name Range List
Syntax: StartVarName-NUMERIC-StopVarName
        StartVarName-CHARACTER-StopVarName
You can also use the keyword NUMERIC or CHARACTER inside the hyphens to select all the variables of that data type, inclusively.
   proc print data=fakedata;
      var id Name-character-JobCode;
   run;
Variable layout: ID $ 4 | Year 8 | Name $ 24 | State $ 2 | Salary 8 | Jobcode $ 3

SAS Variable Lists - Name Range List
What is the result of the following program?
   proc print data=fakedata;
      var id Name-character-Jobcode;
   run;
Variable layout: ID $ 4 | Year 8 | Name $ 24 | State $ 2 | Salary 8 | Jobcode $ 3
The output contains ID followed by Name, State, and Jobcode. Salary is not displayed because it is a numeric variable.

SAS Variable Lists - Name Prefix List
Syntax: PartVarName:
Providing part of the variable name followed by a colon tells SAS that you want all the variables that start with that string.
The case of the string does not matter.
   data fakedata2;
      set fakedata;
      keep id S:;
   run;
Variable layout: ID $ 4 | Year 8 | Name $ 24 | State $ 2 | Salary 8 | Jobcode $ 3

SAS Variable Lists - Name Prefix List
What is the result of the following program?
   data fakedata2;
      set fakedata;
      keep id S:;
   run;
Variable layout: ID $ 4 | Year 8 | Name $ 24 | State $ 2 | Salary 8 | Jobcode $ 3
The new data set contains ID followed by State and Salary.

SAS Variable Lists - Special SAS Name Lists
Syntax: _ALL_
        _NUMERIC_
        _CHARACTER_
A special SAS name list specifies either all variables, all numeric variables, or all character variables that are defined in the current DATA step.
The case of the keyword does not matter.
Variable layout: ID $ 4 | Year 8 | Name $ 24 | State $ 2 | Salary 8 | Jobcode $ 3
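
The special name lists can be sketched with a small example. The data set name_lists and its values are made up for illustration and are not part of the course data; only the variable names follow the layout above.

```sas
/* Hypothetical data set that mimics part of the layout shown above. */
data name_lists;
   length ID $ 4 Name $ 24;
   ID='A001';
   Name='Sample Person';
   Year=2001;
   Salary=50000;
run;

/* _CHARACTER_ selects every character variable (ID and Name here). */
proc print data=name_lists noobs;
   var _character_;
run;

/* _NUMERIC_ selects every numeric variable (Year and Salary here). */
proc means data=name_lists sum;
   var _numeric_;
run;
```

Because the keywords are resolved from the variables that exist when the step is compiled, the same program keeps working if more character or numeric variables are added later.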

SAS Variable Lists
When you use a SAS variable list in a SAS function, use the keyword OF in front of the first variable name in the list.
   data contrib;
      set prog2.donate;
      Total=sum(of Qtr1-Qtr4);
      if Total ge 50;
   run;
If you omit the OF keyword, SAS treats Qtr1-Qtr4 as an arithmetic expression, so the function receives the single value Qtr1 minus Qtr4.
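
A minimal sketch of the difference the OF keyword makes. The data set of_demo and the quarterly values are invented for illustration.

```sas
/* Hypothetical values; not taken from prog2.donate. */
data of_demo;
   Qtr1=10; Qtr2=20; Qtr3=30; Qtr4=5;
   WithOf=sum(of Qtr1-Qtr4);    /* variable list: 10+20+30+5 = 65   */
   WithoutOf=sum(Qtr1-Qtr4);    /* arithmetic: 10 minus 5 = 5       */
   put WithOf= WithoutOf=;
run;
```

The same pattern answers the earlier question about a longer range: a call such as sum(of Qtr1-Qtr400) would cover all 400 variables, provided each of them exists in the data set.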

A Mailing Label Application
The freqflyers data set contains information about frequent flyers.
Use this data set to create another data set suitable for mailing labels.

A Mailing Label Application
ID is a character variable. Its last digit represents the gender (1 denotes female, 2 denotes male) of the frequent flyer.
prog2.freqflyers
   ID      Name        Address1        Address2
   F31351  Farr,Sue    15 Harvey Rd.   Macon,Bibb,GA,31298
   F161    Cox,Kay B.  163 McNeil Pl.  Kern,Pond,CA,93280
   F212    Mason,Ron   442 Glen Ave.   Miami,Dade,FL,33054
   F25122  Ruth,G. H.  2491 Brady St.  Munger,Bay,MI,48747

A Mailing Label Application
labels
   FullName        Address1        Address2
   Ms. Sue Farr    15 Harvey Rd.   Macon, GA 31298
   Ms. Kay B. Cox  163 McNeil Pl.  Kern, CA 93280
   Mr. Ron Mason   442 Glen Ave.   Miami, FL 33054
   Mr. G. H. Ruth  2491 Brady St.  Munger, MI 48747
The first task is to create a title of Mr. or Ms. based on the last digit of ID.

The SUBSTR Function (Right Side)
The SUBSTR function is used to extract or replace characters.
This form of the SUBSTR function (right side of the assignment statement) extracts characters.
   NewVar=SUBSTR(string,start<,length>);

The SUBSTR Function - Examples
If the length of the created variable is not previously defined with a LENGTH statement, it is the same as the length of the first argument to SUBSTR.
String can be a character constant, variable, or expression.
Start specifies the starting position.
Length specifies the number of characters to extract. If omitted, the substring consists of the remainder of the expression.

The SUBSTR Function (Right Side)
Extract two characters from Location, starting at position 11.
   State=substr(Location,11,2);
   Location $ 18:  Columbus, OH 43227
   State $ 18:     OH

A Mailing Label Application
   proc print data=prog2.freqflyers noobs;
      var ID;
   run;
PROC PRINT Output
   ID
   F31351
   F161
   F212
   F25122
In what position does the last digit of ID occur?
In some values, the last digit is in column 6 and in others it is in column 4.

The RIGHT Function
The RIGHT function returns its argument right-aligned.
Trailing blanks are moved to the start of the value.
   NewVar=RIGHT(argument);
Example:
   NewID=right(ID);
   ID $ 6:     "F161  "
   NewID $ 6:  "  F161"
An argument can be a character constant, variable, or expression.
If the length of the created variable is not previously defined with a LENGTH statement, it is the same as the length of the argument.

The LEFT Function
The LEFT function returns its argument left-aligned.
Leading blanks are moved to the end of the value.
   NewVar=LEFT(argument);
Example:
   NewID=left(ID);
   ID $ 6:     "  F161"
   NewID $ 6:  "F161  "
An argument can be a character constant, variable, or expression.
If the length of the created variable is not previously defined with a LENGTH statement, it is the same as the length of the argument.

A Mailing Label Application
   data labels;
      set prog2.freqflyers;
      if substr(right(ID),6)='1' then Title='Ms.';
      else if substr(right(ID),6)='2' then Title='Mr.';
   run;

   proc print data=labels noobs;
      var ID Title;
   run;
The result of the RIGHT function acts as the first argument to the SUBSTR function.

A Mailing Label Application
The next task is to separate the names of the frequent flyers into two parts.
   Name        FMName  LName
   Farr,Sue    Sue     Farr
   Cox,Kay B.  Kay B.  Cox

The SCAN Function
The SCAN function returns the nth word of a character value.
It is used to extract words from a character value when the relative order of words is known, but their starting positions are not.
   NewVar=SCAN(string,n<,delimiters>);

The SCAN Function
When the SCAN function is used,
   the length of the created variable is 200 bytes if it is not previously defined with a LENGTH statement
   delimiters before the first word have no effect
   any character or set of characters can serve as delimiters
   two or more contiguous delimiters are treated as a single delimiter
   a missing value is returned if there are fewer than n words in string
   if n is negative, the SCAN function selects the word in the character string starting from the end of string.
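
The last two rules above can be sketched as follows. The data set scan_demo and its variable names are hypothetical, and the phrase is chosen only for illustration.

```sas
/* Hypothetical example of a negative n and of asking for a word that does not exist. */
data scan_demo;
   length FirstWord LastWord FourthWord $ 20;   /* set lengths explicitly */
   Phrase='software and services';
   FirstWord=scan(Phrase,1,' ');    /* counts from the left:  software */
   LastWord=scan(Phrase,-1,' ');    /* counts from the right: services */
   FourthWord=scan(Phrase,4,' ');   /* only three words, so the result is blank */
   put FirstWord= LastWord= FourthWord=;
run;
```

Declaring the result variables in a LENGTH statement, as in this sketch, avoids relying on the default length.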

The SCAN Function
Extract the second word of Phrase.
   Second=scan(Phrase,2,' ');
   Phrase $ 21:   software and services
   Second $ 200:  and
With a blank as the delimiter, the words are software (1), and (2), and services (3).

The SCAN Function
Extract the second word of Phrase.
   Second=scan(Phrase,2,':');
   Phrase $ 21:   software and:services
   Second $ 200:  services
With a colon as the delimiter, the words are software and (1) and services (2).

The SCAN Function
   data scan;
      Text='(Thursday July 4, 1776)';
      Var1=scan(Text,1);
      Var2=scan(Text,4);
      Var3=scan(Text,5);
      Var4=scan(Text,2,',');
      Var5=scan(Text,2,',)');
   run;
With the default delimiters, Text contains four words: Thursday (1), July (2), 4 (3), and 1776 (4).
   Variable     Value
   Var1 $ 200   Thursday
   Var2 $ 200   1776
   Var3 $ 200   (missing, because Text has no fifth word)
   Var4 $ 200   " 1776)" (begins with a space)
   Var5 $ 200   " 1776" (begins with a space)

A Mailing Label Application
   data labels;
      length FMName LName $ 10;
      set prog2.freqflyers;
      if substr(right(ID),6)='1' then Title='Ms.';
      else if substr(right(ID),6)='2' then Title='Mr.';
      FMName=scan(Name,2,',');
      LName=scan(Name,1,',');
   run;

A Mailing Label Application
   proc print data=labels noobs;
      var ID Name Title FMName LName;
   run;
PROC PRINT Output
   ID      Name        Title  FMName  LName
   F31351  Farr,Sue    Ms.    Sue     Farr
   F161    Cox,Kay B.  Ms.    Kay B.  Cox
   F212    Mason,Ron   Mr.    Ron     Mason
   F25122  Ruth,G. H.  Mr.    G. H.   Ruth
The next task is to join the values of Title, FMName, and LName into another variable.

Concatenation Operator
The concatenation operator joins character strings.
   NewVar=string1 !! string2;
Depending on the characters available on your keyboard, the symbol to concatenate character values can be two exclamation points (!!), two vertical bars (||), or two broken vertical bars (¦¦).

Concatenation Operator
Combine FMName and LName to create FullName.
   FullName=FMName !! LName;
   FMName $ 10:    "Sue       "
   LName $ 10:     "Farr      "
   FullName $ 20:  "Sue       Farr      "

The TRIM Function
The TRIM function removes trailing blanks from its argument.
If the argument is blank, the TRIM function returns one blank.
The TRIMN function is similar but returns a null string (zero blanks) if the argument is blank.
   NewVar=TRIM(argument1) !! argument2;
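
The difference between TRIM and TRIMN on a blank argument can be sketched by wrapping each result in brackets. The data set trim_demo and the variable names are hypothetical.

```sas
/* Hypothetical example contrasting TRIM and TRIMN on a blank value. */
data trim_demo;
   length Blank $ 5;
   Blank=' ';
   WithTrim='[' !! trim(Blank) !! ']';     /* one blank remains:  [ ] */
   WithTrimn='[' !! trimn(Blank) !! ']';   /* no blank remains:   []  */
   put WithTrim= WithTrimn=;
run;
```

For nonblank arguments the two functions behave the same way, so the choice only matters when a value might be entirely blank.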

The TRIM Function
   data trim;
      length FMName LName $ 10;
      FMName='Sue';
      LName='Farr';
      FullName1=trim(FMName);
      FullName2=trim(FMName) !! LName;
      FullName3=trim(FMName) !! ' ' !! LName;
   run;
   Variable        Value
   FullName1 $ 10  Sue
   FullName2 $ 20  SueFarr
   FullName3 $ 21  Sue Farr

The TRIM Function
The TRIM function does not remove leading blanks from a character argument. Use a combination of the TRIM and LEFT functions to remove leading and trailing blanks from a character argument.
If FMName contained leading blanks, the following assignment statement would correctly concatenate FMName and LName into FullName.
   FullName=trim(left(FMName)) !! ' ' !! LName;

A Mailing Label Application
   data labels(keep=FullName Address1 Address2);
      length FMName LName $ 10;
      set prog2.freqflyers;
      if substr(right(ID),6)='1' then Title='Ms.';
      else if substr(right(ID),6)='2' then Title='Mr.';
      FMName=scan(Name,2,',');
      LName=scan(Name,1,',');
      FullName=Title !! ' ' !! trim(FMName) !! ' ' !! LName;
      Address2=scan(Address2,1,',') !! ', ' !!
               scan(Address2,3,',') !! ' ' !!
               scan(Address2,4,',');
   run;

A Mailing Label Application
   proc print data=labels noobs;
      var FullName Address1 Address2;
   run;
PROC PRINT Output
   FullName        Address1        Address2
   Ms. Sue Farr    15 Harvey Rd.   Macon, GA 31298
   Ms. Kay B. Cox  163 McNeil Pl.  Kern, CA 93280
   Mr. Ron Mason   442 Glen Ave.   Miami, FL 33054
   Mr. G. H. Ruth  2491 Brady St.  Munger, MI 48747

The CATX Function
The CATX function concatenates character strings, removes leading and trailing blanks, and inserts separators.
   CATX(separator, string-1, … string-n)
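
A minimal sketch of CATX on values that carry leading and trailing blanks. The data set catx_demo and the literal values are invented for illustration.

```sas
/* Hypothetical values; FMName deliberately starts with two blanks. */
data catx_demo;
   length Title $ 3 FMName LName $ 10 FullName $ 25;
   Title='Ms.';
   FMName='  Sue';
   LName='Farr';
   /* CATX strips the blanks around each piece and inserts one space between them. */
   FullName=catx(' ',Title,FMName,LName);
   put FullName=;   /* FullName=Ms. Sue Farr */
run;
```

Compared with the TRIM, LEFT, and !! combination, the single call does the stripping and the separator insertion in one step.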

A Mailing Label Application
   data labels(keep=FullName Address1 Address2);
      length FMName LName $ 10;
      set prog2.freqflyers;
      if substr(right(ID),6)='1' then Title='Ms.';
      else if substr(right(ID),6)='2' then Title='Mr.';
      FMName=scan(Name,2,',');
      LName=scan(Name,1,',');
      FullName=catx(' ',Title,FMName,LName);
      Address2=catx(' ',
                    scan(Address2,1,',') || ',',
                    scan(Address2,3,','),
                    scan(Address2,4,','));
   run;

A Mailing Label Application
   proc print data=labels noobs;
      var FullName Address1 Address2;
   run;
PROC PRINT Output
   FullName        Address1        Address2
   Ms. Sue Farr    15 Harvey Rd.   Macon, GA 31298
   Ms. Kay B. Cox  163 McNeil Pl.  Kern, CA 93280
   Mr. Ron Mason   442 Glen Ave.   Miami, FL 33054
   Mr. G. H. Ruth  2491 Brady St.  Munger, MI 48747
Exercises
The MIT Admissions Office received a list of students with perfect SAT scores. The file must be in a format that Admissions can use. Use the People data set to create a temporary SAS data set named Separate that contains the variables First, MI, and Last to perform the following tasks:
1. Create a First and MI variable that contains each person's first name and middle initial. Do not include in the Separate data set.
2. Use the Separate data set to create a temporary data set called flname that contains the variables NewName and CityState. NewName should be a concatenation of each person's first name and last name with one space between them.
3. Create a list report to view the results.

Exercises - Solution 1
   libname prog2 'your-directory';
   data separate(drop=FMnames);
      length FMnames First $ 30
             MI $ 2
             Last $ 30;
      set prog2.people;
      FMnames = left(scan(Name,2,','));
      First = scan(FMnames,1,' ');
      MI = left(scan(FMnames,2,' '));
      Last = scan(Name,1,',');
   run;

   proc print data=separate;
      var Name CityState First MI Last;
   run;

Exercises - Solution 2
   libname prog2 'your-directory';
   data flname(keep=NewName CityState);
      length FMname First MI Last $ 30;
      set prog2.people;
      Last = scan(Name,1,',');
      FMname = left(scan(Name,2,','));
      First = scan(FMname,1,' ');
      MI = scan(FMname,2,' ');
      NewName = trim(First) !! ' ' !! Last;
   run;

   proc print data=flname;
      var NewName CityState;
   run;

Exercises - Output
The SAS System
   Obs  NewName                 CityState
     1  LINDSAY DEAN            WILMINGTON, NC
     2  HELEN-ASHE FLORENTINO   WASHINGTON, DC
     3  JAN VAN ALLSBURG        SHORT HILLS, NJ
     4  STANLEY LAFF            SPRINGFIELD, IL
     5  GEORGE RIZEN            CHICAGO, IL
     6  MARC MITCHELL           CHICAGO, IL
     7  DOROTHY MILLS           JOE, MT
     8  JONATHAN WEBB           MORRISVILLE, NC
     9  MAYNARD KEENAN          SEDONA, AZ
    10  PHYLLIS LACK            WALTHAM, MA
    11  KERRY THOMPSON          WINTER PARK, FL
    12  DOROTHY COX             TIMONIUM, MD
    13  DONALD SEPTOFF          BOSTON, MA
    14  JANICE PHOENIX          SOMERVILLE, NJ
    15  MURRAY HUNEYCUTT        DIME BOX, TX
    16  SHERRY ERICKSON         EL PASO, TX
    17  CLIVE SCHNEIDER         CAPE MAY, NJ
    18  KIMBERLY PUTNAM         DUNWOODY, GA
    19  JENNIFER PITTMAN        BENNINGTON, VT
    20  STACY ROLEN             CODY, WY

Exercises
The MIT Admissions Office likes to review applications based on merit and remove as much identifiable material as possible. They choose to label each application folder with an applicant's initials only.
Using the Separate data set that you recently created, create a temporary data set called init that contains only the variables Name and Initials.
The value of Initials should be a concatenation of the first characters from each person's first name, middle initial, and last name with no delimiters separating the characters.

Exercises
   data init(drop=FName MName LName FMNames);
      length Initials $ 3 LName FMNames FName MName $ 30;
      set prog2.People;
      LName=scan(Name,1,',');
      FMNames=scan(Name,2,',');
      FName=scan(FMNames,1,' ');
      MName=scan(FMNames,2,' ');
      /* Put together just the first letters. */
      Initials=substr(FName,1,1) !! substr(MName,1,1) !! substr(LName,1,1);
   run;

   proc print data=init;
      var Name CityState Initials;
   run;

Exercises
The SAS System
   Obs  Name                        CityState         Initials
     1  DEAN, LINDSAY A.            WILMINGTON, NC    LAD
     2  FLORENTINO, HELEN-ASHE H.   WASHINGTON, DC    HHF
     3  VAN ALLSBURG, JAN F.        SHORT HILLS, NJ   JFV
     4  LAFF, STANLEY X.            SPRINGFIELD, IL   SXL
     5  RIZEN, GEORGE Q.            CHICAGO, IL       GQR
     6  MITCHELL, MARC J.           CHICAGO, IL       MJM
     7  MILLS, DOROTHY E.           JOE, MT           DEM
     8  WEBB, JONATHAN W.           MORRISVILLE, NC   JWW
     9  KEENAN, MAYNARD J.          SEDONA, AZ        MJK
    10  LACK, PHYLLIS M.            WALTHAM, MA       PML
    11  THOMPSON, KERRY L.          WINTER PARK, FL   KLT
    12  COX, DOROTHY E.             TIMONIUM, MD      DEC
    13  SEPTOFF, DONALD E.          BOSTON, MA        DES
    14  PHOENIX, JANICE A.          SOMERVILLE, NJ    JAP
    15  HUNEYCUTT, MURRAY Y.        DIME BOX, TX      MYH
    16  ERICKSON, SHERRY A.         EL PASO, TX       SAE
    17  SCHNEIDER, CLIVE J.         CAPE MAY, NJ      CJS
    18  PUTNAM, KIMBERLY M.         DUNWOODY, GA      KMP
    19  PITTMAN, JENNIFER R.        BENNINGTON, VT    JRP
    20  ROLEN, STACY D.             CODY, WY          SDR

A Search Application
The ffhistory data set contains information about the history of each frequent flyer.
This history information consists of
   each membership level that the flyer attained (Bronze, Silver, or Gold)
   the year that the flyer attained each level.
Create a report that shows all frequent flyers who attained Silver membership status and the year each of them became Silver members.

A Search Application
ffhistory
   ID      Status                             SeatPref
   F31351  Silver 1998,Gold 2000              AISLE
   F161    Bronze 1999                        WINDOW
   F212    Bronze 1992,silver 1995            WINDOW
   F25122  Bronze 1994,Gold 1996,Silver 1998  AISLE
To determine who attained Silver membership status, search the Status variable for the value Silver.

The FIND Function
The FIND function searches for a specific substring of characters within a character string that you specify, and returns its location.
   Position=FIND(target,value<,modifiers,startpos>);
The FIND function returns
   the starting position of the first occurrence of value within target, if value is found
   0, if value is not found.

The FIND Function
A modifier can be the value I or T.
   I indicates that the search is case insensitive.
   T indicates that the search ignores trailing blanks.
The modifiers can be combined. If they are omitted, the search is case sensitive and trailing blanks are taken into consideration.
The startpos is an integer that specifies the position at which the search should start and the direction of the search.
   Positive values = forward (right)
   Negative values = backward (left)
If omitted, the search starts at position 1 and moves right.
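
The modifier and startpos arguments can be sketched as follows. The data set find_demo and its variable names are hypothetical. In this string, IT begins at positions 6, 9, and 13.

```sas
/* Hypothetical example of FIND with a start position and a modifier. */
data find_demo;
   Text='DELIMIT IT WITH BLANKS.';
   FromStart=find(Text,'IT');        /* default start at position 1: returns 6        */
   FromSeven=find(Text,'IT',7);      /* forward search from position 7: returns 9     */
   NoCase=find(Text,'it','i',7);     /* same search, case insensitive                 */
   Backward=find(Text,'IT',-23);     /* starts at position 23 and searches leftward   */
   put FromStart= FromSeven= NoCase= Backward=;
run;
```

A backward search is a convenient way to locate the last occurrence of a substring rather than the first.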

The FIND Function
Determine whether Text contains the string BULL'S-EYE.
   Text="This target contains a BULL'S-EYE.";
   Pos=find(Text,"BULL'S-EYE");
   Text $ 34:  This target contains a BULL'S-EYE.
   Pos N 8:    24

The FIND Function
   data index;
      Text='DELIMIT IT WITH BLANKS.';
      Pos1=find(Text,'IT');
      Pos2=find(Text,' IT ');
      Pos3=find(Text,'it');
      Pos4=find(Text,'it','I');
   run;
   Variable  Value
   Pos1 N 8  6
   Pos2 N 8  8
   Pos3 N 8  0
   Pos4 N 8  6

A Search Application
prog2.ffhistory
   ID      Status                             SeatPref
   F31351  Silver 1998,Gold 2000              AISLE
   F161    Bronze 1999                        WINDOW
   F212    Bronze 1992,silver 1995            WINDOW
   F25122  Bronze 1994,Gold 1996,Silver 1998  AISLE

   data silver;
      set prog2.ffhistory;
      if find(Status,'silver','I') > 0;
   run;

A Search Application
   proc print data=silver noobs;
   run;
PROC PRINT Output
   ID      Status                             SeatPref
   F31351  Silver 1998,Gold 2000              AISLE
   F212    Bronze 1992,silver 1995            WINDOW
   F25122  Bronze 1994,Gold 1996,Silver 1998  AISLE

The INDEX Function
The INDEX function searches a character argument for the location of a specified character value and returns its location.
   Position=INDEX(target,value);
The INDEX function returns
   the starting position of the first occurrence of value within target, if value is found
   0, if value is not found.

The INDEX Function
Target specifies the character expression to search.
Value specifies the string of characters to search for in the character expression.
The search for value is literal. Capitalization and blanks are considered.
The INDEX function differs from the FIND function in that it
   does not have a modifier
   does not have startpos functionality.

The INDEX Function
Determine whether Text contains the string BULL'S-EYE.
   Text="This target contains a BULL'S-EYE.";
   Pos=index(Text,"BULL'S-EYE");
   Text $ 34:  This target contains a BULL'S-EYE.
   Pos N 8:    24

The INDEX Function
   data index;
      Text='DELIMIT IT WITH BLANKS.';
      Pos1=index(Text,'IT');
      Pos2=index(Text,' IT ');
      Pos3=index(Text,'it');
   run;
   Variable  Value
   Pos1 N 8  6
   Pos2 N 8  8
   Pos3 N 8  0

The INDEX Function
   data index2;
      length String $ 5;
      String='IT';
      Text='DELIMIT IT WITH BLANKS.';
      Pos4=index(Text,String);
      Pos5=index(Text,trim(String));
      Pos6=index(Text,' ' !! trim(String) !! ' ');
   run;
   Variable    Value
   String $ 5  "IT   " (padded with three trailing blanks)
   Pos4 N 8    0
   Pos5 N 8    6
   Pos6 N 8    8

A Search Application
prog2.ffhistory
   ID      Status                             SeatPref
   F31351  Silver 1998,Gold 2000              AISLE
   F161    Bronze 1999                        WINDOW
   F212    Bronze 1992,silver 1995            WINDOW
   F25122  Bronze 1994,Gold 1996,Silver 1998  AISLE

   data silver;
      set prog2.ffhistory;
      if index(Status,'Silver') > 0;
   run;

A Search Application
   proc print data=silver noobs;
   run;
PROC PRINT Output
   ID      Status                             SeatPref
   F31351  Silver 1998,Gold 2000              AISLE
   F25122  Bronze 1994,Gold 1996,Silver 1998  AISLE
Why was F212 not selected?
For F212, silver was stored in lowercase. You searched for Silver.

The UPCASE Function
The UPCASE function
   converts all letters in its argument to uppercase
   has no effect on digits and special characters.
   NewVal=UPCASE(argument);
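
A small sketch of UPCASE used inside a search expression. The data set upcase_demo is hypothetical and the Status value is copied from the F212 row of ffhistory. Because the result of UPCASE is only passed to INDEX and never assigned back, the stored value of Status is left as it was.

```sas
/* Hypothetical one-row example based on the F212 value of Status. */
data upcase_demo;
   Status='Bronze 1992,silver 1995';
   ByIndex=index(upcase(Status),'SILVER');   /* case handled by UPCASE: returns 13     */
   ByFind=find(Status,'silver','i');         /* case handled by modifier: returns 13   */
   put Status= ByIndex= ByFind=;             /* Status still contains lowercase silver */
run;
```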

A Search Application
   data silver(drop=Location);
      length Year $ 4;
      set prog2.ffhistory;
      Location=index(upcase(Status),'SILVER');
      if Location > 0;
      Year=substr(Status,Location+7,4);
   run;

   proc print data=silver noobs;
      var ID Status Year SeatPref;
   run;

A Search Application
PROC PRINT Output
   ID      Status                             Year  SeatPref
   F31351  Silver 1998,Gold 2000              1998  AISLE
   F212    Bronze 1992,silver 1995            1995  WINDOW
   F25122  Bronze 1994,Gold 1996,Silver 1998  1998  AISLE
In the statement Year=substr(Status,Location+7,4), Location is the position where the word Silver begins and Location+7 is the position where the year begins.
Did the values of Status permanently change?

The PROPCASE Function
The PROPCASE function converts all words in an argument to proper case, in which the first letter is uppercase and the remaining letters are lowercase.
   NewVal=PROPCASE(argument <,delimiter(s)>);
Default delimiters for the PROPCASE function are the blank, forward slash, hyphen, open parenthesis, period, and tab characters.

A Search Application
   data silver(drop=Location);
      length Year $ 4;
      set prog2.ffhistory;
      Status=propcase(Status,' ,');
      Location=find(Status,'Silver');
      if Location > 0;
      SeatPref=propcase(SeatPref);
      Year=substr(Status,Location+7,4);
   run;

   proc print data=silver noobs;
      var ID Status Year SeatPref;
   run;

A Search Application
PROC PRINT Output
   ID      Status                             Year  SeatPref
   F31351  Silver 1998,Gold 2000              1998  Aisle
   F212    Bronze 1992,Silver 1995            1995  Window
   F25122  Bronze 1994,Gold 1996,Silver 1998  1998  Aisle

The TRANWRD Function
The TRANWRD function replaces or removes all occurrences of a given word (or a pattern of characters) within a character string.
   NewVal=TRANWRD(source,target,replacement);
   source       source string that you want translated
   target       string searched for in source
   replacement  string that replaces the target
The TRANWRD function does not remove trailing blanks from target or replacement.

The TRANWRD Function
If the length of the new variable is not previously defined with a LENGTH statement, the default length is 200 bytes.
Using the TRANWRD function to replace an existing string with a longer string might cause truncation of the resulting value if a LENGTH statement is not used.
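
The truncation risk can be sketched with two result variables of different lengths. The data set tranwrd_demo, the variable names, and the replacement word are invented for illustration.

```sas
/* Hypothetical example: the replacement is longer than the word it replaces. */
data tranwrd_demo;
   length Short $ 11 Long $ 20;
   Short='Pumpkin pie';
   /* The full result needs 18 characters, so Long can hold it. */
   Long=tranwrd(Short,'pie','cheesecake');    /* Pumpkin cheesecake            */
   /* Assigning back to an 11-byte variable cuts the value off. */
   Short=tranwrd(Short,'pie','cheesecake');   /* Pumpkin che                   */
   put Long= Short=;
run;
```

Sizing the receiving variable for the longest expected result, as Long does here, is one way to avoid the problem.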

The TRANWRD Function
Replace the first word of Dessert.
   Dessert=tranwrd(Dessert,'Pumpkin','Apple');
   Dessert $ 20 (before):  Pumpkin pie
   Dessert $ 20 (after):   Apple pie

A Search Application
   data silver(drop=Location);
      length Year $ 4;
      set prog2.ffhistory;
      Status=tranwrd(Status,'silver','Silver');
      Location=index(Status,'Silver');
      if Location > 0;
      Year=substr(Status,Location+7,4);
   run;

   proc print data=silver noobs;
      var ID Status Year SeatPref;
   run;

A Search Application
PROC PRINT Output
   ID      Status                             Year  SeatPref
   F31351  Silver 1998,Gold 2000              1998  AISLE
   F212    Bronze 1992,Silver 1995            1995  WINDOW
   F25122  Bronze 1994,Gold 1996,Silver 1998  1998  AISLE

The LOWCASE Function
The LOWCASE function
   converts all letters in its argument to lowercase
   has no effect on digits and special characters.
   NewVal=LOWCASE(argument);

The SUBSTR Function (Left Side)
The SUBSTR function is used to extract or replace characters.
This form of the SUBSTR function (left side of the assignment statement) replaces characters in a character variable.
   SUBSTR(string,start<,length>)=value;

The SUBSTR Function (Left Side)
   string  specifies a character variable
   start   specifies a numeric expression that is the beginning character position
   length  specifies a numeric expression that is the length of the substring to be replaced
The length value cannot be larger than the remaining length of string after start (including trailing blanks).
If you omit length, SAS uses all the characters on the right side of the assignment statement to replace the values of string up to the limit.
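
A minimal sketch of the left-side form with and without the length argument. The data set substr_demo and the replacement ZIP code are made up for illustration.

```sas
/* Hypothetical example of replacing characters in place. */
data substr_demo;
   length Location $ 18;
   Location='Columbus, GA 43227';
   substr(Location,11,2)='OH';     /* replaces exactly two characters at position 11 */
   substr(Location,14)='43215';    /* length omitted: replaces from position 14 on   */
   put Location=;                  /* Location=Columbus, OH 43215                    */
run;
```

In the second statement the value on the right has five characters and five positions remain in Location, so the replacement fits exactly.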

The SUBSTR Function (Left Side)
Replace two characters from Location starting at position 11.
   substr(Location,11,2)='OH';
   Location $ 18 (before):  Columbus, GA 43227
   Location $ 18 (after):   Columbus, OH 43227

The LOWCASE Function
   data silver;
      set silver;
      substr(SeatPref,2)=lowcase(substr(SeatPref,2));
   run;
For SeatPref $ 6 with the value AISLE, the assignment works in three steps:
   1. SUBSTR on the right side extracts ISLE, the characters from position 2 onward.
   2. LOWCASE converts the extracted characters to isle.
   3. SUBSTR on the left side replaces the characters from position 2 onward, so SeatPref becomes Aisle.

A Search Application
   proc print data=silver noobs;
      var ID Year SeatPref;
   run;
PROC PRINT Output
   ID      Year  SeatPref
   F31351  1998  Aisle
   F212    1995  Window
   F25122  1998  Aisle

Exercises
The Pizza Company tracks feedback from customers. The person who recorded the data misspelled the word received in the data set Complaint.
Write a program to correct this data mistake. Create a new data set called Complaints2 that contains the corrected data. Create a list report to view the results.

Exercises - Solution
   libname apex 'SAS-directory';
   data complaints2;
      set apex.complaint;
      Complaint = tranwrd(Complaint, 'recieve', 'receive');
   run;

   proc print data=complaints2;
   run;

Exercises
The SAS Snack Company wants to create categories for snacks.
Using the Snacks data set, create a temporary SAS data set named Snack_new.
1. Create a variable called Category. The value will depend on the product and should be one of the following: Chips, Pretzels, Pretzel Sticks, Popcorn, Pork Rinds, Crackers, or Puffs. (Note: Pretzel Sticks should go to the Pretzel Category, as well as other snacks with Pretzel in the name.)
2. Create a variable called TotalSales that is the product of the quantity sold and the price. Format the value with dollar signs and two decimal places.
3. Create a frequency report of Category to make sure that all the rows are mapped correctly.
4. Create a list report to view the results.
Exercises - Solution
data snack_new;
set snacks;
length Category $ 12;
if index(lowcase(Product), 'chip') >0 then Category = "Chips";
else if index(lowcase(Product), 'pretzel') >0 then Category = "Pretzels";
else if index(lowcase(Product), 'stick') >0 then Category = "Sticks";
else if index(lowcase(Product), 'popcorn') >0 then Category = "Popcorn";
else if index(lowcase(Product), 'pork rinds') >0 then
Category = "Pork Rinds";
else if index(lowcase(Product), 'cracker') >0 or
index(lowcase(Product), 'saltine') >0 then Category = "Crackers";
else if index(lowcase(Product), 'puffs') >0 then Category = "Puffs";
TotalSales = QtySold*Price;
format TotalSales dollar20.2;
run;
proc freq data=snack_new;
tables category;
run;

Exercises - Output
The FREQ Procedure
                                       Cumulative   Cumulative
   Category     Frequency   Percent     Frequency      Percent
   -----------------------------------------------------------
   Chips            11242     31.43         11242        31.43
   Crackers          5110     14.29         16352        45.71
   Popcorn           4088     11.43         20440        57.14
   Pork Rinds        3066      8.57         23506        65.71
   Pretzels          3066      8.57         26572        74.29
   Puffs             3066      8.57         29638        82.86
   Sticks            6132     17.14         35770       100.00
118118
Exercises
As part of the financial aid process, MIT helps students identify scholarship opportunities from external sources.
Use the People data set to create the temporary
Prairie data set.
Use the appropriate function to search through values of CityState to identify only those applicants from Illinois.
Exercises
data prairie;
   set prog2.people;
   if index(CityState,' IL') > 0;
run;

proc print data=prairie;
run;

The SAS System

Obs    Name                 CityState
  1    LAFF, STANLEY X.     SPRINGFIELD, IL
  2    RIZEN, GEORGE Q.     CHICAGO, IL
  3    MITCHELL, MARC J.    CHICAGO, IL
Exercises
To mail the scholarship information to the proper applicants, MIT needs the address information in the proper format for a letter.
1. Use the variable Name from Prairie to create a
data set called mixedprairie that contains the
values of Name.
2. Convert Name from uppercase to mixed case.
Exercises
data mixedprairie;
   set prairie;
   Name = propcase(Name);
run;

proc print data=mixedprairie;
run;
Objectives
- Use SAS functions to truncate numeric values.
- Use SAS functions to compute sample statistics of numeric values.
Truncation Functions
Selected functions that truncate numeric values include the following:
- ROUND function
- CEIL function
- FLOOR function
- INT function
The ROUND Function
The ROUND function returns a value rounded to the nearest round-off unit.
NewVar=ROUND(argument<,round-off-unit>);
If round-off-unit is not provided, argument is rounded to the nearest integer.
The ROUND Function
data truncate;
   NewVar1=round(12.12);
   NewVar2=round(42.65,.1);
   NewVar3=round(6.478,.01);
   NewVar4=round(96.47,10);
run;

NewVar1    NewVar2    NewVar3    NewVar4
     12       42.7       6.48        100
The CEIL Function
The CEIL function returns the smallest integer greater than or equal to the argument.
NewVar=CEIL(argument);
x=ceil(4.4);
Result: x=5 (4.4 lies between 4 and 5, and CEIL moves up to 5).
The FLOOR Function
The FLOOR function returns the greatest integer less than or equal to the argument.
NewVar=FLOOR(argument);
y=floor(3.6);
Result: y=3 (3.6 lies between 3 and 4, and FLOOR moves down to 3).
The INT Function
The INT function returns the integer portion of the argument.
NewVar=INT(argument);
z=int(3.9);
Result: z=3 (the decimal portion of 3.9 is discarded).
Truncation Functions
data truncate;
   Var1=6.478;
   NewVar1=ceil(Var1);
   NewVar2=floor(Var1);
   NewVar3=int(Var1);
run;

 Var1    NewVar1    NewVar2    NewVar3
6.478          7          6          6
Truncation Functions
Use the same functions with a negative value for the variable Var1.
data truncate;
   Var1=-6.478;
   NewVar1=ceil(Var1);
   NewVar2=floor(Var1);
   NewVar3=int(Var1);
run;

  Var1    NewVar1    NewVar2    NewVar3
-6.478         -6         -7         -6

For values greater than 0, the FLOOR and INT functions return the same value. For values less than 0, the CEIL and INT functions return the same value.
Functions That Compute Statistics
Selected functions that compute sample statistics based on a group of values include the following:
- SUM function (total of values)
- MEAN function (average of values)
- MIN function (lowest value)
- MAX function (highest value)
These functions
- accept multiple arguments in any order
- use the same algorithm as SAS statistical procedures
- ignore missing values.
Functions That Compute Statistics
The SUM function adds values together and ignores missing values.
NewVar=SUM(argument-1,argument-2,…,argument-n);
The MIN function returns the smallest nonmissing value.
MIN(argument-1,argument-2,…,argument-n);
The MAX function returns the largest value.
MAX(argument-1,argument-2,…,argument-n);
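The course material works through SUM and MEAN but gives no example for MIN and MAX. The following is a minimal sketch of how they treat a missing argument. The data set name minmax and the variables Lowest and Highest are illustrative and are not part of the course data.

```sas
data minmax;
   Var1=12;
   Var2=.;
   Var3=6;
   Lowest=min(Var1,Var2,Var3);    /* 6: the missing Var2 is ignored */
   Highest=max(Var1,Var2,Var3);   /* 12 */
run;

proc print data=minmax noobs;
run;
```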
The SUM Function
data summary;
   Var1=12;
   Var2=.;
   Var3=6;
   NewVar=sum(Var1,Var2,Var3);
run;

Var1    Var2    Var3    NewVar
  12       .       6        18

What would be the value of NewVar if an arithmetic operator were used instead of the SUM function?
Missing
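To make the contrast concrete, the sketch below computes the total both ways in one step. The names summary2, SumFunc, and SumPlus are illustrative only.

```sas
data summary2;
   Var1=12;
   Var2=.;
   Var3=6;
   SumFunc=sum(Var1,Var2,Var3);   /* 18: the missing value is ignored */
   SumPlus=Var1+Var2+Var3;        /* missing: the + operator propagates missing values */
run;

proc print data=summary2 noobs;
run;
```

When the + operator meets a missing operand, the log also carries a note that missing values were generated.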
The MEAN Function
The MEAN function calculates the arithmetic mean (average) of values and ignores missing values.
NewVar=MEAN(argument-1,argument-2,…,argument-n);
The MEAN Function
data summary;
   Var1=12;
   Var2=.;
   Var3=6;
   NewVar=mean(Var1,Var2,Var3);
run;

Var1    Var2    Var3    NewVar
  12       .       6         9
Exercises
Final grades are coming.
Use the data set Grade to create a data set named
Final. The Final data set should contain a variable
named Overall that is the semester average grade.
Calculate Overall by averaging all the tests plus the final. The final is weighted twice as much as any of the other tests. (Count the final twice when calculating Overall.)
Round Overall to the nearest integer.
Exercises
data final;
   set prog.grade;
   Overall=round(mean(Test1,Test2,Test3,Final,Final));
run;

proc print data=final;
run;

Alternate Solution
data final;
   set prog.grade;
   Overall=round(mean(of Test1-Test3,Final,Final));
run;

proc print data=final;
run;
Exercises
The SAS System

SSN            Course       Test1    Test2    Test3    Final    Overall
012-40-4928    BUS450          80       70       80       80         78
012-83-3816    BUS450          90       90       60       80         80
341-44-0781    MATH400         78       87       90       91         87
423-01-7721    BUS450          80       70       75       95         83
448-23-8111    MATH400         88       91      100       95         94
723-14-8422    HIST100         88       90       91       95         92
819-32-1294    HIST100         67       80       60       70         69
831-34-2411    MATH400         72       76       82       79         78
837-33-8374    HIST100         90       99       87       96         94
877-22-7731    MATH400         87       85       80       78         82
880-90-0783    HIST400         50       70       78       80         72
920-22-0209    MATH400         79       87       81       82         82
973-34-2119    BUS450          80       75       88       90         85
877-22-7731    SCI400          80       70       80       80         78
012-40-4928    FRENCH100       80       70       80       80         78
819-32-1294    FRENCH100       67       80       60       70         69
819-32-1294    BUS450          67       80       60       70         69
723-14-8422    SCI400          79       87       81       82         82
837-33-8374    SCI400          79       87       81       82         82
Exercises
Final grades are coming.
Modify the DATA step in the previous exercise so that the value of Overall is the average of the two highest test scores and the final. (The lowest test should not be used to calculate Overall.)
As before, the final exam should be counted twice.
Round Overall to the nearest integer.
Exercises
data final(drop=OverallTotal);
   set prog2.grade;
   /* Total of all tests plus the final counted twice, minus the lowest test. */
   OverallTotal=sum(Test1,Test2,Test3,Final,Final) - min(Test1,Test2,Test3);
   Overall=round(OverallTotal/4);
run;

proc print data=final;
run;

Alternate Solution
data final(drop=OverallTotal);
   set prog2.grade;
   OverallTotal=sum(of Test1-Test3,Final,Final) - min(of Test1-Test3);
   Overall=round(OverallTotal/4);
run;

proc print data=final;
run;
Exercises
The SAS System

Obs    SSN            Course       Test1    Test2    Test3    Final    Overall
  1    012-40-4928    BUS450          80       70       80       80         80
  2    012-83-3816    BUS450          90       90       60       80         85
  3    341-44-0781    MATH400         78       87       90       91         90
  4    423-01-7721    BUS450          80       70       75       95         86
  5    448-23-8111    MATH400         88       91      100       95         95
  6    723-14-8422    HIST100         88       90       91       95         93
  7    819-32-1294    HIST100         67       80       60       70         72
  8    831-34-2411    MATH400         72       76       82       79         79
  9    837-33-8374    HIST100         90       99       87       96         95
 10    877-22-7731    MATH400         87       85       80       78         82
 11    880-90-0783    HIST400         50       70       78       80         77
 12    920-22-0209    MATH400         79       87       81       82         83
 13    973-34-2119    BUS450          80       75       88       90         87
 14    877-22-7731    SCI400          80       70       80       80         80
 15    012-40-4928    FRENCH100       80       70       80       80         80
 16    819-32-1294    FRENCH100       67       80       60       70         72
 17    819-32-1294    BUS450          67       80       60       70         72
 18    723-14-8422    SCI400          79       87       81       82         83
 19    837-33-8374    SCI400          79       87       81       82         83
Objectives
- Review SAS functions used to create SAS date values.
- Review SAS functions used to extract information from SAS date values.
- Use SAS functions to determine the intervals between two SAS date values.
Creating SAS Date Values
You can use the MDY or TODAY functions to create SAS date values.
The MDY function creates a SAS date value from month, day, and year values.
NewDate=MDY(month,day,year);
The TODAY function returns the current date as a SAS date value.
NewDate=TODAY();
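A short sketch of both functions in one DATA step may help. The data set name dates and the variable names Fixed and RunDate are illustrative assumptions.

```sas
data dates;
   Fixed=mdy(5,3,2008);          /* May 3, 2008 as a SAS date value */
   RunDate=today();              /* depends on the day the step runs */
   format Fixed RunDate date9.;  /* display the numeric values as dates */
run;

proc print data=dates noobs;
run;
```

With the DATE9. format, Fixed prints as 03MAY2008. Without a format, both variables print as the number of days since January 1, 1960.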
Extracting Information
You can use the MONTH, DAY, and YEAR functions to extract information from SAS date values.
The MONTH function creates a numeric value (1-12) that represents the month of a SAS date value.
NewMonth=MONTH(SAS-date-value);
Extracting Information
The DAY function creates a numeric value (1-31) that represents the day of a SAS date value.
NewDay=DAY(SAS-date-value);
The YEAR function creates a four-digit numeric value that represents the year.
NewYear=YEAR(SAS-date-value);
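As an illustration, the three extraction functions can be applied to a single date constant. The data set name parts and the variable SasDate are hypothetical.

```sas
data parts;
   SasDate='03may2008'd;
   NewMonth=month(SasDate);   /* 5 */
   NewDay=day(SasDate);       /* 3 */
   NewYear=year(SasDate);     /* 2008 */
   format SasDate date9.;
run;

proc print data=parts noobs;
run;
```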
Calculating an Interval of Years
The YRDIF function returns the number of years between two SAS date values.
NewVal=YRDIF(sdate,edate,basis);
Calculating an Interval of Years
Valid values for basis:
'ACT/ACT'   Uses the actual number of days between the dates. The number of years is the number of days that fall in 365-day years divided by 365, plus the number of days that fall in 366-day years divided by 366. 'ACTUAL' is an alias.
'30/360'    Treats each month as 30 days and each year as 360 days, regardless of the actual number. '360' is an alias.
'ACT/360'   Uses the actual number of days between the dates, divided by 360 regardless of the actual number of days in a year.
'ACT/365'   Uses the actual number of days between the dates, divided by 365 regardless of the actual number of days in a year.
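One way to see how the basis values differ is to compute the same interval with each of them. This is only a sketch: the data set name basis, the variable names, and the two date constants are assumptions chosen for illustration.

```sas
data basis;
   StartDate='08nov1972'd;
   EndDate='03may2008'd;
   ActAct=yrdif(StartDate,EndDate,'ACT/ACT');
   Thirty360=yrdif(StartDate,EndDate,'30/360');
   Act360=yrdif(StartDate,EndDate,'ACT/360');
   Act365=yrdif(StartDate,EndDate,'ACT/365');
   format StartDate EndDate date9.;
run;

proc print data=basis noobs;
run;
```

The four results should be close but not identical, because each basis divides by a different year length.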
The YRDIF Function
The variable DOB represents a person's date of birth.
Assume today's date is May 3, 2008, and DOB is November 8, 1972. What is this person's age?
MyVal=yrdif(DOB,'3may2008'd,'act/act');
The DATDIF function can be used to return the number of days between two SAS date values. Only two basis values are valid for the DATDIF function: 'ACT/ACT' and '30/360'.
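The course material mentions DATDIF without an example. The following is a minimal sketch that uses the same two dates; the data set name days and the variable names are illustrative.

```sas
data days;
   DOB='08nov1972'd;
   EndDate='03may2008'd;
   ActualDays=datdif(DOB,EndDate,'ACT/ACT');   /* actual number of days */
   Days30_360=datdif(DOB,EndDate,'30/360');    /* every month counted as 30 days */
   Check=EndDate-DOB;                          /* plain subtraction also gives the actual day count */
run;

proc print data=days noobs;
run;
```

ActualDays and Check should agree, because SAS date values are day counts.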
The YRDIF Function
The variable DOB represents a person's date of birth.
Assume today's date is May 3, 2008, and DOB is November 8, 1972. What is this person's age?
MyVal=yrdif(DOB,'3may2008'd,'act/act');

MyVal
35.4836

How can you alter this program to
- compute each employee's age based on today's date
- truncate all of the decimal places without rounding?
The YRDIF Function
How can you alter this program to
- compute each employee's age based on today's date
- truncate all of the decimal places without rounding?
MyVal=int(yrdif(DOB,today(),'act/act'));

MyVal
35

This code was run on July 13, 2008. Your values will differ.
Exercises
In order to vote in most states, voters must be 18 years of age by the date of the election.
- Use the Register data set to create two new data sets called Voters and NonVoters.
- Use the existing BirthMonth, Day, and BirthYear to create a new variable called Birthday that stores the SAS date value for each voter's birth date.
- Create a second new variable called Age that stores the number of years between each voter's birthday and today.
- The values of Birthday should be displayed with the DATE9. format. The value of Age should be truncated to remove all decimals without rounding.
- Produce two listing reports with appropriate titles.
Exercises
data voters nonvoters;
   keep StudentId Name Birthday Age;
   set prog.register;
   Birthday=mdy(BirthMonth,Day,BirthYear);
   /* The FLOOR function could be used in the following assignment statement. */
   Age=int(yrdif(Birthday,today(),'act/act'));
   format Birthday date9.;
   if Age >= 18 then output voters;
   else output nonvoters;
run;

title 'Students who are over 18 and can vote';
proc print data=voters;
run;

title 'Students who are not over 18 and cannot vote';
proc print data=nonvoters;
run;
Exercises
Partial Listing of Voters
Your output will contain a different number of rows.

Students who are over 18 and can vote

       Student
Obs       ID      Name                 Birthday     Age
  1      1155     Angel Reisman        23JUN1987     19
  2      1266     Melanie Michaels     17MAR1988     18
  3      2055     Faith Sadowski       06FEB1988     18
  4      2561     Dorothy Gilbert      16APR1988     18
  5      2584     Patrice Ray          18AUG1988     18
  6      2587     Jeremiah Ashford     26SEP1987     19
  7      2600     Alisha Gurman        21DEC1987     18
  8      2606     Gustavo Spencer      16SEP1988     18
  9      2681     Ryan Lin             03MAR1988     18
 10      3213     Thomas Gladwell      09JUN1988     18
 11      3250     Misty Orlowski       22OCT1987     19
 12      3456     Ruby Abdul           09JUN1988     18
Exercises
Partial Listing of NonVoters
Your output will contain a different number of rows.

Students who are not over 18 and cannot vote

       Student
Obs       ID      Name                 Birthday     Age
  1      1005     Chaz Richardson      21JUN1989     17
  2      1154     Barbara Muir         04APR1990     16
  3      1245     Leticia Ritter       27MAR1990     16
  4      1257     Richard Calle        01SEP1989     17
  5      1258     Ronnie Trimpin       03OCT1989     17
  6      2001     Troy Pruska          05AUG1989     17
  7      2006     Annie Ritter         25FEB1990     16
  8      2046     Derrick Ikiz         05MAR1989     17
  9      2334     Jesse Liu            03AUG1991     15
 10      2335     Taylor Lowet         07AUG1989     17
 . . .
Objectives
- Describe the automatic conversion of character data into numeric data.
- Explicitly convert character data into numeric data.
- Describe the automatic conversion of numeric data into character data.
- Explicitly convert numeric data into character data.
Data Conversion
In many applications, you might need to convert one data type to another.
- You might need to read digits in character form into a numeric value.
- You might need to write a numeric value to a character string.
Data Conversion
You can convert data types by using one of the following methods:
- implicitly, by enabling SAS to do it for you
- explicitly, with these functions:
   – INPUT   character-to-numeric conversion
   – PUT     numeric-to-character conversion
Automatic Character-to-Numeric Conversion
The prog2.salary1 data set contains a character variable GrossPay. Compute a 10-percent bonus for each employee.
What will happen when the character values of GrossPay are used in an arithmetic expression?
Automatic Character-to-Numeric Conversion
prog2.salary1

ID             GrossPay
$11            $5
201-92-2498    52000
482-87-7945    32000
330-40-7172    49000

data bonuses;
   set prog2.salary1;
   Bonus=.10*GrossPay;
run;
Automatic Character-to-Numeric Conversion
Partial Log

2    data bonuses;
3       set prog2.salary1;
4       Bonus=.10*GrossPay;
5    run;

NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
      4:14
NOTE: The data set WORK.BONUSES has 3 observations and 3 variables.
Automatic Character-to-Numeric Conversion
proc print data=bonuses noobs;
run;

PROC PRINT Output

ID             GrossPay    Bonus
201-92-2498    52000        5200
482-87-7945    32000        3200
330-40-7172    49000        4900
Automatic Character-to-Numeric Conversion
SAS automatically converts a character value to a numeric value when the character value is used in a numeric context, such as the following:
- assignment to a numeric variable
- an arithmetic operation
- logical comparison with a numeric value
- a function that takes numeric arguments
Automatic Character-to-Numeric Conversion
The automatic conversion
- uses the w. informat
- produces a numeric missing value from a character value that does not conform to standard numeric notation (digits with optional decimal point and/or leading sign and/or E-notation).
Automatic Character-to-Numeric Conversion

Character value    Numeric value after automatic conversion
52000              52000
1.243E1            12.43
-8.96              -8.96
1,742.64           .
The INPUT Function
The INPUT function is used primarily for converting character values to numeric values.
NumVar=INPUT(source,informat);
The INPUT function returns the value produced when source is read with informat.
The INPUT Function
data conversion;
   CVar1='32000';
   CVar2='32,000';
   CVar3='03may2008';
   CVar4='050308';
   NVar1=input(CVar1,5.);
   NVar2=input(CVar2,comma6.);
   NVar3=input(CVar3,date9.);
   NVar4=input(CVar4,mmddyy6.);
run;
The INPUT Function
proc contents data=conversion;
run;

Partial PROC CONTENTS Output

----Alphabetic List of Variables and Attributes----

#    Variable    Type    Len
----------------------------
1    CVar1       Char      5
2    CVar2       Char      6
3    CVar3       Char      9
4    CVar4       Char      6
5    NVar1       Num       8
6    NVar2       Num       8
7    NVar3       Num       8
8    NVar4       Num       8
The INPUT Function
proc print data=conversion noobs;
run;

PROC PRINT Output

CVar1    CVar2     CVar3        CVar4     NVar1    NVar2    NVar3    NVar4
32000    32,000    03may2008    050308    32000    32000    17655    17655
Explicit Character-to-Numeric Conversion
The values of the variable GrossPay in the SAS data set prog2.salary2 contain commas. Attempt to use automatic conversion to compute a 10-percent bonus.

prog2.salary2

ID             GrossPay
$11            $6
201-92-2498    52,000
482-87-7945    32,000
330-40-7172    49,000
Explicit Character-to-Numeric Conversion
data bonuses;
   set prog2.salary2;
   Bonus=.10*GrossPay;
run;

proc print data=bonuses;
run;

PROC PRINT Output

ID             GrossPay    Bonus
201-92-2498    52,000          .
482-87-7945    32,000          .
330-40-7172    49,000          .
Explicit Character-to-Numeric Conversion
data bonuses;
   set prog2.salary2;
   Bonus=.10*input(GrossPay,comma6.);
run;

proc print data=bonuses;
run;

PROC PRINT Output

ID             GrossPay    Bonus
201-92-2498    52,000       5200
482-87-7945    32,000       3200
330-40-7172    49,000       4900

c05s5d2.sas
Data Conversion
proc contents data=bonuses;
run;

Partial PROC CONTENTS Output

----Alphabetic List of Variables and Attributes----

#    Variable    Type    Len
----------------------------
3    Bonus       Num       8
2    GrossPay    Char      6
1    ID          Char     11

How can you convert GrossPay to a numeric variable with the same name?
Data Conversion
You cannot convert data by assigning the converted variable value to a variable with the same name.
GrossPay=input(GrossPay,comma6.);
This assignment statement does not change GrossPay from a character variable to a numeric variable.
Data Conversion
On the left side of the assignment statement, you want GrossPay to be numeric. However, on the right side of the assignment statement, GrossPay is character.
GrossPay=input(GrossPay,comma6.);
A variable is character or numeric. After the variable type is established, it cannot be changed.
Data Conversion
First, use the RENAME= data set option to rename the variable that you want to convert.
SAS-data-set(RENAME=(old-name=new-name))
data bonuses;
   set prog2.salary2(rename=(GrossPay=CharGross));
   /* additional SAS statements */
run;
To rename more than one variable from the same data set, separate the old-name=new-name pairs with a space.
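As a sketch of renaming two variables in one RENAME= option, the step below uses a small stand-in for prog2.salary2 so that it runs on its own. The stand-in data set salary2 and the new name EmpID are hypothetical and do not come from the course data.

```sas
/* Stand-in for prog2.salary2 (assumed structure: ID $11, GrossPay $6). */
data salary2;
   length ID $ 11 GrossPay $ 6;
   ID='201-92-2498';
   GrossPay='52,000';
   output;
run;

data bonuses(drop=CharGross);
   /* Two old-name=new-name pairs separated by a space. */
   set salary2(rename=(GrossPay=CharGross ID=EmpID));
   GrossPay=input(CharGross,comma6.);
run;

proc print data=bonuses noobs;
run;
```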
Data Conversion
Second, use the INPUT function in an assignment statement to create a new variable whose name is the original name of the variable you renamed previously.
data bonuses;
   set prog2.salary2(rename=(GrossPay=CharGross));
   GrossPay=input(CharGross,comma6.);
   /* additional SAS statements */
run;
Data Conversion
Third, use a DROP= data set option in the DATA statement to exclude the original variable from the output SAS data set.
data bonuses(drop=CharGross);
   set prog2.salary2(rename=(GrossPay=CharGross));
   GrossPay=input(CharGross,comma6.);
   Bonus=.10*GrossPay;
run;
Data Conversion: Compilation
data bonuses(drop=CharGross);
   set prog2.salary2(rename=(GrossPay=CharGross));
   GrossPay=input(CharGross,comma6.);
   Bonus=.10*GrossPay;
run;

PDV

ID      CharGross    GrossPay    Bonus
$ 11    $ 6          N 8         N 8
        D

D marks CharGross as flagged to be dropped from the output data set.
Converting Character Dates to SAS Dates
prog2.born

Name            Date
$12             $7
Ruth, G. H.     13apr72
Delgado, Ed     25aug68
Overby, Phil    08jun71

data birth(drop=Date);
   set prog2.born;
   Birthday=input(Date,date7.);
   Age=int(yrdif(Birthday,'3may2008'd,'ACT/ACT'));
run;

How can you alter this program to compute each person's age based on today's date?
Converting Character Dates to SAS Dates
Use the TODAY function.

data birth(drop=Date);
   set prog2.born;
   Birthday=input(Date,date7.);
   Age=int(yrdif(Birthday,today(),'ACT/ACT'));
run;
Converting Character Dates to SAS Dates
proc print data=birth noobs;
run;

PROC PRINT Output

Name            Birthday    Age
Ruth, G. H.         4486     36
Delgado, Ed         3159     39
Overby, Phil        4176     36
Automatic Numeric-to-Character ConversionThe prog2.phones data set contains a numeric
variable Code (area code) and a character variable
Telephone (telephone number). Create a character variable that contains the area code in parentheses followed by the telephone number.
Automatic Numeric-to-Character Conversion
prog2.phones

Code    Telephone
N8      $8
303     393-0956
919     770-8292
301     449-5239

data phonenumbers;
   set prog2.phones;
   Phone='(' !! Code !! ') ' !! Telephone;
run;
Automatic Numeric-to-Character Conversion
Partial Log

13   data phonenumbers;
14      set prog2.phones;
15      Phone='(' !! Code !! ') ' !! Telephone;
16   run;

NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
      15:17
NOTE: The data set WORK.PHONENUMBERS has 3 observations and 3 variables.
Automatic Numeric-to-Character Conversion
proc print data=phonenumbers noobs;
run;

PROC PRINT Output

Code    Telephone    Phone
 303    393-0956     (         303) 393-0956
 919    770-8292     (         919) 770-8292
 301    449-5239     (         301) 449-5239
Automatic Numeric-to-Character Conversion
SAS automatically converts a numeric value to a character value when the numeric value is used in a character context, such as
- assignment to a character variable
- a concatenation operation
- a function that accepts character arguments.
Automatic Numeric-to-Character Conversion
The automatic conversion
- uses the BEST12. format
- right-aligns the resulting character value.

Numeric value (8 bytes):       303
Character value (12 bytes):    303, preceded by 9 leading blanks
Automatic Numeric-to-Character Conversion
data phonenumbers;
   set prog2.phones;
   Phone='(' !! Code !! ') ' !! Telephone;
run;

Phone
$23
(         303) 393-0956

The converted area code carries 9 leading blanks.
The PUT Function
The PUT function writes values with a specific format.
CharVar=PUT(source,format);
The PUT function returns the value produced when source is written with format.
The PUT Function
data conversion;
   NVar1=614;
   NVar2=55000;
   NVar3=366;
   CVar1=put(NVar1,3.);
   CVar2=put(NVar2,dollar7.);
   CVar3=put(NVar3,date9.);
run;
The PUT Function
proc contents data=conversion varnum;
run;

Partial PROC CONTENTS Output

-----Variables Ordered by Position-----

#    Variable    Type    Len
----------------------------
1    NVar1       Num       8
2    NVar2       Num       8
3    NVar3       Num       8
4    CVar1       Char      3
5    CVar2       Char      7
6    CVar3       Char      9

The VARNUM option in the PROC CONTENTS statement prints a list of the variables by their logical position in the data set.
The PUT Function
proc print data=conversion noobs;
run;

PROC PRINT Output

NVar1    NVar2    NVar3    CVar1    CVar2      CVar3
  614    55000      366    614      $55,000    01JAN1961
209209
20 data phonenumbers;21 set prog2.phone;22 Phone='(' !! put(Code,3.) !! ') ' !! Telephone;23 run;
NOTE: The data set WORK.PHONENUMBERS has 3 observations and 3 variables.
Explicit Numeric-to-Character Conversiondata phonenumbers; set prog2.phones; Phone='(' !! put(Code,3.) !! ') ' !! Telephone;run;
Partial Log
Explicit Numeric-to-Character Conversion
proc print data=phonenumbers noobs;
run;

PROC PRINT Output

Code    Telephone    Phone
 303    393-0956     (303) 393-0956
 919    770-8292     (919) 770-8292
 301    449-5239     (301) 449-5239
Exercises
Use the data set prog.students to create a new data set named students. Create a new character variable Telephone that has the pattern XXX-XXXX, where XXXXXXX is the value of Number.
Recall the previous program and alter it to create a new numeric variable Birthday from the DOB variable. Birthday should be a SAS date value that is displayed with the MMDDYY10. format.
When you are confident that both variables were converted correctly, use a DROP= or KEEP= data set option to ensure that the only variables in the students data set are SSN, Telephone, and Birthday.
Exercises
libname prog 'SAS-directory';

data students(drop=Number DOB);
   set prog.students;
   /* The PUT function is used to convert NUMBER from numeric to
      character. The resulting character value is manipulated with
      the SUBSTR function to extract the first 3 characters and then
      the last 4 characters. */
   Telephone=substr(put(Number,7.),1,3) || '-' || substr(put(Number,7.),4);
   /* The INPUT function is used to convert DOB from character to
      numeric. The character values are in the form ddMMMyyyy, so
      the DATE9. informat is used in the conversion. */
   Birthday=input(DOB,date9.);
   format Birthday mmddyy10.;
run;

proc print data=students;
run;
Exercises
The SAS System

SSN            Telephone    Birthday
012-40-4928    546-7887     12/05/1968
012-83-3816    688-8321     05/03/1965
341-44-0781    941-8123     11/23/1972
423-01-7721    783-9191     06/28/1967
448-23-8111    942-8122     11/30/1960
723-14-8422    828-0911     02/12/1964
819-32-1294    387-8181     09/01/1968
831-34-2411    967-7810     12/24/1972
837-33-8374    992-7615     10/06/1971
877-22-7731    233-7449     07/08/1969
Exercises
Use the data set agents to create a data set called work.agents2. Create a variable called TrueLocation that puts the City and State together, separated by a comma. (Desired output is shown on the next slide.)
Things to think about:
1. Do you need to do this for all observations?
2. If your output looks unexpected, try using the $QUOTE. format. (You do not need to specify a width.)
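The $QUOTE. hint can be tried in isolation. The sketch below writes a value to the log in quotation marks so that any leading blank becomes visible. The literal 'Anchorage, USA' is taken from the desired output; the _NULL_ step and the variable Country are illustrative assumptions.

```sas
data _null_;
   CityCountry='Anchorage, USA';
   Country=scan(CityCountry,2,',');   /* keeps the blank that follows the comma */
   put Country= $quote.;              /* quotation marks expose the leading blank */
   Country=left(Country);             /* LEFT moves leading blanks to the end */
   put Country= $quote.;
run;
```

Comparing the two log lines should show why a test such as Country='USA' can fail before the value is left-aligned.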
Exercises
Work.Agents2 Output

TrueLocation                CityCountry                 State
Auckland, New Zealand       Auckland, New Zealand
Amsterdam, Netherlands      Amsterdam, Netherlands
Anchorage, Alaska           Anchorage, USA              Alaska
Canberra, Australia         Canberra, Australia         Australian Capital
Athens (Athinai), Greece    Athens (Athinai), Greece
Birmingham, Alabama         Birmingham, USA             Alabama
Bangkok, Thailand           Bangkok, Thailand
Nashville, Tennessee        Nashville, USA              Tennessee
Boston, Massachusetts       Boston, USA                 Massachusetts
Kansas City, Missouri       Kansas City, USA            Missouri
Exercises
libname prog2 'SAS-directory';

data work.agents2 (drop=Country);
   set prog2.agents;
   length Country $ 20 TrueLocation $ 40;
   /* SCAN returns the country with a leading blank (the space after
      the comma); LEFT left-aligns it so that it compares equal to 'USA'. */
   Country=left(scan(CityCountry,2,','));
   if Country='USA' then
      TrueLocation = scan(CityCountry,1,',') !! ', ' !! State;
   else /* not USA */
      TrueLocation = CityCountry;
run;

proc print data=work.agents2 noobs;
   var TrueLocation CityCountry State;
run;