The Essence of DATA Step Programming
DESCRIPTION
The foundation of SAS programming is DATA step programming, and the essence of DATA step programming is understanding how SAS processes data during the compilation and execution phases. In this paper, you will see what happens “behind the scenes” while a SAS dataset is created. You will learn how a new dataset is built, one observation at a time, from either a raw text file or an existing SAS dataset to the program data vector (PDV), and from the PDV to the newly created SAS dataset. Once you fully understand DATA step processing, the SUM and RETAIN statements become easier to grasp. This paper also covers the related topic of BY-group processing.
TRANSCRIPT
![Page 1: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/1.jpg)
The Essence of DATA Step Programming
Arthur Li
City of Hope Comprehensive Cancer Center
Department of Information Science
![Page 2: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/2.jpg)
INTRODUCTION
Fundamental of SAS programming: DATA step programming
Essence of DATA step programming: understanding how SAS processes the data during the compilation and execution phases
![Page 3: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/3.jpg)
A COMMON BEFUDDLEMENT
The newly created SAS dataset is not what we intended: there are more or fewer observations than expected, or the value of a variable was not retained correctly.
Reason: learning only SAS language syntax without understanding the fundamental SAS programming concepts.
![Page 4: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/4.jpg)
INTRODUCTION
We will cover:
What happens “behind the scenes” while creating a SAS dataset
How a new dataset is created, one observation at a time: raw text file or SAS dataset → PDV → SAS dataset
The SUM and RETAIN statements
BY-group processing
Transposing dataset examples
![Page 5: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/5.jpg)
DATA STEP PROCESSING OVERVIEW
A DATA step is processed in a two-phase sequence:
Compilation phase: each statement is scanned for syntax errors.
Execution phase: if there is no syntax error, the DATA step reads and processes the input data.
![Page 6: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/6.jpg)
DATA STEP PROCESSING OVERVIEW
Example1.txt:
12345678901234567890
Barbara 61 12D
John    62 175
The weight value 12D in the first record is a data entry error.
Variable names and columns:
Name 1-7
Height 9-10
Weight 12-14
Program1:
data ex1;
    infile 'C:\Arthur\example1.txt';
    input name $ 1-7 height 9-10 weight 12-14;
    BMI = 700*weight/(height*height);
    output;
run;
The column input method: each variable occupies a fixed field, and the values are standard character or numeric values.
The assignment statement creates a new variable: BMI.
![Page 7: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/7.jpg)
COMPILATION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Input buffer: (empty)
The input buffer is used to hold raw data. It is not created when reading a SAS dataset.
![Page 8: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/8.jpg)
COMPILATION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Input buffer: (empty)
The PDV is created: a memory area where SAS builds its new dataset, one observation at a time.
PDV: _N_ (D), _ERROR_ (D)
![Page 9: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/9.jpg)
COMPILATION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Input buffer: (empty)
The PDV is created with two automatic variables.
_N_ = 1: the 1st observation is being processed. _N_ = 2: the 2nd observation is being processed.
PDV: _N_ (D), _ERROR_ (D)
![Page 10: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/10.jpg)
COMPILATION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Input buffer: (empty)
The PDV is created with two automatic variables.
_ERROR_ = 1: signals a data error in the currently processed observation.
PDV: _N_ (D), _ERROR_ (D)
![Page 11: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/11.jpg)
COMPILATION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Input buffer: (empty)
A space is added to the PDV for each variable in the INPUT statement.
PDV: _N_ (D), _ERROR_ (D), Name (K), Height (K), Weight (K)
![Page 12: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/12.jpg)
COMPILATION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Input buffer: (empty)
BMI is added to the PDV.
PDV: _N_ (D), _ERROR_ (D), Name (K), Height (K), Weight (K), BMI (K)
![Page 13: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/13.jpg)
COMPILATION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Input buffer: (empty)
PDV: _N_ (D), _ERROR_ (D), Name (K), Height (K), Weight (K), BMI (K)
D = dropped (not written to the output dataset)
K = kept (written to the output dataset)
![Page 14: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/14.jpg)
COMPILATION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Input buffer: (empty)
PDV: _N_ (D), _ERROR_ (D), Name (K), Height (K), Weight (K), BMI (K)
SAS checks for syntax errors: invalid variable names, invalid options, incorrect punctuation, and misspelled keywords.
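As a hedged illustration of the compilation phase, the sketch below uses a hypothetical dataset name (compile_demo) and hypothetical variables (x, neverset) that are not part of the original slides. Because the PDV is laid out at compile time, a variable that is mentioned only in a branch that never executes should still appear in the output dataset, with a missing value.

```sas
/* Hypothetical example: names are illustrative, not from the slides. */
data compile_demo;
    x = 1;
    if 0 then neverset = 1;   /* never executes, but the compiler still adds NEVERSET to the PDV */
run;

/* Lists the variables that were written to the dataset: X and NEVERSET. */
proc contents data=compile_demo;
run;
```

PROC CONTENTS is used here only to list the variables that ended up in the dataset.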
![Page 15: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/15.jpg)
EXECUTION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Input buffer: (empty)
PDV: _N_ (D), _ERROR_ (D), Name (K), Height (K), Weight (K), BMI (K)
The DATA step works like a loop: it repetitively executes statements, reads data values, and creates observations one at a time.
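A minimal sketch for watching this loop in the SAS log, assuming the two records of Example1.txt are supplied in-stream with DATALINES instead of the external file (the dataset name ex1_trace is hypothetical). PUT _ALL_ writes the variables in the PDV, including _N_ and _ERROR_, once per iteration.

```sas
/* Hypothetical variant of Program1 that reads in-stream data and traces the PDV. */
data ex1_trace;
    infile datalines;
    input name $ 1-7 height 9-10 weight 12-14;
    BMI = 700*weight/(height*height);
    put _all_;   /* one log line per iteration showing the current PDV contents */
    output;
    datalines;
Barbara 61 12D
John    62 175
;
run;
```

In the first iteration the log should show _ERROR_=1 with Weight and BMI missing, because 12D is not a valid numeric value; in the second iteration _N_=2 and _ERROR_=0.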
![Page 16: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/16.jpg)
EXECUTION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Input buffer: (empty)
1st iteration, at the beginning: _N_ is set to 1, _ERROR_ is set to 0, and the remaining variables are set to missing.
PDV: _N_ (D) = 1, _ERROR_ (D) = 0, Name (K) = blank, Height (K) = ., Weight (K) = ., BMI (K) = .
![Page 17: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/17.jpg)
EXECUTION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Example1.txt:
12345678901234567890
Barbara 61 12D
John    62 175
1st iteration: The INFILE statement identifies the location of Example1.txt. The 1st data line is read into the input buffer, and the input pointer is positioned at the beginning of the input buffer.
Input buffer: Barbara 61 12D
PDV: _N_ (D) = 1, _ERROR_ (D) = 0, Name (K) = blank, Height (K) = ., Weight (K) = ., BMI (K) = .
![Page 18: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/18.jpg)
EXECUTION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Example1.txt:
12345678901234567890
Barbara 61 12D
John    62 175
Input buffer: Barbara 61 12D
1st iteration: The INPUT statement reads data values from the input buffer into the PDV.
PDV: _N_ (D) = 1, _ERROR_ (D) = 0, Name (K) = blank, Height (K) = ., Weight (K) = ., BMI (K) = .
![Page 19: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/19.jpg)
EXECUTION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Example1.txt:
12345678901234567890
Barbara 61 12D
John    62 175
Input buffer: Barbara 61 12D
1st iteration: Columns 1-7 of the input buffer are copied to Name in the PDV.
PDV: _N_ (D) = 1, _ERROR_ (D) = 0, Name (K) = Barbara, Height (K) = ., Weight (K) = ., BMI (K) = .
![Page 20: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/20.jpg)
EXECUTION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Example1.txt:
12345678901234567890
Barbara 61 12D
John    62 175
Input buffer: Barbara 61 12D
1st iteration: The input pointer is now at column 8.
PDV: _N_ (D) = 1, _ERROR_ (D) = 0, Name (K) = Barbara, Height (K) = ., Weight (K) = ., BMI (K) = .
![Page 21: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/21.jpg)
EXECUTION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Example1.txt:
12345678901234567890
Barbara 61 12D
John    62 175
Input buffer: Barbara 61 12D
1st iteration: Columns 9-10 of the input buffer are copied to Height in the PDV.
PDV: _N_ (D) = 1, _ERROR_ (D) = 0, Name (K) = Barbara, Height (K) = 61, Weight (K) = ., BMI (K) = .
![Page 22: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/22.jpg)
EXECUTION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Example1.txt:
12345678901234567890
Barbara 61 12D
John    62 175
Input buffer: Barbara 61 12D
1st iteration: The input pointer is now at column 11.
PDV: _N_ (D) = 1, _ERROR_ (D) = 0, Name (K) = Barbara, Height (K) = 61, Weight (K) = ., BMI (K) = .
![Page 23: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/23.jpg)
EXECUTION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Example1.txt:
12345678901234567890
Barbara 61 12D
John    62 175
Input buffer: Barbara 61 12D
1st iteration: SAS tries to read Weight from columns 12-14, but 12D is an invalid numeric value, so Weight stays missing.
PDV: _N_ (D) = 1, _ERROR_ (D) = 0, Name (K) = Barbara, Height (K) = 61, Weight (K) = ., BMI (K) = .
![Page 24: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/24.jpg)
EXECUTION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Example1.txt:
12345678901234567890
Barbara 61 12D
John    62 175
Input buffer: Barbara 61 12D
1st iteration: Because the value read for Weight is invalid, _ERROR_ is set to 1.
PDV: _N_ (D) = 1, _ERROR_ (D) = 1, Name (K) = Barbara, Height (K) = 61, Weight (K) = ., BMI (K) = .
![Page 25: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/25.jpg)
EXECUTION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Example1.txt:
12345678901234567890
Barbara 61 12D
John    62 175
Input buffer: Barbara 61 12D
1st iteration: The input pointer is now at column 15.
PDV: _N_ (D) = 1, _ERROR_ (D) = 1, Name (K) = Barbara, Height (K) = 61, Weight (K) = ., BMI (K) = .
![Page 26: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/26.jpg)
EXECUTION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Example1.txt:
12345678901234567890
Barbara 61 12D
John    62 175
Input buffer: Barbara 61 12D
1st iteration: BMI remains missing, because an operation on a missing value produces a missing value.
PDV: _N_ (D) = 1, _ERROR_ (D) = 1, Name (K) = Barbara, Height (K) = 61, Weight (K) = ., BMI (K) = .
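The rule that arithmetic on a missing value yields a missing value can be checked with a small sketch. The variable names below mirror the slides, but the DATA _NULL_ step and the variable total are assumptions added for illustration. The sketch also contrasts the arithmetic operators with the SUM function, which ignores missing arguments.

```sas
/* Illustrative only: no dataset is created, results go to the log. */
data _null_;
    height = 61;
    weight = .;                          /* missing, as after the invalid 12D */
    bmi = 700*weight/(height*height);    /* arithmetic operators propagate the missing value */
    total = sum(weight, 10);             /* the SUM function skips missing arguments */
    put bmi= total=;                     /* writes bmi=. total=10 to the log */
run;
```

SAS also writes a note to the log saying that missing values were generated as a result of performing an operation on missing values.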
![Page 27: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/27.jpg)
EXECUTION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Example1.txt:
12345678901234567890
Barbara 61 12D
John    62 175
Input buffer: Barbara 61 12D
1st iteration: The OUTPUT statement is executed. Only the values of the variables marked (K) are copied, as a single observation, to the SAS dataset ex1.
PDV: _N_ (D) = 1, _ERROR_ (D) = 1, Name (K) = Barbara, Height (K) = 61, Weight (K) = ., BMI (K) = .
Ex1:
Obs  Name     Height  Weight  BMI
1    Barbara  61      .       .
![Page 28: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/28.jpg)
EXECUTION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Example1.txt:
12345678901234567890
Barbara 61 12D
John    62 175
Input buffer: Barbara 61 12D
1st iteration: At the end of the DATA step, two things occur automatically:
PDV: _N_ (D) = 1, _ERROR_ (D) = 1, Name (K) = Barbara, Height (K) = 61, Weight (K) = ., BMI (K) = .
Ex1:
Obs  Name     Height  Weight  BMI
1    Barbara  61      .       .
![Page 29: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/29.jpg)
EXECUTION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Example1.txt:
12345678901234567890
Barbara 61 12D
John    62 175
1. The SAS system returns to the beginning of the DATA step.
PDV: _N_ (D) = 1, _ERROR_ (D) = 1, Name (K) = Barbara, Height (K) = 61, Weight (K) = ., BMI (K) = .
Ex1:
Obs  Name     Height  Weight  BMI
1    Barbara  61      .       .
![Page 30: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/30.jpg)
EXECUTION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Example1.txt:
12345678901234567890
Barbara 61 12D
John    62 175
2. The values of the variables in the PDV are reset to missing, _N_ is incremented to 2, and _ERROR_ is reset to 0.
PDV: _N_ (D) = 2, _ERROR_ (D) = 0, Name (K) = blank, Height (K) = ., Weight (K) = ., BMI (K) = .
Ex1:
Obs  Name     Height  Weight  BMI
1    Barbara  61      .       .
![Page 31: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/31.jpg)
EXECUTION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Example1.txt:
12345678901234567890
Barbara 61 12D
John    62 175
2nd iteration: The 2nd data line is read into the input buffer, and the input pointer is positioned at the beginning of the input buffer.
Input buffer: John    62 175
PDV: _N_ (D) = 2, _ERROR_ (D) = 0, Name (K) = blank, Height (K) = ., Weight (K) = ., BMI (K) = .
Ex1:
Obs  Name     Height  Weight  BMI
1    Barbara  61      .       .
![Page 32: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/32.jpg)
EXECUTION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Example1.txt:
12345678901234567890
Barbara 61 12D
John    62 175
Input buffer: John    62 175
2nd iteration: The INPUT statement is executed.
PDV: _N_ (D) = 2, _ERROR_ (D) = 0, Name (K) = John, Height (K) = 62, Weight (K) = 175, BMI (K) = .
Ex1:
Obs  Name     Height  Weight  BMI
1    Barbara  61      .       .
![Page 33: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/33.jpg)
EXECUTION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Example1.txt:
12345678901234567890
Barbara 61 12D
John    62 175
Input buffer: John    62 175
2nd iteration: BMI is calculated.
PDV: _N_ (D) = 2, _ERROR_ (D) = 0, Name (K) = John, Height (K) = 62, Weight (K) = 175, BMI (K) = 31.8678
Ex1:
Obs  Name     Height  Weight  BMI
1    Barbara  61      .       .
![Page 34: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/34.jpg)
EXECUTION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Example1.txt:
12345678901234567890
Barbara 61 12D
John    62 175
Input buffer: John    62 175
2nd iteration: The OUTPUT statement is executed.
PDV: _N_ (D) = 2, _ERROR_ (D) = 0, Name (K) = John, Height (K) = 62, Weight (K) = 175, BMI (K) = 31.8678
Ex1:
Obs  Name     Height  Weight  BMI
1    Barbara  61      .       .
2    John     62      175     31.8678
![Page 35: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/35.jpg)
EXECUTION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Example1.txt:
12345678901234567890
Barbara 61 12D
John    62 175
Input buffer: John    62 175
2nd iteration: At the end of the DATA step, two things occur automatically:
PDV: _N_ (D) = 2, _ERROR_ (D) = 0, Name (K) = John, Height (K) = 62, Weight (K) = 175, BMI (K) = 31.8678
Ex1:
Obs  Name     Height  Weight  BMI
1    Barbara  61      .       .
2    John     62      175     31.8678
![Page 36: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/36.jpg)
EXECUTION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Example1.txt:
12345678901234567890
Barbara 61 12D
John    62 175
1. The SAS system returns to the beginning of the DATA step.
PDV: _N_ (D) = 2, _ERROR_ (D) = 0, Name (K) = John, Height (K) = 62, Weight (K) = 175, BMI (K) = 31.8678
Ex1:
Obs  Name     Height  Weight  BMI
1    Barbara  61      .       .
2    John     62      175     31.8678
![Page 37: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/37.jpg)
EXECUTION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
Example1.txt:
12345678901234567890
Barbara 61 12D
John    62 175
2. The values of the variables in the PDV are reset to missing, and _N_ is incremented to 3.
PDV: _N_ (D) = 3, _ERROR_ (D) = 0, Name (K) = blank, Height (K) = ., Weight (K) = ., BMI (K) = .
Ex1:
Obs  Name     Height  Weight  BMI
1    Barbara  61      .       .
2    John     62      175     31.8678
![Page 38: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/38.jpg)
EXECUTION PHASE
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
proc print data=ex1;run;
There are no more records to read, so the SAS system goes on to the next DATA or PROC step.
![Page 39: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/39.jpg)
THE OUTPUT STATEMENT
data ex1;
    set example1;
    BMI = 700*weight/(height*height);
    output;
run;
The explicit OUTPUT statement writes the current observation from the PDV to a SAS dataset immediately, not at the end of the DATA step.
![Page 40: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/40.jpg)
THE OUTPUT STATEMENT
data ex1;
    set example1;
    BMI = 700*weight/(height*height);
run;
The implicit OUTPUT statement: when the step contains no OUTPUT statement, SAS writes the current observation to the dataset at the end of the DATA step, that is, at the bottom of each iteration.
![Page 41: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/41.jpg)
THE OUTPUT STATEMENT
Using explicit OUTPUT will override the implicit OUTPUT
We can use more than one OUTPUT statement in the DATA step
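A short sketch of both points, under the assumption that Example1 holds the two observations shown in the slides; the output dataset name ex1_long and the variables measure and value are hypothetical. Because the step contains explicit OUTPUT statements, no implicit output occurs at the bottom of the step, and each input observation produces two output observations.

```sas
/* Hypothetical input data matching the Example1 dataset used in the slides. */
data example1;
    input name $ height weight;
    datalines;
Barbara 61 170
John 62 175
;
run;

/* Two explicit OUTPUT statements: two observations are written per iteration. */
data ex1_long;
    set example1;
    length measure $ 6;
    measure = 'Height'; value = height; output;
    measure = 'Weight'; value = weight; output;
    keep name measure value;
run;
```

The resulting dataset should contain four observations, one per name and measure, which is also the basic idea behind transposing a dataset in a DATA step.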
![Page 42: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/42.jpg)
THE DIFFERENCE BETWEEN READING A RAW DATASET AND READING A SAS DATASET
data ex1; infile 'C:\Arthur\example1.txt'; input name $ 1-7 height 9-10 weight 12-14; BMI = 700*weight/(height*height); output;run;
When reading a raw dataset: raw data → input buffer → PDV → SAS dataset
Raw data:
Barbara 61 12D
John    62 175
PDV: _N_ (D), _ERROR_ (D), Name (K), Height (K), Weight (K), BMI (K)
SAS dataset:
Obs  Name     Height  Weight  BMI
1    Barbara  61      .       .
2    John     62      175     31.8678
![Page 43: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/43.jpg)
THE DIFFERENCE BETWEEN READING A RAW DATASET AND READING A SAS DATASET
data ex1; set example1; BMI = 700*weight/(height*height); output;run;
When reading a SAS dataset: SAS dataset → PDV → SAS dataset
Input dataset: Example1 (named after “set”)
Obs  Name     Height  Weight
1    Barbara  61      .
2    John     62      175
PDV: _N_ (D), _ERROR_ (D), Name (K), Height (K), Weight (K), BMI (K)
Output dataset: Ex1 (named after “data”)
Obs  Name     Height  Weight  BMI
1    Barbara  61      .       .
2    John     62      175     31.8678
![Page 44: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/44.jpg)
THE DIFFERENCE BETWEEN READING A RAW DATASET AND READING A SAS DATASET
When reading a raw dataset, SAS sets each variable value in the PDV to missing at the beginning of each iteration of execution, except for:
the automatic variables
variables that are named in the RETAIN or SUM statement
data elements in a _TEMPORARY_ array
variables created in the options of the FILE/INFILE statement
![Page 45: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/45.jpg)
THE DIFFERENCE BETWEEN READING A RAW DATASET AND READING A SAS DATASET
data ex1; set example1; BMI = 700*weight/(height*height); output;run;
When reading a SAS dataset:
Example1:
Obs  Name     Height  Weight
1    Barbara  61      170
2    John     62      175
1st iteration: At the beginning of the execution phase, SAS sets each variable to missing in the PDV.
PDV: _N_ (D) = 1, _ERROR_ (D) = 0, Name (K) = blank, Height (K) = ., Weight (K) = ., BMI (K) = .
![Page 46: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/46.jpg)
THE DIFFERENCE BETWEEN READING A RAW DATASET AND READING A SAS DATASET
data ex1; set example1; BMI = 700*weight/(height*height); output;run;
When reading a SAS dataset:
Example1:
Obs  Name     Height  Weight
1    Barbara  61      170
2    John     62      175
1st iteration: The SET statement is executed.
PDV: _N_ (D) = 1, _ERROR_ (D) = 0, Name (K) = Barbara, Height (K) = 61, Weight (K) = 170, BMI (K) = .
![Page 47: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/47.jpg)
THE DIFFERENCE BETWEEN READING A RAW DATASET AND READING A SAS DATASET
data ex1; set example1; BMI = 700*weight/(height*height); output;run;
When reading a SAS dataset:
Example1:
Obs  Name     Height  Weight
1    Barbara  61      170
2    John     62      175
1st iteration: BMI is calculated.
PDV: _N_ (D) = 1, _ERROR_ (D) = 0, Name (K) = Barbara, Height (K) = 61, Weight (K) = 170, BMI (K) = 31.9807
![Page 48: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/48.jpg)
THE DIFFERENCE BETWEEN READING A RAW DATASET AND READING A SAS DATASET
data ex1; set example1; BMI = 700*weight/(height*height); output;run;
When reading a SAS dataset:
Example1:
Obs  Name     Height  Weight
1    Barbara  61      170
2    John     62      175
1st iteration: The OUTPUT statement is executed.
PDV: _N_ (D) = 1, _ERROR_ (D) = 0, Name (K) = Barbara, Height (K) = 61, Weight (K) = 170, BMI (K) = 31.9807
Ex1:
Obs  Name     Height  Weight  BMI
1    Barbara  61      170     31.9807
![Page 49: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/49.jpg)
THE DIFFERENCE BETWEEN READING A RAW DATASET AND READING A SAS DATASET
data ex1; set example1; BMI = 700*weight/(height*height); output;run;
When reading a SAS dataset:
Example1:
Obs  Name     Height  Weight
1    Barbara  61      170
2    John     62      175
Ex1:
Obs  Name     Height  Weight  BMI
1    Barbara  61      170     31.9807
2nd iteration, variables that exist in the input dataset: SAS sets each variable to missing in the PDV only before the 1st iteration of the execution. These variables retain their values in the PDV until they are replaced by new values.
PDV: _N_ (D) = 2, _ERROR_ (D) = 0, Name (K) = Barbara, Height (K) = 61, Weight (K) = 170, BMI (K) = .
![Page 50: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/50.jpg)
THE DIFFERENCE BETWEEN READING A RAW DATASET AND READING A SAS DATASET
data ex1; set example1; BMI = 700*weight/(height*height); output;run;
When reading a SAS dataset:
Example1:
Obs  Name     Height  Weight
1    Barbara  61      170
2    John     62      175
Ex1:
Obs  Name     Height  Weight  BMI
1    Barbara  61      170     31.9807
2nd iteration, variables being created in the DATA step: SAS sets each variable to missing in the PDV at the beginning of every iteration of the execution.
PDV: _N_ (D) = 2, _ERROR_ (D) = 0, Name (K) = Barbara, Height (K) = 61, Weight (K) = 170, BMI (K) = .
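The reset of newly created variables can be observed with a small sketch, again assuming Example1 holds the two observations shown in the slides. The dataset name reset_demo and the variable flag are hypothetical. FLAG is assigned only in the first iteration, so in the second iteration it shows the value SAS gave it at the top of that iteration.

```sas
/* Hypothetical input data matching the Example1 dataset used in the slides. */
data example1;
    input name $ height weight;
    datalines;
Barbara 61 170
John 62 175
;
run;

data reset_demo;
    set example1;
    if _n_ = 1 then flag = 1;   /* FLAG is created in this DATA step, not read by SET */
    put _n_= name= flag=;       /* log trace: FLAG is 1 in iteration 1, missing in iteration 2 */
run;
```

Without a RETAIN statement, FLAG does not carry over from the first iteration to the second.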
![Page 51: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/51.jpg)
THE DIFFERENCE BETWEEN READING A RAW DATASET AND READING A SAS DATASET
data ex1; set example1; BMI = 700*weight/(height*height); output;run;
When reading a SAS dataset:
Example1:
Obs  Name     Height  Weight
1    Barbara  61      170
2    John     62      175
Ex1:
Obs  Name     Height  Weight  BMI
1    Barbara  61      170     31.9807
2nd iteration: The SET statement is executed.
PDV: _N_ (D) = 2, _ERROR_ (D) = 0, Name (K) = John, Height (K) = 62, Weight (K) = 175, BMI (K) = .
![Page 52: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/52.jpg)
THE RETAIN STATEMENT
Consider the following dataset:
Obs  ID   SCORE
1    A01  3
2    A02  .
3    A03  4
We would like to create a new variable, TOTAL, that accumulates the values of SCORE:
Obs  ID   SCORE  TOTAL
1    A01  3      3
2    A02  .      3
3    A03  4      7
![Page 53: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/53.jpg)
THE RETAIN STATEMENT
Consider the following dataset:
Obs  ID   SCORE  TOTAL
1    A01  3      3
2    A02  .      3
3    A03  4      7
How to do it? Set TOTAL to 0 at the first iteration of the execution. Then, at each iteration of the execution, add the value of SCORE to TOTAL.
Problem: TOTAL is a new variable that you want to create, so TOTAL will be set to missing in the PDV at the beginning of every iteration of the execution.
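A hedged sketch of this problem, with the three observations above entered in-stream; the dataset name ex2_wrong is hypothetical. Without a RETAIN statement, TOTAL is missing at the top of every iteration, so the SUM function only ever sees the current SCORE.

```sas
/* The example dataset from the slide, entered with DATALINES. */
data ex2;
    input id $ score;
    datalines;
A01 3
A02 .
A03 4
;
run;

/* No RETAIN: TOTAL is reset to missing at the beginning of each iteration. */
data ex2_wrong;
    set ex2;
    total = sum(total, score);   /* yields 3, missing, 4 instead of 3, 3, 7 */
run;

proc print data=ex2_wrong;
run;
```

In the second iteration both arguments of SUM are missing, so TOTAL is missing for A02 rather than carrying forward the 3.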
![Page 54: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/54.jpg)
THE RETAIN STATEMENT
To fix this problem, we can use the RETAIN statement:
RETAIN VARIABLE <VALUE>;
Prevents the VARIABLE from being initialized each time the DATA step executes
![Page 55: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/55.jpg)
THE RETAIN STATEMENT
To fix this problem, we can use the RETAIN statement:
RETAIN VARIABLE <VALUE>;
VARIABLE: the name of the variable that we want to retain.
VALUE: a numeric value used to initialize VARIABLE only at the first iteration of the DATA step execution. If no initial value is specified, VARIABLE is initialized as missing.
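For comparison, the sum statement mentioned in the introduction is a common shorthand for this kind of accumulation. The sketch below is an illustration under the assumption that the dataset ex2 holds the three observations shown earlier; the output name ex2_sum is hypothetical. A variable named on the left of a sum statement is retained automatically, starts at 0, and a missing value on the right is ignored.

```sas
/* The example dataset from the slides, entered with DATALINES. */
data ex2;
    input id $ score;
    datalines;
A01 3
A02 .
A03 4
;
run;

data ex2_sum;
    set ex2;
    total + score;   /* sum statement: TOTAL is retained and starts at 0 */
run;

proc print data=ex2_sum;   /* TOTAL should be 3, 3, 7 */
run;
```

This gives the same TOTAL values as combining RETAIN with an initial value of 0 and the SUM function.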
![Page 56: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/56.jpg)
THE RETAIN STATEMENT
data ex2_2;
    set ex2;
    retain total 0;
    total = sum(total, score);
run;
Ex2:
Obs  ID   SCORE
1    A01  3
2    A02  .
3    A03  4
The execution phase begins immediately after the completion of the compilation phase.
PDV: _N_ (D), _ERROR_ (D), ID (K), Score (K), Total (K)
![Page 57: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/57.jpg)
THE RETAIN STATEMENT
data ex2_2; set ex2; retain total 0; total = sum(total, score);run;
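Ex2:
Obs  ID   SCORE
1    A01  3
2    A02  .
3    A03  4
1st iteration: _N_ is set to 1 and _ERROR_ is set to 0. ID and SCORE are set to missing. TOTAL is set to 0 because of the RETAIN statement.
PDV: _N_ (D) = 1, _ERROR_ (D) = 0, ID (K) = blank, Score (K) = ., Total (K) = 0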
![Page 58: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/58.jpg)
THE RETAIN STATEMENT
data ex2_2; set ex2; retain total 0; total = sum(total, score);run;
Ex2:
Obs  ID   SCORE
1    A01  3
2    A02  .
3    A03  4
1st iteration: The 1st observation from ex2 is copied to the PDV.
PDV: _N_ (D) = 1, _ERROR_ (D) = 0, ID (K) = A01, Score (K) = 3, Total (K) = 0
![Page 59: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/59.jpg)
THE RETAIN STATEMENT
data ex2_2; set ex2; retain total 0; total = sum(total, score);run;
PDV
The RETAIN statement is a compile-time only statement
It does not execute during the execution phase
ID SCORE
1 A01 3
2 A02 .
3 A03 4
1st Iteration:
_N_ (D)  _ERROR_ (D)  ID (K)  SCORE (K)  TOTAL (K)
1        0            A01     3          0
![Page 60: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/60.jpg)
THE RETAIN STATEMENT
data ex2_2; set ex2; retain total 0; total = sum(total, score);run;
PDV
TOTAL is calculated
ID SCORE
1 A01 3
2 A02 .
3 A03 4
1st Iteration:
_N_ (D)  _ERROR_ (D)  ID (K)  SCORE (K)  TOTAL (K)
1        0            A01     3          3
![Page 61: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/61.jpg)
THE RETAIN STATEMENT
data ex2_2; set ex2; retain total 0; total = sum(total, score);run;
PDV
The implicit OUTPUT statement tells the SAS system to write observations to the dataset
ID SCORE
1 A01 3
2 A02 .
3 A03 4
1st Iteration:
_N_ (D)  _ERROR_ (D)  ID (K)  SCORE (K)  TOTAL (K)
1        0            A01     3          3
Ex2_2:
   ID   SCORE  TOTAL
1  A01  3      3
![Page 62: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/62.jpg)
THE RETAIN STATEMENT
data ex2_2; set ex2; retain total 0; total = sum(total, score);run;
PDV
_N_ ↑ 2
ID and SCORE are retained from the previous iteration because the data are read from an existing SAS dataset
TOTAL is also retained because the RETAIN statement is used
ID SCORE
1 A01 3
2 A02 .
3 A03 4
2nd Iteration:
_N_ (D)  _ERROR_ (D)  ID (K)  SCORE (K)  TOTAL (K)
2        0            A01     3          3
Ex2_2:
   ID   SCORE  TOTAL
1  A01  3      3
![Page 63: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/63.jpg)
THE RETAIN STATEMENT
data ex2_2; set ex2; retain total 0; total = sum(total, score);run;
PDV
2nd observation from ex2 → PDV
ID SCORE
1 A01 3
2 A02 .
3 A03 4
2nd Iteration:
_N_ (D)  _ERROR_ (D)  ID (K)  SCORE (K)  TOTAL (K)
2        0            A02     .          3
Ex2_2:
   ID   SCORE  TOTAL
1  A01  3      3
![Page 64: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/64.jpg)
THE RETAIN STATEMENT
data ex2_2; set ex2; retain total 0; total = sum(total, score);run;
PDV
TOTAL is calculated
ID SCORE
1 A01 3
2 A02 .
3 A03 4
2nd Iteration:
_N_ (D)  _ERROR_ (D)  ID (K)  SCORE (K)  TOTAL (K)
2        0            A02     .          3
Ex2_2:
   ID   SCORE  TOTAL
1  A01  3      3
![Page 65: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/65.jpg)
THE RETAIN STATEMENT
data ex2_2; set ex2; retain total 0; total = sum(total, score);run;
PDV
ID SCORE
1 A01 3
2 A02 .
3 A03 4
2nd Iteration:
The implicit OUTPUT: the contents of the PDV → Ex2_2
_N_ (D)  _ERROR_ (D)  ID (K)  SCORE (K)  TOTAL (K)
2        0            A02     .          3
Ex2_2:
   ID   SCORE  TOTAL
1  A01  3      3
2  A02  .      3
![Page 66: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/66.jpg)
THE RETAIN STATEMENT
data ex2_2; set ex2; retain total 0; total = sum(total, score);run;
PDV
ID SCORE
1 A01 3
2 A02 .
3 A03 4
3rd Iteration:
_N_ ↑ 3. ID and SCORE are retained from the previous iteration. TOTAL is also retained.
_N_ (D)  _ERROR_ (D)  ID (K)  SCORE (K)  TOTAL (K)
3        0            A02     .          3
Ex2_2:
   ID   SCORE  TOTAL
1  A01  3      3
2  A02  .      3
![Page 67: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/67.jpg)
THE RETAIN STATEMENT
data ex2_2; set ex2; retain total 0; total = sum(total, score);run;
PDV
ID SCORE
1 A01 3
2 A02 .
3 A03 4
3rd Iteration:
3rd observation from ex2 → PDV
_N_ (D)  _ERROR_ (D)  ID (K)  SCORE (K)  TOTAL (K)
3        0            A03     4          3
Ex2_2:
   ID   SCORE  TOTAL
1  A01  3      3
2  A02  .      3
![Page 68: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/68.jpg)
THE RETAIN STATEMENT
data ex2_2; set ex2; retain total 0; total = sum(total, score);run;
PDV
ID SCORE
1 A01 3
2 A02 .
3 A03 4
3rd Iteration:
TOTAL is calculated
_N_ (D)  _ERROR_ (D)  ID (K)  SCORE (K)  TOTAL (K)
3        0            A03     4          7
Ex2_2:
   ID   SCORE  TOTAL
1  A01  3      3
2  A02  .      3
![Page 69: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/69.jpg)
THE RETAIN STATEMENT
data ex2_2; set ex2; retain total 0; total = sum(total, score);run;
PDV
ID SCORE
1 A01 3
2 A02 .
3 A03 4
3rd Iteration:
The implicit OUTPUT: the contents of the PDV → Ex2_2
_N_ (D)  _ERROR_ (D)  ID (K)  SCORE (K)  TOTAL (K)
3        0            A03     4          7
Ex2_2:
   ID   SCORE  TOTAL
1  A01  3      3
2  A02  .      3
3  A03  4      7
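The walkthrough above can be reproduced in the SAS log. The sketch below is one way to do it, assuming EX2 is re-created with DATALINES from the slide values; the text labels in the PUT statements are illustrative only. PUT _ALL_ writes every variable in the PDV, including _N_ and _ERROR_, to the log.

```sas
/* Re-create EX2 from the values shown on the slides (assumed layout). */
data ex2;
    input id $ score;
    datalines;
A01 3
A02 .
A03 4
;
run;

data ex2_2;
    /* PDV at the top of the iteration, before SET reads anything */
    put 'Top of iteration:   ' _all_;
    set ex2;
    /* PDV after the next observation has been copied in */
    put 'After SET:          ' _all_;
    retain total 0;
    total = sum(total, score);
    /* PDV just before the implicit OUTPUT */
    put 'After TOTAL is set: ' _all_;
run;
```

Expect one extra "Top of iteration" line with _N_=4: the DATA step starts a fourth iteration and stops when SET finds no more observations to read.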
![Page 70: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/70.jpg)
THE SUM STATEMENT
The SUM statement has the following form:
VARIABLE + EXPRESSION;
VARIABLE: the numeric accumulator variable that is to be created
  It is automatically set to 0 at the beginning of the first iteration of the DATA step execution
  It is retained in the following iterations
EXPRESSION: any SAS expression
  If EXPRESSION evaluates to a missing value, it is treated as 0
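A short sketch of why the missing-value rule matters. It compares the SUM statement with a plain addition on a retained variable; EX2 is re-created from the slide values, and SUM_DEMO and TOTAL_PLUS are hypothetical names used only here.

```sas
/* Re-create EX2 from the values shown on the slides (assumed layout). */
data ex2;
    input id $ score;
    datalines;
A01 3
A02 .
A03 4
;
run;

data sum_demo;
    set ex2;

    /* SUM statement: a missing SCORE is treated as 0 */
    total + score;

    /* Plain addition: a missing SCORE makes the result missing,
       and the retained missing value then propagates */
    retain total_plus 0;
    total_plus = total_plus + score;
run;

proc print data=sum_demo; run;
```

TOTAL comes out as 3, 3, 7, while TOTAL_PLUS is 3 and then missing for the last two observations.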
![Page 71: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/71.jpg)
THE SUM STATEMENT
The previous program can be re-written as…
data ex2_2;
    set ex2;
    retain total 0;
    total = sum(total, score);
run;
![Page 72: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/72.jpg)
THE SUM STATEMENT
The previous program can be re-written as…
data ex2_2;
    set ex2;
    total + score;
run;
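One way to check that the two versions agree is PROC COMPARE. This is only a sketch: EX2 is re-created from the slide values, and the output names WITH_RETAIN and WITH_SUM are made up so that both results can exist side by side.

```sas
/* Re-create EX2 from the values shown on the slides (assumed layout). */
data ex2;
    input id $ score;
    datalines;
A01 3
A02 .
A03 4
;
run;

/* Version 1: RETAIN plus the SUM function */
data with_retain;
    set ex2;
    retain total 0;
    total = sum(total, score);
run;

/* Version 2: the SUM statement */
data with_sum;
    set ex2;
    total + score;
run;

/* Compare the two results variable by variable */
proc compare base=with_retain compare=with_sum;
run;
```

For this input the comparison should report no unequal values.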
![Page 73: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/73.jpg)
THE SUBSETTING IF STATEMENT
We use the subsetting IF statement to continue processing only the observations that meet the condition of the specified expression
IF EXPRESSION;
If EXPRESSION is true for the observation, SAS continues to execute the statements in the DATA step and includes the current observation in the data set
![Page 74: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/74.jpg)
THE SUBSETTING IF STATEMENT
Use the IF statement to continue processing only the observations that meet the condition of the specified expression
IF EXPRESSION;
If EXPRESSION is false for the observation, no further statements are processed for that observation
SAS immediately returns to the beginning of the DATA step
The remaining program statements in the DATA step are not executed, and the current observation is not written to the output data set
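A small sketch of the subsetting IF, next to an equivalent form that uses DELETE. EX2 is re-created from the slide values; the data set names and the DOUBLED variable are hypothetical.

```sas
/* Re-create EX2 from the values shown on the slides (assumed layout). */
data ex2;
    input id $ score;
    datalines;
A01 3
A02 .
A03 4
;
run;

/* Subsetting IF: only observations with a non-missing SCORE continue */
data nonmissing_if;
    set ex2;
    if not missing(score);
    doubled = score * 2;   /* executed only for the kept observations */
run;

/* Same result written with IF-THEN DELETE */
data nonmissing_delete;
    set ex2;
    if missing(score) then delete;
    doubled = score * 2;
run;

proc print data=nonmissing_if;     run;
proc print data=nonmissing_delete; run;
```

Both data sets contain the observations for A01 and A03 only.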
![Page 75: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/75.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
ID SCORE
1 A01 3
2 A02 .
3 A03 4
One observation per subject
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
Multiple observations per subject-- Longitudinal data
Identify the beginning/end of measurement for each subject
This can be accomplished by using the BY-group processing method
![Page 76: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/76.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
SAS locates the beginning and end of a BY group by creating two temporary indicator variables for each BY variable: FIRST.VARIABLE and LAST.VARIABLE
Suppose ID is the “BY” variable:
   ID   SCORE  FIRST.ID  LAST.ID
1  A01  3      1         0         SAS reads the 1st observation for ID = A01
2  A01  4      0         0
3  A01  2      0         1         SAS reads the last observation for ID = A01
4  A02  4      1         0
5  A02  2      0         1
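FIRST.ID and LAST.ID are temporary and are not written to the output data set. To see them, they can be copied into ordinary variables, as in this sketch. EX3 is re-created from the slide values; FLAGS, FIRST_ID and LAST_ID are names chosen for illustration.

```sas
/* Re-create EX3 from the values shown on the slides (assumed layout). */
data ex3;
    input id $ score;
    datalines;
A01 3
A01 4
A01 2
A02 4
A02 2
;
run;

/* BY-group processing requires the data to be sorted by the BY variable */
proc sort data=ex3;
    by id;
run;

data flags;
    set ex3;
    by id;
    first_id = first.id;   /* copy the temporary indicators ... */
    last_id  = last.id;    /* ... into variables that are kept  */
run;

proc print data=flags; run;
```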
![Page 77: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/77.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
Calculating the total scores for each subject
ID TOTAL
1 A01 9
2 A02 6
proc sort data=ex3;
    by id;
run;
data ex3_1 (drop=score);
    set ex3;
    by id;
    if first.id = 1 then total = 0;
    total + score;
    if last.id = 1;
run;
![Page 78: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/78.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
1st iteration:
_N_ → 1, _ERROR_ → 0
FIRST.ID → 1, LAST.ID → 1 only at the beginning of the 1st iteration
ID, SCORE → missing
TOTAL → 0 because of the SUM statement
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
1        0            1             1                    .          0
![Page 79: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/79.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
1st iteration:
The SET statement is executed
1st observation → PDV
FIRST.ID → 1 and LAST.ID → 0
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
1        0            1             0            A01     3          0
![Page 80: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/80.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
1st iteration:
FIRST.ID = 1: TOTAL → 0
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
1        0            1             0            A01     3          0
![Page 81: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/81.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
1st iteration:
TOTAL is accumulated
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
1        0            1             0            A01     3          3
![Page 82: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/82.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
1st iteration:
The subsetting IF statement evaluates to FALSE because LAST.ID ≠ 1
SAS returns to the beginning of the DATA step to begin the 2nd iteration
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
1        0            1             0            A01     3          3
![Page 83: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/83.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
2nd iteration:
_N_ ↑ 2
The values for the rest of the variables are retained
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
2        0            1             0            A01     3          3
![Page 84: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/84.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
2nd iteration:
2nd observation → PDV
Not the first observation for A01: FIRST.ID → 0
Not the last observation for A01: LAST.ID → 0
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
2        0            0             0            A01     4          3
![Page 85: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/85.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
2nd iteration:
FIRST.ID ≠ 1: the THEN clause is not executed
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
2        0            0             0            A01     4          3
![Page 86: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/86.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
2nd iteration:
TOTAL is accumulated
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
2        0            0             0            A01     4          7
![Page 87: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/87.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
2nd iteration:
The subsetting IF statement evaluates to FALSE because LAST.ID ≠ 1
SAS returns to the beginning of the DATA step to begin the 3rd iteration
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
2        0            0             0            A01     4          7
![Page 88: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/88.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
3rd iteration:
_N_ ↑ 3
The values for the rest of the variables are retained
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
3        0            0             0            A01     4          7
![Page 89: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/89.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
3rd iteration:
3rd observation → PDV
Not the first observation: FIRST.ID → 0
Last observation for A01: LAST.ID → 1
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
3        0            0             1            A01     2          7
![Page 90: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/90.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
3rd iteration:
FIRST.ID ≠ 1: the THEN clause is not executed
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
3        0            0             1            A01     2          7
![Page 91: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/91.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
3rd iteration:
TOTAL is calculated
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
3        0            0             1            A01     2          9
![Page 92: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/92.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
3rd iteration:
The subsetting IF statement evaluates to TRUE
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
3        0            0             1            A01     2          9
![Page 93: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/93.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
3rd iteration:
SAS reaches the end of the 3rd iteration
The implicit OUTPUT executes
SAS returns to the beginning of the DATA step to begin the 4th iteration
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
3        0            0             1            A01     2          9
Ex3_1:
   ID   TOTAL
1  A01  9
![Page 94: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/94.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
4th iteration:
_N_ ↑ 4
The values for the remaining variables are retained
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
4        0            0             1            A01     2          9
Ex3_1:
   ID   TOTAL
1  A01  9
![Page 95: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/95.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
4th iteration:
4th observation → PDV
FIRST.ID → 1
LAST.ID → 0
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
4        0            1             0            A02     4          9
Ex3_1:
   ID   TOTAL
1  A01  9
![Page 96: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/96.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
4th iteration:
FIRST.ID = 1: TOTAL → 0
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
4        0            1             0            A02     4          0
Ex3_1:
   ID   TOTAL
1  A01  9
![Page 97: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/97.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
4th iteration:
TOTAL is calculated
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
4        0            1             0            A02     4          4
Ex3_1:
   ID   TOTAL
1  A01  9
![Page 98: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/98.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
4th iteration:
The subsetting IF statement evaluates to FALSE
SAS returns to the beginning of the DATA step to begin the 5th iteration
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
4        0            1             0            A02     4          4
Ex3_1:
   ID   TOTAL
1  A01  9
![Page 99: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/99.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
5th iteration:
_N_ ↑ 5
The values for the remaining variables are retained
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
5        0            1             0            A02     4          4
Ex3_1:
   ID   TOTAL
1  A01  9
![Page 100: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/100.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
5th iteration:
5th observation → PDV
FIRST.ID → 0
LAST.ID → 1
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
5        0            0             1            A02     2          4
Ex3_1:
   ID   TOTAL
1  A01  9
![Page 101: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/101.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
5th iteration:
FIRST.ID ≠ 1: the THEN clause is not executed
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
5        0            0             1            A02     2          4
Ex3_1:
   ID   TOTAL
1  A01  9
![Page 102: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/102.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
5th iteration:
TOTAL is calculated
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
5        0            0             1            A02     2          6
Ex3_1:
   ID   TOTAL
1  A01  9
![Page 103: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/103.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
5th iteration:
The subsetting IF statement evaluates to TRUE
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
5        0            0             1            A02     2          6
Ex3_1:
   ID   TOTAL
1  A01  9
![Page 104: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/104.jpg)
THE BY-GROUP PROCESSING IN THE DATA STEP
data ex3_1 (drop=score); set ex3; by id; if first.id = 1 then total = 0; total + score; if last.id = 1; run;
PDV
ID SCORE
1 A01 3
2 A01 4
3 A01 2
4 A02 4
5 A02 2
5th iteration:
SAS reaches the end of the 5th iteration
The implicit OUTPUT executes
_N_ (D)  _ERROR_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  SCORE (D)  TOTAL (K)
5        0            0             1            A02     2          6
Ex3_1:
   ID   TOTAL
1  A01  9
2  A02  6
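To see why the reset at FIRST.ID matters, the sketch below runs the same step without it. EX3 is re-created from the slide values, and NO_RESET is a hypothetical data set name.

```sas
/* Re-create EX3 from the values shown on the slides (assumed layout). */
data ex3;
    input id $ score;
    datalines;
A01 3
A01 4
A01 2
A02 4
A02 2
;
run;

proc sort data=ex3;
    by id;
run;

/* Same logic as EX3_1, but TOTAL is never reset at the start of a
   BY group, so it keeps accumulating across subjects. */
data no_reset (drop=score);
    set ex3;
    by id;
    total + score;
    if last.id = 1;
run;

proc print data=no_reset; run;
```

Here A02 ends up with a total of 15 (9 carried over from A01 plus 4 and 2) instead of 6.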
![Page 105: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/105.jpg)
RESTRUCTURING DATASETS
Restructuring datasets:
data with one observation per
subject (the wide format)
data with multiple observations per
subject (the long format)
ID S1 S2 S3
1 A01 3 4 5
2 A02 4 . 2
ID TIME SCORE
1 A01 1 3
2 A01 2 4
3 A01 3 5
4 A02 1 4
5 A02 3 2
![Page 106: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/106.jpg)
RESTRUCTURING DATASETS
Restructuring datasets:
data with one observation per
subject (the wide format)
data with multiple observations per
subject (the long format)
ID S1 S2 S3
1 A01 3 4 5
2 A02 4 . 2
ID TIME SCORE
1 A01 1 3
2 A01 2 4
3 A01 3 5
4 A02 1 4
5 A02 3 2
S1 – S3 → SCORE
TIME: distinguishes the different measurements for each subject
![Page 107: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/107.jpg)
RESTRUCTURING DATASETS
The transformation can be easily done by using ARRAY/PROC TRANSPOSE (See my paper “The Many Ways to Effectively Utilize Array Processing”, paper 244-2011)
This can also be accomplished without advanced techniques for simpler cases
Here is a solution that uses multiple OUTPUT statements in one DATA step
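For comparison, here is a minimal sketch of the array approach mentioned above. It is not taken from the cited paper: WIDE is re-created from the slide values, and the names LONG_ARRAY and the array S are illustrative.

```sas
/* Re-create WIDE from the values shown on the slides (assumed layout). */
data wide;
    input id $ s1 s2 s3;
    datalines;
A01 3 4 5
A02 4 . 2
;
run;

data long_array (keep=id time score);
    set wide;
    array s{3} s1-s3;          /* group S1-S3 so they can be indexed */
    do time = 1 to 3;
        score = s{time};
        /* write one observation per non-missing measurement */
        if not missing(score) then output;
    end;
run;

proc print data=long_array; run;
```

The loop replaces the three repeated assignment blocks, which helps when there are many measurement variables.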
![Page 108: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/108.jpg)
FROM WIDE FORMAT TO LONG FORMAT
Wide:
   ID   S1  S2  S3
1  A01  3   4   5
2  A02  4   .   2
Long:
   ID   TIME  SCORE
1  A01  1     3
2  A01  2     4
3  A01  3     5
4  A02  1     4
5  A02  3     2
Transform wide → long
2 observations to read → 2 DATA step iterations
Use multiple OUTPUT statements
Any missing values in S1 – S3 will not be output to long
data long (drop=s1-s3);
    set wide;
    time = 1;
    score = s1;
    if not missing(score) then output;
    time = 2;
    score = s2;
    if not missing(score) then output;
    time = 3;
    score = s3;
    if not missing(score) then output;
run;
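The same reshaping can also be sketched with PROC TRANSPOSE. This is an illustration under assumptions, not the author's program: WIDE is re-created from the slide values, and LONG_T, LONG2 and VARNAME are made-up names.

```sas
/* Re-create WIDE from the values shown on the slides (assumed layout). */
data wide;
    input id $ s1 s2 s3;
    datalines;
A01 3 4 5
A02 4 . 2
;
run;

/* One output row per ID and transposed variable; COL1 holds the value
   and VARNAME holds the source variable name (S1, S2 or S3). */
proc transpose data=wide out=long_t (rename=(col1=score)) name=varname;
    by id;
    var s1-s3;
run;

/* Derive TIME from the digit in the variable name and drop the rows
   whose SCORE is missing. */
data long2 (keep=id time score);
    set long_t;
    where not missing(score);
    time = input(substr(varname, 2), 8.);
run;

proc print data=long2; run;
```

BY ID assumes WIDE is already sorted by ID, which holds for the two observations used here.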
![Page 109: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/109.jpg)
ID S1 S2 S3
1 A01 3 4 5
2 A02 4 . 2
Wide:data long (drop=s1-s3); set wide; time = 1; score = s1; if not missing(score) then output; time = 2; score = s2; if not missing(score) then output; time = 3; score = s3; if not missing(score) then output;run;
1st iteration:
_N_ → 1; other variables → missing
ID (K)  S1 (D)  S2 (D)  S3 (D)  TIME (K)  SCORE (K)  _N_ (D)
        .       .       .       .         .          1
FROM WIDE FORMAT TO LONG FORMAT
![Page 110: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/110.jpg)
ID S1 S2 S3
1 A01 3 4 5
2 A02 4 . 2
Wide:data long (drop=s1-s3); set wide; time = 1; score = s1; if not missing(score) then output; time = 2; score = s2; if not missing(score) then output; time = 3; score = s3; if not missing(score) then output;run;
1st iteration:
1st observation from Wide → PDV
ID (K)  S1 (D)  S2 (D)  S3 (D)  TIME (K)  SCORE (K)  _N_ (D)
A01     3       4       5       .         .          1
FROM WIDE FORMAT TO LONG FORMAT
![Page 111: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/111.jpg)
ID S1 S2 S3
1 A01 3 4 5
2 A02 4 . 2
Wide:data long (drop=s1-s3); set wide; time = 1; score = s1; if not missing(score) then output; time = 2; score = s2; if not missing(score) then output; time = 3; score = s3; if not missing(score) then output;run;
1st iteration:
TIME → 1
ID (K)  S1 (D)  S2 (D)  S3 (D)  TIME (K)  SCORE (K)  _N_ (D)
A01     3       4       5       1         .          1
FROM WIDE FORMAT TO LONG FORMAT
![Page 112: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/112.jpg)
ID S1 S2 S3
1 A01 3 4 5
2 A02 4 . 2
Wide:data long (drop=s1-s3); set wide; time = 1; score = s1; if not missing(score) then output; time = 2; score = s2; if not missing(score) then output; time = 3; score = s3; if not missing(score) then output;run;
1st iteration:
SCORE → value from S1 (3)
ID (K)  S1 (D)  S2 (D)  S3 (D)  TIME (K)  SCORE (K)  _N_ (D)
A01     3       4       5       1         3          1
FROM WIDE FORMAT TO LONG FORMAT
![Page 113: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/113.jpg)
ID S1 S2 S3
1 A01 3 4 5
2 A02 4 . 2
Wide:data long (drop=s1-s3); set wide; time = 1; score = s1; if not missing(score) then output; time = 2; score = s2; if not missing(score) then output; time = 3; score = s3; if not missing(score) then output;run;
1st iteration:
SCORE ≠ missing: ID, TIME, and SCORE → Long
ID (K)  S1 (D)  S2 (D)  S3 (D)  TIME (K)  SCORE (K)  _N_ (D)
A01     3       4       5       1         3          1
Long:
   ID   TIME  SCORE
1  A01  1     3
FROM WIDE FORMAT TO LONG FORMAT
![Page 114: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/114.jpg)
ID S1 S2 S3
1 A01 3 4 5
2 A02 4 . 2
Wide:data long (drop=s1-s3); set wide; time = 1; score = s1; if not missing(score) then output; time = 2; score = s2; if not missing(score) then output; time = 3; score = s3; if not missing(score) then output;run;
1st iteration:
TIME → 2
ID (K)  S1 (D)  S2 (D)  S3 (D)  TIME (K)  SCORE (K)  _N_ (D)
A01     3       4       5       2         3          1
Long:
   ID   TIME  SCORE
1  A01  1     3
FROM WIDE FORMAT TO LONG FORMAT
![Page 115: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/115.jpg)
ID S1 S2 S3
1 A01 3 4 5
2 A02 4 . 2
Wide:data long (drop=s1-s3); set wide; time = 1; score = s1; if not missing(score) then output; time = 2; score = s2; if not missing(score) then output; time = 3; score = s3; if not missing(score) then output;run;
1st iteration:
SCORE → value from S2 (4)
ID (K)  S1 (D)  S2 (D)  S3 (D)  TIME (K)  SCORE (K)  _N_ (D)
A01     3       4       5       2         4          1
Long:
   ID   TIME  SCORE
1  A01  1     3
FROM WIDE FORMAT TO LONG FORMAT
![Page 116: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/116.jpg)
ID S1 S2 S3
1 A01 3 4 5
2 A02 4 . 2
Wide:data long (drop=s1-s3); set wide; time = 1; score = s1; if not missing(score) then output; time = 2; score = s2; if not missing(score) then output; time = 3; score = s3; if not missing(score) then output;run;
1st iteration:
SCORE ≠ missing: ID, TIME, and SCORE → Long
ID (K)  S1 (D)  S2 (D)  S3 (D)  TIME (K)  SCORE (K)  _N_ (D)
A01     3       4       5       2         4          1
Long:
   ID   TIME  SCORE
1  A01  1     3
2  A01  2     4
FROM WIDE FORMAT TO LONG FORMAT
![Page 117: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/117.jpg)
ID S1 S2 S3
1 A01 3 4 5
2 A02 4 . 2
Wide:data long (drop=s1-s3); set wide; time = 1; score = s1; if not missing(score) then output; time = 2; score = s2; if not missing(score) then output; time = 3; score = s3; if not missing(score) then output;run;
1st iteration:
TIME → 3
ID (K)  S1 (D)  S2 (D)  S3 (D)  TIME (K)  SCORE (K)  _N_ (D)
A01     3       4       5       3         4          1
Long:
   ID   TIME  SCORE
1  A01  1     3
2  A01  2     4
FROM WIDE FORMAT TO LONG FORMAT
![Page 118: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/118.jpg)
ID S1 S2 S3
1 A01 3 4 5
2 A02 4 . 2
Wide:data long (drop=s1-s3); set wide; time = 1; score = s1; if not missing(score) then output; time = 2; score = s2; if not missing(score) then output; time = 3; score = s3; if not missing(score) then output;run;
1st iteration:
SCORE → value from S3 (5)
ID (K)  S1 (D)  S2 (D)  S3 (D)  TIME (K)  SCORE (K)  _N_ (D)
A01     3       4       5       3         5          1
Long:
   ID   TIME  SCORE
1  A01  1     3
2  A01  2     4
FROM WIDE FORMAT TO LONG FORMAT
![Page 119: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/119.jpg)
ID S1 S2 S3
1 A01 3 4 5
2 A02 4 . 2
Wide:data long (drop=s1-s3); set wide; time = 1; score = s1; if not missing(score) then output; time = 2; score = s2; if not missing(score) then output; time = 3; score = s3; if not missing(score) then output;run;
1st iteration:
SCORE ≠ missing: ID, TIME, and SCORE → Long
ID (K)  S1 (D)  S2 (D)  S3 (D)  TIME (K)  SCORE (K)  _N_ (D)
A01     3       4       5       3         5          1
Long:
   ID   TIME  SCORE
1  A01  1     3
2  A01  2     4
3  A01  3     5
FROM WIDE FORMAT TO LONG FORMAT
![Page 120: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/120.jpg)
ID S1 S2 S3
1 A01 3 4 5
2 A02 4 . 2
Wide:data long (drop=s1-s3); set wide; time = 1; score = s1; if not missing(score) then output; time = 2; score = s2; if not missing(score) then output; time = 3; score = s3; if not missing(score) then output;run;
1st iteration:
There is no implicit OUTPUT because the DATA step contains explicit OUTPUT statements
SAS returns to the beginning of the DATA step to begin the 2nd iteration
ID (K)  S1 (D)  S2 (D)  S3 (D)  TIME (K)  SCORE (K)  _N_ (D)
A01     3       4       5       3         5          1
Long:
   ID   TIME  SCORE
1  A01  1     3
2  A01  2     4
3  A01  3     5
FROM WIDE FORMAT TO LONG FORMAT
![Page 121: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/121.jpg)
ID S1 S2 S3
1 A01 3 4 5
2 A02 4 . 2
Wide:data long (drop=s1-s3); set wide; time = 1; score = s1; if not missing(score) then output; time = 2; score = s2; if not missing(score) then output; time = 3; score = s3; if not missing(score) then output;run;
2nd iteration:
_N_ ↑ 2
ID and S1-S3 are retained from the previous iteration
TIME, SCORE → missing
ID (K)  S1 (D)  S2 (D)  S3 (D)  TIME (K)  SCORE (K)  _N_ (D)
A01     3       4       5       .         .          2
Long:
   ID   TIME  SCORE
1  A01  1     3
2  A01  2     4
3  A01  3     5
FROM WIDE FORMAT TO LONG FORMAT
2nd iteration: the SET statement copies the 2nd observation from Wide to the PDV.

PDV:
ID (K)  S1 (D)  S2 (D)  S3 (D)  TIME (K)  SCORE (K)  _N_ (D)
A02     4       .       2       .         .          2

Long is unchanged (3 observations).

2nd iteration: TIME is assigned 1.

PDV:
ID (K)  S1 (D)  S2 (D)  S3 (D)  TIME (K)  SCORE (K)  _N_ (D)
A02     4       .       2       1         .          2

Long is unchanged (3 observations).

2nd iteration: SCORE is assigned the value of S1 (4).

PDV:
ID (K)  S1 (D)  S2 (D)  S3 (D)  TIME (K)  SCORE (K)  _N_ (D)
A02     4       .       2       1         4          2

Long is unchanged (3 observations).

2nd iteration: SCORE is not missing, so the OUTPUT statement writes ID, TIME, and SCORE to Long as the 4th observation.

PDV:
ID (K)  S1 (D)  S2 (D)  S3 (D)  TIME (K)  SCORE (K)  _N_ (D)
A02     4       .       2       1         4          2

Long:
Obs  ID   TIME  SCORE
1    A01  1     3
2    A01  2     4
3    A01  3     5
4    A02  1     4

2nd iteration: TIME is assigned 2.

PDV:
ID (K)  S1 (D)  S2 (D)  S3 (D)  TIME (K)  SCORE (K)  _N_ (D)
A02     4       .       2       2         4          2

Long is unchanged (4 observations).

2nd iteration: SCORE is assigned the value of S2 (missing).

PDV:
ID (K)  S1 (D)  S2 (D)  S3 (D)  TIME (K)  SCORE (K)  _N_ (D)
A02     4       .       2       2         .          2

Long is unchanged (4 observations).

2nd iteration: SCORE is missing, so the OUTPUT statement does not execute and no observation is written.

PDV:
ID (K)  S1 (D)  S2 (D)  S3 (D)  TIME (K)  SCORE (K)  _N_ (D)
A02     4       .       2       2         .          2

Long is unchanged (4 observations).

2nd iteration: TIME is assigned 3.

PDV:
ID (K)  S1 (D)  S2 (D)  S3 (D)  TIME (K)  SCORE (K)  _N_ (D)
A02     4       .       2       3         .          2

Long is unchanged (4 observations).

2nd iteration: SCORE is assigned the value of S3 (2).

PDV:
ID (K)  S1 (D)  S2 (D)  S3 (D)  TIME (K)  SCORE (K)  _N_ (D)
A02     4       .       2       3         2          2

Long is unchanged (4 observations).

2nd iteration: SCORE is not missing, so the OUTPUT statement writes ID, TIME, and SCORE to Long as the 5th observation.

PDV:
ID (K)  S1 (D)  S2 (D)  S3 (D)  TIME (K)  SCORE (K)  _N_ (D)
A02     4       .       2       3         2          2

Long:
Obs  ID   TIME  SCORE
1    A01  1     3
2    A01  2     4
3    A01  3     5
4    A02  1     4
5    A02  3     2

2nd iteration: SAS returns to the beginning of the DATA step to begin the 3rd iteration. With no more observations to read in the 3rd iteration, the DATA step ends and SAS goes to the next DATA or PROC step.

PDV:
ID (K)  S1 (D)  S2 (D)  S3 (D)  TIME (K)  SCORE (K)  _N_ (D)
A02     4       .       2       3         2          2

Long is complete (5 observations).

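The three repeated assignment and OUTPUT pairs can also be written as a loop. The sketch below is an alternative formulation, not the program from the slides. It assumes the same two-observation Wide dataset (recreated here with DATALINES) and uses an array named s, a hypothetical name, over S1-S3.

```sas
/* Recreate the Wide dataset from the slides (assumed layout). */
data wide;
   input id $ s1 s2 s3;
   datalines;
A01 3 4 5
A02 4 . 2
;
run;

data long (drop=s1-s3);
   set wide;
   array s[3] s1-s3;          /* s[1], s[2], s[3] refer to S1, S2, S3 */
   do time = 1 to 3;
      score = s[time];
      /* explicit OUTPUT: write a row only when the score is present */
      if not missing(score) then output;
   end;
run;

proc print data=long;
run;
```

As in the original program, the explicit OUTPUT inside the loop replaces the implicit OUTPUT, so each input observation can produce up to three output observations.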
FROM LONG FORMAT TO WIDE FORMAT

Wide (target):
Obs  ID   S1  S2  S3
1    A01  3   4   5
2    A02  4   .   2

Long (input):
Obs  ID   TIME  SCORE
1    A01  1     3
2    A01  2     4
3    A01  3     5
4    A02  1     4
5    A02  3     2

The DATA step reads 5 observations but creates only 2.
You are not copying data from the PDV to the final dataset at each iteration.
You only need to generate one observation per subject, once all the observations for that subject have been processed.

SCORE is copied to S1, S2, or S3, depending on TIME:

if time = 1 then s1 = score;
else if time = 2 then s2 = score;
else s3 = score;

S1-S3 must keep their values from one iteration to the next: use RETAIN.
Use BY-group processing (BY ID) and output to the final dataset when LAST.ID = 1.

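The FIRST.ID and LAST.ID flags that BY-group processing adds to the PDV can be inspected by writing them to the log. This is a minimal sketch, assuming the Long dataset has the five observations shown on the slides; the DATALINES step and the DATA _NULL_ step with PUT are illustrative additions.

```sas
/* Recreate the Long dataset from the slides (assumed layout). */
data long;
   input id $ time score;
   datalines;
A01 1 3
A01 2 4
A01 3 5
A02 1 4
A02 3 2
;
run;

/* BY-group processing requires the data to be sorted by the BY variable. */
proc sort data=long;
   by id;
run;

/* FIRST.ID and LAST.ID exist only in the PDV; PUT makes them visible. */
data _null_;
   set long;
   by id;
   put _n_= id= time= first.id= last.id=;
run;
```

LAST.ID should be 1 only on the 3rd and 5th observations, which is where one wide observation per subject needs to be written.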
proc sort data=long;
    by id;
run;

data wide (drop=time score);
    set long;
    by id;
    retain s1 - s3;
    if time = 1 then s1 = score;
    else if time = 2 then s2 = score;
    else s3 = score;
    if last.id;
run;

1st iteration: _N_ is set to 1. FIRST.ID and LAST.ID are initialized to 1. All other variables are set to missing.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
1        1             1            (blank) .         .          .       .       .

1st iteration: the SET statement copies the 1st observation to the PDV.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
1        1             1            A01     1         3          .       .       .

1st iteration: FIRST.ID is set to 1 because this is the first observation for A01. LAST.ID is set to 0 because this is not the last observation for A01.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
1        1             0            A01     1         3          .       .       .

1st iteration: since TIME = 1, S1 is assigned the value of SCORE (3).

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
1        1             0            A01     1         3          3       .       .

1st iteration: LAST.ID = 0, so the subsetting IF statement evaluates to false. No observation is written, and SAS returns to the beginning of the DATA step to begin the 2nd iteration.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
1        1             0            A01     1         3          3       .       .

2nd iteration: _N_ is incremented to 2.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
2        1             0            A01     1         3          3       .       .

2nd iteration: FIRST.ID and LAST.ID are retained because they are automatic variables. ID, TIME, and SCORE are retained because they come from the input dataset. S1, S2, and S3 are retained because of the RETAIN statement.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
2        1             0            A01     1         3          3       .       .

2nd iteration: the SET statement copies the 2nd observation to the PDV.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
2        1             0            A01     2         4          3       .       .

2nd iteration: FIRST.ID is set to 0 because this is not the first observation for A01. LAST.ID is set to 0 because this is not the last observation for A01 either.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
2        0             0            A01     2         4          3       .       .

2nd iteration: since TIME = 2, S2 is assigned the value of SCORE (4).

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
2        0             0            A01     2         4          3       4       .

2nd iteration: LAST.ID = 0, so the subsetting IF statement evaluates to false. SAS returns to the beginning of the DATA step to begin the 3rd iteration.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
2        0             0            A01     2         4          3       4       .

3rd iteration: _N_ is incremented to 3. The rest of the variables are retained.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
3        0             0            A01     2         4          3       4       .

3rd iteration: the SET statement copies the 3rd observation to the PDV.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
3        0             0            A01     3         5          3       4       .

3rd iteration: FIRST.ID is set to 0 because this is not the first observation for A01. LAST.ID is set to 1 because this is the last observation for A01.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
3        0             1            A01     3         5          3       4       .

3rd iteration: since TIME = 3, S3 is assigned the value of SCORE (5).

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
3        0             1            A01     3         5          3       4       5

3rd iteration: LAST.ID = 1, so the subsetting IF statement evaluates to true and processing continues to the end of the step.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
3        0             1            A01     3         5          3       4       5

3rd iteration: the implicit OUTPUT executes, and the variables marked (K) are copied to the dataset Wide. SAS returns to the beginning of the DATA step to begin the 4th iteration.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
3        0             1            A01     3         5          3       4       5

Wide:
Obs  ID   S1  S2  S3
1    A01  3   4   5

4th iteration: _N_ is incremented to 4. The rest of the variables are retained.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
4        0             1            A01     3         5          3       4       5

4th iteration: the SET statement copies the 4th observation to the PDV.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
4        0             1            A02     1         4          3       4       5

4th iteration: FIRST.ID is set to 1 because this is the first observation for A02. LAST.ID is set to 0 because this is not the last observation for A02.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
4        1             0            A02     1         4          3       4       5

4th iteration: since TIME = 1, S1 is assigned the value of SCORE (4). S2 and S3 still hold the values retained from A01.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
4        1             0            A02     1         4          4       4       5

4th iteration: LAST.ID = 0, so the subsetting IF statement evaluates to false. SAS returns to the beginning of the DATA step to begin the 5th iteration.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
4        1             0            A02     1         4          4       4       5

5th iteration: _N_ is incremented to 5. The rest of the variables are retained.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
5        1             0            A02     1         4          4       4       5

5th iteration: the SET statement copies the 5th observation to the PDV.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
5        1             0            A02     3         2          4       4       5

5th iteration: FIRST.ID is set to 0 because this is not the first observation for A02. LAST.ID is set to 1 because this is the last observation for A02.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
5        0             1            A02     3         2          4       4       5

5th iteration: since TIME = 3, S3 is assigned the value of SCORE (2).

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
5        0             1            A02     3         2          4       4       2

5th iteration: LAST.ID = 1, so the subsetting IF statement evaluates to true.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
5        0             1            A02     3         2          4       4       2

5th iteration: the implicit OUTPUT executes.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
5        0             1            A02     3         2          4       4       2

Wide:
Obs  ID   S1  S2  S3
1    A01  3   4   5
2    A02  4   4   2

S2 for A02 should be missing, but it holds 4, the value retained from A01. How to fix this?

data wide (drop=time score);
    set long;
    by id;
    retain s1 - s3;
    if first.id then do;
        s1 = .;
        s2 = .;
        s3 = .;
    end;
    if time = 1 then s1 = score;
    else if time = 2 then s2 = score;
    else s3 = score;
    if last.id;
run;

With the corrected program, iterations 1 through 3 produce the same PDV values and the same first observation in Wide.

4th iteration: _N_ is incremented to 4. The rest of the variables are retained.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
4        0             1            A01     3         5          3       4       5

Wide:
Obs  ID   S1  S2  S3
1    A01  3   4   5

4th iteration: the SET statement copies the 4th observation to the PDV.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
4        0             1            A02     1         4          3       4       5

4th iteration: FIRST.ID is set to 1 because this is the first observation for A02. LAST.ID is set to 0 because this is not the last observation for A02.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
4        1             0            A02     1         4          3       4       5

4th iteration: since FIRST.ID = 1, S1-S3 are reset to missing.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
4        1             0            A02     1         4          .       .       .

4th iteration: since TIME = 1, S1 is assigned the value of SCORE (4).

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
4        1             0            A02     1         4          4       .       .

4th iteration: LAST.ID = 0, so the subsetting IF statement evaluates to false. SAS returns to the beginning of the DATA step to begin the 5th iteration.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
4        1             0            A02     1         4          4       .       .

5th iteration: _N_ is incremented to 5. The rest of the variables are retained.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
5        1             0            A02     1         4          4       .       .

5th iteration: the SET statement copies the 5th observation to the PDV.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
5        1             0            A02     3         2          4       .       .

5th iteration: FIRST.ID is set to 0 because this is not the first observation for A02. LAST.ID is set to 1 because this is the last observation for A02.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
5        0             1            A02     3         2          4       .       .

5th iteration: since FIRST.ID is not 1, the DO group is skipped and S1-S3 keep their values.

PDV:
_N_ (D)  FIRST.ID (D)  LAST.ID (D)  ID (K)  TIME (D)  SCORE (D)  S1 (K)  S2 (K)  S3 (K)
5        0             1            A02     3         2          4       .       .

![Page 176: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/176.jpg)
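FROM LONG FORMAT TO WIDE FORMAT

    data wide (drop=time score);
        set long;
        by id;
        retain s1 - s3;
        if first.id then do;
            s1 = .;
            s2 = .;
            s3 = .;
        end;
        if time = 1 then s1 = score;
        else if time = 2 then s2 = score;
        else s3 = score;
        if last.id;
    run;

LONG (input dataset):

| Obs | ID | TIME | SCORE |
|---|---|---|---|
| 1 | A01 | 1 | 3 |
| 2 | A01 | 2 | 4 |
| 3 | A01 | 3 | 5 |
| 4 | A02 | 1 | 4 |
| 5 | A02 | 3 | 2 |

5th iteration: Since TIME = 3, S3 is assigned the value of SCORE (2).

PDV:

| _N_ (D) | FIRST.ID (D) | LAST.ID (D) | ID (K) | TIME (D) | SCORE (D) | S1 (K) | S2 (K) | S3 (K) |
|---|---|---|---|---|---|---|---|---|
| 5 | 0 | 1 | A02 | 3 | 2 | 4 | . | 2 |

WIDE (output dataset so far):

| Obs | ID | S1 | S2 | S3 |
|---|---|---|---|---|
| 1 | A01 | 3 | 4 | 5 |
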
![Page 177: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/177.jpg)
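FROM LONG FORMAT TO WIDE FORMAT

    data wide (drop=time score);
        set long;
        by id;
        retain s1 - s3;
        if first.id then do;
            s1 = .;
            s2 = .;
            s3 = .;
        end;
        if time = 1 then s1 = score;
        else if time = 2 then s2 = score;
        else s3 = score;
        if last.id;
    run;

LONG (input dataset):

| Obs | ID | TIME | SCORE |
|---|---|---|---|
| 1 | A01 | 1 | 3 |
| 2 | A01 | 2 | 4 |
| 3 | A01 | 3 | 5 |
| 4 | A02 | 1 | 4 |
| 5 | A02 | 3 | 2 |

5th iteration: The subsetting IF condition (LAST.ID = 1) evaluates to true, so processing of this observation continues.

PDV:

| _N_ (D) | FIRST.ID (D) | LAST.ID (D) | ID (K) | TIME (D) | SCORE (D) | S1 (K) | S2 (K) | S3 (K) |
|---|---|---|---|---|---|---|---|---|
| 5 | 0 | 1 | A02 | 3 | 2 | 4 | . | 2 |

WIDE (output dataset so far):

| Obs | ID | S1 | S2 | S3 |
|---|---|---|---|---|
| 1 | A01 | 3 | 4 | 5 |
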
![Page 178: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/178.jpg)
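FROM LONG FORMAT TO WIDE FORMAT

    data wide (drop=time score);
        set long;
        by id;
        retain s1 - s3;
        if first.id then do;
            s1 = .;
            s2 = .;
            s3 = .;
        end;
        if time = 1 then s1 = score;
        else if time = 2 then s2 = score;
        else s3 = score;
        if last.id;
    run;

LONG (input dataset):

| Obs | ID | TIME | SCORE |
|---|---|---|---|
| 1 | A01 | 1 | 3 |
| 2 | A01 | 2 | 4 |
| 3 | A01 | 3 | 5 |
| 4 | A02 | 1 | 4 |
| 5 | A02 | 3 | 2 |

5th iteration: SAS reaches the end of the 5th iteration. The implicit OUTPUT executes, writing the A02 observation to WIDE.

PDV:

| _N_ (D) | FIRST.ID (D) | LAST.ID (D) | ID (K) | TIME (D) | SCORE (D) | S1 (K) | S2 (K) | S3 (K) |
|---|---|---|---|---|---|---|---|---|
| 5 | 0 | 1 | A02 | 3 | 2 | 4 | . | 2 |

WIDE (output dataset):

| Obs | ID | S1 | S2 | S3 |
|---|---|---|---|---|
| 1 | A01 | 3 | 4 | 5 |
| 2 | A02 | 4 | . | 2 |
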
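For readers who want to reproduce the walk-through, the following is a minimal, self-contained sketch. Only the DATA WIDE step comes from the slides; the DATALINES step, the PROC SORT step, and the PROC PRINT step are additions for illustration.

```sas
/* Recreate the five observations of LONG shown on the slides. */
data long;
    input id $ time score;
    datalines;
A01 1 3
A01 2 4
A01 3 5
A02 1 4
A02 3 2
;
run;

/* BY-group processing requires LONG to be sorted (or indexed) by ID. */
/* The data above are already in order; the sort is a safeguard.      */
proc sort data=long;
    by id;
run;

/* The DATA step from the slides: one output observation per ID. */
data wide (drop=time score);
    set long;
    by id;
    retain s1 - s3;
    if first.id then do;    /* reset at the start of each BY group */
        s1 = .;
        s2 = .;
        s3 = .;
    end;
    if time = 1 then s1 = score;
    else if time = 2 then s2 = score;
    else s3 = score;
    if last.id;             /* output only the last row of each group */
run;

proc print data=wide;
run;
```

The printed dataset should match the final slide: A01 with S1 - S3 equal to 3, 4, 5, and A02 with 4, missing, 2.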
![Page 179: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/179.jpg)
CONCLUSION
The most important part of DATA step processing is understanding how data is transferred to the PDV and how data is copied from the PDV to a new dataset.
To be successful SAS programmers, we must thoroughly understand how DATA steps are processed.
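One hedged way to watch the PDV during execution is to write it to the SAS log with PUT _ALL_, which lists every variable in the PDV, including automatic ones such as _N_, FIRST.ID, and LAST.ID. The sketch below is not from the slides: the PUT statements and their text labels are additions, and the DATALINES step only recreates the example data.

```sas
/* Recreate the example data used in the long-to-wide walk-through. */
data long;
    input id $ time score;
    datalines;
A01 1 3
A01 2 4
A01 3 5
A02 1 4
A02 3 2
;
run;

data wide (drop=time score);
    set long;
    by id;
    retain s1 - s3;
    /* PDV right after SET has loaded the observation. */
    put 'After SET: ' _all_;
    if first.id then do;
        s1 = .;
        s2 = .;
        s3 = .;
    end;
    if time = 1 then s1 = score;
    else if time = 2 then s2 = score;
    else s3 = score;
    /* PDV after the assignments, before the subsetting IF. */
    put 'Before subsetting IF: ' _all_;
    if last.id;
run;
```

Each iteration writes two lines to the log, so the retained values of S1 - S3 can be compared line by line with the PDV shown on the slides.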
![Page 180: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/180.jpg)
REFERENCES
Cody, Ron. 2001. Longitudinal Data and SAS®: A Programmer’s Guide. Cary, NC: SAS Institute Inc.
![Page 181: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/181.jpg)
ACKNOWLEDGEMENT
I would like to thank MaryAnne DePesquo for inviting me to present at SGF 2011.
![Page 182: The essence of data step programming](https://reader033.vdocuments.site/reader033/viewer/2022061201/547a3ab3b4af9f08688b45e5/html5/thumbnails/182.jpg)
CONTACT INFORMATION
Arthur X. Li
City of Hope Comprehensive Cancer Center
Division of Information Science
1500 East Duarte Road
Duarte, CA 91010-3000
Work Phone: (626) 256-4673 ext. 65121
Fax: (626) 471-7106
E-mail: [email protected]