using sas® to locate and rename external files€¦ · using sas® to locate and rename external...
TRANSCRIPT
Using SAS® to Locate and Rename External Files
Lu Gan, Pharmaceutical Product Development, LLC, Austin, TX
ABSTRACT
When reading in external data into SAS, a program needs to specify the data file location and file name
in the related importing statements. If the raw data file is named with a timestamp, the import
program will require an extra step which is updating the file name every time before it’s run. This will
be a hassle if the import is needed on a routine schedule and/or there are multiple external data files.
Therefore, a SAS program can be written to eliminate the repeating manual update. This approach
includes two steps, first searching for the latest data file in the specific file directory, then renaming it
by removing the specific timestamp. After execution of the above steps, the import program will be
able to read in the data files directly.
The examples and syntax illustrated are in SAS 9.2. This paper assumes that the reader has a basic
understanding of DATA step programming and the macro language.
INTRODUCTION
Time‐saving is always a goal programmers like to achieve in daily work. Instead of manually renaming external data files, using SAS to programmatically do it undoubtedly improves efficiency. There are various types of external data files SAS can import; in this paper text file(.txt) is used as an example. The approach starts with the definitions of several macro variables which are going to be used in the sample code. ** 'filepath' is the file directory storing the file(s) which will be renamed; %LET filepath= %NRBQUOTE(U:\SCSUG\); ** 'delimiter' is the delimiter used in filenames; %LET delimiter= %NRBQUOTE(_); ** 'file_extension' is the file extension of the external data file(s); %LET file_extension= %NRBQUOTE(.txt); ** 'filename_parts' tells how many part(s) in the filename excluding file extension. Example: in filename 'Sample_yyyymmdd_hhmm_Filen.txt', there are 4 parts delimited by the delimiter '_'; %LET filename_parts= 4; ** 'except_file' is a list of file(s) that are with same file extension but don't want to be renamed; %LET except_file= %NRBQUOTE(Except1,Except2,Sample_File3);
F
s
G
T
s
e
f
FD R To
Figure 1 belo
specified in m
GET ALL FILE
There are mu
statement as
external file,
folder.
FILENAME iDATA all_f INFILE INPUT f
RUN;
The log showobservations
ow takes the
macro variab
E(S) INFORM
ultiple ways
ssociates a S
or lists attri
indat PIPEfile; indat TRU
fldt $ 1-1
ws 22 records and 4 varia
e first look of
ble &filepa
MATION IN T
for SAS prog
SAS fileref w
ibutes of ext
E "dir ""&
UNCOVER; 11 fltm $
s were readables shown
f the text file
ath.
Fig
HE FOLDER
grams to int
ith an extern
ternal files. I
&filepath.
12-22 fls
from the fiin figure 2.
es (.txt) cont
gure 1
teract with t
nal file or an
It’s used her
*&file_ex
iz $ 25-3
ileref ind
tained in the
he world ou
n output dev
re to get all f
xtension""
8 flnm $
dat. The dat
e source fold
utside of SAS
vice, disassoc
file(s) inform
";
39-100;
taset ‘all_f
der which is
S. FILENAME
ciates a filer
mation in the
file’ has 22
ref and
e above
2
C Agd% % % D
CLEAN THE F
As you mightgoing to be rdon’t want t%Clean_fil
%MACRO Cle
%LET numf=
DATA excep LENGTH
%DO % f O
FILE(S) INFO
t already noremoved. Beo rename solenames_In
ean_filena
=%EVAL(1+%
pt_files; H flnm $60i=1 %TO &LET _flnm
flnm= STRIOUTPUT;
RMATION
tice, datasetesides, in mao we also wanfo is create
ames_Info;
%SYSFUNC(C
0.; numf.;
m= %SCAN(&eP("&_flnm"
Fig
t ‘All_Fileacro variableant to excluded to do the
COUNTC("&e
except_fil"||"&file_
gure 2
e’ containse &except_fde them from‘cleaning’ w
except_fil
le., &i, '_extension
s some usele_file, therem further prwork.
e.", ',',
','); n");
ess observate are severalrocessing. Th
t)));
tions which a text files thhe following
are hat we macro
R PCSFW Q% A3
P Wf‘ D
%ENDRUN;
PROC SQL; CREATE TABSELECT * FROM all_fWHERE NOT
ST
QUIT; %MEND Clea
After macro 3 only conta
PROCESS TH
With all the ffile(s) will beSample_yyy
DATA file_ SET all FORMAT LENGTH
CALL MI
D;
BLE all_fi
file (INDEXC(f
TRIP(flnm)
an_filenam
%Clean_fiins useful in
E FILE NAM
file informate renamed toyymmdd_hh
_info; l_file_inffile_dt Yfiletime
ISSING(new
ile_info A
fldt,'Chec NOT IN (S
mes_Info;
lenames_Information th
E INFORMA
tion availablo. In this examm_Filen.tx
fo; YYMMDD10. $5. new_f
w_file_nam
AS
ck','DirecSELECT fln
nfo is execuhat will be n
Fig
ATION
e, we want tample, we asxt’ will be re
file_tm Tfile_name
me, fileti
ctory') > nm FROM ex
uted, outputeeded in su
gure 3
to follow nassume originnamed to ‘S
TIME5.; $50.;
ime, file_
0 OR MISSxcept_file
t dataset ‘albsequent st
ming convenal filename Sample_Filen
_dt, file_
ING(fldt)es);
ll_file_ineps.
ntion to find
n.txt’.
_tm);
) AND
nfo‘ shown i
d the filenam
n figure
me(s) the
R Ff
file_dt IF INDE
file ELSE
file filetim
ARRAY f DO i=1
flnaIF I
END;
DO i=1 new_ END; new_fil
KEEP flRUN;
File informatfile name if a
t = INPUT(EX(fltm,'Pe_tm = INP
e_tm = INPme= COMPRE
flnames{&fTO DIM(fl
ames(i)= SNPUT(flnaflnames(
TO DIM(fl_file_name
le_name= S
lnm file_d
tion is saveda file is renam
(fldt, MMDPM') THEN UT(SUBSTR(
UT(SUBSTR(ESS(PUT(fi
filename_plnames); CAN(TRANWRmes(i), ??(i)='';
lnames); = CATX("&d
STRIP(STRI
dt file_tm
d in dataset ‘med.
DDYY10.); (fltm,1,5)
(fltm,1,5)ile_tm, ??
parts} $20
RD(flnm, "?YYMMDD10.
delimiter"
IP(new_fil
m new_file
‘file_info
Fig
), TIME5.)
), TIME5.)TIME5.),
.;
"&file_ext.)= file_d
", new_fil
le_name)||
e_name;
’ which inclu
gure 4
)+43200;
); ':');
tension", dt OR flna
le_name, f
"&file_e
udes file nam
""), i, "ames(i)= f
flnames(i)
xtension"
me, date and
"&delimitefiletime T
));
);
d time and t
er"); THEN
he new
N
tt D R D
D
D
Now we find
to be renamthose existin
DATA delet SET fil BY new_ IF FIRS
DO; ELSE
DO; RUN;
Dataset ‘ren
Dataset ‘del
DELETE EXIS
d out which f
ed if file witng files with s
te_files rle_info; _file_nameST.new_filCALL SYMP
OUTPUT re
name_files
lete_files
ITING FILE(S
files are goin
h same namsame name
rename_fil
e flnm fille_name NEUT('existi
name_files
s’ includes al
s’ includes al
S) WHICH W
ng to be rena
me already exand flag the
les;
le_dt fileE LAST.newing_file',
s; END;
ll existing file
Fig
ll existing file
Fig
WILL BE REPLA
amed. Wind
xists, therefoem for deleti
e_tm; w_file_nam, '1'); OU
es that will b
gure 5
es that will b
gure 6
ACED
dows® operaore, before rion while ke
me AND newUTPUT dele
be renamed
be deleted a
ating system
renaming, weep the rest f
w_file_namete_files;
.
and replaced
m doesn’t all
we want to idfor renamin
me= flnm TH; END;
d thereafter.
ow a file
dentify g.
HEN
W‘d %
% %%% %% Ff
We use CALLdelete_fi
%MACRO del %IF %SY %DO; %SYS
%I %D %E %E %D
%E %END; %ELSE %DO; %P %END;
%MEND dele
%MACRO del%IF &exist%DO; DATA _N
SET CALL
RUN; %END; %MEND dele
Figure 7 is frfile folder sto
L EXECUTEiles’:
lete_old_fYSFUNC(FIL
SEXEC DEL F &SYSRC=
DO; %PUT AL
END; ELSE DO;
%PUT A&S
END;
PUT ALERT_
ete_old_fi
lete_existting_file=
NULL_; delete_fi
L EXECUTE(
ete_existi
om SAS log sores after ex
to call macr
file(file)LEEXIST("&
"&filepat 0 %THEN
LERT_I: Fi
ALERT_R: FSYSRC;
_R: &filepa
ile;
ting_file;= 1 %THEN
les; '%delete_o
ing_file;
showing whxecution of t
ro %delete_
; &filepath.
th.&file";
ile &file
File &file
ath.&file
old_file(f
at files havethe above m
Fig
_old_file
&file"))
was succe
was NOT s
NOT FOUND
file='||st
e been deletemacro.
gure 7
to delete th
%THEN
essfully d
successful
D.;
trip(flnm)
ed. Figure 8
he files store
deleted.;
lly delete
)||');');
displays wh
ed in dataset
ed. SYSRC=
hat files the o
t
=
original
R Acfm PBR D R
P
RENAME TH
After the aboconsider arefile—we assumake sure fi
PROC SORT BY new_filRUN;
DATA to_reSET BY nIF L
RUN;
PROC SQL;
E FILE(S)
ove proceed: 1> If more ume the file le(s) in '&exc
DATA= renle_name fl
ename_filerename_fi
new_file_nLAST.new_f
ding steps, fithan one filwith the latcept_file'
name_fileslnm file_d
es; les; ame flnm file_name;
Fig
nally we aree will be rentest timestam' will not be
s; dt file_tm
file_dt fi
Fig
gure 8
e ready to renamed to themp in filenamrenamed.
m;
ile_tm;
gure 9
name the file same nammes will be n
les . Couple me, we want needed. 2>
of things weto use the la Check agai
e want to atest n to
CSFWQ
Fl %D
R% D
R F
i
CREATE TABSELECT * FROM to_reWHERE new_QUIT;
Function Renibrary, an en
%MACRO filDATA _NULL
rc= REN IF rc N DO; PUT
CALL END; ELSE
PUT
RUN; %MEND file
DATA _NULL SET to
CALL E
RUN;
Figure 11 is
n the same f
BLE to_ren
ename_file_file_name
name() is avantry in a SAS
le_rename(L_;
NAME("&fil
NE 0 THEN
"ALERT_" unsuccess
L SYMPUT('
"ALERT_" successfu
e_rename;
L_; o_rename_f
XECUTE('%fST
from SAS Lo
folder after
name_files
es e NOT IN (
ailable sinceS catalog, an
(from_file
lepath.&fr
"R: Renamsful." rc=err','1')
"I: Renamul.";
files2;
file_renamTRIP(new_f
og showing
all proceedi
s2 AS
(SELECT fl
Fig
e SAS versionexternal file
ename, to_
rom_filena
me of file=; ;
me of file
me(from_fifile_name)
what files a
ng steps.
lnm FROM e
gure 10
n 9.2. Its pure, or a direct
_filename)
ame", "&fi
e &from_fi
e &from_fi
ilename='|||');');
re deleted o
except_fil
rpose is to retory. It is use
;
lepath.&t
lename to
ilename to
||STRIP(fl
or renamed.
es);
ename a meed here to re
o_filenam
&to_file
o &to_file
lnm)||', t
Figure 12 d
ember of a SAename the f
me", 'file'
name
ename
to_filenam
displays the
AS ile(s).
');
me='||
text files
C
T
a
o
a
S
s
R
S
S
A
CONCLUSION
This paper d
along with se
other necess
activities. Th
SAS system b
solutions to t
REFERENCES
Schacherer,
SAS(R) 9.2 La
ACKNOWLED
N
escribed a w
everal system
sary proceed
e purpose o
beyond basic
the day‐to‐d
S
C(2011) The
anguage Ref
DGEMENTS
way to locate
m functions
ding and suc
of this paper
c data manip
day program
e FILENAME
ference: Dict
Fig
Fig
e and renam
and comma
ceeding step
is trying to
pulation so a
mming activit
Statement:
tionary, Four
gure 11
gure 12
me external fi
ands. With th
ps to a batch
present the
as to achieve
ties.
Interacting w
rth Edition C
ile(s) using S
he automati
h job to fully
idea of expa
e more effic
with the wo
Cary, NC: SAS
SAS data ste
ion of renam
y automate e
anding the c
cient and mo
rld outside o
S Institute In
ps, macro la
ming, users c
external dat
capabilities o
ore automat
of SAS®
nc.
anguages
can add
a import
of the
ed
I’d like to thank my coworker Ranjan Karmakar for discussing with me when I had the idea about this approach. I also appreciate the support from the programming management team at Pharmaceutical Product Development, LLC.
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or
trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other
brand and product names are registered trademarks or trademarks of their respective companies.
CONTACT INFORMATION
Your comments and questions are valued and welcomed. Please contact the author at: Lu Gan PPD 7901 E. Riverside Drive Austin, TX 78744 [email protected] (512)747‐5715