Lecture 5: Sorting, Printing, and Summarizing Your Data


Upload: dina-cunningham

Post on 11-Jan-2016



Review
  Creating and Redefining Variables
  SAS Functions
  IF-THEN Statements
  Grouping Observations with IF-THEN/ELSE
  Subsetting Data
  Simplifying Programs with Arrays

Lecture Structure
  Using SAS Procedures
  Printing Your Data with PROC PRINT
  Changing the Appearance of Printed Values with Formats
  Summarizing Your Data Using PROC MEANS

Using SAS Procedures

LABEL ReceiveDate = 'Date order was received'
      ShipDate = 'Date merchandise was shipped';

Printing Your Data with PROC PRINT

If you don't want observation numbers, use the NOOBS option in the PROC PRINT statement. If you want to print the labels instead of the variable names, add the LABEL option as well.
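As a minimal sketch, the program below combines the LABEL statement with the NOOBS and LABEL options of PROC PRINT. The data set name orders, the variable OrderID, and the two data lines are hypothetical, made up for illustration; DATALINES is used so the program runs without an external file.

```sas
/* Hypothetical order data, for illustration only */
DATA orders;
   INPUT OrderID $ ReceiveDate :MMDDYY10. ShipDate :MMDDYY10.;
   FORMAT ReceiveDate ShipDate MMDDYY10.;
   /* A LABEL statement in a DATA step stores the labels with the data set */
   LABEL ReceiveDate = 'Date order was received'
         ShipDate    = 'Date merchandise was shipped';
   DATALINES;
A101 03/02/2008 03/05/2008
A102 03/04/2008 03/09/2008
;
RUN;

/* NOOBS drops the Obs column; LABEL uses the labels as column headings */
PROC PRINT DATA = orders NOOBS LABEL;
   TITLE 'Orders with Labels';
RUN;
```

Without the LABEL option, PROC PRINT would show the variable names ReceiveDate and ShipDate as column headings even though the labels are stored.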

PROC PRINT DATA = data-set NOOBS LABEL;

The following are optional statements that sometimes come in handy:

  BY variable-list;
    The BY statement starts a new section in the output for each new value of the BY variables and prints the values of the BY variables at the top of each section. The data must be presorted by the BY variables.
  ID variable-list;
    When you use the ID statement, the observation numbers are not printed. Instead, the variables in the ID variable list appear on the left-hand side of the page.
  SUM variable-list;
    The SUM statement prints sums for the variables in the list.
  VAR variable-list;
    The VAR statement specifies which variables to print and in what order. Without a VAR statement, all variables in the SAS data set are printed in the order in which they occur in the data set.

DATA sales;
   INFILE 'D:\My Documents\My Class\TA\MyCode\05code and data\Candy.dat';
   INPUT Name $ 1-11 Class @15 DateReturned MMDDYY10. CandyType $ Quantity;
   Profit = Quantity * 1.25;
RUN;
PROC SORT DATA = sales;
   BY Class;
RUN;
PROC PRINT DATA = sales;
   BY Class;
   SUM Profit;
   VAR Name DateReturned CandyType Profit;
   TITLE 'Candy Sales for Field Trip by Class';
RUN;

Candy.dat:
Adriana    21  3/21/2008 MP  7
Nathan     14  3/21/2008 CD 19
Matthew    14  3/21/2008 CD 14
Claire     14  3/22/2008 CD 11
Caitlin    21  3/24/2008 CD  9
Ian        21  3/24/2008 MP 18
Chris      14  3/25/2008 CD  6
Anthony    21  3/25/2008 MP 13
Stephen    14  3/25/2008 CD 10
Erika      21  3/25/2008 MP 17

Changing the Appearance of Printed Values with Formats

  Character    Numeric      Date
  $formatw.    formatw.d    formatw.

FORMAT statement:
  FORMAT Profit Loss DOLLAR8.2 SaleDate MMDDYY8.;

FORMAT statements can go in either DATA steps or PROC steps. If the FORMAT statement is in a DATA step, then the format association is permanent and is stored with the SAS data set. If the FORMAT statement is in a PROC step, then it is temporary, affecting only the results from that procedure.
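A small sketch of the difference between permanent and temporary formats, assuming a hypothetical data set demo with made-up values. The FORMAT statement in the DATA step stores the formats with the data set; the FORMAT statement in the second PROC PRINT applies only to that step, so PROC CONTENTS should still list MMDDYY8. for SaleDate afterwards.

```sas
/* Hypothetical data set, for illustration only */
DATA demo;
   INPUT Profit SaleDate :MMDDYY10.;
   /* In a DATA step, FORMAT is permanent: stored with the data set */
   FORMAT Profit DOLLAR8.2 SaleDate MMDDYY8.;
   DATALINES;
123.45 03/21/2008
87.25 03/25/2008
;
RUN;

/* Uses the stored formats */
PROC PRINT DATA = demo;
   TITLE 'Stored formats';
RUN;

/* In a PROC step, FORMAT is temporary: DATE9. applies to this step only */
PROC PRINT DATA = demo;
   FORMAT SaleDate DATE9.;
   TITLE 'Temporary DATE9. format';
RUN;

/* The variable list should still show MMDDYY8. for SaleDate */
PROC CONTENTS DATA = demo;
RUN;
```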

Formats can also be applied in a PUT statement:
  PUT Profit DOLLAR8.2 Loss DOLLAR8.2 SaleDate MMDDYY8.;
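The PUT statement writes to the SAS log unless a FILE statement redirects it. A minimal sketch with hypothetical values, using DATA _NULL_ so that no data set is created:

```sas
/* DATA _NULL_ runs a DATA step without creating a data set */
/* The values assigned below are hypothetical                */
DATA _NULL_;
   Profit   = 123.5;
   Loss     = 45;
   SaleDate = MDY(3, 21, 2008);
   /* The formats control how each value is written to the log */
   PUT Profit DOLLAR8.2 Loss DOLLAR8.2 SaleDate MMDDYY8.;
RUN;
```

Because SaleDate is stored as a SAS date value (a count of days), printing it without MMDDYY8. would show a plain number rather than a readable date.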

Example using the Candy.dat data:

DATA sales;
   INFILE 'D:\My Documents\My Class\TA\MyCode\05code and data\Candy.dat';
   INPUT Name $ 1-11 Class @15 DateReturned MMDDYY10. CandyType $ Quantity;
   Profit = Quantity * 1.25;
RUN;
PROC PRINT DATA = sales;
   VAR Name DateReturned CandyType Profit;
   FORMAT DateReturned DATE9. Profit DOLLAR6.2;
   TITLE 'Candy Sale Data Using Formats';
RUN;

Summarizing Your Data Using PROC MEANS

PROC MEANS options;

If you use the PROC MEANS statement with no other statements, then you will get statistics for all observations and all numeric variables in your data set. If you do not specify any options, MEANS prints the number of non-missing values, the mean, the standard deviation, and the minimum and maximum values for each variable. To request specific statistics instead, list them as options:

  MAX      the maximum value
  MIN      the minimum value
  MEAN     the mean
  MEDIAN   the median
  MODE     the mode (new in SAS 9.2)
  N        number of non-missing values
  NMISS    number of missing values
  RANGE    the range
  STDDEV   the standard deviation
  SUM      the sum

Here are some of the optional statements you may want to use:

  BY variable-list;
    The BY statement performs separate analyses for each level of the variables in the list. The data must first be sorted in the same order as the variable-list. (You can use PROC SORT to do this.)
  CLASS variable-list;
    The CLASS statement also performs separate analyses for each level of the variables in the list, but its output is more compact than with the BY statement, and the data do not have to be sorted first.
  VAR variable-list;
    The VAR statement specifies which numeric variables to use in the analysis. If it is absent, SAS uses all numeric variables.
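For illustration, the candy data from the PROC PRINT section can be summarized with explicitly requested statistics. This sketch embeds the Candy.dat records with DATALINES and, as an assumption made for convenience, reads them with list input and the colon modifier (:MMDDYY10.) instead of the column-based INPUT statement used earlier. MAXDEC= limits the number of decimal places displayed.

```sas
/* Candy.dat records, read with list input for this sketch */
DATA sales;
   INPUT Name $ Class DateReturned :MMDDYY10. CandyType $ Quantity;
   Profit = Quantity * 1.25;
   DATALINES;
Adriana 21 3/21/2008 MP 7
Nathan 14 3/21/2008 CD 19
Matthew 14 3/21/2008 CD 14
Claire 14 3/22/2008 CD 11
Caitlin 21 3/24/2008 CD 9
Ian 21 3/24/2008 MP 18
Chris 14 3/25/2008 CD 6
Anthony 21 3/25/2008 MP 13
Stephen 14 3/25/2008 CD 10
Erika 21 3/25/2008 MP 17
;
RUN;

/* Request specific statistics instead of the defaults (N, MEAN, STD, MIN, MAX) */
PROC MEANS DATA = sales N NMISS MEDIAN RANGE SUM MAXDEC = 2;
   VAR Quantity Profit;
   TITLE 'Candy Sales: Selected Statistics';
RUN;
```

The VAR statement keeps Class out of the analysis; without it, PROC MEANS would also summarize Class because it is read as a numeric variable.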

A wholesale nursery is selling garden flowers, and they want to summarize their sales figures by month. The data file which follows contains the customer ID, date of sale, and number of petunias, snapdragons, and marigolds sold:

756-01 05/04/2008 120  80 110
834-01 05/12/2008  90 160  60
901-02 05/18/2008  50 100  75
834-01 06/01/2008  80  60 100
756-01 06/11/2008 100 160  75
901-02 06/19/2008  60  60  60
756-01 06/25/2008  85 110 100

DATA sales;
   INFILE 'D:\My Documents\My Class\TA\MyCode\05code and data\Flowers.dat';
   INPUT CustomerID $ SaleDate MMDDYY10. Petunia SnapDragon Marigold;
   Month = MONTH(SaleDate);
RUN;
PROC SORT DATA = sales;
   BY Month;
RUN;
* Calculate means by Month for flower sales;
PROC MEANS DATA = sales;
   BY Month;
   VAR Petunia SnapDragon Marigold;
   TITLE 'Summary of Flower Sales by Month';
RUN;

Exercise

Download the data set Flowers.dat from the folder "05 code and data" on our Blackboard site. Summarize this data set using PROC MEANS by CustomerID. (You do not need to submit this result.)
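The by-month flower summary can also be written with a CLASS statement, which does not require sorting. A sketch, assuming the Flowers.dat records are embedded with DATALINES, read with the colon modifier (:MMDDYY10.) so that extra blanks between fields do not matter, and assuming monthly totals and means are the statistics of interest:

```sas
/* Flowers.dat records embedded with DATALINES */
DATA sales;
   INPUT CustomerID $ SaleDate :MMDDYY10. Petunia SnapDragon Marigold;
   Month = MONTH(SaleDate);
   DATALINES;
756-01 05/04/2008 120 80 110
834-01 05/12/2008 90 160 60
901-02 05/18/2008 50 100 75
834-01 06/01/2008 80 60 100
756-01 06/11/2008 100 160 75
901-02 06/19/2008 60 60 60
756-01 06/25/2008 85 110 100
;
RUN;

/* CLASS groups the observations by Month; no PROC SORT is needed */
PROC MEANS DATA = sales SUM MEAN MAXDEC = 1;
   CLASS Month;
   VAR Petunia SnapDragon Marigold;
   TITLE 'Summary of Flower Sales by Month (CLASS statement)';
RUN;
```

The CLASS version puts all months in one compact table, while the BY version prints a separate section for each month.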

/* Sample code: replace the placeholder parts (shown in red on the slide) with your own */
DATA dataname;
   INFILE 'Locate your dataset here';
   INPUT identify your data with right format;
PROC SORT DATA = dataname;
   BY variable_name;
PROC function_name DATA = dataname;
   BY variable_name;
   VAR other variables you want to show in your output;
RUN;

Exercise Result

DATA sales;
   INFILE 'D:\My Documents\My Class\TA\MyCode\05code and data\Flowers.dat';
   INPUT CustomerID $ SaleDate MMDDYY10. Petunia SnapDragon Marigold;
RUN;
PROC SORT DATA = sales;
   BY CustomerID;
RUN;
* Calculate summary statistics by CustomerID;
PROC MEANS DATA = sales;
   BY CustomerID;
   VAR Petunia SnapDragon Marigold;
RUN;
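To save per-customer sums and means to a new data set rather than only printing them, PROC MEANS offers an OUTPUT statement. A sketch under these assumptions: the records are embedded with DATALINES, and the output data set name totals and the Mean* variable names are hypothetical choices. NOPRINT suppresses the printed report; the saved data set also contains the automatic variables _TYPE_ and _FREQ_.

```sas
/* Flowers.dat records embedded with DATALINES */
DATA sales;
   INPUT CustomerID $ SaleDate :MMDDYY10. Petunia SnapDragon Marigold;
   DATALINES;
756-01 05/04/2008 120 80 110
834-01 05/12/2008 90 160 60
901-02 05/18/2008 50 100 75
834-01 06/01/2008 80 60 100
756-01 06/11/2008 100 160 75
901-02 06/19/2008 60 60 60
756-01 06/25/2008 85 110 100
;
RUN;

/* BY processing requires the data to be sorted first */
PROC SORT DATA = sales;
   BY CustomerID;
RUN;

/* NOPRINT suppresses the report; OUTPUT writes the statistics to a new data set */
PROC MEANS NOPRINT DATA = sales;
   BY CustomerID;
   VAR Petunia SnapDragon Marigold;
   OUTPUT OUT = totals
      MEAN(Petunia SnapDragon Marigold) = MeanPetunia MeanSnapDragon MeanMarigold
      SUM(Petunia SnapDragon Marigold) = Petunia SnapDragon Marigold;
RUN;

PROC PRINT DATA = totals;
   TITLE 'Sums and Means of Flower Sales by Customer';
RUN;
```

Reusing the original variable names for the sums is allowed; the totals data set then has one observation per customer with both the sums and the means.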