    Introductory SAS tutorial

    Prof. David Mendonça

    Thursday, 13 June 2002

    3p-5p in CO-lab

    1 Background on SAS

    1.1 What is SAS and why do we care?

    SAS is the standard for large-scale statistical computing operations (e.g., marketing, data analysis) in a wide variety of industries (e.g., pharmaceutical, retail). SAS serves more than 38,000 business, government, and university sites in 199 countries, including 98% of the top 100 companies on the Fortune 500 and 90% of all companies on the Fortune 500. It is also the largest privately held software company in the world.

    There are many facets to SAS software (e.g., Base SAS and various modules such as SAS/GIS,

    SAS/GRAPH, SAS/STAT, SAS/OR, etc.). Today we will concentrate on Base SAS and SAS/STAT.

    2 Basics

    Extensive but labyrinthine online help.

    Windows-specific help: Base SAS>Host Specific>MS Windows

    Most parts of SAS programs are multi-platform. Be careful of case-sensitive properties of some operating

    systems, though.

    2.1 SAS Work Area


    Fig. 1. Display system options

    proc options;

    run;

    Explanations of all the displayed options are available in the online help.

    2.2 File I/O

    Objective: Understand and use data import/export from/to other applications (e.g., Excel, MATLAB)

    First create a directory called SAS Demos under the directory My SAS Files, then another directory, Tutor, within the directory SAS Demos. Next create a SAS library called Tutor. It's a good idea never to move these libraries around once they're established. If you do, all links to them will have to be updated.
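
    As a rough sketch, the Tutor library can also be created in code with a LIBNAME statement rather than through the Explorer window. The path below is an assumption taken from the INFILE statement in Fig. 2; adjust it to wherever your Tutor directory actually lives.

    ```sas
    /* Assign the libref Tutor to the Tutor directory (path is an assumption). */
    libname Tutor "C:\Documents and Settings\mendonca\My Documents\My SAS Files\SAS Demos\Tutor";

    /* List the members of the library in the log to confirm the assignment. */
    proc datasets library=Tutor;
    run;
    quit;
    ```

    A libref assigned this way lasts only for the current SAS session, so the statement would need to be rerun (or placed at the top of each program) in later sessions.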

    Go to http://web.njit.edu/~mendonca. Get the framingham file and put it in the SAS Demos directory.

    Data Input

    Two approaches are presented here. Complete information is found in the discussion of the SAS DATA step: open SAS Online Help and go to section Base SAS>SAS Language Reference: Concepts>DATA Step Concepts.

    Fig. 2. Input Approach 1: General (import from delimited file)

    DATA Tutor.fham;
        /* note: if the variable is text, put the dollar sign after its name */
        INFILE "C:\Documents and Settings\mendonca\My Documents\My SAS Files\SAS Demos\Tutor\framingham.csv" delimiter = ',';
        /* this is the Framingham study data set */
        INPUT cause $ age chol sex $;
    RUN;

    Save as importcsv.sas.
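
    After running the import, it is worth checking that the variables were read with the intended types. A minimal sketch, assuming the data set Tutor.fham from Fig. 2 was created successfully:

    ```sas
    /* Show variable names, types (character vs. numeric) and lengths. */
    proc contents data=Tutor.fham;
    run;

    /* Print only the first 10 observations as a quick sanity check. */
    proc print data=Tutor.fham (obs=10);
    run;
    ```

    If cause or sex shows up as numeric, or age and chol contain unexpected missing values, the INPUT statement or the delimiter probably does not match the file.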

    Fig. 3. Input Approach 2: Specific (import Excel via ODBC)

    PROC IMPORT OUT=Tutor.GENL
        DATAFILE="C:\Documents and Settings\mendonca\My Documents\SAS Demos\Tutor\General.xls"
        DBMS=EXCEL2000 REPLACE;
        GETNAMES=YES;
    RUN;

    See SAS Help for PROC IMPORT under the SAS procedures guide. Download General.xls.

    Fig. 4. Output Approach

    PROC EXPORT DATA=Tutor.Genl
        OUTFILE="C:\Documents and Settings\mendonca\My Documents\SAS Demos\Tutor\General.xls"
        DBMS=EXCEL2000 REPLACE;
    RUN;

    For other ideas, see SAS Help for PROC EXPORT under the SAS procedures guide.

    Save as inputoutput.sas.
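
    For applications that do not read Excel files, a delimited text file is often the simplest exchange format. The sketch below exports the same data set as comma-separated values; the output file name General.csv is a hypothetical choice, not a file used elsewhere in this tutorial.

    ```sas
    /* Export Tutor.Genl as a comma-separated file (file name is hypothetical). */
    proc export data=Tutor.Genl
        outfile="C:\Documents and Settings\mendonca\My Documents\SAS Demos\Tutor\General.csv"
        dbms=csv replace;
    run;
    ```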


    Statistical Operations

    Objectives: Understand how to generate descriptive statistics, perform a t-test and run a general linear model.

    2.3 Descriptive Statistics

    To have a look at some basic summary statistics, use PROC MEANS. See SAS Help for The Means

    Procedure at Base SAS>SAS Procedures>Procedures.

    Fig. 5. PROC MEANS

    /*Basic data description*/
    DATA heart;
        SET Tutor.fham;
    RUN;

    /*The data must first be sorted by the vars in the BY statement of PROC MEANS*/
    PROC SORT DATA=heart;
        BY cause sex;
    RUN;

    PROC MEANS DATA=heart; /*the data step is redundant*/
        VAR age chol;
        BY cause sex;
    RUN;
    /*the output is visible in the Results window*/
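
    If sorting is inconvenient, a CLASS statement is one possible alternative to BY, since PROC MEANS does not require sorted input for CLASS variables. This is a sketch under the assumption that Tutor.fham from Fig. 2 exists; the statistics keywords listed are just one reasonable selection.

    ```sas
    /* Grouped summary statistics without a prior PROC SORT. */
    proc means data=Tutor.fham n mean std min max;
        class cause sex;   /* grouping variables; input need not be sorted */
        var age chol;      /* analysis variables */
    run;
    ```

    The layout differs from the BY version: CLASS produces a single table with one row per group, whereas BY produces a separate table for each group.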

    2.4 Statistical Tests

    SAS has the capability of performing a very wide variety of parametric and non-parametric statistical tests.

    Tests based on the Student's t distribution are common and so are discussed here. The TTEST procedure performs t tests for one sample, two samples, and paired observations. The one-sample t test compares the mean of the sample to a given number. The two-sample t test compares the mean of the first sample minus the mean of the second sample to a given number. The paired observations t test compares the mean of the differences in the observations to a given number [PROC TTEST documentation].

    See help in SAS/STAT User's Guide>The PROC TTEST procedure. Get file PROC_TTESTdata.sas.
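
    To make the one-sample case described above concrete, here is a minimal sketch using the Framingham data from Fig. 2. The comparison value of 200 is an arbitrary number chosen for illustration, not a figure from the study.

    ```sas
    /* One-sample t test: is the mean of chol different from 200? */
    /* H0= sets the "given number"; 200 is an illustrative assumption. */
    proc ttest data=Tutor.fham h0=200;
        var chol;
    run;
    ```

    Omitting H0= leaves the comparison value at its default of 0.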

    Fig. 6. PROC TTEST

    proc ttest data=graze;
        class GrazeType;
        var WtGain;
    run;

    Is this a one- or two-sample test? What is the null hypothesis?

    Should the results be based on the assumption of equal or unequal variance? What is the result?

    2.5 General Linear Models

    The GLM procedure uses the method of least squares to fit general linear models. Among the statistical

    methods available in PROC GLM are regression, analysis of variance, analysis of covariance, multivariate

    analysis of variance, and partial correlation [PROC GLM documentation]. So, it does all analyses that

    PROC REG does and more:

    simple regression

    multiple regression


    analysis of variance (ANOVA), especially for unbalanced data

    analysis of covariance

    response-surface models

    weighted regression

    polynomial regression

    partial correlation

    multivariate analysis of variance (MANOVA)

    repeated measures analysis of variance

    A limitation is that it does not have a built-in PLOT option.

    Help: SAS/STAT User's Guide>The PROC GLM procedure. Download the file

    Fig. 7. PROC GLM

    proc glm data=fitness;
        model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse;
    run;

    3 Graphing Procedures

    Objective: Understand how to generate a scatter plot.

    See SAS Help for The PLOT Procedure at Base SAS>SAS Procedures>Procedures. Fit the model.

    Fig. 8. PROC PLOT

    proc glm data=fitness;
        model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse;
        /*need access to output data*/
        output out=Tutor.mfitness predicted=Oxygenhat r=resid stdr=eresid;
    run;

    DATA mfitness;
        SET Tutor.mfitness;
    RUN;

    PROC PLOT DATA=mfitness;
        plot resid*Oxygenhat;
        title 'Physical Fitness Analysis';
        title2 'Model: Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse';
        title3 'Residuals vs. Predicted';
    run;
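
    A residual plot is easier to read with a horizontal reference line at zero. The sketch below is one way to add it with the VREF= option of the PLOT statement, assuming the output data set Tutor.mfitness from Fig. 8 has been created; the title text is illustrative.

    ```sas
    /* Residuals vs. predicted values with a reference line at resid = 0. */
    proc plot data=Tutor.mfitness;
        plot resid*Oxygenhat / vref=0;
        title 'Residuals vs. Predicted, with zero reference line';
    run;
    quit;
    ```

    Residuals scattered evenly above and below the line, with no visible trend, are what a well-specified model would be expected to show.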


    4 Links

    http://www.sas.com

    http://gsbapp2.uchicago.edu/sas/sashtml/main.htm (look under Base SAS, SAS/STAT and

    SAS/GRAPH)

    http://www.stat.wisc.edu/computing/sas/intro.html

    http://www.stat.wisc.edu/computing/sas/ (miscellaneous links)

    http://www.sas-jobs.com/

    http://www.sasusers.com/

    5 Contact Information

    David Mendonça

    Email: [email protected]

    Phone: x5212