Lecture 1: What Is the SAS System?

ADVANCED DATA ANALYSIS: AN OVERVIEW

Upload: xarack

Posted on 04-Oct-2015

DESCRIPTION

SAS intro

TRANSCRIPT

  • ADVANCED DATA ANALYSIS: AN OVERVIEW

  • WHAT DOES ADVANCED DATA ANALYSIS INVOLVE?

    Data Acquisition

    Data Management: data manipulation to get the data into the form you need for analysis; data cleaning to identify errors, outliers, etc. in the data

    Exploratory Data Analysis: data visualization (producing graphical representations to see relationships hidden in the data); data summarization

  • Data Analysis: a wide range of techniques is available for analyzing data, for example:

    Regression-based techniques and diagnostics
    Principal components / common factor analysis
    Structural equation modeling
    Time series methods (unit root testing, co-integration tests, VAR, etc.)

    Reporting

  • Programming Approaches

    Graphical user interface / point-and-click approach. Example: EViews. Pros: ease of use. Cons: limited flexibility, narrow area of specialization. Suitability: learning in first courses in econometrics.

    Pre-programmed and user-written routines. Example: Stata. Pros: greater flexibility, greater degree of specialization. Cons: data management is good, not great! Suitability: applied econometrics courses.

  • Programming Approaches (continued)

    R language. Pros: great data management, strong analytics, availability of new packages, and it's free! Cons: learning curve; the GUI is still evolving (RStudio).

    SAS. Pros: industry standard (the corporate sector values SAS skills very highly), data management second to none, strong analytics (especially business analytics), a mature GUI, and it is now free for academic use! Cons: learning curve; expensive (no longer our concern!).

  • Components of Base SAS Software

  • I. Data Management Facility

    SAS organizes data into a rectangular form called a SAS data set, in which each column is a variable and each row is an observation.

    [Figure: a SAS data set, with a column labeled Variable and a row labeled Observation]

    Data sets are created by writing code in the SAS programming language and can be modified by programming statements. A common use of data sets is to provide input to computational procedures.
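    To see the variable/observation layout of a real data set, a minimal sketch (assuming the SASHELP.CLASS sample data set that ships with SAS) lists its variables and then prints a few rows:

```sas
/* Assumes the SASHELP.CLASS sample data set that ships with SAS. */

/* List the variables (columns) and the number of observations (rows).  */
/* VARNUM lists the variables in the order they were created.           */
proc contents data=sashelp.class varnum;
run;

/* Print the first five observations as a rectangular table. */
proc print data=sashelp.class(obs=5);
run;
```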

  • I. Data Management Facility: Creating a SAS Data Set by Reading Raw Data

    DATA statement: tells SAS to begin building a data set.

    INPUT statement: specifies the fields to be read and the variables to be created from them.

    DATALINES statement: indicates that lines of data are to follow.

    Semicolon: marks the end of the in-stream raw data.

    RUN statement: marks the end of the DATA step.
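    The five pieces above fit together as in the following sketch. The variable names and data values are hypothetical, made up for illustration; comments are kept above DATALINES because anything after it is read as data.

```sas
/* Hypothetical weight-club records; names and values are made up.   */
/* DATA starts building the data set WORK.WEIGHT_CLUB.               */
/* INPUT names the fields to read; $ marks a character variable.     */
/* The assignment statement creates a new variable, Loss.            */
/* DATALINES says raw data lines follow; a lone semicolon ends them. */
/* RUN ends the DATA step.                                           */
data weight_club;
   input IdNumber Name $ Team $ StartWeight EndWeight;
   Loss = StartWeight - EndWeight;
   datalines;
1001 Ann red 180 165
1002 Ben blue 210 198
1003 Cara red 150 141
;
run;
```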

  • I. Data Management Facility: Other Ways of Creating a SAS Data Set

    Reading data stored in an external file (ASCII, CSV, tab-delimited, etc.)

    Importing data from other applications: spreadsheets (e.g. Excel), SPSS, dBASE, other DBMSs (e.g. Oracle)

    Reading data from one or more other SAS data sets
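    A sketch of the three routes, assuming hypothetical file paths and column names (replace them with your own). Reading an .xlsx file with PROC IMPORT requires the SAS/ACCESS Interface to PC Files to be licensed.

```sas
/* 1. External CSV file read with a DATA step (hypothetical path).     */
/*    DSD uses commas as delimiters and strips quotes around values;   */
/*    for a tab-delimited file add DLM='09'x. FIRSTOBS=2 skips the     */
/*    header row.                                                      */
data club_csv;
   infile '/path/to/clubdata.csv' dsd firstobs=2;
   input IdNumber Name :$20. StartWeight EndWeight;
run;

/* 2. Import an Excel worksheet (hypothetical path).                   */
proc import datafile='/path/to/clubdata.xlsx'
            out=club_xlsx
            dbms=xlsx
            replace;
   getnames=yes;   /* take variable names from the first row */
run;

/* 3. Build a new SAS data set from an existing one with SET.          */
data club_copy;
   set club_csv;
run;
```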

  • II. Programming Language: Elements of the SAS Language

    Statements, e.g. Data Weight_club; or Run;

    Expressions, e.g. X + Y or Age
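    Expressions are not statements on their own; they appear inside statements. A minimal sketch, with a hypothetical data set name, variables, and values, shows an arithmetic expression in an assignment statement and a comparison expression in an IF statement:

```sas
/* Hypothetical data; X, Y and Age are illustrative variables. */
data demo;
   input X Y Age;
   Total = X + Y;              /* arithmetic expression X + Y     */
   if Age >= 13 then Teen = 1; /* comparison expression Age >= 13 */
   else Teen = 0;
   datalines;
2 3 12
5 1 14
;
run;
```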

  • II. Programming Language: Rules for SAS Statements

    SAS statements end with a semicolon. Example: Data Weight_club;

    SAS statements can be entered in lowercase, uppercase, or a mixture of the two. Example: DATA WEIGHT_Club; or data weight_club;

    SAS statements can begin in any column of a line, and several statements can be written on the same line. Example: Data Weight_club; Set clubdata;

    You can begin a SAS statement on one line and continue it on another line, but you cannot split a word between two lines.

    Words in SAS statements are separated by blanks or by special characters. Example: Loss = StartWeight - EndWeight;
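    The rules can be seen together in one short program. The small CLUBDATA input set is hypothetical, created first so that the sketch runs on its own.

```sas
/* Hypothetical input data so the example is self-contained. */
data clubdata;
   input StartWeight EndWeight;
   datalines;
189 165
145 124
;
run;

/* Mixed case is accepted, and two statements share one line. */
DATA Weight_Club; Set clubdata;
   /* One statement continued onto a second line; no word is split. */
   /* The = and - special characters also separate words.           */
   Loss = StartWeight
          - EndWeight;
run;
```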

  • II. Programming Language: Rules for SAS Names

    SAS names are used for SAS data set names, variable names, and other items.

    A SAS name can contain from one to 32 characters.

    The first character must be a letter or an underscore (_).

    Subsequent characters must be letters, numbers, or underscores.

    Blanks cannot appear in SAS names.
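    The rules above are the traditional (V7) naming rules; some SAS interfaces default to a looser setting, so this sketch sets VALIDVARNAME=V7 explicitly. All names are hypothetical, and the invalid ones are kept inside a comment because they would cause errors.

```sas
/* Enforce the traditional naming rules. */
options validvarname=v7;

/* Valid names: start with a letter or underscore, continue with   */
/* letters, digits, or underscores, 1 to 32 characters in total.   */
data _club2015;
   input Member_ID Start_Weight _bonus;
   datalines;
1 180 5
;
run;

/* Invalid names:                                                   */
/*   2015club    starts with a digit                                */
/*   start wt    contains a blank                                   */
/*   loss%       contains a special character                       */
```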

  • III. Data Analysis and Reporting Utilities

    DATA step programming is a very powerful tool for data analysis. SAS also has a library of built-in programs known as SAS procedures.

    A SAS procedure is referred to as a PROC. Example: PROC PRINT

    proc print data=weight_club;
       title 'Weight Club Data';
    run;

    PROC PRINT calls the procedure.

    DATA= is a SAS keyword followed by the name of the SAS data set that supplies data to the procedure.

    The TITLE statement makes SAS print the title on the output.
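    PROC PRINT also accepts options and statements that control the listing. A sketch, assuming the SASHELP.CLASS sample data set: VAR selects and orders columns, NOOBS drops the observation-number column, and an empty TITLE statement clears the title so it does not carry over to later output.

```sas
/* Assumes the SASHELP.CLASS sample data set. */
proc print data=sashelp.class noobs;
   var Name Age Height Weight;   /* columns to print, in this order */
   title 'Class Data (selected variables)';
run;

title;   /* clear the title for subsequent procedures */
```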

  • III. Data Analysis and Reporting Utilities

    PROC PRINT produces the following output.

  • III. Data Analysis and Reporting Utilities

    Consider another SAS data set, named CLASS, containing the height, weight, and age of students.

    [Figure: the DATA CLASS step]
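    The slide's own DATA CLASS step is not reproduced here. As a stand-in, this sketch assumes the data resemble the SASHELP.CLASS sample data set and builds a CLASS data set from it with SET, keeping only the name, age, height, and weight.

```sas
/* Assumption: the lecture's CLASS data resemble SASHELP.CLASS. */
data class;
   set sashelp.class(keep=Name Age Height Weight);
run;

proc print data=class;
   title 'Height, Weight and Age of Students';
run;
```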

  • III. Data Analysis and Reporting Utilities: Another Example

    SAS PROC REG can be used to run a regression of the variable Weight on the variable Height. The SAS code is shown below:

    proc reg data=Class;
       model Weight = Height;
    run;
    quit;
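    For regression diagnostics it helps to keep the fitted values and residuals. A sketch, assuming SASHELP.CLASS as the input and hypothetical output names REG_OUT, Predicted, and Residual: the OUTPUT statement writes them to a new data set, and QUIT ends PROC REG, which otherwise stays active after RUN.

```sas
/* Assumes SASHELP.CLASS (variables Name, Sex, Age, Height, Weight). */
proc reg data=sashelp.class;
   model Weight = Height;
   /* Save predicted values and residuals alongside the input data. */
   output out=reg_out p=Predicted r=Residual;
run;
quit;

proc print data=reg_out(obs=5);
   var Name Height Weight Predicted Residual;
run;
```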

  • III. Data Analysis and Reporting Utilities

    Output of PROC REG

  • III. Data Analysis and Reporting Utilities: Output Produced by the SAS System

    Traditional output:

    SAS data set

    SAS log

    Listing or report

    Other files, such as graphs

    Files used in other databases, such as Oracle
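    Two of these output types in a short sketch, assuming SASHELP.CLASS as input: a DATA _NULL_ step writes lines to the SAS log with PUT, and PROC EXPORT writes a CSV file that other software can read. The output path is hypothetical; writing directly into a database such as Oracle needs the matching SAS/ACCESS product and is not shown.

```sas
/* Write the first three observations to the SAS log. */
data _null_;
   set sashelp.class(obs=3);
   put Name= Age= Height= Weight=;
run;

/* Export the data set to a CSV file (hypothetical path). */
proc export data=sashelp.class
            outfile='/path/to/class.csv'
            dbms=csv
            replace;
run;
```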

  • III. Data Analysis and Reporting Utilities: Output from the Output Delivery System (ODS)

    The Output Delivery System (ODS) enables you to produce output in a variety of formats, such as:

    an HTML file

    traditional SAS Listing (monospace)

    a PostScript file

    an RTF file (for use with Microsoft Word)

    an output data set
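    A sketch of routing one PROC REG run to several ODS destinations at once, assuming SASHELP.CLASS as input and hypothetical file names. ODS OUTPUT turns the ParameterEstimates table into a SAS data set; each destination is closed after the procedure finishes.

```sas
/* Open destinations (file names are hypothetical). */
ods html file='reg_output.html';
ods rtf  file='reg_output.rtf';
ods output ParameterEstimates=work.param_est;   /* table -> data set */

proc reg data=sashelp.class;
   model Weight = Height;
run;
quit;

/* Close the file destinations so the files are complete. */
ods rtf close;
ods html close;

proc print data=work.param_est;
run;
```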

  • III. Data Analysis and Reporting Utilities: Example of ODS HTML Output from PROC REG

  • Running Programs in the SAS Windowing Environment: Shortcut Keys

    F5: Program Editor window

    F6: Log window

    F7: Output window

    F8: Run (submit) the program

    Ctrl-E: Clear the current window

  • Running Programs in the SAS Windowing Environment: Editing a Program in the Program Editor Window

    [Figure: the Program Editor window, labeled "Write program here"]

  • Running Programs in the SAS Windowing Environment

    Remember: the F7 key brings up the Output window!