
RESEARCH STATISTICS

Jobayer Hossain, PhD

Larry Holmes, Jr., PhD, DrPH, FACE

October 2, 2008

Class Structure

Course website: http://medsci.udel.edu/openStatClass/October2008

Classes: 8
Contact hours: 2 hours
Assignments:
- 3 take-home assignments, assigned in weeks 3, 5 and 6 and due in weeks 4, 6 and 8
- 1 take-home final exam/assignment, assigned in week 8; return it for final comments via e-mail

Class Participation

Default dataset
- 60 subjects
- 3 or 4 groups
- Several measures of different types (nominal, ordinal, interval, ratio)

Contributed datasets (bring your own)
- DE-IDENTIFIED

Areas of special interest
- Let us know yours

Course objectives

At the end of the course, participants are expected to:
- Understand the basic notion of statistics in research
- Know the designs used to conduct research
- Understand some key elements of research, such as subject selection criteria, variables, measurement scales of variables, and hypotheses
- Learn various statistical techniques used to analyze data
- Be able to interpret results and draw conclusions
- Learn the tools used in the analysis of data: Excel and SPSS

Research Design and Methodology

Research is the process of investigating scientific questions.

Steps in the research process:
- Defining the problem and conceptualizing the study
- Designing and conducting the study
  - Collecting data
  - Analyzing data
- Making sense of data

Defining the problem and conceptualizing the study

Review relevant previous research and:
- Identify the problem(s) and the causes of the problem(s)
- State the outcomes of previous research on the problem
- State clearly what you are planning for the proposed research
- Formulate careful research questions and hypotheses
- Identify the variables needed to achieve the objective(s)

Defining the problem and conceptualizing the study (cont'd)

Review relevant previous research and (cont'd):
- Identify scales to measure the variables
- Assess the feasibility of the study objectives, i.e., assess whether what you want to measure is measurable
- Identify the target populations and define the eligibility criteria

Research Question

Examples:
- Does smoking increase the risk of renal carcinoma?
- Is an oral inhaler effective in controlling asthma among children?

Hypothesis statement

Examples:
- Smoking increases the risk of renal carcinoma in pediatric patients.
- An oral inhaler is effective in controlling asthma among children.

Study Objective

The purpose or aim of the study.

Examples:
- To assess the risk of renal carcinoma associated with smoking among pediatric patients (primary objective)
- To determine race and gender disparities in the prevalence of smoking (secondary objective)

Study variable

A measurement that varies from one individual to another.

Examples: age, gender, BMI, systolic blood pressure, hematocrit.

Outcome vs independent variable

Response/outcome variable: measures the outcome of the study treatment or experimental manipulation.
Examples:
- Renal carcinoma incidence among children
- Asthma control in pediatric asthmatic patients

Independent/predictor/explanatory variable: explains or influences changes in a response variable.
Examples:
- Smoking
- Oral inhaler

Scale of variable/output measurement

Nominal: categorical variables without any order or ranking sequence, such as names or classes (e.g., gender).
- Binary: two categories; multinomial: more than two categories

Ordinal: variables with an inherent rank or order, e.g., mild, moderate, severe. Values can be compared for equality or for greater or less, but not for how much greater or less.

Scale of variable/output measurement

Interval: values of the variable are ordered, as in ordinal, and in addition the differences between values are meaningful; however, the scale is not absolutely anchored (its zero point is arbitrary). Calendar dates and temperatures on the Fahrenheit scale are examples. Addition and subtraction, but not multiplication and division, are meaningful operations.

Ratio: variables with all the properties of interval plus an absolute, non-arbitrary zero point, e.g., age, weight, temperature (Kelvin). Addition, subtraction, multiplication, and division are all meaningful operations.
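A quick numeric check of the interval/ratio distinction, assuming the standard Fahrenheit-to-Kelvin conversion (the helper name `fahrenheit_to_kelvin` is ours, not from the slides). On the Fahrenheit scale 80 °F looks like "twice" 40 °F, but because the Fahrenheit zero is arbitrary that ratio means nothing; on the Kelvin scale, which has a true zero, the same two temperatures are only about 8% apart. Differences, by contrast, are meaningful on both scales.

```python
def fahrenheit_to_kelvin(f: float) -> float:
    """Convert Fahrenheit (interval scale) to Kelvin (ratio scale, true zero)."""
    return (f - 32.0) * 5.0 / 9.0 + 273.15


cool_f, warm_f = 40.0, 80.0

# Ratio taken on the interval (Fahrenheit) scale: looks like "twice as hot".
ratio_f = warm_f / cool_f
# Same two temperatures on the ratio (Kelvin) scale: only about 8% apart.
ratio_k = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)
# Differences are meaningful on both scales: 40 F of difference is always 22.2 K.
diff_k = fahrenheit_to_kelvin(warm_f) - fahrenheit_to_kelvin(cool_f)

print(f"Ratio on the Fahrenheit scale: {ratio_f:.2f}")
print(f"Ratio on the Kelvin scale:     {ratio_k:.3f}")
print(f"Difference in kelvins:         {diff_k:.2f}")
```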

Measurement bias

Measurement bias arises from measurement error.

Example:
- Suppose that, for remission of asthma, the possible outcomes are complete remission, partial remission, and no remission. If we measure the outcome variable only as remission versus non-remission, we commit a misclassification error by putting partial remission into the non-remission group.

Designing the study

A study design is a careful advance plan of the data collection and the analytic approach needed to answer the research question under investigation in a scientific way.

The basic elements of a study design:
- Selecting an appropriate sample size for a specified level of power and level of significance
- Selecting methods of sampling, data collection, and analysis appropriate to the study's objectives

Clinical/Experimental versus Observational design

The Lancet, 2002, Vol. 359

Clinical/Experimental vs Observational design

The choice of design mainly depends on the research question(s) and the type of research conducted (experimental or observational).

Experimental/Interventional: the investigator controls the experimental environment in which the hypothesis is tested. The randomized double-blind clinical trial is the gold standard.

Clinical/Experimental vs Observational design

Non-experimental/Observational: the population is observed without any interference by the investigator. For example, in a study of the effect of smoking, it is impossible for an investigator to assign smoking to the subjects. Instead, the investigator can study the effect by choosing a control group and examining the relationship between cause and effect. Some examples are:
- Cross-sectional study
- Cohort study
- Case-control study

Randomized control design

Random allocation of subjects to different interventions (or treatments) for the purpose of comparing/determining the efficacy of the study treatment(s).
- E.g., a placebo or a standard medication (active control) can be used as a control.
- Patients with cancer or a painful disease cannot (ethically) receive a placebo as a control.
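Random allocation can be sketched in a few lines. The example below assumes permuted-block randomization, one common scheme that keeps group sizes balanced as enrollment proceeds; the arm labels, the block size of 4, and the seed are illustrative assumptions, not part of the slides.

```python
import random
from typing import List, Optional, Sequence


def block_randomize(n_subjects: int,
                    arms: Sequence[str] = ("treatment", "control"),
                    block_size: int = 4,
                    seed: Optional[int] = None) -> List[str]:
    """Allocate subjects to arms with permuted-block randomization.

    Every block contains each arm the same number of times in a random order,
    so after each complete block the groups are exactly balanced.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    allocation: List[str] = []
    while len(allocation) < n_subjects:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)          # random order within the block
        allocation.extend(block)
    return allocation[:n_subjects]  # drop unused slots from the last block


schedule = block_randomize(10, seed=2008)
for subject_id, arm in enumerate(schedule, start=1):
    print(subject_id, arm)
```

Fixing the seed makes the schedule reproducible, which is useful for auditing; in a real trial the schedule would also be concealed from the people enrolling subjects.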

Randomized control design

Blinding reduces bias due to preconceptions or personal bias.
- Open trial: the investigator and the subject know the full details of the treatment.
- Single-blind trial: the investigator knows about the treatment, but the subject does not.
- Double-blind trial: neither the investigator nor the subject knows about the treatment.
- Triple-blind trial: the sponsor, the investigator, and the subject do not know about the treatment.

Distribution of a variable

Distribution (of a variable): tells us what values the variable takes and how often it takes these values. E.g., the distribution of 26 pediatric patients aged 1 to 6 at AIDHC is as follows:

Age        1  2  3  4  5  6
Frequency  5  3  7  5  4  2
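The age table above can be extended with relative and cumulative frequencies. A short sketch using exactly the slide's counts (the function name `frequency_table` is illustrative):

```python
def frequency_table(values, counts):
    """Return rows of (value, count, relative frequency, cumulative count)."""
    total = sum(counts)
    rows, running = [], 0
    for value, count in zip(values, counts):
        running += count
        rows.append((value, count, count / total, running))
    return rows


# Counts taken from the age table above (26 patients in total).
ages = [1, 2, 3, 4, 5, 6]
frequencies = [5, 3, 7, 5, 4, 2]

print(f"{'Age':>3} {'Freq':>5} {'Rel. freq':>10} {'Cum. freq':>10}")
for age, freq, rel, cum in frequency_table(ages, frequencies):
    print(f"{age:>3} {freq:>5} {rel:>10.3f} {cum:>10}")
```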

Statistics

The science of data collection, summarization, analysis, and interpretation.

Descriptive versus inferential statistics:
- Descriptive statistics: data description (summarization), such as center, variability, and shape
- Inferential statistics: drawing conclusions beyond the sample studied, allowing for prediction

A Taxonomy of Statistics

How does statistics help us?

[Figure: Age Distribution, a histogram of the number of subjects by age in months (bins 40, 60, 80, 100, 120, 140, More)]

Ages (in months) of the 60 patients in our data set 1 are: 71, 127, 65, 82, 140, 53, 114, 56, 84, 65, 67, 134, 64, …, 91, 51

Mean                 90.41666667
Standard Error       3.902649518
Median               84
Mode                 84
Standard Deviation   30.22979318
Sample Variance      913.8403955
Kurtosis             -1.183899591
Skewness             0.389872725
Range                95
Minimum              48
Maximum              143
Sum                  5425
Count                60

By simply looking at the raw data, we cannot produce an informative description of them; statistics, however, gives quick insight into the data through graphical and numerical tools.

[Figure: Distribution of age; axis: Age (month), 60 to 140]
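The numerical summary above is the layout produced by Excel's Descriptive Statistics tool. Since the full 60-patient age list is not reproduced here, the sketch below applies the same summary to the 26 patients in the earlier age-distribution table. The skewness and kurtosis lines follow the sample-adjusted formulas Microsoft documents for Excel's SKEW and KURT functions; we assume the Analysis ToolPak uses the same formulas. The function name `excel_style_summary` is hypothetical.

```python
import math
import statistics


def excel_style_summary(data):
    """Summary statistics in the order Excel's Descriptive Statistics lists them.

    Skewness and kurtosis use the sample-adjusted SKEW/KURT formulas
    (kurtosis is excess kurtosis, so a normal distribution is about 0).
    """
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)              # sample standard deviation (n - 1)
    z = [(x - mean) / sd for x in data]      # standardized values
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "Mean": mean,
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(data),
        "Mode": statistics.mode(data),
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(data),
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": sum(data),
        "Count": n,
    }


# The 26 pediatric patients from the age-distribution table (ages 1 to 6).
ages = [age for age, freq in zip([1, 2, 3, 4, 5, 6], [5, 3, 7, 5, 4, 2])
        for _ in range(freq)]
for name, value in excel_style_summary(ages).items():
    print(f"{name:<20} {value:.4f}" if isinstance(value, float) else f"{name:<20} {value}")
```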

Statistical Description of Data

Statistics describes a numeric set of data by its:
- Center (mean, median, mode, etc.)
- Variability (standard deviation, range, etc.)
- Shape (skewness, kurtosis, etc.)

Statistics describes a categorical set of data by:
- The frequency, percentage, or proportion of each category

Statistical Inference

Statistical inference is the process by which we acquire information about populations from samples.

Two types of estimates are used for making inferences:
- Point estimation
- Interval estimation

[Figure: statistical inference, from a sample to the population]

Population and sample

Population: the entire collection of individuals or measurements about which information is desired.

Sample: a subset of the population selected for study.
- The primary objective is to create a subset of the population whose center, spread, and shape are as close as possible to those of the population.
- Methods of sampling: random sampling, stratified sampling, systematic sampling, cluster sampling, multistage sampling, area sampling, quota sampling, etc.

Parameter vs Statistics

Parameter
- Any statistical characteristic of a population
- Population mean, population median, and population standard deviation are examples of parameters
- A parameter describes the distribution of a population
- Parameters are fixed and usually unknown

Parameter vs Statistics

Statistic: any statistical characteristic of a sample
- Sample mean, sample median, and sample standard deviation are some examples of statistics
- A statistic describes the distribution of a sample
- The value of a statistic is known and varies from sample to sample
- Statistics are used for making inferences about parameters

Parameter vs Statistics

Statistical issue: describing the distribution of a population through a census, or making inferences about the population distribution (population parameter) using the sample distribution (statistic).

E.g., the sample mean is an estimate of the population mean.
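The difference between a fixed parameter and a varying statistic can be simulated. This sketch builds a hypothetical population of 1,000 ages in months (random integers between 48 and 143, borrowed from the minimum and maximum in the summary above purely for flavor; these are not real patients), computes the population mean once, and then shows that the sample mean changes from one random sample of 60 to the next.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population of 1,000 ages in months (simulated, not real data).
population = [rng.randint(48, 143) for _ in range(1000)]
mu = statistics.fmean(population)  # parameter: fixed, computed once

# Statistic: the sample mean changes from one random sample to the next.
sample_means = [statistics.fmean(rng.sample(population, 60)) for _ in range(5)]

print(f"Population mean (parameter): {mu:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"Sample {i} mean (statistic):  {m:.2f}")
```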

Hypothesis Testing

Null hypothesis and alternative hypothesis

                 Real situation
Decision         Ho is true                  Ho is false
Reject Ho        Type I error (α)            Correct decision (1 - β)
Accept Ho        Correct decision (1 - α)    Type II error (β)

Elements/Steps in hypothesis testing

Hypothesis testing steps:
1. Specification of the null (Ho) and alternative (H1) hypotheses
2. Selection of the significance level (alpha), e.g., 0.05 or 0.01
3. Calculation of the test statistic, e.g., t, F, chi-square
4. Calculation of the probability value (p-value) or confidence interval
5. Description of the result and statistic in an understandable way
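The five steps can be walked through with a chi-square test on a 2 x 2 table. This is a sketch with made-up counts (oral inhaler vs control, asthma controlled vs not); the function name is hypothetical. It uses only the standard library, relying on the fact that for 1 degree of freedom the chi-square tail probability equals erfc(sqrt(x / 2)).

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2 x 2 table.

    table = [[a, b], [c, d]] with rows = groups and columns = outcomes.
    For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Step 1: H0: asthma control is independent of treatment; H1: it is not.
# Step 2: choose the significance level.
alpha = 0.05
# Hypothetical counts: rows = oral inhaler / control,
# columns = controlled / not controlled.
observed = [[20, 10], [12, 18]]
# Steps 3 and 4: test statistic and p-value.
chi2, p = chi_square_2x2(observed)
# Step 5: report the result in plain language.
decision = "reject H0" if p < alpha else "do not reject H0"
print(f"chi-square = {chi2:.3f}, p = {p:.4f} -> {decision} at alpha = {alpha}")
```

With these counts the statistic is 30/7 (about 4.29) and p is about 0.038, so H0 is rejected at the 0.05 level but not at 0.01.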

Point Estimation

[Figure: a point estimator computed from the sample distribution estimates a parameter of the population distribution]

A point estimator draws inferences about a population by estimating the value of an unknown parameter using a single value, or point.

Interval Estimation

[Figure: an interval estimator computed from the sample distribution brackets a parameter of the population distribution]

An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.
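Both kinds of estimate can be computed from the summary output for the 60 patients' ages (mean 90.42 months, SD 30.23, n = 60). This sketch uses a large-sample normal (z) interval for simplicity; a t-based interval with 59 degrees of freedom would be slightly wider. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def mean_ci_from_summary(mean, sd, n, confidence=0.95):
    """Point estimate and large-sample (normal) confidence interval for a mean."""
    se = sd / math.sqrt(n)                               # standard error
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    return mean, (mean - z * se, mean + z * se)


# Summary values from the descriptive output for the 60 patients (months).
point, (low, high) = mean_ci_from_summary(90.41666667, 30.22979318, 60)
print(f"Point estimate: {point:.2f} months")
print(f"95% interval estimate: ({low:.2f}, {high:.2f}) months")
```

The point estimate is the sample mean itself; the interval, roughly 82.8 to 98.1 months, adds a statement of precision.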

P-Value versus the Confidence Interval

Two main ways to assess study precision and the role of chance in a study:
- P-value: measures (as a probability) the evidence against the null hypothesis
- Confidence interval (CI): an interval, computed from the sample, that is constructed to contain the true value of the parameter with a specified level of confidence
- E.g., a 95% CI implies that if one repeats a study 100 times, the true measure of association will lie inside the computed CI in about 95 of the 100 repetitions
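The "repeat the study" interpretation can be checked by simulation. The sketch below assumes normally distributed data with a hypothetical true mean of 90 and SD of 30 (treated as known, so the simple z-interval applies), runs 2,000 simulated studies of 60 subjects each, and reports the fraction of 95% intervals that contain the true mean.

```python
import math
import random
from statistics import NormalDist, fmean


def ci_coverage(true_mean=90.0, sd=30.0, n=60, repeats=2000, seed=2008):
    """Fraction of repeated studies whose 95% CI contains the true mean.

    Each simulated study draws n normal observations and builds the
    z-interval mean +/- 1.96 * sd / sqrt(n), treating sd as known.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)
    half_width = z * sd / math.sqrt(n)
    hits = 0
    for _ in range(repeats):
        sample_mean = fmean(rng.gauss(true_mean, sd) for _ in range(n))
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / repeats


print(f"Coverage: {ci_coverage():.3f}")  # should be close to 0.95
```

Any single interval either contains the true mean or does not; the 95% describes how often the procedure succeeds over many repetitions.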

Procedures for sample size calculation

- Selection of the primary variables of interest and formulation of hypotheses
- Information on the standard deviation (if numeric) or proportion (if categorical)
- A tolerance level of significance (α)
- Selection of a reasonable test statistic
- Power or confidence level
- A scientifically or clinically meaningful effect (difference)
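These ingredients come together in sample size formulas. As one illustration, here is the common normal-approximation formula for comparing two means with a two-sided test and equal group sizes; the SD of 10 and difference of 5 are hypothetical inputs, and other designs (proportions, unequal groups, t-based corrections) need different formulas.

```python
import math
from statistics import NormalDist


def n_per_group_two_means(sd, difference, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two means.

    Two-sided test, equal group sizes, normal approximation:
        n = 2 * (z_{1 - alpha/2} + z_{power})^2 * sd^2 / difference^2
    """
    z = NormalDist().inv_cdf
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 * sd ** 2 / difference ** 2
    return math.ceil(n)  # round up to whole subjects


# Hypothetical inputs: SD of 10 units, clinically meaningful difference of 5 units.
print(n_per_group_two_means(sd=10, difference=5))   # 63 per group
print(n_per_group_two_means(sd=10, difference=10))  # 16 per group
```

Halving the meaningful difference roughly quadruples the required sample size, which is why the choice of effect size matters so much.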

Brief concept of Statistical Software

There are many software packages for performing statistical analysis and visualization of data. Some of them are:
- Statistical Analysis System (SAS), S-PLUS, R, MATLAB, Minitab, BMDP, Stata, SPSS, StatXact, Statistica, LISREL, JMP, GLIM, HIL, MS Excel, etc. We will discuss MS Excel and SPSS briefly.

Useful website:
- http://www.R-project.org (free but powerful statistical software)

Microsoft Excel

A spreadsheet application. It features calculation, graphing tools, pivot tables, and a macro programming language called VBA (Visual Basic for Applications).

There are many versions of MS Excel; Excel XP, Excel 2003, and Excel 2007 are capable of performing a number of statistical analyses.

Starting MS Excel: Double-click on the Microsoft Excel icon on the desktop, or click on Start --> Programs --> Microsoft Excel.

Worksheet: consists of a grid of cells with numbered rows down the page and alphabetically titled columns across the page. Each cell is referenced by its coordinates. For example, A3 refers to the cell in column A and row 3, and B10:B20 refers to the range of cells in column B, rows 10 through 20.

Creating formulas:
1. Click the cell in which you want to enter the formula.
2. Type = (an equal sign).
3. Click the Function button (fx).
4. Select the formula you want and step through the on-screen instructions.

Opening a document: File --> Open (from an existing workbook). Change the directory, area, or drive to look for files in other locations.

Creating a new workbook: File --> New --> Blank Document

Saving a file: File --> Save

Selecting more than one cell: Click on a cell (e.g., A1), then hold the Shift key and click on another (e.g., D4) to select the cells between A1 and D4; or click on a cell and drag the mouse across the desired range.

Entering date and time: Dates are displayed as MM/DD/YYYY, but there is no need to enter them in that format. For example, Excel will recognize Jan 9 or jan-9 as 1/9/2007, and Jan 9 1999 as 1/9/1999. To enter today's date, press Ctrl and ; (semicolon) together. Use a or p to indicate AM or PM; for example, 8:30 p is interpreted as 8:30 PM. To enter the current time, press Ctrl, Shift and ; together.

Copying and pasting all cells in a sheet: Ctrl+A for selecting, Ctrl+C for copying, and Ctrl+V for pasting.

Sorting data: Sort --> Sort By …

Descriptive statistics and other statistical methods: Tools --> Data Analysis --> [statistical method]. If Data Analysis is not available, click on Tools --> Add-Ins and then select Analysis ToolPak and Analysis ToolPak - VBA.

Statistical and mathematical functions: Start with the '=' sign and then select a function from the function wizard (fx).

Inserting a chart: Click on Chart Wizard (or Insert --> Chart), select the chart type, give the input data range, update the chart options, and select the output location (worksheet).

Importing data into Excel: File --> Open --> File Type; click on the file; choose the file type (Delimited/Fixed Width); choose the delimiter (Tab, Semicolon, Comma, Space, Other); Finish.
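To make the "Delimited" versus "Fixed Width" choice concrete, here is a small Python sketch (not Excel) that parses the same kind of text both ways. The sample text, column names, and fixed-width column boundaries are hypothetical; Python's standard `csv` module handles the delimited case.

```python
import csv
import io

# Hypothetical tab-delimited text, as it might look before importing.
text = "id\tage_months\tgroup\n1\t71\tA\n2\t127\tB\n3\t65\tA\n"

# Delimited: split each line on the chosen delimiter (tab here; could be ',' or ';').
rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
header, records = rows[0], rows[1:]
print(header)
print(records)

# Fixed width: each field occupies known character positions instead.
fixed = "001 71A\n002127B\n"
widths = [(0, 3), (3, 6), (6, 7)]  # hypothetical column boundaries
parsed = [[line[s:e].strip() for s, e in widths] for line in fixed.splitlines()]
print(parsed)
```

Everything arrives as text in both cases; converting columns to numbers is a separate step, much as a spreadsheet import assigns column formats.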

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.
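The kind of rounding problem described above can be shown with the variance. This is not Excel's code: it compares the textbook one-pass formula (sum of squares minus squared sum over n) with a two-pass formula (subtract the mean first) on values that are large relative to their spread. The data are the classic illustrative values 10^9 + 4, 7, 13, 16, whose exact sample variance is 30.

```python
import statistics


def naive_variance(data):
    """One-pass 'textbook' formula: (sum of squares - (sum)^2 / n) / (n - 1)."""
    n, total, total_sq = 0, 0.0, 0.0
    for x in data:
        n += 1
        total += x
        total_sq += x * x
    return (total_sq - total * total / n) / (n - 1)


def two_pass_variance(data):
    """Subtract the mean first, then sum the squared deviations."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / (len(data) - 1)


# Large values with a small spread: the exact sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print("naive:   ", naive_variance(data))   # wrong: cancellation of huge terms
print("two-pass:", two_pass_variance(data))
print("exact:   ", statistics.variance(data))
```

On standard double-precision floats the one-pass formula returns a negative "variance" (about -170.67), while the two-pass formula recovers 30 exactly.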

Statistical Package for the Social Sciences (SPSS)

A general-purpose statistical package, SPSS is widely used in the social sciences, particularly in sociology and psychology.

SPSS can import data from almost any type of file to generate tabulated reports, plots of distributions and trends, descriptive statistics, and complex statistical analyses.

Starting SPSS: Double-click on SPSS on the desktop, or go to Programs ⇒ SPSS.

Opening an SPSS file: File ⇒ Open

MENUS AND TOOLBARS

Data Editor

Various pull-down menus appear at the top of the Data Editor window. These pull-down menus are at the heart of using SPSSWIN. The Data Editor menu items (with some of their uses) are:

FILE: used to open and save data files

EDIT: used to copy and paste data values, find data in a file, and insert variables and cases; OPTIONS allows the user to set general preferences as well as the setup for the Navigator, charts, etc.

VIEW: the user can change toolbars; value labels can be seen in cells instead of data values

DATA: select, sort, or weight cases; merge files

TRANSFORM: compute new variables, recode variables, etc.

ANALYZE: perform various statistical procedures

GRAPHS: create bar and pie charts, etc.

UTILITIES: add comments to accompany the data file (and other advanced features)

ADD-ONS: features not currently installed (advanced statistical procedures)

WINDOW: switch between the data, syntax, and Navigator windows

HELP: access SPSSWIN Help information

Navigator (Output) Menus

When statistical procedures are run or charts are created, the output appears in the Navigator window. The Navigator window contains many of the pull-down menus found in the Data Editor window. Some of the important menus in the Navigator window include:

INSERT: used to insert page breaks, titles, charts, etc.

FORMAT: for changing the alignment of a particular portion of the output

Formatting Toolbar

When a table has been created by a statistical procedure, the user can edit the table to create a desired look or to add/delete information. Beginning with version 14.0, the user has a choice of editing the table in the Output or opening it in a separate Pivot Table (DEFINE) window. Various pull-down menus are activated when the user double-clicks on the table. These include:

EDIT: undo and redo a pivot; select a table or table body (e.g., to change the font)

INSERT: used to insert titles, captions, and footnotes

PIVOT: used to perform a pivot of the row and column variables

FORMAT: various modifications can be made to tables and cells

Importing tab-delimited data: In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. Look in the appropriate location for the text file. Then select "Text" from "Files of type". Click on the file name and then click on "Open". You will see the Text Import Wizard - Step 1 of 6 dialog box.

After completing the wizard, you will have an SPSS data file containing the former tab-delimited data. You simply need to add variable and value labels and define missing values.

Exporting data to Excel: Click on FILE ⇒ SAVE AS. Click on the file name for the file to be exported. For "Save as type", select Excel (*.xls) from the pull-down menu. You will notice the checkbox for "Write variable names to spreadsheet". Leave this checked, as you will want the variable names to be in the first row of each column in the Excel spreadsheet. Finally, click on Save.

Additional menus:
- CHART EDITOR: used to edit a graph
- SYNTAX EDITOR: used to edit the text in a syntax window

Show or hide a toolbar: Click on VIEW ⇒ TOOLBARS, then check a toolbar to show it or uncheck it to hide it.

Move a toolbar: Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location.

Customize a toolbar: Click on VIEW ⇒ TOOLBARS ⇒ CUSTOMIZE.

Importing data from an EXCEL spreadsheet

Data from an Excel spreadsheet can be imported into SPSSWIN as follows:
1. In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. The OPEN DATA FILE dialog box will appear.
2. Locate the file of interest. Use the Look In pull-down list to identify the folder containing the Excel file of interest.
3. From the FILE TYPE pull-down menu, select EXCEL (*.xls).
4. Click on the file name of interest and click on OPEN, or simply double-click on the file name.
5. Keep checked the box that reads "Read variable names from the first row of data". This presumes that the first row of the Excel data file contains variable names. [If the data resided in a different worksheet in the Excel file, this would need to be entered.]
6. Click on OK. The Excel data file will now appear in the SPSSWIN Data Editor.
7. The former EXCEL spreadsheet can now be saved as an SPSS file (FILE ⇒ SAVE AS) and is ready to be used in analyses. Typically, you would label variables and values and define missing values.

Importing an Access table: SPSSWIN does not offer a direct import for Access tables. Therefore, we must follow these steps:
1. Open the Access file.
2. Open the data table.
3. Save the data as an Excel file.
4. Follow the steps outlined for importing data from an Excel spreadsheet into SPSSWIN.

Importing text files into SPSSWIN: Text data points typically are separated (or "delimited") by tabs or commas. Sometimes they can be in fixed format.

Running the FREQUENCIES procedure
1. Open the data file of interest (from the menus, click on FILE ⇒ OPEN ⇒ DATA).
2. From the menus, click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ FREQUENCIES.
3. The FREQUENCIES dialog box will appear. In the left-hand box is a listing (source variable list) of all the variables that have been defined in the data file. The first step is identifying the variable(s) for which you want to run a frequency analysis. Click on a variable name(s), then click the [ > ] pushbutton. The variable name(s) will now appear in the VARIABLE[S] box (selected variable list). Repeat these steps for each variable of interest.
4. If all that is being requested is a frequency table showing counts and percentages (raw, adjusted, and cumulative), then click on OK.

Requesting STATISTICS: Descriptive and summary STATISTICS can be requested for numeric variables. To request statistics:
1. From the FREQUENCIES dialog box, click on the STATISTICS pushbutton.
2. This will bring up the FREQUENCIES: STATISTICS dialog box.
3. The STATISTICS dialog box offers the user a variety of choices.

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ DESCRIPTIVES). The procedure offers many of the same statistics as the FREQUENCIES procedure, but without generating frequency analysis tables.

Requesting CHARTS: One can request a chart (graph) to be created for a variable or variables included in a FREQUENCIES procedure.
1. In the FREQUENCIES dialog box, click on CHARTS.
2. The FREQUENCIES: CHARTS dialog box will appear. Choose the intended chart (e.g., bar chart, pie chart, histogram).

Pasting charts into Word:
1. Click on the chart.
2. Click on the pull-down menu EDIT ⇒ COPY OBJECTS.
3. Go to the Word document in which the chart is to be embedded. Click on EDIT ⇒ PASTE SPECIAL.
4. Select Formatted Text (RTF) and then click on OK.
5. Enlarge the graph to a desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible, click once on the graph).

BASIC STATISTICAL PROCEDURES: CROSSTABS
1. From the ANALYZE pull-down menu, click on DESCRIPTIVE STATISTICS ⇒ CROSSTABS.
2. The CROSSTABS dialog box will then open.
3. From the variable selection box on the left, click on a variable you wish to designate as the Row variable. The values (codes) for the Row variable make up the rows of the crosstabs table. Click on the arrow (>) button for Row(s). Next, click on a different variable you wish to designate as the Column variable. The values (codes) for the Column variable make up the columns of the crosstabs table. Click on the arrow (>) button for Column(s).
4. You can specify more than one variable in the Row(s) and/or Column(s). A cross table will be generated for each combination of Row and Column variables.
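What CROSSTABS computes is simply a count of every combination of row value and column value. A minimal Python sketch of that idea (not SPSS), using hypothetical gender and group codes for eight subjects; the function name `crosstab` is ours:

```python
from collections import Counter


def crosstab(rows, cols):
    """Count each (row value, column value) combination, like a CROSSTABS table."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    table = {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}
    return table, row_levels, col_levels


# Hypothetical data: one entry per subject.
gender = ["F", "M", "F", "F", "M", "M", "F", "M"]
group = ["A", "A", "B", "A", "B", "B", "A", "A"]

table, row_levels, col_levels = crosstab(gender, group)
print("      " + "  ".join(col_levels) + "  Total")
for r in row_levels:
    counts = [table[r][c] for c in col_levels]
    print(f"{r:<5} " + "  ".join(str(v) for v in counts) + f"  {sum(counts)}")
```

A table of counts like this is also the input to the chi-square test of independence.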

Limitations: SPSS users have less control over data manipulation and statistical output than users of other statistical packages such as SAS, Stata, etc.

SPSS is a good first statistical package for performing quantitative research in social science, because it is easy to use and can be a good starting point for learning more advanced statistical packages.

Questions

Class StructureClass Structure

Course Website httpmedsciudeleduopenStatClassOctober2008

Classes 8 Contact Hours 2 hours Assignment

ndash 3 Take-home ndash To be assigned in week 3 5 and 6ndash Due in week 4 6and 8

1 Take-home final examassignmentndash Assigned in week 8 -- return for final comments via e-

mail

Class ParticipationClass Participation

Default datasetndash 60 subjectsndash 3 or 4 groupsndash Several measures of different types

(Nominal Ordinal Interval Ratio)

Contributed datasets - (bring your own)

ndash DE-IDENTIFIED

Areas of special interestndash Let us know yours

Course objectivesCourse objectives

At the end of the course participants are expected to

ndash Understand the basic notion of statistics in research

ndash Know designs used to conduct research

ndash Understand some key elements in research such as- selection of

criteria of subjects variables measurement scales of variables

and hypothesis

ndash Learn various statistical techniques used to analyze data

ndash Be able to interpret results and draw conclusion

ndash Learn the tools used in the analysis of data ndash Excel and SPSS

Research Design and MethodologyResearch Design and Methodology

Research is the process of investigating scientific questions

Steps in Research process-

ndash Defining the problem and conceptualizing the study

ndash Designing and conducting study

Collecting data

Analyzing data

ndash Making sense of data

Defining the problem and Defining the problem and conceptualizing the studyconceptualizing the study

Review relevant previous research and identify-

ndash The problem (s) and causes of the problem (s)

ndash State outcomes of previous research on the problem

ndash State clearly what you are planning for the proposed research

ndash Form careful research questions and hypotheses

ndash Identify variables needed to achieve the objective (s)

Defining the problem and Defining the problem and conceptualizing the studyconceptualizing the study

Review relevanthellipidentify contd

ndash Identify scales to measure the variables

ndash Assess the feasibility of study objectives ie assess if it

is measurable what you want to measure

ndash Identify the target populations and define the eligibility

criteria

Research QuestionResearch Question

Example -

ndash Does smoking increase the risk of renal carcinoma

ndash Is oral inhaler effective in controlling asthma among

children

Hypothesis statementHypothesis statement

Example -ndash Smoking increases the risk of renal carcinoma in

pediatric patient

ndash Oral inhaler is effective in controlling asthma among

children

Study ObjectiveStudy Objective

The purpose or aim of the study

Example-

ndash To assess the risk of renal carcinoma associated with

smoking among pediatric patients (primary objective)

ndash To determine the race and gender disparities in the

prevalence of smoking (secondary objective)

Study variableStudy variable

Refers to measurement that changes from one individual to

another

Example- age gender BMI Systolic blood pressure

hematocrit

Outcome vs independent variableOutcome vs independent variable

Responseoutcome variable Measures the outcome of the study treatment or experimental manipulation

Examples-ndash Renal carcinoma incident among children

ndash Asthma control in pediatric asthmatic patients

Independent predictorexplanatory variable Explains or influences changes in a response variable

Examples-ndash Smoking

ndash Oral inhaler

Scale of variableoutput measurementScale of variableoutput measurement

Nominal - Categorical variables without any order or

ranking sequence such as names or classes (eg gender)

Binary- two categories multinomial- more than two

categories

Ordinal - Variables with an inherent rank or order eg

mild moderate severe Can be compared for equality or

greater or less but not how much greater or less

Scale of variableoutput measurementScale of variableoutput measurement

Interval - Values of the variable are ordered as in Ordinal and

additionally differences between values are meaningful however the

scale is not absolutely anchored Calendar dates and temperatures on

the Fahrenheit scale are examples Addition and subtraction but not

multiplication and division are meaningful operations

Ratio - Variables with all properties of Interval plus an absolute non-

arbitrary zero point eg age weight temperature (Kelvin) Addition

subtraction multiplication and division are all meaningful operations

Measurement biasMeasurement bias

Bias arises due to measurement error

Example-

ndash Suppose In the case of remission of Asthma the possible

outcomes are complete remission partial remission and no

remission If we measure the outcome variable as only remission

and non-remission basically we are committing an error by

putting partial remission in the non-remission group (type II error)

Designing the studyDesigning the study

A study design is a careful advance plan of data collection

and the analytic approach needed to answer the research

question under investigation in a scientific way

The basic elements of a study design-

ndash Selecting an appropriate sample size for a specified

level of power and level of significance

ndash Selecting methods of sampling data collection and

analysis appropriate to the studys objectives

ClinicalExperimental versus Observational designClinicalExperimental versus Observational design

The Lancet 2002 Vol 359

ClinicalExperimental vs Observational ClinicalExperimental vs Observational designdesign

The choice of a design mainly depends on the research

question (s) and type of research conduct ( experimental

or observational)

Experimental Interventional The investigator controls

the experimental environment in which the hypothesis is

tested The randomized double-blind clinical trial is the

gold standard

Non-experimental/Observational: The population is observed without any interference by the investigator. For example, in a study of the effect of smoking, it is impossible for an investigator to assign smoking to the subjects. Instead, the investigator can study the effect by choosing a control group and examining the relationship between cause and effect. Some examples are:

- Cross-sectional study
- Cohort study
- Case-control study

Randomized controlled design

Random allocation of subjects to different interventions (or treatments) for the purpose of comparing/determining the efficacy of the study treatment(s).

- E.g., a placebo or a standard medication (active control) can be used as a control
- Patients with cancer or a painful disease cannot receive a placebo as a control
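The slide does not say how the random allocation is generated. One common approach is permuted-block randomization, sketched below in Python as an assumption rather than the authors' method. `block_randomize` and its parameters are hypothetical names. Each block contains every arm equally often, so group sizes stay balanced throughout enrollment. Simple coin-flip allocation does not guarantee this.

```python
import random

def block_randomize(n_subjects, arms=("Treatment", "Control"), block_size=4, seed=None):
    """Assign subjects to arms in permuted blocks so group sizes stay balanced.

    block_size must be a multiple of the number of arms.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_subjects:
        # Each block contains every arm equally often, in random order.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_subjects]

schedule = block_randomize(12, seed=2008)
for subject_id, arm in enumerate(schedule, start=1):
    print(subject_id, arm)
```

Fixing the seed makes the allocation list reproducible, so it can be generated in advance and concealed from the investigators.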

Blinding: reduces the bias due to preconceptions or personal bias.

- Open trial: the investigator and the subject know the full details of the treatment
- Single-blind trial: the investigator knows about the treatment but the subject does not
- Double-blind trial: neither the investigator nor the subject knows about the treatment
- Triple-blind trial: the sponsor, the investigator and the subject do not know about the treatment

Distribution of a variable

Distribution (of a variable): tells us what values the variable takes and how often it takes these values. E.g., the age distribution of 26 pediatric patients aged 1 to 6 at AIDHC is as follows:

Age          1   2   3   4   5   6
Frequency    5   3   7   5   4   2
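The frequency distribution above can be reproduced with a short Python sketch. It rebuilds the 26 individual ages from the table, then tabulates the count, percentage and cumulative count for each age. `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

# Rebuild the 26 individual ages from the frequency table above.
frequencies = {1: 5, 2: 3, 3: 7, 4: 5, 5: 4, 6: 2}
ages = [age for age, count in frequencies.items() for _ in range(count)]

def frequency_table(values):
    """Return rows of (value, count, percent, cumulative count), sorted by value."""
    counts = Counter(values)
    n = len(values)
    rows, cumulative = [], 0
    for value in sorted(counts):
        cumulative += counts[value]
        rows.append((value, counts[value], 100 * counts[value] / n, cumulative))
    return rows

for value, count, pct, cum in frequency_table(ages):
    print(f"{value:>3} {count:>5} {pct:6.2f}% {cum:>5}")
```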

Statistics

The science of data collection, summarization, analysis and interpretation.

Descriptive versus inferential statistics:

- Descriptive statistics: data description (summarization), such as center, variability and shape
- Inferential statistics: drawing conclusions beyond the sample studied, allowing for prediction

A Taxonomy of Statistics

How does statistics help us?

[Figure: Age Distribution, a histogram of Number of Subjects (0 to 16) by Age in Month (bins 40, 60, 80, 100, 120, 140, More)]

Ages (in months) of the 60 patients in our data set 1 are: 71, 127, 65, 82, 140, 53, 114, 56, 84, 65, 67, 134, 64, …, 91, 51

Mean                  90.41666667
Standard Error        3.902649518
Median                84
Mode                  84
Standard Deviation    30.22979318
Sample Variance       913.8403955
Kurtosis              -1.183899591
Skewness              0.389872725
Range                 95
Minimum               48
Maximum               143
Sum                   5425
Count                 60

By simply looking at the raw data, we cannot produce an informative description of it. Statistics, however, gives quick insight into the data through graphical and numerical tools.

[Figure: Distribution of age, with Age (month) on an axis from 60 to 140]

Statistical Description of Data

Statistics describes a numeric set of data by its:

- Center (mean, median, mode, etc.)
- Variability (standard deviation, range, etc.)
- Shape (skewness, kurtosis, etc.)

Statistics describes a categorical set of data by:

- Frequency, percentage or proportion of each category
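Below is a minimal Python sketch of the numeric summaries listed above, arranged like the Excel Descriptive Statistics output shown earlier. It needs Python 3.8 or later and uses only the standard library. Skewness and kurtosis use the bias-adjusted sample formulas documented for Excel's SKEW and KURT functions. `describe` is a hypothetical helper. Only 13 of the 60 ages are listed in the text, so the usage example will not reproduce the 60-patient summary.

```python
import math
import statistics

def describe(data):
    """Summaries similar to Excel's Descriptive Statistics output (needs >= 4 values)."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)          # sample SD (n - 1 denominator)
    z = [(x - mean) / sd for x in data]  # standardized values
    # Bias-adjusted sample skewness and excess kurtosis.
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "Mean": mean,
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(data),
        "Mode": statistics.mode(data),   # first mode found if several tie
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(data),
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": sum(data),
        "Count": n,
    }

# Only the first 13 ages listed in the text (the remaining values are elided there).
first_ages = [71, 127, 65, 82, 140, 53, 114, 56, 84, 65, 67, 134, 64]
for name, value in describe(first_ages).items():
    print(f"{name:<20}{value:.6g}")
```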

Statistical Inference

Statistical inference is the process by which we acquire information about populations from samples.

Two types of estimates are used for making inferences:

- Point estimate
- Interval estimate

[Figure: Statistical Inference, from Sample to Population]

Population and sample

Population: the entire collection of individuals or measurements about which information is desired.

Sample: a subset of the population selected for study.

- The primary objective is to create a subset of the population whose center, spread and shape are as close as possible to those of the population
- Methods of sampling: random sampling, stratified sampling, systematic sampling, cluster sampling, multistage sampling, area sampling, quota sampling, etc.
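The sampling methods named above differ in how units are drawn. Below is a Python sketch of three of them: simple random, systematic and stratified sampling. The 60-patient sampling frame, its three groups and the helper names are all hypothetical.

```python
import random

def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely to be chosen."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Pick a random start, then take every k-th unit (k = N // n)."""
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Draw a simple random sample of fixed size within each stratum."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for units in strata.values():
        sample.extend(rng.sample(units, n_per_stratum))
    return sample

def group(pid):
    """Hypothetical grouping rule: IDs 1-20, 21-40 and 41-60 form groups 0, 1, 2."""
    return (pid - 1) // 20

rng = random.Random(42)
patients = list(range(1, 61))   # hypothetical sampling frame of 60 patient IDs

print(simple_random_sample(patients, 6, rng))
print(systematic_sample(patients, 6, rng))
print(stratified_sample(patients, group, 2, rng))
```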

Parameter vs Statistic

Parameter:

- Any statistical characteristic of a population
- Population mean, population median and population standard deviation are examples of parameters
- A parameter describes the distribution of a population
- Parameters are fixed and usually unknown

Statistic: any statistical characteristic of a sample.

- Sample mean, sample median and sample standard deviation are some examples of statistics
- A statistic describes the distribution of a sample
- The value of a statistic is known and varies from sample to sample
- Statistics are used for making inferences about parameters

Statistical issue: to describe the distribution of a population through a census, or to make inferences about the population distribution (population parameters) using the sample distribution (sample statistics).

E.g., the sample mean is an estimate of the population mean.
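A short simulation can illustrate that a parameter is fixed while a statistic varies from sample to sample. The Python sketch below assumes a hypothetical normally distributed population of 10,000 ages (mean 90, SD 30 months). It computes the population mean once, then draws five samples of 60 and compares their means.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population of 10,000 ages in months (normal, mean 90, SD 30).
population = [rng.gauss(90, 30) for _ in range(10_000)]

# The parameter: fixed, but in a real study it would be unknown.
mu = statistics.fmean(population)

# Five samples of size 60 give five different values of the statistic.
sample_means = [statistics.fmean(rng.sample(population, 60)) for _ in range(5)]

print(f"population mean (parameter): {mu:.2f}")
print("sample means (statistics):", [round(m, 2) for m in sample_means])
```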

Hypothesis Testing

Null hypothesis and alternative hypothesis

                        Real situation
Decision                H0 is true                  H0 is false
Reject H0               Type I error (α)            Correct decision (1 - β)
Accept H0               Correct decision (1 - α)    Type II error (β)
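α is the probability of rejecting H0 when H0 is actually true. The small Monte Carlo sketch below makes this concrete. It repeatedly draws samples from a population in which H0 holds and counts how often a two-sided one-sample z-test (σ assumed known) rejects at α = 0.05. The population values (mean 90, SD 30, n = 60) are assumptions loosely modeled on the age data, and `z_test_p_value` is a hypothetical helper.

```python
import math
import random
from statistics import NormalDist, fmean

def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value of a one-sample z-test, treating sigma as known."""
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(7)
alpha, mu0, sigma, n, trials = 0.05, 90.0, 30.0, 60, 4000

# H0 is true by construction: every sample is drawn from a population with mean mu0,
# so every rejection below is a Type I error.
rejections = sum(
    z_test_p_value([rng.gauss(mu0, sigma) for _ in range(n)], mu0, sigma) < alpha
    for _ in range(trials)
)
print(f"observed Type I error rate: {rejections / trials:.3f} (alpha = {alpha})")
```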

Elements/Steps in hypothesis testing

Hypothesis testing steps:

1. Specify the null (H0) and alternative (H1) hypotheses
2. Select the significance level (alpha), e.g. 0.05 or 0.01
3. Calculate the test statistic, e.g. t, F, chi-square
4. Calculate the probability value (p-value) or confidence interval
5. Describe the result and the statistic in an understandable way
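The Python sketch below traces the five steps in order. The test statistic is the difference in group means, with a permutation (randomization) p-value. This test was chosen here only because it needs nothing beyond the standard library; it substitutes for the t, F or chi-square tests named above rather than reproducing them. The data and the `permutation_test` helper are hypothetical.

```python
import random
from statistics import fmean

def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed difference in means, Monte Carlo p-value).
    """
    rng = random.Random(seed)
    observed = fmean(group_a) - fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # relabel subjects at random, as if H0 were true
        diff = fmean(pooled[:n_a]) - fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return observed, (extreme + 1) / (n_resamples + 1)

# Step 1: H0: both groups share the same mean; H1: the means differ (two-sided).
# Step 2: choose the significance level.
alpha = 0.05
# Hypothetical outcome scores for two groups.
treated = [12, 15, 14, 16, 13, 17, 15, 14]
control = [10, 11, 13, 9, 12, 11, 10, 12]
# Steps 3-4: test statistic (difference in means) and its p-value.
diff, p = permutation_test(treated, control)
# Step 5: state the result plainly.
decision = "reject H0" if p < alpha else "do not reject H0"
print(f"difference in means = {diff:.2f}, p = {p:.4f}: {decision} at alpha = {alpha}")
```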

Point Estimation

[Figure: a point estimator computed from the sample distribution estimates a parameter of the population distribution]

A point estimate draws inference about a population by estimating the value of an unknown parameter using a single value, or point.

Interval Estimation

[Figure: an interval estimator computed from the sample distribution estimates a parameter of the population distribution]

An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.
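The two kinds of estimate can be computed from the age summary reported earlier (mean 90.41666667, standard error 3.902649518, n = 60). The Python sketch below assumes a large-sample normal approximation. A t-based interval with 59 degrees of freedom would be slightly wider. `normal_ci` is a hypothetical helper.

```python
from statistics import NormalDist

def normal_ci(point_estimate, standard_error, confidence=0.95):
    """Large-sample (normal-approximation) confidence interval."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z * standard_error
    return point_estimate - margin, point_estimate + margin

# Point estimate and standard error of mean age (months) from the Excel summary.
mean_age, se_age = 90.41666667, 3.902649518
low, high = normal_ci(mean_age, se_age)
print(f"Point estimate: {mean_age:.1f} months")
print(f"95% CI: {low:.1f} to {high:.1f} months")   # about 82.8 to 98.1
```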

P-Value versus the Confidence Interval

Two main ways to assess study precision and the role of chance in a study:

- The p-value measures (as a probability) the evidence against the null hypothesis: it is the probability, computed assuming H0 is true, of obtaining a result at least as extreme as the one observed
- The confidence interval (CI) is an interval, computed from the sample, that is constructed to contain the value of the parameter with a specified probability (the confidence level)
- E.g., a 95% CI implies that if one repeated a study 100 times, computing a CI each time, about 95 of the 100 intervals would contain the true measure of association
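The repeated-study interpretation of a 95% CI can be checked by simulation. The Python sketch below draws 1,000 hypothetical studies from a population whose mean is known, which is true only in a simulation. It builds a normal-theory 95% interval from each study with σ treated as known, then counts how many intervals contain the true mean. The count should be close to 950.

```python
import math
import random
from statistics import NormalDist, fmean

rng = random.Random(2008)
# Hypothetical settings: in a simulation the true mean is known, unlike in practice.
true_mean, sigma, n, studies = 90.0, 30.0, 60, 1000
z = NormalDist().inv_cdf(0.975)          # about 1.96 for a 95% interval
margin = z * sigma / math.sqrt(n)        # sigma treated as known

covered = 0
for _ in range(studies):
    sample_mean = fmean(rng.gauss(true_mean, sigma) for _ in range(n))
    if sample_mean - margin <= true_mean <= sample_mean + margin:
        covered += 1

print(f"{covered} of {studies} intervals contain the true mean")  # close to 950
```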

Procedures for sample size calculation

- Selection of the primary variables of interest and formulation of hypotheses
- Information on the standard deviation (if numeric) or proportion (if categorical)
- A tolerance level of significance (α)
- Selection of a reasonable test statistic
- Power or confidence level
- A scientifically or clinically meaningful effect (difference)
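The ingredients listed above map onto a standard normal-approximation formula for comparing two means with equal group sizes. The Python sketch below assumes a two-sided test on a numeric outcome. `n_per_group` is a hypothetical helper, and the SD and difference in the example are assumed inputs. Other designs, such as proportions, unequal groups or more than two arms, need different formulas.

```python
import math
from statistics import NormalDist

def n_per_group(sd, difference, alpha=0.05, power=0.80):
    """Per-group sample size to compare two means (two-sided, equal groups).

    Normal-approximation formula:
        n = 2 * (z_{1-alpha/2} + z_{power})**2 * sd**2 / difference**2
    rounded up to the next whole subject.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / difference ** 2
    return math.ceil(n)

# Assumed inputs: SD of 30 months, clinically meaningful difference of 15 months.
print(n_per_group(sd=30, difference=15))               # 63
print(n_per_group(sd=30, difference=15, power=0.90))   # 85
```

Raising the power, lowering α or targeting a smaller difference all increase the required sample size.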

Brief concept of Statistical Software

There are many software packages to perform statistical analysis and visualization of data. Some of them are:

- Statistical Analysis System (SAS), S-PLUS, R, MATLAB, Minitab, BMDP, Stata, SPSS, StatXact, Statistica, LISREL, JMP, GLIM, HIL, MS Excel, etc.

We will discuss MS Excel and SPSS in brief.

Useful websites:

- http://www.R-project.org (free but powerful statistical software)

Microsoft Excel

A spreadsheet application. It features calculation, graphing tools, pivot tables and a macro programming language called VBA (Visual Basic for Applications).

There are many versions of MS Excel; Excel XP, Excel 2003 and Excel 2007 are capable of performing a number of statistical analyses.

Starting MS Excel: double-click on the Microsoft Excel icon on the desktop, or click on Start > Programs > Microsoft Excel.

Worksheet: consists of a grid of cells with numbered rows down the page and alphabetically titled columns across the page. Each cell is referenced by its coordinates. For example, A3 refers to the cell in column A and row 3, and B10:B20 refers to the range of cells in column B, rows 10 through 20.
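The A1-style addressing described above can be illustrated with a small Python sketch. It converts column letters to numbers and expands a rectangular range into its cells. This is an illustration of the referencing scheme, not Excel's own code. `column_number`, `parse_cell` and `expand_range` are hypothetical helpers.

```python
import re

def column_number(letters):
    """Convert column letters to a 1-based number: A -> 1, Z -> 26, AA -> 27."""
    number = 0
    for ch in letters.upper():
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number

def parse_cell(ref):
    """Split an A1-style reference such as 'B10' into (row, column) numbers."""
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", ref)
    if match is None:
        raise ValueError(f"not an A1-style reference: {ref!r}")
    letters, digits = match.groups()
    return int(digits), column_number(letters)

def expand_range(ref_range):
    """List every (row, column) pair in a rectangular range such as 'B10:B20'."""
    first, last = (parse_cell(part) for part in ref_range.split(":"))
    return [(r, c)
            for r in range(first[0], last[0] + 1)
            for c in range(first[1], last[1] + 1)]

print(parse_cell("A3"))               # (3, 1)
print(len(expand_range("B10:B20")))   # 11
```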

Creating formulas:

1. Click the cell in which you want to enter the formula.
2. Type = (an equal sign).
3. Click the Function button (fx).
4. Select the formula you want and step through the on-screen instructions.

Opening a document: File > Open (for an existing workbook). Change the directory, area or drive to look for files in other locations.

Creating a new workbook: File > New > Blank Document.

Saving a file: File > Save.

Selecting more than one cell: click on a cell (e.g. A1), then hold the Shift key and click on another (e.g. D4) to select the cells between A1 and D4; or click on a cell and drag the mouse across the desired range.

Entering date and time: Dates are displayed as MM/DD/YYYY (internally, Excel stores them as serial numbers), and there is no need to type them in that format. For example, Excel will recognize Jan 9 or jan-9 as 1/9/2007 (the current year is assumed) and Jan 9 1999 as 1/9/1999. To enter today's date, press Ctrl and ; (semicolon) together. Use a or p to indicate AM or PM; for example, 8:30 p is interpreted as 8:30 PM. To enter the current time, press Ctrl, Shift and ; together.
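Behind the displayed format, Excel keeps each date as a serial day count. The Python sketch below shows the conversion, assuming the default 1900 date system used by Excel for Windows; workbooks using the 1904 date system count from a different origin. Serial numbers below 61 are rejected because of the fictitious 29 February 1900 that Excel keeps for Lotus 1-2-3 compatibility, and any time-of-day fraction is dropped. The helper names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: the default 1900 date system of Excel for Windows. Serial 1 is
# 1900-01-01 and serial 60 is the fictitious 1900-02-29, so from serial 61
# onward, counting days from 1899-12-30 gives the correct date.
EXCEL_ORIGIN = date(1899, 12, 30)

def excel_serial_to_date(serial):
    """Convert a 1900-system serial number (61 or later) to a date; time fraction dropped."""
    if serial < 61:
        raise ValueError("serials below 61 need special handling (1900 leap-year quirk)")
    return EXCEL_ORIGIN + timedelta(days=int(serial))

def date_to_excel_serial(d):
    """Inverse conversion, valid for dates from 1900-03-01 onward."""
    return (d - EXCEL_ORIGIN).days

print(excel_serial_to_date(25569))             # 1970-01-01
print(date_to_excel_serial(date(2007, 1, 9)))  # serial number for 1/9/2007
```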

Copying and pasting all cells in a sheet: Ctrl+A for selecting, Ctrl+C for copying and Ctrl+V for pasting.

Sorting data: Data > Sort > Sort by …

Descriptive statistics and other statistical methods: Tools > Data Analysis > (statistical method). If Data Analysis is not available, click on Tools > Add-Ins and then select Analysis ToolPak and Analysis ToolPak - VBA.

Statistical and mathematical functions: start with the = sign and then select a function from the function wizard (fx).

Inserting a chart: click on the Chart Wizard (or Insert > Chart), select the chart type, give the input data range, update the chart options and select the output location (worksheet).

Importing data into Excel: File > Open, choose the file type and click on the file. Choose the original data type (Delimited or Fixed Width), choose the delimiter (Tab, Semicolon, Comma, Space, Other), then click Finish.
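The two import options above (Delimited with a chosen separator, or Fixed Width) can be mirrored in Python with the standard `csv` module and string slicing. The sketch below is an illustration only: the sample text, field widths and helper names are hypothetical.

```python
import csv
import io

def read_delimited(text, delimiter=","):
    """Parse delimited text into a header row and a list of data rows.

    delimiter plays the role of the Tab/Semicolon/Comma/Space/Other choice.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header, *rows = list(reader)
    return header, rows

def read_fixed_width(lines, widths):
    """Split each line into fields of the given character widths (Fixed Width option)."""
    records = []
    for line in lines:
        fields, pos = [], 0
        for w in widths:
            fields.append(line[pos:pos + w].strip())
            pos += w
        records.append(fields)
    return records

# Hypothetical tab-delimited input with a header row.
tab_text = "id\tage\tgroup\n1\t71\tA\n2\t127\tB\n"
header, rows = read_delimited(tab_text, delimiter="\t")
print(header, rows)

# Hypothetical fixed-width input: 3-character id, 4-character age, 1-character group.
print(read_fixed_width(["  1  71A", "  2 127B"], widths=[3, 4, 1]))
```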

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.

Statistical Package for the Social Sciences (SPSS)

A general-purpose statistical package, SPSS is widely used in the social sciences, particularly in sociology and psychology.

SPSS can import data from almost any type of file to generate tabulated reports, plots of distributions and trends, descriptive statistics and complex statistical analyses.

Starting SPSS: double-click on SPSS on the desktop, or Programs > SPSS.

Opening an SPSS file: File > Open.

Data Editor

Various pull-down menus appear at the top of the Data Editor window. These pull-down menus are at the heart of using SPSS for Windows (SPSSWIN). The Data Editor menu items (with some of the uses of each menu) are:

MENUS AND TOOLBARS

- FILE: used to open and save data files
- EDIT: used to copy and paste data values, to find data in a file, and to insert variables and cases; OPTIONS allows the user to set general preferences as well as the setup for the Navigator, Charts, etc.
- VIEW: the user can change toolbars; value labels can be seen in cells instead of data values
- DATA: select, sort or weight cases; merge files
- TRANSFORM: compute new variables, recode variables, etc.
- ANALYZE: perform various statistical procedures
- GRAPHS: create bar and pie charts, etc.
- UTILITIES: add comments to accompany the data file (and other advanced features)
- ADD-ONS: features not currently installed (advanced statistical procedures)
- WINDOW: switch between data, syntax and Navigator windows
- HELP: access SPSSWIN Help information

Navigator (Output) Menus

When statistical procedures are run or charts are created, the output will appear in the Navigator window. The Navigator window contains many of the pull-down menus found in the Data Editor window. Some of the important menus in the Navigator window include:

- INSERT: used to insert page breaks, titles, charts, etc.
- FORMAT: for changing the alignment of a particular portion of the output

Formatting Toolbar

When a table has been created by a statistical procedure, the user can edit the table to create a desired look or to add or delete information. Beginning with version 14.0, the user has a choice of editing the table in the Output window or opening it in a separate Pivot Table (DEFINE) window. Various pull-down menus are activated when the user double-clicks on the table. These include:

- EDIT: undo and redo a pivot; select a table or table body (e.g. to change the font)
- INSERT: used to insert titles, captions and footnotes
- PIVOT: used to perform a pivot of the row and column variables
- FORMAT: various modifications can be made to tables and cells

Importing tab-delimited data: In SPSSWIN, click on FILE > OPEN > DATA. Look in the appropriate location for the text file. Then select "Text" from "Files of type". Click on the file name and then click on "Open". You will see the Text Import Wizard - Step 1 of 6 dialog box.

After completing the wizard's steps, you will have an SPSS data file containing the former tab-delimited data. You simply need to add variable and value labels and define missing values.

Exporting data to Excel: click on FILE > SAVE AS. Click on the file name for the file to be exported. For "Save as type", select Excel (*.xls) from the pull-down menu. You will notice the checkbox for "Write variable names to spreadsheet". Leave this checked, as you will want the variable names to be in the first row of each column in the Excel spreadsheet. Finally, click on Save.

Additional menus:

- CHART EDITOR: used to edit a graph
- SYNTAX EDITOR: used to edit the text in a syntax window

Show or hide a toolbar: click on VIEW > TOOLBARS, then check a toolbar to show it or uncheck it to hide it.

Move a toolbar: click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location.

Customize a toolbar: click on VIEW > TOOLBARS > CUSTOMIZE.

Importing data from an Excel spreadsheet

Data from an Excel spreadsheet can be imported into SPSSWIN as follows:

1. In SPSSWIN, click on FILE > OPEN > DATA. The OPEN DATA FILE dialog box will appear.
2. Locate the file of interest. Use the Look In pull-down list to identify the folder containing the Excel file of interest.
3. From the FILE TYPE pull-down menu, select EXCEL (*.xls).
4. Click on the file name of interest and click on OPEN, or simply double-click on the file name.
5. Keep the box checked that reads "Read variable names from the first row of data". This presumes that the first row of the Excel data file contains variable names. [If the data resided in a different worksheet in the Excel file, this would need to be entered.]
6. Click on OK. The Excel data file will now appear in the SPSSWIN Data Editor.

7. The former Excel spreadsheet can now be saved as an SPSS file (FILE > SAVE AS) and is ready to be used in analyses. Typically, you would label variables and values and define missing values.

Importing an Access table: SPSSWIN does not offer a direct import for Access tables. Therefore, we must follow these steps:

1. Open the Access file.
2. Open the data table.
3. Save the data as an Excel file.
4. Follow the steps outlined for importing data from an Excel spreadsheet into SPSSWIN.

Importing text files into SPSSWIN: text data points typically are separated (or "delimited") by tabs or commas. Sometimes they can be of fixed format.

Running the FREQUENCIES procedure

1. Open the data file of interest (from the menus, click on FILE > OPEN > DATA).
2. From the menus, click on ANALYZE > DESCRIPTIVE STATISTICS > FREQUENCIES.
3. The FREQUENCIES dialog box will appear. The left-hand box lists all the variables that have been defined in the data file (source variable list). The first step is identifying the variable(s) for which you want to run a frequency analysis. Click on a variable name, then click the [ > ] pushbutton. The variable name will now appear in the VARIABLE(S) box (selected variable list). Repeat these steps for each variable of interest.
4. If all that is being requested is a frequency table showing counts and percentages (raw, adjusted and cumulative), then click on OK.

Requesting STATISTICS: descriptive and summary STATISTICS can be requested for numeric variables. To request statistics:

1. From the FREQUENCIES dialog box, click on the STATISTICS pushbutton.
2. This will bring up the FREQUENCIES: STATISTICS dialog box.
3. The STATISTICS dialog box offers the user a variety of choices.

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE > DESCRIPTIVE STATISTICS > DESCRIPTIVES). The procedure offers many of the same statistics as the FREQUENCIES procedure, but without generating frequency tables.

Requesting CHARTS: one can request a chart (graph) to be created for a variable or variables included in a FREQUENCIES procedure.

1. In the FREQUENCIES dialog box, click on CHARTS.
2. The FREQUENCIES: CHARTS dialog box will appear. Choose the intended chart (e.g. bar chart, pie chart, histogram).

Pasting charts into Word:

1. Click on the chart.
2. Click on the pull-down menu EDIT > COPY OBJECTS.
3. Go to the Word document in which the chart is to be embedded. Click on EDIT > PASTE SPECIAL.
4. Select Formatted Text (RTF) and then click on OK.
5. Enlarge the graph to a desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible, click once on the graph).

BASIC STATISTICAL PROCEDURES: CROSSTABS

1. From the ANALYZE pull-down menu, click on DESCRIPTIVE STATISTICS > CROSSTABS.
2. The CROSSTABS dialog box will then open.
3. From the variable selection box on the left, click on a variable you wish to designate as the Row variable. The values (codes) for the Row variable make up the rows of the crosstabs table. Click on the arrow (>) button for Row(s). Next, click on a different variable you wish to designate as the Column variable. The values (codes) for the Column variable make up the columns of the crosstabs table. Click on the arrow (>) button for Column(s).
4. You can specify more than one variable in the Row(s) and/or Column(s). A cross table will be generated for each combination of Row and Column variables.
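The table that CROSSTABS produces is a count of each combination of row and column values. The Python sketch below builds the same kind of two-way table with `collections.Counter`. The gender and remission records are hypothetical, and `crosstab` is an illustrative helper, not an SPSS API.

```python
from collections import Counter

def crosstab(rows, cols):
    """Count each (row value, column value) combination, like a two-way table."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    table = {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}
    return table, row_levels, col_levels

# Hypothetical records: gender (row variable) by remission status (column variable).
gender = ["F", "M", "F", "F", "M", "M", "F", "M"]
remission = ["Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No"]

table, row_levels, col_levels = crosstab(gender, remission)
print("", *col_levels, "Total", sep="\t")
for r in row_levels:
    cells = [table[r][c] for c in col_levels]
    print(r, *cells, sum(cells), sep="\t")
```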

Limitations: SPSS users have less control over data manipulation and statistical output than users of other statistical packages such as SAS, Stata, etc.

SPSS is a good first statistical package for quantitative research in the social sciences, because it is easy to use and can be a good starting point for learning more advanced statistical packages.


Questions


Interval - Values of the variable are ordered as in Ordinal and

additionally differences between values are meaningful however the

scale is not absolutely anchored Calendar dates and temperatures on

the Fahrenheit scale are examples Addition and subtraction but not

multiplication and division are meaningful operations

Ratio - Variables with all properties of Interval plus an absolute non-

arbitrary zero point eg age weight temperature (Kelvin) Addition

subtraction multiplication and division are all meaningful operations

Measurement biasMeasurement bias

Bias arises due to measurement error

Example-

ndash Suppose In the case of remission of Asthma the possible

outcomes are complete remission partial remission and no

remission If we measure the outcome variable as only remission

and non-remission basically we are committing an error by

putting partial remission in the non-remission group (type II error)

Designing the studyDesigning the study

A study design is a careful advance plan of data collection

and the analytic approach needed to answer the research

question under investigation in a scientific way

The basic elements of a study design-

ndash Selecting an appropriate sample size for a specified

level of power and level of significance

ndash Selecting methods of sampling data collection and

analysis appropriate to the studys objectives

ClinicalExperimental versus Observational designClinicalExperimental versus Observational design

The Lancet 2002 Vol 359

ClinicalExperimental vs Observational ClinicalExperimental vs Observational designdesign

The choice of a design mainly depends on the research

question (s) and type of research conduct ( experimental

or observational)

Experimental Interventional The investigator controls

the experimental environment in which the hypothesis is

tested The randomized double-blind clinical trial is the

gold standard

ClinicalExperimental vs Observational ClinicalExperimental vs Observational designdesign

Non-experimentalObservational The population is

observed without any interference by the investigator

For example in a study to see the effect of smoking it is

impossible for an investigator to assign smoking to the subjects

Instead investigator can study the effect by choosing a control

group and find the cause and relation effect Some examples are-

ndash Cross-sectional study

ndash Cohort study

ndash Case-control study

Randomized control designRandomized control design

Random allocation of subjects to different interventions

(or treatments) for the purpose of comparingdetermining

the efficacy of the study treatment (s)

ndash Eg placebo or standard medication (active control) can

be used as a control

ndash Patients with cancer or painful disease can not receive

placebo as a control

Randomized control designRandomized control design

Blindness Reduces the bias due to the preconception or

personal bias ndash Open trial Investigator and subject know the full details of the

treatment

ndash Single-blind trial Investigator knows about the treatment but

subject does not

ndash Double-blind Both investigator and subject do not know about the

treatment

ndash Triple-blind Sponsor investigator and subject do not know about

the treatment

Distribution of a variableDistribution of a variable

Distribution - (of a variable) tells us what values the

variable takes and how often it takes these values Eg

distribution of some 26 pediatric patients of ages 1 to 6

at AIDHC are as follows-

Age 1 2 3 4 5 6

Frequency 5 3 7 5 4 2

StatisticsStatistics

Science of data collection summarization analysis

and interpretation

Descriptive versus Inferential Statistics

ndash Descriptive Statistic Data description

(summarization) such as center variability and

shape

ndash Inferential Statistic Drawing conclusion beyond the

sample studied allowing for prediction

A Taxonomy ofA Taxonomy of StatisticsStatistics

How does statistics help usHow does statistics help us

Age Distribution

0

2

4

6

8

10

12

14

16

40 60 80 100 120 140 More

Age in Month

Nu

mb

er o

f S

ub

ject

s

Ages (in month) of the 60 patients in our data set 1 are- 71 127 65 82 140 53 114 56 84 65 67 134 64 hellip 91 51

Mean 9041666667

Standard Error 3902649518

Median 84

Mode 84

Standard Deviation 3022979318

Sample Variance 9138403955

Kurtosis -1183899591

Skewness 0389872725

Range 95

Minimum 48

Maximum 143

Sum 5425

Count 60

By simply looking at the data we fail to produce any informative account to describe the data how ever statistics produce a quick insight in to data using graphical and numerical statistical tools

60

80

10

01

20

14

0

Distribution of age

Ag

e (

mo

nth

)

Statistical Description of DataStatistical Description of Data

Statistics describes a numeric set of data by its

Center (mean median mode etc)

Variability (standard deviation range etc)

Shape (skewness kurtosis etc)

Statistics describes a categorical set of data by

Frequency percentage or proportion of each

category

Statistical inference is the process by which we acquire information about populations from samples

Two types of estimates for making inferencesndash Point estimationndash Interval estimate

Statistical Inference

Statistical Inference

Sample Population

Population and samplePopulation and sample

Population The entire collection of individuals or measurements about which information is desired

Sample A subset of the population selected for study

ndash Primary objective is to create a subset of population whose center spread and shape are as close as that of population

ndash Methods of sampling Random sampling stratified sampling systematic sampling cluster sampling multistage sampling area sampling qoata sampling etc

Parameter vs StatisticsParameter vs Statistics

Parameter

ndash Any statistical characteristic of a population

ndash Population mean population median population

standard deviation are examples of parameters

ndash Parameter describes the distribution of a population

ndash Parameters are fixed and usually unknown

Parameter vs StatisticsParameter vs Statistics

Statistic Any statistical characteristic of a sample

ndash Sample mean sample median sample standard

deviation are some examples of statistics

ndash Statistic describes the distribution of population

ndash Value of a statistic is known and is varies for different

samples

ndash Are used for making inference on parameter

Parameter vs StatisticsParameter vs Statistics

Statistical Issue To describe the distribution of a

population through census or making inference on

population distribution population parameter using sample

distribution statistic

Eg sample mean is an estimate of the population mean

Hypothesis TestingHypothesis Testing

Null hypothesis and Alternative hypothesis

Real Situation Ho is true Ho is false Reject Ho Type I

error (α) Correct Decision (1-)

D e c i s i o n

Accept Ho Correct Decision (1- α)

Type II Error ()

ElementsSteps in hypothesisElementsSteps in hypothesis

Hypothesis testing steps

ndash 1 Null (Ho) and alternative (H1)hypothesis specification

ndash 2 Selection of significance level (alpha) - 005 or 001

ndash 3 Calculating the test statistic ndasheg t F Chi-square

ndash 4 Calculating the probability value (p-value) or confidence

Interval

ndash 5 Describing the result and statistic in an understandable

way

Point estimator

Sample distribution

Parameter

Population distribution

bull A point estimate draws inference about a population by estimating the value of an unknown parameter using a single value or a point

Point Estimation

Interval estimatorSample distribution

bull An interval estimator draws inferences about a population by

estimating the value of an unknown parameter using an interval

Population distribution Parameter

Interval Estimation

P-Value versus the Confidence IntervalP-Value versus the Confidence Interval

Two main ways to assess study precision and the role of

chance in a study

ndash P value measures ( in probability) the evidence against

the null hypothesis

ndash An interval within which the value of the parameter lies

with a specified probability

ndash Eg 95 CI implies that if one repeats a study 100

times the true measure of association will lie inside the

CI in 95 out of 100 measures

Procedures for Sample Size Calculation

Selection of the primary variables of interest and formulation of hypotheses

Information on the standard deviation (if numeric) or proportion (if categorical)

A tolerable level of significance (α)

Selection of a reasonable test statistic

Power or confidence level

A scientifically or clinically meaningful effect (difference)
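The ingredients listed above combine in one common normal-approximation formula for comparing two means with equal group sizes: n per group = 2 (z(1-α/2) + z(1-β))² σ² / Δ². Here σ is the assumed standard deviation and Δ is the meaningful difference.

This is one standard formula, not necessarily the method used in this course. The function name and planning values are hypothetical. A t-based calculation would give a slightly larger n.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_means(sigma, delta, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided test comparing two means.

    sigma: assumed common standard deviation of the outcome
    delta: smallest scientifically or clinically meaningful difference in means
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return ceil(n)  # round up to whole subjects


# Hypothetical planning values: SD = 10, meaningful difference = 5
print(n_per_group_two_means(sigma=10, delta=5))  # 63
```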

Brief Concept of Statistical Software

There are many software packages to perform statistical analysis and visualization of data. Some of them are:

– Statistical Analysis System (SAS), S-PLUS, R, MATLAB, Minitab, BMDP, Stata, SPSS, StatXact, Statistica, LISREL, JMP, GLIM, HIL, MS Excel, etc. We will discuss MS Excel and SPSS briefly.

Useful website:

http://www.R-project.org (R, a free but powerful statistical software)

Microsoft Excel

A spreadsheet application. It features calculation, graphing tools, pivot tables and a macro programming language called VBA (Visual Basic for Applications).

There are many versions of MS Excel. Excel XP, Excel 2003 and Excel 2007 are all capable of performing a number of statistical analyses.

Starting MS Excel: double-click on the Microsoft Excel icon on the desktop, or click on Start --> Programs --> Microsoft Excel.

Worksheet: consists of a grid of cells, with numbered rows down the page and alphabetically titled columns across the page. Each cell is referenced by its coordinates. For example, A3 refers to the cell in column A and row 3, and B10:B20 refers to the range of cells in column B, rows 10 through 20.
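The A1-style addressing scheme maps letters to column numbers in bijective base 26 (A = 1, Z = 26, AA = 27) and digits to row numbers. The sketch below makes that mapping explicit. `parse_cell` and `expand_range` are hypothetical helpers written for illustration, not part of Excel or any library.

```python
import re


def parse_cell(ref):
    """Convert an A1-style reference such as 'A3' or 'AB12' to (column_number, row_number)."""
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", ref)
    if match is None:
        raise ValueError(f"not an A1-style cell reference: {ref!r}")
    letters, row = match.groups()
    column = 0
    for ch in letters.upper():
        # Column letters count in base 26 with A=1 ... Z=26, so AA=27
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return column, int(row)


def expand_range(ref_range):
    """List every (column, row) pair covered by a range such as 'B10:B20'."""
    start, end = ref_range.split(":")
    (c1, r1), (c2, r2) = parse_cell(start), parse_cell(end)
    return [(c, r) for c in range(c1, c2 + 1) for r in range(r1, r2 + 1)]


print(parse_cell("A3"))              # (1, 3)
print(len(expand_range("B10:B20")))  # 11
```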

Creating formulas:
1. Click the cell in which you want to enter the formula.
2. Type = (an equal sign).
3. Click the Function button (fx).
4. Select the formula you want and step through the on-screen instructions.

Opening a document: File/Open (to open an existing workbook). Change the directory, drive or location to look for files elsewhere.

Creating a new workbook: File/New/Blank Document

Saving a file: File/Save

Selecting more than one cell: click on a cell (e.g., A1), then hold the Shift key and click on another (e.g., D4) to select all cells between A1 and D4. Alternatively, click on a cell and drag the mouse across the desired range.

Entering date and time: dates are displayed as MM/DD/YYYY by default (Excel stores them internally as serial numbers), but there is no need to enter them in that format. For example, when entered in 2007, Excel will recognize Jan 9 or jan-9 as 1/9/2007, and Jan 9 1999 as 1/9/1999. To enter today's date, press Ctrl and ; together. Use a or p to indicate AM or PM; for example, 8:30 p is interpreted as 8:30 PM. To enter the current time, press Ctrl, Shift and ; together.

Copy and paste all cells in a sheet: Ctrl+A to select, Ctrl+C to copy and Ctrl+V to paste.

Sorting data: Data/Sort/Sort by …

Descriptive statistics and other statistical methods: Tools/Data Analysis/(statistical method). If Data Analysis is not available, click on Tools/Add-Ins and then select Analysis ToolPak and Analysis ToolPak - VBA.
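The ToolPak's Descriptive Statistics output can be reproduced outside Excel as a cross-check. It reports mean, standard error, median, mode, standard deviation, sample variance, kurtosis, skewness, range, minimum, maximum, sum and count.

The Python sketch below is an illustration, not Excel's own code. It assumes Excel's KURT and SKEW use the bias-adjusted sample formulas written in the comments, and the function name `describe` is hypothetical. The kurtosis formula needs at least four values.

```python
from collections import Counter
from math import sqrt
from statistics import fmean, median, stdev, variance


def describe(data):
    """Summary similar to the ToolPak's Descriptive Statistics output (needs at least 4 values)."""
    n = len(data)
    mean = fmean(data)
    s = stdev(data)  # sample standard deviation (n - 1 denominator)
    top_value, top_count = Counter(data).most_common(1)[0]
    mode = top_value if top_count > 1 else None  # Excel shows #N/A when no value repeats
    z3 = sum(((x - mean) / s) ** 3 for x in data)
    z4 = sum(((x - mean) / s) ** 4 for x in data)
    # Bias-adjusted sample kurtosis and skewness (the formulas behind Excel's KURT and SKEW)
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    skewness = n / ((n - 1) * (n - 2)) * z3
    return {
        "Mean": mean,
        "Standard Error": s / sqrt(n),
        "Median": median(data),
        "Mode": mode,
        "Standard Deviation": s,
        "Sample Variance": variance(data),
        "Kurtosis": kurtosis,
        "Skewness": skewness,
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": sum(data),
        "Count": n,
    }


for stat, value in describe([1, 2, 3, 4, 5]).items():
    print(f"{stat:20} {value}")
```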

Statistical and mathematical functions: start with the '=' sign and then select a function from the function wizard (fx).

Inserting a chart: click on Chart Wizard (or Insert/Chart), select a chart type, give the input data range, update the chart options and select the output location (worksheet).

Importing data into Excel: File/Open/Files of type, then click on the file. Choose an option (Delimited or Fixed Width), choose the delimiters (Tab, Semicolon, Comma, Space, Other) and click Finish.

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.
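A commonly cited example of this kind of rounding problem is the one-pass "textbook" variance formula. It subtracts two huge, nearly equal sums, so most significant digits cancel.

This sketch only illustrates the general pitfall. It does not claim which algorithm any particular Excel version uses. The data are made up: a small spread around a large offset, with an exact sample variance of 30. Welford's update is shown as one stable alternative.

```python
import statistics


def naive_variance(xs):
    """One-pass 'textbook' formula: (sum of squares - (sum)^2 / n) / (n - 1)."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - total * total / n) / (n - 1)


def welford_variance(xs):
    """Welford's one-pass update, which avoids subtracting two huge, nearly equal numbers."""
    mean = 0.0
    m2 = 0.0
    for k, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / k
        m2 += delta * (x - mean)
    return m2 / (len(xs) - 1)


# Small spread around a large offset: the exact sample variance is 30
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print("naive:  ", naive_variance(data))
print("Welford:", welford_variance(data))
print("exact:  ", statistics.variance(data))
```

The naive result can be far from 30, while the other two agree.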

Statistical Package for the Social Sciences (SPSS)

A general-purpose statistical package, SPSS is widely used in the social sciences, particularly in sociology and psychology.

SPSS can import data from almost any type of file to generate tabulated reports, plots of distributions and trends, descriptive statistics and complex statistical analyses.

Starting SPSS: double-click on SPSS on the desktop, or Programs/SPSS.

Opening an SPSS file: File/Open

• Data Editor

Various pull-down menus appear at the top of the Data Editor window. These pull-down menus are at the heart of using SPSSWIN. The Data Editor menu items (with some of their uses) are:

MENUS AND TOOLBARS

FILE: used to open and save data files

EDIT: used to copy and paste data values, find data in a file, and insert variables and cases. OPTIONS allows the user to set general preferences as well as the setup for the Navigator, Charts, etc.

VIEW: the user can change toolbars; value labels can be shown in cells instead of data values

DATA: select, sort or weight cases; merge files

TRANSFORM: compute new variables, recode variables, etc.

ANALYZE: perform various statistical procedures

GRAPHS: create bar and pie charts, etc.

UTILITIES: add comments to accompany the data file (and other advanced features)

ADD-ONS: features not currently installed (advanced statistical procedures)

WINDOW: switch between the data, syntax and Navigator windows

HELP: access SPSSWIN Help information

Navigator (Output) Menus

When statistical procedures are run or charts are created, the output will appear in the Navigator window. The Navigator window contains many of the pull-down menus found in the Data Editor window. Some of the important menus in the Navigator window include:

INSERT: used to insert page breaks, titles, charts, etc.

FORMAT: for changing the alignment of a particular portion of the output

• Formatting Toolbar

When a table has been created by a statistical procedure, the user can edit the table to create a desired look or add/delete information. Beginning with version 14.0, the user can either edit the table in the Output window or open it in a separate Pivot Table (DEFINE) window. Various pull-down menus are activated when the user double-clicks on the table. These include:

EDIT: undo and redo a pivot; select a table or table body (e.g., to change the font)

INSERT: used to insert titles, captions and footnotes

PIVOT: used to perform a pivot of the row and column variables

FORMAT: various modifications can be made to tables and cells

Importing tab-delimited data: in SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. Look in the appropriate location for the text file, then select "Text" from "Files of type". Click on the file name and then click on "Open". You will see the Text Import Wizard – Step 1 of 6 dialog box.

After completing the wizard, you will have an SPSS data file containing the former tab-delimited data. You simply need to add variable and value labels and define missing values.

Exporting data to Excel: click on FILE ⇒ SAVE AS. Click on the file name for the file to be exported. For "Save as type", select Excel (*.xls) from the pull-down menu. You will notice the checkbox for "Write variable names to spreadsheet". Leave this checked, as you will want the variable names in the first row of each column in the Excel spreadsheet. Finally, click on Save.

• Additional menus

CHART EDITOR: used to edit a graph

SYNTAX EDITOR: used to edit the text in a syntax window

• Show or hide a toolbar

Click on VIEW ⇒ TOOLBARS, then check the toolbar to show it or uncheck it to hide it

• Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location

• Customize a toolbar

Click on VIEW ⇒ TOOLBARS ⇒ CUSTOMIZE

Importing data from an EXCEL spreadsheet

Data from an Excel spreadsheet can be imported into SPSSWIN as follows:

1. In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. The OPEN DATA FILE dialog box will appear.

2. Locate the file of interest. Use the Look In pull-down list to identify the folder containing the Excel file of interest.

3. From the FILE TYPE pull-down menu, select EXCEL (*.xls).

4. Click on the file name of interest and click on OPEN, or simply double-click on the file name.

5. Keep the box checked that reads "Read variable names from the first row of data". This presumes that the first row of the Excel data file contains variable names. [If the data resided in a different worksheet in the Excel file, this would need to be entered.]

6. Click on OK. The Excel data file will now appear in the SPSSWIN Data Editor.

7. The former EXCEL spreadsheet can now be saved as an SPSS file (FILE ⇒ SAVE AS) and is ready to be used in analyses. Typically, you would label variables and values and define missing values.

Importing an Access table

SPSSWIN does not offer a direct import for Access tables. Therefore, we must follow these steps:

1. Open the Access file.

2. Open the data table.

3. Save the data as an Excel file.

4. Follow the steps outlined for importing data from an Excel spreadsheet into SPSSWIN.

Importing text files into SPSSWIN: text data points typically are separated (or "delimited") by tabs or commas. Sometimes they are in fixed format, where each variable occupies the same columns on every line.
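The difference between delimited and fixed-format text is easiest to see by parsing a tiny file of each kind:

- Delimited: split each line on the tab or comma, and take variable names from the first row.
- Fixed format: cut each line at fixed column positions.

This Python sketch only illustrates the file layouts; it is not how SPSS reads them. The helper names, sample text and column layout are hypothetical.

```python
import csv
import io


def read_delimited(text, delimiter="\t"):
    """Parse delimited text whose first row holds the variable names."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    names = next(reader)
    return [dict(zip(names, row)) for row in reader]


def read_fixed_width(text, layout):
    """Parse fixed-format text; layout lists (name, start, end) positions, 0-based, end exclusive."""
    records = []
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            records.append({name: line[start:end].strip() for name, start, end in layout})
    return records


tab_text = "id\tage\n1\t71\n2\t127\n"
csv_text = "id,age\n1,71\n2,127\n"
fixed_text = "001 71\n002127\n"
print(read_delimited(tab_text))
print(read_delimited(csv_text, delimiter=","))
print(read_fixed_width(fixed_text, [("id", 0, 3), ("age", 3, 6)]))
```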

Running the FREQUENCIES procedure

1. Open the data file of interest (from the menus, click on FILE ⇒ OPEN ⇒ DATA).

2. From the menus, click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ FREQUENCIES.

3. The FREQUENCIES dialog box will appear. The left-hand box lists (source variable list) all the variables that have been defined in the data file. The first step is identifying the variable(s) for which you want to run a frequency analysis. Click on a variable name, then click the [ > ] pushbutton. The variable name will now appear in the VARIABLE[S] box (selected variable list). Repeat these steps for each variable of interest.

4. If all that is being requested is a frequency table showing counts and percentages (raw, adjusted and cumulative), then click on OK.
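The counts and the raw, adjusted and cumulative percentages in a frequency table can be computed by hand. This sketch assumes they correspond to SPSS's Frequency, Percent, Valid Percent and Cumulative Percent columns:

- Percent: divides by all cases.
- Valid Percent: divides by the non-missing cases.
- Cumulative Percent: accumulates the valid percentages.

The function name, the use of `None` for missing values and the data are hypothetical.

```python
from collections import Counter


def frequency_table(values, missing=(None,)):
    """Rows of (value, frequency, percent, valid percent, cumulative percent)."""
    n_total = len(values)
    valid = [v for v in values if v not in missing]
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        freq = counts[value]
        percent = 100 * freq / n_total           # share of all cases, missing included
        valid_percent = 100 * freq / len(valid)  # share of non-missing cases
        cumulative += valid_percent
        rows.append((value, freq, percent, valid_percent, cumulative))
    return rows


# Hypothetical severity codes, with one missing response (None)
for row in frequency_table([1, 2, 2, 3, None]):
    print(row)
```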

Requesting STATISTICS

Descriptive and summary STATISTICS can be requested for numeric variables. To request statistics:

1. From the FREQUENCIES dialog box, click on the STATISTICS pushbutton.

2. This will bring up the FREQUENCIES: STATISTICS dialog box.

3. The STATISTICS dialog box offers the user a variety of choices.

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ DESCRIPTIVES). The procedure offers many of the same statistics as the FREQUENCIES procedure, but without generating frequency tables.

Requesting CHARTS

One can request a chart (graph) for a variable or variables included in a FREQUENCIES procedure:

1. In the FREQUENCIES dialog box, click on CHARTS.

2. The FREQUENCIES: CHARTS dialog box will appear. Choose the intended chart (e.g., bar chart, pie chart or histogram).

Pasting charts into Word

1. Click on the chart.

2. Click on the pull-down menu EDIT ⇒ COPY OBJECTS.

3. Go to the Word document in which the chart is to be embedded. Click on EDIT ⇒ PASTE SPECIAL.

4. Select Formatted Text (RTF) and then click on OK.

5. Enlarge the graph to the desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible, click once on the graph).

BASIC STATISTICAL PROCEDURES: CROSSTABS

1. From the ANALYZE pull-down menu, click on DESCRIPTIVE STATISTICS ⇒ CROSSTABS.

2. The CROSSTABS dialog box will then open.

3. From the variable selection box on the left, click on a variable you wish to designate as the Row variable. The values (codes) for the Row variable make up the rows of the crosstabs table. Click on the arrow (>) button for Row(s). Next, click on a different variable you wish to designate as the Column variable. The values (codes) for the Column variable make up the columns of the crosstabs table. Click on the arrow (>) button for Column(s).

4. You can specify more than one variable in the Row(s) and/or Column(s). A cross table will be generated for each combination of Row and Column variables.
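At its core, a crosstabs table of counts records how many cases fall into each (row value, column value) combination. The sketch below builds such a table from two lists of codes. It reproduces only the count cells, not the percentages or chi-square statistics SPSS can add. The function name and data are hypothetical.

```python
from collections import Counter


def crosstab(row_values, col_values):
    """Count each (row, column) combination, like a basic CROSSTABS table of counts."""
    if len(row_values) != len(col_values):
        raise ValueError("row and column variables must have the same number of cases")
    counts = Counter(zip(row_values, col_values))
    row_levels = sorted(set(row_values))
    col_levels = sorted(set(col_values))
    # Counter returns 0 for combinations that never occur
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}


# Hypothetical cases: gender (row variable) by treatment group (column variable)
gender = ["F", "M", "F", "F", "M"]
group = ["A", "A", "B", "A", "B"]
table = crosstab(gender, group)
for row_level, cells in table.items():
    print(row_level, cells)
```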

Limitations: SPSS users have less control over data manipulation and statistical output than users of other statistical packages such as SAS, Stata, etc.

SPSS is a good first statistical package for quantitative research in social science, because it is easy to use and can be a good starting point for learning more advanced statistical packages.

Questions

Parameter vs StatisticsParameter vs Statistics

Statistical Issue To describe the distribution of a

population through census or making inference on

population distribution population parameter using sample

distribution statistic

Eg sample mean is an estimate of the population mean

Hypothesis TestingHypothesis Testing

Null hypothesis and Alternative hypothesis

Real Situation Ho is true Ho is false Reject Ho Type I

error (α) Correct Decision (1-)

D e c i s i o n

Accept Ho Correct Decision (1- α)

Type II Error ()

ElementsSteps in hypothesisElementsSteps in hypothesis

Hypothesis testing steps

ndash 1 Null (Ho) and alternative (H1)hypothesis specification

ndash 2 Selection of significance level (alpha) - 005 or 001

ndash 3 Calculating the test statistic ndasheg t F Chi-square

ndash 4 Calculating the probability value (p-value) or confidence

Interval

ndash 5 Describing the result and statistic in an understandable

way

Point estimator

Sample distribution

Parameter

Population distribution

bull A point estimate draws inference about a population by estimating the value of an unknown parameter using a single value or a point

Point Estimation

Interval estimatorSample distribution

bull An interval estimator draws inferences about a population by

estimating the value of an unknown parameter using an interval

Population distribution Parameter

Interval Estimation

P-Value versus the Confidence IntervalP-Value versus the Confidence Interval

Two main ways to assess study precision and the role of

chance in a study

ndash P value measures ( in probability) the evidence against

the null hypothesis

ndash An interval within which the value of the parameter lies

with a specified probability

ndash Eg 95 CI implies that if one repeats a study 100

times the true measure of association will lie inside the

CI in 95 out of 100 measures

Procedures for sample size Procedures for sample size calculationcalculation

Selection of primary variables of interest and formulation

of hypotheses

Information of standard deviation ( if numeric) or

proportion (if categorical)

A tolerance level of significance ()

Selection of reasonable test statistic

Power or Confidence level

A scientifically or clinically meaning effect difference

Brief concept of Statistical SoftwareBrief concept of Statistical Software

There are many software packages to perform statistical

analysis and visualization of data Some of them are-

ndash System for Statistical Analysis (SAS) S-plus R Matlab Minitab

BMDP STATA SPSS StatXact Statistica LISREL JMP

GLIM HIL MS Excel etc We will discuss MS Excel and SPSS in

brief

useful websites-

httpwwwR-projectorg (a free but powerful statistical software)

Microsoft ExcelMicrosoft Excel

A Spreadsheet Application It features calculation graphing tools pivot tables and a macro programming language called VBA (Visual Basic for Applications)

There are many versions of MS-Excel Excel XP Excel 2003 Excel 2007 are capable of performing a number of statistical analyses

Starting MS Excel Double click on the Microsoft Excel icon on the desktop or Click on Start --gt Programs --gt Microsoft Excel

Worksheet Consists of a multiple grid of cells with numbered rows down the page and alphabetically-tilted columns across the page Each cell is referenced by its coordinates For example A3 is used to refer to the cell in column A and row 3 B10B20 is used to refer to the range of cells in column B and rows 10 through 20

Microsoft ExcelMicrosoft Excel

Creating Formulas 1 Click the cell that you want to enter the formula 2 Type = (an equal sign) 3 Click the Function Button 4 Select the formula you want and step through the on-screen instructions

xf

Opening a document File Open (From a existing workbook) Change the directory area or drive to look for file in other locations

Creating a new workbook FileNewBlank Document

Saving a File FileSave

Selecting more than one cell Click on a cell eg A1) then hold the Shift key and click on another (eg D4) to select cells between and A1 and D4 or Click on a cell and drag the mouse across the desired range

Microsoft ExcelMicrosoft Excel

Entering Date and Time Dates are stored as MMDDYYYY No need to enter in that format For example Excel will recognize Jan 9 or jan-9 as 192007 and Jan 9 1999 as 191999 To enter todayrsquos date press Ctrl and together Use a or p to indicate am or pm For example 830 p is interpreted as 830 pm To enter current time press Ctrl and together

Copy and Paste all cells in a Sheet Ctrl+A for selecting Ctrl +C for copying and Ctrl+V for Pasting

Sorting Data Sort Sort By hellip

Descriptive Statistics and other Statistical methods ToolsData Analysis Statistical method If Data Analysis is not available then click on Tools Add-Ins and then select Analysis ToolPack and Analysis toolPack-Vba

Microsoft ExcelMicrosoft Excel

Statistical and Mathematical Function Start with lsquo=lsquo sign and then select function from function wizard xf

Inserting a Chart Click on Chart Wizard (or InsertChart) select chart give Input data range Update the Chart options and Select output range Worksheet

Importing Data in Excel File open FileType Click on File Choose Option ( DelimitedFixed Width) Choose Options (Tab Semicolon Comma Space Other) Finish

Limitations Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extremecases

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

A general purpose statistical package SPSS is widely used in the social sciences particularly in sociology and psychology

SPSS can import data from almost any type of file to generate tabulated reports plots of distributions and trends descriptive statistics and complex statistical analyzes

Starting SPSS Double Click on SPSS on desktop or ProgramSPSS

Opening a SPSS file FileOpen

bull Data Editor

Various pull-down menus appear at the top of the Data Editor window These pull-down menus are at the heart of using SPSSWIN The Data Editor menu items (with some of the uses of the menu) are

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

FILE used to open and save data files

EDIT used to copy and paste data values used to find data in a file insert variables and cases OPTIONS allows the user to set general preferences as well as the setup for the Navigator Charts etc

VIEW user can change toolbars value labels can be seen in cells instead of data values

DATA select sort or weight cases merge files

MENUS AND TOOLBARS

TRANSFORM Compute new variables recode variables etc

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

ANALYZE perform various statistical procedures

GRAPHS create bar and pie charts etc

UTILITIES add comments to accompany data file (and other advanced features)

ADD-ons these are features not currently installed (advanced statistical procedures)

WINDOW switch between data syntax and navigator windows

HELP to access SPSSWIN Help information

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Navigator (Output) Menus

When statistical procedures are run or charts are created the output will appear in the Navigator window The Navigator window contains many of the pull-down menus found in the Data Editor window Some of the important menus in the Navigator window include

INSERT used to insert page breaks titles charts etc

FORMAT for changing the alignment of a particular portion of the output

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Formatting Toolbar

When a table has been created by a statistical procedure the user can edit the table to create a desired look or adddelete information Beginning with version 140 the user has a choice of editing the table in the Output or opening it in a separate Pivot Table (DEFINE) window Various pulldown menus are activated when the user double clicks on the table These include

EDIT undo and redo a pivot select a table or table body (eg to change the font)

INSERT used to insert titles captions and footnotes

PIVOT used to perform a pivot of the row and column variables

FORMAT various modifications can be made to tables and cells

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing tab-delimited dataIn SPSSWIN click on FILE OPEN DATA Look in the appropriate location for rArr rArrthe text file Then select ldquoTextrdquo from ldquoFiles of typerdquo Click on the file name and then click on ldquoOpenrdquo You will see the Text Import Wizard ndash step 1 of 6 dialog box

You will now have an SPSS data file containing the former tab-delimited data You simply need to add variable and value labels and define missing values

Exporting Data to Excelclick on FILE SAVE AS Click on the File Name for the file to be exported For rArrthe ldquoSave as Typerdquo select from the pull-down menu Excel (xls) You will notice the checkbox for ldquowrite variable names to spreadsheetrdquo Leave this checked as you will want the variable names to be in the first row of each column in the Excel spreadsheet Finally click on Save

• Additional menus

CHART EDITOR: used to edit a graph

SYNTAX EDITOR: used to edit the text in a syntax window

• Show or hide a toolbar

Click on VIEW ⇒ TOOLBARS, then check a toolbar to show it or uncheck it to hide it.

• Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location.

• Customize a toolbar

Click on VIEW ⇒ TOOLBARS ⇒ CUSTOMIZE.

Importing data from an EXCEL spreadsheet

Data from an Excel spreadsheet can be imported into SPSSWIN as follows:

1. In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. The OPEN DATA FILE dialog box will appear.

2. Locate the file of interest. Use the Look In pull-down list to identify the folder containing the Excel file of interest.

3. From the FILE TYPE pull-down menu, select EXCEL (*.xls).

4. Click on the file name of interest and click on OPEN, or simply double-click on the file name.

5. Keep the box checked that reads "Read variable names from the first row of data". This presumes that the first row of the Excel data file contains variable names. [If the data resided in a different worksheet in the Excel file, this would need to be entered.]

6. Click on OK. The Excel data file will now appear in the SPSSWIN Data Editor.

7. The former EXCEL spreadsheet can now be saved as an SPSS file (FILE ⇒ SAVE AS) and is ready to be used in analyses. Typically, you would label variables and values and define missing values.

Importing an Access table

SPSSWIN does not offer a direct import for Access tables. Therefore, we must follow these steps:

1. Open the Access file.
2. Open the data table.
3. Save the data as an Excel file.
4. Follow the steps outlined above for importing data from an Excel spreadsheet into SPSSWIN.

Importing Text Files into SPSSWIN

Text data points are typically separated (or "delimited") by tabs or commas. Sometimes they are in fixed format.
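The two text layouts just described can be sketched with Python's standard `csv` module. This is not how the SPSSWIN Text Import Wizard works internally; it only illustrates what "delimited" and "fixed format" mean. The sample records and the column positions in the hypothetical `FIXED_LAYOUT` are made up for illustration.

```python
import csv
import io

# Hypothetical tab-delimited file: the first row holds the variable names.
TAB_TEXT = "id\tage\tgender\n1\t71\tF\n2\t127\tM\n3\t65\tF\n"

# Hypothetical fixed-format layout: variable -> (start, end) character positions.
FIXED_LAYOUT = {"id": (0, 3), "age": (3, 6), "gender": (6, 7)}
FIXED_LINES = ["001 71F", "002127M"]


def read_delimited(text, delimiter="\t"):
    """Parse delimited text into one dict per case, keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


def read_fixed_width(lines, layout):
    """Parse fixed-format records by slicing each line at the given positions."""
    return [
        {name: line[start:end].strip() for name, (start, end) in layout.items()}
        for line in lines
    ]


tab_rows = read_delimited(TAB_TEXT)
fixed_rows = read_fixed_width(FIXED_LINES, FIXED_LAYOUT)
print(tab_rows[0])    # {'id': '1', 'age': '71', 'gender': 'F'}
print(fixed_rows[1])  # {'id': '002', 'age': '127', 'gender': 'M'}
```

For a comma-delimited file, the same `read_delimited` call works with `delimiter=","`. All values come back as text, which mirrors the need to define variable types and missing values after import.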

Running the FREQUENCIES procedure

1. Open the data file of interest (from the menus, click on FILE ⇒ OPEN ⇒ DATA).

2. From the menus, click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ FREQUENCIES.

3. The FREQUENCIES dialog box will appear. The left-hand box (source variable list) lists all the variables that have been defined in the data file. The first step is to identify the variable(s) for which you want to run a frequency analysis. Click on a variable name, then click the [ > ] pushbutton. The variable name will now appear in the VARIABLE[S] box (selected variable list). Repeat these steps for each variable of interest.

4. If all that is being requested is a frequency table showing counts and percentages (raw, adjusted, and cumulative), click on OK.
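To make the three kinds of percentages concrete, here is a minimal Python sketch of what such a frequency table reports. It assumes SPSS's usual column layout (Frequency, Percent, Valid Percent, Cumulative Percent), reading "raw" as Percent (all cases in the denominator), "adjusted" as Valid Percent (missing values excluded), and cumulative as the running total of valid percentages. The severity codes and the use of `None` for a missing value are hypothetical.

```python
from collections import Counter


def frequency_table(values, missing=(None,)):
    """Return (rows, n_missing); each row is
    (value, count, percent, valid_percent, cumulative_percent)."""
    total = len(values)
    valid = [v for v in values if v not in missing]
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):                    # one row per observed value
        n = counts[value]
        percent = 100.0 * n / total                 # raw: missing cases included
        valid_percent = 100.0 * n / len(valid)      # adjusted: missing excluded
        cumulative += valid_percent                 # running total of valid %
        rows.append((value, n, percent, valid_percent, cumulative))
    return rows, total - len(valid)


# Hypothetical asthma-severity codes; None marks a missing value.
severity = ["mild", "severe", "mild", "moderate", None,
            "mild", "moderate", "severe", None, "mild"]

rows, n_missing = frequency_table(severity)
print(f"{'Value':<10}{'Freq':>6}{'Pct':>8}{'Valid':>8}{'Cum':>8}")
for value, n, pct, valid_pct, cum in rows:
    print(f"{value:<10}{n:>6}{pct:>8.1f}{valid_pct:>8.1f}{cum:>8.1f}")
print(f"Missing: {n_missing}")
```

Rows are ordered by value (alphabetically here, because the codes are strings); with numeric codes they would be ordered numerically.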

Requesting STATISTICS

Descriptive and summary STATISTICS can be requested for numeric variables. To request statistics:

1. From the FREQUENCIES dialog box, click on the STATISTICS pushbutton.
2. This will bring up the FREQUENCIES: STATISTICS dialog box.
3. The STATISTICS dialog box offers the user a variety of choices.

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ DESCRIPTIVES). The procedure offers many of the same statistics as the FREQUENCIES procedure, but without generating frequency tables.

Requesting CHARTS

One can request a chart (graph) for a variable or variables included in a FREQUENCIES procedure:

1. In the FREQUENCIES dialog box, click on CHARTS.
2. The FREQUENCIES: CHARTS dialog box will appear. Choose the intended chart (e.g., bar chart, pie chart, or histogram).

Pasting charts into Word

1. Click on the chart.
2. Click on the pull-down menu EDIT ⇒ COPY OBJECTS.
3. Go to the Word document in which the chart is to be embedded. Click on EDIT ⇒ PASTE SPECIAL.
4. Select Formatted Text (RTF) and then click on OK.
5. Enlarge the graph to the desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible, click once on the graph).

BASIC STATISTICAL PROCEDURES: CROSSTABS

1. From the ANALYZE pull-down menu, click on DESCRIPTIVE STATISTICS ⇒ CROSSTABS.

2. The CROSSTABS dialog box will then open.

3. From the variable selection box on the left, click on a variable you wish to designate as the Row variable. The values (codes) of the Row variable make up the rows of the crosstabs table. Click on the arrow (>) button for Row(s). Next, click on a different variable you wish to designate as the Column variable. The values (codes) of the Column variable make up the columns of the crosstabs table. Click on the arrow (>) button for Column(s).

4. You can specify more than one variable in the Row(s) and/or Column(s). A crosstable will be generated for each combination of Row and Column variables.
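The counting that CROSSTABS performs for one Row/Column pair can be sketched in a few lines of Python. The `gender` and `group` variables are hypothetical, and the sketch produces only cell counts with row totals, not the percentages or tests that SPSS can add to the table.

```python
from collections import Counter


def crosstab(row_var, col_var):
    """Count every (row value, column value) combination, like a two-way table."""
    counts = Counter(zip(row_var, col_var))
    row_levels = sorted(set(row_var))
    col_levels = sorted(set(col_var))
    table = {r: {c: counts.get((r, c), 0) for c in col_levels} for r in row_levels}
    return table, col_levels


# Hypothetical data: one entry per subject.
gender = ["F", "M", "F", "F", "M", "M", "F", "M"]
group = ["A", "A", "B", "A", "B", "B", "A", "A"]

table, col_levels = crosstab(gender, group)
print("      " + "".join(f"{c:>5}" for c in col_levels) + "  Total")
for r, cells in table.items():
    print(f"{r:<6}" + "".join(f"{cells[c]:>5}" for c in col_levels)
          + f"{sum(cells.values()):>7}")
```

With several Row or Column variables, the equivalent here would be one `crosstab` call per (row, column) pair.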

Limitations: SPSS users have less control over data manipulation and statistical output than users of other statistical packages such as SAS, Stata, etc.

SPSS is a good first statistical package for quantitative research in the social sciences because it is easy to use and can be a good starting point for learning more advanced statistical packages.

Questions

Research Design and Methodology

Research is the process of investigating scientific questions.

Steps in the research process:

– Defining the problem and conceptualizing the study
– Designing and conducting the study
  – Collecting data
  – Analyzing data
– Making sense of data

Defining the problem and conceptualizing the study

Review relevant previous research, and:

– Identify the problem(s) and the causes of the problem(s)
– State the outcomes of previous research on the problem
– State clearly what you are planning for the proposed research
– Formulate careful research questions and hypotheses
– Identify the variables needed to achieve the objective(s)
– Identify scales to measure the variables
– Assess the feasibility of the study objectives, i.e., assess whether what you want to measure is measurable
– Identify the target population(s) and define the eligibility criteria

Research Question

Examples:

– Does smoking increase the risk of renal carcinoma?
– Is an oral inhaler effective in controlling asthma among children?

Hypothesis statement

Examples:

– Smoking increases the risk of renal carcinoma in pediatric patients.
– An oral inhaler is effective in controlling asthma among children.

Study Objective

The purpose or aim of the study.

Examples:

– To assess the risk of renal carcinoma associated with smoking among pediatric patients (primary objective)
– To determine race and gender disparities in the prevalence of smoking (secondary objective)

Study variable

A measurement that varies from one individual to another.

Examples: age, gender, BMI, systolic blood pressure, hematocrit

Outcome vs. independent variable

Response/outcome variable: measures the outcome of the study treatment or experimental manipulation.

Examples:
– Renal carcinoma incidence among children
– Asthma control in pediatric asthmatic patients

Independent/predictor/explanatory variable: explains or influences changes in a response variable.

Examples:
– Smoking
– Oral inhaler

Scale of variable/output measurement

Nominal – Categorical variables without any order or ranking sequence, such as names or classes (e.g., gender). Binary: two categories; multinomial: more than two categories.

Ordinal – Variables with an inherent rank or order, e.g., mild, moderate, severe. Values can be compared for equality or for greater or less, but not for how much greater or less.

Interval – Values of the variable are ordered as in Ordinal, and in addition, differences between values are meaningful; however, the scale has no absolute zero. Calendar dates and temperatures on the Fahrenheit scale are examples. Addition and subtraction, but not multiplication and division, are meaningful operations.

Ratio – Variables with all the properties of Interval plus an absolute, non-arbitrary zero point, e.g., age, weight, temperature (Kelvin). Addition, subtraction, multiplication, and division are all meaningful operations.
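A quick numerical check of the Interval/Ratio distinction, using the temperature examples above: the ratio of two Fahrenheit readings depends on where the arbitrary zero sits, while the ratio on the Kelvin scale, which has an absolute zero, is physically meaningful. The readings of 40 °F and 80 °F are hypothetical.

```python
def fahrenheit_to_kelvin(f):
    """Convert a Fahrenheit reading (interval scale) to Kelvin (ratio scale)."""
    return (f - 32.0) * 5.0 / 9.0 + 273.15


low_f, high_f = 40.0, 80.0
low_k, high_k = fahrenheit_to_kelvin(low_f), fahrenheit_to_kelvin(high_f)

# Ratio on the Fahrenheit scale: 2.0, but 80 °F is not "twice as hot" as 40 °F.
ratio_f = high_f / low_f
# Ratio on the Kelvin scale: about 1.08, a physically meaningful comparison.
ratio_k = high_k / low_k
# Differences are meaningful on both scales: 40 °F corresponds to about 22.2 K.
diff_k = high_k - low_k

print(f"F ratio: {ratio_f:.2f}, K ratio: {ratio_k:.3f}, K difference: {diff_k:.1f}")
```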

Measurement bias

Bias arises from measurement error.

Example:

– Suppose that, for asthma remission, the possible outcomes are complete remission, partial remission, and no remission. If we measure the outcome variable as only remission and non-remission, we commit an error by putting partial remission in the non-remission group (type II error).

Designing the study

A study design is a careful advance plan of data collection and of the analytic approach needed to answer the research question under investigation in a scientific way.

The basic elements of a study design:

– Selecting an appropriate sample size for a specified level of power and level of significance
– Selecting methods of sampling, data collection, and analysis appropriate to the study's objectives

Clinical/Experimental versus Observational design

(Source: The Lancet 2002, Vol 359)

The choice of a design depends mainly on the research question(s) and the type of research conducted (experimental or observational).

Experimental/Interventional: The investigator controls the experimental environment in which the hypothesis is tested. The randomized double-blind clinical trial is the gold standard.

Non-experimental/Observational: The population is observed without any interference by the investigator. For example, in a study of the effect of smoking, an investigator cannot assign smoking to the subjects. Instead, the investigator can study the effect by choosing a control group and examining the relationship between cause and effect. Some examples are:

– Cross-sectional study
– Cohort study
– Case-control study

Randomized control design

Random allocation of subjects to different interventions (or treatments) for the purpose of comparing/determining the efficacy of the study treatment(s).

– E.g., a placebo or a standard medication (active control) can be used as the control.
– Patients with cancer or a painful disease cannot receive a placebo as a control.
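The text does not say how the random allocation is generated. One common scheme is permuted-block randomization, sketched below in Python as an assumption: within every block of `block_size` subjects each arm appears equally often, so group sizes stay balanced throughout enrollment. The arm names, block size, and seed are hypothetical.

```python
import random


def block_randomize(n_subjects, arms=("treatment", "control"), block_size=4, seed=None):
    """Permuted-block randomization: every block holds each arm equally often."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)             # seeded so the list can be reproduced
    allocation = []
    while len(allocation) < n_subjects:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)                # random order within the block
        allocation.extend(block)
    return allocation[:n_subjects]


schedule = block_randomize(60, seed=2008)
print(schedule[:8])
print(schedule.count("treatment"), schedule.count("control"))
```

Simple randomization (an independent coin flip per subject) is also valid, but in small trials it can leave the arms noticeably unequal in size; blocking avoids that.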

Blinding: reduces bias due to preconceptions or personal bias.

– Open trial: the investigator and the subject know the full details of the treatment
– Single-blind trial: the investigator knows the treatment but the subject does not
– Double-blind trial: neither the investigator nor the subject knows the treatment
– Triple-blind trial: the sponsor, the investigator, and the subject do not know the treatment

Distribution of a variable

Distribution (of a variable): tells us what values the variable takes and how often it takes these values. E.g., the age distribution of 26 pediatric patients aged 1 to 6 at AIDHC is as follows:

Age:        1   2   3   4   5   6
Frequency:  5   3   7   5   4   2
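The frequency table above fully determines the distribution, so summaries can be computed from it directly. This short Python sketch derives the relative and cumulative frequencies, the mode, and the mean age from the six (age, frequency) pairs.

```python
from itertools import accumulate

ages = [1, 2, 3, 4, 5, 6]
freqs = [5, 3, 7, 5, 4, 2]            # frequencies from the table above

n = sum(freqs)                        # number of patients
relative = [f / n for f in freqs]     # proportion of patients at each age
cumulative = list(accumulate(freqs))  # patients at or below each age
mode_age = ages[freqs.index(max(freqs))]
mean_age = sum(a * f for a, f in zip(ages, freqs)) / n

print(n, cumulative, mode_age, round(mean_age, 2))
# 26 [5, 8, 15, 20, 24, 26] 3 3.23
```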

Statistics

The science of data collection, summarization, analysis, and interpretation.

Descriptive versus inferential statistics:

– Descriptive statistics: data description (summarization), such as center, variability, and shape
– Inferential statistics: drawing conclusions beyond the sample studied, allowing for prediction

A Taxonomy of Statistics

How does statistics help us?

[Figure: Age Distribution; histogram of the number of subjects by age in months (bins 40, 60, 80, 100, 120, 140, More)]

Ages (in months) of the 60 patients in our data set 1 are: 71, 127, 65, 82, 140, 53, 114, 56, 84, 65, 67, 134, 64, … 91, 51

Mean: 90.41666667
Standard Error: 3.902649518
Median: 84
Mode: 84
Standard Deviation: 30.22979318
Sample Variance: 913.8403955
Kurtosis: -1.183899591
Skewness: 0.389872725
Range: 95
Minimum: 48
Maximum: 143
Sum: 5425
Count: 60

Simply looking at the raw data does not give an informative account of it; statistics, however, provide quick insight into the data through graphical and numerical tools.

[Figure: Distribution of age; axis: age (months), 60 to 140]

Statistical Description of Data

Statistics describes a numeric set of data by its:

– Center (mean, median, mode, etc.)
– Variability (standard deviation, range, etc.)
– Shape (skewness, kurtosis, etc.)

Statistics describes a categorical set of data by the frequency, percentage, or proportion of each category.
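The numeric summaries listed above can be reproduced in a few lines of Python. The sketch below mirrors the rows of the Excel-style descriptive statistics output shown earlier; the skewness and kurtosis lines use the adjusted sample formulas documented for Excel's SKEW() and KURT() functions, which is an assumption about how that output was produced. Because the full list of 60 ages is elided in the source, the example runs on only the first 13 ages listed, so its numbers will not match the summary for the 60 patients.

```python
import math
import statistics


def describe(data):
    """Excel-style descriptive summary of a numeric sample (needs n >= 4)."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)                  # sample SD, n - 1 denominator
    z = [(x - mean) / sd for x in data]          # standardized values
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))   # excess kurtosis
    return {
        "Mean": mean,
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(data),
        "Mode": statistics.mode(data),
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(data),
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": sum(data),
        "Count": n,
    }


# Only the first 13 of the 60 ages listed earlier (the rest are elided).
first_ages = [71, 127, 65, 82, 140, 53, 114, 56, 84, 65, 67, 134, 64]
for name, value in describe(first_ages).items():
    print(f"{name:<20}{value:>12.4f}")
```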

Statistical Inference

Statistical inference is the process by which we acquire information about populations from samples.

Two types of estimates for making inferences:

– Point estimation
– Interval estimation

[Figure: Statistical inference, from the sample to the population]

Population and sample

Population: the entire collection of individuals or measurements about which information is desired.

Sample: a subset of the population selected for study.

– The primary objective is to select a subset whose center, spread, and shape are as close as possible to those of the population.
– Methods of sampling: simple random sampling, stratified sampling, systematic sampling, cluster sampling, multistage sampling, area sampling, quota sampling, etc.
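Three of the sampling methods listed above are easy to sketch with Python's `random` module. The population of 60 subject IDs and the hypothetical `stratum_of` rule that assigns IDs to strata A, B, and C are made up for illustration; cluster, multistage, area, and quota sampling need more design information than the text gives and are not shown.

```python
import random


def simple_random_sample(units, n, rng):
    """Every subset of size n has the same chance of being chosen."""
    return rng.sample(units, n)


def systematic_sample(units, n, rng):
    """Random start, then every k-th unit, where k = len(units) // n."""
    k = len(units) // n
    start = rng.randrange(k)
    return units[start::k][:n]


def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Split units into strata, then draw a simple random sample within each."""
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample


def stratum_of(subject_id):
    """Hypothetical rule: IDs 1-20 -> A, 21-40 -> B, 41-60 -> C."""
    return "ABC"[(subject_id - 1) // 20]


rng = random.Random(7)
subjects = list(range(1, 61))
srs = simple_random_sample(subjects, 10, rng)
sys_sample = systematic_sample(subjects, 10, rng)
strat = stratified_sample(subjects, stratum_of, 4, rng)
print(sorted(srs), sys_sample, sorted(strat), sep="\n")
```

The stratified version guarantees that each stratum is represented in a fixed proportion, which is one way to bring the sample's composition closer to that of the population.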

Parameter vs. Statistic

Parameter

– Any statistical characteristic of a population
– Population mean, population median, and population standard deviation are examples of parameters
– A parameter describes the distribution of a population
– Parameters are fixed and usually unknown

Statistic: any statistical characteristic of a sample

– Sample mean, sample median, and sample standard deviation are some examples of statistics
– A statistic describes the distribution of a sample
– The value of a statistic is known and varies from sample to sample
– Statistics are used for making inferences about parameters

Statistical issue: describing the distribution of a population through a census, or making inferences about the population distribution (population parameters) using the sample distribution (statistics).

E.g., the sample mean is an estimate of the population mean.
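The contrast between a fixed parameter and a varying statistic shows up clearly in a small simulation. In this sketch, which assumes a hypothetical finite population of 1,000 ages in months, the population mean is computed once and never changes, while the sample mean differs from one random sample of 60 to the next.

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical population: 1,000 ages (in months) between 48 and 143.
population = [rng.randint(48, 143) for _ in range(1000)]

mu = statistics.mean(population)   # parameter: one fixed value

# Statistic: the mean of each random sample of 60 subjects.
sample_means = [statistics.mean(rng.sample(population, 60)) for _ in range(5)]

print(f"Population mean (parameter): {mu:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"Sample {i} mean (statistic): {m:.2f}  error: {m - mu:+.2f}")
```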

Hypothesis Testing

Null hypothesis (Ho) and alternative hypothesis (H1)

Decision versus real situation:

– Reject Ho when Ho is true: Type I error (α)
– Reject Ho when Ho is false: Correct decision (1 - β)
– Accept Ho when Ho is true: Correct decision (1 - α)
– Accept Ho when Ho is false: Type II error (β)

Elements/Steps in hypothesis testing

Hypothesis testing steps:

1. Specify the null (Ho) and alternative (H1) hypotheses
2. Select the significance level (alpha), e.g., 0.05 or 0.01
3. Calculate the test statistic, e.g., t, F, chi-square
4. Calculate the probability value (p-value) or confidence interval
5. Describe the result and the statistic in an understandable way
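The five steps can be traced in code. The text lists t, F, and chi-square statistics; this Python sketch swaps in a one-sample z-test instead, because the standard library's `statistics.NormalDist` covers the normal distribution but not the t distribution. It assumes the population standard deviation is known, and the numbers in the example call (hypothesized mean 80, sigma 30, n = 60, sample mean 90.4) are hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of Ho: mu = mu0 against H1: mu != mu0 (steps 1-5)."""
    # Step 2: alpha is the chosen significance level.
    se = sigma / math.sqrt(n)
    # Step 3: test statistic.
    z = (sample_mean - mu0) / se
    # Step 4: two-sided p-value and the matching (1 - alpha) confidence interval.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (sample_mean - z_crit * se, sample_mean + z_crit * se)
    # Step 5: state the decision in plain language.
    decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
    return z, p_value, ci, decision


z, p, ci, decision = one_sample_z_test(sample_mean=90.4, mu0=80, sigma=30, n=60)
print(f"z = {z:.2f}, p = {p:.4f}, 95% CI = ({ci[0]:.1f}, {ci[1]:.1f}): {decision}")
```

When the standard deviation must be estimated from a small sample, a t-test (for example SciPy's `scipy.stats.ttest_1samp`) would be the usual choice.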

Point Estimation

[Figure: a point estimator computed from the sample distribution estimates a parameter of the population distribution]

• A point estimate draws inferences about a population by estimating the value of an unknown parameter using a single value, or point.

Interval Estimation

[Figure: an interval estimator computed from the sample distribution estimates a parameter of the population distribution]

• An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.
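Both kinds of estimate can be computed from the summary of the 60 patients' ages reported earlier (mean 90.41666667 months, standard error 3.902649518). The point estimate is the sample mean itself; the interval below is a 95% interval based on the normal approximation, mean ± 1.96 × SE. The normal approximation is an assumption made for simplicity; a t-based interval with 59 degrees of freedom would be slightly wider.

```python
from statistics import NormalDist

mean = 90.41666667   # sample mean age in months (from the summary above)
se = 3.902649518     # standard error of the mean (from the summary above)

point_estimate = mean
z = NormalDist().inv_cdf(0.975)   # about 1.96 for a two-sided 95% interval
lower, upper = mean - z * se, mean + z * se

print(f"Point estimate: {point_estimate:.2f} months")
print(f"95% interval estimate: ({lower:.2f}, {upper:.2f}) months")
# Point estimate: 90.42 months
# 95% interval estimate: (82.77, 98.07) months
```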

P-Value versus the Confidence Interval

Two main ways to assess study precision and the role of chance in a study:

– P-value: measures (as a probability) the evidence against the null hypothesis; it is the probability, computed assuming Ho is true, of observing a result at least as extreme as the one obtained
– Confidence interval (CI): an interval computed from the sample and constructed so that, over repeated sampling, it contains the true value of the parameter with a specified probability
– E.g., a 95% CI implies that if one repeats a study 100 times and computes a CI each time, about 95 of the 100 intervals will contain the true measure of association
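The repeated-study interpretation of a 95% CI can be checked by simulation. This Python sketch assumes normally distributed data with a known standard deviation (the true mean of 100, SD of 15, and sample size of 30 are hypothetical), draws many samples, builds a 95% interval from each, and reports the fraction of intervals that contain the true mean; that fraction should come out close to 0.95.

```python
import math
import random
import statistics
from statistics import NormalDist


def ci_coverage(true_mean=100.0, sigma=15.0, n=30, reps=2000, level=0.95, seed=1):
    """Fraction of simulated studies whose CI contains the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)      # about 1.96 for a 95% interval
    half_width = z * sigma / math.sqrt(n)          # sigma treated as known
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        m = statistics.fmean(sample)
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / reps


print(f"Observed coverage: {ci_coverage():.3f}")
```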

Procedures for sample size calculation

– Selection of the primary variables of interest and formulation of hypotheses
– Information on the standard deviation (if numeric) or proportion (if categorical)
– A tolerated level of significance (α)
– Selection of a reasonable test statistic
– Power or confidence level
– A scientifically or clinically meaningful effect (difference)
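The ingredients above come together in a standard formula. For comparing two means with equal group sizes, a normal-approximation formula is n per group = 2 (z_{1-α/2} + z_{1-β})² σ² / Δ², where σ is the standard deviation, Δ the clinically meaningful difference, α the significance level, and 1 - β the power. The Python sketch below implements that formula; it is one common approach, not necessarily the one the authors use, and the σ = 10, Δ = 5 values in the example are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided comparison of two means
    (normal approximation, equal group sizes)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # significance level
    z_beta = NormalDist().inv_cdf(power)            # power = 1 - beta
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)                             # always round up


print(n_per_group(sigma=10, delta=5))              # 63
print(n_per_group(sigma=10, delta=5, power=0.90))  # 85
```

Because Δ enters squared in the denominator, halving the meaningful difference roughly quadruples the required sample size.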

Brief concept of Statistical Software

There are many software packages for statistical analysis and data visualization. Some of them are:

– Statistical Analysis System (SAS), S-PLUS, R, MATLAB, Minitab, BMDP, Stata, SPSS, StatXact, Statistica, LISREL, JMP, GLIM, HIL, MS Excel, etc.

We will discuss MS Excel and SPSS briefly.

Useful websites:

http://www.R-project.org (free but powerful statistical software)

Microsoft Excel

A spreadsheet application. It features calculation, graphing tools, pivot tables, and a macro programming language called VBA (Visual Basic for Applications).

There are many versions of MS Excel; Excel XP, Excel 2003, and Excel 2007 are capable of performing a number of statistical analyses.

Starting MS Excel: Double-click on the Microsoft Excel icon on the desktop, or click on Start --> Programs --> Microsoft Excel.

Worksheet: Consists of a grid of cells with numbered rows down the page and alphabetically titled columns across the page. Each cell is referenced by its coordinates. For example, A3 refers to the cell in column A and row 3, and B10:B20 refers to the range of cells in column B, rows 10 through 20.
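The A1 referencing scheme described above is mechanical enough to express in code. This Python sketch, whose function names are purely illustrative, converts a reference such as A3 into (column number, row number) and expands a range such as B10:B20 into its individual cells.

```python
import re


def column_number(letters):
    """Column letters to a 1-based index: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def parse_cell(ref):
    """Split an A1-style reference such as 'A3' into (column, row)."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", ref)
    if match is None:
        raise ValueError(f"not an A1-style reference: {ref!r}")
    return column_number(match.group(1)), int(match.group(2))


def expand_range(ref):
    """List every (column, row) cell in a rectangular range such as 'B10:B20'."""
    first, last = (parse_cell(part) for part in ref.split(":"))
    return [(col, row)
            for row in range(first[1], last[1] + 1)
            for col in range(first[0], last[0] + 1)]


print(parse_cell("A3"))              # (1, 3)
print(len(expand_range("B10:B20")))  # 11
```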

Creating Formulas:
1. Click the cell in which you want to enter the formula.
2. Type = (an equal sign).
3. Click the Function button (fx).
4. Select the formula you want and step through the on-screen instructions.

Opening a document: File ⇒ Open (for an existing workbook). Change the directory area or drive to look for files in other locations.

Creating a new workbook: File ⇒ New ⇒ Blank Document

Saving a file: File ⇒ Save

Selecting more than one cell: Click on a cell (e.g., A1), then hold the Shift key and click on another (e.g., D4) to select the cells between A1 and D4; or click on a cell and drag the mouse across the desired range.

Entering Date and Time: Excel stores dates internally as serial numbers and displays them in a date format such as MM/DD/YYYY; there is no need to type them in that format. For example, Excel will recognize Jan 9 or jan-9 as 1/9/2007 (it assumes the current year) and Jan 9 1999 as 1/9/1999. To enter today's date, press Ctrl and ; together. Use a or p to indicate AM or PM; for example, 8:30 p is interpreted as 8:30 PM. To enter the current time, press Ctrl, Shift, and ; together.
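Behind the displayed format, Excel stores a date as a serial day count and a time as a fraction of a day. The Python sketch below converts between the two, assuming the workbook uses Excel's default 1900 date system, in which serial 1 is January 1, 1900 and a nonexistent February 29, 1900 is counted as serial 60; from March 1, 1900 onward this makes the effective starting point December 30, 1899. The function names are illustrative.

```python
from datetime import date, timedelta

# Effective epoch for serials >= 61 in Excel's 1900 date system (assumption).
EPOCH = date(1899, 12, 30)


def serial_to_date(serial):
    """Excel serial day number -> calendar date (from 1900-03-01 onward)."""
    if serial < 61:
        raise ValueError("serials before 1900-03-01 are not handled here")
    return EPOCH + timedelta(days=serial)


def date_to_serial(d):
    """Calendar date (from 1900-03-01 onward) -> Excel serial day number."""
    return (d - EPOCH).days


def clock_to_fraction(hour, minute):
    """Time of day -> fraction of a day, e.g. 8:30 PM -> 20.5 / 24."""
    return (hour * 60 + minute) / (24 * 60)


print(date_to_serial(date(2007, 1, 9)))    # 39091
print(date_to_serial(date(1999, 1, 9)))    # 36169
print(serial_to_date(39091))               # 2007-01-09
print(round(clock_to_fraction(20, 30), 6)) # 0.854167
```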

Copy and paste all cells in a sheet: Ctrl+A for selecting, Ctrl+C for copying, and Ctrl+V for pasting.

Sorting data: Data ⇒ Sort ⇒ Sort by …

Descriptive statistics and other statistical methods: Tools ⇒ Data Analysis ⇒ [statistical method]. If Data Analysis is not available, click on Tools ⇒ Add-Ins and then select Analysis ToolPak and Analysis ToolPak - VBA.

Statistical and mathematical functions: Start with the '=' sign and then select a function from the function wizard (fx).

Inserting a chart: Click on Chart Wizard (or Insert ⇒ Chart), select a chart, give the input data range, update the chart options, and select the output range (worksheet).

Importing data into Excel: File ⇒ Open; select the file type and click on the file; choose an option (Delimited/Fixed Width); choose the delimiter options (Tab, Semicolon, Comma, Space, Other); click Finish.

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.
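One classic way rounding error creeps into statistical calculations is the one-pass "sum of squares" variance formula, which subtracts two nearly equal large numbers. The Python sketch below contrasts it with the two-pass formula on hypothetical data with a large common offset. It illustrates the general problem only; it is not a reproduction of Excel's internal algorithms.

```python
import statistics


def naive_variance(data):
    """One-pass formula (sum of x^2 - (sum of x)^2 / n) / (n - 1): cancellation-prone."""
    n = len(data)
    total = sum(data)
    total_sq = sum(x * x for x in data)
    return (total_sq - total * total / n) / (n - 1)


def two_pass_variance(data):
    """Subtract the mean first, then square: numerically much safer."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


# Hypothetical values 4, 7, 13, 16 shifted by one billion; true sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print("naive formula:    ", naive_variance(data))
print("two-pass formula: ", two_pass_variance(data))
print("statistics module:", statistics.variance(data))  # reference value
```

In double-precision arithmetic the naive formula can lose most of its significant digits on data like this, while the two-pass version recovers the exact answer.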

Statistical Package for the Social Sciences (SPSS)

A general-purpose statistical package, SPSS is widely used in the social sciences, particularly in sociology and psychology.

SPSS can import data from almost any type of file to generate tabulated reports, plots of distributions and trends, descriptive statistics, and complex statistical analyses.

Starting SPSS: Double-click on SPSS on the desktop, or Programs ⇒ SPSS.

Opening an SPSS file: File ⇒ Open

• Data Editor

Various pull-down menus appear at the top of the Data Editor window. These pull-down menus are at the heart of using SPSSWIN. The Data Editor menu items (with some of their uses) are:

FILE: used to open and save data files

EDIT: used to copy and paste data values, find data in a file, and insert variables and cases. OPTIONS allows the user to set general preferences as well as the setup for the Navigator, charts, etc.

VIEW: the user can change toolbars; value labels can be shown in cells instead of data values

DATA: select, sort, or weight cases; merge files

TRANSFORM: compute new variables, recode variables, etc.

ANALYZE: perform various statistical procedures

GRAPHS: create bar and pie charts, etc.

UTILITIES: add comments to accompany the data file (and other advanced features)

ADD-ONS: features not currently installed (advanced statistical procedures)

WINDOW: switch between the data, syntax, and Navigator windows

HELP: access SPSSWIN Help information


Defining the problem and Defining the problem and conceptualizing the studyconceptualizing the study

Review relevant previous research and identify-

ndash The problem (s) and causes of the problem (s)

ndash State outcomes of previous research on the problem

ndash State clearly what you are planning for the proposed research

ndash Form careful research questions and hypotheses

ndash Identify variables needed to achieve the objective (s)

Defining the problem and Defining the problem and conceptualizing the studyconceptualizing the study

Review relevanthellipidentify contd

ndash Identify scales to measure the variables

ndash Assess the feasibility of study objectives ie assess if it

is measurable what you want to measure

ndash Identify the target populations and define the eligibility

criteria

Research QuestionResearch Question

Example -

ndash Does smoking increase the risk of renal carcinoma

ndash Is oral inhaler effective in controlling asthma among

children

Hypothesis statementHypothesis statement

Example -ndash Smoking increases the risk of renal carcinoma in

pediatric patient

ndash Oral inhaler is effective in controlling asthma among

children

Study ObjectiveStudy Objective

The purpose or aim of the study

Example-

ndash To assess the risk of renal carcinoma associated with

smoking among pediatric patients (primary objective)

ndash To determine the race and gender disparities in the

prevalence of smoking (secondary objective)

Study variable

Refers to a measurement that varies from one individual to another

Examples: age, gender, BMI, systolic blood pressure, hematocrit

Outcome vs independent variable

Response/outcome variable: Measures the outcome of the study treatment or experimental manipulation

Examples:

– Renal carcinoma incidence among children

– Asthma control in pediatric asthmatic patients

Independent/predictor/explanatory variable: Explains or influences changes in a response variable

Examples:

– Smoking

– Oral inhaler

Scale of variable/output measurement

Nominal - Categorical variables without any order or ranking sequence, such as names or classes (e.g., gender).

Binary: two categories; multinomial: more than two categories.

Ordinal - Variables with an inherent rank or order, e.g., mild, moderate, severe. Values can be compared for equality, or as greater or less, but not for how much greater or less.

Interval - Values of the variable are ordered as in Ordinal, and additionally, differences between values are meaningful; however, the scale is not absolutely anchored. Calendar dates and temperatures on the Fahrenheit scale are examples. Addition and subtraction, but not multiplication and division, are meaningful operations.

Ratio - Variables with all the properties of Interval plus an absolute, non-arbitrary zero point, e.g., age, weight, temperature (Kelvin). Addition, subtraction, multiplication, and division are all meaningful operations.
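The Fahrenheit versus Kelvin contrast can be made concrete with a little arithmetic. The Python sketch below uses two illustrative temperatures (40 °F and 80 °F, chosen only for the example) to show that a ratio computed on an interval scale is misleading, while the same ratio on a ratio scale is meaningful, and that differences carry over between the two scales.

```python
def fahrenheit_to_kelvin(f):
    """Convert a Fahrenheit temperature (interval scale) to Kelvin (ratio scale)."""
    return (f - 32) * 5 / 9 + 273.15


cool_f, warm_f = 40.0, 80.0  # illustrative temperatures, not from the course data

# Ratio on the Fahrenheit scale: looks like "twice as hot", but 0 F is an arbitrary zero.
naive_ratio = warm_f / cool_f
# Ratio on the Kelvin scale, which has a true zero: the meaningful comparison.
kelvin_ratio = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)

# Differences carry over between the scales (up to the 5/9 unit factor),
# which is why addition and subtraction are meaningful on an interval scale.
diff_f = warm_f - cool_f
diff_k = fahrenheit_to_kelvin(warm_f) - fahrenheit_to_kelvin(cool_f)

print(f"Fahrenheit ratio: {naive_ratio:.2f}")          # 2.00
print(f"Kelvin ratio:     {kelvin_ratio:.3f}")         # 1.080
print(f"Difference: {diff_f:.0f} F = {diff_k:.3f} K")  # 40 F = 22.222 K
```

80 °F is therefore only about 8% "hotter" than 40 °F in the physical sense, even though the Fahrenheit numbers differ by a factor of 2.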

Measurement bias

Measurement bias arises from measurement error.

Example:

– Suppose that, for asthma remission, the possible outcomes are complete remission, partial remission, and no remission. If we measure the outcome variable only as remission versus non-remission, we commit an error by putting partial remission in the non-remission group (misclassification).

Designing the study

A study design is a careful advance plan of data collection and of the analytic approach needed to answer the research question under investigation in a scientific way.

The basic elements of a study design:

– Selecting an appropriate sample size for a specified level of power and level of significance

– Selecting methods of sampling, data collection, and analysis appropriate to the study's objectives

Clinical/Experimental versus Observational design

[Figure; source: The Lancet, 2002, Vol. 359]

The choice of a design mainly depends on the research question(s) and the type of research conducted (experimental or observational).

Experimental/Interventional: The investigator controls the experimental environment in which the hypothesis is tested. The randomized double-blind clinical trial is the gold standard.

Non-experimental/Observational: The population is observed without any interference by the investigator. For example, in a study of the effect of smoking, it is impossible for an investigator to assign smoking to the subjects. Instead, the investigator can study the effect by choosing a control group and examining the cause-effect relationship. Some examples are:

– Cross-sectional study

– Cohort study

– Case-control study

Randomized control design

Random allocation of subjects to different interventions (or treatments) for the purpose of comparing/determining the efficacy of the study treatment(s)

– E.g., a placebo or a standard medication (active control) can be used as a control

– Patients with cancer or a painful disease cannot receive a placebo as a control
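Random allocation itself is easy to sketch. The minimal Python example below assumes two arms and equal group sizes (a balanced list of labels that is randomly shuffled); the subject IDs, arm names, and seed are hypothetical. Real trials typically prepare a concealed allocation list in advance, often with blocking or stratification, which this sketch does not attempt.

```python
import random


def allocate(subject_ids, arms=("treatment", "control"), seed=None):
    """Randomly allocate subjects to arms in (near-)equal numbers.

    A balanced list of arm labels is shuffled and paired with the subjects, so the
    assignment is random while group sizes differ by at most 1.
    """
    rng = random.Random(seed)
    labels = [arms[i % len(arms)] for i in range(len(subject_ids))]
    rng.shuffle(labels)
    return dict(zip(subject_ids, labels))


subjects = [f"S{i:02d}" for i in range(1, 61)]  # 60 hypothetical subject IDs
allocation = allocate(subjects, seed=2008)      # fixed seed so the list can be reproduced

group_sizes = {arm: list(allocation.values()).count(arm) for arm in ("treatment", "control")}
print(group_sizes)                              # {'treatment': 30, 'control': 30}
print(allocation["S01"], allocation["S02"])     # arm for the first two subjects
```

Fixing the seed makes the allocation reproducible for auditing; leaving it as None gives a fresh allocation each run.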

Blinding: Reduces the bias due to preconceptions or personal bias

– Open trial: Investigator and subject know the full details of the treatment

– Single-blind trial: Investigator knows about the treatment but the subject does not

– Double-blind trial: Neither investigator nor subject knows about the treatment

– Triple-blind trial: Sponsor, investigator, and subject do not know about the treatment

Distribution of a variable

Distribution (of a variable): tells us what values the variable takes and how often it takes these values. E.g., the distribution of ages of 26 pediatric patients aged 1 to 6 at AIDHC is as follows:

Age        1  2  3  4  5  6
Frequency  5  3  7  5  4  2
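From a frequency table like this one, relative and cumulative frequencies and the mean can be computed directly. A short Python sketch using the slide's own counts (the variable names are illustrative):

```python
# Frequency table from the slide: age in years -> number of patients
freq = {1: 5, 2: 3, 3: 7, 4: 5, 5: 4, 6: 2}

n = sum(freq.values())                                   # total number of patients
mean_age = sum(age * f for age, f in freq.items()) / n   # each age weighted by its count

print("Age  Freq  Rel.freq  Cum.freq")
cumulative = 0
for age, f in sorted(freq.items()):
    cumulative += f
    print(f"{age:>3}  {f:>4}  {f / n:>8.3f}  {cumulative:>8}")

print(f"n = {n}, mean age = {mean_age:.2f} years")       # n = 26, mean age = 3.23 years
```

The weighted sum is 84 patient-years over 26 patients, so the mean age is 84/26, about 3.23 years.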

Statistics

Statistics is the science of data collection, summarization, analysis, and interpretation.

Descriptive versus Inferential Statistics

– Descriptive statistics: data description (summarization), such as center, variability, and shape

– Inferential statistics: drawing conclusions beyond the sample studied, allowing for prediction

A Taxonomy of Statistics

How does statistics help us?

[Figure: Age Distribution, histogram of Number of Subjects by Age in Month (bins 40, 60, 80, 100, 120, 140, More)]

Ages (in months) of the 60 patients in our data set 1 are: 71, 127, 65, 82, 140, 53, 114, 56, 84, 65, 67, 134, 64, …, 91, 51

Mean                  90.41666667
Standard Error         3.902649518
Median                84
Mode                  84
Standard Deviation    30.22979318
Sample Variance      913.8403955
Kurtosis              -1.183899591
Skewness               0.389872725
Range                 95
Minimum               48
Maximum              143
Sum                 5425
Count                 60

By simply looking at the raw data, we cannot give an informative account of it; statistics, however, produces quick insight into the data using graphical and numerical tools.

[Figure: Distribution of age; axis: Age (month), 60 to 140]
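The summary table for the 60 ages has the layout of Excel's Analysis ToolPak "Descriptive Statistics" output. The Python sketch below reproduces that layout for any list of numbers; the skewness and kurtosis lines assume the sample-adjusted formulas documented for Excel's SKEW and KURT worksheet functions (kurtosis is excess kurtosis, about 0 for normal data). The input list is hypothetical, because the full data set is not reproduced here; the last three lines instead check that the reported Mean, Standard Error, and Sample Variance are consistent with the reported Sum, Count, and Standard Deviation.

```python
import math
import statistics


def describe(values):
    """Summary statistics in the layout of Excel's 'Descriptive Statistics' output.

    Skewness and kurtosis follow the sample-adjusted formulas of Excel's SKEW and KURT
    functions; kurtosis is excess kurtosis. Needs at least 4 values.
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)                 # sample SD, n - 1 in the denominator
    z = [(x - mean) / sd for x in values]
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "Mean": mean,
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(values),
        "Mode": statistics.mode(values),          # first mode found if several tie
        "Standard Deviation": sd,
        "Sample Variance": sd ** 2,
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }


# Hypothetical ages (months), used only to exercise the function
for name, value in describe([60, 72, 72, 85, 90, 101, 120, 143]).items():
    print(f"{name:<20}{value:.4f}")

# Consistency checks on the output reported above for the 60 ages
print(f"{5425 / 60:.8f}")                    # Sum / Count = 90.41666667 (the Mean)
print(f"{30.22979318 / math.sqrt(60):.4f}")  # SD / sqrt(n) = 3.9026 (the Standard Error)
print(f"{30.22979318 ** 2:.2f}")             # SD squared = 913.84 (the Sample Variance)
```

Note that the Standard Error in this table is the standard error of the mean (SD divided by the square root of n), not a measure of spread of the individual ages.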

Statistical Description of Data

Statistics describes a numeric set of data by its:

Center (mean, median, mode, etc.)

Variability (standard deviation, range, etc.)

Shape (skewness, kurtosis, etc.)

Statistics describes a categorical set of data by:

Frequency, percentage, or proportion of each category

Statistical Inference

Statistical inference is the process by which we acquire information about populations from samples.

Two types of estimates are used for making inferences:

– Point estimation

– Interval estimation

[Figure: Statistical Inference, Sample, Population]

Population and sample

Population: The entire collection of individuals or measurements about which information is desired

Sample: A subset of the population selected for study

– The primary objective is to create a subset of the population whose center, spread, and shape are as close as possible to those of the population

– Methods of sampling: random sampling, stratified sampling, systematic sampling, cluster sampling, multistage sampling, area sampling, quota sampling, etc.
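Three of the sampling methods listed can be sketched in a few lines of Python. This is a minimal illustration, not a survey-sampling library: the patient IDs 1 to 60 and the two "clinic" strata are hypothetical, and the systematic sampler assumes the population size is at least the sample size.

```python
import random


def simple_random_sample(population, n, rng):
    """Simple random sampling: every subset of size n is equally likely."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Systematic sampling: random start within the first k units, then every k-th unit."""
    k = len(population) // n              # sampling interval
    start = rng.randrange(k)
    return population[start::k][:n]


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Stratified sampling: a simple random sample of fixed size within each stratum."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}


def clinic_of(patient_id):
    """Hypothetical stratum: patients 1-30 attend clinic A, 31-60 clinic B."""
    return "A" if patient_id <= 30 else "B"


rng = random.Random(1)
patients = list(range(1, 61))                          # hypothetical patient IDs 1..60
print(simple_random_sample(patients, 6, rng))
print(systematic_sample(patients, 6, rng))             # k = 10: start, start + 10, ...
print(stratified_sample(patients, clinic_of, 3, rng))  # 3 patients from each clinic
```

Stratification guarantees that each clinic is represented in the sample, which a simple random sample of 6 does not.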

Parameter vs Statistic

Parameter

– Any statistical characteristic of a population

– Population mean, population median, and population standard deviation are examples of parameters

– A parameter describes the distribution of a population

– Parameters are fixed and usually unknown

Statistic: Any statistical characteristic of a sample

– Sample mean, sample median, and sample standard deviation are some examples of statistics

– A statistic describes the distribution of a sample

– The value of a statistic is known and varies from sample to sample

– Statistics are used for making inferences about parameters

Statistical issue: To describe the distribution of a population through a census, or to make inferences about the population distribution (population parameters) using the sample distribution (sample statistics).

E.g., the sample mean is an estimate of the population mean.
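The difference between a fixed parameter and a statistic that varies from sample to sample can be seen in a small simulation. The Python sketch below builds a hypothetical population of 10,000 ages (uniform between 48 and 143 months, an arbitrary assumption), then draws repeated samples of 60 and records each sample mean.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of 10,000 ages in months; its mean is the parameter.
population = [rng.randint(48, 143) for _ in range(10_000)]
mu = statistics.mean(population)   # fixed; in a real study it would be unknown

# Draw 1,000 samples of 60: the statistic (sample mean) changes from sample to sample.
sample_means = [statistics.mean(rng.sample(population, 60)) for _ in range(1_000)]

print(f"population mean (parameter):  {mu:.2f}")
print(f"first five sample means:      {[round(m, 2) for m in sample_means[:5]]}")
print(f"average of the sample means:  {statistics.mean(sample_means):.2f}")
print(f"SD of the sample means (SE):  {statistics.stdev(sample_means):.2f}")
```

The individual sample means scatter around the parameter, and their average sits very close to it, which is what makes the sample mean a reasonable estimate of the population mean.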

Hypothesis Testing

Null hypothesis (Ho) and alternative hypothesis (H1)

Decision    | Real situation: Ho is true   | Real situation: Ho is false
Reject Ho   | Type I error (α)             | Correct decision (1 - β)
Accept Ho   | Correct decision (1 - α)     | Type II error (β)
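The error rates in this table can be estimated by simulation. The Python sketch below assumes a two-sided one-sample z-test with a known population SD; the settings (Ho: mean = 90, SD = 30, n = 60, a true mean of 100 when Ho is false) are hypothetical. When Ho is true, the rejection rate should be close to α; when Ho is false, the non-rejection rate estimates β and its complement estimates power.

```python
import random
from statistics import NormalDist


def z_test_p_value(sample, mu0, sigma):
    """Two-sided one-sample z-test p-value, treating the population SD sigma as known."""
    xbar = sum(sample) / len(sample)
    z = (xbar - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(7)
alpha, mu0, sigma, n, trials = 0.05, 90.0, 30.0, 60, 4000   # hypothetical settings

# Ho true (true mean = mu0): every rejection is a Type I error.
type_i_rate = sum(
    z_test_p_value([rng.gauss(mu0, sigma) for _ in range(n)], mu0, sigma) < alpha
    for _ in range(trials)) / trials

# Ho false (true mean = 100): every failure to reject is a Type II error.
type_ii_rate = sum(
    z_test_p_value([rng.gauss(100.0, sigma) for _ in range(n)], mu0, sigma) >= alpha
    for _ in range(trials)) / trials

print(f"Type I error rate  ~ {type_i_rate:.3f}  (alpha = {alpha})")
print(f"Type II error rate ~ {type_ii_rate:.3f}  (power = 1 - beta ~ {1 - type_ii_rate:.3f})")
```

With these settings the estimated Type I error rate lands near 0.05, while β depends on the true difference, the SD, and n, which is why those quantities drive sample size calculations.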

Elements/Steps in hypothesis testing

Hypothesis testing steps:

– 1. Specify the null (Ho) and alternative (H1) hypotheses

– 2. Select the significance level (alpha), e.g., 0.05 or 0.01

– 3. Calculate the test statistic, e.g., t, F, chi-square

– 4. Calculate the probability value (p-value) or confidence interval

– 5. Describe the result and statistic in an understandable way
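The five steps can be walked through on the age summary reported earlier (n = 60, mean 90.42 months, SD 30.23 months). In the Python sketch below the null value of 96 months is purely hypothetical, and a large-sample z statistic is used so that only the standard library is needed; a one-sample t-test with 59 degrees of freedom would give a very similar answer here.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: hypotheses (hypothetical): Ho: mu = 96 months  vs  H1: mu != 96 months
mu0 = 96.0

# Step 2: significance level
alpha = 0.05

# Summary of the 60 ages reported in the descriptive-statistics output
n, xbar, s = 60, 90.41666667, 30.22979318

# Step 3: test statistic (large-sample z; a t-test would use a t distribution with n - 1 df)
z = (xbar - mu0) / (s / sqrt(n))

# Step 4: two-sided p-value
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 5: describe the result in plain words
decision = "reject Ho" if p_value < alpha else "do not reject Ho"
print(f"z = {z:.2f}, p = {p_value:.3f}: {decision} at alpha = {alpha}")
```

Here z is about -1.43 and p is about 0.15, so at α = 0.05 the data do not provide sufficient evidence against the hypothetical Ho.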

Point Estimation

[Figure: Point estimator, Sample distribution, Parameter, Population distribution]

• A point estimate draws inferences about a population by estimating the value of an unknown parameter using a single value, or point.

Interval Estimation

[Figure: Interval estimator, Sample distribution, Population distribution, Parameter]

• An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.
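For the 60 ages summarized earlier, the point estimate of the population mean is simply the sample mean, and an interval estimate can be built around it. The Python sketch below uses a normal-approximation 95% interval (mean ± 1.96 × SE); this is an assumption of convenience, since a t-based interval with 59 degrees of freedom would be slightly wider.

```python
from math import sqrt
from statistics import NormalDist

# Summary of the 60 ages (months) reported in the descriptive-statistics output
n, xbar, s = 60, 90.41666667, 30.22979318

point_estimate = xbar                        # single best guess for the population mean
se = s / sqrt(n)                             # standard error of the mean
z = NormalDist().inv_cdf(0.975)              # about 1.96 for a 95% interval
interval = (xbar - z * se, xbar + z * se)    # normal-approximation 95% CI

print(f"point estimate: {point_estimate:.1f} months")
print(f"95% interval estimate: ({interval[0]:.1f}, {interval[1]:.1f}) months")
```

This gives a point estimate of about 90.4 months and an interval of roughly 82.8 to 98.1 months.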

P-Value versus the Confidence Interval

Two main ways to assess study precision and the role of chance in a study:

– P-value: measures (as a probability) the evidence against the null hypothesis

– Confidence interval (CI): a range of values, computed from the sample, that is constructed to contain the true value of the parameter with a specified probability (the confidence level) over repeated sampling

– E.g., a 95% CI implies that if one repeats a study 100 times, about 95 of the 100 resulting intervals will contain the true measure of association
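The repeated-study interpretation of a 95% CI can be checked by simulation. The Python sketch below assumes a hypothetical normal population with mean 90 and known SD 30, repeats a study of n = 60 many times, and counts how often each study's 95% interval contains the true mean.

```python
import random
from statistics import NormalDist

rng = random.Random(2008)
true_mu, sigma, n = 90.0, 30.0, 60          # hypothetical population, sigma treated as known
half_width = NormalDist().inv_cdf(0.975) * sigma / n ** 0.5   # 95% CI half-width


def repeat_study():
    """Run one hypothetical study; return True if its 95% CI contains true_mu."""
    xbar = sum(rng.gauss(true_mu, sigma) for _ in range(n)) / n
    return xbar - half_width <= true_mu <= xbar + half_width


coverage = {}
for repetitions in (100, 10_000):
    coverage[repetitions] = sum(repeat_study() for _ in range(repetitions))
    print(f"{coverage[repetitions]} of {repetitions} intervals contain the true mean")
```

With 100 repetitions the count is typically close to, but not exactly, 95; with 10,000 repetitions the proportion settles very near 0.95. The probability statement is about the procedure, not about any single computed interval.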

Procedures for sample size calculation

Selection of the primary variables of interest and formulation of hypotheses

Information on the standard deviation (if numeric) or proportion (if categorical)

A tolerance level of significance (α)

Selection of a reasonable test statistic

Power or confidence level

A scientifically or clinically meaningful effect (difference)
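These ingredients combine in a sample size formula. As one common example (not necessarily the formula used in this course), the per-group sample size for comparing two means with a two-sided test is n = 2(z₁₋α/₂ + z₁₋β)²σ²/Δ², where σ is the SD and Δ the clinically meaningful difference. The Python sketch below implements this normal-approximation formula; the inputs (SD of 30 months, difference of 15 months) are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample comparison of means (normal approximation).

    n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2, rounded up.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


# Hypothetical inputs: SD of 30 months, clinically meaningful difference of 15 months
print(n_per_group(sigma=30, delta=15))              # 63 per group
print(n_per_group(sigma=30, delta=15, power=0.90))  # 85 per group
```

Raising the power from 80% to 90%, lowering α, or shrinking the meaningful difference all increase the required sample size.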

Brief concept of Statistical Software

There are many software packages for performing statistical analysis and visualization of data. Some of them are:

– Statistical Analysis System (SAS), S-PLUS, R, MATLAB, Minitab, BMDP, Stata, SPSS, StatXact, Statistica, LISREL, JMP, GLIM, HIL, MS Excel, etc. We will discuss MS Excel and SPSS in brief.

Useful website:

http://www.R-project.org (free but powerful statistical software)

Microsoft Excel

A spreadsheet application. It features calculation, graphing tools, pivot tables, and a macro programming language called VBA (Visual Basic for Applications).

There are many versions of MS Excel; Excel XP, Excel 2003, and Excel 2007 are capable of performing a number of statistical analyses.

Starting MS Excel: Double-click on the Microsoft Excel icon on the desktop, or click on Start --> Programs --> Microsoft Excel.

Worksheet: Consists of a grid of cells with numbered rows down the page and alphabetically titled columns across the page. Each cell is referenced by its coordinates. For example, A3 refers to the cell in column A and row 3; B10:B20 refers to the range of cells in column B, rows 10 through 20.

Creating Formulas: 1. Click the cell in which you want to enter the formula. 2. Type = (an equal sign). 3. Click the Function button (fx). 4. Select the formula you want and step through the on-screen instructions.

Opening a document: File > Open (for an existing workbook). Change the directory, area, or drive to look for files in other locations.

Creating a new workbook: File > New > Blank Document

Saving a file: File > Save

Selecting more than one cell: Click on a cell (e.g., A1), then hold the Shift key and click on another (e.g., D4) to select the cells between A1 and D4; or click on a cell and drag the mouse across the desired range.

Entering Date and Time: Dates are displayed as MM/DD/YYYY, but there is no need to enter them in that format. For example, Excel will recognize Jan 9 or jan-9 as 1/9/2007, and Jan 9 1999 as 1/9/1999. To enter today's date, press Ctrl and ; together. Use a or p to indicate am or pm; for example, 8:30 p is interpreted as 8:30 pm. To enter the current time, press Ctrl and : (Ctrl+Shift+;) together.

Copy and paste all cells in a sheet: Ctrl+A for selecting, Ctrl+C for copying, and Ctrl+V for pasting.

Sorting Data: Sort > Sort By …

Descriptive Statistics and other statistical methods: Tools > Data Analysis > (statistical method). If Data Analysis is not available, click on Tools > Add-Ins and then select Analysis ToolPak and Analysis ToolPak - VBA.

Statistical and Mathematical Functions: Start with the '=' sign and then select a function from the function wizard (fx).

Inserting a Chart: Click on Chart Wizard (or Insert > Chart), select a chart, give the input data range, update the chart options, and select the output range (worksheet).

Importing Data into Excel: File > Open > Files of type; click on the file; choose an option (Delimited or Fixed Width); choose options (Tab, Semicolon, Comma, Space, Other); Finish.

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.

Statistical Package for the Social Sciences (SPSS)

A general-purpose statistical package, SPSS is widely used in the social sciences, particularly in sociology and psychology.

SPSS can import data from almost any type of file to generate tabulated reports, plots of distributions and trends, descriptive statistics, and complex statistical analyses.

Starting SPSS: Double-click on SPSS on the desktop, or Program ⇒ SPSS.

Opening an SPSS file: File ⇒ Open

MENUS AND TOOLBARS

• Data Editor

Various pull-down menus appear at the top of the Data Editor window. These pull-down menus are at the heart of using SPSSWIN. The Data Editor menu items (with some of their uses) are:

FILE: used to open and save data files

EDIT: used to copy and paste data values, find data in a file, and insert variables and cases; OPTIONS allows the user to set general preferences as well as the setup for the Navigator, Charts, etc.

VIEW: the user can change toolbars; value labels can be seen in cells instead of data values

DATA: select, sort, or weight cases; merge files

TRANSFORM: compute new variables, recode variables, etc.

ANALYZE: perform various statistical procedures

GRAPHS: create bar and pie charts, etc.

UTILITIES: add comments to accompany the data file (and other advanced features)

ADD-ONS: features not currently installed (advanced statistical procedures)

WINDOW: switch between the data, syntax, and Navigator windows

HELP: access SPSSWIN Help information

Navigator (Output) Menus

When statistical procedures are run or charts are created, the output will appear in the Navigator window. The Navigator window contains many of the pull-down menus found in the Data Editor window. Some of the important menus in the Navigator window include:

INSERT: used to insert page breaks, titles, charts, etc.

FORMAT: for changing the alignment of a particular portion of the output

• Formatting Toolbar

When a table has been created by a statistical procedure, the user can edit the table to create a desired look or add/delete information. Beginning with version 14.0, the user has a choice of editing the table in the Output window or opening it in a separate Pivot Table (DEFINE) window. Various pull-down menus are activated when the user double-clicks on the table. These include:

EDIT: undo and redo a pivot; select a table or table body (e.g., to change the font)

INSERT: used to insert titles, captions, and footnotes

PIVOT: used to perform a pivot of the row and column variables

FORMAT: various modifications can be made to tables and cells

Importing tab-delimited data: In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. Look in the appropriate location for the text file. Then select "Text" from "Files of type". Click on the file name and then click on "Open". You will see the Text Import Wizard - Step 1 of 6 dialog box.

After stepping through the wizard, you will have an SPSS data file containing the former tab-delimited data. You simply need to add variable and value labels and define missing values.

Exporting Data to Excel: Click on FILE ⇒ SAVE AS. Click on the File Name for the file to be exported. For "Save as type", select Excel (.xls) from the pull-down menu. You will notice the checkbox for "Write variable names to spreadsheet". Leave this checked, as you will want the variable names to be in the first row of each column in the Excel spreadsheet. Finally, click on Save.

• Additional menus: CHART EDITOR: used to edit a graph

SYNTAX EDITOR: used to edit the text in a syntax window

• Show or hide a toolbar

Click on VIEW ⇒ TOOLBARS, then check a toolbar to show it or uncheck it to hide it

• Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location

• Customize a toolbar

Click on VIEW ⇒ TOOLBARS ⇒ CUSTOMIZE

Importing data from an EXCEL spreadsheet: Data from an Excel spreadsheet can be imported into SPSSWIN as follows:

1. In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. The OPEN DATA FILE dialog box will appear.

2. Locate the file of interest. Use the Look In pull-down list to identify the folder containing the Excel file of interest.

3. From the FILE TYPE pull-down menu, select EXCEL (.xls).

4. Click on the file name of interest and click on OPEN, or simply double-click on the file name.

5. Keep checked the box that reads "Read variable names from the first row of data". This presumes that the first row of the Excel data file contains variable names. [If the data resided in a different worksheet in the Excel file, this would need to be entered.]

6. Click on OK. The Excel data file will now appear in the SPSSWIN Data Editor.

7. The former EXCEL spreadsheet can now be saved as an SPSS file (FILE ⇒ SAVE AS) and is ready to be used in analyses. Typically, you would label variables and values and define missing values.

Importing an Access table: SPSSWIN does not offer a direct import for Access tables. Therefore, we must follow these steps:

1. Open the Access file.

2. Open the data table.

3. Save the data as an Excel file.

4. Follow the steps outlined above for importing data from an Excel spreadsheet into SPSSWIN.

Importing Text Files into SPSSWIN: Text data points typically are separated (or "delimited") by tabs or commas. Sometimes they can be of fixed format.

Running the FREQUENCIES procedure

1. Open the data file of interest (from the menus, click on FILE ⇒ OPEN ⇒ DATA).

2. From the menus, click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ FREQUENCIES.

3. The FREQUENCIES dialog box will appear. In the left-hand box will be a listing (source variable list) of all the variables that have been defined in the data file. The first step is identifying the variable(s) for which you want to run a frequency analysis. Click on a variable name(s). Then click the [ > ] pushbutton. The variable name(s) will now appear in the VARIABLE[S] box (selected variable list). Repeat these steps for each variable of interest.

4. If all that is being requested is a frequency table showing counts and percentages (raw, adjusted, and cumulative), then click on OK.
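To make the raw, adjusted, and cumulative percentages concrete: SPSS labels these columns Percent, Valid Percent, and Cumulative Percent. The Python sketch below is not SPSS; it only illustrates the arithmetic, using hypothetical severity ratings for 10 patients with one missing value (represented as None). Raw percentages use all cases as the denominator, while the adjusted and cumulative percentages use only the valid (non-missing) cases.

```python
from collections import Counter


def frequency_table(values):
    """Rows of (value, count, percent, valid percent, cumulative percent).

    None marks a missing value: it counts toward Percent (raw) but is excluded from
    Valid Percent (adjusted) and Cumulative Percent.
    """
    total = len(values)
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows, cumulative = [], 0.0
    for value in sorted(counts):   # alphabetical here, which happens to match mild < moderate < severe
        count = counts[value]
        valid_pct = 100 * count / len(valid)
        cumulative += valid_pct
        rows.append((value, count, 100 * count / total, valid_pct, cumulative))
    return rows


# Hypothetical severity ratings for 10 patients; one rating is missing
severity = ["mild", "severe", "moderate", "mild", None,
            "mild", "moderate", "severe", "mild", "moderate"]
for row in frequency_table(severity):
    print("{:<9}{:>3}{:>8.1f}{:>8.1f}{:>8.1f}".format(*row))
```

Here "mild" is 40.0% of all 10 cases but 44.4% of the 9 valid cases. SPSS additionally shows a separate row for the missing cases, which this sketch omits.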

Requesting STATISTICS: Descriptive and summary STATISTICS can be requested for numeric variables. To request statistics:

1. From the FREQUENCIES dialog box, click on the STATISTICS pushbutton.

2. This will bring up the FREQUENCIES: STATISTICS dialog box.

3. The STATISTICS dialog box offers the user a variety of choices.

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ DESCRIPTIVES). The procedure offers many of the same statistics as the FREQUENCIES procedure, but without generating frequency tables.

Requesting CHARTS: One can request a chart (graph) to be created for a variable or variables included in a FREQUENCIES procedure.

1. In the FREQUENCIES dialog box, click on CHARTS.

2. The FREQUENCIES: CHARTS dialog box will appear. Choose the intended chart (e.g., bar chart, pie chart, histogram).

Pasting charts into Word:

1. Click on the chart.

2. Click on the pull-down menu EDIT ⇒ COPY OBJECTS.

3. Go to the Word document in which the chart is to be embedded. Click on EDIT ⇒ PASTE SPECIAL.

4. Select Formatted Text (RTF) and then click on OK.

5. Enlarge the graph to the desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible, click once on the graph).

BASIC STATISTICAL PROCEDURES: CROSSTABS

1. From the ANALYZE pull-down menu, click on DESCRIPTIVE STATISTICS ⇒ CROSSTABS.

2. The CROSSTABS dialog box will then open.

3. From the variable selection box on the left, click on a variable you wish to designate as the Row variable. The values (codes) for the Row variable make up the rows of the crosstabs table. Click on the arrow (>) button for Row(s). Next, click on a different variable you wish to designate as the Column variable. The values (codes) for the Column variable make up the columns of the crosstabs table. Click on the arrow (>) button for Column(s).

4. You can specify more than one variable in the Row(s) and/or Column(s). A cross table will be generated for each combination of Row and Column variables.
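The counts in a crosstabs table are simple to compute by hand. The Python sketch below is not SPSS; it shows what a two-way table for one Row and one Column variable contains, using hypothetical data for 8 patients (gender by whether asthma is controlled). The function name `crosstab` is illustrative.

```python
from collections import Counter


def crosstab(row_values, col_values):
    """Two-way table of counts: {row level: {column level: count}}.

    Mirrors what a CROSSTABS table shows for one Row and one Column variable.
    """
    counts = Counter(zip(row_values, col_values))
    row_levels, col_levels = sorted(set(row_values)), sorted(set(col_values))
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}


# Hypothetical data for 8 patients: gender (Row) by asthma controlled (Column)
gender = ["F", "M", "F", "F", "M", "M", "F", "M"]
controlled = ["yes", "no", "yes", "no", "yes", "no", "yes", "yes"]
table = crosstab(gender, controlled)

columns = list(next(iter(table.values())))
print("gender " + "".join(f"{c:>6}" for c in columns) + "  Total")
for row, cells in table.items():
    print(f"{row:<7}" + "".join(f"{cells[c]:>6}" for c in columns) + f"  {sum(cells.values()):>5}")
```

Specifying several Row and Column variables, as in step 4, corresponds to calling such a function once for every (row variable, column variable) pair.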

Limitations SPSS users have less control over data manipulation and statistical output than other statistical packages such as SAS Stata etc

SPSS is a good first statistical package to perform quantitative research in social science because it is easy to use and because it can be a good starting point to learn more advanced statistical packages

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

QuestionsQuestions

Defining the problem and Defining the problem and conceptualizing the studyconceptualizing the study

Review relevanthellipidentify contd

ndash Identify scales to measure the variables

ndash Assess the feasibility of study objectives ie assess if it

is measurable what you want to measure

ndash Identify the target populations and define the eligibility

criteria

Research QuestionResearch Question

Example -

ndash Does smoking increase the risk of renal carcinoma

ndash Is oral inhaler effective in controlling asthma among

children

Hypothesis statementHypothesis statement

Example -ndash Smoking increases the risk of renal carcinoma in

pediatric patient

ndash Oral inhaler is effective in controlling asthma among

children

Study ObjectiveStudy Objective

The purpose or aim of the study

Example-

ndash To assess the risk of renal carcinoma associated with

smoking among pediatric patients (primary objective)

ndash To determine the race and gender disparities in the

prevalence of smoking (secondary objective)

Study variableStudy variable

Refers to measurement that changes from one individual to

another

Example- age gender BMI Systolic blood pressure

hematocrit

Outcome vs independent variableOutcome vs independent variable

Responseoutcome variable Measures the outcome of the study treatment or experimental manipulation

Examples-ndash Renal carcinoma incident among children

ndash Asthma control in pediatric asthmatic patients

Independent predictorexplanatory variable Explains or influences changes in a response variable

Examples-ndash Smoking

ndash Oral inhaler

Scale of variableoutput measurementScale of variableoutput measurement

Nominal - Categorical variables without any order or

ranking sequence such as names or classes (eg gender)

Binary- two categories multinomial- more than two

categories

Ordinal - Variables with an inherent rank or order eg

mild moderate severe Can be compared for equality or

greater or less but not how much greater or less

Scale of variableoutput measurementScale of variableoutput measurement

Interval - Values of the variable are ordered as in Ordinal and

additionally differences between values are meaningful however the

scale is not absolutely anchored Calendar dates and temperatures on

the Fahrenheit scale are examples Addition and subtraction but not

multiplication and division are meaningful operations

Ratio - Variables with all properties of Interval plus an absolute non-

arbitrary zero point eg age weight temperature (Kelvin) Addition

subtraction multiplication and division are all meaningful operations

Measurement biasMeasurement bias

Bias arises due to measurement error

Example-

ndash Suppose In the case of remission of Asthma the possible

outcomes are complete remission partial remission and no

remission If we measure the outcome variable as only remission

and non-remission basically we are committing an error by

putting partial remission in the non-remission group (type II error)

Designing the studyDesigning the study

A study design is a careful advance plan of data collection

and the analytic approach needed to answer the research

question under investigation in a scientific way

The basic elements of a study design-

ndash Selecting an appropriate sample size for a specified

level of power and level of significance

ndash Selecting methods of sampling data collection and

analysis appropriate to the studys objectives

ClinicalExperimental versus Observational designClinicalExperimental versus Observational design

The Lancet 2002 Vol 359

ClinicalExperimental vs Observational ClinicalExperimental vs Observational designdesign

The choice of a design mainly depends on the research

question (s) and type of research conduct ( experimental

or observational)

Experimental Interventional The investigator controls

the experimental environment in which the hypothesis is

tested The randomized double-blind clinical trial is the

gold standard

ClinicalExperimental vs Observational ClinicalExperimental vs Observational designdesign

Non-experimentalObservational The population is

observed without any interference by the investigator

For example in a study to see the effect of smoking it is

impossible for an investigator to assign smoking to the subjects

Instead investigator can study the effect by choosing a control

group and find the cause and relation effect Some examples are-

ndash Cross-sectional study

ndash Cohort study

ndash Case-control study

Randomized control designRandomized control design

Random allocation of subjects to different interventions

(or treatments) for the purpose of comparingdetermining

the efficacy of the study treatment (s)

ndash Eg placebo or standard medication (active control) can

be used as a control

ndash Patients with cancer or painful disease can not receive

placebo as a control

Randomized control designRandomized control design

Blindness Reduces the bias due to the preconception or

personal bias ndash Open trial Investigator and subject know the full details of the

treatment

ndash Single-blind trial Investigator knows about the treatment but

subject does not

ndash Double-blind Both investigator and subject do not know about the

treatment

ndash Triple-blind Sponsor investigator and subject do not know about

the treatment

Distribution of a variableDistribution of a variable

Distribution - (of a variable) tells us what values the

variable takes and how often it takes these values Eg

distribution of some 26 pediatric patients of ages 1 to 6

at AIDHC are as follows-

Age 1 2 3 4 5 6

Frequency 5 3 7 5 4 2

StatisticsStatistics

Science of data collection summarization analysis

and interpretation

Descriptive versus Inferential Statistics

ndash Descriptive Statistic Data description

(summarization) such as center variability and

shape

ndash Inferential Statistic Drawing conclusion beyond the

sample studied allowing for prediction

A Taxonomy ofA Taxonomy of StatisticsStatistics

How does statistics help usHow does statistics help us

Age Distribution

0

2

4

6

8

10

12

14

16

40 60 80 100 120 140 More

Age in Month

Nu

mb

er o

f S

ub

ject

s

Ages (in month) of the 60 patients in our data set 1 are- 71 127 65 82 140 53 114 56 84 65 67 134 64 hellip 91 51

Mean 9041666667

Standard Error 3902649518

Median 84

Mode 84

Standard Deviation 3022979318

Sample Variance 9138403955

Kurtosis -1183899591

Skewness 0389872725

Range 95

Minimum 48

Maximum 143

Sum 5425

Count 60

By simply looking at the data we fail to produce any informative account to describe the data how ever statistics produce a quick insight in to data using graphical and numerical statistical tools

60

80

10

01

20

14

0

Distribution of age

Ag

e (

mo

nth

)

Statistical Description of DataStatistical Description of Data

Statistics describes a numeric set of data by its

Center (mean median mode etc)

Variability (standard deviation range etc)

Shape (skewness kurtosis etc)

Statistics describes a categorical set of data by

Frequency percentage or proportion of each

category

Statistical inference is the process by which we acquire information about populations from samples

Two types of estimates for making inferencesndash Point estimationndash Interval estimate

Statistical Inference

Statistical Inference

Sample Population

Population and samplePopulation and sample

Population The entire collection of individuals or measurements about which information is desired

Sample A subset of the population selected for study

ndash Primary objective is to create a subset of population whose center spread and shape are as close as that of population

ndash Methods of sampling Random sampling stratified sampling systematic sampling cluster sampling multistage sampling area sampling qoata sampling etc

Parameter vs StatisticsParameter vs Statistics

Parameter

ndash Any statistical characteristic of a population

ndash Population mean population median population

standard deviation are examples of parameters

ndash Parameter describes the distribution of a population

ndash Parameters are fixed and usually unknown

Parameter vs StatisticsParameter vs Statistics

Statistic Any statistical characteristic of a sample

ndash Sample mean sample median sample standard

deviation are some examples of statistics

ndash Statistic describes the distribution of population

ndash Value of a statistic is known and is varies for different

samples

ndash Are used for making inference on parameter

Parameter vs StatisticsParameter vs Statistics

Statistical Issue To describe the distribution of a

population through census or making inference on

population distribution population parameter using sample

distribution statistic

Eg sample mean is an estimate of the population mean

Hypothesis TestingHypothesis Testing

Null hypothesis and Alternative hypothesis

Real Situation Ho is true Ho is false Reject Ho Type I

error (α) Correct Decision (1-)

D e c i s i o n

Accept Ho Correct Decision (1- α)

Type II Error ()

Elements/Steps in Hypothesis Testing

Hypothesis testing steps

1. Specification of the null (Ho) and alternative (H1) hypotheses

2. Selection of the significance level (alpha): 0.05 or 0.01

3. Calculation of the test statistic, e.g., t, F, chi-square

4. Calculation of the probability value (p-value) or confidence interval

5. Description of the result and statistic in an understandable way
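The five steps can be sketched with Python's standard library. This is a large-sample z-test, a swapped-in simplification (with n = 60 a t-test gives a very similar answer), applied to the summary statistics reported for the 60 ages in data set 1. The null value of 80 months is hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: hypotheses (mu0 = 80 months is a hypothetical value)
mu0 = 80.0          # Ho: mu = 80   vs   H1: mu != 80 (two-sided)

# Step 2: significance level
alpha = 0.05

# Summary statistics reported for the 60 ages in data set 1
n, xbar, s = 60, 90.41666667, 30.22979318

# Step 3: test statistic (large-sample z, using s in place of sigma)
se = s / sqrt(n)
z = (xbar - mu0) / se

# Step 4: two-sided p-value from the standard normal distribution
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 5: describe the result
decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
print(f"z = {z:.3f}, p = {p_value:.4f} -> {decision} at alpha = {alpha}")
```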

Point Estimation

[Figure: a point estimator computed from the sample distribution estimates a parameter of the population distribution]

• A point estimator draws inferences about a population by estimating the value of an unknown parameter using a single value, or point

Interval Estimation

[Figure: an interval estimator computed from the sample distribution estimates a parameter of the population distribution]

• An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval
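A minimal sketch of point versus interval estimation for a mean, assuming a normal-approximation interval (xbar +/- 1.96 * s / sqrt(n)) rather than the slightly wider t-based interval. The inputs are the data set 1 summary values reported in these notes; `mean_ci` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(xbar, s, n, level=0.95):
    """Normal-approximation CI for a mean: xbar +/- z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    half_width = z * s / sqrt(n)
    return xbar - half_width, xbar + half_width

# Point estimate and 95% interval estimate from the data set 1 summary
xbar, s, n = 90.41666667, 30.22979318, 60
low, high = mean_ci(xbar, s, n)
print(f"point estimate: {xbar:.2f}")
print(f"95% CI: ({low:.2f}, {high:.2f})")   # (82.77, 98.07)
```

The point estimate is a single number; the interval conveys how precise that number is.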

P-Value versus the Confidence Interval

Two main ways to assess study precision and the role of chance in a study:

- The p-value measures, as a probability, the evidence against the null hypothesis: it is the probability of obtaining a result at least as extreme as the one observed if Ho were true

- The confidence interval (CI) is a range of values, computed from the sample, that is expected to contain the true value of the parameter with a specified level of confidence

- E.g., a 95% CI implies that if one repeated a study 100 times, about 95 of the 100 resulting CIs would contain the true measure of association
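The repeated-study interpretation of a 95% CI can be checked by simulation. This sketch draws hypothetical normal samples with an assumed true mean (90) and SD (30), which are not study data, treats the SD as known, builds a 95% interval for each simulated study, and reports the share of intervals that contain the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def ci_coverage(true_mu=90.0, sigma=30.0, n=60, repeats=100, level=0.95, seed=1):
    """Repeat a hypothetical study `repeats` times and return the share of
    confidence intervals that contain the true mean (sigma treated as known)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(repeats):
        xbar = fmean(rng.gauss(true_mu, sigma) for _ in range(n))
        if xbar - half_width <= true_mu <= xbar + half_width:
            hits += 1
    return hits / repeats

print(ci_coverage())                  # one set of 100 repeated studies
print(ci_coverage(repeats=10_000))    # many repetitions
```

With 100 repetitions the share fluctuates around 0.95 from run to run; with many repetitions it settles close to 0.95.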

Procedures for Sample Size Calculation

Selection of the primary variables of interest and formulation of hypotheses

Information on the standard deviation (if numeric) or proportion (if categorical)

A tolerated level of significance (α)

Selection of a reasonable test statistic

Power or confidence level

A scientifically or clinically meaningful effect (difference)
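As an illustration of how these elements combine, here is a sketch for one common case: comparing two means with equal group sizes, using the normal-approximation formula n = 2 (z_(1-α/2) + z_(1-β))² σ² / δ² per group. The SD (10) and meaningful difference (5) are hypothetical inputs, and `n_per_group` is a hypothetical helper; other designs (proportions, paired data, more than two groups) need different formulas.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample comparison of means
    (normal approximation, equal group sizes, common SD sigma)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # significance level
    z_beta = z(power)            # power = 1 - beta
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Hypothetical inputs: SD of 10 units, clinically meaningful difference of 5 units
print(n_per_group(sigma=10, delta=5))               # 63
print(n_per_group(sigma=10, delta=5, power=0.90))   # 85
```

Raising the power, lowering α, or shrinking the meaningful difference all increase the required sample size.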

Brief Concept of Statistical Software

There are many software packages to perform statistical analysis and visualization of data. Some of them are:

- Statistical Analysis System (SAS), S-plus, R, Matlab, Minitab, BMDP, STATA, SPSS, StatXact, Statistica, LISREL, JMP, GLIM, HIL, MS Excel, etc. We will discuss MS Excel and SPSS in brief

Useful websites:

http://www.R-project.org (free but powerful statistical software)

Microsoft Excel

A spreadsheet application. It features calculation, graphing tools, pivot tables, and a macro programming language called VBA (Visual Basic for Applications)

There are many versions of MS Excel; Excel XP, Excel 2003, and Excel 2007 are capable of performing a number of statistical analyses

Starting MS Excel: Double-click on the Microsoft Excel icon on the desktop, or click on Start --> Programs --> Microsoft Excel

Worksheet: Consists of a grid of cells with numbered rows down the page and alphabetically titled columns across the page. Each cell is referenced by its coordinates. For example, A3 refers to the cell in column A and row 3, and B10:B20 refers to the range of cells in column B, rows 10 through 20
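To make the cell-reference convention concrete, here is a small sketch of hypothetical helpers (not part of Excel) that convert A1-style references to (column, row) numbers and expand a range such as B10:B20 into its individual cells.

```python
import re

def col_to_number(letters):
    """Convert column letters to a 1-based index: A -> 1, Z -> 26, AA -> 27."""
    number = 0
    for ch in letters.upper():
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number

def parse_cell(ref):
    """Split an A1-style reference such as 'A3' into (column_number, row)."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", ref)
    if match is None:
        raise ValueError(f"not an A1-style reference: {ref!r}")
    return col_to_number(match.group(1)), int(match.group(2))

def expand_range(cell_range):
    """List every (column, row) pair in a range such as 'B10:B20', row by row."""
    first, last = (parse_cell(part) for part in cell_range.split(":"))
    return [(c, r)
            for r in range(first[1], last[1] + 1)
            for c in range(first[0], last[0] + 1)]

print(parse_cell("A3"))                # (1, 3)
print(len(expand_range("B10:B20")))    # 11
```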

Creating formulas: 1. Click the cell in which you want to enter the formula. 2. Type = (an equal sign). 3. Click the Function (fx) button. 4. Select the formula you want and step through the on-screen instructions

Opening a document: File --> Open (from an existing workbook). Change the directory, area, or drive to look for files in other locations

Creating a new workbook: File --> New --> Blank Document

Saving a file: File --> Save

Selecting more than one cell: Click on a cell (e.g., A1), then hold the Shift key and click on another (e.g., D4) to select the cells between A1 and D4; or click on a cell and drag the mouse across the desired range

Entering date and time: Dates are displayed in MM/DD/YYYY format (internally, Excel stores them as serial numbers), but there is no need to enter them in that format. For example, Excel will recognize Jan 9 or jan-9 as 1/9/2007 and Jan 9 1999 as 1/9/1999. To enter today's date, press Ctrl and ; together. Use a or p to indicate am or pm; for example, 8:30 p is interpreted as 8:30 pm. To enter the current time, press Ctrl, Shift, and ; together
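Behind the scenes, Excel's default 1900 date system stores each date as a serial day number. This sketch reproduces that numbering in Python for dates from 1 March 1900 onward (earlier dates are off by one because Excel treats 1900 as a leap year); the helper names are hypothetical.

```python
from datetime import date, timedelta

# In the 1900 date system, 1900-01-01 is serial 1, and Excel also counts a
# nonexistent 1900-02-29; from 1900-03-01 onward the serial number therefore
# equals the number of days since 1899-12-30.
EPOCH = date(1899, 12, 30)

def to_excel_serial(d):
    """Serial number Excel stores for date d (valid for d >= 1900-03-01)."""
    return (d - EPOCH).days

def from_excel_serial(serial):
    """Date corresponding to an Excel serial number (>= 61)."""
    return EPOCH + timedelta(days=serial)

print(to_excel_serial(date(2007, 1, 9)))   # 39091
print(from_excel_serial(39091))            # 2007-01-09
```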

Copy and paste all cells in a sheet: Ctrl+A for selecting, Ctrl+C for copying, and Ctrl+V for pasting

Sorting data: Data --> Sort --> Sort By …

Descriptive statistics and other statistical methods: Tools --> Data Analysis --> (statistical method). If Data Analysis is not available, click on Tools --> Add-Ins and then select Analysis ToolPak and Analysis ToolPak - VBA

Statistical and mathematical functions: Start with the '=' sign and then select a function from the function wizard (fx)

Inserting a chart: Click on Chart Wizard (or Insert --> Chart), select the chart type, give the input data range, update the chart options, and select the output range (worksheet)

Importing data into Excel: File --> Open, set Files of type, and click on the file. Choose an option (Delimited or Fixed Width), choose the delimiter options (Tab, Semicolon, Comma, Space, Other), and click Finish

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases

Statistical Package for the Social Sciences (SPSS)

A general-purpose statistical package, SPSS is widely used in the social sciences, particularly in sociology and psychology

SPSS can import data from almost any type of file to generate tabulated reports, plots of distributions and trends, descriptive statistics, and complex statistical analyses

Starting SPSS: Double-click on SPSS on the desktop, or Programs ⇒ SPSS

Opening an SPSS file: File ⇒ Open

• Data Editor

Various pull-down menus appear at the top of the Data Editor window. These pull-down menus are at the heart of using SPSSWIN. The Data Editor menu items (with some of the uses of each menu) are:

MENUS AND TOOLBARS

FILE: used to open and save data files

EDIT: used to copy and paste data values, find data in a file, and insert variables and cases. OPTIONS allows the user to set general preferences as well as the setup for the Navigator, Charts, etc.

VIEW: the user can change toolbars; value labels can be shown in cells instead of data values

DATA: select, sort, or weight cases; merge files

TRANSFORM: compute new variables, recode variables, etc.

ANALYZE: perform various statistical procedures

GRAPHS: create bar charts, pie charts, etc.

UTILITIES: add comments to accompany the data file (and other advanced features)

ADD-ONS: features not currently installed (advanced statistical procedures)

WINDOW: switch between the data, syntax, and navigator windows

HELP: access SPSSWIN Help information

Navigator (Output) Menus

When statistical procedures are run or charts are created, the output appears in the Navigator window. The Navigator window contains many of the pull-down menus found in the Data Editor window. Some of the important menus in the Navigator window include:

INSERT: used to insert page breaks, titles, charts, etc.

FORMAT: used to change the alignment of a particular portion of the output

• Formatting Toolbar

When a table has been created by a statistical procedure, the user can edit the table to create a desired look or add/delete information. Beginning with version 14.0, the user has a choice of editing the table in the Output window or opening it in a separate Pivot Table (DEFINE) window. Various pull-down menus are activated when the user double-clicks on the table. These include:

EDIT: undo and redo a pivot; select a table or table body (e.g., to change the font)

INSERT: used to insert titles, captions, and footnotes

PIVOT: used to perform a pivot of the row and column variables

FORMAT: various modifications can be made to tables and cells

Importing tab-delimited data: In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. Look in the appropriate location for the text file. Then select "Text" from "Files of type". Click on the file name and then click on "Open". You will see the Text Import Wizard - Step 1 of 6 dialog box

You will now have an SPSS data file containing the former tab-delimited data. You simply need to add variable and value labels and define missing values

Exporting data to Excel: Click on FILE ⇒ SAVE AS. Click on the File Name for the file to be exported. For "Save as Type", select Excel (*.xls) from the pull-down menu. You will notice the checkbox "Write variable names to spreadsheet". Leave this checked, as you will want the variable names in the first row of each column in the Excel spreadsheet. Finally, click on Save

• Additional menus: CHART EDITOR: used to edit a graph

SYNTAX EDITOR: used to edit the text in a syntax window

• Show or hide a toolbar

Click on VIEW ⇒ TOOLBARS; check a toolbar to show it or uncheck it to hide it

• Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location

• Customize a toolbar

Click on VIEW ⇒ TOOLBARS ⇒ CUSTOMIZE

Importing data from an Excel spreadsheet: Data from an Excel spreadsheet can be imported into SPSSWIN as follows:

1. In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. The OPEN DATA FILE dialog box will appear

2. Locate the file of interest. Use the Look In pull-down list to identify the folder containing the Excel file of interest

3. From the FILE TYPE pull-down menu, select EXCEL (*.xls)

4. Click on the file name of interest and click on OPEN, or simply double-click on the file name

5. Keep the box checked that reads "Read variable names from the first row of data". This presumes that the first row of the Excel data file contains variable names. [If the data resided in a different worksheet in the Excel file, this would need to be entered]

6. Click on OK. The Excel data file will now appear in the SPSSWIN Data Editor

7. The former Excel spreadsheet can now be saved as an SPSS file (FILE ⇒ SAVE AS) and is ready to be used in analyses. Typically, you would label variables and values and define missing values

Importing an Access table: SPSSWIN does not offer a direct import for Access tables. Therefore, we must follow these steps:

1. Open the Access file

2. Open the data table

3. Save the data as an Excel file

4. Follow the steps outlined above for importing data from an Excel spreadsheet into SPSSWIN

Importing text files into SPSSWIN: Text data points typically are separated (or "delimited") by tabs or commas. Sometimes they can be in fixed format
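The difference between delimited and fixed-format text can be sketched with Python's `csv` module and string slicing. The sample records and the column layout for the fixed-format case are hypothetical; in SPSS the equivalent choices are made in the Text Import Wizard.

```python
import csv
import io

def read_delimited(text, delimiter="\t"):
    """Parse delimited text whose first row holds the variable names."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)

def read_fixed_width(text, columns):
    """Parse fixed-format text; columns maps each variable name to a (start, end) slice."""
    records = []
    for line in text.splitlines():
        if line.strip():
            records.append({name: line[start:end].strip()
                            for name, (start, end) in columns.items()})
    return records

# Hypothetical records for the same two subjects in both formats
tab_text = "id\tage\tgroup\n1\t71\tA\n2\t127\tB\n"
fixed_text = "001 71A\n002127B\n"
layout = {"id": (0, 3), "age": (3, 6), "group": (6, 7)}

print(read_delimited(tab_text))
print(read_fixed_width(fixed_text, layout))
```

Delimited files carry their own separators, so only the delimiter must be known; fixed-format files need the column positions of every variable.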

Running the FREQUENCIES procedure

1. Open the data file of interest (from the menus, click on FILE ⇒ OPEN ⇒ DATA)

2. From the menus, click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ FREQUENCIES

3. The FREQUENCIES dialog box will appear. The left-hand box lists (source variable list) all the variables that have been defined in the data file. The first step is identifying the variable(s) for which you want to run a frequency analysis. Click on a variable name, then click the [ > ] pushbutton. The variable name will now appear in the VARIABLE[S] box (selected variable list). Repeat these steps for each variable of interest

4. If all that is being requested is a frequency table showing counts and percentages (raw, adjusted, and cumulative), then click on OK
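What the FREQUENCIES table reports can be sketched in a few lines of Python. This assumes no missing values, so the adjusted (valid) percent equals the raw percent; the input reuses the age distribution of the 26 AIDHC pediatric patients given in these notes (frequencies 5, 3, 7, 5, 4, 2 for ages 1 to 6). `frequency_table` is a hypothetical helper, not an SPSS function.

```python
from collections import Counter

def frequency_table(values):
    """Rows of (value, count, percent, cumulative percent), assuming no missing values."""
    counts = Counter(values)
    total = len(values)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        percent = 100 * counts[value] / total
        cumulative += percent
        rows.append((value, counts[value], round(percent, 1), round(cumulative, 1)))
    return rows

# Ages of the 26 pediatric patients from the AIDHC distribution example
ages = [1] * 5 + [2] * 3 + [3] * 7 + [4] * 5 + [5] * 4 + [6] * 2
for row in frequency_table(ages):
    print(row)
```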

Requesting STATISTICS: Descriptive and summary statistics can be requested for numeric variables. To request statistics:

1. From the FREQUENCIES dialog box, click on the STATISTICS pushbutton

2. This will bring up the FREQUENCIES: STATISTICS dialog box

3. The STATISTICS dialog box offers the user a variety of choices

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ DESCRIPTIVES). The procedure offers many of the same statistics as the FREQUENCIES procedure, but without generating frequency tables

Requesting CHARTS: One can request a chart (graph) to be created for a variable or variables included in a FREQUENCIES procedure

1. In the FREQUENCIES dialog box, click on CHARTS

2. The FREQUENCIES: CHARTS dialog box will appear. Choose the intended chart (e.g., bar chart, pie chart, histogram)

Pasting charts into Word:

1. Click on the chart

2. Click on the pull-down menu EDIT ⇒ COPY OBJECTS

3. Go to the Word document in which the chart is to be embedded. Click on EDIT ⇒ PASTE SPECIAL

4. Select Formatted Text (RTF) and then click on OK

5. Enlarge the graph to the desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible, click once on the graph)

BASIC STATISTICAL PROCEDURES: CROSSTABS

1. From the ANALYZE pull-down menu, click on DESCRIPTIVE STATISTICS ⇒ CROSSTABS

2. The CROSSTABS dialog box will then open

3. From the variable selection box on the left, click on a variable you wish to designate as the Row variable. The values (codes) of the Row variable make up the rows of the crosstabs table. Click on the arrow (>) button for Row(s). Next, click on a different variable you wish to designate as the Column variable. The values (codes) of the Column variable make up the columns of the crosstabs table. Click on the arrow (>) button for Column(s)

4. You can specify more than one variable in the Row(s) and/or Column(s). A cross table will be generated for each combination of Row and Column variables
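A cross table is simply a count of every combination of row and column values. This sketch builds one with `collections.Counter`; the gender and group data are hypothetical, and `crosstab` is a hypothetical helper rather than an SPSS command.

```python
from collections import Counter

def crosstab(rows, cols):
    """Count every (row value, column value) combination, like a CROSSTABS table."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    table = {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}
    return table, row_levels, col_levels

# Hypothetical data: gender (row variable) by treatment group (column variable)
gender = ["F", "M", "F", "F", "M", "M", "F", "M"]
group = ["A", "A", "B", "A", "B", "B", "A", "A"]

table, row_levels, col_levels = crosstab(gender, group)
print("     " + "  ".join(col_levels) + "  Total")
for r in row_levels:
    cells = [table[r][c] for c in col_levels]
    print(f"{r:<5}" + "  ".join(str(n) for n in cells) + f"  {sum(cells)}")
```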

Limitations: SPSS users have less control over data manipulation and statistical output than users of other statistical packages such as SAS, Stata, etc.

SPSS is a good first statistical package for quantitative research in social science because it is easy to use and can be a good starting point for learning more advanced statistical packages

Questions

Research Question

Example:

- Does smoking increase the risk of renal carcinoma?

- Is an oral inhaler effective in controlling asthma among children?

Hypothesis Statement

Example:

- Smoking increases the risk of renal carcinoma in pediatric patients

- An oral inhaler is effective in controlling asthma among children

Study Objective

The purpose or aim of the study

Example:

- To assess the risk of renal carcinoma associated with smoking among pediatric patients (primary objective)

- To determine race and gender disparities in the prevalence of smoking (secondary objective)

Study Variable

Refers to a measurement that varies from one individual to another

Example: age, gender, BMI, systolic blood pressure, hematocrit

Outcome vs Independent Variable

Response/outcome variable: Measures the outcome of the study treatment or experimental manipulation

Examples:

- Renal carcinoma incidence among children

- Asthma control in pediatric asthmatic patients

Independent/predictor/explanatory variable: Explains or influences changes in a response variable

Examples:

- Smoking

- Oral inhaler

Scale of Variable/Output Measurement

Nominal: Categorical variables without any order or ranking sequence, such as names or classes (e.g., gender). Binary: two categories; multinomial: more than two categories

Ordinal: Variables with an inherent rank or order, e.g., mild, moderate, severe. Values can be compared for equality or for greater or less, but not for how much greater or less

Interval: Values of the variable are ordered as in Ordinal and, additionally, differences between values are meaningful; however, the scale is not absolutely anchored. Calendar dates and temperatures on the Fahrenheit scale are examples. Addition and subtraction, but not multiplication and division, are meaningful operations

Ratio: Variables with all the properties of Interval plus an absolute, non-arbitrary zero point, e.g., age, weight, temperature (Kelvin). Addition, subtraction, multiplication, and division are all meaningful operations
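A short numeric check of why ratios are not meaningful on an interval scale. The two Fahrenheit temperatures are arbitrary examples; converting them to Kelvin, a ratio scale with an absolute zero, shows that "twice as many degrees Fahrenheit" is not "twice as hot".

```python
def fahrenheit_to_kelvin(f):
    """Convert a Fahrenheit temperature to Kelvin (a ratio scale with an absolute zero)."""
    return (f - 32) * 5 / 9 + 273.15

cool_f, warm_f = 40.0, 80.0

# Interval scale: the difference is meaningful
print(f"difference: {warm_f - cool_f:.0f} degrees F")
# ... but this ratio depends on where the arbitrary zero of the Fahrenheit scale sits
print(f"ratio on the Fahrenheit scale: {warm_f / cool_f:.2f}")   # 2.00
# Ratio scale: with an absolute zero, the ratio is meaningful
kelvin_ratio = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)
print(f"ratio on the Kelvin scale: {kelvin_ratio:.2f}")          # 1.08
```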

Measurement Bias

Bias arises due to measurement error

Example:

- Suppose that, in the case of asthma remission, the possible outcomes are complete remission, partial remission, and no remission. If we measure the outcome variable as only remission and non-remission, we are essentially committing an error by putting partial remission in the non-remission group (type II error)

Designing the Study

A study design is a careful advance plan of data collection and of the analytic approach needed to answer the research question under investigation in a scientific way

The basic elements of a study design:

- Selecting an appropriate sample size for a specified level of power and level of significance

- Selecting methods of sampling, data collection, and analysis appropriate to the study's objectives

Clinical/Experimental versus Observational Design

The Lancet, 2002, Vol. 359

The choice of a design mainly depends on the research question(s) and the type of research conducted (experimental or observational)

Experimental/Interventional: The investigator controls the experimental environment in which the hypothesis is tested. The randomized double-blind clinical trial is the gold standard

Non-experimental/Observational: The population is observed without any interference by the investigator. For example, in a study of the effect of smoking, it is impossible for an investigator to assign smoking to the subjects. Instead, the investigator can study the effect by choosing a control group and assessing the cause-effect relationship. Some examples are:

- Cross-sectional study

- Cohort study

- Case-control study

Randomized Control Design

Random allocation of subjects to different interventions (or treatments) for the purpose of comparing/determining the efficacy of the study treatment(s)

- E.g., a placebo or standard medication (active control) can be used as a control

- Patients with cancer or a painful disease cannot receive a placebo as a control
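Random allocation can be sketched with Python's `random` module. Two common schemes are shown: random allocation with equal group sizes, and permuted-block randomization, which keeps the arms balanced as subjects enroll. The subject IDs, arm names, and block size of 4 are hypothetical; a real trial would use a pre-generated, concealed allocation list.

```python
import random

def randomize(subject_ids, arms=("treatment", "control"), seed=None):
    """Random allocation with equal group sizes: shuffle the subjects,
    then deal them to the arms in turn."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    return {sid: arms[i % len(arms)] for i, sid in enumerate(shuffled)}

def block_randomize(subject_ids, arms=("treatment", "control"), block_size=4, seed=None):
    """Permuted-block randomization: each block of consecutive subjects
    contains every arm equally often, in a random order."""
    if block_size % len(arms):
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    ids = list(subject_ids)
    allocation = {}
    for start in range(0, len(ids), block_size):
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        for sid, arm in zip(ids[start:start + block_size], block):
            allocation[sid] = arm
    return allocation

# Hypothetical trial: 12 enrolled subjects, placebo as the control arm
subjects = [f"S{i:02d}" for i in range(1, 13)]
print(randomize(subjects, arms=("drug", "placebo"), seed=7))
print(block_randomize(subjects, arms=("drug", "placebo"), seed=7))
```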

Blinding: Reduces bias due to preconceptions or personal bias

- Open trial: The investigator and subject know the full details of the treatment

- Single-blind trial: The investigator knows about the treatment but the subject does not

- Double-blind trial: Neither the investigator nor the subject knows about the treatment

- Triple-blind trial: The sponsor, investigator, and subject do not know about the treatment

Distribution of a Variable

Distribution (of a variable): tells us what values the variable takes and how often it takes these values. E.g., the distribution of 26 pediatric patients aged 1 to 6 at AIDHC is as follows:

Age         1   2   3   4   5   6
Frequency   5   3   7   5   4   2

Statistics

The science of data collection, summarization, analysis, and interpretation

Descriptive versus Inferential Statistics

- Descriptive statistics: Data description (summarization), such as center, variability, and shape

- Inferential statistics: Drawing conclusions beyond the sample studied, allowing for prediction

A Taxonomy of Statistics

How Does Statistics Help Us?

[Figure: Age Distribution, a histogram of the number of subjects by age in months (bins from 40 to 140, and More)]

Ages (in months) of the 60 patients in our data set 1 are: 71, 127, 65, 82, 140, 53, 114, 56, 84, 65, 67, 134, 64, … 91, 51

Mean                  90.41666667
Standard Error        3.902649518
Median                84
Mode                  84
Standard Deviation    30.22979318
Sample Variance       913.8403955
Kurtosis              -1.183899591
Skewness              0.389872725
Range                 95
Minimum               48
Maximum               143
Sum                   5425
Count                 60

By simply looking at the raw data, we fail to produce an informative account of it; statistics, however, provides quick insight into the data using graphical and numerical statistical tools

[Figure: Distribution of age; vertical axis: Age (month), 60 to 140]
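The Excel summary table above can be reproduced with Python's `statistics` module. This sketch assumes Excel's sample-adjusted formulas for skewness (SKEW) and excess kurtosis (KURT). Because the full list of 60 ages is elided in these notes, the usage example runs on the 13 listed values only, so its output will not match the table. `describe` is a hypothetical helper name.

```python
import statistics
from math import sqrt

def describe(data):
    """Summary statistics in the style of Excel's Descriptive Statistics output
    (needs at least 4 values for the kurtosis formula)."""
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)               # sample SD (n - 1 denominator)
    z = [(x - mean) / sd for x in data]
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "Mean": mean,
        "Standard Error": sd / sqrt(n),
        "Median": statistics.median(data),
        "Mode": statistics.multimode(data),   # all most-frequent values
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(data),
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": sum(data),
        "Count": n,
    }

# Only the first 13 listed ages; the full 60-value list is elided in the notes
ages = [71, 127, 65, 82, 140, 53, 114, 56, 84, 65, 67, 134, 64]
for name, value in describe(ages).items():
    print(f"{name:<20}{value}")
```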

Statistical Description of Data

Statistics describes a numeric set of data by its:

Center (mean, median, mode, etc.)

Variability (standard deviation, range, etc.)

Shape (skewness, kurtosis, etc.)

Statistics describes a categorical set of data by:

Frequency, percentage, or proportion of each category

Statistical inference is the process by which we acquire information about populations from samples

Two types of estimates for making inferences:

- Point estimation

- Interval estimation

[Figure: Statistical Inference, from the sample to the population]


Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Running the FREQUENCIES procedure

1 Open the data file (from the menus click on FILE OPEN DATA) of rArr rArrinterest

2 From the menus click on ANALYZE DESCRIPTIVE STATISTICS rArr rArrFREQUENCIES

3 The FREQUENCIES Dialog Box will appear In the left-hand box will be a listing (source variable list) of all the variables that have been defined in the data file The first step is identifying the variable(s) for which you want to run a frequency analysis Click on a variable name(s) Then click the [ gt ] pushbutton The variable name(s) will now appear in the VARIABLE[S] box (selected variable list) Repeat these steps for each variable of interest

4 If all that is being requested is a frequency table showing count percentages (raw adjusted and cumulative) then click on OK

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Requesting STATISTICSDescriptive and summary STATISTICS can be requested for numeric variables To request Statistics1 From the FREQUENCIES Dialog Box click on the STATISTICS pushbutton2 This will bring up the FREQUENCIES STATISTICS Dialog Box3 The STATISTICS Dialog Box offers the user a variety of choices

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE DESCRIPTIVE STATISTICS DESCRIPTIVES) The rArr rArrprocedure offers many of the same statistics as the FREQUENCIES procedure but without generating frequency analysis tables

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Requesting CHARTSOne can request a chart (graph) to be created for a variable or variables included in a FREQUENCIES procedure

1 In the FREQUENCIES Dialog box click on CHARTS2 The FREQUENCIES CHARTS Dialog box will appear Choose the intended chart (eg Bar diagram Pie chart histogram

Pasting charts into Word1 Click on the chart2 Click on the pulldown menu EDIT COPY OBJECTSrArr3 Go to the Word document in which the chart is to be embedded Click on EDIT rArr PASTE SPECIAL4 Select Formatted Text (RTF) and then click on OK5 Enlarge the graph to a desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible click once on the graph)

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

BASIC STATISTICAL PROCEDURES CROSSTABS

1 From the ANALYZE pull-down menu click on DESCRIPTIVE STATISTICS rArrCROSSTABS

2 The CROSSTABS Dialog Box will then open

3 From the variable selection box on the left click on a variable you wish to designate as the Row variable The values (codes) for the Row variable make up the rows of the crosstabs table Click on the arrow (gt) button for Row(s) Next click on a different variable you wish to designate as the Column variable The values (codes) for the Column variable make up the columns of the crosstabstable Click on the arrow (gt) button for Column(s)

4 You can specify more than one variable in the Row(s) andor Column(s) A cross table will be generated for each combination of Row and Column variables

Limitations SPSS users have less control over data manipulation and statistical output than other statistical packages such as SAS Stata etc

SPSS is a good first statistical package to perform quantitative research in social science because it is easy to use and because it can be a good starting point to learn more advanced statistical packages

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

QuestionsQuestions

Hypothesis statement

Examples:
– Smoking increases the risk of renal carcinoma in pediatric patients
– Oral inhaler is effective in controlling asthma among children

Study Objective

The purpose or aim of the study.

Examples:
– To assess the risk of renal carcinoma associated with smoking among pediatric patients (primary objective)
– To determine race and gender disparities in the prevalence of smoking (secondary objective)

Study variable

A measurement that varies from one individual to another.

Examples: age, gender, BMI, systolic blood pressure, hematocrit

Outcome vs independent variable

Response/outcome variable: measures the outcome of the study treatment or experimental manipulation.

Examples:
– Renal carcinoma incidence among children
– Asthma control in pediatric asthmatic patients

Independent/predictor/explanatory variable: explains or influences changes in a response variable.

Examples:
– Smoking
– Oral inhaler

Scale of variable/output measurement

Nominal - Categorical variables without any order or ranking sequence, such as names or classes (e.g., gender). Binary: two categories; multinomial: more than two categories.

Ordinal - Variables with an inherent rank or order, e.g., mild, moderate, severe. Values can be compared for equality, or for greater or less, but not for how much greater or less.

Interval - Values of the variable are ordered as in Ordinal, and additionally, differences between values are meaningful; however, the scale is not absolutely anchored (its zero point is arbitrary). Calendar dates and temperatures on the Fahrenheit scale are examples. Addition and subtraction, but not multiplication and division, are meaningful operations.

Ratio - Variables with all the properties of Interval plus an absolute, non-arbitrary zero point, e.g., age, weight, temperature (Kelvin). Addition, subtraction, multiplication, and division are all meaningful operations.
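A small worked example of why ratios need a true zero. This is only a sketch: the temperatures 40 °F and 80 °F are illustrative values (not course data), and the conversion used is the standard Fahrenheit-to-Kelvin formula.

```python
def fahrenheit_to_kelvin(f: float) -> float:
    """Convert a Fahrenheit temperature to Kelvin (a ratio scale with an absolute zero)."""
    return (f - 32.0) * 5.0 / 9.0 + 273.15


low_f, high_f = 40.0, 80.0  # illustrative temperatures

# Differences are meaningful on an interval scale: 80 F is 40 degrees warmer than 40 F.
diff_f = high_f - low_f

# The ratio 80 / 40 = 2 on the Fahrenheit scale does NOT mean "twice as hot",
# because 0 F is an arbitrary zero point.
ratio_f = high_f / low_f

# On the Kelvin scale zero is absolute, so this ratio is physically meaningful.
ratio_k = fahrenheit_to_kelvin(high_f) / fahrenheit_to_kelvin(low_f)

print(f"Difference (F): {diff_f}")
print(f"Ratio on the Fahrenheit scale: {ratio_f:.2f}")
print(f"Ratio on the Kelvin scale: {ratio_k:.3f}")
```

The Kelvin ratio is only slightly above 1, which shows that "twice the Fahrenheit reading" says nothing about twice the amount of heat.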

Measurement bias

Bias that arises from measurement error.

Example:
– Suppose that, in the case of asthma remission, the possible outcomes are complete remission, partial remission, and no remission. If we measure the outcome variable only as remission versus non-remission, we commit an error by putting partial remission in the non-remission group (misclassification).

Designing the study

A study design is a careful advance plan of the data collection and the analytic approach needed to answer the research question under investigation in a scientific way.

The basic elements of a study design:
– Selecting an appropriate sample size for a specified level of power and level of significance
– Selecting methods of sampling, data collection, and analysis appropriate to the study's objectives

Clinical/Experimental versus Observational design

[Figure: experimental versus observational designs. Source: The Lancet, 2002, Vol. 359]

The choice of a design mainly depends on the research question(s) and the type of research conducted (experimental or observational).

Experimental/Interventional: The investigator controls the experimental environment in which the hypothesis is tested. The randomized double-blind clinical trial is the gold standard.

Non-experimental/Observational: The population is observed without any interference by the investigator. For example, in a study of the effect of smoking, it is impossible for an investigator to assign smoking to the subjects. Instead, the investigator can study the effect by choosing a control group and examining the relationship between the cause and the effect. Some examples are:
– Cross-sectional study
– Cohort study
– Case-control study

Randomized controlled design

Random allocation of subjects to different interventions (or treatments) for the purpose of comparing/determining the efficacy of the study treatment(s).
– E.g., a placebo or a standard medication (active control) can be used as a control
– Patients with cancer or a painful disease cannot receive a placebo as a control
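The random allocation described above can be sketched in a few lines. This is a minimal illustration of complete randomization with equal allocation, assuming two hypothetical arms named "treatment" and "placebo", 60 hypothetical subject IDs, and an arbitrary seed; real trials often use block or stratified randomization instead.

```python
import random


def randomize(subject_ids, arms=("treatment", "placebo"), seed=None):
    """Randomly allocate subjects to study arms in (near-)equal numbers.

    The subject list is shuffled and then dealt out to the arms in turn,
    so group sizes differ by at most one.
    """
    rng = random.Random(seed)  # a seeded generator makes the allocation reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    return {sid: arms[i % len(arms)] for i, sid in enumerate(shuffled)}


# Hypothetical example: subjects numbered 1..60, seed chosen arbitrarily.
allocation = randomize(range(1, 61), seed=2008)
for arm in ("treatment", "placebo"):
    print(arm, sum(1 for a in allocation.values() if a == arm))
```

Recording the seed lets the allocation list be regenerated and audited later.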

Blinding: reduces bias due to preconceptions or personal bias.
– Open trial: Investigator and subject know the full details of the treatment
– Single-blind trial: Investigator knows about the treatment, but the subject does not
– Double-blind trial: Neither the investigator nor the subject knows about the treatment
– Triple-blind trial: Sponsor, investigator, and subject do not know about the treatment

Distribution of a variable

Distribution (of a variable) tells us what values the variable takes and how often it takes these values. E.g., the age distribution of 26 pediatric patients aged 1 to 6 at AIDHC is as follows:

Age         1   2   3   4   5   6
Frequency   5   3   7   5   4   2
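The frequency table above can also be summarized numerically. This sketch (Python standard library only) uses just the 26 counts shown and computes the relative frequency of each age plus the mean and median age; the variable names are illustrative.

```python
import statistics

# Frequency table from above: age -> number of patients.
ages = [1, 2, 3, 4, 5, 6]
freqs = [5, 3, 7, 5, 4, 2]

n = sum(freqs)  # total number of patients

# Relative frequency: the share of patients at each age.
for age, f in zip(ages, freqs):
    print(f"age {age}: {f:2d} patients, {100 * f / n:5.1f}%")

# Mean from a frequency table: weight each value by how often it occurs.
mean_age = sum(a * f for a, f in zip(ages, freqs)) / n

# Median: expand the table back into the 26 individual values.
expanded = [a for a, f in zip(ages, freqs) for _ in range(f)]
median_age = statistics.median(expanded)

print("n =", n, "mean =", round(mean_age, 2), "median =", median_age)
```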

Statistics

The science of data collection, summarization, analysis, and interpretation.

Descriptive versus Inferential Statistics
– Descriptive statistics: data description (summarization), such as center, variability, and shape
– Inferential statistics: drawing conclusions beyond the sample studied, allowing for prediction

A Taxonomy of Statistics

How does statistics help us?

[Figure: Age Distribution (histogram; x-axis: Age in Month, bins 40 to 140 and More; y-axis: Number of Subjects)]

Ages (in months) of the 60 patients in our data set 1 are: 71, 127, 65, 82, 140, 53, 114, 56, 84, 65, 67, 134, 64, …, 91, 51

Mean                  90.41666667
Standard Error        3.902649518
Median                84
Mode                  84
Standard Deviation    30.22979318
Sample Variance       913.8403955
Kurtosis              -1.183899591
Skewness              0.389872725
Range                 95
Minimum               48
Maximum               143
Sum                   5425
Count                 60

By simply looking at the raw data, we cannot give an informative account of it; statistics, however, provides quick insight into the data through graphical and numerical tools.

[Figure: Distribution of age (y-axis: Age (month), 60 to 140)]

Statistical Description of Data

Statistics describes a numeric set of data by its:
– Center (mean, median, mode, etc.)
– Variability (standard deviation, range, etc.)
– Shape (skewness, kurtosis, etc.)

Statistics describes a categorical set of data by:
– Frequency, percentage, or proportion of each category
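The center, variability, and shape measures listed above are the same quantities reported in the age summary table earlier (the layout of Excel's Descriptive Statistics output). The sketch below computes them with Python's standard library. Skewness and kurtosis use the bias-adjusted sample formulas behind Excel's SKEW and KURT functions; that these match the Data Analysis tool exactly is an assumption. Only the 15 ages visible in the listing are used (the rest are elided), so the printed values will not reproduce the 60-patient table.

```python
import math
import statistics


def describe(data):
    """Summary statistics in the style of a spreadsheet Descriptive Statistics report.

    Skewness and kurtosis use the adjusted sample formulas (as in Excel's SKEW
    and KURT); kurtosis is 'excess' kurtosis, so a normal distribution gives about 0.
    """
    x = list(data)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations for kurtosis")
    mean = statistics.fmean(x)
    sd = statistics.stdev(x)            # sample SD (divisor n - 1)
    z = [(v - mean) / sd for v in x]    # standardized values
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "Mean": mean,
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(x),
        "Mode": statistics.mode(x),     # Python 3.8+: returns the first mode on ties
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(x),
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(x) - min(x),
        "Minimum": min(x),
        "Maximum": max(x),
        "Sum": sum(x),
        "Count": n,
    }


# Only the ages visible in the listing above (the remaining values are elided).
visible_ages = [71, 127, 65, 82, 140, 53, 114, 56, 84, 65, 67, 134, 64, 91, 51]
for name, value in describe(visible_ages).items():
    print(f"{name:<20}{value}")
```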

Statistical Inference

Statistical inference is the process by which we acquire information about populations from samples.

Two types of estimates for making inferences:
– Point estimation
– Interval estimation

[Figure: statistical inference, from sample to population]

Population and sample

Population: the entire collection of individuals or measurements about which information is desired.

Sample: a subset of the population selected for study.
– The primary objective is to create a subset of the population whose center, spread, and shape are as close as possible to those of the population
– Methods of sampling: random sampling, stratified sampling, systematic sampling, cluster sampling, multistage sampling, area sampling, quota sampling, etc.
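Three of the sampling methods listed above can be sketched with Python's `random` module. The population (60 subject IDs) and the stratum rule (ID modulo 3, giving three groups of 20) are hypothetical, chosen only to make the mechanics visible.

```python
import random


def simple_random_sample(population, n, rng):
    """Simple random sampling: every subset of size n is equally likely."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Systematic sampling: random start, then every k-th unit (k = N // n)."""
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Stratified sampling: split into strata, then draw a simple random sample in each."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample


# Hypothetical population: subject IDs 1..60; stratum = ID % 3 (three groups of 20).
population = list(range(1, 61))
rng = random.Random(1)

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)
strat = stratified_sample(population, lambda sid: sid % 3, 4, rng)
print(srs, sys_sample, strat, sep="\n")
```

Stratifying guarantees each group is represented in fixed numbers, which simple random sampling does not.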

Parameter vs Statistics

Parameter
– Any statistical characteristic of a population
– Population mean, population median, and population standard deviation are examples of parameters
– A parameter describes the distribution of a population
– Parameters are fixed and usually unknown

Statistic: any statistical characteristic of a sample
– Sample mean, sample median, and sample standard deviation are some examples of statistics
– A statistic describes the distribution of a sample
– The value of a statistic is known and varies from sample to sample
– Statistics are used for making inferences about parameters

Statistical issue: to describe the distribution of a population through a census, or to make inferences about the population distribution (population parameters) using the sample distribution (sample statistics).

E.g., the sample mean is an estimate of the population mean.
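A short simulation makes the parameter/statistic distinction concrete: the population mean stays fixed, while the sample mean changes with every sample. The population here is simulated (normal, mean 90, SD 30, loosely echoing the age data) and is purely hypothetical.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of 10,000 ages in months (simulated, not real data).
population = [rng.gauss(90, 30) for _ in range(10_000)]

# Parameter: fixed for this population (in practice it is unknown).
mu = statistics.fmean(population)

# Statistic: the mean of each random sample of 60; it varies from sample to sample.
sample_means = [statistics.fmean(rng.sample(population, 60)) for _ in range(5)]

print(f"population mean (parameter): {mu:.2f}")
for i, m in enumerate(sample_means, 1):
    print(f"sample {i} mean (statistic): {m:.2f}")
```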

Hypothesis Testing

Null hypothesis (Ho) and alternative hypothesis (H1)

                                 Real situation
Decision             Ho is true                  Ho is false
Reject Ho            Type I error (α)            Correct decision (1 - β)
Fail to reject Ho    Correct decision (1 - α)    Type II error (β)

Elements/Steps in hypothesis testing

Hypothesis testing steps:
– 1. Specification of the null (Ho) and alternative (H1) hypotheses
– 2. Selection of the significance level (alpha), e.g., 0.05 or 0.01
– 3. Calculation of the test statistic, e.g., t, F, chi-square
– 4. Calculation of the probability value (p-value) or confidence interval
– 5. Description of the result and statistic in an understandable way
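The five steps above can be traced in a small example. Because Python's standard library provides the normal distribution but not the t distribution, this sketch swaps in a two-sided one-sample z-test, which assumes the population SD is known. The hypothesized mean (84 months) and SD (30) are hypothetical values; the sample mean and n come from the age summary table above.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(xbar, n, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test for a mean, assuming sigma is known.

    Step 1: H0: mu = mu0 versus H1: mu != mu0
    Step 2: significance level alpha
    Step 3: test statistic z = (xbar - mu0) / (sigma / sqrt(n))
    Step 4: two-sided p-value and a (1 - alpha) confidence interval
    """
    se = sigma / sqrt(n)
    z = (xbar - mu0) / se
    std_normal = NormalDist()  # mean 0, SD 1
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    ci = (xbar - z_crit * se, xbar + z_crit * se)
    return z, p_value, ci


# Hypothetical use: mu0 = 84 and sigma = 30 are assumed values.
z, p, ci = one_sample_z_test(xbar=90.42, n=60, mu0=84, sigma=30)

# Step 5: report the result in plain language.
decision = "reject H0" if p < 0.05 else "fail to reject H0"
print(f"z = {z:.2f}, p = {p:.3f}, 95% CI = ({ci[0]:.1f}, {ci[1]:.1f}) -> {decision}")
```

With an unknown SD and a small sample, a t-test (for example from SciPy) would be the usual choice; the steps are the same.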

Point Estimation

• A point estimate draws inference about a population by estimating the value of an unknown parameter using a single value, or point.

[Figure: a point estimator from the sample distribution estimates a parameter of the population distribution]

Interval Estimation

• An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.

[Figure: an interval estimator from the sample distribution brackets a parameter of the population distribution]

P-Value versus the Confidence Interval

Two main ways to assess study precision and the role of chance in a study:
– P-value: measures, as a probability, the strength of evidence against the null hypothesis (the smaller the p-value, the stronger the evidence)
– Confidence interval (CI): an interval, computed from the sample, that is expected to contain the value of the parameter with a specified level of confidence
– E.g., a 95% CI implies that if one repeats a study 100 times, the true measure of association will lie inside the CI in about 95 of the 100 repetitions
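The repeated-study interpretation of a 95% CI can be checked by simulation: draw many samples from a population whose mean is known, build an interval from each, and count how many intervals contain the true mean. The population values (mean 90, SD 30), the sample size of 60, and the known-SD z-interval are all simplifying assumptions for the sketch.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

rng = random.Random(7)
true_mu, sigma, n, reps = 90.0, 30.0, 60, 1000   # hypothetical "true" population values
z_crit = NormalDist().inv_cdf(0.975)              # about 1.96 for a 95% interval
half_width = z_crit * sigma / sqrt(n)             # sigma treated as known

covered = 0
for _ in range(reps):
    sample = [rng.gauss(true_mu, sigma) for _ in range(n)]  # repeat the "study"
    xbar = fmean(sample)
    if xbar - half_width <= true_mu <= xbar + half_width:
        covered += 1

coverage = covered / reps
print(f"{covered} of {reps} intervals contain the true mean ({coverage:.1%})")
```

The coverage lands close to, but rarely exactly at, 95%: the 95% describes the long-run behavior of the procedure, not any single interval.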

Procedures for sample size calculation

– Selection of the primary variables of interest and formulation of hypotheses
– Information on the standard deviation (if numeric) or proportion (if categorical)
– A tolerance level of significance (α)
– Selection of a reasonable test statistic
– Power or confidence level
– A scientifically or clinically meaningful effect (difference)
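The ingredients listed above come together in a sample size formula. As one common example (chosen here for illustration; the slides do not specify a formula), the per-group size for comparing two means with a two-sided test and equal groups, using the normal approximation, is n = 2(z₁₋α/₂ + z₁₋β)²σ²/Δ². The inputs in the usage line (a 15-month difference, SD 30) are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Per-group sample size to compare two means (two-sided test, equal groups).

    Normal-approximation formula:
        n = 2 * (z_{1-alpha/2} + z_{1-beta})**2 * sigma**2 / delta**2
    delta: clinically meaningful difference; sigma: common SD; power = 1 - beta.
    """
    z = NormalDist().inv_cdf
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 * sigma ** 2 / delta ** 2
    return ceil(n)  # always round up to a whole subject


# Hypothetical inputs: detect a 15-month difference in mean age, SD 30, alpha 0.05, power 80%.
print(n_per_group(delta=15, sigma=30))
```

Smaller meaningful differences, larger SDs, stricter α, or higher power all increase the required n.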

Brief concept of Statistical Software

There are many software packages for statistical analysis and data visualization. Some of them are:
– SAS (Statistical Analysis System), S-PLUS, R, MATLAB, Minitab, BMDP, Stata, SPSS, StatXact, Statistica, LISREL, JMP, GLIM, HIL, MS Excel, etc. We will discuss MS Excel and SPSS briefly.

Useful website:
– http://www.R-project.org (free but powerful statistical software)

Microsoft Excel

A spreadsheet application. It features calculation, graphing tools, pivot tables, and a macro programming language called VBA (Visual Basic for Applications).

There are many versions of MS Excel; Excel XP, Excel 2003, and Excel 2007 are capable of performing a number of statistical analyses.

Starting MS Excel: Double-click on the Microsoft Excel icon on the desktop, or click on Start -> Programs -> Microsoft Excel.

Worksheet: Consists of a grid of cells with numbered rows down the page and alphabetically titled columns across the page. Each cell is referenced by its coordinates. For example, A3 refers to the cell in column A and row 3, and B10:B20 refers to the range of cells in column B, rows 10 through 20.
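The cell-referencing scheme described above maps letters to column numbers much like a base-26 count (A = 1, ..., Z = 26, AA = 27). This sketch, with hypothetical helper names `parse_cell` and `cells_in_range`, shows the mapping; it illustrates the convention and is not Excel's own code.

```python
import re


def parse_cell(ref):
    """Convert an A1-style reference such as 'B10' to (column number, row number)."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError(f"not an A1-style cell reference: {ref!r}")
    letters, row = match.group(1).upper(), int(match.group(2))
    col = 0
    for ch in letters:  # column letters count like base 26 with A=1 ... Z=26
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return col, row


def cells_in_range(range_ref):
    """List every (column, row) pair in a rectangular range such as 'B10:B20'."""
    first, last = range_ref.split(":")
    (c1, r1), (c2, r2) = parse_cell(first), parse_cell(last)
    return [(c, r)
            for r in range(min(r1, r2), max(r1, r2) + 1)
            for c in range(min(c1, c2), max(c1, c2) + 1)]


print(parse_cell("A3"))                # column A is column 1, row 3
print(len(cells_in_range("B10:B20")))  # rows 10 through 20 of column B
```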

Creating formulas:
1. Click the cell in which you want to enter the formula.
2. Type = (an equal sign).
3. Click the Function button (fx).
4. Select the formula you want and step through the on-screen instructions.

Opening a document: File -> Open (for an existing workbook). Change the directory, area, or drive to look for files in other locations.

Creating a new workbook: File -> New -> Blank Document

Saving a file: File -> Save

Selecting more than one cell: Click on a cell (e.g., A1), then hold the Shift key and click on another (e.g., D4) to select the cells between A1 and D4; or click on a cell and drag the mouse across the desired range.

Entering date and time: Dates are displayed as MM/DD/YYYY (Excel stores them internally as serial numbers), but there is no need to enter them in that format. For example, Excel will recognize Jan 9 or jan-9 as January 9 of the current year (e.g., 1/9/2007) and Jan 9 1999 as 1/9/1999. To enter today's date, press Ctrl and ; together. Use a or p to indicate am or pm; for example, 8:30 p is interpreted as 8:30 pm. To enter the current time, press Ctrl, Shift, and ; together (Ctrl+:).

Copy and paste all cells in a sheet: Ctrl+A for selecting, Ctrl+C for copying, and Ctrl+V for pasting.

Sorting data: Data -> Sort -> Sort by …

Descriptive statistics and other statistical methods: Tools -> Data Analysis -> (statistical method). If Data Analysis is not available, click on Tools -> Add-Ins and then select Analysis ToolPak and Analysis ToolPak - VBA.

Statistical and mathematical functions: Start with the '=' sign and then select a function from the function wizard (fx).

Inserting a chart: Click on Chart Wizard (or Insert -> Chart), select a chart type, give the input data range, update the chart options, and select the output range/worksheet.

Importing data into Excel: File -> Open -> File Type -> click on the file -> choose an option (Delimited/Fixed Width) -> choose delimiters (Tab, Semicolon, Comma, Space, Other) -> Finish.

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.
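The kind of rounding problem mentioned in the Limitations line can be reproduced with a textbook example. This sketch does not show Excel's actual algorithms; it compares the one-pass "computational" variance formula, which subtracts two huge and nearly equal numbers, with the two-pass formula and with Python's exact `statistics.variance`, on four hypothetical values whose true sample variance is 30.

```python
import statistics


def naive_variance(data):
    """One-pass formula: (sum x^2 - (sum x)^2 / n) / (n - 1).

    Algebraically correct, but it subtracts two huge, nearly equal numbers,
    so floating-point rounding can wipe out every significant digit.
    """
    n = len(data)
    s = sum(data)
    sq = sum(v * v for v in data)
    return (sq - s * s / n) / (n - 1)


def two_pass_variance(data):
    """Two-pass formula: compute the mean first, then sum squared deviations."""
    n = len(data)
    mean = sum(data) / n
    return sum((v - mean) ** 2 for v in data) / (n - 1)


# Four hypothetical values with a large common offset; the true sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print("naive:    ", naive_variance(data))
print("two-pass: ", two_pass_variance(data))
print("reference:", statistics.variance(data))
```

The naive result is not even close to 30, and can come out negative, while the two-pass and reference results agree.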

Statistical Package for the Social Sciences (SPSS)

A general-purpose statistical package, SPSS is widely used in the social sciences, particularly in sociology and psychology.

SPSS can import data from almost any type of file to generate tabulated reports, plots of distributions and trends, descriptive statistics, and complex statistical analyses.

Starting SPSS: Double-click on SPSS on the desktop, or Programs -> SPSS.

Opening an SPSS file: File -> Open

MENUS AND TOOLBARS

• Data Editor

Various pull-down menus appear at the top of the Data Editor window. These pull-down menus are at the heart of using SPSSWIN. The Data Editor menu items (with some of their uses) are:

FILE: used to open and save data files
EDIT: used to copy and paste data values, find data in a file, and insert variables and cases; OPTIONS allows the user to set general preferences as well as the setup for the Navigator, Charts, etc.
VIEW: the user can change toolbars; value labels can be seen in cells instead of data values
DATA: select, sort, or weight cases; merge files
TRANSFORM: compute new variables, recode variables, etc.
ANALYZE: perform various statistical procedures
GRAPHS: create bar and pie charts, etc.
UTILITIES: add comments to accompany the data file (and other advanced features)
ADD-ONS: features not currently installed (advanced statistical procedures)
WINDOW: switch between the data, syntax, and navigator windows
HELP: access SPSSWIN help information

Navigator (Output) Menus

When statistical procedures are run or charts are created, the output will appear in the Navigator window. The Navigator window contains many of the pull-down menus found in the Data Editor window. Some of the important menus in the Navigator window include:

INSERT: used to insert page breaks, titles, charts, etc.
FORMAT: used to change the alignment of a particular portion of the output

• Formatting Toolbar

When a table has been created by a statistical procedure, the user can edit the table to create a desired look or add/delete information. Beginning with version 14.0, the user has a choice of editing the table in the Output window or opening it in a separate Pivot Table (DEFINE) window. Various pull-down menus are activated when the user double-clicks on the table. These include:

EDIT: undo and redo a pivot; select a table or table body (e.g., to change the font)
INSERT: used to insert titles, captions, and footnotes
PIVOT: used to perform a pivot of the row and column variables
FORMAT: various modifications can be made to tables and cells

Importing tab-delimited data: In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. Look in the appropriate location for the text file. Then select “Text” from “Files of type”. Click on the file name and then click on “Open”. You will see the Text Import Wizard – step 1 of 6 dialog box.

You will now have an SPSS data file containing the former tab-delimited data. You simply need to add variable and value labels and define missing values.

Exporting data to Excel: Click on FILE ⇒ SAVE AS. Click on the file name for the file to be exported. For “Save as type”, select Excel (*.xls) from the pull-down menu. You will notice the checkbox for “Write variable names to spreadsheet”. Leave this checked, as you will want the variable names to be in the first row of each column in the Excel spreadsheet. Finally, click on Save.

• Additional menus

CHART EDITOR: used to edit a graph
SYNTAX EDITOR: used to edit the text in a syntax window

• Show or hide a toolbar

Click on VIEW ⇒ TOOLBARS, then check the toolbar's box to show it or uncheck it to hide it.

• Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location.

• Customize a toolbar

Click on VIEW ⇒ TOOLBARS ⇒ CUSTOMIZE

Importing data from an Excel spreadsheet

Data from an Excel spreadsheet can be imported into SPSSWIN as follows:
1. In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. The OPEN DATA FILE dialog box will appear.
2. Locate the file of interest. Use the Look In pull-down list to identify the folder containing the Excel file of interest.
3. From the FILE TYPE pull-down menu, select EXCEL (*.xls).
4. Click on the file name of interest and click on OPEN, or simply double-click on the file name.
5. Keep the box checked that reads “Read variable names from the first row of data”. This presumes that the first row of the Excel data file contains variable names. [If the data resided in a different worksheet in the Excel file, this would need to be entered.]
6. Click on OK. The Excel data file will now appear in the SPSSWIN Data Editor.
7. The former Excel spreadsheet can now be saved as an SPSS file (FILE ⇒ SAVE AS) and is ready to be used in analyses. Typically, you would label variables and values and define missing values.

Importing an Access table: SPSSWIN does not offer a direct import for Access tables. Therefore, follow these steps:
1. Open the Access file.
2. Open the data table.
3. Save the data as an Excel file.
4. Follow the steps outlined for importing data from an Excel spreadsheet into SPSSWIN.

Importing text files into SPSSWIN: Text data points are typically separated (or “delimited”) by tabs or commas. Sometimes they can be of fixed format.
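The difference between delimited and fixed-format text can be sketched outside SPSS with Python's standard library. The file contents, the column positions of the fixed-format layout, and the rule that a blank field means a missing value are all hypothetical; SPSS's Text Import Wizard asks for the same kinds of information (delimiter or column positions, variable names in the first row, missing values).

```python
import csv
import io

# Hypothetical tab-delimited text: variable names in the first row, one case per line,
# and an empty field standing for a missing value.
tab_text = "id\tage\tgender\n1\t71\tF\n2\t\tM\n3\t65\tF\n"

reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
rows = [
    {**row, "age": int(row["age"]) if row["age"] else None}  # blank -> missing (None)
    for row in reader
]

# Hypothetical fixed-format text: fields are defined by character positions,
# here id in columns 1-3, age in columns 4-6, gender in column 7.
fixed_text = "  1 71F\n  2   M\n  3 65F\n"
FIELDS = {"id": (0, 3), "age": (3, 6), "gender": (6, 7)}

fixed_rows = []
for line in fixed_text.splitlines():
    rec = {name: line[a:b].strip() for name, (a, b) in FIELDS.items()}
    rec["age"] = int(rec["age"]) if rec["age"] else None
    fixed_rows.append(rec)

print(rows)
print(fixed_rows)
```

Both layouts yield the same cases; only the rule for locating each field differs.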

Running the FREQUENCIES procedure

1. Open the data file of interest (from the menus, click on FILE ⇒ OPEN ⇒ DATA).
2. From the menus, click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ FREQUENCIES.
3. The FREQUENCIES dialog box will appear. The left-hand box lists (source variable list) all the variables that have been defined in the data file. The first step is to identify the variable(s) for which you want to run a frequency analysis. Click on a variable name(s), then click the [ > ] pushbutton. The variable name(s) will now appear in the VARIABLE(S) box (selected variable list). Repeat these steps for each variable of interest.
4. If all that is being requested is a frequency table showing counts and percentages (raw, adjusted, and cumulative), then click on OK.
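The three percentages in a frequency table differ only in their denominators. In this sketch, "raw" percent divides by all cases, "adjusted" (valid) percent leaves out missing values, and cumulative percent accumulates the adjusted percentages; this is assumed to correspond to SPSS's Percent, Valid Percent, and Cumulative Percent columns. The severity codes are hypothetical.

```python
from collections import Counter


def frequency_table(values, missing=(None,)):
    """Counts with raw, adjusted (valid), and cumulative percentages.

    Raw percent uses all cases; adjusted percent leaves out missing values;
    cumulative percent accumulates the adjusted percentages.
    """
    total = len(values)
    valid = [v for v in values if v not in missing]
    counts = Counter(valid)
    table, cumulative = [], 0.0
    for value in sorted(counts):
        count = counts[value]
        raw = 100 * count / total
        adjusted = 100 * count / len(valid)
        cumulative += adjusted
        table.append((value, count, raw, adjusted, cumulative))
    return table, total - len(valid)


# Hypothetical severity codes (1 = mild, 2 = moderate, 3 = severe); None marks a missing value.
severity = [1, 2, 2, 3, 1, None, 2, 3, 2, None]
table, n_missing = frequency_table(severity)
print("value  count  percent  valid%  cum%")
for value, count, raw, adjusted, cum in table:
    print(f"{value:>5}  {count:>5}  {raw:>7.1f}  {adjusted:>6.1f}  {cum:>5.1f}")
print("missing:", n_missing)
```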

Requesting STATISTICS: Descriptive and summary STATISTICS can be requested for numeric variables. To request statistics:
1. From the FREQUENCIES dialog box, click on the STATISTICS pushbutton.
2. This will bring up the FREQUENCIES: STATISTICS dialog box.
3. The STATISTICS dialog box offers the user a variety of choices.

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ DESCRIPTIVES). The procedure offers many of the same statistics as the FREQUENCIES procedure, but without generating frequency tables.

Requesting CHARTS: One can request a chart (graph) to be created for a variable or variables included in a FREQUENCIES procedure.
1. In the FREQUENCIES dialog box, click on CHARTS.
2. The FREQUENCIES: CHARTS dialog box will appear. Choose the intended chart (e.g., bar chart, pie chart, histogram).

Pasting charts into Word:
1. Click on the chart.
2. Click on the pull-down menu EDIT ⇒ COPY OBJECTS.
3. Go to the Word document in which the chart is to be embedded. Click on EDIT ⇒ PASTE SPECIAL.
4. Select Formatted Text (RTF) and then click on OK.
5. Enlarge the graph to the desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible, click once on the graph).

BASIC STATISTICAL PROCEDURES: CROSSTABS

1. From the ANALYZE pull-down menu, click on DESCRIPTIVE STATISTICS ⇒ CROSSTABS.
2. The CROSSTABS dialog box will then open.
3. From the variable selection box on the left, click on a variable you wish to designate as the Row variable. The values (codes) for the Row variable make up the rows of the crosstabs table. Click on the arrow (>) button for Row(s). Next, click on a different variable you wish to designate as the Column variable. The values (codes) for the Column variable make up the columns of the crosstabs table. Click on the arrow (>) button for Column(s).
4. You can specify more than one variable in the Row(s) and/or Column(s). A cross table will be generated for each combination of Row and Column variables.
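What CROSSTABS computes is a count for every combination of a row code and a column code. This sketch builds such a table with `collections.Counter`; the smoking and gender data are hypothetical, and the helper name `crosstab` is illustrative rather than part of any package.

```python
from collections import Counter


def crosstab(rows, cols):
    """Cross-tabulate two paired lists of codes: count each (row, column) combination."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table


# Hypothetical data: smoking status (row variable) by gender (column variable), one entry per subject.
smoking = ["yes", "no", "no", "yes", "no", "no", "yes", "no"]
gender = ["F", "F", "M", "M", "M", "F", "F", "M"]

row_levels, col_levels, table = crosstab(smoking, gender)
print("smoking", *col_levels, "total", sep="\t")
for level, counts in zip(row_levels, table):
    print(level, *counts, sum(counts), sep="\t")
```

Listing several row or column variables in SPSS amounts to repeating this for each (row, column) pair.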

Limitations: SPSS users have less control over data manipulation and statistical output than users of other statistical packages such as SAS, Stata, etc.

SPSS is a good first statistical package for quantitative research in social science because it is easy to use and can be a good starting point for learning more advanced statistical packages.

Questions

Study ObjectiveStudy Objective

The purpose or aim of the study

Example-

ndash To assess the risk of renal carcinoma associated with

smoking among pediatric patients (primary objective)

ndash To determine the race and gender disparities in the

prevalence of smoking (secondary objective)

Study variableStudy variable

Refers to measurement that changes from one individual to

another

Example- age gender BMI Systolic blood pressure

hematocrit

Outcome vs independent variableOutcome vs independent variable

Responseoutcome variable Measures the outcome of the study treatment or experimental manipulation

Examples-ndash Renal carcinoma incident among children

ndash Asthma control in pediatric asthmatic patients

Independent predictorexplanatory variable Explains or influences changes in a response variable

Examples-ndash Smoking

ndash Oral inhaler

Scale of variableoutput measurementScale of variableoutput measurement

Nominal - Categorical variables without any order or

ranking sequence such as names or classes (eg gender)

Binary- two categories multinomial- more than two

categories

Ordinal - Variables with an inherent rank or order eg

mild moderate severe Can be compared for equality or

greater or less but not how much greater or less

Scale of variableoutput measurementScale of variableoutput measurement

Interval - Values of the variable are ordered as in Ordinal and

additionally differences between values are meaningful however the

scale is not absolutely anchored Calendar dates and temperatures on

the Fahrenheit scale are examples Addition and subtraction but not

multiplication and division are meaningful operations

Ratio - Variables with all properties of Interval plus an absolute non-

arbitrary zero point eg age weight temperature (Kelvin) Addition

subtraction multiplication and division are all meaningful operations

Measurement biasMeasurement bias

Bias arises due to measurement error

Example-

ndash Suppose In the case of remission of Asthma the possible

outcomes are complete remission partial remission and no

remission If we measure the outcome variable as only remission

and non-remission basically we are committing an error by

putting partial remission in the non-remission group (type II error)

Designing the studyDesigning the study

A study design is a careful advance plan of data collection

and the analytic approach needed to answer the research

question under investigation in a scientific way

The basic elements of a study design-

ndash Selecting an appropriate sample size for a specified

level of power and level of significance

ndash Selecting methods of sampling data collection and

analysis appropriate to the studys objectives

ClinicalExperimental versus Observational designClinicalExperimental versus Observational design

The Lancet 2002 Vol 359

Clinical/Experimental vs Observational design

The choice of a design mainly depends on the research question(s) and the type of research conducted (experimental or observational).

Experimental (interventional): The investigator controls the experimental environment in which the hypothesis is tested. The randomized double-blind clinical trial is the gold standard.


Non-experimental (observational): The population is observed without any interference by the investigator.

For example, in a study of the effect of smoking, it is impossible for an investigator to assign smoking to the subjects. Instead, the investigator can study the effect by choosing a control group and examining the cause-and-effect relationship. Some examples are:
- Cross-sectional study
- Cohort study
- Case-control study

Randomized control design

Random allocation of subjects to different interventions (or treatments) for the purpose of comparing/determining the efficacy of the study treatment(s)
- E.g., a placebo or a standard medication (active control) can be used as a control
- Patients with cancer or a painful disease cannot receive a placebo as a control
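Random allocation is usually generated in advance as a schedule. A minimal Python sketch of simple randomization and permuted-block randomization (which keeps group sizes balanced as subjects enroll); the arm labels, block size of 4 and seed are illustrative assumptions, and a real trial would also conceal the schedule from recruiters.

```python
import random


def simple_randomization(n_subjects, arms=("treatment", "control"), seed=None):
    """Assign each subject independently, with equal probability, to an arm."""
    rng = random.Random(seed)
    return [rng.choice(arms) for _ in range(n_subjects)]


def block_randomization(n_subjects, arms=("treatment", "control"), block_size=4, seed=None):
    """Permuted-block randomization: every complete block contains each arm
    equally often, so group sizes stay balanced throughout enrollment."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_subjects:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)  # random order within the block
        allocation.extend(block)
    return allocation[:n_subjects]


schedule = block_randomization(12, seed=2008)
print(schedule)
print({arm: schedule.count(arm) for arm in ("treatment", "control")})
```

Simple randomization can leave groups unequal in small trials; blocking trades a little predictability for guaranteed balance.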


Blinding: Reduces bias due to preconceptions or personal bias
- Open trial: Investigator and subject know the full details of the treatment
- Single-blind trial: Investigator knows about the treatment, but the subject does not
- Double-blind trial: Neither the investigator nor the subject knows about the treatment
- Triple-blind trial: Neither the sponsor, the investigator nor the subject knows about the treatment

Distribution of a variable

Distribution (of a variable): tells us what values the variable takes and how often it takes these values. E.g., the age distribution of 26 pediatric patients aged 1 to 6 at AIDHC is as follows:

Age        1  2  3  4  5  6
Frequency  5  3  7  5  4  2
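The table above can be extended with relative and cumulative frequencies. A minimal Python sketch using only the counts on the slide; it confirms the total of 26 patients and computes the mean age from the grouped data. The column layout is an illustrative choice.

```python
ages = [1, 2, 3, 4, 5, 6]
frequencies = [5, 3, 7, 5, 4, 2]

n = sum(frequencies)  # total number of patients
print(f"n = {n}")
print("Age  Freq  Rel.freq  Cum.freq")
cumulative = 0
for age, freq in zip(ages, frequencies):
    cumulative += freq
    print(f"{age:>3}  {freq:>4}  {freq / n:>8.3f}  {cumulative:>8}")

# Mean age from grouped data: sum(age * frequency) / n
mean_age = sum(a * f for a, f in zip(ages, frequencies)) / n
print(f"mean age = {mean_age:.2f}")
```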

Statistics

Science of data collection, summarization, analysis and interpretation

Descriptive versus Inferential Statistics
- Descriptive statistics: data description (summarization), such as center, variability and shape
- Inferential statistics: drawing conclusions beyond the sample studied, allowing for prediction

A Taxonomy of Statistics

How does statistics help us?

[Figure: Age Distribution; x-axis: Age in Month (bins 40, 60, 80, 100, 120, 140, More); y-axis: Number of Subjects (0 to 16)]

Ages (in months) of the 60 patients in our data set 1 are: 71, 127, 65, 82, 140, 53, 114, 56, 84, 65, 67, 134, 64, …, 91, 51

Mean                 90.41666667
Standard Error        3.902649518
Median               84
Mode                 84
Standard Deviation   30.22979318
Sample Variance     913.8403955
Kurtosis             -1.183899591
Skewness              0.389872725
Range                95
Minimum              48
Maximum             143
Sum                5425
Count                60

Simply looking at the raw data does not give an informative account of it; statistics, however, give quick insight into the data through graphical and numerical tools.

[Figure: Distribution of age; y-axis: Age (month), 60 to 140]

Statistical Description of Data

Statistics describes a numeric set of data by its:
- Center (mean, median, mode, etc.)
- Variability (standard deviation, range, etc.)
- Shape (skewness, kurtosis, etc.)

Statistics describes a categorical set of data by:
- Frequency, percentage or proportion of each category
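The center, variability and shape summaries listed above can be reproduced outside Excel. A minimal Python sketch, assuming the sample-adjusted skewness and excess-kurtosis formulas that Excel documents for its SKEW and KURT functions; the describe() helper is hypothetical, and because only 13 of the 60 ages are listed on the slide, its output will not match the full-sample table.

```python
import math
import statistics


def describe(data):
    """Summary statistics in the layout of Excel's Descriptive Statistics output.

    Skewness and kurtosis use the sample-adjusted formulas documented for
    Excel's SKEW and KURT functions (kurtosis is excess kurtosis)."""
    n = len(data)
    if n < 4:
        raise ValueError("need at least 4 values for the kurtosis formula")
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample SD (n - 1 in the denominator)
    z = [(x - mean) / sd for x in data]  # standardized values
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "Mean": mean,
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(data),
        "Mode": statistics.mode(data),
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(data),
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": sum(data),
        "Count": n,
    }


# Only the first 13 ages listed on the slide (the full 60 are not shown).
first_ages = [71, 127, 65, 82, 140, 53, 114, 56, 84, 65, 67, 134, 64]
for name, value in describe(first_ages).items():
    print(f"{name:<20}{value:.4f}" if isinstance(value, float) else f"{name:<20}{value}")

# Consistency check on the slide's full-sample output: SE = SD / sqrt(n).
print(30.22979318 / math.sqrt(60))
```

For a categorical variable the corresponding summary is just a count per category, e.g. collections.Counter over the values.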

Statistical Inference

Statistical inference is the process by which we acquire information about populations from samples.

Two types of estimates for making inferences:
- Point estimation
- Interval estimation

[Figure: Statistical Inference, from Sample to Population]

Population and sample

Population: The entire collection of individuals or measurements about which information is desired

Sample: A subset of the population selected for study
- The primary objective is to create a subset of the population whose center, spread and shape are as close as possible to those of the population
- Methods of sampling: random sampling, stratified sampling, systematic sampling, cluster sampling, multistage sampling, area sampling, quota sampling, etc.
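The sampling methods listed above differ in how units are drawn. A minimal Python sketch of three of them (simple random, systematic and stratified sampling), assuming a hypothetical population of 60 subject IDs split into three groups of 20, echoing the default class dataset; the helper names are illustrative.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Every subset of size n has the same chance of being selected."""
    return random.Random(seed).sample(population, n)


def systematic_sample(population, n, seed=None):
    """Pick a random start, then take every k-th unit (k = N // n)."""
    k = len(population) // n
    start = random.Random(seed).randrange(k)
    return population[start::k][:n]


def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Split the population into strata and draw a simple random
    sample of n_per_stratum units from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample


def group_of(subject_id):
    """Hypothetical stratum: subjects 1-20 -> 0, 21-40 -> 1, 41-60 -> 2."""
    return (subject_id - 1) // 20


subjects = list(range(1, 61))
print(simple_random_sample(subjects, 6, seed=1))
print(systematic_sample(subjects, 6, seed=1))
print(stratified_sample(subjects, group_of, 2, seed=1))
```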

Parameter vs Statistics

Parameter
- Any statistical characteristic of a population
- Population mean, population median and population standard deviation are examples of parameters
- A parameter describes the distribution of a population
- Parameters are fixed and usually unknown


Statistic: Any statistical characteristic of a sample
- Sample mean, sample median and sample standard deviation are some examples of statistics
- A statistic describes the distribution of a sample
- The value of a statistic is known and varies from sample to sample
- Statistics are used for making inferences about parameters
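The contrast between a fixed parameter and a statistic that varies from sample to sample shows up in a small simulation. A sketch assuming a hypothetical, normally distributed population of 10,000 ages (mean 90, SD 30 months): the population mean stays the same, while the mean of each sample of 60 differs.

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical population of 10,000 ages (months); its mean is the parameter.
population = [rng.gauss(90, 30) for _ in range(10_000)]
parameter = statistics.mean(population)  # fixed, but unknown in practice

# Draw five independent samples of 60 and compute the statistic for each.
sample_means = [statistics.mean(rng.sample(population, 60)) for _ in range(5)]

print(f"population mean (parameter): {parameter:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic): {m:.2f}")
```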


Statistical issue: describing the distribution of a population through a census, or making inferences about the population distribution (population parameters) using the sample distribution (statistics).

E.g., the sample mean is an estimate of the population mean.

Hypothesis Testing

Null hypothesis and alternative hypothesis

                      Real situation
Decision      H0 is true                  H0 is false
Reject H0     Type I error (α)            Correct decision (1 - β)
Accept H0     Correct decision (1 - α)    Type II error (β)

Elements/Steps in hypothesis testing

Hypothesis testing steps:
1. Specify the null (H0) and alternative (H1) hypotheses
2. Select the significance level (alpha), e.g., 0.05 or 0.01
3. Calculate the test statistic, e.g., t, F, chi-square
4. Calculate the probability value (p-value) or confidence interval
5. Describe the result and the statistic in an understandable way
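The five steps can be traced in code. The slide names t, F and chi-square statistics; this minimal Python sketch swaps in the simpler one-sample z-test, which assumes the population standard deviation is known, so that only the standard library is needed. The null value of 80 months is hypothetical, and the mean and SD are taken from the age summary on the earlier slide.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample z-test of H0: mu = mu0 vs H1: mu != mu0,
    assuming the population SD (sigma) is known."""
    # Step 3: test statistic
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Step 4: two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject = p_value < alpha
    return z, p_value, reject


# Steps 1-2: H0: mu = 80 months vs H1: mu != 80, alpha = 0.05 (hypothetical)
z, p, reject = one_sample_z_test(sample_mean=90.42, mu0=80, sigma=30.23, n=60)
# Step 5: report the result in plain language
print(f"z = {z:.2f}, p = {p:.4f}; " + ("reject H0" if reject else "do not reject H0"))
```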

Point Estimation

A point estimate draws inference about a population by estimating the value of an unknown parameter using a single value or point.

[Figure labels: sample distribution, point estimator, population distribution, parameter]

Interval Estimation

An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.

[Figure labels: sample distribution, interval estimator, population distribution, parameter]

P-Value versus the Confidence Interval

Two main ways to assess study precision and the role of chance in a study:
- P-value: the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed; it measures the evidence against the null hypothesis
- Confidence interval (CI): an interval, computed from the sample, that is constructed to contain the value of the parameter with a specified probability (the confidence level)
- E.g., a 95% CI implies that if one repeats a study 100 times, the intervals obtained will contain the true measure of association in about 95 of the 100 repetitions
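A minimal Python sketch of both ideas, using the normal approximation (mean ± 1.96 × SD/√n) rather than a t-based interval, which with n = 60 would be slightly wider. It first computes a 95% CI from the age summary on the earlier slide, then repeats a simulated "study" 2,000 times with a known population mean to show that roughly 95% of the intervals contain it. The simulation settings are assumptions.

```python
import math
import random
from statistics import NormalDist, mean


def z_confidence_interval(sample_mean, sd, n, level=0.95):
    """Normal-approximation CI for a mean: mean +/- z * SD / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Age data summary from the slides: mean 90.42, SD 30.23, n = 60
low, high = z_confidence_interval(90.41666667, 30.22979318, 60)
print(f"95% CI for mean age: ({low:.2f}, {high:.2f}) months")

# Coverage check: repeat a study many times and count how often the
# 95% interval contains the true mean (sigma treated as known).
rng = random.Random(2008)
true_mu, sigma, n, reps = 90.0, 30.0, 60, 2000
hits = 0
for _ in range(reps):
    sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
    lo, hi = z_confidence_interval(mean(sample), sigma, n)
    hits += lo <= true_mu <= hi
print(f"coverage: {hits / reps:.3f}")  # close to 0.95
```

Any single interval either contains the true value or not; the 95% describes the long-run behavior of the procedure.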

Procedures for sample size calculation

- Selection of primary variables of interest and formulation of hypotheses
- Information on the standard deviation (if numeric) or proportion (if categorical)
- A tolerance level of significance (α)
- Selection of a reasonable test statistic
- Power or confidence level
- A scientifically or clinically meaningful effect (difference)
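As one worked case of these ingredients: for comparing two means with equal group sizes and a common SD, a standard normal-approximation formula is n per group = 2(z_{1-α/2} + z_{1-β})² σ² / Δ². A minimal Python sketch with hypothetical inputs (SD 10, clinically meaningful difference 5, α = 0.05, power 0.80):

```python
import math
from statistics import NormalDist


def n_per_group_two_means(sd, difference, alpha=0.05, power=0.80):
    """Sample size per group for comparing two means (two-sided test,
    equal group sizes, common SD, normal approximation):
        n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / difference^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / difference ** 2
    return math.ceil(n)  # round up to a whole subject


# Hypothetical inputs: SD = 10, clinically meaningful difference = 5
print(n_per_group_two_means(sd=10, difference=5))  # 63 per group
```

Raising power to 0.90 increases the requirement to 85 per group; other designs (proportions, paired data, unequal groups) need different formulas.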

Brief concept of Statistical Software

There are many software packages to perform statistical analysis and visualization of data. Some of them are:
- Statistical Analysis System (SAS), S-PLUS, R, MATLAB, Minitab, BMDP, Stata, SPSS, StatXact, Statistica, LISREL, JMP, GLIM, HIL, MS Excel, etc. We will discuss MS Excel and SPSS in brief.

Useful websites:
http://www.R-project.org (free but powerful statistical software)

Microsoft Excel

A spreadsheet application. It features calculation, graphing tools, pivot tables and a macro programming language called VBA (Visual Basic for Applications).

There are many versions of MS Excel; Excel XP, Excel 2003 and Excel 2007 are capable of performing a number of statistical analyses.

Starting MS Excel: Double-click the Microsoft Excel icon on the desktop, or click Start --> Programs --> Microsoft Excel.

Worksheet: Consists of a grid of cells with numbered rows down the page and alphabetically titled columns across the page. Each cell is referenced by its coordinates. For example, A3 refers to the cell in column A and row 3; B10:B20 refers to the range of cells in column B, rows 10 through 20.


Creating formulas:
1. Click the cell in which you want to enter the formula.
2. Type = (an equal sign).
3. Click the Function (fx) button.
4. Select the formula you want and step through the on-screen instructions.

Opening a document: File --> Open (from an existing workbook). Change the directory, area or drive to look for files in other locations.

Creating a new workbook: File --> New --> Blank Document

Saving a file: File --> Save

Selecting more than one cell: Click a cell (e.g., A1), then hold the Shift key and click another (e.g., D4) to select the cells between A1 and D4; or click a cell and drag the mouse across the desired range.


Entering date and time: Excel stores dates as serial numbers and displays them in a date format such as MM/DD/YYYY, so there is no need to type dates in that format. For example, Excel will recognize Jan 9 or jan-9 as 1/9/2007 (it supplies the current year) and Jan 9 1999 as 1/9/1999. To enter today's date, press Ctrl and ; together. Use a or p to indicate AM or PM; for example, 8:30 p is interpreted as 8:30 PM. To enter the current time, press Ctrl, Shift and ; together.

Copy and paste all cells in a sheet: Ctrl+A to select, Ctrl+C to copy and Ctrl+V to paste.

Sorting data: Data --> Sort --> Sort by …

Descriptive statistics and other statistical methods: Tools --> Data Analysis --> (statistical method). If Data Analysis is not available, click Tools --> Add-Ins and then select Analysis ToolPak and Analysis ToolPak - VBA.


Statistical and mathematical functions: Start with the = sign and then select a function from the Function Wizard (fx).

Inserting a chart: Click the Chart Wizard (or Insert --> Chart), select a chart type, give the input data range, update the chart options and select the output location (worksheet).

Importing data into Excel: File --> Open, choose the file type and click the file; choose the file option (Delimited / Fixed Width); choose the delimiter options (Tab, Semicolon, Comma, Space, Other); then click Finish.

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.
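The rounding problem is easy to reproduce. This Python sketch contrasts the one-pass "computational" variance formula with a two-pass formula on data that have a small spread around a large value; Python floats, like Excel's numbers, are IEEE 754 double-precision values. It illustrates the kind of error meant above and is not a description of Excel's internal algorithms.

```python
def naive_variance(data):
    """One-pass textbook formula: (sum x^2 - (sum x)^2 / n) / (n - 1).
    Subtracting two huge, nearly equal numbers loses precision."""
    n = len(data)
    s, ss = sum(data), sum(x * x for x in data)
    return (ss - s * s / n) / (n - 1)


def two_pass_variance(data):
    """Compute the mean first, then average the squared deviations."""
    n = len(data)
    m = sum(data) / n
    return sum((x - m) ** 2 for x in data) / (n - 1)


# Small spread around a large offset; the exact sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print("naive:   ", naive_variance(data))
print("two-pass:", two_pass_variance(data))
```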

Statistical Package for the Social Sciences (SPSS)

A general-purpose statistical package. SPSS is widely used in the social sciences, particularly in sociology and psychology.

SPSS can import data from almost any type of file to generate tabulated reports, plots of distributions and trends, descriptive statistics and complex statistical analyses.

Starting SPSS: Double-click SPSS on the desktop, or go to Programs ⇒ SPSS.

Opening an SPSS file: File ⇒ Open

• Data Editor

Various pull-down menus appear at the top of the Data Editor window. These pull-down menus are at the heart of using SPSSWIN. The Data Editor menu items (with some of the uses of each menu) are:

MENUS AND TOOLBARS


FILE: used to open and save data files

EDIT: used to copy and paste data values, find data in a file, and insert variables and cases. OPTIONS (under EDIT) allows the user to set general preferences as well as the setup for the Navigator, Charts, etc.

VIEW: the user can change toolbars; value labels can be shown in cells instead of data values

DATA: select, sort or weight cases; merge files


TRANSFORM: compute new variables, recode variables, etc.

ANALYZE: perform various statistical procedures

GRAPHS: create bar and pie charts, etc.

UTILITIES: add comments to accompany the data file (and other advanced features)

ADD-ONS: features not currently installed (advanced statistical procedures)

WINDOW: switch between the data, syntax and Navigator windows

HELP: access SPSSWIN Help information


Navigator (Output) Menus

When statistical procedures are run or charts are created, the output appears in the Navigator window. The Navigator window contains many of the pull-down menus found in the Data Editor window. Some of the important menus in the Navigator window include:

INSERT: used to insert page breaks, titles, charts, etc.

FORMAT: for changing the alignment of a particular portion of the output


• Formatting Toolbar

When a table has been created by a statistical procedure, the user can edit the table to create a desired look or add/delete information. Beginning with version 14.0, the user has a choice of editing the table in the Output window or opening it in a separate Pivot Table (DEFINE) window. Various pull-down menus are activated when the user double-clicks the table. These include:

EDIT: undo and redo a pivot; select a table or table body (e.g., to change the font)

INSERT: used to insert titles, captions and footnotes

PIVOT: used to perform a pivot of the row and column variables

FORMAT: various modifications can be made to tables and cells


Importing tab-delimited data: In SPSSWIN, click FILE ⇒ OPEN ⇒ DATA. Look in the appropriate location for the text file. Then select "Text" from "Files of type". Click the file name and then click "Open". You will see the "Text Import Wizard - Step 1 of 6" dialog box.

After working through the wizard's steps, you will have an SPSS data file containing the former tab-delimited data. You then need only to add variable and value labels and define missing values.

Exporting data to Excel: Click FILE ⇒ SAVE AS. Click the File Name for the file to be exported. For "Save as type", select Excel (.xls) from the pull-down menu. You will notice the checkbox for "Write variable names to spreadsheet". Leave this checked, as you will want the variable names in the first row of each column in the Excel spreadsheet. Finally, click Save.


• Additional menus

CHART EDITOR: used to edit a graph

SYNTAX EDITOR: used to edit the text in a syntax window

• Show or hide a toolbar

Click VIEW ⇒ TOOLBARS, then check a toolbar to show it or uncheck it to hide it.

• Move a toolbar

Click the toolbar (but not one of its pushbuttons) and then drag the toolbar to its new location.

• Customize a toolbar

Click VIEW ⇒ TOOLBARS ⇒ CUSTOMIZE


Importing data from an EXCEL spreadsheet

Data from an Excel spreadsheet can be imported into SPSSWIN as follows:
1. In SPSSWIN, click FILE ⇒ OPEN ⇒ DATA. The OPEN DATA FILE dialog box will appear.
2. Locate the file of interest. Use the Look In pull-down list to identify the folder containing the Excel file.
3. From the FILE TYPE pull-down menu, select EXCEL (.xls).
4. Click the file name of interest and click OPEN, or simply double-click the file name.
5. Keep checked the box that reads "Read variable names from the first row of data". This presumes that the first row of the Excel data file contains variable names. [If the data resided in a different worksheet in the Excel file, this would need to be entered.]
6. Click OK. The Excel data file will now appear in the SPSSWIN Data Editor.


7. The former EXCEL spreadsheet can now be saved as an SPSS file (FILE ⇒ SAVE AS) and is ready to be used in analyses. Typically, you would label variables and values and define missing values.

Importing an Access table: SPSSWIN does not offer a direct import for Access tables. Therefore, we must follow these steps:
1. Open the Access file.
2. Open the data table.
3. Save the data as an Excel file.
4. Follow the steps outlined above for importing data from an Excel spreadsheet into SPSSWIN.

Importing text files into SPSSWIN: Text data points typically are separated (or "delimited") by tabs or commas. Sometimes they are in fixed format.
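Delimited and fixed-format text can be parsed in a few lines, which makes clear what the import wizards are asking for. A Python sketch using the standard csv module on hypothetical records; the column positions in the fixed-format case are assumptions.

```python
import csv
import io

# Hypothetical file contents: the same two records in three layouts.
comma_text = "id,age,group\n1,71,A\n2,127,B\n"
tab_text = "id\tage\tgroup\n1\t71\tA\n2\t127\tB\n"
fixed_text = "  1 71A\n  2127B\n"  # columns: id 1-3, age 4-6, group 7


def read_delimited(text, delimiter):
    """Parse delimited text; the first row holds the variable names."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def read_fixed_width(text, fields):
    """Parse fixed-format text given (name, start, end) column slices."""
    return [{name: line[start:end].strip() for name, start, end in fields}
            for line in text.splitlines() if line.strip()]


print(read_delimited(comma_text, ","))
print(read_delimited(tab_text, "\t"))
print(read_fixed_width(fixed_text, [("id", 0, 3), ("age", 3, 6), ("group", 6, 7)]))
```

All values come back as text; as in SPSS, the analyst still has to declare types, labels and missing-value codes.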


Running the FREQUENCIES procedure

1. Open the data file of interest (from the menus, click FILE ⇒ OPEN ⇒ DATA).
2. From the menus, click ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ FREQUENCIES.
3. The FREQUENCIES dialog box will appear. The left-hand box lists (source variable list) all the variables that have been defined in the data file. The first step is identifying the variable(s) for which you want to run a frequency analysis. Click a variable name, then click the [ > ] pushbutton. The variable name will now appear in the VARIABLE(S) box (selected variable list). Repeat these steps for each variable of interest.
4. If all that is being requested is a frequency table showing counts and percentages (raw, adjusted and cumulative), then click OK.
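The table produced in step 4 reports counts and three kinds of percentages. A minimal Python sketch of the same arithmetic, assuming hypothetical ordinal ratings with one missing value: Percent uses all cases, Valid Percent excludes missing cases, and Cumulative Percent accumulates Valid Percent. Unlike SPSS, which orders rows by value, this sketch sorts the text labels alphabetically.

```python
from collections import Counter


def frequency_table(values, missing=(None,)):
    """Rows of (value, count, percent, valid percent, cumulative percent),
    mirroring the columns of an SPSS FREQUENCIES table."""
    total = len(values)
    valid = [v for v in values if v not in missing]
    counts = Counter(valid)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        count = counts[value]
        valid_pct = 100 * count / len(valid)
        cumulative += valid_pct
        rows.append((value, count, 100 * count / total, valid_pct, cumulative))
    return rows


# Hypothetical severity ratings with one missing response.
ratings = ["mild", "severe", "moderate", "mild", None, "mild", "moderate"]
for row in frequency_table(ratings):
    print("{:<10}{:>3}{:>8.1f}{:>8.1f}{:>8.1f}".format(*row))
```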


Requesting STATISTICS: Descriptive and summary STATISTICS can be requested for numeric variables. To request statistics:
1. From the FREQUENCIES dialog box, click the STATISTICS pushbutton.
2. This will bring up the FREQUENCIES: STATISTICS dialog box.
3. The STATISTICS dialog box offers the user a variety of choices.

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ DESCRIPTIVES). The procedure offers many of the same statistics as the FREQUENCIES procedure, but without generating frequency tables.


Requesting CHARTS: One can request a chart (graph) to be created for a variable or variables included in a FREQUENCIES procedure.

1. In the FREQUENCIES dialog box, click CHARTS.
2. The FREQUENCIES: CHARTS dialog box will appear. Choose the intended chart (e.g., bar chart, pie chart, histogram).

Pasting charts into Word:
1. Click the chart.
2. Click the pull-down menu EDIT ⇒ COPY OBJECTS.
3. Go to the Word document in which the chart is to be embedded. Click EDIT ⇒ PASTE SPECIAL.
4. Select Formatted Text (RTF) and then click OK.
5. Enlarge the graph to the desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible, click once on the graph).


BASIC STATISTICAL PROCEDURES: CROSSTABS

1. From the ANALYZE pull-down menu, click DESCRIPTIVE STATISTICS ⇒ CROSSTABS.
2. The CROSSTABS dialog box will then open.
3. From the variable selection box on the left, click a variable you wish to designate as the Row variable. The values (codes) for the Row variable make up the rows of the crosstabs table. Click the arrow (>) button for Row(s). Next, click a different variable you wish to designate as the Column variable. The values (codes) for the Column variable make up the columns of the crosstabs table. Click the arrow (>) button for Column(s).
4. You can specify more than one variable in the Row(s) and/or Column(s) boxes. A crosstab will be generated for each combination of Row and Column variables.
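A CROSSTABS table is a count of every combination of row and column values. A minimal Python sketch of that counting with collections.Counter, on hypothetical gender and treatment-group data; it omits the percentages and tests (e.g., chi-square) that SPSS can add.

```python
from collections import Counter


def crosstab(rows, cols):
    """Count each (row value, column value) combination, as CROSSTABS does,
    and return the table plus row and column totals."""
    cells = Counter(zip(rows, cols))
    row_levels, col_levels = sorted(set(rows)), sorted(set(cols))
    table = {r: {c: cells[(r, c)] for c in col_levels} for r in row_levels}
    row_totals = {r: sum(table[r].values()) for r in row_levels}
    col_totals = {c: sum(table[r][c] for r in row_levels) for c in col_levels}
    return table, row_totals, col_totals


# Hypothetical data: gender (row variable) by treatment group (column variable).
gender = ["F", "M", "F", "F", "M", "M", "F"]
group = ["A", "A", "B", "A", "B", "B", "B"]
table, row_totals, col_totals = crosstab(gender, group)
print("     " + "".join(f"{c:>5}" for c in col_totals) + "  Total")
for r, counts in table.items():
    print(f"{r:<5}" + "".join(f"{n:>5}" for n in counts.values()) + f"{row_totals[r]:>7}")
```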

Limitations: SPSS users have less control over data manipulation and statistical output than users of other statistical packages such as SAS, Stata, etc.

SPSS is a good first statistical package for quantitative research in social science because it is easy to use and can be a good starting point for learning more advanced statistical packages.


Questions

Study variableStudy variable

Refers to measurement that changes from one individual to

another

Example- age gender BMI Systolic blood pressure

hematocrit

Outcome vs independent variableOutcome vs independent variable

Responseoutcome variable Measures the outcome of the study treatment or experimental manipulation

Examples-ndash Renal carcinoma incident among children

ndash Asthma control in pediatric asthmatic patients

Independent predictorexplanatory variable Explains or influences changes in a response variable

Examples-ndash Smoking

ndash Oral inhaler

Scale of variableoutput measurementScale of variableoutput measurement

Nominal - Categorical variables without any order or

ranking sequence such as names or classes (eg gender)

Binary- two categories multinomial- more than two

categories

Ordinal - Variables with an inherent rank or order eg

mild moderate severe Can be compared for equality or

greater or less but not how much greater or less

Scale of variableoutput measurementScale of variableoutput measurement

Interval - Values of the variable are ordered as in Ordinal and

additionally differences between values are meaningful however the

scale is not absolutely anchored Calendar dates and temperatures on

the Fahrenheit scale are examples Addition and subtraction but not

multiplication and division are meaningful operations

Ratio - Variables with all properties of Interval plus an absolute non-

arbitrary zero point eg age weight temperature (Kelvin) Addition

subtraction multiplication and division are all meaningful operations

Measurement biasMeasurement bias

Bias arises due to measurement error

Example-

ndash Suppose In the case of remission of Asthma the possible

outcomes are complete remission partial remission and no

remission If we measure the outcome variable as only remission

and non-remission basically we are committing an error by

putting partial remission in the non-remission group (type II error)

Designing the studyDesigning the study

A study design is a careful advance plan of data collection

and the analytic approach needed to answer the research

question under investigation in a scientific way

The basic elements of a study design-

ndash Selecting an appropriate sample size for a specified

level of power and level of significance

ndash Selecting methods of sampling data collection and

analysis appropriate to the studys objectives

ClinicalExperimental versus Observational designClinicalExperimental versus Observational design

The Lancet 2002 Vol 359

ClinicalExperimental vs Observational ClinicalExperimental vs Observational designdesign

The choice of a design mainly depends on the research

question (s) and type of research conduct ( experimental

or observational)

Experimental Interventional The investigator controls

the experimental environment in which the hypothesis is

tested The randomized double-blind clinical trial is the

gold standard

ClinicalExperimental vs Observational ClinicalExperimental vs Observational designdesign

Non-experimentalObservational The population is

observed without any interference by the investigator

For example in a study to see the effect of smoking it is

impossible for an investigator to assign smoking to the subjects

Instead investigator can study the effect by choosing a control

group and find the cause and relation effect Some examples are-

ndash Cross-sectional study

ndash Cohort study

ndash Case-control study

Randomized control designRandomized control design

Random allocation of subjects to different interventions

(or treatments) for the purpose of comparingdetermining

the efficacy of the study treatment (s)

ndash Eg placebo or standard medication (active control) can

be used as a control

ndash Patients with cancer or painful disease can not receive

placebo as a control

Randomized control designRandomized control design

Blindness Reduces the bias due to the preconception or

personal bias ndash Open trial Investigator and subject know the full details of the

treatment

ndash Single-blind trial Investigator knows about the treatment but

subject does not

ndash Double-blind Both investigator and subject do not know about the

treatment

ndash Triple-blind Sponsor investigator and subject do not know about

the treatment

Distribution of a variableDistribution of a variable

Distribution - (of a variable) tells us what values the

variable takes and how often it takes these values Eg

distribution of some 26 pediatric patients of ages 1 to 6

at AIDHC are as follows-

Age 1 2 3 4 5 6

Frequency 5 3 7 5 4 2

StatisticsStatistics

Science of data collection summarization analysis

and interpretation

Descriptive versus Inferential Statistics

ndash Descriptive Statistic Data description

(summarization) such as center variability and

shape

ndash Inferential Statistic Drawing conclusion beyond the

sample studied allowing for prediction

A Taxonomy ofA Taxonomy of StatisticsStatistics

How does statistics help usHow does statistics help us

Age Distribution

0

2

4

6

8

10

12

14

16

40 60 80 100 120 140 More

Age in Month

Nu

mb

er o

f S

ub

ject

s

Ages (in month) of the 60 patients in our data set 1 are- 71 127 65 82 140 53 114 56 84 65 67 134 64 hellip 91 51

Mean 9041666667

Standard Error 3902649518

Median 84

Mode 84

Standard Deviation 3022979318

Sample Variance 9138403955

Kurtosis -1183899591

Skewness 0389872725

Range 95

Minimum 48

Maximum 143

Sum 5425

Count 60

By simply looking at the data we fail to produce any informative account to describe the data how ever statistics produce a quick insight in to data using graphical and numerical statistical tools

60

80

10

01

20

14

0

Distribution of age

Ag

e (

mo

nth

)

Statistical Description of DataStatistical Description of Data

Statistics describes a numeric set of data by its

Center (mean median mode etc)

Variability (standard deviation range etc)

Shape (skewness kurtosis etc)

Statistics describes a categorical set of data by

Frequency percentage or proportion of each

category

Statistical inference is the process by which we acquire information about populations from samples

Two types of estimates for making inferencesndash Point estimationndash Interval estimate

Statistical Inference

Statistical Inference

Sample Population

Population and samplePopulation and sample

Population The entire collection of individuals or measurements about which information is desired

Sample A subset of the population selected for study

ndash Primary objective is to create a subset of population whose center spread and shape are as close as that of population

ndash Methods of sampling Random sampling stratified sampling systematic sampling cluster sampling multistage sampling area sampling qoata sampling etc

Parameter vs StatisticsParameter vs Statistics

Parameter

ndash Any statistical characteristic of a population

ndash Population mean population median population

standard deviation are examples of parameters

ndash Parameter describes the distribution of a population

ndash Parameters are fixed and usually unknown

Parameter vs StatisticsParameter vs Statistics

Statistic Any statistical characteristic of a sample

ndash Sample mean sample median sample standard

deviation are some examples of statistics

ndash Statistic describes the distribution of population

ndash Value of a statistic is known and is varies for different

samples

ndash Are used for making inference on parameter

Parameter vs StatisticsParameter vs Statistics

Statistical Issue To describe the distribution of a

population through census or making inference on

population distribution population parameter using sample

distribution statistic

Eg sample mean is an estimate of the population mean

Hypothesis TestingHypothesis Testing

Null hypothesis and Alternative hypothesis

Real Situation Ho is true Ho is false Reject Ho Type I

error (α) Correct Decision (1-)

D e c i s i o n

Accept Ho Correct Decision (1- α)

Type II Error ()

ElementsSteps in hypothesisElementsSteps in hypothesis

Hypothesis testing steps

ndash 1 Null (Ho) and alternative (H1)hypothesis specification

ndash 2 Selection of significance level (alpha) - 005 or 001

ndash 3 Calculating the test statistic ndasheg t F Chi-square

ndash 4 Calculating the probability value (p-value) or confidence

Interval

ndash 5 Describing the result and statistic in an understandable

way

Point estimator

Sample distribution

Parameter

Population distribution

bull A point estimate draws inference about a population by estimating the value of an unknown parameter using a single value or a point

Point Estimation

Interval estimatorSample distribution

bull An interval estimator draws inferences about a population by

estimating the value of an unknown parameter using an interval

Population distribution Parameter

Interval Estimation

P-Value versus the Confidence IntervalP-Value versus the Confidence Interval

Two main ways to assess study precision and the role of

chance in a study

ndash P value measures ( in probability) the evidence against

the null hypothesis

ndash An interval within which the value of the parameter lies

with a specified probability

ndash Eg 95 CI implies that if one repeats a study 100

times the true measure of association will lie inside the

CI in 95 out of 100 measures

Procedures for sample size Procedures for sample size calculationcalculation

Selection of primary variables of interest and formulation

of hypotheses

Information of standard deviation ( if numeric) or

proportion (if categorical)

A tolerance level of significance ()

Selection of reasonable test statistic

Power or Confidence level

A scientifically or clinically meaning effect difference

Brief concept of Statistical SoftwareBrief concept of Statistical Software

There are many software packages to perform statistical

analysis and visualization of data Some of them are-

ndash System for Statistical Analysis (SAS) S-plus R Matlab Minitab

BMDP STATA SPSS StatXact Statistica LISREL JMP

GLIM HIL MS Excel etc We will discuss MS Excel and SPSS in

brief

useful websites-

httpwwwR-projectorg (a free but powerful statistical software)

Microsoft ExcelMicrosoft Excel

A Spreadsheet Application It features calculation graphing tools pivot tables and a macro programming language called VBA (Visual Basic for Applications)

There are many versions of MS-Excel Excel XP Excel 2003 Excel 2007 are capable of performing a number of statistical analyses

Starting MS Excel Double click on the Microsoft Excel icon on the desktop or Click on Start --gt Programs --gt Microsoft Excel

Worksheet Consists of a multiple grid of cells with numbered rows down the page and alphabetically-tilted columns across the page Each cell is referenced by its coordinates For example A3 is used to refer to the cell in column A and row 3 B10B20 is used to refer to the range of cells in column B and rows 10 through 20

Microsoft ExcelMicrosoft Excel

Creating Formulas 1 Click the cell that you want to enter the formula 2 Type = (an equal sign) 3 Click the Function Button 4 Select the formula you want and step through the on-screen instructions

xf

Opening a document File Open (From a existing workbook) Change the directory area or drive to look for file in other locations

Creating a new workbook FileNewBlank Document

Saving a File FileSave

Selecting more than one cell Click on a cell eg A1) then hold the Shift key and click on another (eg D4) to select cells between and A1 and D4 or Click on a cell and drag the mouse across the desired range

Microsoft ExcelMicrosoft Excel

Entering Date and Time Dates are stored as MMDDYYYY No need to enter in that format For example Excel will recognize Jan 9 or jan-9 as 192007 and Jan 9 1999 as 191999 To enter todayrsquos date press Ctrl and together Use a or p to indicate am or pm For example 830 p is interpreted as 830 pm To enter current time press Ctrl and together

Copy and paste all cells in a sheet: Ctrl+A to select, Ctrl+C to copy, and Ctrl+V to paste.

Sorting data: Data → Sort → Sort By …

Descriptive statistics and other statistical methods: Tools → Data Analysis → (statistical method). If Data Analysis is not available, click on Tools → Add-Ins and select Analysis ToolPak and Analysis ToolPak - VBA.


Statistical and mathematical functions: Start with the '=' sign and then select a function from the function wizard (fx).

Inserting a chart: Click on Chart Wizard (or Insert → Chart), select a chart type, give the input data range, update the chart options, and select the output location (worksheet).

Importing data into Excel: File → Open, set Files of type, and click on the file. Choose an option (Delimited or Fixed Width), choose the delimiter (Tab, Semicolon, Comma, Space, Other), and click Finish.
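The "Delimited" import option above amounts to splitting each line at a chosen separator, with the first row supplying the variable names. A minimal Python sketch using the standard `csv` module; the function name `read_delimited` and the sample records are hypothetical.

```python
import csv
import io


def read_delimited(text, delimiter="\t"):
    """Parse delimited text whose first row holds the variable names.

    Returns one dict per data row, mapping variable name to the raw string value.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


# Hypothetical tab-delimited extract with variable names in the first row.
tab_text = "id\tgroup\tage\n1\tA\t71\n2\tB\t127\n"
rows = read_delimited(tab_text)
print(rows[0])   # {'id': '1', 'group': 'A', 'age': '71'}

# The same function handles comma-delimited text.
print(read_delimited("id,age\n3,65\n", delimiter=","))
```

Values come back as strings; converting them to numbers is a separate step, much as a spreadsheet decides each column's type after splitting.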

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.
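To see the kind of rounding problem meant here, compare two ways of computing a sample variance in floating point. This is an illustrative Python sketch, not Excel's actual algorithm: the "naive" one-pass sum-of-squares formula subtracts two huge, nearly equal numbers, while Welford's method updates the mean and squared deviations incrementally. The data are made up.

```python
def naive_variance(xs):
    """Textbook one-pass formula (sum of squares minus n * mean^2).

    Subtracting two huge, nearly equal numbers loses most significant digits.
    """
    n = len(xs)
    s = sum(xs)
    ss = sum(x * x for x in xs)
    return (ss - s * s / n) / (n - 1)


def welford_variance(xs):
    """Welford's running update of mean and squared deviations: numerically stable."""
    mean = 0.0
    m2 = 0.0
    for k, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / k
        m2 += delta * (x - mean)
    return m2 / (len(xs) - 1)


# 4, 7, 13, 16 have sample variance 30; shifting every value by 1e9 should not change that.
data = [1e9 + v for v in (4.0, 7.0, 13.0, 16.0)]
print(naive_variance(data))     # may be far from 30 because of cancellation
print(welford_variance(data))   # 30.0
```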

Statistical Package for the Social Sciences (SPSS)

A general-purpose statistical package, SPSS is widely used in the social sciences, particularly in sociology and psychology.

SPSS can import data from almost any type of file and generate tabulated reports, plots of distributions and trends, descriptive statistics, and complex statistical analyses.

Starting SPSS: Double-click on SPSS on the desktop, or Programs ⇒ SPSS.

Opening an SPSS file: FILE ⇒ OPEN

• Data Editor

Various pull-down menus appear at the top of the Data Editor window. These menus are at the heart of using SPSSWIN (SPSS for Windows). The Data Editor menu items, with some of their uses, are:

MENUS AND TOOLBARS


FILE: used to open and save data files

EDIT: used to copy and paste data values, find data in a file, and insert variables and cases. OPTIONS allows the user to set general preferences as well as the setup for the Navigator, charts, etc.

VIEW: the user can change toolbars, and value labels can be shown in cells instead of data values

DATA: select, sort, or weight cases; merge files


TRANSFORM: compute new variables, recode variables, etc.


ANALYZE: perform various statistical procedures

GRAPHS: create bar charts, pie charts, etc.

UTILITIES: add comments to accompany the data file (and other advanced features)

ADD-ONS: features not currently installed (advanced statistical procedures)

WINDOW: switch between the data, syntax, and Navigator windows

HELP: access SPSSWIN Help information


Navigator (Output) Menus

When statistical procedures are run or charts are created, the output will appear in the Navigator window. The Navigator window contains many of the pull-down menus found in the Data Editor window. Some of the important menus in the Navigator window include:

INSERT: used to insert page breaks, titles, charts, etc.

FORMAT: used to change the alignment of a particular portion of the output


• Formatting Toolbar

When a table has been created by a statistical procedure, the user can edit it to create a desired look or to add or delete information. Beginning with version 14.0, the user can choose between editing the table in the Output window and opening it in a separate Pivot Table (DEFINE) window. Various pull-down menus are activated when the user double-clicks on the table. These include:

EDIT: undo and redo a pivot; select a table or table body (e.g., to change the font)

INSERT: used to insert titles, captions, and footnotes

PIVOT: used to perform a pivot of the row and column variables

FORMAT: various modifications can be made to tables and cells


Importing tab-delimited data: In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. Look in the appropriate location for the text file, then select "Text" from "Files of type". Click on the file name and then click on "Open". You will see the Text Import Wizard (Step 1 of 6) dialog box.

You will now have an SPSS data file containing the former tab-delimited data You simply need to add variable and value labels and define missing values

Exporting data to Excel: Click on FILE ⇒ SAVE AS. Click on File Name and enter the name of the file to be exported. For "Save as type", select Excel (*.xls) from the pull-down menu. You will notice the checkbox "Write variable names to spreadsheet"; leave it checked so that the variable names appear in the first row of each column of the Excel spreadsheet. Finally, click on Save.


• Additional menus

CHART EDITOR: used to edit a graph

SYNTAX EDITOR: used to edit the text in a syntax window

• Show or hide a toolbar

Click on VIEW ⇒ TOOLBARS, then check a toolbar to show it or uncheck it to hide it.

• Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location.

• Customize a toolbar

Click on VIEW ⇒ TOOLBARS ⇒ CUSTOMIZE.


Importing data from an EXCEL spreadsheet

Data from an Excel spreadsheet can be imported into SPSSWIN as follows:

1. In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. The OPEN DATA FILE dialog box will appear.

2. Locate the file of interest. Use the Look In pull-down list to identify the folder containing the Excel file of interest.

3. From the FILE TYPE pull-down menu, select EXCEL (*.xls).

4. Click on the file name of interest and click on OPEN, or simply double-click on the file name.

5. Keep checked the box that reads "Read variable names from the first row of data". This presumes that the first row of the Excel data file contains the variable names. [If the data resided in a different worksheet in the Excel file, this would need to be entered.]

6. Click on OK. The Excel data file will now appear in the SPSSWIN Data Editor.


7. The former EXCEL spreadsheet can now be saved as an SPSS file (FILE ⇒ SAVE AS) and is ready to be used in analyses. Typically, you would label variables and values and define missing values.

Importing an Access table

SPSSWIN does not offer a direct import for Access tables. Therefore, follow these steps:
1. Open the Access file.
2. Open the data table.
3. Save the data as an Excel file.
4. Follow the steps outlined above for importing data from an Excel spreadsheet into SPSSWIN.

Importing text files into SPSSWIN

Text data points typically are separated (or "delimited") by tabs or commas. Sometimes they are in fixed format.
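In fixed format there is no delimiter; each variable occupies the same character columns on every line. A minimal Python sketch, where the column layout and the records are hypothetical:

```python
def read_fixed_width(lines, layout):
    """Split fixed-format records into named fields.

    layout maps each variable name to (start, end): 0-based, end-exclusive
    character positions, i.e. Python slice bounds.
    """
    return [{name: line[start:end].strip() for name, (start, end) in layout.items()}
            for line in lines]


# Hypothetical layout: columns 1-3 id, column 4 sex, columns 5-7 age.
layout = {"id": (0, 3), "sex": (3, 4), "age": (4, 7)}
lines = ["001F 71", "002M127"]
print(read_fixed_width(lines, layout))
# [{'id': '001', 'sex': 'F', 'age': '71'}, {'id': '002', 'sex': 'M', 'age': '127'}]
```

The layout has to come from the data's documentation (a codebook); unlike delimited text, it cannot be inferred reliably from the file itself.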


Running the FREQUENCIES procedure

1. Open the data file of interest (from the menus, click on FILE ⇒ OPEN ⇒ DATA).

2. From the menus, click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ FREQUENCIES.

3. The FREQUENCIES dialog box will appear. The left-hand box (source variable list) lists all the variables defined in the data file. First identify the variable(s) for which you want to run a frequency analysis: click on a variable name, then click the [ > ] pushbutton. The variable name will now appear in the VARIABLE(S) box (selected variable list). Repeat these steps for each variable of interest.

4. If all that is being requested is a frequency table showing counts and percentages (raw, adjusted, and cumulative), click on OK.
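The frequency table produced by this procedure has columns Frequency, Percent, Valid Percent (the "adjusted" percentage, which excludes missing values), and Cumulative Percent. A minimal Python sketch of those calculations; the function name, the use of `None` as the missing-value code, and the data are hypothetical.

```python
from collections import Counter


def frequency_table(values, missing=(None,)):
    """Count, percent, valid percent, and cumulative percent per category.

    Percent uses all cases; valid percent and cumulative percent exclude
    values listed in `missing` (an assumption about how missing data are coded).
    """
    total = len(values)
    valid = [v for v in values if v not in missing]
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for category in sorted(counts):
        n = counts[category]
        valid_pct = 100 * n / len(valid)
        cumulative += valid_pct
        rows.append((category, n, 100 * n / total, valid_pct, cumulative))
    return rows


# Hypothetical severity ratings; None marks one missing response.
severity = ["mild", "severe", "moderate", "mild", None, "mild", "moderate", "severe"]
for row in frequency_table(severity):
    print("%-9s %2d %6.1f %6.1f %6.1f" % row)
```

Categories are sorted alphabetically here, which happens to match the severity order; coded numeric values would sort in code order.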


Requesting STATISTICS

Descriptive and summary STATISTICS can be requested for numeric variables. To request statistics:
1. From the FREQUENCIES dialog box, click on the STATISTICS pushbutton.
2. This will bring up the FREQUENCIES: STATISTICS dialog box.
3. The STATISTICS dialog box offers the user a variety of choices.

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ DESCRIPTIVES). The procedure offers many of the same statistics as the FREQUENCIES procedure, but without generating frequency tables.


Requesting CHARTS

One can request a chart (graph) to be created for a variable or variables included in a FREQUENCIES procedure:
1. In the FREQUENCIES dialog box, click on CHARTS.
2. The FREQUENCIES: CHARTS dialog box will appear. Choose the intended chart (e.g., bar chart, pie chart, histogram).

Pasting charts into Word
1. Click on the chart.
2. Click on the pull-down menu EDIT ⇒ COPY OBJECTS.
3. Go to the Word document in which the chart is to be embedded. Click on EDIT ⇒ PASTE SPECIAL.
4. Select Formatted Text (RTF) and then click on OK.
5. Enlarge the graph to the desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible, click once on the graph).


BASIC STATISTICAL PROCEDURES: CROSSTABS

1. From the ANALYZE pull-down menu, click on DESCRIPTIVE STATISTICS ⇒ CROSSTABS.

2. The CROSSTABS dialog box will then open.

3. From the variable selection box on the left, click on a variable you wish to designate as the Row variable. The values (codes) of the Row variable make up the rows of the crosstabs table. Click on the arrow (>) button for Row(s). Next, click on a different variable you wish to designate as the Column variable; its values (codes) make up the columns of the crosstabs table. Click on the arrow (>) button for Column(s).

4. You can specify more than one variable in Row(s) and/or Column(s). A cross table will be generated for each combination of Row and Column variables.
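A cross table simply counts how many cases fall into each combination of a Row value and a Column value. A minimal Python sketch; the `crosstab` helper and the gender/smoking data are hypothetical.

```python
from collections import Counter


def crosstab(rows, cols):
    """Cross-tabulate two equally long lists of category codes."""
    if len(rows) != len(cols):
        raise ValueError("row and column variables must have the same length")
    cells = Counter(zip(rows, cols))          # count of each (row, column) pair
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    table = {r: {c: cells[(r, c)] for c in col_levels} for r in row_levels}
    return table, row_levels, col_levels


gender = ["F", "M", "F", "F", "M", "M", "F"]
smoker = ["yes", "no", "no", "yes", "yes", "no", "no"]
table, row_levels, col_levels = crosstab(gender, smoker)
print("     " + "".join(f"{c:>5}" for c in col_levels))
for r in row_levels:
    print(f"{r:<5}" + "".join(f"{table[r][c]:>5}" for c in col_levels))
```

Combinations that never occur still get a cell with count 0, because `Counter` returns 0 for missing keys.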

Limitations: SPSS gives users less control over data manipulation and statistical output than other statistical packages such as SAS and Stata.

SPSS is a good first statistical package for quantitative research in social science because it is easy to use and can be a good starting point for learning more advanced statistical packages.


Questions

Outcome vs independent variable

Response/outcome variable: measures the outcome of the study treatment or experimental manipulation.

Examples:
– Renal carcinoma incidence among children
– Asthma control in pediatric asthmatic patients

Independent/predictor/explanatory variable: explains or influences changes in a response variable.

Examples:
– Smoking
– Oral inhaler

Scale of variable/output measurement

Nominal: categorical variables without any order or ranking sequence, such as names or classes (e.g., gender).
– Binary: two categories; multinomial: more than two categories

Ordinal: variables with an inherent rank or order, e.g., mild, moderate, severe. Values can be compared for equality or for greater or less, but not for how much greater or less.


Interval: values of the variable are ordered as in Ordinal, and additionally differences between values are meaningful; however, the scale has no absolute zero (its zero point is arbitrary). Calendar dates and temperatures on the Fahrenheit scale are examples. Addition and subtraction, but not multiplication and division, are meaningful operations.

Ratio: variables with all the properties of Interval plus an absolute, non-arbitrary zero point, e.g., age, weight, temperature (Kelvin). Addition, subtraction, multiplication, and division are all meaningful operations.
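The interval/ratio distinction can be checked with temperatures. In the following Python sketch, the two temperatures are arbitrary, and the conversion uses the standard Fahrenheit-to-Kelvin formula. It shows that a ratio computed on the Fahrenheit scale is not meaningful, while differences carry over (up to a change of unit).

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins, an absolute (ratio) scale."""
    return (f - 32) * 5 / 9 + 273.15


cool_f, warm_f = 40.0, 80.0
ratio_f = warm_f / cool_f                                              # 2.0, but not "twice as hot"
ratio_k = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)  # about 1.08
diff_k = fahrenheit_to_kelvin(warm_f) - fahrenheit_to_kelvin(cool_f)   # 40 F degrees = 22.2 K
print(ratio_f, round(ratio_k, 2), round(diff_k, 1))                    # 2.0 1.08 22.2
```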

Measurement bias

Bias arising from measurement error.

Example:
– In the case of remission of asthma, the possible outcomes are complete remission, partial remission, and no remission. If we measure the outcome variable only as remission versus non-remission, we commit a misclassification error by putting partial remission in the non-remission group. Such misclassification can distort the results, for example by making a real effect harder to detect (raising the risk of a type II error).

Designing the study

A study design is a careful advance plan of data collection and of the analytic approach needed to answer the research question under investigation in a scientific way.

The basic elements of a study design:
– Selecting an appropriate sample size for a specified power and level of significance
– Selecting methods of sampling, data collection, and analysis appropriate to the study's objectives

Clinical/Experimental versus Observational design

The Lancet 2002 Vol 359


The choice of a design depends mainly on the research question(s) and the type of research conducted (experimental or observational).

Experimental/Interventional: the investigator controls the experimental environment in which the hypothesis is tested. The randomized double-blind clinical trial is the gold standard.


Non-experimental/Observational: the population is observed without any intervention by the investigator. For example, in a study of the effect of smoking, it is impossible for an investigator to assign smoking to the subjects. Instead, the investigator can study the effect by comparing smokers with a control (comparison) group and examining the relationship between cause and effect. Some examples are:
– Cross-sectional study
– Cohort study
– Case-control study

Randomized control design

Random allocation of subjects to different interventions (or treatments) for the purpose of comparing/determining the efficacy of the study treatment(s)
– E.g., a placebo or a standard medication (active control) can be used as the control
– Patients with cancer or a painful disease cannot ethically be given a placebo as the control
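Random allocation can be sketched as shuffling a balanced list of treatment labels. This Python sketch uses a random permutation with equal group sizes, which is one of several allocation schemes and not necessarily the one used in any particular trial. The subject IDs, arm names, and seed are hypothetical.

```python
import random


def randomize(subject_ids, arms=("treatment", "placebo"), seed=None):
    """Randomly permuted allocation with equal group sizes (when divisible).

    The arm labels are repeated to cover all subjects and then shuffled,
    so every ordering of the balanced label list is equally likely.
    """
    rng = random.Random(seed)
    labels = [arms[i % len(arms)] for i in range(len(subject_ids))]
    rng.shuffle(labels)
    return dict(zip(subject_ids, labels))


ids = [f"S{i:02d}" for i in range(1, 13)]     # hypothetical subject IDs S01..S12
allocation = randomize(ids, seed=2008)
print(allocation)
```

Fixing the seed makes the allocation list reproducible, so it can be generated in advance and concealed from the people enrolling subjects.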


Blinding: reduces bias due to preconception or personal bias
– Open trial: investigator and subject know the full details of the treatment
– Single-blind trial: the investigator knows the treatment but the subject does not
– Double-blind trial: neither the investigator nor the subject knows the treatment
– Triple-blind trial: neither the sponsor, the investigator, nor the subject knows the treatment

Distribution of a variable

Distribution (of a variable): tells us what values the variable takes and how often it takes these values. E.g., the distribution of ages of 26 pediatric patients aged 1 to 6 at AIDHC is as follows:

Age         1   2   3   4   5   6
Frequency   5   3   7   5   4   2
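A distribution given as a frequency table can be summarized directly, without listing the 26 individual ages. This Python sketch uses the frequencies in the table; the helper name `median_from_freq` is hypothetical.

```python
ages = [1, 2, 3, 4, 5, 6]
freqs = [5, 3, 7, 5, 4, 2]

n = sum(freqs)                                        # 26 patients
relative = [f / n for f in freqs]                     # proportion at each age
mean_age = sum(a * f for a, f in zip(ages, freqs)) / n


def median_from_freq(values, counts):
    """Median of data given as sorted values with their frequencies."""
    total = sum(counts)
    middle = [(total - 1) // 2, total // 2]           # 0-based middle position(s)
    found = []
    for pos in middle:
        cumulative = 0
        for value, count in zip(values, counts):
            cumulative += count
            if pos < cumulative:
                found.append(value)
                break
    return sum(found) / 2


print(n, f"{mean_age:.2f}", median_from_freq(ages, freqs))   # 26 3.23 3.0
```

With 26 observations the median is the average of the 13th and 14th ordered values, both of which fall in the age-3 group.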

Statistics

The science of data collection, summarization, analysis, and interpretation

Descriptive versus Inferential Statistics
– Descriptive statistics: data description (summarization), such as center, variability, and shape
– Inferential statistics: drawing conclusions beyond the sample studied, allowing for prediction

A Taxonomy of Statistics

How does statistics help us?

[Figure: histogram "Age Distribution"; x-axis: Age in Month (bins 40, 60, 80, 100, 120, 140, More); y-axis: Number of Subjects]

Ages (in months) of the 60 patients in our data set 1 are: 71, 127, 65, 82, 140, 53, 114, 56, 84, 65, 67, 134, 64, …, 91, 51

Mean                 90.41666667
Standard Error        3.902649518
Median               84
Mode                 84
Standard Deviation   30.22979318
Sample Variance     913.8403955
Kurtosis             -1.183899591
Skewness              0.389872725
Range                95
Minimum              48
Maximum             143
Sum                5425
Count                60

By simply looking at the raw data, we fail to produce an informative account of them; statistics, however, gives quick insight into the data using graphical and numerical tools.
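The summary table above can be reproduced outside Excel. Below is a minimal Python sketch using the standard `statistics` module. Skewness and kurtosis use the bias-adjusted sample formulas of Excel's SKEW and KURT functions (kurtosis reported as excess kurtosis). Because only the ages printed before the ellipsis are available, the output will not match the 60-patient table.

```python
import math
import statistics


def describe(xs):
    """Summary statistics laid out like Excel's Descriptive Statistics output.

    Skewness and kurtosis follow the bias-adjusted sample formulas used by
    Excel's SKEW and KURT (excess kurtosis: 0 for a normal shape).
    """
    n = len(xs)
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)                        # sample SD, n - 1 denominator
    z = [(x - mean) / sd for x in xs]
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "Mean": mean,
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(xs),
        "Mode": statistics.mode(xs),                 # first mode if several tie
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(xs),
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(xs) - min(xs),
        "Minimum": min(xs),
        "Maximum": max(xs),
        "Sum": sum(xs),
        "Count": n,
    }


# Only the ages listed before the ellipsis; the full 60-value data set is not reproduced here.
listed_ages = [71, 127, 65, 82, 140, 53, 114, 56, 84, 65, 67, 134, 64, 91, 51]
for name, value in describe(listed_ages).items():
    print(f"{name:<20}{value:.4f}" if isinstance(value, float) else f"{name:<20}{value}")
```

Standard Error is the standard deviation divided by the square root of the count, which is why 30.2298 / √60 gives the 3.9026 reported for the full data.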

[Figure: "Distribution of age"; y-axis: Age (month)]
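The age histogram groups values into bins with upper limits 40, 60, …, 140 and a final "More" bin. This Python sketch assumes the usual spreadsheet rule that a value is counted in the first bin whose upper limit it does not exceed. It uses only the ages listed before the ellipsis, so the counts are smaller than in the 60-patient chart.

```python
def histogram_counts(values, bins):
    """Frequencies per bin using upper-inclusive bin limits.

    A value x falls in the first bin whose upper limit satisfies x <= limit;
    values above the last limit are counted under 'More'.
    """
    limits = sorted(bins)
    counts = {limit: 0 for limit in limits}
    counts["More"] = 0
    for x in values:
        for limit in limits:
            if x <= limit:
                counts[limit] += 1
                break
        else:                                  # no limit reached: above the last bin
            counts["More"] += 1
    return counts


listed_ages = [71, 127, 65, 82, 140, 53, 114, 56, 84, 65, 67, 134, 64, 91, 51]
print(histogram_counts(listed_ages, [40, 60, 80, 100, 120, 140]))
# {40: 0, 60: 3, 80: 5, 100: 3, 120: 1, 140: 3, 'More': 0}
```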

Statistical Description of Data

Statistics describes a numeric set of data by its:
– Center (mean, median, mode, etc.)
– Variability (standard deviation, range, etc.)
– Shape (skewness, kurtosis, etc.)

Statistics describes a categorical set of data by:
– Frequency, percentage, or proportion of each category

Statistical inference is the process by which we acquire information about populations from samples.

Two types of estimates for making inferences:
– Point estimate
– Interval estimate

[Figure: Statistical Inference, from Sample to Population]

Population and sample

Population: the entire collection of individuals or measurements about which information is desired.

Sample: a subset of the population selected for study.
– The primary objective is to obtain a subset of the population whose center, spread, and shape are as close as possible to those of the population
– Methods of sampling: random sampling, stratified sampling, systematic sampling, cluster sampling, multistage sampling, area sampling, quota sampling, etc.
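Three of the sampling methods listed above can be sketched with Python's `random` module. The patient IDs 1 to 60, the three strata (defined arbitrarily by ID mod 3), and the function names are hypothetical. These are simplified textbook versions, for example with equal allocation across strata.

```python
import random


def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely to be chosen."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Random start, then every k-th unit, with k = len(population) // n."""
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Simple random sample of n_per_stratum units drawn within each stratum."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample


rng = random.Random(1)
patients = list(range(1, 61))                       # hypothetical patient IDs 1..60
print(simple_random_sample(patients, 6, rng))
print(systematic_sample(patients, 6, rng))
print(stratified_sample(patients, lambda pid: pid % 3, 2, rng))   # 3 made-up strata
```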

Parameter vs Statistic

Parameter
– Any statistical characteristic of a population
– The population mean, population median, and population standard deviation are examples of parameters
– A parameter describes the distribution of a population
– Parameters are fixed and usually unknown


Statistic: any statistical characteristic of a sample
– The sample mean, sample median, and sample standard deviation are some examples of statistics
– A statistic describes the distribution of a sample
– The value of a statistic is known and varies from sample to sample
– Statistics are used for making inferences about parameters


Statistical issue: describing the distribution of a population, either through a census or by making inferences about the population distribution (population parameter) using the sample distribution (statistic).

E.g., the sample mean is an estimate of the population mean.

Hypothesis Testing

Null hypothesis and alternative hypothesis

                              Real situation
Decision                      Ho is true                  Ho is false
Reject Ho                     Type I error (α)            Correct decision (1 - β)
Accept (fail to reject) Ho    Correct decision (1 - α)    Type II error (β)
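The four cells of the decision table can be estimated by simulation. In this Python sketch, many hypothetical studies are generated either with Ho true or with Ho false, and each is tested with a one-sample z-test (known σ). The share of rejections then estimates α in the first case and power (1 - β) in the second. The population values (mean 100 vs 108, σ = 15, n = 30) are made up for illustration.

```python
import random
import statistics
from statistics import NormalDist


def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value of a one-sample z-test with known population SD sigma."""
    z = (statistics.fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(true_mean, mu0=100.0, sigma=15.0, n=30, alpha=0.05,
                   reps=4000, seed=2008):
    """Share of simulated studies that reject Ho: mu = mu0 at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if z_test_p_value(sample, mu0, sigma) < alpha:
            rejections += 1
    return rejections / reps


print(rejection_rate(true_mean=100.0))  # Ho true: estimates the Type I error rate (alpha)
print(rejection_rate(true_mean=108.0))  # Ho false: estimates power, 1 - beta
```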

Elements/Steps in hypothesis testing

Hypothesis testing steps:
– 1. Specification of the null (Ho) and alternative (H1) hypotheses
– 2. Selection of the significance level (alpha), e.g., 0.05 or 0.01
– 3. Calculation of the test statistic, e.g., t, F, chi-square
– 4. Calculation of the probability value (p-value) or confidence interval
– 5. Description of the result and statistic in an understandable way
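The five steps can be walked through with the age summary reported earlier (n = 60, mean 90.42 months, SD 30.23 months). The null value of 84 months is purely hypothetical. The test is a large-sample z-test, a normal approximation; a one-sample t-test with 59 degrees of freedom would give a slightly larger p-value.

```python
from statistics import NormalDist

# Data: summary values reported for the 60 patient ages (months).
n, mean, sd = 60, 90.41666667, 30.22979318

# Step 1: hypothetical hypotheses, Ho: mu = 84 months vs H1: mu != 84 months.
mu0 = 84.0
# Step 2: significance level.
alpha = 0.05
# Step 3: test statistic (large-sample z approximation).
se = sd / n ** 0.5
z = (mean - mu0) / se
# Step 4: two-sided p-value.
p = 2 * (1 - NormalDist().cdf(abs(z)))
# Step 5: report the result in plain words.
decision = "reject Ho" if p < alpha else "do not reject Ho"
print(f"z = {z:.2f}, p = {p:.3f}, decision at alpha = {alpha}: {decision}")
```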

Point Estimation

• A point estimate draws inference about a population by estimating the value of an unknown parameter using a single value, or point.

[Figure: point estimator computed from the sample distribution, estimating the parameter of the population distribution]

Interval Estimation

• An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.

[Figure: interval estimator computed from the sample distribution, covering the parameter of the population distribution]
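As an interval-estimation example, a 95% confidence interval for the mean age can be computed from the reported summary (mean 90.42, SD 30.23, n = 60). This Python sketch uses the large-sample normal approximation (z ≈ 1.96); a t-based interval with 59 degrees of freedom would be slightly wider.

```python
from statistics import NormalDist


def mean_ci(mean, sd, n, level=0.95):
    """Large-sample (normal approximation) confidence interval for a population mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    half_width = z * sd / n ** 0.5
    return mean - half_width, mean + half_width


low, high = mean_ci(90.41666667, 30.22979318, 60)
print(f"95% CI for the mean age: {low:.1f} to {high:.1f} months")   # 82.8 to 98.1
```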

P-Value versus the Confidence Interval

Two main ways to assess study precision and the role of chance in a study:
– The p-value measures (as a probability) the evidence against the null hypothesis: it is the probability, computed assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed
– The confidence interval (CI) is an interval, computed from the sample, that is expected to contain the value of the parameter with a specified level of confidence
– E.g., a 95% CI implies that if one repeated a study 100 times, the true measure of association would lie inside the computed CI in about 95 of the 100 studies
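The "repeat the study many times" interpretation can be checked by simulation. In this Python sketch, a hypothetical population (mean 90, SD 30, treated as known) is sampled repeatedly, a 95% interval is built each time, and the sketch counts how often the interval contains the true mean.

```python
import random
import statistics
from statistics import NormalDist


def coverage(true_mean=90.0, sigma=30.0, n=60, level=0.95, reps=2000, seed=7):
    """Fraction of simulated 'repeated studies' whose CI contains true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * sigma / n ** 0.5                 # sigma treated as known
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        m = statistics.fmean(sample)
        if m - half <= true_mean <= m + half:
            hits += 1
    return hits / reps


print(coverage())   # expected to be near 0.95
```

Each individual interval either contains the true mean or does not; the 95% describes the long-run behaviour of the procedure.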

Procedures for sample size calculation

– Selection of the primary variables of interest and formulation of hypotheses
– Information on the standard deviation (if numeric) or proportion (if categorical)
– A tolerated level of significance (α)
– Selection of a reasonable test statistic
– Power or confidence level
– A scientifically or clinically meaningful effect (difference)
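The ingredients listed above combine in the common normal-approximation formula for comparing two means, n per group = 2 (z₁₋α/₂ + z₁₋β)² σ² / Δ². This Python sketch implements that formula under the stated assumptions: a two-sided test, equal group sizes, and a common known SD. The values σ = 10 and Δ = 5 are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two means (two-sided z approximation).

    n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 * sigma^2 / delta^2, rounded up.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # significance level
    z_beta = NormalDist().inv_cdf(power)            # power = 1 - beta
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


print(n_per_group(sigma=10, delta=5))              # 63 per group
print(n_per_group(sigma=10, delta=5, power=0.90))  # 85 per group
```

Halving the meaningful difference Δ quadruples the required sample size, which is why that choice deserves the most scrutiny.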

Brief concept of Statistical SoftwareBrief concept of Statistical Software

There are many software packages to perform statistical

analysis and visualization of data Some of them are-

ndash System for Statistical Analysis (SAS) S-plus R Matlab Minitab

BMDP STATA SPSS StatXact Statistica LISREL JMP

GLIM HIL MS Excel etc We will discuss MS Excel and SPSS in

brief

useful websites-

httpwwwR-projectorg (a free but powerful statistical software)

Microsoft ExcelMicrosoft Excel

A Spreadsheet Application It features calculation graphing tools pivot tables and a macro programming language called VBA (Visual Basic for Applications)

There are many versions of MS-Excel Excel XP Excel 2003 Excel 2007 are capable of performing a number of statistical analyses

Starting MS Excel Double click on the Microsoft Excel icon on the desktop or Click on Start --gt Programs --gt Microsoft Excel

Worksheet Consists of a multiple grid of cells with numbered rows down the page and alphabetically-tilted columns across the page Each cell is referenced by its coordinates For example A3 is used to refer to the cell in column A and row 3 B10B20 is used to refer to the range of cells in column B and rows 10 through 20

Microsoft ExcelMicrosoft Excel

Creating Formulas 1 Click the cell that you want to enter the formula 2 Type = (an equal sign) 3 Click the Function Button 4 Select the formula you want and step through the on-screen instructions

xf

Opening a document File Open (From a existing workbook) Change the directory area or drive to look for file in other locations

Creating a new workbook FileNewBlank Document

Saving a File FileSave

Selecting more than one cell Click on a cell eg A1) then hold the Shift key and click on another (eg D4) to select cells between and A1 and D4 or Click on a cell and drag the mouse across the desired range

Microsoft ExcelMicrosoft Excel

Entering Date and Time Dates are stored as MMDDYYYY No need to enter in that format For example Excel will recognize Jan 9 or jan-9 as 192007 and Jan 9 1999 as 191999 To enter todayrsquos date press Ctrl and together Use a or p to indicate am or pm For example 830 p is interpreted as 830 pm To enter current time press Ctrl and together

Copy and Paste all cells in a Sheet Ctrl+A for selecting Ctrl +C for copying and Ctrl+V for Pasting

Sorting Data Sort Sort By hellip

Descriptive Statistics and other Statistical methods ToolsData Analysis Statistical method If Data Analysis is not available then click on Tools Add-Ins and then select Analysis ToolPack and Analysis toolPack-Vba

Microsoft ExcelMicrosoft Excel

Statistical and Mathematical Function Start with lsquo=lsquo sign and then select function from function wizard xf

Inserting a Chart Click on Chart Wizard (or InsertChart) select chart give Input data range Update the Chart options and Select output range Worksheet

Importing Data in Excel File open FileType Click on File Choose Option ( DelimitedFixed Width) Choose Options (Tab Semicolon Comma Space Other) Finish

Limitations Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extremecases

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

A general purpose statistical package SPSS is widely used in the social sciences particularly in sociology and psychology

SPSS can import data from almost any type of file to generate tabulated reports plots of distributions and trends descriptive statistics and complex statistical analyzes

Starting SPSS Double Click on SPSS on desktop or ProgramSPSS

Opening a SPSS file FileOpen

bull Data Editor

Various pull-down menus appear at the top of the Data Editor window These pull-down menus are at the heart of using SPSSWIN The Data Editor menu items (with some of the uses of the menu) are

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

FILE used to open and save data files

EDIT used to copy and paste data values used to find data in a file insert variables and cases OPTIONS allows the user to set general preferences as well as the setup for the Navigator Charts etc

VIEW user can change toolbars value labels can be seen in cells instead of data values

DATA select sort or weight cases merge files

MENUS AND TOOLBARS

TRANSFORM Compute new variables recode variables etc

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

ANALYZE perform various statistical procedures

GRAPHS create bar and pie charts etc

UTILITIES add comments to accompany data file (and other advanced features)

ADD-ons these are features not currently installed (advanced statistical procedures)

WINDOW switch between data syntax and navigator windows

HELP to access SPSSWIN Help information

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Navigator (Output) Menus

When statistical procedures are run or charts are created the output will appear in the Navigator window The Navigator window contains many of the pull-down menus found in the Data Editor window Some of the important menus in the Navigator window include

INSERT used to insert page breaks titles charts etc

FORMAT for changing the alignment of a particular portion of the output

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Formatting Toolbar

When a table has been created by a statistical procedure the user can edit the table to create a desired look or adddelete information Beginning with version 140 the user has a choice of editing the table in the Output or opening it in a separate Pivot Table (DEFINE) window Various pulldown menus are activated when the user double clicks on the table These include

EDIT undo and redo a pivot select a table or table body (eg to change the font)

INSERT used to insert titles captions and footnotes

PIVOT used to perform a pivot of the row and column variables

FORMAT various modifications can be made to tables and cells

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing tab-delimited dataIn SPSSWIN click on FILE OPEN DATA Look in the appropriate location for rArr rArrthe text file Then select ldquoTextrdquo from ldquoFiles of typerdquo Click on the file name and then click on ldquoOpenrdquo You will see the Text Import Wizard ndash step 1 of 6 dialog box

You will now have an SPSS data file containing the former tab-delimited data You simply need to add variable and value labels and define missing values

Exporting Data to Excelclick on FILE SAVE AS Click on the File Name for the file to be exported For rArrthe ldquoSave as Typerdquo select from the pull-down menu Excel (xls) You will notice the checkbox for ldquowrite variable names to spreadsheetrdquo Leave this checked as you will want the variable names to be in the first row of each column in the Excel spreadsheet Finally click on Save

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Additional menusCHART EDITOR used to edit a graph

SYNTAX EDITOR used to edit the text in a syntax window

bull Show or hide a toolbar

Click on VIEW TOOLBARS 1048635to show it to hide itrArr rArr

bull Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location

bull Customize a toolbar

Click on VIEW TOOLBARS CUSTOMIZErArr rArr

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheetData from an Excel spreadsheet can be imported into SPSSWIN as follows1 In SPSSWIN click on FILE OPEN DATA The OPEN DATA FILE Dialog rArr rArrBox will appear2 Locate the file of interest Use the Look In pull-down list to identify the folder containing the Excel file of interest3 From the FILE TYPE pull down menu select EXCEL (xls)

4 Click on the file name of interest and click on OPEN or simply double-click on the file name

5 Keep the box checked that reads Read variable names from the first row of data This presumes that the first row of the Excel data file contains variable names in the first row [If the data resided in a different worksheet in the Excel file this would need to be entered]

6 Click on OK The Excel data file will now appear in the SPSSWIN Data Editor

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheet

7 The former EXCEL spreadsheet can now be saved as an SPSS file (FILE rArrSAVE AS) and is ready to be used in analyses Typically you would label variable and values and define missing values

Importing an Access tableSPSSWIN does not offer a direct import for Access tables Therefore we must follow these steps1 Open the Access file2 Open the data table3 Save the data as an Excel file4 Follow the steps outlined in the data import from Excel Spreadsheet to SPSSWIN

Importing Text Files into SPSSWINText data points typically are separated (or ldquodelimitedrdquo) by tabs or commas Sometimes they can be of fixed format

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Running the FREQUENCIES procedure

1 Open the data file (from the menus click on FILE OPEN DATA) of rArr rArrinterest

2 From the menus click on ANALYZE DESCRIPTIVE STATISTICS rArr rArrFREQUENCIES


Scale of variable/output measurement

Nominal - Categorical variables without any order or ranking sequence, such as names or classes (e.g., gender)
- Binary: two categories; multinomial: more than two categories

Ordinal - Variables with an inherent rank or order, e.g., mild, moderate, severe. Values can be compared for equality or for greater or less, but not for how much greater or less.

Interval - Values of the variable are ordered as in Ordinal, and additionally, differences between values are meaningful; however, the scale is not absolutely anchored (its zero point is arbitrary). Calendar dates and temperatures on the Fahrenheit scale are examples. Addition and subtraction, but not multiplication and division, are meaningful operations.

Ratio - Variables with all the properties of Interval plus an absolute, non-arbitrary zero point, e.g., age, weight, temperature (Kelvin). Addition, subtraction, multiplication, and division are all meaningful operations.
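To make the interval/ratio distinction concrete, the short Python sketch below (an illustration added here, using hypothetical temperatures of 80 °F and 40 °F) shows that a difference is meaningful on the Fahrenheit scale while a ratio is not; converting to Kelvin, which has an absolute zero, gives the ratio that actually means something.

```python
def fahrenheit_to_kelvin(f):
    """Convert a Fahrenheit temperature to Kelvin."""
    return (f - 32) * 5 / 9 + 273.15

hot_f, mild_f = 80.0, 40.0  # hypothetical temperatures

# Differences are meaningful on an interval scale such as Fahrenheit.
diff_f = hot_f - mild_f

# The ratio 80/40 = 2 does NOT mean "twice as hot":
# 0 degrees F is an arbitrary zero point.
ratio_f = hot_f / mild_f

# Kelvin is a ratio scale (absolute zero), so this ratio is meaningful.
ratio_k = fahrenheit_to_kelvin(hot_f) / fahrenheit_to_kelvin(mild_f)

print(f"Difference: {diff_f:.0f} F")
print(f"Naive Fahrenheit ratio: {ratio_f:.2f}")
print(f"Kelvin ratio: {ratio_k:.3f}")
```

The Kelvin ratio is only about 1.08, far from the 2.0 that the Fahrenheit numbers suggest.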

Measurement bias

Measurement bias arises from measurement error.

Example:
- In asthma, the possible remission outcomes are complete remission, partial remission, and no remission. If we measure the outcome variable only as remission versus non-remission, we commit an error by putting partial remission in the non-remission group (type II error).

Designing the study

A study design is a careful advance plan of data collection and of the analytic approach needed to answer the research question under investigation in a scientific way.

The basic elements of a study design:
- Selecting an appropriate sample size for a specified level of power and level of significance
- Selecting methods of sampling, data collection, and analysis appropriate to the study's objectives

Clinical/Experimental versus Observational design

(The Lancet, 2002, Vol. 359)

Clinical/Experimental vs Observational design

The choice of a design mainly depends on the research question(s) and the type of research conducted (experimental or observational).

Experimental/Interventional: The investigator controls the experimental environment in which the hypothesis is tested. The randomized double-blind clinical trial is the gold standard.

Non-experimental/Observational: The population is observed without any interference by the investigator. For example, in a study of the effect of smoking, it is impossible for an investigator to assign smoking to the subjects. Instead, the investigator can study the effect by choosing a control group and examining the cause-effect relationship. Some examples are:
- Cross-sectional study
- Cohort study
- Case-control study

Randomized control design

Random allocation of subjects to different interventions (or treatments) for the purpose of comparing/determining the efficacy of the study treatment(s).
- E.g., a placebo or a standard medication (active control) can be used as a control.
- Patients with cancer or a painful disease cannot receive a placebo as a control.
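A minimal sketch of random allocation, assuming two arms with equal group sizes. The function name `randomize`, the arm labels, and the seed are hypothetical; real trials usually rely on dedicated randomization software and schemes such as blocked or stratified randomization.

```python
import random

def randomize(subject_ids, arms=("Treatment", "Control"), seed=None):
    """Randomly allocate subjects to arms in (near-)equal numbers.

    Builds a list with each arm label repeated equally, shuffles it,
    and pairs the shuffled labels with the subjects in order.
    """
    rng = random.Random(seed)
    n = len(subject_ids)
    labels = [arms[i % len(arms)] for i in range(n)]
    rng.shuffle(labels)
    return dict(zip(subject_ids, labels))

# Hypothetical example: 60 subjects, the size of the course's default dataset
allocation = randomize(range(1, 61), seed=2008)
counts = {arm: list(allocation.values()).count(arm) for arm in ("Treatment", "Control")}
print(counts)
```

Shuffling a balanced list of labels (rather than flipping a coin per subject) guarantees equal group sizes while keeping each subject's assignment random.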

Blinding: reduces bias due to preconceptions or personal bias.
- Open trial: The investigator and subject know the full details of the treatment.
- Single-blind trial: The investigator knows about the treatment, but the subject does not.
- Double-blind trial: Neither the investigator nor the subject knows about the treatment.
- Triple-blind trial: The sponsor, investigator, and subject do not know about the treatment.

Distribution of a variable

Distribution (of a variable) tells us what values the variable takes and how often it takes these values. E.g., the distribution of the ages of 26 pediatric patients aged 1 to 6 at AIDHC is as follows:

Age         1   2   3   4   5   6
Frequency   5   3   7   5   4   2
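The frequency table above can be extended with percentages and cumulative percentages. This sketch uses only the counts shown on the slide.

```python
# Frequency table for the ages of the 26 pediatric patients above
ages = [1, 2, 3, 4, 5, 6]
freq = [5, 3, 7, 5, 4, 2]

n = sum(freq)  # total number of patients
rows = []
cumulative = 0
for age, f in zip(ages, freq):
    cumulative += f
    # (age, frequency, percent, cumulative percent)
    rows.append((age, f, 100 * f / n, 100 * cumulative / n))

print("Age  Freq  Percent  Cum. %")
for age, f, pct, cum_pct in rows:
    print(f"{age:>3}  {f:>4}  {pct:>7.1f}  {cum_pct:>6.1f}")
```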

Statistics

Statistics is the science of data collection, summarization, analysis, and interpretation.

Descriptive versus Inferential Statistics
- Descriptive statistics: data description (summarization), such as center, variability, and shape
- Inferential statistics: drawing conclusions beyond the sample studied, allowing for prediction

A Taxonomy of Statistics

How does statistics help us?

[Figure: Age Distribution; histogram of age in months (bins 40 to 140, and More) against number of subjects]

Ages (in months) of the 60 patients in our data set 1 are: 71, 127, 65, 82, 140, 53, 114, 56, 84, 65, 67, 134, 64, …, 91, 51

Mean                  90.41666667
Standard Error         3.902649518
Median                84
Mode                  84
Standard Deviation    30.22979318
Sample Variance      913.8403955
Kurtosis              -1.183899591
Skewness               0.389872725
Range                 95
Minimum               48
Maximum              143
Sum                 5425
Count                 60

By simply looking at the raw data, we cannot produce an informative description of it; statistics, however, gives quick insight into the data through graphical and numerical tools.
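Several values in the Excel output are linked by simple formulas, so they can be cross-checked. The sketch below uses only the reported summary values; the full list of 60 ages is elided above, so nothing is recomputed from raw data.

```python
import math

# Summary values copied from the Excel output above (age in months, n = 60)
n = 60
total = 5425
mean = 90.41666667
std_dev = 30.22979318
std_error = 3.902649518
variance = 913.8403955
minimum, maximum, value_range = 48, 143, 95

# Mean = Sum / Count
assert math.isclose(total / n, mean, rel_tol=1e-9)
# Sample variance = (standard deviation) squared
assert math.isclose(std_dev ** 2, variance, rel_tol=1e-8)
# Standard error of the mean = SD / sqrt(n)
assert math.isclose(std_dev / math.sqrt(n), std_error, rel_tol=1e-8)
# Range = Maximum - Minimum
assert maximum - minimum == value_range
print("Summary statistics are internally consistent")
```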

[Figure: Distribution of age; vertical axis Age (month), 60 to 140]

Statistical Description of Data

Statistics describes a numeric set of data by its:
- Center (mean, median, mode, etc.)
- Variability (standard deviation, range, etc.)
- Shape (skewness, kurtosis, etc.)

Statistics describes a categorical set of data by:
- Frequency, percentage, or proportion of each category

Statistical Inference

Statistical inference is the process by which we acquire information about populations from samples.

Two types of estimates for making inferences:
- Point estimation
- Interval estimation

[Figure: Statistical Inference; Sample and Population]

Population and sample

Population: The entire collection of individuals or measurements about which information is desired.

Sample: A subset of the population selected for study.
- The primary objective is to create a subset of the population whose center, spread, and shape are as close as possible to those of the population.
- Methods of sampling: random sampling, stratified sampling, systematic sampling, cluster sampling, multistage sampling, area sampling, quota sampling, etc.
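Three of the listed probability sampling methods can be sketched in a few lines of Python. The population of 100 subject IDs and the odd/even strata are hypothetical, chosen only to show the mechanics of each method.

```python
import random

def simple_random_sample(population, n, rng):
    """Every subset of size n has the same chance of selection."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Pick a random start, then take every k-th unit (k = N // n)."""
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]

def stratified_sample(population, strata_of, n_per_stratum, rng):
    """Draw a simple random sample of fixed size within each stratum."""
    strata = {}
    for unit in population:
        strata.setdefault(strata_of(unit), []).append(unit)
    return {s: rng.sample(units, n_per_stratum) for s, units in strata.items()}

rng = random.Random(42)
population = list(range(1, 101))  # hypothetical population of 100 subject IDs

srs = simple_random_sample(population, 10, rng)
sys_s = systematic_sample(population, 10, rng)
strat = stratified_sample(population, lambda i: "odd" if i % 2 else "even", 5, rng)
print(srs, sys_s, strat, sep="\n")
```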

Parameter vs Statistic

Parameter
- Any statistical characteristic of a population
- Population mean, population median, and population standard deviation are examples of parameters
- A parameter describes the distribution of a population
- Parameters are fixed and usually unknown

Statistic: Any statistical characteristic of a sample
- Sample mean, sample median, and sample standard deviation are some examples of statistics
- A statistic describes the distribution of a sample
- The value of a statistic is known and varies from sample to sample
- Statistics are used for making inferences about parameters

Statistical issue: to describe the distribution of a population through a census, or to make inferences about the population distribution (population parameters) using the sample distribution (sample statistics).

E.g., the sample mean is an estimate of the population mean.

Hypothesis Testing

Null hypothesis and alternative hypothesis

                           Real situation
Decision            Ho is true                  Ho is false
Reject Ho           Type I error (α)            Correct decision (1 - β)
Do not reject Ho    Correct decision (1 - α)    Type II error (β)

Elements/Steps in hypothesis testing

Hypothesis testing steps:
1. Specification of the null (Ho) and alternative (H1) hypotheses
2. Selection of the significance level (alpha), e.g., 0.05 or 0.01
3. Calculation of the test statistic, e.g., t, F, chi-square
4. Calculation of the probability value (p-value) or confidence interval
5. Description of the result and statistic in an understandable way
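The five steps can be illustrated with a large-sample (z) test on the mean age reported earlier. The null value of 100 months is hypothetical, chosen only for illustration. With n = 60 the normal approximation is reasonable, though a one-sample t-test with 59 degrees of freedom would give a slightly larger p-value.

```python
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, std_error, alpha=0.05):
    """Two-sided large-sample z-test of H0: mu = mu0 against H1: mu != mu0."""
    z = (sample_mean - mu0) / std_error           # step 3: test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # step 4: two-sided p-value
    reject = p_value < alpha                      # compare with the alpha from step 2
    return z, p_value, reject

# Step 1 (hypothetical): H0: mean age = 100 months vs H1: mean age != 100 months
# Step 2: alpha = 0.05
z, p, reject = one_sample_z_test(90.41666667, 100, 3.902649518, alpha=0.05)
# Step 5: report the result in plain language
print(f"z = {z:.2f}, p = {p:.3f}; reject H0 at the 5% level: {reject}")
```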

Point Estimation

• A point estimator draws inferences about a population by estimating the value of an unknown parameter using a single value, or point.

[Figure: point estimator from the sample distribution; parameter of the population distribution]

Interval Estimation

• An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.

[Figure: interval estimator from the sample distribution; parameter of the population distribution]
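As an illustration of point versus interval estimation, the sketch below takes the sample mean age from the Excel output shown earlier as the point estimate and builds a 95% interval as mean ± 1.96 × SE. This is a normal-approximation interval (an assumption made here for simplicity); a t-based interval with 59 degrees of freedom would be slightly wider.

```python
from statistics import NormalDist

def mean_confidence_interval(mean, std_error, level=0.95):
    """Normal-approximation confidence interval: mean +/- z * SE."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    return mean - z * std_error, mean + z * std_error

point_estimate = 90.41666667  # sample mean age (months) from the Excel output
low, high = mean_confidence_interval(point_estimate, 3.902649518)
print(f"Point estimate: {point_estimate:.1f} months")
print(f"95% CI: ({low:.1f}, {high:.1f}) months")
```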

P-Value versus the Confidence Interval

Two main ways to assess study precision and the role of chance in a study:
- P-value: the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed; smaller values indicate stronger evidence against the null hypothesis
- Confidence interval (CI): an interval constructed so that, with a specified probability (the confidence level), it contains the true value of the parameter
- E.g., a 95% CI implies that if one repeated the study 100 times, computing a CI each time, about 95 of the 100 intervals would contain the true measure of association
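The repeated-study interpretation of a 95% CI can be checked by simulation. The sketch below assumes a hypothetical normal population (mean 90, SD 30), draws many samples of size 60, and counts how often the normal-approximation interval contains the true mean; the proportion should come out close to 0.95.

```python
import random
from statistics import NormalDist, mean, stdev

def ci_coverage(true_mean=90.0, true_sd=30.0, n=60, repeats=1000, level=0.95, seed=1):
    """Repeat a hypothetical study many times and return the fraction of
    normal-approximation CIs for the mean that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        m = mean(sample)
        se = stdev(sample) / n ** 0.5
        if m - z * se <= true_mean <= m + z * se:
            hits += 1
    return hits / repeats

coverage = ci_coverage()
print(f"Coverage: {coverage:.3f}")  # typically close to 0.95
```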

Procedures for sample size calculation

- Selection of the primary variables of interest and formulation of hypotheses
- Information on the standard deviation (if numeric) or proportion (if categorical)
- A tolerance level of significance (α)
- Selection of a reasonable test statistic
- Power or confidence level
- A scientifically or clinically meaningful effect (difference)
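For one common case, comparing two means with a two-sided test, these ingredients combine into the normal-approximation formula n per group = 2 (z(1 - alpha/2) + z(power))^2 sigma^2 / delta^2. This is one standard formula among many (proportions, paired designs, and other tests need different ones), and the inputs below (SD 30 months, meaningful difference 15 months, alpha = 0.05, power 0.80) are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group_two_means(sigma, delta, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for comparing two means
    with a two-sided test: 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2."""
    z = NormalDist().inv_cdf
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)  # round up to whole subjects

# Hypothetical inputs: SD = 30 months, clinically meaningful difference = 15 months
n_needed = n_per_group_two_means(sigma=30, delta=15)
print(n_needed)
```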

Brief concept of Statistical Software

There are many software packages to perform statistical analysis and visualization of data. Some of them are:
- Statistical Analysis System (SAS), S-PLUS, R, MATLAB, Minitab, BMDP, Stata, SPSS, StatXact, Statistica, LISREL, JMP, GLIM, HIL, MS Excel, etc. We will briefly discuss MS Excel and SPSS.

Useful website:
http://www.R-project.org (free but powerful statistical software)

Microsoft Excel

A spreadsheet application. It features calculation, graphing tools, pivot tables, and a macro programming language called VBA (Visual Basic for Applications).

There are many versions of MS Excel; Excel XP, Excel 2003, and Excel 2007 are capable of performing a number of statistical analyses.

Starting MS Excel: Double-click the Microsoft Excel icon on the desktop, or click Start ⇒ Programs ⇒ Microsoft Excel.

Worksheet: Consists of a grid of cells with numbered rows down the page and lettered columns across the page. Each cell is referenced by its coordinates. For example, A3 refers to the cell in column A and row 3, and B10:B20 refers to the range of cells in column B, rows 10 through 20.
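The cell-reference scheme can be made explicit with a small, hypothetical helper (not part of Excel) that turns A1-style references into row and column numbers; column letters work like a base-26 count (A = 1, ..., Z = 26, AA = 27).

```python
import re

def cell_to_row_col(ref):
    """Convert an A1-style cell reference (e.g. 'B10') to 1-based (row, column)."""
    match = re.fullmatch(r"([A-Z]+)([0-9]+)", ref.upper())
    if not match:
        raise ValueError(f"not an A1-style reference: {ref!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters:  # A=1 ... Z=26, then AA=27, AB=28, ...
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits), col

def range_cells(ref_range):
    """Expand a rectangular range such as 'B10:B20' into (row, column) pairs."""
    first, last = ref_range.split(":")
    (r1, c1), (r2, c2) = cell_to_row_col(first), cell_to_row_col(last)
    return [(r, c) for r in range(r1, r2 + 1) for c in range(c1, c2 + 1)]

print(cell_to_row_col("A3"))        # (3, 1)
print(len(range_cells("B10:B20")))  # 11
```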

Creating formulas: 1. Click the cell in which you want to enter the formula. 2. Type = (an equal sign). 3. Click the Function button (fx). 4. Select the formula you want and step through the on-screen instructions.

Opening a document: File ⇒ Open (for an existing workbook). Change the directory, area, or drive to look for files in other locations.

Creating a new workbook: File ⇒ New ⇒ Blank Document

Saving a file: File ⇒ Save

Selecting more than one cell: Click a cell (e.g., A1), then hold the Shift key and click another cell (e.g., D4) to select the cells between A1 and D4; or click a cell and drag the mouse across the desired range.

Entering date and time: Dates are displayed in M/D/YYYY form, but there is no need to enter them in that format. For example, Excel will recognize Jan 9 or jan-9 as 1/9/2007 (the current year) and Jan 9 1999 as 1/9/1999. To enter today's date, press Ctrl and ; together. Use a or p to indicate AM or PM; for example, 8:30 p is interpreted as 8:30 PM. To enter the current time, press Ctrl, Shift, and ; together.

Copying and pasting all cells in a sheet: Ctrl+A to select, Ctrl+C to copy, and Ctrl+V to paste.

Sorting data: Sort ⇒ Sort By …

Descriptive statistics and other statistical methods: Tools ⇒ Data Analysis ⇒ (statistical method). If Data Analysis is not available, click Tools ⇒ Add-Ins and then select Analysis ToolPak and Analysis ToolPak - VBA.

Statistical and mathematical functions: Start with the '=' sign and then select a function from the function wizard (fx).

Inserting a chart: Click the Chart Wizard (or Insert ⇒ Chart), select the chart type, give the input data range, update the chart options, and select the output location (worksheet).

Importing data into Excel: File ⇒ Open ⇒ Files of type; click the file; choose an option (Delimited/Fixed Width); choose the delimiter options (Tab, Semicolon, Comma, Space, Other); click Finish.

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.
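One classic way rounding error creeps into statistical calculations is the one-pass "sum of squares" formula for the variance. The sketch below does not reproduce Excel's internal algorithms; it only illustrates the kind of problem the slide warns about, using hypothetical data with a small spread around a very large value (true sample variance 30).

```python
def naive_variance(xs):
    """Textbook one-pass formula: (sum x^2 - (sum x)^2 / n) / (n - 1).
    Subtracting two huge, nearly equal numbers loses precision."""
    n = len(xs)
    s = sum(xs)
    ss = sum(x * x for x in xs)
    return (ss - s * s / n) / (n - 1)

def two_pass_variance(xs):
    """Subtract the mean first, then sum the squared deviations."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

# Hypothetical data: deviations 4, 7, 13, 16 around one billion
data = [1e9 + d for d in (4.0, 7.0, 13.0, 16.0)]

print(naive_variance(data))     # far from 30 because of rounding
print(two_pass_variance(data))  # 30.0
```

Centering the data before squaring keeps the numbers small, which is why the two-pass version recovers the correct value.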

Statistical Package for the Social Sciences (SPSS)

A general-purpose statistical package, SPSS is widely used in the social sciences, particularly in sociology and psychology.

SPSS can import data from almost any type of file to generate tabulated reports, plots of distributions and trends, descriptive statistics, and complex statistical analyses.

Starting SPSS: Double-click SPSS on the desktop, or go to Programs ⇒ SPSS.

Opening an SPSS file: File ⇒ Open

• Data Editor

Various pull-down menus appear at the top of the Data Editor window. These pull-down menus are at the heart of using SPSSWIN. The Data Editor menu items (with some of their uses) are:

MENUS AND TOOLBARS

FILE: used to open and save data files

EDIT: used to copy and paste data values, to find data in a file, and to insert variables and cases. OPTIONS allows the user to set general preferences as well as the setup for the Navigator, Charts, etc.

VIEW: the user can change toolbars; value labels can be shown in cells instead of data values

DATA: select, sort, or weight cases; merge files

TRANSFORM: compute new variables, recode variables, etc.

ANALYZE: perform various statistical procedures

GRAPHS: create bar and pie charts, etc.

UTILITIES: add comments to accompany the data file (and other advanced features)

ADD-ONS: features not currently installed (advanced statistical procedures)

WINDOW: switch between the data, syntax, and navigator windows

HELP: access SPSSWIN Help information

Navigator (Output) Menus

When statistical procedures are run or charts are created, the output appears in the Navigator window. The Navigator window contains many of the pull-down menus found in the Data Editor window. Some of the important menus in the Navigator window include:

INSERT: used to insert page breaks, titles, charts, etc.

FORMAT: used to change the alignment of a particular portion of the output

• Formatting Toolbar

When a table has been created by a statistical procedure, the user can edit the table to create a desired look or to add or delete information. Beginning with version 14.0, the user has a choice of editing the table in the Output window or opening it in a separate Pivot Table (DEFINE) window. Various pull-down menus are activated when the user double-clicks the table. These include:

EDIT: undo and redo a pivot; select a table or table body (e.g., to change the font)

INSERT: used to insert titles, captions, and footnotes

PIVOT: used to perform a pivot of the row and column variables

FORMAT: various modifications can be made to tables and cells

Importing tab-delimited data: In SPSSWIN, click FILE ⇒ OPEN ⇒ DATA. Look in the appropriate location for the text file. Then select "Text" from "Files of type". Click the file name and then click "Open". You will see the Text Import Wizard - Step 1 of 6 dialog box.

After completing the wizard, you will have an SPSS data file containing the former tab-delimited data. You simply need to add variable and value labels and define missing values.

Exporting data to Excel: Click FILE ⇒ SAVE AS. Click the file name for the file to be exported. For "Save as type", select Excel (*.xls) from the pull-down menu. You will notice the checkbox for "Write variable names to spreadsheet". Leave this checked, as you will want the variable names in the first row of each column in the Excel spreadsheet. Finally, click Save.

• Additional menus
CHART EDITOR: used to edit a graph
SYNTAX EDITOR: used to edit the text in a syntax window

• Show or hide a toolbar
Click VIEW ⇒ TOOLBARS, then check a toolbar to show it or uncheck it to hide it.

• Move a toolbar
Click the toolbar (but not one of its pushbuttons) and then drag it to its new location.

• Customize a toolbar
Click VIEW ⇒ TOOLBARS ⇒ CUSTOMIZE.

Importing data from an EXCEL spreadsheet

Data from an Excel spreadsheet can be imported into SPSSWIN as follows:
1. In SPSSWIN, click FILE ⇒ OPEN ⇒ DATA. The OPEN DATA FILE dialog box will appear.
2. Locate the file of interest. Use the Look In pull-down list to identify the folder containing the Excel file of interest.
3. From the FILE TYPE pull-down menu, select EXCEL (*.xls).
4. Click the file name of interest and click OPEN, or simply double-click the file name.
5. Keep the box checked that reads "Read variable names from the first row of data". This presumes that the first row of the Excel data file contains variable names. [If the data resided in a different worksheet in the Excel file, this would need to be entered.]
6. Click OK. The Excel data file will now appear in the SPSSWIN Data Editor.

7. The former Excel spreadsheet can now be saved as an SPSS file (FILE ⇒ SAVE AS) and is ready to be used in analyses. Typically, you would label variables and values and define missing values.

Importing an Access table: SPSSWIN does not offer a direct import for Access tables. Therefore, follow these steps:
1. Open the Access file.
2. Open the data table.
3. Save the data as an Excel file.
4. Follow the steps outlined above for importing data from an Excel spreadsheet into SPSSWIN.

Importing text files into SPSSWIN: Text data points typically are separated (or "delimited") by tabs or commas. Sometimes they are in a fixed format.

Running the FREQUENCIES procedure

1. Open the data file of interest (from the menus, click FILE ⇒ OPEN ⇒ DATA).
2. From the menus, click ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ FREQUENCIES.
3. The FREQUENCIES dialog box will appear. The left-hand box (source variable list) lists all the variables defined in the data file. First, identify the variable(s) for which you want to run a frequency analysis: click a variable name, then click the [>] pushbutton. The variable name will now appear in the VARIABLE(S) box (selected variable list). Repeat these steps for each variable of interest.
4. If all that is being requested is a frequency table showing counts and percentages (raw, adjusted, and cumulative), click OK.
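To show what the raw, adjusted (valid), and cumulative percentages mean, here is a minimal Python sketch of such a frequency table. The data and the use of None as the missing-value code are hypothetical assumptions; raw percentages use all cases, while adjusted percentages exclude missing values.

```python
from collections import Counter

def frequency_table(values, missing=(None,)):
    """Rows of (value, count, raw %, valid %, cumulative valid %).
    Missing values count toward the raw percentage denominator only."""
    total = len(values)
    valid = [v for v in values if v not in missing]
    counts = Counter(valid)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        count = counts[value]
        raw_pct = 100 * count / total         # percent of all cases
        valid_pct = 100 * count / len(valid)  # percent of non-missing cases
        cumulative += valid_pct
        rows.append((value, count, round(raw_pct, 1), round(valid_pct, 1), round(cumulative, 1)))
    return rows

# Hypothetical severity codes (1 = mild, 2 = moderate, 3 = severe); None = missing
severity = [1, 2, 2, 3, 1, None, 2, 3, 2, 1]
rows = frequency_table(severity)
for row in rows:
    print(row)
```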

Requesting STATISTICS

Descriptive and summary STATISTICS can be requested for numeric variables. To request statistics:
1. From the FREQUENCIES dialog box, click the STATISTICS pushbutton.
2. This will bring up the FREQUENCIES: STATISTICS dialog box.
3. The STATISTICS dialog box offers the user a variety of choices.

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ DESCRIPTIVES). The procedure offers many of the same statistics as the FREQUENCIES procedure, but without generating frequency tables.

Requesting CHARTS

One can request a chart (graph) to be created for a variable or variables included in a FREQUENCIES procedure:
1. In the FREQUENCIES dialog box, click CHARTS.
2. The FREQUENCIES: CHARTS dialog box will appear. Choose the intended chart (e.g., bar chart, pie chart, histogram).

Pasting charts into Word
1. Click the chart.
2. Click the pull-down menu EDIT ⇒ COPY OBJECTS.
3. Go to the Word document in which the chart is to be embedded. Click EDIT ⇒ PASTE SPECIAL.
4. Select Formatted Text (RTF) and then click OK.
5. Enlarge the graph to a desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible, click once on the graph).

BASIC STATISTICAL PROCEDURES: CROSSTABS

1. From the ANALYZE pull-down menu, click DESCRIPTIVE STATISTICS ⇒ CROSSTABS.
2. The CROSSTABS dialog box will then open.
3. From the variable selection box on the left, click a variable you wish to designate as the Row variable. The values (codes) of the Row variable make up the rows of the crosstabs table. Click the arrow (>) button for Row(s). Next, click a different variable you wish to designate as the Column variable. The values (codes) of the Column variable make up the columns of the crosstabs table. Click the arrow (>) button for Column(s).
4. You can specify more than one variable in the Row(s) and/or Column(s). A cross table will be generated for each combination of Row and Column variables.
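A crosstabs table is a two-way count of every combination of row and column values, with row and column totals. A minimal Python sketch, using hypothetical group (row variable) and remission (column variable) data, is:

```python
from collections import Counter

def crosstab(rows, cols):
    """Count each (row value, column value) pair and return a nested dict
    with row and column totals, like a two-way contingency table."""
    cells = Counter(zip(rows, cols))
    row_levels, col_levels = sorted(set(rows)), sorted(set(cols))
    table = {r: {c: cells[(r, c)] for c in col_levels} for r in row_levels}
    for r in row_levels:
        table[r]["Total"] = sum(table[r][c] for c in col_levels)
    table["Total"] = {c: sum(table[r][c] for r in row_levels) for c in col_levels}
    table["Total"]["Total"] = len(rows)
    return table

# Hypothetical data: group (row variable) by remission status (column variable)
group = ["A", "A", "B", "B", "B", "A", "B", "A"]
remission = ["Yes", "No", "No", "No", "Yes", "Yes", "No", "Yes"]
t = crosstab(group, remission)
for row_label, counts in t.items():
    print(row_label, counts)
```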

Limitations: SPSS users have less control over data manipulation and statistical output than users of other statistical packages such as SAS, Stata, etc.

SPSS is a good first statistical package for quantitative research in the social sciences because it is easy to use and can be a good starting point for learning more advanced statistical packages.

Questions?

Scale of variableoutput measurementScale of variableoutput measurement

Interval - Values of the variable are ordered as in Ordinal and

additionally differences between values are meaningful however the

scale is not absolutely anchored Calendar dates and temperatures on

the Fahrenheit scale are examples Addition and subtraction but not

multiplication and division are meaningful operations

Ratio - Variables with all properties of Interval plus an absolute non-

arbitrary zero point eg age weight temperature (Kelvin) Addition

subtraction multiplication and division are all meaningful operations

Measurement biasMeasurement bias

Bias arises due to measurement error

Example-

ndash Suppose In the case of remission of Asthma the possible

outcomes are complete remission partial remission and no

remission If we measure the outcome variable as only remission

and non-remission basically we are committing an error by

putting partial remission in the non-remission group (type II error)

Designing the studyDesigning the study

A study design is a careful advance plan of data collection

and the analytic approach needed to answer the research

question under investigation in a scientific way

The basic elements of a study design-

ndash Selecting an appropriate sample size for a specified

level of power and level of significance

ndash Selecting methods of sampling data collection and

analysis appropriate to the studys objectives

ClinicalExperimental versus Observational designClinicalExperimental versus Observational design

The Lancet 2002 Vol 359

ClinicalExperimental vs Observational ClinicalExperimental vs Observational designdesign

The choice of a design mainly depends on the research

question (s) and type of research conduct ( experimental

or observational)

Experimental Interventional The investigator controls

the experimental environment in which the hypothesis is

tested The randomized double-blind clinical trial is the

gold standard

ClinicalExperimental vs Observational ClinicalExperimental vs Observational designdesign

Non-experimentalObservational The population is

observed without any interference by the investigator

For example in a study to see the effect of smoking it is

impossible for an investigator to assign smoking to the subjects

Instead investigator can study the effect by choosing a control

group and find the cause and relation effect Some examples are-

ndash Cross-sectional study

ndash Cohort study

ndash Case-control study

Randomized control designRandomized control design

Random allocation of subjects to different interventions

(or treatments) for the purpose of comparingdetermining

the efficacy of the study treatment (s)

ndash Eg placebo or standard medication (active control) can

be used as a control

ndash Patients with cancer or painful disease can not receive

placebo as a control

Randomized control designRandomized control design

Blindness Reduces the bias due to the preconception or

personal bias ndash Open trial Investigator and subject know the full details of the

treatment

ndash Single-blind trial Investigator knows about the treatment but

subject does not

ndash Double-blind Both investigator and subject do not know about the

treatment

ndash Triple-blind Sponsor investigator and subject do not know about

the treatment

Distribution of a variableDistribution of a variable

Distribution - (of a variable) tells us what values the

variable takes and how often it takes these values Eg

distribution of some 26 pediatric patients of ages 1 to 6

at AIDHC are as follows-

Age 1 2 3 4 5 6

Frequency 5 3 7 5 4 2

StatisticsStatistics

Science of data collection summarization analysis

and interpretation

Descriptive versus Inferential Statistics

ndash Descriptive Statistic Data description

(summarization) such as center variability and

shape

ndash Inferential Statistic Drawing conclusion beyond the

sample studied allowing for prediction

A Taxonomy ofA Taxonomy of StatisticsStatistics

How does statistics help usHow does statistics help us

Age Distribution

0

2

4

6

8

10

12

14

16

40 60 80 100 120 140 More

Age in Month

Nu

mb

er o

f S

ub

ject

s

Ages (in month) of the 60 patients in our data set 1 are- 71 127 65 82 140 53 114 56 84 65 67 134 64 hellip 91 51

Mean 9041666667

Standard Error 3902649518

Median 84

Mode 84

Standard Deviation 3022979318

Sample Variance 9138403955

Kurtosis -1183899591

Skewness 0389872725

Range 95

Minimum 48

Maximum 143

Sum 5425

Count 60

By simply looking at the data we fail to produce any informative account to describe the data how ever statistics produce a quick insight in to data using graphical and numerical statistical tools

60

80

10

01

20

14

0

Distribution of age

Ag

e (

mo

nth

)

Statistical Description of DataStatistical Description of Data

Statistics describes a numeric set of data by its

Center (mean median mode etc)

Variability (standard deviation range etc)

Shape (skewness kurtosis etc)

Statistics describes a categorical set of data by

Frequency percentage or proportion of each

category

Statistical inference is the process by which we acquire information about populations from samples

Two types of estimates for making inferencesndash Point estimationndash Interval estimate

Statistical Inference

Statistical Inference

Sample Population

Population and samplePopulation and sample

Population The entire collection of individuals or measurements about which information is desired

Sample A subset of the population selected for study

ndash Primary objective is to create a subset of population whose center spread and shape are as close as that of population

ndash Methods of sampling Random sampling stratified sampling systematic sampling cluster sampling multistage sampling area sampling qoata sampling etc

Parameter vs StatisticsParameter vs Statistics

Parameter

ndash Any statistical characteristic of a population

ndash Population mean population median population

standard deviation are examples of parameters

ndash Parameter describes the distribution of a population

ndash Parameters are fixed and usually unknown

Parameter vs StatisticsParameter vs Statistics

Statistic Any statistical characteristic of a sample

ndash Sample mean sample median sample standard

deviation are some examples of statistics

ndash Statistic describes the distribution of population

ndash Value of a statistic is known and is varies for different

samples

ndash Are used for making inference on parameter

Parameter vs StatisticsParameter vs Statistics

Statistical Issue To describe the distribution of a

population through census or making inference on

population distribution population parameter using sample

distribution statistic

Eg sample mean is an estimate of the population mean

Hypothesis TestingHypothesis Testing

Null hypothesis and Alternative hypothesis

Real Situation Ho is true Ho is false Reject Ho Type I

error (α) Correct Decision (1-)

D e c i s i o n

Accept Ho Correct Decision (1- α)

Type II Error ()

ElementsSteps in hypothesisElementsSteps in hypothesis

Hypothesis testing steps

ndash 1 Null (Ho) and alternative (H1)hypothesis specification

ndash 2 Selection of significance level (alpha) - 005 or 001

ndash 3 Calculating the test statistic ndasheg t F Chi-square

ndash 4 Calculating the probability value (p-value) or confidence

Interval

ndash 5 Describing the result and statistic in an understandable

way

Point estimator

Sample distribution

Parameter

Population distribution

bull A point estimate draws inference about a population by estimating the value of an unknown parameter using a single value or a point

Point Estimation

Interval estimatorSample distribution

bull An interval estimator draws inferences about a population by

estimating the value of an unknown parameter using an interval

Population distribution Parameter

Interval Estimation

P-Value versus the Confidence IntervalP-Value versus the Confidence Interval

Two main ways to assess study precision and the role of

chance in a study

ndash P value measures ( in probability) the evidence against

the null hypothesis

ndash An interval within which the value of the parameter lies

with a specified probability

ndash Eg 95 CI implies that if one repeats a study 100

times the true measure of association will lie inside the

CI in 95 out of 100 measures

Procedures for sample size calculation

– Selection of the primary variables of interest and formulation of hypotheses

– Information on the standard deviation (if numeric) or proportion (if categorical)

– A tolerance level of significance (α)

– Selection of a reasonable test statistic

– Power or confidence level

– A scientifically or clinically meaningful effect (difference)
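The slides do not give a formula, so this example assumes one common way to combine these ingredients: the normal-approximation sample size per group for comparing two independent means, n = 2 (z₁₋α/₂ + z₁₋β)² σ² / Δ². The standard deviation (30), the meaningful difference (15), and the helper name `n_per_group` are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Normal-approximation n per group for a two-sided, two-sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)            # power = 1 - beta
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return ceil(n)                                  # round up to whole subjects

print(n_per_group(sigma=30, delta=15))   # → 63
```

Raising the power, lowering α, or shrinking the meaningful difference all increase the required n. Specialised software or a t-based correction would refine the answer for small samples.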

Brief concept of Statistical Software

There are many software packages to perform statistical analysis and visualization of data. Some of them are:

– Statistical Analysis System (SAS), S-PLUS, R, MATLAB, Minitab, BMDP, Stata, SPSS, StatXact, Statistica, LISREL, JMP, GLIM, HIL, MS Excel, etc. We will discuss MS Excel and SPSS in brief.

Useful websites:

http://www.R-project.org (free but powerful statistical software)

Microsoft Excel

A spreadsheet application. It features calculation, graphing tools, pivot tables, and a macro programming language called VBA (Visual Basic for Applications).

There are many versions of MS Excel. Excel XP, Excel 2003, and Excel 2007 are capable of performing a number of statistical analyses.

Starting MS Excel: Double-click on the Microsoft Excel icon on the desktop, or click on Start --> Programs --> Microsoft Excel.

Worksheet: Consists of a grid of cells with numbered rows down the page and alphabetically titled columns across the page. Each cell is referenced by its coordinates. For example, A3 refers to the cell in column A and row 3, and B10:B20 refers to the range of cells in column B, rows 10 through 20.
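To make the A1 reference convention concrete, here is a small hypothetical helper (not part of Excel) that converts a reference such as A3 into (row, column) numbers and expands a range such as B10:B20 into the cells it covers. Columns are treated as base-26 letters with A = 1 through Z = 26, so AA is column 27.

```python
import re

def cell_to_rc(ref):
    """Convert an A1-style reference (e.g. 'A3', 'AB12') to (row, column), both 1-based."""
    m = re.fullmatch(r"([A-Z]+)([0-9]+)", ref.upper())
    if m is None:
        raise ValueError(f"not an A1 reference: {ref!r}")
    letters, digits = m.groups()
    col = 0
    for ch in letters:                          # base-26 letters, A=1 ... Z=26
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits), col

def rc_to_cell(row, col):
    """Convert (row, column) back to an A1-style reference."""
    letters = ""
    while col > 0:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return f"{letters}{row}"

def expand_range(rng):
    """Expand a range such as 'B10:B20' or 'A1:D4' into the list of cells it covers."""
    start, end = rng.split(":")
    r1, c1 = cell_to_rc(start)
    r2, c2 = cell_to_rc(end)
    return [rc_to_cell(r, c)
            for r in range(min(r1, r2), max(r1, r2) + 1)
            for c in range(min(c1, c2), max(c1, c2) + 1)]

print(cell_to_rc("A3"))              # → (3, 1)
print(len(expand_range("B10:B20")))  # → 11
```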


Creating formulas: 1. Click the cell in which you want to enter the formula. 2. Type = (an equal sign). 3. Click the Function button (fx). 4. Select the formula you want and step through the on-screen instructions.

Opening a document: File --> Open (from an existing workbook). Change the directory area or drive to look for files in other locations.

Creating a new workbook: File --> New --> Blank Document

Saving a file: File --> Save

Selecting more than one cell: Click on a cell (e.g., A1), then hold the Shift key and click on another (e.g., D4) to select the cells between A1 and D4; or click on a cell and drag the mouse across the desired range.


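Entering date and time: Dates are displayed as MM/DD/YYYY (with US regional settings); internally Excel stores each date as a serial number. There is no need to enter dates in that format. For example, Excel will recognize Jan 9 or jan-9 as 1/9/2007 (a date typed without a year is assigned the current year) and Jan 9 1999 as 1/9/1999. To enter today's date, press Ctrl and ; together. Use a or p to indicate am or pm; for example, 8:30 p is interpreted as 8:30 pm. To enter the current time, press Ctrl and : together.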
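Behind the displayed format, Excel's default (1900) date system stores a date as a count of days. This minimal sketch assumes that system and uses 1899-12-30 as day zero. That offset absorbs Excel's deliberate treatment of 1900 as a leap year, so the conversion is valid only for dates from March 1, 1900 onward. The function names are hypothetical.

```python
from datetime import date, timedelta

EXCEL_EPOCH = date(1899, 12, 30)   # day 0 for the 1900 date system (valid from 1900-03-01)

def to_excel_serial(d):
    """Serial number Excel would store for date d (1900 date system)."""
    return (d - EXCEL_EPOCH).days

def from_excel_serial(n):
    """Date corresponding to an Excel serial number (1900 date system)."""
    return EXCEL_EPOCH + timedelta(days=n)

print(to_excel_serial(date(2007, 1, 9)))   # → 39091
print(from_excel_serial(36169))            # → 1999-01-09
```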

Copy and paste all cells in a sheet: Ctrl+A to select, Ctrl+C to copy, and Ctrl+V to paste.

Sorting data: Data --> Sort --> Sort by …

Descriptive statistics and other statistical methods: Tools --> Data Analysis --> statistical method. If Data Analysis is not available, click on Tools --> Add-Ins and then select Analysis ToolPak and Analysis ToolPak - VBA.


Statistical and mathematical functions: Start with the '=' sign and then select a function from the function wizard (fx).

Inserting a chart: Click on Chart Wizard (or Insert --> Chart), select a chart type, give the input data range, update the chart options, and select the output range (worksheet).

Importing data into Excel: File --> Open --> Files of type; click on the file; choose an option (Delimited/Fixed Width); choose the delimiters (Tab, Semicolon, Comma, Space, Other); Finish.

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.

Statistical Package for the Social Sciences (SPSS)

A general-purpose statistical package. SPSS is widely used in the social sciences, particularly in sociology and psychology.

SPSS can import data from almost any type of file to generate tabulated reports, plots of distributions and trends, descriptive statistics, and complex statistical analyses.

Starting SPSS: Double-click on SPSS on the desktop, or click Programs ⇒ SPSS

Opening an SPSS file: File ⇒ Open

• Data Editor

Various pull-down menus appear at the top of the Data Editor window. These pull-down menus are at the heart of using SPSSWIN. The Data Editor menu items (with some of their uses) are:

MENUS AND TOOLBARS


FILE: used to open and save data files

EDIT: used to copy and paste data values, find data in a file, and insert variables and cases; the OPTIONS item allows the user to set general preferences as well as the setup for the Navigator, Charts, etc.

VIEW: the user can change toolbars; value labels can be seen in cells instead of data values

DATA: select, sort, or weight cases; merge files


TRANSFORM: compute new variables, recode variables, etc.


ANALYZE: perform various statistical procedures

GRAPHS: create bar and pie charts, etc.

UTILITIES: add comments to accompany a data file (and other advanced features)

ADD-ONS: features not currently installed (advanced statistical procedures)

WINDOW: switch between the data, syntax, and Navigator windows

HELP: access SPSSWIN Help information


Navigator (Output) Menus

When statistical procedures are run or charts are created, the output will appear in the Navigator window. The Navigator window contains many of the pull-down menus found in the Data Editor window. Some of the important menus in the Navigator window include:

INSERT: used to insert page breaks, titles, charts, etc.

FORMAT: for changing the alignment of a particular portion of the output


• Formatting Toolbar

When a table has been created by a statistical procedure, the user can edit the table to create a desired look or add/delete information. Beginning with version 14.0, the user has a choice of editing the table in the Output window or opening it in a separate Pivot Table (DEFINE) window. Various pull-down menus are activated when the user double-clicks on the table. These include:

EDIT: undo and redo a pivot; select a table or table body (e.g., to change the font)

INSERT: used to insert titles, captions, and footnotes

PIVOT: used to perform a pivot of the row and column variables

FORMAT: various modifications can be made to tables and cells


Importing tab-delimited data: In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. Look in the appropriate location for the text file. Then select "Text" from "Files of type". Click on the file name and then click on "Open". You will see the Text Import Wizard – Step 1 of 6 dialog box.

After completing the wizard's steps, you will have an SPSS data file containing the former tab-delimited data. You simply need to add variable and value labels and define missing values.
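Outside SPSS, the same idea can be sketched with Python's standard `csv` module. The sketch splits each line on tabs, takes variable names from the first row, converts whole-number fields, and treats a user-defined missing-value code as missing. The variable names, the data, and the -99 missing code are hypothetical, and this is not how SPSS itself reads files.

```python
import csv
import io

# Hypothetical tab-delimited file contents; -99 is used as the missing-value code
raw = "id\tage\tgroup\n1\t71\tA\n2\t-99\tB\n3\t65\tA\n"
MISSING_CODES = {"-99", ""}

def read_tab_delimited(text):
    """Parse tab-delimited text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for record in reader:
        clean = {}
        for name, value in record.items():
            if value in MISSING_CODES:
                clean[name] = None          # defined as missing
            elif value.lstrip("-").isdigit():
                clean[name] = int(value)    # whole-number variable
            else:
                clean[name] = value         # string variable (e.g. group label)
        rows.append(clean)
    return rows

data = read_tab_delimited(raw)
print(data[1])   # → {'id': 2, 'age': None, 'group': 'B'}
```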

Exporting data to Excel: Click on FILE ⇒ SAVE AS. Click on the File Name for the file to be exported. For "Save as type", select Excel (*.xls) from the pull-down menu. You will notice the checkbox for "Write variable names to spreadsheet". Leave this checked, as you will want the variable names to be in the first row of each column in the Excel spreadsheet. Finally, click on Save.


• Additional menus

CHART EDITOR: used to edit a graph

SYNTAX EDITOR: used to edit the text in a syntax window

• Show or hide a toolbar

Click on VIEW ⇒ TOOLBARS, then check a toolbar to show it or uncheck it to hide it

• Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location

• Customize a toolbar

Click on VIEW ⇒ TOOLBARS ⇒ CUSTOMIZE


Importing data from an Excel spreadsheet: Data from an Excel spreadsheet can be imported into SPSSWIN as follows:

1. In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. The OPEN DATA FILE dialog box will appear.

2. Locate the file of interest. Use the Look In pull-down list to identify the folder containing the Excel file of interest.

3. From the FILE TYPE pull-down menu, select EXCEL (*.xls).

4. Click on the file name of interest and click on OPEN, or simply double-click on the file name.

5. Keep the box checked that reads "Read variable names from the first row of data". This presumes that the first row of the Excel data file contains variable names. [If the data resided in a different worksheet in the Excel file, this would need to be entered.]

6. Click on OK. The Excel data file will now appear in the SPSSWIN Data Editor.


7. The former Excel spreadsheet can now be saved as an SPSS file (FILE ⇒ SAVE AS) and is ready to be used in analyses. Typically, you would label variables and values and define missing values.

Importing an Access table: SPSSWIN does not offer a direct import for Access tables. Therefore, we must follow these steps:

1. Open the Access file.

2. Open the data table.

3. Save the data as an Excel file.

4. Follow the steps outlined above for importing data from an Excel spreadsheet into SPSSWIN.

Importing text files into SPSSWIN: Text data points typically are separated (or "delimited") by tabs or commas. Sometimes they can be in fixed format.


Running the FREQUENCIES procedure

1. Open the data file of interest (from the menus, click on FILE ⇒ OPEN ⇒ DATA).

2. From the menus, click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ FREQUENCIES.

3. The FREQUENCIES dialog box will appear. The left-hand box lists all the variables that have been defined in the data file (the source variable list). The first step is identifying the variable(s) for which you want to run a frequency analysis. Click on a variable name, then click the [ > ] pushbutton. The variable name will now appear in the VARIABLE[S] box (selected variable list). Repeat these steps for each variable of interest.

4. If all that is being requested is a frequency table showing counts and percentages (raw, adjusted, and cumulative), then click on OK.
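A rough Python analogue of the frequency table described in step 4 computes, for each value, the count, the percent of all cases, the valid (adjusted) percent excluding missing values, and the cumulative percent. This is an illustrative sketch, not SPSS output. The data and the function name `frequency_table` are hypothetical, and `None` stands for a missing value.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, count, percent, valid percent, cumulative percent)."""
    total = len(values)
    valid = [v for v in values if v is not None]   # exclude missing cases
    counts = Counter(valid)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        count = counts[value]
        percent = 100 * count / total              # based on all cases
        valid_pct = 100 * count / len(valid)       # based on non-missing cases
        cumulative += valid_pct
        rows.append((value, count, round(percent, 1),
                     round(valid_pct, 1), round(cumulative, 1)))
    return rows

# Hypothetical group codes for 10 subjects, one missing
groups = ["A", "B", "A", "C", "A", None, "B", "C", "A", "B"]
for row in frequency_table(groups):
    print(row)
```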


Requesting STATISTICS: Descriptive and summary STATISTICS can be requested for numeric variables. To request statistics:

1. From the FREQUENCIES dialog box, click on the STATISTICS pushbutton.

2. This will bring up the FREQUENCIES: STATISTICS dialog box.

3. The STATISTICS dialog box offers the user a variety of choices.

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ DESCRIPTIVES). The procedure offers many of the same statistics as the FREQUENCIES procedure, but without generating frequency tables.


Requesting CHARTS: One can request a chart (graph) to be created for a variable or variables included in a FREQUENCIES procedure.

1. In the FREQUENCIES dialog box, click on CHARTS.

2. The FREQUENCIES: CHARTS dialog box will appear. Choose the intended chart (e.g., bar chart, pie chart, histogram).

Pasting charts into Word:

1. Click on the chart.

2. Click on the pull-down menu EDIT ⇒ COPY OBJECTS.

3. Go to the Word document in which the chart is to be embedded. Click on EDIT ⇒ PASTE SPECIAL.

4. Select Formatted Text (RTF) and then click on OK.

5. Enlarge the graph to a desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible, click once on the graph).


BASIC STATISTICAL PROCEDURES: CROSSTABS

1. From the ANALYZE pull-down menu, click on DESCRIPTIVE STATISTICS ⇒ CROSSTABS.

2. The CROSSTABS dialog box will then open.

3. From the variable selection box on the left, click on a variable you wish to designate as the Row variable. The values (codes) for the Row variable make up the rows of the crosstabs table. Click on the arrow (>) button for Row(s). Next, click on a different variable you wish to designate as the Column variable. The values (codes) for the Column variable make up the columns of the crosstabs table. Click on the arrow (>) button for Column(s).

4. You can specify more than one variable in the Row(s) and/or Column(s). A cross table will be generated for each combination of Row and Column variables.
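The counting behind a crosstab can be sketched in a few lines. Each case contributes its (row value, column value) pair, and each cell holds the number of cases with that combination. The variables and data are hypothetical, and the helper name `crosstab` is illustrative only; percentages and chi-square tests, which SPSS can add, are left out.

```python
from collections import Counter

def crosstab(rows, cols):
    """Count cases for each (row value, column value) pair."""
    counts = Counter(zip(rows, cols))
    row_labels = sorted(set(rows))
    col_labels = sorted(set(cols))
    table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return table, row_labels, col_labels

# Hypothetical data: treatment group (row variable) by remission status (column variable)
group = ["A", "A", "B", "B", "B", "A", "C", "C"]
remission = ["yes", "no", "yes", "yes", "no", "yes", "no", "no"]

table, r_labels, c_labels = crosstab(group, remission)
print("group", *c_labels)
for label, counts in zip(r_labels, table):
    print(label, *counts)
```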

Limitations: SPSS users have less control over data manipulation and statistical output than users of other statistical packages such as SAS, Stata, etc.

SPSS is a good first statistical package for quantitative research in social science because it is easy to use and can be a good starting point for learning more advanced statistical packages.


Questions

Measurement bias

Bias arises due to measurement error.

Example:

– Suppose that in a study of asthma remission, the possible outcomes are complete remission, partial remission, and no remission. If we measure the outcome variable only as remission versus non-remission, we commit an error by putting partial remission in the non-remission group (type II error).

Designing the study

A study design is a careful advance plan of the data collection and the analytic approach needed to answer the research question under investigation in a scientific way.

The basic elements of a study design:

– Selecting an appropriate sample size for a specified level of power and level of significance

– Selecting methods of sampling, data collection, and analysis appropriate to the study's objectives

Clinical/Experimental versus Observational Design

The Lancet, 2002, Vol. 359


The choice of a design mainly depends on the research question(s) and the type of research conducted (experimental or observational).

Experimental/Interventional: The investigator controls the experimental environment in which the hypothesis is tested. The randomized double-blind clinical trial is the gold standard.


Non-experimental/Observational: The population is observed without any interference by the investigator.

For example, in a study of the effect of smoking, it is impossible for an investigator to assign smoking to the subjects. Instead, the investigator can study the effect by choosing a control group and examining the cause-and-effect relationship. Some examples are:

– Cross-sectional study

– Cohort study

– Case-control study

Randomized control design

Random allocation of subjects to different interventions (or treatments) for the purpose of comparing/determining the efficacy of the study treatment(s)

– E.g., a placebo or a standard medication (active control) can be used as a control

– Patients with cancer or a painful disease cannot receive a placebo as a control
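As a minimal sketch of random allocation, assume 60 subjects and three hypothetical arms (study treatment, active control, placebo). Shuffling a balanced list of arm labels gives every subject the same chance of each arm and keeps the group sizes equal. This is only one simple scheme; block and stratified randomization are common alternatives. The function name `randomize` and the seed are hypothetical.

```python
import random

def randomize(subject_ids, arms, seed=None):
    """Balanced random allocation: shuffle an equal-size list of arm labels."""
    if len(subject_ids) % len(arms) != 0:
        raise ValueError("number of subjects must be a multiple of the number of arms")
    labels = arms * (len(subject_ids) // len(arms))   # equal slots per arm
    rng = random.Random(seed)
    rng.shuffle(labels)                               # random order of arm labels
    return dict(zip(subject_ids, labels))

allocation = randomize(list(range(1, 61)), ["treatment", "active control", "placebo"], seed=2008)
print(sum(arm == "placebo" for arm in allocation.values()))   # → 20
```

In a real trial, the allocation list would be generated and concealed before enrolment, which also supports the blinding described next.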


Blinding: Reduces bias due to preconception or personal bias

– Open trial: Investigator and subject know the full details of the treatment

– Single-blind trial: Investigator knows about the treatment, but the subject does not

– Double-blind trial: Neither the investigator nor the subject knows about the treatment

– Triple-blind trial: Sponsor, investigator, and subject do not know about the treatment

Distribution of a variable

Distribution (of a variable): tells us what values the variable takes and how often it takes these values. E.g., the distribution of 26 pediatric patients aged 1 to 6 at AIDHC is as follows:

Age        1  2  3  4  5  6
Frequency  5  3  7  5  4  2
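The frequency table above already contains everything needed for relative and cumulative frequencies, the mean, and the median. This illustrative sketch uses the counts shown for the 26 patients. The median is found by expanding the table into one entry per patient.

```python
from statistics import median

ages = [1, 2, 3, 4, 5, 6]          # age values from the table
freq = [5, 3, 7, 5, 4, 2]          # number of patients at each age

n = sum(freq)                      # total number of patients
mean_age = sum(a * f for a, f in zip(ages, freq)) / n
expanded = [a for a, f in zip(ages, freq) for _ in range(f)]  # one entry per patient
median_age = median(expanded)

cumulative = 0
for age, f in zip(ages, freq):
    cumulative += f
    print(f"age {age}: frequency {f}, relative {f / n:.3f}, cumulative {cumulative}")
print(f"n = {n}, mean = {mean_age:.2f}, median = {median_age}")
```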

Statistics

Science of data collection, summarization, analysis, and interpretation

Descriptive versus Inferential Statistics

– Descriptive statistics: data description (summarization), such as center, variability, and shape

– Inferential statistics: drawing conclusions beyond the sample studied, allowing for prediction

A Taxonomy of Statistics

How does statistics help us?

[Figure: Age Distribution, a histogram of the number of subjects (axis 0 to 16) by age in months, with bins 40, 60, 80, 100, 120, 140, and More]

Ages (in months) of the 60 patients in our data set 1 are: 71, 127, 65, 82, 140, 53, 114, 56, 84, 65, 67, 134, 64, … 91, 51

Mean: 90.41666667
Standard Error: 3.902649518
Median: 84
Mode: 84
Standard Deviation: 30.22979318
Sample Variance: 913.8403955
Kurtosis: -1.183899591
Skewness: 0.389872725
Range: 95
Minimum: 48
Maximum: 143
Sum: 5425
Count: 60
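Several entries in this output are linked by simple identities: mean = sum / count, sample variance = SD², standard error = SD / √n, and range = maximum − minimum. These give a quick consistency check. The sketch below applies them to the reported values; it does not recompute anything from the raw ages, which are only partly listed.

```python
from math import isclose, sqrt

# Values reported in the descriptive-statistics output for the 60 ages (months)
x_bar, se, sd, var = 90.41666667, 3.902649518, 30.22979318, 913.8403955
minimum, maximum, value_range = 48, 143, 95
total, count = 5425, 60

checks = {
    "mean = sum / count": isclose(x_bar, total / count, rel_tol=1e-8),
    "variance = SD squared": isclose(var, sd ** 2, rel_tol=1e-8),
    "SE = SD / sqrt(n)": isclose(se, sd / sqrt(count), rel_tol=1e-8),
    "range = max - min": value_range == maximum - minimum,
}
for name, ok in checks.items():
    print(name, ok)
```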

By simply looking at the data, we fail to produce any informative account of them; statistics, however, gives quick insight into the data using graphical and numerical tools.

[Figure: Distribution of age, a plot of age (months) on an axis from 60 to 140]

Statistical Description of Data

Statistics describes a numeric set of data by its:

– Center (mean, median, mode, etc.)

– Variability (standard deviation, range, etc.)

– Shape (skewness, kurtosis, etc.)

Statistics describes a categorical set of data by:

– Frequency, percentage, or proportion of each category

Statistical Inference

Statistical inference is the process by which we acquire information about populations from samples.

Two types of estimates for making inferences:

– Point estimation

– Interval estimation

[Figure: inference drawn from the sample to the population]

Population and sample

Population: The entire collection of individuals or measurements about which information is desired

Sample: A subset of the population selected for study

– The primary objective is to create a subset of the population whose center, spread, and shape are as close as possible to those of the population

– Methods of sampling: random sampling, stratified sampling, systematic sampling, cluster sampling, multistage sampling, area sampling, quota sampling, etc.
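Three of the listed methods can be sketched with Python's `random` module:

- Simple random sampling draws without replacement from the whole population.
- Systematic sampling takes every k-th unit after a random start.
- Stratified sampling draws a simple random sample within each stratum.

The population of 60 IDs, the three strata, the sample sizes, and the seed are all hypothetical.

```python
import random

rng = random.Random(42)                      # fixed seed so the sketch is reproducible
population = list(range(1, 61))              # hypothetical IDs for 60 subjects

# Simple random sampling: 12 subjects without replacement
srs = rng.sample(population, 12)

# Systematic sampling: every k-th subject after a random start
k = len(population) // 12                    # sampling interval (5)
start = rng.randrange(k)
systematic = population[start::k]

# Stratified sampling: a random sample of 4 within each stratum
strata = {"group A": population[:20], "group B": population[20:40], "group C": population[40:]}
stratified = {name: rng.sample(members, 4) for name, members in strata.items()}

print(sorted(srs))
print(systematic)
print(stratified)
```

Because the strata here are equal in size, equal allocation per stratum is also proportional allocation; unequal strata would call for choosing one or the other explicitly.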

Parameter vs Statistics

Parameter

– Any statistical characteristic of a population

– Population mean, population median, and population standard deviation are examples of parameters

– A parameter describes the distribution of a population

– Parameters are fixed and usually unknown


Statistic: Any statistical characteristic of a sample

– Sample mean, sample median, and sample standard deviation are some examples of statistics

– A statistic describes the distribution of a sample

– The value of a statistic is known and varies from sample to sample

– Statistics are used for making inferences about parameters

Parameter vs StatisticsParameter vs Statistics

Statistical Issue To describe the distribution of a

population through census or making inference on

population distribution population parameter using sample

distribution statistic

Eg sample mean is an estimate of the population mean

Hypothesis TestingHypothesis Testing

Null hypothesis and Alternative hypothesis

Real Situation Ho is true Ho is false Reject Ho Type I

error (α) Correct Decision (1-)

D e c i s i o n

Accept Ho Correct Decision (1- α)

Type II Error ()

ElementsSteps in hypothesisElementsSteps in hypothesis

Hypothesis testing steps

ndash 1 Null (Ho) and alternative (H1)hypothesis specification

ndash 2 Selection of significance level (alpha) - 005 or 001

ndash 3 Calculating the test statistic ndasheg t F Chi-square

ndash 4 Calculating the probability value (p-value) or confidence

Interval

ndash 5 Describing the result and statistic in an understandable

way

Point estimator

Sample distribution

Parameter

Population distribution

bull A point estimate draws inference about a population by estimating the value of an unknown parameter using a single value or a point

Point Estimation

Interval estimatorSample distribution

bull An interval estimator draws inferences about a population by

estimating the value of an unknown parameter using an interval

Population distribution Parameter

Interval Estimation

P-Value versus the Confidence IntervalP-Value versus the Confidence Interval

Two main ways to assess study precision and the role of

chance in a study

ndash P value measures ( in probability) the evidence against

the null hypothesis

ndash An interval within which the value of the parameter lies

with a specified probability

ndash Eg 95 CI implies that if one repeats a study 100

times the true measure of association will lie inside the

CI in 95 out of 100 measures

Procedures for sample size Procedures for sample size calculationcalculation

Selection of primary variables of interest and formulation

of hypotheses

Information of standard deviation ( if numeric) or

proportion (if categorical)

A tolerance level of significance ()

Selection of reasonable test statistic

Power or Confidence level

A scientifically or clinically meaning effect difference

Brief concept of Statistical SoftwareBrief concept of Statistical Software

There are many software packages to perform statistical

analysis and visualization of data Some of them are-

ndash System for Statistical Analysis (SAS) S-plus R Matlab Minitab

BMDP STATA SPSS StatXact Statistica LISREL JMP

GLIM HIL MS Excel etc We will discuss MS Excel and SPSS in

brief

useful websites-

httpwwwR-projectorg (a free but powerful statistical software)

Microsoft ExcelMicrosoft Excel

A Spreadsheet Application It features calculation graphing tools pivot tables and a macro programming language called VBA (Visual Basic for Applications)

There are many versions of MS-Excel Excel XP Excel 2003 Excel 2007 are capable of performing a number of statistical analyses

Starting MS Excel Double click on the Microsoft Excel icon on the desktop or Click on Start --gt Programs --gt Microsoft Excel

Worksheet Consists of a multiple grid of cells with numbered rows down the page and alphabetically-tilted columns across the page Each cell is referenced by its coordinates For example A3 is used to refer to the cell in column A and row 3 B10B20 is used to refer to the range of cells in column B and rows 10 through 20

Microsoft ExcelMicrosoft Excel

Creating Formulas 1 Click the cell that you want to enter the formula 2 Type = (an equal sign) 3 Click the Function Button 4 Select the formula you want and step through the on-screen instructions

xf

Opening a document File Open (From a existing workbook) Change the directory area or drive to look for file in other locations

Creating a new workbook FileNewBlank Document

Saving a File FileSave

Selecting more than one cell Click on a cell eg A1) then hold the Shift key and click on another (eg D4) to select cells between and A1 and D4 or Click on a cell and drag the mouse across the desired range

Microsoft ExcelMicrosoft Excel

Entering Date and Time Dates are stored as MMDDYYYY No need to enter in that format For example Excel will recognize Jan 9 or jan-9 as 192007 and Jan 9 1999 as 191999 To enter todayrsquos date press Ctrl and together Use a or p to indicate am or pm For example 830 p is interpreted as 830 pm To enter current time press Ctrl and together

Copy and Paste all cells in a Sheet Ctrl+A for selecting Ctrl +C for copying and Ctrl+V for Pasting

Sorting Data Sort Sort By hellip

Descriptive Statistics and other Statistical methods ToolsData Analysis Statistical method If Data Analysis is not available then click on Tools Add-Ins and then select Analysis ToolPack and Analysis toolPack-Vba

Microsoft ExcelMicrosoft Excel

Statistical and Mathematical Function Start with lsquo=lsquo sign and then select function from function wizard xf

Inserting a Chart Click on Chart Wizard (or InsertChart) select chart give Input data range Update the Chart options and Select output range Worksheet

Importing Data in Excel File open FileType Click on File Choose Option ( DelimitedFixed Width) Choose Options (Tab Semicolon Comma Space Other) Finish

Limitations Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extremecases

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

A general purpose statistical package SPSS is widely used in the social sciences particularly in sociology and psychology

SPSS can import data from almost any type of file to generate tabulated reports plots of distributions and trends descriptive statistics and complex statistical analyzes

Starting SPSS Double Click on SPSS on desktop or ProgramSPSS

Opening a SPSS file FileOpen

bull Data Editor

Various pull-down menus appear at the top of the Data Editor window These pull-down menus are at the heart of using SPSSWIN The Data Editor menu items (with some of the uses of the menu) are

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

FILE used to open and save data files

EDIT used to copy and paste data values used to find data in a file insert variables and cases OPTIONS allows the user to set general preferences as well as the setup for the Navigator Charts etc

VIEW user can change toolbars value labels can be seen in cells instead of data values

DATA select sort or weight cases merge files

MENUS AND TOOLBARS

TRANSFORM Compute new variables recode variables etc

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

ANALYZE perform various statistical procedures

GRAPHS create bar and pie charts etc

UTILITIES add comments to accompany data file (and other advanced features)

ADD-ons these are features not currently installed (advanced statistical procedures)

WINDOW switch between data syntax and navigator windows

HELP to access SPSSWIN Help information

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Navigator (Output) Menus

When statistical procedures are run or charts are created the output will appear in the Navigator window The Navigator window contains many of the pull-down menus found in the Data Editor window Some of the important menus in the Navigator window include

INSERT used to insert page breaks titles charts etc

FORMAT for changing the alignment of a particular portion of the output

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Formatting Toolbar

When a table has been created by a statistical procedure the user can edit the table to create a desired look or adddelete information Beginning with version 140 the user has a choice of editing the table in the Output or opening it in a separate Pivot Table (DEFINE) window Various pulldown menus are activated when the user double clicks on the table These include

EDIT undo and redo a pivot select a table or table body (eg to change the font)

INSERT used to insert titles captions and footnotes

PIVOT used to perform a pivot of the row and column variables

FORMAT various modifications can be made to tables and cells

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing tab-delimited dataIn SPSSWIN click on FILE OPEN DATA Look in the appropriate location for rArr rArrthe text file Then select ldquoTextrdquo from ldquoFiles of typerdquo Click on the file name and then click on ldquoOpenrdquo You will see the Text Import Wizard ndash step 1 of 6 dialog box

You will now have an SPSS data file containing the former tab-delimited data You simply need to add variable and value labels and define missing values

Exporting Data to Excelclick on FILE SAVE AS Click on the File Name for the file to be exported For rArrthe ldquoSave as Typerdquo select from the pull-down menu Excel (xls) You will notice the checkbox for ldquowrite variable names to spreadsheetrdquo Leave this checked as you will want the variable names to be in the first row of each column in the Excel spreadsheet Finally click on Save

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Additional menusCHART EDITOR used to edit a graph

SYNTAX EDITOR used to edit the text in a syntax window

bull Show or hide a toolbar

Click on VIEW TOOLBARS 1048635to show it to hide itrArr rArr

bull Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location

bull Customize a toolbar

Click on VIEW TOOLBARS CUSTOMIZErArr rArr

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheetData from an Excel spreadsheet can be imported into SPSSWIN as follows1 In SPSSWIN click on FILE OPEN DATA The OPEN DATA FILE Dialog rArr rArrBox will appear2 Locate the file of interest Use the Look In pull-down list to identify the folder containing the Excel file of interest3 From the FILE TYPE pull down menu select EXCEL (xls)

4 Click on the file name of interest and click on OPEN or simply double-click on the file name

5 Keep the box checked that reads Read variable names from the first row of data This presumes that the first row of the Excel data file contains variable names in the first row [If the data resided in a different worksheet in the Excel file this would need to be entered]

6 Click on OK The Excel data file will now appear in the SPSSWIN Data Editor

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheet

7 The former EXCEL spreadsheet can now be saved as an SPSS file (FILE rArrSAVE AS) and is ready to be used in analyses Typically you would label variable and values and define missing values

Importing an Access tableSPSSWIN does not offer a direct import for Access tables Therefore we must follow these steps1 Open the Access file2 Open the data table3 Save the data as an Excel file4 Follow the steps outlined in the data import from Excel Spreadsheet to SPSSWIN

Importing Text Files into SPSSWINText data points typically are separated (or ldquodelimitedrdquo) by tabs or commas Sometimes they can be of fixed format

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Running the FREQUENCIES procedure

1 Open the data file (from the menus click on FILE OPEN DATA) of rArr rArrinterest

2 From the menus click on ANALYZE DESCRIPTIVE STATISTICS rArr rArrFREQUENCIES

3 The FREQUENCIES Dialog Box will appear In the left-hand box will be a listing (source variable list) of all the variables that have been defined in the data file The first step is identifying the variable(s) for which you want to run a frequency analysis Click on a variable name(s) Then click the [ gt ] pushbutton The variable name(s) will now appear in the VARIABLE[S] box (selected variable list) Repeat these steps for each variable of interest

4 If all that is being requested is a frequency table showing count percentages (raw adjusted and cumulative) then click on OK

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Requesting STATISTICSDescriptive and summary STATISTICS can be requested for numeric variables To request Statistics1 From the FREQUENCIES Dialog Box click on the STATISTICS pushbutton2 This will bring up the FREQUENCIES STATISTICS Dialog Box3 The STATISTICS Dialog Box offers the user a variety of choices

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE DESCRIPTIVE STATISTICS DESCRIPTIVES) The rArr rArrprocedure offers many of the same statistics as the FREQUENCIES procedure but without generating frequency analysis tables


Requesting CHARTS
One can request a chart (graph) for a variable or variables included in a FREQUENCIES procedure:
1. In the FREQUENCIES dialog box, click on CHARTS.
2. The FREQUENCIES CHARTS dialog box will appear. Choose the intended chart (e.g., bar chart, pie chart, histogram).

Pasting charts into Word
1. Click on the chart.
2. Click on the pull-down menu EDIT ⇒ COPY OBJECTS.
3. Go to the Word document in which the chart is to be embedded. Click on EDIT ⇒ PASTE SPECIAL.
4. Select Formatted Text (RTF) and then click on OK.
5. Enlarge the graph to the desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible, click once on the graph).


BASIC STATISTICAL PROCEDURES: CROSSTABS

1. From the ANALYZE pull-down menu, click on DESCRIPTIVE STATISTICS ⇒ CROSSTABS.

2. The CROSSTABS dialog box will then open.

3. From the variable selection box on the left, click on a variable you wish to designate as the Row variable. The values (codes) of the Row variable make up the rows of the crosstabs table. Click on the arrow (>) button for Row(s). Next, click on a different variable you wish to designate as the Column variable. The values (codes) of the Column variable make up the columns of the crosstabs table. Click on the arrow (>) button for Column(s).

4. You can specify more than one variable in the Row(s) and/or Column(s) boxes. A cross table will be generated for each combination of Row and Column variables.
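A crosstabs table is simply a count of every combination of Row and Column values. This minimal Python sketch builds such a table with `collections.Counter`, using hypothetical group and outcome data. It does not compute the tests or percentages that CROSSTABS can also produce.

```python
from collections import Counter

def crosstab(rows_var, cols_var):
    """Count every (row, column) combination, as in a two-way contingency table."""
    cells = Counter(zip(rows_var, cols_var))
    row_levels = sorted(set(rows_var))
    col_levels = sorted(set(cols_var))
    table = {r: {c: cells[(r, c)] for c in col_levels} for r in row_levels}
    return table, row_levels, col_levels

# Hypothetical data: treatment group (row) by outcome (column) for 8 subjects.
group = ["A", "A", "B", "B", "A", "B", "A", "B"]
outcome = ["yes", "no", "yes", "yes", "yes", "no", "no", "yes"]
table, row_levels, col_levels = crosstab(group, outcome)
print("group", *col_levels, "total", sep="\t")
for r in row_levels:
    counts = [table[r][c] for c in col_levels]
    print(r, *counts, sum(counts), sep="\t")
```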

Limitations: SPSS users have less control over data manipulation and statistical output than users of other statistical packages such as SAS, Stata, etc.

SPSS is a good first statistical package for quantitative research in the social sciences. It is easy to use and can be a good starting point for learning more advanced statistical packages.


Questions

Designing the study

A study design is a careful advance plan of data collection and of the analytic approach needed to answer the research question under investigation in a scientific way.

The basic elements of a study design:

- Selecting an appropriate sample size for a specified level of power and level of significance

- Selecting methods of sampling, data collection and analysis appropriate to the study's objectives

Clinical/Experimental versus Observational design

Source: The Lancet 2002, Vol. 359

Clinical/Experimental vs Observational design

The choice of a design mainly depends on the research question(s) and the type of research conducted (experimental or observational).

Experimental (interventional): The investigator controls the experimental environment in which the hypothesis is tested. The randomized double-blind clinical trial is the gold standard.

Non-experimental (observational): The population is observed without any interference by the investigator. For example, in a study of the effect of smoking, it is impossible for an investigator to assign smoking to the subjects. Instead, the investigator can study the effect by choosing a comparison (control) group and examining the relationship between cause and effect. Some examples are:

- Cross-sectional study

- Cohort study

- Case-control study

Randomized control design

Random allocation of subjects to different interventions (or treatments) in order to compare or determine the efficacy of the study treatment(s).

- E.g., a placebo or a standard medication (active control) can be used as a control.

- Patients with cancer or a painful disease cannot ethically be given a placebo as a control.
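Random allocation can be sketched in a few lines of Python. This example assumes equal allocation to two hypothetical arms. It shuffles a balanced list of labels with a seeded generator, so the allocation is reproducible. Real trials often use more elaborate schemes, such as block or stratified randomization.

```python
import random

def allocate(subject_ids, arms=("treatment", "control"), seed=None):
    """Randomly assign subjects to arms in (as near as possible) equal numbers."""
    rng = random.Random(seed)
    n = len(subject_ids)
    # Build a list with each arm repeated equally often, then shuffle it.
    labels = [arms[i % len(arms)] for i in range(n)]
    rng.shuffle(labels)
    return dict(zip(subject_ids, labels))

subjects = [f"S{i:02d}" for i in range(1, 21)]  # 20 hypothetical subject IDs
assignment = allocate(subjects, seed=2008)
for arm in ("treatment", "control"):
    print(arm, sum(1 for a in assignment.values() if a == arm))
```

Recording the seed makes the allocation list reproducible for auditing. In a blinded trial, the list itself would be kept from the investigator.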

Blinding reduces bias due to preconceptions or personal bias.

- Open trial: The investigator and subject know the full details of the treatment.

- Single-blind trial: The investigator knows about the treatment, but the subject does not.

- Double-blind trial: Neither the investigator nor the subject knows about the treatment.

- Triple-blind trial: The sponsor, investigator and subject do not know about the treatment.

Distribution of a variable

The distribution of a variable tells us what values the variable takes and how often it takes these values. E.g., the age distribution of 26 pediatric patients aged 1 to 6 at AIDHC is as follows:

Age          1   2   3   4   5   6
Frequency    5   3   7   5   4   2
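As a worked example, the frequency table above is enough to compute relative frequencies and summary values. This Python sketch expands the table back into 26 individual ages.

- The mean is (1×5 + 2×3 + 3×7 + 4×5 + 5×4 + 6×2) / 26 = 84/26 ≈ 3.23.
- The median is 3, because the 13th and 14th ordered values are both 3.

```python
from statistics import mean, median

# Age distribution from the slide: value -> frequency (26 patients in total).
age_freq = {1: 5, 2: 3, 3: 7, 4: 5, 5: 4, 6: 2}

n = sum(age_freq.values())
# Relative frequency (proportion) of each age.
for age, f in age_freq.items():
    print(f"age {age}: {f} ({f / n:.1%})")

# Expanding the table back to individual values lets us summarize it.
ages = [age for age, f in age_freq.items() for _ in range(f)]
print("n =", n, "mean =", round(mean(ages), 2), "median =", median(ages))
```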

Statistics

Statistics is the science of data collection, summarization, analysis and interpretation.

Descriptive versus Inferential Statistics

- Descriptive statistics: data description (summarization), such as center, variability and shape

- Inferential statistics: drawing conclusions beyond the sample studied, allowing for prediction

A Taxonomy of Statistics

How does statistics help us?

[Figure: Age Distribution, histogram of the number of subjects by age in months (bins 40 to 140, and More)]

The ages (in months) of the 60 patients in our data set 1 are: 71, 127, 65, 82, 140, 53, 114, 56, 84, 65, 67, 134, 64, … 91, 51

Mean                  90.41666667
Standard Error        3.902649518
Median                84
Mode                  84
Standard Deviation    30.22979318
Sample Variance       913.8403955
Kurtosis              -1.183899591
Skewness              0.389872725
Range                 95
Minimum               48
Maximum               143
Sum                   5425
Count                 60

By simply looking at the raw data, we fail to produce any informative account of them. Statistics, however, gives quick insight into the data through graphical and numerical tools.

[Figure: Distribution of age, plotted on an Age (months) axis from 60 to 140]
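The summary table above can be reproduced with a short Python function. This sketch assumes the output follows Excel's conventions:

- Standard deviation and variance use n - 1.
- Standard Error is SD/√n.
- Skewness and kurtosis use the bias-adjusted formulas of Excel's SKEW and KURT functions.

The slide lists only part of the 60 ages, so the example runs on the 13 ages shown. Its numbers will therefore differ from the table.

```python
import math
from statistics import mean, median, mode, stdev, variance

def describe(xs):
    """Summary statistics laid out like a Descriptive Statistics table."""
    n = len(xs)
    m, s = mean(xs), stdev(xs)                  # stdev and variance divide by n - 1
    z = [(x - m) / s for x in xs]               # standardized values
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "Mean": m,
        "Standard Error": s / math.sqrt(n),
        "Median": median(xs),
        "Mode": mode(xs),
        "Standard Deviation": s,
        "Sample Variance": variance(xs),
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(xs) - min(xs),
        "Minimum": min(xs),
        "Maximum": max(xs),
        "Sum": sum(xs),
        "Count": n,
    }

# The first 13 ages (months) listed on the slide, not the full 60-patient data set.
ages = [71, 127, 65, 82, 140, 53, 114, 56, 84, 65, 67, 134, 64]
for name, value in describe(ages).items():
    print(f"{name:20s}{value:10.3f}")
```

As a check on the table itself, 30.22979318 / √60 ≈ 3.902649518, which is the reported Standard Error. Likewise, 30.22979318² ≈ 913.84 is the reported Sample Variance.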

Statistical Description of Data

Statistics describes a numeric set of data by its:
- Center (mean, median, mode, etc.)
- Variability (standard deviation, range, etc.)
- Shape (skewness, kurtosis, etc.)

Statistics describes a categorical set of data by:
- The frequency, percentage or proportion of each category

Statistical Inference

Statistical inference is the process by which we acquire information about populations from samples.

Two types of estimates are used for making inferences:
- Point estimation
- Interval estimation

[Figure: Statistical inference, from a sample to the population]

Population and sample

Population: the entire collection of individuals or measurements about which information is desired.

Sample: a subset of the population selected for study.

- The primary objective is to create a subset of the population whose center, spread and shape are as close as possible to those of the population.

- Methods of sampling: random sampling, stratified sampling, systematic sampling, cluster sampling, multistage sampling, area sampling, quota sampling, etc.
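Three of the sampling methods listed above can be sketched in Python with the standard `random` module. The population of 100 numbered units and the odd/even strata are hypothetical. The systematic sample assumes a sampling interval of k = N // n.

```python
import random

def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Random start, then every k-th unit, with k = N // n."""
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Draw a simple random sample separately within each stratum."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {s: rng.sample(units, n_per_stratum) for s, units in sorted(strata.items())}

rng = random.Random(1)
population = list(range(1, 101))  # 100 hypothetical unit IDs
print(simple_random_sample(population, 5, rng))
print(systematic_sample(population, 5, rng))
print(stratified_sample(population, lambda u: "odd" if u % 2 else "even", 3, rng))
```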

Parameter vs Statistic

Parameter

- Any statistical characteristic of a population

- The population mean, population median and population standard deviation are examples of parameters

- A parameter describes the distribution of a population

- Parameters are fixed and usually unknown

Statistic: any statistical characteristic of a sample

- The sample mean, sample median and sample standard deviation are some examples of statistics

- A statistic describes the distribution of a sample

- The value of a statistic is known and varies from sample to sample

- Statistics are used for making inferences about parameters

Statistical issue: to describe the distribution of a population, either through a census or by making inferences about the population distribution (population parameters) from the sample distribution (sample statistics).

E.g., the sample mean is an estimate of the population mean.
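A small simulation illustrates the difference: the parameter is one fixed number, while the statistic changes from sample to sample. This Python sketch builds a hypothetical population of 10,000 ages. The ages are uniform between 48 and 143 months, a range chosen only to echo our data. Unlike in practice, the parameter can therefore actually be computed.

```python
import random
from statistics import mean

rng = random.Random(42)
# Hypothetical population of 10,000 ages (months); its mean is the parameter.
population = [rng.randint(48, 143) for _ in range(10_000)]
mu = mean(population)  # fixed; computable here only because we built the population

# Each sample of 60 gives a different value of the statistic (the sample mean).
sample_means = [mean(rng.sample(population, 60)) for _ in range(5)]
print("parameter mu =", round(mu, 2))
print("statistics   =", [round(m, 2) for m in sample_means])
```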

Hypothesis Testing

Null hypothesis and alternative hypothesis

                          Real situation
Decision       H0 is true                 H0 is false
Reject H0      Type I error (α)           Correct decision (1 - β)
Accept H0      Correct decision (1 - α)   Type II error (β)
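The meaning of α can be checked by simulation. If H0 is true and we test at α = 0.05, about 5% of tests should (wrongly) reject it. This Python sketch assumes normally distributed data with a known σ and uses a two-sided z test. All the numeric settings are hypothetical.

```python
import random
from statistics import NormalDist, mean

def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for H0: mean = mu0 when sigma is known."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(2008)
alpha, mu0, sigma, n, reps = 0.05, 90.0, 30.0, 60, 2000
# Every sample is generated with H0 true, so each rejection is a Type I error.
rejections = sum(
    z_test_p_value([rng.gauss(mu0, sigma) for _ in range(n)], mu0, sigma) < alpha
    for _ in range(reps)
)
print("observed Type I error rate:", rejections / reps)  # close to alpha
```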

Elements/Steps in hypothesis testing

Hypothesis testing steps:

1. Specify the null (H0) and alternative (H1) hypotheses.
2. Select the significance level (alpha), e.g., 0.05 or 0.01.
3. Calculate the test statistic, e.g., t, F or chi-square.
4. Calculate the probability value (p-value) or the confidence interval.
5. Describe the result and the statistic in an understandable way.
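The five steps can be followed in code. This Python sketch uses the summary statistics from the age table (n = 60, mean 90.42, SD 30.23) and a hypothetical null value of 100 months. For simplicity it uses a large-sample z test. With an estimated SD, a one-sample t test would be the usual choice.

```python
from statistics import NormalDist

# Summary statistics taken from the age table above (60 patients).
n, sample_mean, sample_sd = 60, 90.41666667, 30.22979318

# Step 1: H0: mu = 100 months vs H1: mu != 100 (100 is a hypothetical value).
mu0 = 100.0
# Step 2: significance level.
alpha = 0.05
# Step 3: test statistic (large-sample z approximation).
se = sample_sd / n ** 0.5
z = (sample_mean - mu0) / se
# Step 4: two-sided p-value and the matching 95% confidence interval.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
ci = (sample_mean - z_crit * se, sample_mean + z_crit * se)
# Step 5: state the result in plain language.
reject = p_value < alpha
print(f"z = {z:.2f}, p = {p_value:.3f}, 95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
print("Reject H0" if reject else "Fail to reject H0")
```

With these inputs, z ≈ -2.46 and p ≈ 0.014, so H0 would be rejected at α = 0.05. Consistently, the 95% CI (about 82.8 to 98.1 months) does not contain 100.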

Point Estimation

• A point estimate draws inference about a population by estimating the value of an unknown parameter using a single value, or point.

[Figure: a point estimator computed from the sample distribution estimates a parameter of the population distribution]

Interval Estimation

• An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.

[Figure: an interval estimator computed from the sample distribution brackets a parameter of the population distribution]

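P-Value versus the Confidence Interval

These are the two main ways to assess study precision and the role of chance in a study:

- P-value: the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed. It measures the evidence against the null hypothesis: the smaller the p-value, the stronger the evidence.

- Confidence interval (CI): an interval, computed from the sample, that is constructed to contain the true value of the parameter with a specified probability over repeated sampling.

- E.g., a 95% CI implies the following: if one repeated a study 100 times and computed a CI each time, about 95 of the 100 intervals would contain the true measure of association.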
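The repeated-study interpretation of a 95% CI can be checked by simulation. This Python sketch draws 1,000 hypothetical studies of 60 subjects each from a normal population whose mean and SD are chosen for illustration. It computes a 95% z interval for each study, treating σ as known, and counts how many intervals contain the true mean.

```python
import random
from statistics import NormalDist, mean

rng = random.Random(7)
mu, sigma, n, studies = 90.0, 30.0, 60, 1000   # hypothetical "true" population
z = NormalDist().inv_cdf(0.975)                 # about 1.96 for a 95% CI
half_width = z * sigma / n ** 0.5               # sigma treated as known

covered = 0
for _ in range(studies):
    xbar = mean(rng.gauss(mu, sigma) for _ in range(n))
    if xbar - half_width <= mu <= xbar + half_width:
        covered += 1
print(f"{covered} of {studies} intervals contain the true mean")  # roughly 950
```

Any single interval either contains the true mean or does not. The 95% describes the long-run behaviour of the procedure.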

Procedures for sample size calculation

- Selection of the primary variables of interest and formulation of hypotheses
- Information on the standard deviation (if numeric) or proportion (if categorical)
- A tolerance level of significance (α)
- Selection of a reasonable test statistic
- Power or confidence level
- A scientifically or clinically meaningful effect (difference)
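Consider the common case of comparing two means with equal group sizes. These ingredients then combine in the normal-approximation formula n = 2 (z_(1-α/2) + z_(1-β))² σ² / δ² per group. Here δ is the meaningful difference, and the power is 1 - β. The Python sketch below implements that formula, which is one standard approach among several. The SD of 10 and the difference of 5 are hypothetical inputs.

```python
import math
from statistics import NormalDist

def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Sample size per group to compare two means (two-sided test, equal groups).

    n = 2 * (z_(1-alpha/2) + z_power)^2 * sigma^2 / delta^2, rounded up.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)

# Hypothetical inputs: SD of 10 units, clinically meaningful difference of 5 units.
print(n_per_group(sigma=10, delta=5))              # 63 per group
print(n_per_group(sigma=10, delta=5, power=0.90))  # 85 per group
```

With α = 0.05 and 80% power, this gives 63 subjects per group. Raising the power to 90% increases the requirement to 85 per group.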

Brief concept of Statistical Software

There are many software packages for statistical analysis and visualization of data. Some of them are:

- Statistical Analysis System (SAS), S-plus, R, Matlab, Minitab, BMDP, STATA, SPSS, StatXact, Statistica, LISREL, JMP, GLIM, HIL, MS Excel, etc. We will discuss MS Excel and SPSS briefly.

Useful websites:

http://www.R-project.org (free but powerful statistical software)

Microsoft Excel

Excel is a spreadsheet application. It features calculation, graphing tools, pivot tables and a macro programming language called VBA (Visual Basic for Applications).

There are many versions of MS Excel. Excel XP, Excel 2003 and Excel 2007 are capable of performing a number of statistical analyses.

Starting MS Excel: Double-click on the Microsoft Excel icon on the desktop, or click on Start ⇒ Programs ⇒ Microsoft Excel.

Worksheet: A worksheet consists of a grid of cells, with numbered rows down the page and alphabetically titled columns across the page. Each cell is referenced by its coordinates. For example, A3 refers to the cell in column A and row 3, and B10:B20 refers to the range of cells in column B, rows 10 through 20.


Creating Formulas:
1. Click the cell in which you want to enter the formula.
2. Type = (an equal sign).
3. Click the Function button (fx).
4. Select the formula you want and step through the on-screen instructions.

Opening a document: File ⇒ Open (for an existing workbook). Change the directory, area or drive to look for files in other locations.

Creating a new workbook: File ⇒ New ⇒ Blank Document

Saving a file: File ⇒ Save

Selecting more than one cell: Click on a cell (e.g., A1), then hold the Shift key and click on another cell (e.g., D4) to select the cells between A1 and D4. Alternatively, click on a cell and drag the mouse across the desired range.


Entering Date and Time: Dates are displayed as MM/DD/YYYY by default; internally, Excel stores them as serial numbers. There is no need to enter dates in that format. For example, Excel will recognize Jan 9 or jan-9 as 1/9/2007, and Jan 9 1999 as 1/9/1999. To enter today's date, press Ctrl and ; together. Use a or p to indicate AM or PM; for example, 8:30 p is interpreted as 8:30 PM. To enter the current time, press Ctrl, Shift and ; together.
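Behind the display format, Excel stores each date as a serial number of days. Below is a Python sketch of the conversion. It assumes the default 1900 date system used by Windows versions of Excel, and dates on or after 1 March 1900. The restriction is needed because Excel treats 1900 as a leap year, which shifts earlier serials by one day.

```python
from datetime import date, timedelta

# In the 1900 date system, serial 1 is 1 Jan 1900, and a nonexistent 29 Feb 1900 is
# counted. Counting from 30 Dec 1899 therefore matches Excel from 1 Mar 1900 onward.
EXCEL_EPOCH = date(1899, 12, 30)

def excel_serial_to_date(serial):
    """Convert an Excel serial day number to a calendar date."""
    return EXCEL_EPOCH + timedelta(days=serial)

def date_to_excel_serial(d):
    """Convert a calendar date to an Excel serial day number."""
    return (d - EXCEL_EPOCH).days

print(date_to_excel_serial(date(2007, 1, 9)))   # 39091
print(date_to_excel_serial(date(1999, 1, 9)))   # 36169
print(excel_serial_to_date(39091))              # 2007-01-09
```

Storing dates as day counts is what lets Excel subtract one date from another to get an age or an interval in days.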

Copy and paste all cells in a sheet: Ctrl+A for selecting, Ctrl+C for copying and Ctrl+V for pasting.

Sorting data: Data ⇒ Sort ⇒ Sort By …

Descriptive statistics and other statistical methods: Tools ⇒ Data Analysis ⇒ (statistical method). If Data Analysis is not available, click on Tools ⇒ Add-Ins and then select Analysis ToolPak and Analysis ToolPak - VBA.


Statistical and mathematical functions: Start with the '=' sign and then select a function from the function wizard (fx).

Inserting a chart: Click on Chart Wizard (or Insert ⇒ Chart), select the chart, give the input data range, update the chart options and select the output location (worksheet).

Importing data into Excel:
1. File ⇒ Open.
2. Choose the file type and click on the file.
3. Choose an option (Delimited or Fixed Width).
4. Choose the delimiter (Tab, Semicolon, Comma, Space, Other).
5. Click Finish.

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.
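One classic way rounding errors creep into statistical calculations is the one-pass "calculator" formula for the variance. It subtracts two huge, nearly equal numbers. The Python sketch below compares it with the two-pass formula on data with a large mean and a small spread. This is our own illustration, not a claim about which algorithm any particular Excel version uses.

```python
from statistics import variance

def naive_variance(xs):
    """One-pass 'calculator' formula: (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    n = len(xs)
    s, ss = sum(xs), sum(x * x for x in xs)
    return (ss - s * s / n) / (n - 1)

def two_pass_variance(xs):
    """Subtract the mean first, then square: far less rounding error."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

# Large values with a small spread: deviations from the mean are -6, -3, 3, 6.
xs = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(naive_variance(xs))      # badly wrong (cancellation between huge numbers)
print(two_pass_variance(xs))   # 30.0
print(variance(xs))            # 30.0
```

The true sample variance is (36 + 9 + 9 + 36) / 3 = 30. The one-pass result is far from it because the squares near 10^18 cannot be stored exactly in double precision.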

Statistical Package for the Social Sciences (SPSS)

SPSS is a general-purpose statistical package that is widely used in the social sciences, particularly in sociology and psychology.

SPSS can import data from almost any type of file. It can generate tabulated reports, plots of distributions and trends, descriptive statistics and complex statistical analyses.

Starting SPSS: Double-click on SPSS on the desktop, or click on Programs ⇒ SPSS.

Opening an SPSS file: File ⇒ Open

MENUS AND TOOLBARS

• Data Editor

Various pull-down menus appear at the top of the Data Editor window. These pull-down menus are at the heart of using SPSSWIN. The Data Editor menu items (with some of their uses) are:

FILE: used to open and save data files

EDIT: used to copy and paste data values, find data in a file, and insert variables and cases; OPTIONS allows the user to set general preferences as well as the setup for the Navigator, Charts, etc.

VIEW: the user can change toolbars; value labels can be seen in cells instead of data values

DATA: select, sort or weight cases; merge files

TRANSFORM: compute new variables, recode variables, etc.

ANALYZE: perform various statistical procedures

GRAPHS: create bar and pie charts, etc.

UTILITIES: add comments to accompany the data file (and other advanced features)

ADD-ONS: features that are not currently installed (advanced statistical procedures)

WINDOW: switch between the data, syntax and Navigator windows

HELP: access SPSSWIN Help information

Navigator (Output) Menus

When statistical procedures are run or charts are created, the output will appear in the Navigator window. The Navigator window contains many of the pull-down menus found in the Data Editor window. Some of the important menus in the Navigator window include:

INSERT: used to insert page breaks, titles, charts, etc.

FORMAT: used to change the alignment of a particular portion of the output

• Formatting Toolbar

When a table has been created by a statistical procedure, the user can edit the table to create a desired look or add/delete information. Beginning with version 14.0, the user has a choice of editing the table in the Output window or opening it in a separate Pivot Table (DEFINE) window. Various pull-down menus are activated when the user double-clicks on the table. These include:

EDIT: undo and redo a pivot; select a table or table body (e.g., to change the font)

INSERT: used to insert titles, captions and footnotes

PIVOT: used to perform a pivot of the row and column variables

FORMAT: various modifications can be made to tables and cells

Importing tab-delimited data
In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. Look in the appropriate location for the text file. Then select "Text" from "Files of type". Click on the file name and then click on "Open". You will see the Text Import Wizard - Step 1 of 6 dialog box.

You will now have an SPSS data file containing the former tab-delimited data. You simply need to add variable and value labels and define missing values.

Exporting Data to Excel
Click on FILE ⇒ SAVE AS. Click on the File Name for the file to be exported. For "Save as Type", select Excel (*.xls) from the pull-down menu. You will notice the checkbox "Write variable names to spreadsheet". Leave this checked, as you will want the variable names to be in the first row of each column in the Excel spreadsheet. Finally, click on Save.

• Additional menus
CHART EDITOR: used to edit a graph
SYNTAX EDITOR: used to edit the text in a syntax window

• Show or hide a toolbar
Click on VIEW ⇒ TOOLBARS, then check a toolbar to show it or uncheck it to hide it

• Move a toolbar
Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location

• Customize a toolbar
Click on VIEW ⇒ TOOLBARS ⇒ CUSTOMIZE

Importing data from an EXCEL spreadsheet
Data from an Excel spreadsheet can be imported into SPSSWIN as follows:
1. In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. The OPEN DATA FILE dialog box will appear.
2. Locate the file of interest. Use the Look In pull-down list to identify the folder containing the Excel file of interest.
3. From the FILE TYPE pull-down menu, select EXCEL (*.xls).

4. Click on the file name of interest and click on OPEN, or simply double-click on the file name.

5. Keep the box checked that reads "Read variable names from the first row of data". This presumes that the first row of the Excel data file contains variable names. [If the data resided in a different worksheet in the Excel file, this would need to be entered.]

6. Click on OK. The Excel data file will now appear in the SPSSWIN Data Editor.


ClinicalExperimental versus Observational designClinicalExperimental versus Observational design

The Lancet 2002 Vol 359

ClinicalExperimental vs Observational ClinicalExperimental vs Observational designdesign

The choice of a design mainly depends on the research

question (s) and type of research conduct ( experimental

or observational)

Experimental Interventional The investigator controls

the experimental environment in which the hypothesis is

tested The randomized double-blind clinical trial is the

gold standard

ClinicalExperimental vs Observational ClinicalExperimental vs Observational designdesign

Non-experimentalObservational The population is

observed without any interference by the investigator

For example in a study to see the effect of smoking it is

impossible for an investigator to assign smoking to the subjects

Instead investigator can study the effect by choosing a control

group and find the cause and relation effect Some examples are-

ndash Cross-sectional study

ndash Cohort study

ndash Case-control study

Randomized control designRandomized control design

Random allocation of subjects to different interventions

(or treatments) for the purpose of comparingdetermining

the efficacy of the study treatment (s)

ndash Eg placebo or standard medication (active control) can

be used as a control

ndash Patients with cancer or painful disease can not receive

placebo as a control

Randomized control designRandomized control design

Blindness Reduces the bias due to the preconception or

personal bias ndash Open trial Investigator and subject know the full details of the

treatment

ndash Single-blind trial Investigator knows about the treatment but

subject does not

ndash Double-blind Both investigator and subject do not know about the

treatment

ndash Triple-blind Sponsor investigator and subject do not know about

the treatment

Distribution of a variableDistribution of a variable

Distribution - (of a variable) tells us what values the

variable takes and how often it takes these values Eg

distribution of some 26 pediatric patients of ages 1 to 6

at AIDHC are as follows-

Age 1 2 3 4 5 6

Frequency 5 3 7 5 4 2

StatisticsStatistics

Science of data collection summarization analysis

and interpretation

Descriptive versus Inferential Statistics

ndash Descriptive Statistic Data description

(summarization) such as center variability and

shape

ndash Inferential Statistic Drawing conclusion beyond the

sample studied allowing for prediction

A Taxonomy ofA Taxonomy of StatisticsStatistics

How does statistics help usHow does statistics help us

Age Distribution

0

2

4

6

8

10

12

14

16

40 60 80 100 120 140 More

Age in Month

Nu

mb

er o

f S

ub

ject

s

Ages (in month) of the 60 patients in our data set 1 are- 71 127 65 82 140 53 114 56 84 65 67 134 64 hellip 91 51

Mean 9041666667

Standard Error 3902649518

Median 84

Mode 84

Standard Deviation 3022979318

Sample Variance 9138403955

Kurtosis -1183899591

Skewness 0389872725

Range 95

Minimum 48

Maximum 143

Sum 5425

Count 60

By simply looking at the data we fail to produce any informative account to describe the data how ever statistics produce a quick insight in to data using graphical and numerical statistical tools

60

80

10

01

20

14

0

Distribution of age

Ag

e (

mo

nth

)

Statistical Description of DataStatistical Description of Data

Statistics describes a numeric set of data by its

Center (mean median mode etc)

Variability (standard deviation range etc)

Shape (skewness kurtosis etc)

Statistics describes a categorical set of data by

Frequency percentage or proportion of each

category

Statistical inference is the process by which we acquire information about populations from samples

Two types of estimates for making inferencesndash Point estimationndash Interval estimate

Statistical Inference

Statistical Inference

Sample Population

Population and samplePopulation and sample

Population The entire collection of individuals or measurements about which information is desired

Sample A subset of the population selected for study

ndash Primary objective is to create a subset of population whose center spread and shape are as close as that of population

ndash Methods of sampling Random sampling stratified sampling systematic sampling cluster sampling multistage sampling area sampling qoata sampling etc

Parameter vs StatisticsParameter vs Statistics

Parameter

ndash Any statistical characteristic of a population

ndash Population mean population median population

standard deviation are examples of parameters

ndash Parameter describes the distribution of a population

ndash Parameters are fixed and usually unknown

Parameter vs StatisticsParameter vs Statistics

Statistic Any statistical characteristic of a sample

ndash Sample mean sample median sample standard

deviation are some examples of statistics

ndash Statistic describes the distribution of population

ndash Value of a statistic is known and is varies for different

samples

ndash Are used for making inference on parameter

Parameter vs StatisticsParameter vs Statistics

Statistical Issue To describe the distribution of a

population through census or making inference on

population distribution population parameter using sample

distribution statistic

Eg sample mean is an estimate of the population mean

Hypothesis TestingHypothesis Testing

Null hypothesis and Alternative hypothesis

Real Situation Ho is true Ho is false Reject Ho Type I

error (α) Correct Decision (1-)

D e c i s i o n

Accept Ho Correct Decision (1- α)

Type II Error ()

ElementsSteps in hypothesisElementsSteps in hypothesis

Hypothesis testing steps

ndash 1 Null (Ho) and alternative (H1)hypothesis specification

ndash 2 Selection of significance level (alpha) - 005 or 001

ndash 3 Calculating the test statistic ndasheg t F Chi-square

ndash 4 Calculating the probability value (p-value) or confidence

Interval

ndash 5 Describing the result and statistic in an understandable

way

Point estimator

Sample distribution

Parameter

Population distribution

bull A point estimate draws inference about a population by estimating the value of an unknown parameter using a single value or a point

Point Estimation

Interval estimatorSample distribution

bull An interval estimator draws inferences about a population by

estimating the value of an unknown parameter using an interval

Population distribution Parameter

Interval Estimation

P-Value versus the Confidence IntervalP-Value versus the Confidence Interval

Two main ways to assess study precision and the role of

chance in a study

ndash P value measures ( in probability) the evidence against

the null hypothesis

ndash An interval within which the value of the parameter lies

with a specified probability

ndash Eg 95 CI implies that if one repeats a study 100

times the true measure of association will lie inside the

CI in 95 out of 100 measures

Procedures for sample size Procedures for sample size calculationcalculation

Selection of primary variables of interest and formulation

of hypotheses

Information of standard deviation ( if numeric) or

proportion (if categorical)

A tolerance level of significance ()

Selection of reasonable test statistic

Power or Confidence level

A scientifically or clinically meaning effect difference

Brief concept of Statistical SoftwareBrief concept of Statistical Software

There are many software packages to perform statistical

analysis and visualization of data Some of them are-

ndash System for Statistical Analysis (SAS) S-plus R Matlab Minitab

BMDP STATA SPSS StatXact Statistica LISREL JMP

GLIM HIL MS Excel etc We will discuss MS Excel and SPSS in

brief

useful websites-

httpwwwR-projectorg (a free but powerful statistical software)

Microsoft ExcelMicrosoft Excel

A Spreadsheet Application It features calculation graphing tools pivot tables and a macro programming language called VBA (Visual Basic for Applications)

There are many versions of MS-Excel Excel XP Excel 2003 Excel 2007 are capable of performing a number of statistical analyses

Starting MS Excel Double click on the Microsoft Excel icon on the desktop or Click on Start --gt Programs --gt Microsoft Excel

Worksheet Consists of a multiple grid of cells with numbered rows down the page and alphabetically-tilted columns across the page Each cell is referenced by its coordinates For example A3 is used to refer to the cell in column A and row 3 B10B20 is used to refer to the range of cells in column B and rows 10 through 20

Microsoft ExcelMicrosoft Excel

Creating Formulas 1 Click the cell that you want to enter the formula 2 Type = (an equal sign) 3 Click the Function Button 4 Select the formula you want and step through the on-screen instructions

xf

Opening a document File Open (From a existing workbook) Change the directory area or drive to look for file in other locations

Creating a new workbook FileNewBlank Document

Saving a File FileSave

Selecting more than one cell Click on a cell eg A1) then hold the Shift key and click on another (eg D4) to select cells between and A1 and D4 or Click on a cell and drag the mouse across the desired range

Entering date and time: Excel displays dates in MM/DD/YYYY format (internally it stores them as serial numbers), but there is no need to enter dates in that format. For example, Excel will recognize Jan 9 or jan-9 as 1/9/2007 (i.e., in the current year) and Jan 9 1999 as 1/9/1999. To enter today's date, press Ctrl and ; (semicolon) together. Use a or p to indicate am or pm; for example, 8:30 p is interpreted as 8:30 pm. To enter the current time, press Ctrl and : (Ctrl+Shift+;) together.
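Behind the display format, Excel for Windows (using its default 1900 date system) stores a date as a serial day count and a time as a fraction of a 24-hour day. The Python sketch below reproduces that arithmetic. It assumes the 1900 date system and dates on or after March 1, 1900, because earlier serials are shifted by Excel's deliberate treatment of 1900 as a leap year. The names `excel_serial` and `time_fraction` are illustrative only, not an Excel API.

```python
from datetime import date

# Epoch that makes serial numbers match Excel's 1900 date system
# for dates on or after 1900-03-01 (Excel counts a nonexistent 1900-02-29).
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial(d: date) -> int:
    """Serial day number Excel would store for date d (1900 date system, d >= 1900-03-01)."""
    if d < date(1900, 3, 1):
        raise ValueError("dates before 1900-03-01 are off by one in Excel's 1900 system")
    return (d - EXCEL_EPOCH).days


def time_fraction(hour: int, minute: int = 0, second: int = 0) -> float:
    """Fraction of a 24-hour day, which is how Excel stores a time of day."""
    return (hour * 3600 + minute * 60 + second) / 86400


print(excel_serial(date(2007, 1, 9)))  # 39091, shown as 1/9/2007
print(time_fraction(20, 30))           # 8:30 pm as a fraction of a day (about 0.854)
```

A date-and-time value is simply the sum of the two parts, which is why subtracting two dates in Excel gives a number of days.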

Copy and paste all cells in a sheet: Ctrl+A to select, Ctrl+C to copy, and Ctrl+V to paste.

Sorting data: Data ⇒ Sort ⇒ Sort By …

Descriptive statistics and other statistical methods: Tools ⇒ Data Analysis ⇒ (statistical method). If Data Analysis is not available, click on Tools ⇒ Add-Ins and then select Analysis ToolPak and Analysis ToolPak - VBA.

Statistical and mathematical functions: Start with the '=' sign and then select a function from the Function Wizard (fx).

Inserting a chart: Click on Chart Wizard (or Insert ⇒ Chart), select a chart type, give the input data range, update the chart options, and select the output range (worksheet).

Importing data into Excel: File ⇒ Open, select the file type, and click on the file. Choose an option (Delimited or Fixed Width), then choose the delimiter (Tab, Semicolon, Comma, Space, Other), and click Finish.

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.
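The kind of problem described here can be reproduced outside Excel. The sketch below is written in Python and does not reproduce Excel's internal code. It computes a sample variance two ways:

- with the one-pass "textbook" formula (Σx² − (Σx)²/n)/(n − 1), which subtracts two huge, nearly equal numbers;
- with Welford's updating algorithm.

The data are hypothetical values placed on a large offset (about one billion) so that double-precision rounding becomes visible.

```python
def naive_variance(xs: list[float]) -> float:
    """Sample variance via the one-pass 'sum of squares' formula (prone to cancellation)."""
    n = len(xs)
    total = 0.0
    total_sq = 0.0
    for x in xs:  # explicit loop: every addition is an ordinary rounded float add
        total += x
        total_sq += x * x
    return (total_sq - total * total / n) / (n - 1)


def welford_variance(xs: list[float]) -> float:
    """Sample variance via Welford's algorithm (numerically stable single pass)."""
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for k, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / k
        m2 += delta * (x - mean)
    return m2 / (len(xs) - 1)


# Hypothetical data: 4, 7, 13, 16 shifted by one billion; the exact sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(welford_variance(data))  # 30.0
print(naive_variance(data))    # a negative "variance" in IEEE double precision
```

Subtracting the mean before squaring, as Welford's method effectively does, keeps the intermediate numbers small and avoids the cancellation.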

Statistical Package for the Social Sciences (SPSS)

A general-purpose statistical package, SPSS is widely used in the social sciences, particularly in sociology and psychology.

SPSS can import data from almost any type of file to generate tabulated reports, plots of distributions and trends, descriptive statistics, and complex statistical analyses.

Starting SPSS: Double-click on the SPSS icon on the desktop, or click on Programs ⇒ SPSS.

Opening an SPSS file: File ⇒ Open

• Data Editor

Various pull-down menus appear at the top of the Data Editor window. These pull-down menus are at the heart of using SPSSWIN. The Data Editor menu items (with some of their uses) are:

MENUS AND TOOLBARS

FILE: used to open and save data files

EDIT: used to copy and paste data values, find data in a file, and insert variables and cases; OPTIONS allows the user to set general preferences as well as the setup for the Navigator, Charts, etc.

VIEW: the user can change toolbars; value labels can be seen in cells instead of data values

DATA: select, sort, or weight cases; merge files

TRANSFORM: compute new variables, recode variables, etc.

ANALYZE: perform various statistical procedures

GRAPHS: create bar charts, pie charts, etc.

UTILITIES: add comments to accompany the data file (and other advanced features)

ADD-ONS: features that are not currently installed (advanced statistical procedures)

WINDOW: switch between the Data, Syntax, and Navigator windows

HELP: access SPSSWIN Help information

Navigator (Output) Menus

When statistical procedures are run or charts are created, the output will appear in the Navigator window. The Navigator window contains many of the pull-down menus found in the Data Editor window. Some of the important menus in the Navigator window include:

INSERT: used to insert page breaks, titles, charts, etc.

FORMAT: used to change the alignment of a particular portion of the output

• Formatting Toolbar

When a table has been created by a statistical procedure, the user can edit the table to create a desired look or to add/delete information. Beginning with version 14.0, the user has a choice of editing the table in the Output window or opening it in a separate Pivot Table (DEFINE) window. Various pull-down menus are activated when the user double-clicks on the table. These include:

EDIT: undo and redo a pivot; select a table or table body (e.g., to change the font)

INSERT: used to insert titles, captions, and footnotes

PIVOT: used to pivot the row and column variables

FORMAT: various modifications can be made to tables and cells

Importing tab-delimited data: In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. Look in the appropriate location for the text file. Then select "Text" from "Files of type". Click on the file name and then click on "Open". You will see the Text Import Wizard (step 1 of 6) dialog box.

You will now have an SPSS data file containing the former tab-delimited data. You simply need to add variable and value labels and define missing values.

Exporting data to Excel: Click on FILE ⇒ SAVE AS. Click on the File Name for the file to be exported. For the "Save as Type", select Excel (*.xls) from the pull-down menu. You will notice the checkbox for "Write variable names to spreadsheet". Leave this checked, as you will want the variable names to be in the first row of each column in the Excel spreadsheet. Finally, click on Save.

• Additional menus

CHART EDITOR: used to edit a graph

SYNTAX EDITOR: used to edit the text in a syntax window

• Show or hide a toolbar

Click on VIEW ⇒ TOOLBARS, then check a toolbar's box to show it or uncheck it to hide it.

• Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location.

• Customize a toolbar

Click on VIEW ⇒ TOOLBARS ⇒ CUSTOMIZE.

Importing data from an EXCEL spreadsheet: Data from an Excel spreadsheet can be imported into SPSSWIN as follows:

1. In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. The OPEN DATA FILE dialog box will appear.

2. Locate the file of interest. Use the Look In pull-down list to identify the folder containing the Excel file of interest.

3. From the FILE TYPE pull-down menu, select EXCEL (*.xls).

4. Click on the file name of interest and click on OPEN, or simply double-click on the file name.

5. Keep the box checked that reads "Read variable names from the first row of data". This presumes that the first row of the Excel data file contains variable names. [If the data resided in a different worksheet in the Excel file, this would need to be entered.]

6. Click on OK. The Excel data file will now appear in the SPSSWIN Data Editor.

7. The former EXCEL spreadsheet can now be saved as an SPSS file (FILE ⇒ SAVE AS) and is ready to be used in analyses. Typically, you would label variables and values and define missing values.

Importing an Access table: SPSSWIN does not offer a direct import for Access tables. Therefore, we must follow these steps:

1. Open the Access file.

2. Open the data table.

3. Save the data as an Excel file.

4. Follow the steps outlined above for importing data from an Excel spreadsheet into SPSSWIN.

Importing text files into SPSSWIN: Text data points typically are separated (or "delimited") by tabs or commas. Sometimes they can be of fixed format.
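The two text layouts can be illustrated with Python's standard `csv` module for delimited text and with simple string slicing for fixed-format text. The sample records, the column positions, and the helper names `read_delimited` and `read_fixed_width` are hypothetical. In SPSS or Excel, the same choices (the delimiter, or the column boundaries) are made in the import wizard.

```python
import csv
import io

# Hypothetical delimited data; the first row holds variable names.
tab_text = "id\tage\tgroup\n1\t71\tA\n2\t127\tB\n"
comma_text = "id,age,group\n1,71,A\n2,127,B\n"


def read_delimited(text: str, delimiter: str) -> list[dict[str, str]]:
    """Parse delimited text whose first row contains variable names."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def read_fixed_width(text: str, fields: list[tuple[str, int, int]]) -> list[dict[str, str]]:
    """Parse fixed-format text; each field is (name, start, end), 0-based and end-exclusive."""
    records = []
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            records.append({name: line[start:end].strip() for name, start, end in fields})
    return records


# Hypothetical fixed-format layout: id in columns 1-3, age in columns 4-6, group in column 7.
fixed_text = "  1 71A\n  2127B\n"
layout = [("id", 0, 3), ("age", 3, 6), ("group", 6, 7)]

print(read_delimited(tab_text, "\t")[1]["age"])     # 127
print(read_delimited(comma_text, ",")[0]["group"])  # A
print(read_fixed_width(fixed_text, layout)[1])      # {'id': '2', 'age': '127', 'group': 'B'}
```

Delimited files need only the separator character. Fixed-format files need the exact column positions, which is why the import wizard asks for column breaks in that case.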

Running the FREQUENCIES procedure

1. Open the data file of interest (from the menus, click on FILE ⇒ OPEN ⇒ DATA).

2. From the menus, click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ FREQUENCIES.

3. The FREQUENCIES dialog box will appear. The left-hand box contains a listing (source variable list) of all the variables that have been defined in the data file. The first step is identifying the variable(s) for which you want to run a frequency analysis. Click on a variable name, then click the [ > ] pushbutton. The variable name will now appear in the VARIABLE(S) box (selected variable list). Repeat these steps for each variable of interest.

4. If all that is being requested is a frequency table showing counts and percentages (raw, adjusted, and cumulative), then click on OK.
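The quantities in a frequency table can be computed by hand, which helps when reading the output:

- Percent divides each count by all cases.
- Valid Percent (the "adjusted" percentage) divides by the non-missing cases only.
- Cumulative Percent accumulates the valid percentages.

The Python sketch below rebuilds such a table. It uses the 26 ages from the AIDHC distribution example in these notes, plus two missing values (`None`) added purely as a hypothetical illustration. The function name `frequency_table` is an assumption, not an SPSS API.

```python
from collections import Counter


def frequency_table(values: list) -> list[tuple]:
    """Rows of (value, count, percent, valid percent, cumulative percent).

    None marks a missing value: it counts toward Percent but not toward Valid Percent.
    """
    n_total = len(values)
    valid = [v for v in values if v is not None]
    n_valid = len(valid)
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        count = counts[value]
        percent = 100 * count / n_total        # raw percentage of all cases
        valid_percent = 100 * count / n_valid  # adjusted for missing values
        cumulative += valid_percent            # running total of valid percentages
        rows.append((value, count, percent, valid_percent, cumulative))
    return rows


# Ages 1-6 with frequencies 5, 3, 7, 5, 4, 2 (26 patients), plus 2 hypothetical missing values.
ages = [1] * 5 + [2] * 3 + [3] * 7 + [4] * 5 + [5] * 4 + [6] * 2 + [None, None]

print(f"{'Age':>4} {'Count':>6} {'Percent':>8} {'Valid %':>8} {'Cum. %':>8}")
for age, count, pct, valid_pct, cum_pct in frequency_table(ages):
    print(f"{age:>4} {count:>6} {pct:>8.1f} {valid_pct:>8.1f} {cum_pct:>8.1f}")
```

With no missing values, the Percent and Valid Percent columns coincide.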

Requesting STATISTICS: Descriptive and summary STATISTICS can be requested for numeric variables. To request statistics:

1. From the FREQUENCIES dialog box, click on the STATISTICS pushbutton.

2. This will bring up the FREQUENCIES STATISTICS dialog box.

3. The STATISTICS dialog box offers the user a variety of choices.

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ DESCRIPTIVES). The procedure offers many of the same statistics as the FREQUENCIES procedure, but without generating frequency tables.

Requesting CHARTS: One can request a chart (graph) to be created for a variable or variables included in a FREQUENCIES procedure.

1. In the FREQUENCIES dialog box, click on CHARTS.

2. The FREQUENCIES CHARTS dialog box will appear. Choose the intended chart (e.g., bar chart, pie chart, histogram).

Pasting charts into Word:

1. Click on the chart.

2. Click on the pull-down menu EDIT ⇒ COPY OBJECTS.

3. Go to the Word document in which the chart is to be embedded. Click on EDIT ⇒ PASTE SPECIAL.

4. Select Formatted Text (RTF) and then click on OK.

5. Enlarge the graph to a desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible, click once on the graph).

BASIC STATISTICAL PROCEDURES: CROSSTABS

1. From the ANALYZE pull-down menu, click on DESCRIPTIVE STATISTICS ⇒ CROSSTABS.

2. The CROSSTABS dialog box will then open.

3. From the variable selection box on the left, click on a variable you wish to designate as the Row variable. The values (codes) for the Row variable make up the rows of the crosstabs table. Click on the arrow (>) button for Row(s). Next, click on a different variable you wish to designate as the Column variable. The values (codes) for the Column variable make up the columns of the crosstabs table. Click on the arrow (>) button for Column(s).

4. You can specify more than one variable in the Row(s) and/or Column(s). A cross table will be generated for each combination of Row and Column variables.
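Conceptually, CROSSTABS counts how many cases fall into each combination of a row-variable value and a column-variable value. The Python sketch below builds such a two-way table from paired observations. The function name `crosstab`, the variables (`group`, `sex`), and the eight records are hypothetical.

```python
from collections import Counter


def crosstab(rows: list, cols: list) -> tuple[list, list, dict]:
    """Two-way frequency table: counts for every (row value, column value) pair."""
    if len(rows) != len(cols):
        raise ValueError("row and column variables must have the same number of cases")
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    table = {(r, c): counts.get((r, c), 0) for r in row_levels for c in col_levels}
    return row_levels, col_levels, table


# Hypothetical data for 8 cases: treatment group (row variable) by sex (column variable).
group = ["A", "A", "B", "C", "B", "A", "C", "B"]
sex = ["F", "M", "F", "F", "M", "F", "M", "F"]

row_levels, col_levels, table = crosstab(group, sex)
print("group " + " ".join(f"{c:>3}" for c in col_levels) + "  Total")
for r in row_levels:
    cells = [table[(r, c)] for c in col_levels]
    print(f"{r:<5} " + " ".join(f"{n:>3}" for n in cells) + f"  {sum(cells):>5}")
```

Specifying several Row or Column variables simply repeats this counting for each row/column pair.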

Limitations: SPSS users have less control over data manipulation and statistical output than users of other statistical packages such as SAS, Stata, etc.

SPSS is a good first statistical package for quantitative research in the social sciences because it is easy to use and can be a good starting point for learning more advanced statistical packages.

Questions

Clinical/Experimental vs Observational design

The choice of a design mainly depends on the research question(s) and the type of research conducted (experimental or observational).

Experimental/Interventional: The investigator controls the experimental environment in which the hypothesis is tested. The randomized double-blind clinical trial is the gold standard.

Non-experimental/Observational: The population is observed without any interference by the investigator. For example, in a study of the effect of smoking, it is impossible for an investigator to assign smoking to the subjects. Instead, the investigator can study the effect by choosing a control group and examining the relationship between cause and effect. Some examples are:

– Cross-sectional study

– Cohort study

– Case-control study

Randomized controlled design

Random allocation of subjects to different interventions (or treatments) for the purpose of comparing/determining the efficacy of the study treatment(s).

– E.g., a placebo or a standard medication (active control) can be used as a control.

– Patients with cancer or a painful disease cannot receive a placebo as a control.
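Random allocation is usually generated by computer. The sketch below assumes 1:1 allocation in permuted blocks, a common way to keep group sizes balanced as subjects enroll. The function name `block_randomize`, the block size, the arm labels, and the seed are illustrative choices. A real trial would use a pre-specified, concealed allocation schedule.

```python
import random
from typing import Optional, Sequence


def block_randomize(n_subjects: int, arms: Sequence[str] = ("Treatment", "Control"),
                    block_size: int = 4, seed: Optional[int] = None) -> list[str]:
    """Allocation list using permuted blocks: each block contains every arm equally often."""
    if block_size % len(arms) != 0:
        raise ValueError("block size must be a multiple of the number of arms")
    rng = random.Random(seed)  # seeded generator so the schedule can be reproduced
    schedule: list[str] = []
    while len(schedule) < n_subjects:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)  # random order within the block
        schedule.extend(block)
    return schedule[:n_subjects]


allocation = block_randomize(20, seed=2008)
for subject_id, arm in enumerate(allocation, start=1):
    print(subject_id, arm)
print(allocation.count("Treatment"), allocation.count("Control"))  # 10 10
```

Within each block the order is random, so the next assignment cannot be predicted with certainty, but the arms never drift far out of balance.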

Blinding: Reduces bias due to preconceptions or personal bias.

– Open trial: The investigator and the subject know the full details of the treatment.

– Single-blind trial: The investigator knows about the treatment, but the subject does not.

– Double-blind trial: Neither the investigator nor the subject knows about the treatment.

– Triple-blind trial: The sponsor, the investigator, and the subject do not know about the treatment.

Distribution of a variable

Distribution (of a variable): tells us what values the variable takes and how often it takes these values. E.g., the age distribution of 26 pediatric patients aged 1 to 6 at AIDHC is as follows:

Age 1 2 3 4 5 6

Frequency 5 3 7 5 4 2
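A frequency distribution like this one contains enough information to compute summary measures directly. The short Python sketch below treats the table as (value, frequency) pairs and computes the relative frequencies, the mean, the median, and the mode. It assumes the ages are whole numbers as shown.

```python
from statistics import median

ages = [1, 2, 3, 4, 5, 6]
freqs = [5, 3, 7, 5, 4, 2]  # frequencies from the table above

n = sum(freqs)  # 26 patients
relative = [f / n for f in freqs]  # proportion of patients at each age
mean_age = sum(a * f for a, f in zip(ages, freqs)) / n  # weighted mean: 84 / 26
expanded = [a for a, f in zip(ages, freqs) for _ in range(f)]  # one entry per patient
median_age = median(expanded)
mode_age = ages[freqs.index(max(freqs))]  # value with the highest frequency

print(n, round(mean_age, 2), median_age, mode_age)  # 26 3.23 3.0 3
```

The weighted mean multiplies each age by its frequency before dividing by the total count.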

Statistics

Science of data collection, summarization, analysis, and interpretation.

Descriptive versus Inferential Statistics

– Descriptive statistics: data description (summarization), such as center, variability, and shape

– Inferential statistics: drawing conclusions beyond the sample studied, allowing for prediction

A Taxonomy of Statistics

How does statistics help us?

[Figure: Age Distribution, histogram of the number of subjects by age in months (bins 40 to 140, plus "More")]

Ages (in months) of the 60 patients in our data set 1 are: 71, 127, 65, 82, 140, 53, 114, 56, 84, 65, 67, 134, 64, …, 91, 51

Mean                 90.41666667
Standard Error       3.902649518
Median               84
Mode                 84
Standard Deviation   30.22979318
Sample Variance      913.8403955
Kurtosis             -1.183899591
Skewness             0.389872725
Range                95
Minimum              48
Maximum              143
Sum                  5425
Count                60

By simply looking at the raw data, we fail to produce any informative description of it; statistics, however, provide quick insight into the data through graphical and numerical tools.
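The Excel summary above (produced with Data Analysis ⇒ Descriptive Statistics) can be reproduced in code. The Python sketch below computes the same measures. For skewness and kurtosis it uses the sample-adjusted formulas documented for Excel's SKEW and KURT functions, which we assume the Analysis ToolPak also uses. The helper name `describe` is hypothetical. Because the full list of 60 ages is not reproduced in these notes, the usage lines only check the internal consistency of the reported numbers (for example, Standard Error = Standard Deviation / √Count).

```python
import math
import statistics


def describe(xs: list[float]) -> dict[str, float]:
    """Summary measures similar to Excel's Descriptive Statistics output (sample formulas)."""
    n = len(xs)
    if n < 4:
        raise ValueError("the kurtosis formula needs at least 4 values")
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)  # sample standard deviation (n - 1 denominator)
    z = [(x - mean) / sd for x in xs]
    # Sample-adjusted skewness and excess kurtosis (formulas documented for Excel's SKEW and KURT)
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "Mean": mean,
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(xs),
        "Mode": statistics.mode(xs),  # Excel shows #N/A if no value repeats; Python returns the first value
        "Standard Deviation": sd,
        "Sample Variance": sd ** 2,
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(xs) - min(xs),
        "Minimum": min(xs),
        "Maximum": max(xs),
        "Sum": sum(xs),
        "Count": n,
    }


# Consistency checks on the reported summary of the 60 ages (figures copied from the output above)
count, total, sd = 60, 5425, 30.22979318
print(total / count)          # Mean = Sum / Count, about 90.4167
print(sd / math.sqrt(count))  # Standard Error = SD / sqrt(Count), about 3.9026
print(sd ** 2)                # Sample Variance = SD squared, about 913.84
print(143 - 48)               # Range = Maximum - Minimum = 95
```

A negative kurtosis such as -1.18 indicates a flatter distribution than the normal curve, and the small positive skewness points to a slightly longer right tail.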

[Figure: Distribution of age, box plot of age (months), vertical axis from 60 to 140]

Statistical Description of Data

Statistics describes a numeric set of data by its:

Center (mean, median, mode, etc.)

Variability (standard deviation, range, etc.)

Shape (skewness, kurtosis, etc.)

Statistics describes a categorical set of data by:

Frequency, percentage, or proportion of each category

Statistical Inference

Statistical inference is the process by which we acquire information about populations from samples.

Two types of estimates are used for making inferences:

– Point estimation

– Interval estimation

[Figure: statistical inference, from the sample to the population]

Population and sample

Population: The entire collection of individuals or measurements about which information is desired.

Sample: A subset of the population selected for study.

– The primary objective is to create a subset of the population whose center, spread, and shape are as close as possible to those of the population.

– Methods of sampling: random sampling, stratified sampling, systematic sampling, cluster sampling, multistage sampling, area sampling, quota sampling, etc.
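Two of the listed methods are easy to sketch with Python's `random` module:

- Simple random sampling gives every subset of the chosen size the same chance.
- Stratified sampling draws separately within each stratum; here the draws are proportional to stratum size.

The population (60 hypothetical subjects in 3 groups) and the function names are illustrative assumptions.

```python
import random
from collections import defaultdict


def simple_random_sample(population: list, n: int, rng: random.Random) -> list:
    """Every subset of size n is equally likely (sampling without replacement)."""
    return rng.sample(population, n)


def stratified_sample(population: list, stratum_of, fraction: float, rng: random.Random) -> list:
    """Proportional stratified sample: the same sampling fraction within each stratum."""
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for units in strata.values():
        k = round(fraction * len(units))  # this stratum's share of the sample
        sample.extend(rng.sample(units, k))
    return sample


# Hypothetical population: 60 subjects, 20 in each of groups 1, 2 and 3.
population = [(subject_id, subject_id % 3 + 1) for subject_id in range(1, 61)]
rng = random.Random(2008)

srs = simple_random_sample(population, 12, rng)
strat = stratified_sample(population, lambda unit: unit[1], 0.2, rng)
print(sorted(group for _, group in srs))    # group counts vary from draw to draw
print(sorted(group for _, group in strat))  # exactly 4 subjects from each group
```

Stratification guarantees that each group is represented in proportion to its size, which helps the sample mirror the population's center, spread, and shape.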

Parameter vs Statistics

Parameter

– Any statistical characteristic of a population

– Population mean, population median, and population standard deviation are examples of parameters

– A parameter describes the distribution of a population

– Parameters are fixed and usually unknown

Statistic: Any statistical characteristic of a sample

– Sample mean, sample median, and sample standard deviation are some examples of statistics

– A statistic describes the distribution of a sample

– The value of a statistic is known and varies from sample to sample

– Statistics are used for making inferences about parameters

Statistical issue: To describe the distribution of a population through a census, or to make inferences about the population distribution (population parameter) using the sample distribution (statistic).

E.g., the sample mean is an estimate of the population mean.

Hypothesis Testing

Null hypothesis and Alternative hypothesis

                             Real situation
Decision                     Ho is true                  Ho is false
Reject Ho                    Type I error (α)            Correct decision (1 - β)
Accept (do not reject) Ho    Correct decision (1 - α)    Type II error (β)
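The meaning of α and 1 − β in the table can be checked by simulation. If Ho is true and the test is run at α = 0.05, about 5% of repeated studies should (wrongly) reject Ho. If Ho is false, the rejection rate estimates the power. The Python sketch below assumes a two-sided one-sample z-test with known standard deviation. The helper names, the population values (loosely modeled on the age data), the sample size, and the number of simulated studies are arbitrary illustrative choices.

```python
import math
import random
from statistics import NormalDist, fmean


def z_test_rejects(sample: list[float], mu0: float, sigma: float, alpha: float) -> bool:
    """Two-sided one-sample z-test with known sigma; True means 'reject Ho: mu = mu0'."""
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    p_value = 2 * NormalDist().cdf(-abs(z))
    return p_value < alpha


def rejection_rate(true_mu: float, mu0: float = 90.0, sigma: float = 30.0, n: int = 60,
                   alpha: float = 0.05, n_studies: int = 10000, seed: int = 1) -> float:
    """Fraction of simulated studies (each a sample of size n) in which Ho is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_studies):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        if z_test_rejects(sample, mu0, sigma, alpha):
            rejections += 1
    return rejections / n_studies


print(rejection_rate(true_mu=90.0))   # Ho true: close to alpha = 0.05 (Type I error rate)
print(rejection_rate(true_mu=100.0))  # Ho false: estimated power 1 - beta (about 0.73 here)
```

Lowering α reduces Type I errors but, for the same sample size, also lowers power (raises β).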

Elements/Steps in Hypothesis Testing

Hypothesis testing steps:

– 1. Specification of the null (Ho) and alternative (H1) hypotheses

– 2. Selection of the significance level (alpha), e.g., 0.05 or 0.01

– 3. Calculating the test statistic, e.g., t, F, chi-square

– 4. Calculating the probability value (p-value) or confidence interval

– 5. Describing the result and statistic in an understandable way
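The five steps can be walked through in a few lines of Python. This sketch assumes a two-sided one-sample z-test (a large-sample normal approximation; a one-sample t-test would be the more usual choice) applied to the summary figures of the 60 ages reported earlier (mean 90.42 months, standard error 3.90 months). The null value of 100 months is a hypothetical choice made only for illustration.

```python
from statistics import NormalDist

# Step 1: hypotheses, with a hypothetical null value: Ho: mu = 100 months vs H1: mu != 100 months
mu0 = 100.0
# Step 2: significance level
alpha = 0.05
# Summary figures for the 60 ages, taken from the descriptive output earlier in these notes
mean, std_error = 90.41666667, 3.902649518

# Step 3: test statistic (large-sample z approximation)
z = (mean - mu0) / std_error
# Step 4: two-sided p-value
p_value = 2 * NormalDist().cdf(-abs(z))
# Step 5: describe the result in plain words
decision = "reject Ho" if p_value < alpha else "do not reject Ho"
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}: {decision} at alpha = {alpha}")
```

In words: under these assumptions, the observed mean age is about 2.5 standard errors below the hypothetical 100 months, which would be unlikely if Ho were true.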

Point Estimation

• A point estimate draws inference about a population by estimating the value of an unknown parameter using a single value, or point.

[Figure: point estimator (from the sample distribution) and parameter (of the population distribution)]

Interval Estimation

• An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.

[Figure: interval estimator (from the sample distribution) and parameter (of the population distribution)]
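As a worked example of an interval estimate, the sketch below computes an approximate 95% confidence interval for the mean age. It uses the point estimate (the sample mean) and the standard error from the descriptive output of the 60 ages. It relies on the normal critical value 1.96. With n = 60, a t critical value (about 2.00 for 59 degrees of freedom) would give a slightly wider interval. The helper name `normal_ci` is hypothetical.

```python
from statistics import NormalDist


def normal_ci(estimate: float, std_error: float, level: float = 0.95) -> tuple[float, float]:
    """Approximate confidence interval: estimate +/- z * SE (large-sample normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    return estimate - z * std_error, estimate + z * std_error


# Point estimate and standard error of the mean age (months) from the descriptive output
low, high = normal_ci(90.41666667, 3.902649518)
print(f"95% CI for mean age: {low:.1f} to {high:.1f} months")  # 82.8 to 98.1 months
```

A higher confidence level (e.g., 99%) uses a larger critical value and therefore gives a wider interval.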

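P-Value versus the Confidence Interval

Two main ways to assess study precision and the role of chance in a study:

– P-value: measures (as a probability) the evidence against the null hypothesis. It is the probability, computed assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed; the smaller the p-value, the stronger the evidence against Ho.

– Confidence interval (CI): an interval, computed from the sample, that is constructed to contain the true value of the parameter with a specified probability (the confidence level).

– E.g., a 95% CI implies that if one repeats a study 100 times, the true measure of association will lie inside the computed CI in about 95 of the 100 studies.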
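The repeated-study interpretation can be checked by simulation. The sketch below draws many hypothetical samples from a normal population with a known mean. For each sample it builds a 95% interval (mean ± 1.96·σ/√n, assuming σ is known) and counts how often the interval contains the true mean. The proportion should come out close to 0.95. The helper name `coverage` and all population values are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(true_mu: float = 90.0, sigma: float = 30.0, n: int = 60,
             level: float = 0.95, n_studies: int = 10000, seed: int = 7) -> float:
    """Fraction of simulated studies whose confidence interval contains true_mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / math.sqrt(n)  # known-sigma interval half-width
    hits = 0
    for _ in range(n_studies):
        sample_mean = fmean(rng.gauss(true_mu, sigma) for _ in range(n))
        if sample_mean - half_width <= true_mu <= sample_mean + half_width:
            hits += 1
    return hits / n_studies


print(coverage())  # close to 0.95: about 95 of every 100 intervals contain the true mean
```

Any single interval either contains the true value or does not. The 95% describes how often the procedure succeeds over repeated studies.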

ClinicalExperimental vs Observational ClinicalExperimental vs Observational designdesign

Non-experimentalObservational The population is

observed without any interference by the investigator

For example in a study to see the effect of smoking it is

impossible for an investigator to assign smoking to the subjects

Instead investigator can study the effect by choosing a control

group and find the cause and relation effect Some examples are-

ndash Cross-sectional study

ndash Cohort study

ndash Case-control study

Randomized control designRandomized control design

Random allocation of subjects to different interventions (or treatments) for the purpose of comparing/determining the efficacy of the study treatment(s).

– E.g., a placebo or a standard medication (active control) can be used as a control.

– Patients with cancer or a painful disease cannot (ethically) receive a placebo as a control.
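Random allocation can be sketched as a shuffle followed by assigning subjects to the arms in turn, which keeps group sizes balanced. This is a minimal sketch assuming two arms named "treatment" and "control" and hypothetical subject IDs; real trials typically use pre-generated allocation lists (often blocked or stratified), which this does not attempt.

```python
import random


def randomize(subject_ids, arms=("treatment", "control"), seed=None):
    """Randomly allocate subjects to arms with (near-)equal group sizes.

    Shuffles a copy of the IDs, then assigns them to the arms in turn,
    so group sizes differ by at most one.
    """
    rng = random.Random(seed)  # a seeded generator makes the allocation reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    return {sid: arms[i % len(arms)] for i, sid in enumerate(shuffled)}


if __name__ == "__main__":
    allocation = randomize(range(1, 11), seed=2008)  # hypothetical subjects 1..10
    for sid in sorted(allocation):
        print(sid, allocation[sid])
```

Recording the seed lets the allocation be reproduced and audited later, which matters when the allocation must stay concealed from investigators.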


Blinding: Reduces bias due to the preconceptions or personal biases of the investigator and the subject.

– Open trial: The investigator and the subject know the full details of the treatment.

– Single-blind trial: The investigator knows about the treatment but the subject does not.

– Double-blind trial: Neither the investigator nor the subject knows about the treatment.

– Triple-blind trial: Neither the sponsor, the investigator, nor the subject knows about the treatment.

Distribution of a Variable

The distribution of a variable tells us what values the variable takes and how often it takes these values. E.g., the age distribution of 26 pediatric patients aged 1 to 6 at AIDHC is as follows:

Age 1 2 3 4 5 6

Frequency 5 3 7 5 4 2
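From the frequency table above, relative and cumulative frequencies follow directly from the counts. A minimal Python sketch using the age counts shown; the `frequency_table` helper is hypothetical.

```python
def frequency_table(freq):
    """Return (value, frequency, relative %, cumulative %) rows for a dict value -> count."""
    total = sum(freq.values())
    rows, running = [], 0
    for value in sorted(freq):
        running += freq[value]
        rows.append((value, freq[value], 100 * freq[value] / total, 100 * running / total))
    return rows


if __name__ == "__main__":
    ages = {1: 5, 2: 3, 3: 7, 4: 5, 5: 4, 6: 2}  # frequencies from the table above
    print("Age  Freq  Rel%   Cum%")
    for age, f, rel, cum in frequency_table(ages):
        print(f"{age:>3} {f:>5} {rel:6.1f} {cum:6.1f}")
```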

Statistics

Statistics is the science of data collection, summarization, analysis, and interpretation.

Descriptive versus Inferential Statistics

– Descriptive statistics: Data description (summarization), such as center, variability, and shape.

– Inferential statistics: Drawing conclusions beyond the sample studied, allowing for prediction.

A Taxonomy of Statistics

How Does Statistics Help Us?

[Figure: Age Distribution, histogram with x-axis: age in months (40, 60, 80, 100, 120, 140, More) and y-axis: number of subjects]

Ages (in months) of the 60 patients in our data set 1 are: 71, 127, 65, 82, 140, 53, 114, 56, 84, 65, 67, 134, 64, …, 91, 51

Mean 90.41666667
Standard Error 3.902649518
Median 84
Mode 84
Standard Deviation 30.22979318
Sample Variance 913.8403955
Kurtosis -1.183899591
Skewness 0.389872725
Range 95
Minimum 48
Maximum 143
Sum 5425
Count 60
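The summary above appears to follow the layout of Excel's Descriptive Statistics output. The sketch below recomputes the same quantities in Python; the skewness and kurtosis lines use the sample-adjusted formulas documented for Excel's SKEW and KURT functions (kurtosis is reported as excess kurtosis, so a normal distribution scores about 0). Because the slide elides most of the 60 ages, the example runs only on the 15 values shown, so its output will not match the table; the `describe` helper is hypothetical.

```python
import math
import statistics


def describe(values):
    """Summary statistics in the order of the table above (sample formulas, n - 1)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values for kurtosis")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    z = [(x - mean) / sd for x in values]  # standardized values
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "mean": mean,
        "standard_error": sd / math.sqrt(n),
        "median": statistics.median(values),
        "mode": statistics.mode(values),  # first mode encountered if there are several
        "standard_deviation": sd,
        "sample_variance": statistics.variance(values),
        "kurtosis": kurt,
        "skewness": skew,
        "range": max(values) - min(values),
        "minimum": min(values),
        "maximum": max(values),
        "sum": sum(values),
        "count": n,
    }


if __name__ == "__main__":
    visible_ages = [71, 127, 65, 82, 140, 53, 114, 56, 84, 65, 67, 134, 64, 91, 51]
    for name, value in describe(visible_ages).items():
        print(f"{name:<20}{value}")
```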

Simply looking at the raw data does not give an informative account of it; statistics, however, provides quick insight into the data through graphical and numerical tools.

[Figure: Distribution of age, vertical axis: age (months), 60 to 140]

Statistical Description of Data

Statistics describes a numeric set of data by its:

– Center (mean, median, mode, etc.)

– Variability (standard deviation, range, etc.)

– Shape (skewness, kurtosis, etc.)

Statistics describes a categorical set of data by the:

– Frequency, percentage, or proportion of each category

Statistical Inference

Statistical inference is the process by which we acquire information about populations from samples.

Two types of estimates are used for making inferences:

– Point estimation

– Interval estimation

[Figure: Statistical inference, from the sample to the population]

Population and Sample

Population: The entire collection of individuals or measurements about which information is desired.

Sample: A subset of the population selected for study.

– The primary objective is to select a subset of the population whose center, spread, and shape are as close as possible to those of the population.

– Methods of sampling: random sampling, stratified sampling, systematic sampling, cluster sampling, multistage sampling, area sampling, quota sampling, etc.
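Three of the sampling methods listed can be sketched with Python's `random` module. This is a minimal sketch; the population of 60 subject IDs, the three hypothetical groups used as strata and the helper names are illustrative, and real surveys add details (allocation rules, sampling weights) that are omitted here.

```python
import random


def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Pick a random start, then every k-th unit (k = N // n)."""
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]


def stratified_sample(population, strata, n_per_stratum, rng):
    """Draw a simple random sample of fixed size within each stratum.

    `strata` maps each unit to its stratum label.
    """
    groups = {}
    for unit in population:
        groups.setdefault(strata[unit], []).append(unit)
    sample = []
    for label in sorted(groups):
        sample.extend(rng.sample(groups[label], n_per_stratum))
    return sample


if __name__ == "__main__":
    rng = random.Random(42)
    subjects = list(range(1, 61))  # hypothetical IDs 1..60
    group_of = {s: "ABC"[s % 3] for s in subjects}  # hypothetical 3 groups of 20
    print(simple_random_sample(subjects, 6, rng))
    print(systematic_sample(subjects, 6, rng))
    print(stratified_sample(subjects, group_of, 2, rng))
```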

Parameter vs Statistic

Parameter

– Any statistical characteristic of a population

– Population mean, population median, and population standard deviation are examples of parameters

– A parameter describes the distribution of a population

– Parameters are fixed and usually unknown


Statistic: Any statistical characteristic of a sample

– Sample mean, sample median, and sample standard deviation are some examples of statistics

– A statistic describes the distribution of a sample

– The value of a statistic is known and varies from sample to sample

– Statistics are used for making inferences about parameters


Statistical issue: describing the distribution of a population through a census, or making inferences about the population distribution (population parameters) using the sample distribution (sample statistics).

E.g., the sample mean is an estimate of the population mean.
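A short simulation illustrates the distinction: the population mean (a parameter) is one fixed number, while the sample mean (a statistic) changes from sample to sample but centers on the parameter. The population below is hypothetical (normally distributed values with mean 90 and SD 30, loosely echoing the age example), and `sample_means` is an illustrative helper.

```python
import random
import statistics


def sample_means(population, n, repeats, seed=None):
    """Means of `repeats` simple random samples of size n drawn from `population`."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]


if __name__ == "__main__":
    rng = random.Random(1)
    population = [rng.gauss(90, 30) for _ in range(10_000)]  # hypothetical population
    mu = statistics.mean(population)  # parameter: fixed for this population
    means = sample_means(population, n=60, repeats=1000, seed=2)
    print(f"population mean (parameter): {mu:.2f}")
    print(f"first five sample means (statistics): {[round(m, 2) for m in means[:5]]}")
    print(f"average of the sample means: {statistics.mean(means):.2f}")
    print(f"SD of the sample means: {statistics.stdev(means):.2f}")
```

The spread of the sample means is what the standard error in the age summary estimates from a single sample.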

Hypothesis Testing

Null hypothesis (Ho) and alternative hypothesis (H1)

Decision versus the real situation:

– Reject Ho when Ho is true: Type I error (α)

– Reject Ho when Ho is false: Correct decision (1 - β)

– Accept (fail to reject) Ho when Ho is true: Correct decision (1 - α)

– Accept (fail to reject) Ho when Ho is false: Type II error (β)

Elements/Steps in Hypothesis Testing

Hypothesis testing steps:

– 1. Specify the null (Ho) and alternative (H1) hypotheses

– 2. Select the significance level (alpha), e.g., 0.05 or 0.01

– 3. Calculate the test statistic, e.g., t, F, or chi-square

– 4. Calculate the probability value (p-value) or confidence interval

– 5. Describe the result and statistic in an understandable way
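The five steps can be walked through in code. The sketch below uses a one-sample z-test (standard normal distribution via Python's `statistics.NormalDist`, Python 3.8+), which assumes the population standard deviation is known; the t-test listed in step 3 would replace the normal with a t distribution, which the standard library does not provide. The hypothesized mean of 84 months and the SD of 30 months are illustrative values only, and `one_sample_z_test` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample z-test of Ho: mu = mu0 against H1: mu != mu0."""
    # Step 3: test statistic
    se = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / se
    # Step 4: two-sided p-value and the matching (1 - alpha) confidence interval
    std_normal = NormalDist()
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    ci = (sample_mean - z_crit * se, sample_mean + z_crit * se)
    reject = p_value < alpha
    return z, p_value, ci, reject


if __name__ == "__main__":
    # Step 1: Ho: mu = 84 months vs H1: mu != 84 months (illustrative values)
    # Step 2: alpha = 0.05 (the default)
    z, p, ci, reject = one_sample_z_test(sample_mean=90.42, mu0=84, sigma=30, n=60)
    # Step 5: report the result in plain language
    decision = "reject Ho" if reject else "fail to reject Ho"
    print(f"z = {z:.2f}, p = {p:.3f}, 95% CI = ({ci[0]:.1f}, {ci[1]:.1f}): {decision}")
```

The p-value and the confidence interval lead to the same decision here: the interval excludes the hypothesized mean exactly when p < alpha.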

Point Estimation

[Figure: a point estimator computed from the sample distribution estimates a parameter of the population distribution]

• A point estimator draws inferences about a population by estimating the value of an unknown parameter using a single value, or point.

Interval Estimation

[Figure: an interval estimator computed from the sample distribution brackets a parameter of the population distribution]

• An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.
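An interval estimator for a mean can be sketched as point estimate ± critical value × standard error. This sketch uses the normal critical value (about 1.96 for 95%), an approximation that is reasonable for a sample of 60 but slightly narrower than the t-based interval; the inputs are the mean and standard error from the age summary, and `mean_confidence_interval` is a hypothetical helper.

```python
from statistics import NormalDist


def mean_confidence_interval(mean, standard_error, level=0.95):
    """Normal-approximation interval: mean +/- z * SE."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    half_width = z * standard_error
    return mean - half_width, mean + half_width


if __name__ == "__main__":
    low, high = mean_confidence_interval(90.41666667, 3.902649518)
    print(f"point estimate 90.42 months, 95% CI ({low:.2f}, {high:.2f})")
```

Raising the confidence level to 99% widens the interval, trading precision for a higher chance of covering the parameter.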

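P-Value versus the Confidence Interval

Two main ways to assess study precision and the role of chance in a study:

– P-value: measures (as a probability) the evidence against the null hypothesis. It is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true; the smaller the p-value, the stronger the evidence against Ho.

– Confidence interval (CI): an interval, computed from the sample, that is constructed to contain the true value of the parameter with a specified probability (the confidence level) over repeated sampling.

– E.g., a 95% CI implies that if one repeated a study 100 times, the true measure of association would lie inside about 95 of the 100 computed CIs.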
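The repeated-study interpretation can be checked by simulation: draw many samples from a population whose mean is known, build a 95% interval from each, and count how often the intervals capture the true mean. A minimal sketch with a hypothetical normal population (mean 90, SD 30) and known-SD intervals; `ci_coverage` is an illustrative helper, and the coverage should come out close to, but not exactly, 95%.

```python
import math
import random
from statistics import NormalDist


def ci_coverage(true_mean, sd, n, repeats, level=0.95, seed=None):
    """Fraction of simulated intervals, mean +/- z*sd/sqrt(n), that contain true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sd / math.sqrt(n)  # known-sigma interval half-width
    hits = 0
    for _ in range(repeats):
        sample_mean = math.fsum(rng.gauss(true_mean, sd) for _ in range(n)) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / repeats


if __name__ == "__main__":
    print(f"coverage over 100 studies: {ci_coverage(90, 30, 60, 100, seed=7):.2f}")
    print(f"coverage over 10,000 studies: {ci_coverage(90, 30, 60, 10_000, seed=7):.3f}")
```

With only 100 repetitions the count can easily be 92 or 97; the 95% figure describes the long-run proportion.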

Procedures for Sample Size Calculation

– Selection of the primary variables of interest and formulation of hypotheses

– Information on the standard deviation (if numeric) or proportion (if categorical)

– A tolerated level of significance (α)

– Selection of a reasonable test statistic

– Power or confidence level

– A scientifically or clinically meaningful effect (difference)
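For one common design, comparing two means with equal group sizes, the ingredients listed above combine in the normal-approximation formula n per group = 2σ²(z(1 - α/2) + z(1 - β))² / Δ², rounded up. A minimal sketch assuming a two-sided test, a common standard deviation σ in both groups and a meaningful difference Δ; other designs (proportions, unequal groups, t-based corrections) need different formulas, and `n_per_group` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Sample size per group for a two-sided, two-sample comparison of means.

    sigma: common standard deviation of the outcome (prior information)
    delta: smallest scientifically/clinically meaningful difference in means
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # significance level, two-sided
    z_beta = z(power)  # power = 1 - beta
    n = 2 * (sigma ** 2) * (z_alpha + z_beta) ** 2 / delta ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical planning values: SD 30 months, meaningful difference 15 months
    print(n_per_group(sigma=30, delta=15))
```

With these hypothetical values (α = 0.05, 80% power) the function returns 63 subjects per group; halving Δ roughly quadruples the requirement.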

Brief Concept of Statistical Software

There are many software packages for performing statistical analysis and visualization of data. Some of them are:

– SAS (Statistical Analysis System), S-PLUS, R, MATLAB, Minitab, BMDP, Stata, SPSS, StatXact, Statistica, LISREL, JMP, GLIM, HIL, MS Excel, etc. We will discuss MS Excel and SPSS in brief.

Useful website:

http://www.R-project.org (free but powerful statistical software)

Microsoft Excel

A spreadsheet application. It features calculation, graphing tools, pivot tables, and a macro programming language called VBA (Visual Basic for Applications).

There are many versions of MS Excel; Excel XP, Excel 2003, and Excel 2007 are capable of performing a number of statistical analyses.

Starting MS Excel: Double-click on the Microsoft Excel icon on the desktop, or click on Start ⇒ Programs ⇒ Microsoft Excel.

Worksheet: Consists of a grid of cells with numbered rows down the page and alphabetically titled columns across the page. Each cell is referenced by its coordinates. For example, A3 refers to the cell in column A and row 3, and B10:B20 refers to the range of cells in column B, rows 10 through 20.
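Excel's A1-style references map a column letter (A, B, …, Z, AA, …) to a column number and use the row number as-is. A small Python sketch that expands a range such as B10:B20 or A1:D4 into individual cell addresses, row by row; the helper names are hypothetical and not part of Excel, and only simple two-corner ranges are handled.

```python
import re


def column_to_number(letters):
    """'A' -> 1, 'Z' -> 26, 'AA' -> 27 (base 26 without a zero digit)."""
    number = 0
    for ch in letters.upper():
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def number_to_column(number):
    """Inverse of column_to_number: 1 -> 'A', 27 -> 'AA'."""
    letters = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def expand_range(ref):
    """Expand 'B10:B20' or 'A1:D4' into cell addresses, row by row."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", ref)
    if match is None:
        raise ValueError(f"not a simple range: {ref!r}")
    c1, r1, c2, r2 = match.groups()
    cols = range(column_to_number(c1), column_to_number(c2) + 1)
    rows = range(int(r1), int(r2) + 1)
    return [f"{number_to_column(c)}{r}" for r in rows for c in cols]


if __name__ == "__main__":
    print(expand_range("B10:B20"))
    print(len(expand_range("A1:D4")))  # 4 columns x 4 rows
```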


Creating formulas: 1. Click the cell in which you want to enter the formula. 2. Type = (an equal sign). 3. Click the Function button (fx). 4. Select the formula you want and step through the on-screen instructions.

Opening a document: File ⇒ Open (from an existing workbook). Change the directory, area, or drive to look for files in other locations.

Creating a new workbook: File ⇒ New ⇒ Blank Document

Saving a file: File ⇒ Save

Selecting more than one cell: Click on a cell (e.g., A1), then hold the Shift key and click on another (e.g., D4) to select the cells between A1 and D4; or click on a cell and drag the mouse across the desired range.


Entering date and time: Dates are displayed as MM/DD/YYYY (and stored internally as serial numbers), but there is no need to enter them in that format. For example, Excel will recognize Jan 9 or jan-9 as 1/9/2007 (the current year is assumed) and Jan 9 1999 as 1/9/1999. To enter today's date, press Ctrl and ; (semicolon) together. Use a or p to indicate am or pm; for example, 8:30 p is interpreted as 8:30 pm. To enter the current time, press Ctrl, Shift and ; (that is, Ctrl+:) together.
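Behind the displayed format, Excel (in its default 1900 date system on Windows) stores a date as a serial day number and a time as a fraction of a day. A Python sketch of that mapping, assuming the 1900 system and dates from March 1, 1900 onward (Excel treats 1900 as a leap year, so earlier serials are off by one and are not handled); `to_serial` and `from_serial` are hypothetical helpers.

```python
from datetime import date, datetime, timedelta

# Using 1899-12-30 as day 0 reproduces Excel serials for dates on or after
# 1900-03-01 (Excel's fictitious 1900-02-29 shifts earlier dates by one).
EXCEL_EPOCH = datetime(1899, 12, 30)


def to_serial(moment):
    """date or datetime -> Excel serial number (whole days + fraction of a day)."""
    if not isinstance(moment, datetime):
        moment = datetime(moment.year, moment.month, moment.day)
    delta = moment - EXCEL_EPOCH
    return delta.days + delta.seconds / 86400  # microseconds are ignored


def from_serial(serial):
    """Excel serial number -> datetime (1900 date system, serial >= 61)."""
    return EXCEL_EPOCH + timedelta(days=serial)


if __name__ == "__main__":
    print(to_serial(date(2007, 1, 9)))  # "Jan 9" typed in 2007
    print(to_serial(datetime(2007, 1, 9, 20, 30)))  # 8:30 pm adds 20.5/24 of a day
    print(from_serial(36526))
```

Knowing this helps when a date column imported into another package shows up as plain numbers.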

Copy and paste all cells in a sheet: Ctrl+A to select, Ctrl+C to copy, and Ctrl+V to paste.

Sorting data: Sort ⇒ Sort By …

Descriptive statistics and other statistical methods: Tools ⇒ Data Analysis ⇒ (statistical method). If Data Analysis is not available, click on Tools ⇒ Add-Ins and then select Analysis ToolPak and Analysis ToolPak - VBA.


Statistical and mathematical functions: Start with the ‘=’ sign and then select a function from the function wizard (fx).

Inserting a chart: Click on Chart Wizard (or Insert ⇒ Chart), select the chart type, give the input data range, update the chart options, and select the output location (worksheet).

Importing data into Excel: File ⇒ Open ⇒ Files of type ⇒ click on the file ⇒ choose an option (Delimited/Fixed Width) ⇒ choose the delimiter options (Tab, Semicolon, Comma, Space, Other) ⇒ Finish.

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.
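The rounding problem is easy to reproduce. The textbook one-pass variance formula, (Σx² - (Σx)²/n)/(n - 1), subtracts two huge, nearly equal numbers when the data have a large mean relative to their spread, and floating-point cancellation can wipe out the answer; Welford's updating algorithm avoids this. This Python sketch illustrates the general numerical issue, not the specific algorithm used by any particular Excel version.

```python
def naive_variance(xs):
    """One-pass 'sum of squares' formula: prone to catastrophic cancellation."""
    n = len(xs)
    s = sum(xs)
    ss = sum(x * x for x in xs)
    return (ss - s * s / n) / (n - 1)


def welford_variance(xs):
    """Welford's numerically stable one-pass sample variance."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses the updated mean
    return m2 / (count - 1)


if __name__ == "__main__":
    small = [4.0, 7.0, 13.0, 16.0]  # sample variance is exactly 30
    shifted = [x + 1e9 for x in small]  # same spread, huge mean
    print(naive_variance(small), welford_variance(small))
    print(naive_variance(shifted), welford_variance(shifted))
```

Both functions agree on the small values, but on the shifted values the naive formula is no longer reliable while Welford's still returns 30.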

Statistical Package for the Social Sciences (SPSS)

A general-purpose statistical package, SPSS is widely used in the social sciences, particularly in sociology and psychology.

SPSS can import data from almost any type of file to generate tabulated reports, plots of distributions and trends, descriptive statistics, and complex statistical analyses.

Starting SPSS: Double-click on SPSS on the desktop, or Programs ⇒ SPSS.

Opening an SPSS file: File ⇒ Open

• Data Editor

Various pull-down menus appear at the top of the Data Editor window. These pull-down menus are at the heart of using SPSSWIN. The Data Editor menu items (with some of their uses) are:

MENUS AND TOOLBARS


FILE: used to open and save data files

EDIT: used to copy and paste data values, find data in a file, and insert variables and cases; OPTIONS allows the user to set general preferences as well as the setup for the Navigator, Charts, etc.

VIEW: the user can change toolbars; value labels can be shown in cells instead of data values

DATA: select, sort, or weight cases; merge files

TRANSFORM: compute new variables, recode variables, etc.


ANALYZE: perform various statistical procedures

GRAPHS: create bar and pie charts, etc.

UTILITIES: add comments to accompany the data file (and other advanced features)

ADD-ONS: features not currently installed (advanced statistical procedures)

WINDOW: switch between the data, syntax, and Navigator windows

HELP: access SPSSWIN Help information


Navigator (Output) Menus

When statistical procedures are run or charts are created, the output appears in the Navigator window. The Navigator window contains many of the pull-down menus found in the Data Editor window. Some of the important menus in the Navigator window include:

INSERT: used to insert page breaks, titles, charts, etc.

FORMAT: used to change the alignment of a particular portion of the output


• Formatting Toolbar

When a table has been created by a statistical procedure, the user can edit the table to create a desired look or to add/delete information. Beginning with version 14.0, the user has a choice of editing the table in the Output window or opening it in a separate Pivot Table (DEFINE) window. Various pull-down menus are activated when the user double-clicks on the table. These include:

EDIT: undo and redo a pivot; select a table or table body (e.g., to change the font)

INSERT: used to insert titles, captions, and footnotes

PIVOT: used to pivot the row and column variables

FORMAT: various modifications can be made to tables and cells


Importing tab-delimited data: In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. Look in the appropriate location for the text file. Then select “Text” from “Files of type”. Click on the file name and then click on “Open”. You will see the Text Import Wizard – Step 1 of 6 dialog box.

After completing the wizard, you will have an SPSS data file containing the former tab-delimited data. You simply need to add variable and value labels and define missing values.

Exporting data to Excel: Click on FILE ⇒ SAVE AS. Click on the file name for the file to be exported. For “Save as type”, select Excel (*.xls) from the pull-down menu. You will notice the checkbox for “Write variable names to spreadsheet”. Leave it checked, as you will want the variable names in the first row of each column of the Excel spreadsheet. Finally, click on Save.


• Additional menus

CHART EDITOR: used to edit a graph

SYNTAX EDITOR: used to edit the text in a syntax window

• Show or hide a toolbar

Click on VIEW ⇒ TOOLBARS, then check a toolbar to show it or uncheck it to hide it

• Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location

• Customize a toolbar

Click on VIEW ⇒ TOOLBARS ⇒ CUSTOMIZE


Importing data from an Excel spreadsheet: Data from an Excel spreadsheet can be imported into SPSSWIN as follows:

1. In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. The OPEN DATA FILE dialog box will appear.

2. Locate the file of interest. Use the Look In pull-down list to identify the folder containing the Excel file of interest.

3. From the FILE TYPE pull-down menu, select EXCEL (*.xls).

4. Click on the file name of interest and click on OPEN, or simply double-click on the file name.

5. Keep the box checked that reads “Read variable names from the first row of data”. This presumes that the first row of the Excel data file contains variable names. [If the data resided in a different worksheet in the Excel file, this would need to be entered.]

6. Click on OK. The Excel data file will now appear in the SPSSWIN Data Editor.


7. The former Excel spreadsheet can now be saved as an SPSS file (FILE ⇒ SAVE AS) and is ready to be used in analyses. Typically, you would label variables and values and define missing values.

Importing an Access table: SPSSWIN does not offer a direct import for Access tables. Therefore, we must follow these steps: 1. Open the Access file. 2. Open the data table. 3. Save the data as an Excel file. 4. Follow the steps outlined above for importing data from an Excel spreadsheet into SPSSWIN.

Importing text files into SPSSWIN: Text data points are typically separated (or “delimited”) by tabs or commas. Sometimes they are in fixed format.
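The two text layouts can be illustrated with Python's standard library: `csv` handles delimited text (tab or comma), while fixed-format records are cut at agreed character positions. The sample lines, variable names and column positions below are hypothetical, and `read_delimited`/`read_fixed_width` are illustrative helpers; as with SPSS imports, every value arrives as text until types are assigned.

```python
import csv
import io


def read_delimited(text, delimiter="\t"):
    """Parse delimited text whose first row holds the variable names."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


def read_fixed_width(text, columns):
    """Parse fixed-format text; `columns` lists (name, start, end) character positions."""
    records = []
    for line in text.splitlines():
        if line.strip():
            records.append({name: line[start:end].strip() for name, start, end in columns})
    return records


if __name__ == "__main__":
    tab_text = "id\tage\tgroup\n1\t71\tA\n2\t127\tB\n"
    comma_text = "id,age,group\n1,71,A\n2,127,B\n"
    fixed_text = "001 71A\n002127B\n"
    layout = [("id", 0, 3), ("age", 3, 6), ("group", 6, 7)]
    print(read_delimited(tab_text))
    print(read_delimited(comma_text, delimiter=","))
    print(read_fixed_width(fixed_text, layout))
```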


Running the FREQUENCIES procedure

1. Open the data file of interest (from the menus, click on FILE ⇒ OPEN ⇒ DATA).

2. From the menus, click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ FREQUENCIES.

3. The FREQUENCIES dialog box will appear. In the left-hand box will be a listing (source variable list) of all the variables that have been defined in the data file. The first step is identifying the variable(s) for which you want to run a frequency analysis. Click on a variable name(s), then click the [ > ] pushbutton. The variable name(s) will now appear in the VARIABLE[S] box (selected variable list). Repeat these steps for each variable of interest.

4. If all that is being requested is a frequency table showing counts and percentages (raw, adjusted, and cumulative), then click on OK.
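The frequency table from step 4 reports, for each value, the count, the percent of all cases (raw), the percent of non-missing cases (adjusted, labeled Valid Percent in SPSS output) and the cumulative percent of valid cases. A Python sketch of those columns, assuming missing values are coded as None; the responses and the `frequencies` helper are hypothetical.

```python
from collections import Counter


def frequencies(values, missing=(None,)):
    """Rows of (value, count, percent, valid percent, cumulative percent), plus the missing count.

    Percent uses all cases; valid and cumulative percents exclude missing codes.
    """
    total = len(values)
    valid = [v for v in values if v not in missing]
    counts = Counter(valid)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        n = counts[value]
        valid_pct = 100 * n / len(valid)
        cumulative += valid_pct
        rows.append((value, n, 100 * n / total, valid_pct, cumulative))
    return rows, total - len(valid)


if __name__ == "__main__":
    # Hypothetical 1-5 ratings with two missing answers
    answers = [3, 4, 4, None, 5, 2, 4, 3, None, 5]
    rows, n_missing = frequencies(answers)
    print("Value  Freq  Percent  Valid%  Cum%")
    for value, n, pct, vpct, cum in rows:
        print(f"{value:>5} {n:>5} {pct:8.1f} {vpct:7.1f} {cum:6.1f}")
    print(f"Missing {n_missing:>3}")
```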


Requesting STATISTICS: Descriptive and summary STATISTICS can be requested for numeric variables. To request statistics: 1. From the FREQUENCIES dialog box, click on the STATISTICS pushbutton. 2. This will bring up the FREQUENCIES: STATISTICS dialog box. 3. The STATISTICS dialog box offers the user a variety of choices.

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ DESCRIPTIVES). The procedure offers many of the same statistics as the FREQUENCIES procedure, but without generating frequency tables.


Requesting CHARTS: One can request a chart (graph) to be created for a variable or variables included in a FREQUENCIES procedure.

1. In the FREQUENCIES dialog box, click on CHARTS. 2. The FREQUENCIES: CHARTS dialog box will appear. Choose the intended chart (e.g., bar chart, pie chart, or histogram).

Pasting charts into Word: 1. Click on the chart. 2. Click on the pull-down menu EDIT ⇒ COPY OBJECTS. 3. Go to the Word document in which the chart is to be embedded and click on EDIT ⇒ PASTE SPECIAL. 4. Select Formatted Text (RTF) and then click on OK. 5. Enlarge the graph to the desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible, click once on the graph).

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

BASIC STATISTICAL PROCEDURES CROSSTABS

1 From the ANALYZE pull-down menu click on DESCRIPTIVE STATISTICS rArrCROSSTABS

2 The CROSSTABS Dialog Box will then open

3 From the variable selection box on the left click on a variable you wish to designate as the Row variable The values (codes) for the Row variable make up the rows of the crosstabs table Click on the arrow (gt) button for Row(s) Next click on a different variable you wish to designate as the Column variable The values (codes) for the Column variable make up the columns of the crosstabstable Click on the arrow (gt) button for Column(s)

4 You can specify more than one variable in the Row(s) andor Column(s) A cross table will be generated for each combination of Row and Column variables

Limitations SPSS users have less control over data manipulation and statistical output than other statistical packages such as SAS Stata etc

SPSS is a good first statistical package to perform quantitative research in social science because it is easy to use and because it can be a good starting point to learn more advanced statistical packages

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

QuestionsQuestions

Randomized control designRandomized control design

Random allocation of subjects to different interventions

(or treatments) for the purpose of comparingdetermining

the efficacy of the study treatment (s)

ndash Eg placebo or standard medication (active control) can

be used as a control

ndash Patients with cancer or painful disease can not receive

placebo as a control

Randomized control designRandomized control design

Blindness Reduces the bias due to the preconception or

personal bias ndash Open trial Investigator and subject know the full details of the

treatment

ndash Single-blind trial Investigator knows about the treatment but

subject does not

ndash Double-blind Both investigator and subject do not know about the

treatment

ndash Triple-blind Sponsor investigator and subject do not know about

the treatment

Distribution of a variableDistribution of a variable

Distribution - (of a variable) tells us what values the

variable takes and how often it takes these values Eg

distribution of some 26 pediatric patients of ages 1 to 6

at AIDHC are as follows-

Age 1 2 3 4 5 6

Frequency 5 3 7 5 4 2

StatisticsStatistics

Science of data collection summarization analysis

and interpretation

Descriptive versus Inferential Statistics

ndash Descriptive Statistic Data description

(summarization) such as center variability and

shape

ndash Inferential Statistic Drawing conclusion beyond the

sample studied allowing for prediction

A Taxonomy ofA Taxonomy of StatisticsStatistics

How does statistics help usHow does statistics help us

Age Distribution

0

2

4

6

8

10

12

14

16

40 60 80 100 120 140 More

Age in Month

Nu

mb

er o

f S

ub

ject

s

Ages (in month) of the 60 patients in our data set 1 are- 71 127 65 82 140 53 114 56 84 65 67 134 64 hellip 91 51

Mean 9041666667

Standard Error 3902649518

Median 84

Mode 84

Standard Deviation 3022979318

Sample Variance 9138403955

Kurtosis -1183899591

Skewness 0389872725

Range 95

Minimum 48

Maximum 143

Sum 5425

Count 60

By simply looking at the data we fail to produce any informative account to describe the data how ever statistics produce a quick insight in to data using graphical and numerical statistical tools

60

80

10

01

20

14

0

Distribution of age

Ag

e (

mo

nth

)

Statistical Description of DataStatistical Description of Data

Statistics describes a numeric set of data by its

Center (mean median mode etc)

Variability (standard deviation range etc)

Shape (skewness kurtosis etc)

Statistics describes a categorical set of data by

Frequency percentage or proportion of each

category

Statistical inference is the process by which we acquire information about populations from samples

Two types of estimates for making inferencesndash Point estimationndash Interval estimate

Statistical Inference

Statistical Inference

Sample Population

Population and samplePopulation and sample

Population The entire collection of individuals or measurements about which information is desired

Sample A subset of the population selected for study

ndash Primary objective is to create a subset of population whose center spread and shape are as close as that of population

ndash Methods of sampling Random sampling stratified sampling systematic sampling cluster sampling multistage sampling area sampling qoata sampling etc

Parameter vs StatisticsParameter vs Statistics

Parameter

ndash Any statistical characteristic of a population

ndash Population mean population median population

standard deviation are examples of parameters

ndash Parameter describes the distribution of a population

ndash Parameters are fixed and usually unknown

Parameter vs StatisticsParameter vs Statistics

Statistic Any statistical characteristic of a sample

ndash Sample mean sample median sample standard

deviation are some examples of statistics

ndash Statistic describes the distribution of population

ndash Value of a statistic is known and is varies for different

samples

ndash Are used for making inference on parameter

Parameter vs StatisticsParameter vs Statistics

Statistical Issue To describe the distribution of a

population through census or making inference on

population distribution population parameter using sample

distribution statistic

Eg sample mean is an estimate of the population mean

Hypothesis TestingHypothesis Testing

Null hypothesis and Alternative hypothesis

Real Situation Ho is true Ho is false Reject Ho Type I

error (α) Correct Decision (1-)

D e c i s i o n

Accept Ho Correct Decision (1- α)

Type II Error ()

ElementsSteps in hypothesisElementsSteps in hypothesis

Hypothesis testing steps

ndash 1 Null (Ho) and alternative (H1)hypothesis specification

ndash 2 Selection of significance level (alpha) - 005 or 001

ndash 3 Calculating the test statistic ndasheg t F Chi-square

ndash 4 Calculating the probability value (p-value) or confidence

Interval

ndash 5 Describing the result and statistic in an understandable

way

Point estimator

Sample distribution

Parameter

Population distribution

bull A point estimate draws inference about a population by estimating the value of an unknown parameter using a single value or a point

Point Estimation

Interval estimatorSample distribution

bull An interval estimator draws inferences about a population by

estimating the value of an unknown parameter using an interval

Population distribution Parameter

Interval Estimation

P-Value versus the Confidence IntervalP-Value versus the Confidence Interval

Two main ways to assess study precision and the role of

chance in a study

ndash P value measures ( in probability) the evidence against

the null hypothesis

ndash An interval within which the value of the parameter lies

with a specified probability

ndash Eg 95 CI implies that if one repeats a study 100

times the true measure of association will lie inside the

CI in 95 out of 100 measures

Procedures for sample size Procedures for sample size calculationcalculation

Selection of primary variables of interest and formulation

of hypotheses

Information of standard deviation ( if numeric) or

proportion (if categorical)

A tolerance level of significance ()

Selection of reasonable test statistic

Power or Confidence level

A scientifically or clinically meaning effect difference

Brief concept of Statistical SoftwareBrief concept of Statistical Software

There are many software packages to perform statistical

analysis and visualization of data Some of them are-

ndash System for Statistical Analysis (SAS) S-plus R Matlab Minitab

BMDP STATA SPSS StatXact Statistica LISREL JMP

GLIM HIL MS Excel etc We will discuss MS Excel and SPSS in

brief

useful websites-

httpwwwR-projectorg (a free but powerful statistical software)

Microsoft ExcelMicrosoft Excel

A Spreadsheet Application It features calculation graphing tools pivot tables and a macro programming language called VBA (Visual Basic for Applications)

There are many versions of MS-Excel Excel XP Excel 2003 Excel 2007 are capable of performing a number of statistical analyses

Starting MS Excel Double click on the Microsoft Excel icon on the desktop or Click on Start --gt Programs --gt Microsoft Excel

Worksheet Consists of a multiple grid of cells with numbered rows down the page and alphabetically-tilted columns across the page Each cell is referenced by its coordinates For example A3 is used to refer to the cell in column A and row 3 B10B20 is used to refer to the range of cells in column B and rows 10 through 20

Microsoft ExcelMicrosoft Excel

Creating Formulas 1 Click the cell that you want to enter the formula 2 Type = (an equal sign) 3 Click the Function Button 4 Select the formula you want and step through the on-screen instructions

xf

Opening a document File Open (From a existing workbook) Change the directory area or drive to look for file in other locations

Creating a new workbook FileNewBlank Document

Saving a File FileSave

Selecting more than one cell Click on a cell eg A1) then hold the Shift key and click on another (eg D4) to select cells between and A1 and D4 or Click on a cell and drag the mouse across the desired range

Microsoft ExcelMicrosoft Excel

Entering Date and Time Dates are stored as MMDDYYYY No need to enter in that format For example Excel will recognize Jan 9 or jan-9 as 192007 and Jan 9 1999 as 191999 To enter todayrsquos date press Ctrl and together Use a or p to indicate am or pm For example 830 p is interpreted as 830 pm To enter current time press Ctrl and together

Copy and Paste all cells in a Sheet Ctrl+A for selecting Ctrl +C for copying and Ctrl+V for Pasting

Sorting Data Sort Sort By hellip

Descriptive Statistics and other Statistical methods ToolsData Analysis Statistical method If Data Analysis is not available then click on Tools Add-Ins and then select Analysis ToolPack and Analysis toolPack-Vba

Microsoft ExcelMicrosoft Excel

Statistical and Mathematical Function Start with lsquo=lsquo sign and then select function from function wizard xf

Inserting a Chart Click on Chart Wizard (or InsertChart) select chart give Input data range Update the Chart options and Select output range Worksheet

Importing Data in Excel File open FileType Click on File Choose Option ( DelimitedFixed Width) Choose Options (Tab Semicolon Comma Space Other) Finish

Limitations Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extremecases

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

A general purpose statistical package SPSS is widely used in the social sciences particularly in sociology and psychology

SPSS can import data from almost any type of file to generate tabulated reports plots of distributions and trends descriptive statistics and complex statistical analyzes

Starting SPSS Double Click on SPSS on desktop or ProgramSPSS

Opening a SPSS file FileOpen

bull Data Editor

Various pull-down menus appear at the top of the Data Editor window These pull-down menus are at the heart of using SPSSWIN The Data Editor menu items (with some of the uses of the menu) are

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

FILE used to open and save data files

EDIT used to copy and paste data values used to find data in a file insert variables and cases OPTIONS allows the user to set general preferences as well as the setup for the Navigator Charts etc

VIEW user can change toolbars value labels can be seen in cells instead of data values

DATA select sort or weight cases merge files

MENUS AND TOOLBARS

TRANSFORM Compute new variables recode variables etc

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

ANALYZE perform various statistical procedures

GRAPHS create bar and pie charts etc

UTILITIES add comments to accompany data file (and other advanced features)

ADD-ons these are features not currently installed (advanced statistical procedures)

WINDOW switch between data syntax and navigator windows

HELP to access SPSSWIN Help information

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Navigator (Output) Menus

When statistical procedures are run or charts are created the output will appear in the Navigator window The Navigator window contains many of the pull-down menus found in the Data Editor window Some of the important menus in the Navigator window include

INSERT used to insert page breaks titles charts etc

FORMAT for changing the alignment of a particular portion of the output

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Formatting Toolbar

When a table has been created by a statistical procedure the user can edit the table to create a desired look or adddelete information Beginning with version 140 the user has a choice of editing the table in the Output or opening it in a separate Pivot Table (DEFINE) window Various pulldown menus are activated when the user double clicks on the table These include

EDIT undo and redo a pivot select a table or table body (eg to change the font)

INSERT used to insert titles captions and footnotes

PIVOT used to perform a pivot of the row and column variables

FORMAT various modifications can be made to tables and cells

Importing tab-delimited data

In SPSSWIN, click on FILE > OPEN > DATA. Look in the appropriate location for the text file. Then select "Text" from "Files of type". Click on the file name and then click on "Open". You will see the Text Import Wizard (Step 1 of 6) dialog box.

You will now have an SPSS data file containing the former tab-delimited data. You simply need to add variable and value labels and define missing values.

Exporting data to Excel

Click on FILE > SAVE AS. Click on the File Name for the file to be exported. For "Save as type", select Excel (*.xls) from the pull-down menu. You will notice the checkbox for "Write variable names to spreadsheet". Leave this checked, as you will want the variable names to be in the first row of each column in the Excel spreadsheet. Finally, click on Save.

• Additional menus

CHART EDITOR: used to edit a graph.

SYNTAX EDITOR: used to edit the text in a syntax window.

• Show or hide a toolbar

Click on VIEW > TOOLBARS, then check a toolbar to show it or uncheck it to hide it.

• Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location.

• Customize a toolbar

Click on VIEW > TOOLBARS > CUSTOMIZE.

Importing data from an Excel spreadsheet

Data from an Excel spreadsheet can be imported into SPSSWIN as follows:

1. In SPSSWIN, click on FILE > OPEN > DATA. The OPEN DATA FILE dialog box will appear.

2. Locate the file of interest. Use the Look In pull-down list to identify the folder containing the Excel file of interest.

3. From the FILE TYPE pull-down menu, select EXCEL (*.xls).

4. Click on the file name of interest and click on OPEN, or simply double-click on the file name.

5. Keep checked the box that reads "Read variable names from the first row of data". This presumes that the first row of the Excel data file contains variable names. [If the data resided in a different worksheet in the Excel file, this would need to be entered.]

6. Click on OK. The Excel data file will now appear in the SPSSWIN Data Editor.

7. The former Excel spreadsheet can now be saved as an SPSS file (FILE > SAVE AS) and is ready to be used in analyses. Typically, you would label variables and values and define missing values.

Importing an Access table

SPSSWIN does not offer a direct import for Access tables. Therefore, we must follow these steps:

1. Open the Access file.

2. Open the data table.

3. Save the data as an Excel file.

4. Follow the steps outlined above for importing data from an Excel spreadsheet into SPSSWIN.

Importing text files into SPSSWIN

Text data points typically are separated (or "delimited") by tabs or commas. Sometimes they can be of fixed format.
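The three text layouts (tab-delimited, comma-delimited and fixed format) differ only in how fields are separated. The sketch below is not SPSS; it uses Python's standard `csv` module to show the difference. The file contents and the fixed-format column positions are hypothetical.

```python
import csv
import io

# Hypothetical tab-delimited file with variable names in the first row
tab_text = "id\tgroup\tage\n1\tA\t71\n2\tB\t127\n"
tab_rows = list(csv.DictReader(io.StringIO(tab_text), delimiter="\t"))

# Comma-delimited: the same reader with the default delimiter ","
comma_text = "id,group,age\n1,A,71\n2,B,127\n"
comma_rows = list(csv.DictReader(io.StringIO(comma_text)))

# Fixed format: each field occupies fixed character positions (assumed here)
fixed_text = "001A071\n002B127\n"
fields = [(0, 3), (3, 4), (4, 7)]  # id in columns 1-3, group in 4, age in 5-7
fixed_rows = [[line[a:b] for a, b in fields] for line in fixed_text.splitlines()]

print(tab_rows)
print(fixed_rows)
```

Note that every field is read as text; as in SPSS, numeric types, labels and missing values still have to be defined after import.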

Running the FREQUENCIES procedure

1. Open the data file of interest (from the menus, click on FILE > OPEN > DATA).

2. From the menus, click on ANALYZE > DESCRIPTIVE STATISTICS > FREQUENCIES.

3. The FREQUENCIES dialog box will appear. The left-hand box lists all the variables that have been defined in the data file (the source variable list). The first step is to identify the variable(s) for which you want to run a frequency analysis. Click on a variable name, then click the [ > ] pushbutton. The variable name will now appear in the VARIABLE(S) box (the selected variable list). Repeat these steps for each variable of interest.

4. If all that is being requested is a frequency table showing counts and percentages (raw, adjusted and cumulative), click on OK.
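The raw, adjusted and cumulative percentages in step 4 correspond to the Percent, Valid Percent and Cumulative Percent columns of the SPSS frequency table. A minimal Python sketch of how those columns relate, assuming missing values are coded as `None` (the data and the function name `frequency_table` are hypothetical):

```python
from collections import Counter

def frequency_table(values):
    """Return (rows, n_missing); each row is
    (value, count, percent, valid_percent, cumulative_percent).

    None marks a missing value (an assumption standing in for SPSS missing codes).
    """
    total = len(values)
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        count = counts[value]
        percent = 100 * count / total             # base: all cases, including missing
        valid_percent = 100 * count / len(valid)  # base: non-missing cases only
        cumulative += valid_percent               # running total of valid percents
        rows.append((value, count, percent, valid_percent, cumulative))
    return rows, total - len(valid)

# Hypothetical group codes with two missing responses
groups = [1, 2, 2, 3, 1, None, 2, 3, 3, 3, None, 1]
rows, n_missing = frequency_table(groups)
print("Value\tFreq\tPercent\tValid%\tCum%")
for value, count, pct, valid_pct, cum in rows:
    print(f"{value}\t{count}\t{pct:.1f}\t{valid_pct:.1f}\t{cum:.1f}")
print(f"Missing\t{n_missing}")
```

The cumulative column is built from the valid (adjusted) percentages, so it reaches 100 even when some cases are missing.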

Requesting STATISTICS

Descriptive and summary statistics can be requested for numeric variables. To request statistics:

1. From the FREQUENCIES dialog box, click on the STATISTICS pushbutton.

2. This will bring up the FREQUENCIES: STATISTICS dialog box.

3. The STATISTICS dialog box offers the user a variety of choices.

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE > DESCRIPTIVE STATISTICS > DESCRIPTIVES). The procedure offers many of the same statistics as the FREQUENCIES procedure, but without generating frequency tables.

Requesting CHARTS

One can request a chart (graph) to be created for a variable or variables included in a FREQUENCIES procedure:

1. In the FREQUENCIES dialog box, click on CHARTS.

2. The FREQUENCIES: CHARTS dialog box will appear. Choose the intended chart (e.g., bar chart, pie chart or histogram).

Pasting charts into Word

1. Click on the chart.

2. Click on the pull-down menu EDIT > COPY OBJECTS.

3. Go to the Word document in which the chart is to be embedded. Click on EDIT > PASTE SPECIAL.

4. Select Formatted Text (RTF) and then click on OK.

5. Enlarge the graph to a desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible, click once on the graph).

BASIC STATISTICAL PROCEDURES: CROSSTABS

1. From the ANALYZE pull-down menu, click on DESCRIPTIVE STATISTICS > CROSSTABS.

2. The CROSSTABS dialog box will then open.

3. From the variable selection box on the left, click on a variable you wish to designate as the Row variable. The values (codes) of the Row variable make up the rows of the crosstabs table. Click on the arrow (>) button for Row(s). Next, click on a different variable you wish to designate as the Column variable. The values (codes) of the Column variable make up the columns of the crosstabs table. Click on the arrow (>) button for Column(s).

4. You can specify more than one variable in Row(s) and/or Column(s). A crosstabs table will be generated for each combination of Row and Column variables.
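What CROSSTABS counts can be sketched in a few lines of Python: each cell holds the number of cases with a given (row value, column value) pair. This is an illustration only; the variables, their codes and the function name `crosstab` are hypothetical.

```python
from collections import Counter

def crosstab(rows, cols):
    """Count each (row value, column value) pair, as in a two-way crosstabs table."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    table = {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}
    return table, row_levels, col_levels

# Hypothetical data: treatment group (Row variable) by sex (Column variable)
group = ["A", "A", "B", "B", "B", "C", "C", "A"]
sex = ["F", "M", "F", "F", "M", "M", "M", "F"]
table, row_levels, col_levels = crosstab(group, sex)

print("\t" + "\t".join(col_levels) + "\tTotal")
for r in row_levels:
    cells = [table[r][c] for c in col_levels]
    print(r + "\t" + "\t".join(map(str, cells)) + f"\t{sum(cells)}")
```

Combinations that never occur (here group C with sex F) still get a cell with count 0, which keeps the table rectangular.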

Limitations: SPSS users have less control over data manipulation and statistical output than users of other statistical packages such as SAS, Stata, etc.

SPSS is a good first statistical package for quantitative research in the social sciences, because it is easy to use and can be a good starting point for learning more advanced statistical packages.

Questions

Randomized control design

Blinding: reduces bias due to preconceptions or personal bias.

- Open trial: investigator and subject know the full details of the treatment.

- Single-blind trial: investigator knows about the treatment but subject does not.

- Double-blind trial: neither investigator nor subject knows about the treatment.

- Triple-blind trial: sponsor, investigator and subject do not know about the treatment.

Distribution of a variable

The distribution of a variable tells us what values the variable takes and how often it takes these values. E.g., the age distribution of 26 pediatric patients aged 1 to 6 at AIDHC is as follows:

Age          1   2   3   4   5   6
Frequency    5   3   7   5   4   2
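The table above already is the distribution; a short Python sketch (an illustration, not part of the course materials) turns those frequencies into relative and cumulative frequencies and a frequency-weighted mean age, using only the six values and counts shown.

```python
# Frequency distribution from the slide: ages 1 to 6 of 26 pediatric patients
ages = [1, 2, 3, 4, 5, 6]
freqs = [5, 3, 7, 5, 4, 2]

n = sum(freqs)
relative = [f / n for f in freqs]  # proportion of patients at each age

cumulative = []
running = 0
for f in freqs:
    running += f
    cumulative.append(running)  # patients at or below each age

mean_age = sum(a * f for a, f in zip(ages, freqs)) / n  # frequency-weighted mean

for a, f, r, c in zip(ages, freqs, relative, cumulative):
    print(f"Age {a}: {f} patients, {100 * r:.1f}%, cumulative {c}")
print(f"n = {n}, mean age = {mean_age:.2f}")
```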

Statistics

Statistics is the science of data collection, summarization, analysis and interpretation.

Descriptive versus inferential statistics:

- Descriptive statistics: data description (summarization), such as center, variability and shape.

- Inferential statistics: drawing conclusions beyond the sample studied, allowing for prediction.

A Taxonomy of Statistics

How does statistics help us?

[Figure: Age Distribution. Histogram of the number of subjects by age in months, with bins at 40, 60, 80, 100, 120, 140 and More.]

Ages (in months) of the 60 patients in our data set 1 are: 71, 127, 65, 82, 140, 53, 114, 56, 84, 65, 67, 134, 64, …, 91, 51.

Mean                   90.41666667
Standard Error          3.902649518
Median                 84
Mode                   84
Standard Deviation     30.22979318
Sample Variance       913.8403955
Kurtosis               -1.183899591
Skewness                0.389872725
Range                  95
Minimum                48
Maximum               143
Sum                  5425
Count                  60

By simply looking at the raw data, we cannot give an informative account of it; statistics, however, give quick insight into the data through graphical and numerical tools.

[Figure: Distribution of age. Plot of age (months) on an axis running from 60 to 140.]
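The Excel summary table above can be reproduced outside Excel. The sketch below uses Python's standard `statistics` module; the skewness and kurtosis lines implement the bias-adjusted formulas used by Excel's SKEW and KURT functions (kurtosis is excess kurtosis, so a normal distribution gives about 0). The function name `describe` is hypothetical, and because the slide shows only 15 of the 60 ages, the example runs on those 15 values and its output will not match the table.

```python
import math
import statistics

def describe(data):
    """Summary statistics in the order of Excel's Descriptive Statistics output.

    Skewness and kurtosis follow the bias-adjusted formulas of Excel's SKEW and
    KURT; kurtosis is excess kurtosis (about 0 for a normal distribution).
    Needs at least 4 values.
    """
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)  # sample SD, n - 1 in the denominator
    z = [(x - mean) / sd for x in data]
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "Mean": mean,
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(data),
        "Mode": statistics.mode(data),
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(data),
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": sum(data),
        "Count": n,
    }

# Only the 15 ages visible on the slide (the full 60-value data set is not shown),
# so these results will not match the table above.
visible_ages = [71, 127, 65, 82, 140, 53, 114, 56, 84, 65, 67, 134, 64, 91, 51]
for name, value in describe(visible_ages).items():
    if isinstance(value, float):
        print(f"{name:<20}{value:.4f}")
    else:
        print(f"{name:<20}{value}")
```

The table's own values are internally consistent with these definitions: the standard error equals the standard deviation divided by the square root of 60, and the sample variance is the square of the standard deviation.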

Statistical Description of Data

Statistics describes a numeric set of data by its:

- Center (mean, median, mode, etc.)

- Variability (standard deviation, range, etc.)

- Shape (skewness, kurtosis, etc.)

Statistics describes a categorical set of data by:

- Frequency, percentage or proportion of each category

Statistical Inference

Statistical inference is the process by which we acquire information about populations from samples.

Two types of estimates are used for making inferences:

- Point estimation

- Interval estimation

[Figure: Statistical inference, from sample to population.]

Population and Sample

Population: the entire collection of individuals or measurements about which information is desired.

Sample: a subset of the population selected for study.

- The primary objective is to obtain a subset of the population whose center, spread and shape are as close as possible to those of the population.

- Methods of sampling: random sampling, stratified sampling, systematic sampling, cluster sampling, multistage sampling, area sampling, quota sampling, etc.
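Three of the listed sampling methods can be sketched with Python's `random` module. Everything here is hypothetical: the sampling frame of 600 subject IDs, the three strata defined by ID modulo 3, and the sample size of 60.

```python
import random

rng = random.Random(2008)  # fixed seed so the example is reproducible
population = list(range(1, 601))  # hypothetical sampling frame of 600 subject IDs

# Simple random sampling: every subset of 60 IDs is equally likely
srs = rng.sample(population, 60)

# Systematic sampling: random start within the first k units, then every k-th unit
k = len(population) // 60
start = rng.randrange(k)
systematic = population[start::k]

# Stratified sampling: a separate random sample of 20 within each of 3 strata
strata = {g: [pid for pid in population if pid % 3 == g] for g in range(3)}
stratified = [pid for members in strata.values() for pid in rng.sample(members, 20)]

print(sorted(srs)[:10])
print(systematic[:10])
print(sorted(stratified)[:10])
```

Stratification guarantees each subgroup its share of the sample, which a simple random sample only achieves on average.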

Parameter vs Statistic

Parameter:

- Any statistical characteristic of a population.

- Population mean, population median and population standard deviation are examples of parameters.

- A parameter describes the distribution of a population.

- Parameters are fixed and usually unknown.

Statistic: any statistical characteristic of a sample.

- Sample mean, sample median and sample standard deviation are some examples of statistics.

- A statistic describes the distribution of a sample.

- The value of a statistic is known and varies from sample to sample.

- Statistics are used for making inferences about parameters.

Statistical issue: to describe the distribution of a population through a census, or to make inferences about the population distribution (a population parameter) using the sample distribution (a statistic).

E.g., the sample mean is an estimate of the population mean.
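The distinction between a fixed parameter and a varying statistic can be seen in a small simulation. The population below is hypothetical (10,000 normally distributed ages with mean 90 and SD 30 months), so its mean is known only because we constructed it; in practice it would be unknown.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed for reproducibility

# Hypothetical finite population of 10,000 ages in months
population = [rng.gauss(90, 30) for _ in range(10_000)]
mu = statistics.fmean(population)  # parameter: one fixed number for this population

# Three random samples of 60 give three different values of the statistic
sample_means = [statistics.fmean(rng.sample(population, 60)) for _ in range(3)]

print(f"Population mean (parameter): {mu:.2f}")
print("Sample means (statistic):", [round(m, 2) for m in sample_means])
```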

Hypothesis Testing

Null hypothesis and alternative hypothesis

                              Real situation
Decision       Ho is true                   Ho is false
Reject Ho      Type I error (α)             Correct decision (1 - β)
Accept Ho      Correct decision (1 - α)     Type II error (β)
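The four cells of the decision table can be made concrete with a simulation. This sketch is an illustration under simplifying assumptions not stated in the slides: normally distributed data with a known standard deviation and a two-sided z-test; the function name `rejection_rate` and all parameter values are hypothetical. When Ho is true, the rejection rate estimates α; when Ho is false, it estimates the power 1 - β.

```python
import math
import random
from statistics import NormalDist, fmean

def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=30, alpha=0.05, reps=2000, seed=7):
    """Fraction of simulated studies in which a two-sided z-test rejects Ho: mu = mu0.

    Assumes normally distributed data with known sigma (a simplifying assumption).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps

type_i = rejection_rate(true_mean=0.0)  # Ho true: rate estimates alpha
power = rejection_rate(true_mean=0.5)   # Ho false: rate estimates 1 - beta
print(f"Type I error rate ~ {type_i:.3f}; power ~ {power:.3f}; Type II ~ {1 - power:.3f}")
```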

Elements/Steps in Hypothesis Testing

Hypothesis testing steps:

1. Specify the null (Ho) and alternative (H1) hypotheses.

2. Select the significance level (alpha), e.g., 0.05 or 0.01.

3. Calculate the test statistic, e.g., t, F or chi-square.

4. Calculate the probability value (p-value) or confidence interval.

5. Describe the result and statistic in an understandable way.
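The five steps can be walked through on the age data, using only the mean, standard deviation and count from the Excel summary. The null value of 84 months is a hypothetical choice for illustration, and the p-value uses a normal approximation to the t distribution with 59 degrees of freedom (an exact t p-value would be slightly larger).

```python
import math
from statistics import NormalDist

# Step 1: Ho: mu = 84 versus H1: mu != 84 (84 is a hypothetical null value)
mu0 = 84
# Step 2: significance level
alpha = 0.05
# Summary values from the Excel output for the 60 ages
mean, sd, n = 90.41666667, 30.22979318, 60
# Step 3: test statistic
se = sd / math.sqrt(n)
t = (mean - mu0) / se
# Step 4: two-sided p-value (normal approximation)
p_value = 2 * (1 - NormalDist().cdf(abs(t)))
# Step 5: describe the result
reject = p_value < alpha
print(f"t = {t:.3f}, p = {p_value:.3f}; reject Ho at alpha = {alpha}: {reject}")
```

Here the p-value is about 0.10, so at the 0.05 level these data do not provide sufficient evidence that the mean age differs from 84 months.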

Point Estimation

[Figure: point estimator and sample distribution; parameter and population distribution.]

• A point estimate draws inference about a population by estimating the value of an unknown parameter using a single value, or point.

Interval Estimation

[Figure: interval estimator and sample distribution; parameter and population distribution.]

• An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.
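For the age data, the point estimate of the population mean is the sample mean, and a 95% interval estimate can be built from the standard error in the Excel summary. This sketch uses the normal critical value (about 1.96); for n = 60 the t critical value with 59 degrees of freedom is about 2.00, which would give a slightly wider interval.

```python
import math
from statistics import NormalDist

mean, sd, n = 90.41666667, 30.22979318, 60  # from the Excel summary of the 60 ages
se = sd / math.sqrt(n)  # standard error of the mean

point_estimate = mean                  # point estimation: a single value
z = NormalDist().inv_cdf(0.975)        # about 1.96 for 95% confidence
ci = (mean - z * se, mean + z * se)    # interval estimation: a range of values

print(f"Point estimate: {point_estimate:.2f} months")
print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f} months")
```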

P-Value versus the Confidence Interval

Two main ways to assess study precision and the role of chance in a study:

- P-value: measures (as a probability) the evidence against the null hypothesis. It is the probability, if Ho were true, of obtaining a result at least as extreme as the one observed; smaller values indicate stronger evidence against Ho.

- Confidence interval (CI): an interval within which the value of the parameter is expected to lie with a specified level of confidence.

- E.g., a 95% CI implies that if one repeated a study 100 times, the interval would contain the true measure of association in about 95 of the 100 repetitions.
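The repeated-study interpretation can be checked by simulation: draw many samples from a population with a known mean, build a 95% CI from each, and count how many intervals contain the true value. The population values below are hypothetical, and the intervals use the normal critical value with the sample SD, so coverage is slightly under 95% for n = 60.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

rng = random.Random(42)
true_mean, sigma, n, studies = 90.0, 30.0, 60, 1000  # hypothetical population values
z = NormalDist().inv_cdf(0.975)

covered = 0
for _ in range(studies):
    sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
    m, se = fmean(sample), stdev(sample) / math.sqrt(n)
    if m - z * se <= true_mean <= m + z * se:  # does this study's CI contain the truth?
        covered += 1
coverage = covered / studies
print(f"{covered} of {studies} intervals contain the true mean")
```

The true mean never changes; it is the interval that varies from study to study.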

Procedures for Sample Size Calculation

- Selection of the primary variables of interest and formulation of hypotheses

- Information on the standard deviation (if numeric) or proportion (if categorical)

- A tolerance level of significance (α)

- Selection of a reasonable test statistic

- Power or confidence level

- A scientifically or clinically meaningful effect (difference)
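As one common example of how these ingredients combine, the normal-approximation formula for comparing two means with equal group sizes is n per group = 2 (z₁₋α/₂ + z_power)² σ² / Δ². The sketch below implements that formula; it is one standard choice, not necessarily the procedure used in this course, and the function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(sd, difference, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided comparison of two means.

    Normal-approximation formula
        n = 2 * (z_{1-alpha/2} + z_{power})**2 * sd**2 / difference**2,
    assuming equal group sizes and a common standard deviation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / difference ** 2
    return math.ceil(n)  # round up to a whole subject

# Hypothetical inputs: SD of 30 months, meaningful difference of 15 or 10 months
print(n_per_group(sd=30, difference=15))
print(n_per_group(sd=30, difference=10))
```

Only the ratio of the difference to the SD matters, and halving it roughly quadruples the required sample size.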

Brief Concept of Statistical Software

There are many software packages to perform statistical analysis and visualization of data. Some of them are:

- Statistical Analysis System (SAS), S-plus, R, Matlab, Minitab, BMDP, STATA, SPSS, StatXact, Statistica, LISREL, JMP, GLIM, HIL, MS Excel, etc. We will discuss MS Excel and SPSS in brief.

Useful websites:

- http://www.R-project.org (a free but powerful statistical software)

Microsoft Excel

A spreadsheet application. It features calculation, graphing tools, pivot tables and a macro programming language called VBA (Visual Basic for Applications).

There are many versions of MS Excel; Excel XP, Excel 2003 and Excel 2007 are capable of performing a number of statistical analyses.

Starting MS Excel: double-click on the Microsoft Excel icon on the desktop, or click on Start > Programs > Microsoft Excel.

Worksheet: consists of a grid of cells with numbered rows down the page and alphabetically titled columns across the page. Each cell is referenced by its coordinates. For example, A3 refers to the cell in column A and row 3, and B10:B20 refers to the range of cells in column B and rows 10 through 20.
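The cell and range notation can be illustrated with a small Python helper that expands a range reference into individual cells. The helper names (`col_to_num`, `num_to_col`, `expand_range`) are hypothetical and not part of Excel; they only mimic how A1-style references are read (columns A to Z, then AA, AB, and so on).

```python
import re

def col_to_num(col):
    """Convert column letters to a 1-based index: A -> 1, Z -> 26, AA -> 27."""
    num = 0
    for ch in col:
        num = num * 26 + (ord(ch) - ord("A") + 1)
    return num

def num_to_col(num):
    """Inverse of col_to_num: 1 -> A, 26 -> Z, 27 -> AA."""
    letters = ""
    while num > 0:
        num, rem = divmod(num - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def expand_range(ref):
    """Expand a range such as 'B10:B20' or 'A1:D4' into individual cell references."""
    m = re.fullmatch(r"([A-Z]+)(\d+):([A-Z]+)(\d+)", ref)
    if m is None:
        raise ValueError(f"not a range reference: {ref}")
    c1, r1, c2, r2 = m.groups()
    cols = range(col_to_num(c1), col_to_num(c2) + 1)
    rows = range(int(r1), int(r2) + 1)
    return [f"{num_to_col(c)}{r}" for r in rows for c in cols]

print(expand_range("B10:B20"))
print(len(expand_range("A1:D4")))
```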

Creating formulas:

1. Click the cell in which you want to enter the formula.

2. Type = (an equal sign).

3. Click the Function button (fx).

4. Select the formula you want and step through the on-screen instructions.

Opening a document: File > Open (from an existing workbook). Change the directory, area or drive to look for files in other locations.

Creating a new workbook: File > New > Blank workbook.

Saving a file: File > Save.

Selecting more than one cell: click on a cell (e.g., A1), then hold the Shift key and click on another (e.g., D4) to select the cells between A1 and D4; or click on a cell and drag the mouse across the desired range.

Entering date and time: Dates are displayed as MM/DD/YYYY (Excel stores them internally as serial numbers), but there is no need to enter them in that format. For example, Excel will recognize Jan 9 or jan-9 as 1/9/2007 and Jan 9 1999 as 1/9/1999. To enter today's date, press Ctrl and ; together. Use a or p to indicate AM or PM; for example, 8:30 p is interpreted as 8:30 PM. To enter the current time, press Ctrl, Shift and ; together.
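Behind the displayed date format, Excel stores each date as a serial day count and each time as a fraction of a day. A minimal Python sketch of that conversion, assuming Excel's default 1900 date system (whose day 0 is effectively 30 December 1899 for dates from 1 March 1900 onward); the function names are hypothetical.

```python
from datetime import date, timedelta

# Day 0 of Excel's default (1900) date system, valid for dates from 1 March 1900 on;
# Excel treats 1900 as a leap year, which shifts earlier serials by one.
EXCEL_EPOCH = date(1899, 12, 30)

def to_excel_serial(d):
    """Serial day number Excel stores for a calendar date."""
    return (d - EXCEL_EPOCH).days

def from_excel_serial(serial):
    """Calendar date for an Excel serial day number."""
    return EXCEL_EPOCH + timedelta(days=serial)

def time_fraction(hour, minute):
    """Excel stores a time of day as a fraction of 24 hours."""
    return (hour * 60 + minute) / (24 * 60)

print(to_excel_serial(date(2007, 1, 9)))  # Jan 9 typed in 2007
print(time_fraction(20, 30))              # 8:30 p
```

Because dates are plain numbers, subtracting two date cells in Excel gives the number of days between them.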

Copy and paste all cells in a sheet: Ctrl+A for selecting, Ctrl+C for copying and Ctrl+V for pasting.

Sorting data: Data > Sort > Sort By …

Descriptive statistics and other statistical methods: Tools > Data Analysis > (statistical method). If Data Analysis is not available, click on Tools > Add-Ins and then select Analysis ToolPak and Analysis ToolPak - VBA.

Statistical and mathematical functions: start with the = sign and then select a function from the function wizard (fx).

Inserting a chart: click on Chart Wizard (or Insert > Chart), select a chart type, give the input data range, update the chart options and select the output location (worksheet).

Importing data into Excel: File > Open > Files of type, click on the file, choose an option (Delimited/Fixed Width), choose the delimiters (Tab, Semicolon, Comma, Space, Other), then click Finish.

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.

Statistical Package for the Social Sciences (SPSS)

A general-purpose statistical package, SPSS is widely used in the social sciences, particularly in sociology and psychology.

SPSS can import data from almost any type of file to generate tabulated reports, plots of distributions and trends, descriptive statistics and complex statistical analyses.

Starting SPSS: double-click on SPSS on the desktop, or click on Programs > SPSS.

Opening an SPSS file: File > Open.

• Data Editor

Various pull-down menus appear at the top of the Data Editor window. These pull-down menus are at the heart of using SPSSWIN. The Data Editor menu items (with some of the uses of each menu) are:

FILE: used to open and save data files.

EDIT: used to copy and paste data values, find data in a file, and insert variables and cases. OPTIONS allows the user to set general preferences as well as the setup for the Navigator, Charts, etc.

VIEW: the user can change toolbars; value labels can be seen in cells instead of data values.

DATA: select, sort or weight cases; merge files.

TRANSFORM: compute new variables, recode variables, etc.

ANALYZE: perform various statistical procedures.

GRAPHS: create bar and pie charts, etc.

UTILITIES: add comments to accompany the data file (and other advanced features).

ADD-ons these are features not currently installed (advanced statistical procedures)

WINDOW switch between data syntax and navigator windows

HELP to access SPSSWIN Help information

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Navigator (Output) Menus

When statistical procedures are run or charts are created the output will appear in the Navigator window The Navigator window contains many of the pull-down menus found in the Data Editor window Some of the important menus in the Navigator window include

INSERT used to insert page breaks titles charts etc

FORMAT for changing the alignment of a particular portion of the output

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Formatting Toolbar

When a table has been created by a statistical procedure the user can edit the table to create a desired look or adddelete information Beginning with version 140 the user has a choice of editing the table in the Output or opening it in a separate Pivot Table (DEFINE) window Various pulldown menus are activated when the user double clicks on the table These include

EDIT undo and redo a pivot select a table or table body (eg to change the font)

INSERT used to insert titles captions and footnotes

PIVOT used to perform a pivot of the row and column variables

FORMAT various modifications can be made to tables and cells

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing tab-delimited dataIn SPSSWIN click on FILE OPEN DATA Look in the appropriate location for rArr rArrthe text file Then select ldquoTextrdquo from ldquoFiles of typerdquo Click on the file name and then click on ldquoOpenrdquo You will see the Text Import Wizard ndash step 1 of 6 dialog box

You will now have an SPSS data file containing the former tab-delimited data You simply need to add variable and value labels and define missing values

Exporting Data to Excelclick on FILE SAVE AS Click on the File Name for the file to be exported For rArrthe ldquoSave as Typerdquo select from the pull-down menu Excel (xls) You will notice the checkbox for ldquowrite variable names to spreadsheetrdquo Leave this checked as you will want the variable names to be in the first row of each column in the Excel spreadsheet Finally click on Save

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Additional menusCHART EDITOR used to edit a graph

SYNTAX EDITOR used to edit the text in a syntax window

bull Show or hide a toolbar

Click on VIEW TOOLBARS 1048635to show it to hide itrArr rArr

bull Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location

bull Customize a toolbar

Click on VIEW TOOLBARS CUSTOMIZErArr rArr

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheetData from an Excel spreadsheet can be imported into SPSSWIN as follows1 In SPSSWIN click on FILE OPEN DATA The OPEN DATA FILE Dialog rArr rArrBox will appear2 Locate the file of interest Use the Look In pull-down list to identify the folder containing the Excel file of interest3 From the FILE TYPE pull down menu select EXCEL (xls)

4 Click on the file name of interest and click on OPEN or simply double-click on the file name

5 Keep the box checked that reads Read variable names from the first row of data This presumes that the first row of the Excel data file contains variable names in the first row [If the data resided in a different worksheet in the Excel file this would need to be entered]

6 Click on OK The Excel data file will now appear in the SPSSWIN Data Editor

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheet

7 The former EXCEL spreadsheet can now be saved as an SPSS file (FILE rArrSAVE AS) and is ready to be used in analyses Typically you would label variable and values and define missing values

Importing an Access tableSPSSWIN does not offer a direct import for Access tables Therefore we must follow these steps1 Open the Access file2 Open the data table3 Save the data as an Excel file4 Follow the steps outlined in the data import from Excel Spreadsheet to SPSSWIN

Importing Text Files into SPSSWINText data points typically are separated (or ldquodelimitedrdquo) by tabs or commas Sometimes they can be of fixed format

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Running the FREQUENCIES procedure

1 Open the data file (from the menus click on FILE OPEN DATA) of rArr rArrinterest

2 From the menus click on ANALYZE DESCRIPTIVE STATISTICS rArr rArrFREQUENCIES

3 The FREQUENCIES Dialog Box will appear In the left-hand box will be a listing (source variable list) of all the variables that have been defined in the data file The first step is identifying the variable(s) for which you want to run a frequency analysis Click on a variable name(s) Then click the [ gt ] pushbutton The variable name(s) will now appear in the VARIABLE[S] box (selected variable list) Repeat these steps for each variable of interest

4 If all that is being requested is a frequency table showing count percentages (raw adjusted and cumulative) then click on OK

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Requesting STATISTICSDescriptive and summary STATISTICS can be requested for numeric variables To request Statistics1 From the FREQUENCIES Dialog Box click on the STATISTICS pushbutton2 This will bring up the FREQUENCIES STATISTICS Dialog Box3 The STATISTICS Dialog Box offers the user a variety of choices

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE DESCRIPTIVE STATISTICS DESCRIPTIVES) The rArr rArrprocedure offers many of the same statistics as the FREQUENCIES procedure but without generating frequency analysis tables

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Requesting CHARTSOne can request a chart (graph) to be created for a variable or variables included in a FREQUENCIES procedure

1 In the FREQUENCIES Dialog box click on CHARTS2 The FREQUENCIES CHARTS Dialog box will appear Choose the intended chart (eg Bar diagram Pie chart histogram

Pasting charts into Word1 Click on the chart2 Click on the pulldown menu EDIT COPY OBJECTSrArr3 Go to the Word document in which the chart is to be embedded Click on EDIT rArr PASTE SPECIAL4 Select Formatted Text (RTF) and then click on OK5 Enlarge the graph to a desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible click once on the graph)

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

BASIC STATISTICAL PROCEDURES CROSSTABS

1 From the ANALYZE pull-down menu click on DESCRIPTIVE STATISTICS rArrCROSSTABS

2 The CROSSTABS Dialog Box will then open

3 From the variable selection box on the left click on a variable you wish to designate as the Row variable The values (codes) for the Row variable make up the rows of the crosstabs table Click on the arrow (gt) button for Row(s) Next click on a different variable you wish to designate as the Column variable The values (codes) for the Column variable make up the columns of the crosstabstable Click on the arrow (gt) button for Column(s)

4 You can specify more than one variable in the Row(s) andor Column(s) A cross table will be generated for each combination of Row and Column variables

Limitations SPSS users have less control over data manipulation and statistical output than other statistical packages such as SAS Stata etc

SPSS is a good first statistical package to perform quantitative research in social science because it is easy to use and because it can be a good starting point to learn more advanced statistical packages

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

QuestionsQuestions

Distribution of a variableDistribution of a variable

Distribution - (of a variable) tells us what values the

variable takes and how often it takes these values Eg

distribution of some 26 pediatric patients of ages 1 to 6

at AIDHC are as follows-

Age 1 2 3 4 5 6

Frequency 5 3 7 5 4 2

StatisticsStatistics

Science of data collection summarization analysis

and interpretation

Descriptive versus Inferential Statistics

ndash Descriptive Statistic Data description

(summarization) such as center variability and

shape

ndash Inferential Statistic Drawing conclusion beyond the

sample studied allowing for prediction

A Taxonomy ofA Taxonomy of StatisticsStatistics

How does statistics help usHow does statistics help us

Age Distribution

0

2

4

6

8

10

12

14

16

40 60 80 100 120 140 More

Age in Month

Nu

mb

er o

f S

ub

ject

s

Ages (in month) of the 60 patients in our data set 1 are- 71 127 65 82 140 53 114 56 84 65 67 134 64 hellip 91 51

Mean 9041666667

Standard Error 3902649518

Median 84

Mode 84

Standard Deviation 3022979318

Sample Variance 9138403955

Kurtosis -1183899591

Skewness 0389872725

Range 95

Minimum 48

Maximum 143

Sum 5425

Count 60

By simply looking at the data we fail to produce any informative account to describe the data how ever statistics produce a quick insight in to data using graphical and numerical statistical tools

60

80

10

01

20

14

0

Distribution of age

Ag

e (

mo

nth

)

Statistical Description of DataStatistical Description of Data

Statistics describes a numeric set of data by its

Center (mean median mode etc)

Variability (standard deviation range etc)

Shape (skewness kurtosis etc)

Statistics describes a categorical set of data by

Frequency percentage or proportion of each

category

Statistical inference is the process by which we acquire information about populations from samples

Two types of estimates for making inferencesndash Point estimationndash Interval estimate

Statistical Inference

Statistical Inference

Sample Population

Population and samplePopulation and sample

Population The entire collection of individuals or measurements about which information is desired

Sample A subset of the population selected for study

ndash Primary objective is to create a subset of population whose center spread and shape are as close as that of population

ndash Methods of sampling Random sampling stratified sampling systematic sampling cluster sampling multistage sampling area sampling qoata sampling etc

Parameter vs StatisticsParameter vs Statistics

Parameter

ndash Any statistical characteristic of a population

ndash Population mean population median population

standard deviation are examples of parameters

ndash Parameter describes the distribution of a population

ndash Parameters are fixed and usually unknown

Parameter vs StatisticsParameter vs Statistics

Statistic Any statistical characteristic of a sample

ndash Sample mean sample median sample standard

deviation are some examples of statistics

ndash Statistic describes the distribution of population

ndash Value of a statistic is known and is varies for different

samples

ndash Are used for making inference on parameter

Parameter vs StatisticsParameter vs Statistics

Statistical Issue To describe the distribution of a

population through census or making inference on

population distribution population parameter using sample

distribution statistic

Eg sample mean is an estimate of the population mean

Hypothesis TestingHypothesis Testing

Null hypothesis and Alternative hypothesis

Real Situation Ho is true Ho is false Reject Ho Type I

error (α) Correct Decision (1-)

D e c i s i o n

Accept Ho Correct Decision (1- α)

Type II Error ()

ElementsSteps in hypothesisElementsSteps in hypothesis

Hypothesis testing steps

ndash 1 Null (Ho) and alternative (H1)hypothesis specification

ndash 2 Selection of significance level (alpha) - 005 or 001

ndash 3 Calculating the test statistic ndasheg t F Chi-square

ndash 4 Calculating the probability value (p-value) or confidence

Interval

ndash 5 Describing the result and statistic in an understandable

way

Point estimator

Sample distribution

Parameter

Population distribution

bull A point estimate draws inference about a population by estimating the value of an unknown parameter using a single value or a point

Point Estimation

Interval estimatorSample distribution

bull An interval estimator draws inferences about a population by

estimating the value of an unknown parameter using an interval

Population distribution Parameter

Interval Estimation

P-Value versus the Confidence IntervalP-Value versus the Confidence Interval

Two main ways to assess study precision and the role of

chance in a study

ndash P value measures ( in probability) the evidence against

the null hypothesis

ndash An interval within which the value of the parameter lies

with a specified probability

ndash Eg 95 CI implies that if one repeats a study 100

times the true measure of association will lie inside the

CI in 95 out of 100 measures

Procedures for sample size Procedures for sample size calculationcalculation

Selection of primary variables of interest and formulation

of hypotheses

Information of standard deviation ( if numeric) or

proportion (if categorical)

A tolerance level of significance ()

Selection of reasonable test statistic

Power or Confidence level

A scientifically or clinically meaning effect difference

Brief concept of Statistical SoftwareBrief concept of Statistical Software

There are many software packages to perform statistical

analysis and visualization of data Some of them are-

ndash System for Statistical Analysis (SAS) S-plus R Matlab Minitab

BMDP STATA SPSS StatXact Statistica LISREL JMP

GLIM HIL MS Excel etc We will discuss MS Excel and SPSS in

brief

useful websites-

httpwwwR-projectorg (a free but powerful statistical software)

Microsoft ExcelMicrosoft Excel

A Spreadsheet Application It features calculation graphing tools pivot tables and a macro programming language called VBA (Visual Basic for Applications)

There are many versions of MS-Excel Excel XP Excel 2003 Excel 2007 are capable of performing a number of statistical analyses

Starting MS Excel Double click on the Microsoft Excel icon on the desktop or Click on Start --gt Programs --gt Microsoft Excel

Worksheet Consists of a multiple grid of cells with numbered rows down the page and alphabetically-tilted columns across the page Each cell is referenced by its coordinates For example A3 is used to refer to the cell in column A and row 3 B10B20 is used to refer to the range of cells in column B and rows 10 through 20

Creating Formulas: 1. Click the cell in which you want to enter the formula. 2. Type = (an equal sign). 3. Click the Function button (fx). 4. Select the formula you want and step through the on-screen instructions.

Opening a document: File → Open (for an existing workbook). Change the directory, area, or drive to look for files in other locations.

Creating a new workbook: File → New → Blank Document

Saving a file: File → Save

Selecting more than one cell: Click a cell (e.g., A1), then hold the Shift key and click another (e.g., D4) to select all cells between A1 and D4; or click a cell and drag the mouse across the desired range.

Entering Date and Time: Dates are displayed in MM/DD/YYYY format, but there is no need to type them that way. For example, Excel will recognize Jan 9 or jan-9 as 1/9/2007 (the current year) and Jan 9 1999 as 1/9/1999. To enter today's date, press Ctrl and ; (semicolon) together. Use a or p to indicate AM or PM; for example, 8:30 p is interpreted as 8:30 PM. To enter the current time, press Ctrl, Shift, and ; together (Ctrl+:).

Copy and paste all cells in a sheet: Ctrl+A to select, Ctrl+C to copy, and Ctrl+V to paste.

Sorting data: Data → Sort → Sort By …

Descriptive statistics and other statistical methods: Tools → Data Analysis → (statistical method). If Data Analysis is not available, click Tools → Add-Ins and select Analysis ToolPak and Analysis ToolPak - VBA.

Statistical and mathematical functions: Start with the '=' sign and then select the function from the function wizard (fx).

Inserting a chart: Click the Chart Wizard (or Insert → Chart), select the chart type, give the input data range, update the chart options, and select the output location (worksheet).

Importing data into Excel: File → Open → select the file type → click the file → choose the data type (Delimited or Fixed width) → choose the delimiter (Tab, Semicolon, Comma, Space, Other) → Finish.
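The two text-import options, delimited and fixed width, correspond to two ways of splitting each line into fields. This is a minimal Python sketch using only the standard library. The column names, values, and field widths are hypothetical.

```python
import csv
import io

delimited_text = "id,group,age\n1,A,71\n2,B,127\n"
fixed_text = "  1A 71\n  2B127\n"

# Delimited: fields are separated by a character (comma here; "\t" for tab).
rows = list(csv.reader(io.StringIO(delimited_text), delimiter=","))
header, data = rows[0], rows[1:]

# Fixed width: each field occupies the same character positions on every line.
# Hypothetical layout: id in positions 0-2, group in position 3, age in positions 4-6.
widths = [("id", 0, 3), ("group", 3, 4), ("age", 4, 7)]
fixed = [
    {name: line[start:stop].strip() for name, start, stop in widths}
    for line in fixed_text.splitlines()
]

print(header, data)  # ['id', 'group', 'age'] [['1', 'A', '71'], ['2', 'B', '127']]
print(fixed)         # same records, parsed by position instead of by delimiter
```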

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.
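One classic example of this kind of rounding problem is the one-pass "textbook" variance formula Σx² - (Σx)²/n. It subtracts two huge, nearly equal numbers when the data have a large mean relative to their spread. The sketch below contrasts it with the two-pass formula, which subtracts the mean first. It illustrates the general issue only and makes no claim about which algorithm any particular Excel version uses.

```python
def naive_variance(xs):
    """One-pass formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1). Prone to cancellation."""
    n = len(xs)
    s = sum(xs)
    ss = sum(x * x for x in xs)
    return (ss - s * s / n) / (n - 1)


def two_pass_variance(xs):
    """Two-pass formula: compute the mean first, then sum squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# The sample variance of 4, 7, 13, 16 is exactly 30; adding 1e9 to every value
# should not change it.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(two_pass_variance(data))  # 30.0
print(naive_variance(data))     # not 30: the small differences are lost to rounding
```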

Statistical Package for the Social Sciences (SPSS)

A general-purpose statistical package, SPSS is widely used in the social sciences, particularly in sociology and psychology.

SPSS can import data from almost any type of file to generate tabulated reports, plots of distributions and trends, descriptive statistics, and complex statistical analyses.

Starting SPSS: Double-click SPSS on the desktop, or choose Programs → SPSS.

Opening an SPSS file: File → Open

• Data Editor

[Figure: SPSS menus and toolbars]

Various pull-down menus appear at the top of the Data Editor window. These pull-down menus are at the heart of using SPSSWIN. The Data Editor menu items (with some of their uses) are:

FILE: used to open and save data files

EDIT: used to copy and paste data values, find data in a file, and insert variables and cases; OPTIONS allows the user to set general preferences as well as the setup for the Navigator, charts, etc.

VIEW: the user can change toolbars; value labels can be shown in cells instead of data values

DATA: select, sort, or weight cases; merge files

TRANSFORM: compute new variables, recode variables, etc.

ANALYZE: perform various statistical procedures

GRAPHS: create bar charts, pie charts, etc.

UTILITIES: add comments to accompany the data file (and other advanced features)

ADD-ONS: features not currently installed (advanced statistical procedures)

WINDOW: switch between the data, syntax, and Navigator windows

HELP: access SPSSWIN Help information

Navigator (Output) Menus

When statistical procedures are run or charts are created, the output will appear in the Navigator window. The Navigator window contains many of the pull-down menus found in the Data Editor window. Some of the important menus in the Navigator window include:

INSERT: used to insert page breaks, titles, charts, etc.

FORMAT: used to change the alignment of a particular portion of the output

• Formatting Toolbar

When a table has been created by a statistical procedure, the user can edit it to create a desired look or to add/delete information. Beginning with version 14.0, the user can choose to edit the table in the Output window or open it in a separate Pivot Table (DEFINE) window. Various pull-down menus are activated when the user double-clicks the table. These include:

EDIT: undo and redo a pivot; select a table or table body (e.g., to change the font)

INSERT: used to insert titles, captions, and footnotes

PIVOT: used to perform a pivot of the row and column variables

FORMAT: various modifications can be made to tables and cells

Importing tab-delimited data: In SPSSWIN, click FILE → OPEN → DATA. Look in the appropriate location for the text file. Then select "Text" from "Files of type". Click the file name and then click "Open". You will see the Text Import Wizard – Step 1 of 6 dialog box.

After you complete the wizard, you will have an SPSS data file containing the former tab-delimited data. You simply need to add variable and value labels and define missing values.
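The remaining work after a tab-delimited import is attaching value labels and declaring missing-value codes. The Python sketch below mimics that idea; it is not how SPSS stores data. SPSS keeps the code and displays the label, while this sketch simply substitutes the label. The variable names, the 1/2 coding for sex, and 999 as the missing-value code are hypothetical choices for illustration.

```python
import csv
import io

raw = "id\tsex\tage\n1\t1\t71\n2\t2\t999\n3\t2\t65\n"

value_labels = {"sex": {"1": "Male", "2": "Female"}}  # hypothetical coding
missing_codes = {"age": {"999"}}                       # hypothetical missing-value code

records = []
for row in csv.DictReader(io.StringIO(raw), delimiter="\t"):
    for var, codes in missing_codes.items():
        if row[var] in codes:
            row[var] = None  # treat the code as missing instead of a real value
    for var, labels in value_labels.items():
        row[var] = labels.get(row[var], row[var])  # replace the code with its label
    records.append(row)

print(records)
```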

Exporting data to Excel: Click FILE → SAVE AS. Click the file name of the file to be exported. For "Save as type", select Excel (*.xls) from the pull-down menu. You will notice the checkbox "Write variable names to spreadsheet". Leave this checked, as you will want the variable names in the first row of each column of the Excel spreadsheet. Finally, click Save.

• Additional menus

CHART EDITOR: used to edit a graph

SYNTAX EDITOR: used to edit the text in a syntax window

• Show or hide a toolbar

Click VIEW → TOOLBARS, then check a toolbar to show it or uncheck it to hide it.

• Move a toolbar

Click the toolbar (but not one of its pushbuttons) and then drag the toolbar to its new location.

• Customize a toolbar

Click VIEW → TOOLBARS → CUSTOMIZE.

Importing data from an Excel spreadsheet

Data from an Excel spreadsheet can be imported into SPSSWIN as follows:

1. In SPSSWIN, click FILE → OPEN → DATA. The OPEN DATA FILE dialog box will appear.

2. Locate the file of interest. Use the Look In pull-down list to identify the folder containing the Excel file of interest.

3. From the FILE TYPE pull-down menu, select EXCEL (*.xls).

4. Click the file name of interest and click OPEN, or simply double-click the file name.

5. Keep the box checked that reads "Read variable names from the first row of data". This presumes that the first row of the Excel data file contains variable names. [If the data resided in a different worksheet in the Excel file, this would need to be entered.]

6. Click OK. The Excel data file will now appear in the SPSSWIN Data Editor.

7. The former Excel spreadsheet can now be saved as an SPSS file (FILE → SAVE AS) and is ready to be used in analyses. Typically, you would label variables and values and define missing values.

Importing an Access table

SPSSWIN does not offer a direct import for Access tables. Therefore, we must follow these steps:

1. Open the Access file.

2. Open the data table.

3. Save the data as an Excel file.

4. Follow the steps outlined for importing data from an Excel spreadsheet into SPSSWIN.

Importing text files into SPSSWIN

Text data points typically are separated (or "delimited") by tabs or commas. Sometimes they are in fixed format.

Running the FREQUENCIES procedure

1. Open the data file of interest (from the menus, click FILE → OPEN → DATA).

2. From the menus, click ANALYZE → DESCRIPTIVE STATISTICS → FREQUENCIES.

3. The FREQUENCIES dialog box will appear. The left-hand box lists all the variables that have been defined in the data file (the source variable list). The first step is to identify the variable(s) for which you want to run a frequency analysis. Click a variable name, then click the [>] pushbutton. The variable name will now appear in the VARIABLE(S) box (the selected variable list). Repeat these steps for each variable of interest.

4. If all that is being requested is a frequency table showing counts and percentages (raw, adjusted, and cumulative), click OK.
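A FREQUENCIES table reports, for each value, a count, a percent of all cases, a valid percent that excludes missing cases, and a cumulative percent of valid cases. This is a minimal Python sketch of those four columns. It uses hypothetical data in which None marks a missing value, and it omits the separate Missing row that SPSS prints.

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (value, count, percent, valid_percent, cumulative_percent)."""
    total = len(values)
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        count = counts[value]
        valid_pct = 100 * count / len(valid)
        cumulative += valid_pct
        rows.append((value, count, 100 * count / total, valid_pct, cumulative))
    return rows


groups = ["A", "B", "A", "C", None, "A", "B", "C", "A", None]
for row in frequency_table(groups):
    print("{:<3} {:>3} {:>6.1f} {:>6.1f} {:>6.1f}".format(*row))
```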


Requesting STATISTICS

Descriptive and summary STATISTICS can be requested for numeric variables. To request statistics:

1. From the FREQUENCIES dialog box, click the STATISTICS pushbutton.

2. This will bring up the FREQUENCIES: STATISTICS dialog box.

3. The STATISTICS dialog box offers the user a variety of choices.

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click ANALYZE → DESCRIPTIVE STATISTICS → DESCRIPTIVES). The procedure offers many of the same statistics as the FREQUENCIES procedure, but without generating frequency tables.

Requesting CHARTS

One can request a chart (graph) to be created for a variable or variables included in a FREQUENCIES procedure:

1. In the FREQUENCIES dialog box, click CHARTS.

2. The FREQUENCIES: CHARTS dialog box will appear. Choose the intended chart (e.g., bar chart, pie chart, histogram).

Pasting charts into Word

1. Click the chart.

2. Click the pull-down menu EDIT → COPY OBJECTS.

3. Go to the Word document in which the chart is to be embedded. Click EDIT → PASTE SPECIAL.

4. Select Formatted Text (RTF) and then click OK.

5. Enlarge the graph to the desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible, click once on the graph).

BASIC STATISTICAL PROCEDURES: CROSSTABS

1. From the ANALYZE pull-down menu, click DESCRIPTIVE STATISTICS → CROSSTABS.

2. The CROSSTABS dialog box will then open.

3. From the variable selection box on the left, click a variable you wish to designate as the row variable. The values (codes) of the row variable make up the rows of the crosstabs table. Click the arrow (>) button for Row(s). Next, click a different variable you wish to designate as the column variable. The values (codes) of the column variable make up the columns of the crosstabs table. Click the arrow (>) button for Column(s).

4. You can specify more than one variable in Row(s) and/or Column(s). A cross table will be generated for each combination of row and column variables.
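A crosstab counts the cases for every combination of a row-variable value and a column-variable value. This is a short sketch with hypothetical data, using `collections.Counter`. It also returns the row and column totals (margins).

```python
from collections import Counter


def crosstab(rows, cols):
    """Count cases for each (row value, column value) pair; also return the margins."""
    cells = Counter(zip(rows, cols))
    row_values, col_values = sorted(set(rows)), sorted(set(cols))
    table = {r: {c: cells[(r, c)] for c in col_values} for r in row_values}
    return table, Counter(rows), Counter(cols)


group = ["A", "A", "B", "B", "B", "C"]  # hypothetical row variable
sex = ["F", "M", "F", "F", "M", "M"]    # hypothetical column variable
table, row_totals, col_totals = crosstab(group, sex)
for r, counts in table.items():
    print(r, counts, "row total:", row_totals[r])
print("column totals:", dict(col_totals))
```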

Limitations: SPSS users have less control over data manipulation and statistical output than users of other statistical packages such as SAS, Stata, etc.

SPSS is a good first statistical package for quantitative research in the social sciences because it is easy to use and can be a good starting point for learning more advanced statistical packages.

Questions

Statistics

Science of data collection, summarization, analysis, and interpretation

Descriptive versus Inferential Statistics

– Descriptive statistics: data description (summarization), such as center, variability, and shape

– Inferential statistics: drawing conclusions beyond the sample studied, allowing for prediction

A Taxonomy of Statistics

How does statistics help us?

[Figure: "Age Distribution" histogram; x-axis: Age in Month (bins 40, 60, 80, 100, 120, 140, More); y-axis: Number of Subjects (0 to 16)]

Ages (in months) of the 60 patients in our data set 1 are: 71, 127, 65, 82, 140, 53, 114, 56, 84, 65, 67, 134, 64, …, 91, 51

Mean                  90.41666667
Standard Error         3.902649518
Median                84
Mode                  84
Standard Deviation    30.22979318
Sample Variance      913.8403955
Kurtosis              -1.183899591
Skewness               0.389872725
Range                 95
Minimum               48
Maximum              143
Sum                 5425
Count                 60
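The same kind of summary can be computed outside Excel. This sketch uses Python's `statistics` module. The skewness and kurtosis lines use the sample-adjusted formulas documented for Excel's SKEW and KURT functions. We assume, without confirmation from the original, that the Data Analysis tool uses the same formulas. Because the full 60-value age list is elided above, the example runs on a small hypothetical data set rather than reproducing the table. The kurtosis formula needs at least four values.

```python
import math
import statistics


def describe(xs):
    """Summary statistics in the style of Excel's Descriptive Statistics output."""
    n = len(xs)
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)  # sample SD (n - 1 denominator)
    z = [(x - mean) / sd for x in xs]
    # Sample-adjusted skewness and excess kurtosis (assumed to match SKEW/KURT).
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "Mean": mean,
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(xs),
        "Mode": statistics.mode(xs),
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(xs),
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(xs) - min(xs),
        "Minimum": min(xs),
        "Maximum": max(xs),
        "Sum": sum(xs),
        "Count": n,
    }


for name, value in describe([2, 4, 4, 4, 5, 5, 7, 9]).items():
    print(f"{name:<20}{value}")
```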

By simply looking at the raw data, we fail to produce any informative account of them; statistics, however, give quick insight into the data using graphical and numerical tools.

[Figure: "Distribution of age"; Age (month) axis from 60 to 140]

Statistical Description of Data

Statistics describes a numeric set of data by its:

Center (mean, median, mode, etc.)

Variability (standard deviation, range, etc.)

Shape (skewness, kurtosis, etc.)

Statistics describes a categorical set of data by the:

Frequency, percentage, or proportion of each category

Statistical Inference

Statistical inference is the process by which we acquire information about populations from samples.

Two types of estimates are used for making inferences:

– Point estimation

– Interval estimation

[Figure: inference from a sample to the population]

Population and Sample

Population: The entire collection of individuals or measurements about which information is desired.

Sample: A subset of the population selected for study.

– Primary objective: to create a subset of the population whose center, spread, and shape are as close as possible to those of the population

– Methods of sampling: simple random sampling, stratified sampling, systematic sampling, cluster sampling, multistage sampling, area sampling, quota sampling, etc.
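Three of the sampling methods listed can be sketched with Python's `random` module. The population of 60 subject IDs split into three groups of 20 is hypothetical, chosen to echo the size of the course's default data set. The helper names are illustrative only. Cluster, multistage, area, and quota sampling need more structure than this sketch shows.

```python
import random


def simple_random_sample(population, n, rng):
    """Every subset of size n has the same chance of being chosen."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Random start, then every k-th unit, where k = N // n."""
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]


def stratified_sample(strata, fraction, rng):
    """Simple random sample of the same fraction within each stratum (proportional allocation)."""
    return {name: rng.sample(units, round(len(units) * fraction))
            for name, units in strata.items()}


rng = random.Random(2008)
subjects = list(range(1, 61))  # hypothetical subject IDs 1..60
groups = {"A": subjects[:20], "B": subjects[20:40], "C": subjects[40:]}

print(simple_random_sample(subjects, 12, rng))
print(systematic_sample(subjects, 12, rng))
print(stratified_sample(groups, 0.2, rng))
```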

Parameter vs. Statistic

Parameter

– Any statistical characteristic of a population

– Population mean, population median, and population standard deviation are examples of parameters

– A parameter describes the distribution of a population

– Parameters are fixed and usually unknown

Statistic: Any statistical characteristic of a sample

– Sample mean, sample median, and sample standard deviation are some examples of statistics

– A statistic describes the distribution of a sample

– The value of a statistic is known and varies from sample to sample

– Statistics are used for making inferences about parameters

Statistical issue: describing the distribution of a population through a census, or making inferences about the population distribution (population parameter) using the sample distribution (statistic).

E.g., the sample mean is an estimate of the population mean.
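The distinction shows up when several samples are drawn from one fixed population. The parameter (the population mean) never changes, while the statistic (the sample mean) changes from sample to sample. The population below is simulated and hypothetical.

```python
import random
from statistics import fmean

rng = random.Random(42)
population = [rng.gauss(90, 30) for _ in range(10_000)]  # hypothetical population
parameter = fmean(population)  # fixed; in practice usually unknown

sample_means = [fmean(rng.sample(population, 60)) for _ in range(5)]

print(f"population mean (parameter): {parameter:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic): {m:.2f}")
```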

Hypothesis Testing

Null hypothesis and alternative hypothesis

                            Real situation
Decision                    H0 is true                  H0 is false
Reject H0                   Type I error (α)            Correct decision (1 - β)
Fail to reject (accept) H0  Correct decision (1 - α)    Type II error (β)
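The two error rates in the table can be estimated by simulation. This sketch assumes a two-sided one-sample z-test with known σ, chosen only because it needs nothing beyond the standard library. When H0 is true, the rejection rate estimates α. When H0 is false, the rejection rate estimates the power, 1 - β. The mean shift, σ, and n are hypothetical and were picked so that the theoretical power is roughly 0.80.

```python
import math
import random
from statistics import NormalDist, fmean


def rejection_rate(true_mean, mu0=0.0, sigma=10.0, n=25, alpha=0.05, reps=2000, seed=7):
    """Share of simulated studies in which a two-sided z-test rejects H0: mean = mu0."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if abs(fmean(sample) - mu0) / se > critical:
            rejections += 1
    return rejections / reps


type_i = rejection_rate(true_mean=0.0)  # H0 true: estimates alpha (about 0.05)
power = rejection_rate(true_mean=5.6)   # H0 false: estimates 1 - beta (about 0.80)
print(f"Type I error rate ~ {type_i:.3f}")
print(f"Power ~ {power:.3f}, Type II error rate ~ {1 - power:.3f}")
```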

Elements/Steps in Hypothesis Testing

Hypothesis testing steps:

– 1. Specification of the null (H0) and alternative (H1) hypotheses

– 2. Selection of the significance level (alpha), e.g., 0.05 or 0.01

– 3. Calculation of the test statistic, e.g., t, F, chi-square

– 4. Calculation of the probability value (p-value) or confidence interval

– 5. Description of the result and statistic in an understandable way
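The five steps can be traced in code. This sketch makes simplifying assumptions. It uses a large-sample z-test for a mean, with the sample SD standing in for σ, rather than the t-test a real analysis of 60 subjects would typically use. The null value μ0 = 80 months is hypothetical. The summary figures (mean 90.41666667, SD 30.22979318, n = 60) are the age statistics from the Excel output.

```python
import math
from statistics import NormalDist


def z_test_mean(sample_mean, sample_sd, n, mu0, alpha=0.05):
    """Two-sided large-sample z-test of H0: mu = mu0 against H1: mu != mu0."""
    # Step 3: test statistic
    se = sample_sd / math.sqrt(n)
    z = (sample_mean - mu0) / se
    # Step 4: p-value = P(|Z| >= |z|) under H0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 5: decision, stated in plain terms
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return z, p_value, decision


# Step 1: H0: mean age = 80 months vs H1: mean age != 80 (hypothetical null value)
# Step 2: alpha = 0.05
z, p, decision = z_test_mean(90.41666667, 30.22979318, 60, mu0=80)
print(f"z = {z:.3f}, p = {p:.4f}: {decision}")
```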

Point Estimation

[Figure: a point estimator from the sample distribution estimates a parameter of the population distribution]

• A point estimate draws inference about a population by estimating the value of an unknown parameter using a single value, or point.

Interval Estimation

[Figure: an interval estimator from the sample distribution estimates a parameter of the population distribution]

• An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.
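From the age summary in the Excel output (mean 90.41666667 months, standard error 3.902649518), a point estimate and an interval estimate of the population mean age can be computed directly. This sketch uses the normal approximation, point estimate ± z·SE. With n = 60, a t-based interval would be slightly wider.

```python
from statistics import NormalDist


def mean_ci(sample_mean, standard_error, level=0.95):
    """Normal-approximation CI: point estimate +/- z * SE."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


point_estimate = 90.41666667  # sample mean age (months)
low, high = mean_ci(point_estimate, 3.902649518)
print(f"point estimate: {point_estimate:.2f} months")
print(f"95% CI: ({low:.2f}, {high:.2f})")  # about (82.77, 98.07)
```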

P-Value versus the Confidence Interval

Two main ways to assess study precision and the role of chance in a study:

– P-value: the probability, computed assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed; the smaller the p-value, the stronger the evidence against the null hypothesis

– Confidence interval (CI): an interval, computed from the sample, that is expected to contain the value of the parameter with a specified level of confidence

– E.g., a 95% CI implies that if one repeated a study 100 times, computing a CI each time, about 95 of the 100 intervals would contain the true measure of association
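The two approaches are closely linked. For the same z-based analysis, the two-sided p-value for H0: μ = μ0 falls below 0.05 exactly when μ0 lies outside the 95% CI. This is a small check of that correspondence under the normal-approximation assumption. It uses the age summary values (mean 90.41666667 months, standard error 3.902649518) and a few hypothetical null values.

```python
from statistics import NormalDist

mean, se = 90.41666667, 3.902649518  # sample mean age (months) and its standard error
std_normal = NormalDist()
z_crit = std_normal.inv_cdf(0.975)  # about 1.96
low, high = mean - z_crit * se, mean + z_crit * se  # 95% CI

results = []
for mu0 in (80, 84, 90, 97, 100):  # hypothetical null values
    p = 2 * (1 - std_normal.cdf(abs(mean - mu0) / se))  # two-sided p-value
    outside_ci = not (low <= mu0 <= high)
    results.append((mu0, p, outside_ci))
    print(f"mu0 = {mu0:>3}: p = {p:.4f}, outside 95% CI: {outside_ci}")
```

The correspondence relies on the test and the interval using the same standard error and the same reference distribution.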

Procedures for sample size Procedures for sample size calculationcalculation

Selection of primary variables of interest and formulation

of hypotheses

Information of standard deviation ( if numeric) or

proportion (if categorical)

A tolerance level of significance ()

Selection of reasonable test statistic

Power or Confidence level

A scientifically or clinically meaning effect difference

Brief concept of Statistical SoftwareBrief concept of Statistical Software

There are many software packages to perform statistical

analysis and visualization of data Some of them are-

ndash System for Statistical Analysis (SAS) S-plus R Matlab Minitab

BMDP STATA SPSS StatXact Statistica LISREL JMP

GLIM HIL MS Excel etc We will discuss MS Excel and SPSS in

brief

useful websites-

httpwwwR-projectorg (a free but powerful statistical software)

Microsoft ExcelMicrosoft Excel

A Spreadsheet Application It features calculation graphing tools pivot tables and a macro programming language called VBA (Visual Basic for Applications)

There are many versions of MS-Excel Excel XP Excel 2003 Excel 2007 are capable of performing a number of statistical analyses

Starting MS Excel Double click on the Microsoft Excel icon on the desktop or Click on Start --gt Programs --gt Microsoft Excel

Worksheet Consists of a multiple grid of cells with numbered rows down the page and alphabetically-tilted columns across the page Each cell is referenced by its coordinates For example A3 is used to refer to the cell in column A and row 3 B10B20 is used to refer to the range of cells in column B and rows 10 through 20

Microsoft ExcelMicrosoft Excel

Creating Formulas 1 Click the cell that you want to enter the formula 2 Type = (an equal sign) 3 Click the Function Button 4 Select the formula you want and step through the on-screen instructions

xf

Opening a document File Open (From a existing workbook) Change the directory area or drive to look for file in other locations

Creating a new workbook FileNewBlank Document

Saving a File FileSave

Selecting more than one cell Click on a cell eg A1) then hold the Shift key and click on another (eg D4) to select cells between and A1 and D4 or Click on a cell and drag the mouse across the desired range

Microsoft ExcelMicrosoft Excel

Entering Date and Time Dates are stored as MMDDYYYY No need to enter in that format For example Excel will recognize Jan 9 or jan-9 as 192007 and Jan 9 1999 as 191999 To enter todayrsquos date press Ctrl and together Use a or p to indicate am or pm For example 830 p is interpreted as 830 pm To enter current time press Ctrl and together

Copy and Paste all cells in a Sheet Ctrl+A for selecting Ctrl +C for copying and Ctrl+V for Pasting

Sorting Data Sort Sort By hellip

Descriptive Statistics and other Statistical methods ToolsData Analysis Statistical method If Data Analysis is not available then click on Tools Add-Ins and then select Analysis ToolPack and Analysis toolPack-Vba

Microsoft ExcelMicrosoft Excel

Statistical and Mathematical Function Start with lsquo=lsquo sign and then select function from function wizard xf

Inserting a Chart Click on Chart Wizard (or InsertChart) select chart give Input data range Update the Chart options and Select output range Worksheet

Importing Data in Excel File open FileType Click on File Choose Option ( DelimitedFixed Width) Choose Options (Tab Semicolon Comma Space Other) Finish

Limitations Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extremecases

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

A general purpose statistical package SPSS is widely used in the social sciences particularly in sociology and psychology

SPSS can import data from almost any type of file to generate tabulated reports plots of distributions and trends descriptive statistics and complex statistical analyzes

Starting SPSS Double Click on SPSS on desktop or ProgramSPSS

Opening a SPSS file FileOpen

bull Data Editor

Various pull-down menus appear at the top of the Data Editor window These pull-down menus are at the heart of using SPSSWIN The Data Editor menu items (with some of the uses of the menu) are

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

FILE used to open and save data files

EDIT used to copy and paste data values used to find data in a file insert variables and cases OPTIONS allows the user to set general preferences as well as the setup for the Navigator Charts etc

VIEW user can change toolbars value labels can be seen in cells instead of data values

DATA select sort or weight cases merge files

MENUS AND TOOLBARS

TRANSFORM Compute new variables recode variables etc

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

ANALYZE perform various statistical procedures

GRAPHS create bar and pie charts etc

UTILITIES add comments to accompany data file (and other advanced features)

ADD-ons these are features not currently installed (advanced statistical procedures)

WINDOW switch between data syntax and navigator windows

HELP to access SPSSWIN Help information

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Navigator (Output) Menus

When statistical procedures are run or charts are created the output will appear in the Navigator window The Navigator window contains many of the pull-down menus found in the Data Editor window Some of the important menus in the Navigator window include

INSERT used to insert page breaks titles charts etc

FORMAT for changing the alignment of a particular portion of the output

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Formatting Toolbar

When a table has been created by a statistical procedure the user can edit the table to create a desired look or adddelete information Beginning with version 140 the user has a choice of editing the table in the Output or opening it in a separate Pivot Table (DEFINE) window Various pulldown menus are activated when the user double clicks on the table These include

EDIT undo and redo a pivot select a table or table body (eg to change the font)

INSERT used to insert titles captions and footnotes

PIVOT used to perform a pivot of the row and column variables

FORMAT various modifications can be made to tables and cells

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing tab-delimited dataIn SPSSWIN click on FILE OPEN DATA Look in the appropriate location for rArr rArrthe text file Then select ldquoTextrdquo from ldquoFiles of typerdquo Click on the file name and then click on ldquoOpenrdquo You will see the Text Import Wizard ndash step 1 of 6 dialog box

You will now have an SPSS data file containing the former tab-delimited data You simply need to add variable and value labels and define missing values

Exporting Data to Excelclick on FILE SAVE AS Click on the File Name for the file to be exported For rArrthe ldquoSave as Typerdquo select from the pull-down menu Excel (xls) You will notice the checkbox for ldquowrite variable names to spreadsheetrdquo Leave this checked as you will want the variable names to be in the first row of each column in the Excel spreadsheet Finally click on Save

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Additional menusCHART EDITOR used to edit a graph

SYNTAX EDITOR used to edit the text in a syntax window

bull Show or hide a toolbar

Click on VIEW TOOLBARS 1048635to show it to hide itrArr rArr

bull Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location

bull Customize a toolbar

Click on VIEW TOOLBARS CUSTOMIZErArr rArr

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheetData from an Excel spreadsheet can be imported into SPSSWIN as follows1 In SPSSWIN click on FILE OPEN DATA The OPEN DATA FILE Dialog rArr rArrBox will appear2 Locate the file of interest Use the Look In pull-down list to identify the folder containing the Excel file of interest3 From the FILE TYPE pull down menu select EXCEL (xls)

4 Click on the file name of interest and click on OPEN or simply double-click on the file name

5 Keep the box checked that reads Read variable names from the first row of data This presumes that the first row of the Excel data file contains variable names in the first row [If the data resided in a different worksheet in the Excel file this would need to be entered]

6 Click on OK The Excel data file will now appear in the SPSSWIN Data Editor

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheet

7 The former EXCEL spreadsheet can now be saved as an SPSS file (FILE rArrSAVE AS) and is ready to be used in analyses Typically you would label variable and values and define missing values

Importing an Access tableSPSSWIN does not offer a direct import for Access tables Therefore we must follow these steps1 Open the Access file2 Open the data table3 Save the data as an Excel file4 Follow the steps outlined in the data import from Excel Spreadsheet to SPSSWIN

Importing Text Files into SPSSWINText data points typically are separated (or ldquodelimitedrdquo) by tabs or commas Sometimes they can be of fixed format

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Running the FREQUENCIES procedure

1 Open the data file (from the menus click on FILE OPEN DATA) of rArr rArrinterest

2 From the menus click on ANALYZE DESCRIPTIVE STATISTICS rArr rArrFREQUENCIES

3 The FREQUENCIES Dialog Box will appear In the left-hand box will be a listing (source variable list) of all the variables that have been defined in the data file The first step is identifying the variable(s) for which you want to run a frequency analysis Click on a variable name(s) Then click the [ gt ] pushbutton The variable name(s) will now appear in the VARIABLE[S] box (selected variable list) Repeat these steps for each variable of interest

4 If all that is being requested is a frequency table showing count percentages (raw adjusted and cumulative) then click on OK

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Requesting STATISTICSDescriptive and summary STATISTICS can be requested for numeric variables To request Statistics1 From the FREQUENCIES Dialog Box click on the STATISTICS pushbutton2 This will bring up the FREQUENCIES STATISTICS Dialog Box3 The STATISTICS Dialog Box offers the user a variety of choices

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE DESCRIPTIVE STATISTICS DESCRIPTIVES) The rArr rArrprocedure offers many of the same statistics as the FREQUENCIES procedure but without generating frequency analysis tables

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Requesting CHARTSOne can request a chart (graph) to be created for a variable or variables included in a FREQUENCIES procedure

1 In the FREQUENCIES Dialog box click on CHARTS2 The FREQUENCIES CHARTS Dialog box will appear Choose the intended chart (eg Bar diagram Pie chart histogram

Pasting charts into Word1 Click on the chart2 Click on the pulldown menu EDIT COPY OBJECTSrArr3 Go to the Word document in which the chart is to be embedded Click on EDIT rArr PASTE SPECIAL4 Select Formatted Text (RTF) and then click on OK5 Enlarge the graph to a desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible click once on the graph)

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

BASIC STATISTICAL PROCEDURES CROSSTABS

1 From the ANALYZE pull-down menu click on DESCRIPTIVE STATISTICS rArrCROSSTABS

2 The CROSSTABS Dialog Box will then open

3 From the variable selection box on the left click on a variable you wish to designate as the Row variable The values (codes) for the Row variable make up the rows of the crosstabs table Click on the arrow (gt) button for Row(s) Next click on a different variable you wish to designate as the Column variable The values (codes) for the Column variable make up the columns of the crosstabstable Click on the arrow (gt) button for Column(s)

4 You can specify more than one variable in the Row(s) andor Column(s) A cross table will be generated for each combination of Row and Column variables

Limitations SPSS users have less control over data manipulation and statistical output than other statistical packages such as SAS Stata etc

SPSS is a good first statistical package to perform quantitative research in social science because it is easy to use and because it can be a good starting point to learn more advanced statistical packages

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

QuestionsQuestions

A Taxonomy ofA Taxonomy of StatisticsStatistics

How does statistics help usHow does statistics help us

Age Distribution

0

2

4

6

8

10

12

14

16

40 60 80 100 120 140 More

Age in Month

Nu

mb

er o

f S

ub

ject

s

Ages (in month) of the 60 patients in our data set 1 are- 71 127 65 82 140 53 114 56 84 65 67 134 64 hellip 91 51

Mean 9041666667

Standard Error 3902649518

Median 84

Mode 84

Standard Deviation 3022979318

Sample Variance 9138403955

Kurtosis -1183899591

Skewness 0389872725

Range 95

Minimum 48

Maximum 143

Sum 5425

Count 60

By simply looking at the data we fail to produce any informative account to describe the data how ever statistics produce a quick insight in to data using graphical and numerical statistical tools

60

80

10

01

20

14

0

Distribution of age

Ag

e (

mo

nth

)

Statistical Description of DataStatistical Description of Data

Statistics describes a numeric set of data by its

Center (mean median mode etc)

Variability (standard deviation range etc)

Shape (skewness kurtosis etc)

Statistics describes a categorical set of data by

Frequency percentage or proportion of each

category

Statistical inference is the process by which we acquire information about populations from samples

Two types of estimates for making inferencesndash Point estimationndash Interval estimate

Statistical Inference

Statistical Inference

Sample Population

Population and samplePopulation and sample

Population The entire collection of individuals or measurements about which information is desired

Sample A subset of the population selected for study

ndash Primary objective is to create a subset of population whose center spread and shape are as close as that of population

ndash Methods of sampling Random sampling stratified sampling systematic sampling cluster sampling multistage sampling area sampling qoata sampling etc

Parameter vs StatisticsParameter vs Statistics

Parameter

ndash Any statistical characteristic of a population

ndash Population mean population median population

standard deviation are examples of parameters

ndash Parameter describes the distribution of a population

ndash Parameters are fixed and usually unknown

Statistic: any statistical characteristic of a sample

– Sample mean, sample median, and sample standard deviation are some examples of statistics
– A statistic describes the distribution of a sample
– The value of a statistic is known and varies from sample to sample
– Statistics are used for making inferences about parameters

Statistical issue: to describe the distribution of a population, either through a census or by making inferences about the population distribution (population parameters) using the sample distribution (statistics).

E.g., the sample mean is an estimate of the population mean.
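The contrast between a fixed parameter and a varying statistic can be shown with a short simulation. This sketch assumes a hypothetical, normally distributed population of 10,000 ages (mean about 90 months, SD about 30) and draws 1,000 samples of 60 from it; `sample_means` is an illustrative helper, not part of any package.

```python
import random
import statistics


def sample_means(population, n, repeats, rng):
    """Draw `repeats` simple random samples of size n and return each sample mean."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]


rng = random.Random(1)
# Hypothetical population of 10,000 ages (months), for illustration only.
population = [rng.gauss(90, 30) for _ in range(10_000)]
mu = statistics.fmean(population)                 # parameter: one fixed value
means = sample_means(population, 60, 1_000, rng)  # statistic: differs per sample

print(f"population mean (parameter): {mu:.2f}")
print(f"first five sample means (statistic): {[round(m, 2) for m in means[:5]]}")
print(f"average of the sample means: {statistics.fmean(means):.2f}")
```

Each sample mean lands near the population mean, but no two samples give exactly the same value; the spread of the sample means is much smaller than the spread of individual ages.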

Hypothesis Testing

Null hypothesis and alternative hypothesis

Decision versus real situation:

                         Ho is true                 Ho is false
Reject Ho                Type I error (α)           Correct decision (1 - β)
Do not reject (accept)   Correct decision (1 - α)   Type II error (β)
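The decision table can be checked by simulation. The sketch below assumes normally distributed data with a known SD (a hypothetical setting: mean 90, SD 30, n = 60) and a two-sided z-test at α = 0.05. When the true mean equals the hypothesized one, the share of rejections estimates the Type I error rate α. When it does not, the share estimates the power, 1 - β. The function `rejection_rate` is illustrative only.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(true_mean, mu0=90.0, sigma=30.0, n=60, alpha=0.05,
                   trials=10_000, seed=3):
    """Share of simulated studies in which a two-sided z-test rejects H0: mean = mu0.

    Assumes normally distributed data with a known standard deviation sigma.
    If true_mean == mu0 (H0 true) this estimates the Type I error rate (alpha);
    otherwise it estimates the power, 1 - beta.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        rejections += p_value < alpha
    return rejections / trials


print(f"H0 true  -> rejection rate (Type I error): {rejection_rate(90.0):.3f}")
print(f"H0 false -> rejection rate (power):        {rejection_rate(100.0):.3f}")
```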

Elements/Steps in Hypothesis Testing

Hypothesis testing steps:

1. Specification of the null (Ho) and alternative (H1) hypotheses
2. Selection of the significance level (alpha), e.g., 0.05 or 0.01
3. Calculation of the test statistic, e.g., t, F, chi-square
4. Calculation of the probability value (p-value) or confidence interval
5. Description of the result and statistic in an understandable way
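The five steps can be traced in a small worked example. The sketch assumes hypothetical counts in a 2x2 table (an outcome by two groups for 60 subjects). It uses Pearson's chi-square test of independence without continuity correction. With 1 degree of freedom, the statistic is the square of a standard normal variable, so Python's `statistics.NormalDist` is enough to obtain the p-value. The function name `chi_square_2x2` is hypothetical.

```python
from statistics import NormalDist


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the table [[a, b], [c, d]].

    No continuity correction; assumes every expected count is positive.
    Returns (statistic, p-value) using 1 degree of freedom.
    """
    observed = ((a, b), (c, d))
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 df, P(chi-square > x) = P(|Z| > sqrt(x)) with Z standard normal.
    p_value = 2 * (1 - NormalDist().cdf(chi2 ** 0.5))
    return chi2, p_value


# Step 1: H0: outcome is independent of group; H1: outcome depends on group.
# Step 2: choose the significance level.
alpha = 0.05
# Step 3: compute the test statistic from (hypothetical) observed counts.
chi2, p = chi_square_2x2(20, 10, 12, 18)
# Step 4: the p-value comes from the chi-square distribution with 1 df.
# Step 5: state the result in plain language.
decision = "reject H0" if p < alpha else "do not reject H0"
print(f"chi-square = {chi2:.2f}, p = {p:.3f}: {decision} at alpha = {alpha}")
```

With these counts the expected frequencies are 16 and 14 in each row, the statistic is about 4.29, and p falls below 0.05, so H0 is rejected at the 5% level.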

Point Estimation

[Figure: a point estimator computed from the sample distribution estimates a parameter of the population distribution]

• A point estimator draws inferences about a population by estimating the value of an unknown parameter using a single value, or point.

Interval Estimation

[Figure: an interval estimator computed from the sample distribution estimates a parameter of the population distribution]

• An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.
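A point estimate and an interval estimate of a mean can be computed side by side. This sketch uses the large-sample normal approximation (z multiplier) rather than the t distribution, only to stay within Python's standard library. The helper names `ci_from_summary` and `mean_ci` are hypothetical. The usage line plugs in the Sum 5425, Count 60, and SD 30.22979318 from the summary table above.

```python
import math
from statistics import NormalDist, fmean, stdev


def ci_from_summary(mean, sd, n, confidence=0.95):
    """Large-sample (normal approximation) confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * sd / math.sqrt(n)              # z * standard error
    return mean - half_width, mean + half_width


def mean_ci(sample, confidence=0.95):
    """Point estimate and interval estimate of a population mean from raw data."""
    point = fmean(sample)
    return point, ci_from_summary(point, stdev(sample), len(sample), confidence)


# Summary values from the age table: Sum 5425, Count 60, SD 30.22979318.
low, high = ci_from_summary(5425 / 60, 30.22979318, 60)
print(f"point estimate {5425 / 60:.2f}, 95% CI ({low:.2f}, {high:.2f})")
```

This prints a point estimate of 90.42 months and an interval of roughly 82.77 to 98.07 months. A t multiplier (about 2.00 for 59 degrees of freedom) would give a slightly wider interval.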

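P-Value versus the Confidence Interval

There are two main ways to assess study precision and the role of chance in a study:

– The p-value is the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed. The smaller the p-value, the stronger the evidence against the null hypothesis.
– The confidence interval (CI) is an interval computed from the sample by a procedure that captures the true value of the parameter with a specified probability (the confidence level) over repeated sampling.
– E.g., a 95% CI implies that if one repeated a study 100 times, the true measure of association would lie inside roughly 95 of the 100 intervals obtained.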
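The repeated-study interpretation of a 95% CI can be checked directly. The sketch below assumes a hypothetical population (normal, true mean 90, SD 30). It repeats a study of 60 subjects, builds a normal-approximation 95% interval each time, and counts how many intervals contain the true mean. The count varies around 95 from run to run rather than hitting exactly 95. The function name is illustrative.

```python
import random
from statistics import NormalDist, fmean, stdev


def count_covering_intervals(true_mean=90.0, sigma=30.0, n=60, studies=100,
                             confidence=0.95, seed=2008):
    """Repeat a hypothetical study `studies` times and count how many of the
    resulting confidence intervals contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    hits = 0
    for _ in range(studies):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        m = fmean(sample)
        half_width = z * stdev(sample) / n ** 0.5
        hits += m - half_width <= true_mean <= m + half_width
    return hits


print(f"{count_covering_intervals()} of 100 intervals contain the true mean")
```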

Procedures for Sample Size Calculation

– Selection of the primary variables of interest and formulation of hypotheses
– Information on the standard deviation (if numeric) or proportion (if categorical)
– A tolerated level of significance (α)
– Selection of a reasonable test statistic
– Power (1 - β) or confidence level
– A scientifically or clinically meaningful effect (difference)
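The ingredients listed above come together in a sample size formula. One common choice, assumed here, is the normal-approximation formula for comparing two independent means with equal group sizes and a two-sided test: n per group = 2 (z_{1-α/2} + z_{1-β})² σ² / Δ², rounded up. The function name and the inputs (SD 30 months, meaningful difference 15 months) are hypothetical. Other designs and outcomes need other formulas.

```python
import math
from statistics import NormalDist


def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate subjects per group to detect a difference `delta` between two
    means with a two-sided test, assuming a common standard deviation `sigma`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


# Hypothetical inputs: SD of 30 months, clinically meaningful difference of 15 months.
print(n_per_group(sigma=30, delta=15))              # 63 per group
print(n_per_group(sigma=30, delta=15, power=0.90))  # more power needs more subjects
```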

Brief Concept of Statistical Software

There are many software packages for performing statistical analysis and visualization of data. Some of them are:

– Statistical Analysis System (SAS), S-PLUS, R, MATLAB, Minitab, BMDP, Stata, SPSS, StatXact, Statistica, LISREL, JMP, GLIM, HIL, MS Excel, etc. We will discuss MS Excel and SPSS briefly.

Useful website:

http://www.R-project.org (free but powerful statistical software)

Microsoft Excel

A spreadsheet application. It features calculation, graphing tools, pivot tables, and a macro programming language called VBA (Visual Basic for Applications).

There are many versions of MS Excel; Excel XP, Excel 2003, and Excel 2007 can all perform a number of statistical analyses.

Starting MS Excel: double-click on the Microsoft Excel icon on the desktop, or click on Start --> Programs --> Microsoft Excel.

Worksheet: consists of a grid of cells with numbered rows down the page and alphabetically titled columns across the page. Each cell is referenced by its coordinates. For example, A3 refers to the cell in column A and row 3, and B10:B20 refers to the range of cells in column B, rows 10 through 20.
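To make the A1 referencing scheme concrete, here is a small hypothetical Python helper (not part of Excel). It converts a reference such as A3 into (row, column) numbers and expands a range such as B10:B20 into its cells. It assumes plain A1-style references without sheet names or $ signs.

```python
import re


def column_number(letters):
    """Convert column letters to a 1-based number: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def parse_cell(ref):
    """Split an A1-style reference into (row, column) numbers."""
    m = re.fullmatch(r"([A-Za-z]+)(\d+)", ref)
    if not m:
        raise ValueError(f"not an A1-style reference: {ref!r}")
    return int(m.group(2)), column_number(m.group(1))


def expand_range(ref):
    """List every (row, column) cell in a range such as 'B10:B20'."""
    first, last = ref.split(":")
    r1, c1 = parse_cell(first)
    r2, c2 = parse_cell(last)
    return [(r, c)
            for r in range(min(r1, r2), max(r1, r2) + 1)
            for c in range(min(c1, c2), max(c1, c2) + 1)]


print(parse_cell("A3"))               # (3, 1): row 3, column A
print(len(expand_range("B10:B20")))   # 11 cells
```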

Creating formulas:
1. Click the cell in which you want to enter the formula.
2. Type = (an equal sign).
3. Click the Function button (fx).
4. Select the formula you want and step through the on-screen instructions.

Opening a document: File --> Open (for an existing workbook). Change the directory, area, or drive to look for files in other locations.

Creating a new workbook: File --> New --> Blank Document.

Saving a file: File --> Save.

Selecting more than one cell: click on a cell (e.g., A1), then hold the Shift key and click on another (e.g., D4) to select the cells between A1 and D4; or click on a cell and drag the mouse across the desired range.

Entering date and time: Excel stores dates as serial numbers (in the default 1900 date system, day 1 is January 1, 1900) and displays them in a date format such as MM/DD/YYYY, so there is no need to enter dates in that format. For example, Excel will recognize Jan 9 or jan-9 as 1/9/2007 (the current year is assumed) and Jan 9 1999 as 1/9/1999. To enter today's date, press Ctrl and ; (semicolon) together. Use a or p to indicate AM or PM; for example, 8:30 p is interpreted as 8:30 PM. To enter the current time, press Ctrl, Shift, and ; together.
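The serial-number storage of dates can be reproduced outside Excel. This sketch assumes the 1900 date system (the Windows default; Excel also supports a 1904 system). It handles only serials from 61 (1900-03-01) onward, because Excel counts a non-existent 1900-02-29 as serial 60. The function names are hypothetical.

```python
from datetime import date, timedelta

# 1899-12-30 is used as day 0 so that serials from 61 (1900-03-01) onward come out
# right; Excel's 1900 date system counts a non-existent 1900-02-29 as serial 60.
EXCEL_EPOCH = date(1899, 12, 30)


def serial_to_date(serial):
    """Convert an Excel (1900 date system) serial day number to a date."""
    if serial < 61:
        raise ValueError("serials before 61 need special handling (1900 leap-year quirk)")
    return EXCEL_EPOCH + timedelta(days=serial)


def date_to_serial(d):
    """Convert a date on or after 1900-03-01 to its Excel serial day number."""
    serial = (d - EXCEL_EPOCH).days
    if serial < 61:
        raise ValueError("dates before 1900-03-01 need special handling")
    return serial


print(serial_to_date(39091))             # 2007-01-09, i.e. "Jan 9" typed in 2007
print(date_to_serial(date(1999, 1, 9)))  # serial stored for Jan 9 1999
```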

Copy and paste all cells in a sheet: Ctrl+A for selecting, Ctrl+C for copying, and Ctrl+V for pasting.

Sorting data: Data --> Sort --> Sort by …

Descriptive statistics and other statistical methods: Tools --> Data Analysis --> statistical method. If Data Analysis is not available, click on Tools --> Add-Ins and then select Analysis ToolPak and Analysis ToolPak - VBA.

Statistical and mathematical functions: start with the '=' sign and then select a function from the function wizard (fx).

Inserting a chart: click on Chart Wizard (or Insert --> Chart), select the chart type, give the input data range, update the chart options, and select the output range (worksheet).

Importing data into Excel: File --> Open --> select the file type --> click on the file --> choose an option (Delimited/Fixed Width) --> choose the delimiter (Tab, Semicolon, Comma, Space, Other) --> Finish.

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.
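The source does not say which Excel algorithms are affected, so the sketch below only illustrates the general kind of rounding problem meant here. The one-pass "sum of squares" variance formula loses all precision when the values are large relative to their spread, while a two-pass formula does not. The data and function names are hypothetical; Python's `statistics.variance`, which uses exact arithmetic internally, serves as a reference.

```python
import statistics


def one_pass_variance(x):
    """Textbook shortcut (sum of x^2 - (sum of x)^2 / n) / (n - 1); prone to cancellation."""
    n = len(x)
    total = sum(x)
    total_sq = sum(v * v for v in x)
    return (total_sq - total * total / n) / (n - 1)


def two_pass_variance(x):
    """Compute the mean first, then sum squared deviations from it."""
    n = len(x)
    mean = sum(x) / n
    return sum((v - mean) ** 2 for v in x) / (n - 1)


# Hypothetical data: the spread is small but every value is near one billion.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]  # exact sample variance is 30

print("one-pass:", one_pass_variance(data))    # cancellation destroys the answer
print("two-pass:", two_pass_variance(data))    # 30.0
print("statistics.variance:", statistics.variance(data))  # 30.0
```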

Statistical Package for the Social Sciences (SPSS)

SPSS, a general-purpose statistical package, is widely used in the social sciences, particularly in sociology and psychology.

SPSS can import data from almost any type of file to generate tabulated reports, plots of distributions and trends, descriptive statistics, and complex statistical analyses.

Starting SPSS: double-click on SPSS on the desktop, or go to Programs ⇒ SPSS.

Opening an SPSS file: File ⇒ Open.

MENUS AND TOOLBARS

• Data Editor

Various pull-down menus appear at the top of the Data Editor window. These pull-down menus are at the heart of using SPSSWIN. The Data Editor menu items (with some of the uses of each menu) are:


FILE: used to open and save data files.

EDIT: used to copy and paste data values, find data in a file, and insert variables and cases. EDIT ⇒ OPTIONS allows the user to set general preferences as well as the setup for the Navigator, charts, etc.

VIEW: the user can change toolbars, and value labels can be shown in cells instead of data values.

DATA: select, sort, or weight cases; merge files.


TRANSFORM: compute new variables, recode variables, etc.


ANALYZE: perform various statistical procedures.

GRAPHS: create bar charts, pie charts, etc.

UTILITIES: add comments to accompany a data file (and other advanced features).

ADD-ONS: features not currently installed (advanced statistical procedures).

WINDOW: switch between the data, syntax, and Navigator windows.

HELP: access SPSSWIN Help information.


Navigator (Output) Menus

When statistical procedures are run or charts are created, the output appears in the Navigator window. The Navigator window contains many of the pull-down menus found in the Data Editor window. Some of the important menus in the Navigator window include:

INSERT: used to insert page breaks, titles, charts, etc.

FORMAT: used to change the alignment of a particular portion of the output.


• Formatting Toolbar

When a table has been created by a statistical procedure, the user can edit the table to create a desired look or to add or delete information. Beginning with version 14.0, the user has a choice of editing the table in the Output window or opening it in a separate Pivot Table (DEFINE) window. Various pull-down menus are activated when the user double-clicks on the table. These include:

EDIT: undo and redo a pivot; select a table or table body (e.g., to change the font).

INSERT: used to insert titles, captions, and footnotes.

PIVOT: used to perform a pivot of the row and column variables.

FORMAT: various modifications can be made to tables and cells.


Importing tab-delimited data: in SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. Look in the appropriate location for the text file. Then select "Text" from "Files of type". Click on the file name and then click on "Open". You will see the Text Import Wizard - Step 1 of 6 dialog box.

Once the wizard is finished, you will have an SPSS data file containing the former tab-delimited data. You simply need to add variable and value labels and define missing values.

Exporting data to Excel: click on FILE ⇒ SAVE AS. Click on the file name for the file to be exported. For "Save as type", select Excel (*.xls) from the pull-down menu. You will notice the checkbox for "Write variable names to spreadsheet". Leave it checked, as you will want the variable names in the first row of each column in the Excel spreadsheet. Finally, click on Save.


• Additional menus: CHART EDITOR, used to edit a graph; SYNTAX EDITOR, used to edit the text in a syntax window.

• Show or hide a toolbar

Click on VIEW ⇒ TOOLBARS, then check a toolbar to show it or uncheck it to hide it.

• Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location.

• Customize a toolbar

Click on VIEW ⇒ TOOLBARS ⇒ CUSTOMIZE.


Importing data from an Excel spreadsheet: data from an Excel spreadsheet can be imported into SPSSWIN as follows.

1. In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. The OPEN DATA FILE dialog box will appear.
2. Locate the file of interest. Use the Look In pull-down list to identify the folder containing the Excel file of interest.
3. From the FILE TYPE pull-down menu, select EXCEL (*.xls).
4. Click on the file name of interest and click on OPEN, or simply double-click on the file name.
5. Keep checked the box that reads "Read variable names from the first row of data". This presumes that the first row of the Excel data file contains variable names. [If the data resided in a different worksheet in the Excel file, this would need to be entered.]
6. Click on OK. The Excel data file will now appear in the SPSSWIN Data Editor.


7. The former Excel spreadsheet can now be saved as an SPSS file (FILE ⇒ SAVE AS) and is ready to be used in analyses. Typically, you would label variables and values and define missing values.

Importing an Access table: SPSSWIN does not offer a direct import for Access tables. Therefore, follow these steps:
1. Open the Access file.
2. Open the data table.
3. Save the data as an Excel file.
4. Follow the steps outlined for importing data from an Excel spreadsheet into SPSSWIN.

Importing text files into SPSSWIN: text data points are typically separated (or "delimited") by tabs or commas. Sometimes they are in fixed format.
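Outside SPSS, the two text layouts just described can be read with Python's standard `csv` module (delimited) and plain string slicing (fixed format). The sketch assumes the first row of a delimited file holds the variable names and that the fixed-format column positions are known in advance. The function names and sample records are hypothetical. As in SPSS, the values arrive as text and still need types, labels, and missing-value codes.

```python
import csv
import io


def read_delimited(text, delimiter="\t"):
    """Parse delimited text whose first row holds the variable names."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def read_fixed_width(text, fields):
    """Parse fixed-format text; fields is a list of (name, start, end),
    0-based column positions with `end` exclusive."""
    rows = []
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            rows.append({name: line[start:end].strip() for name, start, end in fields})
    return rows


tab_text = "id\tage\tgroup\n1\t71\tA\n2\t127\tB\n"
csv_text = "id,age,group\n1,71,A\n2,127,B\n"
fixed_text = "001 71A\n002127B\n"

print(read_delimited(tab_text))
print(read_delimited(csv_text, delimiter=","))
print(read_fixed_width(fixed_text, [("id", 0, 3), ("age", 3, 6), ("group", 6, 7)]))
```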


Running the FREQUENCIES procedure

1. Open the data file of interest (from the menus, click on FILE ⇒ OPEN ⇒ DATA).
2. From the menus, click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ FREQUENCIES.
3. The FREQUENCIES dialog box will appear. The left-hand box lists (source variable list) all the variables that have been defined in the data file. The first step is identifying the variable(s) for which you want to run a frequency analysis. Click on a variable name, then click the [ > ] pushbutton. The variable name will now appear in the VARIABLE(S) box (selected variable list). Repeat these steps for each variable of interest.
4. If all that is being requested is a frequency table showing counts and percentages (raw, adjusted, and cumulative), click on OK.
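The frequency table that FREQUENCIES produces can be mimicked in a few lines of Python. This sketch assumes the usual SPSS layout: Percent is based on all cases, Valid Percent excludes missing values, and Cumulative Percent accumulates the valid percentages. These correspond to the "raw", "adjusted", and "cumulative" percentages above. Representing missing values by `None` and the data themselves are assumptions of the sketch.

```python
from collections import Counter


def frequency_table(values, missing=(None,)):
    """Return rows of (category, count, percent, valid percent, cumulative percent)."""
    valid = [v for v in values if v not in missing]
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for category in sorted(counts):
        count = counts[category]
        percent = 100 * count / len(values)       # raw: denominator includes missing
        valid_percent = 100 * count / len(valid)  # adjusted: missing excluded
        cumulative += valid_percent
        rows.append((category, count, percent, valid_percent, cumulative))
    return rows


# Hypothetical group codes with one missing value.
groups = ["A", "B", "A", None, "C", "A"]
for row in frequency_table(groups):
    print("{:<3} {:>3} {:>7.1f} {:>7.1f} {:>7.1f}".format(*row))
```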


Requesting STATISTICS: descriptive and summary statistics can be requested for numeric variables. To request statistics:
1. From the FREQUENCIES dialog box, click on the STATISTICS pushbutton.
2. This will bring up the FREQUENCIES: STATISTICS dialog box.
3. The STATISTICS dialog box offers the user a variety of choices.

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ DESCRIPTIVES). The procedure offers many of the same statistics as the FREQUENCIES procedure, but without generating frequency tables.


Requesting CHARTS: one can request a chart (graph) to be created for a variable or variables included in a FREQUENCIES procedure.
1. In the FREQUENCIES dialog box, click on CHARTS.
2. The FREQUENCIES: CHARTS dialog box will appear. Choose the intended chart (e.g., bar chart, pie chart, or histogram).

Pasting charts into Word:
1. Click on the chart.
2. Click on the pull-down menu EDIT ⇒ COPY OBJECTS.
3. Go to the Word document in which the chart is to be embedded. Click on EDIT ⇒ PASTE SPECIAL.
4. Select Formatted Text (RTF) and then click on OK.
5. Enlarge the graph to the desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible, click once on the graph).


BASIC STATISTICAL PROCEDURES: CROSSTABS

1. From the ANALYZE pull-down menu, click on DESCRIPTIVE STATISTICS ⇒ CROSSTABS.
2. The CROSSTABS dialog box will then open.
3. From the variable selection box on the left, click on a variable you wish to designate as the row variable. The values (codes) of the row variable make up the rows of the crosstabs table. Click on the arrow (>) button for Row(s). Next, click on a different variable you wish to designate as the column variable. The values (codes) of the column variable make up the columns of the crosstabs table. Click on the arrow (>) button for Column(s).
4. You can specify more than one variable in Row(s) and/or Column(s). A cross table will be generated for each combination of row and column variables.
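The counting behind a crosstabs table is easy to sketch. The hypothetical `crosstab` helper below counts cases for every combination of row and column codes, as in step 3. Calling it once per pair of variables reproduces the "one table per combination" behavior in step 4. The variable names and codes are made up for illustration, and no chi-square or percentages are computed.

```python
from collections import Counter


def crosstab(row_values, col_values):
    """Count cases for every combination of row and column codes."""
    counts = Counter(zip(row_values, col_values))
    row_levels = sorted(set(row_values))
    col_levels = sorted(set(col_values))
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table


# Hypothetical codes for five subjects.
sex = ["M", "F", "F", "M", "F"]
group = ["A", "A", "B", "B", "B"]
rows, cols, table = crosstab(sex, group)
print("     " + "  ".join(cols))
for level, counts in zip(rows, table):
    print(f"{level:<5}" + "  ".join(str(c) for c in counts))
```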

Limitations: SPSS gives users less control over data manipulation and statistical output than other statistical packages such as SAS and Stata.

SPSS is a good first statistical package for quantitative research in the social sciences because it is easy to use and can be a good starting point for learning more advanced statistical packages.


Questions

How does statistics help usHow does statistics help us

Age Distribution

0

2

4

6

8

10

12

14

16

40 60 80 100 120 140 More

Age in Month

Nu

mb

er o

f S

ub

ject

s

Ages (in month) of the 60 patients in our data set 1 are- 71 127 65 82 140 53 114 56 84 65 67 134 64 hellip 91 51

Mean 9041666667

Standard Error 3902649518

Median 84

Mode 84

Standard Deviation 3022979318

Sample Variance 9138403955

Kurtosis -1183899591

Skewness 0389872725

Range 95

Minimum 48

Maximum 143

Sum 5425

Count 60

By simply looking at the data we fail to produce any informative account to describe the data how ever statistics produce a quick insight in to data using graphical and numerical statistical tools

60

80

10

01

20

14

0

Distribution of age

Ag

e (

mo

nth

)

Statistical Description of DataStatistical Description of Data

Statistics describes a numeric set of data by its

Center (mean median mode etc)

Variability (standard deviation range etc)

Shape (skewness kurtosis etc)

Statistics describes a categorical set of data by

Frequency percentage or proportion of each

category

Statistical inference is the process by which we acquire information about populations from samples

Two types of estimates for making inferencesndash Point estimationndash Interval estimate

Statistical Inference

Statistical Inference

Sample Population

Population and samplePopulation and sample

Population The entire collection of individuals or measurements about which information is desired

Sample A subset of the population selected for study

ndash Primary objective is to create a subset of population whose center spread and shape are as close as that of population

ndash Methods of sampling Random sampling stratified sampling systematic sampling cluster sampling multistage sampling area sampling qoata sampling etc

Parameter vs StatisticsParameter vs Statistics

Parameter

ndash Any statistical characteristic of a population

ndash Population mean population median population

standard deviation are examples of parameters

ndash Parameter describes the distribution of a population

ndash Parameters are fixed and usually unknown

Parameter vs StatisticsParameter vs Statistics

Statistic Any statistical characteristic of a sample

ndash Sample mean sample median sample standard

deviation are some examples of statistics

ndash Statistic describes the distribution of population

ndash Value of a statistic is known and is varies for different

samples

ndash Are used for making inference on parameter

Parameter vs StatisticsParameter vs Statistics

Statistical Issue To describe the distribution of a

population through census or making inference on

population distribution population parameter using sample

distribution statistic

Eg sample mean is an estimate of the population mean

Hypothesis TestingHypothesis Testing

Null hypothesis and Alternative hypothesis

Real Situation Ho is true Ho is false Reject Ho Type I

error (α) Correct Decision (1-)

D e c i s i o n

Accept Ho Correct Decision (1- α)

Type II Error ()

ElementsSteps in hypothesisElementsSteps in hypothesis

Hypothesis testing steps

ndash 1 Null (Ho) and alternative (H1)hypothesis specification

ndash 2 Selection of significance level (alpha) - 005 or 001

ndash 3 Calculating the test statistic ndasheg t F Chi-square

ndash 4 Calculating the probability value (p-value) or confidence

Interval

ndash 5 Describing the result and statistic in an understandable

way

Point estimator

Sample distribution

Parameter

Population distribution

bull A point estimate draws inference about a population by estimating the value of an unknown parameter using a single value or a point

Point Estimation

Interval estimatorSample distribution

bull An interval estimator draws inferences about a population by

estimating the value of an unknown parameter using an interval

Population distribution Parameter

Interval Estimation

P-Value versus the Confidence IntervalP-Value versus the Confidence Interval

Two main ways to assess study precision and the role of

chance in a study

ndash P value measures ( in probability) the evidence against

the null hypothesis

ndash An interval within which the value of the parameter lies

with a specified probability

ndash Eg 95 CI implies that if one repeats a study 100

times the true measure of association will lie inside the

CI in 95 out of 100 measures

Procedures for sample size Procedures for sample size calculationcalculation

Selection of primary variables of interest and formulation

of hypotheses

Information of standard deviation ( if numeric) or

proportion (if categorical)

A tolerance level of significance ()

Selection of reasonable test statistic

Power or Confidence level

A scientifically or clinically meaning effect difference

Brief concept of Statistical SoftwareBrief concept of Statistical Software

There are many software packages to perform statistical

analysis and visualization of data Some of them are-

ndash System for Statistical Analysis (SAS) S-plus R Matlab Minitab

BMDP STATA SPSS StatXact Statistica LISREL JMP

GLIM HIL MS Excel etc We will discuss MS Excel and SPSS in

brief

useful websites-

httpwwwR-projectorg (a free but powerful statistical software)

Microsoft ExcelMicrosoft Excel

A Spreadsheet Application It features calculation graphing tools pivot tables and a macro programming language called VBA (Visual Basic for Applications)

There are many versions of MS-Excel Excel XP Excel 2003 Excel 2007 are capable of performing a number of statistical analyses

Starting MS Excel Double click on the Microsoft Excel icon on the desktop or Click on Start --gt Programs --gt Microsoft Excel

Worksheet Consists of a multiple grid of cells with numbered rows down the page and alphabetically-tilted columns across the page Each cell is referenced by its coordinates For example A3 is used to refer to the cell in column A and row 3 B10B20 is used to refer to the range of cells in column B and rows 10 through 20

Microsoft ExcelMicrosoft Excel

Creating Formulas 1 Click the cell that you want to enter the formula 2 Type = (an equal sign) 3 Click the Function Button 4 Select the formula you want and step through the on-screen instructions

xf

Opening a document File Open (From a existing workbook) Change the directory area or drive to look for file in other locations

Creating a new workbook FileNewBlank Document

Saving a File FileSave

Selecting more than one cell Click on a cell eg A1) then hold the Shift key and click on another (eg D4) to select cells between and A1 and D4 or Click on a cell and drag the mouse across the desired range

Microsoft ExcelMicrosoft Excel

Entering Date and Time Dates are stored as MMDDYYYY No need to enter in that format For example Excel will recognize Jan 9 or jan-9 as 192007 and Jan 9 1999 as 191999 To enter todayrsquos date press Ctrl and together Use a or p to indicate am or pm For example 830 p is interpreted as 830 pm To enter current time press Ctrl and together

Copy and Paste all cells in a Sheet Ctrl+A for selecting Ctrl +C for copying and Ctrl+V for Pasting

Sorting Data Sort Sort By hellip

Descriptive Statistics and other Statistical methods ToolsData Analysis Statistical method If Data Analysis is not available then click on Tools Add-Ins and then select Analysis ToolPack and Analysis toolPack-Vba

Microsoft ExcelMicrosoft Excel

Statistical and Mathematical Function Start with lsquo=lsquo sign and then select function from function wizard xf

Inserting a Chart Click on Chart Wizard (or InsertChart) select chart give Input data range Update the Chart options and Select output range Worksheet

Importing Data in Excel File open FileType Click on File Choose Option ( DelimitedFixed Width) Choose Options (Tab Semicolon Comma Space Other) Finish

Limitations Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extremecases

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

A general purpose statistical package SPSS is widely used in the social sciences particularly in sociology and psychology

SPSS can import data from almost any type of file to generate tabulated reports plots of distributions and trends descriptive statistics and complex statistical analyzes

Starting SPSS Double Click on SPSS on desktop or ProgramSPSS

Opening a SPSS file FileOpen

bull Data Editor

Various pull-down menus appear at the top of the Data Editor window These pull-down menus are at the heart of using SPSSWIN The Data Editor menu items (with some of the uses of the menu) are

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

FILE used to open and save data files

EDIT used to copy and paste data values used to find data in a file insert variables and cases OPTIONS allows the user to set general preferences as well as the setup for the Navigator Charts etc

VIEW user can change toolbars value labels can be seen in cells instead of data values

DATA select sort or weight cases merge files

MENUS AND TOOLBARS

TRANSFORM Compute new variables recode variables etc

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

ANALYZE perform various statistical procedures

GRAPHS create bar and pie charts etc

UTILITIES add comments to accompany data file (and other advanced features)

ADD-ons these are features not currently installed (advanced statistical procedures)

WINDOW switch between data syntax and navigator windows

HELP to access SPSSWIN Help information

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Navigator (Output) Menus

When statistical procedures are run or charts are created the output will appear in the Navigator window The Navigator window contains many of the pull-down menus found in the Data Editor window Some of the important menus in the Navigator window include

INSERT used to insert page breaks titles charts etc

FORMAT for changing the alignment of a particular portion of the output

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Formatting Toolbar

When a table has been created by a statistical procedure the user can edit the table to create a desired look or adddelete information Beginning with version 140 the user has a choice of editing the table in the Output or opening it in a separate Pivot Table (DEFINE) window Various pulldown menus are activated when the user double clicks on the table These include

EDIT undo and redo a pivot select a table or table body (eg to change the font)

INSERT used to insert titles captions and footnotes

PIVOT used to perform a pivot of the row and column variables

FORMAT various modifications can be made to tables and cells

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing tab-delimited dataIn SPSSWIN click on FILE OPEN DATA Look in the appropriate location for rArr rArrthe text file Then select ldquoTextrdquo from ldquoFiles of typerdquo Click on the file name and then click on ldquoOpenrdquo You will see the Text Import Wizard ndash step 1 of 6 dialog box

You will now have an SPSS data file containing the former tab-delimited data You simply need to add variable and value labels and define missing values

Exporting Data to Excelclick on FILE SAVE AS Click on the File Name for the file to be exported For rArrthe ldquoSave as Typerdquo select from the pull-down menu Excel (xls) You will notice the checkbox for ldquowrite variable names to spreadsheetrdquo Leave this checked as you will want the variable names to be in the first row of each column in the Excel spreadsheet Finally click on Save

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Additional menusCHART EDITOR used to edit a graph

SYNTAX EDITOR used to edit the text in a syntax window

bull Show or hide a toolbar

Click on VIEW TOOLBARS 1048635to show it to hide itrArr rArr

bull Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location

bull Customize a toolbar

Click on VIEW TOOLBARS CUSTOMIZErArr rArr

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheetData from an Excel spreadsheet can be imported into SPSSWIN as follows1 In SPSSWIN click on FILE OPEN DATA The OPEN DATA FILE Dialog rArr rArrBox will appear2 Locate the file of interest Use the Look In pull-down list to identify the folder containing the Excel file of interest3 From the FILE TYPE pull down menu select EXCEL (xls)

4 Click on the file name of interest and click on OPEN or simply double-click on the file name

5 Keep the box checked that reads Read variable names from the first row of data This presumes that the first row of the Excel data file contains variable names in the first row [If the data resided in a different worksheet in the Excel file this would need to be entered]

6 Click on OK The Excel data file will now appear in the SPSSWIN Data Editor

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheet

7 The former EXCEL spreadsheet can now be saved as an SPSS file (FILE rArrSAVE AS) and is ready to be used in analyses Typically you would label variable and values and define missing values

Importing an Access tableSPSSWIN does not offer a direct import for Access tables Therefore we must follow these steps1 Open the Access file2 Open the data table3 Save the data as an Excel file4 Follow the steps outlined in the data import from Excel Spreadsheet to SPSSWIN

Importing Text Files into SPSSWINText data points typically are separated (or ldquodelimitedrdquo) by tabs or commas Sometimes they can be of fixed format

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Running the FREQUENCIES procedure

1 Open the data file (from the menus click on FILE OPEN DATA) of rArr rArrinterest

2 From the menus click on ANALYZE DESCRIPTIVE STATISTICS rArr rArrFREQUENCIES

3 The FREQUENCIES Dialog Box will appear In the left-hand box will be a listing (source variable list) of all the variables that have been defined in the data file The first step is identifying the variable(s) for which you want to run a frequency analysis Click on a variable name(s) Then click the [ gt ] pushbutton The variable name(s) will now appear in the VARIABLE[S] box (selected variable list) Repeat these steps for each variable of interest

4 If all that is being requested is a frequency table showing count percentages (raw adjusted and cumulative) then click on OK

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Requesting STATISTICSDescriptive and summary STATISTICS can be requested for numeric variables To request Statistics1 From the FREQUENCIES Dialog Box click on the STATISTICS pushbutton2 This will bring up the FREQUENCIES STATISTICS Dialog Box3 The STATISTICS Dialog Box offers the user a variety of choices

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE DESCRIPTIVE STATISTICS DESCRIPTIVES) The rArr rArrprocedure offers many of the same statistics as the FREQUENCIES procedure but without generating frequency analysis tables

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Requesting CHARTSOne can request a chart (graph) to be created for a variable or variables included in a FREQUENCIES procedure

1 In the FREQUENCIES Dialog box click on CHARTS2 The FREQUENCIES CHARTS Dialog box will appear Choose the intended chart (eg Bar diagram Pie chart histogram

Pasting charts into Word1 Click on the chart2 Click on the pulldown menu EDIT COPY OBJECTSrArr3 Go to the Word document in which the chart is to be embedded Click on EDIT rArr PASTE SPECIAL4 Select Formatted Text (RTF) and then click on OK5 Enlarge the graph to a desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible click once on the graph)

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

BASIC STATISTICAL PROCEDURES: CROSSTABS

1. From the ANALYZE pull-down menu, click on DESCRIPTIVE STATISTICS ⇒ CROSSTABS.

2. The CROSSTABS dialog box will open.

3. From the variable selection box on the left, click on the variable you wish to designate as the Row variable. The values (codes) of the Row variable make up the rows of the crosstabs table. Click on the arrow (>) button for Row(s). Next, click on a different variable you wish to designate as the Column variable. The values (codes) of the Column variable make up the columns of the crosstabs table. Click on the arrow (>) button for Column(s).

4. You can specify more than one variable in Row(s) and/or Column(s). A cross table will be generated for each combination of Row and Column variables.
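A crosstab is a count of every combination of the row variable's codes with the column variable's codes. The following minimal Python sketch (not SPSS) illustrates that idea. The variable names and data are hypothetical.

```python
from collections import Counter

def crosstab(rows, cols):
    """Count each (row value, column value) combination, as in a CROSSTABS table."""
    if len(rows) != len(cols):
        raise ValueError("row and column variables must have the same length")
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    # Combinations that never occur get a count of 0
    table = {r: {c: counts.get((r, c), 0) for c in col_levels} for r in row_levels}
    return row_levels, col_levels, table

# Hypothetical data: row variable = sex, column variable = treated
sex = ["F", "M", "F", "F", "M", "M", "F", "M"]
treated = ["yes", "no", "yes", "no", "no", "yes", "yes", "no"]
row_levels, col_levels, table = crosstab(sex, treated)

print("", *col_levels, "Total", sep="\t")
for r in row_levels:
    cells = [table[r][c] for c in col_levels]
    print(r, *cells, sum(cells), sep="\t")
```

Listing several row or column variables in SPSS simply repeats this counting once for each row/column pair.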

Limitations: SPSS users have less control over data manipulation and statistical output than users of other statistical packages such as SAS and Stata.

SPSS is a good first statistical package for quantitative research in the social sciences: it is easy to use, and it can be a good starting point for learning more advanced statistical packages.


Questions

Statistical Description of Data

Statistics describes a numeric set of data by its:

– Center (mean, median, mode, etc.)

– Variability (standard deviation, range, etc.)

– Shape (skewness, kurtosis, etc.)

Statistics describes a categorical set of data by:

– The frequency, percentage, or proportion of each category
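The minimal Python sketch below makes these summaries concrete using the standard `statistics` module. It computes center, variability and shape for a numeric variable, and counts and proportions for a categorical one. Skewness is computed with one common moment-based definition; other software may use a bias-adjusted formula, so values can differ slightly. The data are hypothetical.

```python
import statistics
from collections import Counter

def describe_numeric(data):
    """Center, variability and shape of a numeric variable."""
    n = len(data)
    mean = statistics.mean(data)
    # Moment-based skewness: mean cubed deviation / (population SD) cubed
    pop_sd = statistics.pstdev(data)
    skewness = sum((x - mean) ** 3 for x in data) / n / pop_sd ** 3
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "sd": statistics.stdev(data),        # sample SD (n - 1 denominator)
        "range": max(data) - min(data),
        "skewness": skewness,
    }

def describe_categorical(data):
    """Frequency and proportion of each category."""
    n = len(data)
    return {category: (count, count / n) for category, count in Counter(data).items()}

ages = [23, 25, 25, 30, 35, 41, 52]            # hypothetical numeric variable
blood = ["A", "O", "O", "B", "A", "O"]         # hypothetical categorical variable
print(describe_numeric(ages))
print(describe_categorical(blood))
```

A positive skewness, as for `ages` here, indicates a longer right tail: the mean is pulled above the median by the largest value.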

Statistical inference is the process by which we acquire information about populations from samples

Two types of estimates are used for making inferences:

– Point estimation

– Interval estimation

[Figure: Statistical Inference, drawing conclusions about a population from a sample]

Population and Sample

Population: the entire collection of individuals or measurements about which information is desired.

Sample: a subset of the population selected for study.

– The primary objective is to select a subset whose center, spread, and shape are as close as possible to those of the population.

– Methods of sampling: random sampling, stratified sampling, systematic sampling, cluster sampling, multistage sampling, area sampling, quota sampling, etc.
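Three of the listed sampling methods are illustrated in the minimal Python sketch below: simple random, systematic and stratified sampling. The population of 60 subject IDs mirrors the size of the course's default dataset. The 3-group stratum rule (`id % 3`) is a hypothetical stand-in for a real grouping variable.

```python
import random

def simple_random_sample(population, n, rng):
    # Every subset of size n is equally likely
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    k = len(population) // n      # sampling interval
    start = rng.randrange(k)      # random start within the first interval
    return population[start::k][:n]

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    # Group units by stratum, then draw a simple random sample within each
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample

rng = random.Random(2008)            # fixed seed so the draw is reproducible
population = list(range(1, 61))      # 60 hypothetical subject IDs
srs = simple_random_sample(population, 10, rng)
sys_s = systematic_sample(population, 10, rng)
strat = stratified_sample(population, lambda s: s % 3, 4, rng)
print(srs, sys_s, strat, sep="\n")
```

Stratification guarantees that each group is represented, which a simple random sample of the same size does not.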

Parameter vs Statistics

Parameter

– Any statistical characteristic of a population

– Population mean, population median, and population standard deviation are examples of parameters

– A parameter describes the distribution of a population

– Parameters are fixed and usually unknown


Statistic: any statistical characteristic of a sample

– Sample mean, sample median, and sample standard deviation are examples of statistics

– A statistic describes the distribution of a sample

– The value of a statistic is known and varies from sample to sample

– Statistics are used for making inferences about parameters


Statistical issue: to describe the distribution of a population through a census, or to make inferences about the population distribution (population parameters) using the sample distribution (sample statistics).

E.g., the sample mean is an estimate of the population mean.
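The minimal Python simulation below shows the parameter/statistic distinction. The population mean is computed once and stays fixed. The mean of each random sample is a statistic, and it changes from sample to sample. The population itself (1,000 values drawn around 120) is hypothetical.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical finite population of 1,000 measurements
population = [rng.gauss(120, 15) for _ in range(1000)]
mu = statistics.mean(population)          # parameter: fixed for this population

sample_means = []
for _ in range(5):
    sample = rng.sample(population, 30)   # a new random sample of size 30
    sample_means.append(statistics.mean(sample))  # statistic: varies by sample

print("population mean (parameter):", round(mu, 2))
print("sample means (statistics):", [round(m, 2) for m in sample_means])
```

Each sample mean is a point estimate of `mu`. The spread of these estimates around `mu` is what interval estimates and hypothesis tests quantify.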

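Hypothesis Testing

Null hypothesis and alternative hypothesis

| Decision              | Real situation: Ho is true | Real situation: Ho is false |
|-----------------------|----------------------------|-----------------------------|
| Reject Ho             | Type I error (α)           | Correct decision (1 - β)    |
| Accept (fail to reject) Ho | Correct decision (1 - α) | Type II error (β)          |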

Elements/Steps in Hypothesis Testing

Hypothesis testing steps:

– 1. Specify the null (Ho) and alternative (H1) hypotheses

– 2. Select the significance level (alpha), e.g., 0.05 or 0.01

– 3. Calculate the test statistic, e.g., t, F, chi-square

– 4. Calculate the probability value (p-value) or confidence interval

– 5. Describe the result and the statistic in an understandable way
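A one-sample z-test follows the five steps directly, as the minimal Python sketch below shows. The sketch assumes the population standard deviation `sigma` is known, so the standard normal distribution applies; with an unknown SD one would normally use a t-test instead. The data, `mu0` and `sigma` are hypothetical.

```python
import math
from statistics import NormalDist, mean

def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    # Step 1: Ho: mu = mu0 versus H1: mu != mu0 (two-sided)
    # Step 2: alpha is the chosen significance level
    n = len(sample)
    # Step 3: test statistic z = (sample mean - mu0) / (sigma / sqrt(n))
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    # Step 4: two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 5: decision, to be reported in plain language
    return z, p_value, p_value < alpha

weights = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5, 5.3, 5.7]   # hypothetical measurements
z, p, reject = one_sample_z_test(weights, mu0=5.0, sigma=0.5)
print(f"z = {z:.3f}, p = {p:.4f}, reject Ho at alpha = 0.05: {reject}")
```

Here z is about 2.76 and p is about 0.006. Ho would be rejected at α = 0.05, and also at α = 0.01.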

Point Estimation

[Figure: a point estimator computed from the sample distribution estimates a parameter of the population distribution]

• A point estimate draws an inference about a population by estimating the value of an unknown parameter using a single value, or point.
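In practice, point estimates are just sample statistics computed from the data, as the minimal Python sketch below shows. It uses the sample mean, the sample SD and the sample proportion. The blood-pressure and smoking data are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical systolic blood pressures (mmHg) for 8 subjects
systolic = [118, 125, 132, 121, 140, 128, 135, 119]
# Hypothetical smoking status for the same subjects (1 = smoker, 0 = non-smoker)
smokers = [1, 0, 0, 1, 0, 0, 0, 1]

point_mean = mean(systolic)              # estimates the population mean
point_sd = stdev(systolic)               # estimates the population SD
p_hat = sum(smokers) / len(smokers)      # estimates the population proportion

print(point_mean, round(point_sd, 2), p_hat)
```

Each estimate is a single number. It carries no information about its own precision, which is what interval estimation adds.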

Interval Estimation

[Figure: an interval estimator computed from the sample distribution brackets a parameter of the population distribution]

• An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.
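A common interval estimator for a mean is the sample mean plus or minus a critical value times the standard error, sketched minimally in Python below. The sketch uses the normal critical value (1.96 for 95%). That is an assumption that works best for large samples; for small samples, a t critical value gives a somewhat wider and more appropriate interval. The data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_ci(data, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    m = mean(data)
    se = stdev(data) / math.sqrt(len(data))             # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)      # about 1.96 for 95%
    return m - z * se, m + z * se

systolic = [118, 125, 132, 121, 140, 128, 135, 119]     # hypothetical data
lo95, hi95 = mean_ci(systolic, 0.95)
lo99, hi99 = mean_ci(systolic, 0.99)
print(f"95% CI: ({lo95:.1f}, {hi95:.1f})")
print(f"99% CI: ({lo99:.1f}, {hi99:.1f})")
```

A higher confidence level produces a wider interval for the same data.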

P-Value versus the Confidence Interval

Two main ways to assess study precision and the role of chance in a study:

– P-value: measures (as a probability) the evidence against the null hypothesis. It is the probability, assuming Ho is true, of obtaining a result at least as extreme as the one observed.

– Confidence interval (CI): an interval, computed from the sample, constructed to contain the true value of the parameter with a specified probability (the confidence level).

– E.g., a 95% CI implies that if one repeated a study 100 times, the true measure of association would lie inside the computed CI in about 95 of the 100 repetitions.
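The "repeat the study" interpretation can be checked by simulation, as in the minimal Python sketch below. The sketch repeatedly draws samples from a population whose mean is known, builds a 95% interval each time, and counts how often the interval contains that mean. To keep the coverage exactly 95% in theory, it assumes normal data with a known SD. The true mean, SD, sample size and number of repeats are all hypothetical choices.

```python
import math
import random
from statistics import NormalDist, mean

def ci_coverage(true_mean=50.0, sigma=10.0, n=60, repeats=2000,
                confidence=0.95, seed=2008):
    """Fraction of repeated studies whose CI contains the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / math.sqrt(n)   # known-sigma interval half-width
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        m = mean(sample)
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / repeats

print(f"Observed coverage: {ci_coverage():.3f}")
```

The observed coverage should land close to 0.95. Any single interval either contains the true value or does not; the 95% describes the long-run behavior of the procedure.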

Procedures for Sample Size Calculation

Selection of the primary variables of interest and formulation of hypotheses

Information on the standard deviation (if numeric) or proportion (if categorical)

A tolerance level of significance (α)

Selection of a reasonable test statistic

Power or confidence level

A scientifically or clinically meaningful effect (difference)
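The ingredients listed above come together in a standard normal-approximation formula for comparing two group means with a common standard deviation σ: n per group = 2(z₁₋α/₂ + z₁₋β)²σ² / δ², where δ is the meaningful difference. This design is an assumption for illustration; other designs (proportions, paired data, more groups) use different formulas. The minimal Python sketch below computes it, and the σ and δ values are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Two-sided, two-sample comparison of means, equal SDs, equal group sizes."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)                             # round up to whole subjects

# Hypothetical: SD of 10 units, clinically meaningful difference of 5 units
print(n_per_group(sigma=10, delta=5))               # 80% power
print(n_per_group(sigma=10, delta=5, power=0.90))   # 90% power
```

With these inputs, about 63 subjects per group give 80% power. Raising the power to 90% increases the requirement to about 85 per group.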

Brief Concept of Statistical Software

There are many software packages for statistical analysis and visualization of data. Some of them are:

– SAS (Statistical Analysis System), S-PLUS, R, MATLAB, Minitab, BMDP, Stata, SPSS, StatXact, Statistica, LISREL, JMP, GLIM, HIL, MS Excel, etc. We will discuss MS Excel and SPSS briefly.

Useful website:

http://www.R-project.org (free but powerful statistical software)

Microsoft Excel

A spreadsheet application. It features calculation, graphing tools, pivot tables, and a macro programming language called VBA (Visual Basic for Applications).

There are many versions of MS Excel; Excel XP, Excel 2003, and Excel 2007 can perform a number of statistical analyses.

Starting MS Excel: double-click on the Microsoft Excel icon on the desktop, or click on Start > Programs > Microsoft Excel.

Worksheet: consists of a grid of cells with numbered rows down the page and alphabetically titled columns across the page. Each cell is referenced by its coordinates. For example, A3 refers to the cell in column A and row 3, and B10:B20 refers to the range of cells in column B, rows 10 through 20.
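An A1-style reference maps to a column number and a row number, and a range expands to every cell between its corners. The minimal Python sketch below (not Excel itself) makes that mapping explicit. The helper names are hypothetical.

```python
import re

def column_number(letters):
    """Convert column letters to a 1-based number: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def column_letters(number):
    """Inverse of column_number: 1 -> A, 27 -> AA."""
    letters = ""
    while number > 0:
        number, rem = divmod(number - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def parse_cell(ref):
    """Split a reference such as 'A3' into (column number, row number)."""
    m = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if m is None:
        raise ValueError("not an A1-style cell reference: " + ref)
    return column_number(m.group(1)), int(m.group(2))

def expand_range(ref):
    """List every cell in a rectangular range such as 'B10:B20', row by row."""
    start, end = ref.split(":")
    (c1, r1), (c2, r2) = parse_cell(start), parse_cell(end)
    return [column_letters(c) + str(r)
            for r in range(min(r1, r2), max(r1, r2) + 1)
            for c in range(min(c1, c2), max(c1, c2) + 1)]

print(parse_cell("A3"))               # column 1, row 3
print(expand_range("B10:B20"))        # the 11 cells B10 through B20
```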


Creating formulas:

1. Click the cell in which you want to enter the formula.
2. Type = (an equal sign).
3. Click the Function button (fx).
4. Select the formula you want and step through the on-screen instructions.

Opening a document: File > Open (for an existing workbook). Change the directory, area, or drive to look for files in other locations.

Creating a new workbook: File > New > Blank Document

Saving a file: File > Save

Selecting more than one cell: click on a cell (e.g., A1), then hold the Shift key and click on another (e.g., D4) to select all cells between A1 and D4; or click on a cell and drag the mouse across the desired range.


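Entering date and time: dates are displayed in a format such as MM/DD/YYYY (internally, Excel stores each date as a serial day number), but there is no need to type them in that format. For example, Excel will recognize Jan 9 or jan-9 as 1/9/2007 (the current year at the time of writing) and Jan 9 1999 as 1/9/1999. To enter today's date, press Ctrl and ; (semicolon) together. Use a or p to indicate AM or PM; for example, 8:30 p is interpreted as 8:30 PM. To enter the current time, press Ctrl, Shift, and ; together.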
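The minimal Python sketch below shows how dates become serial day numbers. It assumes the default Windows "1900 date system". In that system, counting days from 30 December 1899 reproduces Excel's serial numbers for every date from 1 March 1900 onward. Earlier serials are off by one because Excel treats 1900 as a leap year, so the sketch rejects them. The function names are hypothetical.

```python
from datetime import date, timedelta

# Day zero that matches Excel's 1900 date system for dates after 28 Feb 1900
EXCEL_EPOCH = date(1899, 12, 30)
FIRST_SAFE_DATE = date(1900, 3, 1)   # serial 61

def date_to_excel_serial(d):
    """Serial number Excel would store for date d (1900 date system)."""
    if d < FIRST_SAFE_DATE:
        raise ValueError("dates before 1 March 1900 are not handled in this sketch")
    return (d - EXCEL_EPOCH).days

def excel_serial_to_date(serial):
    """Date shown by Excel for a stored serial number (1900 date system)."""
    if serial < 61:
        raise ValueError("serials below 61 are not handled in this sketch")
    return EXCEL_EPOCH + timedelta(days=serial)

print(date_to_excel_serial(date(2007, 1, 9)))   # the slide's Jan 9 example
print(excel_serial_to_date(36526))              # 1 January 2000
```

Because dates are plain day counts, subtracting two date cells in Excel yields the number of days between them.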

Copy and paste all cells in a sheet: Ctrl+A to select, Ctrl+C to copy, and Ctrl+V to paste.

Sorting data: Data > Sort > Sort By…

Descriptive statistics and other statistical methods: Tools > Data Analysis > (statistical method). If Data Analysis is not available, click on Tools > Add-Ins and select Analysis ToolPak and Analysis ToolPak - VBA.


Statistical and mathematical functions: start with the '=' sign and then select a function from the function wizard (fx).

Inserting a chart: click on Chart Wizard (or Insert > Chart), select a chart type, give the input data range, update the chart options, and select the output location (worksheet).

Importing data into Excel: File > Open, select the file type, and click on the file. Choose an option (Delimited or Fixed Width), choose the delimiter (Tab, Semicolon, Comma, Space, or Other), and click Finish.
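The two import options differ in how fields are found, as the minimal Python sketch below illustrates (not Excel itself). Delimited text is split on a chosen separator. Fixed-width text is cut at set character positions. The sketch uses the standard `csv` module for delimited data. The sample text, the column widths and the assumption that the first row holds variable names are hypothetical.

```python
import csv
import io

def read_delimited(text, delimiter=","):
    """Split delimited text into a header row and data rows."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)              # assumes the first row holds variable names
    return header, [row for row in reader]

def read_fixed_width(text, widths):
    """Cut each line into fields of the given character widths."""
    rows = []
    for line in text.splitlines():
        fields, pos = [], 0
        for width in widths:
            fields.append(line[pos:pos + width].strip())
            pos += width
        rows.append(fields)
    return rows

print(read_delimited("id;age\n1;34\n2;51\n", delimiter=";"))
print(read_fixed_width("001 34\n002 51", widths=[3, 3]))
```

Fields come back as text; converting them to numbers or dates is a separate step, much as Excel's import wizard lets you set column data formats.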

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.

Statistical Package for the Social Sciences (SPSS)

A general-purpose statistical package, SPSS is widely used in the social sciences, particularly in sociology and psychology.

SPSS can import data from almost any type of file and generate tabulated reports, plots of distributions and trends, descriptive statistics, and complex statistical analyses.

Starting SPSS: double-click on SPSS on the desktop, or go to Programs > SPSS.

Opening an SPSS file: File > Open

• Data Editor

Various pull-down menus appear at the top of the Data Editor window. These pull-down menus are at the heart of using SPSSWIN. The Data Editor menu items (with some of their uses) are:

MENUS AND TOOLBARS


FILE: used to open and save data files

EDIT: used to copy and paste data values, find data in a file, and insert variables and cases; its OPTIONS item allows the user to set general preferences as well as the setup for the Navigator, Charts, etc.

VIEW: the user can change toolbars; value labels can be shown in cells instead of data values

DATA: select, sort, or weight cases; merge files


TRANSFORM: compute new variables, recode variables, etc.


ANALYZE: perform various statistical procedures

GRAPHS: create bar charts, pie charts, etc.

UTILITIES: add comments to accompany the data file (and other advanced features)

ADD-ONS: features not currently installed (advanced statistical procedures)

WINDOW: switch between the data, syntax, and Navigator windows

HELP: access SPSSWIN Help information


Navigator (Output) Menus

When statistical procedures are run or charts are created, the output appears in the Navigator window. The Navigator window contains many of the pull-down menus found in the Data Editor window. Some of the important menus in the Navigator window include:

INSERT: used to insert page breaks, titles, charts, etc.

FORMAT: used to change the alignment of a particular portion of the output


• Formatting Toolbar

When a table has been created by a statistical procedure, the user can edit it to create a desired look or to add or delete information. Beginning with version 14.0, the user can choose between editing the table in the Output window and opening it in a separate Pivot Table (DEFINE) window. Various pull-down menus are activated when the user double-clicks on the table. These include:

EDIT: undo and redo a pivot; select a table or table body (e.g., to change the font)

INSERT: insert titles, captions, and footnotes

PIVOT: pivot the row and column variables

FORMAT: make various modifications to tables and cells


Importing tab-delimited data

In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. Look in the appropriate location for the text file. Then select "Text" from "Files of type". Click on the file name and then click on "Open". You will see the Text Import Wizard - Step 1 of 6 dialog box.

You will now have an SPSS data file containing the former tab-delimited data. You simply need to add variable and value labels and define missing values.

Exporting data to Excel

Click on FILE ⇒ SAVE AS. Click on the file name for the file to be exported. For "Save as type", select Excel (*.xls) from the pull-down menu. You will notice the checkbox "Write variable names to spreadsheet". Leave it checked, because you will want the variable names in the first row of each column of the Excel spreadsheet. Finally, click on Save.


• Additional menus

CHART EDITOR: used to edit a graph

SYNTAX EDITOR: used to edit the text in a syntax window

• Show or hide a toolbar

Click on VIEW ⇒ TOOLBARS, then check a toolbar to show it or uncheck it to hide it

• Move a toolbar

Click on the toolbar (but not on one of its pushbuttons) and drag it to its new location

• Customize a toolbar

Click on VIEW ⇒ TOOLBARS ⇒ CUSTOMIZE


Importing data from an Excel spreadsheet

Data from an Excel spreadsheet can be imported into SPSSWIN as follows:

1. In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. The OPEN DATA FILE dialog box will appear.
2. Locate the file of interest. Use the Look In pull-down list to identify the folder containing the Excel file.
3. From the FILE TYPE pull-down menu, select EXCEL (*.xls).

4. Click on the file name of interest and click OPEN, or simply double-click the file name.

5. Keep checked the box that reads "Read variable names from the first row of data". This presumes that the first row of the Excel data file contains the variable names. [If the data reside in a different worksheet of the Excel file, the worksheet would need to be specified.]

6. Click OK. The Excel data file will now appear in the SPSSWIN Data Editor.


7. The former Excel spreadsheet can now be saved as an SPSS file (FILE ⇒ SAVE AS) and is ready to be used in analyses. Typically you would label variables and values and define missing values.

Importing an Access table

SPSSWIN does not offer a direct import for Access tables. Therefore, follow these steps:

1. Open the Access file.
2. Open the data table.
3. Save the data as an Excel file.
4. Follow the steps for importing data from an Excel spreadsheet into SPSSWIN.

Importing text files into SPSSWIN

Text data points are typically separated (or "delimited") by tabs or commas. Sometimes they are in fixed format.


Running the FREQUENCIES procedure

1. Open the data file of interest (from the menus, click on FILE ⇒ OPEN ⇒ DATA).

2. From the menus, click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ FREQUENCIES.


4 You can specify more than one variable in the Row(s) andor Column(s) A cross table will be generated for each combination of Row and Column variables

Limitations SPSS users have less control over data manipulation and statistical output than other statistical packages such as SAS Stata etc

SPSS is a good first statistical package to perform quantitative research in social science because it is easy to use and because it can be a good starting point to learn more advanced statistical packages

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

QuestionsQuestions

Population and Sample

Population: The entire collection of individuals or measurements about which information is desired.

Sample: A subset of the population selected for study.

– The primary objective is to select a subset whose center, spread, and shape are as close as possible to those of the population.

– Methods of sampling: random sampling, stratified sampling, systematic sampling, cluster sampling, multistage sampling, area sampling, quota sampling, etc.
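The first three methods above can be sketched in Python. This is a minimal illustration, not a procedure from the course software. It assumes the population is an in-memory list of subject IDs (60 here, matching the size of the default course dataset) and that each subject's stratum is known. The function names and group labels are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Simple random sampling: every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Systematic sampling: random start, then every k-th subject (k = N // n)."""
    rng = random.Random(seed)
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, seed=None):
    """Stratified sampling: a simple random sample of the same fraction from each stratum."""
    rng = random.Random(seed)
    chosen = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        chosen.extend(rng.sample(members, size))
    return chosen

subjects = list(range(1, 61))  # hypothetical subject IDs 1..60
strata = {"group 1": subjects[:20], "group 2": subjects[20:45], "group 3": subjects[45:]}

print(simple_random_sample(subjects, 6, seed=1))
print(systematic_sample(subjects, 6, seed=1))
print(stratified_sample(strata, 0.10, seed=1))
```

Stratified sampling guarantees that every group is represented. A simple random sample of the same size might, by chance, miss a small group.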

Parameter vs. Statistic

Parameter

– Any statistical characteristic of a population

– Population mean, population median, and population standard deviation are examples of parameters

– A parameter describes the distribution of a population

– Parameters are fixed and usually unknown


Statistic: Any statistical characteristic of a sample

– Sample mean, sample median, and sample standard deviation are some examples of statistics

– A statistic describes the distribution of a sample

– The value of a statistic is known and varies from sample to sample

– Statistics are used to make inferences about parameters


Statistical issue: describing the distribution of a population, either through a census or by making inferences about the population distribution (population parameters) from the sample distribution (sample statistics).

E.g., the sample mean is an estimate of the population mean.
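The sketch below makes the parameter/statistic distinction concrete. It simulates a hypothetical population of normal data (mean 100, SD 15), so that its mean, a parameter, can actually be computed. It then draws several random samples of 30 and shows that the sample mean, a statistic, changes from sample to sample.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of 10,000 measurements (simulated, not real data).
population = [rng.gauss(100, 15) for _ in range(10_000)]

# Parameter: one fixed value for the whole population (computable here only because it is simulated).
mu = statistics.mean(population)

# Statistic: the mean of each random sample of 30 differs from sample to sample.
sample_means = [statistics.mean(rng.sample(population, 30)) for _ in range(5)]

print(f"population mean (parameter): {mu:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic): {m:.2f}")
```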

Hypothesis Testing

Null hypothesis and alternative hypothesis

                            Real situation
Decision                    H0 is true                  H0 is false
Reject H0                   Type I error (α)            Correct decision (1 − β)
Accept (do not reject) H0   Correct decision (1 − α)    Type II error (β)

Elements/Steps in Hypothesis Testing

Hypothesis testing steps

– 1. Specify the null (H0) and alternative (H1) hypotheses

– 2. Select the significance level (alpha), e.g., 0.05 or 0.01

– 3. Calculate the test statistic, e.g., t, F, chi-square

– 4. Calculate the probability value (p-value) or confidence interval

– 5. Describe the result and statistic in an understandable way
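The five steps can be traced in a small Python sketch. To stay within the standard library, it uses a one-sample z-test, which assumes the population standard deviation is known. When the SD must be estimated from the data, the t-test named in step 3 would be the usual choice. The data, the null value of 100, and the SD of 5 are made up for illustration.

```python
import math
from statistics import NormalDist, mean

def one_sample_z_test(data, mu0, sigma, alpha=0.05):
    """Two-sided test of H0: mu = mu0 vs H1: mu != mu0, assuming the population SD (sigma) is known."""
    n = len(data)
    xbar = mean(data)
    se = sigma / math.sqrt(n)
    # Step 3: test statistic
    z = (xbar - mu0) / se
    # Step 4: two-sided p-value and (1 - alpha) confidence interval
    std_normal = NormalDist()
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    ci = (xbar - z_crit * se, xbar + z_crit * se)
    # Step 5: state the result in plain language
    decision = "reject H0" if p_value < alpha else "do not reject H0"
    return z, p_value, ci, decision

# Step 1: H0: mu = 100 vs H1: mu != 100.  Step 2: alpha = 0.05.
scores = [102, 98, 110, 105, 99, 104, 108, 101, 97, 106]  # hypothetical data
z, p, (low, high), decision = one_sample_z_test(scores, mu0=100, sigma=5, alpha=0.05)
print(f"z = {z:.2f}, p = {p:.3f}, 95% CI = ({low:.2f}, {high:.2f}) -> {decision}")
```

For these ten values the mean is 103 and z is about 1.90. At α = 0.05 the test therefore does not reject H0, and the 95% interval accordingly includes 100.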

Point Estimation

[Figure: point estimator from the sample distribution and the parameter of the population distribution]

• A point estimate draws inference about a population by estimating the value of an unknown parameter using a single value, or point.

Interval Estimation

[Figure: interval estimator from the sample distribution and the parameter of the population distribution]

• An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.
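A point estimate and an interval estimate of the same parameter can be computed side by side. The sketch below estimates a population proportion from a hypothetical 24 "successes" out of 60 subjects. It uses the large-sample (Wald) normal approximation and returns both the single value p̂ and a 95% interval around it.

```python
import math
from statistics import NormalDist

def estimate_proportion(successes, n, confidence=0.95):
    """Return the point estimate p_hat and a Wald (normal-approximation) confidence interval."""
    p_hat = successes / n                                # point estimate: a single value
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, (p_hat - z * se, p_hat + z * se)       # interval estimate: a range

p_hat, (low, high) = estimate_proportion(successes=24, n=60)
print(f"point estimate: {p_hat:.2f}")
print(f"95% interval estimate: ({low:.3f}, {high:.3f})")
```

Here p̂ = 0.40 and the interval runs from roughly 0.28 to 0.52.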

P-Value versus the Confidence Interval

Two main ways to assess study precision and the role of chance in a study:

– The p-value measures, as a probability, the evidence against the null hypothesis. It is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true.

– The confidence interval (CI) is a range of plausible values for the parameter, constructed at a specified confidence level.

– E.g., a 95% CI implies that if a study were repeated many times, about 95 out of every 100 intervals computed this way would contain the true measure of association.
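The repeated-study interpretation of a 95% CI can be checked by simulation. This sketch assumes normally distributed data with a known SD, so a z-interval applies. The true mean of 50, SD of 10, and sample size of 25 are arbitrary choices. The share of intervals that contain the true mean should come out close to 0.95.

```python
import math
import random
from statistics import NormalDist, mean

def ci_coverage(true_mean=50.0, sigma=10.0, n=25, repeats=1000, confidence=0.95, seed=7):
    """Simulate `repeats` studies and return the share of z-intervals that contain true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(repeats):
        xbar = mean([rng.gauss(true_mean, sigma) for _ in range(n)])
        if xbar - half_width <= true_mean <= xbar + half_width:
            hits += 1
    return hits / repeats

print(f"share of 95% intervals containing the true mean: {ci_coverage():.3f}")
```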

Procedures for Sample Size Calculation

Selection of primary variables of interest and formulation of hypotheses

Information on the standard deviation (if numeric) or proportion (if categorical)

A tolerable level of significance (α)

Selection of a reasonable test statistic

Power (1 − β) or confidence level

A scientifically or clinically meaningful effect (difference)
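These ingredients combine in the standard formula for comparing two means with a two-sided test: n per group = 2σ²(z₁₋α/₂ + z₁₋β)² / Δ². Here σ is the SD, Δ the meaningful difference, α the significance level, and 1 − β the power. The sketch below assumes that design (two equal groups, normal approximation). Other designs, such as comparing proportions, need different formulas.

```python
import math
from statistics import NormalDist

def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Sample size per group for a two-sided comparison of two means (normal approximation).

    n = 2 * sigma^2 * (z_{1-alpha/2} + z_{1-beta})^2 / delta^2, rounded up.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # significance level
    z_beta = NormalDist().inv_cdf(power)            # power = 1 - beta
    n = 2 * sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2
    return math.ceil(n)

print(n_per_group(sigma=10, delta=5))  # 63
```

With σ = 10, Δ = 5, α = 0.05, and 80% power, it gives 63 subjects per group.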

Brief Concept of Statistical Software

There are many software packages for statistical analysis and visualization of data. Some of them are:

– Statistical Analysis System (SAS), S-PLUS, R, MATLAB, Minitab, BMDP, Stata, SPSS, StatXact, Statistica, LISREL, JMP, GLIM, HIL, MS Excel, etc. We will discuss MS Excel and SPSS briefly.

Useful website:

http://www.R-project.org (free but powerful statistical software)

Microsoft Excel

A spreadsheet application. It features calculation, graphing tools, pivot tables, and a macro programming language called VBA (Visual Basic for Applications).

There are many versions of MS Excel. Excel XP, Excel 2003, and Excel 2007 are capable of performing a number of statistical analyses.

Starting MS Excel: Double-click the Microsoft Excel icon on the desktop, or click Start → Programs → Microsoft Excel.

Worksheet: Consists of a grid of cells with numbered rows down the page and alphabetically titled columns across the page. Each cell is referenced by its coordinates. For example, A3 refers to the cell in column A and row 3, and B10:B20 refers to the range of cells in column B, rows 10 through 20.
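The A3 and B10:B20 notation can be decoded programmatically. The helper functions below are hypothetical illustrations, not part of Excel. They turn column letters into numbers (A = 1, ..., Z = 26, AA = 27) and list every cell in a rectangular range.

```python
import re

def column_to_number(letters):
    """Convert Excel-style column letters to a 1-based index: A -> 1, Z -> 26, AA -> 27."""
    number = 0
    for ch in letters.upper():
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number

def parse_cell(ref):
    """Split a reference such as 'A3' into (column_number, row_number)."""
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", ref)
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    return column_to_number(match.group(1)), int(match.group(2))

def expand_range(ref):
    """List every (column, row) pair in a range such as 'B10:B20', row by row."""
    start, end = ref.split(":")
    (c1, r1), (c2, r2) = parse_cell(start), parse_cell(end)
    return [(c, r)
            for r in range(min(r1, r2), max(r1, r2) + 1)
            for c in range(min(c1, c2), max(c1, c2) + 1)]

print(parse_cell("A3"))              # (1, 3)
print(len(expand_range("B10:B20")))  # 11 cells
```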


Creating formulas:
1. Click the cell in which you want to enter the formula.
2. Type = (an equal sign).
3. Click the Function button (fx).
4. Select the formula you want and step through the on-screen instructions.

Opening a document: File → Open (from an existing workbook). Change the directory, area, or drive to look for files in other locations.

Creating a new workbook: File → New → Blank workbook

Saving a file: File → Save

Selecting more than one cell: Click a cell (e.g., A1), then hold the Shift key and click another (e.g., D4) to select all cells between A1 and D4. Alternatively, click a cell and drag the mouse across the desired range.


Entering date and time: Excel stores dates internally as serial numbers and displays them in a date format such as MM/DD/YYYY, so there is no need to type them in that format. For example, Excel will recognize Jan 9 or jan-9 as 1/9/2007 (assuming the current year is 2007), and Jan 9 1999 as 1/9/1999. To enter today's date, press Ctrl and ; together. Use a or p to indicate a.m. or p.m.; for example, 8:30 p is interpreted as 8:30 PM. To enter the current time, press Ctrl, Shift, and ; together.
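Because Excel stores dates as serial numbers, converting between serials and calendar dates is simple day arithmetic. This sketch assumes Excel's default 1900 date system (the 1904 date system counts from a different origin). It deliberately rejects serials below 61, because Excel treats 1900 as a leap year and those early values are affected. The function names are hypothetical.

```python
from datetime import date, timedelta

# In the 1900 date system, serial 1 is 1 January 1900, and Excel also counts a
# non-existent 29 February 1900. From serial 61 (1 March 1900) onward, counting
# days from 30 December 1899 therefore gives the correct calendar date.
EPOCH = date(1899, 12, 30)

def serial_to_date(serial):
    if serial < 61:
        raise ValueError("serials below 61 are affected by the 1900 leap-year quirk")
    return EPOCH + timedelta(days=int(serial))

def date_to_serial(d):
    return (d - EPOCH).days

def time_fraction(hour, minute):
    """Times are stored as a fraction of a day, e.g. 8:30 pm -> 20.5 / 24."""
    return (hour * 60 + minute) / (24 * 60)

print(date_to_serial(date(2007, 1, 9)))  # 39091
print(serial_to_date(39091))             # 2007-01-09
print(round(time_fraction(20, 30), 6))   # 0.854167
```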

Copy and paste all cells in a sheet: Ctrl+A to select, Ctrl+C to copy, and Ctrl+V to paste.

Sorting data: Data → Sort → Sort by …

Descriptive statistics and other statistical methods: Tools → Data Analysis → statistical method. If Data Analysis is not available, click Tools → Add-Ins and select Analysis ToolPak and Analysis ToolPak - VBA.


Statistical and mathematical functions: Start with an = sign and then select a function from the Function Wizard (fx).

Inserting a chart: Click the Chart Wizard (or Insert → Chart), select the chart type, give the input data range, update the chart options, and select the output location (worksheet).

Importing data into Excel: Click File → Open, set Files of type, and select the file. Choose the original data type (Delimited or Fixed Width), choose the delimiters (Tab, Semicolon, Comma, Space, Other), and click Finish.
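The difference between the Delimited and Fixed Width options can be sketched with Python's csv module. Delimited text is split on a chosen separator, while fixed-width text is cut at known character positions. The sample data and the column widths [4, 6, 6] are assumptions for illustration.

```python
import csv
import io

def read_delimited(text, delimiter=","):
    """Split each line on the chosen delimiter (comma, tab, semicolon, space, ...)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

def read_fixed_width(text, widths):
    """Cut each line at fixed character positions given by `widths`, then strip padding."""
    rows = []
    for line in text.splitlines():
        fields, pos = [], 0
        for width in widths:
            fields.append(line[pos:pos + width].strip())
            pos += width
        rows.append(fields)
    return rows

# Hypothetical data: the same three columns, first tab-delimited, then fixed width.
tab_text = "id\tgroup\tscore\n1\tA\t12.5\n2\tB\t9.0\n"
fixed_text = "\n".join(f"{i:<4}{g:<6}{s:>6}" for i, g, s in
                       [("id", "group", "score"), ("1", "A", "12.5"), ("2", "B", "9.0")])

print(read_delimited(tab_text, delimiter="\t"))
print(read_fixed_width(fixed_text, widths=[4, 6, 6]))
```

Both calls yield the same rows. Fixed-width parsing only works if the column widths are known in advance, which is why the import dialog asks for them.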

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.
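The kind of rounding problem meant here can be reproduced outside Excel. The sketch below illustrates the numerical issue only and is not a description of Excel's internal algorithms. It compares the one-pass textbook variance formula, which subtracts two very large, nearly equal sums, with Welford's algorithm. The test data are four hypothetical values that share a large offset (1,000,000,000); their exact sample variance is 30.

```python
def one_pass_variance(xs):
    """Textbook shortcut: (sum of squares - (sum)^2 / n) / (n - 1). Prone to cancellation."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - total * total / n) / (n - 1)

def welford_variance(xs):
    """Welford's algorithm: update the mean and sum of squared deviations one value at a time."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)

offset = 1.0e9  # large common offset (hypothetical data)
data = [offset + 4.0, offset + 7.0, offset + 13.0, offset + 16.0]  # exact sample variance is 30

print("one-pass formula:", one_pass_variance(data))
print("Welford:         ", welford_variance(data))
```

Without the offset both functions agree. With it, the one-pass result can lose most of its accuracy, because squares near 10^18 are rounded to multiples of several hundred before the subtraction.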

Statistical Package for the Social Sciences (SPSS)

A general-purpose statistical package, SPSS is widely used in the social sciences, particularly in sociology and psychology.

SPSS can import data from almost any type of file and generate tabulated reports, plots of distributions and trends, descriptive statistics, and complex statistical analyses.

Starting SPSS: Double-click the SPSS icon on the desktop, or click Start → Programs → SPSS.

Opening an SPSS file: File → Open

• Data Editor

Various pull-down menus appear at the top of the Data Editor window. These pull-down menus are at the heart of using SPSSWIN. The Data Editor menu items (with some of their uses) are:

MENUS AND TOOLBARS


FILE: used to open and save data files

EDIT: used to copy and paste data values, find data in a file, and insert variables and cases. OPTIONS allows the user to set general preferences as well as the setup for the Navigator, charts, etc.

VIEW: the user can change toolbars, and value labels can be shown in cells instead of data values

DATA: select, sort, or weight cases; merge files


TRANSFORM: compute new variables, recode variables, etc.


ANALYZE: perform various statistical procedures

GRAPHS: create bar and pie charts, etc.

UTILITIES: add comments to accompany the data file (and other advanced features)

ADD-ONS: features not currently installed (advanced statistical procedures)

WINDOW: switch between data, syntax, and Navigator windows

HELP: access SPSSWIN Help information


Navigator (Output) Menus

When statistical procedures are run or charts are created, the output appears in the Navigator window. The Navigator window contains many of the pull-down menus found in the Data Editor window. Some of the important menus in the Navigator window include:

INSERT: used to insert page breaks, titles, charts, etc.

FORMAT: used to change the alignment of a particular portion of the output


• Formatting Toolbar

When a table has been created by a statistical procedure, the user can edit it to create a desired look or to add or delete information. Beginning with version 14.0, the user can edit the table in the Output window or open it in a separate Pivot Table (DEFINE) window. Various pull-down menus are activated when the user double-clicks the table. These include:

EDIT: undo and redo a pivot; select a table or table body (e.g., to change the font)

INSERT: used to insert titles, captions, and footnotes

PIVOT: used to pivot the row and column variables

FORMAT: various modifications can be made to tables and cells


Importing tab-delimited data: In SPSSWIN, click FILE → OPEN → DATA. Look in the appropriate location for the text file. Then select "Text" from "Files of type". Click the file name and then click "Open". You will see the Text Import Wizard – Step 1 of 6 dialog box.

After completing the wizard, you will have an SPSS data file containing the former tab-delimited data. You simply need to add variable and value labels and define missing values.

Exporting data to Excel: Click FILE → SAVE AS. Click the file name for the file to be exported. For "Save as type", select Excel (*.xls) from the pull-down menu. You will notice the checkbox "Write variable names to spreadsheet". Leave this checked, because you will want the variable names to be in the first row of each column in the Excel spreadsheet. Finally, click Save.


• Additional menus

CHART EDITOR: used to edit a graph

SYNTAX EDITOR: used to edit the text in a syntax window

• Show or hide a toolbar

Click VIEW → TOOLBARS, then check a toolbar to show it or uncheck it to hide it.

• Move a toolbar

Click the toolbar (but not one of its pushbuttons) and drag it to its new location.

• Customize a toolbar

Click VIEW → TOOLBARS → CUSTOMIZE.


Importing data from an Excel spreadsheet

Data from an Excel spreadsheet can be imported into SPSSWIN as follows:

1. In SPSSWIN, click FILE → OPEN → DATA. The OPEN DATA FILE dialog box will appear.

2. Locate the file of interest. Use the Look In pull-down list to identify the folder containing the Excel file.

3. From the FILE TYPE pull-down menu, select EXCEL (*.xls).

4. Click the file name of interest and click OPEN, or simply double-click the file name.

5. Keep the box checked that reads "Read variable names from the first row of data". This presumes that the first row of the Excel data file contains the variable names. [If the data resided in a different worksheet of the Excel file, that worksheet would need to be entered.]

6. Click OK. The Excel data file will now appear in the SPSSWIN Data Editor.


7. The former Excel spreadsheet can now be saved as an SPSS file (FILE → SAVE AS) and is ready to be used in analyses. Typically you would label variables and values and define missing values.

Importing an Access table: SPSSWIN does not offer a direct import for Access tables. Therefore, follow these steps:
1. Open the Access file.
2. Open the data table.
3. Save the data as an Excel file.
4. Follow the steps outlined above for importing data from an Excel spreadsheet into SPSSWIN.

Importing text files into SPSSWIN: Text data points typically are separated (or "delimited") by tabs or commas. Sometimes they are in fixed format.


Running the FREQUENCIES procedure

1. Open the data file of interest (from the menus, click FILE → OPEN → DATA).

2. From the menus, click ANALYZE → DESCRIPTIVE STATISTICS → FREQUENCIES.

3. The FREQUENCIES dialog box will appear. The left-hand box lists all the variables that have been defined in the data file (the source variable list). First, identify the variable(s) for which you want to run a frequency analysis. Click a variable name, then click the [>] pushbutton, and the variable name will appear in the VARIABLE(S) box (the selected variable list). Repeat these steps for each variable of interest.

4. If all that is being requested is a frequency table showing counts and percentages (raw, adjusted, and cumulative), click OK.
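The columns of such a frequency table (counts plus raw, adjusted, and cumulative percentages) can be reproduced in a few lines. This sketch assumes missing values are coded as None. "Adjusted" (valid) percentages are computed over non-missing cases only, and cumulative percentages accumulate the adjusted ones. The coded responses are hypothetical.

```python
from collections import Counter

def frequency_table(values, missing=(None,)):
    """Rows of (value, count, percent, valid percent, cumulative percent).

    Percent uses all cases; valid (adjusted) percent and cumulative percent use
    only non-missing cases. Which codes count as missing is an assumption here.
    """
    valid = [v for v in values if v not in missing]
    counts = Counter(valid)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        count = counts[value]
        percent = 100 * count / len(values)
        valid_percent = 100 * count / len(valid)
        cumulative += valid_percent
        rows.append((value, count, percent, valid_percent, cumulative))
    return rows

answers = [1, 2, 2, 3, 3, 3, None, 2, 1, 3]  # hypothetical coded responses; None = missing
print("Value  Freq  Percent  Valid %  Cum. %")
for value, count, pct, valid_pct, cum in frequency_table(answers):
    print(f"{value:>5} {count:>5} {pct:8.1f} {valid_pct:8.1f} {cum:7.1f}")
```

For the ten responses shown, the counts are 2, 3, and 4, with one missing case. The valid percentages are therefore based on 9 cases.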


Requesting STATISTICS: Descriptive and summary statistics can be requested for numeric variables. To request statistics:
1. From the FREQUENCIES dialog box, click the STATISTICS pushbutton.
2. This brings up the FREQUENCIES STATISTICS dialog box.
3. The STATISTICS dialog box offers the user a variety of choices.

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click ANALYZE → DESCRIPTIVE STATISTICS → DESCRIPTIVES). The procedure offers many of the same statistics as the FREQUENCIES procedure, but without generating frequency tables.
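The statistics DESCRIPTIVES reports for one numeric variable (count, minimum, maximum, mean, standard deviation) can be sketched with Python's statistics module. The sketch assumes missing values are coded as None and uses the sample standard deviation (n − 1 denominator). The ages are hypothetical, and the output labels are only meant to resemble an SPSS table.

```python
import statistics

def descriptives(values):
    """N, minimum, maximum, mean, and sample standard deviation for one numeric variable."""
    data = [v for v in values if v is not None]  # drop missing values (assumed coded as None)
    return {
        "N": len(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Mean": statistics.mean(data),
        "Std. Deviation": statistics.stdev(data),  # n - 1 denominator
    }

ages = [34, 29, 41, None, 38, 27, 47]  # hypothetical data
for name, value in descriptives(ages).items():
    print(f"{name:<15}{value}")
```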


Requesting CHARTS: One can request a chart (graph) for a variable or variables included in a FREQUENCIES procedure:
1. In the FREQUENCIES dialog box, click CHARTS.
2. The FREQUENCIES CHARTS dialog box will appear. Choose the intended chart (e.g., bar chart, pie chart, histogram).

Pasting charts into Word:
1. Click the chart.
2. Click the pull-down menu EDIT → COPY OBJECTS.
3. Go to the Word document in which the chart is to be embedded and click EDIT → PASTE SPECIAL.
4. Select Formatted Text (RTF) and then click OK.
5. Enlarge the graph to the desired size by dragging one or more of the black squares along its perimeter (if the black squares are not visible, click once on the graph).


BASIC STATISTICAL PROCEDURES: CROSSTABS

1. From the ANALYZE pull-down menu, click DESCRIPTIVE STATISTICS → CROSSTABS.

2. The CROSSTABS dialog box will then open.

3. From the variable selection box on the left, click a variable you wish to designate as the Row variable. The values (codes) for the Row variable make up the rows of the crosstabs table. Click the arrow (>) button for Row(s). Next, click a different variable you wish to designate as the Column variable. The values (codes) for the Column variable make up the columns of the crosstabs table. Click the arrow (>) button for Column(s).

4. You can specify more than one variable in the Row(s) and/or Column(s) boxes. A crosstab will be generated for each combination of Row and Column variables.
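The rule that one table is produced for each Row/Column pair can be sketched as follows. Each case is a dictionary of hypothetical variables (sex, group, smoker), and the helper names are illustrative only. Counts for combinations that never occur are kept as 0, as in a full crosstab.

```python
from collections import Counter
from itertools import product

def crosstab(cases, row_var, col_var):
    """Count cases for every combination of row_var and col_var values."""
    counts = Counter((case[row_var], case[col_var]) for case in cases)
    row_values = sorted({r for r, _ in counts})
    col_values = sorted({c for _, c in counts})
    return {r: {c: counts[(r, c)] for c in col_values} for r in row_values}

def all_crosstabs(cases, row_vars, col_vars):
    """One table per (row variable, column variable) pair."""
    return {(r, c): crosstab(cases, r, c) for r, c in product(row_vars, col_vars)}

cases = [  # hypothetical data
    {"sex": "F", "group": "A", "smoker": "no"},
    {"sex": "M", "group": "A", "smoker": "yes"},
    {"sex": "F", "group": "B", "smoker": "no"},
    {"sex": "F", "group": "B", "smoker": "yes"},
    {"sex": "M", "group": "B", "smoker": "no"},
]
tables = all_crosstabs(cases, ["sex"], ["group", "smoker"])
for (r, c), table in tables.items():
    print(f"{r} by {c}: {table}")
```

With one Row variable and two Column variables, two tables are produced: sex by group and sex by smoker.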

Limitations: SPSS users have less control over data manipulation and statistical output than users of other statistical packages such as SAS and Stata.

SPSS is a good first statistical package for quantitative research in the social sciences because it is easy to use and can be a good starting point for learning more advanced statistical packages.


Questions

Parameter vs StatisticsParameter vs Statistics

Parameter

ndash Any statistical characteristic of a population

ndash Population mean population median population

standard deviation are examples of parameters

ndash Parameter describes the distribution of a population

ndash Parameters are fixed and usually unknown

Parameter vs StatisticsParameter vs Statistics

Statistic Any statistical characteristic of a sample

ndash Sample mean sample median sample standard

deviation are some examples of statistics

ndash Statistic describes the distribution of population

ndash Value of a statistic is known and is varies for different

samples

ndash Are used for making inference on parameter

Parameter vs StatisticsParameter vs Statistics

Statistical Issue To describe the distribution of a

population through census or making inference on

population distribution population parameter using sample

distribution statistic

Eg sample mean is an estimate of the population mean

Hypothesis TestingHypothesis Testing

Null hypothesis and Alternative hypothesis

Real Situation Ho is true Ho is false Reject Ho Type I

error (α) Correct Decision (1-)

D e c i s i o n

Accept Ho Correct Decision (1- α)

Type II Error ()

ElementsSteps in hypothesisElementsSteps in hypothesis

Hypothesis testing steps

ndash 1 Null (Ho) and alternative (H1)hypothesis specification

ndash 2 Selection of significance level (alpha) - 005 or 001

ndash 3 Calculating the test statistic ndasheg t F Chi-square

ndash 4 Calculating the probability value (p-value) or confidence

Interval

ndash 5 Describing the result and statistic in an understandable

way

Point estimator

Sample distribution

Parameter

Population distribution

bull A point estimate draws inference about a population by estimating the value of an unknown parameter using a single value or a point

Point Estimation

Interval estimatorSample distribution

bull An interval estimator draws inferences about a population by

estimating the value of an unknown parameter using an interval

Population distribution Parameter

Interval Estimation

P-Value versus the Confidence IntervalP-Value versus the Confidence Interval

Two main ways to assess study precision and the role of

chance in a study

ndash P value measures ( in probability) the evidence against

the null hypothesis

ndash An interval within which the value of the parameter lies

with a specified probability

ndash Eg 95 CI implies that if one repeats a study 100

times the true measure of association will lie inside the

CI in 95 out of 100 measures

Procedures for sample size Procedures for sample size calculationcalculation

Selection of primary variables of interest and formulation

of hypotheses

Information of standard deviation ( if numeric) or

proportion (if categorical)

A tolerance level of significance ()

Selection of reasonable test statistic

Power or Confidence level

A scientifically or clinically meaning effect difference

Brief concept of Statistical SoftwareBrief concept of Statistical Software

There are many software packages to perform statistical

analysis and visualization of data Some of them are-

ndash System for Statistical Analysis (SAS) S-plus R Matlab Minitab

BMDP STATA SPSS StatXact Statistica LISREL JMP

GLIM HIL MS Excel etc We will discuss MS Excel and SPSS in

brief

useful websites-

httpwwwR-projectorg (a free but powerful statistical software)

Microsoft ExcelMicrosoft Excel

A Spreadsheet Application It features calculation graphing tools pivot tables and a macro programming language called VBA (Visual Basic for Applications)

There are many versions of MS-Excel Excel XP Excel 2003 Excel 2007 are capable of performing a number of statistical analyses

Starting MS Excel Double click on the Microsoft Excel icon on the desktop or Click on Start --gt Programs --gt Microsoft Excel

Worksheet Consists of a multiple grid of cells with numbered rows down the page and alphabetically-tilted columns across the page Each cell is referenced by its coordinates For example A3 is used to refer to the cell in column A and row 3 B10B20 is used to refer to the range of cells in column B and rows 10 through 20

Microsoft ExcelMicrosoft Excel

Creating Formulas 1 Click the cell that you want to enter the formula 2 Type = (an equal sign) 3 Click the Function Button 4 Select the formula you want and step through the on-screen instructions

xf

Opening a document File Open (From a existing workbook) Change the directory area or drive to look for file in other locations

Creating a new workbook FileNewBlank Document

Saving a File FileSave

Selecting more than one cell Click on a cell eg A1) then hold the Shift key and click on another (eg D4) to select cells between and A1 and D4 or Click on a cell and drag the mouse across the desired range

Microsoft ExcelMicrosoft Excel

Entering Date and Time Dates are stored as MMDDYYYY No need to enter in that format For example Excel will recognize Jan 9 or jan-9 as 192007 and Jan 9 1999 as 191999 To enter todayrsquos date press Ctrl and together Use a or p to indicate am or pm For example 830 p is interpreted as 830 pm To enter current time press Ctrl and together

Copy and Paste all cells in a Sheet Ctrl+A for selecting Ctrl +C for copying and Ctrl+V for Pasting

Sorting Data Sort Sort By hellip

Descriptive Statistics and other Statistical methods ToolsData Analysis Statistical method If Data Analysis is not available then click on Tools Add-Ins and then select Analysis ToolPack and Analysis toolPack-Vba

Microsoft ExcelMicrosoft Excel

Statistical and Mathematical Function Start with lsquo=lsquo sign and then select function from function wizard xf

Inserting a Chart Click on Chart Wizard (or InsertChart) select chart give Input data range Update the Chart options and Select output range Worksheet

Importing Data in Excel File open FileType Click on File Choose Option ( DelimitedFixed Width) Choose Options (Tab Semicolon Comma Space Other) Finish

Limitations Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extremecases

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

A general purpose statistical package SPSS is widely used in the social sciences particularly in sociology and psychology

SPSS can import data from almost any type of file to generate tabulated reports plots of distributions and trends descriptive statistics and complex statistical analyzes

Starting SPSS Double Click on SPSS on desktop or ProgramSPSS

Opening a SPSS file FileOpen

bull Data Editor

Various pull-down menus appear at the top of the Data Editor window These pull-down menus are at the heart of using SPSSWIN The Data Editor menu items (with some of the uses of the menu) are

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

FILE used to open and save data files

EDIT used to copy and paste data values used to find data in a file insert variables and cases OPTIONS allows the user to set general preferences as well as the setup for the Navigator Charts etc

VIEW user can change toolbars value labels can be seen in cells instead of data values

DATA select sort or weight cases merge files

MENUS AND TOOLBARS

TRANSFORM Compute new variables recode variables etc

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

ANALYZE perform various statistical procedures

GRAPHS create bar and pie charts etc

UTILITIES add comments to accompany data file (and other advanced features)

ADD-ons these are features not currently installed (advanced statistical procedures)

WINDOW switch between data syntax and navigator windows

HELP to access SPSSWIN Help information

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Navigator (Output) Menus

When statistical procedures are run or charts are created the output will appear in the Navigator window The Navigator window contains many of the pull-down menus found in the Data Editor window Some of the important menus in the Navigator window include

INSERT used to insert page breaks titles charts etc

FORMAT for changing the alignment of a particular portion of the output

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Formatting Toolbar

When a table has been created by a statistical procedure the user can edit the table to create a desired look or adddelete information Beginning with version 140 the user has a choice of editing the table in the Output or opening it in a separate Pivot Table (DEFINE) window Various pulldown menus are activated when the user double clicks on the table These include

EDIT undo and redo a pivot select a table or table body (eg to change the font)

INSERT used to insert titles captions and footnotes

PIVOT used to perform a pivot of the row and column variables

FORMAT various modifications can be made to tables and cells

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing tab-delimited dataIn SPSSWIN click on FILE OPEN DATA Look in the appropriate location for rArr rArrthe text file Then select ldquoTextrdquo from ldquoFiles of typerdquo Click on the file name and then click on ldquoOpenrdquo You will see the Text Import Wizard ndash step 1 of 6 dialog box

You will now have an SPSS data file containing the former tab-delimited data You simply need to add variable and value labels and define missing values

Exporting Data to Excelclick on FILE SAVE AS Click on the File Name for the file to be exported For rArrthe ldquoSave as Typerdquo select from the pull-down menu Excel (xls) You will notice the checkbox for ldquowrite variable names to spreadsheetrdquo Leave this checked as you will want the variable names to be in the first row of each column in the Excel spreadsheet Finally click on Save

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Additional menusCHART EDITOR used to edit a graph

SYNTAX EDITOR used to edit the text in a syntax window

bull Show or hide a toolbar

Click on VIEW TOOLBARS 1048635to show it to hide itrArr rArr

bull Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location

bull Customize a toolbar

Click on VIEW TOOLBARS CUSTOMIZErArr rArr

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheetData from an Excel spreadsheet can be imported into SPSSWIN as follows1 In SPSSWIN click on FILE OPEN DATA The OPEN DATA FILE Dialog rArr rArrBox will appear2 Locate the file of interest Use the Look In pull-down list to identify the folder containing the Excel file of interest3 From the FILE TYPE pull down menu select EXCEL (xls)

4 Click on the file name of interest and click on OPEN or simply double-click on the file name

5 Keep the box checked that reads Read variable names from the first row of data This presumes that the first row of the Excel data file contains variable names in the first row [If the data resided in a different worksheet in the Excel file this would need to be entered]

6 Click on OK The Excel data file will now appear in the SPSSWIN Data Editor

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheet

7 The former EXCEL spreadsheet can now be saved as an SPSS file (FILE rArrSAVE AS) and is ready to be used in analyses Typically you would label variable and values and define missing values

Importing an Access tableSPSSWIN does not offer a direct import for Access tables Therefore we must follow these steps1 Open the Access file2 Open the data table3 Save the data as an Excel file4 Follow the steps outlined in the data import from Excel Spreadsheet to SPSSWIN

Importing Text Files into SPSSWINText data points typically are separated (or ldquodelimitedrdquo) by tabs or commas Sometimes they can be of fixed format

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Running the FREQUENCIES procedure

1. Open the data file of interest (from the menus, click on FILE ⇒ OPEN ⇒ DATA).

2. From the menus, click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ FREQUENCIES.

3. The FREQUENCIES dialog box will appear. The left-hand box (source variable list) lists all the variables that have been defined in the data file. First identify the variable(s) for which you want to run a frequency analysis: click on a variable name, then click the [>] pushbutton. The variable name will now appear in the VARIABLE(S) box (selected variable list). Repeat these steps for each variable of interest.

4. If all you need is a frequency table showing counts and percentages (raw, adjusted and cumulative, labeled Percent, Valid Percent and Cumulative Percent in the output), click on OK.
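The counts and the three percentage columns can be made concrete with a short Python sketch. It assumes missing values are coded as None; Percent is based on all cases, while Valid Percent and Cumulative Percent are based on non-missing cases only. The function name frequency_table is hypothetical, and the result is a list of tuples rather than an SPSS table.

```python
from collections import Counter

def frequency_table(values):
    """Rows of (category, count, percent, valid percent, cumulative percent).

    None marks a missing value (an assumption of this sketch). Percent
    uses all cases; valid and cumulative percent use non-missing cases.
    """
    total = len(values)
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    table = []
    cumulative = 0.0
    for category in sorted(counts):
        n = counts[category]
        percent = 100.0 * n / total
        valid_percent = 100.0 * n / len(valid)
        cumulative += valid_percent
        table.append((category, n, round(percent, 1),
                      round(valid_percent, 1), round(cumulative, 1)))
    return table

# Hypothetical responses; one case is missing
for row in frequency_table(["yes", "no", "yes", None, "yes"]):
    print(row)
# ('no', 1, 20.0, 25.0, 25.0)
# ('yes', 3, 60.0, 75.0, 100.0)
```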


Requesting STATISTICS

Descriptive and summary STATISTICS can be requested for numeric variables. To request statistics:

1. From the FREQUENCIES dialog box, click on the STATISTICS pushbutton.
2. This will bring up the FREQUENCIES: STATISTICS dialog box.
3. The STATISTICS dialog box offers the user a variety of choices (e.g. percentile values, measures of central tendency and measures of dispersion).

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ DESCRIPTIVES). The procedure offers many of the same statistics as the FREQUENCIES procedure, but without generating frequency tables.
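For comparison, a few summary statistics of the kind DESCRIPTIVES reports can be computed with Python's statistics module. This sketch assumes missing values are coded as None and skips them; the selection of statistics is illustrative rather than a list of SPSS defaults, and descriptives is a hypothetical name. The standard deviation uses the sample (n - 1) denominator, so at least two non-missing values are required.

```python
import statistics

def descriptives(values):
    """N, mean, sample standard deviation, minimum and maximum.

    Missing values (None) are skipped, an assumption of this sketch.
    """
    data = [v for v in values if v is not None]
    return {
        "N": len(data),
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),  # sample SD, n - 1 denominator
        "min": min(data),
        "max": max(data),
    }

# Hypothetical scores with one missing value
print(descriptives([4, 8, 6, None, 2]))
```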


Requesting CHARTS

One can request a chart (graph) for any variable included in a FREQUENCIES procedure:

1. In the FREQUENCIES dialog box, click on CHARTS.
2. The FREQUENCIES: CHARTS dialog box will appear. Choose the intended chart (e.g. bar chart, pie chart or histogram).

Pasting charts into Word

1. Click on the chart.
2. Click on the pull-down menu EDIT ⇒ COPY OBJECTS.
3. Go to the Word document in which the chart is to be embedded. Click on EDIT ⇒ PASTE SPECIAL.
4. Select Formatted Text (RTF) and then click on OK.
5. Enlarge the graph to the desired size by dragging one or more of the black squares along its perimeter (if the black squares are not visible, click once on the graph).


BASIC STATISTICAL PROCEDURES: CROSSTABS

1. From the ANALYZE pull-down menu, click on DESCRIPTIVE STATISTICS ⇒ CROSSTABS.

2. The CROSSTABS dialog box will then open.

3. From the variable selection box on the left, click on the variable you wish to designate as the Row variable. The values (codes) of the Row variable make up the rows of the crosstabs table. Click on the arrow (>) button for Row(s). Next, click on a different variable you wish to designate as the Column variable. The values (codes) of the Column variable make up the columns of the crosstabs table. Click on the arrow (>) button for Column(s).

4. You can specify more than one variable in Row(s) and/or Column(s). A cross table will be generated for each combination of Row and Column variables.
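A two-way table of counts like the one CROSSTABS produces can be sketched in Python by counting each (row value, column value) pair. The function name, variable names and data below are hypothetical; row/column percentages and chi-square statistics, which SPSS can add to the table, are not computed here.

```python
from collections import Counter

def crosstab(rows, cols):
    """Count each (row value, column value) combination.

    rows and cols are parallel lists holding the Row and Column
    variables for the same cases.
    """
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    # Counter returns 0 for combinations that never occur
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}

# Hypothetical data: sex (Row variable) by treatment group (Column variable)
sex = ["F", "M", "F", "F", "M"]
group = ["A", "A", "B", "A", "B"]
print(crosstab(sex, group))  # {'F': {'A': 2, 'B': 1}, 'M': {'A': 1, 'B': 1}}
```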

Limitations: SPSS users have less control over data manipulation and statistical output than users of other statistical packages such as SAS or Stata.

SPSS is a good first statistical package for quantitative research in the social sciences: it is easy to use, and it is a good starting point for learning more advanced statistical packages.


Questions

Parameter vs Statistics

Statistic: any statistical characteristic of a sample.

– Sample mean, sample median and sample standard deviation are some examples of statistics.

– A statistic describes the distribution of the sample and is used to estimate the corresponding characteristic of the population distribution.

– The value of a statistic is known (it is computed from the data) and varies from sample to sample.

– Statistics are used for making inferences about parameters.


Statistical issue: to describe the distribution of a population through a census, or to make inferences about the population distribution (population parameter) using the sample distribution (statistic).

E.g. the sample mean is an estimate of the population mean.
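The distinction between a parameter and a statistic can be illustrated by simulation: build a hypothetical population, compute its mean (the parameter), then draw repeated samples and compute each sample mean (the statistic). The population values, sample size, number of samples and seed are arbitrary assumptions of this sketch.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is repeatable

# Hypothetical population of 10,000 measurements
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = statistics.mean(population)  # the parameter (known only because we simulated it)

# Draw 200 samples of size 30 and compute the statistic for each
sample_means = [statistics.mean(random.sample(population, 30)) for _ in range(200)]

print(f"population mean (parameter): {mu:.2f}")
print(f"first three sample means (statistics): {[round(m, 2) for m in sample_means[:3]]}")
print(f"average of the sample means: {statistics.mean(sample_means):.2f}")
```

Each sample mean differs, yet the sample means cluster around the population mean, which is what makes the sample mean a useful estimate.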

Hypothesis Testing

Null hypothesis (Ho) and alternative hypothesis (H1)

Decision versus real situation:

– Reject Ho when Ho is true: Type I error (α)
– Reject Ho when Ho is false: correct decision (1 - β)
– Accept (fail to reject) Ho when Ho is true: correct decision (1 - α)
– Accept (fail to reject) Ho when Ho is false: Type II error (β)
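The meaning of α can be checked by simulation: when Ho is true, a test at level α should wrongly reject Ho in about a fraction α of repeated studies. This sketch uses a two-sided one-sample z-test with the population standard deviation assumed known; the population values, sample size, number of trials and seed are hypothetical choices.

```python
import random
from statistics import NormalDist

random.seed(2)
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a two-sided test

def z_test_rejects(sample, mu0, sigma):
    """Two-sided one-sample z-test with known sigma: True if Ho is rejected."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / n ** 0.5)
    return abs(z) > z_crit

# Ho is TRUE here: every sample really comes from a population with mean 100
trials = 4000
rejections = sum(
    z_test_rejects([random.gauss(100, 15) for _ in range(25)], mu0=100, sigma=15)
    for _ in range(trials)
)
print(f"observed Type I error rate: {rejections / trials:.3f} (alpha = {alpha})")
```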

Elements/Steps in Hypothesis Testing

Hypothesis testing steps:

– 1. Specify the null (Ho) and alternative (H1) hypotheses.

– 2. Select the significance level (alpha), e.g. 0.05 or 0.01.

– 3. Calculate the test statistic, e.g. t, F or chi-square.

– 4. Calculate the probability value (p-value) or confidence interval.

– 5. Describe the result and the statistic in an understandable way.
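The five steps can be followed in code for a simple case. This is a minimal sketch of a two-sided one-sample z-test, assuming the population standard deviation is known (with an estimated standard deviation, a t-test would be the usual choice); the function name and the data are hypothetical.

```python
from statistics import NormalDist, mean

def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Walk through the five steps for a two-sided one-sample z-test.

    Assumes the population standard deviation sigma is known.
    """
    # Step 1: Ho: mu = mu0 versus H1: mu != mu0
    # Step 2: significance level alpha (passed in)
    n = len(sample)
    # Step 3: test statistic
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    # Step 4: two-sided p-value
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 5: state the result in words
    decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
    return z, p_value, decision

# Hypothetical data: 9 measurements, known sigma = 3, testing mu0 = 20
z, p, decision = one_sample_z_test([22, 21, 24, 19, 23, 25, 20, 22, 24], mu0=20, sigma=3)
print(f"z = {z:.2f}, p = {p:.4f}: {decision}")
```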

Point Estimation

[Figure: a point estimator computed from the sample distribution estimates a parameter of the population distribution]

• A point estimate draws inference about a population by estimating the value of an unknown parameter using a single value, or point.

Interval Estimation

[Figure: an interval estimator computed from the sample distribution estimates a parameter of the population distribution]

• An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.
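A point estimate and an interval estimate for a population mean can be computed together. This sketch assumes a known population standard deviation and uses the normal (z) distribution; for a small sample with an estimated standard deviation, a t-based interval would be more appropriate. The function name and data are hypothetical.

```python
from statistics import NormalDist, mean

def mean_with_ci(sample, sigma, confidence=0.95):
    """Point estimate (sample mean) plus a z-based confidence interval.

    Assumes the population standard deviation sigma is known.
    """
    n = len(sample)
    point = mean(sample)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / n ** 0.5
    return point, (point - half_width, point + half_width)

est, (low, high) = mean_with_ci([22, 21, 24, 19, 23, 25, 20, 22, 24], sigma=3)
print(f"point estimate {est:.2f}, 95% CI ({low:.2f}, {high:.2f})")
```

Here the point estimate is about 22.22 and the interval extends about 1.96 units on either side of it, because sigma / sqrt(n) = 3 / 3 = 1.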

P-Value versus the Confidence Interval

Two main ways to assess study precision and the role of chance in a study:

– The p-value measures (as a probability) the evidence against the null hypothesis: it is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true.

– The confidence interval (CI) is an interval, computed from the sample, that is constructed to contain the true value of the parameter with a specified probability (the confidence level).

– E.g. a 95% CI implies that if one repeated a study 100 times, computing a CI each time, about 95 of the 100 intervals would contain the true measure of association.
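The repeated-study interpretation of a 95% CI can be checked by simulation: draw many samples from a population whose mean is known, build a 95% interval from each, and count how often the intervals contain the true mean. The population values, sample size, number of repeats, seed and the known-sigma z interval are assumptions of this sketch.

```python
import random
from statistics import NormalDist

random.seed(3)
true_mean, sigma, n = 50.0, 10.0, 25
z = NormalDist().inv_cdf(0.975)           # two-sided 95% critical value
half_width = z * sigma / n ** 0.5         # same width for every sample (sigma known)

repeats = 2000
covered = 0
for _ in range(repeats):
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    m = sum(sample) / n
    if m - half_width <= true_mean <= m + half_width:
        covered += 1

print(f"{covered} of {repeats} intervals contain the true mean ({covered / repeats:.1%})")
```

The printed proportion should be close to 95%; any single interval either contains the true mean or does not.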

Procedures for Sample Size Calculation

– Selection of the primary variables of interest and formulation of hypotheses

– Information on the standard deviation (if numeric) or proportion (if categorical)

– A tolerance level of significance (α)

– Selection of a reasonable test statistic

– Power or confidence level

– A scientifically or clinically meaningful effect (difference)
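As one worked example of how these ingredients combine, the common normal-approximation formula for comparing two means with equal group sizes and a two-sided test is n per group = 2 (z_{1-α/2} + z_{1-β})² σ² / Δ², where σ is the standard deviation and Δ the meaningful difference. The sketch implements that formula only; other designs and test statistics need other formulas, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group_two_means(sigma, delta, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two means.

    Normal approximation, two-sided test, equal group sizes:
        n = 2 * (z_{1-alpha/2} + z_{power})**2 * sigma**2 / delta**2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)  # round up to whole subjects

print(n_per_group_two_means(sigma=10, delta=5))  # 63
```

With σ = 10, Δ = 5, α = 0.05 and 80% power, the formula gives about 62.8, which is rounded up to 63 subjects per group.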

Brief Concept of Statistical Software

There are many software packages for the statistical analysis and visualization of data. Some of them are:

– SAS (Statistical Analysis System), S-PLUS, R, MATLAB, Minitab, BMDP, Stata, SPSS, StatXact, Statistica, LISREL, JMP, GLIM, HIL, MS Excel, etc. We will discuss MS Excel and SPSS in brief.

Useful website:

http://www.R-project.org (free but powerful statistical software)

Microsoft Excel

A spreadsheet application. It features calculation, graphing tools, pivot tables and a macro programming language called VBA (Visual Basic for Applications).

There are many versions of MS Excel; Excel XP, Excel 2003 and Excel 2007 are all capable of performing a number of statistical analyses.

Starting MS Excel: double-click on the Microsoft Excel icon on the desktop, or click on Start ⇒ Programs ⇒ Microsoft Excel.

Worksheet: consists of a grid of cells with numbered rows down the page and alphabetically titled columns across the page. Each cell is referenced by its coordinates. For example, A3 refers to the cell in column A and row 3, and B10:B20 refers to the range of cells in column B, rows 10 through 20.
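The A1 reference scheme can be made explicit with a small Python sketch that converts column letters to numbers and expands a range such as B10:B20 into its cells. The helper names are hypothetical and this is not an Excel API; it only illustrates how the coordinates work.

```python
import re

def column_number(letters):
    """Convert column letters to a 1-based number: A -> 1, Z -> 26, AA -> 27."""
    number = 0
    for ch in letters.upper():
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number

def parse_cell(ref):
    """Split an A1-style reference such as 'B10' into (column number, row number)."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", ref)
    if match is None:
        raise ValueError(f"not an A1-style reference: {ref!r}")
    return column_number(match.group(1)), int(match.group(2))

def cells_in_range(range_ref):
    """List every (column, row) pair in a rectangular range such as 'B10:B20'."""
    start, end = (parse_cell(part) for part in range_ref.split(":"))
    return [(c, r)
            for r in range(min(start[1], end[1]), max(start[1], end[1]) + 1)
            for c in range(min(start[0], end[0]), max(start[0], end[0]) + 1)]

print(parse_cell("A3"))                # (1, 3)
print(len(cells_in_range("B10:B20")))  # 11
```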


Creating formulas: 1. Click the cell in which you want to enter the formula. 2. Type = (an equal sign). 3. Click the Function button (fx). 4. Select the formula you want and step through the on-screen instructions.

Opening a document: File ⇒ Open (for an existing workbook). Change the directory, area or drive to look for files in other locations.

Creating a new workbook: File ⇒ New ⇒ Blank Document

Saving a file: File ⇒ Save

Selecting more than one cell: click on a cell (e.g. A1), then hold the Shift key and click on another (e.g. D4) to select all cells between A1 and D4; or click on a cell and drag the mouse across the desired range.


Entering date and time: dates are displayed as MM/DD/YYYY (with US regional settings), but there is no need to enter them in that format. For example, Excel will recognize Jan 9 or jan-9 as 1/9/2007 (the current year at the time of writing) and Jan 9 1999 as 1/9/1999. To enter today's date, press Ctrl and ; (semicolon) together. Use a or p to indicate am or pm; for example, 8:30 p is interpreted as 8:30 pm. To enter the current time, press Ctrl, Shift and ; together (Ctrl+:).
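Behind the displayed format, Excel keeps each date as a serial day number (the 1900 date system is the default in Windows Excel). The Python sketch below converts between serial numbers and dates under that assumption. It uses 1899-12-30 as day 0, the usual choice in converters because Excel treats 1900 as a leap year, so it is only valid for dates from March 1, 1900 onward. The function names are hypothetical.

```python
from datetime import date, timedelta

# Day 0 for the 1900 date system as commonly used in converters
# (valid for serial numbers from March 1, 1900 onward)
EXCEL_EPOCH = date(1899, 12, 30)

def from_excel_serial(serial):
    """Convert a 1900-system Excel serial day number to a date."""
    return EXCEL_EPOCH + timedelta(days=serial)

def to_excel_serial(d):
    """Convert a date (March 1, 1900 or later) to its 1900-system serial number."""
    return (d - EXCEL_EPOCH).days

print(from_excel_serial(36526))           # 2000-01-01
print(to_excel_serial(date(2007, 1, 9)))  # 39091
```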

Copying and pasting all cells in a sheet: Ctrl+A to select, Ctrl+C to copy and Ctrl+V to paste.

Sorting data: Data ⇒ Sort ⇒ Sort By …

Descriptive statistics and other statistical methods: Tools ⇒ Data Analysis ⇒ (statistical method). If Data Analysis is not available, click on Tools ⇒ Add-Ins and select Analysis ToolPak and Analysis ToolPak - VBA.


Statistical and mathematical functions: start with the = sign and then select a function from the function wizard (fx).
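Several of Excel's built-in statistical functions have direct counterparts in Python's statistics module. The sketch assumes the values come from a hypothetical range such as B10:B14; it highlights that Excel's STDEV uses the sample (n - 1) denominator while STDEVP uses the population (n) denominator, matching statistics.stdev and statistics.pstdev respectively.

```python
import statistics

scores = [12, 15, 11, 18, 14]  # hypothetical values, e.g. from cells B10:B14

print("=AVERAGE ->", statistics.mean(scores))
print("=MEDIAN  ->", statistics.median(scores))
print("=STDEV   ->", round(statistics.stdev(scores), 4))   # sample SD (n - 1)
print("=STDEVP  ->", round(statistics.pstdev(scores), 4))  # population SD (n)
```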

Inserting a chart: click on Chart Wizard (or Insert ⇒ Chart), select a chart type, give the input data range, update the chart options and select the output location (worksheet).

Importing data into Excel: File ⇒ Open ⇒ Files of type ⇒ click on the file ⇒ choose an option (Delimited or Fixed Width) ⇒ choose the delimiters (Tab, Semicolon, Comma, Space, Other) ⇒ Finish.

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.
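The kind of rounding problem mentioned above can be illustrated with the one-pass variance formula, which subtracts two huge, nearly equal numbers. This sketch does not reproduce any particular Excel function's algorithm; it simply compares that formula with a two-pass calculation on hypothetical data whose true sample variance is 30.

```python
def naive_variance(xs):
    """One-pass textbook formula: (sum x^2 - (sum x)^2 / n) / (n - 1).

    Algebraically correct, but subtracting two huge, nearly equal
    floating-point numbers can wipe out the significant digits.
    """
    n = len(xs)
    s = sum(xs)
    ss = sum(x * x for x in xs)
    return (ss - s * s / n) / (n - 1)

def two_pass_variance(xs):
    """Compute the mean first, then sum squared deviations (numerically safer)."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]  # true sample variance is 30
print("naive:   ", naive_variance(data))
print("two-pass:", two_pass_variance(data))
```

On standard double-precision floats the two-pass function returns 30.0, while the one-pass formula returns a wrong value, because numbers near 4e18 can only be represented in steps of 512.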

Statistical Package for the Social Sciences (SPSS)

A general-purpose statistical package, SPSS is widely used in the social sciences, particularly in sociology and psychology.

SPSS can import data from almost any type of file to generate tabulated reports, plots of distributions and trends, descriptive statistics and complex statistical analyses.

Starting SPSS: double-click on the SPSS icon on the desktop, or go to Programs ⇒ SPSS.

Opening an SPSS file: File ⇒ Open

• Data Editor

Various pull-down menus appear at the top of the Data Editor window. These pull-down menus are at the heart of using SPSSWIN. The Data Editor menu items (with some of their uses) are:

MENUS AND TOOLBARS


FILE: used to open and save data files

EDIT: used to copy and paste data values, find data in a file, and insert variables and cases. Its OPTIONS item allows the user to set general preferences as well as the setup for the Navigator, Charts, etc.

VIEW: the user can change toolbars, and value labels can be displayed in cells instead of data values

DATA: select, sort or weight cases; merge files


TRANSFORM: compute new variables, recode variables, etc.


ANALYZE: perform various statistical procedures

GRAPHS: create bar and pie charts, etc.

UTILITIES: add comments to accompany the data file (and other advanced features)

ADD-ONS: features that are not currently installed (advanced statistical procedures)

WINDOW: switch between the data, syntax and Navigator windows

HELP: access SPSSWIN Help information


Navigator (Output) Menus

When statistical procedures are run or charts are created, the output appears in the Navigator window. The Navigator window contains many of the pull-down menus found in the Data Editor window. Some of the important menus in the Navigator window include:

INSERT: used to insert page breaks, titles, charts, etc.

FORMAT: used to change the alignment of a particular portion of the output


• Formatting Toolbar

When a table has been created by a statistical procedure, the user can edit it to create the desired look or to add or delete information. Beginning with version 14.0, the user can choose between editing the table in the Output window and opening it in a separate Pivot Table (DEFINE) window. Various pull-down menus are activated when the user double-clicks on the table. These include:

EDIT: undo and redo a pivot; select a table or table body (e.g. to change the font)

INSERT: used to insert titles, captions and footnotes

PIVOT: used to pivot the row and column variables

FORMAT: used to make various modifications to tables and cells


Importing tab-delimited data

In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. Look in the appropriate location for the text file, then select "Text" from "Files of type". Click on the file name and then click on "Open". You will see the Text Import Wizard - Step 1 of 6 dialog box.

After completing the wizard steps, you will have an SPSS data file containing the former tab-delimited data. You then need only add variable and value labels and define missing values.

Exporting data to Excel

Click on FILE ⇒ SAVE AS. Click on the file name for the file to be exported. For "Save as type", select Excel (*.xls) from the pull-down menu. You will notice the checkbox "Write variable names to spreadsheet". Leave it checked, since you will want the variable names in the first row of each column of the Excel spreadsheet. Finally, click on Save.


• Additional menus

CHART EDITOR: used to edit a graph

SYNTAX EDITOR: used to edit the text in a syntax window

• Show or hide a toolbar

Click on VIEW ⇒ TOOLBARS, then check a toolbar to show it or uncheck it to hide it.

• Move a toolbar

Click on the toolbar (but not on one of its pushbuttons) and then drag the toolbar to its new location.

• Customize a toolbar

Click on VIEW ⇒ TOOLBARS ⇒ CUSTOMIZE.

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheetData from an Excel spreadsheet can be imported into SPSSWIN as follows1 In SPSSWIN click on FILE OPEN DATA The OPEN DATA FILE Dialog rArr rArrBox will appear2 Locate the file of interest Use the Look In pull-down list to identify the folder containing the Excel file of interest3 From the FILE TYPE pull down menu select EXCEL (xls)

4 Click on the file name of interest and click on OPEN or simply double-click on the file name

5 Keep the box checked that reads Read variable names from the first row of data This presumes that the first row of the Excel data file contains variable names in the first row [If the data resided in a different worksheet in the Excel file this would need to be entered]

6 Click on OK The Excel data file will now appear in the SPSSWIN Data Editor

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheet

7 The former EXCEL spreadsheet can now be saved as an SPSS file (FILE rArrSAVE AS) and is ready to be used in analyses Typically you would label variable and values and define missing values

Importing an Access tableSPSSWIN does not offer a direct import for Access tables Therefore we must follow these steps1 Open the Access file2 Open the data table3 Save the data as an Excel file4 Follow the steps outlined in the data import from Excel Spreadsheet to SPSSWIN

Importing Text Files into SPSSWINText data points typically are separated (or ldquodelimitedrdquo) by tabs or commas Sometimes they can be of fixed format

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Running the FREQUENCIES procedure

1 Open the data file (from the menus click on FILE OPEN DATA) of rArr rArrinterest

2 From the menus click on ANALYZE DESCRIPTIVE STATISTICS rArr rArrFREQUENCIES

3 The FREQUENCIES Dialog Box will appear In the left-hand box will be a listing (source variable list) of all the variables that have been defined in the data file The first step is identifying the variable(s) for which you want to run a frequency analysis Click on a variable name(s) Then click the [ gt ] pushbutton The variable name(s) will now appear in the VARIABLE[S] box (selected variable list) Repeat these steps for each variable of interest

4 If all that is being requested is a frequency table showing count percentages (raw adjusted and cumulative) then click on OK

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Requesting STATISTICSDescriptive and summary STATISTICS can be requested for numeric variables To request Statistics1 From the FREQUENCIES Dialog Box click on the STATISTICS pushbutton2 This will bring up the FREQUENCIES STATISTICS Dialog Box3 The STATISTICS Dialog Box offers the user a variety of choices

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE DESCRIPTIVE STATISTICS DESCRIPTIVES) The rArr rArrprocedure offers many of the same statistics as the FREQUENCIES procedure but without generating frequency analysis tables

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Requesting CHARTSOne can request a chart (graph) to be created for a variable or variables included in a FREQUENCIES procedure

1 In the FREQUENCIES Dialog box click on CHARTS2 The FREQUENCIES CHARTS Dialog box will appear Choose the intended chart (eg Bar diagram Pie chart histogram

Pasting charts into Word1 Click on the chart2 Click on the pulldown menu EDIT COPY OBJECTSrArr3 Go to the Word document in which the chart is to be embedded Click on EDIT rArr PASTE SPECIAL4 Select Formatted Text (RTF) and then click on OK5 Enlarge the graph to a desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible click once on the graph)

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

BASIC STATISTICAL PROCEDURES CROSSTABS

1 From the ANALYZE pull-down menu click on DESCRIPTIVE STATISTICS rArrCROSSTABS

2 The CROSSTABS Dialog Box will then open

3 From the variable selection box on the left click on a variable you wish to designate as the Row variable The values (codes) for the Row variable make up the rows of the crosstabs table Click on the arrow (gt) button for Row(s) Next click on a different variable you wish to designate as the Column variable The values (codes) for the Column variable make up the columns of the crosstabstable Click on the arrow (gt) button for Column(s)

4 You can specify more than one variable in the Row(s) andor Column(s) A cross table will be generated for each combination of Row and Column variables

Limitations SPSS users have less control over data manipulation and statistical output than other statistical packages such as SAS Stata etc

SPSS is a good first statistical package to perform quantitative research in social science because it is easy to use and because it can be a good starting point to learn more advanced statistical packages

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

QuestionsQuestions

Parameter vs StatisticsParameter vs Statistics

Statistical Issue To describe the distribution of a

population through census or making inference on

population distribution population parameter using sample

distribution statistic

Eg sample mean is an estimate of the population mean

Hypothesis TestingHypothesis Testing

Null hypothesis and Alternative hypothesis

Real Situation Ho is true Ho is false Reject Ho Type I

error (α) Correct Decision (1-)

D e c i s i o n

Accept Ho Correct Decision (1- α)

Type II Error ()

ElementsSteps in hypothesisElementsSteps in hypothesis

Hypothesis testing steps

ndash 1 Null (Ho) and alternative (H1)hypothesis specification

ndash 2 Selection of significance level (alpha) - 005 or 001

ndash 3 Calculating the test statistic ndasheg t F Chi-square

ndash 4 Calculating the probability value (p-value) or confidence

Interval

ndash 5 Describing the result and statistic in an understandable

way

Point estimator

Sample distribution

Parameter

Population distribution

bull A point estimate draws inference about a population by estimating the value of an unknown parameter using a single value or a point

Point Estimation

Interval estimatorSample distribution

bull An interval estimator draws inferences about a population by

estimating the value of an unknown parameter using an interval

Population distribution Parameter

Interval Estimation

P-Value versus the Confidence IntervalP-Value versus the Confidence Interval

Two main ways to assess study precision and the role of

chance in a study

ndash P value measures ( in probability) the evidence against

the null hypothesis

ndash An interval within which the value of the parameter lies

with a specified probability

ndash Eg 95 CI implies that if one repeats a study 100

times the true measure of association will lie inside the

CI in 95 out of 100 measures

Procedures for sample size Procedures for sample size calculationcalculation

Selection of primary variables of interest and formulation

of hypotheses

Information of standard deviation ( if numeric) or

proportion (if categorical)

A tolerance level of significance ()

Selection of reasonable test statistic

Power or Confidence level

A scientifically or clinically meaning effect difference

Brief concept of Statistical SoftwareBrief concept of Statistical Software

There are many software packages to perform statistical

analysis and visualization of data Some of them are-

ndash System for Statistical Analysis (SAS) S-plus R Matlab Minitab

BMDP STATA SPSS StatXact Statistica LISREL JMP

GLIM HIL MS Excel etc We will discuss MS Excel and SPSS in

brief

useful websites-

httpwwwR-projectorg (a free but powerful statistical software)

Microsoft ExcelMicrosoft Excel

A Spreadsheet Application It features calculation graphing tools pivot tables and a macro programming language called VBA (Visual Basic for Applications)

There are many versions of MS-Excel Excel XP Excel 2003 Excel 2007 are capable of performing a number of statistical analyses

Starting MS Excel Double click on the Microsoft Excel icon on the desktop or Click on Start --gt Programs --gt Microsoft Excel

Worksheet Consists of a multiple grid of cells with numbered rows down the page and alphabetically-tilted columns across the page Each cell is referenced by its coordinates For example A3 is used to refer to the cell in column A and row 3 B10B20 is used to refer to the range of cells in column B and rows 10 through 20

Microsoft ExcelMicrosoft Excel

Creating Formulas 1 Click the cell that you want to enter the formula 2 Type = (an equal sign) 3 Click the Function Button 4 Select the formula you want and step through the on-screen instructions

xf

Opening a document File Open (From a existing workbook) Change the directory area or drive to look for file in other locations

Creating a new workbook FileNewBlank Document

Saving a File FileSave

Selecting more than one cell Click on a cell eg A1) then hold the Shift key and click on another (eg D4) to select cells between and A1 and D4 or Click on a cell and drag the mouse across the desired range

Microsoft ExcelMicrosoft Excel

Entering Date and Time Dates are stored as MMDDYYYY No need to enter in that format For example Excel will recognize Jan 9 or jan-9 as 192007 and Jan 9 1999 as 191999 To enter todayrsquos date press Ctrl and together Use a or p to indicate am or pm For example 830 p is interpreted as 830 pm To enter current time press Ctrl and together

Copy and Paste all cells in a Sheet Ctrl+A for selecting Ctrl +C for copying and Ctrl+V for Pasting

Sorting Data Sort Sort By hellip

Descriptive Statistics and other Statistical methods ToolsData Analysis Statistical method If Data Analysis is not available then click on Tools Add-Ins and then select Analysis ToolPack and Analysis toolPack-Vba

Microsoft ExcelMicrosoft Excel

Statistical and Mathematical Function Start with lsquo=lsquo sign and then select function from function wizard xf

Inserting a Chart Click on Chart Wizard (or InsertChart) select chart give Input data range Update the Chart options and Select output range Worksheet

Importing Data in Excel File open FileType Click on File Choose Option ( DelimitedFixed Width) Choose Options (Tab Semicolon Comma Space Other) Finish

Limitations Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extremecases

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

A general purpose statistical package SPSS is widely used in the social sciences particularly in sociology and psychology

SPSS can import data from almost any type of file to generate tabulated reports plots of distributions and trends descriptive statistics and complex statistical analyzes

Starting SPSS Double Click on SPSS on desktop or ProgramSPSS

Opening a SPSS file FileOpen

bull Data Editor

Various pull-down menus appear at the top of the Data Editor window These pull-down menus are at the heart of using SPSSWIN The Data Editor menu items (with some of the uses of the menu) are

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

FILE used to open and save data files

EDIT used to copy and paste data values used to find data in a file insert variables and cases OPTIONS allows the user to set general preferences as well as the setup for the Navigator Charts etc

VIEW user can change toolbars value labels can be seen in cells instead of data values

DATA select sort or weight cases merge files

MENUS AND TOOLBARS

TRANSFORM Compute new variables recode variables etc

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

ANALYZE perform various statistical procedures

GRAPHS create bar and pie charts etc

UTILITIES add comments to accompany data file (and other advanced features)

ADD-ons these are features not currently installed (advanced statistical procedures)

WINDOW switch between data syntax and navigator windows

HELP to access SPSSWIN Help information

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Navigator (Output) Menus

When statistical procedures are run or charts are created the output will appear in the Navigator window The Navigator window contains many of the pull-down menus found in the Data Editor window Some of the important menus in the Navigator window include

INSERT used to insert page breaks titles charts etc

FORMAT for changing the alignment of a particular portion of the output

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Formatting Toolbar

When a table has been created by a statistical procedure the user can edit the table to create a desired look or adddelete information Beginning with version 140 the user has a choice of editing the table in the Output or opening it in a separate Pivot Table (DEFINE) window Various pulldown menus are activated when the user double clicks on the table These include

EDIT undo and redo a pivot select a table or table body (eg to change the font)

INSERT used to insert titles captions and footnotes

PIVOT used to perform a pivot of the row and column variables

FORMAT various modifications can be made to tables and cells

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing tab-delimited dataIn SPSSWIN click on FILE OPEN DATA Look in the appropriate location for rArr rArrthe text file Then select ldquoTextrdquo from ldquoFiles of typerdquo Click on the file name and then click on ldquoOpenrdquo You will see the Text Import Wizard ndash step 1 of 6 dialog box

You will now have an SPSS data file containing the former tab-delimited data You simply need to add variable and value labels and define missing values

Exporting Data to Excelclick on FILE SAVE AS Click on the File Name for the file to be exported For rArrthe ldquoSave as Typerdquo select from the pull-down menu Excel (xls) You will notice the checkbox for ldquowrite variable names to spreadsheetrdquo Leave this checked as you will want the variable names to be in the first row of each column in the Excel spreadsheet Finally click on Save

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Additional menusCHART EDITOR used to edit a graph

SYNTAX EDITOR used to edit the text in a syntax window

bull Show or hide a toolbar

Click on VIEW TOOLBARS 1048635to show it to hide itrArr rArr

bull Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location

bull Customize a toolbar

Click on VIEW TOOLBARS CUSTOMIZErArr rArr

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheetData from an Excel spreadsheet can be imported into SPSSWIN as follows1 In SPSSWIN click on FILE OPEN DATA The OPEN DATA FILE Dialog rArr rArrBox will appear2 Locate the file of interest Use the Look In pull-down list to identify the folder containing the Excel file of interest3 From the FILE TYPE pull down menu select EXCEL (xls)

4 Click on the file name of interest and click on OPEN or simply double-click on the file name

5 Keep the box checked that reads Read variable names from the first row of data This presumes that the first row of the Excel data file contains variable names in the first row [If the data resided in a different worksheet in the Excel file this would need to be entered]

6 Click on OK The Excel data file will now appear in the SPSSWIN Data Editor

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheet

7 The former EXCEL spreadsheet can now be saved as an SPSS file (FILE rArrSAVE AS) and is ready to be used in analyses Typically you would label variable and values and define missing values

Importing an Access tableSPSSWIN does not offer a direct import for Access tables Therefore we must follow these steps1 Open the Access file2 Open the data table3 Save the data as an Excel file4 Follow the steps outlined in the data import from Excel Spreadsheet to SPSSWIN

Importing Text Files into SPSSWINText data points typically are separated (or ldquodelimitedrdquo) by tabs or commas Sometimes they can be of fixed format

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Running the FREQUENCIES procedure

1 Open the data file (from the menus click on FILE OPEN DATA) of rArr rArrinterest

2 From the menus click on ANALYZE DESCRIPTIVE STATISTICS rArr rArrFREQUENCIES

3 The FREQUENCIES Dialog Box will appear In the left-hand box will be a listing (source variable list) of all the variables that have been defined in the data file The first step is identifying the variable(s) for which you want to run a frequency analysis Click on a variable name(s) Then click the [ gt ] pushbutton The variable name(s) will now appear in the VARIABLE[S] box (selected variable list) Repeat these steps for each variable of interest

4 If all that is being requested is a frequency table showing count percentages (raw adjusted and cumulative) then click on OK

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Requesting STATISTICSDescriptive and summary STATISTICS can be requested for numeric variables To request Statistics1 From the FREQUENCIES Dialog Box click on the STATISTICS pushbutton2 This will bring up the FREQUENCIES STATISTICS Dialog Box3 The STATISTICS Dialog Box offers the user a variety of choices

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE DESCRIPTIVE STATISTICS DESCRIPTIVES) The rArr rArrprocedure offers many of the same statistics as the FREQUENCIES procedure but without generating frequency analysis tables

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Requesting CHARTS: One can request a chart (graph) for a variable or variables included in a FREQUENCIES procedure.

1. In the FREQUENCIES dialog box, click CHARTS.
2. The FREQUENCIES CHARTS dialog box will appear. Choose the intended chart (e.g., bar chart, pie chart, histogram).

Pasting charts into Word:

1. Click the chart.
2. Click the pull-down menu EDIT ⇒ COPY OBJECTS.
3. Go to the Word document in which the chart is to be embedded and click EDIT ⇒ PASTE SPECIAL.
4. Select Formatted Text (RTF) and then click OK.
5. Enlarge the graph to the desired size by dragging one or more of the black squares along its perimeter (if the black squares are not visible, click once on the graph).


BASIC STATISTICAL PROCEDURES: CROSSTABS

1. From the ANALYZE pull-down menu, click DESCRIPTIVE STATISTICS ⇒ CROSSTABS.

2. The CROSSTABS dialog box will then open.

3. From the variable selection box on the left, click the variable you wish to designate as the Row variable. The values (codes) of the Row variable make up the rows of the crosstabs table. Click the arrow (>) button for Row(s). Next, click a different variable you wish to designate as the Column variable. The values (codes) of the Column variable make up the columns of the crosstabs table. Click the arrow (>) button for Column(s).

4. You can specify more than one variable in Row(s) and/or Column(s). A crosstabs table will be generated for each combination of Row and Column variables.
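What CROSSTABS counts can be sketched in Python. Each cell holds the number of cases with a given combination of row and column values, and step 4 amounts to looping over every row/column variable pair. The record layout (a list of dicts) and the variable names `group`, `sex`, and `smoker` are hypothetical.

```python
from collections import Counter
from itertools import product

def crosstab(records, row_var, col_var):
    """Return {row_value: {col_value: count}} for two categorical variables."""
    counts = Counter((r[row_var], r[col_var]) for r in records)
    row_levels = sorted({r[row_var] for r in records})
    col_levels = sorted({r[col_var] for r in records})
    # Counter returns 0 for combinations that never occur
    return {rv: {cv: counts[(rv, cv)] for cv in col_levels} for rv in row_levels}

# Hypothetical records
records = [
    {"group": "A", "sex": "F", "smoker": "no"},
    {"group": "A", "sex": "M", "smoker": "yes"},
    {"group": "A", "sex": "F", "smoker": "no"},
    {"group": "B", "sex": "M", "smoker": "no"},
    {"group": "B", "sex": "M", "smoker": "yes"},
    {"group": "B", "sex": "F", "smoker": "no"},
]

# As in step 4: one table per (row variable, column variable) combination
tables = {
    (row_var, col_var): crosstab(records, row_var, col_var)
    for row_var, col_var in product(["group"], ["sex", "smoker"])
}
for key, table in tables.items():
    print(key, table)
```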

Limitations: SPSS users have less control over data manipulation and statistical output than users of other statistical packages such as SAS and Stata.

SPSS is a good first statistical package for quantitative research in the social sciences because it is easy to use and can be a good starting point for learning more advanced statistical packages.

Questions

Hypothesis Testing

Null hypothesis and Alternative hypothesis

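Decision versus the real situation:

- Reject H0 when H0 is true: Type I error (α)
- Reject H0 when H0 is false: Correct decision (1 - β, the power of the test)
- Accept (fail to reject) H0 when H0 is true: Correct decision (1 - α)
- Accept (fail to reject) H0 when H0 is false: Type II error (β)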

Elements/Steps in Hypothesis Testing

Hypothesis testing steps:

1. Null (H0) and alternative (H1) hypothesis specification
2. Selection of the significance level (alpha): 0.05 or 0.01
3. Calculating the test statistic, e.g., t, F, chi-square
4. Calculating the probability value (p-value) or confidence interval
5. Describing the result and statistic in an understandable way
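The five steps can be traced in a short Python sketch of a two-sided one-sample z-test. It assumes the population standard deviation is known, so the standard normal distribution applies. With an estimated SD, a t-test would normally be used instead. The sample values, μ0 = 100, and σ = 5 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    # Step 1: H0: mu = mu0 versus H1: mu != mu0 (two-sided)
    # Step 2: alpha is the chosen significance level
    n = len(sample)
    sample_mean = sum(sample) / n
    # Step 3: test statistic (sigma assumed known)
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Step 4: two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 5: plain-language description of the result
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return z, p_value, decision

sample = [102, 98, 110, 105, 99, 107, 103, 101, 108, 104]  # hypothetical data
z, p, decision = one_sample_z_test(sample, mu0=100, sigma=5)
print(f"z = {z:.2f}, p = {p:.3f}: {decision} at alpha = 0.05")
```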

Point Estimation

[Figure: point estimator from the sample distribution; parameter of the population distribution]

• A point estimate draws inference about a population by estimating the value of an unknown parameter using a single value, or point.

Interval Estimation

[Figure: interval estimator from the sample distribution; parameter of the population distribution]

• An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.
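As an illustration of both ideas, the sketch below uses a proportion. The sample proportion p̂ = x/n is the point estimate. The simple normal-approximation (Wald) interval p̂ ± z·√(p̂(1 − p̂)/n) is an interval estimate. The counts (45 of 60 subjects) are hypothetical. The Wald interval is only one of several interval methods; it is chosen here for simplicity and is known to behave poorly for small n or for proportions near 0 or 1.

```python
from math import sqrt
from statistics import NormalDist

def proportion_estimates(successes, n, level=0.95):
    p_hat = successes / n                        # point estimate
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for a 95% interval
    margin = z * sqrt(p_hat * (1 - p_hat) / n)   # normal-approximation margin of error
    return p_hat, (p_hat - margin, p_hat + margin)

p_hat, (lower, upper) = proportion_estimates(45, 60)  # hypothetical counts
print(f"point estimate {p_hat:.3f}, 95% CI ({lower:.3f}, {upper:.3f})")
```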

P-Value versus the Confidence Interval

Two main ways to assess study precision and the role of chance in a study:

- P-value: measures (as a probability) the evidence against the null hypothesis. It is the probability, computed assuming H0 is true, of obtaining a result at least as extreme as the one observed.
- Confidence interval (CI): an interval, computed from the data, that is constructed to contain the true value of the parameter with a specified probability (the confidence level).
- E.g., a 95% CI implies that if one repeated a study 100 times and computed a CI each time, about 95 of the 100 intervals would be expected to contain the true measure of association.
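The repeated-sampling interpretation in the last bullet can be checked by simulation. This Python sketch draws many hypothetical samples from a normal population with a known mean and SD. The values 50, 10, and n = 30 are arbitrary assumptions. It builds a 95% z-interval from each sample and reports the fraction of intervals that contain the true mean. That fraction should come out close to 0.95.

```python
import random
from math import sqrt
from statistics import NormalDist

def ci_coverage(true_mean=50.0, sigma=10.0, n=30, reps=2000, level=0.95, seed=1):
    """Fraction of simulated z-intervals (sigma known) that contain true_mean."""
    rng = random.Random(seed)                  # seeded for reproducibility
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / reps

print(ci_coverage())
```

Any single interval either contains the true value or does not. The 95% describes how often the procedure succeeds across repetitions.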

Procedures for Sample Size Calculation

- Selection of the primary variables of interest and formulation of hypotheses
- Information on the standard deviation (if numeric) or proportion (if categorical)
- A tolerated level of significance (α)
- Selection of a reasonable test statistic
- Power or confidence level
- A scientifically or clinically meaningful effect (difference)
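These ingredients come together in the standard normal-approximation formula for comparing two means with equal group sizes: n per group = 2 × (z(1 − α/2) + z(1 − β))² × σ² / Δ². Here σ is the common standard deviation and Δ the meaningful difference. The Python sketch below implements that formula. The inputs σ = 10 and Δ = 5 are hypothetical. Other designs, such as proportions, paired data, or unequal groups, need different formulas.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for comparing two means
    (two-sided test, equal group sizes, common standard deviation sigma)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = std_normal.inv_cdf(power)           # about 0.84 for 80% power
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return ceil(n)  # round up to whole subjects

print(n_per_group(sigma=10, delta=5))  # 63
```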

Brief Concept of Statistical Software

There are many software packages for statistical analysis and visualization of data. Some of them are:

- SAS (Statistical Analysis System), S-PLUS, R, MATLAB, Minitab, BMDP, Stata, SPSS, StatXact, Statistica, LISREL, JMP, GLIM, HIL, MS Excel, etc.

We will discuss MS Excel and SPSS briefly.

Useful website:

http://www.R-project.org (free but powerful statistical software)

Microsoft Excel

A spreadsheet application. It features calculation, graphing tools, pivot tables, and a macro programming language called VBA (Visual Basic for Applications).

There are many versions of MS Excel; Excel XP, Excel 2003, and Excel 2007 are all capable of performing a number of statistical analyses.

Starting MS Excel: Double-click the Microsoft Excel icon on the desktop, or click Start ⇒ Programs ⇒ Microsoft Excel.

Worksheet: Consists of a grid of cells with numbered rows down the page and alphabetically titled columns across the page. Each cell is referenced by its coordinates. For example, A3 refers to the cell in column A, row 3, and B10:B20 refers to the range of cells in column B, rows 10 through 20.
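The A1 reference scheme is easy to mimic in code. Column letters are read as a base-26 number (A = 1, ..., Z = 26, AA = 27), and the digits give the row. The Python helpers below (`parse_cell`, `expand_range`) are hypothetical illustrations, not part of Excel or any library.

```python
import re

def parse_cell(ref):
    """Convert an A1-style reference such as 'B10' to (column_number, row_number)."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", ref)
    if match is None:
        raise ValueError(f"not an A1-style reference: {ref!r}")
    letters, row = match.groups()
    column = 0
    for ch in letters.upper():
        column = column * 26 + (ord(ch) - ord("A") + 1)  # base 26 with A = 1
    return column, int(row)

def expand_range(ref_range):
    """List every (column, row) pair in a rectangular range such as 'B10:B20'."""
    start, end = ref_range.split(":")
    c1, r1 = parse_cell(start)
    c2, r2 = parse_cell(end)
    return [(c, r) for r in range(r1, r2 + 1) for c in range(c1, c2 + 1)]

print(parse_cell("A3"))              # (1, 3)
print(len(expand_range("B10:B20")))  # 11
```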

Creating Formulas:

1. Click the cell in which you want to enter the formula.
2. Type = (an equal sign).
3. Click the Insert Function (fx) button.
4. Select the function you want and step through the on-screen instructions.

Opening a document: File ⇒ Open (for an existing workbook). Change the directory, area, or drive to look for files in other locations.

Creating a new workbook: File ⇒ New ⇒ Blank Document

Saving a file: File ⇒ Save

Selecting more than one cell: Click a cell (e.g., A1), then hold the Shift key and click another (e.g., D4) to select the cells between A1 and D4; or click a cell and drag the mouse across the desired range.

Entering date and time: Excel displays dates in a format such as MM/DD/YYYY, although internally it stores them as serial numbers. There is no need to type them in that format. For example, Excel will recognize Jan 9 or jan-9 as 1/9/2007 (it assumes the current year) and Jan 9 1999 as 1/9/1999. To enter today's date, press Ctrl and ; (semicolon) together. Use a or p to indicate AM or PM; for example, 8:30 p is interpreted as 8:30 PM. To enter the current time, press Ctrl, Shift, and ; together.
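Because Excel keeps dates as serial day counts, converting them outside Excel is simple arithmetic. This Python sketch assumes the default 1900 date system and serials of 61 or more (March 1, 1900 onward). The 1899-12-30 offset absorbs Excel's treatment of 1900 as a leap year, which is kept for compatibility with Lotus 1-2-3. The function names are hypothetical.

```python
from datetime import date, timedelta

# Offset that maps 1900-date-system serials >= 61 onto real calendar dates
EXCEL_EPOCH = date(1899, 12, 30)

def serial_to_date(serial):
    """Convert an Excel serial day number to a calendar date."""
    return EXCEL_EPOCH + timedelta(days=serial)

def date_to_serial(d):
    """Convert a calendar date to an Excel serial day number."""
    return (d - EXCEL_EPOCH).days

print(serial_to_date(36526))             # 2000-01-01
print(date_to_serial(date(1999, 1, 9)))  # 36169
```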

Copying and pasting all cells in a sheet: Ctrl+A to select, Ctrl+C to copy, and Ctrl+V to paste.

Sorting data: Sort ⇒ Sort By …

Descriptive statistics and other statistical methods: Tools ⇒ Data Analysis ⇒ (statistical method). If Data Analysis is not available, click Tools ⇒ Add-Ins and then select Analysis ToolPak and Analysis ToolPak - VBA.

Statistical and mathematical functions: Start with an = sign and then select the function from the Insert Function (fx) wizard.
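For readers who check Excel results elsewhere, the Python sketch below pairs a few Excel worksheet functions with their counterparts in Python's `statistics` module. Note that STDEV and VAR divide by n - 1 (sample), while STDEVP and VARP divide by n (population). The mapping is an illustration, and the data are hypothetical.

```python
import statistics

# Illustrative pairing of Excel worksheet functions with Python equivalents
EXCEL_EQUIVALENTS = {
    "AVERAGE": statistics.mean,
    "MEDIAN": statistics.median,
    "VAR": statistics.variance,    # sample variance, divides by n - 1
    "VARP": statistics.pvariance,  # population variance, divides by n
    "STDEV": statistics.stdev,     # sample standard deviation
    "STDEVP": statistics.pstdev,   # population standard deviation
}

values = [10, 12, 23, 23, 16, 23, 21, 16]  # hypothetical data
for name, func in EXCEL_EQUIVALENTS.items():
    print(f"={name}(...) -> {func(values):.3f}")
```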

Inserting a chart: Click the Chart Wizard (or Insert ⇒ Chart), select a chart type, give the input data range, update the chart options, and select the output location (worksheet).

Importing data into Excel: File ⇒ Open ⇒ choose the file type ⇒ click the file ⇒ choose an option (Delimited/Fixed Width) ⇒ choose the delimiter (Tab, Semicolon, Comma, Space, Other) ⇒ Finish.

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.
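A small Python example shows the kind of rounding problem meant here. It compares the one-pass "sum of squares" variance formula with a two-pass formula that subtracts the mean first. The data are hypothetical and share a large common offset. The example illustrates the general numerical issue only; it is not a reproduction of Excel's internal algorithms.

```python
def naive_sample_variance(xs):
    """One-pass textbook formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - total * total / n) / (n - 1)

def two_pass_sample_variance(xs):
    """Subtract the mean first, then sum the squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

data = [1e9 + d for d in (4, 7, 13, 16)]  # hypothetical values with a large offset
print(two_pass_sample_variance(data))  # 30.0, the exact answer
print(naive_sample_variance(data))     # not 30: the huge squares cancel and lose precision
```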

Statistical Package for the Social Sciences (SPSS)

A general-purpose statistical package, SPSS is widely used in the social sciences, particularly in sociology and psychology.

SPSS can import data from almost any type of file to generate tabulated reports, plots of distributions and trends, descriptive statistics, and complex statistical analyses.

Starting SPSS: Double-click SPSS on the desktop, or click Programs ⇒ SPSS.

Opening an SPSS file: File ⇒ Open

• Data Editor

Various pull-down menus appear at the top of the Data Editor window. These pull-down menus are at the heart of using SPSSWIN. The Data Editor menu items (with some of their uses) are:

MENUS AND TOOLBARS

FILE: used to open and save data files

EDIT: used to copy and paste data values, find data in a file, and insert variables and cases. EDIT ⇒ OPTIONS allows the user to set general preferences as well as the setup for the Navigator, charts, etc.

VIEW: the user can change toolbars, and value labels can be shown in cells instead of data values

DATA: select, sort, or weight cases; merge files

TRANSFORM: compute new variables, recode variables, etc.

ANALYZE: perform various statistical procedures

GRAPHS: create bar and pie charts, etc.

UTILITIES: add comments to accompany the data file (and other advanced features)

ADD-ONS: features that are not currently installed (advanced statistical procedures)

WINDOW: switch between the data, syntax, and Navigator windows

HELP: access SPSSWIN Help information

Navigator (Output) Menus

When statistical procedures are run or charts are created, the output appears in the Navigator window. The Navigator window contains many of the pull-down menus found in the Data Editor window. Some of the important menus in the Navigator window include:

INSERT: used to insert page breaks, titles, charts, etc.

FORMAT: for changing the alignment of a particular portion of the output

• Formatting Toolbar

When a table has been created by a statistical procedure, the user can edit the table to create a desired look or to add or delete information. Beginning with version 14.0, the user can either edit the table in the Output window or open it in a separate Pivot Table (DEFINE) window. Various pull-down menus are activated when the user double-clicks on the table. These include:

EDIT: undo and redo a pivot; select a table or table body (e.g., to change the font)

INSERT: used to insert titles, captions, and footnotes

PIVOT: used to pivot the row and column variables

FORMAT: various modifications can be made to tables and cells

Importing tab-delimited data: In SPSSWIN, click FILE ⇒ OPEN ⇒ DATA. Look in the appropriate location for the text file. Then select “Text” from “Files of type”. Click the file name and then click “Open”. You will see the Text Import Wizard - Step 1 of 6 dialog box.

After completing the wizard, you will have an SPSS data file containing the former tab-delimited data. You simply need to add variable and value labels and define missing values.
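Outside SPSS, the same import can be sketched with Python's `csv` module. The first row supplies the variable names, and user-defined missing-value codes are converted to a missing marker (`None`). The code `-9`, the treatment of empty cells as missing, and the variable names are hypothetical assumptions. Values stay as strings here; a fuller import would also convert numeric columns.

```python
import csv
import io

# Hypothetical user-defined missing-value codes (empty cells are also treated as missing)
MISSING_CODES = {"", "-9"}

def read_tab_delimited(text):
    """Read tab-delimited text whose first row holds the variable names."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = [
        {name: (None if value in MISSING_CODES else value) for name, value in record.items()}
        for record in reader
    ]
    return reader.fieldnames, rows

text = "id\tgroup\tscore\n1\tA\t12\n2\tB\t-9\n3\tA\t15\n"
names, rows = read_tab_delimited(text)
print(names)  # ['id', 'group', 'score']
print(rows)
```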

Exporting data to Excel: Click FILE ⇒ SAVE AS. Specify the file name for the file to be exported. For “Save as type”, select Excel (*.xls) from the pull-down menu. You will notice the checkbox “Write variable names to spreadsheet”. Leave this checked, since you will want the variable names in the first row of each column of the Excel spreadsheet. Finally, click Save.

• Additional menus: CHART EDITOR: used to edit a graph

SYNTAX EDITOR: used to edit the text in a syntax window

• Show or hide a toolbar

Click VIEW ⇒ TOOLBARS, then check a toolbar to show it or uncheck it to hide it.

• Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location.

• Customize a toolbar

Click VIEW ⇒ TOOLBARS ⇒ CUSTOMIZE.

Importing data from an EXCEL spreadsheet: Data from an Excel spreadsheet can be imported into SPSSWIN as follows:

1. In SPSSWIN, click FILE ⇒ OPEN ⇒ DATA. The OPEN DATA FILE dialog box will appear.

2. Locate the file of interest. Use the Look In pull-down list to identify the folder containing the Excel file.

3. From the FILE TYPE pull-down menu, select EXCEL (*.xls).

4. Click the file name of interest and click OPEN, or simply double-click the file name.

5. Keep the box checked that reads “Read variable names from the first row of data”. This presumes that the first row of the Excel data file contains variable names. [If the data resided in a different worksheet in the Excel file, this would need to be entered.]

6. Click OK. The Excel data file will now appear in the SPSSWIN Data Editor.

7. The former Excel spreadsheet can now be saved as an SPSS file (FILE ⇒ SAVE AS) and is ready to be used in analyses. Typically, you would label variables and values and define missing values.

Importing an Access table: SPSSWIN does not offer a direct import for Access tables. Therefore, follow these steps:

1. Open the Access file.
2. Open the data table.
3. Save the data as an Excel file.
4. Follow the steps outlined for importing data from an Excel spreadsheet into SPSSWIN.

Point estimator

Sample distribution

Parameter

Population distribution

bull A point estimate draws inference about a population by estimating the value of an unknown parameter using a single value or a point

Point Estimation

Interval estimatorSample distribution

bull An interval estimator draws inferences about a population by

estimating the value of an unknown parameter using an interval

Population distribution Parameter

Interval Estimation

P-Value versus the Confidence Interval

Two main ways to assess study precision and the role of chance in a study:

– P-value: measures (as a probability) the evidence against the null hypothesis. It is the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed.

– Confidence interval (CI): an interval, computed from the data, constructed so that it contains the true value of the parameter with a specified probability over repeated sampling.

– E.g., a 95% CI implies that if one repeated a study 100 times and computed a CI each time, about 95 of the 100 intervals would contain the true measure of association.
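The repeated-study interpretation of a 95% CI can be checked by simulation. The Python sketch below assumes normally distributed data with a known population SD (a z-interval; real studies usually estimate the SD and use a t-interval instead). The function names and the parameters `true_mean`, `sigma`, `n`, `repeats`, and `seed` are hypothetical choices for illustration. A two-sided z-test p-value is included for comparison.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1

def z_interval(sample, sigma, level=0.95):
    """CI for a mean when the population SD (sigma) is assumed known."""
    z = STD_NORMAL.inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    half_width = z * sigma / len(sample) ** 0.5
    m = fmean(sample)
    return m - half_width, m + half_width

def z_test_p_value(sample, null_mean, sigma):
    """Two-sided p-value for H0: population mean == null_mean (known sigma)."""
    z = (fmean(sample) - null_mean) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def ci_coverage(true_mean=50.0, sigma=10.0, n=30, repeats=1000, seed=1):
    """Repeat a hypothetical study many times; return the share of CIs containing the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma)
        hits += low <= true_mean <= high
    return hits / repeats

print(ci_coverage())  # close to 0.95; the exact value depends on the seed
```

Each simulated interval either contains the true mean or it does not; the 95% describes the long-run behavior of the procedure, not any single interval.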

Procedures for sample size calculation

• Selection of the primary variables of interest and formulation of hypotheses

• Information on the standard deviation (if numeric) or proportion (if categorical)

• A tolerance level of significance (α)

• Selection of a reasonable test statistic

• Power or confidence level

• A scientifically or clinically meaningful effect (difference)
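As a worked illustration of how these ingredients combine, one common normal-approximation formula for comparing two independent means with equal group sizes is n = 2 × (z(1 − α/2) + z(1 − β))² × σ² / Δ² per group, where σ is the standard deviation, α the significance level, 1 − β the power, and Δ the meaningful difference. The Python sketch below assumes a two-sided z-test and a common SD in both groups; the function name and example inputs are hypothetical, and other designs (proportions, paired data, unequal groups) need different formulas.

```python
import math
from statistics import NormalDist

def n_per_group_two_means(sigma, delta, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two independent means (two-sided test).

    sigma: assumed common standard deviation in both groups
    delta: smallest scientifically or clinically meaningful difference
    alpha: significance level; power: 1 - beta
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z(power)           # about 0.84 for 80% power
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)         # always round up to a whole subject

# Hypothetical inputs: SD 10, meaningful difference 5, alpha 0.05, power 80%.
print(n_per_group_two_means(sigma=10, delta=5))  # 63
```

A smaller SD, a larger meaningful difference, a larger α, or lower power each reduces the required n.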

Brief concept of Statistical Software

There are many software packages for performing statistical analysis and visualization of data. Some of them are:

– Statistical Analysis System (SAS), S-PLUS, R, MATLAB, Minitab, BMDP, Stata, SPSS, StatXact, Statistica, LISREL, JMP, GLIM, HIL, MS Excel, etc. We will discuss MS Excel and SPSS in brief.

Useful websites:

http://www.R-project.org (free but powerful statistical software)

Microsoft Excel

A spreadsheet application. It features calculation, graphing tools, pivot tables, and a macro programming language called VBA (Visual Basic for Applications).

There are many versions of MS Excel; Excel XP, Excel 2003, and Excel 2007 are all capable of performing a number of statistical analyses.

Starting MS Excel: Double-click on the Microsoft Excel icon on the desktop, or click on Start → Programs → Microsoft Excel.

Worksheet: consists of a grid of cells with numbered rows down the page and alphabetically titled columns across the page. Each cell is referenced by its coordinates. For example, A3 refers to the cell in column A and row 3, and B10:B20 refers to the range of cells in column B and rows 10 through 20.
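The A1-style addressing described above can be made concrete with a small Python sketch that converts a reference such as A3 into (row, column) numbers and expands a range such as B10:B20 into its cells. The function names are hypothetical, and the sketch deliberately ignores features such as absolute references ($A$3) and sheet names.

```python
import re

def column_number(letters):
    """Convert column letters to a 1-based number: A -> 1, Z -> 26, AA -> 27."""
    number = 0
    for ch in letters.upper():
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number

def parse_cell(ref):
    """Split an A1-style reference such as 'A3' into (row, column) numbers."""
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", ref)
    if match is None:
        raise ValueError(f"not an A1-style reference: {ref!r}")
    letters, digits = match.groups()
    return int(digits), column_number(letters)

def expand_range(range_ref):
    """List every (row, column) cell in a range such as 'B10:B20', row by row."""
    start, end = range_ref.split(":")
    (r1, c1), (r2, c2) = parse_cell(start), parse_cell(end)
    return [(r, c)
            for r in range(min(r1, r2), max(r1, r2) + 1)
            for c in range(min(c1, c2), max(c1, c2) + 1)]

print(parse_cell("A3"))              # (3, 1): row 3, column A
print(len(expand_range("B10:B20")))  # 11 cells: B10 through B20
```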

Creating formulas:
1. Click the cell in which you want to enter the formula.
2. Type = (an equal sign).
3. Click the Function button (fx).
4. Select the formula you want and step through the on-screen instructions.

Opening a document: File → Open (for an existing workbook). Change the directory, area, or drive to look for files in other locations.

Creating a new workbook: File → New → Blank Workbook.

Saving a file: File → Save.

Selecting more than one cell: Click on a cell (e.g., A1), then hold the Shift key and click on another (e.g., D4) to select the cells between A1 and D4; or click on a cell and drag the mouse across the desired range.

Entering date and time: Dates are displayed as MM/DD/YYYY, but there is no need to enter them in that format. For example, Excel will recognize Jan 9 or jan-9 as 1/9/2007 (January 9 of the current year) and Jan 9 1999 as 1/9/1999. To enter today's date, press Ctrl and ; (semicolon) together. Use a or p to indicate AM or PM; for example, 8:30 p is interpreted as 8:30 PM. To enter the current time, press Ctrl, Shift, and ; together.

Copy and paste all cells in a sheet: Ctrl+A to select, Ctrl+C to copy, and Ctrl+V to paste.

Sorting data: Data → Sort → Sort By…

Descriptive statistics and other statistical methods: Tools → Data Analysis → (statistical method). If Data Analysis is not available, click on Tools → Add-Ins and then select Analysis ToolPak and Analysis ToolPak - VBA.

Statistical and mathematical functions: Start with the '=' sign and then select a function from the function wizard (fx).

Inserting a chart: Click on the Chart Wizard (or Insert → Chart), select the chart type, give the input data range, update the chart options, and select the output location (worksheet).

Importing data into Excel: File → Open, choose the file type, and click on the file. Choose an option (Delimited or Fixed Width), choose the delimiter (Tab, Semicolon, Comma, Space, Other), and click Finish.
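The Delimited option splits each line of text at a chosen separator character. A minimal Python sketch of the same idea using the standard `csv` module, assuming the file's contents are already in a string; the function name and the tab-delimited sample data are hypothetical.

```python
import csv
import io

def read_delimited(text, delimiter=","):
    """Split delimited text into rows of fields (tab, semicolon, comma, space, ...)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

tab_data = "id\tage\tgroup\n1\t34\tA\n2\t51\tB\n"
rows = read_delimited(tab_data, delimiter="\t")
header, records = rows[0], rows[1:]
print(header)   # ['id', 'age', 'group']
print(records)  # [['1', '34', 'A'], ['2', '51', 'B']]
```

All fields come back as text; numeric conversion is a separate step, much as Excel or SPSS assigns a data type to each column after splitting.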

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.
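One classic source of this kind of error is the one-pass "computational" formula for the variance, which subtracts two very large, nearly equal numbers. This is shown as an illustration of the general problem, not as a description of Excel's internal code. The Python sketch below compares that formula with the two-pass formula on hypothetical data with a large mean and a small spread; the exact sample variance of these data is 30.

```python
import statistics

def naive_variance(xs):
    """One-pass formula (sum(x^2) - (sum x)^2 / n) / (n - 1).
    Prone to catastrophic cancellation when the mean is large relative to the spread."""
    n = len(xs)
    s = sum(xs)
    sq = sum(x * x for x in xs)
    return (sq - s * s / n) / (n - 1)

def two_pass_variance(xs):
    """Two-pass formula: subtract the mean first, then sum the squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

# Small spread around a very large value; the exact sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print(naive_variance(data))       # far from 30 because of cancellation
print(two_pass_variance(data))    # 30.0
print(statistics.variance(data))  # 30.0
```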

Statistical Package for the Social Sciences (SPSS)

A general-purpose statistical package, SPSS is widely used in the social sciences, particularly in sociology and psychology.

SPSS can import data from almost any type of file and generate tabulated reports, plots of distributions and trends, descriptive statistics, and complex statistical analyses.

Starting SPSS: Double-click on the SPSS icon on the desktop, or go to Programs → SPSS.

Opening an SPSS file: File → Open.

MENUS AND TOOLBARS

• Data Editor

Various pull-down menus appear at the top of the Data Editor window. These pull-down menus are at the heart of using SPSSWIN. The Data Editor menu items (with some of their uses) are:

FILE: used to open and save data files

EDIT: used to copy and paste data values, find data in a file, and insert variables and cases; its OPTIONS item allows the user to set general preferences as well as the setup for the Navigator, charts, etc.

VIEW: the user can change toolbars, and value labels can be shown in cells instead of data values

DATA: select, sort, or weight cases; merge files

TRANSFORM: compute new variables, recode variables, etc.

ANALYZE: perform various statistical procedures

GRAPHS: create bar charts, pie charts, etc.

UTILITIES: add comments to accompany the data file (and other advanced features)

ADD-ONS: features not currently installed (advanced statistical procedures)

WINDOW: switch between the data, syntax, and Navigator windows

HELP: access SPSSWIN Help information

Navigator (Output) Menus

When statistical procedures are run or charts are created, the output will appear in the Navigator window. The Navigator window contains many of the pull-down menus found in the Data Editor window. Some of the important menus in the Navigator window include:

INSERT: used to insert page breaks, titles, charts, etc.

FORMAT: for changing the alignment of a particular portion of the output

• Formatting Toolbar

When a table has been created by a statistical procedure, the user can edit the table to create a desired look or add/delete information. Beginning with version 14.0, the user has a choice of editing the table in the Output window or opening it in a separate Pivot Table (DEFINE) window. Various pull-down menus are activated when the user double-clicks on the table. These include:

EDIT: undo and redo a pivot; select a table or table body (e.g., to change the font)

INSERT: used to insert titles, captions, and footnotes

PIVOT: used to perform a pivot of the row and column variables

FORMAT: various modifications can be made to tables and cells

Importing tab-delimited data

In SPSSWIN, click on FILE → OPEN → DATA. Look in the appropriate location for the text file. Then select "Text" from "Files of type". Click on the file name and then click on "Open". You will see the Text Import Wizard – Step 1 of 6 dialog box.

You will now have an SPSS data file containing the former tab-delimited data. You simply need to add variable and value labels and define missing values.

Exporting data to Excel

Click on FILE → SAVE AS. Click on the file name for the file to be exported. For "Save as type", select Excel (*.xls) from the pull-down menu. You will notice the checkbox for "Write variable names to spreadsheet". Leave this checked, as you will want the variable names in the first row of each column in the Excel spreadsheet. Finally, click on Save.

• Additional menus

CHART EDITOR: used to edit a graph

SYNTAX EDITOR: used to edit the text in a syntax window

• Show or hide a toolbar

Click on VIEW → TOOLBARS, then check a toolbar to show it or uncheck it to hide it.

• Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location.

• Customize a toolbar

Click on VIEW → TOOLBARS → CUSTOMIZE.

Importing data from an Excel spreadsheet

Data from an Excel spreadsheet can be imported into SPSSWIN as follows:

1. In SPSSWIN, click on FILE → OPEN → DATA. The OPEN DATA FILE dialog box will appear.

2. Locate the file of interest. Use the Look In pull-down list to identify the folder containing the Excel file of interest.

3. From the FILE TYPE pull-down menu, select EXCEL (*.xls).

4. Click on the file name of interest and click on OPEN, or simply double-click on the file name.

5. Keep the box checked that reads "Read variable names from the first row of data". This presumes that the first row of the Excel data file contains variable names. [If the data resided in a different worksheet in the Excel file, this would need to be entered.]

6. Click on OK. The Excel data file will now appear in the SPSSWIN Data Editor.

7. The former Excel spreadsheet can now be saved as an SPSS file (FILE → SAVE AS) and is ready to be used in analyses. Typically, you would label variables and values and define missing values.

Importing an Access table

SPSSWIN does not offer a direct import for Access tables. Therefore, we must follow these steps:
1. Open the Access file.
2. Open the data table.
3. Save the data as an Excel file.
4. Follow the steps outlined above for importing data from an Excel spreadsheet into SPSSWIN.

Importing text files into SPSSWIN

Text data points are typically separated (or "delimited") by tabs or commas. Sometimes they are in fixed format.
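Fixed-format text has no delimiter; each field occupies the same character columns on every line, so the reader must be told the column widths. A minimal Python sketch, assuming a hypothetical layout (columns 1-3 = id, 4-6 = age, 7 = sex); the function name is also hypothetical.

```python
def read_fixed_width(lines, widths):
    """Cut each line into fields of the given character widths, left to right."""
    rows = []
    for line in lines:
        fields, start = [], 0
        for width in widths:
            # Strip padding spaces that fill out short values.
            fields.append(line[start:start + width].strip())
            start += width
        rows.append(fields)
    return rows

raw = ["001 34F", "002 51M"]
print(read_fixed_width(raw, widths=[3, 3, 1]))
# [['001', '34', 'F'], ['002', '51', 'M']]
```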

Running the FREQUENCIES procedure

1. Open the data file of interest (from the menus, click on FILE → OPEN → DATA).

2. From the menus, click on ANALYZE → DESCRIPTIVE STATISTICS → FREQUENCIES.

3. The FREQUENCIES dialog box will appear. The left-hand box lists all the variables defined in the data file (the source variable list). First, identify the variable(s) for which you want a frequency analysis: click on the variable name(s), then click the [ > ] pushbutton. The variable name(s) will now appear in the VARIABLE(S) box (the selected variable list). Repeat these steps for each variable of interest.

4. If all that is being requested is a frequency table showing counts and percentages (raw, adjusted, and cumulative), click on OK.
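The frequency table described in step 4 can be sketched in Python as follows. The sketch assumes that "raw" percent means the share of all cases and "adjusted" percent means the share of non-missing cases (SPSS labels these Percent and Valid Percent); the function name and the survey answers are hypothetical, and `None` stands in for a missing value.

```python
from collections import Counter

def frequency_table(values, missing=(None,)):
    """Rows of (value, count, percent, valid percent, cumulative percent).

    percent       -> share of all cases ("raw")
    valid percent -> share of non-missing cases ("adjusted")
    cumulative    -> running total of the valid percent
    """
    total = len(values)
    valid = [v for v in values if v not in missing]
    counts = Counter(valid)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        count = counts[value]
        valid_pct = 100 * count / len(valid)
        cumulative += valid_pct
        rows.append((value, count, round(100 * count / total, 1),
                     round(valid_pct, 1), round(cumulative, 1)))
    return rows

answers = ["Agree", "Disagree", "Agree", None, "Neutral"]
for row in frequency_table(answers):
    print(row)
# ('Agree', 2, 40.0, 50.0, 50.0)
# ('Disagree', 1, 20.0, 25.0, 75.0)
# ('Neutral', 1, 20.0, 25.0, 100.0)
```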

Requesting STATISTICS

Descriptive and summary STATISTICS can be requested for numeric variables. To request statistics:
1. From the FREQUENCIES dialog box, click on the STATISTICS pushbutton.
2. This will bring up the FREQUENCIES: STATISTICS dialog box.
3. The STATISTICS dialog box offers the user a variety of choices.

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE → DESCRIPTIVE STATISTICS → DESCRIPTIVES). The procedure offers many of the same statistics as the FREQUENCIES procedure, but without generating frequency tables.
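For comparison, a Python sketch of a DESCRIPTIVES-style summary using the standard `statistics` module. Reporting N, minimum, maximum, mean, and standard deviation is an assumption about the default output; the function name and the data are hypothetical.

```python
import statistics

def describe(values):
    """Summary statistics for one numeric variable, without a frequency table."""
    return {
        "N": len(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Mean": statistics.mean(values),
        "Std. Deviation": statistics.stdev(values),  # sample SD (n - 1 denominator)
    }

scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(describe(scores))
```

`statistics.stdev` uses the n - 1 denominator (sample SD); `statistics.pstdev` would give the population SD instead.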

Requesting CHARTS

One can request a chart (graph) to be created for a variable or variables included in a FREQUENCIES procedure:
1. In the FREQUENCIES dialog box, click on CHARTS.
2. The FREQUENCIES: CHARTS dialog box will appear. Choose the intended chart (e.g., bar chart, pie chart, or histogram).

Pasting charts into Word1 Click on the chart2 Click on the pulldown menu EDIT COPY OBJECTSrArr3 Go to the Word document in which the chart is to be embedded Click on EDIT rArr PASTE SPECIAL4 Select Formatted Text (RTF) and then click on OK5 Enlarge the graph to a desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible click once on the graph)

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

BASIC STATISTICAL PROCEDURES CROSSTABS

1 From the ANALYZE pull-down menu click on DESCRIPTIVE STATISTICS rArrCROSSTABS

2 The CROSSTABS Dialog Box will then open

3 From the variable selection box on the left click on a variable you wish to designate as the Row variable The values (codes) for the Row variable make up the rows of the crosstabs table Click on the arrow (gt) button for Row(s) Next click on a different variable you wish to designate as the Column variable The values (codes) for the Column variable make up the columns of the crosstabstable Click on the arrow (gt) button for Column(s)

4 You can specify more than one variable in the Row(s) andor Column(s) A cross table will be generated for each combination of Row and Column variables

Limitations SPSS users have less control over data manipulation and statistical output than other statistical packages such as SAS Stata etc

SPSS is a good first statistical package to perform quantitative research in social science because it is easy to use and because it can be a good starting point to learn more advanced statistical packages

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

QuestionsQuestions

Interval estimatorSample distribution

bull An interval estimator draws inferences about a population by

estimating the value of an unknown parameter using an interval

Population distribution Parameter

Interval Estimation

P-Value versus the Confidence IntervalP-Value versus the Confidence Interval

Two main ways to assess study precision and the role of

chance in a study

ndash P value measures ( in probability) the evidence against

the null hypothesis

ndash An interval within which the value of the parameter lies

with a specified probability

ndash Eg 95 CI implies that if one repeats a study 100

times the true measure of association will lie inside the

CI in 95 out of 100 measures

Procedures for sample size Procedures for sample size calculationcalculation

Selection of primary variables of interest and formulation

of hypotheses

Information of standard deviation ( if numeric) or

proportion (if categorical)

A tolerance level of significance ()

Selection of reasonable test statistic

Power or Confidence level

A scientifically or clinically meaning effect difference

Brief concept of Statistical SoftwareBrief concept of Statistical Software

There are many software packages to perform statistical

analysis and visualization of data Some of them are-

ndash System for Statistical Analysis (SAS) S-plus R Matlab Minitab

BMDP STATA SPSS StatXact Statistica LISREL JMP

GLIM HIL MS Excel etc We will discuss MS Excel and SPSS in

brief

useful websites-

httpwwwR-projectorg (a free but powerful statistical software)

Microsoft ExcelMicrosoft Excel

A Spreadsheet Application It features calculation graphing tools pivot tables and a macro programming language called VBA (Visual Basic for Applications)

There are many versions of MS-Excel Excel XP Excel 2003 Excel 2007 are capable of performing a number of statistical analyses

Starting MS Excel Double click on the Microsoft Excel icon on the desktop or Click on Start --gt Programs --gt Microsoft Excel

Worksheet Consists of a multiple grid of cells with numbered rows down the page and alphabetically-tilted columns across the page Each cell is referenced by its coordinates For example A3 is used to refer to the cell in column A and row 3 B10B20 is used to refer to the range of cells in column B and rows 10 through 20

Microsoft ExcelMicrosoft Excel

Creating Formulas 1 Click the cell that you want to enter the formula 2 Type = (an equal sign) 3 Click the Function Button 4 Select the formula you want and step through the on-screen instructions

xf

Opening a document File Open (From a existing workbook) Change the directory area or drive to look for file in other locations

Creating a new workbook FileNewBlank Document

Saving a File FileSave

Selecting more than one cell Click on a cell eg A1) then hold the Shift key and click on another (eg D4) to select cells between and A1 and D4 or Click on a cell and drag the mouse across the desired range

Microsoft ExcelMicrosoft Excel

Entering Date and Time Dates are stored as MMDDYYYY No need to enter in that format For example Excel will recognize Jan 9 or jan-9 as 192007 and Jan 9 1999 as 191999 To enter todayrsquos date press Ctrl and together Use a or p to indicate am or pm For example 830 p is interpreted as 830 pm To enter current time press Ctrl and together

Copy and Paste all cells in a Sheet Ctrl+A for selecting Ctrl +C for copying and Ctrl+V for Pasting

Sorting Data Sort Sort By hellip

Descriptive Statistics and other Statistical methods ToolsData Analysis Statistical method If Data Analysis is not available then click on Tools Add-Ins and then select Analysis ToolPack and Analysis toolPack-Vba

Microsoft ExcelMicrosoft Excel

Statistical and Mathematical Function Start with lsquo=lsquo sign and then select function from function wizard xf

Inserting a Chart Click on Chart Wizard (or InsertChart) select chart give Input data range Update the Chart options and Select output range Worksheet

Importing Data in Excel File open FileType Click on File Choose Option ( DelimitedFixed Width) Choose Options (Tab Semicolon Comma Space Other) Finish

Limitations Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extremecases

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

A general purpose statistical package SPSS is widely used in the social sciences particularly in sociology and psychology

SPSS can import data from almost any type of file to generate tabulated reports plots of distributions and trends descriptive statistics and complex statistical analyzes

Starting SPSS Double Click on SPSS on desktop or ProgramSPSS

Opening a SPSS file FileOpen

bull Data Editor

Various pull-down menus appear at the top of the Data Editor window These pull-down menus are at the heart of using SPSSWIN The Data Editor menu items (with some of the uses of the menu) are

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

FILE used to open and save data files

EDIT used to copy and paste data values used to find data in a file insert variables and cases OPTIONS allows the user to set general preferences as well as the setup for the Navigator Charts etc

VIEW user can change toolbars value labels can be seen in cells instead of data values

DATA select sort or weight cases merge files

MENUS AND TOOLBARS

TRANSFORM Compute new variables recode variables etc

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

ANALYZE perform various statistical procedures

GRAPHS create bar and pie charts etc

UTILITIES add comments to accompany data file (and other advanced features)

ADD-ons these are features not currently installed (advanced statistical procedures)

WINDOW switch between data syntax and navigator windows

HELP to access SPSSWIN Help information

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Navigator (Output) Menus

When statistical procedures are run or charts are created the output will appear in the Navigator window The Navigator window contains many of the pull-down menus found in the Data Editor window Some of the important menus in the Navigator window include

INSERT used to insert page breaks titles charts etc

FORMAT for changing the alignment of a particular portion of the output

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Formatting Toolbar

When a table has been created by a statistical procedure the user can edit the table to create a desired look or adddelete information Beginning with version 140 the user has a choice of editing the table in the Output or opening it in a separate Pivot Table (DEFINE) window Various pulldown menus are activated when the user double clicks on the table These include

EDIT undo and redo a pivot select a table or table body (eg to change the font)

INSERT used to insert titles captions and footnotes

PIVOT used to perform a pivot of the row and column variables

FORMAT various modifications can be made to tables and cells

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing tab-delimited dataIn SPSSWIN click on FILE OPEN DATA Look in the appropriate location for rArr rArrthe text file Then select ldquoTextrdquo from ldquoFiles of typerdquo Click on the file name and then click on ldquoOpenrdquo You will see the Text Import Wizard ndash step 1 of 6 dialog box

You will now have an SPSS data file containing the former tab-delimited data You simply need to add variable and value labels and define missing values

Exporting Data to Excelclick on FILE SAVE AS Click on the File Name for the file to be exported For rArrthe ldquoSave as Typerdquo select from the pull-down menu Excel (xls) You will notice the checkbox for ldquowrite variable names to spreadsheetrdquo Leave this checked as you will want the variable names to be in the first row of each column in the Excel spreadsheet Finally click on Save

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Additional menusCHART EDITOR used to edit a graph

SYNTAX EDITOR used to edit the text in a syntax window

bull Show or hide a toolbar

Click on VIEW TOOLBARS 1048635to show it to hide itrArr rArr

bull Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location

bull Customize a toolbar

Click on VIEW TOOLBARS CUSTOMIZErArr rArr

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheetData from an Excel spreadsheet can be imported into SPSSWIN as follows1 In SPSSWIN click on FILE OPEN DATA The OPEN DATA FILE Dialog rArr rArrBox will appear2 Locate the file of interest Use the Look In pull-down list to identify the folder containing the Excel file of interest3 From the FILE TYPE pull down menu select EXCEL (xls)

4 Click on the file name of interest and click on OPEN or simply double-click on the file name

5 Keep the box checked that reads Read variable names from the first row of data This presumes that the first row of the Excel data file contains variable names in the first row [If the data resided in a different worksheet in the Excel file this would need to be entered]

6 Click on OK The Excel data file will now appear in the SPSSWIN Data Editor

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheet

7 The former EXCEL spreadsheet can now be saved as an SPSS file (FILE rArrSAVE AS) and is ready to be used in analyses Typically you would label variable and values and define missing values

Importing an Access tableSPSSWIN does not offer a direct import for Access tables Therefore we must follow these steps1 Open the Access file2 Open the data table3 Save the data as an Excel file4 Follow the steps outlined in the data import from Excel Spreadsheet to SPSSWIN

Importing Text Files into SPSSWINText data points typically are separated (or ldquodelimitedrdquo) by tabs or commas Sometimes they can be of fixed format

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Running the FREQUENCIES procedure

1 Open the data file (from the menus click on FILE OPEN DATA) of rArr rArrinterest

2 From the menus click on ANALYZE DESCRIPTIVE STATISTICS rArr rArrFREQUENCIES

3 The FREQUENCIES Dialog Box will appear In the left-hand box will be a listing (source variable list) of all the variables that have been defined in the data file The first step is identifying the variable(s) for which you want to run a frequency analysis Click on a variable name(s) Then click the [ gt ] pushbutton The variable name(s) will now appear in the VARIABLE[S] box (selected variable list) Repeat these steps for each variable of interest

4 If all that is being requested is a frequency table showing count percentages (raw adjusted and cumulative) then click on OK

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Requesting STATISTICSDescriptive and summary STATISTICS can be requested for numeric variables To request Statistics1 From the FREQUENCIES Dialog Box click on the STATISTICS pushbutton2 This will bring up the FREQUENCIES STATISTICS Dialog Box3 The STATISTICS Dialog Box offers the user a variety of choices

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE DESCRIPTIVE STATISTICS DESCRIPTIVES) The rArr rArrprocedure offers many of the same statistics as the FREQUENCIES procedure but without generating frequency analysis tables

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Requesting CHARTSOne can request a chart (graph) to be created for a variable or variables included in a FREQUENCIES procedure

1 In the FREQUENCIES Dialog box click on CHARTS2 The FREQUENCIES CHARTS Dialog box will appear Choose the intended chart (eg Bar diagram Pie chart histogram

Pasting charts into Word1 Click on the chart2 Click on the pulldown menu EDIT COPY OBJECTSrArr3 Go to the Word document in which the chart is to be embedded Click on EDIT rArr PASTE SPECIAL4 Select Formatted Text (RTF) and then click on OK5 Enlarge the graph to a desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible click once on the graph)

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

BASIC STATISTICAL PROCEDURES CROSSTABS

1 From the ANALYZE pull-down menu click on DESCRIPTIVE STATISTICS rArrCROSSTABS

2 The CROSSTABS Dialog Box will then open

3 From the variable selection box on the left click on a variable you wish to designate as the Row variable The values (codes) for the Row variable make up the rows of the crosstabs table Click on the arrow (gt) button for Row(s) Next click on a different variable you wish to designate as the Column variable The values (codes) for the Column variable make up the columns of the crosstabstable Click on the arrow (gt) button for Column(s)

4 You can specify more than one variable in the Row(s) andor Column(s) A cross table will be generated for each combination of Row and Column variables

Limitations SPSS users have less control over data manipulation and statistical output than other statistical packages such as SAS Stata etc

SPSS is a good first statistical package to perform quantitative research in social science because it is easy to use and because it can be a good starting point to learn more advanced statistical packages

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

QuestionsQuestions

P-Value versus the Confidence IntervalP-Value versus the Confidence Interval

Two main ways to assess study precision and the role of

chance in a study

ndash P value measures ( in probability) the evidence against

the null hypothesis

ndash An interval within which the value of the parameter lies

with a specified probability

ndash Eg 95 CI implies that if one repeats a study 100

times the true measure of association will lie inside the

CI in 95 out of 100 measures

Procedures for sample size Procedures for sample size calculationcalculation

Selection of primary variables of interest and formulation

of hypotheses

Information of standard deviation ( if numeric) or

proportion (if categorical)

A tolerance level of significance ()

Selection of reasonable test statistic

Power or Confidence level

A scientifically or clinically meaning effect difference

Brief concept of Statistical SoftwareBrief concept of Statistical Software

There are many software packages to perform statistical

analysis and visualization of data Some of them are-

ndash System for Statistical Analysis (SAS) S-plus R Matlab Minitab

BMDP STATA SPSS StatXact Statistica LISREL JMP

GLIM HIL MS Excel etc We will discuss MS Excel and SPSS in

brief

useful websites-

httpwwwR-projectorg (a free but powerful statistical software)

Microsoft ExcelMicrosoft Excel

A Spreadsheet Application It features calculation graphing tools pivot tables and a macro programming language called VBA (Visual Basic for Applications)

There are many versions of MS-Excel Excel XP Excel 2003 Excel 2007 are capable of performing a number of statistical analyses

Starting MS Excel Double click on the Microsoft Excel icon on the desktop or Click on Start --gt Programs --gt Microsoft Excel

Worksheet Consists of a multiple grid of cells with numbered rows down the page and alphabetically-tilted columns across the page Each cell is referenced by its coordinates For example A3 is used to refer to the cell in column A and row 3 B10B20 is used to refer to the range of cells in column B and rows 10 through 20

Microsoft ExcelMicrosoft Excel

Creating Formulas 1 Click the cell that you want to enter the formula 2 Type = (an equal sign) 3 Click the Function Button 4 Select the formula you want and step through the on-screen instructions

xf

Opening a document File Open (From a existing workbook) Change the directory area or drive to look for file in other locations

Creating a new workbook FileNewBlank Document

Saving a File FileSave

Selecting more than one cell Click on a cell eg A1) then hold the Shift key and click on another (eg D4) to select cells between and A1 and D4 or Click on a cell and drag the mouse across the desired range

Microsoft ExcelMicrosoft Excel

Entering Date and Time Dates are stored as MMDDYYYY No need to enter in that format For example Excel will recognize Jan 9 or jan-9 as 192007 and Jan 9 1999 as 191999 To enter todayrsquos date press Ctrl and together Use a or p to indicate am or pm For example 830 p is interpreted as 830 pm To enter current time press Ctrl and together

Copy and Paste all cells in a Sheet Ctrl+A for selecting Ctrl +C for copying and Ctrl+V for Pasting

Sorting Data Sort Sort By hellip

Descriptive Statistics and other Statistical methods ToolsData Analysis Statistical method If Data Analysis is not available then click on Tools Add-Ins and then select Analysis ToolPack and Analysis toolPack-Vba

Microsoft ExcelMicrosoft Excel

Statistical and Mathematical Function Start with lsquo=lsquo sign and then select function from function wizard xf

Inserting a Chart Click on Chart Wizard (or InsertChart) select chart give Input data range Update the Chart options and Select output range Worksheet

Importing Data in Excel File open FileType Click on File Choose Option ( DelimitedFixed Width) Choose Options (Tab Semicolon Comma Space Other) Finish

Limitations Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extremecases

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

A general purpose statistical package SPSS is widely used in the social sciences particularly in sociology and psychology

SPSS can import data from almost any type of file to generate tabulated reports plots of distributions and trends descriptive statistics and complex statistical analyzes

Starting SPSS: Double-click on SPSS on the desktop or go to Programs > SPSS.

Opening an SPSS file: File > Open

MENUS AND TOOLBARS

• Data Editor

Various pull-down menus appear at the top of the Data Editor window. These pull-down menus are at the heart of using SPSSWIN. The Data Editor menu items (with some of the uses of each menu) are:


FILE: used to open and save data files.

EDIT: used to copy and paste data values, find data in a file, and insert variables and cases. OPTIONS (under EDIT) allows the user to set general preferences as well as the setup for the Navigator, Charts, etc.

VIEW: the user can change toolbars; value labels can be shown in cells instead of data values.

DATA: select, sort or weight cases; merge files.


TRANSFORM: compute new variables, recode variables, etc.


ANALYZE: perform various statistical procedures.

GRAPHS: create bar and pie charts, etc.

UTILITIES: add comments to accompany a data file (and other advanced features).

ADD-ONS: features that are not currently installed (advanced statistical procedures).

WINDOW: switch between the data, syntax and Navigator windows.

HELP: access SPSSWIN Help information.


Navigator (Output) Menus

When statistical procedures are run or charts are created, the output appears in the Navigator window. The Navigator window contains many of the pull-down menus found in the Data Editor window. Some of the important menus in the Navigator window include:

INSERT: used to insert page breaks, titles, charts, etc.

FORMAT: used to change the alignment of a particular portion of the output.


• Formatting Toolbar

When a table has been created by a statistical procedure, the user can edit the table to create a desired look or add/delete information. Beginning with version 14.0, the user has a choice of editing the table in the Output or opening it in a separate Pivot Table (DEFINE) window. Various pull-down menus are activated when the user double-clicks on the table. These include:

EDIT: undo and redo a pivot; select a table or table body (e.g., to change the font).

INSERT: used to insert titles, captions and footnotes.

PIVOT: used to perform a pivot of the row and column variables.

FORMAT: various modifications can be made to tables and cells.


Importing tab-delimited data

In SPSSWIN, click on FILE > OPEN > DATA. Look in the appropriate location for the text file. Then select “Text” from “Files of type”. Click on the file name and then click on “Open”. You will see the Text Import Wizard – Step 1 of 6 dialog box.

After completing the wizard's steps, you will have an SPSS data file containing the former tab-delimited data. You then need to add variable and value labels and define missing values.
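Value labels and missing-value codes are easy to picture outside SPSS. The Python sketch below uses a hypothetical coding scheme (1 = Male, 2 = Female, 9 = user-missing) to show what defining them does: missing codes are left out of the analysis, and labels replace the numeric codes only for display.

```python
# Hypothetical coding scheme for one variable.
value_labels = {1: "Male", 2: "Female"}
missing_codes = {9}  # e.g. "no answer", declared as user-missing

raw_sex = [1, 2, 2, 9, 1, 2]

# Exclude user-missing codes before analysis, then attach labels for display.
valid = [v for v in raw_sex if v not in missing_codes]
labelled = [value_labels[v] for v in valid]

print(f"valid N = {len(valid)}, missing = {len(raw_sex) - len(valid)}")
print(labelled)
```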

Exporting data to Excel

Click on FILE > SAVE AS. Click on the file name for the file to be exported. For “Save as type”, select Excel (*.xls) from the pull-down menu. You will notice the checkbox for “Write variable names to spreadsheet”. Leave this checked, since you will want the variable names in the first row of each column in the Excel spreadsheet. Finally, click on Save.


• Additional menus

CHART EDITOR: used to edit a graph.

SYNTAX EDITOR: used to edit the text in a syntax window.

• Show or hide a toolbar

Click on VIEW > TOOLBARS; check a toolbar to show it, uncheck it to hide it.

• Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location.

• Customize a toolbar

Click on VIEW > TOOLBARS > CUSTOMIZE.


Importing data from an EXCEL spreadsheet

Data from an Excel spreadsheet can be imported into SPSSWIN as follows:

1. In SPSSWIN, click on FILE > OPEN > DATA. The OPEN DATA FILE dialog box will appear.

2. Locate the file of interest. Use the Look In pull-down list to identify the folder containing the Excel file of interest.

3. From the FILE TYPE pull-down menu, select EXCEL (*.xls).

4. Click on the file name of interest and click on OPEN, or simply double-click on the file name.

5. Keep the box checked that reads “Read variable names from the first row of data”. This presumes that the first row of the Excel data file contains variable names. [If the data resided in a different worksheet in the Excel file, this would need to be entered.]

6. Click on OK. The Excel data file will now appear in the SPSSWIN Data Editor.


7. The former EXCEL spreadsheet can now be saved as an SPSS file (FILE > SAVE AS) and is ready to be used in analyses. Typically, you would label variables and values and define missing values.

Importing an Access table

SPSSWIN does not offer a direct import for Access tables. Therefore, we must follow these steps:

1. Open the Access file.
2. Open the data table.
3. Save the data as an Excel file.
4. Follow the steps outlined above for importing data from an Excel spreadsheet into SPSSWIN.

Importing text files into SPSSWIN

Text data points typically are separated (or “delimited”) by tabs or commas. Sometimes they can be of fixed format.


Running the FREQUENCIES procedure

1. Open the data file of interest (from the menus, click on FILE > OPEN > DATA).

2. From the menus, click on ANALYZE > DESCRIPTIVE STATISTICS > FREQUENCIES.

3. The FREQUENCIES dialog box will appear. The left-hand box (the source variable list) lists all the variables that have been defined in the data file. The first step is identifying the variable(s) for which you want to run a frequency analysis. Click on a variable name, then click the [ > ] pushbutton. The variable name will now appear in the VARIABLE[S] box (selected variable list). Repeat these steps for each variable of interest.

4. If all that is being requested is a frequency table showing counts and percentages (raw, adjusted and cumulative), click on OK.
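The columns of a frequency table can be reproduced by hand. A minimal Python sketch, assuming the raw percent uses all cases, the adjusted (valid) percent excludes user-missing codes, and the cumulative percent accumulates the valid percent. The data and the missing code 9 are hypothetical.

```python
from collections import Counter

def frequency_table(values, missing=frozenset()):
    """Rows of (value, frequency, percent, valid percent, cumulative percent)."""
    total = len(values)
    counts = Counter(v for v in values if v not in missing)
    n_valid = sum(counts.values())
    table, cumulative = [], 0.0
    for value in sorted(counts):
        freq = counts[value]
        valid_pct = 100 * freq / n_valid  # adjusted: missing cases excluded
        cumulative += valid_pct
        table.append((value, freq, 100 * freq / total, valid_pct, cumulative))
    return table

# Hypothetical data: 9 is a user-missing code.
for row in frequency_table([1, 2, 2, 3, 3, 3, 9, 9], missing={9}):
    print("%-5s %4d %7.1f %7.1f %7.1f" % row)
```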


Requesting STATISTICS

Descriptive and summary STATISTICS can be requested for numeric variables. To request statistics:

1. From the FREQUENCIES dialog box, click on the STATISTICS pushbutton.
2. This will bring up the FREQUENCIES: STATISTICS dialog box.
3. The STATISTICS dialog box offers the user a variety of choices.

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE > DESCRIPTIVE STATISTICS > DESCRIPTIVES). The procedure offers many of the same statistics as the FREQUENCIES procedure, but without generating frequency tables.


Requesting CHARTS

One can request a chart (graph) to be created for a variable or variables included in a FREQUENCIES procedure:

1. In the FREQUENCIES dialog box, click on CHARTS.
2. The FREQUENCIES: CHARTS dialog box will appear. Choose the intended chart (e.g., bar chart, pie chart, histogram).

Pasting charts into Word

1. Click on the chart.
2. Click on the pull-down menu EDIT > COPY OBJECTS.
3. Go to the Word document in which the chart is to be embedded. Click on EDIT > PASTE SPECIAL.
4. Select Formatted Text (RTF) and then click on OK.
5. Enlarge the graph to the desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible, click once on the graph).


BASIC STATISTICAL PROCEDURES: CROSSTABS

1. From the ANALYZE pull-down menu, click on DESCRIPTIVE STATISTICS > CROSSTABS.

2. The CROSSTABS dialog box will then open.

3. From the variable selection box on the left, click on a variable you wish to designate as the Row variable. The values (codes) for the Row variable make up the rows of the crosstabs table. Click on the arrow (>) button for Row(s). Next, click on a different variable you wish to designate as the Column variable. The values (codes) for the Column variable make up the columns of the crosstabs table. Click on the arrow (>) button for Column(s).

4. You can specify more than one variable in the Row(s) and/or Column(s). A cross table will be generated for each combination of Row and Column variables.
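A cross table simply counts the cases that fall into each combination of row and column values. A minimal Python sketch with hypothetical paired observations, printing counts plus row and column totals:

```python
from collections import Counter

# Hypothetical cases: (row variable value, column variable value) for each subject.
cases = [("A", "yes"), ("A", "no"), ("B", "yes"), ("A", "yes"), ("B", "no"), ("B", "no")]

counts = Counter(cases)  # count per (row, column) combination
rows = sorted({r for r, _ in cases})
cols = sorted({c for _, c in cases})

# Header, one line per row value with its row total, then the column totals.
print("".ljust(6) + "".join(c.rjust(6) for c in cols) + "Total".rjust(7))
for r in rows:
    cells = [counts[(r, c)] for c in cols]
    print(r.ljust(6) + "".join(str(x).rjust(6) for x in cells) + str(sum(cells)).rjust(7))
col_totals = [sum(counts[(r, c)] for r in rows) for c in cols]
print("Total".ljust(6) + "".join(str(x).rjust(6) for x in col_totals) + str(len(cases)).rjust(7))
```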

Limitations: SPSS users have less control over data manipulation and statistical output than users of other statistical packages such as SAS, Stata, etc.

SPSS is a good first statistical package for quantitative research in the social sciences because it is easy to use and can be a good starting point for learning more advanced statistical packages.


Questions

Procedures for sample size calculation

– Selection of primary variables of interest and formulation of hypotheses
– Information on the standard deviation (if numeric) or proportion (if categorical)
– A tolerance level of significance (α)
– Selection of a reasonable test statistic
– Power or confidence level
– A scientifically or clinically meaningful effect difference
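As one concrete instance of these ingredients, a common normal-approximation formula for comparing two independent means with equal group sizes is n per group = 2(z₁₋α/₂ + z₁₋β)² σ² / Δ², where σ is the standard deviation and Δ the meaningful difference. A minimal Python sketch, assuming a two-sided test and a known σ. Other designs and test statistics need other formulas, and the inputs below are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group_two_means(sigma, delta, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group, two-sided two-sample comparison of means."""
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)                 # round up to whole subjects

# Hypothetical inputs: SD of 10 units, meaningful difference of 5 units.
print(n_per_group_two_means(sigma=10, delta=5))
```

Raising the power or shrinking the meaningful difference increases n, which is why each item in the list above has to be fixed before the calculation.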

Brief concept of statistical software

There are many software packages to perform statistical analysis and visualization of data. Some of them are:

– Statistical Analysis System (SAS), S-PLUS, R, MATLAB, Minitab, BMDP, Stata, SPSS, StatXact, Statistica, LISREL, JMP, GLIM, HIL, MS Excel, etc. We will discuss MS Excel and SPSS in brief.

Useful website:

http://www.R-project.org (free but powerful statistical software)

Microsoft Excel

A spreadsheet application. It features calculation, graphing tools, pivot tables and a macro programming language called VBA (Visual Basic for Applications).

There are many versions of MS Excel; Excel XP, Excel 2003 and Excel 2007 are capable of performing a number of statistical analyses.

Starting MS Excel: Double-click on the Microsoft Excel icon on the desktop or click on Start > Programs > Microsoft Excel.

Worksheet: Consists of a grid of cells with numbered rows down the page and alphabetically titled columns across the page. Each cell is referenced by its coordinates. For example, A3 refers to the cell in column A and row 3, and B10:B20 refers to the range of cells in column B, rows 10 through 20.
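The A1-style addressing (single cells such as A3, ranges such as B10:B20 or A1:D4) can be sketched in Python. The helper names are hypothetical, and the sketch assumes the first reference in a range is the top-left cell.

```python
import re

def column_number(letters):
    """Column letters to a 1-based index: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def column_letters(n):
    """Inverse of column_number: 1 -> A, 27 -> AA."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def expand_range(ref):
    """Expand a range such as 'B10:B20' or 'A1:D4' into cell names, row by row."""
    start, end = ref.split(":")
    (c1, r1), (c2, r2) = (re.fullmatch(r"([A-Za-z]+)(\d+)", p).groups() for p in (start, end))
    cols = range(column_number(c1), column_number(c2) + 1)
    rows = range(int(r1), int(r2) + 1)
    return [f"{column_letters(c)}{r}" for r in rows for c in cols]

print(len(expand_range("B10:B20")))  # 11 cells
print(expand_range("A1:B2"))
```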


Creating formulas:
1. Click the cell in which you want to enter the formula.
2. Type = (an equal sign).
3. Click the Function button (fx).
4. Select the formula you want and step through the on-screen instructions.

Opening a document File Open (From a existing workbook) Change the directory area or drive to look for file in other locations

Creating a new workbook FileNewBlank Document

Saving a File FileSave

Selecting more than one cell Click on a cell eg A1) then hold the Shift key and click on another (eg D4) to select cells between and A1 and D4 or Click on a cell and drag the mouse across the desired range

Microsoft ExcelMicrosoft Excel

Entering Date and Time Dates are stored as MMDDYYYY No need to enter in that format For example Excel will recognize Jan 9 or jan-9 as 192007 and Jan 9 1999 as 191999 To enter todayrsquos date press Ctrl and together Use a or p to indicate am or pm For example 830 p is interpreted as 830 pm To enter current time press Ctrl and together

Copy and Paste all cells in a Sheet Ctrl+A for selecting Ctrl +C for copying and Ctrl+V for Pasting

Sorting Data Sort Sort By hellip

Descriptive Statistics and other Statistical methods ToolsData Analysis Statistical method If Data Analysis is not available then click on Tools Add-Ins and then select Analysis ToolPack and Analysis toolPack-Vba

Microsoft ExcelMicrosoft Excel

Statistical and Mathematical Function Start with lsquo=lsquo sign and then select function from function wizard xf

Inserting a Chart Click on Chart Wizard (or InsertChart) select chart give Input data range Update the Chart options and Select output range Worksheet

Importing Data in Excel File open FileType Click on File Choose Option ( DelimitedFixed Width) Choose Options (Tab Semicolon Comma Space Other) Finish

Limitations Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extremecases

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

A general purpose statistical package SPSS is widely used in the social sciences particularly in sociology and psychology

SPSS can import data from almost any type of file to generate tabulated reports plots of distributions and trends descriptive statistics and complex statistical analyzes

Starting SPSS Double Click on SPSS on desktop or ProgramSPSS

Opening a SPSS file FileOpen

bull Data Editor

Various pull-down menus appear at the top of the Data Editor window These pull-down menus are at the heart of using SPSSWIN The Data Editor menu items (with some of the uses of the menu) are

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

FILE used to open and save data files

EDIT used to copy and paste data values used to find data in a file insert variables and cases OPTIONS allows the user to set general preferences as well as the setup for the Navigator Charts etc

VIEW user can change toolbars value labels can be seen in cells instead of data values

DATA select sort or weight cases merge files

MENUS AND TOOLBARS

TRANSFORM Compute new variables recode variables etc

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

ANALYZE perform various statistical procedures

GRAPHS create bar and pie charts etc

UTILITIES add comments to accompany data file (and other advanced features)

ADD-ons these are features not currently installed (advanced statistical procedures)

WINDOW switch between data syntax and navigator windows

HELP to access SPSSWIN Help information

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Navigator (Output) Menus

When statistical procedures are run or charts are created the output will appear in the Navigator window The Navigator window contains many of the pull-down menus found in the Data Editor window Some of the important menus in the Navigator window include

INSERT used to insert page breaks titles charts etc

FORMAT for changing the alignment of a particular portion of the output

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Formatting Toolbar

When a table has been created by a statistical procedure the user can edit the table to create a desired look or adddelete information Beginning with version 140 the user has a choice of editing the table in the Output or opening it in a separate Pivot Table (DEFINE) window Various pulldown menus are activated when the user double clicks on the table These include

EDIT undo and redo a pivot select a table or table body (eg to change the font)

INSERT used to insert titles captions and footnotes

PIVOT used to perform a pivot of the row and column variables

FORMAT various modifications can be made to tables and cells

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing tab-delimited dataIn SPSSWIN click on FILE OPEN DATA Look in the appropriate location for rArr rArrthe text file Then select ldquoTextrdquo from ldquoFiles of typerdquo Click on the file name and then click on ldquoOpenrdquo You will see the Text Import Wizard ndash step 1 of 6 dialog box

You will now have an SPSS data file containing the former tab-delimited data You simply need to add variable and value labels and define missing values

Exporting Data to Excelclick on FILE SAVE AS Click on the File Name for the file to be exported For rArrthe ldquoSave as Typerdquo select from the pull-down menu Excel (xls) You will notice the checkbox for ldquowrite variable names to spreadsheetrdquo Leave this checked as you will want the variable names to be in the first row of each column in the Excel spreadsheet Finally click on Save

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Additional menusCHART EDITOR used to edit a graph

SYNTAX EDITOR used to edit the text in a syntax window

bull Show or hide a toolbar

Click on VIEW TOOLBARS 1048635to show it to hide itrArr rArr

bull Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location

bull Customize a toolbar

Click on VIEW TOOLBARS CUSTOMIZErArr rArr

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheetData from an Excel spreadsheet can be imported into SPSSWIN as follows1 In SPSSWIN click on FILE OPEN DATA The OPEN DATA FILE Dialog rArr rArrBox will appear2 Locate the file of interest Use the Look In pull-down list to identify the folder containing the Excel file of interest3 From the FILE TYPE pull down menu select EXCEL (xls)

4 Click on the file name of interest and click on OPEN or simply double-click on the file name

5 Keep the box checked that reads Read variable names from the first row of data This presumes that the first row of the Excel data file contains variable names in the first row [If the data resided in a different worksheet in the Excel file this would need to be entered]

6 Click on OK The Excel data file will now appear in the SPSSWIN Data Editor

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheet

7 The former EXCEL spreadsheet can now be saved as an SPSS file (FILE rArrSAVE AS) and is ready to be used in analyses Typically you would label variable and values and define missing values

Importing an Access tableSPSSWIN does not offer a direct import for Access tables Therefore we must follow these steps1 Open the Access file2 Open the data table3 Save the data as an Excel file4 Follow the steps outlined in the data import from Excel Spreadsheet to SPSSWIN

Importing Text Files into SPSSWINText data points typically are separated (or ldquodelimitedrdquo) by tabs or commas Sometimes they can be of fixed format

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Running the FREQUENCIES procedure

1 Open the data file (from the menus click on FILE OPEN DATA) of rArr rArrinterest

2 From the menus click on ANALYZE DESCRIPTIVE STATISTICS rArr rArrFREQUENCIES

3 The FREQUENCIES Dialog Box will appear In the left-hand box will be a listing (source variable list) of all the variables that have been defined in the data file The first step is identifying the variable(s) for which you want to run a frequency analysis Click on a variable name(s) Then click the [ gt ] pushbutton The variable name(s) will now appear in the VARIABLE[S] box (selected variable list) Repeat these steps for each variable of interest

4 If all that is being requested is a frequency table showing count percentages (raw adjusted and cumulative) then click on OK

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Requesting STATISTICSDescriptive and summary STATISTICS can be requested for numeric variables To request Statistics1 From the FREQUENCIES Dialog Box click on the STATISTICS pushbutton2 This will bring up the FREQUENCIES STATISTICS Dialog Box3 The STATISTICS Dialog Box offers the user a variety of choices

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE DESCRIPTIVE STATISTICS DESCRIPTIVES) The rArr rArrprocedure offers many of the same statistics as the FREQUENCIES procedure but without generating frequency analysis tables

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Requesting CHARTSOne can request a chart (graph) to be created for a variable or variables included in a FREQUENCIES procedure

1 In the FREQUENCIES Dialog box click on CHARTS2 The FREQUENCIES CHARTS Dialog box will appear Choose the intended chart (eg Bar diagram Pie chart histogram

Pasting charts into Word1 Click on the chart2 Click on the pulldown menu EDIT COPY OBJECTSrArr3 Go to the Word document in which the chart is to be embedded Click on EDIT rArr PASTE SPECIAL4 Select Formatted Text (RTF) and then click on OK5 Enlarge the graph to a desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible click once on the graph)

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

BASIC STATISTICAL PROCEDURES CROSSTABS

1 From the ANALYZE pull-down menu click on DESCRIPTIVE STATISTICS rArrCROSSTABS

2 The CROSSTABS Dialog Box will then open

3 From the variable selection box on the left click on a variable you wish to designate as the Row variable The values (codes) for the Row variable make up the rows of the crosstabs table Click on the arrow (gt) button for Row(s) Next click on a different variable you wish to designate as the Column variable The values (codes) for the Column variable make up the columns of the crosstabstable Click on the arrow (gt) button for Column(s)

4 You can specify more than one variable in the Row(s) andor Column(s) A cross table will be generated for each combination of Row and Column variables

Limitations SPSS users have less control over data manipulation and statistical output than other statistical packages such as SAS Stata etc

SPSS is a good first statistical package to perform quantitative research in social science because it is easy to use and because it can be a good starting point to learn more advanced statistical packages

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

QuestionsQuestions

Brief concept of Statistical SoftwareBrief concept of Statistical Software

There are many software packages to perform statistical

analysis and visualization of data Some of them are-

ndash System for Statistical Analysis (SAS) S-plus R Matlab Minitab

BMDP STATA SPSS StatXact Statistica LISREL JMP

GLIM HIL MS Excel etc We will discuss MS Excel and SPSS in

brief

useful websites-

httpwwwR-projectorg (a free but powerful statistical software)

Microsoft ExcelMicrosoft Excel

A Spreadsheet Application It features calculation graphing tools pivot tables and a macro programming language called VBA (Visual Basic for Applications)

There are many versions of MS-Excel Excel XP Excel 2003 Excel 2007 are capable of performing a number of statistical analyses

Starting MS Excel Double click on the Microsoft Excel icon on the desktop or Click on Start --gt Programs --gt Microsoft Excel

Worksheet Consists of a multiple grid of cells with numbered rows down the page and alphabetically-tilted columns across the page Each cell is referenced by its coordinates For example A3 is used to refer to the cell in column A and row 3 B10B20 is used to refer to the range of cells in column B and rows 10 through 20

Microsoft ExcelMicrosoft Excel

Creating Formulas 1 Click the cell that you want to enter the formula 2 Type = (an equal sign) 3 Click the Function Button 4 Select the formula you want and step through the on-screen instructions

xf

Opening a document File Open (From a existing workbook) Change the directory area or drive to look for file in other locations

Creating a new workbook FileNewBlank Document

Saving a File FileSave

Selecting more than one cell Click on a cell eg A1) then hold the Shift key and click on another (eg D4) to select cells between and A1 and D4 or Click on a cell and drag the mouse across the desired range

Microsoft ExcelMicrosoft Excel

Entering Date and Time Dates are stored as MMDDYYYY No need to enter in that format For example Excel will recognize Jan 9 or jan-9 as 192007 and Jan 9 1999 as 191999 To enter todayrsquos date press Ctrl and together Use a or p to indicate am or pm For example 830 p is interpreted as 830 pm To enter current time press Ctrl and together

Copy and Paste all cells in a Sheet Ctrl+A for selecting Ctrl +C for copying and Ctrl+V for Pasting

Sorting Data Sort Sort By hellip

Descriptive Statistics and other Statistical methods ToolsData Analysis Statistical method If Data Analysis is not available then click on Tools Add-Ins and then select Analysis ToolPack and Analysis toolPack-Vba

Microsoft ExcelMicrosoft Excel

Statistical and Mathematical Function Start with lsquo=lsquo sign and then select function from function wizard xf

Inserting a Chart Click on Chart Wizard (or InsertChart) select chart give Input data range Update the Chart options and Select output range Worksheet

Importing Data in Excel File open FileType Click on File Choose Option ( DelimitedFixed Width) Choose Options (Tab Semicolon Comma Space Other) Finish

Limitations Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extremecases

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

A general purpose statistical package SPSS is widely used in the social sciences particularly in sociology and psychology

SPSS can import data from almost any type of file to generate tabulated reports plots of distributions and trends descriptive statistics and complex statistical analyzes

Starting SPSS Double Click on SPSS on desktop or ProgramSPSS

Opening a SPSS file FileOpen

bull Data Editor

Various pull-down menus appear at the top of the Data Editor window These pull-down menus are at the heart of using SPSSWIN The Data Editor menu items (with some of the uses of the menu) are

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

FILE used to open and save data files

EDIT used to copy and paste data values used to find data in a file insert variables and cases OPTIONS allows the user to set general preferences as well as the setup for the Navigator Charts etc

VIEW user can change toolbars value labels can be seen in cells instead of data values

DATA select sort or weight cases merge files

MENUS AND TOOLBARS

TRANSFORM Compute new variables recode variables etc

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

ANALYZE perform various statistical procedures

GRAPHS create bar and pie charts etc

UTILITIES add comments to accompany data file (and other advanced features)

ADD-ons these are features not currently installed (advanced statistical procedures)

WINDOW switch between data syntax and navigator windows

HELP to access SPSSWIN Help information

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Navigator (Output) Menus

When statistical procedures are run or charts are created the output will appear in the Navigator window The Navigator window contains many of the pull-down menus found in the Data Editor window Some of the important menus in the Navigator window include

INSERT used to insert page breaks titles charts etc

FORMAT for changing the alignment of a particular portion of the output

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Formatting Toolbar

When a table has been created by a statistical procedure the user can edit the table to create a desired look or adddelete information Beginning with version 140 the user has a choice of editing the table in the Output or opening it in a separate Pivot Table (DEFINE) window Various pulldown menus are activated when the user double clicks on the table These include

EDIT undo and redo a pivot select a table or table body (eg to change the font)

INSERT used to insert titles captions and footnotes

PIVOT used to perform a pivot of the row and column variables

FORMAT various modifications can be made to tables and cells

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing tab-delimited dataIn SPSSWIN click on FILE OPEN DATA Look in the appropriate location for rArr rArrthe text file Then select ldquoTextrdquo from ldquoFiles of typerdquo Click on the file name and then click on ldquoOpenrdquo You will see the Text Import Wizard ndash step 1 of 6 dialog box

You will now have an SPSS data file containing the former tab-delimited data You simply need to add variable and value labels and define missing values

Exporting Data to Excelclick on FILE SAVE AS Click on the File Name for the file to be exported For rArrthe ldquoSave as Typerdquo select from the pull-down menu Excel (xls) You will notice the checkbox for ldquowrite variable names to spreadsheetrdquo Leave this checked as you will want the variable names to be in the first row of each column in the Excel spreadsheet Finally click on Save

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Additional menusCHART EDITOR used to edit a graph

SYNTAX EDITOR used to edit the text in a syntax window

bull Show or hide a toolbar

Click on VIEW TOOLBARS 1048635to show it to hide itrArr rArr

bull Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location

bull Customize a toolbar

Click on VIEW TOOLBARS CUSTOMIZErArr rArr

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheetData from an Excel spreadsheet can be imported into SPSSWIN as follows1 In SPSSWIN click on FILE OPEN DATA The OPEN DATA FILE Dialog rArr rArrBox will appear2 Locate the file of interest Use the Look In pull-down list to identify the folder containing the Excel file of interest3 From the FILE TYPE pull down menu select EXCEL (xls)

4 Click on the file name of interest and click on OPEN or simply double-click on the file name

5 Keep the box checked that reads Read variable names from the first row of data This presumes that the first row of the Excel data file contains variable names in the first row [If the data resided in a different worksheet in the Excel file this would need to be entered]

6 Click on OK The Excel data file will now appear in the SPSSWIN Data Editor

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheet

7 The former EXCEL spreadsheet can now be saved as an SPSS file (FILE rArrSAVE AS) and is ready to be used in analyses Typically you would label variable and values and define missing values

Importing an Access tableSPSSWIN does not offer a direct import for Access tables Therefore we must follow these steps1 Open the Access file2 Open the data table3 Save the data as an Excel file4 Follow the steps outlined in the data import from Excel Spreadsheet to SPSSWIN

Importing Text Files into SPSSWINText data points typically are separated (or ldquodelimitedrdquo) by tabs or commas Sometimes they can be of fixed format

Running the FREQUENCIES procedure

1. Open the data file of interest (from the menus, click on FILE ⇒ OPEN ⇒ DATA).

2. From the menus, click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ FREQUENCIES.

3. The FREQUENCIES dialog box will appear. The left-hand box (the source variable list) lists all the variables defined in the data file. First, identify the variable(s) for which you want a frequency analysis: click on a variable name, then click the [ > ] pushbutton. The variable name will now appear in the VARIABLE(S) box (the selected variable list). Repeat these steps for each variable of interest.

4. If all you need is a frequency table showing counts and percentages (raw, adjusted and cumulative), click on OK.
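The arithmetic behind such a frequency table can be sketched as follows. In this Python illustration (an assumption about the calculation, not SPSS's own code), the raw percentage divides each count by all cases, the adjusted (valid) percentage divides by the non-missing cases only, and the cumulative percentage is a running total of the adjusted percentages. The data and missing codes are hypothetical.

```python
from collections import Counter


def frequency_table(values, missing=(None,)):
    """Return (value, count, percent, valid_percent, cumulative_percent) rows.

    percent uses all cases; valid_percent and cumulative_percent use only
    the cases whose value is not listed in `missing`.
    """
    total = len(values)
    valid = [v for v in values if v not in missing]
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        count = counts[value]
        valid_pct = 100.0 * count / len(valid)
        cumulative += valid_pct
        rows.append((value, count, 100.0 * count / total, valid_pct, cumulative))
    return rows


# Hypothetical group codes: None is a blank cell, 9 a declared missing code.
groups = [1, 2, 2, 3, 3, 3, None, 9]
for row in frequency_table(groups, missing=(None, 9)):
    print("%-5s %5d %7.1f %7.1f %7.1f" % row)
```

With two of the eight cases missing, the raw and adjusted percentages differ, and only the adjusted column accumulates to 100.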

Requesting STATISTICS

Descriptive and summary STATISTICS can be requested for numeric variables. To request statistics:

1. From the FREQUENCIES dialog box, click on the STATISTICS pushbutton.

2. This will bring up the FREQUENCIES: STATISTICS dialog box.

3. The STATISTICS dialog box offers the user a variety of choices (e.g., percentile values, measures of central tendency, dispersion and distribution).

DESCRIPTIVES

The DESCRIPTIVES procedure can also be used to generate descriptive statistics (click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ DESCRIPTIVES). It offers many of the same statistics as the FREQUENCIES procedure, but without generating frequency tables.
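As a rough illustration of the kind of summary DESCRIPTIVES reports, the Python sketch below computes N, minimum, maximum, mean and standard deviation for a hypothetical list of scores. It assumes the sample standard deviation (dividing by n - 1); the dictionary keys only mimic typical output labels.

```python
import statistics


def describe(values):
    """N, minimum, maximum, mean and sample standard deviation of a list."""
    return {
        "N": len(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Mean": statistics.mean(values),
        # statistics.stdev uses the sample formula (n - 1 in the denominator).
        "Std. Deviation": statistics.stdev(values),
    }


scores = [12, 15, 11, 18, 14]  # hypothetical data
print(describe(scores))
```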

Requesting CHARTS

One can request a chart (graph) for any variable included in a FREQUENCIES procedure:

1. In the FREQUENCIES dialog box, click on CHARTS.

2. The FREQUENCIES: CHARTS dialog box will appear. Choose the intended chart (e.g., bar chart, pie chart or histogram).

Pasting charts into Word

1. Click on the chart.

2. Click on the pull-down menu EDIT ⇒ COPY OBJECTS.

3. Go to the Word document in which the chart is to be embedded and click on EDIT ⇒ PASTE SPECIAL.

4. Select Formatted Text (RTF) and then click on OK.

5. Enlarge the graph to the desired size by dragging one or more of the black squares along its perimeter (if the black squares are not visible, click once on the graph).

BASIC STATISTICAL PROCEDURES: CROSSTABS

1. From the ANALYZE pull-down menu, click on DESCRIPTIVE STATISTICS ⇒ CROSSTABS.

2. The CROSSTABS dialog box will then open.

3. From the variable selection box on the left, click on the variable you wish to designate as the row variable; its values (codes) make up the rows of the crosstabs table. Click on the arrow (>) button for Row(s). Next, click on a different variable you wish to designate as the column variable; its values (codes) make up the columns of the table. Click on the arrow (>) button for Column(s).

4. You can specify more than one variable in Row(s) and/or Column(s). A separate crosstab will be generated for each combination of row and column variables.
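Conceptually, a crosstab counts the cases that fall into each combination of row and column values. A minimal Python sketch of that counting, using hypothetical variables `sex` (rows) and `group` (columns); it is an illustration, not SPSS's implementation.

```python
from collections import Counter


def crosstab(cases, row_var, col_var):
    """Count cases for every combination of row and column values."""
    counts = Counter((case[row_var], case[col_var]) for case in cases)
    row_values = sorted({r for r, _ in counts})
    col_values = sorted({c for _, c in counts})
    # Counter returns 0 for combinations that never occur.
    table = [[counts[(r, c)] for c in col_values] for r in row_values]
    return row_values, col_values, table


cases = [
    {"sex": "F", "group": "A"},
    {"sex": "F", "group": "B"},
    {"sex": "M", "group": "A"},
    {"sex": "F", "group": "A"},
]
row_values, col_values, table = crosstab(cases, "sex", "group")
print("", *col_values, sep="\t")
for value, row_counts in zip(row_values, table):
    print(value, *row_counts, sep="\t")
```

With several row or column variables, the same function would simply be called once for each (row, column) pair, mirroring step 4.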

Limitations: SPSS gives users less control over data manipulation and statistical output than other statistical packages such as SAS and Stata.

SPSS is a good first statistical package for quantitative research in the social sciences because it is easy to use and can be a good starting point for learning more advanced statistical packages.

Questions

Microsoft Excel

A spreadsheet application. It features calculation, graphing tools, pivot tables and a macro programming language called VBA (Visual Basic for Applications).

There are many versions of MS Excel; Excel XP, Excel 2003 and Excel 2007 are all capable of performing a number of statistical analyses.

Starting MS Excel: double-click on the Microsoft Excel icon on the desktop, or click on Start ⇒ Programs ⇒ Microsoft Excel.

Worksheet: a grid of cells with numbered rows down the page and alphabetically titled columns across the page. Each cell is referenced by its coordinates. For example, A3 refers to the cell in column A and row 3, and B10:B20 refers to the range of cells in column B, rows 10 through 20.
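The A1-style addressing can be made concrete with a small Python sketch that converts column letters to numbers and expands a range such as B10:B20 into its individual cells. The function names are hypothetical and this is not Excel's own code.

```python
import re


def column_number(letters):
    """Convert column letters to a number: A -> 1, Z -> 26, AA -> 27."""
    number = 0
    for ch in letters.upper():
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def column_letters(number):
    """Inverse of column_number: 1 -> 'A', 27 -> 'AA'."""
    letters = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def parse_cell(ref):
    """Split an A1-style reference such as 'B10' into (column, row) numbers."""
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", ref)
    if match is None:
        raise ValueError("not an A1-style reference: %r" % ref)
    return column_number(match.group(1)), int(match.group(2))


def expand_range(rng):
    """List every cell in a rectangular range such as 'B10:B20' or 'A1:D4'."""
    first, last = rng.split(":")
    (c1, r1), (c2, r2) = parse_cell(first), parse_cell(last)
    return [
        column_letters(col) + str(row)
        for row in range(min(r1, r2), max(r1, r2) + 1)
        for col in range(min(c1, c2), max(c1, c2) + 1)
    ]


print(parse_cell("A3"))
print(len(expand_range("B10:B20")))
print(expand_range("A1:B2"))
```

The same expansion describes a Shift-click selection: the block from A1 to D4 covers 16 cells.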

Creating formulas

1. Click the cell in which you want to enter the formula.

2. Type = (an equal sign).

3. Click the Function button (fx).

4. Select the function you want and step through the on-screen instructions.

Opening a document: File ⇒ Open (to open an existing workbook). Change the directory or drive to look for files in other locations.

Creating a new workbook: File ⇒ New ⇒ Blank workbook.

Saving a file: File ⇒ Save.

Selecting more than one cell: click on a cell (e.g., A1), then hold the Shift key and click on another (e.g., D4) to select all cells between A1 and D4; or click on a cell and drag the mouse across the desired range.

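Entering dates and times: Excel stores dates internally as serial numbers and, with US regional settings, displays them as MM/DD/YYYY, but you do not need to type them in that format. For example, Excel will recognize Jan 9 or jan-9 as 1/9/2007 (it assumes the current year) and Jan 9 1999 as 1/9/1999. To enter today's date, press Ctrl and ; together. For times, type a or p after the time to indicate AM or PM; for example, 8:30 p is interpreted as 8:30 PM. To enter the current time, press Ctrl, Shift and ; together.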
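Because Excel keeps a date as a count of days and a time as a fraction of a day, the conversion can be sketched in Python. The sketch assumes the Windows default 1900 date system, in which 1 January 1900 is day 1 and 1900 is (incorrectly) treated as a leap year; it therefore handles only dates from 1 March 1900 onward and ignores the 1904 date system. Function names are hypothetical.

```python
from datetime import date, timedelta

# For dates on or after 1 March 1900, counting days from 30 December 1899
# reproduces Excel's 1900-system serial numbers (this absorbs the phantom
# 29 February 1900 that Excel includes).
EPOCH = date(1899, 12, 30)
FIRST_SUPPORTED = date(1900, 3, 1)  # serial number 61


def date_to_serial(d):
    """Return the Excel-style serial number for a date."""
    if d < FIRST_SUPPORTED:
        raise ValueError("dates before 1 March 1900 are not handled here")
    return (d - EPOCH).days


def serial_to_date(n):
    """Inverse of date_to_serial for serial numbers from 61 onward."""
    if n < 61:
        raise ValueError("serial numbers below 61 are not handled here")
    return EPOCH + timedelta(days=n)


def time_to_fraction(hour, minute):
    """A time of day is stored as the fraction of 24 hours that has elapsed."""
    return (hour * 60 + minute) / (24 * 60)


print(date_to_serial(date(1999, 1, 9)))  # Jan 9 1999
print(time_to_fraction(20, 30))          # 8:30 pm
```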

Copying and pasting all cells in a sheet: Ctrl+A to select, Ctrl+C to copy and Ctrl+V to paste.

Sorting data: Data ⇒ Sort, then choose the sort key in the Sort By box.

Descriptive statistics and other statistical methods: Tools ⇒ Data Analysis ⇒ (statistical method). If Data Analysis is not available, click on Tools ⇒ Add-Ins and select Analysis ToolPak and Analysis ToolPak - VBA.

Statistical and mathematical functions: start with the '=' sign and then select a function from the function wizard (fx).

Inserting a chart: click on the Chart Wizard (or Insert ⇒ Chart), select the chart type, give the input data range, update the chart options and select the output location (e.g., a new or existing worksheet).

Importing data into Excel: File ⇒ Open, choose the file type and click on the file. In the Text Import Wizard, choose the file type (Delimited or Fixed Width), choose the delimiter (Tab, Semicolon, Comma, Space or Other), and click Finish.

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.
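The kind of rounding problem meant here can be demonstrated outside Excel. The Python sketch below is not Excel's algorithm; it simply compares the textbook one-pass variance formula, (sum of x² - (sum of x)²/n)/(n - 1), with a two-pass formula that subtracts the mean first, on hypothetical data with a large common offset.

```python
def variance_one_pass(xs):
    """Textbook shortcut: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(xs)
    total = 0.0
    total_sq = 0.0
    for x in xs:  # plain running totals, no compensated summation
        total += x
        total_sq += x * x
    return (total_sq - total * total / n) / (n - 1)


def variance_two_pass(xs):
    """Compute the mean first, then sum the squared deviations from it."""
    n = len(xs)
    mean = 0.0
    for x in xs:
        mean += x
    mean /= n
    ss = 0.0
    for x in xs:
        ss += (x - mean) ** 2
    return ss / (n - 1)


# Hypothetical data: small differences riding on a large common offset.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(variance_one_pass(data))
print(variance_two_pass(data))
```

With standard double-precision arithmetic, the one-pass version loses the small deviations to rounding in the squared terms and returns a negative variance, which is impossible, while the two-pass version returns the exact value of 30. The totals are accumulated in explicit loops so the result does not depend on how the built-in sum() is implemented.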

Statistical Package for the Social Sciences (SPSS)

A general-purpose statistical package, SPSS is widely used in the social sciences, particularly in sociology and psychology.

SPSS can import data from many types of files (e.g., Excel, text and database files) and generate tabulated reports, plots of distributions and trends, descriptive statistics and complex statistical analyses.

Starting SPSS: double-click on the SPSS icon on the desktop, or click on Start ⇒ Programs ⇒ SPSS.

Opening an SPSS file: File ⇒ Open.

MENUS AND TOOLBARS

• Data Editor

Various pull-down menus appear at the top of the Data Editor window. These menus are at the heart of using SPSSWIN. The Data Editor menu items (with some of their uses) are:

FILE: used to open and save data files.

EDIT: used to copy and paste data values, find data in a file, and insert variables and cases. EDIT ⇒ OPTIONS allows the user to set general preferences as well as the setup for the Navigator, charts, etc.

VIEW: the user can change toolbars, and value labels can be displayed in cells instead of data values.

DATA: select, sort or weight cases; merge files.

TRANSFORM: compute new variables, recode variables, etc.

ANALYZE: perform various statistical procedures.

GRAPHS: create bar charts, pie charts, etc.

UTILITIES: add comments to accompany the data file (and other advanced features).

ADD-ONS: features that are not currently installed (advanced statistical procedures).

WINDOW: switch between the data, syntax and Navigator windows.

HELP: access SPSSWIN Help information.

Navigator (Output) menus

When statistical procedures are run or charts are created, the output appears in the Navigator window. The Navigator window contains many of the pull-down menus found in the Data Editor window. Some of the important menus in the Navigator window include:

INSERT: used to insert page breaks, titles, charts, etc.

FORMAT: used to change the alignment of a particular portion of the output.

• Formatting toolbar

When a statistical procedure has created a table, the user can edit the table to create a desired look or to add or delete information. Beginning with version 14.0, the user can choose to edit the table in the Output window or to open it in a separate Pivot Table (DEFINE) window. Various pull-down menus are activated when the user double-clicks on the table. These include:

EDIT: undo and redo a pivot; select a table or table body (e.g., to change the font).

INSERT: used to insert titles, captions and footnotes.

PIVOT: used to pivot (swap) the row and column variables.

FORMAT: various modifications can be made to tables and cells.
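Pivoting a table, in its simplest form, swaps rows and columns. A minimal Python sketch of that transpose on a hypothetical table of counts; the PIVOT menu in SPSS does more than this, which the sketch does not attempt.

```python
def pivot(row_labels, col_labels, cells):
    """Swap the row and column dimensions of a table of cells."""
    transposed = [list(column) for column in zip(*cells)]
    return col_labels, row_labels, transposed


rows, cols, table = pivot(["F", "M"], ["A", "B", "C"], [[2, 1, 0], [1, 0, 4]])
print("", *cols, sep="\t")
for label, values in zip(rows, table):
    print(label, *values, sep="\t")
```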

Importing tab-delimited data

In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. Look in the appropriate location for the text file, then select "Text" from "Files of type". Click on the file name and then click on "Open". You will see the Text Import Wizard (Step 1 of 6) dialog box.

After completing the wizard's steps, you will have an SPSS data file containing the former tab-delimited data. You simply need to add variable and value labels and define missing values.

Exporting data to Excel

Click on FILE ⇒ SAVE AS and enter the file name for the file to be exported. For "Save as type", select Excel (*.xls) from the pull-down menu. You will notice the checkbox "Write variable names to spreadsheet". Leave this checked, as you will want the variable names in the first row of each column of the Excel spreadsheet. Finally, click on Save.
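Writing the variable names as the first row is what makes the exported sheet self-describing. As a hedged sketch of the same layout, the Python code below writes a header row followed by one row per case to CSV, used here only as a simple stand-in for the .xls format; the variable names and records are hypothetical.

```python
import csv
import io


def export_with_header(records, variables, stream):
    """Write the variable names as the first row, then one row per case."""
    writer = csv.writer(stream)
    writer.writerow(variables)
    for record in records:
        # Missing entries are written as empty cells.
        writer.writerow([record.get(name, "") for name in variables])


cases = [{"id": 1, "group": "A", "score": 12}, {"id": 2, "group": "B"}]
buffer = io.StringIO()
export_with_header(cases, ["id", "group", "score"], buffer)
print(buffer.getvalue())
```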

• Additional menus

CHART EDITOR: used to edit a graph.

SYNTAX EDITOR: used to edit the text in a syntax window.

bull Show or hide a toolbar

Click on VIEW TOOLBARS 1048635to show it to hide itrArr rArr

bull Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location

bull Customize a toolbar

Click on VIEW TOOLBARS CUSTOMIZErArr rArr

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheetData from an Excel spreadsheet can be imported into SPSSWIN as follows1 In SPSSWIN click on FILE OPEN DATA The OPEN DATA FILE Dialog rArr rArrBox will appear2 Locate the file of interest Use the Look In pull-down list to identify the folder containing the Excel file of interest3 From the FILE TYPE pull down menu select EXCEL (xls)

4 Click on the file name of interest and click on OPEN or simply double-click on the file name

5 Keep the box checked that reads Read variable names from the first row of data This presumes that the first row of the Excel data file contains variable names in the first row [If the data resided in a different worksheet in the Excel file this would need to be entered]

6 Click on OK The Excel data file will now appear in the SPSSWIN Data Editor

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheet

7 The former EXCEL spreadsheet can now be saved as an SPSS file (FILE rArrSAVE AS) and is ready to be used in analyses Typically you would label variable and values and define missing values

Importing an Access tableSPSSWIN does not offer a direct import for Access tables Therefore we must follow these steps1 Open the Access file2 Open the data table3 Save the data as an Excel file4 Follow the steps outlined in the data import from Excel Spreadsheet to SPSSWIN

Importing Text Files into SPSSWINText data points typically are separated (or ldquodelimitedrdquo) by tabs or commas Sometimes they can be of fixed format

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Running the FREQUENCIES procedure

1 Open the data file (from the menus click on FILE OPEN DATA) of rArr rArrinterest

2 From the menus click on ANALYZE DESCRIPTIVE STATISTICS rArr rArrFREQUENCIES

3 The FREQUENCIES Dialog Box will appear In the left-hand box will be a listing (source variable list) of all the variables that have been defined in the data file The first step is identifying the variable(s) for which you want to run a frequency analysis Click on a variable name(s) Then click the [ gt ] pushbutton The variable name(s) will now appear in the VARIABLE[S] box (selected variable list) Repeat these steps for each variable of interest

4 If all that is being requested is a frequency table showing count percentages (raw adjusted and cumulative) then click on OK

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Requesting STATISTICSDescriptive and summary STATISTICS can be requested for numeric variables To request Statistics1 From the FREQUENCIES Dialog Box click on the STATISTICS pushbutton2 This will bring up the FREQUENCIES STATISTICS Dialog Box3 The STATISTICS Dialog Box offers the user a variety of choices

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE DESCRIPTIVE STATISTICS DESCRIPTIVES) The rArr rArrprocedure offers many of the same statistics as the FREQUENCIES procedure but without generating frequency analysis tables

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Requesting CHARTSOne can request a chart (graph) to be created for a variable or variables included in a FREQUENCIES procedure

1 In the FREQUENCIES Dialog box click on CHARTS2 The FREQUENCIES CHARTS Dialog box will appear Choose the intended chart (eg Bar diagram Pie chart histogram

Pasting charts into Word1 Click on the chart2 Click on the pulldown menu EDIT COPY OBJECTSrArr3 Go to the Word document in which the chart is to be embedded Click on EDIT rArr PASTE SPECIAL4 Select Formatted Text (RTF) and then click on OK5 Enlarge the graph to a desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible click once on the graph)

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

BASIC STATISTICAL PROCEDURES CROSSTABS

1 From the ANALYZE pull-down menu click on DESCRIPTIVE STATISTICS rArrCROSSTABS

2 The CROSSTABS Dialog Box will then open

3 From the variable selection box on the left click on a variable you wish to designate as the Row variable The values (codes) for the Row variable make up the rows of the crosstabs table Click on the arrow (gt) button for Row(s) Next click on a different variable you wish to designate as the Column variable The values (codes) for the Column variable make up the columns of the crosstabstable Click on the arrow (gt) button for Column(s)

4 You can specify more than one variable in the Row(s) andor Column(s) A cross table will be generated for each combination of Row and Column variables

Limitations SPSS users have less control over data manipulation and statistical output than other statistical packages such as SAS Stata etc

SPSS is a good first statistical package to perform quantitative research in social science because it is easy to use and because it can be a good starting point to learn more advanced statistical packages

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

QuestionsQuestions

Microsoft ExcelMicrosoft Excel

Creating Formulas 1 Click the cell that you want to enter the formula 2 Type = (an equal sign) 3 Click the Function Button 4 Select the formula you want and step through the on-screen instructions

xf

Opening a document File Open (From a existing workbook) Change the directory area or drive to look for file in other locations

Creating a new workbook FileNewBlank Document

Saving a File FileSave

Selecting more than one cell Click on a cell eg A1) then hold the Shift key and click on another (eg D4) to select cells between and A1 and D4 or Click on a cell and drag the mouse across the desired range

Microsoft ExcelMicrosoft Excel

Entering Date and Time Dates are stored as MMDDYYYY No need to enter in that format For example Excel will recognize Jan 9 or jan-9 as 192007 and Jan 9 1999 as 191999 To enter todayrsquos date press Ctrl and together Use a or p to indicate am or pm For example 830 p is interpreted as 830 pm To enter current time press Ctrl and together

Copy and Paste all cells in a Sheet Ctrl+A for selecting Ctrl +C for copying and Ctrl+V for Pasting

Sorting Data Sort Sort By hellip

Descriptive Statistics and other Statistical methods ToolsData Analysis Statistical method If Data Analysis is not available then click on Tools Add-Ins and then select Analysis ToolPack and Analysis toolPack-Vba

Microsoft ExcelMicrosoft Excel

Statistical and Mathematical Function Start with lsquo=lsquo sign and then select function from function wizard xf

Inserting a Chart Click on Chart Wizard (or InsertChart) select chart give Input data range Update the Chart options and Select output range Worksheet

Importing Data in Excel File open FileType Click on File Choose Option ( DelimitedFixed Width) Choose Options (Tab Semicolon Comma Space Other) Finish

Limitations Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extremecases

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

A general purpose statistical package SPSS is widely used in the social sciences particularly in sociology and psychology

SPSS can import data from almost any type of file to generate tabulated reports plots of distributions and trends descriptive statistics and complex statistical analyzes

Starting SPSS Double Click on SPSS on desktop or ProgramSPSS

Opening a SPSS file FileOpen

bull Data Editor

Various pull-down menus appear at the top of the Data Editor window These pull-down menus are at the heart of using SPSSWIN The Data Editor menu items (with some of the uses of the menu) are

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

FILE used to open and save data files

EDIT used to copy and paste data values used to find data in a file insert variables and cases OPTIONS allows the user to set general preferences as well as the setup for the Navigator Charts etc

VIEW user can change toolbars value labels can be seen in cells instead of data values

DATA select sort or weight cases merge files

MENUS AND TOOLBARS

TRANSFORM Compute new variables recode variables etc

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

ANALYZE perform various statistical procedures

GRAPHS create bar and pie charts etc

UTILITIES add comments to accompany data file (and other advanced features)

ADD-ons these are features not currently installed (advanced statistical procedures)

WINDOW switch between data syntax and navigator windows

HELP to access SPSSWIN Help information

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Navigator (Output) Menus

When statistical procedures are run or charts are created the output will appear in the Navigator window The Navigator window contains many of the pull-down menus found in the Data Editor window Some of the important menus in the Navigator window include

INSERT used to insert page breaks titles charts etc

FORMAT for changing the alignment of a particular portion of the output

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Formatting Toolbar

When a table has been created by a statistical procedure the user can edit the table to create a desired look or adddelete information Beginning with version 140 the user has a choice of editing the table in the Output or opening it in a separate Pivot Table (DEFINE) window Various pulldown menus are activated when the user double clicks on the table These include

EDIT undo and redo a pivot select a table or table body (eg to change the font)

INSERT used to insert titles captions and footnotes

PIVOT used to perform a pivot of the row and column variables

FORMAT various modifications can be made to tables and cells

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing tab-delimited dataIn SPSSWIN click on FILE OPEN DATA Look in the appropriate location for rArr rArrthe text file Then select ldquoTextrdquo from ldquoFiles of typerdquo Click on the file name and then click on ldquoOpenrdquo You will see the Text Import Wizard ndash step 1 of 6 dialog box

You will now have an SPSS data file containing the former tab-delimited data You simply need to add variable and value labels and define missing values

Exporting Data to Excelclick on FILE SAVE AS Click on the File Name for the file to be exported For rArrthe ldquoSave as Typerdquo select from the pull-down menu Excel (xls) You will notice the checkbox for ldquowrite variable names to spreadsheetrdquo Leave this checked as you will want the variable names to be in the first row of each column in the Excel spreadsheet Finally click on Save

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Additional menusCHART EDITOR used to edit a graph

SYNTAX EDITOR used to edit the text in a syntax window

bull Show or hide a toolbar

Click on VIEW TOOLBARS 1048635to show it to hide itrArr rArr

bull Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location

bull Customize a toolbar

Click on VIEW TOOLBARS CUSTOMIZErArr rArr

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheetData from an Excel spreadsheet can be imported into SPSSWIN as follows1 In SPSSWIN click on FILE OPEN DATA The OPEN DATA FILE Dialog rArr rArrBox will appear2 Locate the file of interest Use the Look In pull-down list to identify the folder containing the Excel file of interest3 From the FILE TYPE pull down menu select EXCEL (xls)

4 Click on the file name of interest and click on OPEN or simply double-click on the file name

5 Keep the box checked that reads Read variable names from the first row of data This presumes that the first row of the Excel data file contains variable names in the first row [If the data resided in a different worksheet in the Excel file this would need to be entered]

6 Click on OK The Excel data file will now appear in the SPSSWIN Data Editor

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheet

7 The former EXCEL spreadsheet can now be saved as an SPSS file (FILE rArrSAVE AS) and is ready to be used in analyses Typically you would label variable and values and define missing values

Importing an Access tableSPSSWIN does not offer a direct import for Access tables Therefore we must follow these steps1 Open the Access file2 Open the data table3 Save the data as an Excel file4 Follow the steps outlined in the data import from Excel Spreadsheet to SPSSWIN

Importing Text Files into SPSSWINText data points typically are separated (or ldquodelimitedrdquo) by tabs or commas Sometimes they can be of fixed format

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Running the FREQUENCIES procedure

1 Open the data file (from the menus click on FILE OPEN DATA) of rArr rArrinterest

2 From the menus click on ANALYZE DESCRIPTIVE STATISTICS rArr rArrFREQUENCIES

3 The FREQUENCIES Dialog Box will appear In the left-hand box will be a listing (source variable list) of all the variables that have been defined in the data file The first step is identifying the variable(s) for which you want to run a frequency analysis Click on a variable name(s) Then click the [ gt ] pushbutton The variable name(s) will now appear in the VARIABLE[S] box (selected variable list) Repeat these steps for each variable of interest

4 If all that is being requested is a frequency table showing count percentages (raw adjusted and cumulative) then click on OK

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Requesting STATISTICSDescriptive and summary STATISTICS can be requested for numeric variables To request Statistics1 From the FREQUENCIES Dialog Box click on the STATISTICS pushbutton2 This will bring up the FREQUENCIES STATISTICS Dialog Box3 The STATISTICS Dialog Box offers the user a variety of choices

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE DESCRIPTIVE STATISTICS DESCRIPTIVES) The rArr rArrprocedure offers many of the same statistics as the FREQUENCIES procedure but without generating frequency analysis tables

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Requesting CHARTSOne can request a chart (graph) to be created for a variable or variables included in a FREQUENCIES procedure

1 In the FREQUENCIES Dialog box click on CHARTS2 The FREQUENCIES CHARTS Dialog box will appear Choose the intended chart (eg Bar diagram Pie chart histogram

Pasting charts into Word1 Click on the chart2 Click on the pulldown menu EDIT COPY OBJECTSrArr3 Go to the Word document in which the chart is to be embedded Click on EDIT rArr PASTE SPECIAL4 Select Formatted Text (RTF) and then click on OK5 Enlarge the graph to a desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible click once on the graph)

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

BASIC STATISTICAL PROCEDURES CROSSTABS

1 From the ANALYZE pull-down menu click on DESCRIPTIVE STATISTICS rArrCROSSTABS

2 The CROSSTABS Dialog Box will then open

3 From the variable selection box on the left click on a variable you wish to designate as the Row variable The values (codes) for the Row variable make up the rows of the crosstabs table Click on the arrow (gt) button for Row(s) Next click on a different variable you wish to designate as the Column variable The values (codes) for the Column variable make up the columns of the crosstabstable Click on the arrow (gt) button for Column(s)

4 You can specify more than one variable in the Row(s) andor Column(s) A cross table will be generated for each combination of Row and Column variables

Limitations SPSS users have less control over data manipulation and statistical output than other statistical packages such as SAS Stata etc

SPSS is a good first statistical package to perform quantitative research in social science because it is easy to use and because it can be a good starting point to learn more advanced statistical packages

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

QuestionsQuestions

Microsoft ExcelMicrosoft Excel

Entering Date and Time Dates are stored as MMDDYYYY No need to enter in that format For example Excel will recognize Jan 9 or jan-9 as 192007 and Jan 9 1999 as 191999 To enter todayrsquos date press Ctrl and together Use a or p to indicate am or pm For example 830 p is interpreted as 830 pm To enter current time press Ctrl and together

Copy and Paste all cells in a Sheet Ctrl+A for selecting Ctrl +C for copying and Ctrl+V for Pasting

Sorting Data Sort Sort By hellip

Descriptive Statistics and other Statistical methods ToolsData Analysis Statistical method If Data Analysis is not available then click on Tools Add-Ins and then select Analysis ToolPack and Analysis toolPack-Vba

Microsoft ExcelMicrosoft Excel

Statistical and Mathematical Function Start with lsquo=lsquo sign and then select function from function wizard xf

Inserting a Chart Click on Chart Wizard (or InsertChart) select chart give Input data range Update the Chart options and Select output range Worksheet

Importing Data in Excel File open FileType Click on File Choose Option ( DelimitedFixed Width) Choose Options (Tab Semicolon Comma Space Other) Finish

Limitations Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extremecases

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

A general purpose statistical package SPSS is widely used in the social sciences particularly in sociology and psychology

SPSS can import data from almost any type of file to generate tabulated reports plots of distributions and trends descriptive statistics and complex statistical analyzes

Starting SPSS Double Click on SPSS on desktop or ProgramSPSS

Opening a SPSS file FileOpen

bull Data Editor

Various pull-down menus appear at the top of the Data Editor window These pull-down menus are at the heart of using SPSSWIN The Data Editor menu items (with some of the uses of the menu) are

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

FILE used to open and save data files

EDIT used to copy and paste data values used to find data in a file insert variables and cases OPTIONS allows the user to set general preferences as well as the setup for the Navigator Charts etc

VIEW user can change toolbars value labels can be seen in cells instead of data values

DATA select sort or weight cases merge files

MENUS AND TOOLBARS

TRANSFORM Compute new variables recode variables etc

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

ANALYZE perform various statistical procedures

GRAPHS create bar and pie charts etc

UTILITIES add comments to accompany data file (and other advanced features)

ADD-ons these are features not currently installed (advanced statistical procedures)

WINDOW switch between data syntax and navigator windows

HELP to access SPSSWIN Help information

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Navigator (Output) Menus

When statistical procedures are run or charts are created the output will appear in the Navigator window The Navigator window contains many of the pull-down menus found in the Data Editor window Some of the important menus in the Navigator window include

INSERT used to insert page breaks titles charts etc

FORMAT for changing the alignment of a particular portion of the output

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

• Formatting Toolbar

When a table has been created by a statistical procedure, the user can edit it to achieve the desired look or to add or delete information. Beginning with version 14.0, the user can choose to edit the table in the Output window or to open it in a separate Pivot Table (DEFINE) window. Various pull-down menus are activated when the user double-clicks on the table. These include:

EDIT: undo and redo a pivot; select a table or table body (e.g., to change the font).

INSERT: insert titles, captions, and footnotes.

PIVOT: used to pivot the row and column variables.

FORMAT: make various modifications to tables and cells.

Importing tab-delimited data

In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. Look in the appropriate location for the text file. Then select "Text" from "Files of type". Click on the file name and then click on "Open". You will see the Text Import Wizard (Step 1 of 6) dialog box.

After working through the wizard's six steps and clicking Finish, you will have an SPSS data file containing the former tab-delimited data. You simply need to add variable and value labels and define missing values.
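Conceptually, the wizard splits each line at the tab characters and takes variable names from the first row. The minimal sketch below does the same with Python's standard csv module, assuming a header row is present; the function name read_tab_delimited and the sample data are hypothetical, and all values are left as strings (the wizard, by contrast, lets you assign each variable a type).

```python
import csv
import io


def read_tab_delimited(text):
    """Parse tab-delimited text whose first row holds variable names.

    Returns a dict mapping each variable name to its list of values (strings).
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    names = rows[0]  # first row = variable names
    columns = {name: [] for name in names}
    for row in rows[1:]:  # remaining rows = cases
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns


# Hypothetical three-case file with two variables.
sample = "id\tgroup\n1\tA\n2\tB\n3\tA\n"
data = read_tab_delimited(sample)
print(data)  # {'id': ['1', '2', '3'], 'group': ['A', 'B', 'A']}
```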

Exporting data to Excel

Click on FILE ⇒ SAVE AS. Enter the name of the file to be exported in the File Name box. For "Save as type", select Excel (*.xls) from the pull-down menu. You will notice the checkbox "Write variable names to spreadsheet". Leave it checked, since you will want the variable names in the first row of each column of the Excel spreadsheet. Finally, click on Save.
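The effect of keeping "Write variable names to spreadsheet" checked can be sketched in Python. As a simplification, this writes comma-separated text (which Excel can open) rather than a binary .xls workbook; the function name to_delimited is hypothetical. The header row of variable names is written first, followed by one row per case.

```python
import csv
import io


def to_delimited(columns, delimiter=","):
    """Write column data as delimited text with variable names in the first row."""
    names = list(columns)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(names)  # header row: variable names
    for row in zip(*(columns[n] for n in names)):  # one row per case
        writer.writerow(row)
    return out.getvalue()


text = to_delimited({"id": [1, 2, 3], "group": ["A", "B", "A"]})
print(text)
```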

• Additional menus

CHART EDITOR: used to edit a graph.

SYNTAX EDITOR: used to edit the text in a syntax window.

• Show or hide a toolbar

Click on VIEW ⇒ TOOLBARS, then check a toolbar to show it or uncheck it to hide it.

• Move a toolbar

Click on the toolbar (but not on one of its pushbuttons) and then drag the toolbar to its new location.

• Customize a toolbar

Click on VIEW ⇒ TOOLBARS ⇒ CUSTOMIZE.

Importing data from an EXCEL spreadsheet

Data from an Excel spreadsheet can be imported into SPSSWIN as follows:

1. In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. The OPEN DATA FILE dialog box will appear.
2. Locate the file of interest. Use the Look In pull-down list to identify the folder containing the Excel file.
3. From the FILE TYPE pull-down menu, select EXCEL (*.xls).
4. Click on the file name of interest and click on OPEN, or simply double-click on the file name.
5. Keep the box checked that reads "Read variable names from the first row of data". This presumes that the first row of the Excel data file contains the variable names. (If the data reside in a different worksheet of the Excel file, that worksheet would need to be specified.)
6. Click on OK. The Excel data will now appear in the SPSSWIN Data Editor.
7. The former EXCEL spreadsheet can now be saved as an SPSS file (FILE ⇒ SAVE AS) and is ready to be used in analyses. Typically, you would label variables and values and define missing values.
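Defining missing values matters because a numeric code such as 99 ("not answered") would otherwise be treated as a real measurement. A minimal sketch of the idea, assuming the missing-value codes are known in advance; the function name apply_missing_codes and the sample ages are hypothetical, and None plays the role of a missing value.

```python
def apply_missing_codes(values, missing_codes):
    """Replace user-defined missing-value codes with None.

    Later calculations can then skip None instead of averaging in the codes.
    """
    return [None if v in missing_codes else v for v in values]


# Hypothetical age variable in which 99 was entered for "not answered".
ages = [34, 99, 41, 28, 99]
clean = apply_missing_codes(ages, {99})
valid = [v for v in clean if v is not None]

print(clean)                     # [34, None, 41, 28, None]
print(sum(ages) / len(ages))     # 60.2, inflated by the 99 codes
print(sum(valid) / len(valid))   # mean of the three valid ages only
```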

Importing an Access table

SPSSWIN does not offer a direct import for Access tables. Therefore, follow these steps:

1. Open the Access file.
2. Open the data table.
3. Save the data as an Excel file.
4. Follow the steps outlined above for importing data from an Excel spreadsheet into SPSSWIN.

Importing text files into SPSSWIN

Data points in text files are typically separated (or "delimited") by tabs or commas. Sometimes they are in fixed format, with each variable occupying the same columns on every line.
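For fixed-format files there is no delimiter, so each variable must be located by its column positions. The sketch below assumes a hypothetical layout (id in columns 1 to 3, sex in column 4, age in columns 5 and 6); the function name parse_fixed_width is also hypothetical. Positions are 0-based and end-exclusive, as in Python slices.

```python
def parse_fixed_width(lines, layout):
    """Split fixed-format records using (name, start, end) column positions."""
    records = []
    for line in lines:
        # Slice each variable out of its columns and trim padding spaces.
        record = {name: line[start:end].strip() for name, start, end in layout}
        records.append(record)
    return records


# Hypothetical layout: id = columns 1-3, sex = column 4, age = columns 5-6.
layout = [("id", 0, 3), ("sex", 3, 4), ("age", 4, 6)]
lines = ["001F34", "002M41"]
print(parse_fixed_width(lines, layout))
# [{'id': '001', 'sex': 'F', 'age': '34'}, {'id': '002', 'sex': 'M', 'age': '41'}]
```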

Running the FREQUENCIES procedure

1. Open the data file of interest (from the menus, click on FILE ⇒ OPEN ⇒ DATA).

2. From the menus, click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ FREQUENCIES.

3. The FREQUENCIES dialog box will appear. The left-hand box (the source variable list) lists all the variables defined in the data file. First, identify the variable(s) for which you want a frequency analysis: click on a variable name, then click the [>] pushbutton. The variable name will now appear in the VARIABLE(S) box (the selected variable list). Repeat these steps for each variable of interest.

4. If all you need is a frequency table showing counts and percentages (raw, adjusted for missing values, and cumulative), click on OK.
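The percentages in the frequency table differ only in their denominators. The sketch below assumes None marks a missing value: the raw percentage divides by all cases, the adjusted (valid) percentage divides by non-missing cases only, and the cumulative percentage accumulates the valid percentages. The function name frequency_table and the data are hypothetical; this illustrates the arithmetic, not SPSS's own code.

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (value, count, percent, valid_percent, cumulative_percent)."""
    total = len(values)                        # all cases, including missing
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        n = counts[value]
        valid_pct = 100.0 * n / len(valid)     # denominator excludes missing
        cumulative += valid_pct
        rows.append((value, n, 100.0 * n / total, valid_pct, cumulative))
    return rows


# Hypothetical group variable for 8 subjects; one value is missing.
group = ["A", "B", "A", "C", None, "A", "B", "A"]
print("Val   N  Pct    Valid  Cum")
for row in frequency_table(group):
    print("{:<3} {:>3} {:>7.1f} {:>7.1f} {:>7.1f}".format(*row))
```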

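Requesting STATISTICS

Descriptive and summary STATISTICS can be requested for numeric variables. To request statistics:

1. From the FREQUENCIES dialog box, click on the STATISTICS pushbutton.
2. This will bring up the FREQUENCIES: STATISTICS dialog box.
3. The STATISTICS dialog box offers a variety of choices, including percentile values, measures of central tendency (mean, median, mode, sum), measures of dispersion (standard deviation, variance, range, minimum, maximum, standard error of the mean), and measures of distribution shape (skewness, kurtosis).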

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ DESCRIPTIVES). The procedure offers many of the same statistics as the FREQUENCIES procedure, but without generating frequency tables.
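Several of the summary statistics named above can be computed with Python's standard statistics module. This is a minimal sketch, assuming None marks a missing value and using the sample (n - 1) formulas for variance and standard deviation; the function name describe and the scores are hypothetical.

```python
import statistics


def describe(values):
    """Summary statistics for a numeric variable, dropping missing values (None)."""
    x = [v for v in values if v is not None]
    return {
        "N": len(x),
        "mean": statistics.mean(x),
        "median": statistics.median(x),
        "mode": statistics.mode(x),
        "std_dev": statistics.stdev(x),      # sample standard deviation (n - 1)
        "variance": statistics.variance(x),  # sample variance (n - 1)
        "minimum": min(x),
        "maximum": max(x),
        "range": max(x) - min(x),
    }


scores = [12, 15, 15, 18, 20, None]  # hypothetical scores; one missing
print(describe(scores))
```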

Requesting CHARTS

One can request a chart (graph) for any variable included in a FREQUENCIES procedure:

1. In the FREQUENCIES dialog box, click on CHARTS.
2. The FREQUENCIES: CHARTS dialog box will appear. Choose the intended chart (e.g., bar chart, pie chart, or histogram).

Pasting charts into Word

1. Click on the chart.
2. Click on the pull-down menu EDIT ⇒ COPY OBJECTS.
3. Go to the Word document in which the chart is to be embedded. Click on EDIT ⇒ PASTE SPECIAL.
4. Select Formatted Text (RTF) and then click on OK.
5. Enlarge the graph to the desired size by dragging one or more of the black squares along its perimeter (if the black squares are not visible, click once on the graph).

BASIC STATISTICAL PROCEDURES: CROSSTABS

1. From the ANALYZE pull-down menu, click on DESCRIPTIVE STATISTICS ⇒ CROSSTABS.

2. The CROSSTABS dialog box will then open.

3. From the variable selection box on the left, click on the variable you wish to designate as the Row variable. The values (codes) of the Row variable make up the rows of the crosstabs table. Click on the arrow (>) button for Row(s). Next, click on a different variable you wish to designate as the Column variable. The values (codes) of the Column variable make up the columns of the crosstabs table. Click on the arrow (>) button for Column(s).

4. You can specify more than one variable in Row(s) and/or Column(s). A separate crosstabulation is generated for each combination of Row and Column variables.
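A crosstabulation simply counts how often each combination of a Row value and a Column value occurs. The sketch below shows that counting for a single Row/Column pair (with several variables, as in step 4, it would be repeated for each pair). The function name crosstab and the data are hypothetical; missing values are not handled.

```python
from collections import Counter


def crosstab(rows, cols):
    """Count each (row value, column value) combination, like a CROSSTABS table.

    Returns sorted row labels, sorted column labels, and a nested dict
    table[r][c] of cell counts (0 where a combination never occurs).
    """
    pairs = Counter(zip(rows, cols))
    row_labels = sorted(set(rows))
    col_labels = sorted(set(cols))
    table = {r: {c: pairs[(r, c)] for c in col_labels} for r in row_labels}
    return row_labels, col_labels, table


# Hypothetical data: treatment group (Row) by outcome (Column), 6 subjects.
group = ["A", "A", "B", "B", "B", "A"]
outcome = ["yes", "no", "yes", "yes", "no", "no"]
r_labels, c_labels, table = crosstab(group, outcome)

print("     " + " ".join(f"{c:>4}" for c in c_labels))
for r in r_labels:
    print(f"{r:<4} " + " ".join(f"{table[r][c]:>4}" for c in c_labels))
```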

Limitations: SPSS gives users less control over data manipulation and statistical output than other statistical packages such as SAS and Stata.

SPSS is a good first statistical package for quantitative research in the social sciences because it is easy to use and can serve as a starting point for learning more advanced statistical packages.

Questions

Microsoft Excel

Statistical and mathematical functions: start with the "=" sign and then select a function from the Function Wizard (fx).

Inserting a chart: click on Chart Wizard (or Insert ⇒ Chart), select the chart type, give the input data range, update the chart options, and select the output location (worksheet).

Importing data into Excel: File ⇒ Open ⇒ choose the file type ⇒ click on the file ⇒ choose an option (Delimited or Fixed Width) ⇒ choose the delimiter (Tab, Semicolon, Comma, Space, Other) ⇒ Finish.

Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.
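To illustrate the kind of rounding error meant here (this is not a claim about which formula any particular Excel version uses), the sketch compares the one-pass "computational" variance formula with a two-pass formula on data that share a large common offset. The function names are hypothetical. In the one-pass formula, two huge, nearly equal sums are subtracted, so most significant digits cancel; on IEEE-754 double-precision arithmetic the result here is not even positive, although the exact sample variance is 30.

```python
def naive_variance(xs):
    """One-pass formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(xs)
    total = 0.0
    total_sq = 0.0
    for x in xs:  # plain loop; Python 3.12+ sum() uses compensated summation
        total += x
        total_sq += x * x
    return (total_sq - total * total / n) / (n - 1)


def two_pass_variance(xs):
    """Compute the mean first, then sum squared deviations from it."""
    n = len(xs)
    mean = 0.0
    for x in xs:
        mean += x
    mean /= n
    ss = 0.0
    for x in xs:
        ss += (x - mean) ** 2
    return ss / (n - 1)


# Values near one billion; the exact sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(two_pass_variance(data))  # 30.0
print(naive_variance(data))     # far from 30 (not even positive)
```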

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

FILE used to open and save data files

EDIT used to copy and paste data values used to find data in a file insert variables and cases OPTIONS allows the user to set general preferences as well as the setup for the Navigator Charts etc

VIEW user can change toolbars value labels can be seen in cells instead of data values

DATA select sort or weight cases merge files

MENUS AND TOOLBARS

TRANSFORM Compute new variables recode variables etc

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

ANALYZE perform various statistical procedures

GRAPHS create bar and pie charts etc

UTILITIES add comments to accompany data file (and other advanced features)

ADD-ons these are features not currently installed (advanced statistical procedures)

WINDOW switch between data syntax and navigator windows

HELP to access SPSSWIN Help information

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Navigator (Output) Menus

When statistical procedures are run or charts are created the output will appear in the Navigator window The Navigator window contains many of the pull-down menus found in the Data Editor window Some of the important menus in the Navigator window include

INSERT used to insert page breaks titles charts etc

FORMAT for changing the alignment of a particular portion of the output

MENUS AND TOOLBARS

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Formatting Toolbar

When a table has been created by a statistical procedure the user can edit the table to create a desired look or adddelete information Beginning with version 140 the user has a choice of editing the table in the Output or opening it in a separate Pivot Table (DEFINE) window Various pulldown menus are activated when the user double clicks on the table These include

EDIT undo and redo a pivot select a table or table body (eg to change the font)

INSERT used to insert titles captions and footnotes

PIVOT used to perform a pivot of the row and column variables

FORMAT various modifications can be made to tables and cells

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing tab-delimited dataIn SPSSWIN click on FILE OPEN DATA Look in the appropriate location for rArr rArrthe text file Then select ldquoTextrdquo from ldquoFiles of typerdquo Click on the file name and then click on ldquoOpenrdquo You will see the Text Import Wizard ndash step 1 of 6 dialog box

You will now have an SPSS data file containing the former tab-delimited data You simply need to add variable and value labels and define missing values

Exporting Data to Excelclick on FILE SAVE AS Click on the File Name for the file to be exported For rArrthe ldquoSave as Typerdquo select from the pull-down menu Excel (xls) You will notice the checkbox for ldquowrite variable names to spreadsheetrdquo Leave this checked as you will want the variable names to be in the first row of each column in the Excel spreadsheet Finally click on Save

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

bull Additional menusCHART EDITOR used to edit a graph

SYNTAX EDITOR used to edit the text in a syntax window

bull Show or hide a toolbar

Click on VIEW TOOLBARS 1048635to show it to hide itrArr rArr

bull Move a toolbar

Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to its new location

bull Customize a toolbar

Click on VIEW TOOLBARS CUSTOMIZErArr rArr

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheetData from an Excel spreadsheet can be imported into SPSSWIN as follows1 In SPSSWIN click on FILE OPEN DATA The OPEN DATA FILE Dialog rArr rArrBox will appear2 Locate the file of interest Use the Look In pull-down list to identify the folder containing the Excel file of interest3 From the FILE TYPE pull down menu select EXCEL (xls)

4 Click on the file name of interest and click on OPEN or simply double-click on the file name

5 Keep the box checked that reads Read variable names from the first row of data This presumes that the first row of the Excel data file contains variable names in the first row [If the data resided in a different worksheet in the Excel file this would need to be entered]

6 Click on OK The Excel data file will now appear in the SPSSWIN Data Editor

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

Importing data from an EXCEL spreadsheet (continued)

7. The former EXCEL spreadsheet can now be saved as an SPSS file (FILE ⇒ SAVE AS) and is ready to be used in analyses. Typically, you would label variables and values and define missing values.

Importing an Access table

SPSSWIN does not offer a direct import for Access tables. Therefore, we must follow these steps:

1. Open the Access file.

2. Open the data table.

3. Save the data as an Excel file.

4. Follow the steps outlined for importing data from an Excel spreadsheet into SPSSWIN.

Importing Text Files into SPSSWIN

Text data points are typically separated (or "delimited") by tabs or commas. Sometimes they are in fixed format instead.
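The difference between the two text layouts can be sketched in Python. This is an illustration only, not how SPSSWIN reads files: the sample records, the variable names `id`, `sex`, and `age`, and the column positions in `FIXED_LAYOUT` are all hypothetical.

```python
import csv
import io

# Hypothetical sample: the same two cases stored in two text layouts.
DELIMITED = "id,sex,age\n1,F,34\n2,M,41\n"  # comma-delimited, names in row 1
FIXED = "001F34\n002M41\n"                   # fixed format, no delimiters

# Hypothetical fixed-format layout: (variable, start, end) character positions,
# 0-based with the end excluded, exactly as Python string slicing works.
FIXED_LAYOUT = [("id", 0, 3), ("sex", 3, 4), ("age", 4, 6)]


def read_delimited(text, delimiter=","):
    """Split each line at the delimiter; the first line supplies variable names."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text), delimiter=delimiter)]


def read_fixed(text, layout):
    """Cut each line at fixed character positions instead of at a delimiter."""
    return [
        {name: line[start:end] for name, start, end in layout}
        for line in text.splitlines()
    ]


for case in read_delimited(DELIMITED):
    print(case)
for case in read_fixed(FIXED, FIXED_LAYOUT):
    print(case)
```

Passing `delimiter="\t"` to `read_delimited` handles tab-delimited text the same way. Either way, every value arrives as text and still has to be converted to numbers.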

Running the FREQUENCIES procedure

1. Open the data file of interest (from the menus, click on FILE ⇒ OPEN ⇒ DATA).

2. From the menus, click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ FREQUENCIES.

3. The FREQUENCIES dialog box will appear. The left-hand box lists all the variables that have been defined in the data file (the source variable list). The first step is to identify the variable(s) for which you want to run a frequency analysis. Click on a variable name, then click the [>] pushbutton. The variable name will now appear in the VARIABLE(S) box (the selected variable list). Repeat these steps for each variable of interest.

4. If all you need is a frequency table showing counts and percentages (raw, adjusted, and cumulative), click on OK.
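The three percentage columns of that table (labelled Percent, Valid Percent, and Cumulative Percent in SPSS output) can be reproduced with a short Python sketch. It assumes the usual definitions: percent over all cases, valid percent over non-missing cases, and cumulative percent as a running total of valid percent. The `frequency_table` function and the sample ratings, with 9 used as a missing-value code, are hypothetical.

```python
from collections import Counter


def frequency_table(values, missing=(None,)):
    """Rows of (value, count, percent, valid percent, cumulative percent).

    percent       = count / all cases (missing cases included)
    valid percent = count / non-missing cases
    cumulative    = running total of valid percent
    Missing codes are listed last, without valid or cumulative percent.
    """
    total = len(values)
    counts = Counter(values)
    valid_values = sorted(v for v in counts if v not in missing)
    n_valid = sum(counts[v] for v in valid_values)

    rows = []
    cumulative = 0.0
    for v in valid_values:
        valid_pct = 100 * counts[v] / n_valid
        cumulative += valid_pct
        rows.append((v, counts[v], 100 * counts[v] / total, valid_pct, cumulative))
    for v in missing:
        if counts[v]:
            rows.append((v, counts[v], 100 * counts[v] / total, None, None))
    return rows


# Hypothetical ratings coded 1-3, with 9 defined as a missing value ("no answer").
ratings = [1, 2, 2, 3, 3, 3, 9, 2, 1, 3]

print(f"{'Value':>5} {'Freq':>5} {'Pct':>6} {'Valid':>6} {'Cum':>6}")
for value, count, pct, valid_pct, cum_pct in frequency_table(ratings, missing=(9,)):
    valid_txt = "" if valid_pct is None else f"{valid_pct:.1f}"
    cum_txt = "" if cum_pct is None else f"{cum_pct:.1f}"
    print(f"{value:>5} {count:>5} {pct:>6.1f} {valid_txt:>6} {cum_txt:>6}")
```

Because the one "no answer" case is excluded from the valid percent, the valid and cumulative columns are computed over 9 cases while the plain percent column uses all 10.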

Requesting STATISTICS

Descriptive and summary STATISTICS can be requested for numeric variables. To request statistics:

1. From the FREQUENCIES dialog box, click on the STATISTICS pushbutton.

2. This will bring up the FREQUENCIES: STATISTICS dialog box.

3. The STATISTICS dialog box offers a variety of choices, including percentile values (e.g., quartiles), measures of central tendency (mean, median, mode), measures of dispersion (standard deviation, variance, range, minimum, maximum), and measures of distribution shape (skewness, kurtosis).

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ DESCRIPTIVES). The procedure offers many of the same statistics as the FREQUENCIES procedure, but without generating frequency tables.
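As a rough illustration of what the default DESCRIPTIVES output reports (N, minimum, maximum, mean, and standard deviation), here is a Python sketch using the standard `statistics` module. The `descriptives` function and the sample ages are hypothetical; the sketch assumes, as SPSS does, that the standard deviation is the sample version with an n - 1 denominator.

```python
import statistics


def descriptives(values, missing=(None,)):
    """N, minimum, maximum, mean, and standard deviation of the non-missing values."""
    valid = [v for v in values if v not in missing]
    return {
        "N": len(valid),
        "Minimum": min(valid),
        "Maximum": max(valid),
        "Mean": statistics.mean(valid),
        # Sample standard deviation: divides by n - 1, not n.
        "Std. Deviation": statistics.stdev(valid),
    }


# Hypothetical ages; None marks a case with no recorded age.
ages = [34, 41, 29, 52, 38, None]
for name, value in descriptives(ages).items():
    print(f"{name:<15} {value:.2f}" if isinstance(value, float) else f"{name:<15} {value}")
```

Note that N counts only the 5 cases with a recorded age, which is why the missing case does not pull the mean toward zero.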

Requesting CHARTS

One can request a chart (graph) for a variable or variables included in a FREQUENCIES procedure:

1. In the FREQUENCIES dialog box, click on CHARTS.

2. The FREQUENCIES: CHARTS dialog box will appear. Choose the intended chart (e.g., bar chart, pie chart, or histogram).
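A bar chart draws one bar per category with the counts from the frequency table, while a histogram first groups a numeric variable into intervals. The Python sketch below shows that grouping step with text bars. The interval boundaries are chosen by hand here, whereas SPSS picks its own, so treat `histogram_counts` and the sample ages as hypothetical.

```python
def histogram_counts(values, low, high, n_bins):
    """Count values in n_bins equal-width intervals spanning [low, high]."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if v is None or not low <= v <= high:
            continue  # skip missing or out-of-range values
        # Each interval includes its lower edge; v == high goes in the last bin.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical ages, grouped into four 10-year intervals from 20 to 60.
ages = [34, 41, 29, 52, 38, 45, 23, 31]
low, high, n_bins = 20, 60, 4
width = (high - low) / n_bins
for i, count in enumerate(histogram_counts(ages, low, high, n_bins)):
    start = low + i * width
    print(f"{start:4.0f}-{start + width:<4.0f} | {'#' * count}")
```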

Pasting charts into Word

1. Click on the chart.

2. Click on the pull-down menu EDIT ⇒ COPY OBJECTS.

3. Go to the Word document in which the chart is to be embedded. Click on EDIT ⇒ PASTE SPECIAL.

4. Select Formatted Text (RTF) and then click on OK.

5. Enlarge the graph to the desired size by dragging one or more of the black squares along its perimeter (if the black squares are not visible, click once on the graph).

BASIC STATISTICAL PROCEDURES: CROSSTABS

1. From the ANALYZE pull-down menu, click on DESCRIPTIVE STATISTICS ⇒ CROSSTABS.

2. The CROSSTABS dialog box will then open.

3. From the variable selection box on the left, click on a variable you wish to designate as the Row variable. The values (codes) of the Row variable make up the rows of the crosstabs table. Click on the arrow (>) button for Row(s). Next, click on a different variable you wish to designate as the Column variable. The values (codes) of the Column variable make up the columns of the crosstabs table. Click on the arrow (>) button for Column(s).

4. You can specify more than one variable in the Row(s) and/or Column(s) boxes. A cross table will be generated for each combination of Row and Column variables.
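The counting behind a crosstabs table can be sketched in Python: each cell holds the number of cases with that combination of row and column values, and looping over every Row/Column pair mirrors step 4. The `crosstab` and `print_crosstab` helpers, the variable names `group`, `sex`, and `smoker`, and the cases are hypothetical; the sketch reports counts and row totals only, not percentages or test statistics.

```python
from collections import Counter
from itertools import product


def crosstab(cases, row_var, col_var):
    """Count cases for every (row value, column value) pair."""
    counts = Counter((case[row_var], case[col_var]) for case in cases)
    row_values = sorted({case[row_var] for case in cases})
    col_values = sorted({case[col_var] for case in cases})
    table = {r: {c: counts[(r, c)] for c in col_values} for r in row_values}
    return table, col_values


def print_crosstab(table, col_values, row_var, col_var):
    """Print the table as tab-separated text with a row-total column."""
    print(f"{row_var} * {col_var}")
    print("\t" + "\t".join(col_values) + "\tTotal")
    for row_value, cells in table.items():
        counts = [cells[c] for c in col_values]
        print(f"{row_value}\t" + "\t".join(map(str, counts)) + f"\t{sum(counts)}")


# Hypothetical cases with one grouping variable and two other categorical variables.
cases = [
    {"group": "A", "sex": "F", "smoker": "no"},
    {"group": "A", "sex": "M", "smoker": "yes"},
    {"group": "B", "sex": "F", "smoker": "no"},
    {"group": "B", "sex": "F", "smoker": "yes"},
    {"group": "C", "sex": "M", "smoker": "no"},
]

# One Row variable and two Column variables: one table per combination.
for row_var, col_var in product(["group"], ["sex", "smoker"]):
    table, col_values = crosstab(cases, row_var, col_var)
    print_crosstab(table, col_values, row_var, col_var)
```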

Limitations: SPSS users have less control over data manipulation and statistical output than users of other statistical packages such as SAS and Stata.

SPSS is a good first statistical package for quantitative research in the social sciences because it is easy to use and can be a good starting point for learning more advanced statistical packages.

Questions

MENUS AND TOOLBARS

ANALYZE: perform various statistical procedures

GRAPHS: create bar charts, pie charts, etc.

UTILITIES: add comments to accompany the data file (and other advanced features)

ADD-ONS: features that are not currently installed (advanced statistical procedures)

WINDOW: switch between the data, syntax, and Navigator windows

HELP: access SPSSWIN Help information

Navigator (Output) Menus

When statistical procedures are run or charts are created, the output will appear in the Navigator window. The Navigator window contains many of the pull-down menus found in the Data Editor window. Some of the important menus in the Navigator window include:

INSERT: used to insert page breaks, titles, charts, etc.

FORMAT: used to change the alignment of a particular portion of the output

• Formatting Toolbar

When a table has been created by a statistical procedure, the user can edit the table to create a desired look or to add or delete information. Beginning with version 14.0, the user has a choice of editing the table in the Output window or opening it in a separate Pivot Table (DEFINE) window. Various pull-down menus are activated when the user double-clicks on the table. These include:

EDIT: undo and redo a pivot; select a table or table body (e.g., to change the font)

INSERT: used to insert titles, captions, and footnotes

PIVOT: used to perform a pivot of the row and column variables

FORMAT: various modifications can be made to tables and cells
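In its simplest form, a pivot of the row and column variables swaps which variable labels the rows and which labels the columns; the cell values themselves do not change. A Python sketch of that idea on a nested dictionary, using a hypothetical `pivot` helper and made-up counts (it does not attempt to model the Pivot Table editor's other options):

```python
def pivot(table):
    """Swap the row and column dimensions of a {row: {column: value}} table.

    Assumes every row has the same set of columns.
    """
    if not table:
        return {}
    columns = next(iter(table.values())).keys()
    return {c: {r: table[r][c] for r in table} for c in columns}


# Hypothetical group-by-sex counts, with groups as rows.
counts = {"A": {"F": 1, "M": 1}, "B": {"F": 2, "M": 0}}
for row, cells in pivot(counts).items():
    print(row, cells)
```

Pivoting twice returns the original layout, which is a quick way to confirm that no cell was lost or moved.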

Importing tab-delimited data

In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. Look in the appropriate location for the text file. Then select "Text" from "Files of type." Click on the file name and then click on "Open." You will see the Text Import Wizard - Step 1 of 6 dialog box.

Once you have worked through the wizard, you will have an SPSS data file containing the former tab-delimited data. You simply need to add variable and value labels and define missing values.
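Python's standard `csv` module can sketch what those finishing tasks amount to: a value label attaches text to a numeric code, and a missing-value definition marks a code as not being a valid value. The file contents, `VALUE_LABELS`, `MISSING_CODES`, and the helper functions are hypothetical, and blanking a missing code to `None` is a simplification; SPSS keeps the code in the data file and excludes it from analyses.

```python
import csv
import io

# Hypothetical tab-delimited file; the first row holds the variable names.
RAW = "id\tsex\tscore\n1\t1\t78\n2\t2\t99\n3\t2\t85\n"

# Hypothetical codings standing in for SPSS value labels and missing values.
VALUE_LABELS = {"sex": {1: "Male", 2: "Female"}}
MISSING_CODES = {"score": {99}}  # 99 = "not recorded"


def load_tab_delimited(text):
    """Read tab-delimited text into one dict per case, blanking missing codes."""
    cases = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        case = {name: int(value) for name, value in row.items()}  # all numeric here
        for name, codes in MISSING_CODES.items():
            if case[name] in codes:
                case[name] = None  # treat the code as "no valid value"
        cases.append(case)
    return cases


def labelled(case, name):
    """Return the value label for a coded value, or the raw value if unlabelled."""
    return VALUE_LABELS.get(name, {}).get(case[name], case[name])


for case in load_tab_delimited(RAW):
    print(case["id"], labelled(case, "sex"), case["score"])
```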

Exporting Data to Excel

Click on FILE ⇒ SAVE AS. Enter the File Name for the file to be exported. For "Save as Type," select Excel (*.xls) from the pull-down menu. You will notice the checkbox for "Write variable names to spreadsheet." Leave it checked, as you will want the variable names to appear in the first row of each column in the Excel spreadsheet. Finally, click on Save.
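Writing a genuine .xls workbook needs a third-party library, so the Python sketch below writes comma-separated text instead, which Excel can also open. It only illustrates the effect of keeping the variable-names option checked: the first row carries the names, and each later row is one case. The `export_with_header` helper and the sample cases are hypothetical.

```python
import csv
import io


def export_with_header(cases, variables):
    """Write cases as comma-separated text with variable names in the first row.

    Missing values (None) are written as empty cells.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(variables)  # header row, like "Write variable names to spreadsheet"
    for case in cases:
        writer.writerow(["" if case[v] is None else case[v] for v in variables])
    return buffer.getvalue()


# Hypothetical cases; the second score is missing.
cases = [{"id": 1, "score": 78}, {"id": 2, "score": None}]
print(export_with_header(cases, ["id", "score"]), end="")
```

Without the header row, the receiving spreadsheet would have only unnamed columns, and the variable names would have to be typed in again.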


4 You can specify more than one variable in the Row(s) andor Column(s) A cross table will be generated for each combination of Row and Column variables

Limitations SPSS users have less control over data manipulation and statistical output than other statistical packages such as SAS Stata etc

SPSS is a good first statistical package to perform quantitative research in social science because it is easy to use and because it can be a good starting point to learn more advanced statistical packages

Statistics PackageStatistics Packagefor the Social Science (SPSS)for the Social Science (SPSS)

QuestionsQuestions

Statistical Package for the Social Sciences (SPSS)

• Additional menus

– CHART EDITOR: used to edit a graph

– SYNTAX EDITOR: used to edit the text in a syntax window

• Show or hide a toolbar

Click on VIEW ⇒ TOOLBARS, then check a toolbar to show it or uncheck it to hide it.

• Move a toolbar

Click on the toolbar (but not on one of its pushbuttons) and then drag the toolbar to its new location.

• Customize a toolbar

Click on VIEW ⇒ TOOLBARS ⇒ CUSTOMIZE.

Importing data from an EXCEL spreadsheet

Data from an Excel spreadsheet can be imported into SPSSWIN as follows:

1. In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. The OPEN DATA FILE dialog box will appear.

2. Locate the file of interest. Use the Look In pull-down list to identify the folder containing the Excel file of interest.

3. From the FILE TYPE pull-down menu, select EXCEL (*.xls).

4. Click on the file name of interest and click on OPEN, or simply double-click the file name.

5. Keep the box checked that reads "Read variable names from the first row of data." This presumes that the first row of the Excel data file contains the variable names. [If the data reside in a different worksheet of the Excel file, that worksheet would need to be specified.]

6. Click on OK. The Excel data file will now appear in the SPSSWIN Data Editor.

7. The former EXCEL spreadsheet can now be saved as an SPSS file (FILE ⇒ SAVE AS) and is ready to be used in analyses. Typically, you would also label variables and values and define missing values.
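Outside SPSS, the same idea (first row as variable names, then declared missing values and value labels) can be sketched in Python. This is a minimal illustration, not SPSS's import routine: it assumes the worksheet was first exported to CSV, and the data, the -9 missing-value code, and the group labels are all hypothetical.

```python
import csv
import io

# Hypothetical CSV export of a small spreadsheet; the first row holds the
# variable names, mirroring the "Read variable names from the first row" option.
RAW = """id,group,score
1,1,72
2,2,-9
3,1,85
4,3,91
"""

# Hypothetical coding conventions for this illustration only.
MISSING_CODES = {"score": {"-9"}}  # user-defined missing values per variable
VALUE_LABELS = {"group": {"1": "Control", "2": "Treatment A", "3": "Treatment B"}}


def load_records(text):
    """Parse CSV text into dicts keyed by the header row's variable names."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        clean = {}
        for name, value in row.items():
            # Declared missing codes become None instead of counting as data.
            if value in MISSING_CODES.get(name, set()):
                clean[name] = None
            else:
                clean[name] = value
        records.append(clean)
    return records


def label(name, value):
    """Return the value label for a coded value, or the raw value if unlabeled."""
    return VALUE_LABELS.get(name, {}).get(value, value)


records = load_records(RAW)
for r in records:
    print(r["id"], label("group", r["group"]), r["score"])
```

Keeping a missing code as None, rather than deleting the case, mirrors the idea that a declared missing value stays in the file but is excluded from calculations.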

Importing an Access table

SPSSWIN does not offer a direct import for Access tables. Therefore, we must follow these steps:

1. Open the Access file.

2. Open the data table.

3. Save the data as an Excel file.

4. Follow the steps outlined for importing data from an Excel spreadsheet into SPSSWIN.

Importing Text Files into SPSSWIN

Text data points typically are separated (or "delimited") by tabs or commas. Sometimes they can be of fixed format.
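To make the difference between delimited and fixed-format text concrete, here is a small Python sketch that reads the same three hypothetical cases from a tab-delimited, a comma-delimited, and a fixed-format layout. The column positions in `FIXED_LAYOUT` are an assumption; in practice they would come from the data file's documentation (codebook).

```python
import csv
import io

# Hypothetical records: the same three cases in three layouts.
TAB_TEXT = "id\tsex\tage\n1\tF\t34\n2\tM\t41\n3\tF\t29\n"
COMMA_TEXT = "id,sex,age\n1,F,34\n2,M,41\n3,F,29\n"
# Fixed format: no delimiter, each variable occupies fixed columns.
# Assumed layout: id in columns 1-2, sex in column 3, age in columns 4-5.
FIXED_TEXT = " 1F34\n 2M41\n 3F29\n"
FIXED_LAYOUT = [("id", 0, 2), ("sex", 2, 3), ("age", 3, 5)]


def read_delimited(text, delimiter):
    """Split each line on the delimiter; the first line holds the names."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def read_fixed(text, layout):
    """Slice each line by (name, start, end) positions (0-based, end exclusive)."""
    rows = []
    for line in text.splitlines():
        rows.append({name: line[start:end].strip() for name, start, end in layout})
    return rows


tab_rows = read_delimited(TAB_TEXT, "\t")
comma_rows = read_delimited(COMMA_TEXT, ",")
fixed_rows = read_fixed(FIXED_TEXT, FIXED_LAYOUT)
print(tab_rows == comma_rows == fixed_rows)  # True
```

Fixed-format files carry no names or separators, so the layout must be supplied by the user; delimited files can carry names in their first line.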

Running the FREQUENCIES procedure

1. Open the data file of interest (from the menus, click on FILE ⇒ OPEN ⇒ DATA).

2. From the menus, click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ FREQUENCIES.

3. The FREQUENCIES dialog box will appear. The left-hand box (the source variable list) lists all the variables that have been defined in the data file. The first step is identifying the variable(s) for which you want to run a frequency analysis. Click on a variable name, then click the [ > ] pushbutton. The variable name will now appear in the VARIABLE[S] box (the selected variable list). Repeat these steps for each variable of interest.

4. If all that is being requested is a frequency table showing counts and percentages (raw, adjusted, and cumulative), then click on OK.
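The frequency table described in step 4 can be sketched in Python as follows. We assume the "raw, adjusted, and cumulative" percentages correspond to the Percent, Valid Percent, and Cumulative Percent columns of SPSS output: Percent is based on all cases, while Valid Percent and Cumulative Percent exclude missing values. The responses are hypothetical, and None stands in for a missing value.

```python
from collections import Counter

# Hypothetical responses for one variable; None marks a missing value.
responses = ["Agree", "Neutral", "Agree", None, "Disagree", "Agree", "Neutral", None]


def frequency_table(values):
    """Return (rows, missing_count).

    Each row is (value, count, percent, valid percent, cumulative percent).
    Percent uses all cases, including missing; valid and cumulative
    percent use only the non-missing cases.
    """
    total = len(values)
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):  # sorted categories give a stable order
        n = counts[value]
        valid_pct = 100.0 * n / len(valid)
        cumulative += valid_pct
        rows.append((value, n, 100.0 * n / total, valid_pct, cumulative))
    return rows, total - len(valid)


rows, missing = frequency_table(responses)
for value, n, pct, vpct, cpct in rows:
    print(f"{value:<10}{n:>4}{pct:>8.1f}{vpct:>8.1f}{cpct:>8.1f}")
print(f"{'Missing':<10}{missing:>4}{100.0 * missing / len(responses):>8.1f}")
```

Because missing cases are left out of the valid denominator, the Valid Percent column sums to 100 even when Percent does not.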

Requesting STATISTICS

Descriptive and summary STATISTICS can be requested for numeric variables. To request statistics:

1. From the FREQUENCIES dialog box, click on the STATISTICS pushbutton.

2. This will bring up the FREQUENCIES: STATISTICS dialog box.

3. The STATISTICS dialog box offers the user a variety of choices, grouped as percentile values, central tendency, dispersion, and distribution.
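As a rough cross-check on SPSS output, several of the statistics offered in the STATISTICS dialog box can be computed with Python's standard `statistics` module. This is a sketch with hypothetical scores. Quartiles use the module's default "exclusive" method, which places percentiles at position (n + 1)p; we believe this agrees with SPSS's default percentile definition, but verify against your own output.

```python
import statistics

# Hypothetical numeric scores for one variable (missing values already removed).
scores = [72, 85, 91, 64, 85, 78, 88, 70]


def summarize(values):
    """Compute central tendency, dispersion, and quartiles for one variable."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.multimode(values),  # every most-frequent value
        "std_dev": statistics.stdev(values),   # sample SD, n - 1 denominator
        "variance": statistics.variance(values),
        "range": max(values) - min(values),
        "minimum": min(values),
        "maximum": max(values),
        "quartiles": (q1, q2, q3),
    }


for name, value in summarize(scores).items():
    print(name, value)
```

`multimode` returns a list so that ties are not hidden; `statistics.quantiles` and `statistics.multimode` need Python 3.8 or later.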

DESCRIPTIVES

The DESCRIPTIVES procedure can be used to generate descriptive statistics (click on ANALYZE ⇒ DESCRIPTIVE STATISTICS ⇒ DESCRIPTIVES). The procedure offers many of the same statistics as the FREQUENCIES procedure, but without generating frequency tables.
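A DESCRIPTIVES-style summary (one row per variable, no frequency tables) might look like the Python sketch below. The choice of N, minimum, maximum, mean, and standard deviation reflects what we understand to be the procedure's default output; the dataset is hypothetical, and missing values (None) are dropped variable by variable.

```python
import statistics

# Hypothetical dataset: each key is a numeric variable; None marks a missing value.
data = {
    "age": [34, 41, 29, None, 52, 47],
    "score": [72, 85, 91, 64, None, 78],
}


def descriptives(dataset):
    """Return N, minimum, maximum, mean and sample SD for each variable.

    Missing values are dropped separately for each variable, so N can
    differ across variables. No frequency table is built.
    """
    table = {}
    for name, values in dataset.items():
        valid = [v for v in values if v is not None]
        table[name] = {
            "N": len(valid),
            "Minimum": min(valid),
            "Maximum": max(valid),
            "Mean": statistics.mean(valid),
            "Std. Deviation": statistics.stdev(valid),
        }
    return table


for name, stats in descriptives(data).items():
    print(name, {k: round(v, 2) for k, v in stats.items()})
```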

Requesting CHARTS

One can request a chart (graph) for a variable or variables included in a FREQUENCIES procedure:

1. In the FREQUENCIES dialog box, click on CHARTS.

2. The FREQUENCIES: CHARTS dialog box will appear. Choose the intended chart (e.g., bar chart, pie chart, or histogram).
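A histogram groups a numeric variable into equal-width intervals and shows the count in each, whereas bar and pie charts show counts per category. The Python sketch below illustrates the binning with a text histogram; the ages, the starting point of 20, and the bin width of 10 are assumptions, and SPSS chooses its own bins automatically.

```python
# Hypothetical ages for one numeric variable.
ages = [23, 27, 31, 34, 35, 38, 41, 42, 44, 47, 52, 58]


def histogram_bins(values, start, width, n_bins):
    """Count values in [start + k*width, start + (k+1)*width) for each bin k.

    The last bin is closed on the right so a value equal to the upper
    edge is still counted; values outside the range are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)
        if k == n_bins and v == start + n_bins * width:
            k = n_bins - 1  # include the upper edge in the last bin
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


counts = histogram_bins(ages, start=20, width=10, n_bins=4)
for k, c in enumerate(counts):
    low = 20 + 10 * k
    print(f"{low}-{low + 10:<4} {'#' * c}")
```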

Pasting charts into Word

1. Click on the chart.

2. Click on the pull-down menu EDIT ⇒ COPY OBJECTS.

3. Go to the Word document in which the chart is to be embedded. Click on EDIT ⇒ PASTE SPECIAL.

4. Select Formatted Text (RTF) and then click on OK.

5. Enlarge the graph to the desired size by dragging one or more of the black squares along the perimeter (if the black squares are not visible, click once on the graph).

BASIC STATISTICAL PROCEDURES: CROSSTABS

1. From the ANALYZE pull-down menu, click on DESCRIPTIVE STATISTICS ⇒ CROSSTABS.

2. The CROSSTABS dialog box will then open.

3. From the variable selection box on the left, click on a variable you wish to designate as the Row variable. The values (codes) of the Row variable make up the rows of the crosstabs table. Click on the arrow (>) button for Row(s). Next, click on a different variable you wish to designate as the Column variable. The values (codes) of the Column variable make up the columns of the crosstabs table. Click on the arrow (>) button for Column(s).

4. You can specify more than one variable in the Row(s) and/or Column(s) boxes. A crosstabulation will be generated for each combination of Row and Column variables.
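The crosstabulation described in steps 3 and 4 amounts to counting cases for every combination of row and column values, plus row and column totals. A minimal Python sketch with hypothetical variables `sex`, `group`, and `smoker` follows; as step 4 describes, it produces one table per Row/Column combination (here `sex` by `group` and `sex` by `smoker`).

```python
from collections import Counter
from itertools import product

# Hypothetical cases: each dict is one row of the data file.
cases = [
    {"sex": "F", "group": "A", "smoker": "No"},
    {"sex": "M", "group": "B", "smoker": "Yes"},
    {"sex": "F", "group": "B", "smoker": "No"},
    {"sex": "M", "group": "A", "smoker": "No"},
    {"sex": "F", "group": "A", "smoker": "Yes"},
]


def crosstab(rows, row_var, col_var):
    """Count cases for every (row value, column value) pair, with margins."""
    cells = Counter((r[row_var], r[col_var]) for r in rows)
    row_vals = sorted({r[row_var] for r in rows})
    col_vals = sorted({r[col_var] for r in rows})
    # Counter returns 0 for combinations that never occur.
    table = {rv: {cv: cells[(rv, cv)] for cv in col_vals} for rv in row_vals}
    for rv in row_vals:
        table[rv]["Total"] = sum(table[rv][cv] for cv in col_vals)
    table["Total"] = {cv: sum(table[rv][cv] for rv in row_vals) for cv in col_vals}
    table["Total"]["Total"] = len(rows)
    return table


# Several Row and Column variables: one table per (row, column) combination.
row_vars, col_vars = ["sex"], ["group", "smoker"]
for rv, cv in product(row_vars, col_vars):
    print(f"{rv} by {cv}:", crosstab(cases, rv, cv))
```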

Limitations: SPSS users have less control over data manipulation and statistical output than users of other statistical packages such as SAS and Stata.

SPSS is a good first statistical package for quantitative research in the social sciences because it is easy to use and can be a good starting point for learning more advanced statistical packages.

Questions
