Programmability in SPSS 14, SPSS 15 and SPSS 16 The Revolution Continues Jon Peck Technical Advisor SPSS Copyright (c) SPSS Inc, 2007

Upload: aaron-grady

Post on 26-Mar-2015

TRANSCRIPT

Page 1: Programmability in SPSS 14, SPSS 15 and SPSS 16 The Revolution Continues Jon Peck Technical Advisor SPSS Copyright (c) SPSS Inc, 2007

Programmability in SPSS 14, SPSS 15 and SPSS 16

The Revolution Continues

Jon Peck
Technical Advisor
SPSS

Copyright (c) SPSS Inc, 2007

Page 2

Recap of SPSS 14 Python programmability

Developer Central

New features in SPSS 15 programmability
  Writing first-class procedures
  Updating the data

New features in SPSS 16 programmability

Interacting with the user

Q & A

Conclusion

Agenda

Page 3

"Because of programmability, SPSS 14 is the most important release since I started using SPSS fifteen years ago."

"I think I am going to like using Python."

"Python and SPSS 14 and later are, IMHO, GREAT!"

"By the way, Python is a great addition to SPSS."

From InfoWorld (April 19, 2007): "Of all the tools fueling the dynamic-language trend in the enterprise, general-purpose dynamic languages such as Python and Ruby present the greatest upside for enhancing developer productivity."

Quotations from SPSS Users

Page 4

SPSS provides a powerful engine for statistical and graphical methods and for data management.

Python® provides a powerful, elegant, and easy-to-learn language for controlling and responding to this engine.

Together they provide a comprehensive system for serious applications of analytical methods to data.

The Combination of SPSS and Python

Page 5

SPSS 14.0 provided
  Programmability
  Multiple datasets
  Variable and File Attributes
  Programmability read-access to case data
  Ability to control SPSS from a Python program

SPSS 15 adds
  Read and write case data
  Create new variables directly rather than generating syntax
  Create pivot tables and text blocks via backend APIs
  Easier setup

SPSS 16 will add
  EXTENSION command for user procedures with SPSS syntax
  Dataset features for complex data management
  Ability to use R procedures within SPSS through R Plug-In

Programmability Features in SPSS 14, 15, and 16

Page 6

Makes it possible, and easy, to write jobs that respond to datasets, output, and the environment

Allows greater generality, more automation

Makes jobs more robust

Allows extending the capabilities of SPSS

Enables better organized and more maintainable code

Facilitates staff specialization

Increases productivity

More fun

Programmability Advantages

Page 7

Python extends SPSS via
  General programming language
  Access to variable dictionary, case data, and output
  Access to standard and third-party modules
  SPSS Developer Central modules
  Module structure for building libraries of code

Runs in "back-end" syntax context (like macro)
  SaxBasic scripting runs in "front-end" context

Two modes
  Traditional SPSS syntax window
  Drive SPSS from Python (external mode)

Optional install (licensed with SPSS Base)

Programmability Overview

Page 8

SPSS is not the owner or licensor of the Python software. Any user of Python must agree to the terms of the Python license agreement located on the Python web site. SPSS is not making any statement about the quality of the Python program. SPSS fully disclaims all liability associated with your use of the Python program.

Legal Notice

Page 9

Supports implementing various programming languages
  Requires a programmer to implement a new language

VB.NET Plug-In available on Developer Central
  Works only in external mode

The SPSS Programmability Software Development Kit

Page 10

Python interpreter embedded within SPSS

SPSS runs in traditional way until BEGIN PROGRAM command is found

Python collects commands until END PROGRAM command is found; then runs the program

Python can communicate with SPSS through APIs (calls to functions)
  Includes running SPSS syntax inside Python program
  Includes creating macro values for later use in syntax

Python can access SPSS output and data

OMS is a key tool

How Programmability Works
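
The hand-off described on this slide can be pictured with a toy model. The sketch below is not how SPSS is implemented; it only mimics the idea that ordinary syntax is processed until BEGIN PROGRAM appears, and that everything up to END PROGRAM is then collected and treated as one Python program. The function name `split_job` is invented for illustration, and the sketch ignores variants such as a language keyword after BEGIN PROGRAM.

```python
def split_job(job_text):
    """Split a job into ("syntax", text) and ("program", text) chunks.

    Toy model only: lines between BEGIN PROGRAM and END PROGRAM are
    collected into one program chunk; everything else is treated as syntax.
    """
    chunks, buffer, in_program = [], [], False
    for line in job_text.splitlines():
        marker = line.strip().rstrip(".").upper()
        if not in_program and marker == "BEGIN PROGRAM":
            # Flush any syntax seen so far, then start collecting Python code.
            if buffer:
                chunks.append(("syntax", "\n".join(buffer)))
            buffer, in_program = [], True
        elif in_program and marker == "END PROGRAM":
            # The whole block is handed over as a single program.
            chunks.append(("program", "\n".join(buffer)))
            buffer, in_program = [], False
        else:
            buffer.append(line)
    if buffer and not in_program:
        chunks.append(("syntax", "\n".join(buffer)))
    return chunks


job = "FREQ x.\nBEGIN PROGRAM.\nprint(1)\nEND PROGRAM.\nDESC y."
for kind, text in split_job(job):
    print(kind, "->", repr(text))
```

The point of the model is the ordering: syntax before the block runs first, the program runs as a unit, and syntax after the block can use whatever the program set up (for example a macro value).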

Page 11

BEGIN PROGRAM.
import spss, spssaux
spssaux.GetSPSSInstallDir("SPSSDIR")
spssaux.OpenDataFile("SPSSDIR/employee data.sav")

# find categorical variables
catVars = spssaux.VariableDict(variableLevel=['nominal', 'ordinal'])
if catVars:
    spss.Submit("FREQ " + " ".join(catVars.variables))
    # create a macro listing categorical variables
    spss.SetMacroValue("!catVars", " ".join(catVars.variables))
END PROGRAM.

DESC !catVars.

Example: Summarize Categorical Variables

Page 12

Two modes of operation

SPSS Drives mode (inside): traditional syntax context
  BEGIN PROGRAM …program… END PROGRAM
  Program in 14, 15, or 16 is in Python or, new in 16, in R

X Drives mode (outside): eXternal program drives SPSS
  Python interpreter (or VB.NET)
  No SPSS Viewer, Data Editor, or SPSS user interface
  Output sent as text to the application – can be suppressed
  Has performance advantages

Build programs with an IDE
  Even if to be run in traditional mode

Programmability Inside or Outside SPSS

Page 13

PythonWin IDE Controlling SPSS (eXternal Mode)

Page 14

Be productive quickly

Get more return as you learn more

Python.org

Python Tutorial

Cheeseshop: over 2200 packages as of April 11, 2007

SPSS Developer Central

SPSS Programming and Data Management, 4th ed, 2006.

Python Resources

Page 15

Dive Into Python book or PDF

Practical Python by Magnus Lie Hetland
  Extensive examples and discussion of Python

Python Cookbook, 2nd ed by Martelli, Ravenscroft, & Ascher

Python in a Nutshell, 2nd ed by Martelli, O'Reilly
  Very clear, comprehensive reference material

wxPython in Action by Rappin and Dunn
  Explains user interface building with wxPython

Python Books

Page 16

scipy 0.5.2
Scientific Algorithms Library for Python
Scipy.org

scipy is an open source library of scientific tools for Python. scipy gathers a variety of high level science and engineering modules together as a single package. scipy provides modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, genetic algorithms, ODE solvers, special functions, and more. scipy requires and supplements NumPy, which provides a multidimensional array object and other basic functionality.

Python is becoming a major language for scientific computing

Cheeseshop: scipy

Page 17

SPSS Developer Central is the web home for developing SPSS applications

Python, .NET, R Integration Plug-Ins

Supplementary modules by SPSS and others

Articles on programmability and graphics

Forums for asking questions and exchanging information

Programmability Extension SDK

Get Python itself from Python.org or CD
  SPSS 14, 15 use 2.4 (2.4.3)
  SPSS 16 will use 2.5

Not limited to programmability
  GPL graphics
  User-contributed code

Key Supplementary Modules
  spssaux
  spssdata

New for SPSS 15
  trans
  extendedTransforms
  rake
  pls
  enhanced tables.py

SPSS Developer Central

Page 18

tables.py module on Developer Central can merge two tables into one.
  E.g., Ctables significance tests into main tables
  Merge or replace cells with cells from a different table
  Flexibly define the join

tables.py can also censor cells, e.g., blank statistics based on small counts.

Merge example: data on importance of education qualifications for immigration by region of Europe

CTABLES /TABLE qfimeduBin BY Region
  /TITLES TITLE='Qualifications for Immigration'
  /COMPARETEST TYPE=PROP

Example: Manipulating Output: Merging Tables
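
To make the two operations on this slide concrete, here is a minimal sketch of cell-wise merging and censoring on plain Python lists. It does not use or reproduce the tables.py API; the function names `merge_tables` and `censor_small_cells`, the threshold of 5, and the sample values are all invented for illustration.

```python
def merge_tables(main, secondary, separator=" "):
    """Append each non-empty cell of `secondary` to the matching cell of `main`."""
    merged = []
    for main_row, extra_row in zip(main, secondary):
        merged.append([
            "{}{}{}".format(cell, separator, extra) if extra else str(cell)
            for cell, extra in zip(main_row, extra_row)
        ])
    return merged


def censor_small_cells(table, counts, min_count=5, blank=""):
    """Blank every cell whose underlying count is below `min_count`."""
    return [
        [blank if n < min_count else cell for cell, n in zip(row, count_row)]
        for row, count_row in zip(table, counts)
    ]


# Illustrative values only: a table of counts and a table of significance keys.
counts = [[974, 376], [3, 336]]
letters = [["", "A"], ["B D", ""]]

merged = merge_tables(counts, letters)
print(merged)
print(censor_small_cells(merged, counts))
```

Both functions assume the two tables have the same shape; the real module lets the join be defined more flexibly, as the slide notes.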

Page 19

Ctables Output

Page 20

BEGIN PROGRAM.
import spss, tables
cmd=r"""CTABLES /TABLE qfimeduBin BY Region
  /TITLES TITLE='Qualifications for Immigration'
  /COMPARETEST TYPE=PROP"""
tables.mergeLatest(cmd, autofit=False)
END PROGRAM.

Runs Ctables and merges test table into main table
  Using default merge behavior

"If it really is this simple this will generate a lot of excitement for us."

"This is really fantastic."

Program to Merge

Page 21

Qualifications for Immigration
Comparisons of Column Proportions

Qualification for immigration: good educational qualifications (rows) by Region of Europe (columns); cells are counts

     Western (A)   Eastern (B)   Northern (C)   Southern (D)
0    974           376           1024 A B D     533
1    1361 B D      336           1282 A B D     574
2    2940 D        974           2720 A B D     1555
3    3543          1130          2989 B         2038
4    3585 C        1288 C        2540           2229 A C
5    1931 C        823 A C       876            1299 A C

Results are based on two-sided tests with significance level 0.05. For each significant pair, the key of the category with the smaller column proportion appears under the category with the larger column proportion.

Merged Output

Page 22

You can extend SPSS capabilities by building new procedures
  Or use ones that others have built

Combine SPSS procedures and transformations with Python logic
  Poisson regression (SPSS 14) example using iterated CNLR
  New raking procedure built over GENLOG
  GENLIN in SPSS 15

Calculate data aggregates in SPSS and pass to algorithm coded in Python
  Raking procedure starts with AGGREGATE; uses GENLOG

Acquire case data and compute in Python
  Use Python standard modules and third-party additions
  Partial Least Squares Regression (pls module)

Approaches to Creating New Procedures

Page 23

Common to adapt existing libraries or code for use as Python extension modules
  C, C++, VB, Fortran,...

Python tools and APIs to assist
  Chap 25 in Python in a Nutshell
  Tutorial on extending and embedding the Python interpreter

Call R programs with SPSS 16

Adapt Existing Code Libraries

Page 24

Regression with a large number of predictors (even k > N)

Similar to Principal Components but considers the dependent variable simultaneously

Calculates principal components of (y, X), then uses regression on the scores instead of the original data

Equivalent to ordinary regression when the number of factors equals the number of predictors and there is one y variable

For more information see An Optimization Perspective on Kernel Partial Least Squares Regression.pdf.

Partial Least Squares Regression
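
The equivalence with ordinary regression mentioned above can be checked on a tiny example. The sketch below is a plain-Python version of the textbook NIPALS algorithm for a single y variable (PLS1). It is not the code of the pls module, which according to these slides relies on scipy; the function name `pls1_fit` and the data are invented for illustration.

```python
def pls1_fit(X, y, n_factors):
    """Return fitted y values from a PLS1 model with `n_factors` factors."""
    n_rows, n_cols = len(X), len(X[0])
    # Center the predictors and the response.
    x_means = [sum(row[j] for row in X) / n_rows for j in range(n_cols)]
    y_mean = sum(y) / n_rows
    E = [[row[j] - x_means[j] for j in range(n_cols)] for row in X]  # X residuals
    f = [value - y_mean for value in y]                              # y residuals
    fitted = [y_mean] * n_rows
    for _ in range(n_factors):
        # Weights: direction in X space with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:
            break  # nothing left to explain
        w = [v / norm for v in w]
        # Scores for this factor, then loadings for X and for y.
        t = [sum(E[i][j] * w[j] for j in range(n_cols)) for i in range(n_rows)]
        tt = sum(v * v for v in t)
        loadings = [sum(E[i][j] * t[i] for i in range(n_rows)) / tt
                    for j in range(n_cols)]
        q = sum(f[i] * t[i] for i in range(n_rows)) / tt
        # Deflate: remove what this factor explains from X and y.
        E = [[E[i][j] - t[i] * loadings[j] for j in range(n_cols)]
             for i in range(n_rows)]
        f = [f[i] - q * t[i] for i in range(n_rows)]
        fitted = [fitted[i] + q * t[i] for i in range(n_rows)]
    return fitted


# Three cases, two predictors: ordinary regression with an intercept fits exactly.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0]]
y = [1.0, 3.0, 2.0]
print(pls1_fit(X, y, 1))  # one factor: an approximation
print(pls1_fit(X, y, 2))  # two factors: reproduces y up to rounding
```

With as many factors as predictors the fitted values coincide with those of ordinary least squares, which for three cases and two predictors means y itself; with fewer factors the fit is a lower-dimensional approximation.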

Page 25

Strategy
  Fetches data from SPSS
  Uses scipy matrix operations to compute results
    Third-party module from Cheeseshop
  Writes pivot tables to SPSS Viewer
    Subject to OMS
    SPSS 14 viewer module created pivot table using OLE automation
    SPSS 15 has direct pivot table APIs
  Saves predicted values to active dataset

The pls Module for SPSS 15

Page 26

GET FILE="c:/spss15/tutorial/sample_files/car_sales.sav".
REGRESSION /STATISTICS COEFF R
  /DEPENDENT sales
  /METHOD=ENTER curb_wgt engine_s fuel_cap horsepow length mpg price resale type wheelbas width .

begin program.
import spss, pls

pls.plsproc("sales", """curb_wgt engine_s fuel_cap horsepow
length mpg price resale type wheelbas width""", yhat="predsales")
end program.

plsproc defaults to five factors

pls Example: REGRESSION vs PLS

Page 27

PLS with 5 factors almost equals regression with 11 variables

Results

Page 28

User procedures can be written in Python but specified using SPSS traditional syntax

User never writes or sees Python code

Used as if a built-in SPSS command

EXTENSION command defines command to SPSS via simple XML file

Python module called with syntax already checked and processed by SPSS

More general PLS module
  PLS y1 y2 y3 BY fac1 fac2 WITH z1 z2 z3
    /CRITERIA LATENTFACTORS=2.

Dialog box interface tools in SPSS 17
  In the meantime, use wxPython or something similar

SPSS 16 User Procedures

Page 29

"Raking" adjusts sample weights to control totals in n dimensions

Example: data classified by age and sex with known population totals or proportions

Calculated by fitting a main effects loglinear model
  Various adjustments required
  Not a complete solution to reweighting

Not directly available in SPSS

Raking Sample Weights

Page 30

Strategy: combine SPSS procedures with Python logic

rake.py (from SPSS Developer Central)
  Aggregates data via AGGREGATE to new dataset
  Creates new variable with control totals
  Applies GENLOG, saving predicted counts
  Adjusts predicted counts
  Matches back into original dataset
    Does not use MATCH FILES or require a SORT command
  Written in one (long) day

rake.rake("age sex", [{0: 1140, 1:1140}, {0: 104.6, 1:2175.4}], finalweight="finalwt")

Raking Module
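
For readers who want to see what raking does numerically, the sketch below uses classic iterative proportional fitting rather than the GENLOG-based approach that rake.py takes. The control totals are the ones from the rake.rake call on this slide; the sample cell counts and the function name `rake_weights` are invented for illustration.

```python
def rake_weights(cell_counts, control_totals, iterations=100):
    """Return one weight per cell so weighted margins match the control totals.

    cell_counts: {(category_dim0, category_dim1, ...): sample count}
    control_totals: one {category: population total} dict per dimension
    """
    weights = dict.fromkeys(cell_counts, 1.0)
    for _ in range(iterations):
        for dim, totals in enumerate(control_totals):
            for category, target in totals.items():
                cells = [c for c in cell_counts if c[dim] == category]
                current = sum(cell_counts[c] * weights[c] for c in cells)
                # Scale every cell in this category to hit the target margin.
                for c in cells:
                    weights[c] *= target / current
    return weights


# Hypothetical sample counts by (age, sex); control totals as on the slide.
sample = {(0, 0): 30, (0, 1): 470, (1, 0): 20, (1, 1): 480}
controls = [{0: 1140, 1: 1140}, {0: 104.6, 1: 2175.4}]

weights = rake_weights(sample, controls)
for cell in sorted(sample):
    print(cell, round(weights[cell], 4))
```

Each case in a cell would receive that cell's weight, which plays the role of the finalweight variable in the call above. The sketch assumes every cell has a positive count and that the control totals of the two dimensions sum to the same grand total.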

Page 31

SPSS 14 programmability can wrap SPSS syntax in Python logic, e.g., generate COMPUTE commands on the fly
  Useful when definitions can be expressed in SPSS syntax

SPSS 15 programmability can
  Generate new variables directly
  Add new cases directly
  Create new datasets from scratch

SPSS 16 has additional dataset capabilities

Extending SPSS Transformations
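
As an illustration of the SPSS 14 style of generating COMPUTE commands on the fly, the sketch below only builds the syntax text, so it runs without SPSS. Inside a BEGIN PROGRAM block the resulting string could be passed to spss.Submit, as in the earlier categorical-variables example. The function name `zscore_syntax`, the variable names, and the means and standard deviations are hypothetical.

```python
def zscore_syntax(stats):
    """Build COMPUTE commands that standardize each variable.

    stats: {variable name: (mean, standard deviation)}
    """
    lines = [
        "COMPUTE z_{0} = ({0} - {1}) / {2}.".format(name, mean, sd)
        for name, (mean, sd) in sorted(stats.items())
    ]
    lines.append("EXECUTE.")
    return "\n".join(lines)


# Hypothetical variables and statistics, for illustration only.
syntax = zscore_syntax({"salary": (34000.0, 17000.0), "age": (40.5, 10.0)})
print(syntax)
# Inside SPSS one would then run: spss.Submit(syntax)
```

The same pattern works for any transformation that can be written as syntax; when it cannot, the SPSS 15 facilities listed above for creating variables directly are the alternative.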

Page 32

trans module facilitates plugging in Python code to iterate over cases

Runs as an SPSS procedure
  Passes the data
  Adds variables to the SPSS variable dictionary
  Can apply any calculation casewise

Use with
  Standard Python functions (e.g., math module)
  Any user-written functions or appropriate classes
  Functions in extendedTransforms module

trans and extendedTransforms Modules

Page 33

trans strategy
  Pass case data through Python code writing result back to SPSS in new variables

extendedTransforms collection of 12 functions to apply to SPSS variables, including
  Regular expression search/replace
  soundex and nysiis functions for phonetic equivalence
  Date/time conversions based on patterns

trans and extendedTransforms Modules
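
To show what a phonetic-equivalence function does, here is a minimal sketch of the common American Soundex coding in plain Python. It is not the extendedTransforms implementation, whose details may differ; the function name `soundex` here simply mirrors the name on the slide.

```python
# Map consonants to Soundex digits; vowels, H, W and Y carry no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character Soundex code of `name` ("" if no letters)."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        # Skip uncoded letters and repeats of the previous code.
        if code and code != previous:
            result += code
        # H and W do not separate letters that share a code; vowels do.
        if ch not in "HW":
            previous = code
    return (result + "000")[:4]


for name in ("Robert", "Rupert", "Ashcraft", "Tymczak"):
    print(name, soundex(name))
```

Names that sound alike, such as Robert and Rupert, receive the same code, which is what makes such a function useful for matching loosely spelled string data.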

Page 34

Pattern matching in text strings

If you use SPSS index or replace, you need these

Standardize string data (Mr, Mr., Herr, Senor,...)

Extract data from loosely structured text
  "simvastatin-- PO 80mg TAB" -> "simvastatin", "80"

Patterns can be simple strings (as with SPSS index) or complex patterns

Pick out variable names with common parts

Can greatly simplify code

Python Regular Expressions
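
Two of the uses listed on this slide can be tried with Python's standard re module alone. The sketch below extracts the drug name and dose from the slide's example string and standardizes a few forms of address; the patterns and the function names `parse_prescription` and `standardize_title` are illustrative assumptions, not taken from the extendedTransforms module.

```python
import re

# Leading word, then the first number that is followed by "mg".
DOSE = re.compile(r"^\s*([A-Za-z]+).*?(\d+)\s*mg", re.IGNORECASE)

# A few equivalent forms of address at the start of a name.
TITLE = re.compile(r"^(Mr\.?|Herr|Senor)\s+")


def parse_prescription(text):
    """Return (drug, dose) as strings, or None if the text does not match."""
    match = DOSE.search(text)
    return match.groups() if match else None


def standardize_title(name):
    """Rewrite a leading Mr, Mr., Herr or Senor as "Mr."."""
    return TITLE.sub("Mr. ", name)


print(parse_prescription("simvastatin-- PO 80mg TAB"))
print(standardize_title("Herr Schmidt"))
print(standardize_title("Mr Smith"))
```

A single pattern replaces what would otherwise be several nested index and substring calls, which is the simplification the slide refers to.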

Page 35

Write to Me!