Dexter: The Missouri Census Data Center’s Data Extraction Utility

John Blodgett: OSEDA, University of Missouri

Rev.14May2007, jgb



What Is Dexter?

A web utility for performing simple data queries, or extracts.

An integral part of the MCDC’s Uexplore-based data exploration/access system.

Written in SAS© to access data stored in SAS datasets, but requires neither knowledge of nor access to SAS.

Who Uses Dexter?

Anyone interested in accessing the MCDC data archive, especially anyone who wants to directly access and manipulate the data.

Not (directly) intended for the very casual data user. Has a small but non-trivial learning curve.

Understanding the mechanics of Dexter is easy compared to understanding the data to be extracted.

Dexter’s Role Within Uexplore

Dexter accepts parameters that identify a database file/table from which data are to be extracted.

Uexplore provides the navigation tools to help locate and understand the content of datasets.

Uexplore hyperlinks actually invoke uex2dex, the dexter preprocessor, which in turn invokes dexter.

Uexplore Page With Hyperlinks

The URL Used to Invoke Dexter

On the previous screen the dataset name (usccflows.sas7bdat) is a hyperlink. The URL associated with it is:

http://mcdc2.missouri.edu/cgi-bin/broker?_PROGRAM=websas.uex2dex.sas&_SERVICE=appdev9&path=/pub/data/mig2000&dset=usccflows&view=0

It calls a program named uex2dex, written in SAS, and passes parameters to identify the data table to be queried.
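
As a rough illustration of how the parameters ride along in that URL, the Python sketch below splits the query string into its parts and rebuilds a URL of the same shape. The helper name uex2dex_url is hypothetical, and reading path as the filetype directory and dset as the dataset name is an inference from this one example, not a documented interface. Nothing here contacts the server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# The URL quoted in the tutorial, split across lines for readability.
url = ("http://mcdc2.missouri.edu/cgi-bin/broker?"
       "_PROGRAM=websas.uex2dex.sas&_SERVICE=appdev9"
       "&path=/pub/data/mig2000&dset=usccflows&view=0")

# Pull the query string apart into a plain dict of single values.
parts = urlsplit(url)
params = {key: values[0] for key, values in parse_qs(parts.query).items()}


def uex2dex_url(path, dset, view="0"):
    """Hypothetical helper: build a URL shaped like the one in the tutorial."""
    query = urlencode(
        {
            "_PROGRAM": "websas.uex2dex.sas",
            "_SERVICE": "appdev9",
            "path": path,   # assumed: directory of the filetype
            "dset": dset,   # assumed: dataset name without extension
            "view": view,
        },
        safe="/",  # keep the slashes in the path readable
    )
    return f"http://mcdc2.missouri.edu/cgi-bin/broker?{query}"


print(params["path"], params["dset"])
print(uex2dex_url("/pub/data/mig2000", "usccflows") == url)
```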

Dexter and Census Data

Dexter doesn’t really know much about the datasets from which it extracts data.

It is not American FactFinder. It is just a generic extraction tool.

It uses only very basic metadata tools.

Other tools must be used to assist users in navigating the database.

Dexter and the MCDC Data Archive

Technically, there is nothing inherent in Dexter that ties it to this archive.

In practice, however, the collection of public data files that we call the “MCDC Data Archive” is what Dexter was created for.

Very probably, the only reason you’re reading this is that you want to access something in that archive.

How Do You Invoke Dexter?

Most people will start at the uexplore home page: http://mcdc.missouri.edu/applications/uexplore.shtml

You navigate the data collection by choosing “filetype” directories and at some point (…yada yada yada) you wind up selecting (clicking on) a file that is a data table.

Clicking on the data table invokes the uex2dex preprocessor. You fill out the form which uex2dex generates and click on an “Extract Data” button to actually invoke Dexter.

Accessing Uexplore (Home Page)

From the MCDC home page (or any page with the navy blue navigation bar) click on “MCDC Data Archive”.

Or enter the URL: http://mcdc2.missouri.edu/applications/uexplore.shtml

Choose Major Category (from the links in teal box)

Scroll Within the Filetype Descriptions to Find the Type (mig2000)

Click on the Filetype Name (links to uexplore for that directory/filetype)

In this case we want to click on the mig2000 filetype. The text tells us what kind of data we can expect to find in this directory.

Uexplore Page - mig2000 Filetype

This page is all about hyperlinks (all the blue text). Before proceeding to the Dexter-invocation links we want to back up and look at the data archive structure.

(Back to) The Uexplore/Dexter Home Page

The Archive Directory (on the Uexplore/Dexter home page)

The teal box contains links to 8 major data categories (2000 Census thru Compendia).

The rest of the page consists mostly of descriptions of and hyperlinks to the archive’s data categories (which we refer to as filetypes).

Filetypes within the major categories are sorted in descending order of what we think will be their popularity.

Sf32000x is our most popular filetype.

What’s In the Archive?

Very important question. But not the focus of this tutorial. Some day we’ll do a separate, long tutorial just on that topic.

Not all filetypes are created equal. We spend 90% of our resources on maybe 10% of our data directories.

Filetypes that are in bold are the MCDC “house specialties”.

The Data Archive – General Info

We keep the data table files (the things Dexter accesses) in the same directories along with other related files (metadata, spreadsheets, csv files, Readme.html files, etc.)

Each filetype directory has a special Tools subdirectory where we keep program code and other tool modules related to the data.

Subdirectories & Files starting with uppercase letters are listed first and are usually worth looking at.

Dexter-accessible table files (“SAS datasets”) have extensions of sas7bdat or sas7bvew.

Exercise

The Bureau of Economic Analysis disseminates its REIS data with key economic indicators for US geography down to the county level.

Locate the filetype corresponding to this data collection and navigate to the directory page.

What’s the major category?

Uexplore Data Directory Page

What you see when you click on the beareis link on the Uexplore home page. It displays a list of files within the directory. The “File” column entries are hyperlinks. With a few exceptions the files are displayed in alphabetical order.

Datasets.html is a special file providing enhanced navigation of the data files in this dir. It displays just the data-table files, but in a more logical order and with additional metadata.

Datasets.html page

Datasets.html Columns

The Name column is also a link to uex2dex / dexter.

Label is a short description of the dataset.

#Rows (# of observations) and #Cols (# of columns/variables) are taken from the datasets metadata set, as are the Geographic Universe and Units.

Link to Details is the most important column.

Universe and Units

The majority of datasets in the archive contain summary data for geographic areas. For example, a dataset in the popests directory might contain the latest estimates for all counties in the state of Missouri. The geographic universe is Missouri, and the units are counties.

When we have many datasets in a directory it’s usually because we have many different combinations of universe and units.

Common Universes

Missouri (the state of) is by far the most common universe for the MCDC archive.

United States is second; we have quite a number of national datasets.

Illinois and Kansas are also very common since we routinely download and convert census files for these key neighbor states.

A common sort order for files on Datasets.html pages is Missouri files first, then US, then IL/KS and then other states.

Rows & Columns

The rows of the data tables are typically geographic entities: a state, a county, a city, etc.

Most of the columns in the data tables are summary stats for the entity: e.g. the 2000 pop count, the latest estimated pop, the change and percent change, etc.

Other columns (“variables”) are identifiers with names such as sumlev, geocode and areaname.

Numeric vs. Character Variables

SAS© stores data as character strings or as numerics.

We store all identifiers (geographic codes, etc) as character strings even if they are made up of numeric digits.

So the value of the state code for CT is “09”, not 9. The leading “0” matters.

Unfortunately, Excel ignores the distinction when importing csv files.
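
A small Python sketch of the leading-zero problem, using made-up csv content (the column names and population figures are illustrative only). The standard csv module reads every field as a string, so the code survives; converting it to a number loses the zero, and it has to be padded back to two characters.

```python
import csv
import io

# Made-up extract: State is an identifier, Pop is a summary statistic.
csv_text = "State,AreaName,Pop\n09,Connecticut,100\n29,Missouri,200\n"

rows = list(csv.DictReader(io.StringIO(csv_text)))

# Read as text, the identifier keeps its leading zero.
state_codes = [row["State"] for row in rows]

# Treated as a number (what a spreadsheet import tends to do), "09" becomes 9.
as_numbers = [int(code) for code in state_codes]

# Repair: pad back to the 2-character state code.
restored = [f"{number:02d}" for number in as_numbers]

print(state_codes, as_numbers, restored)
```

If you read the file with pandas instead, passing dtype=str to read_csv is one way to keep identifier columns as text.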

Dataset Naming Conventions

All filetype names are 8 characters or less.

Dataset names were limited to 8 characters by the software until recently.

The first characters of the dataset name often correspond to the universe, e.g. “mo”, “il”, “us”.

The geo units are often part of the ds-name, e.g. “motracts”, “uszips”.

For time series data the name usually ends with a time indicator, e.g. “uscom03” contains data thru 2003.
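
To make the convention concrete, here is a heuristic Python sketch that guesses the parts of a dataset name. It is only a guess at a loosely applied convention: the function name is hypothetical, the “ks” prefix is an assumption (the tutorial lists only “mo”, “il” and “us”), and a name that merely happens to start with one of these letter pairs would be misread.

```python
import re

# Universe prefixes; "ks" is assumed by analogy and is not listed in the tutorial.
KNOWN_UNIVERSES = ("mo", "il", "ks", "us")

# Optional universe prefix, then the units part, then optional trailing digits.
NAME_PATTERN = re.compile(rf"({'|'.join(KNOWN_UNIVERSES)})?(.*?)(\d+)?")


def guess_name_parts(ds_name):
    """Hypothetical helper: split a dataset name into universe, units, time."""
    match = NAME_PATTERN.fullmatch(ds_name.lower())
    universe, units, time_tag = match.groups()
    return {"universe": universe, "units": units or None, "time": time_tag}


for name in ("motracts", "uszips", "uscom03"):
    print(name, guess_name_parts(name))
```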

Variable Naming Conventions

Not as rigorously applied as we might like, esp. for older datasets (conventions used for 1980 datasets differ a little from 2K and 1990 sets, for example).

Certain names appear on many datasets and are consistent. These are mostly identifier variables, the ones used in creating filters and for merging data from different files.

Consistency With Census Bureau Data Dictionary Names

The Bureau often distributes data dictionary files with their data that include suggested names for the fields.

Their name for the field (variable) containing the name of the geographic area being summarized is ANPSADPI. We decided to go with AreaName instead.

But in most cases we try to use the same name as in the data dictionary.

Common ID Variables

SumLev: Geographic summary level codes as used in 2K census. (3-char)

State: 2-char state FIPS code.

County: 5-char county FIPS code, incl. the state.

Geocode: A composite code to id a geographic area. E.g. the value for a census tract might be “29019-0010.00”.

AreaName: Name of the area.

Common ID Variables (cont)

Tract: census tract in tttt.ss format, always 7 characters with leading 0s and 00 suffixes. E.g. “0012.00”.

Esriid: Similar to geocode but intended to use as a key for linking to shape files from ESRI (the ArcInfo people). When geocode=“29019-0010.00” the value of esriid=“29019001000”.
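
The relationship between these identifiers for a census tract can be sketched in a few lines of Python. Both function names are hypothetical, and the rule for esriid (drop the hyphen and the period) is inferred from the single example pair above; geocodes for other summary levels may be built differently.

```python
def geocode_to_esriid(geocode):
    """Strip the punctuation from a tract geocode (rule inferred from one example)."""
    return geocode.replace("-", "").replace(".", "")


def split_tract_geocode(geocode):
    """Break a tract geocode such as "29019-0010.00" into its ID variables."""
    county, tract = geocode.split("-")
    return {
        "State": county[:2],  # 2-char state FIPS code
        "County": county,     # 5-char county FIPS code, incl. the state
        "Tract": tract,       # tttt.ss, always 7 characters
    }


print(geocode_to_esriid("29019-0010.00"))
print(split_tract_geocode("29019-0010.00"))
```

Every value stays a character string throughout, so the leading zeros in the tract code are preserved.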

SAS Formats

Some variables have custom formats associated with them, which cause them to display a name instead of their actual value.

E.g. the variable County may have a value of “29019” but displays as “Boone MO” using the format. Most Dexter output has the formatted values. Click the “View qmeta Metadata report” option at the end of Section II on the Dexter form to see which variables have formats associated.
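
One way to picture a format is as a lookup table from stored value to display label. The Python sketch below is an analogy only, not how SAS implements formats; the table holds just the one pair from the example above, and falling back to the raw value for unlisted codes is a choice made for this sketch.

```python
# Hypothetical format table: stored code -> display label.
COUNTY_FORMAT = {"29019": "Boone MO"}


def formatted(value, format_table):
    """Return the display label, or the stored value if no label is listed."""
    return format_table.get(value, value)


stored = "29019"
shown = formatted(stored, COUNTY_FORMAT)

# The stored value is unchanged; only the display differs.
print(stored, "displays as", shown)
```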

More About the MCDC Data Archive

http://mcdc2.missouri.edu/tutorials/mcdc_data_archive.ppt

Details Page

We get here by clicking on the Details link on the Datasets.html page.

Lots of content here, but it will vary.

Key variables is often extremely useful when doing filters.

Note the direct link to Dexter under Access the dataset near the bottom.

Increase Text Size to Read Fine Print

Exercise – Navigate to Dataset

Earlier we were looking at datasets in the 2000 Census category, filetype mig2000.

Go to the Uexplore home page and navigate to this filetype.

Use the Datasets.html page to display the datasets within the directory.

Find the row for the usccflows data table and click on the Details link for this table.

From the Details page click on the keyvals link for the variable State.

Key Variables Report: State

Tells you that the variable State has a value of 01 (for “Alabama”) in 22137 rows of this dataset.

This can be very helpful when doing a data filter in Dexter.
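
In effect, a key values report is a frequency count of one variable. A minimal Python sketch of the idea, using a handful of made-up rows rather than the real dataset:

```python
from collections import Counter

# Made-up rows standing in for the data table; only State matters here.
rows = [
    {"State": "01"},
    {"State": "01"},
    {"State": "29"},
    {"State": "29"},
    {"State": "29"},
    {"State": "17"},
]

# Count how many rows carry each value of State.
counts = Counter(row["State"] for row in rows)

for value, n_rows in sorted(counts.items()):
    print(value, n_rows)
```

Knowing the count in advance tells you how many rows a filter such as State equal to “29” will select.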

Finally… Time to See Dexter

Dexter Input Page (Top)

Sec. I. Output Format(s): csv file (into Excel) most common.

Sec. II is where the work is. Only 2 of 5 rows shown here.

User fills out the entire form before using Extract Data button to invoke Dexter.

Dexter Section II

Filters

“A filter is a logical condition that references values of columns within a row. For each row, the condition is evaluated and, if true, the row is selected for output. (If not, the row is omitted, or "filtered".) To keep all rows, just skip this section. The filter being created here can consist of up to 5 logical segments, each referencing a data set Variable, a relational Operator, and a data Value (or values) -- constants that the user must type in. The segments are evaluated as true or false. Logical operators (which default to And and appear between the segment specification rows) relate the segments when more than one is specified, creating a compound logical condition.”

If this explanation makes sense to you then you are going to have an easy time with Dexter. If not, follow through the examples and then try reading it again.
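
For readers who think in code, the filter description can be sketched in Python. This is a model of the idea, not Dexter's implementation: the function names and the sample rows are made up, only two operator labels are included, and applying mixed And/Or connectors strictly left to right is an assumption of the sketch.

```python
import operator

# Operator labels as they appear in this tutorial; the real form
# presumably offers more choices than these two.
OPERATORS = {
    "Equal to": operator.eq,
    "Greater Than or Equal To": operator.ge,
}

MAX_SEGMENTS = 5  # the form allows up to 5 logical segments


def segment_is_true(row, segment):
    """Evaluate one (variable, operator label, value) segment for a row."""
    variable, op_label, value = segment
    return OPERATORS[op_label](row[variable], value)


def row_is_selected(row, segments, connectors=None):
    """Return True if the row satisfies the compound condition.

    connectors holds "And"/"Or" between consecutive segments and defaults
    to all "And". Mixed connectors are applied left to right (assumed).
    """
    if len(segments) > MAX_SEGMENTS:
        raise ValueError("a filter has at most 5 segments")
    if not segments:
        return True  # no filter: keep all rows
    if connectors is None:
        connectors = ["And"] * (len(segments) - 1)
    result = segment_is_true(row, segments[0])
    for connector, segment in zip(connectors, segments[1:]):
        current = segment_is_true(row, segment)
        result = (result and current) if connector == "And" else (result or current)
    return result


# Made-up rows; State is a character string, GrossMig is numeric.
rows = [
    {"State": "29", "GrossMig": 250},
    {"State": "29", "GrossMig": 40},
    {"State": "17", "GrossMig": 900},
]
segments = [
    ("State", "Equal to", "29"),
    ("GrossMig", "Greater Than or Equal To", 100),
]

selected = [row for row in rows if row_is_selected(row, segments)]
print(selected)
```

With the default And, only rows meeting both segments are kept; switching the connector to Or would keep any row meeting either one.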

Example of Defining a Filter: What We Want

Assuming we are running dexter to access the mig2000.usccflows dataset we want to select only those rows that:

- have Missouri as the anchor state, and
- have at least 100 gross flows.

We’ll just assume you’ve read the descriptions and have some clue regarding what an anchor state and a gross flow are. (People interested in population migration would be likely to know this.)

Select Variable for Filter

Click on the Variable/Column drop-down menu in the 1st row and select State.

Select Comparison Operator

Select “Equal to” as the Operator from the drop-down menu in the middle column.

Enter Value to Complete Row

Remember the Key Values report showing all the values for the variable State? If you did not know the code for Missouri you could find it there.

What We Have So Far

We have created a logical condition that can be evaluated for each row of the dataset: State = ’29’

According to the key values report for State we know that this condition is true for 38,316 rows in the dataset. The filter we are building will select just those 38,316 rows out of the 1.1+ million in the full dataset.

Adding a Second Condition

But we do not want all the cases pertaining to Missouri as the anchor state. We only want those where we have at least 100 gross flows (whatever those are).

So we need to fill out a second row, adding this condition. We select GrossMig as the variable, Greater Than or Equal To as the Operator and enter 100 in the Value field.

We leave the logical operator radio button set to “And” to indicate that this is an additional necessary condition.
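For readers who think in code, the two-row filter can be sketched in Python. This is only an illustration of the logic, not what Dexter runs; the sample rows are made up, and only the column names State and GrossMig come from the slides.

```python
# Hypothetical sample rows; only the column names come from the slides.
rows = [
    {"State": "29", "GrossMig": 250},
    {"State": "29", "GrossMig": 40},
    {"State": "17", "GrossMig": 900},
]


def passes_filter(row):
    # Filter row 1: State Equal to '29' (a character code, so compare strings).
    # Filter row 2, joined with "And": GrossMig Greater Than or Equal To 100.
    return row["State"] == "29" and row["GrossMig"] >= 100


# Keep only the rows for which both conditions are true.
selected = [row for row in rows if passes_filter(row)]
print(len(selected))
```

Because the rows are joined with "And", a Missouri row with fewer than 100 gross flows is dropped, as is a high-flow row for any other state.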

The Completed Filter

You are now ready to scroll down to Section III.

Section III: Choose Variables

Conceptually simple section: just select the variables you want on your output from scrollable (if needed) menu lists.

Identifiers (character type variables) are listed separate from numerics. Important MCDC Data Archive convention.

Typing names instead of selecting is possible but not recommended.

Here we select all variables except State.

Note the Extract Data button.

Section IV: Title & Sort Order

Entirely optional, typically not used section.

Title value is used as report title if you asked for one, which we did not in this example.

Sort specs are handy. Note use of minus sign (hyphen) to indicate a descending sort.

Another Extract Data button to use to run query.
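The idea of a sort spec with a leading hyphen for descending order can be sketched in Python. This is a hedged illustration only: the space-separated spec format, the helper name sort_rows and the PctPoor column are assumptions, and the slides do not show how Dexter itself parses the field.

```python
def sort_rows(rows, spec):
    """Sort rows by a spec such as "SumLev -PctPoor" (hyphen = descending)."""
    keys = spec.split()
    # Apply the keys from last to first. Python's sort is stable, so the
    # first key in the spec ends up as the primary sort key.
    for key in reversed(keys):
        descending = key.startswith("-")
        name = key[1:] if descending else key
        rows = sorted(rows, key=lambda r: r[name], reverse=descending)
    return rows


# Hypothetical rows for illustration.
rows = [
    {"SumLev": "050", "PctPoor": 10.0},
    {"SumLev": "040", "PctPoor": 11.7},
    {"SumLev": "050", "PctPoor": 30.4},
]
ordered = sort_rows(rows, "SumLev -PctPoor")
```

Here the rows come back grouped by SumLev in ascending order, and within each SumLev the highest PctPoor comes first.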

Dexter Output Page

The first output you see is this results “index” page.

Always a link to a Summary Log page.

Additional links depend on output formats requested.

Dexter Summary Log

This file is always generated. Important for documenting the query.

Indicates what file(s), when run, as well as any filter and the variables kept.

Output directory details can usually be ignored.

Select Output File(s)
Click on Delimited File Link

What happens when you click on this file depends on how your browser is configured. The file referenced has a .csv extension which IE usually associates with the Excel plugin. Clicking this link will typically invoke Excel.

Viewing .csv Output in Excel

The csv file is read into Excel.

Rows 1 & 2 have names & labels.

Other rows contain the selected data.

Note sort order.
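If you would rather read such a file with a program than with Excel, the two header rows need separate handling. The sketch below uses Python's standard csv module on a made-up extract; the column names, labels and values are hypothetical, and only the layout (names in row 1, labels in row 2, data after that) comes from the slide.

```python
import csv
import io

# Hypothetical extract: row 1 holds variable names, row 2 holds their labels.
text = (
    "County,GrossMig\n"
    "County name,Gross migration flow\n"
    "Boone,250\n"
    "Greene,180\n"
)

reader = csv.reader(io.StringIO(text))
names = next(reader)   # row 1: variable names
labels = next(reader)  # row 2: variable labels

# Remaining rows are data; key each value by its variable name.
records = [dict(zip(names, row)) for row in reader]
label_for = dict(zip(names, labels))
```

Note that the csv module returns every value as text, so numeric columns such as GrossMig still need converting before doing arithmetic.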

Some Key Points So Far

Navigation tools such as the uexplore home page, the uexplore directory navigator and Datasets.html reference pages are used to make accessing data with Dexter easier.

You get to select rows (“filter”) and columns as well as the format(s) of your extracted data.

Filtering often requires knowledge of code values. These can sometimes be accessed from the Key Values reports on the Details page referenced by a Datasets.html page.

The query generated is summarized on a Summary.log page.

Pop Quiz

1. Can Dexter be used to access an xls file?
2. How are the files sorted on a directory page displayed by uexplore?
3. What does the uex2dex interface app do?
4. What is the fastest way to tell how many rows were selected by your query?
5. Which of the 5 sections of the Dexter query form must be filled out to have a valid request?
6. What’s a filetype? What does it mean when one is displayed in bold on the Uexplore home (Archive Directory) page?

Sample Query 2: What We Want

We want data from the 2000 Census, Summary File 3 regarding poverty in Missouri – in cities and counties.

We want the number and the % of poor persons, as well as the median household income.

We only want the data for cities of at least 5000 persons, but for all counties and for the state as a whole.

We want output as an HTML file sorted by the type of geography (state, county, city) and then by descending poverty rate.

What You Need to Know

You need to know where these kinds of data are stored. It is 2000 census data, but where among all those different summary files?

Read the brief descriptions on the uexplore home page. The sf32000 filetype looks good, but it turns out that it is too big. The standard extract version, sf32000x, has what we need.

An alternate way by which users may arrive here is via links on the MCDC Demographic Profile reports.

A Demographic Profile Report

A link at the bottom of this report page invokes Dexter with the appropriate dataset selected. Follow the link (in title of this page) and try it.

(Back on uexplore home page)

Click on sf32000x to Start

Descriptions with links from the uexplore home page.

The sf32000x Directory (As seen by uexplore)

Subdirectories & files with upcased first letters are shown first.

Index.html, Readme.html and, of course, Datasets.html are required reading (browsing).

Files are in alphabetical (not logical) order.
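The ordering described here (capitalized names first, then everything else alphabetically) is what a plain character-code sort produces, since uppercase letters sort before lowercase ones. The Python sketch below shows this with a made-up listing; whether uexplore actually sorts this way internally is an assumption, and every name other than the three .html files is hypothetical.

```python
# Hypothetical directory entries; only the three .html names come from the slides.
names = [
    "datadict",
    "Readme.html",
    "moi.sas7bdat",
    "Datasets.html",
    "Index.html",
    "Tools",
]

# Python compares strings by character code, so "D", "I", "R", "T"
# all sort ahead of any lowercase letter.
listing = sorted(names)
for name in listing:
    print(name)
```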

(sf32000x) Readme.html

The Datasets.html Page (for the sf32000x filetype)

Details Page -- sf32000x.moi

Lots of info here. Most important is perhaps the Key variables link for variable SumLev (geographic summary level).

Key Variables Report for SumLev (sf32000x.moi)

Filters Based on SumLev

Var      Operator   Value         Results
SumLev   Equals     040           State Level Summary (only 1 row selected)
SumLev   Equals     140           Census Tract Summaries – 1320 rows selected.
SumLev   In List    040:050:160   1 State level, 115 County level & 972 Place level rows.

Sample Query 2: What We Want
(Repeated in case you forgot)

We want data from the 2000 Census, Summary File 3 regarding poverty in Missouri cities and counties.

We want the number and the % of poor persons, as well as the median household income.

We only want the data for cities of at least 5000 persons, but for all counties and for the state.

We want output as an HTML file sorted by the type of geography (state, county, city) and then by descending poverty rate.

A Complex Filter

The Filter Explained

There are 2 logical parts to the filter:
1. SumLev In ('040','050')
2. SumLev = '160' and TotPop >= 5000

The parentheses checkboxes are used to group the 2nd & 3rd lines. The and between lines 2 and 3 is executed before the or between lines 1 and 2.
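To see why the grouping matters, here is a small Python sketch of the same three-line filter, next to a deliberately mis-grouped version. The sample rows are hypothetical; the variable names SumLev and TotPop and the code values come from the slide.

```python
def keep(row):
    # Line 1:            SumLev In ('040', '050')
    # Lines 2-3 grouped: SumLev = '160' and TotPop >= 5000
    return row["SumLev"] in ("040", "050") or (
        row["SumLev"] == "160" and row["TotPop"] >= 5000
    )


def keep_misgrouped(row):
    # What the filter would mean if lines 1 and 2 were grouped instead:
    # the population test would then apply to states and counties too.
    return (
        row["SumLev"] in ("040", "050") or row["SumLev"] == "160"
    ) and row["TotPop"] >= 5000


# Hypothetical rows for illustration.
small_county = {"SumLev": "050", "TotPop": 2400}
small_city = {"SumLev": "160", "TotPop": 800}
large_city = {"SumLev": "160", "TotPop": 85000}
```

With the intended grouping the small county is kept and only the small city is dropped; with the wrong grouping the small county would be lost as well.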

The Filter Explained, cont.

The SAS© code generated by these menu choices:

where sumlev in ('040','050') or (sumlev='160' and totpop >=5000);

The “in” operator (called “In List” on the Operator pull-down menu) allows specifying that the value of a variable should be one of a list of values. Those values are entered separated by colons (:) in the Value column of the filter specs form.
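As a rough illustration of how a colon-separated Value entry maps onto an “in” condition, here is a Python sketch. The helper names are hypothetical and this is not Dexter's actual code; it only mimics the shape of the where clause shown on the slide.

```python
def parse_in_list(value):
    """Split a colon-separated Value entry into the individual codes."""
    return [code.strip() for code in value.split(":") if code.strip()]


def to_where_fragment(variable, value):
    """Build a SAS-style "in" condition for a character variable (illustrative only)."""
    quoted = ",".join("'" + code + "'" for code in parse_in_list(value))
    return variable + " in (" + quoted + ")"


fragment = to_where_fragment("sumlev", "040:050")
print(fragment)
```

The codes are quoted because SumLev is a character variable, so '040' keeps its leading zero.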

Completing the Query: Parts 3 & 4

HTML Output

We see that Pemiscot has the highest poverty rate of any county. How do we know this?

Why don’t we see any data for cities?

Exercise

Access the same dataset as in the example: sf32000x.moi

Select census tract summaries in Greene co…

… with a poverty rate of at least 10%.

Keep all identifiers necessary to identify the tract, and all variables related to poverty.

Generate a csv file and load it into a spreadsheet (probably Excel).

Exercise 2

Repeat the previous exercise except do it for all counties (instead of census tracts) in the states of Arkansas and Oklahoma.

Sort the results by descending poverty rate and generate output in pdf format as well as a csv file.

Hint: A good place to start is the Datasets.html page.

Begin Summary File Processing Section

Advanced section that can be skipped by many users. But note that AFF can be used instead to access most such data.

Accessing Summary (Tape) Files

The Census Bureau creates very large table-based summary files for each census since 1970.

The MCDC has a good collection of such files for ’80, a few for ’90 and many for 2k.

Filetype names begin “stf” or “sf” (the “t” was dropped in 2000).

E.g. stf803 for 1980 Summary Tape File 3, sf12000 for 2000 Summary File 1.

Follow links off Census section of uexplore home page.

Getting Started with S(T)Fs

If you are new to using Census data and/or summary files we highly recommend that you use the American FactFinder application to become familiar with these files.

From the AFF page:

Under “Getting Detailed Data” follow the links to “About the Data” and then to “Data Sets”.

Experiment/practice locating and extracting tables for geographic areas of interest.

Use the Census 2000 Summary File 3 (SF3) data set and specify you want “Detailed Tables”.

Make use of the “by subject” & “by keyword” tabs to select tables.

Exercise – Use AFF to Access 2000 Summary File 3

With Census 2000-SF3 chosen, use the Select Geography step to choose the state of Missouri and Boone county.

Under Select Tables use “by subject” tab and search for tables related to poverty.

Find a table that has data on # persons below 50% of poverty level.

Display the relevant tables for the 2 geographic areas selected.

When To Use Uexplore/Dexter Instead

In most cases, for most users, AFF will be the better, easier-to-use tool for accessing SF’s.

Uex/Dex is useful for users who know what they are looking for and may want more control over filtering or output format.

The geographic summary unit may not be available under AFF (e.g. RPC’s in Mo.)

The SF may not be available under AFF (e.g. 1980 STF3).

Summary Files

Set of 4 SF’s for each decade.

Summary Files 1 & 2 based on short form, 3 & 4 based on long form.

Summary Files 1 and 3 most widely used, especially 3.

Within numbered SF’s there are lettered subfiles, e.g. Summary File 3B or Summary File 1C. These are based on geographic coverage. C files, for example, are national files, while A files are for individual states.

MCDC SF Datasets

These are “fat” files with lots of variables. Rows correspond to geographic entities.

Character-type variables ID the entity being summarized, numeric variables are primarily the tabulated summary items.

Metadata standards vary over time. Data dictionaries stored in archive.

SF Tables and Variables

A table consists of multiple cells of data.

Each cell is named <T#>i<cell#>, where
– <T#> is the table name, usually a letter & number.
– i is literally the letter i, standing for “item”.
– <cell#> is the sequential cell # within the table.

For example in sf32000 table P5 has 7 cells. The variables are named p5i1, p5i2,…p5i7.
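The naming rule is mechanical enough to express in a few lines of Python. The helper name cell_names below is hypothetical; the rule itself (lowercased table name, the letter i, then the cell number) is the one stated on the slide.

```python
def cell_names(table, n_cells):
    """Return the variable names for every cell of a table, e.g. p5i1..p5i7."""
    # <T#> lowercased, then the literal letter "i", then the sequential cell number.
    return [f"{table.lower()}i{cell}" for cell in range(1, n_cells + 1)]


p5 = cell_names("P5", 7)
print(p5)
```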

Table Types

In 1980 there were just plain tables, without special prefixes. We used “t” as the prefix to name the table cells, e.g. t12i1 was the name of the first cell in Table 12.

In 1990 there were P and H tables.

In 2000 there are P, H, PCT and HCT tables. (See notes).

Required Reading: Tech Doc

Trying to access a Summary File without first looking at the technical doc is like going on a trip without a map. (Only works if you’ve been there before.)

American FactFinder is the best place to go to find out what tables have what data – if the file you want is included in AFF.

A datadict file in the mcdc data archive or even a paper copy are other options.

What Tables, What Geography

When accessing a Summary File dataset you should know ahead of time what tables you want. (AFF may help).

You need to know what geographic entities are of interest. Many of the SF datasets will have multiple geographic levels (e.g. state, county, place) that you need to specify.

A Summary Level Sequence Chart can be very helpful.

Access Summary File 3, 2000 Census

Start at uexplore home page and click on Census/2000.

Click on the sf32000 filetype link.

Check out the SumLevs.html page.

Check out the Readme.html page.

On the Readme page look at the Uexplore Access link.

This is hardly typical, having this much metadata & guidance. We wish it were.

Excerpt From uexplore Section of Readme.html

Sf32000 Query Specs

We want to extract data on the number and percentage of minority households at the census tract level for St. Louis City and County.

Ignore any tracts with fewer than 100 total households.

Want data in an Excel spreadsheet.

Hard part is knowing what minority means.

Note: St. Louis City (29510) is also a county (equivalent).

Questions for the Query

What dataset? (We assume we know the directory/filetype.)

What output format?

What geographic areas within the dataset – how to create the filter.

What variables?

What post-processing in Excel will we have to do?

The sf32000 Datasets.html page

•Which dataset do we want?

We Want the moph Dataset Because…

The universe is Missouri as needed.

It contains the P and H tables (not PCT or HCT).

It has “All SF3A levels” of geography, including census tract as required.

But now we need to see the details.

Note the size of the dataset – 1.3 Gigabytes!

The sf32000.moph Details Page


What We Learn from Details Page

From the Key variables reports for SumLev and county we know we want the 140 summary level for counties 29189 and 29510.

We get links to the data dictionary files with variable names & labels.

We get a Usage Note explaining the table-cell variable naming conventions.

A link to the Summary Level Sequence chart.


Sample of a Summary Level Sequence Chart (Partial)


Specify the Filter

First row selects census tract level summaries.

Second row selects the two counties of interest.
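
The logic of the two filter rows can be sketched outside Dexter. This is only an illustration in Python (Dexter itself is written in SAS): the sample rows, the lowercase key names `sumlev`, `county`, and `name`, and the `tract_filter` helper are all assumptions made up for the example.

```python
# Hypothetical rows standing in for records of the moph dataset.
rows = [
    {"sumlev": "050", "county": "29189", "name": "county-level row"},
    {"sumlev": "140", "county": "29189", "name": "tract A"},
    {"sumlev": "140", "county": "29510", "name": "tract B"},
    {"sumlev": "140", "county": "29019", "name": "tract in another county"},
]

# The two county codes named on the Details Page slide.
COUNTIES_OF_INTEREST = {"29189", "29510"}


def tract_filter(row):
    """Keep a row only if both filter rows are satisfied."""
    is_tract = row["sumlev"] == "140"                # first row: census tract level
    in_area = row["county"] in COUNTIES_OF_INTEREST  # second row: the two counties
    return is_tract and in_area


selected = [row for row in rows if tract_filter(row)]
```

Only "tract A" and "tract B" survive: the county-level row fails the first condition and the tract in another county fails the second.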


Choose Columns/Tables


Selecting Tables (instead of variables)

Only for a small number of special filetypes. Mostly SF filetypes.

You choose table H10 and the program translates this into selecting the columns (variables) named h10i1, h10i2, … h10i17.

Note the scrollbar at right side of Tables select list. You may have to scroll horizontally to see this.

Feature was added late in 2004.
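
The table-to-variables translation follows a simple naming pattern, sketched here in Python. The helper name `table_to_variables` is hypothetical and this is not Dexter's actual code; it only mirrors the h10i1 … h10i17 pattern described above.

```python
def table_to_variables(table, n_cells):
    """Expand a table ID such as 'H10' into its cell variable names.

    Assumes the pattern <table id in lowercase> + 'i' + <cell number>,
    as in h10i1, h10i2, ..., h10i17.
    """
    prefix = table.lower()
    return [f"{prefix}i{cell}" for cell in range(1, n_cells + 1)]


# Table H10 has 17 data cells.
h10_columns = table_to_variables("H10", 17)
```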


Waiting for Results

We get to see this for about a whole minute. It takes a while for Dexter to slog thru all that data. (A good reason to avoid sf32000 datasets when sf32000x sets will do.)

Wait for it to finish.


View Results: Summary Log

A brief summary of what you asked for and what you got.

286 rows (tracts) with 20 variables (columns).

Note the upcase functions in the filter. All character values entered are upcased and compared with upcased database values. Of course, when the characters are all digits it doesn’t matter.
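
The effect of upcasing both sides of a comparison can be sketched as follows. This is a Python illustration only; the function name `matches` is made up, and Dexter applies the SAS upcase function rather than this code.

```python
def matches(entered, stored):
    """Compare a user-entered value with a database value, ignoring case."""
    return entered.upper() == stored.upper()


# Mixed-case text matches regardless of how it was typed.
text_match = matches("st. louis", "St. Louis")

# All-digit values are unchanged by upcasing, so the step makes no difference.
code_match = matches("29510", "29510")
```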


Ready to Access Real Output

Click on Delimited File to access the generated csv file.

The (temporary) URL for the csv file is (for this example):
http://mcdc2.missouri.edu/tmpscratch/11JUL05_00021.dexter/xtract.csv

This temporary directory and file live for 2 days. You can copy and paste the URL into an e-mail note and send it to a colleague or client. Makes it easy to share queries.


Specify Variables by Typing Names

Not generally recommended because it is error-prone, but useful for short lists.

Useful in cases like this one, where you would otherwise have to select an entire table but all you really want are a few cells.

You have to type the ID variables as well as the numerics. When Dexter detects you typed something, it ignores any selections from the select lists.
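
The override rule (typed names win over the select lists) can be sketched like this. It is a Python illustration under stated assumptions: `columns_to_keep` and its parameters are hypothetical names, typed names are assumed to be separated by spaces, and this is not how Dexter is actually implemented.

```python
def columns_to_keep(typed_names, selected_ids, selected_numerics):
    """Return the variables to extract.

    If anything was typed, use only the typed names and ignore the
    select lists entirely; otherwise fall back to the list selections.
    """
    typed = typed_names.split()
    if typed:
        return typed
    return list(selected_ids) + list(selected_numerics)


# Typed names present: the geocode selection in the list is ignored,
# so ID variables must be typed too if they are wanted.
typed_case = columns_to_keep("county tract h10i1 h10i3", ["geocode"], [])

# Nothing typed: the list selections are used.
list_case = columns_to_keep("", ["geocode"], ["h10i1", "h10i2"])
```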


Entering Table Cell Variables

Nothing is selected from Tables list & would not matter if it were.

You can only do this if you understand the table-cell naming conventions. Instead of saving all 17 data cells in table H10, the program will now save only the 3 specified cells.

The selection of geocode on Identifiers list is irrelevant.


Typical Result of Clicking on Delimited File


What Are “Minority” Households?

A household is “minority” if the head of the household (HH) is in a minority category.

Minority for 2000 means you are either:
– Hispanic or Latino, or
– Not white (including multi-racial, even if one of those races is white).

So h10i1 - h10i3 is the formula to derive minority households. We do not need h10i10 to derive it.
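
The remaining post-processing (derive minority households, compute the percentage, drop tracts with fewer than 100 total households) would be done in Excel in this walkthrough. As a sketch of the same arithmetic in Python: the sample CSV text, its counts, the tract labels, and the `minority_summary` helper are all invented for illustration, and h10i1 is treated as the total household count, as the formula above implies.

```python
import csv
import io

# Made-up extract with the ID column and the two cells the formula needs.
SAMPLE_CSV = """geocode,h10i1,h10i3
tract A,1200,900
tract B,80,10
tract C,500,125
"""


def minority_summary(csv_text, min_households=100):
    """Compute minority households and percent minority per tract."""
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        total = int(row["h10i1"])
        if total < min_households:
            continue  # ignore tracts with fewer than 100 total households
        minority = total - int(row["h10i3"])  # h10i1 - h10i3
        results.append({
            "geocode": row["geocode"],
            "households": total,
            "minority": minority,
            "pct_minority": round(100 * minority / total, 1),
        })
    return results


summary = minority_summary(SAMPLE_CSV)
```

With the sample data, tract B is dropped (80 households), tract A has 300 minority households (25.0%), and tract C has 375 (75.0%).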


End Summary File Access Section


End of Show

See related tutorials at:

http://mcdc2.missouri.edu/tutorials/dexter2.ppt

http://mcdc2.missouri.edu/tutorials/mcdc_data_archive.ppt