
Food Access and Food Choice: Applications for Food Deserts

Final Report

Research Innovation and Development Grants in Economics (RIDGE)

Grant # 59-5000-0-0014

Gayaneh Kyureghian

Research Assistant Professor

Department of Food Science and Technology

The Food Processing Center

University of Nebraska-Lincoln

Rodolfo M. Nayga, Jr.

Professor and Tyson Endowed Chair

Department of Agricultural Economics and Agribusiness

University of Arkansas,

Adjunct Researcher

Norwegian Agricultural Economics Research Institute

Azzeddine Azzam

Professor

Department of Agricultural Economics

University of Nebraska-Lincoln

Parts of this report have been previously published in manuscripts (Kyureghian and Nayga 2012(a), Kyureghian, Nayga and

Bhattacharya 2012, Kyureghian and Nayga 2012(b)) with detailed results of our project.

We thank Ms. Suparna Bhattacharya for research assistance.


I. Introduction

Poor food choices have been shown to contribute to the rise of major chronic diseases, including

overweight and obesity (Centers for Disease Control and Prevention (CDC)). Consequently, the

Dietary Guidelines for Americans, 2010, emphasizes the need to shift food intake patterns to a

more plant-based diet that emphasizes nutritious food, such as fruits and vegetables. Despite

these efforts, only 42% and less than 60% of Americans meet the recommendations for fruit

and vegetable consumption, respectively. In academic and policy circles, as well as in the public

eye, the local food environment has been associated with food choices and diet-related health

consequences. Limited food access is considered especially worrisome for underserved,

predominantly low-income areas, which are believed to be disproportionately subject to health

and income disparities (Bitler and Haider 2011). The Food, Conservation, and Energy Act of

2008 defines a food desert as “an area in the United States with limited access to affordable and nutritious food,

particularly such an area composed of predominantly lower-income neighborhoods and

communities” (Sec. 7527. Study and Report on Food Deserts, The Food, Conservation, and

Energy Act of 2008, The United States Department of Agriculture, June 18, 2008).

In February 2010, the Obama Administration proposed a $400 million Healthy Food

Financing Initiative (H.R. 3525: Healthy Food Financing Initiative) that would eradicate food

deserts by improving food access. Several states have launched policy efforts to increase access

to healthy food.

The concern in policy circles is that there may be insufficient availability and affordability of

healthy food in these areas, which may cause poor dietary choices. The literature in various

disciplines of social science, marketing and nutrition has addressed the issue of food access and

choice from distinct, albeit overlapping angles (Larson, Story and Nelson 2009, Beaulac,

Kristjansson and Cummins 2009, Blanchard and Lyson 2002, Sharkey, Horel and Dean 2010,

Michimi and Wimberly 2010, Staus 2009). The empirical evidence from these disciplines lacks

consensus on whether food deserts exist and why.

Findings from the studies on food deserts are quite diverse. For instance, Blanchard and

Lyson (2002) found that residents of food deserts (considered non-metropolitan areas where

people travel longer distances) are 23.4% less likely to consume the recommended level of fruits

and vegetables (F&V) compared to those in non-food desert areas. Rose and Richards (2004)


examined the effects of limited access to supermarkets on the amount of F&V purchases. They

measured “access” using three variables: distance to store, travel time to store, and car

ownership. They concluded that limited access was negatively associated with the purchases,

although the effect on fruits was not statistically significant. Michimi and Wimberly (2010) also

found that the odds of meeting the dietary recommendations concerning F&V consumption

decrease as distance to large and medium-size supermarkets increases in metropolitan areas,

but found no similar association in non-metropolitan areas for any size of supermarket.

Interestingly, in metropolitan areas they did not find a significant relationship for large, medium

and small supermarkets combined. Sharkey, Horel and Dean (2010), on the other hand, showed

that underserved or low-vehicle-ownership neighborhoods actually had better spatial access to a good

variety of F&V in six Texas rural counties. Pearson et al. (2005) found no evidence of

associations between the distance to the nearest supermarket and the difficulty of grocery

shopping with either fruit or vegetable consumption. Bodor et al. (2007) considered not only

distance to a store and the store concentration ratio, but also in-store food availability and found

that the availability of fresh vegetables in the vicinity was positively related to vegetable intake,

while fruit consumption was not associated with fresh fruit availability.

In a review of literature on disparities in access to healthy food, Larson, Story and Nelson

(2009) reported that although the majority of studies suggest a direct relationship between the

presence of supermarkets and meeting the dietary guidelines for F&V, especially for African

American adults, no such evidence was found for the youth. Likewise, in a systematic review of

food deserts, Beaulac, Kristjansson and Cummins (2009) reported mixed results concerning the

availability and quality of healthy foods in disadvantaged areas. A comprehensive review and

analysis of the empirical literature on food deserts can be found in Bitler and Haider (2011).

The gaps in the literature on whether food deserts exist appear to be related to the

inconclusive evidence on the linkage between accessibility and food choice due to data

limitations and methodological weaknesses. The data requirements to determine food access are

many. One of these limitations is the variety of forms and categories of available foods such as

produce, dry grocery, dairy, etc., in fresh, canned, frozen, juiced or dried form, in different sizes

of packages, etc. Since there is no consensus in the literature concerning a specific food or a food

group indicative of diet quality, data with reasonable coverage of a fairly representative group of

foods is essential. The food desert literature typically focuses on fresh fruits and vegetables,


perhaps due to short shelf life (Blanchard and Lyson 2002, Sharkey, Horel and Dean 2010,

Michimi and Wimberly 2010, etc.).

Another data requirement is adequate coverage of food retail sources, such as

supermarkets, convenience and grocery stores, restaurants and other away-from-home sources,

farmers' markets, pick-your-own farms, etc. The focus of food desert literature has been on

supermarkets as the retail outlets with an adequate assortment of healthy foods and affordable

prices (Report to Congress 2009). This raises issues of inadequate representation of the retail

access environment, particularly when it concerns food away from home availability (Bitler and

Haider 2011).

In addition to the issues mentioned above, a common shortcoming of the primary data used

in food desert research is the inadequate geographic coverage, typically at county or community

level. While some secondary level data sets ameliorate this problem, they are plagued by issues

raised above nonetheless.

A widely criticized issue is the choice of the measure of food access in the literature. The

distance to the nearest store(s) or the density of stores in the market area are the most common

measures adopted in the literature (Hellerstein, Neumark and McInerney 2008, Bitler and Haider

2011). The latter raises the issue of the choice of the appropriate geographic area as the relevant

market, such as the census tract, zip code, cluster of zip codes, county, state, etc. (Hellerstein,

Neumark and McInerney 2008, Bitler and Haider 2011). The concept of the food desert also

hinges upon whether it is an absolute (no food retail outlet in the area of reference) or a relative

(fewer food retail outlets than in other areas) concept. The latter in turn raises the question of

‘adequateness’ or ‘sufficiency’ of food availability. There are several different definitions of

food deserts, such as a distance of 10 miles or more to the nearest grocery store in rural areas,

and 1 mile or more in urban areas, etc. Several other multidimensional definitions (e.g. by

USDA, CDC, etc.) take into account not only the distance, but also the income level, commuting

time, vehicle ownership, etc. in the reference area. The choice of the specific definition depends

on the research question or purpose. For example, while the USDA definition is designed to

capture the linkage between food availability and food choice, the CDC definition is more

concerned with the linkage between food access and health consequences, such as overweight and

obesity rates in the area. The different definitions mentioned above do not always overlap (Liese,


Battersby and Bell 2012), thereby creating variation in the evidence due to the specific research

objectives and, therefore, the choice of food desert definition.

Overall, it appears that the focus of much of the previous research is on supply side factors,

creating an implicit underlying assumption that food deserts are a supply-side market failure and

therefore motivating policy intervention to correct such market inefficiencies. But the

contradictory empirical evidence in the previous literature about such complex phenomena as food

deserts highlights the need for a more comprehensive approach. In this research project we

analyze and interpret factors affecting the associations between food access, affordability and

food choices. We consider both supply- and demand-side factors that may give rise to or, at the

least, compound the adverse dietary and health effects associated with food deserts. This

research steps in to fill the aforementioned gaps in the literature. We focus on several staple

healthy and unhealthy food groups mentioned in the literature, with an emphasis on fruits and

vegetables. The food access measure in this project is the food store density at the county level.

The research questions we seek to answer are (i) whether the availability of different types of

food retail outlets affects the probability of patronizing that particular type of outlet for

purchasing fruits and vegetables; (ii) whether food access or affordability or a combination

thereof plays a major role in purchasing fruits and vegetables; (iii) whether household-level

heterogeneity confounds the true effects of increased access to supermarkets; and (iv) whether

the demands for 10 major food groups (healthy and unhealthy) are elastic or responsive to a

proportional increase in supermarket availability. To explore these hypotheses we utilize

national-level purchase data on all kinds of at home food purchases (Nielsen HomeScan Panel

data), which overcome most of the primary-data shortcomings mentioned above. We use

food availability data from the Census Bureau that cover food-at-home and food-away-from-home

sources to depict as complete a picture of the retail environment as possible. The results of this

research will help to design appropriate policy interventions to address heterogeneous strata

disproportionately affected by inadequate food access.

The rest of this report is organized as follows. The data sources and issues are discussed in

detail in Section II. In Section III we formally test the linkage between the availability of

supermarkets and the probability of patronizing supermarkets to purchase F&Vs.


II. Data

Data for this project were obtained from four sources: the Nielsen HomeScan; County Business

Patterns, U.S. Census Bureau; Population Estimates, U.S. Census Bureau; and Standard

Reference 24, National Nutrient Database, USDA. We draw on 2005 and 2007 County Business

Patterns and Population Estimates, U.S. Census Bureau, to delineate the food retail environment

and the population/area estimates for the geographical units in our analysis. The food

accessibility data include the number of establishments of the following store formats:

supermarkets and other grocery stores (North American Industry Classification System (NAICS)

code 44511), price clubs (NAICS code 452910), convenience stores (NAICS code 44512),

specialty food stores (NAICS code 4452), full-service restaurants (NAICS code 7221) and

limited-service eating places (NAICS code 7222) for approximately 3,153 counties¹. In selecting

the above food retail sectors, we made a point to include all the food retail channels where people

obtain food, a shortcoming in the previous literature (see A Report to Congress, Economic

Research Service, USDA, 2009). These variables, adjusted for MSA or county level population

and area, obtained from Population Estimates, U.S. Census Bureau, were used to create the retail

store and restaurant densities per 1000 households per 100 square miles (hereafter referred to as

the density variables) for each MSA/county for the reference year.
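The construction of a density variable can be sketched as follows. This is a minimal illustration only: the report does not spell out the exact normalization, so the function name `store_density` and the choice to divide the establishment count first by households (in thousands) and then by area (in hundreds of square miles) are assumptions.

```python
def store_density(n_stores: int, households: float, area_sq_miles: float) -> float:
    """Establishments per 1,000 households per 100 square miles.

    ASSUMPTION: the count is normalized by households first and by land area
    second; the report does not state the exact formula.
    """
    if households <= 0 or area_sq_miles <= 0:
        raise ValueError("households and area must be positive")
    per_1000_households = n_stores / (households / 1000.0)
    return per_1000_households / (area_sq_miles / 100.0)


# Hypothetical county: 30 supermarkets, 20,000 households, 500 square miles.
example_density = store_density(30, 20_000, 500)
```

Under this normalization the hypothetical county has 1.5 supermarkets per 1,000 households, spread over five 100-square-mile units, for a density of 0.3.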

There was a high proportion of missing data in the density variables. About 52% of all

counties had all five density variables reported; therefore ignoring the counties with missing

values would drastically reduce the sample size and possibly bias the results. To ameliorate this

problem, we resorted to using missing data imputation methods. We utilized two types of

imputations: last-value-dependent imputation and Markov-Chain Monte Carlo (MCMC) multiple

imputations (Xu et al., 2008; Kyureghian et al., 2011). In the case of the last-value-dependent

imputation, we obtained the time-series data for each of the NAICS codes mentioned above

starting from 1998², iteratively estimated a sequence of least squares regressions for each

isolated NAICS industry, then used the estimated parameters and values imputed in the previous

iteration to impute or fill in the data for counties with missing data points for the reference year.

While this method capitalizes on the past values of the same variable and is logically appealing,

1 We refer to these food retail outlets as Supermarkets, Clubs, Convenience, Specialty, FS and QS, respectively.

2 Data prior to 1998 had a different industry classification system – SIC. Although U.S. Census Bureau does provide

a matching of 2002 NAICS to 1987 SIC for retail trade, the matching for the five industries was not unambiguous,

and therefore were not considered appropriate for this imputation step.


it has two major drawbacks: it disregards the ‘cross-sectional’ interdependence between food

retail outlets at each point of time, and leaves a substantial portion of the missing data not filled

in due to the absence of past data for the particular county.

The MCMC multiple imputation method takes pseudorandom draws from the joint

distribution of all five NAICS numbers for 2007 until it forms a Markov-Chain that converges to

a target distribution. The MCMC method imputed or filled in all the missing values thereby

motivating our choice of this method of imputation³.

We align the information on food access with actual household purchase data from the

Nielsen panel from the same areas or counties. Nielsen, one of the largest commercial suppliers of

scanner data, started collecting in-home household scanner data in 1989. The panel members,

selected from all 48 contiguous states, are supplied with handheld scanners to scan Universal

Product Codes (UPCs) of all purchases and to upload this information on a weekly basis. The

data are categorized in five datasets by food type: frozen foods, produce and meat products with

UPCs, random-weight products without a UPC, dairy products, dry grocery products, and

alcohol and cigarettes. Each record in the data set contains a household identification number,

purchase date, a set of variables that combined provide a complete description of each product

(product type variables), quantity purchased, price, etc. The dataset contains detailed information

about both panel demographics (household size and composition, age, education attainment,

employment status, race and ethnicity of male and female household heads, income, marital

status, area of residence, etc.) and purchase information (price, promotion, purchase date, store

type, etc.)⁴.

The data concerning the store type are organized into grocery, drug, mass merchandiser,

supercenters, clubs, convenience and other stores. Grocery stores are stores selling food and

non-food items, including dry grocery, canned goods and perishable items, with annual sales volume

3 The MCMC methods rely on the assumption that the data are missing at random (MAR): the occurrence of

missingness does not depend on the values of missing data. The County Business Patterns, U.S. Census Bureau,

explains missingness as non-response by corporations. Based on the facts that the reported data on business

establishments are aggregated geographically by counties, and that the unit of the source of missingness

(corporations) and the unit of the reported data (counties) are distinct and completely independent, we assume that

MAR is satisfied.

4 To identify observations corresponding to different food purchases we follow the procedure for the Quarterly

Food-at-Home Price Database by ERS, USDA. We gratefully acknowledge Dr. Jessica Todd’s help with SAS codes.


of $1M or more⁵. A mass merchandiser is a retail outlet that primarily sells nonfood items but

does have some limited nonperishable food items available. A supercenter is an expanded mass

merchandiser that also sells a full selection of grocery items. A warehouse club is a membership

store that sells packaged and bulk food and nonfood items. Convenience stores are small format

stores selling high convenience items such as beverages, snacks and limited grocery items.

Examples are conventional convenience and military stores, gas stations and kiosks. To get some

sense of how the store breakdown is constructed in the Nielsen classification system, Safeway is

classified as a grocery store, Rite Aid as a drug store, Wal-Mart as a supercenter, Target as a

mass merchandiser, Costco as a club store, and Seven Eleven as a convenience store (Broda,

Leibtag and Weinstein 2009).

The socio-demographic variables in the model include race/ethnicity, marital status,

education, employment, price, and Poverty Income Ratio (PIR). PIR is the ratio of household

income to poverty threshold issued by the U.S. Department of Health and Human Services for

each year. Households with PIR less than 1.35, from 1.35 to 1.85, from 1.85 to 2.50, from 2.50 to

4.00 and greater than 4.00 are combined in income groups ‘Income 1’, ‘Income 2’, ‘Income 3’,

‘Income 4’ and ‘Income 5’, respectively.
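The income grouping above amounts to a simple lookup on the PIR cutoffs. The sketch below is illustrative; the function name `income_group` is hypothetical, and because the report does not say which group a PIR exactly equal to a cutoff belongs to, the code assumes each lower bound is inclusive.

```python
from bisect import bisect_right

# PIR cutoffs separating 'Income 1' through 'Income 5', as listed in the text.
PIR_CUTOFFS = (1.35, 1.85, 2.50, 4.00)


def income_group(pir: float) -> str:
    """Map a Poverty Income Ratio to its income group label.

    ASSUMPTION: a PIR exactly equal to a cutoff falls in the higher group
    (lower bounds inclusive); the report leaves boundary handling unstated.
    """
    if pir < 0:
        raise ValueError("PIR cannot be negative")
    # bisect_right counts how many cutoffs are <= pir, giving 0..4.
    return f"Income {bisect_right(PIR_CUTOFFS, pir) + 1}"


labels = [income_group(x) for x in (1.0, 1.35, 2.0, 3.0, 4.2)]
```

With these assumptions, the five example PIR values map to 'Income 1' through 'Income 5' in order.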

5 TD Retail Trade Channel and Sub-Channel Overview.doc, Copyright © 2011, The Nielsen Company. All rights

reserved. Rev. 02/2011.


III. Food Store Access, Availability, and Choice When Purchasing Fruits and Vegetables

The existing literature on households’ choice of stores does not typically account for both

household and store characteristics. Dong and Stewart (2012) use the wealth of literature on

product brand choice and reconcile it with their data on household characteristics to study the

effects of consumer heterogeneity and habits on store type choice. They model the household

choice of store types by using three groups of variables – store and market variables, such as

price, promotion and seasonality; past shopping variables, such as number of shopping occasions

by households in each type of store and loyalty renewal; and demographic variables. The authors

find that household demographics and past shopping behavior can both influence choice

behavior. Our aim in this study is to examine the effects of density of different types of food

stores on the likelihood that households will purchase F&V in a specific type of store. In other

words, we propose to estimate the probability of patronizing each store type to purchase F&V

conditional on the availability of both food-at-home and food-away-from-home retail establishments.

We hypothesize that the retail food environment, along with marketing, store-level and

socio-demographic factors, plays a significant role in explaining store type choice decisions when

purchasing F&V. We use a non-linear multinomial logit method to model this association. To

address the potential endogeneity of food retail density variables, we use the corresponding

lagged values for each county (Courtemanche and Carden 2011).

Model

Following the existing body of literature (e.g., Guadagni and Little 1983), we start by setting up the model of the household utility function. For household $h$, the utility of buying food in store type $j$ at shopping occasion $t$ is expressed as:

$$U_{hjt} = \alpha_j + \beta_j S_t + \gamma' X_{hjt} + \lambda L_{hjt} \qquad (1)$$

where $\alpha_j$ is a store-type-specific parameter, the variable $S_t$ accounts for seasonality in store choice, and $X_{hjt}$ are market- or store-level variables, such as price or promotion. The last term in the utility function, $L_{hjt}$, has been referred to as the household loyalty variable, referring perhaps to the subject matter of past research, brand loyalty (Guadagni and Little 1983, Fader and Lattin 1993, Dong and Stewart 2012). This is the term that captured cross-sectional household heterogeneity in the earlier literature, drawing from past purchasing behavior. Guadagni and Little (1983), for example, used a weighted average of past purchases, with a heavier weight placed on the most recent period. Fader and Lattin (1993) suggested an improvement of this model by using draws from a Dirichlet distribution, modified to capture the non-stationarity in choice behavior, to construct the loyalty term. By this assumption, the household choice probabilities $p_1, \dots, p_J$ from among $J$ store types follow a Dirichlet distribution with PDF

$$f(p_1, \dots, p_J) = \frac{\Gamma\left(\sum_{j=1}^{J} \theta_j\right)}{\prod_{j=1}^{J} \Gamma(\theta_j)} \prod_{j=1}^{J} p_j^{\theta_j - 1} \qquad (2)$$

where $\Gamma(\cdot)$ is the gamma function, $\sum_{j=1}^{J} p_j = 1$ and $\theta_j > 0$. The expected probability of store type $j$ is expressed by

$$E(p_j) = \frac{\theta_j}{\sum_{k=1}^{J} \theta_k} \qquad (3)$$

where $\theta_j$ are store-specific parameters. By this definition the household choice only depends on store-level factors.

Fader and Lattin (1993) suggested updating the expected probabilities by the number of choice occasions, thereby making the probabilities household-specific. Define $y_{hjs}$ to be equal to 1 if household $h$ chose store type $j$ at shopping occasion $s$, and 0 otherwise; then (3) is updated accordingly as:

$$L_{hjt} = E(p_{hjt}) = \frac{\theta_j + \sum_{s<t} y_{hjs}}{\sum_{k=1}^{J} \theta_k + \sum_{k=1}^{J} \sum_{s<t} y_{hks}} \qquad (4)$$

since the total number of shopping occasions is $\sum_{k=1}^{J} \sum_{s<t} y_{hks}$.

Dong and Stewart (2012) hypothesized that household characteristics are important factors in explaining choice behavior and modified (4) to capture the full spectrum of household characteristics:

$$L_{hjt} = E(p_{hjt}) = \frac{\theta_{hj} + \sum_{s<t} y_{hjs}}{\sum_{k=1}^{J} \theta_{hk} + \sum_{k=1}^{J} \sum_{s<t} y_{hks}} \qquad (5)$$

where $\theta_{hj} = \exp(\delta_j + \phi_j' D_h)$. Following Dong and Stewart (2012), this last term helps to capture the household heterogeneity better than the number of past purchase occasions. Here $\delta_j$ and $\phi_j$ are store-specific parameters, and $D_h$ is a vector of household demographic variables. We use this household choice mechanism, modified to include a vector of retail environment or store density variables $R_h$, with store-specific parameters $\psi_j$, along with demographic variables:

$$\theta_{hj} = \exp(\delta_j + \phi_j' D_h + \psi_j' R_h) \qquad (6)$$

There is no clear theoretical distinction nor is there any empirical evidence from the past literature as to where the density variables should appear: whether (a) in the household loyalty measure, and therefore enter the choice model (1) indirectly or implicitly through the household loyalty factor, or (b) in the model directly or explicitly, alongside the store-level variables of price and promotion. The choice depends in part on the research question and whether we believe that the dominating effect in determining the probability of a household patronizing a particular store type is the household loyalty to that type of store (affected by the retail environment) or the availability of that particular (and other) type of stores in the household residence area. In this case we rely upon the empirical model to guide the choice. Based on a set of fit statistics we opted for the implicit model in (a).

Fader and Lattin (1993) and Dong and Stewart (2012) paid special attention to incorporating non-stationarity in their models. We find that while non-stationarity is likely in modeling brand choice, it is not likely to be a problem in a store choice, let alone a store type choice. In fact, Dong and Stewart (2012) find no evidence of non-stationarity in their store choice model. Therefore, we proceed to defining a store type choice multinomial logit model as:

$$P_{hjt} = \frac{\exp(U_{hjt})}{\sum_{k=1}^{J} \exp(U_{hkt})} = \frac{\exp(\alpha_j + \beta_j S_t + \gamma' X_{hjt} + \lambda L_{hjt})}{\sum_{k=1}^{J} \exp(\alpha_k + \beta_k S_t + \gamma' X_{hkt} + \lambda L_{hkt})} \qquad (7)$$

where the second equality follows from (1).

We use a non-linear multinomial logit method to estimate model (7) (McFadden 1973, Chintagunta, Jain, and Vilcassim 1991, Fader, Lattin, and Little 1992).
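The two building blocks of the model described above, a loyalty term based on Dirichlet expected probabilities updated by past choice counts and a multinomial logit choice probability, can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' estimation code: the function names, the example Dirichlet parameters, and the example utilities are all hypothetical, and no parameters are estimated here.

```python
import math
from typing import List, Sequence


def loyalty(dirichlet_params: Sequence[float], past_counts: Sequence[int]) -> List[float]:
    """Expected store-type probabilities after updating Dirichlet parameters
    with a household's past choice counts (the Fader-Lattin style update)."""
    total = sum(dirichlet_params) + sum(past_counts)
    return [(a + n) / total for a, n in zip(dirichlet_params, past_counts)]


def mnl_probabilities(utilities: Sequence[float]) -> List[float]:
    """Multinomial logit choice probabilities for one shopping occasion."""
    shift = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - shift) for u in utilities]
    denom = sum(exps)
    return [e / denom for e in exps]


# Hypothetical household: three store types, prior parameters (1, 1, 2),
# and past choice counts (3, 0, 1).
loyalty_terms = loyalty([1.0, 1.0, 2.0], [3, 0, 1])

# Hypothetical utilities: an intercept per store type plus a loyalty effect.
utilities = [0.2 + 1.5 * l for l in loyalty_terms]
choice_probs = mnl_probabilities(utilities)
```

For the hypothetical household the loyalty terms are 0.5, 0.125 and 0.375, so the store type chosen most often in the past receives the largest loyalty weight and, other things equal, the highest predicted choice probability.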

Data and Summary Statistics

In this study, we use the 2006 Nielsen HomeScan household-level data to account for the

consumer behavior. The Nielsen consumer panel for 2006 consists of 37,794 households. When

purchasing F&V, grocery stores and supercenters are the most frequented types; grocery stores alone account for

approximately 72% of all purchases. Price clubs are the third most frequented store, but have the

highest price – 36 cents per 100 g or almost $1.63 per lb. Grocery and convenience stores are

next in line, with higher price offerings. The promotional status of the price is captured by

the promotion variable. Interestingly, drug stores offer a disproportionately high rate of discounts on

F&V (i.e., F&V are on sale 54% of the time). Grocery stores offer over a third of their produce at

a discounted price. Seasonality variables indicate higher levels of F&V sales towards the end of

the year and this is consistent across all store types. Market and store level variables are

presented in Table 1.

Table 1. Means and Standard Deviations of the Marketing Variables by Food Store Type.

Store Type    Shopping       Price          Promotion      Season 1       Season 2       Season 3
              Frequency      ($ / 100 g)                   (Jan-Mar)      (Apr-June)     (July-Sep)
Grocery       0.719 (0.45)   0.31 (0.31)    0.36 (0.48)    0.11 (0.32)    0.22 (0.41)    0.30 (0.46)
Drug          0.006 (0.08)   0.25 (0.22)    0.54 (0.50)    0.18 (0.38)    0.21 (0.41)    0.25 (0.43)
Mass          0.015 (0.12)   0.23 (0.68)    0.18 (0.38)    0.19 (0.39)    0.22 (0.41)    0.26 (0.44)
Supercenter   0.130 (0.34)   0.25 (0.21)    0.12 (0.32)    0.14 (0.35)    0.22 (0.41)    0.29 (0.45)
Club          0.072 (0.26)   0.36 (0.45)    0.06 (0.23)    0.14 (0.35)    0.24 (0.43)    0.31 (0.46)
Convenience   0.002 (0.05)   0.30 (0.23)    0.16 (0.36)    0.16 (0.36)    0.21 (0.41)    0.30 (0.46)
Other         0.055 (0.23)   0.26 (0.28)    0.16 (0.36)    0.15 (0.36)    0.23 (0.42)    0.30 (0.46)

Notes: Numbers in parentheses are standard deviations.

Household demographic variables indicate that approximately 67% of households were

married, with 9% and 3% of households with an African American or Asian American head,

respectively. 50% and 52% of female and male household heads are employed and 64% and 56%

of them have educational attainment of some college and higher, respectively. Approximately

24% of households have at least one child. The data also include information about household

income categories. We calculate a continuous measure of income – Poverty Income Ratio (PIR),

by assigning individual incomes equal to the midpoint of the category, and then adjusting to the

poverty thresholds by household size⁶. The names, descriptions, means and standard deviations

of the variables are reported in Table 2.

6 The poverty thresholds are issued in “The 2006 HHS Poverty Guideline” by the US Department of Health and

Human Services.
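The PIR calculation described above (category midpoint divided by the poverty threshold for the household's size) can be sketched as follows. The function names and the income bracket in the example are hypothetical. The guideline formula of $9,800 for one person plus $3,400 for each additional person is an assumption taken to match the 2006 HHS guideline for the 48 contiguous states and should be checked against the published figures before use.

```python
def poverty_guideline_2006(household_size: int) -> float:
    """Poverty guideline in dollars for a household of the given size.

    ASSUMPTION: $9,800 for one person plus $3,400 per additional person
    (48 contiguous states and D.C.); verify against the HHS publication.
    """
    if household_size < 1:
        raise ValueError("household size must be at least 1")
    return 9_800.0 + 3_400.0 * (household_size - 1)


def pir_from_income_category(income_low: float, income_high: float,
                             household_size: int) -> float:
    """Poverty Income Ratio using the midpoint of an income category."""
    midpoint = (income_low + income_high) / 2.0
    return midpoint / poverty_guideline_2006(household_size)


# Hypothetical household of four in a $35,000 to $45,000 income bracket.
example_pir = pir_from_income_category(35_000, 45_000, 4)
```

In this example the midpoint is $40,000 and the assumed guideline for four people is $20,000, giving a PIR of 2.0.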


Table 2. Description and Summary Statistics of Variables Used in Analysis.

Variable                                                                              Mean     Std

Food Environment Variables
Super_2005 (NAICS 44511): # of supermarkets and grocers per 100 sq miles              3.04     4.72
Clubs_2005 (NAICS 452910): # of price clubs per 100 sq miles                          1.31     0.64
Convenience_2005 (NAICS 44512): # of convenience stores per 100 sq miles              1.35     2.06
Specialty_2005 (NAICS 4452): # of specialty stores per 100 sq miles                   1.32     0.99
FS_2005 (NAICS 7221): # of full-service restaurants per 100 sq miles                  14.27    17.92
QS_2005 (NAICS 7222): # of limited-service eating places per 100 sq miles             21.92    24.26

Household variables
PIR                                                                                   4.20     2.84
Child: = 1 if at least 1 child under 18                                               0.24     0.43
Female Education: = 1 if female head education level is some college or more          0.64     0.48
Male Education: = 1 if male head education level is some college or more              0.56     0.50
Female Employment: = 1 if female head employed                                        0.50     0.50
Male Employment: = 1 if male head employed                                            0.52     0.50
Married: = 1 if household head married                                                0.67     0.47
Black: = 1 if household head is African American                                      0.09     0.28
Asian: = 1 if household head is Asian                                                 0.03     0.17

Results

The marginal effects of the variables from the estimation of (7) are presented in table 3. As

indicated above, we report the results from the model where food retail density variables enter

the utility function indirectly, through (6). In the interpretation of the estimates of the food retail

density variables, we are particularly interested in the marginal effect of supermarkets, which

include most of large grocery stores, mass merchandisers and supercenters (as defined by U.S.

Census Bureau), since they potentially offer the affordability, assortment and other

14

Table 3. Marginal Effects from the Non-Linear Multinomial Logit Model

Variable              Grocery      Drug         Mass         Supercenter  Clubs        Convenience  Other
Predicted Probability 0.601        0.016        0.022        0.177        0.106        0.007        0.070

Marketing Variables
Season 1              0.0863**     0.0003       0.0069**     -0.0006      -0.0029*     0.0003
                      (0.0012)     (0.0005)     (0.0006)     (0.0019)     (0.0015)     (0.0003)
Season 2              0.0529**     -0.0022**    0.0000       -0.0004      0.0077**     -0.0004
                      (0.0010)     (0.0005)     (0.0006)     (0.0017)     (0.0013)     (0.0002)
Season 3              0.0165**     -0.0035**    -0.0023**    -0.0045**    0.0041**     -0.0002
                      (0.0009)     (0.0004)     (0.0005)     (0.0015)     (0.0011)     (0.0002)
Price                 0.0570**     0.0018*      -0.0163**    -0.0135**    0.0755**     0.0021**     0.0467**
                      (0.0016)     (0.0008)     (0.0011)     (0.0027)     (0.0017)     (0.0003)     (0.0003)
Price Deal            0.1033**     0.0251**     0.0041**     -0.0262**    -0.0881**    -0.0001      0.0467**
                      (0.0010)     (0.0005)     (0.0005)     (0.0017)     (0.0016)     (0.0002)     (0.0003)

Household Demographic Variables
PIR                   0.0001**     0.0001**     -0.0001**    -0.0002**    0.0013**     -0.0001**    0.0000**
                      (0.0000)     (0.0000)     (0.0000)     (0.0000)     (0.0001)     (0.0000)     (0.0000)
Child                 0.0007**     -0.0014**    0.0026**     0.0044**     0.0062**     -0.0004**    0.0002**
                      (0.0001)     (0.0001)     (0.0001)     (0.0003)     (0.0003)     (0.0000)     (0.0000)
Female Education      0.0000**     -0.0001      -0.0006**    -0.0009**    0.0031**     0.0001       0.0000**
                      (0.0000)     (0.0001)     (0.0001)     (0.0001)     (0.0002)     (0.0000)     (0.0000)
Male Education        -0.0007**    0.0004**     -0.0007**    -0.0029**    0.0025**     -0.0001*     -0.0002**
                      (0.0001)     (0.0001)     (0.0001)     (0.0002)     (0.0002)     (0.0000)     (0.0000)
Female Employ         0.0006**     -0.0009**    0.0014**     0.0030**     0.0016**     -0.0001**    0.0002**
                      (0.0000)     (0.0001)     (0.0001)     (0.0002)     (0.0002)     (0.0000)     (0.0000)
Male Employ           0.0003       -0.0021      0.0013       0.0045       -0.0046      0.0008       0.0001
                      (0.0014)     (0.0051)     (0.0037)     (0.0082)     (0.0730)     (0.0006)     (0.0003)
Married               0.0002       0.0011       0.0002       0.0025       -0.0042      -0.0004      0.0001
                      (0.0014)     (0.0051)     (0.0037)     (0.0082)     (0.0730)     (0.0006)     (0.0003)
Black                 -0.0005**    -0.0001      0.0002       0.0002       -0.0047**    0.0003**     -0.0001**
                      (0.0000)     (0.0001)     (0.0001)     (0.0002)     (0.0003)     (0.0000)     (0.0000)
Asian                 -0.0012**    0.0021**     -0.0002      -0.0089**    0.0025**     -0.0009**    -0.0003**
                      (0.0001)     (0.0002)     (0.0002)     (0.0006)     (0.0004)     (0.0001)     (0.0001)

Food Environment Variables
Supermarkets          -0.0001**    0.0001**     -0.0001**    -0.0024**    -0.0003**    0.0000**     0.0000**
                      (0.0000)     (0.0000)     (0.0000)     (0.0001)     (0.0000)     (0.0000)     (0.0000)
Clubs                 0.0000**     0.0000       0.0000       0.0007**     -0.0001**    0.0000*      0.0000
                      (0.0000)     (0.0000)     (0.0000)     (0.0001)     (0.0000)     (0.0000)     (0.0000)
Convenience           -0.0001**    0.0002**     0.0002**     -0.0097**    -0.0006**    0.0000       0.0000**
                      (0.0000)     (0.0000)     (0.0000)     (0.0006)     (0.0001)     (0.0000)     (0.0000)
Specialty             -0.0002**    -0.0008      -0.0002**    -0.0003*     0.0012**     0.0003**     -0.0001**
                      (0.0000)     (0.0001)     (0.0001)     (0.0002)     (0.0001)     (0.0000)     (0.0000)
FS                    0.0000**     -0.0001**    0.0000**     -0.0012**    0.0002**     0.0000**     0.0000**
                      (0.0000)     (0.0000)     (0.0000)     (0.0001)     (0.0000)     (0.0000)     (0.0000)
QS                    0.0000       0.0001**     0.0000**     0.0006**     0.0000**     0.0000**     0.0000
                      (0.0000)     (0.0000)     (0.0000)     (0.0000)     (0.0000)     (0.0000)     (0.0000)

-2 Log Likelihood     1,983,620
AIC                   1,983,926
BIC                   1,983,760
Sample size           1,187,149

Notes: Marginal effects are calculated at the mean values of the variables. Standard errors are in parentheses. * indicates significance at the 5%

level, ** indicates significance at the 1% level.

characteristics often cited in the literature as necessary for improving diets (e.g., they offer a

varied range of F&V).

In general the predicted probabilities reported in table 3 preserve the order and are close in

magnitude to observed frequencies reported in table 2. As mentioned above, unlike the loyalty

measures in the previous literature, we incorporated the store access variables as well. The

results indicate that the number of supermarkets and grocery stores (NAICS 44511) and clubs

(NAICS 452910) have a negative impact on the probability of patronizing supercenters,

grocery stores and clubs for purchasing F&Vs. Supermarkets, in fact, have a negative impact on all

types of large retailers. These outcomes should not be interpreted as a decrease in the probability

of patronizing these types of stores as a result of an increase in the number of these stores. They

merely indicate that the probability of purchasing F&Vs from these types of stores is decreased.

A likely explanation is that supermarkets are typically a less expensive source of all kinds of

food in general, not only F&Vs (U.S. Department of Agriculture, Economic Research Service

(USDA ERS) 2009), therefore giving rise to possible substitution away from F&Vs to some

other food groups. This result means that the number of supermarkets, which includes most

large grocers, mass merchandisers and supercenters (as defined by the U.S. Census Bureau), is

negatively associated with the probability of patronizing these stores to purchase F&V. This is in

line with findings of Kyureghian, Nayga and Bhattacharya (2012), Kyureghian and Nayga

(2012), Beaulac, Kristjansson and Cummins (2009), and Michimi and Wimberly (2010) that

generally demonstrate mixed or no association between the availability of these stores and

purchase and consumption of F&Vs.
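To make the notion of a marginal effect concrete, the following is a minimal sketch for a standard multinomial logit, in which the effect of a covariate x on the probability of alternative j is P_j(β_j − Σ_k P_k β_k). This is an illustration only: it is not the non-linear specification estimated in this report, and the utilities, coefficients and function names are hypothetical.

```python
import math

def mnl_probabilities(utilities):
    """Choice probabilities from a standard multinomial logit (softmax)."""
    m = max(utilities)  # subtract the maximum for numerical stability
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

def mnl_marginal_effects(probs, betas):
    """dP_j/dx = P_j * (beta_j - sum_k P_k * beta_k) for a single covariate x."""
    avg_beta = sum(p * b for p, b in zip(probs, betas))
    return [p * (b - avg_beta) for p, b in zip(probs, betas)]

# Hypothetical utilities (evaluated at variable means) and coefficients
# for three store types; the first alternative is the normalized base.
utilities = [1.0, 0.2, -0.5]
betas = [0.0, 0.4, -0.3]

probs = mnl_probabilities(utilities)
effects = mnl_marginal_effects(probs, betas)
```

In this textbook form the marginal effects sum to zero across alternatives, because a gain in one store type's probability must come from the others.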

Unlike supermarkets and clubs, an increase in convenience stores translates into an increase

in the probability of patronizing convenience stores to purchase F&Vs. This indicates that

households highly value convenience, which may also explain the large negative impact

convenience stores have on the probability of shopping at supercenters: a 1-unit increase in the

number of convenience stores reduces the probability of shopping in supercenters by

approximately 1 percentage point. The number of specialty stores (bakery, produce and butcher

stores, etc.) has mixed effects on different types of stores – a negative impact on the probabilities

of shopping at mass merchandiser, supercenter and grocery stores, but impacts positively the

probability of purchasing F&Vs in club and convenience stores, possibly due to the extreme

heterogeneity of this group. Contrary to popular belief, limited-service restaurants (NAICS

7222) actually increase the likelihood of purchasing F&Vs from nearly all types of stores.

The marketing type variables, like “price” and “price deals”, defined as the unit price and the

promotional status of the price, have mixed effects on patronizing different types of stores. For

example, price is negatively associated with patronizing supercenters and mass merchandisers,

which is in line with these stores being perceived as lower-priced than other types (table 1). Price

deals, on the other hand, increase the probability of purchasing F&Vs from higher-priced store

types, such as grocery, drug, mass merchandiser and other stores. The marginal effects of

seasons 1 to 3 on the probabilities of patronizing grocery stores in these seasons relative to season 4

(Oct.-Dec.) are large and positive. These effects on other types of stores are mixed and

sometimes insignificant.

The results of household-level variables show that income is positively associated with

patronizing higher-priced grocery, drug and club stores, echoing previous findings in literature

(Dong and Stewart 2012, Staus 2009). The marginal effect of income is negative for all other

types of stores. The presence of children in a household has a large positive effect on patronizing lower-priced

(supercenters) and high-volume (clubs) store types. Household head educational attainment has a

negative impact on the probability of shopping in supercenters and a positive impact for club stores.

Household head employment and marital status have no discernible impact on store

choice. Supercenters are noticeably less patronized by non-whites, with Asians preferring clubs

more⁷ and convenience stores less than whites. African American households demonstrate a stronger

preference for convenience stores and a weaker preference for clubs compared to white households.

Concluding Remarks

Household store choices could depend not only on store marketing characteristics and household

demographic characteristics, but also on physical availability of different types of retail stores.

The role of the latter in affecting the probability of patronizing a specific type of food store,

when purchasing fruits and vegetables, is the focus of this study. Our results generally suggest

that availability of supermarket and club types of food stores is inversely related to the likelihood

7 The stores both in the Nielsen and Census Bureau classifications are classified by size and assortment, thereby

making it hard to discern a clear delineation which store types might include the ethnic stores.

of patronizing these specific types of food stores when purchasing fruits and vegetables. This

finding has important policy implications given the attention that the accessible and affordable

food retail environment (i.e., supermarkets) has attracted recently in relation to improving dietary

quality and reducing obesity rates in the United States. The finding that the availability of

convenience stores does in fact induce higher probability of purchasing fruits and vegetables

from this type of store is equally intriguing and important. The disproportionately large negative

effect of convenience stores on the likelihood of patronizing a supercenter indicates that when it

comes to shopping for produce the households value convenience more than larger assortment

and affordability typically found at supercenters. This finding suggests perhaps a whole new

direction of policy intervention emphasizing reliance on smaller, more flexible store types. This

reliance would take advantage of the already proliferating network of dollar and other convenience stores,

thereby allowing the market mechanism to provide some of the solutions to the access problem

and alleviating the burden on taxpayers. Future studies that examine the effect of

food access and availability on the likelihood of purchasing other types of food in different types of

food stores will contribute to fully understanding the issue and designing appropriate remedies.

IV. The Effect of Food Store Access and Income on Household Purchases of Fruits and

Vegetables: A Mixed Effects Analysis

Given these concerns raised in the literature, it is important to realize that food availability

affects food choice not only through physical access, but also through price and income. Unless

these effects are accounted for, the results will likely be spurious. For this reason, supercenters

and supermarkets have received much attention primarily due to the price affordability and wide

assortments of F&V they typically offer (Larson, Story and Nelson, 2009; Larsen and Gilliland,

2009) and due to the market power they exert in influencing market price (Broda, Leibtag and

Weinstein, 2009; Courtemanche and Carden, 2011; Hausman and Leibtag, 2007; Hausman and

Leibtag, 2004). Broda, Leibtag and Weinstein (2009) examined the consumer behavior in food

demand across different store chains, store types and household and zip code characteristics.

They used household-level purchase data to debunk several popularly-held beliefs. For example,

while it may be the case that supermarkets do not locate in poorest neighborhoods, poorer

households do not appear to have limited access to supercenters and they do not pay more for

identical foods either due to limited access or market power exercised by the traditionally low-

priced retailers in underserved areas. The authors demonstrated that even though supercenters,

mass merchandisers and even drug stores have significantly lower prices than traditional

groceries, poor households combine the convenience in shopping nearby with the large volume

of shopping from low-priced stores in a way that renders them not worse off than their richer

counterparts. Despite the significant contribution of this paper to the empirical literature, failure

to account for the retail environment along with other zip code characteristics limits the findings

in a significant way.

The potential availability of stores is bound to be a major driver for consumption patterns and

should be taken into consideration as well. Courtemanche and Carden (2011) did this precisely in

their manuscript by researching the effects of supercenters (i.e., Walmart in this case) on health

outcomes, obesity in particular, through decreased food prices. They examined the endogeneity

of store location decision and effectively estimated the effect of Walmart stores on the rise in

obesity. Hausman and Leibtag (2007) demonstrated a similar downward trend in prices when a

supercenter moves into a neighborhood and pointed out the social benefits associated with

encouraging such entry.

Our study builds on previous findings by associating actual consumer behavior (similar to

Broda, Leibtag and Weinstein, 2009) with neighborhood retail food availability (Courtemanche

and Carden, 2011; Hausman and Leibtag, 2007; Hausman and Leibtag, 2004; and Michimi and

Wimberly, 2010). Our aim is to model the individual and interaction effects of income and food

access on actual purchases of F&V by households. The specific objectives of this study are

threefold. First, we reconcile the gap between food access and actual purchase behavior by using

a national purchase data set that has detailed household purchase and demographic information.

In this paper, we improve upon the existing literature by analyzing the actual shopping patterns

of households by explicitly isolating the effects of food access from the effects of income

constraints, while addressing the data limitations mentioned above. Another improvement over

the literature is our use of a wider definition of food access to encompass all retail outlets for

both food at home and away from home. Second, we use hierarchical data analysis methods to

account for possible clustering effects due to income or food access, which is an improvement

over the methodology used in past studies. Finally, we conduct variance decomposition to

describe the magnitude or the proportion of the contribution these two factors have on the

variability of F&V purchases. The findings in this research will improve understanding of how

food access issues interact with income levels in influencing purchases of F&V at the household

level. To our knowledge, no other known study has examined this issue in the past using detailed

household purchase, demographic, geographic, and food store access data. The focus on F&V is

also noteworthy considering the need to improve the quality of diets, not to mention the high

obesity rates in the U.S.

Model

Following Courtemanche and Carden (2011) and Broda, Leibtag and Weinstein (2009), we

postulate a model that estimates the impact of the retail food availability on F&V consumption.

The F&V purchase is therefore modeled as a function of own-price, income, demographic

variables, and store availability (Courtemanche and Carden, 2011). The choice of mixed effects

modeling is motivated by the nature of the data. The observations in the data set are weekly

purchases of F&V by households. The observations are completely nested in households, which

in turn are partially nested in different income groups and in MSAs/counties with different food

access levels. The desired mixed effects model therefore involves a hierarchy of 288,884 weekly

purchases of F&V by 52,943 households (the number of observations per household varies from

1 to 53, with a mean and median of approximately 27), residing in 3141 counties, clustered in

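2311 MSAs. In matrix notation the mixed model is specified as

$$Y = X\beta + Z\gamma + \varepsilon \qquad (7)$$

where Y is the variable of interest, X is a vector of fixed covariates, Z is a vector of random

effects, and $\varepsilon$ is a vector of disturbances. The random effects in Z have mean and variance

represented as

$$E\begin{bmatrix}\gamma \\ \varepsilon\end{bmatrix} = \begin{bmatrix}0 \\ 0\end{bmatrix} \quad \text{and} \quad \mathrm{Var}\begin{bmatrix}\gamma \\ \varepsilon\end{bmatrix} = \begin{bmatrix}G & 0 \\ 0 & R\end{bmatrix}$$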

where G is the variance-covariance matrix for the random effects that controls for among group

variations, and R is a block diagonal matrix of variance for the residual that allows within group

variation in the model. The above approach of modeling covariance enables us to account for

heteroskedasticity and correlations in the variables.
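As an illustration of how G and R shape the covariance of the observations, the sketch below builds the implied marginal covariance V = Z G Z' + R for a toy random-intercept design. The design matrix, the variance values, the choice R = σ²I and the helper names are all assumptions made for this example; they are not taken from the estimated models.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def marginal_covariance(Z, G, sigma2):
    """V = Z G Z' + R, with R = sigma2 * I assumed for this sketch."""
    zgz = matmul(matmul(Z, G), transpose(Z))
    n = len(Z)
    return [[zgz[i][j] + (sigma2 if i == j else 0.0) for j in range(n)]
            for i in range(n)]

# Hypothetical design: four observations in two groups,
# one random intercept per group with variance 0.5.
Z = [[1, 0], [1, 0], [0, 1], [0, 1]]
G = [[0.5, 0.0], [0.0, 0.5]]
V = marginal_covariance(Z, G, sigma2=1.0)
```

Observations that share a group are correlated through G (the off-diagonal 0.5 entries), while observations from different groups remain uncorrelated.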

The empirical models we designed to test the hypotheses set forth in the introduction

capitalize on the richness of the mixed effects modeling to make inference using our hierarchical

data. Four models were specified with two different food retail density variables for the two

dependent variables. The models are

$$\ln(ratio_{it}) = \sum_{k} \beta_k X_{kit} + \sum_{m} \gamma_m Z^{S}_{mit} + \varepsilon_{it} \qquad (8)$$

where $\ln(ratio_{it})$ is the logarithmic transformation of the dependent variable, the ratio of actual and

recommended servings of F&V (ratio) for household $i$, in week $t$; $X_{kit}$

is a vector of fixed effects: household specific demographic variables, price, season and region;

$Z^{S}_{mit}$ is a vector of random effects: the scaled number of Supermarkets and PIR; $\beta_k$ and $\gamma_m$ are fixed

and random effect parameters, respectively; and $\varepsilon_{it}$ is the idiosyncratic error term.

We define a second model to estimate the effects of Supermarkets and income, only with a

full set of density variables - Supermarkets, Convenience, Specialty, FS and QS:

$$\ln(ratio_{it}) = \sum_{k} \beta_k X_{kit} + \sum_{m} \gamma_m Z^{A}_{mit} + \varepsilon_{it} \qquad (9)$$

where $Z^{A}_{mit}$ is a vector of random effects: the scaled numbers of Supermarkets, Convenience,

Specialty, FS and QS and PIR. Other variables are defined as above.

The alternative model specifications are

$$\ln(level_{it}) = \sum_{k} \beta_k X_{kit} + \sum_{m} \gamma_m Z^{S}_{mit} + \varepsilon_{it} \qquad (10)$$

$$\ln(level_{it}) = \sum_{k} \beta_k X_{kit} + \sum_{m} \gamma_m Z^{A}_{mit} + \varepsilon_{it} \qquad (11)$$

where $\ln(level_{it})$ is the logarithmic transformation of the dependent variable, the amount of F&V

servings (level) purchased by household $i$, in week $t$; $Z^{S}_{mit}$ and $Z^{A}_{mit}$ are vectors of

random effects: Supermarkets and PIR in equation (10) and a full set of the food access variables, the

scaled numbers of Supermarkets, Convenience, Specialty, FS and QS and PIR (equation (11)).

Other variables are defined as in (8) above. A total of 4 models are estimated.

In the analysis, no restrictions were imposed on the variance-covariance matrix for the

residual – R. In other words, residuals are modeled as homoskedastic. The variance-covariance

matrix for the random effects, G, was set as a block-diagonal matrix with the blocks identified by

levels of income/access interaction variables, differentiated by metropolitan area status for each

household.

Data and Summary Statistics

We employ four data sets in our analysis: the Nielsen HomeScan; County Business Patterns,

U.S. Census Bureau, Population Estimates, U.S. Census Bureau; and Standard Reference 24,

National Nutrient Database, USDA. We draw on 2007 County Business Patterns and Population

Estimates, U.S. Census Bureau, to delineate the food retail environment and the population/area

estimates for the geographical units in our analysis. For the purchase data we use 2008 Nielsen

HomeScan panel data.

The purchase of a food item in Nielsen is captured by a quantity variable expressed in

ounces or fluid ounces. Following Nevo (1997), the reported ounces and fluid ounces were

expressed in terms of serving sizes. This was a convenient transformation to facilitate the

aggregation of the quantities of different types of produce (canned, fresh, frozen, etc.) and

relating them to the dietary guidelines. In the first step of this conversion process, the

observations in the Nielsen data set were divided into two food groups – fruits (fresh, frozen,

canned, dried, juice) and vegetables (fresh, frozen, canned, and juice) that came from the frozen-

produce-meats and dry grocery data sets. Three key variables used to uniquely identify each

produce item are: product group (e.g. fresh produce), product module (e.g. fresh fruit remaining),

and product (e.g. lemon or mango, etc.).

The reference data source of the serving sizes and refuse rates (i.e., the ratio of the skin,

stone, and any other inedible parts that are discarded prior to eating in the total weight) for each

produce item is the USDA National Nutrient Database for Standard Reference (SR 24). In a few

cases where one-to-one matching between the products from the two data sets was not possible,

alternative measures were taken: 1) higher level of aggregation (i.e. product modules or product

groups) were considered, 2) weighted average of existing products were taken (e.g. melons in

Nielsen matched as average weight of all types of melons in SR 24), or 3) different types were

considered (e.g. under product module- fruit refrigerated, a specific product citrus salad was

matched as “fruit salad, canned” in SR 24).

Using two models, we estimate the associations of food access with (i) the quantities of F&V

purchased and (ii) the extent to which households meet the dietary recommendations concerning F&V.

The dependent variables we used in our analysis are therefore expressed as (i) number of

servings purchased (in the remainder of the paper we refer to this as “level”), and (ii) the ratio of

the actually purchased to the recommended numbers of servings of F&V (we refer to this as

“ratio”) per household per week. The recommended numbers of servings by gender and level of

physical activity are available from the Centers for Disease Control and Prevention (CDC). Since

the levels of physical activity for household members are not available in the Nielsen panel, we

considered “5 a day” as the recommended servings for F&V for every household member.

Finally, we aggregated the data by week and by broad food group (F&V). In the subsequent

regression analysis, we use a logarithmic transformation of the dependent variables, along with

random effects, to satisfy the normality requirement for a mixed model specification (Searle,

Casella and McCulloch, 1992; Littell et al., 2006).
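The conversion from purchased quantities to the two dependent variables can be sketched as follows. The example assumes that the weekly recommendation for a household is 5 servings per person per day times 7 days, which is one reading of the "5 a day" rule described above; the purchase quantities, refuse rate, serving size and function names are hypothetical.

```python
import math

def edible_servings(ounces, refuse_rate, serving_size_oz):
    """Convert purchased ounces to servings of the edible portion."""
    return ounces * (1.0 - refuse_rate) / serving_size_oz

def weekly_ratio_percent(servings, household_size, per_person_per_day=5):
    """Purchased servings as a percent of the assumed weekly recommendation."""
    recommended = per_person_per_day * 7 * household_size
    return 100.0 * servings / recommended

# Hypothetical weekly purchase: 32 oz of a fruit with a 25% refuse rate
# and a 4 oz serving size, bought by a two-person household.
level = edible_servings(32.0, 0.25, 4.0)
ratio = weekly_ratio_percent(level, household_size=2)

# Log transformation applied before estimation, as described in the text.
log_level = math.log(level)
log_ratio = math.log(ratio)
```

Because the logarithm is undefined at zero, a transformation of this kind only applies to household-weeks with positive F&V purchases, which matches the subsample retained in the analysis.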

The socio-demographic variables in the model include race/ethnicity, marital status,

education, employment, price, and Poverty Income Ratio (PIR). A detailed description of these

variables and summary statistics are provided in Table 4.

Table 4. Variable Descriptions and Summary Statistics

Variable Name                                Mean     (Std. Dev)  Description
Ratio of purchased and recommended servings  12.31    (21.68)     Purchased number of F&V servings as a percent of the recommended number, per household per week
Purchased servings                           8.42     (11.82)     Number of servings purchased, per household per week
Price of F&V                                 65.32    (64.43)     The weighted average price, per serving per week
Supermarkets                                 63.85    (886.43)    Number of supermarkets and large groceries per 1000 households per 100 square miles in each MSA
Convenience                                  58.43    (1076.42)   Number of convenience stores per 1000 households per 100 square miles in each MSA
Specialty                                    46.84    (448.03)    Number of specialty stores per 1000 households per 100 square miles in each MSA
Full-Service Restaurants (FS)                110.54   (1295.32)   Number of full-service restaurants per 1000 households per 100 square miles in each MSA
Quick-Service Restaurants (QS)               152.61   (1596.87)   Number of limited-service eating places per 1000 households per 100 square miles in each MSA
Season1                                      0.25     (0.43)      Months in Jan-Mar
Season2                                      0.24     (0.42)      Months in Apr-Jun
Season3                                      0.23     (0.42)      Months in Jul-Sep
Season4                                      0.28     (0.45)      Months in Oct-Dec
Region1                                      0.16     (0.37)      East
Region2                                      0.26     (0.44)      Central
Region3                                      0.38     (0.49)      South
Region4                                      0.20     (0.40)      West
PIR                                          4.23     (2.65)      Poverty Income Ratio = midpoint of category adjusted to poverty thresholds* by household size
Household Size                               2.40     (1.23)      Household Size
Married                                      0.68     (0.47)      A binary variable that takes a value of 1 if married and 0 otherwise (single, widowed, divorced/separated)
White                                        0.86     (0.34)      A binary variable that takes a value of 1 if white and 0 otherwise (black, oriental, other)
Black                                        0.07     (0.26)      A binary variable that takes a value of 1 if black and 0 otherwise (white, oriental, other)
Female Head (FH) Education                   0.55     (0.50)      A binary variable that takes a value of 1 if FH is a college or post college graduate and 0 otherwise (grade school, some high school, graduated high school, or some college)
Male Head (MH) Education                     0.45     (0.50)      A binary variable that takes a value of 1 if MH is a college or post college graduate and 0 otherwise (grade school, some high school, graduated high school, or some college)
Female Head (FH) Employment                  0.35     (0.48)      A binary variable that takes a value of 1 if FH is employed more than 35 hours per week, and 0 otherwise (less than 35 hours, not employed for pay or no female head)
Male Head (MH) Employment                    0.47     (0.50)      A binary variable that takes a value of 1 if the MH is employed more than 35 hours per week, and 0 otherwise (less than 35 hours, not employed for pay or no male head)
Income 1                                     0.09     (0.28)      PIR < 1.35 - Food stamp level
Income 2                                     0.07     (0.25)      1.35 ≤ PIR < 1.85 - Food stamp level
Income 3                                     0.08     (0.27)      1.85 ≤ PIR < 2.32 - 25th percentile
Income 4                                     0.26     (0.44)      2.32 ≤ PIR < 3.69 - 50th percentile
Income 5                                     0.51     (0.50)      3.69 ≤ PIR
Access 1                                     0.03     (0.17)      Supercenters = 0 (3.13th percentile)
Access 2                                     0.22     (0.41)      0 < Supercenter ≤ 9.38 (25th percentile)
Access 3                                     0.25     (0.43)      9.38 < Supercenter ≤ 18.75 (50th percentile)
Access 4                                     0.25     (0.43)      18.75 < Supercenter ≤ 33.19 (75th percentile)
Access 5                                     0.25     (0.43)      33.19 < Supercenter

*The poverty thresholds are issued in “The 2008 HHS Poverty Guideline” by the US Department of
Health and Human Services.

The price variable is derived as the quantity-weighted average expenditure on F&V per

serving. On average, 64 cents were paid per serving, although most of the servings cost less

(median = 47 cents per serving). In order to capture the seasonality effect on purchases, three

binary variables were introduced representing seasons by quarter – the season variable for the

fourth quarter was the base. The observations in our data set were approximately evenly

distributed over the quarters. Since only a subsample of the Nielsen households is retained in our

analysis (i.e., only those that purchased F&V), we draw comparisons and give an idea of how

representative the subsample is relative to the Nielsen sample.

The F&V subsample consists predominantly of white, married households

(86% and 68% compared to the overall Nielsen 84% and 61%, respectively). Compared to the

Nielsen sample, slightly higher proportions of the household heads in the F&V subsample have

college or graduate education (55% and 45% compared to 54% and 43% for females and males,

respectively). The employment status profile of household heads in both samples is alike.

Since the food access and income level are at the foundation of the hypotheses this research

seeks to test, we paid close attention to these variables in our data set. The Poverty Income Ratio

(PIR) in the sample was slightly higher (4.23) than the overall Nielsen average (4.02), possibly

indicating that more affluent households bought F&V. By the current definition, PIR is a

continuous variable. In order to be able to use the information on income as an indicator to F&V

purchases, it is logical to also create a discrete variable indicating the income status. The adopted

cutoffs for this variable are the PIR cutoffs of 1.35 and 1.85 used for the eligibility for

participation in the National Food Stamp Program⁸, as well as 2.32 and 3.69 as the 25th and 50th

percentiles.
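The construction of the discrete income variable can be sketched as below, using the four cutoffs stated above and the lower-bound-inclusive intervals shown in Table 4. The function names and the income and threshold figures in the example are hypothetical; the actual poverty thresholds come from the HHS guidelines and are not reproduced here.

```python
from bisect import bisect_right

# PIR cutoffs reported in the text: two food stamp eligibility levels
# followed by the 25th and 50th percentiles.
PIR_CUTOFFS = [1.35, 1.85, 2.32, 3.69]

def poverty_income_ratio(income_midpoint, poverty_threshold):
    """PIR = midpoint of the income category over the poverty threshold
    for the household's size (threshold supplied by the caller)."""
    return income_midpoint / poverty_threshold

def income_group(pir):
    """Return 1..5 for Income 1..Income 5; lower bounds are inclusive."""
    return bisect_right(PIR_CUTOFFS, pir) + 1

# Hypothetical household: income midpoint 42,400 and threshold 21,200.
pir = poverty_income_ratio(42400.0, 21200.0)
group = income_group(pir)
```

Using `bisect_right` places a household whose PIR equals a cutoff in the higher group, which is consistent with the "≤ PIR <" intervals in Table 4.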

The geographic unit upon which the food retail establishment density variables are available

is at the county level. We also obtained an alternative measure – density variables by

metropolitan statistical areas (MSAs). These are essentially clusters of counties in a completely

nested manner; that is, each county is part of only one MSA. This type of population cluster

interpretation serves two purposes: (i) it accounts for more precise food availability and access

measures in counties (possibly in different states) that are clustered in a single MSA and (ii) it

serves as an alternative formulation for checking the robustness of the results from the county-

based analyses. This alternative formulation, with a total of 2,311 MSAs, is not only logically

but also empirically more appealing compared to the county formulation with 3,141 counties.

Bitler and Haider (2011) extensively discuss the importance of the right choice of the geographic

unit. They cite a USDA (2009) study that uses American Time Use Survey to report that time

spent shopping for groceries is less if done from work than from home. Additionally Hellerstein,

Neumark and McInerney (2008) show that only 14% work in the zip code in which they live, but

8 The 2009 HHS Poverty Guidelines, U.S. Department of Health and Human Services.

92% work and live in the same MSA, which motivated our choice of MSA over counties.

Nevertheless, we conducted the analysis both at the MSA and county levels to test the robustness

of our findings.

Along with these density measures, we also created two binary variables: Metro and Rural

that indicate whether an MSA or a county is a metropolitan or rural area, respectively. These

variables were created for the sole purpose of grouping or combining the terms in the

variance/covariance matrix when modeling the possible covariation between any two

observations that come from metro or rural areas.

Similar to the grouping of households by income level, we distinguished MSAs by access

level (variable name Access) using data-driven cutoffs at the 3.13th⁹, 25th, 50th and 75th

percentiles. The actual numbers that correspond to these percentiles are reported in Table 4.

Unlike the cutoffs for PIR, the cutoffs for access are hard to motivate and are not unambiguously

defined. This is referred to as ‘adequacy’ of access in the literature¹⁰. To reemphasize, Access

and Income, along with their interaction terms, are artificial constructs created for the sole

purpose of modeling the variance in the mixed models.
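A minimal sketch of the access grouping follows, using the density cutoffs listed in Table 4 with upper-bound-inclusive intervals. The `variance_block` helper, which combines income level, access level and metro status into a single label, is a hypothetical illustration of how groups for the covariance structure might be identified; it is not the authors' code.

```python
from bisect import bisect_left

# Density cutoffs for the Access variable as listed in Table 4.
ACCESS_CUTOFFS = [0.0, 9.38, 18.75, 33.19]

def access_group(density):
    """Return 1..5 for Access 1..Access 5; upper bounds are inclusive,
    so a density of exactly 0 falls in Access 1."""
    return bisect_left(ACCESS_CUTOFFS, density) + 1

def variance_block(income_grp, access_grp, metro):
    """Hypothetical label identifying one income/access/metro block."""
    return (income_grp, access_grp, "metro" if metro else "rural")

# Hypothetical household: Income 3, MSA density 12.0, metropolitan area.
block = variance_block(3, access_group(12.0), metro=True)
```

With five income levels, five access levels and two area types, a labeling of this kind would yield at most 50 distinct blocks.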

Finally, the Nielsen Company strives to recruit and maintain a panel that is a nationally

representative sample of the US population. For this purpose, weights are assigned to each

household that are constructed by adjusting the marginal distributions of some key characteristics

of the households in the panel with the national counterparts. These weights are created using

Iterative Proportional Fitting, which iteratively adjusts the sample marginal distributions using

the population marginal distributions until the sample marginal distributions stabilize and

converge to the population marginal distributions. We opted not to use these weights based on

the fact that the initial sample marginal distributions that were used to create the weights were

not representative of the sample we retained for our analysis – i.e., the households that purchased

F&V.
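For readers unfamiliar with Iterative Proportional Fitting, the following is a minimal two-way sketch of the procedure described above. The sample counts and population margins are invented for illustration, and the cell-level weights shown here are a simplification; the actual Nielsen weighting scheme and its key characteristics are not documented in this report.

```python
def iterative_proportional_fitting(table, row_targets, col_targets,
                                   tol=1e-10, max_iter=1000):
    """Rescale a two-way table of positive counts until its row and
    column totals match the target margins."""
    cells = [[float(c) for c in row] for row in table]
    for _ in range(max_iter):
        # Step 1: scale each row to its target total.
        for i, target in enumerate(row_targets):
            s = sum(cells[i])
            cells[i] = [c * target / s for c in cells[i]]
        # Step 2: scale each column to its target total.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in cells)
            for row in cells:
                row[j] *= target / s
        # Columns now match exactly; stop once rows match as well.
        gap = max(abs(sum(row) - t) for row, t in zip(cells, row_targets))
        if gap < tol:
            break
    return cells

# Hypothetical sample counts by two household characteristics,
# and hypothetical population margins (both sum to 100).
sample = [[40, 30], [20, 10]]
fitted = iterative_proportional_fitting(sample, [50, 50], [60, 40])

# Implied weight for each cell: fitted count over sample count.
weights = [[f / s for f, s in zip(frow, srow)]
           for frow, srow in zip(fitted, sample)]
```

The weights depend on the starting sample margins, which is why weights built for the full panel need not be appropriate for a subsample with a different composition.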

The final data set we used in our analysis has a total of 288,884 observations of weekly

purchases aggregated from 1,740,670 observations of fruit and vegetable purchases from dry

grocery and frozen, produce and meat data sets in the Nielsen 2008.

9 3.13% of observations are from MSAs that have exactly 0 supermarkets.

10 Economic Research Service Report to Congress, USDA, 2009.

Results

Table 5 reports the coefficient estimates of interest from models (8) through (11). Only the

estimates of the parameters of interest (income and access) are reported. The first two columns

represent estimations with ratio dependent variable (equations (8) and (9)), while the last two

columns represent estimations with level dependent variables (equations (10) and (11)). The

mixed effects estimates are remarkably similar across all four models.

Our estimates in columns (1) and (3) (corresponding to models with only Supermarket

availability) are significant at the 10% level. They suggest that a 1 percent increase in PIR would

induce a 0.18 and 0.03 percent increase in the ratio and level of F&V, respectively. The

supermarket density variable is not significant in either model, indicating statistically

insignificant associations between F&V consumption and Supermarket availability. These results

are in accordance with those reported by Michimi and Wimberly (2010), as our Supermarket

variables combine all sizes of establishments in this category.

In the models with all store types (columns (2) and (4)), the results indicate that a 1%

increase in PIR would induce a 0.20% increase in F&V ratio. All other parameter estimates of

interest, namely the density measures for other store types, do not significantly influence F&V

purchases.
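The percent-for-percent reading of the PIR coefficients can be checked with a short calculation. The sketch assumes that PIR enters the models in logarithms alongside the log-transformed dependent variables, so that the coefficient is an elasticity; the text implies this interpretation but does not state the transformation explicitly, and the function name is hypothetical.

```python
def percent_change_in_y(elasticity, percent_change_in_x):
    """Exact percent change in y implied by a log-log coefficient."""
    return 100.0 * ((1.0 + percent_change_in_x / 100.0) ** elasticity - 1.0)

# PIR coefficients from Table 5, columns (1) and (3).
pir_coefficients = {"ratio": 0.1770, "level": 0.0334}

# Implied percent change in each dependent variable for a 1% rise in PIR.
effects = {name: percent_change_in_y(b, 1.0)
           for name, b in pir_coefficients.items()}
```

For small changes the exact figure is nearly identical to the coefficient itself, which is why the coefficients can be read directly as approximate percent changes.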

In all four models, the statistically significant likelihood ratio tests indicate the superiority of

the random model specification over the fixed effects only specification. The covariance

parameter estimates indicate that in all four models, the residual and income/access interaction

terms are the only significant ones. This means that the variation in the dependent variable is not

due to the variation in income or access in isolation, but rather to their interaction.
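One simple way to summarize the covariance parameters is the share of total modeled variance attributed to the income/access term, computed from the estimates in Table 5. This is a sketch that treats the two reported components as the only variance components, which is an assumption; the function name is hypothetical.

```python
def variance_share(group_var, residual_var):
    """Share of total variance attributed to the grouping component."""
    return group_var / (group_var + residual_var)

# (Income/Access, Residual) covariance parameter estimates from Table 5,
# keyed by model column.
estimates = {
    1: (0.0022, 0.7135),
    2: (0.0015, 0.7136),
    3: (0.0002, 0.6081),
    4: (0.0003, 0.6080),
}

shares = {col: variance_share(g, r) for col, (g, r) in estimates.items()}
```

Under these assumptions the income/access component accounts for well under one percent of the variance in every column, with the residual making up the remainder.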

Table 5. Mixed Effects Regression Parameter Estimates (Standard Errors)

                          Ratio of Purchased and Recommended  Amount of F&V Purchased
                          Amounts of F&V (Percent)            (Number of Servings)
                          Availability by   Availability by   Availability by   Availability by
                          Supermarket Only  All Store Types   Supermarket Only  All Store Types
                          (1)               (2)               (3)               (4)

Parameter Estimates
PIR                       0.1770*           0.1993**          0.0334*           0.0325
                          (0.0265)          (0.0047)          (0.0042)          (0.0061)
Supermarket               0.0071            0.0070            0.0062            0.0088
                          (0.0024)          (0.0030)          (0.0017)          (0.0021)
Convenience                                 0.0045                              -0.0035
                                            (0.0043)                            (0.0021)
Specialty                                   0.0054                              -0.0034
                                            (0.0151)                            (0.0121)
Full-Service                                -0.0064                             -0.0021
                                            (0.0092)                            (0.0101)
Quick-Service                               -0.0109                             0.0054
                                            (0.0031)                            (0.0029)
Demographic               Yes               Yes               Yes               Yes
Household Size            No                No                Yes               Yes
Likelihood Ratio Test     177.3             151.5             28.8              52.1
                          (<.0001)          (<.0001)          (<.0001)          (<.0001)
Number of Observations    288,884           288,884           288,884           288,884

Covariance Parameters
Income/Access             0.0022***         0.0015***         0.0002**          0.0003**
                          (0.0008)          (0.0005)          (0.0001)          (0.0001)
Residual                  0.7135***         0.7136***         0.6081***         0.6080***
                          (0.0019)          (0.0019)          (0.0016)          (0.0016)

***, ** and * represent significance at 1%, 5% and 10%, respectively.

Concluding Remarks

Limited access to healthy food has been linked to poor diets and adverse health effects, particularly in underserved and predominantly poor communities. The objective of this research is to identify the relative influences of income and food access factors on purchases of healthy food (i.e., fruits and vegetables) using retail food availability data, detailed purchase scanner data and mixed effects models. Our results indicate significant gains from mixed effects modeling as opposed to invariably homoscedastic fixed effects modeling. We demonstrate that the densities of supermarkets in Metropolitan Statistical Areas do not have significant effects on fruit and vegetable purchases. Similar results are observed for the other density variables in all settings.

Our results suggest that increases in income, as opposed to supermarket density, can result in a mild increase in F&V purchases. The covariance parameter estimates suggest that policy actions aimed at alleviating healthy food accessibility or affordability problems in isolation are unlikely to be effective, as evidenced by the insignificance of the covariance parameter estimates of the random effects (Income and Access separately) in all models.

The results in this study provide motivation for future research to conduct a comprehensive

demand analysis of not only fruits and vegetables, but also related food groups since the

estimation of relevant cross price and income elasticities can help us better understand the effects

of availability of different types of retail stores on food purchases. It could also provide more

insights that could be used in analyzing the intended and unintended consequences of policy

actions aimed at creating incentives to increase food availability in food deserts.

With data availability, future studies should also include random weight purchases of fruits

and vegetables in the analysis given that it may be a non-trivial portion of overall fruit and

vegetable consumption. With random weight data, it would then be possible to include fruit and

vegetable markets (NAICS 44523) as a separate food access or density variable to help depict a

more comprehensive picture of the food environment.

Admittedly, the retail food sources captured by the combination of the different types of

establishments we used in this study can be a limited way of accounting for food availability

since it disregards the size of the establishments, as well as expansions, conversions and

improvements in the existing stores. Given data availability, a more comprehensive and detailed

source of information on food availability would enable future research to capture variations in

food purchases or consumption attributable not only to the number and type of establishment but

also to the amount and quality of the food offered in these establishments.


V. The Role of Food Access in Meeting Some Dietary Guidelines: A Natural Experiment

The natural experiment setting constitutes a unique opportunity to increase our understanding of the linkage between food availability and food choice. Given the difficulty of conducting natural experiments on a national scale, it is hard to obtain appropriate control and treatment groups from which to derive proper statistical inference. Moreover, identifying and isolating changes in purchases due solely to an increase in availability is confounded by the impact of changes in other factors such as household size, marital status, educational attainment, employment status, income, etc. We use a standard difference-in-difference approach to model the association between increased availability and choice.

In this Section we set out to test two hypotheses: (i) increased food availability induces an increase in F&V purchases; and (ii) an increase in food availability in underserved areas (food deserts) induces an increase in F&V purchases. For this research we use the Nielsen HomeScan purchase data from 2005 to 2006 to estimate the causal effect of increases in the supply of food retail outlets on F&V purchases. We compare the evolution of purchases by households that were ‘treated’ (exposed to an increased number of groceries, supermarkets, supercenters and price clubs) with the purchases of households that did not face improved shopping opportunities from 2005 to 2006 (the ‘control’ group). We use difference-in-difference (DD) and triple difference (DDD) estimation methods to ensure that the change in F&V consumption is attributable to the improved availability and to rule out spurious effects due to other changes. Our results suggest that there is little evidence to support the popular belief that an improved food retail environment would induce increased F&V purchases and, therefore, eventually improve diets.

Model

The baseline model specification is a difference-in-difference type fixed-effect OLS regression model (Conley and Taber, 2011; Donald and Lang, 2007). Let $i$ index a household observed in county $c$ at time $t$.

$$Y_{ict} = \alpha_c + \lambda_t + X_{ict}'\beta + \eta_{ct} + \varepsilon_{ict} \qquad (12)$$

where the dependent variable $Y_{ict}$ is the quantity (oz) of F&V purchased by household $i$ in county $c$ at year $t$; $\alpha_c$ are time-invariant county fixed effects; $\lambda_t$ are year fixed effects; $X_{ict}$ are household-specific regressors (for example, demographic variables); $\eta_{ct}$ are group-time effects; and $\varepsilon_{ict}$ is the idiosyncratic error.

To address the data issues described above, the baseline model (12) is modified and

estimated by methods developed for both cross-sectional and panel data structures (Wooldridge,

2002).

The data structure of pooled cross sections over time can be used for estimating ‘treatment’ or ‘intervention’ effects in natural experiment settings such as the supply shock in this analysis (Wooldridge, 2007, p. 129). In this setting the data are considered to be independently, but not identically, distributed. The independence requirement (that the incidence of some households being in both cross sections is purely incidental and random) may be violated, as 78% of the 2005 cross section continues to the next year; hence the need for dual analysis at the panel and cross-sectional levels. The ‘treatment’ is the increase in Supercenters in some counties from 2005 to 2006. It is defined at the county level: all households in the county are assumed to have been exposed to a uniformly improved availability. The households residing in these counties are the treatment group; households residing in the rest of the counties comprise the control group. The changes in F&V purchases for both treatment and control groups in both pre-treatment and post-treatment periods are modeled in this scenario in order to isolate and estimate the change in the dependent variable due to the treatment.
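The comparison of treatment and control groups before and after the supply shock can be sketched as a two-by-two difference in group means. This is a minimal illustration, not the regression estimated in this report: it has no controls or fixed effects, and the data layout, function name and toy numbers are all hypothetical.

```python
from statistics import mean


def did_estimate(rows):
    """Two-by-two difference-in-difference estimate from group means.

    rows: iterable of (treated, post, outcome) tuples with 0/1 flags.
    This layout is an assumption for illustration only.
    """
    cells = {(g, t): [] for g in (0, 1) for t in (0, 1)}
    for treated, post, outcome in rows:
        cells[(treated, post)].append(outcome)
    m = {key: mean(values) for key, values in cells.items()}
    change_treated = m[(1, 1)] - m[(1, 0)]   # post minus pre, treatment group
    change_control = m[(0, 1)] - m[(0, 0)]   # post minus pre, control group
    return change_treated - change_control


# Toy purchase quantities (oz); invented numbers, not from the Nielsen data.
toy_rows = [
    (0, 0, 100.0), (0, 0, 110.0),   # control, 2005
    (0, 1, 95.0), (0, 1, 105.0),    # control, 2006
    (1, 0, 120.0), (1, 0, 130.0),   # treatment, 2005
    (1, 1, 118.0), (1, 1, 128.0),   # treatment, 2006
]
dd_toy = did_estimate(toy_rows)
```

In the toy data both groups buy less in 2006, but the treatment group's decline is smaller. The estimate is the gap between the two changes.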

The variable that measures the treatment is ‘Improved Supercenters’. A similar variable, ‘Improved QS’, indicates an increase in quick service restaurants and was included to account for the ‘food swamp’ effect mentioned in the previous literature (Congress, 2009; Larson, Story and Nelson, 2009). To identify the treatment effect, other factors that might contribute to the change in the dependent variable, separately or combined, but that have nothing to do with the treatment, such as changes in income, household size, marital status, education level and employment status, have been included in the model as well. The baseline model (12) is modified to estimate

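[Equation (13): DD specification for the pooled cross sections. F&V purchases are regressed on the year-2006 indicator, ‘Improved Supercenters’, ‘Improved QS’, the interactions of each with the year-2006 indicator, and the control variables.]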

The parameter of interest in (13) is the coefficient on the interaction of the year-2006 indicator with ‘Improved Supercenters’, which measures the change in F&V purchases due to increased availability, as expressed in hypothesis 1. A statistically insignificant estimate would indicate that making new Supercenters available would not induce more F&V purchases or consumption.

An alternative data structure is a panel, in which only households present in both the 2005 and 2006 cross sections are retained (Wooldridge, 2007, p. 265). In this setting, the ‘treatment’ is the increase in availability for some households from 2005 to 2006, due either to an increase in the number of Supercenters in the county or to moving to a new county with a higher number of Supercenters. Therefore the treatment is at the household level: it is possible that a household in a county is ‘treated’ while the rest of the households in the county are not. These households are the treatment group; the rest of the households are the control group.

The variable that measures this treatment is ‘Better Sup’. A similar variable, ‘Better QS’, indicates better access to quick service restaurants. The same set of identifying controls is added to the modified model (14). The parameter of interest in (14) is again the coefficient on the interaction of the year-2006 indicator with the treatment variable.

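[Equation (14): DD specification for the panel. The regressors are as in (13), with ‘Better Sup’ and ‘Better QS’ replacing ‘Improved Supercenters’ and ‘Improved QS’.]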

To test hypothesis 2 of this research we turn to the triple difference models. The DDD models further refine the parameter estimates of interest to allow for the food desert interpretation. The intention is to check whether the starting point makes a difference. In other words, is adding the eleventh Supercenter to a county going to have the same effect on F&V consumption as adding the first to a similar county? It is important to realize that this hypothesis builds on the first hypothesis: does increased availability translate into increased consumption? Hence the triple difference specification, which compares the treated group in 2006 that is also underserved to the rest of the sample. The corresponding modifications to the DD models are therefore estimated as

[Equations (15) and (16): DDD specifications corresponding to the DD models (13) and (14). Each adds ‘Underserved’ and its interactions with the year-2006 indicator and the treatment variables.]

The parameter of interest in (15) and (16) is the coefficient on the triple interaction of the year-2006 indicator, the treatment variable and ‘Underserved’, which measures the relative increase or decrease in F&V purchases if the treatment is administered in 2006 in under-served areas. The models in both settings were estimated by difference-in-difference type fixed-effect OLS regression. For panel data, in the special case of only two time periods, the difference-in-difference and fixed effect OLS estimates are identical (Wooldridge, 2002).
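To make the triple difference idea concrete, the sketch below computes it as the DD estimate among underserved households minus the DD estimate among the rest. In a fully saturated model without further controls, this difference equals the triple-interaction coefficient. The report's regressions include controls and fixed effects, so this is only an illustration; the tuple layout, function names and numbers are hypothetical.

```python
from statistics import mean


def cell_mean(rows, treated, post, underserved):
    """Mean outcome in one (treated, post, underserved) cell."""
    return mean(
        y for g, t, u, y in rows if (g, t, u) == (treated, post, underserved)
    )


def dd(rows, underserved):
    """Difference-in-difference within one level of the underserved flag."""
    treated_change = (cell_mean(rows, 1, 1, underserved)
                      - cell_mean(rows, 1, 0, underserved))
    control_change = (cell_mean(rows, 0, 1, underserved)
                      - cell_mean(rows, 0, 0, underserved))
    return treated_change - control_change


def ddd(rows):
    """Triple difference: DD in underserved areas minus DD elsewhere."""
    return dd(rows, 1) - dd(rows, 0)


# (treated, post, underserved, outcome); invented numbers for illustration.
toy_rows = [
    (0, 0, 0, 100.0), (0, 1, 0, 98.0), (1, 0, 0, 110.0), (1, 1, 0, 109.0),
    (0, 0, 1, 90.0), (0, 1, 1, 88.0), (1, 0, 1, 95.0), (1, 1, 1, 97.0),
]
ddd_toy = ddd(toy_rows)
```

A positive value would mean the treatment effect is larger where the starting level of access is low, which is the food desert interpretation of hypothesis 2.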


Data and Summary Statistics

We draw on 2005 and 2006 County Business Patterns and Population Estimates, U.S. Census

Bureau, to delineate the food retail environment and the population/area estimates for the

geographical units in our analysis. We then align the information on food access with actual

household purchase data from the Nielsen panel from the same areas or counties for the same

period.

Although the Nielsen HomeScan data are collected from a nationally representative cross section of households over time, the data set is referred to as a panel because a sizable portion of the households continue their membership from one year to the next. There were 38,802 and 37,719 households participating in the Nielsen panel in 2005 and 2006, respectively. Altogether there were 46,301 households that participated in one year or both. A panel of 30,269 households, or approximately 78% and 80% of the 2005 and 2006 cross-sections, respectively, participated in both years.

The food accessibility data, obtained from the County Business Patterns, U.S. Census Bureau, include the number of establishments of the following store formats: supercenters and price clubs (hereafter Supercenters, including North American Industry Classification System (NAICS) codes 44511 and 452910) and quick-service restaurants (hereafter QS, including NAICS code 7222) for 3,091 counties. For each county a binary variable ‘Improved Supercenters’ was created that equals unity if the number of establishments (NAICS codes 44511 and 452910) increased from 2005 to 2006. A similar variable, ‘Improved QS’, was created for the quick-service restaurants. Many studies on food deserts use a commuting distance of 10 miles or more as an indicator for food deserts. Following these studies, we identified counties that have one or fewer Supercenters per 314 square miles¹¹ (approximately the area of a circle with a 10-mile radius) as ‘Underserved’.

An indicator variable ‘Better Sup’ was created to indicate whether the household was exposed to better retail conditions in 2006, due either to an increase in the number of Supercenters in the county of residence from 2005 to 2006 or to moving to a new county with a higher number of Supercenters in 2006. A similar variable, ‘Better QS’, was created to indicate increased access to QS restaurants. For households that moved, the Supercenters and QS variables were adjusted by expressing the levels per 100 square miles. A binary variable ‘Moved’ was created to capture the change due to moving to a new place. Quantity and Price measure the quantity in ounces and the price per ounce of fresh and processed F&V.

¹¹ An alternative definition of ‘Underserved’, differentiated by Metropolitan or Micropolitan Areas, with 1 and 3 miles or more for metro areas and 10 miles for non-metro areas, was used as a robustness check. The results were close to the reported ones and, therefore, are not reported in the Results section.
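The indicator variables described above can be sketched as follows. The function names, argument names and the use of a non-strict threshold ("one or fewer") are assumptions made for illustration; the report does not show how its variables were coded.

```python
# 314 sq. miles approximates the area of a circle with a 10-mile radius
# (pi * 10**2 is about 314.16); the constant name is hypothetical.
CATCHMENT_SQ_MILES = 314.0


def improved(count_2005, count_2006):
    """1 if the number of establishments in the county rose from 2005 to 2006."""
    return int(count_2006 > count_2005)


def underserved(supercenters, county_area_sq_miles):
    """1 if the county has one or fewer Supercenters per 314 square miles."""
    per_catchment = supercenters / county_area_sq_miles * CATCHMENT_SQ_MILES
    return int(per_catchment <= 1.0)


def per_100_sq_miles(count, area_sq_miles):
    """Density adjustment used when comparing counties of different size."""
    return count / area_sq_miles * 100.0


# Hypothetical county: 2 Supercenters in 2005, 3 in 2006, 1,000 sq. miles.
example_improved = improved(2, 3)
example_underserved = underserved(2, 1000.0)
```

Expressing counts per unit area is what allows a household that moves between counties of different size to be classified as facing better or worse access.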

The descriptive statistics of the full set of variables for the panel are in Table 6. Households in the panel sample purchase slightly more F&V than households in the cross-sectional sample in general. In both samples the purchase quantity decreased from 2005 to 2006, while the mean price increased from $0.100/oz to $0.106/oz. In 2006 approximately 80% of the panel sample is in the highest income groups, with 9% in the lowest income groups.

Approximately 33% of the sample in both years resided in counties that experienced an increase in Supercenters in 2006. 33% of households were exposed to increased Supercenter availability due either to an increase in the number of establishments in their respective counties (if they stayed) or to moving to counties with a higher number of establishments. Approximately 15% of households reside in counties that were under-served in 2005.

The demographic information indicates that approximately 59% of households were married, with 2.3 members on average. 37% and 36% of female household heads are employed full time in the cross-section and panel samples, respectively. The full-time employment rates for male household heads are slightly lower in the panel sample than in the cross-sectional sample, with both declining in 2006. The educational attainment of female household heads is consistently higher than that of male household heads. In 2005 and 2006, 79.3% and 80.6% of households in the panel sample, respectively, had no children under 18 in the household.


Table 6. Descriptive Statistics of Variables in the Cross-Sectional and Panel Samples.

                                                  Control               Treatment
                                                  (20,165 HHDs)         (10,104 HHDs, 33.38%)
Variable                                          Mean       Std        Mean       Std
F&V Quantity (annual, oz)                         1282.500   967.711    1289.250   1003.730
Price (average, $/oz)                             0.287      0.116      0.299      0.132
Poverty Income Ratio (PIR)                        3.841      2.461      4.111      2.587
Income
  Income 1 (= 1 if PIR < 1.30)                    0.078      0.268      0.070      0.256
  Income 2 (= 1 if 1.30 ≤ PIR < 1.85)             0.135      0.341      0.115      0.319
  Income 3 (= 1 if 1.85 ≤ PIR < 2.50)             0.133      0.340      0.115      0.320
  Income 4 (= 1 if 2.50 ≤ PIR < 4.00)             0.266      0.442      0.263      0.440
  Income 5 (= 1 if 4.00 ≤ PIR)                    0.388      0.487      0.436      0.496
Treatment
  Better Conv (= 1 if in 2006 hhd has access to   0.172      0.377      0.144      0.351
    increased number of Convenience)
  Better Spec (= 1 if in 2006 hhd has access to   0.286      0.452      0.443      0.497
    increased number of Specialty)
  Better FS (= 1 if in 2006 hhd has access to     0.527      0.499      0.654      0.476
    increased number of FS)
  Better QS (= 1 if in 2006 hhd has access to     0.591      0.492      0.683      0.465
    increased number of QS)
Underserved (= 1 if county has less than 1        0.819      0.385      0.724      0.447
  Supercenters per 314 sq. mile)
Married (= 1 if married)                          0.590      0.492      0.581      0.493
Household Size                                    2.257      1.234      2.251      1.227
Moved                                             0.031      0.174      0.014      0.116
Female Head Age
  No Female Head                                  0.103      0.304      0.111      0.314
  Under 45 Years                                  0.180      0.384      0.196      0.397
  45-54 Years                                     0.235      0.424      0.245      0.430
  55+ Years                                       0.483      0.500      0.448      0.497
Male Head Age
  No Male Head                                    0.276      0.447      0.277      0.448
  Under 45 Years                                  0.140      0.347      0.156      0.363
  45-54 Years                                     0.192      0.394      0.196      0.397
  55+ Years                                       0.392      0.488      0.371      0.483
At least one child under 18                       0.020      0.141      0.025      0.155
Female Head Employment
  Part time                                       0.144      0.351      0.142      0.349
  Full time                                       0.356      0.479      0.366      0.482
Male Head Employment
  Part time                                       0.051      0.219      0.051      0.220
  Full time                                       0.415      0.493      0.433      0.496
Female Head Education
  High school or less                             0.300      0.458      0.273      0.446
  Some college or more                            0.597      0.490      0.616      0.486
Male Head Education
  High school or less                             0.242      0.429      0.219      0.414
  Some college or more                            0.482      0.500      0.504      0.500
Hispanic                                          1.938      0.241      1.951      0.216
Race non-white                                    0.148      0.355      0.182      0.386

Results

Table 7 shows the OLS estimates from the panel analysis. The results for the DD and DDD specifications are presented for the pooled sample (columns 1 and 2) and for subsamples by income group (columns 3 to 12). The dependent variable is the natural logarithm of the number of ounces of F&V purchased. All standard errors are corrected for heteroskedasticity, but not for intragroup correlation (Donald and Lang, 2007; Moulton, 1990; Moulton, 1986; Conley and Taber, 2011).

In general, the results do not support either hypothesis that this research addresses. There is no indication that food availability has any causal effect on food choice.


Table 7. Impact of Improved Grocery Access on Household Fruit and Vegetable Purchases: Difference-in-Difference and Triple Difference Analysis of Panel Data.

Columns (1) to (6)

                                    Full Sample         Income 1            Income 2
                                                        PIR < 1.30          1.30 ≤ PIR < 1.85
                                    DD        DDD       DD        DDD       DD        DDD
                                    (1)       (2)       (3)       (4)       (5)       (6)
(Year_2006)(Better Supercenters)    -0.032    -0.048    -0.043    0.022     -0.010    0.021
                                    (0.022)   (0.045)   (0.102)   (0.070)   (0.059)   (0.054)
(Year_2006)(Better Convenience)     -0.028    -0.028    -0.091    -0.090    -0.002    -0.003
                                    (0.025)   (0.025)   (0.090)   (0.053)   (0.079)   (0.042)
(Year_2006)(Better Specialty)       0.019     0.019     -0.079    -0.080    0.007     0.006
                                    (0.021)   (0.021)   (0.077)   (0.045)   (0.060)   (0.034)
(Year_2006)(Better FS)              0.004     0.004     0.013     0.012     0.004     0.004
                                    (0.019)   (0.019)   (0.077)   (0.042)   (0.049)   (0.031)
(Year_2006)(Better QS)              0.022     0.013     0.100     0.046     0.053     0.078
                                    (0.020)   (0.044)   (0.075)   (0.067)   (0.050)   (0.051)
F-Value                             5.05      5.04      2.43      2.42      2.09      2.08
                                    (<.0001)  (<.0001)  (<.0001)  (<.0001)  (<.0001)  (<.0001)
R-Square                            0.1717    0.1727    0.4459    0.4459    0.3259    0.3259
Adj R-Sq                            0.1385    0.1384    0.2625    0.2619    0.1696    0.1693
Sample Size                         60,528    60,528    4,561     4,561     7,753     7,753

Columns (7) to (12)

                                    Income 3            Income 4            Income 5
                                    1.85 ≤ PIR < 2.50   2.50 ≤ PIR < 4.00   4.00 ≤ PIR
                                    DD        DDD       DD        DDD       DD        DDD
                                    (7)       (8)       (9)       (10)      (11)      (12)
(Year_2006)(Better Supercenters)    -0.021    0.033     -0.040    -0.096    -0.031    -0.043
                                    (0.062)   (0.168)   (0.041)   (0.086)   (0.032)   (0.067)
(Year_2006)(Better Convenience)     -0.118    -0.118    -0.048    -0.046    0.007     0.007
                                    (0.071)   (0.071)   (0.048)   (0.048)   (0.037)   (0.037)
(Year_2006)(Better Specialty)       0.040     0.036     0.089*    0.091*    0.004     0.004
                                    (0.066)   (0.066)   (0.040)   (0.040)   (0.030)   (0.030)
(Year_2006)(Better FS)              0.021     0.017     0.021     0.022     -0.021    -0.022
                                    (0.061)   (0.060)   (0.037)   (0.037)   (0.029)   (0.029)
(Year_2006)(Better QS)              -0.032    -0.106    -0.002    -0.014    0.019     0.010
                                    (0.064)   (0.167)   (0.038)   (0.081)   (0.031)   (0.068)
F-Value                             1.92      1.92      2.85      2.85      3.13      3.12
                                    (<.0001)  (<.0001)  (<.0001)  (<.0001)  (<.0001)  (<.0001)
R-Square                            0.3150    0.3151    0.2543    0.2544    0.1880    0.1881
Adj R-Sq                            0.1511    0.1508    0.1651    0.1651    0.1279    0.1278
Sample Size                         7,697     7,697     16,037    16,037    24,480    24,480

Triple-interaction terms (one estimate per sample)

                                                Full Sample  Income 1     Income 2     Income 3     Income 4     Income 5
(Year_2006)(Underserved)(Better Supercenters)   0.020        -0.086       -0.041       -0.072       0.074        0.016
                                                (0.051)      (0.092)      (0.068)      (0.177)      (0.096)      (0.075)
(Year_2006)(Underserved)(Better QS)             0.010        0.071        -0.031       0.095        0.013        0.010
                                                (0.024)      (0.084)      (0.063)      (0.176)      (0.091)      (0.076)
(Year_2006)(Better Supercenters)(Moved)         -0.022       0.076        -0.020       -0.087       0.108        -0.040
                                                (0.044)      (0.152)      (0.107)      (0.103)      (0.088)      (0.077)

The parameter estimates of interest in the panel models, namely the estimates on the variables that indicate whether (i) households that have better access in 2006 than in 2005 purchase more, and (ii) households that are from underserved areas and have better access in 2006 than in 2005 purchase more, are presented in Table 7. None of these estimates is significant in any of the models.

Based on the estimation results from the pooled sample and the Income 1 group subsample, households that were exposed to a better retail environment due to moving purchased significantly more (2.29 oz) than the control group. So did households from counties that experienced an increase in Supercenters and QS outlets (pooled sample and Income 1 subsample only). In the pooled model, households that resided in under-served areas and faced increased QS availability purchased significantly less F&V ( ).

Concluding Remarks

The literature findings on the linkage of diets with metabolic diseases motivated a host of empirical studies seeking to shed light on the reasons giving rise to particular food choices or food choice patterns. A favorite factor considered by the professional community and the public press is the neighborhood food environment, or food availability. Despite its logical appeal, there does not seem to be conclusive evidence in the literature on the existence or the nature of such a linkage. This research aims to shed light on some aspects of this question. In particular we seek to answer two questions: (1) is there a positive relationship between the number of stores available in each location and F&V consumption by the households in that location? and (2) is there a pronounced relationship in low-income, under-served neighborhoods?

We use national level purchase data for two years, 2005 and 2006, in a difference-in-difference type fixed effects OLS estimation to model the relationship between food availability and food choice. The analyses were replicated for cross-sectional and panel samples from these two years. Our results suggest no statistically significant association between food access and food choice. No statistical significance emerged when estimating the same models for subsamples determined by different income levels. Based on the current results, the conclusion we offer is that the objective of improving the population’s diet through increasing F&V consumption may not be attainable solely through increasing food retail outlet availability.


The shortcomings of this research may motivate future research that will help with a more comprehensive analysis. The current research disregards bordering effects: it views counties separately, not in the context of clusters. Replicating the current analysis with Metropolitan Statistical Areas as the unit of residence may help confirm the robustness of our findings. Given the separate findings in the literature, it may also prove helpful to conduct this analysis separately for fruits and vegetables, and separately for fresh and non-fresh produce.

Another shortcoming that could possibly confound the true effects is disregarding the magnitude or the length of the treatment: we account for an increase, but not for the size of the increase or the time span of the exposure to the supply shock. Our indicator variable for the increase does not account for how many more stores were added to the neighborhood, or how big the new stores are. Nor do the current data allow for estimating improvements in existing stores, which may be useful information to account for.


VI. The Effect of Food Store Access and Income on Household Food Purchases: An AIDS

Demand Analysis

This Section builds on Section IV above by extending the analysis beyond F&Vs. Our aim is to

build a demand model for major food groups mentioned in the Dietary Guidelines for a balanced

mixture of healthy and unhealthy foods. The food groups considered for this analysis are fruits,

vegetables, whole grains, milk, meats (not including fish and shellfish), eggs, oils and fats,

carbonated soft drinks and other sugary drinks, snacks and alcohol. The focus on a basket of

foods rather than any particular food is noteworthy for explaining the maximum impact of supply

shocks on diet in general.

Model

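The Almost Ideal Demand System (AIDS) can be expressed as follows:

$$w_i = \alpha_i + \sum_{k=1}^{n} \gamma_{ik} \ln p_k + \beta_i \ln\left(\frac{x}{P}\right) \qquad (17)$$

where $w_i$ is the budget share of the $i$th food, $i = 1, \dots, n$, $p_k$ is the price of the $k$th food, $x$ is the expenditure on all foods, and $\ln(P) = \alpha_0 + \sum_{k} \alpha_k \ln p_k + \frac{1}{2} \sum_{k} \sum_{j} \gamma_{kj} \ln p_k \ln p_j$.

The store density variables are appended to (17) through a linear translation

$$\alpha_i = \alpha_{i0} + \sum_{s=1}^{S} \delta_{is} z_s \qquad (18)$$

where $z_s$ is the number of retail outlets of type $s$, $s = 1, \dots, S$.

The restrictions for additivity, symmetry and homogeneity were imposed. The Stone price index was constructed as a linear approximation of the AIDS price index. Finally, block separability was assumed to estimate the demand system (17).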
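The Stone price index mentioned above replaces the nonlinear AIDS price index with a share-weighted sum of log prices, which keeps the share equations linear in the parameters. The sketch below shows that calculation for one household. The function names, food groups and numbers are hypothetical and serve only as an illustration.

```python
import math


def budget_shares(expenditures):
    """Share of total food expenditure spent on each food group."""
    total = sum(expenditures.values())
    return {good: spend / total for good, spend in expenditures.items()}


def stone_log_price_index(shares, prices):
    """Stone index: ln P* = sum over goods of share * ln(price)."""
    return sum(shares[good] * math.log(prices[good]) for good in shares)


def log_real_expenditure(expenditures, prices):
    """ln(x / P*), the real expenditure term of a linearized AIDS share equation."""
    shares = budget_shares(expenditures)
    total = sum(expenditures.values())
    return math.log(total) - stone_log_price_index(shares, prices)


# Toy household: expenditures in dollars and prices in dollars per unit.
toy_expenditures = {"fruits": 30.0, "vegetables": 20.0, "snacks": 50.0}
toy_prices = {"fruits": 2.0, "vegetables": 1.0, "snacks": 4.0}
toy_shares = budget_shares(toy_expenditures)
toy_log_index = stone_log_price_index(toy_shares, toy_prices)
```

Because the weights are the household's own budget shares, the index can be computed directly from the purchase data before estimation.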


Data and Summary Statistics

We draw on 2006 County Business Patterns and Population Estimates, U.S. Census Bureau, to delineate the food retail environment and the population/area estimates for the geographical units in our analysis. We then align the information on food access with actual household purchase data from the Nielsen panel from the same areas or counties for the same period. Purchase prices and expenditures on 10 food groups for 37,520 households in the Nielsen panel for 2006 were used for the analysis. The shares of expenditure on each food group for the pooled sample and subsamples by income group are presented in Table 8. The households in the high income subsample (PIR ≥ 4.00) spend a larger proportion of their expenditures on fruits, vegetables, whole grains and alcohol. The subsample in the lowest income group (PIR < 1.30) spends a larger proportion of its expenditures on all other food groups than the highest income subsample. Similarly, the subsample in the highest income group has more access to all kinds of retail food outlets than the lowest income group.

Table 8. Shares of Food Expenditures and Food Access by Income Group.

Food Group      Full Sample    Income 1       Income 2       Income 3       Income 4       Income 5
                               PIR < 1.30     1.30≤PIR<1.85  1.85≤PIR<2.50  2.50≤PIR<4.00  4.00 ≤ PIR

Mean Food Expenditure Shares
Fruits          0.101          0.087          0.100          0.097          0.098          0.107
Vegetables      0.090          0.085          0.090          0.091          0.090          0.091
Whole Grains    0.031          0.026          0.031          0.030          0.031          0.032
Milk            0.070          0.080          0.078          0.076          0.071          0.064
Meats           0.128          0.140          0.129          0.132          0.130          0.123
Eggs            0.015          0.017          0.017          0.016          0.015          0.014
Oils            0.040          0.042          0.044          0.042          0.040          0.038
Sugary Drinks   0.119          0.137          0.121          0.120          0.120          0.115
Snacks          0.319          0.332          0.329          0.324          0.325          0.309
Alcohol         0.086          0.056          0.060          0.072          0.079          0.108

Retail Food Density
Supermarket     20.48          16.87          16.01          17.15          19.96          23.93
Convenience     2.47           2.12           1.98           2.18           2.38           2.82
Specialty       5.72           4.56           4.47           4.75           5.52           6.77
Full Service    125.07         99.23          99.19          107.17         121.97         145.53
Quick Service   210.58         168.09         167.39         182.40         204.51         244.58

Results

Table 9 shows the estimates from the AIDS analysis. The results indicate a mild increase in the demand for fruits and whole grains per 1% increase in supermarkets for the pooled sample. The results for subsamples by income group indicate that the effect is due to the positive significant elasticities in the highest income groups (Income 5 for fruits and Income 4 for whole grains). The demand for meat in the pooled sample and in all income subsamples demonstrates positive significant elasticities with respect to supermarket density. This indicates that a 1% increase in supermarkets in the county induces a positive proportional increase in meat demand for households regardless of income group. A 1% increase in supermarket availability leads households in the lowest income group to increase their demand for carbonated sweet drinks and to decrease their demand for snacks. The highly significant negative elasticity of the demand for alcohol in the pooled sample is largely driven by that of the highest income groups. The demand for vegetables and milk seems to be non-responsive to changes in supermarket availability.

Table 9. Supermarket Elasticities of Food Groups by Income Groups and for the Pooled Sample.

Food Group      Full Sample    Income 1       Income 2       Income 3       Income 4       Income 5
                               PIR < 1.30     1.30≤PIR<1.85  1.85≤PIR<2.50  2.50≤PIR<4.00  4.00 ≤ PIR

Fruits          0.0135*        -0.0194        -0.0018        -0.0062        0.0037         0.0329**
                (0.0058)       (0.0222)       (0.0153)       (0.0144)       (0.0115)       (0.0096)
Vegetables      0.0075         -0.0166        0.0045         0.0150         0.0138         0.0102
                (0.0046)       (0.0166)       (0.0121)       (0.0115)       (0.0090)       (0.0075)
Whole Grains    0.0235**       -0.0052        0.0220         0.0189         0.0337*        0.0237
                (0.0086)       (0.0335)       (0.0236)       (0.0210)       (0.0169)       (0.0138)
Milk            -0.0046        0.0074         0.0282         -0.0280        -0.0084        0.0049
                (0.0063)       (0.0208)       (0.0160)       (0.0146)       (0.0121)       (0.0108)
Meats           0.0479**       0.0669**       0.0418**       0.0382**       0.0576**       0.0558**
                (0.0054)       (0.0175)       (0.0135)       (0.0123)       (0.0104)       (0.0094)
Eggs            0.0083         0.0007         0.0466*        -0.0106        0.0392**       -0.0043
                (0.0072)       (0.0283)       (0.0183)       (0.0160)       (0.0140)       (0.0122)
Oils            0.0110         -0.0192        -0.0077        0.0305*        0.0230*        0.0167
                (0.0057)       (0.0214)       (0.0157)       (0.0137)       (0.0108)       (0.0095)
Sugary Drinks   0.0018         0.0474*        0.0179         -0.0116        -0.0154        0.0008
                (0.0061)       (0.0205)       (0.0166)       (0.0147)       (0.0119)       (0.0101)
Snacks          -0.0058        -0.0329**      -0.0181*       0.0021         0.0038         -0.0088
                (0.0031)       (0.0103)       (0.0081)       (0.0073)       (0.0059)       (0.0054)
Alcohol         -0.0869**      -0.0264        -0.0865*       -0.0649        -0.1313**      -0.0952**
                (0.0127)       (0.0541)       (0.0415)       (0.0345)       (0.0266)       (0.0188)
Sample Size     37,520         2,825          4,613          5,272          9,376          15,433
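One way to relate the elasticity estimates to their standard errors is the ratio of the two, compared against the standard normal distribution. The sketch below does this for the pooled-sample fruit elasticity. The report does not state which test statistic or significance thresholds lie behind the asterisks, so the large-sample normal approximation and the function names here are assumptions for illustration.

```python
import math


def z_ratio(estimate, std_error):
    """Ratio of an estimate to its standard error."""
    return estimate / std_error


def two_sided_p_value(z):
    """Two-sided p-value under a standard normal approximation.

    Uses P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


# Pooled-sample fruit elasticity and standard error as reported in Table 9.
z_fruits = z_ratio(0.0135, 0.0058)
p_fruits = two_sided_p_value(z_fruits)
```

With more than 37,000 households in the pooled sample, the normal approximation is a reasonable stand-in for a t distribution.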

Concluding Remarks

Our results suggest that the demand for fruits and whole grains responds positively to changes in supermarket availability only in the high income subsamples, leaving the poor largely non-responsive to changes in the retail environment. The demand for vegetables and milk is invariant to changes in supermarket availability regardless of income level. The demand for carbonated soft drinks in the lowest income group increases with increased supermarket availability. The effects on the demand for snacks are negative, indicating that households in the lowest income group would actually decrease their demand for snacks in response to a 1% increase in supermarket availability.

From these results it would appear that an increase in supermarket availability would increase demand for healthy food (fruits and whole grains) and decrease demand for ‘unhealthy’ food (alcohol) in the high income groups. In the low income groups, a 1% increase in supermarket availability appears to have no effect on the demand for any healthy food and to have ambiguous effects on the demand for unhealthy food (carbonated soft drinks and other sugary fruit drinks, and sweet and non-sweet snacks). These results suggest that increased availability will possibly have ambiguous effects on diet, especially that of the population in the low income groups.


VII. Conclusion

Limited access to healthy food has been linked to poor diets and adverse health effects, particularly in underserved and predominantly poor communities. The objective of this research is to identify the influence of food access factors on healthy food purchases using retail food availability data and detailed purchase scanner data. We demonstrate that the densities of supermarkets in Metropolitan Statistical Areas or counties induce significant positive effects on ‘healthy’ food demand in the high income groups. The effect on ‘healthy’ food purchases by the low income population is insignificant. On the other hand, the effects of increased availability on ‘unhealthy’ food consumption are ambiguous for the low income population.

The current findings of this project suggest that the objective of improving diet may be

unattainable solely by increasing supermarket or other large retailer availability, especially for

the low income population. To answer this question conclusively, more research is needed that

would account for the length and magnitude of increased availability.

References

Beaulac, J., E. Kristjansson, and S. Cummins. 2009. A Systematic Review of Food Deserts,

1966-2007. Preventing Chronic Disease 6(3):A105-A115.

Bitler, M., and S.J. Haider. 2011. An Economic View of Food Deserts in the United States.

Journal of Policy Analysis and Management 30(1):153-176.

Blanchard, T.C., and T.A. Lyson. 2002. Retail Concentration, Food Deserts, and Food

Disadvantaged Communities in Rural America. Article ed. Mississippi State University.

Bodor, J.N., D. Rose, T.A. Farley, C. Swalm, and S.K. Scott. 2007. Neighborhood Fruit and

Vegetable Availability and Consumption: the Role of Small Food Stores in an Urban

Environment. Public Health Nutrition 11(4):413-420.

Broda, C., E. Leibtag, and D.E. Weinstein. 2009. The Role of Prices in Measuring the Poor’s

Living Standards. Journal of Economic Perspectives 23(2): 77-97.

Chintagunta, P.K., D.C. Jain, and N.J. Vilcassim. 1991. Investigating Heterogeneity in Brand

Preferences in Logit Models for Panel Data. Journal of Marketing Research 28: 417-428.

Conley, T.G., and C.R. Taber. 2011. Inference with “Difference in Differences” with a Small

Number of Policy Changes. The Review of Economics and Statistics 93(1): 113-125.

Courtemanche, C., and A. Carden. 2011. Supersizing Supercenters? The Impact of Walmart

Supercenters on Body Mass Index and Obesity. Journal of Urban Economics 69: 165-181.

Cummins, S., M. Petticrew, C. Higgins, A. Findlay, and L. Sparks. 2005. Large Scale Food

Retailing as an Intervention for Diet and Health: Quasi-Experimental Evaluation of a Natural

Experiment. Journal of Epidemiology and Community Health 59: 1035-1040.

Donald, S.G., and K. Lang. 2007. Inference with Difference-in-Difference and Other Panel Data.

The Review of Economics and Statistics 89(2): 221-233.

Dong, D., and H. Stewart. 2012. Modeling a Household’s Choice among Food Store Types.

American Journal of Agricultural Economics 94(3): 702-717.

Fader, P.S., and J.M. Lattin. 1993. Accounting for Heterogeneity and Nonstationarity in a Cross-

Sectional Model of Consumer Purchase Behavior. Marketing Science 12(3): 304-317.

Fader, P.S., J.M. Lattin, and J.D.C. Little. 1992. Estimating Nonlinear Parameters in the

Multinomial Logit Model. Marketing Science 11(4): 372-385.

Guadagni, M.P., and J.D.C. Little. 1983. A Logit Model of Brand Choice Calibrated on Scanner

Data. Marketing Science 2(3): 203-238.

Hausman, J., and E. Leibtag. 2004. CPI Bias from Supercenters: Does the BLS Know That Wal-

Mart Exists? NBER Working Paper 10712.

Hausman, J., and E. Leibtag. 2007. Consumer Benefits from Increased Competition in Shopping

Outlets: Measuring the Effect of Wal-Mart. Journal of Applied Econometrics 22: 1157-1177.

Hellerstein, J., D. Neumark, and M. McInerney. 2008. Spatial Mismatch or Racial Mismatch?

Journal of Urban Economics 64: 464-479.

Holmes, T.J. 2011. The Diffusion of Wal-Mart and Economies of Density. Econometrica 79(1):

253-302.

Kyureghian, G., O. Capps Jr., and R. M. Nayga, Jr. 2011. A Missing Variable Imputation

Methodology with an Empirical Application. Advances in Econometrics 27: 313-337.

Kyureghian, G., and R.M. Nayga Jr. 2012(a). The Role of Food Access in Meeting Some

Dietary Guidelines: A Natural Experiment. Selected Paper for 2012 Agricultural & Applied

Economics Association Annual Meeting.

Kyureghian, G., and R.M. Nayga Jr. 2012(b). Food Store Access, Availability and Choice When

Purchasing Fruits and Vegetables. American Journal of Agricultural Economics. Forthcoming.

Kyureghian, G., R.M. Nayga, Jr., and S. Bhattacharya. 2012. The Effect of Food Store Access

and Income on Household Purchases of Fruits and Vegetables: A Mixed Effect Analysis at the

County and MSA levels. Applied Economic Perspectives and Policy. Forthcoming.

Laraia, B.A., A.M. Siega-Riz, C. Gundersen, and N. Dole. 2006. Psychosocial Factors and

Socioeconomic Indicators Are Associated with Household Food Insecurity among Pregnant

Women. The Journal of Nutrition 136: 177-182.

Larsen, K., and J. Gilliland. 2009. A Farmers’ Market in a Food Desert: Evaluating Impacts on

the Price and Availability of Healthy Food. Health & Place 15: 1158-1162.

Larson, N.I., M.T. Story, and M.C. Nelson. 2009. Neighborhood Environments: Disparities in

Access to Healthy Foods in the U.S. American Journal of Preventive Medicine 36(1):74-

81.e10.

Liese, A., S. Battersby, and B. Bell. 2012. Identifying Food Deserts in the Rural South: A

Comparison of Food Access Measures. Food Assistance Programs, Food Access and Choices,

and Obesity and Other Health Outcomes. Research Innovation and Development Grants in

Economics (RIDGE) Conference, Washington, DC, October 2012.

Littell, R.C., G.A. Milliken, W.W. Stroup, R.D. Wolfinger, and O. Schabenberger. 2006. SAS

for Mixed Models. Second Edition. Cary, NC, SAS Institute Inc.

McFadden, D. 1973. Conditional Logit Analysis of Qualitative Choice Behavior. in Frontiers in

Econometrics, ed. by P. Zarembka. New York: Academic Press, 105-142.

Michimi, A., and M.C. Wimberly. 2010. Associations of Supermarket Accessibility with Obesity

and Fruit and Vegetable Consumption in Conterminous United States. International Journal of

Health Geographics 9(49):1-14.

Moulton, B.R. 1990. An Illustration of a Pitfall in Estimating the Effect of Aggregate Variables

on Micro Units. The Review of Economics and Statistics 72: 334-338.

Moulton, B.R. 1986. Random Group Effects and the Precision of Regression Estimates. Journal

of Econometrics 32: 385-397.

Neckerman, K., M. Bader, M. Purciel, and P. Yousefzadeh. 2009. Measuring Food Access in

Urban Areas. Accessed at: http://www.npc.umich.edu/news/events/food-access/index.php.

Nevo, A. 2001. Measuring Market Power in the Ready-to-Eat Cereal Industry. Econometrica 69

(2): 307-342.

Pearson, T., J. Russell, M.J. Campbell, and M.E. Barker. 2005. Do ‘Food Deserts’ Influence

Fruit and Vegetable Consumption? – A Cross-Sectional Study. Appetite 45:195-197.

Rose, D.M., J.N. Bodor, C.M. Swalm, J.C. Rice, T.A. Farley, and P.L. Hutchinson. 2009.

Deserts in New Orleans? Illustrations of Urban Food Access and Implications for Policy. Accessed at:

http://www.npc.umich.edu/news/events/food-access/index.php.

Rose, D., and R. Richards. 2004. Food Store Access and Household Use among Participants in

the US Food Stamp Program. Public Health Nutrition 7(8):1081-1088.

Searle, S.R., G. Casella, and C.E. McCulloch. 1992. Variance Components. New York: John

Wiley & Sons.

Sharkey, J.R., and S. Horel. 2009. Characteristics of Potential Spatial Access to a Variety of

Fruits and Vegetables in a Large Rural Area. Accessed at:

http://www.npc.umich.edu/news/events/food-access/index.php.

Sharkey, J.R., S. Horel, and W.R. Dean. 2010. Neighborhood Deprivation, Vehicle Ownership, and

Potential Spatial Access to a Variety of Fruits and Vegetables in a Large Rural Area in Texas.

International Journal of Health Geographics 9:26.

Staus, A. 2009. Determinants of Store Type Choice in the Food Market for Fruits and

Vegetables. International Journal of Arts and Sciences 3(2): 138-174.

United States Department of Agriculture. 2009. Access to Affordable and Nutritious Food:

Measuring and Understanding Food Deserts and Their Consequences. Report to Congress.

Accessed at: http://www.ers.usda.gov/publications/AboutPDF.

Wooldridge, J. M. 2002. Econometric Analysis of Cross Section and Panel Data. Massachusetts

Institute of Technology Press.

Xu, L., and J. Shao. 2009. Estimation in Longitudinal or Panel Data Models with Random-Effect-

Based Missing Responses. Biometrics 65: 1175-1183.