On the Meaning of Customer Satisfaction A Study in the Context of Retail Banking
Maarten Terpstra
Printed by: Offsetdrukkerij Ridderprint B.V., Ridderkerk ISBN/EAN: 978-90-5335-171-0 Copyright: © Maarten Terpstra
Doctoral thesis
submitted to obtain the degree of doctor at Tilburg University, on the authority of the
rector magnificus, prof. dr. F.A. van der Duyn Schouten, to be defended in public before a
committee appointed by the doctorate board, in the auditorium of the
University on Friday 14 November 2008 at 14:15 by
Maarten Jan Terpstra
born on 24 August 1969 in Boxmeer
Supervisors (promotores): Prof. dr. A.A.A. Kuijlen
Prof. dr. K. Sijtsma
Preface
There exists confusion about the meaning of psychological properties. This is because a
psychological property is not a thing within a person, but an organisational principle with
respect to behaviour of persons. This may sound odd, but it means that a psychological
property is a theoretical concept which we use to interpret and describe behaviour of persons.
In recent years I studied the meaning of the psychological property satisfaction. The results
of the study are reported in this thesis.
First of all, I want to express my gratitude to my promotores Ton Kuijlen and Klaas
Sijtsma. They taught me how to do scientific research, they helped and inspired me, and I
have enjoyed our cooperation. I am also grateful to ING for facilitating my study, and to
many colleagues from ING for their support and interest in my results. Furthermore, I thank
Tom Breur for his support and his feedback on the many drafts he read. Most of all, I thank
Monique for her confidence in me finishing my study and for her unconditional support
throughout the years.
Contents
Chapter 1 Introduction
Chapter 2 Measurement of psychological constructs
Chapter 3 The theoretical meaning of customer satisfaction
Chapter 4 Deductive design for test development and construct validation
Chapter 5 Method of the first empirical study into customer satisfaction with BANK
Chapter 6 Results of the first empirical study into customer satisfaction with BANK
Chapter 7 Method of the second empirical study into customer satisfaction with BANK
Chapter 8 Results of the second empirical study into customer satisfaction with BANK
Chapter 9 General discussion
References
Samenvatting (Summary in Dutch)
Appendices
Chapter 1
Introduction
1 Introduction
Satisfaction is an important concept in societal contexts, business contexts, and academic
contexts. This is evidenced by the vast number of studies that were conducted with respect to
satisfaction in various contexts. Ironically, satisfaction seems to be a somewhat elusive
phenomenon. It is as Oliver (1997, p. 13) noted: ‘Everyone knows what satisfaction is, until
asked to give a definition. Then it seems, nobody knows.’ This warrants further research into
the meaning of satisfaction.
The subject of this thesis is the unravelling of the meaning of customer satisfaction in
the context of retail banking. The phrase meaning of customer satisfaction has multiple
connotations. In this thesis, it refers to (a) the linguistic use of the term customer satisfaction,
(b) the theoretical framework of customer satisfaction, (c) the empirical indicators of
customer satisfaction, and (d) the importance of customer satisfaction in the domain of retail
banking. The thesis includes a theoretical study of customer satisfaction and an empirical
study into customer satisfaction with a major Dutch retail bank.
2 A typology of satisfaction studies
Satisfaction was studied in various settings and at various levels of aggregation (e.g., Oliver,
1997, pp. 15-17). This is reflected by the use of different terms, such as job satisfaction, life
satisfaction, consumer satisfaction, customer satisfaction, transaction-specific satisfaction,
attribute satisfaction, service satisfaction, summary satisfaction, and aggregated satisfaction.
The types of satisfaction are mutually related by what Wittgenstein (1953) labeled family
resemblances, meaning that they are mutually related in diverse ways. For example, consumer
satisfaction and customer satisfaction are closely related since both pertain to the satisfaction
response to consumption-related experiences, and these two terms were used more or less
interchangeably in the marketing literature (e.g., Giese & Cote, 2000). However, because
customer satisfaction is only appropriate for satisfaction in commercial contexts and
consumer satisfaction may also be used for satisfaction in other contexts, the domain of
consumer satisfaction is larger than the domain of customer satisfaction.
There are also differences within each type of satisfaction with respect to the
characteristics of the satisfaction response. For example, the consumer satisfaction response
to dinner in a restaurant differs from the consumer satisfaction response to dental treatment.
Whereas the former satisfaction response may encompass a feeling of pleasure, the latter
satisfaction response may encompass a feeling of relief. Furthermore, a consumer satisfaction
response may reflect anhedonic cognitions (Oliver, 1997, p. 318), meaning that it reflects
cognitions that are not emotionally processed. An example is the consumer satisfaction
response to using a pencil.
It is useful to examine the difference between two types of satisfaction studies, which
are (a) studies that are conducted at the individual person level, and (b) studies that are
conducted at higher levels of aggregation. The first type of satisfaction studies is characterised
by analyses of person data. These are, for example, studies of satisfaction of persons with
single encounters with a phenomenon (i.e., transaction-specific satisfaction; Oliver, 1997, p.
15), or studies of satisfaction of persons with the accumulation of encounters with a
phenomenon (i.e., summary satisfaction; Oliver, 1997, p. 15).
The second type of satisfaction studies is conducted at higher levels of aggregation, such
as a firm, an industry, or a society (Oliver, 1997, p. 15). These studies are characterised by the
analysis of satisfaction data that are aggregated at the level of firms, industries, or societies.
For example, several theorists (e.g., Anderson, Fornell, & Lehmann, 1994; Anderson, Fornell,
& Mazvancheryl, 2004; Anderson & Mittal, 2000; Gruca & Rego, 2005) used satisfaction
data at the firm level to study the connections between satisfaction and economic performance
of firms.
Thus, there are different types of satisfaction and different types of satisfaction studies.
The present satisfaction study is conducted at the individual person level, and is limited to
persons’ summary satisfaction with a company of which they are customers. We refer to this
kind of satisfaction as customer satisfaction (see also Chapter 3).
3 Satisfaction research in the marketing domain
Satisfaction is an important concept in marketing theory. Consequently, there is a vast number
of studies into satisfaction in the marketing literature (e.g., Giese & Cote, 2000; Oliver, 1997;
Yi, 1990). Most studies dealt with satisfaction of consumers or customers with products or
services or companies providing products or services. In these studies, satisfaction is often
labeled consumer satisfaction or customer satisfaction, and is often measured by means of a
psychological test that is administered in survey research (Section 4).
Marketing theorists generally agree that satisfaction is a response to consumption-
related experiences (e.g., Anderson, Fornell & Lehmann, 1994; Giese & Cote, 2000; Oliver,
1997; Tse & Wilton, 1988; Yi, 1990). Still, there exists a variety of definitions and measures of
satisfaction in academic marketing research (e.g., Giese & Cote, 2000; Peterson & Wilson,
1992). Furthermore, the term satisfaction is sometimes applied to antecedents and sometimes
to consequences of satisfaction (Oliver, 1997, p. 15). The measurements of these antecedents
and consequences are sometimes used as proxies for satisfaction. Examples of concepts used
as proxies for satisfaction are quality perceptions, recommendation intentions, loyalty,
behaviour, and profits. Although these concepts may serve the purpose of specific studies,
they do not coincide with satisfaction. The use of these concepts as proxies for satisfaction
has contributed to the confusion about the meaning of satisfaction (Oliver, 1997, pp. 15-17).
On the basis of a review of the literature, Giese and Cote (2000) demonstrated a number
of deficiencies in the definition and measurement of satisfaction in studies that were
conducted in the last three decades. These deficiencies pertain to (a) the explication of the
definition of satisfaction, (b) the justification of the definition of satisfaction, and (c) the
justification of the measurement of satisfaction. The deficiencies hampered the development
and validation of satisfaction theory (e.g., Giese & Cote, 2000; Yi, 1990). Giese and Cote
(2000) argued that, as there exist multiple definitions of satisfaction, a researcher must
explicitly define satisfaction and justify the definition selected. Because it is impossible to
develop a universal definition of satisfaction, which is caused by the complexity and the
context-specific nature of satisfaction, they recommended the development of context-specific
definitions of satisfaction. This stance implies that measures of satisfaction should also be
context-specific, because the measure should match the definition of satisfaction. Giese and
Cote (2000) proposed a framework to guide researchers in developing a context-specific
definition and a corresponding measurement procedure for satisfaction.
The meaning of satisfaction thus is context-dependent. There are similarities and
differences in the meaning of satisfaction in different domains. Satisfaction with a retail bank
has both similarities and differences with satisfaction with dinner in a restaurant, satisfaction
with consumption of non-durable consumer goods, and satisfaction with consumption of
durable consumer goods. All pertain to the fulfilment response (Oliver, 1997, p. 13), but the
characteristics of the satisfaction response and the nomological network (Cronbach & Meehl,
1955) of satisfaction differ between these domains. These differences warrant the
development of context-specific definitions and corresponding measurement procedures for
satisfaction, as proposed by Giese and Cote (2000). Therefore, the first objective of this study
was to explore the theoretical meaning of customer satisfaction in the context of retail
banking, and to develop a context-specific definition and measurement procedure for
customer satisfaction.
4 Measurement of satisfaction
Satisfaction is a psychological property. Psychological properties are mostly conceived of as
theoretical constructions, which are labeled psychological constructs (e.g., Lord & Novick,
1968, p. 352; Nunnally, 1978, p. 96) and which may be measured by means of psychological
tests and psychological questionnaires (e.g., Molenaar, 1995; Oosterveld, 1996; Schouwstra,
2000). Psychological tests and psychological questionnaires are instruments (e.g., well-chosen
sets of items that are administered in a survey) that are assumed to elicit behaviour (e.g., the
responses of a person to the items administered in the survey) that is representative of the
property of interest. The position of the person on the property is inferred from the response
behaviour of the person (e.g., Molenaar, 1995). In the psychometric literature, the phrase test
is often used when maximum performance is measured (e.g., as with educational testing and
intelligence testing) and the phrase questionnaire when typical behaviour is measured (e.g., as
with personality traits and attitudes). Because test has gained a wider use in psychological
measurement (e.g., Cronbach, 1971, p. 443; Murphy & Davidshofer, 1991, p. 8; Schouwstra,
2000, pp. 56-77) we prefer to use it also in this thesis for measurement instruments for typical
behaviour.
Validity of measurement is a key success factor in satisfaction research and in
marketing research in general. This has been broadly acknowledged since the influential papers of
Jacoby (1976), Churchill (1979), and Peter (1981). First, academic studies in this domain
increasingly discuss the convergent, divergent, and nomological validity of measurements of
the constructs of interest. This is in accordance with suggestions by Cronbach and Meehl
(1955), Campbell and Fiske (1959), Churchill (1979), and Peter (1981). Second,
measurements of psychological constructs in academic marketing research are generally
based upon multiple-item instruments. This is in accordance with psychometric theory, which
postulates that single items often yield inadequate measurements of constructs (e.g., Messick,
1989, pp. 14, 35).
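The advantage of multiple-item instruments over single items can be illustrated with the Spearman-Brown formula from classical test theory. The sketch below is an illustration only and is not taken from this thesis: it assumes parallel items with a common inter-item correlation, and the function name spearman_brown and the value 0.30 are hypothetical choices made for the example.

```python
def spearman_brown(k: int, r: float) -> float:
    """Reliability of the sum of k parallel items.

    r is the (assumed common) correlation between any two items, which
    equals the reliability of a single item under this assumption.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * r / (1 + (k - 1) * r)


# Hypothetical inter-item correlation of 0.30: reliability grows with test length.
for k in (1, 3, 5, 10):
    print(k, round(spearman_brown(k, 0.30), 2))
```

Under these assumptions a single item is no more reliable than its correlation with a parallel item, whereas the sum of several such items is markedly more reliable.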
The interest in validity of measurement by no means implies that the issues with regard
to validity are resolved. A review of the marketing literature demonstrates a serious problem
regarding the definition and measurement of psychological constructs such as satisfaction
(e.g., Giese & Cote, 2000; Hausknecht, 1990; Peterson & Wilson, 1992; Yi, 1990). For
example, Verhoef (2001, p. 129) noticed that attribute-based measures of satisfaction differ
from affective measures of satisfaction, and that the latter measures of satisfaction have strong
resemblance with measures of affective commitment. Thus, different studies use different
labels for the same construct or use the same label for different constructs, and such
conceptual ambiguities slow down scientific progress.
The practice of validation of measurements of psychological constructs often is not
consistent with theory of validity, and has been criticised by validity theorists. This criticism
includes the practice of validation research in satisfaction studies. The assessment of
convergent, divergent, and nomological validity (Campbell & Fiske, 1959; Churchill, 1979;
Cronbach & Meehl, 1955) does not cover the major threats to construct validity, which are
construct underrepresentation and irrelevant variance (e.g., Messick, 1989, 1995; Schouwstra,
2000). Cronbach (1989) characterised most applications of the multitrait-multimethod design
(Campbell & Fiske, 1959) as mindless and mechanical, involving the collection of facts with
little concern for their usefulness for construct validation. Borsboom, Mellenbergh, and Van
Heerden (2004) criticised the practice of assessing nomological validity, and proposed to
assess validity on the basis of the test of a causal theory regarding the relation between the
property of interest and response behaviour.
Validity theorists (e.g., Anastasi, 1988; Borsboom et al., 2004; Messick, 1989;
Schouwstra, 2000) agree that construct validation has to start at the outset of test
development. This implies that the methodology of validation research should incorporate a
methodology of test development. The second objective of the present study is the selection
of a methodology for the development of a test for customer satisfaction and the validation of
test scores that is in line with validity theory.
5 Importance of satisfaction
Customer satisfaction is expected to influence customer behaviour, customer profitability, and
company profitability (e.g., Anderson & Mittal, 2000; Fornell, 1992; Oliver, 1997).
Therefore, customer satisfaction is considered of strategic importance for companies in many
retail markets, including the Dutch market for retail banking (e.g., Goedee, Reijnders, & Van
Thiel, 2008). During the present study, the Dutch market for retail banking was a mature and
competitive market. Most of the market was divided among six large retail banks. They all
offered a broad range of financial products, including current accounts, savings accounts,
credit cards, loans, mortgages, mutual funds, and insurance products. A number of these products were
also offered by insurance companies and various niche players. Virtually every Dutch adult
owned at least a current account, and most owned a variety of financial products. Most of
them had products from different financial companies.
Fornell (1992) argued that customer satisfaction is a key success factor for companies
that operate in mature and competitive markets. In these markets, company growth is
accomplished at the expense of competing firms, and retention of customers is of major
importance for companies in these markets (Fornell, 1992; Reichheld & Sasser, 1990).
Customer satisfaction is considered a key success factor for these companies, because it is
expected to affect retention of customers and to provide a defence against offensive strategies
by its competitors (Fornell & Wernerfelt, 1987, 1988).
Longitudinal studies (e.g., Anderson, Fornell, & Lehmann, 1994; Anderson & Mittal,
2000; Gruca & Rego, 2005) demonstrated a relation between customer satisfaction and future
financial results of companies. The results of these studies strengthen the expectation that
customer satisfaction influences customer profitability. If customer satisfaction influences
customer profitability, there must be a relation between customer satisfaction at time t = 0 and
customer profitability at time t > 0. However, longitudinal studies conducted at the person
level and exploring the relation between customer satisfaction and future customer
profitability are rare in the marketing literature. Therefore, the third objective of this study is
to explore the latter relation on the basis of longitudinal data.
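The kind of person-level relation meant here can be sketched as a correlation between satisfaction scores at t = 0 and profitability at a later time. The example below is a minimal illustration, not the analysis used in this thesis: the data are invented, and the names pearson, satisfaction_t0, and profit_t1 are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented example data for six customers (not from the thesis):
satisfaction_t0 = [2.0, 3.5, 4.0, 1.5, 4.5, 3.0]      # score at t = 0
profit_t1 = [40.0, 95.0, 120.0, 30.0, 150.0, 80.0]    # profit at t > 0

r = pearson(satisfaction_t0, profit_t1)
print(round(r, 2))
```

A correlation of this kind is only a necessary condition for the expected influence; by itself it does not show that satisfaction causes later profitability.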
6 Research goal
Deficiencies in the definition and measurement of satisfaction have hampered the
development and validation of satisfaction theory (e.g., Giese & Cote, 2000; Peterson &
Wilson, 1992; Yi, 1990). The usefulness of satisfaction research for the development of
satisfaction theory may be increased by the resolution of these deficiencies. Because
psychometrics is concerned with the measurement of psychological constructs such as
satisfaction, psychometric methods may serve to overcome these deficiencies. This thesis
aims at contributing to the improvement of the methodology of satisfaction research by the
use of psychometric methods for the definition and measurement of customer satisfaction.
Furthermore, the thesis aims at contributing to the development and validation of satisfaction
theory by means of a study into the meaning of customer satisfaction in the context of retail
banking.
In order to meet the research goal, the thesis addresses four research questions:
1. What is a suitable methodology for test development and construct validation in the
domain of satisfaction research?
2. What is the theoretical meaning of customer satisfaction in the context of retail
banking?
3. What is the empirical meaning of customer satisfaction in the context of retail
banking?
4. What is the importance of customer satisfaction in the context of retail banking?
7 Contents of the thesis
This thesis encompasses three components, which are (a) a theoretical study into the
measurement of psychological constructs and the validity of measurement, (b) a theoretical
study into the meaning of customer satisfaction and customer dissatisfaction, and (c) two
empirical studies into customer satisfaction with a major Dutch retail bank. The empirical
studies were based on survey research that was conducted among customers of the bank.
Chapter 2 addresses the measurement of psychological constructs. The chapter starts
with an introduction into the conception of psychological constructs, the different approaches
to test development, and the measurement process. Subsequently, the theory of validity of
measurement is discussed. The chapter ends with the choice of the appropriate methodology
for test development and construct validation for this study.
Chapter 3 discusses the theoretical meaning of customer satisfaction. The chapter starts
with an exploration of the theory on customer satisfaction and customer dissatisfaction, and
the conceptions of satisfaction and dissatisfaction in these theories. Subsequently, the
nomological network of customer satisfaction is explored. On the basis of these explorations,
a definition of customer satisfaction in the domain of retail banking is provided.
Chapter 4 discusses the deductive design (Schouwstra, 2000). The chapter starts with an
explication of the deductive design, which is a methodology for test development and
construct validation for personality traits and attitude-like properties. Subsequently, the theory
of violators (Oort, 1996), the purpose of the empirical study, the development of the test for
customer satisfaction with a retail bank, the outline of the measurement model, and the
hypotheses regarding the validity of measurement of customer satisfaction are addressed.
The purpose of the first empirical study was to measure customer satisfaction with a
retail bank, to investigate the validity of the measurement of customer satisfaction, and to
explore the relation between customer satisfaction and future customer profitability. Chapter 5
addresses the method of the first empirical study. The chapter includes a discussion of the
measurement instruments that were applied in this study, the questionnaire, the pre-tests, the
pilot study, and the main study.
Chapter 6 presents the results of the first empirical study. The chapter starts with the
discussion of the preliminary data analyses. Subsequently, the measurement analyses and the
tests of the hypotheses are discussed. Next, the relation between customer satisfaction and
future customer profitability is further explored. The chapter concludes with a discussion of
the meaning of the results of the empirical study for the assessment of the validity of
measurement of customer satisfaction.
The purpose of the second empirical study was to test hypotheses regarding the validity
of measurement that were not addressed in the first empirical study. Chapter 7 addresses the
method of the second empirical study. The chapter includes a discussion of the measurement
instruments that were applied in this study, the questionnaire, the sample, and the data
collection.
Chapter 8 presents the results of the second empirical study. The chapter includes a
discussion of the preliminary analyses, the measurement analyses, and the tests of the
remaining hypotheses regarding the validity of the measurements of customer satisfaction.
The chapter concludes with a discussion of the meaning of the results of the study for the
assessment of the validity of the measurements of customer satisfaction.
Chapter 9 is the general discussion. It discusses the results from this study and their
implications for customer satisfaction theory and marketing measurement.
Chapter 2
Measurement of psychological constructs
1 Introduction
A psychological construct such as satisfaction is a theoretical construction with both linguistic
and empirical content. This means that a psychological construct is a term with (a) linguistic
meaning, like any linguistic term, and (b) relations with empirical phenomena, that is,
observable behaviours. Constructs are highly similar to concepts, and to some extent both
terms may be used interchangeably. Hox (1997, p. 49) noted that both constructs and concepts
are theoretical abstractions, meaning that they represent ideas that are formed by
generalisations from similar phenomena, and that constructs refer to concepts that are more or less
formally defined in scientific theories. Thus, the term concept refers to a somewhat broader
group of theoretical abstractions than the term construct.
The major positions regarding the ontology of psychological constructs are realism
and constructivism (e.g., Borsboom, Mellenbergh & Van Heerden, 2003; Borsboom, 2005,
pp. 6-9). These two positions are discussed in the next section. A third position regarding the
ontology of constructs is operationalism (e.g., Borsboom, Mellenbergh & Van Heerden, 2003;
Borsboom, 2006). Operationalism equates theoretical constructs with their measurements.
Because it is broadly acknowledged that operationalism is untenable (e.g., Borsboom et al.,
2003; Heiser, 2006; Kane, 2006), this position is not discussed in this chapter.
In order to measure a construct one needs (a) to obtain a sample of instances within the
corresponding behavioural domain, (b) to assess whether these instances provide a
representative sample, and (c) to assess how to combine the observations into a measure of
the construct of interest. For the latter purpose one needs to apply statistical models and to
assess the quality of measurement. This is the major subject of psychometrics.
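Step (c) can be sketched with the simplest possible choice: the unweighted sum of the item scores as the measure, and Cronbach's alpha as one common index of the quality of that sum. This is an illustration under assumptions, not the procedure of this thesis: the response data are invented, and the names sum_scores and cronbach_alpha are hypothetical.

```python
from statistics import variance


def sum_scores(responses):
    """Total score per person; responses holds one list of item scores per person."""
    return [sum(person) for person in responses]


def cronbach_alpha(responses):
    """Cronbach's alpha for a persons-by-items table of scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across persons.
    item_vars = [variance([person[i] for person in responses]) for i in range(k)]
    # Sample variance of the total scores.
    total_var = variance(sum_scores(responses))
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Invented answers of five customers to three 5-point items (not from the thesis).
responses = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 3],
    [1, 2, 2],
]
print(sum_scores(responses))
print(round(cronbach_alpha(responses), 2))
```

Other measurement models weight or scale the items differently; the sum score is used here only because it is the most transparent way to combine observations.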
This chapter addresses the theory on the conception and the measurement of
psychological constructs. The focus of the chapter is on theory regarding the conception and
the measurement of attitude-constructs, such as satisfaction. Theory that is specific for the
conception and the measurement of ability-constructs, such as the various types of specific
intelligence, is not taken into account.
2 Conception of psychological constructs
Scientific concepts are the core of scientific theories (Sartori, 1984; Thomson, 1961). This
implies that psychological constructs are the core of psychological theories. Torgerson (1958,
p. 9) denoted psychological constructs such as satisfaction as property-constructs, and
contrasted them to system-constructs, which are objects and things that possess sets of
particular properties. He argued that, to be of use in scientific theory, a property-construct
must possess both theoretical meaning and empirical meaning (Torgerson, 1958, p. 11).
Whereas theoretical meaning refers to the definition of a construct in terms of theoretical
concepts, empirical meaning refers to the definition of a construct in terms of observable data.
The distinction between theoretical meaning and empirical meaning of constructs is founded
in linguistic and logical positivistic philosophies (e.g., Carnap, 1956; Frege, 1892).
A psychological construct such as intelligence has both theoretical and empirical
meaning. The theoretical meaning of intelligence entails its definition in terms of (a) the
group of attributes or phenomena to which it refers, and (b) its relation with other constructs
in the nomological network. The empirical meaning of intelligence entails the empirical
indicators of the construct and includes, for example, the score on a particular intelligence
test. However, as Torgerson (1958, p. 7) noted, the operationally defined intelligence is not
universally agreed to be the same thing as the theoretically defined intelligence. There is no
identity relation between the theoretical meaning of a construct and its empirical referents.
There is an ongoing debate regarding the ontological status of psychological properties
(e.g., Borsboom, 2005, 2006; Borsboom, Mellenbergh, & Van Heerden, 2003, 2004; Sijtsma,
2006). The major positions regarding the ontological status of psychological properties are
realism and constructivism. The realistic position is founded upon the assumption that
psychological properties exist as unobservable but real entities (e.g., Borsboom, 2005, p. 6).
This means that a property exists independent of its observations, and that the measurement of
a particular psychological property is a reflection of the entity. Borsboom et al. (2003) argued
that measurement of psychological properties requires a realistic position regarding the
particular construct, as the sentences Test X measures the attitude towards nuclear energy and
Attitudes do not exist cannot both be true.
The realistic position regarding the existence of psychological properties raises the
question What is property X, which is supposed to exist? Thus, it seems that the question
regarding the meaning of a particular psychological property precedes the question regarding
the ontological status of this property. The question regarding the meaning of a particular
psychological property is ultimately a linguistic question (Wittgenstein, 1953, 1958). This
means that the property is a term with a meaning that needs to be clarified on the basis of an
examination of the use of the term in linguistic contexts, including psychological theories.
Constructivism does not assume the existence of psychological properties as entities in a
realistic sense. According to constructivism, a psychological property may be conceived of as
an organisational principle with respect to behaviour. Borsboom (2005, pp. 7-9) differentiated
between three constructivist movements, which are logical positivism, instrumentalism, and
social constructivism. These movements have many different characteristics and concerns, but
what they have in common is (a) the differentiation between a theoretical concept and an
empirical concept, and (b) the denial of knowledge about the existence of theoretical concepts
as realistic entities, beyond their existence as organisational principles of behaviour.
Social constructivism deserves special attention because it advocates a linguistic
conception of psychological constructs, meaning that it is the linguistic use of the term that
grants theoretical meaning to the construct (Wittgenstein, 1958, 1953, section 43). This point
of view implies that the justification of a particular construct is founded in the use of the
construct within a particular language context, such as psychological theory. It makes sense to
question what a particular construct refers to, whether it is appropriate in a particular context,
and whether it is useful, but it makes no sense to question whether a particular construct exists
in any physical or physiological sense.
Empirical observations have to demonstrate the use of a construction in a particular
language context. According to Wittgenstein (1953, 1958), the description of particular cases
of a construction will reveal the meaning of the construction. It is fruitless to search for a
sharp definition of a construction like thinking, because cases of thinking are connected to
each other by family resemblances. There is no combination of defining characteristics that
separates all cases of thinking from everything else. A sharp definition will not converge with
the actual use of a construct, because the actual use does not have distinct borders
(Wittgenstein, 1958, p. 44).
The linguistic conception of psychological properties does not deny the existence of
psychological properties in a realistic sense, but it does deny knowledge beyond the
observable. This is best illustrated by the beetle argument (Wittgenstein, 1953, section 293):
‘Suppose everyone had a box with something in it: we call it a beetle – Here it
would be quite possible for everyone to have something different in the box. One
might even imagine such a thing constantly changing – But suppose the word
beetle had a use in these people’s language? – If so it would not be used as the
name of a thing. The thing in the box has no place in the language game at all; not
even as a something: for the box might even be empty. – No, one can divide
through by the thing in the box; it cancels out, whatever it is. – That is to say: if
we construe the grammar of the expression of sensation on the model of ‘object
and designation’ the object drops out of consideration as irrelevant.’
Sartori (1984) provided an important extension of the linguistic conception of social
science concepts. He acknowledged that social science theories are generally expressed in
natural language, which implies fuzzy reasoning, thinking, and operationalisation of concepts,
and that language influences our reasoning and theorising. He argued that science needs a
specialised language, which encompasses the unequivocal definition of its concepts. For this
purpose, Sartori (1984, pp. 31-35) proposed concept analysis. Concept analysis aims at
establishing the meaning of the concept by establishing the scientific definition of the
concept, making sure that the concept is understood unequivocally, and determining the
empirical referents of the concept. The core of concept analysis is the establishing of the
scientific definition of the concept. Sartori (1984, pp. 32-33) proposed to define a concept in
terms of a well-specified set of defining and accompanying characteristics. This is a verbal
definition. Concepts that have different connotations in natural language have to be split,
which results in unequivocally defined concepts. The empirical referents are loosely described
as the real world counterpart of words, which are the objects, entities, or processes denoted by
words. Sartori’s (1984) concept analysis bears resemblance to the explication of constructs
(Carnap, 1950, 1956).
The unequivocal definition of a construct is legitimate and desirable for the
development of scientific theory (Sartori, 1984; Torgerson, 1958). There is ample evidence of
a negative effect of conceptual ambiguities regarding constructs on scientific progress. See Yi
(1990) and Giese and Cote (2000) for discussion on the importance of an unequivocal
conceptualisation of satisfaction for the development of satisfaction theory. Concept analysis
is a useful starting point for research into social science concepts and marketing concepts,
because it may serve to overcome these conceptual ambiguities. However, unequivocal
definitions cannot bridge the gap between theoretical meaning and empirical meaning,
because the meaning of a term differs from the empirical referents (Frege, 1892; Wittgenstein,
1953, 1958). Theoretical constructs exist as linguistic constructions, and they have a surplus
meaning over any empirical meaning.
The constructivist position regarding the ontology of psychological properties is in line
with psychometrics, which is concerned with the modelling of data that reflects behaviour of
persons. This means that the latent trait in a measurement model is estimated from the data,
but it is not the attribute behind the data (e.g., Nunnally, 1978, p. 96, pp. 105-109; Sijtsma,
2006). Lord and Novick (1968, p. 352) explained that psychometrics does not assume the
existence of a property in a physical or physiological sense:
‘…nowhere in psychological theory is there any necessary implication that traits
exist in any physical or physiological sense. It is sufficient that a person behave as
if he/she were in possession of a certain amount of each of a number of relevant
traits and that he/she behaves as if these amounts substantially determined his
behaviour.’
Theory about psychological constructs has to take three points into consideration, which
originate from the conception of psychological constructs as linguistic constructions. First,
psychological constructs are terms that are used in different language contexts, such as
psychological theories. The linguistic use of the term is the first observable, and the analysis
of the use of the term reveals the meaning of the term. Second, psychological constructs may
have empirical referents, which are behaviours interpreted in terms of the construct. The
behaviours are the second observable, and they are the raw material for measurement. Third,
one cannot point to one particular kind of behaviour or one particular set of behaviours that
fully covers a particular construct and nothing else. This means that a particular
psychological construct is connected to a domain of behaviours that cannot be delineated
sharply and cannot be listed exhaustively.
3 Test development
The development of scientific theory requires that its concepts can be measured adequately
(Sartori, 1984; Torgerson, 1958). Psychological constructs can be measured by means of
psychological tests (Chapter 1, Section 4). As a psychological construct is connected to a
domain of behaviours, one can hardly depend on the observation of one instance within a
domain in order to measure the construct. Moreover, Messick (1989) noticed that single items
yield moderate measurements of constructs because they almost certainly reflect a
confounding of multiple determinants. Consequently, the measurement of a psychological
construct on the basis of a single item will be biased. This problem is solved with
multiple-item scales, if the different items have different unique components that are mutually
independent.
Scientific research has suggested different methods for the development of
psychological tests. Oosterveld (1996, p. 25) categorised these methods in three approaches
for test development, which are the deductive approach, the intuitive approach, and the
inductive approach.
The methods of the deductive approach are based upon explicit theory about the
construct of interest. This theory is the basis of the formulation of a definition of the construct
and eventually the content of the items and the composition of the test (Oosterveld, 1996, p.
25).
The methods of the intuitive approach are based upon implicit knowledge and implicit
hypotheses regarding the construct of interest. There is no theory regarding the construct of
interest that grounds the formulation of a definition of the construct and eventually the content
of the items and the composition of the test.
The methods of the inductive approach are exploratory. A test is developed on the basis
of observable relations between either the items or the items and some criterion. The methods
may be characterised as data driven, which means that the analysis of the available data
makes up the core of test development.
On the basis of empirical research into the quality of different methods, Oosterveld
(1996, p. 127) concluded that the deductive approach to test construction yields better tests
than the intuitive and inductive approaches. This means that the methods of the deductive
approach yielded tests that provided test scores having better validity and reliability than the
methods of the other approaches. Oosterveld (1996) studied two methods of the deductive
approach, which were the construct method (Jackson, 1971, 1973) and the facet design
method (Guttman, 1954). The methods can be described in terms of four components, which
are (a) the conception of the construct, (b) scale development, (c) scale construction, and (d)
evaluation of scale scores (Oosterveld, 1996, p. 24).
The construct method (Oosterveld, 1996, pp. 16-20) is a theory-oriented method. The
first step of the method is the definition of the construct on the basis of scientific theory
regarding the construct. The definition of the construct in terms of phenomena and attributes
that it refers to is called the explicit definition, and the definition of the construct in terms of
its relation with other constructs in the nomological network is called the implicit definition
(Schouwstra, 2000, p. 61). The second step of the method is elaboration or scale development.
This step includes item specification, item production, and item judgement. The items need to
be content saturated. This means that each item should correlate relatively high with the scale
score that represents the concept the item is expected to measure, and relatively low with
scale scores representing other concepts (Oosterveld, 1996, p. 19). Thus, each item must
possess convergent and divergent validity. The third step of the method is scale construction,
which refers to the application of a measurement model to the empirical data aimed at
producing a scale on which persons can be measured with respect to the concept of interest.
The fourth step of the method is the evaluation of the scale scores. This step includes, for
example, the assessment of reliability and construct validity of scale scores. It may be noted
that the construct method bears resemblance to Churchill’s (1979) procedure for test
development in marketing research.
Guttman (1954; see also Hox, 1997) introduced the facet design. The facet design
defines a universe of observations by classifying them with a scheme of facets (i.e., variables)
that contain different elements (i.e., values). Facet theory distinguishes three types of facets,
which are (a) population facets, which classify the population, (b) content facets, which
classify the concept, and (c) response facets, which classify the behaviours. Each of these
facets has one or more distinct values that are called the elements of the facet. The product of
all elements of all facets defines the universe of observations.
The facet design method (Oosterveld, 1996, pp. 20-24; Stouthard, Mellenbergh &
Hoogstraten, 1993) is a method for test development that is aimed at the optimisation of
content validity by means of a systematic representation of the concept. The concept is
represented on the basis of the combination of one or more content facets. Each content facet
has one or more elements, and a particular combination of elements of each content facet is
called a structuple (Oosterveld, 1996, p. 22). The product of all elements of all content facets
defines the set of structuples and delineates the concept (see, e.g., Section 4 from Chapter 4).
The second step of the method is elaboration or scale development. This step includes item
specification, item production, and item judgement. The items have to be derived from the
facet structure. Each item must be specific for a single structuple of the facet structure. The
third step of the method is scale construction. Scale construction refers to the analysis of the
data by means of a measurement model, aimed at producing the measurement scales and the
scale scores. The fourth step is the evaluation of the scale scores. This step includes, for
example, the assessment of reliability and construct validity of scale scores.
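The way content facets delineate a concept can be sketched in a few lines of Python. The two content facets and their elements below are hypothetical and serve only as an illustration; they are not the facet design used in this study. The sketch shows that the set of structuples is the Cartesian product of the elements of the content facets.

```python
from itertools import product

# Hypothetical content facets and elements (assumed for illustration only).
content_facets = {
    "aspect": ["cognitive", "affective"],
    "object": ["product", "service", "employee"],
}


def structuples(facets):
    """Return every combination of one element per content facet."""
    names = list(facets)
    element_lists = [facets[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*element_lists)]


all_structuples = structuples(content_facets)

# The number of structuples equals the product of the numbers of elements.
print(len(all_structuples))
for structuple in all_structuples:
    print(structuple)
```

Under this sketch, each item of the test would be written for exactly one of the listed structuples.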
Both the construct method and the facet design method incorporate some kind of
concept analysis that clarifies the meaning of the construct of interest and facilitates its
definition. In the case of the facet design method, this analysis should facilitate a definition of
the construct in the format of a facet design, and in the case of the construct method this
analysis should facilitate an explicit and an implicit definition of the construct. However, it is
not immediately clear what kind of concept analysis reveals the meaning of the construct
and facilitates its definition. Wittgenstein (1953, 1958, p. 44) argued that it is the examination
of examples of the use of a term in language contexts that reveals the meaning of the term.
Following this argumentation, it is appropriate to examine the use of the term in various
language contexts, including scientific theories, in order to clarify the meaning of the
construct and to develop a research definition of the construct. In practice, this requires the
compilation of an inventory of diverse studies of the construct, and the examination of the conception of
the construct in these studies. See Giese and Cote (2000) for an example of this practice in
consumer satisfaction research; that is, the examination of definitions of consumer satisfaction
in scientific research, the analysis of similarities and differences between these definitions,
and the introduction of a framework for the development of context-specific definitions of
consumer satisfaction.
4 Measurement process
Coombs (1964, p. 4) represented the process of psychological measurement in a scheme
(Figure 1). The observations Coombs (1964) referred to are observations of behaviour, and
the data are psychological data. In phase one of the process, the researcher has to decide on
the collection of observations. The universe of observations is theoretically unlimited, and it is
up to the researcher to choose and to record particular observations from a particular research
population. In phase two, the researcher transforms the observations into data. It always takes
some decision or action on the part of the researcher to create the data on the basis of his/her
observations. Therefore, Coombs (1964, pp. 3-6, 29) conceived of data as interpretations of
observations by the researcher. In phase three, the researcher applies a measurement model to
the data in order to construct one or more scales, and to classify the stimuli and/or the persons.
A scale represents a property, and the classification of stimuli and/or persons on a scale
constitutes the measurement of a property. Thus, it is properties of stimuli and/or persons
that are measured, and it is stimuli and/or persons that are classified (Torgerson, 1958, p.
9).
Universe of potential observations -> [Phase 1] -> Recorded observations -> [Phase 2] -> Data -> [Phase 3] -> Inferential classification of individuals and stimuli
Figure 1: The Measurement Process (Coombs, 1964, p. 4)
Figure 1 illustrates that the scaling analyses are not at the core but at the end of the
measurement process. Coombs (1964, p. 5) argued that the phases preceding the scaling
analyses are at least as important components of the measurement process. Furthermore, the
scheme illustrates that each phase encompasses one or more decisions made by the researcher,
which influence the output of the phase concerned and the measurements. For example, the
researcher may code the answers to some closed question as nominal data, ordinal data, or
numerical data, and use a suitable measurement model to analyse the data. The coding of the
responses and the choice of the measurement model are based upon assumptions made by the
researcher with respect to the observations that he or she made. For this reason, Coombs
(1964, p. 5) noted that ‘psychological data and measurements and scales are theory’.
Psychometrics suggests different measurement models that may be applied in the last
phase of the measurement process. The major types of measurement models are the classical
test theory (CTT) model (Lord & Novick, 1968), the item response theory (IRT) models (e.g.,
Embretson & Reise, 2000) and the factor analytic models (e.g., Bollen, 1989; Gorsuch, 1983).
It is noteworthy that different measurement models may yield different scales of the property,
which means that they may yield different classifications of persons. The choice of a
researcher for a particular measurement model may be based on the hypothesised relationship
between the data and the property, the desired level of measurement, and the intention to test
hypotheses about the fit of the model.
The quality of measurement is not self-evident but has to be demonstrated. The major
criteria with respect to quality of measurement are the fit of the measurement model, the
reliability of the scale scores, the generalisability of conclusions, and the validity of the
interpretation of the scale scores (Molenaar, 1995).
The first criterion is the fit of the measurement model. The measurement model is a
formal representation of the expected data structure. The fit of the model refers to the extent
to which the theoretical assumptions of the model regarding the structure of the data match
the empirical data. This is, for example, the extent to which the theoretical correlation matrix
that is based upon the scale scores is in agreement with the empirical correlation matrix, or the
extent to which a theoretical assumption such as unidimensionality is in agreement with the
dimensionality of the empirical data. A major advantage of IRT models such as the Mokken
model (Mokken, 1971) and the Rasch model (Rasch, 1960) is the availability of powerful
tests of the fit of the model to the data (Molenaar, 1995). Since these models imply testable
statements regarding the structure of the data, their fit can be falsified on the basis of
empirical data.
The second criterion is reliability, which refers to the accuracy of scale scores. The
reliability coefficient originated from CTT, and is defined as the ratio of the true score
variance and the observed score variance in the population of interest. Neither the true scores,
which are defined as the observed scores minus the measurement errors, nor the true score
variance can be observed. Therefore, the reliability coefficient has to be estimated by other
means, such as the internal consistency coefficient, which is known as coefficient alpha
(Cronbach, 1951). The reliability coefficient is generally used to obtain the standard error of
measurement in scale scores. The standard error of measurement is used to estimate a
confidence interval for a person’s true score, and can be used for testing hypotheses about the
true score. For example, it can be tested whether two scale scores, which serve as estimates of
the true scores, are different, or whether a scale score is significantly different from a cut
score.
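As an illustration of the quantities discussed above, the following minimal Python sketch computes coefficient alpha, the standard error of measurement, and an approximate 95% confidence interval around one person's scale score. The item scores are made up for the example, the scale score is assumed to be the unweighted sum of the item scores, and the use of 1.96 assumes normally distributed measurement errors.

```python
from math import sqrt
from statistics import variance


def coefficient_alpha(scores):
    """Coefficient alpha for a persons-by-items table of item scores."""
    k = len(scores[0])
    item_variances = [variance([person[j] for person in scores]) for j in range(k)]
    total_variance = variance([sum(person) for person in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def standard_error_of_measurement(scores, reliability):
    """Standard deviation of the sum scores times sqrt(1 - reliability)."""
    total_sd = sqrt(variance([sum(person) for person in scores]))
    return total_sd * sqrt(1 - reliability)


# Made-up responses of six persons to four items (illustration only).
scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [1, 2, 1, 2],
    [4, 4, 3, 4],
]

alpha = coefficient_alpha(scores)
sem = standard_error_of_measurement(scores, alpha)

# Approximate 95% confidence interval around the first person's sum score.
observed = sum(scores[0])
interval = (observed - 1.96 * sem, observed + 1.96 * sem)
print(round(alpha, 3), round(sem, 3), interval)
```

A cut score that falls outside such an interval would, under these assumptions, be judged significantly different from the person's scale score.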
In IRT, an item response function is defined for each item in the test. For a particular
item, the item response function defines the probability of a particular score given the
person’s measurement value on the scale of interest. Thus, persons with different
measurement values have different probabilities of providing a particular score. An example
is an item response function that defines the probability of a correct answer to a particular
arithmetic item as an increasing function of arithmetic ability. Persons having higher
arithmetic ability levels have higher probabilities of giving the correct answer. The use of
item response functions implies that the magnitude of the measurement error depends on the
person’s location on the scale. Thus, one person may be measured with greater accuracy using
a particular item and a particular test than another person who has another scale location
(Molenaar, 1995).
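The dependence of measurement accuracy on a person's scale location can be illustrated with a small Python sketch. It assumes the Rasch model as one example of an IRT model, uses three hypothetical item difficulties, and approximates the standard error of the estimated scale location by the inverse square root of the test information, which is a large-sample approximation.

```python
from math import exp, sqrt


def rasch_probability(theta, difficulty):
    """Item response function: probability of a correct answer given theta."""
    return 1.0 / (1.0 + exp(-(theta - difficulty)))


def item_information(theta, difficulty):
    """Information that one Rasch item provides at scale location theta."""
    p = rasch_probability(theta, difficulty)
    return p * (1.0 - p)


def standard_error(theta, difficulties):
    """Approximate standard error at theta: 1 / sqrt(test information)."""
    information = sum(item_information(theta, b) for b in difficulties)
    return 1.0 / sqrt(information)


# Hypothetical difficulties of three arithmetic items (illustration only).
difficulties = [-1.0, 0.0, 1.0]

for theta in (-3.0, 0.0, 3.0):
    probabilities = [round(rasch_probability(theta, b), 2) for b in difficulties]
    print(theta, probabilities, round(standard_error(theta, difficulties), 2))
```

In this sketch the probability of a correct answer increases with the scale location, and persons located far from the item difficulties are measured less accurately than persons located near them.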
The third criterion is generalisability, which refers to the extent to which conclusions
from measurement analyses are generalisable over various conditions. To assess the
generalisability of conclusions, one has to study the sources of randomness in measurement
(Molenaar, 1995). Major sources of randomness are (a) the sampling of persons, (b) the
sampling of items, (c) the test conditions, and (d) the mode of administration of the test. For
example, due to differences in test conditions (e.g., Messick, 1989, p. 81) a set of items may
constitute a scale in one empirical study but not in another empirical study. This necessitates
the assessment of the fit of the measurement model in different empirical studies in which the
measurement instrument is used. Furthermore, the mode of administration may influence the
responses to test items. For example, results obtained via telephone interviews cannot be
compared with results obtained from on-line interviews without having investigated the
comparability of these modes of data collection (e.g., Bronner & Kuijlen, 2007). It is
recommended to reflect on the plausible sources of randomness in advance of a study and, if
necessary, to test empirically whether particular generalisations are justified (Molenaar,
1995).
The fourth criterion is validity. Messick (1989, p. 13) defined validity as ‘an integrated
evaluative judgement of the degree to which empirical evidence and theoretical rationales
support the adequacy and appropriateness of inferences and actions based on test scores or
other modes of assessment’. This definition entails validity of measurement (i.e., the validity
of test-score interpretations for describing a person; Cronbach, 1971, pp. 445-449) and
validity for decision-making (i.e., the validity of test-score interpretations for making
decisions about a person; Cronbach, 1971, pp. 445-449). Validity is extensively discussed in
the next section.
5 Validity
The concept of validity has evolved throughout time (e.g., Anastasi, 1986; Angoff, 1988;
Schouwstra, 2000). Initially, validity was conceived of as the degree to which a test measures
what it purports to measure (Kelley, 1927). Validity was demonstrated on the basis of the
correlation of test scores with some criterion, which is called criterion-related validity (e.g.,
Anastasi, 1988, p. 145; Cronbach & Meehl, 1955). However, it proved to be difficult to find
objective criteria for different kinds of measurements, such as measurements of different
psychological constructs. This problem gave rise to new methods for establishing validity and
eventually to different conceptualisations of validity, such as (a) criterion-related validity, (b)
content validity, and (c) construct validity (Cronbach & Meehl, 1955).
Content validity is established by showing that the behaviours sampled by the test are a
representative sample of the domain of interest (e.g., Anastasi, 1988, p. 140; Cronbach, 1971,
p. 451; Messick, 1989, pp. 39-42; Murphy & Davidshofer, 1991, pp. 107-109). As such,
content validity pertains to evidence about the domain coverage and the degree to which the
content of the test represents the domain. In order to establish content validity, one must
depart from an elaborated definition of the construct of interest. This definition should include
a detailed description of what the construct refers to, and of what the construct does not refer
to but may be related to (Schouwstra, 2000). Content validity is then established on the basis
of the comparison of the structure of the test with the specified structure of the construct.
Thus, content validity is a property of tests rather than of test-score interpretations (Messick,
1989, p. 17).
Two additional remarks are in order. First, content validity has to be incorporated at the
onset of test development. For example, Messick (1989, p. 39) noted that, on the basis of the
construct definition, a researcher can develop a test which covers all aspects or facets of the
construct of interest according to a specified rule such as equal coverage, which means that all
aspects or structuples are equally represented in the test. This is the core of content validity.
Second, content validity should not be confused with face validity. The latter pertains to
whether the test looks valid to test users, and not to what the test scores actually reflect
(Anastasi, 1988, p. 144). Therefore, validity theorists do not consider face validity as a
conceptualisation of validity.
Cronbach and Meehl (1955) conceived of construct validity as the appropriateness of
test-score interpretations. They discussed construct validation, and they concluded that
construct validation may include many investigations, such as research into content validity,
criterion-related validity, inter-item correlations, and inter-test correlations. Furthermore, they
proposed defining a construct by means of a network of associations or propositions in which
the construct of interest occupies a central position. This network is the nomological network.
The study of relations between test scores and measurements of concepts in the nomological
network provides evidence pro or contra construct validity. Construct validation requires the
integration of all evidence into a judgement of construct validity. Because this judgement is
qualitative by nature, it cannot be expressed as a single coefficient, such as the reliability of
test scores (Cronbach & Meehl, 1955).
One additional remark is in order. Cronbach and Meehl (1955) explained construct
validation, and their explanation illustrates that they conceived of construct validity as the
appropriateness of test-score interpretations (see also Cronbach, 1971, p. 447). However, they
did not provide an explicit definition of construct validity. The lack of an explicit definition
may have contributed to confusion about the meaning of construct validity. For example,
Churchill (1979) conceived of construct validity as a property of a test, which does not match
the conception of construct validity as the appropriateness of test-score interpretations.
Churchill (1979) and Peter (1981) introduced construct validity in the marketing
literature. The work of these authors has guided validation research in academic marketing
research up to the present day. Elaborating on the work of Cronbach and Meehl (1955) and
Campbell and Fiske (1959), they split construct validity into (a) nomological validity, (b)
divergent validity, and (c) convergent validity. Nomological validity refers to the relationships
between the test scores and measures purported to assess different but related concepts.
Discriminant or divergent validity refers to the extent to which test scores differ from
measures of other concepts that are expected to be different from the concept of interest in
theoretically interesting ways. Convergent validity refers to the extent to which test scores
correlate with other measurements of the same construct.
Churchill (1979) and Peter (1981) proposed multitrait-multimethod (MTMM) research
(Campbell & Fiske, 1959) to investigate construct validity. MTMM research requires
measurements of at least two traits by at least two methods, so that each trait is measured by
each method. The MTMM matrix consists of the correlations between (a) the same trait
measured by means of different methods, (b) different traits measured by means of the same
method, and (c) different traits measured by means of different methods. Convergent validity
is assessed on the basis of inspection of the first set of correlations, divergent validity is
assessed on the basis of inspection of the second set of correlations, and method bias is
assessed on the basis of a comparison of the second and the third set of correlations.
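The three sets of correlations in an MTMM matrix can be sketched in Python as follows. The traits, the methods, and all correlation values are hypothetical and chosen only to show how each pair of measures is assigned to one of the three sets; they are not results from this study.

```python
from itertools import combinations

# Hypothetical measures, each a (trait, method) pair (illustration only).
measures = [
    ("satisfaction", "telephone"),
    ("satisfaction", "online"),
    ("trust", "telephone"),
    ("trust", "online"),
]

# Made-up correlations between the measures, keyed by their index pairs.
correlations = {
    (0, 1): 0.70,
    (2, 3): 0.65,
    (0, 2): 0.40,
    (1, 3): 0.35,
    (0, 3): 0.25,
    (1, 2): 0.20,
}


def classify(first, second):
    """Assign a pair of distinct measures to one of the three MTMM sets."""
    same_trait = first[0] == second[0]
    same_method = first[1] == second[1]
    if same_trait and not same_method:
        return "same trait, different methods"
    if same_method and not same_trait:
        return "different traits, same method"
    return "different traits, different methods"


sets = {}
for i, j in combinations(range(len(measures)), 2):
    label = classify(measures[i], measures[j])
    sets.setdefault(label, []).append(correlations[(i, j)])

# Mean correlation per set as a simple summary for inspection.
summary = {label: sum(values) / len(values) for label, values in sets.items()}
for label, value in summary.items():
    print(label, round(value, 3))
```

With these made-up values the first set shows the highest correlations, and the second set exceeds the third, which is the pattern one would inspect when judging convergent validity, divergent validity, and method bias.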
Belson (1986) explicitly addressed the subject of validity in survey research. The
measurement of psychological constructs is typically based upon survey research. Thus, the
quality of the survey data delineates the validity of measurements of psychological constructs.
Belson (1986) noted that the accuracy of answers to survey questions cannot be taken for
granted because misinterpretations of questions, memory decay of participants, and
unwillingness to respond may contaminate the data. Ample evidence exists of the effects of
questionnaire format, questionnaire length, and the wording of questions and response
categories on the responses of participants to questions (e.g., Belson, 1981; Bradburn, 1983;
Rugg, 1941; Schuman & Presser, 1981; Sudman & Bradburn, 1982; Saris et al., 1998;
Scherpenzeel, 1995). For these reasons, Belson (1986) proposed assessing validity in survey
research on the basis of an investigation of the quality of the answers given to survey
questions. This includes the investigation of the quality of opinion data. Belson (1986)
proposed various techniques to assess the validity of survey research, such as (a) the
evaluation of the data collection procedure in terms of the known principles of question
formulation and questionnaire design, (b) the pre-testing of the questions, and (c) the
execution of a pilot of the questionnaire.
Messick’s (1989, p. 13) definition of validity is important for various reasons. First, the
definition expresses unequivocally that the subject of validation is the interpretation and the
use of test scores. This is in agreement with the practice of validation in psychological
research, which is to investigate the meaning of test scores in a specific context and the
usefulness of test scores for various decision-making purposes (e.g., Anastasi, 1988;
Cronbach, 1971; Murphy & Davidshofer, 1991). Second, the definition expresses that
different lines of evidence have to be considered when making a judgement of validity. This
includes evidence of criterion-related validity, content validity, and the original conception of
construct validity (Cronbach & Meehl, 1955). Third, the definition expresses that these
different lines of evidence cannot be integrated into a single coefficient, but have to be
integrated into a judgement regarding the test-score interpretation (e.g., Cronbach, 1971, p.
464; Cronbach & Meehl, 1955; Messick, 1989, 1995). This judgement has a gradual nature
(Messick, 1989, p. 13), which implies that the test-score interpretations may have high
validity, moderate validity, low validity, or no validity at all. Fourth, the definition expresses
that validation is an unending process that includes the judgement of evidence gathered in the
processes of test development and test use (Anastasi, 1986, 1988; Cronbach, 1971, p. 452;
Messick, 1989, p. 13). Fifth, Messick (1989, pp. 20-21) differentiated between the assessment
of construct validity and the assessment of the consequences of the use and the interpretation
of test scores as the two bases of validity. In that context, Messick (1989, pp. 20-21, 34; 1995)
argued that construct validity comprises the rationales and evidence supporting the
trustworthiness of test-score interpretations in terms of the construct of interest, and that the
validation of decision-making practices of test scores comprises the appraisal of social
consequences of the use and interpretation of test scores.
Messick (1989, pp. 34-35, 1995; see also Cook & Campbell, 1979) addressed two
general threats to construct validity, which are construct underrepresentation and irrelevant
variance. Construct underrepresentation refers to the risk of measuring only a part of the
construct of interest, such as only the cognitive aspect of customer satisfaction instead of both
the cognitive and affective aspects of the construct (e.g., Oliver, 1997, p. 343). Irrelevant
variance refers to the risk of measuring more than just the construct of interest, such as other
traits, concepts related to specific group membership, or response tendencies. Both construct
underrepresentation and irrelevant variance refute the interpretation of test scores in terms of a
reflection of the construct of interest and nothing else. It may be noted that the common
practice of collecting empirical evidence for a network of associations between measurements
does not exclude the two threats to construct validity. When a relationship is found between
the measure of the attribute and other attributes, the test score may still reflect only part of the
attribute. Also, the test score may reflect something more than just the attribute of interest.
Messick’s (1989, 1995) conception of construct validity as a property of test-score
interpretations is today’s dominant conception of construct validity in psychometrics.
However, it does not provide a clear-cut methodology for investigating construct validity. For
this purpose, Schouwstra (2000, pp. 58-59) proposed the deductive design, which is a
methodology for construct validation and for the development of tests for typical behaviour, such as
behaviour related to satisfaction. The deductive design is consistent with Messick’s
conception of construct validity. Schouwstra’s methodology encompasses the collection of
theoretical and empirical evidence regarding the interpretation of test scores in terms of the
construct of interest, and nothing else. As such, it takes the two global threats to construct
validity into account, which are construct underrepresentation and construct-irrelevant
variance.
Borsboom et al. (2004) criticised Messick’s (1989, p. 13) definition and conception of
validity. They subscribed to Kelley’s (1927) view that a test is valid if it measures what it purports to
measure, and they defined validity of tests accordingly: ‘A test is valid for measuring an
attribute if (a) the attribute exists in the real world, and (b) variations in the attribute causally
produce variations in the outcomes of measurement procedures’. Thus, Borsboom et al.
(2003, 2004) defined validity as a property of tests, and they took a realistic stance regarding
the nature of psychological constructs. They opposed Cronbach and Meehl (1955) and
Messick (1989, 1995), who conceived of construct validity as a property of test-score
interpretations, and conceived of psychological constructs as postulated attributes of people.
Two additional remarks are in order. First, Borsboom et al. (2004) argued that the
conception of validity as a property of tests has direct relevance for validation research.
Evidence of validity should be based upon research into the response process, that is, the
relation between the attribute and response behaviour. The research should test a hypothesis
with respect to the processes that lead to measurement outcomes. This amounts to a test of a
causal theory about the relation between attribute and response behaviour. Because a
nomological network is not a theory of the causal relation between attribute and test score, the
authors considered the nomological network irrelevant for validation research. Thus, in their
view validation research should not assess the relationship of the construct with other
constructs in the nomological network, but test a causal theory about the processes that evoke
behaviour.
Second, Borsboom et al. (2004) argued that the conception of validity as a property of
tests has direct relevance for test construction. A large part of test validity research has to be
done at the stage of test construction. Test development should start from a theory on the
causal relation between the attribute and behaviour. This approach to test development has
been applied successfully with respect to measurement of some specific ability constructs,
such as transitive reasoning (Bouwmeester & Sijtsma, 2006) and cognitive development
(Jansen & Van der Maas, 1997).
6 Discussion
There is no broad consensus on either the conception of validity or the methodology of
validation research. This is due partly to different conceptions of validity being based upon
different conceptions of psychological constructs, and partly to the fact that validity theory is still
developing and has not yet reached a settled form. We discerned three perspectives on validity
and validation research that are important for current academic research: (a) the Churchill
perspective, (b) the Messick perspective, and (c) the Borsboom perspective. These
perspectives are presented in Table 1.
The Churchill perspective on construct validity is the leading perspective on construct
validity in academic marketing research. It was introduced in Churchill’s (1979) procedure for
test development in marketing research. Peter (1981) and Fornell and Larcker (1981) further
elaborated Churchill’s perspective and the associated methods for validation research.
Churchill’s procedure for test development in marketing research has contributed
markedly to the measurement of psychological constructs in the corresponding domain (e.g.,
Bearden, Netemeyer, & Mobley, 1993), but Churchill’s perspective on construct validity is
not in line with modern theories of construct validity. The criteria associated with Churchill’s
perspective do not address the two global threats to construct validity, which are construct
underrepresentation and construct irrelevant variance (Messick, 1989, 1995; Schouwstra,
2000). Consequently, the methods associated with this perspective do not suffice for the
assessment of construct validity.
Table 1: Three Perspectives on Validity and Validation Research
Aspect | Churchill perspective | Messick perspective | Borsboom perspective
Theoretical foundation | Constructivism | Constructivism | Realism
Conception | Construct validity is a property of tests | Construct validity is a property of test-score interpretations | Validity is a property of tests
Criteria | Convergent validity; divergent validity; nomological validity | Quality of construct representation; absence of irrelevant variance | Test of causal theory
Prototypical design | MTMM design; correlation with criterion | Deductive design | Experimental design
Outcome | Gradual judgement of validity | Gradual judgement of validity | Binary judgement of validity
First, content validity receives insufficient attention in Churchill’s perspective on
construct validity. Moreover, content validity was confused with face validity (e.g., Churchill,
1979; Bearden et al., 1993, p. 3). This may be considered the major flaw of the Churchill
perspective, because face validity only provides intuition for a particular interpretation of
what the test measures. Instead, empirical evidence is needed to support construct validity.
Such evidence comes from the investigation of the fit of the measurement model, the
plausible sources of measurement bias, and the nomological network of the construct. The
investigation of a test’s content validity adds to the process of construct validation in that it
provides evidence on whether the item set used in the test is representative of the hypothetical
domain of items used to operationalise the attribute (e.g., Messick, 1989, pp. 36-42).
Second, the practice of MTMM research does not generate strong evidence of construct
validity. This is partly due to the fact that MTMM research is not concerned with content
validity, and partly due to the lack of guidance on how to choose appropriate traits and
methods in MTMM studies. For obtaining strong evidence of construct validity, it is
necessary that the traits chosen are clearly similar and that the methods chosen are clearly
different. For example, Anastasi (1988, p. 158) argued that the agreement between two
measures of the same trait that are obtained by maximally similar methods reflects reliability,
and that the agreement between two measures of the same trait that are obtained by maximally
different methods reflects validity. In general, the methods applied in MTMM studies are
quite similar (e.g., Byrne, 1989; Churchill & Surprenant, 1982; Fornell & Larcker, 1981; Saris
et al., 1998; Scherpenzeel, 1995; Wirtz & Lee, 2003). As a consequence, the agreement
between different measures of the same trait mostly reflects reliability rather than validity.
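The reading of an MTMM matrix discussed above can be illustrated with a minimal sketch. It assumes that each trait-method combination yields one score per respondent, and it averages the correlations within the three classes of the matrix. The function names, the trait and method labels, and the scores are hypothetical and serve illustration only; they are not taken from any of the cited studies.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mtmm_summary(scores):
    """Average the correlations within each class of the MTMM matrix.

    `scores` maps a (trait, method) pair to the scores of the same respondents.
    """
    blocks = {
        "monotrait-heteromethod": [],    # same trait, different methods
        "heterotrait-monomethod": [],    # different traits, same method
        "heterotrait-heteromethod": [],  # different traits, different methods
    }
    for ((t1, m1), x), ((t2, m2), y) in combinations(scores.items(), 2):
        if t1 == t2:
            key = "monotrait-heteromethod"
        elif m1 == m2:
            key = "heterotrait-monomethod"
        else:
            key = "heterotrait-heteromethod"
        blocks[key].append(pearson(x, y))
    # Skip classes for which the design contains no pair of measures.
    return {k: sum(v) / len(v) for k, v in blocks.items() if v}


# Hypothetical scores of five respondents (illustration only, not study data).
example = {
    ("satisfaction", "survey"): [1, 2, 3, 4, 5],
    ("satisfaction", "interview"): [2, 1, 4, 3, 5],
    ("loyalty", "survey"): [5, 3, 4, 1, 2],
    ("loyalty", "interview"): [4, 5, 2, 1, 3],
}
summary = mtmm_summary(example)
```

In such a sketch, a high average monotrait-heteromethod correlation counts as evidence of validity only to the extent that the methods are clearly different. If the two methods were, say, two nearly identical questionnaire formats, the same number would mostly reflect reliability, which is the point made above.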
The Messick perspective is the leading perspective on construct validity in psychology.
In this perspective, construct validity is conceived of as a property of test-score interpretations
(i.e., the appropriateness of the interpretations of test scores in terms of the construct of
interest; this is also labeled validity of measurement, validity of test-score interpretations, and
construct validity of test scores). The best argument in favour of this conception of construct
validity is that a test may yield valid measurements of the construct of interest in one context,
and invalid measurements of the construct in another context. Moreover, a particular
interpretation of a test score may be valid while another interpretation is invalid. Therefore it
is the test-score interpretation that needs to be validated, and not the test.
The Messick perspective matches the constructivist position regarding the ontology of
psychological constructs. This is a major virtue of the Messick perspective. Another major
virtue is that it can be put into action by the deductive design (Schouwstra, 2000). The
deductive design provides a methodology for validation research that addresses the two global
threats to construct validity, and that is in line with Messick’s conception of construct
validity. Also, the deductive design incorporates the rationales behind test development. This
is in agreement with the notion stipulating that construct validation starts with the process of
test development. For these reasons, we subscribe to Messick’s perspective on construct
validity and Schouwstra’s methodology for validation research.
The Borsboom perspective is important for several reasons. First, it advocates a theory-
driven approach to construct validation. Borsboom et al. (2004) rightly argued that construct
representation is at the core of validity, and that proof of construct representation is founded
in theory regarding the construct of interest. Second, Borsboom et al. (2004) demonstrated the
limited usefulness of investigating convergent, divergent, and nomological validity. They
rightly argued that the investigation of these types of validity is subordinate to other evidence
regarding construct representation, such as theory testing. Third, Borsboom et al. (2004)
recommended that one explicates and tests theories of response behaviour. This is a useful
suggestion, because there is ample evidence of the distorting influence of method
characteristics on response behaviour (e.g., Belson, 1981, 1986; Bradburn, 1983;
Rugg, 1941; Schuman & Presser, 1981; Sudman & Bradburn, 1982; Saris et al., 1998;
Scherpenzeel, 1995). Fourth, Borsboom et al. (2004) criticised Messick’s (1989, p. 13)
definition of validity as a judgement instead of a property. We subscribe to that criticism
where it concerns construct validity, but not where it concerns validity for decision-making
uses of test scores.
Borsboom et al. (2004) subscribed to Kelley’s (1927) view that a test is valid if it measures
what it purports to measure. Thus, the Borsboom perspective is characterised by the
conception of validity as a property of tests. This conception of validity is problematic,
because whether a test measures what it purports to measure does not depend exclusively on
the content of the test. It also depends on, for example, the administration of the test, the
population in which the test is used, and eventually on the research goal. Thus, a particular
test may measure what it purports to measure in one instance but not in another instance.
Consequently, validity has to be assessed with each administration of a test, and this justifies a
conception of validity as a property of test-score interpretations.
The major weakness of the Borsboom perspective is its foundation in a realist
conception of properties, which causes three problems. The first problem pertains to the
meaning of the statement ‘property X exists’. According to realism, the statement expresses that
property X exists as an entity, independent of the observations (Borsboom, 2005, p. 6). We
consider this interpretation inappropriate, because properties are organisational principles
through which we perceive and interpret the world. Some of these organisational principles
are useful because they have many empirical referents. An example is aggression. Other
organisational principles are less useful because they have few if any empirical referents. An
example is clairvoyance. Thus, we contend that the statement ‘property X exists’ expresses that
property X exists as an organisational principle. The second problem pertains to the statement
that variations in the property cause variations in the outcomes of measurement procedures.
This statement cannot be tested because one cannot observe covariation between an
unobservable entity and its measurement. Thus, one cannot know whether this statement is
true. The third problem pertains to the definition of properties. Borsboom’s perspective
requires a well-specified theory on the relationship between the property and response
behaviour. The theory should specify the set of responses for each level of the property, how
responses vary if levels of the property vary, and which response patterns can occur and which cannot.
This amounts to a definition of the property in terms of response patterns, but that cannot be
the meaning of the property.
The Borsboom perspective may suit abilities, such as transitive reasoning, for which the
meaning is close to its operationalisation. However, for psychological attributes such as
satisfaction the Messick perspective is to be preferred, because it is founded on a
constructivist position regarding the ontology of psychological properties.
7 Conclusions
1. A psychological construct is a theoretical concept with theoretical and empirical
meaning. There is, however, no identity relation between the theoretical meaning and
the empirical meaning. This means that a construct has a surplus meaning over its
empirical indicators.
2. The theoretical meaning of a construct is linguistic by nature. It is the linguistic use of a
construct that grants meaning to the construct, and it is the examination of the linguistic
use that demonstrates the theoretical meaning of the construct. This means that the
theoretical meaning of a construct should be studied by means of an examination of
various examples of the linguistic use of that construct.
3. The theoretical meaning of a construct encompasses (a) the group of attributes and
phenomena the construct refers to, and (b) the relation of the construct with other
constructs in a nomological network. The former component is expressed in the explicit
definition of the construct and the latter component in the implicit definition of the
construct (Schouwstra, 2000, p. 61).
4. The empirical meaning of a construct embraces a domain of behaviours that cannot be
delineated sharply and cannot be listed exhaustively. Nevertheless, the construct has to
be measured on the basis of different observations from this behavioural domain. The
sampling of these observations constitutes the first phase of the measurement process
(Coombs, 1964, p. 4).
5. The development and validation of psychological theory requires measurements of
constructs that are in line with their theoretical meaning. This supports a deductive
approach to test development, which means that the development of the test is based
upon a formal definition of the construct of interest.
6. The Messick perspective on construct validity corresponds best with the linguistic
conception of psychological constructs. In this perspective, construct validity is the
appropriateness of test-score interpretations in terms of the construct of interest.
7. The deductive design exemplifies how to validate measurements according to Messick’s
perspective. For this reason, we chose the methodology of the deductive design for test
development and construct validation in the empirical study (Chapter 4 onwards).
Chapter 3
The theoretical meaning of customer satisfaction
1 Introduction
In Chapter 2, we concluded that the theoretical meaning of a construct is inherently linguistic,
and that it is the linguistic use of the term that grants meaning to the construct (Wittgenstein,
1953). For this reason, the theoretical meaning of customer satisfaction has to be clarified by
means of an examination of the linguistic use of the term. This entails examining examples
of the use of the term in scientific studies as well as in everyday language.
In the present chapter, the theoretical meaning of customer satisfaction is investigated.
The investigation encompasses an examination of (a) conceptions of satisfaction, (b)
conceptions of dissatisfaction, (c) theories of satisfaction, (d) concepts in the nomological
network of satisfaction, and (e) measures of satisfaction in the marketing literature. Based on
the results of the investigation, the term customer satisfaction is explained and defined. The
explicit definition of customer satisfaction addresses the group of attributes and phenomena
that customer satisfaction refers to, and the implicit definition of customer satisfaction
addresses the connections of customer satisfaction with other concepts in a nomological
network.
2 Conceptions of satisfaction
Reviews of the marketing literature by Yi (1990) and by Giese and Cote (2000) yielded a
multitude of definitions of consumer satisfaction, customer satisfaction, summary satisfaction,
and transaction-specific satisfaction. The different definitions of these terms reflect different
conceptions of satisfaction. In order to clarify the theoretical meaning of satisfaction, we
examined the major conceptions and the corresponding definitions of satisfaction in the
marketing literature.
The marketing literature distinguishes two important conceptions of satisfaction. The
first is satisfaction as a response to disconfirmation (Table 1, first column) and the second is
satisfaction as a valenced response to consumption (Table 1, second column). Both
conceptions can be applied to transaction-specific satisfaction (Oliver, 1997, p. 15), which
concerns satisfaction with single encounters with the focal object (Table 1, first row), and to
summary satisfaction (Oliver, 1997, p. 15), which concerns satisfaction with the accumulation
of encounters with the focal object (Table 1, second row). Each cell in Table 1 is associated
with several definitions of satisfaction, as they can be found in the marketing literature (e.g.,
Giese & Cote, 2000; Yi, 1990). Because the subject of this thesis is summary satisfaction with
a bank, we discuss both satisfaction as a response to disconfirmation and satisfaction as a
valenced response to consumption, and also the prototypical definitions of summary
satisfaction associated with each of the two conceptions of satisfaction.
Table 1: Conceptions of Satisfaction in the Marketing Literature
Basis of the response | Response to disconfirmation | Valenced response to consumption
Based on a single encounter with focal object | Transaction-specific satisfaction | Transaction-specific satisfaction
Based on accumulation of encounters with focal object | Summary satisfaction | Summary satisfaction
Satisfaction as a response to disconfirmation
Disconfirmation refers to the perceived discrepancy between pre-consumption expectations
and post-consumption perceptions. The conception of satisfaction as a response to
disconfirmation originated from disconfirmation theory (e.g., Oliver, 1980, 1997). According
to disconfirmation theory, the level of satisfaction (and also dissatisfaction) is a function of
pre-consumption expectations and disconfirmation of expectations. Whereas positive
disconfirmation of expectations contributes to satisfaction, negative disconfirmation of
expectations contributes to dissatisfaction. In the augmented disconfirmation theory, the level
of satisfaction is also a function of the perceptions of outcomes of consumption (Oliver, 1997,
pp. 119-121). The augmented disconfirmation model is represented in Figure 1.
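The functional form described above can be made concrete with a minimal sketch. It assumes a simple linear combination, which is only one of many possible forms; the weights, the rating scale, and the function names are hypothetical and are not taken from Oliver (1980, 1997). The difference score is used here only as a stand-in for the perceived discrepancy between expectations and perceptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Hypothetical effect sizes; the source reports no fixed values."""
    expectation: float = 0.2
    perception: float = 0.3
    disconfirmation: float = 0.5


def disconfirmation(expectation, perception):
    """Discrepancy between post-consumption perception and prior expectation.

    Positive values mean that expectations were exceeded (positive
    disconfirmation); negative values mean that they were not met.
    """
    return perception - expectation


def satisfaction(expectation, perception, weights=Weights()):
    """Augmented model: expectations, perceptions, and disconfirmation all
    contribute to the level of (dis)satisfaction. Linear form is an assumption."""
    d = disconfirmation(expectation, perception)
    return (weights.expectation * expectation
            + weights.perception * perception
            + weights.disconfirmation * d)


# Hypothetical ratings on a 1-5 scale (illustration only).
exceeded = satisfaction(expectation=3, perception=5)   # positive disconfirmation
confirmed = satisfaction(expectation=3, perception=3)  # zero disconfirmation
fell_short = satisfaction(expectation=5, perception=3) # negative disconfirmation
```

Under these assumed weights, the case in which expectations are exceeded yields the highest score and the case in which they are not met yields the lowest, in line with the direction of the effects described in disconfirmation theory.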
Disconfirmation theory is the dominant satisfaction theory, and was investigated in
several studies (e.g., Churchill & Surprenant, 1982; De Ruyter, Bloemer, & Peeters, 1997;
Oliver, 1980; Oliver, 1997; Oliver & Burke, 1999; Oliver & DeSarbo, 1988; Tse & Wilton,
1988; Van Montfort, Masurel, & Van Rijn, 2000). Although these studies are not unanimous
with respect to the magnitude of the effects of expectations, perceptions, and disconfirmation
on satisfaction, there is evidence of the significance of each of these effects (e.g., Oliver,
1997; Oliver & Burke, 1999).
Figure 1: The augmented disconfirmation model of satisfaction (diagram with the elements Expectations, Perceptions, Disconfirmation, and (Dis)satisfaction)
The disconfirmation model faces three important problems. The first problem
pertains to the use of pre-consumption expectations as the comparison standard for the
consumer’s post-consumption perceptions. Alternatives to this comparison standard are (a)
the ideals held by the consumer, (b) the needs of the consumer, and (c) standards concerning
fairness held by the consumer (Oliver, 1997, pp. 71-72, 133-134). Thus, there is no broad
consensus about the conception of disconfirmation. The second problem pertains to the
operationalisation of expectations. If one cannot get access to consumers before consumption
takes place, it is not possible to measure pre-consumption expectations, and instead one can
only measure retrospective expectations at best. Because expectations may change during the
process of consumption, retrospective expectations may differ from the pre-consumption
expectations held by the consumer. The third and major problem pertains to the conception of
satisfaction as a response to disconfirmation (e.g., Bloemer, 1993, p. 93; Oliver, 1980; Tse &
Wilton, 1988). This conception disregards the content of the satisfaction response, which
should be the core of the explicit definition of the concept (e.g., Oliver, 1997, p. 13; Sartori,
1984, pp. 32-33; Schouwstra, 2000, p. 61).
The definitions of satisfaction associated with this theory define satisfaction in terms of
a response to disconfirmation. For example, Tse and Wilton (1988; also see Table 2) defined
consumer satisfaction/dissatisfaction as ‘the consumer’s response to the evaluation of the
perceived discrepancy between prior expectations (or some other norm of performance) and
the actual performance of the product as perceived after its consumption’. Bloemer (1993, p.
61; also see Table 2) defined satisfaction as the ‘outcome of the subjective evaluation that the
chosen alternative (the brand) meets or exceeds the expectations of the person’. It may be
noted that the subjective evaluation is the perceived discrepancy between prior expectations
and actual performance of the brand, and that the subjective evaluation results from the
processing of expectations and performance of the brand. Bloemer (1993, p. 93; also, see
Bloemer & Kasper, 1995; Bloemer & Poiesz, 1989) argued that the extent to which persons
process expectations and performances depends on both the motivation and the ability of the
person to do so. For this reason, she differentiated between latent satisfaction, which results
from a low degree of processing of expectations and performances, and manifest satisfaction,
which results from a high degree of processing of expectations and performances. Because
this differentiation is an elaboration of the conception of satisfaction, it is an important
extension of disconfirmation theory.
Satisfaction as a valenced response to consumption
The conception of satisfaction as a valenced response to consumption concerns the
satisfaction response to consumption experiences, and is therefore typical of consumer
satisfaction and customer satisfaction. Oliver (1997, p. 28) explained valence as ‘polarity, the
positivity or negativity of a state of nature’. Thus, a valenced response can be placed on a
dimension that ranges from negative to positive. A special case of the valenced response is the
neutral response. A neutral response to consumption is given when a person is neither
satisfied nor dissatisfied with his or her consumption experience. It may be noted that in the
conception of satisfaction as a valenced response to consumption, the satisfaction response is
distinguished from non-valenced responses (e.g., the propositions ‘it is dark’ and ‘2+2=4’), and from
valenced responses towards things that were not consumed (e.g., a person’s judgement of a
car that he or she never drove).
The prototypical definitions associated with this conception of satisfaction are the
definitions provided by Howard and Sheth (1969), Fornell (1992), Oliver (1997), and Giese
and Cote (2000). There are important differences between these definitions. Howard and
Sheth (1969; also see Table 2) defined satisfaction as ‘the buyer’s cognitive state of being
adequately or inadequately rewarded for the sacrifices he or she has undergone’. This is the
prototypical definition of satisfaction as a cognition. Fornell (1992; also see Anderson,
Fornell, & Lehmann, 1994; also see Table 2) defined customer satisfaction as ‘an overall
post-purchase evaluation’. This definition was only applied with respect to summary
satisfaction, and it was the basis of several national customer satisfaction indices (Fornell,
1992; Johnson, Gustafsson, Andreassen, Lervik, & Cha, 2001).

Table 2: Definitions of Satisfaction in the Marketing Literature
Author | Conception of satisfaction | Definition of satisfaction | Explanation of the definition of satisfaction
Tse & Wilton (1988) | Response to disconfirmation | The consumer’s response to the evaluation of the perceived discrepancy between prior expectations (or some other norm of performance) and the actual performance of the product as perceived after its consumption. | A prototypical definition of satisfaction in disconfirmation theory. Satisfaction is not equated with disconfirmation, but it is conceived as a response to disconfirmation.
Bloemer (1993) | Response to disconfirmation | The outcome of the subjective evaluation that the chosen alternative (the brand) meets or exceeds the expectations of the person. | A prototypical definition of satisfaction in disconfirmation theory. The authors discriminate between the processes that evoke manifest satisfaction and the processes that evoke latent satisfaction.
Howard & Sheth (1969) | Valenced response to consumption | The buyer’s cognitive state of being adequately or inadequately rewarded for the sacrifices she or he has undergone. | The definition expresses that the satisfaction response is cognitive by nature.
Fornell (1992) | Valenced response to consumption | An overall post-purchase evaluation. | A prototypical definition of summary satisfaction, or cumulative satisfaction.
Oliver (1997) | Valenced response to consumption | The judgement that a product or a service feature, or the product or service itself, provided or is providing a pleasurable level of consumption-related fulfilment, including levels of under- or overfulfilment. | The definition expresses that the satisfaction response may have affective and cognitive content.
Giese & Cote (2000) | Valenced response to consumption | (a) an affective response of varying intensity, (b) directed towards focal aspects of the acquisition and/or consumption of products and services, (c) determined at the time of purchase or temporal points during consumption. | The definition expresses that the satisfaction response is affective by nature.
Oliver (1997, p. 13) defined consumer satisfaction as ‘the judgement that a product or a
service feature, or the product or service itself, provided or is providing a pleasurable level of
consumption-related fulfilment, including levels of under- or overfulfilment’. This definition
requires an explanation. First, the definition expresses that satisfaction is a response to
fulfilment, which implies that it is evoked during or after consumption. Second, the term
judgement in the definition expresses that the satisfaction response is a valenced response.
Third, the term fulfilment in the definition expresses that a goal exists, that something needs to
be fulfilled. Fourth, the term pleasurable in the definition expresses that satisfaction includes
affects. This notion is in line with the results from recent studies into the nature of satisfaction
responses (e.g., Friman, 2004; Giese & Cote, 2000; Van Dolen, Lemmink, Mattsson, &
Rhoen, 2001; Wirtz & Lee, 2003).
Oliver (1997, pp. 318-319) noted that satisfaction responses may become manifest as an
affect (a pleasant or an unpleasant feeling), a cognition (a positive or a negative judgement),
or both. Whether the satisfaction response is manifested as an affect, a cognition, or both
depends on the person, the focal object, and the context. For example, satisfaction with the
postal services may become manifest in the form of cognitions, and satisfaction with dinner in
a restaurant may become manifest in the form of affects. Consequently, Oliver (1997, pp.
318-319) distanced himself from the view of satisfaction as anhedonic cognition. He
concluded that affect coexists alongside cognitive judgements in producing the satisfaction
response. This means that satisfaction may be manifested in affects as well as in cognitions.
Oliver (1997) demonstrated that satisfaction may arise from different processes, such as
performance evaluations, processing of expectations, disconfirmation of expectations, need
fulfilment, equity evaluations, cognitive dissonance, and processing of affects. Therefore he
concluded that satisfaction may become manifest in various responses. Oliver (1997, pp. 337-
342) suggested differentiating between four prototypical satisfaction responses, which he
labeled satisfaction-as-contentment, satisfaction-as-pleasure, satisfaction-as-delight and
satisfaction-as-relief. In some contexts, satisfaction may be manifested as the absence of
dissatisfaction (Giese & Cote, 2000; Westbrook & Oliver, 1991). In survey research in the
automotive industry, Westbrook and Oliver (1991) demonstrated that a large share of
consumers were rather unemotional about their cars. In general, these consumers responded
positively to satisfaction items, and negatively to dissatisfaction items. The authors argued
that in this consumer segment, satisfaction might be interpreted as the absence of
dissatisfaction. This implies, for example, that consumers remain satisfied until problems
occur that hamper consumption. According to Oliver (1997, p. 340), absence of
dissatisfaction is a special case of satisfaction-as-contentment.
Oliver (1997, p. 339) described the contentment satisfaction state as a passive response
to consumption that results when satisfaction states are maintained or prolonged. Contentment
satisfaction or latent satisfaction (Bloemer, 1993) appears to be a common meaning of
satisfaction in contexts that are characterised by stable consumption outcomes, such as the
consumption of postal services or of a long-lasting consumer durable. According to Oliver
(1997, p. 340), if a survey focuses on satisfaction in an ongoing-use situation, most persons
will be responding from a satisfaction-as-contentment state, and fewer persons will be
responding from a satisfaction-as-delight, satisfaction-as-pleasure, or satisfaction-as-relief
state.
Giese and Cote (2000) defined consumer satisfaction as ‘(a) an affective response of
varying intensity, (b) directed towards focal aspects of the acquisition and/or consumption of
products or services, and (c) determined at the time of purchase or temporal points during
consumption, and lasting for a finite but variable amount of time’. This is the prototypical
definition of satisfaction as an affect. Qualitative research in a sample of 158 persons (Giese
& Cote, 2000) demonstrated that 60 to 70 percent of the participants explained the term
satisfaction in terms of affect. This is an important result because it demonstrates the affective
content of satisfaction. Giese and Cote (2000) concluded that consumer satisfaction is an
affective response of a consumer towards some phenomenon. They argued that cognitions
may be at the basis of the formation of consumer satisfaction, but that these cognitions do not
constitute consumer satisfaction.
Giese and Cote also argued that the meaning of satisfaction is context-specific. There
are many contextual variables that affect how satisfaction is perceived, and these variables
differ across domains. For example, satisfaction with a retail bank differs from
satisfaction with medical care or satisfaction with a sports car. Persons have different needs
and different expectations in different contexts, and these differences influence the meaning
of satisfaction in these contexts. Therefore, Giese and Cote (2000) concluded that the
definition and the measurement of satisfaction also are context-specific. They proposed a
framework for developing context-specific definitions of consumer satisfaction. In line with
their definition, the framework addresses three components of the definition of satisfaction.
These components are (a) the type of affective response, (b) the timing of the response, and
(c) the focus of the response. The framework should facilitate the development of context-
specific definitions of satisfaction and corresponding measurement procedures.
3 Conceptions of dissatisfaction
A major issue in satisfaction research, including satisfaction research in the marketing
domain, is the conception of dissatisfaction. The literature provides two stances regarding the
conception of dissatisfaction (Giese & Cote, 2000). Dissatisfaction is either considered to be
the opposite of satisfaction on a bipolar dimension (the one-factor theory; Figure 2) or
satisfaction and dissatisfaction are viewed as two different dimensions (the two-factor theory;
Figure 2). The latter stance postulates that an individual can be simultaneously satisfied and
dissatisfied with a focal object (Yi, 1990). This means, for example, that one can be
simultaneously satisfied and dissatisfied with one’s car if, for example, the car is reliable but
does not accelerate well.
According to the one-factor theory, dissatisfaction is the opposite of satisfaction on a
bipolar dimension. This stance is reflected in, for example, Oliver’s (1997, p. 28) definition of
dissatisfaction as ‘the negative satisfaction state, when the consumer’s level of fulfilment is
unpleasant’. Thus, he considers dissatisfaction to be the opposite of satisfaction on a bipolar
dimension. It is noteworthy that the conception of dissatisfaction as the opposite of
satisfaction does not defy the possibility that a consumer is satisfied with one aspect of
consumption outcomes and dissatisfied with another aspect. However, it does defy the
possibility that a consumer is both satisfied and dissatisfied with one phenomenon at one
point in time.
According to the two-factor theory (Herzberg, Mausner, & Snyderman, 1959)
satisfaction and dissatisfaction have different antecedents, and should be conceived of as
independent dimensions. The notion that satisfaction and dissatisfaction have different
antecedents, results from research into phenomena that caused satisfaction responses and
phenomena that caused dissatisfaction responses (e.g., Herzberg et al., 1959; Johnston, 1995).
For example, Johnston (1995) reported that the phenomenon of helpfulness of a bank was a
determinant of satisfaction with a bank, and that the phenomenon of integrity of a bank was a
determinant of dissatisfaction with a bank. Similarly, Herzberg et al. (1959, pp. 72-74)
reported that the phenomenon of responsibility was a determinant of satisfaction with a job,
and the phenomenon of salary was a determinant of dissatisfaction with a job. The
phenomena that are expected to cause satisfaction responses are often labeled motivator
factors or motivators, and the phenomena that are expected to cause dissatisfaction are often
labeled hygiene factors or hygienes (e.g., Oliver, 1997, pp. 146-150; Wolf, 1970).
[Figure 2. Two-factor theory: satisfaction and dissatisfaction are unipolar constructs, with one dimension running from 'not satisfied' to 'satisfied' and another from 'not dissatisfied' to 'dissatisfied'. One-factor theory: dissatisfaction is the opposite of satisfaction on a bipolar dimension running from 'dissatisfied' to 'satisfied'.]
Figure 2: Conceptions of satisfaction and dissatisfaction in the one-factor theory and the two-factor theory, respectively
The two-factor theory is disputable because empirical research demonstrated that a
phenomenon (e.g., magnitude of responsibility) can be a source of both satisfaction and
dissatisfaction (e.g., job satisfaction and job dissatisfaction; for an overview of empirical
studies into the two-factor theory, see Wolf, 1970; see also Oliver, 1997, pp. 146–150). For
example, Soliman (1970) studied satisfaction and dissatisfaction of persons with their jobs,
and found that satisfaction and dissatisfaction were the opposite ends of a continuum.
Furthermore, Soliman (1970) found that when needs of a person were provided for
adequately, motivators were more important for satisfaction/dissatisfaction than hygienes, and
when needs of a person were provided for moderately, motivators and hygienes were equally
important for satisfaction/dissatisfaction. Eventually, Soliman (1970) concluded that the
effects of motivators and hygienes on satisfaction/dissatisfaction were dependent upon the
level of need fulfilment which was already accomplished. On the basis of a review of various
research findings, Wolf (1970) reached a similar conclusion.
Generalising the results of Soliman (1970) and Wolf (1970) implies, for example, that a
person’s satisfaction/dissatisfaction with his or her car depends on the level of need fulfilment
which was already accomplished. Assuming that the acceleration power of a car is a motivator
factor and that the reliability of a car is a hygiene factor, acceleration power of one’s car is
more important for satisfaction/dissatisfaction when the needs of a person are provided for
adequately, and reliability of one’s car is more important for satisfaction/dissatisfaction when
the needs of a person are provided for badly.
Russell and Carroll (1999a) investigated whether positive affect at some point in time is
the opposite of negative affect at that same point in time, or whether positive affect is
independent of negative affect. They defined a bipolar model of momentary affect, deduced
the theoretical correlations between positive affect measures and negative affect measures,
and compared these theoretical correlations with the empirical correlations observed in
various empirical studies (for an overview, see Russell & Carroll, 1999a). The authors
concluded that when controlling for the major factors that influence the correlation between
positive affect and negative affect, which are measurement error, item selection, and response
format, there was no basis for rejection of the bipolarity hypothesis. The more sources of bias
against bipolarity were removed the closer the data matched the bipolar model. Consequently,
Russell and Carroll (1999a, 1999b) concluded that the empirical evidence supports the
bipolarity hypothesis of momentary affect. It is plausible that this conclusion can be
generalised to satisfaction, and that dissatisfaction should be conceived of as the opposite of
satisfaction on a bipolar dimension. This is consistent with the dominant causal theory of
satisfaction, which is disconfirmation theory (e.g., Oliver, 1997; Tse & Wilton, 1988).
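The role of measurement error in this argument can be illustrated with the classical attenuation formula from test theory, in which the observed correlation equals the true correlation multiplied by the square root of the product of the reliabilities of the two measures. The following Python sketch is a textbook illustration only; it is not the model used by Russell and Carroll (1999a), and the function name and the reliability values are hypothetical.

```python
import math


def attenuated_correlation(true_r, reliability_x, reliability_y):
    """Observed correlation implied by classical test theory.

    true_r        : correlation between the error-free scores
    reliability_x : reliability of the first measure (0 < value <= 1)
    reliability_y : reliability of the second measure (0 < value <= 1)
    """
    if not -1.0 <= true_r <= 1.0:
        raise ValueError("true_r must lie between -1 and 1")
    for rel in (reliability_x, reliability_y):
        if not 0.0 < rel <= 1.0:
            raise ValueError("reliabilities must lie in (0, 1]")
    return true_r * math.sqrt(reliability_x * reliability_y)


# Hypothetical values: perfectly bipolar affect measured with two
# scales that each have a reliability of 0.8.
observed = attenuated_correlation(-1.0, 0.8, 0.8)
print(round(observed, 3))
```

Under these assumed values the observed correlation is only about -0.8, so an empirical correlation that falls well short of -1 does not by itself contradict the bipolarity hypothesis.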
Generalising the results of Russell and Carroll (1999a, 1999b) to satisfaction and
dissatisfaction, a person’s simultaneous satisfaction with the reliability of his or her car and
dissatisfaction with its acceleration power does not imply that satisfaction and dissatisfaction
have to be considered two different dimensions. It implies that satisfaction/dissatisfaction is
assessed with respect to different attributes of the car and that, with respect to each attribute,
satisfaction is the opposite of dissatisfaction on a bipolar dimension. Thus, satisfaction with a
focal object can be conceived of as the opposite of dissatisfaction with the same focal object
(Oliver, 1997, p. 28).
4 The dual process model of satisfaction and dissatisfaction
Oliver (1997) proposed a model that describes how both a satisfaction response and a
dissatisfaction response may result from different psychological processes. This model is
denoted as the dual-process model (Oliver, 1997, p. 317), because it addresses two kinds of
processes, appraisal and non-appraisal of affects and cognitions, which may evoke a
satisfaction response. The satisfaction response may be manifested in the form of (a)
unappraised affects, (b) appraised affects, (c) unappraised cognitions, and (d) appraised
cognitions. Oliver conceived of unappraised affects and unappraised cognitions as the
immediate affects and the immediate cognitions that follow upon the experience of the focal
object. Appraised affects and appraised cognitions refer to affects and cognitions that have
been elaborated more intensively.
Satisfaction responses as unappraised affect refer to the immediate pleasure or the
immediate displeasure caused by consumption experiences. For example, an unappraised
affect is the immediate pleasure caused by smoking a cigarette. Satisfaction responses as
appraised affects result from the elaborations upon these affects. These elaborations include
the attribution of affects to a particular cause, and the evaluation of the value of the affect for
the individual. For example, the immediate reaction to smoking a cigarette may be the
experience of satisfaction and feelings of comfort, but the cognitive elaboration upon smoking
may yield feelings of doubt and eventually dissatisfaction. Unappraised cognitions are factual
cognitions regarding consumption outcomes, which are not further processed and do not raise
affects. The processes evoking unappraised cognitions account for the manifestation of
satisfaction as anhedonic cognitions; for example, noticing that one’s car functions well
without experiencing any feelings whatsoever (e.g., Oliver, 1997, p. 318; Westbrook &
Oliver, 1991). Satisfaction responses as appraised cognitions result from elaborations of
cognitions resulting from consumption experiences, such as the satisfaction responses that
result from disconfirmation of expectations. For example, contrary to expectation one’s car
may not function well. The disconfirmation may evoke feelings of displeasure and eventually
dissatisfaction. The dual-process model is represented in Figure 3. It may be noted that
affects, cognitions, and satisfaction are psychological properties, and that consumption and
appraisal are activities.
The dual-process model accounts for different manifestations of satisfaction. First, the
process evoking unappraised affects accounts for the manifestation of satisfaction as an
affective response to consumption experiences. The conception of satisfaction as unappraised
affect is a special case of the manifestation of the satisfaction response according to the
definition of satisfaction by Giese and Cote (2000), which also includes affective appraisals of
cognitions. Second, the process evoking appraised affects accounts for the manifestation of
satisfaction as an overall evaluation. This manifestation of the satisfaction response may be
interpreted as a special case of the definition of satisfaction by Fornell (1992), which seems to
be focussed primarily on the cognitive evaluation of consumption experiences without
explicitly distinguishing immediate cognitions and elaborations of cognitions, but far less at
affects. Third, the process evoking unappraised cognitions accounts for the manifestation of
satisfaction as anhedonic cognitions (e.g., Oliver, 1997, p. 318; Westbrook & Oliver, 1991).
Fourth, the process evoking appraised cognitions accounts for the manifestation of
satisfaction as a response to cognitions, such as the affective response to disconfirmation.
This manifestation of the satisfaction response is consistent with the definition of satisfaction
given by Giese and Cote (2000).
[Figure 3. Diagram relating Consumption, Affects, Cognitions, Appraisal, and (Dis)satisfaction.]
Figure 3: Dual-process model of satisfaction and dissatisfaction
The dual-process model is in agreement with the conception of satisfaction as a
valenced response to consumption experiences, and with Oliver’s (1997, p. 13) definition of
satisfaction. Therefore, the dual-process model constitutes an important contribution to
satisfaction theory. However, two remarks are in order. First, according to the dual-process
model appraisal is either present or absent. This may be a simplification of reality, because
appraisal may be represented by a continuum ranging from absence of appraisal to presence
of appraisal. Second, the dual-process model does not express the conditions under which
appraisal is present or absent. Therefore, further research is needed to elaborate the model.
5 Concepts in the nomological network of customer satisfaction with a retail bank
This section addresses the nomological network of customer satisfaction in the context of
retail banking (Figure 4). The nomological network of a concept is the network of
associations of a concept with other concepts. The nomological network with respect to
satisfaction that is relevant in this study includes the concepts of trust, quality, loyalty, and
profitability. This nomological network is shown in Figure 4. The four concepts are (a)
considered important in the financial services industry, and (b) expected to be related to
customer satisfaction in this industry. According to many theorists (e.g., Hennig-Thurau,
Gwinner, & Gremler, 2002; Luo & Homburg, 2007; Oliver, 1997; Verhoef, 2001; Yi, 1990),
customer satisfaction is also related to concepts such as word-of-mouth, image, commitment,
marketing communication, retention, and cross-sell. Each of these concepts may be further
split up into part concepts. For example, image may be split up into corporate associations,
corporate image, and corporate reputation (e.g., Berens, 2004), and commitment may be split
up into affective commitment and calculative commitment (e.g., Verhoef, 2001). These
additional concepts were ignored in this study, because (a) trust, quality, loyalty, and
profitability were considered of primary importance to satisfaction research in the context of
retail banking, (b) inclusion of all concepts would introduce redundancy, such as the inclusion
of both loyalty (primary importance) and commitment (alternative concept), and (c) it was
anticipated that the measurement of all concepts in a survey would produce a questionnaire
that would be too long and ask too much time and effort of the participants of this study. Even
though one might argue that the alternative concepts also have a place in the nomological
network of satisfaction, we decided to leave them out to maintain a simple model tailored to
the practice of this study (Chapter 4 onwards).
First, the relationship between trust and customer satisfaction is discussed. Trust is
considered to be of major importance in retail banking, and has been shown to be related to
customer satisfaction (e.g., Hennig-Thurau et al., 2002; Singh & Sirdeshmukh, 2000; Verhoef,
2001). Trust is often seen as an antecedent of satisfaction (but for an exception, see Singh &
Sirdeshmukh, 2000); thus, in Figure 4 an arrow runs from trust to satisfaction.
[Figure 4. Diagram with arrows from Trust and Quality to Satisfaction, and from Satisfaction to Loyalty and Profitability.]
Figure 4: Nomological network of satisfaction in the context of retail banking
Second, the relationship between quality and customer satisfaction is addressed. Quality
of products and services is considered to be of major importance in retail banking, and has
been shown to be related to customer satisfaction (e.g., Anderson et al., 1994; Cronin &
Taylor, 1992; Zeithaml & Bitner, 1996). Like trust, quality is often conceived of as an
antecedent of satisfaction but there seems to be more agreement among theorists with respect
to quality; thus, in Figure 4 the arrow runs from quality to satisfaction.
Third, the relationship between customer satisfaction and customer loyalty is addressed.
The relationship between these constructs has been demonstrated in various studies (e.g.,
Caruana, 2002; Oliver, 1999), and customer satisfaction is often conceived of as a necessary
although not a sufficient condition for customer loyalty (e.g., Gremler & Brown, 1996; Oliver,
1999). Therefore, in Figure 4 the arrow runs from satisfaction to loyalty.
Fourth, the relationship between customer satisfaction and customer profitability is
discussed. Longitudinal studies by Anderson et al. (1994), Anderson and Mittal (2004), and
Gruca and Rego (2005) have investigated the relationship between customer satisfaction and
future financial performance of companies. The results of these studies strengthen the
expectation that customer satisfaction influences customer profitability. In Figure 4, the arrow
pointing toward customer profitability shows the influence of customer satisfaction on
customer profitability.
Conceptions of trust
A review of the marketing literature yields two important conceptions of trust. The
expectations-conception of trust focuses on a person’s expectations with respect to an
exchange partner, while the behavioural-conception focuses on a person’s behavioural
intentions with respect to an exchange partner (Singh & Sirdeshmukh, 2000). An example of
an expectation is that a customer expects to be treated fairly by the bank, and an example of a
behavioural intention is the customer’s intention to continue the relationship with the bank or
even expand the relationship, for example, by buying new products such as insurance or a
mortgage in addition to a bank account. The major difference between these conceptions is
that the expectations-conception of trust does not include behavioural intentions in the domain
of trust, while the behavioural-conception of trust does.
Morgan and Hunt (1994) conceived of trust as existing when one party has confidence
in an exchange partner’s reliability and integrity. This is an expectations-conception of trust,
which is based upon Rotter (1967), who defined trust as a generalised expectancy held by an
individual that the word of another individual or a group can be relied upon. Following
Morgan and Hunt (1994), we defined trust as a person’s confidence in the reliability and
integrity of the company (also, see Chapter 5). This is a common definition of trust in the
marketing literature (e.g., Verhoef, 2001, p. 18).
Singh and Sirdeshmukh (2000) conceived of trust as a continuum that is bounded on one
side by a high level of trust and on the other side by a high level of distrust. The trust state and
the distrust state differ with respect to the valence of the expectations held by the person. It
may be noted that some authors suggested distinguishing between different dimensions of
trust, such as competence-trust and benevolence-trust (e.g., Singh & Sirdeshmukh, 2000), or
benevolence-trust and honesty-trust (e.g., Medlin & Quester, 2002). This stance implies that
each dimension of trust is bounded by a high level of trust on the one side and by a high level
of distrust on the other side. However, the dimensionality of trust is an empirical question,
and studies establishing the dimensionality of trust are rare (Singh & Sirdeshmukh, 2000) so
that definitive conclusions cannot be drawn. It may also be noted that empirical research
demonstrated a relation between expectations and customer satisfaction. This relation is
reflected in disconfirmation theory, in which expectations are conceived of as antecedents of
customer satisfaction (e.g., Oliver, 1997; Tse & Wilton, 1988). Because trust concerns a
person’s expectations regarding an exchange partner (Morgan & Hunt, 1994), trust may also
be conceived of as an antecedent of customer satisfaction (Singh & Sirdeshmukh, 2000).
In the financial services industry, trust is often conceived of as confidence in the
reliability and integrity of a company. This is in agreement with the expectations-conception
of trust, which is the common conception of trust in the marketing literature. Because persons
are expected to prefer a company they trust to companies they do not trust, trust is considered
an important success factor for companies in the financial services industry (e.g., Goedee,
Reijnders, & Van Thiel, 2008).
Conceptions of quality
There are two important conceptions of quality, which are objective quality and perceived
quality (Oliver, 1997, pp. 162-166). Objective quality pertains to the extent to which a product, a
service, or a process meets its technical specifications. It may be operationalised as the
number of failures of a product, a service, or a process (e.g., Garvin, 1983; Kackar, 1989, p. 6;
Woodall, 2001; because the number of failures is counter-indicative of quality, small numbers
of failures reflect high quality and large numbers of failures reflect low quality). Perceived
quality pertains to a person’s judgements of quality of products or services. It may be
operationalised on the basis of a questionnaire (e.g., Parasuraman, Berry, & Zeithaml, 1988;
Cronin & Taylor, 1992). Perceived quality is similar to perceived performance of products or
services, which is broadly conceived of as an antecedent of customer satisfaction (e.g., Oliver,
1997; Tse & Wilton, 1988; Yi, 1990).
The meaning of quality is context-specific. This implies that the definition and the
operationalisation of quality have to be adapted to the context and the purpose of a study. In
the present study, we defined quality as a person’s perceptions of the quality of attributes of
products and services provided by the company (also, see Chapter 5). Thus, in this study
quality was conceived of as perceived quality, which is in agreement with the conception of
quality in many studies (e.g., Grönroos, 1990; Zeithaml, Parasuraman, & Berry, 1990).
Furthermore, quality is established with respect to distinct attributes of products and services,
which corresponds with the suggestion of theorists (e.g., Anderson & Mittal, 2000; Zeithaml
et al., 1990; Zeithaml & Bitner, 1996) to distinguish different dimensions of quality. For
example, Zeithaml and Bitner (1996, p. 85) distinguished service quality, product quality, and
price quality as drivers of customer satisfaction. The combination of a customer’s positions on
these dimensions was expected to drive customer satisfaction.
Service quality has been studied extensively (e.g., Cronin & Taylor, 1992, 1994;
Grönroos, 1984, 1990; Parasuraman, Zeithaml, & Berry, 1985, 1988, 1994; Zeithaml &
Bitner, 1996; Zeithaml, Parasuraman, & Berry, 1990). These studies yielded several
measurement instruments for service quality, for example SERVQUAL (Parasuraman et al.,
1988) and SERVPERF (Cronin & Taylor, 1992). One remark is in order concerning these
instruments. SERVQUAL and SERVPERF were developed for the measurement of quality
across industries, but they were not customised for the measurement of quality in particular
industries, such as retail banking (e.g., Buttle, 1996; Coulthard, 2004; Newman, 2001; Oliver,
1997, p. 49). Therefore, the instruments may not cover all aspects of quality that are relevant
within a particular industry, and for that reason business researchers are required either to
customise these instruments to their research domain or to develop new measurement
instruments.
In the financial services industry, quality is broadly conceived of as a driver of customer
satisfaction (e.g., Goedee et al., 2008; Terpstra & Van Gastel, 2004). This is in accordance
with academic studies and theories (e.g., Caruana, 2002; Oliver, 1997; Van Montfort,
Masurel, & Van Rijn, 2000; Tse & Wilton, 1988; Yi, 1991; Zeithaml & Bitner, 1996). A
major part of in-company research in this industry is aimed at the assessment of distinct
dimensions of quality, and their relations with satisfaction. For this purpose, quality is mostly
operationalised on the basis of quality judgements by customers, regarding distinct attributes
of products and services.
Conceptions of customer loyalty
In present marketing theories, customer loyalty is conceived of as a psychological construct.
Gremler and Brown (1996, 1999) have defined loyalty to a service provider as ‘the degree to
which a customer exhibits repeat purchasing behaviour from a service provider, possesses a
positive attitudinal disposition towards the provider, and considers only this provider when a
need for this service arises’. This definition encloses three different aspects of loyalty, which
are (a) behavioural loyalty, (b) attitudinal loyalty, and (c) cognitive loyalty. Gremler and
Brown (1996) described the ultimately loyal customer as one who ‘regularly uses a service
provider, really likes the organisation and thinks very highly of it, and does not ever consider
using another service provider for this service’. This description of the loyal customer
includes an implicit comparison of the service provider with other providers (also, see Dick &
Basu, 1994). On the other end of this continuum is the ultimately non-loyal customer, who
may be described as one who does not regularly use a service provider, does not really like
the organisation, does not think highly of it, and considers using another service provider for
this service (Gremler & Brown, 1996). Gremler and Brown’s (1996, 1999) conception of
loyalty to a service provider is similar to Oliver’s (1997, 1999) conception of customer
loyalty in general.
Most theorists agreed that customer loyalty encompasses psychological aspects as well
as behavioural aspects (e.g., Dick & Basu, 1994; Gremler & Brown, 1996, 1999; Oliver, 1997,
1999). Therefore, the construct has to be measured on the basis of a set of items that reflect
both aspects. Empirical research using measurement instruments of customer loyalty that are
composed of items reflecting psychological aspects and behavioural aspects of customer
loyalty (e.g., Caruana, 2002; Gremler & Brown, 1999), yielded unidimensional measurements
of customer loyalty. Customer loyalty has also been operationalised as an intention to
recommend the company to family, friends, or colleagues (e.g., Reichheld, 2006). For
three reasons, it is doubtful whether this was a proper operationalisation. First, the
operationalisation did not agree with the definitions of customer loyalty provided by Oliver
(1997, 1999) and Gremler and Brown (1996, 1999). Reichheld’s (2006) operationalisation
appears more consistent with conceptions of word-of-mouth, which is a concept that was not
investigated in this study. Second, the operationalisation ignored the general principle that
psychological constructs are best measured on the basis of multiple-item scales (e.g., Messick,
1989). Third, Terpstra (2006a) found indications that customers who said they would
recommend a particular company to friends and family often said they would also recommend
competing companies. This seems to be inconsistent with customer loyalty.
In the financial services industry, customer loyalty is considered important for
commercial success of companies (e.g., Goedee et al., 2008). Customer loyalty is expected to
affect the behaviour of customers and ultimately their profitability. Furthermore, business
researchers in this domain broadly conceive of customer loyalty as a consequence of customer
satisfaction. This agrees with results from academic research (e.g., Caruana, 2002; Gremler &
Brown, 1996; Hennig-Thurau et al., 2002; Oliver, 1997, 1999).
Conceptions of customer profitability
Customer profitability is of major importance for all commercial companies in service
industries, including the financial services industry. Theorists suggested using customer
profitability for marketing decision-making and accounting (e.g., Cooper & Kaplan, 1991;
Mulhern, 1999; Niraj, Gupta, & Narasimhan, 2001). There are two important conceptions of
customer profitability, which are gross customer profitability and net customer profitability.
Gross customer profitability refers to the gross financial contribution of a customer to the
company in some period of time (e.g., Cooper & Kaplan, 1991, p. 469; Niraj et al., 2001). In
the context of retail banking, the gross financial contribution consists of interest profits and
provision profits (to be discussed in Chapter 5). Net customer profitability refers to the net
financial contribution of a customer to a company in some period of time. The net financial
contribution consists of the customer’s gross customer profitability in that period of time
minus the company’s costs allocated to the corresponding customer in the same period of
time (e.g., Campbell & Frei, 2004; Cooper & Kaplan, 1991, p. 469; Mulhern, 1999; Niraj et
al., 2001; Pfeifer, Haskins, & Conroy, 2005).
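The distinction between the two conceptions can be made concrete with a short Python sketch. It assumes, as described above for retail banking, that the gross contribution is the sum of interest profits and provision profits; the class name, field names, and amounts are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass
class CustomerPeriod:
    """Financial figures for one customer in one period (hypothetical fields)."""
    interest_profit: float    # interest profits earned on the customer
    provision_profit: float   # provision profits earned on the customer
    allocated_costs: float    # company costs allocated to the customer


def gross_profitability(period: CustomerPeriod) -> float:
    """Gross financial contribution of the customer in the period."""
    return period.interest_profit + period.provision_profit


def net_profitability(period: CustomerPeriod) -> float:
    """Gross contribution minus the costs allocated to the customer."""
    return gross_profitability(period) - period.allocated_costs


# Purely illustrative amounts, not taken from the study.
example = CustomerPeriod(interest_profit=120.0,
                         provision_profit=30.0,
                         allocated_costs=45.0)
print(gross_profitability(example), net_profitability(example))
```

The two measures differ only by the allocated costs, so any uncertainty in the cost allocation affects net customer profitability but not gross customer profitability.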
Customer profitability is the resultant of customer behaviour, such as the acquisition
and use of products and services from the focal company. Because customers differ with
respect to their behaviour, they also differ with respect to customer profitability. Furthermore,
because a customer’s behaviour changes over time, customer profitability also changes over
time. For example, a customer who increases his or her business with the company will
become more profitable to the company than he or she was before.
In the financial services industry, customer profitability is the resultant of financial
behaviour. Because a customer’s financial behaviour is related to his or her financial means, a
customer’s profitability is also related to his or her financial means. Obviously, a customer
with large financial means may achieve higher customer profitability than a customer with
smaller financial means. The absence of data with respect to customers’ means, which in this
kind of research is more the rule than the exception, may complicate research into the
connection between customer satisfaction and customer profitability in the financial services
industry.
The operationalisation of customer profitability is context-dependent. For example, the
period of time may be a day, a month, a quarter of a year, or a year (e.g., Campbell & Frei,
2004). For instance, due to the high purchase frequency, a two-week period may be sufficient
to reliably record customers’ purchase behaviour in a supermarket (a two-week period is
expected to cancel out highs and lows), but due to the much lower purchase frequency, at
least a one-year period may be required to reliably record customers’ purchase behaviour with
a retail bank. Therefore, a two-week period may suffice for the operationalisation of customer
profitability for supermarkets, while a one-year period is required for the operationalisation of
customer profitability in retail banking.
We expected that customer satisfaction positively influenced a customer’s gross
financial contribution, but we held no expectation about the influence of customer satisfaction
on the costs associated with a customer. Therefore, we chose the gross customer profitability
conception of customer profitability for the present study. In agreement with this conception
of customer profitability, we defined customer profitability as the gross financial contribution
of a customer to the company in some period of time.
The influence of customer satisfaction on customer profitability
Customer satisfaction is broadly expected to influence customer profitability and company
profitability (e.g., Anderson et al., 1994; Anderson et al., 2004; Anderson & Mittal, 2000;
Fornell, 1992; Gustafsson et al., 2005; Homburg et al., 2005; Mittal & Kamakura, 2001;
Oliver, 1997; Rust & Zahorik, 1993). This is an important reason for the interest in customer
satisfaction in various industries, including the financial services industry.
If customer satisfaction (denoted by CS) influences customer profitability (denoted by
CP), there must be a relation between customer satisfaction at time t = 0 and customer
profitability at time t > 0 (e.g., Ittner & Larcker, 1998). Then a model for the relation between
customer satisfaction (denoted CS_{t=0}), other independent variables (denoted X_i), and future CP
(denoted CP_{t>0}) is:
\[ CP_{t>0} = \alpha + \beta \, CS_{t=0} + \sum_i \gamma_i X_i + \ldots + \varepsilon . \]
The model was based on Ittner and Larcker (1998). The exact specification of the model is
context-dependent (see Chapter 6).
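As a minimal sketch of how a model of this form could be estimated, the following Python code fits an ordinary least squares regression of future customer profitability on customer satisfaction and one additional independent variable. Everything specific here is an assumption made for illustration: the data are synthetic, the single additional variable is taken to be current customer profitability, and the coefficient values and helper names (solve_linear_system, fit_ols) are hypothetical. It does not reproduce the specification used in Chapter 6.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients via the normal equations.

    Each row must start with 1.0 so that the first coefficient is the intercept.
    """
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic, purely illustrative data (not from the study).
rng = random.Random(0)
TRUE_ALPHA, TRUE_BETA, TRUE_GAMMA = 5.0, 2.0, 0.9
rows, cp_future = [], []
for _ in range(2000):
    cs = rng.uniform(1, 10)        # customer satisfaction at t = 0
    cp_now = rng.uniform(0, 500)   # current customer profitability at t = 0
    noise = rng.gauss(0, 1.0)      # error term
    rows.append([1.0, cs, cp_now])
    cp_future.append(TRUE_ALPHA + TRUE_BETA * cs + TRUE_GAMMA * cp_now + noise)

alpha, beta, gamma = fit_ols(rows, cp_future)
print(round(alpha, 2), round(beta, 2), round(gamma, 3))
```

In this sketch the estimate of beta expresses the expected change in future customer profitability per unit of satisfaction, holding current customer profitability constant. With real data a dedicated statistics package would normally be preferred over hand-written normal equations.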
Henceforth, customer profitability at time t > 0 is labeled future customer profitability,
because it is the customer profitability at a point in time after the measurement of customer
satisfaction. Current customer profitability and customer satisfaction are both measured at
time t = 0.
It is plausible that the effect size of customer satisfaction on future customer
profitability depends on characteristics of customers and markets, such as involvement of
customers and the availability of alternatives in the market. Fornell (1992) hypothesised that
customer satisfaction affects the commercial success of companies that operate in mature and
competitive markets. Therefore, we expect that in retail banking industries in mature markets,
customer satisfaction has a significant positive effect on future customer profitability.
Various studies (Anderson et al., 1994; Anderson & Mittal, 2000; Gruca & Rego, 2005;
Ittner & Larcker, 1998) demonstrated a relationship between customer satisfaction and
company profitability after one year. Ittner and Larcker (1998) also demonstrated a relationship
between customer satisfaction and customer profitability after one year. In-company research
(Terpstra, 2005, 2008) demonstrated a relationship between customer satisfaction and
customer profitability after 15 months. Therefore, we expect that in retail banking the
influence of customer satisfaction on customer profitability is manifest after one year.
Earlier studies in the financial services industry (e.g., Campbell & Frei, 2004; Terpstra,
2005, 2006b) demonstrated that current customer profitability is the major determinant of
future customer profitability. This relationship may be due to, for example, inertia of
customers, and the relationship between current customer profitability and the financial means
of customers. For these reasons, we consider current customer profitability an indispensable
variable in the model of the relation between customer satisfaction and future customer
profitability in retail banking.
6 Measures of satisfaction
Many measures of satisfaction have been reported in the marketing literature (e.g.,
Hausknecht, 1990; Peterson & Wilson, 1992; Westbrook & Oliver, 1981; Wirtz & Lee, 2003).
Hausknecht (1990) listed 34 measures (i.e., operationalisations) of satisfaction, which were
used in satisfaction research. The list included behavioural measures (i.e., registrations of
behaviour, such as number of complaints about the focal product) and self-report measures
(i.e., survey items, such as rating scales). The self-report measures differed with respect to the
number of items included (varying from one to six items), the format of items (verbal items,
graphical items, and items reflecting observations of behaviours such as the number of
complaints), the wording of the items (some items were phrased in the form of a question and
others were phrased in the form of a statement) and the format of response categories (varying
from two to thirteen response categories). Hausknecht (1990) noted that the validity of
measurements of satisfaction was rarely assessed; also see Giese and Cote (2000) and
Peterson and Wilson (1992).
It is remarkable that different measures of satisfaction, which were used in different
studies, yielded similar distributions of satisfaction ratings. Peterson and Wilson (1992) noted
that ‘Virtually all self-reports of customer satisfaction possess a distribution in which a
majority of the responses indicate that customers are satisfied and the distribution itself is
negatively skewed.’ They also demonstrated that method-related factors, such as question
format, question context, questionnaire administration, and measurement timing, affected the
average satisfaction ratings and the skewness of distributions of satisfaction ratings. They
concluded that it is not clear what customer satisfaction ratings reflect, that average
satisfaction ratings are not very informative without valid norms for average customer
satisfaction, and that more effort is needed to improve the measurement of customer
satisfaction.
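The pattern described by Peterson and Wilson (1992), a majority of satisfied responses and a negatively skewed distribution, can be checked with a few lines of code. The sketch below is a minimal illustration that assumes a 5-point item and uses hypothetical response counts; the moment-based skewness coefficient is one common choice among several, and the function name `sample_skewness` is ours.

```python
from statistics import mean


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5, using population moments."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical counts per response category
# (1 = very dissatisfied, 5 = very satisfied); not data from any study.
counts = {1: 2, 2: 5, 3: 13, 4: 45, 5: 35}
ratings = [score for score, k in counts.items() for _ in range(k)]

share_satisfied = sum(k for score, k in counts.items() if score >= 4) / len(ratings)
skewness = sample_skewness(ratings)

print(f"mean rating: {mean(ratings):.2f}")
print(f"share satisfied (4 or 5): {share_satisfied:.2f}")
print(f"skewness: {skewness:.2f} (negative means a longer tail towards dissatisfaction)")
```

A high mean combined with negative skewness, as in this hypothetical example, illustrates why an average satisfaction rating is hard to interpret without a norm.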
In this section, different measures of satisfaction are discussed in association with the
corresponding definitions of satisfaction as discussed in Section 2 (Table 2). The definitions
of satisfaction and the corresponding measures are listed in Table 3.
Tse and Wilton (1988) used a single-item measure of satisfaction, which was a 5-point
bipolar item with response categories ranging from very dissatisfied to very satisfied. The
item reads: ‘Considering everything, how satisfied are you with the [product]?’. This bipolar
item is a rather common measure of satisfaction, which was also used by others who,
however, used a 7-point rating scale instead of a 5-point rating scale (e.g., Westbrook &
Oliver, 1991; Wirtz & Lee, 2003). Furthermore, the item was used in various multiple-item
measures of satisfaction (e.g., Wirtz & Lee, 2003).
Tse and Wilton (1988) demonstrated that their single-item measure correlated with
disconfirmation and perceived performance. Nevertheless, the measure has three drawbacks.
First, the definition of satisfaction by Tse and Wilton (1988) has a level of abstractness that
does not automatically lead to this specific item. Second, it is a single-item measure of
satisfaction, whereas most theorists suggested the use of multiple-item measures for the
measurement of psychological constructs such as satisfaction because multiple-item measures
better capture the meaning of the construct (e.g., Churchill, 1979; Jacoby, 1976; Messick,
1989; Yi, 1990). Third, Westbrook and Oliver (1991) demonstrated that their 7-point version
of the item performed worse than other measures of satisfaction that were used in the same
study. These three drawbacks call into question the validity of the measurement using a single
item.
Bloemer (1993) proposed a two-step approach to measure satisfaction and
dissatisfaction. First, a person was asked whether he or she was satisfied or dissatisfied with
the focal object. Second, the person was asked how satisfied (or how dissatisfied) he or she
was in terms of, for example, a percentage ranging from 0 to 100. Bloemer’s (1993) measure
correlated with commitment and repeat-purchasing behaviour. However, three comments are
in order. First, the measure lacks a thorough explanation. Bloemer (1993, pp. 79, 128)
conceived of satisfaction and dissatisfaction as two different dimensions, but this does not
explain the use of a two-step approach to measure satisfaction and dissatisfaction. One may
argue that if satisfaction and dissatisfaction are conceived of as different dimensions, it is
appropriate to separately measure the level of satisfaction as well as the level of
dissatisfaction of each customer. Second, the assessment of the level of satisfaction is based
upon only one item (Bloemer, 1993, p. 145), but most theorists advocate multiple-item scales
for the measurement of psychological constructs such as satisfaction (e.g., Churchill, 1979;
Jacoby, 1976; Messick, 1989; Yi, 1990). Third, a study by Westbrook and Oliver (1991), who
used one 11-point item on satisfaction and one 11-point item on dissatisfaction, indicated that
dissatisfaction and satisfaction are opposites on a bipolar dimension. This is in contrast with
Bloemer’s (1993) stance.

Table 3: Measures of Customer Satisfaction and/or Consumer Satisfaction

Author: Tse & Wilton (1988)
Definition of satisfaction: The consumer’s response to the evaluation of the perceived
discrepancy between prior expectations (or some other norm of performance) and the actual
performance of the product as perceived after its consumption.
Measure of satisfaction: Satisfaction [with a focal object (f.o.)] was measured on the basis of
one 5-point bipolar item, with response categories ranging from ‘very dissatisfied’ to ‘very
satisfied’.

Author: Bloemer (1993)
Definition of satisfaction: The outcome of the subjective evaluation that the chosen
alternative (the brand) meets or exceeds the expectations of the person.
Measure of satisfaction: Satisfaction [with a f.o.] was measured on the basis of a 2-step
approach:
• Are you satisfied or dissatisfied with the brand?
• How much are you satisfied (dissatisfied) in terms of a percentage?

Author: Howard & Sheth (1969)
Definition of satisfaction: The buyer’s cognitive state of being adequately or inadequately
rewarded for the sacrifices she or he has undergone.
Measure of satisfaction: No measure was proposed.

Author: Fornell (1992)
Definition of satisfaction: An overall post-purchase evaluation.
Measure of satisfaction: Satisfaction [with a f.o.] was measured on the basis of 3 questions:
• one 10-point bipolar item on global satisfaction
• one 10-point bipolar item on disconfirmation of expectations
• one 10-point bipolar item on distance to the ideal.

Author: Oliver (1997)
Definition of satisfaction: The judgement that a product or a service feature, or the product
or service itself, provided or is providing a pleasurable level of consumption-related
fulfilment, including levels of under- or overfulfilment.
Measure of satisfaction: A multiple-item measure of satisfaction [with a f.o.] was proposed:
• seven 5-point Likert items that are indicative of satisfaction
• five 5-point Likert items that are counter-indicative of satisfaction.

Author: Giese & Cote (2000)
Definition of satisfaction: (a) an affective response of varying intensity, (b) directed
towards focal aspects of the acquisition and/or consumption of products and services,
(c) determined at the time of purchase or temporal points during consumption.
Measure of satisfaction: A framework was proposed for the development of context-specific
definitions of satisfaction. As a consequence, no measure was proposed that is generally
applicable.

Howard and Sheth (1969) did not discuss the measurement of satisfaction, and did not
propose a measure of satisfaction. Measures of satisfaction that are associated with the
definition of satisfaction as a cognition (e.g., Howard & Sheth, 1969) are summated
performance ratings (Oliver, 1997, p. 318). An example is the measurement of customer
satisfaction by means of the sum of a customer’s ratings of features of products and services.
We subscribe to Oliver’s (1997, pp. 33-34, 318) criticism that (a) it is unclear which features
of products and services may be used for the measurement of customer satisfaction and how
these features may be weighted, (b) these measurements do not match the theoretical meaning
of satisfaction, which incorporates the affective content of satisfaction, and (c) these
measurements are useless for research in which the influence of features of products and
services on satisfaction is investigated.
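Criticism (a) can be made concrete with a small sketch. It assumes three hypothetical features, two hypothetical customers, and two arbitrary weighting schemes; none of these stem from the literature discussed here. The sketch shows that the ordering of customers on a summated performance rating can reverse when the weights change.

```python
def summated_rating(ratings, weights):
    """Weighted sum of a customer's feature ratings."""
    return sum(weights[feature] * rating for feature, rating in ratings.items())


# Hypothetical feature ratings on a 5-point scale.
customer_a = {"fees": 5, "service": 2, "website": 3}
customer_b = {"fees": 2, "service": 5, "website": 3}

# Two arbitrary weighting schemes (assumptions for illustration).
fee_heavy = {"fees": 3, "service": 1, "website": 1}
service_heavy = {"fees": 1, "service": 3, "website": 1}

for label, weights in (("fee-heavy", fee_heavy), ("service-heavy", service_heavy)):
    score_a = summated_rating(customer_a, weights)
    score_b = summated_rating(customer_b, weights)
    higher = "A" if score_a > score_b else "B"
    print(f"{label}: A={score_a}, B={score_b}, higher score: customer {higher}")
```

Because the choice of weights determines which customer appears more satisfied, the weighting needs a theoretical justification before such a sum can be interpreted.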
Fornell (1992) proposed a measure of summary satisfaction (or cumulative satisfaction)
that was composed of three 10-point bipolar items. The items concerned (a) global
satisfaction of the customer with the product, service, or company, (b) disconfirmation of
expectations of the customer regarding the product, service, or company, and (c) the distance
from the customers’ hypothetical ideal product, service, or company. The measure was
incorporated in the Swedish Customer Satisfaction Index, the Norwegian Customer
Satisfaction Index, and the American Customer Satisfaction Index (e.g., Fornell, 1992;
Fornell, Johnson, Anderson, Cha, & Bryant, 1996; Johnson, Gustafsson, Andreassen, Lervik,
& Cha, 2001), and it was used in various empirical studies (e.g., Anderson, Fornell, &
Lehmann, 1994; Anderson, Fornell, & Mazvancheryl, 2004; Anderson & Mittal, 2000; Gruca
& Rego, 2005; Ittner & Larcker, 1998). This strengthened the confidence in the quality of the
measure. However, the measure lacks correspondence with Fornell’s (1992) definition of
satisfaction, meaning that it is not obvious why the abstract definition of satisfaction resulted
in this particular measure of satisfaction and not in another measure. For example, Verhoef
(2001, p. 18, p. 57) developed a measure of satisfaction on the basis of the definition by
Fornell (1992; also Anderson, Fornell, & Lehmann, 1994) that was much different from
Fornell’s (1992) measure. Verhoef’s (2001) measure of satisfaction was the total score (or the
factor score derived from confirmatory factor analysis) on seven items regarding satisfaction
with the company. An example of such an item was ‘How satisfied are you with the personal
attention of XYZ?’, which had five response categories, ranging from very dissatisfied to very
satisfied (the seven items were in the same format). Because Fornell’s (1992) definition does
not provide many clues for constructing measures, it is difficult to judge whether Fornell’s
(1992) and Verhoef’s (2001) measures correspond with this definition.
Oliver (1997, p. 343) proposed a measure of summary satisfaction that incorporates
different phenomena together defining the meaning of satisfaction. First, Oliver noted that
satisfaction is best measured using a multiple-item scale. Second, he noted that the measure
should contain an anchor item, which is an item formulated in terms of general satisfaction
with the product or the service provided. Third, Oliver listed several aspects or antecedents of
satisfaction that may be incorporated in a measure of satisfaction, such as performance
evaluations, expectations, disconfirmation, need fulfilment, dissonance, and affects. Fourth,
he included several items that are counter-indicative of satisfaction and, consequently,
indicative of dissatisfaction. This is consistent with the conception of dissatisfaction as the
opposite of satisfaction on a bipolar dimension, and with general psychometric principles
regarding the measurement of psychological constructs (e.g., Oort, 1996).
The inclusion of items on various phenomena in Oliver’s (1997) measure of summary
satisfaction does not imply that the author conceived of summary satisfaction as a
multidimensional construct. The dimensionality of a construct is ultimately an empirical
question, and empirical research (e.g., Mano & Oliver, 1993; Oliver & Swan, 1989; Oliver,
1993; Wirtz & Lee, 2003) has supported the conception of summary satisfaction as a
unidimensional construct.
Oliver’s (1997) measure of summary satisfaction was composed of twelve 5-point
Likert items. Seven items were indicative of satisfaction and five items were counter-
indicative of satisfaction. The measure was accommodated to the measurement of satisfaction
with one’s car. An earlier version of the measure was composed of six 5-point Likert items
(Oliver, 1980), and was accommodated to the measurement of satisfaction with a flu
vaccination program.
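The handling of counter-indicative items can be sketched as follows. The sketch assumes that a total score is obtained by reverse-scoring the counter-indicative items and summing all twelve responses; this scoring rule, the item labels, and the function names are assumptions for illustration and are not taken from Oliver (1997).

```python
def reverse_score(response, n_categories=5):
    """Mirror a Likert response so that a higher value always means more satisfied."""
    return n_categories + 1 - response


def total_score(responses, counter_indicative, n_categories=5):
    """Sum the item responses after reverse-scoring the counter-indicative items."""
    total = 0
    for item, response in responses.items():
        if not 1 <= response <= n_categories:
            raise ValueError(f"response out of range for {item}: {response}")
        if item in counter_indicative:
            total += reverse_score(response, n_categories)
        else:
            total += response
    return total


# Hypothetical item labels: seven indicative (p1..p7), five counter-indicative (n1..n5).
indicative = [f"p{i}" for i in range(1, 8)]
counter_indicative = {f"n{i}" for i in range(1, 6)}

# A hypothetical respondent who fully agrees (5) with every indicative item
# and fully disagrees (1) with every counter-indicative item.
responses = {item: 5 for item in indicative}
responses.update({item: 1 for item in counter_indicative})

print(total_score(responses, counter_indicative))  # prints 60, the maximum of 12 * 5
```

Reverse-scoring places both item types on the same bipolar dimension, so that a consistent respondent obtains the same extreme score from indicative and counter-indicative items.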
Oliver (1997) argued that the optimal composition of a measure depends on (a) the
research topic and (b) the research purpose. For example, if a particular phenomenon such as
disconfirmation has to be related to satisfaction, it should not be incorporated in the
satisfaction measure (Oliver, 1997, p. 343). This is in accordance with the psychometric
principle regarding divergent validity (Campbell & Fiske, 1959).
Giese and Cote (2000) argued that a measure of satisfaction should be context-specific
and, as a result, they did not propose a measure of satisfaction that is generally applicable.
The absence of a general measure is consistent with the view that satisfaction may have
different meanings in different contexts, and contrasts with Fornell’s (1992) position that resulted
in a measure that was applicable across a variety of industries.
Three remarks are in order with respect to the measurement instruments of satisfaction
listed in Table 3. First, the correspondence between a particular definition of satisfaction on
the one hand and a particular measurement instrument of satisfaction on the other hand is
often ambiguous. Thus, it is not obvious why a particular definition of the construct resulted
in a particular measurement instrument for satisfaction, and not in another one. This lack of
clarity may be due to the generality of most definitions of satisfaction, which did not provide
enough clues for the development of a measurement instrument of satisfaction. For
example, the definition of satisfaction by Fornell (1992; see also Anderson, Fornell, &
Lehmann, 1994) was used as a justification for two very different measurement instruments of
customer satisfaction.
Second, construct validity has received too little attention. Satisfaction studies yielded
evidence of convergent, divergent, and nomological validity of measurements of satisfaction (e.g.,
Oliver, 1980; Oliver & Burke, 1999; Tse & Wilton, 1988; Verhoef, 2001; Westbrook &
Oliver, 1991; Wirtz & Lee, 2003), but failed to address the main threats to construct validity,
which are construct underrepresentation and construct-irrelevant variance (Messick, 1989).
For example, except for Oliver’s (1997) measure, it was not adequately investigated whether the
measures represented the construct sufficiently, and none of the other studies investigated
contamination of measurements with method-related irrelevant variance.
Third, the usefulness of satisfaction research for the further development of satisfaction
theory may be enhanced by the further improvement of measurement instruments of
satisfaction. Because the meaning of satisfaction is context-specific, such measurement
instruments may be developed on the basis of context-specific definitions of satisfaction. This
implies the development of different measurement instruments for satisfaction for different
research domains (also, see Giese & Cote, 2000).
7 Discussion
Satisfaction may be considered a response to disconfirmation; thus, the process that evokes
the satisfaction response is at the centre of attention. The definitions associated with this
conception are process definitions. They describe the process that evokes the satisfaction
response, but fail to explain what the satisfaction response is (Oliver, 1997, pp. 12-13).
Alternatively, satisfaction may be considered a valenced response to consumption. Here, the
content of the satisfaction response is central. Because the meaning of satisfaction concerns
the content of the satisfaction response (for a more general discussion, see Sartori, 1984;
Schouwstra, 2000), we consider the latter conception more useful for defining satisfaction
than the former conception.
The prototypical definitions associated with the conception of satisfaction as a valenced
response to consumption differ with respect to the specification of the properties of the
satisfaction response and the level of detail of the explanation of the satisfaction response.
First, Howard and Sheth (1969) defined satisfaction as a cognitive response to consumption,
whereas Giese and Cote (2000) defined satisfaction as an affective response to consumption.
Second, Fornell (1992) provided a generic definition of satisfaction, whereas Oliver (1997)
provided a detailed definition of satisfaction. As was noted in Section 6, Fornell’s (1992)
definition of satisfaction was too generic for the development of a measurement instrument of
satisfaction. Following Giese and Cote (2000), we think that a sufficiently detailed definition
of satisfaction requires the specification and the explanation of (a) the type of satisfaction
response, (b) the focal object of the satisfaction response, and (c) the timing of the satisfaction
response.
There is no consensus definition of satisfaction, which probably is due to the context-
specific nature of satisfaction (Giese & Cote, 2000). Therefore, we subscribe to Giese and
Cote’s (2000) recommendation to develop context-specific definitions and corresponding
measurement instruments of satisfaction. Because the meaning of satisfaction is context-
dependent, we do not agree with Giese and Cote (2000) that satisfaction is limited to affective
responses to consumption experiences. Oliver (1997) demonstrated that satisfaction can have
cognitive content and affective content, because it can manifest in performance evaluations,
expectations, disconfirmation, regret, and emotions. Whether the cognitive content or the
affective content prevails depends on the research domain and on characteristics of the
person (Oliver, 1997, pp. 316-318).
Four additional remarks are in order to explain satisfaction in the context of retail
banking. First, satisfaction pertains to the satisfaction of the customers of the bank. For this
reason, we consider customer satisfaction the best term for satisfaction in the context of retail
banking. Second, consumption of products and services from a retail bank is an ongoing
process. People remain customers of a bank for a long period of time, in which they make use
of products and services from the company, and maintain some contact with the company. In
this context, customer satisfaction results from the accumulation of encounters with the
company. Third, because customer satisfaction may result from unappraised affects, appraised
affects, unappraised cognitions, and appraised cognitions, the construct includes both manifest
customer satisfaction and latent customer satisfaction. Fourth, because a customer’s
satisfaction with a bank may range from very satisfied to very dissatisfied, customer
satisfaction is the opposite of customer dissatisfaction on a bipolar dimension. In this study,
each of these four remarks was taken into account.
Explicit definition of customer satisfaction with a retail bank
Giese and Cote (2000) rightly argued that the meaning of satisfaction is context-specific, and
that the definition and measurement of satisfaction also need to be context-specific. It is not
possible to develop an explicit definition of satisfaction that grasps the meaning of satisfaction
in all contexts. It is more fruitful to analyse the meaning of customer satisfaction within a
particular context, and then develop a context-specific definition. This study pertains to
customer satisfaction with a retail bank, and it is limited to summary satisfaction. In this
context, customer satisfaction
(a) is limited to the satisfaction of customers of the company;
(b) pertains to the company as a whole, and not to single products or services;
(c) results from the accumulation of encounters of customers with the company;
(d) results from the psychological processing of consumption outcomes;
(e) covers customers’ affects and cognitions reflecting a value judgement;
(f) may result from appraised affects, appraised cognitions, unappraised affects, and
unappraised cognitions;
(g) becomes manifest in customers’ performance evaluations, expectations,
disconfirmation, emotions, and regret; and
(h) is the opposite of customer dissatisfaction on a bipolar dimension.
These eight characteristics explain the content of customer satisfaction with a retail
bank. We summarise them as follows: customer satisfaction with a retail bank is the
valenced response of the customer, directed towards the retail bank, and evoked by the
customer’s experiences with the retail bank over time. This is the explicit definition of
customer satisfaction with the retail bank. It may be noted that the definition covers the three
components that Giese and Cote (2000) required of a definition of satisfaction. First,
satisfaction is conceived of as the customer’s valenced response. Second, the focus of the
customer’s response is the retail bank. Third, the timing of the response is during or after the
customer’s experiences with the retail bank. Because evaluations range from positive to
negative, dissatisfaction is simultaneously defined as the opposite of satisfaction on a bipolar
dimension.
Implicit definition of customer satisfaction with a retail bank
Whereas the explicit definition addresses the construct, the implicit definition addresses the
construct’s relations to other constructs and variables (Schouwstra, 2000, p. 61). Therefore,
the implicit definition of customer satisfaction is founded on the nomological network of the
construct, which was discussed in Section 5 of this chapter.
Customer satisfaction with a retail bank is implicitly defined in terms of its relations
with trust, quality, and customer loyalty, and its influence on customer profitability. As a
consequence, it is expected that overall satisfaction with a retail bank is positively related to
(a) trust in the company, (b) quality perceptions regarding the products and services provided
by the company, (c) loyalty to the company, and (d) future customer profitability.
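The expectations (a) to (d) can be screened with simple correlations before more formal hypothesis tests are carried out. The sketch below is a minimal illustration that assumes hypothetical scores for six customers and uses the Pearson correlation; the variable names and the data are invented for the example and do not stem from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for six customers (assumption: illustration only).
satisfaction = [2, 3, 3, 4, 5, 5]
related = {
    "trust": [1, 3, 2, 4, 4, 5],
    "quality": [2, 2, 3, 4, 5, 4],
    "loyalty": [1, 2, 3, 3, 5, 4],
    "future_profitability": [10.0, 12.0, 11.0, 15.0, 18.0, 17.0],
}

for name, scores in related.items():
    r = pearson(satisfaction, scores)
    print(f"{name}: r = {r:+.2f}, sign as expected: {r > 0}")
```

A positive sign for each correlation is consistent with the implicit definition; it does not by itself establish nomological validity.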
8 Conclusions
1. The meaning of customer satisfaction differs between and within contexts. For this
reason (a) it cannot be sharply defined but it needs to be explained by means of
examples, and (b) the examples are context-dependent.
2. Dissatisfaction may be conceived of as the opposite of satisfaction on a bipolar
dimension. This means that satisfaction/dissatisfaction is expected to constitute a
unidimensional construct, and that customers are not both satisfied and dissatisfied with
the same phenomenon at one point in time.
3. Customer satisfaction with a retail bank is explicitly defined as the valenced response of
the customer that is directed towards the bank and that is evoked by the whole of
consumption experiences with the bank. This definition encompasses various cases that are
mutually related by family resemblances.
4. Customer satisfaction with a retail bank is implicitly defined on the basis of its
connections with other psychological constructs and with behaviour. In the domain of
retail banking, the relations of satisfaction with (a) trust, (b) quality, (c) customer
loyalty, and (d) future customer profitability are considered most important.
5. Many measures of satisfaction have been reported in the marketing literature, and
different measures of satisfaction are associated with different definitions of the
construct. However, evidence of construct validity of most measures of satisfaction is
absent. This limits the usefulness of satisfaction research for the development of
satisfaction theory.
6. The usefulness of satisfaction research for the development of satisfaction theory may
be enhanced by further improvement of the measures of satisfaction. The improvement
of measures of satisfaction entails (a) explication of the context-specific meaning of
satisfaction, (b) explication of correspondence between the definition and the measure of
satisfaction, and (c) assessment of validity of measurements of satisfaction. In the next
chapters, we will develop a context-specific measurement instrument of satisfaction and
validate the measurements obtained with this instrument.
Chapter 4
Deductive design for test development and construct validation
1 Introduction
Psychological properties can be measured by means of psychological tests (Chapter 1, Section
4). A psychological test is an instrument which elicits behaviour that is representative of the
property of interest and which can be used to measure the extent to which a person possesses
the property. A test may consist of a well-chosen set of items that are administered in a
survey. On the basis of the responses a person provides to these items, his or her position on
the scale for the property is inferred.
This chapter addresses the design of the empirical study. The purpose of the study was
to develop a measurement instrument for customer satisfaction with retail banks, and to test
the relations of customer satisfaction with constructs and variables in the corresponding
nomological network. For this purpose, we applied the deductive design (Schouwstra, 2000)
for test development and construct validation.
2 The deductive design
The deductive design (Schouwstra, 2000) is a methodology for test development and
construct validation for typical-behaviour properties such as customer satisfaction. The
methodology starts from a theoretical analysis of the construct of interest. In this respect, it
is consistent with the deductive approach to test development (Oosterveld, 1996).
Following Messick (1989, pp. 13, 34), Schouwstra (2000, p. 57) defined construct
validity as ‘an evaluative judgement of the trustworthiness of a test-score interpretation in
terms of a construct’. Messick (1989, pp. 34-35, 1995) addressed two general threats to
construct validity, which are (a) construct underrepresentation, and (b) measurement of
irrelevant variance. Construct underrepresentation occurs when only a part of the construct is
measured. For example, a test measures only a part of the construct of customer satisfaction
with a focal object when it only includes items that reflect cognitions about the object but no
items that reflect affects. Measurement of irrelevant variance occurs when not only the
construct is measured, but also other psychological properties, attributes related to group
membership, or response tendencies. For example, a test for customer satisfaction measures
more than just the intended construct when it also includes items that require a high level of
verbal intelligence to be comprehended. Then, the test scores also depend on verbal
intelligence, and the variation in test scores that is caused by variation in verbal intelligence is
conceived of as irrelevant variance. Also, a test for customer satisfaction may be administered
to one part of the sample by telephone and to another part by the Internet, and as a result of
these different administration modes different response categories may be used. In that case, test
scores partly depend on administration mode, and the variation in test scores that is caused by
differences in the administration procedure is conceived of as irrelevant variance. Both
construct underrepresentation and irrelevant variance refute the interpretation of test scores in
terms of a reflection of the construct and nothing else (Messick, 1989; Schouwstra, 2000).
Hence, construct validation concerns the assessment of construct representation and absence
of irrelevant variance.
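The administration-mode example can be illustrated with a descriptive statistic. The sketch below assumes hypothetical test scores for two modes and computes eta squared, the share of the total score variance that lies between the modes. This is only one possible screening statistic, chosen here for illustration; in a non-randomised sample a difference between modes may also reflect real differences in satisfaction between the groups.

```python
def eta_squared(groups):
    """Share of total score variance that lies between groups (SS_between / SS_total)."""
    all_scores = [score for scores in groups.values() for score in scores]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_total = sum((score - grand_mean) ** 2 for score in all_scores)
    ss_between = sum(
        len(scores) * (sum(scores) / len(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    return ss_between / ss_total


# Hypothetical test scores per administration mode (assumption: illustration only).
scores_by_mode = {
    "telephone": [4, 5, 4, 5, 4, 5],
    "internet": [3, 4, 3, 4, 4, 3],
}

share = eta_squared(scores_by_mode)
print(f"share of score variance associated with administration mode: {share:.2f}")
```

A share close to zero is what one would hope to find if the administration mode does not contaminate the test scores.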
Following Anastasi (1986), Schouwstra (2000) argued that construct validation should
start at the outset of test development. This stance is reflected in the deductive design, which
demands two lines of evidence for construct validation (Table 1; from Schouwstra, 2000, p.
60). The first line of evidence should consist of rationales underlying the test-score
interpretations, and the second line of evidence should consist of empirical evidence that the
test score reflects the complete construct and nothing else. Each line of evidence should
address construct representation and absence of irrelevant variance in test scores.
Table 1: Outline of Construct Validation Within the Deductive Design (Schouwstra, 2000, p. 60)

Scientific arguments   Construct representation                       Irrelevant variance
Rationales
  a. Formulation       Of what construct of interest is               And what not
  b. Translation       Of construct of interest into test content     And nothing else
  c. Modelling         How test score reflects construct              And nothing else
Empirical evidence     That test score reflects whole of construct    And nothing else

The rationales consist of an explanation of how the test-score interpretations are derived
from the theory about the construct (Schouwstra, 2000). First, this explanation requires
formulating what the construct of interest is and what it is not, and to which other constructs it
is related. The construct has to be defined explicitly by means of the specification of the
aspects and attributes to which it refers, and implicitly by the specification of related concepts
that constitute the nomological network. Second, the way in which the construct definition is
translated into test content needs to be specified. This specification involves the formulation
of (a) guidelines concerning the formulation of items that reflect the construct and nothing
else, (b) guidelines for acts that control for possible response tendencies, and (c) the items,
which constitute the operationalisation of the construct. Third, the measurement model that is
expected to fit the empirical data needs to be specified. This specification includes the
explanation of the relationship between the items and the test score.
The empirical evidence consists of results from empirical research into the test-score
interpretations. Following Cronbach (1988, 1989), Schouwstra (2000, pp. 1-3) noted that a
strong version of construct validity research involves the testing of hypotheses about what a
test score measures and what it does not measure. These hypotheses refer to (a) the explicit
construct representation, (b) the implicit construct representation, (c) concept-related
irrelevant variance, and (d) method-related irrelevant variance (Schouwstra, 2000, pp. 68-71).
The explicit construct representation of test scores encompasses content validity,
convergent validity, and divergent validity, and is assessed on the basis of tests of
corresponding hypotheses. The implicit construct representation pertains to the nomological
validity of test scores, and is assessed on the basis of tests of hypotheses regarding the
relationship of test scores with measures of other concepts in the nomological network.
Method-related irrelevant variance pertains to variance caused by phenomena that are not
related to the construct of interest, such as response tendencies and characteristics of the
research method. Concept-related irrelevant variance pertains to variance caused by
phenomena that are related to the construct of interest, such as the concepts in the
nomological network and properties related to group membership. Both method-related
irrelevant variance and concept-related irrelevant variance are investigated on the basis of
tests of hypotheses regarding the contamination of test scores by other properties and
variables. The methodology to test hypotheses regarding the contamination of test scores is
addressed in the next section.
Both lines of evidence need to be integrated into an evaluative judgement of the validity
of the test-score interpretations (Schouwstra, 2000, p. 71). This judgement expresses
whether, and to what extent, the evidence supports the interpretation of test
scores in terms of the construct of interest, and nothing else. The more comprehensive the
argumentation for the test-score interpretation, the more convincing the support for construct
validity. However, the support is never conclusive. First, construct validation is an unending
process that includes the judgement of evidence gathered in the processes of test development
and test use (Anastasi, 1986, 1988; Cronbach, 1971, p. 452; Messick, 1989, p. 13). Second,
construct representation is to some extent arbitrary, because constructs do not have sharp
boundaries (e.g., Wittgenstein, 1953, 1958). Third, it is not possible to exclude all irrelevant
variance in the context of psychological measurement. For example, most psychological tests
require linguistic skills of participants, and the extent to which a participant possesses these
skills can influence his or her response behaviour (e.g., Schouwstra, 2000, p. 63). When a test
is used in a sample containing participants with different levels of education, test scores are
readily biased with respect to varying linguistic skills of participants. This example implies
that construct validity is almost always imperfect.
Summarising, the deductive design is a methodology for test development and test-score
validation. First, it is directed towards the development of tests that encompass all the
important aspects of the construct of interest. Second, the deductive design is directed towards
the minimisation of test-score variance that is irrelevant to the construct of interest
(Schouwstra, 2000, pp. 81-83). The methodology encompasses the explication of rationales
and the collection of empirical evidence with respect to the interpretation of test scores.
3 The theory of violators
The theory of violators (Oort, 1996) addresses a methodology to test hypotheses with respect
to the contamination of test scores by other variables, such as other traits, attributes related to
group membership, or response styles. In the theory of violators, irrelevant variance is
conceived of as variance caused by phenomena that violate the unidimensionality of the scale
(Oort, 1996). The theory of violators is based upon the following definitions of item bias and
unidimensionality (Oort, 1996, p. 7): ‘A scale consisting of a set of items is unidimensional if
and only if each of the items is unbiased with respect to every potential violator that might be
relevant in whatever context the test might be used’, and ‘An item I is unbiased with respect to
a potential violator V and given trait T if and only if, for all values i and v and t:
P(I=i | T=t, V=v) = P(I=i | T=t).’
The theory of violators requires local independence between item and violator, meaning
that the probability of endorsing item I, given trait-score t, is independent of
violator-score v. Marginal independence between item and violator is not required; that is,
the probability of endorsing item I need not be independent of violator-score v.
Let rest-score R be the total score of a person on the set of items measuring trait T minus
the score on an item I. Then, item bias may be investigated by means of the partial correlation
of an item I and violator V while controlling for the rest-score R. Oort (1996) suggested
restricted factor analysis (to be discussed in Chapter 6) for testing the hypothesis that test
scores are not contaminated by a violator V.
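The partial correlation described above can be illustrated with a minimal Python sketch. It is not Oort's restricted factor analysis, only the simpler rest-score check; the function name `partial_corr_item_violator` and the example data are hypothetical and serve illustration only.

```python
import numpy as np

def partial_corr_item_violator(items, item_index, violator):
    """Partial correlation of one item with a violator, controlling for the rest-score R."""
    items = np.asarray(items, dtype=float)
    violator = np.asarray(violator, dtype=float)
    item = items[:, item_index]
    rest = items.sum(axis=1) - item  # rest-score R: total score minus the item score

    r_iv = np.corrcoef(item, violator)[0, 1]
    r_ir = np.corrcoef(item, rest)[0, 1]
    r_vr = np.corrcoef(violator, rest)[0, 1]
    # First-order partial correlation formula
    return (r_iv - r_ir * r_vr) / np.sqrt((1 - r_ir**2) * (1 - r_vr**2))

# Hypothetical data: 6 persons, 4 items scored 0-4, and one violator score per person
items = [[4, 3, 4, 4], [3, 3, 2, 3], [2, 1, 2, 2],
         [1, 2, 1, 0], [0, 1, 0, 1], [3, 4, 3, 2]]
violator = [2.0, 1.5, 3.0, 0.5, 1.0, 2.5]
print(partial_corr_item_violator(items, 0, violator))
```

A partial correlation close to zero is consistent with an unbiased item; a clearly non-zero value suggests that the violator affects the item beyond what the trait, as approximated by the rest-score, explains.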
The theory of violators provides a useful methodology for empirical research into the
contaminating effects of violators on test scores. Nevertheless, three comments are in order.
First, research into the unidimensionality of a scale cannot exclude all irrelevant variance that
may threaten the interpretation of test scores in terms of the construct of interest. For example,
characteristics of the measurement instrument (e.g., the method of administration and the
question format; Bradburn, 1983) may affect the magnitude of test scores without affecting
the unidimensionality of the scale. Second, multidimensionality does not necessarily imply
that the measurement is invalid. For example, a particular construct may encompass different
attributes (e.g., intelligence may encompass verbal intelligence and spatial intelligence;
Gardner, 1993), and the measurement of the construct may turn out multidimensional instead
of unidimensional. Third, perfect unidimensionality seems impossible because it is unlikely
that the items of a scale would be unbiased for all possible violators (e.g., Oort, 1996, pp. 18-
19). This is in agreement with the notion that it is impossible to eliminate all irrelevant
variance in psychological measurement, and it endorses the notion that a judgement of
construct validity has to be qualitative and gradual by nature (e.g., Cronbach & Meehl, 1955).
4 Purpose of the study and conditions for test development
The purpose of this study was to develop a measurement instrument for customer satisfaction
with retail banks, and to validate theory regarding the meaning of customer satisfaction in the
domain of retail banking. Given the context of this study, the measurement instrument had to
be accommodated to the meaning of customer satisfaction with a retail bank, and it had to be
used in empirical research in the corresponding domain.
The population of interest in this study consisted of the mature customers of a Dutch
retail bank, in this study denoted as BANK. The measurement instrument had to be applied in
survey research to a sample from this population, and therefore it was administered in Dutch.
Furthermore, the instrument had to comply with requirements regarding the composition of
questions and questionnaires used in surveys (e.g., Belson, 1986; Dillman, Tortora, &
Bowker, 1998; Sheatsley, 1983; Sudman & Bradburn, 1982).
5 Test development
Test development is the development of the measurement instrument. In Chapter 3 (Section
7), customer satisfaction with a retail bank was explained on the basis of eight characteristics,
which were summarised in the explicit definition: customer satisfaction with a retail bank is
the valenced response of the customer, directed towards the retail bank, and evoked by the
customer’s experiences with the retail bank throughout time. Furthermore, customer
satisfaction was defined implicitly by its connections with trust, quality, customer loyalty, and
customer profitability. The latter concepts are part of the nomological network of customer
satisfaction in the domain of retail banking, and they delineate the construct to a large extent.
The explicit definition of customer satisfaction with a retail bank covers the three
components Giese and Cote (2000) required of a definition of satisfaction, which are response
type, timing of the response, and focus of the response (also, see Chapter 3). The explicit
definition was used here to formulate a facet design (Table 2) with three facets representing
the three components (the response focus facet was not reflected in Table 2, because it had
one element). The facet response type had two elements (i.e., cognitive response and affective
response), the facet time frame had two elements (i.e., present and past), and the facet
response focus had one element (i.e., the bank). Thus, the facet design had four structuples.
Table 2: The Facet Design for Customer Satisfaction with a Retail Bank
Response type    Time frame: Present    Time frame: Past
Cognitive        Structuple 1           Structuple 2
Affective        Structuple 3           Structuple 4
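The structuples of the facet design are the combinations of one element from each facet. The following Python sketch enumerates them; the dictionary keys and element labels are illustrative names chosen here, not notation from the thesis.

```python
from itertools import product

# Facets and their elements as described in the text (labels are illustrative)
facets = {
    "response_type": ["cognitive", "affective"],
    "time_frame": ["present", "past"],
    "response_focus": ["bank"],
}

# Each structuple combines one element from every facet
structuples = [dict(zip(facets, combo)) for combo in product(*facets.values())]

for number, structuple in enumerate(structuples, start=1):
    print(number, structuple)
```

Because the response focus facet has a single element, it does not add structuples: 2 x 2 x 1 = 4.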
The purpose of the design was to facilitate the formulation of an item set that yields a
complete construct representation. Following Oliver (1997, p. 343), we chose to formulate a
comprehensive set of items of the Likert type (Likert, 1932). This type of items allows for the
construction of items that are (a) expected to be monotonically related to the construct of
interest, and (b) either indicative or counter-indicative of the construct of interest (e.g., Oort,
1996). The following specifications guided the formulation of the items:
1. Each structuple is represented by two items. One item should be indicative and the
other counter-indicative of the construct (e.g., Fabrigar, Krosnick, & MacDougall,
2005; Likert, 1932), in order to represent both poles of the
satisfaction/dissatisfaction continuum. To prevent the questionnaire from
becoming too long and asking too much of the participants, the number of items for
each structuple was limited to two.
2. Each item should be monotonically related to customer satisfaction. In the context of
this study, this means that the probability of choosing a particular answer category or
a higher answer category in response to a positively worded item should be a
monotonically nondecreasing function of customer satisfaction (i.e., a function that
decreases nowhere along the scale: it either increases monotonically, remains
constant, or increases across some intervals of the scale and remains constant across
other intervals; henceforth, for simplicity, we call this monotonicity; see Sijtsma &
Molenaar, 2002, pp. 20, 119). Negatively worded items are re-coded prior to data
analysis, after which monotonicity should hold for these items as well.
3. Each item should reflect general satisfaction with the company. For this reason, the
subject in each item should be the bank, and not a particular transaction, product, or
product feature.
4. The wording of the items should be kept simple and unambiguous (e.g., Belson,
1986). This means, for example, that the items should be kept short and easy to
understand, and that negations should be avoided.
5. The item set should contain one anchor item (Oliver, 1997, p. 343), which is an item
that is formulated in terms of satisfaction with the company (i.e., an item such as ‘I
am satisfied with BANK’).
6. None of the items should be phrased in terms of related constructs such as trust,
quality, and customer loyalty (Chapter 3). This means that items should not be
phrased in terms of (a) preference for the company over other companies, (b)
expectations regarding competence and integrity of the company, and (c) attributes
of products and services provided by the company.
On the basis of these specifications, a set of nine items was formulated (Table 3). The
set contained one anchor item and eight items representing the four structuples (Table 2). All
items were of the Likert type with five ordered response categories that ranged from totally
agree to totally disagree. We chose five response categories in order to also include a neutral
option. Because satisfaction/dissatisfaction was conceived of as a continuum (Chapter 3), it
was expected that a unidimensional measurement model would fit the empirical data and that
all items were monotonically related to the satisfaction/dissatisfaction dimension.
Table 3: Items of Customer Satisfaction with BANK
Item                                                 Structuple   Aspect
I am satisfied with BANK                             None         General satisfaction
BANK meets all my requirements for a bank            1            Need fulfilment
There are good reasons to leave BANK (*)             1            Cognition
BANK has met my expectations                         2            Disconfirmation of expectations
Last year I had some problems with BANK (*)          2            Cognition
At BANK I feel at home                               3            Affect
I have mixed feelings about BANK (*)                 3            Affect
Last year I had a pleasant relationship with BANK    4            Affect
I have regretted my choice for BANK (*)              4            Regret
(*) = item is counter-indicative of customer satisfaction with BANK
6 The measurement model
A measurement model is a statistical representation of the responses of the participants of a
survey to the measurement instrument. If the measurement model represents the data well, a
scale for measurement and measurement values for the participants follow from the model. It
was hypothesised that a unidimensional measurement model fits the data, and that all items
are monotonically related to the underlying construct of satisfaction with BANK (see Section
5). The Mokken model of monotone homogeneity (MH model; Mokken, 1971) was used for
this investigation (Chapter 5 through Chapter 8).
The MH model is an item response theory (IRT) model. IRT is a psychometric theory
about the relation between a trait and the probability of a particular response to an item
reflecting the trait. The relationship is typically represented by an item response function
(IRF). For dichotomously scored items (i.e., two scores, often 0/1 for disagree/agree), the IRF
reflects the probability of endorsement (i.e., score 1) of an item given a particular position
on the trait (e.g., see the MH model for dichotomous items; Sijtsma & Molenaar, 2002, p. 11).
For polytomously scored items (i.e., three or more ordered scores, reflecting degrees of
endorsement, e.g., 0, 1, 2, 3, 4), the item step response function (ISRF) reflects the probability
of choosing a particular answer category or a higher category (i.e., at least a score of x; e.g., x
= 0, 1, 2, 3, 4) of an item given a particular position on the trait (e.g., the MH model for
polytomous items; Sijtsma & Molenaar, 2002, p. 119).
The MH model is based upon three assumptions (Sijtsma & Molenaar, 2002, pp. 18-21).
The first assumption is unidimensionality, which means that all items reflect the same trait,
for example, customer satisfaction. The second assumption is local independence, which
means that, given a fixed value of the latent trait, the probability of obtaining at least a score
of x is unrelated to the scores obtained on the other items in the test. This means that items
reflecting customer satisfaction are unrelated in a group of persons who have the same level
of customer satisfaction. This may sound odd but local independence is a mathematical way
of saying that only customer satisfaction explains relationships among items measuring
aspects of this trait, and if the trait is held constant all remaining variation in the item scores is
due to error. The third assumption is monotonicity, which means that the probability of
obtaining at least a score of x is a non-decreasing function of the latent trait. Thus, the higher
one’s level of customer satisfaction the higher the probability of obtaining a high score on
items measuring the trait.
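The assumptions of local independence and monotonicity can also be written compactly. The following is a sketch in conventional IRT notation, which is assumed here rather than taken from the text above: θ denotes the latent trait, X_i the score on item i (i = 1, …, k), and x a score value. Local independence is given in its usual joint form, as it is commonly stated in the IRT literature.

```latex
% Unidimensionality: a single latent trait \theta underlies all item scores.

% Local independence: given \theta, the item scores are statistically independent.
P(X_1 = x_1, \ldots, X_k = x_k \mid \theta) = \prod_{i=1}^{k} P(X_i = x_i \mid \theta)

% Monotonicity: every item step response function is nondecreasing in \theta.
P(X_i \geq x \mid \theta_a) \leq P(X_i \geq x \mid \theta_b) \quad \text{whenever } \theta_a < \theta_b
```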
A consequence of the assumptions of unidimensionality, local independence, and
monotonicity is that the MH model yields ordinal measurements of the trait. This is different
from the numerical measurements obtained by more-demanding parametric IRT models, such
as the Rasch model (Rasch, 1960), but ordinal measurements suffice for many measurement
purposes. In particular, let the score of person p on item i be denoted $X_{pi}$, and let the sum score
or the total score of participant p on the items (indexed i) in the test be defined as the sum of
the item scores, $X_{p+} = \sum_i X_{pi}$; then under the MH model the ordering of the participants by
means of their $X_{+}$ values reflects their ordering on the scale of the latent trait, except for
measurement error (Sijtsma & Molenaar, 2002, p. 121; also, see Van der Ark, 2005).
Empirical research has demonstrated that the total-score $X_{+}$ has a strong linear
correlation with the estimated latent trait value from parametric IRT models in several
measurement applications (e.g., Sijtsma, Emons, Bouwmeester, Nyklicek, & Roorda, 2008).
Sijtsma et al. (2008) suggested the use of total-score $X_{+}$ for diagnostic purposes, such as the
assessment of the position of a person on the latent trait, and for statistical analyses, such as
the comparison of groups or the measurement of change. A general condition for this use of
$X_{+}$ is that the MH model fits the data.
The extent to which the theoretical data structure predicted by the MH model is different
from the observed data is expressed by means of total-scale scalability coefficient $H$
(Loevinger, 1948; Mokken, 1971) for the whole set of items, and item scalability coefficient
$H_i$ for individual items. Coefficient $H$ ranges from a negative value, depending on several
characteristics of the item scores, to the maximum of 1. For a given distribution of total-score
$X_{+}$ and a particular set of monotone increasing ISRFs, as the slopes of the ISRFs become
steeper, item scalability coefficients $H_i$ and total-scale scalability coefficient $H$ have higher
positive values, gradually approaching 1 as the slopes become nearly vertical. Thus, high
positive values (usually, $H_i \geq 0.3$ and $H \geq 0.3$; Sijtsma & Molenaar, 2002, p. 60) of item
scalability coefficients $H_i$ and total-scale scalability coefficient $H$ in a data set are taken as
evidence of steeply monotone ISRFs, and this in turn means that person ordering by means of
$X_{+}$ is more reliable.
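To make the scalability coefficients concrete, the sketch below computes them in Python from a persons-by-items score matrix. It assumes the common covariance formulation, in which each coefficient is a ratio of observed inter-item covariances to the maximum covariances attainable given the item marginals; it is not the MSPwin5.0 implementation, and the function name and data are hypothetical.

```python
import numpy as np

def scalability_coefficients(scores):
    """Return (H_i per item, H for the whole scale) from a persons-by-items matrix."""
    x = np.asarray(scores, dtype=float)
    n_items = x.shape[1]
    cov = np.cov(x, rowvar=False)
    # Maximum covariance given the marginals: sorting every column puts the
    # item scores in the same rank order, which maximises each covariance.
    cov_max = np.cov(np.sort(x, axis=0), rowvar=False)
    off_diagonal = ~np.eye(n_items, dtype=bool)
    # Note: an item without any score variation would give a zero denominator.
    h_items = (cov * off_diagonal).sum(axis=1) / (cov_max * off_diagonal).sum(axis=1)
    h_total = cov[off_diagonal].sum() / cov_max[off_diagonal].sum()
    return h_items, h_total

# Hypothetical data: 6 persons, 3 items scored 0-4
scores = [[4, 3, 4], [3, 3, 2], [2, 1, 2], [1, 2, 1], [0, 1, 0], [3, 4, 3]]
h_items, h_total = scalability_coefficients(scores)
print(h_items, h_total)
```

For data that follow a perfect Guttman pattern, the observed and maximum covariances coincide and all coefficients equal 1.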
A virtue of an analysis by means of the MH model is the availability of the MSPwin5.0
software (software for Mokken Scale analysis for Polytomous items; Molenaar & Sijtsma,
2000). MSPwin5.0 makes it possible to investigate statistically whether the MH model fits the data.
In particular, it facilitates (a) the investigation of the dimensionality of an item set using a
confirmatory strategy, or (b) the investigation of the dimensionality of an item set using an
exploratory strategy, and (c) the test of the assumption of monotonicity. Furthermore,
MSPwin5.0 provides the test-score distribution and interesting summary statistics, such as the
mean, the standard deviation, and the skewness of this distribution (Molenaar & Sijtsma,
2000, pp. 60-61).
The confirmatory strategy to investigate the dimensionality entails investigating
whether a set of items, which is defined a priori to form a scale, indeed is found to be a scale
based on values of the item scalability coefficients $H_i$ and total-scale scalability coefficient $H$
in the sample data set from the population of interest. To have a Mokken scale, all inter-item
correlations must be positive and the values of $H_i$ and $H$ must be at least 0.3 (Sijtsma &
Molenaar, 2002, Chapter 5). A Mokken scale is unidimensional and allows sufficiently reliable
person measurement by means of total-score $X_{+}$. MSPwin5.0 facilitates this strategy by
means of the item selection method Test (Molenaar & Sijtsma, 2000, p. 48).
The exploratory strategy to investigate the dimensionality entails the clustering of items
from a larger set into smaller clusters (one cluster is also allowed), each of which is
characterised by positive inter-item correlations and item scalability coefficients $H_i$ and
total-scale scalability coefficient $H$ that are at least 0.3. Thus, each cluster represents a Mokken
scale. MSPwin5.0 facilitates this search strategy by means of the item selection methods
Search normal (forms item clusters from a set of items) and Search extended (takes the
second, third, and so on, Mokken scale found by means of Search normal as point of
departure for clustering while leaving the other items free for selection), and the option to
choose different lower bounds than the default value 0.3 for item scalability coefficients $H_i$
and total-scale scalability coefficient $H$ (Molenaar & Sijtsma, 2000, p. 40).
The assumption of monotonicity can be investigated for every ISRF of every item, by
estimating the ISRFs from the data. An item that has five different item scores has five
different ISRFs, which are conditional probabilities $P(X_i \geq x \mid \theta)$, in which x = 0, …, 4 and θ
stands for the latent trait. Because every participant has one of the five possible scores, the
probability of obtaining at least a score of 0 equals 1 (a participant always has one of the
scores). Thus, only the four ISRFs for x = 1, …, 4 are of interest. In data analysis, when the
ISRFs of item i are estimated, the latent trait is replaced by the rest-score R. Rest-score R is
the total-score $X_{+}$ minus the item-score $X_i$. The use of the total-score $X_{+}$ would lead to
heavily biased estimates of the ISRFs of item i, and this is prevented by using rest-score R.
A rest-score group contains all participants having equal rest scores. The assumption of
monotonicity is violated in the sample if the probability of obtaining a score on item i of at
least x is higher for a lower rest-score group than for a higher rest-score group. MSPwin5.0
provides an option called Minsize for the manipulation of the minimum size of the rest-score
groups (adjacent rest-score groups may be merged to obtain sufficiently large groups; this is
convenient for small and large scores which are often underrepresented in samples), and an
option Minvi which defines the minimum value of observed violations of monotonicity in
sample ISRFs that are subjected to statistical testing (small violations may be uninteresting
irrespective of whether they are significant or not; Molenaar & Sijtsma, 2000, pp. 67-73). In
MSPwin5.0, the default value for Minsize is 10 percent of the sample size, and the default for
Minvi is 0.03 on a probability scale that runs from 0 to 1. The option Alpha = p manipulates
the significance level for tests of significance of violations of monotonicity. Default in
MSPwin5.0 is Alpha = 0.05.
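The rest-score procedure can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the MSPwin5.0 algorithm: adjacent rest-scores are merged from low to high until a group reaches the minimum size (given here as a number of persons), observed decreases larger than the threshold are listed, and no significance test is carried out. The function names, the parameters `min_size` and `min_vi`, and the data are hypothetical.

```python
import numpy as np

def rest_score_groups(rest, min_size):
    """Merge adjacent rest-score values into groups of at least min_size persons."""
    groups, current = [], []
    for value in np.unique(rest):  # unique values in ascending order
        current.append(value)
        if np.isin(rest, current).sum() >= min_size:
            groups.append(current)
            current = []
    if current:  # leftover high rest-scores join the last group
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return groups

def monotonicity_violations(scores, item_index, min_size, min_vi=0.03):
    """List (x, lower group, higher group, decrease) for decreases larger than min_vi."""
    x = np.asarray(scores)
    item = x[:, item_index]
    rest = x.sum(axis=1) - item  # rest-score R
    groups = rest_score_groups(rest, min_size)
    violations = []
    for step in range(1, int(x.max()) + 1):
        # Estimated ISRF: proportion with item score >= step per rest-score group
        props = [np.mean(item[np.isin(rest, g)] >= step) for g in groups]
        for low in range(len(props)):
            for high in range(low + 1, len(props)):
                decrease = props[low] - props[high]
                if decrease > min_vi:
                    violations.append((step, low, high, decrease))
    return violations

# Hypothetical data: 8 persons, 3 items scored 0-4
scores = [[0, 0, 1], [1, 0, 0], [1, 2, 1], [2, 2, 3],
          [3, 2, 2], [3, 4, 3], [4, 3, 4], [4, 4, 4]]
print(monotonicity_violations(scores, item_index=0, min_size=2))
```

An empty list means that no decrease larger than the threshold was observed for the item. A non-empty list would still have to be tested for significance before it counts as evidence against monotonicity.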
It was hypothesised that the nine items of satisfaction with BANK constitute a scale
according to the MH model. This hypothesis was tested in sample data from the population of
interest. If the MH model fits the data, a scale according to the MH model can be constructed,
and the scale scores can be computed.
7 Hypotheses
This section addresses the formulation of hypotheses regarding characteristics of the
satisfaction scores (i.e., the satisfaction with BANK scale-scores). The hypotheses concerned
(a) the explicit construct representation, (b) the implicit construct representation, (c) concept-
related irrelevant variance, and (d) method-related irrelevant variance, and they were tested in
empirical studies with respect to customer satisfaction (Chapter 5 through Chapter 8). The
purpose of the tests of the hypotheses was to gather empirical evidence on whether the scale
scores can be interpreted in terms of satisfaction with BANK, and nothing else.
Explicit construct representation
First, it was expected that persons attached different connotations to the term satisfaction
when asked to explain what satisfaction with the company meant to them. This expectation
was in line with the theory of Oliver (1997) that satisfaction may result from different
processes, and the notion by Wittgenstein (1953, 1958) that the linguistic meaning of a term
cannot be delineated sharply. Second, it was expected that the nine items (Table 3) constituted
a scale according to the MH model (Section 6). Third, it was expected that the satisfaction
with BANK scale-scores were positively related to other satisfaction with BANK scores. This
was in agreement with the requirement of convergent validity (Campbell & Fiske, 1959).
Implicit construct representation
Customer satisfaction was expected to be positively related to (a) trust, (b) quality, (c)
customer loyalty, and (d) future customer profitability. The associations between these
concepts were postulated in the nomological network of customer satisfaction (Chapter 3).
Concept-related irrelevant variance
Concept-related irrelevant variance refers to variance due to variables that are presumably
related to the construct of interest. Variables that are presumably related to customer
satisfaction are the variables in the nomological network of the construct (Chapter 3). In terms
of the theory of violators (Oort, 1996), such variables are possible violators of the
unidimensionality of the scale of the construct of interest. The measurement instrument for
customer satisfaction was constructed with the purpose of minimising contamination of scale
scores by these variables (Section 5). Therefore, it was expected that trust, quality, customer
loyalty, and current customer profitability did not contaminate satisfaction scores obtained by
the satisfaction with BANK measurement instrument.
Method-related irrelevant variance
Method-related irrelevant variance refers to variance caused by variables that are presumably
unrelated to the construct of interest, such as characteristics of the method of the study and
response styles of persons. Characteristics of the method that may affect response behaviour
are, for example, the mode of administration, the format of items, the item order, and the
wording of items (e.g., Bradburn, 1983). There is ample evidence of the effect of these
phenomena on persons’ responses to items (e.g., Belson, 1981, 1986; Bradburn, 1983;
Bronner & Kuijlen, 2007; Krosnick, 1999; Schuman & Presser, 1981; Sheatsley, 1983). The
classical example was provided by Rugg (1941), who demonstrated that 46% of the
participants in a survey supported free speech when asked ‘Do you think the United States
should forbid public speeches against democracy?’, while only 25% of the participants
supported free speech when asked ‘Do you think the United States should allow public
speeches against democracy?’ Thus, the question phrased in terms of to allow yielded
different results than the question phrased in terms of to forbid. Schuman and Presser (1981,
pp. 276-278) replicated this result.
Paulhus (1991, p. 17) explained a response style of a person as a consistent tendency of
a person to respond to questionnaire items on some basis other than the specific item content
(i.e., what the items were designed to measure). Examples of response styles are
acquiescence, disacquiescence, midpoint responding, extreme responding, noncontingent
responding, and socially desirable responding (e.g., Baumgartner & Steenkamp, 2001, 2006;
Paulhus, 1991; Van Herk, 2000). The acquiescence response style is defined as a general
preference for the agreement response categories of item scales, and the disacquiescence
response style is defined as a general preference for the disagreement response categories of
item scales. These two response styles may be investigated by means of control scales.
Theorists (e.g., Baumgartner & Steenkamp, 2001, 2006; Knowles & Nathan, 1997; Paulhus,
1991; Van Herk, 2000) suggested limiting the influence of these two response styles on the
measurement of a trait by simultaneously using items that are indicative of that trait and items
that are counter-indicative of that trait. Both kinds of items were included in the measurement
instrument of customer satisfaction (see Section 5). The extreme response style is defined as a
general preference for extreme response categories (i.e., the endpoints) of item scales, and the
midpoint response style is defined as a general preference for the middle response category of
item scales. These two response styles also may be investigated by means of control scales
(e.g., Baumgartner & Steenkamp, 2001, 2006; Bronner & Kuijlen, 2007; Greenleaf, 1992a,
1992b). For example, control scales may be used to measure general midpoint responding and
general extreme responding, and the corresponding scores may be correlated with
measurements of the trait of interest in order to assess the influence of stylistic responding on
the measurement of the trait. Noncontingent responding refers to the tendency to respond
randomly to items. This response style may be investigated by means of multivariate outlier
analyses (e.g., Tabachnick & Fidell, 1997, pp. 74-75). Socially desirable responding refers to
the tendency of persons to make themselves look good by providing socially desirable
responses to the items. This response style may be investigated by means of control scales
(e.g., Paulhus, 1991).
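The use of control scales for midpoint and extreme responding can be illustrated with a minimal Python sketch. The index used here, the proportion of extreme or midpoint answers on a set of control items, is an assumption made for illustration and not necessarily the index used in the studies cited above; the function name and data are hypothetical.

```python
import numpy as np

def response_style_indices(control_scores, n_categories=5):
    """Per-person proportion of extreme and of midpoint answers on control items.

    Items are assumed to be scored 0 .. n_categories - 1; a midpoint exists
    only for an odd number of categories.
    """
    c = np.asarray(control_scores)
    top = n_categories - 1
    extreme = np.mean((c == 0) | (c == top), axis=1)
    midpoint = np.mean(c == top / 2, axis=1)
    return extreme, midpoint

# Hypothetical data: 5 persons, 4 control items, and their satisfaction total scores
control = [[0, 4, 2, 2], [1, 3, 2, 0], [4, 4, 0, 4], [2, 2, 2, 1], [3, 1, 0, 2]]
satisfaction = [30, 22, 35, 18, 25]

extreme, midpoint = response_style_indices(control)
# Correlating the style indices with the trait scores indicates possible contamination
print(np.corrcoef(extreme, satisfaction)[0, 1])
print(np.corrcoef(midpoint, satisfaction)[0, 1])
```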
Stylistic responding is a threat to validity of measurement. Messick (1991; also, see
Jackson & Messick, 1958) argued that stylistic responding is inversely related to the extent
that responses of persons to items are content-driven. This is an important stance. First, this
stance implies that stylistic responding is inhibited by optimising. Optimising (Krosnick,
1991, 1999) is response behaviour that is characterised by giving much consideration to the
accuracy of the responses. For example, when a person puts effort into understanding an item
and into providing the optimal response to the item, he or she is said to optimise (Krosnick,
1999, pp. 546-547). Second, this stance implies that stylistic responding is enhanced by
satisficing. Satisficing (Krosnick, 1991, 1999) is response behaviour that is characterised by
giving little consideration to the accuracy of the responses. For example, when a person does
not spend effort to generate the most accurate answer to a question but settles for a merely
satisfactory one, he or she is said to satisfice (Krosnick, 1999, p. 548). Third, Messick’s
(1991) stance implies that the conditions that enhance satisficing also enhance stylistic
responding. These conditions are (a) task difficulty, (b) persons’ abilities, and (c) persons’
motivation to optimise (Krosnick, 1999, p. 548).
It is beyond the scope of this study to assess the contamination of scale scores by all
method-related phenomena. For this reason, it was decided to start the study into effects of
these phenomena by addressing four issues that were important for further applications of the
instrument, and for satisfaction research in general. First, it was investigated whether the
location of satisfaction items in the questionnaire influenced satisfaction scores. Second, it
was investigated whether the presentation mode of response alternatives of satisfaction items
influenced satisfaction scores. Third, it was investigated whether persons’ positions on the
midpoint response style influenced satisfaction scores. Fourth, it was investigated whether
persons’ positions on the extreme response style influenced satisfaction scores.
The hypotheses
The expectations and questions with respect to construct representation and irrelevant
variance were formalised in a set of hypotheses. The hypotheses are listed in Table 4.
Table 4: List of Hypotheses
Explicit construct representation
H1    Customer satisfaction is manifested in various expressions that are mutually related but not sharply delineated
H2    The satisfaction items constitute a scale according to the MH model
H3    The satisfaction scores are positively related to other satisfaction scores
Implicit construct representation
H4    Satisfaction scores are positively related to trust scores
H5    Satisfaction scores are positively related to quality scores
H6    Satisfaction scores are positively related to loyalty scores
H7    Satisfaction scores are positively related to future customer profitability
Concept-related irrelevant variance
H8    The satisfaction scores are not contaminated by trust
H9    The satisfaction scores are not contaminated by quality
H10   The satisfaction scores are not contaminated by loyalty
H11   The satisfaction scores are not contaminated by current customer profitability
Method-related irrelevant variance
H12   The satisfaction scores are not affected by the location of items in the questionnaire
H13   The satisfaction scores are not affected by the presentation of the response categories of satisfaction items
H14   The satisfaction scores are not affected by the midpoint response style
H15   The satisfaction scores are not affected by the extreme response style
Chapter 5
Method of the first empirical study into customer satisfaction with BANK
1 Introduction
This chapter addresses the method of the first empirical study into customer satisfaction with
BANK. The chapter outlines the operationalisations of the constructs, the construction of
the questionnaire, the pre-tests, the pilot study, and the main study.
2 Operationalisations
Customer satisfaction
Customer satisfaction was operationalised by means of nine Likert items (Table 3, Chapter 4)
with five ordered response categories each, ranging from totally agree (which was scored 4)
to totally disagree (which was scored 0) (Table 1). The nine items were expected to constitute
a unidimensional scale after re-scoring the counter-indicative items (Chapter 4).
Table 1: Items Reflecting Customer Satisfaction with BANK

Code  Item                                               Aspect                Score range
Q3a   At BANK I feel at home                             Affect                0 – 4
Q3b   I am satisfied with BANK                           General satisfaction  0 – 4
Q3d*  There are good reasons to leave BANK               Cognition             0 – 4
Q3e*  I have mixed feelings about BANK                   Affect                0 – 4
Q3g   BANK meets all my requirements for a bank          Need fulfilment       0 – 4
Q4a   Last year I had a pleasant relationship with BANK  Affect                0 – 4
Q4b   BANK has met my expectations                       Disconfirmation       0 – 4
Q4c*  I have regretted my choice for BANK                Regret                0 – 4
Q4d*  Last year I had some problems with BANK            Cognition             0 – 4

* = item is counter-indicative of customer satisfaction with BANK
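The re-scoring of the counter-indicative items can be illustrated with a minimal sketch. The item codes and the score range of 0 to 4 are taken from Table 1; the function name, the data layout (one dictionary per respondent), and the example respondent are assumptions made for illustration only and do not describe the procedure that was actually implemented.

```python
# Minimal sketch of re-scoring counter-indicative Likert items (0-4 range).
# Item codes come from Table 1; everything else is a hypothetical illustration.
MAX_SCORE = 4
COUNTER_INDICATIVE = {"Q3d", "Q3e", "Q4c", "Q4d"}


def rescore(responses):
    """Return a copy of the responses with counter-indicative items reversed.

    A score x on a counter-indicative item becomes MAX_SCORE - x, so that a
    higher score always points to higher satisfaction. Missing answers
    (None, e.g. the no answer option) are left untouched.
    """
    rescored = {}
    for code, score in responses.items():
        if score is not None and code in COUNTER_INDICATIVE:
            rescored[code] = MAX_SCORE - score
        else:
            rescored[code] = score
    return rescored


# Invented respondent: agrees with the positive items, disagrees with the
# counter-indicative ones, and gives no answer to Q4d.
example = {"Q3a": 4, "Q3b": 3, "Q3d": 0, "Q3e": 1, "Q3g": 3,
           "Q4a": 4, "Q4b": 3, "Q4c": 0, "Q4d": None}
print(rescore(example))
```

After re-scoring, all nine items point in the same direction, which is a precondition for examining whether they constitute a unidimensional scale.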
American Customer Satisfaction Index
Customer satisfaction was also operationalised by means of a measurement instrument
adopted from the American Customer Satisfaction Index (ACSI; e.g., Fornell et al., 1996).
This instrument (Table 2) consisted of three items with ten ordered response categories each,
ranging from very negative (e.g., very dissatisfied, which was scored 0) to very positive (e.g.,
very satisfied, which was scored 9). The three items were expected to constitute a
unidimensional scale (see Chapter 3). The instrument is henceforth referred to as the ACSI.
Table 2: American Customer Satisfaction Index

Code  Item                                                 Score range
Q20b  How satisfied are you with BANK?                     0 – 9
Q20c  To what extent does BANK meet your ideal of a bank?  0 – 9
Q20d  To what extent has BANK met your expectations?       0 – 9
Trust
Following Morgan and Hunt (1994), trust was defined as a person’s confidence in the
reliability and integrity of the company. On the basis of the definition of trust, a set of seven
Likert items was formulated. Each item had five ordered response categories that ranged from
totally agree (which was scored 4) to totally disagree (which was scored 0). Two items were
counter-indicative of trust, and covered distrust. The seven items are listed in Table 3.
In the context of retail banking, confidence in integrity and confidence in reliability are
intertwined (see also Chapter 3). Many expectations, such as the expectation that the company
will keep its promises and the expectation that the company will handle the banking matters
of a person properly, encompass both confidence in the reliability of the company and
confidence in the integrity of the company. Consequently, we expected the seven items to
constitute a unidimensional scale.
Table 3: Items Reflecting Trust

Code  Item                                                         Aspect       Score range
Q5a   I can depend on BANK to treat me fairly                      Integrity    0 – 4
Q5b   I can depend on BANK to handle my banking affairs correctly  Both         0 – 4
Q5c   I can depend on BANK to keep its promises                    Both         0 – 4
Q5d*  I sometimes doubt the competence of BANK                     Reliability  0 – 4
Q5e*  I sometimes doubt the good will of BANK                      Integrity    0 – 4
Q5f   I can trust BANK                                             Both         0 – 4
Q5g   I can depend on BANK to serve me well                        Both         0 – 4

* = item is counter-indicative of trust
Quality
In Chapter 3, quality was defined as a person’s perception of the quality of attributes of
products and services provided by the company. This definition is in agreement with the
conception of quality as perceived quality, which implies that quality had to be measured by
means of a psychological measurement instrument. Because quality pertains to distinct
attributes of products and services provided by the company, we expected the instrument to
yield a multidimensional measurement of quality. Furthermore, we expected the combination
of a customer’s positions on these dimensions to drive customer satisfaction (Chapter 3,
Section 5).
Wirtz and Bateson (1995; also Wirtz 2000) demonstrated that halo effects influenced
several measurements of quality, meaning that responses to items about quality of attributes
of products or services provided by the company were influenced by general satisfaction with
the company. The occurrence of halo effects may have been enhanced by the
operationalisations of quality. To control for halo effects, we decided to operationalise quality
in two different, concrete, and detailed ways, which we hoped would stimulate the
respondent to contemplate the quality of distinct attributes of products and services
rather than provide an overall and perhaps too impressionistic evaluation.
First, quality was operationalised by means of a set of items regarding the experience of
problems with BANK in the preceding twelve months. A list of problems was compiled on
the basis of an inventory of customer complaints with the company, and previous research
into drivers of customer satisfaction (e.g., Terpstra & Van Gastel, 2004). A total of 16
problems, thus defining 16 items, was included in the questionnaire (Table 4). Persons were
asked whether or not these problems had occurred to them in the preceding twelve months.
The response yes was scored 1, and the response no was scored 0. It was expected that the 16
items would be uncorrelated or only weakly correlated.
Second, quality was operationalised by means of a set of 24 items measuring
judgements about attributes of the products and services provided by the company (Table 5).
Each item had four ordered response categories that ranged from excellent (which was scored
3) to bad (which was scored 0). The set of attributes was compiled on the basis of previous
satisfaction research of the company (Terpstra & Van Gastel, 2004), and covered a broad
range of topics. Because of this broad coverage, it was expected that the items
would constitute multiple scales.
Table 4: Items Reflecting Quality. All Items are Counter-Indicative of Quality.

Code  Problem                                          Score range
Q6a   Errors in the execution of your banking affairs  0 – 1
Q6b   Errors in the execution of your orders           0 – 1
Q6c   Insufficient information on your banking affairs 0 – 1
Q6d   Ambiguous information on your banking affairs    0 – 1
Q6e   Unfair costs of banking services                 0 – 1
Q6f   Slow service                                     0 – 1
Q6g   Slow money transfers                             0 – 1
Q6h   Not keeping an appointment                       0 – 1
Q6i   Insufficient accessibility by telephone          0 – 1
Q6j   Insufficient accessibility by Internet           0 – 1
Q6k   Insufficient accessibility of offices            0 – 1
Q6l   Insufficient response to questions               0 – 1
Q6m   Problems with debit cards                        0 – 1
Q6n   Problems with cash withdrawals                   0 – 1
Q6o   Problems with internet banking                   0 – 1
Q6p   Another problem                                  0 – 1
Table 5: Items Reflecting Quality

Code  Item                                  Score range
Q7a   Correct execution of orders           0 – 3
Q7b   Speed of money transfers              0 – 3
Q7c   Speed of service delivery             0 – 3
Q7d   Adherence to promises                 0 – 3
Q7e   Correct execution of banking matters  0 – 3
Q7f   Distribution of bank statements       0 – 3
Q8a   Costs of accounts of the company      0 – 3
Q8b   Convenience of products and services  0 – 3
Q8c   Clarity of information provided       0 – 3
Q8d   Sufficiency of information provided   0 – 3
Q8e   Costs of services of the company      0 – 3
Q8f   Interest rates of the company         0 – 3
Q9a   Service by telephone                  0 – 3
Q9b   Service by the Internet               0 – 3
Q9c   Service by bank offices               0 – 3
Q9d   Service by mail correspondence        0 – 3
Q9e   Accessibility of the company          0 – 3
Q9f   Facilities for Internet banking       0 – 3
Q10a  Friendliness of employees             0 – 3
Q10b  Capability of employees               0 – 3
Q10c  Reliability of employees              0 – 3
Q10d  Openness for questions                0 – 3
Q10e  Responsiveness of the company         0 – 3
Q10f  Handling of complaints                0 – 3
Customer loyalty
Following Gremler and Brown (1996, 1999), customer loyalty was defined as the degree to
which a customer is doing repeat business with the company, possesses a positive attitudinal
disposition towards the provider, and considers only this provider when a need for this
service arises. According to this definition, customer loyalty encompasses (a) cognitions,
affects, and behaviour with respect to the company, and (b) a comparison of the company
with other firms. On the basis of this definition, a set of six Likert items was constructed to
operationalise customer loyalty (Table 6). Each item reflected a particular aspect of customer
loyalty (i.e., cognition, affect, or past behaviour), and had five ordered response categories
ranging from totally agree (which was scored 4) to totally disagree (which was scored 0). In
accordance with former studies using similar measurement instruments of customer loyalty
(e.g., Caruana, 2002; Gremler & Brown, 1999), we expected the six items to constitute a
unidimensional scale.
Table 6: Items Reflecting Customer Loyalty

Code   Item                                                      Aspect     Score range
Q14a   If I need new financial products, BANK is my first choice Cognition  0 – 4
Q14b   I have more sympathy for BANK than for other banks        Affect     0 – 4
Q14c*  For some matters I am better off with another bank        Cognition  0 – 4
Q14d*  I consider switching from BANK to another bank            Cognition  0 – 4
Q14e   BANK offers me benefits other banks don’t offer           Cognition  0 – 4
Q14f   For many years BANK has been my primary bank              Behaviour  0 – 4

* = item is counter-indicative of customer loyalty
Customer profitability
In Chapter 3, customer profitability (CP) was defined as the gross financial contribution of a
customer to a company in a specified period of time. Because a long time period is less
subject to behavioural anomalies than a short time period (Mulhern, 1999), we chose a time
period of a year for the measurement of CP. Thus, CP at time t was the gross financial
contribution of a customer to a company in the twelve months preceding time t.
CP consisted of interest profits and provision profits. Interest profits and provision
profits were a function of the balances held or the provisions paid by a customer on the one
hand, and the corresponding gross margins of the company on the other hand (the gross
margins are the margins of the company before the costs for servicing the customer, such as
transaction costs, contact costs, marketing costs, and overhead costs, are accounted for; see
for example Cooper & Kaplan, 1991, p. 469). For example, if a customer held a credit balance
of 1000 euro during one month, and the company’s gross margin on 1 euro credit balance
was 0.002 euro per month, the interest profits yielded by the customer in that month were
equal to 2 euro. The sum of all profits from a customer over the 12 months preceding time t
was labelled CP at time t.
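The computation can be illustrated with a minimal sketch. The balance of 1000 euro and the margin of 0.002 euro per euro per month are the figures from the example above; the function names, the assumption of a constant balance over twelve months, and the zero provision profits are hypothetical choices made for illustration, and the sketch does not describe the company's actual calculation.

```python
# Minimal sketch of customer profitability (CP) as a twelve-month sum.
# Function names and the twelve-month series below are hypothetical.

def monthly_interest_profit(balance_eur, gross_margin_per_eur):
    """Interest profit for one month: balance held times gross margin per euro."""
    return balance_eur * gross_margin_per_eur


def customer_profitability(monthly_profits):
    """CP at time t: sum of all profits over the twelve months preceding t."""
    if len(monthly_profits) != 12:
        raise ValueError("CP is defined over exactly twelve months")
    return sum(monthly_profits)


# The worked example from the text: 1000 euro at 0.002 euro per euro per month.
one_month = monthly_interest_profit(1000, 0.002)
print(one_month)  # about 2 euro

# Assumption for illustration: the same balance is held in every month and
# the customer pays no provisions, so each monthly profit is interest only.
provision_profits = [0.0] * 12
monthly_totals = [one_month + p for p in provision_profits]
print(customer_profitability(monthly_totals))  # about 24 euro
```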
Three additional remarks are in order. First, CP at time t was computed monthly by the
company, and expressed in euro. The CP-figures from September 2005, September 2006, and
September 2007 were collected from the internal databases of the company (Section 6 of the
present chapter). Second, if an account (e.g., a mortgage) was held by two or more customers,
one of these customers was registered by the company as the primary owner of the product.
Only the accounts for which the customer was registered as the primary owner were included
in the calculation of profitability of the customer. Third, if a customer left the company, the
customer did not generate any profits from that month onwards, and after a year the profits
generated by this customer in the preceding twelve months were reduced to zero. The
company registered this as a missing value on CP at time t, but in this study this missing value
actually represents zero profits.
Interest
Interest was measured in order to test the quality of the survey data by means of correlating
items reflecting customer satisfaction and items reflecting interest (to be discussed in Section
2 of Chapter 6). We expected that items reflecting customer satisfaction were uncorrelated
with items reflecting interest, and a different result would raise suspicion about the quality of
the survey data. A customer’s interest in banking matters was operationalised on the basis of
two items (Table 7). Each item had five ordered response categories that ranged from highly
interested (which was scored 4) to not interested (which was scored 0). We expected the
items to be positively correlated.
Table 7: Items Reflecting Interest

Code  Item                                                                             Score range
Q17   How interested are you in banking matters?                                       0 – 4
Q18   How interested are you in the development of new products and services by banks? 0 – 4
3 The questionnaire
The questionnaire (Appendix 1; in Dutch) was composed of the items reflecting customer
satisfaction (represented by two item sets), trust, quality (also represented by two item sets),
customer loyalty, and interest. In addition, some items were included in the questionnaire for
business purposes, and some other items were included to optimise the design of the
questionnaire. For example, some items regarding product ownership and contacts with the
company were included in order to elicit the participant’s memories of the company before
the measurement of satisfaction with the company started. Furthermore, some items regarding
relations of the participant with other providers of financial services were included in order to
elicit his or her memories of other providers of financial services before proceeding with the
measurement of loyalty with the company.
The design of the questionnaire, the format of the items, and the wording of the items
were based upon general principles concerning survey research (see, e.g., Belson, 1986;
Dillman, Tortora, & Bowker, 1998; Sheatsley, 1983; Sudman & Bradburn, 1982). An
important issue was the inclusion of the no answer option among the response options of the
items. It is well known that items allowing respondents to use a no answer option may
cause problems in data analysis (e.g., Tabachnick & Fidell, 2007, pp. 62-63), and that a no
answer option may invoke satisficing (e.g., Krosnick, 1999; Krosnick & Fabrigar, 1997).
Nevertheless, it was decided for four reasons to retain the no answer option of the
items:
(a) Interviews with participants after they had taken pre-tests of the questionnaire
revealed that they appreciated the no answer option. They claimed that they could not
answer particular items if they had no experience with the subject. An example of
such an item concerned the handling of complaints by the company. It was
considered useful to include these items in the questionnaire, in particular to collect
data on the seriousness of a particular problem;
(b) To limit the risk of satisficing (Krosnick, 1999), the item texts were kept short,
simple, and concrete in order to limit the difficulty of the participants’ task and
prevent participants from taking the easy way out by using the no answer option too
readily;
(c) A pilot study (to be discussed in Section 5) demonstrated that the no answer option
was rarely used with respect to the satisfaction items. The response option apparently
did not invoke satisficing on this subject; and
(d) A practical reason for using the no answer option was that the questionnaire was to
be administered via the Internet. The administration mode encompassed a forcing
mechanism that required the participant to respond to an item before proceeding to
the next item. Such a mechanism may contaminate the data, because a participant
may have good reasons not to answer a particular question (Dillman et al., 1998).
Thus, the no answer option was also meant to neutralise the forcing mechanism.
The ordering of items within a block of items, such as the items within block Q3
(Appendix 1), was different across different administrations of the questionnaire. The effect
of the location of the satisfaction items (Q3, Q4, and Q20; Appendix 1) on the scale scores
was assessed in the pilot study. The objective of these measures was to test and to control for
order effects.
The questionnaire was improved by means of qualitative pre-tests among 10 persons and
a pilot study among 372 persons. The pre-tests (to be discussed in Section 4) demonstrated
that it took 15 to 35 minutes for participants to complete the questionnaire. We considered
this rather long and suspected this might demoralise participants, and stimulate satisficing
(e.g., Krosnick, 1999, pp. 248-249; Sheatsley, 1983, p. 223; Sudman & Bradburn, 1982, p.
262). In order to motivate participants to complete the questionnaire, we explained the
purpose of the study in the E-mail (Appendix 2; in Dutch) by which they were invited to
participate in the survey.
4 The pre-tests
The questionnaire was pre-tested between February 2005 and May 2005, by means of in-depth
interviews with mature customers of BANK. The first objective of the pre-tests was to test
how long it took participants to complete the questionnaire and to explore participants’
interpretations of the items in the questionnaire. The second objective of the pre-tests was to
test the first hypothesis of the empirical study (i.e., customer satisfaction is manifested in
various expressions that are mutually related but not sharply delineated; see Section 7 in
Chapter 4). The results of the pre-tests were used to improve the wording of the items and the
design of the questionnaire, before executing the pilot study and the main study. Furthermore,
the results were used to test the first hypothesis.
Target population
The target population of this study consisted of the mature customers of a Dutch retail bank.
These were adults who were registered by the company as the primary owner of at least one
banking product provided by the company.
Sample
The sample was composed of ten mature customers of the bank. Four were male and six were
female. Their age varied between 29 and 71 years. Their education ranged from professional
to academic. None of the persons was occupied in consumer research or the financial services
industry.
Procedure
The questionnaire was presented in paper-and-pencil format to the participant. The participant
filled out the questionnaire, and the interviewer registered the time it took to complete the
questionnaire. Afterwards, the interviewer interviewed the participant. The participant was
probed into his or her satisfaction with the company, into the meaning that he or she attached
to satisfaction with a retail bank, and into the answers he or she had given to the survey items.
The responses were registered on paper by the interviewer.
Data
The interviewer’s notes about the time span of the survey and the responses of participants to
the post-survey interview constituted the raw data.
5 The pilot study
The pilot survey was conducted in August 2005, among mature customers of the bank. The
first objective of the pilot study was to test the procedure of the survey. It was assessed (a)
how many participants completed the questionnaire, (b) how often missing values on items
occurred, and (c) what kind of comments the participants made with respect to the
questionnaire. The second objective was to test Hypotheses 12 and 13 (the hypotheses
regarding the effect of (a) location of satisfaction items and (b) ordering of response
categories on scale scores; see Section 7 in Chapter 4). The results of the pilot study were
used to decide on technical properties of the main survey, and to test Hypotheses 12 and
13.
Design
Four versions of the questionnaire were administered that differed with respect to the location
of the satisfaction items in the survey, and the ordering of the response categories of the
satisfaction items. On the basis of this design (Table 8) it was tested whether (a) the location
of the satisfaction items in the questionnaire, and (b) the ordering of the response categories
of satisfaction items, had an effect on the average satisfaction scores.
Table 8: Design of the Pilot Study

Survey version  Location of items  Ordering of categories  N
1               A                  A                       90
2               A                  B                       95
3               B                  A                       89
4               B                  B                       98
The location of the satisfaction items refers to the location of Q3, Q4 and Q20
(Appendix 1) in the questionnaire. In the survey versions 3 and 4, the locations of Q3 and Q4
on the one hand and Q20 on the other hand were reversed. The order of response categories
refers to the response categories of the Likert items, which were totally agree – agree –
neutral – disagree – totally disagree. In the survey versions 2 and 4, the response categories
were displayed in reversed order.
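The two-by-two structure of the design can be made explicit with a minimal sketch. The version numbers, the factor levels A and B, and the group sizes are taken from Table 8; the function name, the record layout, and the example scores are hypothetical and serve only to show how average scores per factor level could be compared.

```python
# Minimal sketch of the 2 x 2 pilot design (Table 8).
# VERSIONS maps survey version -> (location of items, ordering of categories).
from statistics import mean

VERSIONS = {1: ("A", "A"), 2: ("A", "B"), 3: ("B", "A"), 4: ("B", "B")}
N_PER_VERSION = {1: 90, 2: 95, 3: 89, 4: 98}


def marginal_means(records, factor_index):
    """Average scale score per level of one design factor.

    records: list of (survey_version, scale_score) pairs.
    factor_index: 0 for location of items, 1 for ordering of categories.
    """
    groups = {}
    for version, score in records:
        level = VERSIONS[version][factor_index]
        groups.setdefault(level, []).append(score)
    return {level: mean(scores) for level, scores in groups.items()}


# The four groups together make up the pilot sample.
print(sum(N_PER_VERSION.values()))

# Invented scores, one participant per version, for illustration only.
records = [(1, 30), (2, 28), (3, 31), (4, 27)]
print(marginal_means(records, 0))  # by location of items
print(marginal_means(records, 1))  # by ordering of categories
```

Comparing the two marginal means of a factor corresponds to examining the main effect of that factor on the average satisfaction score.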
Target population
The target population of this study consisted of the mature customers of a Dutch retail bank.
These were adults who were registered by the company as the primary owner of at least one
banking product provided by the company.
Sample
The sample was drawn from the research panel of the company. This panel was composed of
a total of 3984 mature customers of the company who had agreed to participate in marketing
research via the Internet. The agreement stipulated that (a) the company is free to
approach the person for marketing research, (b) the person is free to participate in the research
or to decline, (c) the company is allowed to use the survey data for research purposes only,
and (d) the company is not allowed to distribute any personalised data to third parties. All
panel members could be approached by E-mail, and had a unique customer-id that was used
for identification purposes.
The reasons for using the research panel for this study were (a) its considerable size, (b)
its facilities for Internet research, and (c) the availability of a customer-id for each panel
member. The customer-id facilitated the enrichment of the survey data with the company data
that were needed in this study. The arguments in favour of the use of the research panel
outweighed the argument against the panel, which was the possibility that the panel might be
biased with respect to some psychological characteristics. For example, it cannot be ruled out
that (a) persons who were willing to participate in the panel had a different attitude towards
banking than persons who were not willing to participate in the panel, and (b) persons who
had access to the Internet had different psychological characteristics than persons who did not
have access to the Internet. Thus, the choice for using the research panel may have enhanced
coverage error (i.e., error due to the fact that different units in the target population have
different probabilities of being included in the sample; e.g., Dillman & Bowker, 2001;
Groves, 1989).
Three additional remarks with respect to the research panel are in order. First, the
variable customer segment refers to a segmentation which reflects the value of the customers
to the company, and which was used by the company for marketing purposes. The company
distinguished three segments, which were Top Customers, Standard Customers, and
Development Customers. Each customer of the company, except the ones that were not
registered as the primary owner of a product provided by the company, was assigned to
exactly one of these segments. Because the company’s most valuable customers (i.e.,
Top Customers) were overrepresented in the research panel, the panel differed significantly
(χ²(2) = 1270, p < 0.001) from the target population with respect to the distribution of
customer segment (Table 9). Second, the panel differed significantly (χ²(2) = 324, p < 0.001)
from the target population with respect to the distribution of gender. Males were
overrepresented in the panel (Table 9). This was partly due to the overrepresentation of males
among the segment Top Customers (i.e., the segment that was overrepresented in the research
panel), and partly to unknown causes. Third, the panel differed significantly (χ²(2) = 299, p <
0.001) from the target population with respect to the distribution of age group (Table 9). The
average age in the panel was 47 years, and in the target population it was 48 years. The
average age in the target population appears to be high, but this is because only adults
constituted this population.
In total, 800 persons were invited to participate in the survey. These persons were
selected randomly from the research panel. The response rate in the pilot study was
approximately 47% (N = 372), and the participants were distributed more or less evenly
across the four versions of the questionnaire (Table 8). The distributions of customer segment,
gender, and age group within the company, the panel, and the sample, respectively, are
reported in Table 9.
In line with our expectations, the sample differed significantly from the target
population with respect to customer segment (χ²(2) = 209, p < 0.001), gender (χ²(2) = 42, p <
0.001), and age group (χ²(2) = 35, p < 0.001). Furthermore, the sample differed significantly
from the panel with respect to customer segment (χ²(2) = 16.91, p < 0.001). Thus, respondents
differed significantly from non-respondents with respect to customer segment. The sample
was representative of the panel with respect to gender and age group.
Table 9: Distribution (Percentages) of Customer Segment, Gender, and Age Group in the Pilot Study

                      Company  Panel  Sample
Customer segment
  Top                      30     56      64
  Standard                 44     32      30
  Development              26     12       6
Gender
  Female                   44     31      30
  Male                     52     66      68
  Unknown                   4      3       2
Age group
  18 to 39 years           35     30      28
  40 to 59 years           38     51      52
  60 years and older       27     19      20
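The comparison of the sample with the target population can be illustrated with a minimal sketch of a chi-square goodness-of-fit statistic. Two assumptions should be stated clearly. First, the text does not specify the exact form of the test, so the goodness-of-fit form used here is an assumption. Second, the counts are reconstructed from the rounded percentages in Table 9 and the sample size of 372, so the result can only approximate the statistic that was computed on the underlying data.

```python
# Minimal sketch: chi-square goodness-of-fit statistic for customer segment,
# sample versus company (target population). Counts are reconstructed from
# rounded percentages, so the outcome is approximate by construction.

def chi_square_gof(observed_counts, expected_proportions):
    """Sum over categories of (observed - expected)^2 / expected."""
    n = sum(observed_counts)
    statistic = 0.0
    for observed, proportion in zip(observed_counts, expected_proportions):
        expected = n * proportion
        statistic += (observed - expected) ** 2 / expected
    return statistic


N_SAMPLE = 372                  # pilot study respondents
COMPANY_PCT = [30, 44, 26]      # Top, Standard, Development (Table 9)
SAMPLE_PCT = [64, 30, 6]        # Top, Standard, Development (Table 9)

observed = [N_SAMPLE * pct / 100 for pct in SAMPLE_PCT]
expected_props = [pct / 100 for pct in COMPANY_PCT]

statistic = chi_square_gof(observed, expected_props)
degrees_of_freedom = len(SAMPLE_PCT) - 1
print(round(statistic, 1), degrees_of_freedom)
```

With three categories the statistic has two degrees of freedom, and the value obtained from the rounded percentages is of the same order as the reported χ²(2) = 209.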
Procedure
The survey was administered via the Internet. Persons were invited by E-mail to participate in
the survey. The questionnaire was made available at a site of the marketing research agency
that managed the survey. The questionnaire was accessible from 19 August 2005 until 4
September 2005. Persons had access to the site on the basis of a password and were identified
on the basis of a customer-id. After a participant completed the questionnaire, the data were
uploaded to the agency. The participants received a small incentive (i.e., saving points valued
at 10 euro). This is the common fee that the company paid to panel members who responded
to a survey of medium length.
Data
The research agency delivered a file containing the raw data, which were the coded responses of
the participants to the survey items (the research agency scored a no answer response as a
missing value). In order to enrich the raw data, the file was merged with the marketing
database. The merging was executed on the basis of customer-id, and it was successful for all
participants. Subsequently, three variables were added to the file, which were (a) customer
segment at the end of September 2005, (b) gender, and (c) age at the end of September 2005.
6 The main study
The main survey was conducted in October 2005, among mature customers of the bank. The
study was used to construct the measurements of the constructs, and to test the hypotheses
(see Section 7 in Chapter 4).
Target population
The target population of this study consisted of the mature customers of a Dutch retail bank.
These were adults who were registered by the company as the primary owner of at least one
banking product provided by the company.
Sample
A total of 3612 persons were invited to participate in the survey. They were the remainder of
the research panel of the company (i.e., the part of the panel that did not participate in the
pilot study). The response rate in the main study was approximately 47% (N = 1689). The
distributions of customer segment, gender, and age group within the company, the remainder
of the panel, and the sample, respectively, are reported in Table 10.
In line with our expectations, the sample differed significantly from the target
population with respect to customer segment (χ²(2) = 813, p < 0.001), gender (χ²(2) = 183, p <
0.001), and age group (χ²(2) = 157, p < 0.001). Furthermore, the sample differed significantly
from the remainder of the panel with respect to customer segment (χ²(2) = 75, p < 0.001),
gender (χ²(2) = 9.95, p < 0.01), and age group (χ²(2) = 8.85, p < 0.05). Thus, respondents
differed significantly from non-respondents with respect to customer segment, gender, and
age group. For gender and age the absolute differences were very small, and for practical
purposes they may be ignored.
Table 10: Distributions (Percentages) of Customer Segment, Gender, and Age Group in the Main Study

                      Company  Remainder of panel  Sample
Customer segment
  Top                      30                  55      61
  Standard                 44                  32      30
  Development              26                  13       9
Gender
  Female                   44                  31      30
  Male                     52                  66      68
  Unknown                   4                   3       2
Age group
  18 to 39 years           35                  30      28
  40 to 59 years           38                  51      52
  60 years and older       27                  19      20
Procedure
The survey was administered via the Internet. Persons were invited by E-mail to participate in
the survey. The questionnaire was made available at a site of the marketing research agency
that managed the survey. The questionnaire was accessible from 30 September 2005 until 16
October 2005. Persons had access to the site on the basis of a password and were identified on
the basis of a customer-id. After a participant completed the questionnaire, the data were
uploaded to the agency. The participants received a small incentive (i.e., saving points valued
at 10 euro). This is the common fee that the company paid to panel members who responded
to a survey of medium length.
Data
The research agency delivered a file containing the raw data, which were the coded responses of
the participants to the survey items (again, a no answer response was scored as a missing
value). In order to enrich the raw data, the file was merged with the marketing database. The
merging was executed on the basis of customer-id, and it was successful for all participants.
Subsequently, seven variables were added to the file, which were (a) customer segment at the
end of September 2005, (b) gender, (c) age at the end of September 2005, (d) CP at the end of
September 2005, (e) CP at the end of September 2006, (f) CP at the end of September 2007,
and (g) an indicator of whether the customer had died between September 2005 and September 2007.
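The enrichment step can be illustrated with a minimal sketch of a merge on customer-id. The field names, the in-memory data layout, and the example records are hypothetical; the replacement of a missing CP value by zero follows the remark in Section 2 that a missing value on CP represents zero profits, and applying it to every missing CP value is a simplifying assumption of this sketch.

```python
# Minimal sketch: enrich survey records with company data via customer-id.
# All field names and records are hypothetical illustrations.

CP_FIELDS = ("cp_2005", "cp_2006", "cp_2007")


def enrich(survey_rows, marketing_db):
    """Merge each survey row with the company record of the same customer.

    survey_rows: list of dicts, each containing a "customer_id" key.
    marketing_db: dict mapping customer_id -> dict of company variables.
    Raises KeyError if a participant cannot be matched, so that an
    unsuccessful merge is noticed rather than silently ignored.
    """
    enriched = []
    for row in survey_rows:
        customer_id = row["customer_id"]
        if customer_id not in marketing_db:
            raise KeyError(f"no company record for customer {customer_id}")
        merged = {**row, **marketing_db[customer_id]}
        # Assumption: a missing CP value stands for zero profits.
        for field in CP_FIELDS:
            if merged.get(field) is None:
                merged[field] = 0.0
        enriched.append(merged)
    return enriched


# Invented records for illustration.
survey = [{"customer_id": 101, "Q3b": 3}, {"customer_id": 102, "Q3b": 4}]
company = {
    101: {"segment": "Top", "cp_2005": 120.0, "cp_2006": 95.0, "cp_2007": None},
    102: {"segment": "Standard", "cp_2005": 40.0, "cp_2006": 42.5, "cp_2007": 39.0},
}
for record in enrich(survey, company):
    print(record)
```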
Chapter 6
Results of the first empirical study into customer satisfaction with BANK
1 Introduction
This chapter addresses the results of the first empirical study into customer satisfaction with
BANK. First, the preliminary analyses are discussed. The purpose of these analyses was to
examine the data quality and to prepare the data for the subsequent analyses. Second, the
measurement analyses are discussed. The purpose of these analyses was to construct the
scales of customer satisfaction, trust, quality, and customer loyalty. Third, the tests of the
hypotheses explained in more detail in Chapter 4 are discussed. The purpose of these tests
was to collect empirical evidence regarding the validity of measurement of customer
satisfaction. Fourth, additional research into the relation between customer satisfaction and
future customer profitability (future CP) is discussed. The purpose of these analyses was to
explore this relation in more detail than we did for the tests of the hypotheses. Fifth, the
implications of the results of the empirical study are addressed. The discussion includes the
assessment of the strengths and weaknesses of the customer satisfaction scale. Sixth, the
conclusions of the study are presented.
2 Preliminary analyses
Method
This section addresses the preliminary analyses of the raw data from the pre-tests, the pilot
study and the main study.
Pre-test data
First, the data from the pre-tests were analysed. The interviewer reproduced the interviews
verbatim on the basis of the notes he made during the interview. The report of each interview
included (a) the registration of the time the participant took to complete the survey, (b) the
participant’s explanation of his or her satisfaction with the retail bank, and (c) the
participant’s comments on the survey and the questionnaire items.
Pilot study data
Second, the data from the pilot study were analysed. For this purpose, the dataset containing
the raw data was converted into a SAS dataset, and the items that were assumed to be counter-
indicative of the constructs (see the description of the measurement instruments in Chapter 5)
were recoded in the opposite direction. In order to get an impression of the distribution
characteristics of the variables, histograms and descriptive statistics of all variables in the
dataset were computed and examined. For this purpose, proc univariate (SAS STAT) and
proc means (SAS STAT) were used.
In order to test the data quality, the correlations between the items reflecting customer
satisfaction with the retail bank and the items reflecting interest in banking matters were
examined. For this purpose, proc corr (SAS STAT) was used. It was expected that (a) the
items reflecting satisfaction were highly correlated, (b) the items reflecting interest were
highly correlated, and (c) the items reflecting satisfaction and the items reflecting interest
were uncorrelated.
Missing data may hamper the data analyses (e.g., Tabachnick & Fidell, 2007, p. 62).
Item-score imputation is a method for handling missing item scores in multiple-item
questionnaires. Suppose the score of participant p on item i is missing. Imputing an item
score based on the observed part of the data for participant p and item i, to be
discussed shortly in more detail, is a simple and effective way to complete the data matrix
without losing a large part of the sample, as happens with the popular method of
listwise deletion.
In the statistical literature (e.g., Little & Rubin, 2002; Schafer & Graham, 2002), it is
well known that the way in which missing data have to be handled depends on the mechanism
that underlies the missingness. This mechanism often is difficult to identify once the missing-
data problem has presented itself, and this complicates adequate missing-data handling in
much empirical research. For item-score missingness in multiple-item questionnaires, in
which multiple items are used to measure one underlying construct such as satisfaction,
Bernaards and Sijtsma (2000) and Van Ginkel, Van der Ark, and Sijtsma (2007) found that
imputation of item scores has little or no biasing effect on outcomes of statistical analyses
when the percentage of missing item scores in the data matrix does not exceed, say, 15
percent. Serious bias is absent even when the missingness mechanism cannot be ignored in
the sense that the missing item scores cannot be considered a random sample from the
complete data matrix. The explanation for this robustness is that the available data contain
much information on the underlying construct, and thus are well able to compensate for the
non-randomness of the missing data. Because in the pilot study and the main study the total
percentage of missing item scores did not exceed 15, item-score imputation could be used
safely (results discussed in the next section).
For the imputation of item scores, we used two-way imputation with normally
distributed errors (abbreviated method TW-E; e.g., Bernaards & Sijtsma, 2000; Sijtsma & van
der Ark, 2003; Van Ginkel, 2007; Van Ginkel, Van der Ark, & Sijtsma, 2007). Van Ginkel
(2007) demonstrated that this method yielded nearly unbiased results in important
psychometric quantities such as Cronbach’s alpha. Method TW-E is suited in particular for
item sets that measure one construct. Let the score of person p on item i be missing. In two-
way imputation, a real value TW_pi is estimated on the basis of (a) the mean of person p’s
available scores on the other items of the scale (i.e., the person mean PM_p), (b) the mean of
the available scores of the other persons in the sample on item i (i.e., the item mean IM_i), and
(c) the mean of all available scores of all persons in the sample on all items which constitute
the scale (i.e., the overall mean OM), so that
TW_pi = PM_p + IM_i - OM.
In two-way imputation with normally distributed errors, a random error ε_pi is added to TW_pi,
so that
TW_pi(E) = TW_pi + ε_pi.
The random error is drawn from a normal distribution with zero mean and variance σ_ε^2.
Variance σ_ε^2 is obtained from the squared differences between the observed scores X_pi in the
data matrix and the expected scores TW_pi computed by means of two-way imputation. If TW_pi(E) is
not an integer, it is rounded to the nearest integer within the range of feasible item scores, and
this rounded value is imputed in cell (p,i) of the data matrix.
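The steps of two-way imputation with normally distributed errors can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in the study: the function name, the small data matrix, and the score range 0 to 4 are hypothetical. Two assumptions are made where the text is silent: the error variance is taken as the plain mean of the squared differences between observed scores and two-way values, and person and item means are computed from all available scores.

```python
import random


def two_way_impute(data, lo, hi, seed=0):
    """Impute missing item scores (None) with two-way imputation plus normal error.

    data: list of persons, each a list of item scores on one scale.
    lo, hi: smallest and largest feasible item score (hypothetical range).
    """
    rng = random.Random(seed)
    n_items = len(data[0])

    observed = [x for row in data for x in row if x is not None]
    overall_mean = sum(observed) / len(observed)  # OM

    # IM_i: mean of the available scores on item i
    # (assumes every item has at least one observed score).
    item_means = []
    for i in range(n_items):
        column = [row[i] for row in data if row[i] is not None]
        item_means.append(sum(column) / len(column))

    # PM_p: mean of person p's available scores; None if nothing was answered.
    person_means = []
    for row in data:
        answered = [x for x in row if x is not None]
        person_means.append(sum(answered) / len(answered) if answered else None)

    def tw(p, i):
        # Two-way value: person mean + item mean - overall mean.
        return person_means[p] + item_means[i] - overall_mean

    # Error variance from squared differences between observed scores and
    # two-way values. Assumption: plain mean of the squared differences.
    squared = [(x - tw(p, i)) ** 2
               for p, row in enumerate(data)
               for i, x in enumerate(row) if x is not None]
    sigma = (sum(squared) / len(squared)) ** 0.5

    completed = [list(row) for row in data]
    for p, row in enumerate(data):
        if person_means[p] is None:
            continue  # no answered item on this scale: nothing is imputed
        for i, x in enumerate(row):
            if x is None:
                value = round(tw(p, i) + rng.gauss(0.0, sigma))
                completed[p][i] = min(hi, max(lo, value))  # keep within range
    return completed


# Hypothetical scores of four persons on a four-item scale (None = missing).
scores = [
    [3, 4, None, 3],
    [2, None, 2, 1],
    [None, None, None, None],
    [4, 4, 3, 4],
]
completed = two_way_impute(scores, lo=0, hi=4)
```

In this sketch the third person answered no item of the scale, so the row stays missing, in line with the requirement that at least one item of the scale is answered.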
Method TW-E requires that at least one item from the item set reflecting a construct is
answered by the participant. Otherwise, the person mean PM_p, and consequently TW_pi, cannot
be computed. Thus, no values were imputed for missing scores of participants who did not
answer at least one item from a particular scale.
We excluded participants with missing scores from particular analyses if it was plausible
that missing scores on an item were due to the item being non-applicable for these
participants. For example, missing scores on an item addressing quality of complaint handling
by the company (i.e., item Q10f; see Table 5 in Chapter 5) may be due to the item being non-
applicable for participants who never had a complaint about the company. Because it is
unrealistic to impute a score for a missing value that indicates that an item may not be
applicable for a participant, we did not impute values for these missing scores but rather
excluded these cases from the analysis (see also Chapter 5, in which the decision to include the
no answer option was discussed). We excluded variables from the dataset if it was suspected
that (a) the missingness was nonignorable, (b) there were no substantive arguments for the
imputation of the missing scores, and (c) the variables were considered to be dispensable for
the study. For example, two variables reflecting customer loyalty were deleted from the
dataset for these reasons (to be further discussed in the Section Results).
Main study data
Third, the data from the main study were analysed, similar to the data from the pilot study.
These analyses included (a) the recoding of items that were assumed to be counter-indicative
of the construct of interest, (b) the examination of distribution characteristics of the variables
in the dataset, (c) the imputation of missing values, and (d) the examination of correlations
between items reflecting satisfaction and items reflecting interest in banking matters.
Furthermore, in the main study (but not in the pilot study) a weighting factor containing
weights for persons in the dataset was computed, and outlier analyses were done.
In Chapter 5, it was demonstrated that the sample differed significantly from the target
population with respect to customer segment, gender, and age group. The analyses
demonstrated that the difference with respect to customer segment between the sample and
the target population was larger than the differences with respect to gender and age group.
Because in-company research demonstrated that customer segment is an important variable in
customer profitability analyses (e.g., Terpstra, 2005) and because we intended to analyse the
relation between customer satisfaction and customer profitability, we decided to weight
participants in order to obtain proportional representation of customer segments in the sample.
Hox (1998) advocated weighting of persons if the sample is biased, and comparing the results
from statistical analyses with and without weighting. Following Hox (1998), we compared the
results of the analyses regarding the relation between customer satisfaction and future
customer profitability, with and without the weighting (Section 4). The weights of the
participants belonging to a particular customer segment were computed as the ratio between
the proportion of the customer segment in the company population and the proportion of the
customer segment in the sample. This means that the participants belonging to a customer
segment that was overrepresented in the sample were given a smaller weight than the
participants belonging to a segment that was underrepresented in the sample.
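As a small illustration of this weighting rule, the sketch below computes the weight of each segment as the population proportion divided by the sample proportion. The proportions are those reported in Table 8; the function and variable names are hypothetical.

```python
def segment_weights(population_share, sample_share):
    """Weight per segment = proportion in population / proportion in sample."""
    return {segment: population_share[segment] / sample_share[segment]
            for segment in population_share}


# Proportions of the customer segments as reported in Table 8.
population = {"Top": 0.30, "Standard": 0.44, "Development": 0.26}
sample = {"Top": 0.61, "Standard": 0.30, "Development": 0.09}

weights = segment_weights(population, sample)

# After weighting, each segment's share in the sample equals its population share.
weighted_share = {segment: sample[segment] * weights[segment] for segment in sample}
```

The overrepresented Top segment receives a weight below 1 (30 / 61), whereas the underrepresented Standard and Development segments receive weights above 1.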
Univariate and multivariate outlier analyses were conducted to find cases that may
hamper the data analyses (e.g., Tabachnick & Fidell, 2007, pp. 72-77). For the detection of
univariate outliers, the histograms of variables were examined. For the detection of
multivariate outliers, the distances of persons to the centroid of the multivariate space defined
by the items in the dataset were examined. These distances can be expressed by the
Mahalanobis Distance (Mahalanobis, 1936) and by the leverage statistic, which is a function
of the Mahalanobis Distance (Tabachnick & Fidell, 2007, pp. 74-75). Let MD denote the
Mahalanobis Distance and N the sample size; then the leverage of person p, denoted h_pp,
is defined as:
h_pp = MD / (N - 1) + 1 / N.
We chose the leverage statistic for the detection of multivariate outliers, because this statistic
is readily available in SAS. Following Tabachnick & Fidell (2007, pp. 74-75, 111-112),
regression analysis was used to calculate the leverage statistic. This was done using several
items that reflected different constructs as predictors and customer-id as criterion (because the
leverage statistic expresses the distances of persons to the centroid of the multivariate space
defined by the predictor variables in the regression analysis, the choice of the criterion
variable in the regression analysis is unimportant). Persons with a significant value for
leverage (p < 0.001) were defined as multivariate outliers, and their score patterns were
visually examined to find out what caused the high leverage value. The outliers were marked
in the dataset by an indicator variable. Furthermore, for each participant the proportion of
missing values on each set of items constituting a measurement instrument was computed. If
this proportion exceeded 0.5, a participant was marked as an outlier. To evaluate the impact of
outliers on the results, we did all analyses on the dataset including the outliers (i.e., the
complete dataset) and on the dataset without outliers (i.e., the reduced dataset).
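The relation between the Mahalanobis Distance and leverage, leverage = MD / (N - 1) + 1 / N, can be illustrated with a short sketch for the simple case of two predictor variables. The data and function name are hypothetical. Two assumptions are made here: MD is read as the squared distance based on the sample covariance matrix, and significance is judged by referring MD to a chi-square distribution with as many degrees of freedom as there are predictors (for two degrees of freedom the critical value has a closed form).

```python
import math


def leverage_two_predictors(x, y):
    """Return (MD, leverage) per person for two predictor variables.

    MD is the squared Mahalanobis distance to the centroid, computed from the
    sample covariance matrix (divisor N - 1); leverage = MD / (N - 1) + 1 / N.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    s_yy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    det = s_xx * s_yy - s_xy ** 2  # determinant of the 2 x 2 covariance matrix

    result = []
    for a, b in zip(x, y):
        dx, dy = a - mean_x, b - mean_y
        # Quadratic form d' S^-1 d written out for the 2 x 2 case.
        md = (s_yy * dx * dx - 2 * s_xy * dx * dy + s_xx * dy * dy) / det
        result.append((md, md / (n - 1) + 1 / n))
    return result


# Hypothetical scores of six persons on two items; the last person is extreme.
item_1 = [1, 2, 3, 4, 5, 20]
item_2 = [2, 1, 4, 3, 5, 0]
values = leverage_two_predictors(item_1, item_2)
leverages = [h for _, h in values]

# Chi-square critical value for p = 0.001 with 2 degrees of freedom
# (assumption: MD is referred to a chi-square distribution).
critical_md = -2 * math.log(0.001)
```

In this toy example the sixth person has by far the largest leverage. The leverages sum to the number of predictors plus one, which is a convenient check on the computation.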
Results
The pre-tests
The participants explained their satisfaction with the retail bank in different ways. The
participants’ explanations of their satisfaction with the retail bank are listed in Table 1,
and they are discussed in Section 4.
Table 1: Listing of Explanations of Satisfaction with the Company
Participant | Satisfaction | Explanation of satisfaction with the retail bank
1 | Very satisfied | I feel good about [BANK]. My banking affairs are taken care of well with [BANK].
2 | Satisfied | They [BANK] do nothing wrong. There is nothing to be dissatisfied about … There is nothing to be enthusiastic about either. If [COMPETITOR] would have current accounts, I would switch immediately.
3 | Satisfied | They [BANK] will not deceive you, such as [COMPETITOR X]. That was my former bank … [BANK] is easy to deal with, with limited costs.
4 | Satisfied | I’ve got the impression that they [BANK] will not deceive me, and then it’s all right with me ... I’m not particularly concerned with banking affairs, my partner takes care of banking affairs …
5 | Very satisfied | The staff is always friendly, and the bank is easy to deal with… I feel good about [BANK].
6 | Satisfied | I trust [BANK] … I won’t go to [COMPETITOR], to me it’s important that I can trust my bank.
7 | Satisfied | It [BANK] is a friendly bank … They [BANK] are accessible … There is nothing to be dissatisfied about.
8 | Satisfied | They are friendly and they are accessible … Although a relative once had an annoying incident with [BANK]. Her card was stolen and was used abroad. First, they [BANK] refused to compensate. This is not what I expected from [BANK].
9 | Satisfied | It is all right, it never goes wrong … I don’t care much about banking affairs … I don’t have any referents.
10 | Moderately satisfied | In general it is all right, but last year I had an incident with [BANK]. It was about the costs of banking services. They charge basic services, while they make enormous profits with our money.
The pilot study
Histograms (not shown here) demonstrated that the polytomous items reflecting customer
satisfaction (two item sets), trust, quality (two item sets), customer loyalty, and interest were
single peaked, and mostly negatively skewed. For example, most participants responded
positively to items that were indicative of satisfaction, and negatively to items that were
counter-indicative of satisfaction (Table 2). This corresponds with the findings in other
satisfaction studies (e.g., Oliver, 1997; Peterson & Wilson, 1992). The histograms also
revealed a small group of outliers on the items adopted from the ACSI. Because these items
were not used in subsequent analyses of the pilot data, no further actions were undertaken
with respect to these outliers.
Table 2: Descriptive Statistics of Items Reflecting Customer Satisfaction (Before Imputation; N = 372)
Code | Item | Nmiss | Mean | SD | Skewness
Q3a | At BANK I feel at home | 2 | 2.94 | 0.73 | -0.72 **
Q3b | I am satisfied with BANK | 1 | 2.96 | 0.69 | -0.99 **
Q3d* | There are good reasons to leave BANK | 0 | 2.99 | 0.94 | -1.13 **
Q3e* | I have mixed feelings about BANK | 2 | 2.72 | 0.95 | -0.85 **
Q3g | BANK meets all my requirements for a bank | 0 | 2.67 | 0.83 | -0.61 **
Q4a | Last year I had a pleasant relationship with BANK | 0 | 2.88 | 0.77 | -1.36 **
Q4b | BANK has met my expectations | 0 | 2.85 | 0.73 | -0.95 **
Q4c* | I have regretted my choice for BANK | 0 | 3.27 | 0.71 | -1.20 **
Q4d* | Last year I had some problems with BANK | 1 | 2.85 | 1.03 | -0.87 **
* = scored reversely, ** = p < 0.001
The descriptive statistics demonstrated a low incidence of missing values (i.e., smaller
than five percent) on the items reflecting customer satisfaction, trust, customer loyalty, and
interest, and a higher incidence of missing values on some items reflecting quality. This latter
result was probably due to items mentioning topics that were irrelevant to particular
participants; also, see Chapter 5. For the items constituting the measurement instrument for
customer satisfaction, Table 2 shows that there were few missing item scores; thus method
TW-E was used for imputing values for the missing item scores. The descriptive statistics
(i.e., mean, standard deviation, and skewness) for the items before imputation were almost
identical to the descriptive statistics for the items after imputation. This result supports the use
of method TW-E, and the items after imputation were used for subsequent analyses (i.e., the
analyses for the test of the hypotheses 12 and 13; see Chapter 4).
Table 3 shows the correlations between two items reflecting satisfaction (Table 1 in
Chapter 5) and two items reflecting interest in banking matters (Table 7 in Chapter 5). In
agreement with our expectations, (a) the satisfaction items correlated highly, (b) the interest
items correlated highly, and (c) the satisfaction items and the interest items were almost
uncorrelated. These results strengthened our confidence in the quality of the data.
Table 3: Correlations Between Two Items (Q3a and Q3b) Reflecting Customer Satisfaction and Two Items (Q17 and Q18) Reflecting Interest
Item | Code | Q3a | Q3b | Q17 | Q18
At BANK I feel at home | Q3a | | 0.72 | 0.08 | 0.03
I am satisfied with BANK | Q3b | | | 0.01 | -0.04
How interested are you in banking matters? | Q17 | | | | 0.65
How interested are you in the development of new products and services by banks? | Q18 | | | |
The main study
The results from the preliminary analyses of the data from the main study were similar to the
results from the analyses of the pilot data. Histograms (not shown here) demonstrated that all
polytomous items reflecting customer satisfaction, trust, quality, customer loyalty, and
interest were single peaked, and mostly negatively skewed; see Table 4.
For the variables reflecting customer profitability (CP) in September 2005, September
2006, and September 2007, histograms (not shown here) showed single peaked and positively
skewed distributions. Forty-three participants had a standardised CP in September 2005,
2006, or 2007 that was larger than 3. These participants were outliers, but because they had
correct values for CP (i.e., not incorrect values due to, e.g., clerical errors), they were retained
for the data analyses.
Table 4: Descriptive Statistics of Polytomous Items Reflecting Customer Satisfaction, Trust, Customer Loyalty and Interest (Before Imputation; N = 1689)
Code | Item | Nmiss | Mean | SD | Skewness
Customer satisfaction items
Q3a | At BANK I feel at home | 8 | 2.90 | 0.73 | -0.73 **
Q3b | I am satisfied with BANK | 2 | 2.92 | 0.71 | -1.21 **
Q3d* | There are good reasons to leave BANK | 25 | 3.01 | 0.95 | -0.98 **
Q3e* | I have mixed feelings about BANK | 19 | 2.72 | 0.97 | -0.60 **
Q3g | BANK meets all my requirements for a bank | 2 | 2.62 | 0.87 | -0.65 **
Q4a | Last year I had a pleasant relationship with BANK | 10 | 2.82 | 0.71 | -0.71 **
Q4b | BANK has met my expectations | 5 | 2.77 | 0.75 | -1.00 **
Q4c* | I have regretted my choice for BANK | 14 | 3.21 | 0.76 | -1.05 **
Q4d* | Last year I had some problems with BANK | 14 | 2.99 | 0.94 | -1.03 **
Trust items
Q5a | I can depend on BANK to treat me fairly | 9 | 2.83 | 0.66 | -1.00 **
Q5b | I can depend on BANK to handle my banking aff. corr. | 4 | 2.90 | 0.63 | -1.12 **
Q5c | I can depend on BANK to keep its promises | 16 | 2.77 | 0.70 | -1.04 **
Q5d* | I sometimes doubt the competence of BANK | 20 | 2.78 | 0.87 | -0.72 **
Q5e* | I sometimes doubt the good will of BANK | 24 | 2.80 | 0.88 | -0.80 **
Q5f | I can trust BANK | 4 | 2.89 | 0.64 | -0.75 **
Q5g | I can depend on BANK to serve me well | 6 | 2.75 | 0.71 | -0.84 **
Customer loyalty items
Q14a | If I need new fin. products, BANK is my first choice | 26 | 2.44 | 0.95 | -0.41 **
Q14b | I have more sympathy for BANK than for other banks | 34 | 2.47 | 0.90 | -0.41 **
Q14c* | For some matters I am better off with another bank | 124 | 1.83 | 1.01 | 0.30 **
Q14d* | I consider switching from BANK to another bank | 36 | 3.01 | 0.91 | -0.95 **
Q14e | BANK offers me benefits other banks don’t offer | 97 | 2.18 | 0.82 | -0.04
Q14f | For many years BANK has been my primary bank | 9 | 2.96 | 0.99 | -1.08 **
Interest items
Q17 | How interested are you in banking matters? | 22 | 2.70 | 1.03 | -0.47 **
Q18 | How interested are you in dev. of new p&s by banks? | 34 | 2.27 | 1.11 | -0.22 **
ACSI items
Q20b | How satisfied are you with BANK? | 7 | 6.56 | 1.30 | -1.13 **
Q20c | To what extent does BANK meet your ideal of a bank? | 48 | 6.11 | 1.44 | -1.08 **
Q20d | To what extent has BANK met your expectations? | 19 | 6.54 | 1.37 | -1.23 **
(*) = scored reversely, (**) = p < 0.001

Outliers are common in financial data. In the financial services industry, customer
profits (i.e., CP according to the gross CP conception; Chapter 3) often follow a Pareto-like
distribution (i.e., 20% of the customers are responsible for 80% of the company’s profits). To
reduce the skewness of the distribution and the influence of the outliers on subsequent
analyses, we applied a logarithmic transformation to CP_t (Jack, 1967; Tabachnick & Fidell,
2007, pp. 87-89). Let CP_t denote CP at time t, TCP_t the transformed CP_t, and ln the natural
logarithm. Because the minimum value of CP_t was zero euro, we applied the following
transformation:
TCP_t = ln(CP_t + 1).
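The transformation of CP, that is, the natural logarithm of CP plus one, can be illustrated with a minimal sketch. The CP values below are hypothetical and only serve to show that a zero value remains defined and that a very large value is pulled towards the bulk of the distribution.

```python
import math


def transform_cp(cp_values):
    """Return ln(CP + 1) for each non-negative CP value."""
    return [math.log1p(cp) for cp in cp_values]


# Hypothetical customer profitability values in euro,
# including a zero and one very large value.
cp = [0.0, 10.0, 50.0, 120.0, 5000.0]
tcp = transform_cp(cp)
```

Adding one before taking the logarithm maps a CP of zero euro to a transformed value of zero, whereas the logarithm of zero itself would be undefined.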
Table 5 shows the correlations between two items reflecting satisfaction (Table 1 in
Chapter 5) and two items reflecting interest in banking matters (Table 7 in Chapter 5). In
agreement with our expectations, (a) the satisfaction items correlated highly, (b) the interest
items correlated highly, and (c) the satisfaction items and the interest items were almost
uncorrelated. These results strengthened our confidence in the quality of the data.
The items reflecting customer satisfaction (including the items from the ACSI), trust,
customer loyalty, and interest had few missing data (i.e., 5% or less; see Table 4), so that
item-score imputation could be used safely. An exception was made for the items with respect
to customer loyalty; this is discussed shortly. The descriptive statistics of the items before
imputation were almost identical to the descriptive statistics of the items after imputation.
Some participants left more than 50 percent of the items reflecting satisfaction, trust, or
interest unanswered (Table 6). These participants were considered outliers, and indicator
variables identified them in the dataset.
Table 4 demonstrates substantial percentages of missing scores on two items reflecting
customer loyalty, which are the items Q14c (For some matters I am better off with another
bank; Nmiss = 6 percent) and Q14e (BANK offers me benefits other banks don’t offer; Nmiss =
7 percent). The meaning of item Q14c was probably too vague, because the phrase some
matters is ambiguous and imprecise, probably referring to a variety of products and services
that are provided by retail banks. The meaning of item Q14e also was probably too vague,
because the phrase offers me benefits was not articulated. Thus, it is unclear whether this
phrase refers to characteristics of the company, such as the location of a bank office or the
availability of Internet banking facilities, or to financial offers by the company, such as a
personalised interest rate. The unfortunate phrasing of these two items in combination with
the circumstance that these items were dispensable for the study led us to delete the items
from the dataset, even though the percentages of missing item scores were smaller than 15.
The missing data on the remainder of the items reflecting customer loyalty (Table 4) were
imputed using method TW-E. Some participants left more than 50 percent of the items
reflecting customer loyalty unanswered (Table 6). These participants were considered
outliers, and we created an indicator variable to identify them in the dataset.
Table 5: Correlations Between Two Items (Q3a and Q3b) Reflecting Customer Satisfaction and Two Items (Q17 and Q18) Reflecting Interest
Item | Code | Q3a | Q3b | Q17 | Q18
At BANK I feel at home | Q3a | | 0.64 | 0.05 | 0.04
I am satisfied with BANK | Q3b | | | -0.01 | -0.02
How interested are you in banking matters? | Q17 | | | | 0.62
How interested are you in the development of new products and services by banks? | Q18 | | | |

Table 6: Number of Participants Leaving More Than Half of the Items Unanswered
  | Satisfaction | ACSI | Trust | Loyalty | Interest
N | 1 | 14 | 6 | 5 | 11
Histograms (not shown here) demonstrated that all polytomous items reflecting quality
were single peaked and mostly negatively skewed; see Table 7. The polytomous items
reflecting quality had many missing item scores, part of which may be due to items being
non-applicable for the participants involved (also, see Chapter 5). For example, a missing
score on an item concerning the quality of complaint-handling by the company (i.e., item
Q10f; Table 7) might indicate that the participant never had any complaints with the
company. Similarly, a missing score on an item concerning telephone service by the company
(i.e., item Q9a; Table 7), might indicate that the participant never phoned the company.
Because imputation of values for missing scores on such items would be meaningless, we
decided to exclude persons with missing scores on the polytomous items reflecting quality
from analyses of the data about quality. In general, regular users of BANK have a
greater chance of running into problems with transactions and services than low-frequency
users. Thus, it is likely that the latter group is overrepresented in the missing scores on the
quality items.
In order to detect multivariate outliers, the leverage statistic was computed by means of
a regression analysis with customer-id as the criterion variable and 25 items as the predictor
variables: the items reflecting customer satisfaction (Table 4), trust (Table 4), customer loyalty
(Table 4; the items Q14c and Q14e were excluded), interest (Table 4), and the items from the ACSI
(Table 4) (see Tabachnick & Fidell, 2007, pp. 75-76, 111-112). The analysis yielded 119
participants with a significant (p < 0.001) leverage value. Visual inspection of the data
demonstrated that these participants tended to give extremely positive or extremely negative
responses. Furthermore, the inspection demonstrated that the eight participants with the
highest leverage value alternated extremely positive and extremely negative responses to
different items having similar content. For example, a participant responded extremely
positively to one half of the items reflecting satisfaction with the company (i.e., items Q3a,
Q3b, Q3d, Q3e, and Q3g; Table 4) and extremely negatively to the other half of the items
reflecting satisfaction with the company (i.e., items Q4a, Q4b, Q4c, and Q4d; Table 4).
Another example is a participant who answered extremely positively to one item from the ACSI
(i.e., item Q20b; Table 4) and extremely negatively to the other items from the ACSI (i.e., items
Q20c and Q20d; Table 4). A third example is a participant who answered extremely negatively
to all items reflecting satisfaction with the bank (customer satisfaction items; Table 4) and
extremely positively to the items from the ACSI (ACSI items; Table 4).
It was suspected that the eight participants with the highest leverage value had
responded inconsistently to the survey items. An indicator variable was created to identify
them in the dataset. This variable was joined with the variables marking the participants who
left the majority of items reflecting a particular construct unanswered (see Table 6). The union
of these variables identified 39 outliers in the dataset. These 39 outliers were excluded from
some analyses (the dataset including the 39 outliers was labeled the complete dataset, and the
dataset without the 39 outliers was labeled the reduced dataset).
The weights needed to achieve proportional representation with respect to customer
segment were computed on the basis of the distributions of customer segment in the company
population and in the sample (see Chapter 5). Subsequently, the weights were recorded in a
variable called the weighting factor (Table 8).
Table 7: Descriptive Statistics of Polytomous Items Reflecting Quality (N = 1689)
Code | Item | Nmiss | Mean | SD | Skewness
Q7a | Correct execution of orders | 11 | 2.11 | 0.57 | -0.37 **
Q7b | Speed of money transfers | 12 | 1.67 | 0.82 | -0.33 **
Q7c | Speed of service delivery | 37 | 1.81 | 0.63 | -0.39 **
Q7d | Adherence to promises | 162 | 1.87 | 0.61 | -0.63 **
Q7e | Correct execution of banking matters | 19 | 2.04 | 0.53 | -0.31 **
Q7f | Distribution of bank statements | 10 | 1.38 | 0.86 | -0.19
Q8a | Costs of accounts of the company | 201 | 1.09 | 0.70 | 0.27 **
Q8b | Convenience of products and services | 32 | 1.91 | 0.58 | -0.35 **
Q8c | Clarity of information provided | 32 | 1.80 | 0.62 | -0.61 **
Q8d | Sufficiency of information provided | 51 | 1.78 | 0.60 | -0.57 **
Q8e | Costs of services of the company | 94 | 1.07 | 0.70 | 0.31 **
Q8f | Interest rates of the company | 144 | 0.85 | 0.69 | 0.41 **
Q9a | Service by telephone | 456 | 1.81 | 0.65 | -0.47 **
Q9b | Service by the internet | 325 | 1.89 | 0.65 | -0.52 **
Q9c | Service by bank offices | 288 | 1.71 | 0.77 | -0.55 **
Q9d | Service by mail correspondence | 376 | 1.74 | 0.61 | -0.55 **
Q9e | Accessibility of the company | 85 | 1.82 | 0.65 | -0.50 **
Q9f | Facilities for internet banking | 302 | 1.91 | 0.68 | -0.56 **
Q10a | Friendliness of employees | 202 | 2.02 | 0.57 | -0.40 **
Q10b | Capability of employees | 250 | 1.84 | 0.60 | -0.65 **
Q10c | Reliability of employees | 327 | 1.94 | 0.49 | -0.62 **
Q10d | Openness for questions | 360 | 1.70 | 0.69 | -0.66 **
Q10e | Responsiveness of the company | 219 | 1.97 | 0.50 | -0.49 **
Q10f | Handling of complaints | 656 | 1.67 | 0.71 | -0.67 **
** = p < 0.001
Table 8: Distribution of Customer Segment Within the Company, the Panel and the Sample
Customer segment | Company | Sample | Weighting factor
Top | 30% | 61% | 30 / 61
Standard | 44% | 30% | 44 / 30
Development | 26% | 9% | 26 / 9
3 Measurement analyses
Measurement analyses aim to construct scales and to evaluate their psychometric quality. We
used Mokken’s MH model (Chapter 4) to analyse the data representing the participants’
responses to the measurement instruments used in the empirical study. The use of the MH
model yielded the measurement scales and the participants’ scale scores. All measurement
analyses were done on the basis of the data from the main study.
The scales of customer satisfaction, customer satisfaction on the basis of the ACSI, trust,
and customer loyalty were constructed using the MH model. For this purpose, the software
program MSPwin5.0 was used (Molenaar & Sijtsma, 2000). Because it was hypothesised that
each set of items reflecting a construct constituted a unidimensional scale, the confirmatory
search strategy of Mokken scale analysis (Chapter 4) was used.
For the analysis of the item scores reflecting quality, both Mokken scale analyses and
factor analyses (Gorsuch, 1983, pp. 239-256) were used. The Mokken scale analyses were
done using MSPwin5.0, and the factor analyses were done using proc factor (SAS STAT).
Because it was expected that the items reflecting quality constituted multiple scales and we
had no hypothesis about the number of scales, we used exploratory strategies for scale
development.
Factor analysis (e.g., Bollen, 1989; Gorsuch, 1983) is a technique for investigating the
dimensionality of an item set. If the researcher has a hypothesis regarding the dimensionality
of the item set and which items load on particular factors, he or she may apply confirmatory
factor analysis to test this hypothesis (e.g., Bollen, 1989). If the researcher does not have such
a hypothesis, exploratory factor analysis (e.g., Gorsuch, 1983) may be used for investigating
the structure of the item set and the identification of common factors that account for
correlations in the item set.
Hierarchical factor analysis (Gorsuch, 1983, pp. 239-256) is a type of exploratory factor
analysis that may be used to explore the dimensionality of a dataset if dimensions are
non-orthogonal, meaning that factors are correlated. Instead of computing loadings for
oblique factors, which are often difficult to interpret, the correlation matrix of the oblique
factors is further factor-analysed. This analysis yields one or more higher-order factors that account for the common
variance that is due to all items, and two or more orthogonalised lower-order factors that
account for the common variance that is due to clusters of items (Gorsuch, 1983, pp. 248-
252).
Following Wirtz (2000) and Wirtz and Bateson (1995), who reported the presence of
halo effects in measurements of attribute satisfaction (Oliver, 1993), we suspected that halo
effects also could prevail in the measurement of the quality of attributes of products and
services provided by the company. These halo effects might strengthen the correlations
between all items, and cause strong correlations between factors reflecting different
dimensions of quality. Therefore we chose hierarchical factor analysis for the exploration of
the dimensionality of the data about quality. In order to explore the robustness of the results of
the factor analysis, we also applied Mokken scale analysis to the data.
Customer satisfaction
Customer satisfaction was operationalised using the measurement instrument presented in
Chapter 5 (Table 1 in Chapter 5). It was hypothesised that the nine items constitute a scale
according to the MH model. To test this hypothesis, Mokken scale analysis was done using
MSPwin5.0. First, the dimensionality of the item set was investigated using the confirmatory
strategy (Section 6 from Chapter 4). Second, the assumption of monotonicity was investigated
(Section 6 from Chapter 4). Third, the scale-score statistics (Molenaar & Sijtsma, 2000, pp.
60-61) were evaluated. Fourth, the scalability of the item set within distinct customer
segments, gender groups, and age groups was investigated. For this purpose, customer
segment, gender, and age group were defined as grouping variables (Molenaar & Sijtsma,
2000, pp. 28-29). Fifth, univariate analyses of variance were done to test whether subgroups
defined on the basis of customer segment, gender, and age differed significantly with respect
to scale scores. For this purpose, proc GLM (SAS STAT) was used. Sixth, the effect of
outliers on the results was investigated by repeating the analyses on the reduced dataset (i.e.,
the dataset without outliers, see Section 2).
The confirmatory Mokken scale analyses (item selection method = Test) demonstrated
that the nine items constituted a Mokken scale with a total-scale scalability coefficient H
equal to 0.59 and a reliability coefficient rho equal to 0.91 (Table 9). The lowest item
scalability coefficient Hi was equal to 0.50, which is well above the default lowerbound for
the Hi used in exploratory analyses (i.e., lowerbound Hi = 0.3). This result supported the
inclusion of all nine items in the scale, and thus the conception of customer satisfaction as a
unidimensional construct. The scale consists of items that are indicative of satisfaction and
items that are counter-indicative of satisfaction. This result supports the conception of
customer satisfaction as the bipolar opposite of customer dissatisfaction.
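The scalability coefficients reported above can be illustrated with a minimal sketch. It assumes the usual covariance-based definition of Mokken's coefficients (observed covariance divided by the maximum covariance attainable given the item marginals); the function names are hypothetical and the sketch does not reproduce the MSPwin5.0 implementation.

```python
from itertools import combinations


def covariance(x, y):
    """Population covariance of two equally long lists of item scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


def max_covariance(x, y):
    """Largest covariance attainable given the two marginal distributions.

    The maximum is reached when both items order the respondents in the
    same way, which is obtained by sorting each score list separately.
    """
    return covariance(sorted(x), sorted(y))


def scalability(data):
    """Return (H, [H_i, ...]) for a respondents-by-items matrix of scores.

    Assumes every item has nonzero variance; otherwise a division by zero
    occurs because the maximum covariance is zero.
    """
    items = list(zip(*data))  # one tuple of scores per item
    k = len(items)
    cov = [[0.0] * k for _ in range(k)]
    cov_max = [[0.0] * k for _ in range(k)]
    for i, j in combinations(range(k), 2):
        cov[i][j] = cov[j][i] = covariance(items[i], items[j])
        cov_max[i][j] = cov_max[j][i] = max_covariance(items[i], items[j])
    # Item coefficient H_i: pairs that involve item i (diagonal entries are 0).
    h_items = [sum(cov[i]) / sum(cov_max[i]) for i in range(k)]
    # Total-scale coefficient H: all item pairs.
    h_total = sum(map(sum, cov)) / sum(map(sum, cov_max))
    return h_total, h_items
```

For a perfect Guttman pattern such as `[[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]` every coefficient equals 1; response patterns that deviate from the item ordering pull the coefficients below 1.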
The check for item monotonicity on the basis of the default options in MSPwin5.0 (i.e.,
Minvi = 0.03 and Minsize = 168, which is 10 percent of the sample) did not reveal any
violations of the assumption of monotonicity. This means that the ISRFs of all items were
nondecreasing for all rest-score groups. However, the check for item monotonicity on the
basis of smaller rest-score groups (i.e., Minsize = 84, which is 5 percent of the sample)
yielded two significant violations of the assumption of monotonicity. These violations were
due to small decreases in the estimated ISRF for Q3d >= 2 (There are good reasons to leave
BANK; Table 4) (Figure 1) and the estimated ISRF for Q4c >= 4 (I have regretted my choice
for BANK; Table 4) (Figure 2). Thus, the MH model did not fit the data perfectly.
Table 9: Customer Satisfaction Scale’s Total-Scale Scalability Coefficients H, Item Scalability Coefficients Hi, and Reliability Coefficients Rho in the Complete Dataset (N = 1689)
                                                    Total   Customer segment     Gender               Age
Label                                               group   T      S      D      F      M      U      18-39  40-59  60+
At BANK I feel at home                              0.57    0.57   0.53   0.60   0.59   0.56   0.66   0.57   0.59   0.48
I am satisfied with BANK                            0.63    0.63   0.60   0.69   0.67   0.61   0.79   0.66   0.64   0.57
There are good reasons to leave BANK *              0.60    0.60   0.57   0.62   0.61   0.59   0.74   0.59   0.61   0.57
I have mixed feelings about BANK *                  0.60    0.61   0.53   0.67   0.58   0.60   0.73   0.60   0.61   0.54
BANK meets all my requirements for a bank           0.59    0.59   0.55   0.66   0.63   0.57   0.65   0.58   0.60   0.57
Last year I had a pleasant relationship with BANK   0.60    0.60   0.58   0.67   0.62   0.59   0.70   0.58   0.64   0.54
BANK has met my expectations                        0.66    0.65   0.63   0.73   0.66   0.65   0.76   0.67   0.65   0.63
I have regretted my choice for BANK *               0.61    0.60   0.60   0.65   0.63   0.60   0.76   0.64   0.62   0.54
Last year I had some problems with BANK *           0.50    0.51   0.45   0.61   0.58   0.46   0.67   0.50   0.51   0.48
H                                                   0.59    0.59   0.56   0.65   0.62   0.58   0.72   0.60   0.60   0.54
Rho                                                 0.91    0.91   0.90   0.93   0.92   0.90   0.94   0.91   0.91   0.89
* = scored reversely
Table 10: Customer Satisfaction Scores in the Complete Dataset (N = 1689)
        Customer segment          Gender                      Age group                 Total
        T        S       D        Female   Male    Unknown    18-39    40-59   60+
Mean    26.44    25.60   23.78    26.23    25.88   24.53      25.31    26.14   26.39    25.96
F       16.35                     1.99                        4.67
p       < 0.001                   0.13                        < 0.01
The psychometric properties of the scale were slightly improved if item Q3d was
removed from the scale. The 8-item scale yielded a total-scale scalability coefficient H equal
to 0.59 without significant violations of the assumption of monotonicity, a result that was also
found when the assumption was tested on the basis of small rest-score groups (i.e., Minsize =
84). However, it is doubtful whether the 8-item scale yielded better measurements of
satisfaction, because each item in the scale is important for sufficient content validity (i.e.,
equal coverage of all aspects of customer satisfaction in the scale). We decided to proceed
with the 9-item scale because the violations of monotonicity in the 9-item scale were small,
and the 9-item scale had the best content validity.
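The rest-score check for monotonicity described above can be sketched as follows. This is a simplified, descriptive version under stated assumptions: item scores are integers starting at 0, adjacent rest-score values are merged from low to high until each group contains at least `minsize` respondents, and only decreases larger than `minvi` are reported. The significance tests that MSPwin5.0 applies to such decreases are omitted, its grouping rule may differ in detail, and the function names are hypothetical.

```python
def rest_score_groups(rest_scores, minsize):
    """Merge adjacent rest-score values until each group has >= minsize members."""
    counts = {}
    for r in rest_scores:
        counts[r] = counts.get(r, 0) + 1
    groups, current, size = [], [], 0
    for value in sorted(counts):
        current.append(value)
        size += counts[value]
        if size >= minsize:
            groups.append(current)
            current, size = [], 0
    if current:  # leftover high values join the last complete group
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return groups


def monotonicity_violations(data, item, minsize, minvi=0.03):
    """List decreases larger than minvi in the estimated ISRFs of one item.

    Each violation is a tuple (step, lower_group, higher_group, decrease).
    """
    # Rest score: total score on all other items.
    rest = [sum(row) - row[item] for row in data]
    groups = rest_score_groups(rest, minsize)
    group_of = {value: g for g, values in enumerate(groups) for value in values}
    members = [[] for _ in groups]
    for row, r in zip(data, rest):
        members[group_of[r]].append(row[item])
    violations = []
    max_score = max(row[item] for row in data)
    for step in range(1, max_score + 1):
        # Estimated ISRF: proportion scoring at least `step` in each group.
        props = [sum(s >= step for s in m) / len(m) for m in members]
        for lo in range(len(props)):
            for hi in range(lo + 1, len(props)):
                decrease = props[lo] - props[hi]
                if decrease > minvi:
                    violations.append((step, lo, hi, decrease))
    return violations
```

Lowering `minsize` yields more and smaller rest-score groups, which makes the check more sensitive to local decreases but also to sampling error. That is consistent with violations appearing only for Minsize = 84 and not for Minsize = 168.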
Figure 1: Item step response functions of item Q3d: There are good reasons to leave BANK
Figure 2: Item step response functions of item Q4c: I have regretted my choice for BANK
The customer satisfaction scale-score distribution is presented in Figure 3. It may be
noted that the distribution of scale scores was significantly skewed to the left (p < 0.001), and
that there were outliers in the skew tail. The negative skewness is a common result in
customer satisfaction measurements (Peterson & Wilson, 1992). It is unknown whether the
outliers were caused by extreme dissatisfaction of the corresponding participants with the
company or by stylistic responding. Stylistic responding is investigated in Chapter 8.
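As an illustration of the skewness figures reported here, the following sketch computes the moment-based skewness coefficient and a large-sample z statistic. Both choices are assumptions: the text does not state which skewness estimator or significance test was used, and the standard error sqrt(6/N) is only a common large-sample approximation.

```python
import math


def skewness(scores):
    """Moment-based skewness: third central moment over variance^1.5."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n
    m3 = sum((s - mean) ** 3 for s in scores) / n
    return m3 / m2 ** 1.5


def skewness_z(skew, n):
    """Approximate z statistic, using sqrt(6/n) as the standard error."""
    return skew / math.sqrt(6.0 / n)
```

With the values reported for the complete dataset (skewness = -0.85, N = 1689), `skewness_z(-0.85, 1689)` is far beyond the two-sided critical value of about 3.29 for p = 0.001, which agrees with the reported significance level.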
The Mokken scale analyses using the grouping variables customer segment (valued Top
Customers, Standard Customers, and Development Customers; see Chapter 5), gender (valued
female, male, and missing), and age (valued 18 to 39 years, 40 years to 59 years, and 60 years
onwards; see Chapter 5) demonstrated that the nine items constituted a strong Mokken scale
(i.e., H > 0.5) in each subgroup (Table 9). The checks for item monotonicity did not yield
significant violations of the assumption of monotonicity, a result that was also found for
smaller rest-score groups (i.e., Minsize = 84). For this reason, it was concluded that the 9-item
scale may be used to measure customer satisfaction in different subgroups of the target
population.
Figure 3: Distribution of customer satisfaction scores in the complete dataset (N = 1689, mean = 25.96, SD = 5.57, and skewness = -0.85)
Table 10 shows that customer segments differed significantly with respect to scale
score (based on analysis of variance). This result is consistent with results from previous
satisfaction studies done by the company (e.g., Terpstra, 2005), and it suggests that the three
customer segments differ with respect to the average satisfaction with the company. The
result also supports the pursuit of proportional representation of customer segments in
descriptive studies of customer satisfaction. Furthermore, gender groups did not differ
significantly with respect to scale score (Table 10). Age groups differed significantly with
respect to scale score (Table 10). The latter result was unexpected, but because the magnitude
of the differences between the age groups was small, we considered it unimportant in the
context of the present study.
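The F statistics in Table 10 come from univariate analyses of variance. A minimal sketch of the one-way F ratio is given below; it is an illustration in plain Python and not the proc GLM procedure used in the study, and the function name is hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within
```

For three subgroups, as in each comparison of Table 10, the numerator has 2 degrees of freedom. With a sample of this size even small mean differences can reach significance, which is why the magnitude of the age-group differences is considered separately from their p-value.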
The analyses of the reduced dataset yielded results similar to those of the analyses of the
complete dataset. The confirmatory Mokken scale analyses (item selection method = Test)
yielded a scale with a total-scale scalability coefficient H equal to 0.60 and a reliability
coefficient rho equal to 0.91 (Table 11). The check for item monotonicity on the basis of the
default options (i.e., Minvi = 0.03 and Minsize = 165, which is 10 percent of the sample) did
not reveal violations of the assumption of monotonicity. The same result was found for the
check for item monotonicity with smaller rest-score groups (i.e., Minsize = 83). Thus, the MH
model fitted the data in the reduced dataset.
The Mokken scale analyses using the grouping variables customer segment, gender, and
age yielded a strong Mokken scale (i.e., H > 0.5) in each subgroup (Table 11). The checks for
item monotonicity on the basis of the default options (i.e., Minvi = 0.03 and Minsize = 165,
which is 10 percent of the sample) did not yield significant violations of the assumption of
monotonicity in subgroups. However, the check for item monotonicity on the basis of smaller
rest-score groups (i.e., Minsize = 83) yielded a significant violation of the assumption of
monotonicity for item Q4c (Table 4) in the age group of 60 years and older. This was due to a
decrease of the estimated ISRF for Q4c >= 3 (i.e., the proportion of responses Q4c >= 3
decreased from 1.00 in the middle rest-score group to 0.96 in the highest rest-score group).
Because the magnitude of the decrease was small, we considered it not disturbing and we
concluded that the scale score is useful for the measurement of customer satisfaction in
different subgroups of the target population.
The customer satisfaction scale-score distribution (Figure 4) was significantly skewed
to the left (p < 0.001). Furthermore, univariate analyses of variance demonstrated that the
customer segments and the age groups differed significantly with respect to scale score (Table
12). Gender groups did not differ significantly.
Figure 4: Distribution of customer satisfaction scores in the reduced dataset (N = 1650, mean = 26.04, SD = 5.50, and skewness = -0.84)
Table 11: Customer Satisfaction Scale’s Total-Scale Scalability Coefficients H, Item Scalability Coefficients Hi, and Reliability Coefficients Rho in the Reduced Dataset (N = 1650)
                                                    Total   Customer segment     Gender               Age
Label                                               group   T      S      D      F      M      U      18-39  40-59  60+
At BANK I feel at home                              0.57    0.58   0.52   0.60   0.59   0.56   0.65   0.57   0.58   0.53
I am satisfied with BANK                            0.65    0.64   0.64   0.69   0.66   0.64   0.79   0.67   0.65   0.61
There are good reasons to leave BANK *              0.60    0.61   0.57   0.62   0.60   0.60   0.74   0.59   0.61   0.59
I have mixed feelings about BANK *                  0.60    0.61   0.52   0.67   0.58   0.60   0.72   0.60   0.60   0.55
BANK meets all my requirements for a bank           0.59    0.59   0.55   0.66   0.63   0.57   0.64   0.58   0.59   0.60
Last year I had a pleasant relationship with BANK   0.60    0.60   0.57   0.66   0.62   0.60   0.69   0.59   0.63   0.55
BANK has met my expectations                        0.66    0.66   0.62   0.74   0.65   0.66   0.76   0.68   0.66   0.64
I have regretted my choice for BANK *               0.61    0.60   0.60   0.65   0.63   0.60   0.76   0.65   0.61   0.54
Last year I had some problems with BANK *           0.51    0.52   0.45   0.61   0.57   0.47   0.67   0.51   0.51   0.49
H                                                   0.60    0.60   0.56   0.65   0.61   0.59   0.72   0.60   0.60   0.56
Rho                                                 0.91    0.91   0.90   0.93   0.91   0.91   0.94   0.91   0.91   0.90
* = scored reversely
Table 12: Customer Satisfaction Scores in the Reduced Dataset (N = 1650)
        Customer segment          Gender                      Age group                 Total
        T        S       D        Female   Male    Unknown    18-39    40-59   60+
Mean    26.51    25.68   23.84    26.26    25.99   24.62      25.36    26.23   26.47    26.04
F       16.39                     1.71                        5.01
p       < 0.001                   0.18                        < 0.01
American Customer Satisfaction Index
The ACSI (Table 2 in Chapter 5) was used as the second operationalisation of customer
satisfaction. The empirical data were analysed by means of Mokken scale analyses. The
analyses were done in both the complete dataset and the reduced dataset. First, the
dimensionality of the item set was investigated using the confirmatory strategy (see Chapter
4). Second, the assumption of monotonicity was tested using the default check for item
monotonicity (see Chapter 4). Third, the scale scores and the scale-score statistics were
computed.
The analyses of the complete dataset demonstrated that the three ACSI items
constituted a strong Mokken scale (Table 13). The default check for item monotonicity (i.e.,
Minvi = 0.03, Minsize = 168) did not yield violations of the assumption of monotonicity.
Thus, the MH model fitted the data. The scale-score distribution is presented in Figure 5, and
shows negative skewness, outliers in the skew tail, peaks for the scale-scores 18 and 21, and a
drop for scale-score 22. Because our major concern was the measurement of customer
satisfaction on the basis of the nine-item scale (Table 1 in Chapter 5) and we expected that the
irregularities of the ACSI score distribution would not seriously hamper the tests of the
hypotheses (Section 4), we refrained from inquiries into the causes of the irregularities of the
ACSI score distribution.
The analyses of the reduced dataset (Table 13) yielded results similar to those of the analyses of
the complete dataset. Thus, the MH model also fitted the data in the reduced dataset. The
scale-score distribution is presented in Figure 6. The results in the reduced dataset were
similar to the results in the complete dataset.
Table 13: ACSI’s Total-Scale Scalability Coefficients H, Item Scalability Coefficients Hi, and Reliability Coefficients Rho in the Complete Dataset (CD) and the Reduced Dataset (RD)
Item                                                   CD (N = 1684)   RD (N = 1650)
How satisfied are you with BANK?                       0.84            0.86
To what extent does BANK meet your ideal of a bank?    0.81            0.82
To what extent has BANK met your expectations?         0.82            0.83
H                                                      0.82            0.83
Rho                                                    0.92            0.92
Figure 5: Distribution of ACSI scores in the complete dataset (N = 1684, mean = 19.20, SD = 3.80 and skewness = -1.08)
Figure 6: Distribution of ACSI scores in the reduced dataset (N = 1650, mean = 19.22, SD = 3.77, and skewness = -1.06)
Trust
The empirical data collected by means of the trust instrument (see Chapter 5, Table 3) were
analysed by means of Mokken scale analyses. First, the dimensionality of the item set was
investigated using the confirmatory research method (Chapter 4). Second, the assumption of
monotonicity was tested on the basis of the default check for item monotonicity (Chapter 4).
Third, the scale scores and the scale-score statistics were computed.
The analyses of the complete dataset demonstrated that the seven items for trust
constituted a Mokken scale (Table 14). The default check for item monotonicity (i.e., Minvi =
0.03, Minsize = 168) yielded no violations of the assumption of monotonicity. Thus, the MH
model fitted the data. The scale-score distribution is presented in Figure 7. The distribution
was significantly skewed to the left, had outliers in the skew tail, and a large peak for scale-
score 21. Because our major concern was the measurement of customer satisfaction and we
expected that the irregularities of the trust score distribution would not seriously hamper the
tests of the hypotheses (Section 4), we refrained from inquiries into the causes of the
irregularities in the trust score distribution.
The analyses of the reduced dataset yielded results similar to those of the analyses of the
complete dataset (Table 14). Thus, the MH model also fitted the data in the reduced dataset.
The scale-score distribution is presented in Figure 8. The results in the reduced dataset were
similar to the results in the complete dataset.
Table 14: Trust Scale’s Total-Scale Scalability Coefficients H, Item Scalability Coefficients Hi, and Reliability Coefficients Rho in the Complete Dataset (CD) and the Reduced Dataset (RD)
Item                                                           CD (N = 1689)   RD (N = 1650)
I can depend on BANK to treat me fairly                        0.66            0.66
I can depend on BANK to handle my banking affairs correctly    0.69            0.69
I can depend on BANK to keep its promises                      0.63            0.63
I sometimes doubt the competence of BANK *                     0.57            0.58
I sometimes doubt the good will of BANK *                      0.57            0.57
I can trust BANK                                               0.66            0.66
I can depend on BANK to serve me well                          0.63            0.63
H                                                              0.63            0.63
Rho                                                            0.91            0.91
* = scored reversely
Figure 7: Distribution of trust scores in the complete dataset (N = 1689, mean = 19.71, SD = 4.02, and skewness = -0.71)
Figure 8: Distribution of trust scores in the reduced dataset (N = 1650, mean = 19.75, SD = 3.97, and skewness = -0.73)
Quality
Quality was operationalised using the set of 24 items measuring judgements of attributes of
products and services provided by the retail bank (Table 7). First, two items (i.e., Q10d and
Q10f; Table 7) were excluded from the analyses because of the large percentages of missing
values on these items. Second, because many item scores were missing due to
inappropriateness of item content for several participants, factor analysis and Mokken scale
analysis were done based on listwise deletion (see Section 2). The number of available
participants for the analyses of the remaining 22 items was N = 599 in the complete dataset
and N = 591 in the reduced dataset.
Factor analysis (e.g., Gorsuch, 1983) was used to establish the dimensionality of the
data set for the 22 quality items. Exploratory factor analysis was done to identify the factor
structure of the dataset, and hierarchical factor analysis was done to investigate the relations
among the factors. The results of these analyses were used to construct scales for quality.
Next, Mokken scale analysis was done to explore the robustness of the results. The
analyses were repeated in the reduced dataset.
The exploratory factor analysis with squared multiple correlations used as prior
communality estimates yielded only eleven positive eigenvalues (this is the result of inserting
estimates of the communalities in the trace of the correlation matrix; see also Tabachnick &
Fidell, 2007, p. 631), and the first four factors explained almost 91 percent of the common
variance (Table 15). Because we expected a large number of factors, we decided to proceed
with all four factors in the hierarchical factor analysis. This decision was supported by the
simple structure (Gorsuch, 1983, pp. 176-179) of the non-orthogonally rotated (i.e., using
method promax) 4-factor solution, which was readily interpretable.
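The figure of almost 91 percent can be checked with a short calculation on the eleven positive PFA eigenvalues for the complete dataset reported in Table 15. The sketch assumes that the percentage of common variance explained is each eigenvalue divided by the sum of the positive eigenvalues, which reproduces the tabulated percentages; the function name is hypothetical.

```python
# PFA eigenvalues, complete dataset (Table 15).
eigenvalues = [8.44, 1.29, 0.93, 0.59, 0.40, 0.31, 0.22, 0.12, 0.08, 0.03, 0.01]


def pcve(values):
    """Percentage of common variance explained by each eigenvalue."""
    total = sum(values)
    return [100.0 * v / total for v in values]


percentages = pcve(eigenvalues)
first_four = sum(percentages[:4])  # share of the first four factors
print(round(first_four, 2))
```

The first factor alone accounts for about 68 percent of the common variance, which already points to a dominant general dimension in the quality items.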
The hierarchical factor analysis was done using an iterative procedure to estimate the
communalities, and using an oblique rotation method (i.e., method promax). The eigenvalues
are reported in Table 15, and the inter-factor correlations of the four oblique-rotated factors
were high (Table 17). The factor analysis of the correlation matrix of the oblique factors
yielded one higher-order factor. The higher-order factor reflected all quality items and
accounted for approximately 72 percent of the common variance in the items (Table 16). The
four orthogonalised lower-order factors reflected quality of contact handling, quality of
Internet facilities, quality of processes, and equity of costs and revenues, respectively, and
accounted for approximately 28 percent of the common variance in the items (Table 16).
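The decomposition into one higher-order factor and orthogonalised lower-order factors can be illustrated with the Schmid-Leiman transformation, a common way of orthogonalising a hierarchical solution. Whether the study used exactly this transformation is an assumption; the loadings in the usage example are hypothetical toy values, not the values of Table 16.

```python
import math


def schmid_leiman(pattern, second_order):
    """Orthogonalise a hierarchical factor solution with one higher-order factor.

    pattern:      items x factors matrix of first-order (oblique) loadings
    second_order: loading of each first-order factor on the higher-order factor
    Returns (higher_order_loadings, residualised_lower_order_loadings).
    """
    # Part of each first-order factor not explained by the higher-order factor.
    uniqueness = [math.sqrt(1.0 - g * g) for g in second_order]
    higher = [sum(p * g for p, g in zip(row, second_order)) for row in pattern]
    lower = [[p * u for p, u in zip(row, uniqueness)] for row in pattern]
    return higher, lower


def variance_shares(higher, lower):
    """Shares of common variance of the higher-order and lower-order factors."""
    ho = sum(h * h for h in higher)
    lo = sum(v * v for row in lower for v in row)
    return ho / (ho + lo), lo / (ho + lo)
```

For example, `schmid_leiman([[0.8, 0.0], [0.0, 0.6]], [0.6, 0.8])` assigns each toy item a loading of 0.48 on the higher-order factor and leaves residual loadings of 0.64 and 0.36 on the two lower-order factors. The stronger the correlations among the first-order factors, the larger the share of common variance that moves to the higher-order factor.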
Because the major part of the common variance was explained by the higher-order
factor, we had doubts about the dimensionality of the quality items and the interpretation of
the lower-order factors. These doubts were enhanced by exploratory Mokken scale analyses
(item selection method = Search normal, and lowerbound Hi = 0.3 were used), which yielded
a 20-item scale in the complete dataset and a 21-item scale in the reduced dataset (Table 18).
It seems that a general perception of the quality of the company affected the participants’
responses to all items regarding quality of attributes of products and services provided by the
company.
Based on these results, we suspected that a halo effect (Thorndike, 1920) had affected
the responses to the items reflecting quality. Wirtz and Bateson (1995; also Wirtz, 2000)
reported a similar result in studies into drivers of customer satisfaction. Because of the
suspected halo effect and the complications caused by the missing data on the items reflecting
quality, we decided to use, in the remainder of this study, the data collected by means of the
set of 16 items measuring the experience of problems with BANK in the preceding twelve months
(Table 4, Chapter 5).
Table 15: Eigenvalues (EV) and Percentages Common Variance Explained (PCVE) from Principal Factor Analyses (PFA) and Hierarchical Factor Analyses (HFA) on the Quality-Items
      Complete dataset (N = 599)           Reduced dataset (N = 591)
      PFA              HFA                 PFA              HFA
      EV     PCVE      EV     PCVE         EV     PCVE      EV     PCVE
1     8.44   67.955    8.46   67.196       8.30   67.425    8.32   66.667
2     1.29   10.386    1.35   10.723       1.27   10.317    1.33   10.657
3     0.93    7.488    0.98    7.784       0.95    7.717    1.00    8.013
4     0.59    4.750    0.61    4.845       0.58    4.712    0.61    4.888
5     0.40    3.221    0.37    2.939       0.40    3.249    0.37    2.965
6     0.31    2.496    0.30    2.383       0.31    2.518    0.30    2.404
7     0.22    1.771    0.21    1.668       0.23    1.868    0.22    1.763
8     0.12    0.966    0.13    1.033       0.13    1.056    0.13    1.042
9     0.08    0.644    0.09    0.715       0.09    0.731    0.10    0.801
10    0.03    0.242    0.06    0.477       0.04    0.325    0.06    0.481
11    0.01    0.081    0.03    0.238       0.01    0.081    0.04    0.321
Table 16: Factor Pattern Matrices of the Orthogonalised Hierarchical Factor Analysis Solution on the Quality Items, in the Complete Dataset and the Reduced Dataset
                                      Complete Dataset (N = 599)            Reduced Dataset (N = 591)
                                      HO     LO1    LO2    LO3    LO4       HO     LO1    LO2    LO3    LO4
Correct execution of orders            0.63  -0.03  -0.02   0.42  -0.04      0.63  -0.02  -0.02   0.43  -0.04
Speed of money transfers               0.49  -0.02   0.00   0.26   0.09      0.48  -0.02  -0.01   0.26   0.10
Speed of service delivery              0.70   0.03   0.02   0.32   0.02      0.68   0.03   0.02   0.33   0.03
Adherence to promises                  0.70   0.10   0.09   0.17   0.00      0.69   0.10   0.09   0.17  -0.01
Correct execution of banking matters   0.72   0.03   0.04   0.33  -0.03      0.71   0.02   0.04   0.34  -0.03
Distribution of bank statements        0.38   0.00   0.04   0.06   0.19      0.38  -0.01   0.04   0.06   0.20
Costs of accounts of the company       0.49  -0.01  -0.03   0.05   0.66      0.50  -0.01  -0.03   0.05   0.65
Convenience of products and services   0.70   0.03   0.17   0.09  -0.02      0.69   0.02   0.17   0.09  -0.02
Clarity of information provided        0.73   0.00   0.24  -0.02   0.05      0.72  -0.01   0.24  -0.03   0.04
Sufficiency of information provided    0.73   0.00   0.21   0.01   0.08      0.72  -0.02   0.22   0.00   0.08
Costs of services of the company       0.51   0.03  -0.01   0.00   0.66      0.51   0.03  -0.01  -0.01   0.67
Interest rates of the company          0.38  -0.03   0.06  -0.04   0.43      0.40  -0.03   0.06  -0.04   0.43
Service by telephone                   0.64   0.31   0.06   0.03  -0.01      0.63   0.32   0.06   0.02  -0.02
Service by the Internet                0.68  -0.05   0.27  -0.03  -0.02      0.67  -0.04   0.25  -0.01  -0.03
Service by bank offices                0.40   0.17   0.06  -0.04   0.08      0.42   0.17   0.07  -0.04   0.09
Service by mail correspondence         0.64   0.15   0.13   0.00   0.07      0.63   0.16   0.12   0.00   0.07
Accessibility of the company           0.67   0.23   0.16  -0.04  -0.02      0.66   0.23   0.15  -0.05  -0.03
Facilities for Internet banking        0.64  -0.06   0.20   0.07  -0.05      0.63  -0.05   0.19   0.09  -0.04
Friendliness of employees              0.52   0.53  -0.03  -0.05   0.00      0.51   0.52  -0.03  -0.04   0.01
Capability of employees                0.61   0.47  -0.01   0.01   0.01      0.61   0.46  -0.01   0.02   0.01
Reliability of employees               0.63   0.53  -0.02   0.02  -0.02      0.62   0.52  -0.02   0.02  -0.02
Responsiveness of the company          0.67   0.51   0.00   0.03  -0.02      0.66   0.52  -0.01   0.03  -0.02
HO = higher order factor, LO1 = first lower order factor, LO2 = second lower order factor, LO3 = third lower order factor, and LO4 = fourth lower order factor.
Table 17: Correlations Between the Four Factors Representing Quality (Upper Half = Complete Dataset, Lower Half = Reduced Dataset)
          Factor1   Factor2   Factor3   Factor4
Factor1             0.68      0.63      0.41
Factor2   0.67                0.73      0.53
Factor3   0.62      0.72                0.46
Factor4   0.42      0.54      0.44
Table 18: Quality Scale’s Total-Scale Scalability Coefficients H, Item Scalability Coefficients Hi, and Reliability Coefficients Rho in the Complete Dataset and the Reduced Dataset
Item                                    Complete dataset   Reduced dataset
Correct execution of orders             0.47               0.45
Speed of money transfers                0.35               0.34
Speed of service delivery               0.49               0.47
Adherence to promises                   0.49               0.47
Correct execution of banking matters    0.53               0.51
Distribution of bank statements         *                  *
Costs of accounts of the company        0.38               0.40
Convenience of products and services    0.49               0.47
Clarity of information provided         0.50               0.48
Sufficiency of information provided     0.50               0.49
Costs of services of the company        0.40               0.40
Interest rates of the company           *                  0.30
Service by telephone                    0.48               0.46
Service by the internet                 0.45               0.44
Service by bank offices                 0.32               0.32
Service by mail correspondence          0.49               0.47
Accessibility of the company            0.49               0.46
Facilities for internet banking         0.44               0.42
Friendliness of employees               0.43               0.41
Capability of employees                 0.47               0.46
Reliability of employees                0.52               0.50
Responsiveness of the company           0.52               0.50
H                                       0.46               0.43
Rho                                     0.93               0.93
* = excluded from the scale because item scalability coefficient Hi < 0.3
The distribution of the number of problems with BANK in the preceding twelve months
is presented in Table 19. In both the complete dataset and the reduced dataset, 57 percent of
the participants mentioned the incidence of at least one problem with BANK in the preceding
twelve months.
Exploratory Mokken scale analyses (item selection method = Search normal, and
lowerbound Hi = 0.3 were used) yielded five scales of two items each, and six items that were
non-scalable. This result indicates that the responses to the items were not the result of a
unidimensional trait such as a general perception of the quality of the company. This result is
consistent with the conception of quality as a multidimensional construct.
In the remainder of this study, quality was re-defined as absence of problems. This
definition of quality is in line with the conception of quality as absence of failures (e.g.,
Garvin, 1983; Kackar, 1989, p. 6; Woodall, 2001; see also Chapter 3). Because the experience
of a problem is counter-indicative of quality, the items reflecting experience of problems were
recoded into the opposite direction (Section 2). Quality was then operationalised as the total
score on the 16 recoded items regarding the incidence of problems with BANK in the
preceding twelve months (Table 19). The quality score (i.e., total score) ranged from 0 (if the
participant had 16 problems with BANK in the preceding 12 months) to 16 (if the participant
had 0 problems with BANK in the preceding 12 months).
The distribution of the quality scores was negatively skewed (Table 19). This may
hamper the tests of hypothesis 5 (i.e., satisfaction scores are positively related to quality
scores) and hypothesis 9 (i.e., satisfaction scores are not contaminated by quality). Following
a suggestion of Tabachnick and Fidell (2007, pp. 87-89) to reflect negatively skewed
variables and transform the reflected variables, we applied a logarithmic transformation to the
variable number of problems. Let NP denote the number of problems, TNP transformed NP,
and ln the natural logarithm. Because the minimum value for NP was zero, we applied the
following transformation:
    TNP = ln(NP + 1).
Hypotheses 5 and 9 were tested once using the quality scores, and once using TNP.
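The two operationalisations described above can be sketched in a few lines. The function names and the `offset` parameter are assumptions added for illustration. The default `offset=1` follows the formula as stated in the text, whereas `offset=2` returns the rounded values listed in the TNP column of Table 19.

```python
import math

N_PROBLEM_ITEMS = 16  # number of problem items, as stated in the text


def quality_score(problem_flags):
    """Total score on the recoded problem items.

    problem_flags holds one value per item: 1 = problem experienced,
    0 = no problem. Recoding reverses each item, so that a higher total
    means fewer problems.
    """
    if len(problem_flags) != N_PROBLEM_ITEMS:
        raise ValueError("expected one flag per problem item")
    return sum(1 - flag for flag in problem_flags)


def transformed_np(n_problems, offset=1):
    """Logarithmic transformation of the number of problems (NP)."""
    return math.log(n_problems + offset)
```

A participant without problems obtains `quality_score([0] * 16) == 16`, and a participant with problems on all items obtains 0, matching the range given in the text.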
Table 19: Distribution of the Number of Problems (NP), Transformed Number of Problems (TNP), and Quality Scores in the Complete Dataset and the Reduced Dataset
NP      TNP       Quality Score   Percentage in Complete   Percentage in Reduced
                                  Dataset (N=1689)         Dataset (N=1650)
0       0.69      16              43                       43
1       1.10      15              25                       25
2       1.39      14              16                       16
3       1.61      13              8                        8
4       1.79      12              4                        4
5       1.95      11              2                        2
6       2.08      10              1                        1
>= 7    >=2.20    <=9             1                        1
Customer loyalty
Two items (i.e., Q14c and Q14e; Table 4) were deleted from the customer loyalty item set
(Table 6, Chapter 5) due to unfortunate item wording (Section 2). Mokken scale analyses
were done to investigate whether the MH model fitted the data from the remaining four items.
The analyses were done in both the complete dataset and the reduced dataset. First, the
dimensionality of the data was investigated using the confirmatory research method (Chapter
4). Second, the assumption of monotonicity was tested on the basis of the default check for
item monotonicity (Chapter 4). Third, the scale scores and the scale-score statistics were
computed.
The analyses of the complete dataset yielded a total-scale scalability coefficient H
equal to 0.54 and a reliability coefficient rho equal to 0.80. However, the default check for
item monotonicity (i.e., Minvi = 0.03 and Minsize = 168) revealed significant violations of the
assumption of monotonicity. The estimated ISRF for Q14d >= 3 (I consider switching from
BANK to another bank; Table 4) decreased at the end of the rest-score scale, and the estimated
ISRF for Q14d>= 4 decreased at the beginning of the scale (Figure 9). The checks for item
monotonicity with smaller rest-score groups (i.e., Minsize = 84) revealed that the estimated
ISRF for Q14d >= 3 also decreased at the beginning of the scale (Figure 10). Thus, the MH
model did not fit the data. The analyses in the reduced dataset corroborated the results found
in the complete dataset. Thus, the MH model also did not fit the data in the reduced dataset.
Figure 9: Item step response functions of item Q14d: I consider switching from BANK to another bank (Minsize = 168)
Figure 10: Item step response functions of item Q14d: I consider switching from BANK to another bank (Minsize=84)
Because the violations of monotonicity were substantial, we decided to repeat the
measurement analyses without item Q14d (Table 4). The analyses of the complete dataset
yielded a total-scale scalability coefficient H of 0.64 and a reliability coefficient rho of 0.82
(Table 20). The default item-check for monotonicity (i.e., Minvi = 0.03 and Minsize = 168)
did not reveal violations of the assumption of monotonicity. This result was also found with
smaller rest-score groups (i.e., Minsize = 84). Thus, the MH model fitted the data for the three
items in the complete dataset. The analyses of the reduced dataset yielded results similar to
those of the analyses of the complete dataset (Table 20). Thus, the MH model also fitted the three
items in the reduced dataset.
The content validity of the 3-item scale was considered sufficient because the three
items reflected the three aspects of customer loyalty (see Table 6 in Chapter 5). Because of
sufficient coverage of customer loyalty and because the 3-item scale met the requirements of
the MH-model, we decided to use the 3-item scale to measure customer loyalty in all
subsequent analyses. The corresponding scale-score distributions are presented in Figure 11
(complete dataset) and Figure 12 (reduced dataset). The scale-score distributions were skewed
to the left. We refrained from inquiries into the cause of the skewness, because we expected
that the skewness would not seriously hamper the test of the hypotheses (Section 4).
Table 20: Customer Loyalty Scale’s Total-Scale Scalability Coefficients H, Item Scalability Coefficients Hi, and Reliability Coefficients Rho in the Complete Dataset (CD) and the Reduced Dataset (RD)
Item                                                         CD (N = 1686)   RD (N = 1650)
If I need new financial products, BANK is my first choice    0.65            0.64
I have more sympathy for BANK than for other banks           0.63            0.63
For many years BANK has been my primary bank                 0.65            0.64
H                                                            0.64            0.64
Rho                                                          0.82            0.81
Figure 11: Distribution of customer loyalty scores in the complete dataset (N = 1686, mean = 7.87, SD = 2.42, and skewness = -0.62)
Figure 12: Distribution of customer loyalty scores in the reduced dataset (N = 1650, mean = 7.89, SD = 2.40, and skewness = -0.62)
4 Tests of the hypotheses
In this section, the tests of the hypotheses (see Chapter 4) are discussed. Successively, we
tested the hypotheses regarding explicit construct representation, concept-related irrelevant
variance, method-related irrelevant variance, and implicit construct representation. The
purpose of these tests was to collect empirical evidence with respect to the validity of the
measurement of customer satisfaction.
Explicit construct representation
Hypothesis 1 was: customer satisfaction is manifested in various expressions that are
mutually related but not sharply delineated. The hypothesis was tested by means of an
examination of the verbal explanations of satisfaction given by the participants in the
pre-tests. The pre-test data demonstrated that participants attached diverse meanings to the term
satisfaction (see Table 1). When asked to explain their satisfaction with the company in their
own words, participants answered in terms of (a) general affect, (b) friendliness, (c) past
performances, (d) qualities of the company, (e) absence of dissatisfaction, and (f) trust in the
company. With respect to the last result, some participants answered ‘I trust the company’,
‘The company will not deceive me, such as … did ’, or ‘I don’t think they deceive me’. These
answers indicate that overall satisfaction with a particular retail bank and trust of the bank are
strongly interrelated. The results support the hypothesis that satisfaction is manifested in
various expressions that are mutually related but not sharply delineated.
Hypothesis 2 was: the satisfaction items constitute a scale according to the MH model.
The hypothesis was supported by the results of the measurement analyses (see Section 3),
which demonstrated that the items constituted a strong MH model scale in the whole sample
as well as in all subgroups investigated.
Hypothesis 3 was: the satisfaction measure is positively correlated with other measures of
satisfaction. The hypothesis was tested by means of a correlation analysis between the
satisfaction measure and the ACSI. The correlation was positive and significant (p < 0.001) in both the
complete dataset and the reduced dataset (Table 21). Thus, the hypothesis was supported.
Table 21: Product-Moment Correlations (r) Between Satisfaction and the ACSI
Complete Dataset (N = 1681) Reduced Dataset (N = 1650)
r 95%-interval for ρ r 95%-interval for ρ
0.78* 0.76 ≤ ρ ≤ 0.80 0.79* 0.77 ≤ ρ ≤ 0.81
* = p < 0.001
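The source does not state how the 95% intervals for ρ were obtained. As a minimal sketch, assuming the common Fisher z-transformation with standard error 1/√(N − 3), the following Python function (the name `fisher_interval` is illustrative and does not come from the study) gives approximately 0.76 to 0.80 for r = 0.78 and N = 1681. This agrees with Table 21, but it does not establish that this method was actually used.

```python
import math

def fisher_interval(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a correlation.

    Assumption: Fisher z-transformation, standard error 1 / sqrt(n - 3).
    """
    z = math.atanh(r)                 # Fisher z of the sample correlation
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper

# Values taken from Table 21 (complete dataset).
low, high = fisher_interval(0.78, 1681)
print(f"{low:.2f} <= rho <= {high:.2f}")
```

The interval is computed on the z scale and transformed back, so it is not symmetric around r.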
Concept-related irrelevant variance
Following Oort (1996), the hypotheses regarding concept-related irrelevant variance were
tested using restricted factor analysis. Restricted factor analysis is confirmatory factor
analysis with particular restrictions on the loadings. In restricted factor analysis, a model is
specified such that the indicators of the trait load on the factor reflecting the trait, and not on
the factor reflecting the violator. Thus, the loadings of the indicators of the trait on the factor
reflecting the violator are restricted to the value 0. The loadings of the indicators reflecting the
violator on the factor reflecting the trait are also restricted to the value 0. Then, the fit of
the model is evaluated in order to determine the model’s tenability.
Oort (1996, pp. 46-49) suggested using the modification indices (MI’s) or adjusted
modification indices (AMI’s; to be discussed later) to detect item bias (i.e., whether particular
indicators reflecting the trait are biased with respect to a violator). The MI is a statistic that
indicates how much the fit of the model will improve if the factor loading of an indicator I of
trait T on violator V is set free to be estimated. The MI is approximately chi-squared
distributed with one degree of freedom (Bollen, 1989, p. 299). If the MI’s reveal that the fit of
the model will improve significantly by allowing a particular indicator I to load on violator V,
this means that indicator I is biased with respect to violator V, and that the measurement of
trait T is contaminated with respect to violator V. If the MI’s reveal that the fit of the model
will not improve significantly by allowing any of the indicators I to load on violator V,
this means that none of the indicators I is biased with respect to violator V, and that the
measurement of trait T is not contaminated with respect to violator V.
A larger number of significance tests and a larger sample size increase the likelihood of
finding significant MI’s and of obtaining false positives. In order to reduce the risk of false
positives, Oort (1996, p. 49) suggested using AMI’s to detect biased items. The AMI is a
statistic that reduces the power of the MI, and thus is useful for the detection of items that
are substantially biased. The AMI is defined as:
AMI = ((df – 1) / (χ² – MI)) * MI,
where χ² is the chi-squared value and df is the degrees of freedom under the null model (i.e.,
the restricted model). If the AMI exceeds a critical chi-squared value with one degree of
freedom, such as the critical chi-squared value for the 5 percent level of significance (i.e., χ² =
3.84), the item is judged to be biased.
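The AMI rule can be illustrated with a short Python sketch. The function names and the input values (MI = 10, χ² = 110, df = 35) are hypothetical and chosen only to show the arithmetic; they are not results from this study. The critical value 3.84 is the one quoted in the text.

```python
CRITICAL_CHI2_1DF_5PCT = 3.84  # critical value quoted in the text

def adjusted_modification_index(mi, chi2, df):
    """AMI = ((df - 1) / (chi2 - MI)) * MI, with chi2 and df from the restricted model."""
    return (df - 1) / (chi2 - mi) * mi

def is_biased(mi, chi2, df, critical=CRITICAL_CHI2_1DF_5PCT):
    """Judge an item biased when its AMI exceeds the critical chi-squared value."""
    return adjusted_modification_index(mi, chi2, df) > critical

# Hypothetical example: an MI of 10 would be significant on its own,
# but the adjustment pulls it below the critical value.
example_ami = adjusted_modification_index(mi=10.0, chi2=110.0, df=35)
print(round(example_ami, 2), is_biased(mi=10.0, chi2=110.0, df=35))
```

In this hypothetical case the unadjusted MI exceeds 3.84 while the AMI does not, which shows how the adjustment reduces the power of the test when the restricted model already fits poorly relative to its degrees of freedom.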
In this study, restricted factor analysis was performed using PROC CALIS (SAS/STAT).
Thus, a model was specified in which the nine items reflecting satisfaction loaded on a factor
reflecting satisfaction, and not on a factor reflecting the violator under investigation (Figure
13). The indicator of the violator loaded on the factor reflecting the violator and not on the
factor reflecting satisfaction. Because only one indicator loaded on the factor reflecting the
violator, no error term was specified for that indicator (Oort, 1996, p. 47). The AMI’s were
calculated by hand on the basis of the chi-squared value and the degrees of freedom under the
null model, and the MI’s of the nine items reflecting satisfaction. The fit of the model was
evaluated on the basis of the goodness of fit index (GFI), the normed fit index (NFI), and the
non-normed fit index (NNFI). As a rule of thumb, indices having a value of 0.90 or higher
indicate an acceptable fit (e.g., Bollen, 1989, pp. 269-281). The analyses were performed on
both the complete dataset (N = 1689) and the reduced dataset (N = 1650).
Figure 13. Graphical display of the factor model with nine indicators of customer satisfaction and one indicator of the violator. (Node labels in the diagram: indicators I1 to I10, error terms E1 to E9, factors S and V.)
Hypothesis 8 was: the satisfaction scores are not contaminated by trust. This
hypothesis was tested by means of a restricted factor analysis model using the nine items
reflecting satisfaction with the company, and the trust score. The factor model was specified
such that the nine items reflecting satisfaction loaded on the factor reflecting satisfaction, and
the trust score loaded on the factor reflecting the violator. The factor model fitted the data
well (i.e., GFI = 0.92; NFI = 0.93; NNFI = 0.92), and none of the AMI’s was significant
(Table 22; the complete dataset). Similar results were found in the reduced dataset (Table 22;
the reduced dataset). Thus, none of the items reflecting satisfaction was significantly biased
with respect to trust, and the hypothesis was supported.
Table 22: AMI’s of the Satisfaction Items in the Complete Dataset and the Reduced Dataset
Complete dataset (N=1689) Reduced dataset (N=1650)
Item AMI p-value AMI p-value
At BANK I feel at home 0.02 ns 0.00 ns
I am satisfied with BANK 0.66 ns 0.67 ns
There are good reasons to leave BANK * 0.18 ns 0.45 ns
I have mixed feelings about BANK * 0.11 ns 0.02 ns
BANK meets all my requirements for a bank 0.29 ns 0.65 ns
Last year I had a pleasant relationship with BANK 0.82 ns 1.52 ns
BANK has met my expectations 0.07 ns 0.05 ns
I have regretted my choice for BANK * 0.02 ns 0.10 ns
Last year I had some problems with BANK * 1.42 ns 1.42 ns
* = scored reversely
Hypothesis 9 was: the satisfaction scores are not contaminated by quality. This
hypothesis was tested by means of a restricted factor analysis model using the nine items
reflecting satisfaction with the company and the quality scores (because the analyses using
quality scores yielded results similar to those of the analyses using TNP (Section 3), we report the
results of the former analyses). The factor model was specified such that the nine items
reflecting satisfaction loaded on the factor reflecting satisfaction, and the quality score loaded
on the factor reflecting the violator. The factor model did not fit the data well (i.e., NNFI =
0.89, which is below the critical value of 0.90 for the NNFI), and the AMI of item Q4d (Last
year I had some problems with BANK; Table 4) was significant (Table 23; the complete
dataset). Similar results were found in the reduced dataset (Table 23; the reduced dataset).
Thus, item Q4d was significantly biased with respect to quality, and the hypothesis was not
supported.
A restricted factor analysis without item Q4d (i.e., the factor model was specified such
that the remaining eight items reflecting satisfaction loaded on the factor reflecting
satisfaction) yielded a good fit (i.e., GFI = 0.93, NFI = 0.93, and NNFI = 0.91; the complete
dataset), and none of the AMI’s was significant. Similar results were found in the reduced
dataset. Thus, the contamination of satisfaction scores by quality was due to item Q4d only.
Table 23: AMI’s of the Satisfaction Items in the Complete Dataset and the Reduced Dataset
Complete dataset (N=1689) Reduced dataset (N=1650)
Item AMI p-value AMI p-value
At BANK I feel at home 1.74 ns 1.81 ns
I am satisfied with BANK 0.59 ns 0.55 ns
There are good reasons to leave BANK * 0.02 ns 0.00 ns
I have mixed feelings about BANK * 0.46 ns 0.45 ns
BANK meets all my requirements for a bank 0.01 ns 0.01 ns
Last year I had a pleasant relationship with BANK 0.04 ns 0.00 ns
BANK has met my expectations 0.00 ns 0.00 ns
I have regretted my choice for BANK * 0.14 ns 0.18 ns
Last year I had some problems with BANK * 6.52 <0.05 6.87 <0.01
* = scored reversely
Hypothesis 10 was: the satisfaction scores are not contaminated by loyalty. This
hypothesis was tested by means of a restricted factor analysis model using the nine items
reflecting satisfaction with the company, and the loyalty score. The factor model was
specified such that the nine items reflecting satisfaction loaded on the factor reflecting
satisfaction, and the loyalty score loaded on the factor reflecting the violator. The factor
model did not fit the data well (i.e., NNFI = 0.88, which is below the critical value of 0.90 for
the NNFI), and the AMI of item Q3a (At BANK I feel at home; Table 4) was significant (Table
24; the complete dataset). Similar results were found in the reduced dataset (Table 24; the
reduced dataset). Thus, item Q3a was significantly biased with respect to customer loyalty,
and the hypothesis was not supported.
A restricted factor analysis without item Q3a (i.e., the factor model was specified such
that the remaining eight items reflecting satisfaction loaded on the factor reflecting
satisfaction) yielded a good fit (i.e., GFI = 0.93, NFI = 0.93, and NNFI = 0.91; the complete
dataset), and none of the AMI’s was significant. Similar results were found in the reduced
dataset. Thus, the contamination of satisfaction scores by customer loyalty was due to item
Q3a only.
Table 24: AMI’s of the Satisfaction Items in the Complete Dataset and the Reduced Dataset
Complete dataset (N=1686) Reduced dataset (N=1650)
Item AMI p-value AMI p-value
At BANK I feel at home 10.73 <0.01 12.00 <0.01
I am satisfied with BANK 0.00 ns 0.01 ns
There are good reasons to leave BANK * 0.00 ns 0.04 ns
I have mixed feelings about BANK * 0.34 ns 0.42 ns
BANK meets all my requirements for a bank 0.18 ns 0.24 ns
Last year I had a pleasant relationship with BANK 0.00 ns 0.01 ns
BANK has met my expectations 1.16 ns 1.17 ns
I have regretted my choice for BANK * 0.04 ns 0.02 ns
Last year I had some problems with BANK * 2.84 ns 3.49 ns
* = scored reversely
Hypothesis 11 was: the satisfaction scores are not contaminated by current customer
profitability. This hypothesis was tested by means of a restricted factor analysis model using
the nine items reflecting satisfaction with the company, and TCP2005 (i.e., the logarithmic
transformation of CP2005; Section 2). The factor model was specified such that the nine items
reflecting satisfaction loaded on the factor reflecting satisfaction, and TCP2005 loaded on the
factor reflecting the violator. The factor model fitted the data well (i.e., GFI = 0.92, NFI =
0.92, and NNFI = 0.90), and none of the AMI’s was significant (Table 25; the complete
dataset). Similar results were found in the reduced dataset (Table 25; the reduced dataset).
Thus, none of the items reflecting satisfaction was significantly biased with respect to TCP2005,
and the hypothesis was supported.
Table 25: AMI’s of the Satisfaction Items in the Complete Dataset and the Reduced Dataset
Complete dataset (N=1689) Reduced dataset (N=1650)
Item AMI p-value AMI p-value
At BANK I feel at home 0.71 ns 0.87 ns
I am satisfied with BANK 0.01 ns 0.02 ns
There are good reasons to leave BANK * 0.13 ns 0.12 ns
I have mixed feelings about BANK * 0.03 ns 0.01 ns
BANK meets all my requirements for a bank 0.00 ns 0.00 ns
Last year I had a pleasant relationship with BANK 0.30 ns 0.33 ns
BANK has met my expectations 0.00 ns 0.00 ns
I have regretted my choice for BANK * 0.01 ns 0.02 ns
Last year I had some problems with BANK * 1.07 ns 1.09 ns
* = scored reversely
Method-related irrelevant variance
Hypothesis 12 was: the satisfaction scores are not affected by the location of the satisfaction
items in the questionnaire. The hypothesis was tested by means of a t-test of the difference
between the average satisfaction score in versions 1 and 2 of the pilot study, and the
average satisfaction score in versions 3 and 4 of the pilot study (see Table 8 in Chapter 5;
note that the satisfaction score is the total score on the 9-item satisfaction scale). Because the
difference was not significant (Table 26), the hypothesis was supported.
Table 26: Differences of Satisfaction Scores in Groups of the Pilot Study
Groups Difference t-statistic p-value
Hypothesis 12 -0.70 -1.19 ns
Hypothesis 13 -0.46 -0.79 ns
Hypothesis 13 was: the satisfaction scores are not affected by the presentation of the
response categories of the satisfaction items. The hypothesis was tested by means of a t-test
of the difference between the average satisfaction score in versions 1 and 3 of the pilot
study, and the average satisfaction score in versions 2 and 4 of the pilot study (see Table 8
in Chapter 5; note that the satisfaction score is the total score on the 9-item satisfaction scale).
Because the difference was not significant (Table 26), the hypothesis was supported.
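The group comparisons used for hypotheses 12 and 13 can be sketched as follows. The source only says that a t-test was used, so this sketch assumes the pooled-variance (Student) form for two independent groups. The function name and the two small score lists are hypothetical and serve only to show the computation.

```python
import math
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance (assumption: equal variances).
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(group_a) - mean(group_b)) / se, df

# Hypothetical satisfaction totals for two questionnaire versions.
t_value, df = pooled_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t_value, df)
```

A negative t value, as in Table 26, simply means that the first group had the lower mean; whether the difference is significant depends on comparing the value with the t distribution for the given degrees of freedom.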
Hypothesis 14 was: the satisfaction scores are not affected by the midpoint response
style. The test of this hypothesis required the measurement of general midpoint responding
(e.g., Baumgartner & Steenkamp, 2001, 2006). Because the present study did not yield
data suitable for constructing a measure of general midpoint responding, the hypothesis was
tested in the second empirical study (Chapter 8).
Hypothesis 15 was: the satisfaction scores are not affected by the extreme response
style. The test of this hypothesis required the measurement of general extreme responding
(e.g., Baumgartner & Steenkamp, 2001, 2006). Because the present study did not yield
data suitable for constructing a measure of general extreme responding, the hypothesis was
tested in the second empirical study (Chapter 8).
Implicit construct representation
The hypotheses regarding implicit construct representation were tested last, because the
results of the tests of other hypotheses were used in the tests of the hypotheses regarding
implicit construct representation. First, the test of hypothesis 9 demonstrated that item Q4d
(Last year I had some problems with BANK; Table 4) was biased with respect to quality. The
use of this item in the satisfaction scale was expected to inflate the correlation between
customer satisfaction and quality. Therefore, we decided to exclude the item from the
satisfaction scale when testing the hypothesis regarding the relation between customer
satisfaction and quality. Second, the test of hypothesis 10 demonstrated that item Q3a (At
BANK I feel at home; Table 4) was biased with respect to customer loyalty. The use of this
item in the satisfaction scale was also expected to inflate the correlation between customer
satisfaction and customer loyalty. Therefore, we decided to exclude this item from the
satisfaction scale when testing the hypothesis regarding the relation between customer
satisfaction and customer loyalty. The hypotheses concerning the relation of satisfaction
scores to trust scores, quality scores, and loyalty scores, respectively, were tested by means of
correlation analyses.
Hypothesis 4 was: satisfaction scores are positively related to trust scores. This
hypothesis was tested using the total score on the customer satisfaction scale and the total
score on the trust scale. The product-moment correlation between customer satisfaction and
trust was positive and significant (p < 0.001) in both the complete dataset and the reduced
dataset (see Table 27). Thus, the hypothesis was supported.
Hypothesis 5 was: satisfaction scores are positively related to quality scores. In order
to test this hypothesis, item Q4d (Table 4) was excluded from the customer satisfaction scale.
Thus, the hypothesis was tested using the total score on the 8-item customer satisfaction scale
and the quality scores. (The analyses using quality scores yielded results similar to those of the
analyses using TNP (Section 3), except that the correlations were positive in the former
analyses and negative in the latter; we report the results of the former analyses.)
The product-moment correlation between customer satisfaction and
quality was positive and significant (p < 0.001) in both the complete dataset and the reduced
dataset (Table 27). This means that the fewer problems a participant had experienced with BANK, the
higher his or her satisfaction with BANK was. Thus, the hypothesis was supported. Because the
relations between the experience of singular problems and customer satisfaction may also be
of interest, these relations are reported as well (Table 28). These relations were
negative because the experience of a problem is counter-indicative of quality.
Hypothesis 6 was: satisfaction scores are positively related to loyalty scores. In order to
test this hypothesis, item Q3a (Table 4) was excluded from the customer satisfaction scale.
Thus, the hypothesis was tested using the total score on the 8-item customer satisfaction scale
and the total score on the customer loyalty scale. The product-moment correlation between
customer satisfaction and customer loyalty was positive and significant (p < 0.001) in both the
complete dataset and the reduced dataset (see Table 27). Thus, the hypothesis was supported.
Table 27: Product-Moment Correlations Between Customer Satisfaction and Other Concepts
Complete dataset (N = 1689) Reduced dataset (N = 1650)
r 95%-interval for ρ r 95%-interval for ρ
Trust 0.78* 0.76 ≤ ρ ≤ 0.80 0.79* 0.77 ≤ ρ ≤ 0.81
Quality 0.47* 0.43 ≤ ρ ≤ 0.51 0.48* 0.44 ≤ ρ ≤ 0.52
Loyalty 0.51* 0.47 ≤ ρ ≤ 0.55 0.51* 0.47 ≤ ρ ≤ 0.55
* = p < 0.001
Table 28: Relations Between the Incidence of Singular Problems and Customer Satisfaction
Complete dataset (N=1689) Reduced dataset (N=1650)
Item Proportion Polychoric correlation Proportion Polychoric correlation
Errors in the execution of your banking affairs 0.03 -0.33 0.03 -0.33
Errors in the execution of your orders 0.05 -0.28 0.05 -0.27
Insufficient information on your banking affairs 0.04 -0.44 0.05 -0.45
Ambiguous information on your banking affairs 0.06 -0.38 0.06 -0.38
Unfair costs of banking services 0.12 -0.40 0.12 -0.40
Slow service 0.06 -0.43 0.06 -0.45
Slow money transfers 0.16 -0.32 0.16 -0.33
Not keeping an appointment 0.03 -0.33 0.03 -0.32
Insufficient accessibility by telephone 0.05 -0.24 0.05 -0.24
Insufficient accessibility by internet 0.12 -0.24 0.12 -0.24
Insufficient accessibility of offices 0.09 -0.18 0.09 -0.18
Insufficient response to questions 0.06 -0.47 0.06 -0.47
Problems with debit cards 0.07 -0.20 0.07 -0.21
Problems with cash withdrawals 0.04 -0.09* 0.04 -0.10*
Problems with internet banking 0.14 -0.21 0.14 -0.22
Another problem 0.08 -0.29 0.08 -0.28
* = not significant at p <0.05
Hypothesis 7 was: satisfaction scores are positively related to future customer
profitability. In Chapter 3, the following model was suggested for the relation between
customer satisfaction (denoted CSt=0), other independent variables (denoted Xi), and future
customer profitability (denoted CPt>0):
\[ CP_{t>0} = \alpha + \beta \, CS_{t=0} + \sum_i \gamma_i X_i + \varepsilon . \]
Because customer satisfaction was measured in September 2005, CSt=0 was operationalised as
customer satisfaction in September 2005 (denoted CS2005). Because it was expected that the
influence of customer satisfaction on CP manifested after one year (Section 5 from Chapter
3), CPt>0 was operationalised as CP in September 2006 (denoted CP2006). Furthermore,
because former studies indicated that current CP accounts for the largest part of future CP
(Section 5 from Chapter 3), Xi was operationalised as CP in September 2005 (denoted CP2005).
The preliminary analyses demonstrated that the distributions of CP2005 and CP2006 were
positively skewed and included many outliers in the skew tail (Section 2). Therefore, CP2005
and CP2006 were logarithmically transformed. The logarithmic transformation of CP2005 was
denoted TCP2005 and the logarithmic transformation of CP2006 was denoted TCP2006 (Section
2). Hypothesis 7 was tested by means of a regression analysis of TCP2006 on TCP2005 and
CS2005. TCP2006’ is the predicted value of TCP2006, a is the intercept, b1 is the effect of TCP2005
on TCP2006, and b2 is the effect of CS2005 on TCP2006. The regression model was:
TCP2006’ = a + b1TCP2005 + b2CS2005. (Model 1)
We used hierarchical regression analyses (e.g., Cohen & Cohen, 1983, pp. 120-122; Hays,
1988, pp. 662-665; Tabachnick & Fidell, 2007, pp. 138-147) to compute the contribution of
each predictor to the explanation of TCP2006. Because we expected that TCP2005 accounted for
the largest part of TCP2006, TCP2005 was entered first in the analyses and CS2005 was entered
second. Statistic Fseq expresses the significance of sequential entries of predictor variables for
the explanation of the criterion variable. Let RM denote the restricted model without the
predictor variable of interest, ERM the error sum of squares under the restricted model, dfRM
the degrees of freedom under the restricted model, FM the full model including the predictor,
EFM the error sum of squares under the full model, and dfFM the degrees of freedom under the
full model. Then statistic Fseq is defined as (Maxwell & Delaney, 1990, pp. 73-74):
\[ F_{seq} = \frac{(E_{RM} - E_{FM}) / (df_{RM} - df_{FM})}{E_{FM} / df_{FM}}, \]
which theoretically follows an F distribution with dfRM – dfFM and dfFM degrees of freedom.
Following Cohen and Cohen (1983, p. 155), we also computed the effect size (denoted f2) for
sequential entries of predictor variables. Let \(R^2_{RM}\) be the variance explained under the
restricted model and \(R^2_{FM}\) the variance explained under the full model. Then effect size f2 is
defined as:
\[ f^2 = \frac{R^2_{FM} - R^2_{RM}}{1 - R^2_{FM}} . \]
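The sequential procedure can be sketched in plain Python. This is an illustration under stated assumptions, not the SAS analysis used in the study: the data are synthetic, the coefficients 0.9 and 0.05 are arbitrary, and all function and variable names are hypothetical. The sketch fits the restricted model (TCP2005 only) and the full model (TCP2005 and CS2005) by ordinary least squares, then computes Fseq and f2 as defined in the text.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def error_ss(y, predictors):
    """Return (error sum of squares, residual df) for an OLS fit with intercept."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    return sse, len(y) - k

def sequential_f(e_rm, df_rm, e_fm, df_fm):
    """Fseq = ((E_RM - E_FM) / (df_RM - df_FM)) / (E_FM / df_FM)."""
    return ((e_rm - e_fm) / (df_rm - df_fm)) / (e_fm / df_fm)

def effect_size_f2(r2_rm, r2_fm):
    """f2 = (R2_FM - R2_RM) / (1 - R2_FM)."""
    return (r2_fm - r2_rm) / (1.0 - r2_fm)

# Synthetic data standing in for TCP2005, CS2005 and TCP2006 (hypothetical values).
random.seed(0)
n = 500
tcp_2005 = [random.gauss(0, 1) for _ in range(n)]
cs_2005 = [random.gauss(0, 1) for _ in range(n)]
tcp_2006 = [0.9 * t + 0.05 * c + random.gauss(0, 0.4) for t, c in zip(tcp_2005, cs_2005)]

y_mean = sum(tcp_2006) / n
sst = sum((v - y_mean) ** 2 for v in tcp_2006)
e_rm, df_rm = error_ss(tcp_2006, [tcp_2005])           # restricted model
e_fm, df_fm = error_ss(tcp_2006, [tcp_2005, cs_2005])  # full model
r2_rm, r2_fm = 1 - e_rm / sst, 1 - e_fm / sst
f_seq = sequential_f(e_rm, df_rm, e_fm, df_fm)
f2 = effect_size_f2(r2_rm, r2_fm)
print(r2_rm, r2_fm, f_seq, f2)
```

When a single predictor is added, the two statistics are linked by Fseq = f2 × dfFM, because both are ratios of the reduction in error to the error that remains under the full model.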
The regression analyses were done using PROC REG (SAS/STAT). To assess the
robustness of the results, the full model was tested with and without weighting of participants
(Section 2), and with and without outliers (i.e., the complete dataset and the reduced dataset,
respectively). Thus, four regression analyses were done; the first analysis was in the complete
dataset without weighting of participants, the second analysis in the complete dataset with
weighting of participants, the third analysis in the reduced dataset without weighting of
participants, and the fourth analysis in the reduced dataset with weighting of participants.
Seven participants were excluded from the analyses because they had died since
September 2005.
The results from the regression analyses are presented in Table 29. The major statistics
reported are R2, which represents the cumulative proportion of the variance explained after
including a new predictor in the analysis; f2, which represents the effect size of each new
predictor entered in the analysis; Fseq, which represents the significance of each new predictor
for the explanation of TCP2006; and SRW, which represents the standardised regression weight
(e.g., Hays, 1988, pp. 623-625) of each predictor. Because we reported the standardised
solution, intercept a was equal to zero and not reported in Table 29.
Each analysis demonstrated a significant contribution of CS2005 to the explanation of
TCP2006, when TCP2005 was accounted for (Fseq in Table 29). Furthermore, each analysis
yielded a positive effect for CS2005 on TCP2006 (SRW in Table 29). The similarity of the results
from the analyses demonstrates their robustness. Thus, hypothesis 7 was supported by the
results of the analyses.
The percentage explained variance of TCP2006 was 84% or more (R2 in Table 29) across
analyses. This result was almost completely due to TCP2005, which also had a large effect size
(f2 in Table 29) in each analysis. Thus, current TCP was the main predictor of future TCP.
This result is in line with the results from former customer profitability analyses in the
financial services industry (e.g., Campbell & Frei, 2004; Terpstra, 2005, 2006b).
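As a rough consistency check, f2 and Fseq can be recomputed from the R2 values printed in Table 29 for the unweighted complete dataset (0.8612 after entering TCP2005 and 0.8620 after entering CS2005). The sketch below assumes that the first step is compared with an intercept-only model and that the degrees of freedom are N − 1, N − 2, and N − 3 for the successive models; the function names are illustrative. Because the printed R2 values are rounded to four decimals, the recomputed values only approximate the reported ones.

```python
def effect_size_f2(r2_rm, r2_fm):
    """f2 = (R2_FM - R2_RM) / (1 - R2_FM)."""
    return (r2_fm - r2_rm) / (1.0 - r2_fm)

def sequential_f_from_r2(r2_rm, r2_fm, df_rm, df_fm):
    """Fseq rewritten in terms of R2; equivalent to the error-sum-of-squares form."""
    return ((r2_fm - r2_rm) / (df_rm - df_fm)) / ((1.0 - r2_fm) / df_fm)

N = 1682                        # complete dataset, Table 29
R2_NULL = 0.0                   # assumption: intercept-only starting model
R2_TCP, R2_FULL = 0.8612, 0.8620  # rounded values as printed in Table 29

f2_tcp = effect_size_f2(R2_NULL, R2_TCP)
fseq_tcp = sequential_f_from_r2(R2_NULL, R2_TCP, N - 1, N - 2)
f2_cs = effect_size_f2(R2_TCP, R2_FULL)
fseq_cs = sequential_f_from_r2(R2_TCP, R2_FULL, N - 2, N - 3)

print(f"TCP2005: f2 = {f2_tcp:.4f}, Fseq = {fseq_tcp:.0f}")
print(f"CS2005:  f2 = {f2_cs:.4f}, Fseq = {fseq_cs:.2f}")
```

The recomputed f2 for TCP2005 matches the reported 6.2046, and the values for CS2005 come out slightly below the reported 0.0060 and 10.20, which is what rounding of R2 would produce for such a small increment.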
Table 29: Results From Hierarchical Regression Analyses Estimating Model 1 (Standardised Solution)
Complete dataset (N = 1682) Reduced dataset (N = 1644)
Unweighted dataset Weighted dataset Unweighted dataset Weighted dataset
ES statistics R2 f2 Fseq R2 f2 Fseq R2 f2 Fseq R2 f2 Fseq
TCP2005 0.8612 6.2046 10422 (³) 0.8464 5.5104 9255 (³) 0.8619 6.2411 10246 (³) 0.8466 5.5189 9061 (³)
CS2005 0.8620 0.0060 10.20 (²) 0.8482 0.0119 20.37 (³) 0.8626 0.0051 8.46 (²) 0.8482 0.0105 17.39 (³)
MF statistics SSM (dfM) SSE (dfE) F-value SSM (dfM) SSE (dfE) F-value SSM (dfM) SSE (dfE) F-value SSM (dfM) SSE (dfE) F-value
Full model 1449 (2) 232 (1679) 5245 (³) 1426 (2) 255 (1679) 4691 (³) 1417 (2) 226 (1641) 5150 (³) 1394 (2) 249 (1641) 4584 (³)
PV statistics SRW SE t-value SRW SE t-value SRW SE t-value SRW SE t-value
TCP2005 0.9262 0.0091 101.99 (³) 0.9152 0.0096 95.65 (³) 0.9267 0.0092 101.08 (³) 0.9155 0.0097 94.55 (³)
CS2005 0.0290 0.0091 3.19 (²) 0.0432 0.0096 4.51 (³) 0.0267 0.0092 2.91 (²) 0.0404 0.0097 4.17 (³)
The criterion variable was transformed customer profitability in September 2006. ES statistics is effect size statistics; R2 is proportion of variance explained after including the predictor; f2 is effect size; Fseq is sequential F-value; TCP2005 is transformed customer profitability in September 2005; CS2005 is customer satisfaction in September 2005; MF statistics is model fit statistics; SSM is model sum of squares; dfM is degrees of freedom used for estimating the model; SSE is error sum of squares; dfE is degrees of freedom left after estimating the model; full model is the model including all predictor variables; PV statistics is predictor variable statistics; SRW is standardised regression weight; SE is standard error of regression weight; (¹) = significant at p<0.05; (²) = significant at p<0.01; (³) = significant at p<0.001.
5 Relation between customer satisfaction and future CP with a time-lag of two years
The test of hypothesis 7 demonstrated that customer satisfaction was positively related to
future CP. It is unknown how a time lag larger than one year between measurements of
customer satisfaction and future CP affects the relation between customer satisfaction and
future CP. This warrants further research into the relation between customer satisfaction and
future CP. We investigated the relationship of customer satisfaction and future CP on
available data pertaining to a two-year time-lag.
Method
Because CP2005 and CP2007 were skewed and included many outliers, we applied a logarithmic
transformation to CP2005 and CP2007 (Section 2). The logarithmically transformed CP2005 was
denoted TCP2005 and the logarithmically transformed CP2007 was denoted TCP2007 (Section 2).
We regressed TCP2007 on TCP2005 and CS2005. TCP2007’ is the predicted value of TCP2007, a is
the intercept, b1 is the effect of TCP2005 on TCP2007, and b2 is the effect of CS2005 on TCP2007.
The regression model was:
TCP2007’ = a + b1TCP2005 + b2CS2005. (Model 2)
We used hierarchical regression analyses (e.g., Cohen & Cohen, 1983, pp. 120-122; Hays,
1988, pp. 662-665; Tabachnick & Fidell, 2007, pp. 138-147) to compute the contribution of
each predictor to the explanation of TCP2007. Because we expected that TCP2005 accounted for
the largest part of TCP2007, TCP2005 was entered first in the analyses and CS2005 was entered
second. In order to explore the robustness of the results, we estimated the model with and
without weighting of participants, and with and without outliers. Thus, we did four regression
analyses.
Results
The results are reported in Table 30. Because we reported the standardised solutions, intercept
a was equal to zero and not reported in Table 30. Each analysis demonstrated a significant
contribution of CS2005 to the explanation of TCP2007, when TCP2005 was accounted for.
Furthermore, each analysis yielded a positive effect for CS2005. The similarity of the results
from the analyses demonstrates their robustness. Thus, there is evidence of a relation between
customer satisfaction and future TCP, when future TCP is measured with a time lag of two
years.
Table 30: Results From Hierarchical Regression Analyses Estimating Model 2 (Standardised Solution)
Complete dataset (N = 1682) Reduced dataset (N = 1644)
Unweighted dataset Weighted dataset Unweighted dataset Weighted dataset
ES statistics R2 f2 Fseq R2 f2 Fseq R2 f2 Fseq R2 f2 Fseq
TCP2005 0.6456 1.8217 3060 (³) 0.6351 1.7405 2924 (³) 0.6449 1.8161 2982 (³) 0.6341 1.7330 2845 (³)
CS2005 0.6483 0.0077 13.00 (³) 0.6382 0.0086 14.51 (³) 0.6474 0.0069 11.42 (³) 0.6368 0.0075 12.35 (³)
MF statistics SSM (dfM) SSE (dfE) F-value SSM (dfM) SSE (dfE) F-value SSM (dfM) SSE (dfE) F-value SSM (dfM) SSE (dfE) F-value
Full model 1090 (2) 591 (1679) 1548 (³) 1073 (2) 608 (1679) 1481 (³) 1064 (2) 579 (1641) 1506 (³) 1046 (2) 597 (1641) 1438 (³)
PV statistics SRW SE t-value SRW SE t-value SRW SE t-value SRW SE t-value
TCP2005 0.8003 0.0145 55.20 (³) 0.7907 0.0147 53.53 (³) 0.8000 0.0147 54.47 (³) 0.7902 0.0150 52.77 (³)
CS2005 0.0523 0.0145 3.61 (³) 0.0563 0.0147 3.81 (³) 0.0496 0.0147 3.38 (³) 0.0526 0.0150 3.51 (³)
The criterion variable was transformed customer profitability in September 2007. ES statistics is effect size statistics; R2 is proportion of variance explained after including the predictor; f2 is effect size; Fseq is sequential F-value; TCP2005 is transformed customer profitability in September 2005; CS2005 is customer satisfaction in September 2005; MF statistics is model fit statistics; SSM is model sum of squares; dfM is degrees of freedom used for estimating the model; SSE is error sum of squares; dfE is degrees of freedom left after estimating the model; full model is the model including all predictor variables; PV statistics is predictor variable statistics; SRW is standardised regression weight; SE is standard error of standardised regression weight; (¹) = significant at p<0.05; (²) = significant at p<0.01; (³) = significant at p<0.001.
The computation of the predicted values for TCP2007 on the basis of the unstandardised
solutions (not shown here), and the exponential transformation of the predicted values for
TCP2007 to predicted values for CP2007, demonstrated that the impact of CS2005 on the
predicted value for CP2007 was dependent on the value for TCP2005. For customers having a
small value for TCP2005, the score for CS2005 had almost no impact on the predicted value for
CP2007, while for customers having a large value for TCP2005, the score for CS2005 had a
substantial impact on the predicted value for CP2007. This result may be due to using
logarithmically transformed values for CP2005 and CP2007 in the regression analyses, but we
consider it a plausible result which is is in agreement with the opinion in marketing that it is
important to keep profitable customers satisfied.
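The dependence described above follows from the back-transformation itself, as the following minimal sketch illustrates. It assumes a purely hypothetical log-linear model (predicted TCP2007 = B0 + B_TCP x TCP2005 + B_CS x CS2005, with CP obtained as the exponential of TCP). The coefficient values, the function names, and the CS2005 scores of 18 and 36 are invented for illustration; they are not the unstandardised estimates of the study, which are not shown, and the exact transformation used may differ.

```python
import math

# Hypothetical unstandardised coefficients (NOT the estimates from the study).
B0, B_TCP, B_CS = 0.5, 0.8, 0.01


def predicted_cp(tcp2005: float, cs2005: float) -> float:
    """Back-transform a predicted TCP2007 to the CP2007 scale via exp()."""
    predicted_tcp2007 = B0 + B_TCP * tcp2005 + B_CS * cs2005
    return math.exp(predicted_tcp2007)


def cs_impact(tcp2005: float, cs_low: float = 18, cs_high: float = 36) -> float:
    """Difference in predicted CP2007 between a high and a low CS2005 score."""
    return predicted_cp(tcp2005, cs_high) - predicted_cp(tcp2005, cs_low)


small = cs_impact(tcp2005=1.0)  # customer with a small value for TCP2005
large = cs_impact(tcp2005=6.0)  # customer with a large value for TCP2005
print(f"impact of CS2005 at small TCP2005: {small:.2f}")
print(f"impact of CS2005 at large TCP2005: {large:.2f}")
```

On the logarithmic scale the effect of CS2005 is the same for every customer, but after exponentiation it becomes multiplicative, so the absolute difference in predicted CP2007 grows with TCP2005.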
6 Discussion
The first empirical study demonstrated that the set of nine items reflecting customer
satisfaction constituted a strong (H ≥ 0.5) scale according to the MH model. Furthermore, the
study demonstrated several strengths and weaknesses of the measurement instrument for
customer satisfaction and the corresponding scale scores. A first strength is the explicit and
implicit definitions of customer satisfaction underlying the measurement instrument. All
aspects of customer satisfaction were evenly represented in the instrument, and this supports
the claim that the scale scores cover the meaning of customer satisfaction well. A second
strength is the fit of the measurement model. The tests of the model yielded no substantial
violations of the MH model, which supports the use of the scale scores to measure customer
satisfaction. Because the measurement instrument was composed of items that were indicative
of customer satisfaction and items that were counter-indicative of the construct, the fit of the
measurement model also confirms the conception of customer satisfaction as the opposite of
customer dissatisfaction on a bipolar dimension. A third strength is the fit of the measurement
model in the subgroups based on customer segment, age, and gender. This supports the
generalisability of the scale across subgroups in the target population. A fourth strength is that
the inclusion of items that are indicative and items that are counter-indicative of customer
satisfaction in the measurement instrument seems to limit the effects of acquiescent responding
on the scale scores (e.g., Baumgartner & Steenkamp, 2001, 2006; Van Herk, 2000, p. 55). A
fifth strength is that the scale is composed of a large number of items, which
limits the effect of a single biased item on the scale score. Lack of bias also supports the
confidence in the validity of the scale-score interpretations.
The major weakness of the scale scores was their divergent validity. The tests of the
hypotheses regarding concept-related irrelevant variance revealed that the customer
satisfaction scores were contaminated by quality and customer loyalty. This was due to the
items Q3a (At BANK I feel at home; Table 4) and Q4d (Last year I had some problems with
BANK; Table 4). For this reason, the scale had to be modified for research into the
connections of customer satisfaction with these constructs. A further point of concern was the
presence of outliers in the left tail of the distribution of the customer satisfaction scores. It
cannot be ruled out that these outliers were due to stylistic responding.
The analyses of the relations between customer satisfaction and future CP with a time lag
of two years yielded some important results. It was demonstrated that the influence of
customer satisfaction on customer profitability lasts for at least two years. This warrants
further research into the effect of customer satisfaction on cumulative customer
profitability. Furthermore, a comparison of the results of the analyses predicting future CP
with a time lag of one year (Table 29) and the analyses predicting future CP with a time lag of
two years (Table 30) reveals that the influence of current CP on future CP decreases when the
time lag between the measurements of current CP and future CP increases. This decay
implies that, in the long run, companies cannot take the future CP of existing customers for
granted. It also implies that it may be dangerous to estimate customer lifetime value solely
from current CP. Based on this research, the estimation of customer lifetime value should use
not only current CP but also, for example, customer satisfaction and customer loyalty.
Six additional remarks are in order. First, the items indicative of customer satisfaction
were all negatively skewed, and the items counter-indicative of customer satisfaction were all
positively skewed. This is in agreement with the results found in various satisfaction studies
in various domains (e.g., Oliver, 1997; Peterson & Wilson, 1992), and suggests that being
satisfied is more or less the default satisfaction state of most persons. Second, the correlation
between customer satisfaction and trust was found to be very high, and matched the
correlation between customer satisfaction and the score on the ACSI. This indicates that there
is a large overlap between the construct of customer satisfaction and the construct of trust in
the context of retail banking. Third, current customer profitability had a large effect on future
customer profitability. Therefore we recommend including current customer profitability as a
predictor in regression models of future customer profitability in the financial services
industry (see also Donkers, Verhoef, & De Jong, 2007). Fourth, the results of the analyses in
the complete dataset and the reduced dataset were nearly identical. Thus, the outliers on the
items reflecting customer satisfaction, trust, customer loyalty, and interest did not influence
the results of the data analyses substantially. Fifth, the effect sizes for customer satisfaction on
future CP were small. This may be due to the omission of important predictors, such as the
total financial means of a customer (Chapter 3), in the regression analyses (e.g., Hays, 1988,
p. 655). Therefore we suggest including measurements of the total financial means of
customers in future research into the influence of customer satisfaction on future CP. Sixth,
the generalisability of the results of the study into the relation between customer satisfaction
and future CP has to be investigated. The sample was drawn from the research panel of the
company, and it cannot be ruled out that persons who were willing to participate in the panel
have a different attitude towards banking than persons who were not willing to participate in
the panel, and that the attitude towards banking influences the relation between customer
satisfaction and future CP. Therefore, we advocate research into the generalisability of the
results of the present study to other groups and companies within the financial services
industry.
7 Conclusion
So far, the results of the first empirical study yielded much evidence for construct validity,
meaning that the results warrant the interpretation of the scale scores in terms of satisfaction
with the company. However, the validation study was not completed because two hypotheses
regarding the contamination of scale scores by method-related irrelevant variance were not
tested. These hypotheses were tested in the second empirical study (Chapter 8). Because the
test of these hypotheses yielded further information about the meaning of the scale scores in
the first empirical study, we prefer to present the final conclusions about the validity of
measurement after the presentation of the results of the second empirical study.
Chapter 7
Method of the second empirical study into customer satisfaction with
BANK
1 Introduction
The purpose of the second empirical study into customer satisfaction with BANK was to test
hypothesis 14 (i.e., the satisfaction scores are not affected by the midpoint response style) and
hypothesis 15 (i.e., the satisfaction scores are not affected by the extreme response style).
Testing these hypotheses required the measurement of (a) customer satisfaction, (b) general
midpoint responding, and (c) general extreme responding. We decided to operationalise
customer satisfaction on the basis of the 9-item measurement instrument (see Chapter 5),
because it was our purpose to combine the conclusions of the second empirical study with
those of the first empirical study. Furthermore, we decided to operationalise general midpoint
responding as a participant’s proportion of responses in the middle response category of
rating scales of items, and general extreme responding as a participant’s proportion of
responses in the extreme response categories of rating scales of items (Chapter 8).
Greenleaf (1992b), Van Herk (2000), and Baumgartner and Steenkamp (2001, 2006)
noted that measures of general midpoint responding and general extreme responding have to
be based on persons’ responses to many items with low inter-item correlations. This is in
agreement with Paulhus’ (1991, p. 49) remark that persons exhibiting consistent extreme
response behaviour across time and stimuli may be said to have an extreme response style.
For this reason, Greenleaf (1992b) and Van Herk (2000) operationalised extreme response
style as a participant’s proportion of responses in the extreme response categories of rating
scales of various items. Generalising Paulhus’ (1991, p. 49) remark to midpoint responding,
persons exhibiting a consistent midpoint response behaviour across time and stimuli may be
said to have a midpoint response style. The midpoint response style may be operationalised as
a participant’s proportion of responses in the middle response category of rating scales of
various items.
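The two operationalisations can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: it assumes 5-point items scored 0 to 4 (so that 2 is the middle category and 0 and 4 are the extreme categories), it represents a no answer response as None, and it leaves such responses out of the denominator. The function name style_proportions is hypothetical.

```python
from typing import Optional, Sequence, Tuple

MIDPOINT = 2       # middle category of a 0-4 rating scale (assumed scoring)
EXTREMES = (0, 4)  # both end categories of a 0-4 rating scale (assumed scoring)


def style_proportions(responses: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Return (midpoint proportion, extreme proportion) for one participant.

    None stands for a "no answer" response; it is excluded from the
    denominator, which is an assumption of this sketch.
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("participant answered none of the items")
    n = len(answered)
    midpoint = sum(r == MIDPOINT for r in answered) / n
    extreme = sum(r in EXTREMES for r in answered) / n
    return midpoint, extreme


# One participant's responses to eight items, one of them unanswered.
print(style_proportions([2, 2, 0, 4, None, 3, 2, 1]))
```

In this example, three of the seven answered items fall in the middle category and two fall in an extreme category.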
Dependence of the operationalisations of response styles on operationalisations of the
construct of interest would complicate research into the contamination of measurements of the
construct of interest by response styles (Oort, 1996, pp. 13-14). For example, assume that the
measurement of general extreme responding was done on the basis of items reflecting the
construct of interest. Then a high score on general extreme responding can be achieved by
answering positively to the items indicative and negatively to the items counter-indicative of
the construct of interest. In that instance, a high measurement value for general extreme
responding might reflect a high preference for extreme responding as well as a high value on
the construct of interest, and these two possibilities cannot be distinguished. To prevent
measurements of general midpoint responding and general extreme responding from partly
reflecting customer satisfaction, the items used for these measurements had to be unrelated to
customer satisfaction. For this reason, we decided to measure four constructs, which we
expected to be unrelated to customer satisfaction, and to use the items reflecting these
constructs to compose the measures for stylistic responding. The constructs were (a)
expectations with respect to personal spending power, (b) expectations with respect to the
Dutch economy, (c) involvement with banking matters, and (d) understanding of the Dutch
banking market. Because the response format of items may affect stylistic responding (Van
Herk, 2000, p. 59), we used identical response formats for all items used in the study.
The second empirical study was conducted in August 2007, which was approximately
two years after the first empirical study. This chapter discusses the method used in the second
empirical study. It encompasses an outline of the operationalisations of the constructs, the
questionnaire, the target population, the sample, the procedure, and the data.
2 Operationalisations
The design of the questionnaire, the format of the items, and the wording of the items were
based upon general principles concerning survey research as formulated by Sudman and
Bradburn (1982), Sheatsley (1983), Belson (1986), and Dillman et al. (1998). All items used
were 5-point rating scale items. Similar to the first empirical study, we included a no answer
option in the response options of the items, and we varied the ordering of items within the
groups of items reflecting a construct, across different administrations of the questionnaire.
The operationalisations of the five constructs were the following.
Customer satisfaction
Customer satisfaction was operationalised by means of nine Likert items with five ordered
response categories each, ranging from totally agree (which was scored 4) to totally disagree
(which was scored 0) (Chapter 5; Table 1). Also in the sample used in the second study, we
expected the nine items to constitute a scale according to the MH model.
Expectations with respect to personal spending power
The customers’ positive expectations with respect to personal spending power (EPSP) were
measured using two items reflecting this concept (Table 1). Each item had five ordered
response categories that ranged from totally agree (which was scored 4) to totally disagree
(which was scored 0). We expected the two items to be negatively correlated.
Table 1: Items Reflecting Expectations Regarding Personal Spending Power
Code   Item                                                       Score range
Q6a    I expect that my spending power will increase next year    0 - 4
Q6d*   In five years my spending power will be lower than today   0 - 4
* = item is counter-indicative of the concept
Expectations with respect to the Dutch economy
The customers’ positive expectations with respect to the Dutch economy (EDE) were
measured using two items reflecting this concept (Table 2). Each item had five ordered
response categories that ranged from totally agree (which was scored 4) to totally disagree
(which was scored 0). We expected the two items to be negatively correlated.
Table 2: Items Reflecting Expectations Regarding the Dutch Economy
Code   Item                                                         Score range
Q7b*   I expect that the Dutch economy will decrease next year      0 - 4
Q7c    In five years, the Dutch economy will be better than today   0 - 4
* = item is counter-indicative of the concept
Involvement with banking matters
The customers’ involvement with banking matters (labeled involvement) was measured using
four items reflecting this concept (Table 3). Each item had five ordered response categories
that ranged from totally agree (which was scored 4) to totally disagree (which was scored 0).
We expected the four items to be positively correlated after having been scored in the same
direction.
Table 3: Items Reflecting Involvement With Banking Matters
Code   Item                                                   Score range
Q8b    I find banking matters very important                  0 - 4
Q8c    Arranging banking matters properly makes life easier   0 - 4
Q8d*   I find banking matters boring                          0 - 4
Q8e*   Banking matters leave me cold                          0 - 4
* = item is counter-indicative of the concept
Understanding of the Dutch banking market
The customers’ understanding of the Dutch banking market (labeled understanding) was
measured using four items reflecting this concept (Table 4). Each item had five ordered
response categories that ranged from totally agree (which was scored 4) to totally disagree
(which was scored 0). We expected the four items to be positively correlated after having been
scored in the same direction.
Table 4: Items Reflecting Understanding of the Dutch Banking Market
Code   Item                                                              Score range
Q9a    I know the pros and cons of the retail banks in the Netherlands   0 - 4
Q9b*   I find it difficult to judge the quality of BANK                  0 - 4
Q9c*   I find it difficult to compare the quality of retail banks        0 - 4
Q9d    I know exactly what I may expect from BANK                        0 - 4
* = item is counter-indicative of the concept
3 The questionnaire
The questionnaire (Appendix 3; in Dutch) was composed of the items reflecting customer
satisfaction, EPSP, EDE, involvement, and understanding. In addition, some items were
included in the questionnaire for business purposes, and some other items were included to
optimise the design of the questionnaire. For example, several items regarding product
possession and contacts with the company were included in order to elicit the participant’s
memories of the company, before the measurement of satisfaction with the company started.
The questionnaire was pre-tested in a small sample (N = 3). The pre-tests demonstrated that it
took a participant approximately 15 minutes to complete the questionnaire, which we
considered acceptable.
4 Procedure
The survey was administered via the Internet to the members of the company’s research
panel. The comparability of the target population (i.e., mature retail customers of a Dutch
bank), the research panel, and the final sample is discussed in Section 6. Panel members were
invited by E-mail to participate in the survey. The questionnaire was made available on a website
of the market research agency that managed the survey, and it was accessible
from 24 August 2007 until 3 September 2007. Panel members gained access to the site by means
of a password and were identified on the basis of a customer-id. After a person completed the
questionnaire, the data were uploaded to the agency. The participants received a small
incentive (i.e., savings points worth 10 euro). This was the usual fee that the company paid
to panel members who responded to a survey of medium length.
5 Data
The research agency delivered a file containing the raw data, which were the coded responses of
the participants to the survey items (note that a no answer response was scored as a missing
value). In order to enrich the raw data, the file was merged with the marketing database. The
merging was executed on the basis of customer-id, and it was successful for all participants.
Subsequently, three variables were added to the file: (a) customer segment at the end of
September 2007, (b) gender, and (c) age at the end of September 2007.
6 Target population, panel, and sample
Similar to the first empirical study, the target population consisted of the mature retail
customers of a Dutch bank. The participants were registered by the company as the primary
owner of at least one banking product provided by the company.
A total of 2972 persons were invited to participate in the survey. They were mature
retail customers who, in August 2007, participated in the company’s research panel. The
panel members had agreed to participate in marketing research via the Internet. The
agreement stipulated that (a) the company is free to approach the person for marketing
research, (b) the person is free to participate in the research or to decline, (c) the company is
allowed to use the survey data for research purposes only, and (d) the company is not allowed
to distribute any personalised data to third parties. All panel members could be approached by
E-mail, and had a unique customer-id that was used for identification purposes.
The research panel differed in three ways from the target population. First, because the
company’s most valuable customers were overrepresented in the research panel, the panel
differed significantly (χ²(2) = 1244, p < 0.001) from the target population with respect to the
distribution of customer segment (Table 5). Second, the panel differed significantly (χ²(2) =
212, p < 0.001) from the target population with respect to the distribution of gender. Males
were overrepresented in the panel (see Table 5). This was partly due to the overrepresentation
of males among the segment Top Customers (i.e., the segment that was overrepresented in the
research panel), and partly to unknown causes. Third, the panel differed significantly (χ²(2) =
191, p < 0.001) from the target population with respect to the distribution of age group (Table
5).
The response rate in the study was approximately 41% (N = 1227). Table 5 shows the
distributions of customer segment, gender, and age group within the company, the panel, and
the research sample. The research sample differed significantly from the target population
with respect to customer segment (χ²(2) = 710, p < 0.001), gender (χ²(2) = 144, p < 0.001),
and age group (χ²(2) = 110, p < 0.001). Furthermore, the research sample differed
significantly from the remainder of the panel with respect to customer segment (χ²(2) = 30, p
< 0.001), gender (χ²(2) = 14, p < 0.001), and age group (χ²(2) = 22, p < 0.001). Thus,
respondents differed significantly from non-respondents with respect to customer segment,
gender, and age group. This was in line with the first empirical study (see Chapter 5).
Table 5: Distribution (Percentages) of Customer Segment, Gender and Age Group in the Study
                        Company   Panel   Sample
Customer segment
  Top                   34        64      70
  Standard              41        25      22
  Development           25        11      8
Gender
  Female                44        33      29
  Male                  52        65      69
  Unknown               4         2       2
Age group
  18 to 39 years        34        26      22
  40 to 59 years        37        49      50
  60 years and older    29        25      28
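The comparisons of the research sample with the target population can be approximated from the percentages in Table 5. The sketch below assumes that these comparisons were Pearson chi-square goodness-of-fit tests with the company distribution as the expected distribution; this reading is consistent with the two degrees of freedom reported for three categories, but the computations of the study are not shown. The function name chi_square_gof is hypothetical. Because the percentages in Table 5 are rounded, the results only approximate the reported values.

```python
def chi_square_gof(sample_pct, population_pct, n):
    """Pearson chi-square goodness-of-fit statistic from two percentage distributions.

    sample_pct:     observed percentages in the sample
    population_pct: expected percentages (target population)
    n:              sample size
    """
    if len(sample_pct) != len(population_pct):
        raise ValueError("distributions must have the same number of categories")
    statistic = 0.0
    for observed_pct, expected_pct in zip(sample_pct, population_pct):
        observed = n * observed_pct / 100
        expected = n * expected_pct / 100
        statistic += (observed - expected) ** 2 / expected
    return statistic


N_SAMPLE = 1227
# Rounded percentages taken from Table 5 (columns Company and Sample).
company = {"segment": [34, 41, 25], "gender": [44, 52, 4], "age": [34, 37, 29]}
sample = {"segment": [70, 22, 8], "gender": [29, 69, 2], "age": [22, 50, 28]}

for variable in company:
    value = chi_square_gof(sample[variable], company[variable], N_SAMPLE)
    print(f"{variable}: chi-square(2) is approximately {value:.0f}")
```

With the rounded percentages, the statistics come out at roughly 718, 143, and 108, close to the reported values of 710, 144, and 110.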
Chapter 8
Results of the second empirical study into customer satisfaction with BANK
1 Introduction
This chapter presents and discusses the results of the second empirical study in which
hypothesis 14 (i.e., the customer satisfaction scores are not affected by the midpoint response
style) and hypothesis 15 (i.e., the customer satisfaction scores are not affected by the extreme
response style) were investigated. First, we discuss preliminary analyses, the purpose of which
was to prepare the data for the subsequent analyses. Second, we discuss measurement
analyses, which aimed at checking whether the MH model fitted the items for customer
satisfaction and at constructing scales for stylistic responding. Third, we discuss the results of
the tests of hypotheses 14 and 15. Fourth, we discuss the generalisability of the results. Fifth,
based on both empirical studies, we discuss the conclusions regarding the validity of
measurement of customer satisfaction.
2 Preliminary data analyses
The dataset containing the raw data was converted into a SAS dataset, and the items that were
assumed to be counter-indicative of the constructs (see the description of the measurement
instruments in Chapter 7) were recoded in the opposite direction. Furthermore, we (a)
examined the distribution characteristics of the variables in the dataset, (b) explored the data
quality, (c) conducted missing data analyses, and (d) conducted outlier analyses.
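The recoding step can be illustrated as follows. The study used SAS for this step; the sketch below uses Python instead and is not the original code. It assumes item scores from 0 to 4, so that a reversed score equals 4 minus the original score, and it represents a missing value as None. The set of item codes contains the items marked as counter-indicative (or scored reversely) in the item tables; the function name recode is hypothetical.

```python
MAX_SCORE = 4  # items are scored 0 (totally disagree) to 4 (totally agree)

# Items marked with * in the item tables.
COUNTER_INDICATIVE = {
    "Q3d", "Q3e", "Q4c", "Q4d",  # customer satisfaction
    "Q6d",                       # EPSP
    "Q7b",                       # EDE
    "Q8d", "Q8e",                # involvement
    "Q9b", "Q9c",                # understanding
}


def recode(record):
    """Return a copy of one participant's record with counter-indicative items reversed.

    Missing values (None) stay missing.
    """
    recoded = {}
    for code, score in record.items():
        if score is not None and code in COUNTER_INDICATIVE:
            recoded[code] = MAX_SCORE - score
        else:
            recoded[code] = score
    return recoded


print(recode({"Q3b": 3, "Q3d": 1, "Q6d": 4, "Q9c": None}))
```

After recoding, a high score on every item points in the direction of the construct, which is what the correlation and scale analyses in this chapter presuppose.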
To examine the distribution characteristics of the variables, we computed the
histograms and descriptive statistics of all variables in the dataset. For this purpose, proc
univariate (SAS STAT) and proc means (SAS STAT) were used. The histograms (not shown
here) demonstrated that all variables were single-peaked, and that many were negatively
skewed. This finding was corroborated by descriptive statistics (Table 1).
The correlations between the items reflecting customer satisfaction with the retail bank
and the items reflecting expectations regarding personal spending power (EPSP) were
examined. For this purpose, proc corr (SAS STAT) was used. Following our expectations, (a)
the items reflecting customer satisfaction were highly correlated, (b) the items reflecting
EPSP were highly correlated, and (c) the items reflecting customer satisfaction and the items
reflecting EPSP were almost uncorrelated (Table 2). This result suggested that participants did
not respond randomly but instead responded to the items’ content.
Because it was required that the items reflecting EPSP, expectations regarding the
Dutch economy (EDE), involvement with banking matters (involvement), and understanding
of the Dutch banking market (understanding) were unrelated to customer satisfaction, the
correlations between the items reflecting these constructs and the items reflecting customer
satisfaction were computed. Table 3 shows that two items reflecting understanding (i.e., Q9b:
I find it difficult to judge the quality of BANK and Q9d: I know exactly what I may expect from
BANK; Table 1) correlated substantially with the items reflecting customer satisfaction. The
other items reflecting understanding, and the items reflecting EPSP, EDE, and involvement
were almost uncorrelated with the items reflecting customer satisfaction. This result
strengthened our confidence in the usefulness of the data for the purpose of the second
empirical study, which was the testing of hypotheses 14 and 15.
The items reflecting customer satisfaction, EPSP, EDE, involvement, and
understanding showed few missing data (i.e., 5% or less; see Table 1). Thus, following the
strategy explained in Chapter 6, item scores were imputed by means of method TW-E
(Bernaards & Sijtsma, 2000; Van Ginkel et al., 2007). As expected, the descriptive statistics
of the items before imputation were almost identical to the descriptive statistics of the items
after imputation. Some participants (N = 41) left more than 50 percent of the items reflecting
customer satisfaction, EPSP, EDE, involvement, or understanding unanswered (Table 4).
These participants were considered outliers, and we created indicator variables to mark them
in the dataset.
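The rule for marking these participants can be sketched as follows. This is an illustration rather than the original SAS code: it assumes that the rule is applied to the item scores before imputation, that a missing value is represented as None, and that "more than 50 percent" is evaluated per construct. The function name constructs_mostly_missing is hypothetical; the item codes are those of Table 1.

```python
CONSTRUCT_ITEMS = {
    "customer satisfaction": ["Q3a", "Q3b", "Q3d", "Q3e", "Q3g", "Q4a", "Q4b", "Q4c", "Q4d"],
    "EPSP": ["Q6a", "Q6d"],
    "EDE": ["Q7b", "Q7c"],
    "involvement": ["Q8b", "Q8c", "Q8d", "Q8e"],
    "understanding": ["Q9a", "Q9b", "Q9c", "Q9d"],
}


def constructs_mostly_missing(record):
    """Return the constructs for which more than half of the items are unanswered."""
    flagged = []
    for construct, items in CONSTRUCT_ITEMS.items():
        n_missing = sum(record.get(code) is None for code in items)
        if n_missing > len(items) / 2:
            flagged.append(construct)
    return flagged


# A participant who answered everything except the two EPSP items.
record = {code: 3 for items in CONSTRUCT_ITEMS.values() for code in items}
record["Q6a"] = None
record["Q6d"] = None
print(constructs_mostly_missing(record))
```

A participant is treated as an outlier when the returned list is not empty. For the two-item constructs, the rule implies that both items must be unanswered.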
To detect multivariate outliers, the leverage statistic (see Chapter 6) was computed by
means of a regression analysis using customer-id as the criterion variable, and 21 items
reflecting customer satisfaction, EPSP, EDE, involvement, and understanding as the predictor
variables (Tabachnick & Fidell, 2007, pp. 75-76, 111-112). The analysis yielded 60
participants with a significant (p < 0.001) leverage value. Visual inspection of the data
revealed that these participants tended to give extremely positive or extremely negative
responses to the items. Furthermore, the inspection demonstrated that the two participants
with the highest leverage value had alternated extremely positive and extremely negative
responses to different items having similar content.
It was suspected that the two participants with the highest leverage value had responded
inconsistently to the items. An indicator variable was created to mark them in the dataset. This
variable was joined with the variables marking the participants who left the majority of items
reflecting a particular construct unanswered (Table 4). The union of these variables marked
43 outliers in the dataset. In agreement with the first empirical study, the results from analyses
with outliers and analyses without outliers were examined. Henceforth, the dataset including
these outliers is labeled the complete dataset, and the dataset without these outliers is labeled
the reduced dataset.
Table 1: Descriptive Statistics of Items Reflecting Customer Satisfaction, EPSP, EDE, Involvement, and Understanding (Before Imputation; N = 1227)
Code   Label                                                             Nmiss   Mean   SD     Skewness
Customer satisfaction items
Q3a    At BANK I feel at home                                            1       2.88   0.81   -0.82
Q3b    I am satisfied with BANK                                          0       2.86   0.81   -1.12
Q3d*   There are good reasons to leave BANK                              7       2.93   1.05   -0.85
Q3e*   I have mixed feelings about BANK                                  6       2.73   1.07   -0.62
Q3g    BANK meets all my requirements for a bank                         3       2.62   0.95   -0.85
Q4a    Last year I had a pleasant relationship with BANK                 4       2.75   0.80   -0.74
Q4b    BANK has met my expectations                                      1       2.69   0.90   -1.02
Q4c*   I have regretted my choice for BANK                               8       3.21   0.84   -1.09
Q4d*   Last year I had some problems with BANK                           8       2.91   1.05   -0.89
EPSP items
Q6a    I expect that my spending power will increase next year           28      1.90   0.89   -0.07
Q6d*   In five years my spending power will be lower than today          27      2.17   1.00   -0.14
EDE items
Q7b*   I expect that the Dutch economy will decrease next year           22      2.26   0.83   -0.33
Q7c    In five years, the Dutch economy will be better than today        28      2.09   0.79   -0.16
Involvement items
Q8b    I find banking matters very important                             0       2.74   0.75   -0.75
Q8c    Arranging banking matters properly makes life easier              5       2.96   0.57   -1.04
Q8d*   I find banking matters boring                                     0       2.58   0.89   -0.43
Q8e*   Banking matters leave me cold                                     3       2.99   0.77   -0.79
Understanding items
Q9a    I know the pros and cons of the retail banks in the Netherlands   21      1.89   0.88   0.01
Q9b*   I find it difficult to judge the quality of BANK                  3       2.43   0.87   -0.44
Q9c*   I find it difficult to compare the quality of retail banks        8       1.71   0.95   0.39
Q9d    I know exactly what I may expect from BANK                        5       2.54   0.77   -0.75
* = scored reversely
Table 2: Correlations Between 2 Items Reflecting Customer Satisfaction and 2 Items Reflecting EPSP
Item                                                       Code   Q3a    Q3b    Q6a    Q6d*
At BANK I feel at home                                     Q3a           0.74   -0.04  0.03
I am satisfied with BANK                                   Q3b                  -0.05  0.02
I expect that my spending power will increase next year    Q6a                         0.62
In five years my spending power will be lower than today   Q6d*
* = scored reversely
Table 3: Correlations Between Items Reflecting Customer Satisfaction (Columns) and Items Reflecting Other Constructs (Rows)
       Q3a    Q3b    Q3d    Q3e    Q3g    Q4a    Q4b    Q4c    Q4d
Q6a    -0.04  -0.05  -0.07  -0.04  -0.03  0.03   -0.02  -0.03  0.00
Q6d    0.04   0.02   0.00   0.01   0.03   0.08   0.04   0.04   0.05
Q7b    0.06   0.05   0.08   0.09   0.03   0.07   0.06   0.05   0.05
Q7c    0.04   0.03   0.03   0.06   0.03   0.04   0.04   0.02   0.03
Q8b    0.08   0.01   0.03   -0.01  -0.02  0.04   -0.02  0.04   -0.03
Q8c    0.12   0.08   0.09   0.06   0.10   0.11   0.08   0.13   0.03
Q8d    0.11   0.05   0.06   0.09   0.01   0.07   0.04   0.08   0.04
Q8e    0.06   -0.03  0.02   0.02   -0.04  0.04   -0.03  0.05   0.00
Q9a    -0.08  -0.09  -0.10  -0.09  -0.13  -0.09  -0.12  -0.08  -0.10
Q9b    0.24   0.15   0.14   0.18   0.14   0.18   0.15   0.20   0.10
Q9c    -0.04  -0.03  -0.03  -0.01  -0.06  -0.06  -0.06  -0.02  -0.04
Q9d    0.44   0.44   0.35   0.40   0.42   0.43   0.45   0.40   0.35
For the legend, see Table 1
Table 4: Number of Participants Leaving More Than Half of the Items Unanswered
     Customer satisfaction   EPSP   EDE   Involvement   Understanding
N    0                       25     21    0             2
3 Mokken scale analysis of customer satisfaction
Customer satisfaction was operationalised using the measurement instrument presented in
Chapter 5 (Table 1). In the first empirical study (Chapter 6), it was demonstrated
that the nine items constituted a scale according to the MH model. We hypothesised that the
nine items also constituted a scale according to the MH model in the second empirical study.
To test this hypothesis, Mokken scale analysis was done using MSPwin5.0 (Molenaar
& Sijtsma, 2000). First, the dimensionality of the item set was investigated using the
confirmatory strategy (Chapter 4). Second, the assumption of monotonicity was investigated
(Chapter 4). Third, the scale-scores statistics were computed (Chapter 4). Fourth, the
scalability of the item set within distinct customer segments, gender groups, and age groups
(Chapter 7) was investigated. Fifth, univariate analyses of variance were done to test whether
subgroups differed significantly with respect to the scale scores. For this purpose, proc GLM
(SAS STAT) was used. Sixth, in order to examine the effect of outliers on the results, the
analyses were repeated with the reduced dataset (i.e., the dataset without outliers, see Section
2).
Confirmatory Mokken scale analyses (item selection method = Test) demonstrated that
the nine items constituted a strong Mokken scale with a total-scale scalability coefficient H
equal to 0.67 and a reliability coefficient rho equal to 0.93 (Table 5). The lowest item
scalability coefficient Hi was equal to 0.57, which is well above the default lower bound for Hi
used in exploratory analyses (i.e., lower bound Hi = 0.3). The check for item monotonicity on
the basis of the default options in MSPwin5.0 (i.e., Minvi = 0.03 and Minsize = 122, which is
10 percent of the sample) did not reveal violations of the assumption of monotonicity. This
means that the estimated ISRFs of all items were nondecreasing across the rest-score groups.
Thus, the MH model fitted the data well.
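For readers who want to see how a total-scale scalability coefficient is obtained, the following minimal sketch computes H as the ratio of the summed inter-item covariances to the summed maximum covariances that are attainable given the marginal score distributions of the items. This is the standard covariance-based definition of H; it is not the MSPwin5.0 implementation, it does not compute standard errors or the item coefficients Hi, and it assumes complete data with all items scored in the same direction. The function names are hypothetical.

```python
from itertools import combinations


def covariance(x, y):
    """Population covariance of two equally long score sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


def scalability_h(data):
    """Total-scale scalability coefficient H.

    data: list of rows (participants), each a list of item scores.
    """
    columns = list(zip(*data))
    observed = 0.0
    maximum = 0.0
    for x, y in combinations(columns, 2):
        observed += covariance(x, y)
        # Given the marginal distributions, the covariance is largest when
        # the sorted scores of both items are paired with each other.
        maximum += covariance(sorted(x), sorted(y))
    if maximum == 0:
        raise ValueError("H is undefined when the items have no variance")
    return observed / maximum


# Perfectly consistent (Guttman-like) response patterns on three items.
print(scalability_h([[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]))
```

Data without Guttman errors, as in the example, yield H = 1, and lower values indicate more deviations from a perfect ordering.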
The customer satisfaction scale-score distribution is presented in Figure 1. The
distribution was significantly skewed to the left (p < 0.001). Furthermore, the histogram
demonstrates peaks for the scale-scores 27, 31, and 36. The peak for the scale-score 27 was
mainly caused by participants who agreed with all items indicative of customer satisfaction,
and disagreed with all items counter-indicative of customer satisfaction (i.e., 66 percent of the
participants having scale-score 27 responded agree to the five items indicative of customer
satisfaction and disagree to the four items counter-indicative of customer satisfaction). The
peak for scale-score 31 was mainly caused by participants who agreed with all items
indicative of customer satisfaction, and strongly disagreed with all items counter-indicative of
customer satisfaction (i.e., 65 percent of the participants having scale-score 31 responded
agree to the five items indicative of customer satisfaction and totally disagree to the four
items counter-indicative of customer satisfaction). The peak for scale-score 36 was caused by
participants who strongly agreed with all items indicative of customer satisfaction, and
strongly disagreed with all items counter-indicative of customer satisfaction, because scale-
score 36 could only be achieved by responding totally agree to the five items indicative of
customer satisfaction and totally disagree to the four items counter-indicative of customer
satisfaction. It may be noted that the distribution of scale scores in the first empirical study did
not contain sharp peaks for the scale-scores 27, 31, and 36 (Chapter 6, Figure 3). This result is
further discussed in Section 5 of the present chapter.
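The arithmetic behind the three peaks can be checked with a short sketch. It assumes that the five response categories are scored 0 (totally disagree) to 4 (totally agree), that agree corresponds to score 3 and disagree to score 1, and that counter-indicative items are reversed by subtracting the score from 4. The category constants and the function name are hypothetical.

```python
# Assumed coding of the five response categories (the label of the middle
# category is a guess; the text only states that it has score 2).
TOTALLY_DISAGREE, DISAGREE, MIDDLE, AGREE, TOTALLY_AGREE = range(5)
MAX_ITEM_SCORE = 4


def scale_score(indicative, counter_indicative):
    """Sum of the indicative items and the reverse-scored counter-indicative items."""
    reversed_scores = [MAX_ITEM_SCORE - score for score in counter_indicative]
    return sum(indicative) + sum(reversed_scores)


# Five indicative items and four counter-indicative items, as in the scale.
peak_27 = scale_score([AGREE] * 5, [DISAGREE] * 4)                  # 5*3 + 4*3 = 27
peak_31 = scale_score([AGREE] * 5, [TOTALLY_DISAGREE] * 4)          # 5*3 + 4*4 = 31
peak_36 = scale_score([TOTALLY_AGREE] * 5, [TOTALLY_DISAGREE] * 4)  # 5*4 + 4*4 = 36
```

Under these assumptions, each of the three uniform response patterns described above lands exactly on one of the peaks.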
Mokken scale analyses using the grouping variables customer segment (valued Top
Customers, Standard Customers, and Development Customers; see Chapter 7), gender (valued
female, male, and missing), and age group (valued 18 to 39 years, 40 years to 59 years, and 60
years onwards; see Chapter 7) demonstrated that the nine items also constituted a strong
Mokken scale (i.e., H > 0.5) in each subgroup (Table 5). The checks for item monotonicity in
the subgroups yielded a significant violation of the assumption of monotonicity in the
subgroup Top Customers. This violation was due to a decrease of the estimated ISRF for Q4c
>= 1 (I have regretted my choice for BANK; Table 1). Because the magnitude of the violation
was small (i.e., the proportion of responses Q4c >= 1 decreased from 1.00 in the middle rest-
score group to 0.97 in the highest rest-score group), we considered it unimportant, and we
concluded that the MH model fitted the data in subgroups well enough.
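The logic of the monotonicity check can be sketched as follows. The sketch assumes that the proportions of responses at or above a given item score have already been computed per rest-score group, and it only flags decreases of at least Minvi. It does not form the rest-score groups (Minsize) and does not test significance, both of which MSPwin5.0 handles. The function name and the example values are hypothetical.

```python
def monotonicity_violations(proportions, minvi=0.03):
    """Find decreases of at least `minvi` between rest-score groups.

    proportions: P(item score >= x) per rest-score group, ordered from the
    lowest to the highest rest-score group.
    Returns (lower_group_index, higher_group_index, decrease) tuples.
    """
    violations = []
    for low in range(len(proportions)):
        for high in range(low + 1, len(proportions)):
            decrease = proportions[low] - proportions[high]
            if decrease >= minvi:
                violations.append((low, high, decrease))
    return violations


# Hypothetical example: the proportion drops between the second and third group.
example = monotonicity_violations([0.60, 0.85, 0.80])
```

A decrease such as the one from 1.00 to 0.97 reported for Q4c would be flagged in the same way, after which its size and significance have to be judged separately.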
Univariate analyses of variance demonstrated that the customer segments and age
groups differed significantly with respect to the customer satisfaction scale-scores (Table 6).
Furthermore, the histograms (not shown here) demonstrated peaks for the scale-scores 27, 31,
and 36 in all subgroups investigated. Thus, the peaks cannot be attributed to particular
customer segments, gender groups, or age groups.
Table 5: Customer Satisfaction Scale’s Total-Scale Scalability Coefficients H, Item Scalability Coefficients Hi, and Reliability Coefficients Rho in the Complete Dataset (N = 1227)
                                                   Total  Customer segment     Gender               Age group
                                                   group  T      S      D      F      M      U      18-39  40-59  60+
At BANK I feel at home                             0.67   0.66   0.68   0.70   0.67   0.67   0.67   0.72   0.66   0.64
I am satisfied with BANK                           0.75   0.74   0.73   0.80   0.76   0.74   0.79   0.76   0.74   0.75
There are good reasons to leave BANK *             0.66   0.65   0.67   0.69   0.65   0.66   0.78   0.67   0.65   0.66
I have mixed feelings about BANK *                 0.67   0.67   0.63   0.70   0.63   0.67   0.78   0.71   0.64   0.67
BANK meets all my requirements for a bank          0.70   0.70   0.65   0.71   0.66   0.70   0.77   0.72   0.68   0.70
Last year I had a pleasant relationship with BANK  0.68   0.68   0.66   0.71   0.68   0.67   0.77   0.70   0.68   0.66
BANK has met my expectations                       0.72   0.70   0.71   0.78   0.72   0.71   0.84   0.71   0.71   0.72
I have regretted my choice for BANK *              0.66   0.64   0.66   0.75   0.67   0.65   0.87   0.69   0.63   0.67
Last year I had some problems with BANK *          0.57   0.58   0.54   0.55   0.57   0.57   0.57   0.62   0.56   0.55
H                                                  0.67   0.67   0.66   0.71   0.67   0.67   0.76   0.70   0.66   0.67
Rho                                                0.93   0.93   0.93   0.95   0.93   0.93   0.96   0.95   0.93   0.93
* = scored reversely
T = Top Customers; S = Standard Customers; D = Development Customers; F = female; M = male; U = unknown
Table 6: Customer Satisfaction Scores in the Complete Dataset (N = 1227)
        Customer segment           Gender                     Age group                  Total
        T        S        D        Female   Male     Unknown  18-39    40-59    60+
Mean    25.91    25.51    23.24    25.98    25.46    25.04    24.69    25.54    26.39    25.60
F       7.34                       0.88                       4.97
p       <0.001                     0.42                       <0.01
Figure 1: Distribution of customer satisfaction scores in the complete dataset (N = 1227, mean = 25.60, SD = 6.66, and skewness = -0.86)
The analyses of the reduced dataset yielded results similar to those of the analyses of the
complete dataset. Confirmatory Mokken scale analyses (item selection method = Test) yielded
a strong Mokken scale with a total-scale scalability coefficient H equal to 0.67 and a
reliability coefficient rho equal to 0.93 (Table 7). The check for item monotonicity on the
basis of the default options (i.e., Minvi = 0.03 and Minsize = 122, which is 10 percent of the
sample) did not reveal violations of the assumption of monotonicity. Thus, the MH model
fitted the data in the reduced dataset well.
Mokken scale analyses using the grouping variables customer segment, gender, and age
group yielded a strong Mokken scale (i.e., H > 0.5) in each subgroup (Table 7). The checks
for item monotonicity on the basis of the default options (i.e., Minvi = 0.03 and Minsize =
122, which is 10 percent of the sample) yielded a significant violation of the assumption of
monotonicity for item Q4c (Table 1) in the segment Top Customers, but the magnitude of the
violation was small. Therefore, we considered it unimportant, and we concluded that the MH
model fitted the data in the subgroups well enough.
Figure 2 shows the customer satisfaction scale-score distribution. The distribution was
significantly skewed to the left (p < 0.001), and there were peaks for scale-scores 27, 31, and
36 (66 percent of the participants having scale-score 27 responded agree to the five items
indicative of customer satisfaction and disagree to the four items counter-indicative of
satisfaction, 66 percent of the participants having scale-score 31 responded agree to the five
items indicative of customer satisfaction and totally disagree to the four items counter-
indicative of customer satisfaction, and all participants having scale-score 36 responded
totally agree to the five items indicative of customer satisfaction and totally disagree to the
four items counter-indicative of customer satisfaction). Similar distributions of scale scores
were found in the customer segments, gender groups, and age groups. Univariate analyses of
variance demonstrated that the customer segments and the age groups differed significantly
with respect to the scale scores (Table 8).
Figure 2: Distribution of customer satisfaction scores in the reduced dataset (N = 1184, mean = 25.69, SD = 6.61, and skewness = -0.85)
Table 7: Customer Satisfaction Scale’s Total-Scale Scalability Coefficients H, Item Scalability Coefficients Hi, and Reliability Coefficients Rho in the Reduced Dataset (N = 1184)
                                                   Total  Customer segment     Gender               Age group
                                                   group  T      S      D      F      M      U      18-39  40-59  60+
At BANK I feel at home                             0.68   0.66   0.69   0.70   0.68   0.68   0.67   0.72   0.66   0.66
I am satisfied with BANK                           0.75   0.75   0.74   0.79   0.77   0.75   0.79   0.77   0.74   0.76
There are good reasons to leave BANK *             0.66   0.65   0.68   0.69   0.65   0.66   0.78   0.67   0.64   0.67
I have mixed feelings about BANK *                 0.67   0.67   0.64   0.70   0.64   0.67   0.77   0.71   0.63   0.68
BANK meets all my requirements for a bank          0.70   0.70   0.66   0.71   0.67   0.70   0.77   0.72   0.67   0.71
Last year I had a pleasant relationship with BANK  0.68   0.68   0.67   0.70   0.69   0.68   0.76   0.70   0.68   0.68
BANK has met my expectations                       0.72   0.71   0.72   0.78   0.73   0.72   0.85   0.72   0.71   0.74
I have regretted my choice for BANK *              0.66   0.64   0.67   0.75   0.67   0.65   0.89   0.69   0.63   0.68
Last year I had some problems with BANK *          0.57   0.58   0.56   0.55   0.58   0.57   0.56   0.62   0.56   0.56
H                                                  0.67   0.67   0.67   0.71   0.67   0.67   0.76   0.70   0.65   0.68
Rho                                                0.93   0.93   0.93   0.95   0.93   0.93   0.96   0.95   0.93   0.94
* = scored reversely
T = Top Customers; S = Standard Customers; D = Development Customers; F = female; M = male; U = unknown
Table 8: Customer Satisfaction Scores in the Reduced Dataset (N = 1184)
        Customer segment           Gender                     Age group                  Total
        T        S        D        Female   Male     Unknown  18-39    40-59    60+
Mean    26.05    25.47    23.27    25.99    25.57    25.38    24.59    25.71    26.46    25.69
F       8.05                       0.51                       5.82
p       <0.001                     0.60                       <0.01
4 Measures for stylistic responding
Preliminary analyses
Measures of general midpoint responding and general extreme responding were constructed
on the basis of items with (a) low inter-item correlations, and (b) low correlations with
customer satisfaction (see Chapter 7). We constructed these measures on the basis of four
constructs (i.e., EPSP, EDE, involvement, and understanding), which were hypothesised to be
unrelated to customer satisfaction. This hypothesis was tested by means of CFA (e.g., Bollen,
1989; Oort, 1996). A factor model was specified using the nine items reflecting customer
satisfaction, the two items reflecting EPSP, the two items reflecting EDE, the four items
reflecting involvement, and the four items reflecting understanding (Figure 3). The fit of the
model was evaluated on the basis of the goodness of fit index (GFI), the normed fit index
(NFI), the non-normed fit index (NNFI), and the AMI’s (Oort, 1996, p. 49; see also Chapter 6,
Section 4). Furthermore, the correlations between the factors were inspected.
CFA was done using proc calis (SAS STAT). Because indices having a value of 0.9 or higher
indicate an acceptable fit (Bollen, 1989, pp. 269-281), we required a value of 0.9 or higher for
each index. The goodness of fit indices demonstrated that the first factor model did not fit the
data well (Table 10). Furthermore, the AMI’s (Table 9) demonstrated that two items reflecting
understanding (i.e., Q9b: I find it difficult to judge the quality of BANK, and Q9d: I know
exactly what I may expect from BANK; Table 1) were significantly biased (i.e., p < 0.001)
with respect to customer satisfaction (i.e., participants with a high value on customer
satisfaction were more inclined to respond positively to these understanding-items (note that
item Q9b was scored reversely; Section 2) than participants with a low value on customer
satisfaction, even when understanding is controlled for). Because it was required that the
items used for measuring general stylistic responding did not reflect customer satisfaction
(Chapter 7), we decided not to use these items for the measurement of stylistic responding.
Because the first factor model did not fit the data, a second factor model was tested. The
second factor model was specified using the same items for customer satisfaction, EPSP,
EDE, and involvement, and the two remaining items reflecting understanding (i.e., Q9a and
Q9c; Table 1). The second factor model fitted the data well (Table 10; the second factor
model), and none of the AMI’s (not shown here) was significant. Furthermore, the absolute
correlations between the factors reflecting customer satisfaction, EPSP, EDE, involvement,
and understanding (Table 11) were considered sufficiently low for the purpose of the current
study. Therefore, we decided to use the items reflecting EPSP, EDE, and involvement, and
two items reflecting understanding (i.e., Q9a and Q9c; Table 1), for the construction of the
measures of general midpoint responding and general extreme responding.
Figure 3: Factor Model with Q3a through Q4d reflecting satisfaction, Q6a and Q6d reflecting EPSP, Q7b and Q7c reflecting EDE, Q8b through Q8e reflecting involvement, and Q9a through Q9d reflecting understanding
Table 9: The Two Largest AMI’s in the Complete Dataset and the Reduced Dataset
Complete Dataset (N = 1227)
                                                    Factor 1         Factor 2         Factor 3         Factor 4         Factor 5
                                                    Satisfaction     EPSP             EDE              Involvement      Understanding
                                                    AMI    p-value   AMI    p-value   AMI    p-value   AMI    p-value   AMI    p-value
I find it difficult to judge the quality of BANK *  9.38   <0.01     0.12   ns        0.30   ns        4.19   <0.05     -      -
I know exactly what I may expect from BANK          49.77  <0.001    0.02   ns        0.60   ns        0.39   ns        -      -
Reduced Dataset (N = 1184)
                                                    Factor 1         Factor 2         Factor 3         Factor 4         Factor 5
                                                    Satisfaction     EPSP             EDE              Involvement      Understanding
                                                    AMI    p-value   AMI    p-value   AMI    p-value   AMI    p-value   AMI    p-value
I find it difficult to judge the quality of BANK *  9.44   <0.01     0.10   ns        0.32   ns        4.24   <0.05     -      -
I know exactly what I may expect from BANK          51.10  <0.001    0.02   ns        0.63   ns        0.38   ns        -      -
* = scored reversely
Table 10: Goodness of Fit of the Factor Models for Customer Satisfaction, EPSP, EDE, Involvement, and Understanding
        First Factor Model                 Second Factor Model
        CD (N = 1227)   RD (N = 1184)      CD (N = 1227)   RD (N = 1184)
GFI     0.89            0.89               0.93            0.93
NFI     0.87            0.87               0.92            0.93
NNFI    0.86            0.87               0.92            0.93
CD is complete dataset; RD is reduced dataset.
Table 11: Inter-Factor Correlations in the Second Factor Model (Upper Triangle = Complete Dataset; Lower Triangle = Reduced Dataset)
                Satisfaction   EPSP    EDE     Involvement   Understanding
Satisfaction    -              -0.02   0.07    0.07          -0.12
EPSP            -0.02          -       0.48    0.06          0.11
EDE             0.08           0.49    -       0.09          0.05
Involvement     0.08           0.06    0.08    -             0.28
Understanding   -0.12          0.12    0.04    0.28          -
General midpoint responding
General midpoint responding was defined as the participant’s proportional use of the middle
response category (i.e., corresponding to score 2), which may vary between zero (if zero
responses were in the middle response category) and one (if all responses were in the middle
response category). To test the hypothesis that satisfaction scores were not affected by the
midpoint response style, a measure of general midpoint responding was constructed. For this
purpose, the two items reflecting EPSP, the two items reflecting EDE, the four items
reflecting involvement, and the two remaining items reflecting understanding (i.e., Q9a and
Q9c; Table 1) were used. Missing values were excluded from the operationalisation, because
they do not provide information about general midpoint responding. The scores on the
measure of general midpoint responding ranged from zero to one, with a mean equal to 0.29
(Table 12). The reliability (i.e., coefficient alpha) of the scores was 0.59, which is
rather low but perhaps high enough for research purposes.
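The operationalisation above can be sketched in a few lines of Python. The sketch assumes that responses are coded 0 to 4 and that a missing value is represented by None; the function and constant names are hypothetical.

```python
MIDPOINT = {2}     # middle response category
EXTREME = {0, 4}   # extreme response categories


def style_proportion(responses, categories):
    """Proportion of the non-missing responses that fall in `categories`.

    Missing responses (None) are excluded from numerator and denominator.
    Returns None when a participant has no valid responses at all.
    """
    valid = [r for r in responses if r is not None]
    if not valid:
        return None
    return sum(1 for r in valid if r in categories) / len(valid)


# Hypothetical participant: ten items, one of them missing.
participant = [2, 2, 0, 4, None, 1, 3, 2, 2, 1]
gmr = style_proportion(participant, MIDPOINT)  # 4 of 9 valid responses
```

The same function yields a measure of extreme responding when it is called with the categories 0 and 4 instead of the middle category.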
Midpoint responding to customer satisfaction items
To explore whether general midpoint responding was related to midpoint responding to
customer satisfaction items, a measure of midpoint responding to customer satisfaction items
was constructed. The measure of midpoint responding to customer satisfaction items was
constructed in the same way as the measure of general midpoint responding. However, for the present
measure the nine items reflecting customer satisfaction were used. The scores on the measure
of midpoint responding to customer satisfaction items ranged from zero to one, with a mean
of 0.17 (Table 12). The reliability (i.e., coefficient alpha) of the scores was 0.80.
General extreme responding
General extreme responding was defined as the participant’s proportional use of the extreme
response categories (i.e., corresponding to scores 0 and 4), which may vary between zero (if
zero responses were in the extreme response categories) and one (if all responses were in the
extreme response categories). To test the hypothesis that customer satisfaction scores were not
affected by the extreme response style, a measure of general extreme responding was
constructed. For this purpose, the same items were used that were also used for the
construction of the measure for general midpoint responding. Missing values were excluded
from the operationalisation, because they do not provide information about extreme
responding. The scores on the measure of general extreme responding ranged from zero to
0.80, with a mean of 0.10 (Table 12). The reliability (i.e., coefficient alpha) of the scores was
0.68.
Extreme responding to customer satisfaction items
To explore whether general extreme responding was related to extreme responding to
customer satisfaction items, a measure of extreme responding to customer satisfaction items
was constructed. The measure of extreme responding to customer satisfaction items was
constructed in the same way as the measure of general extreme responding. However, for the present
measure the nine items reflecting customer satisfaction were used. The scores on the measure
of extreme responding to satisfaction items ranged from zero to one, with a mean of 0.26
(Table 12). The reliability (i.e., coefficient alpha) of the scores was 0.89.
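Coefficient alpha for a response-style measure can be sketched as follows. The text does not state on which variables alpha was computed; the sketch assumes that each item is first recoded into a 0/1 indicator (1 if the response falls in the relevant category, 0 otherwise) and that there are no missing values. The function names and the toy data are hypothetical.

```python
def variance(values):
    """Sample variance with n - 1 in the denominator."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def coefficient_alpha(rows):
    """Coefficient alpha; rows holds one list of item scores per participant."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical indicators: 1 = extreme response, 0 = other response.
indicators = [[1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 1, 0], [1, 1, 1]]
alpha = coefficient_alpha(indicators)
```

Handling of missing values, which the operationalisation above excludes, would require an additional choice that the sketch leaves open.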
Table 12: Descriptive Statistics of General Midpoint Responding (GMR), Midpoint Responding to Customer Satisfaction Items (MRCSI), General Extreme Responding (GER), and Extreme Responding to Customer Satisfaction Items (ERCSI)
Complete dataset (N = 1227)
        Min   Max    Median   Mean   SD     Skewness
GMR     0     1      0.30     0.29   0.20   0.72 *
MRCSI   0     1      0.11     0.17   0.23   1.47 *
GER     0     0.83   0        0.10   0.15   1.86 *
ERCSI   0     1      0.11     0.26   0.31   1.12 *
Reduced dataset (N = 1184)
        Min   Max    Median   Mean   SD     Skewness
GMR     0     1      0.30     0.29   0.20   0.72 *
MRCSI   0     1      0.11     0.17   0.23   1.49 *
GER     0     0.80   0        0.10   0.15   1.87 *
ERCSI   0     1      0.11     0.26   0.31   1.11 *
* = p < 0.001
5 Test of the hypotheses
Hypotheses 14 and 15 were tested in a similar way. First, the correlation was computed
between stylistic responding and customer satisfaction scores. This was done using proc corr
(SAS STAT). Second, to detect possible non-monotone relations between stylistic responding
and customer satisfaction scores, the stylistic responding scores were plotted against the
customer satisfaction scores. This was done using MS Excel. Third, the correlation was
computed between stylistic responding and stylistic responding to customer satisfaction items.
This was done using proc corr (SAS STAT).
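The first and third steps rest on the product-moment correlation. The following minimal sketch shows the computation in Python; it is an illustration only and not the SAS procedure that was used for the analyses. The function name and the toy data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / sqrt(sum_xx * sum_yy)


# Hypothetical scores of five participants.
style_scores = [0.0, 0.1, 0.3, 0.5, 0.2]
satisfaction_scores = [27, 31, 20, 36, 25]
r = pearson_r(style_scores, satisfaction_scores)
```

Because a correlation close to zero does not rule out a non-monotone relation, the second step (plotting the scores) remains necessary.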
Hypothesis 14
Hypothesis 14 was: the satisfaction scores are not affected by the midpoint response style.
The correlation between general midpoint responding and customer satisfaction was not
significant (Table 13). Furthermore, the plot of the customer satisfaction scores against the
general midpoint responding scores (Figure 4; complete dataset) did not demonstrate a
distinct non-monotone relation. There was a decrease in the standard deviation of the
customer satisfaction scores with increasing general midpoint responding scores, but the
magnitude of the decrease was small (Table 14) and we considered it unimportant. However,
the product-moment correlation between general midpoint responding and midpoint
responding to customer satisfaction items was significant (Table 13). Because customer
satisfaction was almost unrelated to the items underlying the measure of general midpoint
responding, it is plausible that the correlation was caused by the midpoint response style. This
implies that it is plausible that the customer satisfaction scores were affected by the midpoint
response style. Thus, hypothesis 14 was not supported.
Table 13: Product-Moment Correlations Between General Midpoint Responding (GMR), Midpoint Responding to Customer Satisfaction Items (MRCSI), and Customer Satisfaction (Satisfaction)
       Complete dataset (N = 1227)   Reduced dataset (N = 1184)
       MRCSI    Satisfaction         MRCSI    Satisfaction
GMR    0.14*    -0.03                0.13*    -0.03
* = p < 0.001
Table 14: Standard Deviation (SD) of Customer Satisfaction in GMR-Groups (N = Group Size)
Complete dataset (N = 1227)
GMR    0     0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0
N      139   220   212   235   176   113   68    33    13    11    7
SD     6.9   7.3   6.9   6.2   6.6   6.1   6.3   5.5   4.5   8.2   6.2
Reduced dataset (N = 1184)
GMR    0     0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0
N      131   213   211   220   171   109   67    33    11    11    7
SD     6.9   7.3   6.9   6.1   6.4   6.1   6.3   5.5   4.1   8.2   6.2
Figure 4: Plot of general midpoint responding scores versus customer satisfaction scores in the complete dataset (N = 1227). The smallest circle represents one participant and the largest circle represents 35 participants.
Hypothesis 15
Hypothesis 15 was: the satisfaction scores are not affected by the extreme response style. The
correlation between general extreme responding and customer satisfaction was significant in
the reduced dataset (Table 15). Furthermore, the plot of the customer satisfaction scores
against the general extreme responding scores (Figure 5; complete dataset) showed
heteroscedasticity, which means that the variance of customer satisfaction scores differed
across subgroups with different general extreme responding scores. The distribution of
customer satisfaction scores in subgroups having high general extreme responding scores
appears bimodal. This means that high general extreme responding scores corresponded with
very high or very low customer satisfaction scores. In agreement with this result, the
standard deviation of customer satisfaction scores increased as the general extreme
responding score increased (Table 16). The product-moment correlation between general
extreme responding and extreme responding to customer satisfaction items was also
significant (Table 15). Because customer satisfaction was almost unrelated to the items
underlying the measure of general extreme responding, it is plausible that the correlation was
caused by the extreme response style. Thus, hypothesis 15 was not supported.
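The comparison of standard deviations across response-style groups (Tables 14 and 16) can be sketched as follows. The sketch assumes that groups are formed by rounding the response-style score to one decimal and that the sample standard deviation is used; neither choice is stated in the text. The function name and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import stdev


def sd_by_style_group(style_scores, satisfaction_scores):
    """Group size and SD of satisfaction per response-style group.

    Returns a dict {group: (n, sd)}; sd is None for groups of size one.
    """
    groups = defaultdict(list)
    for style, satisfaction in zip(style_scores, satisfaction_scores):
        groups[round(style, 1)].append(satisfaction)
    return {
        group: (len(scores), stdev(scores) if len(scores) > 1 else None)
        for group, scores in sorted(groups.items())
    }


# Hypothetical data: extreme responders with very low or very high scores.
result = sd_by_style_group([0.0, 0.04, 0.5, 0.5], [20, 30, 0, 36])
```

A standard deviation that increases with the response-style score, as in this toy example, corresponds to the heteroscedasticity described above.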
Table 15: Product-Moment Correlations Between General Extreme Responding (GER), Extreme Responding to Customer Satisfaction Items (ERCSI), and Customer Satisfaction (Satisfaction)
Complete dataset (N = 1227) Reduced dataset (N = 1184)
ERCSI Satisfaction ERCSI Satisfaction
GER 0.37** 0.04 0.38** 0.07*
* = p < 0.05; ** = p<0.001
Table 16: Standard Deviation (SD) of Customer Satisfaction in GER-Groups (N = Group Size)
Complete dataset (N = 1227)
GER 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
N 701 206 134 93 47 24 10 6 6 0 0
SD 5.5 6.8 7.1 8.5 9.0 10.2 14.8 12.9 12.5 - -
Reduced dataset (N= 1184)
GER 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
N 678 204 131 86 44 21 9 6 5 0 0
SD 5.5 6.8 7.1 8.4 9.1 9.8 13.1 12.9 13.9 - -
Figure 5: Plot of general extreme responding scores versus customer satisfaction scores in the complete dataset (N = 1227). The smallest circle represents one participant, and the largest circle represents 120 participants.
6 Discussion
The second empirical study confirmed that the measurement instrument of customer
satisfaction constituted a scale according to the MH-model. Moreover, the results confirmed
that the scale also could be used in different subgroups. This result contributes to the validity
of the scale-score interpretations in terms of customer satisfaction with the company.
The tests of the hypotheses demonstrated that stylistic responding influenced the
customer satisfaction scale-scores. This means, for example, that the extreme scale-scores
were partly due to a high preference for extreme response categories in general. Because the
contamination of the scale scores due to stylistic responding was small (Tables 13 and 15), its
importance for the assessment of construct validity of the scale scores is also small. Still, it
limits the construct validity of the scale scores.
The distribution of scale scores showed remarkable peaks for the scale-scores 27, 31,
and 36. Each peak was mainly caused by a group of participants who responded to all nine
items in a similar way (see Section 3). For example, the peak for the scale-score 36 was
caused by participants who agreed strongly with all items indicative of customer satisfaction
and disagreed strongly with all items counter-indicative of customer satisfaction. Therefore,
we suspect that the peaks were caused by stylistic responding.
Because the measurement instrument for customer satisfaction, the location of customer
satisfaction items in the questionnaire, the composition of the sample, and the mode of
administration were largely similar in the first and the second study, it is possible that stylistic
responding also influenced the scale scores in the first empirical study. However, the
distribution of scale scores in the first empirical study did not show such sharp peaks as the
distribution of scale scores in the second empirical study. Therefore we suspect that stylistic
responding was less prevalent in the first empirical study than in the second empirical study.
The following difference between the methods used in the first empirical study and the
second empirical study may explain the differences between the distributions of the scale
scores found in these studies. In the first empirical study the questionnaire was accompanied
by an extensive E-mail in which persons were invited to participate in the survey and in which
the purpose of the study was explained, whereas in the second empirical study the
questionnaire was accompanied by a succinct E-mail in which persons were invited to
participate in the survey but which did not explain the purpose of the study. The explanation
of the purpose of the study in the former E-mail may have affected the motivation of
participants to complete the questionnaire conscientiously. Therefore, we suspect that
satisficing (e.g., Krosnick, 1999, pp. 546-548) was less prevalent in the first empirical study
than in the second empirical study, and that for that reason stylistic responding also was less
prevalent in the first empirical study than in the second empirical study.
Summarising, the fit of the MH model supports the interpretation of the scale scores
from the second empirical study in terms of customer satisfaction with the company. Because
the content of the measurement instrument also supported that interpretation (Chapter 4),
there is much evidence for construct validity. Still, the tests of the hypotheses indicated that
stylistic responding contaminated the scale scores, and this limits the construct validity of the
scale scores. The contamination of the scale scores may be taken into account in any follow-
up research using the scale scores for customer satisfaction from the second empirical study.
It cannot be ruled out that the scale scores were also contaminated by stylistic
responding in the first empirical study, but there is evidence that contamination of the scale
scores by stylistic responding in the first empirical study was smaller than in the second
empirical study. Nevertheless, we suggest taking the possibility that the scale scores were
contaminated by stylistic responding into account in any follow-up research using the scale
scores for customer satisfaction from the first empirical study.
7 Conclusions
1 The content of the measurement instrument for customer satisfaction and the results
from the measurement analyses of the empirical studies supported the validity of the
scale-score interpretation in terms of overall satisfaction with the company. Moreover,
the results of the analyses demonstrated that the scale may be used in different customer
populations.
2 The items that were indicative of customer satisfaction and the other items that were
counter-indicative of the construct together constituted a unidimensional scale. This
result supports the conception of dissatisfaction as the opposite of satisfaction on a
bipolar continuum.
3 The quality of the measurement instrument may be improved by the substitution of the
items Q3a (At BANK I feel at home; Table 1) and Q4d (Last year I had some problems
with BANK; Table 1) with other items. This means that it should be investigated whether
the substitution of these items with two other items that reflect customer satisfaction
with a retail bank improves the validity of the measurements of customer satisfaction
with a retail bank.
4 The results of the second empirical study indicate that the scale scores partly reflected
stylistic responding. It is plausible that a part of the extreme satisfaction scores was
caused by a high general preference for extreme response categories. It is possible that
stylistic responding also influenced the scale scores in the first empirical study but
probably to a lesser extent than in the second empirical study.
5 There is strong evidence for the interpretation of the scale scores in terms of satisfaction
with the company in the first empirical study, and fair evidence for such an
interpretation in the second empirical study. Thus, the application of a measurement
instrument in one study may yield better scale scores than the application of the
instrument in another study (see also Messick, 1989, p. 81). This illustrates that
construct validity is a property of score interpretations and not of measurement
instruments, and that construct validity is always a matter of degree (see also Messick,
1989, p. 13).
Chapter 9
General discussion
1 The meaning of customer satisfaction
The purpose of this thesis was to unravel the meaning of customer satisfaction in the context
of retail banking. Customer satisfaction is a psychological construct. Psychological constructs
are organisational principles with respect to behaviour. This means that they are schemes
through which we perceive and interpret behaviours of persons. The ontological status of
customer satisfaction as organisational principle constitutes an important component of the
meaning of customer satisfaction.
The meaning of satisfaction is context-specific (Giese & Cote, 2000). Moreover,
satisfaction with a retail bank may be the absence of dissatisfaction for one customer, a
judgement of the performance of the bank for another customer, and an affect for a third
customer. To account for the different manifestations of satisfaction, we defined customer
satisfaction with a retail bank as the valenced response of the customer, directed towards the
retail bank, and evoked by the customer’s experiences with the bank throughout time. This
definition expresses that customer satisfaction with a retail bank encompasses affects and
cognitions that can be placed on a dimension that ranges from negative to positive. Because
the negative response expresses dissatisfaction and the positive response expresses
satisfaction, the definition also covers customer dissatisfaction with a retail bank. This
definition constitutes an important component of the theoretical meaning of customer
satisfaction in the context of retail banking.
Marketing studies (e.g., Anderson et al., 1994; Hennig-Thurau et al., 2002; Oliver,
1997; Verhoef, 2001; Yi, 1990) suggest that customer satisfaction is related to various other
psychological constructs, such as trust, quality, customer loyalty, commitment, word-of-
mouth, and image, and to customer profitability (CP). There is evidence that customer
satisfaction is preceded by quality and trust, and that customer satisfaction precedes customer
loyalty and CP. We hypothesised that the latter relations also applied to customer satisfaction
in the context of retail banking. The hypothesised relations between customer satisfaction and
trust, quality, customer loyalty, and CP constitute the implicit definition of customer
satisfaction in the context of retail banking. This definition also constitutes an important
component of the theoretical meaning of the construct.
The empirical meaning of customer satisfaction consists of the behaviours that are associated
with customer satisfaction. In the context of retail banking, these are manifestations of
performance evaluations, disconfirmation, expectations, emotions, and regret (see also Oliver,
1997, pp. 316-318, 343-344). These manifestations can be used for the measurement of
customer satisfaction. Because customer satisfaction has a large behavioural domain, we
developed a nine-item measurement instrument for customer satisfaction with a bank, which
covered different manifestations of customer satisfaction. Five items were indicative of
customer satisfaction and four items were counter-indicative of customer satisfaction. The
first empirical study into customer satisfaction with BANK demonstrated that the nine items
constituted a unidimensional scale. This result supported the theoretical notion that customer
satisfaction is the opposite of customer dissatisfaction on a bipolar dimension.
We found positive correlations between customer satisfaction and quality, and between
customer satisfaction and customer loyalty. These results supported our hypotheses
concerning these correlations, but three remarks are in order. First, the measurement of
quality on the basis of items reflecting judgements about products and services provided by
the company resulted in missing data problems and halo effects. We did not find a satisfactory
solution for these problems. Eventually, we re-defined quality as absence of problems, and we
measured quality by means of the total score on the recoded items regarding the experience of
problems with BANK in the preceding twelve months. We found that absence of problems
with BANK in the preceding twelve months was positively correlated with customer
satisfaction with BANK. Second, we found that the customer satisfaction scale-scores were
contaminated by quality. The scale scores were corrected by excluding one item from the
customer satisfaction scale when testing for the correlation between customer satisfaction and
quality. Third, we found that the customer satisfaction scale-scores were contaminated by
customer loyalty. The scale scores were corrected by excluding one item from the customer
satisfaction scale when testing for the correlation between customer satisfaction and customer
loyalty.
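The correction described above can be sketched as follows: the scale score is recomputed without the contaminated item and then correlated with the other variable. This is a minimal sketch that assumes the item scores have already been recoded in the same direction; the function names and the data layout (one list of item scores per participant) are hypothetical.

```python
import math


def corrected_scores(item_matrix, excluded_item):
    """Scale score per participant with one item (zero-based index) left out."""
    return [
        sum(value for index, value in enumerate(row) if index != excluded_item)
        for row in item_matrix
    ]


def pearson(x, y):
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

For example, `pearson(corrected_scores(items, k), quality_scores)` would give the correlation with quality after item `k` has been removed, where `items`, `k`, and `quality_scores` are placeholders for the data at hand.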
The positive effects of customer satisfaction on future CP after one year and future CP
after two years supported the hypothesis that customer satisfaction influences CP, and
confirmed the importance of customer satisfaction in the context of retail banking. We found
that current CP (i.e., CP at the time of the measurement of customer satisfaction) is an
indispensable variable in analyses of the relation between customer satisfaction and future CP.
However, we also found that the size of the effect of current CP on future CP decreased as the
time-lag between current CP and future CP increased. This implies that companies cannot rely
on current CP as a guarantee for future CP, and this warrants taking more than only current
CP into account when estimating customer lifetime value. Furthermore, we found that CP
follows a Pareto-like distribution in the context of retail banking, and that CP had to be
transformed before analysing the relation between customer satisfaction and CP. The latter
results may be useful for the development of methods for investigating the influence of
customer satisfaction on CP and estimating customer lifetime value.
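To illustrate what transforming a heavily skewed profitability variable can look like, the sketch below applies a signed logarithm. This particular transformation is an assumption chosen for illustration, not necessarily the one used in the study; it is convenient when CP can be negative, because it keeps the sign while compressing large magnitudes.

```python
import math


def signed_log(value):
    """Compress large magnitudes while keeping the sign (losses stay negative)."""
    return math.copysign(math.log1p(abs(value)), value)


def transform_cp(cp_values):
    """Apply the signed log to every customer profitability value."""
    return [signed_log(value) for value in cp_values]
```

The transformation is monotone, so the ordering of customers by CP is preserved, while the long tail of a Pareto-like distribution is pulled in.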
We also found a positive correlation between customer satisfaction and trust, which
supported our hypothesis concerning this correlation. It may be noted that the correlation
between the customer satisfaction scores and the trust scores was as large as the correlation
between the customer satisfaction scores and the ACSI scores. Customers were satisfied with
BANK when they trusted BANK, and dissatisfied with BANK when they did not trust
BANK. This was also an outcome of the pre-tests. There seems to be a large overlap between
the construct of customer satisfaction and the construct of trust in the context of retail
banking. Further research into the generalisability of this result is needed.
The second empirical study demonstrated that the customer satisfaction scores were
contaminated by stylistic responding of the participants. This means, for example, that the
extreme scale-scores were partly due to a high general preference for extreme response
categories. Because the contamination of the scale scores due to stylistic responding was
small, we considered its importance for the construct validity of the scale scores also small.
Still, it limits the construct validity of the scale scores. Therefore we suggested taking the
contamination of the scale scores into account when using these scores for any follow-up
research.
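One simple way to quantify a preference for extreme response categories is the share of a participant's answers that fall in the lowest or highest category. The sketch below is illustrative only; it assumes a five-point format and a separate set of items that are heterogeneous in content, and it is not the procedure used in the second empirical study.

```python
def extreme_response_share(responses, n_categories=5):
    """Proportion of answers in the lowest or highest response category."""
    if not responses:
        raise ValueError("no responses given")
    extremes = sum(1 for value in responses if value in (1, n_categories))
    return extremes / len(responses)
```

A participant with a high share on content-unrelated items would be expected to produce somewhat more extreme scale scores regardless of his or her level of satisfaction.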
In all, the empirical studies yielded scale scores for customer satisfaction with BANK
and provided much evidence for the construct validity of the scale scores. Therefore we
concluded that the scale scores were rightly interpreted as customer satisfaction with BANK.
The scale scores constitute a special case of the empirical meaning of customer satisfaction.
2 The measurement of psychological constructs in marketing research
Another purpose of this thesis was to select a suitable methodology for the construction of a
measurement instrument for customer satisfaction and the validation of the customer
satisfaction scale-scores. Psychological constructs can be measured by means of
psychological tests (including measurement instruments for typical behaviour; see Chapter 1,
Section 4). For the measurement of psychological constructs in marketing research a test often
consists of a set of items that is administered in a survey. On the basis of a participant’s
responses to these items, his or her position on the scale for the property is inferred.
It is broadly acknowledged that validity of measurement is a key success factor for
satisfaction research and for marketing research in general. However, the practice of construct
validation in marketing research does not comply with the theory of validity as formulated by
Messick (1989). We demonstrated (Chapter 3, Section 6) that construct validity was
insufficiently investigated in important satisfaction studies in the marketing literature (see also
Giese & Cote, 2000; Peterson & Wilson, 1992). This hampers the usefulness of satisfaction
research for scientific purposes, such as testing of satisfaction theories, and for business
purposes, such as marketing strategy development.
Construct validity is the appropriateness of test-score interpretations in terms of the
construct of interest (e.g., Cronbach, 1971; Messick, 1989, pp. 13, 34). Churchill’s (1979)
perspective on construct validity, which is the leading perspective in marketing measurement,
conflicts with this conception of construct validity. Churchill’s (1979) perspective is flawed
with respect to the conception of construct validity as a property of a test, the criteria for the
assessment of construct validity, and the procedures for validation research. Construct validity
is a property of test-score interpretations, and not of tests. This means, for example, that the
application of a test may yield valid measurements of a construct in one instance, and less
valid measurements of a construct in another instance (see also Chapter 8, Section 7).
Furthermore, Churchill’s (1979) criteria for the assessment of construct validity, which are
nomological validity, divergent validity, and convergent validity, do not address the two
major threats to construct validity, which are construct underrepresentation and construct-
irrelevant variance (Messick, 1989, 1995). Consequently, Churchill’s (1979) procedures for
validation research, which are the MTMM framework and the correlation of a measure with a
criterion variable, do not suffice for the assessment of construct validity. Moreover, because the
methods applied in MTMM research are often similar, the agreement between two measures
of the same trait often provides evidence for reliability rather than validity (see also Anastasi,
1988, p. 158). The flaws in Churchill’s (1979) perspective on construct validity justify
adopting Messick’s (1989, 1995) perspective on construct validity and construct validation
research.
Because the deductive design (Schouwstra, 2000) is in agreement with Messick’s (1989,
1995) perspective on construct validity and validation research, we applied the deductive
design for the development of a test for customer satisfaction with BANK and the construct
validation of the test scores. The deductive design addresses test development and construct
validation for typical-behaviour properties (Table 1):
Table 1: Outline of Construct Validation Within the Deductive Design (Schouwstra, 2000, p. 60)

Scientific arguments    Construct representation                      Irrelevant variance
Rationales
  a. Formulation        Of what construct of interest is              And what not
  b. Translation        Of construct of interest into test content    And nothing else
  c. Modelling          How test score reflects construct             And nothing else
Empirical evidence      That test score reflects whole of construct   And nothing else

Psychometric theory provides useful guidelines for the definition of the construct of
interest, the translation of the construct of interest into test content, and the choice of a
measurement model for modelling the participant’s responses to the test. For example, it is
well-known that single items often yield inadequate measurements of constructs (e.g.,
Messick, 1989, pp. 14, 35), and this may explain why customer satisfaction has to be
measured by means of a multiple-item scale. The empirical research is directed at the
collection of empirical evidence regarding construct representation and irrelevant variance.
Schouwstra (2000, pp. 69-71) suggested formulating and testing hypotheses regarding
construct representation and absence of irrelevant variance. Two remarks concerning the
empirical research are in order. First, it is not feasible to formulate and test all possible
hypotheses regarding construct representation and absence of irrelevant variance. Therefore,
the formulation and testing of hypotheses has to be restricted to the most important
hypotheses, and the choice of the most important hypotheses remains to some extent arbitrary.
Second, we consider the requirement that the test scores reflect the whole construct and nothing
else too rigid. It is not feasible to exclude all possible irrelevant variance in the practice of
psychological measurement. Therefore, construct validity is always a matter of degree (see
also Messick, 1989, p. 13).
Because contamination of test scores cannot be avoided in the practice of psychological
measurement, the construct validity of test scores is limited. Therefore, we suggest that
future research investigate the degree to which test scores are contaminated by other
attributes, and take any contamination into account when using the test scores for follow-up
research.
In all, the application of the deductive design yielded a scale for customer satisfaction
with BANK and much evidence for the construct validity of the scale scores. Therefore, we
consider the deductive design a useful framework for measurement instrument development
and construct validation in marketing research.
3 Suggestions for future research
First, we suggest further research into the influence of customer satisfaction on CP in retail
banking. We recommend research into the generalisability of the results of the present study
to other groups and companies within the financial services industry. Furthermore, we
recommend future research into the definition and measurement of CP, such as the inclusion
or exclusion of various costs, and the accumulation of profits over longer time periods than
one year.
The second suggestion for future research concerns executing context-specific customer
satisfaction studies. We agree with Giese and Cote (2000) that the meaning of customer
satisfaction is context-specific, and that definitions and measures of customer satisfaction
should also be context-specific. We also expect that the antecedents of customer satisfaction are
context-specific. Context-specific customer satisfaction studies may contribute to the further
development of general theory about customer satisfaction.
The third suggestion for future research concerns the development of context-specific
definitions of quality and corresponding measurement procedures. We had much difficulty
with the measurement of quality in the present study. Moreover, different inquiries may
require different definitions and operationalisations of quality. Proper operationalisations of
quality are important for investigating the influence of quality on customer satisfaction, and
such investigations are important for making customer satisfaction actionable for companies.
Fourth, we suggest the deductive design (Schouwstra, 2000) for the measurement of
psychological constructs in marketing research. The marketing literature uses many
psychological constructs, and there appears to be much redundancy in the collection of
constructs. Marketing research may disentangle these constructs, and for that purpose it has to
define and measure them properly. Because Messick’s (1989, 1995) perspective on construct
validity can be put into action by the deductive design, we suggest the deductive design for
the measurement and the validation of measurements of psychological constructs in marketing
research.
4 Concluding remarks
This thesis explored the meaning of customer satisfaction in retail banking, and the usefulness
of psychometric methods for test development and construct validation. It was demonstrated
that, in the context of retail banking, customer satisfaction is manifested in performance
evaluations, disconfirmation, expectations, emotions, and regret. This is a useful result for the
further development of satisfaction theory and for customer satisfaction management in the
financial services industry. It explains why customer satisfaction is not exclusively driven by
technical quality of products, services, and processes. Therefore a bank’s customer
satisfaction management strategy may start with managing technical quality, and having
accomplished that, it may proceed with managing functional quality, complaints handling,
and corporate communication. Furthermore, the thesis provided strong evidence for the
influence of customer satisfaction on CP. This is a useful result for the further development of
satisfaction theory and eventually for marketing strategy development in the industry of retail
banking. The influence of customer satisfaction on CP warrants the adoption of customer
satisfaction as a strategic goal of retail banks, all the more so because the influence of current CP on
future CP decreases when the time lag increases. The thesis also demonstrated that the
application of psychometric methods for the measurement of customer satisfaction yielded
scale scores that can be rightly interpreted as customer satisfaction scores. This is a useful
result for the methodology of marketing research and eventually for the development and
validation of marketing theories.
References
Anastasi, A. (1986). Evolving concepts of test validation. Annual Review of Psychology, 37,
1-15.
Anastasi, A. (1988). Psychological testing (6th ed.). New York: Macmillan Publishing
Company.
Angoff, W.H. (1988). Validity: An evolving concept. In H. Wainer & H.I. Braun (Eds.), Test
validity (pp. 19-32). Hillsdale, NJ: Lawrence Erlbaum Associates.
Anderson, E.W., Fornell, C., & Lehmann, D.R. (1994). Customer satisfaction, market share
and profitability: Findings from Sweden. Journal of Marketing, 58, 53-66.
Anderson, E.W., & Mittal, V. (2000). Strengthening the satisfaction-profit chain. Journal of
Service Research, 3, 107-123.
Anderson, E.W., Fornell, C., & Mazvancheryl, S.K. (2004). Customer satisfaction and
shareholder value. Journal of Marketing, 68, 172-185.
Baumgartner, H., & Steenkamp, J.B.E.M. (2001). Response styles in marketing research: A
cross-national investigation. Journal of Marketing Research, 38, 143-156.
Baumgartner, H., & Steenkamp, J.B.E.M. (2006). Response biases in marketing research. In
R. Grover & M. Vriens (Eds.), The handbook of marketing research: Uses, misuses and
future advances (pp. 95-109). Thousand Oaks: Sage Publications.
Bearden, W.O., Netemeyer, R.G., & Mobley, M.F. (1993). Handbook of marketing scales:
Multi-item measures for marketing and consumer behavior research. Newbury Park,
CA: Sage Publications.
Belson, W.A. (1981). The design and understanding of survey questions. Aldershot: Gower
Publishing Company Limited.
Belson, W.A. (1986). Validity in survey research. Aldershot: Gower Publishing Company
Limited.
Berens, G.A.J.M. (2004). Corporate branding: The development of corporate associations
and their influence on stakeholder reactions. Doctoral dissertation, Erasmus University,
Rotterdam.
Bernaards, C.A., & Sijtsma, K. (2000). Influence of imputation and EM methods on factor
analysis when item nonresponse in questionnaire data is nonignorable. Multivariate
Behavioral Research, 34, 277-313.
Bloemer, J.M.M. (1993). Loyaliteit en tevredenheid: Een studie naar de relatie tussen
merktrouw en consumententevredenheid. Doctoral dissertation, University of Maastricht,
Maastricht.
Bloemer, J.M.M., & Poiesz, T.B.C. (1989). The illusion of consumer satisfaction. Journal of
Consumer Satisfaction, Dissatisfaction and Complaining Behavior, 2, 43-48.
Bloemer, J.M.M., & Kasper, H.D.P. (1995). The complex relationship between consumer
satisfaction and brand loyalty. Journal of Economic Psychology, 16, 311-329.
Bollen, K.A. (1989). Structural equations with latent variables. New York: Wiley.
Borsboom, D., Mellenbergh, G.J., & Van Heerden, J. (2003). The theoretical status of latent
variables. Psychological Review, 110, 203-219.
Borsboom, D., Mellenbergh, G.J., & Van Heerden, J. (2004). The concept of validity.
Psychological Review, 111, 1061-1071.
Borsboom, D. (2005). Measuring the mind. New York: Cambridge University Press.
Borsboom, D. (2006). The attack of the psychometricians. Psychometrika, 71, 425-440.
Bouwmeester, S., & Sijtsma, K. (2006). Constructing a transitive reasoning test for 6-to-13
year old children. European Journal of Psychological Assessment, 22, 225-232.
Bradburn, N.M. (1983). Response effects. In P.H. Rossi, J.D. Wright, & A.B. Anderson
(Eds.), Handbook of survey research (pp. 289-328). New York: Academic Press Inc.
Bronner, F., & Kuijlen, T. (2007). The live or digital interviewer: A comparison between
CASI, CAPI, and CATI with respect to differences in response behaviour. International
Journal of Market Research, 49, 167-190.
Buttle, F. (1996). SERVQUAL: Review, critique, research agenda. European Journal of
Marketing, 30, 8-32.
Byrne, B.M. (1989). A primer of LISREL: Basic applications and programming for
confirmatory factor analytic models. New York: Springer-Verlag.
Campbell, D.T., & Fiske, D.W. (1959). Convergent and discriminant validation by the
multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.
Campbell, D., & Frei, F. (2004). The persistence of customer profitability: Empirical evidence
and implications from a financial services firm. Journal of Service Research, 7, 107-123.
Carnap, R. (1950). Logical foundations of probability. Chicago: University of Chicago Press.
Carnap, R. (1956). The methodological character of theoretical concepts. In H. Feigl & M.
Scriven (Eds.), Minnesota studies in the philosophy of science, Vol I. Minneapolis:
University of Minnesota Press.
Caruana, A. (2002). Service loyalty: The effects of service quality and the mediating role of
customer satisfaction. European Journal of Marketing, 36, 811-828.
Churchill, G.A. (1979). A paradigm for developing better measures of marketing constructs.
Journal of Marketing Research, 16, 64-73.
Churchill, G.A., & Surprenant, C. (1982). An investigation into the determinants of customer
satisfaction. Journal of Marketing Research, 19, 491-504.
Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the
behavioral sciences (2nd ed.). Hillsdale: Lawrence Erlbaum Associates.
Cook, T.D., & Campbell, D.T. (1979). Quasi-experimentation: Design and analysis issues for
field settings. Chicago: Rand McNally.
Coombs, C.H. (1964). A theory of data. New York: John Wiley and Sons.
Cooper, R., & Kaplan, R.S. (1991). The design of cost management systems: Text, cases, and
readings. Englewood Cliffs, NJ: Prentice Hall.
Coulthard, L.J.M. (2004). Measuring service quality: A review and critique of research using
SERVQUAL. The Market Research Society, 46, 479-497.
Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16,
297-335.
Cronbach, L.J., & Meehl, P.E. (1955). Construct validity in psychological tests. Psychological
Bulletin, 52, 281-302.
Cronbach, L.J. (1971). Test validation. In R.L. Thorndike (Ed.), Educational measurement
(pp. 443-507). Washington, DC: American Council on Education.
Cronbach, L.J. (1988). Five perspectives on the validity argument. In H. Wainer & H.I. Braun
(Eds.), Test validity (pp. 3-17). Hillsdale, NJ: Lawrence Erlbaum Associates.
Cronbach, L.J. (1989). Construct validation after thirty years. In R. Linn (Ed.), Intelligence:
Measurement, theory, and public policy (pp. 147-171). Urbana, IL: University of Illinois
Press.
Cronin, J.J., & Taylor, S.A. (1992). Measuring service quality: A reexamination and
extension. Journal of Marketing, 56, 55-68.
Cronin, J.J., & Taylor, S.A. (1994). SERVPERF versus SERVQUAL: Reconciling
performance-based and perceptions minus expectations measurement of service quality.
Journal of Marketing, 58, 125-131.
De Ruyter, K., Bloemer, J., & Peeters, P. (1997). Merging service quality and service
satisfaction: An empirical test of an integrative model. Journal of Economic Psychology,
18, 387-406.
Dick, A., & Basu, K. (1994). Customer loyalty: Toward an integrated conceptual framework.
Journal of the Academy of Marketing Science, 22, 99-113.
Dillman, D.A., Tortora, R.S., & Bowker, D. (1998). Principles for constructing web surveys.
SESRC Technical Report 98-50. Washington State University.
Dillman, D.A., & Bowker, D.K. (2001). The web questionnaire challenge to survey
methodologists. In U.D. Reips & M. Bosnjak (Eds.), Dimensions of internet science (pp.
159-178). Lengerich: Pabst Science Publishers.
Donkers, B., Verhoef, P.C., & De Jong, M.G. (2007). Modeling CLV: A test of competing
models in the insurance industry. Quantitative Marketing and Economics, 5, 163-190.
Embretson, S.E., & Reise, S.P. (2000). Item response theory for psychologists. Mahwah, NJ:
Lawrence Erlbaum Associates.
Fabrigar, L.R., Krosnick, J.A., & MacDougall, B.L. (2005). Attitude measurement:
Techniques for measuring the unobservable. In T.C. Brock & M.C. Green (Eds.),
Persuasion: Psychological insights and perspectives (pp. 17-40). Thousand Oaks, CA:
Sage.
Fornell, C., & Larcker, D.F. (1981). Evaluating structural equation models with unobservable
variables and measurement error. Journal of Marketing Research, 18, 39-50.
Fornell, C., & Wernerfelt, B. (1987). Defensive marketing strategy by customer complaint
management: A theoretical analysis. Journal of Marketing Research, 24, 337-346.
Fornell, C., & Wernerfelt, B. (1988). A model for customer complaint management.
Marketing Science, 7, 271-286.
Fornell, C. (1992). A national customer satisfaction barometer: The Swedish experience.
Journal of Marketing, 56, 6-21.
Fornell, C., Johnson, M.D., Anderson, E.W., Cha, J., & Bryant, B.E. (1996). The American
customer satisfaction index: Nature, purpose and findings. Journal of Marketing, 60, 7-
18.
Frege, G. (1892). On sense and reference. In P. Geach & M. Black (Eds.), (1952).
Translations of the philosophical writings of Gottlob Frege. Oxford, England: Blackwell.
Friman, M. (2004). The structure of affective reactions to critical incidents. Journal of
Economic Psychology, 25, 331-353.
Gardner, H. (1999). Intelligence reframed: Multiple intelligences for the 21st century. New
York: Basic Books.
Garvin, D.A. (1983). Quality on the line. Harvard Business Review, 61, 65-73.
Giese, J.L., & Cote, J.A. (2000). Defining customer satisfaction. Academy of Marketing
Science Review. www.amsreview.org/articles/giese01-2000.pdf.
Goedee, J., Reijnders, W., & Van Thiel, D. (2008). Bankieren in 2020: De impact van
consumentenvertrouwen en technologische ontwikkelingen. Amsterdam: Pearson
Education Benelux.
Gorsuch, R.L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Greenleaf, E.A. (1992a). Improving rating scale measures by detecting and correcting bias
components in some response styles. Journal of Marketing Research, 29, 176-188.
Greenleaf, E.A. (1992b). Measuring extreme response style. Public Opinion Quarterly, 56,
176-188.
Gremler, D.D., & Brown, S.W. (1996). Service loyalty, its nature, importance and
implications. In B. Edvardsson, S.W. Brown, R. Johnston, & E.E. Scheuing (Eds.),
Advancing service quality: A global perspective (pp. 171-180). International Service
Quality Association.
Gremler, D.D., & Brown, S.W. (1999). The loyalty ripple effect: Appreciating the full value
of customers. International Journal of Service Industry Management, 10, 271-299.
Grönroos, C. (1984). A service quality model and its marketing implications. European
Journal of Marketing, 18, 36-44.
Grönroos, C. (1990). Service management and marketing: Managing the moments of truth in
service competition. Lexington, MA: Lexington Books.
Groves, R.M. (1989). Survey errors and survey costs. New York: Wiley.
Gruca, T.S., & Rego, L.L. (2005). Customer satisfaction, cash flow and shareholder value.
Journal of Marketing, 69, 115-130.
Gustafsson, A., Johnson, M.D., & Roos, I. (2005). The effects of customer satisfaction,
relationship commitment dimensions, and triggers on customer retention. Journal of
Marketing, 69, 210-218.
Guttman, L. (1954). An outline of some new methodology for social research. Public Opinion
Quarterly, 18, 395-404.
Hausknecht, D.R. (1990). Measurement scales in consumer satisfaction/dissatisfaction.
Journal of Consumer Satisfaction, Dissatisfaction and Complaining Behavior, 3, 1-11.
Hays, W.L. (1988). Statistics (4th ed.). New York: Holt, Rinehart and Winston, Inc.
Heiser, W.J. (2006). Measurement without copper instruments and experiment without
complete control. Psychometrika, 71, 457-461.
Hennig-Thurau, T., Gwinner, K.P., & Gremler, D.D. (2002). Understanding relationship
marketing outcomes: An integration of relational benefits and relationship quality.
Journal of Service Research, 4, 230-247.
Herzberg, F., Mausner, B., & Snyderman, B.B. (1959). The motivation to work. New York:
Wiley.
Howard, J.A., & Sheth, J.N. (1969). The theory of buyer behavior. New York: John Wiley
and Sons.
Homburg, C., Koschate, N., & Hoyer, W.D. (2005). Do satisfied customers really pay more?
A study of the relationship between customer satisfaction and willingness to pay.
Journal of Marketing, 69, 84-96.
Hox, J.J. (1997). From theoretical concept to survey question. In L. Lyberg, P. Biemer, M.
Collins, E. De Leeuw, C. Dippo, N. Schwarz, & D. Trewin (Eds.), Survey measurement
and process quality (pp. 47-70). New York: Wiley.
Hox, J.J. (1998). Er is nieuws onder de zon: Nieuwe oplossingen voor oude problemen.
Kwantitatieve Methoden, 19, 95-118.
Ittner, C.D., & Larcker, D.F. (1998). Are nonfinancial measures leading indicators of
financial performance? An analysis of customer satisfaction. Journal of Accounting
Research, 36, 1-35.
Jack, A.B. (1967). Sampling from a Pareto distribution. Metroeconomica, 19, 216-223.
Jackson, D.N., & Messick, S. (1958). Content and style in personality assessment.
Psychological Bulletin, 55, 243-252.
Jackson, D.N. (1971). The dynamics of structured personality tests: 1971. Psychological
Review, 78, 229-248.
Jackson, D.N. (1973). Structural personality assessment. In B.B. Wolman (Ed.), Handbook of
general psychology (pp. 775-792). NJ: Prentice Hall.
Jacoby, J. (1976). Consumer research: Telling it like it is. In B.B. Anderson (Ed.), Advances
in Consumer Research, 3, 1-11.
Jansen, B.R.J., & Van der Maas, H. (1997). Statistical tests of the rule assessment
methodology by latent class analysis. Developmental Review, 17, 321-357.
Johnson, M.D., Gustafsson, A., Andreassen, T.W., Lervik, L., & Cha, J. (2001). The
evolution and future of national customer satisfaction index models. Journal of
Economic Psychology, 22, 217-245.
Johnston, R. (1995). The determinants of service quality: Satisfiers and dissatisfiers.
International Journal of Service Industry Management, 6, 53-71.
Kackar, R.N. (1989). Taguchi’s quality philosophy: Analysis and commentary. In K. Dehnad
(Ed.), Quality control, robust design, and the Taguchi method (pp. 3-19). Pacific Grove:
Wadsworth and Brooks/Cole.
Kane, M. (2006). In praise of pluralism. A comment on Borsboom. Psychometrika, 71, 441-
445.
Kelley, T.L. (1927). Interpretation of educational measurements. New York: World Book
Company.
Knowles, E.S., & Nathan, K.T. (1997). Acquiescent responding in self reports: Cognitive
style or social concern. Journal of Research in Personality, 31, 293-301.
Krosnick, J.A. (1991). Response strategies for coping with the cognitive demands of attitude
measures in surveys. Applied Cognitive Psychology, 5, 213-236.
Krosnick, J.A., & Fabrigar, L.R. (1997). Designing rating scales for effective measurement in
surveys. In L. Lyberg, P. Biemer, M. Collins, E. De Leeuw, C. Dippo, N. Schwarz, & D.
Trewin (Eds.), Survey measurement and process quality (pp. 141-164). New York:
Wiley.
Krosnick, J.A. (1999). Survey research. Annual Review of Psychology, 50, 537-567.
Lehmann, D.R. (1999). Consumer behaviour and Y2K. Journal of Marketing, 63, 14-18.
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140,
44-53.
Little, R.J.A., & Rubin, D.B. (2002). Statistical analysis with missing data (2nd ed.). New
York: Wiley.
Loevinger, J. (1948). The technique of homogeneous tests compared with some aspects of
‘scale analysis’ and factor analysis. Psychological Bulletin, 45, 507-530.
Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores. Reading:
Addison Wesley.
Luo, X., & Homburg, C. (2007). Neglected outcomes of customer satisfaction. Journal of
Marketing, 71, 133-149.
Mahalanobis, P.C. (1936). On the generalized distance in statistics. Proceedings of the
National Institute of Science of India, 12, 49-55.
Mano, H., & Oliver, R.L. (1993). Assessing the dimensionality and structure of the
consumption experience: Evaluation, feeling, and satisfaction. Journal of Consumer
Research, 20, 451-466.
Maxwell, S.E., & Delaney, H.D. (1990). Designing experiments and analyzing data: A model
comparison perspective. Belmont, CA: Wadsworth Publishing Company.
Medlin, C.J., & Quester, P.G. (2002). Inter-firm trust: Two theoretical dimensions versus a
global measure. Paper presented at the IMP conference in Perth, Australia.
www.impgroup.org/uploads/papers/4247.pdf.
Mellenbergh, G.J. (1985). Vraagonzuiverheid: Detectie, definitie en onderzoek. Nederlands
Tijdschrift voor de Psychologie, 40, 425-435.
Messick, S. (1988). The once and future issues of validity: Assessing the meaning and
consequences of measurement. In H. Wainer & H.I. Braun (Eds.), Test validity (pp. 33-
45). Hillsdale, NJ: Lawrence Erlbaum Associates.
Messick, S. (1989). Validity. In R.L. Linn (Ed.), Educational measurement (3rd ed.) (pp. 13-
103). New York: Macmillan Publishing Co.
Messick, S. (1991). Psychology and methodology of response styles. In R.E. Snow & D.E.
Wiley (Eds.), Improving inquiry in social science: A volume in honor of Lee J. Cronbach
(pp. 161-200). Hillsdale, NJ: Lawrence Erlbaum Associates.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from
persons’ responses and performances as scientific inquiry into score meaning. American
Psychologist, 50, 741-749.
Mittal, V., & Kamakura, W.A. (2001). Satisfaction, repurchase intent, and repurchase
behavior: Investigating the moderating effects of customer characteristics. Journal of
Marketing Research, 38, 131-142.
Molenaar, I.W. (1995). Some background for item response theory and the Rasch model. In
I.W. Molenaar & G.H. Fischer (Eds.), Rasch models: Foundations, recent developments
and applications (pp. 3-14). New York: Springer-Verlag.
Molenaar, I.W., & Sijtsma, K. (2000). MSP5 for windows: User’s manual. Groningen:
ProGAMMA.
Mokken, R.J. (1971). A theory and procedure of scale analysis. The Hague: Mouton; Berlin:
De Gruyter.
Morgan, R.M., & Hunt, S.D. (1994). The commitment-trust theory of relationship marketing.
Journal of Marketing, 58, 20-38.
Mulhern, F.J. (1999). Customer profitability analysis: Measurement, concentration, and
research directions. Journal of Interactive Marketing, 13, 25-40.
Murphy, K.R., & Davidshofer, C.O. (1991). Psychological testing: Principles and
applications (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Newman, K. (2001). Interrogating SERVQUAL: A critical assessment of service quality
measurement in a high street retail bank. International Journal of Bank Marketing, 19,
126-139.
Niraj, R., Gupta, M., & Narasimhan, C. (2001). Customer profitability in a supply chain.
Journal of Marketing, 65, 1-16.
Nunnally, J.C. (1978). Psychometric theory (2nd ed.). New York: McGraw Hill.
Oliver, R.L. (1980). A cognitive model of the antecedents and consequences of satisfaction
decisions. Journal of Marketing Research, 17, 460-469.
Oliver, R.L., & DeSarbo, W.S. (1988). Response determinants in satisfaction judgments.
Journal of Consumer Research, 14, 495-507.
Oliver, R.L., & Swan, J.E. (1989). Consumer perceptions of interpersonal equity and
satisfaction in transactions: A field survey approach. Journal of Marketing, 53, 21-35.
Oliver, R.L. (1993). Cognitive, affective, and attribute bases of the satisfaction response.
Journal of Consumer Research, 20, 418-430.
Oliver, R.L. (1997). Satisfaction: A behavioral perspective on the consumer. New York:
McGraw Hill.
Oliver, R.L. (1999). Whence consumer loyalty? Journal of Marketing, 63, 33-44.
Oliver, R.L., & Burke, R.R. (1999). Expectation processes in satisfaction formation. Journal
of Service Research, 1, 196-214.
Oort, F.J. (1996). Using restricted factor analysis in test construction. Doctoral dissertation,
University of Amsterdam, Amsterdam.
Oosterveld, P. (1996). Questionnaire design methods. Doctoral dissertation, University of
Amsterdam, Amsterdam.
Parasuraman, A., Zeithaml, V.A., & Berry, L. L. (1985). A conceptual model of service
quality and its implications for future research. Journal of Marketing, 49, 41-50.
Parasuraman, A., Zeithaml, V.A., & Berry, L. L. (1988). SERVQUAL: A multiple-item scale
for measuring consumer perceptions of service quality. Journal of Retailing, 64, 12-40.
Parasuraman, A., Zeithaml, V.A., & Berry, L. L. (1994). Reassessment of expectations as a
comparison standard in measuring service quality: Implications for future research.
Journal of Marketing, 58, 111-124.
Paulhus, D.L. (1991). Measurement and control of response bias. In J.P. Robinson, P.R.
Shaver, & L.S. Wrightsman (Eds.), Measures of personality and social psychological
attitudes (pp. 17-59). San Diego, CA: Academic Press Inc.
Peter, J.P. (1981). Construct validity: A review of basic issues and marketing practices.
Journal of Marketing Research, 18, 133-145.
Peterson, R.A., & Wilson, W.R. (1992). Measuring customer satisfaction: Fact and artefact.
Journal of the Academy of Marketing Science, 20, 61-71.
Pfeifer, P.E., Haskins, M.E., & Conroy, R.M. (2005). Customer lifetime value, customer
profitability, and the treatment of acquisition spending. Journal of Managerial Issues,
17, 11-25.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests.
Copenhagen: Danish Institute for Educational Research.
Reichheld, F.F., & Sasser, W.E. (1990). Zero defections: Quality comes to service. Harvard
Business Review, 68, 105-111.
Reichheld, F.F. (2006). The ultimate question: Driving good profits and true growth.
Cambridge: Harvard Business School Press.
Rotter, J. (1967). A new scale for the measurement of interpersonal trust. Journal of
Personality, 35, 651-665.
Rugg, D. (1941). Experiments in wording questions: II. Public Opinion Quarterly, 5, 91-92.
Russell, J.A., & Carroll, J.M. (1999a). On the bipolarity of positive and negative affect.
Psychological Bulletin, 125, 3-30.
Russell, J.A., & Carroll, J.M. (1999b). The phoenix of bipolarity: Reply to Watson and
Tellegen (1999). Psychological Bulletin, 125, 611-617.
Rust, R.T., & Zahorik, A.J. (1993). Customer satisfaction, customer retention and market
share. Journal of Retailing, 69, 193-215.
Saris, W.E., Van Wijk, T., & Scherpenzeel, A. (1998). Validity and reliability of subjective
social indicators: The effect of different measures of association. Social Indicators
Research, 45, 173-199.
Sartori, G. (1984). Guidelines for concept analysis. In G. Sartori (Ed.), Social science
concepts: A systematic analysis (pp. 15-85). Beverly Hills, CA: Sage.
Schafer, J.L., & Graham, J.W. (2002). Missing data: Our view of the state of the art.
Psychological Methods, 7, 147-177.
Scherpenzeel, A.C. (1995). A question of quality: Evaluating survey questions by multitrait-
multimethod studies. Doctoral dissertation, University of Amsterdam, Amsterdam.
Schouwstra, S.J. (2000). On testing plausible threats to construct validity. Doctoral
dissertation, University of Amsterdam, Amsterdam.
Schuman, H., & Presser, S. (1981). Questions and answers in attitude surveys: Experiments on
question form, wording, and context. New York: Academic Press Inc.
Sheatsley, P.B. (1983). Questionnaire construction and item writing. In P.H. Rossi, J.D.
Wright, & A.B. Anderson (Eds.), Handbook of survey research (pp. 195-230). New
York: Academic Press Inc.
Sijtsma, K., & Molenaar, I.W. (2002). Introduction to nonparametric item response theory.
Thousand Oaks: Sage.
Sijtsma, K., & Van der Ark, L.A. (2003). Investigation and treatment of missing item scores
in test and questionnaire data. Multivariate Behavioral Research, 38, 503-528.
Sijtsma, K. (2006). Psychometrics in psychological research: Role model or partner in
science? Psychometrika, 71, 451-455.
Sijtsma, K., Emons, W.H.M., Bouwmeester, S., Nyklicek, I., & Roorda, L.D. (2008).
Nonparametric IRT analysis of quality-of-life scales and its application to the World
Health Organization Quality-of-Life Scale (WHOQOL-Bref). Quality of Life Research, 17,
275-290.
Singh, J., & Sirdeshmukh, D. (2000). Agency and trust mechanisms in consumer satisfaction
and loyalty judgments. Journal of the Academy of Marketing Science, 28, 150-167.
Soliman, H.M. (1970). Motivation-hygiene theory of job attitudes: An empirical investigation
and an attempt to reconcile both the one- and the two-factor theories of job attitudes.
Journal of Applied Psychology, 54, 452-461.
Stouthard, M.E.A., Mellenbergh, G.J., & Hoogstraten, J. (1993). Assessment of dental
anxiety: A facet approach. Anxiety, Stress, and Coping, 6, 89-105.
Sudman, S., & Bradburn, N.M. (1982). Asking questions: A practical guide to questionnaire
design. San Francisco: Jossey-Bass.
Tabachnick, B.G., & Fidell, L.S. (2007). Using multivariate statistics (5th ed.). Boston:
Pearson Education Inc.
Terpstra, M.J., & Van Gastel, W. (2004). Inventory of customer satisfaction surveys.
Unpublished report, ING Group, Amsterdam.
Terpstra, M.J. (2005). Customer satisfaction, customer loyalty and customer profitability.
Unpublished report, ING Group, Amsterdam.
Terpstra, M.J. (2006a). Customer satisfaction, customer loyalty, and recommendation
intentions. Unpublished report, ING Group, Amsterdam.
Terpstra, M.J. (2006b). Business facts for ING Retail Netherlands. Unpublished report, ING
Group, Amsterdam.
Terpstra, M.J. (2008). A model for developing customer satisfaction business cases.
Unpublished report, ING Group, Amsterdam.
Thomson, G. (1961). The inspiration of science. London: Oxford University Press.
Thorndike, E.L. (1920). A constant error in psychological ratings. Journal of Applied
Psychology, 4, 25-29.
Torgerson, W.S. (1958). Theory and methods of scaling. New York: John Wiley and Sons.
Tse, D.K., & Wilton, P.C. (1988). Models of consumer satisfaction: An extension. Journal of
Marketing Research, 25, 204-212.
Van der Ark, L.A. (2005). Stochastic ordering of the latent trait by the sum score under
various polytomous IRT models. Psychometrika, 70, 283-304.
Van Dolen, W., Lemmink, J., Mattsson, J., & Rhoen, I. (2001). Affective consumer responses
in service encounters: The emotional content in narratives of critical incidents. Journal
of Economic Psychology, 22, 359-376.
Van Herk, H. (2000). Equivalence in a cross-national context: Methodological & empirical
issues in marketing research. Doctoral dissertation, University of Tilburg, Tilburg.
Van Montfort, K., Masurel, E., & Van Rijn, I. (2000). Service satisfaction: An empirical
analysis of consumer satisfaction in financial services. The Service Industries Journal,
20, 80-94.
Van Ginkel, J.R. (2007). Multiple imputation for incomplete test, questionnaire, and survey
data. Doctoral dissertation, University of Tilburg, Tilburg.
Van Ginkel, J.R., Van der Ark, L.A., & Sijtsma, K. (2007). Multiple imputation of item
scores in test and questionnaire data, and influence on psychometric results. Multivariate
Behavioral Research, 42, 387-414.
Verhoef, P.C. (2001). Analysing customer relationships: Linking relational constructs and
marketing instruments to customer behavior. Doctoral dissertation, Erasmus University,
Rotterdam.
Westbrook, R.A., & Oliver, R.L. (1981). Developing better measures of consumer
satisfaction: Some preliminary results. In K.B. Monroe (Ed.), Advances in consumer
research (Vol. 8) (pp. 94-99). Ann Arbor, MI: Association for Consumer Research.
Westbrook, R.A., & Oliver, R.L. (1991). The dimensionality of consumption emotion patterns
and consumer satisfaction. Journal of Consumer Research, 18, 84-91.
Wirtz, J., & Bateson, J.E.G. (1995). An experimental investigation of halo effects in
satisfaction measures of service attributes. International Journal of Service Industry
Management, 6, 84-102.
Wirtz, J. (2000). An examination of the presence, magnitude and impact of halo on consumer
satisfaction measures. Journal of Retailing and Consumer Services, 7, 89-99.
Wirtz, J., & Lee, M.C. (2003). An examination of the quality and context-specific
applicability of commonly used customer satisfaction measures. Journal of Service Research,
5, 345-355.
Wittgenstein, L. (1953). Philosophische untersuchungen/Philosophical investigations. In M.
Derksen (2002). Filosofische onderzoekingen. Amsterdam: Boom.
Wittgenstein, L. (1958). The blue and brown books. In W. Oranje (1996). Het blauwe en het
bruine boek. Amsterdam: Boom.
Wolf, M.G. (1970). Need gratification theory: A theoretical reformulation of job
satisfaction/dissatisfaction and job motivation. Journal of Applied Psychology, 54, 87-
94.
Woodall, T. (2001). Six sigma and service quality: Christian Grönroos revisited. Journal of
Marketing Management, 17, 595-607.
Yi, Y. (1990). A critical review of consumer satisfaction. In V.A. Zeithaml (Ed.), Review of
marketing (pp. 68-123). Chicago: American Marketing Association.
Zeithaml, V.A., & Bitner, M.J. (1996). Services marketing. New York: McGraw Hill.
Zeithaml, V.A., Parasuraman, A., & Berry, L.L. (1990). Delivering quality service. New York:
The Free Press.
Summary
This thesis is about the measurement of customer satisfaction in the sector of financial
services provided by banks. Customer satisfaction is a topic of societal and economic
importance, which is also reflected in the extensive academic literature on the subject.
Satisfaction turns out to be difficult to define and to measure (Oliver, 1997, p. 13). This
justifies further research into the meaning and the measurement of satisfaction.
Psychological properties such as satisfaction are theoretical constructs and are inferred
from the behaviour of persons. In marketing research, psychological properties are mostly
measured by means of questionnaires. Marketing research often uses only a single question
to measure such properties, but psychometrics has shown that a single question covers the
property incompletely (Messick, 1989, p. 14). Furthermore, different marketing studies use
different definitions and operationalisations of particular properties. These factors hamper
the interpretation and comparability of results across studies.
Chapter 1 discusses the main problems in customer satisfaction research. These are the lack
of a well-elaborated definition of customer satisfaction, the poor validity of measurements
of customer satisfaction, and the lack of knowledge about the influence of customer
satisfaction on customer profitability. These problems are interrelated, because the lack of
a well-elaborated definition of satisfaction hampers the measurement of satisfaction, and
because the lack of valid measurements of satisfaction hampers the analysis of the influence
of satisfaction on customer profitability. This thesis aims to contribute to solving these
problems and, by extension, to the scientific theory of customer satisfaction and the
methodology of customer satisfaction research.
The first study in this thesis concerns the theoretical characteristics of psychological
properties and the measurement procedures for psychological properties. Psychological
properties are theoretical constructs. Psychological constructs such as satisfaction have a
particular linguistic and empirical meaning. The linguistic meaning of satisfaction is the
use of the term satisfaction in everyday and scientific language, and can be described in a
definition of satisfaction. The empirical meaning of satisfaction concerns the behaviours
that are associated with satisfaction, and forms the basis for measurements of satisfaction.
The measurement procedures for psychological properties are procedures for the use of
psychological measurement instruments, such as psychological tests and psychological
questionnaires, the construction of scales for measuring properties, and the scoring of
persons on the scales.
Chapter 2 discusses the definition of psychological properties, the development of
measurement instruments for psychological properties, the process of measuring psychological
properties, the construction of scales, and the quality of measurement values. The chapter
concludes with a discussion of different conceptions of construct validity. Following
Messick (1989, 1995), we conceive of construct validity as the appropriateness of
interpretations of scale scores in terms of the construct to be measured. This conception of
construct validity was the reason for choosing the deductive design for validating the
measurements of satisfaction.
The second study concerned the use of the properties satisfaction and dissatisfaction in the
literature. Chapter 3 gives an overview of the most important definitions and theories of
these properties in the marketing literature. It was established that satisfaction and
dissatisfaction are used to describe certain feelings and judgements of consumers. These
feelings and judgements constitute a response to the customer's experiences with, for
example, a product; the response is directed at this product and expresses an evaluation of
it. Satisfaction/dissatisfaction with a bank was defined as the customer's evaluative
response that is directed at the bank and is caused by the totality of the customer's
experiences with the bank. A positive response expresses satisfaction, and a negative
response expresses dissatisfaction. Finally, it was established that the existing customer
satisfaction questionnaires are hardly suitable for measuring satisfaction with a bank.
Chapter 4 discusses the deductive design (Schouwstra, 2000) for the development of
psychological questionnaires. The deductive design was used to develop a psychological
questionnaire for customer satisfaction with BANK, to formulate guidelines for administering
the questionnaire, to specify the measurement model for the construction of scales, and to
formulate hypotheses about properties of the scale scores. The questionnaire consisted of
nine closed questions about aspects of satisfaction/dissatisfaction with BANK. The monotone
homogeneity model (Mokken, 1971) was used to investigate the scalability of these items. The
hypotheses concerned properties of the scale scores, such as their freedom from bias and
their relation to measurements of other constructs. The fit of the measurement model and the
hypotheses were examined in two empirical studies.
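Scalability under the monotone homogeneity model is commonly summarised with Loevinger's
scalability coefficient H. The sketch below illustrates that coefficient only; it is not the
analysis procedure of the study. It assumes complete data, items scored as integers in the
same direction, at least two items with some variance, and the usual definition of H as the
sum of the observed inter-item covariances divided by the sum of their maxima given the item
marginals. The names `covariance` and `scale_h` are illustrative.

```python
from itertools import combinations


def covariance(a, b):
    """Population covariance of two equally long score sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n


def scale_h(rows):
    """Loevinger's H for a respondents-by-items table of item scores.

    The maximum covariance of an item pair, given the two marginal
    distributions, is obtained by sorting both items independently.
    """
    columns = list(zip(*rows))  # one tuple of scores per item
    observed = 0.0
    maximum = 0.0
    for a, b in combinations(columns, 2):
        observed += covariance(a, b)
        maximum += covariance(sorted(a), sorted(b))
    return observed / maximum  # undefined if no item pair has variance


# Hypothetical scores of four respondents on three items (perfectly ordered).
perfect = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [2, 1, 1]]
print(scale_h(perfect))
```

Values of H close to 1 indicate that respondents and items can be ordered consistently;
values near 0 indicate that the items hardly scale together.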
The third study was an empirical study of customer satisfaction with BANK. This was the
first empirical study. Its aims were to construct a scale for customer satisfaction, to
assess the appropriateness of interpreting scale scores as measurement values of customer
satisfaction with BANK, and to investigate the influence of customer satisfaction on
customer profitability. Chapter 5 describes the method of the study. The customer
satisfaction questionnaire was administered to a sample of 3600 customers of BANK, which
yielded 1689 respondents. The properties trust, quality, and loyalty were measured in the
same study. The data file was enriched with data on customer profitability at the time of
the study, after one year, and after two years.
The results of the first empirical study are reported in Chapter 6. According to the
monotone homogeneity model, customer satisfaction is measured on a unidimensional scale.
This refuted a view from the literature which holds that satisfaction and dissatisfaction
represent two separate dimensions. The tests of the hypotheses about the characteristics of
the scale scores supported the interpretation of the scale scores as measurement values of
customer satisfaction with BANK. The test of the hypothesis about the relation between
quality and customer satisfaction revealed a strong relation between the absence of problems
with BANK and satisfaction with BANK. This result confirms the importance of process quality
for customer satisfaction. Finally, positive effects of customer satisfaction on customer
profitability were found after one year and after two years. This result indicates that
satisfaction influences customer profitability.
The fourth study was an empirical study of customer satisfaction with BANK. This was the
second empirical study. The aim of this study was to determine whether the scale scores for
customer satisfaction were influenced by response styles, such as a general preference for
the middle response category of items or for the extreme response categories. For this
study, the customer satisfaction questionnaire was administered to a sample of almost 3000
customers of BANK, which yielded 1227 respondents. To measure the response styles, data were
also collected on, for example, the customer's expectations about the development of the
Dutch economy. Chapter 7 describes the method of the study.
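One simple way to quantify a preference for the middle or the extreme response categories is
to count, per respondent, the share of answers that fall in those categories across items
on unrelated topics. The sketch below illustrates that idea under stated assumptions; it is
not the operationalisation used in the study. It assumes answers coded 1 to 5, with `None`
for 'no answer', and the function name `response_style_indices` is hypothetical.

```python
def response_style_indices(answers, n_categories=5):
    """Share of midpoint and extreme answers for one respondent.

    answers: integer codes 1..n_categories, or None for 'no answer'.
    Returns None when the respondent gave no valid answer at all.
    With an even number of categories there is no midpoint, so the
    midpoint share is 0 by construction.
    """
    valid = [a for a in answers if a is not None]
    if not valid:
        return None
    midpoint = (n_categories + 1) / 2
    n_mid = sum(1 for a in valid if a == midpoint)
    n_extreme = sum(1 for a in valid if a in (1, n_categories))
    return {"midpoint": n_mid / len(valid), "extreme": n_extreme / len(valid)}


# Hypothetical respondent: six items, one left unanswered.
print(response_style_indices([3, 3, 1, 5, None, 2]))
```

Indices of this kind can then be related to the satisfaction scale scores to see whether
respondents with a strong style preference score systematically differently.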
The results of the second empirical study are reported in Chapter 8. The results showed that
the scale scores for customer satisfaction were somewhat biased by response styles. It can
therefore not be ruled out that response styles also slightly biased the scale scores for
customer satisfaction in the first empirical study. It is therefore recommended that, when
the questionnaire is used in follow-up research, measures be taken to correct for the
influence of these response styles.
Chapter 9 contains the general discussion. It was concluded that satisfaction with a bank
manifests itself in emotions, regret, expectations, disconfirmation, and rational
judgements. This is a useful result for scientific theory development on customer
satisfaction and for customer satisfaction management in financial services. It explains,
for example, why customer satisfaction is driven not only by the technical quality of what a
company delivers, but also by functional quality, that is, how a company delivers its
services, its communication with the customer, and the reputation of the company.
Furthermore, the research provides support for the theory about the influence of customer
satisfaction on customer revenues. This is a useful result for scientific theory development
and for strategy development in financial services. The use of modern psychometric methods
contributed to the development of a measurement instrument for customer satisfaction with
banks and to establishing the validity of the measurements of customer satisfaction. This is
a useful result for the methodology of scientific and applied customer satisfaction
research.
Appendix 1
Questionnaire for study 1
Question 0. Do you consider BANK your main bank?
  Yes / No
Question 1. Which financial products do you currently hold with BANK? Multiple answers are
possible. (response column: Yes)
  a Current account
  b Debit card
  c Credit card
  d Internet banking
  e Savings products
  f Investment products
  g Mortgage
  h Credit, loans (for consumer use)
  i Non-life insurance
  j Life insurance
Question 2. Through which channel or channels did you have contact with BANK in the past
year? Multiple answers are possible. (response column: Yes)
  a Branch employee
  b Adviser at home
  c Telephone1
  d Telephone2
  e Correspondence
  f E-mail
  g Internet
  h Internet banking1
  i Internet banking2
  j Other, namely ...
  k None
Question 3. (rotate statements) This block contains a number of statements about BANK.
Would you indicate for each statement to what extent you agree or disagree with it?
<5 response categories, from ‘strongly agree’ to ‘strongly disagree’, and 1 category ‘no
answer’>
  A I feel at home with BANK
  B I am satisfied with BANK
  C n/a
  D There are good reasons to leave BANK
  E I have mixed feelings about BANK
  F n/a
  G BANK meets all the requirements I set for a bank
  H n/a
  I n/a
Question 4. (rotate statements) When you think back to the past year, and in particular to
your experiences with BANK, would you now indicate for each statement to what extent you
agree or disagree with it?
<5 response categories, from ‘strongly agree’ to ‘strongly disagree’, and 1 category ‘no
answer’>
  A I had a pleasant relationship with BANK in the past year
  B BANK has met my expectations
  C I have regretted my choice for BANK
  D I had problems with BANK in the past year
Question 5. (rotate statements) A number of statements now follow about trust in the
services of BANK. Would you indicate for each statement to what extent you agree or disagree
with it?
<5 response categories, from ‘strongly agree’ to ‘strongly disagree’, and 1 response
category ‘no answer’>
  A I can count on BANK to treat me fairly
  B I can count on BANK to handle my affairs correctly
  C I can trust BANK to keep promises and agreements
  D I sometimes doubt the qualities of BANK
  E I sometimes doubt the good will of BANK
  F I can trust BANK
  G At BANK I can count on good service
Question 6. A number of statements now follow about problems with BANK. Can you indicate
whether you have had such a problem in the past year? Multiple answers are possible.
(response column: Yes)
  A Errors in the handling of your banking affairs
  B Errors in the processing of your orders
  C Insufficient information about your banking affairs
  D Unclear information about your banking affairs
  E Unreasonable charges for the use of services
  F Slow service
  G Slow transfers
  H Poor keeping of agreements by BANK
  I Insufficient accessibility by telephone
  J Insufficient accessibility via the internet
  K Insufficient accessibility of branches
  L Poor answering of your questions
  M Problems with bank cards
  N Problems with PIN transactions
  O Problems with internet banking
  P Another problem
  Q No problem
Question 7. (rotate statements) A number of aspects of the service of BANK now follow. Can
you rate the performance of BANK on these aspects on the basis of your personal experience?
<4 response categories, from ‘excellent’ to ‘poor’, and 1 response category ‘don’t know’>
  A The correct processing of the orders you give
  B The speed with which transfers are carried out
  C The speed of the service provided by BANK
  D The keeping of agreements and promises by BANK
  E The correct handling of your banking affairs
  F The frequency with which you receive account statements from BANK
Question 8. (rotate statements) A number of aspects of the products and services of BANK
now follow. Can you rate the performance of BANK on these aspects on the basis of your
personal experience?
<4 response categories, from ‘excellent’ to ‘poor’, and 1 response category ‘don’t know’>
  A The rates of the payment packages of BANK
  B The convenience of the products and services of BANK
  C The clarity of the information BANK provides to you about your banking affairs
  D The adequacy of the information BANK provides to you about your banking affairs
  E The charges BANK applies for the use of services
  F The interest rates of the products of BANK
Question 9. (rotate statements) A number of aspects of the service via the various channels
of BANK now follow. Can you rate the performance of BANK on these aspects on the basis of
your personal experience?
<4 response categories, from ‘excellent’ to ‘poor’, and 1 response category ‘don’t know’>
  A The service by telephone
  B The service via the internet
  C The service via the branch
  D The service by post/correspondence
  E The ease with which you can reach BANK
  F The facilities for internet banking
Question 10. (rotate statements) A number of aspects of contacts with BANK and employees of
BANK now follow. Can you rate the performance of BANK on these aspects on the basis of your
personal experience?
<4 response categories, from ‘excellent’ to ‘poor’, and 1 response category ‘don’t know’>
  A The friendliness of the employees of BANK
  B The expertise of the employees of BANK
  C The reliability of the employees of BANK
  D The extent to which BANK listens to your wishes and questions
  E The way in which BANK speaks to you
  F The way in which BANK handles complaints
Question 11. With which other banks do you have a relationship? Can you indicate for each
bank whether you have banking affairs there? By banking affairs we mean all kinds of banking
affairs, such as payments, saving, investing, borrowing, mortgages, insurance, internet
banking, et cetera. (response column: Yes)
  a Bank1
  b Bank2
  c Bank3
  d Bank4
  e Bank5
  f Bank6
  g Other bank, namely ...
  h No other bank
Question 12. (only BANK and the banks from question 11) How important is each of the
following banks to you?
<5 response categories, from ‘strongly agree’ to ‘strongly disagree’, and 1 category ‘no
answer’>
  A BANK
  B Bank …
Question 13. (only BANK and the banks from question 11) How satisfied are you with the
following banks?
<10 response categories, from ‘extremely dissatisfied’ to ‘extremely satisfied’, and 1
category ‘no answer’>
  A BANK
  B Bank …
Question 14. (rotate statements) A number of statements now follow about your attitude
towards BANK in comparison with other banks. Can you indicate for each statement to what
extent you agree or disagree with it?
<5 response categories, from ‘strongly agree’ to ‘strongly disagree’, and 1 response
category ‘no answer’>
  A If I need new banking products, BANK is my first choice
  B I have more sympathy for BANK than for other banks
  C For some things I am better off at another bank
  D I am considering switching from BANK to another bank
  E BANK offers me advantages that other banks do not offer
  F BANK has been my main bank for years
Question 15. n/a
Question 16. n/a
Question 17. How much interest do you have in banking affairs?
<5 response categories, from ‘much interest’ to ‘no interest’, and 1 category ‘no answer’>
  <answer>
Question 18. How much interest do you have in new financial products and services from
banks?
<5 response categories, from ‘much interest’ to ‘no interest’, and 1 category ‘no answer’>
  <answer>
Question 19. n/a
Question 20. (rotate questions 20b to 20e) This block contains a number of questions about
BANK. Would you answer each of these questions?
Question 20a n/a
Question 20b How satisfied are you with BANK? Please express your judgement as a grade,
where 1 means ‘extremely dissatisfied’ and 10 means ‘extremely satisfied’.
<10 response categories, from ‘extremely dissatisfied’ to ‘extremely satisfied’, and 1
category ‘no answer’>
  grade
Question 20c How well does BANK match your ideal image of a bank? Please express your
judgement as a grade, where 1 means ‘far from ideal’ and 10 means ‘ideal’.
<10 response categories, from ‘far from ideal’ to ‘ideal’, and 1 category ‘no answer’>
  grade
Question 20d How well has BANK met your expectations in the past year? Please express your
judgement as a grade, where 1 means ‘very poor’ and 10 means ‘excellent’.
<10 response categories, from ‘very poor’ to ‘excellent’, and 1 category ‘no answer’>
  grade
Question 20e n/a
Question 20f n/a
Appendix 2
E-mail accompanying study 1
Dear <salutation>,

We would like to invite you to take part in a questionnaire of the BANK-Klantenpanel. This
survey is about your satisfaction with BANK. You will be asked for your appreciation of
various aspects. You may recognise some questions that we also asked last year. We repeat
these questions to gain better insight into how customers think about BANK compared with
last year. By taking part you therefore help BANK to match its services better to your
wishes.

Some questions in this survey are very similar to one another. We ask for your
understanding. This is a deliberate choice, because this questionnaire also serves a
scientific purpose. BANK wants to find out how it can best study customer satisfaction. BANK
does this in cooperation with the University of Tilburg. If you answer the questions as you
are used to, you also help to develop our market research.

How does this survey work? If you click the link below, you are taken directly to the
questionnaire. Completing it takes about 20 minutes. You can take part in this survey up to
and including 12 October. For taking part in this survey you receive 10 points (value € 1).
These 10 points are added to your balance within 72 hours after you complete the
questionnaire. After at least two surveys you can use your points to order a nice gift or
donate your points to a charity: Artsen Zonder Grenzen, Natuurmonumenten or SOS
Kinderdorpen. With your personal number (UserID) and unique code (password) you can log in
to your personal page at www.BANK-klantenpanel.nl.

Click the link below to start the questionnaire.

Thank you very much for your cooperation with the BANK-Klantenpanel.

Kind regards,
helpdesk BANK-Klantenpanel
www.BANK-klantenpanel.nl
Appendix 3
Questionnaire for study 2
Question 1 Which financial products do you currently hold with BANK? Multiple answers are
possible. (response column: Yes)
  a Current account
  b Debit card
  c Credit card
  d Internet banking
  e Savings products
  f Investment products
  g Mortgage
  h Credit, loans (for consumer use)
  i Non-life insurance
  j Life insurance
Question 2 Through which channel or channels did you have contact with BANK in the past
year? Multiple answers are possible. (response column: Yes)
  a Branch employee
  b Adviser at home
  c Telephone1
  d Telephone2
  e Correspondence
  f E-mail
  g Internet
  h Internet banking
  i Other
  j None
Question 3 (rotate statements) This block contains a number of statements about BANK. Would
you indicate for each statement to what extent you agree or disagree with it?
<5 response categories, from ‘strongly agree’ to ‘strongly disagree’, and 1 category ‘no
answer’>
  A I feel at home with BANK
  B I am satisfied with BANK
  D There are good reasons to leave BANK
  E I have mixed feelings about BANK
  G BANK meets all the requirements I set for a bank
Question 4 (rotate statements)
Thinking back to the past year, and in particular to your experiences with BANK, please indicate for each statement to what extent you agree or disagree with it.
<5 response categories, from 'strongly agree' to 'strongly disagree', and 1 category 'no answer'>
A I had a pleasant relationship with BANK in the past year
B BANK met my expectations
C I regretted my choice of BANK
D I had problems with BANK in the past year

Question 5
Not applicable

Question 6 (rotate statements)
A number of statements follow about your expectations regarding the development of your purchasing power. For each statement, please indicate to what extent you agree or disagree with it.
<5 response categories, from 'strongly agree' to 'strongly disagree', and 1 category 'no answer'>
A I expect my purchasing power to improve in the coming year
B I expect my purchasing power to deteriorate in the coming year
C I expect my purchasing power to be better in 5 years than it is now
D I expect my purchasing power to be worse in 5 years than it is now

Question 7 (rotate statements)
A number of statements follow about your expectations regarding the economic development of the Netherlands. For each statement, please indicate to what extent you agree or disagree with it.
<5 response categories, from 'strongly agree' to 'strongly disagree', and 1 category 'no answer'>
A I expect the economy of the Netherlands to improve in the coming year
B I expect the economy of the Netherlands to deteriorate in the coming year
C I expect the economy of the Netherlands to be better in 5 years than it is now
D I expect the economy of the Netherlands to be worse in 5 years than it is now
Question 8 (rotate statements)
This block contains six statements about your attitude towards banking matters, such as payments, saving, borrowing, mortgages, insurance, investing, et cetera. For each statement, please indicate to what extent you agree or disagree with it.
<5 response categories, from 'strongly agree' to 'strongly disagree', and 1 category 'no answer'>
A I never worry about banking matters
B I consider banking matters very important
C Arranging banking matters well makes life easier
D I find banking matters tedious
E Banking matters leave me cold
F Arranging banking matters well can yield a lot of money

Question 9 (rotate statements)
This block contains four statements about the transparency of the financial market. For each statement, please indicate to what extent you agree or disagree with it.
<5 response categories, from 'strongly agree' to 'strongly disagree', and 1 category 'no answer'>
A I know the advantages and disadvantages of the banks in the Dutch market
B I find it difficult to judge the quality of BANK
C I find it difficult to compare the quality of different banks
D I know exactly what I can expect from BANK
Question 10
Not applicable