Explaining the Underlying Psychological
Factors of Consumer Behaviour with
Artificial Neural Networks
By
Max N Greene
A Thesis Submitted in Fulfilment of the Requirements for the Degree of
Doctor of Philosophy of Cardiff University
Marketing and Strategy Section of Cardiff Business School, Cardiff University
January 2016
DECLARATION
This work has not previously been accepted in substance for any degree and is not concurrently submitted in candidature for any degree.
Signed (candidate)
Date 29/01/2016
STATEMENT 1
This thesis is being submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy.
Signed (candidate)
Date 29/01/2016
STATEMENT 2
This thesis is the result of my own independent work/investigation, except where otherwise stated. Other sources are acknowledged by footnotes giving explicit references.
Signed (candidate)
Date 29/01/2016
STATEMENT 3
I hereby give consent for my thesis, if accepted, to be available for photocopying and for inter-library loan, and for the title and summary to be made available to outside organisations.
Signed (candidate)
Date 29/01/2016
Acknowledgements
I would like to express my sincere appreciation to my supervisors Professor
Gordon Foxall and Dr Peter Morgan for their continuous support and mentorship
throughout these past years. I am indebted to them for numerous hours of
discussion that helped formulate the ideas expressed here. They have always
offered tremendous encouragement for my research interests and helped my
development as a research scientist through their invaluable advice.
I would like to thank the academic, administrative, and technical staff members
of the Cardiff Business School who have always been there to help and advise in
their respective roles.
I am very grateful to my friends and family for their continuous support and
encouragement.
This research has been supported by the Economic and Social Research Council
and by the Marketing and Strategy section at Cardiff University Business School.
Data supplied by TNS UK Limited. The use of TNS UK Ltd data in this work does
not imply the endorsement of TNS UK Ltd in relation to the interpretation or
analysis of the data. All errors and omissions remain the responsibility of the
authors.
Summary
This thesis intends to advance our understanding of consumer behaviour, and
proposes an extension to the theoretical and methodological framework of the
Behavioural Perspective Model. Drawing on the intellectual tradition of
connectionism and employing advanced artificial neural network modelling
techniques, the research programme described here assesses the
appropriateness of connectionist architectures in explaining consumer behaviour.
This thesis traces the developments in the field of consumer behaviour analysis
to critically evaluate the significance of limitations inherited from radical
behaviourism, and proposes a hybrid connectionist approach to address these
methodological constraints.
The study is both highly quantitative and interpretative in nature, and generates a
large body of empirical evidence to support the methodological and theoretical
deliberations. Two types of data are used here: a simulated dataset to assess the
capacity of the pruning algorithms to reveal the underlying relations within the
data; and a consumer panel dataset to which the neural network algorithms are
applied to develop predictive, descriptive, and interpretative connectionist
models that aim to explain the consumer purchasing decision-making process.
It is beyond the scope of this research project to propose mechanisms that
explain all elements of the consumer purchasing decision; these remain to be
addressed as part of an ongoing collaborative research programme. The main
conclusion to be drawn from this work is that the connectionist framework and
artificial neural networks can be considered a significant contribution to the
extension of the theoretical and philosophical framework of intentional
behaviourism.
Contents
1. Introduction
1.1 The thesis structure and contents
2. Consumer behaviour
2.1 Explanation of consumer behaviour
2.2 Intentional Behaviourism explained
2.3 Radical behaviourism
2.3.1 Interpretative Behaviour Analysis
2.3.2 Plausibility
2.4 The Behavioural Perspective Model
2.4.1 The framework of Behavioural Perspective Model
2.4.2 Brand choice
2.4.3 Verbal and affective response
2.4.4 Attitudinal-behavioural consistency
2.5 Intentional psychology
2.5.1 Intentionality
2.5.2 Philosophy of intentionality
2.5.3 Intentional psychology
2.6 Intentional behaviourism
2.6.1 Cognitive psychology and radical behaviourism
2.6.2 Intentionality and behaviourism
2.6.3 Intentional behaviourism
2.7 Super-personal cognitive psychology
2.8 Cognitive interpretation of behaviour
2.8.1 Phenomenology
2.8.2 Interpretative phenomenological analysis
2.9 Summary
3. Connectionism and Artificial Neural Networks
3.1 Why Connectionism?
3.1.1 The ultimate purpose of AI
3.1.2 Networks and symbolic systems
3.1.3 The foundations of connectionism
3.1.4 Symbolic models
3.1.5 Connectionist models
3.1.6 Argument for symbolic models
3.1.7 Argument for connectionist models
3.1.8 The appeal of connectionist systems
3.2 The Neural Network architecture
3.2.1 Neural Network model structure
3.2.2 Neural Network architecture design features
3.3 Machine learning
3.3.1 Traditional approaches
3.3.2 Neural Networks approach
3.3.3 Difficulties with machine learning
3.4 Pattern recognition
3.4.1 Pattern recognition algorithms
3.4.2 Intentionality of pattern recognition
3.4.3 Categorisation with connectionist models
3.4.4 Pattern recognition in mental processes
3.5 Knowledge representation in connectionism
3.5.1 Knowledge representation in Cognitive Science
3.5.2 Types of knowledge
3.5.3 Expert knowledge
3.5.4 Vision knowledge
3.5.5 Logical inferences
3.6 Summary
4. Methods
4.1 Overview
4.1.1 NNs models and linear models
4.1.2 Architecture of NNs models
4.2 Research questions and hypotheses
4.3 Philosophical position
4.3.1 Consumer behaviour position
4.3.2 Ontology
4.3.3 Epistemology
4.3.4 Axiology
4.4 Research methods justification
4.4.1 Assessment of the quantitative method
4.4.2 Theoretical justification
4.4.3 Alternative approaches
4.5 Research method
4.5.1 Sample
4.5.2 Research design
4.5.3 Apparatus
4.5.4 Sequential account of the research process
4.6 Summary
5. Analysis
5.1 Preliminary data manipulations
5.2 Exploratory analysis
5.3 Regression and NNs comparative analysis
5.4 Exploratory NNs modelling
5.4.1 RSNNS
5.4.2 NeuralNetTools
5.5 Advanced connectionist models: hidden layers
5.6 Pruning
5.6.1 Assessing pruning performance using simulated data
5.6.2 Assessing pruning performance with consumer data
5.6.3 Pruning different network architectures
5.6.4 Assessing explanatory capacity of a connectionist model with a single hidden layer and 2 neurons
5.7 Connectionist model predictive capacity
5.8 BPM connectionist model
5.9 Summary
6. Interpretation of results
6.1 Informational and Utilitarian Reinforcement
6.2 Results discussion
6.2.1 Exploratory data examination
6.2.2 Results of regression and NNs comparative analysis
6.2.3 Exploratory NNs modelling results and discussion
6.3 Advanced connectionist models results and discussion
6.3.1 Connectionist network predictive capacity
6.3.2 Variable contribution analysis and explanatory models
6.4 Interpreting connectionist model output parameters and architecture
6.4.1 Number of hidden layers and nodes
6.4.2 Interpreting model weights
6.4.3 Pruning connectionist models
6.4.4 Pruning connectionist models: simulated data
6.4.5 Pruning connectionist models: consumer data
6.4.6 Concise explanatory connectionist model
6.4.7 Predictive connectionist model and pruning
6.4.8 Pruning network architecture for interpretation
6.5 Theoretical implications
6.6 Summary
7. Critical assessment of the research project
7.1 Connectionism and cognitive science
7.1.1 Artificial intelligence
7.2 Distributed representation
7.2.1 Memory
7.2.2 Generalisation
7.2.3 Learning history and behaviour continuity
7.3 The implications for consumer behaviour
7.3.1 Utilitarian and informational reinforcement, and NNs
7.4 Cognitive process simulation
7.4.1 Past-tense acquisition model
7.4.2 Kinship knowledge model
7.5 Phenomenological critique of computational models of intelligence
7.6 Summary
8. Future work
8.1 Individual respondent level of behaviour analysis
8.2 Multi-category behaviour analysis
8.3 Swarm intelligence methods
8.4 Consumer provided content and design
8.5 Collective consumer innovation
8.6 Commercial application and data mining
8.7 Summary
9. Concluding remarks
9.1 Contributions
9.2 Summary
List of Figures
Figure 1. Methodology of Intentional Behaviourism.
Figure 2. The Behavioural Perspective Model.
Figure 3. Operant Classes of Consumer Behaviour.
Figure 4. Regression coefficients and significance levels for regression-NNs validation subset 1.
Figure 5. NNs model results for regression-NNs validation subset 1.
Figure 6. Regression coefficients and significance levels for regression-NNs validation subset 2.
Figure 7. NNs model results for regression-NNs validation subset 2.
Figure 8. Connectionist network 54-1-1 architecture using consumer data with 1 hidden layer and a single neuron, 1000 iterations, no pruning.
Figure 9. Connectionist network 54-2-1 architecture using consumer data with a single hidden layer and 2 neurons, 1000 iterations, no pruning.
Figure 10. Connectionist network 54-4-1 architecture using consumer data with a single hidden layer and 4 neurons, 1000 iterations, no pruning.
Figure 11. Connectionist network 54-4-2-1 architecture using consumer data with 2 hidden layers and 4-2 hidden neurons, 1000 iterations, no pruning.
Figure 12. Connectionist network 54-8-4-2-1 architecture using consumer data with 3 hidden layers and 8-4-2 hidden neurons, 1000 iterations, no pruning.
Figure 13. Visual representation of y = sin(x1 - 2 * x2) + 1 + rnorm(1000, 0, 0.2).
Figure 14. Connectionist model with 2-2-2-2-2-1 architecture design, no pruning used in model development.
Figure 15. Connectionist model with 2-2-2-2-2-1 architecture design, pruning used in model development.
Figure 16. NNs model with 2-2-1 nodes in a single hidden layer, no pruning.
Figure 17. Connectionist network 54-8-8-8-1 architecture using consumer data with 3 hidden layers and 8 neurons each, 100 000 iterations, no pruning.
Figure 18. Connectionist network 54-8-8-8-1 architecture using consumer data with 3 hidden layers and 8 neurons each, 1000 iterations, no pruning.
Figure 19. Connectionist network 54-8-8-8-1 architecture using consumer data with 3 hidden layers and 8 neurons each, 1000 iterations, pruning with 100 retrain cycles.
Figure 20. Connectionist network 54-8-8-8-1 architecture using consumer data with 3 hidden layers and 8 neurons each, 1000 iterations, pruning with 250 retrain cycles.
Figure 21. Connectionist network 54-4-4-4-1 architecture using consumer data with 3 hidden layers and 4 neurons each, 10000 iterations, no pruning.
Figure 22. Connectionist network 54-4-4-4-1 architecture using consumer data with 3 hidden layers and 4 neurons each, 10000 iterations, pruning with 100 retrain cycles.
Figure 23. Connectionist network 54-4-4-4-1 architecture using consumer data with 3 hidden layers and 4 neurons each, 10000 iterations, pruning with 100 retrain cycles.
Figure 24. Connectionist network 54-6-4-2-1 architecture using consumer data with 3 hidden layers and 4 neurons each, 10000 iterations, pruning with 100 retrain cycles.
Figure 25. Connectionist network 54-2-4-6-1 architecture using consumer data with 3 hidden layers and 4 neurons each, 10000 iterations, pruning with 100 retrain cycles.
Figure 26. Connectionist network 54-2-1 architecture using consumer data with a single hidden layer and 2 neurons, 1000 iterations, no pruning.
Figure 27. Connectionist network 54-2-1 architecture using consumer data with a single hidden layer and 2 neurons, 1000 iterations, pruning with 100 retrain cycles.
Figure 28. Connectionist network 54-2-1 architecture using consumer data with a single hidden layer and 2 neurons, 10 000 iterations, pruning with 100 retrain cycles.
Figure 29. Connectionist network 54-2-1 architecture using consumer data with a single hidden layer and 2 neurons, 10 000 iterations, no pruning.
Figure 30. Connectionist network 54-2-1 architecture using consumer data with a single hidden layer and 2 neurons, 10 000 iterations, pruning with 100 retrain cycles.
Figure 31. Connectionist network 54-4-4-4-1 architecture using consumer data with 3 hidden layers and 4 neurons each, 100 000 iterations, no pruning.
Figure 32. Connectionist network 54-4-4-4-1 architecture using consumer data with 3 hidden layers and 4 neurons each, 100 000 iterations, pruning with 100 retrain cycles.
1. Introduction
The study of consumer behaviour is an intricate and complex undertaking, and
may often involve countless factors and variables that have an impact on the
consumer and their physical and social environment: individuals, groups, and
organisations participate in a number of processes in order to select
products, services, experiences, or even ideas that would satisfy their
specific needs. Because of this complexity, it is a multidisciplinary
endeavour that draws together elements of psychology, marketing, economics,
and artificial sciences. The overall goal is to explain the consumer
decision-making process, and to provide a plausible model at the underlying
individual level from both psychological and physiological points of view. The
classical approach would normally involve a comprehensive interrogation of
variables that may offer a degree of predictive and descriptive capacity, in
order to identify the strength and significance of their linear relationship
with the ultimate consumer choice. More recent approaches go a step further and
employ advanced concepts of distributed representation to examine consumer
behaviour as an emergent process that results from learning continuity.
Nevertheless, consumer behaviour remains extremely difficult to predict and
explain. This serves as a motivating factor to continue advancing research in
this direction and, as a result, to continuously consider and critically review
innovative methods and applications that extend the current body of knowledge.
The subject matter is complex, and would benefit from comprehensive analysis at
the individual level: not only at the level of behaviour, but also at the level
of consumer learning and the mind, and even at the level of neurophysiological
information processing. This complexity often triggers a scientific
decomposition of complex phenomena so that the constituent elements of the
process can be studied independently. As a result, the scientific account is
either fragmented and incomplete, or provides a variant of a Black Box Model in
which certain elements of the overall process are either assumed or otherwise
effectively disregarded. To overcome this, a robust and comprehensive
theoretical and empirical framework to describe and explain consumer behaviour
and the underlying psychological and physiological factors would be
indispensable. Any progress in this direction would not only facilitate
progress in the field of consumer behaviour, but could also be extended to a
wider context and contribute to the explanation and understanding of such
fundamental phenomena as learning, intelligence, and cognition, both human and
artificial.
1.1 The thesis structure and contents
Chapter 1 introduces the research project and briefly outlines the structure and
contents. The chapter provides a discussion around the overall motivation for the
research project, and the focus of the research is succinctly discussed and
summarised.
Chapters 2 and 3 set out the wider context for the research project. By the
nature of the research questions addressed here, the inquiry touches upon a
number of disciplines and research areas, resulting in a multidisciplinary work
that embraces elements of critical behaviourism and cognitive sciences,
traditional and neural network modelling approaches, and theoretical frameworks
that propose to extend the established theory around the Behavioural
Perspective Model with connectionist architectures. Chapter 2 provides an
overview of the field of consumer behaviour, discusses the theoretical and
philosophical frameworks of radical behaviourism, and introduces the
Behavioural Perspective Model. Chapter 3 extends the discussion into the
science of the artificial, and introduces the field of artificial intelligence.
The capacity of symbolic modelling methods is discussed at length and compared
with that of connectionist networks, offering a detailed account of neural
network modelling techniques and architecture optimisation algorithms.
In Chapter 4, the research methods are explained in detail. The chapter provides
an overview of the research questions and research methods employed as part of
this research project. Here the research questions are proposed, followed by the
discussion that aims to establish the philosophical position adopted here. Next,
the research method is outlined, and the data structures and variables are
described. The modelling approach is explained and justified, and the research
process is outlined in a sequential manner.
Chapter 5 covers the analysis part of the project, providing a comprehensive
account of the research methods employed. The statistical analyses employed
throughout this research project are discussed in detail, and the specifics of the
models developed in the course of the research project are described. The testing
procedure to support the line of inquiry is explained in detail, offering an
overview of the results.
In Chapter 6, the results are discussed and interpreted within the wider context
of consumer behaviour in general. The opening part of the discussion revolves
around variable contribution and advanced connectionist modelling. The
discussion is then extended to argue the appropriateness of connectionist
modelling for providing an explanatory and interpretative account of consumer
behaviour, using pruning algorithms to optimise the network and expose its core
architecture. Theoretical implications are then discussed before the arguments
are reviewed in the next chapter.
Chapter 7 offers a critical assessment of the research project: it considers
its precision, thoroughness, and contribution, and compares it with its closest
rival, the tradition of cognitive science.
In Chapter 8, a number of possible future research directions are identified and
briefly discussed. These range from purely commercial applications that would
apply and test the methods proposed here in industry, to highly theoretical and
philosophical endeavours that would aim to explore the concept of distributed
representation further and extend the line of inquiry into the field of swarm
intelligence.
Finally, Chapter 9 provides closing remarks, touching upon the contributions this
research project aims to offer, and concludes with a summary of the research
project.
2. Consumer behaviour
The interdisciplinary nature of academic marketing implies the tendency to adopt
the perspectives and methodological techniques from the established fields such
as economics and psychology rather than relying on the deliberately developed
theoretical foundations. Thus, the philosophical and methodological assumptions
tend to reflect the original body of inquiry and provide for marketing a
misrepresentative and transitory theoretical foundation. In contrast, consumer
behaviour analysis is concerned with phenomena central to marketing – the
explanation of consumer choice – and offers a cohesive philosophical and
theoretical foundation for the inquiry. Borrowing largely from the paradigms of
cognitive psychology and behaviourism, which are already widely adopted in
academic marketing, consumer behaviour analysis provides a framework for
explaining consumer behaviour in its natural environment.
The research programme generates a body of knowledge concerned with the
adequacy of radical behaviourism in explaining consumer choice, and involves
advances in theory and philosophy of behaviour analysis and modelling of
consumer behaviour, and offers means for consumer behaviour interpretation
based on empirical research (Foxall, 2005). In doing so, the research programme
aims to determine the degree to which consumer behaviour can be sufficiently
explained with radical behaviourism, and subsequently offers an extension of
theory from other fields of inquiry. The resulting efforts manifest in the
development of the Behavioural Perspective Model (BPM) of consumer choice and lead to empirical
7
research in consumer behaviour analysis and patterns of consumer choice; and
offer novel ways for interpretation of consumer behaviour and extending the
theoretical and philosophical base. Thus, it is possible to predict consumer
behaviour to certain extent, and to demonstrate an insight into what contributing
factors control it, but not so much to explain the behaviour beyond the
identification of controlling stimulus conditions.
The conceptual framework developed in an attempt to provide an explanation of
complex human behaviour is discussed in this chapter. Consumer behaviour in
particular is the focus of this research project, which will draw upon the
conceptual fields of behavioural economics, psychology, biology, and philosophy,
aimed at building a unified interdisciplinary model of consumer choice.
As it may seem that a strictly behaviourist approach would not be able to provide
a sufficient account of consumer behaviour on the individual consumer level, the
concepts of intentional behaviourism and super-personal cognitive psychology are
introduced (Foxall, 2004). This is further developed and explored in
Understanding Consumer Choice (Foxall, 2005) and Interpreting Consumer
Choice (Foxall, 2009), where consumption patterns are explored empirically
employing the above mentioned concepts to expose the role of contextual-
intentional psychology in everyday consumer decision making. In this chapter, the
underlying philosophical assumptions are discussed in the context of theoretical
and empirical aspects of consumer behaviour analysis.
2.1 Explanation of consumer behaviour
Even though some consumer behaviours can be identified in a rather obvious
manner and marketing strategies are used by retailers in attempts to condition
the customer, the psychological approach of radical behaviourism has a lot to
offer in terms of marketing concepts. It is necessary to explain consumer
behaviour by employing the social and behavioural sciences and to identify the
deterministic factors that influence the decision environment. Radical
behaviourism is primarily
concerned with explanation of behaviour, assuming that behaviour is explained
through environmental stimuli that predict the behaviour – the basic notion that
carries not only the explanatory power, but also the comparative means to
critically review and consider other methods of explaining consumer behaviour as
a social, economic, and biological phenomenon. Thus, radical behaviourism can
be viewed not as a sufficient element capable of providing a comprehensive
account of human behaviour, but rather as an essential constituent in the
theoretical and empirical pursuit of developing such system.
The research programme led by Foxall provides a critical review of radical
behaviourism, while developing the theoretical framework that suggests the
possibility of prediction, control, and explanation of behaviour in a process of
contextualization of behaviourism in a broader scientific explanation. As a result
of the research programme, new theories of human behaviour have been
introduced based on the principle of identifying the patterns of operant
behaviour in the selective environment that shapes and maintains the behaviour
while surpassing the theoretical constraints of radical behaviourism: intentional
behaviourism and super-personal cognitive psychology.
In its early years, the research programme focused on theoretical developments
consisting of a critique of the then-predominant cognitive view of consumer
research from the radical behaviourist perspective, followed by establishing
the basis for an alternative interpretation of consumer behaviour: the
Behavioural Perspective Model (Foxall, 1990). Subsequently, empirical
investigation was undertaken to substantiate the proposed argument that
establishes the behaviourist interpretation of consumer choice as a powerful
alternative to cognitive theory and other non-behaviourist reasoning, and to
provide an understanding of what behaviourism and other approaches are each able
to offer exclusively, identifying not only the inadequacies and limitations of
radical behaviourism that need to be supplemented, but also creating a basis for
evaluating the alternative non-behaviourist approaches.
The research programme aims to establish the means by which a reliable
scientific interpretation of consumer choice is possible, as one of multiple
interpretations in the relativist sense, all subsequently tested with the
scientific method with the purpose of producing a comprehensive body of
knowledge.
2.2 Intentional Behaviourism explained
Before its elements are explained in detail in the following sections, it is
important to outline the methodology of the Intentional Behaviourism research
programme, which comprises three conceptual stages as shown in Figure 1
(Foxall, 2016).
Figure 1. Methodology of Intentional Behaviourism.
Stage One is Theoretical Minimalism/Contextual Explanation, and is primarily
associated with the extensional behavioural science used to delineate the scope
of behaviourist explanation. Choice is represented at this stage as selection
amongst alternatives by carrying out one type of behaviour as a proportion of all
behaviour instances. Reliant on extensional explanation and three-term
contingency of radical behaviourism, behaviour is evaluated empirically
employing experimental and statistical methods by deconstructing observed
behaviour and identifying factors responsible for prediction and control.
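The Stage One representation of choice as a proportion of all behaviour
instances can be illustrated with a minimal sketch. The function name, the
brand labels, and the purchase log below are hypothetical and serve only to
show the arithmetic; they are not taken from the research programme's data.

```python
from collections import Counter


def choice_proportions(observations):
    """Return each alternative's share of all observed behaviour instances."""
    counts = Counter(observations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {alternative: n / total for alternative, n in counts.items()}


# Hypothetical purchase log: brand labels are illustrative only.
purchases = ["A", "B", "A", "A", "C", "B", "A", "A"]
shares = choice_proportions(purchases)
```

Here the share of brand A is simply its five purchases divided by the eight
observed purchases; the same calculation applies to any set of mutually
exclusive response alternatives.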
The purpose here is twofold: as part of delineation of behaviourist explanation of
consumer choice, it is identified which particular aspects of behaviour are
inadequately explained by behaviourism, and determined what form would be
required for the intentional explanation to interpret the behaviour. Often the
stimulus conditions required for prediction and control of behaviour are
inaccessible, and a number of behaviourist limitations become apparent: the
inability to account for continuity of behaviour, the absence of a personal
level of explanation, and the delineation of behaviourist explanation. The
continuity of behaviour issue in behaviourism occurs when the antecedent and
consequent stimuli normally used to explain behaviour with an n-term contingency
are inaccessible at the interpersonal level of explanation. The personal level
of explanation is a behaviourist response to
phenomenological subjectivism aimed to provide a behaviourist account for
thoughts, feelings, and other private intentional constructs, which cannot be
explained with extensional terms. Delineation of behaviourism would aim to
establish the scope and limits for the behaviourist approach to explain behaviour
(Foxall, 2016).
Stage Two is the Intentional Interpretation, where the consumer is treated as an
intentional system and attributed with the intentionality to maximise utility
relying on the learning history within a given behaviour setting. While the
account of behaviour here attributes a set of thoughts and emotions to the
consumer as part of the intentional interpretation, this is consistent with the
results observed as part of the empirical modelling programme in the Theoretical
Minimalism stage. The possibility of referring to this empirically grounded
foundation serves as a constraining element for the intentional interpretation
of consumer behaviour to restrain psychological speculation (Foxall, 2016).
In Stage Three, Cognitive Interpretation, an empirically supported model of the
underlying cognitive structure and functioning is proposed, consistent with the
intentional interpretation account of behaviour offered in Stage Two. The aim
here is twofold: to develop a micro-cognitive psychological construct consistent
with both the intentional interpretation and the sub-personal level of
consideration in the neuroscientific sense; and to define a macro-cognitive
psychology that demonstrates the consistency of the intentional interpretation
with the super-personal level as studied by behavioural scientists (Foxall,
2016).
Although intentionality is not ascribed to sub-personal and super-personal levels
of explanation, it is an essential point of Intentional Behaviourism that
intentional interpretation must be corroborated and supported by the
extensional account of behaviour typified by empirical scientific method of radical
behaviourism and neuroscience (Foxall, 2016).
The elements of Intentional Behaviourism will be discussed in detail in the
following sections.
2.3 Radical behaviourism
Radical behaviourism, the metatheory of behaviour analysis, is established on the
principle that objective and empirical methods of natural sciences can be applied
to the analysis of human behaviour; and states that the behaviour is explained
when the environmental factors that influence the rate of repeat behaviour are
identified, and response can be predicted and controlled through the
manipulation of reinforcement contingencies. Notice that no causal reference is
made to the internal states or processes or events such as mood or intention or
personality traits as is common in cognitive theories – they are not ignored in
behaviourism but rather are classed as responses that require a separate
explanation, rejecting the incomplete theoretic development reliant on mental or
conceptual entities that can be said to reside at other than observed behavioural
level. Such theories can be considered incomplete as they fail to identify the
factors that account for internal processes and events such as environmental
precursors causal to behaviour; fictional as they tend to infer the internal causes
from the behaviours they are supposed to explain; and superfluous as
behaviourism provides a simpler explanation of behaviour through environmental
factors that control behaviour without relying on fictional inference.
In behaviour analysis, the stimuli that are said to control behaviour need be
explicitly described and related to the rate of response. In the laboratory setting
of experimental operant psychology with animal subjects, it is possible to
establish explicitly the discriminative and reinforcing stimuli and their causal
relationships with response rate in order to identify the elements of controlling
contingencies, and predict response rates according to the reinforcement
schedules, thus explaining simple behaviours by reference to their environmental
antecedents and consequences. The three-term contingency is able to succinctly
describe the causal mechanisms of behaviour analysis: the (1) discriminative
stimuli in the presence of which (2) the response is emitted, and the (3)
reinforcing or punishing consequence produced as a result.
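The three-term contingency described above can be written down as a small data
structure. The following is a minimal sketch: the class and field names are
assumptions introduced for illustration, and the rule that reinforcement raises
and punishment lowers the rate of response is the standard operant reading
rather than a model taken from this thesis.

```python
from dataclasses import dataclass
from enum import Enum


class Consequence(Enum):
    REINFORCING = "reinforcing"
    PUNISHING = "punishing"


@dataclass(frozen=True)
class ThreeTermContingency:
    discriminative_stimulus: str  # (1) stimulus in whose presence the response occurs
    response: str                 # (2) the emitted response
    consequence: Consequence      # (3) reinforcing or punishing outcome


def expected_rate_change(contingency: ThreeTermContingency) -> str:
    """Direction in which the rate of response is expected to move."""
    if contingency.consequence is Consequence.REINFORCING:
        return "increase"
    return "decrease"


# Hypothetical laboratory example with an animal subject.
pecking = ThreeTermContingency(
    discriminative_stimulus="key light on",
    response="peck key",
    consequence=Consequence.REINFORCING,
)
direction = expected_rate_change(pecking)
```

Keeping the three terms as separate fields mirrors the requirement that the
controlling stimuli be explicitly described and related to the rate of response.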
In the context of complex human social interaction, however, unlike the
laboratory setting, it may often be impossible to isolate the environmental
contingencies controlling behaviour with any degree of certainty, so the
analysis resorts to conceptual extrapolation of the learning process derived
from animal operant conditioning studies. Experimental laboratory operant
analysis is said to
provide the scientific explanation of behaviour, which is then extended to suggest
not an explanation but rather a plausible interpretation of more complex
behaviour situations.
2.3.1 Interpretative Behaviour Analysis
Skinner (reprinted in 2014) argues that even though it may not be possible to
determine the contingencies that control response rate in complex behaviours
with any degree of accuracy and precision comparable to laboratory
experimentation, it is feasible to offer a plausible account of complex behaviours,
such as verbal behaviour, based on the extension of the scientific laws formulated
during the analysis of smaller decomposed problems. The behaviour analytic
interpretation, even if unverifiable experimentally, is preferable nonetheless to
the interpretations that are not supported by the experimental work on smaller
decomposed phenomena. Astronomy could be used to illustrate the same
principle, where the main source of information about celestial bodies and other
objects is electromagnetic radiation, and numerical models are employed to
reveal the existence of phenomena and effects otherwise unobservable. Thus,
the extension of learning principles of human and non-human operant responses
derived in the scientific laboratory setting is employed in the extended behaviour
analytic account (derived from the scientific knowledge) of human behaviour in
complex social situations.
2.3.2 Plausibility
During the process of interpretation, operant analysis is inevitably altered –
behaviourism speaks of plausibility in terms of persuasive power of the
interpretative account thus offered referring to the larger body of experimental
inquiry to support the claim, where interpretation is not equal to extrapolation.
The consumer behaviour model is then assessed on the explanatory plausibility of
the variables to provide a persuasive account that demonstrates behaviour
patterns, and the empirical correspondence and ability to derive operational
variables useful in further investigation. The plausibility of interpretative account
relies on its empirical correspondence with the objectively acquired information.
All science relies on interpretation when explanation is not possible, as it
often dictates the route of investigation and explanation, and is an inherent
component in the synthesis and amalgamation of information.
As compared with operant behaviourism that predominantly specializes in simple
observable behaviours with empirically identifiable determinants such as pigeon
pecking, interpretative accounts of complexity based on the findings from simpler
scientific experiments offer a qualitatively different type of knowledge. This
fundamental difference explains why basic research objectives and technology
requirements set forth by behaviourists – prediction and control in particular –
are not the sole focal point while extending the operant principles of explanation
in the form of interpretative plausible account.
The BPM is able to demonstrate that only minimal modifications of the basic
behaviour analytical model are required to offer a plausible interpretation of
complex behaviours within a critically derived behaviour analytic framework and
only deviating from radical behaviourism in a manner of extending it based on
logical criticism and empirical evidence. One of the basic guiding principles of
BPM is the proposition that human behaviour in any setting can be operant,
where the continuity of environmental influence in a variety of situations is a
fundamental assumption.
2.4 The Behavioural Perspective Model
The Behavioural Perspective Model extends the framework of behaviour analysis
by recognising the discriminative stimuli and reinforcers as independent variables
that determine the response schedule, and relating them to the rate of response
in a purchasing decision setting as a dependent variable. As a result, the model
is able to provide an interpretation of a complex social consumer behaviour
situation in a manner complementary to alternative (predominantly cognitive)
approaches by considering the ontological and methodological aspects. The BPM
ultimately aims to develop a comprehensive account of consumer behaviour
that would incorporate a synthesis of both cognitive and behavioural
explanations in two ways: (1) by providing a plausible reference to the
conceptualization of situational influences on consumer behaviour, and (2) by
suggesting novel considerations for marketing strategy. Cognitive decision
science assumes a purchasing decision to be the result of a goal-directed
process in which consumers deliberately plan the course of action and utilize
resources to acquire a desired benefit; in this process, external influences on
consumer choice are not sufficiently taken into consideration, and the decision
setting is inadvertently decontextualized as a result. The few attempts to
explain consumer behaviour in terms of external stimuli suggest the prospect of
developing a unified framework, but fail to propose a model of purchasing
decision-making that is both based on the empirical foundation of operant
psychology and relevant to marketing and strategy. The theoretical and
conceptual framework of the BPM is developed to address these concerns.
Moreover, the model
associates the previously identified and described patterns of behaviour with
appropriate contingencies in a reliable manner, offering behavioural
interpretation of consumer behaviour that is not postulated in conflict with
alternative theories of explanation, and embracing the multiplicity of explanatory
mechanisms in a relativist sense. The current research programme is tasked with the
synthesis of intrapersonal and environmental causes of behaviour, where the
relationship between the utilitarian and informational reinforcement and the
affective cognitive theory is contemplated. The applied nature of BPM offers
considerable benefit to the field of marketing research as the inherent
categorization of contingencies that influence consumer decision-making and
explanatory variables are practically applied in customer-centric marketing
strategy.
Operant behaviourism as a school of psychology, as opposed to cognitivism and
phenomenology, is better defined in terms of coherent philosophy of science,
subject matter, methodology, and explanatory power – the notion that suggests
operant behaviourism as a particularly well suited theoretical approach for the
field of consumer research. Moreover, it has been shown that economic
behaviour of animals is operant; and some consumer researchers have
considered the possibility of applying behaviour analysis to purchasing
decision-making and the process of consumption. It is unclear therefore why
operant behaviourism is not commonly used in explaining consumption in terms
of situational and environmental influences, and why the comprehensive
theoretical perspective of consumer psychology able to deal with the situational
effects on choice has never evolved despite the research on consumer task
orientation, temporal perspective, antecedent states, and physical and social
consumer surroundings. Several factors that contribute to attribution of
oversimplified nature to behaviour analysis may be responsible in this matter.
Firstly, given the interdisciplinary relativist nature of consumer psychology, it may
seem odd to have behaviourism excluded as one possible explanation of
behaviour because of the misattributed idea that behaviourism was superseded
by the cognitive paradigm due to inherent ontological restrictions of
behaviourism proving it inadequate as a mental discipline. On the contrary,
behaviourism is able to provide a philosophical outlook from which other
approaches may be critiqued, and encourage empirical data otherwise
unavailable to be generated – as recognized by consumer behaviour researchers.
Secondly, researchers have largely failed to account for the great difference
between human and non-human cognitive capacity and the human ability to
create and adopt rules that describe contingencies of reinforcement while
extrapolating from the general findings with animals to human consumer
decision-making, assuming an unwarranted degree of continuity between animal
and human behaviour. Thirdly, the overall complexity of the human consumption
environment and the non-price elements of marketing have been largely
overlooked, with research focusing instead almost exclusively on the effects of
price on demand. On the contrary, in advanced modern economies demand is
contrived and deliberately created by most of the marketing effort in a
consumption environment that contains a vast number of choice alternatives
available to human consumers. Finally, it is common practice in marketing and
related disciplines to disregard the philosophical and explanatory implications of
behaviourism, rather directing research efforts towards the use of reinforcement
schedules to increase the rate of retail purchasing, often forgetting that
behaviourism is not the science of human behaviour, but the philosophy of the
science of human behaviour. Thus, BPM is set to address these issues.
Unlike marketing and consumer psychology that merely draw upon certain
aspects of behaviour analysis, the objective of BPM research programme is to
develop a critical understanding of consumption by developing as complete a
model of consumer behaviour based on behaviour analytical method as is
reasonably possible, establishing in the process the extent to which behaviour
analysis alone is able to explain consumer behaviour, and determining where it
may be practical to modify and extend the framework of analysis. The research
programme is also concerned with evaluating the ability of the behaviourist
account to explain complex consumer behaviours, and its contribution to the
advancement of consumer psychology. In doing so, the BPM distinguishes
relatively closed consumer behaviour settings, where the environment is similar
to that of operant conditioning, from relatively open consumer behaviour
settings, where the environment is rich with explanations of behaviour
alternative to operant conditioning. Furthermore, the model recognizes not only
utilitarian reinforcement, such as the pleasurable and utilitarian consequences
of behaviour, but also informational reinforcement, which can take the form of
feedback from other members of the social system, for example. Most
importantly, the behaviours are
attributed to proximal latent internal causal elements such as verbal
discriminative stimuli in covert rules of behaviour and reinforcement
contingencies; as well as environmental determinants of consumer behaviour
that require both analysis and interpretation. Thus, the BPM aims to assess the
adequacy of behaviour analysis to provide a rigorous scientific account of
purchasing and consumption phenomena in a complex setting with multiple
sources of causation.
2.4.1 The framework of Behavioural Perspective Model
Patterns of consumer choice are related to the differing environmental
consequences in BPM (Foxall, 1990, 2004, 2005, 2009), proposing three kinds of
consumer behaviour consequence: (1) utilitarian reinforcement representative of
benefits from buying or owning the product, (2) informational reinforcement
represented by the social aspects of consumption, and (3) aversive consequence
posited by such events as relinquishing money and product alternatives (see
Figure 2). Antecedent events comprised of any physical, social, or temporal
elements that serve as signals for potential consequence form the behaviour
setting continuum capable of either facilitating or inhibiting the consumer choice,
ranging from an open setting that offers great freedom for a consumer to act and
choose to a closed setting where consumer behaviour is largely dictated by
agents other than the consumer. The consumer is represented through their learning
history that takes into account the aggregate consequence of previous
behaviours and the present state of the consumer.
Figure 2. The Behavioural Perspective Model.
The combination of high/low utilitarian (UR) and informational reinforcement (IR)
identifies four distinct classes of consumer behaviour (see Figure 3): (1) low
utilitarian and informational reinforcement constitutes Maintenance and involves
the satisfaction of basic needs, (2) low utilitarian reinforcement and high
informational reinforcement constitutes Accumulation such as saving or
collecting, (3) high utilitarian reinforcement and low informational reinforcement
constitutes Hedonism such as consumption of a popular product, and (4) high
utilitarian and informational reinforcement represents Accomplishment reflective
of social and economic achievement (Foxall, 2009). The addition of behaviour
setting continuum provides eight contingency categories as a functional
consumer behaviour analysis and as means of interpreting factors of consumer
behaviour.
Figure 3. Operant Classes of Consumer Behaviour.
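The classification above can be sketched as a simple lookup. The mapping of
high/low utilitarian and informational reinforcement to the four operant classes
follows the text; the function names and the "closed"/"open" labels used to
discretize the behaviour setting continuum are assumptions made for
illustration.

```python
from itertools import product

# Keys are (high utilitarian reinforcement, high informational reinforcement).
OPERANT_CLASSES = {
    (False, False): "Maintenance",
    (False, True): "Accumulation",
    (True, False): "Hedonism",
    (True, True): "Accomplishment",
}


def operant_class(high_utilitarian: bool, high_informational: bool) -> str:
    """Return the operant class for a combination of reinforcement levels."""
    return OPERANT_CLASSES[(high_utilitarian, high_informational)]


def contingency_categories():
    """Cross the four operant classes with a two-valued behaviour setting."""
    return [
        (operant_class(ur, ir), setting)
        for (ur, ir), setting in product(OPERANT_CLASSES, ("closed", "open"))
    ]


categories = contingency_categories()
```

Treating the setting as two-valued is a simplification of the continuum, but it
reproduces the count of eight contingency categories stated in the text.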
2.4.2 Brand choice
Contrary to traditional marketing literature that assumes consumers to explore
the entire spectrum of choice, the majority of consumers tend to exhibit
multi-brand purchasing patterns from a small repertoire of preferred brands, a
subset of the entire range. Even though previous research describes consumer
behaviour in terms of determinants of patterns, few studies discuss the consumer
goals or the underlying factors that influence consumer decision-making.
Experimental behaviour analysis however is not only able to demonstrate that
consumer choice adheres to the patterns proposed by behaviour analysis and
behavioural economics, but also to offer a possible theoretical extension of operant
psychology.
2.4.3 Verbal and affective response
Behaviour analysis has traditionally relied primarily on the idea of schedules of
reinforcement, something that may be extremely difficult to implement in an
open market environment. Alternatively, interpreting
complex consumer behaviour in terms of reinforcement pattern can be
demonstrated with verbal and emotional responses to a consumer situation. Each
of the eight contingency categories was shown to correspond with a certain
combination of basic emotional response: pleasure with utilitarian reinforcement,
arousal with informational reinforcement, and dominance with behaviour setting
continuum.
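The correspondence between the structural elements of the model and the three
basic emotional responses can be sketched as follows. The pairing of pleasure
with utilitarian reinforcement, arousal with informational reinforcement, and
dominance with the behaviour setting comes from the text; the direction of each
relationship (for instance, that an open setting goes with higher dominance) and
all names in the code are assumptions made for illustration.

```python
from typing import NamedTuple


class AffectProfile(NamedTuple):
    pleasure: str   # paired with utilitarian reinforcement
    arousal: str    # paired with informational reinforcement
    dominance: str  # paired with the behaviour setting


def expected_affect(high_utilitarian: bool,
                    high_informational: bool,
                    open_setting: bool) -> AffectProfile:
    """Assumed affective profile for one contingency category."""
    def level(flag: bool) -> str:
        return "higher" if flag else "lower"

    return AffectProfile(
        pleasure=level(high_utilitarian),
        arousal=level(high_informational),
        dominance=level(open_setting),  # assumed direction, not stated in the text
    )


# Hypothetical case: high utilitarian, low informational, open setting.
profile = expected_affect(True, False, True)
```

Each of the eight contingency categories then maps to one combination of the
three responses, which is what makes verbal reports of emotion usable as an
indirect check on the reinforcement pattern.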
2.4.4 Attitudinal-behavioural consistency
Psychological research on attitudes may actually be interpreted as behavioural,
where observed variables often are verbal behaviour in nature: measures of
attitude and intent tend to be situation-specific, emphasizing contextual
determinants of behaviour as a result rather than showing the cognitive causes of
behaviour. Two aspects make it difficult for attitudinal researchers to examine
the continuity between environmental factors and verbal and non-verbal
behavioural responses: the significant role of the previous history of
behaviour, which is the best predictor of current or future behaviour, and
adherence to the social cognitive paradigm. For these reasons, recent findings
in attitudinal psychology may support a behavioural rather than a cognitive
model of human behaviour.
The behaviour analytic approach states that initial behaviour is rule-based and
subsequently comes under contingency control. Foxall argues that the
process follows a somewhat more complex path, suggesting that initially a
consumer may have no previous experience with a new product or brand. Rules
associated with other similar products or situations may be utilized, however,
and these are deliberately assessed in terms of compatibility and accumulated
experience before being adopted by the consumer. Repeated successive behaviour and
accumulation of experience will generate rules that will guide behaviour,
replacing the deliberation process, and transforming it from rule-governed to
contingency-based. The difficulty lies in illuminating this transitional process
without reliance on the implied theoretical structures.
2.5 Intentional psychology
As part of any discussion on behaviourism as a psychological approach, it would
be fundamental to review how the language is used by its practitioners, and why
the discussion on intensionality may not be simply circumvented while the usage
of intensional language is interrelated with intentional explanation.
2.5.1 Intentionality
Social cognitive psychology argues that consumer decision-making is guided by
preferences, likes, wants, and needs, and that the consumer holds a positive
attitude or intent towards the purchase. This level of explanation relies on
mentalistic terms that would not suffice if the aim is to go beyond the
cognition-behaviour approach and contemplate the philosophical basis for the
explanation of consumer behaviour. The mentalistic terms used above are all
intensionalistic in that they offer a qualitatively different type of
explanation, as they tend to implicitly refer to something outside themselves:
prefer something rather than just simply prefer, want something rather than just
want, and so on (e.g. prefer this product to that one). By contrast, words that
are not essentially mentalistic, such as run, do not rely on the term that
follows them to provide the precise meaning: it is never run something but
rather just run. It is precisely this intensional use of language, where what
follows the proposition (i.e. the intensional operators that are added to
extensional statements to produce intensional statements) cannot be substituted
with a term equal in meaning without changing the truth-value of the statement,
that behaviourists would prefer to avoid in explaining behaviour. This is
paramount in extending behavioural science, as intensional language is not
reducible to extensional language through paraphrasing or other means, thus
making the extensional account impossible.
When describing inherently intentional phenomena such as perceiving or
believing, however, it is inevitable that intensional language, and therefore
intentional explanation, is used. This suggests that, in addition to extensional
behavioural science, an intentional explanation may be unavoidable in the
exposition of science.
It is possible to argue that radical behaviourism may not be able to fulfil all
the essential requirements central to the explanation of behaviour: that it
deals with the personal level of inquiry, provides an adequate account of
behaviour continuity, and demonstrates that explanations of complex human
behaviour can be sufficiently attributed to reinforcement stimuli.
Radical behaviourism strives to provide an account of behaviour in a strictly
extensional manner, distinguishing itself from cognitivism by deliberate
avoidance of intensional language. The scientific method of behaviourism, behaviour
analysis, is focused on prediction and control of behaviour in terms of
environmental consequences and antecedent stimuli. Assuming behaviour is
environmentally determined in terms of learning history and reinforcing or
punishing consequences, behaviourism is able to provide an undeniable capacity
for explanation of behaviour – it does not necessarily dictate however that
further conceptualization need be confined to the behaviourist philosophy.
Personal level refers to human activities and sensations rather than sub-personal
physiological processes in the brain and nervous system, and comprises
first-person (intra-personal subjective experiences) and third-person (inter-personal
objective experiences) perspectives. Extensional radical behaviourism does not
consider either one in any adequate manner: first-person perspective
necessitates the use of intensional language to provide a comprehensive account
of certain behaviours, and third-person perspective is not considered qualitatively
distinct and is explained in terms of first-person perspective.
Continuity of behaviour is central to radical behaviourist theory as it is closely
interlinked with the concept of behaviour reinforcement and continuous learning
history formulation. What exactly is learned during the learning process however
is not evident from the behaviourist theory, as it may be difficult to identify the
required elements necessary for the learning to occur (discriminative stimulus,
response and reinforcing stimulus), at which point it is presumed that something
does happen within the individual (likely on physiological level) that satisfies the
behaviour continuity requirement.
Thus, the limitations of radical behaviourism in providing a plausible
interpretative account of behaviour in a variety of situations, in a manner that
satisfies rigorous scientific standards, require intentionalistic
supplementation to extend its significance.
If radical behaviourist theory were to be extended to consider intentionality, it
would require appropriate definition in extensional terms. One way to do this is
to distinguish two qualitatively separate levels: the physiological level that applies
to neurological networks and processing, and the level of intentionalistic
interpretation to which intentional abstractions like beliefs are attributed.
Whereas the physiological level is commonly well understood, the same cannot be said
of the intentionalistic level, as its conceptual description is circular
at best (i.e. pain is painful). Thus, the intentional level should not be seen as
providing an additional characteristic of phenomena, but rather as a manner of
interpreting the same process or event that is qualitatively different from the
extensional explanation.
2.5.2 Philosophy of intentionality
Philosophy refers to intentionality as the capacity of the mind whereby mental states
refer to or are about something other than themselves.
Two distinct types of intentionality may be recognized. Intrinsic intentionality
exists in a human mind, which possesses the ability to transfer its inherent
intentionality into an object (a so-called representational artefact), at which point
derived intentionality emerges. Even though intentionality may seem to
be stripped of its analytic ability in the process, as seemingly all objects are about
something, derived intentionality is nevertheless dependent on the original
intrinsic intentionality. Thus, private intrinsic intentionality relates to the
concept of the subjective mind and includes the experiences of believing and
desiring; it is possible to attribute these experiences to a
third person only because we experience them ourselves.
The subjective experiential level is dependent on the inherent properties of the
conscious phenomena: the consciousness that includes the emotions and
thinking or cognition, the ontological subjectivity that specifies the existence of a
conscious state only if experienced by the subject, and unity which is required for
various aspects of consciousness to function in a collective manner to produce
the experience of the situation.
Conscious experience could be comparable with Skinnerian private (covert)
behaviour, but may also be internally augmented by interpretation in addition to
being externally determined by contingencies. Consequently, different language
from extensional behaviourism should be used as private (covert) behaviour
contains intentional behaviour such as believing.
Speaking of intentionality in terms of objective approach (third-person), it is
necessary to consider what object may be considered as being or having a mind –
something that theory of mind considers. Intentional behaviourism necessitates
only the first-order intentionality. The need for the intentional stance altogether
must be considered as well, as the concept of mind is relevant only while
speaking about human persons where physical, design, or contextual stance may
prove insufficient; where the experience includes not only the physiological and
behavioural aspects, and the mind is more than just the brain but also includes
the subjective consciousness and the cognitive awareness. This creates the
setting for the first-person experiences such as thoughts and beliefs, and the
third-person experiences are inferable from analysis of the physiological and
behavioural factors associated with such experiences.
2.5.3 Intentional psychology
Content does not occur within the neural event itself, but rather is a
supplementary interpretation of such event, a justification for registering certain
local content on the personal level. Intentionality describes what extensional
theory is able to describe, but in a different manner. Thus, the sub-personal
extensional physiology and the personal intentional explanations belong to
qualitatively different distinct content levels. Content is added by the
evolutionary logic principle: findings produced from the natural selection process
on the physiological level must allow for explanation and prediction of behaviour
on the intentional (personal) level.
Dennett (1981) makes a distinction between the three kinds of intentional
psychology: (1) folk psychology, (2) intentional systems theory, and (3) sub-
personal cognitive psychology. Folk psychology revolves around the causal theory
of behaviour and presumably provides the source for the other two. Intentional
systems theory develops the belief and desire aspects of folk psychology to
predict and explain behaviour semantically on the personal level, but largely
ignores the internal structure of the complex intentional system. On the contrary,
sub-personal cognitive psychology deals with the syntactic explanation of the
brain function. The underlying internal structure is required to provide an
explanation for components of the intentional systems theory to predict
systemic behaviour on the personal level – something that is the primary task of
sub-personal cognitive psychology as it identifies the constraints of design and
clarifies how personal systems excel in intentional systems. This will be further
discussed in the sections below.
2.6 Intentional behaviourism
In this section, the two prominent accounts of behaviour – radical
behaviourism and intentional psychology – are discussed in an attempt to
provide an overarching theoretical framework that would incorporate the two
traditionally opposing fields in a unifying psychological model of intentional
behaviourism.
The vocabulary of behaviour theory is constrained by design to purely
behaviouristic terms for describing and explaining behaviour. Behaviourists
therefore face considerable difficulty in accommodating the
thinking of cognitive psychologists, and tend to adopt one of two
routes. They either incorporate the language of intentionality, which inevitably
leads to a means of explanation detached from behaviourist method; or, more often,
stay with the restricted behaviourist vocabulary and as a result limit the
range of explanation available to them, effectively restraining their
potential contribution to the wider discipline of psychology. The beliefs
and desires assume a central role in intentional psychology and serve as a base
for cognitive psychology, whereas the opposing behaviourist theory strictly
excludes any intensional terms deemed unable to provide reasonable means of
explaining behaviour. As a consequence, while radical behaviourism is able to
provide a predictive account of behaviour, it struggles to address within
behaviourist terms such essential notions as personal level of analysis and
continuity of behaviour – which generally tend to be either ignored altogether or
inevitably explained with intensionalist terms. The following paragraphs will
discuss this in detail.
2.6.1 Cognitive psychology and radical behaviourism
Before the discussion and critique of the theoretical and ontological specificities of
radical behaviourism can be continued, it is important to acknowledge
what can effectively be seen as the dual nature of the current state of the field,
which makes definitions and any subsequent discussion difficult. On one side is the
radical behaviourism associated with Skinner (1938, 2014) as a central figure
in refining and expanding the paradigm; on the other are those who take an
active role in extending the discipline into the realm of intellectual inquiry to
provide a more comprehensive account of behaviour (for example Rachlin, 1994).
Even though those behaviourists that belong to the latter category remain with
the extensional mode of explanation, it is often the case that they go beyond the
precisely defined constraints of radical behaviourism as described by Skinner, and
operate past the experimental laboratory setting and in the realm of
interpretation and theory development. Without explicitly explaining the extent
of deviation from the Skinnerian radical behaviourism, the explanation leads to
wider implications as a result of inevitably adopting new forms of language that
belong to intentional systems of explanation. Thus, the two fields are taken to
represent incommensurably opposing views in explaining the phenomenon of
behaviour, and an overarching conceptual framework relying on both radical
behaviourist and cognitivist modes of interpretation would be required to
provide a more robust and all-embracing account of behaviour – intentional
behaviourism (Foxall, 2009).
The essential difference between explaining behaviour with the cognitive and the
behavioural approach is linguistic in nature: whereas radical behaviourism
strictly avoids the use of intensional idioms as explanatory means,
intentional explanation is inherently embedded in the underlying philosophical
structure of cognitivist theory. The reason for avoidance of intensional idioms in
radical behaviourism is that the extensional and intensional sentences are
fundamentally different, and since radical behaviourism firmly relies on the use of
extensional knowledge by definition, the adoption of intensional language to
explain behaviour warrants the supposition that the very explanatory mode of
the researcher has effectively gone astray from the radical behaviourist
paradigm. Moreover, the difference between the extensional and intensional
sentence types is more fundamental than that: it is not possible simply to
translate one type of sentence into the other, as reducing intensional
language to extensional would inevitably require adding information in
the process to keep the meaning and the truth value of the sentence
unchanged. If intensional language is employed by researchers to explain
behaviour, it is by definition a method of explanation that refers to a theoretical
framework other than radical behaviourism. Extensional language, as
opposed to intensional, does not contain intensional terms and is referentially
transparent, in that any expression within the
sentence can be replaced by another with the same extension without changing the
truth-value of the sentence. Intensional sentences do not allow this: for example,
it may be true that Jones believes that
the train to London has arrived, but not that Jones believes that train number
128 has arrived, even though the train to London is in fact train number 128
and the two have the same extension. The problem lies in the fact
that these two expressions have different intensions and therefore different
subjective meanings, and if the truth-value of the sentence about Jones is to be
preserved, the two are not substitutable. This conflicts with the normal use of
language for scientific expression, as the use of intensionality presupposes a
certain degree of subjectivity, and intensional idioms allow themselves to be
understood only in each other’s terms, without the possibility of circumventing the
use of intensional vocabulary by any means other than abandoning
intensional language altogether (Quine, Churchland, & Follesdal, 2013).
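The substitution test in the Jones example can be illustrated with a minimal Python sketch. All names below (referent, arrived_trains, jones_beliefs, has_arrived, jones_believes_arrived) are hypothetical and introduced only for illustration. Modelling belief as a set of descriptions is a simplifying assumption, not a claim about how belief works.

```python
# Hypothetical illustration: two expressions with the same extension (referent).
referent = {
    "the train to London": "train_128",
    "train number 128": "train_128",
}

# Extensional fact about the world: which trains have arrived.
arrived_trains = {"train_128"}

# Simplifying assumption: Jones's beliefs are stored under the description
# Jones would use, so they are sensitive to how the train is described.
jones_beliefs = {"the train to London"}


def has_arrived(expression: str) -> bool:
    """Extensional context: truth depends only on the referent."""
    return referent[expression] in arrived_trains


def jones_believes_arrived(expression: str) -> bool:
    """Intensional context: truth depends on the description itself."""
    return expression in jones_beliefs


a, b = "the train to London", "train number 128"
same_extension = referent[a] == referent[b]

print(same_extension)                                           # True
print(has_arrived(a) == has_arrived(b))                         # True
print(jones_believes_arrived(a) == jones_believes_arrived(b))   # False
```

Swapping one co-referring expression for the other leaves the truth-value of the extensional sentence unchanged, but changes the truth-value of the sentence about what Jones believes.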
2.6.1.1 All-inclusive explanation of behaviour
When considering complex behaviours, however, it becomes readily apparent that
while radical behaviourism is exceptionally good at identifying and controlling
the environmental events that predict behaviour, the explanation of behaviour it is
able to provide (in the more general sense of the word, not as it is understood and
defined by radical behaviourists) could be insufficient in a number
of ways. Radical behaviourism is unable to provide an adequate account of (1)
behaviour on a personal level in addition to behaviour-environment relation, (2)
continuity of behaviour, and (3) delimitation of behaviourist interpretation. This
of course does not require a cardinal change of radical behaviourism. On the
contrary, the field must continue advancing the behaviourist programme within
the paradigm and provide a robust predictive account of how behaviour can be
determined by its consequences. Nonetheless, it should be continuously
challenged by another theoretical approach to identify the apparent constraints
which radical behaviourism imposes on the explanation of behaviour, in an attempt
to develop a more robust all-inclusive explanation by incorporating useful
constructs from cognitivist and intentional psychology. The three areas that draw
attention to the limitations of radical behaviourism are briefly discussed in the
following paragraphs.
The notion of personal level explanation of behaviour is twofold in radical
behaviourism: it concerns the analysis of first- and third-personal sentences.
In the practice of radical
behaviourism both types can be and should be analysed and interpreted in the
same manner following a uniform process of inquiry – first-personal verbal
behaviour for example is just behaviour to be explained using control variables at
the time of occurrence, and the verbal behaviour is not seen as a reference to
something but rather is explained within the history of context in which it
happens to occur. However, when one says using intentional language, “I can’t
find my keys” – the statement cannot be translated into extensional language, as
inevitably new information will have to be provided. Statements like “I am looking
for my keys” or “I am not succeeding at the task of trying to find my keys” do not
carry identical meaning to the statement “I can’t find my keys” as the personal
subjective dimension that adds to the explanation of the behaviour associated
with the experience is unverifiable externally. “I can’t find my keys” is in fact the
only unique description of the behaviour, whether it is observed to include
the tasks usually performed by persons who are trying to find their keys so
they can leave, or who are just trying to find their keys to put them in the correct
place so that they can leave as soon as they need to in the future and
avoid going through this process at an inconvenient time, or who perhaps even
express a general desire that somebody else should perform the task of looking
for their keys for them – something that only that particular person can be said to
just know about their behaviour. This first-personal knowing is different to a
private event as described in radical behaviourist doctrine: although equally
contingent on learning history, first-personal knowing is not a result of analysis of
the learning history, but rather an intentional statement that is a product of
personal experience and therefore is outside the realm of scientific inquiry; and
translating intentional sentence into extensional would not be possible without
adding new descriptive value to the statement (or losing certain descriptive
elements) in the process.
The limitation of radical behaviourism regarding continuity of behaviour can be
discussed on a number of points. First, even though it can be said that
intentionality is unable to provide a conclusive account of continuity of behaviour
either, it becomes apparent that in a less controlled setting interpretation
becomes more essential than experimental laboratory-type work. It is at this
point that intentional language is adopted by some radical behaviourists
– perhaps unknowingly – in an attempt to broaden the scope and improve the
explanatory account. Second, in practical application, it is extremely difficult to
keep good track of the history of reinforcing contingencies even in a controlled
laboratory setting, and practically impossible in the case of human decision-makers,
which calls into question whether a behavioural analysis can be carried out to the
full extent at all without eventually borrowing from cognitive
psychology. Third, it has been proposed that eventual physiological findings
would provide a mechanism that explains how
behavioural continuity occurs – a notion which goes against the
asserted stand-alone nature of radical behaviourism, which claims not to rely on
other scientific disciplines. This is not to say, however, that this behaviour is independent in its
entirety from the external contingencies – what this means is that there is no
legitimate way to understand and explain this behaviour other than using
intentional terms; this knowing is intentional, and so is the explanation and the
expression of it – something that cannot be accounted for within the terms of
radical behaviourist paradigm. There is an occasional remark in behaviourist
literature however, that describes the tendency for respondents to develop
subjective rules on a personal level during operant experimental work that render
them insensitive to the varying levels of reinforcement. Naturally, such
interpretations cannot proceed without incorporating intentional terms to
describe private events such as thoughts, and would inevitably require going
beyond the delimiting mode of explanation defined by radical behaviourism – as
is the case with the discussion of self-rule formulation for example, where the
description could only be expressed in intentional terms of the individual.
Finally, the topic of delimitation of radical behaviourist interpretation needs to be
discussed, as it becomes increasingly difficult to employ the three-term
contingency to develop a comprehensive account of behaviour beyond the
laboratory setting: the plausibility commonly taken to be sufficient for a radical
behaviourist account of interpretative research is not sufficient to meet the
claims of validity and reliability when applied to complex human behaviour.
Rachlin (1994) proposes the concept of teleological behaviourism, where
complex behaviours are interpreted in terms of final consequences of behaviour:
one might be looking for the keys to leave the house – and to continue, to go to
work, to earn the salary, to provide for the family, to be a good father, and so on.
It should be obvious that considering behaviour as such a sequence while
disallowing intentionality would immediately pose an unsolvable problem in a
number of ways. Primarily, the necessity to examine the whole sequence of
consequences to provide a comprehensive explanation of behaviour may very
well be practically impossible, as it will have to be extended indefinitely.
Furthermore, even if the final event offering the ultimate cause could be
identified and the complete sequence of consequences examined and analysed, it
would nevertheless be an unconvincing method to provide an explanatory
account for the original behaviour which may be so remote from the final cause
that no empirical event of the original behaviour had any contact with the final
ultimate cause. In addition, the very notion of deliberately developing the
sequences to arrive at the final cause relies on such inherently intentionalistic
concepts as rationality, optimisation, maximisation, and others. Otherwise, what
would prevent the sequence from being developed instead as one of
the following scenarios: looking for the keys to put them in the fridge, or to throw
them into the rainwater drainage, or any other scenario that does not follow
a motivation that is in some way rational and intentional in nature?
2.6.2 Intentionality and behaviourism
As discussed above, the incompleteness of radical behaviourism is dependent on
the prescribed confinement to exclusively using the extensional language, which
provides limited benefits beyond the laboratory settings to explain the observed
behaviour: for example, it is very difficult to say anything more than the basics in
terms of contingencies of reinforcement about behaviours which are not
sensitive to schedules of operation without employing the concepts of private
events. Avoidance of intensional language in radical behaviourism strengthens
the operant class, but as a result it is only able to provide a description rather than an
explanation of generalizations, and as such could only be sufficient for prediction
and control, but not for a comprehensive understanding of behaviour. In
addition, it is necessary to consider other modes of explanation if comprehensive
explanation of behaviour is not possible in extensional behaviourist terms –
modes such as intentionality and intentional idioms. Nevertheless, complex
human behaviour can be explained not only in terms of contingency shaping, but
also in terms of rule-governance.
By incorporating the language of rule-governance for explanation of behaviour,
the private behaviour and the related private mode of observation, as opposed to
a public mode, would be assumed, which would necessitate the inclusion of both
phenomenological experience level and intentional terms as part of explanation
of behaviour. Intentional behaviourism does not need to be in a direct conflict
with concepts of discriminative control and learning history in radical
behaviourism – it is that but also more, as it involves setting variables such as
discriminative stimuli, motivating operations, and implications they carry in terms
of reinforcement. However, it would be worth to discuss whether the intentional
explanations of behaviour could be considered causal. It is commonly understood
by many philosophy of mind disciplines and is an underlying assumption of
behavioural analysis as well that experiencing and attitudinizing can be
recognised to serve as causal factors of behaviour. This can be closely related
with the notion of reasons such as feelings, beliefs, or desires serving as causation
of behaviour; and even though it is not entirely clear in what way specifically
reasons cause behaviour and the mechanics and the process are not yet fully
understood. Even so, the very existence of causal relationships should be
undeniable: otherwise, if reasons did not serve as the cause for the behaviour to
be the effect, and in the absence of the cause there would be absence of effect,
then causal relationships would not need to exist. However, even if
philosophically conceived mental constructs were a necessary prerequisite for
the behaviour that follows afterwards to occur, not all thinking produces
behaviour: there are myriads of thoughts that do not materialise in any manner,
while other non-mental functions such as learning history are able to carry causal
capacity, thus questioning this account as a plausible explanation of causal
relations. In science, causality can be demonstrated with an experimental method
within extensional context – when employed to study human behaviour however,
the results, as they intermix with mental explanations of causality, require
deliberate interpretation. As radical behaviourists accept the possibility to
attribute causal effects to public events, and individuals can be assumed to be
able to modify these publicly acquired rules, it should be possible to argue that
private verbal behaviour carries the causal capacity, and the rules formulated in
such a way are in fact intentionalistic in nature. Nevertheless, this does not
suggest that intentionality is causal, but rather that behaviour would be
predictable in terms of radical behaviourist contingencies when the rule-forming
process agrees with the intensions ascribed. Even though it is not entirely clear
whether contingencies may or may not be modifiable by the rule formations, it
should be apparent that the intentionality is able to contribute another
dimension to the explanation of behaviour.
2.6.3 Intentional behaviourism
Thinking and feeling are the personal subjective experiences that can be used to
ascribe meaning to the observed experiences of others, thus attributing
description to their behaviour in an attempt to explain and predict it. In
commonsense psychology, even though such relations cannot be proven, they
can be taken as causation of behaviour nevertheless – something that is seen in
intentional behaviourism as nothing more than a placeholder rather than a causal
element without a proper validation. Intentional behaviourism is not concerned
with ontology but rather considers thoughts and feelings as linguistic concepts
that carry the capacity to serve as modules of behavioural explanation employing
subjective experience: theoretical contributions can be developed from
contrasting the sentences that use the intentionalistic language with those that
do not. Radical behaviourism wholly relies on the use of extensional language and
frameworks, and some behaviourists rely on neuroscience which is expected to
provide a yet to be discovered physiological neural basis of behaviour. As such,
intentional behaviourism does not propose to consider anything beyond
intentionality, in terms that serve only to identify the evolutionarily consistent neural
functions without attributing causal properties to personal events, and therefore
does not consider a neuro-physiological level of behaviour.
Intentional behaviourism is a philosophy of psychology that follows the original
arguments proposed by Dennett in 1969 (reprinted in 2002) in his attempt to
resolve the matter of intentionality within the analytical framework of
conceptualisation, where it is claimed that it is necessary to describe a certain
dimension of human behaviour with intensional language set against the
extensional language of radical behaviourism. The use of intentional idioms to
explain elements of behaviour on a personal level should be seen as adding
intentional content in a systematic manner on another interpretative level
entirely – thus contributing to and being consistent with the explanation offered
by extensional sciences such as neurobiology. Hence, using intentional ascription
would not be a simple matter of additional interpretation to the extensional
description: explanation confined to extensional theoretical framework would be
able to provide a descriptive and predictive scientific account of the behaviour in
terms of structures and systems, whereas explanation available from the
contribution of intentional theoretical framework would be able to offer the
understanding of the actions and what the behaviour is accomplishing – in
addition to the extensional explanation of how it is able to accomplish it. It is a
matter of developing a comprehensive account of behaviour, and intentionality
would be able to contribute substantially on the explanatory dimension in
an attempt to develop an all-inclusive explanatory account, while the extensional
framework of radical behaviourism has clearly been able to offer an extensive
programme of prediction and behavioural control. For example,
describing the continuity of human behaviour as a pattern with a certain goal or
achievement in mind is no doubt a type of explanation that is intentional in nature – an
explanatory method that radical behaviourists have proposed (for example Rachlin,
1994) or perhaps unintentionally employed before, and which is effectively
equivalent to the theoretical framework of intentional behaviourism.
2.7 Super-personal cognitive psychology
In this section, the link between the empirical science of radical behaviourism and the
philosophical framework of intentional behaviourism is discussed, and the
corresponding model of super-personal cognitive psychology is explained
in some detail.
One important distinction that sets super-personal cognitive psychology apart
from intentional behaviourism is that super-personal
cognitive psychology is able to provide a decision process necessary to influence
the observable behaviour for those aspects of theoretical framework that would
benefit from the intentional explanation as identified by intentional behaviourism
(Foxall, 2009). Thus, super-personal cognitive psychology makes it possible to
specify the cognitive operations in a manner consistent with physiological and
behavioural frameworks while exposing the process of decision-making and
choice. Radical behaviourism rests on the premise that anticipated
advances in physiology will provide the necessary contribution to the
explanatory dimension. Without a properly defined framework in place, such as
super-personal cognitive psychology or the like, there will be no structure to
facilitate the process of incorporating the explanatory dimension that may come
from the said physiological advances, nor would there be a mechanism in place to
direct physiological research efforts towards the areas of study that are more
plausible and more likely to bring successful results. Moreover, the framework is also
necessary to make it possible to recognise and verify these very advances in
physiological research, and also confirm the potential results as fruitful once the
research programme reaches its ultimate goal (Foxall, 2009).
It would be of some relevance to the discussion about super-personal cognitive
psychology to contemplate in what manner exactly intentionality is able to
contribute to the explanation of behaviour if intentions are taken to be non-causal.
The explanatory dimension that intentionality is able to provide comes
after the causal relations have been accounted for with extensional methods –
the causality thus determined provides enough evidence to support the
explanation of behaviour for particular instances and events, yet it is the
intentional expressions employing linguistic elements such as thoughts and
feelings that provide the explanation for the sequences of events and continuous
behaviour, the personal level of explanation, and the limits of radical
behaviourism. It has been suggested that super-personal cognitive
psychology should be useful to explore the possibility of reconciling the temporal
and spatial disconnection of dependent and independent variables to determine
causal relations in continuity of behaviour, and to provide an adequate account of
the behaviour-environment relations. However, the purely linguistic non-ontological nature
of incorporating intentionality into an extensional framework of explanation could
effectively suggest that these two frameworks would not be able to function as a
uniform performance theory and would rather have to be discussed using two
separate linguistic and scientific modes of explanation. To circumvent this
limitation, the elements of sub-personal cognitive psychology would have to
serve as the basis for causal theory and carry a sufficient account capable of
supplementing the extensional science employing intentionality in those areas
where the limitations have been identified in radical behaviourism. This can be
achieved by demonstrating the crucial role intentional and cognitive entities play
as part of causal relations that explain behaviour, either by including them directly as
additional variables of experimental analysis or, in a more probable scenario,
by developing proxy elements composed of extensional variables to symbolise the
emergent intentional and cognitive properties of behaviour. If this can be
corroborated experimentally and indeed eventually be taken as a factual case,
intentional and cognitive elements of super-personal cognitive psychology and
intentional behaviourism could take a form of explanatory value comparable to
that of extensional sciences, while functioning on a different level of explanation;
otherwise they may be substituted by eventual future performance theories
developed within the field of physiology or elsewhere. Thus, the explanation that the
extensional account of radical behaviourism is able to offer is predictive in nature
and demonstrates the causal relations between the variables determined with
the process of experimental design, whereas the explanation that the intentional
account of super-personal cognitive science and intentional behaviourism are
able to offer is not necessarily compliant with rigorous scientific approach of
radical behaviourism yet imperative to the all-inclusive explanatory account of
behaviour (such as addressing personal events and continuity of behaviour) and
therefore can be considered to be interpretative in nature. An intentional system
therefore is an entity capable of using the intentional dimension on a personal
level rather than something that can be predicted using the attributes and factors
that comprise the intentional dimension.
2.8 Cognitive interpretation of behaviour
Having discussed the rigorous scientific method of radical behaviourism, and the
potential benefits that intentional psychology could contribute to the
understanding and explanation of behaviour, it is important to consider some
areas of behaviour that are likely to remain for the time being out of scope for
the deliberations presented here, and rather within the realm of cognitive
explanation of behaviour.
Indeed, some of the frameworks presented above would be as useful with the
more complex elements of cognitive explanation of behaviour as they are with
the interpretative elements, and would provide the framework of reference to
structure the consequent inquiry. In the same way, certain aspects of cognitive
explanation could be designated as placeholders for future discoveries in
neuropsychology and physiology. Until that time however, explanation of
behaviour as it is considered here can be said to form three conceptual phases in
terms of explanation, as follows.
The first phase is associated with the explanation of behaviour as it is understood
in the realm of radical behaviourism, where behaviour is explained as a product of
learned associations between a stimulus and a response, reinforced or
punished as a matter of consequence. This was investigated in
detail as part of an earlier research programme (Greene, 2011), where Informational
and Utilitarian Reinforcement were modelled as elements of the input
layer and showed significant predictive capacity as part of the connectionist model.
The second phase deals with the interpretative elements of behaviour explanation and is
the focus of the research programme described here, which builds upon the
findings of the previous research investigation (Greene, 2011) and now proposes to
consider Informational and Utilitarian Reinforcement as an emergent property
that can be modelled within the hidden layers of connectionist networks, that is,
as a phenomenon which is formed as part of the model's learning process and is
subsequently used to
develop understanding and explanation of behaviour. This will be described in
detail in the following chapters.
Finally, the third phase of behaviour explanation would focus on cognitive
elements of behaviour, which are often inherently subjective and not yet
sufficiently understood. The application of a scientific method, and of
any kind of work that relies on experiments, would therefore be inadequate at this
level. Instead, explanation could come from the area of qualitative psychology
that relies on phenomenology and meaning-making as part of contribution to the
wider research programme that aims to understand and explain behaviour.
2.8.1 Phenomenology
In philosophy, phenomenology is a school of thought that studies the structure of
experience and consciousness, attempting to establish necessary conditions for
the objective study of certain topics which can typically be seen as subjective in
nature, such as consciousness and the content of conscious experience – for
example perceptions and emotions (Husserl, 1970, 2012). The structure and
essential properties of experience are determined through a process of
systematic reflection, an approach that aims to be scientific yet is deemed
qualitatively different from the method applied in clinical psychology or
neuroscience. A number of assumptions that form the foundation of
phenomenology set it against the rigorous scientific methods of radical
behaviourism, such as the rejection of the concept of objective research, or the
preference for exploring personal conscious experience over traditional
scientific data analysis in order to study human behaviour through the unique way
in which it reflects the society of a person.
Phenomenology is generally considered anti-reductionist. Even though varying
degrees of reduction are used in many of the methods employed to describe the
underlying mechanisms of consciousness, the ultimate goal is to explain how the
different aspects of these reductions are constituted as an actual phenomenon as
experienced by the person.
Intentionality is an important element of phenomenology and
stipulates that consciousness is always about something. The object of
consciousness is referred to as the intentional object, which need not
be a perceptible object and could be anything at which consciousness is
directed and of which it is conscious, such as an idea or a memory.
The intentional object can be constituted for consciousness in different ways
(perception, memory, retention, etc.), and even though these different structures
can be interpreted as different intentionalities that prescribe different ways of
being about the intentional object, the object is constituted as the same
intentional object throughout.
Another central element of phenomenology relevant to the present discussion in
particular is the notion of intersubjectivity. It is perhaps best explained by reference to
the concept of empathy as it is defined in the context of phenomenology, which
requires a person to focus on the subjectivity of another person as part of
intersubjective engagement, relying on apperception built on personal
experience. One person is thus said to apply their own experience as their
subjectivity to the experience of another person through apperception, which can
be constituted as another subjectivity, thus facilitating recognition of another
person’s ideas, intentions, thoughts, etc. This experience of empathy is essential
in the phenomenological account where intersubjectivity effectively constitutes
objectivity: what is experienced subjectively is also intersubjectively available for
other subjects. Thus, the notion of objectivity is not reduced to subjectivity, nor is
it sufficient to constitute a relativist position; instead, a person is said to
experience themselves as a subjectivity in an objective existence.
For an extended discussion on philosophy of phenomenology please see Husserl
(1970, 2012).
In psychology, phenomenology is the study of subjective experience. Even though the
philosophical psychology of the late 19th century relied principally upon introspection,
deliberations concerning the mind based on such observations were largely
criticised as speculation by an emergent research movement that strove to hold
psychology to a more rigorous scientific approach, the pioneers of
radical behaviourism among them. The central philosophical issue is the problem of qualia,
which is often referred to as a subjective conscious experience, where it is
impossible to confirm whether experience of one person such as a feeling or an
interpretation of meaning about an object is the same as that of another person.
In response to these criticisms, it is often argued that phenomenological inquiry is a
qualitative approach to the study of psychological subject matter that deals with the
process of how meaning is construed and is therefore interpretative in nature.
2.8.2 Interpretative phenomenological analysis
One of the approaches in phenomenological psychology notable for combining
idiographic, psychological, and interpretative elements is interpretative
phenomenological analysis (Husserl, 1970, 2012). Rooted in
phenomenological and hermeneutic theory, this qualitative research approach strives
to examine how a phenomenon is experienced by a particular person. Studies
normally involve only a few participants, sometimes even just one person,
whose experiences are closely examined using in-depth interviews, diaries, or
similar unstructured qualitative techniques that produce a detailed verbatim
account for subsequent research analysis.
Due to the nature of the approach, participants tend to share relevant
experiences in a particular context around the subject of investigation rather than
being randomly sampled, which makes it possible to scrutinise how a phenomenon is
understood from a shared perspective. The research design is sometimes
developed further to include elements of longitudinal analysis by collecting multiple
accounts over time. Hypothesis or theory testing is not normally part of the
analysis, and interpretative phenomenological analysis would rather be employed
with research questions that aim to understand and interpret a certain
experience and explain how it was understood by a particular person. Instead, a
reflective and often theory-developing account of hermeneutic inquiry is carried
out with a focus on meaning-making, as the researcher dissects the participant’s
key claims in detail and codes them, offering interpretations of recurring thematic
patterns and their implications, thus attempting to interpret the participant’s
interpretations and forming a double hermeneutic.
As a result, the subject-focused approach that relates phenomena to experiences
of some personal significance produces an idiographic analysis that interconnects
phenomenological examination with interpretative elucidations, frequently
illustrating the points by using participants’ verbatim quotes and contextual
commentary and details.
2.9 Summary
In this chapter, consumer behaviour analysis is discussed in detail, outlining
the framework of the Behavioural Perspective Model. Radical behaviourism is
juxtaposed with intentionality and cognitive psychology to identify and explain
the philosophical and methodological underpinnings of both
theoretical frameworks. Intentional behaviourism is proposed as a possible
extension of the radical behaviourist framework to address the shortcomings and
limitations of extensional sciences by employing intentional linguistic
elements in an attempt to develop a better all-inclusive explanatory account of
behaviour.
Once it is recognised that explaining behaviour is not limited to a predictive
account of causal relations but also involves the interpretative dimension needed to
develop an all-inclusive explanatory account of behaviour, using
intentionalistic linguistic elements and irreducibly inferential terms as proposed
by intentional behaviourism should provide an entirely different
account from what is otherwise available from an extensional science such as radical
behaviourism. It is not a matter of whether to integrate intentional terms into the
behaviourist approach, as this is already the case and intentionality is common in
behaviourist explanation, but rather a matter of how these intentional terms
should be integrated into the philosophy of psychology and extensional science.
Intentional behaviourism is proposed as a competence theory of behaviour,
specifying the mechanism to explain complex human behaviour by attributing
interpretative intentional content in a systematic manner consistent with the
rigorous scientific method of behaviour analysis and the extensional sciences.
Super-personal cognitive psychology provides a structure and a form for the anticipated
advances in the field of physiological research to be shaped and centred on the
crucial questions of radical and intentional behaviourism such as the links of
intentionality and cognition with the underlying neurophysiological processes,
and a framework to identify and recognise these advances when the time comes.
3. Connectionism and Artificial Neural Networks
The philosophical underpinnings of artificial neural networks (NNs) have been
explored by researchers in the field of artificial intelligence (AI) at
considerable length (Luger, 2005). To understand the theoretical and
philosophical foundation of connectionist models, it is important to consider the
historical development of the conceptual design on which the fundamental
structure of connectionist modelling is based. To do so, the development of
the field of AI is reviewed briefly from its inception, together with the influence it
has had on the development of NN models and methodology over the years.
3.1 Why Connectionism?
For a number of years, research in the field of consumer behaviour has relied on
traditional statistical methods to generate a substantial amount of empirical
evidence to support the theoretical framework. So why consider connectionist
models at all?
Curry and Moutinho explore the application of neural networks to study and model
consumer behaviour, and offer a comprehensive discussion of the theoretical and
practical implications (Curry & Moutinho, 1993; Moutinho, Davies, & Curry,
1996). The authors consider the application of expert systems as one possible
alternative, but caution about its limitations and potentially overoptimistic notions in
the field. Instead, an application based on artificial intelligence is suggested: artificial
neural networks. A typical connectionist structure that incorporates a number of
hidden layers brings certain advantages as a more sophisticated platform
for modelling consumer behaviour, since hidden layers are able to distinguish key
conceptual phenomena that lend themselves only to indirect measurement. Another
feature relevant to consumer behaviour is that connectionist models are trained, either
through a supervised learning process in which examples of input and
output pairs are fed into the model, or through clustering
methods in unsupervised learning. The ability to extrapolate patterns from
training sample data gives connectionist models an advantage over the
rule-based arrangements commonly encountered in traditional systems, making
neural networks particularly appropriate for tasks that involve notions of cognitive
behaviour or pattern identification. This is discussed in further detail in the
following sections.
3.1.1 The ultimate purpose of AI
The ultimate aim of AI research is to develop machines (or algorithms) to a
level of performance and ability comparable to that of humans in tasks such as vision,
natural language processing, learning, planning, reasoning, and others. Some of
these tasks may seem simple and even intuitive to a person
new to the field. Once the focus shifts to machines, however, it becomes readily
apparent how immensely complicated these tasks actually are. In fact, it is
currently unclear to what extent the most ambitious of them are achievable
at all. Consider driving a car, for example: many things need to be performed
simultaneously, including at the very least visual recognition of the road and
obstacles, geographical location, overall direction and the route to the final
destination, traffic and participants (drivers, pedestrians, road workers, etc.), and
the list goes on. This is without even mentioning the adjustments necessary for the
time of day, weather conditions, in-car distractions, the overall condition of
the driver, and so on. This task poses a serious problem for a machine, and yet millions
of people perform it with seeming ease, many of them intuitively
(Gallant, 1993).
The original AI approaches to a problem of this sort were mostly concerned
with systematically decomposing the overly complicated task into simpler
ones: road line recognition with adjustments for light
and weather conditions, followed by a three-dimensional figure recognition task
combined with geographical positioning and route planning, and so on. It
becomes apparent, however, that it is unfeasible to account for all possible
circumstances of the task using this method. An alternative approach is to devise a
heuristic rule such as following the centre line. This, however, is arguably even more
fragile: what if there is an accident, or there is no centre line at all?
Upon consideration of the complexities it could be tempting to declare these AI
tasks unsolvable, were it not that billions of biological creatures perform
them on a regular basis with seeming ease (Gallant, 1993).
It became apparent that a novel approach was required that would be able to
cope with the high demands of AI tasks. It is then only natural to consider
connectionist computational models that are inherently similar, if only in structure,
to human information processing faculties. Since it is not entirely impossible that
connectionist structures are even required for some AI tasks, it is no
surprise that NNs have gained such widespread recognition in the field of AI.
This is not to say, though, that NNs are the definitive computational modelling method for
AI, as science should always strive to develop better and simpler methods
(Gallant, 1993). The characteristics of NNs described in the following sections
nevertheless provide supporting evidence that at least some AI tasks can be
advanced and their explanations further developed.
3.1.2 Networks and symbolic systems
The traditional approach to cognitive science mostly evolved in such fields as
cognitive psychology, psycholinguistics, neuropsychology, philosophy, and
artificial intelligence because they share certain core assumptions of symbolic
cognitive representation. Connectionism challenges these assumptions in a number
of ways and offers a promising new approach to cognition via parallel distributed
processing and neural network methods.
The theoretical framework of connectionist networks is based on our knowledge
of the nervous system: the basic idea is that neural networks consist of simple
elementary units with a certain degree of activation, which are connected to other
units and can thereby excite or inhibit them in a dynamic way.
Depending on the network design, the initial input spreads through the network
until a state of equilibrium is reached, which in cognitive and
decision-making tasks is in itself a solution to a predetermined problem (provided
an appropriate interpretation is available). Unlike symbolic systems, the
connectionist adaptive network systems do not follow the rule-based
approach, which in itself is a rather primitive and simplistic design and will be
discussed further in later paragraphs. Instead, adaptive neural
networks follow a rather different system design and focus on causal processing,
where units excite or inhibit each other and for the most part do not take into
account stored symbols and their governing rule systems.
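The idea of activation spreading until equilibrium is reached can be illustrated with a minimal sketch. It assumes binary units, symmetric hand-picked connection weights, and a simple one-unit-at-a-time update rule; the function name `settle`, the weights, and the external inputs are hypothetical choices made for illustration only and are not taken from any model discussed in this thesis.

```python
# Illustrative sketch only: the weights, inputs, and update rule are assumptions
# chosen for demonstration, not a model described in the thesis.

def settle(weights, external, state, max_sweeps=100):
    """Update binary units one at a time until a full sweep changes nothing."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            # Net input: external input plus weighted activity of the other units.
            net = external[i] + sum(w * s for w, s in zip(weights[i], state))
            new_value = 1 if net > 0 else 0
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            return state  # equilibrium: no unit changed during this sweep
    raise RuntimeError("no equilibrium reached within max_sweeps")


# Units 0 and 1 excite each other (+1); each of them and unit 2 inhibit
# one another (-1). The diagonal is zero: no unit feeds back onto itself.
WEIGHTS = [
    [0, 1, -1],
    [1, 0, -1],
    [-1, -1, 0],
]
EXTERNAL = [2, 0, 1]  # hypothetical external input to each unit

equilibrium = settle(WEIGHTS, EXTERNAL, [0, 0, 0])
print(equilibrium)
```

In this toy network unit 1 becomes active purely through excitation from unit 0, while unit 2 is suppressed despite receiving external input of its own. No rule about the units is stored anywhere; the outcome follows from the excitatory and inhibitory connections alone.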
3.1.3 The foundations of connectionism
Initially, connectionist models were developed following the basic functionality of
the human brain. Given the very limited knowledge of how the human brain actually
works, neural networks are not intended to model the brain in all its complexity
but rather to simulate specific cognitive functioning in artificial systems that are able
to exhibit some basic properties similar to those of neurons and synaptic
connections in the brain. The first models designed to utilize the connectionist
framework showed how models consisting of a number of interconnected
simple computational neurons could compute the logical operations AND, OR, and NOT.
Furthermore, it was demonstrated that any process that could be performed
using a finite number of these logical operations could also be performed by a
connectionist network, provided the necessary conditions are met (for
example, sufficient memory capacity). Further advances came from
converting the neurons from binary units to units activated by a statistical pattern based
on a number of input units, which significantly increased network
reliability through parallel processing built in as an inherent design feature,
an early example of distributed representation. Following the strand of research
that revolved around the formal characteristics of behaviour exhibited by the
connectionist models, the potential applications for carrying out cognitive tasks
were examined (Pitts & McCulloch, 1947). One of the central and very frequently
researched tasks in connectionist modelling is that of pattern recognition.
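The way simple threshold neurons can compute logical operations may be sketched as follows. This is a simplified illustration that assumes weighted binary threshold units; the particular weights and thresholds are one possible choice among many, and the function names are hypothetical rather than drawn from the original formulation.

```python
# Simplified threshold units; the weights and thresholds below are one
# possible choice, assumed here for illustration.

def threshold_unit(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum of inputs reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def logical_and(a, b):
    return threshold_unit((a, b), (1, 1), 2)   # both inputs must be active


def logical_or(a, b):
    return threshold_unit((a, b), (1, 1), 1)   # one active input is enough


def logical_not(a):
    return threshold_unit((a,), (-1,), 0)      # an active input inhibits the unit


def logical_nand(a, b):
    # Units can be chained: the output of one unit is the input of the next.
    return logical_not(logical_and(a, b))


for a in (0, 1):
    for b in (0, 1):
        print(a, b, logical_and(a, b), logical_or(a, b), logical_nand(a, b))
```

Chaining units in this way, as in `logical_nand`, is what allows a finite combination of the three basic operations to be carried out by a network of such neurons.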
Rosenblatt was one of the early researchers to advance connectionist theory. He
introduced continuous connection weights to replace the binary nature of the
neurons in the Pitts and McCulloch models, and explored networks in which
excitations could be sent backwards, referring to these systems as perceptrons.
He also devised methods to adjust the connection weights, effectively establishing
the procedures to train the network. Through the milestone Perceptron
Convergence Theorem, it was demonstrated that if a set of connection weights
capable of producing a correct response existed, it would be possible for the
network to learn the correct response through a finite number of repetitions
(Rosenblatt, 1958). The perceptron, being based on statistical patterns over a
large number of units and accounting for noise and variations rather than on logical
principles, was put forward as a new type of information processing system that
came closest to explaining the functionality of the nervous system and was claimed
to be capable of having original ideas. Thus, the feasibility of a non-human cognitive
system built on connectionist networks was established within the field of artificial
intelligence.
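The training procedure and the convergence property described above can be illustrated with a minimal sketch of an error-correction learning rule for a single threshold unit. The integer weights, the unit learning rate, the training data (logical AND), and the function names are assumptions made for illustration; this is not a reconstruction of Rosenblatt's original perceptron.

```python
# Minimal error-correction sketch for one threshold unit. Learning rate of 1,
# zero initial weights, and the AND data set are illustrative assumptions.

def predict(weights, bias, inputs):
    """Return 1 when the weighted sum plus bias is positive, otherwise 0."""
    return 1 if bias + sum(w * x for w, x in zip(weights, inputs)) > 0 else 0


def train_perceptron(samples, max_epochs=100):
    """Adjust weights after every error until a full pass is error-free."""
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(max_epochs):
        errors = 0
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            if error != 0:
                # Move the weights towards (error = 1) or away from
                # (error = -1) the misclassified input pattern.
                weights = [w + error * x for w, x in zip(weights, inputs)]
                bias += error
                errors += 1
        if errors == 0:
            return weights, bias
    raise RuntimeError("no error-free pass within max_epochs")


# Logical AND is linearly separable, so a correct set of weights exists.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_SAMPLES)
print(weights, bias)
```

Because a set of weights that solves this task exists, the loop terminates after a finite number of corrections, which is the behaviour the convergence theorem describes. For a task with no such set of weights the loop would exhaust `max_epochs` instead.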
Another notable early researcher was Selfridge (1958) who explored the pattern
recognition capacity of connectionist models. His model Pandemonium was
tasked with the recognition of handwritten letters. The model performed the
analysis in parallel and processed the features of the letter through successive levels.
The first level dealt with the feature recognition task and passed the outcome to the next
level, which gathered the information on the particular features for each of the
letters. A notable characteristic of the model is that it is still capable of
performing a reasonable assessment even if some of the features are unusual
or missing altogether (Selfridge, 1958).
In addition to pattern recognition tasks, it was acknowledged early on that
connectionist networks might be useful in explaining the mechanics of memory
and how the associations between the different patterns are stored. Hebb (1949,
2005) suggested that the synaptic connections between neurons in the brain
that are jointly active are strengthened, a line of research further developed
by Taylor (1956). Taylor explored networks consisting of analog units with
activations on a continuous range, where the outcome suggested that networks
are able to generate patterns similar to those of the units with which they are
associated.
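The principle that connections between jointly active units are strengthened can be sketched as a simple weight update. The sketch below assumes binary units, a unit learning rate, and a thresholded recall step; the function names and patterns are hypothetical and serve only to illustrate how an association between two patterns might be stored and then retrieved from a partial cue.

```python
# Illustrative Hebbian-style association; binary units, a learning rate of 1,
# and the example patterns are assumptions made for demonstration.

def hebbian_update(weights, pre, post, rate=1):
    """Strengthen each connection whose input and output units are both active."""
    return [
        [w + rate * post_j * pre_i for w, pre_i in zip(row, pre)]
        for row, post_j in zip(weights, post)
    ]


def recall(weights, cue):
    """Activate every output unit that receives positive total input."""
    return [1 if sum(w * x for w, x in zip(row, cue)) > 0 else 0 for row in weights]


# Two output units, three input units, no connections to begin with.
weights = [[0, 0, 0], [0, 0, 0]]

# Present an input pattern together with the output pattern to be associated.
weights = hebbian_update(weights, pre=[1, 0, 1], post=[0, 1])

print(recall(weights, [1, 0, 1]))  # full cue
print(recall(weights, [1, 0, 0]))  # partial cue
```

Only the connections between co-active units change, so even a partial cue that shares an active unit with the stored input pattern is enough to reproduce the associated output pattern.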
The significance of exploring the concept of cognition with connectionist
networks should be assessed in the context of a number of wider research directions, as
an overall course of inquiry concerned not only with modelling the brain directly
but also with understanding cognitive performance more generally,
effectively establishing the foundations for later neural network and artificial
intelligence research programmes. In the 1960s and 1970s, however, the symbolic
approach remained the predominant paradigm in cognitive science.
3.1.4 Symbolic models
As alluded to before, one of the origins of symbolic models lies in logic and
philosophy: logical systems comprising rules for symbol manipulation with a
clearly defined outcome. Deductive logic aims to identify a set of rules that
would make it possible to generate the true propositions, and a system of such
rules is referred to as truth preserving. Therefore, it should be possible to
develop a system of procedures capable of displaying cognitive behaviour if
intelligence depended solely upon logical reason. This however is not the case, as
it is often required of humans to make estimated predictions, which would fall
within the domain of inductive logic that aims to develop formal rules that lead
from propositions that are known to be true to those that are estimated to be
true. Following this, intelligent cognitive process could be thought of as a logical
manipulation of symbols, where symbols are regarded as ideas with rules to
govern them. The definition of a symbol has now changed with the application of
computers in modelling where symbols are stored in memory and are extracted
and manipulated according to the computer programmes without any
considerations for semantics. An alternative approach to interpreting the semantics of
the computational model, offered by Newell and Simon (1981), views the computer as
a physical symbol system capable not only of following the prescribed algorithm but
also, more importantly, of using heuristic methods, as shown in the work
on artificial intelligence (Simon, 1977).
Thus, artificial intelligence research has its origins in cognitive sciences and
symbolic models, but has since notably deviated through its pursuit of the idea
that computers are symbol-manipulating systems in a more general sense.
Artificial intelligence models represent the closest simulation of human cognition
and show competence in such tasks as playing chess. From this, two
interpretations are possible: on the one hand, the human brain is the only
true symbolic system and the computer is merely a very capable calculator able to
execute complex algorithms; on the other hand, the computer is the true symbol
manipulator and human cognitive faculties are carried out in a different manner
that merely resembles that of the connectionist models.
3.1.5 Connectionist models
The publication of Perceptrons in 1969 (Minsky & Papert) had, in one way or another,
a detrimental impact on the direction of research in artificial intelligence.
The pessimistic predictions outlined in the book contributed to research
efforts being concentrated on symbolic models instead, a turn of events that
proved unfortunate when later discoveries showed the predictions of the
book to be inaccurate. The analysis of the network models of the
time raised a number of criticisms, notably the inability of a two-layer
network to evaluate certain logical functions such as exclusive or (A XOR B is
true when A is true and B is not, or B is true and A is not). To evaluate such
functions a network must include additional hidden layers, which brought an additional
problem, as at the time no training algorithms for multi-layered networks existed.
As a result, many researchers viewed these criticisms as an indication of a larger
issue, and network models came to be identified with an associationism deemed
inadequate for effective cognitive modelling.
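The exclusive or limitation can be illustrated with a short sketch. It assumes binary threshold units and treats the two-layer network as a single output unit connected directly to two inputs. The exhaustive search over a small grid of integer weights is a demonstration rather than a proof, and the hand-set weights of the network with a hidden layer are one possible choice; all names are hypothetical.

```python
from itertools import product

# Illustrative sketch: threshold units, a small integer weight grid, and
# hand-set hidden-layer weights are all assumptions made for demonstration.

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def step(x):
    """Binary threshold activation."""
    return 1 if x > 0 else 0


def single_unit(a, b, w1, w2, bias):
    """A network without hidden units: inputs connect straight to the output."""
    return step(w1 * a + w2 * b + bias)


def find_single_unit_xor(search=range(-3, 4)):
    """Collect every weight setting in the grid that reproduces XOR exactly."""
    return [
        (w1, w2, bias)
        for w1, w2, bias in product(search, repeat=3)
        if all(single_unit(a, b, w1, w2, bias) == (a ^ b) for a, b in INPUTS)
    ]


def xor_with_hidden_layer(a, b):
    """Two hidden units (OR-like and AND-like) feed a single output unit."""
    hidden_or = step(a + b)          # active when at least one input is active
    hidden_and = step(a + b - 1)     # active only when both inputs are active
    return step(hidden_or - hidden_and)


print(find_single_unit_xor())
print([xor_with_hidden_layer(a, b) for a, b in INPUTS])
```

The search finds no single-unit solution within the grid, consistent with the fact that exclusive or is not linearly separable, whereas two hidden units are enough once their weights are set by hand. Finding such weights automatically is precisely the training problem for multi-layered networks that remained unsolved at the time.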
Nevertheless, in the 1980s papers employing network models to simulate
cognitive processes started to emerge in the published research yet again (for
example J. A. Anderson & Hinton, 1981), and subsequently increased
substantially. The network model renaissance was prompted by a number of
factors, among which are some of the following.
First, new and powerful network modelling and training techniques
and architectures emerged, coupled with advances in the mathematical description of
parallel systems directly applicable to the modelling of cognitive processes. Second,
the credibility of the researchers newly converting to network models played a crucial
role. Third, the structural resemblance of network models to the arrangement
of the human nervous system was reinforced by the increased interest of cognitive
researchers in neuroscience. Fourth, the growing diversity and complexity of rule-based
symbolic models resulted in nostalgia for parsimonious theoretical
ground (much like what behaviourism had set out to offer previously). Finally, a
growing number of researchers started to scrutinize the limitations of symbolic
systems, revealing such weaknesses as inflexibility, unwarranted complexity,
domain specificity, insufficient generalization, and scaling issues due to searches
in large systems. These and other considerations deterred some
cognitive scientists from symbolic models and increased the
appeal of network modelling, which reached a sizeable presence by the end
of the 1980s. In response, symbolic models introduced a number of modifications
to address the criticisms outlined above, such as using rules at a finer
granularity and employing selection and weighting criteria, and even started
to incorporate some of the network features. Some of the key differences
between the network and symbolic models remain however, including the
ordered symbol sequences and consecutive operations of symbolic models that
are not part of network model structure.
3.1.6 Argument for symbolic models
In the field of artificial intelligence, the cognitive models have relied
predominantly upon the symbolic tradition for a number of decades. With the re-
emergence of connectionism as an alternative approach to cognitive modelling,
many proponents of the symbolic tradition have developed a body of research
arguing the inadequacy and limitations of the connectionist approach.
The two most prominent critiques of connectionism, raised by Fodor and Pylyshyn
(1988) and by Pinker and Prince (1988), are discussed in the following paragraphs.
3.1.6.1 Symbolic representation with constituent structure
Fodor and Pylyshyn (1988) argue that connectionist networks fail to fulfil the
requirements of the representationalist system (intentional, semantic) without
the capacity offered by a symbolic representation system, and therefore are
inadequate for modelling cognitive processes. Symbolic representations have a
language-like character, to which Fodor refers in hypothesizing a language
of thought (1975), and a combinatorial syntax and semantics that govern the
formation of molecular representations from their constituents. Composition and
other syntactic rules can be applied irrespective of symbol semantics in such a way
that the syntactic engine mirrors the semantic engine (Dennett, 1981), something
that connectionist systems are said to lack, as individual units or groups of units
cannot be developed into linguistic expressions that follow syntactic rules and
compose simple representations into representations of higher complexity
(Fodor & Pylyshyn, 1988). Thus, it is argued that only a system with symbolic
representations and constituent structure is suitable for modelling cognitive
processes - such as a language of thought that requires the following
combinatorial syntactic and semantic features: (1) the capacity to produce and
understand propositions from an infinite set, (2) the systematicity of thought that
manifests in the intrinsic connections between the ability to comprehend one
thought and other thoughts, and (3) the ability to make syntactic and semantic
inferences. On that premise, localist connectionist networks do not possess
the necessary resources for cognition. In response, Smolensky (1988a)
points out the oversimplification in Fodor and Pylyshyn's analysis of
networks, as units in distributed connectionist systems are able to encode
representational features and microfeatures, and are therefore more suitable as
cognitive systems. In reply, Fodor and Pylyshyn argue that the ability of
distributed networks to recognize the compositional microfeatures of an entity is
not the same as the ability to identify one syntactic unit as part of a larger
syntactic unit. Networks lack such syntactic relation, thus reducing connectionism
to only an account of implementation of the symbolic representational system
(on the nervous system level). In contrast to Fodor and Pylyshyn's account,
Rumelhart and McClelland (1985a) distinguish between the level of information-
processing account of behaviour and the level of abstract accounts. As such, it
should be possible to suggest a multi-level account in which the abstract account is
the subject of such disciplines as linguistics; connectionist and
information-processing systems operate at a hierarchically lower level of analysis that is the
subject of artificial intelligence and cognitive psychology; and the neuro-physiological
account sits at an even lower level (in contrast to Fodor and Pylyshyn's two levels only,
where connectionism is assigned to the lower level).
Connectionism not only aspires to provide an adequate account of the
phenomena that are successfully handled by rules but also, without additional
mechanisms, offers an elegant account of other phenomena.
3.1.6.2 Argument for rules
Pinker and Prince (1988) centre their critique on children's language
acquisition, which in their view necessitates the use of rules. In response to
Rumelhart and McClelland's (1985b) connectionist model, which simulates the
acquisition of the English past tense and applies a uniform procedure to every
case, Pinker and Prince develop an extensive analysis to determine whether it is
a plausible model of human language acquisition. As a result, Rumelhart and
McClelland's (1985b) past tense model was held to a much higher standard than it
was intended to meet (one that no other language acquisition model was able to
meet either), ignoring the substantial simulation of developmental learning that
was attained with such a simple network architecture.
One line of criticism revolves around the type of decomposition of linguistic
phenomena in which rule-based and connectionist models differ: Pinker
and Prince (1988) argue that the mechanistic type of decomposition
implemented in the connectionist model, in contrast to the more abstract
decomposition employed by symbolic models, is inappropriate. Additionally, the
ability of the network to analyse phonological strings for patterning is
attributed to the Wickelfeature structure (for an extended discussion please see
Coltheart, Curtis, Atkins, & Haller, 1993) and not to the network architecture. They
also point out that Wickelphones (Coltheart et al., 1993) are limited to encoding
phonetic information, disregarding syntactic, semantic, and morphological
information important for past tense formation, thus limiting the
connectionist model to encoding input-output patterns rather than abstract
linguistic information. The fact that regular and irregular past tense forms are
considered linguistically different in symbolic models, yet are learned by the same
mechanism in the connectionist model, is also misguidedly seen as a shortfall.
Many of these concerns are addressed by referring again to the already discussed
multiple levels of hierarchy assumed in connectionism: (1) neural processing, (2)
information processing and (3) abstract level. The last point, however, is the
central issue that Rumelhart and McClelland's (1985b) model strives to address –
the ability to account for both regular and irregular forms with a single
mechanism, irrespective of the linguistic prescription that necessitates different
decomposition processes for the two forms. Further criticism refers to the
already acknowledged limitations of the two-layer network. Hidden layers and
the back-propagation learning technique offer significant improvements in
network capacity (Rumelhart, Hinton, & Williams, 1985, 1988), and structured
networks employ inter-network architectures (Touretzky & Hinton, 1988). In the
end, Pinker and Prince (1988) reluctantly acknowledge that advances in network
architecture and learning mechanisms may potentially enable connectionist
models to meet the criteria they specified, while maintaining that such models are
still unlikely to provide more than a mere implementation of standard grammar.
3.1.7 Argument for connectionist models
The next sections consider three kinds of connectionist response to the claims
that prescribe the use of rules and symbolic representations as compulsory.
3.1.7.1 Approximationist approach
One connectionist view relies on the premise that symbolic models are abstract
accounts of the phenomena that lose some level of detail in providing an efficient
account of regularities, and are therefore approximations of the connectionist
account of cognitive performance, which offers the highest level of detail
(Smolensky, 1988a). Thus, in the case of language, it is the symbolic models that
perform the task of approximation, and connectionist models, once sufficiently
developed, would be able to provide a full account of language. Rumelhart and
McClelland (1985b) also promote this position, as in their view symbolic
rule-based systems are too brittle and therefore unable to capture the flexibility
and subtlety of the cognitive process in its entirety: cognitive behaviour is not
governed by rules but is, at best, only approximately described by them.
The behaviour is thought to be governed by a unified mechanism at a level lower
than that of rules – the sub-conceptual level (Smolensky, 1988b) or the
microstructure level (Rumelhart, 1975). An alternative approach would be to develop
intricate rule-based systems that operate at a lower level and utilize soft
constraints for the micro-rules (Holland, Holyoak, Nisbett, & Thagard, 1986).
The connectionist models described earlier that attain considerable
success in extracting rule-like behaviour without explicitly defined rules
(for example the past-tense acquisition model, Rumelhart & McClelland, 1985b)
provide some preliminary evidence in support of the approximationist approach.
The next step would be a model capable of syntactic processing without
reliance on rules, at a level of performance at least comparable to that of a
rule-based model. One such attempt is a network system for processing finite state
grammar strings by Cleeremans, Servan-Schreiber, and McClelland (1989). One of
the challenges for their network was the requirement to consider the preceding
input while the current input is being processed – something a feed-forward
network is not capable of. To overcome this limitation, Cleeremans et al. (1989)
adopted a novel architecture suggested by Elman (1989, 1990) – a recurrent network
which, in addition to the regular feed-forward architecture, includes a set of
context units. These units do not receive external input, but rather receive
activation from the hidden layer, making the previous interpretation of the input
by the hidden layer available during the current processing. As a result, given
appropriate training and network architecture, the network attained a high level
of performance and was able to identify the preceding input in the presented
string. To better understand the learning
process of the network, Cleeremans et al. (1989) used cluster analysis to
examine the information encoded by the hidden units. The method extracts
regularities in the hidden-layer activation patterns that occur for specific input
sequences and constructs a tree cluster representation of similar patterns. Using
just a few hidden units, the network was able to extract close approximations of
abstract grammar rules – an outcome consistent with the symbolic perspective.
Increasing the number of hidden units within the network architecture resulted in
learning patterns that develop a highly complex structure at different levels
of cluster analysis, capturing not only the previous state of the input but also
where it occurs within the sequence. What Cleeremans et al. (1989) are able to show
is that, using back-propagation learning with limited available resources
(only a few hidden units), the network resorts to
extrapolation of abstract high-level rule-like patterns akin to those in symbolic
models. When the network is not constrained, however (more hidden units than
minimally required for the task), it proceeds to extrapolate high-detail lower-level
patterns that are more elaborate than the abstract rules in the traditional
symbolic system.
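The role of the context units can be made concrete with a minimal sketch of the forward pass of an Elman-style recurrent network. The class name, the layer sizes, and the random untrained weights are illustrative assumptions rather than the configuration used by Cleeremans et al. (1989), and training by back-propagation is omitted.

```python
import math
import random


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


class SimpleRecurrentNetwork:
    """Forward pass only; weights are random and untrained (assumption)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-1.0, 1.0) for _ in range(cols)]
                    for _ in range(rows)]

        self.w_in = matrix(n_hidden, n_in)           # input -> hidden
        self.w_context = matrix(n_hidden, n_hidden)  # context -> hidden
        self.w_out = matrix(n_out, n_hidden)         # hidden -> output
        self.context = [0.0] * n_hidden              # context units start at rest

    def step(self, x):
        hidden = []
        for j in range(len(self.w_in)):
            net = sum(w * xi for w, xi in zip(self.w_in[j], x))
            net += sum(w * c for w, c in zip(self.w_context[j], self.context))
            hidden.append(logistic(net))
        output = [logistic(sum(w * h for w, h in zip(row, hidden)))
                  for row in self.w_out]
        # Context units receive no external input: they copy the hidden layer.
        self.context = list(hidden)
        return output


def run_sequence(net, sequence):
    """Reset the context units, then present the inputs one at a time."""
    net.context = [0.0] * len(net.context)
    return [net.step(x) for x in sequence]


net = SimpleRecurrentNetwork(n_in=3, n_hidden=4, n_out=3)
A, B, C = [1, 0, 0], [0, 1, 0], [0, 0, 1]
response_after_a = run_sequence(net, [A, B])[-1]
response_after_c = run_sequence(net, [C, B])[-1]
print(response_after_a != response_after_c)
```

The two responses to the same input B differ because the context units carry a trace of the preceding input, which a purely feed-forward network could not do.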
3.1.7.2 Compatibilist approach
If the approximationist approach works from the bottom up, the compatibilist
approach, on the contrary, works from the top down, and assumes at least some
explicit symbolic processing in humans. Taking into account the success of network
models that do not use explicit rules, Touretzky and Hinton (1988) believe that this
does not necessarily suggest abandoning explicit representations of
rules in reasoning tasks entirely. Instead, an embedded symbolic representation is
implemented within a distributed subsymbolic connectionist architecture to
achieve a powerful, parallel, fault-resistant system of reasoning. As a result,
instead of the usual training process, where the network is left to its own devices
to extract the patterns from the available data, in the compatibilist approach the
network is designed to implement the rules from the top down. In a production
system (J. R. Anderson, 1981, 1983a), symbolic expressions are manipulated by
explicit production rules. Touretzky and Hinton (1988) aimed to develop a
connectionist implementation of a production system – a system capable of using
rules for symbol manipulation, rather than an approximationist network that
generates approximate behaviour without reliance on symbols or rules. For an
exhaustive description of the Distributed Connectionist Production System, please
see Touretzky and Hinton (1988).
3.1.7.3 Using external symbols for symbolic processing
A third alternative to the approximationist and compatibilist approaches was
proposed by Rumelhart, Smolensky, McClelland, and Hinton (1986); it revolves around
the idea of networks developing the capacity to interpret and produce symbols that
are external to the network. Natural language as a symbolic system fulfils a dual
purpose as an internal and external tool – external symbolic formulations
are internalized through the conscious rule interpreter, which is a separate entity
from the intuitive processor that operates on the inherent subconscious level
(Smolensky, 1988b). Thus, it may be tempting to assume that humans need to
function inherently as a rule processing system in order to operate as conscious
rule interpreters. From the connectionist view, however, it may be possible to
provide an alternative explanation of this account. Human developmental
process occurs largely in social environments saturated with external symbols. As
part of this developmental process, we learn the capacity to interact with
external symbols by means of lower-level processes devoid (at least initially) of
the symbol internalization ability, i.e. learning how to use external symbols. Even
if it may appear that the use of symbols is eventually internalised to aid the
reasoning faculties in mature adults, it is unclear in what way it is internalised
exactly. Connectionist systems aim to identify and explain the causal relations of
symbolic processing on a subsymbolic level – the step necessary to confirm that
processing mechanisms at a higher symbolic level are in fact necessary. An
alternative outcome could then suggest that connectionist pattern recognition
ability may be sufficient to account for the symbol processing. Either way, the
network approach to study external symbols may seem like a promising research
direction.
The general idea here differs from the compatibilist approach for the following
reason: instead of developing a rule system, the network is trained to use a
system that may contain symbolic information such as rules. In this way, the
network is required to exhibit its usual behaviours, such as pattern recognition,
while working with the external symbols, possibly benefiting from the external
storage function and other elements. Rumelhart, Smolensky, McClelland, and Hinton
(1986) argue that the real and primary symbol processing function that humans are
capable of performing is the ability to simultaneously manipulate the environment
and process the environment that is being created in the process, using external
symbols to solve difficult complex tasks by decomposing them into smaller,
simpler ones. Thus, a system can learn to process and manipulate
external symbols that are arranged according to some logical order. As it pertains
to external symbol internalization, it is suggested that an internal mental
representation of the external symbol environment is constructed, and the
mental procedures operate on the internal representation instead. The output of
the mental model is used as input for the next mental procedure, and the output of
that procedure is used as input for the mental model, maintaining the series of
mental operations as a loop. Symbols are understood as patterns in the network, where
the stable states of the network are symbols on a subsymbolic dynamic level.
Symbol manipulation is therefore treated as a learned capacity initially
performed in the external environment, where symbols are human artefacts
that may be internalized similarly to nonsymbolic information.
3.1.8 The appeal of connectionist systems
One of the principal reasons why network models have generated increasing
interest within the cognitive science community is that they demonstrate many
properties that resemble human cognition and are not
observed in symbolic models. There are a number of qualitative differences that
set NNs apart from other AI approaches, namely their learning and
representational abilities. Other distinguishing features worth noting are inherent
parallelism and nonlinearity, and the ability to perform exceptionally well
with noisy data (Gallant, 1993).
The following sections describe these properties in more detail.
3.1.8.1 Natural plausibility
It should not be a surprise that network models are compatible with what is
known about the human nervous system. After all, network models were initially
designed to model the human nervous system and the brain: network activation
and propagation are based on elements that can be observed in the nervous
system. Other elements of connectionist network models do not resemble the
biological elements of a natural nervous system, which strengthens the case for
treating network systems as artificial and cognitive in nature. Here, network models
are examined as artificial systems tasked with modelling a cognitive
process; natural plausibility is thus not as useful as it would be in a
discussion concerned with the neurophysiological aspects of the nervous system.
3.1.8.2 Soft constraints
A connection between the units in a network system, much like a rule in a
symbolic system, constitutes a constraint between the two units: if the first unit is
active, the second unit is constrained to be active as well (in an excitatory
connection). Rules, however, are deterministic: if the antecedent of
the rule is satisfied, the consequent action is sure to occur.
Network connections serve a similar purpose but, unlike rules, each unit receives
input from a multitude of other units, so the network represents a situation with
multiple constraints. Thus, the best solution is determined by multiple constraints
and does not necessarily constitute an optimal solution for each of the
individual constraints imposed – these are usually referred to as soft constraints.
Soft constraints seem to be better suited to modelling cognition in a number of
tasks: decision-making is but one such case, where a person is often confronted
with multiple alternatives. In network models, soft constraints provide a natural
way to account for competing alternatives without specifying the underlying rules
that govern the competition, and without being limited by the constraints of
linear models in symbolic systems. Another benefit of soft constraints is
the improved performance of the system when dealing with
information not previously encountered. The implementation of soft constraints,
which is an inherent characteristic of network models, provides the ability to
overcome some of the difficulties of symbolic models, such as exceptions to the
rules (particularly in psycholinguistics) or common mistakes. Soft
constraints allow connectionist models to overcome these limitations and
account for both the regular behaviour that can be governed by rules and the
exceptions with a single mechanism, where different connections carry out
different functions in alternative contexts. Thus, connectionist network models
that employ soft constraints are able to overcome some of the limitations of
inflexible rule-based symbolic models.
3.1.8.3 Graceful degradation
Network systems exhibit a range of features similar to the functioning of the
human brain when it reaches the limit of its performance. Normally a very reliable
system, the human brain under overwhelming strain or physical damage
begins to show less than optimal performance rather than crashing – some of the
requests or parts of information are ignored, affecting performance according to
the level of overload. Similar behaviour can be seen in connectionist networks:
when elements of the system are destroyed, it is very rare to observe the
total loss of a specific function; instead, the damage manifests as a nonspecific,
gradual loss of functionality and increasing limitation of abilities – an effect
generally referred to as graceful degradation.
Symbolic systems are not able to perform in such a manner. If a rule is
eliminated, the system completely loses the functionality that this rule provides.
Redundancy and error-checking mechanisms are able to cope with
system damage to some extent, but still fail to exhibit the full effect of graceful
degradation. Destroying connections or even units in a
connectionist network, on the contrary, does not significantly deteriorate the
overall performance of the system. Deleting particular units would remove the
locally encoded information, but deleting connections would result in graceful
degradation. The network would still be capable of offering plausible
solutions using the available information and learning, rather than crashing.
A system that employs distributed representation would exhibit
only slightly warped behaviour even if some of the units were disabled. With
additional damage, the accuracy of the system's response would deteriorate further,
but the system would still be able to produce a response according to its current
distorted pattern. Thus, connectionist systems possess an inherent ability of
graceful degradation as a consequence of the network architecture.
3.1.8.4 Content addressable memory
Memory from which information can be retrieved using cues that
constitute parts of the stored content itself is usually referred to as content
addressable memory. Modelling this type of information system using symbolic models
poses considerable difficulty. An example would be a filing system, where
information is organized and stored according to some rule – usually along one
dimension (chronological, for example) or two dimensions (chronological
and alphabetical, for example) at most. Accessing the information in any other way
(not chronologically or alphabetically but rather by performance, for instance)
poses a considerable problem, as the information system was never designed for
such access and would therefore require a serial search. Indexing systems could
help to some extent, but this would require determining all possible paths for
information retrieval at the outset, which could be unworkable.
Connectionist networks, however, as discussed above, provide a natural means of
developing a content addressable memory system. Such a system is even capable of
dealing with certain inaccuracies in the cues and is able to provide the best
alternative solutions if a precise answer is impossible to determine due to
inaccuracy. The effect is particularly apparent in distributed representation
systems, where remembering a previous state is effectively no different from the
process of
making inferences and constructing a new state.
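Retrieval from an inaccurate cue can be illustrated with a minimal sketch of a Hopfield-style network (Hopfield, 1982). Several choices here are assumptions made for illustration and are not taken from the text: bipolar (-1/+1) units, outer-product (Hebbian) weights, and two hand-picked eight-unit patterns.

```python
import random


def train(patterns):
    """Symmetric weights as a sum of outer products, without self-connections."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j]
    return weights


def recall(weights, cue, sweeps=5, seed=0):
    """Update one unit at a time, in random order, until nothing changes."""
    rng = random.Random(seed)
    state = list(cue)
    n = len(state)
    for _ in range(sweeps):
        changed = False
        for i in rng.sample(range(n), n):  # asynchronous update order
            net = sum(weights[i][j] * state[j] for j in range(n))
            new_value = 1 if net >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:  # equilibrium reached
            break
    return state


# Two illustrative stored memories (hypothetical data).
STORED = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
WEIGHTS = train(STORED)

# The first memory with its first unit flipped serves as an inaccurate cue.
noisy_cue = [-1, 1, 1, 1, -1, -1, -1, -1]
print(recall(WEIGHTS, noisy_cue) == STORED[0])  # prints True
```

The cue is part of the memory itself, and the settling process fills in or corrects the rest, so no index over possible retrieval paths has to be defined in advance.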
3.1.8.5 Learning from experience
Network models are capable of learning from experience by adjusting the
connection weights, which will be discussed in detail in the following sections.
Symbolic models have been shown to be particularly suitable to represent
learning that follows strict isolated rules such as instructions, whereas network
models are particularly adept at large scale conceptual frameworks such as
language acquisition. Further research is required to examine the usefulness of
network models at acquiring instruction type memory.
3.2 The Neural Network architecture
Connectionist networks are adaptive systems comprising simple computational
units. Although they often contain thousands of interconnected units, network
models, unlike traditional models with sequential processing, are capable of
displaying complex behaviour even with just a few units owing to their parallel
architecture.
3.2.1 Neural Network model structure
To illustrate connectionist processing, consider a model designed to simulate the
functionality of a content-addressable memory system: the hypothetical dataset
describes two groups of people with demographic and occupational attributes.
3.2.1.1 Network components
To encode this data, the connectionist network uses a number of distinct
components. A set of (1) computational units is joined by (2) connections, and at
certain times each unit examines its input and computes an (3) activation as its
output, which is passed to other units along the connections. Each connection
carries a signed (4) weight: the sign of the weight determines whether the
activation influences the receiving computational unit in a similar or opposite
way, and the size of the weight determines the magnitude of that influence.
Connections and weights are the
essential parameters of the model and determine the model behaviour.
3.2.1.1.1 Activations
Activation values for the units are determined by the activation equations.
Initially set to a certain value, activations change once the simulation is run
and are adjusted in response to the effects of external input, the propagation of
activations from other units in the system, and decay over time. Only the
input layer can be affected by external input – units in the hidden layer
(each unit representing a group member, for example) are only affected by the
propagation of activation from other units and by decay over time.
3.2.1.1.2 Connections
Weighted connections transfer the activations between the units. In the
hypothetical dataset described above, the connections between the unit for each
group member and the units for each of their attributes are excitatory. As a
result, a property unit propagates activation to the unit that represents a group
member who possesses that attribute. The connections to the units that represent
group members who do not possess the attribute are inhibitory, as are the
connections between mutually exclusive attributes. Thus, if a certain age group is
activated, all the units that represent other age categories will become less
active (due to inhibitory connections between the mutually exclusive properties);
units that represent group members of the appropriate age will be activated
(excited), and those that represent members of a different age will not (inhibited).
3.2.1.2 Dynamics of the network
To illustrate the functionality of the network, consider a sample task of memory
retrieval. The external input is supplied to one or several input units of the
network. As a result, the excitatory connections will transfer the activation to the
units associated with the input unit supplied with external activation, and
inhibitory connections will decrease the activations of the units that are not
associated with the externally activated unit. Thus, if the unit representing a group
member’s name is activated externally, the activation will be carried to all the
units that represent that group member’s characteristics, and will decrease the
activation of all the units that represent other group members and characteristics
other than those associated with the group member that is being
externally activated. These effects will continue to reverberate throughout the
network across numerous cycles until the system achieves a state of
equilibrium where additional cycles no longer improve the performance. All the
attributes associated with the externally activated group member will now have
high activation values, thus recovering the group member’s characteristics from the
system.
A more practical query may be the reverse of the one described above – the task
of content-addressable memory, where the group member’s name is retrieved
by activating the characteristics. Simultaneously activating a number of
characteristics will activate the hidden unit that represents the group member,
which in turn will activate the group member’s name unit. Moreover, it will
activate all the units in the hidden layer that represent the group members that
have excitatory connections with the externally activated attributes, thus activating
all of these group members’ name units to varying degrees – behaviour of the system
that offers a high level of generalization.
The process outlined above is able to produce more subtle effects similar to
human performance in tasks of categorization and prototype formation. The
network is able to obtain category instances when external input is supplied to one
of the category units, highlighting all group member units. During this process, all
group members are characterized as to how well they represent the category. As the
category unit is activated, the activation is propagated to all member units, which
in turn propagate the activation to all their characteristic units. Therefore, the
most common group member characteristics become activated the most,
sending their activation back to the individual member units and thus forming the
prototype.
Another example shows the practical use of regularities. If multiple
characteristic units are activated externally, they will immediately propagate
activation to those individual member units that share the externally activated
characteristics, thus changing the activation of those members. In turn, the
member units will propagate activation to other properties that are characteristic
of the activated member units. This will then activate additional member units
that do not possess the characteristics initially activated externally, identifying
the other group members who are most likely to show the best fit with
the primary group. This network behaviour effectively allows making inferences
from known characteristics to other characteristics.
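The retrieval dynamics described in this section can be sketched with a very small interactive network. Everything specific below is an assumption made for illustration: the two hypothetical members (Ann and Bob), their attributes, the weights of +1 and -1, the step and decay parameters, and the particular update rule, which pushes activation towards 1 for positive net input and towards 0 otherwise. As a further simplification, competition between members is modelled by a single inhibitory connection between the member units rather than by inhibitory connections from every attribute.

```python
UNITS = ["name:Ann", "name:Bob", "member:Ann", "member:Bob",
         "age:young", "age:old", "job:teacher", "job:doctor"]
INDEX = {unit: i for i, unit in enumerate(UNITS)}


def build_weights():
    n = len(UNITS)
    w = [[0.0] * n for _ in range(n)]

    def connect(a, b, value):
        # Connections are two-way, so the weight matrix is symmetric.
        w[INDEX[a]][INDEX[b]] = value
        w[INDEX[b]][INDEX[a]] = value

    profiles = {"Ann": ["name:Ann", "age:young", "job:teacher"],
                "Bob": ["name:Bob", "age:old", "job:doctor"]}
    for person, properties in profiles.items():
        for prop in properties:
            connect("member:" + person, prop, 1.0)   # excitatory
    for a, b in [("name:Ann", "name:Bob"), ("member:Ann", "member:Bob"),
                 ("age:young", "age:old"), ("job:teacher", "job:doctor")]:
        connect(a, b, -1.0)                          # inhibitory
    return w


def settle(external, cycles=200, step=0.1, decay=0.1):
    """Synchronously update all units for a fixed number of cycles."""
    w = build_weights()
    n = len(UNITS)
    ext = [external.get(unit, 0.0) for unit in UNITS]
    act = [0.0] * n
    for _ in range(cycles):
        net = [ext[i] + sum(w[i][j] * act[j] for j in range(n))
               for i in range(n)]
        updated = []
        for a, x in zip(act, net):
            if x > 0:
                delta = (1.0 - a) * x - decay * a   # pushed towards 1
            else:
                delta = a * x - decay * a           # pushed towards 0
            updated.append(min(1.0, max(0.0, a + step * delta)))
        act = updated
    return dict(zip(UNITS, act))


# Query 1: activate a name and read off the characteristics.
by_name = settle({"name:Ann": 1.0})
# Query 2: activate characteristics and read off the name.
by_content = settle({"age:young": 1.0, "job:teacher": 1.0})
for unit in UNITS:
    print(unit, round(by_name[unit], 2), round(by_content[unit], 2))
```

In both queries the units belonging to Ann end up with high activation while the competing units remain at rest, which corresponds to the two directions of retrieval described above.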
3.2.2 Neural Network architecture design features
The network architecture illustrated in the previous section may be well suited to
memory retrieval tasks and provides attractive network behaviour associated
with such tasks. This design, however, is not particularly suitable for most other
applications. Other connectionist architectures may be distinguished with regard
to four features: (1) the way units are interconnected, (2) unit activation
mechanisms, (3) learning procedures that alter the connections, and (4) the
semantic interpretations of the system.
3.2.2.1 Connectivity pattern
Connectionist networks may be divided into two major classes in respect to the
way the units are connected: (1) feedforward networks that have one-way
connections where the activation goes from the input layer to the output layer
and activation is forward propagated, and (2) interactive networks that have two-
way connections where dynamically changing activations reverberate between
the units in the network over many cycles.
3.2.2.1.1 Feedforward networks
Units in a feedforward network are arranged into distinct layers – the simplest
two-layer configuration consists of only input and output layers. All input units
are connected to all the output units and, once the connection weights are
configured appropriately, the network is able to respond to
an input pattern with a distinctive output pattern; for that reason it is
sometimes referred to as a pattern associator. A pattern associator could be used as
a classification device where inputs are sorted into a few output categories. The
limited architecture of a two-layer network, however, is insufficient to address
certain problems such as the XOR (exclusive or) function. To overcome this
limitation, it is necessary to introduce a hidden layer into the network
configuration. Situated between the input and output layers, the hidden layer
modifies the information processing and provides considerable additional
functionality to multi-layered feedforward networks. A number of modifications
are possible in multi-layered networks. One such modification connects units
not only to the next layer, but also to the units in the layers beyond the next one
– so in a three-layer network input units would connect not only to the units in
the hidden layer, but also directly to the units in the output layer, in addition to
the connections between the hidden layer and the output layer. Another
modification establishes the recurrent network, where the system receives the
input in a sequential manner and the response is altered according to the
information from previous steps of the sequence. The pattern achieved on a higher
layer is fed
back into the lower layer and serves as a form of input.
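The XOR limitation mentioned above can be checked with a short sketch. The weights and thresholds are hand-chosen assumptions rather than learned values, and the grid search over two-layer configurations is only an illustration, not a proof.

```python
import itertools


def step(net_input, threshold):
    """Binary threshold unit: fires only if the net input exceeds the threshold."""
    return 1 if net_input > threshold else 0


def xor_network(x1, x2):
    """Three-layer network: two hidden units feed one output unit."""
    h_or = step(1.0 * x1 + 1.0 * x2, threshold=0.5)    # active if either input is on
    h_and = step(1.0 * x1 + 1.0 * x2, threshold=1.5)   # active only if both are on
    return step(1.0 * h_or - 1.0 * h_and, threshold=0.5)


XOR_CASES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def two_layer_solves_xor(grid):
    """Search a grid of weights and thresholds for a single output unit."""
    for w1, w2, threshold in itertools.product(grid, repeat=3):
        if all(step(w1 * a + w2 * b, threshold) == target
               for (a, b), target in XOR_CASES):
            return True
    return False


for (x1, x2), target in XOR_CASES:
    print(x1, x2, xor_network(x1, x2), target)

grid = [i * 0.5 for i in range(-4, 5)]
print(two_layer_solves_xor(grid))  # prints False
```

The hidden units re-describe the input as "at least one input on" and "both inputs on", and in that re-described space a single threshold on the output unit is sufficient.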
3.2.2.1.2 Interactive networks
In contrast to feedforward networks, interactive networks include at least some
number of two-way connections and input is processed across multiple cycles. If
processing units are organized into layers, the processing could go forwards and
backwards.
3.2.2.2 Activation rules
In addition to the pattern of connectivity, network models differ in the rules
that govern the unit activation values. Activation values
can be grouped into classes that include (1) discrete activations that typically
involve a binary value (for instance 0 and 1) and (2) continuous activations, either
bounded (a range of -1 to +1, for example) or unbounded. Activation rules specify the
calculation of the level of activation for each of the units. The following
paragraphs will discuss the rules in detail.
3.2.2.2.1 Activation rules in feedforward networks
The input to a unit is composed of two components: the external input and the effect
of activity in other connected units. In a two-layer feedforward network, input
layer units are dedicated to receiving external input and take the input pattern
value as their activation, and therefore do not require an activation rule. The
output layer units, in contrast, are dedicated to receiving activation from other
units in the network. The output from each input layer unit is sent to the output
layer units, where it is multiplied by the connection weight – the summed output
from all the input layer units provides the total input for each output layer
unit. Additionally, a bias could be introduced to regulate the responsiveness of
each output unit, acting as an additional input to the output layer unit that is
unaffected by the dynamics of the network: a low value will result in a conservative
response of the output unit, whereas a high value will do the opposite. The
activation of each unit is then determined by applying the activation rule to
this total input. If a linear activation rule is used,
the activation equals the total input – provided certain
constraints are met. Introduction of hidden layers provides the additional power
needed to go beyond these constraints, making it necessary to change the activation
rule accordingly to a nonlinear function – logistic for instance.
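The calculation described above can be written out as a short sketch. The function names and the example weights are illustrative assumptions; the logistic function is used in its standard form.

```python
import math


def net_input(activations, weights, bias=0.0):
    """Weighted sum of the sending units' activations plus the bias."""
    return sum(a * w for a, w in zip(activations, weights)) + bias


def linear(net):
    """Linear activation rule: the activation equals the net input."""
    return net


def logistic(net):
    """Nonlinear activation rule, bounded between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-net))


inputs = [1.0, 1.0]        # activations of two input layer units
weights = [0.5, -0.25]     # one excitatory and one inhibitory connection

print(linear(net_input(inputs, weights)))                 # 0.25
print(logistic(net_input(inputs, weights, bias=-0.25)))   # 0.5

# A low bias makes the unit conservative; a high bias makes it responsive.
print(logistic(net_input(inputs, weights, bias=-2.0)) <
      logistic(net_input(inputs, weights, bias=2.0)))     # True
```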
These rules could be adapted for use with discrete rather than continuous
activation values (for networks that use binary units). With the linear activation
rule, the continuous output of the unit is compared with a specified threshold
value and, depending on the result, the continuous output of the unit is
converted to binary (either 0 or 1). It is possible to use threshold units within
the hidden and output layers of feedforward networks, as well as in interactive
networks. If the logistic activation rule is used in a feedforward network with binary
units, the binary activation takes a probabilistic form where the equation
determines the relative frequency of the discrete result.
3.2.2.2.2 Activation rules in interactive networks
In addition to the equations used in feedforward networks, interactive networks
require a parameter for time (t) or cycle (c), as the activation of each unit is
updated numerous times in response to a particular input. Unit activations may
all be updated once per cycle if a synchronous update procedure is employed, or
each unit may be updated separately at a randomly determined time if an
asynchronous update procedure is employed (which helps to prevent unstable
oscillations of the network). Another difference is that each update requires a
separate application of the activation rule, unlike feedforward networks, where
there is a single forward wave of activation changes and the rule is applied
once for each unit. Hopfield nets, for instance (Hopfield, 1982), consist of
linear threshold units: on each update, a unit acquires an activation of 1 if its
input is above the threshold; otherwise its activation is set to 0. An
asynchronous update procedure is employed, so that each unit updates its
activation at a random time according to the input it receives from the network
at that moment, until no further update would change the activation of any unit.
The network is then said to have reached a state of equilibrium, which
constitutes the network's identification of the initial input (provided the
network indeed settles into equilibrium rather than oscillating between multiple
configurations).
An important role in the acceptance of the model was played by the fact that
Hopfield (1982) defined a measure of the network state (energy, E), showing the
analogy between the network settling into equilibrium and a physical
(thermodynamic) system settling into its state of lowest energy. The E measure
could be adapted to show the goodness of fit (G) of the network’s end state of
equilibrium (Rumelhart et al., 1986). Hopfield nets are increasingly useful in a
number of optimization applications where the connections represent the
constraints on possible configurations of network equilibrium (a solution to the
supplied input).
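The settling process and the energy measure can be sketched as follows. This is an illustrative toy rather than a reproduction of Hopfield's simulations: the four-unit weight matrix, the thresholds, and the function names are assumptions, and the energy is written in the commonly used form for 0/1 threshold units with symmetric weights, E = -1/2 * sum(w_ij * s_i * s_j) + sum(theta_i * s_i).

```python
import random

def energy(state, weights, thresholds):
    """Energy E of a network state (lower means more constraints satisfied)."""
    n = len(state)
    pair_term = sum(weights[i][j] * state[i] * state[j]
                    for i in range(n) for j in range(n) if i != j)
    return -0.5 * pair_term + sum(t * s for t, s in zip(thresholds, state))

def settle(state, weights, thresholds, rng, max_sweeps=100):
    """Update units one at a time in random order until no unit changes."""
    state = list(state)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in rng.sample(range(n), n):  # asynchronous, random order
            net = sum(weights[i][j] * state[j] for j in range(n) if j != i)
            new_activation = 1 if net > thresholds[i] else 0
            if new_activation != state[i]:
                state[i] = new_activation
                changed = True
        if not changed:  # equilibrium: no update would change any unit
            break
    return state

# Hypothetical network: units 0 and 1 support each other, as do units 2 and 3,
# while the two pairs inhibit one another (symmetric weights, zero diagonal).
weights = [[0, 2, -1, -1],
           [2, 0, -1, -1],
           [-1, -1, 0, 2],
           [-1, -1, 2, 0]]
thresholds = [0.5, 0.5, 0.5, 0.5]
start = [1, 1, 1, 0]  # a noisy version of the pattern [1, 1, 0, 0]
final = settle(start, weights, thresholds, random.Random(0))
print(start, energy(start, weights, thresholds))
print(final, energy(final, weights, thresholds))
```

In this toy case the network cleans up the noisy input, and the energy of the end state is lower than that of the starting state, which mirrors the analogy with a physical system relaxing into a low-energy state.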
One of the difficulties Hopfield nets demonstrated is that the network can settle
into a local minimum: a stable state where different parts of the network settle
into incompatible configurations and, as a result, the network is not able to
reach the overall state with the lowest possible E value. To reduce this
tendency, the Hopfield net was adapted by Hinton and Sejnowski in their
Boltzmann machine (1985; 1984). The difference from the Hopfield net is that it
employs a stochastic activation function rather than a deterministic one;
essentially, it is a probabilistic version of the logistic function discussed in
the paragraph on activation rules in feedforward networks above.
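A stochastic activation rule of this kind can be sketched in a few lines. The temperature parameter is an assumption taken from the standard formulation of the Boltzmann machine (it is the quantity lowered during simulated annealing); the function names and the sample size are hypothetical.

```python
import math
import random

def probability_on(net, temperature=1.0):
    """Logistic function giving the probability that a unit takes the value 1."""
    return 1.0 / (1.0 + math.exp(-net / temperature))

def stochastic_activation(net, rng, temperature=1.0):
    """Binary activation drawn at random with the logistic probability."""
    return 1 if rng.random() < probability_on(net, temperature) else 0

rng = random.Random(42)
net = 1.0
samples = [stochastic_activation(net, rng) for _ in range(10000)]
relative_frequency = sum(samples) / len(samples)
print(probability_on(net), relative_frequency)
```

Over many updates the relative frequency of the value 1 approaches the logistic probability, and a high temperature pushes that probability towards 0.5, which is what allows the network to escape local minima early in the settling process.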
Anderson’s spreading activation models (1981; 1983b) utilize a negative
exponential function of current activation to achieve nonlinearity in semantic
networks, using a decay function for interactive processing. Used in the service
of a production system, the network allows parallel processing within the system
architecture by tolerating a number of active processes, similar to some degree,
running simultaneously in competition (similar to the notion of soft
constraints). Spreading activation models could be distinguished from the network
models on the following criteria.
First, spreading activation models maintain a structure of control, whereas
connectionist networks retain no control over cognition modelling other than the
internal, decentralized local control of the network. Second, certain types of
distributed representation are emphasized in connectionism (McClelland,
Rumelhart, & Group, 1986). Third, the propagation equations differ.
To summarize, all the different types of networks have certain common
elements. New activation of a unit is dependent upon net input received from
other units, which in turn is determined by the connection weights. Connectionist
networks usually have the ability to alter their connection weights in an adaptive
manner through a process that is often referred to as learning. Different learning
procedures are discussed in the following sections.
3.2.2.3 Learning procedures
In connectionism, the process of learning signifies the ability of the network to
modify connection weights between the units. The weights determine in some
measure the end state a network could reach as a result of the processing, and
therefore changing them transforms the network characteristics. The goal of a
learning procedure is to define a basic procedure that enables the network to
achieve the desired output without an external control system, that is, through
a local system of weight change control. The information readily available
locally for each connection includes the current value of the weight itself and
the activations of the units it connects.
Donald Hebb proposed that learning occurs in the nervous system through the
strengthening of the connections between neurons whenever they fire
simultaneously. Based on this proposal, one of the simplest learning procedures
in connectionism specifies that the weight of the connection between two units is
increased (or decreased) in proportion to the product of their activations: the
Hebbian learning rule. Consequently, when the activations of both units have the
same sign the weight is increased in proportion to the product of their
activations, and it is decreased in the same way when the signs differ. Although
capable of producing impressive results, the Hebbian rule presents some serious
limitations. A number of different learning procedures will be discussed below;
all, however, are based on the same principle, which specifies learning as a
procedure of changing connection weights employing only the information
available locally.
3.2.2.4 Semantics of connectionist systems
If a connectionist network is to simulate human behaviour or cognitive
performance, one must consider the representation of the concepts of that
domain in the network. It is possible to either designate each unit to a particular
concept in localist networks; or designate multiple units to represent the
concept, as is the case in distributed networks.
3.2.2.4.1 Localist networks
In localist networks, each concept is represented by a designated unit. One of the
obvious advantages that this offers is the considerable ease with which
researchers could monitor the network performance in terms of the domain and
objects studied. A possible caveat, however, is that it is easy to forget that
the concept appointed to each unit carries meaning only to the researcher and not
to the network itself. It is necessary to rely on an external system for semantic
interpretation, which could limit the performance of the network by the design of
the architecture. Despite this, it may be even more difficult to design a
distributed network, and localist networks may be preferable in a range of
tasks. Concepts are represented by units, and constraints between those concepts
are represented by the connections: a positive connection favours the state of
the network where both units have the same activation, whereas a negative
connection favours opposite activations. The state that the network achieves at
its global minimum is the state that best satisfies the soft constraints.
3.2.2.4.2 Distributed networks
In distributed networks the situation is quite different, as a concept is
represented by an activation pattern across a number of units rather than by a
single unit. One way to design a distributed representation of a concept in a
network is through featural analysis of the concept (a method often employed in
the field of psychology), encoding the result across the appropriate units. The
separate features derived from such an analysis are usually encoded as individual
units in the network architecture, so that localist representations of the
features together form a distributed representation of the concept. One way to
obtain a workable featural analysis of a concept is to rely on an established
theoretical framework, as connectionists are often concerned not so much with the
features of the concept as with how the network utilizes the distributed
representations. Another way is to allow the network to perform the analysis.
During the learning process, when only input and output parameters are
identified by the researcher, hidden units of a multi-layered network will
develop sensitivity to certain features of the concept. Network learning is
effectively an intricate feature extraction mechanism. Usually, networks do not
arrive at the localist solutions that seem apparent from the input; rather, each
of the hidden units develops sensitivity to a complex and subtle regularity
commonly referred to as a microfeature. Thus, hidden layers are able to provide
distributed coding of the input.
Such distributed representation design carries additional benefits for the model
architecture. Once a concept is a pattern distributed across units, the
processing capacity of the network is also distributed and is therefore able to
compensate for missing, partial, or even inaccurate data through processing in
other units. Thus, the system is more resilient to failure. Moreover, the system
is able to learn new information or provide a response to previously unseen
input. This network capacity is akin to the human process of making
generalizations (the ability to infer some of the unknown properties of an
entity based on its known properties) and therefore may be particularly suitable
for such tasks (McClelland et al., 1986).
Another technique employed in distributed networks is coarse coding, which is
discussed elsewhere (Touretzky & Hinton, 1988).
3.3 Machine learning
Machine learning broadly refers to the ability of a model to improve its performance
based upon input information. It is generally considered that research on
machine learning presents the highest potential to eventually develop models
able to perform complicated AI tasks, as algorithms that learn from training and
experience are superior to those based on a subset of contingency rules
developed by human scientists. Machine learning may be divided into supervised
and unsupervised learning.
Supervised learning analyses training data (i.e. labelled data: pairs of input
and output values) to produce an inferred function, such as a regression
function, able to predict the correct output for any input. The learning
algorithm is required to make certain generalizations from the training data
that could be used to analyse previously unseen data, a process that is
analogous to concept learning in human and animal psychology. Unsupervised
learning refers to the machine learning problem of determining the underlying
structure of unlabelled data. With unlabelled data there is no error signal with
which to evaluate a possible solution; unsupervised learning therefore relies on
techniques such as clustering that examine the core features of the data. The
self-organizing map (Kohonen, 1990, 1998) is one such algorithm often used in NN
models (for a strategic marketing application, see Curry, Davies, Phillips,
Evans, & Moutinho, 2001).
The capacity of Neural Network Models to learn is one of the features most
notable to researchers. This complex question requires particular attention, and
learning algorithms, both connectionist and other, are discussed here. Certain
philosophical issues concerning the connectionist learning are also addressed.
3.3.1 Traditional approaches
Following the two distinctive philosophical approaches to learning, the
theoretical findings in disciplines such as psychology and linguistics (and others)
have customarily been divided to follow one of the two major intellectual
traditions: empiricism or rationalism.
3.3.1.1 Empiricism
Philosophical empiricism (largely based on the work of Bacon, Locke, Berkeley,
and Hume) rejects excessive reliance on established principles of reasoning, and
views sensory experience as the primary requirement for the acquisition of
knowledge. Certain integral elements of the theoretical framework are of
particular interest here. Associationism, for instance, holds that sensory
processes result in simple ideas, which in turn are composed into complex ideas
through the spatial contiguity that produces the association. Temporal contiguity
is viewed as integral to the concept of causation, as the idea of a cause would
elicit the associated idea of its effect. Simple ideas, which are sensations, are
composed into complex ideas through simple additive mechanisms, and are
therefore sufficient to predict the properties of complex ideas.
Behaviourist models employed a kind of associationism in such a way that the
entities involved in the association were limited to those available for the
observation – the environmental events and the behavioural responses. During
the period dominated by the behaviourist theories, learning was one of the
central research domains. In behaviourism, learning could be defined
operationally as a change in response frequency. Researchers investigated
different ways of arranging the environment and employed mathematical
modelling to establish the theory, for the large part deliberately ignoring the
internal processes and mechanisms of the system. These and some other
limitations left behaviourism vulnerable to the emergent information
processing theories.
3.3.1.2 Rationalism
The other major philosophical tradition that influenced the development of the
cognitive sciences is rationalism (represented by Descartes, Spinoza, and
Leibniz). Contrary to empiricism, ideas in rationalism are not restricted to
experiences but rather are innate: what is important is how these ideas are used
in reasoning. In psycholinguistics, Chomsky (1957, 1968) introduced the concept
of an innate Universal Grammar, arguing that the amount of information that
children receive in their early years would not be sufficient for a child to
develop the rules of grammar (the poverty of the stimulus argument, made in his
1959 review of B. F. Skinner’s Verbal Behaviour). Following the Chomskian
tradition, the child is said to be born with a set of default parameters that
could be reset according to experience.
Neither the empiricist nor the rationalist framework was able to provide a
convincing account of the mechanisms of the language acquisition process.
3.3.1.3 Contemporary cognitive science
Unlike empiricists and rationalists, cognitive psychologists and artificial
intelligence researchers for the most part seemed to ignore the learning process
until recently, addressing other areas where immediate results could be achieved
using symbolic rule-based models (information representation, memory systems,
etc.). The rise of connectionist approaches to learning in the 1980s generated
an increased interest in learning. A research area known as machine learning
emerged within the artificial intelligence framework; it focuses on developing
strategies for machines to learn from experience. In rule-based systems,
learning strategies focus on the addition or modification of rules, a rather
challenging task, as modifying the rules to accommodate certain circumstances
may result in a drastic deterioration elsewhere. Another critique is that adding
and modifying rules is arguably too rudimentary a mechanism to capture the
learning process.
Therefore, in the 1990s, empiricists continue to develop increasingly
sophisticated methods to modify the symbolic rules; rationalists offer novel
interpretations of adjustments to innate grammar system in language acquisition;
and connectionists develop new algorithms that are able to provide the sub-
symbolic network architectures of the learning process.
3.3.2 Neural Networks approach
Learning in connectionist models is a process of adjusting connection weights
between the units, which would have an effect on subsequent processing of the
input by the network. While the network is trained, the activations and weights
change on each trial – subsequently after the network training is complete, the
network is tested to examine the effect of the input on the activations only. Both
weight and activation changes are determined based on the local information
available immediately to each unit: remote units in the network are affected by
spreading local propagation. A number of learning procedures for connectionist
networks have been developed; they employ either supervised learning (feedback
based on specified input-output pairs) or unsupervised learning (no feedback on
input-output provided).
3.3.2.1 Two-layer feedforward network learning procedures
The objective of the learning procedure is to determine the weights that allow
the network to respond appropriately to a number of cases. Each case comprises
an input layer pattern of activations and an output layer pattern of
activations, each forming an n-dimensional vector, where n is the number of units
in the layer.
The learning ability is normally assessed during the training and testing stages.
During training, the weights are successively modified according to the
constraints of the set of cases. Depending on the learning procedure used and
the difficulty of the set of cases, the network may require a large number of
epochs to achieve the desired state; once an acceptable level of performance is
attained, it should be able to respond to the input patterns with the
appropriate output patterns. Two particular learning procedures are the Hebbian
and delta rules.
3.3.2.1.1 The Hebbian rule
A two-layer feedforward network that applies the linear activation rule and the
Hebbian rule to a set of input-output cases forms a learning system referred to
as a linear associator. During training, an input pattern and the corresponding
output pattern are presented, and the Hebbian rule is used to adjust the
connection weights: the change in each weight is the product of the activations
of the two connected units and the learning rate. Consequently, if the two unit
activations are both positive or both negative, the connection weight will be
increased by the amount specified; if one activation is positive and the other
is negative, the connection weight will decrease according to the negative value
of the product. During the testing stage, only the input patterns are presented.
The Hebbian rule offers good results as long as the input patterns are not
correlated, a requirement that imposes a substantial limitation.
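The linear associator and its limitation can be illustrated with a small sketch. The patterns, the learning rate of 0.25, and the function names are hypothetical; the learning rate is chosen as one divided by the number of input units so that recall is exact for the uncorrelated patterns.

```python
def train_hebbian(cases, n_inputs, n_outputs, learning_rate):
    """Hebbian rule: each weight grows by rate * input activation * output activation."""
    weights = [[0.0] * n_inputs for _ in range(n_outputs)]
    for input_pattern, output_pattern in cases:
        for j in range(n_outputs):
            for i in range(n_inputs):
                weights[j][i] += learning_rate * output_pattern[j] * input_pattern[i]
    return weights

def recall(weights, input_pattern):
    """Linear activation rule: each output equals its net input."""
    return [sum(w * a for w, a in zip(row, input_pattern)) for row in weights]

a = [1, -1, 1, -1]
b = [1, 1, -1, -1]   # uncorrelated with a (their dot product is 0)
c = [1, 1, 1, -1]    # correlated with a (their dot product is 2)

uncorrelated = train_hebbian([(a, [1, -1]), (b, [-1, 1])], 4, 2, learning_rate=0.25)
correlated = train_hebbian([(a, [1, -1]), (c, [-1, 1])], 4, 2, learning_rate=0.25)

print(recall(uncorrelated, a), recall(uncorrelated, b))  # desired outputs recovered
print(recall(correlated, a))                             # distorted by cross-talk
```

With the uncorrelated pair the stored associations do not interfere, whereas with the correlated pair the second association leaks into the recall of the first, which is the limitation noted above.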
3.3.2.1.2 The least mean square rule
The least mean square (LMS) learning rule is similar to the Hebbian rule in that
it considers the input unit and the relevant output unit for a change in weight,
but it is substantially more powerful. During the training stage, the LMS rule
generates the actual output pattern from the input pattern, compares it to the
desired output pattern, and changes the weights so as to reduce the discrepancy
for each of the output units. Thus, the rule is effectively an error correction
procedure. Once the discrepancy for each of the units is calculated, the
discrepancies are squared and added together to compute the pattern sum of
squares (pss) value. Summing the pattern sum of squares values over all patterns
gives the total sum of squares (tss) value, which indicates how much improvement
is still obtainable before perfect performance is attained on the whole set of
input-output cases. Thus, the essential principle is to change the connection
weights so as to minimize the total error, and the LMS rule need not be
restricted to sets of uncorrelated input patterns.
When the two rules are contrasted, the difference that gives the LMS method a
substantial increase in performance over the Hebbian rule is that the LMS method
is able to utilize the discrepancy between the actual and desired output to
change the connection weights during the training stage, whereas the Hebbian
rule is only able to use it for evaluation purposes during the test stage. The
two-layer network could be quite powerful given that certain conditions are met
(a linearly independent set, i.e. none of the input patterns is a linear
combination of other patterns): it is capable of learning the inclusive or (OR)
function. It is, however, not capable of learning the exclusive or (XOR)
function, as there is no set of weights capable of generating the correct
output. The network will nevertheless aim to minimize the tss and will learn to
generalize to new input patterns based on their similarity to the input patterns
of the training stage. As a result, the network will do a good job of
identifying an acceptable output pattern.
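The contrast between OR and XOR can be made concrete with a minimal sketch of the LMS (delta) rule for a single linear output unit with a bias. The learning rate, the number of epochs, the 0.5 cutoff used to read the continuous output as a binary answer, and all function names are assumptions made for illustration.

```python
def respond(weights, bias, input_pattern):
    """Linear output unit: activation equals the net input."""
    return sum(w * a for w, a in zip(weights, input_pattern)) + bias

def train_lms(cases, n_inputs, learning_rate=0.05, epochs=2000):
    """Delta rule: change each weight by rate * error * input activation."""
    weights, bias = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        for input_pattern, target in cases:
            error = target - respond(weights, bias, input_pattern)
            weights = [w + learning_rate * error * a
                       for w, a in zip(weights, input_pattern)]
            bias += learning_rate * error
    return weights, bias

def total_sum_of_squares(cases, weights, bias):
    """tss: with one output unit, each pss is a single squared discrepancy."""
    return sum((target - respond(weights, bias, x)) ** 2 for x, target in cases)

def all_correct(cases, weights, bias, cutoff=0.5):
    """Read each output as 1 if it exceeds the cutoff, and compare with the target."""
    return all((respond(weights, bias, x) > cutoff) == bool(target)
               for x, target in cases)

OR_CASES = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
XOR_CASES = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

or_weights, or_bias = train_lms(OR_CASES, 2)
xor_weights, xor_bias = train_lms(XOR_CASES, 2)
or_tss = total_sum_of_squares(OR_CASES, or_weights, or_bias)
xor_tss = total_sum_of_squares(XOR_CASES, xor_weights, xor_bias)
print("OR  learned:", all_correct(OR_CASES, or_weights, or_bias), "tss:", round(or_tss, 3))
print("XOR learned:", all_correct(XOR_CASES, xor_weights, xor_bias), "tss:", round(xor_tss, 3))
```

For XOR the best a linear unit can do is to output roughly 0.5 for every pattern, so the tss cannot fall below 1.0 however long training continues; this is the constraint that hidden units later remove.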
In the 1980s, powerful new algorithms for training hidden units, such as
back-propagation, finally made it possible to overcome the linear separability
constraint, leading to a major breakthrough for connectionism.
3.3.2.2 Back-propagation learning procedure in multi-layered networks
By introducing a hidden layer between the input and the output, the information
flow becomes more sophisticated. This allows the network to process intermediate
results obtained from the input activations, which are then used in the output.
The network architecture becomes substantially more complex as well, as the
researcher now faces a number of design choices that need to be addressed. The
number of hidden layers and of hidden units in each layer is one such question,
and the researcher may choose to perform some exploratory analyses to determine
the optimal model size. With multiple layers present in the network structure,
it is necessary to consider the extent of interconnectivity between the layers,
as not only successive connections are now possible (input-hidden-output), but
also additional direct connections (input-output in addition to
input-hidden-output). In addition, the more intricate network architecture
requires a more sophisticated nonlinear activation rule. Finally, a learning
procedure capable of handling the now available hidden units is essential, such
as a modified LMS procedure where the activations propagate forward and then the
error and weight adjustments propagate back through the network
(back-propagation).
Despite these questions of design, the actual network response is developed
through the learning process by adjusting the connection weights and is not
determined by the researcher. An example of such a network is the NETtalk model
(Sejnowski & Rosenberg, 1987), which was tasked with reading English. Supplied
with a continuous speech corpus of 1,024 words together with the desired output
consistent with phonetic speech, the network was able to achieve 80 percent
accuracy after 10,000 training trials, and 95 percent after 50,000 words had
been presented to the model, with 78 percent accuracy on a previously unseen
text. A voice synthesizer was able to produce recognizable speech using the model
output. Further analysis showed that the functional features of the hidden
layers in the network were relevant to theoretically appropriate elements of
language.
Some of the shortcomings of the back-propagation learning procedure that have
been identified concern its rather high computational demand and the fact that
the network may take a long time to learn, as well as the inability to attribute
back-propagation clearly to any known biological process. If viewed on a
psychological level of analysis, however, back-propagation is a mechanism that
allows a multi-layered network to achieve gradient descent, i.e. learning by
reducing the output error.
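A minimal sketch of back-propagation for a network with one hidden layer is given below, applied to the XOR cases that a two-layer network cannot learn. The network size (three hidden units), the learning rate, the random seed, the number of epochs, and the function names are all assumptions; the result depends on the random starting weights, so the sketch reports the error before and after training rather than promising a perfect solution.

```python
import math
import random

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_network(n_inputs, n_hidden, rng):
    """Small random weights; the last entry of each weight list is the bias."""
    hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)] for _ in range(n_hidden)]
    output = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    return hidden, output

def forward(hidden, output, inp):
    """Activations propagate forward: input -> hidden -> single output unit."""
    h = [logistic(sum(w * a for w, a in zip(row, inp)) + row[-1]) for row in hidden]
    y = logistic(sum(w * a for w, a in zip(output, h)) + output[-1])
    return h, y

def train_step(hidden, output, inp, target, rate):
    """Error propagates back: output delta first, then hidden deltas."""
    h, y = forward(hidden, output, inp)
    delta_out = (target - y) * y * (1 - y)
    # Hidden deltas use the output weights as they were before this update.
    delta_hidden = [delta_out * output[j] * h[j] * (1 - h[j]) for j in range(len(h))]
    for j in range(len(h)):
        output[j] += rate * delta_out * h[j]
    output[-1] += rate * delta_out
    for j, row in enumerate(hidden):
        for i in range(len(inp)):
            row[i] += rate * delta_hidden[j] * inp[i]
        row[-1] += rate * delta_hidden[j]

def total_sum_of_squares(hidden, output, cases):
    return sum((t - forward(hidden, output, x)[1]) ** 2 for x, t in cases)

XOR_CASES = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
rng = random.Random(0)
hidden, output = init_network(2, 3, rng)
tss_before = total_sum_of_squares(hidden, output, XOR_CASES)
for _ in range(5000):
    for inp, target in XOR_CASES:
        train_step(hidden, output, inp, target, rate=0.5)
tss_after = total_sum_of_squares(hidden, output, XOR_CASES)
print("tss before:", round(tss_before, 3), "tss after:", round(tss_after, 3))
```

Each weight change follows the negative gradient of the squared output error, which is the sense in which the procedure performs gradient descent.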
3.3.2.3 Boltzmann learning procedure
Interactive networks (such as Boltzmann machines) have their own distinct
architecture different to that of the feedforward networks: each input pattern
triggers numerous cycles of activation processing across the network, which
maintain the interaction across the cycles until the network settles into the state
of thermal equilibrium. Unlike the learning epochs in feedforward networks, the
cycles do not involve the modification of weights but rather computation of
activations, and respond to a single input pattern. One issue that may arise is the
tendency of the network to settle into local minima – a stable state that
nevertheless does not satisfy the constraints in the best possible way (addressed
with the help of the simulated annealing technique that involves the gradual
decreasing of the designated parameter in a stochastic function).
Boltzmann machines can be trained using a learning technique conceptually
similar to back-propagation (McClelland et al., 1986). In stage one of training,
the input and output units are fixed (clamped) and the other unit activations
are updated in a random order using the stochastic equation with simulated
annealing until the network reaches thermal equilibrium. Each of the
input-output cases is processed in this way, and the proportion of time for
which pairs of connected units are simultaneously active is recorded as the
expected probability of their joint activation. In stage two, essentially the
same process is carried out, but only the input units are fixed this time and
the output is determined by the network. The difference between the
probabilities obtained in the two stages determines the connection weight
change, which will result in minimization of the output unit error (provided the
learning rate parameter is set sufficiently low). Thus, the underlying reasoning
is similar to that of the LMS procedure, where discrepancies between the desired
and actual output direct the adjustment of connection weights. One of the
drawbacks of the procedure is that learning is slow, owing to the time required
for the network to settle into equilibrium for each input pattern.
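The weight change computed from the two stages can be sketched as follows. Only the final bookkeeping step is shown: the sampled equilibrium states are made-up placeholders rather than the output of an actual annealing run, and the function names and learning rate are hypothetical.

```python
def coactivation_probability(sampled_states, i, j):
    """Proportion of sampled equilibrium states in which units i and j are both on."""
    return sum(state[i] * state[j] for state in sampled_states) / len(sampled_states)

def boltzmann_weight_change(p_clamped, p_free, learning_rate):
    """Weight change proportional to the difference between the two stages."""
    return learning_rate * (p_clamped - p_free)

# Placeholder samples of two connected binary units at equilibrium.
stage_one_states = [[1, 1], [1, 1], [1, 0], [0, 0]]   # inputs and outputs clamped
stage_two_states = [[1, 1], [0, 1], [1, 0], [0, 0]]   # only inputs clamped

p_clamped = coactivation_probability(stage_one_states, 0, 1)
p_free = coactivation_probability(stage_two_states, 0, 1)
delta_w = boltzmann_weight_change(p_clamped, p_free, learning_rate=0.1)
print(p_clamped, p_free, delta_w)
```

When the two units are active together more often with the desired output clamped than without it, the connection between them is strengthened; when the two probabilities agree, the weight is left unchanged.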
3.3.2.4 Competitive learning
Another type of learning procedure is competitive learning, a variant of
unsupervised learning in which a network is presented with input patterns and is
tasked with identifying regularities that allow the patterns to be grouped into
clusters of similar patterns, with no feedback on the correctness of the
grouping. The simplest architecture has fully connected input and output layers.
The number of units in the output layer specifies the number of clusters for the
network to identify, and the activation rule is set to ensure that only one
output unit is chosen, inhibiting the other units at the same time. The learning
rule reallocates the weight of the chosen unit in a way that increases its
connection weights with the active input units and decreases its weights with
the inactive ones, keeping the total weight constant.
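A minimal sketch of such a winner-take-all scheme is shown below. It assumes binary input patterns and weights that sum to one for each output unit, and it uses one common form of the reallocation rule (a fixed proportion of the winner's weight is taken from all its connections and shared equally among the active inputs); the starting weights, the rate, and the function names are hypothetical.

```python
def winner(weights, input_pattern):
    """Index of the output unit with the largest net input (the only unit chosen)."""
    nets = [sum(w * a for w, a in zip(row, input_pattern)) for row in weights]
    return max(range(len(nets)), key=lambda k: nets[k])

def competitive_update(weights, input_pattern, rate):
    """Shift a proportion of the winner's weight onto the active input lines."""
    k = winner(weights, input_pattern)
    n_active = sum(input_pattern)
    if n_active:
        weights[k] = [w + rate * (a / n_active - w)
                      for w, a in zip(weights[k], input_pattern)]
    return k

# Two output units, so the network is asked to find two clusters.
weights = [[0.3, 0.3, 0.2, 0.2],
           [0.2, 0.2, 0.3, 0.3]]
pattern_a = [1, 1, 0, 0]
pattern_b = [0, 0, 1, 1]
for _ in range(20):
    competitive_update(weights, pattern_a, rate=0.2)
    competitive_update(weights, pattern_b, rate=0.2)

print(winner(weights, pattern_a), winner(weights, pattern_b))
print([round(w, 3) for w in weights[0]], round(sum(weights[0]), 3))
```

Because the amount added to the active lines equals the amount removed overall, the total weight of each output unit stays at one while its weights drift towards the patterns it wins.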
Inclusion of more than one set of competing output units (possibly with a
different number of units specifying a different number of clusters) would
increase the complexity of the network behaviour. Multiple layers would also
result in behaviour that is more complex and allow the network to identify higher
order regularities.
3.3.2.5 Reinforcement learning
In reinforcement learning, the network is given information on whether or not
the output pattern was close to the desired pattern without being supplied with
the actual desired pattern; therefore only global performance indicators are
used to adjust the weights. The procedure may not seem as advanced as
back-propagation; nevertheless, it does satisfy the basic notion central to
behaviourism of modifying behaviour through reinforcement. Essentially, the
network performs a large number of trials with varying weight combinations,
recording the global reinforcement delivered with each trial. The weight
combinations that deliver higher reinforcement gradually become recognized and
are tried with increasing frequency, which leads to the weights that deliver the
highest global reinforcement.
Reinforcement learning is a much simpler procedure than back-propagation, since
the error calculations for each of the weights are omitted; however, it may take
a long time to produce a result and does not scale very well. It does
nevertheless offer the substantial theoretical benefit of relating the
connectionist method to the field of traditional learning theory.
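The idea of learning from a single global signal can be sketched with a deliberately simplified trial-and-error search. This is an assumption-laden stand-in rather than any specific published algorithm: weight combinations are varied at random, only a scalar reinforcement (the proportion of acceptable responses) is observed, and a variation is kept when it is reinforced at least as well as the best so far. All names and parameter values are hypothetical.

```python
import random

def respond(weights, input_pattern):
    """Threshold unit; the last weight acts as the bias."""
    net = sum(w * a for w, a in zip(weights, input_pattern)) + weights[-1]
    return 1 if net > 0 else 0

def reinforcement(weights, cases):
    """Global signal only: the proportion of responses judged acceptable."""
    return sum(respond(weights, x) == t for x, t in cases) / len(cases)

def learn_by_reinforcement(cases, n_inputs, rng, trials=500, step=0.5):
    best = [0.0] * (n_inputs + 1)
    best_reward = reinforcement(best, cases)
    history = [best_reward]
    for _ in range(trials):
        candidate = [w + rng.gauss(0.0, step) for w in best]  # vary the weights
        reward = reinforcement(candidate, cases)
        if reward >= best_reward:  # keep combinations that are reinforced
            best, best_reward = candidate, reward
        history.append(best_reward)
    return best, best_reward, history

OR_CASES = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, reward, history = learn_by_reinforcement(OR_CASES, 2, random.Random(1))
print("initial reinforcement:", history[0], "final reinforcement:", reward)
```

No per-weight error is ever computed, which keeps the procedure simple, but the number of trials needed grows quickly with the number of weights, in line with the scaling problem noted above.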
3.3.3 Difficulties with machine learning
Some of the criticisms of connectionist models of learning are discussed in the
following paragraphs.
3.3.3.1 Associationism
One of the critiques of connectionism is that it is essentially a return to
associationism, which would have an adverse effect on the progress of cognitive
science and the advances made by the symbol manipulation systems. It is not
however merely a return to associationism, but is rather based on the core
principle of associationism that suggests that contiguities produce connections.
Connectionism employs this powerful idea and develops it with unmatched
sophistication through such mechanisms and concepts as distributed
representation, hidden units capable of capturing microfeatures,
back-propagation procedures, supervised learning with an error reduction
function, and others. A simplified
connectionist network that uses Hebbian learning rule is most similar to classical
associationism where simple units were ideas and, based on the contiguity,
associations are increased or decreased between these ideas. The more
sophisticated multi-level connectionist networks could attain deeper level of
associationism as hidden units can decompose ideas into microfeatures and
propagate their activity within the network to achieve contiguity in a different
manner.
Rule-like systems could also be modelled with connectionist architecture on a
micro level. Therefore, connectionism provides a mechanism that can operate at a
fine level of detail while using the cognitivist high-level description of the
cognitive process. Moreover, connectionist models of learning offer a novel
approach to the process of concept and cognitive skill acquisition. The ability
to provide plausible explanatory models of rule-like behaviour and to offer
powerful learning mechanisms may play an important role in cognitive science
through the integration of associationism and cognitivism, which may carry broad
implications for the field.
3.3.3.2 Poverty of stimulus
In Chomsky’s criticism of Skinner’s account of verbal behaviour and language
acquisition, the central nativist argument revolved around the inability of a
child to learn the language from the available experience: the poverty of
stimulus. The question, however, should not be construed as whether anything in
the organism is innate or exists prior to sensory experience, but rather as what
is innate, since any kind of learning presupposes some sort of structure within
which the learning could occur. In the case of the symbolic approach, for
example, at least some of the initial functionality in symbol manipulation could
be considered innate.
In connectionism, nativism is rarely regarded as an issue, possibly because
connectionism has its roots in associationism. Something else to consider is
that the most challenging connectionist problems are statistical and
computational, deal with the science of the artificial in one way or another,
and are therefore not concerned with nativism. Extended discussion of this topic
could be found elsewhere (McClelland et al., 1986; Shepard, 1989).
3.4 Pattern recognition
The sections above provide an overview of what connectionist networks are able to
achieve by mapping one set of patterns onto another, encoding statistical
regularities in connection weights modified by the network learning process. This
section discusses the claim that networks are particularly appropriate for
modelling behaviour, which entails that patterns are central to a variety of
human faculties and that connectionist networks are particularly well suited to
handling them.
Pattern recognition could be defined as mapping a specific pattern onto a more
general pattern. Another type of mapping is pattern completion, which is
mapping an incomplete pattern onto the same but completed pattern. Pattern
transformation is mapping one pattern onto a different related pattern. Pattern
association is mapping a pattern onto a different unrelated pattern.
In human behaviour, pattern recognition is most evident in
perception, where local classifications are combined into higher order patterns,
which in turn serve as inputs for high-level recognition faculties and abstractions
that are recognised by human languages. Thus, categorization does not only refer
to semantically interpretable cognitive level of pattern recognition, but also to
lower-level sensation and perception (McClelland, 1979).
3.4.1 Pattern recognition algorithms
The following sections demonstrate mapping abilities of connectionist networks.
3.4.1.1 Pattern recognition in two-layer networks
For a system to be considered capable of pattern recognition, it should display a
consistent response to the instances of pattern presented to it. Two-layer
networks are quite competent at such tasks and can learn to recognize the
pattern using the learning rules discussed above. Moreover, while doing so the
network would develop a good generalizing capacity. It not only will be able to
respond well to input patterns seen in training, but also to patterns previously
unseen – producing an output closely resembling the output of the similar known
input pattern.
This, however, is somewhat different from the typical human learning process, in which
exposure is not restricted to perfect examples but rather is an assortment of
similar cases with varying levels of distortion. Therefore, the network was tested
using distorted input and output patterns. As a result, the network was able
to provide a qualitatively correct response, numerically very close to
the desired output – even to previously unseen patterns. Thus, the simple two-layer
network is capable of learning to recognize several input pattern categories, and can
handle distortions in the pattern and respond to new patterns quite well and in a
natural manner.
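The behaviour described above can be illustrated with a minimal sketch. It assumes a linear two-layer network trained with the delta rule on noisy copies of two prototype patterns. The pattern values, noise level, learning rate and function names are illustrative assumptions and are not taken from the simulations discussed in this section; only the inputs are distorted here.

```python
import random

# Two hypothetical prototype patterns (orthogonal, coded as +1/-1) and the
# desired output for each category. All values are illustrative assumptions.
PROTOTYPES = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
TARGETS = [[1.0, 0.0], [0.0, 1.0]]


def respond(weights, pattern):
    """Output of a linear two-layer network: one weighted sum per output unit."""
    return [sum(w * x for w, x in zip(row, pattern)) for row in weights]


def train(prototypes, targets, epochs=200, rate=0.05, noise=0.2, seed=0):
    """Delta-rule training on distorted copies of the prototypes."""
    rng = random.Random(seed)
    n_in, n_out = len(prototypes[0]), len(targets[0])
    weights = [[0.0] * n_in for _ in range(n_out)]
    for _ in range(epochs):
        for prototype, target in zip(prototypes, targets):
            # The network never sees the perfect example, only a distorted one.
            pattern = [x + rng.uniform(-noise, noise) for x in prototype]
            output = respond(weights, pattern)
            for j in range(n_out):
                error = target[j] - output[j]
                for i in range(n_in):
                    weights[j][i] += rate * error * pattern[i]
    return weights


weights = train(PROTOTYPES, TARGETS)

# A previously unseen pattern: the first prototype with its last element flipped.
novel = [1, 1, 1, 1, -1, -1, -1, 1]
novel_output = respond(weights, novel)
print("novel pattern ->", [round(v, 2) for v in novel_output])
```

In this sketch the unseen pattern produces an output that resembles, without exactly matching, the output for the prototype it most closely resembles, which is the kind of generalization described above.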
3.4.1.2 Pattern recognition in multi-layer networks
Mapping input patterns directly onto output patterns is not sufficient for some
pattern recognition, and may require additional intermediate layers to facilitate
the information extraction. One such early interactive activation model was designed to
recognise visual patterns – four-letter words in a certain font (McClelland &
Rumelhart, 1981; Rumelhart & McClelland, 1982). The input layer contained
individual elements of letters, the intermediate layer contained letters, and the output
layer contained four-letter words. The intermediate layer in this particular model
is not a true hidden layer, as the features it contains were designated by the
researcher rather than being extracted as the result of the network learning
process. Instead, the intermediate layer was set up as a sort of an extra output
layer, where the network is able to report on either the letter or the words
depending on the task parameters. In a true multi-layer network with hidden
layers there is usually little reason to report the hidden layer as it normally
contains a sophisticated set of microfeatures extracted by the network that are
not easily interpretable.
Even though the network was designed decades ago and before the back-
propagation learning procedures were developed, it is able to accommodate in a
very human manner a range of conditions such as low contrast and missing
elements, and exhibit graceful degradation due to multiple soft constraints that
the network is aimed to satisfy.
Another fascinating aspect of human pattern recognition addressed by the model
is the word superiority effect, where letter recognition is improved when the letter is presented in
the context of a word (Reicher, 1969). Explaining the underlying processing that
considers the actual word in the course of letter recognition was something that
presented a challenge for researchers. McClelland and Rumelhart’s model (1981;
1982) provides one promising explanation where word recognition may affect
component letter recognition. As described earlier, the network contains a layer
of letter features, layer of letters, and layer of four-letter words. Each of the
feature units is positively connected to those letter units that contain the
features and negatively to those that do not. In the same manner, letter units are
positively connected to those word units that contain the letters in the
appropriate position, and negatively to those that do not; and word units are
positively connected to the letter units that the words contain. Additionally, all
competing combinations are negatively interconnected. When the input to the
network is supplied by activating all the feature units for all four letters of the
word, they excite the letter units to which they are positively connected, which in
turn excite appropriate word units. Word units will send the activation back to
the letter units, and the activation propagation in the interactive network will
continue for a number of processing cycles. This activation propagation direction
forward and in reverse is critical in our discussion of word superiority effect, as it
shows how the letter layer is activated from feature layer and from the word
layer in reverse direction. Thus, if feature units do not readily identify a word
through the letter units, the word unit with the best fit for the features would
activate those letters that are able to form a word. This allows the network to
deal with inconsistent and partial data such as misspelled or unclear words. If
there is one best fitting solution, the network will identify that word by activating
the missing letter unit through many cycles.
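The interplay of bottom-up and top-down activation described above can be sketched in a few lines. This is a heavily simplified illustration and not the McClelland and Rumelhart implementation: the feature layer is omitted (letter evidence is supplied directly), the inhibitory letter-to-word connections are replaced by competition among word units, and the toy lexicon, parameter values and function names are all assumptions made for the example.

```python
WORDS = ["WORK", "WORD", "FORK", "WEAK"]  # hypothetical toy lexicon


def clip(value):
    """Keep an activation within the range [0, 1]."""
    return max(0.0, min(1.0, value))


def interactive_activation(evidence, words=WORDS, cycles=20,
                           rate=0.3, feedback=0.5, inhibition=0.3):
    """evidence: one dict per letter position, mapping letter -> input strength."""
    n_positions = len(words[0])
    letters = [{word[pos]: 0.0 for word in words} for pos in range(n_positions)]
    word_act = {word: 0.0 for word in words}
    for _ in range(cycles):
        # Letter units combine bottom-up evidence with top-down word feedback.
        for pos in range(n_positions):
            for letter in letters[pos]:
                top_down = sum(act for word, act in word_act.items()
                               if word[pos] == letter)
                target = clip(evidence[pos].get(letter, 0.0) + feedback * top_down)
                letters[pos][letter] += rate * (target - letters[pos][letter])
        # Word units are excited by their own letters and inhibited by rival words.
        previous = dict(word_act)
        for word in words:
            support = sum(letters[pos][word[pos]] for pos in range(n_positions))
            support /= n_positions
            rivals = sum(act for other, act in previous.items() if other != word)
            target = clip(support - inhibition * rivals)
            word_act[word] = previous[word] + rate * (target - previous[word])
    return letters, word_act


# "W?RK": the second letter is missing from the input.
evidence = [{"W": 1.0}, {}, {"R": 1.0}, {"K": 1.0}]
letters, word_act = interactive_activation(evidence)
best_word = max(word_act, key=word_act.get)
best_second_letter = max(letters[1], key=letters[1].get)
print(best_word, best_second_letter)
```

Although no evidence is supplied for the second position, the letter unit favoured there receives its activation entirely from the word layer, which is the reverse-direction influence the paragraph describes.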
If however there are a number of possible solutions that fit, network behaviour is
even more interesting. The partial feature input would initially equally activate all
letters that fit, and once the activations would start going in reverse order the
word units would be able to exert activation onto letter units. Since certain words
are more frequently encountered, they would carry higher resting activation and
therefore would be able to inhibit other word units and therefore the appropriate
letter units, eventually being identified by the network as a solution. Thus, this
illustrates how a higher-level knowledge is able to exert influence on the lower-
level letter and feature units, suggesting that additional layer that would consider
word units in a context could have an effect on resting activations and therefore
on the overall network behaviour. This demonstrates the ability of network
models to complete patterns by predicting what is missing in addition to already
discussed pattern recognition ability. A more advanced network design could use
sensory inputs as lower-level units and theoretical or philosophical statements as
higher-level units, which could with the help of learning procedures exert
influence on lower-level sensory units to facilitate the pattern recognition
process.
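The frequency effect described above, where several words fit the input equally well, can be sketched as a competition between two word units. The sketch assumes that word frequency is expressed as a higher resting activation and that the units inhibit each other; the words, values and function name are hypothetical.

```python
def compete(support, resting, cycles=30, rate=0.2, inhibition=1.2):
    """Mutually inhibiting word units that start from their resting levels."""
    act = dict(resting)
    for _ in range(cycles):
        previous = dict(act)
        for word in act:
            rivals = sum(a for other, a in previous.items() if other != word)
            target = support[word] + resting[word] - inhibition * rivals
            target = max(0.0, min(1.0, target))  # keep activations in [0, 1]
            act[word] = previous[word] + rate * (target - previous[word])
    return act


# Input "WOR?" supports both candidate words equally (assumed value).
support = {"WORK": 0.5, "WORD": 0.5}
# Assumption: "WORK" is the more frequent word, hence the higher resting level.
resting = {"WORK": 0.1, "WORD": 0.0}

final = compete(support, resting)
print({word: round(a, 3) for word, a in final.items()})
```

With identical bottom-up support, the small initial advantage is amplified over the cycles until the more frequent word suppresses its competitor.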
The network performance illustrated here is remarkable, and shows how the task
can be accomplished with the use of connection weights and activation functions
rather than the use of rules. Since then, more sophisticated multi-layer networks have
been developed that make use of learning procedures such as back-propagation
and proper hidden layers able to extract microfeatures from the input patterns
useful in advanced information analyses.
Networks have also been used for semantic categories recognition modelling, as
discussed next.
3.4.1.3 Generalization and similarity
One of the key performance faculties that the networks exhibit is their ability to
generalize. Once the network is trained to classify the input patterns into certain
classes, when presented with a previously unseen pattern, it will provide a
response comparable to the response to a similar known pattern. What
constitutes this very similarity is a fascinating question. One explanation
would rely on the properties shared between two or more entities, which
brings up the philosophical argument that any two entities share an infinite
number of properties. Thus, assessing similarity in terms of number of shared
properties is deficient unless constraints are imposed.
Humans however are quite adept at judging similarity. Networks also have a very
clear way of doing so – the similarity structure is an inherent component of the
weight matrix. Similarity however is a complex concept, and does not necessarily
have an objective measure or device capable of assessing it: if a network
generalizes in a manner dissimilar to human reasoning, it is natural to assume a
failure. However, it is crucial to consider the possibility that network may be
capable of generalizing on a different level, perhaps not readily comprehensible or
qualitatively different. In fact, considering that current connectionist network
architectures are rather quite simplistic compared to the neurophysiologic
complexity of human brain, it should not be surprising to observe dissimilar
behaviours in the networks and in the mind. Environment has been shown to play
an important role in the developmental process of human cognition, and the
connectionist network devoid of such experiences may be unable to comprehend
entirely the sense of similarity in the same manner as humans do.
The fact that networks generalize following the same mechanism as the process
of pattern recognition should be considered a benefit as it does not need to
involve the extensive philosophical discussion briefly touched upon here.
3.4.1.4 Pattern recognition beyond perception
In connectionist networks, pattern recognition plays an essential role at all
levels of analysis, from sensation to reasoning, without clearly defined
boundaries between the concepts of perception and cognition. In contrast, some
symbolic theories (for example Fodor, 1975) consider symbolic processing to be
isolated from sensory processing and do not regard it in terms of pattern
recognition. Some other symbolic accounts (for example J. R. Anderson, 1983a) are
similar to connectionist networks in that pattern recognition does occur at all
levels of analysis. Any system substantially reliant on pattern recognition is a
prospective candidate to offer a plausible account for the intentionality of mental
states. This notion is discussed in detail in the following section.
3.4.2 Intentionality of pattern recognition
In philosophy, intentionality refers to the notion that mental states have meaning
and content. Intentional states are concerned with the phenomena that are
outside the cognition, and it has been one of the more testing tasks in cognitive
philosophy to describe how mental states become associated with the specific
phenomena and acquire the intentionality. The difficulty revolves around the
relation between the mental state and the external phenomenon that is unlike
any other. If one person believes something about the other, the first person
seems to have a relation to the other, and both need to exist for a relation to be
true. However, the other person very well may not exist at all, and yet the belief
could still be possible. For that reason, such a connection cannot be handled by
the means of relation alone.
Trying to solve the intentionality issue with the traditional symbolic approach to
cognitive modelling is particularly difficult, as the representations employed by
the symbolists are formal and the best that can be achieved is describing the
objects to which the symbols refer. This, however, only repositions the issue, as it
is now necessary to explain how the symbols used in the description relate to the
external phenomenon. The difficulty in explaining intentionality lies in finding a
way to relate representational states to the real phenomena. One such approach
is to consider the causal mechanisms that generate symbols in terms of the
transmitted information and how the symbol relates the information about the
object. Thus, when the symbol is activated without being caused by the referent,
it is still about the object that would cause its activation in normal circumstances. This
framework however does not provide a plausible account of representing
nonexistent objects. Another problem with the symbolic approach in trying to
relate representational states to explain intentionality is the arbitrary
treatment of symbols, which results in the inability to relate the symbol to the
referent and in the symbols becoming context-free. This is also evident from
psycholinguistics, where the meaning of certain words is dependent on the context –
much as the referent may vary with the context. Explaining such causal relation with
symbolic models poses a problem in trying to explain significant variations in
intended referents depending on the use of the symbol within a different
context. Employing a more complex system of symbols to account for varying
contexts is one possible solution of addressing the issue. Another proposal
(Barsalou, 1983) suggests that concepts are not fixed, but rather are construed
each time from the individual elements appropriate to the context.
This proposal would be appealing to the connectionist perspective, where
symbolic elements could be treated as microfeatures spread across the units in
the network and additional input units could account for the context sensitivity.
Connectionism proposes that processing within the system is
continuous with the processes taking place in the external environment,
avoiding the separation of sensation and perception in symbol processing.
Hence, cognitive processing is positioned as occurring within the external
environment, where individuals use skills and behaviours to interact with objects
at varying levels of abstractness. Pattern recognition networks capture the
regularities in behaviours at different abstractness levels, suggesting a good fit
between the system and the external environment.
One of the key differences that separate connectionism from traditional symbolic
approach is that the connections that represent the interface of the system with
the external environment are not arbitrary, but rather are the result of the
learning process where only the relevant connections are defined as a pattern
that represents the interaction of the system with the external environment.
Consequently, a two-layer pattern recognition network would modify the
connection weights to reflect the input-output relation dictated by the
environment directly, whereas multi-layer network in addition would encode the
higher-order information such as microfeatures into the hidden layers. The input
in most cognitive modelling networks however is specified by the researcher and
does not incorporate the environmental parameters, and therefore would not
provide sufficient evidence for the claim that network representations are
directly linked with the object. It is quite common practice in other disciplines,
though (for example engineering), for networks to be supplied with the
ability to receive a limited input about the outside environment, and in that case
the representation is very much about the external object.
An essential point to be made here is that representations in the hidden
layer are the result of the network accommodating to the environment, and do not
constitute a causal connection with any sensory input, which makes them arbitrary
from the standpoint of system functionality. Connectionist learning systems are
designed to perform specified tasks, which involves functioning in a certain
external environment. The learning procedure defines a goal for the network
(error minimization in determining output pattern to a given input pattern), and
representations constructed in the hidden layer serve this goal by embodying
information external to the system for the system, thus making these
representations about the external environment.
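The goal mentioned above can be written down compactly. As a sketch, assuming the common sum-of-squared-errors measure and gradient descent on the connection weights (the symbols below are notation introduced here for illustration and are not taken from the thesis):

```latex
E = \frac{1}{2} \sum_{p} \sum_{j} \left( t_{pj} - o_{pj} \right)^{2},
\qquad
\Delta w_{ij} = -\eta \, \frac{\partial E}{\partial w_{ij}}
```

Here $p$ indexes the training patterns, $j$ the output units, $t_{pj}$ is the desired output, $o_{pj}$ the actual output, $w_{ij}$ the weight on the connection from unit $i$ to unit $j$, and $\eta$ the learning rate.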
System response to a particular input is not necessarily context-free. Information
used in unit activations may correspond only to a general body of information
about the environment (contextual and other) in which the system is present;
and, depending upon the goals set, the system learns to identify it through the
responses to the patterns. Therefore, the system response is a complex combination
influenced by numerous factors, some of which are only partly related to the task at
hand and yet exert influence on the general patterns of activation from within
the system. This versatility allows the system to adjust the response in relation to
other information available.
The connectionist approach to modelling cognition is able to provide knowledge
about the intentionality of mental states, where representational values
constitute the network’s response to the input pattern. Since the network is
adapted to the input pattern, the network state could be directly linked to the
external environment if properly connected (sensory input units). Sensitivity of
the representations to the external and internal context makes an even stronger
case when attempting to explain the nature of representations.
The earlier-discussed problem of explaining mental states that represent
nonexistent objects could be resolved by symbolic models with relative ease
(by employing a symbol for the nonexistent object), yet such models remain unable to provide an
explanation as to why the arbitrary symbol is linked in such a manner. In
connectionist networks, it is possible that the output pattern will be the result of
internal network activity, thus providing an output that does not necessarily
correspond to any of the input patterns and therefore would represent a
nonexistent object. Such outputs would still be based on the featural elements
defined by the system and for that reason be a representation of such objects
and not the others.
This shows the important role that pattern recognition can play in intentionality.
Related to this philosophical discussion is a strand of research in cognitive
psychology on the formation of semantic categories, which is discussed in the
following section.
3.4.3 Categorisation with connectionist models
This section outlines the progress in the body of psychological research on
concepts and categorization, and how symbolic and connectionist models could
be of relevance.
In categorization, methods of symbolic and connectionist modelling do not
differ to a high degree. It is possible to assign a symbol to the category, but
even so the symbols could serve little purpose without some sort of distributed
representation that relates the features to the categorical assignment – generally
referred to as exemplars. For that reason, even within the symbolic approach to
cognitive modelling, categorization has been handled in a conceptually similar way
to pattern recognition: exemplars are assigned to a semantic category according to
their featural composition. Thus, a lot of research on categorization conducted
within the symbolic approach could be transferable to connectionist network
modelling techniques. Both approaches acknowledge the primary dissimilarity
between the pattern recognition and assignment of exemplars to semantic
categories to be for the most part concerned with the featural level of
abstraction: low level (strokes in handwritten word recognition task) versus
intermediate level (possession of gills in animal classification task). Contrary to
the symbolic models, distributed representation across features may be sufficient
to represent the semantic category in a connectionist network, i.e. it does not
require a designated symbol or unit to denote the category.
To demonstrate how the connectionist pattern recognition model is able to
accommodate the classification mechanism, consider a two-layer network. Input
units represent specific features of the exemplars, and the particular pattern
across the inputs is a distributed encoding of the exemplar across the appropriate
features. In the same way, the output units represent the distributed encoding of
the probable categories, where the connection weights assign exemplars to
the suitable category – features uncharacteristic of the category would have low
connection weights with the category. Once presented with the exemplar,
the network would propagate the input activations along the connections, and each
output unit would receive a summative activation from its inputs. Additive
combination of features is a modelling technique that is not unique
to connectionist networks but rather is quite common to a whole
class of characterization models.
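The summative activation described above can be made concrete with a small sketch. The features, categories and hand-set connection weights below are hypothetical; in a trained network the weights would be the outcome of learning, and uncharacteristic features are given negative weights here purely for illustration.

```python
# Hypothetical connection weights from feature (input) units to category
# (output) units. Values are assumptions chosen for the example.
WEIGHTS = {
    "fish": {"gills": 0.9, "fins": 0.7, "fur": -0.6, "lungs": -0.5},
    "mammal": {"gills": -0.8, "fins": 0.1, "fur": 0.8, "lungs": 0.6},
}


def category_activations(exemplar, weights=WEIGHTS):
    """Each category unit receives the weighted sum of the active features."""
    return {
        category: sum(incoming.get(feature, 0.0) * value
                      for feature, value in exemplar.items())
        for category, incoming in weights.items()
    }


def classify(exemplar, weights=WEIGHTS):
    """Assign the exemplar to the category unit with the highest activation."""
    activations = category_activations(exemplar, weights)
    return max(activations, key=activations.get)


trout = {"gills": 1.0, "fins": 1.0}
dog = {"fur": 1.0, "lungs": 1.0}
dolphin = {"fins": 1.0, "lungs": 1.0}

for name, exemplar in [("trout", trout), ("dog", dog), ("dolphin", dolphin)]:
    print(name, classify(exemplar), category_activations(exemplar))
```

Under these assumed weights the dog and the dolphin fall into the same category, but the dog activates the category unit more strongly, so a graded response emerges from the additive combination alone.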
The method of distributing categorical representations across features and their
utilization have been a major research interest within the field of psychological
categorization modelling. The classical view follows the philosophical analysis,
where it is assumed that categories identify the sets that are defined by the
necessary and sufficient conditions; and knowing those conditions constitutes
knowing the categories. Consequently, the view suggests that all categories are
processed in a comparatively similar manner and exemplars are treated equally.
In the 1970s however, fundamental changes were proposed by Rosch and others
(for example Rosch & Lloyd, 1978) that challenged both of these consequences. It
was demonstrated that in class-inclusion hierarchy one level among others is the
basic level and therefore is processed and acquired more easily; and categories
have a ranking structure where some exemplars are recognized as better
representatives of the category (Rosch & Lloyd, 1978; Rosch & Mervis, 1975). In
addition, it was demonstrated that prototypicality played an important role
across many information-processing faculties, and that recognition and categorization
of typical exemplars show better results in terms of time and accuracy.
The typicality of the exemplars to the category is judged based on the features
shared. The features though do not necessarily follow the classical definition of
the category, nor need they be common among all members of the category or be
distinctive. Moreover, the typicality could be demonstrated among such
obviously defined categories as odd and even numbers (Armstrong, Gleitman, &
Gleitman, 1983), which indicated that typicality must depend on elements other
than classical definition. Therefore, categorization should consider category
definitions along with the typicality effects, which may still be insufficient to
represent knowledge structure on categories in an adequate manner.
Early cognitive models adopted the classical approach to represent knowledge of
categories, relying on logical statements and necessary and sufficient conditions.
With similar aims, semantic networks (for example J. R. Anderson, 1974;
Norman, Rumelhart, & Group, 1975) implement highly localist structures where
units encode semantic concepts interconnected by a small number of
connections conveying the relations between those concepts. Hence, the cognitive
propositional model and semantic networks represent two approaches to
symbolic knowledge architecture. Following the discussion on the prototype, the
semantic networks approach could be adapted to accommodate the idea that
mental representation revolves around the prototype rather than the propositional
logic that determines the category: some representatives of the category may
have distinctly different qualities from the others and therefore would lack the
corresponding connections. To account for effects of typicality, semantic
networks adopted the process of spreading activation (J. R. Anderson, 1983b; J. R.
Anderson & Pirolli, 1984), which marked a significant breakthrough in the
advancement of network models in later years (bearing a particularly close
resemblance to localist connectionist networks, as both can account for
typicality by means of summative weighted features).
Prototype and abstraction models specify a somewhat different theoretical
position where the category membership of exemplars is assessed based on their
similarity to the prototype. Multidimensional scaling of lists of features could be
employed to represent conceptual frameworks that consist of features with
varying parameters, suggesting a characteristic rather than a defining feature
(including both continuous and discrete features in the model). Exemplar models
that followed inherited the probabilistic view of categorical structure, but
contrary to the prototype models, the category is represented in more detail by
the exemplars rather than the prototype (Medin, 1989). Individual featural
representations for every exemplar are stored and weighted, and therefore
similarity computations may involve weighted summative feature analysis as in
connectionist networks.
In categorization tasks, exemplar models present a direct competition to
connectionist models, providing an alternative way of distributed representation
across features in prototype extraction and categorization tasks. The distinctly
dissimilar assumptions regarding storage and processing in exemplar and
connectionist models may result in essentially different results of computation.
Connectionist networks retain information about exemplars only if this has an
effect on the weight matrix, and shift to prototype extraction for similar
exemplars as the exemplar numbers increase (retaining as much information
about exemplars as possible), using feature vectors for temporary activation
patterns. Exemplar models store information about particular exemplars as
feature vectors, which are used to compute the prototype. The ability of
connectionist networks to model the information about exemplars and prototypes
within the same framework may offer certain advantages – especially considering
the much broader application in modelling a variety of cognitive tasks, and not
only categorization.
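The different storage assumptions discussed above can lead to different category assignments for the same item. The following sketch contrasts a prototype rule (nearest category mean) with an exemplar rule (summed similarity to every stored exemplar). The exponential similarity function, the sensitivity parameter and the two-dimensional feature values are assumptions made for illustration, not a reproduction of any specific published model.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def prototype_classify(item, categories):
    """Compare the item with the mean feature vector of each category."""
    prototypes = {
        name: [sum(column) / len(column) for column in zip(*members)]
        for name, members in categories.items()
    }
    return min(prototypes, key=lambda name: distance(item, prototypes[name]))


def exemplar_classify(item, categories, sensitivity=2.0):
    """Sum the similarity of the item to every stored exemplar of a category."""
    summed = {
        name: sum(math.exp(-sensitivity * distance(item, member))
                  for member in members)
        for name, members in categories.items()
    }
    return max(summed, key=summed.get)


# Hypothetical categories: "A" contains one atypical member at (4.0, 4.0).
categories = {
    "A": [(0.0, 0.5), (0.5, 0.0), (4.0, 4.0)],
    "B": [(3.0, 3.0), (3.0, 3.5)],
}
item = (3.8, 3.8)

print("prototype rule:", prototype_classify(item, categories))
print("exemplar rule:", exemplar_classify(item, categories))
```

The prototype rule averages the atypical member away, whereas the exemplar rule keeps it available for comparison, which is why the two rules can disagree on an item lying close to that member.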
Categorization, as proposed by Barsalou (1983), could also be interpreted not as
a stable mental grouping of fixed entities stored in and retrieved from memory
as required (Rosch & Mervis, 1975), but rather as categories produced while the particular task
is being performed – an idea supported by the fact that people are able to construct
new categories upon request. The emergence of these concepts then relies on
vast amounts of continuous knowledge stored in long-term memory, which is
used to form temporary concepts, relevant to the immediate context, in working
memory. Unlike symbolic models, the connectionist framework is able to interpret
these findings in the following manner: concepts could represent stable patterns
of activation that determine further processing. However, on a different
occasion, the resulting patterns could be altered due to activity elsewhere even
using the same weights. This may represent the continuous knowledge in long-
term memory.
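The point that identical weights can yield different activation patterns depending on activity elsewhere can be sketched by treating context as additional input units. The units, the weights and the ambiguous word are hypothetical and serve only to illustrate the mechanism.

```python
# Hypothetical fixed weights from input units (a word plus two context units)
# to two concept units. The weights stand in for stable long-term knowledge.
WEIGHTS = {
    "financial_institution": {"bank": 0.5, "money": 0.6, "river": -0.6},
    "river_edge": {"bank": 0.5, "money": -0.6, "river": 0.6},
}


def activation_pattern(active_inputs, weights=WEIGHTS):
    """Activation of each concept unit, clipped to the range [0, 1]."""
    pattern = {}
    for unit, incoming in weights.items():
        net = sum(incoming.get(name, 0.0) for name in active_inputs)
        pattern[unit] = max(0.0, min(1.0, net))
    return pattern


no_context = activation_pattern({"bank"})
money_context = activation_pattern({"bank", "money"})
river_context = activation_pattern({"bank", "river"})

print(no_context)
print(money_context)
print(river_context)
```

The weights never change between the three calls; only the concurrent activity differs, and with it the pattern that the same input word evokes.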
Thus, the pattern recognition capacity of connectionist networks allows them to perform
categorization tasks, exhibiting typicality and task-sensitive variability effects –
some of the requirements that must be met by a successful model of human
categorization.
3.4.4 Pattern recognition in mental processes
Human cognitive abilities go far beyond the relatively lower-level tasks of
perception and semantic categorization and classification: phenomena could be
contemplated without the actual perception taking place, and a person could perform
hypothetical planning of future behaviour and other higher-level tasks usually
construed in terms of performing logical inferences on symbolic representations.
On the condition that pattern recognition actually underlies many of the cognitive
faculties that necessitate reasoning, the connectionist framework would be able to
provide a plausible account of higher-level tasks in a similar manner as in the case
of the lower-level tasks already discussed above.
One possible structure that may enable the relation between the pattern
recognition and all-inclusive account of cognitive ability is to utilize the stable
state representing one pattern as an input for the next level pattern recognition
system of higher order. Thus, the reasoning steps of cognitive performance could
be represented as a sequence of multiple levels of pattern recognition. The work
of Margolis (1987) supports the proposition that human function could be
explained in terms of pattern recognition, where humans decompose complex
situations by recognizing something and invoking the pattern most suitable to the
situation, which is then used to recognize something else and is thereby
modified to reflect the situation to a greater degree, constituting the learning
process. In addition to the process of making a judgement, there is
the process of reasoning why it is the case. This review of the process
of reasoning and making a judgement is in itself a separate pattern recognition
event that may provide an insight into the pattern recognition process on a
broader spectrum, and result in a modification of the pattern recognition
mechanisms based on the acquired learning.
The two arguments that Margolis (1987) offers to substantiate his claim revolve
around the seeming human limitation in logical and statistical faculty, and the
example describing the ability of some scientists to adopt a new scientific paradigm
and the resistance to such change by others. The first, based on the work of
Tversky and Kahneman (1973, 1974), describes the human difficulty in performing
an accurate statistical probabilistic evaluation – a fallacy that occurs, according
to Margolis, due to the inability to elicit the appropriate pattern and is, therefore, not
a statistical but rather a pattern recognition error.
The second example is drawn from the adoption of new scientific
paradigms. Margolis argues that the ability to adopt and embrace a new
scientific paradigm requires learning to recognise new patterns. Developing new
patterns often requires the abandonment of the old established patterns and
may result in temporary deterioration of performance. This however does not
imply that cognition consists exclusively of pattern recognition faculties –
it merely suggests connectionism as one plausible explanation of cognitive
function, without the use of symbolic rules. For a discussion on the
mechanism that illustrates how higher-order cognitive tasks may be performed
using the pattern recognition function rather than logical reasoning of a symbolic
system please see Rumelhart et al. (1986).
3.5 Knowledge representation in connectionism
The propositional knowledge representation that revolves around the idea that
knowledge is expressed and therefore can be transferred in propositions such as
sentences is accepted quite intuitively. It is generally accepted by the
cognitive science disciplines (cognitive psychology, artificial intelligence, etc.) that
knowledge is represented in propositions: it is transmitted by books and lectures
that consist of sentences formed from mental sentence-like structures. Many
report a kind of an internal dialogue when describing their thought process, and
traditional information-processing models of cognition and language share the
assumption that propositions are what is processed. Connectionism, on the
contrary, poses a challenge to this general assumption regarding knowledge
representation, and networks are able to encode the knowledge in a qualitatively
different manner without the necessity to employ propositions, effectively
rendering the concept of propositions for cognitive modelling unnecessary.
3.5.1 Knowledge representation in Cognitive Science
Earlier efforts to model cognitive representations relied on unstructured
declarative statements arranged according to predicate calculus rules: predicates
followed by a number of arguments that were seen as basic conceptual parts.
Propositions were connected through the rule of repetition, that is, through
recurring concepts. Further research by psychologists and artificial intelligence
researchers revealed the need for higher-order structures to organize the
propositions, initially called schemata (Rumelhart, 1975) or frames (Minsky,
1975). Schemata represent a structured framework of knowledge in which
propositions are allocated an appropriate location. Once activated in the course
of a cognitive process, a schema offers default responses unless contradicting
information is available, in which case that information is substituted and the
schema is updated to suit the immediate environment. Schemata and other
higher-order structures for organising propositions introduced into propositional
representation a limited form of the abilities characteristic of a pattern
recognition system, where some parts of knowledge are semantically connected
to other parts to make further knowledge processing possible.
In the 1970s, one of the challenges to the then-dominant propositional approach
came from researchers who argued that knowledge was represented as images,
analogue and iconic in form, in contrast with abstract and arbitrary
propositions (J. R. Anderson, 1987). These distinctions between literal
pictures and non-literal representations in visual and spatial form prompted the
emergence of multi-code models (J. R. Anderson, 1983a; Paivio, 2013) with
modality-specific knowledge representation, where visual information is encoded
as images and literal information as a verbal code, challenging the claim that
propositional representations are sufficient to encode all knowledge. Further
cleverly designed studies were able to establish linear relations between
analogue dimensions and response reaction time (Kosslyn, 1980), and even
suggested that analogue representations are useful in carrying out inferences by
mentally ordering objects in a spatial array (Huttenlocher, Higgins, &
Clark, 1971), challenging the purely propositional knowledge representation
further.
Until the re-emergence of connectionism, cognitive scientists regarded models of
pictorial knowledge representation as complementary to propositional models,
useful in representing certain types of knowledge, rather than as able to
substitute propositional representation models completely.
Connectionism, on the other hand, aims to provide a plausible account of some or
possibly all cognitive performance without any use of propositional
representation, thus occupying a position opposite to the traditionally
established approaches.
While considering connectionism as a model of cognitive performance, it is useful
to consider the concept of cognitive performance itself outside the theoretical
framework of propositional knowledge representation.
3.5.2 Types of knowledge
The distinction between knowing how and knowing that developed by Ryle
(2009) is based on the human ability not only to know certain facts, but also
to know how to perform certain activities. The capacities required for the two
types of knowledge therefore also differ: knowing that requires the storage and
subsequent retrieval of the specific or relevant proposition from memory,
whereas knowing how may require specific knowledge of the process and
control of the associated perceptual and motor systems necessary to complete the
activity successfully, such as planning, execution, monitoring, etc. Thus,
propositional knowledge represents only a portion of human intelligence and is
not primary; and it is often the success rate in performing certain activities, both
physical and cognitive in nature, that we are interested in when assessing the
level of intelligence. In fact, theorizing itself is but one of the activities that
can be conducted in an intelligent or a stupid manner.
The behaviouristic account of Ryle’s knowing how consists of the disposition to
perform the activity in the appropriate circumstances, without specifying the
internal mechanisms involved. In cognitivism, it may be appropriate to attempt
an explanation of knowing how in terms of learning how to perform the activities
and of the mental activities that obtaining such knowledge involves. Cognitive
science provides an account of knowing how from the point of view of rule-based
systems, where it is referred to as procedural knowledge. People often learn how
to perform new activities from others by receiving verbal instructions on what to
do, thus receiving procedural knowledge that enables them to perform the said
activity. As such, procedural knowledge is rule-based and propositional,
specifying the set of actions to be taken; an example is generative grammar in
linguistics (Chomsky, 1957), where the rules are used as abstract representations
of the competence level rather than as a performance model. Verbal instructions
alone, however, are not sufficient to perform the behaviour in a satisfactory
manner; practice, actual or mental, is also required (Newell, 1994). Even though
adapting the cognitive rule-based models initially developed for propositional
knowledge to accommodate procedural knowledge has shown considerable
success, it does not necessarily provide an explanation of knowing how in terms
qualitatively different from knowledge based on declarative propositions.
Connectionist networks, on the contrary, are not ordered in strings but rather
consist of interconnected units, and in many instances these units cannot be
straightforwardly interpreted in terms of symbols. Knowing how may refer to
the propagation of activation in the network models, and is therefore more like
dynamical processing in connectionism than a sequential application of
propositional rules.
3.5.3 Expert knowledge
Expert systems have received a considerable amount of interest from cognitive
psychology and artificial intelligence researchers over the years. Many different
tasks have been studied extensively, involving such complex activities as playing
chess, medical diagnosis, and others (J. R. Anderson, 1981). The typical approach
aims to formulate a set of rules capable of achieving performance levels
comparable to those of a human expert by interviewing experts and surveying
their methods and processes; the rules are then encoded as a computer
programme that simulates human expert performance. Many expert systems
show high levels of competence, both theoretical and applied, which supports the
notion that it is possible to incorporate knowing how into the propositional
systems initially designed to provide an explanation for knowing that. Not
everybody is convinced, however: Dreyfus, Dreyfus, and Athanasiou (2000), after
carrying out an extensive analysis of human skill acquisition and performance,
concluded that the expert systems approach is inadequate for simulating human
expert performance, as expert systems are inherently limited in the level of
performance they are potentially able to achieve. Based on their analysis, a
five-level scale of skill development is proposed, where only the highest levels
manifest true expertise:
1. Novice.
2. Advanced beginner.
3. Competent performer.
4. Proficient performer.
5. Expert.
Dreyfus, Dreyfus, and Athanasiou (2000) argue that work on expert systems, with
its symbolic modelling, is capable of addressing only the first three levels,
where the major cognitive tasks can be grouped into assessing the
circumstances, choosing the appropriate response, and managing the rules to
accomplish the objectives. Developing additional cognitive tasks of the same
level, or combining them to produce more complex ones, would not be
sufficient to advance a competent performer to the level of an expert.
3.5.4 Vision knowledge
Knowing how to see is generally considered such a basic function that many
philosophers did not believe it actually requires knowledge to execute.
Sensation and perception were considered in terms of evaluating observation
sentences to assess their truth-values, and the truth determination process was
considered unproblematic, a simple visual recording mechanism. This
assumption, however, was challenged by Kuhn, who claimed that what we see
depends upon what we know: someone familiar with fMRI would see an fMRI
machine for what it is, whereas somebody unfamiliar would see it as ‘some sort
of a device or an assembly’ instead. Thus, an epistemological threat to the
objectivity of the scientific process is revealed if theory determines what
researchers will see during observations, effectively introducing relativism into
science. What is applicable here, though, is the notion that perception is a
learned function and therefore relies upon knowledge that is not represented
propositionally. Hanson (1958) assembled a body of arguments against the
viewpoint that perception is simply a recording mechanism, and argued that,
rather than all people seeing the same thing, each individual first sees an object
from one of the dimensions and may then adopt another. Thus, an fMRI
technician would first see the fMRI machine for what it actually is, whereas a
layperson would first see it as some sort of a device and could then make an
inference. This is dependent upon learning, as the layperson needs to learn what
the fMRI technician knows before being able to see what the technician sees.
What remains to be explained is the mechanism by which a set of propositions is
learned to facilitate perception, i.e. by which, as a result of learning, the
perception system comes to see what it would not otherwise be able to see.
This of course cannot be a bottom-up process of inference. Kuhn (2012) also
supported the notion that perception needs to be tuned to the discipline, and
that in case of significant shifts in the course of scientific progress it is
necessary for scientists’ perception to be re-learned. This suggests that at least
part of practising science consists of knowing how to perceive objects and
events. Thus, the actual process by which knowing how affects the perceptual
system, and how this type of knowledge is acquired, remains unexplained.
Connectionism may be able to offer a way to resolve this. One of the features of
a connectionist pattern recognition system is the ability to perform even when
inputs are altered, as precise matches between the current and already learned
patterns are not necessary. Non-obvious, subtle regularities (often not easily
describable in words) are extracted to identify exemplars and prototypes and are
stored in connection weights, a nonpropositional way to encode the knowledge
necessary for the task.
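As a hedged illustration of how knowledge held in connection weights can
tolerate imperfect matches, the sketch below uses a small Hopfield-style
auto-associative network. This architecture is an assumption made purely for
illustration (the text does not name a particular one), and the patterns, the
Hebbian storage rule, and the function names are all hypothetical.

```python
import numpy as np

# Two illustrative +1/-1 patterns (hypothetical data, chosen to be orthogonal).
PATTERNS = np.array([
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
])

def store(patterns):
    """Hebbian storage: the 'knowledge' lives only in the weight matrix."""
    weights = patterns.T @ patterns   # sum of outer products of the patterns
    np.fill_diagonal(weights, 0)      # no self-connections
    return weights

def recall(weights, probe):
    """One synchronous update: each unit takes the sign of its net input."""
    return np.where(weights @ probe >= 0, 1, -1)

weights = store(PATTERNS)

# Present an altered version of the first pattern (first element flipped).
probe = PATTERNS[0].copy()
probe[0] = -probe[0]

restored = recall(weights, probe)
print("probe   :", probe)
print("restored:", restored)
```

No rule or proposition describing either pattern is stored anywhere; the
altered input is completed from the connection weights alone.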
3.5.5 Logical inferences
The symbolic perspective designates logical inference as a lower-level cognitive
ability. The normal process of learning logical inference rules for symbol
manipulation consists of learning the rules and then learning how to apply them
through subsequent practice. Practice improves the outcomes, but the process
does not become flawless and mistakes still happen. One possible explanation
suggests that some rules may be learned incorrectly, as separate rules are
collapsed into one general rule that is only later split into distinct separate
rules. As a result, the general rule may still be incorrectly utilized on some
occasions. This can be modelled by attaching probabilistic parameters to the
utilization of the rules, where learning is expressed in terms of changing the said
parameters (J. R. Anderson, 1983a). Observations reveal, however, that learning
how to apply the rules is an essential part of the overall learning process, as it
develops the knowing how part in addition to knowing that (the explicit rules).
Rule-based models are usually designed to incorporate certain functionality of
pattern recognition. Would it perhaps be possible to eliminate the rules entirely,
leaving the connectionist network and pattern recognition to do all the work?
One of the challenges in developing a simulation model of consumer behaviour
lies in identifying the features and information that are used in the
decision-making process; one option is to rely on established and developed
theory that specifies what is necessary.
It is important to note that networks may require a large number of training
trials due to the fact that network models normally start from zero (tabula
rasa), unlike humans, who possess certain prior knowledge and training that may
be applicable to a certain extent. Moreover, networks, like humans, are capable
of attaining high performance as a result of repeated learning and error
correction in the course of pattern recognition activity that is not reliant on
proposition-like rules. Thus, a clear distinction between knowing how and
knowing that is made apparent in the ability to apply pattern recognition
processes to linguistic symbols.
3.6 Summary
In this chapter, the arguments against connectionist networks, which outline the
inadequacies of connectionist networks as models of cognition compared with
symbolic models, were discussed. In response, three connectionist
approaches were presented: (1) the approximationist approach, which
argues that network models provide a more accurate account of cognition
than symbolic models, (2) the compatibilist approach, aimed at building symbolic
architecture into the networks, and (3) the external symbols approach. In this
paper we are for the most part concerned with a combination of the
first and the third, as they offer a novel plausible explanation of cognition
opposing the traditional symbolic approach.
4. Methods
This section provides an overview of the research questions and research
methods employed. The data and analysis are described. The modelling approach
and variables used are explained and justified, and the research process is
outlined in a sequential manner.
4.1 Overview
It is important to establish the boundaries of this research project and to discuss
the overall goals it is set to achieve. Firstly, as further discussed in the following
sections, the research objectives are set to explore the field of consumer
behaviour, a primarily positivistic field of study (P. F. Anderson, 1986b), and
to model and examine the underlying architecture of the consumer
decision-making process employing connectionist NNs models. It is argued here
that utilitarian and informational reinforcement are latent emergent variables
that are represented by the input items and thus need to occur at a level higher
than the input level of independent variables (Foxall, 2009). Traditional methods
such as regressions do not have any levels other than Input and Output, whereas
connectionist network structures are able to incorporate a number of levels as
hidden layers, where the utilitarian and informational reinforcement should exist
conceptually as emergent concepts and representations. This is then principally
an attempt to develop explanatory modelling that would allow examination of
such higher-order attributes, and potentially offer a method to evaluate and
approximate utilitarian and informational reinforcement quantitatively.
It is important not to overlook the predictive capacity of the model and its ability
to extract important patterns from the data, alongside the other dimensions
considered, such as the explanatory dimensions, including both descriptive
and prescriptive application (Bryman & Bell, 2007). The overall context for the
extended discussion here is of course the extension of the theoretical framework
of BPM to incorporate the connectionist view as one possible direction forward.
To provide a comprehensive account of this research process, this section will
focus on the following: (1) research questions and hypotheses, (2) philosophical
position, (3) theoretical justification and evaluation of the approach and possible
alternatives, (4) research methods, which includes a sequential account of
research process, and (5) a concluding remark.
4.1.1 NNs models and linear models
Linear regression is undoubtedly one of the most widely used and important tools
for describing possible relationships between variables in behavioural science.
Many factors account for such widespread adoption, the seeming ease and
intuitiveness of interpretation certainly being one of them. Linear regression
models are usually fitted using the least squares approach, designed to minimize
the lack of fit, allowing either the strength of the relationship between
variables to be quantified or a predictive model to be developed. One of the
most common applications is trend line estimation in time series data to show
change over time, a simple technique that does not require a control group or
sophisticated experimental design. Furthermore, predictor variables are often
transformed to improve the function fit, making linear regression an
exceptionally powerful inference method; indeed, polynomial regression can be
too powerful and often shows a tendency to overfit the data. Employing
interaction terms can further improve the modelling results, providing the
possibility to examine nonlinear relationships.
Even so, considerable difficulties may be encountered when interaction terms
are employed to examine the relationships in large datasets with many variables,
and when relationships between three or more variables are examined, as the
number of interactions increases exponentially and readily becomes
impractically large. Given n predictors, the number of terms in a linear model that
includes a constant, the predictors, and every possible interaction is 2^n. With
only 10 variables, for example, that total is 1024 terms: the constant, 10 main
effects, 45 two-variable interactions, and 968 interactions of three or more
variables. The selection process is tedious, manual, and oftentimes infeasible. As
a result, researchers may examine only a few interaction terms that first come to
mind, or none at all. Another issue is the possibility of running out of degrees of
freedom.
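The counts in this paragraph can be checked directly. The short sketch below
breaks the 2^n terms of a fully interacted linear model down by interaction
order; the function name and dictionary keys are illustrative only.

```python
from math import comb

def count_terms(n_predictors):
    """Count the terms of a linear model with every possible interaction."""
    total = 2 ** n_predictors            # every subset of predictors is a term
    main_effects = comb(n_predictors, 1)
    two_way = comb(n_predictors, 2)      # interactions of exactly two variables
    higher_order = total - 1 - main_effects - two_way
    return {
        "constant": 1,
        "main_effects": main_effects,
        "two_way": two_way,
        "three_way_or_more": higher_order,
        "total": total,
    }

counts = count_terms(10)
print(counts)
```

With 20 predictors the same function returns a total above one million, which
shows how quickly exhaustive manual selection becomes impractical.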
NNs, on the other hand, are inherently designed to examine dynamically all
possible interactions within the data during the learning process. All interactions
that carry predictive capacity are captured in the final network architecture by a
learning algorithm, and pruning methods systematically simplify the network to
expose the core explanatory architecture by removing the connections that do
not offer sufficient predictive capacity.
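A minimal sketch of the idea of pruning follows. It assumes magnitude-based
pruning, which uses the absolute size of a weight as a rough proxy for its
predictive contribution; this is one common technique and not necessarily the
pruning method used in this research. The weight values and the threshold are
hypothetical.

```python
import numpy as np

def prune_by_magnitude(weights, threshold):
    """Zero out connections whose absolute weight falls below the threshold."""
    mask = np.abs(weights) >= threshold   # True where the connection is kept
    return weights * mask, mask

# Hypothetical weights from 3 input units (rows) to 2 hidden units (columns).
weights = np.array([
    [0.90, -0.02],
    [0.01, 1.30],
    [-0.75, 0.04],
])

pruned, mask = prune_by_magnitude(weights, threshold=0.1)
print(pruned)
print("connections kept:", int(mask.sum()), "of", mask.size)
```

In practice the pruned network would be re-evaluated on held-out data to
confirm that predictive capacity is maintained.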
4.1.2 Architecture of NNs models
In the simplest form, where the number of hidden layers is set to zero (that is, a
NNs model where the input layer is connected directly to the output layer with
no hidden layers between the two), a NNs model develops a structure similar to
that of a logistic regression. As a result, provided that the output unit uses a
logistic activation function and the same likelihood-based error function is
minimised, the numerical values of the logistic regression coefficients would be
identical to those of the weights in the NNs model. Thus, with reference to the
common ‘black box’ argument, it should be clear that NNs models are at the very
least able to provide a level of explanatory capacity equivalent to that of
traditionally employed linear methods such as logistic regression. This, however,
has already been explored in greater detail elsewhere
(for comparative analysis please see Greene, 2011), and in this paper NNs models
of higher structural complexity are examined.
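The correspondence between a network without hidden layers and logistic
regression can be sketched numerically. The example below is an illustration
under stated assumptions, not the analysis reported in Greene (2011): the data
are synthetic, the network has a single logistic output unit trained by
gradient descent on the cross-entropy error, and the logistic regression is
fitted by Newton-Raphson. All names and settings are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical synthetic data: 200 cases, 2 predictors, noisy binary outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Xb = np.column_stack([np.ones(len(X)), X])   # prepend a bias/intercept column
true_coef = np.array([0.5, 1.0, -2.0])       # intercept, then two slopes
y = (rng.random(200) < sigmoid(Xb @ true_coef)).astype(float)

def fit_network(Xb, y, lr=0.5, epochs=20000):
    """Zero-hidden-layer network: gradient descent on mean cross-entropy."""
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        grad = Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
        w -= lr * grad
    return w

def fit_logistic(Xb, y, iterations=25):
    """Logistic regression fitted by Newton-Raphson (maximum likelihood)."""
    beta = np.zeros(Xb.shape[1])
    for _ in range(iterations):
        p = sigmoid(Xb @ beta)
        hessian = Xb.T @ (Xb * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hessian, Xb.T @ (y - p))
    return beta

network_weights = fit_network(Xb, y)
logit_coefficients = fit_logistic(Xb, y)
print("network weights      :", network_weights)
print("logistic coefficients:", logit_coefficients)
```

Both procedures minimise the same error function, so they should arrive at the
same values up to numerical tolerance, and the network weights can then be read
in the same way as regression coefficients.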
In the high-level task of pattern recognition while examining complex
behavioural phenomena, linear models are only useful in explaining linear
relations. For the purposes of the present discussion, however, this would be
insufficient, as consumer behaviour and the process of decision-making in a
modern market and socio-economic environment is a very intricate and
multifarious phenomenon composed of a multitude of interrelated
developments, where simple changes in one part of the system are able to
produce complex effects throughout. It has indeed been common practice to
attempt to decompose the larger phenomenon and isolate the process into
individual elements for subsequent analysis, controlling for all other variables.
The learning thus obtained could then be propagated to the higher level of the
process. This method, however, is very inefficient and poses a serious scalability
problem, in addition to the limitation concerning the ability of the researcher to
identify the individual parts of the process correctly (a task some believe to be
impossible). A better method would be to examine the relations between all
components simultaneously.
NNs are able to examine all variables and account for nonlinear relations within
the data once hidden layers are introduced into the model structure. This
results in high predictive ability; in addition, the weights can be examined for
explanatory purposes and are able to provide an insight into the intrinsic nature
of the process. Consumer decision making is an intricate continuous behaviour
exhibited by persons, and NNs seem to be particularly suited to it as a method of
analysis for a number of reasons. First, a NNs model framework resembles the
physiological inner workings and structure of a human brain, making it a
particularly good fit for studying human processes. Second, connectionism (the
theoretical framework of NNs) is a set of approaches in the fields of artificial
intelligence and cognitive psychology that is, from the conceptual point of view,
particularly suited to modelling behaviour as the emergent processes of
interconnected networks of simple units. In training, NNs models are repeatedly
fed data and adjust the weights up to a point of equilibrium where the model
cannot improve any more; the method is referred to as training as it indeed
resembles the process of training in the traditional sense. The hidden layers and
nodes developed in this process are not like input and output variables that
come from the data, but could rather represent underlying abstract concepts
identified in the process of training that play a major role in explaining the
relation between the input and the output layers
(independent and dependent variables).
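The role of hidden layers in capturing nonlinear relations can be illustrated
with the classic exclusive-or (XOR) pattern. The sketch below is a toy example
under stated assumptions: the data are the four XOR cases, and the hidden-layer
weights are set by hand rather than learned, so it shows what a hidden layer
can represent and not how training finds it.

```python
import numpy as np

# The four XOR cases: the outcome is 1 only when exactly one input is 1.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Linear model (constant plus two predictors) fitted by least squares.
design = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
linear_pred = design @ beta

# Network with one hidden layer of two rectified-linear units.
# The weights are chosen by hand for illustration, not obtained by training.
W_hidden = np.array([[1.0, 1.0], [1.0, 1.0]])   # inputs -> hidden units
b_hidden = np.array([0.0, -1.0])                # hidden unit biases
w_out = np.array([1.0, -2.0])                   # hidden units -> output

hidden = np.maximum(0.0, X @ W_hidden + b_hidden)
network_pred = hidden @ w_out

print("linear :", np.round(linear_pred, 3))
print("network:", network_pred)
```

The linear model can do no better than predict the mean outcome for every case,
whereas the two hidden units re-represent the inputs so that the output becomes
a simple weighted sum of them.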
This paper will contemplate the idea of interpreting the number of hidden layers
and nodes and the weight values of NNs models in an attempt to provide an
explanatory account of consumer behaviour. Previous findings will be
summarized and synthesized, and original models developed and assessed (in
both predictive and explanatory capacity).
4.2 Research questions and hypotheses
The discipline of consumer behaviour encompasses contributions from a number
of complementary fields of study, including psychology, philosophy, marketing,
and economics (Bashford, 2009; Calder & Tybout, 1987; Holbrook, 1987; McKee,
1984; Pachauri, 2002). It is a common practice to produce research which is
highly quantitative in nature (for example Cornwell et al., 2005; Cunningham,
Young, Moonkyu, & Ulaga, 2006; Güneren & Öztüren, 2008; Lu Hsu & Han-Peng,
2008; van Kenhove, Vermeir, & Verniers, 2001; Watson & Wright, 2000), and this
is also the case for this project.
As discussed above, the central aim of this research project is concerned with
extending the theoretical framework of BPM into the realm of Connectionism
with the help of NNs models, assessing the ability of connectionist models to
predict and explain the underlying psychological factors that influence and drive
observable consumer behaviour. One way to operationalise this is to assess the
capacity of a connectionist model to predict and explain the consumer disposition
to pay more or less for a unit of product they eventually receive.
Therefore, the hypotheses are proposed as follows:
H1: Artificial neural network models with pruning offer means to simplify
network architecture, while maintaining a level of predictive capacity
comparable to unpruned neural network models
H2: Consumer behaviour models based on connectionist framework offer
means to examine the latent or emergent variables that represent
complex consumer behaviour structures, which traditional linear models
such as logistic regression are unable to elucidate
The ability of connectionist models to develop latent variables employing
distributed representations during the learning phases is given particular
attention in the discussion chapter, as this capacity offers an unparalleled
opportunity to develop this project further.
4.3 Philosophical position
There are a number of ways to obtain what we generally recognise as knowledge
of consumer behaviour – ranging from a simple observation to a controlled
laboratory experimental work (1988). As such, researchers tend to consider
certain methods more suitable than others as applied to study particular
phenomena. It is therefore important for the purposes of present research
project to disclose and deliberate at least some underlying philosophical
assumptions and perspectives generally adopted here. For illustrative and
comparative purposes, the key philosophical aspects of each of the perspectives
(underlying ontology, epistemology, and axiology) are juxtaposed, and the
manner in which they may influence the general direction of research is
discussed. Potential challenges that either perspective presents are identified.
The section is then concluded with a summary remark.
4.3.1 Consumer behaviour position
For the most part, it could be considered general knowledge that the field of
consumer behaviour lies predominantly within the domain of positivism
(Marsden & Littler, 1996; Prus & Frisby, 1987).
As early as 1690, Locke (reprinted in 1997) argued that the method in the social
sciences should follow the same principle as in the physical sciences, allowing
of course for variances in prediction accuracy due to the obvious
complexities of the subject matter. It was more than a century later that the
concept was adopted and further developed as the School of Positivism by the
likes of Quételet, Saint-Simon, and most notably Comte.
As proposed by the law of three stages, phenomena are explained successively
through religion, metaphysics, and positivism, the last employing scientific laws,
reason, and empirical data (Bernard, 1995). In modern positivism, Comte’s
original ideas are still present in the following form: the scientific method is the
optimal approach to generating effective knowledge with a reasonable degree of
control, which can be used to improve the general human existence. Logical
positivism, one of the later developments by the Vienna Circle and also
recognised as logical empiricism and instrumental positivism, stipulates that
social science is to become a purely statistical exercise (Fullerton, 1987; Hunt,
1991).
Durkheim rejected most of Comte save the method, which was retained and
advanced to establish methods and techniques for scientific research (Durkheim,
1964). Later alternative perspectives emerged as a result of critiques of positivism
by Popper (1959), Kuhn (1996), and Foucault (1995), one of which is relativism.
In the following paragraphs, the two perspectives are compared and discussed in
further detail.
4.3.2 Ontology
Ontological assumptions revolve around the concepts of what constitutes the
nature of reality and of social beings.
4.3.2.1 Positivism
In positivism it is generally assumed that a single objective physical reality
exists, independently of one’s perception (Peter, 1992).
Again, irrespective of individual perception, a single objective social reality is said
to exist. Reality is composed of parts and interconnections, and it is separable
and detachable; and it is possible to measure reality in a valid and reliable
manner. Thus, to achieve a greater understanding of the subject examined, it is
possible to control some of the other variables; and while individual inquiry may
only be able to provide an estimation of reality, collective effort should allow
developing a greater understanding and representation of reality. This implies the
concept of decomposition of complex phenomena, where parts of the
phenomena are taken out of the complexity and examined individually in
isolation one part at a time to determine the intricate relationships and broader
context.
A number of assumptions can be identified in positivism when it comes to the nature of
social beings. It is possible to interpret human behaviour as reactive: in behaviour
analysis, for example, behaviours are said to be reinforced by external factors acting upon
the individual, and reinforcers systematically change the frequency of reinforced
behaviour (Hildum & Brown, 1956; Insko, 1965). Similar predetermination by outside
factors is suggested by a cognitive view: rather than reinforcers directly affecting
behaviour, it is the internal rationalization of the person that allows an optimum decision
to be made based on the available information and experiences previously processed by
the individual (Slovic, Fischhoff, & Lichtenstein, 1977).
4.3.2.2 Relativism
In relativism, subjective or objective reality exists only relative to the relativiser: a
person, a theory, or other. If the relativiser is a person, it may be a perceptional
judgment that is relativised and claimed to be true for that particular person –
referred to as semantic relativism. A perception which is claimed to be real for
that particular person is referred to as ontological relativism, and these two types
of relativism are not always explicitly distinguishable (Long, 1998). When all
judgments a person makes are true for that particular person, it can be described
as full semantic relativism – thus not only the truths but also the whole reality is
relative, which is in direct contradiction with positivism. By the same logic, what
leads a person to construe a judgement as truth also lies within this same reality
(Hunt, 1990). Therefore semantic relativism entails a version of ontological
relativism, where that which constitutes truth for the person is within the full
semantic reality (Nola, 1988).
Perceptual experiences could reveal ontological relativism as well: every person
has their own perceptual experiences that no other person has had, and these
individual perceptual experiences are subject to rapid change. This would
suggest that perception depends on the perceiver, or that every person exists
within his or her own world of perceptual experiences. In the context of scientific
inquiry it could then be claimed that concepts exist only relative to certain
scientific theories, paradigms, and scientific frameworks (Nola, 1988).
Constructivism could be said to be interrelated with relativism in the context of
what accounts for existence of phenomena: if researchers play an active role in
creation and unravelling of scientific theory, their activities could be interpreted
as relativistic in terms of what objects are considered by the theory (Goulding,
1999). It is then possible for positivists to agree that researchers indeed play an
active role in the process – at the same time rejecting the implication that
researchers actively impact the objects of theories (Nola, 1988). Socially
constructed realities could then be seen as a product of relativism, as they are
dependent on other entities – contrary to the positivist view of a single, objective
reality.
4.3.3 Epistemology
As with any philosophical perspective in social science, positivism and relativism
hold certain assumptions around the concepts of what constitutes knowledge,
the nature of causality, and the position of the researcher relative to the subject
of inquiry.
4.3.3.1 Positivism
Positivism ultimately aims to derive the abstract generalizable laws that could be
applied to a wide range of individuals, situations, and phenomena. In other
words, positivists focus on determining the generalisations irrelevant of time and
space that are context-free as much as reasonably possible. Single events do not
hold any particular value unless they can be extended across systematic or
sequential generalizable instances (Bernard, 1995).
The meaning of causality plays a central role in positivism. Fundamental to the
underlying goals and values of the perspective, it is generally assumed that clear
linkages could be established between behaviour and prior events that led to it.
The deterministic nature of social beings is closely interrelated with the concept
of causality, as external events that act upon the individual are presumed to
affect individual behaviours or serve as affective factors otherwise (Bernard,
1995).
It is generally recognised in positivism that it is possible for researchers to largely
remain outside the research, and to strive consistently to distance themselves from the
subject matter so as not to exert any significant influence over the experimentation
criteria. The researcher, drawing upon expertise and research methodology, is
capable of manufacturing a hypothetical observation deck and, hidden by
impartiality and detachment, remain objective. The position of the researcher in
relation to the subject of study is then naturally assumed to be largely detached
(Hunt, 1993).
4.3.3.2 Relativism
A number of ways illustrate epistemological relativism. Firstly, as suggested
above, epistemology is inherently relativistic at least to some extent in a way that
what is known is relative to the underlying theory, framework, culture, person,
and so on. Therefore, what is known in relativist terms is dissimilar to what
constitutes knowledge in positivist terms, as relativist knowledge is relative to
something or somewhat rather than being seen as an absolute concept in
positivism (Nola, 1988). It should then be possible to reinterpret a positivist
statement of What is believed to be true, could be true or false using the relativist
terms as follows: That which is believed to be true is true for whoever believes it to
be true.
Secondly, epistemological relativism occurs naturally due to inherent variability in
perceptual capacities, and the concept of incommensurability develops at the
level of observation in relativism – as opposed to the positivist view that stipulates
the possibility of objective observation (T. S. Kuhn, 1996). Feyerabend’s (1975)
view on methodological anarchism draws upon precisely this notion of
epistemological relativism where it is applied on the level of methodological
procedure. Epistemological relativism assumes that any observation follows a
presupposed theoretical framework and therefore is inherently relative to that
particular theory, and thus unable to generate objective data which would enable
extrapolation of universal laws or rules (Nola, 1988).
Thirdly, one of Feyerabend’s (1975, 1978, 1987) widely popularised
arguments claims there are no methodological rules. He challenges a single
perspective approach on the grounds that it would effectively limit the scientific
inquiry serving as a framework of constraints, whereas theoretical anarchism to
the contrary would facilitate the scientific progress. Adhering to the rules of a
single given perspective not only does not aid, but at times may even hinder the
scientific process. General scientific description cannot be described through
philosophical consideration, which would make it impossible to devise a method
to differentiate between science and pseudo-science (Nola, 1988).
Feyerabend (1975, 1978, 1987) goes as far as to say that a universal scientific
method does not exist, and any form of inquiry does not require a predetermined
methodological process. Scientific reason does not need to follow any prescribed
form of regulation that specifies a privileged perspective, as procedures designed
to establish a prescriptive system would result in offering different incomparable
ranking systems none preferable to the other. Admittedly, it may be possible to
achieve this within a certain constraint – no universally applicable method could
exist however. This key concept lies in direct contradiction with the positivist notion
of universal laws that could be applied to general phenomena.
4.3.4 Axiology
What constitutes value by either of the perspectives is discussed in the following
paragraphs.
4.3.4.1 Positivism
The overall goal of positivism is prediction through derivation of universal laws
that may be able to explain behaviour (P. F. Anderson, 1986a). The understanding
of phenomena is tied to systematic demonstration of underlying associations
between the variables selected to represent the phenomena. The accurate
identification of these variables and antecedents that are related to the
dependent variables is central to the positivist perspective, as it would allow a certain
degree of predictive capacity to be developed based on the results of the
analyses (Kerlinger, 1964).
4.3.4.2 Relativism
In relativism, the methodological criteria are selectively employed by the
scientific community and interpreted in response to empirical and social factors –
as opposed to the criteria in hypothetico-deductive methods generally employed
in positivism (Holbrook, 1989; Holbrook & Hirschman, 1982; Holbrook &
O'Shaughnessy, 1988). Relativism is quintessentially descriptive, as relativists
strive to develop a full comprehensive account of the phenomena rather than
extrapolating universal law-like relationships that could be applied to general
phenomena. It is not associated with any one particular method of inquiry – the
theoretical framework is based upon empirical and qualitative evidence, and may
include data of qualitative, quantitative, historical, and social nature along with
any other sources that could prove useful in the attempt to develop a
comprehensive representative account of the phenomena (P. F. Anderson, 1983,
1986a, 1988a, 1988b; Lutz, 1989; Siegel, 1988).
4.4 Research methods justification
In this section, the method of inquiry is reviewed and justified against
alternatives.
4.4.1 Assessment of the quantitative method
Some of the limitations of quantitative method that researchers should consider
may include inappropriate application of statistical methods and techniques to
carry out the analysis, which may in extreme cases reduce the research project to
the level of a purely statistical exercise that does not carry any other purpose.
Highly technical and demanding, quantitative methods could be exhausting not
only in terms of computational resources, but may also be limiting for the
researcher in terms of the skills. This could lead to the quantitative method being
susceptible to mistakes and inaccuracies that may result in errors and drawing of
wrong conclusions altogether. On a separate note, there are some researchers
who are not comfortable with research findings that derive meaning from
numbers, employ quantitative and standardized data, and statistical modelling
(Saunders, Lewis, & Thornhill, 2009).
Provided the researcher’s philosophical position is aligned with the quantitative
method, many potential limitations outlined above could undoubtedly be
interpreted as an advantage. The concepts that deal with validity, reliability, and
generalizability are often better accounted for by quantitative method as they are
inherently imparted in the design (Ghauri & Grønhaug, 2005). One other
significant advantage is that any academic research publications are expected to
be described following the positivist method, irrespective of the actual
perspective employed (Wolcott, 2002).
4.4.2 Theoretical justification
Considering the nature of research questions set in this research project, it could
be argued that no other than positivist theoretical framework may be suitable: it
is unlikely any other researcher than positivist would even consider examining the
capacity of connectionist framework to accurately predict consumer behaviour
for example. Moreover, it could be said that research questions discussed here
are inherently interrelated with the core values of the positivist theoretical
position and thus form an inseparable part of it, as other than positivist
researchers would not concern themselves with such positivistic notions as
predictability, and would not therefore choose predictability as a central measure
of research questions to begin with (Alvesson & Deetz, 2000).
It is not uncommon however to encounter consumer behaviour research that is not
based on quantitative method (for example see Haigh & Crowther, 2005;
Holbrook, 1989; Kaynak & Kara, 2001; Kehret-Ward, 1988; Kumcu, 1987;
O'Shaughnessy, 1985; Sanders, 1987). In fact, researchers continuously engage in
an ongoing debate on whether positivism and quantitative method are
appropriate at all to study consumer behaviour. Haas (1987) for instance argues
that the subjective and social nature of consumer behaviour are obscured by
positivism and its objectivity. He would argue that human consumer behaviour is
above all a social process, and meaning derived from the interaction of
individuals should be preferable to that which is based on a quantitative method:
individuals do not respond to stimuli in a mechanistic manner as prescribed
according to the theoretical assumptions of the school of behaviour analysis, but
rather construct activities in a meaningful intentional manner. Behaviour is
individualistic, and is a product of perceptions, interpretations, and judgement
statements within a certain context, and therefore requires to be explained
through the perspective of the individual (Haas, 1987). Denzin and Lincoln (2000)
go as far as to state that all research is essentially interpretative, as it is
fundamentally conducted in a manner that reflects the researcher’s
weltanschauung – a set of beliefs and feelings about the world, and how to
understand and study it. Therefore, an interpretative approach would be
preferred to positivism, being capable of producing a profound and thorough
understanding of behaviour.
4.4.3 Alternative approaches
Considering the nature of research questions proposed here, it is quite possible
no other approach could be suitable without significant alterations to the original
goal of this research project for a number of reasons.
As discussed above, the very formulation of the research questions proposed
here would likely not happen employing any other approach: predictive capacity
and behaviour modelling are inherently positivist notions, and are central to
critical behaviourist approach. Any attempt to consider the subject matter
employing any other alternative approach would inevitably require the
modification of research questions, as research questions contemplated here are
completely and profoundly interconnected with the quantitative scientific
method of inquiry and with the underlying theoretical and philosophical aspects
of positivist approach.
4.5 Research method
This section describes the research methods and design specifics employed: it
includes a comprehensive description of the sample, explains the research
design, and describes the statistical methods employed.
4.5.1 Sample
The Homescan data used here was acquired from the National Office of Statistics
panel that comprises results from a survey of about 35,000 households and
contains barcode scanned records of all their food purchases (it was also used, for
example, by Heravi & Morgan, 2014a, 2014b). The data arose
from a market research data set supplied by Taylor Nelson Sofres (TNS), part of
Kantar World Panel.
The subset selected for the analysis here covers only one product group: wine.
This resulted in a subset with 170,989 cases, which cover purchases of 4,939
individual households over the time period from October 2002 through
December 2005. A total of 224 variables are present in the dataset that include
transactional, demographic, and product attributes – not all the attributes are
usable however and many repeat variables are included (only usable for other
product groups) which were omitted in the analyses here.
Data supplied by TNS UK Limited. The use of TNS UK Ltd data in this work does
not imply the endorsement of TNS UK Ltd. in relation to the interpretation or
analysis of the data. All errors and omissions remain the responsibility of the
authors.
4.5.2 Research design
In previous work, informational and utilitarian reinforcement data acquired from
matching studies (Foxall, Wells, Chang, & Oliveira-Castro, 2010) was integrated
with the consumer behavioural data, effectively appending two additional
variables to the dataset on a transaction level to reflect the informational and
utilitarian reinforcers each brand was able to offer. As a result, for every case in
the dataset that describes a brand purchasing decision, utilitarian and
informational reinforcement parameters were used as independent variables
(Greene, 2011). Even though it was shown that these additional reinforcement
variables were able to contribute to the modelling, significantly improving the
predictive capacity of the model, it is argued here that informational and
utilitarian reinforcement as described in BPM (Foxall, 1990, 2004, 2005) are
higher order latent variables that are formed using the regular variables such as
product attributes and consumer demographics, and therefore ought to be
represented by hidden layers in NN architecture.
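The idea of treating reinforcement as latent variables formed in a hidden layer can be sketched in a few lines. This is a minimal illustration only, not the models developed in this thesis: the three inputs, the two hidden units, and all weight values below are hypothetical and hand-picked, and Python is used purely for illustration.

```python
import math

def sigmoid(z):
    """Logistic activation used for the hidden units."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer: inputs -> hidden activations -> single linear output."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(hidden_weights, hidden_biases)
    ]
    output = sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
    return hidden, output

# Hypothetical values: three scaled product/household attributes feed two hidden
# units, which play the role of latent (higher order) variables.
inputs = [0.5, 1.0, 0.0]
hidden_weights = [[0.8, -0.4, 0.1],   # weights into hidden unit 1
                  [-0.3, 0.9, 0.5]]   # weights into hidden unit 2
hidden_biases = [0.0, 0.0]
output_weights = [1.2, 0.7]
output_bias = 0.1

hidden, output = forward(inputs, hidden_weights, hidden_biases,
                         output_weights, output_bias)
print(hidden, output)
```

The hidden activations are never observed in the data; they are computed from the regular input variables, which is the sense in which they can stand in for latent constructs.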
4.5.2.1 Variables
Usually the process to identify the predictive variables to explain the relations
with the dependent variable tends to be tedious and time-consuming: researchers
start with a set of independent variables and identify the most predictive one,
then begin to add more variables systematically and assess the model with R-squared or
preferably adjusted R-squared and AIC value to decide whether it is worth adding the
extra variables to improve the model. This process is very much a manual
approach, and depends entirely on the perceptions of the researcher which
variables to consider and in what order. There is also a matter concerning
interactive variables, where interactive variables are sometimes produced and
incorporated into the model – but even on those occasions only a few
interactions between variables are considered. Once the interactions are
considered, the choice of interactive variables to include into the model increases
exponentially with the number of independent variables, and the manual process
is unable to examine any kind of exhaustive list of interactions by any measure.
This is one major drawback of the traditionally employed methods of analysis, as
they all require predetermined structure specified during the modelling phase.
Neural Networks on the other hand do not require any predetermined structure
– connectionist models take complete input data and, during the training process,
networks determine the best predictive variables and inherently examine all
possible variable interactions, thus eliminating the otherwise necessary
requirement to specify the network architecture beforehand. Various pruning
methods are then able to strip the model further, producing the lean underlying
structure that can serve as best estimation of underlying patterns within the data.
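The growth in candidate interaction terms described above can be made concrete with a short calculation. The sketch below counts pairwise interactions and interactions of every order (each subset of two or more variables); the variable counts in the loop are arbitrary examples, not figures from this study.

```python
from math import comb

def pairwise_interactions(n_vars):
    """Number of two-way interaction terms among n_vars variables."""
    return comb(n_vars, 2)

def all_interactions(n_vars):
    """Number of interaction terms of any order: every subset of two or
    more variables, i.e. 2^n - n - 1."""
    return 2 ** n_vars - n_vars - 1

for n in (8, 20, 50):
    print(n, pairwise_interactions(n), all_interactions(n))
```

With only 8 independent variables there are already 247 possible interaction terms, which illustrates why a manual, stepwise search cannot be exhaustive.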
Price is an obvious choice for a dependent variable for any predictive modelling
exercise, but from a purely semantic consideration, it becomes apparent that a
straightforward price prediction may not provide a robust platform for the
analytical work of explanatory nature. As previously discussed, consumer choice is
a probabilistic value in behaviourist terms, and is required to be operationalised as
a proportion of instances of choosing one product over the other in a given time
frame. Price alone could be overrepresented by the utilitarian reinforcement
variables, as the majority of wine purchased by absolute volume or bottle count
in the dataset used here would be a regular table wine. As such, the willingness of
the consumer to pay more or less per litre of wine offers a better modelling
opportunity from the semantic and explanatory point of view as discussed in
detail in the following chapters. Accordingly, considering the context of the BPM and
theoretical framework adopted for this research, a new variable was generated
as a ratio of expenditure to pack volume to be used as a dependent variable here:
price paid per litre.
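The derived dependent variable is a simple ratio, sketched below. The function name and the three example transactions are hypothetical; only the definition (expenditure divided by pack volume) comes from the text.

```python
def price_per_litre(expenditure, pack_volume_litres):
    """Ratio of expenditure to pack volume for a single transaction."""
    if pack_volume_litres <= 0:
        raise ValueError("pack volume must be positive")
    return expenditure / pack_volume_litres

# Hypothetical transactions: (expenditure in pounds, pack volume in litres).
transactions = [(4.49, 0.75), (11.98, 1.5), (3.00, 0.75)]
prices = [round(price_per_litre(spend, volume), 2) for spend, volume in transactions]
print(prices)
```

Expressing price relative to volume puts a 0.75 litre bottle and a 1.5 litre pack on the same scale, which a raw transaction price would not.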
Connectionist models developed in such a way would then be useful in providing
an insight into what underlying factors influence consumer choice situation and
to what extent, and be able to assess quantitatively the changes to price per litre
that the consumers are willing to pay based on the information available from
independent variable values. Models could eventually contribute to the
development of connectionist framework able to explain the consumer
purchasing decision.
4.5.3 Apparatus
Statistical and data manipulation software employed during this research project
include Microsoft Office Excel 2007 (Microsoft-Corporation, 2006), SPSS 17.0
(SPSS-Inc., 2007), R version 3.1.2 (R_Core_Team, 2014), and R studio (R_Studio,
2012). Initial compiling and appending of data was done in SPSS, which is most
suitable for handling large data and offers a reasonable degree of statistical
analytical capacity – although, it should be noted, it is still inadequate for the
connectionist modelling required here. Excel was often used to quickly view the data due
to its versatile nature – although it must be said that it was completely useless for most of the
statistical analyses. R on the other hand is an exceptionally powerful
application capable of performing advanced analyses – still, additional coding and
package development was required to enable pruning and connectionist network
visualisation. Developed by statisticians as an open source project, R consistently
benefits from the contributions of the world’s leading analysts, and was used to
develop and subsequently assess all modelling work here. R Studio software
merely packages R and provides a very user-friendly interface and additional
functionality, which makes it much easier to use R.
4.5.3.1 R package development: RSNNS and NeuralNetTools
A large number of R packages were used, but the two most notable are the
RSNNS (Bergmeir & Benítez, 2012) to do all the modelling and pruning, and
NeuralNetTools to plot the connectionist network architecture once developed.
RSNNS was well developed to incorporate the functionality of the original
Stuttgart Neural Network Simulator (SNNS) software (Zell et al., 1994; Zell,
Mache, Sommer, & Korb, 1991a, 1991b, 1991c; Zell, Mache, Vogt, & Hüttel,
1993) into the R package as far as NNs modelling is concerned, but was lacking the pruning
element – as did any other NNs package at the time. It was then essential to get
involved with the original package developers, and introduce pruning as well,
which was implemented in the original software. For that, the author performed the
necessary coding in C++, which was then supplied to the package developers,
who were able to code the necessary wrapper to integrate it into the R package.
As a result, with a new update of the RSNNS, it now provides the pruning
functionality to anybody who may be interested to pursue the connectionist
modelling route, as this author has.
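The general idea of pruning can be illustrated with a minimal sketch. It is not the RSNNS or SNNS code, and the text above does not state which pruning algorithm was used; magnitude-based pruning is assumed here purely for illustration, and the connection names and weight values are hypothetical.

```python
def prune_smallest(weights, n_prune):
    """Zero out the n_prune connections with the smallest absolute weight.

    `weights` maps (from_node, to_node) to a connection weight. A new dict is
    returned; pruned connections are kept as keys with weight 0.0.
    """
    by_magnitude = sorted(weights, key=lambda link: abs(weights[link]))
    pruned = set(by_magnitude[:n_prune])
    return {link: (0.0 if link in pruned else w) for link, w in weights.items()}

# Hypothetical trained weights for a tiny network.
weights = {
    ("x1", "h1"): 0.92,
    ("x2", "h1"): -0.03,   # weakest connection, removed first
    ("x3", "h1"): 0.41,
    ("h1", "y"): -1.10,
}
lean = prune_smallest(weights, 1)
print(lean)
```

In practice a pruning step of this kind is normally followed by retraining the remaining connections, so that the leaner network can compensate for what was removed.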
Somewhat similar events took place around the NeuralNetTools package, which
was able to plot the connectionist architecture, but not after
pruning had occurred. After the collaboration of this author with NeuralNetTools
package developers, it is now possible to plot pruned connectionist networks as
well – a great way to represent visually the underlying architecture of the
network as a result of model learning process.
4.5.4 Sequential account of the research process
This project builds upon previous work, and develops the subject further by
addressing some of the limitations expressed heretofore (Greene, 2011). As such,
the initial phase of this research project is consolidating the previous findings and
identifying the line of inquiry that would be able to deepen the understanding of
consumer behaviour. Extensive literature evaluation paired with ongoing
discussions with specialists in the field of consumer behaviour is carried out to
provide a comprehensive account of developments in theoretical framework of
BPM as it is applied in empirical work. The connectionism is discussed in detail,
giving particular attention to the philosophical developments and application of
NNs models in consumer behaviour.
The second phase largely revolves around the software development needed to enable
the types of analyses and modelling work required here as described above. As a
result, two R packages are improved to provide additional functionality.
This is followed by the next phase, which involves the models being developed.
Some regression models are developed during the exploratory stages to learn the
dataset, but their development is not progressed further as it is not within the
scope of this research project – moreover, previous work already carried out the
comparative analyses of logistic regression with NNs models (Greene, 2011).
Instead, NNs models of varying architectural complexity are developed to
examine the process of formulation of items in hidden layers of NNs model that
could be interpreted as distributed representation of informational and utilitarian
reinforcement as described in BPM.
The final phase builds upon the modelling results acquired in the stages described
above, and encompasses a comprehensive discussion of implications on
extending the BPM framework into the realm of connectionism. Future
opportunities and research directions are identified, discussing the limitations of
this research project.
4.6 Summary
This chapter focuses on unfolding the comprehensive account of research
process in sequential manner to provide a thorough view as it is employed in this
research project: an extended overview outlines the overall direction and goals,
followed by a discussion on research questions, clarification of philosophical
position, and justification and description of research methods with a sequential
account of the process.
5. Analysis
This chapter discusses the statistical analyses employed, describes the specifics of
the models developed in the course of research project, explains the tests in
detail, and provides an overview of the results.
5.1 Preliminary data manipulations
The dataset was made available by TNS UK and the fieldwork was carried out as
part of the Kantar World Panel and was originally obtained for the purposes of a
different research study (Heravi & Morgan, 2014a, 2014b). The master database
contains a large amount of data for a number of product categories: consumer
household descriptive database contains all the household descriptors such as
consumer demographics and household details, a transactional database contains
all the purchasing data on individual transaction level, and product attributes
database describes all the product SKUs in detail. The size of the database in MB
was so large that it caused certain logistical problems while simply moving the
files from one machine to another. As such, it was decided to focus on one
product category – wine was selected for no other reason than the author’s previous
experience in the wine industry, which involved working with wine data on a daily
basis, thus offering a degree of familiarity with the data from the onset. Together
with the colleagues from Cardiff Business School (Heravi & Morgan, 2014a,
2014b), the data was consolidated into a transaction level database, where
household descriptors and product attributes were appended to the database. As
already described above, this resulted in the initial database with 170,989 cases
in total, and 319 variables. Even though the data contained an incredible number
of descriptors, many of the attributes – product attributes in particular – were
either not available or duplicated, no doubt used for product categories other
than wine. Data manipulations initially were carried out on a superficial level
only, without adjusting the values in any manner – only to ensure the data is
easily transferable between different software applications employed here and
would not cause any issues. During the exploratory analysis, many of the
unusable variables – obvious duplicates and empty categories – were removed
after initial examination, leaving the dataset with a total of 182 variables.
Normally it would be beneficial to preserve the data as complete as possible, but
the variables removed would not offer any contribution whatsoever and would
rather create unnecessary noise and clutter.
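The kind of clean-up described above, removing empty variables and obvious duplicates without altering any values, might look as follows. This is a hedged sketch rather than the procedure actually used: the column names are invented, and the exact criteria applied in this project are not specified beyond "obvious duplicates and empty categories".

```python
def drop_unusable_columns(table):
    """Remove columns that are entirely missing or that repeat an earlier column.

    `table` maps a column name to its list of values, with None for missing.
    Values themselves are never modified.
    """
    kept = {}
    seen = set()
    for name, values in table.items():
        if all(v is None for v in values):
            continue                      # empty category
        signature = tuple(values)
        if signature in seen:
            continue                      # exact duplicate of an earlier column
        seen.add(signature)
        kept[name] = values
    return kept

# Hypothetical miniature dataset.
table = {
    "price": [4.49, 11.98, 3.00],
    "colour": ["red", "white", "red"],
    "colour_copy": ["red", "white", "red"],
    "pet_food_flavour": [None, None, None],
}
cleaned = drop_unusable_columns(table)
print(list(cleaned))
```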
5.2 Exploratory analysis
The compiled dataset contains a number of variables that describe individual
households, product attributes, and more importantly the transactional data for
every purchase occasion, totalling 170,989 cases: this makes it transactional
level data, as opposed to consumer level data where each case would represent
individual consumer or household, most likely summarising the individual
purchasing decision to some sort of an average or a sum value.
In the past the author carried out similar research focusing on consumer loyalty as a
dependent variable, which was operationalised with an additional variable
calculated as a proportionate value of the money spent on the most often
purchased product variant divided by the total amount spent on all product
variants in a given time frame. This time however research questions deal with a
decision-making process where the consumer makes a conscious decision to give
up a certain amount of money in exchange for perceived utilitarian and
informational reinforcement that the product – in our instance wine – would be
able to provide. In particular, we are interested in the emergent process of
assigning value to the unit of product, and therefore the operationalisation is as
follows: dependent variable is the price per litre paid by the consumer, which is
then predicted with different modelling methods using any consumer, product,
and transactional independent variables. This dependent variable, new to the
dataset, is calculated by dividing the total amount paid by a consumer household
during each single transaction by the total number of litres of wine purchased,
which produces a numeric monetary value. It is important to note here that, unlike
previous research that mainly focused on the predictive capacity of NNs models
as opposed to traditionally employed methods such as logistic regression for
comparative purposes (Greene, 2011), this research takes a step further and aims
to model the emergent process and examine the explanatory capacity of NNs
models in an attempt to ultimately explain and even visualise the decision-making
process.
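For contrast with the price per litre variable, the loyalty measure used in the earlier work can be sketched as below. The function name and the purchase records are hypothetical; the definition follows the description above (spend on the most often purchased variant divided by total spend in the period), with ties between equally frequent variants resolved in favour of the one encountered first, which is an assumption.

```python
from collections import Counter, defaultdict

def loyalty_share(purchases):
    """Share of total spend that went to the most frequently purchased variant.

    `purchases` is a list of (variant, amount_spent) pairs for one household
    in a given time frame.
    """
    counts = Counter(variant for variant, _ in purchases)
    spend = defaultdict(float)
    for variant, amount in purchases:
        spend[variant] += amount
    favourite = counts.most_common(1)[0][0]   # first-seen variant wins ties
    return spend[favourite] / sum(spend.values())

# Hypothetical purchase history for one household.
purchases = [("A", 5.0), ("A", 5.0), ("B", 10.0), ("A", 5.0), ("C", 5.0)]
share = loyalty_share(purchases)
print(share)
```

This measure summarises a household over a period, whereas price per litre is defined for every individual transaction, which is what allows the present analysis to stay at the transactional level.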
5.3 Regression and NNs comparative analysis
To examine the relations in the data, some exploratory regression analyses were
carried out.
To facilitate the modelling and avoid taxing the computational resources too
much, a subset was selected for exploratory regression analyses by geography: all
of Wales and West, which included 13,787 total observations, 13,782 once the few
cases where price was 0 were removed. Price per litre variable was calculated
here using the total amount spent divided by volume, and rounded to two digits
(pennies). All transactions were then assigned to either low spender or high
spender category, effectively converting price per litre into a new binary variable
as follows: a price of 3.99 per litre was identified as the dividing point between two very distinct
types of purchases.
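The conversion to a binary variable can be sketched as follows. The text gives 3.99 as the dividing point but does not say on which side a price of exactly 3.99 falls; the sketch assumes that 3.99 and below counts as low spender, and the example prices are invented.

```python
THRESHOLD = 3.99  # dividing point reported in the text (price per litre)

def to_spender_class(prices_per_litre, threshold=THRESHOLD):
    """Drop zero-price cases, then label each remaining transaction:
    1 = high spender, 0 = low spender.

    Assumption: a price exactly equal to the threshold is treated as low.
    """
    usable = [p for p in prices_per_litre if p > 0]
    return [1 if p > threshold else 0 for p in usable]

# Hypothetical prices per litre, already rounded to two digits.
labels = to_spender_class([0.0, 3.49, 3.99, 4.00, 7.99])
print(labels)
```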
Variable conversion into binary is of course necessary for logistic regression
analysis – it is a method of choice in the extensive marketing research literature, and
has been proven to offer a consistent level of insight when it comes to modelling
relations in the data (Adya & Collopy, 1998). Starting with a logistic regression
establishes a solid basis for the ongoing analysis, and R (R_Core_Team, 2014)
offers ample solutions with multitude of packages that provide the logistic
regression functionality to examine the data.
Zelig (as described in Crosas, King, Honaker, & Sweeney, 2015) is one useful
package in R to build regression models, and is used here for comparative
purposes. Using only eight independent variables to predict the binary high or
low spender dependent variable, a regular least squares regression model is
developed with the following results: multiple R-squared of 0.07001, and
adjusted R-squared of 0.06947. Rather low R-squared values are expected
however while trying to predict what type of spender a consumer may be from a
few simple descriptive attributes, and may indicate a challenge the regression model
faces with the consumer behaviour dataset when the dependent variable is
converted from probabilistic to binary. Price per litre is a numeric monetary
value, and converting it to a binary low or high value is bound
to have an effect on R-squared values. Low R-squared values may also be due to
the fact that relevant variables are not available or were not measured in a
suitable manner, or that the model is not able to account for other effects, such
as nonlinearity. In validating models of consumer behaviour however, the R-
squared value may not be as important as other measures: within-sample
prediction accuracy for instance, or the coefficient values and the properties of
the consumer response promptness to price changes. As this research is not
particularly concerned with the R-squared value itself, but rather with its
determinants and underlying structural effects, the inherent nonlinear effects of
NN models are expected to provide quite the improvement with connectionist
models discussed later.
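The relation between the two reported R-squared figures can be checked with the standard adjustment formula. The sketch assumes the model was fitted to all 13,782 usable observations with the eight predictors each entering as a single term; neither detail is stated explicitly for this model, so the check is indicative only.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Reported multiple R-squared, with the assumed sample size and predictor count.
adj = adjusted_r_squared(0.07001, 13782, 8)
print(round(adj, 5))  # 0.06947
```

Under these assumptions the result agrees with the reported adjusted value of 0.06947; with a sample this large relative to the number of predictors the adjustment is very small.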
The data is then randomly split into two subsets for validation – regression-NNs
validation subset 1 and regression-NNs validation subset 2 – and models are
developed using both subsets independently in parallel. As a next step, Logit
regression models are developed with Zelig package for each of the subsets using
the same variables as with least squares regression described above, and all
variables are statistically significant as shown in Figure 4 using regression-NNs
validation subset 1.
Figure 4. Regression coefficients and significance levels for regression-NNs validation subset 1.
Even though linear regression models and NNs parallel connectionist models are
fundamentally different, it is beneficial nevertheless to try to bring them to the
same level of analysis as a benchmark and compare the regression variable
contribution parameters that the logistic regression model provides with the
weights of the NNs model. Connectionist models are very powerful algorithms for
a number of reasons – not the least of which is the inherent parallel nonlinear configuration
that requires no predetermined model structure. For the purposes of
comparative exercise, it is possible to isolate this nonlinear capacity and limit the
NNs models to only 2 layers of nodes, input and output, effectively constraining the model to a
linear function. Thus, using the same variables, the
simplest 2-layer NNs model is developed using the simple and elegant nnet package
in R that offers functionality to satisfy the NNs modelling research in most cases
(Venables & Ripley, 2002) – no hidden layers, just two layers for input and output
nodes. As a result, the 8-0-1 NNs model contains 8 weights as shown in Figure 5.
Figure 5. NNs model results for regression-NNs validation subset 1.
What immediately becomes obvious once the regression and NNs results are
examined is that the NNs model weights are identical to the regression coefficient values
– the simplest NNs model with no hidden layers performs in exactly the same manner
as logistic regression.
This is confirmed by using regression-NNs validation subset 2 as well: the NNs
model weights are again identical to the logistic regression coefficients, as
shown in Figure 6 and Figure 7.
Figure 6. Regression coefficients and significance levels for regression-NNs validation subset 2.
This demonstrates how a simplistic NNs model with no parallel nonlinear
processing performs exactly as logistic regression would, and provides connection
weights identical to regression coefficient values.
Figure 7. NNs model results for regression-NNs validation subset 2.
To increase the validity and reliability of the test, this procedure was replicated
1000 times: each time the data were randomly split into 2 new subsets, which were then
used to build logit and NNs models and compare the regression coefficient values with
the NNs connection weights, providing analogous results every time.
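The equivalence reported above can be illustrated with a minimal sketch in Python. It assumes that the output unit is logistic and that the network weights are fitted by minimising cross-entropy; how nnet was configured for the models above is not stated in this section, so this is an assumption. The data, the coefficient values, and all function names are hypothetical. A network with no hidden layer is trained by gradient descent, a logistic regression is fitted by Newton-Raphson, and because both minimise the same loss for the same model, they should arrive at the same weights.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_data(n, beta, rng):
    """Hypothetical binary data; each row is [1, x1, x2] with a leading intercept term."""
    rows, labels = [], []
    for _ in range(n):
        x = [1.0, rng.gauss(0, 1), rng.gauss(0, 1)]
        p = sigmoid(sum(b * v for b, v in zip(beta, x)))
        rows.append(x)
        labels.append(1.0 if rng.random() < p else 0.0)
    return rows, labels

def fit_network(rows, labels, lr=1.0, epochs=1000):
    """No hidden layer: one logistic output unit trained by gradient descent
    on the mean cross-entropy loss."""
    w = [0.0] * len(rows[0])
    for _ in range(epochs):
        g = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = sigmoid(sum(a * b for a, b in zip(w, x))) - y
            for j, v in enumerate(x):
                g[j] += err * v / len(rows)
        w = [wj - lr * gj for wj, gj in zip(w, g)]
    return w

def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit_logit(rows, labels, steps=25):
    """Logistic regression fitted by Newton-Raphson on the negative log-likelihood."""
    d = len(rows[0])
    w = [0.0] * d
    for _ in range(steps):
        g = [0.0] * d
        h = [[0.0] * d for _ in range(d)]
        for x, y in zip(rows, labels):
            p = sigmoid(sum(a * b for a, b in zip(w, x)))
            for j in range(d):
                g[j] += (p - y) * x[j]
                for k in range(d):
                    h[j][k] += p * (1.0 - p) * x[j] * x[k]
        w = [wj - sj for wj, sj in zip(w, solve(h, g))]
    return w

rng = random.Random(1)
rows, labels = make_data(300, [0.5, 1.0, -1.5], rng)
net_w = fit_network(rows, labels)
logit_w = fit_logit(rows, labels)
max_gap = max(abs(a - b) for a, b in zip(net_w, logit_w))
print(net_w, logit_w, max_gap)
```

The two fitting routines share nothing except the model and the loss, which is the point of the comparison: once the hidden layers are removed, the network has no additional structure with which to depart from the logistic regression solution.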
5.4 Exploratory NNs modelling
Following the exploratory analysis with traditional methods such as regressions,
initial NNs modelling was carried out. This phase was mainly concerned with
testing software capacity in order to select a suitable R package for
the modelling. Upon examination of the common R packages that offer NNs
modelling functionality, it became apparent that only a few offered the capacity
to use multiple hidden layers, and – more importantly – none offered the
capacity to carry out pruning. This left the author with three options:
abandon R in favour of other software, develop a new package from
scratch, or work with one of the existing package developers to extend the
functionality to include pruning. The first option was easily dismissed,
as R is one of the most advanced statistical modelling platforms: if something was
not available in R, it was very likely not available in other software packages
either, and would have taken significantly longer to develop and roll out
there than in R, as the statistical programming
environment for R is structured around researchers developing new packages
for other researchers. The second option required the author to be highly
proficient in coding, which was not the case; developing the skill would
require a significant amount of time and could be considered a research project in
its own right. Thus, the third option was pursued: in collaboration with coders
and developers, additional functionality was developed for an existing R package
to allow pruning of the networks.
5.4.1 RSNNS
RSNNS (Bergmeir & Benítez, 2012) was an R package that allowed multiple
hidden layers, and Bergmeir, the package developer, was very responsive to the
initial communication and general questions regarding the package. This author
proposed a collaboration project to extend the R package to include pruning,
and Bergmeir offered assistance with wrapping the R code and updating the
package once the coding was complete. The underlying functionality and low-level
interface were written in C++, which meant this author had to develop a
sufficient level of understanding and skill to do the necessary coding. This
alternative was nevertheless the most feasible and the optimal choice.
Package development began with familiarisation with the low-level interface, which
was written in C++ and already exposed NNs training functionality, but
not pruning. Using the SNNS manual and ad hoc assistance from the package
developer, which involved adding missing low-level functions to the package, this
author was able, with considerable effort, to write initial code capable of
carrying out pruning through the low-level interface. This code was then integrated
by Bergmeir into the official package and made available to the
wider scientific community.
As the nnet package only allows connectionist networks with a single hidden layer
and has no pruning capability, the RSNNS package is used for all subsequent
connectionist modelling from here on.
5.4.2 NeuralNetTools
NeuralNetTools (Beck, 2015) is a rather recent package used to visualise NNs
architecture. When it was first encountered, this author found its flexibility in
visualising models from a number of NNs R packages extremely useful, including the
nnet package described above, which was also used extensively in previous research
(Greene, 2011), and the RSNNS package. Although NeuralNetTools
was initially well suited to visualising NNs architecture, it was unable to
accommodate the pruning process. When this author contacted the
NeuralNetTools developer Beck with an inquiry, he was not aware that
pruning was available in R at all – not surprisingly so, as pruning in RSNNS
had only very recently been introduced through this author’s collaboration with
RSNNS’ Bergmeir (2012). Beck was happy to develop
the visualisation capacity further, and after collaboration with this author,
NeuralNetTools gained new functionality that allows pruned NNs model
architectures to be visualised as well.
5.5 Advanced connectionist models: hidden layers
The simplistic NNs model described above did not use any hidden layers and
therefore did not develop any capacity to account for nonlinearity in the
architecture. Next, hidden layers are introduced in the connectionist modelling to
provide a nonlinear dimension and build advanced models of consumer
behaviour. For illustration purposes, the number of variables used in the model is
increased, as reflected in a more complex input structure, and a single hidden
node is introduced to the network architecture as shown in Figure 8. Black
connections identify reinforcing connections, whereas light grey connections
identify inhibiting connections – line weight corresponds to the connection
weight. A single node in the hidden layer would not, however, be expected to add
much to the explanation beyond what a regression would normally be able to
offer, so the number of nodes is gradually increased.
Figure 8. Connectionist network 54-1-1 architecture using consumer data with 1 hidden layer and a single neuron, 1000 iterations, no pruning.
Figure 9. Connectionist network 54-2-1 architecture using consumer data with a single hidden layer and 2 neurons, 1000 iterations, no pruning.
Figure 9 shows 2 nodes in the hidden layer: simply increasing the number of nodes
to a total of 2 results in a dramatic change in the network architecture, as
the hidden layer is now able to examine the nonlinear relations within the data.
If the number of hidden nodes is increased to 4 as shown in Figure 10, the
architecture becomes even more complex, but at the same time provides the
network with an additional capacity to extract microfeatures.
Figure 10. Connectionist network 54-4-1 architecture using consumer data with a single hidden layer and 4 neurons, 1000 iterations, no pruning.
Figure 11. Connectionist network 54-4-2-1 architecture using consumer data with 2 hidden layers and 4-2 hidden neurons, 1000 iterations, no pruning.
As discussed above, in addition to increasing the number of hidden neurons
within a single hidden layer, another way to increase connectionist model
complexity is to increase the number of hidden layers and distribute the hidden
neurons among multiple layers. Figure 11 shows a connectionist network with the
54-4-2-1 structure, where a total of 6 hidden neurons are distributed among 2
hidden layers, and Figure 12 shows a network that incorporates yet another
hidden layer for a total of 3, with 14 hidden neurons in a 54-8-4-2-1 network
structure. This of course provides innumerable options for how the
initial network architecture could be arranged – in the following sections some of
these will be examined and explored in an attempt to assess which type of network
architecture could be more suitable for either predictive or explanatory purposes.
Figure 12. Connectionist network 54-8-4-2-1 architecture using consumer data with 3 hidden layers and 8-4-2 hidden neurons, 1000 iterations, no pruning.
It is of course not the complexity in itself that we are after here, but rather
sufficient flexibility and functional size of the initial network architecture to allow
further developments such as pruning to be carried out in the best possible
manner and provide an optimal result. Nevertheless, this capacity to account for
such a level of complexity is what makes connectionist networks such a
powerful pattern recognition algorithm, as it allows extracting microfeatures that
may very well be incomprehensible to human researchers, which at the
same time makes the interpretation extremely difficult or perhaps even
impossible. A number of variable contribution analysis methods that
attempt to examine the network connection weights and interpret the results
are discussed later, but here the focus is on pruning as the preferred
method for the connectionist network to systematically eliminate
connections that do not contribute to the explanation.
5.6 Pruning
The dataset used here is still the same subset for Wales and West as above, but
only 1024 cases are randomly selected to test the models at the initial development
stage. The reason for this somewhat conservative number of cases is that the models
take quite a long time to process computationally: this could be only a few seconds
for a simple model as above with 1000 iterations and a couple of nodes in the
hidden layer, or considerably longer for a large model with 3 hidden layers
and 100 000 iterations with retrain pruning cycles – as long as several days of
non-stop processing.
Before advancing to modelling with consumer data however, it is worth
examining and assessing the pruning process with simulated data and tasks – i.e. to
train an NNs model using more weights than the task requires, and examine how well
pruning eliminates the unnecessary parts to trim down the model and
expose the underlying core architecture that explains relations in the data.
5.6.1 Assessing pruning performance using simulated data
Before proceeding with modelling consumer data, it is important to assess the
capacity of pruning to isolate and remove unnecessary connections. It would not be
feasible to test this with consumer data in which the relations and patterns are not
yet established – in fact, establishing them is something this research project aims
to achieve to an extent. Thus, simulated data are used.
The simulated dataset contains X1 and X2 values, which are randomly
generated figures between 0 and 1, 1000 items each. Y is the following
function with some noise added in R:
y = sin (x1 - 2 * x2) + 1 + rnorm (1000, 0, 0.2)
Figure 13. Visual representation of y = sin (x1 - 2 * x2) + 1 + rnorm (1000, 0, 0.2)
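For readers working outside R, the same simulated dataset can be sketched in Python using only the standard library. The function name simulate and the seed are hypothetical; the noise term mirrors rnorm (1000, 0, 0.2), i.e. normally distributed noise with mean 0 and standard deviation 0.2.

```python
import math
import random

def simulate(n=1000, noise_sd=0.2, seed=0):
    """Draw x1 and x2 uniformly from [0, 1) and compute
    y = sin(x1 - 2 * x2) + 1 plus Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        y = math.sin(x1 - 2 * x2) + 1 + rng.gauss(0, noise_sd)
        data.append((x1, x2, y))
    return data

data = simulate()
# Noise-free part of y, useful for checking the range of the underlying surface
signal = [math.sin(x1 - 2 * x2) + 1 for x1, x2, _ in data]
print(len(data), min(signal), max(signal))
```

Because x1 - 2 * x2 lies between -2 and 1, the noise-free surface stays between 0 and 1 + sin(1), roughly 1.84; everything outside that range in the plotted data is due to the added noise.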
In Figure 13 the data is visualised in a 3-dimensional plot, showing that while the
cases may seem to form a linear relationship in a 2-dimensional view, the
3-dimensional representation shows the data to be organised around a surface
rather than a line. To solve this, the connectionist model would require at least
one hidden layer with just 2 nodes, making all extra nodes
redundant. If that is indeed the case, the pruning algorithm should eliminate the
superfluous architecture, leaving the bare minimum core necessary
to solve the problem: 2 nodes in a single hidden layer. To test this, a simplified
connectionist model with an excessive 2-2-2-2-2-1 architecture is developed – that is,
a connectionist network with 4 hidden layers in addition to the input and output
layers, containing 2 nodes within each of the hidden layers. As suggested above
however, this problem only requires a single hidden layer with 2 nodes, thus
pruning should strip out the rest of the network, exposing the core network
architecture.
Figure 14. Connectionist model with 2-2-2-2-2-1 architecture design, no pruning used in model development.
No pruning was used to build the network shown in Figure 14 – as a result, it is a
fully connected network that uses all the hidden layers and nodes, even though not
all of them are required to solve this particular computational problem. This
illustrates the issue with a predefined model architecture: it is difficult or even
impossible to know what exactly would be required to solve the task with
consumer data before the data is actually examined – a case of circular
logic. With the synthetic data used here it is known what is required
to solve the task, and it is possible to say definitively that the predefined network
proposed here for illustrative purposes, which incorporates excessive and
redundant architecture, is able to solve the task just fine, but at the same time
makes it particularly difficult to examine the network architecture in an attempt to
explain and define relations between the variables and interpret the results.
Figure 15. Connectionist model with 2-2-2-2-2-1 architecture design, pruning used in model development.
For the model shown in Figure 15 exactly the same steps are taken to develop the
connectionist model using the same synthetic data as for the model shown in
Figure 14 above, but now employing pruning methods to optimise the network
architecture during the model learning process. The Optimal Brain Surgeon algorithm
(Hassibi & Stork, 1993) displayed better results with consumer data than the other
pruning methods available in RSNNS when all of them were tested, and is used for
pruning here. Using the Optimal Brain Surgeon algorithm offers substantial benefits
for connectionist models: improved generalisation, simplified network architecture,
reduced computational requirements and hence improved processing time, and –
crucial for the research questions postulated here – improved network rule
extraction capacity. It now becomes apparent that the core
architecture necessary to solve the task is in fact much simpler than shown in
Figure 14 and indeed only requires a single hidden layer with 2 nodes – the
neurons within the first, second, and third hidden layers are all redundant now,
as the 2 neurons within the last hidden layer are sufficient to solve the
computational problem. The level of architecture complexity required for the task
here is rather low overall, as there are only 2 nodes in the input layer and few
hidden nodes – yet it is apparent that the pruned network architecture is
substantially clearer and easier to examine and interpret as a result. Pruning a
highly complex network architecture down to its core network,
potentially removing multiple nodes, should provide a
substantial benefit in explaining network architecture.
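A single Optimal Brain Surgeon step can be sketched as follows, based on the update rule published by Hassibi & Stork (1993): each weight w_q receives a saliency w_q^2 / (2 [H^-1]_qq), where H^-1 is the inverse Hessian of the error with respect to the weights; the weight with the lowest saliency is deleted, and the remaining weights are adjusted to compensate. This is an illustration only and not the SNNS or RSNNS implementation; the function name obs_step, the weights, and the inverse Hessian values are all hypothetical.

```python
def obs_step(weights, h_inv):
    """One Optimal Brain Surgeon step given weights and an inverse Hessian.

    Returns the index of the pruned weight, its saliency, and the adjusted weights.
    """
    # Saliency: estimated increase in error if weight q is removed
    saliency = [w * w / (2.0 * h_inv[q][q]) for q, w in enumerate(weights)]
    q = min(range(len(weights)), key=lambda i: saliency[i])
    # Adjust all weights so that w_q becomes zero with minimal increase in error
    scale = weights[q] / h_inv[q][q]
    new_weights = [w - scale * h_inv[i][q] for i, w in enumerate(weights)]
    new_weights[q] = 0.0  # the connection is deleted; clear any rounding residue
    return q, saliency[q], new_weights

# Hypothetical 3-weight example; h_inv stands in for the inverse Hessian of the error
weights = [0.9, -0.05, 0.4]
h_inv = [[2.0, 0.5, 0.0],
         [0.5, 1.0, 0.2],
         [0.0, 0.2, 4.0]]
q, cost, pruned = obs_step(weights, h_inv)
print(q, cost, pruned)
```

In this toy case the second weight has by far the lowest saliency and is removed, while the other two weights shift slightly to absorb its effect. That compensation is what distinguishes this approach from simply deleting the smallest weights, and repeating such steps with retraining in between corresponds to the retrain cycles referred to throughout this chapter.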
Since the synthetic dataset was designed for a network that would only require a
single hidden layer with only 2 nodes, the performance of the 2
models above can be compared with precisely that model architecture: a simple
model with only 2 hidden neurons within a single layer. The resulting network
architecture is shown in Figure 16: fully connected with 2 hidden nodes.
Effectively, this architecture is similar to the one obtained above after
pruning was carried out (shown in Figure 15), as pruning removed all but the 2
hidden nodes in the last layer.
Figure 16. NNs model with 2-2-1 nodes in a single hidden layer, no pruning.
It would be useful to compare model performance of all 3 models described
above: 2-2-2-2-2-1 model with no pruning (Figure 14), 2-2-2-2-2-1 model with
pruning carried out (Figure 15), and then 2-2-1 model with a single hidden layer
(Figure 16).
In terms of RMSE, pruned 2-2-2-2-2-1 models (Figure 15) show values around
0.28, similar to the 2-2-1 models with a single hidden layer and no pruning, which
show values around 0.27 (Figure 16). Interestingly enough, 2-2-2-2-2-1 models with
no pruning (Figure 14) provide RMSE that varies between higher values around
0.53 and values around 0.27, similar to the other models. This procedure was
replicated 1000 times, providing consistent results.
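To make the RMSE comparison concrete, the sketch below trains a 2-2-1 network on data simulated from the same function and compares its RMSE with that of a baseline which always predicts the mean of y. It is an illustration under several assumptions: tanh hidden units, a linear output, plain stochastic gradient descent, and evaluation on the training data; none of these is claimed to match the RSNNS settings used above, and all names are hypothetical.

```python
import math
import random

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def predict(net, x1, x2):
    """Forward pass of a 2-2-1 network: 2 tanh hidden units and a linear output."""
    hidden = [math.tanh(w[0] * x1 + w[1] * x2 + w[2]) for w in net["hidden"]]
    out = net["out"]
    return out[0] * hidden[0] + out[1] * hidden[1] + out[2], hidden

def train(data, epochs=100, lr=0.05, seed=1):
    """Stochastic gradient descent on squared error (an illustrative choice)."""
    rng = random.Random(seed)
    rows = list(data)  # copy so that shuffling does not reorder the caller's data
    net = {"hidden": [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)],
           "out": [rng.uniform(-0.5, 0.5) for _ in range(3)]}
    for _ in range(epochs):
        rng.shuffle(rows)
        for x1, x2, y in rows:
            y_hat, hidden = predict(net, x1, x2)
            err = y_hat - y
            for j in range(2):
                # Error signal at hidden unit j, computed before its output weight changes
                delta = err * net["out"][j] * (1.0 - hidden[j] ** 2)
                net["out"][j] -= lr * err * hidden[j]
                w = net["hidden"][j]
                w[0] -= lr * delta * x1
                w[1] -= lr * delta * x2
                w[2] -= lr * delta
            net["out"][2] -= lr * err
    return net

rng = random.Random(0)
data = []
for _ in range(1000):
    x1, x2 = rng.random(), rng.random()
    data.append((x1, x2, math.sin(x1 - 2 * x2) + 1 + rng.gauss(0, 0.2)))

targets = [y for _, _, y in data]
mean_y = sum(targets) / len(targets)
base_rmse = rmse(targets, [mean_y] * len(targets))
net = train(data)
net_rmse = rmse(targets, [predict(net, x1, x2)[0] for x1, x2, _ in data])
print(base_rmse, net_rmse)
```

The mean-only baseline gives a reference point for judging RMSE values on this task: a network that has learned nothing beyond the average level of y would score close to the baseline, whereas a network that has captured the surface should score clearly below it. Since the noise has a standard deviation of 0.2, no model can be expected to score much below 0.2 on fresh data.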
This should serve as a convincing argument that pruning connectionist models is
an optimal approach to developing robust models that extract and
describe representative patterns within the data, and would be exceptionally
useful to examine and potentially explain consumer behaviour and the decision-making
process of consumer choice. Moreover, the tests carried out suggest that
pruning is not only able to substantially simplify the network architecture and
reduce the network size by eliminating the inessential connections, but is able to do
so while maintaining network predictive performance at a similar or
better level compared to models where no pruning was carried out.
5.6.2 Assessing pruning performance with consumer data
Now that it is established that pruning is an effective and efficient way to remove
inessential connections, extract relevant patterns in the data, and reveal the
core relations that may be able to explain consumer behaviour and the decision-making
process, it makes sense to proceed with consumer data modelling.
Using the same Wales and West consumer data sample subset as in the section
describing comparative analyses above but with additional input variables,
connectionist modelling is carried out to develop a predictive and explanatory
representation of consumer behaviour. For the first set of exploratory models,
only 1024 randomly selected cases are used – this is of course to speed up the
modelling process, in which a number of preliminary starting model architectures
are examined. Even with a method of analysis that does not require the final
model architecture to be predetermined, and thus to a large extent defined, by the
researcher, there is still the matter of the initial model architecture, which needs
to be defined by the researcher nevertheless – i.e. the number of hidden layers and
computational nodes to be used and subsequently pruned. This will be discussed
in detail in the limitations section later.
The number of iterations (and the number of retrain cycles with pruning) plays an
important role as well: using 3 hidden layers with 8 neurons within each layer, it
takes 10.15 sec with 1000 iterations, 63.40 sec with 10 000
iterations, and 637.25 sec with 100 000 iterations. This network is depicted
in Figure 17, where the 54-8-8-8-1 structure, with an input layer of 54 nodes
(some of the non-numeric inputs are automatically dummy coded and therefore
appear as separate input nodes here), 3 hidden layers, and an output layer, can be
clearly seen. Even though the connection weights are clearly identified by the
connection line weight, it is nevertheless very difficult to comprehend such a
complicated network visualisation displaying 568 connections between 79
neurons, making any attempts at interpretation problematic. This is not to say
that the network shown here is extremely complex – in earlier research,
substantially higher numbers of neurons were examined to assess the predictive
capacity in relation to connectionist model size, going as high as 200 neurons
within a single layer. Thus, it is possible that a more complex starting network
architecture would be required here to eliminate the possibility of inadvertently
limiting the final network architecture by setting too small a starting point –
this of course assumes that the pruning algorithm is subsequently able to
eliminate any number of redundant connections and neurons within the overall
architecture.
Figure 17. Connectionist network 54-8-8-8-1 architecture using consumer data with 3 hidden layers and 8 neurons each, 100 000 iterations, no pruning.
This is a rather straightforward network architecture as far as the number of hidden
layers and neurons is concerned – as a next logical step, it is useful to introduce
pruning and examine the effects. Figure 18 shows the model described above,
with three hidden layers in a 54-8-8-8-1 architecture, using 1000 iterations and no
pruning: it takes only 10.15 sec to complete and produces 568 connections in
total with 79 neurons. Once pruning is introduced with 1000 iterations and only
100 retraining cycles, it takes considerably longer to calculate: 306.25 sec. The
resulting network architecture however makes it abundantly clear how helpful
the pruning process really is, even from just looking at the network in Figure 19
compared to the network architecture in Figure 18 where no pruning was done – a
large number of connections are pruned out, decreasing the total number to just
85, a number of neurons are also pruned out, leaving a total of 74, and the
connection weights appear to be higher.
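The unit and connection counts quoted for unpruned networks follow directly from the layer sizes, and can be checked with a short Python sketch. It assumes fully connected feed-forward layers with no bias or shortcut connections, which is the assumption under which the figures reported in this chapter are reproduced; the function names are hypothetical.

```python
def count_units(layers):
    """Total number of units across all layers."""
    return sum(layers)

def count_connections(layers):
    """Connections between adjacent layers of a fully connected
    feed-forward network, with no bias or shortcut connections."""
    return sum(a * b for a, b in zip(layers, layers[1:]))

architectures = ([54, 8, 8, 8, 1], [54, 4, 4, 4, 1], [54, 6, 4, 2, 1],
                 [54, 2, 4, 6, 1], [54, 2, 1])
for layers in architectures:
    name = "-".join(str(n) for n in layers)
    print(name, count_units(layers), count_connections(layers))
```

For the 54-8-8-8-1 network this gives 54 x 8 + 8 x 8 + 8 x 8 + 8 x 1 = 568 connections between 79 units. Most of the connections sit between the input layer and the first hidden layer, so the size of the first hidden layer largely determines the size of the unpruned network.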
Figure 18. Connectionist network 54-8-8-8-1 architecture using consumer data with 3 hidden layers and 8 neurons each, 1000 iterations, no pruning.
Figure 19. Connectionist network 54-8-8-8-1 architecture using consumer data with 3 hidden layers and 8 neurons each, 1000 iterations, pruning with 100 retrain cycles.
Figure 20. Connectionist network 54-8-8-8-1 architecture using consumer data with 3 hidden layers and 8 neurons each, 1000 iterations, pruning with 250 retrain cycles.
With 250 retraining cycles as shown in Figure 20, it now takes 736.28 sec to
complete, and the network architecture is further optimised down to 66
connections in total with 74 neurons.
These few examples make clear the general direction the modelling process
should take, and suggest that pruning could be of great help in reducing network
architecture complexity to simplify the data interpretation for
explanatory purposes.
5.6.3 Pruning different network architectures
As described above, the connectionist network determines and defines the final
network architecture during and as a result of the learning and pruning process.
The researcher however does need to define the initial design of the connectionist
network: the number of hidden layers and the number of computational neurons within
each hidden layer are set before the model learning process begins. Given the
nature of the research questions this research project is mainly concerned with, it is
important to examine the effectiveness of pruning different connectionist
network architectures. It makes sense to proceed with a sufficiently large initial
network architecture so as not to limit the model capacity from the outset, but a
compromise is necessary between including a larger dataset than used in
the previous tests described above and keeping the modelling time reasonable. Thus,
the overall network size will be limited to 12 hidden computational units here – this
will keep the network size reasonably constant and will allow focusing on
manipulating network architecture. There are a number of network designs that
allocate hidden neurons within 1-, 2-, and 3-layer networks, which will be
discussed in the following paragraphs.
To begin with, the network architecture with a balanced 54-4-4-4-1 layout is
examined. Using a larger consumer data subset this time, which includes all of the
Wales and West regions with a total of 13787 cases, the model takes 503.84 sec
to complete and produces a fully connected 54-4-4-4-1 network with 67 neurons
linked by 252 connections as shown in Figure 21, with an RMSE of 0.9087.
Figure 21. Connectionist network 54-4-4-4-1 architecture using consumer data with 3 hidden layers and 4 neurons each, 10000 iterations, no pruning.
Using the same data and 54-4-4-4-1 network structure, the connectionist
network is developed applying pruning with 100 retraining cycles. It now takes
quite a bit longer to train and prune the network – a total of 1709.36 sec – but as
a result produces a network with a much leaner architecture, only 59 neurons and
56 connections, and an RMSE of 0.9538, which is comparable to that of the unpruned
network shown in Figure 21. In fact, as can be seen in Figure 22 where the
network architecture is shown, pruning was able to effectively
remove a few hidden layers, leaving only 2 nodes in the first hidden layer. The
network architecture shown in Figure 22 is the outcome of one of the first
modelling attempts, and the result is extremely promising as far as
theoretical implications go, and for what it may mean for the discussion of using
connectionist models to substantiate and extend the theoretical framework of the
BPM.
Figure 22. Connectionist network 54-4-4-4-1 architecture using consumer data with 3 hidden layers and 4 neurons each, 10000 iterations, pruning with 100 retrain cycles.
This of course needs to be validated, with the procedure replicated numerous
times to see whether successive models continue to provide consistent results and
whether similar patterns in the network architecture can be observed.
To examine the changes in the network architecture as a result of pruning, a
number of different architecture types are examined here for exploratory
purposes. Using the same number of neurons within the network as in the model
shown in Figure 22, 3 different network architecture types are explored: a 54-6-4-2-1
network that funnels the connections through the hidden layers, a 54-4-4-4-1
network, analogous to the one shown in Figure 22, used for validation and
benchmarking, and a reverse funnel 54-2-4-6-1 network that imposes a bottleneck
within the first hidden layer but allows the network to grow slightly throughout
the successive hidden layers. Each of these network types was developed as a
trained (no pruning) and a pruned variant, and replicated 100 times with 10000
iterations and 100 retrain cycles (the Optimal Brain Surgeon pruning algorithm shows
remarkable performance and does not actually require many
retraining cycles) for validation purposes, producing a total of 600 models as a
result. Figure 23 shows one of the most common architectures resulting from
pruning the 54-4-4-4-1 network, which was optimised from 67 units with 252
connections down to 64 units with only 67 connections. It is clear that the
network effectively eliminated certain neurons altogether by pruning their
connections.
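How a neuron disappears once its connections are pruned can be illustrated with a small Python sketch. It assumes that a unit counts as remaining when at least one surviving connection touches it; how the unit counts reported here were obtained is not stated, so this rule is an assumption, and the toy network and labels are hypothetical.

```python
def active_units(connections):
    """Units touched by at least one surviving connection.

    `connections` is a list of (source, target) unit labels.
    """
    units = set()
    for source, target in connections:
        units.add(source)
        units.add(target)
    return units

# Hypothetical 3-2-1 network: fully connected, then pruned so that
# input "x3" and hidden unit "h2" lose all of their connections
full = [(i, h) for i in ("x1", "x2", "x3") for h in ("h1", "h2")]
full += [("h1", "y"), ("h2", "y")]
pruned = [("x1", "h1"), ("x2", "h1"), ("h1", "y")]

print(len(active_units(full)), len(full))
print(len(active_units(pruned)), len(pruned))
```

In the toy example the fully connected network has 6 units and 8 connections, while the pruned one retains 4 units and 3 connections: no unit is deleted directly, but a unit with no remaining connections no longer takes part in the computation.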
Figure 23. Connectionist network 54-4-4-4-1 architecture using consumer data with 3 hidden layers and 4 neurons each, 10000 iterations, pruning with 100 retrain cycles.
Due to the nature of the data employed here with a large number of input
neurons, the initial 54-6-4-2-1 network architecture contains
substantially more connections (between the input and first hidden layers) – as a
result, the common network architecture type obtained here was optimised from
67 units with 358 connections down to 67 units with only 78 connections as
shown in Figure 24.
Figure 24. Connectionist network 54-6-4-2-1 architecture using consumer data with 3 hidden layers and 6-4-2 hidden neurons, 10000 iterations, pruning with 100 retrain cycles.
Figure 25. Connectionist network 54-2-4-6-1 architecture using consumer data with 3 hidden layers and 2-4-6 hidden neurons, 10000 iterations, pruning with 100 retrain cycles.
Models with initial 54-2-4-6-1 network architecture design contain 67 units and
146 connections, and can be pruned down to 63 units with only 25 connections,
as shown in Figure 25. Here, the bottlenecked network did not show the capacity
to sufficiently develop within the subsequent hidden layers, normally leaving a
number of hidden neurons unused.
These results support the premise that even smaller neural network models
should be able to extract highly complex patterns from the consumer decision-making
data and represent them with a visually straightforward network architecture,
while carrying a substantial amount of predictive and explanatory
capacity. If this is indeed the case, a simplified version of the network
architecture with a few hidden nodes should serve as a great explanatory model
of consumer behaviour – say a network with a 54-2-1 architecture as discussed in
the following section.
5.6.4 Assessing explanatory capacity of a connectionist
model with a single hidden layer and 2 neurons
The simple approach would be to include only 2 hidden nodes – this way, the
model is able to account for nonlinear relations within the data, and all possible
interactive combinations of input nodes would be accounted for and linked to the
output node through the 2 hidden nodes. It then should be possible to argue that
the 2 hidden nodes in this simple connectionist architecture represent utilitarian
reinforcement and informational reinforcement following the BPM framework.
Figure 26 shows this 54-2-1 network, where only 2 hidden nodes are used with
1000 iterations – a very simple, straightforward network that took only 5.69 sec to
run and uses 57 nodes (54 of which are input nodes as can be seen in Figure 26)
and 110 connections.
Figure 26. Connectionist network 54-2-1 architecture using consumer data with a single hidden layer and 2 neurons, 1000 iterations, no pruning.
Figure 27. Connectionist network 54-2-1 architecture using consumer data with a single hidden layer and 2 neurons, 1000 iterations, pruning with 100 retrain cycles.
Once pruning is introduced in the network shown in Figure 27 with 100 retraining
cycles, it takes 20.04 sec to run and the pruning algorithm removes all but 21
connections. This network is very straightforward and easy to
interpret – any one of the input nodes can be traced to one of the hidden nodes,
and on to the output – not only visually, but also quantitatively, as every connection
carries a connection weight. This should allow the
architecture to be examined in detail and the variable contribution level to be
assessed. The interpretation of results should be very transparent as well, as all
connections are represented visually.
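One simple way to trace an input through the hidden nodes to the output quantitatively is to multiply the weights along each path and sum over the hidden nodes, often called the connection weights approach. The sketch below illustrates this under stated assumptions: it is not claimed to be the variable contribution method applied later in the thesis, and the variable names and weight values are invented for illustration.

```python
def weight_products(input_hidden, hidden_output):
    """For each input, sum over hidden nodes of
    (input-to-hidden weight) x (hidden-to-output weight).

    A pruned connection is represented by a weight of 0.0.
    """
    contributions = {}
    for name, weights in input_hidden.items():
        contributions[name] = sum(w * v for w, v in zip(weights, hidden_output))
    return contributions

# Hypothetical pruned network with 3 inputs and 2 hidden nodes (all values invented)
input_hidden = {
    "price": [-1.5, 0.0],      # connected to hidden node 1 only
    "brand": [0.0, 2.0],       # connected to hidden node 2 only
    "pack_size": [0.5, 0.5],   # connected to both hidden nodes
}
hidden_output = [2.0, -1.0]
print(weight_products(input_hidden, hidden_output))
```

In a pruned 54-2-1 network most inputs reach the output through a single hidden node, so each such product is just one path; an input connected to both hidden nodes has two paths that may reinforce or offset each other. The products ignore the nonlinear activation of the hidden nodes and are therefore best read as a signed summary of direction and relative strength rather than an exact effect size.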
However, would it allow a sufficiently robust network to develop before it is
pruned? Previous research projects carried out by the author indicate that a
connectionist network with a single hidden layer provides substantially improved
modelling capacity over traditionally employed methods such as logit, being able
to extract nonlinear patterns from the data (Greene, 2011). It is important to
keep in mind that this relatively simple network architecture is nevertheless
extremely powerful and capable of solving complex tasks in an efficient and effective
manner, as shown above.
Figure 28. Connectionist network 54-2-1 architecture using consumer data with a single hidden layer and 2 neurons, 10 000 iterations, pruning with 100 retrain cycles.
There are however a number of possible ways to increase the model complexity
while at the same time maintaining the underlying aim of keeping the
network architecture open to interpretation. One way is to increase the number
of iterations and retraining cycles, allowing the network to learn everything there
is to learn within the given network architecture with only 2 hidden nodes.
The network shown in Figure 28 is identical to the one in Figure 27 with the exception
of increasing the number of iterations from 1000 to 10 000 – it takes only 42.07
sec to run and the pruning algorithm removes all but 15 connections,
making it even easier to study and use the network architecture to explain the
relations within the data.
5.7 Connectionist model predictive capacity
The original comparison of a regression model with the neural network model in
section 5.3 can now be revisited and supplemented with the assessment of
predictive capacity that a neural network with hidden layers and pruning is able
to offer. A previous research programme (Greene, 2011) focused on a comparative
assessment of a traditional method of analysis represented by
logistic regression, which is very common in the marketing and consumer behaviour
literature, with a connectionist model in the form of a feed-forward neural
network with a single hidden layer; the neural network model showed
predictive capacity superior to that of the logistic regression model. For that
reason, the emphasis here is on assessing the predictive capacity of various
connectionist architectures only, focusing on pruning capacity. Logistic regression
is only able to show a level of performance comparable to the simplest network
architecture with no hidden layers as shown in section 5.3, and is therefore not
considered here. Using the same dataset as in the analyses carried out in section
5.6, a number of neural network architectures are assessed in terms of predictive
capacity and model fitness.
The 54-4-4-4-1 connectionist network architecture is assessed first, examining
unpruned and pruned variants. Limited to 1000 iterations, network models without
pruning take as little as 54 sec to run, and produce 252 connections between
67 computational units. Once the pruning algorithm is introduced, the models take
substantially more time to run – one of the lowest at 660 sec, with most running
to more than 1000 sec. As a result, however, networks with as few as 29
connections are developed, making it considerably easier to examine and interpret
the network architecture and connections. The RMSE figures for networks with and
without pruning are comparable: networks without pruning show RMSE figures as low
as 0.79, while models with pruning show RMSE figures as low as 0.82, which is
closely comparable considering that only 1000 iterations were used to develop
the connectionist models.
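For reference, the RMSE figures quoted above can be read against the standard definition of root mean squared error. The following is a minimal Python sketch of that definition, assuming the usual unnormalised form; the function name `rmse` and the toy values are illustrative only and are not taken from the software used in this research.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical toy values, for illustration only (not thesis data).
observed = [1.0, 0.0, 1.0, 1.0]
predicted = [0.8, 0.3, 0.6, 0.9]
print(round(rmse(observed, predicted), 4))
```

Lower values indicate a closer fit, so the small gaps between pruned and unpruned figures reported here correspond to small differences in average prediction error.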
When connectionist networks with the 54-6-4-2-1 architecture are examined in a
similar manner, the observed results are similar to those of the 54-4-4-4-1
networks. Networks without pruning and 1000 iterations take at least 58 sec to
run, and produce 358 connections between 67 units. The 54-6-4-2-1 architecture
allows more connections between the input layer and the first hidden layer,
which here contains 6 rather than the 4 units of the 54-4-4-4-1 architecture, and
therefore provides a higher total number of connections. Once pruning is
introduced, it takes 1676 sec to produce a network with substantially simplified
architecture (certain model runs take even less time, but fail to reduce the
network architecture complexity to a similar level, and thus are omitted here).
As a result, it is possible to reduce the network architecture to as few as 47
connections amongst 62 computational units, which again provides a
substantially simplified network for subsequent examination and interpretation.
RMSE figures for unpruned 54-6-4-2-1 networks can be as low as 0.83, whereas
models with pruning show RMSE figures that can reach 0.87 – again, quite
comparable performance, especially considering that only 1000 iterations were
used to run the models.
When looking at bottleneck networks with the 54-2-4-6-1 architecture, models that
do not use pruning take 46 sec to run with 1000 iterations, and produce 146
connections between 67 units. The number of connections is substantially lower
than in the 54-6-4-2-1 and 54-4-4-4-1 network architectures – again, it is the
reduced number of connections between the input layer, which contains the
majority of neurons with our data, and the first hidden layer, which now
contains only 2 units, that gives a lower total number of connections.
Introducing pruning increases model run time to 311 sec, and reduces the number
of connections to as few as 9, pruning out most hidden layers as a result.
Occasionally the model would take substantially less time to run, but it would
then fail to reduce the architecture complexity to its full potential – generally
speaking, with models that use only 1000 iterations, the longer a model takes to
run, the simpler the resulting network architecture, suggesting that pruning
requires a certain computational effort to be carried out properly. Networks
without pruning show RMSE figures starting at 0.90, whereas networks with pruning
show RMSE figures as low as 0.95.
Figure 29. Connectionist network 54-2-1 architecture using consumer data with a single hidden layer and 2 neurons, 10 000 iterations, no pruning.
Figure 30. Connectionist network 54-2-1 architecture using consumer data with a single hidden layer and 2 neurons, 10 000 iterations, pruning with 100 retrain cycles.
Considering the substantially simplified network architecture with the 54-2-1
design suggested in the previous section, it takes as little as 38 sec for a
network without pruning to go through 1000 iterations, producing 110 connections
between 57 units. Pruning the 54-2-1 network takes 175 sec to run and can
eliminate the majority of connections, leaving fewer than 10. An RMSE figure of
0.85 can be observed for models without pruning, and RMSE figures of 0.88 are
possible when pruning is carried out.
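The unpruned connection and unit totals quoted for the four architectures follow directly from the layer sizes. The short Python sketch below reproduces that arithmetic, assuming fully connected feed-forward layers with bias terms left out of the count; the function name `fully_connected_counts` is introduced here for illustration only.

```python
def fully_connected_counts(architecture):
    """Return (connections, units) for a fully connected feed-forward network.

    `architecture` lists the layer sizes from input to output, e.g. (54, 2, 1).
    Bias terms are not counted; this is an assumption made for the sketch.
    """
    # Each pair of adjacent layers contributes (size of one) x (size of next).
    connections = sum(a * b for a, b in zip(architecture, architecture[1:]))
    units = sum(architecture)
    return connections, units

for arch in [(54, 4, 4, 4, 1), (54, 6, 4, 2, 1), (54, 2, 4, 6, 1), (54, 2, 1)]:
    connections, units = fully_connected_counts(arch)
    print("-".join(map(str, arch)), connections, units)
```

Because the input layer holds 54 of the units, the size of the first hidden layer dominates the total, which is why the bottleneck designs start from far fewer connections.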
When the number of iterations is increased to 10 000, the results over multiple
model runs are more consistent than with 1000-iteration models, but do not seem
to show better RMSE figures. Increasing the number of iterations to 100 000 does
improve the RMSE figures. It also becomes apparent that the difference in the
time it takes to run models without and with pruning is reduced with more
iterations, suggesting that pruning more complex models with a higher number of
iterations takes only marginally more time, which is a very positive feature in
terms of scaling model size.
It should then be apparent that pruning is able to reduce the number of
connections and the network architecture substantially, while maintaining a
comparable level of predictive capacity. The ability to expose the core
architecture of the data after all possible interactions were explored during
model training would greatly simplify the task of exposing relations within the
data that can be used not only for interpretation and explanation (as can be seen
by comparing Figure 29 and Figure 30), but also for more transparent predictive
modelling.
5.8 BPM connectionist model
This brings us to the main question: would it be possible to train a
connectionist network that provides sufficient predictive capacity, and then use
the connection weights and hidden nodes as a distributed representation of
informational and utilitarian reinforcement to develop an explanatory model
sufficient for interpretation of the decision-making process as
proposed by the theoretical framework of BPM? In addition, would pruning the
network architecture reduce the model complexity and as a result provide a
clearer explanatory framework while offering a comparable level of predictive
capacity? In the following section, this is explored in detail.
5.9 Summary
In this chapter, results of the statistical testing methods employed as part of this
research project were presented. First, results of preliminary data manipulations
and exploratory analysis were described. Then regression and connectionist
models were compared to establish a connection between the two methods of
analysis on a computational level. This was followed by a discussion of the results
from connectionist models of varying complexity. Finally, the capacity of pruning
algorithms to optimise the network architecture and expose the predictive core
was assessed in an attempt to provide a plausible account of how informational
and utilitarian reinforcement are represented within the hidden layers as part
of an emergent distributed network architecture.
In the following chapter, results are discussed and interpreted.
6. Interpretation of results
In this chapter, the results described above are discussed and interpreted
within the wider context of the research questions posed here, and of the
field of consumer behaviour in general.
6.1 Informational and Utilitarian Reinforcement
For the sake of clarity, before proceeding with the discussion of results, it is
useful to restate how utilitarian and informational reinforcement are defined
and operationalised here within the connectionist frame of inquiry, and to
summarise how they have been examined in conjunction with connectionist
modelling as part of the research programme to date. Prior to this research
project, the theoretical
approach was substantially different from the course identified here, as
informational and utilitarian reinforcement were previously introduced into the
neural network model in the form of additional input variables (Greene, 2011),
and even though results demonstrated that doing so significantly improved
model performance and corroborated previous findings (Foxall, Yan, Oliveira-
Castro, & Wells, 2013; Yan, Foxall, & Doyle, 2012a, 2012b), subsequently it was
identified and proposed that an entirely different modelling approach was
required to explore the concept of utilitarian and informational reinforcement
using connectionist networks (Greene, 2011). It was speculated that one reason
for such improvement could be the process of assigning informational and
utilitarian reinforcement values to each brand available for selection, as it
effectively captured, in a quantitative manner, some of the information on the
consumer decision-making setting and learning history that might otherwise be
lost in the process of transforming the brand-level data for statistical
analyses. In the present research project, however, informational and
utilitarian reinforcement are not added at the input variable level as before,
but rather are examined as part of the artificial neural network learning
process at the level of connection weights. It is argued that informational and
utilitarian reinforcement are formed during the model learning process following
the principle of distributed representation, and thus are emergent entities
within the hidden layers of the connectionist model.
6.2 Results discussion
Consumer behaviour modelling that employs NNs yields a substantial amount of
information open to interpretation and further analysis. In the following
sections, these essential aspects are discussed in detail.
6.2.1 Exploratory data examination
It is important to note a few points regarding the data employed here: the
synthetic datasets developed to test the modelling capacity, and the actual
consumer data that contains transactional purchasing information, household
descriptor database, and product attribute database.
Considering the vast amount of data available in the wine subset of the consumer
dataset that was obtained from the Kantar World Panel, it is quite reasonable to
expect that it would be sufficiently robust for the purposes of this research,
namely to develop and test consumer models with a sufficient level of predictive
and explanatory capacity. Indeed, this was the case in previous research (Greene,
2011), where only consumer data was employed for comparative purposes to
assess the adequacy of traditional and connectionist approaches to modelling and
predicting the consumer situation and consumer decision-making faculties. This
time, however, the nonlinear connectionist models of higher complexity with
multiple hidden layers developed here required new software solutions to be
developed in parallel with the research process, and these very algorithms
themselves needed to be assessed at the outset, before they could be employed to
examine and model consumer data. Thus, additional synthetic datasets were
devised, which allow the relations in the synthesised data to be defined by
design. Relations in the consumer data, on the other hand, are not defined in
advance and thus are not suitable for assessing the adequacy and performance of
algorithms and software solutions – it is in fact the overarching purpose of
this very research project to identify and extract the patterns in the consumer
data, which could then be useful to explain consumer behaviour and the
purchasing decision-making process.
6.2.2 Results of regression and NNs comparative analysis
It is of course a widespread misconception that NNs models do not provide
sufficient explanatory capacity because by design they lack computational and
processing transparency, and thus are not able to offer researchers a glimpse
into the model development process. On this view, the lack of transparency
limits the capacity of the researcher to interpret and explain the results thus
obtained, but it is commonly assumed that researchers may accept this limitation
because the model offers predictive superiority as a sufficient trade-off. It is,
however, not the inherent design of the model that makes it difficult to
interpret the results, but rather the intricate nature of the relations within
the data that the NNs model aims to unravel, which human researchers may
struggle to comprehend and explain in simple terms. In fact, this is the very
reason why researchers employ nonlinear parallel modelling in the first place –
to decompose complex phenomena, reduce the dimensionality of the problem, and
make it easier to comprehend and interpret.
Moreover, it is argued here that connectionist models are not only adequate to
explain the relations between variables within the data as compared to
traditional regression models, but are superior in both predictive and
explanatory capacity when the complexity of the research problem and data is
high, thus refuting the black box misconception commonly attributed to
connectionist modelling. One major reason for this argument is of course the
inherent capability of connectionist models to provide a comprehensive account
of nonlinear interactions between the variables in the data, something where
traditional methods could have serious capacity limitations – in particular an
issue with scalability when large datasets with many variables are involved, as
the number of interactions to consider increases exponentially with the number
of variables. Connectionist networks, on the other hand, are able to deal with
all interactions within the data as part of a learning process, while the
network architecture is developed to represent the patterns within the data
rather than being pre-specified by a researcher as required with traditional
methods of analysis.
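The growth in the number of candidate interaction terms noted above can be made concrete with a short calculation. The Python sketch below counts pairwise, three-way, and all higher-order interaction terms for a given number of predictors; the figure of 54 is taken from the input layer size of the networks discussed here, and treating each input unit as one predictor is an assumption made purely for illustration.

```python
import math

def interaction_counts(n_variables):
    """Count candidate interaction terms among n_variables predictors."""
    pairwise = math.comb(n_variables, 2)
    three_way = math.comb(n_variables, 3)
    # Every subset of two or more variables is a possible interaction term,
    # so subtract the empty set and the n single-variable subsets from 2**n.
    all_orders = 2 ** n_variables - n_variables - 1
    return pairwise, three_way, all_orders

# 54 matches the number of input units in the architectures discussed here.
pairwise, three_way, all_orders = interaction_counts(54)
print(pairwise, three_way, all_orders)
```

Even the pairwise terms alone would need to be specified and tested by hand in a regression setting, whereas a network with hidden units can in principle approximate such combinations during training.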
To address these points, regression analysis as traditionally employed in the
marketing and consumer literature is compared with the NNs model of the simplest
network architecture. The problem type is transformed to a dichotomous one for
this particular task only – to place the distinctly different modelling
approaches on an even footing, as it is not the actual model performance that is
being compared at this point, but rather the ability of the connectionist model
to offer a level of performance equal to that of its corresponding traditional
model. As a result of a series of comparative assessments that involve multiple
random splits and replications of the procedure to increase the validity of
results, it is clear that the connectionist model with the simplest network
architecture, which includes no hidden layers, is able to perform identically to
a regression model – the method of choice in the traditional marketing and
consumer literature and fields of study.
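The equivalence between a network with no hidden layers and a logit model can be illustrated on a toy problem. The Python sketch below assumes a single logistic output unit trained by gradient descent on cross-entropy error, and compares it with a logit model fitted by Newton-Raphson maximum likelihood. The data, function names, learning rate, and iteration counts are all hypothetical choices for this illustration and do not reproduce the software configuration used in this research.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy data: one centred predictor and a binary outcome.
X = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]
Y = [0, 0, 1, 0, 1, 0, 1, 1]

def fit_network(x, y, lr=0.5, epochs=5000):
    """Input-to-output network (no hidden layer), gradient descent on cross-entropy."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        errors = [sigmoid(w * xi + b) - yi for xi, yi in zip(x, y)]
        w -= lr * sum(e * xi for e, xi in zip(errors, x)) / n
        b -= lr * sum(errors) / n
    return w, b

def fit_logit(x, y, steps=25):
    """Logit model fitted by Newton-Raphson maximum likelihood."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = [sigmoid(w * xi + b) for xi in x]
        # Gradient of the log-likelihood.
        gw = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        gb = sum(yi - pi for yi, pi in zip(y, p))
        # Entries of the 2x2 information matrix.
        v = [pi * (1.0 - pi) for pi in p]
        hww = sum(vi * xi * xi for vi, xi in zip(v, x))
        hwb = sum(vi * xi for vi, xi in zip(v, x))
        hbb = sum(v)
        det = hww * hbb - hwb * hwb
        # Newton step: solve the 2x2 system in closed form.
        w += (hbb * gw - hwb * gb) / det
        b += (hww * gb - hwb * gw) / det
    return w, b

w_net, b_net = fit_network(X, Y)
w_logit, b_logit = fit_logit(X, Y)
print(w_net, b_net)
print(w_logit, b_logit)
```

Under these assumptions both procedures minimise the same objective, so the connection weight and bias of the network converge to the slope and intercept of the logit model.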
Connectionist models with a simplified architecture with no hidden layers,
developed and examined in the first stage of the research project here, were able
to show performance levels equal to the performance of logit models: NNs models
that consisted solely of input and output nodes, and incorporated no hidden
nodes, were able to offer connection weight values identical to the coefficients
in logit models. No hidden layers of course means the simplified straightforward
NNs models were not able to account for any nonlinear relations within the data
– it can be said then that these connectionist models not only display
predictive performance levels equal to those of their logit model counterpart,
but also provide an equivalent level of explanatory capacity on the variable
contribution dimension. Thus, a clear link is established between the
connectionist architecture of NNs and the traditionally employed logit modelling
as a method of analysis while working with consumer data. This offers a solid
starting point from which the two methods of analysis can be shown to contrast
substantively both in underlying architecture and design, and in levels of
performance. The following paragraphs outline a number of important points to
support this.
The first important point to discuss is the required predetermined structure of
a logit model, and the lack thereof by design in connectionist models. The simple
models developed at this stage could of course be advanced and developed
further as required – both logit and NNs. A common way to improve a logit model
is to examine the variable contributions and pick the optimal set that includes
the variables that offer the best predictive capacity, while at the same time
introducing interaction variables to account for possible nonlinear relations
within the dataset. This task, however, is rather manual and tedious, and
requires not only computational resources but also human resources, as it may
take a researcher considerable effort to continuously test and assess
alternative models to select the optimal predictive set. Additionally, large
datasets with numerous variables to consider pose an even larger issue, as the
number of possible interaction combinations increases exponentially. As noted
elsewhere (Bishop, 1995), the fact that traditionally employed methods of
analysis require a predetermined model structure is an inherent weakness when
compared with connectionist models. Connectionist models, on the other hand, are
able to determine their own structure and network architecture following a
statistical method (as much as this term can be applied to artificial entities)
during the modelling process while extracting the patterns from the dataset.
Therefore, to increase the capacity and complexity of the NNs model, it is only a
matter of increasing the number of computational nodes and (hidden) layers –
the optimal architecture is then determined during the network learning process
and there is no need for the researcher to specify or define it beforehand.
Increasing the network complexity by introducing hidden layers gives the
network the capacity to account for nonlinearity, which considerably improves
the performance of the NNs model with complex consumer data.
Second, it is important to touch upon the theoretical implications that the
predetermined model architecture may entail. As discussed above, the variable
selection process to form the model structure is not only tedious and taxing, but
could also be studied as a research question in its own right (for example see
Greenland, 1989). The underlying theoretical or philosophical framework could
dictate the model structure as well. It is, however, common practice with
traditional methods of analysis such as logit to develop the model structure and
carry out the variable selection process relying on predictive capacity alone,
with the resulting structure used for all subsequent analyses – generally by
means of a goodness-of-fit criterion such as R-squared, or another purely
statistical measure. Depending on the underlying theoretical framework, this may
bring either positive or negative consequences. On the one hand, the
predetermined model structure may be used to control the exclusion or inclusion
of variables that may be of particular interest during the operationalisation of
research questions. The pre-programmed model structure could also make the
process of understanding and explaining the phenomenon somewhat more
straightforward. On the other hand, it is safe to assume that the predetermined
model structure would inevitably have an effect on the interpretation of the
results obtained this way. Thus, the positivistic aim for a researcher to be
largely removed from the subject matter could be said to be compromised – in
extreme cases rendering the results, and the meaning derived from them, nothing
but a statistical artefact (Harris & Hahn, 2011). The nature of the
predetermined model structure may play an even larger role depending on the type
of research – for example, research questions that aim to provide a descriptive
account of a process of the phenomenon would be greatly influenced if the model
structure is in fact determined or even selected by the researcher, rather than
developed to reflect the patterns in the data, potentially compromising the
overall objective. For such tasks, it should be more appropriate to employ a
method of analysis where the model structure and underlying architecture are a
product of analysis rather than an initial requirement – such as NNs models.
Third, the results described here pose a serious challenge to the black box
model claim. In marketing literature discussions and in the industry, it is
common to encounter arguments against connectionist models that revolve
around the difficulty of explaining and interpreting the modelling process: the
model manipulates the input data according to certain rules and provides an
output, but makes it difficult to examine and interpret the modelling process in
simple terms (Gevrey, Dimopoulos, & Lek, 2003; Olden & Jackson, 2002; Olden, Joy,
& Death, 2004). As discussed above, the empirical evidence offered here suggests
otherwise: if a simplified input-output connectionist model is capable of
generating results identical to those of a logit model, it is reasonable to
infer that the explanatory power that the simplified connectionist model carries
is at the very least comparable with the explanatory capacity attributable to a
logit model. The fact that the connection weights offered by the NNs model are
identical to the coefficient values of the logit model supports this argument,
and the explanatory capacity of NNs models could be considered to be on the same
level as that of a regression model. Complex connectionist models follow the
same theoretical assumptions and statistical rules as the simplified model
described above – it is these NNs models of higher complexity that are often
referred to as black box models, as complex patterns extracted from the data
become less obvious to researchers, and therefore less straightforward to
explain and interpret in simple terms. Nevertheless, this does not point to a
flaw of the model but rather to the boundaries of human understanding and the
limited ability to consider complex multi-dimensionality, and thus the black box
analogy is hardly suitable for connectionist models.
In summary, simpler problems do not require complex algorithms, and are easier
to explain and interpret. Simple algorithms are inadequate when applied to
complex phenomena however, which require complex algorithms and models –
but complex phenomena are difficult to explain and interpret no matter what
algorithms and models are employed, as it is the complex nature of the problem
that makes it difficult to comprehend, not the method employed. This is a
fundamental notion here, as all subsequent model development follows this
very same principle, adding the necessary complexity to the connectionist
network architecture as dictated by the increasing complexity of the research
questions, and the need to examine relations in the data and phenomena of
higher informational density.
6.2.3 Exploratory NNs modelling results and discussion
Now that it is established that simplified NNs models are able to perform on par
with logit models and provide an analogous level of explanatory power, the
connectionist models are developed further. Introducing additional hidden layers
between the model input and output provides the ability for the connectionist
model to account for nonlinear relations in the data. It should be safe to assume
that transactional data on consumer purchasing decisions in a particular
marketing setting is complex, and it is therefore expected to contain a
substantial number of nonlinear relations – models able to account for such
relations should provide a considerable advantage over linear models in both
predictive and explanatory capacity. The low adjusted R-squared described above
may indicate relatively weak logit model performance due to a number of factors.
There is of course always a possibility that the particular dataset does not in
fact carry any predictive capacity and is therefore not particularly useful in
answering specific research questions irrespective of the statistical methods
employed to analyse it. In fact, even though connectionist models do not require
a predefined model structure and architecture, researchers may nevertheless be
limited in what can be included, as the data collected may have followed a
particular philosophical position or framework – whether explicitly stated or
not. Data collected in such a way would inherently include only the variables
that accord with this philosophical position or framework, limiting the ability
of the connectionist model to extract patterns from the data from the outset.
Here, however, secondary data is used, and the number of variables included is
very large. Another possible reason for the relatively weak performance of the
logit model could be that the relations between dependent and independent
variables that are being described are in fact not linear. If so, an appropriate
method of analysis should be able to improve upon the results obtained with
logit models by being able to account for nonlinear relations within the
consumer dataset.
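The point that hidden layers are what allow a network to capture nonlinear relations can be illustrated with the classic exclusive-or (XOR) pattern. The Python sketch below uses threshold units and hand-set weights, both chosen for simplicity rather than taken from the models in this research: a 2-2-1 network reproduces XOR, while a search over a grid of weights finds no single unit without a hidden layer that can do so.

```python
from itertools import product

def step(z):
    """Threshold activation, used here instead of a sigmoid for simplicity."""
    return 1 if z > 0 else 0

XOR_CASES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def hidden_layer_net(x1, x2):
    """2-2-1 network with hand-set (hypothetical) weights."""
    h_or = step(x1 + x2 - 0.5)    # fires if at least one input is on
    h_and = step(x1 + x2 - 1.5)   # fires only if both inputs are on
    return step(h_or - h_and - 0.5)

def linear_unit_exists(grid):
    """Search a weight grid for a single threshold unit that reproduces XOR."""
    for w1, w2, b in product(grid, repeat=3):
        if all(step(w1 * x1 + w2 * x2 + b) == y for (x1, x2), y in XOR_CASES):
            return True
    return False

grid = [i / 2 for i in range(-8, 9)]  # -4.0 to 4.0 in steps of 0.5
print([hidden_layer_net(x1, x2) for (x1, x2), _ in XOR_CASES])
print(linear_unit_exists(grid))
```

The grid search is only a demonstration, but the result holds in general: XOR is not linearly separable, so no model restricted to a linear combination of the inputs, logit included, can represent it without an added interaction term.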
Previous research carried out by this author that examined the effect of network
complexity provided encouraging results – the focus at the time was on the
number of hidden neurons, and was limited to a single hidden layer (Greene,
2011). The dataset employed in the previous research programme was
sufficiently large (around 75 000 cases for a single product category) to ensure
that connectionist networks with many hidden neurons were not able to learn the
entire dataset and thereby attain nearly flawless predictive power. The effect of
model size on model performance was examined with networks starting from a
single hidden neuron and progressively increasing to 100 hidden neurons in total
(and even a few select models that used as many as 200 neurons within a single
hidden layer), and compared with the performance of a logit model (which of
course could not be developed further beyond the initial input variable
selection). The results showed gradual performance improvement as the
connectionist network size increased and more neurons were employed in the
network architecture, providing better means for the model to identify the
nonlinear patterns in the data. This suggested that connectionist models were
suitable for studying complex datasets, and consumer behaviour data in
particular.
It should be expected that model performance would flatten out at some point,
where increasing model size would not provide substantial improvement in
model capacity – this was not observed with the dataset employed in the previous
research project (Greene, 2011), and as the dataset employed here is
considerably larger and more complex still, it should be safe to assume this
limitation would not be an issue here either. It was then further discussed that
with sufficiently large datasets it would be beneficial to extend the scope of
the research project to explore performance with multiple hidden layers –
something that may easily be a substantial research project in itself. This was
of course for the most part an exercise to assess the relation between the size
of the connectionist network and predictive capacity; here, by contrast, it is
the explanatory capacity that the network architecture may be able to provide
that is the focus of the research questions, and thus the notion of assessing
network architecture performance with multiple hidden layers will be addressed
only partially and from an explanatory modelling point of view.
6.2.3.1 RSNNS and NeuralNetTools
It is worth remarking upon the importance of collaboration within the
scientific community in the field of social science, which not only builds upon
previous work and research findings, but also extends to wide-ranging efforts
in tools development. As should be obvious at this point, the necessary tools to
carry out the complex statistical and computational methods were absolutely
crucial to the investigation of the research questions posed here, and it is the
collaborative work of researchers from various fields that made it possible.
Moreover, as a result, the advanced tools and methods are now available to all
other researchers through a free platform and statistical environment.
6.3 Advanced connectionist models results and discussion
In the following sections, the investigative analyses that focused on comparing
connectionist network architectures of various sizes from both predictive and
explanatory points of view are discussed in detail.
6.3.1 Connectionist network predictive capacity
Previous work showed connectionist models to be vastly superior to traditionally
employed forecasting and predictive analysis methods such as logistic regression
(Greene, 2011). A number of analytical directions were discussed as part of the
previous research programme that are now able to contribute to the present
research, which builds upon those findings and focuses on the identified and
proposed areas of further development. Consumer data was employed at that
time, thus the findings should be applicable for the purposes of the present
discussion – even so, additional analyses were carried out here as described in
the previous chapter, addressing some of the areas of potential development
proposed in the previous research (Greene, 2011).
As identified and proposed in the earlier research programme, the nature of the
task revolves mainly around the concept of optimal architecture selection, and
could be recognised as twofold: on the one hand, a methodological procedure
to explore and assess the range of potential network architectures is required;
and on the other hand, a decision mechanism that could be employed in the
architecture selection process is needed. Considering the model generalisation
capacity and the primary function of the connectionist model, one of the common
approaches to determining the optimal network architecture is a deliberate
exploratory analysis where a limited number of suitable architectures are
explored and assessed. An example is the analyses carried out in previous
research that examined and compared the predictive capacity of the connectionist
network depending on the number of hidden neurons within a single hidden layer
(Greene, 2011), which have been extended here, as described in the sections
above, to examine the effect of the size and structure of a connectionist
network with multiple hidden layers on the overall predictive and explanatory
performance of the network. This approach, however, requires considerable
computational effort to explore a limited group of networks, which could present
a number of limitations as a result. Moreover, using broader inclusion criteria,
as employed here to study connectionist models with varying architecture
structures and multiple hidden layers, increases the likelihood of reaching the
computational limit (Bishop, 1995). In fact, many models described here have
been considerably restricted by limiting the network and sample size due to
computational limitations, which otherwise resulted in consistent crashes of
some of the algorithms employed here. Additionally, another obvious drawback of
this approach is the requirement to train a large number of networks with
varying architectures to compare and select the optimal architecture that
satisfies primary and secondary requirements, be that predictive or explanatory
capacity.
A considerably better approach that would be able to account for some of these
limitations at least to some extent is to remove computational neurons and
synaptic connections from a large complex network architecture in a systematic
manner, exposing the core network architecture while minimising the network
size – these methods are commonly referred to as pruning algorithms (Bishop,
1995). The pruning method employed here is the Optimal Brain Surgeon, as described
in the previous section and, in greater detail, elsewhere (Hassibi & Stork, 1993;
Hassibi, Stork, & Wolff, 1993).
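For reference, the two central quantities of the Optimal Brain Surgeon can be summarised as below. This is a compact restatement under the usual assumptions (the network has been trained to a local minimum of the error, and the error surface is approximated to second order); the symbols are introduced here for illustration only: w_q is the candidate weight, H is the Hessian of the training error with respect to the weights, and e_q is the unit vector that selects weight q.

```latex
% Saliency of weight q: the estimated increase in error if w_q is removed
L_q = \frac{w_q^{2}}{2\,[\mathbf{H}^{-1}]_{qq}}

% Adjustment applied to all remaining weights when w_q is removed
\delta\mathbf{w} = -\frac{w_q}{[\mathbf{H}^{-1}]_{qq}}\,\mathbf{H}^{-1}\mathbf{e}_q
```

The weight with the smallest saliency is removed first, and the remaining weights are adjusted by the second expression before the next candidate is considered.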
Alternatively, it is possible to go in a reverse order and start with a relatively
simple connectionist network only to sequentially expand its architecture by
gradually adding more computational neurons that would form new connections
and hidden layers – a group of methods commonly referred to as growing
algorithms (Bishop, 1995). Potentially this could be combined with the pruning
method employed here: growing the initial complex network architecture and
subsequently pruning it to expose the core structure that carries the best
predictive and explanatory function.
Another approach involves aggregating a number of networks to function
together as a single entity – this method is commonly referred to as a network
committee. It employs a divide and conquer strategy in which the response of
multiple constituent networks is combined to provide a superior single expert
response (Bishop, 1995) – particularly suitable to study phenomena that allow
themselves to be decomposed and subdivided into smaller isolated tasks to be
examined separately.
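As a minimal sketch of the committee idea, the simplest way to combine constituent networks is to average their outputs. The example below assumes that form of combination; the three member functions are hypothetical stand-ins for trained networks, and the name committee_predict is introduced here for illustration only.

```python
from statistics import mean


def committee_predict(members, x):
    """Average the outputs of several models for a single input vector."""
    return mean(member(x) for member in members)


# Hypothetical stand-ins for trained networks: each maps inputs to a score.
members = [
    lambda x: 0.2 * x[0] + 0.5 * x[1],
    lambda x: 0.3 * x[0] + 0.4 * x[1],
    lambda x: 0.1 * x[0] + 0.6 * x[1],
]

prediction = committee_predict(members, [1.0, 2.0])
```

A committee that assigns different sub-tasks to different members would replace the plain average with a weighting or gating step, which is not shown here.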
The number of computational units and the synaptic topology are able to exert a
considerable influence over the network performance, and these connectionist
network architecture optimisation methods would be expected to improve the
modelling process significantly and provide means to develop a network
architecture with superior predictive and explanatory abilities while using the
optimal network structure and size.
6.3.2 Variable contribution analysis and explanatory models
From the discussion above it should be clear that connectionist models offer
substantive predictive capacity and pattern recognition function. This research
project however is primarily concerned with the explanatory dimension
connectionist networks are able to offer.
The previous research programme offered a rather limited review, as the explanatory
dimension of connectionist models was largely out of scope of the project
(Greene, 2011), whereas here it is the primary focus. Variable contribution
discussion would make sense here – on the one hand, to address further the
argument that connectionist models are black box models, and on the other
hand, to explain the logic in undertaking the sequence of analytical account that
is to follow in the next sections. The black box argument is of course completely
unwarranted as already discussed to some extent earlier, and variable
contribution analysis is another good approach often employed to further refute
these unfounded claims. One such framework was proposed in the field of
ecological research (Gevrey et al., 2003; Olden & Jackson, 2002; Olden et al.,
2004) and this author previously proposed to evaluate its usefulness with
consumer behaviour data. Essentially, it can be contended that connectionist
models are able to provide a comparable level of explanatory capacity with the
traditionally employed methods such as regression analysis. Even more so, it can
be argued that it is in fact the connectionist methods that are able to provide a
robust account of behaviour when it comes to the explanatory dimension – on
the contrary, it is the traditionally employed methods such as regression analysis
that are not able to provide an adequate explanation of behaviour. In regressions
for example, normally the partial regression coefficients are assessed to interpret
variable contribution – this is only true for statistically significant variables
however, and even then very little is provided by the model besides the
coefficient value and the sign for the independent variables that signify the
direction of the relation, while no further information could be extracted.
Whereas the situation is quite the opposite with connectionist models – variable
contribution analysis algorithms could be employed to provide the explanatory
account and determine relative variable contribution and contribution profile of
the input factors. In addition to pruning algorithms which are discussed in great
detail in the following sections that improve both predictive and explanatory
capacity of the connectionist networks as part of learning process, there are also
variable contribution algorithms for connectionist models that focus on
explanation and interpretation only and aim to estimate the relative contribution
of independent variables and determine what relative contribution each input
variable is able to provide in relation to the output of the model.
Gevrey, Dimopoulos, and Lek (2003) examined 7 methods that potentially could
be useful to perform variable contribution analysis with consumer behaviour
data: (1) the PaD method calculates the partial output derivatives according to
the input variables (Dimopoulos, Chronopoulos, Chronopoulou-Sereli, & Lek,
1999); (2) the Weights method uses the connection weights (Garson, 1991); (3)
the Perturb method performs input variable perturbation (Scardi & Harding,
1999); (4) the Profile method is a successive variation of one input variable while
others are kept at a fixed value (Sovan Lek, Belaud, Baran, Dimopoulos, &
Delacoste, 1996; S Lek, Belaud, Dimopoulos, Lauga, & Moreau, 1995; Sovan Lek,
Delacoste, et al., 1996); (5) the classical stepwise method observes the change in
the mean square error value by sequentially adding or removing input neurons
(Maier & Dandy, 1996); (6) Improved stepwise - method a is the same as (5), but
the input eliminations occur while the network is trained (Gevrey et al., 2003);
and (7) Improved stepwise – method b evaluates the change in the mean square
error by sequentially setting input neurons to the mean value (please see Gevrey
et al., 2003). They used a multi-layer feed-forward network architecture to
compare the variable contribution analysis methods – as a result, all methods had
the ability to order the variables by importance of their contribution to the
output, but the PaD and Profile methods were also able to indicate the order of
contribution and the mode of action. The PaD method employs partial derivatives and
real dataset variable values and was identified as the most robust and coherent
computationally, followed by the Profile method where a representational matrix
of the data is constructed.
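The partial derivatives idea can be sketched as follows for a network with a single hidden layer. This is a minimal illustration and not the implementation used by Gevrey et al.: sigmoid activations are assumed for both layers, the 3-2-1 weights and the three observations are hypothetical, and the function names are introduced here for illustration only. The contribution of each input is taken as its share of the summed squared derivatives over the dataset.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_ih, b_h, w_ho, b_o):
    """One hidden layer with sigmoid hidden units and a sigmoid output unit."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_ih, b_h)]
    out = sigmoid(sum(w * h for w, h in zip(w_ho, hidden)) + b_o)
    return hidden, out


def pad_derivatives(x, w_ih, b_h, w_ho, b_o):
    """Partial derivative of the output with respect to each input at x."""
    hidden, out = forward(x, w_ih, b_h, w_ho, b_o)
    s_out = out * (1.0 - out)  # derivative of the output sigmoid
    return [s_out * sum(w_ho[h] * hidden[h] * (1.0 - hidden[h]) * w_ih[h][i]
                        for h in range(len(hidden)))
            for i in range(len(x))]


def pad_contributions(data, w_ih, b_h, w_ho, b_o):
    """Share of the summed squared derivatives attributed to each input."""
    ssd = [0.0] * len(data[0])
    for x in data:
        for i, d in enumerate(pad_derivatives(x, w_ih, b_h, w_ho, b_o)):
            ssd[i] += d * d
    total = sum(ssd)
    return [s / total for s in ssd]


# Hypothetical weights for a 3-2-1 network; w_ih[h][i] links input i to hidden h.
W_IH = [[0.8, -0.4, 0.1], [-0.3, 0.9, 0.05]]
B_H = [0.1, -0.2]
W_HO = [1.2, -0.7]
B_O = 0.05
DATA = [[0.2, 0.5, 0.9], [0.7, 0.1, 0.3], [0.4, 0.8, 0.6]]

shares = pad_contributions(DATA, W_IH, B_H, W_HO, B_O)
```

The sign of each derivative across the observations indicates the direction of the effect, while the shares rank the inputs by the magnitude of their influence.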
It should be safe to assume that ecological systems are complex enough to
provide data with complex relations, so the promising results that the techniques
surveyed by Gevrey and colleagues show should be applicable to consumer
behaviour data as well. Many additional factors should be taken into account of
course – the more obvious one is whether these variable contribution analysis
methods would be able to perform on a comparable level with much larger
sample sizes that contain a multitude of input variables. Relatively small sample
size and the 10-5-1 network structure used by Gevrey et al. is largely different
from the datasets used here, but nevertheless there are some important
lessons that can be extrapolated from the study as far as further
developments are concerned, and potentially this could be an important area to
consider in future work by applying the variable contribution analysis algorithms to a
pruned network architecture to assess the effectiveness and efficiency of the
algorithms, and whether they are able to contribute to the explanation of behaviour
after pruning the connectionist network. It is of some concern however that real
ecological data was used in the comparative study to assess the variable
contribution algorithms – similar concerns expressed elsewhere (Olden et al.,
2004) contend that true relations and variable order are not known in the
empirical dataset employed by Gevrey et al. and therefore there is no
straightforward way to assess the accuracy that each method is able to offer.
Some of the other methodological issues were identified with the research design
by Olden and colleagues (2004) which prompted Gevrey et al. to redesign the
original study and use synthetic simulated data instead to re-assess the variable
contribution analysis methods, adding an additional connection weights algorithm
(Olden & Jackson, 2002) at the same time. This time around, the connection weights
algorithm was identified as optimal, showing the best level of
performance with simulated data.
The variable contribution analysis could be a great area for further research to
explore elsewhere, but it is out of scope of the present research project. The concept
of using simulated data however is not, and it makes sense from the
research design perspective to incorporate simulated data for the initial
assessment of the pruning efficiency and effectiveness in an attempt to simplify the
network architecture for consecutive interpretative and explanatory account of
consumer behaviour and decision-making process. In the following sections, it is
the synthetic simulated data that is reviewed and evaluated first, which is then
followed by the discussions of the models that employ real consumer behaviour
data.
6.4 Interpreting connectionist model output
parameters and architecture
A number of ways may provide an insight into what happens inside the NNs
model and help interpret the result. Some of the most commonly used methods
assess how the number of hidden layers and nodes affects the predictive and
explanatory capacity of the model. A number of algorithms have been devised to
make use of the weight values from NNs model output. Model architecture
pruning techniques have also been shown to have a positive outcome in
developing models with improved out of sample testing faculties. In the following
sections, these methods will be briefly discussed and supported by the empirical
research.
6.4.1 Number of hidden layers and nodes
Generally, a larger model would have better resources at its disposal to analyse
extensive datasets and show better predictive capacity, but would also have
considerable limitations as discussed in the following paragraphs.
6.4.1.1 Model structure optimization
Once the models are developed it is imperative to have a look into the optimal
model structure. It is indeed true that the larger models would offer higher
predictive capacity and increase in the model fit, but at the same time, larger
models need to be penalized according to the Occam's razor principle. One
method to evaluate the model performance and select the optimal structure is
described by Huang, Chen, Hsu, Chen, and Wu (2004). Before carrying out the
analyses that would employ the neural network method for modelling, the
authors optimised the backpropagation models for multiple markets by
identifying the optimal input variable sets that included financial variables
following an approach that would resemble a process similar to that of a step-
wise regression model: once a simple initial model was constructed to represent
the financial markets, financial input variables were removed one at a time and
replaced with a remaining variable in attempt to examine the effect it would have
on the overall predictive capacity of the model. This process was repeated
numerous times until improvement could no longer be observed – the final
neural network architecture obtained in such a way was said to be optimal for
each of the financial markets. When these models were tested with 10-fold cross-
validation method, the prediction accuracy that these 2 models were able to
offer was estimated to be optimal as well, and these 2 fine-tuned models were
then used for all consecutive interpretative analyses (Huang et al., 2004). This
method eliminates the independent variables that carry the least predictive and
explanatory capacity and therefore can be excluded altogether from
consideration in the model or replaced with other potential input variables to
improve the overall predictive capacity. Thus, the model structure is simplified
and is therefore preferable – it is expected to show lower (that is, better) AIC values as
well, as the method described above penalizes model size to keep the connectionist
model architecture as simple as possible while at the same time striving to
maximise the overall model performance.
Even though apparently effective, it is difficult to estimate how effective it really
is, as it was tested using the empirical data where the relations between the
variables are not known, therefore making it practically impossible to assess the
method efficiency and effectiveness at simplifying the network architecture while
at the same time maintaining a comfortable level of predictive and, more
importantly for the purposes of the research project here, explanatory capacity.
Moreover, the process seems to be somewhat manual still, and potentially may
require substantial effort to sequentially and systematically test yet more and
more variables – the stopping mechanisms are not clearly defined either. It may
seem the researchers may be faced with a similar issue as with input variable
selection for the regression analysis, where in an attempt to achieve a high level of
validity and test a large number of variable combinations and interactions, the
truly robust process may prove to be excruciatingly taxing on both researcher
time and other resource allocation. Moreover, this approach of course does not
scale in any reasonable manner – as datasets get larger, the amount of effort
and resources required increases exponentially. In this sense, the elegant pruning
algorithms could be a considerably more favourable solution to optimise the
connectionist network architecture in attempt to streamline the interpretative
and explanatory functions of the connectionist model.
6.4.2 Interpreting model weights
A number of methods have been shown to be useful in interpreting the weight
values of connectionist models.
Variable contribution analysis methods have been examined and compared by
Gevrey, Dimopoulos and Lek (2003). One of the seven methods they surveyed
included a computation that used connection weights to provide explanatory
dimension to a NNs model using ecological data. First proposed by Garson (1991)
and later further investigated by Goh (1995), the procedure is set to determine
the relative importance of the inputs by partitioning the connection weights.
Essentially, the hidden-output connection weight of each hidden neuron is partitioned
into components associated with the input neurons (for further details see
Appendix A of Gevrey et al., 2003). The authors concluded that the method that uses
connection weights was able to provide a good classification of input parameters
even though it was found to lack stability.
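The weight partitioning can be sketched as below for a network with one hidden layer and a single output. This is a minimal illustration of the procedure as it is usually described, not a reproduction of any particular implementation; the 3-2-1 weights are hypothetical and the function name is introduced here for illustration only. Note that absolute values are used, so the direction of each effect is lost.

```python
def garson_importance(w_ih, w_ho):
    """Relative importance of each input from partitioned connection weights.

    w_ih[h][i] is the weight from input i to hidden neuron h;
    w_ho[h] is the weight from hidden neuron h to the single output.
    """
    n_inputs = len(w_ih[0])
    scores = [0.0] * n_inputs
    for row, w_out in zip(w_ih, w_ho):
        # Contribution of each input through this hidden neuron, sign ignored.
        products = [abs(w * w_out) for w in row]
        total = sum(products)
        for i, p in enumerate(products):
            scores[i] += p / total  # share of this hidden neuron held by input i
    grand_total = sum(scores)
    return [s / grand_total for s in scores]


# Hypothetical weights for a 3-2-1 network.
W_IH = [[0.8, -0.4, 0.1], [-0.3, 0.9, 0.05]]
W_HO = [1.2, -0.7]

importance = garson_importance(W_IH, W_HO)
```

The returned values sum to one, so they can be read directly as the proportion of influence attributed to each input.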
One of the concerns conveyed regarding the otherwise extensive investigation of
different methods was that the dataset originally employed in the 2003 study (Gevrey
et al.) was empirical, and therefore did not allow the factual precision
and accuracy of each method to be ascertained, as the true relations between the
variables are not known (Olden et al., 2004). Instead, an artificial dataset was created
using Monte Carlo simulation and employed to assess the true accuracy of each method
using a dataset with defined and therefore known relations. Results show that
the weights method that uses input-hidden and hidden-output connection weights
displayed consistently best results out of all methods assessed, contrary to
Gevrey et al. original findings (2003). Additionally, the weights method was able
to accurately identify the predictive importance ranking, whereas other methods
were only able to identify the first few if any at all (Olden et al., 2004).
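The connection weights calculation referred to here can be sketched as a sum of products of the raw input-hidden and hidden-output weights, which, unlike the partitioning approach, preserves the sign of each contribution. The example is a minimal illustration for a single-output network; the 3-2-1 weights are hypothetical and the function name is introduced here for illustration only.

```python
def connection_weight_importance(w_ih, w_ho):
    """Signed importance of each input: sum over hidden neurons of
    (input-hidden weight) x (hidden-output weight)."""
    n_inputs = len(w_ih[0])
    return [sum(row[i] * w_out for row, w_out in zip(w_ih, w_ho))
            for i in range(n_inputs)]


# Hypothetical weights for a 3-2-1 network; w_ih[h][i] links input i to hidden h.
W_IH = [[0.8, -0.4, 0.1], [-0.3, 0.9, 0.05]]
W_HO = [1.2, -0.7]

scores = connection_weight_importance(W_IH, W_HO)

# Rank inputs by the magnitude of their overall connection weight.
ranking = sorted(range(len(scores)), key=lambda i: -abs(scores[i]))
```

A positive score suggests an overall excitatory effect of the input on the output and a negative score an inhibitory one.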
Olden and Jackson (2002) also used ecological data to demonstrate the predictive
and explanatory power of NNs. A number of methods surveyed, including Neural
Interpretation Diagram, Garson’s algorithm, and sensitivity analysis, aid in
understanding the mechanics of NNs, and improve the explanatory power of the
models. Interpretation of statistical models is imperative for acquiring knowledge
about the causal relationships behind the phenomena studied. They also propose
a randomization approach to statistically evaluate the importance of connection
weights and the contribution of input variables in the neural network – a method
discussed in further detail in the sections above.
Nord and Jacobsson (1998) have also addressed the issue of explaining and
interpreting NNs structure and developed algorithms for variable contribution
analysis. The study compared the proposed novel algorithmic approach for NNs
model interpretation with the analogous variable contribution method of partial
least squares regression. Sensitivity analysis is also performed through setting
each input to zero in a sequential manner. Linear regression coefficients for each
of the input variables have also been generated for the purposes of examining
the variable contribution direction. The results of the two approaches are then
reviewed and compared with the results of the partial least squares regression.
What the study is able to reveal is that in the linear dataset both the partial least
squares regression and NNs models show similar performance in the variable
contribution task, whereas with the nonlinear dataset the difference in
performance is apparent (Nord & Jacobsson, 1998).
The recently increasing interest in variable contribution in NNs models is
understandable, as this information could be useful to develop the optimal NNs
model structure or to enhance model explanatory capacity. The methods
commonly used examine the connection weights of the NNs model, which are
used to interpret the model performance – analysis of the first order derivatives
of NNs model with respect to input units, hidden units, and weights. For a more
extensive discussion on measures of relative importance and relative strength of
inputs please see Garson (1991). It is also possible to use the connection weights
in attempt to extract the symbolic rules to interpret the models. Huang, Chen,
Hsu, Chen and Wu (2004) use Garson’s contribution measures to assess the
relative importance of the inputs in a three-layer backpropagation NNs model.
Even though Garson’s method emphasises connection weights between the
hidden and the output layers and does not consider the direction of the
influence, a comparative analysis revealed the method to improve understanding
of the financial process being modelled. The contribution analysis employing
Garson’s method identified the input variables contribution to the output
variables, which increased the understanding of financial input factors in the NNs
model (Huang et al., 2004).
Lek, Belaud, Baran, Dimopoulos, and Delacoste (1996) examined model response
to each of the variables. Functions derived by the NNs models during the learning
stages are very complex and pose a serious problem for each variable
contribution analysis. One of the ways to cope with such issues is to isolate a
complex phenomenon and separate it into smaller less complex phenomena to
be examined independently. Authors propose an experimental method to
examine the model response to each of the variables by applying typical
variations to a single separate variable while the other variables are held
constant. Using the environmental data, all but one of the variables were sequentially
set to their minimum, first quartile, median, third quartile, and maximum values
providing a response. The operation is then repeated for each of the variables,
performing it n times, where n is a total number of variables (Sovan Lek, Belaud,
et al., 1996).
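The procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the model is a hypothetical stand-in function rather than a trained network, the quantiles are obtained by simple linear interpolation, the number of grid steps is an arbitrary choice, and all names are introduced here for illustration only.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def profile(model, data, var, steps=12):
    """Response curves for one input while the other inputs are held at their
    minimum, first quartile, median, third quartile, and maximum in turn."""
    columns = [sorted(col) for col in zip(*data)]
    lo, hi = columns[var][0], columns[var][-1]
    grid = [lo + k * (hi - lo) / (steps - 1) for k in range(steps)]
    curves = []
    for q in (0.0, 0.25, 0.5, 0.75, 1.0):
        base = [quantile(col, q) for col in columns]
        curve = []
        for value in grid:
            x = list(base)
            x[var] = value  # only the studied variable moves
            curve.append(model(x))
        curves.append(curve)
    return grid, curves


def model(x):
    """Hypothetical stand-in for a trained network."""
    return 2.0 * x[0] - x[1] ** 2


DATA = [[0.0, 1.0], [1.0, 3.0], [2.0, 2.0], [3.0, 0.0], [4.0, 4.0]]

grid, curves = profile(model, DATA, var=0, steps=5)
```

Calling profile once per input variable yields the n sets of curves described above, which can then be plotted or summarised to show the contribution profile of each input.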
Relatively few studies are carried out with the aim of developing methods for
variable contribution analysis in NNs models in particular – perhaps at least in
part due to the seeming complexity of the task at hand. Song, Kong, and Yu (1988)
have developed a partial correlation index method that employs sequential
removal of variables one at a time. Results obtained under standardized training
conditions are used to estimate the relation between input and output variables.
Nord and Jacobsson (1998) proposed an alternative method based on the sequential
zeroing of weights. Andersson, Aberg, and Jacobsson (2000) examined two
methods to study variable contribution in NNs models: (1) a variable sensitivity
analysis and (2) method of systematic variation of variables. Variable sensitivity
analysis is based on setting the connection weights between the input and hidden
layer to zero in a sequential manner, whereas the systematic variation of
variables method is based on keeping the other variables constant or
manipulated simultaneously. In the course of the study, it is shown that there is a
high similarity between the method proposed by the authors for the variable
contribution analysis in NNs models and the nature of the processes used to
develop the synthetic datasets used. Thus, it is shown that the NNs models are
suitable not only for the function approximation in nonlinear datasets, but are
also able to accurately reflect the characteristic qualities of the input variables. As
a result, a transparency of highly interconnected NNs models could be
demonstrated in response to the ‘black box’ argument as well. The presented method
is then able to generate information about the variables that could be useful in
examination and interpretation of variable contribution and relations. Nord and
Jacobsson’s method (1998) mentioned above is based on the saliency estimation
principles (such as the Optimal Brain Surgeon) as it estimates the consequence of weight
deletion on prediction error. The difference with the method proposed by
Andersson, Aberg and Jacobsson (2000) is in the way estimation is carried out
(theoretical calculation in saliency estimation methods as opposed to
experimentally derived values offered by Andersson et al., 2000), and builds upon
the findings of Nord and Jacobsson (1998). In the course of analysis, a systematic
variable contribution analysis is carried out on a highly interconnected network
structure, including the signal separation exercise, employing a number of
synthetic and empirical datasets to provide additional information on the
methods considered, including the ability to graphically reveal the variable
interdependencies. Previous research is considered there as well that is based on
the principle of systematic variable variation and not the connection weights.
Information obtained in such a way could constitute an analytical basis for a
comprehensive variable contribution analysis and variable selection procedure
survey (Andersson et al., 2000).
6.4.3 Pruning connectionist models
Model architecture plays an important role in a model’s adaptive performance. The
type of a task closely related to the connectionist model pruning effort that
attracted larger interest in the literature as discussed above is variable selection
method, which is mainly concerned with the methodical improvement of the
connectionist model architecture by systematic reduction of the input variables
(Andersson et al., 2000). There are a number of notions that variable selection
methods could consider: for example a method that examines the connectionist
model weight values (Ametller, Garrido, Stimpfl-Abele, Talavera, & Yepes, 1996)
employing the variance and saliency measures. Other approaches considered
employing various other methods such as F-test, principal component analysis,
decision tree methods, connectionist weight evaluation methods (Cibas, Soulié,
Gallinari, & Raudys, 1996; Proriol, 1995), and optimal brain damage algorithm
(LeCun, Denker, Solla, Howard, & Jackel, 1989). Despagne and Massart (1998)
discussed variable selection methods; the different
approaches they reviewed include a modified variant of the Hinton diagram,
a saliency estimation method, and two other methods that provide a means to
estimate the extent to which the variance of the predicted response corresponds
with the variable contribution of each input. Similar to the method proposed by
Nord and Jacobsson (1998), both methods revolve around the notion of
cancelling variable contribution in the trained connectionist network by either
zeroing input variables (Despagne & Massart, 1998) or connection weights (Nord
& Jacobsson, 1998).
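The idea of cancelling a variable's contribution can be sketched for the input-zeroing variant as below. This is a minimal illustration and not the procedure of either cited study: the model is a hypothetical stand-in for a trained network, the inputs are assumed to be centred so that zero corresponds to the mean, and the function names are introduced here for illustration only.

```python
def mse(model, data, targets):
    """Mean squared error of the model over a dataset."""
    return sum((model(x) - t) ** 2 for x, t in zip(data, targets)) / len(data)


def zeroing_sensitivity(model, data, targets):
    """Increase in mean squared error when each input is zeroed in turn."""
    baseline = mse(model, data, targets)
    increases = []
    for i in range(len(data[0])):
        zeroed = [[0.0 if j == i else v for j, v in enumerate(x)] for x in data]
        increases.append(mse(model, zeroed, targets) - baseline)
    return increases


def model(x):
    """Hypothetical stand-in for a trained network."""
    return 3.0 * x[0] + 0.1 * x[1]


DATA = [[1.0, 1.0], [-1.0, 2.0], [2.0, -1.0], [-2.0, -2.0]]
TARGETS = [model(x) for x in DATA]  # the stand-in fits its own targets exactly

increases = zeroing_sensitivity(model, DATA, TARGETS)
```

The weight-zeroing variant follows the same pattern, except that the connection weights leaving an input neuron are set to zero inside the trained network instead of the input values themselves.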
While exploring how environmental conditions have an effect on fish population
to identify patterns that may be useful in future predictions, Olden and Jackson
(2001) compared traditional statistical approaches with NNs models. In the NNs
model structure, the connection weights between neurons are the associative
links that signify the relation between the input and output variables and
therefore are the key to solving the problem. Connection weights signify the
influence each input variable is able to exert on the output, and dictate the
direction of the influence. Input variables with large connective weights carry
higher signal transfer capacity and therefore affect the output variable to a
greater extent. Excitatory effect (incoming signal increased with positive output
effect) is represented by the positive connection weight and inhibitory effect
(incoming signal reduced with negative output) is represented by the negative
connection weight. In recent work, some research supports the notion that it is
possible to use the connection weights to interpret the input variable
contribution in the task of predicting the network output (Aoki & Komatsu, 1997;
Chen & Ware, 1999; Özesmi & Özesmi, 1999). Others used the connection
weights to quantify the variable contribution ranking (Garson, 1991), or employ
sensitivity analysis to examine the input variable contribution range (Guégan, Lek,
& Oberdorff, 1998; Sovan Lek, Belaud, et al., 1996; Mastrorillo, Lek, & Dauba,
1997; Mastrorillo, Lek, Dauba, & Belaud, 1997). Even if it is possible to assess the
overall contribution of input variables employing these approaches, the
interpretation of interactive relations within the data presents an increasingly
difficult undertaking, as the interactions between the variables in the network
require immediate examination. Even a small network would contain a large
number of connections, making the interpretation increasingly difficult: a 10 (input)
– 5 (hidden) – 1 (output) network would have 50 connection weights to examine
between the input and hidden layers. One way to manage this is through pruning
where connections with small weights that do not exert significant influence over
the network structure and output are removed (Bishop, 1995). Deciding which
weights to remove and keep however is a task that requires substantial effort.
Following the connectionist approach, Olden and Jackson (2001) were able to
develop a randomization test to address this task, which aims to randomise the
response variables to subsequently proceed with constructing a connectionist
network using this randomised dataset, at which stage all connection weights
between the input, hidden, and output connection nodes are recorded. This
process is replicated 10,000 times to ensure the estimated probability values are
stable and to obtain a null distribution for the input, hidden, and output nodes, which
is then compared to the observed values – this allows the
significance levels to be calculated, which serve as the basis for the objective pruning
test that allows elimination of the connection weights that exert a minimal influence on the
network’s overall output and performance, and as a result helps identify those
input variables that are able to provide the best predictive capacity contribution
to the overall connectionist network performance. In a similar manner to what was
carried out as part of the present research project, Olden and Jackson (2001)
considered varying levels of learning rate and parameters during the
connectionist network training stages to maximise the probability of global
convergence, and also considered varying numbers of training cycles to identify
the optimal level as far as network training and performance balances with the
resource allocation and training times. All input variables used to develop the
connectionist networks were standardised in the preliminary data manipulation
and exploratory analyses stages to avoid any possible occurrence of unnecessary
variances between the input and output variables due to the differing variable
scales. As a result, Olden and Jackson (2001) were able to provide a predictive
and explanatory insight into nonlinear complex relations of ecological data (a
task that poses a serious problem for traditional statistical approaches as species
often exhibit nonlinear response to environmental conditions). In the course of
detailed evaluation of NNs and traditional models it was shown that partitioning
the predictive performance of the model into measures such as sensitivity (ability
to predict the presence) and specificity (ability to predict the absence) allows for
a more efficient way to assess the model strengths, weaknesses, and applicability.
It is also shown that NNs are a useful approach for examining the interactive
effects and factors. Both empirical and simulated datasets were used for
comparative purposes, and show superior predictive performance of NNs models
over traditional regression approaches (Olden & Jackson, 2001). Building upon
the work described thus far, the approach that Olden and Jackson (2002) propose in
their following publication provides the facility to eliminate irrelevant
connections between neurons whose weights do not significantly influence the
network output (i.e. predicted response variable), thus facilitating the
interpretation of individual and interacting contributions of the input variables in
the network. The approach is able to identify variables that provide a significant
contribution to network predictive capacity, which effectively constitutes a NNs
variable selection method.
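The logic of the randomization test can be sketched as below. This is a minimal illustration under loud assumptions: in the actual procedure a connectionist network is retrained on every shuffled response and its connection weights are recorded, whereas here a cheap covariance score stands in for that step so that the example runs quickly. The data are synthetic, the number of permutations is far smaller than the 10,000 replications described above, and all names are introduced here for illustration only.

```python
import random


def randomization_p_values(fit_importance, inputs, response, n_perm=999, seed=0):
    """Two-sided permutation p-value for each input's importance score."""
    rng = random.Random(seed)
    observed = fit_importance(inputs, response)
    exceed = [0] * len(observed)
    shuffled = list(response)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real input-response relation
        null = fit_importance(inputs, shuffled)
        for i, (obs, rnd) in enumerate(zip(observed, null)):
            if abs(rnd) >= abs(obs):
                exceed[i] += 1
    return [(count + 1) / (n_perm + 1) for count in exceed]


def covariance_importance(inputs, response):
    """Stand-in for 'train a network and read off one score per input'."""
    n = len(response)
    mean_y = sum(response) / n
    scores = []
    for col in zip(*inputs):
        mean_x = sum(col) / n
        scores.append(sum((x - mean_x) * (y - mean_y)
                          for x, y in zip(col, response)) / n)
    return scores


# Synthetic data: the response depends on the first input only.
_rng = random.Random(1)
INPUTS = [[_rng.gauss(0, 1), _rng.gauss(0, 1)] for _ in range(60)]
RESPONSE = [2.0 * x[0] + _rng.gauss(0, 0.5) for x in INPUTS]

p_values = randomization_p_values(covariance_importance, INPUTS, RESPONSE)
```

Inputs, or individual connection weights, whose p-value exceeds the chosen significance level would be the candidates for removal.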
One aspect worth discussing however is the approach to identify the optimal
number of neurons to use within the hidden layer of the network architecture: it
was determined by Olden and Jackson (2001) following the empirical
investigation where the performance of connectionist networks of varying sizes
(ranging from 1 to 20 hidden neurons) was compared to identify and select the
one with the network architecture that offers best predictive capacity for the
overall connectionist model. This approach is similar to the research work carried
out previously by this author (Greene, 2011) that revolved around the extended
comparative study of network architecture size where a number of networks
were developed and consequently compared on the basis of connectionist model
predictive capacity, with numbers of neurons within the hidden layer ranging
from 1 to as many as 200. The results were rather promising and naturally
suggested that large model sizes are able to provide increasingly better predictive
capacity as compared with traditionally employed methods such as logistic
regression and systematically selected connectionist networks with simpler
network architectures. The process to carry out this type of a comparative study
was rather tedious and required not only a lot of time to program the coding for
the connectionist modelling, but also was very computationally demanding with
modelling process running for weeks non-stop. This approach of course readily
exposes a scalability issue, as with larger networks the training and
testing time would be expected to increase exponentially. Another concern is
methodological, as the models tested and compared are nevertheless selected by
the researcher – together with the scaling issue where connectionist networks
with complex model architecture that incorporates multiple hidden layers would
pose a serious obstacle that would be extremely difficult to circumvent, and
instead would most likely simply remain as an effective limitation of the
approach. Moreover, it is important to consider using simulated synthetic data
for the methodological testing to determine the optimal approach and
architecture as an additional research stage before the actual investigation of the
data is carried out to make sure there is no bias in approach selection that is an
artefact of the data itself, which is then used to study this very same data during
the main stage of the experimental research project.
Here the approach to evaluate the model capacity was developed and carried out
in the manner proposed, and simulated data was used to assess the
effectiveness and efficiency of the pruning method employed to simplify and
optimise the network architecture before developing and retraining connectionist
networks using the actual consumer data.
6.4.4 Pruning connectionist models: simulated data
To address the concerns expressed above, the connectionist models are compared
and evaluated using a simulated synthetic dataset before the consumer behaviour
dataset is examined, as part of the comparative analyses that aim to determine
the optimal network architecture. This should circumvent at least some of the
concerns raised in the previous paragraphs when critiquing the research design of
earlier studies.
The overall purpose of this testing and evaluation stage using simulated data is
twofold. On the one hand, it is essential to carry out a proper empirical analysis
to test and assess the performance of the new pruning capacity that was
developed as part of this research project in the statistical package RSNNS, which
is now available for any researcher to use through the statistical programming
language and environment R, before we employ these techniques with consumer
data here. On the other hand, since the structure and form of the simulated data
is notably less complex than the consumer behaviour dataset used in the
subsequent analyses, it makes sense to carry out some of the analyses on the
simulated data first. This should not affect the substance of the findings, since
these preliminary analyses deal for the most part with technical and systematic
aspects of research design, and the conclusions they offer should therefore be
general enough to apply to any connectionist model irrespective of network
architecture complexity or the datasets employed.
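As a minimal sketch of what such a simulated dataset might look like, the Python
snippet below generates cases in which the output depends on only two of the
inputs (an XOR-style relation, which a single hidden layer with two nodes can
represent), while the remaining inputs are pure noise. The function name, the XOR
rule, and the number of noise inputs are illustrative assumptions; they are not
the functions actually used in this research project.

```python
import random


def make_simulated_cases(n_cases, n_noise_inputs=4, seed=0):
    """Return (inputs, targets) where only the first two inputs matter.

    Hypothetical generator: the target is the XOR of the first two binary
    inputs; the remaining inputs are random noise that an effective pruning
    procedure would be expected to disconnect from the network.
    """
    rng = random.Random(seed)
    inputs, targets = [], []
    for _ in range(n_cases):
        a, b = rng.randint(0, 1), rng.randint(0, 1)
        noise = [rng.random() for _ in range(n_noise_inputs)]
        inputs.append([a, b] + noise)
        targets.append(a ^ b)
    return inputs, targets


# Because the generating rule is known, a pruned network can be checked
# against it: connections from the noise inputs should disappear.
X, y = make_simulated_cases(200)
```

Because the relation is defined in advance, any bias in the choice of method can
be detected on the simulated cases before the consumer data is examined.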
A number of functions were identified, as described in the results section above,
to test the effectiveness of the pruning functionality, and the tests were carried
out in a manner that would provide sufficient levels of validity and reliability
through replications and randomisations. Essentially, the test was constructed to
allow a connectionist network to develop an architecture that would be largely
excessive for the task at hand, as it could be expected that during the training
stages the network would tend to use all available computational neurons and
connections to build the best possible network within the constraints set,
irrespective of network architecture complexity or size. This would of course be
reflected in higher computational demand and longer network learning time. It
also makes it difficult to examine the variable contribution values to assess
which of the input variables carry the highest levels of predictive or
explanatory capacity in relation to the model output. A pruning algorithm, by
contrast, should account for all these factors and systematically force the
connectionist network to develop a concise, essential network architecture,
removing inessential connection weights in the process, which can even result in
isolating some of the computational nodes and bypassing some of the hidden layers
altogether. It is essential of course to maintain a sufficiently high level of
predictive and explanatory capacity, so that large proportions of modelling
ability are not sacrificed in a trade-off for the optimised structure. To monitor
this, the RMSE levels of all models were consistently recorded.
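For reference, the RMSE used as the benchmark can be computed as in the short
Python sketch below. The function is a generic illustration and is not taken from
the code used in this research project.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Recording this value for the same test cases before and after pruning shows
whether the simplified architecture has cost any predictive capacity.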
When the tests were carried out, the results were entirely as expected: the
connectionist network, which would otherwise be rather large and use all
available computational and architectural resources to maximise its capacity, was
substantially trimmed down by the pruning algorithm, which was very successful at
removing the inessential connection weights to optimise the network architecture,
all while the predictive and explanatory capacity of the network remained high
and uncompromised by the structure optimisation. Moreover, the pruning algorithm
was not only able to remove the inessential connection weights, it also
successfully nullified multiple hidden layers: essentially all superfluous
connectionist network architecture that was not essential for the task at hand
was pruned out to leave the bare-bones architecture required at the very minimum
to solve the problem.
6.4.4.1 Pruning connectionist models with Optimal Brain Surgeon algorithm
The test was of course designed merely to assess the implementation and method of
using the pruning algorithm in the statistical programming package RSNNS. It was
not designed to test the pruning algorithm itself, which would be well beyond the
scope of this research project (this would inevitably take the research towards
the field of machine learning, something that could potentially be explored in
collaboration as part of future work). As already mentioned above, the pruning
algorithm used throughout this research project is the Optimal Brain Surgeon
(Hassibi & Stork, 1993; Hassibi, Stork, Wolff, & Watanabe, 1994; Hassibi et al.,
1993). The very positive results with pruning and optimisation of the
connectionist model structure using the simulated data, as discussed above, could
be largely credited to the sophisticated and elegant design of this pruning
algorithm. Hassibi, Stork, and Wolff set out to investigate the use
of information available from the second-order derivatives of the error function
to prune the network architecture by removing inessential connection weights
from the trained connectionist model, in an attempt to optimise and simplify the
network to reduce the computational demands, reduce the training and retraining
time, and – more importantly – improve generalisation capacity and even further
develop the network's ability to extract patterns from the data. In the same
manner as contemplated here, Hassibi, Stork, and Wolff addressed a central
problem of pattern recognition and machine learning that revolves largely around
the notion of minimising system complexity, and that could often be seen as a
problem of regularisation in connectionist modelling: without an appropriate
mechanism to select the number of connection weights, neural network models could
either have too many weights and be prone to overfitting and poor generalisation,
or have too few and be unable to learn the dataset adequately. It is then common
to proceed
initially by training a sufficiently large connectionist network to a minimum
error, and then to eliminate the inessential weights in a systematic manner to
the point where the neural network architecture is optimal. This is where pruning
algorithms specify which connection weights are to be eliminated and how the
remaining weights are to be adjusted for best performance in the most
computationally efficient manner. Hassibi and Stork (1993) report that Optimal
Brain Damage (LeCun et al., 1989) and magnitude-based methods have a tendency to
eliminate the wrong weights, whereas Optimal Brain Surgeon avoided this in their
experiments and maintained performance after pruning. On this basis, the Optimal
Brain Surgeon algorithm is reported to be superior to magnitude-based methods and
Optimal Brain Damage, some of which were discussed and critiqued above: for the
same training set error, Optimal Brain Surgeon permits pruning of more connection
weights than the other methods, and thus produces better generalisation on test
data. The method is also reported not to require subsequent retraining, which is
typically a slow cycle after pruning is carried out with other algorithms.
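To make the mechanism concrete, the following Python sketch performs a single
Optimal Brain Surgeon step using the saliency L_q = w_q^2 / (2 [H^-1]_qq) and the
weight update dw = -(w_q / [H^-1]_qq) H^-1 e_q given by Hassibi and Stork (1993).
It is a toy illustration under stated assumptions: the Hessian is supplied
directly as a small matrix, the helper names are hypothetical, and the code is
not the RSNNS implementation used in this research project.

```python
def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def obs_step(weights, hessian):
    """Prune the weight with the smallest saliency and adjust the others.

    Returns (index of the pruned weight, its saliency, updated weights).
    """
    h_inv = invert(hessian)
    saliency = [w * w / (2.0 * h_inv[q][q]) for q, w in enumerate(weights)]
    q = min(range(len(weights)), key=lambda i: saliency[i])
    scale = weights[q] / h_inv[q][q]
    updated = [w - scale * h_inv[i][q] for i, w in enumerate(weights)]
    updated[q] = 0.0  # the pruned weight is exactly zero by construction
    return q, saliency[q], updated


# Two correlated weights: removing the second one shifts its contribution
# onto the first, so no separate retraining pass is needed in this toy case.
pruned_index, loss_increase, new_weights = obs_step(
    [1.0, 0.2], [[1.0, 0.9], [0.9, 1.0]])

# A large weight on a flat error direction is pruned before a small weight
# on a steep one, which is where magnitude-based pruning would differ.
flat_index, _, _ = obs_step([0.5, 1.0], [[10.0, 0.0], [0.0, 0.01]])
```

In the first example the remaining weight moves from 1.0 to approximately 1.18,
which illustrates how the algorithm compensates for a removed connection instead
of simply deleting it.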
Employing the Optimal Brain Surgeon pruning algorithm here with simulated data
offered very promising results and showed excellent performance in connectionist
network optimisation, improving the clarity of the network architecture, which
should facilitate the development of exploratory and interpretative accounts with
consumer behaviour data.
6.4.5 Pruning connectionist models: consumer data
Having ascertained that the code implemented in RSNNS to enable the pruning
facility performs as it should, and that the Optimal Brain Surgeon algorithm is
able to optimise the connectionist network architecture with simulated data, the
second stage is to develop the connectionist models using the consumer dataset.
Considering that the most commonly employed traditional approach in marketing
research is logistic regression, any type of connectionist network that
incorporates hidden layers could be considered a more advanced method: as already
discussed earlier, the simplest connectionist network with no hidden layers shows
a level of performance identical to that of a logistic regression. Introducing a
hidden layer within a connectionist model opens an entirely different level of
performance and capacity. The simulated data was modelled using a connectionist
network with multiple hidden layers, but with just a few neurons within each. As
was demonstrated, the type of data and problem required only a single hidden
layer with 2 nodes to solve, thus the rest of the network architecture was
superfluous and was therefore expected to be pruned out by the algorithm. The
pruning algorithm performed very well by removing all but the core network
architecture required to solve the problem; the same procedure was then attempted
with the actual consumer behaviour dataset rather than the simulated data.
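The equivalence between a network with no hidden layers and a logistic regression
can be seen directly from the forward pass. The Python sketch below assumes
sigmoid (logistic) activation units throughout; the function names and the
numerical values are hypothetical and serve only as an illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_regression_probability(x, coefficients, intercept):
    """Predicted probability of a logistic regression model."""
    return sigmoid(intercept + sum(c * v for c, v in zip(coefficients, x)))


def network_forward(x, layers):
    """Forward pass through a feed-forward network of sigmoid units.

    `layers` is a list of (weight_rows, biases) pairs, one pair per layer;
    each weight row holds the incoming weights of one unit.
    """
    activations = list(x)
    for weight_rows, biases in layers:
        activations = [sigmoid(b + sum(w * a for w, a in zip(row, activations)))
                       for row, b in zip(weight_rows, biases)]
    return activations


x = [1.0, 0.0, 2.0]                      # hypothetical input case
coefficients, intercept = [0.5, -1.0, 0.25], -0.3

# With no hidden layer the only "layer" is the output unit itself.
no_hidden = [([coefficients], [intercept])]
p_network = network_forward(x, no_hidden)[0]
p_logit = logistic_regression_probability(x, coefficients, intercept)

# Adding a hidden layer (here 3-2-1) inserts a nonlinear transformation
# between inputs and output that logistic regression cannot express.
with_hidden = [([[0.4, -0.2, 0.1], [-0.3, 0.8, 0.5]], [0.0, 0.1]),
               ([[1.2, -0.7]], [0.05])]
p_hidden = network_forward(x, with_hidden)[0]
```

With identical weights the two functions return the same probability, so any
difference in performance can only arise once hidden units are introduced.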
6.4.5.1 Optimal network architecture size
The consumer dataset from Kantar World Panel is of course a lot more complex and
includes a number of variables that operationalise transactional, household, and
product attributes to describe the purchasing situation and decision-making
environment in a comprehensive manner. After the data was normalised and dummy
coded, the variables selected for the modelling stages that followed were
represented by 54 input variables. In the traditional marketing literature it is
perfectly acceptable, and indeed common practice, to study the relations within
consumer behaviour data with logistic regressions. This method has no capacity to
account for the complex relationships that hidden layers are able to capture, and
even its available capacity to explore interaction variables is rarely used,
because developing and testing such models is exceptionally tedious and poorly
scalable. It should therefore be safe to assume that a connectionist network with
a single hidden layer and only a few hidden neurons provides a substantially more
advanced model architecture. Thus, a 54-2-1 network architecture should be
considered to possess a sufficient level of complexity to warrant the use of the
connectionist framework to explore the aspects of informational and utilitarian
reinforcement as emergent properties within a hidden layer. Before this claim can
be argued, however, it is important to establish that a 54-2-1 architecture, or
an architecture of a similar level of complexity, is indeed sufficient to provide
the level of model functionality necessary to satisfy the minimum conditions
required for the emergent properties of informational and utilitarian
reinforcement to occur.
One of the major critiques of the traditionally employed methods of analysis
commonly used to study complex social phenomena such as the act and process of
consumer decision-making, as argued above, is of course the necessity to specify
the model architecture and framework by selecting the input variables for the
modelling, or, just as importantly, by choosing not to select certain variables
or not to produce interaction variables. Optimally, all possible variables and
interactions of all levels and combinations are considered and analysed, and
those that do not carry predictive and explanatory capacity are omitted. This is
something that traditionally employed methods of analysis such as logistic
regression cannot do, whereas connectionist models such as neural networks do so
as an inseparable part of the computation algorithm. For that reason, building
upon the previous research work carried out by the author and following the
research design of others as discussed above, the next experiment was devised as
follows: a sufficiently large network architecture is developed to make sure it
can train and learn the relations within the data freely, and is subsequently
pruned to optimise and expose only the core essential network architecture,
removing all inessential connections. On the assumption that a 54-2-1 network is
sufficiently complex to allow the examination of the emergent properties of
informational and utilitarian reinforcement, the 54-8-8-8-1 network architecture
was first examined to see if it would provide sufficiently large and excessive
levels of complexity. It did, and was if anything too excessive: the 54-8-8-8-1
network generates 568 connections between 79 neurons. When pruning is introduced
with a few hundred retraining cycles, the model is optimised down to only 66
connections from the original 568, suggesting that the 54-8-8-8-1 network
architecture could be trimmed down considerably for all subsequent analyses. In
summary, however, it is important not to overlook the very successful application
of the optimisation algorithm, which is able to provide a substantial
simplification of the network architecture; connectionist methods of training and
subsequently pruning the network appear to be a particularly fitting approach to
develop the model of consumer behaviour.
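The connection and neuron counts quoted for a fully connected architecture follow
directly from the layer sizes. The short Python sketch below reproduces them; it
assumes that bias terms are not counted as connections, which is consistent with
the figures reported here, and the function names are illustrative.

```python
def count_connections(layer_sizes):
    """Connections in a fully connected feed-forward network (biases excluded)."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


def count_neurons(layer_sizes):
    """Total number of units, including input and output units."""
    return sum(layer_sizes)


for sizes in ([54, 2, 1], [54, 8, 8, 8, 1]):
    label = "-".join(str(s) for s in sizes)
    print(label, count_neurons(sizes), "neurons,",
          count_connections(sizes), "connections")
```

For the 54-8-8-8-1 design this gives 54×8 + 8×8 + 8×8 + 8×1 = 568 connections
between 79 neurons, as stated above.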
6.4.5.2 Pruning different types of connectionist architectures
In the next set of experimental exercises, a few examples of various network
architectures are explored. A simpler network architecture (excessively robust
nevertheless) was selected, comprising a more manageable number of 12 hidden
neurons distributed among 3 hidden layers. This was done for reasons of
simplicity and to optimise the modelling time required, given three
considerations: the lesson learned in the previous set of experiments that a
leaner network architecture would still provide a sufficiently complex structure;
the fact that each neural network model takes considerable real time to
calculate, even on a high-powered machine with one of the best CPUs available on
the mass market at the time; and the fact that these models were replicated
hundreds of times to improve the reliability and validity of the experimental
work.
Interestingly enough, one of the first connectionist models with a 54-4-4-4-1
network architecture was optimised by the pruning algorithm down to a model
structure of 54-2-1-1-1 neurons, effectively supporting the initial argument that
a 54-2-1 network structure is in fact the core architecture required to model the
relations within the data that represent consumer purchasing decision-making.
Even though this demonstrates that at least in some cases pruning naturally
reduces the first hidden layer to 2 neurons, this was not the most common final
optimised network architecture observed amongst the hundreds of retested
connectionist models. In fact, with a starting 54-4-4-4-1 network architecture,
the pruning algorithm was most likely to remove only a few neurons out of the 12
initially available: often a single neuron in one or multiple hidden layers, if
any neurons were removed at all. In every case, however, the pruning algorithm
was able to remove a substantial number of connections, irrespective of whether
this also resulted in pruning hidden neurons: while a fully connected 54-4-4-4-1
network contains 67 neurons and 252 connections, the pruning algorithm was able
to optimise the network architecture down to a much more manageable number of
connections, with as few as 67 connections remaining in some pruned connectionist
networks.
As it would seem that the shape of the network architecture exerts a certain
level of influence, it makes sense to test a few different shapes to examine what
effect this has on the performance of the connectionist model. Thus, in addition
to the 54-4-4-4-1 network architecture, 2 more types are examined: a funnel-type
network architecture with a 54-6-4-2-1 design, which could essentially represent
a more complex version of the 54-2-1 design; and a reverse version of the funnel,
with a design of 54-2-4-6-1, that compresses the connections into 2 nodes
initially and then allows the network to grow again. First, consider the network
architecture with the 54-6-4-2-1 funnel-type design: even though the number of
hidden neurons was the same as in the 54-4-4-4-1 design, the number of
connections in the initial network before pruning was substantially higher at 358
(as opposed to 252 in the 54-4-4-4-1 network), because the many input units were
now connected to a total of 6 neurons within the first hidden layer rather than 4
neurons, as was the case with the models of the previous 54-4-4-4-1 design. This
time, starting with a much larger initial network architecture, the pruning
algorithm was able to trim it down to 78 connections in the best-case scenario.
Even though the absolute number of connections that remained after pruning with
the 54-6-4-2-1 network architecture was higher than with the 54-4-4-4-1 network
architecture, the pruning efficiency improved substantially. The network
architecture that followed the reverse-funnel design of 54-2-4-6-1 naturally
produced a lower number of connections initially, at 146 in total, due to having
only 2 hidden neurons in the first hidden layer to which the input neurons could
connect. The pruning algorithm, however, was able to eliminate proportionally
more connections than with the other 2 architecture designs, leaving only 25
connections in the best-case scenario. This means that both network architecture
types, 54-6-4-2-1 and 54-2-4-6-1, were able to achieve higher pruning efficiency
than the original 54-4-4-4-1 network architecture design. It became apparent,
however, that network architectures with a funnel-type design almost never
removed neurons entirely during the pruning stage, while network architectures
with a reverse-funnel design, once compressed to only 2 neurons in the first
hidden layer, almost never used all the neurons in the second and third hidden
layers, usually pruning out around half of them entirely. It is therefore
proposed here that, for exploratory and interpretative purposes, the optimal
connectionist network architecture design would be of a funnel type: either a
relatively simple 54-2-1 design, which provides the best possible level of
clarity and should be easiest to use for explanatory and interpretative purposes;
or something more complex such as 54-4-2-1 or even 54-6-4-2-1 as examined here,
to develop a more advanced connectionist model of the consumer decision-making
process that would be able to extract a higher number of patterns and
microfeatures from the data, but would of course be more difficult to interpret.
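The comparison of pruning efficiency across the three designs can be made
explicit with a little arithmetic. The Python sketch below treats pruning
efficiency as the share of initial connections removed in the best case; this
definition is an assumption made for illustration, and the figure of 78 reported
for the funnel design is read here as a count of remaining connections.

```python
# (initial connections, connections remaining in the best case), as reported
reported = {
    "54-4-4-4-1": (252, 67),
    "54-6-4-2-1": (358, 78),
    "54-2-4-6-1": (146, 25),
}


def share_removed(initial, remaining):
    """Proportion of the initial connections eliminated by pruning."""
    return (initial - remaining) / initial


efficiency = {name: share_removed(*counts) for name, counts in reported.items()}

for name, share in sorted(efficiency.items(), key=lambda item: item[1]):
    print(f"{name}: {share:.1%} of connections removed")
```

On this measure the 54-4-4-4-1 design removes roughly 73% of its connections, the
funnel design roughly 78%, and the reverse-funnel design roughly 83%, which
matches the ordering described above.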
6.4.5.3 Adaptive model learning strategy
It is important to note here that the models with the 54-6-4-2-1 initial
architecture design systematically did not prune out any neurons, unlike the
models with the other 2 initial architecture designs, even though the proportion
of connections removed was comparatively high across all initial architecture
designs. This may suggest that, depending on the resources available within the
network architecture, connectionist models are able to adopt different learning
strategies, and are therefore able to prioritise the identification and
extraction of different patterns and microfeatures as a result. Network
architecture design is therefore likely to play an important role in future
research, and would perhaps be an interesting research topic in its own right.
This is similar to the results reported elsewhere (for example Cleeremans et al.,
1989), and may be a very promising line of inquiry within the dimension of
artificial learning, with possible implications for the field of artificial
intelligence in general.
6.4.6 Concise explanatory connectionist model
It should be apparent that it is a particularly challenging task to identify a
straightforward way to explain and interpret consumer behaviour. On the one hand,
simpler models such as the logistic regressions traditionally employed in the
marketing literature are easier to interpret, yet they are arguably not robust
enough to capture the complexity of behaviour. In other words, the explanation
may be easy because the model offers little that needs to be explained. On the
other hand, sophisticated models are robust enough to capture the complex
relations within the data and extract the patterns that may offer means to
explain behaviour, but at the same time they are not easy to interpret. To some
extent this is because we tend to employ decomposition as a method to simplify
complex phenomena and make them easier to comprehend, which would not be an
option here because this very complexity is what makes the models robust in the
first place. As a compromise, it would seem to be a good option to use a
connectionist model with a single hidden layer that contains only a few hidden
neurons. This way the relations between the input neurons and the hidden neurons
are clear and quantified with connection weights, and the hidden neurons could be
interpreted as emergent properties that represent an intricate combination of all
the relevant microfeatures extracted from the input variables. They can therefore
be treated in a similar manner to the concepts of utilitarian and informational
reinforcement, as proposed here following the connectionist method of analysis
and interpretation. In actuality, these concepts are most likely too complex to
make a clear identification possible, but this is probably as close as it would
get to a robust model of consumer behaviour. Once the theory is sufficiently
developed to provide a plausible interpretative account using these emergent
properties located within the hidden layer, it may be possible to consider
connectionist models of higher complexity with multiple hidden layers to explore
a more complex mechanism of extracting the microfeatures and patterns from the
data.
6.4.7 Predictive connectionist model and pruning
It should be clear that pruning connectionist models which are primarily
developed for explanatory purposes offers a range of benefits, such as a
simplified architecture and exposed core relations within the data, that can
reduce the complexity of interpretation. When it comes to predictive capacity,
however, the answer is not entirely straightforward, as was shown above using the
consumer dataset: on the one hand, connectionist models with pruning show lower
RMSE figures than models without pruning; on the other hand, the RMSE figures
achieved by connectionist models with pruning are only slightly lower and are
very much comparable to the figures achievable by connectionist models that do
not employ pruning algorithms. It must be noted, however, that it is the
explanatory capacity of connectionist modelling which is of primary importance in
this research project, and predictive capacity is used as a secondary
characteristic to benchmark the performance level of the modelling. Thus, it is
safe to argue that connectionist modelling with pruning algorithms is able to
simplify the network architecture substantially, optimising it for explanatory
and interpretative purposes, while maintaining a level of predictive performance
comparable to that of connectionist models that do not employ pruning algorithms.
Still, if predictive capacity were the primary objective, it should be possible
to explore to what extent pruning algorithms can be optimised with the aim of
improving the overall predictive capacity of the model, perhaps holding
explanatory capacity as a secondary measure to provide some sort of constraint
that keeps the modelling relevant for a particular behaviour and context; or
indeed to employ an artificial synthesised dataset where relations are known and
defined to assess predictive capacity to the fullest extent. This, however, would
have to be explored elsewhere. Thus, for a balanced explanatory capacity with a
relatively high predictive performance, a connectionist network that employs
pruning over a simplified 54-2-1 network could be identified as an optimal model
to produce an interpretative account of consumer behaviour in a given context.
6.4.8 Pruning network architecture for interpretation
Even though it is not within the scope of this research project to go as far as
developing specific algorithms that adapt pruning for interpretative purposes, it
is useful to illustrate with an example how helpful pruning can be when
attempting to provide an interpretative account of behaviour. For illustration
purposes, two types of network architecture are developed and interpreted
following on from the discussions above: a large connectionist network with 3
hidden layers that does not employ pruning, and the same connectionist network
that takes full advantage of pruning algorithms.
Using the same data subset as above for Wales and West, which contains 13787
cases, a fully connected 54-4-4-4-1 network is developed. To ensure the learning
procedure is sufficient to explore the dataset, 100 000 iterations are used here
to develop the models. Training the fully connected 54-4-4-4-1 network takes 4643
sec when pruning is not involved, and achieves an RMSE of 0.90. Yet, in a fully
connected network with a total of 252 connections (as shown in Figure 31), it is
difficult to identify any patterns or draw any conclusions by examining the
network architecture. Indeed, additional statistical analyses and possibly even
adaptive algorithms would be required to examine variable contribution in an
efficient manner before it could be used for interpretation of consumer
behaviour.
Figure 31. Connectionist network 54-4-4-4-1 architecture using consumer data with 3 hidden layers and 4 neurons each, 100 000 iterations, no pruning.
When pruning algorithms are employed, the situation is quite the opposite: it
takes 5355 sec to achieve a comparable RMSE of 0.95, and the result is a
substantially optimised connectionist architecture that now has only 51
connections. As shown in Figure 32, the pruning algorithm was able to effectively
remove 2 entire layers which ultimately were not necessary to model the data and
illustrate the pattern within the data used here.
Figure 32. Connectionist network 54-4-4-4-1 architecture using consumer data with 3 hidden layers and 4 neurons each, 100 000 iterations, pruning with 100 retrain cycles.
Upon closer examination, it can be clearly seen that the connectionist network
that employed pruning was able to establish 3 emergent features during the
learning process, which are represented by the 3 computational units within the
first hidden layer: 2 excitatory units and 1 inhibitory unit. Moreover, it is
possible to examine which input units contribute towards each of the hidden
units, which helps to explain what the emergent properties represented by the
hidden units may signify, and in what manner. For example, as shown in Figure 32,
input units for private label wines have excitatory connections with hidden unit
H1 and an inhibitory connection with hidden unit H3, the input unit for other
country of origin (normally cheaper wines) also has an inhibitory connection with
H3, and the input unit for age has an excitatory connection with H3. This may
suggest that hidden unit H3 could be interpreted to represent a type of
Informational Reinforcement; whereas hidden unit H1, which has excitatory
connections with, for example, the input units for not in work (employment
status), social class C1, and certain types of wine that offer additional
benefits such as sparkling or fortified wine, may very well be interpreted to
represent a type of Utilitarian Reinforcement. In addition to the illustration of
the inhibitory or excitatory connections, the connection weights for every
connection are also readily available for a more precise examination, and can
also be used with advanced variable contribution analysis algorithms developed to
take advantage of pruned connectionist network output.
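Reading a pruned network in this way amounts to grouping the surviving
connections of each hidden unit by the sign of their weights. The Python sketch
below illustrates the idea. The weight magnitudes are invented for illustration
and only their signs follow the description of Figure 32 given above; the
variable and function names are likewise hypothetical.

```python
# Hypothetical surviving weights after pruning: (input unit, hidden unit) -> weight.
# Signs follow the text; the magnitudes are made up for this example.
pruned_weights = {
    ("private_label", "H1"): 0.8,
    ("private_label", "H3"): -0.6,
    ("other_country_of_origin", "H3"): -0.4,
    ("age", "H3"): 0.5,
}


def describe_hidden_unit(weights, hidden_unit):
    """Split the inputs feeding one hidden unit into excitatory and inhibitory."""
    excitatory = sorted(inp for (inp, hid), w in weights.items()
                        if hid == hidden_unit and w > 0)
    inhibitory = sorted(inp for (inp, hid), w in weights.items()
                        if hid == hidden_unit and w < 0)
    return {"excitatory": excitatory, "inhibitory": inhibitory}


h3_profile = describe_hidden_unit(pruned_weights, "H3")
print(h3_profile)
```

With only a few dozen connections left after pruning, such a listing remains
short enough to be inspected by hand for each hidden unit.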
It is apparent here that 3 hidden layers are not in fact necessary for this data,
since pruning effectively nullified 2 hidden layers entirely. This corroborates
one of the previous recommendations that even a simplified neural network with a
single hidden layer (for example with a 54-2-1 network architecture) could be an
extremely powerful method of computational analysis, due to its inherent capacity
to examine all possible interactions within the data as part of the learning
process.
This effectively illustrates the ability of the connectionist network to identify
the relevant microfeatures within the data while going through thousands of
iterations and exploring any possible interactions between variables. These
microfeatures can then be said to emerge as a representation of higher-order
faculties that represent elements of consumer psychology used in the
decision-making process, or could perhaps even be interpreted as a proxy for
rule-governed behaviour established as a result of thousands of iterations, which
in turn can be used as a proxy for consumer experiences.
6.5 Theoretical implications
Once convinced that evidence presented above provides a substantiated
argument that connectionist models offer extensive predictive capacity in
modelling consumer behaviour, it is important to consider what level of
explanation and interpretation they are able to provide, and discuss the
theoretical implications as well.
It should be apparent at this stage that it is no easy task to produce a truly
comprehensive, robust, and interpretable explanatory model of consumer
decision-making behaviour. On the one hand, traditionally employed methods such
as logistic regression are easier to interpret, but the very simplicity that
makes interpretation easier is also an inherent limitation: the simpler model is
unable to capture the truly complex relations within the data. On the other hand,
connectionist models are inherently designed to explore all possible
combinations and interactions within the data, and through many training
iterations they learn the data by extracting the complex relations. This
process makes it possible to produce robust and comprehensive models capable
of accurately capturing the complex relations within the data, but at the same
time this very complexity makes it difficult to interpret the results obtained in
such a manner. There is simply no straightforward way to describe a complex
phenomenon such as the consumer decision-making process in simple terms without
inevitably losing part of the explanation, meaning, or interpretation – simpler
terms would ultimately only refer to incomplete and simpler concepts.
That said, it does not follow that no method to do so will emerge in the
future. The work carried out here is largely theory developing: a novel
approach is proposed to examine and extend the well-defined and
established concept of BPM with a connectionist approach. It is argued, and
supported with empirical work, that connectionist models are well suited to
extend the theoretical framework of BPM by providing empirical evidence for
some of its claims and propositions, and by proposing structures and processes to
continue developing the field further. Indeed, what the author asks for is a shift in
the level of understanding: from the traditional view, where input variables that
are selected to operationalise a certain phenomenon are expected to link directly
into the output variable possibly with a few interactions along the way; to a
connectionist fully linked view where input variables are decomposed by the
modelling process into microfeatures, which are in turn combined and
reassembled in the best possible manner during the training process to form
complex patterns that describe relations within the data. These emergent
patterns when extracted from the data can then be called the underlying factors
that explain the behaviour – this is where concepts like informational and
utilitarian reinforcement can be expected to reside. In this sense, informational
and utilitarian reinforcement are theoretical placeholders: provisionally named
entities that make it possible to develop an explanatory account of consumer
behaviour. In practice it may not be entirely possible to separate the two, as
they seemingly tend to form a uniform info-utilitarian reinforcement in most
observed cases. These patterns may very well describe the utilitarian and
informational reinforcement to some sufficient extent, but will also most likely be
something a lot more than that, something that is difficult to describe or name
because it is a concept that is difficult to comprehend for a human mind: too
many variables and interactions are considered simultaneously, and there is no
other analogous simpler concept that could help the comprehension. Thus,
connectionism is able to provide a theoretical framework and structure, and
develop an empirical highly predictive model, but it does not provide any feasible
way to decompose the phenomenon into smaller pieces that may be easier to
comprehend – simply because this very complexity and the interconnected
essence is what we are attempting to explain and interpret.
6.6 Summary
In this chapter, the research findings were discussed and interpreted. First, the
concept of informational and utilitarian reinforcement was revisited in
connectionist terms, followed by a discussion of results from the exploratory
modelling and comparative tests of regression and neural networks. Next, the
predictive capacity of connectionist models and the research around variable
contribution analysis were reviewed. Finally, the advanced connectionist models
were developed and discussed, concluding with a discussion of the
theoretical implications that identified certain areas for future research.
7. Critical assessment of the research project
In this chapter, the precision, thoroughness, and contribution of the method
employed here will be concisely discussed, and the connectionist approach will
be critically reviewed and compared with its closest rival: cognitive science.
7.1 Connectionism and cognitive science
Connectionism is increasingly encountered in attempts to model mental or
behavioural phenomena as the emergent processes of interconnected networks of
simple units in many scientific fields, such as artificial intelligence,
cognitive psychology, cognitive science, neuroscience, and philosophy of mind.
Nevertheless, connectionism cannot be considered a discipline, but rather a set
of consistent approaches that span these many fields. Connectionism emerged as
an approach that integrates the symbolic school of thought and the behaviourist
school of thought, each of which carries its own scientific framework and
philosophy. When applied within the context of a specific field, each school of
thought provides a paradigm that establishes goals to guide research, and
provides its own set of
assumptions and techniques. Cognitive science proved to be a great collaborative
accomplishment over the years while applying the symbolic approach across
many fields of inquiry, whereas connectionism is in a position to question the
very foundation of cognitive science and its comprising disciplines as
connectionist models provide a plausible alternative explanation to symbolic
models.
Towards the end of the 1970s, symbolic systems became the method of choice in
cognitive science and its two central disciplines, cognitive psychology and
artificial intelligence, as behaviourism seemingly became dated. Soon after,
however, it became apparent that systems solely reliant on symbolic
representations and operational rules possessed a number of fundamental
limitations: their rigid, inflexible structures tend to be fragile and offer an
inadequate means of modelling the process of learning or pattern recognition. This
served as an opportunity to reintroduce the network models developed years
earlier, which rely on sub-symbolic interpretations and connectionist
networks composed of a large number of interconnected computational units,
emphasising distributed representation and statistical modelling. Symbolic
systems have nonetheless also been progressively developed to exhibit a
reasonable level of flexibility and resilience, learning ability, and subtlety.
In contrast to connectionist models, however, where statistical methods are
employed to extract patterns from the data, symbolic models have maintained the
use of ordered strings of symbols.
It is beyond question that connectionist models are now an essential part
of cognitive psychology and artificial intelligence, and even more so of
engineering and other related fields. This has become particularly apparent in
recent years, as advances in technology relax the limits on
computational ability and offer the additional capacity that large connectionist
models may require. Attempts to reconcile the symbolic and connectionist
models over the years inadvertently cultivated an environment receptive to the
idea of hybrid models based on a synthesised framework that
incorporates elements of both connectionist and symbolic systems.
Nevertheless, it will still take some time before the hybrid systems
approach can be considered universally accepted. Some additional steps are
required, such as the point argued here that the behaviourist approach
could benefit from incorporating elements of intentionality to form
intentional behaviourism. The willingness to modify one’s research
framework depends on a number of factors that differ markedly between
the two disciplines, such as the overall purpose of modelling.
7.1.1 Artificial intelligence
The primary goal of artificial intelligence is to develop an algorithm whose
level of performance could be said to be intelligent –
ultimately an artificial general intelligence that would be capable of
successfully performing any intellectual task that a human being can perform. In
fact, considerable efforts are now increasingly directed at simultaneous
development and containment of artificial general intelligence, as it is
hypothesised that genuine artificial general intelligence would be capable of
recursive self-redesign, resulting in an event of intelligence explosion where
intelligence growth would be exponential, quickly exceeding intellectual capacity
of any human, and eventually exceeding combined intelligence of all humans.
Because it is hypothesised that the capacity of this superintelligence may be
incomprehensible for a human-level intelligence, the point beyond which events
may become unpredictable to human intelligence is commonly referred to as
technological singularity. On the one hand, an optimistic outcome is possible
where superintelligence would create a utopia for all humans using its superior
capacity to solve all world problems; on the other hand however, an opposite
outcome would be an event of global human extinction – hence the efforts to
contain the superintelligence, and even prevent any chance of its emergence
altogether.
Not all research in the field of artificial intelligence is devoted to modelling,
however, and it is essential to understand that contemporary artificial
intelligence is not only a product of the intellectual tradition rooted in the
interdisciplinary nature of philosophy, cognitive science, and psychology, but
also a fundamental constituent of it. For example, artificial intelligence
contemplates and contributes considerably to central themes such as the
nature of intelligence and knowledge, the theoretical framework of knowledge
representation, and whether certain models can be considered artificial
or rather a simulation of the human cognitive process. Pondering these and
similar questions is an essential part of the artificial intelligence discipline.
Furthermore, it should be possible to view computational algorithms as scientific
experiments in the traditional sense, where an algorithm is developed and run, and
researchers examine the results to subsequently redesign the algorithm and
re-run the experiment in order to determine whether the algorithm can be
considered an adequate representation of intelligent behaviour.
Newell and Simon (1976) argued that any system capable of general intelligence
would prove upon analysis to be a physical symbol system of sufficient size, which
can be further developed to exhibit general intelligence comparable to the extent
of general human intelligence appropriate for any real context and adaptive to
the environment within the reasonable limits of processing speed and
complexity. In subsequent years, researchers in cognitive science and artificial
intelligence explored the research field delineated by this hypothesis, adopting a
number of essential methodological commitments: (1) symbols and systems of
symbols are used to develop a descriptive account of the phenomena; (2) search
mechanisms are designed to explore the inferences that symbol systems could
potentially support; (3) it is assumed that a properly designed symbol system
would be able to provide a complete causal account of intelligence on its own,
effectively removing the need for a cognitive architecture; and finally (4) in
attempting to explain intelligence by developing working models of it, the field of
artificial intelligence could be considered empirical and constructivist. In the same
manner as they are used in natural language where it is understood that symbols
refer to or reference something other than themselves, the use of symbols in
artificial intelligence is extended to represent the reference to all forms of
knowledge, intention, and causality within the environment and context of an
intelligent entity – working on the premise that symbols and their semantics can
be implanted in formal systems in a constructive manner, introducing the notion
of a representation language. To model intelligence as a computer algorithm, it is
essential to be able to formalise a symbolic system: formal systems allow the
assessment of such issues as complexity and comprehensiveness, as well as
deliberation on the structural organisation of knowledge and complex semantic
relationships. The issue of the grounding of meaning, however, has been seen as a
hindrance by both advocates and opponents of symbolic systems in artificial
intelligence and the cognitive sciences. It asks how symbols can have meaning,
and whether traditional artificial intelligence systems that operate on the
principle of linking one set of symbols to some other set of symbols would
actually have any ability to interpret these symbols in a meaningful manner in the
absence of the supporting semantics that are normally available to humans from a
social context. As a result, the methodology of traditional artificial intelligence
systems focused on exploring pre-interpreted states and their context, which
come pre-encoded by the architects of the artificial intelligence system with
contexts and semantic meaning and therefore serve as a function of this
particular type of interpretation. Consequently, such artificial intelligence
systems are able to demonstrate only a very limited capacity to extrapolate new
meaningful associations while exploring their environment. Therefore, even the
most successful applications, which tend to abstract away from the social context
to capture the core factors of problem solving with pre-interpreted symbol
systems, nevertheless remain inflexible, unable to demonstrate a generalised
capacity for interpretation, and lacking in resilience and the ability to recover.
As discussed at length throughout this research project, explicit symbol systems
are not the only way to represent intelligence – connectionist frameworks
provide useful functionality to understand intelligence in a scientific and
empirically reproducible manner. Connectionist networks are models of cognition
that do not necessarily rely on pre-determined and specifically referenced
symbols to describe it, since the knowledge representation in the network model
is distributed across the architecture of the network, and it may be difficult – if
not entirely impossible – to isolate specific concepts to particular computational
neurons and synaptic connection weights within the model, as any part of the
model may be instrumental in the representation of various phenomena.
Therefore, connectionist networks serve as a challenge to the argument of
Newell and Simon (1976), and in addition to symbolic representations provide a
new strand of research around the concepts of adaptive modelling and learning
for the field of artificial intelligence. Because the structure of the connectionist
network is formed by the process of learning as much as by design, it does
not require an explicit symbolic model and is rather a result of interaction with
its environment. In this way, connectionist models are recognised for a
number of substantial contributions to the understanding of intelligence. Most
important within the context of this research project, though not the only one,
is a plausible model of the underlying mechanisms that describe learning processes
and behaviour, from the viewpoints of both artificial intelligence and cognitive
neuroscience. This very inherent nature of connectionism that is so distinctively
different from the tradition of symbol models in artificial intelligence is precisely
the reason why connectionist networks may be particularly suited to address
some of the questions that may be outside of the competence of expressive
functionality of symbol models – for example the pattern recognition capacity of
connectionist models dealing with noisy data, where distributed representation
enables a properly trained network to demonstrate performance similar to
humans, employing extrapolated elements of similarity rather than logical rules in
tasks such as the classification of previously unseen data. It appears that neither
of the approaches is likely to emerge as dominant, and hybrid solutions that
incorporate both symbols and networks are necessary to develop truly robust
models of intelligence and cognition.
7.2 Distributed representation
In machine learning as a model of information processing, using one
computational unit to represent one element is the most straightforward method.
This approach, commonly referred to as local representation, is easy to
understand and interpret, as the network structure corresponds to the structure of
the knowledge it embodies. There are other implementations, however, which are
more complex but at the same time offer notable emergent properties
unavailable with local representations. In the following paragraphs, a few notable
features of distributed representation as an inherent feature of connectionist
models will be discussed, which should make it clear that distributed
representation is particularly suited to the task of explaining complex phenomena
such as the consumer purchasing decision-making process.
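
The contrast between local and distributed representation can be sketched in a few lines of Python. This is a minimal illustration only: the brand names and the four microfeatures are hypothetical and are not taken from the data used in this research.

```python
# Local (one unit per item) versus distributed (shared microfeatures) codes.
# All brands and feature values are hypothetical, for illustration only.

def dot(a, b):
    """Inner product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))

# Local representation: each brand has its own dedicated unit.
local = {
    "brand_a": [1, 0, 0],
    "brand_b": [0, 1, 0],
    "brand_c": [0, 0, 1],
}

# Distributed representation: each brand is a pattern over shared microfeatures
# (hypothetical features: premium, large pack, flavoured, own label).
distributed = {
    "brand_a": [1, 1, 0, 0],
    "brand_b": [1, 0, 1, 0],
    "brand_c": [0, 0, 0, 1],
}

def overlap(codes, first, second):
    """Number of active units shared by two items under a given coding."""
    return dot(codes[first], codes[second])

local_overlap = overlap(local, "brand_a", "brand_b")
distributed_overlap = overlap(distributed, "brand_a", "brand_b")
```

Under the local code no two items share a unit, so similarity between items cannot be expressed by the representation itself. Under the distributed code, items that share microfeatures overlap, which is the property that the following subsections rely on.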
7.2.1 Memory
When thinking about the consumer decision-making process, the standard metaphor
for a memory system is typically a hypothetical warehouse for mental copies of
items, with some sort of storage and retrieval facilities that find a copy of an
item using the descriptors provided. This process, however, is inefficient and
inconsistent with the operation of human memory, where acceptable results can be
produced even with missing or incorrect descriptions. Another way to view memory
is not as a traditional content-addressable search mechanism using available
descriptors, but rather as an inferential process: a memory is retrieved by
constructing, each time, a pattern of activity using microfeatures and their
connections to represent the most plausible concept consistent with the available cues.
Connectionist network models are inherently based on these principles and
therefore would be particularly suited to handle elements that are responsible
for memory storage and retrieval as part of the consumer decision-making
process.
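
Retrieval as an inferential process can be illustrated with a small Hopfield-style autoassociative network. This technique is chosen here purely for illustration and is not one of the models developed in this research; the two stored patterns are arbitrary vectors of +1/-1 microfeatures. The sketch shows a stored item being reconstructed from a cue in which one feature is incorrect and one is missing.

```python
# Pattern completion in a small Hopfield-style autoassociative network.
# The stored patterns are arbitrary illustrative vectors of +1/-1 microfeatures.

def train_hebbian(patterns):
    """Sum the outer products of the patterns, with no self-connections."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += pattern[i] * pattern[j]
    return weights

def recall(weights, cue, steps=5):
    """Update all units together for a few steps; 0 in the cue marks a missing feature."""
    state = list(cue)
    for _ in range(steps):
        nets = [sum(w * s for w, s in zip(row, state)) for row in weights]
        state = [1 if net >= 0 else -1 for net in nets]
    return state

item_a = [1, 1, 1, 1, -1, -1, -1, -1]
item_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hebbian([item_a, item_b])

# Cue for item_a: the first feature is incorrect and the second is missing.
cue = [-1, 0, 1, 1, -1, -1, -1, -1]
retrieved = recall(weights, cue)  # settles on item_a
```

No copy of either item is stored at a particular location: both are held in the same set of connection weights, and the retrieved pattern is the one most consistent with the cue.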
7.2.2 Generalisation
The constructive concept of memory is related to another feature of the consumer
decision-making process: the instances when new items are learned and
subsequently stored in memory. To accommodate this in a connectionist model,
while at the same time making sure the existing items are not deleted, many
connection weights could be adjusted a little. This would have a transferred
effect on all related items, while leaving unrelated items unaffected. This type of model
behaviour epitomises the concept of distributed representation, and while doing
so invokes the most remarkable of properties: generalisation. Rather than using
local representations, a distributed representation system would store
information by automatically extracting and decomposing the phenomena into their
constituent microfeatures, where specific microfeatures could relate to multiple
phenomena simultaneously. When new information is acquired, it is
automatically propagated throughout the distributed representation system to
modify activation patterns for all related patterns, thus making it available
throughout the system in a manner similar to the way humans make
generalisations.
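
The transfer of learning to related items can be sketched with a single linear unit trained by the delta rule. This is an illustrative assumption rather than the training procedure used in this research, and the three feature patterns are hypothetical. Only the new item is trained, yet the response to an item that shares a microfeature with it also changes, while the response to an unrelated item does not.

```python
# Generalisation through shared microfeatures, using a single linear unit
# trained with the delta rule. All feature patterns are hypothetical.

def predict(weights, features):
    """Weighted sum of the active microfeatures."""
    return sum(w * x for w, x in zip(weights, features))

def delta_update(weights, features, target, rate=0.1):
    """Adjust each weight a little, in proportion to the error and its input."""
    error = target - predict(weights, features)
    return [w + rate * error * x for w, x in zip(weights, features)]

new_item = [1, 1, 0, 0]        # the item being learned
related_item = [1, 0, 1, 0]    # shares the first microfeature with new_item
unrelated_item = [0, 0, 0, 1]  # shares no microfeatures with new_item

weights = [0.0, 0.0, 0.0, 0.0]
for _ in range(20):
    weights = delta_update(weights, new_item, target=1.0)

response_new = predict(weights, new_item)              # approaches the target of 1.0
response_related = predict(weights, related_item)      # rises although never trained
response_unrelated = predict(weights, unrelated_item)  # remains at zero
```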
7.2.3 Learning history and behaviour continuity
In attempting to formulate a plausible explanatory account of consumer
behaviour, being able to attach arbitrary descriptions to units using local
representation may seem more intuitive and may therefore be considered an
apparent advantage over distributed representation systems. Distributed
representation systems offer greater efficiency, however: they not only
allow particular descriptions to be assigned to distributed representation
clusters in a manner similar to local representation systems, but also make it
possible to construct new concepts, a feature that can clearly be useful in
facilitating the future development of theoretical frameworks for such concepts as
learning history and continuity of behaviour.
7.3 The implications for consumer behaviour
There is little doubt that the consumer decision-making process is a
complex phenomenon that can normally be attributed to intelligent behaviour.
In fact, it could serve as a reasonable test of the level of artificial general
intelligence: if an artificial system (likely embodied with the use of robotics) could
go out to any unspecified established location and purchase a few required
items, all on its own, while learning the environment and making other decisions
as necessary to reach the final purchase goal, it could be said to act in an
intelligent manner comparable to that of a human consumer. Thus, working to
develop plausible models of consumer behaviour could not only serve to satisfy
the immediate questions that deal with the purchasing decision-making process
per se, but also serve as a substantial contribution to modelling cognition and
intelligence.
7.3.1 Utilitarian and informational reinforcement, and NNs
In the course of this and previous research projects, it became apparent that a
method to empirically define and measure informational and utilitarian
reinforcement within the data would be extremely helpful (Greene, 2011). One of
the reasons of course is that previous research required a substantial amount of
work to consider and define the level of utilitarian and informational
reinforcement – for each of the brands within the data. This requires not only an
extensive familiarity with the market situation, but also a significant amount of
time and resources: if at all possible, a qualitative study is carried out to identify
and validate brand perception attributes, which are useful to operationalise
utilitarian and informational reinforcement in a subsequent quantitative survey
research. Then, traditionally, a quantitative investigation would be carried out
to survey a number of individuals and collect brand perception
data, with each brand evaluated by respondents using a standardised
questionnaire. With a substantial sample size, both utilitarian and
informational reinforcement values can then be attributed to brands to be used
in all subsequent research.
To streamline and optimise this process, it was hypothesised that this task could
be carried out by a connectionist model that should be able to extract the
utilitarian and informational reinforcement for each brand from the data during
the pattern recognition (learning) process and subsequently provide a method to
quantify them by assigning a numerical score to each brand. Instead, it became
apparent that utilitarian and informational reinforcement in fact belong to a
qualitatively different level of explanation, and could be modelled in a very viable
manner as emergent entities using distributed representation of hidden layers in
neural networks (Greene, 2011).
7.3.1.1 Consumer behaviour modelling process
The inherent nature of a neural network as a method of analysis makes it
qualitatively different from traditionally employed methods such as logistic
regression. For the purposes of this discussion, let us set aside the
methodological differences and instead consider the semantics. A neural network
model with no hidden layers is essentially no different from a logistic regression,
as it too incorporates only the input and the output layers. Once hidden
layers are introduced between the input and output layers, the neural network is
capable of developing unique features that are of particular interest in the
discussion of behaviour analysis.
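
The equivalence can be made concrete with a minimal sketch, assuming a logistic activation on every unit. The input values, coefficients, and hidden-layer weights below are arbitrary illustrative numbers, not estimates from this research.

```python
import math

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_regression(x, coefficients, intercept):
    """Predicted probability from a logistic regression."""
    return sigmoid(intercept + sum(c * v for c, v in zip(coefficients, x)))

def forward(x, layers):
    """Feedforward pass; each layer is a (weight_rows, biases) pair with logistic units."""
    activations = x
    for weight_rows, biases in layers:
        activations = [
            sigmoid(b + sum(w * a for w, a in zip(row, activations)))
            for row, b in zip(weight_rows, biases)
        ]
    return activations

x = [1.0, 0.0, 2.5]              # hypothetical input variables
coefficients = [0.8, -1.2, 0.3]  # hypothetical regression coefficients
intercept = -0.5

# No hidden layer: a single output unit connected directly to the inputs.
no_hidden = [([coefficients], [intercept])]
p_regression = logistic_regression(x, coefficients, intercept)
p_network = forward(x, no_hidden)[0]  # same value as p_regression

# One hidden layer with two neurons: the inputs now reach the output only
# through intermediate units whose meaning is not specified in advance.
with_hidden = [
    ([[0.5, -0.4, 0.1], [-0.3, 0.9, 0.2]], [0.0, 0.1]),
    ([[1.5, -2.0]], [0.2]),
]
p_hidden = forward(x, with_hidden)[0]
```

With no hidden layer the network weights play exactly the role of regression coefficients. Once a hidden layer is present, the output is a function of the hidden activations, and no single weight corresponds to the effect of a single input variable.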
In the case of regression, the analysis considers only the input and output
variables pre-specified by the researcher, whereas a neural network learns the
structure as part of the process. Complex neural networks that incorporate
hidden layers determine connection weights between the variables and the
hidden layers, depending on the network architecture. The key difference from
the traditional method is the meaning of hidden layers and neurons that emerge
from the learning process – network characteristics that are not defined by a
researcher but rather are the product of a learning process. In the context of
consumer behaviour and using the established framework of BPM for explanatory
purposes, it is the central hypothesis of this research project that the hidden
layers may be interpreted as an emergent representation of utilitarian and
informational reinforcement, along with other factors influencing behaviour that
would otherwise be inconceivable to the researcher, as identified by the neural
network in the process of learning and extracting the patterns from the data.
What is argued here, then, is that within the modelling of the consumer
decision-making process, utilitarian and informational reinforcement should be
positioned on a different semantic hierarchical level from the traditionally employed
inputs such as product parameters, consumer demographics, decision
environmental parameters, etc.
Thus, product characteristics and demographics can be
connected not solely to the output variables directly, but rather to intermediary
abstract entities following the distributed representation principle, which are
shaped by the learning process of the neural network. These abstract
characteristics are not easily interpretable, however, and it may in fact prove
quite difficult, if not entirely impossible, to assert reliably whether the
constructs represented by the hidden neurons and connection weights truly
represent utilitarian and informational reinforcement. Within the theoretical
framework of BPM, utilitarian and informational reinforcement is defined in
terms of money allocation as a function of brand reinforcing attributes
(Oliveira-Castro, Foxall, & Wells, 2010). This issue also arises elsewhere, however,
as it is not possible to say to what degree utilitarian reinforcement is truly
represented by, for example, additional desirable product attributes such as
sausage in baked beans versus plain baked beans (Oliveira-Castro et al., 2010).
This assumption may not hold for all consumers, and the opposite may very well be
the case for some: consumers may receive higher utilitarian reinforcement
from plain baked beans if, in their particular situation, plain baked
beans can be used in a wider variety of dishes. Thus, the real issue is
a matter of operationalisation, and is not unique to neural networks.
A pruned and optimised network in which the input units are connected to a
hidden layer containing two hidden neurons, intended to represent utilitarian and
informational reinforcement, was proposed in the previous chapters as the
preferred option for reducing the difficulty of interpreting the final
neural architecture. Even so, a number of other network architecture types were
explored and presented here in an attempt to identify the type most suitable for
the explanation of consumer behaviour, whether for predictive or interpretative
purposes. The abstract characteristics that emerge during the learning process
within the hidden layers are, however, not easily comprehensible or interpretable
for a number of reasons. For one, the network capacity allows it to explore the
incredibly complex interrelations within the data and to identify subtle
patterns that resist explanation. These patterns, however, could be too
multidimensional for a human mind to comprehend: something a researcher would not
be able to think of on their own, as would be required in the case of regression
analysis, where all variables and the structure must be predefined from the outset.
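
The structure of a network with two hidden neurons can be sketched as follows, using the 54-2-1 architecture mentioned earlier as the example. This is a sketch under stated assumptions only: the weights are random and untrained, the inputs are random placeholders, the logistic activation is assumed, and nothing here reproduces the fitted models of this research. The two hidden activations are the quantities whose interpretation is at issue, and the sketch makes no claim about what they represent.

```python
import math
import random

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def make_layer(n_inputs, n_units, rng):
    """Random, untrained weights and zero biases (placeholders only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_units)]
    biases = [0.0] * n_units
    return weights, biases

def layer_output(inputs, layer):
    """Activations of one fully connected layer of logistic units."""
    weights, biases = layer
    return [
        sigmoid(b + sum(w * v for w, v in zip(row, inputs)))
        for row, b in zip(weights, biases)
    ]

def count_parameters(layer):
    """Number of connection weights plus biases in one layer."""
    weights, biases = layer
    return sum(len(row) for row in weights) + len(biases)

rng = random.Random(0)
hidden_layer = make_layer(54, 2, rng)  # 54 inputs feeding 2 hidden neurons
output_layer = make_layer(2, 1, rng)   # 2 hidden neurons feeding 1 output

inputs = [rng.random() for _ in range(54)]  # placeholder input pattern
hidden = layer_output(inputs, hidden_layer)
choice_probability = layer_output(hidden, output_layer)[0]

n_parameters = count_parameters(hidden_layer) + count_parameters(output_layer)
```

Every one of the 54 inputs contributes to both hidden neurons, so each hidden activation is a blend of all input variables. This is why even a network this small does not yield a simple one-to-one reading of its hidden units.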
7.4 Cognitive process simulation
Thus far, relatively simple connectionist networks have been discussed here that
model isolated cognitive processes with simple input and output. But what about
modelling higher cognitive processes that would necessitate complexity in both
the processing and the input and output of the network? The two networks
discussed next are developed as possible models of higher cognitive processes.
7.4.1 Past-tense acquisition model
The ability to construct an infinite number of grammatically correct sentences is
based on the grammatical rules used in a natural language. In psycholinguistics,
it is assumed that either innate knowledge of the rules of grammar is
available, or humans follow a process of hypothesis formation and subsequent
testing based on the experiences and linguistic information available. The product
is assumed to be a mental structure in the human mind that contains
representations of linguistic rules. Rumelhart and McClelland (1985b) propose a
mental structure capable of processing natural language that does not require
explicit representations of the rules. English past tense acquisition is a
well-studied phenomenon, where a U-shaped learning course is characterized by
three stages. First, the past tense of a small group of verbs (mostly irregular) is
acquired. This is followed by the acquisition of the past tense of a larger group
of verbs (mostly regular), where the rule seems to develop (add -ed to the verb
stem); as a result, it is also incorrectly applied to some of the irregular forms.
In the final stage, regular and irregular forms are used correctly, suggesting that
exceptions to the rule are learned. Irregular verbs could be further grouped into
subcategories, which could explain some of the errors produced by human
learners. It is important to set the limits of what is being modelled, as this
simplifies the model considerably while still enabling it to achieve a substantial
level of simulation.
The model of Rumelhart and McClelland (1985b) consisted of three connected
networks, where the first network translated the phonological input into the
appropriate format for the second network, the pattern associator. Pattern
associator output was again translated by the third network into the final
phonological output. Networks are quite different from the traditional methods,
and therefore pose certain difficulties for traditionally employed strategies. For
example, ordered mapping of phonemes would not work as easily in network
architecture – instead, context sensitive phonemes were employed. An issue of
network efficiency was solved by replacing the representation of all phonemes
with the features of the phonemes, which substantially decreased the number of
units necessary to encode the problem. This offered sufficiently distinctive
representations while still allowing the degree of generalization needed for the
network to generate past tense forms for previously unseen verbs. Thus, the network
functionality is not tied to the verb stems in particular but rather to the
distributed phonological features, at the same time identifying similarity between
the verbs and determining which of the verbs require the application of the
regular past tense rule. Even though this model is able to achieve high levels of
performance, one critique argues that much work is done in the featural
decomposition of phonemes, which is based on the adaptation of traditional
linguistic featural analysis (Pinker & Prince, 1988). Even though this may be true
to some extent, the contribution of the connectionist network lies in the pattern
recognition process not reliant on rules, which is necessary for learning to occur
in the network.
For the simulation, the encoded verb stem is supplied to a two-layer feedforward
network that employs a stochastic version of the logistic activation function. After
obtaining the pattern of activation, an error correction procedure facilitates learning
through the adjustment of connection weights by comparing the obtained
pattern with the desired output pattern. This network however is not particularly
suitable for the purposes of examining the behaviour in detail due to its size, as it
was primarily tasked with simulation of the stage-like learning process of past
tense acquisition observable in children (U-shaped learning function). Smaller
purpose built networks with fewer weights can be scrutinized to examine the
emergence of behaviour and the underlying factors that influence it.
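The mechanism described above can be illustrated with a minimal sketch of a two-layer pattern associator built from stochastic logistic units and trained by error correction. The toy mapping, the learning rate, the temperature, the number of epochs and all function names are assumptions made for illustration; this is not the original simulation, which used a far larger featural encoding of verb stems.

```python
import math
import random


def activation_probability(unit_weights, unit_bias, inputs, temperature=1.0):
    """Logistic probability that a unit turns on for the given input pattern."""
    net = unit_bias + sum(w * x for w, x in zip(unit_weights, inputs))
    return 1.0 / (1.0 + math.exp(-net / temperature))


def train_pattern_associator(patterns, epochs=1000, rate=0.1, temperature=1.0, seed=0):
    """Train a two-layer network of stochastic logistic units by error correction.

    `patterns` is a list of (input_bits, target_bits) pairs. All parameter
    values are illustrative assumptions, not taken from the original model.
    """
    rng = random.Random(seed)
    n_in = len(patterns[0][0])
    n_out = len(patterns[0][1])
    # weights[i][j] connects input unit j to output unit i; bias[i] acts as a threshold.
    weights = [[0.0] * n_in for _ in range(n_out)]
    bias = [0.0] * n_out

    for _ in range(epochs):
        for inputs, targets in patterns:
            for i in range(n_out):
                p_on = activation_probability(weights[i], bias[i], inputs, temperature)
                # Stochastic unit: it turns on with probability p_on.
                output = 1 if rng.random() < p_on else 0
                # Error correction: weights change only when output and target differ.
                error = targets[i] - output
                for j in range(n_in):
                    weights[i][j] += rate * error * inputs[j]
                bias[i] += rate * error
    return weights, bias


def most_likely_output(weights, bias, inputs):
    """Deterministic read-out: a unit counts as on if its probability exceeds 0.5."""
    return [1 if activation_probability(w, b, inputs) > 0.5 else 0
            for w, b in zip(weights, bias)]


# Toy mapping (hypothetical): each output feature copies one input feature.
toy_patterns = [([a, b, c], [a, b, c]) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
weights, bias = train_pattern_associator(toy_patterns)
```

Because the units are stochastic, the output for a given input varies from trial to trial; the read-out function reports only the more probable state of each unit. No rule is stored anywhere in the sketch: the regularity of the mapping is carried entirely by the connection weights.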
7.4.1.1 Overregularization in a simpler model
To examine the processing that occurs inside the network, McClelland and
Rumelhart (1988) consider a simplified model that comprises eight input
units and a rule simple enough to transform the input pattern into the output
for 18 cases in total. Predictably, due to the systematic nature of the input-output
relation, the network was able to achieve perfect performance without actually
relying on the rule. Then, one of the cases was altered to conflict with
the rule employed to transform the input patterns. To simulate the children's
learning process, the network was first presented with only two cases: one regular (that
follows the devised rule) and one irregular. The network achieved a good level of learning,
but was not able to extrapolate the rule properly from just two cases. When the
remaining 16 cases were introduced, the network was able to extrapolate the
transformation rule. At that stage, overregularization could be observed with the
one irregular case (signified by increased errors), which was subsequently
followed by learning to incorporate the irregular case. Thus, the network learning
process closely resembles the stage-like learning process of children described
earlier.
7.4.1.2 Past-tense acquisition simulation
The learning simulation of Rumelhart and McClelland (1985b) consisted of a stage-
like process where the ten most frequently encountered verbs were supplied to the
network first, followed by a set of verbs of average frequency of usage, which
was finally followed by a set of verbs with low frequency of usage. The model was
able to achieve a high level of performance (between 80 and 85 percent) on the
first set of verbs after 10 epochs, at which point the second set of verbs was introduced.
This resulted in a temporary drop in performance (around 10 percent) on
irregular verbs – a characteristic feature of Stage 2 in the children's learning process, and a
result of interference from learning the regular pattern. By epoch 20, the
performance started to improve, and by epoch 160 the performance of the
model was around 95 percent of features correct – the model was able to learn the
irregular forms as exceptions to the rule. The mistakes of the model were in the
general direction of overregularization, as expected (for example, –ed added to
an irregular verb to produce forms such as comed or camed). Thus, the
network is able to simulate the stage-like learning and the effect of
overregularization without the use of rules. Testing the model on the previously
unseen set of verbs with low frequency of usage showed a high level of
performance (92 percent correct feature activation for regular and 84 percent for
irregular verbs). The ability of the network to generate past-tense forms for novel
verbs was less than optimal, but comparable to human performance in a similar
task, suggesting that the limitations of the model are comparable to those of a native human speaker.
The in-depth analysis revealed that many of the distinct subclasses of regular and
irregular verbs (for example, nine subtypes of irregular verbs described by Bybee
& Slobin, 1982) could be identified in the simulation results. The simulation
showed a similar ranking of performance within each of the subclasses, even
though the variances were less dramatic. This could be because the
factors responsible for the performance variations in humans were not present in the
simulation; explanations based on those factors may therefore provide a superior account of human
performance. In addition, it became apparent that the error type propensity
(comed vs camed) differed across verb subclasses.
Based on these findings, Rumelhart and McClelland (1985b) argue that it is
possible to simulate the essential characteristics of human learning behaviour
with relatively simple network architecture and without the use of explicit rules.
7.4.1.3 The role of input
In past-tense learning, the role of input in both human and network simulation
models is not entirely understood. What requires further examination is the
comparison of the input conditions of children and simulation models, and the
range of conditions under which U-shaped learning can be present in networks.
7.4.1.3.1 The role of input in children
Some researchers argue that supplying discontinuous input to the network is not
a proper mechanism to attain a U-shaped learning curve, as there is no sudden
change in vocabulary size or verb type in children's input, and therefore such a change should not be a factor that has
an effect on overregularization (Ullman, Pinker, Hollander, Prince, & Rosen,
1989). If not a change in input, then what activates Stage 2 learning? Research
indicates that during Stage 1 irregular type verbs outnumber regular type (Bloom,
1970; Nelson, 1973). In Stage 1, children produce the stems for most verbs, but
also tend to use incorrectly some of the irregular past-tense forms instead of the
stem (Kuczaj, 1977). In Stage 2, proficiency of using past tense in appropriate
context improves for most verbs, with the exception of about one-third of
irregular verbs where overregularization occurs. If the same type of mechanism
was responsible for learning in Stages 1 and 2, it would not be able to control the
word production initially and only use the verbs as input, gradually developing
the network mapping as more verbs (largely regular, as regular verbs surpass
irregular by the end of Stage 2 as suggested by Ullman et al., 1989) become
available. As the mechanism progressively matures following the developmental
process that improves the coordination of previously separate competencies
(Bates, Bretherton, & Snyder, 1988), it gradually develops the capacity to control
the production of verbs. Therefore, it supports Rumelhart and McClelland's claim
that these changes facilitate the transition from Stage 1 to Stage 2. The
simplification of the process for the simulation model could be further justified
in light of the lack of understanding of how the process actually occurs in children. The
two suggested developments are to examine in greater detail either the child
acquisition data or the network behaviour.
7.4.1.3.2 The role of input in networks
When the network is trained on a small subset, it is capable of achieving high
performance by learning the whole dataset without extracting the patterns. In
the case of learning the past-tense verbs, it may seem to be just that, as the initial
set of verbs available to a child normally contains a rather small number. Once
more verbs paired with past-tense forms become available, the network begins
to develop the evidence of a systematic structure in weights. As the systematic
structure becomes more pronounced in network weights when more verbs are
made available for inputs, the inclination towards overregularization becomes
evident as well - even if only half of the input verb pairs exhibit the pattern
explicitly.
Plunkett and Marchman (1989) performed a number of past-tense formation
simulations where the network was presented with a complete set of verbs at all
training epochs, effectively eliminating the input change altogether. Regardless,
U-shaped learning was obtained for some individual verbs. They also employed a
much simpler distributed coding for the verbs than that
used by Rumelhart and McClelland (1985b), which therefore did not promote
generalization as much.
7.4.1.4 Past-tense formation model summary
Rumelhart and McClelland (1985b) argue that the relation between the verb stem
and past-tense form can be described by a set of general rules, but it is the mechanism
of distributed processing across the network weights that governs the relation.
Moreover, both standard and exceptional cases are encoded within the same
single network architecture, and the network learning process shows similarities to
the learning process of children.
7.4.2 Kinship knowledge model
In psycholinguistics, the ability to handle kinship relations is a common test used to
assess a model. Hinton (1986) developed a multi-layer connectionist network
for such a task, where the information on 24 individuals is analysed by a five-layer
network with three hidden layers. Individuals are divided between two families
with isomorphic family trees that include 12 relationship types. Inputs contain
Person 1 and a relationship type, and the model provides the encoding for Person
2 on the output. The first hidden layer contains 12 units: six units receive input
from 24 Person 1 units, and six from 12 relationship type units. The second
hidden layer contains 12 units that are fully connected to the first hidden layer,
and provide the input for the third hidden layer that contains six hidden units and
provides the input for the output layer that specifies the output for Person 2. The
network is forced to extract the relevant features for the distributed
representation as the input information flows through the layers that contain
fewer units (36-12-12-6-24 network structure), and then use the extracted
features to identify the output. Out of the 104 possible Person 1 - relationship -
Person 2 cases, a back-propagation network was trained on 100 cases and tested
on the remaining four cases. The network was able to provide a correct output
for all four cases in the first run and for three out of four cases in the second run,
suggesting a high performance capacity without the reliance on propositional
representations or inferential rules.
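The layer structure described above (36-12-12-6-24, with the first hidden layer split into two groups of six units) can be sketched as a forward pass. This is only an illustration of the architecture: the weights are random and untrained, biases are omitted, the back-propagation training step is not shown, and the function names and the weight range are assumptions rather than details of the original model.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return an n_out x n_in weight matrix with small random values."""
    return [[rng.uniform(-0.3, 0.3) for _ in range(n_in)] for _ in range(n_out)]


def forward(layer, inputs):
    """Logistic activations of one fully connected layer (biases omitted for brevity)."""
    return [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, inputs))))
            for row in layer]


def one_hot(index, size):
    """Localist input code: one active unit per person or relationship."""
    return [1.0 if k == index else 0.0 for k in range(size)]


rng = random.Random(0)
N_PEOPLE, N_RELATIONS = 24, 12

# First hidden layer: two separate groups of six units, one per input group.
person_encoder = make_layer(N_PEOPLE, 6, rng)
relation_encoder = make_layer(N_RELATIONS, 6, rng)
hidden2 = make_layer(12, 12, rng)          # second hidden layer, fully connected
hidden3 = make_layer(12, 6, rng)           # third hidden layer, six units
output_layer = make_layer(6, N_PEOPLE, rng)


def predict_person2(person1, relation):
    """Forward pass: (Person 1, relationship) -> activations over 24 Person 2 units."""
    h1 = (forward(person_encoder, one_hot(person1, N_PEOPLE))
          + forward(relation_encoder, one_hot(relation, N_RELATIONS)))
    h2 = forward(hidden2, h1)
    h3 = forward(hidden3, h2)
    return forward(output_layer, h3)


layers = (person_encoder, relation_encoder, hidden2, hidden3, output_layer)
n_weights = sum(len(layer) * len(layer[0]) for layer in layers)
activations = predict_person2(person1=0, relation=3)
```

Under these assumptions the network has 576 connection weights. The six-unit third hidden layer is the narrowest point, which is where the network is forced to compress each person and relationship into a small set of distributed features.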
When the weights matrix is examined, it becomes apparent how the
network was able to accomplish the cognitive simulation task. For example, the
connection weights between the input units representing Person 1 and the first
hidden layer suggest the extraction of features relating to the family symmetry,
such as which of the two families or generations (younger or older) the person
belongs to. Thus, the network was able to identify the kinship structure only from
the cases of specific relationships presented to it through the restructuring of the
information and feature extraction procedures. Based on the internal featural
distributed representation developed as a result of the training process, the
network was able to learn the nature of relationships and use this knowledge to
make inferences. It is imperative however to be cautious about hidden unit
interpretation, as in most cases, even if it may seem quite natural to assume
certain labels from the examination of network behaviour, the hidden units
represent a very complex, interrelated combination of subtle features
extracted by the network that may not be straightforwardly explained.
Many questions remain however that revolve around the interpretability of
hidden units, and whether the interpretation is even necessary or beneficial; the
training and testing procedures; and questions that deal with the type of higher
cognitive tasks that can be simulated with networks.
7.5 Phenomenological critique of computational models of intelligence
One type of critique originates in phenomenology, and disputes the established
computational approach of modelling intelligence. On the basis that the process
of skill acquisition has been misunderstood, Dreyfus (1992) argues against the
general approach to modelling intelligence which was undiscerningly adopted by
early Artificial Intelligence researchers – the same approach that forms a
fundamental part of the research programme described here. Building upon the
work of Heidegger (for extended discussion please see for example Heidegger,
1988), Dreyfus supports his opposing view with an assertion that humans are in
fact experts at carrying out a multitude of tasks within varying situational contexts
as part of everyday life. This type of expertise is arguably overlooked in the traditional
approach to computational intelligence modelling, and instead is taken to be an
assumed foundation upon which all subsequent learning and rule formulation
occurs. Dreyfus argues quite the opposite by stating that humans begin with a
pre-formulated set of explicit rules, which are applied and specialised to a
multitude of particular contextual situations. The key element of critique
postulates that computational algorithms do not generate meaning or sense, but
rather appear as meaningful when taken within the context of human everyday
expertise. Thus, Dreyfus asserts that the understanding of intelligence with the
use of computational modelling would inevitably entail revisiting the fundamental
principles of meaning that originally identified the need for these computational
and technological artefacts, and how it reflects upon the human identity.
Certainly, these arguments may hold true while considering human intelligence,
but would they apply in the same manner to the explanation of artificial
intelligence? In fact, some of the aspects of intelligence that are taken to be
virtues of human intelligence, such as human everyday expertise as discussed
above, could be seen as a limitation and a constraint for
a general intelligence that is not tasked primarily with closely replicating
the way human intelligence occurs, and is instead developed to achieve a certain
level of performance by optimal means, which do not necessarily need to be
similar to those of human intelligence. This research direction is further
supported by recent technological advancements that already constitute substantial
progress, and computational models are eventually expected to reach
performance levels comparable to the most advanced currently known
information processing entity, the biological human brain – and indeed to surpass
not only the performance level of individual humans, but ultimately the
combined collective intelligence of all humans. Certainly, this would entail
an inherently different way for intelligence to emerge.
7.6 Summary
In this chapter, a concise critical assessment was offered of the
connectionist approach against the established tradition of cognitive science.
Some of the inherent features of both disciplines and their modelling approaches were
discussed to evaluate both advantages and disadvantages in relation to the task
of explaining the process of consumer decision-making.
The following chapter will consider a few potential directions for future research.
8. Future work
It is clear that even though the experimental work carried out as part of this
research project is able to offer a substantial amount of data for all subsequent
analyses carried out and discussed here, there is a multitude of unanswered
questions that could extend this line of enquiry further, and in a number of
potential directions. A few of these potential areas of inquiry will be discussed in
this chapter.
As a continuation of the explanatory modelling approach predominantly
discussed here, it could also be advantageous to continue developing explanatory
capacity by taking it in a number of different directions, some of which are
discussed next.
Once an acceptable level of explanatory capacity is achieved, it then becomes
possible to explore the prescriptive direction and normative consumer behaviour
modelling in an attempt to optimise the connectionist modelling and develop a
certain level of prescriptive capacity to achieve specific objectives that may be
desirable for a number of reasons. Some of these are discussed in the following
paragraphs.
8.1 Individual respondent level of behaviour analysis
One obvious option to advance the research undertaken here would be to
consider the individual respondent level rather than the multi-respondent
prototype-defining method of analysis carried out here. This is outside the scope
of this research project, but would seem a natural evolution of this line of inquiry,
which is further corroborated by some of the other potential directions to
advance the work discussed in the following
paragraphs.
The problem of demonstrating individual behaviour in a scientific manner is
reasonably well understood and comprehensively described (for an extended
discussion please see Skinner, 1953), and over the years has been applied to a
wide range of contextual and behavioural settings which resulted in producing
general descriptive accounts of mechanisms that govern and foster many
observable individual forms of behaviour. Thus, the research process that carries
out an applied behaviour analysis on an individual respondent or consumer level
is a self-monitoring and self-evaluating method of scientific inquiry to study
behaviour in an experimental applied manner. The pragmatic nature of behaviourist
theory is evident in the application of behaviour analysis, where the verbal
description of non-verbal behaviour by the respondents themselves would not be
acceptable, and the focus of the research programme in fact revolves around
what subjects can be brought to do rather than brought to say – unless, of course,
verbal behaviour is itself of interest.
It would be customary to expect consumer behaviour to be composed of a
number of sequences of physical events, and precise measurement of these
events is required for a scientific examination – this brings about the problem of
reliable quantification of the behavioural response which cannot be easily
circumvented in applied consumer behaviour research, whereas in non-applied
consumer and other research there may often be an opportunity to select a
behavioural response item which is easier to quantify and measure in a reliable
manner from the outset. Behaviour analysis normally requires a demonstration of
sequential events that are said to be responsible for the manifestation of the
behaviour, and the researcher is required to demonstrate an evident degree of
control over the said behaviour by either being able to increase or decrease the
frequency or duration of behaviour – something which is reasonably achievable
within the experimental laboratory setting by either replication or satisfactory
probability levels derived from the statistical analyses and modelling of grouped
respondent data, yet may pose a difficulty in an applied context of consumer
decision-making situation in a market environment. The two types of research
design that are commonly used to demonstrate with a certain degree of reliability
the behaviour control can be referred to as the reversal and the multiple baseline
methods.
The first method assumes a continuous tracking of behaviour for an extended
period of time to ensure that clear measurement stability is achieved before the
experimental variable is applied. The behaviour continues to be monitored and
measured to determine if the experimental variable is able to exert any
significant observable change of behaviour – if it is indeed the case, the
experimental variable is discontinued or otherwise altered to examine whether
the observed behavioural change is dependent on the experimental variable, as it
should result in the observed behavioural change to diminish and reverse (hence
the naming convention) to the levels as initially determined in the first stage of
the experimental research design. The experimental variable is then
reintroduced, yet again to observe if this would recover the behavioural change
to the level as previously observed in the second stage of experimental research
design. The reversal procedure with a subsequent measurement could be carried
out a number of times if the experimental research setting permits the multiple
reversal stages to improve the validity and reliability of the obtained results –
something that is unlikely to be the case in the applied context of consumer
decision-making situation in an actual market environment. In fact, it may be
difficult to carry out even a single reversal cycle, particularly when the
behavioural change carries a positive measurable commercial impact and one may
be reluctant to abandon the favourable results and revert to the original level for
the sake of experimental design. In contrast with the experimental laboratory setting,
the dynamic market environment may also be difficult to control and may
contaminate the applied research design, as naturally occurring changes may be
brought about by other external factors that lie outside the control of the
researcher. It may also be unethical to reverse some of the valuable behaviours
that demonstrate particularly positive and beneficial effects
– for example, a research programme that aims to improve consumer
purchasing and consumption decision-making habits in an attempt to decrease the
incidence of certain diseases or other serious health risks. Moreover, it should be
expected while producing a valuable behaviour in a social setting to generate a
degree of extra-experimental reinforcement from the social setting itself; and as
a result, the valuable behaviour may cease to be dependent upon the
experimental design and behaviour control that was set up to produce this very
behavioural response in the first place.
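The logic of the reversal design can be made concrete with a short sketch that summarises an A-B-A-B sequence of phases and checks whether the behaviour rises each time the experimental variable is applied and falls each time it is withdrawn. The data, the phase labels and the function name are hypothetical, and the check assumes that the variable is expected to increase the behaviour; a real analysis would also need to establish stability within each phase.

```python
from statistics import mean


def reversal_summary(phases):
    """Summarise an A-B-A-B reversal design.

    `phases` is a chronological list of (label, observations), where label "A"
    marks baseline or withdrawal and "B" marks the experimental variable.
    Returns the phase means and whether every A->B step raised the mean and
    every B->A step lowered it.
    """
    means = [(label, mean(obs)) for label, obs in phases]
    consistent = True
    for (prev_label, prev_mean), (label, cur_mean) in zip(means, means[1:]):
        if prev_label == "A" and label == "B":
            consistent = consistent and cur_mean > prev_mean
        elif prev_label == "B" and label == "A":
            consistent = consistent and cur_mean < prev_mean
    return means, consistent


# Hypothetical weekly purchase counts for one respondent (illustrative only).
phases = [
    ("A", [2, 3, 2, 3]),   # baseline
    ("B", [5, 6, 6, 7]),   # experimental variable applied
    ("A", [3, 2, 3, 3]),   # variable withdrawn (reversal)
    ("B", [6, 6, 7, 6]),   # variable reintroduced
]
means, consistent = reversal_summary(phases)
```

In this made-up series the phase means are 2.5, 6, 2.75 and 6.25, so the behaviour follows the experimental variable in both directions. Extra-experimental reinforcement of the kind described above would show up as a withdrawal phase whose mean fails to return towards the baseline.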
The second method of multiple baselines may be particularly useful as an
alternative to the reversal method as it could make it possible to overcome the difficulties within
the applied context of consumer decision-making in an actual market
environment and circumvent some of the potential issues described above. To do
so, a number of behavioural responses are identified and measured over time to
determine the baselines for all subsequent research work. Once these multiple
baselines are reliably established, the experimental variable is introduced to
one of the identified behaviours – the resulting behavioural change is recorded
while other baselines are monitored in parallel to identify any other concurrent
change that is not associated with the experimental variable. In the case of
success with the first application of experimental variable where the significant
observable behavioural change can be identified, rather than discontinuing or
altering the first experimental variable to reverse the newly created behavioural
change, instead the experimental variable is introduced to one of the other yet
unaffected baselines. If the significant behavioural change can be observed again
with the second baseline, this would increasingly demonstrate the effectiveness
of using the experimental variable to control behavioural response as opposed to
change occurring as a natural or random variance – at this point to improve the
validity and reliability of the results thus obtained, the experimental design can
be systematically extended for the remaining baselines by introducing the
experimental variable to one further baseline at a time while monitoring and
tracking baselines for all behavioural response items in parallel. Even though
the multiple baselines research design is arguably better suited than reversal for
studying consumer decision-making within the applied context of a market
environment and allows circumventing some of the potential limitations due to
practical and ethical issues, an element of qualitative judgement is still
necessary to assess the suitability of inferential statistical analysis: for example
to determine and specify the number of baselines (the same in the case of
reversals) required to provide a convincing and satisfactory account of
demonstrating a reliable behavioural control.
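The staggered logic of the multiple baseline design can be sketched in the same way. Each behaviour has its own onset for the experimental variable; the sketch compares the means before and after each onset and checks that a not-yet-treated baseline remained stable while an earlier one was changing. The behaviour names, the data, the onsets and the tolerance are all hypothetical choices made for illustration.

```python
from statistics import mean


def multiple_baseline_summary(series, onsets):
    """Compare pre- and post-intervention means for staggered baselines.

    `series` maps a behaviour name to its observations over time; `onsets`
    maps the same names to the index at which the experimental variable was
    introduced for that behaviour. Returns {name: (pre_mean, post_mean)}.
    """
    return {name: (mean(obs[:onsets[name]]), mean(obs[onsets[name]:]))
            for name, obs in series.items()}


def stable_until_onset(series, onsets, earlier, later, tolerance=1.0):
    """Check that `later` stayed near its own baseline while only `earlier` was treated."""
    obs = series[later]
    before = obs[:onsets[earlier]]
    during = obs[onsets[earlier]:onsets[later]]
    return abs(mean(during) - mean(before)) <= tolerance


# Hypothetical data: three behaviours, variable introduced at weeks 4, 8 and 12.
series = {
    "brand_a": [2, 3, 2, 3, 6, 7, 6, 7, 6, 7, 6, 7, 7, 6, 7, 6],
    "brand_b": [4, 4, 5, 4, 4, 5, 4, 4, 8, 9, 8, 9, 9, 8, 9, 8],
    "brand_c": [1, 2, 1, 2, 1, 2, 2, 1, 2, 1, 1, 2, 5, 5, 6, 5],
}
onsets = {"brand_a": 4, "brand_b": 8, "brand_c": 12}
summary = multiple_baseline_summary(series, onsets)
```

Here each behaviour changes only after its own onset, and the untreated baselines stay flat in the meantime, which is the pattern that argues against natural or random variance as the explanation. How many baselines are needed for a convincing demonstration remains a matter of judgement, as noted above.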
These two research designs are of course the core foundations upon which more
complex composite and combined designs could be constructed – indeed it may
be required to decompose each successful demonstration into its comprising
elements to be studied and examined separately, perhaps employing certain
variable contribution analyses to determine the extent of contribution of each
component to the overall behavioural control; or perhaps introducing additional
elements to assess the generality of behavioural change and its ability to remain
durable over time; or perhaps examining the sequence of variables required to
improve the generality of behavioural change in the best possible manner. While
using secondary data, it may be increasingly difficult or perhaps even impossible
to identify the suitable cases where the behavioural change can be observed, but
perhaps some of these difficulties could be bypassed by a robust research design
and advanced modelling techniques that are inherently based on the principle of
individual stand-alone subsystems which are able to exert behavioural change as
a matter of collective contribution, such as swarm intelligence methods discussed
later.
8.2 Multi-category behaviour analysis
Certain practical and ethical limitations pertaining to research design described in
the preceding paragraphs may also apply to attempts to study and model
consumer behaviour employing multiple product categories simultaneously and
in parallel. The scope of this research project is limited to a single product
category – wine, and a proposal to examine multiple product categories
simultaneously would substantially increase the analytical complexity required, as
it would inevitably introduce the need to account for the dimension of interactive
cross-category consumer choice. There are a number of other reasons to follow
the boundaries set here as well that prescribe the inclusion of a single product
category – not the least of them is a high computational requirement that is
necessary to process the large amounts of data, which is the case even with a
single product category. If computational power were not an issue, for example where
parallel and distributed computing that links multiple processing units or even the
use of supercomputers can be employed, it could be beneficial to explore the
capacity of connectionist networks to develop cross-category models. On the one
hand, connectionist model generalisation capacity could be assessed where a
connectionist model trained on one category could be tested to predict
behaviour with an entirely different product category to carry out a true out-of-
sample validation. On the other hand, the connectionist models could be trained
on multiple categories from the outset, which could enable the extraction of
more general patterns, not specific to a product category, that represent consumer
behaviour more accurately, in turn making the connectionist models perform
better with single and multiple categories as well.
The notion of considering multiple product categories to develop better
understanding of behaviour has been reviewed over the years by many authors in
the different yet closely related thread of research that revolves around the
concept of market basket choice (for an extended discussion please refer to Chib,
Seetharaman, & Strijnev, 2002; East, Hammond, & Wright, 2007; Manchanda,
Ansari, & Gupta, 1999; Russell & Petersen, 2000). Without a doubt, as the
technological advancements facilitate the inevitable improvements in data
collection by refining the methods and growing the number of participants in the
panel data to improve the extent of representative sample and improve the
product and behaviour coverage on both longitudinal and individual levels,
researchers are increasingly involved in developing statistical modelling
techniques of basket-level multi-category consumer decision-making behaviour.
Global data providers are now able to offer truly massive integrated databases
(such as the complete dataset of Kantar World Panel, of which only a single category
was employed for the research programme described here) that include
longitudinal data and information on a household level for a number of observed
behavioural transactional variables such as store choice, category incidence,
brand choice, and quantity purchased – all this is of course supplemented by
extensive household demographics, product attributes, and purchasing decision
environment (store and venue attributes). Indeed, the field of consumer
purchasing behaviour and consumer choice modelling spans a number of
decades, and over the years a multitude of models and methodologies have been
proposed, even some that attempt to develop comprehensive models that cover
a number of behavioural variables concurrently (for example an attempt to
model simultaneously incidence, brand choice, and purchase quantity by
Chintagunta, 1993); the bulk of choice modelling research however has been
limited to a research design that considers a single product category at a time –
very much like the research programme described here which is deliberately
constrained to a single product category, wine. In an endeavour to extend the choice
modelling research by addressing the limitation of a partial single-category
consumer behaviour line of inquiry and providing a plausible account of
interdependencies between the multiple product categories, the notion of
market basket choice not only attempts to model the multiple purchase outcome
variables simultaneously, but also across multiple product categories – that is,
develop an understanding of choice behaviour across multiple, and ultimately all,
product groups that comprise the total shopping basket. The interest in extending
multiple category choice research is not exclusively motivated by reasons of
academic research, but is also commercially driven, and maybe even to a greater
extent: retailers strive to maximise total basket spend, organisations that hold a
significant presence within multiple product categories increasingly optimise
multi-brand portfolios and minimise cross-cannibalisation of profit and return,
and customer acquisition and cross- or up-selling initiatives require cross-
category models to improve targeting and segmentation analyses. Another
limitation that contributed to the historical restriction of choice modelling to a
single product category, and that can now possibly be overcome at least to some
extent as a result of technological advances, is of course the very high
computational demand of multi-product modelling – naturally it is
not simply a matter of an additive computational burden where multiple categories
would result in a linear increase in computational requirements dependent on
the number of categories modelled simultaneously, but the computational
requirements would rather increase exponentially, as the very point of the efforts
to carry out a multi-category choice modelling is to consider all possible inter-
category relations within the data. Nevertheless, recent advances in multiple
product category modelling have been able to propose a number of approaches
as discussed in the following paragraphs.
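The growth in the number of inter-category relations can be illustrated with simple counting. How fast it grows depends on what is counted as a relation, which is an assumption here: if only pairs of categories are considered the count is quadratic, whereas if every joint combination of two or more categories is considered it is exponential. The function names are illustrative.

```python
from math import comb


def pairwise_relations(n_categories):
    """Number of category pairs: n choose 2."""
    return comb(n_categories, 2)


def all_joint_relations(n_categories):
    """Number of category subsets of size two or more: 2^n - n - 1."""
    return 2 ** n_categories - n_categories - 1


for n in (2, 5, 10, 20):
    print(n, pairwise_relations(n), all_joint_relations(n))
```

For ten categories there are 45 pairs but 1,013 joint combinations, and for twenty categories 190 pairs but 1,048,555 joint combinations. This count concerns only the relations to be represented; the actual cost of estimating a model over them depends on the method used.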
Multi-category purchasing incidence research surveys the dichotomous consumer
decision-making situation by considering the notions of product substitutability
and complementarity within a constraint of a limited disposable monetary
resource. The research indicates that simultaneously modelling the incidence of
a higher number of product categories with multivariate probit and logit models
may mitigate the otherwise overestimated effect of purchasing situation and
environment (Chib et al., 2002; Manchanda et al., 1999). Moreover, it is
suggested there may be a consistent positive correlation among all product
categories, which may be indicative of an inherent inclination for simultaneous
incidence across all product categories. Building upon the dichotomous choice
models to explore the effect of time-sensitive price elasticity of multi-category
purchasing incidence, multivariate additive risk models can be employed (Ma,
Seetharaman, & Narasimhan, 2005) to model the purchasing frequency variable.
Another research direction emphasises the prominence of underlying product
attributes to describe consumer multi-category purchasing behaviour (Chung &
Rao, 2003), while studying the bundled multi-category products. It should be
apparent that these and other comparable approaches are able to contribute
incrementally to the understanding of the multi-category purchasing incidence,
and these learnings could potentially be amalgamated to develop a cohesive
integrated framework of multi-category purchasing incidence.
Cross-category brand choice models investigate the relevance of marketing mix
sensitivity and cross-category brand choice correlations in the context of
multi-category purchasing. Ainslie and Rossi (1998) employ decomposition methods
to differentiate between household-specific and category-specific components of
the marketing mix, and examine whether the inclination of the household towards
stability or variety in its brand choice carries over multiple product
categories. They discover high levels of cross-category correlations, suggesting
this is indeed the case, and propose that those households which exhibit strong
brand loyalty in their choice behaviour over extended time periods in one
category are likely to exhibit similar behaviour across categories and not
pursue brand variety in another category. This notion could become rather useful
in a situation where
the information about consumer behaviour is only available for a particular
product category and is limited or inadequate for other categories – because of
the high degree of correlations of household price coefficients across product
categories, it is possible to extrapolate and predict the consumer behaviour and
purchasing information from a well-understood product category to another
related less-understood product category (Iyengar, Ansari, & Gupta, 2003; Singh,
Hansen, & Gupta, 2005). Modelling cross-category correlations of household
brand volume purchasing shows high consistency in choice of store and some but
not all national brands (Russell & Kamakura, 1998), and branded products show
high correlation across multiple categories (Erdem, 1998).
In an attempt to combine some of these models and develop a more comprehensive
approach to cross-category purchasing behaviour, computational requirements
would be substantial and would as a result require a certain set of theory-driven
restrictions to be imposed. For a flexible statistical model with unlimited
parameters, a better computational framework is required. Perhaps one way to structure the
approach in such a way that it would be able to offer a reasonable account of
multiple and cross-category consumer choice is to decompose the system of
individual consumer behaviour into multiple sub-systems that would model
consumption for a specific product category, while at the same time interacting
with other sub-systems that represent the consumer choice for a different
category of the same individual consumer, and also interact on a higher level with
other composite systems that represent other consumers, and the elements of
the consumer decision-making situation in a market environment. The swarm
intelligence methods could perhaps suggest a plausible solution to these
concerns, as discussed in the following sections.
8.3 Swarm intelligence methods
The term swarm intelligence, first introduced by Beni (1993), commonly refers
to a concept that can be described as the collective behaviour of natural or
artificial stand-alone self-organised sub-systems or agents. The inspiration for
artificial swarm intelligence came from biological systems encountered in nature such as
ant colonies and bee hives which typically consist of a population of simple agents
interacting with each other and their environment. The individual agents are said
to follow simple procedures, and while the centralised control structure that
would specify how stand-alone agents should behave collectively is effectually
absent, the global collective behaviour that can be said to be intelligent can
eventually be observed to emerge as a result of sequences of random local
interactions of the stand-alone agents between themselves and the environment.
Artificial swarm intelligence systems and algorithms have been adopted to utilise
these principles and used in predictive analytics in the context of forecasting
tasks.
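The idea of collective behaviour emerging from simple local rules without central control can be illustrated with a minimal sketch. The example below is not a model from this research project: the ring arrangement, the averaging rule, the number of agents, and the initial values are all assumptions chosen for illustration, and the interactions are deterministic rather than random so that the result is reproducible. Each agent only ever looks at its two neighbours, yet the whole population converges on the global mean.

```python
def step(values):
    """Each agent replaces its value with the mean of itself and its two ring neighbours."""
    n = len(values)
    return [
        (values[(i - 1) % n] + values[i] + values[(i + 1) % n]) / 3.0
        for i in range(n)
    ]


def run(values, iterations):
    """Apply the local averaging rule repeatedly; no agent sees the whole population."""
    for _ in range(iterations):
        values = step(values)
    return values


# Hypothetical initial states for 20 agents (for example, individual estimates).
initial = [float((7 * i) % 20) for i in range(20)]
final = run(initial, 500)

spread_before = max(initial) - min(initial)
spread_after = max(final) - min(final)
print(spread_before, spread_after)
```

The spread between agents shrinks towards zero while the population mean is preserved, so the agreed value is a property of the collective rather than of any individual agent.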
The universal prosperity of biological swarm systems such as ants, bees, or
termites could be attributed to a number of wide-ranging characteristics: (1) the
flexibility required to adapt to a changing environment, (2) the collective
robustness of the system that diminishes reliance on individual contributing
agents and offers the effect of graceful degradation in the worst case scenario,
and (3) the principle of self-organisation that eliminates the requirement of a
global control and local supervision. These characteristics are general enough to
be applicable beyond the world of biological systems, and can be desirable
descriptors of systems and organisations not only from the academic research
point of view, but could also bring substantial rewards if applied in the industry as
well. Without a doubt, there should be no issue with embracing the first two
principles almost universally, whereas the third principle may seem
counterintuitive to the traditionally established ways of working and
hierarchically structured systems in place. Nevertheless, the success of biological
systems and the effective pioneering applications of the swarm intelligence
principles in the industry should promote the consideration and acceptance,
while encouraging growing interest to develop these systems further.
The usefulness of swarm intelligence should not be limited to the obvious
application of biological foraging models to inspire specific types of optimisation
algorithms that are able to provide superior solutions with large complex tasks
that deal with high levels of ambiguity, such as ant colony optimization, a
probabilistic optimization algorithm (Dorigo, 1992), or artificial bee colony, an
optimization algorithm based on the foraging behaviour of a swarm of honey bees
(Karaboga, 2005). It could also be successfully engaged to provide
robust solutions with more general tasks, such as complex problem
decomposition based on the principle of flexible specialised labour allocation in
the colony or hive. The captivating notion here is of course the reliance on the set
of relatively simple operational rules that govern the local behaviour of individual
stand-alone agents, successful application of which would result in emergent
complex collective intelligence behaviour.
Consequently, at least to some extent for artificial systems, the key problem
revolves around determining and accurately defining these sets of rules to
facilitate this self-organised global intelligence behaviour, which of course
naturally emerged and was refined in biological systems over millennia in the
process of natural evolution. With complex systems, it may be extremely difficult
to predict and subsequently assess the modelled outcome of final global
emergent behaviour based on the set of simple rules – especially if it is an
adaptive dynamic system designed to respond quickly to undefined
environmental input factors. A number of key principles to keep in mind while
considering a swarm intelligence system are as follows: first, very simple rules are
able to generate unpredictable and often counterintuitive collective behaviours;
second, seemingly minor modifications to these simple sets of rules can result in
radically altered collective behaviours; and third, even though the complex task
of predicting collective behaviour may very well be beyond the extent of human
intelligence capacity, it is nevertheless possible to observe and
predict the modelled collective behaviour in simulated artificial systems. These
artificial systems can prove to be invaluable tools to advance and improve our
understanding of collective behaviour, and ultimately help predict what collective
behaviours would emerge within a certain set of constraints. In an organisational
setting, such systems may be used to predict how simple sets of rules
can affect staff knowledge acquisition rates, productivity, loyalty, personnel
turnover, and so on, many of which would also apply to the consumer environment.
Indeed, as the global organisations continue to evolve following the technological
and communication advancements and ever increasing reliance on the user-
provided content and input in the product design and innovation processes, the
very notion of what constitutes an organisation and its ways of working may be
redefined in the future – and swarm intelligence methods could play an
imperative role to establish the decentralised self-organising governance rules for
such organisations. Another essential consideration when assessing the efficiency
and effectiveness of particular rules in a swarm intelligence system – biological or
artificial – is the activation mechanism employed to communicate and transfer
information between the stand-alone agents or sub-systems, particularly in
relation to the overall aim and desired outcome that the emergent swarm
intelligence should strive to develop and maximise.
Nevertheless, there are numerous obstacles to adopting swarm intelligence
methods: the mechanisms and structure of swarm intelligence may be difficult to
understand for those who are unfamiliar with self-organising systems,
the emergent nature of collective behaviour could be increasingly complex to
describe and predict, and drawing parallels between groups of individuals such as
human consumers and swarms of insects may not be an appealing concept
from the social point of view. Even so, in certain circumstantial
environments, the collective human behaviour is constrained in a similar manner
to the biological swarm systems, and parallels can be drawn to illustrate not only
the conceptual, but also practical implications that may be useful to optimise the
process and behaviour to achieve a higher level of organisation and strategic
governance. The swarm intelligence methods discussed thus far would of course
be largely applicable when it comes to the discussion of consumer behaviour in
general as well, and could be useful to optimise, facilitate, and cultivate
certain types of desirable behaviours, be that to benefit the consumer, to
maximise the returns of the company, or in an optimal situation to provide a
substantial benefit to all parties involved, perhaps as a result of an
optimisation programme. One way these algorithms and general approaches could
potentially be employed to extend the work undertaken in this research project
is to develop a multitude of stand-alone connectionist networks to represent
individual consumers and their purchasing behaviour across all available
categories. The dataset employed here utilised but a single product category
from a larger dataset that covers a wider set of consumption behaviours across
a large number of product categories, which would make it possible to broaden
the scope substantially compared to the rather focused single-category
prototypical consumer approach employed here. These stand-alone
connectionist networks could be set to interact with each other within the
environment that can be described employing the purchasing context variables.
This would arguably represent an artificial system that can better describe the
real consumer setting and the purchasing decision-making process within the
context of the purchasing setting that would serve as constraints for the stand-
alone agent interactions, and can be employed for both predictive and
explanatory purposes. Allowing stand-alone agents that represent individual
consumers to interact within a consumer setting could illustrate how consumers
can influence each other in the process – something that is often ignored or
overlooked in marketing research where consumers are taken to operate in a
vacuum devoid of any consumer interaction and competition variables. One of
the reasons for that is the complexity involved in attempts to provide
an adequate account of the continuity of behaviour and the learning history as
discussed in detail in the previous chapters.
8.4 Consumer provided content and design
The constantly evolving technological advancements facilitate the emergence of
truly global economies and organisations by continuously removing the obstacles
that traditionally and historically hindered this progression. Digital media was
able to unlock previously unattainable potential to bring together functionally
distinct areas of interest and expertise to facilitate the development of collective
cognitive problem-solving faculties.
If collective human behaviour can be characterised as a set of persons interacting
with each other for prolonged periods of time (for an extended discussion on
collective behaviour please refer to Krause & Ruxton, 2002), organisations and
consumer groups would constitute a prime example of a setting suitable for the
emergence of swarm intelligence as it provides an opportunity to solve large
scale problems through the facilitation of collective cognitive problem-solving
abilities which quite possibly would otherwise be unattainable to individual
contributors. As opposed to simpler organisms such as ants or bees, which
for the most part display largely uniform levels of performance, humans show not
only a high level of inter-individual variability in performance: there
are also those individuals who are able to display superior levels of performance
overall across multiple measurement criteria. Therefore, a large body of
psychological research on collective decision-making has revolved around the
concept of comparative assessment: namely, studying whether collective
decision-making would be able to provide adequate or better results than
individuals with superior decision-making abilities (as first demonstrated
empirically by Galton, 1907). Because of this, swarm intelligence at times could
be positioned as an alternative and even a threat to expert centres – indeed, one
potential concern is that it can contribute to decomposition and erosion of the
expert centres, as diverse groups of individuals that do not normally possess
expert knowledge are able to show comparable level of performance with the
level exhibited by the experts. This is however incorrect, as swarm intelligence is
more appropriate for a particular type of task that requires a large number of
uncorrelated imprecise estimates that eventually result in a close approximation
of the actual value, whereas tasks with a large systematic bias that prevents
extraction of usable information to solve the task are better suited to
centres of expertise. As most tasks in an organisational or any other setting would
incorporate a degree of both bias and imprecision, it is imperative to identify and
select the tasks that are appropriate for swarm intelligence methods, and some
qualitative characteristics are important to keep in mind in the process (Krause,
Ruxton, & Krause, 2010).
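The distinction drawn above between imprecision and systematic bias can be illustrated with a small simulation. This is a sketch under stated assumptions rather than an analysis from this research project: the true value, the noise level, the size of the shared bias, and the function name are all hypothetical. Each guess is modelled as the true value plus a bias shared by everyone plus independent individual noise.

```python
import random
from statistics import mean


def crowd_estimate(true_value, n, noise_sd, bias, seed):
    """Average of n independent guesses: truth + shared bias + individual noise."""
    rng = random.Random(seed)
    guesses = [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]
    return mean(guesses)


TRUE_VALUE = 1000.0  # hypothetical quantity being estimated

# Same crowd and same individual noise, with and without a shared bias.
unbiased = crowd_estimate(TRUE_VALUE, n=10_000, noise_sd=200.0, bias=0.0, seed=1)
biased = crowd_estimate(TRUE_VALUE, n=10_000, noise_sd=200.0, bias=150.0, seed=1)

print(abs(unbiased - TRUE_VALUE))  # small: uncorrelated noise averages out
print(abs(biased - TRUE_VALUE))    # stays close to 150: shared bias does not
```

Averaging many uncorrelated estimates removes most of the individual imprecision, but no amount of averaging removes an error that all contributors share, which is where expert knowledge remains necessary.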
It is often the case that swarm intelligence task groups follow the composition
principle based on functional, methodological, and epistemological diversity in
an attempt to maximise the potential performance. It should be obvious how this
potentially could considerably improve some types of performance: for example,
programmes on information gathering from broadly diverse individual
contributors facilitated by technological advances would enable accessing and
processing collective knowledge and learning histories on an unprecedented level.
Consumer provided content and design are the two areas where organisations
gradually adopt swarm intelligence methods to generate insightful and actionable
information by making the most of what brand communities are able to offer
(Fournier & Lee, 2009). Numerous interactive open-access web-based platforms
have recently become available that enable any type of discussion and
problem-solving, where vast numbers of individuals are able to contribute in a
process that evolves into a partially self-organised and decentralised system. A
good example is the ongoing process of producing and constantly improving and
expanding the open-source statistical programming language and programming
environment R, which is extensively employed here as the method of choice for all
statistical modelling. It could be viewed as one of the great examples of swarm
intelligence methods, where a type of self-organised and partially decentralised
system emerges as a result of a continuous deliberate collective effort from many
stand-alone individual contributors.
Innovation is another set of such tasks that normally deal with high levels of
imprecision, and could be particularly suitable for the swarm intelligence
methods as discussed in the following paragraphs.
8.5 Collective consumer innovation
Recent advances in computer technologies make it seemingly effortless to access
global populations, and facilitate information gathering by enabling the process
to engage with the collective knowledge and creativity of an enormous number of
individuals in a dynamic and interactive manner that allows a collective process of
simultaneous collaboration. Collective consumer decision-making that became
more evident lately as a result of changing social environment is increasingly
accepted as a driving force behind some of the companies from the digital age
who recognise that collective involvement of consumer communities could be
treated as a novel form of a low-cost resource (McConnell & Huba, 2007). Indeed,
with the recent upsurge in the number of open-source projects,
progressively higher numbers of consumers who are involved in product
development and feature modification are being recognised as an essential
part of a collective creative process (Von Hippel, 2005). As an added benefit, this
dynamic collaborative creative environment contributes to the traditional word
of mouth marketing, while being facilitated and amplified by sophisticated web-
based solutions. The distinctive borders between the process of consumption and
that of creation and production are becoming increasingly blurred, as the notion
of collective consumer innovation allows the immersed organisation of consumer
purchasing decision-making and creative innovation to emerge as a combination
of the two.
Collective consumer innovation refers to the process that is said to occur when
consumers discover new interpretations in the collaborative process of social
interaction within a consumer group – something that would be impossible to
achieve while thinking on their own: the varying range of backgrounds and
experiences should offer increased probability to identify ideas fit to resolve a
particular consumption-related task, whereas the accumulating collective
experience and knowledge would help establish potential solution selection
criteria and mechanisms to develop and realise the idea to be subsequently
propagated and promoted to wider consumer populations utilising the collective
network (Hargadon & Bechky, 2006). For organisations, the appropriate approach
may be to position themselves as a part of the creative community rather than
attempt to manage the collective creative process. This requires an appropriate
technological framework to enable the fostering of the social and cultural fabric of
the collective consumer innovation community (Kozinets, Hemetsberger, &
Schau, 2008). Some communities occupy ethical and sustainable consumption
viewpoints, something that could perhaps be accounted for as an additional set
of constraints or selection criteria when modelled using swarm
intelligence methods or connectionist networks.
8.6 Commercial application and data mining
It would be important to discuss the commercial application of course, as applied
deployment in the industry would not only be an excellent practical test to
validate the model output with previously unseen real-life consumer data, but
would also expand the applied dimension that has the potential to generate
innovative insight and suggest novel lines of inquiry for any future research.
Moreover, some of the valuable potential data sources compiled and maintained
by certain industries highly reliant on data may otherwise be inaccessible. A few
potential examples of industries that rely heavily on efficient and effective
information collection and organisation, and that could benefit substantially
from integrated connectionist network solutions, are discussed very briefly in
the following paragraphs.
The obvious candidate for connectionist modelling of consumer behaviour is of
course the financial industry that historically deals with collecting large amounts
of data and information to be used in risk modelling and similar purposes. Big
data enhanced by the connectionist modelling could potentially carry huge
benefits, financial and otherwise: for example, each regular bankcard payment
could be enhanced with a cluster of data describing in detail the list of
purchased items, containing full product attributes, quantities purchased,
trade channel used, and purchasing environment details. This would make it
possible to offer consumers a full account of purchases to improve the
understanding of their spend, improve the tracking and prevention of
unauthorised and fraudulent use, qualitatively improve the account history
tracking and define the relationship with the bank,
improve the accuracy of forecasting the individual financial performance and
behaviour and likely use of any financial services in the future, simplify the credit
decision process, and so on.
New customer acquisition and current customer retention with attrition
minimisation are the two major ongoing strategic marketing programmes that
any marketing focused company should devote a significant amount of resources
and effort to manage. Predictive analytics and forecasting modelling are
commonly the areas of focused effort where the capacity of connectionist
modelling and big data analytics could substantially improve the consumer
targeting efforts.
The technological advancements consistently provide cheaper and more efficient
methods to collect and store information, and the amount of data being created
has been growing exponentially for a number of decades now. Integration of R
functionality into large-scale applications such as the latest release of de facto
data industry standard SQL database management software represents a
considerable advancement towards improving and developing the data mining
capacity. Integrated big data analytics solutions could revolutionise the way the
information is collected and processed on a large scale to produce actionable
insights useful to inform the process of strategic decision-making.
Any human activity inevitably produces waste – managing and minimising the
waste could provide not only the benefit of smaller ecological footprint in market
conditions, which are to become increasingly more regulated and therefore
costly in the future, but also the immediate financial benefits of improved return
on investment and cash flow management. One obvious example would be direct
mail acquisition campaigns that produce printed materials, which are then
delivered by post. These campaigns are extremely wasteful: response rates
as low as 1% are to be expected, while the remaining 99% of all printed and
delivered materials are normally disposed of as waste. Improving the predictive
analytics to allow better consumer targeting would not only reduce the
amount of printed material being systematically wasted, but would also reduce the
marketing costs.
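The arithmetic behind this argument can be made explicit with a short sketch. Only the 1% response rate comes from the text above; the campaign sizes and the performance of the targeted campaign are assumed figures for illustration, and the function name is hypothetical.

```python
def campaign_summary(pieces_mailed: int, responders: int) -> dict:
    """Summarise a direct mail campaign: response rate and wasted pieces."""
    if responders > pieces_mailed:
        raise ValueError("responders cannot exceed pieces mailed")
    return {
        "response_rate": responders / pieces_mailed,
        "wasted_pieces": pieces_mailed - responders,
    }


# Untargeted baseline using the 1% response rate mentioned in the text.
baseline = campaign_summary(pieces_mailed=100_000, responders=1_000)

# Hypothetical targeted campaign: a predictive model selects 25,000 recipients
# and still reaches 800 of the 1,000 responders (assumed figures).
targeted = campaign_summary(pieces_mailed=25_000, responders=800)

print(baseline)
print(targeted)
```

Under these assumed figures the targeted campaign gives up a fifth of the responses but avoids roughly three quarters of the wasted material, which is the kind of trade-off that better predictive analytics would allow a company to evaluate.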
Classic marketing and consumer behaviour disciplines historically have been
largely focused on the purchasing stage of the total consumption process – as is
obviously the case in this research project as well. This is of course the case for a
number of reasons, one of which is that the consumer purchasing decision is
effectively complete when the contract between the buyer and the seller is
made, and the money is paid in exchange for product ownership. The
purchasing stage however does not provide a comprehensive account of
environmental and social aspects of a total consumption process, and
increasingly large numbers of organisations strive to develop measurement
criteria and methods to assess the sustainability of their customer base as a
strategic long-term sustainable growth solution, where integrated data would be
particularly useful.
8.7 Summary
Even though some of the methodologically innovative approaches discussed in
this chapter, like swarm intelligence, could be difficult to implement and
challenging to adopt, the potential benefits outweigh those few concerns that are
voiced. For example, the role of expert centres would not diminish and cease to
exist, but rather evolve towards the type of expertise that facilitates and
cultivates the benefits in decision-making that a swarm intelligence method could
offer. Different levels to structure and carry out consumer behaviour analysis
were also briefly discussed in this chapter, concluding with the overview of
possible commercial applications.
9. Concluding remarks
Although this research project at times strived to venture into grander
philosophical concepts such as intelligence, cognition, and artificial intelligence,
the main scope of the project was intentionally constrained to the boundaries of
the field of consumer behaviour – boundaries which themselves could be rather
blurry at times and generally welcome participation of a number of disciplines:
psychology, marketing, economics, artificial intelligence as is the case here, and
many others depending on the particular application. In these final pages, it
would be worthwhile to revisit these broader issues and reiterate particular
points made throughout the work presented here.
9.1 Contributions
One of the first points that this research project examined was the comparative
examination that assessed how connectionist neural network models measure up
against traditionally employed methods of analysis such as logistic regression. Not
only the predictive, but also the explanatory dimension was of interest. It was
shown that simple connectionist models that do not incorporate hidden layers
provide connection weights that are analogous to the coefficient values of the
regression, suggesting that the two methods provide an analogous level of
performance when it comes to predictive capacity. When it comes to the
explanatory capacity however, the regression is already in its final form and does
not provide any means for further development, whereas the connectionist model is
only in its most primitive form, and can be developed substantially by
incorporating hidden layers and growing the network.
The second point is the fact that this research project was able to directly contribute
to the development and advancement of a number of statistical programming
packages in R, namely RSNNS and NeuralNetTools, which are now available for all
researchers to use and potentially contribute to the understanding of
connectionist frameworks going forward.
Third, a number of advanced connectionist models of consumer behaviour have
been developed, ranging in size and complexity of the architecture, which
provide empirical evidence to corroborate previous research findings and help
explain and guide further research.
Fourth, a simulated dataset was developed to assess the efficiency and accuracy
of the pruning algorithms. It showed clearly, and provided convincing empirical
evidence for, the effectiveness of pruning algorithms in optimising the network
architecture to expose the core underlying architecture, in an attempt to
improve the exploratory and interpretative function of a
connectionist model.
Fifth, once it was evident that pruning algorithms work very well and as designed,
varying numbers of iterations and retrain cycles were examined in an attempt to
investigate the level of influence these parameters are able to exert over the
overall connectionist model capacity to predict and explain consumer behaviour,
which could be further explored in a strand of research that would focus on
network architecture optimisation to minimise the level of resources required to
run these advanced statistical models.
Sixth, the initial network architecture designs were examined to explore and
compare the subsequent network learning process and final form. This could be a
useful line of research to extend in an attempt to identify the optimal learning
methods and initial network architecture design for a particular problem.
Seventh, pruning algorithms were employed to optimise the network
architecture for explanatory and interpretative purposes – it is argued here that
this offers an alternative method to a variable contribution analysis, and could be
a particularly useful technique to explore complex phenomena such as consumer
decision-making as an emergent process.
Eighth, research carried out here provides empirical evidence to support the
proposition that connectionism is a robust and coherent approach that is
particularly suited to extend the theoretical framework of BPM.
Ninth, connectionist models developed here provide empirical evidence that
informational and utilitarian reinforcement can be observed as an emergent
process by means of distributed representation – an inherent functionality of
connectionist networks to embody complex phenomena.
Tenth, this research programme reinforced a tendency towards
addressing the notion of continuity of behaviour from the radical behaviourism
point of view with the adoption of intentional behaviourism.
Eleventh, the theoretical and methodological deliberations contemplated here
propose a general structure to promote the move towards addressing the notion
of learning history as part of overall consumer behaviour process – naturally
there is a large amount of future work required here.
Twelfth, a detailed overview of the field of artificial intelligence in relation to
the process of consumer behaviour is offered here, in an attempt to encourage
future symbiotic collaborative efforts that could advance the developments in
both fields of study.
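The first contribution listed above, that a connectionist model without hidden layers yields weights analogous to regression coefficients, can be illustrated with a minimal sketch. It is written in plain Python rather than with the R packages used in this research project, and it rests on stated assumptions: a single sigmoid output unit, a cross-entropy loss, full-batch gradient descent, and simulated data generated from hypothetical coefficients. Under these assumptions the network is the same model as a logistic regression, so its weights estimate the same coefficients. The configuration actually used with RSNNS may differ.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_data(n, true_w, true_b, seed):
    """Simulate binary outcomes from a known logistic model (assumed coefficients)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in true_w]
        p = sigmoid(true_b + sum(w * v for w, v in zip(true_w, x)))
        xs.append(x)
        ys.append(1.0 if rng.random() < p else 0.0)
    return xs, ys


def train_single_unit(xs, ys, epochs=500, lr=0.5):
    """Full-batch gradient descent on cross-entropy for one sigmoid output unit."""
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            # Prediction error of the unit for this observation.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - y
            grad_b += err
            for j in range(d):
                grad_w[j] += err * x[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


# Hypothetical generating coefficients: 2.0 and -1.0, with intercept 0.5.
xs, ys = make_data(n=2000, true_w=[2.0, -1.0], true_b=0.5, seed=0)
weights, bias = train_single_unit(xs, ys)
print(weights, bias)
```

The learned connection weights recover the sign and approximate size of the generating coefficients, which is what a logistic regression fitted to the same data would estimate. Adding hidden layers is the step that takes the connectionist model beyond this equivalence.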
9.2 Summary
In the first chapter, the research project was briefly introduced, outlining the
contents and structure. The overall motivation for the research project was
presented, offering the scope and the summary of the work to be carried out.
The next two chapters focused on the fields of consumer behaviour and artificial
intelligence, and offered a comprehensive review that also touched upon a
number of related disciplines in the course of inquiry. The multidisciplinary nature of
the work described here embraced the philosophical aspects of critical behaviourism
and cognitive sciences, considered and reviewed modelling approaches that
follow both traditional symbolic and connectionist neural network designs, and
contemplated the theoretical frameworks that propose to extend the theory of
the Behavioural Perspective Model into the realm of connectionist architectures. For
that reason, an extended discussion that overviews the field of consumer
behaviour was offered, followed by an overview of the theoretical and philosophical
frameworks of radical behaviourism, and then by a detailed description of the
Behavioural Perspective Model as a next step towards an interpretative model of
consumer behaviour. This was followed by an extensive discussion of
the science of the artificial, introducing the field of artificial intelligence: the
predictive and explanatory capacity of symbolic modelling methods was
discussed at length and compared against the connectionist neural network
approach, describing in detail the techniques and architecture optimisation
algorithms employed in neural network models. Chapter 4 focused on the
research methods and outlined in detail the research questions, the research
methods employed in the course of the research project, and the philosophical
position adopted here. This was followed by sections that described the methods,
dataset structures and variables, modelling techniques, and the sequence of the
research process. Once the research methods were clearly explained, the next
chapter described the modelling that was carried out here, and offered
a detailed account of the statistical analyses developed throughout this research
project. Specific testing procedures to advance the line of inquiry were explained
in detail, and an overview of the results was offered. Chapter 6 discussed the findings,
and offered an interpretative account of the results within the wider context of
consumer behaviour. Variable contribution analysis as a method to improve the
descriptive functions of the modelling was discussed in light of employing the
advanced connectionist modelling method, posing an argument that
the connectionist neural network approach is particularly appropriate to provide a
comprehensive explanatory and interpretative account of consumer behaviour,
where pruning algorithms were employed to optimise the network
and expose its core architecture. This was then followed by a discussion of
the theoretical implications. A critical assessment of the research project was
offered in the following chapter to demonstrate precision, thoroughness, and
level of contribution in comparison with its closest rival, the tradition of cognitive
science. Chapter 8 was developed around possible future research directions,
which were briefly discussed, identifying a number of possible strands of inquiry
that ranged from commercial ones, which aim to apply and test the methods
proposed here in an applied industry setting, to more theoretical and
philosophical endeavours that aim to explore the concept of distributed
representation further and potentially extend the line of inquiry into
the field of swarm intelligence. The final chapter offered closing remarks,
revisiting the contributions this research project aims to offer, and concluding
with an overall summary of the project.
References
Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A learning algorithm for Boltzmann
machines. Cognitive Science, 9(1), 147-169.
Adya, M., & Collopy, F. (1998). How effective are neural networks at forecasting and
prediction? A review and evaluation. Journal of Forecasting, 17, 481-495.
Ainslie, A., & Rossi, P. E. (1998). Similarities in choice behavior across product categories.
Marketing Science, 17(2), 91-106.
Alvesson, M., & Deetz, S. (2000). Doing critical management research. London: Sage.
Ametller, L., Garrido, L., Stimpfl-Abele, G., Talavera, P., & Yepes, P. (1996). Discriminating
signal from background using neural networks: Application to top-quark search
at the Fermilab Tevatron. Physical Review D, 54(1), 1233.
Anderson, J. A., & Hinton, G. E. (1981). Models of information processing in the brain.
Parallel models of associative memory, 9-48.
Anderson, J. R. (1974). Retrieval of propositional information from long-term memory.
Cognitive psychology, 6(4), 451-474.
Anderson, J. R. (1981). Cognitive skills and their acquisition: Psychology Press.
Anderson, J. R. (1983a). The architecture of cognition. Cambridge, Mass.: Harvard
University Press.
Anderson, J. R. (1983b). A spreading activation theory of memory. Journal of verbal
learning and verbal behavior, 22(3), 261-295.
Anderson, J. R. (1987). Skill acquisition: Compilation of weak-method problem situations.
Psychological review, 94(2), 192.
Anderson, J. R., & Pirolli, P. L. (1984). Spread of activation. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 10(4), 791.
Anderson, P. F. (1983). Marketing, scientific progress, and scientific method. The Journal
of Marketing, 47(4), 18-31.
Anderson, P. F. (1986a). On method in consumer research: a critical relativist
perspective. Journal of Consumer Research, 13(2), 155-173.
Anderson, P. F. (1986b). On method in consumer research: a critical relativist
perspective. The Journal of Consumer Research, 13(2), 155-173.
Anderson, P. F. (1988a). Relative to what - that is the question: a reply to Siegel. Journal
of Consumer Research, 15(1), 133-137.
Anderson, P. F. (1988b). Relativism revidivus: in defense of critical relativism. Journal of
Consumer Research, 15(3), 403-406.
Andersson, F. O., Åberg, M., & Jacobsson, S. P. (2000). Algorithmic approaches for
studies of variable influence, contribution and selection in neural networks.
Chemometrics and intelligent laboratory systems, 51(1), 61-72.
Aoki, I., & Komatsu, T. (1997). Analysis and prediction of the fluctuation of sardine
abundance using a neural network. Oceanologica Acta, 20(1), 81-88.
Armstrong, S. L., Gleitman, L. R., & Gleitman, H. (1983). What some concepts might not
be. Cognition, 13(3), 263-308.
Barsalou, L. W. (1983). Ad hoc categories. Memory & cognition, 11(3), 211-227.
Bashford, S. (2009). Bring me sunshine. Marketing, 32-33.
Bates, E., Bretherton, I., & Snyder, L. (1988). From first words to grammar: Cambridge:
Cambridge University Press.
Beck, M. (2015). NeuralNetTools: Visualization and Analysis Tools for Neural Networks.
Beni, G., & Wang, J. (1993). Swarm intelligence in cellular robotic systems. In Robots and
Biological Systems: Towards a New Bionics? (pp. 703-712): Springer.
Bergmeir, C., & Benítez, J. M. (2012). Neural networks in R using the Stuttgart neural
network simulator: RSNNS. Journal of Statistical Software, 46(7), 1-26.
Bernard, H. R. (1995). Research methods in anthropology: Qualitative and quantitative
approaches. Walnut Creek, CA: AltaMira Press.
Bishop, C. M. (1995). Neural networks for pattern recognition: Oxford university press.
Bloom, L. (1970). Language development: Form and function in emerging grammars. MIT
Research Monograph, No 59.
Bryman, A., & Bell, E. (2007). Business research methods. Oxford; New York: Oxford
University Press.
Bybee, J. L., & Slobin, D. I. (1982). Rules and schemas in the development and use of the
English past tense. Language, 265-289.
Calder, B. J., & Tybout, A. M. (1987). What Consumer Research Is. Journal of Consumer
Research, 14(1), 136-140.
Chen, D., & Ware, D. (1999). A neural network model for forecasting fish stock
recruitment. Canadian Journal of Fisheries and Aquatic Sciences, 56(12), 2385-
2396.
Chib, S., Seetharaman, P., & Strijnev, A. (2002). Analysis of multi-category purchase
incidence decisions using IRI market basket data. Advances in Econometrics, 16,
57-92.
Chintagunta, P. K. (1993). Investigating purchase incidence, brand choice and purchase
quantity decisions of households. Marketing Science, 12(2), 184-208.
Chomsky, N. (1957). Syntactic Structures. The Netherlands.
Chomsky, N. (1959). A review of BF Skinner's Verbal Behavior. Language, 35(1), 26-58.
Chomsky, N. (1968). Language and mind. Psychology Today, 1(9), 48-&.
Chung, J., & Rao, V. R. (2003). A general choice model for bundles with multiple-category
products: Application to market segmentation and optimal pricing for bundles.
Journal of Marketing Research, 40(2), 115-130.
Cibas, T., Soulié, F. F., Gallinari, P., & Raudys, S. (1996). Variable selection with neural
networks. Neurocomputing, 12(2–3), 223-248.
Cleeremans, A., Servan-Schreiber, D., & McClelland, J. L. (1989). Finite state automata
and simple recurrent networks. Neural computation, 1(3), 372-381.
Coltheart, M., Curtis, B., Atkins, P., & Haller, M. (1993). Models of reading aloud: Dual-
route and parallel-distributed-processing approaches. Psychological review,
100(4), 589.
Cornwell, B., Chi Cui, C., Mitchell, V., Schlegelmilch, B., Dzulkiflee, A., & Chan, J. (2005). A
cross-cultural study of the role of religion in consumers' ethical positions.
International Marketing Review, 22(5), 531-546.
doi:10.1108/02651330510624372
Crosas, M., King, G., Honaker, J., & Sweeney, L. (2015). Automating Open Science for Big
Data. The ANNALS of the American Academy of Political and Social Science,
659(1), 260-273.
Cunningham, L. F., Young, C. E., Moonkyu, L., & Ulaga, W. (2006). Customer perceptions
of service dimensions: cross-cultural analysis and perspective. International
Marketing Review, 23(2), 192-210. doi:10.1108/02651330610660083
Curry, B., Davies, F., Phillips, P., Evans, M., & Moutinho, L. (2001). The Kohonen self‐
organizing map: an application to the study of strategic groups in the UK hotel
industry. Expert systems, 18(1), 19-31.
Curry, B., & Moutinho, L. (1993). Neural Networks in Marketing: Modelling Consumer
Responses to Advertising Stimuli. European Journal of Marketing, 27(7), 5-20.
doi:10.1108/03090569310040325
Dennett, D. C. (1981). Brainstorms: Philosophical essays on mind and psychology: MIT
press.
Dennett, D. C. (2002). Content and consciousness: Routledge.
Denzin, N., & Lincoln, Y. (2000). The discipline and practice of qualitative research.
Handbook of qualitative research, 2, 1-28.
Despagne, F., & Massart, D.-L. (1998). Variable selection for neural networks in
multivariate calibration. Chemometrics and intelligent laboratory systems, 40(2),
145-163.
Dimopoulos, I., Chronopoulos, J., Chronopoulou-Sereli, A., & Lek, S. (1999). Neural
network models to study relationships between lead concentration in grasses
and permanent urban descriptors in Athens city (Greece). Ecological modelling,
120(2), 157-165.
Dorigo, M. (1992). Optimization, learning and natural algorithms. Ph. D. Thesis,
Politecnico di Milano, Italy.
Dreyfus, H. (1992). What computers still can't do: a critique of artificial reason: MIT press.
Dreyfus, H., Dreyfus, S. E., & Athanasiou, T. (2000). Mind over machine: Simon and
Schuster.
Durkheim, E. (1964). The division of labor in society. New York: Free Press.
East, R., Hammond, K., & Wright, M. (2007). The relative incidence of positive and
negative word of mouth: A multi-category study. International journal of
research in marketing, 24(2), 175-184.
Elman, J. L. (1989). Structured representations and connectionist models. Retrieved from
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179-211.
Erdem, T. (1998). An empirical analysis of umbrella branding. Journal of Marketing
Research, 339-351.
Feyerabend, P. (1975). Against method: Outline of an anarchistic theory of knowledge.
London: Verso.
Feyerabend, P. (1978). Science in a free society. London: NLB.
Feyerabend, P. (1987). Farewell to reason. London: Verso.
Fodor, J. A. (1975). The language of thought (Vol. 5): Harvard University Press.
Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: A critical
analysis. Cognition, 28(1), 3-71.
Foucault, M. (1995). Discipline and punish: the birth of the prison. New York: Vintage
Books.
Fournier, S., & Lee, L. (2009). Getting brand communities right. Harvard business review,
87(4), 105-111.
Foxall, G. R. (1990). Consumer psychology in behavioral perspective: Beard Books.
Foxall, G. R. (2004). Context and cognition: Interpreting complex behavior: New Harbinger
Publications.
Foxall, G. R. (2005). Understanding consumer choice: Palgrave Macmillan.
Foxall, G. R. (2009). Interpreting consumer choice: The behavioural perspective model:
Routledge.
Foxall, G. R. (2016). Addiction as Consumer Choice: Exploring the Cognitive Dimension:
Routledge.
Foxall, G. R., Wells, V. K., Chang, S. W., & Oliveira-Castro, J. M. (2010). Substitutability and
independence: Matching analyses of brands and products. Journal of
Organizational Behavior Management, 30(2), 145-160.
Foxall, G. R., Yan, J., Oliveira-Castro, J. M., & Wells, V. K. (2013). Brand-related and
situational influences on demand elasticity. Journal of Business Research, 66(1),
73-81.
Fullerton, R. A. (1987). Historicism: What it is, and what it means for consumer research.
Advances in Consumer Research, 14(1), 431-434.
Gallant, S. I. (1993). Neural network learning and expert systems: MIT press.
Galton, F. (1907). Vox populi (the wisdom of crowds). Nature, 75, 450-451.
Garson, D. G. (1991). Interpreting neural network connection weights.
Gevrey, M., Dimopoulos, I., & Lek, S. (2003). Review and comparison of methods to study
the contribution of variables in artificial neural network models. Ecological
Modelling, 160(3), 249-264.
Ghauri, P. N., & Grønhaug, K. (2005). Research methods in business studies: a practical
guide: Prentice Hall.
Goh, A. (1995). Back-propagation neural networks for modeling complex systems.
Artificial Intelligence in Engineering, 9(3), 143-151.
Goulding, C. (1999). Consumer research, interpretive paradigms and methodological
ambiguities. European Journal of Marketing, 33(9/10), 859-873.
Greene, M. N. (2011). Behavioural Perspective Model: A connectionist view (Master’s
thesis). Cardiff University, Cardiff, UK.
Greenland, S. (1989). Modeling and variable selection in epidemiologic analysis.
American journal of public health, 79(3), 340-349.
Guégan, J.-F., Lek, S., & Oberdorff, T. (1998). Energy availability and habitat
heterogeneity predict global riverine fish diversity. Nature, 391(6665), 382-384.
Güneren, E., & Öztüren, A. (2008). Influence of Ethnocentric Tendency of Consumers on
Their Purchase Intentions in North Cyprus. Journal of Euromarketing, 17(3/4),
219-231.
Haas, J. (1987). Prospects for Consumer Research. Advances in Consumer Research, 14,
76-78.
Haigh, J., & Crowther, G. (2005). Interpreting motorcycling through its embodiment in
life story narratives. Journal of Marketing Management, 21(5/6), 555-572.
Hanson, N. R. (1958). Patterns of discovery: An inquiry into the conceptual foundations
of science: Cambridge University Press.
Hargadon, A. B., & Bechky, B. A. (2006). When collections of creatives become creative
collectives: A field study of problem solving at work. Organization Science, 17(4),
484-500.
Harris, A. J., & Hahn, U. (2011). Unrealistic optimism about future life events: a
cautionary note. Psychological review, 118(1), 135.
Hassibi, B., & Stork, D. G. (1993). Second order derivatives for network pruning: Optimal
brain surgeon: Morgan Kaufmann.
Hassibi, B., Stork, D. G., Wolff, G., & Watanabe, T. (1994). Optimal Brain Surgeon:
Extensions and performance comparison.
Hassibi, B., Stork, D. G., & Wolff, G. J. (1993). Optimal brain surgeon and general network
pruning. Paper presented at the Neural Networks, 1993., IEEE International
Conference on.
Hebb, D. O. (1949). The organization of behavior: A neuropsychological theory: John
Wiley & Sons.
Hebb, D. O. (2005). The organization of behavior: A neuropsychological theory:
Psychology Press.
Heidegger, M. (1988). The basic problems of phenomenology (Vol. 478): Indiana
University Press.
Heravi, S., & Morgan, P. (2014a). A comparison of six sampling schemes for price index
construction in a COICOP food group. Applied Economics, 46(22), 2685-2699.
Heravi, S., & Morgan, P. (2014b). Sampling schemes for price index construction: a
performance comparison across the classification of individual consumption by
purpose food groups. Journal of Applied Statistics, 41(7), 1453-1470.
Hildum, D. C., & Brown, R. W. (1956). Verbal reinforcement and interviewer bias. Journal
of Abnormal and Social Psychology, 53, 108-111.
Hinton, G. E. (1986). Learning distributed representations of concepts. Paper presented at
the Proceedings of the eighth annual conference of the cognitive science society.
Hinton, G. E., Sejnowski, T. J., & Ackley, D. H. (1984). Boltzmann machines: Constraint
satisfaction networks that learn: Carnegie-Mellon University, Department of
Computer Science Pittsburgh, PA.
Holbrook, M. B. (1987). What is consumer research? Journal of Consumer Research,
14(1), 128-132.
Holbrook, M. B. (1989). Seven routes to facilitating the semiological interpretation of
consumption symbolism and marketing imagery in works of art: Some tips for
wildcats. Advances in Consumer Research, 16(1), 420-425.
Holbrook, M. B., & Hirschman, E. C. (1982). The experiential aspects of consumption:
Consumer fantasies, feelings, and fun. The Journal of Consumer Research, 9(2),
132-140.
Holbrook, M. B., & O'Shaughnessy, J. (1988). On the scientific status of consumer
research and the need for an interpretive approach to studying consumption
behavior. Journal of Consumer Research, 15(3), 398-402.
Holland, J. H., Holyoak, K. J., Nisbett, R. E., & Thagard, P. R. (1986). Induction: Cambridge,
MA: MIT Press.
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective
computational abilities. Proceedings of the national academy of sciences, 79(8),
2554-2558.
Huang, Z., Chen, H., Hsu, C.-J., Chen, W.-H., & Wu, S. (2004). Credit rating analysis with
support vector machines and neural networks: a market comparative study.
Decision support systems, 37(4), 543-558.
Hudson, L. A., & Ozanne, J. L. (1988). Alternative ways of seeking knowledge in consumer
research. Journal of Consumer Research, 14(4), 508-521.
Hunt, S. D. (1990). Truth in marketing theory and research. Journal of Marketing, 54(3),
1.
Hunt, S. D. (1991). Positivism and paradigm dominance in consumer research: Toward
critical pluralism and rapprochement. Journal of Consumer Research, 18(1), 32-
44.
Hunt, S. D. (1993). Objectivity in marketing theory and research. Journal of Marketing,
57(2), 76.
Husserl, E. (1970). The crisis of European sciences and transcendental phenomenology:
An introduction to phenomenological philosophy: Northwestern University Press.
Husserl, E. (2012). Ideas: General introduction to pure phenomenology: Routledge.
Huttenlocher, J., Higgins, E. T., & Clark, H. H. (1971). Adjectives, comparatives, and
syllogisms. Psychological review, 78(6), 487.
Insko, C. A. (1965). Verbal reinforcement of attitude. Journal of Personality and Social
Psychology, 2, 621-623.
Iyengar, R., Ansari, A., & Gupta, S. (2003). Leveraging information across categories.
Quantitative Marketing and Economics, 1(4), 425-465.
Karaboga, D. (2005). An idea based on honey bee swarm for numerical optimization.
Retrieved from
Kaynak, E., & Kara, A. (2001). An examination of the relationship among consumer
lifestyles, ethnocentrism, knowledge structures, attitudes and behavioural
tendencies: a comparative study in two CIS states. International Journal of
Advertising, 20(4), 455-482.
Kehret-Ward, T. (1988). Using a semiotic approach to study the consumption of
functionally related products. International Journal of Research in Marketing,
4(3), 187-200.
Kerlinger, F. N. (1964). Foundations of behavioral research: Educational and psychological
inquiry. New York; London: Holt, Rinehart and Winston.
Kohonen, T. (1990). The self-organizing map. Proceedings of the IEEE, 78(9), 1464-1480.
Kohonen, T. (1998). The self-organizing map. Neurocomputing, 21(1), 1-6.
Kosslyn, S. M. (1980). Image and mind: Harvard University Press.
Kozinets, R. V., Hemetsberger, A., & Schau, H. J. (2008). The wisdom of consumer crowds:
collective innovation in the age of networked marketing. Journal of
Macromarketing, 28(4), 339-354.
Krause, J., & Ruxton, G. D. (2002). Living in groups: Oxford University Press.
Krause, J., Ruxton, G. D., & Krause, S. (2010). Swarm intelligence in animals and humans.
Trends in ecology & evolution, 25(1), 28-34.
Kuczaj, S. A. (1977). The acquisition of regular and irregular past tense forms. Journal of
verbal learning and verbal behavior, 16(5), 589-600.
Kuhn, T. S. (1996). The structure of scientific revolutions. Chicago ; London: University of
Chicago Press.
Kuhn, T. S. (2012). The structure of scientific revolutions: University of Chicago press.
Kumcu, E. (1987). An historical perspective framework to study consumer behavior and
retailing systems. Advances in Consumer Research, 14(1), 439-441.
LeCun, Y., Denker, J. S., Solla, S. A., Howard, R. E., & Jackel, L. D. (1989). Optimal brain
damage. Paper presented at NIPS.
Lek, S., Belaud, A., Baran, P., Dimopoulos, I., & Delacoste, M. (1996). Role of some
environmental variables in trout abundance models using neural networks.
Aquatic Living Resources, 9(01), 23-29.
Lek, S., Belaud, A., Dimopoulos, I., Lauga, J., & Moreau, J. (1995). Improved estimation,
using neural networks, of the food consumption of fish populations. Marine and
Freshwater Research, 46(8), 1229-1236.
Lek, S., Delacoste, M., Baran, P., Dimopoulos, I., Lauga, J., & Aulagnier, S. (1996).
Application of neural networks to modelling nonlinear relationships in ecology.
Ecological modelling, 90(1), 39-52.
Locke, J. (1997). An essay concerning human understanding. London ; New York: Penguin.
Long, A. (1998). Plato's Apologies and Socrates in the Theaetetus. Method in Ancient
Philosophy, 113–136.
Lu Hsu, J., & Han-Peng, N. (2008). Who are ethnocentric? Examining consumer
ethnocentrism in Chinese societies. Journal of Consumer Behaviour, 7(6), 436-
447. doi:10.1002/cb.262
Luger, G. F. (2005). Artificial intelligence: structures and strategies for complex problem
solving: Pearson education.
Lutz, R. J. (1989). Positivism, naturalism and pluralism in consumer research: Paradigms
in paradise. Advances in Consumer Research, 16(1), 1-8.
Ma, Y., Seetharaman, S., & Narasimhan, C. (2005). Empirical analysis of competitive
pricing strategies with complementary product lines. Available at SSRN 899606.
Maier, H. R., & Dandy, G. C. (1996). The use of artificial neural networks for the
prediction of water quality parameters. Water resources research, 32(4), 1013-
1022.
Manchanda, P., Ansari, A., & Gupta, S. (1999). The “shopping basket”: A model for
multicategory purchase incidence decisions. Marketing Science, 18(2), 95-114.
Margolis, H. (1987). Patterns, thinking, and cognition: A theory of judgment: University of
Chicago Press.
Marsden, D., & Littler, D. (1996). Evaluating alternative research paradigms: a market-
oriented framework. Journal of Marketing Management, 12(7), 645-655.
Mastrorillo, S., Lek, S., & Dauba, F. (1997). Predicting the abundance of minnow Phoxinus
phoxinus (Cyprinidae) in the River Ariege (France) using artificial neural
networks. Aquatic Living Resources, 10(03), 169-176.
Mastrorillo, S., Lek, S., Dauba, F., & Belaud, A. (1997). The use of artificial neural
networks to predict the presence of small‐bodied fish in a river. Freshwater
biology, 38(2), 237-246.
McClelland, J. L. (1979). On the time relations of mental processes: An examination of
systems of processes in cascade. Psychological review, 86(4), 287.
McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of context
effects in letter perception: I. An account of basic findings. Psychological review,
88(5), 375.
McClelland, J. L., Rumelhart, D. E., & Group, P. R. (1986). Parallel distributed processing.
Explorations in the microstructure of cognition, 2.
McConnell, B., & Huba, J. (2007). Citizen marketers: When people are the message:
Kaplan Pub.
McKee, A. (1984). Social economy and the theory of consumer behavior. International
Journal of Social Economics, 11(3/4), 45.
Medin, D. L. (1989). Concepts and conceptual structure. American psychologist, 44(12),
1469.
Microsoft-Corporation. (2006). Microsoft Office Excel 2007.
Minsky, M. (1975). A framework for representing knowledge.
Minsky, M., & Papert, S. (1969). Perceptrons. Cambridge, MA: MIT Press.
Moutinho, L., Davies, F., & Curry, B. (1996). The impact of gender on car buyer
satisfaction and loyalty: A neural network analysis. Journal of Retailing and
Consumer Services, 3(3), 135-144.
Nelson, K. (1973). Structure and strategy in learning to talk. Monographs of the society
for research in child development, 1-135.
Newell, A. (1994). Unified theories of cognition: Harvard University Press.
Newell, A., & Simon, H. A. (1976). Computer science as empirical inquiry: Symbols and
search. Communications of the ACM, 19(3), 113-126.
Newell, A., & Simon, H. A. (1981). Computer science as empirical inquiry: Symbols and
search. Mind Design, 41.
Nola, R. (1988). Relativism and realism in science. Dordrecht; Boston: Kluwer Academic
Publishers.
Nord, L. I., & Jacobsson, S. P. (1998). A novel method for examination of the variable
contribution to computational neural network models. Chemometrics and
intelligent laboratory systems, 44(1), 153-160.
Norman, D. A., Rumelhart, D. E., & Group, L. R. (1975). Explorations in cognition: WH
Freeman San Francisco.
O'Shaughnessy, J. (1985). A return to reason in consumer behavior: an hermeneutical
approach. Advances in Consumer Research, 12(1), 305-311.
Olden, J. D., & Jackson, D. A. (2001). Fish–habitat relationships in lakes: gaining predictive
and explanatory insight by using artificial neural networks. Transactions of the
American Fisheries Society, 130(5), 878-897.
Olden, J. D., & Jackson, D. A. (2002). Illuminating the "black box": a randomization
approach for understanding variable contributions in artificial neural networks.
Ecological Modelling, 154(1), 135-150.
Olden, J. D., Joy, M. K., & Death, R. G. (2004). An accurate comparison of methods for
quantifying variable importance in artificial neural networks using simulated
data. Ecological Modelling, 178(3-4), 389-397.
Oliveira-Castro, J. M., Foxall, G. R., & Wells, V. K. (2010). Consumer brand choice: Money
allocation as a function of brand reinforcing attributes. Journal of Organizational
Behavior Management, 30(2), 161-175.
Özesmi, S. L., & Özesmi, U. (1999). An artificial neural network approach to spatial
habitat modelling with interspecific interaction. Ecological modelling, 116(1), 15-
31.
Pachauri, M. (2002). Consumer behavior: a literature review. Marketing Review, 2(3), 319.
Paivio, A. (2013). Imagery and verbal processes: Psychology Press.
Peter, J. P. (1992). Realism or relativism for marketing theory and research: a comment
on Hunt's "Scientific Realism". Journal of Marketing, 56(2), 72-79.
Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel
distributed processing model of language acquisition. Cognition, 28(1), 73-193.
Pitts, W., & McCulloch, W. S. (1947). How we know universals: the perception of auditory
and visual forms. The Bulletin of mathematical biophysics, 9(3), 127-147.
Plunkett, K., & Marchman, V. (1989). Pattern association in a back propagation network:
implications for child language acquisition. Center for Research in Language.
Technical report, 8902.
Popper, K. R. (1959). The logic of scientific discovery. New York: Basic Books.
Proriol, J. (1995). Selection of variables for neural network analysis. Comparisons of
several methods with high energy physics data. Nuclear Instruments and
Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors
and Associated Equipment, 361(3), 581-585.
Prus, R., & Frisby, W. (1987). Marketplace dynamics: the P's of "people" and "process".
Advances in Consumer Research, 14(1), 61-65.
Quine, W. V. O., Churchland, P. S., & Follesdal, D. (2013). Word and object: MIT press.
R_Core_Team. (2014). R: A language and environment for statistical computing. R
Foundation for Statistical Computing, Vienna, Austria, 2012: ISBN 3-900051-07-0.
R_Studio. (2012). R Studio: integrated development environment for R: Version 0.97.
Rachlin, H. (1994). Behavior and mind: The roots of modern psychology: Oxford University
Press.
Reicher, G. M. (1969). Perceptual recognition as a function of meaningfulness of stimulus
material. Journal of experimental psychology, 81(2), 275.
Rosch, E., & Lloyd, B. B. (1978). Cognition and categorization. Hillsdale, New Jersey.
Rosch, E., & Mervis, C. B. (1975). Family resemblances: Studies in the internal structure
of categories. Cognitive psychology, 7(4), 573-605.
Rosenblatt, F. (1958). Two theorems of statistical separability in the perceptron: United
States Department of Commerce.
Rumelhart, D. E. (1975). Notes on a schema for stories. Representation and
understanding: Studies in cognitive science, 211, 236.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1985). Learning internal representations
by error propagation. Retrieved from
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1988). Learning representations by back-
propagating errors. Cognitive modeling, 5, 3.
Rumelhart, D. E., & McClelland, J. L. (1982). An interactive activation model of context
effects in letter perception: II. The contextual enhancement effect and some
tests and extensions of the model. Psychological review, 89(1), 60.
Rumelhart, D. E., & McClelland, J. L. (1985a). Levels indeed! A response to Broadbent.
Rumelhart, D. E., & McClelland, J. L. (1985b). On learning the past tenses of English verbs.
Retrieved from
Rumelhart, D. E., McClelland, J. L., & Group, P. R. (1988). Parallel distributed processing
(Vol. 1): IEEE.
Rumelhart, D. E., Smolensky, P., McClelland, J. L., & Hinton, G. (1986). Sequential thought
processes in PDP models. V, 2, 3-57.
Russell, G. J., & Kamakura, W. A. (1998). Modeling multiple category brand preference
with household basket data. Journal of Retailing, 73(4), 439-461.
Russell, G. J., & Petersen, A. (2000). Analysis of cross category dependence in market
basket selection. Journal of Retailing, 76(3), 367-392.
Ryle, G. (2009). The concept of mind: Routledge.
Sanders, C. R. (1987). Consuming as social action: Ethnographic methods in consumer
research. Advances in Consumer Research, 14(1), 71-75.
Saunders, M., Lewis, P., & Thornhill, A. (2009). Research methods for business students.
Harlow: Pearson.
Scardi, M., & Harding, L. W. (1999). Developing an empirical model of phytoplankton
primary production: a neural network case study. Ecological modelling, 120(2),
213-223.
Sejnowski, T. J., & Rosenberg, C. R. (1987). Parallel networks that learn to pronounce
English text. Complex systems, 1(1), 145-168.
Selfridge, O. G. (1958). Pandemonium: a paradigm for learning. In Mechanisation of
thought processes.
Shepard, R. N. (1989). Internal representation of universal regularities: A challenge for
connectionism.
Siegel, H. (1988). Relativism for Consumer Research? (Comments on Anderson). Journal
of Consumer Research, 15(1), 129-132.
Simon, H. A. (1977). The logic of heuristic decision making. In Models of discovery (pp. 154-
175): Springer.
Singh, V. P., Hansen, K. T., & Gupta, S. (2005). Modeling preferences for common
attributes in multicategory brand choice. Journal of Marketing Research, 42(2),
195-209.
Skinner, B. F. (1938). The behavior of organisms: An experimental analysis.
Skinner, B. F. (1953). Science and human behavior: Simon and Schuster.
Skinner, B. F. (2014). Verbal behavior: BF Skinner Foundation.
Slovic, P., Fischhoff, B., & Lichtenstein, S. (1977). Behavioral decision theory. Annual
Review of Psychology, 28(1), 1-39.
doi:10.1146/annurev.ps.28.020177.000245
Smolensky, P. (1988a). The constituent structure of connectionist mental states: A reply
to Fodor and Pylyshyn. The Southern Journal of Philosophy, 26(S1), 137-161.
Smolensky, P. (1988b). On the proper treatment of connectionism. Behavioral and brain
sciences, 11(01), 1-23.
Song, J., Kong, D., & Yu, J. (1988). Population system control. Mathematical and
Computer Modelling, 11, 11-16.
SPSS-Inc. (2007). SPSS Statistics Base 17.0 User’s Guide. Chicago, IL.
Taylor, W. K. (1956). Electrical simulation of some nervous system functional activities.
Information theory, 3, 314-328.
Touretzky, D. S., & Hinton, G. E. (1988). A distributed connectionist production system.
Cognitive Science, 12(3), 423-466.
Tversky, A., & Kahneman, D. (1973). Availability: A heuristic for judging frequency and
probability. Cognitive psychology, 5(2), 207-232.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases.
Science, 185(4157), 1124-1131.
Ullman, M., Pinker, S., Hollander, M., Prince, A., & Rosen, T. (1989). Growth of regular
and irregular vocabulary and the onset of overregularization. Paper presented at
the 14th Boston University Conference on Language Development.
van Kenhove, P., Vermeir, I., & Verniers, S. (2001). An Empirical Investigation of the
Relationship between Ethical Beliefs, Ethical Ideology, Political Preference and
Need for Closure. Journal of Business Ethics, 32(4), 347-361.
Venables, W., & Ripley, B. (2002). Modern Applied Statistics with S: Springer, New York,
NY, USA.
Von Hippel, E. A. (2005). Democratizing innovation.
Watson, J. J., & Wright, K. (2000). Consumer ethnocentrism and attitudes toward
domestic and foreign products. European Journal of Marketing, 34(9/10), 1149.
Wolcott, H. (2002). Writing up qualitative research... better. Qualitative Health Research,
12(1), 91.
Yan, J., Foxall, G. R., & Doyle, J. R. (2012a). Patterns of Reinforcement and the Essential
Value of Brands: II. Evaluation of a Model of Consumer Choice. The Psychological
Record, 62(3), 377.
Yan, J., Foxall, G. R., & Doyle, J. R. (2012b). Patterns of Reinforcement and the Essential
Value of Brands: I. Incorporation of Utilitarian and Informational Reinforcement
into the Estimation of Demand. The Psychological Record, 62(3), 361.
Zell, A., Mache, N., Hübner, R., Mamier, G., Vogt, M., Schmalzl, M., & Herrmann, K.-U.
(1994). SNNS (Stuttgart Neural Network Simulator). In Neural Network Simulation
Environments (pp. 165-186): Springer.
Zell, A., Mache, N., Sommer, T., & Korb, T. (1991a). Design of the SNNS neural network
simulator. Paper presented at the 7. Österreichische Artificial-Intelligence-
Tagung/Seventh Austrian Conference on Artificial Intelligence.
Zell, A., Mache, N., Sommer, T., & Korb, T. (1991b). Recent developments of the SNNS
neural network simulator. Paper presented at the Applications of Artificial Neural
Networks II.
Zell, A., Mache, N., Sommer, T., & Korb, T. (1991c). The SNNS neural network simulator.
In GWAI-91 15. Fachtagung für Künstliche Intelligenz (pp. 254-263): Springer.
Zell, A., Mache, N., Vogt, M., & Hüttel, M. (1993). Problems of massive parallelism in
neural network simulation. Paper presented at the IEEE International Conference
on Neural Networks, 1993.