Methods of Psychological Research Online 2001, Vol.6, No.2 Institute for Science Education
Internet: http://www.mpr-online.de © 2001 IPN Kiel
Joint Effect of Noise, Personality and Environmental
Factors on the Intelligibility of Speech
Agustín Turrero1, Pilar Zuluaga and Carmen Santisteban
Abstract
The performance of students in speech intelligibility tests is influenced by individual
characteristics such as sex and age, personality factors such as neuroticism (N), extra-
version, attention and sensitivity to noise, and environmental conditions such as the
location of the scholars in the classroom (LI), the location of the classroom itself with
regard to extraneous noise (LO) and background noise (BN). The first aim of this study
was to analyse the role of these factors in predicting performance. From a mathematical
point of view the problem was to establish a model to reflect accurately the relationship
between the expected proportion of successes and a set of covariates. We used a logistic
regression model mainly because of its high mathematical flexibility. A further aim was
to study in depth methodological questions such as the choice and assessment of the
model, including its extension to a random-effects model. One hundred and seventy stu-
dents participated in the study. The results indicate that only four of the factors studied
had any significant bearing upon their performance: N, LI, BN and LO, and that the
effect of the classroom on performance was a random one. The covariate pattern corre-
sponding to the best performance is given by the following levels: (N) high, (LI) front
row, (LO) playground and (BN) normal. For this pattern the estimated proportion of
successes is 0.66.
Keywords: Speech Intelligibility, Neuroticism, Noise, Multiple Logistic Regression,
Random Effects
1 Correspondence should be addressed to Agustín Turrero, Departamento de Estadística e I.O., Facultad
de Medicina, Universidad Complutense, 28040 Madrid, Spain.
e-mail: [email protected]
1. Introduction
The interference of noise with speech is a masking process which affects communica-
tion and learning processes. An important aspect of communication interference in edu-
cational situations is the failure of scholars to hear words or sentences correctly, leading
to a lowering of performance in cognitive tasks. For example, there are abundant cross-
sectional and some longitudinal studies which report negative associations between
chronic exposure to high-noise sources (principally aircraft or road traffic noise) and
deficits in reading acquisition (Cohen et al., 1980; Evans, 1990; Hygge, Bullinger and
Evans, 1994). The effects of noise on performance are complex and have been widely
studied by many researchers (cf. Broadbent, 1957, 1981; Jones and Broadbent, 1979;
Kryter, 1970, 1994; Jones, Smith and Broadbent, 1979; Santisteban and Santalla,
1990a; Smith and Broadbent, 1991; Hygge et al., 1992; Santisteban, Sebastian and Santalla,
1994; Santalla, Alvarado and Santisteban, 1999). Accurate predictions of the
audibility of a particular speech sound in the presence of a specific noise can be established
for a normal-hearing listener (Webster, 1969, 1974; Kryter, 1985, 1994), and the
speech interference level (SIL) or the articulation index (AI) (Beranek, 1947) can be
used to predict the speech-masking capacity of a large variety of noises. Communication
processes are, however, affected by numerous factors, including those referring to
the learner's own personality. Thus, the results of many studies showing that factors
such as neuroticism and sensitivity to noise are relevant in establishing individual differ-
ences in response to noise and its effects upon cognitive tasks have led us to include
these factors in our study (cf. Smith, 1988, 1993). Since Broadbent's initial studies
(1972) there has also been considerable interest in the role of individual differences in
sensitivity to noise when determining noise response. We have included gender in the
light of the findings of such authors as Gulian and Thomas (1986), who found that noise
affected female workers but had little apparent effect on males. Nevertheless, other
studies have found no differences in response to noise according to gender.
Much research has involved measuring speech intelligibility by using either nonsense
syllables and isolated words in phonetically balanced lists or whole sentences made up of
a series of these and taking the percentage of words that are correctly perceived. Thus,
for example, the percentage of words correctly perceived from a list of isolated words
and the corresponding percentage when these words are key words in a sentence have
been estimated by Kryter (1985, 1994). Studies of the effects of simulated impaired
frequency selectivity (IFS) on the intelligibility of speech in the presence of background
noise and of interfering speech have also been made (Baer and Moore, 1994); these
confirmed that performance was seriously impaired in some cases in the presence of
both types of spectral smearing.
We have tried to predict the performance of school children in speech-
intelligibility tests from a set of factors including acoustic conditions and individual
characteristics represented by a vector of k components. In mathematical terms the
problem has been to establish a model that describes adequately the relationship between
a probability, π, representing the expected proportion of successes, and a set of
covariates measured on different scales. The logistic distribution is a function capable
of modeling this kind of relationship, and is preferable to others mainly because of its
great mathematical flexibility and ease of use.
The following section contains the logistic regression model and advances, presented
in a general way, the philosophy for the selection of the variables for the model. Next
the subjects and measurements used in the study are described. The subsequent section
shows in detail the results of fitting the logistic model to the data and this is followed
by an analysis of the natural extension of the model by taking into account random ef-
fects. Our findings are set out in the final section.
2. Logistic Regression Model
In the logistic regression model the relationship between π and the k-vector of explanatory
variables $X' = (x_1, x_2, \ldots, x_k)$ associated with π is given by
$$\pi(X) = e^{g(X)} \left(1 + e^{g(X)}\right)^{-1}, \tag{1}$$
where
$$g(X) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k, \tag{2}$$
and $\beta' = (\beta_0, \beta_1, \ldots, \beta_k)$ is a (k + 1)-vector of parameters.
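As a minimal sketch of equations (1) and (2), the success probability can be computed from a coefficient vector and a covariate vector. The function names and the example coefficients below are illustrative assumptions and are not values from the study.

```python
import math

def linear_predictor(beta, x):
    """g(X) = beta_0 + beta_1*x_1 + ... + beta_k*x_k, as in equation (2)."""
    if len(beta) != len(x) + 1:
        raise ValueError("beta needs one more entry than x (the intercept)")
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

def success_probability(beta, x):
    """pi(X) = e^g / (1 + e^g), as in equation (1)."""
    g = linear_predictor(beta, x)
    return math.exp(g) / (1.0 + math.exp(g))

# Hypothetical intercept and two slopes, with one hypothetical covariate pattern.
beta = [0.5, -0.3, 0.1]
x = [1.0, 2.0]
pi = success_probability(beta, x)
print(round(pi, 3))
```

With all covariates at zero the probability depends on the intercept alone, which is how the reference covariate pattern is read off the fitted models later in the paper.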
2.1. Methodology for the Variable Selection
The main aim of any strategy of variable selection is to obtain the model that fits the
data best. Nevertheless, “the goodness of fit of the model should never be taken into
account without also taking into account the parsimony of the model” (Mulaik et al.,
1989, p. 437). As Hosmer and Lemeshow (1989, p. 83) note, “The rationale for mini-
mizing the number of variables in the model is that the resultant model is more likely
to be numerically stable and is more easily generalized”. Therefore, the model-building
process should seek the most parsimonious model that still explains the data. In order to
achieve this goal we had to decide on a procedure to select the variables for the model
and the measurements to assess the fitting of the model. There are several approaches to
variable selection, based on different analytic philosophies and different statistical
methods. Hosmer and Lemeshow (1989, p. 83) suggest a general method of variable se-
lection in the logistic regression context. The main feature of this method is that selec-
tion is controlled throughout the whole process by the analyst, in contrast to the me-
chanical selection procedures used by computers, such as stepwise or best subsets regres-
sion. In this way all scientifically relevant variables may be included irrespective of their
statistical contribution to the model. Quantitative epidemiologists have adopted this
methodological stance and for similar reasons we chose to adopt this approach in quantitative
behavioral research. We apply this model-building methodology to our set of
performance data on speech intelligibility; given the role it assigns to the analyst, this
application is in itself a methodological contribution. Furthermore, we
discuss why the estimated odds ratios are inappropriate for inference in our
case. A second methodological issue dealt with in the paper is that of extending the
fixed-effects model to a random-effects model, i.e. the analysis continues by taking into
account any possible random effects that isolate sources of variation. We propose the
logistic-binomial model as being more suitable for modeling this specific variation.
3. Subjects and variables
The subjects for this study were 170 schoolchildren at a secondary school in Madrid
(Spain) affected by high noise levels from two sources: a main road carrying heavy traf-
fic and Madrid airport. The students were between 14 and 19 years old and were tested
during normal school hours in six classrooms.
The variables taken into account were:
1. Gender (G), with values of 0 or 1, corresponding to males and females respectively.
2. Age (A), the age of the student in years.
3. Location in the classroom (LI), an indicator of the distance between the student
and the source of the speech signals, with values of 0, 1, 2, corresponding to near,
intermediate and far distances respectively.
4. Location of the classroom (LO), an indicator of the site of the classroom, with val-
ues of 0 or 1 depending upon whether it gives onto the playground or onto the
road respectively.
5. Neuroticism (N) and Extraversion (E); two measurements of personality using the
Eysenck Personality Inventory (Eysenck and Eysenck, 1987). Both variables were
treated as ordinal variables with four categories: low, medium, high and very high.
6. Sensitivity to noise (SN), a measurement of the sensitivity to noise obtained using
the SENSIT-NA Questionnaire (Santisteban and Santalla, 1990b). Four levels were
considered: very little, little, sensitive and very sensitive.
7. Attention (AT), measured by the Spanish version of Thurstone's Identical Forms
Test (1958). Three levels were considered for this variable: inattentive, normal and
very attentive, depending upon the number of items solved correctly out of the 60
graphical elements presented.
8. Background noise (BN), a measurement of the level of background noise present
in the classroom during the course of the intelligibility test. The chosen value was
the continuous equivalent sound level Leq.
All the above variables were treated as independent variables because none of them
were viewed as being subject to change during the study. The outcome or dependent
variable is that of speech intelligibility, i.e. the number of words correctly heard by the
subject as measured by a test developed at the Instituto de Acústica in Madrid (Del-
gado, 1968), consisting of 100 two-syllable words phonetically balanced in 10 groups.
The successive groups of words of the intelligibility test were reproduced at emission
levels decreasing in steps of 5 dBA, from 80 dBA to 35 dBA. This gives for each student
a mark of 0 to 10 according to the number of words heard correctly at each fixed emis-
sion level. As we only found significant differences in the subjects' performance at emis-
sion levels from 45 dBA to 55 dBA we selected an emission level of 50 dBA for the pur-
poses of this study. The outcome variable in the analysis is the proportion of words
correctly heard in the group of 10 words reproduced at 50 dBA.
The data were processed using the statistical packages EGRET (1991) and BMDP
(1992).
4. Fitting the Logistic Regression Model
4.1 Variable Selection
The selection process began with a careful univariate analysis of each variable. The
results of fitting the univariate logistic regression models to the data are shown in Table
1, in which the nominal and ordinal scaled variables have been modeled by creating design
variables according to the “reference cell coding” or “partial” method used in the
programs BMDPLR and EGRET. The intercept term refers to the constant-only
model. To assess the significance of the coefficient(s) for each variable we used the likelihood
ratio test statistic, G, which is the difference between the
deviances for the constant-only model and the model containing the variable in question.
Under the null hypothesis the statistic G follows a χ² distribution with p degrees of freedom, where p is the
number of coefficients (categories minus one) of the variable.
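The likelihood ratio statistic described above can be sketched in a few lines, using the deviances reported in Table 1 for age (A, one coefficient) and attention (AT, two coefficients). The helper names are assumptions made for this illustration; the closed-form tail probabilities are the standard ones for a χ² distribution with 1 and 2 degrees of freedom.

```python
import math

def lr_statistic(deviance_null, deviance_model):
    """G = deviance(constant-only model) - deviance(model with the variable)."""
    return deviance_null - deviance_model

def chi2_sf_1df(g):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(g / 2.0))

def chi2_sf_2df(g):
    """Upper-tail probability of a chi-square variate with 2 degrees of freedom."""
    return math.exp(-g / 2.0)

DEV_CONSTANT = 419.72  # deviance of the constant-only model (Table 1)

g_age = lr_statistic(DEV_CONSTANT, 415.77)  # age: one coefficient
g_at = lr_statistic(DEV_CONSTANT, 409.16)   # attention: two coefficients
print(round(g_age, 2), round(chi2_sf_1df(g_age), 3))
print(round(g_at, 2), round(chi2_sf_2df(g_at), 3))
```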
In accordance with Hosmer and Lemeshow (1989, p.86) and the publications of
Bendel and Afifi (1977) and Mickey and Greenland (1989) on linear and logistic regres-
sion respectively, we used a p-value of 0.25 as screening criterion to select candidate
variables for the multivariate model. Thus, on the basis of the output set out in Table
1, all of the variables, except for E and SN, appeared to be associated in some way with
the outcome, speech intelligibility.
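The screening step can be illustrated with the p-values reported in Table 1. This is only a sketch: the dictionary and function names are assumptions, and entries reported as "<0.001" are represented by 0.001 as an upper bound.

```python
# p-values of the univariate likelihood ratio tests, copied from Table 1.
# Entries reported as "<0.001" are stored as 0.001 (an upper bound).
UNIVARIATE_P = {
    "A": 0.047, "G": 0.221, "LI": 0.001, "LO": 0.001, "N": 0.003,
    "E": 0.999, "SN": 0.359, "AT": 0.005, "BN": 0.001,
}
SCREENING_LEVEL = 0.25

def screen(p_values, level=SCREENING_LEVEL):
    """Split variables into candidates (p < level) and non-candidates."""
    kept = [name for name, p in p_values.items() if p < level]
    dropped = [name for name, p in p_values.items() if p >= level]
    return kept, dropped

kept, dropped = screen(UNIVARIATE_P)
print("candidates:", kept)
print("screened out:", dropped)
```

The liberal 0.25 level keeps gender (p = 0.221) in the candidate set, which a conventional 0.05 cut-off would have removed before the multivariate stage.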
Table 1: Parameter Estimates, Their Standard Error Estimates, Deviances, L.R. Test
Statistics and Significance Levels for the Fitted Univariate Logistic Regression Models.
Variable    β        SE(β)    Deviance   G        p
Constant    -0.263   0.048    419.72
A           -0.064   0.032    415.77     3.95     0.047
G            0.120   0.098    418.23     1.49     0.221
LI1         -0.363   0.120    392.60     27.12    <0.001
LI2         -0.610   0.119
LO          -1.042   0.101    310.49     109.23   <0.001
N1          -0.125   0.126    405.65     14.07    0.003
N2           0.176   0.149
N3          -0.414   0.156
E1          -0.015   0.129    419.71     0.01     0.999
E2          -0.013   0.144
E3          -0.017   0.154
SN1         -0.019   0.147    416.50     3.22     0.359
SN2         -0.200   0.141
SN3         -0.051   0.172
AT1         -0.333   0.104    409.16     10.56    0.005
AT2         -0.056   0.188
BN          -0.310   0.034    322.47     97.25    <0.001
Table 2 shows the results of fitting the multivariate logistic model including all the
variables except E and SN.
Table 2: Parameter Estimates, Their Standard Error Estimates, Wald Test Statistics
and Two-tailed Significance Levels for the Multivariate Model.
Variable    β        SE(β)    β/SE(β)   p
Constant     8.846   1.920     4.60     <0.001
A           -0.033   0.035    -0.94     0.351
G            0.103   0.110     0.94     0.348
LI1         -0.311   0.127    -2.45     0.014
LI2         -0.691   0.127    -5.44     <0.001
LO          -0.695   0.133    -5.23     <0.001
N1          -0.046   0.135    -0.34     0.733
N2           0.217   0.161     1.35     0.176
N3          -0.292   0.175    -1.67     0.094
AT1         -0.163   0.113    -1.44     0.150
AT2         -0.017   0.202    -0.08     0.934
BN          -0.171   0.044    -3.89     <0.001
Deviance: 247.49 (158 df)
The relevance of each variable is mainly verified through an examination of its
Wald statistic. Also, the comparison of the parameter estimate with the corresponding
estimate from the univariate model in Table 1 completes that examination. On the basis
of the results set out in Table 2, variables A, G, and AT should be excluded from the
analysis. Whether to include the variable N in the model was more questionable. Tables
3 and 4 show the results of fitting two new multivariate logistic models containing the
significant variables from the old model, the first including the variable N and the sec-
ond excluding it.
Table 3: Parameter Estimates, Their Standard Error Estimates and Two-tailed Significance
Levels of Wald Test Statistic for the Multivariate Model Containing Variables
LI, LO, N and BN.
Variable    β        SE(β)    p
Constant     9.290   1.870    <0.001
LI1         -0.296   0.126    0.019
LI2         -0.675   0.126    <0.001
LO          -0.673   0.127    <0.001
N1          -0.044   0.133    0.743
N2           0.256   0.158    0.107
N3          -0.262   0.168    0.119
BN          -0.193   0.041    <0.001
Deviance: 251.30 (162 df)
Table 4: Parameter Estimates, Their Standard Error Estimates and Two-tailed Significance
Levels of Wald Test Statistic for the Multivariate Model Containing Variables LI,
LO and BN.
Variable    β        SE(β)    p
Constant     8.860   1.850    <0.001
LI1         -0.287   0.125    0.022
LI2         -0.661   0.124    <0.001
LO          -0.720   0.125    <0.001
BN          -0.183   0.041    <0.001
Deviance: 260.48 (165 df)
The models in Tables 2 and 3 were compared via the likelihood ratio test. The statis-
tic of this test takes the value 3.80, which, compared to a χ2 distribution with 4 degrees
of freedom, yields a p-value of 0.43, showing that the variables A, G, and AT added
little information to the model once the other variables had been included. Also, the
observation of the estimated coefficients for the remaining variables supports that fact
since they were nearly identical in both models.
The likelihood ratio statistic, LRS, for the difference between the models in Tables 3
and 4 (a test for the significance of N) had a value of 9.18, which yielded a p-value of
0.027, thus demonstrating that N contributed significantly to the model. Nevertheless,
the estimated coefficients for the remaining variables did not change appreciably in ei-
ther model. Observation of the estimated coefficients for the variable N in Table 3 sug-
gested that we should consider a new grouping for this variable. A new variable, de-
noted by NE, was chosen to replace N by regrouping two of its categories into one. The
variable NE thus obtained contains three categories, the first, or reference group in-
cludes the first two categories of N, i.e. the low and medium levels of neuroticism, the
rest of the categories being the same for both variables. A univariate analysis of NE
shows that the high level of neuroticism is the most favourable category with an esti-
mated proportion of successes of 0.5, whilst the values for the low-to-medium and the
very high levels are 0.44 and 0.36 respectively.
Table 5 shows the results of fitting a new multivariate logistic model including the
variable NE instead of N.
Table 5: Parameter Estimates, Their Standard Error Estimates and Two-tailed Significance
Levels of Wald Test Statistic for the Multivariate Model Containing Variables LI,
LO, NE and BN.
Variable    β        SE(β)    p
Constant     9.291   1.870    <0.001
LI1         -0.297   0.126    0.019
LI2         -0.672   0.125    <0.001
LO          -0.675   0.127    <0.001
NE1          0.283   0.135    0.036
NE2         -0.234   0.146    0.107
BN          -0.193   0.041    <0.001
Deviance: 251.40 (163 df)
The models in Tables 3 and 5 are nested and so we could use LRS to compare them.
This statistic took the value 0.1, which, compared to a χ2 distribution with 1 degree of
freedom, yielded a p-value of 0.75. Thus we concluded that the model in Table 5, which
fits equally well with one parameter fewer, represented an improvement over that in
Table 3. Moreover, the LRS for the difference between
the models in Tables 4 and 5 (a test for the significance of NE) had a value of
9.08 with an associated p-value of 0.010, which endorses the contribution of neuroticism
to the model.
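The nested-model comparisons reported so far can be reproduced from the deviances and degrees of freedom printed under Tables 2 to 5. The sketch below uses only the Python standard library; `chi2_sf` and `compare` are names introduced here for illustration, and the tail probability is built from the standard recurrence for integer degrees of freedom rather than taken from the packages used in the study.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        tail, k = math.exp(-half), 2               # exact tail for df = 2
    else:
        tail, k = math.erfc(math.sqrt(half)), 1    # exact tail for df = 1
    while k < df:
        # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * e^(-x/2) / Gamma(k/2 + 1)
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail

# Table number -> (deviance, residual degrees of freedom), as reported.
MODELS = {2: (247.49, 158), 3: (251.30, 162), 4: (260.48, 165), 5: (251.40, 163)}

def compare(reduced, full):
    """LRS, df and p-value for a reduced model nested in a fuller model."""
    dev_r, df_r = MODELS[reduced]
    dev_f, df_f = MODELS[full]
    lrs = dev_r - dev_f
    df = df_r - df_f
    return lrs, df, chi2_sf(lrs, df)

for reduced, full in [(3, 2), (4, 3), (5, 3), (4, 5)]:
    lrs, df, p = compare(reduced, full)
    print(f"Table {reduced} vs Table {full}: LRS={lrs:.2f}, df={df}, p={p:.3f}")
```

Small differences in the last digit relative to the text (for example 3.81 against the reported 3.80) would be expected from rounding of the printed deviances.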
We then focused our attention upon the assumption of linearity in the logit for the
variables that were modeled as being continuous. The only variable we needed to check
was background noise (BN). This variable differs between rooms but not within the
same room. Moreover only four different levels of background noise were observed during
the tests in the six classrooms. These levels were 45 dBA, 45.5 dBA, 47 dBA and 50
dBA. One approach to assessing the scale of the logit was to categorize the variable BN
into groups and so we created two design variables using the 45 and 45.5 values as the
reference group. These design variables, BNG1 and BNG2, were then used in the multi-
variate model instead of BN.
Table 6 shows the results of this fitting with regard to the variables BNG1 and
BNG2.
Table 6: Estimated Coefficients, Estimated Standard Errors and Two-tailed Significance
Levels of Wald Test Statistic of BNG1 and BNG2 from the Multivariate Model Containing
LI, LO, NE and BNG.
Variable    β        SE(β)    p
BNG1         0.071   0.184    0.699
BNG2        -0.976   0.194    <0.001
Deviance: 240.3 (162 df)
The estimated coefficients and p-values in Table 6 suggest a binary coding: BNG1 does not differ significantly from the reference group, whereas BNG2 does. Thus, we
created a new dichotomous variable, BNC, taking a value of 1 if BN was greater than
47 dBA and 0 otherwise. The results of fitting the multivariate model with the new
variable BNC are given in Table 7.
Table 7: Results of Fitting the Multivariate Model Containing Variables LI, LO, NE
and BNC.
Variable    β        SE(β)    p
Constant     0.525   0.106    <0.001
LI1         -0.294   0.127    0.020
LI2         -0.664   0.126    <0.001
LO          -0.744   0.114    <0.001
NE1          0.268   0.135    0.047
NE2         -0.247   0.146    0.090
BNC         -1.003   0.180    <0.001
Deviance: 240.45 (163 df)
These results show that students in a noisy classroom (50 dBA) obtained an esti-
mated proportion of successes of 0.38, which is much lower than the score of 0.63 ob-
tained by students in classrooms with low noise levels (≤ 47 dBA). Once again, a nu-
merical comparison of the deviance in Table 7 with that in Table 6, together with the
respective degrees of freedom, indicated an improvement over the last model.
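The dichotomous coding of background noise and the two proportions quoted above can be checked with a short sketch. It assumes, as the reported figures suggest, that the remaining covariates are held at their reference levels (LI = 0, LO = 0, NE = 0), so that only the constant and the BNC coefficient from Table 7 enter the linear predictor; the function names are illustrative.

```python
import math

def bnc(bn_dba):
    """Dichotomous background-noise indicator: 1 if BN > 47 dBA, else 0."""
    return 1 if bn_dba > 47.0 else 0

def inv_logit(g):
    """Inverse of the logit transformation."""
    return 1.0 / (1.0 + math.exp(-g))

# Estimates from Table 7; all other covariates are assumed to be at reference.
CONSTANT = 0.525
BETA_BNC = -1.003

# The four background-noise levels observed in the six classrooms.
for level in (45.0, 45.5, 47.0, 50.0):
    p = inv_logit(CONSTANT + BETA_BNC * bnc(level))
    print(f"BN = {level} dBA -> BNC = {bnc(level)}, estimated proportion = {p:.2f}")
```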
It should be pointed out at this juncture that the variable BN was treated as a
categorical variable because of the particular conditions of this study. Generally speak-
ing, when there are a lot of values for BN the rational thing to do would be to treat it
as a continuous variable in the model.
Once we had ascertained that the continuous variable was in the correct scale we
were able to consider the main-effects model as being complete. We began the multi-
variate model in Table 2 with a deviance of 247.49 and 158 degrees of freedom, and fin-
ished in Table 7 with a deviance of 240.45 and 163 degrees of freedom.
At this stage in the model-building process we felt we should check for interactions.
The interaction between the variables LO and BNC made no sense because BNC
= 0 for all the classrooms giving onto the playground. The remaining interactions were
certainly of greater interest. The results of adding each interaction to the main-effects
model are shown in Table 8.
Table 8: Deviances, LRS, Degrees of Freedom and p-Value for Interactions of Interest to
be Added to the Main Effects Only Model.
Interaction          Deviance   LRS     df   p-value
Main effects only²   240.45
LI x LO              236.55     3.90    2    0.142
LI x NE              238.11     2.34    4    0.674
LI x BNC             238.95     1.50    2    0.473
LO x NE              237.31     3.14    2    0.209
NE x BNC             229.08     11.37   2    0.003
It can be seen from the p-values associated with LRS in Table 8 that only the
NE×BNC interaction affords a significant improvement over the main-effects model.
2 Main effects model from table 7.
Consequently, the final fixed-effects model contains the main effects set out in Table 7
plus this latter interaction. The results of fitting this model are given in Table 9.
Table 9: Results of Fitting the Multivariate Model Containing Variables LI, LO, NE,
BNC and NE×BNC Interaction.
Variable     β        SE(β)    p
Constant      0.556   0.107    <0.001
LI1          -0.298   0.130    0.022
LI2          -0.632   0.127    <0.001
LO           -0.734   0.114    <0.001
NE1           0.111   0.144    0.440
NE2          -0.344   0.157    0.028
BNC          -1.589   0.277    <0.001
BNC x NE1     1.256   0.401    0.002
BNC x NE2     0.969   0.449    0.031
Deviance: 229.08 (161 df)
4.2 Assessing the Fitting of the Model
Once the model was constructed we needed to assess its overall fit and suitability.
Some combinations of variable levels were not observed and thus only 25
different covariate patterns occur in the final model shown in Table 9. In this situation
an appropriate statistic for assessing the fit is the Hosmer-Lemeshow test
(Hosmer and Lemeshow, 1980; Lemeshow and Hosmer, 1982). The value of this statistic,
computed from the fitted logistic model in Table 9, is 3.51, and the corresponding
p-value computed from the χ² distribution with 8 degrees of freedom is 0.898,
which indicates that the model seems to fit the data quite well. A complete analysis
might include an examination of the individual residuals but this would involve consid-
erable effort and would not answer any outstanding questions.
4.3 Inferences from the Fitted Model
In epidemiological research inferences from a logistic regression model usually begin
with an estimation of the odds ratios for the various risk factors in the model. The main
reason for this is that in many instances the odds ratios approximate the relative risks
and those can easily be obtained from the coefficients of the logistic model. In our con-
text this approximation requires that the expected proportion of failures be small for all
categories of each variable, which is unlikely. Furthermore, we are interested in esti-
mating the proportion of successes for the different levels of the variables and in estab-
lishing an order between these levels on the basis of these proportions. An initial ap-
proach for evaluating the effect that each variable in the model has on the proportion of
successes consists of “adjusting for all other variables”, which involves comparing the
different levels of the variable at certain common values of the remaining variables,
these values being their respective reference values. For each variable we can choose the
most favourable category (the greatest proportion of successes) and the most unfavour-
able category (the smallest proportion of successes). Table 10 shows the favourable and
unfavourable categories for each variable, together with their estimated proportion of
successes (E.P.S.).
Table 10: Estimated Proportion of Successes for the Favorable and Unfavorable Categories
of Each Variable in the Model and Ratio of Proportions.
Variable    Favourable    E.P.S.   Unfavourable            E.P.S.   Ratio of Proportions
LI          First Range   0.636    Far Range               0.481    1.32
LO          Yard          0.636    Highway                 0.456    1.39
N, BN≤47    High Level    0.661    Very High Level         0.553    1.20
N, BN>47    High Level    0.583    Low or Medium Levels    0.263    2.22
Since neuroticism and background noise are included in one interaction, both vari-
ables are jointly analysed in Table 10. The ratio of proportions (favourable/ unfavour-
able) is also given in the last column of this table. Note that the proportion of successes
for a student in the front row of the class is 1.32 times that for a student in a back row.
In a noisy classroom (BN>47 dBA) the proportion of successes for a student with a high
level of neuroticism is 2.22 times that of a student with a low-to-medium level.
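The entries of Table 10 follow from the coefficients in Table 9 by switching on the terms for one category at a time while holding every other variable at its reference level. A minimal sketch, in which the dictionary keys and the helper `eps` are names chosen for this illustration:

```python
import math

# Fixed-effects estimates copied from Table 9.
BETA = {
    "const": 0.556, "LI1": -0.298, "LI2": -0.632, "LO": -0.734,
    "NE1": 0.111, "NE2": -0.344, "BNC": -1.589,
    "BNCxNE1": 1.256, "BNCxNE2": 0.969,
}

def inv_logit(g):
    """Inverse of the logit transformation."""
    return 1.0 / (1.0 + math.exp(-g))

def eps(*terms):
    """Estimated proportion of successes with the named terms switched on
    and every other variable at its reference level."""
    return inv_logit(BETA["const"] + sum(BETA[t] for t in terms))

# (favourable, unfavourable) categories as listed in Table 10.
ROWS = {
    "LI": (eps(), eps("LI2")),
    "LO": (eps(), eps("LO")),
    "N, BN<=47": (eps("NE1"), eps("NE2")),
    "N, BN>47": (eps("BNC", "NE1", "BNCxNE1"), eps("BNC")),
}

for name, (fav, unfav) in ROWS.items():
    print(f"{name:10s} {fav:.3f} {unfav:.3f} {fav / unfav:.2f}")
```

Because neuroticism enters through the NE×BNC interaction, the noisy-classroom rows need the main effect of BNC, the main effect of NE1 and the interaction term together.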
5. Extension to Random-Effects Model
5.1 Approaches
A natural extension of a fixed-effects model, when there is grouping in the data, is
the random-effects model, which offers an alternative to isolate sources of heterogeneity.
Fixed-effects models explicitly model a location parameter corresponding to the baseline
response in each stratum, whilst random-effects models address this fact by assuming
that each stratum baseline parameter is a realization from a probability distribution
specifiable with a small fixed number of parameters.
In our context it seems reasonable to believe that, because of its internal and external
acoustic conditions, the classroom (C) can play the role of a homogeneity factor.
For categorical data two approaches are available for modeling the so-called extra-binomial
variation. The pioneering work of Crowder (1978) postulated that the success
probability for the i-th stratum derives from a beta distribution. This model is
known as the beta-binomial regression model.
Another approach, with a special intuitive appeal, is to postulate that the said probability
is perturbed on the logit scale. Pierce and Sands (1975) proposed this method in
an unpublished paper, assuming a standard normal distribution for this perturbation.
This model is known as the logistic-normal regression model. Finally, the logistic-binomial
model (Mauritsen, 1984) generalises the logistic regression model in a manner
similar to the logistic-normal regression model. In this case a standardised binomial
distribution is assumed for the random perturbation.
Mauritsen (1984) compares the beta-binomial with the logit-scale prior models by
analyzing such features as goodness of fit, speed of fit and utility, using real data sets
and computer-simulated data. On the basis of this comparison, and taking into account
that we are handling distinguishable responses, we chose to use the logistic-binomial
model for modeling any possible extra-binomial variation.
Let $r_{ij}$ denote the number of successes for the j-th student in the i-th classroom. We
assume
$$r_{ij} \sim \mathrm{Binomial}(10, \pi_{ij}). \tag{3}$$
The logistic regression model assumes that there are no classroom effects and that $\pi_{ij}$,
the success probability for the (i, j) subscripts, can be written, in terms of its k-vector
of associated covariates $X_{ij}' = (x_{ij1}, x_{ij2}, \ldots, x_{ijk})$, as follows:
$$\pi_{ij} = \frac{e^{g(x_{ij})}}{1 + e^{g(x_{ij})}}, \tag{4}$$
which can also be written, in terms of the logit transformation, as
$$\mathrm{logit}(\pi_{ij}) = g(x_{ij}), \tag{5}$$
where $g(x_{ij}) = \beta_0 + \beta_1 x_{ij1} + \beta_2 x_{ij2} + \dots + \beta_k x_{ijk}$,
and $\beta' = (\beta_0, \beta_1, \ldots, \beta_k)$ is a (k + 1)-vector of parameters³.
The logistic-binomial regression model assumes
$$\mathrm{logit}(\pi_{ij}) = g(x_{ij}) + \sigma v_i, \tag{6}$$
where $v_i$ is a realization from a symmetric, standardized binomial distribution, and
$\sigma \ge 0$ is a scale parameter, i.e.
$$v_i = \frac{w_i - E(w_i)}{\sqrt{\mathrm{Var}(w_i)}} = \frac{2 w_i - K}{\sqrt{K}}, \tag{7}$$
where $w_i \sim \mathrm{Binomial}(K, 1/2)$. In particular $v_i$ is the same for all students in the i-th
classroom.
3 Models (1) and (4) are the same. Equation (4) is more explicit than (1) owing to notational necessity.
5.2 Fitting the Logistic-Binomial Regression Model
Table 11 shows the results of extending the model in Table 9 to include the σ parameter,
that is to say, the new model fitted to the data is the logistic-binomial model
formulated in (6). For the fitting, we used a six-point prior distribution, i.e. K = 5 in (7).
The choice of this prior distribution is based upon the comparisons made by Mauritsen
(1984).
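The six-point prior can be made concrete by listing the support points and probabilities of the standardized binomial perturbation in (7) for K = 5. The sketch below also averages the success probability in (6) over that prior; this averaging, the function names and the value of sigma used in the example are illustrative assumptions and not part of the reported analysis.

```python
import math

def standardized_binomial_prior(K):
    """Support points v = (2w - K)/sqrt(K) and their probabilities,
    where w follows a Binomial(K, 1/2) distribution, as in equation (7)."""
    points = []
    for w in range(K + 1):
        prob = math.comb(K, w) / 2 ** K
        v = (2 * w - K) / math.sqrt(K)
        points.append((v, prob))
    return points

def marginal_success_probability(g, sigma, K=5):
    """Average of inverse-logit(g + sigma * v) over the prior on v,
    i.e. equation (6) integrated over the classroom perturbation."""
    return sum(p / (1.0 + math.exp(-(g + sigma * v)))
               for v, p in standardized_binomial_prior(K))

prior = standardized_binomial_prior(5)
mean = sum(v * p for v, p in prior)
variance = sum(v * v * p for v, p in prior)
print(len(prior), round(mean, 6), round(variance, 6))

# Hypothetical linear predictor and scale parameter, for illustration only.
print(round(marginal_success_probability(g=0.0, sigma=0.2), 3))
```

The prior has mean 0 and variance 1 by construction, so σ alone carries the size of the classroom-to-classroom variation on the logit scale.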
Table 11: Results of Fitting the Logistic-Binomial Regression Model Containing All the
Fixed Effects Terms in Table 9 and Using C as Matching Variable.
Variable               β        SE(β)    p
Constant                0.538   0.147    <0.001
LI1                    -0.306   0.130    0.019
LI2                    -0.653   0.128    <0.001
LO                     -0.691   0.208    <0.001
NE1                     0.073   0.145    0.616
NE2                    -0.353   0.157    0.025
BNC                    -1.610   0.367    <0.001
BNC x NE1               1.292   0.402    0.001
BNC x NE2               0.976   0.450    0.030
Excess variation (σ)    0.183   0.075    0.007
Deviance: 223.72 (160 df)
The Wald-test statistic for the excess variation term is a one-tailed test.
To test whether there is any statistically significant excess variation (σ > 0) we have
to compare the model containing no random-effects terms (Table 9) with that contain-
ing an excess-variation term (Table 11) using the likelihood ratio statistic. The square
root of the likelihood ratio statistic is treated as a one-tailed z-statistic since the linear
predictor for the random-effects portion of the regression is restricted to being
non-negative. This LRS has a value of 5.36, which yields a p-value of 0.010, indicating
a significant excess of variation. Note that the standard errors for LO and BNC
in Table 11 have increased in relation to the corresponding ones derived according to
the standard logistic regression model (Table 9). This is the practical effect of the heterogeneity,
or extra-binomial variation, in the data, since both variables are related to
C, the matching variable.
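The one-tailed test for excess variation can be sketched from the two deviances involved (Table 9 without and Table 11 with the excess-variation term). The function name is an assumption made for this illustration; the normal tail probability is obtained from the complementary error function in the standard library.

```python
import math

def one_tailed_p_from_lrs(lrs):
    """Treat sqrt(LRS) as a standard normal z-statistic and return the
    upper-tail probability."""
    z = math.sqrt(lrs)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Deviance of the fixed-effects model (Table 9) minus that of the
# logistic-binomial model (Table 11).
lrs = 229.08 - 223.72
p = one_tailed_p_from_lrs(lrs)
print(round(lrs, 2), round(p, 3))
```

For a statistic with one degree of freedom this one-tailed value is half the usual two-sided χ² p-value, which reflects the restriction σ ≥ 0.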
The variable BNC contributes to the model together with the variable NE, whilst the
variable LO has a separate effect. Thus, it may be that once σ is in the model LO is no
longer significant. This question is analysed in Tables 12 and 13. The intention is to
find the best place for LO in the model.
Table 12: Results of Fitting Six Models to the Data, Using Logistic Regression With
and Without Random Effects.

Fit   Fixed Effects Parameters   Random Effects Parameters   Deviance (df)
A     Model I                                                271.19 (162)
B⁴    Model I, LO                                            229.08 (161)
C     Model I                    EV                          231.33 (161)
D⁵    Model I, LO                EV                          223.72 (160)
E     Model I                    EV, LO                      228.16 (160)
F     Model I, LO                EV, LO                      221.86 (159)
Table 13: Analysis of the Fits Reported in Table 12.

Test  Explanation                                                    Comparison  LRS    df  p
1     Test for LO differences                                        A vs. B     42.11  1   <0.001
2     Test for excess variation given no LO differences              A vs. C     39.86  1   <0.001⁶
3     Test for excess variation given LO differences                 B vs. D      5.36  1   0.010⁶
4     Test for LO differences in the presence of excess variation    C vs. D      7.61  1   0.005
5     Test if the two LO groups need to be fit separately            D vs. F      1.86  1   0.163
6     Test for LO differences, while fitting separate amounts
      of excess variation                                            E vs. F      6.30  1   0.012

Note. All tests assume the presence of LI, NE and BNC differences.
In Table 12 we set out the results of fitting six different regressions to the data, the
first two using logistic regression and the last four using logistic-binomial regression.
‘Model I’ represents the fixed-effects model containing the variables LI, NE, BNC and
the NE×BNC interaction. ‘EV’ denotes the term that parametrizes the excess variation.
4 Fit B is the final fixed-effects model in Table 9.
5 Fit D is the random-effects model in Table 11.
6 This test compares the square root of the likelihood ratio statistic against a one-tailed normal distribution.
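The first three tests in Table 13 are highly significant, as was to be expected after the
model-building process concluded in Table 11. Test 4 shows that the variable LO must
be present in the fixed-effects portion of the model. Test 5 is not significant, so fitting
separate amounts of excess variation for the two LO groups (fit F) does not improve on
fit D, while test 6 shows that LO differences remain even when separate amounts of excess
variation are fitted. These last two tests allowed us to conclude that the best-fitting
model is D, i.e. the random-effects model in Table 11.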
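Each likelihood ratio statistic in Table 13 is the difference in deviance between two nested fits in Table 12, with degrees of freedom equal to the difference in residual degrees of freedom. The following sketch recomputes the LRS column from the deviances; the dictionary layout and the helper name are illustrative choices, not part of the original analysis.

```python
# Deviance and residual degrees of freedom for each fit, copied from Table 12
fits = {
    "A": (271.19, 162),
    "B": (229.08, 161),
    "C": (231.33, 161),
    "D": (223.72, 160),
    "E": (228.16, 160),
    "F": (221.86, 159),
}

# (reduced fit, fuller fit) pairs for tests 1 to 6 of Table 13
comparisons = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "F"), ("E", "F")]


def likelihood_ratio(reduced, fuller):
    """Return (LRS, df) for two nested fits as differences of deviance and df."""
    dev_r, df_r = fits[reduced]
    dev_f, df_f = fits[fuller]
    return round(dev_r - dev_f, 2), df_r - df_f


results = {pair: likelihood_ratio(*pair) for pair in comparisons}

for test, (pair, (lrs, df)) in enumerate(results.items(), start=1):
    print(f"Test {test}: {pair[0]} vs. {pair[1]}  LRS = {lrs:.2f}  df = {df}")
```

The recomputed statistics coincide with the LRS column of Table 13, and every comparison has one degree of freedom.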
6 Conclusions
We have applied the logistic regression model for estimating the effects of noise, per-
sonality and other factors upon the performance of students in speech intelligibility
tests. The model-building methodology used in the analysis leaves the selection of vari-
ables in the hands of the analyst rather than in the computer's control. The rationale for
this approach is to provide as complete control of confounding as possible within the
given data set. On the basis of this analysis only four factors appear to have a close re-
lationship with performance results. Two of these are related to the acoustic conditions
in the classroom, location (LO) and background noise (BN); the third, student neuroti-
cism (N), is related to the subject's own personality, and the last one is the distance
between the student and the source of the speech signals (LI).
The extension of the model to include random effects substantially improves the fit
and provides the best tool for forecasting purposes.
The main behavioral implications of the models may be summarised as follows:
1. The variables LI and LO have separate effects on performance. For a classroom giving
onto the playground, the proportion of successes for a student in the row nearest the
sound source is estimated to be 12% to 25% greater than for a student in the middle rows,
and 30% to 65% greater than for a student in the rows farthest away, all other covariates
being equal. In a classroom giving onto the main road the corresponding percentages range
from 19% to 29% and from 45% to 75% respectively. Overall, schoolchildren in classrooms
giving onto the playground perform better than those in classrooms giving onto the road:
the proportion of successes for a student in the former situation is estimated to be 37%
to 91% greater than for a student in the latter situation, all other factors being equal.
2. The variables N and BN exert a joint effect on performance. For any combination
(LI, LO) the categories of N × BN rank in the following order of performance:
high × normal, low-to-medium × normal, high × noisy, very high × normal,
very high × noisy and low-to-medium × noisy, where we use the adjectives ‘normal’
and ‘noisy’ to refer to BN ≤ 47 dBA and BN > 47 dBA respectively. Thus the most
favourable combination is a student with a high level of neuroticism in a normal
classroom. The least favourable combination, on the other hand, is a student
with a low-to-medium level of neuroticism in a noisy classroom. It is also noteworthy
that a student with a high level of neuroticism in a noisy classroom has a better forecast
than a student with a very high level of neuroticism in a normal classroom.
3. The classroom represents a block factor in the data with a random effect on
performance. This means that two students with the same covariates but belonging to
two different classrooms have estimated proportions of successes which differ by a
random amount, the approximate distribution of which is N(0, 0.18).
4. Finally, we may conclude that the covariate pattern with the best performance is
given by the following values: front row (LI), playground (LO), high-level neuroticism
(N) and normal noise (BN). For this pattern, the estimated proportion of successes,
without taking into account the random effect, is 0.661. The worst covariate pattern is
given by the rows farthest from the speech source (LI), road (LO), low-to-medium level
of neuroticism (N) and a noisy background (BN), with an estimated proportion of successes,
irrespective of the random effect, of 0.083.
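The ranking in point 2 and the extreme patterns in point 4 can be illustrated by evaluating the fixed-effects part of the model in Table 11 with the random effect set to zero. The sketch below rests on an assumed dummy coding that is not stated in this section: low-to-medium neuroticism, front row, playground and normal noise as reference levels, with NE1 = high, NE2 = very high, LI1 = middle rows, LI2 = farthest rows, LO = road and BNC = noisy. The function and level names are hypothetical.

```python
import math
from itertools import product

# Fixed-effects estimates copied from Table 11
COEF = {
    "const": 0.538, "LI1": -0.306, "LI2": -0.653, "LO": -0.691,
    "NE1": 0.073, "NE2": -0.353, "BNC": -1.610,
    "BNCxNE1": 1.292, "BNCxNE2": 0.976,
}

# Assumed mapping from covariate levels to dummy variables (None = reference level)
LI_TERM = {"front": None, "middle": "LI1", "far": "LI2"}
NE_TERM = {"low-to-medium": None, "high": "NE1", "very high": "NE2"}


def proportion(li, lo, ne, bn):
    """Estimated proportion of successes, ignoring the classroom random effect."""
    g = COEF["const"]
    if LI_TERM[li]:
        g += COEF[LI_TERM[li]]
    if lo == "road":
        g += COEF["LO"]
    if NE_TERM[ne]:
        g += COEF[NE_TERM[ne]]
    if bn == "noisy":
        g += COEF["BNC"]
        if NE_TERM[ne]:
            g += COEF["BNCx" + NE_TERM[ne]]  # BNC x NE interaction
    return 1.0 / (1.0 + math.exp(-g))  # inverse logit


# Rank the six N x BN combinations for a fixed (LI, LO) pair
ranking = sorted(
    product(NE_TERM, ["normal", "noisy"]),
    key=lambda c: proportion("front", "playground", *c),
    reverse=True,
)
best = proportion("front", "playground", "high", "normal")
worst = proportion("far", "road", "low-to-medium", "noisy")

print(ranking)
print(f"best = {best:.3f}, worst = {worst:.3f}")
```

Under this assumed coding the ranking reproduces the order given in point 2, and the best and worst patterns evaluate to about 0.648 and 0.082. These are close to, but not identical with, the 0.661 and 0.083 quoted above, which may have been obtained from a different fit.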
It is important to note that one of the main results is the significant interaction between
BNC and NE; that is to say, although several variables may exert main effects on the
speech comprehension rate (intelligibility), only an analysis of the interactions of these
variables with the effect of background noise reveals significant information concerning
the influences upon susceptibility.