journal soche vol 3.1


  • 7/31/2019 Journal Soche Vol 3.1

    1/117

    Volume 3 - Number 1 - April 2012




    Chilean Journal of Statistics

    Volume 3, Number 1

    April 2012

    ISSN: 0718-7912 (print)/ISSN: 0718-7920 (online)

    © Chilean Statistical Society - Sociedad Chilena de Estadística

    http://www.soche.cl/chjs


    Chilean Journal of Statistics Volume 3 Number 1 April 2012

    Contents

    Reinaldo Arellano-Valle
    A message from the editor-in-chief 1

    Carmen Batanero and Carmen Díaz
    Training school teachers to teach probability: reflections and challenges 3

    Alessandra Guglielmi, Francesca Ieva, Anna Maria Paganoni and Fabrizio Ruggeri
    A Bayesian random-effects model for survival probabilities after acute myocardial infarction 15

    Christophe Chesneau and Nargess Hosseinioun
    On the wavelet estimation of a function in a density model with non-identically distributed observations 31

    Rahim Mahmoudvand and Mohammad Zokaei
    On the singular values of the Hankel matrix with application in singular spectrum analysis 43

    Luis Gustavo Bastos Pinho, Juvêncio Santos Nobre and Sílvia Maria de Freitas
    On linear mixed models and their influence diagnostics applied to an actuarial problem 57

    Lutemberg Florencio, Francisco Cribari-Neto and Raydonal Ospina
    Real estate appraisal of land lots using GAMLSS models 75

    Arabin Kumar Dey and Debasis Kundu
    Discriminating between the bivariate generalized exponential and bivariate Weibull distributions 93


    Chilean Journal of Statistics
    Vol. 3, No. 1, April 2012, 1–2

    FIFTH ISSUE

    A message from the Editor-in-Chief

    The fifth issue of the Chilean Journal of Statistics contains the following seven articles, which address various interesting topics related to statistics and probability:

    (i) The first article is authored by Carmen Batanero and Carmen Díaz. Dr. Batanero is widely recognized around the world for her contributions to the field of statistics education. In this article, the authors offer some reflections on the importance of training school teachers to teach probability. They analyze the reasons why the teaching of probability is difficult for mathematics teachers. In addition, they describe the contents needed in the preparation of teachers to teach probability and suggest possible activities to carry out this training.

    (ii) The second article is authored by Alessandra Guglielmi, Francesca Ieva, Anna M. Paganoni and Fabrizio Ruggeri. Dr. Ruggeri has extensive experience in survival and reliability analysis and is internationally recognized for his outstanding contributions to Bayesian statistics. In this article, the authors propose a Bayesian random-effects model for survival probabilities after acute myocardial infarction. They present a case study applying a Bayesian hierarchical generalized linear model to a real data set on patients diagnosed with myocardial infarction. In particular, they obtain posterior estimates of the model parameters (regression and random-effects parameters) through an MCMC algorithm. Some model-fitting issues are also discussed through the use of predictive tail probabilities and Bayesian residuals.

    (iii) The third article is authored by Christophe Chesneau and Nargess Hosseinioun. Dr. Chesneau is a prominent researcher on the topic of the article. In this article, the authors conduct an interesting study on nonparametric density estimation. They investigate the estimation of a common but unknown function associated with the density functions of non-identically distributed observations by means of the powerful tool of wavelet analysis. To do this, they construct a new linear wavelet estimator and study its performance for independent and dependent data. Then, in the independent case, they develop a new adaptive hard thresholding wavelet estimator and prove that it attains a sharp rate of convergence.

    (iv) The fourth article is authored by Rahim Mahmoudvand and Mohammad Zokaei. In this article, the authors provide applications of the Hankel matrix in spectral analysis. The aim of their work is to obtain some theoretical properties of the singular values of the Hankel matrix that can be used directly for choosing proper values of the two parameters of singular spectrum analysis.


    (v) The fifth article is authored by Luis Gustavo Bastos Pinho, Juvêncio Santos Nobre and Sílvia Maria de Freitas. In this article, the authors consider linear mixed models and diagnostic tools for the statistical analysis of some practical actuarial problems. Their idea is based on the fact that linear mixed models are an alternative to traditional credibility models. Since the main advantage of linear mixed models is the availability of diagnostic methods, they argue that these methods may help to improve the model choice and to identify outliers or influential subjects, which deserve closer attention from the insurer.

    (vi) The sixth article is authored by Lutemberg Florencio, Francisco Cribari-Neto and Raydonal Ospina. Dr. Cribari-Neto is a highly recognized Brazilian researcher with excellent academic credentials. In this article, the authors perform real estate appraisal using a class of statistical models called generalized additive models for location, scale and shape (GAMLSS). By means of an empirical analysis, they show that the GAMLSS models seem to be more appropriate for estimation of the hedonic prices function than the regression models currently used to that end.

    (vii) The seventh article of this issue is due to Arabin Kumar Dey and Debasis Kundu. Dr. Kundu is a highly productive Indian researcher. In this article, the authors study the discrimination between the bivariate generalized exponential and bivariate Weibull distributions. To do this, they use the difference of the respective maximized log-likelihood functions, for which they determine the asymptotic distribution of the corresponding test statistic and calculate the associated asymptotic probability of correct selection. Their work concludes with numerical illustrations of the effectiveness of the proposed methodology.

    Finally, I take this opportunity to give my most sincere thanks to our Executive Editor, Professor Víctor Leiva, for his constant, profound and noble commitment to the editing of each issue of our journal.

    Reinaldo B. Arellano-Valle¹

    Editor-in-Chief
    Chilean Journal of Statistics

    ¹ Departamento de Estadística, Facultad de Matemáticas, Pontificia Universidad Católica de Chile, Apartado Código Postal: 7820436, Santiago, Chile. Email: [email protected]


    Chilean Journal of Statistics
    Vol. 3, No. 1, April 2012, 3–13

    Statistics Education

    Teaching Paper

    Training school teachers to teach probability: reflections and challenges

    Carmen Batanero¹,* and Carmen Díaz²

    ¹ Departamento de Didáctica de la Matemática, Universidad de Granada, Spain
    ² Departamento de Psicología, Universidad de Huelva, Spain

    (Received: 31 August 2009 · Accepted in final form: 01 November 2009)

    Abstract

    Although probability is today part of the mathematics curricula for primary and secondary schools in many countries, specific training to teach probability is far from being a universal component of pre-service courses for the mathematics teachers responsible for this teaching. In this paper, we analyse the reasons why the teaching of probability is difficult for mathematics teachers. In addition, we describe the contents needed in the preparation of teachers to teach probability and suggest possible activities to carry out this training.

    Keywords: Teaching probability · School level · Training of mathematics teachers.

    Mathematics Subject Classification: Primary 97A99 · Secondary 60A05.

    1. Introduction

    The reasons for including probability in schools have been repeatedly highlighted over the past years; see, e.g., Gal (2005), Franklin et al. (2005) and Jones (2005). These reasons are related to the usefulness of probability for daily life, its instrumental role in other disciplines, the need for basic stochastic knowledge in many professions, and the important role of probabilistic reasoning in decision making. Students will meet randomness not only in the mathematics classroom, but also in biological, economic, meteorological, political and social (games and sports) settings. All these reasons explain why probability has recently been included in the primary school curriculum in many countries from a very early age and why the study of probability continues later through secondary school, high school and university studies.

    * Corresponding author. Departamento de Didáctica de la Matemática. Facultad de Ciencias de la Educación. Universidad de Granada. Campus de Cartuja. 18071 Granada. España. Phone: (34)(958)243949. Fax: (34)(958)246359. Email: [email protected] (C. Batanero), [email protected] (C. Díaz)


    Changes in what is expected in the teaching of probability and statistics concern not just the age of learning or the amount of material, but also the approach to teaching. Until recently, the school stochastic (statistics and probability) curriculum was reduced to a formula-based approach that resulted in students who were ill prepared for tertiary-level statistics and adults who were statistically illiterate. The current tendency, even at primary school level, is towards a data-orientated teaching of probability, where students are expected to perform experiments or simulations, formulate questions or predictions, collect and analyse data from these experiments, and propose and justify conclusions and predictions that are based on data; see, e.g., NCTM (2000), Parzysz (2003) and MEC (2006a,b). As argued in Batanero et al. (2005), these changes force us to reflect on the teaching of chance and probability.

    The importance of developing stochastic thinking, and not just stochastic knowledge, in students is being emphasized in many curricula. Indeed, some authors argue that stochastic reasoning is different from mathematical reasoning, both of them being essential to modern society and complementing each other in ways that strengthen the overall mathematics curriculum for students; see Scheaffer (2006).

    Changing the teaching of probability in schools will depend on the extent to which we can convince teachers that this is one of the most useful themes for their students, as well as on the correct preparation of these teachers. Unfortunately, several authors agree that many of the current programmes do not yet train teachers adequately for their task of teaching statistics and probability; see, e.g., Begg and Edwards (1999), Franklin and Mewborn (2006), Borim and Coutinho (2008) and Chick and Pierce (2008). Even when prospective secondary teachers have a major in mathematics, they usually study only theoretical (mathematical) statistics and probability in their training. Few mathematicians receive specific training in applied statistics, designing probability investigations or simulations, or analysing data from these investigations. These teachers also need some training in the pedagogical knowledge related to the teaching of probability, where general principles that are valid for geometry, algebra or other areas of mathematics cannot always be applied. The situation is even more challenging for primary teachers, few of whom have had suitable training in either theoretical or applied probability, and traditional introductory statistics courses will not provide them with the didactical knowledge they need; see Franklin and Mewborn (2006).

    Research in statistics education shows that textbooks and curriculum documents prepared for primary and secondary teachers might not offer enough support. Sometimes they present too narrow a view of concepts (for example, only the classical approach to probability is shown). At other times, applications are restricted to games of chance. Finally, in some of them the definitions of concepts are incorrect or incomplete; see Cañizares et al. (2002). There are also exceptional examples and experiences of courses specifically directed at training teachers in different countries, some of them based on theoretical models of how this training should be carried out; see, e.g., Kvatinsky and Even (2002), Batanero et al. (2004) and Garfield and Everson (2009). Although evaluation of the success of such courses is in general based on small samples or subjective data, they do provide examples and ideas for teacher educators.

    The aim of this paper is to reflect on some specific issues and challenges regarding the education of teachers to teach probability at school level. In Section 2, we analyse the specific features of probability and the different meanings of this concept that should be taken into account when teaching probability at school level. In Section 3, we summarise the scarce research related to teachers' beliefs and knowledge as regards probability, which are not always adequate. In Section 4, we discuss possible activities that can contribute to the education of teachers to teach probability. Finally, in Section 5, we conclude this work with some personal recommendations.


    2. The Nature of Probability

    A main theme in preparing teachers is discussing with them some epistemological problems linked to the emergence of probability, because this reflection can help teachers to understand students' conceptual difficulties in problem solving. Probability is a young area and its formal development was linked to a large number of paradoxes, which show the disparity between intuition and mathematical formalization in this field; see Borovcnik and Peard (1996). Counterintuitive results in probability are found even at very elementary levels (for example, the fact that having obtained a run of four consecutive heads when tossing a coin does not affect the probability that the following toss will result in heads is counterintuitive). Difficulty at higher levels is also indicated by the fact that, even though the Kolmogorov axioms were generally accepted in 1933, professional statisticians still debate different views of probability and different methodologies of inference.
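    The coin-tossing example above can be explored with a short simulation. The following is only an illustrative sketch, not part of the original paper: the function name, the number of tosses and the seed are assumptions chosen for the illustration. It estimates how often a fair coin shows heads on the toss that immediately follows a run of four heads.

```python
import random

def next_toss_after_run(num_tosses=200_000, run_length=4, seed=0):
    """Estimate P(heads) on the toss that follows a run of `run_length` heads.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    tosses = [rng.random() < 0.5 for _ in range(num_tosses)]  # True = heads
    # Collect every toss that is immediately preceded by `run_length` heads.
    followers = [
        tosses[i]
        for i in range(run_length, num_tosses)
        if all(tosses[i - run_length:i])
    ]
    return sum(followers) / len(followers), len(followers)

estimate, count = next_toss_after_run()
print(f"{count} runs of four heads; next toss was heads {estimate:.3f} of the time")
```

    Under these assumptions the estimated proportion should stay close to one half, which is what independence of the tosses predicts.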

    Another difference is reversibility. In arithmetic or geometry, an elementary operation (like addition) can be reversed and this reversibility can be represented with concrete materials. This is very important for young children, who are still closely tied to concrete situations in their mathematical thinking. For example, when joining a group of two apples with another group of three apples, a child always obtains the same result (five apples). Moreover, when separating the second set from the total, he/she always returns to the original set, no matter how many times this operation is repeated. These experiences are very important to help children progressively abstract the mathematical structure behind them. In the case of a random experiment, we obtain different results each time the experiment is carried out and the experiment cannot be reversed (we cannot guarantee getting the first result again when repeating the experiment). This makes the learning of probability comparatively harder for children.

    Of particular relevance for teaching probability are the informal ideas that children and adolescents assign to chance and probability before instruction and that can affect their subsequent learning. As an example, Truran (1995) found substantial evidence that young children do not see random generators such as dice or marbles in urns as having constant properties and consider that a random generator has a mind of its own or may be controlled by outside forces. There is also evidence that people at different ages maintain probability misconceptions that are hard to eradicate with only a formal teaching of the topic; see Jones et al. (2007). Even though simulation or experimentation with random generators, such as dice and coins, has a very important function in stabilizing children's intuition and in materializing probabilistic problems, these experiences do not provide the key to how and why the problems are solved. It is only with the help of combinatorial schemes or tools like tree diagrams that children start to understand the solution of probabilistic problems; see Fischbein (1975). This indicates the complementary nature of classical and frequentist approaches to probability.
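    As a hedged illustration of how the classical and frequentist approaches complement each other, the sketch below first enumerates the sample space for the sum of two dice (the programmatic counterpart of a tree diagram) and then estimates the same probability by simulation. The chosen event (a sum of 7), the number of trials and the seed are assumptions made for this example; they do not come from the paper.

```python
import random
from fractions import Fraction
from itertools import product

# Classical approach: enumerate all 36 equally likely ordered outcomes.
outcomes = list(product(range(1, 7), repeat=2))
favourable = [pair for pair in outcomes if sum(pair) == 7]
classical = Fraction(len(favourable), len(outcomes))

# Frequentist approach: relative frequency of the same event in many rolls.
rng = random.Random(2)   # fixed seed, assumed only for reproducibility
trials = 60_000          # illustrative number of simulated rolls
hits = sum(rng.randint(1, 6) + rng.randint(1, 6) == 7 for _ in range(trials))
frequentist = hits / trials

print(f"classical: {classical}, simulated: {frequentist:.4f}")
```

    The enumeration explains why the value is 1/6, while the simulation only shows that the relative frequency settles near it.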

    Another reason for difficulty in the field of probability is that the meaning of some concepts is sometimes too tied to applications. For instance, although independence is mathematically reduced to the multiplicative rule, this definition does not include all the causality problems that subjects often relate to independence, nor does it always serve to decide whether there is independence in a particular experiment.
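    A small worked example may make the multiplicative rule concrete. The events below are illustrative assumptions (one roll of a fair die), not taken from the paper; they show that two events defined on the same roll can be independent in the formal sense even though no causal story is involved, and that a slight change of event breaks independence.

```latex
% Illustrative example (assumed, not from the original paper): one roll of a fair die.
\[
A \text{ and } B \text{ are independent} \iff P(A \cap B) = P(A)\,P(B).
\]
\[
A=\{2,4,6\},\quad B=\{1,2,3,4\}:\qquad
P(A\cap B)=P(\{2,4\})=\tfrac{1}{3}=\tfrac{1}{2}\cdot\tfrac{2}{3}=P(A)\,P(B).
\]
\[
A=\{2,4,6\},\quad C=\{1,2,3\}:\qquad
P(A\cap C)=P(\{2\})=\tfrac{1}{6}\neq\tfrac{1}{2}\cdot\tfrac{1}{2}=P(A)\,P(C).
\]
```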

    2.1 Meanings of probability

    Different meanings are also linked to the concept of probability, depending on the applications of this concept in real situations; several of them are relevant in the teaching of the topic, such as:


    3.2 Teachers' probabilistic knowledge

    Some of the activities in which teachers regularly engage involve mathematical reasoning and thinking. Such activities include figuring out what students know, choosing and managing representations of mathematical ideas, selecting and modifying textbooks and deciding among alternative courses of action; see Ball et al. (2001, p. 243). Consequently, teachers' instructional decisions in the teaching of probability are dependent on the teachers' probabilistic knowledge. This is cause for concern when paired with evidence that mathematics teachers, especially at the primary school level, tend to have a weak understanding of probability. For example, the study by Begg and Edwards (1999) found that only about two-thirds of in-service and pre-service primary school teachers understood equally probable events and very few understood the concept of independence. Batanero et al. (2005) analysed results from an initial assessment based on a sample of 132 pre-service teachers in Spain, which showed that they frequently held three probabilistic misconceptions: representativeness, equiprobability and the outcome approach. Research conducted by Borim and Coutinho (2008) obtained the following results. First, secondary school teachers' predominant reasoning about variation in a random variable was verbal, which did not allow these teachers to teach their students the meaning of measures such as standard deviation, restricting them to the teaching of algorithms. Second, none of the teachers integrated process reasoning, which would relate the understanding of mean, deviations from the mean, the interval of k standard deviations from the mean and the density estimation of frequency in that interval. Canada (2008) examined how pre-service teachers reasoned about distributions as they compared graphs of two data sets and found that almost 35% of the sample found no real difference when the averages were similar but the spreads were quite different.
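    The distinction between average and spread discussed above can be illustrated numerically. The sketch below is an assumption-laden example, not data from any of the cited studies: the two small data sets and the helper `share_within` are hypothetical. It computes the mean, the (population) standard deviation and the share of values lying within k standard deviations of the mean.

```python
from statistics import mean, pstdev

# Two hypothetical data sets with the same mean but very different spread.
class_a = [48, 49, 50, 51, 52]
class_b = [10, 30, 50, 70, 90]

def share_within(data, k):
    """Share of values within k population standard deviations of the mean."""
    m, s = mean(data), pstdev(data)
    return sum(abs(x - m) <= k * s for x in data) / len(data)

for name, data in (("A", class_a), ("B", class_b)):
    print(name, mean(data), round(pstdev(data), 2), share_within(data, 1))
```

    Both sets have mean 50, yet their standard deviations differ by a factor of twenty, which is the kind of difference that a comparison based only on averages would miss.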

    Few teachers have prior experience with conducting probability experiments or simulations and may have difficulty implementing an experimental approach to teaching probability; see Stohl (2005). In research conducted by Lee and Hollebrands (2008), although the participant teachers engaged students in investigations based on probability experiments, they often missed opportunities for deepening students' reasoning. Teachers' approaches to using empirical estimates of probability did not foster a frequentist conception of probability as a limit of a stabilized relative frequency after many trials. Teachers almost exclusively chose small sample sizes and rarely pooled class data or used representations supportive of examining distributions and variability across collections of samples, so they failed to address the heart of the issue.
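    The effect of small samples versus pooled class data can be sketched with a simulation. Everything in the example below is an assumption made for illustration (30 students, 10 tosses each, a fair coin, a fixed seed); it is not the design used by Lee and Hollebrands (2008).

```python
import random

def relative_frequency(num_trials, rng, p=0.5):
    """Relative frequency of successes in `num_trials` independent trials."""
    successes = sum(rng.random() < p for _ in range(num_trials))
    return successes / num_trials

rng = random.Random(1)  # fixed seed, assumed only for reproducibility

# Hypothetical class: 30 students, each tossing a fair coin 10 times.
small_samples = [relative_frequency(10, rng) for _ in range(30)]
spread_small = max(small_samples) - min(small_samples)

# Pooling: with equal sample sizes, the mean of the individual relative
# frequencies equals the relative frequency of the pooled 300 tosses.
pooled = sum(small_samples) / len(small_samples)

# A long run, to show the relative frequency stabilizing.
long_run = relative_frequency(100_000, rng)

print(f"range across small samples: {spread_small:.2f}")
print(f"pooled class estimate: {pooled:.3f}, long run: {long_run:.4f}")
```

    Individual small samples typically vary widely, whereas the pooled estimate and the long run sit much closer to the underlying probability, which is the stabilization that a frequentist conception relies on.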

    3.3 Teachers' professional knowledge

    Wide probability knowledge, although essential, is not enough for teachers to be able to teach probability. As argued by Ponte and Chapman (2006), we should view teachers as professionals, and ground teacher education in professional practice, making all elements of practice (preparing lessons, tasks and materials, carrying out lessons, observing and reflecting on lessons) a central element in the teacher education process. In fact, research focused on teachers' training is producing a great deal of information about this professional knowledge, which includes the following complementary aspects:

    • Epistemological reflection on the meaning of concepts to be taught (e.g., reflection on the different meanings of probability). For the particular case of stochastic knowledge, Biehler (1990) also suggested that teachers need a historical, philosophical, cultural and epistemological perspective on this knowledge and its relationships to other domains of science.

    • Experience in adapting this knowledge to different teaching levels and students' various levels of understanding. This includes, according to Steinbring (1990), organizing and implementing teaching, experiencing students' multiple forms of work and understanding experiments, simulations and graphical representations not just as methodological teaching aids, but rather as essential means of knowing and understanding.

    • Critical capacity to analyse textbooks and curricular documents.

    • Prediction of students' learning difficulties, errors, obstacles and strategies in problem solving (e.g., students' strategies in comparing two probabilities and students' confusion between the two terms in a conditional probability).

    • Capacity to develop and analyse assessment tests and instruments and interpret students' responses to the same.

    • Experience with good examples of teaching situations, didactic tools and materials (e.g., challenging and interesting problems, Galton board, simulation, calculators, etc.).

    Some significant issues related to the professional knowledge of teachers are whether or not teachers are able to (i) recognize what concepts can be addressed through a particular resource or task, and (ii) implement effective learning in the classroom with them. However, the current preparation of teachers does not always assure this professional knowledge. For example, research conducted by Chick and Pierce (2008) found that the teachers' lack of professional knowledge was evident in their approaches to the lesson-planning task: they failed to bring significant concepts to the fore, despite all the opportunities that were inherent in the teaching tasks and resources.

    4. Possible Activities to Train Teachers in Probability

    It is important to find suitable and effective ways to teach this mathematical and professional knowledge to teachers. Since students build their knowledge in an active way, by solving problems and interacting with their classmates, we should use this same approach in training teachers, especially if we want them to use a constructivist and social approach in their teaching; see Jaworski (2001). An important view is that we should give teachers more responsibility in their own training and help them to develop creative and critical thinking. That is why we should create suitable conditions for teachers to reflect on their previous beliefs about teaching and discuss these ideas with other colleagues.

    One fundamental learning experience that teachers should have to develop their probabilistic thinking is working with experiments and investigations. To teach inquiry, teachers need skills often absent in mathematics classrooms, such as the ability to cope with ambiguity and uncertainty, a re-balancing between teacher guidance and student independence, and a deep understanding of disciplinary content. Some other approaches in the training of teachers include:

    • Teachers' collective analysis and discussion of the students' responses, behavior, strategies, difficulties and misconceptions when solving probability problems. Groth (2008) suggested that teachers of stochastics must deal with two layers of uncertainty in their daily work. The first layer relates to disciplinary knowledge. The second arises because uncertainty is also ubiquitous in teaching, owing to the unique and dynamic interactions among teacher, students and subject matter in any given classroom. Hence, teachers must understand and navigate the uncertainty inherent to both stochastics and the classroom simultaneously in order to function effectively. Case discussion among a group of teachers, where they offer and debate conjectures about general pedagogy, mathematical content and content-specific pedagogy, can help teachers challenge one another's claims and interpretations; see Groth (2008).

    • Planning a lesson to teach students some content using a given instructional device to develop probability and professional knowledge of teachers; see Chick and Pierce (2008). Since teachers are asked to teach probability for understanding, it is essential that they experience the same process as their students. One way to do this is to have the students play the role of the learner and the teacher at the same time, going through an actual "teacher as learner" practice. If they have the chance to go through such a lesson as a learner and at the same time look at it from the point of view of a teacher, chances are that they will try it out in their own classrooms.

    • Project work. New curriculum and methodology guidelines suggest that, when teachers are involved in research projects, it can change how mathematics is experienced in the classroom, especially in connection with stochastics. Inquiry is a well-accepted (but not always implemented) process in other school subjects, like science and social studies, but it is rarely used in a mathematics classroom (where statistics is usually taught). Moreover, when the time available for teaching is scarce, a formative cycle in which teachers are first given a project to work with and then carry out a didactical analysis of the project can help to increase the teachers' mathematical and pedagogical knowledge and, at the same time, provide the teacher educator with information regarding the future teachers' previous knowledge and learning; see Godino et al. (2008).

    • Working with technology. We can also capitalize on the ability of some software to be used as a tool-builder to gain conceptual understanding of probability ideas. Lee and Hollebrands (2008) introduced a design that used technology both as an amplifier and as a reorganiser to engage teachers in tasks that simultaneously developed their understanding of probability with technology and provided teachers with first-hand experience of how technology tools can be useful in fostering stochastic thinking. In the experiences of Batanero et al. (2005), simulation helped to train teachers simultaneously in probability content and its pedagogy, since it helps to improve the teachers' probabilistic knowledge, while making them conscious of incorrect intuitions within their students and themselves.

    5. Further Reflections

    Teachers need support and adequate training to achieve a suitable balance of intuition and rigour when teaching probability. Unfortunately, due to time pressure, teachers do not always receive good preparation to teach probability in their initial training. It is important to convince teacher educators that stochastics is an essential ingredient in the training of teachers. Moreover, despite the acknowledged fact that probability is distinct and different from other areas of mathematics and the implied need to provide mathematics teachers with a special preparation to teach this topic, it is possible to connect the stochastic and mathematical preparation of teachers when time for training teachers is scarce.

    Finally, much more research is still needed to clarify the essential components in the preparation of teachers to teach probability, identify adequate methods, and establish appropriate levels at which each component should be taught. The significant research efforts that have focused on mathematics teacher education and professional development in the past decade (see Llinares and Krainer, 2006; Ponte and Chapman, 2006; Hill et al., 2007; Wood, 2008) have not been reflected in statistics education. This is an important research area that can contribute to improving statistics education at school level.


    Acknowledgement

    This work was supported by the project EDU2010-14947, MICIIN, Madrid & FEDER.

    References

    Ball, D.L., Lubienski, S.T., Mewborn, D.S., 2001. Research on teaching mathematics: the unsolved problem of teachers' mathematical knowledge. In Richardson, V., (ed.). Handbook of Research on Teaching. American Educational Research Association, Washington, pp. 433-456.

    Batanero, C., Cañizares, M.J., Godino, J., 2005. Simulation as a tool to train pre-service school teachers. Proceedings of the First African Regional Conference of ICMI. ICMI, Cape Town.

    Batanero, C., Godino, J.D., Roa, R., 2004. Training teachers to teach probability. Journal of Statistics Education, 12. http://www.amstat.org/publications/jse.

    Batanero, C., Henry, M., Parzysz, B., 2005. The nature of chance and probability. In Jones, G.A., (ed.). Exploring Probability in School: Challenges for Teaching and Learning. Springer, New York, pp. 15-37.

    Begg, A., Edwards, R., 1999. Teachers' ideas about teaching statistics. Paper presented at the Annual Meeting of the Australian Association for Research in Education and the New Zealand Association for Research in Education. Melbourne, Australia.

    Biehler, R., 1990. Changing conceptions of statistics: a problem area for teacher education. In Hawkins, A., (ed.). Training Teachers to Teach Statistics. Proceedings of the International Statistical Institute Round Table Conference. International Statistical Institute, Voorburg, pp. 20-38.

    Borim, C., Coutinho, C., 2008. Reasoning about variation of a univariate distribution: a study with secondary mathematics teachers. In Batanero, C., Burrill, G., Reading, C., Rossman, A., (eds.). Joint ICMI/IASE Study: Teaching Statistics in School Mathematics. Challenges for Teaching and Teacher Education. Proceedings of the ICMI Study 18 and 2008 IASE Round Table Conference. ICMI and IASE, Monterrey. http://www.ugr.es/~icmi/iase_study.

    Borovcnik, M., Peard, R., 1996. Probability. In Bishop, A., Clements, K., Keitel, C., Kilpatrick, J., Laborde, C., (eds.). International Handbook of Mathematics Education. Kluwer, Dordrecht, pp. 239-288.

    Canada, D.L., 2008. Conceptions of distribution held by middle school students and pre-service teachers. In Batanero, C., Burrill, G., Reading, C., Rossman, A., (eds.). Joint ICMI/IASE Study: Teaching Statistics in School Mathematics. Challenges for Teaching and Teacher Education. Proceedings of the ICMI Study 18 and 2008 IASE Round Table Conference. ICMI and IASE, Monterrey. http://www.ugr.es/~icmi/iase_study.

    Cañizares, M.J., Ortiz, J., Batanero, C., Serrano, L., 2002. Probabilistic language in Spanish textbooks. In Phillips, B., (ed.). ICOTS-6 Papers for School Teachers. International Association for Statistical Education, Cape Town, pp. 207-211.

    Chick, H.L., Pierce, R.U., 2008. Teaching statistics at the primary school level: beliefs, affordances, and pedagogical content knowledge. In Batanero, C., Burrill, G., Reading, C., Rossman, A., (eds.). Joint ICMI/IASE Study: Teaching Statistics in School Mathematics. Challenges for Teaching and Teacher Education. Proceedings of the ICMI Study 18 and 2008 IASE Round Table Conference. ICMI and IASE, Monterrey. http://www.ugr.es/~icmi/iase_study.


    Eichler, A., 2008. Germany, teachers' classroom practice and students' learning. In Batanero, C., Burrill, G., Reading, C., Rossman, A., (eds.). Joint ICMI/IASE Study: Teaching Statistics in School Mathematics. Challenges for Teaching and Teacher Education. Proceedings of the ICMI Study 18 and 2008 IASE Round Table Conference. ICMI and IASE, Monterrey. http://www.ugr.es/~icmi/iase_study.

    Fischbein, E., 1975. The Intuitive Source of Probability Thinking in Children. Reidel, Dordrecht.

    Franklin, C., Kader, G., Mewborn, D.S., Moreno, J., Peck, R., Perry, M., Scheaffer, R., 2005. A Curriculum Framework for K-12 Statistics Education. GAISE Report. American Statistical Association. http://www.amstat.org/education/gaise.

    Franklin, C., Mewborn, D., 2006. The statistical education of PreK-12 teachers: a shared responsibility. In Burrill, G., (ed.). NCTM 2006 Yearbook: Thinking and Reasoning with Data and Chance. NCTM, Reston, Virginia, pp. 335-344.

    Gal, I., 2005. Towards probability literacy for all citizens: building blocks and instructional dilemmas. In Jones, G., (ed.). Exploring Probability in School: Challenges for Teaching and Learning. Springer, New York, pp. 39-63.

    Garfield, J.B., Everson, M., 2009. Preparing teachers of statistics: a graduate course for future teachers. Journal of Statistics Education, 17. www.amstat.org/publications/jse.

    Godino, J.D., Batanero, C., Roa, R., Wilhelmi, M.R., 2008. Assessing and developing pedagogical content and statistical knowledge of primary school teachers through project work. In Batanero, C., Burrill, G., Reading, C., Rossman, A., (eds.). Joint ICMI/IASE Study: Teaching Statistics in School Mathematics. Challenges for Teaching and Teacher Education. Proceedings of the ICMI Study 18 and 2008 IASE Round Table Conference. ICMI and IASE, Monterrey. http://www.ugr.es/~icmi/iase_study.

    Groth, R.E., 2008. Navigating layers of uncertainty in teaching statistics through case discussion. In Batanero, C., Burrill, G., Reading, C., Rossman, A., (eds.). Joint ICMI/IASE Study: Teaching Statistics in School Mathematics. Challenges for Teaching and Teacher Education. Proceedings of the ICMI Study 18 and 2008 IASE Round Table Conference. ICMI and IASE, Monterrey. http://www.ugr.es/~icmi/iase_study.

    Hacking, I., 1975. The Emergence of Probability. Cambridge University Press, Cambridge.

    Hill, H.C., Sleep, L., Lewis, J.M., Ball, D., 2007. Assessing teachers' mathematical knowledge. In Lester, F., (ed.). Second Handbook of Research on Mathematics Teaching and Learning. Information Age Publishing, Inc. and NCTM, Greenwich, pp. 111-155.

    Jaworski, B., 2001. Developing mathematics teaching: teachers, teacher educators and researchers as co-learners. In Lin, L., Cooney, T.J., (eds.). Making Sense of Mathematics Teacher Education. Kluwer, Dordrecht, pp. 295-320.

    Jones, G., 2005. Introduction. In Jones, G., (ed.). Exploring Probability in School: Challenges for Teaching and Learning. Springer, New York, pp. 1-12.

    Jones, G., Langrall, C., Mooney, E., 2007. Research in probability: responding to classroom realities. In Lester, F., (ed.). Second Handbook of Research on Mathematics Teaching and Learning. Information Age Publishing, Inc. and NCTM, Greenwich.

    Kvatinsky, T., Even, R., 2002. Framework for teacher knowledge and understanding of probability. In Phillips, B., (ed.). Proceedings of the Sixth International Conference on Teaching Statistics. [CD-ROM]. International Statistical Institute, Voorburg, Netherlands.

    Laplace, P.S., 1985. Théorie Analytique des Probabilités [Analytical Theory of Probabilities]. Jacques Gabay, Paris. Original work published 1814.


    Lee, H.S., Hollebrands, K., 2008. Preparing to teach data analysis and probability with technology. In Batanero, C., Burrill, G., Reading, C., Rossman, A., (eds.). Joint ICMI/IASE Study: Teaching Statistics in School Mathematics. Challenges for Teaching and Teacher Education. Proceedings of the ICMI Study 18 and 2008 IASE Round Table Conference. ICMI and IASE, Monterrey. http://www.ugr.es/~icmi/iase_study.

    Llinares, S., Krainer, K., 2006. Mathematics student teachers and teacher educators as learners. In Gutierrez, A., Boero, P., (eds.). Handbook of Research on the Psychology of Mathematics Education. Sense Publishers, Rotterdam/Taipei, pp. 429-459.

    Ma, L.P., 1999. Knowing and Teaching Elementary Mathematics. Lawrence Erlbaum, Mahwah.

    MEC, 2006a. Real Decreto 1513/2006, de 7 de Diciembre, por el que se establecen las enseñanzas mínimas de la educación primaria.

    MEC, 2006b. Real Decreto 1631/2006, de 29 de Diciembre, por el que se establecen las enseñanzas mínimas correspondientes a la educación secundaria obligatoria.

    NCTM, 2000. Principles and standards for school mathematics. NCTM, Reston, Virginia. Retrieved August 31, 2006. http://standards.nctm.org.

    Parzysz, B., 2003. L'Enseignement de la statistique et des probabilités en France: Évolution au cours d'une carrière d'enseignant (période 1965-2002) [Teaching of statistics and probability in France. Evolution along a teacher's professional work (period 1965-2002)]. In B. Chaput (coord.), Probabilités au Lycée. Commission Inter-Irem Statistique et Probabilités, Paris.

    Ponte, J.P., Chapman, O., 2006. Mathematics teachers' knowledge and practices. In Gutierrez, A., Boero, P., (eds.). Handbook of Research on the Psychology of Mathematics Education: Past, Present and Future. Sense Publishers, Rotterdam, pp. 461-494.

    Scheaffer, R.L., 2006. Statistics and mathematics: on making a happy marriage. In Burrill, G., (ed.). NCTM 2006 Yearbook: Thinking and Reasoning with Data and Chance. NCTM, Reston, Virginia, pp. 309-321.

    Steinbring, H., 1990. The nature of stochastic knowledge and the traditional mathematics curriculum - Some experience with in-service training and developing materials. In Hawkins, A., (ed.). Training Teachers to Teach Statistics. Proceedings of the International Statistical Institute Round Table Conference. International Statistical Institute, Voorburg, pp. 2-19.

    Stohl, H., 2005. Probability in teacher education and development. In Jones, G., (ed.). Exploring Probability in School: Challenges for Teaching and Learning. Springer, New York, pp. 345-366.

    Truran, K.M., 1995. Animism: a view of probability behaviour. In Atweh, B., Flavel, S., (eds.). Proceedings of the Eighteenth Annual Conference of the Mathematics Education Group of Australasia MERGA. MERGA, Darwin, Northern Territory, Australia, pp. 537-541.

    Wood, T., 2008. The International Handbook of Mathematics Teacher Education. Sense Publishers, Rotterdam.


    Chilean Journal of Statistics, Vol. 3, No. 1, April 2012, 15-29

    Bayesian Statistics

    Research Paper

    A Bayesian random effects model for survival probabilities after acute myocardial infarction

    Alessandra Guglielmi (1), Francesca Ieva (1, corresponding author), Anna M. Paganoni (1) and Fabrizio Ruggeri (2)

    (1) Department of Mathematics, Politecnico di Milano, Milano, Italy
    (2) CNR IMATI, Milano, Italy

    (Received: 18 March 2011; Accepted in final form: 14 July 2011)

    Abstract

    Studies of variations in health care utilization and outcome involve the analysis of multilevel clustered data, considering in particular the estimation of a cluster-specific adjusted response, covariates effect and components of variance. Besides reporting on the extent of observed variations, those studies quantify the role of contributing factors including patients' and providers' characteristics. In addition, they may assess the relationship between health care process and outcomes. In this article we present a case-study, considering a Bayesian hierarchical generalized linear model, to analyze MOMI2 (Month Monitoring Myocardial Infarction in Milan) data on patients admitted with ST-elevation myocardial infarction diagnosis; both clinical registries and administrative databanks were used to predict survival probabilities. The major contributions of the paper consist in the comparison of the performance of the health care providers, as well as in the assessment of the role of patients' and providers' characteristics on survival outcome. In particular, we obtain posterior estimates of the regression parameters, as well as of the random effects parameters (the grouping factor is the hospital the patients were admitted to), through an MCMC algorithm. The choice of covariates is achieved in a Bayesian fashion as a preliminary step. Some issues about model fitting are discussed through the use of predictive tail probabilities and Bayesian residuals.

    Keywords: Bayesian generalized linear mixed models; Bayesian hierarchical models; Health services research; Logistic regression; Multilevel data analysis.

    Mathematics Subject Classification: Primary 62F15; Secondary 62P10, 62J12.

    1. Introduction

    Over recent years there has been a growing interest in the use of performance indicators in health care research, since they may measure some aspects of the health care process, clinical outcomes or disease incidence. Several examples available in the clinical literature (see, e.g., Hasday et al., 2002; Saia et al., 2009) make use of clinical registries to evaluate the performance of medical institutions, helping health governance to plan activities based on real epidemiological evidence and needs and to evaluate the performance of the structures it manages, providing knowledge about the number of cases, incidence, prevalence and survival concerning a specific disease. As a worthy contribution of this work, both a clinical registry and an administrative database were used to model in-hospital survival of acute myocardial infarction patients, in order to point out benchmarks to be used in the provider profiling process.

    Corresponding author. Francesca Ieva, Dipartimento di Matematica, Politecnico di Milano, Piazza Leonardo da Vinci 32, I-20133 Milano, Italy. Email: [email protected]

    The disease we are interested in is the ST-segment elevation acute myocardial infarction (STEMI): it consists of a stenotic plaque detachment, which causes a coronary thrombosis and a sudden critical reduction of blood flow in coronary vessels. STEMI is characterized by a high incidence (650-700 events per month have been estimated in the Lombardia region alone, whose inhabitants number approximately ten million) and serious mortality (about 8% in Italy); in fact, it is one of the main causes of death all over the world. A case of STEMI can be diagnosed through the electrocardiogram (ECG), by observing the elevation of the ST segment, and treated by thrombolytic therapy and/or percutaneous transluminal coronary angioplasty (PTCA), which up to now are the most common procedures. The patients in our data set always undergo a PTCA procedure directly, avoiding thrombolysis, even though the two treatments are not mutually exclusive. In any case, good results for either treatment can be evaluated by observing first the in-hospital survival of inpatients, and then quantifying the reduction of ST segment elevation one hour after the intervention. Concerning heart attacks, both survival and the quantity of myocardial tissue saved from damage strongly depend on the time saved during the process; in this work, we focus on the survival outcome. Time indeed has a fundamental role in the overall STEMI health care process. By Symptom Onset to Door time we mean the time from symptom onset up to the arrival at the Emergency Room (ER), and Door to Balloon time (DB time) is the time from the arrival at the ER up to the surgical practice of PTCA. The clinical literature strongly stresses the connection between in-hospital survival and procedure times, as attested, e.g., in Cannon et al. (2000), Jneid et al. (2008) and MacNamara et al. (2006).

    The presence of differences in the outcomes of health care has been documented extensively in recent years. In order to design regulatory interventions by institutions, for instance, it is interesting to study the effects of variations in health care utilization on patients' outcomes, in particular examining the relationship between process indicators, which define regional or hospital practice patterns, and outcome measures, such as patients' survival or treatment efficacy. If the analysis of variations concerns in particular the comparison of the performance of health care providers, it is commonly referred to as provider profiling; see Normand et al. (1997) and Racz and Sedransk (2010). The results of profiling analyses often have far-reaching implications. They are used to generate feedback for health care providers, to design educational and regulatory interventions by institutions and government agencies, to design marketing campaigns by hospitals and managed care organizations, and, ultimately, to select health care providers by individuals and managed care groups.

    The major aim of this work is to measure the magnitude of the variations among health care providers and to assess the role of contributing factors, including patients' and providers' characteristics, on the survival outcome in STEMI patients. Data on health care utilization have a natural multilevel structure, usually with patients at the lower level and hospitals forming the upper-level clusters. Within this formulation, two main goals are taken into account: one is to provide cluster-specific estimates of a particular response, adjusted for patients' characteristics, while the other is to derive estimates of covariates' effects, such as differences between patients of different gender or between hospitals. Hierarchical regression modelling from a Bayesian perspective provides a framework that can accomplish both these goals. In particular, this article considers a Bayesian generalized linear mixed model (see Zeger and Karim, 1991) to predict the binary survival outcome by means of relevant covariates, taking into account overdispersion induced by the grouping factor.


    We illustrate the analysis on a subset of data collected in the MOMI2 survey on patients admitted with STEMI diagnosis in one of the structures belonging to the Milano Cardiological Network, using a logit model for the survival probability. For this analysis, patients are grouped by the hospital they have been admitted to for their infarction. Assuming a Bayesian hierarchical approach for the hospital factors allows us not only to model dependence among the random effects parameters, but also to use the data set to make inferences on hospitals which do not have patients in the study, borrowing strength across patients, as well as to cluster the hospitals. A Markov chain Monte Carlo (MCMC) algorithm is necessary to compute the posterior distributions of parameters and predictive distributions of outcomes, as well as to use other diagnostic tools, such as Bayesian residuals, for goodness-of-fit analysis. The choice of covariates and link functions was first suggested in Ieva and Paganoni (2011), according to frequentist selection procedures and clinical know-how; however, it was confirmed here using Bayesian tools. We found that killip, which is an index of the severity of the infarction, first, and then age have a sharp negative effect on the survival probability, while the Symptom Onset to Balloon time has a lighter influence on it. An interesting, novel finding is that the resulting variability among hospitals seems not too large, even though we found that four hospitals have a more extreme effect on survival (one has a positive effect, while the remaining three have a negative effect) than the others. Such a finding can be explained by the relative homogeneity among the hospitals, all located in Milano, the regional capital. Larger heterogeneity is expected in the future when extending the analysis to all the hospitals in the region. The advantages of a Bayesian approach to this problem are several: providers' profiling or patients' classification can be guided by clinical as well as statistical knowledge, hospitals with low exposure can be automatically included in the analysis, and providers' profiling can be simply achieved through the posterior distribution of the hospital-effects parameters.

    To the best of our knowledge, this study is the first example of the use of Bayesian methods in provider profiling using data which arise from the linkage between Italian administrative databanks and clinical registries. This paper shares the same framework of hierarchical generalized linear mixed models as in Daniels and Gatsonis (1999), who examined differences in the utilization of coronary artery bypass graft surgery for elderly heart attack patients treated in hospitals.

    The paper is organized as follows. Section 2 illustrates the data set about STEMI in the Milano Cardiological Network, while Section 3 describes the main features of the proposed model, with a short discussion on covariates selection. Sections 4 and 5 discuss prior elicitation and Bayesian inferences, respectively. Finally, Section 6 presents results of the inference on quantities of interest with a discussion. Some final remarks are reported in Section 7. All the analyses have been performed with WinBUGS (see Lunn et al., 2000, and also http://www.mrc-bsu.cam.ac.uk/bugs) and R (2009) (version 2.10.1) programs.

    2. The STEMI Data Set

    A network connecting the territory to hospitals, through a centralized coordination of the emergency resources, has been active in the Milano urban area since 2001. The aim of a monitoring project on it is the activation of a registry on STEMI to collect process indicators (Symptom Onset to Door time, first ECG time, Door to Balloon time and so on), in order to identify and develop new diagnostic, therapeutic and organizational strategies to be applied to patients affected by STEMI by the Lombardia region, hospitals and the 118 organization (the national toll-free number for medical emergencies). To reach this goal, it is necessary to understand which organizational aspects can be considered as predictive of time to treatment reduction. In fact, organizational policies in the STEMI health care process concern both the 118 organization and hospitals, since a subject affected by an infarction can reach the hospital on his/her own or can be taken to the hospital by 118 rescue units.


    So, in order to monitor the Milano Cardiological Network activity, times to treatment and clinical outcomes, the data collection MOMI2 was planned and carried out on STEMI patients, during six periods corresponding to monthly/bimonthly collections. For these units, information concerning mode of admission (on his/her own or by three different types of 118 rescue units), demographic features (sex, age), clinical appearance (presenting symptoms and Killip class at admittance), received therapy (thrombolysis, PTCA), Symptom Onset to Door time, in-hospital times (first ECG time, DB time), hospital organization (for example, admission during on/off hours) and clinical outcome (in-hospital survival) have been listed and studied. The Killip classification is a system used with acute myocardial infarction patients, in order to stratify them in four risk severity classes. Individuals with a low Killip class are less likely to die within the first 30 days after their myocardial infarction than individuals with a high Killip class. The whole MOMI2 survey consists of 840 statistical units, but in this work we only focus on patients who underwent primary PTCA and who belong to the third and fourth collections, since these are of better quality. Among the resulting PTCA patients, we selected those who had their own hospital admission registered also in the Public Health Database of the Lombardia region, in order to confirm the reliability of the information collected in the MOMI2 registry. Finally, the considered data set consists of 240 patients.

    Previous frequentist analyses on the MOMI2 survey (see Grieco et al., 2008; Ieva, 2008; Ieva and Paganoni, 2010) pointed out that age, total ischemic time (Symptom Onset to Balloon time, denoted by OB) in the logarithmic scale and killip of the patient are the most significant factors in order to explain survival probability from a statistical and clinical point of view. Here killip is a binary variable, corresponding to 0 for less severe (Killip class equal to 1 or 2) and 1 for more severe (Killip class equal to 3 or 4) infarction. This choice of covariates was confirmed using a Bayesian variable selection procedure; see the next section for more details.

    The main goal of our study is to explain and predict, by means of a Bayesian random effects model, the in-hospital survival (i.e., the proportion of patients discharged alive from the hospital). The data set consists of $n = 240$ patients who were admitted to $J = 17$ hospitals after a STEMI event. The number of STEMI patients per hospital ranges from 1 to 32, with a mean of 14.12. Each observation $y_i$ records whether a patient survived after STEMI, i.e., $y_i = 1$ if the $i$th patient survived, $y_i = 0$ otherwise. In the rest of the paper, $y$ denotes the vector of all responses $(y_1, \ldots, y_n)$. The data set is strongly unbalanced, since 95% of the patients were discharged alive. The observed hospital survival rates range from 75% to 100%. These high values are explained by the fact that they are in-hospital survival probabilities, follow-up data not being available yet. The data set contained some missing covariates, with proportions of 7%, 24% and 2% for age, OB and killip, respectively. The missing data for age and OB were imputed as the empirical means (64 years for age, 553 minutes for OB), while we sampled the missing 0-1 killip class covariates from the Bernoulli distribution with probability of success estimated from the non-missing data. After having imputed all the covariates, the mean values of age and OB did not change, while the proportion of patients with less severe infarction (killip = 0) was 94%. Finally, we had no missing data concerning hospital of admission and outcome.
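The imputation scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`age`, `ob`, `killip` keys, with `None` marking a missing value), the function name `impute` and the toy records are hypothetical and are not taken from the MOMI2 registry.

```python
import random
from statistics import mean


def impute(records, seed=0):
    """Return completed copies of the records, leaving the input untouched."""
    rng = random.Random(seed)
    # Empirical means of the observed continuous covariates.
    age_mean = mean(r["age"] for r in records if r["age"] is not None)
    ob_mean = mean(r["ob"] for r in records if r["ob"] is not None)
    # Observed share of severe infarctions (killip = 1).
    killip_seen = [r["killip"] for r in records if r["killip"] is not None]
    p_severe = sum(killip_seen) / len(killip_seen)

    completed = []
    for r in records:
        c = dict(r)
        if c["age"] is None:
            c["age"] = age_mean  # mean imputation
        if c["ob"] is None:
            c["ob"] = ob_mean  # mean imputation
        if c["killip"] is None:
            # Bernoulli draw with the estimated success probability.
            c["killip"] = 1 if rng.random() < p_severe else 0
        completed.append(c)
    return completed


# Hypothetical toy records, not MOMI2 data.
toy = [
    {"age": 60, "ob": 300, "killip": 0},
    {"age": 70, "ob": None, "killip": 0},
    {"age": None, "ob": 800, "killip": 1},
    {"age": 62, "ob": 559, "killip": None},
]
completed = impute(toy)
```

Mean imputation leaves the sample mean of a covariate unchanged, which is consistent with the remark that the mean values of age and OB did not change after imputation.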

    3. A Bayesian Generalized Mixed-Effects Model

    We considered a generalized mixed-effects model for binary data from a Bayesian viewpoint. For a recent review on this topic, see Chapters 1-3 in Dey et al. (2000). For each patient ($i = 1, \ldots, n$), let $Y_i$ be a Bernoulli random variable with mean $p_i$, which represents the probability that the $i$th patient survived after STEMI. The $p_i$'s are modelled through a logit regression with covariates $x := \{x_i\}$, $x_i := (1, x_{i1}, x_{i2}, x_{i3})$, which represent the age, the Symptom Onset to Balloon time in the log scale (log-OB) and the killip, respectively, of the $i$th patient in the data set. Moreover, age and log-OB have been centered. Since the patients come from $J$ different hospitals, we assume the following multilevel model, with the hospital as a random effect:

    $$Y_i \mid p_i \overset{\text{ind}}{\sim} \mathrm{Be}(p_i), \quad i = 1, \ldots, n, \tag{1}$$

    and

    $$\operatorname{logit}(p_i) = \log\frac{p_i}{1 - p_i} = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + b_{k[i]}, \tag{2}$$

    where $b_{k[i]}$ represents the hospital effect of the $i$th patient in hospital $k[i]$. We denote by $\beta$ the vector of regression parameters $(\beta_0, \beta_1, \beta_2, \beta_3)$. It is well-known that Equations (1) and (2) have a latent variable representation (see Albert and Chib, 1993), which can be very useful in performing Bayesian inference, as well as in providing medical significance: conditioning on the latent variables $Z_1, \ldots, Z_n$, the $Y_1, \ldots, Y_n$ are independent, and, for $i = 1, \ldots, n$,

    $$Y_i = \begin{cases} 1, & \text{if } Z_i \geq 0; \\ 0, & \text{if } Z_i < 0; \end{cases} \tag{3}$$

    where

    $$Z_i = x_i\beta + b_{k[i]} + \varepsilon_i, \quad \varepsilon_i \overset{\text{i.i.d.}}{\sim} f, \tag{4}$$

    $f(t) = e^{-t}(1 + e^{-t})^{-2}$ being the standard logistic density function. The same class of models, however, without considering random effects, was applied in Souza and Migon (2004) to a similar data set of patients after acute myocardial infarction.
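The equivalence between the logit model in Equations (1)-(2) and the latent variable representation in Equations (3)-(4) can be checked numerically. The sketch below is illustrative only: the value of the linear predictor `eta` (standing for the regression term plus the hospital effect) is hypothetical, and the function names are not from the paper. It draws logistic errors by inverting the logistic distribution function and compares the frequency of the event that the latent variable is non-negative with the inverse logit of `eta`.

```python
import math
import random


def inv_logit(eta):
    """Survival probability implied by the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def logistic_draw(rng):
    """Draw from the standard logistic distribution via its quantile function."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return math.log(u) - math.log1p(-u)


def simulate_survival_rate(eta, n_draws=200_000, seed=1):
    """Frequency of Z >= 0, where Z = eta + logistic error."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_draws) if eta + logistic_draw(rng) >= 0.0)
    return hits / n_draws


eta = 2.0  # hypothetical linear predictor, not an estimate from the paper
simulated = simulate_survival_rate(eta)
exact = inv_logit(eta)
```

Up to Monte Carlo error, `simulated` and `exact` agree, because the probability that a standard logistic variable exceeds `-eta` equals the logistic distribution function evaluated at `eta`.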

    As mentioned in Section 2, the choice of covariates was first suggested in Ieva and Paganoni (2011), using frequentist model choice tools. However, we have considered it also in a Bayesian framework, using the Gibbs variable selection method by Dellaportas et al. (2002). But first, as a default analysis, we considered covariates selection via the R package BMA; see Raftery et al. (2009). A subgroup of 197 patients with 11 non-missing covariates was processed by the function bic.glm, and 7 covariates were selected (age, OB time, killip, sex, admission during on/off hours, ECG time, number of previous hospitalizations). For this choice of covariates, the non-missing data extracted from the 240-patients data set consist of 217 units, which were again analyzed via bic.glm. The posterior probability that each variable is non-zero was very high (about 40%) for age and killip, while it was smaller than 7% for the others. Moreover, the smallest BICs, denoting the best models, resulted for those including age, killip and sex. Since sex is strongly correlated with age in our data set (only elderly women are included), in the end we agreed with the choice of covariates in Ieva and Paganoni (2011), considering only age and killip, while the OB time was strongly recommended by clinical and health care utilization know-how, since it was the main process indicator of the MOMI2 clinical survey.
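As a rough illustration of how BIC values can be turned into approximate posterior model probabilities and variable inclusion probabilities, the sketch below uses the standard BIC approximation with equal prior model probabilities. It is not a reproduction of the internals of bic.glm, and the candidate models and BIC values are hypothetical.

```python
import math


def bic_model_weights(bics):
    """Approximate posterior model probabilities, proportional to exp(-BIC / 2)."""
    best = min(bics)  # subtract the minimum for numerical stability
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


def inclusion_probability(weights, models, variable):
    """Sum the weights of the models that contain the given variable."""
    return sum(w for w, m in zip(weights, models) if variable in m)


# Hypothetical candidate models and BIC values, for illustration only.
models = [{"age", "killip"}, {"age", "killip", "sex"}, {"age"}]
bics = [100.0, 101.0, 106.0]

weights = bic_model_weights(bics)
p_killip = inclusion_probability(weights, models, "killip")
p_sex = inclusion_probability(weights, models, "sex")
```

The model with the smallest BIC receives the largest weight, and a variable that only appears in poorly supported models ends up with a small inclusion probability.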

    As a second analysis, we consider only covariates which have non-missing values for all patients (age, OB time, killip, sex, admission during on/off hours, number of previous hospitalizations), to be analyzed using the Gibbs variable selection method. The linear predictor assumed in the right-hand side of Equation (2) to select covariates can be represented as

    $$\eta_i = \beta_0 + \sum_{j=1}^{6} \gamma_j \beta_j x_{ij}, \quad i = 1, \ldots, n, \tag{5}$$


    where $\gamma = (\gamma_1, \ldots, \gamma_6)$ is a vector of parameters in $\{0, 1\}$. Of course, a prior for both the regression parameter $\beta$ and the model index parameter $\gamma$ must be elicited, so that the marginal posterior probability of $\gamma$ suggests which variables must be included in the model. We assumed different noninformative priors for the logit model with the linear predictor given in Equation (5), as suggested in Ntzoufras (2002), implementing a simple BUGS code to compute the marginal posterior distributions for each $\gamma_j$, for $j = 1, \ldots, 6$, and the posterior inclusion probabilities. However, the analysis confirmed the previously selected model.
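The role of the inclusion indicators in Equation (5) can be illustrated with a small sketch. All coefficient values, the covariate row and the indicator draws below are hypothetical, and the function names are not from the paper; the second function simply averages indicator draws, which is how posterior inclusion probabilities are usually estimated from MCMC output.

```python
def linear_predictor(beta0, beta, gamma, x_row):
    """Intercept plus the sum of gamma_j * beta_j * x_ij over the candidates."""
    return beta0 + sum(g * b * x for g, b, x in zip(gamma, beta, x_row))


def inclusion_frequencies(gamma_draws):
    """Share of draws in which each indicator equals 1."""
    n = len(gamma_draws)
    k = len(gamma_draws[0])
    return [sum(d[j] for d in gamma_draws) / n for j in range(k)]


# Hypothetical values for the six candidate covariates of one patient.
beta0 = 3.0
beta = [-0.08, -0.2, -1.5, 0.3, 0.1, -0.05]
x_row = [10.0, 0.5, 1.0, 1.0, 0.0, 2.0]

# Only the first three covariates are switched on in this configuration.
gamma = [1, 1, 1, 0, 0, 0]
eta = linear_predictor(beta0, beta, gamma, x_row)

# Hypothetical indicator draws standing in for MCMC output.
draws = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
]
freqs = inclusion_frequencies(draws)
```

A covariate whose indicator is 0 drops out of the predictor regardless of its coefficient, so the frequency with which each indicator equals 1 summarises the posterior support for including that covariate.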

    The selection of such a small number of covariates (with respect to 13, the total number) is not surprising, since previous analyses (see Ieva, 2008; Ieva and Paganoni, 2010) pointed out that the covariates are highly correlated. For instance, there is dependence between age on one hand and sex, or symptoms, or mode of admission, on the other, between symptoms and killip, or symptoms and mode of admission, and between sex and symptoms. These relationships can be explained because acute coronary syndromes, such as STEMI, affect mainly male patients rather than females, and are more frequent as patient age increases. Moreover, it is well-known that the STEMI symptoms depend on the severity of the infarction itself, and elderly patients usually have more atypical symptoms. Furthermore, the symptoms may influence the choice of the type of ambulance sent to rescue the patient; ambulances which allow ECG teletransmission are usually sent to patients presenting more typical infarction symptoms, in order to allow them to skip the waiting time due to ER procedures, and to reduce accordingly the door to balloon time.

    4. The Prior Distribution

    As mentioned in the previous sections, one of the aim of this paper is to make a compar-ison among the patients survival probabilities treated in different hospitals of the MilanoCardiological Network. Such an aim can be accomplished if, for instance, we assume thehospital each patient was admitted to as a random factor. We make the usual (from aBayesian viewpoint) random effects assumption for the hospitals, that is, the hospital ef-

    fect parameters bjs are drawn from a common distribution; moreover, since no informationis available at the moment to distinguish among the hospitals, we assume symmetry amongthe hospital parameters themselves, i.e., b1, . . . , bJ can be considered as (the first part ofan infinite sequence of) exchangeable random variables. Via Bayesian hierarchical models,not only we model dependence among the random effects parameters b := (b1, . . . , bJ), butit be also possible to use the data set to make inferences on hospitals which have few orno patients in the study, borrowing strength across hospitals. As usual in the hierarchicalBayesian approach, the regression parameter and the hospital parameter b are assumeda priori independent, is given a (multivariate) Gaussian distribution and b is given ascale-mixture of (multivariate) Gaussian distributions; more specifically:

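    $$\beta \perp b, \qquad \beta \sim \mathrm{MN}(\mu_\beta, V), \qquad b_1, \ldots, b_J \mid \sigma \overset{\text{i.i.d.}}{\sim} N(\mu_b, \sigma^2), \quad \text{and} \quad \sigma \sim U(0, \sigma_0). \tag{6}$$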

    Observe that the prior assumption on $b$ is that, conditionally on the parameter $\sigma$, each hospital effect parameter has a Gaussian distribution with variance $\sigma^2$; here the uniform prior on $\sigma$ is set as an assumption of ignorance/symmetry on the standard deviation of each hospital effect. The Gaussian prior for $\beta$ is standard, but its hyperparameters, as well as the hyperparameter of the prior distribution for $\sigma$, are chosen informatively, using available information from other MOMI2 collections; for more details, see Section 6.2. On the other hand, a more standard prior for $b_j$ would be a scale-mixture of normals, mixed by an inverse-gamma distribution for $\sigma^2$, with parameters $(\epsilon, \epsilon)$ for small $\epsilon$. However, this prior has often been criticized (see Gelman, 2006), mainly because the inferences are not robust with respect to the choice of $\epsilon$, and the prior density (for all small $\epsilon$), as well as the resulting posterior, are too peculiar. In what follows, the parameter vector $(\beta, b, \sigma)$ is denoted by $\theta$.
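    The prior in Equation (6) can be simulated directly, which is a quick way to see what it implies for the hospital effects. The sketch below is illustrative only: it uses the Python standard library, takes the hyperparameter values quoted in Section 6.2 exactly as printed there (a diagonal $V$, an upper bound of 10 for the uniform prior, prior mean 0 for the hospital effects), assumes $J = 17$ hospitals as in Table 2, and the function name `draw_from_prior` is hypothetical.

```python
import math
import random

# Hyperparameter values as quoted in Section 6.2 (signs as printed there).
MU_BETA = [3.0, 0.0, 0.1, 0.7]          # prior mean of beta
V_DIAG = [2.0, 0.04, 0.5882, 3.3333]    # diagonal of the prior covariance V
SIGMA_0 = 10.0                          # upper bound of the uniform prior on sigma
J = 17                                  # number of hospitals (assumed from Table 2)


def draw_from_prior(rng):
    """Return one joint draw (beta, b, sigma) from the prior in Equation (6)."""
    # beta ~ MN(mu_beta, V); with a diagonal V the components are independent.
    beta = [rng.gauss(m, math.sqrt(v)) for m, v in zip(MU_BETA, V_DIAG)]
    # sigma ~ U(0, sigma_0)
    sigma = rng.uniform(0.0, SIGMA_0)
    # b_1, ..., b_J | sigma ~ i.i.d. N(0, sigma^2)
    b = [rng.gauss(0.0, sigma) for _ in range(J)]
    return beta, b, sigma


rng = random.Random(2012)
beta, b, sigma = draw_from_prior(rng)
print(len(beta), len(b), 0.0 <= sigma <= SIGMA_0)
```

    Repeating the draw many times and inspecting the spread of the simulated $b_j$ gives a feel for how diffuse a uniform prior on the standard deviation is over the interval $(0, 10)$.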

    5. Bayesian Inference

    Based on the given priors and likelihood, the posterior distribution of $\theta$ is expressed by

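    $$\begin{aligned} \pi(\theta \mid y, x) &\propto \pi(\theta)\, L(y \mid \theta, z, x)\, f(z) \\ &= \pi(\beta)\, \pi(b \mid \sigma)\, \pi(\sigma) \prod_{i=1}^{n} \bigl(I_{(0,+\infty)}(z_i)\bigr)^{y_i} \bigl(I_{(-\infty,0]}(z_i)\bigr)^{1-y_i} \prod_{i=1}^{n} f(z_i - x_i'\beta - b_{k[i]}). \end{aligned} \tag{7}$$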
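    If the latent variables $z_i$ are integrated out of Equation (7), and $f$ is taken to be the standard logistic density (an assumption made here, consistent with the logit interpretation used later in the paper), the likelihood reduces to the usual logistic-regression form. The following minimal sketch evaluates the resulting unnormalized log-posterior. It is not the authors' WinBUGS code: the function names, the toy data and the restriction to a diagonal $V$ are all assumptions made for illustration.

```python
import math


def log_expit(eta):
    """Numerically stable log(1 / (1 + exp(-eta)))."""
    if eta >= 0:
        return -math.log1p(math.exp(-eta))
    return eta - math.log1p(math.exp(eta))


def log_posterior(beta, b, sigma, y, X, hosp, mu_beta, v_diag, sigma_0):
    """Unnormalized log-posterior of (beta, b, sigma) with the z_i integrated out."""
    if not 0.0 < sigma < sigma_0:
        return -math.inf                    # outside the support of U(0, sigma_0)
    # log prior of beta: independent Gaussians (diagonal V), constants dropped
    lp = sum(-0.5 * (bt - m) ** 2 / v for bt, m, v in zip(beta, mu_beta, v_diag))
    # log prior of b given sigma: b_j ~ N(0, sigma^2), constants dropped
    lp += sum(-math.log(sigma) - 0.5 * (bj / sigma) ** 2 for bj in b)
    # log-likelihood: P(Y_i = 1) = expit(x_i' beta + b_{k[i]})
    for yi, xi, k in zip(y, X, hosp):
        eta = sum(c * bt for c, bt in zip(xi, beta)) + b[k]
        lp += log_expit(eta) if yi == 1 else log_expit(-eta)
    return lp


# Toy data (hypothetical): three patients in two hospitals.
y = [1, 0, 1]
X = [[1.0, 0.0, 0.0, 0.0], [1.0, 16.0, 0.0, 1.0], [1.0, -5.0, 0.2, 0.0]]
hosp = [0, 1, 0]
value = log_posterior([3.0, -0.08, -0.15, -1.5], [0.1, -0.2], 1.0, y, X, hosp,
                      [3.0, 0.0, 0.1, 0.7], [2.0, 0.04, 0.5882, 3.3333], 10.0)
print(value)
```

    A function of this kind is what a general-purpose sampler (for instance a random-walk Metropolis step) would evaluate; the latent-variable form in Equation (7) is instead convenient for Gibbs-type updates and for the residual analysis described later.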

    We are interested in predictions too. This implies considering (i) the posterior predictive survival probability of a new patient coming from a hospital already included in the study, or (ii) the posterior predictive survival probability of a new patient coming from a new $(J+1)$th hospital. We have

    $$P(Y_{n+1} = 1 \mid y, x, b_j) = \int_{\mathbb{R}^4} P(Y_{n+1} = 1 \mid \beta, b_j, x)\, \pi(\beta \mid b_j, y)\, d\beta, \qquad j = 1, \ldots, J, \tag{8}$$

    for a new patient with covariate vector x coming from the jth hospital in the study, and

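    $$P(Y_{n+1} = 1 \mid y, x, b_{J+1}) = \int_{\mathbb{R}^4} P(Y_{n+1} = 1 \mid \beta, b_{J+1}, x)\, \pi(\beta \mid b_{J+1}, y)\, d\beta, \tag{9}$$

    where $\pi(\beta \mid b_{J+1}, y)$ is computed from

    $$\pi(\beta, b_{J+1} \mid y) = \int_{\mathbb{R}^+} \pi(b_{J+1} \mid \sigma)\, \pi(\beta, \sigma \mid y)\, d\sigma,$$

    $\pi(b_{J+1} \mid \sigma)$ being the prior population conditional distribution given in Equation (6).

    As far as model checking is concerned, we consider predictive distributions for patients already enrolled in the study, in the spirit of replicated data in Gelman et al. (2004). More specifically, we compute

    $$P(Y_i^{\mathrm{new}} = 1 \mid y, x_i, b_{k[i]}), \qquad \text{for all } i = 1, \ldots, n. \tag{10}$$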

    Here, $Y_i^{\mathrm{new}}$ denotes the $i$th replicated datum that could have been observed or, to think predictively, the datum that we would see tomorrow if the experiment that produced $y_i$ today were replicated with the same model and the same value of the parameters that produced the observed data; see Gelman et al. (2004, Section 6.3). Since we have a very unbalanced data set, the following Bayesian rule is adopted: a patient is classified as alive if $P(Y_i^{\mathrm{new}} = 1 \mid y, x_i, b_{k[i]}) = E[Y_i^{\mathrm{new}} \mid y, x_i, b_{k[i]}]$ is greater than the empirical mean $\bar{y}_n$. This rule is equivalent to minimizing the expected value of the following loss function

    $$L(P(Y_i = 1 \mid y, x_i, b_{k[i]}), a_1) = \max\{0,\ \bar{y}_n - P(Y_i = 1 \mid y, x_i, b_{k[i]})\},$$

    $$L(P(Y_i = 1 \mid y, x_i, b_{k[i]}), a_0) = \max\{0,\ P(Y_i = 1 \mid y, x_i, b_{k[i]}) - \bar{y}_n\},$$

    where the action $a_1$ is to classify the patient as alive and the action $a_0$ is to classify the patient as dead. Then the coherence between the Bayesian rule and the data set is checked.
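    The classification rule and its loss function are simple enough to write down in a few lines. The sketch below is a hedged illustration: the outcome vector and the predictive probabilities are made-up toy values, and the function names are hypothetical. It only checks that thresholding at the empirical mean always picks the action with the smaller loss.

```python
def classify(p_alive, y_bar):
    """Return 1 (alive) if the predictive probability exceeds the empirical mean."""
    return 1 if p_alive > y_bar else 0


def loss(p_alive, y_bar, action):
    """Loss of action a1 (action=1, 'alive') or a0 (action=0, 'dead')."""
    if action == 1:
        return max(0.0, y_bar - p_alive)
    return max(0.0, p_alive - y_bar)


y = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]      # toy outcomes (hypothetical)
y_bar = sum(y) / len(y)                  # empirical survival rate of the toy data

for p in (0.60, 0.79, 0.81, 0.95):       # toy predictive probabilities
    chosen = classify(p, y_bar)
    other = 1 - chosen
    # The chosen action never has a larger loss than the alternative.
    assert loss(p, y_bar, chosen) <= loss(p, y_bar, other)
    print(p, "alive" if chosen == 1 else "dead")
```

    Using the empirical mean rather than 0.5 as the threshold matters with unbalanced data: when most patients survive, nearly every predictive probability exceeds 0.5, and a 0.5 cut-off would classify almost everyone as alive.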

    Finally, we computed the latent Bayesian residuals for binary data as suggested in Albert and Chib (1995). Thanks to the latent variable representation of the model in Equations (3) and (4), we can consider the realized errors

    $$e_i = Z_i - (x_i'\beta + b_{k[i]}), \qquad i = 1, \ldots, n, \tag{11}$$

    obtained by solving Equation (4) with respect to $\varepsilon_i$. Each $e_i$ is a function of the unknown parameters, so that its posterior distribution can be computed through the MCMC simulated values, and later examined for indications of possible departures from the assumed model and the presence of outliers; see also Chaloner and Brant (1988). Therefore, it is sensible to plot credibility intervals for the marginal posterior of each $e_i$, comparing them to the marginal prior credibility intervals (of the same level).
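    Given one MCMC draw of the linear predictor, a realized error as in Equation (11) can be simulated by drawing the latent variable from its truncated distribution. The sketch below assumes a standard logistic error distribution and uses inverse-CDF sampling; the function name `draw_residual` and the numerical guard are choices made here for illustration, not part of the original analysis.

```python
import math
import random


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def draw_residual(y_i, eta_i, rng):
    """Draw e_i = Z_i - eta_i given y_i, where eta_i = x_i' beta + b_{k[i]}."""
    cut = expit(-eta_i)                   # P(Z_i <= 0) under the logistic model
    if y_i == 1:
        u = rng.uniform(cut, 1.0)         # Z_i > 0   <=>  e_i > -eta_i
    else:
        u = rng.uniform(0.0, cut)         # Z_i <= 0  <=>  e_i <= -eta_i
    u = min(max(u, 1e-12), 1.0 - 1e-12)   # guard against log(0)
    return math.log(u / (1.0 - u))        # logistic quantile function


rng = random.Random(3)
eta = 2.0                                 # toy linear predictor (hypothetical)
alive = [draw_residual(1, eta, rng) for _ in range(1000)]
dead = [draw_residual(0, eta, rng) for _ in range(1000)]
print(min(alive) >= -eta - 1e-6, max(dead) <= -eta + 1e-6)
```

    For a patient who died despite a large linear predictor, every simulated residual lies below $-\eta_i$, far in the left tail of the logistic prior; this is the kind of pattern that flags a possible outlier.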

    6. Data Analysis

    In this section we illustrate the Bayesian analysis of the data set described in Section 2,giving some details on computations and prior elicitation.

    6.1 Bayesian computations

    As mentioned in Section 1, all estimates were derived using WinBUGS. The full conditionals needed to implement a Gibbs sampler directly can be derived starting from Equation (7); however, they are not all standard distributions, i.e., closed-form expressions do not exist for all of them, given the priors in Equation (6). Some details on the full conditionals for general design GLMMs required by WinBUGS are in Zhao et al. (2006).

    The first 100,000 iterations of the chain were discarded, and parameter values were then retained every 80 iterations to decrease autocorrelations, with a final sample size equal to 5,000; we also ran the chains much longer (for a final sample size of 10,000), but the gain in the MC errors was relatively small. Some convergence diagnostics (Geweke's and the two Heidelberger-Welch ones) were checked; see, e.g., the reference manual of the CODA package (Plummer et al., 2006) for more details. Moreover, we monitored traceplots, autocorrelations and MC error/posterior standard deviation ratios for all the parameters, all of which indicated that the MCMC algorithm had converged. Code is available from the authors upon request.
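    The burn-in and thinning settings above imply a total chain length that is easy to verify. The short sketch below does the arithmetic and shows one common way of applying burn-in and thinning to a stored chain; the helper `thin_chain` is hypothetical, and its convention of keeping the first post-burn-in value is an assumption.

```python
BURN_IN = 100_000     # discarded iterations
THIN = 80             # keep one value every 80 iterations
KEPT = 5_000          # final sample size

total_iterations = BURN_IN + THIN * KEPT
print(total_iterations)   # 500000


def thin_chain(chain, burn_in, thin):
    """Drop the first burn_in values, then keep every thin-th value."""
    return chain[burn_in::thin]


# Toy chain: 1,000 stored values, 200 discarded, one in 80 kept afterwards.
toy = thin_chain(list(range(1000)), 200, 80)
print(len(toy))           # 10
```

    With these settings, each retained draw costs 80 sampler iterations, which is the price paid for reducing autocorrelation in the stored sample.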

    6.2 Informative prior hyperparameters

    Concerning information about hyperprior parameters, we fixed $\mu_b = 0$ regardless of any information, since, by the exchangeability assumption, the different hospitals have the same prior mean (fixed equal to 0 to avoid confounding with $\beta_0$). As far as $\beta$ is concerned, we have enough past data to be relatively informative in eliciting prior hyperparameters; they were fixed after having fitted the model given in Equations (1) and (2), under non-informative priors for $\beta$, to similar data, i.e., 359 patients who underwent primary PTCA and whose data were collected during the other four MOMI2 collections. Therefore, for the present analysis, we fixed $\mu_\beta = (3, 0, 0.1, 0.7)'$, which are the posterior means of the regression parameters under the preliminary analysis. The matrix $V$ was assumed diagonal, $V = \mathrm{diag}(2, 0.04, 0.5882, 3.3333)$, whose entries, except for the second, are about 10 times the posterior variances of the regression parameters under the preliminary analysis (0.04 is 100 times the posterior variance, in order to consider a vaguer prior for $\beta_1$). The prior hyperparameter $\sigma_0$ was fixed equal to 10, a value compatible with the support of the posterior distribution for $\sigma$ in the preliminary analysis. Posterior estimates of $\beta$, $b$ and $\sigma$ proved to be robust with respect to $\mu_\beta$ and $V$, even when we fixed a non-diagonal matrix for $V$, assuming prior dependence among the regression parameters (the non-diagonal $V$ was elicited via the preliminary analysis as well). As far as the variances of the $\beta$ parameters are concerned, the robustness analysis pointed out that assuming smaller values than those reported here yielded too informative a prior, that is, the data did not swamp the prior; on the other hand, larger variances produced the computational difficulties typical of too vague a prior. The chosen variances represent an optimal trade-off between these two behaviors.

    6.3 Results

    Summary inferences about the regression parameters $\beta$ and $\sigma$ can be found in Table 1, while the marginal posterior distributions are depicted in Figures 1 and 2.

    Table 1. Posterior means, standard deviations, and 95% credibility intervals of the regression parameters $\beta$ and $\sigma$.

                                          Informative prior      Credibility intervals
    Parameter                             mean       sd          lwr        upr
    intercept, $\beta_0$                   3.8160    0.5704       2.8310     5.1100
    age, $\beta_1$                        -0.0792    0.0324      -0.1464    -0.0183
    log(OB), $\beta_2$                    -0.1527    0.3326      -0.7902     0.5154
    killip, $\beta_3$                     -1.5090    0.8159      -3.0470     0.1340
    random effect std. dev., $\sigma$      1.1770    0.7417       0.0766     2.8960

    From Table 1 and Figure 1 it is clear that the marginal posteriors of $\beta_1$ and $\beta_3$ are concentrated on the negative numbers, confirming the naive interpretation that an increase in age or killip class decreases the survival probability. The negative effect of log(OB) is questionable, given its high variability, even if the posterior median of $\beta_2$ is $-0.16$. In any case, it was included because of its clinical relevance; moreover, it is the main process indicator in health care monitoring of STEMI procedures. Observe that the posterior mean of $\beta_0 + b_j$, which is the logit of the survival probability for a patient with average covariates from any hospital, is between 2.90 and 4.78, yielding high posterior estimates of the survival probability from any hospital, as expected.
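    As a worked conversion (added here for illustration, with values rounded to three decimals), the inverse-logit transform maps the two endpoints 2.90 and 4.78 to survival probabilities:

    $$\frac{1}{1 + e^{-2.90}} \approx 0.948, \qquad \frac{1}{1 + e^{-4.78}} \approx 0.992.$$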

    By inspecting Figure 2, a shrinkage of the posterior density of $\sigma$ with respect to the uniform prior can be observed; this fact supports the conjecture of a low variability within medical institutions, which can be partly explained by the relative homogeneity among the hospitals, all located in Milano. As far as the marginal posterior distributions of the random effect parameters are concerned, Figure 3 displays the posterior median and mean (with 95% credibility intervals) of each hospital parameter $b_j$, for $j = 1, \ldots, J$.


    Figure 1. Marginal posterior density of the regression coefficients (four panels, one each for $\beta_0$, $\beta_1$, $\beta_2$ and $\beta_3$; vertical axis: density).

    Figure 2. Marginal posterior density of $\sigma$ (vertical axis: density).

    In Table 2, we report

    pj = min{P(bj > 0|y), P(bj < 0|y)}, j = 1, . . . , J ,

    together with the sign of the posterior median of the $b_j$'s. Low values of $p_j$ denote that the posterior distribution of $b_j$ is far from 0, so that the $j$th hospital significantly contributes to the (estimated) regression intercept $\beta_0 + b_j$. In Figure 3, the credible intervals corresponding to $p_j$ less than 0.18 are depicted in yellow; it is clear that hospital 9 has a positive effect, while hospitals 10, 11 and 15 have a negative effect on the survival probability.
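    The tail probability $p_j$ and the sign of the posterior median are straightforward to estimate from MCMC output. The sketch below uses synthetic draws as a stand-in for the posterior sample of a single $b_j$ (they are not the paper's draws), and the function name `tail_summary` is hypothetical.

```python
import random
import statistics


def tail_summary(draws):
    """Return (p_j, sign of the posterior median) from posterior draws of b_j."""
    n = len(draws)
    p_pos = sum(d > 0 for d in draws) / n     # estimate of P(b_j > 0 | y)
    p_neg = sum(d < 0 for d in draws) / n     # estimate of P(b_j < 0 | y)
    sign = "+" if statistics.median(draws) > 0 else "-"
    return min(p_pos, p_neg), sign


rng = random.Random(7)
# Synthetic stand-in for posterior draws of one hospital effect.
draws_b = [rng.gauss(-1.0, 1.0) for _ in range(5000)]
p_j, sign = tail_summary(draws_b)
print(round(p_j, 2), sign)
```

    A small $p_j$ means that almost all posterior mass sits on one side of zero; it plays a role similar to a one-sided tail area and should be read together with the sign of the median.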


    Figure 3. Posterior median (bullet), mean (square) and 95% credibility intervals of all random effect parameters $b_j$ (horizontal axis: hospitals; vertical axis: $b_j$). The credible intervals for hospitals such that $\min(P(b_j > 0 \mid y), P(b_j < 0 \mid y)) < 0.18$ are dashed.

    Table 2. Values of $p_j$ and the sign of the posterior median of each hospital parameter.

             b1    b2    b3    b4    b5    b6    b7    b8    b9
    p_j      0.27  0.40  0.32  0.25  0.44  0.41  0.49  0.49  0.18
    sign     +     +     +     +     -     +     +     +     +

             b10   b11   b12   b13   b14   b15   b16   b17
    p_j      0.17  0.12  0.28  0.28  0.44  0.17  0.26  0.29
    sign     -     +     -     -     +     -     -     +

    Observe that all the credible intervals of the random effect parameters in Figure 3 include 0, so that we might wonder if the random intercept should be discarded from the model. However, Mauri (2011) presents a Bayesian selection analysis of the same data set considered here, concluding that the posterior inclusion probability of the random effect is significantly larger than 0 (between 0.2 and 0.6 under different reasonable priors). Similar findings were drawn in Ieva and Paganoni (2010) from a frequentist perspective.

    Figure 4 displays medians and 95% credibility intervals for the posterior predictive survival probabilities given in Equation (8) for four benchmark patients:

    (a) $x_1 = 0$, $x_2 = 0$, $x_3 = 0$, i.e., a patient with average age (64 years), average OB (553 min.) and less severe infarction (Killip class 1 or 2);

    (b) $x_1 = 0$, $x_2 = 0$, $x_3 = 1$, i.e., a patient with the same age and OB as (a), but with severe infarction (Killip class 3 or 4);

    (c) $x_1 = 16$, $x_2 = 0$, $x_3 = 0$, i.e., an elderly patient (80 years), with average OB (553 min.) and less severe infarction;

    (d) $x_1 = 16$, $x_2 = 0$, $x_3 = 1$, i.e., an elderly patient with average OB and severe infarction,

    each coming from a hospital already in the study. The last credibility interval (in red in each panel) corresponds to the posterior predictive survival probability given in Equation (9) for a benchmark patient coming from a new random $(J+1)$th hospital. Moreover, from the figure it is clear that killip has a stronger (on average) influence on survival than age since, moving from the left to the right panels (same age, killip increased), the credibility intervals get much wider than when moving from the top to the bottom panels (same killip, age increased).
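    To make the covariate coding of the benchmark patients concrete, the sketch below plugs the posterior means from Table 1 into the logistic link. This is only a rough plug-in calculation under stated assumptions, not the posterior predictive distribution of Equations (8) and (9): the hospital effect is set to 0 for the "known hospital" case, the new-hospital effect is simulated from a normal distribution with standard deviation fixed at the posterior mean of $\sigma$, and all function and variable names are hypothetical.

```python
import math
import random

# Posterior means from Table 1: intercept, age, log(OB), killip, and sigma.
BETA_HAT = [3.8160, -0.0792, -0.1527, -1.5090]
SIGMA_HAT = 1.1770
MEAN_AGE = 64.0           # years; x1 is age minus this value


def expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def linear_predictor(age, x2, severe_killip):
    """Return x' beta_hat for a patient (x2 is the centred log(OB) covariate)."""
    x = [1.0, age - MEAN_AGE, x2, 1.0 if severe_killip else 0.0]
    return sum(c * bt for c, bt in zip(x, BETA_HAT))


PATIENTS = {"a": (64, 0.0, False), "b": (64, 0.0, True),
            "c": (80, 0.0, False), "d": (80, 0.0, True)}

rng = random.Random(0)
for label, args in PATIENTS.items():
    eta = linear_predictor(*args)
    p_plug_in = expit(eta)                    # hospital effect set to 0
    # New hospital: simulate its effect from N(0, SIGMA_HAT^2).
    sims = sorted(expit(eta + rng.gauss(0.0, SIGMA_HAT)) for _ in range(4000))
    low, high = sims[100], sims[3899]         # approx. 2.5% and 97.5% quantiles
    print(label, round(p_plug_in, 3), round(low, 3), round(high, 3))
```

    Even this crude calculation reproduces the qualitative message of the figure: switching to a severe Killip class lowers the plug-in survival probability more than ageing from 64 to 80 years does.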

    Finally, as far as predictive model checking is concerned, we computed the predictive probabilities in Equation (10); the classification rule described in Section 5 gives an error rate equal to 27% (64 patients were erroneously classified as dead and only 1 patient was


    Figure 5. Left panel: posterior distributions of the latent Bayesian residuals (horizontal axis: residuals; vertical axis: density). The dashed and solid lines correspond to observations $y_i = 0$ (dead) and $y_i = 1$ (alive), respectively. The solid gray line is the marginal prior distribution (logistic). Right panel: posterior distributions of the latent Bayesian residuals against the expected posterior survival probabilities.

    7. Conclusions

    In this work we have considered a Bayesian hierarchical generalized linear model with random effects for the analysis of clinical and administrative data with a multilevel structure. These data arise from the MOMI2 clinical registry, based on a survey of patients admitted with a diagnosis of ST-elevation myocardial infarction, integrated with administrative databanks. The analysis carried out on them could provide decision support to cardiovascular health care governance. We adopted a Bayesian point of view to tackle the problem of modelling survival outcomes by means of relevant covariates, taking into account overdispersion induced by the grouping factor, i.e., the hospital to which each patient was admitted. To the best of our knowledge, this study is the first example of a Bayesian analysis of data arising from the linkage between Italian administrative databanks and clinical registries. The main aim of this paper was to study the effects of variations in health care utilization on patient outcomes, since the adopted model points out relationships between process and outcome measures. We also provided cluster-specific estimates of survival probabilities, adjusted for patients' characteristics, and derived estimates of covariate effects, using MCMC simulation of the posterior distributions of the parameters; moreover, we discussed model selection and goodness of fit. We found that Killip class first, and then age, have a sharp negative effect on the survival probability, while the OB (onset to balloon) time has a lighter influence on it. The resulting variability among hospitals seems not too large, even if we underlined that 4 hospitals have a more extreme effect on survival: in particular, hospital 9 had a positive effect, while hospitals 10, 11 and 15 had a negative effect. As far as negative features of the MCMC output are concerned, we found that the marginal posterior distributions of $(\beta_0, b_j)$, for each $j$, are concentrated on lines of the whole parameter space, due to the confounding between the intercept parameter and the random effects parameters. However, the mixing and the convergence of the chain, under a suitable thinning, were completely satisfactory. Finally, as a further step in the analysis, we are considering Bayesian nonparametrics to model the hospital effects, in order to take advantage of the built-in clustering they provide.


    References

    Albert, J.H., Chib, S., 1993. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88, 669–679.

    Albert, J.H., Chib, S., 1995. Bayesian residual analysis for binary response regression models. Biometrika, 82, 747–759.

    Cannon, C.P., Gibson, C.M., Lambrew, C.T., Shoultz, D.A., Levy, D., French, W.J., Gore, J.M., Weaver, W.D., Rogers, W.J., Tiefenbrunn, A.J., 2000. Relationship of symptom-onset-to-balloon time and door-to-balloon time with mortality in patients undergoing angioplasty for acute myocardial infarction. Journal of American Medical Association, 283, 2941–2947.

    Chaloner, K., Brant, R., 1988. A Bayesian approach to outlier detection and residual analysis. Biometrika, 75, 651–659.

    Daniels, M.J., Gatsonis, C., 1999. Hierarchical generalized linear models in the analysis of variation in health care utilization. Journal of the American Statistical Association, 94, 29–42.

    Dellaportas, P., Forster, J.J., Ntzoufras, I., 2002. On Bayesian model and variable selection using MCMC. Statistics and Computing, 12, 27–36.

    Dey, D.K., Ghosh, S.K., Mallick, B.M. (eds.), 2000. Generalized Linear Models: A Bayesian Perspective. Chapman & Hall/CRC, Biostatistics Series, New York.

    Gelman, A., 2006. Prior distributions for variance parameters in hierarchical models (Comment on article by Browne and Draper). Bayesian Analysis, 1, 515–534.

    Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B., 2004. Bayesian Data Analysis. Second edition. Chapman & Hall/CRC, Boca Raton, Florida.

    Grieco, N., Corrada, E., Sesana, G., Fontana, G., Lombardi, F., Ieva, F., Marzegalli, M., Paganoni, A.M., 2008. Predictors of reduction of treatment time for ST-segment elevation myocardial infarction in a complex urban reality. The MoMi2 survey. MOX Report n. 10/2008. Dipartimento di Matematica, Politecnico di Milano. Available at http://mox.polimi.it/it/progetti/pubblicazioni/quaderni/10-2008.pdf .

    Hasday, D., Behar, S., Wallentin, L., Danchin, N., Gitt, A.K., Boersma, E., Fioretti, P.M., Simoons, M.L., Battler, A., 2002. A prospective survey of the characteristics, treatments and outcomes of patients with acute coronary syndromes in Europe and the Mediterranean basin. The euro heart survey of acute coronary syndromes. European Heart Journal, 23, 1190–1210.

    Ieva, F., 2008. Modelli statistici per lo studio dei tempi di intervento nell'infarto miocardico acuto. Master Thesis. Dipartimento di Matematica, Politecnico di Milano. Available at http://mox.polimi.it/it/progetti/pubblicazioni/tesi/ieva.pdf .

    Ieva, F., Paganoni, A.M., 2010. Multilevel models for clinical registers concerning STEMI patients in a complex urban reality: a statistical analysis of MOMI2 survey. Communications in Applied and Industrial Mathematics, 1, 128–147.

    Ieva, F., Paganoni, A.M., 2011. Process indicators for assessing quality of hospitals care: a case study on STEMI patients. JP Journal of Biostatistics, 6, 53–75.

    Jneid, H., Fonarow, G., Cannon, C., Palacios, I., Kilic, T., Moukarbel, G.V., Maree, A.O., Liang, L., Newby, L.K., Fletcher, G., Wexler, L., Peterson, E., 2008. Impact of time of presentation on the care and outcomes of acute myocardial infarction. Circulation, 117, 2502–2509.

    Lunn, D.J., Thomas, A., Best, N., Spiegelhalter, D., 2000. WinBUGS - a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing, 10, 325–337.

    MacNamara, R.L., Wang, Y., Herrin, J., Curtis, J.P., Bradley, E.H., Magid, D.J., Peterson, E.D., Blaney, M., Frederick, P.D., Krumholz, H.M., 2006. Effect of door-to-balloon time on mortality in patients with ST-segment elevation myocardial infarction. Journal of American College of Cardiology, 47, 2180–2186.

    Mauri, F., 2011. Bayesian variable selection for logit models with random intercept: application to STEMI data set. Master Thesis. Dipartimento di Matematica, Politecnico di Milano.

    Normand, S.T., Glickman, M.E., Gatsonis, C.A., 1997. Statistical methods for profiling providers of medical care: issues and applications. Journal of the American Statistical Association, 92, 803–814.

    Ntzoufras, I., 2002. Gibbs variable selection using BUGS. Journal of Statistical Software, 7. Available at http://www.jstatsoft.org/.

    Plummer, M., Best, N., Cowles, K., Vines, K., 2006. CODA: Convergence diagnosis and output analysis for MCMC. R News, 6, 7–11.

    R Development Core Team, 2009. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. Available at http://www.R-project.org.

    Racz, J., Sedransk, J., 2010. Bayesian and frequentist methods for provider profiling using risk-adjusted assessments of medical outcomes. Journal of the American Statistical Association, 105, 48–58.

    Raftery, A., Hoeting, J., Volinsky, C., Painter, I., Yeung, K.Y., 2009. BMA: Bayesian model averaging. Available at http://CRAN.R-project.org/package=BMA .

    Saia, F., Piovaccari, G., Manari, G., Guastaroba, P., Vignali, L., Varani, E., Santarelli, A., Benassi, A., Liso, A., Campo, G., Tondi, S., Tarantino, F., De Palma, R., Marzocchi, A., 2009. Patient selection to enhance the long-term benefit of first generation drug-eluting stents for coronary revascularization procedures: insights from a large multicenter registry. Eurointervention, 5, 57–66.

    Souza, A.D.P., Migon, H.S., 2004. Bayesian binary regression model: an application to in-hospital death after AMI prediction. Pesquisa Operacional, 24, 253–267.

    Zeger, S.L., Karim, M.R., 1991. Generalized linear models with random effects: a Gibbs sampling approach. Journal of the American Statistical Association, 86, 79–86.

    Zhao, Y., Staudenmayer, J., Coull, B.A., Wand, M.P., 2006. General design Bayesian generalized linear mixed models. Statistical Science, 21, 35–51.


    32 C. Chesneau and N. Hosseinioun

    and there exists a sequence of real positive numbers $(v_i)_{i \in \mathbb{Z}}$ (which can depend on $n$) such that

    $$\inf_{x \in X_1(\Omega)} w_i(x) \ge v_i. \tag{3}$$

    The goal is to estimate $f$ globally when only $n$ random variables $X_1, \ldots, X_n$ of $(X_i)_{i \in \mathbb{Z}}$ are observed. Such an estimation problem has been recently investigated by Aubin and Leoni-Aubin (2008a,b). It can be viewed as a generalization of the standard biased density model; see, e.g., Patil and Rao (1977), El Barmi and Simonoff (2000), Brunel et al. (2009) and Ramirez and Vidakovic (2010).

    In this article, we investigate the estimation of $f$ via the powerful tool of wavelet analysis. Wavelets are attractive for nonparametric density estimation because of their spatial adaptivity, computational efficiency and asymptotic optimality properties. They enjoy excellent mean integrated squared error (MISE) properties and can achieve fast rates of convergence over a wide range of function classes (including spatially inhomogeneous functions). Details on wavelet analysis in nonparametric function estimation can be found in Antoniadis (1997) and Härdle et al. (1998).

    In the first part of this study, we develop a new linear wavelet estimator. We determine asharp upper bound for the associated MISE for independent (Xi)iZ. Then, we extend thisresult for possible dependent (Xi)iZ following the -mixing case. In particular, we provethe upper bound obtained in the independent case is not deteriorated by our dependencecondition as soon as the -mixing coefficients (m)mN of (Xi)iZ (defined in Section 3)satisfy

    nm=1 m C, where C > 0 denotes a constant independent of n. The second

    part of the study is devoted to the adap