International Statistical Review (2005), 73, 2, 263–264. Printed in Wales by Cambrian Printers. © International Statistical Institute
Discussant: Sylvia Richardson
Centre for Biostatistics, Imperial College, EPH Department, London, UK. E-mail: [email protected]
A brief discussion of some of the challenges facing Biostatistics will be given.
1 Introduction
Probability and Statistics have transformed our ideas of Nature and Society. Since the beginning of the 17th century, their development both shaped and was shaped by interaction with the different contexts of application, early on with gambling and demography, more recently with physics, agronomics, economics, biology and the medical sciences. One of the greatest statisticians of the 20th Century, Sir Ronald Fisher, called statistics: “the peculiar aspect of human progress” that has given “to the 20th century, its special character”. In this vast domain, Biostatistics focuses on the application of statistical techniques to scientific research in health-related fields, including medicine, biology, and public health and especially on the development of new tools to study these areas. Biostatistics, which is a relatively new discipline, is now faced with exciting challenges arising particularly as a consequence of rapid technological advances in biological and medical sciences.
2 Challenges
The challenge for Biostatistics today is to deal with complexity at many levels. At the deepest level, there is an underlying unobserved complex biological phenomenon on which different types of interactions (genetic, environmental) and possibly therapeutic treatments act. In contrast, diseases are typically characterised only at a more superficial level by an observational process, possibly imperfect. This intricate superposition leads to complex biomedical data. Scientific questions are also posed at many levels: at the micro level of genomic data where the aim is understanding the function of genes; at the cellular level where medical imaging helps to detect patterns in tissues and cells; at the individual level where clinicians monitor patients in order to profile treatments or where epidemiologists investigate the importance of risk factors; and finally at the macro or group level where aggregated data is analysed, as in spatial epidemiology which interprets geographical or temporal variations of disease risk.
Recently, the size, type and quality of biomedical data have changed radically. Large databases have been set up to monitor public health performance and to study geographical heterogeneity of disease risk at a small spatial scale; gene expression, measured on tens of thousands of genes simultaneously, is linked to genetic data on thousands of markers and complex phenotypes. Moreover, there are many non-standard features of biomedical data that distinguish them from classical experimental set-ups and need to be taken into account: missing or censored events, existence of intricate patterns of correlation and dependence, noisy, mis-measured and heterogeneous data with diverse sources of variability. All these features necessitate the development of new tools for integrating data efficiently from multiple and widely differing sources, new ways of modelling that rely on modularity, new approaches to the problem of multiple comparisons and false discovery, and an increased interaction with the field of Bioinformatics.
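To make the multiple comparisons problem concrete, the following is a minimal sketch of one widely used way of controlling the false discovery rate, the Benjamini–Hochberg step-up procedure. The discussion above does not name a specific method, so the choice of procedure, the function name `benjamini_hochberg`, the level `q` and the p-values are all illustrative assumptions.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Flag hypotheses rejected by the Benjamini-Hochberg step-up rule.

    Illustrative sketch only: p_values is a list of p-values (for example,
    one per gene), and q is the target false discovery rate.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k (1-based) such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank

    # Reject every hypothesis whose rank is at most k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, invented purely for illustration.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(p_values, q=0.05)
print(sum(flags), "of", len(p_values), "hypotheses rejected")
```

With tens of thousands of genes tested simultaneously, a per-test threshold of 0.05 would be expected to produce a large number of false positives; a step-up rule of this kind instead adapts the threshold to the rank of each p-value.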
At the heart of modern statistical methods is the idea of hierarchical model building, in which a global picture of any complex data problem is constructed by:
– using a number of local sub-models to capture different components of the problem, each with hidden variables
– organising these sub-models in a hierarchical fashion
– linking the local models via interpretable probabilistic relationships
with the ultimate aim of making probabilistic inference about all hidden variables. This strategy of model building is closely tied up with developments of efficient algorithms for estimation of the unknown quantities, e.g. exact probability propagation, maximisation in multidimensional spaces or stochastic (Monte Carlo) simulations. The foundations have been laid; the challenge facing Biostatisticians today is to capitalize on these and develop an effective interplay between realistic, compact and interpretable models, inferential procedures that account for and adequately propagate uncertainties and deal with multiple testing problems, and efficient estimation algorithms and techniques for prediction for the very large and expensive datasets that are produced by the new biotechnologies.
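As a small illustration of this strategy, the sketch below fits a two-level normal model by Gibbs sampling, one form of stochastic (Monte Carlo) simulation. Everything specific here is an assumption made for illustration and is not taken from the discussion: each group j has a hidden mean theta_j, observations are N(theta_j, sigma^2), the group means are linked by theta_j ~ N(mu, tau^2), sigma and tau are treated as known, mu has a flat prior, and the data are invented.

```python
import random
import statistics


def gibbs_hierarchical(groups, sigma=1.0, tau=1.0, n_iter=6000, burn_in=1000, seed=0):
    """Gibbs sampler for a toy normal-normal hierarchical model (illustrative).

    Local sub-models: y_ij ~ N(theta_j, sigma^2) within each group j.
    Linking model:    theta_j ~ N(mu, tau^2), with a flat prior on mu.
    Returns posterior means of each hidden theta_j and of mu.
    """
    rng = random.Random(seed)
    names = list(groups)
    n = {g: len(groups[g]) for g in names}
    ybar = {g: statistics.fmean(groups[g]) for g in names}

    mu = statistics.fmean(list(ybar.values()))  # starting value
    theta_sum = {g: 0.0 for g in names}
    mu_sum = 0.0
    kept = 0

    for it in range(n_iter):
        # Update each hidden group mean given mu and that group's data:
        # a precision-weighted compromise between ybar_j and mu.
        theta = {}
        for g in names:
            precision = n[g] / sigma**2 + 1.0 / tau**2
            mean = (n[g] * ybar[g] / sigma**2 + mu / tau**2) / precision
            theta[g] = rng.gauss(mean, precision ** -0.5)

        # Update the top-level mean given the current group means.
        mu = rng.gauss(statistics.fmean(list(theta.values())),
                       tau / len(names) ** 0.5)

        # Accumulate draws after the burn-in period.
        if it >= burn_in:
            for g in names:
                theta_sum[g] += theta[g]
            mu_sum += mu
            kept += 1

    return {g: theta_sum[g] / kept for g in names}, mu_sum / kept


# Hypothetical measurements for three groups, invented for illustration.
data = {"A": [2.1, 1.9, 2.4], "B": [3.8, 4.1], "C": [0.9, 1.2, 1.0, 1.1]}
theta_hat, mu_hat = gibbs_hierarchical(data)
for name, value in theta_hat.items():
    print(name, round(value, 2))
print("mu", round(mu_hat, 2))
```

In this toy model the estimate for each group is pulled from its own sample mean towards the shared mean mu, and more strongly so for groups with fewer observations. This is how the linking relationship lets the sub-models borrow information from one another while uncertainty about mu is propagated to every group.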
3 Conclusion
I would like to end with an insightful quotation by Francis Bacon, a British philosopher of the 16th century:
“The men of experiments are like the ant, they only collect and use; the reasoners resemble spiders, who make cobwebs out of their own substance. But the bee takes the middle course: it gathers its material from the flowers of the garden and the field, but transforms and digests it by a power of its own.”
I think that biostatisticians can be identified with the busy bee. The dynamic interactions between complex scientific questions, theoretical and computational advances, and non-standard characteristics of the data have shaped Biostatistics as it is today and will shape its future and its role in Society.
Related Reading
Brazma, A. & Vilo, J. (2000). Gene expression data analysis. FEBS Letters, 17–24.
Genovese, C.R. & Wasserman, L. (2002). Operating characteristics and extensions of the FDR procedure. J. Royal Statistical Society B, 64, 499–518.
Friedman, N. (2004). Inferring cellular networks using probabilistic graphical models. Science, 303, 799–805.
Green, P.J., Hjort, N.L. & Richardson, S. (2003). Highly Structured Stochastic Systems. Oxford University Press.
[Received March 2005, accepted May 2005]