Workshop

Statistical and dynamical models in biology and medicine

October 11-12, 2012, University of Stuttgart

This workshop intends to bring together researchers from different research areas such as bioinformatics, biostatistics and systems biology, who are interested in modeling and analysis of biological systems or in the development of statistical methods with applications in biology and medicine.

Jointly organized by the GMDS/IBS Working Groups 'Statistical methods in bioinformatics' and 'Mathematical models in medicine' (Tim Beißbarth, Universität Göttingen; Julien Gagneur, EMBL Heidelberg; Nicole Radde, Universität Stuttgart; Ingo Röder, TU Dresden)

Sponsored by

International Biometrical Society

Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie


Program Overview

Thursday, October 11th, 2012

Session I: Dynamic models in biology (Chair: Nicole Radde)

12:30-13:25 Keynote: Hidde de Jong Identification of metabolic network models from high-throughput

13:25-13:45 Navid Bazzazzadeh Identification of the temporal dynamics of biological processes

13:45-14:05 Steffen Waldherr Dynamic modeling reveals robustness mechanism via sensitivity regulation in the TRPV1 ion channel

14:05-14:25 Jens Keienburg Quantitative Modeling of the temperature control mechanism in a molecular clock

14:25-16:00 Poster session I + Coffee

Session II: Computational Immunology (Chair: Ingo Röder)

16:00-16:55 Keynote: Thomas Höfer Stochastic switches in mammalian gene expression

16:55-17:15 Lars Kaderali Mathematical Modeling of Hepatitis C Virus Infection

17:15-17:35 Gunnar Cedersund Dynamical conclusive modeling of insulin signalling: unique predictions despite unidentifiability

17:35-17:55 Markus Scholz Modeling chemotherapy outcome of diffuse large B-cell lymphoma

17:55-18:15 Natalie Filmann Modeling of viral dynamics after liver transplantation in patients with chronic hepatitis B and B/D

18:15-18:30 Election of new leaders for GMDS/IBS working group “Mathematical models in medicine”

18:30-19:30 Poster session II + drinks

20:00 Get together (Restaurant Amadeus, directions on last page, please sign up for dinner until 15:45)

Friday, October 12th, 2012

Session III: Probabilistic Networks in Biology (Chair: Tim Beißbarth)

9:00-9:55 Keynote: Marco Grzegorczyk Bayesian regularization of non-homogeneous dynamic Bayesian networks by coupling interaction parameters

9:55-10:15 Narsis Aftab Kiani Network inference from dynamic perturbation data using evolutionary approach

10:15-10:35 Mohammad J. Sadeh Molecular Network Reconstruction with a Safeguard against the Unknown Unknowns of Biology

10:35-10:55 Katrin Illner Bayesian blind source separation for data with network structure

10:55-11:30 Poster session III + Coffee

Session IV: Statistical models for genomics (Chair: Julien Gagneur)

11:30-12:25 Keynote: Stephane Robin Some Statistical Models and Algorithms for Change-Point Problems in Genomics

12:25-12:45 Gen Lin Tracing the genetic shuffle - methods to uncover variations in recombination using single sperm cell data

12:45-13:05 Simon Anders Statistical aspects of single-cell transcriptomics

13:05-13:25 Stefanie Tauber Exploring the Sampling Universe of RNA-Seq

13:25-13:45 Achim Tresch Identification of transcription states by directional Hidden Markov Models

13:45-13:50 Workshop closing


SESSION I: Dynamic models in biology (Chair Nicole Radde)

Keynote: Hidde de Jong; INRIA Grenoble - Rhône-Alpes; France

Identification of metabolic network models from high-throughput

Hidde de Jong

Understanding the cellular processes that shape the response of microbial cells to changes in their environment requires the study of the interactions between gene expression and metabolism. In recent years high-throughput datasets comprising simultaneous measurements of metabolism (fluxes, metabolite concentrations) and gene expression (protein and mRNA concentrations) have become available. These datasets provide a rich store of information for modeling the dynamics of the networks of biochemical reactions underlying cellular processes. In particular, they promise to relieve what is currently a bottleneck for modeling in systems biology, obtaining reliable estimates of parameter values in kinetic models.

Notwithstanding these experimental advances, parameter estimation remains a particularly challenging problem, among other things due to incomplete knowledge of the molecular mechanisms, noisy and partial observations, heterogeneous experimental methods and conditions, and the large size of networks. As a consequence, the models may not be identifiable, may not generalize to new situations due to overfitting, and nonlinear rate functions may make them cumbersome to analyze. This has led to the proposal of simplified kinetic modeling frameworks, including linlog kinetics, loglin kinetics, power-law kinetics, and more recently, convenience kinetics.

I will present methods for the estimation of parameter values of approximate kinetic models of metabolic networks, capitalizing upon the favorable mathematical properties of these models. In particular, the methods will be shown to be applicable in the case of missing data, a frequently-occurring situation in high-throughput experiments due to experimental limitations or instrument failures. Moreover, I will discuss how identifiability problems, due to implicit dependencies between the parameters or to limitations in the quantity and quality of the available data, can be detected and resolved in the framework of approximate kinetic models. The application of the above methods will be illustrated by means of a state-of-the-art dataset on central carbon metabolism in Escherichia coli.
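To illustrate why approximate kinetic formats ease parameter estimation, the sketch below fits a one-metabolite linlog-type rate law v = a + b ln(x), which is linear in its parameters, by ordinary least squares. It is not the method presented in the talk: the function name `fit_linlog`, the synthetic data and the single-metabolite setting are assumptions made for illustration, and missing measurements are simply dropped, which is cruder than a dedicated missing-data procedure.

```python
import math


def fit_linlog(concentrations, rates):
    """Least-squares fit of v = a + b * ln(x).

    Observations in which either the concentration or the rate is
    missing (None) are skipped. Returns the pair (a, b).
    """
    pairs = [(math.log(x), v)
             for x, v in zip(concentrations, rates)
             if x is not None and v is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete observations")
    mean_u = sum(u for u, _ in pairs) / n
    mean_v = sum(v for _, v in pairs) / n
    sxx = sum((u - mean_u) ** 2 for u, _ in pairs)
    if sxx == 0.0:
        raise ValueError("concentrations must not all be equal")
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in pairs)
    b = sxy / sxx
    a = mean_v - b * mean_u
    return a, b


# Synthetic, noise-free data generated from assumed parameters.
true_a, true_b = 0.5, 2.0
xs = [0.5, 1.0, 2.0, None, 4.0, 8.0]   # one missing concentration
vs = [true_a + true_b * math.log(x) if x is not None else None for x in xs]
vs[1] = None                           # one missing flux measurement

a_hat, b_hat = fit_linlog(xs, vs)
print(a_hat, b_hat)
```

Because the model is linear in a and b, the estimate has a closed form and no iterative optimization is needed. This linearity in the parameters is the kind of favorable mathematical property the abstract refers to.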

Navid Bazzazzadeh; DKFZ; Germany

Identification of the temporal dynamics of biological processes

Navid Bazzazzadeh, Benedikts Brors, Roland Eils

The behavior and dynamics of complex systems are the focus of many research fields. The complexity of such systems comes not only from the number of their elements, but also from the unavoidable emergence of new properties of the system, which are not just a simple summation of the properties of its elements. The behavior of dynamic complex systems can be described by a number of well developed models, the majority of which however do not incorporate the modularity and the evolutionary dynamics of a system simultaneously. In this work, we propose a Bayesian model that addresses this issue. Our model has been developed within the Random Finite Set Theory.

Steffen Waldherr; University Stuttgart; Germany

Dynamic modeling reveals robustness mechanism via sensitivity regulation in the TRPV1 ion channel

Steffen Waldherr, Rene Buschow, Jörg Isensee, Frank Allgöwer, and Tim Hucho

Neurons in the peripheral nervous system sense pain-eliciting stimuli via special ion channel receptors. One such ion channel involved in pain sensing is TRPV1, which is activated for example by heat, low pH, or small molecule agonists like capsaicin. These stimuli trigger action potentials in the neurons, giving rise to the perception of pain in the brain. TRPV1 sensitivity can be regulated by intracellular signaling mechanisms. While the channel is well characterized on the single protein level, and mathematical models for channel activation are available, the interdependence of TRPV1-mediated calcium entry, TRPV1 expression levels, and intracellular calcium concentrations has so far not been quantitatively and dynamically investigated. Measurements of the TRPV1-activation mediated calcium influx and TRPV1 expression levels on a single cell level show a surprising lack of correlation between the measured variables. We construct a mathematical model of the TRPV1 regulation and intracellular calcium dynamics, aiming to elucidate the relation between TRPV1 expression and its response dynamics. The model is constructed mainly from information in the literature. Some parameters such as the basal calcium import or TRPV1 sensitization rates may vary greatly from one cell to the other. These parameters are estimated separately for each individual cell from our measurements of the calcium influx and TRPV1 protein amount. The estimated parameter values indicate that the TRPV1 sensitization kinetics are likely regulated by the total cellular TRPV1 amount in a way that decouples the amount of sensitized TRPV1 from the total TRPV1 amount. Physiologically, such a regulation may underlie robustness of pain signaling against fluctuations in the channel expression level. Our combined computational and experimental approach thus reveals principles of the interaction between intracellular biochemical signaling and ion dynamics involved in neural signaling.


Jens Keienburg; Ruperto-Carola, Bioquant; Germany

Quantitative Modelling of the temperature control mechanism in a molecular clock

Jens Keienburg, Stefan Körkel, Michael Brunner, Johannes Schlüder, Hans Georg Bock, Roland Eils

The aim of this project is the ODE-based modelling of the central genetic network that constitutes the circadian oscillation (molecular clock) in Neurospora. On the basis of quantitative measurements of the oscillation at constant temperature, a parameter estimation could be performed to fit an initial Goodwin model. The sensitivity of the biological system to temperature variations provided a way to externally change the oscillation behavior via a single physical parameter. Although the Goodwin model had been previously shown to account for temperature compensation in general, further quantitative measurements including temperature variations required not only a fine-tuning of parameters, but a modification of the model equations. By introducing an intermediate variable to delay transcription, we identified an improved model, which is in good accordance with the system behavior revealed in the quantitative data under temperature influence. A parameter estimation to fit the model completely is currently in progress. To better understand the control effect of temperature and to obtain a model that can be applied to simulate various temperature scenarios, we will in the next step use Optimal Experiment Design to select important scenarios for further measurements.
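For readers unfamiliar with the Goodwin model mentioned above, the following sketch integrates the classic three-variable form (mRNA x, protein y, repressor z, with Hill-type repression of transcription) using a fixed-step fourth-order Runge-Kutta scheme. It shows only the textbook model: the rate constants, the Hill exponent and the initial state are arbitrary illustrative assumptions, and neither the temperature dependence nor the additional delay variable of the improved model is included.

```python
def goodwin_rhs(state, k, n):
    """Right-hand side of the classic three-variable Goodwin model."""
    x, y, z = state
    dx = k[0] / (1.0 + z ** n) - k[1] * x   # repressed transcription, mRNA decay
    dy = k[2] * x - k[3] * y                # translation, protein decay
    dz = k[4] * y - k[5] * z                # repressor formation and decay
    return (dx, dy, dz)


def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * d for s, d in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * d for s, d in zip(state, k2)))
    k4 = f(tuple(s + dt * d for s, d in zip(state, k3)))
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def simulate(state0, k, n, dt, steps):
    """Return the list of states visited, including the initial one."""
    f = lambda s: goodwin_rhs(s, k, n)
    trajectory = [state0]
    for _ in range(steps):
        trajectory.append(rk4_step(f, trajectory[-1], dt))
    return trajectory


# Illustrative (assumed) parameters: production rates 1.0, decay rates 0.1.
rates = (1.0, 0.1, 1.0, 0.1, 1.0, 0.1)
trajectory = simulate((0.1, 0.1, 0.1), rates, n=10, dt=0.01, steps=20000)
print(trajectory[-1])
```

In a fitting workflow such as the one described in the abstract, a simulator of this kind would sit inside a parameter estimation loop that compares the simulated time courses with the measured oscillation.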

SESSION II: Computational Immunology (Chair Ingo Röder)

Keynote: Thomas Höfer; Theoretical Systems Biology; DKFZ; Germany

Stochastic switches in mammalian gene expression

Thomas Höfer

We have used innate and adaptive immune responses to pathogens as model systems to dissect how mammalian cells react to external stimuli. Stochastic all-or-nothing switching at the level of the individual cell emerges as a ubiquitous mode of regulation from these studies. In this talk, I will discuss how we combine single-cell experiments with mathematical modeling to identify the mechanistic sources and functional consequences of this behavior. We find that both signal transduction and gene regulation contribute to stochastic switching in individual cells, while intercellular communication provides a means for achieving coherent responses at the population level. I will discuss molecular mechanisms that could generate stochastic all-or-nothing switches.

Lars Kaderali; IMB, TU Dresden; Germany

Mathematical Modeling of Hepatitis C Virus Infection

N. Sulaimanov, M. Binder, D. Clausznitzer, C. M. Hüber, S. M. Lenz, J. P. Schläder, M. Trippler, R. Bartenschlager, V. Lohmann and L. Kaderali

Hepatitis C virus (HCV) infection develops into chronicity in 80% of patients, characterized by persistent low-level replication. To understand how the virus establishes its tightly controlled intracellular RNA replication cycle, we developed a detailed mathematical model of the initial dynamic phase of intracellular HCV RNA replication. Such a model makes it possible to study the antagonistic race between viral replication and cellular immune response. We quantitatively measured viral RNA and protein translation upon synchronous delivery of viral genomes to host cells, and thoroughly validated the model using additional, independent experiments. Model analysis was used to predict efficacy of different classes of inhibitors and identified sensitive substeps of replication that could be targeted by therapeutic intervention. A protective replication compartment proved to be essential for sustained RNA replication, balancing translation versus replication and thus effectively limiting RNA amplification. The model predicts host factors involved in formation of this compartment to determine cellular permissiveness to HCV replication. In gene expression profiling we identified metal ion binding and the phosphatidyl inositol system as key processes potentially determining cellular HCV replication efficiency. Our results show that the formation of the replicative compartment and involved factors are highly attractive drug targets.

Gunnar Cedersund; Department of Biomedical Engineering; Sweden

Dynamical conclusive modeling of insulin signalling: unique predictions despite unidentifiability

Gunnar Cedersund

Insulin signaling is at the heart of type 2 diabetes, one of the most expensive and rapidly spreading diseases of our time. Insulin signaling is initiated by the binding of insulin to its receptor, and this binding triggers a cascade of intracellular events of high complexity. The interaction network within which these intracellular events take place is only partially known, and many opposing hypotheses co-exist.


In this presentation, I will outline how we use an integrated experimental/modeling approach to gradually unravel this signaling network. Dynamic models are used to analyse which of the prevailing hypotheses can and cannot explain the collected data. New methodologies allow us to identify certain predictions uniquely, even when the parameter values are non-unique. These unique predictions allow us to close the loop and use the models to plan new experiments more confidently. I will illustrate this process on small models for parts of the network. I will also present a comprehensive dataset and model for the network level, which allows us, for the first time, to get an internally consistent view of where the insulin resistance that leads to type 2 diabetes appears. Most of the obtained insights would not have been obtained without the mathematical modeling.

Markus Scholz; University of Leipzig; Germany

Modelling chemotherapy outcome of diffuse large B-cell lymphoma

Markus Scholz, Katja Roesch, Dirk Hasenclever

Background: The NHL-B2 trial showed that both time intensification of CHOP chemotherapy as well as addition of etoposide improves outcome in elderly diffuse large B-cell lymphoma (DLBCL) patients. However, these intensifications were not additive since double intensification was comparable to baseline CHOP. This interaction cannot be explained by increased toxicity. We hypothesise that the immune system plays a key role in controlling residual tumour cells after treatment. More intense chemotherapy may be detrimental in cases for which transient depletion of immunologic effector cells allows an early re-growth of residual tumour cells. To understand this process in more detail, we aim to develop a differential equations model of tumour growth, chemotherapy and immune response.

Model: We modify and extend a model proposed by Kusnezov which is based on two coupled ordinary differential equations describing the dynamics of immunologic effector cells, tumour cells and their interactions. Major model features are an exponential tumour growth, a modulation of the production rate of effector cells by the presence of the tumour (immunogenicity) and mutual destruction of tumour and immune cells. Chemotherapy is introduced by a transient reduction of both immune and tumour cells during the course of the treatment. Growth rate, chemosensitivity and immunogenicity of the tumour are assumed to be patient-specific. Additionally, the initial tumour size serves as a proxy of the stage of disease. The model was simulated for a dense grid of possible parameter settings. Cure status and, if applicable, time to relapse were determined for different chemotherapies. The patient population is characterized by a distribution on the grid, which was chosen as the maximum-entropy distribution given corresponding expectations, variances and covariances. Since each distribution determines a survival curve, parameters of the distribution can be estimated by fitting clinical survival data.

Results: The model can qualitatively explain that more intense chemotherapies can result in inferior therapy outcome in a subset of patients. The model can also explain survival data after different chemotherapeutic regimens. Predicted hazard ratios are in agreement with clinical observations. Estimated parameters are biologically plausible.

Conclusions: Our model explains the observed paradoxical therapy effects in DLBCL by the simple assumption of a relevant anti-tumour effect of the immune system. It is possible to estimate the distribution of model parameters in trial populations. We will exploit the clinical relevance of our model insights in the future. We also aim to extend our model by the effects of immunologic tumour therapies not considered so far.

Natalie Filmann; Goethe University Frankfurt; Germany

Modeling of viral dynamics after liver transplantation in patients with chronic hepatitis B and B/D

Natalie Filmann, Ingmar Mederacke, Heiner Wedemeyer, Eva Herrmann

Viral kinetic models have become an important tool for understanding the main biological processes behind the dynamics of chronic viral diseases and optimizing effectiveness of anti-viral therapy [1]. We analyzed the dynamics of hepatitis B and hepatitis B/D co-infection (HBV/HDV) and the pharmacokinetics/pharmacodynamics of the reinfection prophylaxis (=polyclonal antibodies) after liver transplantation. To this end, we developed a mechanistic model consisting of a system of ordinary differential equations. This model was fitted by analyzing the kinetics of the viremia and antibodies after liver transplantation in patient data and correlated with the reinfection prophylaxis dosing schemes [2]. The results suggest that this modeling approach may help to identify factors which indicate an upcoming reinfection and to quantify the HBIG dose rate necessary to successfully prevent reinfection.

References:

[1] M.A. Nowak, R.M. May, Virus Dynamics: Mathematical Principles of Immunology and Virology, Oxford University Press, 2000.

[2] I. Mederacke, N. Filmann et al., Rapid early HDV RNA decline in the peripheral blood but prolonged intrahepatic hepatitis delta antigen persistence after liver transplantation, J Hepatol. 2012 56(1) 115-22.


SESSION III: Probabilistic Networks in Biology (Chair Tim Beißbarth)

Keynote: Marco Grzegorczyk; TU Dortmund; Germany

Bayesian regularization of non-homogeneous dynamic Bayesian networks by coupling interaction parameters

Marco Grzegorczyk

The objective of systems biology research is the elucidation of the regulatory networks and signalling pathways of the cell. The ideal approach would be the deduction of a detailed mathematical description of the entire system in terms of a set of coupled non-linear differential equations. As high-throughput measurements are inherently stochastic and most kinetic rate constants cannot be measured directly, the parameters of the system would have to be estimated from the data. Unfortunately, standard optimization techniques in high-dimensional multimodal parameter spaces are not robust, and model selection is impeded by the fact that more complex pathway models would always provide a better explanation of the data than less complex ones, rendering this approach intrinsically susceptible to over-fitting. To assist the elucidation of regulatory networks, dynamic Bayesian networks can be employed. The idea is to simplify the mathematical description of the biological system by replacing coupled differential equations by conditional probability distributions. This results in a scoring function (marginal likelihood) of closed form that depends only on the structure of the network and avoids the over-fitting problem. Markov Chain Monte Carlo (MCMC) algorithms can be applied to search the space of network structures for those that are most consistent with the data.

To relax the homogeneity assumption of classical dynamic Bayesian networks (DBNs), various recent studies have combined DBNs with multiple changepoint processes. The underlying assumption is that the parameters associated with time series segments delimited by multiple changepoints are a priori independent. However, the assumption of prior independence is unrealistic in many real-world applications, where the majority of segment-specific regulatory relationships among the interdependent quantities tend to undergo minor and gradual adaptations. Moreover, for sparse time series, as typically available in many systems biology applications, inference suffers from vague posterior distributions, and could borrow strength from a systematic mechanism of information coupling. There are two approaches to information coupling in time series segmented by multiple changepoints: sequential information coupling, and global information coupling. In the former, information is shared between adjacent segments. In the latter, segments are treated as interchangeable units, and information is shared globally. Sequential information coupling is appropriate for a system in the process of development, e.g. in morphogenesis. Global information coupling, on the other hand, is more appropriate when time series segments are related to different experimental scenarios or environmental conditions. These coupling schemes have been applied to the regularization of DBNs with time-varying network structures, by penalizing network structure changes sequentially or globally. However, these approaches do not address the information coupling with respect to the interaction parameters and assume complete parameter independence among time segments.

In the talk I will present two novel non-homogeneous dynamic Bayesian network models for sequential [1] and global [2] information sharing with respect to the interaction parameters.

REFERENCES:

[1] Grzegorczyk, M. and Husmeier, D. (2012a): A non-homogeneous dynamic Bayesian network model with sequentially coupled interaction parameters for applications in systems and synthetic biology. Statistical Applications in Genetics and Molecular Biology (SAGMB), vol. 11 (4), Article 7.

[2] Grzegorczyk, M. and Husmeier, D. (2012b): Bayesian regularization of non-homogeneous dynamic Bayesian networks by globally coupling interaction parameters. In: N. Lawrence and M. Girolami (editors), Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS), 467-476, vol. 22 of JMLR.

Narsis Aftab Kiani; IMB, TU Dresden - BioQuant, Heidelberg University; Germany

Network inference from dynamic perturbation data using an evolutionary approach

Narsis Aftab Kiani, Lars Kaderali

Recent technological developments in experimental high-throughput molecular biology have enabled the use of network inference algorithms to predict causal models of networks from correlational data. To date, various methods have been developed and applied to reveal signaling pathway mechanisms, yet the results so far have been modest. A fundamental challenge is how to efficiently find the true network among a number of possible network topologies that grows exponentially with the number of nodes, under conditions of limited data. This is an ill-posed problem that requires either the integration of prior biological knowledge or strong regularization. We focus here on the problem of reconstructing signalling networks from time course observational data or steady-state data, after defined perturbations of the system. We introduce the EARN approach, a novel structure-learning method that utilizes Bayesian networks with a probabilistic Boolean threshold. To deal with a multimodal and high dimensional target distribution, distributed evolutionary Markov chain Monte Carlo sampling is used to evaluate the posterior distribution over models, given the data. To increase the efficiency of the sampling, an evolutionary algorithm has been integrated into the Markov chain. Available biological prior knowledge can easily be integrated into the inference method. We show results of our approach on both real and simulated data. We illustrate in a simulation study that our method can deal with noisy and missing time-series or steady-state data. Comparison of the model predictions against the actual edges in the network and against a random network, using four different metrics (accuracy, precision, sensitivity, and specificity) and based on simulated data for networks of different topology, demonstrates that our method is more efficient than similar methods. We then applied EARN to reconstruct the ERBB signaling network in trastuzumab resistant breast cancer cells from 16, partially combinatorial, siRNA interventions. In this data set, a high fraction of genes have not been observed experimentally. In spite of this, our approach reliably uncovered a significant amount of the currently known interactions in the human G1/S transition network. In particular, we outperform the recently proposed DEPN approach in reconstructing the ErbB signaling network, based on the experimentally validated network from the STRING database.

Mohammad Sadeh; University of Regensburg; Germany

Molecular Network Reconstruction with a Safeguard against the Unknown Unknowns of Biology

Mohammad J. Sadeh, Giusi Moffa and Rainer Spang

Our current understanding of virtually all cellular networks is almost certainly incomplete. We miss important but so far unknown genes and mechanisms in the pathways. Moreover, we often only have a partial account of the molecular interactions and modifications of the known players. When analyzing the cell, we look through narrow windows, leaving potentially important events in blind spots. Network reconstruction is naturally confined to what we have observed. Little is known about how the incompleteness of our observations confounds our interpretation of the available data.

Here we ask the question, which features of a network can be confounded by incomplete observations and which cannot. In the context of nested effect models, we show that in the presence of missing observations or hidden factors a reliable reconstruction of the full network is not feasible. Nevertheless, we can show that certain characteristics of signaling networks like the existence of cross talk between certain branches of the network can be inferred in a not-confoundable way. We derive a test for inferring such not-confoundable characteristics of signaling networks. Next, we introduce a new data structure to represent partially reconstructed signaling networks. Finally, we evaluate our method both on simulated data and in the context of a study on early stem cell differentiation in mice.

Katrin Illner; Helmholtz Zentrum München; Germany

Bayesian blind source separation for data with network structure

Katrin Illner, Christiane Fuchs, Fabian J. Theis

Many biological applications involve large scale data where some prior information about the data-generating structure is given. These might for example be gene expression measurements with a known underlying gene regulatory network, or metabolic data with known metabolic pathways. The measured data, however, will typically be a mixture of different components that are associated with the network or with single subnetworks, respectively. We focus on this structural aspect and aim to separate the mixture of observed signals into informative sources. Technically, we define a Gaussian graphical model with latent variables to describe the dependence structure of each source. To keep the parameter space small we require stationarity within each source. We estimate parameters and sources using expectation maximization. The flexible Bayesian model allows for including parameter priors and dealing with missing observation values. We show the separation performance on synthetic data and check for robustness regarding network perturbations. As a real world example we consider gene expression data where the genes are linked by a gene regulatory network. We demonstrate how the model identifies relevant biological processes and discuss how the activity of given pathways can be determined using the stationarity parameter.


SESSION IV: Statistical Models for genomics (Chair: Julien Gagneur)

Keynote: Stephane Robin; INRA/AgroParisTech; France;[email protected] Statistical Models and Algorithms for Change-Point Problems in Genomics

Stephane Robin

Change-point problems often arise in genomic applications, such as the detection of genomic alterations or copy number variations. The general problem can be stated as follows: given a series of observations over time whose distribution is subject to abrupt changes, can we infer the number of change-points, their location, the magnitude of the change, etc.?

Such problems have been intensively studied in the statistical literature and raise several types of issues in terms of model selection (how many change-points?), algorithmics (the segmentation-space is exponentially large with respect to the length of the series) or modelling (how to account for some dependency between the series).

We will present a series of models and corresponding inference algorithms, focusing on deterministic methods. We will also limit ourselves to algorithms that recover exactly the optimal segmentation (in terms of likelihood) or provide the exact posterior distribution of various quantities of interest. These methods will be illustrated with applications in genomics.
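As an illustration of what "recovering exactly the optimal segmentation" can mean, the following is a minimal sketch and not the speaker's implementation. It assumes a Gaussian signal with piecewise-constant mean and a common variance, so that maximizing the likelihood for a fixed number of segments amounts to minimizing the residual sum of squares. The function name `best_segmentation` and the toy data are hypothetical.

```python
def best_segmentation(y, n_segments):
    """Exact least-squares segmentation of y into n_segments contiguous pieces.

    Assumption: piecewise-constant mean, common variance, fixed number of
    segments. Runs in O(n_segments * n^2) by dynamic programming.
    """
    n = len(y)
    # Prefix sums give the cost of any segment in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(y):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i, j):
        # Residual sum of squares of y[i:j] around its own mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimal cost of splitting y[:j] into k segments.
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + cost(i, j)
                if c < best[k][j]:
                    best[k][j] = c
                    back[k][j] = i

    # Trace back the start index of segments 2..n_segments.
    change_points = []
    j = n
    for k in range(n_segments, 1, -1):
        j = back[k][j]
        change_points.append(j)
    return sorted(change_points), best[n_segments][n]


# Toy profile with two abrupt changes (hypothetical data).
profile = [0, 0, 0, 5, 5, 5, 5, 1, 1, 1]
change_points, rss = best_segmentation(profile, 3)
print(change_points)  # [3, 7]
```

The search space of segmentations is exponentially large, but the additive cost lets the dynamic program visit each (segment count, end position) pair only once. Choosing the number of segments is a separate model selection problem that this sketch does not address.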

References:
• Rigaill, G., Lebarbier, E., & Robin, S. (2011). Exact posterior distributions and model selection criteria for multiple change-point detection problems. Stat. Comput., 1-13.
• Picard, F., Lebarbier, E., Hoebecke, M., Rigaill, G., Thiam, B., & Robin, S. (2011). Joint segmentation, calling and normalization of multiple CGH profiles. Biostatistics, 12(3), 413-428.
• Picard, F., Lebarbier, E., Budinska, E., & Robin, S. (2011). Joint segmentation of multivariate Gaussian processes using mixed linear models. Comput. Statist. Data Anal., 55(2), 1160-70.

Gen Lin; EMBL Heidelberg; Germany; [email protected] the genetic shuffle - methods to uncover variations in recombination using single sperm cell data

Gen Lin, Julien Gagneur, Chenchen Zhu, Lars Steinmetz

Recombination is a critical process that ensures the proper segregation of chromosomes during meiosis, while maintaining both genome integrity and exchange of genetic material. Studies based on sperm cell genotyping focusing on single hot spots have revealed significant recombination rate variations between individuals. However, little is known about inter-individual variations genome-wide.

The main challenge to achieving a genome-wide measure of recombination rate through sperm typing is the cost of genotyping a large number of single sperm cells. Here we developed computational methods that make this possible, using low sequencing coverage data from a high number of single sperm cells. We developed a Metropolis-Hastings algorithm to obtain the phase of the donor genome, from which we inferred the recombination events in each sperm cell.

In a shotgun-like approach, we performed single-cell sequencing at low coverage for ~200 cells across 2 individuals. Our phasing method deals efficiently with missing data and takes the uncertainty into account. Its R implementation runs on average within 60 minutes per chromosome (~100 cells) on a standard laptop. We also developed a statistical test based on genotype imputation to identify significant differences in recombination rates between individuals from such sparsely genotyped data. We detected significant differences in 70 regions genome-wide between the 2 individuals.

Altogether, our statistical methods enable an efficient strategy to provide a quantitative assessment of personal recombination rates from sequencing data.


Simon Anders; EMBL; Germany; [email protected] Statistical aspects of single-cell transcriptomics

Simon Anders

Advances in sample preparation techniques now allow for the sequencing of RNA from individual cells, which makes it possible to study how genes vary and co-vary in the distribution of their expression strength across cells and so obtain novel insights into the mechanics of transcriptional regulation. However, due to the low amount of biological material obtained from a single cell, technical noise in such data is strong, and for proper inference of biological properties, one needs to reliably distinguish biological from technical variation. By comparing technical and biological replicates of RNA-Seq samples obtained from single-cell amounts of material, we characterized the properties of technical noise in these settings, and developed an inference procedure suitable for the analysis of such data.

Stefanie Tauber; CIBIV (Center for Integrative Bioinformatics); Austria; [email protected] the Sampling Universe of RNA-Seq

Stefanie Tauber and Arndt von Haeseler

How deep is deep enough? While RNA sequencing is a well-established technology, the sequencing depth required for detecting all expressed genes is not known. If we leave the entire biological overhead and meta-information behind, we are dealing with a classical sampling process. Such sampling processes are well known from population genetics and thoroughly investigated. Here we use the Pitman sampling formula to characterize the sampling process of RNA sequencing. By doing so we are able to model the sampling by means of two parameters which capture the conglomerate of different sequencing technologies, protocols and their associated biases. As a consequence we are able to realistically simulate the distribution of reads on a per-gene as well as on the transcriptomic level. Additionally we are able to evaluate the theoretical expectation of uniform coverage. Most importantly, given a pilot sequencing experiment we provide an estimate for the size of the underlying expressed transcript universe and an estimate for the number of newly detected genes when sequencing an additional sample.
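To make the two-parameter sampling idea concrete, here is a minimal simulation sketch. It uses the standard sequential urn construction whose partition distribution is described by the Pitman sampling formula. Treating each read as one draw and each class as one detected gene is an assumption made for illustration, and the function name and parameter values are hypothetical, not taken from the abstract.

```python
import random


def simulate_pitman_sample(n_reads, theta, alpha, seed=0):
    """Sequentially draw n_reads from the two-parameter (Pitman) urn scheme.

    Returns the number of reads per detected class. Assumption: one class
    corresponds to one expressed gene.
    """
    if not (0 <= alpha < 1 and theta > -alpha):
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
    rng = random.Random(seed)
    counts = []
    for n in range(n_reads):
        k = len(counts)
        # After n reads in k classes, a new class is opened with
        # probability (theta + k * alpha) / (n + theta).
        if n == 0 or rng.random() < (theta + k * alpha) / (n + theta):
            counts.append(1)
        else:
            # Existing class j is hit with weight (counts[j] - alpha).
            weights = [c - alpha for c in counts]
            j = rng.choices(range(k), weights=weights)[0]
            counts[j] += 1
    return counts


counts = simulate_pitman_sample(n_reads=10_000, theta=50.0, alpha=0.5)
print(len(counts), "classes detected;", max(counts), "reads in the largest class")
```

Running the simulation at increasing `n_reads` shows how the number of detected classes keeps growing with depth, which is the quantity the "how deep is deep enough" question is about.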

Achim Tresch; University Cologne; Germany; [email protected] of transcription states by directional Hidden Markov Models

Achim Tresch

Technologies for measuring protein-DNA binding on a genomic scale provide detailed information on transcription factor occupancies at almost single nucleotide resolution. It is a challenge to integrate multiple such data sets and draw conclusions about higher order functional relationships between these transcription factors. We apply Hidden Markov Models (HMMs) to chromatin immunoprecipitation data of transcription initiation, elongation and termination factors, as well as to nucleosome data, to automatically define distinct transcription states in yeast. The method recognizes previously known transitions from initiation to elongation and identifies novel, so far uncharacterized transitions during the early stages of transcription. As an extension to standard HMMs, we introduce directional HMMs (dHMMs) that adequately model the fact that transcription can occur in forward or reverse direction. dHMMs identify bidirectional promoters and previously unknown transcripts along the whole yeast genome. We conclude that dHMMs can be used as a general purpose tool for the automated annotation of transcription / chromatin states in arbitrary organisms.
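For readers unfamiliar with how an HMM assigns states along a genome, the sketch below shows Viterbi decoding for a standard HMM with discrete emissions. It does not implement the directional extension (dHMM) described in the abstract. The two states, the discretized "low"/"high" signal and all probabilities are hypothetical toy values.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Most probable state path of a standard discrete-emission HMM."""
    # score[s]: best log-probability of any state path ending in state s.
    score = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    pointers = []
    for obs in observations[1:]:
        new_score, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda r: score[r] + log_trans[r][s])
            new_score[s] = score[prev] + log_trans[prev][s] + log_emit[s][obs]
            ptr[s] = prev
        score = new_score
        pointers.append(ptr)
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(pointers):
        path.append(ptr[path[-1]])
    return path[::-1]


def logs(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {a: {b: math.log(p) for b, p in row.items()} for a, row in table.items()}


# Hypothetical toy model: two transcription states and a binarized signal.
states = ["initiation", "elongation"]
log_start = {s: math.log(0.5) for s in states}
log_trans = logs({"initiation": {"initiation": 0.9, "elongation": 0.1},
                  "elongation": {"initiation": 0.1, "elongation": 0.9}})
log_emit = logs({"initiation": {"high": 0.9, "low": 0.1},
                 "elongation": {"high": 0.1, "low": 0.9}})

signal = ["high", "high", "high", "low", "low", "low"]
print(viterbi(signal, states, log_start, log_trans, log_emit))
```

In a genomic setting the observations would be multivariate occupancy profiles rather than a single binary track, and the emission model would change accordingly.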


POSTER SESSION

1.) Michaela Bayerlova; University Medical Center Göttingen; Germany; [email protected] gene expression data in breast cancer using a graph-based WNT pathway model

Bayerlova M, Kramer F, Pukrop T, Klemm F, Bleckmann A, Beißbarth T

Gene Set Enrichment Analysis is a versatile bioinformatics approach and has been frequently used in modern research. According to analyses using global WNT gene sets, the WNT pathway in general is highly active in the molecular basal-like subgroup of breast cancer and in all subgroups which later metastasize to the brain. However, previous studies of breast cancer primaries could not identify a WNT ligand or sub-pathway mediating these signals. Our own results indicate that β-Catenin independent WNT signaling is of importance in breast cancer and its metastases. However, currently there is no WNT model available which differentiates between the distinct WNT sub-pathways. Therefore, we aim to develop a new graph-based WNT model and to use it for a more refined pathway analysis of breast cancer expression data. As a first step we collect information about the human WNT pathway from several public databases (PID, Biocarta, Reactome, KEGG). These databases often use the Biopax format as a standard XML format. To utilize this knowledge within the statistical computing environment of R we have developed a Biopax-Parser package, which allows retrieving pathway nodes corresponding to signaling components and pathway edges representing molecular interactions. The Biopax pathways are parsed into R, transformed into an adjacency matrix and can be merged, shrunk or extended. This package allows us to generate a consensus WNT model and use it in further analyses. Different algorithms integrating network knowledge and allowing a more refined enrichment analysis are currently being tested and will be used to discriminate activation of different WNT sub-pathways.

2.) Frank Kramer; University Medical Center Göttingen; Germany; [email protected]: A new package to parse, modify and merge BioPAX-Ontologies within R

Frank Kramer, Michaelá Bayerlova, Annalen Bleckmann, Tim Beißbarth

Methods for network reconstruction are often designed with the possibility to integrate prior knowledge about the topology of biological signaling networks. However, the format of prior knowledge required, usually in the form of an adjacency matrix, is a strong abstraction of the biological reality. In the past years ontologies have been the tool of choice to represent and allow the sharing of knowledge of this biological reality. BioPAX is a commonly used ontology for the encoding of regulatory pathways. The R Project for Statistical Computing is the standard environment for statistical analyses of high-dimensional data and network reconstruction methods. Although there are packages available that provide the pathway data of databases like KEGG, the Pathway Interaction Database (Nature/NCI) or Reactome as graphs, there was no software available to parse, merge and manipulate BioPAX ontologies inside of R. We present a new open-source package called rBiopaxParser that parses BioPAX-Ontologies and represents them in R. Class definitions, properties and restrictions are mapped on a 1:1 basis, with respect to the limitations of object-orientation in R. The user is able to parse arbitrary BioPAX OWL files, for example the exports of popular online pathway databases like PID, Reactome or KEGG. Instances of BioPAX-Classes can be programmatically added or removed. Multiple pathways can be merged or transformed into an adjacency matrix suitable as input for network reconstruction algorithms, i.e. reducing a pathway to a graph with edges representing only activations or inhibitions. The software is publicly available at https://github.com/frankkramer/rBiopaxParser and will be submitted to Bioconductor soon.

3.) Andrei Kramer; IST, Uni Stuttgart; Germany; [email protected] of the posterior entropy in a Bayesian framework for parameter estimation in biological networks.

Andrei Kramer

The careful design of, or even brainstorming for, biological experiments and the evaluation of their results can benefit greatly from fast (possibly even interactive) model-parameter sampling procedures. To be fast, the sampling algorithms need to adapt to the target probability distribution, but they must also be robust with respect to multimodality and to inherently inaccurate models. Since biochemical models are unwieldy for the purposes of understanding a biological system, systems biology treats phenomenological models and summarizes often very complex multi-stage interactions (with intermediate substances) as one reaction kinetic. The resulting models do not have any meaningful true parameters as such, but can fit experimental observations within the precision of the data. Since there is no unique fitting parameter value it is very natural to work with probability distributions. MCMC techniques make these high-dimensional probability distributions accessible for analysis and therefore seem appropriate for our purposes. They are time consuming though and need to be modified and adjusted for the best possible effective sampling speed for this very specific problem type. We provide examples of how this might be achieved in the future and is to a limited extent done now, using the Hybrid Monte Carlo algorithm. Our method is evaluated on two biological network examples, models for MAPK signaling and the insulin pathway with data from literature.

References:
• Girolami M, Calderhead B: Riemannian manifold Langevin and Hamiltonian Monte Carlo methods. J R Statist Soc B 2011, 73:1-37.
• Kramer A, Hasenauer J, Allgöwer F, Radde N: Computation of the posterior entropy in a Bayesian framework for parameter estimation in biological networks. In IEEE International Conference on Control Applications, Yokohama, Japan 2010:493-498. [Part of 2010 IEEE Multi-Conference on Systems and Control].
• Fritsche-Guenther R, Witzel F, Sieber A, Herr R, Schmidt N, Braun S, Brummer T, Sers C, Blüthgen N: Strong negative feedback from Erk to Raf confers robustness to MAPK signalling. Mol Syst Biol 2011, 7.
• Brannmark C, Palmer R, Glad ST, Cedersund G, Stralfors P: Mass and information feedbacks through receptor endocytosis govern insulin signaling as revealed using a parameter-free modeling framework. Journal of Biological Chemistry 2010,

4.) Valerii Sukhorukov; Helmholtz Centre for Infection Research; Germany; [email protected] Architecture of Mitochondrial Network: A Dynamic Graph Representation

M. Sukhorukov, Daniel Dikov, Andreas S. Reichert, Michael Meyer-Hermann

Within eukaryotic cells mitochondria form a tubular network spanning the volume of the cytosol. Its structure results from balanced dynamics of motile chondriome parts, constantly undergoing fission and mutual fusion processes. Despite the good understanding of the molecular complexes responsible for the elementary fission or fusion events, the architecture and dynamic properties of the resulting reticulum as a whole remain obscure. We employ a graph theory formalism to represent mitochondria as a spatial network evolving according to a well-defined set of node transformations established experimentally. The graph is then studied as a mean-field deterministic approximation, as well as an explicit stochastic computational model. The former establishes a clear analytical relationship between the fission/fusion rates and the resulting structural characteristics of the reticulum. The agent-based model further extends this insight by predicting a detailed distribution of the chondriome component sizes. Its analysis indicates that mitochondria operate in the vicinity of a structural phase transition. We propose that instability of the reticulum configuration resulting from the critical regime should enable high morphological adaptability of this organelle, explaining the variability of mitochondrial configurations observed in different cell types. The structural flexibility is also crucial for the efficiency of mitochondria in their function as a main cellular metabolic and apoptotic operator.

5.) Silvia von der Heyde; University Medical Center Göttingen; Germany; [email protected] modelling of drug resistance in breast cancer

von der Heyde S, Bender C, Henjes F, Korf U, Beißbarth T

Despite promising progress in the field of targeted breast cancer therapy, drug resistance still remains a scientific challenge. Monoclonal antibody drugs, e.g. trastuzumab and pertuzumab, and small molecule inhibitors like erlotinib, have been designed to hinder ERBB2 and EGFR receptor induced aberrant signalling driving tumour progression via the MAPK and PI3K pathways. Both receptors belong to the ERBB receptor family whose oncogenic potential unfolds in case of overexpression or mutations. Their functional interaction as dimers renders bypasses possible to overcome pathway blockades. Our intention is to reveal such resistance mechanisms in ERBB2-amplified breast cancer specimens and to recommend related individual optimal (combinatorial) drug treatments. We focus on RPPA proteomics data of receptor phosphorylation and downstream signalling molecules in the breast cancer cell lines BT474, SKBR3 and HCC1954. The latter is known to harbour an oncogenic PI3K mutation and to be trastuzumab resistant. SKBR3 and HCC1954 also show increased EGFR expression. The experiments involved treatment with the mentioned drugs up to 60 min and 30 h, respectively, to gain insight into fast cellular signalling events as well as long-term effects. In a Boolean modelling approach signalling networks are reconstructed from data in a cell-specific manner applying the DDEPN [1] algorithm under consideration of prior literature knowledge as a reference network. Subsequently, network behaviour is simulated in response to certain stimuli and inhibitor combinations to reveal drug inputs leading to tumour suppression and to detect (edgetic) resistance mechanisms, e.g. by analysing HCC1954-specific interactions. The R package BoolNet [2] is used for pathway analysis via perturbation simulations. 
Unravelling cell-specific oncogenic protein interactions establishes a basis for targeted interference of pathological cellular processes and hence a further step to predictive models of drug response mechanisms for personalized medicine.
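The kind of perturbation simulation described above can be sketched with a small synchronous Boolean network. The abstract uses the R package BoolNet for this step; the Python code below is only an illustration of the principle. The wiring in `RULES` is a hypothetical toy network and not the network reconstructed by the authors.

```python
def simulate(rules, state, clamped, max_steps=50):
    """Synchronously update a Boolean network until it reaches a fixed point.

    Nodes in `clamped` (stimuli or drug targets) are held at the given value.
    """
    state = {**state, **clamped}
    for _ in range(max_steps):
        new_state = {node: rule(state) for node, rule in rules.items()}
        new_state.update(clamped)
        if new_state == state:
            return state
        state = new_state
    return None  # no fixed point within max_steps, e.g. a cyclic attractor


# Hypothetical toy wiring: either receptor can activate both pathways.
RULES = {
    "ERBB2": lambda s: s["ERBB2"],
    "EGFR": lambda s: s["EGFR"],
    "PI3K": lambda s: s["ERBB2"] or s["EGFR"],
    "MAPK": lambda s: s["ERBB2"] or s["EGFR"],
    "GROWTH": lambda s: s["PI3K"] or s["MAPK"],
}
START = {"ERBB2": True, "EGFR": True, "PI3K": False, "MAPK": False, "GROWTH": False}

# Blocking one receptor alone leaves a bypass; blocking both removes it.
single = simulate(RULES, START, clamped={"ERBB2": False})
combo = simulate(RULES, START, clamped={"ERBB2": False, "EGFR": False})
print(single["GROWTH"], combo["GROWTH"])  # True False
```

In this toy example the bypass through the second receptor is the reason a single inhibitor fails, which mirrors the motivation for testing inhibitor combinations.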

References:
[1] C. Bender et al., Bioinformatics 2010
[2] C. Müssel et al., Bioinformatics 2010


6.) Sebastian Gerdes; TU Dresden, Medical Faculty, IMB; Germany; [email protected] polyclonality prevent the outbreak of leukemia?

T cell receptor (TCR) polyclonal mature T cells are surprisingly resistant to oncogenic transformation through retroviral induction of T cell oncogenes. It has been shown that leukemia/lymphoma did not occur upon transplantation of polyclonal T cells into RAG1-1-deficient recipients, although the T-cells were transduced with high copy numbers of gammaretroviral vectors encoding potent T cell oncogenes [1]. Further studies [2] demonstrated that the transplantation of T cells from TCR monoclonal OT1 mice that were transduced with the same protocol resulted in leukemia/lymphoma. The underlying mechanisms that prevent oncogenesis in the polyclonal situation and endorse the outbreak of leukemia in the monoclonal situation are currently unclear.

Using a mathematical modeling approach, we challenge the arising hypothesis that polyclonality induces competition within the T cell repertoire, which in turn suppresses the emergence of a leukemic clone. As a starting point, we developed a simple model of T cell homeostasis emphasizing the analogy of T cell homeostasis to species coexisting in ecological niches. The key assumption of the model is that T cell survival is critically dependent on the interaction of the clone-specific TCR with self-peptide-MHC-complexes (corresponding to environmental niches).

Based on our modelling results, we speculate about the cellular properties of the leukemic clone. Within our model framework, we are able to explain the observed phenomena under the following two assumptions about the cellular properties of the leukemic clone:
(i) The leukemic clone is less competent than other T cell clones in acquiring survival stimuli from niches.
(ii) Proliferation of the leukemic clone is less dependent on niche interaction. This is a plausible assumption as the transgenes are potent oncogenes capable of activating mitogenic pathways.

From our results we conclude that clonal competition is a possible mechanism to counterbalance clonal dominance. Our modeling results allow us to foster the design of further biological experiments. A future goal is to determine the minimum clonal complexity that is needed in order to control the leukemic clone under the given circumstances.

References:
[1] Newrzela S, Cornils K et al. Resistance of mature T cells to oncogene transformation. Blood. 2008;112(6):2278-2286.
[2] Newrzela S, Al-Ghaili N et al. T-cell receptor diversity prevents T-cell lymphoma development. Leukemia. doi: 10.1038/leu.2012.142.

7.) Patrick Weber; IST - Uni Stuttgart; Germany; [email protected] key-players and secretion activity at the trans Golgi network of mammalian cells

Patrick Weber, Nicole Radde

The secretory activity of mammalian cells is highly regulated. Key proteins and lipids interact at the trans Golgi network in interrelated feedback loops to regulate the formation of transport vesicles. This project uses systems biological methods to understand the interaction strength of the key players and their influence on the secretory activity. Bayesian methods and ODE models are used to explain the available data and to gain deeper insight into the importance of the individual interaction strengths.

8.) Manuel Nietert; University Medical Center Göttingen; Germany; [email protected] Classification of Cell Populations with Multi-channel Flow Cytometry Data - Using Sparse Grids Classifying A Sparsely Populated Data Space

Manuel Nietert, Steve Wagner, Dorit Arlt, Tim Beißbarth

Assigning cell population clusters to multi-channel FACS data resembles the task of optimizing the automatic classification of various haystack build-ups. Surely a strategy based on analyzing the build-up of reference stacks can be used to generate classification models helpful in assigning similar classes to the various populations present in new stacks. For most of the available experimental FACS data, however, the resulting data space of an experiment is only sparsely populated. The reasons range from the observed populations likely not spreading out over all available combinations of measured attributes to an incomplete representative sampling of the population distribution itself. By applying a sparse grid-based approach to classify the multi-dimensional FACS space derived from patients' blood samples, we present a means to automatically assign cell populations based on previously defined reference populations. In this case the approach aids in the identification of potential circulating tumor cells (CTCs), minimizing the prior use of expert knowledge while still optimizing the sensitivity and specificity of the classification method.


9.) Martin Falk; VISUS, University Stuttgart; Germany; [email protected] - Modeling, Simulating, and Analyzing Cellular Processes

Martin Falk

In systems biology, models are typically developed in an iterative, cyclic fashion. We developed "CellVis" to support this cyclic process of model development for cellular models. We employ a three-dimensional, stochastic simulation where signaling molecules are represented by individual particles to allow for the modeling of spatial effects, like local differences in molecular concentrations. In addition, the cellular architecture is considered with respect to cytoskeletal filaments, both microtubules and microfilaments. Intracellular processes include interactions between molecules, the transport by diffusion and motor proteins along microtubules, the import of molecules through nuclear pore complexes, and the dynamic modification of the cytoskeleton. To reduce the computing times needed by the simulation, a highly parallel implementation employing recent graphics hardware is used. The analysis step also relies heavily on graphics hardware to allow the interactive visualization of the 3D simulation results. Several visualizations are available to analyze different aspects of the data. The microscopic visualization generates images similar to pictures obtained by microscopy, whereas a glyph-based visualization renders the data in a geometric representation in the same way as it is used during the simulation. The atomistic visualization adds details to the modeled proteins down to the atomic scale for a deeper immersion into the data, educational purposes, or artistic renderings. Ongoing work is concerned with a statistical analysis of the simulation results and a comparative visualization of multiple simulation runs.

10.) Dimitra Bon; Goethe-University Frankfurt; Germany; [email protected] Viral dynamic model of antiretroviral therapy for patients infected with HIV-1

Dimitra Bon, Christoph Stephan, Oliver T. Keppler and Eva Herrmann

Combination antiviral therapies consisting of reverse transcriptase inhibitors, protease inhibitors and an integrase inhibitor have been developed to suppress HIV below the limit of detection. We present a mathematical model for the effect of different combination treatment regimens on the dynamics of HIV RNA and CD4 T-cell counts. We focus on modeling the treatment effect of the integrase inhibitor raltegravir. The model consists of a system of ordinary differential equations, and the parameters were chosen or estimated to agree with data from a recent clinical trial. All numerical simulations were performed in Matlab.
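As an illustration of what such an ODE system can look like, the sketch below integrates the widely used basic viral dynamics model (target cells T, infected cells I, free virus V) with a single treatment efficacy that reduces new infections. This is a generic textbook structure and not the authors' model, which additionally distinguishes drug classes such as the integrase inhibitor. All parameter values and initial conditions are hypothetical, and the abstract's own simulations were done in Matlab rather than Python.

```python
def simulate_viral_load(efficacy, days=30.0, dt=0.001):
    """Forward-Euler integration of the basic viral dynamics model.

    dT/dt = lam - d*T - (1 - efficacy)*beta*V*T
    dI/dt = (1 - efficacy)*beta*V*T - delta*I
    dV/dt = p*I - c*V
    All parameter values below are hypothetical illustration values.
    """
    lam, d = 1e4, 0.01      # production and death rate of target cells
    beta = 8e-7             # infection rate constant
    delta = 0.7             # death rate of infected cells
    p, c = 100.0, 13.0      # virion production and clearance rates
    T, I, V = 1e6, 1e4, 1e5  # hypothetical initial state
    for _ in range(int(round(days / dt))):
        infection = (1.0 - efficacy) * beta * V * T
        dT = lam - d * T - infection
        dI = infection - delta * I
        dV = p * I - c * V
        T, I, V = T + dt * dT, I + dt * dI, V + dt * dV
    return V


untreated = simulate_viral_load(efficacy=0.0)
treated = simulate_viral_load(efficacy=0.95)
print(f"viral load after 30 days: untreated {untreated:.3g}, treated {treated:.3g}")
```

With these toy values the basic reproduction number drops below one under treatment, so the viral load decays, whereas it persists without treatment. A production analysis would use an adaptive ODE solver instead of a fixed-step Euler scheme.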

11.) Eva-Maria Geissen; IST, University Stuttgart; Germany; [email protected] the Dynamics of the Spindle Assembly Checkpoint in Schizosaccharomyces Pombe

Eva-Maria Geissen, Stephanie Heinrich, Silke Hauf, Nicole Radde

The spindle assembly checkpoint (SAC) is a crucial surveillance mechanism within the eukaryotic cell cycle that is needed to ensure proper chromosome segregation [Musacchio and Salmon, Nature Reviews 2007]. The SAC robustly responds to a single unattached kinetochore. At the same time the switch-off dynamics are very fast. Despite a wealth of cell biological and biochemical data, the complex in vivo kinetics of the SAC are still only fragmentarily understood. By combining mathematical modeling with qualitative and quantitative experimental data from fission yeast we aim to obtain insight into this signaling network. To this end, the abundances of the core SAC proteins were determined by quantitative fluorescence deconvolution microscopy and fluorescence correlation spectroscopy. Furthermore, we measured the impact of perturbing SAC protein abundance on SAC functionality. Our mathematical model is based on well-established molecular features and uses chemical reaction kinetics formulated via ordinary differential equations. Model parameters and their uncertainties were estimated from the qualitative and quantitative experimental data, applying a statistical Bayesian approach. The posterior distribution was investigated via Markov chain Monte Carlo (MCMC) sampling. The obtained parameters were used for model simulation and prediction of experimental outcomes. By this means we demonstrated that published reaction rate constants gathered in vitro cannot explain efficient SAC signaling regardless of other model parameters, which is in accordance with [1]. Including these rate constants in the sampling process, we obtained a model that recapitulates several features of the signaling mechanism. In particular, the switch-off dynamics resemble in vivo observations well. We furthermore observed discrepancies between experimental data and model predictions for some experiments, motivating a refinement of the model. 
We were able to formulate several mechanistic hypotheses for these discrepancies, which will guide the subsequent experiment and modeling process. Our systems biological approach for modeling SAC signaling dynamics is an example for a fruitful interplay between experiment, modeling and model analysis. Possible mechanisms for checkpoint signaling can now be computationally tested against a rigorous framework of quantitative experiments. By this means valuable insights into the molecular mechanisms of SAC signaling can be obtained.

References:
[1] M. Simonetta, R. Manzoni, R. Mosca, M. Mapelli, L. Massimiliano, M. Vink, B. Novak, A. Musacchio, and A. Ciliberto, The influence of catalysis on mad2 activation dynamics. PLoS Biol, vol. 7, no. 1, p. e10, Jan 2009.


12.) Alexander Groß; Ulm University; Germany; [email protected] cross-differentiation across subpopulations of hematopoietic stem cells in response to irradiation

Alexander Groß, Jianwei Wang, Lenhard Rudolph, Hans A. Kestler

Hematopoietic stem cells (HSC) are divided into common lymphoid (lineage of T-, B- and NK-cells) or common myeloid (lineage of macrophages, erythrocytes, dendritic cells and others) progenitors. An external stimulus affecting HSCs in a lineage-dependent manner may possibly lead to cross-differentiation of HSC lineages in order to jointly maintain the function of HSCs. In this study, we investigate the behavior of common myeloid progenitor HSCs and common lymphoid progenitor HSCs after irradiation. Although this ultimately leads to their depletion, the experiments show remarkable differences of cell numbers for both groups.

We discriminate cell populations by CD150 cell surface markers. CD150hi cells represent the lineage of myeloid progenitors, while CD150lo cells stand for lymphoid progenitors. Their interactions are described in a computational model based on delay-differential equations. Several cellular processes involving these populations are specified. CD150hi HSCs can differentiate into CD150lo HSCs and the latter into further lymphoid progenitor cell lineages. Proliferation and apoptosis happen in all HSC subpopulations and are based on measurements of further markers. We translated the specified populations and cellular processes into a model of coupled delay-differential equations. After inferring the corresponding kinetic parameters, we simulate the dynamics of the subpopulations and compare them to the experimental observations. Enabling differentiation from CD150hi to CD150lo and from CD150lo to further lineages yields a significantly lower error between the experimentally observed cell numbers and results from computational model simulations. The experiments demonstrate that different HSC populations show distinct responses to external stimuli like irradiation. The results from our computational model indicate that in response to irradiation myeloid progenitor HSCs differentiate into lymphoid progenitor HSCs.

13.) Bernd Klaus; University of Leipzig; Germany; [email protected] Secretory Granules in Pancreatic Beta-Cells: in-silico modeling of their statistics and dynamics

Jaber Dehghany, Michael Meyer-Hermann

Insulin is the body's main glucose-lowering hormone, which is stored in dense-core secretory granules in pancreatic beta-cells. Glucose-induced insulin secretion follows a two-phase time course: a rapid and transient first phase and a weak but sustained second phase. Loss of the first phase of insulin secretion results in Type 2 Diabetes, a metabolic disorder which is rapidly increasing worldwide. Therefore it is important to understand the cellular mechanism underlying biphasic insulin secretion. The total number, size distribution and spatial distribution of granules in a typical beta-cell are important in the proposed models for stimulated insulin secretion from beta-cells. We have developed an in-silico model based on experimental results to find the true size distribution, 3D density profile and total number of granules in a typical beta-cell. Our findings imply that rat beta-cells contain ∼50% fewer granules than previously assumed. Also, each granule contains about twofold more insulin, while its exocytosis increases membrane capacitance about twofold less than assumed previously. We then built an agent-based model for the granule dynamics inside the cell and the factors playing a role in the two phases of insulin release. Our analysis shows that the second phase of insulin secretion, unlike the first phase, is controlled by priming time as a rate-limiting factor. The increasing secretion rate can be explained by facilitated priming (and hence exocytosis) of granules, instead of enhanced granule dynamics induced by metabolized glucose.

14.) Bernd Klaus; University of Leipzig; Germany; [email protected] Empirical Bayes Estimation of False Discovery Rates

Bernd Klaus

False discovery rate (FDR) analysis is a major recent statistical innovation that has found widespread application in the study of high-dimensional genomic data (gene expression, RNA-Seq etc.). Its ultimate goal is the separation of signal (e.g. differentially expressed genes) from noise (e.g. inactive genes). Since the two very often overlap, decision making is difficult. However, once a mixture model, composed of a "null" component for the noise and an "alternative" component that represents the signal, has been fit to the data, false discovery rates allow intuitive and simple signal identification. The fitting process is greatly facilitated by the very large number of test statistics encountered in today's multiple testing problems. Here, the high dimension is actually not a curse but a blessing, since it allows the FDR to be estimated from the data. Truncated maximum likelihood estimation has been shown to be a powerful approach for estimating the null component, yielding reliable null model parameter estimates. I propose a new approach to truncated maximum likelihood estimation based on an automated smoothing method. Furthermore, this new null model estimation technique is complemented by constrained maximum likelihood estimation for the alternative density using log-concave density estimation. Log-concave density estimation provides a non-parametric, tuning-parameter-free, and yet very smooth estimate.
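To illustrate how a fitted two-component mixture turns into a decision rule, the following minimal sketch computes the local false discovery rate, i.e. the posterior probability that a test statistic belongs to the null component. It assumes the mixture is already known: the null proportion `PI0` and the two normal densities are hypothetical values chosen for illustration, not the truncated maximum likelihood or log-concave estimates described in the abstract.

```python
import math


def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of a normal distribution evaluated at z."""
    u = (z - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(z, pi0, null_pdf, alt_pdf):
    """Posterior probability that statistic z comes from the null component.

    Mixture density: f(z) = pi0 * f0(z) + (1 - pi0) * f1(z)
    Local fdr:       fdr(z) = pi0 * f0(z) / f(z)
    """
    null_part = pi0 * null_pdf(z)
    alt_part = (1.0 - pi0) * alt_pdf(z)
    return null_part / (null_part + alt_part)


# Hypothetical mixture (assumption): 90% null N(0, 1), 10% alternative N(3, 1).
PI0 = 0.9


def null_density(z):
    return normal_pdf(z, 0.0, 1.0)


def alt_density(z):
    return normal_pdf(z, 3.0, 1.0)


for z in (0.0, 2.0, 4.0):
    print(f"z = {z:.1f}  local fdr = {local_fdr(z, PI0, null_density, alt_density):.3f}")
```

Statistics with a small local fdr would be called signal. In practice both `PI0` and the component densities have to be estimated from the data, which is the step the abstract addresses.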


15.) Stephan Lorenzen, University Medical Center Göttingen; Germany; [email protected]: Detrending and fit of time resolved luminometric data

Stephan Lorenzen

For the study of biological rhythms, the expression of GFP fusion proteins is an often used tool. In such a setting, the expression of genes of interest can be monitored in real time; the use of plate readers even allows high-throughput studies. However, besides showing physiological changes in the expression of genes, the data are dominated by major fluctuations due to, e.g., cell proliferation over time, which are superimposed on the data as a trend. In the analysis of circadian data, this trend is usually eliminated by division by a sliding 24-hour average. However, this procedure omits 24 hours of data from the analysis and thus artificially shortens the time series. Other methods of detrending include fitting a polynomial through the data and dividing the data by this "trend" to obtain detrended data for analysis. One drawback of this method is the tendency of polynomials to "break out" (take very large or very small values) at the boundaries of the fitted interval. We here present a new detrending and analysis algorithm which uses sums of cosine functions, similar to Fourier series, to detrend and fit the data while maintaining its "circadian" information. The quality of data analysis is thus improved compared to other detrending algorithms, as shown by a lower fraction of misclassifications on artificial data.
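The abstract does not specify the algorithm, so the following is only a sketch of the general idea of detrending with a sum of cosines. It assumes evenly spaced samples, expands the series in a discrete cosine basis, treats the lowest-frequency terms as the trend, and subtracts them (the abstract's comparison methods divide by the trend instead). The function names, the cutoff `max_trend_k` and the artificial test series are all hypothetical.

```python
import math


def cosine_coefficients(y):
    """Coefficients a_k of y[i] = sum_k a_k * cos(pi * k * (i + 0.5) / n)."""
    n = len(y)
    coeffs = []
    for k in range(n):
        s = sum(y[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
        # The constant term is normalised by n, all other terms by n / 2.
        coeffs.append(s / n if k == 0 else 2.0 * s / n)
    return coeffs


def cosine_sum(coeffs, ks, n):
    """Evaluate the cosine sum restricted to the basis indices in ks."""
    ks = list(ks)
    return [
        sum(coeffs[k] * math.cos(math.pi * k * (i + 0.5) / n) for k in ks)
        for i in range(n)
    ]


def detrend(y, max_trend_k):
    """Split y into a slow trend (indices 0..max_trend_k) and the remainder."""
    n = len(y)
    coeffs = cosine_coefficients(y)
    trend = cosine_sum(coeffs, range(max_trend_k + 1), n)
    remainder = [yi - ti for yi, ti in zip(y, trend)]
    return trend, remainder


# Artificial data (assumption): 96 hourly samples, slow drift plus a 24 h rhythm.
N = 96
signal = [
    5.0
    + 2.0 * math.cos(math.pi * (i + 0.5) / N)           # slow trend
    + math.cos(2.0 * math.pi * (i + 0.5) / 24.0)        # 24 h oscillation
    for i in range(N)
]
trend, rhythm = detrend(signal, max_trend_k=3)
print(f"first trend value: {trend[0]:.3f}, first rhythm value: {rhythm[0]:.3f}")
```

Because the basis is defined on the whole sampling window, no samples are dropped at the ends of the series, in contrast to a sliding 24-hour average.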

16.) Marit Ackermann; Biotec, TU Dresden; Germany; [email protected] Impact of natural genetic variation on gene expression dynamics

Marit Ackermann, Weronika Sikora-Wohlfeld, Andreas Beyer

DNA sequence variation causes changes in gene expression which in turn have profound effects on cellular states. These variations affect tissue development and may ultimately lead to pathological phenotypes. A genetic locus containing a sequence variation that affects gene expression is called an "expression quantitative trait locus" (eQTL). Whereas the impact of cellular context on expression levels in general is well established, much less is known about the cell-state specificity of eQTL. Previous studies differed with respect to how "dynamic eQTL" were defined. Here, we propose a unified framework distinguishing static, conditional and dynamic eQTL and suggest strategies for mapping these eQTL classes. By using murine mRNA expression data from four stages of hematopoiesis and 14 related cellular traits, we demonstrate that static, conditional and dynamic eQTL, although derived from the same expression data, represent functionally distinct types of eQTL. While static eQTL affect generic cellular processes, non-static eQTL are more often involved in hematopoiesis and immune response. Our analysis revealed substantial effects of individual genetic variation on cell type-specific expression regulation. Among a total of 3,941 eQTL, we detected 2,729 static eQTL, 1,187 eQTL that were conditionally active in one or several cell types, and 70 eQTL that affected expression changes during cell type transitions. We also found evidence for feedback control mechanisms reverting the effect of an eQTL specifically in certain cell types. Loci correlated with hematological traits were enriched for conditional eQTL, thus demonstrating the importance of conditional eQTL for understanding molecular mechanisms underlying physiological trait variation. The classification proposed here has the potential to streamline and unify future analysis of conditional and dynamic eQTL.

17.) Jannik Vollmer; University Stuttgart; Germany; [email protected] Autocrine/paracrine FGF signaling has been implicated in non-small-cell lung cancer (NSCLC) development

Jannik Vollmer

Autocrine/paracrine FGF signaling has been implicated in non-small-cell lung cancer (NSCLC) development and progression. In multiple NSCLC cell lines, a switch in lung epithelial cell expression from FGFR2-IIIb to FGFR1-IIIc and FGFR2-IIIc results in an autocrine response to FGF2 ligand, which is also expressed by these cell lines. FGF2 and FGFR1 expression have also been detected in patient samples and have been shown to correlate with short survival times. Stimulation with FGF2 revealed a bell-shaped dose-response in NCI-H1703 cells. The FGF signaling complex involves a ternary interaction of FGF ligand, receptor, and heparan sulfate glycosaminoglycans (HSGAGs). HSGAGs bind directly to both ligand and receptor, and can serve both to stabilize the receptor-ligand complex and to sequester autocrine/paracrine ligands, thus increasing their local concentration. A systems approach combining experiments and modeling was used to understand the complex interplay of these components and their implications for cell signaling and behavior. The mathematical model explains the mechanism of the experimentally observed biphasic signaling response as well as several perturbation experiments with extracellular heparin addition and inhibitors of intracellular signaling molecules.

18.) Andreas Leha; University Medical Center Göttingen; Germany; [email protected] "Cancerous Appetite" - Metabolic Remodeling in Glioblastoma Multiforme

Ashwini Kumar Sharma, Bernhard Radlwimmer, Roland Eils, Rainer König

The complex molecular circuitry of cancer survival and progression poses a daunting challenge for understanding the underlying mechanisms and their associated pathways. Deregulation of signaling pathways results in metabolic reprogramming, an “emerging hallmark of cancer” which is crucial for cancer survival. Analysis of three different gene expression data sets (TCGA, GSE16011 and unpublished in-house data) for Glioblastoma Multiforme (GBM) revealed important alterations in glutaminolysis and in amino acid and fatty acid metabolism. In the pathway for leucine degradation, a possible case of “enzymatic function takeover” was observed, wherein a functional counterpart could replace a down-regulated enzyme. IDH1 mutation status is an important classifier that categorizes GBM into Primary and Secondary grades, with the latter having a better prognosis. The GBM gene expression data were classified based on IDH1 mutations. Metabolic profiles for these two grades revealed subtle differences in glutaminolysis which could lead to their respective phenotypes. Flux balance analysis was performed to simulate the changes in cellular energetics of GBM. The gene expression data were integrated with the generic human metabolic model (Human Recon 1) to obtain a GBM-specific metabolic flux profile. The deregulated sub-networks in GBM were analyzed for metabolic flux distributions along with altered flux directionality in associated metabolic networks.
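For readers unfamiliar with flux balance analysis, its standard textbook form is a linear program over the steady-state flux vector. The notation below is the conventional one and is an assumption here, since the abstract does not state the objective, the bounds, or how the expression data constrain the model.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min}_{j} \le v_{j} \le v^{\max}_{j} \quad \text{for every reaction } j,
\end{aligned}
```

Here $S$ is the stoichiometric matrix of the metabolic network (metabolites by reactions), $v$ is the vector of reaction fluxes, $c$ selects the flux or combination of fluxes to be maximized, and $v^{\min}$, $v^{\max}$ are lower and upper flux bounds. A lower bound of zero makes a reaction irreversible, so a change in flux directionality shows up as a sign change of $v_j$ in a reversible reaction.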

19.) Andreas Leha; University Medical Center Göttingen; Germany; [email protected] Ordinal Therapy Response with High-Dimensional Expression Data

Andreas Leha, Klaus Jung, Tim Beißbarth

Molecular diagnosis or prediction of clinical treatment outcome based on high-throughput genomics data is a modern application of machine learning techniques for clinical problems. In practice, clinical parameters, such as patient health status or toxic reaction, are often measured on an ordinal scale (e.g. good, fair, poor). Commonly, the prediction of ordinal end-points is treated as a multi-class classification problem, disregarding the ordering information contained in the response. This may result in a loss of prediction accuracy. Classical approaches to model ordinal response directly, including for instance the cumulative logit model, are typically not applicable to high-dimensional data. Although there have been some extensions of existing methods for response prediction tailored towards ordinal response and high-dimensional data (e.g. [AW12]), the choice of methodology is still limited and the field is still lacking a comparative study.

We present a comparison of several approaches for ordinal classification on real-world data as well as simulated data, including the novel algorithm hierarchical twoing (hi2), which extends [FH01] and combines the power of well-understood binary classification with ordinal response prediction. Our findings suggest that the classification performance of an algorithm is dominated by its ability to deal with the high dimensionality of the data. Although the comparative evaluation does not show a clear winner, taking the ordinality of the response into account can improve the classification accuracy.

References: [AW12] K.J. Archer and A.A.A. Williams. L1 penalized continuation ratio models for ordinal response prediction using high-dimensional datasets. Statistics in Medicine, 2012. [FH01] Eibe Frank and Mark Hall. A simple approach to ordinal classification. In Proc. 12th European Conference on Machine Learning, pages 145-156. Springer, 2001.
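The binary decomposition of [FH01], which hi2 is said to extend, can be sketched as follows: an ordinal response with K ordered classes is turned into K-1 binary problems "is the response greater than class k?", and the binary probability estimates are then recombined into class probabilities. The function names and the integer coding of the classes (0 = lowest, K-1 = highest) are illustrative assumptions; hi2 itself is not described in the abstract and is not implemented here.

```python
def binary_targets(labels, n_classes):
    """For each threshold k = 0..K-2, mark whether each label is greater than k."""
    return [[1 if y > k else 0 for y in labels] for k in range(n_classes - 1)]


def class_probabilities(p_greater):
    """Combine estimates of P(Y > k), k = 0..K-2, into P(Y = k), k = 0..K-1.

    P(Y = 0)   = 1 - P(Y > 0)
    P(Y = k)   = P(Y > k-1) - P(Y > k)   for 0 < k < K-1
    P(Y = K-1) = P(Y > K-2)
    """
    probs = [1.0 - p_greater[0]]
    for k in range(1, len(p_greater)):
        probs.append(p_greater[k - 1] - p_greater[k])
    probs.append(p_greater[-1])
    return probs


# Hypothetical coding (assumption): 0 = poor, 1 = fair, 2 = good.
labels = [0, 1, 2, 2, 1]
print(binary_targets(labels, n_classes=3))

# Hypothetical outputs of the two binary classifiers for one new patient.
probs = class_probabilities([0.9, 0.4])
predicted = max(range(len(probs)), key=lambda k: probs[k])
print(probs, predicted)
```

Any binary classifier that copes with high-dimensional data can be plugged in for the K-1 subproblems. If the binary estimates are not monotone in k, some differences become negative, so a practical implementation may need to handle that case.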

20.) Klaus Jung; University Medical Center Göttingen; Germany; [email protected] Feature Selection with Parallel High-Dimensional Expression Data of microRNAs and mRNAs

Klaus Jung, Stephan Artmann, Annalen Bleckmann and Tim Beißbarth

While in the past, high-dimensional expression data of different molecular levels (e.g. genome, proteome, epigenome) were often studied separately, these data are now often available in parallel, i.e. they are observed on the same patient samples. This opens the opportunity to study interactions between the different molecular levels and to improve standard analyses with high-dimensional data such as feature selection or the training of a classifier. A frequent case of parallel high-dimensional expression data is the joint availability of microRNA and mRNA data [1]. We present new methods for selecting differentially expressed microRNAs in a two-group setting by borrowing additional information from their related mRNA target gene sets. In detail, microRNAs are first selected by individual test results. Next, the group effect of the related mRNA target sets is assessed by means of global test procedures or gene set enrichment methods. The p-values of the individual miRNA tests are combined with the p-values from the gene set specific tests by means of meta-analytic approaches. The connection between an individual microRNA and its target gene set was obtained from the 'TargetScan' database [2]. In a simulation study and on several real-world examples we assess the performance of our approaches. In particular, we compare the false-discovery rate and the power rate of individual miRNA selection with those of combined testing. In summary, we found that connecting parallel expression data of the microRNA and the mRNA level can improve power rates while still maintaining a pre-specified false-discovery rate.

References: [1] Artmann S, Jung K, Bleckmann A and Beißbarth T (2012): Detection of simultaneous Group Effects in microRNA Expression and related Target Gene Sets. PLoS ONE, 7, e38365. [2] Grimson A, Farh KK, Johnston WK, Garrett-Engele P, Lim LP, et al. (2007): MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell 27: 91-105.
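The abstract does not name the meta-analytic approach used to combine the two p-values. As one possible example, the sketch below uses Fisher's method, which is an assumption made here for illustration. It relies on the closed form of the chi-square tail probability for an even number of degrees of freedom, so no statistics library is needed; the example p-values are hypothetical.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(log p_i) follows a chi-square distribution with
    2k degrees of freedom under the null, whose upper tail has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical p-values (assumption): one from the individual miRNA test,
# one from the test of its mRNA target gene set.
p_mirna = 0.05
p_target_set = 0.05
print(f"combined p-value: {fisher_combine([p_mirna, p_target_set]):.4f}")
```

Fisher's method assumes that the combined p-values are independent. Whether that holds for a microRNA and its target gene set measured on the same samples would have to be checked, and the combined p-values would still need a multiple-testing correction to control the false-discovery rate.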

Posters should be up throughout the whole workshop at the given Poster-number.



Venue

The workshop takes place in lecture hall 7.01 of the University of Stuttgart, in the building opposite the S-Bahn station, Pfaffenwaldring 7, 70569 Stuttgart.

Stuttgart is easily reachable by high-speed train connections from most parts of Germany. The workshop location can be reached from the main train station by S-Bahn S1 (direction Herrenberg), S2 (direction Filderstadt via Flughafen/Messe) or S3 (direction Flughafen/Messe; all 10 minutes travel time, 2.60 Euro). The closest airport is Stuttgart Airport, which also has a direct connection to the University via S-Bahn S2 (direction Schorndorf) and S3 (direction Backnang; both 17 minutes, 2.60 Euro).

Participants who are interested will meet for dinner at the Restaurant Amadeus (Charlottenplatz 17, 70173 Stuttgart) on Thursday, October 11th at 8pm. The restaurant is in the city center near “Altes Schloss” (Old Castle) and can be reached within a 10-minute walk from the main train station or the S-Bahn station “Stadtmitte”. Just follow the information signs to “Schlossplatz”. If you stand between “Neues Schloss” and “Altes Schloss” facing “Altes Schloss”, then “Charlottenplatz” is to the left of “Altes Schloss”, directly behind “Karlsplatz”. If you plan to attend, please sign up in the get-together list before the end of the first poster session on October 11th.

Hint: The workshop takes place at the main University Campus, which is not in the city center but in Stuttgart-Vaihingen, and is easily reachable from the city center by S-Bahn (see above). There are only a few options for accommodation directly near the campus (the Tagungshotel Commundo, which is mentioned on the workshop homepage, is one of them), and we recommend searching for accommodation between the University and the city center that is close to one of the S-Bahn stations “Hauptbahnhof”, “Stadtmitte”, “Feuersee” or “Schwabstraße”.

[Maps: Workshop Venue at University Campus; Restaurant Amadeus from Main Train Station]