Hosted and sponsored by the Institute for Molecular Bioscience
2019 WINTER SCHOOL IN MATHEMATICAL & COMPUTATIONAL BIOLOGY
1‐5 July 2019
The Auditorium, Queensland Bioscience Precinct, The University of Queensland
Brisbane, Australia
PROGRAM
2019 Winter School in Mathematical and Computational Biology 1‐5 July 2019
http://bioinformatics.org.au/winterschool
Queensland Bioscience Precinct (Building #80) The University of Queensland
Brisbane, Australia
MONDAY 1 JULY 2019
08:00 Registration desk open
NEXT GENERATION SEQUENCING & BIOINFORMATICS
09:00 – 09:05 Welcome and introduction
Dr Nicholas Hamilton Research Computing Centre and Institute for Molecular Bioscience The University of Queensland
09:05 – 09:40 Next‐generation sequencing – technology overview Dr Christopher Noune Australian Genome Research Facility Ltd (AGRF), Brisbane
09:40 – 10:15 NGS mapping, errors and quality control Dr Felicity Newell QIMR Berghofer Medical Research Institute, Brisbane
10:15 – 10:45 Mutation detection in whole‐genome sequencing Dr Ann‐Marie Patch QIMR Berghofer Medical Research Institute, Brisbane
10:45 – 11:15
Morning Tea
11:15 – 11:50 De novo genome assembly A/Professor Torsten Seemann The University of Melbourne
11:50 – 12:25 Genomics of non‐model organisms Dr Ira Cooke James Cook University, Townsville
12:25 – 13:00 Statistical ‘omics integration Dr Kim‐Anh Lê Cao The University of Melbourne
13:00 – 13:45
Lunch
13:45 – 14:20 Adaptation and conservation insights from the koala genome include a diversity of diversifying selection on cytochrome P450 monooxygenase sequences Dr Catherine Grueber The University of Sydney
14:20 – 14:55 Comparative genomics of pregnancy Dr Camilla Whittington The University of Sydney
14:55 – 15:30 Whole‐genome sequencing and cancer genomics Dr Katia Nones QIMR Berghofer Medical Research Institute, Brisbane
15:30 – 16:00
Afternoon Tea
16:00 – 16:25 Bioinformatics in industrial research Dr Benjamin Goudey IBM Research‐Australia, Melbourne
16:25 – 16:50 Bioinformatics for a direct‐to‐the‐public microbiome product Dr David Wood Microba, Brisbane
16:50 – 17:15 Defensive NGS informatics – what can go wrong and how do you know when to throw in the towel? Mr John Pearson QIMR Berghofer Medical Research Institute, Brisbane
17:45
Social BBQ Venue: Auditorium foyer (If weather permits, BBQ will be held on the rooftop of the Queensland Bioscience Precinct)
TUESDAY 2 JULY 2019
NEXT GENERATION SEQUENCING & BIOINFORMATICS
09:00 – 09:45
An Introduction to RNA‐seq A/Prof. Nicole Cloonan The University of Auckland, New Zealand
09:45 – 10:30
Bioinformatics analysis of single‐cell RNA‐seq data Dr Joshua W.K. Ho The University of Hong Kong
10:30 – 11:00
Morning Tea
LONG READ BIOINFORMATICS AND APPLICATIONS
11:00 – 11:45
Fast and accurate long‐read assembly with wtdbg2 Dr Jue Ruan Chinese Academy of Agricultural Sciences, China
11:45 – 12:30
RNAseq 2.0: a single molecule revolution Dr Martin Smith Garvan Institute of Medical Research, Sydney
12:30 – 13:30
Lunch
13:30 – 14:15
Global effort to sequence Australia’s iconic animal – lessons from the koala genome Dr Zhiliang Chen Illumina, Sydney
14:15 – 15:00
Nanopore and I Dr Han Ming Gan Deakin University, Melbourne
15:00 – 15:30
Afternoon Tea
15:30 – 16:15
Completing bacterial genomes using long sequencing reads: working towards the perfect genome assembly Mr Ryan Wick Monash University, Melbourne
16:15 – 17:00
Applications of long‐read sequencing in microbial genomics Ms Leah Roberts The University of Queensland
WEDNESDAY 3 JULY 2019
LONG READ BIOINFORMATICS AND APPLICATIONS
09:00 – 09:45
Genotyping tandem repeats with high throughput sequencing Dr Devika Ganesamoorthy The University of Queensland
09:45 – 10:30
The application of long read sequencing technologies to animal agriculture Dr Elizabeth Ross QAAFI, The University of Queensland
10:30 – 11:00
Morning Tea
11:00 – 12:00 The dawn of cloud‐native bioinformatics Dr Denis Bauer CSIRO, Sydney
12:00 – 12:45 Panel discussion Chair: A/Prof. Nicole Cloonan The University of Auckland, New Zealand
*** FREE WEDNESDAY AFTERNOON ***
SPECIAL ACTIVITIES IN THE AFTERNOON
12:45 – 13:15
IMB tour – Limited to 50 attendees only Meeting point: Auditorium foyer
14:00 – 17:00 Special Wednesday Afternoon Workshop
An introduction to Galaxy with the NeCTAR Genomics Virtual Laboratory Dr Igor Makunin Research Computing Centre, The University of Queensland
Venue: Multi Media Room (Room 3.141, access through the auditorium foyer)
(This workshop is limited to 36 attendees and is intended for bench scientists; no previous informatics experience is needed.)
What is required before attending the workshop? Remember to download the Galaxy Workshop Information Sheet from the 2018 Winter School web site: http://bioinformatics.org.au/winterschool/program/
THURSDAY 4 JULY 2019
DATA SCIENCE AND MACHINE LEARNING FOR BIOINFORMATICS
09:00 – 09:45
Machine learning for bioinformatics A/Prof. Nicola Armstrong Murdoch University, Perth
09:45 – 10:30
Integrating cancer research with machine learning – achievements and challenges Dr Maren Westermann Max Kelsen, Brisbane
10:30 – 11:00
Morning Tea
11:00 – 11:45
Feature selection for biological data: educated guesses or blind computation Dr James Doecke CSIRO, Brisbane
11:45 – 12:30
Statistical compression of protein folding patterns and inference of recurrent substructural themes Dr Arun Konagurthu Monash University, Melbourne
12:30 – 13:30
Lunch
13:30 – 14:15
Will it cut? Predicting the efficiency of CRISPR‐based gene editing Dr Dimitri Perrin Queensland University of Technology
14:15 – 15:00
Introduction of machine learning for health data analytics A/Prof. Hanna Suominen Australian National University, Canberra
15:00 – 15:30
Afternoon Tea
15:30 – 16:00
Data science and machine learning for bioinformatics Dr Marina Naval Sanchez CSIRO, Brisbane
16:00 – 16:30
Extracting knowledge from models trained on biological data to measure performance and improve understanding of the system Ms Alexandra Essebier The University of Queensland
16:30 – 17:00
Random forest and its application to genome‐wide association studies Dr Arash Bayat CSIRO, Sydney
17:00 – 17:15
Resource talk: How COMBINE supports students in bioinformatics, computational biology and related fields Mr Rhys White The University of Queensland
FRIDAY 5 JULY 2019
GETTING STARTED WITH BIOINFORMATIC SOFTWARE
09:00 – 09:45
Galaxy Australia: advanced bioinformatics within a biologist‐friendly interface Dr Igor Makunin Research Computing Centre, The University of Queensland
09:45 – 10:30
VariantSpark: A cloud‐based machine learning approach for big genomic data Dr Arash Bayat CSIRO, Sydney
10:30 – 11:00
Morning Tea
11:00 – 11:30
Deploying and utilising virtual servers in the NeCTAR Cloud Mr Thom Cuddihy QFAB Bioinformatics and UQ Research Computing Centre, The University of Queensland
11:30 – 11:50
Hacky Hours – communities to learn and build data science Ms Amanda Miotto Griffith University & QCIF
12:00 – 13:00
IMB Friday Noon Seminar in conjunction with the Winter School
How can long reads help us diagnose and track microbial pathogens? Prof. Nicholas Loman, University of Birmingham, UK
13:00 – 14:00
Lunch
14:00 – 14:45
Generating dynamic reports using R Markdown in RStudio Dr Momeneh (Sepideh) Foroutan The University of Melbourne
14:45 – 15:30
An introduction to network analysis methods Dr Melissa Davis Walter and Eliza Hall Institute of Medical Research, Melbourne
15:30 – 16:15
What the heck is a Hackathon? How Hackathons promote creativity and community in bioinformatics A/Prof. Jessica Mar, The University of Queensland
16:15 – 16:20
Travel awards and close
16:30
Refreshments with SIMBA (Student IMB Association)
~*~*~*~*~
BIOGRAPHY AND ABSTRACT
Dr Christopher Noune Laboratory Supervisor – Microbial Profiling and Custom Amplicon NGS Australian Genome Research Facility Ltd (AGRF) Brisbane
Biography: Dr Christopher Noune is the Microbial Profiling and Custom Amplicon Next Generation Sequencing (NGS) supervisor at the Australian Genome Research Facility (AGRF) based in Melbourne. In this position, Chris manages the metagenomics portfolio offered by the AGRF, such as 16S and ITS amplicon‐based profiling, including client collaboration on custom amplicon NGS approaches that can be used to analyse a wide range of metabarcoding studies. Prior to joining the AGRF in January 2018, Chris completed his PhD at the Queensland University of Technology, where he studied the Dynamics, Diversity and Evolution of Baculoviruses by applying various NGS and bioinformatic techniques.
Date: Monday 1 July 2019
Presentation title: Next‐generation sequencing – technology overview
Abstract: The “Next‐Generation Sequencing” landscape is one of constant change, with new and emerging technologies competing with established methods and platforms. Chris will talk about the sequencing technologies that currently dominate the landscape, as well as those that are poised to take over from them. In doing so, he will explain how each of the sequencing technologies works, give examples of projects suited to each type of platform, and take a look at what’s “next” in Next‐Gen.
Dr Felicity Newell Senior Research Officer QIMR Berghofer Medical Research Institute Brisbane
Biography: Felicity received her PhD from The University of Queensland (UQ) in 2007, and completed a Master of Information Technology at the Queensland University of Technology (QUT) in 2009. She has worked as a bioinformatics programmer, developing biological web applications at QFAB Bioinformatics and software for the analysis of cancer sequencing data at the Queensland Centre for Medical Genomics at UQ. She has also conducted postdoctoral research at UQ and QUT, using bioinformatics approaches to study cancer and autoimmune diseases. She is currently a Senior Research Officer in the Medical Genomics group at QIMR Berghofer Medical Research Institute, where her research uses next generation sequencing data to understand the genetics of cancers including oesophageal adenocarcinoma and melanoma.
Date: Monday 1 July 2019
Presentation title: NGS mapping, errors and quality control
Abstract: The analysis of next generation sequencing data often requires alignment (mapping) of the generated reads to a reference genome. Alignment software needs to efficiently identify the location of reads within a reference genome while accounting for sequence differences, including true variation such as single nucleotide polymorphisms as well as differences introduced by sequencing errors. In this presentation, I will give an overview of some of the approaches to sequence alignment. A good understanding of the common errors and biases that can occur during mapping and throughout NGS processing pipelines is necessary to obtain high‐quality results from downstream analyses such as variant detection. I will also discuss the sources of such errors and outline quality control steps that may be performed.
Dr Ann‐Marie Patch Bioinformatician QIMR Berghofer Medical Research Institute Brisbane
Biography: Ann‐Marie leads the Clinical Genomics group at QIMR Berghofer Medical Research Institute. She is a bioinformatician with expertise in the interpretation of multi‐omics data, including whole genome sequencing, transcriptomics and methylomics, in cancer research. Her team works across many cancer projects to develop methods for the interpretation of somatic mutations, copy number and structural variations, and to integrate expression and methylation analyses. Her research interests focus on understanding the intricacies of genomic heterogeneity in cancer and how it affects response to therapy.
Date: Monday 1 July 2019
Presentation title: An introduction to variant detection for whole genome data
Abstract: The technology for generating sequencing data is rapidly developing, as are the variety and number of software tools for sequencing analysis. For most alignment‐based projects, variant detection is a key process that underlies layers of complex higher‐level analysis. It is therefore important to ensure that this process is robust and to have a way of testing how well your process identifies variants. In this talk, I will describe and discuss the principles and challenges of identifying the full range of mutation types from whole genome sequencing data, from single nucleotide variants and indels up to large structural variants (SVs).
A/Prof. Torsten Seemann Lead Bioinformatician Melbourne Bioinformatics & Doherty Applied Microbial Genomics The University of Melbourne
Biography: A/Prof. Torsten Seemann is the lead bioinformatician at Melbourne Bioinformatics (formerly the Victorian Life Sciences Computation Initiative, VLSCI) and at Doherty Applied Microbial Genomics, both at The University of Melbourne. His work uses bioinformatics and genomics to better understand the spread and evolution of bacterial pathogens and antimicrobial resistance. He is best known for his software tools, which are used internationally, and he is a strong supporter of open science.
Date: Monday 1 July 2019
Presentation title: De novo genome assembly
Abstract: How do we generate the genome sequence of our favourite organism? In this talk I will introduce the problem of de novo genome assembly, describe the strategies used to tackle it and their caveats, and outline ways to assess the results. The related problems of transcriptome and metagenome assembly, and how the latest technologies are transforming de novo assembly, will also be touched upon.
Dr Ira Cooke Senior Lecturer in bioinformatics James Cook University Townsville
Biography: Ira Cooke is a senior lecturer in bioinformatics at James Cook University, where he is also Co‐Director of the Centre for Tropical Bioinformatics and Molecular Biology. Originally trained as a physicist, he made the transition to bioinformatics a decade ago and has not looked back. His research uses comparative ‘omic approaches to help understand development, toxicity, immunity and microbial interactions in corals and cephalopods.
Date: Monday 1 July 2019
Presentation title: Genomics of non‐model organisms
Abstract: So you have assembled the genome of your favourite organism. This talk will describe what comes next, including gene and repeat modelling, comparative genomics and population genomics. It will also explore the potential afforded by falling sequencing costs and new technologies to reduce the gap in resource quality between model and non‐model organisms.
Dr Kim‐Anh Lê Cao Senior Lecturer The University of Melbourne
Biography: Dr Kim‐Anh Lê Cao completed her PhD in 2008 at the Université de Toulouse, France. Soon after graduating she moved to Australia and was appointed as a postdoctoral research fellow at the Institute for Molecular Bioscience, The University of Queensland, then as a Research and Consultant Biostatistician at QFAB Bioinformatics from 2009 to 2013. Kim‐Anh’s research veered towards biomedical problems when she moved to the UQ Diamantina Institute in 2014 and was awarded an NHMRC Career Development Fellowship (CDF1). In 2017, she joined The University of Melbourne as a Senior Lecturer in the School of Mathematics and Statistics and in Melbourne Integrative Genomics, which hosts biology‐focused researchers with statistical and computational skills. In 2019 she was awarded an NHMRC CDF2, focusing on microbiome studies, and received the biennial Moran Medal in Statistical Sciences from the Australian Academy of Science. Dr Lê Cao is an expert in multivariate statistical methods and develops novel methods for ‘omics data integration. Since 2009, her team has been developing the R toolkit mixOmics, dedicated to the integrative analysis of ‘omics data to help researchers mine and make sense of biological data (http://www.mixOmics.org). More information about Kim‐Anh’s research group: http://lecao-lab.science.unimelb.edu.au/
Date: Monday 1 July 2019
Presentation title: Statistical ‘omics integration
Abstract: Technological improvements have allowed for the collection of data from different molecular compartments (e.g. gene expression, protein abundance), resulting in multiple omics data sets from the same set of biospecimens or individuals (e.g. transcriptomics, proteomics). We propose to adopt a holistic systems biology approach by statistically integrating data from multiple biological compartments. Such an approach provides improved biological insight compared with traditional single‐omics analyses, as it takes into account interactions between omics layers. In this talk, I will present a dimension reduction multivariate method called DIABLO, which addresses data integration challenges such as the complexity and sheer size of the data sets, each with few samples and many molecules, and the heterogeneous nature of data measured on different scales and technological platforms. DIABLO is a hypothesis‐free method that constructs combinations of variables (e.g. cytokines, transcripts, proteins, metabolites) that are maximally correlated across data types to identify a minimal subset of markers: a multi‐omics signature. This signature can highlight novel findings and is also a starting point for network modelling. DIABLO is not limited to data‐driven analysis; it can also handle pathway‐based analysis, or a mix of knowledge‐ and data‐driven analyses. I will illustrate the use of DIABLO in studies we have analysed for bulk omics, microbiome, and single cells. DIABLO is implemented in our package mixOmics, dedicated to omics data integration.
Dr Catherine Grueber Robinson Fellow School of Life and Environmental Sciences The University of Sydney
Biography: Catherine completed her PhD at the University of Otago (New Zealand), followed by a postdoctoral position at the University of Sydney. She was recently awarded a prestigious Robinson Fellowship in the School of Life and Environmental Sciences at the University of Sydney, which has allowed her to set up a research group in applied evolutionary genetics. Catherine and her team investigate how animal populations respond to natural and “unnatural” conditions, whether bringing threatened species into captivity to prevent extinction or securing a more productive food supply through animal breeding. This research uses evolutionary theory, population genetics, computational modelling and meta‐analysis to learn how to maximise species resilience for the future. Catherine’s research has been supported by the Australian Research Council, the Save the Tasmanian Devil Program, and San Diego Zoo Global.
Date: Monday 1 July 2019
Presentation title: Adaptation and conservation insights from the koala genome include a diversity of diversifying selection on cytochrome P450 monooxygenase sequences
Abstract: The koala (Phascolarctos cinereus), an endemic Australian marsupial, feeds almost entirely on leaves of the genus Eucalyptus, a diet that would be toxic to most mammals. Sequencing the koala genome enables us to better understand the species’ unique adaptations to this diet. This work forms part of a larger investigation by researchers from the Koala Genome Consortium into the evolutionary and conservation lessons we can learn from the koala genome, including adaptation to diet. The presentation begins with an overview of the construction of the koala reference genome. I then present a specific analysis of selection on cytochrome P450 monooxygenase (CYP) gene sequences across a multispecies alignment (N = 154 sequences: 33 from koala, plus sequences from nine other species). We tested for diversifying selection using a mixed‐effects model of episodic diversifying selection, which can reveal episodic selection: codons under positive selection in only part of the tree while under purifying selection elsewhere. We found conserved regions of the alignment showing a strong tendency towards negative (purifying) selection, as would be expected for a functional protein. Nevertheless, many codons showed evidence of episodic selection, including some with significantly greater evidence for selection in koala‐specific lineages than in other species. These effects also varied across gene paralogues. Collectively, these results suggest that koala CYPs evolve under diversifying selection: multiple genes are under different types of selection, and different codons appear to be under selection across genes. Together, these findings from the koala genome have implications for our understanding of the evolution of toxin metabolism in mammals.
Dr Camilla Whittington Senior Lecturer School of Life and Environmental Sciences The University of Sydney
Biography: Camilla completed her PhD at the University of Sydney, followed by postdoctoral positions at the University of Zurich and the University of Sydney. She also spent time as a Fulbright Fellow at Washington University. Camilla is now a Senior Lecturer in the School of Life and Environmental Sciences at the University of Sydney. Her group’s research focuses on the evolution of pregnancy, using a combination of genomic, physiological, and morphological techniques. The research is funded by a University of Sydney Research Fellowship, a L’Oréal‐UNESCO For Women in Science Fellowship, and the Australian Research Council.
Date: Monday 1 July 2019
Presentation title: Comparative genomics of pregnancy
Abstract: Evolutionary innovations such as eyes, wings, and live birth (viviparity) are dramatic, adaptive novelties that have shaped the evolutionary trajectories of animals. However, their origins are poorly understood because they arise from the collective action and evolution of thousands of genes. By applying genomic techniques to a targeted range of animals, my work aims to elucidate the genetic underpinnings of evolutionary innovations and to discover fundamental evolutionary mechanisms. Our current focus is on understanding the fundamental biology and repeated evolution of viviparity in vertebrates. I will discuss our studies of the transition from oviparity (egg laying) to viviparity in reptiles, mammals, and fish, including the male‐pregnant seahorse. Our work suggests that common evolutionary mechanisms underpin the development of novel traits across divergent species.
Dr Katia Nones Senior Research Officer QIMR Berghofer Medical Research Institute Brisbane
Biography: Dr Katia Nones is a Senior Research Officer at QIMR Berghofer Medical Research Institute. She is an expert in the interpretation of next generation sequencing and array data for cancer research. She was a member of the Australian International Cancer Genome Consortium (ICGC) and is a member of the Australian Genomics Health Alliance. Her research has focused on using genomic, epigenomic and expression data in a multidisciplinary setting, with a particular interest in cancer. She uses whole‐genome sequencing to identify novel driver genes and potentially druggable targets. She also uses tumour‐specific mutations to identify mutational signatures associated with processes linked to tumour development. These signatures can also indicate treatment options.
Date: Monday 1 July 2019
Presentation title: Whole‐genome sequencing and cancer genomics
Abstract: In this talk I will give some examples of how we have used whole‐genome sequencing data in our research to identify tumour‐specific mutations. I will also show examples of how the pattern of these mutations can give us information about driver genes in different cancer types and provide clues about potential treatment options.
Dr Benjamin Goudey Research Scientist IBM Research‐Australia Melbourne
Biography: Benjamin Goudey has been a Research Scientist at IBM Research Australia since 2014. During this time, he has been involved in a range of topics, including using supercomputers to analyse billions of pairs of genetic variants, developing scalable techniques for bacterial genomics, and, more recently, building prognostic models for Alzheimer’s disease. The central theme of most of his research is the development of predictive models from messy datasets, typically focused on genomics. Ben received his PhD in Computer Science from the University of Melbourne in 2016 and is an Honorary Research Fellow in the School of Population Health, University of Melbourne.
Date: Monday 1 July 2019
Presentation title: Bioinformatics in industrial research
Abstract: Industrial research is all about conducting research within a company to develop new products, enhance existing ones, or identify new areas the company should pursue. The increasing amount of high‐throughput biological data being generated in healthcare and the life sciences, and the associated growth in companies trying to make use of these data, mean that there is an increasing number of non‐academic research roles available for bioinformaticians and computational biologists. In this presentation, I'll talk about my experience of being a bioinformatician at IBM Research Australia. I'll provide an overview of my role and describe a few of the projects I've been involved in, some related to genomics, others less so. I'll also describe some of the advantages and challenges of working in a large company rather than in academia, and provide some insights that might be useful for those thinking about non‐academic career paths.
Dr David Wood Lead Bioinformatician Microba Brisbane
Biography: Dr David Wood has over 10 years’ experience in bioinformatics in industry and academia. David is interested in the use and advancement of high‐throughput genomics technologies in consumer products and public health. He is an author on over 30 scientific publications.
Date: Monday 1 July 2019
Presentation title: Bioinformatics for a direct‐to‐the‐public microbiome product
Abstract: Microba is an early‐stage company spun out of The University of Queensland, providing Australia's first direct‐to‐consumer metagenomic microbiome test, and one of the first worldwide. In this talk I will discuss building and running Microba's Metagenomics Analysis Platform (MAP), our industry‐level, fast‐turnaround, cloud‐based bioinformatics platform that transforms FASTQ files into a customer‐friendly microbiome report in under 24 hours.
Mr John Pearson Team Leader Genome Informatics QIMR Berghofer Medical Research Institute Brisbane
Biography: John Pearson has spent 25 years as a bioinformatician creating software for medical researchers and has worked at the NIH (National Institutes of Health), The University of Queensland and QIMR Berghofer Medical Research Institute. He was a founding faculty member at the Translational Genomics Research Institute (TGen) in Phoenix, Arizona. John has held software development grants from Microsoft, the American Cancer Society, and the National Institutes of Health, and has participated in the 1000 Genomes Project and the International Cancer Genome Consortium.
Date: Monday 1 July 2019
Presentation title: Defensive NGS informatics – what can go wrong and how do you know when to throw in the towel?
Abstract: Next‐generation sequencing has radically changed medical research by allowing deep interrogation of the DNA and RNA of pathogenic organisms, of families with inherited disorders, and of the de novo mutations responsible for tumourigenesis. As with any new technology, a "gold rush" mentality can arise, where being first to the answer can push rigour and methodological soundness into the background. In this seminar, I'll talk about some of the ways sequencing can go wrong, how the problems became apparent, what we did about them, and the tools we developed to try to catch the same problems in future.
A/Prof. Nicole Cloonan School of Biological Sciences The University of Auckland New Zealand
Biography: Nicole Cloonan is an Associate Professor in Bioinformatics at The University of Auckland. She was previously an ARC Future Fellow at the QIMR Berghofer Medical Research Institute, and an ARC Postdoctoral Fellow at The University of Queensland. Her work is multidisciplinary in nature, involving computational biology and bioinformatics, biochemistry, cell biology, and molecular biology, all of which she uses to understand the complexity of RNA systems. She’s pretty awesome; you should come to New Zealand and do a PhD with her.
Date: Tuesday 2 July 2019
Presentation title: An Introduction to RNA‐seq
Abstract: RNA‐seq publications are nearly ubiquitous across biological disciplines, as, regrettably, is the poor analysis of RNA‐seq data. While RNA‐seq technologies have matured over the last decade, the fundamentals of analysis have not, and many avoidable errors are still published each year. With the rise of black‐box analysis software, it is more important than ever to understand and appreciate the limitations of the tools we use. Although this RNA‐seq group therapy session will be presented with love, care, and support, those in fragile emotional states (such as PhD students at the end of their RNA‐seq‐based study) may still wish to avoid this presentation.
A/Prof. Joshua W.K. Ho School of Biomedical Sciences The University of Hong Kong Hong Kong
Biography: Dr Joshua Ho is an Associate Professor in the School of Biomedical Sciences at the University of Hong Kong (HKU). Dr Ho completed his BSc (Hon 1, Medal) and PhD in Bioinformatics at the University of Sydney, and undertook postdoctoral research at Harvard Medical School. Prior to joining HKU, Dr Ho was the Head of Bioinformatics at the Victor Chang Cardiac Research Institute in Sydney, where he was also an NHMRC Career Development Fellow and a National Heart Foundation Future Leader Fellow. His current research focuses on single‐cell analytics, systems approaches to studying gene regulation, scalable big data bioinformatics methods, and biomedical software quality assurance. Dr Ho has over 76 publications, including first‐ or senior‐author papers in leading journals such as Nature, Genome Biology, Nucleic Acids Research and Science Signaling. His research excellence was recognised by the 2015 NSW Ministerial Award for Rising Star in Cardiovascular Research, the 2015 Australian Epigenetics Alliance’s Illumina Early Career Research Award, and the 2016 Young Tall Poppy Science Award.
Date: Tuesday 2 July 2019
Presentation title: Bioinformatics analysis of single‐cell RNA‐seq data
Abstract: In this lecture, we will explore the characteristics of single‐cell RNA‐seq data and discuss current bioinformatics methods and software designed to analyse these data.
Dr Jue Ruan Chief Scientist Innovation Team of Agricultural Genomics Technologies Agricultural Genomics Institute Chinese Academy of Agricultural Sciences China
Biography: Dr Jue Ruan is Chief Scientist of the Innovation Team of Agricultural Genomics Technologies at the Chinese Academy of Agricultural Sciences’ Agricultural Genomics Institute, Shenzhen, Guangdong, China. Dr Ruan received his BSc degree in Biological Science in 2004 from the Life Science College, Nankai University, Tianjin, China. He completed his PhD in Bioinformatics at the Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China, in 2009. His research interests include algorithm development for de novo assembly and sequence alignment, and ultra‐low‐frequency somatic mutation detection.
Date: Tuesday 2 July 2019
Presentation title: Fast and accurate long‐read assembly with wtdbg2
Abstract: Existing long‐read assemblers require tens of thousands of CPU hours to assemble a human genome and are being outpaced by sequencing technologies in terms of both throughput and cost. We developed a novel long‐read assembler, wtdbg2, that, for human data, is tens of times faster than published tools while achieving comparable contiguity and accuracy. It represents a significant algorithmic advance and paves the way for population‐scale long‐read assembly in the future.
Dr Martin Smith, Genomic Technologies Leader, Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Sydney
Biography: Martin is the Genomic Technologies Group Leader at the Kinghorn Centre for Clinical Genomics, located at the Garvan Institute of Medical Research in Sydney, Australia. He is a computational biologist from Canada with a background in genomics, microbiology and immunology. His work has focused on genome and transcriptome annotation using comparative genomics and machine learning. He has been using nanopore sequencing since 2014, with a heavy focus on transcriptomic applications.
Date: Tuesday 2 July 2019
Presentation title: RNAseq 2.0: a single molecule revolution
Abstract: High‐throughput transcriptomic and epigenomic studies have substantiated the prevalence and dynamics of regulatory regions in the human genome, including the surprising diversity and contentious function of long non‐coding RNAs. What additional layers of complexity can single‐cell and single‐molecule sequencing technologies unravel? How will observing native molecules in real time improve our understanding of health and disease? I will describe genomic and computational strategies for functional transcriptome annotation using nanopore sequencing, with emphasis on targeted RNA sequencing, single‐cell sequencing, epitranscriptomics, and raw signal analysis.
Dr Zhiliang Chen, Senior Bioinformatics Specialist, Illumina, Sydney
Biography: Zhiliang holds a bachelor’s degree and a PhD in bioinformatics. Over her years working with NGS data, she has gained extensive experience in analysing different types of NGS data, from the early days of 454, through Illumina, and then PacBio. As a postdoctoral researcher at the University of New South Wales, she was involved in the Koala Genome Consortium and assembled the koala genome almost single‐handedly. Zhiliang is currently a bioinformatics sales specialist at Illumina, based in Australia.
Date: Tuesday 2 July 2019
Presentation title: Global effort to sequence Australia’s iconic animal – lessons from the koala genome
Abstract: The Koala Genome Consortium sequenced the koala genome, producing the highest‐quality marsupial genome to date. This is the first marsupial genome sequenced and assembled using PacBio long‐read technology, and it provides insights into the advantages of using long‐read technology to construct a high‐quality genome and to resolve traditionally hard‐to‐assemble regions of the genome.
Dr Han Ming Gan, Senior Research Fellow in Genomics, School of Life and Environmental Sciences, Deakin University, Melbourne
Biography: Han Ming Gan (Ming) is a Senior Research Fellow in Genomics at Deakin University’s Waurn Ponds Campus. He was previously a Senior Research Fellow and Genomics Facility Lab Manager at the Monash University Malaysia Sunway Campus. Before moving to academia, he was a Field Application Scientist at ScienceVision SB, the sole distributor of Illumina products in Malaysia. Although he trained as a molecular microbiologist during his PhD, he taught himself several computational biology skills during his time at ScienceVision SB in order to keep up with the rapid pace of microbial genomics. Armed with knowledge of Illumina sequencing technology and computational biology, he has successfully set up two brand‐new genomics labs (both still functional) from scratch, at Monash Malaysia and at Deakin University. His current interests include long‐read sequencing (Nanopore) and finding practical ways to improve genome assemblies through hybrid assembly (Illumina + Nanopore).
Date: Tuesday 2 July 2019
Presentation title: Nanopore and I
Abstract: Recent advances in Nanopore long‐read sequencing technology have transformed the landscape of genomics, enabling the generation of substantially improved genome assemblies without significant capital investment. However, not all DNA samples are created equal, and each Nanopore sequencing project appears to require a different set of molecular biology techniques and computational tools. Ming will share his experience of what works and what doesn’t in his lab, from data generation to genome assembly.
Mr Ryan Wick, Research Assistant, Monash University, Melbourne
Biography: Ryan Wick is a PhD student and research assistant in Prof. Kathryn Holt's group at Monash University. His work mainly focuses on bacterial genome assembly, with an emphasis on the use of data from Oxford Nanopore's sequencing platforms. His broader academic interests include machine learning, metagenomics and phylogenomics.
Date: Tuesday 2 July 2019
Presentation title: Completing bacterial genomes using long sequencing reads: working towards the perfect genome assembly
Abstract: Genome assemblers are tools that aim to reconstruct an original genome from sequencing reads. A ‘perfect’ assembler would take only reads as input and output a complete, error‐free genome. This goal is usually impossible with short reads alone, but adding long reads from Oxford Nanopore sequencers brings it tantalisingly within reach. In this talk, I will describe the various ways genome assembly can fail and what researchers can do to achieve a perfect (or close to perfect) bacterial genome.
Ms Leah Roberts, PhD Candidate, School of Chemistry and Molecular Biosciences, The University of Queensland
Biography: Leah Roberts is in the final stage of her PhD with A/Prof. Scott Beatson and Prof. Mark Schembri in the School of Chemistry and Molecular Biosciences, The University of Queensland. Her PhD has focused on investigating clinically significant Gram‐negative bacteria using a range of whole‐genome sequencing technologies.
Date: Tuesday 2 July 2019
Presentation title: Applications of long‐read sequencing in microbial genomics
Abstract: The advent and continual advancement of whole‐genome sequencing technology has dramatically improved our ability to investigate bacterial genomes at the highest possible resolution. However, despite their small size, many bacterial genomes cannot be completely resolved using short‐read sequencing alone. This is primarily due to the abundance of repetitive regions and mobile genetic elements in bacterial genomes, which often lead to collapsed repeats and ultimately breaks in the final assembly. This becomes problematic for downstream analyses, as these repetitive regions, which often contain important genomic features such as virulence or antibiotic resistance genes, cannot be accurately contextualised in the final assembly. To overcome this problem, long‐read sequencing technologies have been developed to span repetitive regions and generate complete assemblies. Two of the main technologies currently offering long‐read sequencing for bacterial genomes are Pacific Biosciences Single Molecule Real‐Time (SMRT) sequencing and Oxford Nanopore long‐read sequencing. In this talk, I will discuss the advantages and disadvantages of both long‐read sequencing technologies, and expand on the types of analyses we have used as well as those reported in the literature. This talk will focus on the application of long‐read sequencing in microbial genomics and metagenomics, with a particular focus on bacterial isolates in clinical settings.
Dr Devika Ganesamoorthy, Research Officer, Institute for Molecular Bioscience, The University of Queensland
Biography: Dr Devika Ganesamoorthy is an early career researcher at The University of Queensland. She obtained her PhD in Molecular Biology from the University of Melbourne in 2014. She has a strong interest in genomics, and her major research focus has been on the development and assessment of high‐throughput genomic methods to assess genomic variation. She has extensive expertise and skills in Nanopore long‐read sequencing technology and has explored the method for various applications. She is currently a postdoctoral researcher in A/Prof. Lachlan Coin’s group at the Institute for Molecular Bioscience at The University of Queensland, where she is working on various projects including human genomic variation and cancer genomics.
Date: Wednesday 3 July 2019
Presentation title: Genotyping tandem repeats with high throughput sequencing
Abstract: Tandem repeats comprise a significant proportion of the human genome, including coding and regulatory regions. They are highly prone to repeat number variation and nucleotide mutation due to their repetitive and unstable nature, making them a major source of genomic variation between individuals. There are almost 1 million tandem repeats in the human genome; however, only a few of these regions have been investigated in terms of disease association. Genome‐wide analysis of tandem repeats is hindered by limitations in analysis techniques and a lack of high‐throughput analysis methods. Recent advances in high‐throughput sequencing technologies provide an opportunity to explore this class of genomic variation, which has been under‐studied in genomic research. We have developed novel targeted sequencing approaches to facilitate high‐throughput analysis of tandem repeats in large‐scale association studies. These targeted approaches can be used in combination with short‐read Illumina sequencing or long‐read Nanopore sequencing technology.
We have also developed novel genotyping tools (GtTR and VNTRTyper) to improve the analysis of tandem repeats (TRs) from both long‐read and short‐read sequencing data. The accuracy of the genotype estimates from these tools is comparable to that of PCR genotyping. These targeted approaches and analysis tools will assist us in exploring the impact of tandem repeat variation on complex traits and common diseases.
Dr Elizabeth Ross, Research Fellow, QAAFI, The University of Queensland
Biography: Originally from country Victoria, Elizabeth moved to Melbourne to study a Bachelor of Animal and Veterinary Bioscience at La Trobe University. Following an honours year examining the link between MHCII genes and parasite burden in marsupials, she began a PhD based at the Victorian Department of Primary Industries. Her PhD covered the metagenomic analysis of the dairy cattle rumen microbiome, including the prediction of quantitative traits from whole‐microbiome shotgun sequence data, virome assembly and sample comparisons. After a one‐year postdoc at the University of Melbourne on the chickpea transcriptome, she took a two‐year break from academia before moving to The University of Queensland, where she began working with long‐read sequence data. Here at UQ she has assembled a platinum‐quality cattle genome using PacBio sequence data, and has begun the voyage into Isoseq, the use of PacBio sequence data to identify full‐length transcripts. She coordinates the short‐read sequencing of hundreds of animals on the Illumina platform, and supervises Oxford Nanopore projects. She also leads the UQ long‐read sequencing group. In her spare time she enjoys playing with her two young daughters and her dog, and going camping.
Date: Wednesday 3 July 2019
Presentation title: The application of long read sequencing technologies to animal agriculture
Abstract: Long‐read sequencing is changing genomics as we know it. The possibilities and economics of new technologies will drive a second genomic revolution in non‐model organisms. Tasks such as the assembly of ultra‐high‐quality genomes, which used to take years, millions of dollars and whole teams of people, can now be completed by just a few personnel in under a year and for under $100K worth of consumables. But the possibilities don't end there. Here we discuss the current and future state of applied long‐read sequencing, and its potential to provide real‐world impacts for agriculture.
We also discuss the lessons and pitfalls of using long read technologies, and the implications for the current status quo of agricultural genomics.
Dr Denis Bauer, Leader, Transformational Bioinformatics Group, CSIRO, Sydney
Biography: Dr Denis Bauer is an internationally recognised expert in machine learning, specifically in processing big genomic data to help unlock the secrets in human DNA. Her achievements include developing an open‐source, artificial intelligence‐based cloud service that accelerates disease research, and contributing to national and international initiatives for genomic medicine funded with over $500M. As CSIRO’s transformational bioinformatics leader, Denis is frequently invited to give keynotes at international medical and IT conferences, including Amazon Web Services Summit 2018, International Conference on Frontotemporal Dementia’18, Alibaba Infinity Singapore’18 and Open Data Science Conference India’18. Her revolutionary achievements have been featured in international press such as GenomeWeb, ZDNet, Computer World, CIO Magazine and the AWS Jeff Barr blog, and were among ComputerWeekly’s Top 10 IT stories of 2017. Denis holds a BSc from Germany and a PhD in Bioinformatics from The University of Queensland, and has completed postdoctoral research in both biological machine learning and high‐throughput genetics. She has 33 peer‐reviewed publications (14 as first or senior author), with over 1000 citations and an H‐index of 14. Denis advocates for gender equality in IT, and is active on CSIRO’s Inclusion and Diversity committee.
Date: Wednesday 3 July 2019
Presentation title: The dawn of cloud‐native bioinformatics
Abstract: Genomics produces more data than astronomy, Twitter and YouTube combined, which has propelled research in this discipline to the forefront of cloud technology. Using machine learning and harnessing radically new architecture patterns, a new cloud‐native discipline of bioinformatics is emerging. The talk illustrates this transformation using the example of disease gene discovery.
Here, a Spark‐based machine learning framework, VariantSpark, was custom‐designed to deal with ‘wide’ or ultra‐high‐dimensional data (80 million columns) to find the genetic origin of ALS in 22,000 whole‐genome sequences. Made available on Amazon Web Services (AWS) and Microsoft Azure through notebook‐style access portals, it lets international researchers explore large volumes of data in real time. The talk also discusses a new cloud architecture paradigm, serverless, tipped to become an $8 billion market for its ability to make analysis more economical, akin to how prefabrication scaled up the construction sector over bricklaying. It then presents the “search engine for the genome” (GT‐Scan), a web service that enables researchers to identify the optimal spot in the 3‐billion‐letter genome to make alterations (CRISPR) that may one day help to cure or prevent diseases. Providing practical tips for the new cloud‐native generation of bioinformaticians, the talk compares cloud setups across AWS, Alibaba and Azure and touches on how to evolve cloud architecture more efficiently through a hypothesis‐driven approach to DevOps.
Dr Igor Makunin, NeCTAR Genomics Virtual Laboratory (GVL) Project, Research Computing Centre, The University of Queensland
Biography: Igor has extensive experience in the analysis of next‐generation sequencing data, comparative genomics, genetics and molecular biology. He provides support for biologists working with next‐generation sequencing data on the Galaxy platform. Igor has worked as a scientist at the Queensland Institute of Medical Research, The University of Queensland, the Institute of Cytology and Genetics (Novosibirsk, Russia), the University of Geneva and the University of Cambridge.
Date: Wednesday 3 July 2019
Special Workshop: An introduction to Galaxy with the NeCTAR Genomics Virtual Laboratory
Abstract: The Galaxy platform is one of the world’s most popular and fastest‐growing web‐based bioinformatics interfaces. With Galaxy, biologists can access a huge range of bioinformatics tools through user‐friendly and intuitive graphical interfaces. Galaxy also captures and records analysis pipelines to provide full reproducibility, and simplifies sharing of data and analyses between colleagues. The NeCTAR*‐supported Genomics Virtual Laboratory project has adopted Galaxy as one of its major platforms to bring the power of the national research cloud to bench biologists. Through the GVL and NeCTAR, Australian researchers and their collaborators have free access to high‐performance bioinformatics computing resources. This workshop will provide a hands‐on introduction to using Galaxy on the research cloud. Participants will learn where and how they can access a Galaxy instance, how to upload and access data, how to run basic analysis pipelines, and how to use both integrated and plug‐in functions to visualise genomic data. We will introduce histories and workflows, and explore how they can be used to run reproducible analysis pipelines and to share analyses with colleagues. We will also discuss how to extend the standard Galaxy build to add new tools and custom reference genomes. The workshop is intended for bench scientists, and no previous bioinformatics experience is needed.
*National eResearch Collaboration Tools and Resources
A/Prof. Nicola Armstrong, Mathematics & Statistics, Murdoch University, Perth
Biography: A/Prof. Armstrong is a statistical bioinformatician who completed her doctoral studies in Statistics at the University of California, Berkeley. After completing her PhD, she spent several years in the Netherlands as a postdoc at Eurandom and the Vrije Universiteit before moving to the Netherlands Cancer Institute in Amsterdam as a senior statistician. On returning to Australia, she worked at the Garvan Institute and the University of Sydney before moving to Murdoch University, where she is currently an Associate Professor in mathematics and statistics. Her research has centred on the development of statistical methodology and the application of statistics to problems in genetics, genomics and biomedical research.
Date: Thursday 4 July 2019
Presentation title: Machine learning for bioinformatics
Abstract: In this talk, I will introduce some common machine learning methods that are used with ‘omics data. The advantages and disadvantages of various approaches to supervised learning will be outlined. Quantifying the performance of a technique, and other important concepts that should be considered before starting to analyse data, will also be discussed.
Dr Maren Westermann, Machine Learning Engineer, Max Kelsen, Brisbane
Biography: Dr Maren Westermann is a machine learning engineer at Max Kelsen. She works on the Immunotherapy Outcome Prediction (IOP) project, which combines whole‐genome sequencing and machine learning to improve the success rates of immunotherapy in cancer patients. Maren has a strong background in the biological sciences. After completing a Bachelor’s and Master’s degree in Biology at the University of Giessen, Germany, she graduated with a PhD from The University of Queensland. In her dissertation, Maren statistically analysed and modelled greenhouse gas emissions from fertilised soils, and data from her PhD project can be used to improve climate models. Maren began exploring the possibilities of computer science with the advent of MOOCs. She learnt Python and R through online courses and integrated her programming knowledge into her research. After finishing her PhD, Maren became interested in machine learning and its potential to provide solutions to previously unsolvable problems. Building on her strength in statistics, she began educating herself in machine learning, again making use of MOOCs.
Date: Thursday 4 July 2019
Presentation title: Integrating cancer research with machine learning ‐ achievements and challenges
Abstract: In Australia, about 3 in 10 deaths are caused by cancer, making it one of the most common fatal diseases. The development of cancer has been linked to mutations of the genome. In humans, the genome comprises approximately 3.2 billion base pairs of DNA. Studying these mutations was therefore an insurmountable task until the turn of the millennium, when the sequencing of the first whole human genome was completed. However, commercially analysing a person’s genome for detrimental mutations was still not feasible at that time because of technical and financial constraints.
Today the cost of sequencing a person’s whole genome has dropped to about 1000 USD, and computers have much greater disk space and computational power, enabling researchers to handle and analyse large datasets such as genomic data. Dr Maren Westermann will give an overview of state‐of‐the‐art machine learning models applied in today’s cancer research and highlight the challenges and constraints faced by the cancer research community.
Dr James Doecke, Senior Research Scientist and Team Leader, CSIRO – Health and Biosecurity Division, Brisbane
Biography: Dr Doecke has been working as a biostatistician for approximately 12 years. After completing his PhD in statistical genetics at Griffith University, he started his career in biostatistics at the Queensland Institute of Medical Research in 2006. In 2008 he moved to CSIRO to further his career in statistical model development, specifically in feature selection and model prediction. He has extensive experience in analysing data from studies on Alzheimer's disease, cancer and inflammatory bowel disease. Dr Doecke has published over 70 journal papers, including manuscripts in prestigious journals such as Nature, Gut and Molecular Psychiatry. With over 2200 citations, his work has been instrumental in the identification of blood‐based biomarkers in Alzheimer’s disease, and in biomarker research in general. Dr Doecke is consistently asked to work in prestigious laboratories around the world. He has spent time at the Cambridge Institute for Medical Research in the UK, and at the MD Anderson Cancer Center, the number 1 cancer centre in the USA. He currently leads a team of biostatisticians at CSIRO, and is the technical lead for all biomarkers and biostatistics arising from data collected within the Australian Imaging, Biomarkers and Lifestyle (AIBL) study of ageing. With a background in biostatistics, molecular biology and epidemiology, Dr Doecke applies both simple and complex statistical methodologies to real‐world medical problems, and advocates the importance of broad knowledge in medical biology and biostatistics for answering some of the world’s most complex disease problems.
Date: Thursday 4 July 2019
Presentation title: Feature selection for biological data: educated guesses or blind computation
Abstract: With the era of big data upon us, and machine learning technologies for assessing these data becoming more available and accessible, it is tempting to run all possible computations to assess each and every possible relationship. This can run into billions of relationships, and the number becomes even larger when we have multiple outcomes. Whilst it is now possible to run billions of computations across multiple CPUs, we still run into trouble when we want to take into account the multivariate nature of most applications: a standard covariance matrix of a big data set will take a very long time to compute. Even with the massive resources we have to compute relationships amongst big data, many complex methodologies cannot be run at such a large scale. One alternative is to construct a biological and statistical design within your data prior to analysis. This talk will describe one such design to assess three genomic platforms of data (SNP, mRNA expression and DNA methylation), with a view to understanding some of the complex relationships commonly found within disease phenotypes.
Dr Arun Konagurthu, Senior Lecturer, Monash University, Melbourne
Biography: Dr Arun Konagurthu is a Senior Lecturer and (currently) the Director of undergraduate Bachelor of Computer Science studies at Monash University's Faculty of Information Technology. He held the Larkins Fellowship at this Faculty in 2010‐2013. Prior to that, he held the Eberly College of Science Fellowship at Pennsylvania State University, working with Prof. Arthur Lesk at the Huck Institutes of Genomics, Proteomics, and Bioinformatics. His research interests cover protein structural bioinformatics, statistical inductive inference, combinatorial optimization, graph theory and algorithms.
Date: Thursday 4 July 2019
Presentation title: Statistical compression of protein folding patterns and inference of recurrent substructural themes
Abstract: Computational analyses of the growing corpus of three‐dimensional (3D) protein structures have revealed a limited set of recurrent substructural themes as building blocks of protein architecture. Knowledge of the architectural building blocks underlying the observed repertoire of protein folding patterns remains crucial to unravelling how protein 3D structures come about, how they function and how they evolve. Characterizing a comprehensive dictionary of such building blocks has been an unanswered computational challenge in protein structural studies. Using information‐theoretic inference, we address this question and identify a comprehensive dictionary of 1,493 substructural 'concepts'. Each concept represents a topologically conserved assembly of helices and strands that make contact. Any protein structure can be dissected into instances of concepts from this dictionary. We dissected the worldwide Protein Data Bank and completely inventoried all concept instances. This yields an unprecedented source of biological insights.
These include: correlations between concepts and catalytic activities or binding sites, useful for rational drug design; local amino‐acid sequence‐structure correlations, useful for ab initio structure prediction methods; and information supporting the recognition and exploration of evolutionary relationships, useful for structural studies. This talk will mainly discuss the unsupervised method, based on the Minimum Message Length (MML) criterion, that we used to learn this comprehensive architectural concept dictionary: see http://lcb.infotech.monash.edu.au/prosodic.
Dr Dimitri Perrin, Senior Lecturer, Queensland University of Technology
Biography: Dr Dimitri Perrin is a Senior Lecturer at the Queensland University of Technology, where he leads the Biomedical Data Science group. His research interests are in developing new approaches to analyse, understand and optimise biomedical and social systems. His work therefore spans the areas of data science, modelling, computational biology and bioinformatics. Recent projects include gene editing (CRISPR), high‐resolution biomedical imaging (CUBIC), and mobile apps for health research. Dimitri Perrin holds a Master’s Degree (Diplôme d’Ingénieur) from ISIMA and an MSc from Université Blaise Pascal, and received his PhD from Dublin City University.
Date: Thursday 4 July 2019
Presentation title: Will it cut? Predicting the efficiency of CRISPR‐based gene editing
Abstract: We are in the middle of a technological revolution: while methods to modify genomes have been around for some time, CRISPR provides a way to achieve this with unprecedented ease and precision. One of the most crucial parts of a CRISPR experiment is the design of the "guide" sequence that will decide where the modification is made. This is not trivial, and a number of tools aim to make this step more reliable. In this talk, we will discuss machine‐learning approaches that try to identify efficient guides directly from their sequence. We will look at their individual performance, and at the surprising lack of agreement between tools. Consensus‐based methods can partly address this, but limitations remain.
A/Prof. Hanna Suominen, Associate Professor in Computer Science @anucecs, Research Team Leader in Machine Learning @Data61news, Research Program Leader in Big Data @AnuOur, Co‐chair of @clefehealth, Co‐founder of @postacapp, Australian National University, Canberra
Biography: A/Prof. Hanna Suominen, with over 15 years’ experience in longitudinal, multimodal data analytics for saving, structuring, and summarising data, is bridging the gap between Computer Science (CS) and health/social sciences. She was awarded an MSc in applied mathematics, a PhD in CS, and the title of Adjunct Professor in CS by the University of Turku, Finland, in 2005, 2009, and 2013, respectively. She joined The ANU and Data61 as the Team Leader of Theory and Applications of Multimodal Pattern Analysis (TAMPA) within the Machine Learning (ML) Group after working at Data61/NICTA as a Team Leader of Natural Language Processing (NLP) and Senior Researcher in ML. Hanna has over 100 publications with 60 co‐authors from 10 countries, at institutions including Harvard, Karolinska Institutet, and Max Planck. Her work has been published in the most prestigious journals, cited over 1,200 times, and has won awards for best papers, ML/NLP methods, business plans, and teaching units. She has secured competitive grants with a total value of over $10‐20 million in the past 2 years alone.
Date: Thursday 4 July 2019
Presentation title: Introduction to machine learning for health data analytics
Abstract: Information flow, defined as channels, contact, communication, or links to pertinent people, is critical in any data‐intensive field, and especially so in healthcare. For example, over 10% of preventable adverse events in healthcare are caused by failures in information flow. These failures are tangible in handover: regardless of good verbal communication, 65%‐100% of information is lost after 3‐5 shifts if notes are taken by hand, or not taken at all. The goal of our studies was to make producing and using clinical documentation more efficient through machine learning‐assisted information flow, and thereby contribute to health and healthcare. We studied automated speech recognition (ASR) and text classification as ways to populate health records and perform hospital surveillance. ASR correctly recognised up to 73% of 14,095 test words.
On 100 test documents, the classifier achieved an F1 of 81% in filtering out irrelevant text and up to 100% in filling out the form headings. At 75% precision, the surveillance system had 100% recall; that is, it did not miss a single sick patient. We also introduced Web apps to demonstrate the software design and released synthetic but realistic clinical datasets. The significance of this work hinges on opening our data, software, and evaluations to the research and development community for studying clinical documentation, ASR, and classification.
Dr Marina Naval Sanchez, CSIRO, Brisbane
Biography: Marina Naval Sanchez received a BS degree in Agri‐food and an MSc in Agriculture Engineering, majoring in animal biotechnology, from Universitat de Lleida (Spain), and an MSc in Applied Bioinformatics from Cranfield University (UK). She completed a PhD at Katholieke Universiteit Leuven (Belgium) under the mentorship of Prof. Stein Aerts. She moved to Australia in 2015 as an OCE Postdoctoral Fellow at CSIRO (Brisbane), working with Dr James Kijas and currently with Dr Toni Reverter. Her main research interests include genomics, transcriptomics, population and evolutionary genomics, and gene regulatory networks, focusing on livestock species.
Date: Thursday 4 July 2019
Presentation title: Machine learning applications in functional annotation
Abstract: Functional genomics is at the forefront of the application of machine learning methods to predict enhancers and cis‐regulatory regions, as well as to predict the impact of SNPs in regulatory regions on downstream function, namely open chromatin and gene expression. These tools are based on data generated by high‐throughput technologies such as ChIP‐seq, ATAC‐seq, transcription factor state and chromatin states, such as those produced by the Encyclopedia of DNA Elements (ENCODE) or RoadMap Epigenomes in human. In the realm of non‐model organisms, our lab is part of the Functional Annotation of Animal Genomes (FAANG), which aims to produce high‐throughput experimental profiles of regulatory elements across tissues, mostly in livestock species. CSIRO has generated ATAC‐seq data for four tissues in tropical cattle, and for several tissues and developmental stages in salmon. The next step is to use machine learning methods, starting with Support Vector Machines (SVM), Random Forests and Deep Learning approaches, to unravel the regulatory logic underlying functionality and predict the impact of mutations on phenotype. In this talk, I will demonstrate how we applied machine learning methods (e.g. SVM) to identify master regulators in cattle tissue and salmon development ATAC‐seq data.
BIOGRAPHY AND ABSTRACT
Ms Alexandra Essebier PhD Candidate School of Chemistry and Molecular Biosciences The University of Queensland
Biography: Alex Essebier completed her undergraduate degrees in Science (Biochemistry and Molecular Biology) and Information Technology at the University of Queensland in 2013, and a Masters of Bioinformatics in 2015. Alex first developed an interest in bioinformatics in her second year of university, when she discovered it would allow her to solve a variety of biological problems by applying her programming skills. Over the last five years she has undertaken a number of research projects as part of A/Prof. Mikael Bodén’s group at UQ. These projects involved large biological datasets and allowed Alex to explore big data techniques for extracting relevant patterns and relationships. Her main focus is on the use of machine learning to analyse high‐throughput genomic datasets. She is currently a PhD student investigating the application of machine learning to predict long‐distance regulatory interactions. The ability to accurately detect these interactions can improve our understanding of developmental disorders and diseases such as cancer. Alex’s multidisciplinary background has allowed her to provide bioinformatic insight on a number of research projects with multiple collaborators. It has also given her the skills to drive her own research and to work toward developing new bioinformatic tools and techniques.
Date: Thursday 4 July 2019
Presentation title: Extracting knowledge from models trained on biological data to measure performance and improve understanding of the system
Abstract: Extracting knowledge from statistical and machine learning approaches is a challenge faced by many researchers, including those in bioinformatics. A large amount of data is now available that captures multiple aspects of human biology at the cellular level, and we face the task of extracting knowledge, patterns and relationships from this data to help us understand how our bodies function at a molecular level and to aid in the treatment of diseases such as cancer. To explore approaches to extracting knowledge from a model, we built a Bayesian network with a limited set of features and a relatively simple structure to identify transcription factor binding sites in vivo, information that is key to understanding regulation in the genome. While our network did not achieve the best performance, Bayesian networks are generative models, and our aim was to gain a better understanding of the features that define a transcription factor binding event. In this talk I will discuss the approaches we used and the challenges we faced in extracting knowledge from our Bayesian network, and in linking performance back to the input data to learn about the features of transcription factor binding and how they vary under different conditions.
Dr Arash Bayat Researcher Transformational Bioinformatics Health and Biosecurity Business Unit CSIRO Sydney
Biography: Arash is a researcher in the Transformational Bioinformatics team at CSIRO. He completed his bachelor’s and master’s degrees in computer engineering and moved into bioinformatics during his PhD at the University of New South Wales. His current research interest is using machine learning and cloud infrastructure to process big genomic data.
Date: Thursday 4 July 2019
Presentation title: Random forest and its application to genome‐wide association studies
Abstract: Genome‐wide association studies (GWAS) measure the strength of association between SNPs and a phenotype of interest. Traditional GWAS tends to test each SNP independently of other SNPs when measuring association. However, some SNPs interact with each other to produce a phenotypic response (epistasis). Capturing such epistatic interactions is a computational challenge. Random Forest is a machine learning approach that can be used to address this problem. This talk describes the strengths and weaknesses of using Random Forest for this purpose.
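The point that single‐SNP tests can miss epistasis, while tree‐based methods such as Random Forest can capture it, can be illustrated with a toy example. The Python sketch below is a hypothetical illustration, not the speaker’s method: it assumes two binary SNPs whose minor alleles affect a binary phenotype only in combination (an XOR‐style interaction), and it scores splits with Gini impurity, a split criterion used by many decision‐tree implementations. It shows the splits of a single tree only; an actual Random Forest averages many trees grown on bootstrap samples using random subsets of SNPs.

```python
from itertools import product


def gini(labels):
    """Gini impurity of a list of 0/1 phenotype labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def split_gain(rows, labels, snp):
    """Impurity decrease from splitting samples on one binary SNP (column index `snp`)."""
    left = [y for r, y in zip(rows, labels) if r[snp] == 0]
    right = [y for r, y in zip(rows, labels) if r[snp] == 1]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


def two_level_gain(rows, labels, first, second):
    """Impurity decrease from splitting on `first`, then on `second` within each branch."""
    n = len(labels)
    gain = split_gain(rows, labels, first)
    for value in (0, 1):
        branch_rows = [r for r in rows if r[first] == value]
        branch_labels = [y for r, y in zip(rows, labels) if r[first] == value]
        # Weight each branch's gain by the fraction of samples it contains.
        gain += len(branch_labels) / n * split_gain(branch_rows, branch_labels, second)
    return gain


# Hypothetical toy cohort: 100 samples, each genotype combination appearing 25 times.
# Genotypes are simplified to 0/1 (minor allele absent/present).
rows = [combo for combo in product((0, 1), repeat=2) for _ in range(25)]
# Hypothetical epistatic phenotype: affected only when exactly one minor allele is present.
labels = [a ^ b for a, b in rows]

print("SNP 0 alone:", split_gain(rows, labels, 0))
print("SNP 1 alone:", split_gain(rows, labels, 1))
print("SNP 0 then SNP 1:", two_level_gain(rows, labels, 0, 1))
```

Run as written, the script reports a gain of 0.0 for each SNP on its own and 0.5 for the two‐level split, which is the full starting impurity. A one‐SNP‐at‐a‐time scan would therefore rank both SNPs as uninformative, whereas a tree that conditions one split on another recovers the interaction.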
Dr Igor Makunin NeCTAR Genomics Virtual Laboratory (GVL) Project Research Computing Centre The University of Queensland
Biography: Igor has extensive experience in the analysis of next‐generation sequencing data, comparative genomics, genetics and molecular biology. He provides support for biologists working with next‐generation sequencing data on the Galaxy platform. Igor has worked as a scientist at the Queensland Institute of Medical Research, The University of Queensland, the Institute of Cytology and Genetics (Novosibirsk, Russia), the University of Geneva and the University of Cambridge.
Date: Friday 5 July 2019
Presentation title: Galaxy Australia: advanced bioinformatics within a biologist‐friendly interface
Abstract: Galaxy Australia (https://usegalaxy.org.au) is a national service designed for the analysis of genome‐scale data, with an emphasis on high‐throughput sequencing. Galaxy Australia provides preinstalled bioinformatic tools and public data, such as reference genomes. The Galaxy web interface does not require knowledge of Unix or programming skills; all analyses can be triggered with a mouse click. The interface also means that the service can be accessed from any web‐connected device, giving users the flexibility and convenience of a virtual laboratory. For new users we offer step‐by‐step tutorials covering topics ranging from basic operations in Galaxy to complex analyses, such as RNA‐Seq (differential gene expression analysis with high‐throughput sequencing data), genome assembly and variant calling. After a simple registration process, users get access to ample storage, compute resources and diverse public datasets. There is no waiting time on registration, and users can start analysis immediately after uploading their data. Galaxy provides easy connections to external services for direct data import from public repositories and for visualisation of user data on public servers such as the UCSC Genome Browser. Raw data, analysis results or chained tool executions (workflows) can be shared with other users or made public. Galaxy workflows are one of the most powerful features of the service, allowing users to perform a series of tasks reproducibly on one or many files. Galaxy workflows record not only tool parameters but also tool versions, making analyses reproducible and robust and providing a historical record of the methodology applied to the input data. With over 3,000 registered users, Galaxy Australia is actively used for research and training. The talk will provide an overview of Galaxy Australia and will be of interest to both researchers and educators.
Dr Arash Bayat Researcher Transformational Bioinformatics Health and Biosecurity Business Unit CSIRO Sydney
Biography: Arash is a researcher in the Transformational Bioinformatics team at CSIRO. He completed his bachelor’s and master’s degrees in computer engineering and moved into bioinformatics during his PhD at the University of New South Wales. His current research interest is using machine learning and cloud infrastructure to process big genomic data.
Date: Friday 5 July 2019
Presentation title: VariantSpark: A cloud‐based machine learning approach for big genomic data
Abstract: Genomic data is going to set a new record. It is estimated that the volume of genomic data will exceed that of astronomy and YouTube combined. Such a dramatic increase in the amount of data is mainly due to the falling cost of data production and the significant impact of genomic research on our lives. Neither traditional algorithms nor high‐performance computers are capable of dealing with such a massive data load. Machine learning on cloud platforms seems to be an appropriate solution to this problem. Machine learning is well suited to extracting valuable information from big data in reasonable time, especially when the traditional approach has exponential complexity. Yet the computational requirements are beyond the capabilities of commodity computers. Cloud platforms can provide adequate computational hardware to support our machine learning algorithm. VariantSpark is cloud‐based machine learning software that can efficiently harness cloud resources for machine learning algorithms processing genomic data.
Mr Thom Cuddihy Bioinformatician and Software Developer QFAB Bioinformatics and UQ Research Computing Centre The University of Queensland
Biography: Thom Cuddihy is a bioinformatician and software developer for QFAB Bioinformatics and the UQ Research Computing Centre. He is currently embedded with several research groups, providing bioinformatic and sysadmin services, and also works on several national platforms for bioinformatics and bio‐data services. He is proficient in multiple programming languages, including Python, C#, Java and R, and has a strong background in databases, system administration and high‐performance computing. He also has extensive experience working with NGS data and building pipelines for single‐ and multi‐omics analysis. Thom has a Bachelor of Arts and a Bachelor of Science (Hons), in addition to a Masters of Bioinformatics from UQ.
Date: Friday 5 July 2019
Presentation title: Deploying and utilising virtual servers in the NeCTAR Cloud
Abstract: Computing has become a vital component of modern research, driving researchers to seek additional computational power. In response to the increasing demand for computing resources, the National eResearch Collaboration Tools and Resources (NeCTAR) project established a national cloud computing resource, the NeCTAR Cloud, which provides a flexible, low‐cost, low‐barrier‐to‐entry solution for Australian researchers. The NeCTAR Cloud features a self‐service portal that allows researchers to quickly and easily launch virtual Linux server instances, manage volume and object storage, configure firewalls and security, and perform backups (snapshots). In addition, as the NeCTAR Cloud spans eight different organisations throughout Australia connected as a single cloud system, researchers can take advantage of additional services offered by individual member organisations, such as the data collections offered by QRIScloud, and can collaborate with other researchers across Australia. In this seminar, the basics of cloud computing and NeCTAR will be discussed, as well as the process of registering, configuring and launching a virtual instance on the NeCTAR Cloud. Security and backup considerations will also be addressed. Finally, research applications of virtual instances will be reviewed.
Ms Amanda Miotto Senior eResearch Analyst and Software Developer Griffith University and QCIF
Biography: Amanda Miotto is a Senior eResearch Analyst and Software Developer for Griffith University and QCIF. She started in the field of bioinformatics and learnt to appreciate the beauty of science before discovering the joys of coding. She is also heavily involved in Software Carpentry, Hacky Hours and ResBaz, and has developed on platforms for HPC, microscopy and scientific database portals.
Date: Friday 5 July 2019
Presentation title: Hacky Hours ‐ communities to learn and build data science
Abstract: Researchers starting their journey through data science often have an ambiguous path to follow. While online data science classes are plentiful, it can be challenging for researchers, who have often never seen programming code before, to know where to start or how to apply methods to their own data. In Queensland, many universities, including UQ, Griffith, QUT, USQ and JCU, have been supporting these researchers by running 'Hacky Hours': open sessions where researchers can meet research software engineers and other researchers doing similar work to share knowledge, ask questions freely and work together on projects in a friendly environment. Hacky Hour communities have also been a way to connect with the wider research and technical communities, providing links to relevant meetups, hackathons and workshops offered outside the attendees’ own university, and to national resources such as NeCTAR cloud compute and virtual labs, local High Performance Computing (HPC) and other NCRIS activities. As many researchers operate in isolated silos, this is often the first time attendees learn about these resources and initiatives. This also leads to attendees becoming involved with the wider community and building networks nationally.
Prof. Nicholas Loman Professor of Microbial Genomics and Bioinformatics Institute of Microbiology and Infection University of Birmingham UK
Biography: Nick is Professor of Microbial Genomics and Bioinformatics in the Institute of Microbiology and Infection at the University of Birmingham and a Fellow at the Alan Turing Institute. He is supported by a Fellowship in Microbial Genomics Bioinformatics as part of the MRC CLIMB project. His research explores the use of cutting‐edge genomics and metagenomics approaches for the diagnosis, treatment and surveillance of infectious disease. Nick has used high‐throughput sequencing to investigate outbreaks of important Gram‐negative multi‐drug‐resistant pathogens, and recently helped establish real‐time genomic surveillance of Ebola in Guinea and Zika in Brazil. His current work focuses on the development and evaluation of novel molecular biology, sequencing and bioinformatics methods to aid the interpretation of genome‐ and metagenome‐scale data generated in clinical and public health microbiology.
Date: Friday 5 July 2019
Presentation title: How can long reads help us diagnose and track microbial pathogens?
Abstract: New sequencing technologies coupled with comparative genomics and evolutionary analysis are poised to impact our investigation of infections and outbreaks. Sequencing data, if collected in real time, can directly inform epidemiological investigations of outbreaks, particularly for fast‐evolving pathogens. I will discuss recent progress in developing a real‐time, genomics‐informed outbreak system built around Oxford Nanopore sequencing. I will describe new opportunities for using long‐read sequencing and assembly methods to accurately detect transmission of pathogen strains between infected individuals and the environment, and some of the current challenges of deconvolving complex pathogen populations.
Dr Momeneh (Sepideh) Foroutan Research Fellow in Computational Cancer Biology The University of Melbourne
Biography: Dr Momeneh (Sepideh) Foroutan is a research fellow in computational cancer biology in the Department of Clinical Pathology of the University of Melbourne Centre for Cancer Research in the Victorian Comprehensive Cancer Centre. Sepideh holds a Master’s degree in Molecular Genetics from Shahid Beheshti University in Tehran, Iran. She moved to Melbourne in 2014 to pursue her PhD in computational cancer biology at the University of Melbourne’s Department of Surgery and the Bioinformatics Division of the Walter and Eliza Hall Institute of Medical Research. She is experienced in transcriptomic data analysis, large‐scale data integration, batch correction and data visualisation. She co‐founded R‐Ladies Melbourne in 2016, the first R‐Ladies chapter in Australia, and is currently its main organiser.
Date: Friday 5 July 2019
Presentation title: Generating dynamic reports using R Markdown in RStudio
Abstract: Data analysts are routinely required to generate reports that include not only the results and conclusions of an analysis, but also details about the data, methods, code or pipelines used to generate the results, as well as any necessary explanatory notes. Recording such details together with the results ensures that methods are always linked to results, and helps ensure computational reproducibility. In this talk, I introduce and demonstrate the use of R Markdown, an easy‐to‐write plain text format designed to generate reports containing code chunks, figures and tables, as well as notes, all in one document. This avoids the labour of manually writing and maintaining reports and of switching between different programs to generate them. R Markdown ensures the consistency and reproducibility of reports, which are easy to share between collaborators. These reports can be converted into different formats (e.g. HTML, PDF and Word), and may include static or interactive outputs.
Dr Melissa Davis Joint Head, Bioinformatics Division Walter and Eliza Hall Institute of Medical Research Melbourne
Biography: Dr Melissa Davis is a computational biologist and Joint Head of the Bioinformatics Division at the Walter and Eliza Hall Institute of Medical Research. Her background is in genetics and computational cell biology, with expertise in the analysis of genome‐scale molecular networks and knowledge‐based modelling. Her research group is highly multidisciplinary and focuses on computational research in cancer progression and plasticity. Melissa received her PhD at UQ and continued as a postdoc at the Institute for Molecular Bioscience. In 2014, she was awarded a National Breast Cancer Foundation Career Development Fellowship and took up a position as Senior Research Fellow in Computational Systems Biology at the University of Melbourne in the Systems Biology Laboratory, before moving to the Walter and Eliza Hall Institute of Medical Research in 2016.
Date: Friday 5 July 2019
Presentation title: An introduction to network analysis methods
Abstract: This lecture will introduce software and online tools useful in the construction and analysis of biological networks.
A/Prof. Jessica Mar Group Leader Australian Institute for Bioengineering and Nanotechnology The University of Queensland
Biography: A/Prof. Jessica Mar is a Group Leader at the Australian Institute for Bioengineering and Nanotechnology at The University of Queensland in Brisbane. The Mar group focuses on understanding variability in the transcriptome and how this informs the regulation of cell phenotypes. Jess received her PhD in Biostatistics from Harvard University in 2008. She was a postdoctoral fellow at the Dana‐Farber Cancer Institute in Boston (2008‐2011), and an Assistant Professor at Albert Einstein College of Medicine in New York (2011‐2018). She relocated back to Australia as an ARC Future Fellow in July 2018, and a major focus of her work is modelling the aging process using single‐cell bioinformatics. Jess has received several awards, including a Fulbright scholarship (2003), the Metcalf Prize for Stem Cell Research from the National Stem Cell Foundation of Australia (2017), and the LaDonne H. Shulman Award for Teaching Excellence (2017) from Albert Einstein College of Medicine.
Date: Friday 5 July 2019
Presentation title: What the heck is a Hackathon? How Hackathons promote creativity and community in bioinformatics
Abstract: By definition, a hackathon is a competition where teams compete against each other to find new solutions. But in reality, a hackathon is an opportunity to learn new skills, expand your network, and be creative. This talk will step through the basics of what a hackathon is, and why you might want to participate in these fun‐filled events that have steadily become a global phenomenon. For bioinformatics, whether your skills are fresh (as in learned this week alone) or expert, a hackathon can open doors to new friends, career paths and, more importantly, job opportunities that you may never have known existed.
~*~*~*~
Sponsors
UQ Genomics Initiative
School of Chemistry & Molecular Biosciences
Faculty of Science