2014 11-12 sbsm032rstatsprogramming.key

SBSM035 - Stats/Bioinformatics/Programming [email protected] http://yannick.poulet.org

Upload: yannick-wurm

Post on 02-Jul-2015


DESCRIPTION

R programming - for loop - function - regex

TRANSCRIPT

Page 1: 2014 11-12 sbsm032rstatsprogramming.key

SBSM035 - Stats/Bioinformatics/Programming

[email protected] http://yannick.poulet.org

Page 2: 2014 11-12 sbsm032rstatsprogramming.key

© Alex Wild & others

Page 4: 2014 11-12 sbsm032rstatsprogramming.key

© National Geographic

Atta leaf-cutter ants

Page 8: 2014 11-12 sbsm032rstatsprogramming.key

Oecophylla Weaver ants

© ameisenforum.de

Page 9: 2014 11-12 sbsm032rstatsprogramming.key

© ameisenforum.de

Weaver ants

Page 11: 2014 11-12 sbsm032rstatsprogramming.key

© forestryimages.org © wynnie@flickr

Page 12: 2014 11-12 sbsm032rstatsprogramming.key

Tofilski et al 2008

Forelius pusillus

Page 13: 2014 11-12 sbsm032rstatsprogramming.key

Tofilski et al 2008

Forelius pusillus hides the nest entrance at night

Page 17: 2014 11-12 sbsm032rstatsprogramming.key

Before

Workers staying outside die: “preventive self-sacrifice”

Tofilski et al 2008

Forelius pusillus hides the nest entrance at night

Page 18: 2014 11-12 sbsm032rstatsprogramming.key

Dorylus driver ants: ants with no home

© BBC

Page 19: 2014 11-12 sbsm032rstatsprogramming.key

Animal biomass (Brazilian rainforest), from Fittkau & Klinge 1973

[Chart; categories: Ants & termites; Other insects; Spiders; Earthworms; Soil fauna excluding earthworms, ants & termites; Amphibians; Reptiles; Birds; Mammals]

Page 20: 2014 11-12 sbsm032rstatsprogramming.key

We use modern technologies to understand insect societies.

• evolution of social behaviour

• molecules involved in social behaviour

• consequences of environmental change

Page 22: 2014 11-12 sbsm032rstatsprogramming.key

Big data is invading biology

Page 23: 2014 11-12 sbsm032rstatsprogramming.key

This changes everything.

454, Illumina, SOLiD...

Any lab can sequence anything!

Page 24: 2014 11-12 sbsm032rstatsprogramming.key

Big data is invading biology

• Genomics

• Biodiversity assessments

• Stool microbiome sequencing

• Personalized medicine

• Cancer genomics

• Sensor networks - e.g. tracking microclimates, recording sounds

• Aerial surveys (drones) - e.g. crop productivity, rainforest cover

• Camera traps

Page 27: 2014 11-12 sbsm032rstatsprogramming.key

Choosing a programming language

• Excel. Good: quick & dirty. Bad: easy to make mistakes; doesn’t scale.

• R. Good: numbers, stats, genomics. Bad: programming.

• Unix command-line == shell == bash. Good: can’t escape it; quick & dirty; HPC. Bad: programming, complicated things.

• Java (1990s). Good: user interfaces. Bad: overcomplicated.

• Perl (1980s). Everything.

• Python. Good: scripting, text. Bad: ugly.

• Ruby. Good: scripting, text.

• Javascript/Node. Good: scripting, flexibility (web & client), community. Bad: only little bio-stuff.

Page 28: 2014 11-12 sbsm032rstatsprogramming.key

First steps towards data handling

• Basic stats - done!

• Programming in R

• UNIX command-line

bioinformaticians

Page 30: 2014 11-12 sbsm032rstatsprogramming.key

Practicals

• Aim: get relevant data handling skills

• Doing things by hand:
  • impossible?
  • slow
  • error-prone

• Automate!

• Basic programming
  • in R
  • no stats!

Page 31: 2014 11-12 sbsm032rstatsprogramming.key

Practicals: contents

• Done:
  • data accessing/subsetting

• New:
  • search/replace
  • regular expressions: text search on steroids

• New:
  • functions: reusable pieces of work
  • loops: repeating the same thing many times

Page 33: 2014 11-12 sbsm032rstatsprogramming.key

• creating a vector
  give me a vector containing numbers from 5 to 11 (3 variants)

> myvector <- 5:11
> myvector <- seq(from=5, to=11, by=1)
> myvector <- c(5, 6, 7, 8, 9, 10, 11)
> myvector
[1]  5  6  7  8  9 10 11

• accessing a subset of a vector

> bigvector <- 150:100
> bigvector
 [1] 150 149 148 147 146 145 144 143 142 141 140 139 138 137 136 135 134 133 132
[20] 131 130 129 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113
[39] 112 111 110 109 108 107 106 105 104 103 102 101 100
> mysubset <- bigvector[myvector]
> mysubset
[1] 146 145 144 143 142 141 140

> subset(bigvector, bigvector > 120)
 [1] 150 149 148 147 146 145 144 143 142 141 140 139 138 137 136 135 134 133 132
[20] 131 130 129 128 127 126 125 124 123 122 121

Page 37: 2014 11-12 sbsm032rstatsprogramming.key

Regular expressions (regex): Text search on steroids.

Page 38: 2014 11-12 sbsm032rstatsprogramming.key

Regular expressions (regex): Text search on steroids.

Regular expression                        Finds
David                                     David
Dav(e|(id))                               David, Dave
Dav(e|(id)|(ide)|o)                       David, Dave, Davide, Davo
At{1,2}enborough                          Attenborough, Atenborough
Atte[nm]borough                           Attenborough, Attemborough
At{1,2}[ei][nm]bo{0,1}ro((ugh)|w){0,1}    Atimbro, Attenbrough, Atenborow

Easy counting, replacing all with “Sir David Attenborough”
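A minimal sketch of the counting and replacing step in R, using grepl() and gsub(). The example sentences are made up, and the pattern combines two rows of the table above for illustration; it is not taken from the slides.

```r
# Hypothetical example sentences containing different spellings
sentences <- c(
  "David Atenborough narrated it.",
  "I met Dave Attenborough once.",
  "Davide Attemborough? Never heard of him.",
  "This sentence is about ants."
)

# First-name variants followed by surname variants (combined for illustration)
pattern <- "Dav(e|(id)|(ide)|o) At{1,2}e[nm]borough"

# Counting: which sentences contain a match, and how many do?
has_match <- grepl(pattern, sentences)
n_matches <- sum(has_match)

# Replacing: substitute every match with the canonical spelling
cleaned <- gsub(pattern, "Sir David Attenborough", sentences)
print(n_matches)
print(cleaned)
```

Sentences without a match are returned unchanged by gsub().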

Page 39: 2014 11-12 sbsm032rstatsprogramming.key

Regex Special symbols

Regular expression    Finds                            Example
[aeiou]               any single vowel                 “e”
[aeiou]*              between 0 and infinity vowels    “eeooouuu”
[aeoiu]{1,3}          between 1 and 3 vowels           “oui”
a|i                   one of the 2 characters          “a”
((win)|(fail))        one of the two words in ()       “fail”
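The symbols in this table can be tried out with grepl(). This is only a sketch: the word list is invented, and the anchors ^ and $ (start and end of the string) are an addition not shown on the slide, used here so that the whole word has to match rather than just part of it.

```r
# Hypothetical test words
words <- c("e", "oui", "eeooouuu", "fail", "win", "draw")

# ^ and $ force the whole string to match, not just a part of it
one_vowel    <- grepl("^[aeiou]$", words)        # exactly one vowel
up_to_three  <- grepl("^[aeiou]{1,3}$", words)   # one to three vowels, nothing else
only_vowels  <- grepl("^[aeiou]*$", words)       # only vowels, any number of them
win_or_fail  <- grepl("^((win)|(fail))$", words) # one of the two words

# Show which words each pattern accepts
print(words[one_vowel])
print(words[up_to_three])
print(words[only_vowels])
print(words[win_or_fail])
```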

Page 40: 2014 11-12 sbsm032rstatsprogramming.key

More Regex Special symbols

• Google “Regular expression cheat sheet”

• ?regexp

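Regular expression    Synonymous with / finds
[[:digit:]]           [0-9] (in R, the class name [:digit:] goes inside a bracket expression)
[A-Za-z]              any unaccented letter (not the same as [A-z], which in ASCII order also matches [ \ ] ^ _ `)
\s                    whitespace
.                     any single character
.+                    one to many of anything
b*                    between 0 and infinity letter ‘b’
[^abc]                any character other than a, b or c
\(                    (
[[:punct:]]           any of these: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~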
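A short sketch of how these symbols look inside R code. The input string is invented. Note that a backslash has to be doubled inside an R string, so the regex \s is typed as "\\s".

```r
# Hypothetical input string
x <- "Sample 42 (ants), weight: 3.5 mg"

# POSIX classes sit inside a bracket expression: [[:digit:]]
digits_removed <- gsub("[[:digit:]]", "", x)

# "\\s" in an R string is the regex \s (whitespace)
no_spaces <- gsub("\\s", "", x)

# "\\(" matches a literal opening parenthesis
has_paren <- grepl("\\(", x)

# [^abc] matches any single character other than a, b or c
only_abc <- gsub("[^abc]", "", "abracadabra")

# [[:punct:]] matches punctuation characters
no_punct <- gsub("[[:punct:]]", "", x)

print(c(digits_removed, no_spaces, only_abc, no_punct))
print(has_paren)
```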

Page 41: 2014 11-12 sbsm032rstatsprogramming.key

Your turn

Make a regular expression

• matching “LMTSOMIKTIP” and “LMVSONYYIKTIP” but not “LMVSQMIKTIP”

• matching all variants of “ok” (e.g., “O.K.”, “Okay”…)
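One possible answer to each exercise, as a sketch only; many other patterns would work. The variable names, the anchors ^ and $, and the extra test strings ("ok", "OK", "oak") are assumptions added for illustration.

```r
# Exercise 1: allow T or V, require "O", allow M or N, then 0 to 2 "Y"
peptide_pattern <- "^LM[TV]SO[MN]Y{0,2}IKTIP$"
peptide_hits <- grepl(peptide_pattern,
                      c("LMTSOMIKTIP", "LMVSONYYIKTIP", "LMVSQMIKTIP"))
print(peptide_hits)

# Exercise 2: either case, optional dots, optional "ay" at the end
ok_pattern <- "^[Oo]\\.?[Kk]\\.?(ay)?$"
ok_hits <- grepl(ok_pattern, c("ok", "OK", "O.K.", "Okay", "oak"))
print(ok_hits)
```

The third string fails the first pattern because it has "Q" where "O" is required, and "oak" fails the second because "a" sits where "k" is expected.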

Page 44: 2014 11-12 sbsm032rstatsprogramming.key

Functions

Page 45: 2014 11-12 sbsm032rstatsprogramming.key

Functions

• R has many, e.g.: plot(), t.test()

• Making your own:

tree_age_estimate <- function(diameter, species) {
  # ...do the magic... maybe something like:
  growth.rate <- growth.rates[ species ]
  age.estimate <- diameter / growth.rate
  # ...
  return(age.estimate)
}

> tree_age_estimate(25, "White Oak")
[1] 66
> tree_age_estimate(60, "Carya ovata")
[1] 190
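A runnable sketch of the same idea. The slide does not define growth.rates, so the values below are invented purely for illustration (they are not real growth rates, and the results therefore differ from the numbers on the slide). The check for unknown species is also an addition.

```r
# Hypothetical growth rates (diameter units gained per year); invented values
growth.rates <- c("White Oak" = 0.5, "Carya ovata" = 0.25)

tree_age_estimate <- function(diameter, species) {
  # Fail early if we have no growth rate for this species
  if (!species %in% names(growth.rates)) {
    stop("Unknown species: ", species)
  }
  # [[ ]] extracts the value without its name
  growth.rate <- growth.rates[[species]]
  age.estimate <- diameter / growth.rate
  return(age.estimate)
}

print(tree_age_estimate(25, "White Oak"))    # 25 / 0.5
print(tree_age_estimate(60, "Carya ovata"))  # 60 / 0.25
```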

Page 46: 2014 11-12 sbsm032rstatsprogramming.key

Your turn

• Create a function that takes as input a length in centimetres and returns the length in feet+inches.
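One possible solution, as a sketch: the function name and the choice to return a named vector are assumptions. It relies on 1 inch being 2.54 cm and 1 foot being 12 inches.

```r
cm_to_feet_inches <- function(length_cm) {
  total_inches <- length_cm / 2.54   # 1 inch is 2.54 cm
  feet <- total_inches %/% 12        # whole feet (integer division)
  inches <- total_inches %% 12       # leftover inches (remainder)
  return(c(feet = feet, inches = inches))
}

height <- cm_to_feet_inches(180)
print(height)
```

For 180 cm this gives 5 feet and a little under 11 inches.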

Page 47: 2014 11-12 sbsm032rstatsprogramming.key

Function

Page 48: 2014 11-12 sbsm032rstatsprogramming.key

Loops

Page 49: 2014 11-12 sbsm032rstatsprogramming.key

“for” Loop

> possible_colours <- c('blue', 'cyan', 'sky-blue', 'navy blue',
+                       'steel blue', 'royal blue', 'slate blue', 'light blue',
+                       'dark blue', 'prussian blue', 'indigo', 'baby blue',
+                       'electric blue')

> possible_colours
 [1] "blue"          "cyan"          "sky-blue"      "navy blue"
 [5] "steel blue"    "royal blue"    "slate blue"    "light blue"
 [9] "dark blue"     "prussian blue" "indigo"        "baby blue"
[13] "electric blue"

> for (colour in possible_colours) {
+   print(paste("The sky is so, oh so", colour))
+ }
[1] "The sky is so, oh so blue"
[1] "The sky is so, oh so cyan"
[1] "The sky is so, oh so sky-blue"
[1] "The sky is so, oh so navy blue"
[1] "The sky is so, oh so steel blue"
[1] "The sky is so, oh so royal blue"
[1] "The sky is so, oh so slate blue"
[1] "The sky is so, oh so light blue"
[1] "The sky is so, oh so dark blue"
[1] "The sky is so, oh so prussian blue"
[1] "The sky is so, oh so indigo"
[1] "The sky is so, oh so baby blue"
[1] "The sky is so, oh so electric blue"

Page 50: 2014 11-12 sbsm032rstatsprogramming.key

Your turn

• What does the following code do (decompose on pen and paper)

Page 51: 2014 11-12 sbsm032rstatsprogramming.key

Your turn

• Create a loop that multiplies the numbers from ‘x’ to ‘y’
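One possible solution, as a sketch: wrapping the loop in a function called multiply_range is an assumption made so it can be reused. The loop keeps a running product and multiplies it by each number in turn.

```r
multiply_range <- function(x, y) {
  product <- 1                       # start from the neutral element
  for (number in x:y) {
    product <- product * number      # update the running product
  }
  return(product)
}

result <- multiply_range(3, 5)       # 3 * 4 * 5
print(result)
```

The built-in prod(3:5) gives the same answer and can be used to check the loop.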
