2014 11-12 sbsm032rstatsprogramming.key
R programming - for loop - function - regex
SBSM035 - Stats/Bioinformatics/Programming
[email protected]
yannick.poulet.org
© Alex Wild & others
© National Geographic
Atta leaf-cutter ants
Oecophylla Weaver ants
© ameisenforum.de
© forestryimages.org, © wynnie@flickr
Tofilski et al 2008
Forelius pusillus
Forelius pusillus hides the nest entrance at night
Before
Workers staying outside die: “preventive self-sacrifice”
Dorylus driver ants: ants with no home
© BBC
Animal biomass (Brazilian rainforest), from Fittkau & Klinge 1973
[Figure: biomass chart with segments for other insects; amphibians; reptiles; birds; mammals; earthworms; spiders; soil fauna excluding earthworms, ants & termites; ants & termites]
We use modern technologies to understand insect societies.
• evolution of social behaviour
• molecules involved in social behaviour
• consequences of environmental change
Big data is invading biology
This changes everything.
454, Illumina, SOLiD...
Any lab can sequence anything!
Big data is invading biology
• Genomics
• Biodiversity assessments
• Stool microbiome sequencing
• Personalized medicine
• Cancer genomics
• Sensor networks - e.g. tracking microclimates, recording sounds
• Aerial surveys (drones) - e.g. crop productivity, rainforest cover
• Camera traps
Choosing a programming language (Good / Bad)
Excel. Good: quick & dirty. Bad: easy to make mistakes, doesn’t scale.
R. Good: numbers, stats, genomics. Bad: programming.
Unix command-line == shell == bash. Good: can’t escape it, quick & dirty, HPC. Bad: programming, complicated things.
Java. 1990s, user interfaces, overcomplicated.
Perl. 1980s. Everything.
Python. Good: scripting, text. Bad: ugly.
Ruby. Good: scripting, text.
Javascript/Node. Good: scripting, flexibility (web & client), community. Bad: only little bio-stuff.
First steps towards data handling
• Basic stats - done!
• Programming in R
• UNIX command-line
bioinformaticians
Practicals
• Aim: get relevant data handling skills
• Doing things by hand:
  • impossible?
  • slow
  • error-prone
• Automate!
• Basic programming
  • in R
  • no stats!
Practicals: contents
• Done:
  • data accessing/subsetting
• New:
  • search/replace
  • regular expressions (text search on steroids)
• New:
  • functions (reusable pieces of work)
  • loops (repeating the same thing many times)
• creating a vector
  • give me a vector containing numbers from 5 to 11 (3 variants)
> myvector <- 5:11
> myvector <- seq(from=5, to=11, by=1)
> myvector <- c(5, 6, 7, 8, 9, 10, 11)
> myvector
[1]  5  6  7  8  9 10 11
• accessing a subset
  • of a vector
> bigvector <- 150:100
> bigvector
 [1] 150 149 148 147 146 145 144 143 142 141 140 139 138 137 136 135 134 133 132
[20] 131 130 129 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113
[39] 112 111 110 109 108 107 106 105 104 103 102 101 100
> mysubset <- bigvector[myvector]
> mysubset
[1] 146 145 144 143 142 141 140
> subset(bigvector, bigvector > 120)
 [1] 150 149 148 147 146 145 144 143 142 141 140 139 138 137 136 135 134 133 132
[20] 131 130 129 128 127 126 125 124 123 122 121
Regular expressions (regex): Text search on steroids.
Regular expression                         Finds
David                                      David
Dav(e|(id))                                David, Dave
Dav(e|(id)|(ide)|o)                        David, Dave, Davide, Davo
At{1,2}enborough                           Attenborough, Atenborough
Atte[nm]borough                            Attenborough, Attemborough
At{1,2}[ei][nm]bo{0,1}ro((ugh)|w){0,1}     Atimbro, attenbrough, atenborow (when ignoring case)
Easy counting, replacing all with “Sir David Attenborough”
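Counting and replacing could look roughly like this in R, using the base functions grepl() and gsub(). This is only a sketch: the spellings vector is invented for illustration, the pattern is assembled from pieces of the table above, and ignore.case = TRUE is an assumption so that lower-case spellings also match.

```r
# Hypothetical spellings, invented for this illustration
spellings <- c("David Attenborough", "Dave Atenborough",
               "Davo Attemborough", "Charles Darwin")

# Pattern assembled from pieces of the table above
pattern <- "Dav(e|(id)|(ide)|o) At{1,2}e[nm]borough"

# Counting: grepl() gives TRUE/FALSE per element; sum() counts the TRUEs
n_matches <- sum(grepl(pattern, spellings, ignore.case = TRUE))

# Replacing: gsub() swaps every match for the canonical name
cleaned <- gsub(pattern, "Sir David Attenborough", spellings,
                ignore.case = TRUE)

print(n_matches)
print(cleaned)
```

Strings that do not match the pattern (here "Charles Darwin") pass through gsub() unchanged.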
Regex Special symbols
Regular expression    Finds                                     Example
[aeiou]               any single vowel                          “e”
[aeiou]*              between 0 and infinitely many vowels      “eeooouuu”
[aeoiu]{1,3}          between 1 and 3 vowels                    “oui”
a|i                   one of the 2 characters                   “a”
((win)|(fail))        one of the two words in ()                “fail”
More Regex Special symbols
• Google “Regular expression cheat sheet”
• ?regexp
Regular expression    Synonymous with / finds
[:digit:]             [0-9] (written [[:digit:]] in R)
[:alpha:]             [A-Za-z] (not quite [A-z], which in ASCII also matches [ \ ] ^ _ `)
\s                    whitespace
.                     any single character
.+                    one to many of anything
b*                    between 0 and infinitely many of the letter ‘b’
[^abc]                any character other than a, b or c
\(                    (
[:punct:]             any of these: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
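A few of the symbols listed above can be tried out with grepl(). This is a sketch only: the test strings are invented, and the ^ and $ anchors (start and end of string) are an addition not shown on the slide. Inside an R string, each backslash has to be doubled.

```r
# Hypothetical test strings, invented for this illustration
words <- c("oui", "b", "x2", "hello world", "(a)")

# Anchored with ^ and $: the whole string consists of 1 to 3 vowels
only_vowels <- grepl("^[aeiou]{1,3}$", words)

# [[:digit:]] behaves like [0-9]; note the double brackets in R
has_digit <- grepl("[[:digit:]]", words)

# \s (whitespace) is written "\\s" inside an R string
has_space <- grepl("\\s", words)

# \( (a literal opening parenthesis) is written "\\(" in R
has_paren <- grepl("\\(", words)

# [^abc] finds any character other than a, b or c
has_other <- grepl("[^abc]", words)

print(data.frame(words, only_vowels, has_digit, has_space, has_paren, has_other))
```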
Your turn
Make a regular expression
• matching “LMTSOMIKTIP” and “LMVSONYYIKTIP” but not “LMVSQMIKTIP”
• matching all variants of “ok” (e.g., “O.K.”, “Okay”…)
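One way to check a candidate answer to the first exercise is to run it against all three sequences with grepl(). The candidate pattern below is just one possible answer, not necessarily the intended one; other patterns work too.

```r
# The three sequences from the exercise
sequences <- c("LMTSOMIKTIP", "LMVSONYYIKTIP", "LMVSQMIKTIP")

# One possible candidate (an assumption, not the official answer):
# T or V in position 3, then "SO", then one or more of M, N or Y
candidate <- "LM[TV]SO[MNY]+IKTIP"

# TRUE where the candidate matches, FALSE where it does not
matches <- grepl(candidate, sequences)
names(matches) <- sequences
print(matches)
```

The first two sequences should come out TRUE and the third FALSE.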
Functions
• R has many, e.g.: plot(), t.test()
• Making your own:
tree_age_estimate <- function(diameter, species) {
  # ...do the magic... maybe something like:
  growth.rate  <- growth.rates[ species ]
  age.estimate <- diameter / growth.rate
  # ...
  return(age.estimate)
}
> tree_age_estimate(25, "White Oak")
[1] 66
> tree_age_estimate(60, "Carya ovata")
[1] 190
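A runnable version of this idea might look as follows. The growth.rates values are invented placeholders (they are not real measurements and will not reproduce the numbers on the slide), and the check for unknown species is an addition.

```r
# Hypothetical growth rates (diameter units per year); values are invented
growth.rates <- c("White Oak" = 0.38, "Carya ovata" = 0.32)

tree_age_estimate <- function(diameter, species) {
  # Stop early if the species has no growth rate on record
  if (!species %in% names(growth.rates)) {
    stop("Unknown species: ", species)
  }
  growth.rate  <- growth.rates[[species]]  # [[ ]] returns the value without its name
  age.estimate <- diameter / growth.rate
  return(age.estimate)
}

print(tree_age_estimate(38, "White Oak"))
```

Because the lookup table lives outside the function, new species can be added without touching the function body.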
Your turn
• Create a function that takes as input a length in centimetres and returns the length in feet+inches.
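One possible sketch of such a function is below. The name cm_to_feet_inches and the choice to return a named vector are assumptions; the conversion relies on 1 inch = 2.54 cm and 1 foot = 12 inches.

```r
cm_to_feet_inches <- function(length_cm) {
  total_inches <- length_cm / 2.54   # 1 inch is exactly 2.54 cm
  feet   <- total_inches %/% 12      # whole feet (integer division)
  inches <- total_inches %% 12       # inches left over after the whole feet
  return(c(feet = feet, inches = inches))
}

result <- cm_to_feet_inches(180)
print(result)
```

Returning the two numbers separately leaves rounding and formatting (for example with round() and paste()) to the caller.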
Function
Loops
“for” Loop
> possible_colours <- c('blue', 'cyan', 'sky-blue', 'navy blue', 'steel blue', 'royal blue', 'slate blue', 'light blue', 'dark blue', 'prussian blue', 'indigo', 'baby blue', 'electric blue')
> possible_colours
 [1] "blue"          "cyan"          "sky-blue"      "navy blue"
 [5] "steel blue"    "royal blue"    "slate blue"    "light blue"
 [9] "dark blue"     "prussian blue" "indigo"        "baby blue"
[13] "electric blue"
> for (colour in possible_colours) {
+   print(paste("The sky is so, oh so", colour))
+ }
[1] "The sky is so, oh so blue"
[1] "The sky is so, oh so cyan"
[1] "The sky is so, oh so sky-blue"
[1] "The sky is so, oh so navy blue"
[1] "The sky is so, oh so steel blue"
[1] "The sky is so, oh so royal blue"
[1] "The sky is so, oh so slate blue"
[1] "The sky is so, oh so light blue"
[1] "The sky is so, oh so dark blue"
[1] "The sky is so, oh so prussian blue"
[1] "The sky is so, oh so indigo"
[1] "The sky is so, oh so baby blue"
[1] "The sky is so, oh so electric blue"
Your turn
• What does the following code do? (decompose on pen and paper)
Your turn
• Create a loop that multiplies the numbers from ‘x’ to ‘y’
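One way such a loop could be written is sketched below. The values 3 and 6 for x and y are arbitrary examples, and the variable name product is an assumption.

```r
x <- 3
y <- 6

product <- 1                      # start from 1, the neutral element of multiplication
for (number in x:y) {
  product <- product * number     # multiply in each number from x to y in turn
}

print(product)                    # 3 * 4 * 5 * 6 = 360
```

The built-in prod(x:y) gives the same result and is a handy way to check the loop.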