2015 11-17-programming inr.key

Programming in R (and some other stuff) [email protected] https://wurmlab.github.io

Uploaded by: yannick-wurm

Posted on 25-Jan-2017


TRANSCRIPT

Page 1: 2015 11-17-programming inr.key

Programming in R (and some other stuff)

[email protected] https://wurmlab.github.io

Page 2

© Alex Wild & others

Page 3

Pages 4-6

© National Geographic

Atta leaf-cutter ants

Page 7

Pages 8-10

Oecophylla weaver ants

© ameisenforum.de

Page 11

© forestryimages.org; © wynnie@flickr

Page 12

Tofilski et al 2008

Forelius pusillus

Pages 13-16

Tofilski et al 2008

Forelius pusillus hides the nest entrance at night

Page 17

Before

Workers staying outside die: “preventive self-sacrifice”

Tofilski et al 2008

Forelius pusillus hides the nest entrance at night

Page 18

Dorylus driver ants: ants with no home

© BBC

Page 19

Figure: Animal biomass in a Brazilian rainforest (from Fittkau & Klinge 1973). Categories shown: ants & termites; soil fauna excluding earthworms, ants & termites; earthworms; spiders; other insects; amphibians; reptiles; birds; mammals.

Page 20

We use modern technologies to understand insect societies:

• evolution of social behaviour

• molecules involved in social behaviour

• consequences of environmental change

Page 21
Page 22
Page 23

Big data is invading biology

Page 24

This changes everything.

Any lab can sequence anything!

Page 25

http://gregoryzynda.com/ncbi/genome/python/2014/03/31/ncbi-genome.html

Page 26

BIG DATA

Page 27

Big data is invading biology

• Genomics

• Cancer genomics

• Biodiversity assessments

• Stool microbiome sequencing

• Personalized medicine

• Sensor networks, e.g., tracking microclimates, recording sounds

• Huge medical studies

• Aerial surveys (drones), e.g., crop productivity, rainforest cover

• Camera traps

Page 28
Page 29

Learning to deal with big data takes time

Page 30
Page 31

Practicals

• Aim: get relevant data-handling skills

• Doing things by hand is:
  • impossible?
  • slow
  • error-prone

• Automate!

• Basic programming:
  • in R
  • no stats!

Page 32

Why R? 😳😟

😴😡😖😥

Page 33

Practicals: contents

• Done:
  • data accessing/subsetting

• New:
  • search/replace
  • regular expressions: text search on steroids

• New:
  • functions: reusable pieces of work
  • loops: repeating the same thing many times

• Friday: Introduction to Unix & high-performance computing

Page 34
Page 35

• create a variable that contains the number 35

• create a variable that contains the string “I love tofu”

• give me a vector containing the sequence of numbers from 5 to 11

• access the second number

• replace the second number with 42

• add 5 to the second number

• now add 5 to all numbers

• now add an extra number: 1999

• can you sum all the numbers?
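One possible set of answers to the exercises above, written as a plain R script. The variable names (my_number, my_string, total) are illustrative choices, not names the practical requires.

```r
# One possible way to work through the exercises, step by step
my_number <- 35                   # a variable containing the number 35
my_string <- "I love tofu"        # a variable containing a string

my_vector <- 5:11                 # the sequence 5, 6, 7, 8, 9, 10, 11
print(my_vector[2])               # access the second number (6)

my_vector[2] <- 42                # replace the second number with 42
my_vector[2] <- my_vector[2] + 5  # add 5 to the second number (now 47)
my_vector <- my_vector + 5        # add 5 to all numbers at once (vectorised)
my_vector <- c(my_vector, 1999)   # add an extra number at the end

total <- sum(my_vector)           # sum all the numbers
print(total)
```

Note that my_vector + 5 needs no loop: R applies arithmetic to every element of a vector.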

Page 36

• creating a vector

> my_vector <- c(5, 6, 7, 8, 9, 10, 11)
> my_vector <- 5:11
> my_vector <- seq(from=5, to=11, by=1)
> my_vector
[1]  5  6  7  8  9 10 11
> length(my_vector)
[1] 7
> (10 > 30)
[1] FALSE
> my_vector > 8
[1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
> my_vector[my_vector > 8]
[1]  9 10 11
> other_vector <- my_vector[my_vector > 8]
> other_vector
[1]  9 10 11
> other_vector + 3
[1] 12 13 14

• give me a vector containing numbers from 5 to 11 (3 variants)

Page 37

• accessing a subset of a vector

> big_vector <- 150:100
> big_vector
 [1] 150 149 148 147 146 145 144 143 142 141 140 139 138 137 136 135 134 133 132
[20] 131 130 129 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113
[39] 112 111 110 109 108 107 106 105 104 103 102 101 100
> big_vector[5]
[1] 146
> mysubset <- big_vector[my_vector]
> mysubset
[1] 146 145 144 143 142 141 140
> big_vector > 130
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[13]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE
> subset(x = big_vector, subset = big_vector > 140)
 [1] 150 149 148 147 146 145 144 143 142 141
> big_vector[big_vector >= 140]
 [1] 150 149 148 147 146 145 144 143 142 141 140

> my_vector
[1]  5  6  7  8  9 10 11

Page 38

Regular expressions (regex): Text search on steroids.

Page 39

who dat?

Page 40
Page 41
Page 42

Regular expressions (regex): Text search on steroids.

Regular expression                         Finds
David                                      David
Dav(e|(id))                                David, Dave
Dav(e|(id)|(ide)|o)                        David, Dave, Davide, Davo
At{1,2}enborough                           Attenborough, Atenborough
Atte[nm]borough                            Attenborough, Attemborough
At{1,2}[ei][nm]bo{0,1}ro((ugh)|w){0,1}     Atimbro; attenbrough only if case is ignored (not ateinborow: only one e or i may follow the t's)

Makes it easy to count all the variants, or to replace them all with “Sir David Attenborough”.

Match: “HATSOMIKTIP”, “HAVSONYYIKTIP”; do not match: “HAVSQMIKTIP”
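A sketch of how the Attenborough patterns could be tried in R with grepl() (does each element match?) and gsub() (replace every match). The vector of spellings is made up for illustration. Run this way, matching is case-sensitive unless ignore.case = TRUE is given, and the last pattern does not match "ateinborow".

```r
# Made-up spellings to test the patterns against
spellings <- c("David", "Dave", "Davide", "Davo",
               "Attenborough", "Atenborough", "Attemborough",
               "Atimbro", "attenbrough", "ateinborow")

fuzzy <- "At{1,2}[ei][nm]bo{0,1}ro((ugh)|w){0,1}"

# grepl() gives TRUE/FALSE per element; matching is case-sensitive by default
strict_hits <- spellings[grepl(fuzzy, spellings)]
print(strict_hits)

# ignore.case = TRUE lets the lowercase "attenbrough" match as well
loose_hits <- spellings[grepl(fuzzy, spellings, ignore.case = TRUE)]

# Easy counting...
n_matches <- sum(grepl(fuzzy, spellings, ignore.case = TRUE))
# ...and replacing every match with the full name
standardised <- gsub(fuzzy, "Sir David Attenborough", spellings,
                     ignore.case = TRUE)
print(standardised)
```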

Page 43

Regex special symbols

Regular expression    Finds                            Example
[aeiou]               any single vowel                 “e”
[aeiou]*              between 0 and infinity vowels    “eeooouuu”
[aeoiu]{1,3}          between 1 and 3 vowels           “oui”
a|i                   one of the 2 characters          “a” or “i”
((win)|(fail))        one of the two words in ()       “fail”
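The symbols in the table can be checked directly in R. The word list below is made up; regexpr() finds the first match in each word and regmatches() extracts it. The "*" line shows a common surprise: because "*" allows zero repeats, "[aeiou]*" matches every string.

```r
words <- c("tree", "sky", "queueing", "win", "fail", "draw")

has_vowel <- grepl("[aeiou]", words)           # any single vowel
# First run of 1 to 3 vowels in each word ("sky" has none, so it is dropped)
vowel_runs <- regmatches(words, regexpr("[aeiou]{1,3}", words))
# "*" allows zero vowels, so this matches (an empty string in) every word
always <- grepl("[aeiou]*", words)
a_or_i <- grepl("a|i", words)                  # either character
win_or_fail <- grepl("((win)|(fail))", words)  # either word
print(vowel_runs)
```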


Page 44

More regex special symbols

• Google “Regular expression cheat sheet”

• ?regexp

Regular expression    Synonymous with / finds
[[:digit:]]           [0-9]
[A-Za-z]              any letter (not the same as [A-z], which also matches [ \ ] ^ _ and `)
\s                    whitespace
.                     any single character
.+                    one or more of anything
b*                    between 0 and infinity letters ‘b’
[^abc]                any character other than a, b or c
\(                    a literal (
[[:punct:]]           any of these: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~

In an R string, backslashes must be doubled: write "\\s" and "\\(".
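A short sketch of these symbols inside R calls, on made-up strings. Note the doubled backslashes in "\\s+" and "\\(", and that the POSIX classes go inside a second pair of brackets.

```r
x <- c("Formica (aquilonia)", "Lasius  niger!", "Myrmica 2015")  # made-up strings

no_digits  <- gsub("[[:digit:]]", "", x)   # [[:digit:]] is the same as [0-9]
no_punct   <- gsub("[[:punct:]]", "", x)   # strip punctuation
one_space  <- gsub("\\s+", " ", x)         # collapse runs of whitespace
has_paren  <- grepl("\\(", x)              # a literal "(" must be escaped
first_word <- sub(" .+", "", x)            # a space then one or more of anything
print(first_word)
```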

Page 45
Page 46

You want to scan a protein sequence database for a particular binding site. Type a single regular expression that will match the first two of the following peptide sequences, but NOT the last one:

"HATSOMIKTIP"  "HAVSONYYIKTIP"  "HAVSQMIKTIP"

Page 47

(rubular)
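Besides an online tester, a candidate answer can be checked in R against both the "yes" and the "not" examples. "HA[TV]SO" is just one pattern that works here, not the only answer.

```r
peptides <- c("HATSOMIKTIP", "HAVSONYYIKTIP", "HAVSQMIKTIP")

# One candidate answer (many others work): "HA", then T or V, then "SO"
candidate <- "HA[TV]SO"
hits <- grepl(candidate, peptides)
print(hits)   # the first two should be TRUE, the last FALSE
```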

Page 48

Variants of a microsatellite sequence are responsible for differential expression of the vasopressin receptor, and in turn for differences in social behaviour in voles and other species. Create a regular expression that finds dinucleotide microsatellite repeats such as AGAGAGAGAGAGAGAG, with lengths of 5 to 500.
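A minimal sketch, assuming "length" means the number of AG units (5 to 500 copies). The test sequences are made up. perl = TRUE is used as a precaution: R's default regex engine may reject {m,n} counts above 255, whereas PCRE accepts 500. The pattern also finds the first 500 units inside any longer run, so rejecting longer runs would need extra work.

```r
# Assumption: "length 5 to 500" = 5 to 500 copies of the AG dinucleotide
microsat <- "(AG){5,500}"

# Made-up test sequences built with strrep()
seqs <- c(paste0("TT", strrep("AG", 5), "CC"),   # 5 repeats: should match
          paste0("TT", strrep("AG", 3), "CC"),   # 3 repeats: too short
          paste0("CC", strrep("AG", 8), "TT"))   # 8 repeats, as on the slide

has_repeat <- grepl(microsat, seqs, perl = TRUE)
# Extract the repeat itself from the sequences that contain one
found <- regmatches(seqs, regexpr(microsat, seqs, perl = TRUE))
print(has_repeat)
print(nchar(found))
```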

Page 49

Again

Make a regular expression

• matching “LMTSOMIKTIP” and “LMVSONYYIKTIP” but not “LMVSQMIKTIP”

• matching all variants of “ok” (e.g., “O.K.”, “Okay”…)
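One possible way to check answers to both bullets. The "ok" pattern is a guess that covers the listed variants; spellings such as "okey" or "k" would need more alternatives. ^ and $ anchor the whole string, so "oak" or "OKAPI" are not accepted.

```r
# Bullet 1: same idea as the HA... peptides, adapted to LM...
lm_peptides <- c("LMTSOMIKTIP", "LMVSONYYIKTIP", "LMVSQMIKTIP")
lm_hits <- grepl("LM[TV]SO", lm_peptides)

# Bullet 2: a candidate pattern for variants of "ok"
#   \\.? is an optional literal dot; (ay)? is an optional "ay"
ok_pattern <- "^o\\.?k\\.?(ay)?$"
words <- c("ok", "OK", "O.K.", "Okay", "okay", "Ok.", "oak", "OKAPI")
ok_hits <- grepl(ok_pattern, words, ignore.case = TRUE)
print(words[ok_hits])
```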

Page 50
Page 51

Ok… so how do we use this?

• ?grep

• ?gsub
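A quick comparison of the functions documented under ?grep and ?gsub, on a made-up vector of genus names: grep() returns positions (or values), grepl() returns TRUE/FALSE, sub() replaces only the first match in each element and gsub() replaces all of them.

```r
genera <- c("Formica", "Lasius", "Myrmica", "Pachycondyla")

positions <- grep("ica", genera)                 # indices of matching elements
matching  <- grep("ica", genera, value = TRUE)   # the matching elements themselves
flags     <- grepl("ica", genera)                # TRUE/FALSE for every element

first_a <- sub("a", "A", genera)                 # first match in each element only
all_a   <- gsub("a", "A", genera)                # every match
print(all_a)
```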

Page 52

Which species names include ‘y’? Create a vector with only species names, but replace all ‘y’ with ‘Y’!

ants <- read.table("https://goo.gl/3Ek1dL")
colnames(ants) <- c("genus", "species")

Remove all vowels

Replace all vowels with ‘o’
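A sketch of these exercises on a small stand-in table, so it runs without the download. The first six rows are the ones printed by head(ants) later in the deck. The Temnothorax nylanderi row is added here only so that one species name contains a 'y'.

```r
# Stand-in for the ants table (stringsAsFactors = FALSE keeps plain strings)
ants <- data.frame(
  genus   = c("Anergates", "Camponotus", "Crematogaster", "Formica",
              "Formica", "Formica", "Temnothorax"),
  species = c("atratulus", "sp.", "scutellaris", "aquilonia",
              "cunicularia", "exsecta", "nylanderi"),
  stringsAsFactors = FALSE
)

with_y    <- grep("y", ants$species, value = TRUE)  # species names containing 'y'
species_Y <- gsub("y", "Y", ants$species)           # species only, every 'y' -> 'Y'
no_vowels <- gsub("[aeiou]", "", ants$species)      # remove all vowels
all_o     <- gsub("[aeiou]", "o", ants$species)     # replace all vowels with 'o'
print(no_vowels)
```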

Page 53
Page 54

Functions

Page 55

Functions

• R has many, e.g. plot(), t.test()

• Making your own:

tree_age_estimate <- function(diameter, species) {
  # growth_rates: a named vector of growth rates per species, assumed to be defined already
  growth_rate <- growth_rates[species]
  age_estimate <- diameter / growth_rate
  return(age_estimate)
}

> tree_age_estimate(25, "White Oak")
66
> tree_age_estimate(60, "Carya ovata")
190
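The function above relies on a growth_rates vector that the slide does not show. A runnable sketch with made-up growth rates (so the ages differ from the 66 and 190 above). Here the rates are passed in as an argument with a default, rather than read silently from a global variable, and [[ ]] returns the bare number without its name.

```r
# Made-up growth rates (diameter units per year), for illustration only
example_growth_rates <- c("White Oak" = 0.5, "Carya ovata" = 0.25)

tree_age_estimate <- function(diameter, species,
                              growth_rates = example_growth_rates) {
  growth_rate <- growth_rates[[species]]   # [[ ]] drops the name
  age_estimate <- diameter / growth_rate
  return(age_estimate)
}

oak_age     <- tree_age_estimate(25, "White Oak")     # 25 / 0.5
hickory_age <- tree_age_estimate(60, "Carya ovata")   # 60 / 0.25
print(c(oak_age, hickory_age))
```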

Page 56

Make a function that converts Fahrenheit to Celsius

(subtract 32, then divide the result by 1.8)
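One possible answer, following the recipe on the slide. Because R arithmetic is vectorised, the same function also converts a whole vector of temperatures at once.

```r
fahrenheit_to_celsius <- function(fahrenheit) {
  celsius <- (fahrenheit - 32) / 1.8   # subtract 32, then divide by 1.8
  return(celsius)
}

boiling <- fahrenheit_to_celsius(212)
several <- fahrenheit_to_celsius(c(32, 50, 98.6))   # works on vectors too
print(round(several, 1))
```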

Page 57

Loops

Page 58

“for” Loop

> possible_colours <- c('blue', 'cyan', 'sky-blue', 'navy blue', 'steel blue', 'royal blue', 'slate blue', 'light blue', 'dark blue', 'prussian blue', 'indigo', 'baby blue', 'electric blue')

> possible_colours
 [1] "blue" "cyan" "sky-blue" "navy blue"
 [5] "steel blue" "royal blue" "slate blue" "light blue"
 [9] "dark blue" "prussian blue" "indigo" "baby blue"
[13] "electric blue"

> for (colour in possible_colours) {
+   print(paste("The sky is so, oh so", colour))
+ }
[1] "The sky is so, oh so blue"
[1] "The sky is so, oh so cyan"
[1] "The sky is so, oh so sky-blue"
[1] "The sky is so, oh so navy blue"
[1] "The sky is so, oh so steel blue"
[1] "The sky is so, oh so royal blue"
[1] "The sky is so, oh so slate blue"
[1] "The sky is so, oh so light blue"
[1] "The sky is so, oh so dark blue"
[1] "The sky is so, oh so prussian blue"
[1] "The sky is so, oh so indigo"
[1] "The sky is so, oh so baby blue"
[1] "The sky is so, oh so electric blue"

Page 59

What does this loop do?

for (index in 10:1) {
  print(paste(index, "mins before lunch"))
}

Page 60

Again

• What does the following code do? (Decompose it with pen and paper.)

Page 61

for (letter in LETTERS) {
  begins_with <- paste("^", letter, sep="")
  matches <- grep(pattern = begins_with, x = ants$genus)
  print(paste(length(matches), "begin with", letter))
}

> LETTERS
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"
> ants <- read.table("https://goo.gl/3Ek1dL")
> colnames(ants) <- c("genus", "species")
> head(ants)
          genus     species
1     Anergates   atratulus
2    Camponotus         sp.
3 Crematogaster scutellaris
4       Formica   aquilonia
5       Formica cunicularia
6       Formica     exsecta

What does this loop do?
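To check your pen-and-paper answer, here is a sketch that runs the same loop on the six rows printed by head(ants) (a stand-in for the full table), storing each count instead of only printing it. The table() line is a loop-free alternative using a different technique, shown for comparison.

```r
# Stand-in data: the six rows shown by head(ants) above
ants <- data.frame(genus   = c("Anergates", "Camponotus", "Crematogaster",
                               "Formica", "Formica", "Formica"),
                   species = c("atratulus", "sp.", "scutellaris",
                               "aquilonia", "cunicularia", "exsecta"),
                   stringsAsFactors = FALSE)

counts <- integer(0)
for (letter in LETTERS) {
  begins_with <- paste0("^", letter)       # e.g. "^A": starts with A
  matches <- grep(pattern = begins_with, x = ants$genus)
  counts[letter] <- length(matches)        # store the count, named by letter
}
print(counts[counts > 0])

# Loop-free alternative: tabulate the first letter of each genus
first_letters <- table(substr(ants$genus, 1, 1))
```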

Page 62
Page 63

Jasmin Zohren, Bruno Vieira, Rodrigo Pracana, James Wright

Page 64

Programming in R?

Page 65

If/else
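A minimal if/else sketch. The colony-size thresholds are made up for illustration. if() tests a single TRUE/FALSE value; for whole vectors, ifelse() works element by element.

```r
# Hypothetical thresholds, for illustration only
describe_colony <- function(n_workers) {
  if (n_workers == 0) {
    label <- "empty nest"
  } else if (n_workers < 100) {
    label <- "small colony"
  } else {
    label <- "large colony"
  }
  return(label)
}

describe_colony(42)

# Element-wise version for a whole vector
sizes <- ifelse(c(3, 250, 40) < 100, "small", "large")
print(sizes)
```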

Page 66

Logical Operators
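The main logical operators, applied to a small made-up vector. &, | and ! work element by element. && and || compare single values, which is what if() needs.

```r
x <- c(2, 7, 12)

greater     <- x > 5            # comparison gives one TRUE/FALSE per element
between     <- x > 5 & x < 10   # & : element-wise AND
outside     <- x < 5 | x > 10   # | : element-wise OR
not_greater <- !(x > 5)         # ! : NOT
is_seven    <- x == 7           # == : equal (note the double =)
not_seven   <- x != 7           # != : not equal
some_big    <- any(x > 10)      # is at least one element TRUE?
all_pos     <- all(x > 1)       # are all elements TRUE?

# && and || work on single values, e.g. inside if()
if (length(x) > 0 && x[1] < 5) {
  print("first value is small")
}
```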

Page 67
Page 68
Page 69
Page 70

going further