Using GC content to distinguish Phytophthora sequences from tomato sequences

Uploaded by arlene-pierce on 17-Dec-2015

Mission #1

Calculate the GC content of each sequence in the Phytophthora-tomato interactome.

We will use a Perl script to accomplish this mission.
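
The gc.pl script itself is not reproduced in these slides. The Python below is a minimal sketch of the calculation it presumably performs. It reads a FASTA file and writes one tab-separated line per sequence, giving the sequence name and its GC content as a percentage.

Everything beyond the slides is an assumption and not taken from gc.pl:

- the function names,
- using the first word of the header as the name,
- ignoring ambiguous bases such as N,
- rounding to one decimal place.

```python
import sys


def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines.

    Assumption: the name is the first word after '>' on the header line.
    """
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            words = line[1:].split()
            name, chunks = (words[0] if words else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_content(seq):
    """Percentage of G and C among the unambiguous bases A, C, G and T.

    Assumption: other symbols (e.g. N) are ignored; a sequence with no
    A/C/G/T bases gets 0.0.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def write_gc_table(in_path, out_path):
    """Write one 'name<TAB>GC' line per sequence, GC rounded to one decimal."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for name, seq in read_fasta(fin):
            fout.write(f"{name}\t{gc_content(seq):.1f}\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python gc_sketch.py PhytophSeq1.txt phytoph_gc.out
    write_gc_table(sys.argv[1], sys.argv[2])
```

Counting only A, C, G and T in the denominator keeps runs of N from pulling GC content down. A script that divides by the full sequence length would report lower values for such sequences.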

Preparation

• Download the Perl script (gc.pl) from the class web site and save it in the C:/BioDownload folder.

Running the script

• Open Cygwin, Command Prompt (Vista users), or Terminal (Mac users).

• Change directory (cd) to the BioDownload folder and run the script. Give it the input sequence file and a name for the output file, separated by spaces:

perl gc.pl PhytophSeq1.txt phytoph_gc.out

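Results

In Cygwin (Windows users) or Terminal (Mac users), run:

grep --perl-regexp "\t" -c phytoph_gc.out

grep ">" -c PhytophSeq1.txt

The first command counts the lines in the output file that contain a tab, which should be one line per sequence. The second counts the lines in the input file that contain ">", which are the FASTA header lines. The two commands should report the same number: 3921.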
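
The same consistency check can be sketched in Python, assuming the file names used above. The helper names are hypothetical.

There is one small difference from the grep commands. grep ">" -c counts every line containing a > anywhere, while this sketch counts only lines that start with >, the FASTA header convention. For a well-formed FASTA file the two counts agree.

```python
def count_headers(fasta_lines):
    """Count FASTA header lines, i.e. lines that start with '>'."""
    return sum(1 for line in fasta_lines if line.startswith(">"))


def count_tab_lines(out_lines):
    """Count lines that contain at least one tab character."""
    return sum(1 for line in out_lines if "\t" in line)


def check_counts(fasta_path, out_path):
    """Return (sequences in input, rows in output, whether they match)."""
    with open(fasta_path) as f:
        n_seqs = count_headers(f)
    with open(out_path) as f:
        n_rows = count_tab_lines(f)
    return n_seqs, n_rows, n_seqs == n_rows
```

For example, check_counts("PhytophSeq1.txt", "phytoph_gc.out"). With the class data, the slides expect both counts to be 3921.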

The output file

Each line holds a sequence name (first column) and its GC content (second column), separated by a tab.

Mission #2

Build a histogram of the GC content values.

We will use R to accomplish this mission.

R can be downloaded from http://www.r-project.org

(Screenshots for Mac users, all Windows users, XP users, and Vista users.)

getwd()

to find out which folder (working directory) you are currently in

setwd("C:/BioDownload") to change the working directory to C:/BioDownload (Windows users)

setwd("/path/to/biodownload") for Mac users, replacing /path/to/biodownload with the location of your BioDownload folder

data<-read.table("phytoph_gc.out",sep="\t",header=FALSE)

to read in the data from the file phytoph_gc.out (your file name may be different)

data[1:10,]

to see the first 10 rows of the data frame “data”

gc<-data[,2]

to assign the values from the 2nd column of “data” to a new vector “gc”

summary(gc)

to get summary statistics (minimum, first quartile, median, mean, third quartile and maximum) of the values in the vector “gc”
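
For readers who want to reproduce these statistics outside R, here is a minimal Python sketch (Python 3.8 or later). It reads the second tab-separated column of the output file and computes the same six values that summary() reports for a numeric vector. The function names are hypothetical.

The quartiles use statistics.quantiles with method="inclusive", which corresponds to R's default quantile type 7. R rounds what it prints to about four significant digits, so the printed values may look slightly different.

```python
import statistics


def read_gc_values(lines):
    """Return the numbers in the second tab-separated column of each line.

    Lines without a second column are skipped.
    """
    values = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            values.append(float(fields[1]))
    return values


def summarize(values):
    """Min, quartiles, median, mean and max, in the layout of R's summary().

    method="inclusive" interpolates linearly between order statistics,
    like R's default quantile type 7.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }
```

Usage: with open("phytoph_gc.out") as f: print(summarize(read_gc_values(f)))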

hist(gc,breaks=58)

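to draw a histogram of the values in the vector “gc”

breaks sets the number of cells (bins) in the histogram. The value 58 comes from the range of the data: 78.7 (max) - 21.2 (min) = 57.5, rounded up to 58, so each bin spans about one GC percentage point. R treats a single number given to breaks as a suggestion only and moves the breakpoints to "pretty" values, so the actual number of bins may differ slightly.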
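
The arithmetic behind breaks=58 can be sketched in Python: divide the range of the GC values by the desired bin width (1 GC percentage point here) and round up.

The helper names are hypothetical. The equal-width binning below is a simplification and may not reproduce the exact breakpoints or counts that R's hist() produces.

```python
import math


def bin_count(values, width=1.0):
    """Bins of the given width needed to span max - min, rounded up (at least 1)."""
    return max(1, math.ceil((max(values) - min(values)) / width))


def histogram_counts(values, n_bins):
    """Count values in n_bins equal-width bins running from min to max.

    Bins are closed on the left and the maximum goes into the last bin.
    R's hist() uses right-closed bins by default, so values that fall
    exactly on a boundary can be counted differently.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        i = int((v - lo) / width) if width > 0 else 0
        counts[min(i, n_bins - 1)] += 1
    return counts
```

For data whose minimum is 21.2 and maximum is 78.7, bin_count returns 58, the value used in the hist() call above.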

hist(gc,breaks=58,xlab="GC content",ylim=c(0,400),main="Histogram of GC content of sequences\nin Phytophthora-tomato interactome")

to make the histogram look better: label the x axis, fix the y axis at 0 to 400, and add a two-line title

pdf("gc_histogram.pdf")
hist(gc,breaks=58,xlab="GC content",ylim=c(0,400),main="Histogram of GC content of sequences\nin Phytophthora-tomato interactome")
dev.off()

to write the histogram to the PDF file gc_histogram.pdf in the working directory. dev.off() closes the PDF device so that the file is completed.
