Practical: Speciation genomics

Jochen Wolf, Claire Peart, Verena Kutschera

Background

In this practical, we will investigate European crows revisiting some analyses from Poelstra et al. (2014). This study was based on 60 whole genome sequences from four European crow populations representing the two colour morphs/sub-species, carrion and hooded crows. One of the main findings was that despite strong morphological differentiation between the two morphs, only a small part of the genome is genetically differentiated.

You will get a subset of the genome data to analyse their population structure, to scan the genome for genetic differentiation and local phylogenies, and to estimate gene flow.

In terms of bioinformatics experience, you will familiarize yourself with important file formats (.sam, .vcf), get to know common popgen software packages (angsd, hierfstat, vcftools, plink, saguaro) and visualize your data in R.

Literature

Poelstra JW, Vijay N, Bossu CM, Lantz H, Ryll B, Müller I, Baglione V, Unneberg P, Wikelski M, Grabherr MG, Wolf JBW. 2014. The genomic landscape underlying phenotypic integrity in the face of gene flow in crows. Science 344:1410–1414.


PRACTICAL 1 PART I

All commands to type will be written as shown below:

Example

If commands are too long to fit into one line, their end is indicated by an empty line:

This example command is super long and does not fit into one line so it continues in the next line and the line afterwards is empty

The next command starts here

Let’s get started. We will perform all analyses on wallace. Let’s go!

Please log on:

ssh -X [email protected]

And use the following password:

!workshop17!

Make your own working directory with your own name. Please use underscores instead of white space, if needed.

mkdir your_name

cd your_name

We will be using three types of software: vcftools, plink and R. vcftools is available on the command line, plink is in /home/software and R needs to be loaded with the following command:

module load R/3.3.2


To begin with, we will use VCF files. These files contain only variable sites, with one position per line. Below is an example of the format in which only a single individual (NA12878) is shown. For further information, see https://www.broadinstitute.org/gatk/guide/article?id=1268.

[HEADER LINES STARTING WITH #]

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA12878

1 873762 . T G 5231.78 PASS [ANNOTATIONS] GT:AD:DP:GQ:PL 0/1:173,141:282:99:255,0,255

1 877664 rs3828 A G 3931.66 PASS [ANNOTATIONS] GT:AD:DP:GQ:PL 1/1:0,105:94:99:255,255,0

1 899282 rs2854 C T 71.77 PASS [ANNOTATIONS] GT:AD:DP:GQ:PL 0/1:1,3:4:26:103,0,26

1 974165 rs9442 T C 29.84 LowQual [ANNOTATIONS] GT:AD:DP:GQ:PL 0/1:14,4:14:61:61,0,255
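To get a feel for the format, the following shell sketch pulls the chromosome, position and genotype of the first sample out of each data line. It is only an illustration: the function name vcf_first_genotype is made up for this example, and it assumes tab-separated columns with GT as the first FORMAT key, as in the example above.

```shell
# Hypothetical helper: print CHROM, POS and the genotype (GT) of the first
# sample for every data line of a VCF read from standard input.
# Assumption: columns are tab-separated and GT is the first FORMAT key.
vcf_first_genotype() {
    awk -F'\t' '
        /^#/ { next }                    # skip header lines
        {
            split($10, sample, ":")      # column 10 holds the first sample
            print $1, $2, sample[1]      # sample[1] is the GT entry
        }
    '
}
```

It could be tried on the workshop data with: vcf_first_genotype < /home/workshop/data/scaffold_78.crows.vcf | head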

We would like to visualise the differentiation between populations by performing a principal component analysis (PCA). Missing data can bias the result, because the pattern of missing genotypes may itself be structured among samples. We will therefore first remove all sites with missing data using vcftools (--max-missing 1.0 keeps only sites that are genotyped in every individual). We will start with scaffold 78:

vcftools --vcf /home/workshop/data/scaffold_78.crows.vcf --max-missing 1.0 --recode --out scaffold_78_noMissing
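If you want to see how much data this filter removes, a rough sketch like the following counts the sites at which at least one individual has a missing genotype. The function name count_sites_with_missing is hypothetical, and the sketch assumes that GT is the first FORMAT key and that missing alleles are written as a dot (e.g. ./.). It is not how vcftools itself is implemented.

```shell
# Hypothetical helper: count VCF sites (read from standard input) where at
# least one sample has a missing genotype.
# Assumptions: tab-separated columns, samples start in column 10, GT is the
# first FORMAT key, and a missing allele is written as ".".
count_sites_with_missing() {
    awk -F'\t' '
        /^#/ { next }                          # skip header lines
        {
            for (i = 10; i <= NF; i++) {
                split($i, sample, ":")
                if (sample[1] ~ /\./) {        # "./." or "." in the GT entry
                    n++
                    break                      # count each site only once
                }
            }
        }
        END { print n + 0 }                    # print 0 if nothing was found
    '
}
```

For example: count_sites_with_missing < /home/workshop/data/scaffold_78.crows.vcf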

We will calculate the PCA using the program plink. Firstly we need to reformat the data and we can also do this using vcftools.

vcftools --vcf scaffold_78_noMissing.recode.vcf --plink --out scaffold_78_plink

Two data files are produced with a .ped and .map format. Take a look at the content of the files using less:

less scaffold_78_plink.ped

less scaffold_78_plink.map

Plink is fast and useful for our purpose here, but more tailored for GWAS, as indicated by the many columns that are ‘0’ in our case. You can get more information on plink file formats here: http://www.gwaspi.org/?page_id=145

You can close less by typing:

q


Briefly describe the format of these two files.

Next we will run the PCA in plink.

~/software/plink --ped scaffold_78_plink.ped --map scaffold_78_plink.map --pca --out scaffold_78

Four files are produced. We are interested in the .eigenvec and .eigenval. These contain the eigenvectors and eigenvalues of the PCA. What do these terms mean?
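As a complement to the question above, the sketch below shows one way to turn the eigenvalues into a percentage per axis. It assumes that the .eigenval file holds one eigenvalue per line; the function name pca_variance_share is hypothetical. Note that plink only writes the leading eigenvalues (20 by default), so the percentages are relative to the sum of the reported eigenvalues, not to the total variance in the data.

```shell
# Hypothetical helper: print each eigenvalue as a percentage of the sum of
# all eigenvalues listed in the file given as $1 (one value per line).
pca_variance_share() {
    awk '
        NF { value[++n] = $1; total += $1 }    # ignore empty lines
        END {
            for (i = 1; i <= n; i++)
                printf "PC%d %.1f\n", i, 100 * value[i] / total
        }
    ' "$1"
}
```

For example: pca_variance_share scaffold_78.eigenval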

Now we want to visualise the results. We will do this using R:

R

scaffold_78<-read.table("scaffold_78.eigenvec", sep=" ", header=FALSE)

The table is called “scaffold_78”. Take a look at the first few rows of the table:

View(scaffold_78)

or type

scaffold_78

Next, add column names.

colnames(scaffold_78)<-c("Sample", "Location", "PC1", "PC2", "PC3", "PC4", "PC5", "PC6", "PC7", "PC8", "PC9", "PC10", "PC11", "PC12", "PC13", "PC14", "PC15", "PC16", "PC17", "PC18", "PC19", "PC20")


Remove the individual ID numbers from the location column. This column will then show only the sampling location for each sample (D = Germany, E = Spain, P = Poland, S = Sweden).

scaffold_78$Location = substr(scaffold_78$Location,1,nchar(as.character(scaffold_78$Location))-2)

Plot the PCA with the samples coloured by sampling location.

scaffold_78_PCA<-plot(scaffold_78$PC1, scaffold_78$PC2, col= c("red","blue", "orange", "mediumpurple")[as.factor(scaffold_78$Location)], pch = 16)

legend("topleft",pch=c(16,16,16,16),col=c("red","blue","orange","mediumpurple"), c("Germany","Spain","Poland", "Sweden"))

What does PC1 show? What does PC2 show?


Save the graph:

pdf("scaffold_78_PCA.pdf")

scaffold_78_PCA<-plot(scaffold_78$PC1, scaffold_78$PC2, col= c("red","blue", "orange", "mediumpurple")[as.factor(scaffold_78$Location)], pch = 16)

legend("topleft",pch=c(16,16,16,16),col=c("red","blue","orange","mediumpurple"), c("Germany","Spain","Poland", "Sweden"))

dev.off()

Now quit R:

q()

n

Repeat this process with data from another scaffold in the genome, scaffold 75. Use the file scaffold_75.crows.vcf

When you are finished with scaffold 75, download the pdf files to your local computer. First, you will need to open a new terminal window and go to the desktop (do not log on to wallace in this window):

cd Desktop

Next, download the file using scp:


scp [email protected]:/home/workshop/your_name/scaffold*PCA.pdf .

Type the password (!workshop17!).

Now look at the two plots using a PDF viewer. How does the PCA for scaffold 75 differ from that for scaffold 78? Why?


On the next page is Figure 1 from Poelstra et al., 2014. This is a PCA calculated from whole genome resequencing data. How does the PCA correspond to the geographic sampling? How do the results compare to those for scaffold 78 and scaffold 75?


PRACTICAL 1 PART II

Now we wish to investigate the divergence along these scaffolds at a finer scale. Specifically, we will plot Fst values between carrion and hooded crows in 10 kb windows. Creating these files takes many steps, which we will not go through in detail today. Briefly, only SNPs were used, and sites were filtered to allow at most 50% missing data per population. The data were then reformatted and Fst values were calculated from variance components with the R package Hierfstat. You can read more about Hierfstat here: http://www2.unil.ch/popgen/teaching/SISG15/demeeusGoudet_ige_hierfstattutorial_2007.pdf.

The input file is

/home/workshop/data/scaffold_78_50_miss_hierfstat_10000.out


Look at the beginning of this file (make sure you are in the right directory):

head scaffold_78_50_miss_hierfstat_10000.out

Which other summary statistics have we calculated?
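If the header line is hard to read in the head output, a small sketch like the one below lists the column names one per line and counts the data rows, i.e. the windows. Both function names are hypothetical, and the sketch assumes a whitespace-separated table with a single header line.

```shell
# Hypothetical helpers for a whitespace-separated table with one header line.

# Print the column names of the file given as $1, one per line.
list_columns() {
    head -n 1 "$1" | tr -s ' \t' '\n' | sed '/^$/d'
}

# Count the non-empty data rows (everything after the header line).
count_data_rows() {
    awk 'NR > 1 && NF' "$1" | wc -l | tr -d ' '
}
```

For example: list_columns /home/workshop/data/scaffold_78_50_miss_hierfstat_10000.out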


Now, move into your directory and read the table into R:

cd your_name

R

scaffold_78_input<-read.table(file="/home/workshop/data/scaffold_78_50_miss_hierfstat_10000.out",header=T)

View(scaffold_78_input)

How many 10kb windows are there on scaffold 78?

You can find the number of windows by looking at the number of rows in the data frame or by typing:

dim(scaffold_78_input)[1]

wincount<- Type the number of 10kb windows here

scaffold_78_input$winnumber<-as.vector(1:wincount)

This adds a column of window numbers. Now you can plot Fst along scaffold 78.

plot(scaffold_78_input$winnumber,scaffold_78_input$win.FST,type="n", xlim=c(0,wincount),ylim=c(-0.1,1.0),ylab="Fst")

points(scaffold_78_input$winnumber,scaffold_78_input$win.FST,pch=19,col="steelblue")

Save the graph:

pdf('scaffold_78_Fst_window.pdf')

plot(scaffold_78_input$winnumber,scaffold_78_input$win.FST,type="n", xlim=c(0,wincount),ylim=c(-0.1,1.0),ylab="Fst")

points(scaffold_78_input$winnumber,scaffold_78_input$win.FST,pch=19,col="steelblue")

dev.off()

Close R:

q()

n


How does Fst vary along scaffold 78? What could cause this pattern?


Plot the results for scaffold 75 using the file

/home/workshop/data/scaffold_75_50_miss_hierfstat_10000.out

Then download the pdfs to your local computer so you can look at both at once. First, open a new terminal window again and go to the Desktop:

cd Desktop

Next, download the file using scp:

scp [email protected]:/home/workshop/your_name/scaffold*Fst_window.pdf .

Type your password.

Now open the pdfs on your local computer. How does Fst vary along scaffold 75? How does the pattern contrast to scaffold 78?


Now we will investigate what the pattern looks like with 50kb windows. Use the file

/home/workshop/data/scaffold_78_50_miss_hierfstat_50000.out

and plot Fst along scaffold 78.

How does choosing a different window size affect the results? What do you think is important to consider when choosing a window size?


Why might we only compare Fst between the German and Polish populations? Do you think this would be a good idea?


PRACTICAL 2 PART I

In the first part of practical 2, we would like to estimate directional gene flow between all possible population pairs using ABBA-BABA.

First, log on to wallace:

ssh -Y [email protected]

Type in the password and press Enter. Now move to your directory:

cd your_name

We will be using three types of software: angsd, saguaro and R. angsd and saguaro are in /home/software and R needs to be loaded with the following command:

module load R/3.3.2

The ABBA-BABA analysis in angsd does not take VCF files as input, but BAM files. BAM is the binary version of the SAM format (SAM = Sequence Alignment/Map). SAM/BAM files are the output of mapping your reads (in FastQ format) to your reference genome, e.g. using BWA. You can read more about the SAM/BAM format here: http://samtools.github.io/hts-specs/SAMv1.pdf

Briefly, this is what the SAM format looks like:

[HEADER LINES STARTING WITH @]

[ALIGNMENT LINES CONSISTING OF 11 MANDATORY COLUMNS]

HWI-ST1001:137:C12FPACXX:7:1115:14131:66670 0 chr1 12805 1 42M4I5M * 0 0 TTGGATGCCCCTCCACACCCTCTTGATCTTCCCTGTGATGTCACCAATATG CCCFFFFFHHGHHJJJJJHJJJJJJJJJJJJJJJJIJJJJJJJJJJJJIJJ AS:i:-28 XN:i:0 XM:i:2 XO:i:1 XG:i:4 NM:i:6 MD:Z:2C41C2 YT:Z:UU NH:i:3 CC:Z:chr15 CP:i:102518319 XS:A:+ HI:i:0

Each line of the alignment section consists of 11 mandatory and additional optional columns:

Nr | Content | Description
1 | HWI-ST1001:137:C12FPACXX:7:1115:14131:66670 | Query template (read) name
2 | 0 | SAM flag
3 | chr1 | Chromosome (“*” if a read has no alignment)
4 | 12805 | Position (1-based index, "left end of read")
5 | 1 | MAPQ (mapping quality). Describes the uniqueness of the alignment (0=non-unique, >10 probably unique)
6 | 42M4I5M | CIGAR string. Describes the position of insertions / deletions / matches in the alignment
7 | * | Reference sequence name of the mate or of the next read (mate pair information for paired-end sequencing, often "=")
8 | 0 | Position of mate (mate pair information) or the next read
9 | 0 | Observed template length (only for paired-end sequencing data, “0” if not available)
10 | TTGGATGCCCCTCCACACCCTCTTGATCTTCCCTGTGATGTCACCAATATG | Read sequence
11 | CCCFFFFFHHGHHJJJJJHJJJJJJJJJJJJJJJJIJJJJJJJJJJJJIJJ | Read quality (ASCII of Phred-scaled base quality)
12 | AS:i:-28 XN:i:0 XM:i:2 XO:i:1 XG:i:4 NM:i:6 MD:Z:2C41C2 YT:Z:UU NH:i:3 CC:Z:chr15 CP:i:102518319 XS:A:+ HI:i:0 | Here: program specific flags
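To relate the columns described above to real data, the following sketch prints only the first six mandatory columns (read name, flag, chromosome, position, MAPQ and CIGAR) of every alignment line. The function name sam_core_fields is hypothetical; the sketch relies only on SAM being tab-separated and on header lines starting with @.

```shell
# Hypothetical helper: print the first six mandatory SAM columns of every
# alignment line read from standard input, skipping header lines.
sam_core_fields() {
    awk -F'\t' '
        /^@/ { next }                          # header lines start with "@"
        { print $1, $2, $3, $4, $5, $6 }       # QNAME FLAG RNAME POS MAPQ CIGAR
    '
}
```

For example: samtools view /home/workshop/data/hooded_crow.Poland.P05.bam | sam_core_fields | head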


Let’s make sure you are in the correct directory by typing:

pwd

You should be in your own directory (/home/workshop/your_name).

Let’s have a look at the SAM format in our data:

samtools view /home/workshop/data/hooded_crow.Poland.P05.bam | less

Run the first part of the ABBA-BABA analyses by typing: (for more information consult http://www.popgen.dk/angsd/index.php/Abbababa)

~/software/angsd/angsd -doAbbababa 1 -bam /home/workshop/data/bamFile.list -rf /home/workshop/data/scaffold.list -out out -blockSize 50000 -anc /home/workshop/data/anc_rm.fasta -doCounts 1

Run the second part of the analysis, which is a statistical analysis of the ABBA-BABA output to generate Z-scores:

Rscript ~/software/angsd/R/jackKnife.R file=out.abbababa indNames=/home/workshop/data/bamFile.list outfile=out

Take a look at the output by typing:

ls

You will find the following output files:

out.abbababa

out.arg

out.txt

You will get your final results by opening the output file out.txt in R:

R

out<-read.table("out.txt",head=T)

out

The file contains the following information:

Table: Header of ABBA-BABA test output file and explanations.

Column | Description
H1, H2, H3 | The 3 individuals in the tree that are not the outgroup. H1 and H2 are the ingroup.
nABBA | The total counts of ABBA patterns.
nBABA | The total counts of BABA patterns.
Dstat | The test statistic: (nABBA-nBABA)/(nABBA+nBABA). A negative value means that H1 is closer to H3 than H2 is. A positive value means that H2 is closer to H3 than H1 is.
JackEst | Another estimate of the abbababa statistic that is bias corrected. This value is extremely similar to the value in the Dstat column.
SE | The estimated m-delete blocked Jackknife Standard error of the estimate used to obtain the Z value.
Z | Z value that can be used to determine the significance of the test. An absolute value of the Z score above 3 is often used as a critical value. However, note that this does not take into account the fact that we perform multiple tests.
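The D statistic can be recomputed by hand from the two counts. The sketch below applies the formula (nABBA-nBABA)/(nABBA+nBABA) given above; the function name dstat is hypothetical, and the result is rounded to two decimals. It does not reproduce the jackknife estimate, the standard error or the Z score, which need the per-block counts.

```shell
# Hypothetical helper: D = (nABBA - nBABA) / (nABBA + nBABA), rounded to two
# decimals. $1 = nABBA, $2 = nBABA. The two counts must not both be zero.
dstat() {
    awk -v abba="$1" -v baba="$2" \
        'BEGIN { printf "%.2f\n", (abba - baba) / (abba + baba) }'
}
```

For example, dstat 137846 112037 prints 0.10, the genome-wide value reported by Poelstra et al. (2014) for H1 = Spain, H2 = Poland and H3 = Germany.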

This is a table with the ABBA-BABA results for the entire genome from Poelstra et al. (2014). Please add the results from your own analysis to the table (D-statistics sufficient):

H1 | H2 | H3 | n.ABBA | n.BABA | Dstat | Dstat your results | Z | Interpretation
Sp | Po | Ge | 137846 | 112037 | 0.10 | | 34.31 | Germany associates closely with hooded crow populations, suggesting extensive introgression
Sp | Sw | Ge | 138804 | 112003 | 0.11 | | 35.43 |
Po | Sw | Ge | 114733 | 113777 | 0.00 | | 1.50 | Hooded crow populations are more tightly associated among themselves than to the carrion crow populations
Po | Sw | Sp | 88674 | 89535 | -0.01 | | -1.66 |
Ge | Sw | Po | 127344 | 113777 | 0.06 | | 20.02 |
Sp | Sw | Po | 152483 | 89535 | 0.26 | | 95.79 |
Ge | Po | Sw | 127344 | 114733 | 0.05 | | 18.61 |
Sp | Po | Sw | 152483 | 88674 | 0.27 | | 97.51 |
Ge | Sp | Po | 88529 | 137846 | -0.22 | | -77.51 | Evidence for gene flow between Germany and hooded crow populations, but not between Spain and hooded crow populations
Ge | Sp | Sw | 87649 | 138804 | -0.23 | | -81.13 |
Ge | Po | Sp | 88529 | 112037 | -0.12 | | -39.6 | Carrion crow populations are tightly associated with each other
Ge | Sw | Sp | 87649 | 112003 | -0.12 | | -41.56 |


Do you find similarities or differences to your own results? How would you explain any differences between your results and those calculated using the entire genome?

How does the ABBA BABA test distinguish between gene flow and incomplete lineage sorting?

What does the parameter “blockSize” mean and why did we change it to 50,000 instead of using the default value of 5,000,000?

Close R by typing

q()

n


PRACTICAL 2 PART II

In the second part of practical 2, we will analyse local phylogenies, so-called ‘cacti’, along scaffold 78. We will use the software Saguaro to do that.

First, check again where you are:

pwd

You need to be in your own directory (/home/workshop/your_name)

First, you will need to convert the input file from vcf format to Saguaro binary format:

~/software/saguarogw-code/VCF2HMMFeature -i /home/workshop/data/scaffold_78.crows.vcf -o scaffold_78.hmm -m 60

If you type:

ls

you will find the following new output file:

scaffold_78.hmm

Next, run Saguaro:

~/software/saguarogw-code/Saguaro -f scaffold_78.hmm -o scaffold_78.out -iter 10

This step will take a few minutes. Once the analysis is finished, take a look at the output by typing:

ls

and the following new output directory should appear containing your results:

scaffold_78.out

List the files it contains:

ls scaffold_78.out

You will find these files:

HMMTrain.out.0
HMMTrain.out.1
HMMTrain.out.2
HMMTrain.out.3
HMMTrain.out.4
HMMTrain.out.5
HMMTrain.out.6
HMMTrain.out.7
HMMTrain.out.8
HMMTrain.out.9
HMMTrain.out.10
LocalTrees.out
saguaro.cactus
saguaro.config
saguaro.garbage
saguaro.garbage.vec

The 10 hypotheses (cacti) that were generated during the run and that best fit individual regions of the scaffold are stored as distance matrices in the file

saguaro.cactus

and a phylogeny for each genomic location, including coordinates and a distance matrix best describing this region in the file

LocalTrees.out

We will draw cacti from the distance matrices in the file saguaro.cactus. First, make a new directory for the cacti in your own directory (check with pwd where you are if you are not sure) and copy the Saguaro output file there:

mkdir scaffold_78.cacti

cp scaffold_78.out/saguaro.cactus scaffold_78.cacti

Move into the new directory:

cd scaffold_78.cacti


Now, split the saguaro.cactus output file into one distance matrix file per cactus using the following awk-command:

awk '/cactus/{n++}{print >"cactus" n-1 ".dist" }' saguaro.cactus
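To see what this awk command does, here is a slightly more defensive sketch of the same idea wrapped in a function. The counter n increases each time a line starting with "cactus" is read, and every line is written to the file belonging to the current cactus. The function name split_cacti and the output directory argument are additions for this example. Unlike the command above, the sketch only matches "cactus" at the start of a line and skips any lines that precede the first cactus; it assumes that each block in the file begins with a line such as cactus0.

```shell
# Hypothetical helper: split a Saguaro cactus file into one file per cactus.
# $1 = input file (e.g. saguaro.cactus), $2 = existing output directory.
# Assumption: every block starts with a line beginning with "cactus".
split_cacti() {
    awk -v dir="$2" '
        /^cactus/ { n++ }                                   # a new block starts
        n > 0     { print > (dir "/cactus" (n - 1) ".dist") }
    ' "$1"
}
```

Inside scaffold_78.cacti it could be called as: split_cacti saguaro.cactus .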

Generate Neighbor-Joining trees from saguaro distances using the Rscript makeNJtrees.R:

Rscript /home/workshop/data/makeNJtrees.R

Take a look at the different cacti that are saved as *.pdf files in the directory scaffold_78.cacti. To do that, you will again need to download them to your local computer using scp. First, open a new terminal window again and go to the Desktop:

cd Desktop

Next, download the file using scp:

scp [email protected]:/home/workshop/your_name/scaffold_78.cacti/*pdf .

Type the password.

Now, open the pdfs on your local computer and answer the questions below. The following table explains how to read the IDs of the individuals on the tips of the cactus branches:

Table: Individual ID abbreviations with population and taxon assignment.

ID (one letter indicating the population of origin, followed by two numbers to distinguish the individuals from each other) | Population | Taxon
S, e.g. S05 | Sweden | Hooded crow
P, e.g. P05 | Poland | Hooded crow
D, e.g. D05 | Germany | Carrion crow
E, e.g. E05 | Spain | Carrion crow


How do hooded and carrion crows cluster in the different cacti? In which cacti are they mixed up, in which cacti do they form distinct clusters?


Next, extract the genome positions of the 10 different cacti from the Saguaro output. First, move back to your directory /home/workshop/your_name using cd. Now, type:

grep '^cactus' scaffold_78.out/LocalTrees.out | sort -n -k3,3 > scf78_genome_positions.cacti

Open the newly created file containing the genome positions to answer the questions below:

less scf78_genome_positions.cacti

In which genomic regions do you find cacti in which hooded and carrion crows are clearly separated?

Compare the genomic regions with cacti that clearly separate hooded and carrion crows to the genomic scans you completed earlier. Is there a general trend?

Exit less by typing

q