DNA methylation coverage in two tissues of the Pacific Oyster
Claire Ellis
Bioinformatics Terminal Product
3/14/13
Epigenetics describes DNA modifications that change gene expression without altering nucleotide sequence.
DNA methylation is highly diverse, varies among species, and can alter genome function in response to external influences.
DNA methylation patterns in Crassostrea gigas
Figure: DNA methylation (CH3 group). Source: http://www.nist.gov/pml/div689/dna_011911.cfm
Bisulfite sequencing was used to examine DNA methylation in gonad tissue
MBD-Seq was used to examine DNA methylation in gill tissue (Mackenzie)
Sequencing Approaches
Bisulfite sequencing
Cm = methylated cytosine
C = unmethylated cytosine
5’ ACmGTTCGCTTGAG 3’
3’ TGCmAAGCGAACTC 5’
Bisulfite Treatment
5’ ACmGTTUGUTTGAG 3’
3’ TGCmAAGUGAAUTU 5’
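The conversion shown on this slide can be reproduced with a few lines of code. The sketch below is only an illustration in Python (not the R workflow used in this project): it assumes methylated cytosines are written as "Cm", as on the slide, and the function name bisulfite_convert is hypothetical.

```python
import re

def bisulfite_convert(strand: str) -> str:
    """Turn every unmethylated C into U; methylated 'Cm' is left unchanged."""
    # Negative lookahead: match a C only when it is NOT followed by the 'm' marker.
    return re.sub(r"C(?!m)", "U", strand)

# The two strands from the slide, before bisulfite treatment.
top = "ACmGTTCGCTTGAG"
bottom = "TGCmAAGCGAACTC"

print(bisulfite_convert(top))     # ACmGTTUGUTTGAG
print(bisulfite_convert(bottom))  # TGCmAAGUGAAUTU
```

Both printed strands match the post-treatment sequences on the slide: only the cytosines without the "m" marker are converted.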
Bisulfite-converted reads were aligned to the genome, and a % methylation value per base was calculated by processing the alignments
methylKit is an R package for DNA methylation analysis and annotation from high-throughput bisulfite sequencing
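A per-base % methylation value is commonly computed from the reads covering a cytosine: after bisulfite treatment and amplification, reads that still show C count as methylated and reads that show T count as unmethylated. The Python sketch below illustrates that arithmetic only; the function name and the read counts are hypothetical, and it is not the code used to process these alignments.

```python
def percent_methylation(c_reads: int, t_reads: int) -> float:
    """Percent methylation at one cytosine: 100 * C / (C + T)."""
    total = c_reads + t_reads
    if total == 0:
        # No reads cover this position, so no value can be reported.
        raise ValueError("no coverage at this position")
    return 100.0 * c_reads / total

# Hypothetical position covered by 11 reads, 1 of which still reads as C.
print(round(percent_methylation(1, 10), 3))  # 9.091
```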
Approach
Goal: obtain methylation coverage and examine differential methylation between gonad and gill tissues
Read annotation files and perform basic statistical analyses for differentially methylated regions or bases
methylKit
Methylation Statistics
Figure: Histograms of % CpG methylation, panels Gonad and Gill (x-axis: % methylation per base; y-axis: Frequency). Bar labels give the percentage of bases per bin; the largest are 94.7 (Gonad) and 16.8 (Gill).
Coverage Statistics
Figure: Histograms of CpG coverage, panels Gonad and Gill (x-axis: log10 of read coverage per base; y-axis: Frequency).
CpG base correlation
Figure: CpG base Pearson correlation scatter plot
Gonad: ~/Desktop/TJGR_GonadPE_BS_v9_90_CG_methylkit_modified.txt
Gill: ~/Desktop/TJGR_gillMBD_BS_v9_10x_methylkit_modified.tabular.txt
Pearson correlation (Gonad vs. Gill) = 0.068
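For reference, the Pearson correlation between two samples is calculated over the % methylation values at the bases they share. The Python sketch below shows the calculation on made-up values; it does not reproduce the 0.068 reported here, and the function name pearson is hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Made-up % methylation values at five shared CpG positions.
sample_a = [100.0, 95.0, 100.0, 90.0, 100.0]
sample_b = [80.0, 20.0, 65.0, 90.0, 45.0]
print(pearson(sample_a, sample_b))
```

A value near 1 means the two samples rise and fall together across bases, and a value near 0 means there is little linear relationship.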
Methylation Clustering
Figure: CpG methylation clustering dendrogram (x-axis: Samples; y-axis: Height)
Distance method: "correlation"; Clustering method: "ward"
Blue= Gonad
Red= Gill
PCA- Principal Component Analysis
Figure: CpG methylation PCA Analysis (x-axis: PC1; y-axis: PC2)
Blue= Gonad
Red= Gill
Additional analyses included examining type of differential methylation (hypo and hyper)
Extracted bases with a q-value <0.01 and % methylation difference >25%
The methylKit package was successfully used to characterize DNA methylation
Differences between gonad and gill methylation profiles may be due to library prep
Will use R script for future analyses comparing different samples’ methylation profiles
Conclusions
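The cutoffs used to extract differentially methylated bases (q-value < 0.01 and % methylation difference > 25%) can be illustrated with a short filter. This is a sketch in Python rather than the R script used for the analysis; the function name, the example values, and the sign convention (a positive difference is called "hyper") are assumptions.

```python
def classify(meth_diff: float, qvalue: float,
             min_diff: float = 25.0, max_q: float = 0.01):
    """Return 'hyper', 'hypo', or None for one base.

    meth_diff is the % methylation difference between the two tissues;
    which tissue is the reference is an assumption left to the caller.
    """
    if qvalue < max_q and abs(meth_diff) > min_diff:
        return "hyper" if meth_diff > 0 else "hypo"
    return None  # fails the q-value or the difference cutoff

# Hypothetical (difference, q-value) pairs for four bases.
bases = [(30.0, 0.001), (-40.0, 0.005), (30.0, 0.05), (10.0, 0.001)]
print([classify(d, q) for d, q in bases])
# ['hyper', 'hypo', None, None]
```

Both comparisons are strict, so a base with a difference of exactly 25% or a q-value of exactly 0.01 is not extracted.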
> getMethylationStats(gonad,plot=F,both.strands=F)
methylation statistics per base
summary (gonad):
Min. 1st Qu. Median Mean 3rd Qu. Max.
9.091 100.000 100.000 97.360 100.000 100.000
Percentiles (gonad):
0% 10% 20% 30% 40% 50% 60% 70% 80% 95% 99.5% 99.9% 100%
9.09 100 100 100 100 100 100 100 100 100 100 100 100
summary (gill):
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 54.55 78.70 69.56 91.89 100.000
Percentiles (gill):
0% 10% 20% 30% 40% 50% 60% 70% 80% 95% 99.5% 99.9% 100%
0 21.4 46.6 61.1 70.9 78.7 84.6 90 93.7 100 100 100 100
Methylation Statistics