next generation sequencing
DESCRIPTION
Algorithm to find G-Quadruplex in a DNA sequence
TRANSCRIPT
HIRA BATOOL
MAIRA KHAN
MARYAM ZAHRAH
NADIA ASHRAF
NEXT GENERATION SEQUENCING
Determining the order of the nucleotides (A, T, C, G) along a DNA strand.
Why sequence DNA?
Knowing the order of nucleotides enables us to:
› Learn about genes
› Find information about the architecture of the genome
› Comparative genome analysis
Plus-minus strand sequencing
Maxam-Gilbert chemical sequencing
Sanger's chain termination sequencing (dye termination sequencing)
The most widely used method for the last 30 years!
Over time, new techniques were developed to replace the older ones, which were:
› Old
› Time consuming
› Laborious
› Expensive
› Hazardous (they used hazardous reagents)
› Demanding in reagent volume and space
Roche’s (454) GS FLX Genome Analyzer
Illumina's Solexa 1G sequencer
Applied Biosystems' SOLiD system
Helicos
A DNA sequence that initially binds the RNA polymerase.
Located upstream of the transcription start site.
Core promoter refers to the minimal set of sequence elements required for accurate transcription initiation.
Usually spans -35 to +35.
BRE (TFIIB recognition element): -37 to -32
Consensus sequence: G/C G/C G/A CGCCC
Recognized by TFIIB.
The TFIIB–BRE interaction facilitates the assembly of a TFIIB–TBP–TATA complex
TATA box: -31 to -26
Consensus sequence: T A T A A/T A A/T
Recognized by TBP (a subunit of TFIID)
In humans, 32% of 1,031 potential promoter regions contain a TATA box.
Its primary role is formation of the pre-initiation complex (promoter + general transcription factors).
Initiator (Inr): -2 to +4
Consensus sequence: C/T C/T A(+1) N T/A C/T C/T
Recognized by TFIID
Nucleates PIC formation in TATA-less promoters.
Facilitates the binding of Transcription Factor II D (TFIID, which contains TBP).
DPE (downstream promoter element): +28 to +32
Consensus sequence: A/G G A/T CGTG
Recognized by TFIID
DPE plays a major role at TATA-less promoters.
+18 to +29
Consensus sequence: CSARSSAACGC
Cooperates with the initiator to stimulate transcription.
No TATA box in these promoters.
DNA sequences in which four guanine bases associate through Hoogsteen hydrogen bonding to form a square planar structure called a G-tetrad; two or more G-tetrads can stack on top of each other to form a G-quadruplex.
Repeats of at least 3 guanine residues are separated by loops of 1-7 bases.
Can form in DNA, RNA, LNA (locked nucleic acid), and PNA (peptide nucleic acid).
Across a wide range of species, G4 DNA motifs have been found in telomeres, G-rich micro- and mini-satellites, near promoters, and within the ribosomal DNA (rDNA).
They are important components of human telomeres and play a role in the regulation of transcription and translation.
They are also of interest as nanotechnological devices.
Generally, a simple pattern match is used for searching for possible quadruplex forming sequences:
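G(3+) N(1-7) G(3+) N(1-7) G(3+) N(1-7) G(3+)
where G(3+) is a run of three or more guanines, N(1-7) is a loop of one to seven bases, and N is any base (including G)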
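The pattern above translates almost directly into a regular expression. The following is a minimal sketch in Python, not the authors' implementation; the names G4_PATTERN and find_g4_candidates are hypothetical, and the sketch assumes the sequence contains only the letters A, C, G and T. It reports non-overlapping candidates only.

```python
import re

# G(3+) N(1-7) G(3+) N(1-7) G(3+) N(1-7) G(3+), where N is any base (including G).
# Assumption: the input uses only the letters A, C, G, T.
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def find_g4_candidates(sequence):
    """Return (start, end, motif) for each non-overlapping candidate.

    Positions are 0-based and end is exclusive, as in Python slicing.
    """
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Four copies of the human telomeric repeat TTAGGG.
    for hit in find_g4_candidates("TTAGGG" * 4):
        print(hit)
```

A match is only a candidate: the pattern says a sequence could fold into a quadruplex, not that it does.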
Text = promoter or telomere sequence
Pattern = GGG
N = text.length()
M = pattern.length()
count = 0                        [counts consecutive G's]
array[i][j]                      [row i stores start (j=0) and end (j=1) of a G-run]
i = 0                            [index of the next G-run to store]
for (t = 0; t <= N; t++)         [scans Text; the step t == N closes a run at the end]
{
    if (t < N && text[t] == pattern[0])
    {
        count++;
    }
    else
    {
        if (count >= M)
        {
            array[i][0] = t - count;    [stores start]
            array[i][1] = t - 1;        [stores end]
            i++;
        }
        count = 0;
    }
}
for (k = 0; k + 3 < i; k++)      [checks four consecutive G-runs]
{
    ok = true;
    for (r = k; r < k + 3; r++)
    {
        gap = array[r+1][0] - array[r][1] - 1;    [loop length between two runs]
        if (gap < 1 || gap > 7)
            ok = false;
    }
    if (ok)
        report array[k][0], array[k+3][1];        [start and end of the candidate]
}
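The scan above can be sketched as runnable Python. This is one possible reading of the slide's pseudocode, under stated assumptions: the helper names find_g_runs and find_quadruplexes are hypothetical, each maximal run of G's is treated as a single repeat, and the undefined "prefix function" step is replaced by a plain check of the loop lengths between four consecutive runs.

```python
def find_g_runs(text, min_len=3):
    """Return (start, end) pairs, 0-based and inclusive, for every maximal
    run of at least min_len G's. Expects an upper-case sequence."""
    runs = []
    count = 0  # number of consecutive G's seen so far
    for t in range(len(text) + 1):  # the extra step closes a run at the end
        if t < len(text) and text[t] == "G":
            count += 1
        else:
            if count >= min_len:
                runs.append((t - count, t - 1))
            count = 0
    return runs


def find_quadruplexes(text, min_loop=1, max_loop=7):
    """Return (start, end) of every window of four consecutive G-runs whose
    three loops are all between min_loop and max_loop bases long."""
    runs = find_g_runs(text.upper())
    hits = []
    for k in range(len(runs) - 3):
        window = runs[k:k + 4]
        loops = [window[r + 1][0] - window[r][1] - 1 for r in range(3)]
        if all(min_loop <= n <= max_loop for n in loops):
            hits.append((window[0][0], window[3][1]))
    return hits


if __name__ == "__main__":
    print(find_quadruplexes("TTAGGG" * 4))
```

Because a long run of G's is kept as one repeat, this sketch misses candidates in which part of a G-run serves as a loop (N may be G in the pattern), so it is stricter than a plain pattern match.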
http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000861
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1636468/
Quadruplex.org
R  Purine (A or G)
Y  Pyrimidine (C or T)
N  Any nucleotide
W  Weak (A or T)
S  Strong (G or C)
M  Amino (A or C)
K  Keto (G or T)
B  Not A (G or C or T)
H  Not G (A or C or T)
D  Not C (A or G or T)
V  Not T (A or G or C)
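These ambiguity codes make it possible to search a sequence for a consensus such as CSARSSAACGC or the TATA box (T A T A A/T A A/T, written TATAWAW). The following is a minimal sketch, with the hypothetical names IUPAC, consensus_to_regex and find_consensus, that expands each code into a regular-expression character class. It assumes a DNA sequence over A, C, G and T and checks only the sequence match, not the position relative to the transcription start site.

```python
import re

# Ambiguity codes from the table above, plus the four plain bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]", "W": "[AT]", "S": "[CG]",
    "M": "[AC]", "K": "[GT]", "B": "[CGT]", "H": "[ACT]", "D": "[AGT]",
    "V": "[ACG]",
}


def consensus_to_regex(consensus):
    """Expand a consensus written in ambiguity codes into a regex string."""
    return "".join(IUPAC[code] for code in consensus.upper())


def find_consensus(consensus, sequence):
    """Return the 0-based start of every match, overlapping ones included."""
    # The lookahead (?=...) consumes no characters, so overlaps are found too.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    print(consensus_to_regex("CSARSSAACGC"))
    print(find_consensus("TATAWAW", "GGTATAAAAGG"))
```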
PNA: The backbone is composed of repeating N-(2-aminoethyl)-glycine units linked by peptide bonds. The bases are linked to the backbone by methylene carbonyl bonds.
LNA: The ribose moiety of an LNA nucleotide is modified with an extra bridge connecting the 2' oxygen and 4' carbon.