dna dna (deoxyribonucleic acid) and rna (ribonucleic acid) are composed of linear chains of...
DNA
• DNA (deoxyribonucleic acid) and RNA (ribonucleic acid) are composed of linear chains of monomeric units of nucleotides
• A nucleotide has three parts: a sugar, a phosphate and a base
• Four bases in DNA: adenine (A), thymine (T), guanine (G), cytosine (C)
Secondary Structure of DNA
• Two strands are complementary
• Base pairing: A-T; G-C
• A purine and a pyrimidine form complementary hydrogen bonds
Genome
• Genome
– The entire DNA of a cell is its genome
– Individual units coding for proteins or RNA are genes
– A gene starts with ATG and ends with one or two stop codons
– Called an ORF (Open Reading Frame)
• Biological Info
– Contained in the genome
– Encoded in the nucleotide sequences of DNA or RNA
– Partitioned into discrete units, genes
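The ORF idea can be tried out on a short string. The sketch below looks for the first ATG and reads codons in frame until it meets a stop codon. The function name `find_first_orf`, the sample sequence, and the use of the standard stop codons TAA, TAG and TGA are assumptions for illustration; real gene finding also has to consider both strands and all reading frames.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stop codons (assumed here)

def find_first_orf(dna):
    """Return the first ATG...stop stretch in dna, or None if there is none."""
    start = dna.find("ATG")
    while start != -1:
        # Read codons in frame, beginning right after the start codon
        for i in range(start + 3, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return dna[start:i + 3]
        # No in-frame stop after this ATG; try the next ATG
        start = dna.find("ATG", start + 1)
    return None

print(find_first_orf("CCATGGCTTAAGG"))  # made-up sequence
```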
Genome Databases
• Completed genomes
– ftp site -- ftp://ftp.ncbi.nlm.nih.gov/genomes/
– http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/allorg.html
– http://www.ebi.ac.uk/genomes/mot/index.html
– http:/pir.goergetown.edu/pirwww/search/genome.html
• Organism-specific databases
– http://www.unledu/stc-95/ResTools/biotools/biotools10.html
– http://www.fp.mcs.anl.gov/~gaasterland/genomes.html
– http://www.hgmp.mrc.ac.uk/GenomeWeb/genome-db.html
– http://www.bioinformatik.de/cgi-bin/browse/Catalog/Databases/Genome_Proejcts
Human Genome
• Human Genome Project
– Conceived in 1984, begun in 1990, draft sequence published in 2001, completed in 2003 ahead of schedule
• What did the sequence reveal?
– 3 Bbp (billion base pairs)
– 24 distinct chromosomes: 22 autosomes plus two sex chromosomes (X, Y)
– Longest 250 Mbp, shortest 55 Mbp
– Mitochondrial genome: circular DNA molecule of 16,569 bp (16.569 kbp)
– ~10**(13) cells
• How many is 3 Bbp?
– A typical 11-pt font prints 60 nucleotides in about 10 cm (~4 in)
– In this format, 3 Bbp writes out to about 5,000 km (~3,100 mi)
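The printed-length estimate is simple arithmetic and can be checked in a few lines. The sketch assumes a line of 60 nucleotides is 10 cm long; the variable names are only illustrative. Under that assumption the total comes to about 5,000 km, which is roughly 3,100 mi.

```python
GENOME_BP = 3_000_000_000   # ~3 Bbp, from the slide
BASES_PER_LINE = 60         # nucleotides per printed line, from the slide
LINE_LENGTH_CM = 10.0       # assumed length of one 60-letter line

lines = GENOME_BP / BASES_PER_LINE
length_km = lines * LINE_LENGTH_CM / 100 / 1000   # cm -> m -> km
length_mi = length_km / 1.609344                  # km -> miles

print(f"{lines:.0f} lines, {length_km:.0f} km, {length_mi:.0f} mi")
```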
Other Species
| Organism | Genome size (Mbp) | # of genes |
| --- | --- | --- |
| Epstein-Barr virus | 0.17 | 80 |
| E. coli | 4.6 | 4,406 |
| Yeast (S. cerevisiae) | 12.5 | 6,172 |
| Nematode worm (C. elegans) | 100.3 | 19,099 |
| Thale cress (A. thaliana) | 115.4 | 25,498 |
| Fruit fly (D. melanogaster) | 128.3 | 13,601 |
| Human (H. sapiens) | 3223.0 | 20,500 |
| Fugu (Takifugu rubripes) | 390.0 | 30,000 |
| Wheat | 16000.0 | 30,000 |
Monomer counts in DNA
• In double strands
– # of A = # of T; # of G = # of C
– Erwin Chargaff’s 1st Parity Rule, 1951
• In a single strand?
– # of A ≈ # of T; # of G ≈ # of C (approximately, over long sequences)
– Erwin Chargaff’s 2nd Parity Rule
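The first rule follows directly from base pairing, and a short sketch can show it. The code builds the partner strand of a made-up sequence and counts bases in one strand and in both together. The helper `reverse_complement` and the sample strand are assumptions for illustration. A short invented strand like this one is not expected to obey the second rule, which is only approximate and shows up in long natural sequences.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")  # A<->T, C<->G

def reverse_complement(strand):
    """Return the partner strand, read in its own 5'-to-3' direction."""
    return strand.translate(COMPLEMENT)[::-1]

strand = "AAGGCTAAGC"                       # made-up example
duplex = strand + reverse_complement(strand)

single = Counter(strand)   # counts in one strand only
double = Counter(duplex)   # counts over both strands

print("single strand:", dict(single))
print("both strands: ", dict(double))
```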
Parsing DNA Data Files
• Download the Yeast Chromosome 1 sequence from www.cs.uml.edu/~kim/100/yeast01.txt to your C:\100
• Open a Command Prompt from Applications (NOT JES)
• cd C:\100
• python
• In Python
– Name the DNA file
– Read all lines and put them into a single string, 'dna'
• What does lines[0] have?
• What is happening here?
>>> fp = open('yeast01.txt')
>>> lines = fp.readlines()
>>> lines[0]
Parsing DNA Data Files
• Line by line processing is difficult
• Each line ends with '\n'
• How to concatenate all the lines into a LONG string by removing '\n'
• Why lines[1:], not lines[0:]?
>>> dna = ''.join(lines[1:])
>>> dna[0:100]
>>> dna = dna.replace('\n', '')
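The same steps can be tried without the real file. The sketch below uses an in-memory stand-in for `open('yeast01.txt')` and assumes the file begins with a one-line description followed by sequence lines, as FASTA-style files do. The header text and the sequence are made up.

```python
import io

# Stand-in for open('yeast01.txt'); header and sequence are invented
fp = io.StringIO(">example header line\nACGTAC\nGGTTAA\n")

lines = fp.readlines()
header = lines[0].rstrip("\n")   # first line describes the data, it is not DNA
# Join the remaining lines, stripping the trailing '\n' from each one
dna = "".join(line.strip() for line in lines[1:])

print(header)
print(dna, len(dna))
```

Stripping each line while joining gives the same result as joining first and then calling `replace('\n', '')`.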
Base-Pair Distribution in a DNA String
• Write a Python function, baseFreq(dna)
– To count the number of 'A', 'T', 'C', 'G' in the concatenated dna string
• How about the distribution of pairs of bases (bimers)?
– ACTTAGG
– AC, CT, TT, TA, AG, GG
• How about trimers, tetramers, pentamers, hexamers, …?
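One way to handle bimers, trimers and longer words with the same code is to slide a window of length k along the string. This is a minimal sketch using `collections.Counter` from the standard library; the function name `kmer_counts` is an assumption, not something from the slides.

```python
from collections import Counter

def kmer_counts(dna, k):
    """Count every overlapping substring of length k in dna."""
    return Counter(dna[i:i + k] for i in range(len(dna) - k + 1))

counts = kmer_counts("ACTTAGG", 2)   # the bimer example from the slide
print(counts)
print(kmer_counts("ACTTAGG", 3))     # trimers of the same string
```

A string of length n has n - k + 1 overlapping words of length k, so ACTTAGG gives six bimers and five trimers.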
DNA Base Counting
def baseFreq(dna):
    # Returns the fractions of A, C, T, G (in that order) in dna
    count = [0.0, 0.0, 0.0, 0.0]
    num = 0
    length = len(dna)
    for i in range(0, length):
        if dna[i:i+1] == 'A': count[0] = count[0]+1
        elif dna[i:i+1] == 'C': count[1] = count[1]+1
        elif dna[i:i+1] == 'T': count[2] = count[2]+1
        elif dna[i:i+1] == 'G': count[3] = count[3]+1
        else: num = num    # any other character is not tallied
        num = num+1        # but every character counts toward the total
    for i in range(0, 4):
        count[i] = count[i]/num
    return count
Base Counting (in Notepad)
def baseFreq(dna):
    # One counter per base, in the order A, C, T, G
    count = [0.0, 0.0, 0.0, 0.0]
    num = 0
    length = len(dna)
    for i in range(0, length):
        if dna[i:i+1] == 'A': count[0] = count[0]+1
        elif dna[i:i+1] == 'C': count[1] = count[1]+1
        elif dna[i:i+1] == 'T': count[2] = count[2]+1
        elif dna[i:i+1] == 'G': count[3] = count[3]+1
        else: num = num    # any other character is not tallied
        num = num+1        # but every character counts toward the total
    for i in range(0, 4):
        count[i] = count[i]/num
    return count

##### main() function #############
dataFile = input('Enter a DNA file name\n')
fp = open(dataFile)
lines = fp.readlines()
fp.close()
# Skip lines[0], as in the parsing step, so only sequence lines are counted
dnaStr = ''.join(lines[1:])
dnaStr = dnaStr.replace('\n', '')
freq = baseFreq(dnaStr)
print(freq)