Genomic Signal Processing, Dr. C.Q. Chang, Dept. of EEE
Posted on 21-Dec-2015
![Page 1: Genomic Signal Processing Dr. C.Q. Chang Dept. of EEE](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d5e5503460f94a3df56/html5/thumbnails/1.jpg)
Genomic Signal Processing
Dr. C.Q. Chang
Dept. of EEE
![Page 2: Genomic Signal Processing Dr. C.Q. Chang Dept. of EEE](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d5e5503460f94a3df56/html5/thumbnails/2.jpg)
Outline
• Basic Genomics
• Signal Processing for Genomic Sequences
• Signal Processing for Gene Expression
• Resources and Co-operations
• Challenges and Future Work
![Page 3: Genomic Signal Processing Dr. C.Q. Chang Dept. of EEE](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d5e5503460f94a3df56/html5/thumbnails/3.jpg)
Basic Genomics
![Page 4: Genomic Signal Processing Dr. C.Q. Chang Dept. of EEE](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d5e5503460f94a3df56/html5/thumbnails/4.jpg)
Genome
• Every human cell contains about 6 feet of double-stranded (ds) DNA
• This DNA has about 3,000,000,000 base pairs representing 50,000-100,000 genes (an early estimate; current estimates are closer to 20,000 protein-coding genes)
• This DNA contains our complete genetic code, or genome
• DNA regulates all cell functions, including response to disease, aging and development
• Gene expression pattern: a snapshot of DNA activity in a cell
• Gene expression profile: DNA mutation or polymorphism over time
• Genetic pathways: changes in the genetic code accompanying metabolic and functional changes, e.g. disease or aging
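The "6 feet" figure can be checked with a quick calculation. This assumes the textbook B-DNA rise of about 0.34 nm per base pair and that a (diploid) human cell holds two copies of the ~3 × 10^9 bp genome; neither number is stated on the slide.

$$L \approx \underbrace{2 \times 3\times10^{9}}_{\text{bp per diploid cell}} \times 0.34\ \text{nm} \approx 2.0\ \text{m} \approx 6.7\ \text{ft}$$

A single (haploid) copy alone would be about 1 m, so the 6-foot figure counts both copies.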
![Page 5: Genomic Signal Processing Dr. C.Q. Chang Dept. of EEE](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d5e5503460f94a3df56/html5/thumbnails/5.jpg)
![Page 6: Genomic Signal Processing Dr. C.Q. Chang Dept. of EEE](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d5e5503460f94a3df56/html5/thumbnails/6.jpg)
Gene: protein-coding DNA
DNA (CCTGAGCCAACTATTGATGAA) → transcription → mRNA (CCUGAGCCAACUAUUGAUGAA) → translation → protein (the peptide PEPTIDE, in one-letter amino-acid code)
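The transcription/translation example on this slide can be reproduced in a few lines. This is a minimal sketch, assuming the DNA shown is the coding (sense) strand, so transcription is just T → U, and assuming the standard genetic code read in frame 0; the function and table names are illustrative.

```python
# Standard genetic code: codons enumerated with bases in U, C, A, G order
# at each of the three positions (UUU, UUC, UUA, UUG, UCU, ...).
RNA_BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(RNA_BASES)
    for j, b in enumerate(RNA_BASES)
    for k, c in enumerate(RNA_BASES)
}


def transcribe(coding_strand: str) -> str:
    """mRNA has the coding-strand sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate codon by codon in frame 0, stopping at the first stop codon ('*')."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "CCTGAGCCAACTATTGATGAA"
mrna = transcribe(dna)
print(mrna)             # CCUGAGCCAACUAUUGAUGAA
print(translate(mrna))  # PEPTIDE
```

The seven codons CCU GAG CCA ACU AUU GAU GAA map to Pro-Glu-Pro-Thr-Ile-Asp-Glu, which spells PEPTIDE in one-letter code.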
![Page 7: Genomic Signal Processing Dr. C.Q. Chang Dept. of EEE](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d5e5503460f94a3df56/html5/thumbnails/7.jpg)
In more detail (color ~ state)
![Page 8: Genomic Signal Processing Dr. C.Q. Chang Dept. of EEE](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d5e5503460f94a3df56/html5/thumbnails/8.jpg)
Signal Processing for Genomic Sequences
![Page 9: Genomic Signal Processing Dr. C.Q. Chang Dept. of EEE](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d5e5503460f94a3df56/html5/thumbnails/9.jpg)
The Data Set
![Page 10: Genomic Signal Processing Dr. C.Q. Chang Dept. of EEE](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d5e5503460f94a3df56/html5/thumbnails/10.jpg)
The Problem
• Genomic information is digital: the letters A, T, C and G
• Signal processing deals with numerical sequences, so character strings have to be mapped into one or more numerical sequences
• Identification of protein-coding regions:
  • Prediction of whether or not a given DNA segment is part of a protein-coding region
  • Prediction of the proper reading frame
• Compared with traditional methods, signal processing methods are much quicker, and in some cases can be even more accurate
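One widely used way to map a character string into several numerical sequences is the binary indicator (Voss) representation: one 0/1 sequence per base. The sketch below assumes only the four letters A, T, C, G; the function name is illustrative.

```python
def indicator_sequences(seq: str) -> dict[str, list[int]]:
    """Map a DNA string to four binary indicator sequences, one per base.

    u_b[n] = 1 if base b occurs at position n, else 0.
    Any other symbol (e.g. 'N') yields 0 in all four sequences.
    """
    seq = seq.upper()
    return {base: [1 if s == base else 0 for s in seq] for base in "ATCG"}


u = indicator_sequences("GATTACA")
print(u["A"])  # [0, 1, 0, 0, 1, 0, 1]
```

For an unambiguous sequence the four indicators sum to 1 at every position, so no information is lost, at the cost of four signals instead of one.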
![Page 11: Genomic Signal Processing Dr. C.Q. Chang Dept. of EEE](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d5e5503460f94a3df56/html5/thumbnails/11.jpg)
Sequence to signal mapping
$$a = 1 + j,\qquad t = 1 - j,\qquad c = -1 - j,\qquad g = -1 + j$$
$$y[n] = x[n] + \tfrac{1}{2}\,x[n-1] + \tfrac{1}{4}\,x[n-2]$$
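A sketch of this slide's two steps follows: map each base to a complex number, then pass the resulting signal x[n] through a short FIR filter. Because the slide's signs were lost in extraction, two things are assumptions here: the values a = 1+j, t = 1−j, c = −1−j, g = −1+j (a common complex mapping in which complementary bases are complex conjugates), and the filter weights 1, 1/2, 1/4. The function names are hypothetical.

```python
# Assumed complex mapping: complementary bases (A/T, C/G) are complex conjugates.
COMPLEX_MAP = {"A": 1 + 1j, "T": 1 - 1j, "C": -1 - 1j, "G": -1 + 1j}


def to_complex(seq: str) -> list[complex]:
    """Map a DNA string to a complex-valued signal x[n]."""
    return [COMPLEX_MAP[base] for base in seq.upper()]


def fir_filter(x: list[complex], h=(1.0, 0.5, 0.25)) -> list[complex]:
    """Causal FIR filter y[n] = sum_k h[k] * x[n - k], taking x[n] = 0 for n < 0.

    The default taps (1, 1/2, 1/4) are an assumed reading of the slide.
    """
    y = []
    for n in range(len(x)):
        acc = 0j
        for k, h_k in enumerate(h):
            if n - k >= 0:
                acc += h_k * x[n - k]
        y.append(acc)
    return y


print(fir_filter(to_complex("ATG")))  # [(1+1j), (1.5-0.5j), (-0.25+0.75j)]
```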
![Page 12: Genomic Signal Processing Dr. C.Q. Chang Dept. of EEE](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d5e5503460f94a3df56/html5/thumbnails/12.jpg)
Signal Analysis
• Spectral analysis (Fourier transform, periodogram)
• Spectrogram
• Wavelet analysis
• HMT: wavelet-based Hidden Markov Tree
• Spectral envelope (using optimal string to numerical value mapping)
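As an example of the Fourier/periodogram approach: protein-coding regions are known to show a peak at frequency 1/3 (period 3, one codon), caused by codon-usage bias. The sketch below computes the total spectrum S[k] = Σ_b |U_b[k]|², where U_b is the DFT of the binary indicator sequence of base b, and compares S[N/3] with the average spectral level. It assumes the length N is a multiple of 3 and uses a direct O(N²) DFT for clarity; the function names and the ratio definition are illustrative choices.

```python
import cmath


def indicator(seq: str, base: str) -> list[float]:
    """1.0 where `base` occurs, else 0.0."""
    return [1.0 if s == base else 0.0 for s in seq]


def dft_bin(x: list[float], k: int) -> complex:
    """X[k] = sum_n x[n] * exp(-2*pi*j*k*n/N), computed directly."""
    N = len(x)
    return sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))


def power_spectrum(seq: str) -> list[float]:
    """Total spectrum S[k] = sum over A, C, G, T of |U_b[k]|^2, for k = 0..N-1."""
    seq = seq.upper()
    u = {b: indicator(seq, b) for b in "ACGT"}
    return [sum(abs(dft_bin(u[b], k)) ** 2 for b in "ACGT") for k in range(len(seq))]


def period3_ratio(seq: str) -> float:
    """S[N/3] divided by the mean of S[k] over k = 1..N-1 (DC bin excluded).

    Assumes len(seq) is a multiple of 3 so that k = N/3 is exactly frequency 1/3.
    """
    S = power_spectrum(seq)
    N = len(S)
    return S[N // 3] / (sum(S[1:]) / (N - 1))


print(period3_ratio("ATG" * 30))   # 44.5 (strong period-3 peak)
print(period3_ratio("ACGT" * 24))  # ~0 (period 4, no energy at 1/3)
```

For real genes the direct DFT would normally be replaced by an FFT (e.g. `numpy.fft.fft`) and applied over sliding windows, which is essentially the spectrogram listed above.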
![Page 13: Genomic Signal Processing Dr. C.Q. Chang Dept. of EEE](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d5e5503460f94a3df56/html5/thumbnails/13.jpg)
Spectral envelope of the BNRF1 gene from the Epstein-Barr virus
(a) 1st section (1000 bp), (b) 2nd section (1000 bp),
(c) 3rd section (1000 bp), (d) 4th section (954 bp)
Conjecture: the 4th section is actually non-coding
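Comparing sections in this way amounts to asking whether each section keeps the period-3 signature of coding DNA. A cheap way to get the period-3 power without a full DFT uses the fact that exp(−2πjn/3) depends only on n mod 3: if c0, c1, c2 count a base at positions ≡ 0, 1, 2 (mod 3), then |U_b(1/3)|² = c0² + c1² + c2² − c0c1 − c1c2 − c0c2. The sketch below splits a sequence into 1000-bp sections (the last one shorter, as in panel (d)) and scores each. It is an illustration of the period-3 idea, not the spectral-envelope method used on the slide; the names and the length normalization are assumptions.

```python
def period3_power(section: str) -> float:
    """Sum over bases of |U_b(1/3)|^2, divided by the section length.

    Uses codon-position counts: for a base with counts (c0, c1, c2) at
    positions n % 3 == 0, 1, 2, |U_b(1/3)|^2 = c0^2 + c1^2 + c2^2 - c0*c1 - c1*c2 - c0*c2.
    """
    if not section:
        return 0.0
    counts = {b: [0, 0, 0] for b in "ACGT"}
    for n, base in enumerate(section.upper()):
        if base in counts:
            counts[base][n % 3] += 1
    total = 0
    for c0, c1, c2 in counts.values():
        total += c0 * c0 + c1 * c1 + c2 * c2 - c0 * c1 - c1 * c2 - c0 * c2
    return total / len(section)


def split_sections(seq: str, size: int = 1000) -> list[str]:
    """Consecutive sections of `size` bases; the final section may be shorter."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def section_scores(seq: str, size: int = 1000) -> list[float]:
    """Period-3 score of each section, for spotting sections that lose the coding signature."""
    return [period3_power(s) for s in split_sections(seq, size)]


print([len(s) for s in split_sections("A" * 3954)])  # [1000, 1000, 1000, 954]
print(period3_power("ATG" * 30))                      # 30.0
```

A section whose score drops to the level of non-coding sequence would be consistent with the conjecture, though such a test is only suggestive without biological validation.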
![Page 14: Genomic Signal Processing Dr. C.Q. Chang Dept. of EEE](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d5e5503460f94a3df56/html5/thumbnails/14.jpg)
Signal Processing for Gene Expression
![Page 15: Genomic Signal Processing Dr. C.Q. Chang Dept. of EEE](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d5e5503460f94a3df56/html5/thumbnails/15.jpg)
Microarray Life Cycle (taken from Schena & Davis)
Biological question → sample preparation → microarray reaction → microarray detection → data analysis & modeling
![Page 16: Genomic Signal Processing Dr. C.Q. Chang Dept. of EEE](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d5e5503460f94a3df56/html5/thumbnails/16.jpg)
cDNA microarray workflow:
• Probes: cDNA clones → PCR product amplification and purification → printing (0.1 nl/spot) → microarray
• Target: mRNA → hybridise target to microarray
• Detection: excitation (laser 1, laser 2) → emission → scanning
• Analysis: overlay images and normalise
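The "overlay images and normalise" step for a two-channel array is often done on log ratios. A minimal sketch follows, assuming background-subtracted, positive red (laser 1) and green (laser 2) spot intensities. The M/A quantities and global median centering are one common choice, not necessarily the method used here, and the function names are hypothetical.

```python
import math
from statistics import median


def log_ratios(red: list[float], green: list[float]) -> list[float]:
    """M = log2(R / G) per spot; assumes positive, background-subtracted intensities."""
    return [math.log2(r / g) for r, g in zip(red, green)]


def mean_log_intensity(red: list[float], green: list[float]) -> list[float]:
    """A = 0.5 * log2(R * G) per spot, often plotted against M (an MA plot)."""
    return [0.5 * math.log2(r * g) for r, g in zip(red, green)]


def median_normalize(m: list[float]) -> list[float]:
    """Global normalization: shift log ratios so their median is 0,
    assuming most genes are not differentially expressed."""
    center = median(m)
    return [x - center for x in m]


m = log_ratios([200, 400, 800], [100, 100, 100])
print(m)                    # [1.0, 2.0, 3.0]
print(median_normalize(m))  # [-1.0, 0.0, 1.0]
```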
![Page 17: Genomic Signal Processing Dr. C.Q. Chang Dept. of EEE](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d5e5503460f94a3df56/html5/thumbnails/17.jpg)
![Page 18: Genomic Signal Processing Dr. C.Q. Chang Dept. of EEE](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d5e5503460f94a3df56/html5/thumbnails/18.jpg)
Image Segmentation
• Simple way: fixed circle method
• Advanced: fast marching level set segmentation
(Figure panels: advanced vs. fixed circle segmentation)
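The fixed circle method can be sketched as follows: every spot is assumed to sit at a known grid position, pixels within a fixed radius count as foreground, and the rest of a small surrounding patch counts as background. The image layout (a list of rows), the spot center, `radius` and `box` are all illustrative assumptions. The weakness that motivates the advanced method is visible here: irregularly shaped or off-center spots are cut by the fixed circle.

```python
def _mean(values: list[float]) -> float:
    """Arithmetic mean, 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def fixed_circle_spot(image, cx: int, cy: int, radius: int, box: int) -> tuple[float, float]:
    """Return (mean foreground, mean background) for one spot.

    image: 2-D grid of pixel intensities, indexed image[y][x].
    Foreground: pixels within `radius` of the assumed spot center (cx, cy).
    Background: remaining pixels of the square patch of half-width `box`.
    """
    foreground, background = [], []
    for y in range(max(0, cy - box), min(len(image), cy + box + 1)):
        row = image[y]
        for x in range(max(0, cx - box), min(len(row), cx + box + 1)):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                foreground.append(row[x])
            else:
                background.append(row[x])
    return _mean(foreground), _mean(background)


# Synthetic 9x9 spot: intensity 100 inside radius 2 of (4, 4), 10 elsewhere.
img = [[100 if (x - 4) ** 2 + (y - 4) ** 2 <= 4 else 10 for x in range(9)] for y in range(9)]
fg, bg = fixed_circle_spot(img, 4, 4, radius=2, box=4)
print(fg, bg, fg - bg)  # 100.0 10.0 90.0
```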
![Page 19: Genomic Signal Processing Dr. C.Q. Chang Dept. of EEE](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d5e5503460f94a3df56/html5/thumbnails/19.jpg)
Clustering and filtering methods
Principal approaches:
• Hierarchical clustering (kdb trees, CART, gene shaving)
• K-means clustering
• Self-organizing (Kohonen) maps
• Support vector machines
• Gene filtering via multiobjective optimization
• Independent component analysis (ICA)
Validation approaches:
• Significance analysis of microarrays (SAM)
• Bootstrapping cluster analysis
• Leave-one-out cross-validation
• Replication (additional gene chip experiments, quantitative PCR)
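To make K-means concrete for expression data: each gene is a vector of expression values across arrays, and the algorithm alternates between assigning genes to the nearest centroid and recomputing centroids. This is a minimal sketch (Lloyd's algorithm) assuming Euclidean distance and initialization with the first k profiles for reproducibility; real analyses often use correlation-based distances and random restarts. The names are illustrative.

```python
import math


def kmeans(profiles: list[list[float]], k: int, max_iter: int = 100):
    """Lloyd's K-means on expression profiles (one row per gene).

    Returns (labels, centroids). Centroids start at the first k profiles,
    a simple deterministic choice for illustration.
    """
    centroids = [list(p) for p in profiles[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in profiles
        ]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its member profiles.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


genes = [[0, 0], [10, 10], [0.1, 0], [10, 10.2], [0, 0.2], [9.8, 10]]
labels, centroids = kmeans(genes, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Cluster stability can then be checked with the validation approaches listed above, e.g. by re-running on bootstrap resamples of the arrays.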
![Page 20: Genomic Signal Processing Dr. C.Q. Chang Dept. of EEE](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d5e5503460f94a3df56/html5/thumbnails/20.jpg)
ICA for B-cell lymphoma data
Data: 96 samples of normal and malignant lymphocytes.
Results: scatter plots of 12 independent components
Comparison: closely related to the results of hierarchical clustering
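ICA models the observed data X as a linear mixture X = AS of statistically independent sources S and estimates an unmixing matrix. The slide does not say which ICA algorithm was used; the sketch below uses FastICA (whitening followed by deflationary fixed-point updates with a tanh nonlinearity), one common choice, and assumes NumPy is available. The function name and parameters are illustrative, and the demo uses synthetic signals, not the lymphoma data.

```python
import numpy as np


def fast_ica(X, n_components, max_iter=200, tol=1e-6, seed=0):
    """Deflationary FastICA with a tanh contrast.

    X: array of shape (n_signals, n_samples), e.g. arrays x genes.
    Returns estimated independent components, shape (n_components, n_samples),
    up to the usual ICA ambiguities of order, sign and scale.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=1, keepdims=True)          # center each observed signal
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc))   # covariance eigendecomposition
    order = np.argsort(eigvals)[::-1][:n_components]
    D = np.diag(1.0 / np.sqrt(eigvals[order]))
    Z = D @ eigvecs[:, order].T @ Xc                # whitened data (identity covariance)

    W = np.zeros((n_components, n_components))
    for i in range(n_components):
        w = rng.standard_normal(n_components)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            g = np.tanh(w @ Z)
            # Fixed-point update: w <- E[z g(w'z)] - E[g'(w'z)] w
            w_new = (Z * g).mean(axis=1) - (1.0 - g ** 2).mean() * w
            # Deflation: remove projections onto components already found
            w_new -= W[:i].T @ (W[:i] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        W[i] = w
    return W @ Z


# Synthetic demo: two independent sources, linearly mixed.
t = np.linspace(0, 8, 2000)
S = np.vstack([np.sin(2 * t), np.sign(np.sin(3 * t))])
X = np.array([[1.0, 0.6], [0.4, 1.0]]) @ S
Y = fast_ica(X, 2)
```

Each row of `Y` should match one row of `S` up to sign and scale; on expression data, genes with extreme values in a component are the ones usually inspected.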
![Page 21: Genomic Signal Processing Dr. C.Q. Chang Dept. of EEE](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d5e5503460f94a3df56/html5/thumbnails/21.jpg)
Resources and Co-operations
Resources: databases on the internet such as
• GenBank
• ProteinBank
• Some small databases of microarray data
Co-operation needed:
• First-hand microarray data
• Biological experiments for validation
![Page 22: Genomic Signal Processing Dr. C.Q. Chang Dept. of EEE](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d5e5503460f94a3df56/html5/thumbnails/22.jpg)
Challenges and Future Work
• Genomic signal processing opens a new signal processing frontier
• Sequence analysis: the signals are symbolic or categorical, so classical signal processing methods are not directly applicable
• The increasingly high dimensionality of genetic data sets, and the complexity involved, call for fast, high-throughput implementations of genomic signal processing algorithms
• Future work: spectral analysis of DNA sequences and clustering of microarray data; modifying classical signal processing methods and developing new ones