Chip arrays and gene expression data. Motivation
Motivation
With chip array technology, one can measure the expression of all genes at once (even of all exons). This lets us answer questions such as:
1. Which genes are expressed in a muscle cell?
2. Which genes are expressed in the mother during the first week of pregnancy? And in the baby?
3. Which genes are expressed in cancer?
4. If one mutates a TF (transcription factor), which genes are no longer expressed following this change?
5. Which genes are not expressed in the brain of a baby with an intellectual disability?
6. Which genes are expressed when a person is asleep versus when the same person is awake?
Analyzing Output
Output
Rows: Gene 1, Gene 2, Gene 3, up to Gene 25,000. Columns: w.t. (wild type), brain tumor males, brain tumor females.
Each cell holds either an absolute or a relative expression value, depending on the technology used.
Repeats
Rows: Gene 1, Gene 2, Gene 3, up to Gene 25,000. Columns: w.t., brain tumor male 1, brain tumor male 2, brain tumor female 1.
A repeat can be either the same sample hybridized to a different chip (a technical repeat) or a “real” biological repeat (a different sample).
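Expression profile

|    | wt1 | wt2 | wt3 | wt4 | bt1 | bt2 | bt3 | bt4 |
|----|-----|-----|-----|-----|-----|-----|-----|-----|
| g1 | 4   | 3   | 5   | 4   | 15  | 16  | 17  | 23  |
| g2 | 7   | 5   | 4   | 6   | 6   | 3   | 7   | 9   |
| g3 | 2   | 3   | 2   | 5   | 25  | 26  | 30  | 60  |

(wt = wild type, bt = brain tumor.) Genes 1 and 3 show the same trend: both go high under the same conditions. In other words, they have the same expression profile.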
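One common way to make “same trend” concrete is the Pearson correlation between two profiles. We chose this measure for illustration; the slides themselves use the Euclidean distance later on. Below is a minimal Python sketch using the values from the expression table:

```python
import math

# Expression values from the table (wt1-wt4, then bt1-bt4).
profiles = {
    "g1": [4, 3, 5, 4, 15, 16, 17, 23],
    "g2": [7, 5, 4, 6, 6, 3, 7, 9],
    "g3": [2, 3, 2, 5, 25, 26, 30, 60],
}

def pearson(x, y):
    """Pearson correlation: close to +1 when two profiles rise and fall together."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

for a, b in [("g1", "g2"), ("g1", "g3"), ("g2", "g3")]:
    print(a, b, round(pearson(profiles[a], profiles[b]), 2))
```

With these values, g1 and g3 come out strongly correlated (close to 1), while g2 correlates much more weakly with either of them. Correlation ignores the overall expression level (g3 reaches higher values than g1). That is why it captures the trend rather than the magnitude.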
Clustering
In general, we want to find all the genes that share the same expression profile, since a shared profile suggests a functional linkage.
Clustering algorithms do exactly that.
Clustering
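
|    | wt1 | wt2 | wt3 | wt4 | bt1 | bt2 | bt3 | bt4 |
|----|-----|-----|-----|-----|-----|-----|-----|-----|
| g1 | 4   | 3   | 5   | 4   | 0   | 22  | 0   | 23  |
| g2 | 7   | 5   | 4   | 6   | 0   | 8   | 0   | 9   |
| g3 | 2   | 3   | 2   | 5   | 60  | 1   | 66  | 1   |

Clustering the conditions (the columns) can suggest two types of brain tumor (bt): here, bt1 and bt3 versus bt2 and bt4.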
Clustering
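
|    | wt1 | wt2 | wt3 | wt4 | bt1 | bt2 | bt3 | bt4 |
|----|-----|-----|-----|-----|-----|-----|-----|-----|
| g1 | 4   | 3   | 5   | 4   | 0   | 22  | 0   | 23  |
| g2 | 7   | 5   | 4   | 6   | 0   | 8   | 0   | 9   |
| g3 | 2   | 3   | 2   | 5   | 6   | 1   | 7   | 3   |

Bi-clustering: clustering both the conditions and the genes.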
Applications
Think of increasing the glucose concentration for E. coli and running a chip array at each of several concentrations.
One can potentially discover all genes in the glucose pathway.
Similarly, knocking out a gene can reveal all the genes that interact with it.
Applications
Analyzing expression of genes can help reveal the gene network of a given organism.
Gene network
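Clinical
Given a new expression profile (g1 = 11, g2 = 4, g3 = 0), does this person have a brain tumor?

|    | wt1 | wt2 | wt3 | wt4 | bt1 | bt2 | bt3 | bt4 |
|----|-----|-----|-----|-----|-----|-----|-----|-----|
| g1 | 4   | 3   | 5   | 4   | 0   | 22  | 0   | 23  |
| g2 | 7   | 5   | 4   | 6   | 0   | 8   | 0   | 9   |
| g3 | 2   | 3   | 2   | 5   | 60  | 1   | 66  | 1   |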
MammaPrint
MammaPrint is used to assess the risk that a breast tumor will spread to other parts of the body (metastasis). It is based on the well-known 70-gene breast cancer gene signature.
In February 2007, the FDA cleared the MammaPrint test for use in the U.S.
Sequencing by hybridization
It was thought that the following procedure could work for sequencing a genome:
1. Make a chip containing all x-mers (e.g., x = 25).
2. Hybridize a genome to the chip.
3. Assemble the genome by analyzing all the hybridizations and their overlaps.
Problem: it doesn’t work.
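The slide does not say why the procedure fails. One well-known obstacle is repeats. A chip only reports which x-mers are present, and two different sequences can contain exactly the same set of x-mers. Scale is another obstacle: with x = 25, the chip would need 4^25 (roughly 10^15) distinct probes. Below is a small illustration in Python with x = 3; the two sequences are made up for the example:

```python
def kmer_set(seq, k):
    """All distinct length-k substrings of seq: what a chip carrying every k-mer would report."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

# Two hypothetical sequences, each containing the repeat "TA" three times;
# the segments between the repeats ("G" and "CC") appear in swapped order.
s1 = "CTAGTACCTAG"
s2 = "CTACCTAGTAG"

print(s1 != s2)                            # True: the sequences differ
print(kmer_set(s1, 3) == kmer_set(s2, 3))  # True: yet the hybridization signal is identical
```

Because the repeat is at least x - 1 letters long, swapping the segments between its copies leaves the x-mer set unchanged. The hybridization pattern therefore cannot tell the two sequences apart.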
ChIP-on-chip: a method for measuring protein-DNA interactions.
Proteins that bind DNA include:
- Those responsible for transcription regulation
  - Transcription factors (TFs)
- Replication proteins
- Histones…
ChIP-on-chip: the first “ChIP” stands for Chromatin ImmunoPrecipitation, and the second “chip” refers to DNA microarrays.
The method is used mostly to detect TF binding sites.
Tiling arrays
Here the chip array includes not only protein-coding genes but also control regions, or simply the entire genome.
Deep sequencing reads
Yoder-Himes D.R. et al. PNAS (2009)
Machine learning
Learning mode on. Bioinfo is great.
Clustering
Clustering (of expression data)
UPGMA (Unweighted Pair Group Method with Arithmetic mean) is one such direct method: it receives a distance matrix as input and outputs an ultrametric tree.
It was suggested by Sokal and Michener (1958).
Clustering (of expression data)
Often, there is a one-to-one transformation between the data and points in space.
For example, the expression of all genes under a specific condition is a point:

|            | Condition 1 |
|------------|-------------|
| Gene 1     | 5           |
| Gene 2     | 7           |
| Gene 3     | 2           |
| …          | …           |
| Gene 20000 | 54          |

(5, 7, 2, …, 54) is a point in a space of dimension 20,000.
Clustering (of expression data)
Another example: each expression profile is a point in a space whose dimension is the number of conditions.

|        | Condition 1 | Condition 2 | Condition 3 | Condition 4 |
|--------|-------------|-------------|-------------|-------------|
| Gene 1 | 50          | 20          | 4           | 33          |

(50, 20, 4, 33) is a point in a space of dimension 4.
In space: each point is a gene
(Figure: genes plotted as points, with Condition 1 and Condition 2 as the axes; gene g1 is labeled.)
Our goal will be to cluster genes
(Figure: genes plotted by Condition 1 and Condition 2.)
Genes that are in the same cluster (show similar patterns of expression) are likely to be functionally related.
Distance between two expression profiles
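The Euclidean distance:

|        | Condition 1 | Condition 2 | Condition 3 | Condition 4 |
|--------|-------------|-------------|-------------|-------------|
| Gene 1 | 50          | 20          | 4           | 33          |
| Gene 2 | 30          | 20          | 3           | 31          |

$$d_{1,2} = \sqrt{(50-30)^2 + (20-20)^2 + (4-3)^2 + (33-31)^2} = \sqrt{405} \approx 20.12$$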
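Here is the same computation as a minimal Python sketch. The function name `euclidean` is ours; in Python 3.8+, the standard library's `math.dist` computes the same value.

```python
import math

# Gene 1 and Gene 2 profiles over conditions 1-4, from the table.
gene1 = (50, 20, 4, 33)
gene2 = (30, 20, 3, 31)

def euclidean(p, q):
    """Square root of the sum of squared per-condition differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

d12 = euclidean(gene1, gene2)
print(round(d12, 2))  # 20.12, i.e. sqrt(400 + 0 + 1 + 4) = sqrt(405)
```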
Distance between two expression profiles
We can compute the distances between each pair of expression profiles and obtain a distance table.
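
|        | Condition 1 | Condition 2 | Condition 3 | Condition 4 |
|--------|-------------|-------------|-------------|-------------|
| Gene 1 | 50          | 20          | 4           | 33          |
| Gene 2 | 30          | 20          | 3           | 31          |
| Gene 3 | 30          | 20          | 3           | 31          |
| Gene 4 | 30          | 20          | 3           | 31          |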
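Filling the table means computing one distance per unordered pair of genes, so n genes give n(n-1)/2 distances. Below is a minimal sketch using Python's `math.dist` (Python 3.8+) on the four profiles in the table; the dictionary layout of the result is our own choice.

```python
import math
from itertools import combinations

# Profiles over conditions 1-4, as given in the table
# (genes 2-4 have identical values in the slide as transcribed).
profiles = {
    "Gene 1": (50, 20, 4, 33),
    "Gene 2": (30, 20, 3, 31),
    "Gene 3": (30, 20, 3, 31),
    "Gene 4": (30, 20, 3, 31),
}

def distance_table(profiles):
    """Return {(a, b): Euclidean distance} for every unordered pair of genes."""
    return {
        (a, b): math.dist(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }

for (a, b), d in distance_table(profiles).items():
    print(f"{a} - {b}: {d:.2f}")
```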
The distance table
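
|    | g1 | g2 | g3 | g4 | g5 | g6 | g7 | g8  |
|----|----|----|----|----|----|----|----|-----|
| g1 | 0  | 32 | 48 | 51 | 50 | 48 | 98 | 148 |
| g2 |    | 0  | 26 | 34 | 29 | 33 | 84 | 136 |
| g3 |    |    | 0  | 42 | 44 | 44 | 92 | 152 |
| g4 |    |    |    | 0  | 44 | 38 | 86 | 142 |
| g5 |    |    |    |    | 0  | 24 | 89 | 142 |
| g6 |    |    |    |    |    | 0  | 90 | 142 |
| g7 |    |    |    |    |    |    | 0  | 148 |
| g8 |    |    |    |    |    |    |    | 0   |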
Starting tree
The closest pair in the table is g5 and g6 (distance 24), so they are joined first. We call the father node of g5 and g6 “g56”.
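Removing the g5 and g6 rows and columns, and adding the g56 row and column:

|     | g1 | g2 | g3 | g4 | g56 | g7   | g8  |
|-----|----|----|----|----|-----|------|-----|
| g1  | 0  | 32 | 48 | 51 | ?   | 98   | 148 |
| g2  |    | 0  | 26 | 34 | ?   | 84   | 136 |
| g3  |    |    | 0  | 42 | ?   | 92   | 152 |
| g4  |    |    |    | 0  | ?   | 86   | 142 |
| g56 |    |    |    |    | 0   | 89.5 | 142 |
| g7  |    |    |    |    |     | 0    | 148 |
| g8  |    |    |    |    |     |      | 0   |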
Computing distances
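
|    | g1 | g2 | g3 | g4 | g5 | g6 | g7 | g8  |
|----|----|----|----|----|----|----|----|-----|
| g1 | 0  | 32 | 48 | 51 | 50 | 48 | 98 | 148 |

$$D(g_{56}, g_1) = \tfrac{1}{2} D(g_5, g_1) + \tfrac{1}{2} D(g_6, g_1) = \tfrac{1}{2}(50) + \tfrac{1}{2}(48) = 49$$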
The updated table. Starting the second iteration…
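
|     | g1 | g2 | g3 | g4 | g56 | g7   | g8  |
|-----|----|----|----|----|-----|------|-----|
| g1  | 0  | 32 | 48 | 51 | 49  | 98   | 148 |
| g2  |    | 0  | 26 | 34 | 31  | 84   | 136 |
| g3  |    |    | 0  | 42 | 44  | 92   | 152 |
| g4  |    |    |    | 0  | 41  | 86   | 142 |
| g56 |    |    |    |    | 0   | 89.5 | 142 |
| g7  |    |    |    |    |     | 0    | 148 |
| g8  |    |    |    |    |     |      | 0   |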
Building the tree - Continued
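The smallest distance in the updated table is D(g2, g3) = 26, so g2 and g3 are joined next. We call the father node of g2 and g3 “g23”. The tree now has two subtrees: g56 = (g5, g6) and g23 = (g2, g3).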
Computing distances
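
|     | g1 | g2 | g3 | g4 | g56 | g7   | g8  |
|-----|----|----|----|----|-----|------|-----|
| g56 | 49 | 31 | 44 | 41 | 0   | 89.5 | 142 |

$$D(g_{23}, g_{56}) = \tfrac{1}{2} D(g_2, g_{56}) + \tfrac{1}{2} D(g_3, g_{56}) = \tfrac{1}{2}(31) + \tfrac{1}{2}(44) = 37.5$$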
The updated table. Starting a new iteration…
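
|     | g1 | g23 | g4 | g56  | g7   | g8  |
|-----|----|-----|----|------|------|-----|
| g1  | 0  | 40  | 51 | 49   | 98   | 148 |
| g23 |    | 0   | 38 | 37.5 | 88   | 144 |
| g4  |    |     | 0  | 41   | 86   | 142 |
| g56 |    |     |    | 0    | 89.5 | 142 |
| g7  |    |     |    |      | 0    | 148 |
| g8  |    |     |    |      |      | 0   |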
Tree
The smallest distance is now D(g23, g56) = 37.5, so g23 and g56 are joined into g2356.
Computing distances
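
|    | g1 | g23 | g4 | g56 | g7 | g8  |
|----|----|-----|----|-----|----|-----|
| g1 | 0  | 40  | 51 | 49  | 98 | 148 |

$$D(g_{2356}, g_1) = \tfrac{1}{2} D(g_{23}, g_1) + \tfrac{1}{2} D(g_{56}, g_1) = \tfrac{1}{2}(40) + \tfrac{1}{2}(49) = 44.5$$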
Starting a new iteration…
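
|       | g1 | g2356 | g4   | g7    | g8  |
|-------|----|-------|------|-------|-----|
| g1    | 0  | 44.5  | 51   | 98    | 148 |
| g2356 |    | 0     | 39.5 | 88.75 | 143 |
| g4    |    |       | 0    | 86    | 142 |
| g7    |    |       |      | 0     | 148 |
| g8    |    |       |      |       | 0   |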
Building the tree
The smallest distance is now D(g2356, g4) = 39.5, so g4 joins g2356 to form g23456.
Computing distances
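
|    | g1 | g2356 | g4 | g7 | g8  |
|----|----|-------|----|----|-----|
| g1 | 0  | 44.5  | 51 | 98 | 148 |

The weights are proportional to cluster size: g2356 contains four genes and g4 contains one.

$$D(g_{23456}, g_1) = \tfrac{4}{5} D(g_{2356}, g_1) + \tfrac{1}{5} D(g_4, g_1) = \tfrac{4}{5}(44.5) + \tfrac{1}{5}(51) = 45.8$$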
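The weights used so far (1/2 and 1/2, then 4/5 and 1/5) are instances of one size-weighted averaging rule. In the general form below, the notation is ours: u is the node formed by merging clusters i and j, which contain n_i and n_j genes, and k is any other cluster. The last two lines check the rule against the next iteration's table:

$$
\begin{aligned}
D(u, k) &= \frac{n_i\,D(i, k) + n_j\,D(j, k)}{n_i + n_j} \\
D(g_{23456}, g_7) &= \frac{4 \cdot 88.75 + 1 \cdot 86}{5} = 88.2 \\
D(g_{23456}, g_8) &= \frac{4 \cdot 143 + 1 \cdot 142}{5} = 142.8
\end{aligned}
$$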
Starting an additional iteration…
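
|        | g1 | g23456 | g7   | g8    |
|--------|----|--------|------|-------|
| g1     | 0  | 45.8   | 98   | 148   |
| g23456 |    | 0      | 88.2 | 142.8 |
| g7     |    |        | 0    | 148   |
| g8     |    |        |      | 0     |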
Constructing the tree
The smallest distance is now D(g1, g23456) = 45.8, so g1 joins g23456 to form g123456.
One more iteration…

|         | g123456 | g7     | g8      |
|---------|---------|--------|---------|
| g123456 | 0       | 89.833 | 143.667 |
| g7      |         | 0      | 148     |
| g8      |         |        | 0       |
Reconstructing the tree
The smallest distance is now D(g123456, g7) = 89.833, so g7 joins g123456 to form g1234567.
The new table

|          | g1234567 | g8       |
|----------|----------|----------|
| g1234567 | 0        | 144.2857 |
| g8       |          | 0        |
Resulting tree
Finally, g8 joins g1234567 (distance 144.2857) at the root. In nested-parenthesis form, the resulting tree is:
((((((g5, g6), (g2, g3)), g4), g1), g7), g8)
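The whole procedure can be sketched in Python. This is a minimal, unoptimized illustration (roughly cubic in the number of genes), not a reference implementation. The function name `upgma` and the data layout (a dictionary keyed by unordered pairs) are our own choices. On the slides' distance table it should reproduce the merge order and merge distances of the worked example: 24, 26, 37.5, 39.5, 45.8, 89.833, 144.2857.

```python
from itertools import combinations

def upgma(leaves, dist):
    """Minimal UPGMA sketch.

    leaves: leaf names; dist: {frozenset({a, b}): distance} for every leaf pair.
    Returns the merges in order as (cluster_a, cluster_b, distance), where a
    cluster is a tuple of leaf names.
    """
    clusters = [(leaf,) for leaf in leaves]  # start: every gene is its own cluster
    d = {}
    for pair, value in dist.items():
        a, b = tuple(pair)
        d[frozenset({(a,), (b,)})] = value
    merges = []
    while len(clusters) > 1:
        # 1. Join the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merges.append((a, b, d[frozenset((a, b))]))
        new = a + b
        clusters = [c for c in clusters if c not in (a, b)]
        # 2. Distance from the new node to every other cluster: the old distances
        #    averaged with weights proportional to cluster size (e.g. 4/5 and 1/5).
        for c in clusters:
            d[frozenset((new, c))] = (
                len(a) * d[frozenset((a, c))] + len(b) * d[frozenset((b, c))]
            ) / (len(a) + len(b))
        clusters.append(new)
    return merges

# The slides' distance table, upper triangle only: row i holds the distances
# from LEAVES[i] to LEAVES[i+1], LEAVES[i+2], ...
LEAVES = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
UPPER = [
    [32, 48, 51, 50, 48, 98, 148],
    [26, 34, 29, 33, 84, 136],
    [42, 44, 44, 92, 152],
    [44, 38, 86, 142],
    [24, 89, 142],
    [90, 142],
    [148],
]
DIST = {
    frozenset({LEAVES[i], LEAVES[i + 1 + j]}): value
    for i, row in enumerate(UPPER)
    for j, value in enumerate(row)
}

for a, b, height in upgma(LEAVES, DIST):
    print(f"merge {'+'.join(a)} with {'+'.join(b)} at {height:.4f}")
```

The exact value of D(g56, g7) is (89 + 90)/2 = 89.5, and the later entries 88.75, 88.2 and 89.833 all follow from 89.5. For real data, SciPy's `scipy.cluster.hierarchy.linkage` with `method='average'` applies the same UPGMA rule.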
From tree to clusters
(Figure: the resulting tree, with leaves ordered g5, g6, g2, g3, g4, g1, g7, g8.)
If we want two clusters, we cut the tree just below the root and obtain g8 versus g1-g7.
If we want three clusters, we cut one level lower and obtain g8, g7, and g1-g6.
With four clusters, we obtain g8, g7, g1, and g2-g6 (the node g23456).
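Cutting the tree into k clusters amounts to undoing the last k - 1 merges, which are the ones closest to the root. Below is a minimal sketch. To keep it self-contained, the merge order of the worked example is hard-coded here rather than computed:

```python
# Merge order of the worked UPGMA example, first merge first.
MERGES = [
    ({"g5"}, {"g6"}),
    ({"g2"}, {"g3"}),
    ({"g2", "g3"}, {"g5", "g6"}),
    ({"g2", "g3", "g5", "g6"}, {"g4"}),
    ({"g2", "g3", "g4", "g5", "g6"}, {"g1"}),
    ({"g1", "g2", "g3", "g4", "g5", "g6"}, {"g7"}),
    ({"g1", "g2", "g3", "g4", "g5", "g6", "g7"}, {"g8"}),
]
LEAVES = [f"g{i}" for i in range(1, 9)]

def cut_tree(leaves, merges, k):
    """Return k clusters: replay every merge except the last k - 1 (those nearest the root)."""
    clusters = [frozenset({leaf}) for leaf in leaves]
    for a, b in merges[: len(merges) - (k - 1)]:
        a, b = frozenset(a), frozenset(b)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters

for k in (2, 3, 4):
    print(k, [sorted(c) for c in cut_tree(LEAVES, MERGES, k)])
```

UPGMA merge distances never decrease. Undoing the top merges is therefore the same as cutting the tree at a single height.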
Classification
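
|        | Condition 1 | Condition 2 |
|--------|-------------|-------------|
| Gene 1 | 50          | 20          |
| Gene 2 | 30          | 20          |
| Gene 3 | 30          | 20          |
| Gene 4 | 30          | 20          |

(Figure: samples plotted by Gene 1 and Gene 2 expression; an unclassified sample is marked “?”.)
If red = brain tumor and yellow = healthy, do I have a brain tumor?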
SVM = support vector machine
(Figure: samples plotted by Gene 1 and Gene 2 expression; the unclassified sample is marked “?”.)
In SVM, we find a (hyper)plane that divides the space in two.
SVM – confidence in classification
The farther a point is from the separating (hyper)plane, the more confident we are in its classification.
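Applying an already trained linear SVM is simple. The sign of w·x + b gives the side of the hyperplane (the class), and |w·x + b| / ||w|| gives the distance from it (the confidence). Below is a minimal sketch. The values of `w` and `b` are hypothetical stand-ins for what training would produce, and which side counts as “red” is an assumption. Training itself (finding the maximum-margin hyperplane) is not shown.

```python
import math

# Hypothetical separating line w . x + b = 0 in the (Gene 1, Gene 2) plane,
# standing in for the output of a trained linear SVM.
w = (1.0, -1.0)
b = 0.0

def svm_classify(x):
    """The side of the hyperplane gives the class; the distance from it gives the confidence."""
    score = w[0] * x[0] + w[1] * x[1] + b
    distance = abs(score) / math.hypot(*w)
    label = "red (brain tumor)" if score > 0 else "yellow (healthy)"
    return label, distance

for point in [(9.0, 1.0), (5.5, 4.5)]:
    label, dist = svm_classify(point)
    print(point, label, round(dist, 2))
```

Both example points land on the red side, but the first is much farther from the line. Its classification is therefore the more confident one.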
SVM – cannot always perfectly classify
Sometimes the training data cannot be perfectly separated. In this case, we look for the best possible separation and allow some training points to fall on the wrong side.
KNN = k nearest neighbors
(Figure: samples plotted by Gene 1 and Gene 2 expression; the unclassified sample is marked “?”.)
KNN is another classification method: for each point, it looks at the point's k nearest neighbors.
If red = brain tumor and yellow = healthy, do I have a brain tumor?
For example, with k = 3 the method looks at the point's 3 nearest neighbors to decide how to classify it: if the majority are red, it classifies the point as red.
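Below is a minimal KNN sketch in Python. The six labeled points are hypothetical, and distances are Euclidean (`math.dist`, Python 3.8+). Ties are not handled specially, which is one reason odd values of k are common when there are two classes.

```python
import math
from collections import Counter

# Hypothetical labeled samples: (Gene 1, Gene 2) expression -> label.
training = [
    ((2.0, 8.0), "yellow"), ((3.0, 7.0), "yellow"), ((2.5, 9.0), "yellow"),
    ((7.0, 2.0), "red"), ((8.0, 3.0), "red"), ((6.5, 1.5), "red"),
]

def knn_classify(query, training, k=3):
    """Label the query by majority vote among its k nearest training points."""
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify((6.0, 3.0), training))  # red
```

With k = 3, the three nearest neighbors of (6.0, 3.0) are all red, so the point is classified as red.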
(Figure: another Gene 1 vs. Gene 2 example; the point to classify is marked “?”.)
For the case shown, KNN performs better than SVM.
KNN - exercise
(Figure: samples plotted by Gene 1 and Gene 2; the point to classify is marked “?”.)
In the above example, how will the point be classified by KNN with k = 1? By SVM?
Training dataset
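(Figure: red and yellow labeled points plotted by Gene 1 and Gene 2, and an unlabeled point marked “?”.)
The red and yellow points are used to train the classifier.
In general, the more training data one has, the better the classifier tends to perform.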
Test dataset
Usually, some points for which we know the answer are withheld from the classifier and used to TEST its performance.
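Below is a minimal sketch of holding out a test set. The data are hypothetical, and the 25% test fraction and fixed random seed are our choices. The classifier is KNN with k = 1, for which “training” just means storing the training points.

```python
import math
import random

def one_nn(query, train):
    """Classify query with the label of its single nearest training point (KNN, k = 1)."""
    return min(train, key=lambda item: math.dist(query, item[0]))[1]

def holdout_accuracy(data, test_fraction=0.25, seed=0):
    """Hide a random test set from training; return the fraction of test points classified correctly."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    correct = sum(one_nn(x, train) == label for x, label in test)
    return correct / len(test)

# Hypothetical, well-separated labeled samples: (Gene 1, Gene 2) -> label.
data = (
    [((1 + 0.1 * i, 8 - 0.1 * i), "yellow") for i in range(10)]
    + [((8 - 0.1 * i, 1 + 0.1 * i), "red") for i in range(10)]
)
print(holdout_accuracy(data))
```

Because the two groups here are far apart, every held-out point is classified correctly (accuracy 1.0). On real data, the accuracy on the held-out points is the number to report. For 1-NN, the accuracy on the training points is trivially 100%.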
Decision tree
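
| Age     | Gene1 | Gene2 | Smoker | Operation |
|---------|-------|-------|--------|-----------|
| <20     | high  | high  | yes    | yes       |
| <20     | high  | high  | yes    | yes       |
| <20     | low   | low   | no     | no        |
| [20,40] | low   | high  | yes    | yes       |
| [20,40] | high  | high  | no     | yes       |
| [20,40] | high  | low   | yes    | no        |
| >40     | low   | low   | yes    | no        |
| >40     | high  | low   | no     | no        |
| >40     | low   | high  | no     | no        |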
Decision tree
- Age > 40?
  - Yes: Operation = no
  - No: check Gene 2
    - high: Operation = yes
    - low: Operation = no

Decision trees are built automatically from training data and are used for classification.
They also tell us which features are most important.
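Information gain is one standard way to decide which feature a tree should test. It measures how much splitting on a feature reduces the entropy of the Operation label, and it is used by ID3-style learners. We chose it for illustration; the slides do not name a method. Below is a minimal sketch on the decision tree table, assuming the first age group reads “<20”:

```python
import math
from collections import Counter

# Training table. Assumption: the three age groups are <20, [20,40] and >40.
# The last column is the label (Operation).
FEATURES = ["Age", "Gene1", "Gene2", "Smoker"]
ROWS = [
    ("<20", "high", "high", "yes", "yes"),
    ("<20", "high", "high", "yes", "yes"),
    ("<20", "low", "low", "no", "no"),
    ("[20,40]", "low", "high", "yes", "yes"),
    ("[20,40]", "high", "high", "no", "yes"),
    ("[20,40]", "high", "low", "yes", "no"),
    (">40", "low", "low", "yes", "no"),
    (">40", "high", "low", "no", "no"),
    (">40", "low", "high", "no", "no"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, col):
    """Entropy of Operation minus its size-weighted entropy after splitting on column col."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row[-1])
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - after

gains = {name: information_gain(ROWS, i) for i, name in enumerate(FEATURES)}
for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {gain:.3f}")

def slide_tree(age, gene1, gene2, smoker):
    """The tree drawn on the slide: Age > 40 -> no; otherwise Gene 2 decides."""
    if age == ">40":
        return "no"
    return "yes" if gene2 == "high" else "no"

print(all(slide_tree(*row[:-1]) == row[-1] for row in ROWS))  # True
```

Gene 2 has the largest gain (about 0.59), ahead of Age (about 0.38). An information-gain learner would therefore test Gene 2 first, whereas the tree on the slide tests Age first. Both trees classify all nine training rows correctly; the last line checks this for the slide's tree.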
Voting
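Voting uses an array of machine learning algorithms and chooses the classification suggested by most classifiers.
(Figure: training data that need a Yes/No classification are used to train three classifiers: decision trees, KNN, and SVM. For a new datum (test), the decision tree votes No, while KNN and SVM vote Yes, so the combined answer is YES.)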
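Once each classifier has produced its answer, majority voting itself is a one-liner. Below is a minimal sketch with the three votes shown in the figure; the classifiers themselves are not implemented here.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(predictions.values()).most_common(1)[0][0]

# The votes shown in the figure for the new datum.
votes = {"decision tree": "No", "KNN": "Yes", "SVM": "Yes"}
print(majority_vote(votes))  # Yes
```

With an odd number of classifiers and two classes, a tie cannot occur.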
Classification is used outside the scope of bioinformatics
For example, in digit recognition, the distance between the query image and each image in the dataset is computed, and the digit is identified from the labels of the k nearest ones.
*More advanced algorithms allow for rotation and enlargement of the digit to be classified.
UPGMA - exercise
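In the example below, how will the points be clustered using UPGMA?

Initial distance table:

|    | x1 | x2 | x3 | x4 |
|----|----|----|----|----|
| x1 | 0  | 2  | 12 | 30 |
| x2 |    | 0  | 8  | 10 |
| x3 |    |    | 0  | 4  |
| x4 |    |    |    | 0  |

After merging x1 and x2:

|     | x12 | x3 | x4 |
|-----|-----|----|----|
| x12 | 0   | 10 | 20 |
| x3  |     | 0  | 4  |
| x4  |     |    | 0  |

After merging x3 and x4:

|     | x12 | x34 |
|-----|-----|-----|
| x12 | 0   | 15  |
| x34 |     | 0   |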
Dataset sizes
A classifier is needed to detect “Pupko disease” based on gene expression.
Pupko disease is extremely rare (say, it afflicts 1 out of 100,000 people).
A classifier was trained on a large volume of samples in which all cases are negative. On a test dataset, it correctly classified 99.9% of the cases…
That is hardly an achievement: the fraction of positive cases in the test data is only ~0.01%.
Take-home message: (1) It is better to train a classifier on roughly equal numbers of “positive” and “negative” cases.
(2) Reporting only the “% of accurate classifications” is not enough. One has to report the FP, FN, TP, and TN counts. In this example, all positive cases are false negatives, a false negative rate of 100%.
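Below is a minimal sketch of the arithmetic behind this warning. The test-set size (100,000 samples, 10 of them positive, i.e. 0.01%) is hypothetical. The classifier is the degenerate one described above, which always answers “negative”.

```python
def confusion_counts(truth, predicted):
    """Count TP, FP, TN, FN, treating True as 'has the disease'."""
    pairs = list(zip(truth, predicted))
    tp = sum(t and p for t, p in pairs)
    fp = sum((not t) and p for t, p in pairs)
    tn = sum((not t) and (not p) for t, p in pairs)
    fn = sum(t and (not p) for t, p in pairs)
    return tp, fp, tn, fn

# Hypothetical test set: 100,000 samples, 10 of them positive (0.01%).
truth = [True] * 10 + [False] * 99_990
# A classifier trained only on negatives: it always answers "negative".
predicted = [False] * len(truth)

tp, fp, tn, fn = confusion_counts(truth, predicted)
accuracy = (tp + tn) / len(truth)
false_negative_rate = fn / (tp + fn)
print(tp, fp, tn, fn)        # 0 0 99990 10
print(accuracy)              # 0.9999
print(false_negative_rate)   # 1.0
```

Accuracy is 99.99%, yet every sick person is missed: the false negative rate is 100%.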
Exercises - examples
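In clustering with UPGMA, I merged gene X and gene Y. The distance between gene X and gene T was 7, and the distance between gene Y and gene T was 9. Which of the following statements is correct?
• a. The distance between the group that merges genes X and Y and gene T is 8.
• b. The distance between the union of X and Y and T cannot be computed, because the distance between X and Y is not given.
• c. The distance between gene X and gene Y is smaller than 7.
• a + b.
• a + c.
• b + c.
• a + b + c.
• None of the answers is correct.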
Exercises - examples
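23. Which of the following statements is correct?
a. In SVM, the smaller the distance between the point to be classified and the separating surface, the smaller the chance that the classification is wrong.
b. In SVM, all points of type A are always on one side and all points of type B are on the other side.
c. One can develop an SVM that classifies proteins into transmembrane and non-transmembrane proteins.
d. None of the answers is correct.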
Exercises - examples
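24. Given the following figure:
Which of the following statements is correct?
a. According to (linear) SVM, the point with the question mark will be classified as a black point.
b. According to KNN with the number of neighbors equal to one, the point with the question mark will be classified as a white point.
c. a + b.
d. None of the answers is correct.
(Figure: points plotted by Gene 1 and Gene 2 expression; the point to classify is marked “?”.)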
Legionella pneumophila case study
How did it all begin?
Legionella pneumophila
Legionnaires’ disease nowadays
Legionella pneumophila
Copyright © 2005 Nature Publishing Group. Created by Arkitek from Nature Reviews Microbiology
Identifying the effectors
The features
- Homology to host proteins
- Regulatory elements
- Genome proximity to other effectors
- Secretion signal
- Abundance in Metazoa / Bacteria
- GC content
- Sequence homology
The effectors machine
The big picture
(Figure: overview of the effector-prediction pipeline.
Features: similarity to known effectors, similarity to host proteins, regulatory elements, G-C content, secretory signals, abundance in Metazoa / Bacteria, genome arrangement.
Feature selection.
Classification algorithms: NN, SVM, Naïve Bayes, Bayesian Net, Voting.
Prior knowledge (validated effectors and non-effectors) → trained model; unclassified genes → predicted effectors / predicted non-effectors → experimental validation → newly validated effectors.)
Does it really work??
Machine learning