ponnapalli phd thesis
TRANSCRIPT
![Page 1: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/1.jpg)
Copyright
by
Sri Priya Ponnapalli
2010
![Page 2: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/2.jpg)
The Dissertation Committee for Sri Priya Ponnapallicertifies that this is the approved version of the following dissertation:
Higher-Order Generalized
Singular Value Decomposition:
Comparative Mathematical Framework with
Applications to Genomic Signal Processing
Committee:
Orly Alter, Supervisor
Joydeep Ghosh, Supervisor
William Beckner
Constantine Caramanis
Brian L. Evans
Charles Van Loan
![Page 3: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/3.jpg)
Higher-Order Generalized
Singular Value Decomposition:
Comparative Mathematical Framework with
Applications to Genomic Signal Processing
by
Sri Priya Ponnapalli, B.E.; M.S.
DISSERTATION
Presented to the Faculty of the Graduate School of
The University of Texas at Austin
in Partial Fulfillment
of the Requirements
for the Degree of
DOCTOR OF PHILOSOPHY
THE UNIVERSITY OF TEXAS AT AUSTIN
August 2010
![Page 4: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/4.jpg)
Dedicated to my parents
Uma Devi
and
Kamalakar Rao Ponnapalli.
![Page 5: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/5.jpg)
Acknowledgments
There are many people I’d like to thank for having made my time at
UT and in Austin most wonderful. First and foremost, I thank my advisor
Professor Orly Alter. She is most amazing and has been very supportive and
encouraging of me through some difficult times. Prof. Orly is one of my role
models and I have learnt a lot from her both personally and professionally.
I admire her patience and persistence and her strong emphasis on quality. I
am also greatly inspired by her encouragement of young women in science and
technology. I consider myself very lucky to have had the opportunity to work
with her.
Next, I thank the members of my thesis committee: my co-supervisor
Professor Joydeep Ghosh, Professor William Beckner, Professor Constantine
Caramanis, Professor Brian L. Evans and Professor Charles Van Loan, for
their guidance and support throughout the course of my graduate studies. I
have learnt a lot from the classes I took with them and each of those classes
have inspired me in some way.
I thank my collaborators, coauthors and friends Professor Gene Golub,
Professor Michael Saunders and Professor Charles Van Loan. It has been an
honor working with them and I have learnt so much from these experts in
linear algebra. They have been very kind and encouraging of my work and
this has always kept me motivated.
I thank my lab mates and friends Justin Drake, Andy Gross, Cheng
H. Lee, Chaitanya Muralidhara, Larsson Omberg, and Daifeng Wang for their
v
![Page 6: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/6.jpg)
companionship and technical assistance. I thank Melanie Gulick for her pa-
tience and kindness in guiding me through the often confusing forms and pro-
cedures. I thank Keith Kachtick of Dharma yoga for his spiritual guidance.
I’d now like to thank my best friends and support system. Their friend-
ship is invaluable to me. I thank Chase S. Krumpelman for everything and
just being there for me, always. I thank Apurva Sarathy also for everything
and standing by me through everything. I thank Sidney D. Rosario for the
great friend and companion he has been to me during the testing phase of my
Ph.D. completion. I thank David Parr, for standing by me and comforting me
through some difficult times, Richard Naething, my first friend in the US for
the camaraderie, and Abhishek Ramkumar, for his timely and valuable advise
and support. I thank Shruti Tiwari, Christine Vogel, Ramya Bhagavatula,
Adithi Rao, Sanaa Ansari, and Shahela Sajanlal for always lending a patient
ear and a supportive arm. And all my friends, for just being great fun.
Finally, I’d like to thank my family. I thank my grand aunt Sudha atha
and grand uncle Babu mama for their love and encouragement. My aunt and
uncle, Padmasree and Mohan Warrior for taking care of me and supporting
me through my difficult times. Special thanks to Mohan uncle for teaching me
the 5 C’s to face any situation: calm, cool, collected, confident and composed,
along with 10 deep breaths.
To my parents, I dedicate this thesis. My mother Uma Devi is and
always has been the driving force behind everything I do. Her interest and
encouragement of my education is what has brought me to where I am. I
thank my father Kamalakar Rao for always supporting and encouraging my
bold decisions and making me believe I could do anything I wanted to. I owe
them everything.
vi
![Page 7: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/7.jpg)
Higher-Order Generalized
Singular Value Decomposition:
Comparative Mathematical Framework with
Applications to Genomic Signal Processing
Publication No.
Sri Priya Ponnapalli, Ph.D.
The University of Texas at Austin, 2010
Supervisors: Orly AlterJoydeep Ghosh
The number of high-dimensional datasets recording multiple aspects
of a single phenomenon is ever increasing in many areas of science. This is
accompanied by a fundamental need for mathematical frameworks that can
compare data tabulated as multiple large-scale matrices of different numbers
of rows. The only such framework to date, the generalized singular value
decomposition (GSVD), is limited to two matrices.
This thesis mathematically defines a higher-order GSVD (HO GSVD)
of N ≥ 2 datasets, tabulated as N matrices Di ∈ Rmi×n of the same number
of columns and, in general, different numbers of rows. Each data matrix Di is
assumed to have full column rank and is factored as the product Di = UiΣiVT ,
where the columns of Ui ∈ Rmi×n and V ∈ Rn×n have unit length and are
the left and right basis vectors respectively, and each Σi ∈ Rn×n is diagonal
vii
![Page 8: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/8.jpg)
and positive definite. Identical in all factorizations, V is obtained from the
eigensystem SV = V Λ of the arithmetic mean S of all pairwise quotients
AiA−1j , i 6= j, of the matrices Ai = DT
i Di. The kth diagonal of Σi = diag(σi,k)
indicates the significance of the kth right basis vector vk in the ith matrix Di
in terms of the overall information that vk captures in Di. The ratio σi,k/σj,k
indicates the significance of vk in Di relative to its significance in Dj. We
assume that the eigensystem of S exists, and that V is real and not severely
ill-conditioned. We show that when a left basis vector ui,k is orthonormal
to all other left basis vectors in Ui for all i, this new decomposition extends
to higher order all of the mathematical properties of the generalized singular
value decomposition: the corresponding right basis vector vk is an eigenvector
of all pairwise quotients AiA−1j , the corresponding eigenvalue of S satisfies
λk ≥ 1, and an eigenvalue λk = 1 corresponds to a right basis vector vk of
equal significance in all matrices Di and Dj (meaning σi,k/σj,k = 1 for all i
and j).
The second part of the thesis illustrates this HO GSVD with a compar-
ison of cell-cycle mRNA expression from three disparate organisms: S. pombe,
S. cerevisiae and human. Comparative analyses of global mRNA expression
from multiple model organisms promise to enhance fundamental understanding
of the universality and specialization of molecular biological mechanisms, and
may prove useful in medical diagnosis, treatment and drug design. Existing
algorithms limit analyses to subsets of homologous genes among the different
organisms, effectively introducing into the analysis the assumption that se-
quence and functional similarities are equivalent. For sequence-independent
comparisons, mathematical frameworks are required that can distinguish the
similar from the dissimilar among multiple large-scale datasets tabulated as
viii
![Page 9: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/9.jpg)
matrices of different numbers of rows, corresponding to the different sets of
genes of the different organisms. Unlike existing algorithms, a mapping among
the genes of these disparate organisms is not required in our HO GSVD frame-
work, enabling correct classification of genes of highly conserved sequences but
significantly different cell-cycle peak times.
The last part of this thesis further investigates the mathematical prop-
erties of the HO GSVD and removes the assumption that the eigensystem of S
exists and that V is real and not severely ill-conditioned. The the eigenvalues
of S satisfy λk ≥ 1. We show that when an eigenvalue of S satisfies λk = 1, the
corresponding eigenvector vk is an eigenvector of sums of pairwise quotients(N∑
i=1
Ai
)A−1
j , that it satisfies vTk A−1j vk ≥ σ−2
j for j = 1, . . . , N and that we
can compute the corresponding eigenvector vk from the Di without the need
for computing any inverse. We comment on certain conditions under which
the kth left basis vectors ui,k are orthonormal to all other left basis vectors in
Ui for all i.
ix
![Page 10: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/10.jpg)
Table of Contents
Acknowledgments v
Abstract vii
List of Tables xiii
List of Figures xiv
Chapter 1. Introduction 1
1.1 Generalized Singular Value Decomposition (GSVD) . . . . . . 4
1.1.1 GSVD for Comparison of Global mRNA Expression Datasetsfrom Two Different Organisms . . . . . . . . . . . . . . 4
1.2 Thesis Outline and Contributions . . . . . . . . . . . . . . . . 7
1.2.1 Chapter 2: HO GSVD Mathematical Framework . . . . 7
1.2.2 Chapter 3: HO GSVD for Comparison of Global mRNAExpression Datasets from Three Different Organisms . . 9
1.2.3 Chapter 4: Additional Investigation of the MathematicalProperties of the HO GSVD . . . . . . . . . . . . . . . 10
1.2.4 Chapter 5: Summary and Future Work . . . . . . . . . 11
Chapter 2. Mathematical Framework: HO GSVD 12
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 The Matrix GSVD . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.1 Construction of the matrix GSVD. . . . . . . . . . . . . 14
2.2.2 Interpretation of the GSVD construction. . . . . . . . . 16
2.2.3 Mathematical properties of Λ and V . . . . . . . . . . . . 17
2.3 Higher-order generalized singular value decomposition (HO GSVD) 20
2.3.1 Construction of the HO GSVD. . . . . . . . . . . . . . . 20
2.3.2 Interpretation of the HO GSVD construction. . . . . . . 22
x
![Page 11: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/11.jpg)
2.3.3 Mathematical properties of Λ and V . . . . . . . . . . . . 23
2.4 The Matrix GSVD and SVD as Special Cases . . . . . . . . . 25
2.5 Role in Approximation Algorithms . . . . . . . . . . . . . . . . 27
2.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Chapter 3. HO GSVD for Comparison of Global mRNA Ex-pression Datasets from Three Different Organisms 31
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2 S. pombe, S. cerevisiae and Human Datasets . . . . . . . . . . 33
3.3 Missing Data Estimation . . . . . . . . . . . . . . . . . . . . . 34
3.4 Common HO GSVD Subspace . . . . . . . . . . . . . . . . . . 35
3.5 Common Subspace Interpretation . . . . . . . . . . . . . . . . 40
3.6 HO GSVD Data Reconstruction . . . . . . . . . . . . . . . . . 42
3.7 Simultaneous HO GSVD Classification . . . . . . . . . . . . . 42
3.8 Sequence-Independence of the Classification . . . . . . . . . . 47
3.9 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Chapter 4. Additional Investigation of the Mathematical Prop-erties of the HO GSVD 56
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.2 Mathematical Properties of the Matrix GSVD . . . . . . . . . 57
4.3 Mathematical Properties of the HO GSVD . . . . . . . . . . . 59
4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Chapter 5. Summary and Future Work 68
5.1 Summary of HO GSVD Mathematical Framework . . . . . . . 68
5.2 Future Direction: A stable HO GSVD Algorithm . . . . . . . . 70
5.2.1 A Stable GSVD Algorithm . . . . . . . . . . . . . . . . 70
5.2.2 Extending this Algorithm to N > 2 Matrices . . . . . . 70
5.3 Summary of HO GSVD for Comparison of Global mRNA Ex-pression Datasets from Three Different Organisms . . . . . . . 72
5.4 Future Direction: Additional Applications of the HO GSVD . 74
5.5 Summary of Further Investigation of the Mathematical Proper-ties of the HO GSVD . . . . . . . . . . . . . . . . . . . . . . . 75
5.6 Future Direction: Orthogonality of the Left Basis Vectors Ui . 76
5.7 Are Genetic Networks Linear and Orthogonal? . . . . . . . . . 76
xi
![Page 12: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/12.jpg)
Appendices 78
Appendix A. Global mRNA Expression Datasets 79
Appendix B. Matrix Representation of an mRNA ExpressionDataset 82
Appendix C. The Cell-Cycle 84
Appendix D. De Lathauwer’s Generalized GSVD 86
D.1 Alternating Least Squares Algorithm . . . . . . . . . . . . . . 86
D.2 Updating {Ui} . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
D.3 Updating {Di} and {V } . . . . . . . . . . . . . . . . . . . . . 87
Appendix E. Mathematica Notebook and Datasets 88
E.1 Mathematica Notebook . . . . . . . . . . . . . . . . . . . . . . 88
E.2 Supplementary Dataset 1 . . . . . . . . . . . . . . . . . . . . . 88
E.3 Supplementary Dataset 2 . . . . . . . . . . . . . . . . . . . . . 88
E.4 Supplementary Dataset 3 . . . . . . . . . . . . . . . . . . . . . 89
Appendix F. Poincare’s Inequality and Weyl’s Inequalities 90
F.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
F.2 Poincare’s Inequality . . . . . . . . . . . . . . . . . . . . . . . 91
F.3 Weyl’s Inequalites . . . . . . . . . . . . . . . . . . . . . . . . . 92
Bibliography 93
Vita 102
xii
![Page 13: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/13.jpg)
List of Tables
3.1 Probabilistic significance of the enrichment of the arraylets thatspan the common HO GSVD subspace in each dataset in over-expressed or underexpressed cell cycle-regulated genes. . . . . 41
3.2 The closest S. pombe, S. cerevisiae and human homologs of(a) the S. pombe gene BFR1, (b) the S. cerevisiae gene PLB1,and (c) the S. pombe gene CIG2, as determined by an NCBIBLAST[7] of the protein sequence that corresponds to each geneagainst the NCBI RefSeq database [46] . . . . . . . . . . . . . 54
xiii
![Page 14: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/14.jpg)
List of Figures
1.1 The price of a stock in different financial markets is a functionof “global factors” and “local factors” . . . . . . . . . . . . . . 2
1.2 Generalized singular value decomposition . . . . . . . . . . . . 5
2.1 The GSVD of two matrices D1 and D2 . . . . . . . . . . . . . 15
2.2 The HO GSVD of three matrices D1, D2, and D3 . . . . . . . 21
3.1 Higher-order generalized singular value decomposition . . . . . 36
3.2 The genelets, i.e., the HO GSVD patterns of expression varia-tion across time. . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.3 Correlations among the n=17 arraylets in each organism. . . 39
3.4 The three-dimensional least-squares approximation of the five-dimensional approximately common HO GSVD subspace. . . . 43
3.5 Simultaneous HO GSVD reconstruction and classification of S.pombe, S. cerevisiae and human global mRNA expression in theapproximately common HO GSVD subspace. . . . . . . . . . . 44
3.6 S. pombe global mRNA expression reconstructed in the five-dimensional common HO GSVD subspace with genes sortedaccording to their phases in the two-dimensional subspace thatapproximates it. . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.7 S. cerevisiae global mRNA expression reconstructed in the five-dimensional common HO GSVD subspace with genes sortedaccording to their phases in the two-dimensional subspace thatapproximates it. . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.8 Human global mRNA expression reconstructed in the five-dimensionalcommon HO GSVD subspace with genes sorted according totheir phases in the two-dimensional subspace that approximatesit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.9 Simultaneous HO GSVD sequence-independent classification ofgenes of significantly different cell-cycle peak times but highlyconserved sequences. . . . . . . . . . . . . . . . . . . . . . . . 53
A.1 Flow of information in a cell . . . . . . . . . . . . . . . . . . . 80
xiv
![Page 15: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/15.jpg)
B.1 Matrix representation of an mRNA expression dataset . . . . . 83
C.1 The cell-cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
xv
![Page 16: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/16.jpg)
Chapter 1
Introduction
The number of high-dimensional datasets recording multiple aspects of
a single phenomenon is ever increasing in many areas ranging from finance
to biology. This is accompanied by a fundamental need for mathematical
frameworks that can compare such datasets to distinguish the similar from
the dissimilar.
In finance, we have datasets recording stock prices in different markets.
The price of a stock in any of these markets is a function of “global factors” that
influence markets worldwide, and “local factors” that are specific to a given
market (Fig. 1). Comparative analyses of stock prices across these markets to
identify the global (the similar) and local (the dissimilar) factors is important
to understand how these markets behave relative to one another.
In biology, we have DNA mircroarray datasets recording global messen-
ger RNA (mRNA) or gene expression activity in cells of different organisms
such as yeasts and human under a given set of conditions (Appendix A: Global
mRNA Expression Datasets). The expression of a gene in any of the organisms
is again a function of “global factors” that are universally conserved biologi-
cal mechanisms (the similar), and “local factors” that are specific to a given
organism (the dissimilar). Comparative analyses of global mRNA expression
from multiple model organisms promise to enhance fundamental understand-
ing of the universal and organism-specific biological mechanisms, and may
1
![Page 17: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/17.jpg)
Comparison of Data
New York Stock Exchange
Bombay Stock Exchange
London Stock Exchange
Identify the Global and Local factors affecting stock prices.
Stock Price = f(G, L)G = Global FactorsL = Local Factors
Figure 1.1: Consider the price of a stock in the New York, Bombay and Lon-don stock exchanges. It is a function of “global factors” that influence allthree markets, and “local factors” that are specific to a given market. Identi-fying the global (the similar) and local (the dissimilar) factors is important tounderstand how these markets behave relative to one another.
2
![Page 18: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/18.jpg)
prove useful in medical diagnosis, treatment and drug design [21, 33].
High-dimensional datasets in finance or biology have natural matrix
representations. For example, a single DNA microarray monitors the mRNA
expression levels of m genes of an organism in a single experiment. A se-
quence of n such microarrays measure the expression levels of the set of m
genes across n different experimental conditions. The full mRNA expression
dataset can therefore be represented as an m × n matrix D (Appendix B:
Matrix Representation of an mRNA Expression Dataset). The ith row of ma-
trix D represents the mRNA expression of the ith gene across the n different
experiments. The jth column of D represents the mRNA expression of all m
genes in the jth experiment.
Different organisms have different numbers of genes. Hence, their full
mRNA expression datasets are matrices of different numbers of rows. At
present, any comparison of such functional data across multiple organisms is
limited to and synonymous with performing the comparison across subsets of
homologous genes (i.e., genes of similar sequence) among the different organ-
isms. Thus, existing algorithms require the mapping of genes (rows) of one
organism (matrix) onto those of another based on sequence similarity and effec-
tively introduce into the analysis the assumption that sequence and functional
similarities are equivalent.
For sequence-independent comparisons, mathematical frameworks are
required that can distinguish the similar from the dissimilar among multiple
large-scale datasets tabulated as matrices of different numbers of rows, corre-
sponding to the different sets of genes of the different organisms. The only
such framework to date, the generalized singular value decomposition (GSVD)
[5, 19, 27, 38], is limited to two matrices.
3
![Page 19: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/19.jpg)
1.1 Generalized Singular Value Decomposition (GSVD)
Suppose we have two real matrices D1 ∈ Rm1×n and D2 ∈ Rm2×n of
full column rank n. Van Loan [27] defined the GSVD of D1 and D2 as
UT1 D1X ≡ Σ1,
UT2 D2X ≡ Σ2,
(1.1)
where each Ui ∈ Rmi×n has orthonormal columns, X ∈ Rn×n is nonsingular,
and the Σi = diag(σi,k) ∈ Rn×n are diagonal with σi,k > 0 (i = 1, 2). Paige
and Saunders [38] showed that the GSVD can be computed in a stable way
by orthogonal transformations. In the full column-rank case it takes the form
D1 ≡ U1Σ1VT ,
D2 ≡ U2Σ2VT ,
(1.2)
where U1, U2 and
(Σ1
Σ2
)have orthonormal columns [19] and V is square and
nonsingular.
Alter et al. [5] showed that the GSVD provides a comparative math-
ematical framework for global mRNA expression datasets from two different
organisms, tabulated as two matrices of the same number of columns and
different numbers of rows, where the mathematical variables and operations
represent biological reality.
1.1.1 GSVD for Comparison of Global mRNA Expression Datasetsfrom Two Different Organisms
In this application, one matrix tabulates DNA microarray-measured
genome-scale mRNA expression from the yeast S. cerevisiae, sampled at n
time points at equal time intervals during the cell-cycle program (Appendix
C: The Cell-Cycle). This matrix is of size m1-S. cerevisiae genes × n-DNA
4
![Page 20: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/20.jpg)
D1
D2smsinagrO
Arrayss
en
eG
=U1
U2smsinagrO
Arraylets
se
neG
S1
S2smsinagrO
Genelets
ste
ly
arr
A
VTArrays
ste
le
neG
Figure 1.2: Generalized singular value decomposition (GSVD). In this rasterdisplay of Equation (1.2) with overexpression (red), no change in expression(black), and underexpression (green) centered at gene- and array-invariantexpression, the S. cerevisiae and human global mRNA expression datasets aretabulated as organism-specific genes × 18-arrays matrices D1 and D2. Theunderlying assumption is that there exists a one-to-one mapping among the18 columns of the two matrices but not necessarily among their rows.
These matrices are transformed to the reduced diagonalized 18-arraylets× 18-genelets matrices Σ1 and Σ2, by using the organism-specific genes ×18-arraylets transformation matrices U1 and U2, and the shared 18-genelets ×18-arrays transformation matrix V T .
The GSVD provides a sequence-independent comparative mathematicalframework for datasets from two organisms, where the mathematical variablesand operations represent biological reality: Genelets of common significancein the multiple datasets, and the corresponding arraylets, represent processescommon to S. cerevisiae and human [5].
5
![Page 21: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/21.jpg)
microarrays. The second matrix tabulates data from the HeLa human cell
line, sampled at the same number of time points, also at equal time intervals,
and is of size m2-human genes × n-arrays. The underlying assumption of the
GSVD as a comparative mathematical framework for the two matrices is that
there exists a one-to-one mapping among the columns of the matrices, but not
necessarily among their rows. The GSVD factors each matrix into a product of
an organism-specific matrix of size m1-S. cerevisiae genes or m2-human genes
× n-“arraylets”, an organism-specific diagonal matrix of size n-arraylets ×
n-“genelets’”, and a shared matrix of size n-genelets × n-arrays.
Alter et al. showed that the mathematical variables of the GSVD, i.e.,
the patterns of the genelets and the two sets of arraylets, represent either
the similar or the dissimilar among the biological programs that compose the
S. cerevisiae and human datasets. Genelets of common significance in both
datasets, and the corresponding arraylets, represent cell-cycle checkpoints that
are common to S. cerevisiae and human. Simultaneous reconstruction and
classification of both the S. cerevisiae and human data in the common subspace
that these patterns span outlines the biological similarity in the regulation of
their cell-cycle programs. Patterns almost exclusive to either dataset correlate
with either the S. cerevisiae or the human exclusive synchronization responses.
Reconstruction of either dataset in the subspaces of the common vs. ex-
clusive patterns represents differential gene expression in the S. cerevisiae and
human cell-cycle programs vs. their synchronization-response programs, re-
spectively. Notably, relations such as these between the expression profiles of
the S. cerevisiae genes KAR4 and CIK1, which are known to be correlated in
response to the synchronizing agent, the α-factor pheromone, yet anticorre-
lated during cell division, are correctly depicted.
6
![Page 22: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/22.jpg)
Remark 1.1.1. This GSVD formulation of Alter et al. has inspired sev-
eral new applications of the GSVD; for example, in a comparison of DNA
microarray-measured mRNA expression and DNA copy-number from the same
set of breast cancer tumors [9], or the integration of DNA microarray- and
quantitative real-time PCR data [49]. This paper also inspired a new mathe-
matical approach to the computation of the GSVD [16].
However, in all of these applications, the comparison is limited to two
high-dimensional datasets at a time since the GSVD provides a comparative
mathematical framework for two matrices only. With an ever increasing num-
ber of high-dimensional datasets, this is a major limitation.
This thesis addresses this limitation and defines a higher-order GSVD
(HO GSVD) of N ≥ 2 datasets, that provides a mathematical framework
that can compare multiple high-dimensional datasets tabulated as large-scale
matrices of different numbers of rows.
1.2 Thesis Outline and Contributions
Chapter 2 defines the HO GSVD mathematical framework. Chapter
3 illustrates the HO GSVD with a comparison of global mRNA expression
datasets from three different organisms. In Chapter 4, we investigate further
the mathematical properties of our HO GSVD. Chapter 5 summarizes the
thesis and proposes directions for future research.
1.2.1 Chapter 2: HO GSVD Mathematical Framework
In this chapter, we define a HO GSVD of N ≥ 2 datasets, tabulated as
N real matrices Di of the same number of columns and, in general, different
7
![Page 23: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/23.jpg)
numbers of rows. In our form of the HO GSVD, each data matrixDi is assumed
to have full column rank and is factored as the product Di = UiΣiVT , where
Ui is the same shape as Di (rectangular), Σi is diagonal and positive definite,
and V is square and nonsingular. The columns of Ui have unit length; we call
them the left basis vectors for Di (a different set for each i). The columns of V
also have unit length; we call them right basis vectors, and they are the same
in all factorizations. The notation we use is:
Ui ≡(ui,1 . . . ui,n
), ‖ui,k‖ = 1, (1.3)
Σi ≡ diag(σi,k), σi,k > 0, (1.4)
V ≡(v1 . . . vn
), ‖vk‖ = 1. (1.5)
We call {σi,k} the higher-order generalized singular value set. They are weights
in the following sums of rank-one matrices of unit norm:
Di = UiΣiVT =
∑k
σi,kui,kvTk , ‖ui,kv
Tk ‖ = 1. (1.6)
Hence we regard the kth values σi,k as indicating the significance of the kth
right basis vector vk in the matrices Di (reflecting the overall information that
vk captures in each Di in turn). The ratio σi,k/σj,k indicates the significance
of vk in Di relative to its significance in Dj.
To obtain the factorizations, we work with the matrices Ai = DTi Di
and the matrix sum S defined as the arithmetic mean of all pairwise quotients
AiA−1j , i 6= j. The eigensystem SV = V Λ is used to define V , and the factors
Ui and Σi are computed from Di and V . We assume that the eigensystem of
S exists, where V is real and not severely ill-conditioned. For N = 2 we show
that our particular V leads algebraically to the GSVD.
8
![Page 24: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/24.jpg)
We show that when a left basis vector ui,k is orthonormal to all other
left basis vectors in Ui for all i, this new decomposition extends to higher order
all of the mathematical properties of the generalized singular value decompo-
sition: the corresponding right basis vector vk is an eigenvector of all pairwise
quotients AiA−1j , the corresponding eigenvalue of S satisfies λk ≥ 1, and an
eigenvalue λk = 1 corresponds to a right basis vector vk of equal significance
in all matrices Di and Dj (meaning σi,k/σj,k = 1 for all i and j).
We also show that the existing SVD and GSVD decompositions are in
some sense special cases of our HO GSVD. Finally, we conjecture a role for
our exact HO GSVD in iterative approximation algorithms.
1.2.2 Chapter 3: HO GSVD for Comparison of Global mRNA Ex-pression Datasets from Three Different Organisms
The cell-cycle is an orderly sequence of events by which a cell repro-
duces (Appendix C: The Cell-Cycle). The details of the cell-cycle vary from
organism to organism and at different times in an organism’s life. However,
certain characteristics are universal [1]. The global mRNA expression dur-
ing the cell cycle has been investigated in many organisms including S. pombe,
S. cerevisiae and human. A comprehensive analysis of the level of conservation
of the fundamental mechanisms governing the cell-cycle is yet deficient.
In this chapter, we illustrate the HO GSVD mathematical framework
defined in Chapter 2 with a comparison of global cell-cycle mRNA expression
from the three disparate organisms S. pombe, S. cerevisiae and human. As
simple eukaryotes (organisms with membrane bound organelles and nucleus),
the yeasts S. pombe and S. cerevisiae serve as a model organisms to analyze
features of more complex eukaryotic genomes such as the human genome.
9
![Page 25: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/25.jpg)
In analogy with the GSVD comparative modeling of DNA microarray
data from two organisms [5] (Section 1.1.1), we find that the approximately
common HO GSVD subspace represents cell-cycle mRNA expression in the
three disparate organisms. Simultaneous reconstruction and classification of
the three datasets in the common subspace that these patterns span outlines
the biological similarity in the regulation of their cell-cycle programs.
Remark 1.2.1. The HO GSVD mathematical framework is the only one to
date where, unlike existing algorithms, the mapping of genes of one organism
onto those of another based on sequence similarity is not required. This enables
sequence-independent simultaneous comparison of global gene expression data
from multiple organisms.
1.2.3 Chapter 4: Additional Investigation of the MathematicalProperties of the HO GSVD
In Chapter 4, we investigate further the mathematical properties of our
HO GSVD by exploiting the structure of the representation of S as
S =(A1 + . . .+ AN)(A−1
1 + . . .+ A−1N )−NI
N(N − 1)(1.7)
We still work with a set of N real matrices Di ∈ Rmi×n of full column rank n
but remove all other assumptions of Chapter 1. Specifically, that the eigensys-
tem of S exists and that V is real and not severely ill-conditioned. We show
that the eigenvalues of S satisfy λk ≥ 1. We then show that when an eigen-
value of S satisfies λk = 1, the corresponding eigenvector vk is an eigenvector of
sums of pairwise quotients
(N∑
i=1
Ai
)A−1
j , that it satisfies vTk A−1j vk ≥ σ−2
j for
j = 1, . . . , N and that we can compute the corresponding eigenvector vk from
the Di without the need for computing any inverse. Finally, we comment on
10
![Page 26: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/26.jpg)
certain conditions under which the kth left basis vectors ui,k are orthonormal
to all other left basis vectors in Ui for all i.
1.2.4 Chapter 5: Summary and Future Work
In this chapter, we summarize the thesis and propose directions for
future work. Chapter 2 highlights the need for a stable numerical algorithm
for the HO GSVD. We have chosen the form Di = UiΣiVT for our HO GSVD
because a stable numerical algorithm is known for this form when N = 2
[38]. It would be ideal if our procedure reduced to the stable algorithm when
N = 2. To achieve this ideal, we would need to find a procedure that allows
rank(Di) < n for some i, and that transforms each Di directly without forming
the products Ai = DTi Di. We investigate directions toward such a stable
numerical algorithm.
Chapter 3 illustrates the HO GSVD comparative mathematical frame-
work in one important example, but it is applicable, and is expected to be
applied, in a huge range of research fields. The GSVD formulation of Alter et
al. has inspired several new applications of the GSVD (Remark 1.1.1). Our
new HO GSVD will enable all of these applications to compare and integrate
N ≥ 2 datasets. We examine additional applications of the HO GSVD.
Chapter 4 further investigates the mathematical properties of the HO
GSVD. We comment and conjecture on the orthogonality of the left basis
vectors Ui. Finally, we discuss the importance of linearity and orthogonality
in modeling genetic networks.
11
![Page 27: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/27.jpg)
Chapter 2
Mathematical Framework: HO GSVD
2.1 Introduction
We define a higher-order GSVD (HO GSVD) of N ≥ 2 datasets, tab-
ulated as N real matrices Di of the same number of columns and, in general,
different numbers of rows [39, 40, 42–45]. In our form of the HO GSVD, each
data matrix Di is assumed to have full column rank and is factored as the
product Di = UiΣiVT , where Ui is the same shape as Di (rectangular), Σi is
diagonal and positive definite, and V is square and nonsingular. The columns
of Ui have unit length; we call them the left basis vectors for Di (a different
set for each i). The columns of V also have unit length; we call them right
basis vectors, and they are the same in all factorizations. The notation we use
is:
Ui ≡(ui,1 . . . ui,n
), ‖ui,k‖ = 1, (2.1)
Σi ≡ diag(σi,k), σi,k > 0, (2.2)
V ≡(v1 . . . vn
), ‖vk‖ = 1. (2.3)
We call {σi,k} the higher-order generalized singular value set. They are weights
in the following sums of rank-one matrices of unit norm:
Di = UiΣiVT =
∑k
σi,kui,kvTk , ‖ui,kv
Tk ‖ = 1. (2.4)
12
![Page 28: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/28.jpg)
Hence we regard the kth values σi,k as indicating the significance of the kth
right basis vector vk in the matrices Di (reflecting the overall information that
vk captures in each Di in turn).
To obtain the factorizations, we work with the matrices Ai = DTi Di
and the matrix sum S defined as the arithmetic mean of all pairwise quotients
AiA−1j , i 6= j. The eigensystem SV = V Λ is used to define V , and the factors
Ui and Σi are computed from Di and V . For N = 2, we show that our
particular V leads algebraically to the GSVD (Section 2.2).
For N > 2, in Equations (2.17)–(2.19), we assume that each matrix Di
has full column rank and V is not severely ill-conditioned. We show that if
there exists an exact decomposition with Ui real and column-wise orthonormal
for all i, our particular V leads algebraically to that exact decomposition
(Theorem 2.3.1). Assuming that the eigensystem SV = V Λ exists and V is
real, we explore some properties of the eigenvalues Λ and their relation to the
left basis vectors ui,k and the values σi,k. We prove that when a left basis vector
ui,k is orthonormal to all other left basis vectors in Ui for all i, this HO
GSVD extends to higher orders all of the mathematical properties of the GSVD
that enable the use of the GSVD for comparative modeling of two datasets:
Specifically, that the corresponding right basis vector vk is an eigenvector of all
pairwise quotients AiA−1j (Lemma 2.3.2), that the corresponding eigenvalue of
S satisfies λk ≥ 1 (Theorem 2.3.3), and that an eigenvalue λk = 1 corresponds
to a right basis vector vk of equal significance in all matrices Di and Dj,
meaning σi,k/σj,k = 1 for all i and j (Theorem 2.3.4).
We also show that the existing SVD and GSVD decompositions are in
some sense special cases of our HO GSVD (Section 2.4). Finally, we conjecture
a role for our exact HO GSVD in iterative approximation algorithms (Section
13
![Page 29: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/29.jpg)
2.5). The work in this chapter was done in collaboration with Michael Saunders
of Stanford University.
2.2 The Matrix GSVD
2.2.1 Construction of the matrix GSVD.
Suppose we have two real matrices D1 ∈ Rm1×n and D2 ∈ Rm2×n of
full column rank n. Van Loan [27] defined the GSVD of D1 and D2 as
UT1 D1X ≡ Σ1,
UT2 D2X ≡ Σ2,
(2.5)
where each Ui ∈ Rmi×n has orthonormal columns, X ∈ Rn×n is nonsingular,
and the Σi = diag(σi,k) ∈ Rn×n are diagonal with σi,k > 0 (i = 1, 2). Paige
and Saunders [38] showed that the GSVD can be computed in a stable way
by orthogonal transformations. In the full column-rank case it takes the form
D1 ≡ U1Σ1VT ,
D2 ≡ U2Σ2VT ,
(2.6)
where U1, U2 and
(Σ1
Σ2
)have orthonormal columns [19] and V is square and
nonsingular.
We work with the form of Equation (2.6), but we find it useful and
more similar to the standard SVD [19] if we assume that the columns of V are
scaled to have unit length, with the columns of Σi scaled accordingly. Note
that the ratios σ1,k/σ2,k are not altered by the scaling.
In place of the methods of Van Loan [27] and Paige and Saunders [38],
we construct the GSVD of Equation (2.6) as follows. We obtain V from the
14
![Page 30: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/30.jpg)
D1
D2stesataD
Columns
sw
oR
=
U1
U2stesataD
LeftBasis
Vectors
sw
oR
S1
S2stesataD
RightBasisVectors
tf
eL
sis
aB
sro
tc
eV
VTColumns
th
giR
si
saB
sr
otc
eV
Figure 2.1: The GSVD of two matrices D1 and D2 is reformulated as a lineartransformation of the two matrices from the two rows × columns spaces totwo reduced and diagonalized left basis vectors × right basis vectors spaces.The right basis vectors are shared by both datasets. Each right basis vectorcorresponds to two left basis vectors.
15
![Page 31: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/31.jpg)
eigensystem of S, the arithmetic mean of the quotients A1A−12 and A2A
−11 of
the matrices A1 = DT1 D1 and A2 = DT
2 D2:
S ≡ 12(A1A
−12 + A2A
−11 ),
SV = V Λ,
V ≡(v1 . . . vn
), Λ = diag(λk), (2.7)
with ‖vk‖ = 1. Given V , we compute matrices Bi by solving two linear systems
V BTi = DT
i ,
Bi ≡(bi,1 . . . bi,n
), i = 1, 2, (2.8)
and we construct Σi and Ui =(ui,1 . . . ui,n
)by normalizing the columns of
Bi:
σi,k = ‖bi,k‖,
Σi = diag(σi,k),
Bi = UiΣi. (2.9)
We prove below that S is nondefective and that V is real.
From Equations (2.8) and (2.9) we have Di = BiVT = UiΣiV
T as in
Equation (2.6). We see that the rows of both D1 and D2 are superpositions
of the same right basis vectors, the columns of V (Fig. 2.1). This is the
construction that we generalize in Equations (2.17)–(2.19) to compute our HO
GSVD.
2.2.2 Interpretation of the GSVD construction.
The kth diagonals of Σ1 and Σ2, i.e., the generalized singular value
pair (σ1,k, σ2,k), indicate the significance of the kth right basis vector vk in the
16
![Page 32: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/32.jpg)
matrices D1 and D2, reflecting the overall information that vk captures in D1
and D2 respectively. The ratio σ1,k/σ2,k indicates the significance of vk in D1
relative to its significance in D2. A ratio σ1,k/σ2,k ≈ 1 corresponds to a basis
vector vk of similar significance in D1 and D2. Such a vector might reflect
a theme common to both matrices. A ratio of σ1,k/σ2,k � 1 corresponds to
a basis vector vk of almost negligible significance in D2 relative to its signifi-
cance in D1. Likewise, a ratio of σ1,k/σ2,k � 1 indicates a basis vector vk of
almost negligible significance in D1 relative to its significance in D2. Vectors
of negligible significance in one matrix might reflect themes exclusive to the
other matrix.
2.2.3 Mathematical properties of Λ and V .
Note that our GSVD construction in Equations (2.7)–(2.9) is well de-
fined for any square nonsingular V . We now show that our particular V leads
algebraically to the GSVD of Equation (2.6), ignoring the rescaled columns of
Σi and V . Recall that Ai = DTi Di and Di = UiΣiV
T with Σi diagonal.
In practice we would prefer not to form Ai or S directly. Instead we
may work with the QR factorizations Di = QiRi, where QTi Qi = I and Ri is
upper triangular and nonsingular. Define another triangular matrix R and its
SVD to be
R ≡ R1R−12 = UΣV T , (2.10)
where U and V are square and orthogonal, and Σ = diag(σk), σk > 0. We
17
![Page 33: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/33.jpg)
then have
S = 12
[(RT
1R1)(RT2R2)
−1 + (RT2R2)(R
T1R1)
−1],
R−T1 SRT
1 = 12
[R1(R
T2R2)
−1RT1 +R−T
1 (RT2R2)R
−11
]= 1
2
[RRT + (RRT )−1
]= UΛUT , (2.11)
where Λ ≡ 12
[Σ2 + Σ−2
]≡ diag(λk). Thus
S(RT1 U) = (RT
1 U)Λ, (2.12)
SV = V Λ, V ≡ RT1 UD, (2.13)
where the diagonal matrix D normalizes RT1 U so that V has columns of unit
length.
Theorem 2.2.1. The matrices U1 and U2 constructed in Equation (2.9) have
orthonormal columns (UT1 U1 = UT
2 U2 = I).
Proof. From Equation (2.8) we have V BTi BiV
T = DTi Di, so that
(RT1 UD)BT
i Bi(DUTR1) = RT
i Ri
⇒ DBT1 B1D = I,
DBT2 B2D = UTR−T
1 (RT2R2)R
−11 U = Σ−2,
with help from Equations (2.10), (2.12) and (2.13). We see that both BTi Bi
are diagonal, and the quantities in Equation (2.9) must be
U1 = B1D, Σ1 = D−1, (2.14)
U2 = B2DΣ, Σ2 = Σ−1D−1, (2.15)
with Ui column-wise orthonormal.
18
![Page 34: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/34.jpg)
Theorem 2.2.2. The eigenvalues of S satisfy λk ≥ 1, and S has a full set of
eigenvectors (it is non-defective). Also, the eigenvectors are real.
Proof. The equivalence transformation of Equation (2.11) shows that S has
the same eigenvalues as UΛUT , namely Λ. From the definition of Λ and Σ,
the eigenvalues are λk = 12(σ2
k + 1/σ2k) ≥ 1. Also, in the eigensystem of S in
Equation (2.13), V = RT1 UD is a product of real nonsingular matrices and
hence is real and nonsingular.
Theorem 2.2.3. An eigenvalue λk = 1 corresponds to a right basis vector vk
of equal significance in both matrices D1 and D2. That is, σ1,k/σ2,k = 1.
Proof. From the orthonormality of U1 and U2 of Equations (2.14) and (2.15),
and our GSVD construction of Equations (2.7)–(2.9), we have λk = (σ21,k/σ
22,k+
σ22,k/σ
21,k)/2. Therefore, λk = 1 can occur only if σ1,k/σ2,k = 1. In other words,
λk = 1 if the kth right basis vector vk is equally significant in D1 and D2.
Note that the GSVD is a generalization of the SVD in that if one of
the matrices is the identity matrix, the GSVD reduces to the SVD of the other
matrix.
In Equations (2.17)–(2.19), we now define a HO GSVD and in Theorems
(2.3.1)–(2.3.4) we show that this new decomposition extends to higher orders
most of the mathematical properties of the GSVD. We proceed in the same
way as in Equations (2.7)–(2.9).
19
![Page 35: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/35.jpg)
2.3 Higher-order generalized singular value decomposi-tion (HO GSVD)
2.3.1 Construction of the HO GSVD.
Suppose we have a set of N real matrices Di ∈ Rmi×n of full column
rank n. We define a HO GSVD of these N matrices [39, 40, 42–45] as
D1 = U1Σ1VT ,
D2 = U2Σ2VT ,
...
DN = UNΣNVT ,
(2.16)
where each Ui ∈ Rmi×n is composed of normalized left basis vectors, each
Σi = diag(σi,k) ∈ Rn×n is diagonal with σi,k > 0, and the nonsingular V is
composed of normalized right basis vectors. Note that V is identical in all N
matrix factorizations. In the application of this HO GSVD to a comparison
of global mRNA expression from N organisms, the right basis vectors are the
“genelets” and the N sets of left basis vectors are the N sets of “arraylets”.
We obtain V from the eigensystem of S, the arithmetic mean of all
pairwise quotients AiA−1j , i 6= j, of the matrices Ai = DT
i Di:
S ≡ 1
N(N − 1)
N∑i=1
N∑j>i
(AiA−1j + AjA
−1i ),
SV = V Λ,
V ≡(v1 . . . vn
), Λ = diag(λk), (2.17)
with ‖vk‖ = 1. We assume that the eigensystem of S exists, where V is real
and not severely ill-conditioned. Given V , we compute matrices Bi by solving
20
![Page 36: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/36.jpg)
D1
D2
D3
stesataD
Columns
sw
oR
=U1
U2
U3
stesataD
LeftBasis
Vectors
sw
oRS1
S2
S3
stesataD
RightBasis
Vectors
tf
eLs
is
aBs
ro
tce
V
VTColumns
th
gi
Rs
is
aB
sro
tc
eV
Figure 2.2: The higher-order GSVD (HO GSVD) of three matrices D1, D2,and D3 is a linear transformation of the three matrices from the three rows× columns spaces to three reduced and diagonalized left basis vectors × rightbasis vectors spaces. The right basis vectors are shared by all three datasets.Each right basis vector corresponds to three left basis vectors.
21
![Page 37: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/37.jpg)
N linear systems
V BTi = DT
i ,
Bi ≡(bi,1 . . . bi,n
), i = 1, . . . , N, (2.18)
and we construct Σi and Ui =(ui,1 . . . ui,n
)by normalizing the columns of
Bi:
σi,k = ‖bi,k‖,
Σi = diag(σi,k),
Bi = UiΣi. (2.19)
2.3.2 Interpretation of the HO GSVD construction.
In this construction, the rows of each of the N matrices Di are super-
positions of the same right basis vectors, the columns of V (Fig. 2.2). The kth
diagonals of Σi, the higher-order generalized singular value set {σi,k}, indicate
the significance of the kth right basis vector vk in the matrices Di, reflecting
the overall information that vk captures in each Di respectively. The ratio
σi,k/σj,k indicates the significance of vk in Di relative to its significance in Dj.
A ratio of σi,k/σj,k ≈ 1 for all i and j corresponds to a right basis vector
vk of similar significance in all N matrices Di. Such a vector might reflect a
theme common to the N matrices under comparison. A ratio of σi,k/σj,k � 1
indicates a basis vector of almost negligible significance in Di relative to its
significance in Dj. Vectors of negligible significance in one matrix might reflect
themes exclusive to the other matrices.
22
![Page 38: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/38.jpg)
2.3.3 Mathematical properties of Λ and V .
Recent research showed that several tensor generalizations are possible
for a given matrix decomposition, each preserving only some but not all of its
properties, such as exactness, orthogonality and diagonality. Several different
tensor decompositions were defined that generalize the eigenvalue decompo-
sition (EVD) [6] the singular value decomposition (SVD) [23, 36, 37], and the
GSVD [52] to higher orders. Our HO GSVD is an exact decomposition that
preserves the diagonality of the GSVD, i.e., the matrices Σi in Equation (2.16)
are diagonal. In general, the orthogonality of the GSVD is not preserved, i.e.,
the matrices Ui in Equation (2.16) are not column-wise orthonormal.
Theorem 2.3.1. If ∃ an exact HO GSVD of Equation (2.16) where the ma-
trices Ui are real and column-wise orthonormal, and V is not severely ill-
conditioned, then V is real, the eigensystem SV = V Λ exists, and our partic-
ular V leads algebraically to the exact decomposition.
Proof. If ∃ an exact decomposition with real and column-wise orthonormal Ui
for all i, then, following Equation (2.18), V = DTi UiΣ
−1i is a product of real
nonsingular matrices and hence is real and nonsingular. Also, Ai = DTi Di =
V Σ2iV
T and the right basis vectors, i.e., the columns of V , simultaneously di-
agonalize all pairwise quotients AiA−1j V = V Σ2
i Σ−2j as well as their arithmetic
mean, such that the eigensystem SV = V Λ exists. Therefore, the V we obtain
from the eigensystem of S is equivalent to the V of the HO GSVD of Equation
(2.16) with real and column-wise orthonormal Ui for all i.
We now show (Lemma 2.3.2 and Theorems 2.3.3 and 2.3.4) that with
this V , when a left basis vector ui,k is orthonormal to all other left basis
vectors in Ui for all i, this new decomposition extends to higher orders all of
23
![Page 39: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/39.jpg)
the mathematical properties of the GSVD. Specifically, that the corresponding
right basis vector vk is an eigenvector of all pairwise quotients AiA−1j , that the
corresponding eigenvalue of S satisfies λk ≥ 1, and that an eigenvalue λk = 1
corresponds to a right basis vector vk of equal significance in all matrices Di
and Dj (meaning σi,k/σj,k = 1 for all i and j).
Lemma 2.3.2. If a left basis vector ui,k is orthonormal to all other left ba-
sis vectors in Ui for all i, then the corresponding right basis vector vk is an
eigenvector of all pairwise quotients AiA−1j .
Proof. From Equations (2.17)–(2.19),
AiA−1j V ek = VWiW
−1j ek, (2.20)
where Wi ≡ BTi Bi = ΣiU
Ti UiΣi and ek is a unit n-vector. Without loss of
generality, let k = n and ui,n be orthonormal to all other left basis vectors in
Ui for all i so that
WiW−1j =
(∆ij 0
0 σ2i,n/σ
2j,n
), (2.21)
where ∆ij is an (n − 1) × (n − 1) principal submatrix of WiW−1j . Substitut-
ing Equation (2.21) into Equation (2.20) we obtain AiA−1j vk = (σ2
i,k/σ2j,k) vk.
That is, vk is an eigenvector of AiA−1j with the corresponding eigenvalue
(σ2i,k/σ
2j,k) for all i and j.
Theorem 2.3.3. If a left basis vector ui,k is orthonormal to all other left basis
vectors in Ui for all i, then the corresponding eigenvalue of S satisfies λk ≥ 1.
Proof. Substituting the result of Lemma (2.3.2) into Equation (2.17), we ob-
tain
λk =1
N(N − 1)
N∑i=1
N∑j>i
(σ2i,k/σ
2j,k + σ2
j,k/σ2i,k) ≥ 1. (2.22)
24
![Page 40: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/40.jpg)
That is, λk is the arithmetic mean of those eigenvalues of all pairwise quotients
AiA−1j that correspond to the right basis vector vk.
Theorem 2.3.4. If a left basis vector ui,k is orthonormal to all other left basis
vectors in Ui for all i, then an eigenvalue λk = 1 corresponds to a right basis
vector vk of equal significance in all matrices Di and Dj. That is, σi,k/σj,k = 1
for all i and j.
Proof. Equating λk in Equation (2.22) with one, we see that λk = 1 can occur
only if σi,k/σj,k = 1 for all i and j. In other words, λk = 1 if the kth right
basis vector vk is equally significant in Di and Dj for all i and j.
Following Theorems (2.3.1)–(2.3.4), we show that the SVD and GSVD
decompositions are special cases of our HO GSVD, and conjecture a role for
our HO GSVD in iterative approximation algorithms. We also mathematically
define the approximately common HO GSVD subspace of multiple matrices
as follows.
Corollary 2.3.5. The approximately common HO GSVD subspace is spanned
by the subset of right basis vectors {vk} of similar significance in all matrices
Di, i.e., by the vectors for which each of the corresponding λk ≈ 1 and each of
the left basis vectors ui,k are ε-orthonormal to all other vectors in Ui for all i,
with ε ≈ 0.
2.4 The Matrix GSVD and SVD as Special Cases
Let us now show that the GSVD and the standard SVD are special
cases of our tensor HO GSVD.
25
![Page 41: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/41.jpg)
Theorem 2.4.1. Suppose matrices D and Dj are real and have full column
rank. The HO GSVD of N matrices satisfying Di = D for all i 6= j reduces to
the GSVD of the two matrices D and Dj.
Proof. Substituting Di = D for all i 6= j in Equation (2.17), we obtain the
matrix sum S = [AA−1j + AjA
−1 + (N − 2)I]/N , with A = DTD. The eigen-
vectors of S are the same as the eigenvectors V of 12(AA−1
j + AjA−1), and
Theorem 2.2.2 shows that V exists. Solving two linear systems for B and Bj
and normalizing the solutions,
V BT = DTi = DT , B = UΣ,
V BTj = DT
j , Bj = UjΣj,
reduces the HO GSVD of Equation (2.16) to the GSVD of D and Dj of Equa-
tion (2.6):Di = D = UΣV T ∀ i 6= j,
Dj = UjΣjVT .
Theorem 2.2.1 shows that the columns of U and Uj are orthonormal.
Theorem 2.4.2. Suppose the matrix Dj is real and has full column rank. The
HO GSVD of N matrices satisfying Di = I for all i 6= j reduces to the SVD
of Dj.
Proof. Substituting Di = I for all i 6= j in Equation (2.17), we obtain the
matrix sum S = [Aj + A−1j + (N − 2)I]/N . The symmetry of S implies that
its eigenvectors V exist and are orthonormal. Computing the matrix Bj from
V BTj = DT
j , Bj = UjΣj,
gives Dj = UjΣjVT , and Theorem 2.2.1 shows that the columns of Uj are
orthonormal. Hence the factorization must be the SVD of Dj [19].
26
![Page 42: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/42.jpg)
2.5 Role in Approximation Algorithms
Our tensor HO GSVD preserves the exactness as well as the diagonality
of the matrix GSVD, i.e., all N matrix factorizations in Equation (2.16) are
exact and all N matrices Σi are diagonal. In general, our HO GSVD does
not preserve the orthogonality of the matrix GSVD, i.e., the matrices Ui in
Equation (2.16) are not necessarily column-wise orthonormal. For some appli-
cations, however, one might want to preserve the orthogonality instead of the
exactness of the matrix GSVD. An iterative approximation algorithm might
be used to compute for a set of N > 2 real matrices Di ∈ Rmi×n of full column
rank n an approximate decomposition
D1 ≈ U1Σ1VT ,
D2 ≈ U2Σ2VT ,
...
DN ≈ UNΣNVT ,
(2.23)
where each Ui ∈ Rmi×n is composed of orthonormal columns, each Σi =
diag(σi,k) ∈ Rn×n is diagonal with σi,k > 0, and V is identical in all N matrix
factorizations.
If ∃ an exact decomposition of Equation (2.16) where the matrices Ui
are column-wise orthonormal (and real), it is reasonable to expect that the
iterative approximation algorithm will converge to that exact decomposition.
More than that, when the iterative approximation algorithm is initialized with
the exact decomposition, it is reasonable to expect convergence in just one it-
eration. As we show in Theorem 2.3.1, if ∃ an exact decomposition of Equation
(2.16) in which the matrices Ui are column-wise orthonormal (and real), our
HO GSVD leads algebraically to that exact decomposition. We conjecture,
27
![Page 43: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/43.jpg)
therefore, the following role for our exact HO GSVD in iterative approxima-
tion algorithms.
Conjecture 1. An iterative approximation algorithm will converge to the op-
timal approximate decomposition of Equation (D.1) in a significantly reduced
number of iterations when initialized with our exact HO GSVD, rather than
with random Ui, Σi and V .
For example, De Lathauwer [22] defined an approximate decomposition
based on an alternating least squares algorithm seeded with estimates of Ui,
Σi, and V for all i to decompose a set of N real matrices Di ∈ Rmi×n as in
Equation (D.1) where each Ui ∈ Rmi×n is composed of orthonormal columns,
each Σi = diag(σi,k) ∈ Rn×n is diagonal with σi,k > 0, and V is identical in all
N matrix factorizations (Appendix D: De Lathauwer’s Generalized GSVD).
Theorem 2.5.1. Suppose ∃ an exact HO GSVD decomposition with orthog-
onal Ui for all i. De Lathauwer’s approximation algorithm will converge to
this exact decomposition in one iteration, when seeded with the Σi and V of
the exact decomposition which is computed by our HO GSVD (rather than by
random Σi and V ).
Proof. Consider the symmetric positive definite matrix Ci = ΣiVTV Σi and its
eigensystem CiYi = YiΛ, where Yi is orthogonal. Premultiplying with Ui,
DiV Σi = (UiΣiVT )V Σi = (UiYi)ΛY
Ti = ZiΛY
Ti , (2.24)
where Di = UiΣiVT and Zi = UiYi is orthogonal for orthogonal Ui. Note that
ZiΛYTi is the SVD of DiV Σi and Ui = ZiY
Ti .
28
![Page 44: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/44.jpg)
The first step of the De Lathauwer approximation algorithm (Appendix
D) updates Ui by forming the product DiV Σi using the seed matrices Σi and
V and computing it’s SVD,
DiV Σi = ZiΛYTi . (2.25)
De Lathauwer defines the new Ui = ZiYTi and therefore, Ui = Ui.
2.6 Discussion
We defined a HO GSVD of more than two real matrices Di that is
computed from the eigensystem SV = V Λ of a matrix sum S, the arithmetic
mean of all pairwise quotients of the matrices DTi Di. We assumed that each
matrix has full column rank, that the eigensystem of S exists, and that V
is real and not severely ill-conditioned. We proved that when a left basis
vector ui,k is orthonormal to all other left basis vectors in Ui for all i, this new
decomposition extends to higher order all of the mathematical properties of
the GSVD [19, 27, 38]: the corresponding right basis vector vk is an eigenvector
of all pairwise quotients AiA−1j , the corresponding eigenvalue of S satisfies
λk ≥ 1, and an eigenvalue λk = 1 corresponds to a right basis vector vk of
equal significance in all matrices Di and Dj, meaning σi,k/σj,k = 1 for all i
and j.
Recent research showed that several tensor generalizations are possible
for a given matrix decomposition, each preserving only some but not all of the
properties of the matrix decomposition, such as exactness, orthogonality and
diagonality. Several different tensor decompositions were defined that general-
ize the eigenvalue decomposition (EVD) [6], the singular value decomposition
(SVD) [23, 36, 37], and the GSVD [52] to higher orders.
29
![Page 45: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/45.jpg)
Our HO GSVD is an exact decomposition that preserves the diagonal-
ity of the GSVD, i.e., the matrices Σi in Equation (2.16) are diagonal. In
general, the orthogonality of the GSVD is not preserved, i.e., the matrices Ui
in Equation (2.16) are not column-wise orthonormal.
We have chosen the form Di = UiΣiVT (2.6) for our HO GSVD because
a stable numerical algorithm is known for this form when N = 2 [38]. It would
be ideal if our procedure reduced to the stable algorithm when N = 2. To
achieve this ideal, we would need to find a procedure that allows rank(Di) < n
for some i, and that transforms each Di directly without forming the products
Ai = DTi Di.
The GSVD enabled comparative modeling of DNA microarray data
from yeast and human, the first comparative modeling of data from two dif-
ferent organisms and the only one to date that does not require a one-to-one
mapping between the rows of the two datasets, i.e., their genes [5]. In the
next chapter, we show that our new HO GSVD provides a comparative math-
ematical framework for N > 2 large-scale DNA microarray datasets from N
organisms tabulated as N matrices, where there exists a one-to-one mapping
among the columns of all the matrices but not necessarily among their rows.
The new HO GSVD therefore enables comparative modeling of data from
complete genomes of these multiple organisms.
30
![Page 46: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/46.jpg)
Chapter 3
HO GSVD for Comparison of Global mRNA
Expression Datasets from Three Different
Organisms
3.1 Introduction
Comparative analyses of global mRNA expression from multiple model
organisms promise to enhance fundamental understanding of the universality
and specialization of molecular biological mechanisms, and may prove useful
in medical diagnosis, treatment and drug design [21, 33]. Existing algorithms
limit analyses to subsets of homologous (similar sequence) genes among the dif-
ferent organisms, effectively introducing into the analysis the assumption that
sequence and functional similarities are equivalent. For sequence-independent
comparisons, mathematical frameworks are required that can distinguish the
similar from the dissimilar among multiple large-scale datasets tabulated as
matrices of different numbers of rows, corresponding to the different sets of
genes of the different organisms.
In this chapter, we illustrate the HO GSVD mathematical framework
defined in Chapter 2 with a comparison of global cell-cycle mRNA expression
from the three disparate organisms S. pombe, S. cerevisiae and human, without
requiring, unlike existing algorithms, the mapping of genes of one organism
onto those of another based on sequence similarity [39, 40, 42–45].
31
![Page 47: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/47.jpg)
The cell-cycle is an orderly sequence of events by which a cell repro-
duces (Appendix C: The Cell-Cycle). The details of the cell-cycle vary from
organism to organism and at different times in an organism’s life. However,
certain characteristics are universal [1]. The global mRNA expression during
the cell cycle has been investigated in many organisms including S. pombe,
S. cerevisiae and human.
Remark 3.1.1. The yeasts S. pombe and S. cerevisiae are commonly used
for microarray experiments because, like human, they are eukaryotes, but with
small genomes. As simple eukaryotes, they serve as a model organisms to ana-
lyze features of more complex eukaryotic genomes such as the human genome.
Studies on yeast provide insight into critical cellular processes of all eukaryotes.
Yeast’s genome has been widely studied, and has more complete annotations
than any other organism.
The datasets are tabulated as matrices of n = 17 columns each, corre-
sponding to DNA microarray-measured mRNA expression from each organism
at 17 time points equally spaced during approximately two cell-cycle periods
(Section 3.2). SVD [19] is used to estimate the missing data in each of the data
matrices (Section 3.3). We compute the tensor HO GSVD of the three data
matrices after missing data estimation by using Equations (2.17)–(2.19). Fol-
lowing Corollary 2.3.5, the approximately common HO GSVD subspace of the
three datasets is determined (Section 3.4). We find that the approximately
common HO GSVD subspace represents cell-cycle mRNA expression in the
three disparate organisms (Section 3.5).
The decoupling of the HO GSVD genelets and N sets of arraylets, i.e.,
the diagonality of the matrices Σi in Equation (2.16), allows simultaneous
32
![Page 48: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/48.jpg)
reconstruction of the N = 3 datasets in the common HO GSVD subspace
without eliminating genes or arrays (Section 3.6). Identifying the subset of
genelets and the corresponding arraylets that span the approximately common
HO GSVD subspace allows simultaneous classification of the genes and arrays
of the three datasets by similarity in their expression of these genelets or
corresponding arraylets, respectively, rather than by their overall expression
3.7). Simultaneous sequence-independent reconstruction and classification of
the three datasets in the common subspace outlines cell-cycle progression in
time and across the genes in the three organisms (Sections 3.6 and 3.7).
Most notably, genes of significantly different cell-cycle peak times but
highly conserved sequences [7, 46] from the three organisms, are correctly clas-
sified (Section 3.8).
3.2 S. pombe, S. cerevisiae and Human Datasets
Consider now the HO GSVD comparative analysis of global mRNA
expression datasets from the N = 3 organisms S. pombe, S. cerevisiae and hu-
man (Appendix E: Supplementary Datasets 1–3 and Mathematica Notebook).
The datasets are tabulated as matrices of n = 17 columns each, correspond-
ing to DNA microarray-measured mRNA expression from each organism at 17
time points equally spaced during approximately two cell-cycle periods. The
underlying assumption is that there exists a one-to-one mapping among the
17 columns of the three matrices but not necessarily among their rows, which
correspond to either m1 = 3167-S. pombe genes, m2 = 4772-S. cerevisiae genes
or m3 = 13,068-human genes.
Rustici et al. [47] monitored mRNA levels in the yeast Schizosaccha-
romyces pombe over about two cell-cycle periods, in a culture synchronized
33
![Page 49: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/49.jpg)
initially late in the cell-cycle phase G2, relative to reference mRNA from an
asynchronous culture, at 15min intervals for 240min. The S. pombe dataset we
analyze (Appendix E Supplementary Dataset 1) tabulates the ratios of gene
expression levels for the m1=3167 gene clones with no missing data in at least
14 of the n=17 arrays. Of these, the mRNA expression of 380 gene clones was
classified as cell cycle-regulated by Rustici et al. or Oliva et al. [35].
Spellman et al. [50] monitored mRNA expression in the yeast Saccha-
romyces cerevisiae over about two cell-cycle periods, in a culture synchronized
initially in the cell-cycle phase M/G1, relative to reference mRNA from an
asynchronous culture, at 7min intervals for 112min. The S. cerevisiae dataset
we analyze (Appendix E Supplementary Dataset 2) tabulates the ratios of
gene expression levels for the m2=4772 open reading frames (ORFs), or genes,
with no missing data in at least 14 of the n=17 arrays. Of these, the mRNA
expression of 641 ORFs was traditionally or microarray-classified as cell cycle-
regulated by Spellman et al.
Whitfield et al. [55] monitored mRNA levels in the human HeLa cell line
over about two cell-cycle periods, in a culture synchronized initially in S-phase,
relative to reference mRNA from an asynchronous culture, at 2hr intervals for
34hr. The human dataset we analyze (Appendix E Supplementary Dataset 3)
tabulates the ratios of gene expression levels for the m3=13,068 gene clones
with no missing data in at least 14 of the n=17 arrays. Of these, the mRNA
expression of 787 was classified as cell cycle-regulated by Whitfield et al. [55].
3.3 Missing Data Estimation
Of the 53,839, 81,124 and 222,156 elements in the S. pombe, S. cere-
visiae and human data matrices, 2420, 2936 and 14,680 elements, respectively,
34
![Page 50: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/50.jpg)
i.e., ∼5%, are missing valid data. SVD [19] is used to estimate the missing data
as described [5]. In each of the data matrices, SVD of the expression patterns
of the genes with no missing data uncovered 17 orthogonal patterns of gene
expression, i.e., “eigengenes.” The five most significant of these patterns, in
terms of the fraction of the mRNA expression that they capture, are used to
estimate the missing data in the remaining genes. For each of the three data
matrices, the five most significant eigengenes and their corresponding fractions
are almost identical to those computed after the missing data are estimated,
with the corresponding correlations >0.95 (Appendix E Mathematica Note-
book). This suggests that the five most significant eigengenes, as computed
for the genes with no missing data, are valid patterns for estimation of missing
data. This also indicates that this SVD estimation of missing data is robust
to variations in the data selection cutoffs.
3.4 Common HO GSVD Subspace
We compute the tensor HO GSVD of the three data matrices after
missing data estimation by using Equations (2.17)–(2.19). Our HO GSVD of
Equation (2.16) transforms the datasets from the organism-specific genes ×
17-arrays spaces to the reduced 17-genelets × 17-arraylets spaces, where the
datasets Di are represented by the diagonal nonnegative matrices Σi, by using
the organism-specific genes × 17-arraylets transformation matrices Ui and the
one shared 17-genelets × 17-arrays transformation matrix V T (Fig. 3.1 and
Fig. 3.2).
Following Corollary 2.3.5, the approximately common HO GSVD sub-
space of the three datasets is spanned by the five genelets vk, k = 13, . . . , 17,
which are approximately equally significant in the three datasets with 1 .
35
![Page 51: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/51.jpg)
D1
D2
D3
smsinagrO
Arraysse
ne
G
=U1
U2
U3
smsinagrO
Arraylets
sen
eG
S1
S2
S3
smsinagrO
Genelets
st
ely
ar
rA
VTArrays
st
ele
ne
G
λk . 2 (Fig. 3.2 a and b), and by the five corresponding arraylets ui,k in each
dataset, which are ε = 0.33-orthonormal to all other arraylets (Fig. 3.3).
36
![Page 52: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/52.jpg)
Figure 3.1: Higher-order generalized singular value decomposition (HOGSVD). In this raster display of Equation (2.16) with overexpression (red), nochange in expression (black), and underexpression (green) centered at gene-and array-invariant expression, the S. pombe, S. cerevisiae and human globalmRNA expression datasets are tabulated as organism-specific genes × 17-arrays matrices D1, D2 and D3. The underlying assumption is that thereexists a one-to-one mapping among the 17 columns of the three matrices butnot necessarily among their rows.
These matrices are transformed to the reduced diagonalized 17-arraylets× 17-genelets matrices Σ1, Σ2 and Σ3, by using the organism-specific genes ×17-arraylets transformation matrices U1, U2 and U3 and the shared 17-genelets× 17-arrays transformation matrix V T . We prove that with our particularV of Equations (2.17)–(2.19), this decomposition extends to higher orders allof the mathematical properties of the GSVD to define genelets vk of equalsignificance in all the datasets, when the corresponding arraylets u1,k, u2,k
and u3,k are orthonormal to all other arraylets in U1, U2 and U3, and whenthe corresponding higher-order generalized singular values are equal: σ1,k =σ2,k = σ3,k.
We show that like the GSVD for two organisms [5], this HO GSVDprovides a sequence-independent comparative mathematical framework fordatasets from more than two organisms, where the mathematical variablesand operations represent biological reality: Genelets of common significancein the multiple datasets, and the corresponding arraylets, represent cell-cyclecheckpoints or transitions from one phase to the next, common to S. pombe,S. cerevisiae and human. Simultaneous reconstruction and classification ofthe three datasets in the common subspace that these patterns span outlinesthe biological similarity in the regulation of their cell-cycle programs. Mostnotably, genes of significantly different cell-cycle peak times [18] but highlyconserved sequence [7, 46] are correctly classified.
37
![Page 53: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/53.jpg)
1716151413121110987654321
ste
le
neG
1 2 3 4 5 6 7 8 9 01 11 21 31 41 51 61 71
HaL Arrays
1716151413121110987654321
0
2.
0
4.
0
6.
0
8.
0 1
HbL Inverse Eigenvalues lk-1
$ 2TcosH 4 pt
T+
p4
+p16
L
-0.5
0
0.5
1
no
iss
er
pxE
le
veL
1 2 3 4 5 6 7 8 901 11 21 31 41 51 61 71
HcL Arrays
$ 2TcosH 4 pt
T-
p2L
$ 2TcosH 4 pt
T-
p4L $ 2
TcosH 4 pt
T+
p2L
-0.5
0
0.5
1
1 2 3 4 5 6 7 8 901 11 21 31 41 51 61 71
HdL Arrays
$ 2TcosH 4 pt
TL
Figure 3.2: The genelets, i.e., the HO GSVD patterns of expression variationacross time.
(a) Raster display of the expression of the 17 genelets across the 17 timepoints, with overexpression (red), no change in expression (black) andunderexpression (green) around the array-invariant, i.e., time-invariantexpression.
(b) Bar chart of the corresponding inverse eigenvalues λ−1k , showing that the
13th through the 17th genelets are approximately equally significant inthe three datasets with 1 . λk . 2, where the five corresponding ar-raylets in each dataset are ε = 0.33-orthonormal to all other arraylets(Fig. 3.3).
(c) Line-joined graphs of the 13th (red), 14th (blue) and 15th (green) geneletsin the two-dimensional subspace that approximates the five-dimensionalHO GSVD subspace (Fig. 3.4 and Section 3.7), normalized to zero aver-age and unit variance.
(d) Line-joined graphs of the projected 16th (orange) and 17th (violet)genelets in the two-dimensional subspace. The five genelets describeexpression oscillations of two periods in the three time courses.
38
![Page 54: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/54.jpg)
1716151413121110987654321
st
ely
ar
rA
1 2 3 4 5 6 7 8 90
11
12
13
14
15
16
17
1
HaL S. pombe Arraylets
1716151413121110987654321
1 2 3 4 5 6 7 8 9 01 11 21 31 41 51 61 71
HbL S. cerevisiae Arraylets
1716151413121110987654321
1 2 3 4 5 6 7 8 90
11
12
13
14
15
16
17
1
HcL Human Arraylets
Figure 3.3: Correlations among the n=17 arraylets in each organism. Rasterdisplays of UT
i Ui, with correlations ≥ ε = 0.33 (red), ≤ −ε (green) and∈ (−ε, ε) (black), show that for k = 13, . . . , 17 the arraylets ui,k are ε = 0.33-orthonormal to all other arraylets in each dataset. The corresponding fivegenelets, vk with k = 13, . . . , 17, are approximately equally significant in thethree datasets with 1 . λk . 2 (Fig. 3.2). Following Corollary 2.3.5, there-fore, these arraylets and genelets span the HO GSVD subspace approximatelycommon among the three datasets.
39
![Page 55: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/55.jpg)
3.5 Common Subspace Interpretation
In analogy with the GSVD [5], a common HO GSVD genelet and the
N = 3 corresponding arraylets, which are approximately orthonormal to all
other arraylets in the corresponding datasets (Corollary 2.3.5), are inferred to
represent a biological process common to the three organisms and the corre-
sponding cellular states when a consistent biological or experimental theme is
reflected in the interpretations of the patterns of the genelet and the arraylets.
A genelet vk is associated with a biological process or an experimental
artifact when its pattern of expression variation across the arrays, i.e., across
time, is interpretable.
An arraylet ui,k is parallel- and antiparallel-associated with the most
likely parallel and antiparallel cellular states according to the annotations of
the two groups of m genes each, with largest and smallest levels of expression
in this arraylet among all mi genes, respectively. The P -value of a given asso-
ciation, i.e., P (y;m,mi, z), is calculated assuming hypergeometric probability
distribution of the z annotations among the mi genes, and of the subset of
y ⊆ z annotations among the subset of m genes, as described [51],
P (y;m,mi, z) =
(mi
m
)−1 m∑x=y
(z
x
)(mi − zm− x
). (3.1)
We find that the approximately common HO GSVD subspace repre-
sents cell-cycle mRNA expression in the three disparate organisms (Fig. 3.2
and Table 3.1).
40
![Page 56: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/56.jpg)
Table 3.1: Probabilistic significance of the enrichment of the arraylets thatspan the common HO GSVD subspace in each dataset in overexpressed orunderexpressed cell cycle-regulated genes. The P -value of each enrichment iscalculated as described [51] (Section 3.5) assuming hypergeometric distribu-tion of the annotations (Appendix E Supplementary Datasets 1–3) among thegenes, including the m=100 genes most overexpressed or underexpressed ineach arraylet.
Overexpression UnderexpressionDataset Arraylet Annotation P -value Annotation P -value
S. pombe 13 G2 2.4× 10−10 G1 1.0× 10−15
14 M 2.2× 10−21 G2 1.3× 10−9
15 M 4.1× 10−13 S 1.6× 10−17
16 G2 5.2× 10−18 G1 1.2× 10−26
17 G2 2.4× 10−10 S 5.3× 10−35
S. cerevisiae 13 S/G2 4.3× 10−15 M/G1 1.4× 10−9
14 M/G1 4.9× 10−26 G2/M 2.2× 10−12
15 G1 7.7× 10−17 S 1.3× 10−8
16 G2/M 2.3× 10−38 G1 2.0× 10−32
17 G2/M 2.3× 10−41 G1 2.6× 10−40
Human 13 G1/S 1.1× 10−33 G2 2.4× 10−44
14 M/G1 5.7× 10−3 G2 4.7× 10−2
15 G2 9.8× 10−24 None 1.4× 10−1
16 G1/S 9.8× 10−13 G2 4.1× 10−4
17 G2 9.3× 10−33 M/G1 2.7× 10−2
The expression variations across time of the five genelets fit normal-
ized cosine functions of two periods, superimposed on time-invariant expres-
sion (Fig. 3.2 c and d). Consistently, the corresponding organism-specific ar-
raylets are enriched [51] in overexpressed or underexpressed organism-specific
cell cycle-regulated genes, with 24 of the 30 P -values < 10−8 (Table 3.1).
For example, the three 17th arraylets, which correspond to the 0-phase 17th
41
![Page 57: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/57.jpg)
genelet, are enriched in overexpressed G2 S. pombe genes, G2/M and M/G1
S. cerevisiae genes and S and G2 human genes, respectively, representing the
cell-cycle checkpoints in which the three cultures are initially synchronized.
3.6 HO GSVD Data Reconstruction
The decoupling of the HO GSVD genelets and N sets of arraylets, i.e.,
the diagonality of the matrices Σi in Equation (2.16), allows simultaneous
reconstruction of the N = 3 datasets in the common HO GSVD subspace
without eliminating genes or arrays. In analogy with the GSVD [5], given a
common HO GSVD subspace that is spanned by the K genelets {vk} where
k = n − K + 1, . . . , n, we reconstruct each dataset in terms of only these
genelets and the corresponding {ui,k} arraylets,
Di = UiΣiVT =
n∑k=1
σi,kui,kvTk
→n∑
k=n−K+1
σi,kui,kvTk . (3.2)
Note that this reconstruction is mathematically equivalent to setting to zero
the higher-order generalized singular values {σi,k} in Σi for all k 6= n −K +
1, . . . , n, and then multiplying the matrices UiΣiVT to obtain the reconstructed
Di.
3.7 Simultaneous HO GSVD Classification
Identifying the subset of genelets and the corresponding arraylets that
span the approximately common HO GSVD subspace allows simultaneous
classification of the genes and arrays of the three datasets by similarity in their
42
![Page 58: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/58.jpg)
$ 2TcosH 4 pt
TL
$ 2TcosH 2 pt
T-
p3L
$ 2TcosH 4 pt
T-
p2L
-0.5
0
0.5
1
no
iss
er
pxE
le
veL
1 2 3 4 5 6 7 8 90
11
12
13
14
15
16
17
1
Arrays
Figure 3.4: The three-dimensional least-squares approximation of the five-dimensional approximately common HO GSVD subspace. Line-joined graphsof the first (red), second (blue) and third (green) most significant orthonor-mal vectors in the least squares approximation of the genelets vk with k =13, . . . , 17, which span the common HO GSVD subspace. We approximatethis five-dimensional subspace with the two orthonormal vectors x (green)and y (red), which fit normalized cosine functions of two periods, and 0- and−π/2-initial phases, i.e., normalized zero-phase cosine and sine functions oftwo periods, respectively.
expression of these genelets or corresponding arraylets, respectively, rather
than by their overall expression, as described [5].
We least squares-approximate the K=5-dimensional subspace spanned
by the five genelets vk with k = 13, . . . , 17, with the two-dimensional space
spanned by two of the three orthonormal vectors x, y and z ∈ Rn that maxi-
mize the norm∑n
k=n−K+1 ‖vTk x‖2 + ‖vT
k y‖2 + ‖vTk z‖2 (Fig. 3.4).
Since the approximately common HO GSVD subspace represents cell-
cycle mRNA expression, the two vectors that we select to approximate the
43
![Page 59: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/59.jpg)
common subspace, x and y, describe expression oscillations of two periods in
the three time courses (see Remark C.0.1).
We plot the projection of each gene of the dataset Di from the K-
genelets subspace onto y, i.e., eTm(∑n
k=n−K+1 σi,kui,kvTk )y/am where emi
is a
unit mi-vector, along the y-axis vs. that onto x along the x-axis, normal-
ized by its ideal amplitude am, where the contribution of each genelet to
the overall projected expression of the gene adds up rather than cancels out,
a2m =
∑k
∑j |σi,kσi,j(e
Tmiui,ku
Ti,jemi
)vTk (xxT + yyT )vj| (Fig. 3.5). In this plot,
the distance of each gene from the origin is the amplitude of its normalized
projection. A unit amplitude indicates that the genelets add up; a zero ampli-
tude indicates that they cancel out. The angular distance of each gene from
the x-axis is its phase in the progression of expression across the genes from
x to y and back to x, going through the projection of each genelet vk in this
subspace, i.e., (xxT + yyT )vk. Sorting the genes according to these angular
distances gives the angular order, or classification, of the genes.
Similarly, we plot the projection of each array from the K-arraylets
subspace onto (∑n
k=n−K+1 ui,kvTk )y, i.e., yT (
∑nk=n−K+1 σi,kvkv
Tk )en where en is
a unit n-vector, along the y-axis vs. that onto (∑n
k=n−K+1 ui,kvTk )x along the
x-axis, normalized by its ideal amplitude an, where the contribution of each
arraylet to the overall projected expression of the array adds up rather than
cancels out, a2n =
∑k
∑j |σi,kσi,j(e
Tnvkv
Tj en)vT
k (xxT + yyT )vj| (Fig. 3.5). We
sort the arrays according to their angular distances from the x-axis.
44
![Page 60: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/60.jpg)
HcL
x
y
MêG1
G1êS
G2
G2êM
S
u1,17
u1,16
u1,15u1,14
u1,13
1
23
4
5
6 78
9
10
1112
13
14
15 16 17
HfL
x
y
MêG1
G1êS
G2
G2êM
S
v17
v16
v15v14
v13
HiL
x
y
MêG1
G1êS
G2
G2êM
S
17
16
1514
13
HbL
x
y
MêG1G1
G2êM
SêG2S
u1,17
u1,16
u1,15u1,14
u1,13
1
23
4
5
6 7
8
9
10
1112
13
14
15 16 17
HeL
x
y
MêG1G1
G2êM
SêG2S
v17
v16
v15v14
v13
HhL
x
y
MêG1G1
G2êM
SêG2S
17
16
1514
13
HaL
x
y
G2
M
S
G1u1,17
u1,16
u1,15u1,14
u1,13
1
23
4
5
6 7
8
9
10
1112
13
14
15 1617
HdL
x
y
G2
M
S
G1v17
v16
v15v14
v13
HgL
x
y
G2
M
S
G1 17
16
1514
13
45
![Page 61: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/61.jpg)
Figure 3.5: Simultaneous HO GSVD reconstruction and classification of S.pombe, S. cerevisiae and human global mRNA expression in the approximatelycommon HO GSVD subspace.
(a–c) S. pombe, S. cerevisiae and human array expression, projected fromthe five-dimensional common HO GSVD subspace onto the two-dimensionalsubspace that approximates it (Sections 3.6 and 3.7). The arrays are color-coded according to their previous cell-cycle classification [35, 47, 50, 55]. Thearrows describe the projections of the k = 13, . . . , 17 arraylets of each dataset.The dashed unit and half-unit circles outline 100% and 50% of added-up(rather than canceled-out) contributions of these five arraylets to the over-all projected expression.
(d–f) Expression of 380, 641 and 787 cell cycle-regulated genes of S. pombe,S. cerevisiae and human, respectively, color-coded according to previous clas-sifications.
(g–i) The HO GSVD pictures of the S. pombe, S. cerevisiae and humancell-cycle programs. The arrows describe the projections of the k = 13, . . . , 17shared genelets and organism-specific arraylets that span the common HOGSVD subspace and represent cell-cycle checkpoints or transitions from onephase to the next.
For classification, we set to zero the arithmetic mean of each genelet
across the arrays, i.e., time, and that of each arraylet across the genes, such
that the expression of each gene and array is centered at its time- or gene-
invariant level, respectively.
Simultaneous sequence-independent reconstruction and classification of
the three datasets in the common subspace outlines cell-cycle progression in
time and across the genes in the three organisms (Sections 3.6 and 3.7). Pro-
jecting the expression of the 17 arrays of either organism from the correspond-
ing five-dimensional arraylets subspace onto the two-dimensional subspace that
approximates it (Fig. 3.4), ≥ 50% of the contributions of the arraylets add up,
rather than cancel out (Fig. 3.5 a–c). In these two-dimensional subspaces, the
angular order of the arrays of either organism describes cell-cycle progression
46
![Page 62: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/62.jpg)
in time through approximately two cell-cycle periods, from the initial cell-cycle
phase and back to that initial phase twice. Projecting the expression of the
genes, ≥ 50% of the contributions of the five genelets add up in the overall
expression of 343 of the 380 S. pombe genes classified as cell cycle-regulated,
554 of the 641 S. cerevisiae cell-cycle genes, and 632 of the 787 human cell-
cycle genes (Fig. 3.5 d–f). Simultaneous classification of the genes of either
organism into cell-cycle phases according to their angular order in these two-
dimensional subspaces is consistent with the classification of the arrays, and is
in good agreement with the previous classifications of the genes (Fig. 3.5 g–i).
With all 3167 S. pombe, 4772 S. cerevisiae and 13,068 human genes
sorted, the expression variations of the five k = 13, . . . , 17 arraylets from each
organism approximately fit one period cosines, with the initial phase of each
arraylet similar to that of its corresponding genelet. The global mRNA ex-
pression of each organism, reconstructed in the common HO GSVD subspace,
approximately fits a traveling wave, oscillating across time and across the genes
(Fig. 3.6 – 3.8).
3.8 Sequence-Independence of the Classification
Our new HO GSVD provides a comparative mathematical framework
for N ≥ 2 large-scale DNA microarray datasets from N organisms tabulated
as N matrices that does not require a one-to-one mapping between the genes
of the different organisms. This HO GSVD, therefore, can be used to identify
genes of common function across different organisms independently of the se-
quence similarity among these genes, and to study, e.g., nonorthologous gene
displacement [33].
This HO GSVD can also be used to identify homologous genes, of sim-
47
![Page 63: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/63.jpg)
2G
M1
GS
2G
se
neG
0ni
m_ 5
1n
im_
03
nim
_5
4n
im_
06
nim
_5
7n
im_
09
nim
_ 50
1n
im_
02
1n
im_
53
1n
im_
05
1n
im_
56
1n
im_
08
1n
im_
59
1n
im_
01
2n
im_
52
2n
im_
04
2n
im_
HaL Arrays
1 2 3 4 5 6 7 8 90
11
12
13
14
15
16
17
1
HbL Arraylets
0
60
.0
21
.0
81
.0
42
.0
HcL Expression Level
Figure 3.6: S. pombe global mRNA expression reconstructed in the five-dimensional common HO GSVD subspace with genes sorted according to theirphases in the two-dimensional subspace that approximates it (Sections 3.6 and3.7). (a) Expression of the sorted 3167 genes in the 17 arrays, centered at theirgene- and array-invariant levels, showing a traveling wave of expression. (b)Expression of the sorted genes in the 17 arraylets, centered at their arraylet-invariant levels. Arraylets k = 13, . . . , 17 display the sorting. (c) Line-joinedgraphs of the 13th (red), 14th (blue), 15th (green), 16th (orange) and 17th(violet) arraylets fit one-period cosines with initial phases similar to those ofthe corresponding genelets (Fig. 3.2).
48
![Page 64: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/64.jpg)
2G
êMMê1G
1G
SSê
2G
2GêM
se
neG
0n
im_
7n
im_ 41
ni
m_
12n
im
_82
ni
m_
53n
im
_24
ni
m_
94n
im
_65
ni
m_
36n
im
_07
ni
m_
77n
im
_48
ni
m_
19n
im
_89
ni
m_ 50
1n
im_
211
ni
m_
HaL Arrays
1 2 3 4 5 6 7 8 90
11
12
13
14
15
16
17
1
HbL Arraylets
0
50.
0
01.
0
51.
0
02.
0
HcL Expression Level
Figure 3.7: S. cerevisiae global mRNA expression reconstructed in the five-dimensional common HO GSVD subspace with genes sorted according to theirphases in the two-dimensional subspace that approximates it (Sections 3.6 and3.7). (a) Expression of the sorted 4772 genes in the 17 arrays, centered at theirgene- and array-invariant levels, showing a traveling wave of expression. (b)Expression of the sorted genes in the 17 arraylets, centered at their arraylet-invariant levels. Arraylets k = 13, . . . , 17 display the sorting. (c) Line-joinedgraphs of the 13th (red), 14th (blue), 15th (green), 16th (orange) and 17th(violet) arraylets fit one-period cosines with initial phases similar to those ofthe corresponding genelets (Fig. 3.2).
49
![Page 65: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/65.jpg)
S2G
2G
êMMê1
G1
GêS
S
sen
eG
2r
h_4
rh_
6r
h_8
rh_ 0
1r
h_
21
rh
_4
1r
h_
61
rh
_8
1r
h_
02
rh
_2
2r
h_
42
rh
_6
2r
h_
82
rh
_0
3r
h_
23
rh
_4
3r
h_
HaL Arrays
1 2 3 4 5 6 7 8 90
11
12
13
14
15
16
17
1
HbL Arraylets
0
53
0.0
07
0.0
50
1.0
04
1.0
HcL Expression Level
Figure 3.8: Human global mRNA expression reconstructed in the five-dimensional common HO GSVD subspace with genes sorted according to theirphases in the two-dimensional subspace that approximates it (Sections 3.6 and3.7). (a) Expression of the sorted 13,068 genes in the 17 arrays, centered attheir gene- and array-invariant levels, showing a traveling wave of expression.(b) Expression of the sorted genes in the 17 arraylets, centered at their arraylet-invariant levels. Arraylets k = 13, . . . , 17 display the sorting. (c) Line-joinedgraphs of the 13th (red), 14th (blue), 15th (green), 16th (orange) and 17th(violet) arraylets fit one-period cosines with initial phases similar to those ofthe corresponding genelets (Fig. 3.2).
50
![Page 66: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/66.jpg)
ilar DNA or protein sequences in one organism or across multiple organisms,
that have different cellular functions. For example, we examine the HO GSVD
classifications of genes of significantly different cell-cycle peak times [18] but
highly conserved sequences [7, 46]. We consider three subsets of genes, the
closest S. pombe, S. cerevisiae and human homologs of (i) the S. pombe gene
BFR1, which belongs to the evolutionarily highly conserved ATP-binding cas-
sette (ABC) transporter superfamily [14, 20, 28, 29, 32, 54, 56], (ii) the S. cere-
visiae phospholipase B-encoding gene PLB1 [13, 26], and (iii) the S. pombe
strongly regulated S-phase cyclin-encoding gene CIG2 [15, 31] (Table 3.8).
We find, most notably, that genes of significantly different cell-cycle
peak times [18] but highly conserved sequences [7, 46] are correctly classified
(Fig. 3.9).
For example, consider the G2 S. pombe gene BFR1 (Fig. 3.9 a), which
belongs to the evolutionarily highly conserved ATP-binding cassette (ABC)
transporter superfamily [14]. The closest homologs of BFR1 in our S. pombe,
S. cerevisiae and human datasets are the S. cerevisiae genes SNQ2, PDR5,
PDR15 and PDR10 (Table 3.8 a). The expression of SNQ2 and PDR5 is
known to peak at the S/G2 and G2/M cell-cycle phases, respectively [50].
However, sequence similarity does not imply similar cell-cycle peak times, and
PDR15 and PDR10, the closest homologs of PDR5, are induced during sta-
tionary phase,[29] which has been hypothesized to occur in G1, before the
Cdc28-defined cell-cycle arrest [54]. Consistently, we find PDR15 and PDR10
at the M/G1 to G1 transition, antipodal to (i.e., half a cell-cycle period apart
from) SNQ2 and PDR5, which are projected onto S/G2 and G2/M, respec-
tively (Fig. 3.9 b). We also find the transcription factor PDR1 at S/G2, its
known cell-cycle peak time, adjacent to SNQ2 and PDR5, which it positively
51
![Page 67: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/67.jpg)
regulates and might be regulated by, and antipodal to PDR15, which it nega-
tively regulates [20, 28, 32, 56].
Another example is the S. cerevisiae phospholipase B-encoding gene
PLB1 [26], which peaks at the cell-cycle phase M/G1 [13]. Its closest homolog
in our S. cerevisiae dataset, PLB3, also peaks at M/G1 [50] (Fig. 3.9d). How-
ever, among the closest S. pombe and human homologs of PLB1 (Table 3.8 b),
we find the S. pombe genes SPAC977.09c and SPAC1786.02, which expressions
peak at the almost antipodal S. pombe cell-cycle phases S and G2, respectively
[18] (Fig. 3.9 c).
As a third example, consider the S. pombe G1 B-type cyclin-encoding
gene CIG2 [15, 31] (Table 3.8c). Its closest S. pombe homolog, CDC13, peaks at
M [18] (Fig. 3.9 e). The closest human homologs of CIG2, the cyclins CCNA2
and CCNB2, peak at G2 and G2/M, respectively (Fig. 3.9 g). However, while
periodicity in mRNA abundance levels through the cell cycle is highly con-
served among members of the cyclin family, the cell-cycle peak times are not
necessarily conserved [21]: The closest homologs of CIG2 in our S. cerevisiae
dataset, are the G2/M promoter-encoding genes CLB1,2 and CLB3,4, which
expressions peak at G2/M and S respectively, and CLB5, which encodes a
DNA synthesis promoter, and peaks at G1 (Fig. 3.9 f).
52
![Page 68: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/68.jpg)
HeL
S. pombe
x
y
G2
M
S
G1
CIG2CDC13
CDC13
HfL
S. cerevisiae
x
y
MêG1G1
G2êM
SêG2
S
CLN1CLN2
CLN3
CLB1
CLB2CLB3
CLB4
CLB5
HgL
Human
x
y
MêG1
G1êS
G2
G2êM
S
CCNB2CCNA2
CCNA2
HcL
S. pombe
x
y
G2
M
S
G1 C1786.02
C1786.02
C977.09C
HdL
S. cerevisiae
x
y
MêG1G1
G2êM
SêG2
S
PLB1PLB3
HaL
S. pombe
x
y
G2
M
S
G1
BFR1
BFR1
HbL
S. cerevisiae
x
y
MêG1G1
G2êM
SêG2
S
SNQ2
PDR10PDR15
PDR5PDR1
Figure 3.9: Simultaneous HO GSVD sequence-independent classification ofgenes of significantly different cell-cycle peak times but highly conserved se-quences. (a) The S. pombe gene BFR1, and (b) its closest S. cerevisiae ho-mologs. (c) The S. pombe and (d) S. cerevisiae closest homologs of the S. cere-visiae gene PLB1. (e) The S. pombe cyclin-encoding gene CIG2 and its closestS. pombe, (f) S. cerevisiae and (g) human homologs.
53
![Page 69: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/69.jpg)
Table 3.2: The closest S. pombe, S. cerevisiae and human homologs of (a) theS. pombe gene BFR1, (b) the S. cerevisiae gene PLB1, and (c) the S. pombegene CIG2, as determined by an NCBI BLAST[7] of the protein sequence thatcorresponds to each gene against the NCBI RefSeq database [46].
Query Gene Gene Organism RefSeq ID Bit Score E-value(a) Bfr1 Snq2 S. cerevisiae NP 010294.1 1149 0
S. pombe Pdr5 S. cerevisiae NP 014796.1 1103 0NP 587932.3 Pdr18 S. cerevisiae NP 014468.1 1097 0
Pdr15 S. cerevisiae NP 010694.1 1093 0Pdr12 S. cerevisiae NP 015267.1 1070 0Pdr10 S. cerevisiae NP 014973.1 1029 0
(b) Plb1 Plb2 S. cerevisiae NP 013719.1 825 0S. cerevisiae Plb3 S. cerevisiae NP 014632.1 813 0NP 013721.1 SPAC977.09c S. pombe NP 592772.1 385 7× 10−107
SPAC1A6.03c S. pombe NP 593194.1 372 7× 10−103
SPCC1450.09c S. pombe NP 588308.1 369 8× 10−102
SPAC1786.02 S. pombe NP 594024.1 355 1× 10−97
(c) Cig2 Cdc13 S. pombe NP 595171.1 346 3× 10−95
S. pombe Clb2 S. cerevisiae NP 015444.1 248 1× 10−65
NP 593889.1 Cig1 S. pombe NP 588110.2 241 1× 10−63
Clb1 S. cerevisiae NP 011622.1 234 2× 10−61
Clb4 S. cerevisiae NP 013311.1 222 5× 10−58
Clb3 S. cerevisiae NP 010126.1 221 2× 10−57
Ccnb2 Human NP 004692.1 202 5× 10−52
Clb6 S. cerevisiae NP 011623.1 183 4× 10−46
Clb5 S. cerevisiae NP 015445.1 179 5× 10−45
Ccnb1 Human NP 114172.1 174 1× 10−43
Rem1 S. pombe NP 595798.1 160 2× 10−39
Ccna1 (isoform c) Human NP 001104516.1 159 7× 10−39
Ccna1 (isoform a) Human NP 003905.1 158 1× 10−38
Ccna1 (isoform b) Human NP 001104515.1 157 1× 10−38
Ccna2 Human NP 001228.1 145 6× 10−35
54
![Page 70: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/70.jpg)
3.9 Discussion
We have illustrated the HO GSVD mathematical framework with a
comparison of global cell-cycle mRNA expression from the three disparate or-
ganisms S. pombe, S. cerevisiae and human, where the mathematical variables
and operations represent biological reality: Genelets of common significance
in the multiple datasets, and the corresponding arraylets, represent cell-cycle
checkpoints or transitions from one phase to the next, common to S. pombe,
S. cerevisiae and human.
In analogy with the GSVD comparative modeling of DNA microarray
data from two organisms [5], we find that the approximately common HO
GSVD subspace represents cell-cycle mRNA expression in the three disparate
organisms. Simultaneous reconstruction and classification of the three datasets
in the common subspace that these patterns span outlines the biological sim-
ilarity in the regulation of their cell-cycle programs.
This HO GSVD can be used to identify genes of common function
across different organisms independently of the sequence similarity among
these genes, and to study, e.g., nonorthologous gene displacement [33]. This
HO GSVD can also be used to identify homologous genes, of similar DNA or
protein sequences in one organism or across multiple organisms, that have dif-
ferent cellular functions. The HO GSVD framework described is the only such
framework to date where, unlike existing algorithms, a mapping among the
genes of these disparate organisms is not required, enabling correct classifica-
tion of genes of highly conserved sequences but significantly different cell-cycle
peak times.
55
![Page 71: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/71.jpg)
Chapter 4
Additional Investigation of the Mathematical
Properties of the HO GSVD
4.1 Introduction
Charles Van Loan of Cornell University proposed a novel representation
of S in Equation (2.17) defined
S ≡ 1
N(N − 1)
N∑i=1
N∑j>i
(AiA−1j + AjA
−1i )
as
S =(A1 + . . .+ AN)(A−1
1 + . . .+ A−1N )−NI
N(N − 1)(4.1)
(private communication, April 29, 2010). In this chapter, we investigate fur-
ther the mathematical properties of our HO GSVD by exploiting the structure
of the representation of S in Equation (4.1). The work in this chapter was
done in collaboration with Charles Van Loan.
We still work with a set of N real matrices Di ∈ Rmi×n of full column
rank n but remove all other assumptions of Chapter 1. Specifically, that the
eigensystem of S exists and that V is real and not severely ill-conditioned.
As in Chapter 1, we begin with the matrix GSVD using the construction in
Equations (2.7)–(2.9). We show that S = 12(A1A
−12 + A2A
−11 ) is nondefective
and that the eigenvectors and eigenvalues of S are real (Theorem 4.2.2). We
then show that the eigenvalues of S satisfy λk ≥ 1 (Theorem 4.2.3) and that
the eigenvectors of S satisfy vTk A−1j vk = σ−2
j for j = 1, . . . , N (Theorem 4.2.4).
56
![Page 72: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/72.jpg)
Considering our HO GSVD, we proceed in the same way as in Section
4.2 and show, with help from Lemma 4.2.1, that S is nondefective and that
the eigenvectors and eigenvalues of S are real (Theorem 4.3.1). This removes
our previous assumptions that the eigensystem of S exists and that V is real
and not severely ill-conditioned (Section 2.3.1). We then show that the eigen-
values of S satisfy λk ≥ 1 (Theorem 4.3.2). We show that when an eigenvalue
of S satisfies λk = 1, the corresponding eigenvector vk is an eigenvector of
sums of pairwise quotients
(N∑
i=1
Ai
)A−1
j (Theorem 2.3.2), that it satisfies
vTk A−1j vk ≥ σ−2
j for j = 1, . . . , N (Theorem 4.3.7) and that we can compute
the corresponding eigenvector vk from the Di without the need for computing
any inverse (Remark 4.3.5). Finally, we comment on certain conditions under
which the kth left basis vectors ui,k are orthonormal to all other left basis
vectors in Ui for all i (Remark 4.3.8).
4.2 Mathematical Properties of the Matrix GSVD
Suppose we have two real matrices D1 ∈ Rm1×n and D2 ∈ Rm2×n of
full column rank n. We consider the definition of the GSVD
D1 ≡ U1Σ1VT ,
D2 ≡ U2Σ2VT ,
(4.2)
as in Equations (2.7)–(2.9).
We prove below the results of Theorem 2.2.2, i.e., that S = 12(A1A
−12 +
A2A−11 ) is nondefective and that the eigenvectors and eigenvalues of S are real.
We will need the following lemma.
Lemma 4.2.1. If M1,M2 ∈ Rn×n are symmetric and positive definite, ∃ a
nonsingular X ∈ Rn×n so that X−1(M1M2)X = diag(µi) with µi > 0.
57
![Page 73: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/73.jpg)
Proof. Consider the Cholesky factorizations M1 = L1LT1 amd M2 = L2L
T2 . We
then have
M1M2 = L1LT1L2L
T2 ,
L−11 M1M2L1 = LT
1L2LT2L1
= (LT2L1)
T (LT2L1)
= V Σ2V T , (4.3)
where LT2L1 = UΣV T is the SVD of LT
2L1. Thus, (L1V )−1M1M2(L1V ) =
Σ2.
Theorem 4.2.2. S = 12(A1A
−12 +A2A
−11 ) is nondefective, and the eigenvectors
and eigenvalues of S are real.
Proof. Since
(A1 + A2)(A−11 + A−1
2 ) = (A1A−12 + A2A
−11 ) + 2I, (4.4)
it follows that
S =(A1 + A2)(A
−11 + A−1
2 )− 2I
2(4.5)
and the eigenvectors of S are the same as the eigenvectors of (A1 +A2)(A−11 +
A−12 ). Since (A1 + A2) and (A−1
1 + A−12 ) are symmetric positive definite, ∃ a
nonsingular V ∈ Rn×n so that
V −1(A1 + A2)(A−11 + A−1
2 )V = diag(µk)
with µk > 0 (Lemma 4.2.1). Thus, S is nondefective with real eigenvectors V .
Also, the eigenvalues of S
diag(λk) = 12(diag(µk)− 2I), (4.6)
where µk > 0 are the eigenvalues of (A1 + A2)(A−11 + A−1
2 ). Thus, the eigen-
values of S are real.
58
![Page 74: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/74.jpg)
Theorem 4.2.3. The eigenvalues of S satisfy λk ≥ 1.
Proof. Since A1 and A−12 are symmetric positive definite, ∃ a nonsingular V ∈
Rn×n so that V −1A1A
−12 V = diag(σ2
k) (Lemma 4.2.1). Thus, the eigenvalues
of S
λk = 12(σ2
k +1
σ2k
), (4.7)
≥ 1. (4.8)
Theorem 4.2.4. The eigenvectors of S satisfy vTk A−1j vk = σ−2
j for j =
1, . . . , N .
Proof. From Equation (4.2) and the definition Aj = DTj Dj we have
A−1j = V −T Σ−1
j (UTj Uj)
−1Σj−1V −1,
V TA−1j V = Σ−2
j , since UTj Uj = I,
eTk V
TA−1j V ek = eT
k Σ−2j ek,
where ek is a unit n-vector. Thus, vTk A−1j vk = σ−2
j for j = 1, . . . , N .
4.3 Mathematical Properties of the HO GSVD
Suppose we have a set of N real matrices Di ∈ Rmi×n of full column
rank n. We consider the definition of the HO GSVD of these N matrices
D1 = U1Σ1VT ,
D2 = U2Σ2VT ,
...
DN = UNΣNVT ,
(4.9)
59
![Page 75: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/75.jpg)
as in Equations (2.17)–(2.19).
We proceed in the same way as in Section 4.2 and prove below, with
help from Lemma 4.2.1, that S is nondefective and that the eigenvectors and
eigenvalues of S are real. This removes our previous assumptions that the
eigensystem of S exists and that V is real and not severely ill-conditioned
(Section 2.3.1).
Theorem 4.3.1. S is nondefective, and the eigenvectors and eigenvalues of
S are real.
Proof. Since
(A1 + . . .+ AN)(A−11 + . . .+ A−1
N ) =N∑
i=1
N∑j>i
(AiA−1j + AjA
−1i ) +NI, (4.10)
it follows that
S =(A1 + . . .+ AN)(A−1
1 + . . .+ A−1N )−NI
N(N − 1)(4.11)
and the eigenvectors of S are the same as the eigenvectors of
H = (A1 + . . .+ AN)(A−11 + . . .+ A−1
N ). (4.12)
Since (A1 + . . .+ AN) and (A−11 + . . .+ A−1
N ) are symmetric positive definite,
∃ a nonsingular V ∈ Rn×n so that V −1HV = diag(µk) with µk > 0 (Lemma
4.2.1). Thus, S is nondefective with real eigenvectors V . Also, the eigenvalues
of S
diag(λk) =1
N(N − 1)(diag(µk)−NI), (4.13)
where µk > 0 are the eigenvalues of H. Thus, the eigenvalues of S are real.
60
![Page 76: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/76.jpg)
Theorem 4.3.2. The eigenvalues of S satisfy λk ≥ 1.
Proof. From Equations (4.10)–(4.13), the theorem is equivalent to asserting
that the eigenvalues of H satisfy µk ≥ N2.
Let D1...DN
=
U1...
UN
ΣV T (4.14)
be the thin SVD so that
Di = UiΣVT , i = 1, . . . , N,
Ai = DTi Di = (UiΣV
T )T (UiΣVT ) = V ΣUT
i UiΣVT , (4.15)
andN∑
i=1
UTi Ui = I. (4.16)
From Equations (4.12) and (4.15),
H =
(N∑
i=1
V ΣUTi UiΣV
T
)(N∑
j=1
V Σ−1(UTj Uj)
−1Σ−1V T
)(4.17)
= V Σ
(N∑
i=1
UTi Ui
)(N∑
j=1
(UTj Uj)
−1
)Σ−1V T . (4.18)
Thus, H is similar to
H1 =
(N∑
i=1
UTi Ui
)(N∑
j=1
(UTj Uj)
−1
)(4.19)
= NI +N∑
i=1
N∑j>i
[(UTi Ui)(U
Tj Uj)
−1 + (UTj Uj)(U
Ti Ui)
−1] (4.20)
= NI +N∑
i=1
N∑j>i
(FiF−1j + FjF
−1i ) (4.21)
61
![Page 77: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/77.jpg)
where Fi = UTi Ui. Thus, we must show that the eigenvalues of H1 are ≥ N2.
From Equations (4.16) and (4.19), H1 is symmetric and positive definite so
λmin(H1) = σmin(H1). (4.22)
Also,
2H1 = H1 +HT1
= 2NI +N∑
i=1
N∑j>i
[(FiF−1j + FjF
−1i ) + (F−1
j Fi + F−1i Fj)]
= 2NI +N∑
i=1
N∑j>i
[(FiF−1j + F−1
i Fj) + (FjF−1i + F−1
j Fi)]
= 2NI +N∑
i=1
N∑j>i
[(Mij +M−Tij ) + (Mji +M−T
ji )] (4.23)
where Mij = FiF−1j and (FiF
−1j )−T = (F−1
j Fi)−1 = F−1
i Fj.
Recall that if σmin(C) denotes the minimum singular value of a square
matrix C, then σmin(C+C−T ) ≥ 2. Using this fact with Equations (4.23) and
(4.22) we have
λmin(H1) = σmin(H1)
= σmin
(NI +
1
2
N∑i=1
N∑j>i
[(Mij +M−Tij ) + (Mji +M−T
ji )]
)
≥ N +1
2
N∑i=1
N∑j>i
[σmin(Mij +M−Tij ) + σmin(Mji +M−T
ji )]
≥ N +1
2
N∑i=1
N∑j>i
(2 + 2)
= N2. (4.24)
62
![Page 78: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/78.jpg)
We now prove that when an eigenvalue of S satisfies λk = 1, the
corresponding eigenvector vk is an eigenvector of sums of pairwise quotients(N∑
i=1
Ai
)A−1
j . We will use the lemma below.
Lemma 4.3.3. Let A,B ∈ Rn×n be symmetric matrices. Then
λmin(A+B) ≥ λmin(A) + λmin(B). (4.25)
At equality, the corresponding eigenvectors of A+B, A, and B are identical.
Proof. Let i = j = n in Theorem F.3.1 (Appendix F: Poincare’s Inequality
and Weyl’s Inequalities). Thus,
λ↓n(A+B) ≥ λ↓n(A) + λ↓n(B). (4.26)
This proves the inequality. In the proof of Theorem F.3.1, the three sub-
spaces under consideration reduce to those spanned by {un}, {v1, . . . , vn}, and
{w1, . . . , wn}, where, uj, vj, and wj denote the eigenvectors of A, B, and A+B
respectively, corresponding to their eigenvalues in decreasing order. If x is the
unit vector in the intersection of the three subspaces, then
x = un (4.27)
since x ∈ the subspace spanned by {un} and x, un are unit vectors. Since
λ↓n(A+B) = 〈x, (A+B)x〉 for x = wn, (4.28)
= 〈x,Ax〉+ 〈x,Bx〉 (4.29)
= λ↓n(A) + 〈x,Bx〉 since x = un, (4.30)
at equality of (4.26) we have
〈x,Bx〉 = λ↓n(B). (4.31)
Thus, x = vn.
63
![Page 79: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/79.jpg)
Theorem 4.3.4. If an eigenvalue of S satisfies λk = 1, the corresponding
eigenvector vk is an eigenvector of sums of pairwise quotients
(N∑
i=1
Ai
)A−1
j
for j = 1, . . . , N .
Proof. From Theorem 4.3.2, when an eigenvalue of S satisfies λk = 1, the
corresponding eigenvalue of H1 is = N2, its minimum. From Equations (4.16)
and (4.19), H1 is a sum of N symmetric matrices(N∑
i=1
UTi Ui
)(UT
j Uj)−1. (4.32)
Thus, from Lemma 4.3.3, when the kth eigenvalue of H1 is at its min-
imum of N2, the kth eigenvalues of each of the symmetric matrices in (4.32),
for j = 1, . . . , N , are at their minimum, and the corresponding kth eigenvec-
tors are identical. Let θj,k denote the kth minimum eigenvalue and yk the
corresponding eigenvector so that
(UTj Uj)
−1yk = θj,kyk, for j = 1, . . . , N. (4.33)
From Equation (4.18), the matrices
V Σ
(N∑
i=1
UTi Ui
)(UT
j Uj)−1Σ−1V T =
(N∑
i=1
Ai
)A−1
j (4.34)
in the summation of H are similar to those of (4.32) in the summation of H1.
Thus, the kth eigenvalue of each of the matrices in (4.34) for j = 1, . . . , N is
θj,k with corresponding identical eigenvectors V Σyk.
Remark 4.3.5. When an eigenvalue of S satisfies λk = 1, the corresponding
eigenvector
vk = V Σyk (4.35)
64
![Page 80: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/80.jpg)
where yk is the kth eigenvector of (UTj Uj)
−1 for j = 1, . . . , N . Since yk is
also the kth eigenvector of (UTj Uj), we can compute vk in a stable way from
Equations (4.14) and (4.35) without computing any inverse.
We prove below that when an eigenvalue of S satisfies λk = 1, the
corresponding eigenvector vk satisfies vTk A−1j vk ≥ σ−2
j for j = 1, . . . , N . We
will use Remark 4.3.5 and the following lemma.
Lemma 4.3.6. The minimum value of the diagonal entries of the inverse of
a correlation matrix C is one, i.e., C−1kk ≥ 1.
Proof. If x and y are n-dimensional column vectors, the Cauchy inequality
asserts that (xTx)(yTy) ≥ |xTy|2, with equality if and only if y = αx for some
scalar α. As C is a positive-definite symmetric matrix, if C1/2 is its unique
positive definite symmetric square root, the choice x = C1/2e, y = C−1/2e in
the Cauchy inequality yields
(eTCe)(eTC−1e) ≥ 1 ∀ eT e = 1, (4.36)
with equality if and only if e is an eigenvector of C [30]. If e = ek is the kth
column of the identity, it follows from (4.36) that the diagonal entries of C−1
are always greater than or equal to one.
Theorem 4.3.7. If an eigenvalue of S satisfies λk = 1, the corresponding
eigenvector vk satisfies vTk A−1j vk ≥ σ−2
j for j = 1, . . . , N .
Proof. From Equation (4.9), the definition Aj = DTj Dj, and Equation (4.15)
A−1j = V −T Σ−1
j (UTj Uj)
−1Σj−1V −1 = V −T Σ−1(UT
j Uj)−1Σ−1V −1,
V TA−1j V = Σ−1
j (UTj Uj)
−1Σj−1 = V T V −T Σ−1(UT
j Uj)−1Σ−1V −1V,
eTk V
TA−1j V ek = eT
k Σ−1j (UT
j Uj)−1Σj
−1ek = eTk V
T V −T Σ−1YjΘjYTj Σ−1V −1V ek,
65
![Page 81: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/81.jpg)
where ek is a unit n-vector and (UTj Uj)
−1Yj = YjΘj. Thus,
vTk A−1j vk = σ−2
j eTk (UT
j Uj)−1ek = vT
k (V ΣYj)−T Θj(V ΣYj)
−1vk. (4.37)
From Remark 4.3.5 and Theorem 4.33, when an eigenvalue of S satisfies λk = 1,
the corresponding eigenvector vk = V Σyk and the kth eigenvalue of (UTj Uj)
−1
is at its minimum θj,k. Substituting vk = V Σyk in Equation (4.37) gives
vTk A−1j vk = σ−2
j cj,k = eTk Θjek = θj,k, (4.38)
where
eTk (UT
j Uj)−1ek = cj,k. (4.39)
Since cj,k ≥ 1 (Lemma 4.3.6),
θj,k ≥ σ−2j , (4.40)
with equality if and only if ek is an eigenvector of (UTj Uj)
−1. Thus, when
λk = 1, the kth eigenvalue of H1 is at its minimum of N2, the kth eigenvalues
of (UTj Uj)
−1 are at their minimum and satisfy θj,k ≥ σ−2j , and
vTk A−1j vk ≥ σ−2
j . (4.41)
Remark 4.3.8. When an eigenvalue of S satisfies λk = 1, from Equations
(4.39)–(4.40) and Lemma 4.3.6, we see that θj,k = σ−2j if and only if ek is
an eigenvector of (UTj Uj). Furthermore, ek is an eigenvector of (UT
j Uj) if and
only if the kth left basis vectors uj,k are orthonormal to all other left basis
vectors in Uj.
Remark 4.3.9. From our numerical experiments, we have observed that when
an eigenvalue of S satisfies λk = 1, the left basis vectors ui,k are orthogonal to
all other left basis vectors in Ui for all i. We have not been able to prove this
mathematically as yet.
66
![Page 82: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/82.jpg)
4.4 Discussion
“It is a basic tenet of numerical analysis that structure should be ex-
ploited whenever solving a problem”, [19]. By exploiting the structure of the
representation of S in Equation (4.1), in the presence of properties such as
symmetry and definiteness, we have further investigated the mathematical
properties of the HO GSVD and have removed our previous assumptions from
Chapter 1: Specifically, that the eigensystem of S exists and that V is real
and not severely ill-conditioned.
First, the representation of S in Equation (4.1) relates the eigensystem
of S to that of H in Equation (4.12), a product of two symmetric and positive
definite matrices with a well-defined eigensystem. Next, by exploiting the
structure and properties of the matrix in Equation (4.14), we were able to
relate the eigensystem of H, and thus of S, to that of a symmetric and positive
definite matrix H1 in Equation (4.19). We proved that the eigenvalues of H1
are ≥ N2 and so the eigenvalues of S satisfy λk ≥ 1.
Using Poincare’s Inequality and Weyl’s Inequalities (Appendix F: Poincare’s
Inequality and Weyl’s Inequalities), we investigated conditions at λk = 1. We
proved that when an eigenvalue of S satisfies λk = 1, the corresponding eigen-
vector vk is an eigenvector of sums of pairwise quotients
(N∑
i=1
Ai
)A−1
j , that
it satisfies vTk A−1j vk ≥ σ−2
j for j = 1, . . . , N and that we can compute the
corresponding eigenvector vk from the Di without the need for computing any
inverse (see Section 5.2.2).
Finally, from Remark 4.3.8, we note that when an eigenvalue of S
satisfies λk = 1, the kth left basis vectors uj,k are orthonormal to all other left
basis vectors in Uj if and only if θj,k = σ−2j , for all j.
67
![Page 83: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/83.jpg)
Chapter 5
Summary and Future Work
This thesis addresses the fundamental need for a mathematical frame-
work that can compare multiple high-dimensional datasets tabulated as large-
scale matrices of different numbers of rows.
5.1 Summary of HO GSVD Mathematical Framework
In the first part of this thesis, we defined a HO GSVD of more than
two real matrices Di of the same number of columns and, in general, differ-
ent numbers of rows. Each matrix is factored as Di = UiΣiVT , where the
columns of Ui and V have unit length and are the left and right basis vectors
respectively, and each Σi is diagonal and positive definite. The matrix V is
identical in all factorizations, and is obtained from the eigensystem SV = V Λ
of a matrix sum S, the arithmetic mean of all pairwise quotients AiA−1j , i 6= j,
of the matrices Ai = DTi Di.
The rows of each of the N matrices Di are superpositions of the same
right basis vectors, the columns of V (Fig. 2.2). The kth diagonal of Σi =
diag(σi,k) indicates the significance of the kth right basis vector vk in the ith
matrix Di in terms of the overall information that vk captures in Di. The ratio
σi,k/σj,k indicates the significance of vk in Di relative to its significance in Dj.
A ratio of σi,k/σj,k ≈ 1 for all i and j corresponds to a right basis vector vk of
68
![Page 84: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/84.jpg)
similar significance in all N matrices Di. Such a vector might reflect a theme
common to the N matrices under comparison.
We assumed that each matrix has full column rank, that the eigensys-
tem of S exists, and that V is real and not severely ill-conditioned. We proved
that when a left basis vector ui,k is orthonormal to all other left basis vectors
in Ui for all i, this new decomposition extends to higher order all of the math-
ematical properties of the GSVD: the corresponding right basis vector vk is an
eigenvector of all pairwise quotients AiA−1j , the corresponding eigenvalue of S
satisfies λk ≥ 1, and an eigenvalue λk = 1 corresponds to a right basis vector
vk of equal significance in all matrices Di and Dj, meaning σi,k/σj,k = 1 for
all i and j. We also showed that the existing SVD and GSVD decompositions
are in some sense special cases of our HO GSVD.
Recent research showed that several tensor generalizations are possible
for a given matrix decomposition, each preserving only some but not all of
the properties of the matrix decomposition, such as exactness, orthogonality
and diagonality. Our HO GSVD is an exact decomposition that preserves the
diagonality of the GSVD, i.e., the matrices Σi in Equation (2.16) are diagonal.
In general, the orthogonality of the GSVD is not preserved, i.e., the matrices
Ui in Equation (2.16) are not column-wise orthonormal (see Remark 4.3.9).
We have chosen the form Di = UiΣiVT (2.6) for our HO GSVD because
a stable numerical algorithm is known for this form when N = 2 [38]. It would
be ideal if our procedure reduced to the stable algorithm when N = 2. To
achieve this ideal, we would need to find a procedure that allows rank(Di) < n
for some i, and that transforms each Di directly without forming the products
Ai = DTi Di.
69
![Page 85: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/85.jpg)
5.2 Future Direction: A stable HO GSVD Algorithm
We begin with reviewing the stable numerical algorithm known for
N = 2 [19, 38].
5.2.1 A Stable GSVD Algorithm
Suppose we have two real matrices D1 ∈ Rm1×n and D2 ∈ Rm2×n. Let[D1
D2
]=
[Q1
Q2
]R (5.1)
be the QR factorization with Q1 ∈ Rm1×n, D2 ∈ Rm2×n, and R ∈ Rn×n. The
SVD’s of Q1 and Q2 are highly related as
Q1 = UCW T , Q2 = Y SW T , (5.2)
where U , Y , and W are orthogonal and C = diag(ci), S = diag(si), and
CTC + STS = I [19, 38]. From Equations (5.1)–(5.2), we have the GSVD
D1 = Q1R = UC(W TR), (5.3)
D2 = Q2R = Y S(W TR). (5.4)
Note that our right basis vectors V = W TR, ignoring the scaling of the
columns. The highly related SVD’s of Q1 and Q2 follows from the CS de-
composition [19, 38].
5.2.2 Extending this Algorithm to N > 2 Matrices
Suppose we have a set of N real matrices Di ∈ Rmi×n. Let D1...DN
=
Q1...QN
R (5.5)
70
![Page 86: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/86.jpg)
be the QR factorization with Qi ∈ Rmi×n, and R ∈ Rn×n. To extend the
algorithm of Section 5.2.1, we need to investigate the relation between the
SVD’s of Qi, for i = (1, . . . , N), in the block partitioning of Equation (5.5).
Remark 5.2.1. The CS decomposition is defined on a 2-by-2 block partition-
ing of an n-by-n orthogonal matrix and shows that the SVD’s of the blocks
are highly related. We may now need to generalize the CS decomposition to
establish the relation between the SVD’s of a block partitioning as in Equation
(5.5).
Conjecture 2. If an eigenvalue of S satisfies λk = 1, the corresponding kth
right singular vectors of Qi for i = (1, . . . , N) are identical and we can deter-
mine the corresponding right basis vector vk = wTkR in a stable way.
Recall Remark 4.3.5 that observes that if an eigenvalue of S satisfies
λk = 1, we can compute the corresponding eigenvector vk from the Di directly
without forming the products Ai = DTi Di (and without the need for computing
any inverse). Computing the eigenvalues of S, however, requires having to
compute A−1i . It would be ideal to find an alternative condition to check,
rather than having to check if an eigenvalue of S satisfies λk = 1, that does
not require forming (and inverting) Ai.
To summarize, finding a stable numerical algorithm for our HO GSVD
is important. To accomplish this, we need to investigate the relation between
the SVD’s of Qi for i = (1, . . . , N) in the block partitioning of Equation
(5.5). This may require generalizing the CS decomposition for such a block
partitioning. Also, investigating Conjecture 2 and finding an alternative to
checking if an eigenvalue of S satisfies λk = 1 may be required.
71
![Page 87: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/87.jpg)
5.3 Summary of HO GSVD for Comparison of GlobalmRNA Expression Datasets from Three DifferentOrganisms
The second part of this thesis illustrates the HO GSVD mathematical
framework with a comparison of global cell-cycle mRNA expression from the
three disparate organisms S. pombe, S. cerevisiae and human, without requir-
ing, unlike existing algorithms, the mapping of genes of one organism onto
those of another based on sequence similarity.
The datasets are tabulated as matrices of n = 17 columns each, corre-
sponding to DNA microarray-measured mRNA expression from each organism
at 17 time points equally spaced during approximately two cell-cycle periods.
The underlying assumption is that there exists a one-to-one mapping among
the 17 columns of the three matrices but not necessarily among their rows,
which correspond to either m1 = 3167-S. pombe genes, m2 = 4772-S. cerevisiae
genes or m3 = 13,068-human genes.
Our HO GSVD of Equation (2.16) transforms the datasets from the
organism-specific genes × 17-arrays spaces to the reduced 17-genelets × 17-
arraylets spaces, where the datasetsDi are represented by the diagonal nonneg-
ative matrices Σi, by using the organism-specific genes × 17-arraylets transfor-
mation matrices Ui and the one shared 17-genelets × 17-arrays transformation
matrix V T (Fig. 3.1 and Fig. 3.2). Recall that in the application of this HO
GSVD to a comparison of global mRNA expression from N ≥ 2 organisms,
the right basis vectors are the genelets and the N sets of left basis vectors are
the N sets of arraylets.
In analogy with the GSVD [5], a common HO GSVD genelet and the
N = 3 corresponding arraylets, which are approximately orthonormal to all
72
![Page 88: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/88.jpg)
other arraylets in the corresponding datasets (Corollary 2.3.5), are inferred to
represent a biological process common to the three organisms and the corre-
sponding cellular states when a consistent biological or experimental theme is
reflected in the interpretations of the patterns of the genelet and the arraylets.
We find that the approximately common HO GSVD subspace represents cell-
cycle mRNA expression in the three disparate organisms (Fig. 3.2 and Table
3.1). Simultaneous reconstruction and classification of the three datasets in the
common subspace that these patterns span outlines the biological similarity
in the regulation of their cell-cycle programs.
Using HOGSVD, we were able to correctly classify genes of highly con-
served sequences but significantly different cell-cycle peak times [7, 46], such
as genes from the ABC transporter superfamily [14, 20, 28, 29, 32, 54, 56], phos-
pholipase B [13, 26]- and B cyclin [15, 31]-encoding genes from the three or-
ganisms.
This HO GSVD can be used to identify genes of common function
across different organisms independently of the sequence similarity among
these genes, and to study, e.g., nonorthologous gene displacement [33]. This
HO GSVD can also be used to identify homologous genes, of similar DNA
or protein sequences in one organism or across multiple organisms, that have
different cellular functions.
The GSVD formulation of Alter et al. has inspired several new appli-
cations of the GSVD (Remark 1.1.1). Our new HO GSVD will enable all of
these applications to compare and integrate N ≥ 2 datasets.
73
![Page 89: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/89.jpg)
5.4 Future Direction: Additional Applications of theHO GSVD
We have illustrated the HO GSVD comparative mathematical frame-
work in one important example, but it is applicable, and is expected to be
applied, in a huge range of research fields. We now describe a second ap-
plication of HO GSVD for the comparative and integrative analysis of The
Cancer Genome Atlas (TCGA) data by Cheng H. Lee of the Genomic Signal
Processing Lab at University of Texas at Austin [24, 25].
The Cancer Genome Atlas (TCGA) is a multi-institution effort aimed
at using integrated analysis of high-throughput sequencing and microarray
data to identify key genomic changes contributing to the development and
progression of human cancers. The first cancer studied by TCGA is glioblas-
toma multiforme (GBM), the most common type of primary brain tumor in
adults which is characterized by aggressive growth and treatment resistance,
with median survival times of approximately 12 months after initial diagnosis
[17]. Previous studies, including preliminary analyses by TCGA, have identi-
fied numerous aberrations in gene expression, DNA methylation status, DNA
copy number, chromosome structure, and nucleotide sequence associated with
the pathogenesis and progression of GBM [17, 34]. However, a great deal of
work remains in developing a more comprehensive and global view of the inter-
actions among these genomic aberrations and how such interactions contribute
to clinically relevant phenotypes and outcomes.
Lee and Alter are using HO GSVD to compare and integrate TCGA mi-
croarray measurements of mRNA expression, microRNA (miRNA) expression,
DNA methylation status, and DNA copy number alterations data to develop
a more comprehensive view of the complex interactions among the genomic
74
![Page 90: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/90.jpg)
aberrations present in GBM. For example, they might identify how deletions
of certain genomic regions lead to the loss of key miRNAs and the subsequent
overexpression of the miRNAs oncogenic targets [24, 25]. Such analyses will
provide a better understanding of the cascade of genomic events involved in
the development and clinical progression of GBM.
5.5 Summary of Further Investigation of the Mathe-matical Properties of the HO GSVD
The third part of this thesis exploits the structure of the representation
of S as
S =(A1 + . . .+ AN)(A−1
1 + . . .+ A−1N )−NI
N(N − 1),
in the presence of properties such as symmetry and definiteness, to further
investigate the mathematical properties of the HO GSVD and remove our
previous assumptions from Chapter 1: Specifically, that the eigensystem of S
exists and that V is real and not severely ill-conditioned.
We proved that the eigenvalues of S satisfy λk ≥ 1. We then proved
that when an eigenvalue of S satisfies λk = 1, the corresponding eigenvector vk
is an eigenvector of sums of pairwise quotients
(N∑
i=1
Ai
)A−1
j , that it satisfies
vTk A−1j vk ≥ σ−2
j for j = 1, . . . , N and that we can compute the corresponding
eigenvector vk from the Di without the need for computing any inverse.
Finally, we noted that when an eigenvalue of S satisfies λk = 1, the kth
left basis vectors uj,k are orthonormal to all other left basis vectors in Uj for all
j if and only if θj,k = σ−2j . In fact, from our numerical experiments, we have
observed that when an eigenvalue of S satisfies λk = 1, the left basis vectors
75
![Page 91: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/91.jpg)
ui,k are orthogonal to all other left basis vectors in Ui for all i. We have not
been able to prove this mathematically and make the following conjecture.
5.6 Future Direction: Orthogonality of the Left BasisVectors Ui
Conjecture 3. If an eigenvalue of S satisfies λk = 1, the left basis vectors
ui,k are orthogonal to all other left basis vectors in Ui for all i.
We conjecture that in analogy with Theorem 4.2.4, when an eigenvalue
of S satisfies λk = 1, the corresponding eigenvector vk satisfies vTk A−1j vk = σ−2
j
for j = 1, . . . , N , the kth eigenvalues of (UTj Uj)
−1 are at their minimum and
satisfy θj,k = σ−2j , and the kth left basis vectors uj,k are orthonormal to all
other left basis vectors in Uj, for all j.
We now ask: why is orthogonality important? are genetic networks
linear and orthogonal?
5.7 Are Genetic Networks Linear and Orthogonal?
The SVD model [4], the GSVD model [5], and the common subspace
(see Corollary 2.3.5) of the HO GSVD model, are all mathematically linear
and orthogonal. These models formulate genome-scale biological signals as
linear superpositions of mathematical patterns, which correlate with activities
of cellular elements that drive the measured signal, and cellular states where
these elements are active. These models associate the independent cellular
states with orthogonal, i.e., decorrelated, mathematical profiles suggesting that
the overlap or crosstalk between the genome-scale effects of the corresponding
cellular elements or modules is negligible [2].
76
![Page 92: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/92.jpg)
Orthogonality of the cellular states that compose a genetic network
suggests an efficient network design. With no redundant functionality in the
activities of the independent cellular elements, the number of such elements
needed to carry out a given set of biological processes is minimized. An ef-
ficient network, however, is fragile. The robustness of biological systems to
diverse perturbations suggests functional redundancy in the activities of the
cellular elements, and therefore also correlations among the corresponding cel-
lular states. Trade-offs between efficiency and robustness might explain the
behavior of complex biological systems [12].
Linearity of a genetic network may seem counterintuitive in light of the
non-linearity of the chemical processes, which underlie the network. However,
even if the kinetics of biochemical reactions are nonlinear, the mass balance
constraints that govern these reactions are linear [48].
Although it is still an open question whether genetic networks are linear
and orthogonal, linear and orthogonal mathematical frameworks have already
proven successful in describing the physical world, in such diverse areas as
mechanics and perception. It may not be surprising, therefore, that linear
and orthogonal mathematical models for genome-scale biological signals (1)
provide mathematical descriptions of the genetic networks that generate and
sense the measured data, where the mathematical variables and operations
represent biological or experimental reality; (2) elucidate the design principles
of cellular systems as well as guide the design of synthetic ones; and (3) predict
previously unknown biological principles.
These models may become the foundation of a future in which biological
systems are modeled as physical systems are today [3].
77
![Page 93: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/93.jpg)
Appendices
78
![Page 94: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/94.jpg)
Appendix A
Global mRNA Expression Datasets
A generally accepted definition of “life” for a cell is that a cell is an
enclosed volume capable of reproducing itself and maintaining itself at a high
energy state relative to its exterior [1]. The machines that a living cell uses
to perform both of these tasks are called proteins. All living cells store the
blueprints for the proteins it may need in the course of living in the form of
double-stranded molecules of deoxyribonucleic acid (DNA) [53].
Genes are segments of this DNA that encode for proteins. Genes are
translated from the DNA code in the genome into functional proteins via an
intermediate messenger called mRNA (messenger RNA) (Fig. A). The amount
of a particular protein’s mRNA that is present in a cell at a given time indicates
that rate at which the protein is being manufactured. Cells have intricate con-
trol systems which increase (upregulate) or decrease (downregulate) a protein’s
mRNA expression levels in response to environmental or internal stimuli.
The Northern Blot is a traditional lab technique which allows the sep-
aration and monitoring of a single type of mRNA molecule [8]. Typical exper-
iments using the Northern Blot involve exposing a culture of cells to various
environmental stimuli, and monitoring the one mRNA with successive blots.
Statistically significant changes in response to particular stimuli give some
indication as to the function of the observed gene’s destination protein.
Of course, even the smallest living organism has thousands of proteins,
79
![Page 95: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/95.jpg)
NucleusDNA
messenger RNA
Transcription
Translation
↑Protein
(mRNA)
AAT
T CG
TA
↑
Figure A.1: A cell is an enclosed volume capable of reproducing itself andmaintaining itself at a high energy state relative to its exterior [1]. The ma-chines that a living cell uses to perform both of these tasks are called proteins.All living cells store the blueprints for the proteins it may need in the courseof living in the form of double-stranded molecules of deoxyribonucleic acid(DNA) [53]. Genes are segments of this DNA that encode for proteins. Genesare translated from the DNA code in the genome into functional proteins viaan intermediate messenger called mRNA (messenger RNA). The amount of aparticular protein’s mRNA that is present in a cell at a given time indicatesthat rate at which the protein is being manufactured.
80
![Page 96: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/96.jpg)
and therefore thousands of genes and thousands of mRNA. Monitoring each
mRNA individually by Northern Blot experiments is infeasible. In the mid
1990’s, as micro-manufacturing technology became available, Patrick Brown’s
lab at Stanford University devised a way of performing Northern Blots on
all of an organism’s genes simultaneously. This assay is called the “DNA
microarray” [11], and has led to a revolution in cellular biology.
Individual microarray assays are a snapshot of the mRNA activity in
a cell under a given set of conditions. The power of microarray assays comes
from varying the conditions, and observing how all the genes are regulated
in response to the changes. The tight connection between the function of a
gene and its expression pattern make the mRNA expression data generated
by DNA microarrays invaluable.
81
![Page 97: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/97.jpg)
Appendix B
Matrix Representation of an mRNA
Expression Dataset
A single DNA microarray monitors the mRNA expression levels of m
genes of an organism in a single experiment. A sequence of n such microar-
rays measure the expression levels of the set of m genes across n different
experimental conditions. The full mRNA expression data set can therefore be
represented as an m × n matrix D (Fig. B). The ith row of matrix D repre-
sents the mRNA expression of the ith gene across the n different experiments.
The jth column of D represents the mRNA expression of all m genes in the
jth experiment.
82
![Page 98: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/98.jpg)
Time = 20 min
3.51.62.7 . . .
2.3
Column 5
1.40.62.0 . . .
0.8
Gene 1Gene 2Gene 3 . . .
Gene m
3.40.21.7 . . .
1.1
0.92.60.1 . . .
3.2
2.14.53.6 . . .
1.8
3.51.62.7 . . .
2.3
0 5 10 15 20
1.40.62.0 . . .
0.8
Column 1
Time = 0 min Time = 5 min
3.40.21.7 . . .
1.1
Column 2
Time = 10 min
0.92.60.1 . . .
3.2
Column 3
Time = 15 min
2.14.53.6 . . .
1.8
Column 4
Gene 1Gene 2Gene 3 . . .
Gene m
Gene 1Gene 2Gene 3 . . .
Gene m
Gene 1Gene 2Gene 3 . . .
Gene m
Gene 1Gene 2Gene 3 . . .
Gene m
Gene 1Gene 2Gene 3 . . .
Gene m
Figure B.1: A single DNA microarray monitors the mRNA expression levelsof m genes of an organism in a single experiment. A sequence of n = 5such microarrays measure the expression levels of the set of m genes across5 time points (5min intervals for 20min).The full mRNA expression data setcan therefore be represented as an m × 5 matrix D. The ith row of matrixD represents the mRNA expression of the ith gene across the 5 different timepoints. The jth column of D represents the mRNA expression of all m genesin the jth time point.
83
![Page 99: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/99.jpg)
Appendix C
The Cell-Cycle
The cell-cycle is an orderly sequence of events by which a cell repro-
duces. It is composed of four primary phases (Fig. C ):
• G1 phase - A gap phase for cell growth,
• S phase - DNA synthesis occurs during this phase,
• G2 phase - A gap phase for cell growth,
• M phase - Mitosis with cell division occurs during this phase.
Remark C.0.1. Genes involved in the cell-cycle are characterized by period-
ically varying mRNA expression levels. Consider the S. cerevisiae cell-cycle
gene SNQ2. It is classified, using wet lab experiments, as an S/G2 cell-cycle
phase gene [50]: its expression is known to peak at the S/G2 cell-cycle phases.
Thus, SNQ2 gene gets “turned on” (or becomes active) in the S/G2 cell-cycle
phase and gets“turned off” upon exiting that phase. Thus, in the analysis of
global mRNA expression data of an organism during the cell-cycle, oscillating
patterns of expression are observed [4].
84
![Page 100: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/100.jpg)
The Cell-cycleOrderly sequence of events by which a cell reproduces.
Primary Cellular States: M, G1, S, G2The cell-cycle out of control --> Excessive cell divisions --> Cancer
G2M
G1 S
Gap for growthMitosis with cell division
Gap for growth DNA synthesis
Figure C.1: The cell-cycle: an orderly sequence of events by which a cellreproduces. It is composed of the above four primary cellular states or phases.
85
![Page 101: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/101.jpg)
Appendix D
De Lathauwer’s Generalized GSVD
De Lathauwer [22] defined an iterative approximation algorithm called
the generalized GSVD which preserves the orthogonality instead of the ex-
actness of the matrix GSVD. Given N > 2 real matrices Di ∈ Rmi×n, with
mi � n for all i, find an approximate decomposition of the form
D1 ≈ U1Σ1VT ,
D2 ≈ U2Σ2VT ,
...
DN ≈ UNΣNVT ,
(D.1)
where each Ui ∈ Rmi×n is composed of orthonormal columns, each Σi =
diag(σi,k) ∈ Rn×n is diagonal with σi,k > 0, and V is identical in all N ma-
trix factorizations. This is an overdetermined problem (more equations than
unknowns). A least-squares based solution is proposed where the function
f =N∑
i=1
∣∣∣∣Di − UiΣiVT∣∣∣∣2
F(D.2)
is to be minimized.
D.1 Alternating Least Squares Algorithm
The solution is computed by means of an Alternating least squares
(ALS) algorithm where we update {Ui} for estimates of {Σi} and {V }, update
86
![Page 102: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/102.jpg)
{Σi} and {V } for estimates of {Ui}, and repeat these until convergence. As
each step minimizes the cost function f , convergence to a local minimum is
guaranteed. Multiple initializations may be required to find the global mini-
mum.
D.2 Updating {Ui}
The optimal Ui is
Ui = PiQTi ,
where the SVD of DiV Σi is given by PiΣiQTi for all i.
D.3 Updating {Di} and {V }
Let Bj ∈ RN×n be the matrix in which the jth rows of UTi Di are stacked
for i = 1, . . . , N . Define ej = (Σ1(j, j), . . . ,ΣN(j, j)) and let vj denote the jth
column of V . Then, ej is the dominant left singular vector of Bj, and vj is
the dominant right singular vector times the dominant singular value of Bj,
for all j.
87
![Page 103: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/103.jpg)
Appendix E
Mathematica Notebook and Datasets
The Mathematica Notebook and supplementary datasets described be-
low are available upon request.
E.1 Mathematica Notebook
(a) A Mathematica 5.2 code file, executable by Mathematica 5.2 and
readable by Mathematica Player, freely available at http://www.wolfram.com/products/player/.
(b) A PDF format file, readable by Adobe Acrobat Reader.
E.2 Supplementary Dataset 1
A tab-delimited text format file, readable by both Mathematica and
Microsoft Excel, reproducing the relative mRNA expression levels of m1=3167
S. pombe gene clones at n=17 time points during about two cell-cycle periods
from Rustici et al. [47] with the cell-cycle classifications of Rustici et al. or
Oliva et al. [35].
E.3 Supplementary Dataset 2
A tab-delimited text format file, readable by both Mathematica and
Microsoft Excel, reproducing the relative mRNA expression levels of m2=4772
88
![Page 104: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/104.jpg)
S. cerevisiae open reading frames (ORFs), or genes, at n=17 time points during
about two cell-cycle periods, including cell-cycle classifications, from Spellman
et al. [50].
E.4 Supplementary Dataset 3
A tab-delimited text format file, readable by both Mathematica and Mi-
crosoft Excel, reproducing the relative mRNA expression levels of m3=13,068
human genes at n=17 time points during about two cell-cycle periods, includ-
ing cell-cycle classifications, from Whitfield et al. [55].
89
![Page 105: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/105.jpg)
Appendix F
Poincare’s Inequality and Weyl’s Inequalities
In this appendix, we review Poincare’s Inequality and Weyl’s Inequali-
ties that we use in the proof of Theorem 2.3.2 from Chapter 4 [10]. We consider
finite-dimensional vector spaces over the field of real numbers with an inner
product between the vectors u, v denoted as 〈u, v〉. We always assume that
the inner product is definite, i.e., 〈u, v〉 = 0 if and only if u = 0. Recall that
such vector spaces are finite-dimensional Hilbert spaces which we denote by
symbols H,M etc.
F.1 Notation
Let x = (x1, . . . , xn) ∈ Rn. Let x↑ and x↓ be the vectors obtained by
rearranging the coordinates of x in increasing and decreasing orders, respec-
tively. Thus, if x↓ = (x↓1, . . . , x↓n), then x↓1,≥ . . . ≥ x↓n. Note that
x↑j = x↓n−j+1, 1 ≤ j ≤ n. (F.1)
The vector λ↓(A) = (λ↓1(A), . . . , λ↓n(A)) has as components the eigen-
values of A arranged in decreasing order. Thus, from Equation (F.1)
λ↑j(A) = λ↓n−j+1(A), 1 ≤ j ≤ n. (F.2)
90
![Page 106: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/106.jpg)
F.2 Poincare’s Inequality
Theorem F.2.1. Let A be a symmetric operator on H and let M be any
k-dimensional subspace of H. Then, ∃ unit vectors x, y ∈M such that
〈x,Ax〉 ≤ λ↓k(A)
and
〈y, Ay〉 ≥ λ↑k(A).
Proof. Let N be the subspace spanned by the eigenvectors uk, . . . , un of A that
correspond to the eigenvalues λ↓k(A), . . . , λ↓n(A). Since
dim M + dim N = n+ 1, (F.3)
the intersection of M and N is nontrivial. Consider a unit vector x ∈M ∩N.
Since x ∈ N, we can write x =n∑
j=k
αjuj, wheren∑
j=k
|αj|2 = 1 so that
〈x,Ax〉 = xTAx
=n∑
j=k
|αj|2uTj Auj
=n∑
j=k
|αj|2λ↓j(A)
≤n∑
j=k
|αj|2λ↓k(A)
= λ↓k(A). (F.4)
Note that we used the fact that λ↓k(A) ≥ λ↓j(A) for j = k, . . . , n. This proves
the first inequality. The second inequality can be proved by replacing A with
−A in the above argument and using the fact that λ↓j(−A) = λ↑j(A).
91
![Page 107: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/107.jpg)
F.3 Weyl’s Inequalites
Theorem F.3.1. Let A,B ∈ Rn×n be symmetric matrices. Then,
λ↓j(A+B) ≤ λ↓i (A) + λ↓j−i+1(B), for i ≤ j, (F.5)
λ↓j(A+B) ≥ λ↓i (A) + λ↓j−i+n(B), for i ≥ j. (F.6)
Proof. Let uj, vj, and wj denote the eigenvectors of A, B, and A + B re-
spectively, corresponding to their eigenvalues in decreasing order. Let i ≤ j.
Consider the three subspaces spanned by {ui, . . . , un}, {vj−i+1, . . . , vn}, and
{w1, . . . , wj}. These have dimensions n− i + 1, n− j + i, and j respectively,
and hence have a nontrivial intersection. Let x be a unit vector in their inter-
section. From Theorem F.2.1 and Equation (F.2)
λ↑1(A+B) = λ↓j(A+B) ≤ 〈x, (A+B)x〉
= 〈x,Ax〉+ 〈x,Bx〉
≤ λ↓i (A) + λ↓j−i+1(B) (F.7)
This proves the first inequality. The second inequality can be proved by replac-
ing A and B in the first inequality with −A and −B and using the relations
λ↓j(−A) = λ↑j(A) and λ↓j(−A) = −λ↓n−j+1(A). Rewriting Equation (F.7) with
i ≥ j and replacing A and B with −A and −B respectively, we have
λ↓i (−(A+B)) ≤ λ↓j(−A) + λ↓i−j+1(−B),
−λ↓n−i+1(A+B) ≤ λ↑j(A) + λ↑i−j+1(B),
λ↓n−i+1(A+B) ≥ λ↓n−j+1(A) + λ↓j−i+n(B),
λ↓j(A+B) ≥ λ↓
i(A) + λ↓
j−i+n(B), i ≥ j, (F.8)
where j = n− i+ 1, i = n− j + 1 and j − i+ n = j − i+ n with i ≥ j.
92
![Page 108: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/108.jpg)
Bibliography
[1] B. Alberts, D. Bray, J. Lewis, M. Raff, K. Roberts, and J. D. Watson.
Molecular Biology of the Cell. Garland Publishing, New York, 3rd edi-
tion, 1994.
[2] O. Alter. Genomic signal processing: From matrix algebra to genetic net-
works. In M. J. Korenberg, editor, Microarray Data Analysis: Methods
and Applications, volume 377. Humana Press Inc., Totowa, NJ.
[3] O. Alter. Discovery of principles of nature from mathematical modeling
of DNA microarray data. Proc Natl Acad Sci USA, 103:16063–16064,
2006.
[4] O. Alter, P. O. Brown, and D. Botstein. Singular value decomposition for
genome-wide expression data processing and modeling. Proc Natl Acad
Sci USA, 97:10101–10106, 2000.
[5] O. Alter, P. O. Brown, and D. Botstein. Generalized singular value
decomposition for comparative analysis of genome-scale expression data
sets of two different organisms. Proc Natl Acad Sci USA, 100:3351–3356,
2003.
[6] O. Alter and G. H. Golub. Reconstructing the pathways of a cellular
system from genome-scale signals using matrix and tensor computations.
Proc Natl Acad Sci USA, 102:17559–17564, 2005.
93
![Page 109: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/109.jpg)
[7] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman.
Basic local alignment search tool. J Mol Biol, 215:403–410, 1990.
[8] J. C. Alwine, D. J. Kemp, and G. R. Stark. Method for detection of
specific RNAs in agarose gels by transfer to diazobenzyloxymethyl-paper
and hybridization with DNA probes. Proc Natl Acad Sci USA, 74:5350–
5354, 1977.
[9] J. A. Berger, S. Hautaniemi, S. K. Mikra, and J. Astola. Jointly ana-
lyzing gene expression and copy number data in breast cancer using data
reduction models. IEEE/ACM Transactions on Computational Biology
and Bioinformatics, 3:2–16, 2006.
[10] R. Bhatia. Matrix Analysis. Springer-Verlag, New York, 1996.
[11] P. O. Brown and D. Botstein. Exploring the new world of the genome
with DNA microarrays. Nature Genetics, 21:33–37, 1999.
[12] J. M. Carlson and J. Doyle. Highly optimized tolerance: a mechanism
for power laws in designed systems. Phys Rev E, 60:1414–1427, 1999.
[13] R. J. Cho, M. J. Campbell, E. A. Winzeler, L. Steinmetz, A. Conway,
L. Wodicka, T. G. Wolfsberg, A. E. Gabrielian, D. Landsman, D. J. Lock-
hart, and R. W. Davis. A genome-wide transcriptional analysis of the
mitotic cell cycle. Mol Cell, 2:65–73, 1998.
[14] A. Decottignies and A. Goffeau. Complete inventory of the yeast ABC
proteins. Nat Genet, 15:137–145, 1997.
94
![Page 110: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/110.jpg)
[15] D. L. Fisher and P. Nurse. A single fission yeast mitotic cyclin B p34cdc2
kinase promotes both S-phase and mitosis in the absence of G1 cyclins.
EMBO J, 15:850–860, 1996.
[16] S. Friedland. A new approach to generalized singular value decomposi-
tion. SIAM J Matrix Anal Appl, 27:434–444, 2005.
[17] F. B. Furnari, T. Fenton, R. M. Bachoo, A. Mukasa, J. M. Stommel,
A. Stegh, W. C. Hahn, K. L. Ligon, D. N. Louis, C. Brennan, L. Chin,
R. A. DePinho, and W.K. Cavenee. Malignant astrocytic glioma: genet-
ics, biology, and paths to treatment. Genes Dev, 21:2683–2710, 2007.
[18] N. P. Gauthier, M. E. Larsen, R. Wernersson, S. Brunak, and T. S.
Jensen. Cyclebase.org: version 2.0, an updated comprehensive, multi-
species repository of cell cycle experiments and derived analysis results.
Nucleic Acids Res, 38:D699–D702, 2010.
[19] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins
University Press, Maryland, 3rd edition, 1996.
[20] O. Hlavacek, H. Kucerova, K. Harant, Z. Palkova, and L. Vachova. Pu-
tative role for ABC multidrug exporters in yeast quorum sensing. FEBS
Lett, 583:1107–1113, 2009.
[21] L. J. Jensen, T. S. Jensen, U. de Lichtenberg, S. Brunak, and P. Bork. Co-
evolution of transcriptional and post-translational cell-cycle regulation.
Nature, 443:594–597, 2006.
[22] L. De Lathauwer. An extension of the generalized SVD for more than two
matrices. Technical Report 09-206, ESAT/SISTA, K.U.Leuven, 2009.
95
![Page 111: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/111.jpg)
[23] L. De Lathauwer, B. De Moor, and J. Vandewalle. A multilinear singular
value decomposition. SIAM J Matrix Anal Appl, 21:1253–1278, 2000.
[24] C. H. Lee and O. Alter. GSVD comparison of genome-wide array CGH
data from patient-matched normal and tumor TCGA samples reveals
known and novel copy number alterations in GBM, contributed poster.
Honolulu, HI, October 2009. 59th Annual Meeting of the American So-
ciety of Human Genetics (ASHG).
[25] C. H. Lee and O. Alter. GSVD comparison of genome-wide array CGH
data from patient-matched normal and tumor TCGA samples reveals
known and novel copy number alterations in GBM, contributed poster.
Hyderabad, India, January 2010. Rao Conference at the Interface Be-
tween Statistics and the Sciences.
[26] K. S. Lee, J. L. Patton, K. Harant, M. Fido, L. K. Hines, S. D. Kohlwein,
F. Paltauf, S. A. Henry, and D. E. Levin. The saccharomyces cerevisiae
PLB1 gene encodes a protein required for lysophospholipase and phos-
pholipase B activity. J Biol Chem, 269:19725–19730, 1994.
[27] C. F. Van Loan. Generalizing the singular value decomposition. SIAM
J Numer Anal, 13:76–83, 1976.
[28] Y. Mahe, A. Parle-McDermott, A. Nourani, A. Delahodde, A. Lamprecht,
and K. Kuchler. The ATP-binding cassette multidrug transporter Snq2
of saccharomyces cerevisiae: A novel target for the transcription factors
Pdr1 and Pdr3. Mol Microbiol, 20:109–117, 1996.
[29] Y. M. Mamnun, C. Schuller, and K. Kuchler. Expression regulation of
the yeast PDR5 ATP-binding cassette (ABC) transporter suggests a role
96
![Page 112: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/112.jpg)
in cellular detoxification during the exponential growth phase. FEBS
Lett, 559:111–117, 2004.
[30] A. W. Marshall and L. Olkin. Matrix versions of the Cauchy and Kan-
torovich inequalities. Aequationes Mathematicae, 40:89–93, 1990.
[31] C. Martin-Castellanos, K. Labib, and S. Moreno. B-type cyclins regulate
G1 progression in fission yeast in opposition to the p25rum1 cdk inhibitor.
EMBO J, 15:839–849, 1996.
[32] S. Meyers, W. Schauer, E. Balzi, M. Wagner, A. Goffeau, and J. Golin.
Interaction of the yeast pleiotropic drug resistance genes PDR1 and PDR5.
Curr Genet, 21:431–436, 1992.
[33] A. R. Mushegian and E. V. Koonin. A minimal gene set for cellular life
derived by comparison of complete bacterial genomes. Proc Natl Acad
Sci USA, 93:10268–10273, 1996.
[34] The Cancer Genome Atlas Research Network. Comprehensive genomic
characterization defines human glioblastoma genes and core pathways.
Nature, 455:1061–1068, 2008.
[35] A. Oliva, A. Rosebrock, F. Ferrezuelo, S. Pyne, H. Chen, S. Skiena,
B. Futcher, and J. Leatherwood. The cell cycle-regulated genes of Schizosac-
charomyces pombe. PLoS Biol, 3:1239–1260, 2005.
[36] L. Omberg, G. H. Golub, and O. Alter. Tensor higher-order singular
value decomposition for integrative analysis of DNA microarray data from
different studies. Proc Natl Acad Sci USA, 104:18371–18376, 2007.
97
![Page 113: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/113.jpg)
[37] L. Omberg, J. R. Meyerson, K. Kobayashi, L. S. Drury, J. F. Diffley, and
O. Alter. Global effects of DNA replication and DNA replication origin
activity on eukaryotic gene expression. Mol Syst Biol, 5:312, 2009.
[38] C. C. Paige and M. A. Saunders. Towards a generalized singular value
decomposition. SIAM J Numer Ana, 18:398–405, 1981.
[39] S. P. Ponnapalli, G. H. Golub, and O. Alter. A novel higher-order gener-
alized singular value decomposition for comparative analysis of multiple
genome-scale datasets, contributed poster. Stanford, California, June
2006. Stanford/Yahoo! Workshop on Algorithms for Modern Massive
Data Sets.
[40] S. P. Ponnapalli, G. H. Golub, and O. Alter. A novel higher-order gener-
alized singular value decomposition for comparative analysis of multiple
genome-scale datasets from different organisms. Bremen, Germany, July
2007. Heraeus International Summer School on Statistical Physics of
Gene Regulation From Networks to Expression Data and Back.
[41] S. P. Ponnapalli, M. A. Saunders, and O. Alter. A higher-order GSVD
for comparison of global mrna expression from multiple organisms. Sub-
mitted.
[42] S. P. Ponnapalli, M. A. Saunders, and O. Alter. A higher-order gener-
alized SVD for comparative analysis of genome-scale data from multiple
organisms, contributed poster. Saint Louis, Missouri, October 2008. 2008
BMES Annual Fall Meeting.
[43] S. P. Ponnapalli, M. A. Saunders, and O. Alter. Higher-order generalized
singular value decomposition for comparative analysis of genome-scale
98
![Page 114: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/114.jpg)
data from multiple organisms. Hyderabad, India, January 2010. Rao
Conference at the Interface Between Statistics and the Sciences.
[44] S. P. Ponnapalli, M. A. Saunders, G. H. Golub, and O. Alter. A higher-
order generalized singular value decomposition for comparative analysis of
large-scale datasets. Hong Kong, April 2008. Croucher Advanced Study
Institute on Mathematical and Algorithmic Challenges for Modeling &
Analyzing Modern Datasets.
[45] S. P. Ponnapalli, M. A. Saunders, G. H. Golub, and O. Alter. A novel
higher-order generalized singular value decomposition for comparative
analysis of DNA microarray data from different organisms. San Diego,
California, July 2008. 2008 SIAM Annual Meeting.
[46] K. D. Pruitt, T. Tatusova, and D. R. Maglott. NCBI reference sequences
(refseq): a curated non-redundant sequence database of genomes, tran-
scripts and proteins. Nucleic Acids Res, 35:D61–D65, 2007.
[47] G. Rustici, J. Mata, K. Kivinen, P. Lio, C. J. Penkett, G. Burns, J. Hayles,
A. Brazma, P. Nurse, and J. Bahler. Periodic gene expression program
of the fission yeast cell cycle. Nat Genet, 36:809–817, 2004.
[48] C. H. Schilling and B. O. Palsson. The underlying pathway structure of
biochemical reaction networks. Proc Natl Acad Sci USA, 95:6163–6168,
1998.
[49] A. W. Schreiber, N. J. Shirley, R. A. Burton, and G. B. Fincher. Com-
bining transcriptional datasets using the generalized singular value de-
composition. BMC Bioinformatics, 9, 2008.
99
![Page 115: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/115.jpg)
[50] P. T. Spellman, G. Sherlock, M. Q. Zhang, V. R. Iyer, K. Anders, M. B.
Eisen, P. O. Brown, D. Botstein, and B. Futcher. Comprehensive identi-
fication of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae
by microarray hybridization. Mol Biol Cell, 9:3273–3297, 1998.
[51] S. Tavazoie, J. D. Hughes, M. J. Campbell, R. J. Cho, and G. M. Church.
Systematic determination of genetic network architecture. Nat Genet,
22:281–285, 1999.
[52] J. Vandewalle, L. De Lathauwer, and P. Comon. The generalized higher
order singular value decomposition and the oriented signal-to-signal ratios
of pairs of signal tensors and their use in signal processing. Proc European
Conf on Circuit Theory and Design, pages I–389–I–393, 2003.
[53] J. Watson and F. Crick. Molecular structure of nucleic acids; a structure
for deoxyribose nucleic acid. Nature, 171:737–738, 1953.
[54] M. Werner-Washburne, E. Braun, G. C. Johnston, and R. A. Singer.
Stationary phase in the yeast Saccharomyces cerevisiae. Microbiol Mol
Biol Rev, 57:383–401, 1993.
[55] M. L. Whitfield, G. Sherlock, A. Saldanha, J. I. Murray, C. A. Ball, K. E.
Alexander, J. C. Matese, C. M. Perou, M. M. Hurt, P. O. Brown, and
D. Botstein. Identification of genes periodically expressed in the human
cell cycle and their expression in tumors. Mol Biol Cell, 13:1977–2000,
2002.
[56] H. Wolfger, Y. Mahe, A. Parle-McDermott, A. Delahodde, and K. Kuch-
ler. The yeast ATP binding cassette (ABC) protein genes PDR10 and
100
![Page 116: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/116.jpg)
PDR15 are novel targets for the Pdr1 and Pdr3 transcriptional regulators.
FEBS Lett, 418:269–274, 1997.
101
![Page 117: Ponnapalli PhD Thesis](https://reader031.vdocuments.site/reader031/viewer/2022020122/53fa205edab5cad23a8b4e45/html5/thumbnails/117.jpg)
Vita
Sri Priya Ponnapalli is the only child of Uma Devi and Kamalakar
Rao Ponnapalli. She did her schooling at Nasr School, Hyderabad, India and
then completed her Bachelor of Engineering in electronics and communications
engineering from the Chaitanya Bharathi Institue of Technology, Hyderabad
in 2004.
Sri Priya joined Dr. Orly Alter’s Genomic Signal Processing lab in the
fall of 2004 and has since been developing and applying matrix and tensor
decompositions to model and analyze large-scale data, such as from DNA
microarrays. She received a Master of Science degree in electrical and computer
engineering from the University of Texas at Austin in 2005. Upon completing
her Ph.D., Sri Priya will join Bloomberg’s research & development group at
New York City.
Permanent address: 503 Sai Ratnadeep Apts 3-6-190/a HimayatnagarHyderabad, Andhra Pradesh, India 500029
orPlot No. 1129, Rd. No. 36, Jubilee HillsHyderabad, Andhra Pradesh, India 500033email: [email protected]
This dissertation was typeset with LATEX† by the author.
†LATEX is a document preparation system developed by Leslie Lamport as a specialversion of Donald Knuth’s TEX Program.
102