stemness revisited: a meta analysis of stem cell ... · university of california santa cruz...

UNIVERSITY OF CALIFORNIA SANTA CRUZ STEMNESS REVISITED: A META ANALYSIS OF STEM CELL SIGNATURES USING HIGH-THROUGHPUT DATA INTEGRATION A dissertation submitted in partial satisfaction of the requirements for the degree of DOCTOR OF PHILOSOPHY in BIOINFORMATICS by Martina I. Koeva December 2009 The Dissertation of Martina I. Koeva is approved: Professor Joshua Stuart, Chair Professor Camilla Forsberg Professor Kevin Karplus Tyrus Miller Vice Provost and Dean of Graduate Studies


Page 1: STEMNESS REVISITED: A META ANALYSIS OF STEM CELL ... · UNIVERSITY OF CALIFORNIA SANTA CRUZ STEMNESS REVISITED: A META ANALYSIS OF STEM CELL SIGNATURES USING HIGH-THROUGHPUT DATA

UNIVERSITY OF CALIFORNIA

SANTA CRUZ

STEMNESS REVISITED: A META ANALYSIS OF STEM CELL SIGNATURES USING HIGH-THROUGHPUT DATA

INTEGRATION

A dissertation submitted in partial satisfaction of the requirements for the degree of

DOCTOR OF PHILOSOPHY

in

BIOINFORMATICS

by

Martina I. Koeva

December 2009

The Dissertation of Martina I. Koeva is approved:

Professor Joshua Stuart, Chair

Professor Camilla Forsberg

Professor Kevin Karplus

Tyrus Miller
Vice Provost and Dean of Graduate Studies


Copyright © by

Martina I. Koeva

2009


Table of Contents

List of Figures
List of Tables
Abstract
Dedication
Acknowledgments

1 Introduction
  1.1 Motivation and problem statement
  1.2 Goals of the study
  1.3 Overview of main results
  1.4 Organization of the dissertation

2 Stem cells
  2.1 Normal stem cells
    2.1.1 Definitions, origins and key properties of normal stem cells
    2.1.2 Normal stem cell types
    2.1.3 Known functional pathways that regulate normal stem cell behavior
  2.2 Relationship between stem cells and cancer
    2.2.1 Cancer and tumor heterogeneity
    2.2.2 Cancer evolution theories
    2.2.3 Cancer stem cells
    2.2.4 Shared mechanisms between normal and cancer stem cells
    2.2.5 Metastasis and stem cells
  2.3 Side populations
  2.4 High-throughput technologies and stem cells
    2.4.1 High-throughput technologies
    2.4.2 High-throughput data
    2.4.3 Public data repositories


  2.5 Summary

3 Meta-analysis
  3.1 Overview of Meta-analysis
  3.2 Meta-analysis and microarray data
  3.3 Techniques for combining study-specific effects
    3.3.1 Vote counting
    3.3.2 Effect size combination
  3.4 Choice of techniques for the stemness meta-analysis
  3.5 Summary

4 Previous expression-based approaches to stemness
  4.1 Gene-level approaches to stemness
    4.1.1 Founder studies
  4.2 Global-level approaches to stemness
  4.3 Summary

5 Stemness Meta-Analysis Method
  5.1 Stemness Meta-Analysis method overview
  5.2 Input data sources
    5.2.1 Mouse profiling studies
    5.2.2 Human profiling studies
  5.3 Input gene modules
    5.3.1 Homolog gene families (modules)
    5.3.2 Functional gene modules
  5.4 Recurrence scoring
    5.4.1 Notation definitions
    5.4.2 General form of a recurrence score
    5.4.3 Simulation of synthetic module data
    5.4.4 Evaluation and selection of a recurrence score
    5.4.5 Significance of recurrence scores
  5.5 Diversity scoring
    5.5.1 Notation definitions
    5.5.2 Cell type and gene usage diversity
    5.5.3 Significance of diversity scores
  5.6 Specificity scoring
  5.7 Pattern classification
  5.8 Stemness “on” and stemness “off” modules
  5.9 Formulation of stemness index
    5.9.1 Notation definitions
    5.9.2 Cross-validation setup
    5.9.3 Binary switch and fractional scoring
    5.9.4 Log-likelihood approach


  5.10 Summary

6 Stemness mechanisms in mouse stem cells
  6.1 Identification of recurrent modules from mouse dataset compendium
    6.1.1 Recurrent module swap control
    6.1.2 Cultured cell bias
  6.2 Selection of significant cell-diversity scores
  6.3 Classification of modules using diversity and specificity scoring
    6.3.1 Single-gene stemness
    6.3.2 Module-level stemness
  6.4 Stemness modules
    6.4.1 Oncogenes: Myb family
    6.4.2 Tumor suppressor factors: Sfrp family
    6.4.3 NM23 family
    6.4.4 Chaperone roles: Heat shock (Hspa) and importin families
    6.4.5 Lineage-specific gene inhibition: Inhibitor of differentiation/DNA binding (Id) family
  6.5 Comparison with other global stemness methods
  6.6 Differentiation modules
  6.7 Summary

7 Stemness mechanisms in human stem cells
  7.1 Recurrent modules in a human stem cell compendium
  7.2 Cell type diversity assessment and classification of recurrent modules
  7.3 Human stemness modules
    7.3.1 Angiogenesis: FGFR/FLT/PDGFR family
    7.3.2 Heparan sulfate proteoglycans (HSPGs): Glypican family
  7.4 Mammalian stemness modules: Notch, TCF/LEF, Frizzled, Integrin and Chd families
    7.4.1 Cell adhesion and communication: Integrin alpha family
    7.4.2 Wnt pathway: Tcf/LEF and Frizzled families
    7.4.3 Chromatin: Chd/Smarca family
  7.5 Summary

8 Applications of stemness mechanisms to stem cell and cancer classification
  8.1 Motivation and overview
  8.2 Stem cell-like populations
  8.3 What about side populations?
  8.4 Are normal stemness mechanisms conserved in cancer stem cells?
  8.5 Stemness in metastasis
  8.6 Summary


9 Conclusion
  9.1 Discussion
  9.2 Future directions
    9.2.1 Application to human data
    9.2.2 Application to alternative splicing and miRNA data
    9.2.3 Addition of niche data
    9.2.4 Methodological improvements
    9.2.5 Possible biological experiments

A Definitions of key terms

B Tables of stemness modules

Bibliography


List of Figures

2.1 Hierarchical organization of stem cell types within the ectoderm, mesoderm, and endoderm lineages.
2.2 Stem cells can undergo symmetric or asymmetric cell division.
2.3 Hematopoietic stem cell system differentiation tree.
2.4 General scheme of the fluorescence-activated cell-sorting (FACS) flow cytometry technique.
2.5 Wnt pathway activation scheme.
2.6 Notch pathway activation scheme.
2.7 TGF beta pathway example activation scheme.
2.8 Clonal evolution cancer expansion model.
2.9 Cancer stem cell expansion model.
2.10 Illustration of functional self-renewal assay.
2.11 Illustration of the role of Bmi1 in self-renewal.
2.12 A FACS plot for a toy example that illustrates the isolation and definition of a side population.

4.1 Overlap between the genes upregulated in four stem cell types in three stemness founder studies.

5.1 Overview of stemness meta-analysis method.
5.2 Published gene list (PGL) input form to Stemness Meta-Analysis Method (SMA).
5.3 Module-level view of stemness.
5.4 Classification of modules based on cell-diversity, specificity and gene-diversity scores.
5.5 Global inter-experiment similarities in mouse stem cell compendium.
5.6 Global inter-experiment similarities in human stem cell compendium.
5.7 Protein similarity network generation approach used for homolog family definition.
5.8 Possible pitfalls of the homolog module definition methodology.
5.9 Pattern classification procedure used to identify stemness modules.


5.10 Visual illustration of the five-fold cross-validation setup.
5.11 Robustness of recurrence scores across cross-validation folds.
5.12 Precision-recall comparison of four stemness index scores.
5.13 Precision comparison between stemness index scores.
5.14 Precision-recall comparison of twelve stemness index scores, based on different feature types and elements.
5.15 Precision-recall comparison of the real stemness and differentiation features to a randomly selected feature set.
5.16 A box-plot comparison of the stemness indices of all stem cell and differentiated cell experiments in the mouse stem cell compendium.

6.1 Representative recurrence score distribution of modules of size 3.
6.2 Selection of significant recurrently upregulated homolog families in mouse stem cell compendium.
6.3 Distribution of the significant recurrently upregulated modules, based on input (evolutionary or functional) source.
6.4 Swap control of mouse stem cell data.
6.5 Proliferation bias control shows low bias impact of cultured cell data.
6.6 Selection of significant cell-type diversity modules in mouse SMA.
6.7 Global overview of classes of functional and homolog recurrent modules.
6.8 Stem-cell-only expression pattern of a tumor suppressor family - the p53 tumor module.
6.9 Stem-cell-only expression pattern of transcriptional master regulators of differentiation - the Myb gene family.
6.10 Stem-cell-only expression pattern of Rbp gene family.
6.11 Functional enrichment of stemness homolog and functional gene modules.
6.12 Secreted Frizzled-related protein (Sfrp) family expression in stem cells.
6.13 Sfrp1 model of regulation in intestinal stem cells.
6.14 Non-metastatic expressed (Nme) family expression in stem cells.
6.15 Heat shock protein 70 (Hsp70/Hspa) family expression in stem cells.
6.16 Inhibitor of differentiation (Id) family structural elements and functional role.
6.17 Global similarity comparison between homolog modules and GO gene sets/KEGG pathways.
6.18 Comparison of the deviation from recurrence cutoff scores between homolog and Wong et al. modules.
6.19 Upregulation pattern of the ATP-binding cassette, subtype B (Abcb) family of proteins in differentiated and stem cells.

7.1 Fraction of overlap distribution between human and mouse networks.
7.2 Selection of significant recurrently upregulated human homolog families.
7.3 Comparison of the global distributions of homolog family sizes and recurrence scores in human and mouse.


7.4 Correlation between the recurrence cutoffs used for every module size in common between mouse and human.
7.5 Swap analysis of human recurrently upregulated families.
7.6 Cell type recurrence cutoff selection in human SMA.

8.1 Stemness scores for a new mouse prostate stem cell experiment from the literature.
8.2 Stemness scores for new mouse stem cell and differentiated cell experiments collected from the literature.
8.3 Stemness scores for side and non-side populations.
8.4 Stemness scores for cancer stem cell populations and differentiated (non-stem-cell-like) cancer cell populations.
8.5 A modified hematopoietic differentiation tree with L-GMP cells.
8.6 Stemness scores for metastatic and non-metastatic cancer populations.


List of Tables

2.1 Examples of stem cell (LT-HSC) marker signatures in mouse and human.

5.1 List of freshly isolated (primary) mouse stem cell profiling studies used in the mouse stem cell compendium.
5.2 List of cultured mouse stem cell profiling studies used in the mouse stem cell compendium.
5.3 List of cultured human cell profiling studies collected in human stem cell compendium.
5.4 List of non-cultured (primary) human stem cell profiling studies collected in human stem cell compendium.
5.5 Summary of the distribution of mouse homolog (evolutionary) gene modules used in the stemness meta-analysis.
5.6 Summary of the distribution of human homolog (evolutionary) gene modules used as input to the stemness meta-analysis method.
5.7 Summary of the distribution of the mouse functional gene modules used in the stemness meta-analysis.
5.8 Summary of the distribution of human functional gene modules used as input to the stemness meta-analysis method.
5.9 Overview of the recurrence-score-associated notation.
5.10 AUC results for all 13 tested recurrence scoring methods, based on synthetic data evaluation.
5.11 Recurrence score cutoffs for modules of different sizes.
5.12 Overview of the diversity-score-associated notation.
5.13 Overview of the stemness index score notation.
5.14 Number of stemness families identified in each cross-validation (CV) fold.
5.15 Number of differentiation families identified in each cross-validation (CV) fold.

6.1 The 38 stemness genes and significantly enriched functional categories.
6.2 Summary of the classification of recurrently upregulated homolog and functional gene modules.


6.3 List of all mouse differentiation homolog modules.

7.1 Summary of the classification of recurrently upregulated human homolog and functional gene modules.
7.2 List of all human stemness evolutionary and functional gene modules.
7.3 List of all human differentiated evolutionary and functional gene modules.

8.1 Distribution of the sources of mouse test populations used to predict self-renewal capacity.

B.1 List of all mouse stemness homolog AFA modules.
B.2 List of all mouse stemness functional AFA modules.
B.3 List of all mouse stemness homolog OFA modules.
B.4 List of all mouse stemness functional OFA modules.


Abstract

Stemness Revisited: A Meta Analysis of Stem Cell Signatures Using High-Throughput Data Integration

by

Martina I. Koeva

Stem cells are functionally defined cells with a high therapeutic potential for many diseases. The stemness hypothesis states that stem cells share a core set of mechanisms that regulate the shared stem cell properties of self-renewal and multi-lineage potential. Previous attempts to identify genes required for core stem cell function across stem cell types using transcriptional profiling have identified few such genes. My work focused on the development of a computational stemness meta-analysis (SMA) method that uses high-throughput differential gene expression data integration to address three main questions: do functional redundancy and tissue-specific expression mask common molecular mechanisms shared between stem cell types? Are stemness mechanisms conserved between mouse and human stem cells? Can we use gene expression signatures to predict stem cell state?

The SMA method identified 103 mouse evolutionarily related groups of homologous genes with reproducible, statistically significant, cell-type-diverse and stem cell-specific upregulation in multiple stem cell types. The results point to specific examples of functional redundancy in modules controlling cell adhesion, quiescence, and gene silencing. Shared homolog modules also include genes in the Myc, Myb, Chd, Hspa, Id, and many other families. Genes within the stemness homolog families are prime candidate regulators of conserved stemness mechanisms and may play critical roles as stem cell markers.

I directly measured the level of conservation of stemness mechanisms between mouse and human cells. Application of the SMA method to a human stem cell compendium indicates that human data are globally more heterogeneous than murine stem cell data. However, human stemness families incorporate several conserved mammalian stemness modules, such as the Integrin α, TCF/LEF, Frizzled, Notch, and Chd families.

Finally, I used the stemness modules identified in the mouse SMA to define a stemness index score and evaluate how stem cell-like a new gene expression signature is. I validated the predictiveness of the stemness modules through an internal cross-validation test and applied the stemness index test to a large set of new experiments from normal stem cells, side populations, cancer stem cells, and metastatic populations. The results indicate that mouse stemness modules could predict stem cell-like features in various data sources with high accuracy.


To my family,

Stefka, Iordan and Petya,

whose brilliance, dedication and support have been an amazing source of energy

and inspiration.


Acknowledgments

My stemness research was supported by a pre-doctoral fellowship from the California Institute for Regenerative Medicine (CIRM).

I would like to thank Joshua Stuart and Camilla Forsberg for their invaluable ideas, guidance, support and energy throughout this work. I have had the wonderful opportunity to work closely with them and to draw from their complementary expertise, which has been instrumental to the completion of this project.

I would also like to thank Kevin Karplus and Raquel Prado for a great learning experience, valuable feedback, as well as their patience in reading my work.

I would like to thank David Bernick, Charles Vaske, and Alex Williams who have discussed various aspects of my research on many occasions during the last few years. Many other graduate students have provided feedback and recommendations on various aspects of the project and its presentation, including Matt Weirauch, Grant Thiltgen, Courtney Onodera, Daniel Carlin, Josue Samayoa, Daniel Sam, Thomas Juettemann, Marcos Woerhmann, Firas Khatib and others.

Thank you to Katy Elliot who has helped me on many occasions throughout my CIRM appointment and Carol Mullane for additional support and help.

Finally, thank you to my family, without whose support and belief in me this would have been harder to accomplish.


Chapter 1

Introduction

1.1 Motivation and problem statement

My research focuses on elucidating global mechanisms of stem cell function. Stem cells are functionally defined cells, characterized by their ability to self-renew and give rise to many different mature cell types. In recent years stem cells have gained much scientific prominence, because their functional characteristics make them highly relevant for therapeutic purposes in many neuro-, muscle- and other degenerative diseases. Understanding the functional and regulatory mechanisms of these cells has become even more important with the discovery of cancer stem cells, which seem to play a central role in cancer through their ability to regenerate the fully differentiated cancer cell populations generally targeted by current cancer treatments.

Normal stem cells have been identified in many major organs, tissues and systems (e.g. blood, liver, kidney, lung, muscle, etc.), while cancer stem cells have been discovered for cancer cell types varying from leukemia to neuroblastoma. Despite their importance, both normal adult and cancer stem cells have been very hard to study because of their rare nature in humans and many model organisms. Isolation and purification of these cells for many tissue types has been a major hurdle, because of the lack of good marker genes that would allow the separation of stem cells from progenitor and fully differentiated cells in the tissue of interest.

Much work has been done to elucidate the mechanisms of stem cell function. Especially with the advent of high-throughput technologies, many large-scale data sets have been published to examine the gene expression patterns of various stem cell types in the hopes of discovering stemness genes. However, researchers have still not found a single set of individual stemness genes common to all stem cell types that allow all of these cells to self-renew and maintain their stem-like cell state.

So how do stem cells achieve their function? We know that genes do not act on their own within the cell. They often need to interact with other genes within pathways or protein complexes to achieve their designated cellular function. In relation to this observation, stem cell researchers have started to look to more global mechanisms of stem cell state regulation. Because of the stochasticity of gene expression, it is possible that even though different stem cell types do not directly use the same genes, they may share the same, or redundant functional pathways. The genome also has a built-in robustness provided by gene duplications, which suggests that evolutionarily-related genes may also provide a way for stem cells to use different genes, and yet perform the same function. If we account for these possibilities, would common stem cell mechanisms emerge?

1.2 Goals of the study

In the context of the problem stated in the previous section, my work in this dissertation aims to computationally address three important questions:

1. Do functional redundancy and tissue-specific expression mask the common stem cell mechanisms?

2. If common stem cell mechanisms exist, are they conserved between mouse and human stem cells?

3. Can we predict the state of differentiation of a cell based on its gene expression signature?

1.3 Overview of main results

To address the role of functional redundancy in stem cells, I first develop the methodology to test for global reproducible expression of entire gene sets across multiple conditions. I develop the Stemness Meta-Analysis (SMA) method that uses techniques derived from standard meta-analysis theory to identify gene sets (modules), which are significantly recurrently upregulated across many stem cell experiments, are represented in most stem cell types, and are specific to stem cells as opposed to differentiated cells. Using this method, I identify 103 stemness modules of evolutionarily related homologous genes with reproducible, statistically significant and stem cell-specific upregulation in many mouse stem cell types. Genes within these homolog families are prime candidate regulators of conserved stemness mechanisms and may play critical roles as stem cell markers. They include many known self-renewal genes, such as Myc, Myb, and Chd1.
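The idea of module-level recurrence can be illustrated with a minimal Python sketch. It is not the recurrence score selected in Chapter 5: the "at least one member upregulated" rule, the random same-size gene-set null, and all names (`recurrence_score`, `empirical_p_value`, the `famA*` and `bg*` gene labels) are assumptions made here for illustration only, and the data are toy values.

```python
import random


def recurrence_score(module, experiments):
    """Fraction of experiments in which at least one module gene is upregulated.

    `module` is a set of gene names; `experiments` is a list of sets, each
    holding the genes reported as upregulated in one stem cell experiment.
    This hit rule is an illustrative assumption, not the dissertation's score.
    """
    if not experiments:
        raise ValueError("at least one experiment is required")
    hits = sum(1 for upregulated in experiments if module & upregulated)
    return hits / len(experiments)


def empirical_p_value(module, experiments, universe, n_perm=1000, seed=0):
    """Share of random same-size gene sets that score at least as high as the module."""
    rng = random.Random(seed)
    observed = recurrence_score(module, experiments)
    genes = sorted(universe)  # sorted so the seeded sampling is reproducible
    at_least = 0
    for _ in range(n_perm):
        random_module = set(rng.sample(genes, len(module)))
        if recurrence_score(random_module, experiments) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least + 1) / (n_perm + 1)


# Toy data (hypothetical gene names): each experiment uses a different
# member of the same family, so no single gene recurs but the module does.
background = {f"bg{i}" for i in range(200)}
module = {"famA1", "famA2", "famA3"}
universe = background | module
experiments = [
    {"famA1", "bg1", "bg2"},
    {"famA2", "bg3", "bg4"},
    {"famA3", "bg5", "bg6"},
]

module_score = recurrence_score(module, experiments)
single_gene_score = recurrence_score({"famA1"}, experiments)
p_value = empirical_p_value(module, experiments, universe)
```

In this toy setting the module is hit in every experiment while any single member is hit in only one of three, which is the kind of masking by functional redundancy that the first question asks about.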

To address the conservation of stem cell mechanisms between mouse and human cells, I apply the SMA method to a large compendium of stem cell data in human. Results suggest that even though much data is available for human stem cells, the gene expression signatures associated with them are much more heterogeneous than their mouse counterparts, presumably related to the lack of good marker genes for the isolation of pure populations in human. Data are also available for fewer stem cell types. Nevertheless, I find conservation of five major stemness families: the Notch, Frizzled, Chd, TCF/LEF, and Integrin α families. I expect that some of the discovered mouse stemness families can be used as putative markers in human stem cell populations.

Finally, to address the predictability of stem cells based on their expression

signatures, I define a stemness index score that uses stemness modules to measure how

stem cell-like a new gene expression signature is. I apply this index scoring to mouse ex-

pression signatures derived from new stem cell experiments, side populations (putatively

stem-cell-enriched populations of cells with highly active transporter proteins), cancer

stem cells, and metastatic populations. The results suggest that at least in mouse the

stemness modules, as used by the stemness index score, are highly predictive of normal

and cancer stem cell populations, as well as side and metastatic populations.


1.4 Organization of the dissertation

The dissertation is organized in nine main chapters with a separate appendix

section that presents the definitions of key italicized terms used in the main text. The

current chapter (Chapter 1) provides some brief motivation for the study along with

some basic definitions. It also describes the central problem addressed by the current

work and gives a summary of the significance of the results. The purpose of Chapter 2

is to provide some of the stem cell- and cancer-related biological background needed to

understand the main results, discussed in later chapters. Chapter 3 is focused on the

introduction of meta-analysis, as well as its relevance and application to high-throughput

data integration. This chapter is integral to the understanding of the methodology used

in this study. Chapter 4 introduces the work directly relevant to the study of common

stem cell mechanisms, including previous attempts at a molecular definition of stemness.

A detailed presentation of all methods designed for this study is presented in Chapter

5. The subsequent three chapters focus on results and applications: Chapter 6 discusses

stemness mechanisms in mouse, Chapter 7 identifies stemness mechanisms in human

and discusses the similarities and differences between mouse and human, while Chapter

8 shows some interesting applications of the discovered stemness mechanisms to the

study of metastasis, cancer stem cells and stem cell niches. The final chapter (Chapter

9) provides a discussion of some of the major implications of this work and describes

directions for future work. Definitions of key terms are included in Appendix A.


Chapter 2

Stem cells

This chapter aims to provide the biological background needed by the reader to

understand the basic mechanisms of stem cell function. Section 2.1 illustrates some basic

stem cell definitions and key properties of normal stem cells. It also introduces important

stem cell concepts, describes the most common stem cell types and examines some known

pathways associated with self-renewal. Section 2.2 introduces different cancer evolution

theories and discusses the relationship between normal stem cells, cancer stem cells

and metastasis. Section 2.3 describes side populations and how they relate to stem

cells, and finally Section 2.4 introduces various high throughput technologies used for

measurement of gene expression and discusses their use in stem cell analysis.

An understanding of the stem cell background provides the context for the

interpretation of the results of my work, and gives the reader ideas for potential positive

controls. While the input data used in my study of stemness vary substantially, the

underlying theme behind all input types is their tie to stem cells and stem-cell-like


properties.

2.1 Normal stem cells

2.1.1 Definitions, origins and key properties of normal stem cells

Stem cells are undifferentiated cells characteristic of multi-cellular species. They have two characteristic properties that functionally define them as stem cells: the ability to differentiate into many different mature cell types and the ability to self-renew, that is, to produce more stem cells of their own type [192]. At the broadest level of classification, stem cells can be characterized as either embryonic stem cells (ESC) or adult stem cells (ASC).

Early in embryogenesis before implantation, the blastocyst develops two sep-

arate components: the inner cell mass (ICM) and the trophectoderm [128]. The tro-

phectoderm forms the outside of the embryo and its trophoblast cells are the ones that

are destined to give rise to all extra-embryonic tissues and form the placenta. The inner

cell mass consists of the cells that will eventually develop into all cells and tissues of the

embryo. It is from the ICM that embryonic stem cells are derived. Thus, ES cells are

pluripotent in nature and can give rise to any of the three developmental germ layers –

ectoderm, mesoderm, or endoderm – and develop into any cell type in the organism with

the exception of extra-embryonic tissues. This quality makes them uniquely valuable

and highly potent for therapeutic purposes.

The extent of self-renewal of embryonic stem cells in vivo is not well understood, but in vitro both human and mouse ES cells have the ability to proliferate extensively and self-renew for an unlimited amount of time [115, 161].

Figure 2.1: Hierarchical organization of stem cell types within the ectoderm, mesoderm, and endoderm lineages. Individual stem cell types are shown in red ellipses.

Unlike embryonic stem cells, adult stem cells (ASC) have to maintain cells and tissues for the life span of an organism. Even though they can self-renew and give rise to more stem cells of the same type, they are generally multipotent in nature and can only generate more mature cells of the same system or tissue type. Specifically, within the ectoderm lineage neural stem cells will give rise to all the cells of the nervous system, while epithelial (skin) stem cells will develop and maintain the skin of the animal (Figure 2.1). Similarly, within the mesoderm lineage hematopoietic stem cells will support the development of the bone marrow, the blood system and all its myeloid and lymphoid components, while mesenchymal stem cells will be involved in the development of the support cell system that contributes to stem cell maintenance. Other mesoderm-derived tissue-specific stem cells will give rise to the muscle and bone tissues. Finally, within the endoderm lineage there are organ-specific stem cells, such as lung, liver, gastric and intestinal stem cells, that can give rise to the different cells that make up each of the corresponding organs (lung, liver, stomach and intestines, respectively).

Based on their properties, stem cells can generally fall into one of several states: quiescence, proliferation, differentiation or apoptosis. Unlike most normal cells, stem cells have the ability to enter into a special state of the cell cycle, called G0, and only maintain their stem cell state with the support of the appropriate stem cell niche. This state is generally referred to as the quiescent state. The more active states of proliferation and differentiation are most relevant in wound repair or for the general maintenance of tissues with a fast cell turnover.

Proliferation and self-renewal require that stem cells enter an active state. Depending on the needs of the organism or the tissue type, stem cells can undergo either symmetric or asymmetric divisions (Figure 2.2). In asymmetric divisions, a stem cell gives rise to two different daughter cells: a stem cell and a more differentiated cell. This process ensures that stem cells are not lost, while differentiated cells are generated. In symmetric divisions, a stem cell gives rise to either two stem cells or two more differentiated cells.
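As an illustrative sketch only (it is not part of the analysis in this dissertation), the three division outcomes described above can be written out in Python to make their effect on the size of the stem cell pool explicit. The function names and mode labels are hypothetical and chosen for this example.

```python
def divide(mode):
    """Return the two daughter cells produced by one stem cell division."""
    if mode == "asymmetric":
        return ["stem", "differentiated"]
    if mode == "symmetric_renewal":
        return ["stem", "stem"]
    if mode == "symmetric_differentiation":
        return ["differentiated", "differentiated"]
    raise ValueError(f"unknown division mode: {mode}")


def stem_pool_change(mode):
    """Net change in the number of stem cells after one division.

    One stem cell is consumed by the division, so the change is the
    number of stem daughters minus one.
    """
    return divide(mode).count("stem") - 1


for mode in ("asymmetric", "symmetric_renewal", "symmetric_differentiation"):
    print(mode, stem_pool_change(mode))
```

Under this simplification, asymmetric division keeps the pool constant, while the two symmetric outcomes expand or deplete it by one cell.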

Differentiation is another active state important for a stem cell, and the breadth of differentiation potency is one of the major factors that determine how a stem cell is defined. Besides pluripotent and multipotent stem cells, there are also cells that have a limited ability to differentiate into only one or several mature cell types and have a very limited self-renewal ability. These cells are referred to as progenitor cells and even though their functional properties partially overlap with those of stem cells, they are not generally considered stem cells.

Figure 2.2: Stem cells can undergo symmetric or asymmetric cell division. In symmetric division, the stem cell (red) can produce either two stem cells, or two more differentiated (green) cells. In asymmetric division, the stem cell can produce one stem cell and one more differentiated cell.

Finally, stem cells can also undergo apoptosis under certain conditions in a

fashion similar to other normal cell types.

2.1.2 Normal stem cell types

2.1.2.1 Hematopoietic stem cells

The hematopoietic system is one of the best-studied systems of stem cells in both human and mouse. Its structure can be most easily represented and visualized as a hierarchically organized tree, where the stem cell is at the root and the leaves represent the most committed cells with no self-renewal capacity (Figure 2.3). At the top of the cellular and differentiation hierarchy is the long-term hematopoietic stem cell (LT-HSC), which can give rise to a short-term hematopoietic stem cell (ST-HSC) and can also proliferate generally for the life span of the organism [116]. ST-HSC cells can give rise to a multipotent progenitor cell (MPP) and retain some renewal capabilities, while MPP cells only have the ability to differentiate into any myeloid or lymphoid cell type without a considerable self-renewal potential. They can give rise to two more lineage-committed types of progenitor cells: common lymphoid progenitors (CLP) [98] and common myeloid progenitors (CMP) [2], which, as their names indicate, can give rise to all types of blood cells within their lineage.
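The tree view of the hematopoietic system described above can be sketched as a small Python data structure. This is a minimal illustration, not code from the study: the edges down to CMP and CLP follow the text, while the deeper edges (GMP, MEP, Pro-B, Pro-T and the mature cell types) are an assumption based on the node labels of Figure 2.3. The names `HIERARCHY` and `mature_cells` are hypothetical.

```python
# Parent -> children edges of the hematopoietic hierarchy.
# Edges below CMP/CLP are assumed from the node labels of Figure 2.3.
HIERARCHY = {
    "LT-HSC": ["ST-HSC"],
    "ST-HSC": ["MPP"],
    "MPP": ["CMP", "CLP"],
    "CMP": ["GMP", "MEP"],
    "CLP": ["Pro-B", "Pro-T"],
    "GMP": ["Granulocytes", "Macrophages"],
    "MEP": ["Red blood cells", "Platelets"],
    "Pro-B": ["B-cells"],
    "Pro-T": ["T-cells"],
}


def mature_cells(cell):
    """Return the leaves (most committed cells) reachable from `cell`."""
    children = HIERARCHY.get(cell, [])
    if not children:
        # A node without children is a leaf of the tree.
        return [cell]
    leaves = []
    for child in children:
        leaves.extend(mature_cells(child))
    return leaves


print(mature_cells("CLP"))
print(mature_cells("LT-HSC"))
```

Walking the tree from the root recovers every mature blood cell type, whereas starting at a lineage-committed progenitor such as CLP recovers only the cells of that lineage.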

To study the functional features of these different cell populations independently, scientists generally use cell surface marker genes that collectively can distinguish between individual cell types. Cell surface markers are particularly well-understood in the hematopoietic system and are used along with other features to discriminate between different types of hematopoietic stem, progenitor and lineage-specific (myeloid/lymphoid) cells. Such markers include CD13, CD19, CD34, CD38, CD45 (B220), CD71, CD133, c-kit, Mac-1, etc.

We can describe or summarize the state of a cell by the expression of its markers; examples of marker signatures are shown in Table 2.1. The marker genes can then be used for purification purposes to sort different populations through an experimental method called fluorescence-activated cell-sorting (FACS) flow cytometry (Figure 2.4). The resulting products are well-separated cell populations, which can then be used in comparative or other experiments. Knowledge of cell surface markers has been tremendously useful in the field of stem cell biology and the study of the functional properties of individual stem cell types. However, many systems lack an abundance of well-known markers, which shows the urgent need for the discovery of new marker genes in these systems and tissue types.

Figure 2.3: The hematopoietic stem cell system differentiation tree shows the entire differentiation hierarchy of cells in the blood system. The most multipotent cell – the LT-HSC cell – is at the top of the hierarchy and can give rise to all cells downstream of it. The leaves of the tree represent the most committed mature cells with no self-renewal capacity. (Node labels: LT-HSC, ST-HSC, MPP, CMP, CLP, GMP, MEP, Pro-B, Pro-T, granulocytes, macrophages, red blood cells, platelets, B-cells, T-cells. Axes: self-renewal capacity, lineage commitment.)

Organism and cell type    Marker signature
Mouse LT-HSC              CD34- Sca-1+ Thy1.1+/lo CD38+ c-kit+ lin- CD48- CD150+ CD248-
Human LT-HSC              CD34+ CD38lo/- Thy1.1+ c-kit+ lin-

Table 2.1: Examples of LT-HSC marker signatures in mouse and human used in fluorescence-activated cell-sorting (FACS) to sort cells. The first column shows the organism and cell type for each signature, while the second column contains the signature used for sorting of the cells. The “+” superscript next to a gene indicates a cell of that type is positive for that marker, “-” indicates a cell is negative for that marker, and “lo” indicates a cell expresses low levels of that marker.

Figure 2.4: General scheme of the fluorescence-activated cell-sorting (FACS) flow cytometry technique: cells are bound to fluorescently tagged antibodies and are measured one by one in a stream by the flow cytometer. When each cell is scanned, the dye associated with its bound antibody emits light of a characteristic color, which is captured by the instrument. Each cell is then positively or negatively electrically charged based on the color and sorted into a separate population.
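To make the idea of a marker signature concrete, the following minimal Python sketch encodes the two LT-HSC signatures of Table 2.1 and checks whether a measured cell matches one of them. It is an illustration under simplifying assumptions: expression is reduced to the three discrete levels used in the table ("+", "-", "lo"), the negative markers are read as "-", and the names `matches_signature` and `sort_cells` are hypothetical. Real FACS gating works on continuous fluorescence intensities.

```python
# Allowed expression levels per marker: "+" positive, "-" negative, "lo" low.
MOUSE_LT_HSC = {
    "CD34": {"-"}, "Sca-1": {"+"}, "Thy1.1": {"+", "lo"}, "CD38": {"+"},
    "c-kit": {"+"}, "lin": {"-"}, "CD48": {"-"}, "CD150": {"+"}, "CD248": {"-"},
}
HUMAN_LT_HSC = {
    "CD34": {"+"}, "CD38": {"lo", "-"}, "Thy1.1": {"+"}, "c-kit": {"+"}, "lin": {"-"},
}


def matches_signature(cell, signature):
    """True if every marker in the signature is measured at an allowed level.

    `cell` maps marker names to a level; a marker missing from the cell
    counts as a mismatch.
    """
    return all(cell.get(marker) in allowed for marker, allowed in signature.items())


def sort_cells(cells, signature):
    """Split cells into (matching, non_matching), mimicking a two-way sort."""
    matching, rest = [], []
    for cell in cells:
        (matching if matches_signature(cell, signature) else rest).append(cell)
    return matching, rest


candidate = {"CD34": "-", "Sca-1": "+", "Thy1.1": "lo", "CD38": "+", "c-kit": "+",
             "lin": "-", "CD48": "-", "CD150": "+", "CD248": "-"}
print(matches_signature(candidate, MOUSE_LT_HSC))
```

Storing a set of allowed levels per marker lets entries such as Thy1.1+/lo be expressed directly, without special cases in the matching code.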

2.1.2.2 Neural stem cells

Neural stem cells, like hematopoietic stem cells, have also been widely studied [57], but their mechanisms are much less understood than those of hematopoietic cells, partially because the nervous system differs in nature from the blood system. The mammalian brain consists of a large number of structures and it is not clear whether stem cells isolated from the independent components may have a different differentiation potential. This observation is also relevant for differences between the adult and embryonic brain [57].

Neural stem cells can be isolated from the subventricular zone (SVZ) of the adult mammalian brain or alternatively from various other locations in the central and peripheral nervous system of the developing mammalian brain. Most neural stem cells are studied in vitro and can be cultured in the presence of either EGF or FGF2 and subsequently differentiated into various mature neural cell types [57], which include various types of neurons, astrocytes, oligodendrocytes and others. Some of these lineages are well-understood and appropriate markers exist for each cell or lineage type, while others still remain elusive. Specifically, neural stem cell self-renewal is dependent on the expression of Bmi-1 [113] and many neural progenitor cells express Nestin. Similarly, Tuj1 is a general neuronal marker [59], GFAP (glial fibrillary acidic protein) is an astrocyte marker [191], and GalC is an oligodendrocyte marker [21].

2.1.2.3 Embryonic stem cells

Embryonic stem cells are perhaps the most extensively studied type of stem cells, because of their readiness for culture and expansion in vitro. One set of commonly expressed transcription factors and marker genes truly characteristic of embryonic stem cells, which regulate some of the initial self-renewal and cell fate choices of ES cells, includes Nanog, Oct-4 and Sox2. The role of these transcription factors is to suppress the expression of genes required for the process of differentiation [23].

As previously discussed, embryonic stem cells are pluripotent in nature and can give rise to any cell or tissue type in an organism. Most recently, a different set of induced pluripotent stem (iPS) cells that display the hallmark features of embryonic stem cells has also been identified in both human and mouse [171, 170, 130]. The reprogramming or induction of pluripotency features usually begins with a primary fibroblast cell and the integration and activation of several transcription factors. The initial set consisted of four different transcription factors including two oncogenes (Oct4, Sox2, Klf4 and Myc), but since then various other combinations and subsets of factors have been found to also induce the reprogramming of the fibroblast cells into ES-like pluripotent cells.


2.1.2.4 Intestinal stem cells

Intestinal stem cells (ISC) are adult stem cells that give rise to the various

lineages of cells that make up the lining of the intestinal epithelium. There are four

different cell types that can be generated by intestinal stem cells: Paneth cells, goblet

cells, enteroendocrine and columnar enterocyte cells [16, 72, 107]. The intestinal stem

cells can be found above the Paneth cells, which are situated at the base of the crypt – the

location of the intestinal stem cell niche. They express a variety of genes, a few of which

can be used as markers, such as Noggin, Tcf4, Ephb3, and Musashi-1 [13, 18, 69, 99, 120].

However, most of them are not exclusive to intestinal stem cells. Other markers include

the hematopoietic stem cell marker Bmi-1, which has most recently been identified to

be expressed in intestinal stem cells as well [150].

2.1.2.5 Epithelial stem cells

Epithelial stem cells can be found in the hair bulge of animals, underneath the

sebaceous gland and reside in their stem cell niche in a quiescent state. After activation,

the stem cells can give rise to either stem cells and progenitors that continue to reside in

the bulge, or to progenitor cells that transition upwards closer to the skin surface, where

they can be used to generate skin (epidermal) cells for wound repair or general epidermal

maintenance. Alternatively, the daughter progenitor cells can transition down to the

hair matrix, where they can differentiate and develop into hair shaft cells [107, 127, 174].


2.1.3 Known functional pathways that regulate normal stem cell behavior

Given the variety of stem cell types, one naturally wonders: if all of these cells

need to perform similar roles in their environment, could they have any mechanisms

in common? What mechanisms do stem cells use to self-renew and how do they maintain

the fine homeostatic balance between quiescence, proliferation, differentiation and

apoptosis?

Several functional pathways have emerged over the years in the context of their

role in various stem cell types. I next describe some of the most important functional

networks known to regulate stem cell behavior in at least a few stem cell types. In the

study of stemness, these networks – Wnt, Notch and TGFβ – are especially important

as they are the most likely stemness candidates.

2.1.3.1 Wnt pathway

The Wnt pathway is activated when a Wnt ligand, one of the many molecules within the Wnt family of proteins, binds its partner receptor – Frizzled – at the cell surface (Figure 2.5). This binding event transmits a signal across the membrane and the intracellular Disheveled protein becomes activated, which negatively regulates the machinery responsible for the degradation of β-catenin in the cell. The degradation complex consists of several molecules including APC, GSK3β and Axin/Conductin. When this degradation complex is inhibited, β-catenin is free to accumulate and translocate to the nucleus, where it can interact with members of the TCF/LEF transcription factor family, which in turn activate many proliferation, self-renewal, and other stem-cell-related genes. The mode of action of the Wnt pathway is similar in most stem cell types and the activation of this network is generally associated with self-renewal [183].

Figure 2.5: Wnt pathway activation scheme. The Wnt ligand binds a Frizzled (Fzd) receptor on the cell surface, which activates the Disheveled (Dsh) protein. Dsh inhibits the activity of the degradation complex, which is responsible for the degradation of β-catenin. In the absence of the degradation, β-catenin accumulates and activates TCF/LEF in the nucleus. TCF/LEF factors regulate the activity of self-renewal genes.

Wnt plays a significant role in embryonic stem cell self-renewal, as observed in many biological experiments. Activation of the pathway can experimentally occur either through the overexpression and binding of a Wnt ligand (e.g. Wnt1) to the Frizzled receptor, allowing the accumulation of β-catenin in the cell, or alternatively through the direct inhibition of the components of the degradation machinery, also resulting in the accumulation of β-catenin [183]. Both of these techniques have been directly used and results indicate that the accumulation of β-catenin generally results in the inhibition of differentiation [8, 68] and the persistence of the undifferentiated state [151].
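The chain of activation and inhibition steps in the Wnt pathway can be summarized as a small Boolean sketch. This is a deliberately coarse illustration, assuming each component is simply on or off; it is not a quantitative model and not part of the methods of this study. The function name `wnt_pathway_state` and its parameters are hypothetical. The second parameter represents the alternative experimental route described above, in which the degradation machinery is inhibited directly.

```python
def wnt_pathway_state(wnt_bound, complex_inhibited=False):
    """Boolean sketch of Wnt signalling as described in the text.

    wnt_bound:         a Wnt ligand is bound to the Frizzled receptor.
    complex_inhibited: the degradation complex is inhibited directly
                       (the alternative experimental route).
    """
    dsh_active = wnt_bound
    # The degradation complex is shut down either by active Dsh or by
    # direct experimental inhibition of its components.
    degradation_active = not (dsh_active or complex_inhibited)
    # beta-catenin accumulates only when it is not being degraded.
    beta_catenin_accumulates = not degradation_active
    # Accumulated beta-catenin enters the nucleus and works with TCF/LEF.
    tcf_lef_targets_active = beta_catenin_accumulates
    return {
        "dsh_active": dsh_active,
        "degradation_active": degradation_active,
        "beta_catenin_accumulates": beta_catenin_accumulates,
        "tcf_lef_targets_active": tcf_lef_targets_active,
    }


print(wnt_pathway_state(wnt_bound=True))
print(wnt_pathway_state(wnt_bound=False, complex_inhibited=True))
```

Both experimental routes lead to the same downstream state in this sketch, which mirrors the observation that either one results in β-catenin accumulation.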


This pathway also has a central role in the regulation of stem cell expansion and differentiation in hematopoietic and epithelial stem cells. In cultured HSCs the addition of Wnt ligand, as well as the direct overexpression of β-catenin, results in HSC expansion and prolonged HSC self-renewal. Other work suggests that overexpression of one of the degradation complex components, Axin, has a negative effect on the growth of HSCs [9, 144]. Similarly, in culture conditions β-catenin has a significant effect on the number of epithelial stem cells and the pathway has been shown to directly modulate epithelial stem cell fates and regulate self-renewal and differentiation in the skin [111, 201].

2.1.3.2 Notch pathway

Another pathway with an important role in the regulation of stem cell self-renewal and differentiation is the Notch pathway (Figure 2.6). The central molecules of this pathway are the members of the Notch family of receptors – Notch1–4. These receptors can be activated by several groups of ligands that include the Delta-like family of proteins, as well as the family of Jagged proteins. Once the ligand binds to the extracellular component of the Notch receptor at the cell surface, the Notch intracellular domain is cleaved and moves to the cell nucleus. There it binds CSL (or Lag1), which is associated with the transcriptional activity of the target genes of Notch. When Notch is inactive, CSL leads to the transcriptional inhibition of Notch targets, but in the presence of the Notch intracellular domain, CSL associates with activators instead of repressors and the Notch target genes (including Hes1 and Hey1) are activated. Many of these target genes are themselves repressors of different lineage-specifying genes, and Notch at a global level plays a role in self-renewal [183].

Figure 2.6: Notch pathway activation scheme. A Delta-like (Dll) or Jagged (Jag) protein binds the Notch receptor and upon activation, the intracellular domain of the Notch receptor (NID) is cleaved and relocates to the nucleus. There, NID binds to CSL and functions to activate many Notch targets, which often function as repressors of lineage-specifying genes.

For example, expression of Notch1 has been shown to increase the level of HSC self-renewal in vivo, reduce the differentiation capacity of hematopoietic cells, and guide more committed progenitors towards T-cell development, rather than the B-cell lineage [183]. The Notch pathway also regulates stem cell self-renewal in the neural system: Notch activation promotes self-renewal and expansion of neural stem and progenitor cells, as well as inhibition of neural differentiation into particular lineages (oligodendrocytes, glial and others) [118, 124, 188]. Inhibition of the downstream targets of Notch, such as Hes1, has been shown to lead to depletion of the stem and progenitor pool and acceleration of differentiation [78].

2.1.3.3 TGFβ pathway

The transforming growth factor β (TGFβ) signaling pathway plays a central role in the regulation of proliferation and differentiation in embryonic and adult stem cells. The TGFβ superfamily of proteins consists of three major components: the TGFβ proteins, the activins and the largest set, the bone morphogenetic proteins (BMP). Each of these molecules has the ability to signal into the cell through binding to a cell surface receptor, as follows: the signaling molecules bind to a type II TGFβ receptor (Figure 2.7) [5].

This interaction activates the receptor and it associates with the other receptor component – the type I TGFβ receptor. This process promotes the phosphorylation and activation of any of the five possible regulatory receptor-activated SMAD (“mothers against decapentaplegic” homolog) proteins. Smad2 and Smad3 are activated by TGFβ and activins, while Smad1, Smad5, and Smad8 are phosphorylated upon BMP activation. Once phosphorylated, they bind the co-activator SMAD protein, Smad4, and move into the nucleus, where they can associate with other molecules and bind DNA to activate BMP or TGFβ target genes. Two other SMAD proteins exist – the inhibitory Smad6 and Smad7 – and are known to interact with the type I receptors to regulate the activation and de-activation of this pathway [183].

Figure 2.7: TGFβ pathway example activation scheme. When the TGFβ ligand binds to the type II receptor (1), the type I receptor is activated, the inhibitory I-Smads – Smad 6/7 – are displaced (2), the R-Smad Smad2 is recruited (3) and activated through phosphorylation (4). Smad2 binds to the Co-Smad protein, Smad4 (5), and the complex moves into the nucleus (6), where it can interact with other proteins to activate the TGFβ targets (7).
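The assignment of SMAD proteins to ligand classes described above can be captured in a small lookup table. The following Python sketch is only a restatement of the text in code form, assuming a one-to-one pairing of each receptor-activated SMAD with Smad4; the names `R_SMADS`, `CO_SMAD`, `I_SMADS` and `nuclear_complexes` are hypothetical.

```python
# Receptor-activated SMADs (R-Smads) per ligand class, as listed in the text.
R_SMADS = {
    "TGFb": ["Smad2", "Smad3"],
    "activin": ["Smad2", "Smad3"],
    "BMP": ["Smad1", "Smad5", "Smad8"],
}
CO_SMAD = "Smad4"              # co-activator SMAD shared by both branches
I_SMADS = ["Smad6", "Smad7"]   # inhibitory SMADs acting at the type I receptor


def nuclear_complexes(ligand_class):
    """Return the (R-Smad, Co-Smad) pairs expected to enter the nucleus."""
    if ligand_class not in R_SMADS:
        raise ValueError(f"unknown ligand class: {ligand_class}")
    return [(r_smad, CO_SMAD) for r_smad in R_SMADS[ligand_class]]


print(nuclear_complexes("TGFb"))
print(nuclear_complexes("BMP"))
```

Keeping the inhibitory SMADs in a separate list reflects that they do not form nuclear complexes in this description but instead regulate the pathway at the receptor.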

In mouse embryonic stem cells, BMP signals actively prevent neural differentiation and maintain ESC self-renewal [189]. After application of SB-431542, an inhibitor of the TGFβ signaling molecules [77], embryonic stem cells showed decreased levels of ESC markers in human [82, 184], as well as lower proliferation levels in mouse [123]. In some somatic stem cells, such as intestinal and epithelial (hair) stem cells, activation of TGFβ signaling is associated with stem cell quiescence, as BMP is thought to counteract and balance the Wnt signaling pathway by limiting and reducing the amount of β-catenin available in the cell [189]. In other stem cell types, such as mesenchymal stem cells, however, TGFβ can also positively regulate proliferation [189].

2.2 Relationship between stem cells and cancer

One of the major roles of normal stem cells in an organism is to replenish the

supply of cells constantly needed to maintain all tissues, organs, and systems in “a good

running condition” as well as respond in emergency conditions. In normal homeostatic

conditions the cells within a tissue or an organ interact and maintain a balance between

growth, differentiation, and cell death. Balance is achieved through the activation and

interaction of a system of regulatory molecules and a well-developed cell communication

network. Thus, a disruption of this well-organized network of molecular and cellular

interactions can have serious consequences and lead to various diseases among which

one of the most serious, well-studied and yet poorly understood diseases is cancer [190].

2.2.1 Cancer and tumor heterogeneity

Cancer is a disease of unchecked cellular growth and is currently among the leading causes of death in the United States. Each individual growth of abnormal cells is called a tumor and arises from normal cells and tissues. Tumors are monoclonal in nature [190], which means they arise from a single common abnormal “ancestral” cell, as opposed to multiple different cells that could have independently undergone cancer-related mutations to give rise to the highly heterogeneous assembly of cells that a tumor often contains.

Tumors can be divided into primary and metastatic, based on the location of

the original abnormal changes. The former are tumors that have developed at the site of

origin of the ancestral cell mutation, while the latter are tumors that have evolved from

a primary tumor, but have subsequently invaded a more distant location that is often

unrelated to the primary tumor site. The frequently fatal consequences of metastasis have led

researchers to study the mechanisms that guide the development of metastatic cancer

[190].

Within the general cancer categories, the distinct cancer types and subtypes

can present much heterogeneity. To date, more than 80% of tumors that afflict the population

are epithelial-derived cancers, known as carcinomas. These include cancers

of most internal organs, such as the stomach, small intestine, large intestine, liver, lung,

and pancreas. They also include ovarian, testicular, bladder, breast and skin cancers, as

well as some rarer types.

Understanding the sources of heterogeneity is key to the development of new

treatments and methods. Tumor heterogeneity has been observed at several different

levels, but the heterogeneity that is most relevant to stem cell analysis is the one observed

at the cellular level within a single tumor growth [71]. Specifically, each individual tumor

can be a highly phenotypically heterogeneous assembly of cells. The cells that make up

the tumor can differ in their morphology, differentiation status, proliferation potential,

the cell surface markers they express, their pathology, as well as other factors [71]. From

the monoclonal tumor origin perspective this type of heterogeneity is most interesting.

Why is it that tumors that arise from a single cell are so heterogeneous in

nature? A potential answer to this question lies within the stem cell field, where in

at least some organs and tissues a single stem cell can be sufficient to regenerate the

entire system of diverse and narrowly-specialized differentiated cells. This link between

normal stem cells and cancer can give us clues as to how cancer cells can evolve to such

a heterogeneous state.

2.2.2 Cancer evolution theories

Two main cancer evolution theories have been developed through the years:

the clonal expansion and cancer stem cell models.

In the clonal expansion model all cells are equal and each one has the po-

tential to give rise to a new tumor. Tumor development begins with the mutation of

a random cell that gives it the advantageous ability to proliferate fast and dominate

the surrounding tissue (Figure 2.8). Because of the unchecked growth, the cell and

its progeny soon dominate the cellular neighborhood sufficiently and accumulate another

mutation. The new mutation allows these cells to grow and expand even more,

which inadvertently makes them even more susceptible to becoming targets of the next

mutation. Each mutation hit and subsequent cellular growth can be described as an

expansion. Within several such clonal expansions, the cells have grown and mutated

sufficiently to be considered the origins of development of a new tumor [186, 190].

[Figure 2.8, clonal evolution model. Mutation labels: MUT1, MUT2, MUT3. Legend: non-tumor cell; MUT1-associated cell; MUT2-associated cell (MUT1 and MUT2); MUT3-associated cell (MUT1, MUT2 and MUT3).]

Figure 2.8: Clonal evolution cancer expansion model. The clonal expansion model assumes that all cells within a tumor have equal proliferative ability. Once an initial mutation (MUT1) occurs within a cell (dark red), the cell gains proliferative potential and its progeny quickly dominates the surrounding tissue. This event predisposes the already mutated cells to a second hit (MUT2; pink cell) and after the subsequent expansion to a third hit (MUT3; blue cell). After several rounds of expansion and accumulation of new mutations, a tumor can arise from the mutated cells.

[Figure 2.9, cancer stem cell model. Mutation labels: MUT1, MUT2. Legend: cancer stem cell (CSC); cancer progenitor cell; differentiated tumor cell; non-tumor cell.]

Figure 2.9: Cancer stem cell expansion model. The cancer stem cell model assumes that some cells within a tumor have a higher proliferative and self-renewal ability than other cells. If a mutation (MUT1) occurs in a normal stem cell that causes it to become uncontrollably proliferative, the new cancer stem cell (dark green) can give rise to many cancer progenitor (green) and more differentiated cancer cells (yellow). Alternatively, if a mutation occurs in a more differentiated cell, the event will only cause an extensive expansion if the mutation (MUT2) caused the cell to de-differentiate to a more stem cell-like state.

The cancer stem cell model postulates that not all cells are created equal and

some cells – the cancer stem cells (CSCs) – have a higher proliferative potential than

others. Similarly to the normal stem cell hierarchy, tumor cells also have a hierarchical

organization, where only the cancer stem cells can give rise to new cancer cells and

allow tumor proliferation¹ (Figure 2.9). Most differentiated cancer cells have a limited

proliferative potential and capacity and are incapable of promoting tumor growth [186].

¹ It should be noted that the CSC is not necessarily the cell that was originally hit by the mutation that prompted the beginning of the oncogenic process.

2.2.3 Cancer stem cells

The existence of cancer stem cells was first shown by John Dick in an acute

myeloid leukemia (AML) study, which aimed to compare the similarities between nor-

mal hematopoietic stem cells and leukemic cells. The researchers in this study used a

self-renewal assay technique [37] that has been widely used as the method for discov-

ery of new cancer stem cells. They used the markers associated with normal human

HSCs (CD34+ CD38neg Lin−) to sort and distinguish between two leukemic subpopulations:

a minority subpopulation of CD34+ CD38neg and a majority population of

CD34+ CD38pos cells. Cells from both populations were then independently injected

into severely immunocompromised mice and the level of engraftment and reconstitution

of the cancer were subsequently measured. Only the cells derived from the minority

subpopulation had the ability to proliferate and give rise to the heterogeneous assembly

of cancer cells that resembled the original cancer population [41]. This study showed

the existence of a small number of cells – the CSCs – with an enormous proliferative

capacity, sufficient to reconstitute a cancer population.

The self-renewal assay (Figure 2.10) used for discovery of new cancer stem

cells has not changed substantially and requires four main steps: selection of cell sur-

face markers that allow cellular subsampling, separation of the cells of interest into

subsamples using FACS technology based on marker expression, independent injection

of cells from each subsample into immunocompromised mice, and measurement of self-renewal

capacity in the host. This technique has now been used to discover cancer stem

cells in both breast [4, 129] and brain [158] tissue, as well as other cell types.

[Figure 2.10 panel labels: fluorescently label cells; laser beam; − / + sorting; CSC; self-renewal and expansion capacity.]

Figure 2.10: Illustration of functional self-renewal assay. The first step of the assay requires the selection of cell surface markers and the fluorescent labeling of cells. FACS flow cytometry is then used to sort the cells based on the expression state of the markers. Each purified sub-sample can then be injected into severely immunocompromised mice. If the sub-sample contains primarily putative cancer stem cells (red cells), a tumor growth will develop because of the high self-renewal and expansion capacity of these cells. Alternatively, if the sample does not contain cancer stem cells, but more differentiated cancer cells (green and/or white), no tumor growth will develop because of the low proliferative capacity of these cells.

Al-Hajj et al. [4] discovered, in samples from breast cancer patients, a small

subpopulation of CD44+CD24− cells with an extraordinary proliferative capacity when

injected into SCID mice. Specifically, approximately 100-200 of these cells were sufficient

to induce tumor reconstitution, while several-fold higher quantities of other cells were

incapable of achieving the same. The CD44+CD24− cells had both

the ability to self-renew and to give rise to differentiated cancer cells, exhibiting the

functional properties of a stem cell [4].

Similarly, Singh et al. [158] observed that in glioblastomas a small population

of cells defined by CD133+ expression also had very high proliferative capacity, unlike

the rest of the tumor and could give rise not only to more cells of the same phenotype,

but also to other cell types in a manner similar to the one observed in the original tumor

[129, 158].

2.2.4 Shared mechanisms between normal and cancer stem cells

As outlined in the earlier sections, evidence has accumulated to suggest that

normal and cancer stem cells share some functional properties, namely the abilities to

self-renew and differentiate. Shared function indicates that normal and cancer stem

cells may also share functional pathways that regulate self-renewal and differentiation

and some evidence already exists to support this hypothesis.

One of the best examples of molecules and pathways that regulate both normal

and cancer stem cells is Bmi-1. Bmi-1 is a marker of self-renewal in both normal

hematopoietic and neural stem cells [113, 131]. This gene, also known as Pcgf4, is a

member of the Polycomb family of proteins, a component of Polycomb Repressor

Complex 1 (PRC1), and a chromatin-modifying repressor [108]. Its role in

self-renewal may be independent of PRC1 and is likely associated with its inhibition

of the CDKN2A gene, which codes for two different proteins: Ink4a (p16) and Arf

(p19 in mice and p14 in humans). Both of these proteins are cyclin-dependent

kinase inhibitors and are responsible for the suppression of cell proliferation

[129] (Figure 2.11). Activation of Bmi-1 is associated with the promotion of cell

proliferation and self-renewal in normal hematopoietic stem cells [131], as well as leukemic

stem cells (LSC) [106].

[Figure 2.11 diagram labels: Bmi-1 (Pcgf4); p16 (Ink4a); p19 (Arf); cyclin-dependent kinase inhibitors; proliferation; self-renewal.]

Figure 2.11: Illustration of the role of Bmi-1 in self-renewal. Bmi-1, also known as Pcgf4, functions to suppress the cyclin-dependent kinase inhibitors p16 and p19. p16 and p19 are involved in the inhibition of self-renewal and proliferation genes. Thus, activation of Bmi-1 promotes self-renewal.

Another important shared pathway that plays a significant role in self-renewal

is the Wnt/β-catenin pathway. The mechanism of action of this pathway was already

described in Section 2.1.3. Wnt/β-catenin pathway activation occurs in many cancer

types, including breast and brain cancer [143]. β-catenin accumulation has also been

observed in chronic myeloid leukemia in the GMP progenitor population when the

disease advances to a blast crisis [83, 108], which suggests the possibility of

progenitor-to-LSC transformation in these patients. Mutations in the degradation machinery (i.e.,

APC mutations) have been detected in many colon cancer types as well [143].

2.2.5 Metastasis and stem cells

Cancer is often associated with the development of metastatic tumors. Metas-

tasis is the process of the invasion of a new distant organ from the tumor growth of

origin. Understanding metastatic cancer has been the focus of a large body of

research. However, I will only present selected definitions, observations and studies in

this dissertation that relate directly or indirectly to stem cell biology. A nice overview

of metastasis is presented in an article by Bacac and Stamenkovic [10].

A primary tumor that successfully metastasizes to a new organ or tissue has to

complete several stages: separation from the tumor of origin through the degradation

of the basal membrane, migration and invasion of the stromal layers, transition into

the blood stream through a process called intravasation, survival in the blood stream,

transition out of it through another process called extravasation, preparation of the new

local microenvironment for the development of micrometastases and finally proliferation

and establishment in the new location [10].

The epithelial-mesenchymal transition (EMT) is often considered one of the

hallmarks of metastatic cancer, as it is indispensable to the successful invasion of a

new organ or tissue. This event refers to the process of conversion of cells from a more

epithelial-like state to a more mesenchymal-like state. Specifically, epithelial-like cells

are well-organized, closely attached to each other, and are not generally considered

migratory. Mesenchymal cells, on the other hand, are morphologically different (more

spindle-shaped), are not well-organized and are more readily migratory. At the molec-

ular level this transition can be observed in the change of the markers that the cells

undergoing this transition express. For example, before the transition cells express E-

cadherin. E-cadherin is associated with the maintenance of the epithelial cellular nature

and exhibits properties of a tumor suppressor gene, as mutations in this gene drive the

development of various metastatic carcinomas [32]. When cells undergo EMT, they lose

the expression of E-cadherin and gain new markers, such as N-cadherin, fibronectin,

vimentin and others [10, 105, 176].

In the past several years a number of studies have linked metastasis and

stem cells through self-renewal and the EMT process. Interestingly, to successfully

invade a new tissue, a tumor has to establish a stable colony at the distant site of

metastasis, which means that as most cancer cells have only a limited proliferative

ability, the cells that drive the metastatic growth expansion have to be able to actively

self-renew.

One link between metastasis, poor cancer prognosis and stem cells was the

identification of an 11-gene death-from-cancer signature, consisting of genes related to

the Polycomb Repressor Complex (PRC) [63]. Specifically, this 11-gene signature was

predictive of patients with poor survival rates, death after treatment and cancers with

high metastatic potential. In the center of this signature was Bmi-1, which as already

mentioned is an important stem cell marker in hematopoietic, neural, intestinal and

some cancer stem cells. The discovery of this signature and the involvement of the Bmi-

1 pathway suggested an indirect role for stem cells in tumor metastasis. The theory

that emerged from the study is that stem cells may be recruited to the tumor mass,

where through the process of a rare fusion between the stem cell and a cancer cell, a

new cell is formed that is neoplastic in nature, but has acquired additional stem cell-like

properties which confer upon it the metastatic ability [62].

Several more recent studies also implicate self-renewal and EMT in metasta-

sis, not through the Bmi-1 pathway, but as the result of activation of the canonical

Wnt pathway. In particular, breast cancer metastases to lung show activation of Lrp6

(the co-receptor of the Frizzled receptor), where self-renewal is directly modulated by

TCF/LEF transcription factor activation. Twist, an embryonic transcription factor,

which is causally related to the epithelial-mesenchymal transition [199] is a downstream

target of TCF/LEF. Inhibition of Lrp6 and the Wnt pathway reduces the self-renewal

and metastatic capacity of these cancer cells and silences Twist and therefore the EMT

modulation [45]. Wnt pathway activation and TCF/LEF have also been recently shown

to regulate the self-renewal capacity of lung adenocarcinoma cells that metastasize to

the brain [119].

There is now also some evidence for a direct link between EMT and stem cell-like

properties, as shown in mammary carcinomas. Specifically, as mentioned previously,

Twist has the ability to induce EMT in epithelial cells [87, 199]. If

Twist is purposefully expressed in mammary epithelial cells when it should otherwise

be repressed, it causes their transition to a more mesenchymal morphological state.

Besides the transition from epithelial to mesenchymal marker expression, these cells

begin to also express the typical markers of normal and cancer mammary stem cells —

CD24 and CD44 [4], as shown by FACS flow cytometry [110]. These transformed cells

display not only the molecular, but also the functional features of mammary stem cells.

The authors of the study show that the converse is true as well: normal mammary stem

cells also express EMT-related markers [110].

2.3 Side populations

One central tool for the identification of stem-cell enriched populations from

various tissues and organs is the isolation of side populations. This technique was first

introduced in the hematopoietic system through the isolation of side populations from

the bone marrow [163], but has since then been successfully used in many other tissue

types, including liver [196], lung [168], brain [92], testis [53], and breast [193].

So what are side populations? They represent populations of cells with high

efflux rates that actively remove any Hoechst 33342 or Rhodamine 123 dyes and exhibit

low fluorescence levels after staining. In a typical FACS plot, they look like a collection

of cells to the side of the fluorescence levels associated with most cells – the main

population – which is how they received the name “side population.” A FACS plot toy

example is shown in Figure 2.12 to illustrate this concept.
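The gating idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rectangular gate, the threshold values, and the function name `split_side_population` are hypothetical, and in practice gates are drawn by hand on the FACS plot rather than fixed in advance.

```python
from typing import List, Tuple

# Each cell is a pair of fluorescence readings in arbitrary units:
# (Hoechst blue, Hoechst red). The values below are made up.
Cell = Tuple[float, float]


def split_side_population(
    cells: List[Cell], blue_max: float, red_max: float
) -> Tuple[List[Cell], List[Cell]]:
    """Split cells into (side population, main population).

    A cell is assigned to the side population when both Hoechst
    channels are at or below the (hypothetical) gate thresholds,
    i.e. it sits in the low-fluorescence corner of the plot.
    """
    side, main = [], []
    for blue, red in cells:
        if blue <= blue_max and red <= red_max:
            side.append((blue, red))
        else:
            main.append((blue, red))
    return side, main


toy_cells = [(0.90, 0.80), (0.85, 0.90), (0.10, 0.15), (0.70, 0.75), (0.20, 0.10)]
side, main = split_side_population(toy_cells, blue_max=0.3, red_max=0.3)
sp_fraction = len(side) / len(toy_cells)
```

In this toy data set two of the five cells fall inside the gate; real side populations are far rarer, so the fraction here is not meant to be representative.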

Even though the side population phenotype is not believed to be sufficiently

specific to be used as the sole marker of a stem cell, side populations are enriched

for stem cells [33]. The abundance and enrichment vary between different tissues and

organs, but within bone marrow the abundance ranges around 0.05–1% [33, 64]. Stem

cell-enriched populations show the highest efflux activity and the rate of efflux is thought

to be correlated with the level of commitment of cell populations, so it decreases as cells

become more mature [65].

Different theories exist as to the cause of this unexpected phenotype. One

theory suggests that because of their importance, stem cells must have a very active

protein transporter system. The transporter system includes ATP-binding cassette

(ABC) transporters that hydrolyze ATP to move various compounds across the cell

membrane. ABC transporters include many proteins, among which the most well-known

are Multi-Drug Resistance 1 (MDR1, also known as ATP-binding cassette sub-family B

member 1, ABCB1) and ATP-binding cassette sub-family G member 2 (ABCG2). They are heavily

used by stem cells to remove various toxins and drugs and preserve the cell intact from

potential harm [33, 42]. As a result of this ability, these cells can also remove dyes at a

very fast rate.

2.4 High-throughput technologies and stem cells

2.4.1 High-throughput technologies

The development of various high-throughput technologies and methods over

the last decade has enabled the accumulation of an enormous databank and has facilitated

many scientific discoveries. High-throughput technologies have been invaluable,

because of their ability to measure thousands of quantities in a given sample within

a short time frame and to create a quantitative profile that describes the state of the

sample at the point of measurement.

[Figure 2.12 axes: Hoechst Red 660/20 and Hoechst Blue 424/44. Regions: side population (SP); main population (MP).]

Figure 2.12: A FACS plot for a toy example that illustrates the isolation and definition of a side population. Each axis represents Hoechst dye labeling. The cells that have efficiently removed the dye (the “side population”) are located in the lower left-hand corner of the quadrant.

In the next few paragraphs, I introduce very briefly some high-throughput

technologies and high-throughput data types as they relate to the work in this disser-

tation.

2.4.1.1 Microarrays

Microarray technology has been one of the dominant technologies to emerge in

the past 15 years. The purpose of microarrays is to capture the relative or absolute level

of expression of all transcripts in a given sample or in a pair of samples. Expression

profiling, or the measurement of the level of mRNA produced from a given gene, is not

a new concept and has been widely used in molecular biology since the 1970s. However,

the experimental scale has grown immensely and within the last five years microarray

platforms have reached genome-wide scale.

Conceptually, microarray technology can be described as follows: you create

a chip with thousands of features (currently with genome-wide coverage). Each feature

represents a probe, a short nucleic acid sequence (typically DNA) attached to the chip and

complementary to the mRNA transcribed from some gene of interest. You select a sample of interest,

or in the case of some microarray technologies, two samples of interest. The mRNAs

in the sample are going to hybridize and attach to their complementary probes on the

microarray chip. In the case of two samples, the mRNAs from the individual samples

competitively hybridize to the array, based on the abundance of the specific mRNA

in each sample. The mRNAs are fluorescently labeled so high abundance of a given

mRNA will spot brightly, while low mRNA abundance will not and both can be assessed

quantitatively [46].

Two general types of microarrays are in widespread use: oligonucleotide arrays

and cDNA spotted arrays. Oligo arrays consist of short probes corresponding to small

segments of genes or other genomic elements. The probes can vary in size from 25

to 60 nt depending on the microarray design (Affymetrix and Agilent). The platforms

can also vary and range from the silicon chip (Affymetrix technology) to micro-beads

(Illumina technology). cDNA spotted arrays use larger pre-assembled probes spotted

directly onto a glass slide [46]. Spotted arrays are often used in the context of compet-

itive hybridization, where two samples are introduced instead of one and the resulting

measurements represent the relative abundance of the mRNA in one of the samples

versus the other. Usually, when the relative abundance of mRNA between two samples

is of higher interest than independent absolute measurements, this context is preferred

as it avoids noise issues associated with the non-reproducibility of individual spot mea-

surements.

2.4.1.2 SAGE

Serial analysis of gene expression (SAGE) technology gained prominence in

the 1990s and is similar to the microarray technology in that it is used to measure mRNA

levels in a given sample. Unlike microarrays in which a probe must be present on the chip

to be measured, SAGE is a sequencing-based technology and does not require knowledge

of all the molecules present in the sample. The general technique can be described as

follows: mRNAs are isolated from the sample of interest and reverse-transcribed into

cDNA. Then short sequence tags sufficiently long to identify transcripts uniquely are

isolated from individual mRNAs and ligated together. The ligated sequences are then

amplified, sequenced and analyzed. The abundance of a given transcript is measured

as the count of the number of that transcript sequenced from the sample [46].
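The counting step at the end of this procedure can be illustrated with a short Python sketch. It assumes the tags have already been extracted from the sequenced concatemers; the tag sequences, the gene names, and the helper names `count_tags` and `tags_per_million` are hypothetical and serve only as an example.

```python
from collections import Counter
from typing import Dict, List, Tuple


def count_tags(
    sequenced_tags: List[str], tag_to_gene: Dict[str, str]
) -> Tuple[Counter, int]:
    """Count how often each transcript's tag was sequenced.

    Tags that do not match any known transcript are tallied
    separately; because SAGE is sequencing-based, such tags are
    still observed even though no probe was designed for them.
    """
    counts: Counter = Counter()
    unassigned = 0
    for tag in sequenced_tags:
        gene = tag_to_gene.get(tag)
        if gene is None:
            unassigned += 1
        else:
            counts[gene] += 1
    return counts, unassigned


def tags_per_million(counts: Counter, total_tags: int) -> Dict[str, float]:
    """Scale raw tag counts by library size so samples are comparable."""
    return {gene: 1e6 * n / total_tags for gene, n in counts.items()}


# Toy library: made-up 10-nt tags and made-up gene names.
tag_to_gene = {"AAACCCGGGT": "geneA", "TTTGGGCCCA": "geneB"}
library = ["AAACCCGGGT", "TTTGGGCCCA", "AAACCCGGGT", "GGGGGGGGGG", "AAACCCGGGT"]

counts, unassigned = count_tags(library, tag_to_gene)
tpm = tags_per_million(counts, len(library))
```

Scaling by the total number of tags is one common normalization choice, not something the description above prescribes.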

2.4.1.3 RNA-Seq

RNA-Seq is the most recently emerged technology and is considered the transcriptome

technology of the future, perhaps soon to replace microarrays. Similarly to

SAGE, because it is sequencing-based, the RNA-Seq technique does not require prior

knowledge of the mRNAs potentially present in a given sample, which may be an advantage

over microarrays. Though this technique is quite new, some stem cell studies

that use RNA-Seq have already been published in the field [38] and many more are

expected to emerge in the next several years.

2.4.2 High-throughput data

The above three technologies are some of the main high-throughput techniques

used to generate expression data in stem cell experiments. Because expression data is the

core of my stemness analysis, I next review the most common methods used to identify

differentially expressed genes. I take the opportunity to introduce some other high-throughput

non-expression data types, though description is restricted to the protein

interaction data used in my stemness research. I also present some of the main public

repositories for these data, as they are important for data collection.

2.4.2.1 Differential expression

Expression data are most commonly used in a comparative manner, where

mRNA levels are compared in a pairwise fashion between two samples of interest. De-

pending on the experimental platform, these samples can include

• normal tissue vs. normal tissue, e.g. embryonic stem cells vs. neural stem cells

• normal tissue vs. cancer tissue, e.g. normal liver cells vs. liver cancer cells

• time-course experiments, e.g. undifferentiated cell at 0h vs. differentiating cell at

96h after treatment with a differentiation-inducing drug

• normal tissue vs. universal (and/or pooled) reference, e.g. liver stem cells vs.

universal mouse reference

Genes that are highly expressed in one tissue, but not the other are often most

interesting. Differential regulation of a gene is expected to be associated with the specific

functional roles the gene may play in the tissue, where it is found to be differentially

expressed. Many techniques can be used to identify differentially expressed genes. The

two most common methods used in the stem cell literature are SAM [181] and the

fold-change [40] methods.

Based on much of the published stem cell data in the last decade the fold-change

method is perhaps the most utilized technique. For a given probe or gene of

interest $G$, the fold change between samples $S_1$ and $S_2$ is measured as the ratio between

the (normalized) expression $E$ of the gene in each sample:

\[ FC(G) = \left| \log\left( \frac{E_{S_1}}{E_{S_2}} \right) \right| > c. \]

If the fold-change $FC(G)$ exceeds the cutoff $c$ and $E_{S_1} > E_{S_2}$, the gene is said to

be differentially upregulated in $S_1$. Similarly, if $E_{S_1} < E_{S_2}$, the gene is marked as

differentially downregulated in $S_1$. Common fold-change cutoffs range between 2- and

3-fold.
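The fold-change rule can be sketched as follows. The sketch makes several assumptions that are not stated in the text: the logarithm is taken in base 2, the cutoff is supplied on the fold scale (so a 2-fold cutoff corresponds to c = 1 on the log scale), and the expression values, gene names, and the function name `fold_change_calls` are hypothetical.

```python
import math
from typing import Dict


def fold_change_calls(
    expr_s1: Dict[str, float],
    expr_s2: Dict[str, float],
    fold_cutoff: float = 2.0,
) -> Dict[str, str]:
    """Label genes as 'up' or 'down' in S1 relative to S2.

    A gene is called when |log2(E_S1 / E_S2)| exceeds log2(fold_cutoff).
    Genes without positive expression in both samples are skipped,
    because the log ratio is undefined for them.
    """
    c = math.log2(fold_cutoff)
    calls: Dict[str, str] = {}
    for gene in expr_s1.keys() & expr_s2.keys():
        e1, e2 = expr_s1[gene], expr_s2[gene]
        if e1 <= 0 or e2 <= 0:
            continue
        log_ratio = math.log2(e1 / e2)
        if abs(log_ratio) > c:
            calls[gene] = "up" if log_ratio > 0 else "down"
    return calls


# Made-up normalized expression values for three genes.
sample_1 = {"geneA": 800.0, "geneB": 100.0, "geneC": 210.0}
sample_2 = {"geneA": 100.0, "geneB": 450.0, "geneC": 200.0}
calls = fold_change_calls(sample_1, sample_2, fold_cutoff=2.0)
```

Here geneA (8-fold higher in the first sample) is called up, geneB (4.5-fold lower) is called down, and geneC is left uncalled because its ratio is close to 1.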

The other commonly used method for the identification of differentially expressed

genes is Significance Analysis of Microarrays (SAM) [181]. The differential

expression statistic or the d-score reported by SAM for each gene can be viewed as

a modified gene-specific t-statistic most often measured between two classes, samples,

or conditions. The significance of the d-score is evaluated through permutation analysis

and significantly differentially expressed (upregulated or downregulated) genes are

usually selected based on an FDR cutoff.
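As a rough illustration of the idea, the sketch below computes a d-score of the form d = (mean1 − mean2) / (s + s0), where s is the pooled standard error used in a two-sample t-statistic and s0 is a small positive constant, and then evaluates it by exhaustively permuting the class labels. It is a simplified stand-in rather than the SAM procedure itself: SAM chooses s0 from the data and estimates an FDR across all genes, whereas here s0 is fixed by hand and only a per-gene permutation p-value is returned. The data values and function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List


def d_score(x1: List[float], x2: List[float], s0: float = 0.1) -> float:
    """Modified t-statistic: mean difference over (pooled std. error + s0)."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss = sum((v - m1) ** 2 for v in x1) + sum((v - m2) ** 2 for v in x2)
    s = math.sqrt((1.0 / n1 + 1.0 / n2) * ss / (n1 + n2 - 2))
    return (m1 - m2) / (s + s0)


def permutation_p_value(x1: List[float], x2: List[float], s0: float = 0.1) -> float:
    """Fraction of label assignments with |d| at least as large as observed.

    All ways of assigning len(x1) of the pooled values to class 1 are
    enumerated, which is only feasible for small sample sizes.
    """
    observed = abs(d_score(x1, x2, s0))
    pooled = x1 + x2
    indices = range(len(pooled))
    hits = total = 0
    for chosen in combinations(indices, len(x1)):
        chosen_set = set(chosen)
        g1 = [pooled[i] for i in chosen]
        g2 = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(d_score(g1, g2, s0)) >= observed - 1e-9:
            hits += 1
    return hits / total


# Made-up log-expression values for one gene in two classes of four samples.
class_1 = [8.1, 7.9, 8.3, 8.0]
class_2 = [5.0, 5.2, 4.9, 5.1]
d = d_score(class_1, class_2)
p = permutation_p_value(class_1, class_2)
```

Adding s0 to the denominator keeps genes with very small variance from receiving inflated scores, which is the main way the d-score departs from an ordinary t-statistic.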

Both of these methods provide as output a list of genes differentially upregulated

or downregulated between two classes or samples. I will come back to these lists

in the next chapter as they are a particularly important component of meta-analysis.

2.4.2.2 Protein-protein interactions

The interaction between proteins is often central to the exercise of their func-

tional roles in the cell. Many protein complexes require the assembly of individual

protein components before the complex can become functionally active. Therefore,

studying protein-protein interactions on a wide scale has been of biological interest to

many researchers. Protein-protein interactions are of particular interest to the study of

stemness as they could be used to test for putative protein complexes with central roles

in stem cells.

Various high-throughput methods have been developed not only for the measurement

of mRNA levels in a given cell in a given condition, but also for the measurement

of protein levels, as well as the interaction between proteins in a given cell or

condition. Such methods include yeast two-hybrid, co-immunoprecipitation, tandem-affinity

purification and many other techniques. These technologies produce interaction

data on a high-throughput scale and even though it may be impossible to ascertain the

specificity of the inferred protein interactions, we still get a glimpse of the global protein

state of the cell.

2.4.3 Public data repositories

There are many public data repositories available for use to the larger research

community. The most central databases for expression data include Gene Expression Omnibus (GEO) [51] and ArrayExpress [132]. Other gene expression databases used in the past include the Stanford Microarray Database (SMD) [43]; however, the import of all publicly available SMD data into GEO has diminished the need to use it independently.

Public repositories for protein-protein interaction data include BioGRID [166], BIND [11], DIP [197], and HPRD [134]. These databases contain both interactions characterized by the techniques mentioned in the previous section and computationally predicted interactions. Large-scale experimentally verified protein complex data for mammalian species are also available in the CORUM database, which holds independently verified mouse, rat and human protein complexes [148].

2.5 Summary

In this chapter, I introduced the core of the biological background needed to

understand the basis for my stemness analysis, as well as the results generated by it.

Section 2.1 gave some basic stem cell definitions and described the key properties of

normal stem cells. It also gave a broad overview of the most common normal stem cell

types studied in the literature. It should be noted that many more stem cell types exist and the description is by no means exhaustive. Section 2.1.3 introduced some known pathways associated with self-renewal. Section 2.2 described tumor heterogeneity, the different cancer evolution theories that have been proposed to explain it, and the possible links between normal stem cells, cancer stem cells, and metastasis. Section 2.3 described the stem-cell enriched side populations, while Section 2.4 introduced the main high-throughput gene expression technologies, the public repositories used for such data, and the main differential expression inference methods.

Chapter 3

Meta-analysis

This chapter aims to introduce meta-analysis and describe some of the most

commonly used techniques in this field to provide the context for the methodology used

in my stemness work. Section 3.1 provides a brief overview of the meta-analysis field

and what the concept of meta-analysis actually means and entails. Section 3.2 briefly

discusses how meta-analysis can be used for the analysis of microarray experiments.

The most commonly used integration techniques are introduced in Section 3.3, while

the final Section 3.4 analyzes the applicability of the various integration techniques to

my stemness research and justifies my final technique choice.

3.1 Overview of Meta-analysis

Meta-analysis entails the statistical evaluation of a large number of studies for the measurement of the size and significance of a certain effect [20]. It is common to consider the size of an effect measured by a given study. This effect size could represent

• the extent of a drug effect on a set of patients,

• the level of upregulation of a set of marker genes in a given cell population, or

• the odds of survival after metastasis of a specific tumor type.

The only requirement is that the effect size has to be a measurable quantity. So, how does meta-analysis improve upon previous effect evaluation methods? In measuring one combined effect size from many data sources, meta-analysis can take into account many important, but indirectly relevant factors, such as the size of each study, the variability of each experiment and its results, and the methodological differences between individual studies.

3.2 Meta-analysis and microarray data

Robust cross-study statistical analysis has become very important with the emergence of high-throughput gene expression technologies. Meta-analysis is particularly relevant to microarray analysis, because it provides a consistent framework for the integration of information across different studies, explicitly accounting for differences in platforms, methodology, cell types and other technical factors that can affect the inference of a true biological effect.

Many different meta-analysis methods have been applied to microarray data [36, 66, 104, 160, 167, 200]. These can be broadly classified into three categories: differential expression-based, co-expression-based and, most recently, differential co-expression-based meta-analysis. I will not provide a separate overview of the results and methods

used by independent studies, as most of the recommended and used techniques are sum-

marized in Section 3.3. The most relevant individual studies will be mentioned in the

next chapter, which specifically focuses on previous gene expression-based approaches

to stemness.

Because of the recent development of meta-analysis, books and review papers

have just begun outlining general guidelines – almost in a cookbook recipe style – for

the application of meta-analysis to microarray data. A great example of a review study

that presents a set of potential microarray meta-analysis guidelines is the recent study

of Ramasamy et al. [142]. Even though I do not explicitly follow all recommendations

outlined in this review, I provide a summary of the recommended steps and techniques

as they closely reflect the steps I followed in the development of the methodology for

my stemness studies. Any deviations from the outlined recommendations are discussed

where necessary.

3.3 Techniques for combining study-specific effects

One important aspect in the meta-analysis process involves the selection of

appropriate studies, along with the pre-processing of the input data [142]. The pre-

processing steps can include preparation of the data associated with each study, such

as normalization, or mapping of all probes to a common gene space. However, once the

pre-processing has been completed, the most crucial elements of meta-analysis are the

inference of a study-specific effect and the combination of the individual study-specific

effects.

The choice of appropriate techniques for analysis often depends on the input

data type. Most commonly, researchers use either the raw data directly, or alternatively

the sets of differentially expressed genes that can be estimated from the raw data. For

both of these input types, a measurement can be generated for every probe in every

sample.

Several meta-analysis techniques are in common use. The two most common methods for the integration of effects across multiple studies are p-value combination and rank combination [142]. However, as they are not as relevant to my stemness analysis, I do not discuss them in detail in this dissertation. Other meta-analysis integration methods include vote counting and effect size combination.

3.3.1 Vote counting

One of the simpler, but often very useful meta-analysis approaches is vote

counting. The researcher can simply count the number of studies in which a gene

is marked as present. To assess if the count is actually significant, one can employ

permutation techniques to build a null distribution and estimate a significance score.

Another common strategy used to evaluate the significance of overlap is to compare the

amount of observed overlap to the expected amount of overlap based on the binomial

distribution.
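As a minimal sketch of the idea, assuming each study is reduced to a set of gene identifiers, the vote count and a binomial tail probability can be computed as shown here. The function names, the toy gene lists, and the null probability p_null are illustrative assumptions and are not taken from any of the cited studies.

```python
from math import comb


def vote_counts(gene_lists):
    """Count, for each gene, the number of studies whose list contains it."""
    counts = {}
    for genes in gene_lists:
        for gene in set(genes):  # a study votes at most once per gene
            counts[gene] = counts.get(gene, 0) + 1
    return counts


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical toy input: three studies, each reporting upregulated genes.
studies = [
    {"Itga6", "geneB", "geneC"},
    {"Itga6", "geneC"},
    {"Itga6", "geneD"},
]
counts = vote_counts(studies)

# Assumed null model: every study lists a given gene independently
# with the same probability p_null.
p_null = 0.1
p_values = {g: binomial_tail(k, len(studies), p_null) for g, k in counts.items()}
```

A single shared p_null is the simplest possible null model; a permutation scheme, as mentioned above, would replace binomial_tail with an empirical null distribution.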

3.3.2 Effect size combination

The final meta-analysis technique incorporates indirectly relevant factors into the measure of one common combined effect size. Meta-analysis theory distinguishes between fixed and random effects. The fixed effects model assumes that there is one true common effect size and that all of the variability observed in the individual study effect sizes stems from within-study measurement error [20]. The random effects model, on the other hand, assumes that there is a distribution of true effect sizes from which the individual study effect sizes are drawn, and that the observed variability can be modeled as a combination of the between-study variability in effect sizes and the sampling error [20].

In either case, however, the most important distinguishing feature of this approach is the use of inverse variance weighting. After the measurement of an individual study effect, each study’s contribution to the overall combined effect is weighted by the inverse of the variance associated with that study. The only difference between the fixed and random effects models is that the variance in the former case involves only the within-study variance, while in the latter case it has two components: the within-study and the between-study variance.
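The weighting scheme can be illustrated with a short sketch. It assumes that each study supplies an effect estimate and its within-study variance. The between-study variance is estimated here with the DerSimonian-Laird moment estimator, which is one common choice and an assumption of this example, since the text above does not name a particular estimator. All function names are hypothetical.

```python
def fixed_effect(effects, variances):
    """Inverse-variance pooled effect and its variance (fixed effects model)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return pooled, 1.0 / sum(weights)


def random_effects(effects, variances):
    """Pooled effect, its variance, and tau^2 (random effects model).

    tau^2, the between-study variance, is estimated with the
    DerSimonian-Laird moment estimator and truncated at zero.
    """
    weights = [1.0 / v for v in variances]
    pooled_fixed, _ = fixed_effect(effects, variances)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - pooled_fixed) ** 2 for w, y in zip(weights, effects))
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    k = len(effects)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Each weight now uses within-study plus between-study variance.
    pooled, var = fixed_effect(effects, [v + tau2 for v in variances])
    return pooled, var, tau2
```

When tau^2 is estimated as zero the two models coincide; otherwise the random effects weights are more nearly equal across studies, so large studies dominate less.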

3.4 Choice of techniques for the stemness meta-analysis

The techniques described in the previous section have been applied to different

studies and data types, but there is no universal consensus on the best input data choice

or the best method for combining measurements across studies.

Ramasamy et al. [142] discuss the advantages of using the raw data, as studies can use different methodologies to infer lists of significantly differentially expressed genes. For example, to identify significantly upregulated genes in liver, study A may use a fold change method with a cutoff of 2, while study B may use a cutoff of 4. The difference in cutoffs may place these two studies on unequal footing when assessing the combined set of significantly upregulated genes.
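The effect of the cutoff can be made concrete with a small sketch. The fold-change values below are invented purely for illustration, and the helper name is hypothetical.

```python
def upregulated(fold_changes, cutoff):
    """Return the genes whose fold change meets or exceeds the cutoff."""
    return {gene for gene, fc in fold_changes.items() if fc >= cutoff}


# Hypothetical fold changes for four genes measured in the same tissue.
fold_changes = {"g1": 5.0, "g2": 3.0, "g3": 2.2, "g4": 1.1}

list_a = upregulated(fold_changes, cutoff=2)  # study A's criterion
list_b = upregulated(fold_changes, cutoff=4)  # study B's criterion
only_in_a = list_a - list_b                   # genes that study B would not report
```

With identical underlying data, study A reports three genes and study B reports one, so a naive overlap count would penalize genes g2 and g3 only because of the stricter cutoff.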

However, even though the use of raw data can be preferable or advisable when combining results across studies performed on the same microarray platform within similar tissues, the use of pre-defined lists of differentially expressed genes is invaluable when combining effects across different microarray platforms, precisely because it avoids the need for direct cross-platform integration. As to the differences in the methodology used by different studies for the assessment of significant genes, many authors seem to prefer similar methodology with closely related cutoffs for the inference of significantly differentially regulated genes. Also, one can explicitly model these differences in a manner similar to the inverse variance approach by accounting for the level of specificity or stringency in the measurement of differentially expressed genes in each study. I describe my implementation of such an approach for the stemness meta-analysis

in Chapter 5.

As to the techniques outlined by Ramasamy et al. [142], some are less relevant to the study of stemness, especially if one uses the lists of differentially expressed genes from each study as input. Both the rank combination and p-value combination methods require an explicit assignment of significance to each differentially expressed gene in every study, which is not easily available (p-values) or even meaningful (ranks) for many studies. Because of that, I combine a vote-counting and an effect-size technique to define a recurrence scoring method, which can identify reproducibly upregulated genes across many different stem cell studies.

3.5 Summary

The goal of this chapter was to introduce meta-analysis and some of the basic techniques used to infer combined effect sizes across many different studies. Section 3.1 gave a brief overview of meta-analysis and its basic premise. Section 3.2 introduced the use of meta-analysis in microarray studies. Section 3.3 outlined the four most common techniques used to combine study-specific effect sizes. Finally, Section 3.4 discussed some of the pros and cons of using individual techniques on stem cell data. This discussion is further extended in the next chapter, which introduces some of the pre-existing methods and applications of meta-analysis to the search for a common molecular program shared by all stem cell types.

Chapter 4

Previous expression-based approaches to stemness

The goal of this chapter is to give a broad introduction to the previous work

in the literature on stemness. The previous research can be broadly divided into two

categories: gene-level approaches and global-level approaches. Section 4.1 introduces

the most relevant gene-level approaches, including the three main studies that led the

discussion in this field. Section 4.2 discusses the more global approaches to the study of

stemness and introduces the closest study in the literature to my own stemness work.

It also outlines the elements still missing in the existing literature and highlights the

points of contribution of my work.

4.1 Gene-level approaches to stemness

4.1.1 Founder studies

The concept of stemness, or a shared molecular program between stem cells,

gained prominence with the introduction of the high-throughput microarray technology,

which made possible the large scale analysis of gene expression. Researchers hypothe-

sized that as functionally defined cells, stem cells with their two characteristic properties

of self-renewal and multi- or pluripotency may share a common molecular program, i.e.

there may be specific genes that allow all stem cells to retain their stem cell state.

The first high-throughput gene expression experiments on stem cells were pub-

lished in 2000–2001, but the most relevant studies to stemness are three foundation

experiments published in 2002 and 2003 by Ivanova et al. [79], Ramalho-Santos et al.

[141], and Fortunel et al. [56]. Each of these studies examined several stem cell types

and identified a set of differentially upregulated and differentially downregulated genes between each stem cell type and a corresponding differentiated or mixed cell type. Both Ivanova and Ramalho-Santos studied embryonic, hematopoietic and neural stem cells, while Fortunel examined embryonic, neural and retinal stem cells [56, 79, 141]. Notably, all three studies used a similar microarray platform, but the downstream comparison populations used to identify differentially expressed genes in each stem cell experiment varied between studies. Despite some of the methodological differences in stem cell and comparison cell definitions, if a common molecular program did exist, it should have emerged from the three study comparisons, especially since the comparisons within a

stem cell type between the different studies identified many common differentially upregulated genes, so the differences were not sufficient to mask the common mechanisms within a single stem cell type.

Interestingly, each study could also identify a high number of genes commonly upregulated across stem cell types within the study (each Venn bubble in Figure 4.1), but surprisingly, as Fortunel et al. [56] reported, there was only one gene commonly upregulated across all three studies and nine stem cell populations (Figure 4.1). The Fortunel study used a simple vote-counting procedure, which is highly appropriate for meta-analysis on a small number of studies [142], and directly counted the number of experiments in which each gene was identified as significantly upregulated. Integrin α6 (Itga6) was the only gene that shared upregulation across all stem cell types and studies. Since then, other transcriptional profiling studies have also shown a low overlap with the genes selected by the founder studies and few stemness genes have been identified to date [73].

Figure 4.1: Overlap between the genes upregulated in four stem cell types in three stemness founder studies. Each Venn bubble represents all genes that are commonly upregulated between all tested stem cell populations within that study (Ivanova: 283 genes; Ramalho-Santos: 230 genes; Fortunel: 385 genes); the single gene common to all three studies is Itga6. Figure is adapted from the original study of Fortunel et al. [56].

There are several possible factors that can account for the lack of stemness genes [50]. The first one, as mentioned above, involves methodological differences between the comparison populations, but others include

• Absence of stemness genes from the microarray platform

• Transient expression of stemness genes that has not been captured by a “static” population

• Expression of stemness genes in differentiated cells with regulation at the post-transcriptional level

• Stemness defined at a modular level, such that individual genes are dispensable, but shared pathways, protein complexes, or functional gene modules are necessary for the maintenance of stem cell state.

While some of the more technical factors have since lost their relevance (for

example, most recent experiments have been performed on genome-wide microarray

platforms) and others cannot be addressed with the use of microarrays and gene expres-

sion (for example, if the stemness gene levels are determined post-transcriptionally),

more evidence is pointing the search for stemness to more global approaches, such as

the identification of pathways and complexes shared between all stem cell types.

Before I discuss the global-level approaches to stemness, I first focus on another

gene-based study with a similar aim that identified stem cell markers through data

integration. Krzyzanowski et al. [102] identified genes that distinguish between two

conditions, such as a stem cell population and a selected control condition. Specifically,

they analyzed every probe set (gene) to find genes that create “gaps” between samples in their data, i.e. genes for which there is a partition of the data under which the probe set shows a high level of expression in one of the subpartitions and a low level of expression in the other. Each probe set was scored for its expression in the two sample partitions using the U-statistic associated with the non-parametric Mann-Whitney test [102].
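To make the scoring step concrete, the following sketch computes the Mann-Whitney U statistic for one probe set under a given two-way partition of the samples. It is not the published procedure of Krzyzanowski et al.: how candidate partitions are chosen is not reproduced here, and the function names and the normalization to a 0-1 "separation" score are assumptions made for illustration.

```python
def mann_whitney_u(high, low):
    """U statistic of `high` versus `low`.

    Counts the pairs (h, l) with h > l; tied pairs contribute 0.5.
    """
    u = 0.0
    for h in high:
        for l in low:
            if h > l:
                u += 1.0
            elif h == l:
                u += 0.5
    return u


def separation(high, low):
    """U scaled by the number of pairs: 1.0 means every value in `high`
    exceeds every value in `low`, i.e. a clean gap between the partitions."""
    return mann_whitney_u(high, low) / (len(high) * len(low))
```

For example, separation([5, 6, 7], [1, 2, 3]) evaluates to 1.0, while interleaved expression values give scores near 0.5.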

The approach of Krzyzanowski et al. [102] is relatively similar to other approaches that aim to directly identify consistently differentially expressed genes in a large set of studies (a vote counting strategy). The difference in the Krzyzanowski study is that the sample partition is not predefined, so it is possible to discover marker genes associated with different stem cell types. Their analysis identified 426 possible stem cell markers for many different individual stem cell types, but many of these markers have already been discovered.

The authors also explored the evolution of stem cell marker genes by observing

superfamilies of proteins that have been independently suggested as stem cell markers

but show very different tissue-stem cell specificities. Even though the complementarity of protein function and specialization to different stem cell types is hinted at in this study, the authors chose the simplest possible data integration method, and each study in a given cell type was treated as a replicate. Such treatment, however, was only

possible because the same microarray platform was used in all datasets included in the

study.

4.2 Global-level approaches to stemness

Recently, researchers have started looking at more global mechanisms to understand stemness, and several independent efforts have been made in this direction. One common feature of all global-level stemness studies is their use of pathways and functional gene sets to either distinguish between different stem cell types, or find common mechanisms.

A large-scale effort by Kluger et al. [94] attempted to identify gene set modules that could discriminate between different stem cell populations. The study focused on a number of stem, progenitor, and differentiated cell populations, such as neutrophils, monocytes, macrophages, lymphocytes, and hematopoietic stem cells, and the expression profiles of all populations were assessed using oligonucleotide arrays. To find large gene sets that could differentiate between these populations, the authors applied principal component analysis (PCA) to various pathways and functional gene sets as defined by Biocarta, KEGG [85, 86] and Gene Ontology [7]. They identified gene sets that could be used as good features in a classification setting because of their ability to accurately separate the different lineages [94]. These results suggested that gene sets and pathways could play a significant role in classification, due to their ability to capture more subtle changes in transcription that may be otherwise missed by differential expression

and gene-level approaches.

Another study with a similar aim was recently published by Doherty et al. [47]. The goal of the study was to identify higher-level patterns of expression shared between different stem cells as compared to transit-amplifying and differentiated cells, based on GO gene set activation differences between cell types at different levels of differentiation. One interesting aspect of this paper was that the authors avoided the use of raw data and cross-platform analysis by reducing the raw data for each sample to a single vector, where each entry was associated with one GO category. The specific value associated with the entry corresponded to the fraction of genes in that category that were activated in the sample of interest. Finally, a two-tailed t-test was applied for each GO category to identify categories that could significantly distinguish between stem, transit-amplifying and differentiated cells.
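The reduction of a sample to a per-category vector can be sketched as follows. This is an illustration under stated assumptions and not the code of Doherty et al.: the category names are placeholders and not real GO identifiers, and the test statistic shown is Welch's t, chosen here because the text above does not say which variant of the t-test was used. Converting the statistic to a two-tailed p-value would additionally require the t distribution, which is omitted.

```python
from math import sqrt
from statistics import mean, variance


def activation_fractions(active_genes, categories):
    """Map each category to the fraction of its genes active in one sample."""
    return {
        name: len(genes & active_genes) / len(genes)
        for name, genes in categories.items()
        if genes  # skip empty categories to avoid division by zero
    }


def welch_t(a, b):
    """Welch's t statistic for two groups of per-sample fractions.

    Each group needs at least two values, and the two groups must not
    both have zero variance.
    """
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Hypothetical categories and one hypothetical sample.
categories = {"category_A": {"g1", "g2", "g3", "g4"}, "category_B": {"g5", "g6"}}
sample_vector = activation_fractions({"g1", "g2", "g5"}, categories)

# Fractions for one category across three stem and three differentiated samples.
t_stat = welch_t([0.8, 0.9, 1.0], [0.1, 0.2, 0.3])
```

Because every sample becomes a vector indexed by category and not by probe, samples from different platforms can be compared without mapping probes onto each other.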

In 2008, Muller et al. [117] introduced the PluriNet, a protein-protein interaction network of embryonic stem cell-specific genes that can be used to define and classify the embryonic stem cell state. To achieve this goal, the researchers applied non-negative matrix factorization to their stem cell data, which allowed the clustering of experiments with similar expression patterns, including embryonic and induced pluripotent stem cells.

Perhaps the most relevant work to the study of stemness is the “stem cell

module map.” Wong et al. [194] initially tried to construct a stemness signature in mouse and human through measurement of the activation of functional gene modules in each stem cell and differentiated cell type used in their study. Even though the authors

did not find a shared stemness signature, they defined two independent signatures: an

embryonic-stem cell-like and an adult stem cell-like signature and showed that one of

the key regulators of the embryonic stem (ES) cell-like signature is c-myc [194]. This

transcription factor has the ability to induce the ES-signature in a normal epithelial cell,

causing its transformation to an epithelial cancer stem cell. Biologically, this regulatory

dependency allows the induction of a cancer-like state and the potential creation of

pluripotent cancer stem cells, which could be invaluable for more advanced functional

laboratory studies.

The advantage of these global-level approaches to stemness is the recognition of

the stochastic nature of gene expression and the dispensability of individual genes for the

maintenance of the stem cell state. However, one aspect that all global-level approaches

to date overlook is the potential functional redundancy afforded by gene duplications and paralogs. The only study to hint at functional redundancy between evolutionarily related proteins is the study of Krzyzanowski et al. [102], but their focus is still primarily gene-based and the study relies on very simple data integration techniques. Another disadvantage of the majority of previous studies is the lack of a unified scheme for integration of stem cell experiments that could account for the differences in specificity, the potential underlying correlation between individual experiments, and other factors that can potentially skew the analysis, such as the influence of primary versus cultured cells.

The methodology I introduce in the next chapter tries to address these im-

portant issues and fill the voids in the current literature. I make use of meta-analysis

techniques to define a unified scheme for the measurement of reproducible and recurrent

gene or module expression across many experiments. I also explicitly test the functional

redundancy hypothesis, evaluate the contribution of homolog-based gene modules to

the maintenance of the stem cell state and compare their role in stemness to the one of

functional gene modules.

4.3 Summary

This chapter presented a summary of the previous work in the literature on

shared stem cell mechanisms. Section 4.1 introduced the founder studies – the best

known gene-level stemness approaches, as well as other gene-level analyses. Section 4.2

briefly described the more global approaches in the literature used to either distinguish

between different stem cell types, or between cells at various stages of differentiation.

This section also described some of the possible improvements that my stemness meta-

analysis methodology (introduced next) could contribute to the literature.

Chapter 5

Stemness Meta-Analysis Method

This chapter provides a description of the methodology used in this study to

identify and measure stemness mechanisms in different types of stem cells. Section 5.1

gives a brief outline of the Stemness Meta-Analysis method. The method that I describe

has two major inputs: Section 5.2 summarizes the input profiling data used to generate

mouse and human stem cell compendia, while Section 5.3 describes the generation of

functional and evolutionarily-related gene modules. The subsequent three sections de-

scribe in detail the individual steps—recurrence (Section 5.4), diversity (Section 5.5) and

specificity (Section 5.6)—of the meta-analysis and the selection of appropriate scoring

methods for each step. Section 5.7 uses these three scoring measures to classify modules

into different pattern types, including the patterns associated with stemness. To aid

the understanding of the reader, a graphical flowchart overview of the SMA method

is shown in Figure 5.1 and it summarizes the input, steps and classification patterns

described in the chapter. Section 5.8 outlines the two types of stemness-associated

modules – stemness “on” and stemness “off” modules. Finally, Section 5.9 describes a few scoring metrics that could be used to define a stemness index score – a score that measures how stem cell-like a gene signature is.

Figure 5.1: Overview of stemness meta-analysis method. The two main inputs to the method are gene modules and a large gene expression compendium. The gene modules (Input A) encompass both evolutionarily-related sets of genes, and functionally-associated sets of genes, such as pathways and protein complexes. The large stem cell gene expression compendium (Input B) consists of lists of differentially upregulated and downregulated genes (PGLs) derived from the literature. The method identifies recurrently expressed modules from the input data (Step 1), and classifies their expression pattern based on two additional scores – cell diversity and specificity. The final output consists of stemness “on” modules (red dashed line), which are upregulated across most stem cell types, and stemness “off” modules (green dashed line), which are downregulated across most stem cell types. [Flowchart labels: Input A (Modules); Input B (Stem cell expression compendium, PGLs); Step 1: Recurrence; Recurrent modules; Diversity; Specificity; Patterns: AFA (all-for-all), OFA (one-for-all), AFO (all-for-one), CM (constitutive module), CG (constitutive gene); Stemness modules: Stemness ("on"), Stemness ("off"). The pattern panels plot genes g1–g4 against stem cell types SC1–SC4.]

5.1 Stemness Meta-Analysis method overview

The Stemness Meta-Analysis (SMA) method uses two input types. The first

input consists of evolutionary or functional gene modules. Here the term module is used

in an all-inclusive manner and incorporates individual genes as well. The second input

consists of the published gene lists (PGL) of differentially expressed genes as identified

by individual stem-cell-related studies. This second input can also be visualized as

a matrix, where each row-column entry can be represented by one of three values:

differentially upregulated, not differentially upregulated, or not tested (Figure 5.2).

Two such matrices are used: one is associated with the stem cell data, and the other is associated with the differentiated cell data, such that the genes that are differentially upregulated in the differentiated cells can also be viewed as the differentially downregulated genes in the stem cells. The stem cell upregulated and downregulated gene lists are separated for both ease of computation and clear determination of the role (“on” or “off”) of individual gene modules.

Let the set of genes tested in experiment $j$ be denoted as $W_j$. I will define the set of genes upregulated in experiment $j$ as $U_j$ and the set of genes downregulated in experiment $j$ as $D_j$. I will further introduce two indicator matrices $X$ (an upregulation status matrix) and $T$ (a test status matrix), such that $x_{ij}$ is 1 if the gene $i$ is upregulated in experiment $j$, and it is 0 otherwise. Similarly, $t_{ij}$ is 1 if the gene $i$ is tested in experiment $j$, and it is 0 otherwise.
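
The definitions above can be illustrated with a minimal Python sketch. It assumes each experiment is available as two sets of gene identifiers (tested and upregulated); the function names and the toy data are hypothetical and are not part of the original analysis.

```python
def build_indicator_matrices(genes, tested, upregulated):
    """Return (X, T) as lists of rows: one row per gene, one column per experiment.

    tested[j] plays the role of W_j and upregulated[j] the role of U_j.
    """
    n_exp = len(tested)
    X = [[1 if g in upregulated[j] else 0 for j in range(n_exp)] for g in genes]
    T = [[1 if g in tested[j] else 0 for j in range(n_exp)] for g in genes]
    return X, T


def trinary_entry(x_ij, t_ij):
    """Map an (x_ij, t_ij) pair to one of the three marks of the trinary matrix."""
    if not t_ij:
        return "not tested"
    return "upregulated" if x_ij else "not upregulated"


# Hypothetical toy data: four genes and two experiments.
genes = ["g1", "g2", "g3", "g4"]
W = [{"g1", "g2", "g3"}, {"g2", "g3", "g4"}]   # tested genes per experiment
U = [{"g1"}, {"g3", "g4"}]                     # upregulated genes per experiment
X, T = build_indicator_matrices(genes, W, U)
```

Keeping $T$ separate from $X$ is what allows a zero in $X$ to be read as either "not upregulated" or "not tested".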

An obvious advantage of using PGLs reported by the authors of each original study is that it allows the results from different microarray platforms to be compared. The set of reference cell populations can vary between different studies and thus, the genes identified as upregulated (or downregulated) in stem cells from different studies can also vary. However, the goal of the analysis was to detect robust and reproducible stem cell-specific expression over multiple studies, so while the variability might impact the detection sensitivity, I expect few false positives.


[Figure 5.2 labels: rows g1 to g4; columns Exp1 to ExpN; legend: upregulated, not upregulated, not tested]

Figure 5.2: Published gene list (PGL) input form to Stemness Meta-Analysis Method (SMA). The data can be represented as a trinary matrix, such that each column corresponds to a single literature-derived experiment, while each row represents a single gene. In a given row and column, the entry corresponds to one of three marks: “upregulated” (green star), “not upregulated” (red cross), or “not tested” (white square).

The SMA method follows several steps: the initial step tests for the existence of genes and gene modules with stem-cell-associated patterns of expression across multiple studies. To measure whether a module (or gene) is upregulated in a significant number of studies, I compute a recurrence score, which builds upon previous meta-analysis techniques and incorporates the redundancy among studies. Some PGLs have a high degree of overlap with other PGLs in the input data, such as lists of differentially expressed genes in the same stem cell type or those obtained from the same study. To avoid double-counting redundant information, I group highly similar PGLs into equivalence classes and weight the score such that each equivalence class is allowed to contribute one unit of weight.
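
The weighting idea can be sketched as follows. This is only an illustration of "one unit of weight per equivalence class", under the assumption that the unit is split evenly among the members of a class; it is not the recurrence score itself, and the function names and class labels are hypothetical.

```python
from collections import Counter


def pgl_weights(class_of):
    """class_of[j] is the equivalence-class label of experiment j.

    Assumption: each class contributes a total weight of 1, split evenly
    among its member experiments.
    """
    sizes = Counter(class_of)
    return [1.0 / sizes[label] for label in class_of]


def weighted_upregulation_count(x_row, t_row, weights):
    """Return (observed, possible) weighted counts for one gene.

    observed: total weight of experiments in which the gene is upregulated.
    possible: total weight of experiments in which the gene was tested.
    """
    observed = sum(w for x, t, w in zip(x_row, t_row, weights) if t and x)
    possible = sum(w for t, w in zip(t_row, weights) if t)
    return observed, possible


# Hypothetical example: two near-replicate HSC lists plus two independent lists.
class_of = ["HSC-rep", "HSC-rep", "ESC", "NSC"]
weights = pgl_weights(class_of)
observed, possible = weighted_upregulation_count([1, 1, 0, 1], [1, 1, 1, 0], weights)
```

In this toy case the two redundant HSC lists together count as a single observation, so the gene is credited with one weighted upregulation out of two weighted tests.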

The method then applies a cell-diversity measure, based on information-theoretic entropy, to quantify how a module’s (or gene’s) upregulation is distributed across different stem cell types. Genes or modules upregulated to the same extent in each stem cell type (i.e. same fraction of studies for each type) are associated with a high cell-diversity, while those expressed disproportionately are assigned a low cell-diversity.
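
A minimal sketch of an entropy-based diversity measure is given below. It assumes the input is, for each stem cell type, the fraction of studies in which the module is upregulated, and it normalizes the Shannon entropy by its maximum so that the value lies between 0 and 1. The normalization and the function name are assumptions made for illustration; the exact score used in this work is defined later in the chapter.

```python
import math


def cell_diversity(fraction_by_type):
    """Normalized Shannon entropy of upregulation across stem cell types.

    fraction_by_type maps a cell type to the fraction of its studies in which
    the module is upregulated. Returns 1.0 for a perfectly even distribution
    and 0.0 when all upregulation is confined to one type.
    """
    total = sum(fraction_by_type.values())
    n_types = len(fraction_by_type)
    if total == 0 or n_types < 2:
        return 0.0
    entropy = 0.0
    for fraction in fraction_by_type.values():
        if fraction > 0:
            p = fraction / total
            entropy -= p * math.log(p)
    return entropy / math.log(n_types)


even = cell_diversity({"HSC": 0.5, "ESC": 0.5, "NSC": 0.5, "SSC": 0.5})
skewed = cell_diversity({"HSC": 1.0, "ESC": 0.0, "NSC": 0.0, "SSC": 0.0})
```

A module upregulated in the same fraction of studies for every type scores close to 1, while one confined to a single type scores 0.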

Finally, to measure whether a module’s (or gene’s) pattern of upregulation is specific to stem cells, I also measure the level of its upregulation in differentiated cells. Genes or modules found to be significantly upregulated in stem cells may also have roles in differentiated cells. However, those with upregulation specific to stem cells may shed light on stemness properties. I quantify the degree to which a gene was not upregulated in differentiated cells using a specificity score, based on the level of recurrence of the module (or gene) across differentiated cells.

Altogether, modules that exhibit significant recurrence scores across stem cell experiments, significant cell-diversity across most stem cell types, and significant specificity to stem cells are labeled as stemness “on” modules. The SMA method can also be applied to the differentiated cell data, so modules that show significant recurrence scores across differentiated cell experiments, significant cell-diversity across most differentiated cell types, and significant specificity to the differentiated cells are identified as stemness “off” (differentiation) modules. The stemness “on” modules include genes that need to be consistently upregulated by stem cells, while stemness “off” modules include genes that need to be consistently downregulated by stem cells (or upregulated by differentiated cells). To avoid redundancy, the descriptions of the methodology focus on the selection of stemness “on” modules; the selection of stemness “off” modules is exactly the same, only based on the differentiated cell data.

I use the recurrence, diversity and specificity scores to test for many module-level patterns of upregulation. A gene may be weakly associated with stemness (e.g., differentially expressed in a fraction of studies) because of the stochastic nature of gene expression. The cell’s use of alternate pathways and the ability of genes to compensate for one another, as evidenced by extensive genetic synthetic-lethal maps in several model organisms [27, 177], suggest that an examination of evolutionarily-related gene modules could identify stemness patterns. Genes also interact with one another in signaling cascades, complexes, and metabolic reaction pathways, so interpreting gene expression data using sets of genes that approximate modules of activity, rather than single genes in isolation, may also elucidate significant patterns of expression associated with stemness. A conceptual illustration of stemness at the module level is shown in Figure 5.3.

[Figure 5.3 labels: Stem cell type 1, Stem cell type 2, Stem cell type 3; genes a1 to a3, b1 to b3, c1 to c4, d, f1 to f3; legend: module member interactions, upregulated gene, non-upregulated gene, putative stemness module]

Figure 5.3: Module-level view of stemness. A stemness module (dashed line) could represent an evolutionarily-related set of genes (a1, a2, a3), a pathway (b1, b2, b3), or a protein complex (c1, c2, c3, and c4), in which individual genes are not upregulated in all stem cell types, but the module through the cooperative upregulation (red) of its members is represented in all stem cell types.

A module may have several possible significant patterns of upregulation in stem cells consistent with it playing a key role in stem cell function. I use the diversity score, coupled with recurrence and specificity, to categorize patterns of module upregulation. A module may contain genes upregulated across a diverse set of stem cells, which I quantify using the cell-diversity score introduced earlier. In addition, a module may employ many or a few constituent members, which I quantify with an analogous gene-diversity score. Combinations of cell-diversity, gene-diversity, and specificity represent possible module-level patterns as illustrated in Figure 5.4. All-for-all (AFA) modules express many members in most stem cell types and offer the clearest example of stemness-associated groups of genes (Figure 5.4: first row). One-for-all (OFA) modules express predominantly a single gene in the majority of stem cell types, even though they may have other gene members occasionally active as well (Figure 5.4: second row). All-for-one (AFO) modules express many genes in a limited set of cell types and may represent modules with redundant gene functions important for a specific stem cell lineage but dispensable for others (Figure 5.4: third row). Constitutive module sets (CM) express many genes in most stem cell types, but also across a significant proportion of differentiated cell types (Figure 5.4: fourth row). Finally, constitutive gene sets (CG) express predominantly a single gene in most stem cell types, but also show significant upregulation in differentiated cells (Figure 5.4: fifth row).
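
The five pattern types can be summarized as a small decision rule. The sketch below assumes that each score has already been compared against its threshold and reduced to a boolean; the function name is hypothetical, and the combination of low cell-diversity with low gene-diversity, which is not assigned a pattern in the text, is returned as `None`.

```python
def classify_module(high_cell_diversity, high_gene_diversity, specific):
    """Assign a module-level pattern from three thresholded scores.

    Each argument is True when the corresponding score passes its threshold.
    """
    if high_cell_diversity and high_gene_diversity:
        # Many members in most stem cell types.
        return "AFA" if specific else "CM"
    if high_cell_diversity:
        # Predominantly one gene in most stem cell types.
        return "OFA" if specific else "CG"
    if high_gene_diversity:
        # Many members in few cell types; specificity is not considered.
        return "AFO"
    return None  # combination not covered by the five patterns


patterns = {
    (c, g, s): classify_module(c, g, s)
    for c in (True, False)
    for g in (True, False)
    for s in (True, False)
}
```

Only the AFA and OFA patterns combine broad stem cell coverage with specificity, which is why they are the ones labeled as stemness patterns.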

5.2 Input data sources

5.2.1 Mouse profiling studies

5.2.1.1 Definitions and sources

I used 30 different studies, corresponding to 49 different mouse cell populations or 12 different stem cell types from published transcriptional profiling studies. The descriptions of the publications, cell types and other additional information on each study have been included in Tables 5.1 (primary/freshly isolated cell data) and 5.2 (cultured cell data). The experiments used as input to the analysis have been separated based on the source of the cells: primary or cultured. Primary cells are freshly isolated cells derived directly from mouse tissues, while cultured cells are grown in vitro.

| Pattern | Cell diversity | Gene diversity | Specificity |
|---|---|---|---|
| All-for-all (AFA) "stemness" | + | + | + |
| One-for-all (OFA) "stemness" | + | - | + |
| All-for-one (AFO) | - | + | +/- |
| Constitutive module (CM) | + | + | - |
| Constitutive gene (CG) | + | - | - |

[Fifth column, pattern examples: one g1 to g4 by SC1 to SC4 grid per pattern (two alternatives for AFA); images not reproduced]

Figure 5.4: Classification of modules based on cell-diversity, specificity and gene-diversity scores. The first column gives the pattern type names. The second, third and fourth columns indicate the criteria a module needs to pass to be associated with a given pattern type. A ‘+’ symbol indicates that the score is above the threshold, while a ‘-’ indicates that the score is below the threshold. The symbol ‘+/-’ is used when the threshold is not important. The fifth column shows examples of each pattern type. Red indicates that a given gene is upregulated in the stem cell, while green indicates the gene is upregulated in the differentiated cell.

For each data source, I obtained the lists of upregulated clones in each stem cell population either directly from the publication, or through correspondence with the authors of the publication. In both cases, the lists were used either as directly reported in the publication, or as inferred based on the methodology described in the paper. After the initial selection of upregulated clones from each study, each clone was mapped to its corresponding Entrez Gene ID. If Entrez Gene did not have a mapping for a clone, the clone was excluded from further analysis. The final number of upregulated genes in each stem cell population can be found in Tables 5.1 and 5.2.

| Author | Class | Primary stem cell | # upregulated | Citation |
|---|---|---|---|---|
| Ivanova | Hematopoietic | HSC | 794 | [79] |
| R-Santos | Hematopoietic | HSC | 1476 | [141] |
| Forsberg | Hematopoietic | LT-HSC | 402 | [55] |
| Akashi | Hematopoietic | LT-HSC | 702 | [3] |
| Kiel | Hematopoietic | FL-HSC | 346 | [90] |
| Kiel | Hematopoietic | FS-HSC | 236 | [90] |
| Kiel | Hematopoietic | Femur-HSC | 401 | [90] |
| Kiel | Hematopoietic | Pelvis-HSC | 303 | [90] |
| Kiel | Hematopoietic | Sternum-HSC | 316 | [90] |
| Kiel | Hematopoietic | HSC | 928 | [91] |
| Terskikh | Hematopoietic | HSC | 30 | [175] |
| Chambers | Hematopoietic | HSC | 272 | [34] |
| Fortunel | Retinal | RPC | 1716 | [56] |
| Stappenback | Intestinal | SiEP | 159 | [165] |
| Giannakis | Intestinal | SiEP | 1440 | [61] |
| Fevr | Intestinal | InSC | 427 | [54] |
| Mills | Gastric | GEP | 135 | [112] |
| Giannakis | Gastric | GEP | 1366 | [61] |
| Morris | Epithelial | EpSC | 92 | [114] |
| Tumbar | Epithelial | EpSC | 106 | [180] |
| Orwig | Spermatogonial | SSC | 320 | [126] |
| Kokkinaki | Spermatogonial | SSC | 341 | [96] |

Table 5.1: List of non-cultured (primary) mouse stem cell profiling studies used in the mouse stem cell compendium. The first column represents the author’s name used as a reference for the study. The second column represents the stem cell type in which the experiment has been classified. The third column identifies the more specific stem cell population name. The fourth column gives the number of upregulated Entrez Gene genes in each stem cell population, while the final column gives the citation to the original paper.

| Author | Class | Cultured stem cell | # upregulated | Citation |
|---|---|---|---|---|
| Ivanova | Embryonic | ESC | 666 | [79] |
| RSantos | Embryonic | ESC | 1365 | [141] |
| Fortunel | Embryonic | ESC | 1276 | [56] |
| Aiba | Embryonic | F-ESC | 633 | [1] |
| Aiba | Embryonic | G-ESC | 1784 | [1] |
| Aiba | Embryonic | N-ESC | 1467 | [1] |
| Aiba | Embryonic | Z-ESC | 1698 | [1] |
| Aiba | Embryonic | P-ESC | 1109 | [1] |
| Aiba | Embryonic | iPSCbxo | 1739 | [1] |
| Aiba | Embryonic | iPSC | 1765 | [1] |
| Hirst | Embryonic | ESC | 500 | [73] |
| Sharov | Embryonic | ES/EG | 97 | [156] |
| Sharova | Embryonic | ESC/EGC | 1462 | [157] |
| Tanaka | Embryonic | ES/TS | 23 | [173] |
| Ivanova | Neural | NSC | 721 | [79] |
| RSantos | Neural | NSC | 1839 | [141] |
| Aiba | Neural | NSC | 800 | [1] |
| Fortunel | Neural | NSC | 1315 | [56] |
| Easterday | Neural | NS | 65 | [49] |
| Karsten | Neural | NSC | 125 | [88] |
| Buchstaller | Neural | NCSC | 190 | [26] |
| Aiba | Trophoblast | TSC | 1208 | [1] |
| Chateauvieux | Mesenchymal | MSC | 254 | [35] |
| Sharov | Mesenchymal | MS/NS | 24 | [156] |
| Ochsner | Liver | BMEL | 874 | [122] |
| Behbod | Mammary | MG-SP | 269 | [14] |
| Oatley | Spermatogonial | SSC | 176 | [121] |

Table 5.2: List of cultured mouse stem cell profiling studies used in the mouse stem cell compendium. The first column represents the author’s name used as a reference for the study. The second column represents the stem cell type in which the experiment has been classified. The third column identifies the more specific stem cell population name. The fourth column gives the number of upregulated Entrez Gene genes in each stem cell population, while the final column gives the citation to the original paper.

I also collected PGLs that represented the genes upregulated in 49 differentiated cell populations from the same studies and grouped the lists according to the stem cell from which the differentiated cell was derived. For example, all blood system populations were grouped together and labeled as expressed in cells derived from hematopoietic stem cells.

5.2.1.2 Establishment of replicate sets in mouse dataset compendium

One aspect that most previous expression-based stemness studies did not account for was the redundancy between individual studies. In assessing the contribution of independent experiments, this step is crucial as it ensures that very similar data sources are not counted twice. It is especially important to account for redundancy if the source of similarity is technical, rather than biological.

To account for the redundancy between studies, I initially evaluated the similarity between individual experiments by measuring the overlap of the genes upregulated in every pair of experiments. In more concrete terms, for every pair of experiments, starting from $(U_1, U_2)$ to $(U_{48}, U_{49})$, I evaluated the significance of overlap between the experiments in the pair as estimated using the hypergeometric distribution. To decide what experiments would be grouped together in a replicate set, if a pair $(U_{j_1}, U_{j_2})$ showed a similarity stronger than $p = 10^{-50}$, it was kept for further examination.
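
The pairwise overlap test can be sketched with the standard library alone. The sketch computes the upper tail of the hypergeometric distribution exactly, using rational arithmetic so that very small p-values do not underflow. The choice of background (`universe_size`, the number of genes that could have appeared in either list) is an assumption, as are the function names and the toy data.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def overlap_pvalue(set_a, set_b, universe_size):
    """P(overlap >= observed) under the hypergeometric distribution.

    universe_size is the assumed number of genes in the background.
    """
    k = len(set_a & set_b)
    K, n, N = len(set_a), len(set_b), universe_size
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return Fraction(tail, comb(N, n))


def candidate_pairs(pgls, universe_size, cutoff=Fraction(1, 10**50)):
    """Return pairs of experiments whose overlap p-value is below the cutoff."""
    return [
        (a, b)
        for a, b in combinations(sorted(pgls), 2)
        if overlap_pvalue(pgls[a], pgls[b], universe_size) < cutoff
    ]


# Hypothetical toy lists over a background of 10 genes.
pgls = {"e1": {1, 2, 3, 4, 5}, "e2": {1, 2, 3, 4, 6}, "e3": {7, 8, 9}}
p_12 = overlap_pvalue(pgls["e1"], pgls["e2"], 10)
pairs = candidate_pairs(pgls, 10, cutoff=Fraction(1, 5))
```

With realistic list sizes the default cutoff of $10^{-50}$ applies; the toy example uses a much looser cutoff only because a background of 10 genes cannot produce p-values that small.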

I also tried less stringent p-value cutoffs ($p = 10^{-5}$, $p = 10^{-10}$, and $p = 10^{-20}$), but they were not sufficiently stringent to separate experiments of the same stem cell type from experiments of different stem cell types. More stringent cutoffs separated not only the experiments of the different stem cell types, but also most of the experiments of the same stem cell type.

The p-value cutoff may look misleadingly stringent, but since the p-value is size-dependent and some experiments can show a high level of overlap across cell types, it was a reasonable starting point. I used Cytoscape [155] to visualize the sets of connected components (experiments) and clustered the network using an edge-weighted spring-embedded layout, where the weight represented the strength of the p-value.

The final replicate sets (Figure 5.5) were selected based on their higher level of similarity within the same cell type and study, as compared to other cell types. A few experiments were borderline and could be associated with a replicate set (i.e., the Ramalho-Santos HSC signature), but since they represented experiments associated with a different study from the rest of the replicate set list, they were excluded from the final replicate set.

5.2.2 Human profiling studies

5.2.2.1 Study descriptions

In human, I collected published gene lists of significantly upregulated genes

from 33 studies for 49 populations, corresponding to 9 general cell types. The first

author names, publications, cell types and the numbers of upregulated genes in each

study have been summarized in Tables 5.3 (cultured cell data) and 5.4 (primary cell

data).

I also collected 38 PGLs of differentially downregulated genes in the same individual stem cell experiments. The downregulated PGLs represented 8 general stem cell types.

[Figure 5.5 node labels: ESC, HSC, NSC, Retinal SC, Intestinal SC, Liver SC, Epithelial SC, Gastric SC, SSC, MSC, TSC, Mammary SC]

Figure 5.5: Gene-based similarities between all experiments in mouse stem cell compendium. Nodes represent individual experiments, while edges are drawn between nodes if the similarity between the two experiments exceeds a p-value cutoff. Nodes are colored by stem cell type, cross-cell type edges are marked in red, while within-cell type links can be identified in blue. We observed many links at higher levels of similarity and selected replicate sets based on the closer similarity of the experiments to each other than to any other experiments. The final replicate sets are circled in black.

5.2.2.2 Establishment of replicate sets in human dataset compendium

| Author | Class | Cultured stem cell | # upregulated | Citation |
|---|---|---|---|---|
| Armstrong | Embryonic | ESC | 156 | [6] |
| Enver | Embryonic | Normal-ESC | 2520 | [52] |
| Enver | Embryonic | Adapted-ESC | 1997 | [52] |
| Player | Embryonic | ESC | 847 | [136] |
| Xu | Embryonic | ESC | 1793 | [198] |
| Calhoun | Embryonic | ESC | 69 | [29] |
| Cao | Embryonic | ESC | 1489 | [30] |
| Brandenberger | Embryonic | ESC | 477 | [25] |
| Brandenberger | Embryonic | ESC-MPSS | 149 | [24] |
| Cai | Embryonic | ESC | 226 | [28] |
| Sato | Embryonic | ESC | 823 | [152] |
| Skottman | Embryonic | ESC | 163 | [159] |
| Sperger | Embryonic | ESC | 1174 | [164] |
| Beqqali | Embryonic | ESC | 98 | [15] |
| Boquest | AdiposeStromal | ADSC | 252 | [19] |
| Dontu | Mammary | MamSC | 77 | [48] |
| Wright | Neural | NSC | 370 | [195] |
| Huang | Neural | NSC | 1251 | [74] |

Table 5.3: List of cultured human stem cell profiling studies collected in human stem cell compendium. The first column represents the author’s name used as a reference for the study. The second column represents the stem cell type in which the experiment has been classified. The third column identifies the more specific stem cell population name. The fourth column gives the number of upregulated Entrez Gene genes in each stem cell population, while the final column gives the citation to the original paper.

| Author | Class | Primary stem cell | # upregulated | Citation |
|---|---|---|---|---|
| Eckfeldt | Hematopoietic | BM-HSC | 1623 | [50] |
| Eckfeldt | Hematopoietic | UCB-HSC | 1022 | [50] |
| Jaatinen | Hematopoietic | HSC | 317 | [80] |
| Huang | Hematopoietic | CD133-HSC | 840 | [74] |
| Hemmoranta | Hematopoietic | CD133-HSC | 152 | [70] |
| Hemmoranta | Hematopoietic | CD34-HSC | 111 | [70] |
| Komor | Hematopoietic | HSC-Er | 50 | [97] |
| Komor | Hematopoietic | HSC-Gr | 28 | [97] |
| Komor | Hematopoietic | HSC-Megak | 34 | [97] |
| Kim | Hematopoietic | HSC-B | 277 | [93] |
| Kim | Hematopoietic | HSC-CD4T | 287 | [93] |
| Kim | Hematopoietic | HSC-CD8T | 263 | [93] |
| Kim | Hematopoietic | HSC-NK | 181 | [93] |
| Kim | Hematopoietic | HSC-Mye | 482 | [93] |
| Kim | Hematopoietic | HSC-Mono | 334 | [93] |
| Kim | Hematopoietic | HSC-ImmDendr | 335 | [93] |
| Kim | Hematopoietic | HSC-MatDendr | 170 | [93] |
| Wagner | Hematopoietic | HSC | 23 | [187] |
| Toren | Hematopoietic | HSC-Cord | 514 | [178] |
| Toren | Hematopoietic | HSC-Per | 432 | [178] |
| Ivanova | Hematopoietic | HSC | 262 | [79] |
| Igreja | Endothelial | EPC | 230 | [76] |
| Huang | Mesenchymal | MSC | 604 | [74] |
| Tsai | Mesenchymal | MSC | 48 | [179] |
| Kulterer | Mesenchymal | MSC | 273 | [103] |
| Song | Mesenchymal | MSC-AS | 201 | [162] |
| Song | Mesenchymal | MSC-OS | 176 | [162] |
| Song | Mesenchymal | MSC-CS | 102 | [162] |
| Kocer | Epithelial | EpSC | 1091 | [95] |
| Roh | Epithelial | EpSC | 116 | [147] |
| Kosinski | Intestinal | InEpSC | 282 | [100] |

Table 5.4: List of non-cultured (primary) human stem cell profiling studies collected in human stem cell compendium. The first column represents the author’s name used as a reference for the study. The second column represents the stem cell type in which the experiment has been classified. The third column identifies the more specific stem cell population name. The fourth column gives the number of upregulated Entrez Gene genes in each stem cell population, while the final column gives the citation to the original paper.

[Figure 5.6 node labels: HSC, ESC, MSC, Epithelial SC, Mammary SC, NSC, Intestinal SC, ADSC, Endothelial PC]

Figure 5.6: Gene-based global similarities between all experiments in human stem cell compendium. Nodes represent individual experiments, while edges are drawn between nodes if the similarity between the two experiments exceeds a p-value cutoff. Nodes are colored by stem cell type and separate exactly into same cell-type cliques, which are used as the final replicate sets. The display uses an edge-weighted spring-embedded layout. “Replicate” sets are formed only for hematopoietic (red) and embryonic (light blue) stem cells.

I assessed the similarity between individual human stem cell experiments using the exact same techniques and cutoffs as in the mouse stemness meta-analysis. Interestingly, the human data showed a clear separation between same and different cell types. In particular, while some mouse experiments showed a higher similarity to other experiments from the same lab and different cell type than to other experiments in the same cell type, the human experiments that showed a more significant overlap than $10^{-50}$ segregated perfectly into same-cell type cliques (Figure 5.6).

However, this phenomenon was not necessarily biological in nature, but could

be caused by technical reasons, such as similarities between the protocols and platforms

used by the studies in each clique, especially since one of the cliques consisted of exper-

iments from the same study. The cliques identified in the human data represented the

final replicate sets I used in the human stemness meta-analysis.


5.3 Input gene modules

Gene modules represented the other main input to the stemness meta-analysis

method, as gene-level based analyses had previously failed to identify shared stem cell

mechanisms at the expression level. One of the primary hypotheses that I was inter-

ested in testing was whether functional redundancy and tissue-specific expression could

explain previous failures to identify common stem cell mechanisms. Both functional

redundancy and tissue-specific expression could arise through gene duplications, which

led me to test putatively paralogous gene sets (labeled homolog modules hereafter) for

significant stem cell expression.

Unfortunately, genome-wide predictions for mouse paralog gene sets were not readily available, so I used a simple technique to identify such input gene modules.

5.3.1 Homolog gene families (modules)

I used BLASTP at an e-value¹ cutoff of 0.05 to align the entire mouse proteome (mm9; 45,480 protein sequences²) for each available mouse EntrezGene ID. BLASTP was an appropriate choice for this analysis, as I did not expect to need the recognition of very remote homologies.

¹ An e-value measures the number of hits expected to be seen by chance in a given database.
² Some proteins are represented by several different protein sequences, which may explain why the number is closer to 45,000, rather than to 25,000 sequences.

For each pair of proteins, only the alignment with the highest alignment score and most significant e-value, as well as the highest overall sequence coverage, was chosen as representative of the gene pair. After the initial screen, I only kept gene pairs whose sequences had an alignment e-value smaller than 10e-70 and coverage of more than 50%. Genes were connected to each other only if they satisfied this stringent requirement. I performed a depth-first traversal of the protein-protein similarity network to identify all connected components, each of which was thereafter used as a putative homolog family.
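
The family construction can be sketched as a connected-components search. The sketch assumes the BLASTP results have already been reduced to one representative `(gene_a, gene_b, evalue, coverage)` tuple per gene pair, with coverage expressed as a fraction; the tuple layout and function name are hypothetical, and the default cutoff is written exactly as in the text.

```python
def homolog_families(genes, hits, max_evalue=10e-70, min_coverage=0.5):
    """Group genes into connected components of the similarity network.

    hits: iterable of (gene_a, gene_b, evalue, coverage), one per gene pair.
    An edge is kept only if evalue < max_evalue and coverage > min_coverage.
    """
    adjacency = {g: set() for g in genes}
    for a, b, evalue, coverage in hits:
        if a != b and evalue < max_evalue and coverage > min_coverage:
            adjacency[a].add(b)
            adjacency[b].add(a)

    families, seen = [], set()
    for start in genes:
        if start in seen:
            continue
        # Iterative depth-first traversal from an unvisited gene.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        families.append(component)
    return families


# Hypothetical hits: A-B and B-C pass, A-D fails on e-value, C-D fails on coverage.
hits = [("A", "B", 1e-80, 0.9), ("B", "C", 1e-90, 0.7),
        ("A", "D", 1e-30, 0.9), ("C", "D", 1e-100, 0.4)]
families = homolog_families(["A", "B", "C", "D"], hits)
```

In the toy example, A and C end up in the same family even though they share no direct hit, because both are linked to B.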

After the initial homolog family assignment, I performed a neighbor expansion step on all singleton gene families. Specifically, to incorporate more evolutionarily distant homology, unconnected genes were assigned to the family containing the gene with the most similar protein sequence if this best match exceeded a less stringent cutoff. I used an e-value cutoff of 10e-10 and a coverage cutoff of 50%. The process was iteratively repeated until no more singleton families could be reassigned to larger gene sets and the process converged to a final set of homolog families (Figure 5.7). Genes remaining unconnected even after this expansion step were treated as singletons or singleton families in all subsequent analyses.
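
One possible reading of the neighbor expansion step is sketched below. It makes two assumptions that the text does not spell out: the "most similar" sequence is taken to be the hit with the lowest e-value among those passing the relaxed cutoff, and a singleton is only reassigned when its best match already belongs to a family with more than one member. The function name and the hit tuple layout are hypothetical.

```python
def neighbor_expansion(families, hits, max_evalue=10e-10, min_coverage=0.5):
    """Iteratively attach singleton genes to the family of their best match.

    families: list of sets of genes covering every gene mentioned in hits.
    hits: iterable of (gene_a, gene_b, evalue, coverage).
    """
    family_of = {}
    for family in families:
        members = set(family)  # copy so the input is left untouched
        for gene in members:
            family_of[gene] = members

    # Best relaxed hit per gene (assumption: lowest e-value wins).
    best = {}
    for a, b, evalue, coverage in hits:
        if a == b or evalue >= max_evalue or coverage <= min_coverage:
            continue
        for query, target in ((a, b), (b, a)):
            if query not in best or evalue < best[query][0]:
                best[query] = (evalue, target)

    changed = True
    while changed:  # repeat until no singleton can be reassigned
        changed = False
        for gene, (_, target) in best.items():
            if len(family_of[gene]) == 1 and len(family_of[target]) > 1:
                family_of[target].add(gene)
                family_of[gene] = family_of[target]
                changed = True

    unique = {id(family): family for family in family_of.values()}
    return list(unique.values())


# Hypothetical example: D's best match (C) is itself a singleton at first.
start = [{"A", "B"}, {"C"}, {"D"}, {"E"}]
relaxed_hits = [("D", "C", 1e-15, 0.6), ("C", "A", 1e-20, 0.8), ("E", "D", 1e-5, 0.9)]
expanded = neighbor_expansion(start, relaxed_hits)
```

In the toy example, C joins the A-B family in the first pass and D can only follow in the second pass, which is why the step is repeated until nothing changes; E stays a singleton because its only hit fails the relaxed e-value cutoff.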

It should be noted that while this methodology works well for most homolog modules, it also has some pitfalls which can affect the homolog module quality. The use of depth-first traversal can cause “chains” of unrelated proteins to be placed in the same module. For example, protein A could be highly similar to protein B, and protein B could be highly similar to protein C without protein A sharing any similarity with protein C (Figure 5.8).

Summary results with a breakdown of homolog module number and sizes for

each organism are provided in Tables 5.5 (for mouse) and 5.6 (for human).


[Figure 5.7 step labels: BLASTP; DFS; Neighbor expansion]

Figure 5.7: Protein similarity network generation approach. BLASTP is used to generate alignments and at a stringent cutoff, depth-first search (DFS) is applied to identify all connected components: homolog families. Subsequently, an iterative neighbor expansion technique is applied to all singletons (red) until the set of homolog families converges to its final form.

[Figure 5.8 labels: Protein domains; Protein A; Protein B; Protein C; Protein D]

Figure 5.8: Possible pitfalls of the homolog module definition methodology. The putative domain structure of four proteins (A, B, C, and D) is shown in the form of different shapes. The four proteins belong to the same homolog module. Each colored shape (triangle, square, diamond, circle) represents a different domain. While protein A is similar to protein B, and protein B is similar to protein C, protein A shares a low level of similarity with protein C. In addition, even though protein C and protein D share some similarity with protein A, they share no similarity with each other.


| Description of mouse module | Before NE | After NE |
|---|---|---|
| Homolog groups with ≤ 100 members | 3016 | 4653 |
| Homolog groups with >100 members | 5 | 6 |
| Singleton genes | 11920 | 5249 |
| Homolog gene families (total) | 14941 | 9908 |

Table 5.5: Summary of the distribution of mouse homolog (evolutionary) gene modules used as input to the stemness meta-analysis method. The neighbor expansion (NE) step significantly reduces the number of singleton genes represented in the homolog family input. The breakdown of all input module types can be found under the rightmost column. The total number of mouse homolog modules is shown in bold.

| Description of human module | Before NE | After NE |
|---|---|---|
| Homolog groups with ≤ 100 members | 2772 | 4346 |
| Homolog groups with >100 members | 3 | 4 |
| Singleton genes | 10983 | 4731 |
| Homolog gene families (total) | 13757 | 9081 |

Table 5.6: Summary of the distribution of human homolog (evolutionary) gene modules used as input to the stemness meta-analysis method. The total number of human homolog modules is shown in bold.

5.3.2 Functional gene modules

While homolog modules could allow direct testing of the functional redundancy hypothesis, other functional relationships between genes could also be used to investigate common stem cell mechanisms. For example, protein complexes could be commonly used by most stem cell types. Alternatively, functional pathways, such as the Wnt or Notch signaling pathways, could also be shared by multiple stem cell types. Therefore, I also evaluated the role of many different functional gene modules in the regulation of stem cell state.

I next summarize the methodology for the construction of the functional gene module set used in the mouse stemness study. The methodology in the human stemness study is similar, so it will not be independently described.

I constructed a large set of functional modules, derived from five different data sources: GO [7], Biocarta, CORUM (experimentally derived mouse protein complexes) [148], and protein-protein interaction databases (modules identified from mouse protein-protein interaction data, as well as modules identified from human protein-protein interactions mapped to mouse genes by the corresponding best-reciprocal BLAST hit) [11, 134, 166, 197].

I compiled all of these modules together and excluded any gene sets that had more than 25% gene overlap with any other functional module. This filtering step aimed to reduce the level of redundancy inherent to many of these functional modules, because of

• the hierarchical nature of Gene Ontology,

• the high level of expected overlap between experimentally derived protein complexes and the predicted highly connected mouse and human protein-protein interaction modules.

Subsequently, I also removed any functional modules that showed more than

50% overlap with any homolog family (singletons excluded). The purpose of this filtering

step was to allow the discovery of functional relationships between genes that could not

have been identified using evolutionary relations. The relatively high overlap cutoff

(50%) between evolutionary and functional gene modules was set to allow genes with

multiple functional roles to be captured by different module types.
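
The two filtering steps above can be sketched as follows. This is only an illustration under stated assumptions: the text does not define how overlap is measured or which of two overlapping modules is kept, so the sketch assumes overlap is the shared fraction of the smaller gene set and keeps modules greedily in input order. The function names and the toy gene identifiers are hypothetical.

```python
def overlap_fraction(a, b):
    """Shared genes as a fraction of the smaller set (an assumed definition)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def filter_redundant(modules, max_overlap=0.25):
    """Greedily keep modules whose overlap with every kept module is <= max_overlap."""
    kept = {}
    for name, genes in modules.items():
        if all(overlap_fraction(genes, other) <= max_overlap for other in kept.values()):
            kept[name] = genes
    return kept


def drop_homolog_like(modules, homolog_families, max_overlap=0.50):
    """Remove modules that overlap any non-singleton homolog family by more than max_overlap."""
    families = [fam for fam in homolog_families if len(fam) > 1]  # singletons excluded
    return {
        name: genes
        for name, genes in modules.items()
        if all(overlap_fraction(genes, fam) <= max_overlap for fam in families)
    }


# Toy input (hypothetical gene identifiers).
modules = {
    "complex_A": {"g1", "g2", "g3", "g4"},
    "go_term_B": {"g1", "g2", "g3", "g9"},  # shares 3 of 4 genes with complex_A
    "pathway_C": {"g5", "g6", "g7", "g8"},
}
homologs = [{"g5", "g6", "g7"}, {"g10"}]

non_redundant = filter_redundant(modules)
final = drop_homolog_like(non_redundant, homologs)
```

In this toy input, `go_term_B` is dropped by the 25% redundancy filter and `pathway_C` by the 50% homolog filter, leaving only `complex_A`. A different overlap definition (for example, Jaccard similarity) would change which modules survive.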

Description of functional module              # of modules
Experimentally-derived protein complexes                90
Mouse PPI modules                                        5
Human PPI modules                                      140
GO, KEGG and Biocarta modules                          376
Functional gene modules, total                         611

Table 5.7: Summary of the distribution of mouse functional gene modules used as input to the stemness meta-analysis method. The breakdown of all input module types can be found under the rightmost column. The total number of modules is shown in bold. Most mouse protein-protein interaction (PPI) modules have already been captured by experimentally derived protein complexes and removed for redundancy purposes, which explains the low number of these modules in the input data. The human PPI modules represent genes mapped to mouse gene space.

Description of functional module              # of modules
Experimentally-derived protein complexes               234
Human PPI modules                                        4
Mouse PPI modules                                       94
GO, KEGG and Biocarta modules                          557
Functional gene modules, total                         889

Table 5.8: Summary of the distribution of human functional gene modules used as input to the stemness meta-analysis method. The total number of modules is shown in bold.

The final set of modules from each source type is summarized in Tables 5.7

(for mouse) and 5.8 (for human).

In the next few sections I describe the steps of the SMA method in their appli-

cation order – recurrence, diversity and specificity. Each section and scoring type will

first begin with a notation section to allow the reader to follow the scoring descriptions

more easily. The notations do not overlap, so each label will be uniquely associated with

a single concept. Each notation section also includes a table for easy lookup of labels.

5.4 Recurrence scoring

The first step of the stemness meta-analysis method aims to measure the level

of reproducibility and recurrence of upregulation associated with each gene module used

as input (as described in the previous sections). Previous expression-based approaches to

stemness have often taken simple approaches, such as vote-counting, to the integration

of measurements across multiple experiments.

Simple approaches, however, are inappropriate if we need to integrate data

from high-throughput gene expression experiments from multiple platform types. Plat-

forms reached genome-wide coverage only relatively recently, so it is important to ac-

count for whether an individual gene was tested in a given experiment based on the

input data.

Another important factor to consider is the signal strength of each experiment.

While the use of differentially expressed genes is appropriate for cross-platform

integration, simple integration approaches do not account explicitly for the differences

in stringency and specificity associated with the identification of differentially expressed

genes. Ideally, highly conservative experiments that identify a smaller fraction of upreg-

ulated genes will contribute a higher weight than experiments that reveal most tested

genes to be significantly upregulated.

Furthermore, previous expression-based stemness approaches do not incorpo-

rate information about the similarities between experiments. As double-counting infor-

mation from highly similar experiments is a concern (especially if the similarity is based

on technical reasons), it is crucial to account for the redundancy between individual

experiments.

The recurrence score that I develop tries to address all of the above issues. To

understand the score, however, I first give some notation definitions.

5.4.1 Notation definitions

Each module will be denoted with M, and the recurrence score for module M

will be labeled as R(M). Module M will consist of n genes and each index over the

genes in the module will be written as i. An expression compendium will consist of k

experiments, and each index over experiments will be indicated with j.

The fraction of upregulated genes in an experiment j will be denoted by $v_j$,

while $z_j$ will be used to indicate a “signal strength” weight for an experiment j.

To ease the understanding of the replicate sets, I will define a set of all replicate groups, $B_1, B_2, \ldots, B_F$, as a mutually exclusive and exhaustive partition of the experiments in the compendium, where $F$ is the number of replicate groups in the compendium. Each one of the replicate groups is generated as described in Section 5.2.1.2 and ranges in size between 1 and 7 experiments. Experiment $j$'s replicate group can be written as $\phi(j)$, in which case if $\phi(j) = 1$, then experiment $j \in B_1$. Finally, define $|B_{\phi(j)}|$ to indicate the number of experiments in the “replicate” set $B_{\phi(j)}$.

A summary of all recurrence score notation symbols presented here and in the

following section can be found in Table 5.9.

Notation symbol      Notation description
M                    Module of genes
J                    Set of all experiments in expression compendium
R(M)                 Recurrence score for module M
n                    Number of genes in module M
k                    Number of experiments in expression compendium
F                    Number of replicate sets in expression compendium
i                    Index over the genes in module M
j                    Index over the experiments in the set J
$t_{ij}$             Tested/not-tested status of gene i in experiment j
$x_{ij}$             Upregulation status of gene i in experiment j
$v_j$                Fraction of upregulated genes in experiment j
$z_j$                “Signal strength” weight for experiment j
$B_{\phi(j)}$        “Replicate” set for experiment j
$|B_{\phi(j)}|$      Size of “replicate” set
$w_j$                “Replicate” weight for experiment j
$h_{i,\phi(j)}$      Gene-experiment-specific “replicate” weight

Table 5.9: Overview of the recurrence score-associated notation. The first column represents the notation symbol used in the recurrence scoring definition. The second column gives the descriptions for all notation symbols introduced in the first column.

5.4.2 General form of a recurrence score

Let a module of genes be denoted as the set M. I define a recurrence score for

a module, R(M), to reflect the degree to which its constituent genes are upregulated

in a collection of experiments, J. Incorporating all redundancy, signal strength, test

status and upregulation status information, a gene-specific score for a gene i, Gi, can

be written as follows:

$$G_i = \sum_{j \in J} z_j \, h_{i,\phi(j)} \, x_{ij} \, t_{ij} . \qquad (5.1)$$

One can notice four elements that contribute to the overall score: $t_{ij}$ (whether a gene is

tested or not), $x_{ij}$ (whether a gene is upregulated or not), $z_j$ (“signal strength” weight),

and $h_{i,\phi(j)}$ (“replicate” weight).

Both tij and xij are binary variables and can be described as simple measures

of whether gene i is respectively tested, or upregulated in experiment j :

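$$t_{ij} = \begin{cases} 1, & \text{if gene } i \text{ was tested in experiment } j; \\ 0, & \text{otherwise.} \end{cases}$$

$$x_{ij} = \begin{cases} 1, & \text{if gene } i \text{ was upregulated in experiment } j; \\ 0, & \text{otherwise.} \end{cases}$$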

zj represents the “signal strength” weight for an experiment j, such that more

stringent experiments have a high signal strength, while less stringent experiments have

a low signal strength. Various measures can be used to determine the signal

strength and I empirically determined a good choice for this weight, as shown in Section

5.4.3.

Finally, $h_{i,\phi(j)}$ represents a gene-experiment-specific “replicate” weight. This

weight is dependent on both the replicate set for experiment j and the gene i, since

different genes have been tested in different experiments, so the weight for a particular

experiment is a function of the other experiments in which a gene was tested. The

replicate weight can thus be expanded as follows:

$$h_{i,\phi(j)} = \frac{w_j \, t_{ij}}{\sum_{l \in B_{\phi(j)}} w_l \, t_{il}} , \qquad (5.2)$$

where $w_j$ is the “replicate” weight that depends on experiment j alone.

After defining Gi, it is easy to also define a gene-specific normalization factor,

Ti, that takes the replicate weight, the signal strength weight and the test status into

account as

$$T_i = \sum_{j \in J} z_j \, h_{i,\phi(j)} \, t_{ij} . \qquad (5.3)$$

Given Ti and Gi for every gene in a module M, a recurrence score in its most

general form can be written as follows:

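$$\begin{aligned}
R(M) &= \sum_{i \in M} \frac{1}{T_i} \left( G_i \right)^{q} && (5.4) \\
     &= \sum_{i \in M} \frac{1}{\sum_{j \in J} z_j \, h_{i,\phi(j)} \, t_{ij}} \left( \sum_{j \in J} z_j \, h_{i,\phi(j)} \, x_{ij} \, t_{ij} \right)^{q} && (5.5)
\end{aligned}$$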

Here q is the exponent associated with the gene score, determining how much

contribution each gene will have to the final module score R(M).

This general scoring scheme fits well with the meta-analysis techniques dis-

cussed in the previous chapters. It follows the vote-counting methods that count the

number of experiments each gene is upregulated in, but it also incorporates score elements

similar to the inverse variance weights typically used for combining effect sizes in

standard meta-analysis.

5.4.3 Simulation of synthetic module data

Due to the lack of appropriate positive controls, I used synthetic data to assess

the accuracy of each scoring method at identifying recurrent upregulation across many

stem cell types. Two distinct categories of synthetic modules were created: stemness and

non-stemness modules. The data for each module type were synthesized from different

underlying models.

5.4.3.1 Size of module selection

Initially, the size of each module was randomly sampled from an exponential

distribution with a mean of four genes, which followed closely the distribution and

average size of real homolog families. I generated data for 2000 modules, of which only

120 represented stemness families, based on my hypothesis that stemness families should

represent no more than 5–6% of all tested families.

5.4.3.2 Module subtype selection

Three different stemness module subtypes, which reflected different possible

patterns of upregulation, were simulated at equal ratios of 1:1:1, i.e. 40 stemness mod-

ules of each type. The three subtypes corresponded to the following patterns: families

with predominant expression of a Single gene across Many tissues (type SM modules),

families with expression of a Single member gene in a Single tissue (type SS modules),

and families with a higher level of activity - Many genes expressed in Many tissues

(type MM modules).

Two different non-stemness module subtypes, tissue-specific and unrelated,

were simulated at a ratio of 2:3 (tissue-specific:unrelated). The tissue-specific subtype

corresponded to families of genes that were primarily expressed in a single tissue, while

unrelated families represented families with no role in stem cell biology.

Data was simulated for nine experiments to match the experimental design of

the founder studies, discussed previously in Section 4.1.1.

5.4.3.3 Sampling for each module subtype

Stemness module subtypes were sampled as follows:

• Families with predominant expression of a single gene in many tissues

(type SM modules). Randomly select a single gene i from the module and

assign it a high probability of selection (pi=0.8). Each other gene can be assigned

the same low probability of selection estimated as $0.2/(n-1)$, where n is the total

number of genes in the module. Since the type SM modules are expected to

express predominantly a single gene, the probabilities of success for the genes in

the module are heavily skewed towards a single gene. Then for each of the nine

synthetic experiments, an independent single draw from a multinomial distribution

is made with the above mentioned probabilities of success for each gene in the

module.

• Families with predominant expression of a single gene in a single tissue (type

SS modules). Assign all genes in the module the same probability of selection,

given as $1/n$, where n is the number of genes in the module. Since the type SS

modules are expected to express a single gene in a single tissue, the probabilities

of success for each gene in the module are equal. Then for each of the nine synthetic

experiments, an independent single draw from a multinomial distribution is made,

where all genes in the module have an equal probability of being picked.

• Families with a higher level of activity - many genes expressed in many

tissues (type MM modules). Assign all genes in the module the same probability

of selection, given as $1/n$, where n is the number of genes in the module.

Since the type MM modules are expected to express many genes in many tissues,

the probabilities of success for each gene in the module are equal, but multiple

draws are made in each tissue or experiment. Then for each of the nine synthetic

experiments, two draws from the multinomial distribution are made, where all

genes in the module have an equal probability of being picked.
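
The stemness sampling scheme above can be sketched with the standard library alone. Several details are assumptions made for illustration: the exponential draw for module size is discretised by rounding up to at least one gene, a single-gene SM module falls back to the uniform case, and the two MM draws are made with replacement (so they may hit the same gene). All function names are hypothetical.

```python
import math
import random

N_EXPERIMENTS = 9  # nine synthetic experiments, as in the text


def sample_module_size(rng, mean=4.0):
    """Assumed discretisation: round an exponential draw up to at least one gene."""
    return max(1, math.ceil(rng.expovariate(1.0 / mean)))


def gene_probabilities(subtype, n, rng):
    """Per-gene selection probabilities for subtype 'SM', 'SS' or 'MM'."""
    if subtype == "SM" and n > 1:
        favoured = rng.randrange(n)  # the single predominantly expressed gene
        return [0.8 if i == favoured else 0.2 / (n - 1) for i in range(n)]
    return [1.0 / n] * n  # SS, MM, and single-gene SM modules


def simulate_stemness_module(subtype, n, rng):
    """Return an n x 9 binary matrix of simulated upregulation calls."""
    probs = gene_probabilities(subtype, n, rng)
    draws = 2 if subtype == "MM" else 1  # MM modules get two draws per experiment
    matrix = [[0] * N_EXPERIMENTS for _ in range(n)]
    for j in range(N_EXPERIMENTS):
        for i in rng.choices(range(n), weights=probs, k=draws):
            matrix[i][j] = 1
    return matrix


rng = random.Random(0)
size = sample_module_size(rng)
sm_module = simulate_stemness_module("SM", 5, rng)
mm_module = simulate_stemness_module("MM", 5, rng)
```

Every column of an SM or SS matrix contains exactly one upregulated gene, while an MM column contains one or two, depending on whether both draws selected the same gene.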

Non-stemness module subtypes were sampled in a slightly different manner.

First, for each non-stemness family, to decide which genes in the modules will be ac-

tivated in any tissue type, a draw was initially made from a binomial distribution,

Binom(n,p = 0.5), where n is the number of genes in the module. Once activated genes

were selected, the sampling was performed as follows:

• Tissue-specific families. Randomly select a single tissue and assign it a high

probability of selection (pj=0.8). Each other tissue can be assigned the same

low probability of selection estimated as $0.2/(k-1)$, where k is the total number of

experiments. Then, for each activated gene, make a single draw from a multinomial

distribution with the above mentioned probabilities.

• Families with no role in stem cells. Assign all tissues the same probability

of selection, given as $1/k$, where k is the total number of experiments. For each

activated gene, make a single draw from a multinomial distribution, where all

tissues have an equal probability of selection.
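
The non-stemness sampling can be sketched in the same way. The text specifies only that the number of activated genes follows Binom(n, p = 0.5); which genes are activated is assumed here to be chosen uniformly at random, and the function name is hypothetical.

```python
import random


def simulate_non_stemness_module(subtype, n, n_experiments, rng):
    """Return an n x n_experiments binary matrix for 'tissue_specific' or 'unrelated'."""
    # Binom(n, 0.5) count of activated genes, realised as n independent coin flips.
    n_active = sum(rng.random() < 0.5 for _ in range(n))
    active = rng.sample(range(n), n_active)  # assumed: activated genes chosen uniformly

    if subtype == "tissue_specific" and n_experiments > 1:
        favoured = rng.randrange(n_experiments)  # the single favoured tissue
        probs = [
            0.8 if j == favoured else 0.2 / (n_experiments - 1)
            for j in range(n_experiments)
        ]
    else:
        probs = [1.0 / n_experiments] * n_experiments  # unrelated families

    matrix = [[0] * n_experiments for _ in range(n)]
    for i in active:
        # One draw per activated gene decides the tissue in which it is upregulated.
        j = rng.choices(range(n_experiments), weights=probs, k=1)[0]
        matrix[i][j] = 1
    return matrix


rng = random.Random(0)
tissue_specific = simulate_non_stemness_module("tissue_specific", 6, 9, rng)
unrelated = simulate_non_stemness_module("unrelated", 6, 9, rng)
```

Unlike the stemness subtypes, the draw here is made per gene rather than per experiment, so each gene is upregulated in at most one experiment and some experiments may contain no upregulated gene at all.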

5.4.4 Evaluation and selection of a recurrence score

I used the simulated synthetic data to directly evaluate several recurrence scor-

ing methods, each of which followed the general form of the recurrence score introduced

in Equation 5.5 in Section 5.4.2.

I chose to vary two different parameters associated with the score. The first

one was the exponent q associated with the gene scores, which determined how much

weight a gene will contribute to the final module score. I tested four different exponent

choices: q = 0.5, 1, 2 and 3. The second parameter was the “signal strength” weight $z_j$.

I tested three different “signal strength” weight choices for $z_j$: $-\log(v_j)$, $1/v_j$ and $(1 - v_j)$,

where $v_j$ corresponded to the fraction of genes upregulated in experiment j.
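
A small sketch shows how the three candidate weights behave for a conservative and a permissive experiment. The base of the logarithm is not stated in this section, so the natural logarithm is assumed; the function name and the dictionary keys are hypothetical.

```python
import math


def signal_weights(v):
    """Three candidate signal strength weights for an upregulated fraction 0 < v <= 1."""
    return {
        "neg_log": -math.log(v),   # natural log assumed
        "inverse": 1.0 / v,
        "complement": 1.0 - v,
    }


conservative = signal_weights(0.05)  # 5% of tested genes called upregulated
permissive = signal_weights(0.50)    # 50% of tested genes called upregulated
```

All three choices give the conservative experiment the larger weight; they differ mainly in how steeply the weight grows as the upregulated fraction shrinks, with the inverse weight growing fastest.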

I evaluated all combinations of the parameter choices, as well as an additional

score that used a square as the exponent (q = 2), but did not use any form of experiment

specificity weighting.

The results based on the simulated synthetic data indicated that the choice of

signal strength weight was not as significant as the choice of the exponent associated

with the gene scores. For all tested scoring methods within each exponent selection,

no significant difference between the scores for different weights was observed, based on

the Wilcoxon rank sum test.

To select an appropriate exponent, however, I calculated the area-under-the-curve

(AUC) for different exponent choices, and compared the differences in AUC results,

based on the stemness module type chosen for scoring (Table 5.10). The AUC

represents a summary statistic usually derived from ROC (receiver-operating charac-

teristic) plots, which compare the sensitivity and false positive rates associated with a

scoring method. AUC values range between 0 and 1, and the best possible method will

have the highest possible AUC.
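
For reference, the AUC of a scoring method can be computed without drawing the ROC curve, because it equals the probability that a randomly chosen positive (stemness) module scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below uses this rank-based formulation; it is a generic illustration, not necessarily the procedure used to produce Table 5.10, and the function name is hypothetical.

```python
def auc(positive_scores, negative_scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Toy scores: three of the four positive/negative pairs are ordered correctly.
example = auc([0.9, 0.4], [0.5, 0.1])
```

A value of 0.5 corresponds to a ranking no better than chance, and values below 0.5 indicate that the negatives tend to be ranked above the positives.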

The results indicated that high exponent choices, such as quadratic and cubic

choices, scored highly with modules dominated by the expression of a single gene (type

SM modules), while small exponent scoring choices, such as square root and linear

selections scored the highest with modules showing a complex (type MM modules) or

one-gene-one-tissue-like (type SS modules) stemness expression pattern.

One final AUC score was calculated as the average of the AUC scores for all

stemness module types. The AUC scores for each individual stemness module type, as

well as a summary of the average AUC score results are shown in Table 5.10.

I also directly compared the ranks of the scores of all simulated stemness

modules between the different scoring types using the Wilcoxon rank sum test. The

comparisons indicated no significant difference between the ranks of the scores, given

by the quadratic and cubic scoring choices (Wilcoxon rank sum test: p-value = 0.8172).

However, these scoring choices were both significantly different from the square root

scoring (quadratic: p-value = 0.0048 and cubic: p-value = 0.00863).

Finally, I selected the experiment-weighted quadratic ($q = 2$; $z_j = -\log(v_j)$)

Scoring method                   Type SM AUC   Type SS AUC   Type MM AUC   Average AUC
q = 0.5; z_j = -log(v_j)              0.3087        0.5036        0.6565        0.4896
q = 0.5; z_j = 1/v_j                  0.3076        0.5010        0.6556        0.4880
q = 0.5; z_j = 1 - v_j                0.3086        0.5035        0.6565        0.4895
q = 1; z_j = -log(v_j)                0.3844        0.5137        0.6481        0.5154
q = 1; z_j = 1/v_j                    0.3844        0.5137        0.6481        0.5154
q = 1; z_j = 1 - v_j                  0.3843        0.5137        0.6481        0.5153
q = 2; z_j = -log(v_j)                0.4733        0.4867        0.6147        0.5249
q = 2; z_j = 1/v_j                    0.4731        0.4862        0.6152        0.5248
q = 2; z_j = 1 - v_j                  0.4731        0.4868        0.6148        0.5249
q = 3; z_j = -log(v_j)                0.5005        0.4781        0.5952        0.5246
q = 3; z_j = 1/v_j                    0.5013        0.4774        0.5949        0.5246
q = 3; z_j = 1 - v_j                  0.5009        0.4781        0.5951        0.5247
q = 2; no weight                      0.4727        0.4840        0.6147        0.5238

Table 5.10: Area-under-curve (AUC) results for 13 recurrence scoring methods tested for recovery of stemness modules, based on synthetic data evaluation. Two scoring parameters were varied and each scoring method represented in this table shows a different pair of parameter choices. The row that describes the final parameter choice associated with the highest average AUC is shown in bold. The first column (Scoring method) summarizes the parameter choice for each score. The second column (Type SM AUC) gives the AUC results for stemness families with predominant expression of a single gene in many tissues (abbreviated as type SM). The third column (Type SS AUC) gives the AUC results for stemness families with predominant expression of a single gene in a single tissue (abbreviated as type SS). The fourth column (Type MM AUC) gives the AUC results for stemness families with a higher level of activity, that is, multiple genes in multiple tissues (abbreviated as type MM). The fifth column shows the average AUC across the type SM, type SS, and type MM families. The highest AUCs in each type are shown in italics in each column. The critical parameter in the simulation is q. Smaller values of q produce more accurate results for modules that use multiple genes in multiple tissues, while higher values of q produce more accurate results for modules that use a single gene in many tissues.

recurrent scoring method as the method with the highest average area-under-the-curve

among the tested scoring measures and used it in all subsequent analyses. The final

form of the recurrent scoring can be summarized from Equation 5.5 as:

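$$R(M) = \sum_{i \in M} \frac{1}{\sum_{j \in J} -\log(v_j) \, h_{i,\phi(j)} \, t_{ij}} \left( \sum_{j \in J} -\log(v_j) \, h_{i,\phi(j)} \, x_{ij} \, t_{ij} \right)^{2} , \qquad (5.6)$$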

where all score elements remain as previously described.

5.4.5 Significance of recurrence scores

Once I selected a recurrence scoring method and applied it to the real stem cell

compendium data, I had to also perform an evaluation of the significance of recurrence

scores. The significance of the recurrence scores was assessed using permutation analysis

and was used to identify modules with significantly recurrent upregulation scores across

different stem cell experiments.

The data within each experiment were permuted 1,000 times. The total num-

ber of upregulation counts in each experiment was kept the same, but the upregulation

states were randomly permuted between genes and each gene acquired a new upregulation

state assignment. This procedure kept the number of upregulated genes within each

experiment unchanged while breaking any potential dependency in the expression of the

genes associated with each module.
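
A single permutation of one experiment can be sketched as a shuffle of the upregulation calls. The sketch assumes that calls are shuffled only among the genes that were tested in that experiment, which the text does not state explicitly; the function name is hypothetical.

```python
import random


def permute_experiment(x_col, t_col, rng):
    """Shuffle upregulation calls among the genes tested in one experiment.

    x_col[i], t_col[i]: 0/1 upregulation and tested status of gene i.
    The number of upregulated genes is unchanged; untested genes stay at 0.
    """
    tested = [i for i, flag in enumerate(t_col) if flag]
    calls = [x_col[i] for i in tested]
    rng.shuffle(calls)
    permuted = [0] * len(x_col)
    for i, call in zip(tested, calls):
        permuted[i] = call
    return permuted


rng = random.Random(0)
x_col = [1, 0, 1, 0, 0, 1]
t_col = [1, 1, 1, 0, 1, 1]  # gene 3 was not tested in this experiment
permuted = permute_experiment(x_col, t_col, rng)
```

Repeating this for every experiment and rescoring each module yields one randomized recurrence score per module per permutation.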

If an experiment was part of a replicate set, the block of data values for each

gene within the replicate set was treated as one experiment and permuted together. I

inferred 1,000 randomized recurrence scores for each module using the score described

in Equation 5.6. All module scores were normalized by the size of the module. Since

the recurrence score was dependent on the size of the module and to avoid empirical

distribution fits, I used an FDR approach to select an appropriate significance cutoff

for each module size.

At the most general level, FDR can be defined as

$$\mathrm{FDR} = \frac{FP}{TP + FP} \qquad (5.7)$$

While I didn’t know the number of false positives in this context, I could

approximate their number from the randomly permuted data.

FP = # of modules M in the randomly permuted data with R(M) > c (5.8)

FP + TP = # of modules M in the real data with R(M) > c (5.9)

Therefore, the FDR could be computed from Equations 5.8 and 5.9 as

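$$\mathrm{FDR} = \frac{\dfrac{\#\text{ of modules } M \text{ in the randomly permuted data with } R(M) > c}{\#\text{ of randomly permuted modules}}}{\dfrac{\#\text{ of modules } M \text{ in the real data with } R(M) > c}{\#\text{ of real modules}}} \qquad (5.10)$$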

and used to select the cutoff c separately for each different module size. To ensure

conservativeness, I selected an FDR cutoff of 5% and estimated the corresponding recurrence

score cutoff for each family size (Table 5.11); the cutoff was thereafter used

as a significance cutoff for that family size. The overall number of recurrent modules

identified in each organism is shown in Figure 6.2 (in Chapter 6 for mouse) and Figure

7.2 (in Chapter 7 for human).
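
The cutoff selection for one module size can be sketched as a scan over candidate cutoffs using the ratio in Equation 5.10. The choice of candidate cutoffs (the observed real scores, scanned from low to high) and the function names are assumptions made for illustration, and the scores are toy values.

```python
def estimate_fdr(real_scores, permuted_scores, cutoff):
    """Equation 5.10: fraction of permuted scores above the cutoff divided by
    the fraction of real scores above it. Returns None when no real score passes."""
    real_fraction = sum(s > cutoff for s in real_scores) / len(real_scores)
    null_fraction = sum(s > cutoff for s in permuted_scores) / len(permuted_scores)
    if real_fraction == 0:
        return None
    return null_fraction / real_fraction


def smallest_cutoff(real_scores, permuted_scores, max_fdr=0.05):
    """Lowest observed real score at which the estimated FDR is <= max_fdr."""
    for c in sorted(set(real_scores)):
        fdr = estimate_fdr(real_scores, permuted_scores, c)
        if fdr is not None and fdr <= max_fdr:
            return c
    return None


# Toy scores for modules of one size.
real = [0.2, 0.5, 1.5, 2.0, 3.0]
permuted = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.6]
cutoff = smallest_cutoff(real, permuted)
```

In this toy example the estimated FDR is 0.25 at a cutoff of 1.5 and drops to 0 at a cutoff of 2.0, so 2.0 is returned. Running the scan separately on the modules of each size gives size-specific cutoffs of the kind listed in Table 5.11.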

Module size    Recurrence cutoff
1              4.1
2              2
3              1.4
4              1
5              1
6-8            0.8
9-10           0.7
11             0.6
12             0.7
13-19          0.6
20-25          0.5
> 25           0.1-0.3

Table 5.11: Recurrence score cutoffs for modules of different sizes. The recurrence score cutoffs represent the recurrence scores for each module size at which the false discovery rate was 5%. Any modules with recurrence scores higher than the ones indicated in this table are considered significantly recurrent. The first column shows different module sizes. If the column has a single value, only one module size is shown in the row. If a range of values is introduced in the first column, all modules of these sizes have similar cutoffs. The second column shows the recurrence score cutoffs for the different module sizes.

5.5 Diversity scoring

Even though recurrence scoring could successfully identify families with re-

currently upregulated member genes across a large set of stem cell experiments, the

over-representation of certain stem cell types in the literature, such as embryonic stem

cells, could lead to the discovery of recurrently upregulated families in only a single or

a few stem cell types. To distinguish between modules with high and low level of stem

cell type coverage, I followed a different, information-theoretic entropy-based scoring

strategy.

Based on the input published gene list (PGL) differential expression data, every

module has an associated binary expression matrix that represents the upregulation

status of the individual member genes in every experiment. These upregulation matrices

can be transformed into cell type-based matrices, such that each row represents an

individual gene and each column represents an individual cell type. Every entry in the

cell-type based matrix corresponds to the fraction of experiments in the cell type that

the gene was upregulated in.

5.5.1 Notation definitions

Each module will be denoted with M, a cell type diversity score for module M

will be labeled as d(M), and a gene usage diversity score will be labeled as g(M).

The total number of tested cell types from the k experiments used in the stem

cell expression compendium will be denoted as s, and as previously described n will be

Notation symbol      Notation description
M                    Module of genes
L                    Set of all stem cell types in compendium
d(M)                 Cell type diversity score for module M
g(M)                 Gene usage diversity score for module M
n                    Number of genes in module M
k                    Number of experiments in expression compendium
s                    Number of stem cell types in expression compendium
i                    Index over the genes in module M
l                    Index over the stem cell types in expression compendium

Table 5.12: Overview of the diversity-score-associated notation. The first column represents the notation symbol used in the diversity scoring definition. The second column gives the descriptions for all notation symbols introduced in the first column.

used to reflect the total number of genes in module M. Each index over the genes in the

module can still be written as i, and each index over cell types can be denoted with l.

The set of all tested cell types will be denoted with L.

A summary of all diversity-related notation symbols presented here can be

found in Table 5.12.

5.5.2 Cell type and gene usage diversity

To understand why an entropy-based score is an appropriate choice for the

measurement of diversity, I first introduce the concept of entropy. Information-theoretic

entropy measures the level of uncertainty attached to a random variable. Let’s assume

I would like to measure the entropy associated with some future event, based on information

about the probabilities of the various event outcomes. If the outcomes in the

outcome space are all equally probable, the uncertainty of the outcome of the future

event is very high and the entropy score is the highest possible. Conversely, if the prob-

abilities in the outcome space are so skewed that only one of the outcomes is possible,

the uncertainty of the outcome of the future event is minimal and the entropy score is

the lowest possible. For a random variable A with m possible outcomes (a1, a2, ..., am),

the entropy can be written as

$$H(A) = -\sum_{i=1}^{m} p(a_i) \log\left(p(a_i)\right) . \qquad (5.11)$$
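
The two extremes described above are easy to verify numerically. The sketch below assumes base-2 logarithms, which the text does not state at this point, and treats zero-probability outcomes as contributing nothing to the sum; the function name is hypothetical.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits; outcomes with probability 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


uniform = entropy([0.25, 0.25, 0.25, 0.25])  # four equally probable outcomes
certain = entropy([1.0, 0.0, 0.0, 0.0])      # only one outcome is possible
skewed = entropy([0.7, 0.1, 0.1, 0.1])       # somewhere in between
```

The uniform case reaches the maximum of log2(4) = 2 bits, the certain case gives 0, and the skewed case falls between the two.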

I estimated two different entropy-based diversity scores. The first one reflects

the cell type diversity of the module, while the other reflects the gene usage diversity.

For a family M, I defined the cell type diversity (entropy) score d(M) as follows:

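$$d(M) = -\sum_{l \in L} \alpha_l(M) \log\left(\alpha_l(M)\right) , \qquad (5.12)$$

where $\alpha_l(M)$ represents the fraction of upregulated genes from module M in cell type l

and can be defined as

$$\alpha_l(M) = \frac{\sum_{i \in M} f_{il}}{\sum_{l' \in L} \sum_{i \in M} f_{il'}} , \qquad (5.13)$$

and if Z(l) represents the set of all experiments of cell type l, then $f_{il}$ can be written as

$$f_{il} = \frac{\sum_{j \in Z(l)} x_{ij}}{\sum_{j \in Z(l)} t_{ij}} . \qquad (5.14)$$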

Here the binary variables xij and tij still represent the upregulation and test status

respectively of a gene i in experiment j.

Similarly, I defined the gene usage diversity (entropy) score g(M) as follows:

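$$g(M) = -\sum_{i \in M} \beta_i \log(\beta_i) , \qquad (5.15)$$

where $\beta_i$ corresponds to the fraction of cell types in which a gene i is upregulated and

can be defined as

$$\beta_i = \frac{\sum_{l \in L} f_{il}}{\sum_{i' \in M} \sum_{l \in L} f_{i'l}} . \qquad (5.16)$$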

L represents the set of all stem cell types in the compendium, while fil is still used to

describe the fraction of experiments of cell type l in which a gene i is upregulated.

I also estimated normalized cell type (dnorm(M)) and gene usage (gnorm(M))

diversity scores to take the maximum possible diversity (all possible outcomes are

equiprobable) into consideration, as described by the denominators of both scores:

$$d_{\mathrm{norm}}(M) = \frac{d(M)}{\log(s)} \qquad (5.17)$$

$$g_{\mathrm{norm}}(M) = \frac{g(M)}{\log(n)} \qquad (5.18)$$

This normalization step is especially important for the gene usage diversity

score, because it accounts for the differences between the sizes of different modules.
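
Equations 5.12 to 5.18 can be put together in a short sketch. It assumes base-2 logarithms, sets the fraction for a gene that was never tested in a cell type to 0, and defines the normalized score as 0 when there is only one possible outcome; none of these conventions is stated in the text, and all function and variable names are hypothetical.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability outcomes are skipped."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def cell_type_fractions(x, t, cell_types):
    """Eq. 5.14: per gene and cell type, upregulated experiments / tested experiments."""
    labels = sorted(set(cell_types))
    fractions = []
    for xi, ti in zip(x, t):
        row = []
        for label in labels:
            idx = [j for j, c in enumerate(cell_types) if c == label]
            tested = sum(ti[j] for j in idx)
            upregulated = sum(xi[j] for j in idx)
            row.append(upregulated / tested if tested else 0.0)  # assumed: 0 if untested
        fractions.append(row)
    return fractions


def diversity_scores(f):
    """Return (d, g): cell type diversity (Eq. 5.12) and gene usage diversity (Eq. 5.15)."""
    total = sum(map(sum, f))
    if total == 0:
        return 0.0, 0.0
    per_cell_type = [sum(row[l] for row in f) / total for l in range(len(f[0]))]  # Eq. 5.13
    per_gene = [sum(row) / total for row in f]                                    # Eq. 5.16
    return entropy(per_cell_type), entropy(per_gene)


def normalized(score, n_outcomes):
    """Eqs. 5.17-5.18: divide by the maximum entropy; 0 for a single outcome (assumed)."""
    return score / math.log2(n_outcomes) if n_outcomes > 1 else 0.0


# Two genes, four experiments covering three cell types (labels are illustrative).
x = [[1, 1, 1, 1], [0, 0, 0, 0]]
t = [[1, 1, 1, 1], [1, 1, 1, 1]]
cell_types = ["ESC", "ESC", "HSC", "NSC"]

f = cell_type_fractions(x, t, cell_types)
d, g = diversity_scores(f)
d_norm = normalized(d, n_outcomes=3)  # s = 3 cell types
g_norm = normalized(g, n_outcomes=2)  # n = 2 genes
```

In this example one gene is upregulated in every cell type and the other in none, so the module is maximally diverse across cell types (normalized score 1) but shows no gene usage diversity (normalized score 0).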

5.5.3 Significance of diversity scores

The diversity score significance assessment was performed in a similar fashion to the evaluation of the recurrence scores. Randomized families from 1,000 permutations were used to generate cell-type diversity scores for random modules. For each family size, a false discovery rate was estimated at every cutoff from 0 to the maximum possible entropy score.

However, since the total number of tested cell types was a constant across modules of all sizes, and since empirical observations suggested that the FDR cutoffs for modules of different sizes would be relatively similar (see Figure 6.6), I generated a combined FDR estimate across all module sizes at each cutoff.

The combined estimate used a weighted average of the FDR levels associated with each size weighted by the number of families of that size. The final cell-type diversity score cutoff corresponded to the 5% FDR level cutoff; any family above it was considered cell type diverse. This cutoff was 2.5, where the maximum possible score that could be achieved for twelve stem cell types was 3.6.
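The size-weighted combination of FDR estimates can be sketched as follows. This is only an illustration under stated assumptions: the size-specific FDR values are taken as given (their estimation from the permutations is not reproduced here), the function names are hypothetical, and the numbers in the example are made up.

```python
def combined_fdr(fdr_by_size, families_by_size):
    """Weighted average of size-specific FDR estimates at one cutoff.

    fdr_by_size: dict mapping module size -> FDR estimate at this cutoff.
    families_by_size: dict mapping module size -> number of families of that size.
    """
    total = sum(families_by_size[size] for size in fdr_by_size)
    if total == 0:
        raise ValueError("no families to weight by")
    weighted = sum(fdr_by_size[size] * families_by_size[size] for size in fdr_by_size)
    return weighted / total

def first_cutoff_at_level(cutoffs, combined_fdrs, level=0.05):
    """Smallest cutoff whose combined FDR is at or below the requested level."""
    for cutoff, fdr in sorted(zip(cutoffs, combined_fdrs)):
        if fdr <= level:
            return cutoff
    return None  # no cutoff reaches the requested FDR level

# Made-up example: two module sizes with 10 and 30 families respectively.
example_fdr = combined_fdr({2: 0.10, 3: 0.02}, {2: 10, 3: 30})
example_cutoff = first_cutoff_at_level([2.0, 2.5, 3.0], [0.20, 0.05, 0.01])
```

Weighting by the number of families keeps rare module sizes, whose FDR estimates rest on few families, from dominating the combined estimate.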

Since gene diversity was not as crucial to the determination of stemness, but rather to the overall participation and contribution of individual member genes of the module, a simple cutoff was used to distinguish gene diverse modules from modules that exhibited low gene diversity. Families that showed a normalized gene diversity score higher than 0.5 were considered “gene-diverse” modules.

5.6 Specificity scoring

The discovery of modules that are reproducibly upregulated across many different stem cell types does not guarantee their specificity to undifferentiated cells. To identify and exclude any families that were also significantly upregulated across differentiated cells, I defined a specificity score. The specificity score I chose was based on the recurrence score of the module across all differentiated cell experiments. Significance of the recurrence score of a module across the differentiated cells was determined using the 5% FDR cutoffs for modules of that size as identified from the stem cell data. If a module was significantly upregulated in the stem cell experiments, but not significantly upregulated in the differentiated cell experiments, it was considered “specific.” If a module was significantly recurrently upregulated in both stem and differentiated cells, it was considered non-specific and excluded from further consideration in both the stemness “on” and “off” categories.

5.7 Pattern classification

The three scoring elements described in the previous sections allow the classification of modules into several different classes. Because of the noise inherent to microarray data and the worry about differences between the experimental designs of individual experiments, only significant recurrently upregulated modules were considered for further classification.

Some interesting biological phenomena that are not reproducible across all

conditions and cell populations could be missed because of this restriction, but I chose

the more conservative statistically sound approach, trying to reduce the biological false

positives at the expense of potential false negatives.

The most important distinction, as related to stemness, between individual recurrent modules is at the cell-diversity level (Figure 5.9).

Figure 5.9: A pattern classification procedure used to identify stemness modules. The path to the definition of both types of stemness modules – all-for-all (AFA) and one-for-all (OFA) – is outlined in red.

The cell diversity score delineates two classes of modules: modules with significantly high cell diversity scores and modules with low cell diversity scores. Highly (and significantly) cell-type diverse modules can fall in one of four categories:

• All-for-all (AFA) modules upregulate many member genes in most stem cell types. They show significant specificity to stem cells and have a normalized gene diversity score higher than 0.5. These gene modules can vary from protein complexes, which require all member genes to be differentially upregulated in stem cells, to homolog modules that exhibit multi-cell type specialization of individual genes, i.e. each gene can be highly upregulated in several stem cell types. AFA modules represent one of the two categories of gene sets that is associated with stemness.

• One-for-all (OFA) modules upregulate predominantly one gene in most stem cell types. They also show a significant specificity to stem cells, but they have a normalized gene diversity lower than 0.5. Even though many of these modules consist of more than one gene, there is only one predominant gene consistently upregulated in different stem cells. OFA modules represent the second category of gene sets that is associated with stemness.

• Constitutive module (CM) sets upregulate many member genes in most stem cell types and a significant set of differentiated cells as well. They are not specific to stem cells, but they have a normalized gene diversity score higher than 0.5. These modules consist of genes that are potentially required either for housekeeping roles in the cell, or alternatively have cell-type specific roles that are unrelated to the functional properties of stem cells.

• Constitutive gene (CG) sets upregulate predominantly one gene in most stem cell types and a significant number of differentiated cells. They are not specific to stem cells and have a normalized gene diversity score lower than 0.5. These modules are expected to be rare and theoretically consist of one predominant expressed gene in stem cells, and one or a combination of other genes expressed in differentiated cells.

Based on the definitions above, and for completeness, we can also define several categories of non-cell-type-diverse modules. However, in practice most of these categories do not bear a clear relevance to stemness, so I restricted the pattern classification to only two non-cell-type-diverse module categories:

• All-for-one (AFO) modules upregulate many member genes in a low number of stem cells and can be described as cell-type-specific modules. They have normalized gene diversity higher than 0.5 and can show specificity to stem cells, but are not required to as the definition of non-specificity does not have a clear meaning at this level.

• One-for-one (OFO) modules upregulate few member genes in a low number of stem cells and can be described as single-gene-dominated cell-type-specific modules. They have normalized gene diversity lower than 0.5.
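The six categories above amount to a small decision procedure, sketched here for a module that has already passed the recurrence filter. The function and argument names are hypothetical. The text gives no rule for a normalized gene diversity of exactly 0.5, so treating that case as not gene-diverse is an assumption of this sketch.

```python
def classify_module(cell_diverse, specific, gene_diversity_norm, gene_cutoff=0.5):
    """Assign a significantly recurrent module to one of the six pattern classes.

    cell_diverse: True if the cell diversity score exceeds the 5% FDR cutoff.
    specific: True if the module is not significantly recurrent in differentiated cells.
    gene_diversity_norm: normalized gene usage diversity score of the module.
    """
    gene_diverse = gene_diversity_norm > gene_cutoff  # a value of exactly 0.5 counts as low (assumption)
    if cell_diverse:
        if specific:
            return "AFA" if gene_diverse else "OFA"
        return "CM" if gene_diverse else "CG"
    # Specificity is not used for modules that are not cell-type diverse.
    return "AFO" if gene_diverse else "OFO"

def is_stemness_module(label):
    """Only the all-for-all and one-for-all classes are associated with stemness."""
    return label in {"AFA", "OFA"}

example_label = classify_module(cell_diverse=True, specific=True, gene_diversity_norm=0.8)
```

Keeping the three inputs as precomputed values separates the classification logic from the statistical tests that produce them.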

5.8 Stemness “on” and stemness “off” modules

I used the pattern classification procedures and the combination of all scores to define modules associated with stemness. Modules that fell in either the all-for-all (AFA), or one-for-all (OFA) categories were defined as stemness modules. As previously mentioned, while modules of upregulated genes in many stem cell types (stemness “on” modules) represented clear examples of gene sets with stem-cell-related functions, modules of downregulated genes across many stem cell types (stemness “off” or differentiation modules) could also be very important.

These stemness “off” modules could be identified in the exact same way as the stemness “on” modules, but using the differentiated cell data, instead of the stem cell data. Stemness “off” modules would then be defined as significantly recurrent across differentiated cell experiments, significantly cell-diverse across many differentiated cell types, and significantly specific to differentiated cells.

5.9 Formulation of stemness index

Stemness module identification is important for discovering families, modules and processes associated with self-renewal and the control of differentiation. Individual gene families and their potential regulatory roles in the maintenance of the stem cell state may provide testable hypotheses in existing stem cell types, as well as suggest potential markers for less well-studied stem cells. However, the SMA method can also be viewed as a feature selection method, which identifies the most relevant features (stemness families) that could be used to distinguish between stem cell and non-stem cell types.

In its simplest form, classification can be done through the calculation of a single stemness index score that measures how stem-cell-like the population is. A general advantage of such a score is that it avoids the explicit binary classification into stem cell and non-stem cell, as there is a range of cells at various stages of differentiation in between these two classes.

5.9.1 Notation definitions

I will denote the set of all stemness (“on”) modules with S, and the set of all differentiation (stemness “off”) modules with D. An individual module will still be denoted with M, and the total number of genes in module M will still be denoted with n.

Each index over stemness modules can be written as a, while each index over the differentiation modules can be written as b. The total number of stemness modules will be labeled with r and the total number of differentiation modules will be denoted with u.

The stemness index score will be written as SI, and each new experiment gene

signature will be denoted with E. The activation of an individual module M will be

labeled as A, while the fraction of upregulated genes in a module will be denoted with

f.

A summary of all stemness index-related notation symbols can be found in

Table 5.13.

5.9.2 Cross-validation setup

To select the most accurate stemness index score, I used a cross-validation

framework and specifically a 5-fold cross-validation setup based entirely on the mouse

stem cell compendium data.

The basic cross-validation setup is shown in Figure 5.10.

The 49 stem cell populations were randomly assigned to five different groups, where four groups consisted of ten experiments and one consisted of nine experiments. I then created five datasets from these groups, where in each dataset four of the groups were all used together as input to the SMA method, while the last group was used for testing. Thus, each dataset consisted of 39 or 40 input stem cell populations and the stemness score definition used only homolog modules as input. For each stem cell dataset I also defined the corresponding differentiated cell dataset. The differentiated cell dataset consisted of all differentiated cell experiments, associated with the particular stem cell experiments in that cross-validation subset.

Figure 5.10: Visual illustration of the five-fold cross-validation setup. The stem cell compendium is divided into five parts and each part uses 4/5 of the original data to identify stemness modules (black solid lines) and 1/5 of the original data (black dashed lines) to test and evaluate the stemness index scores.

Notation symbol    Notation description
E                  New experiment gene signature
M                  Module of genes
S                  Set of all stemness “on” modules
D                  Set of all stemness “off” (differentiation) modules
n                  Number of genes in module M
r                  Number of stemness modules in S
u                  Number of differentiation modules in D
a                  Index over the stemness modules in S
b                  Index over the differentiation modules in D
SI                 Stemness index score
A                  Activation score for a gene module
f                  Fraction of upregulated genes in a module

Table 5.13: Overview of the stemness index score notation. The first column represents the notation symbol used in the stemness index scoring definition. The second column gives the descriptions for all notation symbols introduced in the first column.

Following the SMA method application described earlier, I identified stemness modules from each cross-validation set and the summary of the results is presented in Tables 5.14 and 5.15. Because of the random assignment of experiments to each of the initial five groups, some of the groups had a smaller number of actual stem cell types than the original twelve stem cell types. To define the stemness modules, I used the previously established significance cutoffs for recurrence, but adjusted the significance cutoffs for the cell-type diversity to proportionally match the original cutoff based on the number of cell types used in training. For example, if the original cell diversity cutoff was d = 2.5 and the new subset of experiments consisted of only ten stem cell types instead of twelve, the cutoff was adjusted to $d = 2.5 \times \frac{\log(10)}{\log(12)}$.

CV fold      # recurrent stem    # stemness “on”     # recurrent stem    # stemness “on”
             cell homolog        homolog modules     cell functional     functional modules
CV fold 1    256                 74                  90                  39
CV fold 2    225                 82                  81                  53
CV fold 3    163                 84                  58                  39
CV fold 4    163                 50                  65                  43
CV fold 5    194                 40                  72                  28

Table 5.14: Stemness “on” modules identified in each cross-validation (CV) fold. Every row represents summary data for a different cross-validation subset. The first column defines the cross-validation folds. The second column (# recurrent stem cell homolog) shows the number of significantly recurrent homolog modules in the stem cell data from that cross-validation subset. The third column (# stemness “on” homolog modules) shows the number of stemness “on” homolog modules inferred from the stem cell data. The fourth column (# recurrent stem cell functional) shows the number of significantly recurrent functional modules in the stem cell data from that cross-validation subset. The fifth column (# stemness “on” functional modules) shows the number of stemness “on” functional modules identified from the stem cell data.
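The random five-way split and the proportional cutoff adjustment can be sketched as below. The function names, the fixed seed, and the placeholder experiment labels are all hypothetical; the sketch only mirrors the group sizes and the log-ratio rescaling described in this section and does not run the SMA method itself.

```python
import math
import random

def five_fold_groups(experiments, k=5, seed=0):
    """Randomly assign experiments to k groups of nearly equal size."""
    shuffled = list(experiments)
    random.Random(seed).shuffle(shuffled)  # fixed seed only for reproducibility
    return [shuffled[i::k] for i in range(k)]

def train_test_splits(groups):
    """Yield (training experiments, held-out experiments) for each fold."""
    for held_out, test in enumerate(groups):
        train = [e for i, group in enumerate(groups) if i != held_out for e in group]
        yield train, test

def adjusted_cutoff(original_cutoff, n_types_original, n_types_subset):
    """Rescale the cell diversity cutoff by log(subset types) / log(original types)."""
    return original_cutoff * math.log(n_types_subset) / math.log(n_types_original)

# Placeholder labels standing in for the 49 stem cell experiments.
groups = five_fold_groups(["exp%d" % i for i in range(49)])
group_sizes = sorted(len(g) for g in groups)
splits = list(train_test_splits(groups))
example_cutoff = adjusted_cutoff(2.5, 12, 10)
```

With twelve original cell types and ten in the training subset, the adjusted cutoff evaluates to about 2.32. The ratio of logarithms does not depend on the base, so the adjustment is the same whichever base is used for the entropy.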

To ensure that the stemness and differentiation modules inferred from each cross-validation (CV) fold were robust to the experiment selection, I compared the recurrence scores for each module across all cross-validation sets. If the modules were highly robust, they would show similar recurrence scores across all cross-validation sets. Indeed, this was the result I observed. Most modules showed highly similar scores across most cross-validation sets (Figure 5.11), suggesting that the stemness and differentiation modules identified from each cross-validation set are unlikely to be highly affected by the experiment selection in each cross-validation set.

CV fold      # recurrent diff.   # stemness “off”    # recurrent diff.   # stemness “off”
             cell homolog        homolog modules     cell functional     functional modules
CV fold 1    210                 44                  42                  16
CV fold 2    142                 27                  27                  11
CV fold 3    93                  24                  17                  10
CV fold 4    81                  8                   20                  2
CV fold 5    218                 16                  39                  6

Table 5.15: Differentiation families identified in each cross-validation (CV) fold. Every row represents summary data for a different cross-validation subset. The first column defines the cross-validation sets. The second column (# recurrent diff. cell homolog) shows the number of significantly recurrent homolog modules in the differentiated cell data from that cross-validation subset. The third column (# stemness “off” homolog modules) shows the number of stemness “off” (differentiation) homolog modules inferred from the differentiated cell data. The fourth column (# recurrent diff. cell functional) shows the number of significantly recurrent functional modules in the differentiated cell data from that cross-validation subset. The fifth column (# stemness “off” functional modules) shows the number of stemness “off” (differentiation) functional modules identified from the differentiated cell data.

5.9.3 Binary switch and fractional scoring

One naive approach for measuring the association of a new experiment with stemness is to summarize each stemness module by a single value, which captures the activation of the module in a given experiment. Two primary candidates include a “binary switch” score and a “fraction of gene activation” score. Given a new set of differentially expressed genes, the binary switch score considers a module activated if there is at least one gene in the module that has been identified as differentially expressed. The fractional score, on the other hand, estimates the activation of a module as the fraction of the genes in that module that have been marked as differentially expressed.

Figure 5.11: Robustness of recurrence scores across cross-validation (CV) folds. The heatmap shows the recurrence scores of all mouse homolog modules in each cross-validation fold. Each column represents a different cross-validation fold. Each row represents a different homolog module. To ease viewing, only modules that had a non-zero recurrence score in at least one cross-validation fold were plotted. High recurrence scores are shown in yellow, while the low (or zero) recurrence scores are shown in black.

To combine the activation input from all stemness modules, we can estimate the average stemness module activation score. Alternatively, we can define a weighted average stemness module activation score, where the weight contribution associated with each stemness module is the cell-diversity of the module, based on the results of the application of the original SMA method.

These possibilities suggest four different scoring combinations: a weighted fractional, weighted binary, unweighted fractional, and an unweighted binary score. The formulas for the stemness index SI of a new experiment gene signature E are given in Equation 5.19 (unweighted scoring combinations) and Equation 5.20 (weighted scoring combinations):

$$ SI(E) = \frac{\sum_{a \in S} A_a(E)}{r} \tag{5.19} $$

$$ SI(E) = \frac{\sum_{a \in S} d(a) \, A_a(E)}{\sum_{a \in S} d(a)} \tag{5.20} $$

$$ A_a(E) = \begin{cases} 1, & \text{if binary score and at least one gene in module } a \text{ is differentially expressed in } E; \\ 0, & \text{if binary score and no gene in module } a \text{ is differentially expressed in } E; \\ f_a(E), & \text{if fractional score,} \end{cases} $$

where d(a) is the cell diversity score associated with each stemness module a, and $f_a(E)$ is the fraction of genes in module a that are upregulated and are represented in the experiment gene signature E (i.e., number of genes in module a that are upregulated, divided by the total number of genes in the module).
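The four scoring combinations of Equations 5.19 and 5.20 can be illustrated with a minimal sketch. Modules are represented as sets of gene identifiers and the experiment signature as a set of upregulated genes; the function names, module names, gene names and diversity values are hypothetical.

```python
def fraction_active(module_genes, upregulated):
    """f_a(E): share of the module's genes that are upregulated in the experiment."""
    module_genes = set(module_genes)
    return len(module_genes & set(upregulated)) / len(module_genes)

def activation(module_genes, upregulated, binary):
    """A_a(E): binary switch (1 if any gene is upregulated, else 0) or the fraction."""
    f = fraction_active(module_genes, upregulated)
    if binary:
        return 1.0 if f > 0 else 0.0
    return f

def stemness_index(modules, diversity, upregulated, binary=False, weighted=False):
    """modules: dict name -> gene set; diversity: dict name -> cell diversity d(a)."""
    acts = {a: activation(genes, upregulated, binary) for a, genes in modules.items()}
    if weighted:
        # Diversity-weighted average of the activations.
        return sum(diversity[a] * acts[a] for a in acts) / sum(diversity[a] for a in acts)
    # Plain average over the r stemness modules.
    return sum(acts.values()) / len(acts)

# Made-up stemness modules and experiment signature.
modules = {"m1": {"g1", "g2"}, "m2": {"g3", "g4", "g5", "g6"}}
diversity = {"m1": 3.0, "m2": 1.0}
signature = {"g1", "g3"}
si_unweighted_fractional = stemness_index(modules, diversity, signature)
si_weighted_fractional = stemness_index(modules, diversity, signature, weighted=True)
si_unweighted_binary = stemness_index(modules, diversity, signature, binary=True)
```

In this toy case the weighted fractional score is pulled toward module m1, which has the higher cell diversity and the larger active fraction.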

All four stemness index scores (binary unweighted, binary weighted, fractional unweighted and fractional weighted) were evaluated on each of the cross-validation sets. Based on the stem cell (SC) experiments and differentiated cell (DC) experiments and only the homolog-based stemness features, I defined the precision and recall as follows:

$$ \text{Precision} = \frac{\#\,\text{SC exp. with index} > c}{(\#\,\text{SC exp. with index} > c) + (\#\,\text{DC exp. with index} > c)} \tag{5.21} $$

$$ \text{Recall} = \frac{\#\,\text{SC exp. with index} > c}{(\#\,\text{SC exp. with index} > c) + (\#\,\text{SC exp. with index} \le c)} \tag{5.22} $$

I swept through the cutoffs c and compared the precision-recall curves associated with each score. Initial observations suggested high similarity between all four scores, though the fractional weighted and unweighted scoring measures were most similar and performed slightly better than their binary counterparts (Figure 5.12).
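The cutoff sweep behind the precision-recall curves can be sketched as follows, given stemness indices for stem cell (SC) and differentiated cell (DC) experiments. The function names and the toy scores are hypothetical, and returning None for precision when no experiment exceeds the cutoff is a convention of this sketch rather than something stated in the text.

```python
def precision_recall(sc_scores, dc_scores, cutoff):
    """Precision and recall at one cutoff c, following Equations 5.21 and 5.22."""
    sc_above = sum(1 for s in sc_scores if s > cutoff)
    dc_above = sum(1 for s in dc_scores if s > cutoff)
    called = sc_above + dc_above
    precision = sc_above / called if called else None  # undefined when nothing is called
    recall = sc_above / len(sc_scores) if sc_scores else None
    return precision, recall

def sweep(sc_scores, dc_scores, cutoffs):
    """Return (cutoff, precision, recall) for every cutoff in the sweep."""
    return [(c,) + precision_recall(sc_scores, dc_scores, c) for c in cutoffs]

# Made-up stemness indices.
sc = [0.9, 0.8, 0.4]
dc = [0.5, 0.1]
curve = sweep(sc, dc, [0.45, 0.6, 1.0])
```

The denominator of the recall is simply the number of stem cell experiments, since every such experiment falls either above or at or below the cutoff.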

I also evaluated the precision of each stemness index score directly as a function of the score (Figure 5.13) and observed the highest correlation between the precision and index score (correlation coefficient = 0.869) using the weighted fractional score.

While these stemness index scoring measures captured some stem-cell-associated signal, overall their performance was relatively poor. This poor performance could be due to the lack of sufficient signal in the activation scores, which were the most significantly contributing source of information to the stemness index scores. In addition, consideration of not only stemness, but differentiation modules as well could have improved the overall accuracy of these measures. In this context, I examine a more sophisticated stemness index approach next.

Figure 5.12: Precision-recall comparison of four stemness index scores - a binary unweighted (green dashed line), a binary weighted (blue dashed line), a fractional unweighted (red dashed line), and a fractional weighted (black solid line) scoring measures. The calculation of the stemness indices is based on stemness “on” homolog features only. Recall is plotted on the x-axis, while precision is shown on the y-axis. The most accurate method is the one approximately closest to the upper right-hand-side corner.

Figure 5.13: Precision comparison of four stemness index scores. The upper panel shows the precision comparison between the binary unweighted (green dashed line) and a binary weighted (blue solid line) scoring measures. The lower panel shows the precision comparison between the fractional unweighted (red dashed line), and a fractional weighted (black solid line) scoring measures. Each x-axis shows directly the values produced by the scoring measures plotted in that panel. The y-axis shows the precision value.

5.9.4 Log-likelihood approach

Based on the methodology described earlier, the SMA method can identify not only stemness modules, but also differentiation modules. A more sophisticated approach would incorporate not only the stemness modules as features, but the differentiation modules as well. Such a score would be naturally balanced around zero, where positive scoring experiments are more stem cell-like, while negative scoring experiments are more differentiated cell-like.

It is also natural to incorporate all identified features, homolog and functional,

and compare the classification accuracy associated with the use of each type.

Each module can still be weighted by its own cell diversity, while the activation

element of the score can be associated with a value that captures the level of enrichment

of the module in the new experiment. While the fraction of upregulated genes described

as a possible choice in Section 5.9.3 carries some information about the enrichment of the

module, we also want to account for the module size and the deviation from expectation.

I consider two other related module activation scores:

• the log-ratio of the observed fraction of activated genes to the expected fraction

of activated genes, as given by the hypergeometric distribution (log-ratio method)

• the significance of overlap between the module and the genes upregulated in the

new experiment, as measured by a -log10 p-value score (p-value method)

I incorporated all of these considerations into the formulation of four different scores, mathematically formulated as shown in Equations 5.23–5.26. The first two equations (Equations 5.23–5.24) represent the log-ratio method, while Equations 5.25–5.26 correspond to the p-value method. All four formulas, however, build on the weighted fractional scoring introduced earlier in Equation 5.20.

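$$ SI(E) = \frac{\sum_{a \in S} d(a) \, \log\left(\frac{\mathrm{Observed}[f_a(E)]}{\mathrm{Expected}[f_a(E)]}\right)}{\sum_{a \in S} d(a)} - \frac{\sum_{b \in D} d(b) \, \log\left(\frac{\mathrm{Observed}[f_b(E)]}{\mathrm{Expected}[f_b(E)]}\right)}{\sum_{b \in D} d(b)} \tag{5.23} $$

$$ SI(E) = \frac{\sum_{a \in S} \left[ d(a) + \log\left(\frac{\mathrm{Observed}[f_a(E)]}{\mathrm{Expected}[f_a(E)]}\right) \right]}{\sum_{a \in S} d(a)} - \frac{\sum_{b \in D} \left[ d(b) + \log\left(\frac{\mathrm{Observed}[f_b(E)]}{\mathrm{Expected}[f_b(E)]}\right) \right]}{\sum_{b \in D} d(b)} \tag{5.24} $$

$$ SI(E) = \frac{\sum_{a \in S} d(a) \, \big(-\log_{10} h_a(E)\big)}{\sum_{a \in S} d(a)} - \frac{\sum_{b \in D} d(b) \, \big(-\log_{10} h_b(E)\big)}{\sum_{b \in D} d(b)} \tag{5.25} $$

$$ SI(E) = \frac{\sum_{a \in S} \left[ d(a) + \big(-\log_{10} h_a(E)\big) \right]}{\sum_{a \in S} d(a)} - \frac{\sum_{b \in D} \left[ d(b) + \big(-\log_{10} h_b(E)\big) \right]}{\sum_{b \in D} d(b)} \tag{5.26} $$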

Here d(a) is still the cell diversity score associated with stemness module a, while d(b) is the cell diversity score associated with differentiation module b. $f_a(E)$ is the fraction of upregulated genes in module a in experiment E.

The log-ratio scores used in Equations 5.23 and 5.24 are estimated based on the observed fraction of upregulated genes in the module and the expected fraction as calculated from the hypergeometric distribution. Finally, $h_a(E)$ corresponds to the p-value of the gene overlap between the module a and the set of upregulated genes in the new experiment gene signature E as estimated using the hypergeometric distribution.
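A minimal sketch of the multiplicative log-ratio score (Equation 5.23) and of the hypergeometric overlap p-value follows. Several choices here are assumptions that the text does not specify: the natural logarithm is used, the expected overlap is taken as the hypergeometric mean nK/N (module size n, K upregulated genes among N tested genes), and a pseudocount of 0.5 is added to the observed and expected counts so that modules with no upregulated genes do not produce log(0). All names and the toy data are hypothetical.

```python
import math

def log_ratio_activation(module_genes, upregulated, tested, pseudocount=0.5):
    """log(observed / expected) overlap; the pseudocount is an assumption of this sketch."""
    module = set(module_genes) & set(tested)
    up = set(upregulated) & set(tested)
    observed = len(module & up)
    expected = len(module) * len(up) / len(set(tested))  # hypergeometric mean n*K/N
    return math.log((observed + pseudocount) / (expected + pseudocount))

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) for n module genes drawn from N tested genes, K of them upregulated."""
    tail = sum(math.comb(K, x) * math.comb(N - K, n - x) for x in range(k, min(n, K) + 1))
    return tail / math.comb(N, n)

def weighted_activation(modules, diversity, upregulated, tested):
    """Diversity-weighted average of the log-ratio activations of a set of modules."""
    total = sum(diversity[m] for m in modules)
    return sum(diversity[m] * log_ratio_activation(genes, upregulated, tested)
               for m, genes in modules.items()) / total

def stemness_index_log_ratio(stem_modules, diff_modules, diversity, upregulated, tested):
    """Stemness term minus differentiation term, so positive values look stem cell-like."""
    return (weighted_activation(stem_modules, diversity, upregulated, tested)
            - weighted_activation(diff_modules, diversity, upregulated, tested))

# Made-up data: eight tested genes, four of them upregulated.
tested = {"g%d" % i for i in range(1, 9)}
upregulated = {"g1", "g2", "g3", "g4"}
stem = {"s1": {"g1", "g2"}}
diff = {"d1": {"g5", "g6"}}
diversity = {"s1": 1.0, "d1": 1.0}
si = stemness_index_log_ratio(stem, diff, diversity, upregulated, tested)
p = hypergeom_pvalue(2, 2, 4, 8)
```

In this toy case the stemness module is fully upregulated and the differentiation module not at all, so the index is positive. The p-value variants (Equations 5.25 and 5.26) would replace the log-ratio activation with the negative base-10 logarithm of the value returned by `hypergeom_pvalue`.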

I tested these four different scores using three different feature types: homolog modules-only, functional modules-only and a combined feature set for a total of twelve different scoring evaluations. I measured the accuracy of each stemness index scoring method using a precision-recall curve based on a sweep of all possible cutoffs ([−10 ... 10]) (Figure 5.14).

The results indicated that functional families perform globally worse than the homolog-based and combined-based feature sets regardless of the choice of scoring. The combined feature set compared more favorably with the homolog-only features, even though on a score-by-score basis the homolog-based features performed best (Figure 5.14).

I also observed a marked improvement in the accuracy of the log-ratio method over the p-value method. The method used for combining module score elements (multiplication versus sum) was not as crucial, though the multiplication method often showed higher accuracy. For the log-ratio activation measure method that used combined features, the accuracy of the “mult” method was noticeably higher than the “sum” method (Figure 5.14).

All homolog-based feature scores showed a significant improvement over the initial weighted binary or fractional scoring methods. This marked improvement in scores was most likely due to the incorporation of information about the expected activation of individual stemness modules, as opposed to the raw numbers of upregulated genes used as a measure of gene set (module) activation in the initial more naive approaches. Additionally, it is possible that the differentiation modules provided additional independent information from the stemness modules, which improved the predictive accuracy.

Based on these results and observations, I selected the multiplicative-based log-ratio method with a homolog-based stemness and differentiation module feature set as the final method to use for measuring stemness index scores. All further analyses use Equation 5.27.

$$ SI(E) = \frac{\sum_{a \in S} d(a) \, \log\left(\frac{\mathrm{Observed}[f_a(E)]}{\mathrm{Expected}[f_a(E)]}\right)}{\sum_{a \in S} d(a)} - \frac{\sum_{b \in D} d(b) \, \log\left(\frac{\mathrm{Observed}[f_b(E)]}{\mathrm{Expected}[f_b(E)]}\right)}{\sum_{b \in D} d(b)} \tag{5.27} $$

To assess if the stemness and differentiation homolog modules were truly predictive, I compared the performance of the homolog feature set with the average performance of 100 randomly selected homolog feature sets of equal size using the newly defined stemness index score. The stemness and differentiation module set of features performed significantly better than random features (t = 4.3182; p-value = 1.763e-05; paired Student t-test), as shown in Figure 5.15.

Finally, using the final stemness index scoring method I compared the pooled stemness index scores of all stem cell and differentiated cell experiments in the cross-validation sets from the mouse compendium. The stem cell experiments in the compendium had significantly higher stemness indices than the differentiated cell experiments (t = 9.8366; p-value < 2.2e-16; Welch two-sample t-test; Figure 5.16). These results indicated the stemness index score method could successfully distinguish between stem cell and differentiated cell experiments, which opens up some exciting application possibilities, such as the evaluation of the self-renewal potential of any cell based on its gene expression signature. Some of the possible applications are discussed further in

122

Page 138: STEMNESS REVISITED: A META ANALYSIS OF STEM CELL ... · UNIVERSITY OF CALIFORNIA SANTA CRUZ STEMNESS REVISITED: A META ANALYSIS OF STEM CELL SIGNATURES USING HIGH-THROUGHPUT DATA

0.0 0.2 0.4 0.6 0.8 1.0

0.5

0.6

0.7

0.8

0.9

1.0

Recall

Pre

cis

ion

Homolog P!value Mult

Homolog LogRatio Mult

Homolog P!value Sum

Homolog LogRatio Sum

Functional P!value Mult

Functional LogRatio Mult

Functional P!value Sum

Functional LogRatio Sum

All P!value Mult

All LogRatio Mult

All P!value Sum

All LogRatio Sum

Figure 5.14: Precision-recall comparison of twelve stemness index scores, based onhomolog-only (red), functional-only (blue) and combined (green) features. X-axis mea-sures the recall associated with each method, while the y-axis measure the precisionof each method. The most accurate method should be approximately in the top right-hand-side corner. The comparison between the twelve stemness index scoring measuressuggests that the multiplicative-based log-ratio method, based on a homolog-based fea-ture set (red dashed line) has the highest accuracy.

123

Page 139: STEMNESS REVISITED: A META ANALYSIS OF STEM CELL ... · UNIVERSITY OF CALIFORNIA SANTA CRUZ STEMNESS REVISITED: A META ANALYSIS OF STEM CELL SIGNATURES USING HIGH-THROUGHPUT DATA

0.0 0.2 0.4 0.6 0.8 1.0

0.5

0.6

0.7

0.8

0.9

1.0

Recall

Pre

cis

ion

Stemness homolog features(logRatio mult)

Random homolog features(logRatio mult)

Figure 5.15: Precision-recall comparison of the real stemness and di!erentiation featuresto 100 randomly selected feature sets. The red line indicates the performance of thereal feature set of stemness and di!erentiation homolog modules, while the black dashedline shows the average performance of 100 random homolog feature sets of the samesize as the original feature set. The real stemness and di!erentiation features performsignificantly better than the average random feature sets.

124

Page 140: STEMNESS REVISITED: A META ANALYSIS OF STEM CELL ... · UNIVERSITY OF CALIFORNIA SANTA CRUZ STEMNESS REVISITED: A META ANALYSIS OF STEM CELL SIGNATURES USING HIGH-THROUGHPUT DATA

Chapter 8.
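The Welch two-sample comparison of stem cell and differentiated cell stemness indices can be illustrated with a short standard-library sketch. This is not the analysis code of the study; the function name and the two toy score lists are hypothetical, and only the textbook Welch statistic and the Welch-Satterthwaite degrees of freedom are computed (no p-value).

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    se2_a = variance(sample_a) / len(sample_a)  # squared standard error of mean A
    se2_b = variance(sample_b) / len(sample_b)  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df

# Hypothetical stemness index scores, for illustration only.
stem_scores = [2.0, 3.0, 4.0]
diff_scores = [0.0, 1.0, 2.0, 1.0]
t_stat, dof = welch_t(stem_scores, diff_scores)
```

A positive t statistic here corresponds to higher stemness indices in the first group, which is the direction reported for the stem cell experiments.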

5.10 Summary

This chapter gave a detailed description of the SMA method designed to identify and measure stemness mechanisms in different stem cell types. Section 5.1 provided a brief overview of the method. The microarray expression data used in these studies was summarized in Section 5.2. Section 5.3 described the generation of functional and evolutionarily-related gene modules. Section 5.4 outlined the general form of the recurrence score, as well as the methods used to select the final parameters of the score. Section 5.5 introduced the cell-type and gene-based diversity scores used later for module type classification. Section 5.6 described the specificity score used to distinguish between modules used primarily by stem cells and modules used by differentiated cells as well. Section 5.7 used these three scoring measures to classify modules into different pattern types. Section 5.8 briefly discussed the methodology for the inference of stemness “off” modules, along with the stemness “on” modules. The last section, Section 5.9, outlined various scoring measures that assess how stem cell-like a gene signature is.


[Figure 5.16 plot: box plots of the stemness index score (y-axis) for differentiated cell experiments (left) and stem cell experiments (right).]

Figure 5.16: A box-plot comparison of the stemness indices of all stem cell and differentiated cell experiments in the mouse stem cell compendium, as defined by the cross-validation setup. Stem cell signatures (right side) in the compendium show significantly higher stemness indices than differentiated cell signatures (left side) based on cross-validation results. X-axis shows the two input types, stem cell and differentiated cell signatures, while the y-axis shows the stemness index (SI) score. The thick bands in the middle of each box represent the median values, while the lower and upper ends of each box represent the 25th and 75th percentiles.


Chapter 6

Stemness mechanisms in mouse stem cells

The goal of this chapter is to address the first central question of this dissertation: do functional redundancy and tissue-specific expression mask the common stem cell mechanisms? The chapter presents and summarizes the results of applying the SMA method to a large mouse stem cell compendium of gene expression data, collected as part of the study. Section 6.1 describes the selection of recurrently upregulated modules from the mouse data set compendium, while Section 6.2 presents the selection of cell-diverse modules. Section 6.3 summarizes the classification of all significantly recurrent mouse modules into the different pattern types introduced in the previous chapter. Section 6.4 discusses some of the most interesting stemness modules identified by the stemness meta-analysis method. Section 6.5 presents a comparison of the results of the SMA method to those of the most similar study in the literature.


6.1 Identification of recurrent modules from mouse dataset compendium

I tested a collection of diverse modules of functionally- and evolutionarily-

related genes. All homolog and functional modules were defined and processed as de-

scribed in Section 5.3. After neighbor expansion, I identified 9,908 homolog families,

comprised of 4,653 mutually-exclusive homolog groups (with two or more gene members)

and 5,255 biological singletons (genes without a close homolog in the mouse genome),

as well as 611 candidate functional gene modules.

I applied the recurrence, diversity and specificity scoring measures to the entire

collection of mouse gene modules. For each type of score, I estimated the false-discovery

rate using the permutation analysis described in Section 5.4. I found a significant shift in

recurrence scores between the gene modules and the negative controls, which consisted

of sets of randomly grouped genes with the same size distribution as the real gene

modules (t = 7.2669, p-value = 3.498e–13; Figure 6.1).
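The permutation procedure itself is defined in Section 5.4. As a hedged illustration only, one common empirical estimator divides the average number of randomized modules that reach a score cutoff by the number of real modules that reach it. The function name, the convention for an empty selection, and the toy scores below are assumptions, and the estimator may differ in detail from the one used in this study.

```python
def empirical_fdr(real_scores, permuted_score_sets, cutoff):
    """Estimate FDR at a cutoff as (mean false positives per permutation) / (real hits)."""
    real_hits = sum(score >= cutoff for score in real_scores)
    if real_hits == 0:
        return 0.0  # assumed convention: no discoveries, so no false discoveries
    mean_false_hits = sum(
        sum(score >= cutoff for score in permuted)
        for permuted in permuted_score_sets
    ) / len(permuted_score_sets)
    return min(1.0, mean_false_hits / real_hits)

# Hypothetical recurrence scores for real and randomized modules.
real = [0.5, 1.5, 2.0, 3.0]
permutations = [[0.2, 1.6, 0.1, 0.3], [0.4, 0.9, 0.2, 0.1]]
fdr_at_cutoff = empirical_fdr(real, permutations, cutoff=1.4)
```

Scanning the cutoff over a grid and taking the smallest value whose estimate falls below 5% would give a score threshold of the kind reported here.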

The recurrence analysis showed that, at a 5% FDR cutoff, 266 homolog modules (Figure 6.2) and 94 of the functional modules (Figure 6.3) were coordinately upregulated across stem cell experiments. This is consistent with the existence of a set of homolog families with genes coordinately upregulated in stem cells.


[Figure 6.1 plot: histograms of the number of modules (y-axis) by recurrent upregulation score (x-axis) for real modules and randomized modules, with the significance cutoff marked; panel A on a linear scale, panel B on a log scale.]

Figure 6.1: (A) A representative recurrence score distribution of all 1098 modules of size 3 (1091 homolog modules and 7 functional modules), shown in black, compared to randomized modules shown in yellow, based on 1000 permutations of the original data, indicates a significant shift in recurrence scores. X-axis shows the recurrent upregulation score, while the y-axis shows the number of modules in each bin. The recurrence score significance cutoff is 1.4. (B) The same recurrence score distribution is shown on a log scale. Empty bins have been assigned a floor value of 0.00001 to facilitate log scale plotting.


[Figure 6.2 plot: FDR (y-axis, log scale, 0.05 to 1.00) versus recurrent upregulation score (x-axis, 0 to 5), one curve per module size (1, 2, 3, 4, 5-10 and >10 genes), with the FDR = 5% line marked. Annotated numbers of modules passing the cutoff: 30, 47, 43, 62, 66 and 18; total = 266 significant modules.]

Figure 6.2: Selection of significant recurrently upregulated mouse homolog families. X-axis corresponds to the size-dependent recurrence score, while the y-axis shows the false discovery rate on a log scale. Each color represents the FDR curve associated with a different module size. The value under each colored arrow represents the number of upregulated homolog families of that size that passed the recurrence cutoff for that size, shown in Table 5.11. The FDR cutoff used to identify significantly recurrent modules was 5%.

[Figure 6.3 plot: two pie charts. Functional groups: 518 non-recurrent, 94 recurrent. Homolog groups: 4417 non-recurrent families, 236 recurrent families.]

Figure 6.3: Fraction of the modules of each input type that are significantly recurrently upregulated (yellow). Each pie chart examines only the non-singleton modules (modules with more than one gene member). Non-recurrent homolog modules are shown in blue, while non-recurrent functional modules are shown in red.


6.1.1 Recurrent module swap control

“Housekeeping” gene modules could exhibit broad upregulation across a panel of unrelated cells completely independent of stem cell mechanisms. Thus, it was possible that the recurrence analysis would identify significant modules, regardless of the input data. To assess if such housekeeping processes appreciably confounded the analysis, I performed a “swap” experiment. The purpose of this experiment was to test whether functionally similar cells, such as stem cells, coordinately upregulate genes within gene modules to a greater extent than functionally different populations, such as differentiated cells.

For most experiments included in the mouse compendium, I had two input lists of differentially expressed genes. One consisted of the genes upregulated in the stem cell and the other consisted of the genes downregulated in the stem cell (upregulated in the differentiated cell). To identify significant recurrently upregulated modules in stem cells, I usually used only the sets of upregulated genes in the stem cell.

The “swap” experiment can be described as follows: I replaced the lists used as input to the recurrence analysis, such that each list of upregulated genes in the stem cell was swapped with its corresponding list of genes downregulated in the stem cell. This swap was performed for an increasing number of experiments until all input lists had been replaced with their counterparts of downregulated genes. At each swap level, the swap was performed ten different times, such that the subset of experiments that were chosen to be swapped was randomly selected. For example, if the lists of upregulated genes in five different stem cell populations had to be swapped, these five lists were randomly selected from the entire input set exactly ten times.
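The swap design described above can be sketched in a few lines. This is an illustrative reconstruction rather than the code used in this study: the function names, the fixed random seed, and the toy gene lists are hypothetical, and the recurrence analysis that would be run on each swapped input set is omitted.

```python
import random

def swap_inputs(up_lists, down_lists, n_swaps, rng):
    """Replace n_swaps randomly chosen up-lists with their down-list counterparts."""
    chosen = set(rng.sample(range(len(up_lists)), n_swaps))
    return [down_lists[i] if i in chosen else up_lists[i]
            for i in range(len(up_lists))]

def swap_series(up_lists, down_lists, levels, repeats=10, seed=0):
    """For each swap level, draw `repeats` independent random swapped input sets."""
    rng = random.Random(seed)
    return {level: [swap_inputs(up_lists, down_lists, level, rng)
                    for _ in range(repeats)]
            for level in levels}

# Hypothetical inputs: one up-list and one down-list per experiment.
up = [["u1"], ["u2"], ["u3"]]
down = [["d1"], ["d2"], ["d3"]]
series = swap_series(up, down, levels=[0, 2, 3])
```

Level 0 reproduces the stem-cell-only input, the highest level reproduces the differentiated-cell-only input, and each intermediate level yields ten random mixtures.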

If the recurrence of modules was more tightly related to the functional similarity of the stem cells, I would expect that as I replaced the input lists of genes upregulated in the stem cells with the genes upregulated in the differentiated cells, I would identify fewer recurrently upregulated modules.

To test this expectation, I measured the FDR for functional and homolog modules separately and evaluated it as a function of the number of swaps. The cutoffs used to identify recurrent modules in each context were based on stem cell-only input (zero swaps).

Homolog modules showed a stronger swap trend than functional modules. The

lowest false discovery rate of shared homolog families (FDR=6.8%±0.0006) was achieved

when no swaps were introduced, indicating that stem cells share homolog families to

a significantly greater extent than functionally unrelated cell populations (Figure 6.4).

The functional gene families also showed their lowest false discovery rate in the datasets

associated with no gene list swaps.

The FDR associated with the datasets consisting entirely of stem cells was slightly above the expected 5% (FDR=6.8% ± 0.0006), most likely because, in the swap test, false discovery rates were calculated from a smaller number of random permutations (100, instead of 1,000) than in the original analysis used to select the recurrence score cutoffs. Even though the higher number of permutations would have given a more accurate FDR estimate, the smaller number of permutations was used to reduce the computational time.

6.1.2 Cultured cell bias

Because many of the populations were derived from cultured stem cells, I mea-

sured the extent to which the results might be due to the comparison of cultured cells

rather than stem cells. The cultured cells included all neural and embryonic stem cell

sources, as well as a couple of other stem cell types, such as spermatogonial and liver

stem cells. To get a conservative upper bound on the degree of influence of highly pro-

liferative cultured cells on the results, I excluded all data from cultured cell populations

(27/49 data sets) and recalculated recurrence.

I scored the recurrence of modules using only the 22 individual primary cell

populations and identified 112 modules. Similarly, using only the 27 cultured cell pop-

ulations, I identified 177 modules. To estimate the extent to which the results reflect

signatures from only cultured cells, I compared the modules derived from the primary,

cultured, and compendium analyses (Figure 6.5). I found that most of the modules recovered in the sub-populations were also detected when the compendium was analyzed: 76 out of 112 primary modules and 139 out of 177 cultured modules.

The results show that over half of the modules (143 out of 266; 54%) identified from the compendium are due to the inclusion of primary cells (moon-shaped area in Figure 6.5). The signal in recurrent modules that did not have strong primary cell contribution was not necessarily associated with the cultured status of the cells, as the cultured cell data included all embryonic and neural stem cells, so the source of the signal could also be stem cell-associated.

[Figure 6.4 plot: bar chart of FDR (y-axis, 0.0 to 0.4) versus number of swapped datasets (x-axis, 0 to 40 in steps of 5).]

Figure 6.4: Swap control of mouse stem cell data. X-axis shows the number of swapped input lists of upregulated genes, where the values range from 0 swapped experiments (stem-cell-only input) to 40 (differentiated-cell-only input). Y-axis shows the false discovery rate (FDR) as a function of the number of swapped experiments. For each bar, the average FDR across ten experiments is plotted, while the error bars represent the standard error from the ten FDR summary measurements. The swap control experiment suggested that stem cells share homolog (red) and functional (blue) modules to a significantly greater extent than functionally unrelated cells (light pink: homologs; light blue: functional modules). Homolog families showed the lowest FDR and a stronger trend than functional modules, based on the more contrasting results between stem cell and functionally unrelated cells.

[Figure 6.5 plot: three-set Venn diagram of recurrent modules for primary cell-only data, cultured cell-only data and all compendium data. Region counts shown in the diagram: 67, 123, 60, 16, 38 and 36.]

Figure 6.5: Proliferation bias control shows low bias impact of cultured cell data. Each Venn bubble represents the number of recurrent modules identified using each type of input data: cultured-cell-only input (pink), primary cell-only input (blue), and combined cell input (yellow). The thick black dashed line demarcates the set of recurrent modules that have primary cell contribution. The moon-shaped area represents 54% of the recurrent modules identified using the whole compendium.

These results show that the significant recurrent modules identified by the com-

prehensive meta-analysis are not likely to be heavily influenced by the over-representation

of cultured stem cell types in the compendium.
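The overlap counts quoted in this section can be cross-checked with simple arithmetic. In the sketch below, the assignment of the numbers printed in Figure 6.5 to specific Venn regions is my own reading, chosen because it reproduces the totals stated in the text; the variable names are hypothetical.

```python
# Region counts within the compendium bubble of Figure 6.5 (assumed mapping).
compendium_only = 67
compendium_and_cultured_only = 123
compendium_and_primary_only = 60
in_all_three = 16

primary_in_compendium = compendium_and_primary_only + in_all_three    # 76 of 112
cultured_in_compendium = compendium_and_cultured_only + in_all_three  # 139 of 177
compendium_total = (compendium_only + compendium_and_cultured_only
                    + compendium_and_primary_only + in_all_three)     # 266
moon_shaped_area = compendium_total - compendium_and_cultured_only    # 143
moon_shaped_share = moon_shaped_area / compendium_total               # about 0.54
```

Under this mapping, the moon-shaped area is every compendium module except those also recovered from the cultured-cell-only data alone.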

6.2 Selection of significant cell-diversity scores

Once recurrently upregulated modules were selected, I applied the cell type-diversity measure to all significantly recurrent mouse modules. The significance cutoff selection method was identical to the recurrence scoring significance evaluation and was already described in Section 5.5.3. I used this methodology to select the final cell-diversity score cutoff of 2.5 (max = 3.6) at a false discovery rate cutoff of 5%. At this cutoff, 114 of the 266 significantly recurrent homolog families were identified as cell-type diverse (Figure 6.6).


[Figure 6.6 plot: panel A shows FDR (y-axis, log scale) versus cell type diversity cutoff (x-axis, 0.0 to 3.5), one curve per family size (1, 2, 3, 4, 5, 10, 15, 20, 25 and >25 genes) plus the average curve, with the FDR = 5% line marked; panel B shows the number of homolog families above the cutoff (y-axis, log scale) versus cell type diversity cutoff for all homolog families and for significant recurrent homolog families, with the value 114 annotated.]

Figure 6.6: Selection of significant cell-type diverse modules in the mouse stem cell compendium. (A) The cutoff was selected as the 5% FDR cutoff score associated with the weighted average of the FDR curves for all family sizes. Each color represents the FDR curve associated with a different module size. X-axis represents the cell diversity score, while the y-axis shows the FDR in log scale. To facilitate log plotting, a floor value of 0.0001 is selected for all entries that would be otherwise 0. (B) At the 5% FDR cutoff, 114 recurrent homolog families (red) passed the criteria and were labeled as cell-type diverse modules.


6.3 Classification of modules using diversity and specificity scoring

Using the diversity and specificity scoring schemes, I classified the significant

recurrent evolutionary and functional modules into the classes described in Section 5.7.

6.3.1 Single-gene stemness

To distinguish between the contribution of the modules and the contribution of individual genes, I first applied the SMA method to single-gene-only input, i.e. every gene was treated as a module. Consistent with the founder studies, few genes were upregulated in common across different stem cells. In total, 38 genes had significant recurrence, cell diversity and specificity scores, and I refer to them as stemness genes. Examples of stemness genes were Mcm2, Mcm4, Pcna, Set, Cdt1 and several cyclin genes. Many of the identified stemness genes have been implicated primarily in roles associated with replication and the replication fork, though they could have other unknown roles in the cell as well. The full list of stemness genes is shown in Table 6.1.

6.3.2 Module-level stemness

At the evolutionary and functional module level I identified 124 all-for-all (AFA) stemness modules and 38 one-for-all (OFA) stemness modules (Table 6.2; Figure 6.7). Of the OFA stemness homolog modules, 4 were trivial as they were among the single stemness genes identified earlier. Homolog modules were enriched among the identified AFA and OFA patterns.

Functional association   Stemness genes                                           Enrichment p-value
DNA replication          Pcna, Cdt1, Orc1l, Mcm2, Mcm4, Mcm5*, Cdc6*, Rrm2        1.243713e-05
Cell division            Top2a, Ccnd2, Mcm5*, Cks2, Bub1, Cdc6*, Hells, Ruvbl1    0.0008
Chaperonin               Cct3, Cct5, Cct8                                         -
Other                    Kpna2, Ncl, Nap1l1, Dph5, Ttk, Col18a1, Impdh2, Ipo5,    -
                         Shmt1, Depdc6, Set, Fignl1, Dnahc11, Shroom3, Prps1,
                         Hnrnpa2b1, Sfrs3, Dtymk, Csrp2, Eya2, Fbl

Table 6.1: A summary of the 38 stemness genes identified by the single-gene SMA method is shown along with the functional categories significantly enriched in the stemness genes. The first column gives the name of the functional category. The second column shows the names of the stemness genes that fall in that functional category. Genes that have been associated with more than one significant functional category are marked with an asterisk (italicized in the original). Genes placed in the “Other” category are not identified with any significant functional group. The third column shows the p-value associated with the significance of the overlap, as measured by the hypergeometric distribution, between the set of all 38 stemness genes and the functional category represented in that row. If the p-value > 0.05, no value is shown.
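The enrichment p-values in Table 6.1 are described as hypergeometric overlap tests. A minimal standard-library sketch of such a test is given below; the function name is hypothetical, the upper-tail form P(X >= overlap) is assumed, and the background and category sizes in the toy call are made-up values, not those used in the study.

```python
from math import comb

def hypergeom_enrichment_p(population, category, sample, overlap):
    """Upper-tail probability P(X >= overlap) for a hypergeometric draw.

    population: number of genes in the background
    category:   number of background genes annotated to the functional category
    sample:     number of genes in the tested set (e.g. the stemness genes)
    overlap:    number of tested genes that fall in the category
    """
    upper = min(category, sample)
    favorable = sum(comb(category, k) * comb(population - category, sample - k)
                    for k in range(overlap, upper + 1))
    return favorable / comb(population, sample)

# Toy numbers for illustration only.
p_value = hypergeom_enrichment_p(population=10, category=4, sample=3, overlap=2)
```

With the real data, `sample` would be 38 and `overlap` would be the number of stemness genes listed in the corresponding row of the table.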

Interestingly, approximately 70% (80/110) of the stemness AFA and OFA homolog modules did not contain a single recurrent stemness gene. Therefore, the diversity of these homolog modules derives mainly from the complementary pattern of upregulation across different member genes and could not have been identified using single gene-level analysis.

In summary, 78 stemness homolog AFAs (HAFA; Table B.1) and 46 functional

stemness AFAs (FAFA; Table B.2) were identified, as well as 25 stemness homolog OFA

modules (HOFA; Table B.3) and 13 stemness functional OFA modules (FOFA; Table

B.4). These results indicate an approximately 3-fold enrichment of modules (both ho-

molog and functional) that upregulate a high number of their member genes (AFA mod-

ules) over modules with a lower gene usage (OFA modules). For comparative purposes,

if we select for non-stemness (tissue-specific) patterns, there is only 1.2-fold enrichment

of modules that use a high number of their member genes over modules that use a low

number of their member genes.


Class   Cell diversity   Gene diversity   Specificity   Homolog modules   Functional modules
AFA*    +                +                +             78                46
OFA*    +                -                +             25(21)            13
AFO     -                +                -/+           84                17
OFO     -                -                -/+           68(42)            15
CM      +                +                -             10                2
CG      +                -                -             1                 1
Total                                                   266(236)          94

Table 6.2: Summary of the classification and distribution of all recurrently upregulated gene modules in the mouse stem cell compendium. The stemness modules (AFA and OFA pattern classes) identified by the SMA method are marked with an asterisk (bold in the original).
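The decision logic implied by the +/- columns of Table 6.2 can be written as a small function, and the fold enrichments quoted in the text can be recomputed from the table counts. This is a sketch under assumptions: the function name and boolean inputs are hypothetical, and equating the non-stemness (tissue-specific) comparison with the AFO and OFO rows is my reading of the text.

```python
def classify_module(cell_diverse, gene_diverse, stem_specific):
    """Map the three +/- calls of Table 6.2 to a pattern class label."""
    if not cell_diverse:
        # Specificity is listed as -/+ for these classes, so it is ignored here.
        return "AFO" if gene_diverse else "OFO"
    if stem_specific:
        return "AFA" if gene_diverse else "OFA"
    return "CM" if gene_diverse else "CG"

# (homolog, functional) module counts copied from Table 6.2.
counts = {"AFA": (78, 46), "OFA": (25, 13), "AFO": (84, 17),
          "OFO": (68, 15), "CM": (10, 2), "CG": (1, 1)}
total = {label: sum(pair) for label, pair in counts.items()}

stemness_fold = total["AFA"] / total["OFA"]      # 124 / 38
non_stemness_fold = total["AFO"] / total["OFO"]  # 101 / 83
```

Under this reading, the two ratios come out near 3.3 and 1.2, in line with the approximately 3-fold and 1.2-fold enrichments stated above.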


[Figure 6.7 plot: five side-by-side panels (module size, recurrence, cell-type diversity, gene diversity, specificity) with the significance cutoff or threshold marked in each scoring panel; legend: stemness modules (AFA), stemness modules (OFA), recurrent non-stemness modules.]

Figure 6.7: Global overview of classes of functional and homolog recurrent modules. Red and green demarcate all stemness homolog and functional modules. All recurrent modules that did not pass the criteria for stemness are shown in grey.


I observed many families with a high level of interchangeability between indi-

vidual gene members, which suggests the dispensability of individual genes. One clear

example of tissue specificity among the stemness AFA homolog families came from the

p53 family of genes, specifically p53 (Trp53 in Figure 6.8), p63 (Trp63) and p73 (Trp73).

p63 is necessary primarily for the maintenance of epithelial stem cells and the stemness

meta-analysis showed a clear separation of the expression of this gene in breast, gastric

and intestinal stem cells (Figure 6.8), while p53 – its well-known paralog – was upreg-

ulated in most other stem cell types, including hematopoietic, embryonic and neural

stem cells.

The stemness analysis results indicate that some homolog families, such as the Myb (Figure 6.9) and Rbp (Figure 6.10) families, show a clear alternation in gene usage between different stem cell types, while other groups can use a variable set of genes within the different types of stem cells.

The mechanisms of specialization to these different stem cells are particularly interesting. Specifically, some families may have developed a mostly exclusive cell type specificity of individual genes, which we can trace through the evolution of the homolog family. Other groups may, however, allow for a more stochastic nature of homolog partner gene use in the cell, which may be advantageous for the maintenance of the stem cell state. Understanding the differences between these module types may also allow us to better predict the contribution of individual genes to self-renewal.

Figure 6.8: Stem-cell-only expression pattern of the p53 tumor suppressor gene family: p53 (Trp53), p63 (Trp63), and p73 (Trp73). The pattern shows the specialization of p53 and p63 to different cell types. Each column represents one of the twelve stem cell types. Each row represents a different module member gene. The value for each gene in a given cell type is calculated as the fraction of experiments of that stem cell type that measured the gene as upregulated, ranging from 0 (the gene was not upregulated in a single experiment) to 1 (the gene was upregulated in all experiments of that cell type). Genes that have not been tested in a given cell type are shown in grey.

Figure 6.9: Stem-cell-only expression pattern of the Myb gene family: a-myb (Mybl1), b-myb (Mybl2), and c-myb (Myb). Each column represents one of the twelve stem cell types. Each row represents a different module member gene. The value for each gene in a given cell type is calculated as the fraction of experiments of that stem cell type that measured the gene as upregulated, ranging from 0 (the gene was not upregulated in a single experiment) to 1 (the gene was upregulated in all experiments of that cell type). Genes that have not been tested in a given cell type are shown in grey. The family shows a very diverse pattern of upregulation across different family members.

Figure 6.10: Stem-cell-only expression pattern of the Rbp gene family: Rbp1, Rbp2, and Rbp7. Each column represents one of the twelve stem cell types. Each row represents a different module member gene. The value for each gene in a given cell type is calculated as the fraction of experiments of that stem cell type that measured the gene as upregulated, ranging from 0 (the gene was not upregulated in a single experiment) to 1 (the gene was upregulated in all experiments of that cell type).
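The per-cell-type values plotted in Figures 6.8 to 6.10 can be illustrated with a short sketch. The data layout (one record per experiment with its cell type, the set of genes it tested, and the set it called upregulated) and the function name are hypothetical, and I assume that only experiments that tested the gene enter the denominator; an untested gene yields None, which corresponds to a grey cell.

```python
def upregulation_fraction(experiments, gene):
    """Per cell type, the fraction of experiments that called `gene` upregulated."""
    tallies = {}  # cell type -> [number upregulated, number tested]
    for experiment in experiments:
        tally = tallies.setdefault(experiment["cell_type"], [0, 0])
        if gene in experiment["tested"]:
            tally[1] += 1
            if gene in experiment["up"]:
                tally[0] += 1
    return {cell_type: (up / tested if tested else None)
            for cell_type, (up, tested) in tallies.items()}

# Hypothetical experiments, for illustration only.
toy_experiments = [
    {"cell_type": "HSC", "tested": {"Trp53", "Trp63"}, "up": {"Trp53"}},
    {"cell_type": "HSC", "tested": {"Trp53"}, "up": set()},
    {"cell_type": "ESC", "tested": {"Trp53"}, "up": {"Trp53"}},
    {"cell_type": "InSC", "tested": {"Trp63"}, "up": {"Trp63"}},
]
trp53_pattern = upregulation_fraction(toy_experiments, "Trp53")
```

Stacking such dictionaries for every member gene of a family gives one heatmap row per gene and one column per stem cell type.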

6.4 Stemness modules

Families whose member genes are involved in the maintenance of the stem cell

state across a large number of stem cell types are particularly biologically interesting,

because they can be potentially used as markers for stem cell types that are not well

understood. Alternatively, these genes could also be candidates for pluripotency or

multipotency induction in mature cells of various cell types.

I primarily focused on the 103 significant stemness (AFA and OFA; Tables

B.1 and B.3) homolog families that showed the highest level of stem cell type diversity.

Functional enrichment analysis of the stemness homolog modules showed a wide variety

of functional categories varying from Wnt pathway signaling to chromatin assembly,

phosphatase activity, ligase and ATPase activities and DNA repair (Figure 6.11; up-

per panel). I also identified stemness functional modules associated with imprinting,

chromatin-dependent silencing, heterochromatin and the nuclear lamina, consistent with

a widespread suppression of many lineage-associated genes before differentiation (Figure 6.11; lower panel).

A few themes emerged pertaining to the stemness families. These themes

provide an understanding of how the cell may achieve the balance between quiescence,

proliferation, apoptosis, and di!erentiation. I observed several proto-oncogene families,

balanced by the expression of more than a few tumor suppressor factor families; signaling

pathways known to increase self-renewal along with proteins known to downregulate

signaling and bring back the cell to quiescence. Other regulatory molecules included

147

Page 163: STEMNESS REVISITED: A META ANALYSIS OF STEM CELL ... · UNIVERSITY OF CALIFORNIA SANTA CRUZ STEMNESS REVISITED: A META ANALYSIS OF STEM CELL SIGNATURES USING HIGH-THROUGHPUT DATA

[Figure 6.11: two heatmaps; color scale from 0 to 1.

Upper panel, functional categories enriched in stemness homolog (AFA and OFA) modules: RNA splicing; ligase activity; DNA-dependent ATPase activity; chromatin assembly/disassembly; response to stimulus; DNA repair; DNA metabolic process; protein metabolic process; protein Tyr/Ser/Thr phosphatase activity; regulation of cell cycle; telomeric DNA binding; molecular adaptor activity; active transmembrane transporter activity; phosphoprotein phosphatase activity; Wnt receptor signaling pathway; hydrolase activity; developmental process; cell-matrix adhesion. Columns: MammarySC, EpSC, MSC, SSC, HSC, ESC, TSC, GastricSC, InSC, NSC, LiverSC, RSC.

Lower panel, stemness (AFA and OFA) functional modules: imprinting; regulation of TGFb signaling; PU1-associated protein complex; regulation of DNA replication initiation; methylation-dependent chromatin silencing; centric heterochromatin; dosage compensation by X chromosome inactivation; double-stranded break repair; nuclear lamina; liver development; RNA helicase activity. Columns: MammarySC, MSC, EpSC, SSC, HSC, ESC, TSC, GastricSC, InSC, NSC, LiverSC, RSC.]

Figure 6.11: Functional categories represented in the stemness homolog (top panel) and functional (bottom panel) gene modules. Each row represents a different stemness module. The value for each module in a given cell type is calculated as the fraction of upregulated genes in the module in the stem cell type. The value could range from 0 (no genes upregulated in the stem cell type) to 1 (all genes upregulated). The categories shown to the right of the upper panel heatmap consist of functional categories significantly enriched in the stemness homolog modules, as measured by the hypergeometric distribution. The categories shown to the right of the lower panel heatmap consist of representative names of functional modules.


chromatin remodeler families along with lineage-specific inhibitors. These molecules

were supported by chaperone proteins and signal transduction proteins. As the niche

input is indispensable to a stem cell, I also found common adhesion molecules.

In the next few paragraphs, I briefly discuss some of the most interesting

examples of stemness families and summarize the current knowledge on their role in stem

cell biology. The reader should note that several families (Itga, Frizzled, TCF/LEF,

and Chd/Smarc) are left out of the discussion in this chapter, as they also make an

appearance in the human stemness analysis, so they are discussed in detail in the next

chapter.

6.4.1 Oncogenes: Myb family

One of the highest scoring stemness families was the Myb family of oncogenes:

a-myb (Mybl1 in Figure 6.9), b-myb (Mybl2) and c-myb (Myb; Figure 6.9). I found

a very diverse gene expression pattern in this family among the different stem cell

types. Previously, c-myb has been suggested to control hematopoietic proliferation and

differentiation [149]. Recently, it was also implicated as a potential master regulator

of differentiation, as RNAi-induced silencing of this gene in a human leukemic cell

line mimicked very closely the effects of the application of a differentiation-inducing

drug [169]. In our murine stemness analysis, c-myb was activated not only in the

hematopoietic system, but also in neural, embryonic, intestinal and retinal stem cells.

A-myb complemented the expression pattern of its partner genes by showing significant

upregulation in gastric stem cells, while b-myb was upregulated in trophoblast stem


cells. The neural and embryonic stem cell types used all members of this proto-oncogene

family.

6.4.2 Tumor suppressor factors: Sfrp family

Aside from the p53 family, other putative tumor suppressor families could

also play important roles in stem cell regulation. Specifically, while the self-renewal-

associated Wnt pathway had a significant presence in the stemness family set with two

of its most important components – the Frizzled family of receptors and the TCF/LEF

family of transcription factors (effectors of β-catenin accumulation), one of the largest

families of Wnt inhibitors, the secreted Frizzled-related protein (Sfrp) family, was also

classified as a stemness family.

In human, Sfrp has five protein members, but three of them – Sfrp1, Sfrp2,

and Sfrp5 – represent an independent subfamily from Sfrp3 and Sfrp4 with different

ligands [153]. Interestingly, the stemness analysis recognized Sfrp1, Sfrp2 and Sfrp5 as

a separate, putative stemness family. The stemness upregulation pattern (Figure 6.12)

showed a wide use of all family members in different stem cell types.
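
The per-gene values displayed in the family heatmaps (the fraction of experiments of a given stem cell type in which a gene was called upregulated) can be sketched as follows. This is an illustrative Python example only: the function name, the input layout, and the toy compendium are assumptions and do not correspond to data from the study.

```python
from collections import defaultdict

def upregulation_fractions(experiments, gene):
    """experiments: list of (cell_type, set_of_upregulated_genes) pairs.
    Returns, per cell type, the fraction of experiments calling `gene` up."""
    counts = defaultdict(lambda: [0, 0])  # cell_type -> [n_up, n_total]
    for cell_type, up_genes in experiments:
        counts[cell_type][1] += 1
        if gene in up_genes:
            counts[cell_type][0] += 1
    return {ct: up / total for ct, (up, total) in counts.items()}

# Hypothetical toy compendium (not data from the study).
toy = [
    ("HSC", {"Sfrp1", "Myb"}),
    ("HSC", {"Myb"}),
    ("NSC", {"Sfrp1", "Sfrp2"}),
    ("ESC", {"Sfrp2"}),
]
print(upregulation_fractions(toy, "Sfrp1"))
```

A value of 0 means that no experiment of that cell type measured the gene as upregulated, and a value of 1 means that all of them did.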

Sfrp genes could function in several different ways – they could either interact

with the Wnt proteins directly, or they could modulate each other or various other

receptors and inhibit BMP signaling [22]. Sfrp2 has also been shown to interact with

the fibronectin integrin α5β2 receptor complex and regulate cell adhesion [22]. One

interesting model in intestinal stem cells suggests that Sfrp genes can function as part of

Hedgehog signaling, where in normal intestinal differentiated cells, Hedgehog is active


Figure 6.12: Secreted Frizzled-related protein (Sfrp) family expression in stem cells. The Sfrp family is one of the major inhibitors of the spread of Wnt signaling and shows a very diverse pattern of upregulation in different stem cells. Each column represents one of the twelve stem cell types. Each row represents a different module member gene. The value for each gene in a given cell type is calculated as the average number of experiments of that stem cell type that measured the gene as upregulated, ranging from 0 (the gene was not upregulated in a single experiment) to 1 (the gene was upregulated in all experiments of that cell type).


[Figure 6.13: diagram of the intestinal crypt. Labels: Wnt; Sfrp1; Wnt-expressing stem cell; progenitor cell; differentiated cell; intestinal crypt.]

Figure 6.13: Sfrp1 model of regulation in intestinal stem cells. The model is adapted from a paper by Katoh and Katoh [89]. Dark blue indicates the stem cells located near the stem cell niche, the lighter blue shows intestinal progenitor cells, while the lightest blue highlights the differentiated cells. Sfrp1 can inhibit the spread of Wnt signaling from the stem cells to the differentiated cells.

and acts through Sfrp1, which regulates Wnt signaling to ensure that Wnt will not

spread from the intestinal stem cells at the base of the crypt niche to the differentiated

cells further up [89] (Figure 6.13).

6.4.3 NM23 family

Another highly scoring stemness family of proteins consisted of the NM23

group of homologs, which play functional roles in differentiation and tumorigenesis [125,

138]. This homolog family was represented by three genes, Nme1, Nme2, and Nme4, and

has been previously shown to have a tissue-specific and differentiation-specific manner

of expression. In particular, the human NM23 family has been implicated in negative


regulation of stem cell differentiation – NME1 negatively regulates growth factors and

NME2 has been implicated as a direct activator of c-myc [138]. The stemness analysis

results suggest that this family may be involved in the regulation of most stem cell

types, where Nme2 and Nme4 are the predominantly active genes, but Nme1 and Nme3

also show upregulation in embryonic and spermatogonial stem cell types (Figure 6.14).


Figure 6.14: Non-metastatic expressed (Nme) family expression in stem cells represents another example of a tumor suppressor family with a role in stem cell fate. Each column represents one of the twelve stem cell types. Each row represents a different module member gene. The value for each gene in a given cell type is calculated as the average number of experiments of that stem cell type that measured the gene as upregulated, ranging from 0 (the gene was not upregulated in a single experiment) to 1 (the gene was upregulated in all experiments of that cell type).


6.4.4 Chaperone roles: Heat shock (Hspa) and importin families

One very interesting stemness family was the heat shock protein (Hspa/Hsp70)

module. Heat shock proteins belong to a set of chaperones and co-chaperones whose

cellular role is to aid the folding process of newly formed proteins through conformational

and other changes. Because of this essential role, Hsp proteins are expressed generally

in all cells; however, they are highly induced in cells undergoing a stress response. The

role of these proteins has not been heavily studied in stem cells, but several studies have

suggested that these chaperone proteins may be involved in stem cell self-renewal, as

their expression decreases significantly with differentiation [139].

Many different classes of heat shock proteins exist, such as Hsp90, Hsp70,

Hsp60 and others, but the stemness meta-analysis identified only the Hsp70/Hspa module

of proteins – Hspa1a, Hspa1b, Hspa1l, Hspa2, Hspa4, Hspa4l, Hspa5, Hspa8, Hspa9,

Stch (Hspa13), Hspa14, Hsph1, and Hyou1 (Hsph4) – as commonly upregulated across

different stem cell types (Figure 6.15). This result is consistent with a study that shows

the active upregulation of many Hsp70 member genes in embryonic, neural and mes-

enchymal cells [12].

One model of regulation associated with Hsp genes (though not directly Hsp70-

associated) starts with the arrival of a signaling molecule in the cell. After its activation,

the signaling molecule binds the chaperone and co-chaperone complex. Subsequently

the entire bound complex associates with an importin molecule, which facilitates the

transition across the nuclear pore [139]. Importin genes are members of another family


Figure 6.15: Heat shock protein 70 (Hsp70/Hspa) family expression in stem cells. The wide variety of genes in this family used by stem cells points to a potential role of these genes in the control of cell fate through stress response. Each column represents one of the twelve stem cell types. Each row represents a different member gene of the Hspa/Hsp70 family of proteins. The value for each gene in a given cell type is calculated as the average number of experiments of that stem cell type that measured the gene as upregulated, ranging from 0 (the gene was not upregulated in a single experiment) to 1 (the gene was upregulated in all experiments of that cell type).


designated as a stemness module in the SMA analysis. After entry into the nucleus the

complexes dissociate and the signaling molecule can proceed with its function. A direct

application of this model is the activation of STAT3, which, bound by Hsp90, could be

transported to the nucleus, where it can upregulate Nanog, a gene essential for ESC

self-renewal [139].

But why would stem cells require a higher activity level of chaperone proteins?

This question still remains unanswered. I speculate this phenomenon may be related to

the importance and rarity of stem cells. It is possible that in rare cells, whose proper

function is essential for the organism, proper folding and location of new proteins is

crucial. This may not be as much the case for the more dispensable differentiated

cells, which could explain the downregulation of these genes with differentiation. Alternatively,

stem cells may be exposed to hypoxic conditions near the niche, which may

induce a stress response. Finally, it is possible that the heat shock proteins of this family

are induced only as a side effect of the general stress response, or else they may have

another functional role, unrelated to their chaperone abilities.

6.4.5 Lineage-specific gene inhibition: Inhibitor of differentiation/DNA

binding (Id) family

The inhibitor of differentiation (Id) set of proteins represents one of the seven

sub-families of the very large and diverse helix-loop-helix (HLH) protein family [44].

The HLH family has more than 200 member genes and is involved in the regulation

of development and cell fate decisions in a variety of organisms. Proteins in this large


[Figure 6.16: schematic. Panel A labels: bHLH (DNA-binding element, HLH); Id HLH (HLH). Panel B labels: bHLH at an E-box, active (bHLH protein active; MyoD, NeuroD; differentiation can occur); bHLH bound by Id at an E-box, inactive (bHLH protein inactive; no differentiation).]

Figure 6.16: Inhibitor of differentiation (Id) family structural elements and functional role. (A.) Members of the Id HLH family do not have the DNA-binding element that allows other HLH family members to interact with DNA. (B.) Id proteins can interact with bHLH proteins that play a role in the regulation of differentiation. Upon binding of Id to the bHLH protein, activation of lineage-specific genes, such as MyoD and NeuroD, is prevented and differentiation does not occur.

family are characterized by their largely conserved HLH domain pattern and are known

to assemble into either homo- or heterodimers [44].

Interestingly, the Id sub-family does not have the DNA binding capacity (Fig-

ure 6.16A) that characterizes many other HLH family members. Unlike most other

HLH family members, which specifically bind to the E-box motif (CANNTG) on the

DNA, Id family members interact directly with proteins from other sub-families – such

as some regulators of differentiation like MyoD, NeuroD, and E2-2 – to prevent their

access to the DNA. This binding event effectively blocks differentiation (Figure 6.16B),

which also explains the origins of the Id family name.
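
As a small illustration of the E-box consensus (CANNTG, where N is any nucleotide), the following Python sketch scans a DNA string for candidate sites. The function name and the example sequence are hypothetical; the sketch only matches the consensus on the given strand and makes no claim about actual bHLH binding.

```python
import re

# CANNTG with N = A, C, G or T; the lookahead lets overlapping sites be reported.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")

def find_eboxes(sequence):
    """Return (0-based position, matched site) for every E-box consensus hit."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in EBOX.finditer(seq)]

# Hypothetical sequence containing two E-box sites.
print(find_eboxes("ttCAGCTGaaCACGTGtt"))
```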

In mammalian species, the Id family consists of four genes – Id1, Id2, Id3,

Id4. Individual gene members have been shown to play significant roles in maintain-


ing long-term hematopoietic stem cell renewal, even though they are not an essential

component of short-term engraftment [135]. Most recently, members of this family have

been implicated in neural stem cell self-renewal through their inhibition of the NeuroD

transcription factor [84].

6.5 Comparison with other global stemness methods

All observations discussed up to this point render the SMA approach partic-

ularly relevant to the study of stemness. I have identified highly pertinent putative

stemness modules that can be used to form hypotheses about the roles of individual

genes in specific stem cell types and guide their further functional study. The method

does not require the use of many homologous genes in stem cells; single genes are also

detected. But where does this approach stand with respect to other stemness-related

methods?

The SMA method is complementary to other large-scale efforts to catalog

different stem cells and understand the mechanisms of pluri- and multi-potency maintenance,

such as the PluriNet [117] by Muller et al. and the stem cell module map [194]

by Wong et al. The SMA method may also be more encompassing than both of these

methods, since homology provides orthogonal information to KEGG pathways and GO

functional gene sets (Figure 6.17) used as the basis for the Wong et al. module definition

in the stem cell module map [194].

I directly tested the gene modules in the module map using the stemness meta-


[Figure 6.17: two scatter plots and a table. Panel A: highly cell type diverse recurrent families. Panel B: recurrent families. X-axis (both plots): fraction of overlap with most similar pathway (log scale). Y-axis (both plots): pathway or gene set size (log scale). Panel C: number of cell type diverse recurrent homolog families by gene set size; categories: most significant overlapping gene set < 100 genes, most significant overlapping gene set > 99 genes; values listed in the figure: 68, 46.]

Figure 6.17: Global similarity comparison between homolog modules and GO gene sets/KEGG pathways. (A.) The plot examines the size of the most similar pathway/gene set to each one of the highly cell-diverse recurrent homolog modules. X-axis shows the fraction of gene overlap between the homolog module and its most similar pathway. Y-axis shows the size of that pathway. The density of plotted points at a given set of coordinates can be inferred through the color and size of the point (single points: small and black; high number of points: large red sunflower-like point). (B.) Same as (A), but for all recurrent homolog modules. (C.) The table examines the number of cell-diverse recurrent homolog modules in each of two categories. The first category represents the set of modules, for which the most significant overlapping pathway/gene set has less than 100 gene members. The second category represents the set of modules, for which the most significant overlapping pathway has 100 or more gene members. Homolog modules in the second category contribute more specific information than their large overlapping pathways/gene sets.


analysis approach. Although the module map modules exhibited comparable recurrence

scores, the highest scoring homolog families scored higher than any Wong et al. module

(Figure 6.18), measured as a function of the deviation from the recurrence significance

cutoffs. The role of c-myc was central to the findings of the module map study, but my

stemness analysis also identified the myc-family – c-myc, N-myc, v-myc – as a highly

relevant putative stemness “master regulator” module.
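
The comparison in Figure 6.18 rests on the deviation of each module's recurrence score from the size-specific significance cutoff, binned and plotted on a log scale. The Python sketch below illustrates one way to compute such a histogram. It is a sketch under stated assumptions: the bin width (0.4), the number of bins (15), the exclusion of modules below their cutoff, and the normalization by the number of counted modules are all assumptions; only the floor value of 0.00001 for empty bins comes from the figure caption. The toy scores and cutoffs are hypothetical.

```python
def deviation_histogram(modules, cutoffs, bin_width=0.4, n_bins=15, floor=1e-5):
    """modules: list of (module_size, recurrence_score) pairs.
    cutoffs: dict mapping module_size -> significance cutoff for that size.
    Returns per-bin frequencies; empty bins get `floor` so they can be
    drawn on a log scale."""
    counts = [0] * n_bins
    for size, score in modules:
        deviation = score - cutoffs[size]
        if deviation < 0:
            continue  # assumption: only modules at or above the cutoff are binned
        idx = min(int(deviation / bin_width), n_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total if c else floor for c in counts]

# Hypothetical toy input (not values from the study).
toy_modules = [(2, 3.0), (2, 3.5), (3, 2.0), (3, 1.0)]
toy_cutoffs = {2: 2.9, 3: 1.8}
hist = deviation_histogram(toy_modules, toy_cutoffs)
print(hist[:3])
```

Expressing every score relative to its own cutoff puts modules of different sizes on a common axis, which is what allows homolog families and module map modules to be compared directly.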

6.6 Differentiation modules

In addition to the stemness (stemness “on”) families, I also defined differentiation

(stemness “off”) modules from the entire original stem cell compendium. Because

the stemness index score only used homolog features, I defined only differentiation homolog

families. It should be noted that the differentiation modules were derived from

functionally more heterogeneous experimental data, so I expected a smaller number

of differentiation-specific homolog families. The SMA method identified a total of 39

differentiation modules, summarized in Table 6.3.

The presence of a few specific homolog families warrants special mention. Both

the integrin β (Itgb; 10th row in Table 6.3) and ATP-binding cassette, subtype B (Abcb;

28th row in Table 6.3) families of proteins have important roles in stem cells. Integrin

β protein family members function as part of an integrin heterodimer receptor, which

facilitates the communication of stem cells with their respective niche. ABC transporter

proteins are actively used by stem cells to remove various toxins and drugs from the


[Figure 6.18: histogram. X-axis: difference of gene set recurrence score from significance cutoff (bins labeled 0.1 to 5.7). Y-axis: frequency (log scale, 1e-05 to 1e-01). Series: homolog families; Wong et al. modules.]

Figure 6.18: A direct comparison of the Wong et al. “stem cell module map” modules (green) and the SMA homolog families (black). X-axis shows the deviation of the recurrence score for each module from the significance recurrence cutoff for that module size. Y-axis shows the module frequency in each bin in log scale. All empty bins have been assigned a floor value of 0.00001 to facilitate log scale plotting.


cell. Thus, it is surprising to see these two families among the differentiation modules.

While the upregulation patterns of these two families were sufficiently dominated

by genes used in differentiated cells and even though these two modules did not

pass the recurrence and diversity cutoffs to be defined as stemness families, they still

showed upregulation of individual genes in stem cells. For example, in the ATP-binding

cassette, subtype B (or Abcb) module, Abcb1b, Abcb10, and Abcb8 were almost ex-

clusively upregulated by stem cells (Figure 6.19).
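
The signed value used to contrast stem and differentiated cells for a gene (as plotted for the Abcb family) can be illustrated with a short Python sketch. The function name and the toy calls are hypothetical; the sketch assumes that each experiment yields a simple upregulated/not-upregulated call for the gene.

```python
def stem_minus_diff(stem_calls, diff_calls):
    """Each argument is a list of booleans, one per experiment of a given
    cell type (True = gene called upregulated). Returns a value in [-1, 1]:
    positive when the gene is used mostly by stem cells, negative when it
    is used mostly by differentiated cells."""
    stem_frac = sum(stem_calls) / len(stem_calls)
    diff_frac = sum(diff_calls) / len(diff_calls)
    return stem_frac - diff_frac

# Hypothetical calls: upregulated in 3 of 4 stem cell experiments and in
# none of 2 differentiated cell experiments.
print(stem_minus_diff([True, True, True, False], [False, False]))
```

Under this definition a family can be dominated by negative values overall and still contain individual genes with strongly positive values.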

Therefore, the differentiation module results are not in conflict with the current

literature. It is a bit unusual to observe members of the same heterodimer complex

regulated differently – integrin α is significantly upregulated in stem cells, while integrin

β is significantly downregulated. It should be noted, however, that the most common integrin

complexes used in stem cells make use of only a minority of the members of the integrin

β family, such as integrin β1.


Differentiation module name   Cell diversity   Recurrence FDR
Gna                           3.03             0.01
Ankrd                         2.84             0
Cebp                          2.83             0
S100a                         2.78             0
Pdlim                         2.78             0.008
H2                            2.74             0
Flt                           2.74             0
Klf                           2.73             0.0018
Bhlhb                         2.71             0.0147
Itgb                          2.70             0.004
Lgals                         2.70             0.0027
Atp1a                         2.69             0
Vil                           2.68             0.0196
Tspan                         2.67             0
Ces                           2.67             0
Anxa                          2.66             0
Soat                          2.65             0.0334
Lck/Src/Fyn/Abl               2.65             0
Mef2                          2.65             0.004
Flvcr                         2.64             0.0334
Degs                          2.63             0.0372
Cpxm                          2.62             0.008
Vamp                          2.62             0.0372
Ero1                          2.62             0.048
Lrrc4                         2.62             0.0335
Masp/C1r                      2.59             0.0025
Sema6                         2.59             0
Abcb                          2.59             0
Itm2                          2.57             0
Stgal                         2.56             0
Mpp                           2.55             0
Cts                           2.55             0
Gabarap                       2.55             0.0238
Ifi27                         2.55             0.0081
Acsl                          2.54             0.00355
Ppap2                         2.54             0.0268
Jak                           2.52             0.0456
H2-Ea/Oa/Aa                   2.51             0.0055
Cpeb                          2.50             0.007

Table 6.3: List of all mouse differentiation homolog modules.


Figure 6.19: Upregulation pattern of the ATP-binding cassette, subtype B (Abcb) family of transporter proteins in differentiated (green) and stem (red) cells. Each column represents one of the twelve stem cell types. Each row represents a different module member gene. The value for each gene in a given cell type is calculated as the difference between the average number of experiments of that stem cell type that measured the gene as upregulated and the average number of differentiated cell experiments in that system type. The values can range from -1 (the gene was upregulated in all differentiated cell experiments of that type and no stem cell experiments of that cell type) to 1 (the gene was upregulated in all stem cell experiments of that cell type and no differentiated cell experiments of that type). Some genes, such as Abcb1b, are used primarily by stem cells, though the method identifies the family as a differentiation family.


6.7 Summary

This chapter addressed the first central question of this dissertation: do func-

tional redundancy and tissue-specific expression mask the common stem cell mecha-

nisms? It presented the results of the application of the SMA method to a large murine

stem cell expression data compendium and showed that if we account for functional re-

dundancy, common stemness mechanisms do emerge. Section 6.1 identified significantly

recurrent reproducibly upregulated mouse homolog and functional modules. Section 6.2

described the selection of cell type diverse modules. The next section, Section 6.3, presented

the classification of recurrent modules into several different pattern types. Section

6.4 summarized the biological knowledge from the literature associated with some of the

most interesting stemness modules. Section 6.5 compared the stemness meta-analysis

method to its most similar stemness study, while Section 6.6 presented an overview of

the differentiation (stemness “off”) modules identified in the mouse stemness analysis.


Chapter 7

Stemness mechanisms in human stem

cells

The goal of this chapter is to address the second central question of this disser-

tation: if common stem cell mechanisms exist, are they conserved between mouse and

human stem cells? The chapter summarizes the results of the application of the SMA

method to a human stem cell compendium and examines the conservation of stemness

patterns between mouse and human. As the stemness meta-analysis setup has already

been described extensively in the previous chapters, I try to avoid repetition and present

only the most important summary statistics and comparisons. Section 7.1 describes the

identification of recurrently upregulated human modules, compares the recurrent modules

between mouse and human and explains the differences observed in the human

data. Section 7.2 summarizes the selection of cell diverse modules and stemness fam-

ilies. Section 7.3 presents an overview of several interesting human stemness families,


while the final section – Section 7.4 – summarizes our current biological knowledge on

the conserved mammalian stemness modules.

7.1 Recurrent modules in a human stem cell compendium

To identify human stemness modules, I performed the exact same steps as

already described for the mouse stemness meta-analysis. I defined human homolog

families and after neighbor expansion, I identified 9,081 homolog families, comprised

of 4,350 mutually-exclusive homolog groups (with two or more gene members) and

4,731 biological singletons, as well as 889 non-redundant functional gene modules. The

breakdown of modules was shown earlier in Tables 5.6 and 5.8.

To test the existence of stemness families in human and evaluate the conser-

vation of stemness mechanisms between the two mammalian species, I first directly

compared the two sets of predicted homolog modules from each organism. For each

of the 9,081 homolog families in the human network, I mapped every gene member

to its best reciprocal BLAST hit in mouse. If no such gene existed, the human gene

was removed from further consideration. In the entire human homolog module network

13,711 genes had a best reciprocal hit in mouse and after the elimination of genes that

could not be mapped, the orthologously defined (human-to-mouse) homolog modules

were reduced to 6,755 gene families.
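
The human-to-mouse mapping step can be sketched in Python as follows. This is only an illustration under assumptions: the best-hit dictionaries are presumed to have been derived beforehand from BLAST results in both directions, the function names and gene identifiers are hypothetical, and dropping families that retain no mapped genes is an assumption about how the reduction in family count arises.

```python
def reciprocal_best_hits(human_to_mouse_best, mouse_to_human_best):
    """Keep human genes whose best mouse hit points back to the same human gene."""
    return {
        h: m
        for h, m in human_to_mouse_best.items()
        if mouse_to_human_best.get(m) == h
    }

def map_families(human_families, rbh):
    """Translate each human family into mouse gene names, dropping genes
    without a reciprocal best hit and families left with no genes."""
    mapped = {}
    for name, genes in human_families.items():
        mouse_genes = {rbh[g] for g in genes if g in rbh}
        if mouse_genes:
            mapped[name] = mouse_genes
    return mapped

# Hypothetical best hits and families (not data from the study).
h2m = {"ID1": "Id1", "ID2": "Id2", "GENEX": "Id2", "ORPHAN": "Zfp1"}
m2h = {"Id1": "ID1", "Id2": "ID2", "Zfp1": "OTHER"}
rbh = reciprocal_best_hits(h2m, m2h)
families = {"Id": {"ID1", "ID2", "GENEX"}, "Solo": {"ORPHAN"}}
print(map_families(families, rbh))
```

In this toy example GENEX and ORPHAN have no reciprocal best hit, so the first family shrinks to two genes and the second is dropped.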

I evaluated the overlap between all mouse homolog modules and their corre-

sponding best matching human modules. Homolog modules most often completely over-


lapped (Figure 7.1). A large number of the mouse homolog modules (approx. 3,000)

showed no overlap with any human homolog module. These represented the mouse

modules of genes that had no best reciprocal human mapping. The non-negligible peak

at 0.5 consisted of families that share half of their genes between organisms. This ob-

servation is not surprising, as the mean number of genes in the mouse homolog gene set

was small – close to four genes.

These results indicated that regardless of the comparison base, the fraction of

gene overlap between each pair of modules was consistently high (Figure 7.1), so it is

reasonable to treat any common stemness gene families between the two organisms as

mammalian stemness modules.
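
The two "comparison bases" for the overlap fraction (relative to the mouse module size and relative to the human module size) can be illustrated with the Python sketch below. As a simplification, the best-matching human module is chosen here by the largest number of shared genes, whereas the analysis selected the most significant overlap under the hypergeometric distribution. The function name and the toy modules are hypothetical, and human genes are assumed to have already been translated to their mouse orthologs.

```python
def overlap_fractions(mouse_module, human_modules):
    """Return (best_name, fraction_of_mouse_module, fraction_of_human_module)
    for the human module sharing the most genes with `mouse_module`;
    (None, 0.0, 0.0) if no human module overlaps."""
    best_name, best_shared = None, 0
    for name, genes in human_modules.items():
        shared = len(mouse_module & genes)
        if shared > best_shared:
            best_name, best_shared = name, shared
    if best_name is None:
        return None, 0.0, 0.0
    return (
        best_name,
        best_shared / len(mouse_module),
        best_shared / len(human_modules[best_name]),
    )

# Hypothetical modules (not data from the study).
mouse = {"Id1", "Id2", "Id3", "Id4"}
human = {"Id": {"Id1", "Id2"}, "Myb": {"Myb", "Mybl1"}}
print(overlap_fractions(mouse, human))
```

With an average module size of about four genes, fractions such as 0.5 arise naturally when two of four genes are shared.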

The recurrence analysis showed that at a 5% FDR cutoff, 85 homolog modules

(Figure 7.2) and 34 of the functional modules were coordinately upregulated across stem

cell experiments.

An initial striking difference between the recurrence results in the mouse and

human stemness meta-analyses was the approximately 3-fold reduction in significant

recurrently upregulated homolog modules in the human stem cell compendium as compared

to mouse. To assess this observation quantitatively, I directly compared the

recurrence score distributions of all human and mouse homolog families and found a

significant global decrease in recurrence scores (t = 21.579; p-value < 2.2e-16; one-tailed

Student t-test; Figure 7.3; panel B), which could not be accounted for by a significant

difference in homolog module sizes (Figure 7.3; panel A). The recurrence cutoffs used in

mouse and human were also very similar (Figure 7.4), so the global difference in scores


[Figure 7.1: two histograms. Left panel x-axis: fraction of mouse module represented in its best corresponding human module. Right panel x-axis: fraction of the best candidate human module represented in its corresponding mouse module. Y-axis (both panels): number of homolog modules (log scale).]

Figure 7.1: Fraction of overlap distribution between human and mouse networks. X-axis shows the fraction of gene overlap between each mouse homolog module and its most significantly overlapping human homolog module, as measured by the hypergeometric distribution. The overlap can be shown as a function of the size of the mouse homolog module (left), or as a function of the size of the human module (right). The y-axis shows the number of homolog modules in each bin on a log scale. A large number of mouse homolog modules (approx. 3,000) show no overlap with any human homolog module. These represent the mouse modules of genes that had no best BLAST reciprocal human mapping.

170

Page 186: STEMNESS REVISITED: A META ANALYSIS OF STEM CELL ... · UNIVERSITY OF CALIFORNIA SANTA CRUZ STEMNESS REVISITED: A META ANALYSIS OF STEM CELL SIGNATURES USING HIGH-THROUGHPUT DATA

0 1 2 3 4 5

0.005

0.010

0.020

0.050

0.100

0.200

0.500

1.000

FDR=5%6 families14

20

18

Total = 85 significant

families

Family size = 1 gene

2

3

4

# families of that size

that satisfy score

cutoff

5-10

> 10

17

10

Recurrent upregulation score

FD

R

Figure 7.2: Selection of significant recurrently upregulated homolog families. X-axiscorresponds to the size-dependent recurrence score, while the y-axis show the falsediscovery rate on a log scale. Each color represents the FDR curve associated with adi!erent module size. The value under each colored arrow represents the number ofupregulated homolog families of that size that passed the recurrence cuto! for that size.FDR cuto! used to identify significantly recurrent modules was 5%.

171

Page 187: STEMNESS REVISITED: A META ANALYSIS OF STEM CELL ... · UNIVERSITY OF CALIFORNIA SANTA CRUZ STEMNESS REVISITED: A META ANALYSIS OF STEM CELL SIGNATURES USING HIGH-THROUGHPUT DATA

10 20 50 100 200 500 1000

11

01

00

10

00

10

00

0

Family size

Nu

mb

er

of

ho

mo

log

fa

mili

es

0.1 0.2 0.5 1.0 2.0 5.0

11

01

00

10

00

10

00

0

Recurrence score

Human

Mouse

p-value < 2.2e-16

(t=21.579)

Family size

Nu

mb

er

of

ho

mo

log

fa

mili

es

Recurrence score

A. B.

Figure 7.3: Comparison of the global distributions of (A) homolog family sizes and (B)recurrence scores in human and mouse. (A.) X-axis shows the sizes of mouse (green) andhuman (red) homolog modules on a log scale. Y-axis shows the frequency of each sizemodule on a log scale. (B.) X-axis shows the recurrence score for each mouse (green)and human (red) homolog module. Y-axis shows the frequency of modules with eachrecurrence score. The black arrow points to the significant (p-value shown in the figure)global decrease in the recurrence scores of the human homolog modules, as comparedto the recurrence scores of the mouse homolog module.

between the two organisms was unlikely to be a function of a di!erent scale.
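The comparison of recurrence score distributions described above can be illustrated with a minimal sketch. It assumes an equal-variance (Student's) two-sample t statistic and, as a simplification that is only reasonable for samples as large as the thousands of homolog families compared here, a normal approximation for the one-tailed p-value. The function names and the toy scores are hypothetical and are not the data or code used in this work.

```python
import math
from statistics import NormalDist, mean, variance


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic (equal-variance form) for mean(a) - mean(b)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error


def one_tailed_p_large_sample(t):
    """Upper-tail p-value from a normal approximation (assumption: large samples)."""
    return 1.0 - NormalDist().cdf(t)


# Toy, made-up recurrence scores (hypothetical; far smaller than the real sets).
mouse_scores = [1.2, 0.9, 1.5, 1.1, 1.3, 1.0, 1.4, 1.2]
human_scores = [0.7, 0.5, 0.9, 0.6, 0.8, 0.4, 0.7, 0.6]

t = pooled_t_statistic(mouse_scores, human_scores)
p = one_tailed_p_large_sample(t)
print(f"t = {t:.3f}, one-tailed p (normal approx.) = {p:.3g}")
```

A positive t in this orientation corresponds to mouse scores exceeding human scores, which matches the direction of the one-tailed test reported in the text.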

One possible explanation for the reduction in reproducibility and recurrence is the higher heterogeneity of the human stem cell data. While the study of human embryonic stem cells has faced debate, much work has been done on understanding pluripotency mechanisms using both embryonic stem cells and, more recently, induced pluripotent cells in human. Experiments on human non-cultured adult stem cells, however, especially ones associated with intestinal, gastric, spermatogonial, liver and other stem cells, are much harder to perform than in mouse, so little data is available for many of these stem cell types. This experimental bias is obvious in the human stem cell compendium used in the human stemness meta-analysis. Because of these difficulties, the isolation of pure stem cell populations has been hindered and many of the input profiling data could represent more functionally diverse, but stem cell-enriched, populations. The heterogeneity in the purity of the populations may explain the difference in module recurrence.

Figure 7.4: Correlation between the recurrence cutoffs used for every module size in common between mouse and human. X-axis shows the recurrence cutoff scale for mouse modules, while the y-axis shows the recurrence cutoff scale for human modules. Each point represents a different module size represented in both the human and the mouse input data. The y=x line is shown as a black solid line.

This conclusion is not at odds with the perfect clique formation of human stem cell experiments from the same stem cell type in the discovery of “replicate” sets, shown earlier in Figure 5.6. The high similarity between experiments within the few large “replicate” sets could be explained by technical issues. Experiments from the same study within the same cell type, but with different differentiation fates, clustered the closest together. Other experiments that clustered very closely in a “replicate” set represented populations derived from labs with similar protocols. While these two examples show a high within-protocol reproducibility, they do not guarantee similarities with other stem cell experiments derived using different protocols.

Other explanations for the higher heterogeneity in the human data include the more diverse genetic background of humans as compared to mice, as well as transcriptional differences in the stem cell populations associated with the age and health of the humans from which the populations were derived.

Finally, yet another explanation is that there is a true difference between the mechanisms that guide mouse and human stem cell pluri- and multipotency. However, some evidence already suggests that at least some mechanisms should be conserved. For example, the initial four transcription factors successfully used to induce the transformation of fibroblasts into induced pluripotent stem (iPS) cells (Sox2, Oct3/4, Myc and Klf4) were the same in both mammalian species [170, 171]. Even though the murine stemness meta-analysis identified these transcription factors as significantly recurrently upregulated in stem cells, the human stemness meta-analysis (SMA) did not recognize a single one of the factors as recurrently upregulated.

To test the heterogeneity of the human stem cell data, I made use of the same “swap” analysis that I introduced earlier in the mouse stemness analysis. I swapped the lists used as input to the recurrence analysis, such that each list of differentially upregulated genes in a stem cell was replaced with its corresponding list of genes differentially downregulated in the stem cell. This swap was performed for an increasing number of experiments until all input lists had been replaced with their counterparts of downregulated genes. The only difference from the mouse test was in the number of random swaps performed at each level of increase: in human, I performed only five, while in mouse I used ten. Specifically, at each swap level, the swap was performed five different times, with the subset of experiments to be swapped selected at random each time. The baseline cutoffs used to assess the false discovery rates were used directly as derived from the stem-cell-only input data.
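The swap procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used in this work: `estimate_fdr` is a hypothetical caller-supplied function standing in for the full recurrence analysis (including the fixed baseline cutoffs from the stem-cell-only data), and the toy gene lists and toy FDR function exist only to make the sketch runnable.

```python
import random
from statistics import mean, stdev


def swap_analysis(up_lists, down_lists, estimate_fdr, levels, n_repeats=5, seed=0):
    """For each swap level k, replace k randomly chosen 'up' lists with their
    'down' counterparts n_repeats times; return {k: (mean FDR, standard error)}."""
    rng = random.Random(seed)
    n = len(up_lists)
    summary = {}
    for k in levels:
        fdrs = []
        for _ in range(n_repeats):
            swapped = set(rng.sample(range(n), k))  # experiments to swap this time
            inputs = [down_lists[i] if i in swapped else up_lists[i] for i in range(n)]
            fdrs.append(estimate_fdr(inputs))
        sem = stdev(fdrs) / n_repeats ** 0.5 if n_repeats > 1 else 0.0
        summary[k] = (mean(fdrs), sem)
    return summary


# Toy inputs (hypothetical): four experiments, each with an up list and a down list.
up = [{"STEM", "a"}, {"STEM", "b"}, {"STEM", "c"}, {"STEM", "d"}]
down = [{"DIFF", "a"}, {"DIFF", "b"}, {"DIFF", "c"}, {"DIFF", "d"}]


def toy_fdr(inputs):
    """Placeholder for the real FDR estimate: fraction of swapped-in lists."""
    return sum("DIFF" in genes for genes in inputs) / len(inputs)


summary = swap_analysis(up, down, toy_fdr, levels=[0, 2, 4])
for k, (avg, sem) in summary.items():
    print(k, avg, sem)
```

Passing the FDR estimator in as a function keeps the random-subset bookkeeping separate from the recurrence scoring, so the same loop would serve five repetitions per level (as in human) or ten (as in mouse).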

If the recurrence of modules was more tightly related to the functional similarity of the stem cells, I would expect that as I replaced the input lists of genes upregulated in the stem cells with the genes upregulated in the differentiated cells, I would identify fewer recurrently upregulated modules and higher FDR levels. Also, if the human stem cell populations were more heterogeneous than their mouse counterparts, the distinction between the FDR associated with the stem cell-only data and the FDRs of the progressively swapped datasets would not be as clear as it was in mouse. Intuitively, equally functionally dissimilar cells should generate approximately equal false discovery rates. Such a result would suggest that the stem cell data resemble less functionally similar populations, making the heterogeneity of the input stem cell populations a good explanation.

As already mentioned, to reduce computational time I limited the number of swaps to five at each level of increase, which still allowed the estimation of a standard error. The results (Figure 7.5) confirmed the prediction: while the false discovery rate associated with the stem-cell-only dataset was still the lowest, the false discovery rates for the more heterogeneous sets followed the stem-cell-only FDR more closely than previously observed in the mouse analysis.

7.2 Cell type diversity assessment and classification of recurrent modules

To identify significantly cell type diverse modules, I used the methods described in Chapter 5 and chose a cutoff of 2.15 at a 5% FDR (Figure 7.6).
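As a rough illustration of how a single cutoff can be read off a set of size-specific FDR curves, the sketch below averages the curves point by point and returns the smallest score at which the averaged FDR falls to 5% or below. The weighting by number of modules of each size is an assumption made for this example (the weights actually used are defined with the methods in Chapter 5), and all names and numbers are hypothetical.

```python
def weighted_average_fdr(fdr_by_size, n_modules_by_size):
    """Average per-size FDR curves point by point, weighting each module size
    by its module count (assumed weighting, for illustration only)."""
    total = sum(n_modules_by_size[size] for size in fdr_by_size)
    n_points = len(next(iter(fdr_by_size.values())))
    return [
        sum(n_modules_by_size[size] * fdr_by_size[size][i] for size in fdr_by_size) / total
        for i in range(n_points)
    ]


def smallest_cutoff_at_fdr(scores, avg_fdr, target=0.05):
    """Return the smallest score whose averaged FDR is at or below target, else None."""
    for score, fdr in zip(scores, avg_fdr):
        if fdr <= target:
            return score
    return None


# Toy curves (hypothetical): FDR evaluated at four candidate diversity scores.
scores = [1.0, 1.5, 2.0, 2.5]
fdr_by_size = {1: [0.8, 0.4, 0.1, 0.02], 2: [0.6, 0.2, 0.04, 0.0]}
n_modules_by_size = {1: 30, 2: 10}

avg = weighted_average_fdr(fdr_by_size, n_modules_by_size)
cutoff = smallest_cutoff_at_fdr(scores, avg)
print(avg, cutoff)
```

The sketch assumes the candidate scores are sorted in increasing order and that every curve is evaluated at the same scores.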

At this cutoff, 22 evolutionary families and 10 functional modules satisfied the stringent criteria. None of these modules were also significantly recurrent in differentiated cells, so this set of families represented the final group of stemness modules. A summary of the distribution of modules into individual classes is shown in Table 7.1 and the list of all stemness families is shown in Table 7.2. I did not observe any significant deviations in the class membership module distribution between mouse and human. As in the mouse stemness analysis, AFA modules were at least four-fold overrepresented relative to their partner OFA modules in both the human homolog and functional recurrent gene modules. The human stemness analysis did not generate any constitutive module sets or gene sets, although given the small number of recurrently upregulated modules altogether, this result is not entirely surprising.

Figure 7.5: Swap analysis of human recurrently upregulated homolog families. X-axis shows the number of swapped input lists of upregulated genes, where the values range from 0 swapped experiments (stem-cell-only input; marked in red) to 38 (differentiated-cell-only input). Y-axis shows the false discovery rate (FDR) as a function of the number of swapped experiments. For each bar, the average FDR across five swap repetitions is plotted and the error bars represent the standard error from the five FDR summary measurements.

Figure 7.6: Selection of significant cell-type diverse modules in the human stem cell compendium. The cutoff (2.15) was selected as the 5% FDR cutoff score associated with the weighted average of the FDR curves for all family sizes. Each color represents the FDR curve associated with a different module size, as shown in the legend. X-axis represents the cell diversity score, while the y-axis shows the FDR on a log scale. To facilitate log plotting, a floor value of 0.0001 is selected for all entries that would otherwise be 0.

Next, I compared the conservation of the stemness homolog families between mouse and human; approximately 6,700 modules were shared between the two species. Of the 103 murine stemness homolog modules (103/9908 tested) and the 22 human stemness modules (22/9081 tested), 5 were shared between the two species, whereas the expected number of shared homolog modules was 0.17. While not many, the conserved mammalian stemness modules represent some functions crucial to stem cell function, such as cell-niche communication, self-renewal-related signaling, and chromatin remodeling. The modules that fall in the conserved category are examined in more detail in Section 7.4.
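One way to arrive at a value close to the reported expectation of 0.17 is to assume that stemness calls in the two species are independent and to scale the product of the two stemness fractions by the number of shared modules. The sketch below works through that arithmetic; the independence model and the use of exactly 6,700 shared modules (the text gives this number only approximately) are assumptions of this example.

```python
def expected_shared(n_common, hits_a, tested_a, hits_b, tested_b):
    """Expected number of modules called in both species if the calls were independent."""
    return n_common * (hits_a / tested_a) * (hits_b / tested_b)


# Counts taken from the text; 6700 is the approximate number of shared modules.
expected = expected_shared(6700, 103, 9908, 22, 9081)
fold_enrichment = 5 / expected  # 5 modules were observed in both species

print(f"expected shared modules: {expected:.2f}")
print(f"observed/expected: {fold_enrichment:.1f}")
```

Under these assumptions the five observed shared modules are roughly thirty times more than expected by chance.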

I also defined differentiation modules and identified only four differentiation homolog families and no differentiation functional modules. The differentiation modules are summarized in Table 7.3.


Class   Cell diversity   Gene diversity   Specificity   Homolog modules   Functional modules
AFA     +                +                +             22                8
OFA     +                -                +             2                 2
AFO     -                +                -/+           32                14
OFO     -                -                -/+           25                10
CM      +                +                -             0                 0
CG      +                -                -             0                 0
Total                                                   85                34

Table 7.1: Summary of the classification and distribution of all recurrently upregulated gene modules in the human stem cell compendium. The stemness modules (AFA and OFA pattern classes) identified by the SMA method are shown in bold. Similarly to the mouse stemness results, there is at least a 4-fold overrepresentation of AFA modules over OFA families in both the homolog and functional subsets. No significant constitutive module sets or gene sets were identified in this analysis.

7.3 Human stemness modules

7.3.1 Angiogenesis: FGFR/FLT/PDGFR family

One of the highest scoring human stemness modules was an angiogenesis-related family of receptor tyrosine kinases. This family included the four members of the fibroblast growth factor (FGF) receptor subfamily, the two members of the platelet-derived growth factor (PDGF) receptor subfamily, the three members of the vascular endothelial growth factor (VEGF) receptor subfamily and several other receptor proteins, such as the stem cell factor receptor (kit). These receptors have different affinities to the individual growth factors associated with each family and the difference in affinities provides the tissue and condition specificities of the growth factor receptors. As they are primarily involved with angiogenesis, many of the receptors have been implicated in cancer and differentiation regulation [31]. PDGFRα has been implicated as a marker of neural stem cells [81], VEGFR1 is expressed in hematopoietic stem cell-enriched populations [140], while kit is one of the markers used for segregation of HSC-enriched and other populations.

Module                                      Type   Cell diversity   Recurrence FDR
Notch; Laminin; Collagen                    HAFA   2.94             0
Integrin alpha (ITGA)                       HAFA   2.74             0.0015
Frizzled (Fzd)                              HAFA   2.71             0
FGFR; FLT; PDGFR                            HAFA   2.68             0
Ptpr, Contactin                             HAFA   2.66             0
TCF/LEF                                     HAFA   2.55             0.0203
DPP                                         HAFA   2.54             0.0076
Glypican (GPC)                              HAFA   2.52             0
Tropomyosin                                 HAFA   2.50             0
Kinesin (KIF); Spectrin (SPTA/B)            HAFA   2.49             0.002
CCSPG                                       HAFA   2.49             0.0076
Chd; Smarc                                  HAFA   2.47             0
TIMP                                        HAFA   2.40             0.0031
Melanoma antigen (MAGE)                     HAFA   2.33             0.005
Thrombospondin (THBS)                       HAFA   2.28             0.0307
Suppressor of cytokine sig. (SOCS)          HAFA   2.23             0.0127
Creatine kinase (CK)                        HAFA   2.21             0.0086
GNL                                         HAFA   2.20             0.0272
PHLDA                                       HAFA   2.19             0.0137
GABR                                        HAFA   2.19             0.011
IGFBP                                       HOFA   2.26             0.0095
DCAM                                        HOFA   2.22             0.0006
Human PPI module 181                        FAFA   2.96             0
Human PPI module 246                        FAFA   2.74             0
Fibril                                      FAFA   2.41             0
Protein tyrosine kinase activator activity  FAFA   2.41             0
Lens development in camera-type eye         FAFA   2.41             0.05
Positive regulation of cell-cell adhesion   FAFA   2.30             0
Human PPI module 98                         FAFA   2.2              0.0307
GINS protein complex                        FAFA   2.17             0.0076
FN1-TGM2 complex                            FOFA   2.31             0.0026
Human PPI module 281                        FOFA   2.19             0.0272

Table 7.2: List of all human stemness evolutionary and functional gene modules. First column provides the module identifier, the second column shows the classification type, the third column shows cell diversity, and the fourth column shows the recurrence FDR. The FDRs of “0” represent FDR levels below the lowest level of detectability and are thus approximately annotated with 0. Gene modules also identified in the mouse stemness meta-analysis are marked in bold. Mammalian stemness modules are based on data from similar stem cell types, such as HSC, ESC, NSC, MSC and others. Similarity between modules is based on gene membership. Comparisons between human and mouse protein-protein interaction (PPI) modules are based in human gene space, but on the data from the respective species.

Module                                      Cell diversity   Recurrence FDR
SERPIN                                      2.43             0
ARL                                         2.29             0.0241
ITGA (Itga1;Itga2;ItgaD/E/L/M/X; Itga10)    2.19             0.05
CHST                                        2.19             0.0243

Table 7.3: List of all human differentiation evolutionary and functional gene modules. First column provides the module identifier, the second column shows cell diversity, and the third column shows the recurrence FDR.

7.3.2 Heparan sulfate proteoglycans (HSPGs): Glypican family

The heparan sulfate proteoglycan (HSPG) protein family consists of cell surface associated proteins, which can have one of several different functional cores: syndecan, glypican and perlecan. Glypican was among the highest-scoring stemness modules identified by the human stemness meta-analysis. The name of the protein family derives from the heparan sulfate side chains associated with the core protein [39]. Interestingly, the role of glypicans is related to the receptor tyrosine kinases discussed in the previous section. FGF ligands often bind to the heparan sulfate chain of these proteins before association with the FGF receptor, which implicates the HSPG proteins as co-receptors of the various growth factor receptors [39].


7.4 Mammalian stemness modules: Notch, TCF/LEF, Frizzled, Integrin and Chd families

7.4.1 Cell adhesion and communication: Integrin alpha family

The integrin α family, which includes the only stemness gene (Itgα6) identified in the founder studies discussed in Section 4.1.1, was consistently among the top stemness candidate families in both mouse and human. The α subfamily consists of eighteen different subunits, while the β subfamily includes nine different subunits [60]. The α and β subunits interact and form many different heterodimers, which represent the final functional receptors. The stemness homolog families, however, consisted of only nine of the α family members. The other nine, which included integrin αL, integrin αM, and integrin αX, were identified as a human differentiation family.

The separation of the α family into two independent homolog modules in both mouse and human reflects the possibly different evolution of the α subfamilies. Integrin αL, integrin αM, and integrin αX represent members of the integrin α family that are strongly expressed in immune cells and follow more closely the evolution of the integrin β family, rather than the evolution of the rest of the members of the α family [75]. This observation may explain to a certain extent why both the entire integrin β family in mouse and the integrin αL/αM/αX subfamily in human have similar expression fates to each other, completely opposite to the expression fate of the rest of the α family.

The presence of this family among the stemness modules is not entirely surprising. This family represents one of the core gene modules for communication of the cell with its environment, so some family members are essential to the survival of the cell. For example, deletion of integrin-β1 is embryonic lethal in mice [137]. The members of the integrin family have been linked to both normal and cancer stem cell biology, as well as metastatic cancer [137].

The stemness analysis showed that, in agreement with the founder studies, hematopoietic, neural, embryonic, and retinal stem cells all expressed integrin α family members, but other stem cell types, such as intestinal, gastric, and trophoblast stem cells, made wide use of the family as well. Integrin α6 still remains one of the central stem cell-related proteins in this family: integrin α6 is a breast stem cell marker [154] and the integrin α6β4 is thought to adhere cells to the breast basal membrane, which represents the stem cell niche [137].

7.4.2 Wnt pathway: Tcf/LEF and Frizzled families

Consistent with the role of Wnt signaling in self-renewal, two of the five mammalian stemness modules, TCF/LEF and Frizzled, were associated with the Wnt pathway, introduced in Section 2.1.3. The stemness modules included the “beginning” and “end” of the signaling pathway, as Frizzled is the receptor on the surface of the cell that acquires the Wnt signal and begins the signal transduction process, while TCF/LEF is the transcription factor family ultimately activated by the accumulation of β-catenin.

The stemness Tcf/LEF mouse subfamily included the Lef1, Tcf3, Tcf7, and Tcf7l2 genes, while the human family consisted of Tcf3, Tcf4, and Tcf12. Tcf3 could play a role in the regulation of the pluripotent state [172], Tcf4 may be involved in the maintenance of intestinal and hematopoietic stem cells [185], while Tcf12 has been implicated in neural stem and progenitor cell expansion [182].

Interestingly, I did not observe the actual Wnt ligands among the stemness

modules, but since these molecules may be extensively regulated at the post-transcriptional

level, their absence is not entirely surprising.

7.4.3 Chromatin: Chd/Smarca family

The Snf2 superfamily of ATP-dependent helicases plays a central role in chromatin remodeling. The proteins in this family function as part of a protein complex that has the ability to actively modify histone tails to modulate chromatin state. The superfamily has three broad subfamilies: CHD, SWI/SNF-related, and ISWI. The subfamilies are characterized by the different domains that allow them to interact with the histone tail residues: CHD proteins contain chromodomains, SWI/SNF-related proteins contain bromodomains, while ISWI proteins contain SANT domains [109]. The mouse and human stemness families consisted of two of the three subfamilies of the Snf2 superfamily: CHD and SWI/SNF-related.

The chromodomain family of proteins (Chd) consists of many chromatin remodeling enzymes. These proteins are mostly associated with marks of active transcription and are believed to maintain chromatin in an open state [58]. One of the most important stem cell-related members of this family is Chd1, which recognizes di- and tri-methylation on H3K4 (the lysine 4 residue on histone H3) and was recently shown to regulate self-renewal and the maintenance of an open chromatin state in embryonic stem cells [58]. Chd1, along with Chd3 and Chd4, showed the highest upregulation patterns in the stemness analysis, though Chd5 and Chd9 were also upregulated in a couple of different stem cell types.

The SWI/SNF-related, matrix-associated actin-dependent regulator of chromatin, subfamily A (Smarca) family was the other subfamily of proteins identified in the same stemness module. This subfamily also comprises genes with chromatin remodeling abilities, whose products associate with the SWI/SNF complex to regulate open chromatin. All members of the Smarca family were actively upregulated in most stem cell types in the stemness meta-analysis.

7.5 Summary

This chapter addressed the second central question of this dissertation: are stemness mechanisms conserved between mouse and human stem cells? Section 7.1 reviewed the definition of recurrently upregulated human modules, compared them to mouse recurrent modules and presented explanations for the higher heterogeneity of the human stem cell data. Section 7.2 introduced the classification of the recurrent modules into different pattern types, including the stemness modules. Section 7.3 presented a couple of the human stemness families, while Section 7.4 reviewed the main conserved mammalian stemness modules.


Chapter 8

Applications of stemness mechanisms to stem cell and cancer classification

The purpose of this chapter is to address the last question of this dissertation: can we predict the state of differentiation of a cell based on its gene expression signature? Could stem cell state be recognized and identified at the expression level? In this chapter, I directly use the stemness index defined in earlier chapters and apply it to data from various sources. Section 8.1 provides a brief overview of the test data selection. Each of the subsequent sections summarizes the results from the application of the stemness index score to a different data source: normal stem cells (Section 8.2), side populations (Section 8.3), cancer stem cells (Section 8.4), and metastatic populations (Section 8.5).


8.1 Motivation and overview

Many of the stemness modules identified by the stemness meta-analyses are of biological interest, as individual gene module members have already been implicated in various regulatory roles in self-renewal and differentiation. However, because the stemness and differentiation modules were also found to be predictive of stem cell state in a cross-validation experiment, it was natural to test their predictive value on data from new experiments not previously included in the stem cell compendia.

I tested several heterogeneous types of data all potentially relevant to stem

cell biology, as discussed previously in Chapter 2:

• Expression signatures from stem-cell-like populations

• Expression signatures from side populations

• Expression signatures from cancer stem cells

• Expression signatures from metastatic cancer populations

In the next few sections of this chapter, I present and analyze the results of the application of the stemness index classification to these new data. It should be mentioned that testing is limited to the data available in the literature and thus gathering a sufficient number of experiments in a single data type is not straightforward, as both data quality and availability are an issue. For example, metastatic cancer models in mouse that are not xenograft-based are relatively rare, as mice often die of their primary tumors before they can develop metastatic growths. The distribution of the number of populations tested in each category is available in Table 8.1.

Data type                                            Number of populations
Normal stem/differentiated cells                     15
Side/non-side populations                            8
Cancer stem cell/cancer differentiated populations   12
Metastatic/non-metastatic populations                4
Total # of tested populations                        39

Table 8.1: Distribution of the sources of mouse test populations used to predict self-renewal capacity and perform stemness classification. The first column shows each tested data type. The second column shows the number of tested populations of each type. The term “cancer differentiated” populations is used to refer to primary cancer populations that are non-stem-cell-like. The total number of tested populations is shown in bold.

8.2 Stem cell-like populations

The most direct test of the predictive value of the stemness modules is the performance evaluation of the stemness index score on entirely new stem cell and differentiated populations. The new data can be particularly interesting if they include experiments from stem cell types that were never represented in the original stem cell compendium. Positive results would suggest that the self-renewal signature captured by the stem cell compendium generalizes well.

For this test, I identified 15 populations from the literature, which included putative lung, muscle, and prostate stem cells, along with differentiated populations from these systems. Other differentiated populations came from directed differentiation of iPS cells into dendritic and macrophage cells. The results of the analysis of the 15 populations are shown in Figures 8.1 and 8.2.

Figure 8.1 shows the comparison of three independent prostate cell signatures based on their stemness index scores. Prostate stem cell biology has been an active area of research with particular relevance to prostate cancer. One study identified Sca-1 as a potential marker gene of differentiation and sorted four different populations: a primitive (embryonic) prostate stem cell population, a putative adult prostate stem cell population (Sca-1hi), a putative adult prostate progenitor population (Sca-1lo), and differentiated prostate cells (Sca-1neg). To compare the potential of these populations, the authors performed gene expression comparisons of all upstream populations to the same Sca-1neg set. This experimental setup provides a unique view of self-renewal capabilities from a stemness index perspective.

The results indicated a very high stemness score for the most primitive cell

populations, intermediate stemness scores for the adult prostate stem cell population

and very low stemness scores for the progenitor-restricted populations (Figure 8.1).

These stemness indices correlate well with the conclusions of the original study, which

suggested that the most primitive cells undergo the fastest rate of self-renewal; a rate

reduced after the transition to the adult stem cell type [17]. It would be particularly

interesting to compare these scores to the scores of prostate cancer stem cell populations,

which may revert back to the expression patterns of the more primitive population [17].

[Figure 8.1 plot: stemness score (y-axis, 0 to 4) against differentiation score (x-axis, 0 to 4). Legend: Primitive SC-like, Adult SC-like, Adult progenitor-like.]

Figure 8.1: Stemness scores for a new mouse prostate stem cell experiment from the literature. The stemness index scores associated with each population are shown de-convoluted to their constitutive elements: the stemness and differentiation sub-scores. The line indicates a stemness index score of 0, which is the previously selected stem cell/differentiated cell classification cutoff.

The performance of the other twelve populations is shown in Figure 8.2. Notably, the stem cell-like populations scored consistently higher than their differentiated counterparts, even though, based on the stemness index cutoff selected in the initial mouse compendium cross-validation results, three of the five stem cell-like populations showed negative stemness indices (75% accuracy). All differentiated populations showed higher differentiation scores than stemness scores and were thus correctly classified. There are a few possible explanations for the negative stemness indices associated with three of the five stem cell populations.
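As a rough illustration of the classification rule used above, the following Python sketch labels a signature as stem cell-like when its stemness index exceeds the cutoff of 0. It assumes, for illustration only, that the index is the stemness sub-score minus the differentiation sub-score (the actual definition is given in Chapter 5). The `Signature` class, the function names, and the numeric sub-scores are hypothetical and are not taken from the experiments discussed here.

```python
from dataclasses import dataclass


@dataclass
class Signature:
    name: str
    stemness: float          # stemness sub-score
    differentiation: float   # differentiation sub-score
    is_stem_like: bool       # label assigned by the original study


def stemness_index(sig: Signature) -> float:
    # Assumption: index = stemness sub-score minus differentiation sub-score.
    return sig.stemness - sig.differentiation


def predict_stem_like(sig: Signature, cutoff: float = 0.0) -> bool:
    # A signature is called stem cell-like when its index exceeds the cutoff.
    return stemness_index(sig) > cutoff


def accuracy(signatures) -> float:
    # Fraction of signatures whose predicted class matches the study label.
    correct = sum(predict_stem_like(s) == s.is_stem_like for s in signatures)
    return correct / len(signatures)


# Hypothetical sub-scores, for illustration only.
examples = [
    Signature("stem-like A", 2.5, 0.5, True),
    Signature("stem-like B", 1.0, 1.8, True),        # would be a false negative
    Signature("differentiated C", 0.4, 2.2, False),
    Signature("differentiated D", 0.9, 1.1, False),
]
```

With these made-up values, `accuracy(examples)` evaluates to 0.75, because one stem cell-like signature falls below the cutoff. This is the kind of potential false negative discussed in this section.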

One of the populations represented a mesenchymal stem cell that was described by the authors of the original study as most similar transcriptionally to mouse embryonic fibroblasts, even though it also shared some transcriptional similarities with neural and embryonic stem cells, as well as the hematopoietic stem cell niche [133]. These observations may explain its high differentiation score and reasonably high stemness score.

The second population consisted of basal cells from the mouse trachea thought to have both self-renewal and multipotency abilities [146]. These cells were compared to a non-basal population and were shown functionally to self-renew, although they did not necessarily represent a pure population. It is currently hard to assess whether the negative stemness index is simply a false negative, based on the stringent cutoffs, or whether the tracheal population actually represents a mixture of several populations at different stages of differentiation.

[Figure 8.2 plot: stemness score (y-axis, 0 to 4) against differentiation score (x-axis, 0 to 4). Legend: Stem cell-like, Non-stem cell-like.]

Figure 8.2: Stemness scores for new mouse stem cell and differentiated cell experiments collected from the literature. The stemness index scores associated with each population are shown de-convoluted to their constitutive elements: the stemness and differentiation sub-scores. The line indicates a stemness index score of 0, which is the previously selected stem cell/differentiated cell classification cutoff.

8.3 What about side populations?

Another potential data source for stemness index application is side populations: cells defined phenotypically by their ability to efflux Hoechst 33342 dye, as described in Section 2.3. These populations are thought to be enriched for stem cells and have been used as a means of stem cell discovery in many organs and systems.

I tested gene expression signatures from four different experiments, corresponding to eight differentially expressed sets from either side or non-side populations. It should be mentioned here that one of the side population signatures actually represented the consensus set of differentially upregulated genes from four individual side populations of bone marrow, mesenchymal, germinal and muscle origin [145]. This SP signature scored highly (SI¹ = 0.503), and the result is further confirmation of the validity of the stemness index approach.

Of the eight tested populations in this category, seven were correctly classified and one was potentially misclassified (87.5% accuracy). Again, the putatively misclassified population was a stem cell-enriched population: a small intestine side population thought to contain the true stem cell. It is difficult to assess the correctness of this assignment, as the population is not very well understood. In fact, while the authors of the study performed experiments to show that this stem cell-enriched population localized to the appropriate stem cell niche, the intestinal basal crypt, they did not perform any functional self-renewal assays to confirm the extent of self-renewal potential. Thus, it is possible that this result is not actually a false negative, but rather reflects a very heterogeneous mixture of stem, progenitor and even more mature cells [67], so it will be of interest to follow the future work on the purification of this population.

¹ The SI notation refers to the stemness index score defined in Chapter 5.

[Figure 8.3 plot: stemness score (y-axis, 0 to 4) against differentiation score (x-axis, 0 to 4). Legend: Side population, Non-side population.]

Figure 8.3: Stemness scores for side and non-side populations. The stemness index scores associated with each population are shown de-convoluted to their constitutive elements: the stemness and differentiation sub-scores. The line indicates a stemness index score of 0, which is the previously selected stem cell/differentiated cell classification cutoff, based on normal stem cell data.

8.4 Are normal stemness mechanisms conserved in cancer stem cells?

As discussed in Chapter 2, there is already some evidence that adult normal and cancer stem cells may share molecular mechanisms, as evidenced by the use of the same functional pathways, such as the Wnt and Notch pathways, or as indicated by the expression of the same cell surface marker genes, such as in breast or the blood system [41]. As the stemness index indirectly captures these core common processes associated with the stem cell state, it can uniquely measure the level of this conservation. I tested twelve different populations from six studies, each associated with a cancer stem cell signature and a differentiated cancer cell (non-stem-cell-like) signature.

Nine of the twelve populations were correctly classified (75% accuracy): four of the six cancer stem cell signatures showed high stemness scores and low differentiation scores, while five of the six differentiated cancer cell signatures showed the reverse trend (Figure 8.4). The differentiated cancer cell signature generally corresponded to mature cancer cells.

Two of the misclassified signatures in this set (one cancer stem cell and one non-cancer stem cell) deserve special attention, since they highlight a point of particular relevance to the context in which the stemness index can be applied. This pair of signatures was derived from a single experiment, in which a GMP progenitor population was transformed into a leukemic L-GMP population [101]. To aid the understanding of this point, Figure 8.5 shows a modified hematopoietic differentiation tree with the L-GMP population. The cancer stem cell signature consists of the genes commonly upregulated between the L-GMP and HSC populations, as compared to the CMP, GMP and MEP progenitor populations, while the non-stem-cell-like (differentiated) cancer cell signature contains the exact reverse set: the genes upregulated in the CMP, GMP and MEP populations. The stemness index results suggested a low differentiation score for the non-stem-cell-like cancer signature and a high differentiation score for the cancer stem cell signature (Figure 8.4). Since the stemness index relies on differential expression information, it may be that because the subtractive populations are themselves relatively close to the stem cell in the differentiation hierarchy, the self-renewal program is harder to identify.

[Figure 8.4 plot: stemness score (y-axis, 0 to 4) against differentiation score (x-axis, 0 to 4). Legend: Cancer stem cells, Cancer non-stem cells.]

Figure 8.4: Stemness scores for cancer stem cell populations and differentiated (non-stem-cell-like) cancer cell populations. The stemness index scores associated with each population are shown de-convoluted to their constitutive elements: the stemness and differentiation sub-scores. The line indicates a stemness index score of 0, which is the previously selected stem cell/differentiated cell classification cutoff. Cancer stem cell populations generally show very high stemness scores and low differentiation scores, suggestive of a self-renewal signature. The few misclassified populations are discussed further in the text.

[Figure 8.5 diagram: hematopoietic hierarchy with nodes LT-HSC, ST-HSC, MPP, CMP, CLP, GMP, MEP and L-GMP. Annotations: "Self-renewing populations" and "Comparison progenitor populations".]

Figure 8.5: The modified hematopoietic differentiation hierarchy tree shows the progressive restriction of differentiation potential from the true stem cell (LT-HSC) to more restricted progenitor cells. A leukemic stem-cell-like cell (L-GMP) could be derived from a myeloid progenitor cell, called a granulocyte-monocyte progenitor (GMP) [101].

These results also indicated that the stemness index scoring is not appropriate for application to experiments that identify signatures from populations at equal stages of differentiation, such as a normal and a cancer mammary stem cell.


8.5 Stemness in metastasis

Another application that is relevant to stem cell biology is the assessment of

the extent of self-renewal and stemness associated with metastatic cancer populations.

In Chapter 2, I discussed some of the similarities between metastatic cancer cells, nor-

mal stem cells, and cancer stem cells. Specifically, these cells may all share the need for

extensive self-renewal — an important feature for the successful invasion and establish-

ment of a cancer cell in a new environment. Unfortunately, as metastatic cancer gene

expression data for mouse is rarely available, I could only test four populations — two

metastatic cancer populations and two primary cancer cell populations.

The results entirely concurred with the hypothesis that metastatic populations exhibit the molecular properties of stem cells, as all four populations were correctly classified based on this hypothesis (100% accuracy; Figure 8.6). The difference in stemness indices between the metastatic and non-metastatic populations was remarkably high for each set of experiments.
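The accuracy figures quoted in Sections 8.2 to 8.5 follow directly from the reported counts of correctly classified populations. The short Python sketch below recomputes them. The dictionary keys and variable names are illustrative choices, and the pooled value at the end is simple arithmetic on those counts rather than a number reported in this chapter.

```python
# (correctly classified, total) population counts as stated in the text.
reported = {
    "normal stem cell experiments": (9, 12),   # Section 8.2, Figure 8.2
    "side populations": (7, 8),                # Section 8.3
    "cancer stem cells": (9, 12),              # Section 8.4
    "metastatic populations": (4, 4),          # Section 8.5
}

# Per-category accuracy = correct / total.
per_category = {
    name: correct / total for name, (correct, total) in reported.items()
}

# Pooled accuracy over all four categories (not reported in the chapter).
pooled_correct = sum(correct for correct, _ in reported.values())
pooled_total = sum(total for _, total in reported.values())
pooled = pooled_correct / pooled_total

for name, value in per_category.items():
    print(f"{name}: {value:.1%}")
print(f"pooled: {pooled_correct}/{pooled_total} = {pooled:.1%}")
```

The per-category values reproduce the 75%, 87.5%, 75% and 100% figures given in the text. The mouse prostate experiment of Figure 8.1 is not included, since no accuracy is quoted for it.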

8.6 Summary

This chapter addressed the last question of this dissertation: can we use gene expression data to predict stem cells? I used the stemness index defined in earlier chapters and applied it to data from various sources. Section 8.1 gave a brief overview of the motivation and the data selection. Section 8.2 described the application of the stemness index score to normal stem cells. Section 8.3 tested the stemness signature in side populations, Section 8.4 measured the stemness index of cancer stem cell populations, and the final section, Section 8.5, evaluated the stemness of metastatic populations.

[Figure 8.6 plot: stemness score (y-axis, 0 to 4) against differentiation score (x-axis, 0 to 4). Legend: Metastatic cancer cells, Non-metastatic cancer cells.]

Figure 8.6: Stemness scores for metastatic and non-metastatic cancer populations. The stemness index scores associated with each population are shown de-convoluted to their constitutive elements: the stemness and differentiation sub-scores. The line indicates a stemness index score of 0, which is the previously selected stem cell/differentiated cell classification cutoff. Metastatic populations both show very high stemness scores and low differentiation scores, suggestive of a self-renewal signature.

Chapter 9

Conclusion

In this final chapter of the dissertation, I summarize my findings along with

some concluding thoughts and speculations. I also present my thoughts and ideas on

possible future directions. Section 9.1 contains a general discussion of my findings, while

Section 9.2 presents the future directions.

9.1 Discussion

This dissertation assessed computationally the validity of three independent

hypotheses:

1. Functional redundancy and tissue-specific expression mask the common stem cell

mechanisms.

2. Stemness mechanisms are conserved between mouse and human stem cells.

3. We can accurately predict the differentiation state of a cell based on its gene expression signature.

To assess functional redundancy in stem cells, I developed a new methodology to test for global reproducible expression of gene modules across multiple conditions. The Stemness Meta-Analysis (SMA) method used meta-analysis techniques to identify recurrently upregulated modules across many stem cell experiments. It subsequently used other techniques to narrow the candidate stemness module list to only gene sets that were upregulated in most stem cell types and were specific to stem cells as opposed to differentiated cells. I identified 103 murine stemness modules of evolutionarily related homologous genes with reproducible, statistically significant and stem cell-specific upregulation in many mouse stem cell types. The results indicated that if we account for functional redundancy and tissue-specific expression, previously undiscovered stemness mechanisms emerge from the mouse stem cell data.

To address the conservation of stem cell mechanisms between mouse and human cells, I also applied the stemness meta-analysis method to a human stem cell compendium. The results suggested the human data were significantly more heterogeneous than the mouse data, perhaps related to the lack of good human marker genes that may be used for isolation of pure populations. I found conservation of only five major stemness families between mouse and human: the Notch, Frizzled, Chd, TCF/LEF, and Integrin α families.

To address whether stem cells can be predicted from expression signatures, I defined a stemness index score that measures how stem cell-like a new gene expression signature is. I applied the stemness index scoring to mouse expression signatures from new stem cell experiments, side populations, cancer stem cells, and metastatic populations. The results indicated that the mouse stemness modules, as used by the index score, can faithfully predict normal stem cells and side populations, as well as cancer stem cells and metastatic cells.

The mouse stemness modules identified in this study are the most central component of my stemness research. They provide a glimpse into “the life” of a stem cell. In earlier chapters, I mentioned that stem cells have to maintain a balance between four main states: proliferation, quiescence, apoptosis and differentiation. The stemness modules reflect this balance: I observed both proto-oncogene families and members of signaling pathways that actively promote self-renewal, counteracted by tumor-suppressor families and signaling molecules involved in the promotion of quiescence.

By now, much evidence shows that stem cells are not solely intrinsically de-

fined, but they also need to rely on extrinsic factors provided by the stem cell niche.

Interaction with the extracellular matrix is essential for the maintenance of these cells;

this necessity was reflected in the presence of adhesion molecules among the stemness

modules.

Stem cells are also rare cells and not dispensable to their organ or system type;

the supply of cells in each organ depends on the proper function of these rare cells.

Thus, it should not be surprising that proteins associated with proper protein folding

and repair, as well as DNA damage repair appear on the list of stemness modules.


9.2 Future directions

9.2.1 Application to human data

In the short term, the stemness index scoring could be applied to human data from various data sources using the human stemness modules. However, because of the small number of differentiation modules (only four) and the relatively small number of stemness modules, the human feature set may not be sufficiently informative to successfully distinguish between human stem-cell-like and non-stem-cell-like signatures. Nevertheless, since many previous studies from the cancer field indicate that mouse cancer signatures could be predictive for human cancer data, a plausible initial step could be to map expression signatures from new human experiments directly to mouse and use the mouse stemness modules to predict stemness.
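The mapping step proposed above can be sketched in a few lines of Python. This is a minimal illustration, not a description of an existing pipeline: the `HUMAN_TO_MOUSE` dictionary is a small hand-written stand-in for a curated ortholog resource, and the function name is hypothetical.

```python
# Hand-written human-to-mouse ortholog pairs, for illustration only.
# A real analysis would load a curated ortholog table instead.
HUMAN_TO_MOUSE = {
    "MYC": "Myc",
    "NOTCH1": "Notch1",
    "TP53": "Trp53",
    "FZD7": "Fzd7",
}


def map_signature_to_mouse(human_genes, orthologs=HUMAN_TO_MOUSE):
    """Map human gene symbols to mouse symbols.

    Returns a pair (mapped mouse symbols, unmapped human symbols).
    Duplicate mouse symbols are reported only once.
    """
    mapped, unmapped = [], []
    for gene in human_genes:
        mouse = orthologs.get(gene.upper())
        if mouse is None:
            unmapped.append(gene)      # no ortholog known in this table
        elif mouse not in mapped:
            mapped.append(mouse)
    return mapped, unmapped
```

Keeping the unmapped symbols separate makes it easy to see how much of a human signature is lost in the mapping before the mouse stemness modules are applied.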

9.2.2 Application to alternative splicing and miRNA data

One of the advantages of the stemness meta-analysis method is that it is a general purpose method and can be applied to any input data type to test different hypotheses. If we restrict the input profiling data to only single lineage experiments, the method could be directly used to identify lineage-specific families: markers of individual stem cell types. Such an application would resemble the efforts of the research group behind the Plurinet [117].

In addition, regulation of cell type specificity is achieved not only through homologous proteins; alternative splicing and miRNA regulation may play a significant role as well. miRNA target prediction has been an active field of research for the last several years, but the number of miRNA targets predicted for every miRNA is usually very large. We could use the meta-analysis method structure to search for miRNA targets commonly upregulated across many differentiated cells, which may suggest miRNAs highly active in stem cells.

Another possible application, given solid expression test data, is to define families of alternatively spliced isoforms instead of homolog protein modules. Many proteins are known to use alternatively spliced isoforms to confer tissue specificity on various cell types. This application, however, may be more long-term, as alternative splicing data for various stem cell types are not yet available.

9.2.3 Addition of niche data

Stem cell niches play a central role in the maintenance of stem cells. Even

though niches are better understood at present than they were a decade ago, very

few profiling experiments exist that examine the expression of their cells. Given data

from various stem cell niches, it would be very interesting to compare the stemness

and potential “niche-ness” signatures, as I expect that the cross-talk between stem and

niche cells would yield many common factors.

9.2.4 Methodological improvements

One aspect that the current SMA method does not take into account is the possible specialization of different genes within a module to cells at various stages of differentiation. Because of that, it is possible to observe stemness families that have member genes with a non-negligible upregulation in differentiated cells. We could define a score that measures the specificity level of each gene in a given stemness module to stem cells or differentiated cells.
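One hypothetical way to define such a per-gene specificity score is sketched below in Python. It is an assumption made for illustration, not a method from this dissertation: the score is the fraction of stem cell experiments in which a gene is upregulated minus the fraction of differentiated cell experiments in which it is upregulated, so it ranges from -1 (specific to differentiated cells) to 1 (specific to stem cells). The function names and the input layout are likewise hypothetical.

```python
def gene_specificity(up_in_stem, up_in_diff):
    """Specificity of one gene, in the range [-1, 1].

    up_in_stem / up_in_diff: lists of booleans, one per experiment, saying
    whether the gene was upregulated in that stem cell (or differentiated
    cell) experiment.
    """
    if not up_in_stem or not up_in_diff:
        raise ValueError("need at least one experiment of each kind")
    frac_stem = sum(up_in_stem) / len(up_in_stem)
    frac_diff = sum(up_in_diff) / len(up_in_diff)
    return frac_stem - frac_diff


def module_specificity(module):
    """Score every gene of a module.

    module: dict mapping gene name -> (up_in_stem, up_in_diff).
    """
    return {
        gene: gene_specificity(stem, diff)
        for gene, (stem, diff) in module.items()
    }
```

Under this definition, a module member that is upregulated as often in differentiated cells as in stem cells scores near 0, which would flag it as a candidate for the specialization described above.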

A more important improvement over the current method would be to use phylogenetically informed homolog modules. While the single e-value alignment cutoff I used to define homolog families worked well for many gene modules, some homolog group assignments could be improved. A paralog/homolog family-specific cutoff that accounts for the evolution of the family would certainly be more informative, though non-trivial to define.

9.2.5 Possible biological experiments

More than a few proteins from the set of stemness modules have already been implicated as self-renewal genes or as potential master regulators of differentiation, and many more could still be tested. Transcription factor stemness modules could be used to test for new reprogramming candidates.

RNAi-mediated silencing of individual stemness family members could be used to test directly their role in differentiation, in a manner similar to the one used by Suzuki et al. [169], where RNAi was used to silence Myb. The effects of Myb silencing resembled very closely the effects observed after treatment with a differentiation-inducing drug [169].

Since many of these proteins have also been implicated in metastatic cancer, Matrigel invasiveness assays could be used to test the metastatic abilities of cells that have had individual stemness family members knocked out or otherwise silenced. Invasiveness abilities could also be tested through stemness gene (module) activation (knock-in) experiments in normal cancer cell lines, or alternatively through drug inhibition of genes from stemness modules in metastatic cancer lines.


Appendix A

Definitions of key terms

Adenocarcinoma Cancer of the glandular (secretory) epithelial cells, whose purpose is to release secretions that protect the epithelial linings of internal organs.

Apoptosis Cellular death associated with normal cell turnover. Cancer cells generally

lose their ability to regulate this process.

Carcinoma Malignant cancer of the epithelial cells.

Cancer stem cells Rare cancer cells that have the ability to self-renew and give rise to differentiated cancer cells. These cells are thought to be at the top of the differentiation tree in the cancer cell hierarchy, just like normal stem cells are at the top of the normal cell differentiation hierarchy.

Cell-type diversity An entropy-based score that measures how evenly distributed the upregulation of a gene module is across different cell types.

Differentiation The process through which mature cells arise from less differentiated stem and progenitor cells.


Epithelial-mesenchymal transition (EMT) A morphological and phenotypic tran-

sition from an epithelial state (adherent, structured, not migratory) to a mesenchy-

mal state (less structured and more migratory). This transition is essential for a

successful invasion of a new tissue and is considered to be one of the hallmarks of

metastatic cancer.

FACS Fluorescence-activated cell-sorting flow cytometry is a widely used technique in the stem cell field. FACS is used to isolate pure populations based on their cell surface expression signature pattern.

Gene-usage diversity An entropy-based score that measures how evenly distributed

the upregulation of the genes in a gene module is.

Induced pluripotent stem (iPS) cells Cells derived from fully differentiated cells, which, upon induction with one or more transcription factors such as Myc or Sox2, can transform into pluripotent cells bearing all the hallmarks of embryonic stem cells.

Inner cell mass The cells that give rise to all three germ layers – ectoderm, meso-

derm and endoderm – of the embryo, but not to the extra-embryonic tissue layer.

Embryonic stem cells are derived from the inner cell mass.

Inverse-variance weighting A technique used in standard meta-analysis, where the contribution of each study to an overall combined effect is weighted by the inverse of the variance associated with the measured effect in the study.


Meta-analysis A field of statistics that focuses on the inference of a measurable combined effect through the integration of the individual effects measured in many different studies.

Metastatic tumor Tumors that have evolved from a primary tumor, but have invaded a more distant tissue unrelated to that of the primary tumor.

Monoclonal Arising from a single common ancestral cell.

Multipotent cells Cells that have the ability to give rise to only the cells within a

particular organ or system type. This term is most often associated with adult

stem cells, such as hematopoietic or neural stem cells.

Pluripotent cells Cells that have the ability to give rise to any cell within an organism with the exception of the extra-embryonic tissue layer that forms the placenta. The cells that are included in this category can make all three embryonic developmental layers: ectoderm, mesoderm and endoderm.

Primary tumors Tumors that develop at the site of origin of the ancestral cell mutation.

Proliferation The process of active self-renewal of normal and cancer stem cells.

Quiescence An inactive non-proliferative stem cell state.

Recurrence Reproducible upregulation of a module or gene across many studies or

experiments.


Side population A stem cell-enriched population that is characterized by its ability to efflux drugs, toxins, or dyes using the ABC family of transporter proteins.

Squamous cell carcinoma Cancer of the squamous epithelial cells, the flat protective cells that line the skin and internal organs.

Stem cells Functionally defined undifferentiated cells that are characterized by their ability to self-renew and give rise to many mature cell types.

Trophectoderm The cells that form the extra-embryonic tissues and give rise to the

placenta.

Tumor An abnormal growth of cells that lose the ability to undergo controlled prolif-

eration.

Tumor heterogeneity Subtype, individual organism, or cellular disparity between the

cells in a single tumor growth.
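The inverse-variance weighting entry in this glossary can be made concrete with a short Python sketch of the standard fixed-effect combination: each study's effect is weighted by the reciprocal of its variance, and the variance of the combined effect is the reciprocal of the summed weights. The function name and the example numbers are illustrative, and the sketch does not claim to reproduce the exact variant used by the SMA method.

```python
def inverse_variance_combine(effects, variances):
    """Combine per-study effects with inverse-variance weights.

    Returns (combined effect, variance of the combined effect).
    """
    if len(effects) != len(variances) or not effects:
        raise ValueError("effects and variances must be non-empty and equal in length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]        # w_i = 1 / var_i
    total_weight = sum(weights)
    combined = sum(w * e for w, e in zip(weights, effects)) / total_weight
    return combined, 1.0 / total_weight


# Two hypothetical studies: the more precise one (smaller variance)
# pulls the combined effect toward its own estimate.
combined, combined_var = inverse_variance_combine([2.0, 4.0], [0.5, 0.25])
```

In this made-up example the second study has half the variance of the first, so it receives twice the weight and the combined effect lies closer to 4.0 than to 2.0.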


Appendix B

Tables of stemness modules

Homolog AFA module name             Cell diversity   Recurrence FDR
Notch; Laminin; Collagen            3.37             0.006
Kif2                                3.20             0.011
Myh                                 3.15             0.021
Tubulin                             3.11             0
Prps                                3.10             0
Hdgf                                3.08             0.014
Chd/Smarc                           3.06             0
Ppi                                 3.02             0.009
Ephrin                              3.02             0.046
Frizzled                            3.01             0.008
Ncl/Supt                            3.00             0.01
Sfrs                                2.98             0
Grb                                 2.96             0.01
Nap1                                2.93             0.011
Rcc/Herc                            2.93             0.036
Wdr                                 2.93             0
Nme                                 2.92             0.039
Mapk                                2.91             0.001
Sfrs4–6                             2.88             0.005
Myb                                 2.87             0
Plk                                 2.87             0.007
Hsp (heat shock proteins)           2.83             0.001
Abr/Bcr                             2.82             0
Mcm                                 2.81             0
Ccnb                                2.80             0.0002
Rcn                                 2.80             0.007
Terf                                2.79             0.021
Myc                                 2.79             0
Aurora                              2.77             0
Dusp1–4                             2.77             0.007
Fus/Nola                            2.77             0.015
Id (inhibitor of differentiation)   2.76             0.015
p53 (p53, p63, p73)                 2.75             0.011
Kpna                                2.74             0.027
Itga (Integrin alpha)               2.74             0.008
Pbx                                 2.73             0.003
F2r                                 2.73             0.006
Smo                                 2.72             0.005
Camk                                2.72             0.049
Ldha                                2.71             0.009
Cdkn1a–c (p21, p27, p57)            2.71             0.003
Ccne                                2.70             0.0001
Rbp                                 2.70             0
Cct (chaperonin)                    2.69             0
Sfrp                                2.69             0.0005
Tie                                 2.69             0.006
Brca1/Mcph1                         2.69             0.009
Acsm/Acsf                           2.68             0.027
Spred                               2.68             0.033
Lypla                               2.68             0.011
Fen1/Gen1                           2.67             0
Ccn                                 2.66             0.007
Slc2a                               2.65             0.009
Pkp                                 2.65             0.0005
Meis                                2.63             0.0003
Hnrnpa                              2.63             0.01
Hnrnpd                              2.63             0.0002
Lim                                 2.61             0.019
Hmgb (high mobility)                2.60             0.003
Khdrbs                              2.59             0.033
Ctbp                                2.59             0.007
Mkrn                                2.58             0.026
Bub1/1b                             2.58             0.002
Rnf125/138/166                      2.57             0.007
Eya (eyeless)                       2.57             0.005
Ipo (importin)                      2.57             0
Acot                                2.56             0.033
Ctnna                               2.56             0.007
Cks                                 2.56             0.0003
Gnb                                 2.56             0.011
Msh                                 2.56             0.0035
Lancl                               2.56             0.026
Lsp1/Ccdc113                        2.55             0.004
Ube2/Birc                           2.55             0.023
Ssbp                                2.53             0.012
Kif20/Kif23                         2.51             0.001
Tcf/LEF                             2.51             0.0009
Ilf                                 2.50             0.033

Table B.1: List of mouse stemness homolog AFA modules.
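The cell diversity column is described in Appendix A as an entropy-based score of how evenly the upregulation of a module is spread across cell types. The Python sketch below shows one way such a score could be computed, using the Shannon entropy of upregulation counts per cell type. The choice of base-2 logarithms, the normalization of counts to proportions, the function name, and the example counts are all assumptions made for illustration; the exact definition used for the tables is given earlier in the dissertation and may differ.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a list of non-negative counts.

    counts[i] is the number of times a module was found upregulated in
    cell type i. Higher entropy means the upregulation is spread more
    evenly across cell types.
    """
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one count must be positive")
    # Zero counts contribute nothing to the entropy and are skipped.
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


# Hypothetical counts: evenly spread versus concentrated in one cell type.
even = shannon_entropy([1, 1, 1, 1])
concentrated = shannon_entropy([4, 0, 0, 0])
```

Under these assumptions, a module upregulated equally often in four cell types scores 2 bits, while one upregulated in a single cell type scores 0, matching the intuition that larger values indicate broader use across stem cell types.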

Functional AFA module name                          Cell diversity   Recurrence FDR
Go:liver development                                3.07             0
Go:regulation of TGFβ pathway                       3.03             0.004
Mouse PPI Interactions Module 11-4-5                3.03             0.001
Go:blastocyst growth                                3.02             0.022
Go:establishment of planar polarity                 2.95             0.011
Human PPI Interactions Module 147-11-15             2.93             0
Human PPI Interactions Module 251-11-14             2.90             0
Human PPI Interactions Module 3-21-54               2.89             0
Go:cartilage condensation                           2.89             0.021
Go:determination of ante/post axis embryo           2.89             0.001
Human PPI Interactions Module 96-14-31              2.88             0.001
Go:female pronucleus                                2.86             0.008
Human PPI Interactions Module 54-16-32              2.85             0
Go:RNA helicase activity                            2.84             0
Go:embryonic cleavage                               2.81             0.014
Human PPI Interactions Module 258-10-14             2.80             0.021
Go:neuromuscular synaptic transmission              2.79             0.033
Go:morphogenesis of an epithelial sheet             2.79             0.014
Go:heart morphogenesis                              2.77             0
Human PPI Interactions Module 171-4-6               2.74             0
Go:paraxial mesoderm morphogenesis                  2.71             0.0335
Human PPI Interactions Module 263-5-5               2.70             0.024
BC:Transcr. activation of dbpb from mRNA            2.69             0.015
Human PPI Interactions Module 182-6-7               2.69             0.039
Go:centric heterochromatin                          2.68             0
Go:methylation-dependent chromatin silencing        2.68             0
Human PPI Interactions Module 236-5-5               2.67             0.008
Human PPI Interactions Module 115-18-40             2.67             0
Bcl-xL-p53-PUMA DNA damage complex                  2.67             0.0145
Go:pyrimidine base metabolic process                2.67             0.002
Go:ER-Golgi intermediate compartment                2.65             0.0238
DNA polymerase delta complex                        2.65             0.0004
Go:neg. regulation of protein import into nucleus   2.65             0.005
Go:mitochondrial ribosome                           2.64             0
Go:cell structure disassembly during apoptosis      2.63             0.0362
Human PPI Interactions Module 172-4-5               2.62             0.0009
Go:nuclear speck                                    2.61             0.007
Go:cytosolic small rib. subunit sen. Eukaryota      2.61             0.0015
Human PPI Interactions Module 264-14-23             2.60             0
Go:cell migration involved in gastrulation          2.57             0.009
Go:cysteine metabolic process                       2.56             0.0075
Human PPI Interactions Module 232-4-4               2.56             0.045
Go:imprinting                                       2.56             0
Go:DS break repair via hom recombination            2.56             0
Go:nuclear lamina                                   2.53             0.0008
Human PPI Interactions Module 228-4-6               2.50             0.045

Table B.2: List of mouse stemness functional AFA modules.

Homolog OFA module name     Cell diversity  Recurrence FDR
Orc1                        2.87            0
Rad51                       2.80            0.001
Impdh                       2.80            0.005
Camkk                       2.80            0.008
Bzw                         2.78            0.048
Atad/Fignl                  2.77            0.009
Ckap                        2.73            0
Shroom                      2.71            0.001
G3bp                        2.70            0.009
Cct8/Gm443                  2.68            0.040
Steap                       2.66            0.0238
Shmt                        2.64            0
Rrm2                        2.64            0.003
Top2                        2.64            0.012
Pa2g4/Metap2                2.63            0.012
Set                         2.62            0.009
Fbl                         2.58            0.016
H2af                        2.56            0.014
Csrp                        2.56            0.0099
Galk                        2.52            0.001
Ruvbl                       2.50            0

Table B.3: List of mouse stemness homolog OFA modules.

Functional OFA module name                          Cell diversity  Recurrence FDR
Go:hydroxymethyl-formyl related transf. activity    2.91            0.005
Go:internal protein amino acid acetylation          2.90            0.0005
Rag1-Rag2-Ku70-Ku80 protein-DNA complex             2.79            0.014
Go:DNA replication synthesis of RNA primer          2.73            0.003
PU1-associated protein complex                      2.72            0.011
Go:nerve growth factor receptor signaling pathway   2.63            0.005
Go:dosage compensation by inactivation of X chrom   2.61            0.005
Go:pos. regulation of gene-specific transcription   2.59            0.0008
Human PPI Interactions Module 177-4-4               2.59            0.0108
Cd2ap-Fyn complex                                   2.58            0.008
Go:NF-kappaB import into nucleus                    2.56            0.023
Go:regulation of DNA replication initiation         2.55            0.001
Go:positive regulation of exocytosis                2.52            0.002

Table B.4: List of mouse stemness functional OFA modules.

217

Page 233: STEMNESS REVISITED: A META ANALYSIS OF STEM CELL ... · UNIVERSITY OF CALIFORNIA SANTA CRUZ STEMNESS REVISITED: A META ANALYSIS OF STEM CELL SIGNATURES USING HIGH-THROUGHPUT DATA

Bibliography

[1] Kazuhiro Aiba, Timur Nedorezov, Yulan Piao, Akira Nishiyama, Ryo Matoba,Lioudmila V Sharova, Alexei A Sharov, Shinya Yamanaka, Hitoshi Niwa, andMinoru S H Ko. Defining developmental potency and cell lineage trajectoriesby expression profiling of di!erentiating mouse embryonic stem cells. DNA Res,16(1):73–80, Feb 2009.

[2] K Akashi, D Traver, T Miyamoto, and I L Weissman. A clonogenic commonmyeloid progenitor that gives rise to all myeloid lineages. Nature, 404(6774):193–197, Mar 2000.

[3] Koichi Akashi, Xi He, Jie Chen, Hiromi Iwasaki, Chao Niu, Brooke Steenhard,Jiwang Zhang, Je! Haug, and Linheng Li. Transcriptional accessibility for genesof multiple tissues and hematopoietic lineages is hierarchically controlled duringearly hematopoiesis. Blood, 101(2):383–389, Jan 2003.

[4] Muhammad Al-Hajj, Max S Wicha, Adalberto Benito-Hernandez, Sean J Mor-rison, and Michael F Clarke. Prospective identification of tumorigenic breastcancer cells. Proc Natl Acad Sci U S A, 100(7):3983–3988, Apr 2003.

[5] Bruce Alberts and Et Al. Molecular Biology of the Cell [Book and CD-ROM] .Garland Science, March 2002.

[6] Lyle Armstrong, Owen Hughes, Sun Yung, Louise Hyslop, Rebecca Stewart, IlkaWappler, Heiko Peters, Theresia Walter, Petra Stojkovic, Jerome Evans, MiodragStojkovic, and Majlinda Lako. The role of PI3K/AKT, MAPK/ERK and NFkap-pabeta signalling in the maintenance of human embryonic stem cell pluripotencyand viability highlighted by transcriptional profiling and functional analysis. HumMol Genet, 15(11):1894–1913, Jun 2006.

[7] M Ashburner, C A Ball, J A Blake, D Botstein, H Butler, J M Cherry, A PDavis, K Dolinski, S S Dwight, J T Eppig, M A Harris, D P Hill, L Issel-Tarver,A Kasarskis, S Lewis, J C Matese, J E Richardson, M Ringwald, G M Rubin,and G Sherlock. Gene ontology: tool for the unification of biology. The GeneOntology Consortium. Nat Genet, 25(1):25–29, May 2000.

218

Page 234: STEMNESS REVISITED: A META ANALYSIS OF STEM CELL ... · UNIVERSITY OF CALIFORNIA SANTA CRUZ STEMNESS REVISITED: A META ANALYSIS OF STEM CELL SIGNATURES USING HIGH-THROUGHPUT DATA

[8] Jerome Aubert, Hannah Dunstan, Ian Chambers, and Austin Smith. Functionalgene screening in embryonic stem cells implicates Wnt antagonism in neural dif-ferentiation. Nat Biotechnol, 20(12):1240–1245, Dec 2002.

[9] T W Austin, G P Solar, F C Ziegler, L Liem, and W Matthews. A role forthe Wnt gene family in hematopoiesis: expansion of multilineage progenitor cells.Blood, 89(10):3624–3635, May 1997.

[10] Marina Bacac and Ivan Stamenkovic. Metastatic cancer cell. Annu Rev Pathol,3:221–247, 2008.

[11] G D Bader, I Donaldson, C Wolting, B F Ouellette, T Pawson, and C WHogue. BIND–The Biomolecular Interaction Network Database. Nucleic AcidsRes, 29(1):242–245, Jan 2001.

[12] Hossein Baharvand, Mohsen Hajheidari, Saeid Kazemi Ashtiani, and Ghasem Hos-seini Salekdeh. Proteomic signature of human embryonic stem cells. Proteomics,6(12):3544–3549, Jun 2006.

[13] Eduard Batlle, Je!rey T Henderson, Harry Beghtel, Maaike M W van den Born,Elena Sancho, Gerwin Huls, Jan Meeldijk, Jennifer Robertson, Marc van de We-tering, Tony Pawson, and Hans Clevers. Beta-catenin and TCF mediate cell posi-tioning in the intestinal epithelium by controlling the expression of EphB/ephrinB.Cell, 111(2):251–263, Oct 2002.

[14] Fariba Behbod, Wa Xian, Chad A Shaw, Susan G Hilsenbeck, Anna Tsimelzon,and Je!rey M Rosen. Transcriptional profiling of mammary gland side populationcells. Stem Cells, 24(4):1065–1074, Apr 2006.

[15] Abdelaziz Beqqali, Jantine Kloots, Dorien Ward-van Oostwaard, Christine Mum-mery, and Robert Passier. Genome-wide transcriptional profiling of human em-bryonic stem cells di!erentiating to cardiomyocytes. Stem Cells, 24(8):1956–1967,Aug 2006.

[16] M Bjerknes and H Cheng. Clonal analysis of mouse intestinal epithelial progeni-tors. Gastroenterology, 116(1):7–14, Jan 1999.

[17] Roy Blum, Rashmi Gupta, Patricia E Burger, Christopher S Ontiveros, Sarah NSalm, Xiaozhong Xiong, Alexander Kamb, Holger Wesche, Lisa Marshall, GeneCutler, Xiangyun Wang, Jiri Zavadil, David Moscatelli, and E Lynette Wilson.Molecular signatures of prostate stem cells reveal novel signaling pathways andprovide insights into prostate cancer. PLoS One, 4(5):e5722, 2009.

[18] C Booth and C S Potten. Gut instincts: thoughts on intestinal epithelial stemcells. J Clin Invest, 105(11):1493–1499, Jun 2000.

219

Page 235: STEMNESS REVISITED: A META ANALYSIS OF STEM CELL ... · UNIVERSITY OF CALIFORNIA SANTA CRUZ STEMNESS REVISITED: A META ANALYSIS OF STEM CELL SIGNATURES USING HIGH-THROUGHPUT DATA

[19] Andrew C Boquest, Aboulghassem Shahdadfar, Katrine Fronsdal, Olafur Sigur-jonsson, Siv H Tunheim, Philippe Collas, and Jan E Brinchmann. Isolation andtranscription profiling of purified uncultured human stromal stem cells: alterationof gene expression after in vitro cell culture. Mol Biol Cell, 16(3):1131–1141, Mar2005.

[20] M. Borenstein, L.V. Hedges, J.P.T. Higgins, and H.R. Rothstein. Introduction tometa-analysis . Wiley, 2009.

[21] J E Bottenstein, S F Hunter, and M Seidel. CNS neuronal cell line-derived factorsregulate gliogenesis in neonatal rat brain cultures. J Neurosci Res, 20(3):291–303,Jul 1988.

[22] Paola Bovolenta, Pilar Esteve, Jose Maria Ruiz, Elsa Cisneros, and Javier Lopez-Rios. Beyond Wnt inhibition: new functions of secreted Frizzled-related proteinsin development and disease. J Cell Sci, 121(Pt 6):737–746, Mar 2008.

[23] Laurie A Boyer, Tong Ihn Lee, Megan F Cole, Sarah E Johnstone, Stuart S Levine,Jacob P Zucker, Matthew G Guenther, Roshan M Kumar, Heather L Murray,Richard G Jenner, David K Gi!ord, Douglas A Melton, Rudolf Jaenisch, andRichard A Young. Core transcriptional regulatory circuitry in human embryonicstem cells. Cell, 122(6):947–956, Sep 2005.

[24] Ralph Brandenberger, Irina Khrebtukova, R Scott Thies, Takumi Miura, CaiJingli, Raj Puri, Tom Vasicek, Jane Lebkowski, and Mahendra Rao. MPSSprofiling of human embryonic stem cells. BMC Dev Biol, 4:10, Aug 2004.

[25] Ralph Brandenberger, Henry Wei, Sally Zhang, Shirley Lei, Jaji Murage, Gre-gory J Fisk, Yan Li, Chunhui Xu, Rixun Fang, Karl Guegler, Mahendra S Rao,Ramumkar Mandalam, Jane Lebkowski, and Lawrence W Stanton. Transcrip-tome characterization elucidates signaling networks that control human ES cellgrowth and di!erentiation. Nat Biotechnol, 22(6):707–716, Jun 2004.

[26] Johanna Buchstaller, Lukas Sommer, Matthias Bodmer, Reinhard Ho!mann, UeliSuter, and Ned Mantei. E#cient isolation and gene expression profiling of smallnumbers of neural crest stem cells and developing Schwann cells. J Neurosci,24(10):2357–2365, Mar 2004.

[27] Alexandra B Byrne, Matthew T Weirauch, Victoria Wong, Martina Koeva, Scott JDixon, Joshua M Stuart, and Peter J Roy. A global analysis of genetic interactionsin Caenorhabditis elegans. J Biol, 6(3):8, Sep 2007.

[28] Jingli Cai, Jia Chen, Ying Liu, Takumi Miura, Yongquan Luo, Jeanne F Loring,William J Freed, Mahendra S Rao, and Xianmin Zeng. Assessing self-renewaland di!erentiation in human embryonic stem cell lines. Stem Cells, 24(3):516–530,Mar 2006.

220

Page 236: STEMNESS REVISITED: A META ANALYSIS OF STEM CELL ... · UNIVERSITY OF CALIFORNIA SANTA CRUZ STEMNESS REVISITED: A META ANALYSIS OF STEM CELL SIGNATURES USING HIGH-THROUGHPUT DATA

[29] John D Calhoun, Raj R Rao, Susanne Warrenfeltz, Romdhane Rekaya, StephenDalton, John McDonald, and Steven L Stice. Transcriptional profiling of ini-tial di!erentiation events in human embryonic stem cells. Biochem Biophys ResCommun, 323(2):453–464, Oct 2004.

[30] Feng Cao, Roger A Wagner, Kitchener D Wilson, Xiaoyan Xie, Ji-Dong Fu,Micha Drukker, Andrew Lee, Ronald A Li, Sanjiv S Gambhir, Irving L Weissman,Robert C Robbins, and Joseph C Wu. Transcriptional and functional profilingof human embryonic stem cell-derived cardiomyocytes. PLoS One, 3(10):e3474,2008.

[31] Yihai Cao, Renhai Cao, and Eva-Maria Hedlund. R Regulation of tumor an-giogenesis and metastasis by FGF and PDGF signaling pathways. J Mol Med,86(7):785–789, Jul 2008.

[32] Ugo Cavallaro and Gerhard Christofori. Multitasking in tumor progression: sig-naling functions of cell adhesion molecules. Ann N Y Acad Sci, 1014:58–66, Apr2004.

[33] Grant A Challen and Melissa H Little. A side order of stem cells: the SPphenotype. Stem Cells, 24(1):3–12, Jan 2006.

[34] Stuart M Chambers, Nathan C Boles, Kuan-Yin K Lin, Megan P Tierney,Teresa V Bowman, Steven B Bradfute, Alice J Chen, Akil A Merchant, OlgaSirin, David C Weksberg, Mehveen G Merchant, C Joseph Fisk, Chad A Shaw,and Margaret A Goodell. Hematopoietic fingerprints: an expression database ofstem cells and their progeny. Cell Stem Cell, 1(5):578–591, Nov 2007.

[35] Sebastien Chateauvieux, Jean-Laurent Ichante, Bruno Delorme, Vincent Frouin,Genevieve Pietu, Alain Langonne, Nathalie Gallay, Luc Sensebe, Michele T Mar-tin, Kateri A Moore, and Pierre Charbord. Molecular profile of mouse stromalmesenchymal stem cells. Physiol Genomics, 29(2):128–138, Apr 2007.

[36] Jung Kyoon Choi, Jong Young Choi, Dae Ghon Kim, Dong Wook Choi, Bu YeoKim, Kee Ho Lee, Young Il Yeom, Hyang Sook Yoo, Ook Joon Yoo, and SangsooKim. Integrative analysis of multiple gene expression profiles applied to livercancer study. FEBS Lett, 565(1-3):93–100, May 2004.

[37] Michael F Clarke. A self-renewal assay for cancer stem cells. Cancer ChemotherPharmacol, 56 Suppl 1:64–68, Nov 2005.

[38] Nicole Cloonan, Alistair R R Forrest, Gabriel Kolle, Brooke B A Gardiner, Ge-o!rey J Faulkner, Mellissa K Brown, Darrin F Taylor, Anita L Steptoe, ShivangiWani, Graeme Bethel, Alan J Robertson, Andrew C Perkins, Stephen J Bruce,Clarence C Lee, Swati S Ranade, Heather E Peckham, Jonathan M Manning,Kevin J McKernan, and Sean M Grimmond. Stem cell transcriptome profilingvia massive-scale mRNA sequencing . Nat Methods, 5(7):613–9, Jul 2008.

221

Page 237: STEMNESS REVISITED: A META ANALYSIS OF STEM CELL ... · UNIVERSITY OF CALIFORNIA SANTA CRUZ STEMNESS REVISITED: A META ANALYSIS OF STEM CELL SIGNATURES USING HIGH-THROUGHPUT DATA

[39] Simon M Cool and Victor Nurcombe. Heparan sulfate regulation of progenitorcell fate. J Cell Biochem, 99(4):1040–1051, Nov 2006.

[40] Xiangqin Cui and Gary A Churchill. Statistical tests for di!erential expressionin cDNA microarray experiments . Genome Biol, 4(4):210, 2003.

[41] Piero Dalerba, Robert W Cho, and Michael F Clarke. Cancer stem cells: modelsand concepts. Annu Rev Med, 58:267–284, 2007.

[42] Michael Dean. ABC transporters, drug resistance, and cancer stem cells. JMammary Gland Biol Neoplasia, 14(1):3–9, Mar 2009.

[43] Janos Demeter, Catherine Beauheim, Jeremy Gollub, Tina Hernandez-Boussard,Heng Jin, Donald Maier, John C Matese, Michael Nitzberg, Farrell Wymore,Zachariah K Zachariah, Patrick O Brown, Gavin Sherlock, and Catherine A Ball.The Stanford Microarray Database: implementation of new analysis tools andopen source release of software. Nucleic Acids Res, 35(Database issue):D766–70,Jan 2007.

[44] Pierre-Yves Desprez, Tomoki Sumida, and Jean-Philippe Coppe. Helix-loop-helixproteins in mammary gland development and breast cancer. J Mammary GlandBiol Neoplasia, 8(2):225–239, Apr 2003.

[45] Theresa A DiMeo, Kristen Anderson, Pushkar Phadke, Chang Feng, Charles MPerou, Steven Naber, and Charlotte Kuperwasser. A novel lung metastasis signa-ture links Wnt signaling with cancer cell self-renewal and epithelial-mesenchymaltransition in basal-like breast cancer. Cancer Res, 69(13):5364–5373, Jul 2009.

[46] Kim-Anh et al. Do, editor. Bayesian Inference for Gene Expression and Pro-teomics . Cambridge University Press, 2006.

[47] Jason M Doherty, Michael J Geske, Thaddeus S Stappenbeck, and Jason C Mills.Diverse adult stem cells share specific higher-order patterns of gene expression.Stem Cells, 26(8):2124–2130, Aug 2008.

[48] Gabriela Dontu, Wissam M Abdallah, Jessica M Foley, Kyle W Jackson, Michael FClarke, Mari J Kawamura, and Max S Wicha. In vitro propagation and tran-scriptional profiling of human mammary stem/progenitor cells. Genes Dev,17(10):1253–1270, May 2003.

[49] Mathew C Easterday, Joseph D Dougherty, Robert L Jackson, Jing Ou, IchiroNakano, Andres A Paucar, Babak Roobini, Mehrnoosh Dianati, Dwain K Irvin,Irving L Weissman, Alexey V Terskikh, Daniel H Geschwind, and Harley I Korn-blum. Neural progenitor genes. Germinal zone expression and analysis of geneticoverlap in stem cell populations. Dev Biol, 264(2):309–322, Dec 2003.

222

Page 238: STEMNESS REVISITED: A META ANALYSIS OF STEM CELL ... · UNIVERSITY OF CALIFORNIA SANTA CRUZ STEMNESS REVISITED: A META ANALYSIS OF STEM CELL SIGNATURES USING HIGH-THROUGHPUT DATA

[50] Craig E Eckfeldt, Eric M Mendenhall, Catherine M Flynn, Tzu-Fei Wang,Michael A Pickart, Suzanne M Grindle, Stephen C Ekker, and Catherine M Ver-faillie. Functional analysis of human hematopoietic stem cell gene expressionusing zebrafish. PLoS Biol, 3(8):e254, Aug 2005.

[51] Ron Edgar, Michael Domrachev, and Alex E Lash. Gene Expression Omnibus:NCBI gene expression and hybridization array data repository. Nucleic Acids Res,30(1):207–210, Jan 2002.

[52] Tariq Enver, Shamit Soneji, Chirag Joshi, John Brown, Francisco Iborra, Tor-ben Orntoft, Thomas Thykjaer, Edna Maltby, Kath Smith, Raed Abu Dawud,Mark Jones, Maryam Matin, Paul Gokhale, Jonathan Draper, and Peter W An-drews. Cellular di!erentiation hierarchies in normal and culture-adapted humanembryonic stem cells. Hum Mol Genet, 14(21):3129–3140, Nov 2005.

[53] Ilaria Falciatori, Giovanna Borsellino, Nikolaos Haliassos, Carla Boitani, Ser-ena Corallini, Luca Battistini, Giorgio Bernardi, Mario Stefanini, and ElenaVicini. Identification and enrichment of spermatogonial stem cells displayingside-population phenotype in immature mouse testis. FASEB J, 18(2):376–378,Feb 2004.

[54] Tea Fevr, Sylvie Robine, Daniel Louvard, and Joerg Huelsken. Wnt/beta-cateninis essential for intestinal homeostasis and maintenance of intestinal stem cells. MolCell Biol, 27(21):7551–7559, Nov 2007.

[55] E Camilla Forsberg, Susan S Prohaska, Sol Katzman, Garrett C He!ner, Josh MStuart, and Irving L Weissman. Di!erential expression of novel potential regula-tors in hematopoietic stem cells. PLoS Genet, 1(3):e28, Sep 2005.

[56] Nicolas O Fortunel, Hasan H Otu, Huck-Hui Ng, Jinhui Chen, Xiuqian Mu, Tim-othy Chevassut, Xiaoyu Li, Marie Joseph, Charles Bailey, Jacques A Hatzfeld,Antoinette Hatzfeld, Fatih Usta, Vinsensius B Vega, Philip M Long, Towia ALibermann, and Bing Lim. Comment on ” ’Stemness’: transcriptional profilingof embryonic and adult stem cells” and ”a stem cell molecular signature”. Science,302(5644):393; author reply 393, Oct 2003.

[57] F H Gage. Mammalian neural stem cells. Science, 287(5457):1433–1438, Feb2000.

[58] Alexandre Gaspar-Maia, Adi Alajem, Fanny Polesso, Rupa Sridharan, Mike JMason, Amy Heidersbach, Joao Ramalho-Santos, Michael T McManus, KathrinPlath, Eran Meshorer, and Miguel Ramalho-Santos. Chd1 regulates open chro-matin and pluripotency of embryonic stem cells. Nature, 460(7257):863–868, Aug2009.

223

Page 239: STEMNESS REVISITED: A META ANALYSIS OF STEM CELL ... · UNIVERSITY OF CALIFORNIA SANTA CRUZ STEMNESS REVISITED: A META ANALYSIS OF STEM CELL SIGNATURES USING HIGH-THROUGHPUT DATA

[59] E E Jr Geisert and A Frankfurter. The neuronal response to injury as visualizedby immunostaining of class III beta-tubulin in the rat. Neurosci Lett, 102(2-3):137–141, Jul 1989.

[60] Filippo G Giancotti and Guido Tarone. Positional control of cell fate through jointintegrin/receptor protein kinase signaling. Annu Rev Cell Dev Biol, 19:173–206,2003.

[61] Marios Giannakis, Thaddeus S Stappenbeck, Jason C Mills, Douglas G Leip,Michael Lovett, Sandra W Clifton, Joseph E Ippolito, Jarret I Glasscock, Mani-mozhiyan Arumugam, Michael R Brent, and Je!rey I Gordon. Molecular prop-erties of adult mouse gastric and intestinal epithelial progenitors in their niches.J Biol Chem, 281(16):11292–11300, Apr 2006.

[62] Gennadi V Glinsky. Death-from-cancer signatures and stem cell contribution tometastatic cancer. Cell Cycle, 4(9):1171–1175, Sep 2005.

[63] Gennadi V Glinsky, Olga Berezovska, and Anna B Glinskii. Microarray analysisidentifies a death-from-cancer signature predicting therapy failure in patients withmultiple types of cancer. J Clin Invest, 115(6):1503–1521, Jun 2005.

[64] M A Goodell, K Brose, G Paradis, A S Conner, and R C Mulligan. Isolationand functional properties of murine hematopoietic stem cells that are replicatingin vivo. J Exp Med, 183(4):1797–1806, Apr 1996.

[65] M A Goodell, M Rosenzweig, H Kim, D F Marks, M DeMaria, G Paradis, S AGrupp, C A Sie!, R C Mulligan, and R P Johnson. Dye e"ux studies suggest thathematopoietic stem cells expressing low or undetectable levels of CD34 antigenexist in multiple species. Nat Med, 3(12):1337–1345, Dec 1997.

[66] Robert Grutzmann, Hinnerk Boriss, Ole Ammerpohl, Jutta Luttges, HolgerKaltho!, Hans Konrad Schackert, Gunter Kloppel, Hans Detlev Saeger, and Chris-tian Pilarsky. Meta-analysis of microarray data on pancreatic cancer defines aset of commonly dysregulated genes. Oncogene, 24(32):5079–5088, Jul 2005.

[67] Ajay S Gulati, Scott A Ochsner, and Susan J Henning. Molecular properties ofside population-sorted cells from mouse small intestine. Am J Physiol GastrointestLiver Physiol, 294(1):G286–94, Jan 2008.

[68] Lorenz Haegele, Barbara Ingold, Heike Naumann, Ghazaleh Tabatabai, Birgit Le-dermann, and Sebastian Brandner. Wnt signalling inhibits neural di!erentiationof embryonic stem cells by controlling bone morphogenetic protein expression.Mol Cell Neurosci, 24(3):696–708, Nov 2003.

[69] Xi C He, Jiwang Zhang, Wei-Gang Tong, Ossama Tawfik, Jason Ross, David HScoville, Qiang Tian, Xin Zeng, Xi He, Leanne M Wiedemann, Yuji Mishina,and Linheng Li. BMP signaling inhibits intestinal stem cell self-renewal through

224

Page 240: STEMNESS REVISITED: A META ANALYSIS OF STEM CELL ... · UNIVERSITY OF CALIFORNIA SANTA CRUZ STEMNESS REVISITED: A META ANALYSIS OF STEM CELL SIGNATURES USING HIGH-THROUGHPUT DATA

suppression of Wnt-beta-catenin signaling. Nat Genet, 36(10):1117–1121, Oct2004.

[70] Heidi Hemmoranta, Sampsa Hautaniemi, Jari Niemi, Daniel Nicorici, JarmoLaine, Olli Yli-Harja, Jukka Partanen, and Taina Jaatinen. Transcriptional pro-filing reflects shared and unique characters for CD34+ and CD133+ cells. StemCells Dev, 15(6):839–851, Dec 2006.

[71] G H Heppner and B E Miller. Tumor heterogeneity: biological implications andtherapeutic consequences. Cancer Metastasis Rev, 2(1):5–23, 1983.

[72] M L Hermiston and J I Gordon. Organization of the crypt-villus axis and evolu-tion of its stem cell hierarchy during intestinal development. Am J Physiol, 268(5Pt 1):G813–22, May 1995.

[73] Claire E Hirst, Elizabeth S Ng, Lisa Azzola, Anne K Voss, Tim Thomas,Edouard G Stanley, and Andrew G Elefanty. Transcriptional profiling of mouseand human ES cells identifies SLAIN1, a novel stem cell gene. Dev Biol, 293(1):90–103, May 2006.

[74] Tse-Shun Huang, Jui-Yu Hsieh, Yu-Hsuan Wu, Chih-Hung Jen, Yang-HweiTsuang, Shih-Hwa Chiou, Jukka Partanen, Heidi Anderson, Taina Jaatinen, Yau-Hua Yu, and Hsei-Wei Wang. Functional network reconstruction reveals somaticstemness genetic maps and dedi!erentiation-like transcriptome reprogramminginduced by GATA2. Stem Cells, 26(5):1186–1201, May 2008.

[75] A L Hughes. Evolution of the integrin alpha and beta protein families. J MolEvol, 52(1):63–72, Jan 2001.

[76] Catia Igreja, Rita Fragoso, Francisco Caiado, Nuno Clode, Alexandra Henriques,Lauren Camargo, Eduardo M Reis, and Sergio Dias. Detailed molecular character-ization of cord blood-derived endothelial progenitors. Exp Hematol, 36(2):193–203,Feb 2008.

[77] Gareth J Inman, Francisco J Nicolas, James F Callahan, John D Harling,Laramie M Gaster, Alastair D Reith, Nicholas J Laping, and Caroline S Hill.SB-431542 is a potent and specific inhibitor of transforming growth factor-betasuperfamily type I activin receptor-like kinase (ALK) receptors ALK4, ALK5, andALK7. Mol Pharmacol, 62(1):65–74, Jul 2002.

[78] M Ishibashi, S L Ang, K Shiota, S Nakanishi, R Kageyama, and F Guillemot.Targeted disruption of mammalian hairy and Enhancer of split homolog-1 (HES-1)leads to up-regulation of neural helix-loop-helix factors, premature neurogenesis,and severe neural tube defects. Genes Dev, 9(24):3136–3148, Dec 1995.

[79] Natalia B Ivanova, John T Dimos, Christoph Schaniel, Jason A Hackney, Ka-teri A Moore, and Ihor R Lemischka. A stem cell molecular signature. Science,298(5593):601–604, Oct 2002.

225

Page 241: STEMNESS REVISITED: A META ANALYSIS OF STEM CELL ... · UNIVERSITY OF CALIFORNIA SANTA CRUZ STEMNESS REVISITED: A META ANALYSIS OF STEM CELL SIGNATURES USING HIGH-THROUGHPUT DATA

[80] Taina Jaatinen, Heidi Hemmoranta, Sampsa Hautaniemi, Jari Niemi, DanielNicorici, Jarmo Laine, Olli Yli-Harja, and Jukka Partanen. Global gene expres-sion profile of human cord blood-derived CD133+ cells. Stem Cells, 24(3):631–641,Mar 2006.

[81] Erica L Jackson, Jose Manuel Garcia-Verdugo, Sara Gil-Perotin, Monica Roy, Al-fredo Quinones-Hinojosa, Scott VandenBerg, and Arturo Alvarez-Buylla. PDGFRalpha-positive B cells are neural stem cells in the adult SVZ that form glioma-likegrowths in response to increased PDGF signaling. Neuron, 51(2):187–199, Jul2006.

[82] Daylon James, Ariel J Levine, Daniel Besser, and Ali Hemmati-Brivanlou. TGF-beta/activin/nodal signaling is necessary for the maintenance of pluripotency inhuman embryonic stem cells. Development, 132(6):1273–1282, Mar 2005.

[83] Catriona H M Jamieson, Laurie E Ailles, Scott J Dylla, Manja Muijtjens, CarolJones, James L Zehnder, Jason Gotlib, Kevin Li, Markus G Manz, Armand Keat-ing, Charles L Sawyers, and Irving L Weissman. Granulocyte-macrophage pro-genitors as candidate leukemic stem cells in blast-crisis CML. N Engl J Med,351(7):657–667, Aug 2004.

[84] S Jung, RH Park, S Kim, YJ Jeon, DS Ham, MY Jung, SS Kim, YD Lee, CH Park,and H Suh-Kim. Id proteins facilitate self renewal and proliferation of neural stemcells. Stem Cells Dev, Sep 2009.

[85] M Kanehisa and S Goto. KEGG: kyoto encyclopedia of genes and genomes.Nucleic Acids Res, 28(1):27–30, Jan 2000.

[86] Minoru Kanehisa. The KEGG database. Novartis Found Symp, 247:91–101,2002.

[87] Yibin Kang and Joan Massague. Epithelial-mesenchymal transitions: twist indevelopment and metastasis. Cell, 118(3):277–279, Aug 2004.

[88] Stanislav L Karsten, Lili C Kudo, Robert Jackson, Chiara Sabatti, Harley I Ko-rnblum, and Daniel H Geschwind. Global analysis of gene expression in neuralprogenitors reveals specific cell-cycle, signaling, and metabolic networks. Dev Biol,261(1):165–182, Sep 2003.

[89] Yuriko Katoh and Masaru Katoh. WNT antagonist, SFRP1, is Hedgehog signal-ing target. Int J Mol Med, 17(1):171–175, Jan 2006.

[90] Mark J Kiel, Toshihide Iwashita, Omer H Yilmaz, and Sean J Morrison. Spatialdi!erences in hematopoiesis but not in stem cells indicate a lack of regional pat-terning in definitive hematopoietic stem cells. Dev Biol, 283(1):29–39, Jul 2005.

226

Page 242: STEMNESS REVISITED: A META ANALYSIS OF STEM CELL ... · UNIVERSITY OF CALIFORNIA SANTA CRUZ STEMNESS REVISITED: A META ANALYSIS OF STEM CELL SIGNATURES USING HIGH-THROUGHPUT DATA

[91] Mark J Kiel, Omer H Yilmaz, Toshihide Iwashita, Osman H Yilmaz, Cox Terhorst,and Sean J Morrison. SLAM family receptors distinguish hematopoietic stem andprogenitor cells and reveal endothelial niches for stem cells. Cell, 121(7):1109–1121, Jul 2005.

[92] Mina Kim and Cindi M Morshead. Distinct populations of forebrain neural stemand progenitor cells can be isolated using side-population analysis. J Neurosci,23(33):10703–10709, Nov 2003.

[93] Yeong C Kim, Qingfa Wu, Jun Chen, Zhenyu Xuan, Yong-Chul Jung, Michael QZhang, Janet D Rowley, and San Ming Wang. The transcriptome of hu-man CD34+ hematopoietic stem-progenitor cells. Proc Natl Acad Sci U S A,106(20):8278–8283, May 2009.

[94] Yuval Kluger, David P Tuck, Joseph T Chang, Yasuhiro Nakayama, RanjanaPoddar, Naohiko Kohya, Zheng Lian, Abdelhakim Ben Nasr, H Ruth Halaban,Diane S Krause, Xueqing Zhang, Peter E Newburger, and Sherman M Weiss-man. Lineage specificity of gene expression patterns. Proc Natl Acad Sci U S A,101(17):6508–6513, Apr 2004.

[95] Salih S Kocer, Petar M Djuric, Monica F Bugallo, Sanford R Simon, and MajaMatic. Transcriptional profiling of putative human epithelial stem cells. BMCGenomics, 9:359, 2008.

[96] Maria Kokkinaki, Tin-Lap Lee, Zuping He, Jiji Jiang, Nady Golestaneh, Marie-Claude Hofmann, Wai-Yee Chan, and Martin Dym. The molecular signature ofspermatogonial stem/progenitor cells in the 6-day-old mouse testis. Biol Reprod,80(4):707–717, Apr 2009.

[97] Martina Komor, Saskia Guller, Claudia D Baldus, Sven de Vos, Dieter Hoelzer,Oliver G Ottmann, and Wolf-K Hofmann. Transcriptional profiling of hu-man hematopoiesis during in vitro lineage-specific di!erentiation. Stem Cells,23(8):1154–1169, Sep 2005.

[98] M Kondo, I L Weissman, and K Akashi. Identification of clonogenic commonlymphoid progenitors in mouse bone marrow. Cell, 91(5):661–672, Nov 1997.

[99] V Korinek, N Barker, P Moerer, E van Donselaar, G Huls, P J Peters, andH Clevers. Depletion of epithelial stem-cell compartments in the small intestineof mice lacking Tcf-4. Nat Genet, 19(4):379–383, Aug 1998.

[100] Cynthia Kosinski, Vivian S W Li, Annie S Y Chan, Ji Zhang, Coral Ho, Wai YinTsui, Tsun Leung Chan, Randy C Mi"in, Don W Powell, Siu Tsan Yuen, Suet YiLeung, and Xin Chen. Gene expression patterns of human colon tops and basalcrypts and BMP antagonists as intestinal stem cell niche factors. Proc Natl AcadSci U S A, 104(39):15418–15423, Sep 2007.

227

Page 243: STEMNESS REVISITED: A META ANALYSIS OF STEM CELL ... · UNIVERSITY OF CALIFORNIA SANTA CRUZ STEMNESS REVISITED: A META ANALYSIS OF STEM CELL SIGNATURES USING HIGH-THROUGHPUT DATA

[101] Andrei V Krivtsov, David Twomey, Zhaohui Feng, Matthew C Stubbs, YingziWang, Joerg Faber, Jason E Levine, Jing Wang, William C Hahn, D GaryGilliland, Todd R Golub, and Scott A Armstrong. Transformation fromcommitted progenitor to leukaemia stem cell initiated by MLL-AF9. Nature,442(7104):818–822, Aug 2006.

[102] Paul M Krzyzanowski and Miguel A Andrade-Navarro. Identification of novelstem cell markers using gap analysis of gene expression data. Genome Biol,8(9):R193, 2007.

[103] Birgit Kulterer, Gerald Friedl, Anita Jandrositz, Fatima Sanchez-Cabo, AndreasProkesch, Christine Paar, Marcel Scheideler, Reinhard Windhager, Karl-HeinzPreisegger, and Zlatko Trajanoski. Gene expression profiling of human mes-enchymal stem cells derived from bone marrow during expansion and osteoblastdi!erentiation. BMC Genomics, 8:70, 2007.

[104] Homin K Lee, Amy K Hsu, Jon Sajdak, Jie Qin, and Paul Pavlidis. Coexpres-sion analysis of human genes across many microarray data sets. Genome Res,14(6):1085–1094, Jun 2004.

[105] Jonathan M Lee, Shoukat Dedhar, Raghu Kalluri, and Erik W Thompson. Theepithelial-mesenchymal transition: new insights in signaling, development, anddisease. J Cell Biol, 172(7):973–981, Mar 2006.

[106] Julie Lessard and Guy Sauvageau. Bmi-1 determines the proliferative capacityof normal and leukaemic stem cells. Nature, 423(6937):255–260, May 2003.

[107] Linheng Li and Ting Xie. Stem cell niche: structure and function. Annu RevCell Dev Biol, 21:605–631, 2005.

[108] Neethan A Lobo, Yohei Shimono, Dalong Qian, and Michael F Clarke. Thebiology of cancer stem cells. Annu Rev Cell Dev Biol, 23:675–699, 2007.

[109] Luca Magnani and Ryan A Cabot. Manipulation of SMARCA2 and SMARCA4transcript levels in porcine embryos di!erentially alters development and expres-sion of SMARCA1, SOX2, NANOG, and EIF1. Reproduction, 137(1):23–33, Jan2009.

[110] Sendurai A Mani, Wenjun Guo, Mai-Jing Liao, Elinor Ng Eaton, AyyakkannuAyyanan, Alicia Y Zhou, Mary Brooks, Ferenc Reinhard, Cheng Cheng Zhang,Michail Shipitsin, Lauren L Campbell, Kornelia Polyak, Cathrin Brisken, JingYang, and Robert A Weinberg. The epithelial-mesenchymal transition generatescells with properties of stem cells. Cell, 133(4):704–715, May 2008.

[111] B J Merrill, U Gat, R DasGupta, and E Fuchs. Tcf3 and Lef1 regulate lineagedi!erentiation of multipotent stem cells in skin. Genes Dev, 15(13):1688–1705,Jul 2001.

228

Page 244: STEMNESS REVISITED: A META ANALYSIS OF STEM CELL ... · UNIVERSITY OF CALIFORNIA SANTA CRUZ STEMNESS REVISITED: A META ANALYSIS OF STEM CELL SIGNATURES USING HIGH-THROUGHPUT DATA

[112] Jason C Mills, Niklas Andersson, Chieu V Hong, Thaddeus S Stappenbeck, andJe!rey I Gordon. Molecular characterization of mouse gastric epithelial progenitorcells. Proc Natl Acad Sci U S A, 99(23):14819–14824, Nov 2002.

[113] Anna V Molofsky, Ricardo Pardal, Toshihide Iwashita, In-Kyung Park, Michael FClarke, and Sean J Morrison. Bmi-1 dependence distinguishes neural stem cellself-renewal from progenitor proliferation. Nature, 425(6961):962–967, Oct 2003.

[114] Rebecca J Morris, Yaping Liu, Lee Marles, Zaixin Yang, Carol Trempus, ShulanLi, Jamie S Lin, Janet A Sawicki, and George Cotsarelis. Capturing and profilingadult hair follicle stem cells. Nat Biotechnol, 22(4):411–417, Apr 2004.

[115] S J Morrison, N M Shah, and D J Anderson. Regulatory mechanisms in stemcell biology. Cell, 88(3):287–298, Feb 1997.

[116] S J Morrison and I L Weissman. The long-term repopulating subset of hematopoi-etic stem cells is deterministic and isolatable by phenotype. Immunity, 1(8):661–673, Nov 1994.

[117] Franz-Josef Muller, Louise C Laurent, Dennis Kostka, Igor Ulitsky, Roy Williams,Christina Lu, In-Hyun Park, Mahendra S Rao, Ron Shamir, Philip H Schwartz,Nils O Schmidt, and Jeanne F Loring. Regulatory networks define phenotypicclasses of human stem cell lines. Nature, 455(7211):401–405, Sep 2008.

[118] Y Nakamura, S i Sakakibara, T Miyata, M Ogawa, T Shimazaki, S Weiss,R Kageyama, and H Okano. The bHLH gene hes1 as a repressor of the neu-ronal commitment of CNS stem cells. J Neurosci, 20(1):283–293, Jan 2000.

[119] Don X Nguyen, Anne C Chiang, Xiang H-F Zhang, Juliet Y Kim, Mark GKris, Marc Ladanyi, William L Gerald, and Joan Massague. WNT/TCF signal-ing through LEF1 and HOXB9 mediates lung adenocarcinoma metastasis. Cell,138(1):51–62, Jul 2009.

[120] Satoshi Nishimura, Naoki Wakabayashi, Kazuyuki Toyoda, Kei Kashima, andShoji Mitsufuji. Expression of Musashi-1 in human normal colon crypt cells: apossible stem cell marker of human colon epithelium. Dig Dis Sci, 48(8):1523–1529, Aug 2003.

[121] Jon M Oatley, Mary R Avarbock, Aino I Telaranta, Douglas T Fearon, and Ralph L Brinster. Identifying genes important for spermatogonial stem cell self-renewal and survival. Proc Natl Acad Sci U S A, 103(25):9524–9529, Jun 2006.

[122] Scott A Ochsner, Helene Strick-Marchand, Qiong Qiu, Susan Venable, Adam Dean, Margaret Wilde, Mary C Weiss, and Gretchen J Darlington. Transcriptional profiling of bipotential embryonic liver cells to identify liver progenitor cell surface markers. Stem Cells, 25(10):2476–2487, Oct 2007.

[123] Kazuya Ogawa, Akira Saito, Hisanori Matsui, Hiroshi Suzuki, Satoshi Ohtsuka, Daisuke Shimosato, Yasuyuki Morishita, Tetsuro Watabe, Hitoshi Niwa, and Kohei Miyazono. Activin-Nodal signaling is involved in propagation of mouse embryonic stem cells. J Cell Sci, 120(Pt 1):55–65, Jan 2007.

[124] T Ohtsuka, M Ishibashi, G Gradwohl, S Nakanishi, F Guillemot, and R Kageyama. Hes1 and Hes5 as notch effectors in mammalian neuronal differentiation. EMBO J, 18(8):2196–2207, Apr 1999.

[125] J Okabe-Kado, T Kasukabe, and Y Honma. Differentiation inhibitory factor Nm23 as a prognostic factor for acute myeloid leukemia. Leuk Lymphoma, 32(1-2):19–28, Dec 1998.

[126] Kyle E Orwig, Buom-Yong Ryu, Stephen R Master, Bart T Phillips, Matthias Mack, Mary R Avarbock, Lewis Chodosh, and Ralph L Brinster. Genes involved in post-transcriptional regulation are overrepresented in stem/progenitor spermatogonia of cryptorchid mouse testes. Stem Cells, 26(4):927–938, Apr 2008.

[127] H Oshima, A Rochat, C Kedzia, K Kobayashi, and Y Barrandon. Morphogenesis and renewal of hair follicles from adult multipotent stem cells. Cell, 104(2):233–245, Jan 2001.

[128] V E Papaioannou, M W McBurney, R L Gardner, and M J Evans. Fate of teratocarcinoma cells injected into early mouse embryos. Nature, 258(5530):70–73, Nov 1975.

[129] Ricardo Pardal, Michael F Clarke, and Sean J Morrison. Applying the principles of stem-cell biology to cancer. Nat Rev Cancer, 3(12):895–902, Dec 2003.

[130] In-Hyun Park, Rui Zhao, Jason A West, Akiko Yabuuchi, Hongguang Huo, Tan A Ince, Paul H Lerou, M William Lensch, and George Q Daley. Reprogramming of human somatic cells to pluripotency with defined factors. Nature, 451(7175):141–146, Jan 2008.

[131] In-kyung Park, Dalong Qian, Mark Kiel, Michael W Becker, Michael Pihalja, Irving L Weissman, Sean J Morrison, and Michael F Clarke. Bmi-1 is required for maintenance of adult self-renewing haematopoietic stem cells. Nature, 423(6937):302–305, May 2003.

[132] Helen Parkinson, Misha Kapushesky, Nikolay Kolesnikov, Gabriella Rustici, Mohammad Shojatalab, Niran Abeygunawardena, Hugo Berube, Miroslaw Dylag, Ibrahim Emam, Anna Farne, Ele Holloway, Margus Lukk, James Malone, Roby Mani, Ekaterina Pilicheva, Tim F Rayner, Faisal Rezwan, Anjan Sharma, Eleanor Williams, Xiangqun Zheng Bradley, Tomasz Adamusiak, Marco Brandizi, Tony Burdett, Richard Coulson, Maria Krestyaninova, Pavel Kurnosov, Eamonn Maguire, Sudeshna Guha Neogi, Philippe Rocca-Serra, Susanna-Assunta Sansone, Nataliya Sklyar, Mengyao Zhao, Ugis Sarkans, and Alvis Brazma. ArrayExpress update–from an archive of functional genomics experiments to the atlas of gene expression. Nucleic Acids Res, 37(Database issue):D868–72, Jan 2009.

[133] Enrico Pedemonte, Federica Benvenuto, Simona Casazza, Gianluigi Mancardi, Jorge R Oksenberg, Antonio Uccelli, and Sergio E Baranzini. The molecular signature of therapeutic mesenchymal stem cells exposes the architecture of the hematopoietic stem cell niche synapse. BMC Genomics, 8:65, 2007.

[134] Suraj Peri, J Daniel Navarro, Ramars Amanchy, Troels Z Kristiansen, Chandra Kiran Jonnalagadda, Vineeth Surendranath, Vidya Niranjan, Babylakshmi Muthusamy, T K B Gandhi, Mads Gronborg, Nieves Ibarrola, Nandan Deshpande, K Shanker, H N Shivashankar, B P Rashmi, M A Ramya, Zhixing Zhao, K N Chandrika, N Padma, H C Harsha, A J Yatish, M P Kavitha, Minal Menezes, Dipanwita Roy Choudhury, Shubha Suresh, Neelanjana Ghosh, R Saravana, Sreenath Chandran, Subhalakshmi Krishna, Mary Joy, Sanjeev K Anand, V Madavan, Ansamma Joseph, Guang W Wong, William P Schiemann, Stefan N Constantinescu, Lily Huang, Roya Khosravi-Far, Hanno Steen, Muneesh Tewari, Saghi Ghaffari, Gerard C Blobe, Chi V Dang, Joe G N Garcia, Jonathan Pevsner, Ole N Jensen, Peter Roepstorff, Krishna S Deshpande, Arul M Chinnaiyan, Ada Hamosh, Aravinda Chakravarti, and Akhilesh Pandey. Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res, 13(10):2363–2371, Oct 2003.

[135] S Scott Perry, Ying Zhao, Lei Nie, Shawn W Cochrane, Zhong Huang, and Xiao-Hong Sun. Id1, but not Id3, directs long-term repopulating hematopoietic stem-cell maintenance. Blood, 110(7):2351–2360, Oct 2007.

[136] Audrey Player, Yonghong Wang, Bhaskar Bhattacharya, Mahendra Rao, Raj K Puri, and Ernest S Kawasaki. Comparisons between transcriptional regulation and RNA expression in human embryonic stem cell lines. Stem Cells Dev, 15(3):315–323, Jun 2006.

[137] Stephanie M Pontier and William J Muller. Integrins in mammary-stem-cell biology and breast-cancer progression–a role in cancer stem cells? J Cell Sci, 122(Pt 2):207–214, Jan 2009.

[138] E H Postel. Cleavage of DNA by human NM23-H2/nucleoside diphosphate kinase involves formation of a covalent protein-DNA complex. J Biol Chem, 274(32):22821–22829, Aug 1999.

[139] Earl Prinsloo, Mokgadi M Setati, Victoria M Longshaw, and Gregory L Blatch. Chaperoning stem cells: a role for heat shock proteins in the modulation of stem cell self-renewal and differentiation? Bioessays, 31(4):370–377, Apr 2009.

[140] Shahin Rafii, Scott Avecilla, Sergey Shmelkov, Koji Shido, Rafael Tejada, Malcolm A S Moore, Beate Heissig, and Koichi Hattori. Angiogenic factors reconstitute hematopoiesis by recruiting stem cells from bone marrow microenvironment. Ann N Y Acad Sci, 996:49–60, May 2003.

[141] Miguel Ramalho-Santos, Soonsang Yoon, Yumi Matsuzaki, Richard C Mulligan, and Douglas A Melton. “Stemness”: transcriptional profiling of embryonic and adult stem cells. Science, 298(5593):597–600, Oct 2002.

[142] Adaikalavan Ramasamy, Adrian Mondry, Chris C Holmes, and Douglas G Altman. Key issues in conducting a meta-analysis of gene expression microarray datasets. PLoS Med, 5(9):e184, Sep 2008.

[143] Noemi Reguart, Biao He, Miquel Taron, Liang You, David M Jablons, and Rafael Rosell. The role of Wnt signaling in cancer and stem cells. Future Oncol, 1(6):787–797, Dec 2005.

[144] Tannishtha Reya, Andrew W Duncan, Laurie Ailles, Jos Domen, David C Scherer, Karl Willert, Lindsay Hintz, Roel Nusse, and Irving L Weissman. A role for Wnt signalling in self-renewal of haematopoietic stem cells. Nature, 423(6938):409–414, May 2003.

[145] Christelle Rochon, Vincent Frouin, Sylvie Bortoli, Karine Giraud-Triboult, Valerie Duverger, Pierre Vaigot, Cyrile Petat, Pierre Fouchet, Bruno Lassalle, Olivier Alibert, Xavier Gidrol, and Genevieve Pietu. Comparison of gene expression pattern in SP cell populations from four tissues to define common “stemness functions”. Exp Cell Res, 312(11):2074–2082, Jul 2006.

[146] Jason R Rock, Mark W Onaitis, Emma L Rawlins, Yun Lu, Cheryl P Clark, Yan Xue, Scott H Randell, and Brigid L M Hogan. Basal cells as stem cells of the mouse trachea and human airway epithelium. Proc Natl Acad Sci U S A, 106(31):12771–12775, Aug 2009.

[147] Cecilia Roh, Qingfeng Tao, and Stephen Lyle. Dermal papilla-induced hair differentiation of adult epithelial stem cells from human skin. Physiol Genomics, 19(2):207–217, Oct 2004.

[148] Andreas Ruepp, Barbara Brauner, Irmtraud Dunger-Kaltenbach, Goar Frishman, Corinna Montrone, Michael Stransky, Brigitte Waegele, Thorsten Schmidt, Octave Noubibou Doudieu, Volker Stumpflen, and H Werner Mewes. CORUM: the comprehensive resource of mammalian protein complexes. Nucleic Acids Res, 36(Database issue):D646–50, Jan 2008.

[149] Mark L Sandberg, Susan E Sutton, Mathew T Pletcher, Tim Wiltshire, Lisa M Tarantino, John B Hogenesch, and Michael P Cooke. c-Myb and p300 regulate hematopoietic stem cell proliferation and differentiation. Dev Cell, 8(2):153–166, Feb 2005.

[150] Eugenio Sangiorgi and Mario R Capecchi. Bmi1 is expressed in vivo in intestinal stem cells. Nat Genet, 40(7):915–920, Jul 2008.

[151] Noboru Sato, Laurent Meijer, Leandros Skaltsounis, Paul Greengard, and Ali H Brivanlou. Maintenance of pluripotency in human and mouse embryonic stem cells through activation of Wnt signaling by a pharmacological GSK-3-specific inhibitor. Nat Med, 10(1):55–63, Jan 2004.

[152] Noboru Sato, Ignacio Munoz Sanjuan, Michael Heke, Makiko Uchida, Felix Naef, and Ali H Brivanlou. Molecular signature of human embryonic stem cells and its comparison with the mouse. Dev Biol, 260(2):404–413, Aug 2003.

[153] Wataru Satoh, Makoto Matsuyama, Hiromasa Takemura, Shinichi Aizawa, and Akihiko Shimono. Sfrp1, Sfrp2, and Sfrp5 regulate the Wnt/beta-catenin and the planar cell polarity pathways during early trunk formation in mouse. Genesis, 46(2):spcone, Dec 2008.

[154] Mark Shackleton, Francois Vaillant, Kaylene J Simpson, John Stingl, Gordon K Smyth, Marie-Liesse Asselin-Labat, Li Wu, Geoffrey J Lindeman, and Jane E Visvader. Generation of a functional mammary gland from a single stem cell. Nature, 439(7072):84–88, Jan 2006.

[155] Paul Shannon, Andrew Markiel, Owen Ozier, Nitin S Baliga, Jonathan T Wang, Daniel Ramage, Nada Amin, Benno Schwikowski, and Trey Ideker. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res, 13(11):2498–2504, Nov 2003.

[156] Alexei A Sharov, Yulan Piao, Ryo Matoba, Dawood B Dudekula, Yong Qian, Vincent VanBuren, Geppino Falco, Patrick R Martin, Carole A Stagg, Uwem C Bassey, Yuxia Wang, Mark G Carter, Toshio Hamatani, Kazuhiro Aiba, Hidenori Akutsu, Lioudmila Sharova, Tetsuya S Tanaka, Wendy L Kimber, Toshiyuki Yoshikawa, Saied A Jaradat, Serafino Pantano, Ramaiah Nagaraja, Kenneth R Boheler, Dennis Taub, Richard J Hodes, Dan L Longo, David Schlessinger, Jonathan Keller, Emily Klotz, Garnett Kelsoe, Akihiro Umezawa, Angelo L Vescovi, Janet Rossant, Tilo Kunath, Brigid L M Hogan, Anna Curci, Michele D’Urso, Janet Kelso, Winston Hide, and Minoru S H Ko. Transcriptome analysis of mouse stem cells and early embryos. PLoS Biol, 1(3):E74, Dec 2003.

[157] Lioudmila V Sharova, Alexei A Sharov, Yulan Piao, Nabeebi Shaik, Terry Sullivan, Colin L Stewart, Brigid L M Hogan, and Minoru S H Ko. Global gene expression profiling reveals similarities and differences among mouse pluripotent stem cells of different origins and strains. Dev Biol, 307(2):446–459, Jul 2007.

[158] Sheila K Singh, Ian D Clarke, Mizuhiko Terasaki, Victoria E Bonn, Cynthia Hawkins, Jeremy Squire, and Peter B Dirks. Identification of a cancer stem cell in human brain tumors. Cancer Res, 63(18):5821–5828, Sep 2003.

[159] Heli Skottman, Milla Mikkola, Karolina Lundin, Cia Olsson, Anne-Marie Stromberg, Timo Tuuri, Timo Otonkoski, Outi Hovatta, and Riitta Lahesmaa. Gene expression signatures of seven individual human embryonic stem cell lines. Stem Cells, 23(9):1343–1356, Oct 2005.

[160] Marcel Smid, Lambert C J Dorssers, and Guido Jenster. Venn Mapping: clustering of heterologous microarray data based on the number of co-occurring differentially expressed genes. Bioinformatics, 19(16):2065–2071, Nov 2003.

[161] A G Smith. Embryo-derived stem cells: of mice and men. Annu Rev Cell Dev Biol, 17:435–462, 2001.

[162] Lin Song, Nicole E Webb, Yingjie Song, and Rocky S Tuan. Identification and functional analysis of candidate genes regulating mesenchymal stem cell self-renewal and multipotency. Stem Cells, 24(7):1707–1718, Jul 2006.

[163] G J Spangrude, S Heimfeld, and I L Weissman. Purification and characterization of mouse hematopoietic stem cells. Science, 241(4861):58–62, Jul 1988.

[164] Jamie M Sperger, Xin Chen, Jonathan S Draper, Jessica E Antosiewicz, Chris H Chon, Sunita B Jones, James D Brooks, Peter W Andrews, Patrick O Brown, and James A Thomson. Gene expression patterns in human embryonic stem cells and human pluripotent germ cell tumors. Proc Natl Acad Sci U S A, 100(23):13350–13355, Nov 2003.

[165] Thaddeus S Stappenbeck, Jason C Mills, and Jeffrey I Gordon. Molecular features of adult mouse small intestinal epithelial progenitors. Proc Natl Acad Sci U S A, 100(3):1004–1009, Feb 2003.

[166] Chris Stark, Bobby-Joe Breitkreutz, Teresa Reguly, Lorrie Boucher, Ashton Breitkreutz, and Mike Tyers. BioGRID: a general repository for interaction datasets. Nucleic Acids Res, 34(Database issue):D535–9, Jan 2006.

[167] Joshua M Stuart, Eran Segal, Daphne Koller, and Stuart K Kim. A gene-coexpression network for global discovery of conserved genetic modules. Science, 302(5643):249–255, Oct 2003.

[168] Ross Summer, Darrell N Kotton, Xi Sun, Bei Ma, Kathleen Fitzsimmons, and Alan Fine. Side population cells and Bcrp1 expression in lung. Am J Physiol Lung Cell Mol Physiol, 285(1):L97–104, Jul 2003.

[169] Harukazu Suzuki, Alistair R R Forrest, Erik van Nimwegen, Carsten O Daub, Piotr J Balwierz, Katharine M Irvine, Timo Lassmann, Timothy Ravasi, Yuki Hasegawa, Michiel J L de Hoon, Shintaro Katayama, Kate Schroder, Piero Carninci, Yasuhiro Tomaru, Mutsumi Kanamori-Katayama, Atsutaka Kubosaki, Altuna Akalin, Yoshinari Ando, Erik Arner, Maki Asada, Hiroshi Asahara, Timothy Bailey, Vladimir B Bajic, Denis Bauer, Anthony G Beckhouse, Nicolas Bertin, Johan Bjorkegren, Frank Brombacher, Erika Bulger, Alistair M Chalk, Joe Chiba, Nicole Cloonan, Adam Dawe, Josee Dostie, Par G Engstrom, Magbubah Essack, Geoffrey J Faulkner, J Lynn Fink, David Fredman, Ko Fujimori, Masaaki Furuno, Takashi Gojobori, Julian Gough, Sean M Grimmond, Mika Gustafsson, Megumi Hashimoto, Takehiro Hashimoto, Mariko Hatakeyama, Susanne Heinzel, Winston Hide, Oliver Hofmann, Michael Hornquist, Lukasz Huminiecki, Kazuho Ikeo, Naoko Imamoto, Satoshi Inoue, Yusuke Inoue, Ryoko Ishihara, Takao Iwayanagi, Anders Jacobsen, Mandeep Kaur, Hideya Kawaji, Markus C Kerr, Ryuichiro Kimura, Syuhei Kimura, Yasumasa Kimura, Hiroaki Kitano, Hisashi Koga, Toshio Kojima, Shinji Kondo, Takeshi Konno, Anders Krogh, Adele Kruger, Ajit Kumar, Boris Lenhard, Andreas Lennartsson, Morten Lindow, Marina Lizio, Cameron Macpherson, Norihiro Maeda, Christopher A Maher, Monique Maqungo, Jessica Mar, Nicholas A Matigian, Hideo Matsuda, John S Mattick, Stuart Meier, Sei Miyamoto, Etsuko Miyamoto-Sato, Kazuhiko Nakabayashi, Yutaka Nakachi, Mika Nakano, Sanne Nygaard, Toshitsugu Okayama, Yasushi Okazaki, Haruka Okuda-Yabukami, Valerio Orlando, Jun Otomo, Mikhail Pachkov, Nikolai Petrovsky, Charles Plessy, John Quackenbush, Aleksandar Radovanovic, Michael Rehli, Rintaro Saito, Albin Sandelin, Sebastian Schmeier, Christian Schonbach, Ariel S Schwartz, Colin A Semple, Miho Sera, Jessica Severin, Katsuhiko Shirahige, Cas Simons, George St Laurent, Masanori Suzuki, Takahiro Suzuki, Matthew J Sweet, Ryan J Taft, Shizu Takeda, Yoichi Takenaka, Kai Tan, Martin S Taylor, Rohan D Teasdale, Jesper Tegner, Sarah Teichmann, Eivind Valen, Claes Wahlestedt, Kazunori Waki, Andrew Waterhouse, Christine A Wells, Ole Winther, Linda Wu, Kazumi Yamaguchi, Hiroshi Yanagawa, Jun Yasuda, Mihaela Zavolan, David A Hume, Takahiro Arakawa, Shiro Fukuda, Kengo Imamura, Chikatoshi Kai, Ai Kaiho, Tsugumi Kawashima, Chika Kawazu, Yayoi Kitazume, Miki Kojima, Hisashi Miura, Kayoko Murakami, Mitsuyoshi Murata, Noriko Ninomiya, Hiromi Nishiyori, Shohei Noma, Chihiro Ogawa, Takuma Sano, Christophe Simon, Michihira Tagami, Yukari Takahashi, Jun Kawai, and Yoshihide Hayashizaki. The transcriptional network that controls growth arrest and differentiation in a human myeloid leukemia cell line. Nat Genet, 41(5):553–562, May 2009.

[170] Kazutoshi Takahashi, Koji Tanabe, Mari Ohnuki, Megumi Narita, Tomoko Ichisaka, Kiichiro Tomoda, and Shinya Yamanaka. Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell, 131(5):861–872, Nov 2007.

[171] Kazutoshi Takahashi and Shinya Yamanaka. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell, 126(4):663–676, Aug 2006.

[172] Wai-Leong Tam, Chin Yan Lim, Jianyong Han, Jinqiu Zhang, Yen-Sin Ang, Huck-Hui Ng, Henry Yang, and Bing Lim. T-cell factor 3 regulates embryonic stem cell pluripotency and self-renewal by the transcriptional control of multiple lineage pathways. Stem Cells, 26(8):2019–2031, Aug 2008.

[173] Tetsuya S Tanaka, Tilo Kunath, Wendy L Kimber, Saied A Jaradat, Carole A Stagg, Masayuki Usuda, Takashi Yokota, Hitoshi Niwa, Janet Rossant, and Minoru S H Ko. Gene expression profiling of embryo-derived stem cells reveals candidate genes associated with pluripotency and lineage specificity. Genome Res, 12(12):1921–1928, Dec 2002.

[174] G Taylor, M S Lehrer, P J Jensen, T T Sun, and R M Lavker. Involvement of follicular stem cells in forming not only the follicle but also the epidermis. Cell, 102(4):451–461, Aug 2000.

[175] Alexey V Terskikh, Toshihiro Miyamoto, Cynthia Chang, Luda Diatchenko, and Irving L Weissman. Gene expression analysis of purified hematopoietic stem cells and committed progenitors. Blood, 102(1):94–101, Jul 2003.

[176] Jean Paul Thiery and Jonathan P Sleeman. Complex networks orchestrate epithelial-mesenchymal transitions. Nat Rev Mol Cell Biol, 7(2):131–142, Feb 2006.

[177] Amy Hin Yan Tong and Charles Boone. Synthetic genetic array analysis in Saccharomyces cerevisiae. Methods Mol Biol, 313:171–92, 2006.

[178] Amos Toren, Bella Bielorai, Jasmine Jacob-Hirsch, Tamar Fisher, Doron Kreiser, Orit Moran, Sharon Zeligson, David Givol, Assif Yitzhaky, Joseph Itskovitz-Eldor, Iris Kventsel, Esther Rosenthal, Ninette Amariglio, and Gideon Rechavi. CD133-positive hematopoietic stem cell “stemness” genes contain many genes mutated or abnormally expressed in leukemia. Stem Cells, 23(8):1142–1153, Sep 2005.

[179] Ming-Song Tsai, Shiaw-Min Hwang, Kuang-Den Chen, Yun-Shien Lee, Li-Wen Hsu, Yu-Jen Chang, Chao-Nin Wang, Hsiu-Huei Peng, Yao-Lung Chang, An-Shine Chao, Shuenn-Dyh Chang, Kuan-Der Lee, Tzu-Hao Wang, Hsin-Shih Wang, and Yung-Kuei Soong. Functional network analysis of the transcriptomes of mesenchymal stem cells derived from amniotic fluid, amniotic membrane, cord blood, and bone marrow. Stem Cells, 25(10):2511–2523, Oct 2007.

[180] Tudorita Tumbar, Geraldine Guasch, Valentina Greco, Cedric Blanpain, William E Lowry, Michael Rendl, and Elaine Fuchs. Defining the epithelial stem cell niche in skin. Science, 303(5656):359–363, Jan 2004.

[181] V G Tusher, R Tibshirani, and G Chu. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A, 98(9):5116–5121, Apr 2001.

[182] M Uittenbogaard and A Chiaramello. Expression of the bHLH transcription factor Tcf12 (ME1) gene is linked to the expansion of precursor cell populations during neurogenesis. Brain Res Gene Expr Patterns, 1(2):115–121, Jan 2002.

[183] K. Unsicker and K. Krieglstein. Cell signaling and growth factors in development. Wiley-VCH Weinheim.

[184] Ludovic Vallier, Morgan Alexander, and Roger A Pedersen. Activin/Nodal and FGF pathways cooperate to maintain pluripotency of human embryonic stem cells. J Cell Sci, 118(Pt 19):4495–4509, Oct 2005.

[185] Johan H van Es, Philippe Jay, Alex Gregorieff, Marielle E van Gijn, Suzanne Jonkheer, Pantelis Hatzis, Andrea Thiele, Maaike van den Born, Harry Begthel, Thomas Brabletz, Makoto M Taketo, and Hans Clevers. Wnt signalling induces maturation of Paneth cells in intestinal crypts. Nat Cell Biol, 7(4):381–386, Apr 2005.

[186] Jane E Visvader and Geoffrey J Lindeman. Cancer stem cells in solid tumours: accumulating evidence and unresolved questions. Nat Rev Cancer, 8(10):755–768, Oct 2008.

[187] Wolfgang Wagner, Alexandra Ansorge, Ute Wirkner, Volker Eckstein, Christian Schwager, Jonathon Blake, Katrin Miesala, Jan Selig, Rainer Saffrich, Wilhelm Ansorge, and Anthony D Ho. Molecular evidence for stem cell function of the slow-dividing fraction among human hematopoietic progenitor cells by genome-wide analysis. Blood, 104(3):675–686, Aug 2004.

[188] S Wang, A D Sdrulla, G diSibio, G Bush, D Nofziger, C Hicks, G Weinmaster, and B A Barres. Notch receptor activation inhibits oligodendrocyte differentiation. Neuron, 21(1):63–75, Jul 1998.

[189] Tetsuro Watabe and Kohei Miyazono. Roles of TGF-beta family signaling in stem cell renewal and differentiation. Cell Res, 19(1):103–115, Jan 2009.

[190] Robert A. Weinberg. The Biology of Cancer. Garland Science, 1 edition, June 2006.

[191] D E Weinstein, M L Shelanski, and R K Liem. Suppression by antisense mRNA demonstrates a requirement for the glial fibrillary acidic protein in the formation of stable astrocytic processes in response to neurons. J Cell Biol, 112(6):1205–1213, Mar 1991.

[192] I L Weissman, D J Anderson, and F Gage. Stem and progenitor cells: origins, phenotypes, lineage commitments, and transdifferentiations. Annu Rev Cell Dev Biol, 17:387–403, 2001.

[193] Bryan Welm, Fariba Behbod, Margaret A Goodell, and J M Rosen. Isolation and characterization of functional mammary gland stem cells. Cell Prolif, 36 Suppl 1:17–32, Oct 2003.

[194] David J Wong, Helen Liu, Todd W Ridky, David Cassarino, Eran Segal, and Howard Y Chang. Module map of stem cell genes guides creation of epithelial cancer stem cells. Cell Stem Cell, 2(4):333–344, Apr 2008.

[195] Lynda S Wright, Jiang Li, Maeve A Caldwell, Kyle Wallace, Jeffrey A Johnson, and Clive N Svendsen. Gene expression in human neural stem cells: effects of leukemia inhibitory factor. J Neurochem, 86(1):179–195, Jul 2003.

[196] Gerald G Wulf, Kang-Li Luo, KathyJo A Jackson, Malcolm K Brenner, and Margaret A Goodell. Cells of the hepatic side population contribute to liver regeneration and can be replenished with bone marrow stem cells. Haematologica, 88(4):368–378, Apr 2003.

[197] Ioannis Xenarios, Lukasz Salwinski, Xiaoqun Joyce Duan, Patrick Higney, Sul-Min Kim, and David Eisenberg. DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res, 30(1):303–305, Jan 2002.

[198] XQ Xu, SY Soo, W Sun, and R Zweigerdt. Global Expression Profile of Highly Enriched Cardiomyocytes Derived From Human Embryonic Stem Cells. Stem Cells, Aug 2009.

[199] Jing Yang, Sendurai A Mani, Joana Liu Donaher, Sridhar Ramaswamy, Raphael A Itzykson, Christophe Come, Pierre Savagner, Inna Gitelman, Andrea Richardson, and Robert A Weinberg. Twist, a master regulator of morphogenesis, plays an essential role in tumor metastasis. Cell, 117(7):927–939, Jun 2004.

[200] Xin-yu Zhang, Tian-tian Li, and Xiang-jun Liu. Detecting robust gene signature through integrated analysis of multiple types of high-throughput data in liver cancer. Acta Pharmacol Sin, 28(12):2005–2010, Dec 2007.

[201] A J Zhu and F M Watt. beta-catenin signalling modulates proliferative potential of human epidermal keratinocytes independently of intercellular adhesion. Development, 126(10):2285–2298, May 1999.
