microarray-based gene expression analysis in cancer research11426/fulltext01.pdf · 2008. 2....

1

Microarray-based Gene Expression Analysis

in Cancer Research

Cecilia Laurell

Doctoral Thesis Royal Institute of Technology

School of Biotechnology Stockholm 2006

2

© Cecilia Laurell School of Biotechnology Department of Gene Technology Royal Institute of Technology AlbaNova University Center SE-106 91 Stockholm Sweden Printed at Universityservice US-AB Box 700 14 SE-100 44 Sweden ISBN 91-7178-542-6 ISBN 978-91-7178-542-8

3

Cecilia Laurell (2006): Microarray-based Gene Expression Analysis in Cancer Research Department of Biotechnology, Royal Institute of Technology, KTH, Stockholm, Sweden ISBN 91-7178-542-6 ISBN 978-91-7178-542-8 Abstract Biotechnological inventions during the 20th century have resulted in a wide range of approaches for explorations in the functional genomics field. Microarray technology is one of the recent advances which have provided us with snapshots of which genes are expressed in cells of various tissues and diseases. Methods to obtain reliable microarray data are continuously being developed and improved to meet the demands of biological researchers. In this thesis microarrays have been used to investigate gene expression patterns in cancer research. Four studies in three different areas were carried out covering adrenocortical tumors, p53 target genes and a comparison of RNA amplification methods. Adrenocortical tumours are among the most common tumours with an incidence of 7-9%. Malignancy of these tumors is rare. Distinction between malignant and benign tumours is often difficult to establish which makes an improvement of diagnostic approaches important. To elucidate biological processes in adrenocortical tumour development and to examine if there is a molecular signature associated with malignancy, microarray analysis was performed on 29 adrenocortical tumors and four normal specimens. It was possible to classify malignant and benign samples based on the entire expression profile. A number of potential biomarkers was identified which will be further evaluated. P53 is a gene which is mutated in 50% of all cancers. Functional p53 is a transcription factor which is activated upon cellular stress and DNA damage. Target genes are mainly involved in cell cycle arrest and apoptosis. In solid tumors cells are stressed by hypoxia. To examine which target genes p53 activate under hypoxic conditions a microarray study of the cell lines HCT116p53+/+ and HCT116p53-/- was performed. A set of novel potential p53 target genes was identified while many known target genes were found to be not transcriptionally activated during hypoxia. Follow up which was focused on how p53 affected hypoxia induced apoptosis showed that the death receptor Fas was critical. When small amounts of tissue are available, amplification of the transcript population is necessary for microarray analysis. A new strategy for amplification based on PCR was evaluated and compared to a commercial in vitro transcription protocol. Both protocols produced reliable results. Advantages with the PCR based method are a lower cost and a high flexibility due to compatibility with both sense and antisense strand microarrays.

Keywords: adrenocortical tumour, apoptosis, cancer, classification, gene expression, microarray, p53, RNA amplification © Cecilia Laurell

5

In Nature’s book of secrecy A little I can read

William Shakespeare

7

List of publications

I. Laurell C, Wirta V, Nilsson P, Lundeberg J: Comparative analysis of a 3' end tag PCR and a linear RNA amplification approach for microarray analysis. Journal of Biotechnology 2006 . [Epub Sep 22]

II. Liu T*, Laurell C*, Selivanova G, Lundeberg J, Nilsson P, Wiman KG: Hypoxia induces p53-dependent transactivation and Fas/CD95-dependent apoptosis. Cell Death and Differentiation 2006. [Epub Aug 18]

III. Velazquez-Fernandez D*, Laurell C*, Geli J, Hoog A, Odeberg J, Kjellman M, Lundeberg J, Hamberger B, Nilsson P, Backdahl M: Expression profiling of adrenocortical neoplasms suggests a molecular signature of malignancy. Surgery 2005, 138(6):1087-1094.

IV. Laurell C*, Velázquez-Fernández D*, Enberg U, Juhlin C, Geli J, Höög A, Odeberg J, Kjellman M, Lundeberg J, Larsson C, Hamberger B, Bäckdahl M and Nilsson P.: Transcriptional profiling enables molecular classification of adrenocortical tumours. Manuscript

* These authors contributed equally to the work and should therefore be considered as joint first authors.

8

Contents

Introduction 11

1 Microarray Technology 15

1.1 Microarray Platforms 15

1.1.1 Spotted Microarrays 16

1.1.2 In Situ Synthesized Arrays 18

1.1.3 Bead-based Systems 19

1.1.4 Conversions Between Microarray Platforms 20

1.2 Experimental Strategies 21

1.2.1 RNA quality assessment 21

1.2.2 RNA amplification 21

1.2.3 Target labelling 23

1.2.4 Hybridization 24

1.2.5 Scanning 24

1.3 Data analysis 25

1.3.1 Image Analysis 25

1.3.2 Background Correction 25

1.3.3 Quality Assessment and Filters 26

1.3.4 Normalization 27

1.3.5 Experimental Design 30

1.3.6 Differentially expressed genes 32

1.3.7 Clustering and Classification 34

1.3.8 Extraction of biological information 36

1.3.9 Software for Data Analysis 37

1.3.10 Public Data Bases 38

1.4 Quality Control 38

2 Microarray-based Gene Expression Analysis in Cancer Research 40

2.1 Biomarker Discovery 41

2.2 Tumour Classification 42

2.3 Therapeutic Development 44

3 Present Investigation 46

3.1 Evaluation of Two Strategies for RNA Amplification 46

9

3.1.1. 3' cDNA Tag Amplification 46

3.1.2 Paper I 48

3.2 Detection of p53 Dependent Genes Induced by Hypoxia 49

3.2.1 P53 - Cancer Gene 49

3.2.2 Regulation of p53 by Hypoxia 50

3.2.3 Paper II 51

3.3 Characterization of a Malignant Expression Signature of Adrenocortical

Tumours 53

3.3.2 Adrenal Gland 53

3.3.2 Adrenocortical Tumours 53

3.3.3 Paper III & IV 55

4 Concluding Remarks 57

Abbreviations 59

Acknowledgements 59

References 63

11

Introduction Biotechnological inventions during the last half of the previous century, including DNA

sequencing for reading of the genetic code, adoption of restriction and ligation enzymes to

cut and paste DNA into desired sequences, and amplification of molecules with

polymerase chain reaction (PCR), have lead to a revolution in our knowledge of biological

sciences (Mullis and Faloona, 1987; Sanger, 1977). A milestone in this respect of scientific

progress was the longed-for release of the human genetic code in 2001 which provided a

valuable resource for exploration of genetic diseases (Lander, 2001; Venter, 2001).

The central dogma of molecular biology is the flow of information from the heritage

material DNA, via messenger carrier molecules RNA, to the final protein products (figure

1) (Crick, 1958). These three families of molecules are the genome, the transcriptome and

the proteome. The information overlap between these was from the beginning assumed to

be total, but now we know that protein encoding genes only constitute a fraction of the

human genome and that most RNA species do not encode proteins. The fact that RNA

both can hold information and catalyze chemical reactions has lead to the idea that a RNA

world could exist.

Figure 1. A schematic view of information flow in a cell; from DNA, via messenger carrier RNA to proteins, the final products.

DNA RNA Protein

Gene ExpressionAnalysis

12

Examination of the transcriptome has long time been focused on the protein encoding

sequences, thereby leaving a substantial portion in the shadow. The recent years CAGE,

tiling microarrays and sequencing technology have shed light on a growing population of

non-coding RNAs (Gustincich, 2006). As it has been realized that the non-coding regions

are not merely junk sequences but in many cases have important regulatory roles this is

now a rapidly growing research field, to find out more about which these new members of

the transcriptome are and what they do. A brief description of members of the

mammalian transcriptome is given in box 1.

Gene expression analysis often reflects the steady-state levels of mRNA which is the result

of a delicate balance between production and decay. Half-lives of unstable transcripts are

≈ 15 min while > 24 hours for the most stable ones. Generally mRNAs encoding proteins

with important regulatory roles are continuously being transcribed and degraded allowing

Box 1. The mammalian transcriptome

mRNA The average mRNA transcript is 1.5 kb consisting of a protein encoding part in the middle flanked by untranslated regions for regulation of decay and translation. The 3’ end has a 250 b polyA tail and the 5’ end a guanosine cap structure which both are attributes required for translation. miRNA Micro RNAs are 21-23 b fully processed and binds to mRNAs promoting decay or inhibition of translation by recruiting the RISC complex. They are generated from double-stranded foldback precursors which are cleaved to their mature form by the Dicer (Bartel, 2004a). This is the endogenous process which also is exploited by exogenous interference RNA (Fire, 1998). rRNA Ribosomal RNA constitutes 90% of the RNA and have six different members. These are encoded in multiple copies throughout the genome. Together with RNA binding proteins these form a two unit 3D structure which constitutes a framework for translation of proteins. piRNA Piwi RNAs are 25-31 b and are expressed in developing mail germ cells forming a complex with the Piwi and RacQ1 proteins exhibiting DNA helicase activity and is believed to have a role in transcriptional silencing (Girard, 2006; Lau, 2006). snRNA Small nuclear RNAs regulates transcription and play an active role in splicing by providing recognition site for the intron-exon boundaries (O'Gorman, 2006; Valadkhan, 2005). snoRNA Small nucleolar RNAs are 60- 300 b and involved in RNA editing promoting methylation or pseudouridylation. They are derived from the introns of pre-mRNA transcripts. About 300 different snoRNAs have been discovered (Mattick and Makunin, 2005). tRNA Transfer RNAs are 73-93 b which form three stemloops one of which has the anticodon for an amino acid. The amino acid is coupled to the 3’ end of the tRNA. tRNA supplies the ribosomes with amino acids during translation.

13

for a transient increase when the decay is switched off. Regulation of the stability is

carried out by RNA binding proteins (RBPs) and miRNA binding to AU-rich elements

(AREs) in the 3’UTR but also to secondary structures through out the entire transcript.

There are two main pathways for degradation of fully processed mRNA; 3’-5’ decay

mediated by cytosolic exosomes and 5’-3’ decay by the Xrn1 enzyme. The first and rate

limiting step in both pathways is deadenylation of the polyA tail. This is followed by

decapping of the 5’ end for 5’-3’ decay. The exosome pathway has been shown to be the

dominant one in mammals (Beelman and Parker, 1995; Meyer, 2004).

Initiation of transcription is regulated by remodelling of the chromatine structure

followed by the binding of transcription factors to regions commonly 5’ of the

transcription start site recruiting the RNA polymerase complex. To some extent genes are

organized into clusters which can share epigenetic control but far less pronounced then

the operon structure found in prokaryotes which allows for a complete sharing of

regulation (Caron, 2001). Another feature which adds to the complexity of eukaryotic

gene expression is alternative splicing (Matlin, 2005). In average each mRNA has three

splice variants but there is a large variation and new transcripts are continuously being

discovered (Modrek and Lee, 2002). The oncogene MDM2, as an example, has more than

40 splice variants with different functions and regoluatory units (Bartel, 2004b).

Gene expression analysis aims at reflecting the mRNA levels but to some extent also

mirror the proteome. In this case a quite distorted picture is provided according to the

large-scale comparisons between protein and mRNA levels (Griffin, 2002; Gygi, 1999).

Some genes are regulated exclusively on protein level while others differ largely in

magnitude of the expression. The progress of technologies within the proteomics field can

however soon offer us a more detailed map.

The theme of this thesis is applications of microarray technology to global gene expression

analysis in cancer research. In the following sections microarray technology is described,

followed by its contributions to cancer research in general, and finally a summary of the

four studies which are the basis of this thesis.

15

1 Microarray Technology

1.1 Microarray Platforms

Microarray technology enables analysis of relative abundances of typically 30,000-50,000

different nucleic acid molecules simultaneously. The basic concept of the DNA microarray

is to take advantage of that DNA or RNA molecules with complimentary sequences bind

to each other, hybridize. The DNA molecules to be measured are immobilized onto a solid

support in an organized pattern. Analysis can be performed by hybridising DNA or RNA

in the sample to complimentary DNA on the array and then measure the amount. The

immobilized DNA is called probe and the molecules binding to it, target. To enable

measurements, the target is labelled with molecules making it possible to detect with

fluorescence or radioactive radiation, how much has bound. Depending on probe lengths,

concentrations and melting temperatures, different specificities and sensitivities can be

obtained.

Originally, quantification of a specific DNA molecule in a complex mix by hybridization,

was introduced already in 1975 by Southern, but it was not until the 90’s high-throughput

platforms were launched (Southern, 1975). During the past ten years microarrays have

had a revolutionary impact on biomedical research. The parallelization, miniaturization,

and automation opportunities offered by chip technology have resulted in that methods

examining one gene at a time in many cases have been exchanged by microarrays as the

standard tool, generating vast amounts of information with few experiments. The DNA

microarray has so far mainly been used for large-scale gene expression analysis, but plays a

16

central role also in other fields, such as genotyping, epigenetics and promoter analysis

(Gresham, 2006; Kaller, 2005; Matsuzaki, 2004; Wilson, 2006). To obtain reliable data,

techniques for microarray production, sample preparation and data handling are

continuously being developed and improved.

1.1.1 Spotted Microarrays

Spotted microarrays refer to chips where pre-synthesized probes are applied to the array

surface by robotic printing. The spot size is typically 100-200 µm. There are mainly three

types of probes for gene expression analysis; full length cDNAs, shorter PCR amplified

gene specific tags, and single stranded oligonucleotides.

cDNA microarrays: The efforts made to discover new genes by sequencing of cDNA

clones in the 1990’s, generated a large amount of cDNA libraries. Sequencing data of these

libraries for different organisms and tissues provided a first glance into specific

transcriptomes. Expressed sequence tags (ESTs) contribute to 69% of human gene

sequences in the GenBank database (Benson, 2006). By amplifying normalized cDNA

libraries with PCR, probes suitable for global microarray gene expression analysis were

obtained. In 1995 construction of the first microarray was published consisting of 1000

Arabidopsis thaliana genes (Schena, 1995). The first whole genome array was made for

Sacharomyces cerviseae and was used in the landmark cell cycle study of Spellman and

colleagues (DeRisi, 1997; Schena, 1995; Spellman, 1998).

GST microarrays: A drawback with cDNA arrays is cross-hybridization between gene

families and genes with common functional domains. To lower the amount of

homologous sequences in the probe set one could selectively avoid these regions by using

gene specific primers for gene amplification of gene specific tags (Wirta, 2005). For

prokaryotes which lack polyA tail this is an alternative to using shotgun libraries. The

procedure requires sequence information of every gene and is costly because of the

amount of primers needed to be synthesized. Successful efforts to reduce the primer cost

have been made by making efficient primer design, using unique primer pairs for each

17

gene but allowing each primer to be involved in several pairs. Homology between species

enables sharing also between different organisms (Andersson, 2005; Fernandes and

Skiena, 2002).

Oligo microarrays: A less labour intensive strategy to obtain specific probes is to

synthesize an oligonucleotide for each probe (Call, 2001; Kane, 2000; Zhao, 2001).

Commercial 40-70mer oligonucleotide sets are available from several companies. The

shorter hybridising sequence introduces however an increased susceptibility to

polymorphisms.

Besides the commercially available oligo probe sets, there have been numerous

developments of methods for optimized probe design which allows for probes adapted to

individual requirements (Emrich, 2003; Mrowka, 2002; Nielsen, 2003; Wang, 2002;

Wernersson and Nielsen, 2005). The choice of oligo length is not evident and depends on

the application. Shorter sequences are cheaper to synthesize and are sometimes necessary

to achieve specificity for certain transcripts, while longer ones can ameliorate signal

intensities as well as specificity remarkably (Ramdas, 2004). Important features in

construction of spotted microarrays besides probe design are spot concentration and

morphology, as well as the background intensity. These are affected by the printing buffer

used, humidity and temperature, robotic delivery and surface chemistry (Wrobel, 2003).

18

1.1.2 In Situ Synthesized Arrays

Affymetrix: The most widespread in situ synthesis technique, commercialized by

Affymetrix in the mid 90’s, is based on photolithography technology from the

semiconductor industry to direct DNA synthesis on glass slides producing high density

oligonucleotide microarrays (GeneChip) (Fodor, 1991; Fodor, 1993; Lockhart, 1996). The

procedure is based on synthetic linkers with photoprotected groups attached to the array

in a narrow pattern. The probe sequences are elongated stepwise by cyclic addition of

dNTPs. To obtain different sequences on distinct positions, probes are alternating being

protected and deprotected by light using different photolithographic masks for every step.

Probes are designed to give optimal hybridization conditions considering parameters such

as melting temperature and sequence. Unspecific and repetitive regions are avoided. Since

the technique only allows for synthesis of maximum 25 nucleotides, development of

methods to compensate for this shortcoming has been forced. To increase signal to noise

ratio as many as 11-20 different probes are synthesized for each mRNA. The redundancy

increases the dynamic range, lower cross-hybridization and reduces the number of false

positives. The probes are designed to match sequences through out the whole transcript

but with emphasis on the 3’ end. An additional method to filter out false positives is

offered by also synthesising an almost identical set of replicate probes. All probes which

perfectly matched sequences (PM) have a partner probe which only has one mismatch

(MM). These MM probes allow discrimination between real signals and signals due to

cross reactivity and other artefacts.

NimbleGen: In 2002 NimbleGen Systems launched a new method for high density in situ

oligosynthesis; the Maskless Array Synthesizer (MAS) (Nuwaysir, 2002). The principle is

to create virtual photolithographic masks by using a digital light processor, directing the

light to specific alternating probe positions with around 800,000 individually addressable

mirrors. Advantages with this new technique were the reduced cost, increased flexibility,

longer maximum probe length (90 nt), and the possibility to synthesize the

oligonucleotide in both 3’ to 5’ and 5’ to 3’ indirection. The last point enables elongation

with DNA polymerases after target hybridization to the probe.

19

1.1.3 Bead-based Systems

Although the methods described above have shown good results with robust and

standardized laboratory management, efforts are being made to develop alternative

strategies with increased sensitivity and specificity.

Illumina: A bead based array system introduced by Michael and colleagues in 1998 have

been used as basis for further developments by Illumina which now has a platform which

allows for analysis of 46.000 transcripts with more than 6.5 million features using a

randomly ordered array assembly followed by a decoding procedure (Gunderson, 2004;

Kuhn, 2004; Michael, 1998). Each feature consists of a bead, 3 µm in diameter, coated with

several hundreds of thousands copies of an oligonucleotide containing a 50mer probe

sequence and a shorter address sequence used for feature identification.

454: The 454 Life Sciences platform was originally developed to speed up whole genome

sequencing and is based on sequencing-by-synthesis with pyrosequencing technology

(Margulies, 2005; Ronaghi, 1996). Sequencing based gene expression analysis in general

has the advantage that novel transcripts can be discovered in addition to providing a high

specificity. Briefly, the methodology includes linker ligation and immobilization of

fragments onto microbeads (ideally one fragment per bead), followed by emulsion-based

PCR and subsequent transfer to a fiberoptic picotiter plate where the pyrosequencing

reaction takes place. ATP sulfurylase and luciferase are immobilized onto beads and added

once, while the other reagents are effciently cycled by a liquid handling system, allowing

for a maximal read length of 100 b. In 4 hours, 300,000 DNA templates can be sequenced

with an accuracy of 99.96%. Several promising methods for transcriptome analysis have

emerged based on 454 technology; DeepSAGE which is a hyper efficient method for

generation of ditag SAGE libraries, 5’ RATE which sequences 5´ends thus allowing for

mapping of transcription start sites, ultra-high-throughput EST sequencing and also

examinations of promotor binding sites and functions of the small non-coding RNAs

(Bainbridge, 2006; Gowda, 2006; Ng, 2006; Nielsen, 2006)

20

1.1.4 Conversions Between Microarray Platforms

The last few years numerous platform comparisons have been published (Barnes, 2005;

Kuo, 2002; Mah, 2004; Wang, 2005; Zhu, 2005). Surprisingly these showed that the

various platforms do not give concordant results in many cases.

One explanation is the ambiguity of comparing sequences from different parts of the

transcript. To be able to combine data from different platforms, it is of high importance

that the probe annotations are in agreement. In the beginning GeneChip users had the

annotation provided by Affymetrix to rely on since probe sequences were not made

public. In 2003 the sequences were released together with an annotation software

allowing customers to do more thorough inspections of their expression data (Liu, 2003).

Recent studies have indicated that as much as 20% of the Affymetrix probe sequences lack

a match in RefSeq db and almost 40% were wrongly annotated (Harbig, 2005; Mecham,

2004a). The accelerated curation of sequences in the databases improves however the

robustness of probe design. An advantage with cDNA arrays has so far been that the

sequences are based on actual mRNA transcripts. A drawback has been that up to 30 % of

the probes in some cases have been misidentified (Watson, 1998). Resequencing of the

printing plates followed by reannotation by blasting eliminates this problem. The most

reliable strategy to do conversions between probe sets is by sequence-matching (not by

annotation) (Mecham, 2004b; Wang, 2002). Other reasons to differences in results are

inconsistencies in experimental procedures and data analysis (Shi, 2005b). The largest

platform comparison so far with standardized protocols is the recent study by the

MicroArray Quality Control (MAQC) project. Illumina Human-6 Expression BeadChips,

48k v1.0, Agilent Whole Human Genome Oligo Microarrays, Affymetrix HG-U133 Plus

2.0 Arrays reached in this evaluation similar results regarding accuracy and sensitivity

(Canales, 2006; Shi, 2006).

21

1.2 Experimental Strategies

1.2.1 RNA Quality Assessment

For analysis of small samples and to waste minimal amounts, microfluidic capillary

systems are widely used, which provides the user with concentration, contamination and

size information. In addition to employing the ratio 28S/18S as quality measure an

artificial neural network (ANN) can be used which is trained to assess the RNA integrity

based on the entire size distribution (Schroeder, 2006). A slight degradation, with still

visible ribosomal bands, does not affect the results remarkably for most genes (Schoor,

2003). The Microarray Quality Control Consortium uses a RNA Inetgrity Number (RIN) >

8.0, 28S/18S ratio > 0.9 as quality citeria.

1.2.2 RNA Amplification

To enable microarray analysis of small samples and low abundant transcripts, methods for

amplification of the mRNA population have been presented. In general, around 10-20 µg

of total RNA is needed for each hybridizations, which corresponds to 1-2 million cells.

Tissues of such large sizes are difficult to obtain and are often non-homogenous,

containing a variety of cell types with different stages of disease which can confound the

biological effect of interest. The possibility to decrease the amount of RNA needed in the

cDNA synthesis may enhance specificity in the analysis since a more homogenous tissue

sample can be selected. Small tissue samples can be collected both from biopsies with

laser microdissection and in vivo using fine needle aspiration. RNA amplification is also

required when analyzing cell types difficult to culture. There are two main strategies for

RNA amplification; linear amplification based on in vitro transcription and exponential

amplification based on PCR.

Linear amplification: In 1990 Eberwine and colleagues introduced the technique of

amplifying mRNA by in vitro transcription maintaining relative transcript abundances for

low-throughput gene expression analysis in single neurones (Van Gelder, 1990).

22

Numerous variations of this technique have been applied and evaluated for microarray

analysis and are today a standard procedure for RNA amplification in many labs (Baugh,

2001; Dafforn, 2004; Dumur, 2004; Eberwine, 1996; Freeman, 1999; Lindberg, 2006; Luzzi,

2003; Pabon, 2001; Scherer, 2003; Stoyanova, 2004; Wadenback, 2005)

The use of oligonucleotide arrays has increased need for protocols producing sense strand

target. Generation of sense or anti sense in vitro transcribed RNA depends on which

primer in the cDNA synthesis is tagged with a transcriptase promoter sequence. The

original protocol by Eberwine described above has a T7 promoter on the oligodT primer.

Since the transcriptase generates RNA with the same sequence as the strand containing

the promoter, the RNA produced will be antisense (aRNA). One strategy of solving this is

by introducing labelled nucleotides during in vitro transcription and use the aRNA as

target (t Hoen, 2003). One can also add a promoter sequence of another transcriptase to

the primer for the second strand synthesis, and then either sense or antisense RNA can be

transcribed depending on which viral enzyme is used (Brazma, 2001). The three most

common transcriptases employed are T7, T3 and SP6, named after which bacteriophages

they are cloned from. Another option is to produce unmodified cDNA and then attach the

dye with a platinum complex which binds to guanine. A fourth alternative is to produce

an unlabelled cDNA strand from the aRNA and then use that as template for second

strand synthesis with random primers and labelled nucleotides using Klenow fragment for

elongation (Schlingemann, 2005).

Exponential amplification: Generally PCR based protocols provide higher amplification

yield in shorter time than in vitro transcription and have also the flexibility that either

strand can be used as target. The first PCR based cDNA amplification methods were

developed for production of full length cDNA probes. The most common approach for this

is template-switching PCR (Chenchik, 1996; Herrler, 2000). In order to enable PCR on a

full length cDNA, linker sequences for primers are needed in both ends. In the 3’ end, the

polyA tail is used, while in the 5’ end, Cs introduced by the reverse transcriptase from

Moloney murine leukaemia virus, MMLV-RT, are employed. For the probe application,

amplification yield and product length were the most important criteria for the quality of

23

the procedure. Less attention was paid to whether relative transcript abundances are

preserved and reproducibility of the method, which are the key features for a target

amplification protocol. Evaluations of the method for this application have however

shown results comparable to other strategies (Puskas, 2002; Saghizadeh, 2003). Another

possibility is to add a polyA tail with terminal transferase in the end of the cDNA enabling

PCR with a single primer containing a oligodT sequence (Iscove, 2002). Since long

transcripts are unfavoured in the PCR, strategies for producing short cDNAs are applied.

This can be accomplished by decreasing incubation time during reverse transcription and

using limiting amounts of nucleotides or by fragmentation through sonication (Hertzberg,

2001; Iscove, 2002; Sievertzon, 2004).

In an experimental set up, amplification and labelling strategies are certainly factors that

need to be considered if several methods are used. Since hybridization intensities are

altered to a greater extent than ratios, samples that are to be compared need to be handled

with the same protocol or in an appropriate experimental design. It is sometimes a good

idea to use amplification even though the material is sufficient for microarray expression

analysis of the vast majority of genes.

1.2.3 Target Labelling

Labelling efficiency of the target molecules is a main issue for transfer of the transcript

abundances to detectable reliable signals. In the standardized method for GeneChip

technology, biotin labelled cRNA is in vitro transcribed from double stranded cDNA.

Prior to hybridization, the cRNA is fragmented to reduce cross hybridization. After the

hybridization, streptavidin labelled with phycoerythrin is applied in two stains enabling

detection, with a biotinlylated anti-streptavidin antibody staining in between.

For dual channel microarrays the most common fluorophores are cyanines 3 and 5 and

Alexa Fluors. These are either introduced by direct or indirect labelling corresponding to

using labelled nucleotides in the reverse transcription or using aminoallyl labelled

nucleotides in which case the fluorophores are attached after cDNA synthesis by a

24

diestrification reaction. The latter method is more time consuming but produces higher

amounts of cDNA and causes less dye effects.

Signal enhancement may be an alternative when there are limiting amounts of RNA

available, the main two strategies being tyramide signal amplification and dendrimer

technology. These have proven to give reproducible reliable signals with up to 100–fold

amplification (Karsten, 2002; Manduchi, 2002).

1.2.4 Hybridization

Hybridization conditions are optimized to produce as specific and high signals as possible

compared to the background noise level. Longer incubation time increases specificity

pushing the reaction towards equilibrium. Salt increases hybridization while denaturing

agents such as formamide decrease it lowering secondary structures among target

molecules and allowing lower hybridization temperature which reduces evaporation. To

avoid cross hybridization unlabelled repetitive and common sequences are added to the

buffer.

1.2.5 Scanning

Measurements of how much target has hybridized to each spot are obtained using a high

resolution confocal scanner. The fluorophores are excited with laser light corresponding

to their optimal excitation wavelength. A photomultipier tube (PMT) is used for

detection. Adjustment of signal intensities can be achieved by changing the PMT setting

and/or the laser power to retrieve as high signals as possible without reaching the

saturation detection limit. To widen the range several scans at different settings can be

combined. To improve accuracy, the lowest signals could be neglected or, alternatively, be

subject to a calibration scheme (Bengtsson, 2004; Lyng, 2004; Shi, 2005a).

25

1.3 Data analysis

A main issue in microarray studies is how to retrieve valuable information from the

enormous amount of generated data. Since the field is so new, methods are being

developed continuously to meet the demands of biological researchers. The main

processes in the data analysis are extraction of spot signals, filtering, normalization,

assessment of differential expression, clustering and classification, and placing the study

into context of other sources of information, such as biological data bases, clinical features

and other microarray experiments. There are several reviews which summarize these

procedures (Brazma, 2001; Quackenbush, 2006).

1.3.1 Image Analysis

The 16-bit tiff image files obtained from the scanner after completed microarray

hybridizations contain a matrix of values ranging from 1 to 216 -1 (65,535) describing the

signal for each pixel of the slide. To identify which pixels belong to features, image

segmentation techniques are required. The first step is to apply a grid to the entire slide

determining the position of each feature. Thereafter the shape and size of each spot is

calculated. Depending on conformity and circularity of the spot, either adaptive circles or

seed growing is used (Adams and Bischof, 1994). For in situ synthesized arrays this step is

unnecessary, since features in this case always take the same shape. The intensities for

each feature are estimated with the mean or median of pixel values.

1.3.2 Background Correction

The slide background is often used for adjustment of spot foreground intensities and

quality acquisition. Subtracting the background to improve quality of feature values is

however not entirely accepted since the slide surface surrounding the spots might have

different properties than the areas with applied DNA and printing buffer. This difference

can to some extent be examined by applying control spots of DNA lacking complimentary

sequences in the target. The GenePix software from Axon Instruments estimates the

26

background with pixels in a fixed area around the spot (Axon, 2005). The median of these

is usually used to reduce influence of dust particles. Another method is the morphological

opening implemented in the Spot software (Buckley, 2003). Morphological opening is a

way of obtaining a smoothed image of the background without any foreground intensities.

Trends over the image is estimated and then removed by moving a sliding window twice

over the entire image. The first time the pixel in the centre of the window is replaced by

the minimum value of all pixels in the window. The next time each pixel is replaced by

the maximum value. The window is chosen to be larger than the feature diameter so that

spots will disappear in the transformed image. In the Affymetrix software MAS 5.0, the

local background estimation is obtained by dividing the slide surface into cells where the

median of the background is first calculated. This value is then adjusted to the other cells

weighted according to how distant they are (Affymetrix, 2001).

1.3.3 Quality Assessment and Filters

The image analysis software provides the user with a file containing different

measurements which can be used for quality assessment, including mean and median of

foreground and background intensities, standard deviation, spot diameter and pixel

regression ratio. Diagnostic plots revealing systematic trends and artefacts are useful for

determining the quality by manual inspection. Spatial trends over the slide surface are

visualized by plotting the median background intensities, either one channel at a time or

the log2 ratios of both.

Box-plots of the log2 ratios for entire slides or printing-pin blocks within a slide give

information of if the log2 ratio distribution is subject to any bias. A ratio-intensity plot,

displaying how the log2 ratios vary with increasing intensity is usually a way to get an

overall picture of the hybridization results. For single channel arrays a ratio-intensity plot

is created by using the average of all slides instead of a second channel. A spread in the

leftmost part of the plot reflects the uncertainty of low intensity measurements and bias

introduced by the background correction, while artefacts in the high intensity part might

27

be due to saturation. By applying a set of filters, spots with very weak signals, high

background or uneven signals are removed from further analysis not to contaminate

intensity values of high quality spots.

The Affymetrix system with 11-20 probe pairs per feature, each probe pair composed by

one with perfect matching sequence (PM) and one containing a mismatch (MM), can use

the replicates for quality control on several levels. Probes are designed throughout the

entire transcript but with emphasis on the 3’ end. An overall measure of the degradation

of the RNA sample can be assessed by calculating the ratio between 5’ and 3’ probes. In

MAS 5.0 adjustment for cross-hybridization is obtained by subtracting the intensity for

the MM probe from the PM probe. Some studies indicate however that it is better to use

just the PM intensity (Naef, 2003). It has also been observed that in many cases MM

intensity is higher than PM (Irizarry, 2003). Secondly, to get a single value for each

transcript a robust average of all the PM-MM differences is calculated using Tukey

biweights. Since the variation for a probe across slides has been observed to be lower than

the variation within a probe set on a single slide a multiplicative model accounting for

probe affinity has been proposed (Li and Wong, 2001). Based on this a robust linear model

accounting the infinity effect using a log scale has been proposed by Irizarry. Special

measures can be taken in the analysis of amplified target (Cope, 2006).

1.3.4 Normalization

To be able to retrieve reliable biological results comparing data from several experiments

with different systematic bias, numerous normalization strategies for microarray data have

been suggested.

Quantile normalization is a method mainly used for single channel arrays, but is also

applied to dual channel data when several distributions are present. The goal is to

transform data from different slides (or channels) so that equal distributions are obtained.

This is performed by ranking the feature intensities and than calculating the average of all

slides for data being on the same rank position. The value for each spot is then replaced

with the calculated mean corresponding to the rank on each slide. This means that exactly

28

the same values will be present in every slide, but assigned to different features. This

transformation might seem unrealistic and harsh to the data but has nevertheless proven

to perform superior to other methods for Affymetrix GeneChip compared to other

methods (Bolstad, 2003).

Locally weighted scatter-plot smoothing, or lowess normalization is the most widely used

method for dual channel data. It is calculated by fitting a polynomial to the scatter plot

using data in local intensity windows (Cleveland, 1979). Examining the ratio-intensity

plot, a curvature shape of the feature values is some times observed, revealing an intensity

dependence for the ratio distribution. To compensate for this Yang proposed the use of

lowess for microarrays which also has been investigated by many others further on

(Berger, 2004; Yang, 2001).

Spatial ratio effects are often influenced by the background intensities, and thus

dependent on which methods for background correction have been applied. If no

background subtraction is used, spatial normalization can compensate this. Uneven

hybridization, washing or bleaching may all give contributions to spatial ratio gradients.

One possibility is to normalize within each block on the array, which each contains

features printed with the same printing pin. This strategy is sometimes referred to as

normalization for bias introduced by different print-tips, although these seldom infer any

bias. More detailed spatial normalization can be carried out based on the background

intensities or on the spatial ratio gradients. As for intensity dependent ratio bias, spatial

bias can be corrected for with lowess, fitting a locally weighted polynomial over the slide

surface. Normalization between slides can be required also for dual-channel arrays and are

then performed with the quantile normalization described above using ratios instead of

intensities.

Although these methods are the most widespread, and have shown good results in

comparison with others, each data set is unique and has its own tendencies which must be

adjusted for. One such situation is when the expression is unevenly altered for a high

proportion of genes, for instance when global transcription rates are lowered or if we have

a chip with few features. In this case one can use specific control spots printed in different

29

concentrations on the chip, containing sequences for either exogenous RNA spiked into

the target, or housekeeping genes which are supposed to be present to constant levels in

all samples. For prokaryotes, genomic DNA can be used as a more robust alternative to

housekeeping genes, enabled by their of lack introns.

30

1.3.5 Experimental Design

The lack of long experience of microarray technology makes it not straight forward to

foresee all sources of variation which need to be accounted for to make accurately

randomized block experiments with enough replicates to measure the inferences of

interest. The common involvement of several research groups at different levels in the

experiment also complicates the overview of experimental factors.

Having identified all biological questions of interest in the experimental setting and

collected samples in a fashion that these could be answered and not confounded, the large

number of technical sources of variation needs to be considered. Several models for

estimations of biases have been suggested to adjust for dye-, array-, spatial- and spot

effects, as described above. Systematic bias may be removed with normalization methods

if only the experimental setup is adequate.

Design of microarray experiments is dependent on the maximum number of

hybridizations possible with each sample, the biological and technical variation, and the

number of arrays which can be afforded. To reduce the number one might consider

pooling of biological RNA samples (Glass, 2005; Peng, 2003; Vasselli, 2003). The risk with

mixing several samples on the same array is however that one might not be able to draw

any conclusions about the entire population, nor single individuals, since the range of

possible expression is so wide. A highly expressed transcript in the pool can be due to over

expression in a single sample.

An important factor in choice of design is the need of replication for correct estimation of

the variation. Two main types of replicates are considered, biological replicates, referring

to patients and cell cultures etc, and technical replicates, performed to estimate variation

introduced during laboratory work. Biological replicates are chosen to be representative

for the population investigated, while the technical replicates often are just repetitions of

the experiment to average out noise and enable balanced designs. The most common types

of technical replication are spot and hybridization replicates. The former can be

accomplished by printing the same material in different positions on the array to get

31

spatial variation and possibly at different concentrations to span several intensities, while

the latter corresponds to repeating the procedures from reverse transcription of the target

and further on.

Comparisons using dual channel arrays can be either direct or indirect, according to figure

2. Direct designs are suitable for comparisons of only two samples, or when the data are

paired, where there is no interest in examining variation between samples belonging to

different pairs. Indirect comparisons are performed via a reference which is hybridized on

the arrays together with one sample at a time. The reference can be either of biological

interest, such as time point zero in a time course experiment, or just a tool for comparison.

For this purpose commercial RNA is available (Novoradovskaya, 2004). Reference designs

are inefficient statistically, not taking full advantage of the possibility to join two samples

on the same microarray. A statistically more powerful strategy is the interwoven

loopdesign. An optimal design aiming at minimizing the sum of all variances is obtained

by finding the minimum of (XTX)-1, X being the design matrix (Wit and McClure, 2004).

Unequal incorporation of the different dyes causes a dye bias which needs to be

considered in the design. To account for the dye effect a in a direct comparison a dye swap

experiment is required, with samples labelled in the opposite manner. The dye bias is

circumvented using a reference design since all samples to be compared can be labelled

with the same dye. Using the interwoven loop design, the dye effect need to be

considered, not necessarily by doing a dye swap for each hybridization, but by controlling

that the design is dye balanced, meaning that each sample is labelled equally many times

with each dye.

32

Figure 2. Examples of hybridization schemes for dual channel microarrays; direct comparison with one dye swap, reference design and dye balanced interwoven loop design. A hybridization of two samples on the same slide is illustrated by an arrow between them.

Kerr and colleagues proposed an ANOVA model to consider all effects introducing

variance in microarray data (Kerr and Churchill, 2001). This eliminates the need of

additional normalization methods. The calculations need to be recalculated for every test

and are computationally demanding which has had a negative effect on its use.

In spite of the statistical inefficiency, reference design has so far been the most popular

choice in large studies. The flexibility of being able to expand sample size gradually is an

attractive ability.

1.3.6 Differentially expressed genes

A core objective in microarray data analysis is to identify genes whose transcript levels

have been altered between different conditions. Usually this is presented as a ranking list

of genes according to a statistical significance score. In the early days of microarray

technology articles were published using fold change as the single parameter to make a

decision if a gene had changed or not (Carninci, 2005; Rihn, 2000; Sgroi, 1999).

Considering the large number of genes and the small number of replicates the number of

false positives will be considerable. Especially features with high variation can obtain a

high fold change value. A few of the large number of non-differentially expressed genes

will get deviating intensities in some experiments due to noise and thus falsely be assigned

as differentially expressed.

1 2

R

31 2

12

3

4

5

67

8

9

10

11

121 2

R

31 2

12

3

4

5

67

8

9

10

11

12

33

Using an ordinary t-test, the variation for each gene will be taken into consideration, so

that genes with varying intensities within a condition will not be selected. This

improvement is however not enough since there is a risk that some genes will have a

small standard deviation by chance. A very small standard deviation will result in a high

t-value even though the gene is not differentially expressed. The wide spread use of

microarrays in concert with lack of foolproof techniques for data analysis has laid the

ground to a whole new scientific area of developing new statistical strategies for

microarray data. More than hundred articles have been published each presenting a

method especially designed for assigning differential expression of microarray data. A

major problem is to obtain an estimate of the true variation when there are so few

replicates for each measurement. One proposed strategy is to take advantage of that noise

tends to decrease with intensity and use a sliding window for calculation of a variation

score relative to spots in the same window (Yang, 2002).

Today modified t-tests have become the most widely used approaches. The main purpose

of these is to decrease statistical significance of genes with small standard deviation which

otherwise would become false positives. The modification normally consists of a small

value added to the standard deviation when the t-statistic is calculated. Because of its

statistical power and user friendly implementations, the most popular of these methods is

significance analysis of microarrays (SAM) (Tusher, 2001). The value added to the

standard deviation is chosen to minimize variation of the t-value. An alternative approach

which has shown to be equally good is empirical Bayes statistics (Lonnstedt and Speed,

2002; Smyth, 2004). This approach uses a prior estimate of how many genes are

differentially expressed and a parameter deciding the importance of this prior distribution

will contribute to the posterior probability. A penalized t-statistics can be formed also in

this case.

How long ranking list of differentially expressed genes will be selected may be based on

variation in the dataset and which follow up studies will be performed on the data, i.e.

weighing cost of false positives against the gain of making a revolutionary discovery.

34

The large number of hypothesis tests due to the high number of genes in each array

requires adjustment for multiple testing. To adjust for multiple testing and to control the

error rate either the family wise error rate (FWER) or the false discovery rate (FDR) can

be calculated. FWER is the probability of finding at least one false positive among the

selected genes, while the FDR is the fraction of false positives. FWER is often considered

as to conservative for microarray experiments, therefore FDR is the method most widely

used.

1.3.7 Clustering and Classification

Clustering is performed to divide the massive amounts of gene expression data into groups

based on similarity. This can be accomplished with two different strategies used for two

different purposes; unsupervised clustering for exploratory analysis, and supervised

clustering, which can be used to create a diagnostic device based on gene expression

signatures.

Unsupervised clustering: Unsupervised clustering is a way of obtaining a more

comprehensible representation of the data set. With reduction techniques, such as

principal component analysis (PCA), singular value decomposition (SVD) and

multidimensional scaling, it is possible to visualize the data in two or three dimensional

space so that the distance relationships can be explored by visual inspection.

For several of these methods missing values are not tolerated. To solve this, one could

either remove features with missing values from the data set or alternatively replace them

with an estimate. If technical replicates are available the averages of these can be used,

otherwise there are several strategies which have been proposed for microarray data

including k-nearest-neighbours, single value decomposition and local least squares (Alter,

2000; Kim, 2005; Troyanskaya, 2001).

Clustering can be performed either on genes or samples, or on both, showing relationships

between genes and samples which might be of biological relevance. Among the most

common methods for exploratory grouping, or clustering, are hierarchical clustering, self-

35

organising maps (SOM) (Kohonen, 2000; Toronen, 1999) and K-means clustering

(Hartigan and Wong, 1979).

Hierarchical clustering is an agglomerative method which produces a dendogram with a

bottom-up structure. To start with all data points are treated as separate clusters. The

dendogram is then formed by subsequently assigning the two closest clusters together to

form new clusters. Distance between clusters can be calculated by using the minimum,

maximum or the average distance between samples in two different clusters.

For K-means and SOM, the number of clusters to be formed is predetermined by the user.

In K-means clustering, the samples are first randomly assigned to one of the k clusters.

The centre of each cluster is obtained by calculating the mean or median of all

incorporated expression profiles. The samples are then reassigned to the closest cluster.

New cluster locations are calculated from the new cluster members, followed by another

reassigning procedure. This procedure is repeated until there are no more changes.

The SOM algorithm is similar to K-means assigning samples to the closest cluster in an

iterative process. The difference is that the samples in this case, are mapped onto a two

dimensional grid. To determine optimal numbers of clusters the ratio between cluster

diameter and distance between clusters can be used.

Supervised classification: Supervised classification on the other hand is performed to

create a tool which can be used for discrimination of new data. Development of the

classifier is based on prior information of how for instance tumour types are connected to

appearance of the gene expression profiles. Supervised clustering techniques thus needs a

data set with information of the sample labels, for instance if a tumour is malignant or not.

The aim is to create a classifier which can classify new unlabelled samples and genes based

on the microarray data. These can be divided into machine learning algorithms such as

support vector machines (SVM), ANN and k-nearest neighbours (KNN), and statistical

linear discriminate analysis.

SVM was introduced by Vapnik in 1995 and applied to microarray data in 2000 (Brown,

2000; Furey, 2000). The algorithm calculates a separating hyperplane with maximized

margins to the data points. SVM has been much used in this field compared to the other

36

methods mainly due to its capability of providing good generality despite the small sample

sizes typical for microarray experiments (Furey, 2000).

A weakness for methods based on a molecular signature of thousands of features rather

than a small set is that they are difficult to interpret in a biological context and not

applicable to lower throughput techniques. Combining these however with feature

selection methods makes them more useful and also reduces risk of over-fitting. Filtering

and pruning are examples of strategies for feature selection.

Filtering is based on the input data by filtering features based on a specific criterium.

Selection can be performed based on a threshold or a number of top genes on a ranking or

by evaluating the performance. In the first microarray classification study published,

Golub selected a subset of genes from a Signal-to-Noise (S2N) ranking list (Golub, 1999b).

Genes with large difference in mean value between groups and small within group

standard deviation are selected. A disadvantage with this approach is that many of the

genes will be very much correlated containing redundant information.

Pruning on the other hand changes the classifier to reduce its complexity and avoid

overfitting. When multilayer networks are used this is a complicated task. A trade off

between network performance and complexity is made by minimizing the sum of all

weights and/or the number of weights. In the SVM case Guyon and colleagues have

shown that ranking weight vector elements according to magnitude, eliminating the

smallest weight elements one at a time with consecutive evaluation, outperforms filtering

techniques (Guyon, 2002). In contrast to filtering techniques the genes selected by this

method do not contain as much redundant information.

1.3.8 Extraction of Biological Information

To find biological reasons why genes appear as differentially expressed or in the same

subclusters one tries to find biological features which are overrepresented in these groups.

Examples of biological annotations which may be used for this purpose are Gene Ontology

terms, Swissprot key words and KEGG pathways (Ashburner, 2000; Kanehisa, 2004).

Statistical significance of the enrichment analysis can be calculated with Fischer’s exact

37

test with adjustment for multiple testing (Al-Shahrour, 2004; Castillo-Davis and Hartl,

2003; Dennis, 2003; Zeeberg, 2003). Gene set enrichment analysis (GSEA) is a recently

developed method in which a Kolomogorov-Smirnov statistic is used to evaluate if a

ranking list is enriched for a certain gene set at the extremes (top or bottom)

(Subramanian, 2005). FDR is used to assign significance and is obtained by comparing to

the null distribution which is obtained by permuting samples labels. A similar strategy is

the significance analysis of function and expression (SAFE) (Barry, 2005). There are

several software which enables comprehensible visualization of gene expression levels in

the different pathways such as Pathway Assist, GenMAPP and MAPPFinder (Dahlquist,

2002; Doniger, 2003; Nikitin, 2003). One reason to genes being concordantly expressed is

regulation by the same transcription factors. In eukaryotes combinations of transcription

factors form cis regulatory modules for a gene to be expressed. The sets of sequence

elements in the promoter region which are subject to transcription factor binding are

often conserved between species. The co-occurence of transcription factors in conserved

promotor regions can be evalutated with software such as CRÈME, JASPAR and cisRED

(Robertson, 2006; Sharan, 2004; Vlieghe, 2006).

1.3.9 Software for Data Analysis

The rapid development of methods for data handling puts high requirements on the

software development. Besides a large amount of bioinfomatic tools designed to make

specific operations there are several commercial (e.g. Genespring, ArrayAssist and

Kensington Discovery Environment) and open-source software (Mev 4, Bioconductor)

providing complete solutions (Gentleman, 2004; Yang, 2002). The R language for

statistical computing has been increasingly popular much due to that a large number of

statistical tools are available open-source and that many new methods for microarray data

are written in R.

38

1.3.10 Public Data Bases

The massive data amount generated in a microarray study often exceeds the area of the

particular research group who produced it. To accelerate research efficiency it is therefore

practical to have data publicly available for others to explore. In the beginning this was

primarily realized on the website of each research group or by the journal publishing the

study. This spread of data made it rather a time consuming task for other researchers to

find and collect the information.

In response to the exponentially increasing number of microarray data sets published and

the urge to be able to do meta analysis of data from several data sets, several public

repository data bases for storage of microarray data have been launched. The largest one is

the Gene Expression Omnibus at NCBI hosting 4493 experiment series (Nov. 2006,

http://ncbi.nlm.nih.gov/geo/) (Edgar, 2002). In Europe the most popular database is

ArrayExpress ,online since 2002, accommodated by the European Bioinformatic Institute

(Sarkans, 2005). Today more than 30,000 hybridizations from over 1000 studies are stored

here (http://www.ebi.ac.uk/arrayexpress/). To streamline the presentation of microarray

studies the Microarray Gene Expression Data society (MGED) has proposed a format

which holds the Minimum Information About a Microarray Experiment (MIAME)

(Brazma, 2001). Upon publication of a study it is nowadays often required to have the data

publicly available in the MIAME format.

1.4 Quality Control

Quality control of microarray experiments can be carried out on several levels. RNA pools

with known transcript abundances have been developed to be used for evaluation of

laboratory and data management (Thompson, 2005). These can also be used for cross-

laboratory and platform comparisons.

To enable control on an analysis of actual samples, spike in RNA can be used which

hybridize to special control spots on the microarray. For spotted arrays small sets (~10

probes) of spike in controls which spans a range of intensities and ratios are available from

several companies. The External RNA Control Consortium is a volunteer organization of a

39

large number of companies and academic institutions aiming at developing a platform

independent set of 100 spike in sequences which are made publicly available (ERCC,

2005). Affymetrix provides a special data set which can be used to evaluate the

performance of methods for data analysis.

A supplementary approach of controlling the results is to examine the expression of genes

with another method, mainly quantitative real-time PCR (QRT-PCR) (Canales, 2006;

Higuchi, 1993; Lundholm, 2004; Rajeevan, 2001). Discordant results can be due to that

different splice variants are measured and cross-hybridization on the microarrays.

40

2 Microarray-based Gene Expression Analysis in Cancer Research Cancer is a major health concern in the western world. In Sweden 45,000 people get the

diagnosis each year. Traditionally, tumours have been categorized on the basis of

histological features. However, the information gained from microscopic examination of

staining patterns does not reveal underlying molecular events involved in cancerogenesis

and progression. To obtain a deeper understanding of the biology in tumours, adoption of

microarrays is a powerful strategy which uncovers the gene expression signatures

associated with different phenotypes at the same time.

The multistep process of cancer development starts with a genetic or epigenetic alteration

in a cell giving it a growth advantage. As aberrant cells propagate genetic instability is

accumulated and abnormal characteristics are gained. Cancer cells are typically self-

sufficient in growth signals, insensitive to apoptotic stimuli and have an increased

telomerase activity enabling a larger number of cell divisions (Hanahan and Weinberg,

2000). As the tumour grows it gains abilities to break through surrounding tissue and

induce angiogenesis to supply the tumour with nutrients and oxygen. The cancer can also

spread to other organs distant from the primary tumour. Ninety percent of cancer deaths

are caused by secondary tumours, metastasis (Sporn, 1997). For these to arise, malignant

cells need to detach from the primary tumour, break through into blood vessels, survive in

the circulation, and finally migrate into a new site. Genes that are important for cancer

development are commonly called oncogenes and tumour suppressor genes. Genetic

alterations of proto-oncogenes which turn them into their active oncogenic form imply a

gain of function which promotes tumour growth. Examples of these are growth factors

41

and their receptors, cell cycle regulators and angiogenic inducers. Tumour suppressor

genes on the other hand prohibit malignant growth by inducing actions such as apoptosis

or cell cycle arrest. These genes are disabled to give the cell cancerous characteristics. A

novel dimension to the mechanisms behind tumor development is given by the miRNA

population which also may act as tumor suppressors and oncogenes (Chen, 2005). Cancer

is a very heterogenous disease contributing to the complexity of obtaining a general

picture of its development and progress. A variety of alterations can end up in similar

behaviours. An example of how five distinct events have been linked to specific stages is

the famous model for colorectal cancer proposed by Vogelstein and coworkers

(Vogelstein, 1988).

2.1 Biomarker Discovery

Search of potential biomarkers based on gene expression profiles generated by microarray

experiments have been applied to the major cancer forms during the past ten years,

including breast, prostate, leukaemia, ovary, and lung cancer (Adib, 2004; Bhattacharjee,

2001; Dhanasekaran, 2001; Golub, 1999a; Hedenfalk, 2001; Ono, 2000) Genes have been

identified whose expression correlates with tumour grade, metastasis, survival and

recurrence. Gene expression profiling is commonly the first step in a screening process

(Figure 3).

Figure 3. Example of work flow; potential biomarkers are identified with cDNA microarrays, after verification of clone sequence and expression on protein level, tumour tissues and possibly serum can be used for screening.

cDNA microarrays Tissue microarrays Serum microarraysWestern blotsSanger sequencing

Gene expression profiling

Verification of clonesequence

Verification of protein expression

Tumour tissue screening

Serum screening

Functional assaysEpigenetic and genetic assays

cDNA microarrays Tissue microarrays Serum microarraysWestern blotsSanger sequencing

Gene expression profiling

Verification of clonesequence

Verification of protein expression

Tumour tissue screening

Serum screening

Functional assaysEpigenetic and genetic assays

42

Follow up studies can include epigenetic examination of specific regions, tissue

microarrays of protein expression and serum screening, vastly broadening the collection

of diagnostic approaches (Dhanasekaran, 2001; Monni, 2001; Petricoin, 2002). Depending

on the context, if the biomarker is to be used in a population screening or to improve

diagnosis of a detected tumour, it has to meet different demands. The increasing rate of

publications on biomarkers has not yet lead to a similar trend in clinical practice. The

number of approved biomarkers by FDA each year has in contrast decreased during the

last decade. Many biomarkers are under clinical evaluation and there are high hopes that

in the near future the promising findings of the many microarray studies will find their

use in clinical applications.

2.2 Tumour Classification

Gene expression profiles can be used to separate tumours into new and well established

tumour types. The purpose is that a more precise knowledge of the tumour biology will

lead to development of more efficient therapies.

The first study showing the possibility to perform class discovery using gene expression

microarrays were a leukaemia study conducted by Golub and colleagues published in 1999

(Golub, 1999a). A classifier was constructed to recognize acute myloid leukaemia and

acute lymphoblastic leukaemia. The classifier also happened to make impact clinically

since it was able to improve the diagnosis of a patient leading to altered treatment regime.

Since then microarrays have been used to examine clinically relevant tumour subtypes in

breast, prostate and lung cancer (Alizadeh, 2000; Dhanasekaran, 2001; Hedenfalk, 2001;

Lapointe, 2004; Perou, 2000; Sorlie, 2003). In Hedenfalk’s study on hereditary breast

adenocarcinomas with mutations in either BRCA1 or BRCA2 it was demonstrated the

value of gene expression analysis compared to genotyping. A supervised classifier was able

to separate the two groups based on a subset of the genes. Interestingly a spontaneous

tumour was classified as a BRCA1 mutation whereas sequencing showed no mutation. The

reason was aberrant methylation in the promoter region. In the breast cancer study by

43

Perou and coworkers expression profiles of 42 breast cancer tumours were subject to

hierarchical clustering revealing three relevant groups, one of them being the known

subtype Erb-B2+. It was also demonstrated that primary tumours were more similar to

their metastasis than to other primary tumours. Sorlie in a study of 115 breast cancers also

found that clustering corresponded to biological relevant subtypes, Erb-B2+ was one of

these four. Genes selected for clustering were those low variation for samples taken from

the same patient with 15 weeks of neoadjuvent therapy inbetween and with large

variation between patients. Clustering with these genes on two other independent data

sets gave similar results. Three subtypes of prostate cancer tumours were identified in the

study by Lapointe of which one was associated with recurrence. Alizadeh performed

unsupervised classification on B-cell lymphomas and found that the clusters formed

reflected different stages of cancer progression.

Most studies are performed focused on a specific cancer. There is however a gain in

collecting data from multiple cancer forms to enable constructure of a classifier which can

connect metastasis to tissue of origin. One of the few examples of this is a multiclass

cancer classification performed on 14 tumour types and 90 normal tissue samples with a

one-versus-all SVM (Ramaswamy, 2001). Accuracy of classification was 78%.

Many studies are carried out in cancer cell lines in vitro which provokes the reliability of

the results since the course of events might depend on natural environment of the cells. A

large comparison of the NCI60 cell lines to their tissue of origin was therefore carried out

which showed high resemblance in gene expression patterns between in vitro and in vivo

expression (Tong, 2006).

44

2.3 Therapeutic Development

Gene expresson profiling is an important tool at many stages in the the developmental

process of therapeutic agents. Initially in the screening step patterns of healthy samples

are compared to those of diseased and the differentially expressed genes are searched for

potential drug targets. Two to three thousand genes including G protein coupled

receptors, nuclear receptors, ion channels and protein kinases are potentially druggable

(van Duin, 2003). To determine the importance of selected genes these can be knocked

down one at a time by interference RNA (RNAi), antibody or affibody inhibition and then

the change can be studied. Once a potential drug well has been developed the response of

this can be compared to that of the RNAi. The power of the treatment and possible toxic

bireactions can thus be discovered.

NCI60 is a panel of 60 cancer cell lines which have been used for screening of drug

sensitivity. To date, the respsonse of more than 100,000 agents have been evaluated with

these cell lines. Correlations of gene expression profiles with resistance or sensitivity to

several substances have been identified. As an example the rate limiting enzyme in

fluoracil metabolism, dihydropyrimidine, was inversily correlated to sensitivity to

fluoracil (Scherf, 2000). Supervised classification has been successfully used to also predict

chemosensitivity for some compounds (Staunton, 2001).

The connectivity map is a recently published resource which can be used to find

connections between chemical agents, diseases and gene expression signatures (Lamb,

2006). One hundred and sixty-four molecules have been studied in one to four cell lines

and analyzed with Affymetrix GeneChip arrays. Connections are computed with the

GSEA algorithm described in 1.3.9. When quering the connectivity map with a gene

expression signature derived from a compound with unknown function, an idea of its

effect can be obtained if there are strong connections to agents with known functions.

Another example of the progress of microarray technology within the chemogenomics

field is a database with expression profiles of seven different types of rat tissues treated

45

with approximately 600 different compounds. Effects in liver could as an example be

predicted by the gene expression patterns in the database (Thompson, 2005).

46

3 Present Investigation

3.1 Evaluation of Two Strategies for RNA Amplification

RNA amplification is employed in microarray analysis when the amount of sample is

limited. The two main strategies are linear amplification using in vitro transcription and

exponential amplification with PCR. A broader background of available methods is given

in section 1.2.3.

3.1.1. 3' cDNA Tag Amplification

Amplification of full length cDNA with PCR is often criticized for not being able to

produce satisfactory amounts of the longest transcripts. In PCR, short fragments are

generally favoured. To prevent this size dependent imbalance in the PCR Sievertzon and

coworkers introduced a technique based on short 3' end cDNA tags. An outline of this

approach is presented in figure 4. First strand cDNA is synthesized using isolated mRNA

and a biotinylated and anchored oligodT primer containing a linker with a NotI cleavage

site. The second strand is obtained by partitioning the mRNA strand into primers with

RNaseH, polymerization of DNA with T4 DNA polymerase and ligation with DNA ligase.

The double stranded DNA is thereafter collected by precipitation, washed in a gel column

and fragmented by sonication. The produced 3' end fragments are immobilized onto

streptavidin beads and a linker is added by blunt end ligation. PCR is performed with

primers using the NotI sequence and the 5´linker. To generate an even size distribution of

47

the fragments the number of PCR cycles is optimized to achieve an even smear on an

agarose gel. Direct labelling is carried out in an asymmetric PCR with labelled

nucleotides.

Figure 4. Amplification protocol based on PCR of short 3’end cDNA tags

48

3.1.2 Paper I

In this study we compared the 3' end cDNA PCR with a commercial kit for linear

amplification based on in vitro transcription by investigating the difference to

unamplified material in microarray experiments. We also used in vitro transcribed

products on oligo microarrays with a novel labelling kit but with a large drop out due to

that the probes were not only in the 3’ end. For the comparison RNA from two cancer cell

lines was analyzed on spotted cDNA microarrays. As starting material for amplification

100 ng of total RNA was used, which invokes two cycles of in vitro transcription. To

assure that no confounding effects were introduced by the mRNA isolation in the

exponential protocol, unamplified material was analysed both with and without prior

mRNA isolation. Amplifications and hybridizations were carried out in two replicates.

The comparison was based on the Pearson correlations of log2 ratios from hybridizations

of the two cell lines between and within methods.

The size distribution of amplified material was between 100 and 600 bp for both

amplification protocols. Pearson correlations between amplification replicates showed a

better reproducibility for the PCR based method. Comparison to unamplified material

showed that both amplification protocols rendered reliable data.

49

3.2 Detection of p53 Dependent Genes Induced by Hypoxia

3.2.1 P53 - Cancer Gene

P53 has been one of the most extensively studied genes during the last decades. It was

independently identified as a cancer gene in 1979 by Levine, Lane and Old (DeLeo, 1979;

Lane and Crawford, 1980). The P53 protein attracted the attention of many researchers

due to its over-expression in malignant cells, and was also therefore presumed to be an

oncogene. After ten years of misinterpretations, Vogelstein and Raymond White finally

showed that p53 was mutated in cancer cells and that the wild type protein on the

contrary protected against cancer (a tumour suppressor gene) (Baker, 1989).

There are four conserved domains in p53 protein; the N-terminal which is required for

transcriptional transactivation (Fields and Jang, 1990), a sequence-specific DNA binding

domain (Pavletich, 1993; Wang, 1993), a tetramerization domain near the C-terminal end,

and the C-terminal domain which interacts unspecifically with single stranded DNA

(Foord, 1991). Mutations prevalent in cancers are localized in the DNA binding domain,

disrupting the capability of transactivation of downstream target genes (Hollstein, 1994).

Inherited mutations in p53 give rise to the Li-Fraumeni syndrome causing a wide range of

cancer forms, predominantly breast carcinoma, soft tissue sarcomas, osteosarcoma, brain

tumours and adrenocortical carcinoma (James, 1999).

Wild-type p53 protein binds to specific genomic sites with a consensus binding site 5'-

PuPuPuC(A/T)(T/A)GPyPyPy-3' (el-Deiry, 1992). Whole genome in silico promoter

analysis has indicated that as many as 2,500 genes can be induced by p53 (Hoh, 2002).

Most target genes are involved in apoptosis, cell cycle arrest and DNA damage control.

P53 has been described as "the guardian of the genome", referring to its role in conserving

stability by preventing genome mutations. By inducing cell cycle arrest or apoptosis upon

DNA damage, p53 prevents cells with improper genome sequence to propagate. P53 is

induced also by other types of stress such as hypoxia, hyperproliferation, and oncogene

activation.

50

P53 is expressed consecutively with a short half life in an unstressed environment.

Continuous degradation occurs through ubiquitylation by the ubiquitin E3 ligase MDM2

which also is transcriptionally induced by p53 (constituting a feedback loop). Upon stress,

degradation is disrupted and p53 is translocated to the nucleus where a homotetramer is

formed to act as a transcriptional activator of down stream target genes. Post translational

acetylation and phosphorylation promotes both stability and transcriptional activation.

3.2.2 Regulation of p53 by Hypoxia

Hypoxia may be the most common physiological inducer of P53 protein in normal body

tissues (An, 1998). Which genes p53 affects under hypoxic conditions is however not well

characterized. In hypoxic regions of solid tumours, accumulation of wild type p53 protein

is strongly correlated to apoptosis. This leads to selection for loss of p53 which in turn

leads to a more aggressive propagation of cancer. The ability of having non-apoptotic

regions of hypoxia also stimulates angiogenesis with a consequent re-oxygenation of

previously hypoxic areas.

The main molecular mediators of hypoxic response are the transcription factor hypoxia

inducible factor-1 HIF1 (Zhong, 1999) and the angiogenic factor, vascular endothelial

growth factor (VEGF) (Yancopoulos, 2000). Which roles these play together with p53 is

not yet fully understood. It has been shown that HIF1-α is required to stabilize the p53

protein for transactivation under hypoxic conditions (An, 1998) and that high levels of

p53 can inhibit HIF-inducible transcription via the coactivator p300 (Blagosklonny, 1998).

In contrast to other cellular stress such as UV- and gamma irradiation, several studies have

shown that p53 mediates transrepression, but no transactivation under hypoxic conditions

(Hammond and Giaccia, 2005).

51

3.2.3 Paper II

To examine the p53 dependent response under hypoxic conditions we conducted a

microarray gene expression study of the human colon cancer cell lines HCT116 p53+/+

and HCT116 p53-/- exposed to hypoxia. RNA was extracted at 0, 8 and 16 hours and

hybridized onto cDNA arrays containing 20.352 probes representing 12.454 genes.

First we wanted to confirm that hypoxia indeed induced accumulation of active p53

protein. This was accomplished by performing Western blots and immunofluorescence

staining. To examine p53 dependent apoptosis both cell lines were analyzed by TUNEL

staining and flow cytometry showing an increased fraction of apoptotic cells in p53+/+

cells. Based on the microarray experiments we wanted to investigate which genes might

be involved in the p53 dependent apoptosis. Among all genes with a functional annotation

of apoptosis in the Gene Ontology we found that only two genes were altered in a p53

dependent fashion. These were apoptosis inhibitor BIRC3 which was down regulated, and

the death receptor Fas which was up-regulated after 16 hours. To investigate the

importance of Fas in p53 dependent apoptosis wtp53+/+ cells were treated with an

antibody against Fas in order to block Fas signalling. This resulted in resistance to hypoxia

induced apoptosis. The Fas pathway was further investigated by treating wtp53+/+ cells

with an antibody against caspase 8 inhibitor. Also this impairment of the Fas signalling

pathway reduced apoptosis demonstrating its critical role in hypoxia induced apoptosis. A

luciferase reporter assay was used to verify that Fas was direct transcriptionally activated

by p53.

Many traditional p53 target genes were not induced by hypoxia which also was confirmed

with Northern blots and qRT-PCR. However we found several novel p53 dependently

expressed genes, such as ANXA1 which has demonstrated to have a proapoptotic effect in

circulating neutrophils, SEL1L which overexpressed in mouse tumour xenografts inhibit

growth, and SMURF1 and SMURF2 encoding E3 ubiquitin ligases regulating proteins in

the TGFβ signalling pathway. Position of putative p53 binding sites in the promoter

regions of these genes were calculated with the p53MH algorithm.

52

Further work to verify the p53 induction of these potential p53 target genes includes

ChIP-chip experiments to investigate p53 binding sites in potential target genes and

reproducibility of our findings in other cell lines. Future extension of this work involves

examination of how these genes participate in inhibition of tumour development and

their potential use in drug development.

53

3.3 Characterization of a Malignant Expression Signature of

Adrenocortical Tumours

3.3.2 Adrenal Gland

The adrenal glands are endocrine organs secreting hormones, which are necessary for life

in their role of controlling our response to stress, and regulating levels of blood glucose

and blood pressure. They are situated above the kidneys, and have a triangular shape of

size 0.5x3x4 cm. The cortex and the medulla are two separate endocrine organs with

divergent characteristics and origins. The cortex secretes glucocorticoids,

mineralocorticoids, aldesterone and sex hormones, while the medulla is responsible for

dopamine, adrenaline and noradrenaline production. Hormones exert their activity by

binding to cytosolic receptors which transforms and translocates to the nucleus where

they bind to hormone responsive elements in the genome (Dahlman, 1989; Fuller and

Young, 2005).

3.3.2 Adrenocortical Tumours

Tumours in adrenocortex are relatively frequent, with an incidence of 9% in autopsy

studies (Boushey and Dackiw, 2001). Malignancy of these is rare, the yearly incidence

being 0.5 - 2 per million inhabitants, but is associated with a very aggressive behaviour,

the five year survival being only 10%, if the tumour is resected, and a mean survival of 12

months if non-resected (Dackiw, 2001; Kjellman, 1999; Sidhu, 2003)). The high frequency

of benign tumours (adenomas) together with the aggressiveness of cancers causes a high

demand of diagnostic capabilities. With the present methods, distinction between

adenomas and cancer is often difficult to establish.

54

Sometimes tumours cause disturbances in the hormone synthesis machinery, generating

overproduction of aldosterone, cortisol and in some cases also sex hormones.

Adrenocortex consists of three layers; zona glomerulosa, fasciculata and reticularis.

Aldesterone is produced in zona glumerolosa and cortisol and sex hormones in fasciculata.

Adenomas with overproduction of cortisol give rise to Cushing’s syndrome which is

associated with a characteristic obesity, hypertension and thin, easily bruised skin.

Aldosterone producing adenomas, so called aldosteronomas, cause hypertension and

hypokalemia. Cancer causes a variety of hormonal defects in 50% of cases. Symptoms of

these imbalances can also lead to that the tumour can be detected. Incidentalomas are

adenomas with no symptoms (detected by incidence). The increased use of computed

tomography and radiologic imaging has improved the detection rate of tumours in the

adrenocortex the last years.

Treatment is predominantly carried out by surgical resection. This is performed in case of

hormonal overproduction or upon suspicion of malignancy. For the latter, a diameter

larger than 4 cm is the main criteria (Kendrick, 2001), although in rare cases tumours as

small as 2 cm may metastasize (Barnett, 2000).

Less than ten percent of adrenocortical cancers are familial, the most common syndromes

being Hereditary Adrenocortical Carcinoma (ADCC), Multiple Endocrine Neoplasia type

1 (MEN 1), Beckwith-Wiedemann Syndrome (BWS) and Li-Fraumeni Syndrome (LFS).

CGH analysis of sporadic tumors have shown that 28% of adenomas and more than 60%

of cancers have chromosomal alterations (Kjellman, 1996; Sidhu, 2002). The most

common ones in cancers are gains on chromosome 4, 5, 17 and 19 and losses on 1, 2q, 17p

and 22. Loss of heterozygosity (LOH) on locus 17p13 is highly associated with malignancy

and has a prognostic value. A somewhat lower predictor is LOH on locus 11p15 (Gicquel,

2001).

Few global gene expression profiling studies of sporadic adrenocortical tumours have yet

been published. The first report was a study of eleven cancers, four adenomas, three

normal adrenocortex and one macronodular hyperplasia using an Affymetrix array with

10,500 genes. Many genes were identified as potential biomarkers, the most prominent

55

being the already well established gene IGF2 (Giordano, 2003). DeFraipont and coworkers

examined the expression of 230 candidate genes in 57 adrenocortical tumours (de

Fraipont, 2005). An IGF2 cluster was found to predict malignancy as well as a group of

genes involved in steroidegenesis.

3.3.3 Paper III & IV

To examine underlying biological mechanisms involved in adrenocortical cancers and to

screen for potential biomarkers we conducted a microarray study of different phenotypes

of adrenocortical tumours as well as normal adrenocortex. A smaller study was first

published with focus on the difference between benign and malignant tumours. The

extended study included also normal adrenocortex and was also aimed to explore gene

expression giving rise to different hormonal activity.

The sample size in the total study consisted of 17 adenomas of which five were

overproducing glucocorticoids, four aldosteronomas and eight incidentalomas, and twelve

cancers, including six containing high amounts of necrosis and one with uncertain

diagnosis. Necrotic samples were included in the data set exclusively in a classifier and the

case with uncertain diagnosis only when performing clustering. Clustering with

unsupervised methods revealed that adrenocortical cancers have a molecular signature

clearly distinguishable from adenomas. The uncertain case clustered with the adenomas.

Analysis of differential expression showed accordingly a high number of genes, 1137, with

altered expression levels comparing cancer to all other samples, decreasing to 182 genes

for aldosteronomas, 68 for incidentalomas and none for cortisol producing adenomas. In

agreement with previous studies IGF2 was one of the most up-regulated genes in cancers.

We also found that there was a group of genes highly correlated to IGF (r > 0.97). Among

these genes were not unexpectedly the IGF2R, and interestingly also two proteins in the

ubiquitin-proteosome pathway.

Enrichment analysis according to Gene Ontology annotations of differentially expressed

genes in cancer showed that mitosis and cell adhesion were biological processes which

were overrepresented in concordance with the increased proliferation rate and the

56

invasive characteristics associated with cancer. Since receptors have a potential as targets

for treatment and bioimaging we investigated if there were any of these which could

distinguish between the tumour types. IGF2R and FGFR4 could serve as markers based on

their gene expression levels. For diagnosis of aldosteronomas VEGFB was the best

candidate. Secreted proteins are interesting considering diagnosis based on serum

screening. Among these we found IGF2, SEMA7A and HAPLN1. Tumours with high

degree of necrosis were excluded from the analysis of differentially expressed genes. These

did not show the same degree of up-regulation of the IGF2 cluster. To be able to

distinguish also these cancers from adenomas we trained a supervised classifier to address

this problem. A support vector machine could based on the expression profiles of all genes

classify all tumours correctly into adenomas and cancers.

The continuation of this work will in future studies involve examinations of differences in

protein levels as well as more thorough investigations of how the differentially expressed

genes are involved in tumourigenesis.

57

4 Concluding Remarks

Microarrays have taken our knowledge, not least within cancer research, major steps on

the road to uncovering what is taking place inside cells under different conditions. The

last years tremendous efforts have been made to facilitate microarray analysis, both in

terms of laboratory management and data handling. Despite the increasing amount of

methods, we might head towards more standardized protocols to facilitate fusion of

experiments from different laboratories (Bammler, 2005). Alternatively an increased use

of external controls in the experiments can ascertain the quality and provide guidance in

the choice analysis pathway. Public databases are valuable resources for continuous future

interrogation of data sets. Combination of gene expression data with other types of

information such as protein expression and regulatory sequence data open new doors to

draw conclusions.

In this thesis microarrays have been used for exploration of molecular signatures of

adrenocortical tumours. Clinicians are often faced with difficult discriminations between

malignant and benign cases. Development of new tools for diagnosis is urgent to provide

more efficient treatments to affected patients. Gene expression profiles provide in

themselves a potential basis for diagnosis as well as pave the road to more thorough

investigations of genes which constitute biomarker potential and give us an insight in the

biological processes involved in adrenocortical tumour development.

The role of the cancer gene p53 was examined with microarrays under hypoxic

conditions. Microarray analysis revealed a set of potential novel p53 target genes as well as

confirmed that many known target genes were not transcriptionally activated by hypoxia.

58

Follow up which was focused on how p53 affected hypoxia induced apoptosis showed that

the death receptor Fas is a critical gene.

When small amounts of tissue are available, amplification of the transcript population is

necessary for microarray analysis. A new strategy for amplification based on PCR 3’end

tags was evaluated and compared to a commercial in vitro transcription protocol. Both

protocols produced results in agreement with results of unamplified target.

59

Abbreviations

A adenine

ANOVA analysis of variance

ANXA1 annexin A1

dNTP 2'deoxyribonuleotide tri-phosphate

FDA federal drug administration

FGFR4 fibroblast growth factor receptor 4

GSEA gene set enrichment analysis

HAPLN1 hyaluronan and proteoglycan link protein 1

IGF2 insulin growth factor 2

IGF2R insulin growth factor 2 receptor

KEGG Kyoto encyclopedia of genes and genomes

MDM2 mdm2, transformed 3T3 cell double minute 2, p53 binding protein (mouse)

NCBI national center for biotechnology information

PCR polymerase chain reaction

Pu purine

Py pyrimidine

QRT-PCR qunatitative real time-polymerase chain reaction

SEL1L sel-1 suppressor of lin-12-like (C. elegans)

SEMA7A semaphorin 7A

SMURF1 SMAD specific E3 ubiquitin protein ligase 1

SMURF2 SMAD specific E3 ubiquitin protein ligase 2

TGFβ transforming growth factor beta

U uracil

VEGFB vascular endothelial growth factor β

60

Acknowledgements Så ser man målsnöret och slutet på en resa som varit fylld av minnesvärda upplevelser och lärdomar; motgångar ibland men mestadels medgångar tack vare alla trevliga och skickliga handledare och kollegor jag haft turen att arbeta med. Därför skulle jag vilja rikta ett varmt tack till: Joakim Lundeberg, handledare, för vetenskaplig vägledning och för att jag fått arbeta med spännande och utvecklande projekt, för din smittande framåtanda och stora engagemang. Peter Nilsson, handledare, för att du skapat en god forskningsmiljö och att du alltid har en lösning. Det har varit trevligt och spännande att arbeta i microarraygruppen. Mathias Uhlén för att du satt avdelningen på världskartan och för att din entusiasm för forskningen inspirerar oss alla. Per-Åke Nygren, Sophia Hober, Stefan Ståhl, och övriga gruppledare för att ni gör avdelningen till en så trevlig arbetsplats att vistas på, där sammanhållningen är stor, vårt växande antal till trots. Samarbeten: Martin Bäckdahl och Bertil Hamberger på KS för motivation och inspiration under trevliga möten, för att ni delat med er av era kunskaper och genuina engagemang. David Velasquez, for your positive attitude which lightened up long days, for being a friend and for translating all protocols to spanish (language lesson). Janos Geli, Catharina Larsson, Ulla Enberg, Anders Höög, Jacob Odeberg, Magnus Kjellman och Christoffer Juhlin på KI för trevligt binjuresamarbete. Klas Wiman, Galina Selivanova och Tao Liu på KI för trevliga möten och en lyckad publikation. Göran Wallin och Mirna Nordling på KS och examensarbetare Roland Sjögren för trevligt samarbete om hyperthyreos. Fredrik Pontén och cancergruppen i Uppsala för trevliga och lärorika möten, goda lunchmackor, samt konferensen i Sevilla. Carl Henrik, Emma och Thomas för trevligt projektarbete på ANN-kursen Hedvig och Mathias på Mat. Stat. SU, för trevligt grupparbete under statistical computing-kursen.

61

Kollegor: B3:1049 Speciellt tack vare alla omtänksammma nuvarande och pensionerade rumskamrater har det varit ett verkligt kul att gå till jobbet, diskussioner om allt från stort till smått, också vetenskapligt, har fyllt vardagen med trivsel; Emma, Esther, Karin, Mats, Olof, Ronny och Tove. Microarraygruppen för fruktsamma möten och kreativt arbete kring arrayandet, samt trevlig samvaro på konferenserna i Madrid, Philadelphia och Bergen; Afshin, Anders, Angela, Anna G, Anna W, Annelie, Christian, Daniel, Emilie, Erik, Esther, Henric A, Johan, Maria, Markus, Max, Mårten, Rebecca, Ronald och Valtteri. Anders för excellent handledning av examensarbetet. Tjejerna för alla trevliga, middagar, förfester, spelkvällar, träning, och kulturella events. Alla övriga trevliga prickar på labb och kontor, som bidrar till den goda atmosfären. Gamla vänner: Elisabeth, Jenny ,Karin, Linda, Maria och Rebecca för allt trevligt under åren på kemi som gick alldeles för snabbt tack vare er, samt alla middagar, kalas och essentiella skärgårdshelger, som ger perspektiv. Karin, för lång vänskap. Även om man inte ses så ofta längre när man bor i olika länder är det alltid lika kul. Annelie för många och långa mil i löp- och skidspår, för att du varit en vän i alla väder under åren, för fjällresor, båtbestyr, och allt vi tagit oss igenom tillsammans. Cia för att du är den du är, den bästa vän man kan ha. Utflykter till Dalarna, vandringar, semesterresor, squash och inte minst räddningsaktionen i skärgården när jag fick soppa torsk. Nicklas också, förstås. För att ni påminner mig om livet utanför KTH. Min kära familj: Mormor och morfar, för allt ni gett mig i livet och för att det alltid är lika trivsamt att få komma till er, i Uppsala och på Rindö. Mina hjältar till bröder: Carl Henrik och Gustaf, för att ni alltid ställer upp, med allt från datakonsultationer till pizza och mycket mer. Ulla och Carl-Gustaf, mina föräldrar, för att ni alltid funnits där, för allt stöd och uppmuntran ni skänkt genom åren. Ni är fantastiska!

63

References Adams, R., and L. Bischof. 1994. Seeded region growing. IEEE Transactions on Pattern

Analysis and Machine Intelligence 16:641-647. Adib, T.R., S. Henderson, C. Perrett, D. Hewitt, D. Bourmpoulia, J. Ledermann, and C.

Boshoff. 2004. Predicting biomarkers for ovarian cancer using gene-expression microarrays. Br J Cancer 90:686-92.

Affymetrix. 2001. Microarray Suite User Guide, Version 5. Al-Shahrour, F., R. Diaz-Uriarte, and J. Dopazo. 2004. FatiGO: a web tool for finding

significant associations of Gene Ontology terms with groups of genes. Bioinformatics 20:578-80.

Alizadeh, A.A., M.B. Eisen, R.E. Davis, C. Ma, I.S. Lossos, A. Rosenwald, J.C. Boldrick, H. Sabet, T. Tran, X. Yu, J.I. Powell, L. Yang, G.E. Marti, T. Moore, J. Hudson, Jr., L. Lu, D.B. Lewis, R. Tibshirani, G. Sherlock, W.C. Chan, et al. 2000. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403:503-11.

Alter, O., P.O. Brown, and D. Botstein. 2000. Singular value decomposition for genome-wide expression data processing and modeling. PNAS 97:10101-10106.

An, W.G., M. Kanekal, M.C. Simon, E. Maltepe, M.V. Blagosklonny, and L.M. Neckers. 1998. Stabilization of wild-type p53 by hypoxia-inducible factor 1alpha. Nature 392:405-8.

Andersson, A., R. Bernander, and P. Nilsson. 2005. Dual-genome primer design for construction of DNA microarrays. Bioinformatics 21:325-32.

Ashburner, M., C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M. Cherry, A.P. Davis, K. Dolinski, S.S. Dwight, J.T. Eppig, M.A. Harris, D.P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J.C. Matese, J.E. Richardson, M. Ringwald, G.M. Rubin, and G. Sherlock. 2000. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25:25-9.

Axon. 2005. GenePix Pro 6.0, User's Guide & Tutorial. Bainbridge, M.N., R.L. Warren, M. Hirst, T. Romanuik, T. Zeng, A. Go, A. Delaney, M.

Griffith, M. Hickenbotham, V. Magrini, E.R. Mardis, M.D. Sadar, A.S. Siddiqui, M.A. Marra, and S.J. Jones. 2006. Analysis of the prostate cancer cell line LNCaP transcriptome using a sequencing-by-synthesis approach. BMC Genomics 7:246.

Baker, S.J., E.R. Fearon, J.M. Nigro, S.R. Hamilton, A.C. Preisinger, J.M. Jessup, P. vanTuinen, D.H. Ledbetter, D.F. Barker, Y. Nakamura, R. White, and B. Vogelstein. 1989. Chromosome 17 deletions and p53 gene mutations in colorectal carcinomas. Science 244:217-21.

Bammler, T., R.P. Beyer, S. Bhattacharya, G.A. Boorman, A. Boyles, B.U. Bradford, R.E. Bumgarner, P.R. Bushel, K. Chaturvedi, D. Choi, M.L. Cunningham, S. Deng, H.K. Dressman, R.D. Fannin, F.M. Farin, J.H. Freedman, R.C. Fry, A. Harper, M.C. Humble, P. Hurban, et al. 2005. Standardizing global gene expression analysis between laboratories and across platforms. Nat Methods 2:351-6.

Barnes, M., J. Freudenberg, S. Thompson, B. Aronow, and P. Pavlidis. 2005. Experimental comparison and cross-validation of the Affymetrix and Illumina gene expression analysis platforms. Nucl. Acids Res. 33:5914-5923.

Barnett, C.C., Jr., D.G. Varma, A.K. El-Naggar, A.P. Dackiw, G.A. Porter, A.S. Pearson, A.P. Kudelka, R.F. Gagel, D.B. Evans, and J.E. Lee. 2000. Limitations of size as a criterion in the evaluation of adrenal tumors. Surgery 128:973-82;discussion 982-3.

64

Barry, W.T., A.B. Nobel, and F.A. Wright. 2005. Significance analysis of functional categories in gene expression studies: a structured permutation approach. Bioinformatics 21:1943-9.

Bartel, D.P. 2004a. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116:281-97.

Bartel, F., L.C. Harris, P. Wurl, and H. Taubert. 2004b. MDM2 and its splice variant messenger RNAs: expression in tumors and down-regulation using antisense oligonucleotides. Mol Cancer Res 2:29-35.

Baugh, L., A. Hill, E. Brown, and C. Hunter. 2001. Quantitative analysis of mRNA amplification by in vitro transcription. Nucleic Acids Res 29:E29.

Beelman, C.A., and R. Parker. 1995. Degradation of mRNA in eukaryotes. Cell 81:179-183. Bengtsson, H., G. Jonsson, and J. Vallon-Christersson. 2004. Calibration and assessment of

channel-specific biases in microarray data with extended dynamical range. BMC Bioinformatics 5:177.

Benson, D.A., I. Karsch-Mizrachi, D.J. Lipman, J. Ostell, and D.L. Wheeler. 2006. GenBank. Nucleic Acids Res 34:D16-20.

Berger, J.A., S. Hautaniemi, A.K. Jarvinen, H. Edgren, S.K. Mitra, and J. Astola. 2004. Optimized LOWESS normalization parameter selection for DNA microarray data. BMC Bioinformatics 5:194.

Bhattacharjee, A., W.G. Richards, J. Staunton, C. Li, S. Monti, P. Vasa, C. Ladd, J. Beheshti, R. Bueno, M. Gillette, M. Loda, G. Weber, E.J. Mark, E.S. Lander, W. Wong, B.E. Johnson, T.R. Golub, D.J. Sugarbaker, and M. Meyerson. 2001. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci U S A 98:13790-5.

Blagosklonny, M.V., W.G. An, L.Y. Romanova, J. Trepel, T. Fojo, and L. Neckers. 1998. p53 inhibits hypoxia-inducible factor-stimulated transcription. J Biol Chem 273:11995-8.

Bolstad, B.M., R.A. Irizarry, M. Astrand, and T.P. Speed. 2003. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19:185-193.

Boushey, R.P., and A.P. Dackiw. 2001. Adrenal cortical carcinoma. Curr Treat Options Oncol 2:355-64.

Brazma, A., P. Hingamp, J. Quackenbush, G. Sherlock, P. Spellman, C. Stoeckert, J. Aach, W. Ansorge, C.A. Ball, H.C. Causton, T. Gaasterland, P. Glenisson, F.C. Holstege, I.F. Kim, V. Markowitz, J.C. Matese, H. Parkinson, A. Robinson, U. Sarkans, S. Schulze-Kremer, et al. 2001. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 29:365-71.

Brown, M.P.S., W.N. Grundy, D. Lin, N. Cristianini, C.W. Sugnet, T.S. Furey, M. Ares, Jr., and D. Haussler. 2000. Knowledge-based analysis of microarray gene expression data by using support vector machines. PNAS 97:262-267.

Buckley, M.J. 2003. The Spot user's guide. Call, D.R., D.P. Chandler, and F. Brockman. 2001. Fabrication of DNA microarrays using

unmodified oligonucleotide probes. Biotechniques 30:368-72, 374, 376 passim. Canales, R.D., Y. Luo, J.C. Willey, B. Austermiller, C.C. Barbacioru, C. Boysen, K.

Hunkapiller, R.V. Jensen, C.R. Knight, K.Y. Lee, Y. Ma, B. Maqsodi, A. Papallo, E.H. Peters, K. Poulter, P.L. Ruppel, R.R. Samaha, L. Shi, W. Yang, L. Zhang, et al. 2006. Evaluation of DNA microarray results with quantitative gene expression platforms. Nat Biotechnol 24:1115-22.

Carninci, P., T. Kasukawa, S. Katayama, J. Gough, M.C. Frith, N. Maeda, R. Oyama, T. Ravasi, B. Lenhard, C. Wells, R. Kodzius, K. Shimokawa, V.B. Bajic, S.E. Brenner,

65

S. Batalov, A.R. Forrest, M. Zavolan, M.J. Davis, L.G. Wilming, V. Aidinis, et al. 2005. The transcriptional landscape of the mammalian genome. Science 309:1559-63.

Caron, H., B. van Schaik, M. van der Mee, F. Baas, G. Riggins, P. van Sluis, M.C. Hermus, R. van Asperen, K. Boon, P.A. Voute, S. Heisterkamp, A. van Kampen, and R. Versteeg. 2001. The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science 291:1289-92.

Castillo-Davis, C.I., and D.L. Hartl. 2003. GeneMerge--post-genomic analysis, data mining, and hypothesis testing. Bioinformatics 19:891-2.

Chen, C.-Z. 2005. MicroRNAs as Oncogenes and Tumor Suppressors. N Engl J Med 353:1768-1771.

Chenchik, A., L. Diachenko, F. Moqadam, V. Tarabykin, S. Lukyanov, and P.D. Siebert. 1996. Full-length cDNA cloning and determination of mRNA 5' and 3' ends by amplification of adaptor-ligated cDNA. Biotechniques 21:526-34.

Cleveland, W.S. 1979. Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association 74:829 - 836.

Cope, L., S.M. Hartman, H.W. Gohlmann, J.P. Tiesman, and R.A. Irizarry. 2006. Analysis of Affymetrix GeneChip data using amplified RNA. Biotechniques 40:165-6, 168, 170.

Crick, F.H. 1958. On protein synthesis. Symp Soc Exp Biol 12:138-63. Dackiw, A.P., J.E. Lee, R.F. Gagel, and D.B. Evans. 2001. Adrenal cortical carcinoma. World

J Surg 25:914-26. Dafforn, A., P. Chen, G. Deng, M. Herrler, D. Iglehart, S. Koritala, S. Lato, S. Pillarisetty, R.

Purohit, M. Wang, S. Wang, and N. Kurn. 2004. Linear mRNA amplification from as little as 5 ng total RNA for global gene expression analysis. Biotechniques 37:854-7.

Dahlman, K., P.E. Stromstedt, C. Rae, H. Jornvall, J.I. Flock, J. Carlstedt-Duke, and J.A. Gustafsson. 1989. High level expression in Escherichia coli of the DNA-binding domain of the glucocorticoid receptor in a functional form utilizing domain- specific cleavage of a fusion protein. J. Biol. Chem. 264:804-809.

Dahlquist, K.D., N. Salomonis, K. Vranizan, S.C. Lawlor, and B.R. Conklin. 2002. GenMAPP, a new tool for viewing and analyzing microarray data on biological pathways. Nat Genet 31:19-20.

de Fraipont, F., M. El Atifi, N. Cherradi, G. Le Moigne, G. Defaye, R. Houlgatte, J. Bertherat, X. Bertagna, P. Plouin, E. Baudin, F. Berger, C. Gicquel, O. Chabre, and J. Feige. 2005. Gene expression profiling of human adrenocortical tumors using complementary deoxyribonucleic Acid microarrays identifies several candidate genes as markers of malignancy. Journal of Clinical Endocrinological Metabolism 90:1819-29.

DeLeo, A.B., G. Jay, E. Appella, G.C. Dubois, L.W. Law, and L.J. Old. 1979. Detection of a transformation-related antigen in chemically induced sarcomas and other transformed cells of the mouse. Proc Natl Acad Sci U S A 76:2420-4.

Dennis, G., Jr., B.T. Sherman, D.A. Hosack, J. Yang, W. Gao, H.C. Lane, and R.A. Lempicki. 2003. DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol 4:P3.

DeRisi, J.L., V.R. Iyer, and P.O. Brown. 1997. Exploring the Metabolic and Genetic Control of Gene Expression on a Genomic Scale. Science 278:680-686.

Dhanasekaran, S.M., T.R. Barrette, D. Ghosh, R. Shah, S. Varambally, K. Kurachi, K.J. Pienta, M.A. Rubin, and A.M. Chinnaiyan. 2001. Delineation of prognostic biomarkers in prostate cancer. Nature 412:822-826.

Doniger, S.W., N. Salomonis, K.D. Dahlquist, K. Vranizan, S.C. Lawlor, and B.R. Conklin. 2003. MAPPFinder: using Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data. Genome Biol 4:R7.

66

Dumur, C.I., C.T. Garrett, K.J. Archer, S. Nasim, D.S. Wilkinson, and A. Ferreira-Gonzalez. 2004. Evaluation of a linear amplification method for small samples used on high-density oligonucleotide microarray analysis. Anal Biochem 331:314-21.

Eberwine, J. 1996. Amplification of mRNA populations using aRNA generated from immobilized oligo(dT)-T7 primed cDNA. Biotechniques 20:584-91.

Edgar, R., M. Domrachev, and A.E. Lash. 2002. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Research 30:207 - 210.

el-Deiry, W.S., S.E. Kern, J.A. Pietenpol, K.W. Kinzler, and B. Vogelstein. 1992. Definition of a consensus binding site for p53. Nat Genet 1:45-9.

Emrich, S.J., M. Lowe, and A.L. Delcher. 2003. PROBEmer: a web-based software tool for selecting optimal DNA oligos. Nucl. Acids Res. 31:3746-3750.

ERCC. 2005. Proposed methods for testing and selecting the ERCC external RNA controls. BMC Genomics 6:150.

Fernandes, R.J., and S.S. Skiena. 2002. Microarray synthesis through multiple-use PCR primer design. Bioinformatics 18 Suppl 1:S128-35.

Fields, S., and S.K. Jang. 1990. Presence of a potent transcription activating sequence in the p53 protein. Science 249:1046-9.

Fire, A., S. Xu, M.K. Montgomery, S.A. Kostas, S.E. Driver, and C.C. Mello. 1998. Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature 391:806-11.

Fodor, S.P.A., J.L. Read, M.C. Pirrung, L. Stryer, A.T. Lu, and D. Solas. 1991. Light-directed, spatially addressable parallel chemical synthesis. Science 251:767-773.

Fodor, S.P.A., R.P. Rava, X.C. Huang, A.C. Pease, C.P. Holmes, and C.L. Adams. 1993. Multiplexed biochemical assays with biological chips. Science 364:555-556.

Foord, O.S., P. Bhattacharya, Z. Reich, and V. Rotter. 1991. A DNA binding domain is contained in the C-terminus of wild type p53 protein. Nucleic Acids Res 19:5191-8.

Freeman, T.C., K. Lee, and P.J. Richardson. 1999. Analysis of gene expression in single cells. Curr Opin Biotechnol 10:579-82.

Fuller, P.J., and M.J. Young. 2005. Mechanisms of Mineralocorticoid Action. Hypertension 46:1227-1235.

Furey, T.S., N. Cristianini, N. Duffy, D.W. Bednarski, M. Schummer, and D. Haussler. 2000. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16:906-914.

Gentleman, R.C., V.J. Carey, D.M. Bates, B. Bolstad, M. Dettling, S. Dudoit, B. Ellis, L. Gautier, Y. Ge, J. Gentry, K. Hornik, T. Hothorn, W. Huber, S. Iacus, R. Irizarry, F. Leisch, C. Li, M. Maechler, A.J. Rossini, G. Sawitzki, et al. 2004. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5:R80.

Gicquel, C., X. Bertagna, V. Gaston, J. Coste, A. Louvel, E. Baudin, J. Bertherat, Y. Chapuis, J.M. Duclos, M. Schlumberger, P.F. Plouin, J.P. Luton, and Y. Le Bouc. 2001. Molecular markers and long-term recurrences in a large cohort of patients with sporadic adrenocortical tumors. Cancer Res 61:6762-7.

Giordano, T.J., D.G. Thomas, R. Kuick, M. Lizyness, D.E. Misek, A.L. Smith, D. Sanders, R.T. Aljundi, P.G. Gauger, N.W. Thompson, J.M. Taylor, and S.M. Hanash. 2003. Distinct transcriptional profiles of adrenocortical tumors uncovered by DNA microarray analysis. Am J Pathol 162:521-31.

Girard, A.l., R. Sachidanandam, G.J. Hannon, and M.A. Carmell. 2006. A germline-specific class of small RNAs binds mammalian Piwi proteins. Nature advanced online publication.

67

Glass, A., J. Henning, T. Karopka, T. Scheel, S. Bansemer, D. Koczan, L. Gierl, A. Rolfs, and U. Gimsa. 2005. Representation of individual gene expression in completely pooled mRNA samples. Biosci Biotechnol Biochem 69:1098-103.

Golub, T.R., D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander. 1999a. Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression. Science 286:531 - 537.

Golub, T.R., D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander. 1999b. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531-7.

Gowda, M., H. Li, J. Alessi, F. Chen, R. Pratt, and G.L. Wang. 2006. Robust analysis of 5'-transcript ends (5'-RATE): a novel technique for transcriptome analysis and genome annotation. Nucleic Acids Res.

Gresham, D., D.M. Ruderfer, S.C. Pratt, J. Schacherer, M.J. Dunham, D. Botstein, and L. Kruglyak. 2006. Genome-Wide Detection of Polymorphisms at Nucleotide Resolution with a Single DNA Microarray. Science 311:1932-1936.

Griffin, T.J., S.P. Gygi, T. Ideker, B. Rist, J. Eng, L. Hood, and R. Aebersold. 2002. Complementary profiling of gene expression at the transcriptome and proteome levels in Saccharomyces cerevisiae. Mol Cell Proteomics 1:323-33.

Gunderson, K.L., S. Kruglyak, M.S. Graige, F. Garcia, B.G. Kermani, C. Zhao, D. Che, T. Dickinson, E. Wickham, J. Bierle, D. Doucet, M. Milewski, R. Yang, C. Siegmund, J. Haas, L. Zhou, A. Oliphant, J.B. Fan, S. Barnard, and M.S. Chee. 2004. Decoding randomly ordered DNA arrays. Genome Res 14:870-7.

Gustincich, S., A. Sandelin, C. Plessy, S. Katayama, R. Simone, D. Lazarevic, Y. Hayashizaki, and P. Carninci. 2006. The complexity of the mammalian transcriptome. J Physiol 575:321-32.

Guyon, I., J. Weston, S. Barnhill, and V. Vapnik. 2002. Gene Selection for Cancer Classification using Support Vector Machines. Mach. Learn. 46:389-422.

Gygi, S.P., Y. Rochon, B.R. Franza, and R. Aebersold. 1999. Correlation between protein and mRNA abundance in yeast. Mol Cell Biol 19:1720-30.

Hammond, E.M., and A.J. Giaccia. 2005. The role of p53 in hypoxia-induced apoptosis. Biochem Biophys Res Commun 331:718-25.

Hanahan, D., and R.A. Weinberg. 2000. The hallmarks of cancer. Cell 100:57-70. Harbig, J., R. Sprinkle, and S.A. Enkemann. 2005. A sequence-based identification of the

genes detected by probesets on the Affymetrix U133 plus 2.0 array. Nucleic Acids Res 33:e31.

Hartigan, J.A., and M.A. Wong. 1979. A K-means clustering algorithm. Applied Statistics 28:100 - 108.

Hedenfalk, I., D. Duggan, Y. Chen, M. Radmacher, M. Bittner, R. Simon, P. Meltzer, B. Gusterson, M. Esteller, O.P. Kallioniemi, B. Wilfond, A. Borg, and J. Trent. 2001. Gene-expression profiles in hereditary breast cancer. N Engl J Med 344:539-48.

Herrler, M. 2000. Use of SMART-generated cDNA for differential gene expression studies. J Mol Med 78:B23.

Hertzberg, M., M. Sievertzon, H. Aspeborg, P. Nilsson, G. Sandberg, and J. Lundeberg. 2001. cDNA microarray analysis of small plant tissue samples using a cDNA tag target amplification protocol. Plant J 25:585-91.

Higuchi, R., C. Fockler, G. Dollinger, and R. Watson. 1993. Kinetic PCR analysis: real-time monitoring of DNA amplification reactions. Biotechnology (N Y) 11:1026-30.

68

Hoh, J., S. Jin, T. Parrado, J. Edington, A.J. Levine, and J. Ott. 2002. The p53MH algorithm and its application in detecting p53-responsive genes. Proc Natl Acad Sci U S A 99:8467-72.

Hollstein, M., K. Rice, M.S. Greenblatt, T. Soussi, R. Fuchs, T. Sorlie, E. Hovig, B. Smith-Sorensen, R. Montesano, and C.C. Harris. 1994. Database of p53 gene somatic mutations in human tumors and cell lines. Nucleic Acids Res 22:3551-5.

Irizarry, R.A., B. Hobbs, F. Collin, Y.D. Beazer-Barclay, K.J. Antonellis, U. Scherf, and T.P. Speed. 2003. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4:249-64.

Iscove, N.N., M. Barbara, M. Gu, M. Gibson, C. Modi, and N. Winegarden. 2002. Representation is faithfully preserved in global cDNA amplified exponentially from sub-picogram quantities of mRNA. Nat Biotechnol 20:940-3.

James, L.A., A.M. Kelsey, J.M. Birch, and J.M. Varley. 1999. Highly consistent genetic alterations in childhood adrenocortical tumours detected by comparative genomic hybridization. Br J Cancer 81:300-4.

Kaller, M., E. Hultin, B. Zheng, B. Gharizadeh, K.-L. Wallin, J. Lundeberg, and A. Ahmadian. 2005. Tag-array based HPV genotyping by competitive hybridization and extension. Journal of Virological Methods 129:102-112.

Kane, M.D., T.A. Jatkoe, C.R. Stumpf, J. Lu, J.D. Thomas, and S.J. Madore. 2000. Assessment of the sensitivity and specificity of oligonucloetides (50 mer) micorarrays. Nucleic Acids Res 28:4552 - 4557.

Kanehisa, M., S. Goto, S. Kawashima, Y. Okuno, and M. Hattori. 2004. The KEGG resource for deciphering the genome. Nucleic Acids Res 32:D277-80.

Karsten, S.L., V.M.D. Van Deerlin, C. Sabatti, L. H. Gill, and D.H. Geschwind. 2002. An evaluation of tyramide signal amplification and archived fixed and frozen tissue in microarray gene expression analysis. Nucl. Acids Res. 30:e4-.

Kendrick, M.L., R. Lloyd, L. Erickson, D.R. Farley, C.S. Grant, G.B. Thompson, C. Rowland, W.F. Young, Jr., and J.A. van Heerden. 2001. Adrenocortical carcinoma: surgical progress or status quo? Arch Surg 136:543-9.

Kerr, M.K., and G.A. Churchill. 2001. Experimental design for gene expression microarrays. Biostat 2:183-201.

Kim, H., G.H. Golub, and H. Park. 2005. Missing value estimation for DNA microarray gene expression data: local least squares imputation. Bioinformatics 21:187-98.

Kjellman, M., O.P. Kallioniemi, R. Karhu, A. Hoog, L.O. Farnebo, G. Auer, C. Larsson, and M. Backdahl. 1996. Genetic aberrations in adrenocortical tumors detected using comparative genomic hybridization correlate with tumor size and malignancy. Cancer Res 56:4219-23.

Kjellman, M., L. Roshani, B.T. Teh, O.P. Kallioniemi, A. Hoog, S. Gray, L.O. Farnebo, M. Holst, M. Backdahl, and C. Larsson. 1999. Genotyping of adrenocortical tumors: very frequent deletions of the MEN1 locus in 11q13 and of a 1-centimorgan region in 2p16. J Clin Endocrinol Metab 84:730-5.

Kohonen, T. 2000. 3 ed. Springer. Kuhn, K., S.C. Baker, E. Chudin, M.H. Lieu, S. Oeser, H. Bennett, P. Rigault, D. Barker,

T.K. McDaniel, and M.S. Chee. 2004. A novel, high-performance random array platform for quantitative gene expression profiling. Genome Res 14:2347-56.

Kuo, W.P., T.K. Jenssen, A.J. Butte, L. Ohno-Machado, and I.S. Kohane. 2002. Analysis of matched mRNA measurements from two different microarray technologies. Bioinformatics 18:405-12.

Lamb, J., E.D. Crawford, D. Peck, J.W. Modell, I.C. Blat, M.J. Wrobel, J. Lerner, J.-P. Brunet, A. Subramanian, K.N. Ross, M. Reich, H. Hieronymus, G. Wei, S.A.

69

Armstrong, S.J. Haggarty, P.A. Clemons, R. Wei, S.A. Carr, E.S. Lander, and T.R. Golub. 2006. The Connectivity Map: Using Gene-Expression Signatures to Connect Small Molecules, Genes, and Disease. Science 313:1929-1935.

Lander, E.S., L.M. Linton, B. Birren, C. Nusbaum, M.C. Zody, J. Baldwin, K. Devon, K. Dewar, M. Doyle, W. FitzHugh, R. Funke, D. Gage, K. Harris, A. Heaford, J. Howland, L. Kann, J. Lehoczky, R. LeVine, P. McEwan, K. McKernan, et al. 2001. Initial sequencing and analysis of the human genome. Nature 409:860-921.

Lane, D.P., and L.V. Crawford. 1980. The complex between simian virus 40 T antigen and a specific host protein. Proc R Soc Lond B Biol Sci 210:451-63.

Lapointe, J., C. Li, J.P. Higgins, M. van de Rijn, E. Bair, K. Montgomery, M. Ferrari, L. Egevad, W. Rayford, U. Bergerheim, P. Ekman, A.M. DeMarzo, R. Tibshirani, D. Botstein, P.O. Brown, J.D. Brooks, and J.R. Pollack. 2004. Gene expression profiling identifies clinically relevant subtypes of prostate cancer. Proc Natl Acad Sci U S A 101:811-6.

Lau, N.C., A.G. Seto, J. Kim, S. Kuramochi-Miyagawa, T. Nakano, D.P. Bartel, and R.E. Kingston. 2006. Characterization of the piRNA complex from rat testes. Science 313:363-7.

Li, C., and W.H. Wong. 2001. Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc Natl Acad Sci U S A 98:31-6.

Lindberg, J., E. af Klint, A.-K. Ulfgren, A. Stark, T. Andersson, P. Nilsson, L. Klareskog, and J. Lundeberg. 2006. Variability in synovial inflammation in rheumatoid arthritis investigated by microarray technology. Arthritis Research & Therapy 8:R47.

Liu, G., A.E. Loraine, R. Shigeta, M. Cline, J. Cheng, V. Valmeekam, S. Sun, D. Kulp, and M.A. Siani-Rose. 2003. NetAffx: Affymetrix probesets and annotations. Nucl. Acids Res. 31:82-86.

Lockhart, D.J., H. Dong, M.C. Byrne, M.T. Follettie, M.V. Gallo, M.S. Chee, M. Mittmann, C. Wang, M. Kobayashi, H. Norton, and E.L. Brown. 1996. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotech 14:1675-1680.

Lonnstedt, I., and T.P. Speed. 2002. Replicated microarray data. Stat Sinica 12:31 - 46. Lundholm, L., S. Moverare, K.R. Steffensen, M. Nilsson, M. Otsuki, C. Ohlsson, J.A.

Gustafsson, and K. Dahlman-Wright. 2004. Gene expression profiling identifies liver X receptor alpha as an estrogen-regulated gene in mouse adipose tissue. J Mol Endocrinol 32:879-92.

Luzzi, V., M. Mahadevappa, R. Raja, J.A. Warrington, and M.A. Watson. 2003. Accurate and reproducible gene expression profiles from laser capture microdissection, transcript amplification, and high density oligonucleotide microarray analysis. J Mol Diagn 5:9-14.

Lyng, H., A. Badiee, D.H. Svendsrud, E. Hovig, O. Myklebost, and T. Stokke. 2004. Profound influence of microarray scanner characteristics on gene expression ratios: analysis and procedure for correction. BMC Genomics 5:10.

Mah, N., A. Thelin, T. Lu, S. Nikolaus, T. Kuhbacher, Y. Gurbuz, H. Eickhoff, G. Kloppel, H. Lehrach, B. Mellgard, C.M. Costello, and S. Schreiber. 2004. A comparison of oligonucleotide and cDNA-based microarray systems. Physiol. Genomics 16:361-370.

Manduchi, E., L.M. Scearce, J.E. Brestelli, G.R. Grant, K.H. Kaestner, and C.J. Stoeckert, Jr. 2002. Comparison of different labeling methods for two-channel high-density microarray experiments. Physiol Genomics 10:169-79.

Margulies, M., M. Egholm, W.E. Altman, S. Attiya, J.S. Bader, L.A. Bemben, J. Berka, M.S. Braverman, Y.J. Chen, Z. Chen, S.B. Dewell, L. Du, J.M. Fierro, X.V. Gomes, B.C. Godwin, W. He, S. Helgesen, C.H. Ho, G.P. Irzyk, S.C. Jando, et al. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437:376-80.

70

Matlin, A.J., F. Clark, and C.W. Smith. 2005. Understanding alternative splicing: towards a cellular code. Nat Rev Mol Cell Biol 6:386-98.

Matsuzaki, H., H. Loi, S. Dong, Y.-Y. Tsai, J. Fang, J. Law, X. Di, W.-M. Liu, G. Yang, G. Liu, J. Huang, G.C. Kennedy, T.B. Ryder, G.A. Marcus, P.S. Walsh, M.D. Shriver, J.M. Puck, K.W. Jones, and R. Mei. 2004. Parallel Genotyping of Over 10,000 SNPs Using a One-Primer Assay on a High-Density Oligonucleotide Array. Genome Res. 14:414-425.

Mattick, J.S., and I.V. Makunin. 2005. Small regulatory RNAs in mammals. Hum. Mol. Genet. 14:R121-132.

Mecham, B.H., D.Z. Wetmore, Z. Szallasi, Y. Sadovsky, I. Kohane, and T.J. Mariani. 2004a. Increased measurement accuracy for sequence-verified microarray probes. Physiol Genomics 18:308-15.

Mecham, B.H., G.T. Klus, J. Strovel, M. Augustus, D. Byrne, P. Bozso, D.Z. Wetmore, T.J. Mariani, I.S. Kohane, and Z. Szallasi. 2004b. Sequence-matched probes produce increased cross-platform consistency and more reproducible biological results in microarray-based gene expression measurements. Nucl. Acids Res. 32:e74-.

Meyer, A., U. Niemann, and M. Behrend. 2004. Experience with the surgical treatment of adrenal cortical carcinoma. Eur J Surg Oncol 30:444-9.

Michael, K.L., L.C. Taylor, S.L. Schultz, and D.R. Walt. 1998. Randomly ordered addressable high-density optical sensor arrays. Anal Chem 70:1242-8.

Modrek, B., and C. Lee. 2002. A genomic view of alternative splicing. Nat Genet 30:13-9. Monni, O., M. Barlund, S. Mousses, J. Kononen, G. Sauter, M. Heiskanen, P. Paavola, K.

Avela, Y. Chen, M. Bittner, and A. Kallioniemi. 2001. Comprehensive copy number and gene expression profiling of the 17q23 amplicon in human breast cancer. Proc Natl Acad Sci 98:5711 - 5716.

Mrowka, R., J. Schuchhardt, and C. Gille. 2002. Oligodb--interactive design of oligo DNA for transcription profiling of human genes. Bioinformatics 18:1686-1687.

Mullis, K.B., and F.A. Faloona. 1987. Specific synthesis of DNA in vitro via a polymerase-catalyzed chain reaction. Methods Enzymol 155:335-50.

Naef, F., N.D. Socci, and M. Magnasco. 2003. A study of accuracy and precision in oligonucleotide arrays: extracting more signal at large concentrations. Bioinformatics 19:178-84.

Ng, P., J.J. Tan, H.S. Ooi, Y.L. Lee, K.P. Chiu, M.J. Fullwood, K.G. Srinivasan, C. Perbost, L. Du, W.K. Sung, C.L. Wei, and Y. Ruan. 2006. Multiplex sequencing of paired-end ditags (MS-PET): a strategy for the ultra-high-throughput analysis of transcriptomes and genomes. Nucleic Acids Res 34:e84.

Nielsen, H.B., R. Wernersson, and S. Knudsen. 2003. Design of oligonucleotides for microarrays and perspectives for design of multi-transcriptome arrays. Nucl. Acids Res. 31:3491-3496.

Nielsen, K.L., A.L. Hogh, and J. Emmersen. 2006. DeepSAGE--digital transcriptomics with high sensitivity, simple experimental protocol and multiplexing of samples. Nucleic Acids Res.

Nikitin, A., S. Egorov, N. Daraselia, and I. Mazo. 2003. Pathway studio--the analysis and navigation of molecular networks. Bioinformatics 19:2155-2157.

Novoradovskaya, N., M.L. Whitfield, L.S. Basehore, A. Novoradovsky, R. Pesich, J. Usary, M. Karaca, W.K. Wong, O. Aprelikova, M. Fero, C.M. Perou, D. Botstein, and J. Braman. 2004. Universal Reference RNA as a standard for microarray experiments. BMC Genomics 5:20.

Nuwaysir, E.F., W. Huang, T.J. Albert, J. Singh, K. Nuwaysir, A. Pitas, T. Richmond, T. Gorski, J.P. Berg, J. Ballin, M. McCormick, J. Norton, T. Pollock, T. Sumwalt, L.

71

Butcher, D. Porter, M. Molla, C. Hall, F. Blattner, M.R. Sussman, et al. 2002. Gene Expression Analysis Using Oligonucleotide Arrays Produced by Maskless Photolithography. Genome Res. 12:1749-1755.

O'Gorman, W., K.Y. Kwek, B. Thomas, and A. Akoulitchev. 2006. Non-coding RNA in transcription initiation. Biochem Soc Symp:131-40.

Ono, K., T. Tanaka, T. Tsunoda, O. Kitahara, C. Kihara, A. Okamoto, K. Ochiai, T. Takagi, and Y. Nakamura. 2000. Identification by cDNA Microarray of Genes Involved in Ovarian Carcinogenesis. Cancer Res 60:5007-5011.

Pabon, C., Z. Modrusan, M.V. Ruvolo, I.M. Coleman, S. Daniel, H. Yue, and L.J. Arnold, Jr. 2001. Optimized T7 amplification system for microarray analysis. Biotechniques 31:874-9.

Pavletich, N.P., K.A. Chambers, and C.O. Pabo. 1993. The DNA-binding domain of p53 contains the four conserved regions and the major mutation hot spots. Genes Dev 7:2556-64.

Peng, X., C.L. Wood, E.M. Blalock, K.C. Chen, P.W. Landfield, and A.J. Stromberg. 2003. Statistical implications of pooling RNA samples for microarray experiments. BMC Bioinformatics 4:26.

Perou, C.M., T. Sorlie, M.B. Eisen, M. van de Rijn, S.S. Jeffrey, C.A. Rees, J.R. Pollack, D.T. Ross, H. Johnsen, L.A. Akslen, O. Fluge, A. Pergamenschikov, C. Williams, S.X. Zhu, P.E. Lonning, A.L. Borresen-Dale, P.O. Brown, and D. Botstein. 2000. Molecular portraits of human breast tumours. Nature 406:747-52.

Petricoin, E.F., 3rd, D.K. Ornstein, C.P. Paweletz, A. Ardekani, P.S. Hackett, B.A. Hitt, A. Velassco, C. Trucco, L. Wiegand, K. Wood, C.B. Simone, P.J. Levine, W.M. Linehan, M.R. Emmert-Buck, S.M. Steinberg, E.C. Kohn, and L.A. Liotta. 2002. Serum proteomic patterns for detection of prostate cancer. J Natl Cancer Inst 94:1576-8.

Puskas, L.G., A. Zvara, L. Hackler, Jr., and P. Van Hummelen. 2002. RNA amplification results in reproducible microarray data with slight ratio bias. Biotechniques 32:1330-4, 1336, 1338, 1340.

Quackenbush, J. 2006. Computational approaches to analysis of DNA microarray data. Methods Inf Med 45 Suppl 1:91-103.

Rajeevan, M.S., D.G. Ranamukhaarachchi, S.D. Vernon, and E.R. Unger. 2001. Use of real-time quantitative PCR to validate the results of cDNA array and differential display PCR technologies. Methods 25:443-51.

Ramaswamy, S., P. Tamayo, R. Rifkin, S. Mukherjee, C.H. Yeang, M. Angelo, C. Ladd, M. Reich, E. Latulippe, J.P. Mesirov, T. Poggio, W. Gerald, M. Loda, E.S. Lander, and T.R. Golub. 2001. Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci U S A 98:15149-54.

Ramdas, L., D.E. Cogdell, J.Y. Jia, E.E. Taylor, V.R. Dunmire, L. Hu, S.R. Hamilton, and W. Zhang. 2004. Improving signal intensities for genes with low-expression on oligonucleotide microarrays. BMC Genomics 5:35.

Rihn, B.H., S. Mohr, S.A. McDowell, S. Binet, J. Loubinoux, F. Galateau, G. Keith, and G.D. Leikauf. 2000. Differential gene expression in mesothelioma. FEBS Letters 480:95-100.

Robertson, G., M. Bilenky, K. Lin, A. He, W. Yuen, M. Dagpinar, R. Varhol, K. Teague, O.L. Griffith, X. Zhang, Y. Pan, M. Hassel, M.C. Sleumer, W. Pan, E.D. Pleasance, M. Chuang, H. Hao, Y.Y. Li, N. Robertson, C. Fjell, et al. 2006. cisRED: a database system for genome-scale computational discovery of regulatory elements. Nucleic Acids Res 34:D68-73.

72

Ronaghi, M., S. Karamohamed, B. Pettersson, M. Uhlen, and P. Nyren. 1996. Real-Time DNA Sequencing Using Detection of Pyrophosphate Release. Analytical Biochemistry 242:84-89.

Saghizadeh, M., D.J. Brown, J. Tajbakhsh, Z. Chen, M.C. Kenney, D.B. Farber, and S.F. Nelson. 2003. Evaluation of techniques using amplified nucleic acid probes for gene expression profiling. Biomol Eng 20:97-106.

Sanger, F., S. Nicklen, and A.R. Coulson. 1977. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A 74:5463-7.

Sarkans, U., H. Parkinson, G.G. Lara, A. Oezcimen, A. Sharma, N. Abeygunawardena, S. Contrino, E. Holloway, P. Rocca-Serra, G. Mukherjee, M. Shojatalab, M. Kapushesky, S.A. Sansone, A. Farne, T. Rayner, and A. Brazma. 2005. The ArrayExpress gene expression database: a software engineering and implementation perspective. Bioinformatics 21:1495-501.

Schena, M., D. Shalon, R.W. Davis, and P.O. Brown. 1995. Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray. Science 270:467-470.

Scherer, A., A. Krause, J. Walker, S. Sutton, D. Seron, F. Raulf, and M. Cooke. 2003. Optimized protocol for linear RNA amplification and application to gene expression profiling of human renal biopsies. Biotechniques 34:546 - 550.

Scherf, U., D.T. Ross, M. Waltham, L.H. Smith, J.K. Lee, L. Tanabe, K.W. Kohn, W.C. Reinhold, T.G. Myers, D.T. Andrews, D.A. Scudiero, M.B. Eisen, E.A. Sausville, Y. Pommier, D. Botstein, P.O. Brown, and J.N. Weinstein. 2000. A gene expression database for the molecular pharmacology of cancer. Nat Genet 24:236-44.

Schlingemann, J., O. Thuerigen, C. Ittrich, G. Toedt, H. Kramer, M. Hahn, and P. Lichter. 2005. Effective transcriptome amplification for expression profiling on sense-oriented oligonucleotide microarrays. Nucleic Acids Res 33:e29.

Schoor, O., T. Weinschenk, J. Hennenlotter, S. Corvin, A. Stenzl, H.G. Rammensee, and S. Stevanovic. 2003. Moderate degradation does not preclude microarray analysis of small amounts of RNA. Biotechniques 35:1192-6, 1198-201.

Schroeder, A., O. Mueller, S. Stocker, R. Salowsky, M. Leiber, M. Gassmann, S. Lightfoot, W. Menzel, M. Granzow, and T. Ragg. 2006. The RIN: an RNA integrity number for assigning integrity values to RNA measurements. BMC Mol Biol 7:3.

Sgroi, D.C., S. Teng, G. Robinson, R. LeVangie, J.R. Hudson, Jr., and A.G. Elkahloun. 1999. In Vivo Gene Expression Profile Analysis of Human Breast Cancer Progression. Cancer Res 59:5656-5661.

Sharan, R., A. Ben-Hur, G.G. Loots, and I. Ovcharenko. 2004. CREME: Cis-Regulatory Module Explorer for the human genome. Nucleic Acids Res 32:W253-6.

Shi, L., W. Tong, Z. Su, T. Han, J. Han, R.K. Puri, H. Fang, F.W. Frueh, F.M. Goodsaid, L. Guo, W.S. Branham, J.J. Chen, Z.A. Xu, S.C. Harris, H. Hong, Q. Xie, R.G. Perkins, and J.C. Fuscoe. 2005a. Microarray scanner calibration curves: characteristics and implications. BMC Bioinformatics 6 Suppl 2:S11.

Shi, L., W. Tong, H. Fang, U. Scherf, J. Han, R.K. Puri, F.W. Frueh, F.M. Goodsaid, L. Guo, Z. Su, T. Han, J.C. Fuscoe, Z.A. Xu, T.A. Patterson, H. Hong, Q. Xie, R.G. Perkins, J.J. Chen, and D.A. Casciano. 2005b. Cross-platform comparability of microarray technology: intra-platform consistency and appropriate data analysis procedures are essential. BMC Bioinformatics 6 Suppl 2:S12.

Shi, L., L.H. Reid, W.D. Jones, R. Shippy, J.A. Warrington, S.C. Baker, P.J. Collins, F. de Longueville, E.S. Kawasaki, K.Y. Lee, Y. Luo, Y.A. Sun, J.M. Willey, R.A. Setterquist, G.M. Fischer, W. Tong, Y.P. Dragan, D.J. Dix, F.W. Frueh, F.M. Goodsaid, et al. 2006. The MicroArray Quality Control (MAQC) project shows inter-

73

and intraplatform reproducibility of gene expression measurements. Nat Biotechnol 24:1151-61.

Sidhu, S., C. Gicquel, C.P. Bambach, P. Campbell, C. Magarey, B.G. Robinson, and L.W. Delbridge. 2003. Clinical and molecular aspects of adrenocortical tumourigenesis. ANZ J Surg 73:727-38.

Sidhu, S., D.J. Marsh, G. Theodosopoulos, J. Philips, C.P. Bambach, P. Campbell, C.J. Magarey, C.F. Russell, K.M. Schulte, H.D. Roher, L. Delbridge, and B.G. Robinson. 2002. Comparative genomic hybridization analysis of adrenocortical tumors. J Clin Endocrinol Metab 87:3467-74.

Sievertzon, M., C. Agaton, P. Nilsson, and J. Lundeberg. 2004. Amplification of mRNA populations by a cDNA tag strategy. Biotechniques 36:253-9.

Smyth, G. 2004. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology 3:Article 3.

Sorlie, T., R. Tibshirani, J. Parker, T. Hastie, J.S. Marron, A. Nobel, S. Deng, H. Johnsen, R. Pesich, S. Geisler, J. Demeter, C.M. Perou, P.E. Lonning, P.O. Brown, A.L. Borresen-Dale, and D. Botstein. 2003. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci U S A 100:8418-23.

Southern, E.M. 1975. Detection of specific sequences among DNA fragments separated by gel electrophoresis. J Mol Biol 98:503-17.

Spellman, P.T., G. Sherlock, M.Q. Zhang, V.R. Iyer, K. Anders, M.B. Eisen, P.O. Brown, D. Botstein, and B. Futcher. 1998. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell 9:3273-97.

Sporn, M.B. 1997. The war on cancer: a review. Ann N Y Acad Sci 833:137-46. Staunton, J.E., D.K. Slonim, H.A. Coller, P. Tamayo, M.J. Angelo, J. Park, U. Scherf, J.K.

Lee, W.O. Reinhold, J.N. Weinstein, J.P. Mesirov, E.S. Lander, and T.R. Golub. 2001. Chemosensitivity prediction by transcriptional profiling. Proc Natl Acad Sci U S A 98:10787-92.

Stoyanova, R., J.J. Upson, C. Patriotis, E.A. Ross, E.P. Henske, K. Datta, B. Boman, M.L. Clapper, A.G. Knudson, and A. Bellacosa. 2004. Use of RNA amplification in the optimal characterization of global gene expression using cDNA microarrays. J Cell Physiol 201:359-65.

Subramanian, A., P. Tamayo, V.K. Mootha, S. Mukherjee, B.L. Ebert, M.A. Gillette, A. Paulovich, S.L. Pomeroy, T.R. Golub, E.S. Lander, and J.P. Mesirov. 2005. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102:15545-50.

t Hoen, P.A., F. de Kort, G.J. van Ommen, and J.T. den Dunnen. 2003. Fluorescent labelling of cRNA for microarray applications. Nucleic Acids Res 31:e20.

Thompson, K.L., B.A. Rosenzweig, P.S. Pine, J. Retief, Y. Turpaz, C.A. Afshari, H.K. Hamadeh, M.A. Damore, M. Boedigheimer, E. Blomme, R. Ciurlionis, J.F. Waring, J.C. Fuscoe, R. Paules, C.J. Tucker, T. Fare, E.M. Coffey, Y. He, P.J. Collins, K. Jarnagin, et al. 2005. Use of a mixed tissue RNA design for performance assessments on multiple microarray formats. Nucleic Acids Res 33:e187.

Tong, W., A.B. Lucas, R. Shippy, X. Fan, H. Fang, H. Hong, M.S. Orr, T.-M. Chu, X. Guo, P.J. Collins, Y.A. Sun, S.-J. Wang, W. Bao, R.D. Wolfinger, S. Shchegrova, L. Guo, J.A. Warrington, and L. Shi. 2006. Evaluation of external RNA controls for the assessment of microarray performance. Nat Biotech 24:1132-1139.

Toronen, P., M. Kolehmainen, G. Wong, and E. Castren. 1999. Analysis of gene expression data using self-organizing maps. FEBS Lett 451:142-6.

74

Troyanskaya, O., M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani, D. Botstein, and R.B. Altman. 2001. Missing value estimation methods for DNA microarrays. Bioinformatics 17:520-5.

Tusher, V.G., R. Tibshirani, and G. Chu. 2001. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A 98:5116-21.

Wadenback, J., D.H. Clapham, D. Craig, R. Sederoff, G.F. Peter, S. von Arnold, and U. Egertsdotter. 2005. Comparison of standard exponential and linear techniques to amplify small cDNA samples for microarrays. BMC Genomics 6:61.

Valadkhan, S. 2005. snRNAs as the catalysts of pre-mRNA splicing. Current Opinion in Chemical Biology 9:603-608.

van Duin, M., H. Woolson, D. Mallinson, and D. Black. 2003. Genomics in target and drug discovery. Biochem Soc Trans 31:429-32.

Van Gelder, R.N., M.E. von Zastrow, A. Yool, W.C. Dement, J. Barchas, and J.H. Eberwine. 1990. Amplified RNA synthesized from limited quantities of heterogeneous cDNA. Science USA 87:1663-1667.

Wang, H., X. He, M. Band, C. Wilson, and L. Liu. 2005. A study of inter-lab and inter-platform agreement of DNA microarray data. BMC Genomics 6:71.

Wang, P., F. Ding, H. Chiang, R.C. Thompson, S.J. Watson, and F. Meng. 2002. ProbeMatchDB--a web database for finding equivalent probes across microarray platforms and species. Bioinformatics 18:488-489.

Wang, Y., M. Reed, P. Wang, J.E. Stenger, G. Mayr, M.E. Anderson, J.F. Schwedes, and P. Tegtmeyer. 1993. p53 domains: identification and characterization of two autonomous DNA-binding regions. Genes Dev 7:2575-86.

Vasselli, J.R., J.H. Shih, S.R. Iyengar, J. Maranchie, J. Riss, R. Worrell, C. Torres-Cabala, R. Tabios, A. Mariotti, R. Stearman, M. Merino, M.M. Walther, R. Simon, R.D. Klausner, and W.M. Linehan. 2003. Predicting survival in patients with metastatic kidney cancer by gene-expression profiling in the primary tumor. PNAS 100:6958-6963.

Watson, A., A. Mazumder, M. Stewart, and S. Balasubramanian. 1998. Technology for microarray analysis of gene expression. Curr Opin Biotechnol 9:609-14.

Venter, J.C., M.D. Adams, E.W. Myers, P.W. Li, R.J. Mural, G.G. Sutton, H.O. Smith, M. Yandell, C.A. Evans, R.A. Holt, J.D. Gocayne, P. Amanatides, R.M. Ballew, D.H. Huson, J.R. Wortman, Q. Zhang, C.D. Kodira, X.H. Zheng, L. Chen, M. Skupski, et al. 2001. The sequence of the human genome. Science 291:1304 - 1351.

Wernersson, R., and H.B. Nielsen. 2005. OligoWiz 2.0--integrating sequence feature annotation into the design of microarray probes. Nucl. Acids Res. 33:W611-615.

Wilson, I.M., J.J. Davies, M. Weber, C.J. Brown, C.E. Alvarez, C. MacAulay, D. Schubeler, and W.L. Lam. 2006. Epigenomics: mapping the methylome. Cell Cycle 5:155-8.

Wirta, V., A. Holmberg, M. Lukacs, P. Nilsson, P. Hilson, M. Uhlen, R.P. Bhalerao, and J. Lundeberg. 2005. Assembly of a gene sequence tag microarray by reversible biotin-streptavidin capture for transcript analysis of Arabidopsis thaliana. BMC Biotechnol 5:5.

Wit, E., and J. McClure. 2004. Microarray Statistics. 1 ed. John Wiley & Sons. Vlieghe, D., A. Sandelin, P.J. De Bleser, K. Vleminckx, W.W. Wasserman, F. van Roy, and

B. Lenhard. 2006. A new generation of JASPAR, the open-access repository for transcription factor binding site profiles. Nucleic Acids Res 34:D95-7.

Vogelstein, B., E.R. Fearon, S.R. Hamilton, S.E. Kern, A.C. Preisinger, M. Leppert, Y. Nakamura, R. White, A.M. Smits, and J.L. Bos. 1988. Genetic alterations during colorectal-tumor development. N Engl J Med 319:525-32.

75

Wrobel, G., J. Schlingemann, L. Hummerich, H. Kramer, P. Lichter, and M. Hahn. 2003. Optimization of high-density cDNA-microarray protocols by 'design of experiments'. Nucl. Acids Res. 31:e67-.

Yancopoulos, G.D., S. Davis, N.W. Gale, J.S. Rudge, S.J. Wiegand, and J. Holash. 2000. Vascular-specific growth factors and blood vessel formation. Nature 407:242-8.

Yang, I.V., E. Chen, J.P. Hasseman, W. Liang, B.C. Frank, S. Wang, V. Sharov, A.I. Saeed, J. White, J. Li, N.H. Lee, T.J. Yeatman, and J. Quackenbush. 2002. Within the fold: assessing differential expression measures and reproducibility in microarray assays. Genome Biol 3:research0062.

Yang, Y.H., S. Dudoit, P. Luu, and T.P. Speed. 2001. Normalization for cDNA microarray data. In Microarrays: optical technologies and informatics 4266:141 - 152.

Zeeberg, B.R., W. Feng, G. Wang, M.D. Wang, A.T. Fojo, M. Sunshine, S. Narasimhan, D.W. Kane, W.C. Reinhold, S. Lababidi, K.J. Bussey, J. Riss, J.C. Barrett, and J.N. Weinstein. 2003. GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol 4:R28.

Zhao, X., S. Nampalli, A.J. Serino, and S. Kumar. 2001. Immobilization of oligodeoxyribonucleotides with multiple anchors to microchips. Nucleic Acids Res 29:955-9.

Zhong, H., A.M. De Marzo, E. Laughner, M. Lim, D.A. Hilton, D. Zagzag, P. Buechler, W.B. Isaacs, G.L. Semenza, and J.W. Simons. 1999. Overexpression of hypoxia-inducible factor 1alpha in common human cancers and their metastases. Cancer Res 59:5830-5.

Zhu, B., G. Ping, Y. Shinohara, Y. Zhang, and Y. Baba. 2005. Comparison of gene expression measurements from cDNA and 60-mer oligonucleotide microarrays. Genomics 85:657-665.

microarray-based gene expression analysis in cancer research11426/fulltext01.pdf · 2008. 2....

Documents