
9/20/2011


GE, miRNA, Exon (non-NGS data) with Partek Genomics Suite 6.6

Jean Jasinski, Ph.D.

Field Application Scientist

Agenda

University of Minnesota, Tuesday, September 20, 2011

10:00 a.m. – noon: Gene Expression, miRNA, Exon, CNV, ChIP-chip

What’s new in PGS 6.6, Overview Array Assays & Analysis

12:00 – 1:00 Lunch Break (Pizza)

1:00 – 3:00 Partek Flow, PGS, and NGS analysis


Who is Partek?

• Founded in 1993

• Build tools for statistics & visualization

• Focused on genomics

• Thousands of customers worldwide

• Worldwide, world-class customer support

• We are growing – Job openings

What is Partek Genomics Suite™?

• Desktop software - no server required

• Graphic user interface - no scripting needed

• Designed for biologists & bioinformaticians

• Intuitive workflows

• Windows, Mac, Linux

• Competitively priced


ONE Software for…any assay

Gene Expression

RNA-seq

sRNA-seq

Exon

DNA-seq

miRNA

ChIP-chip

CN/LOH/ASCN

Methylation

ChIP-Seq

Partek® Genomics Suite™

ONE software …any technology

RT-PCR

Partek® Genomics Suite™


Partek GS™ Statistical Toolbox

• Powerful Statistics

• Parametric

• Non‐parametric

• ANOVA, Welch’s ANOVA, Repeated Measures ANOVA

• Fisher’s Exact test, chi‐square test

• Power Analysis

• Survival Analysis

• Model Prediction

• Correlation tests

• Multiple Test Corrections

• ..More

Overview of Layout


Tool Layout

Spreadsheet (unlimited size)

Summary

Help, Tutorials, and Software Updates

• Recorded presentations

• Tutorials with datasets

• White Papers


Webinar Title (Date)

• Statistical Concepts laying the foundation for Microarray & NGS Data Analysis (October 26, 2010): We will cover the statistical concepts at the core of microarray and NGS data analysis in Partek GS, from a biological perspective.

• Clinical Applications & Microarray Data Analysis (November 18, 2010): Learn about Classification algorithms, Survival Analysis, Biomarker identification and more.

• Statistical Tools for Next-Generation Sequencing Data Analysis (December 9, 2010): We will describe the statistical tools in Partek GS that help biologists extract relevant information from NGS data.

• iDEA Webinar (August 24, 2011): Analysis of a Multi-Assay Next Generation Sequencing Study

Partek Webinars

New in 6.6

• BAM file direct import (no .pdata)

• Hierarchical clustering

• Methylation Workflow (NGS)

• Profile Trellis

• Exporting Zipped Project

• Self Organizing Maps

Don’t be fooled by the “beta” version; upgrade today (esp. NGS users)


BAM file import (no more .pdata files)

Convert SAM to BAM

Sort & Index (PGS may write a new indexed BAM file; write permission is needed in the folder)

New Hierarchical Clustering

Copyright © 2011 Partek Incorporated. All rights reserved.


Gene Patterns by Category – Profile Trellis

SOM Fingerprinting – Look for Clustering Genes

• Expression levels are standardized

• Relative difference in gene expression is shown by color: red is high, blue is low, and green indicates no change

• For example, the lower left corner is high in the normal cell line MCF10A, but the same area is low in BT474


Exporting Zipped Projects

• Preserves parent/child relationship

• Includes annotation files

• Single file can be moved from one system to another (must unzip manually)

• Archive completed projects

Methylation Hilbert Curve

Densely Methylated

Centromere – no reads

End of data – padded with zeros

A way to view one-dimensional data in two-dimensional space.

Locality-preserving behavior: points close to each other in 1-D space are, on average, close to each other in 2-D space.
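To make the locality-preserving idea concrete, here is a minimal Python sketch of the standard index-to-coordinate conversion for a Hilbert curve. The function name `hilbert_d2xy` and the `order` parameter are illustrative choices, not part of Partek Genomics Suite, and how PGS bins methylation values along the curve is not described on the slide.

```python
def hilbert_d2xy(order, d):
    """Map a 1-D index d to (x, y) on a (2**order x 2**order) Hilbert curve."""
    n = 1 << order          # side length of the square grid
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so consecutive pieces join up
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# Genomic bins 0..63 laid out on an 8 x 8 grid
points = [hilbert_d2xy(3, d) for d in range(64)]
```

Neighboring bins along the chromosome always land in adjacent grid cells, which is why densely methylated stretches show up as compact patches.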


Gene Expression, microRNA, Exon, Copy Number, ChIP-ChIP, Methylation

Microarray Data Assays


Data File Types

File > Import:

• Text Files (.txt, .csv)

• Vendor formats: Affymetrix, Illumina, Agilent, Nimblegen, Nanostring

• GEO (Gene Expression Omnibus)

• ODBC Compliant Databases

• Partek Express Study File

• Excel – Save as .csv or .txt

Import – Affymetrix CEL files

Normalization Default - RMA

• Background correction

• Quantile Normalization

• Log2 Transformation

• Median Polish Probeset Summarization

Options (Customize…)

• Partek Defaults (mean summarization)

• Adjust for GC content or probe sequence

• Import Control probes

• Exclude/Include Probes
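Two of the RMA steps listed above, quantile normalization and the log2 transformation, can be sketched in a few lines of Python with NumPy. This is an illustration under simple assumptions (rows are probes, columns are samples, ties are broken arbitrarily), not Partek's implementation; the function names are hypothetical.

```python
import numpy as np


def quantile_normalize(intensities):
    """Force every sample (column) to share the same intensity distribution."""
    x = np.asarray(intensities, dtype=float)
    order = np.argsort(x, axis=0)              # row indices that sort each column
    ranks = np.argsort(order, axis=0)          # rank of each value within its column
    reference = np.sort(x, axis=0).mean(axis=1)  # mean of the k-th smallest values
    return reference[ranks]                    # replace each value by the reference at its rank


def log2_transform(intensities):
    """Log base 2 transformation of positive intensities."""
    return np.log2(np.asarray(intensities, dtype=float))


raw = np.array([[5, 4, 3], [2, 1, 4], [3, 4, 6], [4, 2, 8]], dtype=float)
normalized = log2_transform(quantile_normalize(raw))
```

After normalization every column contains the same set of values, only in a different order, so between-sample intensity differences caused by the scan are removed before the log transform.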


Import – Agilent Data

Import ‐ Illumina Expression Data

• Export Partek Project from BeadStudio or Genome Studio (.ppj file)

• Partek plug-in available for download through the website

• Tutorial available on website to install plug-in


Import – Nanostring Data

Files (rcc) imported

Normalization method

Background Subtraction

Import - Nimblegen Data

1) Import individual files

• .pair, .calls, .ngd, .txt

2) Import Nimblegen Project Directory

• Subdirectories

• Specify annotation file

• Choose Species


Import – Taqman RQ

• Import Ct values

• Set undetermined values

Import – SOLiD SAGE

• Counts are already mapped to known locations (transcripts)

• Transformation?


Workflow

• Import & Normalization

• QA/QC – Exploratory Analysis

• Analysis

• Visualization

• Biological Interpretation

• Genomic Integration

• Gene Expression with Copy Number

• MicroRNA Integration

Assign Sample Attributes

• There are many ways to assign sample information (treatments, phenotype, other clinical information)

1) From a “sampleInfo” file (Affymetrix Import)

2) By creating treatment/phenotype groups and dragging the samples into the appropriate group

3) By splitting apart the filename

4) By manually adding columns and filling them in (similar to Excel)


1) Assigning sampleInfo file during Import

.fmt format with associated .txt or binary (i.e. SampleInfo.txt.fmt)

There must be a “key” column which identifies the filename (exactly – cAsE sEnsITive)

1) Creating a sampleInfo File

Multiple Chip Types – 2 x 250K

ChipType1    ChipType2    Attribute1    Attribute2

SubjX_A1     SubjX_A2

NSP                   STY                   Type      Subject

CRL-2325D_NSP.CEL     CRL-2325D_STY.CEL     Normal    1

CRL-2324D_NSP.CEL     CRL-2324D_STY.CEL     Tumor     1

CRL-5957D_NSP.CEL     CRL-5957D_STY.CEL     Normal    2

CRL-5868D_NSP.CEL     CRL-5868D_STY.CEL     Tumor     2

CCL-256.1D_NSP.CEL    CCL-256.1D_STY.CEL    Normal    3

…….                   …….                   ……        ……
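Because the key columns must match the filenames exactly, including case, it can help to check a sampleInfo table before import. The sketch below assumes a tab-delimited text file with the NSP and STY key columns shown above; the function name `missing_files` is hypothetical and this check is not a Partek feature.

```python
import csv
import io


def missing_files(sample_info_text, key_columns, filenames):
    """Return key-column entries that have no exactly matching filename."""
    reader = csv.DictReader(io.StringIO(sample_info_text), delimiter="\t")
    present = set(filenames)          # set lookup is exact and case-sensitive
    missing = []
    for row in reader:
        for column in key_columns:
            if row[column] not in present:
                missing.append(row[column])
    return missing


sample_info = (
    "NSP\tSTY\tType\tSubject\n"
    "CRL-2325D_NSP.CEL\tCRL-2325D_STY.CEL\tNormal\t1\n"
    "CRL-2324D_NSP.CEL\tCRL-2324D_STY.CEL\tTumor\t1\n"
)
# The last file differs only in case, so it is reported as missing
files_on_disk = ["CRL-2325D_NSP.CEL", "CRL-2325D_STY.CEL",
                 "CRL-2324D_NSP.CEL", "crl-2324d_sty.cel"]
problems = missing_files(sample_info, ["NSP", "STY"], files_on_disk)
```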


2) Add attributes from existing column

2) Getting Sample Attributes From Filename

Specify what character(s) separate the factors

Specify factor names and type (e.g. categorical or numeric)

Example filename: “TisMap_Brain_01_v1_WTGene1.CEL”
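The same idea can be sketched outside the GUI: strip the extension, split the filename on the separator, and pair each piece with a factor name. The factor names used here (Study, Tissue, Replicate, Version, Array) are assumptions made for illustration; the slide does not name them.

```python
def attributes_from_filename(filename, names, separator="_"):
    """Split a filename into sample attributes, one per factor name."""
    stem = filename.rsplit(".", 1)[0]      # drop the .CEL extension
    parts = stem.split(separator)
    if len(parts) != len(names):
        raise ValueError(f"expected {len(names)} fields, found {len(parts)}")
    return dict(zip(names, parts))


attributes = attributes_from_filename(
    "TisMap_Brain_01_v1_WTGene1.CEL",
    ["Study", "Tissue", "Replicate", "Version", "Array"],  # hypothetical factor names
)
```

Each value comes back as text; a numeric factor would still need to be converted after splitting.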


3) Assigning Sample Attributes after Import

3) “Drag & Drop” Specification of Groups

Name the new attribute and all categories of that attribute.

Group samples by dragging and dropping

(e.g. attribute name is “TissueType”, and categories are “Tumor” and “Normal”)


Edit Sample Information

Factors vs. Response Variables

• There are 2 fundamental types of measurements per sample:

• Treatment/Phenotype information (factor variable)

• Any variable that describes the samples is a factor variable

• The organism’s “response” to treatment (response variable)

FACTORS RESPONSE


Fixed vs. Random Variables (mixed model ANOVA)

Imagine if you repeated the experiment 20 years from now, would the same levels of each factor be used again?

• Tissue – Yes, the same tissues would be used – Fixed effect

• Gender – Yes, the same genders would be used again – Fixed effect

• Scan Date / Subject – No, samples would be taken from other subjects – Random effect (red)

Exploratory Analysis


Exploratory Analysis, PCA

• Points close together are similar across genome

• Outliers?

• Batch effect?

• Help set up ANOVA model

• Edit Plot Properties

Linear transformation that converts n original variables into 3 dimensions
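As a rough illustration of that transformation, the sketch below projects a samples-by-probesets matrix onto its first three principal components using a singular value decomposition. It assumes rows are samples and columns are probesets; the function name `pca_scores` is hypothetical and the scaling options PGS applies are not shown on the slide.

```python
import numpy as np


def pca_scores(data, n_components=3):
    """Return sample coordinates on the first principal components."""
    x = np.asarray(data, dtype=float)
    centered = x - x.mean(axis=0)                 # center each variable
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / (s ** 2).sum()         # fraction of variance per component
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
expression = rng.normal(size=(6, 50))             # 6 samples, 50 probesets (simulated)
scores, explained = pca_scores(expression)
```

Each row of `scores` is one point in the 3-D PCA plot; samples with similar profiles across all probesets end up close together.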

Exploratory Analysis

View>Box & Whiskers>Rows(Response)

Outliers? Normal Distribution?


Finding Differentially Expressed Genes: ANOVA Model

• ANOVA – measure the effects of multiple experimental factors (phenotypes) on expression levels

• Assumptions

• Normal Distribution

• Approximately Equal Variance between groups

• Sample Independence

• Balanced & unbalanced | random & fixed effects (mixed-model ANOVA) | nested factors | any number of categorical effects or numeric values | interactions

• P-values & F ratio are displayed by default

• Contrasts & Fold Change

• Batch effect
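For a single probeset and a single fixed factor, the F ratio reported by an ANOVA can be computed by hand. The sketch below covers only that one-way case (no random effects, nesting, or interactions), so it is a simplification of the models PGS supports; the function name `one_way_f` is hypothetical.

```python
import numpy as np


def one_way_f(values, groups):
    """F ratio: between-group mean square over within-group mean square."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    grand_mean = values.mean()
    ss_between = 0.0
    ss_within = 0.0
    for label in labels:
        member = values[groups == label]
        ss_between += member.size * (member.mean() - grand_mean) ** 2
        ss_within += ((member - member.mean()) ** 2).sum()
    df_between = labels.size - 1
    df_within = values.size - labels.size
    return (ss_between / df_between) / (ss_within / df_within)


# log2 expression of one probeset in three tumor and three normal samples
f_ratio = one_way_f([1, 2, 3, 4, 5, 6], ["N", "N", "N", "T", "T", "T"])
```

In a real analysis the same calculation is repeated for every probeset on the chip, and the p-value follows from the F distribution with the two degrees of freedom above.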

Differential Expression

• ANOVA will partition variability due to the factors in the model

• Test on every gene/probeset on chip

• Cross Tabs – balance of samples

• Advanced Tab for mixed-model ANOVA (model statistics, REML)


Contrasts / Pair-wise Comparisons

• Select Factor / Interaction drop down

• Log2 transformed?

• Keep control consistency

• Report other statistics

• Contrasts added

• Fold change, ratio
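When the data are log2 transformed, a contrast's ratio is obtained by exponentiating the difference of the group means. The sketch below also returns a signed fold change in which ratios below 1 are reported as negative reciprocals; treating that as the convention used in the report is an assumption here, and the function name `fold_change` is hypothetical.

```python
def fold_change(mean_log2_a, mean_log2_b):
    """Ratio and signed fold change of group A versus group B from log2 means."""
    ratio = 2.0 ** (mean_log2_a - mean_log2_b)
    # Assumed convention: down-regulation is shown as a negative reciprocal
    signed = ratio if ratio >= 1 else -1.0 / ratio
    return ratio, signed


up = fold_change(10, 8)      # A is 4 times higher than B
down = fold_change(8, 10)    # A is 4 times lower than B
```

Keeping the control group on the same side of every contrast makes the sign of the fold change comparable across contrasts.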

Results of ANOVA


Plot Sources of Variation

Examine each factor’s contribution to variability in the response variables

Error is the amount of variability NOT explained by the model

Columns taller than Error column may be significant

Mean or Median

Create Gene List

• Each factor/contrast of model is listed

• Cutoff for FDR p-value & fold change

• # of genes that pass

• Configure default settings

• Save list or temporary

• Advanced tab

Tools > List Manager


Multiple Test Corrections

• FDR (False Discovery Rate) is the proportion of false positives among all positives.

• Partek implements the “step up” FDR method by default (Benjamini & Hochberg, 1995).

• Additional methods – “step down” FDR, q‐Value, Dunn‐Sidak, Bonferroni

From lenient to restrictive: Uncorrected p-value → FDR (Step Up/Down & q-Value) → FWER (Bonferroni/Dunn-Sidak)
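The three families of correction can be compared with a short sketch. The step-up procedure below follows Benjamini & Hochberg (1995) as usually described; the function names are hypothetical and this is not Partek's code.

```python
import numpy as np


def bh_step_up(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values (FDR)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Walk from the largest p-value down so adjusted values never decrease with rank
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    result = np.empty(m)
    result[order] = np.minimum(adjusted, 1.0)
    return result


def bonferroni(pvalues):
    """Bonferroni adjusted p-values (FWER)."""
    p = np.asarray(pvalues, dtype=float)
    return np.minimum(p * p.size, 1.0)


def dunn_sidak(pvalues):
    """Dunn-Sidak adjusted p-values (FWER)."""
    p = np.asarray(pvalues, dtype=float)
    return 1.0 - (1.0 - p) ** p.size


raw = [0.01, 0.04, 0.03, 0.005]
fdr = bh_step_up(raw)
fwer = bonferroni(raw)
```

For the same raw p-values the FDR-adjusted values are never larger than the Bonferroni-adjusted ones, which is the lenient-to-restrictive ordering shown above.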

Hierarchical Clustering (6.5 style)

• Parent/child spreadsheet relationship

• Change plot properties

• Color, labels, text legend

*Change default color scheme: Edit > Preferences > Colors tab


GO Enrichment

microRNA


miRNA

• Short ~22 nucleotide sequences that bind to complementary sequences in the 3’ UTR of multiple target mRNAs

• Important post-transcriptional regulators of gene expression – usually silencing

• Even diagnostic of some conditions

• Each microRNA can target hundreds of mRNAs

• miRNA analysis allows a more complete view of the biological system

microRNA Workflow

• Import - Vendor neutral support

• QA/QC & Exploratory Analysis

• Analysis – Diff. Expressed microRNAs

• Easy integration of GX & microRNA

1. Combine miRNA analysis with GX

2. Find miRNAs targeting genes of interest

3. Find miRNAs which correlate with targets

*Also available as an option in the GX workflow


miRNA Analysis

Import


microRNA Properties

• miRNA analysis is similar to gene expression analysis

• After importing the data, make sure the species is defined

• File > Properties

• If the marker ID is the miRNA name, the annotation file is optional

Normalization

• As the “best” normalization for miRNA data is still under discussion, here are a few normalizations available in Partek

• Quantile normalization or full RMA

• Normalization to control probes/genes

• Normalization to 3rd quartile

• Loess

• Mathematical transformations such as logarithm


Exploratory analysis (QA/QC)

• PCA analysis, histogram analysis, clustering and much more

• Visualize natural sample grouping

• Distribution of data

• Preliminary analysis for ANOVA

Statistical analysis

• ANOVA to find differentially expressed microRNAs

• Mixed model, any number of numeric/categorical factors

• Assumptions of ANOVA met

• Pair-wise comparisons


microRNA Target Databases

Other: Not listed (e.g., PicTar)

Custom Format: Tab-delimited file of microRNA-mRNA target pairs

mir     mrna

mir1    GENE1

….      …

Integrative Genomics

Data is associated using target prediction databases:

• By default: TargetScan or miRBase

Partek supports three integration experiment designs:

1. Separate GX and miRNA analysis (no statistics): combine the results of the two analyses

2. Differential gene expression analysis only: find miRNAs which target changed genes

3. Paired GX and miRNA analysis: correlate the expression of genes and the miRNAs which target them


Separate GX and miRNA analysis

1. Separate gene expression and miRNA analysis

What do I need? Two datasets: gene expression and miRNA. These do not have to be done on the same samples.

What will I get? A combination of the statistical results from both analyses.

What am I testing? Are my significantly changed genes the target of significantly changed miRNAs?

Combine miRNAs with their mRNA Targets

• Choose differentially expressed microRNA spreadsheet

• Select column with microRNAs

• Choose mRNA result spreadsheet (ANOVA or differentially expressed genes)

• Select Gene Symbol
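Conceptually, combining the two result lists is a join through the target database. The sketch below reads target pairs from a tab-delimited table with `mir` and `mrna` columns, as in the custom format, and keeps the pairs where both the microRNA and the gene are in the significant lists. The function names and the example identifiers are hypothetical.

```python
import csv
import io


def load_target_pairs(handle):
    """Read (microRNA, gene) pairs from a tab-delimited file with mir/mrna columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [(row["mir"], row["mrna"]) for row in reader]


def combine(pairs, significant_mirnas, significant_genes):
    """Keep target pairs where both members were called significant."""
    mirnas = set(significant_mirnas)
    genes = set(significant_genes)
    return [(mir, gene) for mir, gene in pairs if mir in mirnas and gene in genes]


target_file = io.StringIO("mir\tmrna\nmir1\tGENE1\nmir1\tGENE2\nmir2\tGENE3\n")
pairs = load_target_pairs(target_file)
matches = combine(pairs, ["mir1"], ["GENE2", "GENE3"])
```

No new statistics are computed in this design; the output simply places the existing miRNA and gene results side by side for each matched pair.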


1. Separate GX and miRNA analysis

Use case: I have run a gene expression experiment and a miRNA experiment. I want to see if my miRNAs of interest correspond with my genes of interest.

If you do not yet have gene expression data, you can get a list of gene targets of miRNAs

Putative Targets

Get all the Targets of the microRNAs


Differential gene expression analysis

2. Differential gene expression analysis only

What do I need? Any gene expression experiment can be used.

What will I get? microRNAs that target a disproportionately high number of significant mRNAs.

What am I testing? Are my significantly changed genes the target of specific miRNAs?

2. Over-represented microRNA targets: to test if significantly changed genes are the target of specific miRNAs

Fisher’s Exact test – the p‐value indicates the over-representation of targets within the gene list
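The enrichment p-value for one microRNA can be sketched as the upper tail of a hypergeometric distribution, which is the one-sided form of Fisher's Exact test. Whether PGS reports a one-sided or two-sided value is not stated on the slide, so the one-sided choice is an assumption; the function and argument names are hypothetical.

```python
from math import comb


def enrichment_p(hits, list_size, targets, genes):
    """P(at least `hits` targets in a random gene list of `list_size`).

    hits      -- significant genes that are targets of the microRNA
    list_size -- number of significant genes
    targets   -- number of targets of the microRNA among all genes
    genes     -- total number of genes tested
    """
    total = comb(genes, list_size)
    upper = min(list_size, targets)
    tail = sum(
        comb(targets, i) * comb(genes - targets, list_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# 5 of 5 significant genes are targets, with 5 targets among 10 genes overall
p = enrichment_p(5, 5, 5, 10)
```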


Differential Expression Results

The smaller the enrichment p-value, the more over-represented the significant genes are among a microRNA’s targets

miR-124 has been found to be the most abundant microRNA expressed in neuronal cells

Paired GX and miRNA analysis

3. Paired GX and miRNA analysis:

What do I need? A gene expression and a miRNA analysis using the same samples in both assays. Sample IDs must match.

What will I get? The correlation of miRNAs with their targets.

What am I testing? Does miRNA abundance affect mRNA abundance?


Paired GX and miRNA analysis

• Results:

• Negative correlation = high level of microRNA associated with low expression of targeted gene

• Positive correlation = high level of microRNA associated with high expression of targeted gene

Each row is a miRNA with its targeted mRNA pair
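For one miRNA and one target gene measured on the same samples, the correlation in each row can be illustrated with a Pearson coefficient. Which coefficient PGS computes is not stated on the slide, so Pearson is an assumption; the function name and the values are made up for the example.

```python
import numpy as np


def pair_correlation(mirna_levels, mrna_levels):
    """Pearson correlation between a miRNA and its target across matched samples."""
    return float(np.corrcoef(mirna_levels, mrna_levels)[0, 1])


# Same four samples in both assays, in the same order
mirna = [1.0, 2.0, 3.0, 4.0]
target = [8.0, 6.0, 4.0, 2.0]
r = pair_correlation(mirna, target)   # negative: more miRNA, less target mRNA
```

The two vectors must be ordered by the same sample IDs, which is why matching IDs are required for this design.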

Correlation Scatter Plot

<Right+Click> row header



Workflow

Gene Level Analysis

(summarize to genes)

Alternative Splicing Analysis

(Exon level analysis)

*Import through workflow will be Exon Level analysis

Import


Gene Level analysis – Summarize Exons

Differential Expression – Gene Level

• Summarize exons to genes

• ANOVA

• Detect differential gene expression


Alternative splicing

[Figure: expression of Exon A and Exon B in groups D and N, with panels “Differential Expression” (No Alt Splice) and “Alternative Splicing”]

Alternative splicing analysis – ANOVA

Specify ANOVA model

Filter out non‐expressing probes


Three Views into Expression

Exon Level

Gene Level

Differentially Expressed Exons

Differentially Expressed Genes

Alt Splice Candidates

Gene View

Access from Alt-Splice Result Spreadsheet

Filtered - translucent



What is copy number variation?

• Copy Number Variation (CNV) is a segment of DNA in which copy number differences have been found by comparison of two or more genomes.

• CNV is caused by genomic rearrangements such as deletions, duplications, inversions and translocations of particular genetic regions

• CNV can be detected by using a collection of closely spaced genomic markers to measure the abundance of DNA across multiple samples and comparing it against a reference.

• Range: 1 kb to several Mb

Amp Del

Normal


Standard Copy Number Processing Workflow

Copy Number/LogRatio

Detect regions on each sample

Analysis on regions across samples

Import from Affymetrix, Agilent, Illumina, NimbleGen, etc…

Find genes that overlap with regions

Import Allele Intensity (Affy .cel files)

Genomic integration

Biological interpretation

Visualize data at any of the steps

Paired/Unpaired Copy Number Creation

PAIRED

• Two samples (case/control) taken from each subject

• The normal sample is the baseline for the case sample for each subject

• Output copy number values only for the case sample in each subject

UNPAIRED

Affy: SNP6, SNP5, 100K, 500K

Illumina: 1M, Omni1‐Quad


Baseline Choices

Scale: better ability to detect true copy number; more robustness to sources of noise

• Paired: DNA and reference from same patient

• Unpaired, Experimental Reference: reference from similar samples run in same lab

• Unpaired, Lab Reference: reference from larger unrelated group of samples run in same lab

• Unpaired, Universal Reference: large HapMap baseline run in third party lab

Import Copy Number Assays

Agilent

• Feature Extraction

• Choose LogRatio to import

• Change the log base to 2

Nimblegen

• .pair in raw data folder (LOESS – one color to the other), or

• *normalized.txt in processed data folder (output – corrected logratio)

Illumina

• Partek plug‐in for GenomeStudio

• LogRRatio (allele intensities)

• B allele frequency

• Genotype calls

Affymetrix

• .CEL (allele intensities)

• .CHP (genotype calls)

• Create Allele Ratio
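The Agilent step of changing the log base to 2 is a single division, since log2(x) = log10(x) / log10(2). A minimal sketch, assuming the imported LogRatio column is in base 10 (the function name is hypothetical):

```python
import math


def log10_ratio_to_log2(log10_ratio):
    """Convert a base-10 log ratio to a base-2 log ratio."""
    return log10_ratio / math.log10(2)


# A two-fold gain has log10 ratio of about 0.301 and log2 ratio of 1
converted = log10_ratio_to_log2(math.log10(2))
```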


Import Affymetrix MIP Chip

File > Import > Affymetrix > MIP Copy Number text file

• Choose input file, annotation file

• Three files to choose from: ASCN, Total Copy Number, Allele Ratio

• Values pre‐calculated by Affymetrix

• Must adjust Copy Number values below zero to use analysis options

• Set values below zero to small number

Choose Sample ID

• Necessary for integration with other integrative analysis (e.g., LOH, gene expression)

• Not recommended to use file name because of the different extension (i.e. CEL & CHP)


How to find regions of CNV? (Amplifications & Deletions)

• Monitor trends across multiple adjacent markers

• Define chromosomal breakpoints where these trends in chromosomal abundance change

• Methods in Partek:

• Hidden Markov Model

• Genomic segmentation

2 “normal”

Partek Genomic Segmentation

• Find a breakpoint that produces different neighboring regions

Segmentation Parameters

• Specify minimum number of genomic markers

• Two-sided t‐test to compare two neighboring regions

• Whether to insert a breakpoint is decided based on the significance and the amount of change

Region Report

• Two one-sided t‐tests compare the mean of the region with the expected range to determine aberration status

• Expected range: the range around each expected copy number. In a diploid region, the expected range would be 2 ± 0.3, which is from 1.7 to 2.3.
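The expected-range idea can be illustrated with a deliberately simplified check that compares only the region mean with the range 2 ± 0.3. It omits the one-sided t‐tests that the region report uses, so it shows the thresholds, not the statistics; the function name and the status labels are hypothetical.

```python
def aberration_status(region_mean, expected=2.0, tolerance=0.3):
    """Classify a region by its mean copy number relative to the expected range."""
    if region_mean > expected + tolerance:
        return "amplification"
    if region_mean < expected - tolerance:
        return "deletion"
    return "unchanged"


statuses = [aberration_status(mean) for mean in (2.6, 2.2, 1.5)]
```

A region with mean 2.2 falls inside the range 1.7 to 2.3 and is not called, while 2.6 lies above it.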

Signal to Noise: neighboring regions at 2.6 and 2.2, (2.6 - 2.2) = 0.4 > 0.3


HMM vs Segmentation

• HMM - Good on homogeneous samples with anticipated states (copy number)

• Segmentation ‐ Good for heterogeneous samples when you don’t know the copy number state

Segmentation Result Spreadsheet

• One row per segment per sample

• First 3 columns are the genomic location: chromosome, start, end

• Copy Number status is based on the report parameters


GC Wave Correction on Copy Number

Adjust copy number/logratio based on local GC content

• Need reference genome in .2bit format

Diskin et al., Adjustment of genomic waves in signal intensities from whole genome SNP genotyping platforms, Nucleic Acids Res., 2008, 36: 19

Analyze Detected Segments

[Figure: regions detected in samples 1 to 5 are combined into result regions]

• Analyze regions across multiple samples

• Detect changes by phenotype

• Unbalance between samples will indicate increased significance between levels

• Can have minimum # of probes less than the segmentation default


Segment‐analysis Spreadsheet

Copy number status information among all the samples

Detect changes by category (i.e. phenotype) ‐ Chi‐square results

Detailed information on each sample for each region (i.e. average copy number)

Plot Detected Regions

Karyoview (Histogram View): sample frequency on aberration regions

Classification View: view each region in each sample separately


Create Region List

• Specify criteria to filter down to interesting regions based on:

• p‐value

• length

• number of markers

• chromosome

• number of aberration samples

Find Overlapping Genes

• Annotations can be attached onto any region spreadsheet

• The annotation source can be specified or mapped to custom annotations

• Region can be extended on both ends

• Output to a new gene list spreadsheet or to a new column in the selected region spreadsheet


Gene Overlap Result

Output to a new gene list spreadsheet or to a new column in the selected region spreadsheet

• Region Overlap: the intersection in base pairs divided by the size of the region

• Gene Overlap: the intersection in base pairs divided by the size of the gene
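The two overlap fractions can be worked through with a small sketch. It assumes start and end coordinates on the same chromosome with the end position excluded; the coordinate convention PGS uses is not stated on the slide, and the function name is hypothetical.

```python
def overlap_fractions(region, gene):
    """Return (region overlap, gene overlap) for two (start, end) intervals."""
    region_start, region_end = region
    gene_start, gene_end = gene
    # Length of the shared stretch, or zero if the intervals do not meet
    shared = max(0, min(region_end, gene_end) - max(region_start, gene_start))
    region_overlap = shared / (region_end - region_start)
    gene_overlap = shared / (gene_end - gene_start)
    return region_overlap, gene_overlap


inside = overlap_fractions((100, 200), (120, 160))     # gene fully inside the region
straddling = overlap_fractions((100, 200), (150, 250))  # gene crosses the region boundary
```

A gene that lies fully inside a region has a gene overlap of 1, while a gene that crosses the region boundary has a gene overlap below 1.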

Right click on a row header to browse to location

Finding Genes on Breakpoints

• Add refseq genes to shared segments in a new sheet

• Any gene with a “gene overlap” of less than 1 is a potential breakpoint gene

• Depending on where the breakpoint overlaps the gene, these could be fusion genes and could possibly be “driving” a phenotype

Amplification


Test for Known Abnormalities

• Input files:

• Filtered segmentation/HMM result spreadsheet

• Abnormality database

• If overlap = positive

• Output: each row is a feature tested in each sample

Cluster Genome

Copy number spreadsheet is used to verify how the samples are clustered on the whole genome or selected chromosomes

• default is showing cluster on chromosome 1

• click Show All button to cluster on the whole genome

• combine left and right click on chromosome number to select chromosomes


Chromosome View

Copy Number & LOH

• Regions of Copy Number Amplification and Deletion

• Regions of LOH

• Combined regions of CN and LOH

• Overlapping genes


LOH & CNLOH

[Figure: regions labeled Amplification, Deletion, Del w/ LOH, Copy neutral LOH, and Amp w/ LOH]



ChIP-ChIP

• Import

• Goal: Detect regions enriched by transcription factor chromatin immunoprecipitation (ChIP)

• Motif Discovery

• Find Genes that overlap enriched regions

Import, normalization

Affymetrix

• Adjust for probe sequence

• RMA background correction

• Quantile normalization

• Log base 2 transformation

Agilent

• Feature Extraction Files

• LogRatio

• Transform Log(10) to Log(2)

Nimblegen

• .pair, .ngd, .pos

• define species

• choose annotation

More dense assays / Less dense assays


ANOVA or t-test

ANOVA

• More complex model

• Batch effect

• Remember to calculate t‐statistic

T‐test

• 1 factor model

• Faster

• Typical experiments only have 1 comparison

• Result is t‐statistic

The MAT algorithm (Johnson et al., PNAS, 2006) is used to find regions of binding in tiling experiments:

1. Estimating probe level t‐statistics

2. Using the trimmed mean of probe‐level t‐statistics in a window of fixed genomic length to generate MAT scores

3. An empirical distribution is used to determine MAT score significance by sampling windows from the original data

4. After identifying regions of a specified target length as significant, combine with other close regions
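Step 2 can be illustrated with a sliding window that takes the trimmed mean of probe-level t‐statistics around each probe. This sketch stops at the trimmed mean and leaves out any further scaling, the empirical significance step, and region merging; the 10% trim and the function names are assumptions, not values from the slide.

```python
def trimmed_mean(values, trim=0.1):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    drop = int(len(ordered) * trim)
    kept = ordered[drop:len(ordered) - drop] if drop else ordered
    return sum(kept) / len(kept)


def window_scores(positions, t_stats, window):
    """Trimmed mean of t-statistics within a fixed-length window centered on each probe."""
    scores = []
    for center in positions:
        inside = [t for pos, t in zip(positions, t_stats)
                  if abs(pos - center) <= window / 2]
        scores.append(trimmed_mean(inside))
    return scores


probe_positions = [0, 10, 20, 100]
probe_t = [1.0, 2.0, 3.0, 10.0]
scores = window_scores(probe_positions, probe_t, window=20)
```

Trimming makes a window score less sensitive to a single extreme probe than a plain mean would be.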

Detect regions of Significance(MAT)

MAT


MAT Algorithm Results

P‐value (region): the p‐value of the most significant window in the region

Fraction of negatively enriched: proportion of false positive probes in a region = (# of probes not significant) / (# of probes in reported region). Regions with a high value are less confident or may be caused by a large number of false positives.

MAT‐score: maximum MAT score for this region

• pos(+) = trimmed mean of t‐statistics from specified contrast was positive

• neg(‐) = trimmed mean of t‐statistics from specified contrast was negative

Create Region List

• Positive MAT score means one group is enriched over another

• Filter by MAT score > 0

• Filter by p‐value

• Intersection between two region lists


Motif Discovery

De Novo: Gibbs Motif Sampler

Known: JASPAR Database

Find Overlapping Genes

*PAZAR coming soon


Distance of Methylation to TSS

Thank you for your attention

Hungry???
