Pathway Analysis - Software Tools and Services

Post on 16-Feb-2022

Page 1: Pathway Analysis - Software Tools and Services

projects.bigcat.unimaas.nl http://projects.bigcat.unimaas.nl/ncsb2013/material/pathway-analysis/

Pathway Analysis

Tutorial 1: Introduction WikiPathways and PathVisio

In this first tutorial you will be introduced to WikiPathways and PathVisio. PathVisio uses the complete analysis pathway collection of WikiPathways. The pathways that are tagged as analysis pathways are part of this collection.

Step 1: Find a pathway in WikiPathways

First, go to wikipathways.org and search for Mitochondrial LC-Fatty Acid Beta-Oxidation in the search box (Figure 1.1). Second, narrow the results to the rat Mitochondrial LC-Fatty Acid Beta-Oxidation pathway by selecting the correct species (Figure 1.2). Finally, click on the Mitochondrial LC-Fatty Acid Beta-Oxidation pathway; it will be displayed in full screen (Figure 1.3).

Q1: What is the identifier of the pathway? Hint: Have a look at the web address of the pathway.

Figure 1.1: Search WikiPathways

Figure 1.2: Select species

Figure 1.3: Display pathway in WikiPathways

Step 2: Download a pathway in WikiPathways in gpml format

In WikiPathways you can save pathways in different formats, for example as PDF or PNG. Another option is GPML, the format used by PathVisio. At the bottom of the Mitochondrial LC-Fatty Acid Beta-Oxidation pathway page in WikiPathways you will find a download button. Click this button and save the pathway as GPML (see Figure 2).

Figure 2: Save pathway in GPML format

Step 3: Start PathVisio

Note: If you already installed PathVisio as instructed, you can skip step 3a.

a. Copy the directory PathVisio-3.1.0 from the provided USB stick onto your laptop.

If you want to install PathVisio at home, you can download it from http://www.pathvisio.org/downloads/ (download the binary installation to use PathVisio offline).

b. Start PathVisio by executing the pathvisio.bat file (Windows) or the pathvisio.sh file (Linux / Mac OS X) in the PathVisio-3.1.0 directory (Fig. 3.1).

c. PathVisio will now start all modules (Fig. 3.2), and the PathVisio main window will open (Fig. 3.3).

Figure 3.1: Start PathVisio

Figure 3.2: PathVisio will start all modules

Figure 3.3: PathVisio opens with an empty pathway view

Step 4: Select the ID mapping database

The pathway you downloaded is a rat pathway. As mentioned in the morning lecture, you need an ID mapping database to be able to recognize the genes in the pathway. The rat ID mapping database (Rn_Derby_20120602.bridge) is available on the USB stick in the pathways directory. To select the gene database in PathVisio, go to Data -> Select Gene Database and select the Rn_Derby_20120602.bridge file. The selected rat gene database file is now displayed in the bottom panel (see Figure 4).

Figure 4: Selected rat gene database

Step 5: Open the downloaded pathway in PathVisio

To open the downloaded pathway, go to File -> Open and select the downloaded pathway in GPML format. Figure 5 shows the pathway in PathVisio.

Figure 5: Downloaded pathway opened in PathVisio

Step 6: Select a gene and study side panel

Click on the Cpt1a gene box in the opened rat Mitochondrial LC-Fatty Acid Beta-Oxidation pathway, see Figure 6.1. The Backpage tab in the panel on the right-hand side shows the annotation of the gene and the available cross references.

Go back to the pathway and double-click on the Cpt1a gene box. A DataNode properties panel now opens, showing the annotation, literature and comments, see Figure 6.2.
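Because GPML is an XML format, the annotation shown in the DataNode panel can also be read with a few lines of Python. The sketch below is illustrative only: the trimmed sample document, the gene labels and the placeholder identifiers are made up, and it assumes that each DataNode element carries a TextLabel attribute and an Xref child with Database and ID attributes. Check a real .gpml file before relying on this layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed GPML fragment. A real file contains many more
# elements and attributes; the labels and identifiers here are placeholders.
SAMPLE_GPML = """<Pathway xmlns="http://pathvisio.org/GPML/2013a" Name="Example">
  <DataNode TextLabel="GeneA" GraphId="a1" Type="GeneProduct">
    <Xref Database="Ensembl" ID="PLACEHOLDER_ID_1" />
  </DataNode>
  <DataNode TextLabel="GeneB" GraphId="b2" Type="GeneProduct">
    <Xref Database="Entrez Gene" ID="PLACEHOLDER_ID_2" />
  </DataNode>
</Pathway>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def list_annotations(gpml_text):
    """Return (label, database, identifier) for every DataNode element."""
    root = ET.fromstring(gpml_text)
    rows = []
    for element in root.iter():
        if local_name(element.tag) != "DataNode":
            continue
        # Find the Xref child, if the node has one.
        xref = next((c for c in element if local_name(c.tag) == "Xref"), None)
        database = xref.get("Database", "") if xref is not None else ""
        identifier = xref.get("ID", "") if xref is not None else ""
        rows.append((element.get("TextLabel", ""), database, identifier))
    return rows


annotations = list_annotations(SAMPLE_GPML)
for label, database, identifier in annotations:
    print(label, database, identifier)
```

To try this on the downloaded pathway, read the file contents into a string and pass it to list_annotations.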

Q2: Which identifier and database are used to annotate the Scp2 gene in the rat Mitochondrial LC-Fatty Acid Beta-Oxidation pathway?

Figure 6.1: Selected Cpt1a gene box + backpage

Figure 6.2: DataNode panel

Tutorial 2: Data Visualization and Analysis in PathVisio

In this tutorial you are going to perform pathway analysis in PathVisio to help biological interpretation. You are going to:

Search for regulated pathways that might be relevant to study in more detail.

Visualize your data on a pathway diagram so you can explore the data in a biological context.

Outline

Dataset description

Step 1: Pathways and identifier mapping databases

Step 2: Data file preparation for PathVisio

Step 3: Import the data into PathVisio

Step 4: Create a visualization by coloring logFC and p.value

Step 5: Search for regulated pathways

Transcriptomics data set

Background

The transcriptomics data set is published and the data is available via ArrayExpress, see E-MTAB-797. A subset of the Toxicogenomics Project, a 5-year collaborative project (2002-2007) by a consortium comprising the Japanese government and several pharmaceutical companies, was selected. This project produced a large-scale database of transcriptomics and pathology data potentially useful for predicting the toxicity of new chemical entities. Conventional in vivo toxicology data was collected from single-dose and repeat-dosing studies on rats, and gene expression was measured for the liver.

Paper: Takeki Uehara, Atsushi Ono, Toshiyuki Maruyama, Ikuo Kato, Hiroshi Yamada, Yasuo Ohno, Tetsuro Urushidani. The Japanese toxicogenomics project: application of toxicogenomics. Mol Nutr Food Res. 2010, 54(2):218-27 [PubMed:20041446]

Be aware that this paper describes the Toxicogenomics Project as a whole. The ArrayExpress entry page gives a better description of the experimental setup. A detailed description of the study design can be found on the ArrayExpress website, see protocols.

Description of selected transcriptomics samples

Animal model

Hepatocytes of 6-week-old male Sprague-Dawley rats were treated for 8 hours with 30 micromolar fenofibrate. Fenofibrate is an activator (agonist) of peroxisome proliferator-activated receptor (PPAR) alpha. Fenofibrate was added to the medium directly or as a 1,000X stock solution in DMSO. Cells were exposed to the compound for 8 hr before collection. After compound exposure, the hepatocytes were lysed with RLT buffer and collected for expression profiling.

NOTE: The fenofibrate treatment was part of a large screen of many toxicological compounds. Fenofibrate was given in three different dosages. Here we chose the highest concentration, given for 8 hours.

Transcriptomics

Total RNA was isolated from the hepatocyte lysate using an RNeasy kit (Qiagen). 10 µg of fragmented cRNA was hybridized to the probe array for 18 h at 45 °C and 60 rpm, after which the array was washed and stained with streptavidin-phycoerythrin using the Fluidics Station 400 (Affymetrix) and scanned with the Gene Array Scanner (Affymetrix). The Affymetrix GeneChip Rat Genome 230 2.0 [Rat230_2] was used.

Pre-processing

The quality of the Affymetrix microarrays used in the rat experiment was analyzed using arrayanalysis.org, an Affymetrix analysis pipeline developed at the Department of Bioinformatics, Maastricht University. Checking the quality ensures that the downstream analysis is not biased by any (large) technical influences, which in turn may lead to a biased biological outcome. After the QC analysis the gene expression data were normalized using GC-RMA normalization.

Statistical analysis

The normalized gene expression data was statistically analysed using the limma package in R-Bioconductor. This package uses moderated t- and F-statistics based on linear modelling to perform differential gene expression analysis for data arising from microarray experiments. The main advantage of limma over traditional t- or F-tests is that information is borrowed from other genes when estimating the variances and standard errors of a single gene. This stabilises the analysis, particularly for small sample sizes.
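To make the idea of borrowing information concrete, the sketch below shows the variance shrinkage behind a moderated t-statistic in Python. It is an illustration, not limma itself: limma estimates the prior variance and prior degrees of freedom from all genes by empirical Bayes, whereas here they are simply assumed values, and all numbers and function names are hypothetical.

```python
import math


def ordinary_t(effect, sample_var, unscaled_var):
    """Classical t-statistic that uses only the gene's own variance estimate."""
    return effect / math.sqrt(sample_var * unscaled_var)


def moderated_t(effect, sample_var, df_resid, prior_var, df_prior, unscaled_var):
    """t-statistic with the gene-wise variance shrunk toward a prior variance.

    The posterior variance is a weighted average of the prior variance and
    the gene's own variance, weighted by their degrees of freedom.
    """
    posterior_var = (df_prior * prior_var + df_resid * sample_var) / (df_prior + df_resid)
    return effect / math.sqrt(posterior_var * unscaled_var)


# Hypothetical gene with a suspiciously small variance from very few samples.
effect = 0.5          # estimated log fold change
sample_var = 0.01     # variance estimated from this gene alone
df_resid = 2          # residual degrees of freedom (e.g. 2 vs 2 samples)
unscaled_var = 1.0    # 1/2 + 1/2 for two groups of two samples

# Assumed prior values; limma would estimate these from the whole data set.
prior_var = 0.25
df_prior = 4

t_plain = ordinary_t(effect, sample_var, unscaled_var)
t_mod = moderated_t(effect, sample_var, df_resid, prior_var, df_prior, unscaled_var)
print("ordinary t:", t_plain)
print("moderated t:", t_mod)
```

With these made-up numbers the moderated statistic is much smaller than the ordinary one, because the unrealistically small gene-wise variance is pulled toward the prior.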

Statistically analyzed data set

The statistically analyzed transcriptomics data set is available on the provided USB stick in the dataset directory as Feno_High_vs_Control.txt.

Open the file in Excel and have a look at the statistically analyzed data. In the data file you will find the following columns:

1. Ensembl: this column contains the identifiers of the genes in the data set.

2. Syscode: this column specifies the data source of the identifier. In our example data set we are using En for Ensembl. This column is optional if all the identifiers are from the same database.

3. logFC: the log-transformed fold change. The fold change is a metric for comparing an expression level between two distinct experimental conditions; log-transformed data is easier to handle statistically. Here we compared high-dose fenofibrate-treated versus control.

4. p.value: statistical significance.

5. p.value.adj: p-value corrected for multiple testing.
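The column layout above can also be inspected programmatically. The following Python sketch reads a tab-separated table with these five columns; the two sample rows, their identifiers and their values are made up for illustration, and the conversion from logFC to fold change assumes a base-2 logarithm, which the tutorial does not state explicitly.

```python
import csv
import io

# Hypothetical rows in the layout described above; identifiers and numbers
# are placeholders, not values from Feno_High_vs_Control.txt.
SAMPLE = (
    "Ensembl\tSyscode\tlogFC\tp.value\tp.value.adj\n"
    "GENE_ID_1\tEn\t1.8\t0.001\t0.01\n"
    "GENE_ID_2\tEn\t-0.2\t0.40\t0.65\n"
)


def read_rows(handle):
    """Parse the tab-separated table into a list of typed dictionaries."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for record in reader:
        rows.append({
            "id": record["Ensembl"],
            "syscode": record["Syscode"],
            "logFC": float(record["logFC"]),
            "p": float(record["p.value"]),
            "p_adj": float(record["p.value.adj"]),
        })
    return rows


rows = read_rows(io.StringIO(SAMPLE))
for row in rows:
    # Assumption: logFC is a base-2 logarithm, so 2**logFC is the fold change.
    print(row["id"], row["logFC"], round(2 ** row["logFC"], 2))
```

To read the real file, replace io.StringIO(SAMPLE) with an opened file handle, for example open("Feno_High_vs_Control.txt", newline="").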

Step 1: Pathways and identifier mapping databases

In addition to the experimental data file, you need two other types of files to use PathVisio:

Pathways: A set of pathway files in GPML format (*.gpml files).

Rat identifier mapping database: A species-specific identifier mapping database so PathVisio can take care of the identifier mapping step.

Note: For this workshop, we prepared USB sticks containing all the data files that you need for this analysis. If you want to repeat the analysis at home, you can download the data from the following websites (you can also find the data for other species there):

Pathways: You can find them on the USB stick in the directory pathway-analysis/pathways-rno-2013-08-28. They have been downloaded from WikiPathways, where you can also find pathways for other species.

Identifier mapping database: You can find the mapping database for rat on the USB stick (pathway-analysis/Rn_Derby_20120602.bridge). It has been downloaded from the BridgeDb website.

Step 2: Data file preparation for PathVisio

PathVisio can load any type of quantitative data (expression values, fold changes, p values, confidence scores, …) or textual data if required. The data has to be saved as a delimited text file, such as a tab-separated file (.txt or .csv). We already pre-processed the dataset described above and provide a file containing the Ensembl identifier, the system code, the log fold change, p value and adjusted p value for the comparison high dose vs. control in the liver samples.

You can copy the data set from the USB stick or download it from here.

If you have your own data set and want to prepare it for import into PathVisio, open the file in Excel and save it as a CSV (comma-separated) file. You then choose the matching delimiter during the import in Step 3.
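If your own data is already stored as comma-separated text and you prefer the tab-separated layout of the workshop file, the conversion can be scripted instead of done in Excel. This is a minimal Python sketch using only the standard library; the function name and the sample content are hypothetical.

```python
import csv
import io


def comma_to_tab(source, target):
    """Copy comma-separated records from source to target as tab-separated text.

    Returns the number of records written, including the header line.
    """
    reader = csv.reader(source)
    writer = csv.writer(target, delimiter="\t", lineterminator="\n")
    count = 0
    for record in reader:
        writer.writerow(record)
        count += 1
    return count


# Made-up example input; a real call would use opened files instead.
source = io.StringIO("Ensembl,Syscode,logFC\nGENE_ID_1,En,1.8\n")
target = io.StringIO()
written = comma_to_tab(source, target)
print(target.getvalue())
```

With real files, open both with newline="" as recommended for the csv module, for example open("mydata.csv", newline="") and open("mydata.txt", "w", newline="").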

Step 3: Import the data into PathVisio

In the menu bar of PathVisio, click Data → Import expression data (Fig. 3a).

Use the Browse buttons to locate the following files (Fig. 3b):

Input file: The experiment data file (Feno_High_vs_Control.txt). Make sure that you have a local copy on the hard drive (don’t use the file on the USB stick directly).

Output file: Will be filled in automatically after selecting the input file; you don’t need to change this.

Gene database: Use the identifier mapping database for rat (Rn_Derby_20120602.bridge).

Click “Next”.

Make sure that tab is selected, because the columns in our data are delimited by tabs. Check that the preview looks as you expect (Fig. 3c).

Click “Next”.

Select the column that contains the gene identifiers and specify the identifier type. Either select “Use the same system code for all rows” and choose Ensembl, NOT Ensembl Rat (Fig. 3d), or use the Syscode column of the data file.

Click “Next”.

The data will now be imported into an expression dataset that is saved as a .pgex file on your hard disk. Any exceptions will be reported to the file .pgex.ex. No exceptions should occur for our dataset (Fig. 3e).

Click “Finish”.

Note: An exception about old Ensembl identifiers might pop up. Please ignore this warning (Fig. 3f).

In the footbar of PathVisio you can see which identifier mapping databases and which data set are loaded (see Fig. 3g).

Fig 3a: Select menu item Data → Import expression data

Fig 3b: Choose input file and gene database

Fig 3c: Check the preview of the data.

Fig 3d: Specify the identifier column and system code

Fig 3e: Make sure that no exceptions occur when importing the data.

Fig 3f: Ignore the warning about old Ensembl identifiers.

Fig 3g: In the footbar of PathVisio you can see which mapping databases and which data set are loaded.

Step 4: Create a visualization by coloring logFC and p.value

Before we start with the pathway statistics to find changed pathways, we are going to specify how the data should be visualized on the pathways. We are going to test this with the Mitochondrial LC-Fatty Acid Beta-Oxidation pathway.

Tip: PathVisio allows you to change the default values for several settings (see Edit → Preferences → Display → Colors). In this visualization example, we changed the “Criteria not met” color to red.

The data set contains values for log fold change and p.value. We are going to visualize those two values together on the gene nodes in the pathway.

1. Go to Data → Visualization Options

2. Create a new visualization by clicking the button in the top-right corner and selecting “New” (Fig. 4a).

3. Specify a name for the visualization (e.g. “pathway-tutorial”) (Fig. 4b).

4. Check the box in front of “Expression as color” and the box in front of “Text label” (Fig. 4c).

5. In the “Expression as color” panel, select Advanced. Then select the logFC column and create a new visualization (Fig. 4d).

6. For the logFC it makes sense to use a gradient from -2 to 2. Choose a gradient from blue to yellow (blue being under-expressed, yellow being over-expressed, Fig. 4e). Click “Ok”.

7. Select the p.value column and create a new visualization (Fig. 4f). For the p.value we will define a color rule ([p.value] < 0.05), see Fig. 4g. Click on new color set, click on “Add Rule”, and specify the rule logic and color. Then press “Ok”.

8. The pathway elements are now split into two columns. The first column shows the logFC gradient, while the second column shows whether a measurement was significant or not (p-value < 0.05). In the legend tab on the right side, you can see which column in the pathway element represents what, see Fig. 4h.

Open the Mitochondrial LC-Fatty Acid Beta-Oxidation pathway from the rat pathways on the USB stick. The pathway will now look somewhat like Fig. 4h.
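The two visualization rules configured in this step can be sketched in Python to show what each column of a gene box encodes. This is an illustrative sketch only: the linear blue-to-yellow interpolation, the RGB values and the function names are assumptions, and PathVisio may compute intermediate colors differently.

```python
BLUE = (0, 0, 255)      # assumed RGB value for the low end of the gradient
YELLOW = (255, 255, 0)  # assumed RGB value for the high end of the gradient


def gradient_color(log_fc, low=-2.0, high=2.0, low_color=BLUE, high_color=YELLOW):
    """Map a logFC value to an RGB color by linear interpolation.

    Values outside [low, high] are clamped to the nearest end color.
    """
    clamped = max(low, min(high, log_fc))
    fraction = (clamped - low) / (high - low)
    return tuple(round(a + fraction * (b - a)) for a, b in zip(low_color, high_color))


def rule_met(p_value, threshold=0.05):
    """Color rule from the tutorial: [p.value] < 0.05."""
    return p_value < threshold


# Made-up measurements: (logFC, p.value)
for log_fc, p_value in [(-2.0, 0.01), (0.7, 0.20), (3.1, 0.03)]:
    print(log_fc, gradient_color(log_fc), "significant" if rule_met(p_value) else "not significant")
```

The first function corresponds to the left column of each gene box (the gradient), the second to the right column (rule met or not met).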

Q3: Which two genes in the rat Mitochondrial LC-Fatty Acid Beta-Oxidation pathway have a high log fold change and are significantly changed?

Tip: To save the pathway with the data visualization, click on File -> Export. Here you can save the pathway in different formats, such as *.png, so you can use it in presentations.

Fig 4a: Create a new visualization.

Fig 4b: Give the visualization a name, like “pathway-tutorial”.

Fig 4c: Select the “Expression as color” and “Text label” checkboxes. Specify a gradient from -2 to 2.

Fig 4d: Specify the advanced expression as color visualization. Create a visualization for the logFC.

Fig 4e: Create a gradient visualization for the logFC from -2 to 2.

Fig 4f: Create a new visualization for the p.value.

Fig 4g: Create a rule-based visualization for the p.value ([p.value] < 0.05).

Fig 4h: Resulting colored pathway image. Expression data can be shown in the backpage below the cross references.

Step 5: Search for regulated pathways

In the final step of this tutorial we are going to find out which pathways are enriched with regulated genes. We can then study these pathways and, for example, see whether they are influenced by the compound fed to the rats. These pathways might provide leads for further investigation of the biological implications.

To identify regulated pathways, we are going to use PathVisio to calculate a z-score for each pathway.

1. Go to Data → Statistics (Fig. 5a).

2. The “Pathway Statistics” dialog will open (Fig. 5b).

3. In the text field below “Expression:”, type “([logFC] < -1 OR [logFC] > 1) AND [p.value] < 0.05” (without the quotes). This expression defines which genes are significantly changed (up or down) in gene expression in the high-dose treated animals.

4. In the text field below “Pathway Directory:”, fill in the directory where the pathway (gpml) files are located (see Step 1). You can also use the “Browse” button to locate and select the directory.

5. Click the “Calculate” button. You should see a progress dialog titled “Calculate Z-scores”.

6. After a few minutes, the analysis should be finished and you will see a list of pathways appear in the dialog (Fig. 5b).

7. If you click on a pathway in the list, it will be opened. You can then apply the visualization created in the previous section to study the gene expression profiles and find out if any of the genes were changed in the data set, see Fig. 5c.

8. Save the list of pathways by clicking on the “Save results” button. You can then open the statistical results in Excel.
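The criterion and the ranking score from the steps above can be sketched in Python. The z-score formula used here is the one commonly used for over-representation analysis of this kind; the tutorial does not spell out the formula, so treat it as an assumption about what PathVisio computes. The counts in the example are made up.

```python
import math


def is_regulated(log_fc, p_value):
    """Tutorial criterion: ([logFC] < -1 OR [logFC] > 1) AND [p.value] < 0.05."""
    return (log_fc < -1 or log_fc > 1) and p_value < 0.05


def z_score(n, r, N, R):
    """Over-representation z-score for one pathway.

    n: measured genes in the pathway
    r: measured genes in the pathway that meet the criterion
    N: measured genes in total
    R: genes in total that meet the criterion
    """
    expected = n * R / N
    variance = n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1))
    if variance <= 0:
        return float("nan")  # undefined when nothing or everything is regulated
    return (r - expected) / math.sqrt(variance)


# Hypothetical counts: 1000 measured genes, 100 of them regulated overall,
# and a pathway with 20 measured genes of which 8 are regulated.
example = z_score(n=20, r=8, N=1000, R=100)
print(round(example, 2))
```

A pathway with exactly the expected number of regulated genes gets a z-score of 0; the more regulated genes it contains beyond that expectation, the higher it ranks.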

Note: Please be aware that the results can be slightly different due to recent changes in the pathway collection.

Q4: Have a close look at the highest ranked pathways. Are these in line with what you expect based on the known effects of PPARalpha activation?

Fig 5a: Open statistics dialog.

Fig 5b: Define all settings and run statistics.

Fig 5c: Click a row in the result list to open the pathway.

Optional Tutorials: Design your own pathway / Workflow integration

Optional 1: Design your own pathway → Learn how to draw a pathway.

If you finished the first part and still have time left, please continue with this tutorial.

WikiPathways was established to facilitate the contribution and maintenance of pathway information by the biology community. WikiPathways is an open, collaborative platform dedicated to the curation of biological pathways. WikiPathways thus presents a new model for pathway databases that enhances and complements ongoing efforts, such as KEGG, Reactome and Pathway Commons. Building on the same MediaWiki software that powers Wikipedia, we added a custom graphical pathway editing tool and integrated databases covering major gene, protein, and small-molecule systems. The familiar web-based format of WikiPathways greatly reduces the barrier to participation in pathway curation. More importantly, the open, public approach of WikiPathways allows for broader participation by the entire community, ranging from students to senior experts in each field. This approach also shifts the bulk of peer review, editorial curation, and maintenance to the community.

We are using the circadian clock pathway as an example in the tutorial. Please follow the 11 steps on the tutorials page.

Optional 2: Rerun the analysis from R, Perl or Python using PathVisioRPC.

If you have some programming experience, you can rerun the analysis that we just performed in PathVisio from any programming language that supports XMLRPC. We are using Python as an example here. It is usually pre-installed on Linux and Mac OS X. On Windows, install Python 3: double-click the python-3.3.2.msi installer on the USB drive (or download it here) and follow the on-screen instructions. The XMLRPC module is included with Python.

Please note: In Python 3 the XMLRPC library is named xmlrpc.client, whereas in Python 2 it was named xmlrpclib. Change your code as necessary.
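One common way to handle the rename without maintaining two scripts is a guarded import. This is a small sketch; the alias name xmlrpc_client is chosen here for illustration.

```python
try:
    # Python 3 name of the module
    import xmlrpc.client as xmlrpc_client
except ImportError:
    # Python 2 name of the same module
    import xmlrpclib as xmlrpc_client

# Both modules provide the ServerProxy class used to talk to an XMLRPC server.
print(xmlrpc_client.ServerProxy.__name__)
```

The rest of the script can then refer to xmlrpc_client.ServerProxy regardless of the Python version.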

You can either use the directory pathway-analysis/pathvisio-rpc on the USB stick or download the zip file containing all necessary files here (59 MB). Remember to extract the files after downloading the zip file.

First you need to start the PathVisioRPC server. The executable jar file PathVisioRPC-standalone.jar that you obtained in the previous step launches the PathVisioRPC server on your local computer. Open a terminal and use cd to change the current working directory to the folder that you downloaded and unzipped in the previous step. Then type java -jar PathVisioRPC-standalone.jar to start the PathVisioRPC server on port 7777. Leave this terminal open while running the script; the server output for each request is shown there.

Now run the Python script from the command line with python PathVisioRPC-Python.py. Windows users: go to the ncsb-workshop-pathvisio-rpc folder and double-click the PathVisioRPC-Python-Windows.py script to execute it. The script will run for a while and produce results. The functions in the script are described below. If you want to redo the analysis with another data set, you will need to change the file path and visualization settings in this file. For this tutorial it is important that all the files are present in the same directory so that relative file locations can be used. The commands in the script are simple and straightforward:

server.PathVisio.importData(...): performs the data import step and creates a pgex file in the result directory.

server.PathVisio.createVisualization(...): specifies the gradient and color rule that we used during the tutorial.

server.PathVisio.calculatePathwayStatistics(...): calculates and exports the z-score statistics results as HTML pages.
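For orientation, this is roughly how a Python client refers to these three remote methods. The sketch only builds the client object and the method handles; it does not call the server, because the arguments of each method are not listed here and should be taken from the provided script. The port 7777 comes from the tutorial, while the host name and the absence of a URL path are assumptions.

```python
import xmlrpc.client

# Port 7777 is stated in the tutorial; "localhost" and the empty URL path
# are assumptions about the server address.
SERVER_URL = "http://localhost:7777"

# Creating the proxy does not open a connection yet.
server = xmlrpc.client.ServerProxy(SERVER_URL)

# Attribute access only builds a handle for the remote method; a request is
# sent to the server only when the handle is called with its arguments.
steps = [
    ("import data", server.PathVisio.importData),
    ("create visualization", server.PathVisio.createVisualization),
    ("calculate pathway statistics", server.PathVisio.calculatePathwayStatistics),
]

for description, method in steps:
    print(description, "->", "callable" if callable(method) else "not callable")
```

Calling one of these handles while the server is not running raises a connection error, which is a quick way to check that the terminal with the server is still open.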

Go to the results directory and open the index.html page. It gives you an overview of the pathway analysis, and you can click on an entry in the pathway list to show the pathway diagram. The nodes in the pathway are clickable, and the backpage will be opened in another tab.

Tip: PathVisioRPC allows you to include PathVisio in your workflow. You can run multiple analyses much faster than doing them by hand.