EGAN Tutorial: Loading Network Data

October 2009
Jesse Paquette
UCSF Helen Diller Family Comprehensive Cancer Center
[email protected]

Preamble

• This document has many slides with multi-step animations
  – Best viewed in Slide Show mode
• The EGAN graphical user interface is evolving
  – Icons may change
  – Menus may change
  – Button/widget placement may change
  – This document probably won’t change as quickly
  – Please contact the developers if you notice major discrepancies between this and EGAN

Loading network data: An overview

• The EGAN pre-collated network represents only a fraction of available data
• Additional data can be loaded as
  – Gene sets/association nodes
    • Pathways, annotation terms, articles, transcription factor targets, miRNA targets, conserved domains, significant gene sets/clusters from experiments, etc.
  – Gene-gene edges
    • Protein-protein interactions, literature co-occurrence, expression correlation, sequence homology, transcription factor targets, kinase targets, etc.
• This document will outline the steps for loading additional gene sets and gene-gene edges into EGAN


Loading gene sets into EGAN

Loading gene sets into EGAN: Gene set file formats

• Two possible tab-delimited text formats
  – GMT
    • All default pre-collated gene sets in EGAN are specified via GMT files
    • Each row represents a different gene set
  – GMX
    • Transposed GMT
    • Each column represents a different gene set
• First two columns of GMT (or rows for GMX) specify
  – Gene set ID (first column)
    • Can potentially be used to link out to the gene set’s web page via URL
  – Gene set name (second column)
    • Can be empty or same as the ID
• Subsequent columns list the genes in each set
  – Gene identifiers must be mappable to Entrez Gene IDs
    • EGAN provides a wide variety of mapping file options
      – Entrez Gene ID, HUGO Gene Symbol, assay-specific IDs, Ensembl, GenBank, UniProt, etc.
    • EGAN expects that all entity IDs are the same type for each file
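The two layouts described above can be sketched in a few lines of Python. This is only an illustration of the row/column arrangement, not code taken from EGAN: the gene set IDs, names, and the helper functions `to_gmt` and `to_gmx` are made up for the example.

```python
from itertools import zip_longest

# Hypothetical gene sets: (ID, name, gene identifiers of a single type)
gene_sets = [
    ("SET_A", "Example set A", ["TP53", "BRCA1", "EGFR"]),
    ("SET_B", "", ["MYC", "KRAS"]),  # the name column may be empty
]


def to_gmt(sets):
    """GMT: one gene set per row (ID, name, then the genes)."""
    return "".join("\t".join([set_id, name, *genes]) + "\n"
                   for set_id, name, genes in sets)


def to_gmx(sets):
    """GMX: the same table transposed, one gene set per column."""
    columns = [[set_id, name, *genes] for set_id, name, genes in sets]
    # Shorter gene sets are padded with empty cells so every row is complete.
    rows = zip_longest(*columns, fillvalue="")
    return "".join("\t".join(row) + "\n" for row in rows)


gmt_text = to_gmt(gene_sets)
gmx_text = to_gmx(gene_sets)
print(gmt_text)
print(gmx_text)
```

Writing either string to a plain text file gives a file of the corresponding layout; all identifiers within one file are kept to a single type, as the slide requires.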


Loading gene sets into EGAN: An example

First column: gene set IDs

Second column: gene set names

Later columns: gene identifiers

Each row is a gene set


Loading gene sets into EGAN: An example

Save as tab-delimited text

Loading gene sets into EGAN: An example

• Download or construct a gene set file
  – This example will use c2.cgp.v2.5.symbols.gmt from MSigDB (download this file to follow along)
    • You’ll have to log in with your email address to download MSigDB gene sets
• Launch EGAN H. sapiens
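Before loading a downloaded or hand-built GMT file, it can help to check that it parses as expected. The following is a minimal sketch, assuming the tab-delimited layout described earlier (ID, name, then genes); `read_gmt` is a hypothetical helper, and the in-memory sample stands in for a real file.

```python
import io


def read_gmt(handle):
    """Parse GMT rows into {set_id: (name, [genes])}, skipping blank lines."""
    gene_sets = {}
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(
                f"line {line_no}: expected ID, name and at least one gene")
        genes = [g for g in fields[2:] if g]  # drop empty trailing cells
        gene_sets[fields[0]] = (fields[1], genes)
    return gene_sets


# Small in-memory sample; the set IDs are invented for illustration.
sample = io.StringIO("SET_A\tExample set A\tTP53\tBRCA1\nSET_B\t\tMYC\n")
parsed = read_gmt(sample)
summary = {set_id: len(genes) for set_id, (name, genes) in parsed.items()}
print(summary)
```

To inspect a real file, pass an open file handle (for example the downloaded c2.cgp.v2.5.symbols.gmt) instead of the sample.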


Loading gene sets into EGAN: An example

Click on “7) Association Data”

Shown are the default pre-collated gene sets.

We want to load a new one.

Click “Browse…”

Select your GMT file and click “Specify gene association set”.

This GMT file uses Gene Symbols for gene identifiers.

Select “HUGO Gene Symbol” from the drop-down menu.

Now specify that these gene sets are of type “MSigDB C2: chemical and genetic perturbations” by selecting that option from the drop-down menu.

This MSigDB type has been pre-defined for EGAN, which is why it exists in this menu.

Finally, click “Add Set”.

When you are finished loading data, click “Finish – Launch EGAN”.


Loading gene sets into EGAN: An example

Whenever you change the network configuration by adding or removing files, you will be given the option to save the new configuration to a tab-delimited text file.

If you choose to save a .config file, next time you will only need to specify that file (item 3 in the Launch EGAN Wizard).


Loading gene sets into EGAN: An example

When EGAN finishes loading, your new set(s) will be available for exploration


Loading gene-gene edges into EGAN

Loading gene-gene edges into EGAN: File formats

• Two possible tab-delimited text formats
  – SIF (Simple Interaction File) format commonly used in Cytoscape
    • .sif extension (required in EGAN)
    • Each line represents a gene-gene relationship
    • Three columns
      – First column is first gene
      – Middle column is ignored in EGAN
      – Third column is second gene
  – EGAN interaction file format
    • .txt file extension
    • Three columns, like SIF
      – Middle column is a PubMed ID
• Gene identifiers must be mappable to Entrez Gene IDs
  – EGAN provides a wide variety of mapping file options
    • Entrez Gene ID, HUGO Gene Symbol, assay-specific IDs, Ensembl, GenBank, UniProt, etc.
  – EGAN expects that all entity IDs are the same type for each file
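The difference between the two edge formats is only the middle column, which the short Python sketch below tries to make concrete. The edges, the PubMed IDs (placeholders, not real citations), the relation label "pp", and the helper names are all assumptions made for illustration.

```python
# Hypothetical edges: (first gene, second gene, PubMed ID).
# The PubMed IDs are placeholders, not real citations.
edges = [
    ("CDK1", "RB1", "12345678"),
    ("AKT1", "GSK3B", "23456789"),
]


def to_sif(edge_list, relation="pp"):
    """SIF: gene, relation label, gene. EGAN ignores the middle column."""
    return "".join(f"{a}\t{relation}\t{b}\n" for a, b, _pmid in edge_list)


def to_egan_txt(edge_list):
    """EGAN interaction format: gene, PubMed ID, gene."""
    return "".join(f"{a}\t{pmid}\t{b}\n" for a, b, pmid in edge_list)


sif_text = to_sif(edges)        # save with a .sif extension
egan_text = to_egan_txt(edges)  # save with a .txt extension
print(sif_text)
print(egan_text)
```

Since the file extension is what distinguishes the two formats, the same three-column table saved under the wrong extension would presumably have its middle column interpreted differently.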


Loading gene-gene edges into EGAN: An example

Each row is a gene-gene relationship

First column: first gene

Third column: second gene


Loading gene-gene edges into EGAN: An example

Save as tab-delimited text

Loading gene-gene edges into EGAN: An example

• Download or construct a gene-gene edge file
  – This example will use HPN.sif, a set of kinase-target relationships available in the “.sif Gzip-ed files” link at NetworKIN (download this file to follow along)
    • You’ll have to accept the NetworKIN license in order to download data
• Launch EGAN H. sapiens
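A quick look at an edge file before loading it can catch malformed rows early. The sketch below assumes the three-column, tab-delimited layout described earlier; `read_edges` is a hypothetical helper and the sample rows are invented, not taken from HPN.sif.

```python
import io


def read_edges(handle):
    """Return (edges, bad_line_numbers) for a three-column tab-delimited file."""
    edges = []
    bad_lines = []
    for line_no, line in enumerate(handle, start=1):
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) == 3 and fields[0] and fields[2]:
            # The middle column is kept out of the edge tuple on purpose:
            # it is ignored for SIF and holds a PubMed ID for EGAN .txt files.
            edges.append((fields[0], fields[2]))
        elif line.strip():
            bad_lines.append(line_no)
    return edges, bad_lines


sample = io.StringIO("CDK1\tpp\tRB1\nAKT1\tpp\tGSK3B\nbroken line\n")
edges, bad_lines = read_edges(sample)
genes = sorted({gene for edge in edges for gene in edge})
print(len(edges), "edges,", len(genes), "genes, bad lines:", bad_lines)
```

Passing an open handle to the decompressed file in place of the sample gives the same summary for real data.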


Loading gene-gene edges into EGAN: An example

Click on “8) Gene Relationship Edges”

Shown are the default pre-collated gene-gene edge files.

We want to load a new one.

Click “Browse…”

Select your SIF (or EGAN .txt) file and click “Specify gene-gene edge set”

This SIF file uses Gene Symbols for gene identifiers.

Select “HUGO Gene Symbol” from the drop-down menu.

Now specify that these gene-gene edges are of type “NetworKIN” by selecting that option from the drop-down menu.

The NetworKIN type has been pre-defined for EGAN, which is why it exists in this menu.

Finally, click “Add Set”.

When you are finished loading data, click “Finish – Launch EGAN”.


Loading gene-gene edges into EGAN: An example

Whenever you change the network configuration by adding or removing files, you will be given the option to save the new configuration to a tab-delimited text file.

If you choose to save a .config file, next time you will only need to specify that file (item 3 in the Launch EGAN Wizard).


Loading gene-gene edges into EGAN: An example

When EGAN finishes loading, your new gene-gene edges will be available for exploration

Loading network data: Tips and hints

• Both the MSigDB and NetworKIN types were pre-defined in EGAN
  – This may not be the case for your new data
  – You can use the “Custom Node/Custom Edge” types as a default
• You can specify your own type definitions in a Type Definition file
  – Give your added nodes and edges distinct colors and links
  – See item 4 in the Launch EGAN Wizard
  – Use this type definition file as a template – just add the appropriate lines for your new types
• You can specify gene set, gene-gene edge and mapping files via URL (or .jar file, but that’s tricky)
  – Just type or paste the URL into the appropriate text field instead of clicking “Browse…”
• Potential issues to consider
  – Identifiers used in your gene set/gene-gene edge file might not be found in the mapping file
  – Genes in your mapping file might not be present in the network
  – These issues are written (rather crudely) to the Log
    • Inspect the log file if you notice unexpected behavior
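The first issue above (identifiers missing from the mapping file) can also be checked outside EGAN before loading. The sketch below is only an illustration: it assumes the mapping has already been read into a dictionary from gene symbol to Entrez Gene ID, which may not match the layout of EGAN’s own mapping files, and `find_unmapped` is a hypothetical helper.

```python
def find_unmapped(identifiers, mapping):
    """Return identifiers with no entry in the mapping, in first-seen order."""
    seen = set()
    missing = []
    for ident in identifiers:
        if ident not in mapping and ident not in seen:
            seen.add(ident)
            missing.append(ident)
    return missing


# Assumed mapping from HUGO Gene Symbol to Entrez Gene ID (small excerpt).
symbol_to_entrez = {"TP53": "7157", "BRCA1": "672", "EGFR": "1956"}

# Identifiers as they might appear in a gene set or edge file.
file_identifiers = ["TP53", "tp53", "BRCA1", "NOT_A_GENE", "tp53"]

missing = find_unmapped(file_identifiers, symbol_to_entrez)
print(missing)
```

This sketch compares identifiers exactly, so differences in case or stray whitespace show up as unmapped; whether EGAN treats them the same way is best confirmed from its log.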

Questions/comments?

• Visit http://groups.google.com/group/ucsf-egan for downloads, documentation and discussion
  – Requires an account with Google Groups