

Page 1: CaGe: a cancer gene annotation system for cancer genomicsmgrc.kribb.re.kr/cage/include/cageUserGuide.pdf · CaGe: a cancer gene annotation system for cancer genomics User’s Guide

CaGe User Guide 1

CaGe: a cancer gene annotation system for cancer genomics

User’s Guide

Content

1. What is CaGe

2. Audience of CaGe

3. Main menus and functions

4. Cancer gene annotation

5. Cancer pathway annotation

6. Browsing cancer & cancer related genes

7. Browsing cancer pathways

1. What is CaGe

CaGe is a cancer gene annotation system that provides information on cancer genes, mutations
and associated annotations, based on a database of reported cancer-causing and cancer-related
genes. CaGe accepts input in various formats, converts it to standard gene symbols, searches
the cancer gene databases with those symbols, and presents the resulting cancer gene
annotations through a web user interface (Figure 1). CaGe also provides cancer pathway
annotation, job management for user-submitted annotation tasks, and browsing of cancer genes
and pathways, with cross-links between the annotations and links to external public annotation
databases. CaGe is intended as a tool for identifying cancer-causing mutations in NGS-based
cancer genomics.

Figure 1 Schematic diagram for the construction and service of CaGe.


Motivation: Second-generation (or next-generation) DNA sequencing (NGS) has become an
effective tool for cancer genomic studies through applications including whole-genome,
exome and transcriptome approaches.

One of the hurdles in current cancer genomics is identifying causative somatic mutations among
many candidates. A common approach is to refer to a list of known cancer-causing genes and the
associated information. Known cancer gene information can be used to determine cancer driver
mutations (or genes), and cancer gene related information, including expression, interactions
and pathways, can also be useful in scoring novel candidates.

Main features:

- Powerful cancer gene annotation function in various input types and formats,

- Cancer pathway annotation,

- Job management for user submitted annotation tasks,

- Browsing function for cancer genes and related information including mutations, cancer types,

literature, tissues, and pathways, etc.,

- Browsing function for cancer pathways,

- Cross-search between cancer genes and pathway information,

- Download function of annotation results in a tab-delimited text format,

- And various external links to the public annotation databases.

Raw data sources:

- Cancer gene information: Cancer Gene Census (Rel. 2010. 12.; Wellcome Trust Sanger Institute),

Cancer Gene Index (downloaded on 2011. 2.; National Cancer Institute)

- Pathway set: KEGG pathway (Rel.57, 2011. 1.), BioCarta, Reactome (downloaded on 2011. 2.)

- Gene information: Entrez Gene (downloaded on 2011. 2.)

- Gene name information: HGNC (downloaded on 2011. 2.)

- Protein information: UniProt (Rel. 2011. 3.)

- External links: Entrez Gene, HGNC, OMIM, UniProt, PubMed, KEGG, BioCarta, Reactome

Statistics of the information in CaGe:

Table 1 Statistics of the information in CaGe.

No.  Cancer & pathway gene set  Gene set code  # of Genes  # of Sig. Pathways a)
1    Cancer Gene Census         CGC            436         -
2    Cancer Gene Index          CGI            7181        -
3    CGC-based Pathway Genes    CGC_PATH       5790        143
4    CGI-based Pathway Genes    CGI_PATH       6744        176

a) Pathways with p-values < 0.05 in Fisher's exact test.

Availability: CaGe is accessible at http://mgrc.kribb.re.kr/cage.

CaGe was developed and is maintained by Medical Genomics Research Center (MGRC) at Korea

Research Institute of Bioscience and Biotechnology (KRIBB).


2. Audience of CaGe

CaGe is designed to sort cancer-causing gene candidates into 1) reported known cancer genes,
2) novel cancer gene candidates, and 3) others, and to provide annotations for those genes.
It is therefore aimed at cancer genomics researchers who need to identify cancer driver
mutations from massively parallel sequencing data.
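
The three-way grouping described above can be illustrated with a minimal Python sketch. This guide does not state the exact rule CaGe uses to separate "novel candidates" from "others"; the sketch assumes that a gene found in a cancer pathway gene set (but not in a known cancer gene set) counts as a candidate. The function name, the class names, and the toy gene sets (including the placeholder symbols GENE_X and GENE_Y) are hypothetical.

```python
from enum import Enum


class GeneClass(Enum):
    KNOWN = "reported known cancer gene"
    CANDIDATE = "novel cancer gene candidate"
    OTHER = "other"


def prioritize(genes, known_cancer_genes, cancer_pathway_genes):
    """Assign each input gene to one of the three groups.

    Assumption (not taken from the guide): membership in a cancer pathway
    gene set is what makes a gene a "novel candidate".
    """
    result = {}
    for gene in genes:
        if gene in known_cancer_genes:
            result[gene] = GeneClass.KNOWN
        elif gene in cancer_pathway_genes:
            result[gene] = GeneClass.CANDIDATE
        else:
            result[gene] = GeneClass.OTHER
    return result


# Toy sets for illustration only; they are not CaGe data.
known = {"TP53", "KRAS"}
pathway = {"TP53", "GENE_X"}
classes = prioritize(["TP53", "GENE_X", "GENE_Y"], known, pathway)
for gene, gene_class in classes.items():
    print(gene, "->", gene_class.value)
```

Checking the known cancer gene set first means a gene that appears in both sets is reported as a known cancer gene rather than as a candidate.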

3. Main menus and functions

The main functions of CaGe (http://mgrc.kribb.re.kr/cage) are summarized below.

-Home menu ((1) in Figure 2): This menu links to the CaGe home page, which shows an
introduction to CaGe and related information. The home page also provides the user’s guide
for CaGe in ‘.pdf’ format and links about CaGe, cancer research related sites, and other
related web sites operated by KRIBB.

-AnnoGenes menu ((2) in Figure 2): This menu links to the cancer gene annotation page, a key
function of CaGe. For a given gene list, CaGe converts the input to standard gene symbols,
searches the cancer gene databases, and shows the cancer gene related annotations through the
web user interface.

-AnnoPathways menu ((3) in Figure 2): This menu links to the cancer pathway annotation page,
another key function of CaGe. Through this page, CaGe evaluates the overlap between the user
input gene set and cancer pathways or other biological pathways, and conducts a statistical
test.

-CancerGenes menu ((4) in Figure 2): This menu links to the cancer gene browsing page. In
addition to the annotation functions, CaGe lets users browse a table of cancer genes and
cancer related genes with various cancer related annotations.

-CancerPathways menu ((5) in Figure 2): This menu links to the cancer pathway browsing page,
which lists the pathways that intersect significantly with the reported cancer gene sets.
Details of each pathway, including its member gene list, are given on a detailed pathway
information page linked from each pathway name. The links on the genes in a detailed pathway
information page connect the browsing flow to the cancer and cancer related gene information
shown by the Browse menu.

Figure 2 Main menu of the CaGe.


4. Cancer gene annotation

Cancer gene annotation is a key function of CaGe. The cancer gene annotation page is reached
from the main menu item ‘AnnoGenes’. On this page, CaGe converts an input gene list to
standard gene symbols, searches the cancer gene databases, and provides cancer gene related
annotations through the web user interface. (See the workflow of serially connected steps
A→1→2→3→4, 5 or 8 in Figure 3.)

Figure 3 Information flow in the retrieval system of CaGe.
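
The conversion of input names to standard gene symbols can be pictured with a short Python sketch. How CaGe performs this conversion internally is not described in this guide; the sketch assumes a simple case-insensitive lookup from aliases to approved symbols. The function names are hypothetical, and the small hand-written alias table stands in for a real gene name resource such as HGNC.

```python
def build_alias_index(records):
    """Map every approved symbol and alias (upper-cased) to its approved symbol."""
    index = {}
    for approved, aliases in records.items():
        # An approved symbol always wins over an alias with the same spelling.
        index[approved.upper()] = approved
        for alias in aliases:
            index.setdefault(alias.upper(), approved)
    return index


def to_standard_symbols(raw_names, index):
    """Return (converted symbols without duplicates, names that matched nothing)."""
    converted, unmatched = [], []
    for name in raw_names:
        key = name.strip().upper()
        if not key:
            continue  # skip blank entries
        symbol = index.get(key)
        if symbol is None:
            unmatched.append(name.strip())
        elif symbol not in converted:
            converted.append(symbol)
    return converted, unmatched


# Hand-written toy table (assumption): approved symbol -> known aliases.
toy_table = {"TP53": ["P53", "LFS1"], "ERBB2": ["HER2", "NEU"]}
alias_index = build_alias_index(toy_table)
symbols, not_found = to_standard_symbols(["p53", " her2 ", "TP53", "FOO"], alias_index)
print(symbols, not_found)
```

Returning the unmatched names separately keeps them visible to the user instead of dropping them silently.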

Detailed cancer gene annotation procedure is as follows:

A. Job submission (A→1 in Figure 3)

1) Click the ‘AnnoGenes’ menu ((2) in Figure 2). The gene list input page opens (Figure 4).

2) Enter a gene list in the gene list input area ((6) in Figure 4), or select a gene list file
using the file upload button ((8) in Figure 4).

3) Select the data type of the gene list entered or uploaded in step 2) using the radio
buttons ((9) in Figure 4).

4) Select one or more cancer gene sets or cancer pathway gene sets to annotate the gene list
with, using the checkboxes ((10) in Figure 4).

5) Submit the job by clicking the [Annotation] button ((11) in Figure 4). The gene list or
gene list file is transferred to the CaGe server, and the annotation job management page is
shown (Figure 5).

6) To clear the gene list input area ((6) in Figure 4), click button (12) in Figure 4.


Figure 4 User gene list or gene list file input page for cancer gene annotation by AnnoGenes menu.

Figure 5 Annotation job management page after a job submission. The bottom job is the one
just submitted, with ‘Submitted’ status.


B. Seeing annotation results (1→2→3→4, 5 or 8 in Figure 3)

After an annotation job is submitted, the annotation job management page is shown (Figure 5).
Any previous jobs that have not been deleted are listed in the job list table, sorted by
submission date and time, and the job just submitted appears in the last row. Users can view
the annotation results for any listed job and can delete unnecessary jobs.

-To see the annotation result for a job, click the job name ((14) in Figure 5). The annotation
result tables, 1) the job information table and 2) the annotation result table, are then shown
below the job list table (Figure 6).

1) Job information table (1→2 in Figure 3)

The job information table shows detailed information about the selected job, including the
input gene list type, the cancer gene sets applied, statistics about the genes and mutations
in the user input, and links ((16) in Figure 6) to two text output files: the annotation
summary file and the annotation file for matched cancer genes.

-The summary file link opens the annotation summary file shown as (a) in Figure 7. It is a
text file containing a brief job information summary, the list of annotated genes, and the
cancer gene sets applied for annotation.

-The link to the annotation file for matched cancer genes opens the annotation result file
shown as (b) in Figure 7. It is also a text file, containing all the content of the user input
gene list or gene list file plus annotations for each gene matched to the cancer genes or
cancer pathway genes in the cancer gene databases supported by CaGe (Table 1). Fields are
tab-delimited.
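
Because the downloaded annotation file is tab-delimited text, it can be read with a standard CSV reader. The following Python sketch shows one way to do so. The column names `gene_symbol` and `gene_set` and the presence of a header row are assumptions made for illustration; the real column layout should be taken from the downloaded file itself. The gene set codes in the sample follow Table 1, and GENE_X is a placeholder symbol.

```python
import csv
import io


def read_annotation_rows(handle):
    """Read tab-delimited rows into dictionaries keyed by the header line."""
    return list(csv.DictReader(handle, delimiter="\t"))


def genes_by_set(rows, symbol_col="gene_symbol", set_col="gene_set"):
    """Group gene symbols by the gene set code they matched."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[set_col], []).append(row[symbol_col])
    return grouped


# Inline sample standing in for a downloaded file (hypothetical columns).
sample = "gene_symbol\tgene_set\nTP53\tCGC\nKRAS\tCGC\nGENE_X\tCGI_PATH\n"
rows = read_annotation_rows(io.StringIO(sample))
grouped = genes_by_set(rows)
print(grouped)
```

For a real download, the `io.StringIO` object would be replaced by a file opened with `open(path, newline="")`.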

2) Annotation result table (1→2→3→4, 5 or 8 in Figure 3)

The annotation result table shows the matched cancer gene list and the associated cancer gene
annotations, including cancer type, cancer pathway, or the gene’s role in cancer. The links on
the gene symbols ((17) in Figure 6) open the detailed gene information page (Figure 8), which
shows the annotations associated with the gene, including basic gene annotations, external
database links (HGNC, OMIM, Ensembl, HPRD, UniProt), cancer related annotations, and cancer
pathways. The annotations shown on the detailed gene information page differ somewhat
depending on the source cancer gene set: for CGC genes the page is shown as (a) in Figure 8,
and for CGI genes as (b) in Figure 8.

-The link on the gene symbol ((18) or (21) in Figure 8) opens the gene information page of
the NCBI Entrez Gene database.

-The links on the external public database names ((19) or (22) in Figure 8) open the
corresponding gene information page of each database.

-The links on the cancer gene related pathway names ((20) or (24) in Figure 8) open the
detailed pathway information page shown in Figure 9. The link on the pathway database name
((25) in Figure 9) leads to the pathway information pages of the corresponding pathway
databases, including KEGG and Reactome (Figure 10). A link to BioCarta is currently not
provided because of the license policy of the BioCarta database. The links on the gene symbols
((26) and (27)) lead back to the detailed gene information page (Figure 8), providing
cross-search between gene and pathway information.

-For CGI cancer genes, one more link is provided on the PubMed IDs of the reference
literature ((23) in Figure 8), leading to the corresponding record on the PubMed site.


Figure 6 Annotation result page composed of the two tables to show the job information and matched

cancer genes.

Figure 7 Partial contents of (a) the annotation summary file and (b) the annotation result
file, in tab-delimited text.


Figure 8 Detailed cancer gene information pages: (a) for Cancer Gene Census genes and (b) for
Cancer Gene Index genes.


Figure 9 A detailed information page for a cancer gene pathway.

Figure 10 A pathway page on the KEGG site hyperlinked from the pathway information page.

C. Deleting the previous annotation jobs (1→1 in Figure 3)

Users can delete unnecessary jobs by clicking the delete links in the Action column of the job
list table ((15) in Figure 5). All information related to the deleted job is removed from the
CaGe server.

[Note] Caution: CaGe does not ask for confirmation before deleting a job.


5. Cancer pathway annotation

Cancer pathway annotation is another key function of CaGe. The cancer pathway annotation page
is reached from the main menu item ‘AnnoPathways’. On this page, CaGe evaluates the overlap
between the user input gene set and cancer pathways or other biological pathways, and conducts
a statistical test (a one-tailed Fisher’s exact test based on the hypergeometric
distribution). The cancer pathway annotation workflow is composed of the serially connected
steps B→6→7→8→9 or 3 in Figure 3.
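
For readers who want to see what a one-tailed Fisher's exact test on a gene set overlap computes, here is a minimal Python sketch based on the hypergeometric upper tail. It is not CaGe's implementation. In particular, the background gene count (`background_size`) is an assumption, since this guide does not say which gene universe CaGe uses, and the function name is hypothetical.

```python
from math import comb


def enrichment_p_value(overlap, input_size, pathway_size, background_size):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of pathway genes seen when input_size genes are drawn
    without replacement from background_size genes, of which pathway_size
    belong to the pathway. This equals the one-tailed (greater) Fisher's
    exact test on the corresponding 2x2 table.
    """
    max_overlap = min(input_size, pathway_size)
    total = comb(background_size, input_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, input_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers: 3 input genes, all 3 in a 3-gene pathway, 10 background genes.
p = enrichment_p_value(overlap=3, input_size=3, pathway_size=3, background_size=10)
print(p)
```

The choice of background strongly affects the p-value, so results from this sketch are not expected to match CaGe's output unless the same gene universe is used.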

Detailed cancer pathway annotation procedure is as follows:

A. Job submission (B→6 in Figure 3)

1) Click the ‘AnnoPathways’ menu ((3) in Figure 2). The gene list input page opens
(Figure 11).

2) Enter a gene list in the gene list input area ((6’) in Figure 11), or select a gene list
file using the file upload button ((8’) in Figure 11).

3) Select the data type of the gene list entered or uploaded in step 2) using the radio
buttons ((9’) in Figure 11).

4) Select a pathway database to be used for annotation using the checkboxes ((10’) in
Figure 11). Three pathway databases are currently provided: two pre-analyzed cancer pathway
sets (Cancer Gene Census gene-based pathways and Cancer Gene Index gene-based pathways) for
analysis limited to cancer pathways, and general pathways from BioCarta/KEGG/Reactome for
general biological pathway analysis.

5) Submit the job by clicking the [Annotation] button ((11’) in Figure 11). The gene list or
gene list file is transferred to the CaGe server, and the annotation job management page is
shown (Figure 12).

6) To clear the gene list input area ((6’) in Figure 11), click button (12’) in Figure 11.

B. Seeing pathway annotation results (6→7→8→9 or 3 in Figure 3)

After a pathway annotation job is submitted, the annotation job management page is shown
(Figure 12). Any previous jobs that have not been deleted are listed in the job list table,
sorted by submission date and time, and the job just submitted appears in the last row. Users
can view the annotation results for any listed job and can delete unnecessary jobs.

-To see the pathway annotation result for a job, click the job name ((14’) in Figure 12). The
annotation result tables, 1) the job information table and 2) the annotation result table, are
then shown below the job list table (Figure 13).

1) Job information table (6→7 in Figure 3)

The job information table shows detailed information about the selected pathway annotation
job, including the input gene list type, the cancer gene sets applied, statistics about the
genes and mutations in the user input, and links ((16’) in Figure 13) to two tab-delimited
text output files: (a) the overlap gene file, containing the genes shared between the user
input gene set and each tested cancer pathway, and (b) the statistical test result file for
those overlaps.


Figure 11 User gene list or gene list file input page for cancer pathway annotation by AnnoPathways

menu.

Figure 12 Annotation job management page after a job submission. The bottom job is the one
just submitted, with ‘Submitted’ status.


Figure 13 Annotation result page composed of the two tables to show the job information and

assigned cancer pathways.

Figure 14 Partial contents of (a) the overlap gene file, listing the intersecting genes
between the user input gene set and each tested cancer pathway, and (b) the statistical test
result file for the intersection, in tab-delimited text.


-The Overlap gene file link opens the pathway-overlapping gene file shown as (a) in Figure 14.
It is a text file describing the genes shared between the user input gene set and each tested
cancer pathway: pathway name, number of overlapping genes, number of genes in the tested
pathway, overlapping gene list, and pathway gene list.

-The Statistical test result file link opens the statistical test result file shown as (b) in
Figure 14. This file gives the p-values for the overlap between the user gene set and each
tested pathway, and q-values from multiple test correction with FDR control.

2) Annotation result table (6→7→8→9 or 3 in Figure 3)

The annotation result table shows the assigned pathway list and statistical parameters. The
links on the pathway names ((17’) in Figure 13) open the detailed pathway information page
described in ‘B. Seeing annotation results’ in the previous chapter and shown in Figure 9. The
link on the pathway database name ((25) in Figure 9) leads to the pathway information pages of
the corresponding pathway databases, including KEGG and Reactome (Figure 10). The links on the
gene symbols ((26) and (27)) lead back to the detailed gene information page (Figure 8),
providing cross-search between gene and pathway information.

C. Deleting the previous annotation jobs (6→6 in Figure 3)

Users can delete unnecessary jobs by clicking the delete links in the Action column of the job
list table ((15’) in Figure 12). All information related to the deleted job is removed from
the CaGe server.

[Note] Caution: CaGe does not ask for confirmation before deleting a job.

6. Browsing cancer and cancer related genes

The CancerGenes menu links to the cancer gene browsing page, which shows a table of cancer
genes and cancer related genes with various cancer related annotations, as shown in Figure 15.
The information search flow in the cancer gene browsing page is composed of the serial steps
C→2→3→4, 5 or 8 in Figure 3.

-The listed cancer gene and cancer related gene sets are 1) the CGC cancer gene set, 2) the
CGI cancer gene set, 3) the CGC-based cancer pathway gene set, and 4) the CGI-based cancer
pathway gene set, as summarized in Table 1.

-The information in the cancer gene list table is the same as that in the annotation result
table of the cancer gene annotation function (Figure 6). The information search flow after
following the link on a gene symbol ((28) in Figure 15) is also the same (2→3→4, 5 or 8 in
Figure 3).


Figure 15 Cancer and cancer pathway gene list page by Browse menu.

7. Browsing cancer pathways

The CancerPathways menu links to the cancer pathway browsing page, which lists the pathways
significantly enriched for the reported cancer gene sets, as shown in Figure 16. Details of
each pathway, including the member gene list, are given on the detailed pathway information
page shown previously in Figure 9. The links on the genes in a detailed pathway information
page connect the browsing flow to the cancer and cancer related gene information provided by
the Browse menu. The information search flow in the cancer pathway browsing page is therefore
composed of the serial steps D→7→8→9 or 3 in Figure 3.

-The cancer pathway sets are 1) the CGC-based cancer pathway set and 2) the CGI-based cancer
pathway set.

-The cancer pathway list table gives, for each pathway, the pathway name with its source
pathway database, the number of genes in the pathway, the number of genes overlapping between
the base cancer gene set and the pathway, the p-value for the overlap, and the q-value from
multiple test correction with FDR control.

-The links on the pathway names ((29) in Figure 16) open the detailed pathway information
page shown in Figure 9. The information search flow after the detailed pathway information
page is the same as described for the annotation result table in chapter 4, Cancer gene
annotation.


Figure 16 Cancer pathway list page by Pathways menu.

Comments and questions:

Young-Kyu Park ([email protected]) or Seon-Young Kim ([email protected])

[End of document]