Literature Informatics August 20, 2013 Ansuman Chattopadhyay, PhD Head, Molecular Biology Information Service Health Sciences Library System University of Pittsburgh [email protected] http://www.hsls.pitt.edu/molbio


Page 1: Literature Informatics August 20 ,  2013

Literature Informatics
August 20, 2013

Ansuman Chattopadhyay, PhD
Head, Molecular Biology Information Service
Health Sciences Library System
University of Pittsburgh
[email protected]

http://www.hsls.pitt.edu/molbio

Page 2: Literature Informatics August 20 ,  2013

Ansuman Chattopadhyay, Ph.D.
[email protected]

• 1990-1996
University of Nebraska-Lincoln
PhD in Biochemistry
Protein synthesis initiation in eukaryotic system

• 1997-2001
Vanderbilt University School of Medicine, Nashville
Research Fellow
Epidermal Growth Factor (EGF) mediated signal transduction

• 2001-2002
Cellomics Inc., Pittsburgh
Knowledge Engineer

• 2002-2006
Information Specialist in Molecular Biology and Genetics
HSLS, University of Pittsburgh

• 2006-Present
Head, Molecular Biology Information Service
HSLS, University of Pittsburgh

http://www.hsls.pitt.edu/molbio

Page 3: Literature Informatics August 20 ,  2013

Carrie L Iwema, PhD, MLS, AHIP
[email protected]

• 1993-2001
SUNY Upstate Medical University, Syracuse
PhD in Neuroscience
Specificity of connectivity in the mammalian olfactory system

• 2001-2005
Yale University School of Medicine, New Haven
Postdoctoral Fellow
Development & regeneration in the mammalian olfactory system

• 2004-2006
Southern Connecticut State University, New Haven
Masters in Library Science

• 2006-2007
Johns Hopkins Medical Institutions, Welch Medical Library, Baltimore
Basic Science Librarian

• 2007-Present
University of Pittsburgh, Health Sciences Library System, Pittsburgh
Information Specialist in Molecular Biology

http://www.hsls.pitt.edu/molbio

Page 4: Literature Informatics August 20 ,  2013

HSLS Molecular Biology Information Service

Workshops

Website

Software Licensing

Bioinformatics Consultations

http://www.hsls.pitt.edu/molbio

Page 5: Literature Informatics August 20 ,  2013

Today’s Agenda

10 am to 12 pm: Genes, Proteins and Literature Searching

1 pm to 3 pm: NCBI Resources - Genetic Variations Databases

3 pm to 3:30 pm: Q&A Session

Page 6: Literature Informatics August 20 ,  2013

Literature Informatics

http://www.hsls.pitt.edu/guides/genetics

Page 7: Literature Informatics August 20 ,  2013

Learn how to

.. find the most appropriate literature

.. mine the literature

.. manage your collected information

.. browse scientific papers

http://www.hsls.pitt.edu/molbio

Page 8: Literature Informatics August 20 ,  2013

Topics

• Introduction
• Intuitive PubMed Search
• Next-Generation Literature Search Tools
• Reference Management
• PDF Reader

http://www.hsls.pitt.edu/molbio

Page 9: Literature Informatics August 20 ,  2013

Literature Informatics

Comprehensive search:
• MeSH term based PubMed search
• PubMed special topics query

Next-generation literature search tools:
• GoPubMed
• eTBLAST
• Quertle
• HuGE Navigator
• Utopia Docs

http://www.hsls.pitt.edu/molbio

Page 10: Literature Informatics August 20 ,  2013

Introduction

Page 11: Literature Informatics August 20 ,  2013

Genomic achievements since the Human Genome Project

http://www.hsls.pitt.edu/molbio

Page 12: Literature Informatics August 20 ,  2013

Progress in Genomics

              1990        2003          2013
Technology    [image]     [image]       [image]
Time          6-8 years   3-4 months    2-3 days
Cost          1B          10-50 M       4-6 K

Source: Eric Green; HGP10 Symposium

Page 13: Literature Informatics August 20 ,  2013

DNA Sequencing Cost

http://www.hsls.pitt.edu/molbio

Page 14: Literature Informatics August 20 ,  2013

Big DATA Biology

Small Science
• Single Gene
• Single Protein
• Single lab

Big Science
• Multi-Gene
• System-wide
• High-throughput
• Multi-Institution

Page 15: Literature Informatics August 20 ,  2013

Growth of PubMed Citations

Lu et al. Database (Oxford). 2011: baq036

PubMed citation counts as of Aug 20th, 2013:
• Breast Cancer: 266K
• Schizophrenia: 104K
• BRCA1: 9.9K
• p53: 67K

http://www.hsls.pitt.edu/molbio

Page 16: Literature Informatics August 20 ,  2013

Searching PubMed

http://www.hsls.pitt.edu/molbio

Page 17: Literature Informatics August 20 ,  2013

Find published literature with statistical and numerical data on DENGUE OUTBREAKS in India.

What genes are reported to be associated with the disease SCHIZOPHRENIA?

http://www.hsls.pitt.edu/molbio

Page 18: Literature Informatics August 20 ,  2013

Citations: 20 million
Journals: 5200

Schizophrenia: 96,912
Schizophrenia gene: 7382

Dengue outbreaks in India: 329
Dengue outbreaks statistics India: 21

http://www.hsls.pitt.edu/molbio

Page 19: Literature Informatics August 20 ,  2013

Challenges

Am I getting everything / the right things?

How to digest this?

http://www.hsls.pitt.edu/guides/genetics

Page 20: Literature Informatics August 20 ,  2013

Medical Subject Headings (MeSH)

http://www.hsls.pitt.edu/guides/genetics

The U.S. National Library of Medicine's controlled vocabulary (thesaurus)

Arranged in a hierarchical manner called the MeSH Tree Structures

Updated annually

Page 21: Literature Informatics August 20 ,  2013

MeSH Vocabulary

Headings
over 24,000, representing concepts found in the biomedical literature (Body Weight, Kidney, Radioactive Waste)

Subheadings
attached to headings to describe a specific aspect of a concept (adverse effects, metabolism, diagnosis, therapy)

Supplementary Concept Records
over 172,000 terms in a separate chemical thesaurus, updated weekly (cordycepin, valspodar, tacrolimus binding protein 4)

Publication Types
(Letter, Review, Randomized Controlled Trial)

http://www.hsls.pitt.edu/guides/genetics

Page 22: Literature Informatics August 20 ,  2013

MeSH Tree Structure

A. Anatomy
B. Organisms
C. Diseases
D. Chemicals and Drugs
E. Analytical, Diagnostic and Therapeutic Techniques and Equipment
F. Psychiatry and Psychology
G. Biological Sciences
H. Physical Sciences
I. Anthropology, Education, Sociology and Social Phenomena
J. Technology and Food and Beverages
K. Humanities
L. Information Science
M. Persons
N. Health Care
V. Publication Characteristics
Z. Geographic Locations

http://www.hsls.pitt.edu/guides/genetics

Page 23: Literature Informatics August 20 ,  2013

MeSH Indexing

http://www.hsls.pitt.edu/guides/genetics

Source: NLM

Page 26: Literature Informatics August 20 ,  2013

Find published literature with statistical and numerical data on DENGUE OUTBREAKS in India.

Page 27: Literature Informatics August 20 ,  2013

PubMed Query Using MeSH http://www.ncbi.nlm.nih.gov/mesh

http://www.hsls.pitt.edu/molbio

Page 28: Literature Informatics August 20 ,  2013

http://www.hsls.pitt.edu/molbio

Find articles on “Dengue outbreaks in India” by searching PubMed using MeSH terms

Link to the video tutorial: http://media.hsls.pitt.edu/media/clres2705/mesh.swf

Resources
• MeSH Browser: http://www.ncbi.nlm.nih.gov/mesh
• PubMed: http://www.ncbi.nlm.nih.gov/pubmed

Page 29: Literature Informatics August 20 ,  2013

PubMed Query Using MeSH

http://www.hsls.pitt.edu/guides/genetics

Page 31: Literature Informatics August 20 ,  2013

Building a PubMed Query

http://www.hsls.pitt.edu/guides/genetics

Page 36: Literature Informatics August 20 ,  2013

Building PubMed Queries

Term      Boolean   Term                                       Boolean   Term    # papers
Dengue    AND       Outbreaks                                                    823
Dengue*   AND       Outbreaks                                                    746
Dengue    AND       Outbreaks                                  AND       India   123
Dengue*   AND       Outbreaks                                  AND       India   116
Dengue    AND       Outbreaks/statistics and numerical data    AND       India   7
Dengue*   AND       Outbreaks/statistics and numerical data    AND       India   7
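The query-building steps in the table above can also be scripted. The following is a minimal Python sketch, not part of the original workshop, that assembles MeSH-tagged Boolean queries as plain strings. The helper names (mesh_term, and_query) are hypothetical, the [MeSH Terms] field tag is the one PubMed shows in its query translations, and the heading name "Disease Outbreaks" is an assumption that should be confirmed in the MeSH Browser before use.

```python
def mesh_term(heading, subheading=None):
    """Format one MeSH heading, optionally narrowed by a subheading.

    Hypothetical helper: the heading/subheading pair is joined with a
    slash, mirroring "Outbreaks/statistics and numerical data" on the slide.
    """
    label = f"{heading}/{subheading}" if subheading else heading
    return f'"{label}"[MeSH Terms]'


def and_query(*terms):
    """Join already-formatted terms with the Boolean operator AND."""
    return " AND ".join(terms)


# Broad query: two concepts only.
broad = and_query(mesh_term("Dengue"), mesh_term("Disease Outbreaks"))

# Narrow query: add a subheading and a geographic heading.
# "Disease Outbreaks" is an assumed heading name; check it in the MeSH Browser.
narrow = and_query(
    mesh_term("Dengue"),
    mesh_term("Disease Outbreaks", "statistics and numerical data"),
    mesh_term("India"),
)

print(broad)
print(narrow)
```

Each added term or subheading narrows the result set, which is the pattern the paper counts in the table illustrate.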

http://www.hsls.pitt.edu/guides/genetics

Page 37: Literature Informatics August 20 ,  2013

Useful Links for MeSH

MeSH Browser: http://www.ncbi.nlm.nih.gov/mesh

18 ways to improve your PubMed searches, by Carrie Iwema: http://bitesizebio.com/2008/03/05/18-ways-to-improve-your-pubmed-searches/

Searching by using the MeSH Database, NCBI Handbook: http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helppubmed&part=pubmedhelp#pubmedhelp.Searching_by_using_t

http://www.hsls.pitt.edu/guides/genetics

Page 38: Literature Informatics August 20 ,  2013

What genes are reported to be associated with the disease SCHIZOPHRENIA?

Page 39: Literature Informatics August 20 ,  2013

Topic-Specific PubMed Queries

http://www.nlm.nih.gov/bsd/special_queries.html

http://www.hsls.pitt.edu/guides/genetics

Page 40: Literature Informatics August 20 ,  2013

http://www.hsls.pitt.edu/molbio

Find genes that are reported to be associated with the disease SCHIZOPHRENIA by searching PubMed

Link to the video tutorial: http://media.hsls.pitt.edu/media/clres2705/scz.swf

Resources
• PubMed Clinical Queries: http://www.ncbi.nlm.nih.gov/pubmed/clinical

Page 41: Literature Informatics August 20 ,  2013

PubMed Special Topic Queries

http://www.hsls.pitt.edu/molbio

Page 42: Literature Informatics August 20 ,  2013

Search Filters

http://www.hsls.pitt.edu/molbio

Page 43: Literature Informatics August 20 ,  2013

PubMed Search Filter: Medical Genetics

("schizophrenia"[MeSH Terms] OR "schizophrenia"[All Fields]) AND (("genetics, medical"[MeSH Terms] OR ("genetics"[All Fields] AND "medical"[All Fields]) OR "medical genetics"[All Fields] OR ("medical"[All Fields] AND "genetics"[All Fields])) OR ("genotype"[MeSH Terms] OR "genotype"[All Fields]) OR "genetics"[Subheading] AND ("genetics"[Subheading] OR "genetics"[All Fields] OR "genetics"[MeSH Terms]))

http://www.hsls.pitt.edu/molbio
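A filter of this kind is a second parenthesized expression joined to the topic with AND. The sketch below is an illustration, not the workshop's own method: it builds a reduced version of the filter (only the genotype and genetics-subheading clauses, not the full Medical Genetics filter) and wraps the result in an NCBI E-utilities esearch URL. The helper names (either, esearch_url) are hypothetical, and no request is sent.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Public NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def either(term):
    """Match a term as a MeSH heading or anywhere in the record."""
    return f'("{term}"[MeSH Terms] OR "{term}"[All Fields])'


def esearch_url(query, retmax=20):
    """Build (but do not send) an esearch request URL for PubMed."""
    params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


topic = either("schizophrenia")

# Reduced filter for illustration only; the slide's filter has more clauses.
genetics_filter = f'({either("genotype")} OR "genetics"[Subheading])'

query = f"{topic} AND {genetics_filter}"
url = esearch_url(query)

print(query)
print(url)
```

Keeping the topic and the filter as separate strings makes it easy to reuse the same filter with a different disease term.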

Page 44: Literature Informatics August 20 ,  2013

Topic-Specific PubMed Queries

Page 45: Literature Informatics August 20 ,  2013

PubMed Search Result Display

http://www.hsls.pitt.edu/molbio

How to digest this?

Page 46: Literature Informatics August 20 ,  2013

Data Mining-Knowledge Discovery

http://www.hsls.pitt.edu/molbio

Page 47: Literature Informatics August 20 ,  2013

Next-Generation Literature Search Tools

• GoPubMed
• Quertle

Page 48: Literature Informatics August 20 ,  2013

Latest Innovations in Literature Searching

GoPubMed: displays search results sorted into meaningful topics and subtopics

http://www.hsls.pitt.edu/molbio

Page 49: Literature Informatics August 20 ,  2013

GoPubMed

http://www.hsls.pitt.edu/molbio

www.gopubmed.com

Page 50: Literature Informatics August 20 ,  2013

http://www.hsls.pitt.edu/molbio

Find genes that are reported to be associated with the disease SCHIZOPHRENIA by using GoPubMed

Link to the video tutorial: http://media.hsls.pitt.edu/media/clres2705/gopubmed.swf

Resources
• GoPubMed: http://www.gopubmed.org/web/gopubmed/2?WEB10O00h00100090000

Page 51: Literature Informatics August 20 ,  2013

GoPubMed Search Result

http://www.hsls.pitt.edu/molbio

Page 52: Literature Informatics August 20 ,  2013

GoPubMed Search Result Analysis

http://www.hsls.pitt.edu/molbio

Page 53: Literature Informatics August 20 ,  2013

GoPubMed Search Result Analysis

http://www.hsls.pitt.edu/molbio

Page 54: Literature Informatics August 20 ,  2013

Latest Innovations in Literature Searching

http://www.hsls.pitt.edu/molbio

Page 55: Literature Informatics August 20 ,  2013

GoPubMed

Noteworthy links

Doms A, Schroeder M. GoPubMed: exploring PubMed with the Gene Ontology. Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W783-6. http://www.ncbi.nlm.nih.gov/pubmed/15980585

http://www.hsls.pitt.edu/molbio

Page 56: Literature Informatics August 20 ,  2013

PubMed driven Web Tools

http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/search/

http://www.hsls.pitt.edu/molbio

Page 57: Literature Informatics August 20 ,  2013

PubMed based Tools

http://www.hsls.pitt.edu/molbio

Lu et al. Database (Oxford). 2011; 2011: baq036

http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/search/

Page 58: Literature Informatics August 20 ,  2013

Extract gene list from Literature

http://www.hsls.pitt.edu/molbio

http://www.quertle.info/

Page 59: Literature Informatics August 20 ,  2013

Questions for Quertle

What genes cause Asthma? What cell lines are used in diabetes research? Which cell types are known to express EGFR? What animals are used in studies for diabetes? Which protein kinases activate TP53?

http://www.hsls.pitt.edu/molbio

Page 60: Literature Informatics August 20 ,  2013

http://www.hsls.pitt.edu/molbio

A short video on Quertle

Link to the video tutorial: http://media.hsls.pitt.edu/media/molbiovideos/quertle-ac0212.swf

Resources
• Quertle: www.quertle.info

Page 61: Literature Informatics August 20 ,  2013

Search Engine for NIH-Funded Research

http://projectreporter.nih.gov/reporter.cfm

http://www.hsls.pitt.edu/molbio

Page 62: Literature Informatics August 20 ,  2013

NIH Grant Applications to Gene List

http://www.hsls.pitt.edu/molbio

Page 63: Literature Informatics August 20 ,  2013

PubMed

MESH GoPubMed Quertle

Page 64: Literature Informatics August 20 ,  2013

Search Engine for finding

Disease Causing Genes

Page 65: Literature Informatics August 20 ,  2013

Search Engine Just for Human Genetics

CDC HuGE Navigator: http://hugenavigator.net/

http://www.hsls.pitt.edu/molbio

Page 66: Literature Informatics August 20 ,  2013

http://www.hsls.pitt.edu/molbio

Find human genes reported to be associated with Asthma

Find human SNPs reported to be associated with Asthma

Link to the video tutorial: http://media.hsls.pitt.edu/media/clres2705/asthma.swf

Resources
• HuGE Navigator: http://hugenavigator.net/HuGENavigator/home.do

Page 67: Literature Informatics August 20 ,  2013

GWAS Catalog http://www.ebi.ac.uk/fgpt/gwas/#timeseriestab

http://www.hsls.pitt.edu/molbio

Page 70: Literature Informatics August 20 ,  2013

Search Engine Just for Human Genetics

http://hugenavigator.net/HuGENavigator/huGEPedia.do

http://www.hsls.pitt.edu/molbio

Page 72: Literature Informatics August 20 ,  2013

Find Disease Causing SNPs

What SNPs are associated with “Schizophrenia”?

http://hugenavigator.net/HuGENavigator/gWAHitStartPage.do

http://www.hsls.pitt.edu/molbio

Page 73: Literature Informatics August 20 ,  2013

Hands On

Search PubMed and retrieve a list of genes that can serve as biomarkers for Alzheimer Disease

Page 74: Literature Informatics August 20 ,  2013

Software for Finding Similar Text in Published Literature

Page 75: Literature Informatics August 20 ,  2013

Text-based Similarity Search Tools

Search Box:

http://www.hsls.pitt.edu/molbio

Page 76: Literature Informatics August 20 ,  2013

Text Similarity Search Tools

eTBLAST

http://etest.vbi.vt.edu/etblast3/

http://www.hsls.pitt.edu/molbio

Page 77: Literature Informatics August 20 ,  2013

Text Similarity Search Tools eTBLAST

http://www.hsls.pitt.edu/molbio

Page 78: Literature Informatics August 20 ,  2013

Text Similarity Search Tools

http://www.hsls.pitt.edu/molbio

Page 79: Literature Informatics August 20 ,  2013

Déjà Vu: a Database of Highly Similar Citations

http://dejavu.vbi.vt.edu/dejavu/duplicate/

http://www.hsls.pitt.edu/molbio

Page 80: Literature Informatics August 20 ,  2013

Reference Management Tools

http://www.hsls.pitt.edu/molbio

Page 81: Literature Informatics August 20 ,  2013

Automated email Notification Tool

http://www.ncbi.nlm.nih.gov/sites/myncbi/

Save your searches at My NCBI and set up email notifications for new publications that match your search query.

http://www.hsls.pitt.edu/molbio

Page 82: Literature Informatics August 20 ,  2013

My NCBI

http://www.hsls.pitt.edu/molbio

Page 83: Literature Informatics August 20 ,  2013

My NCBI

http://www.hsls.pitt.edu/molbio


Page 84: Literature Informatics August 20 ,  2013

My NCBI email Notification

http://www.hsls.pitt.edu/molbio

Page 85: Literature Informatics August 20 ,  2013

Reference Management Tools

Connotea

CiteULike

Mendeley

Zotero

online

downloadable

Scientific Research Papers

Web pages

EndNote

RefWorks

http://www.hsls.pitt.edu/molbio

Page 86: Literature Informatics August 20 ,  2013

PDF Reader

http://www.hsls.pitt.edu/molbio

Page 87: Literature Informatics August 20 ,  2013

PDF Reader

Utopia.Docs ReadCube

Page 88: Literature Informatics August 20 ,  2013

NextGen PDF Reader: Utopia Docs

http://www.hsls.pitt.edu/molbio

Page 89: Literature Informatics August 20 ,  2013

Utopia Docs

PMID: 22683712

Page 90: Literature Informatics August 20 ,  2013

http://www.hsls.pitt.edu/molbio

Read a paper using Utopia.Docs

Link to the video tutorial: http://www.hsls.pitt.edu/molbio/videos/play?v=95

Resources
• Utopia.Docs: http://getutopia.com/

Page 91: Literature Informatics August 20 ,  2013

READCUBE

READCUBE MEDIA FILE

Mendeley Utopia Docs

Page 92: Literature Informatics August 20 ,  2013

Search Engine for Life Scientists

http://www.hsls.pitt.edu/molbio

Page 93: Literature Informatics August 20 ,  2013

Molecular Databases

• Nucleic Acids Research (NAR): Annual Database Issue
• NAR: Annual Web Server Issue
• Oxford Journals: Bioinformatics
• BioMed Central: BMC Bioinformatics

http://www.hsls.pitt.edu/molbio

Growth of bioinformatics tools

Page 94: Literature Informatics August 20 ,  2013

Biomedical & Life Sciences Search Engines

• OBRC (University of Pittsburgh): http://www.hsls.pitt.edu/molbio/obrc
• Bioinformatics.ca: http://bioinformatics.ca/links_directory/
• OReFil (University of Tokyo): http://orefil.dbcls.jp/

http://www.hsls.pitt.edu/molbio

Page 95: Literature Informatics August 20 ,  2013

Peptide Sequence

>nxp|NX_P00533-1|EGFR|Epidermal growth factor receptor|Iso 1

MRPSGTAGAALLALLAALCPASRALEEKKVCQGTSNKLTQLGTFEDHFLSLQRMFNNCEV VLGNLEITYVQRNYDLSFLKTIQEVAGYVLIALNTVERIPLENLQIIRGNMYYENSYALA VLSNYDANKTGLKELPMRNLQEILHGAVRFSNNPALCNVESIQWRDIVSSDFLSNMSMDF QNHLGSCQKCDPSCPNGSCWGAGEENCQKLTKIICAQQCSGRCRGKSPSDCCHNQCAAGC TGPRESDCLVCRKFRDEATCKDTCPPLMLYNPTTYQMDVNPEGKYSFGATCVKKCPRNYV VTDHGSCVRACGADSYEMEEDGVRKCKKCEGPCRKVCNGIGIGEFKDSLSINATNIKHFK NCTSISGDLHILPVAFRGDSFTHTPPLDPQELDILKTVKEITGFLLIQAWPENRTDLHAF ENLEIIRGRTKQHGQFSLAVVSLNITSLGLRSLKEISDGDVIISGNKNLCYANTINWKKL FGTSGQKTKIISNRGENSCKATGQVCHALCSPEGCWGPEPRDCVSCRNVSRGRECVDKCN LLEGEPREFVENSECIQCHPECLPQAMNITCTGRGPDNCIQCAHYIDGPHCVKTCPAGVM GENNTLVWKYADAGHVCHLCHPNCTYGCTGPGLEGCPTNGPKIPSIATGMVGALLLLLVV ALGIGLFMRRRHIVRKRTLRRLLQERELVEPLTPSGEAPNQALLRILKETEFKKIKVLGS GAFGTVYKGLWIPEGEKVKIPVAIKELREATSPKANKEILDEAYVMASVDNPHVCRLLGI CLTSTVQLITQLMPFGCLLDYVREHKDNIGSQYLLNWCVQIAKGMNYLEDRRLVHRDLAA RNVLVKTPQHVKITDFGLAKLLGAEEKEYHAEGGKVPIKWMALESILHRIYTHQSDVWSY GVTVWELMTFGSKPYDGIPASEISSILEKGERLPQPPICTIDVYMIMVKCWMIDADSRPK FRELIIEFSKMARDPQRYLVIQGDERMHLPSPTDSNFYRALMDEEDMDDVVDADEYLIPQ QGFFSSPSTSRTPLLSSLSATSNNSTVACIDRNGLQSCPIKEDSFLQRYSSDPTGALTED SIDDTFLPVPEYINQSVPKRPAGSVQNPVYHNQPLNPAPSRDPHYQDPHSTAVGNPEYLN TVQPTCVNSTFDSPAHWAQKGSHQISLDNPDYQQDFFPKEAKPNGIFKGSTAENAEYLRV

APQSSEFIGA
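A record like the one above is in FASTA format: a header line starting with ">" followed by sequence lines. The sketch below is a minimal parser, offered as an illustration rather than a reference implementation. It assumes the pipe-delimited header layout seen on the slide (source, accession, gene symbol, description, isoform), uses hypothetical function names, and runs on only the first 70 residues of the slide's sequence to stay short.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Drop internal spaces such as the 60-residue block separators.
            chunks.append("".join(line.split()))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def parse_header(header):
    """Split a pipe-delimited header; the five-field layout is an assumption."""
    source, accession, gene, description, isoform = header.split("|")
    return {
        "source": source,
        "accession": accession,
        "gene": gene,
        "description": description,
        "isoform": isoform,
    }


# First 70 residues of the slide's sequence, kept short for the example.
example = """>nxp|NX_P00533-1|EGFR|Epidermal growth factor receptor|Iso 1
MRPSGTAGAALLALLAALCPASRALEEKKVCQGTSNKLTQLGTFEDHFLSLQRMFNNCEV VLGNLEITYV
"""

records = parse_fasta(example)
header, sequence = records[0]
fields = parse_header(header)
print(fields["gene"], fields["accession"], len(sequence))
```

A cleaned sequence string of this kind is what most online prediction tools expect as input.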

Page 96: Literature Informatics August 20 ,  2013

http://www.hsls.pitt.edu/molbio

Search for Bioinformatics Resources

Locate online tools that predict phosphorylation sites in a protein sequence.

Link to the video tutorial: http://media.hsls.pitt.edu/media/Search%20Bioinfo%20Resoursec.swf

Resources
• Search.HSLS.MolBio :

Page 97: Literature Informatics August 20 ,  2013

Life Sciences Search Engine

http://www.hsls.pitt.edu/molbio/

http://www.hsls.pitt.edu/molbio

Page 98: Literature Informatics August 20 ,  2013

Hands on

(a) Locate two online databases on microRNA targets, and report their URLs.

(b) Identify one database that stores all HIV-inhibiting siRNA data and cite its URL.

(c) Cite a paper offering a step-by-step guide for analyzing ChIP-seq data.

Page 99: Literature Informatics August 20 ,  2013

Summary

• Construct a comprehensive PubMed query: MeSH Browser
• Retrieve gene/protein/disease information straight from a PubMed search: GoPubMed, Quertle
• Find suitable journals for manuscript submission, etc.: eTBLAST
• Find disease-causing genes and disease-causing SNPs: HuGE Navigator (GeneProspector, Phenopedia, GWAS Integrator)
• Set up email alerts for new publications: My NCBI
• Search for databases and software: OBRC

http://www.hsls.pitt.edu/guides/genetics

Page 101: Literature Informatics August 20 ,  2013

Gene/Protein Information Mining

http://www.hsls.pitt.edu/guides/genetics

Page 102: Literature Informatics August 20 ,  2013

Bioinformatics Databases & Software Providers

National Center for Biotechnology Information (NCBI)
• Home page
• Site map
• Resource Guide

European Bioinformatics Institute (EBI)
• Home page
• Databases
• Software

http://www.hsls.pitt.edu/guides/genetics

Page 103: Literature Informatics August 20 ,  2013

Gene Information Gateways

o Open access resources:

National Center for Biotechnology Information (NCBI)
• Genbank
• Refseq
• Entrez Gene
• Gene Expression Omnibus (GEO)
• OMIM

http://www.hsls.pitt.edu/guides/genetics

Page 104: Literature Informatics August 20 ,  2013

Protein Information Hubs

o Open access resources:

European Bioinformatics Institute (EBI)
• Uniprot
• Interpro
• Prosite
• STRING

UCSC Genome Bioinformatics
• BLAT Search
• Gene Detail Page

http://www.hsls.pitt.edu/guides/genetics

Page 105: Literature Informatics August 20 ,  2013

Protein Information Hubs

o Open access resources:

National Center for Biotechnology Information (NCBI)
• Refseq
• Entrez Gene
• Conserved Domain Database (CDD)
• Molecular Modeling Database (MMDB)
• 3D structure viewer: Cn3D

http://www.hsls.pitt.edu/guides/genetics

Page 106: Literature Informatics August 20 ,  2013

Gene/Protein Information

Chromosomal location, mRNA, genomic seq, orthologs, paralogs, regulatory elements,

Amino acid seq, domain architecture, protein structure, post translational modifications

Gene expression, biological pathways, protein interaction map, disease association, biomarkers

http://www.hsls.pitt.edu/guides/genetics

Page 107: Literature Informatics August 20 ,  2013

Gene Questions?

What is its function?

What are its neighboring genes?

What is its genomic sequence?
How many splice variants are there?
What is its intron-exon architecture?

What diseases are associated with it?

In which tissues is it expressed?

How can I get its cDNA clone?

http://www.hsls.pitt.edu/guides/genetics

Page 108: Literature Informatics August 20 ,  2013

SNP

Genomic Sequence

Expression Profile

Interacting Partners

3D Structure

mRNA Sequence

Chromosomal Localization

Disease

Amino acid Sequence

Homologous Sequences

http://www.hsls.pitt.edu/guides/genetics

NCBI : Entrez Gene

Page 109: Literature Informatics August 20 ,  2013

Entrez Gene

Find:
• gene symbols and aliases
• sequences: genomic, mRNA, protein
• intron-exon architecture
• genomic context: neighboring and antisense genes
• interacting partners
• associated gene ontology terms: function, cellular component and biological process

http://www.hsls.pitt.edu/guides/genetics

Page 110: Literature Informatics August 20 ,  2013

Entrez Gene

• a searchable database of genes, from RefSeq genomes, and defined by sequence and/or located in the NCBI Map Viewer
• Statistics: Gene: 7974 organisms; Genbank: 160,000 organisms
• each record represents a single gene from a given organism

http://www.hsls.pitt.edu/guides/genetics

Page 111: Literature Informatics August 20 ,  2013

NCBI Sequence Databases

GenBank
• archival database of nucleotide sequences from >160,000 organisms
• More info

GenPept
• conceptual translation of GenBank CDS

Refseq
• based on GenBank records; non-redundant, expert-verified database of reference sequences

http://www.hsls.pitt.edu/guides/genetics

Page 112: Literature Informatics August 20 ,  2013

International Nucleotide Sequence Database Collaboration

http://www.hsls.pitt.edu/guides/genetics

Page 113: Literature Informatics August 20 ,  2013

Primary Vs Derivative databases

http://www.hsls.pitt.edu/guides/genetics

Page 114: Literature Informatics August 20 ,  2013

RefSeq Scope & Accessions

Genomic DNA
• NC_123456 - complete genome, complete chromosome, complete plasmid
• NG_123456 - genomic region
• NT_123456 - genomic contig

mRNA - NM_123456

Protein - NP_123456

more about RefSeq scope and accessions...
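The prefix scheme listed above lends itself to a simple lookup. The following sketch, an illustration only, maps the five prefixes shown on the slide to their molecule types; the function name and the regular expression (two letters, underscore, digits, optional version suffix) are assumptions, and RefSeq prefixes not on the slide are deliberately left out.

```python
import re

# Only the prefixes listed on the slide; other RefSeq prefixes are not covered.
REFSEQ_PREFIXES = {
    "NC": "genomic DNA: complete genome, complete chromosome, or complete plasmid",
    "NG": "genomic DNA: genomic region",
    "NT": "genomic DNA: genomic contig",
    "NM": "mRNA",
    "NP": "protein",
}

# Assumed accession shape: PREFIX_digits with an optional ".version" suffix.
ACCESSION_RE = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def classify_refseq(accession):
    """Return the molecule type for a RefSeq-style accession, or None."""
    match = ACCESSION_RE.match(accession.strip())
    if match is None:
        return None
    return REFSEQ_PREFIXES.get(match.group(1))


# NM_123456 and NP_123456 are the placeholder accessions used on the slide.
for acc in ["NM_123456", "NP_123456.2", "NC_123456", "not-an-accession"]:
    print(acc, "->", classify_refseq(acc))
```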

http://www.hsls.pitt.edu/guides/genetics

Page 115: Literature Informatics August 20 ,  2013

RefSeq Status Codes

• Provisional
• Reviewed
• Predicted
• Genome Annotation

more about RefSeq status codes

http://www.hsls.pitt.edu/guides/genetics

Page 116: Literature Informatics August 20 ,  2013

Hands on

Find the mRNA sequence for your gene of interest (p53, BRCA1, EGFR, PLCg1)

Start page: Entrez core nucleotide
Use: Limits, History and Preview/Index
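Once an mRNA accession has been found through the web interface, the same record can be requested programmatically. This sketch, which is not part of the workshop, builds an NCBI E-utilities efetch URL that asks the nucleotide database for a FASTA record. NM_123456 is the placeholder accession from the RefSeq slide, not a real gene record; the function name is hypothetical, and no request is sent.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities fetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def mrna_fasta_url(accession):
    """Build (but do not send) an efetch URL for an mRNA RefSeq record."""
    if not accession.startswith("NM_"):
        raise ValueError("expected an mRNA RefSeq accession starting with NM_")
    params = {
        "db": "nuccore",      # Entrez nucleotide database
        "id": accession,
        "rettype": "fasta",   # return the sequence in FASTA format
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"


# Placeholder accession; substitute the accession found for your gene.
url = mrna_fasta_url("NM_123456")
print(url)
```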

http://www.hsls.pitt.edu/guides/genetics

Page 118: Literature Informatics August 20 ,  2013

Video Tutorials

http://www.hsls.pitt.edu/molbio/videos?c=3

http://www.hsls.pitt.edu/guides/genetics

Page 119: Literature Informatics August 20 ,  2013

Find mRNA Sequence for Reelin Gene.

http://www.hsls.pitt.edu/guides/genetics

Page 120: Literature Informatics August 20 ,  2013

Gene Function

What is its function?

Entrez Gene Page:
• Summary (TOC)
• Gene Ontology
• GeneRIFs
• Pathways (TOC)
• Biosystems (Links)

http://www.hsls.pitt.edu/guides/genetics

Page 121: Literature Informatics August 20 ,  2013

Gene Ontology (GO)

Controlled vocabulary tagging

• Function
• Biological Processes
• Cellular Component

http://www.hsls.pitt.edu/guides/genetics

Page 122: Literature Informatics August 20 ,  2013

Gene Ontology (GO) and KEGG

• GO information page
• GO evidence codes
• KEGG information page

http://www.hsls.pitt.edu/guides/genetics

Page 123: Literature Informatics August 20 ,  2013

Function

How many splice variants are there?
What is/are its sequence(s)?

Entrez Gene Page:
• Genomic regions… (TOC)
• UCSC (Links)

http://www.hsls.pitt.edu/guides/genetics

Video Tutorials

Page 124: Literature Informatics August 20 ,  2013

Alternative Splicing

http://www.hsls.pitt.edu/guides/genetics

Page 125: Literature Informatics August 20 ,  2013

Intron-Exon Coordinates

What is its intron-exon architecture?

Entrez Gene Page:

Display: change it from Full Report to Gene Table

http://www.hsls.pitt.edu/guides/genetics

Video Tutorials

Page 126: Literature Informatics August 20 ,  2013

Neighboring Genes

What are its neighboring genes?

Entrez Gene Page:

Genomic context (TOC)

http://www.hsls.pitt.edu/guides/genetics

Video Tutorials

Page 127: Literature Informatics August 20 ,  2013

Chromosomal location

http://www.hsls.pitt.edu/guides/genetics

Page 128: Literature Informatics August 20 ,  2013

Associated Diseases

What diseases are associated with it?

Entrez Gene Page:

TOC
• General Information_Phenotype

Links
• OMIM
• HuGE Navigator

http://www.hsls.pitt.edu/guides/genetics

Video Tutorials

Page 129: Literature Informatics August 20 ,  2013

Homologene

What are its homologous genes?

Entrez Gene Page:

Link: Homologene; change Display settings

http://www.hsls.pitt.edu/guides/genetics

Video Tutorials

Page 130: Literature Informatics August 20 ,  2013

Reagents

How can I get its cDNA clone?
.. antibodies?
.. siRNA?

Entrez Gene Page:

TOC: Additional Links
• Research Materials
• Exact Antigen

http://www.hsls.pitt.edu/guides/genetics

Video Tutorials

Page 131: Literature Informatics August 20 ,  2013

Protein Information Gateways

http://www.hsls.pitt.edu/guides/genetics

Page 132: Literature Informatics August 20 ,  2013

UniprotKB: Universal Protein Resource

A comprehensive, centralized protein information resource.

Developed by a consortium:
• European Bioinformatics Institute (EBI)
• the Swiss Institute of Bioinformatics (SIB)
• the Protein Information Resource (PIR)

Comprised of:
• Swiss-Prot: biologist-curated annotation data
• TrEMBL: computationally annotated data
• PIR-International Protein Sequence Database (PIR-PSD): the most comprehensive and expertly-curated protein sequence database in the public domain for over 20 years

Funded by: NIH, NSF, the European Union and the Swiss Federal government

Link to Wiki, YouTube, Blogs and Tweets: http://www.kosmix.com/topic/uniprot?

Tutorial Video: http://www.youtube.com/watch?v=TCF3qWn7siI&feature=youtube_gdata

http://www.hsls.pitt.edu/guides/genetics

Page 133: Literature Informatics August 20 ,  2013

Protein Questions ?

http://www.hsls.pitt.edu/guides/genetics

What is its function?

Amino acid sequence?
… molecular weight? isoelectric point (pI)?
… post-translational modifications?
… presence of domain/pattern/profile?
… hydrophobicity?
… homologs/orthologs?
Etc.

Structure?
… secondary and tertiary?

Interaction partner?

Page 134: Literature Informatics August 20 ,  2013

Uniprot Video Tutorial

http://www.hsls.pitt.edu/molbio/videos/play?v=19

http://www.hsls.pitt.edu/guides/genetics

Page 135: Literature Informatics August 20 ,  2013

Protein Function from UniprotKB Uniprot Search:

http://www.hsls.pitt.edu/guides/genetics

Look under: general annotation_Function, ontologies_keywords, geneontology

Page 136: Literature Informatics August 20 ,  2013

Protein Sequence

Uniprot
• Sequence annotations
• Sequences

Gene
• Genomic regions, transcripts, and products
• CCDS (consensus CDS report)

UCSC
• Sequence and links

http://www.hsls.pitt.edu/guides/genetics

Page 137: Literature Informatics August 20 ,  2013

Protein Sequence Analysis

http://www.hsls.pitt.edu/guides/genetics

PTM
• Uniprot: Seq annt
• IPA: Modifications and Regulation

PI/MW
• Uniprot: Seq_Tool, Compute PI

Hydrophobicity
• Uniprot: Seq_Tool, ProtScale

Peptide Digest
• Uniprot: Seq_Tools, PeptideMass, PeptideCutter

Homologous Seq
• Entrez Gene: Homologene

Domain/pattern
• Uniprot: Sequence annotation
• InterPro
• Entrez gene: Conserved Domain
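To make the hydrophobicity entry concrete, here is a small sketch of the kind of calculation a hydropathy tool performs. It uses the Kyte-Doolittle hydropathy scale, which is an assumption about scale choice and not something the slide specifies, and the function names and window size are illustrative. It is not a substitute for ProtScale itself.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Mean hydropathy of a sequence (grand average of hydropathy)."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def window_profile(seq, window=9):
    """Mean hydropathy of each sliding window along the sequence."""
    return [gravy(seq[i:i + window]) for i in range(len(seq) - window + 1)]


# First 24 residues of the EGFR sequence shown earlier in the deck.
fragment = "MRPSGTAGAALLALLAALCPASRA"
profile = window_profile(fragment, window=9)
print(round(gravy(fragment), 3))
print([round(value, 2) for value in profile])
```

Windows with high positive averages indicate hydrophobic stretches; the window size is a tunable choice rather than a fixed rule.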

Page 138: Literature Informatics August 20 ,  2013

Protein Domain Resources

Protein Domain Databases:

InterPro

http://www.hsls.pitt.edu/guides/genetics

Page 139: Literature Informatics August 20 ,  2013

Protein Domains Wikipedia:

A protein domain is a part of protein sequence and structure that can evolve, function, and exist independently of the rest of the protein chain. Each domain forms a compact three-dimensional structure and often can be independently stable and folded. Many proteins consist of several structural domains. One domain may appear in a variety of evolutionarily related proteins. Domains vary in length from between about 25 amino acids up to 500 amino acids in length. The shortest domains such as zinc fingers are stabilized by metal ions or disulfide bridges. Domains often form functional units, such as the calcium-binding EF hand domain of calmodulin.


Page 140: Literature Informatics August 20 ,  2013

Protein Domain: SH3

Src homology 3 (SH3) domains bind to proline-rich ligands with moderate affinity and selectivity, preferentially to PxxP motifs. They play a role in the regulation of enzymes by intramolecular interactions, change the subcellular localization of signal pathway components, and mediate multiprotein complex assemblies.


Page 141: Literature Informatics August 20 ,  2013

Protein Structure

Primary

Secondary

Tertiary

Quaternary


Useful links: http://www.kosmix.com/topic/protein_structure?

Taken from Wikipedia

Page 142: Literature Informatics August 20 ,  2013

Protein Structure


NCBI

Page 143: Literature Informatics August 20 ,  2013

Finding Protein Structure

PDB

Entrez Structure

NCBI BLink via Entrez Gene/Protein


Page 144: Literature Informatics August 20 ,  2013

Structure Databases and Viewers

Databases:

RCSB Protein Data Bank (PDB): State University of New Jersey (Rutgers), the San Diego Supercomputer Center at the University of California San Diego, and the University of Wisconsin-Madison. Link: http://www.kosmix.com/topic/protein_data_bank?

MMDB: NCBI's structure database is called MMDB (Molecular Modeling DataBase). It is a subset of three-dimensional structures obtained from the Protein Data Bank (PDB), excluding theoretical models.

Viewers:

Cn3D: a helper application for your web browser that allows you to view three-dimensional structures from NCBI's Entrez retrieval service.

RasMol: EBI

FirstGlance in Jmol: a simple tool for macromolecular visualization.


Page 145: Literature Informatics August 20 ,  2013

Protein Structure

Search for the 3D structure of p53 in Entrez Structure.

View the crystal structure of the mouse p53 core domain (MMDB: 42987) or the crystal structure of a p53 core dimer bound to DNA (PDB: 2GEQ).


Page 146: Literature Informatics August 20 ,  2013

Manipulating the Structure Viewer Window

Page 147: Literature Informatics August 20 ,  2013

Find Similar Structure: NCBI VAST


Page 148: Literature Informatics August 20 ,  2013

NCBI BLink

BLink ("BLAST Link") displays the results of BLAST searches that have been done for every protein sequence in the Entrez Proteins data domain.

To access it, follow the BLink link displayed beside any hit in the results of an Entrez Proteins search.


Page 149: Literature Informatics August 20 ,  2013

Hands-on Protein Structure

View the crystal structure of Chronophin (PDB entry: 2P69).

A variant of this protein with mutations in its amino acid sequence has been isolated. Can you predict any effect of these mutations on its function?

Hint: Find the amino acid residues that are in close contact (within 3.5 Å) with PYRIDOXAL-5'-PHOSPHATE (PLP).

Label the amino acids and save the picture in PNG format.

Learn more about the Chronophin structure at: http://kb-dev.psi-structuralgenomics.org/KB/archives.jsp?pageshow=3
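The contact search in the hint can also be done outside a structure viewer. The following is a minimal sketch, not the method used in the tutorial. It assumes the structure is available as legacy PDB-format text with standard fixed-width ATOM/HETATM columns, and that the ligand's residue name in the file is `PLP`. The function names are hypothetical.

```python
import math


def parse_atoms(pdb_text):
    """Parse ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "hetero": line.startswith("HETATM"),
                "res_name": line[17:20].strip(),
                "chain": line[21],
                "res_seq": int(line[22:26]),
                "xyz": (
                    float(line[30:38]),
                    float(line[38:46]),
                    float(line[46:54]),
                ),
            })
    return atoms


def residues_near_ligand(pdb_text, ligand="PLP", cutoff=3.5):
    """List (chain, residue name, residue number) within cutoff Å of the ligand."""
    atoms = parse_atoms(pdb_text)
    ligand_atoms = [a for a in atoms if a["hetero"] and a["res_name"] == ligand]
    near = set()
    for atom in atoms:
        if atom["hetero"]:
            continue  # skip the ligand itself, waters, and other hetero groups
        if any(math.dist(atom["xyz"], lig["xyz"]) <= cutoff for lig in ligand_atoms):
            near.add((atom["chain"], atom["res_name"], atom["res_seq"]))
    return sorted(near, key=lambda r: (r[0], r[2]))
```

A typical call would read the downloaded file and pass its text, for example `residues_near_ligand(open("2p69.pdb").read())`, where the file name is a placeholder. Residues stored as HETATM (modified amino acids) are ignored by this sketch.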


Page 150: Literature Informatics August 20 ,  2013

Hands-on Protein Structure of Chronophin

http://kb-dev.psi-structuralgenomics.org/KB/archives.jsp?pageshow=3


Page 151: Literature Informatics August 20 ,  2013

Sequence Alignment in Cn3D

NCBI


Page 152: Literature Informatics August 20 ,  2013

Hands-On

Can you identify the human protein that contains the short peptide sequence GPDGMPVIYHGHTLTTKIKFSDVLHTIKE?

• What is its function?
• What are its calculated pI and molecular weight?
• Which region of this protein is most hydrophobic?
• Locate five experimentally verified S/T/Y phosphorylation sites present in this protein.
• Find the mouse and fruit fly orthologs of this human protein and report the % protein identity it shares with these orthologs.
• How many protein domains are reported to be present in this human protein? Find the location of its largest domain.
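For the ortholog question, the % identity figure is normally read from HomoloGene or a BLAST report. As a rough illustration of what that number means, the sketch below computes percent identity from two already-aligned sequences. It assumes gaps are written as "-" and that identity is taken over columns where neither sequence has a gap, which is only one of several conventions in use. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over ungapped alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # columns with a gap are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


print(percent_identity("ACDE", "ACDF"))
```

Tools that divide by the full alignment length or by the shorter sequence length will report different values for the same alignment.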


Page 153: Literature Informatics August 20 ,  2013

Licensed Tools for Gene/Protein Information


Page 154: Literature Informatics August 20 ,  2013

HSLS Licensed Tools

• BioBase
• MetaCore
• Ingenuity IPA


Page 155: Literature Informatics August 20 ,  2013

Gene/Protein Facts from BioBase


http://goo.gl/9wpwG

Page 156: Literature Informatics August 20 ,  2013

BioBase BioKnowledge Library


Page 157: Literature Informatics August 20 ,  2013

Protein Function from IPA


Page 158: Literature Informatics August 20 ,  2013

Thank you! Any questions?

Ansuman [email protected]

http://www.hsls.pitt.edu/guides/genetics