WormBase - Home | National Academies
sites.nationalacademies.org/.../webpage/ssb_160890.pdf
![Page 2: WormBase - Home | National Academiessites.nationalacademies.org/.../webpage/ssb_160890.pdf · 2020-04-08 · Comparative Genomics Strains / Antibodies / Oligos Expression Lineage](https://reader035.vdocuments.site/reader035/viewer/2022070909/5f8d33e76711bb6bb17bff68/html5/thumbnails/2.jpg)
Mission
Provide the biomedical research community with accurate, current, and accessible information on the genetics, genomics, and biology of the model system Caenorhabditis elegans and related nematodes.
C. elegans in 30 seconds
Relatively simple organism, advanced genetic system.
[Figure: adult hermaphrodite and male; scale bar 1 mm]
C. elegans in 30 seconds
Invariant lineage
C. elegans in 30 seconds
Simple nervous system: 302 neurons, with described connectivity.
C. elegans in 30 seconds
Rapid generation time
A frozen C. elegans library
C. elegans in 30 seconds
100 Mbp genome, sequenced in 1998 (!)
~20K genes
A tradition of Open Science
[Timeline figure: 1963, Brenner's letters; 1974, first genetic screen published. Other milestones shown: the Gazette, the start of AceDB development, BioNet, gopher, and the www, placed across 1989, 1994, 1995, 2000, and 2003.]
The WormBase Consortium
User Community
1106 laboratories in 53 countries; 3000 researchers

Registered C. elegans laboratories by country:

| Country | Labs |
|---|---|
| United States | 594 |
| Canada | 62 |
| United Kingdom | 60 |
| Japan | 58 |
| Germany | 48 |
| France | 31 |
| China | 28 |
| Spain | 20 |
| Switzerland | 20 |
| The Netherlands | 16 |
User Community
Users in 185 countries: biomedical researchers studying aging, neurobiology, cancer, etc.
37K unique users/month
5.5M page views/month
wormbase.org
Contents & Features
28 Species
Genomes
Genes
Orthology / Homology / Paralogy
Comparative Genomics
Strains / Antibodies / Oligos
Expression
Lineage & Connectivity
Authors & Publications
Labs
Reports
Genome Browsers
Alignment Tools
Query Tools
APIs
Data Mining Platforms
Social Features
FTP
Forums, Wikis, Blogs
Workflow
1. Curation
2. Integration & analysis
3. Presentation
Curation Goals
1. Extract data from the scientific literature.
2. Develop standards to structure data.
3. Facilitate new insights by making prose observations computable.
Curated Sources
Scientific literature (~30K papers)
User submissions
Genomic sequences (gene models)
3rd party datasets
Early Realizations
Curation is hard and time-consuming!
Requires automation.
Needs tools to facilitate it.
A balance of breadth and depth is critical for making a useful community resource.
Many data types.
Prioritization is key.
Work procedurally through data types.
Hybrid automated/manual curation strategy
(Van Auken et al., Database, 2012)
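As a rough illustration of a hybrid strategy, the sketch below assumes an automated step that scores candidate paper/gene pairs; only candidates at or above a threshold go to a curator, and the curators' verdicts give a running precision estimate for the automated step. The `Candidate` class, the scores, and the threshold are hypothetical, and this is not the specific method described by Van Auken et al.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    paper_id: str
    gene_id: str
    score: float  # confidence from a hypothetical automated classifier (0..1)


def triage(candidates, threshold):
    """Return candidates at or above `threshold`, highest score first, for manual review."""
    return sorted(
        (c for c in candidates if c.score >= threshold),
        key=lambda c: c.score,
        reverse=True,
    )


def precision(reviewed):
    """Fraction of reviewed candidates that a curator accepted.

    `reviewed` is a list of (candidate, accepted) pairs.
    """
    if not reviewed:
        return 0.0
    return sum(1 for _, accepted in reviewed if accepted) / len(reviewed)


candidates = [
    Candidate("paper1", "geneA", 0.95),
    Candidate("paper2", "geneB", 0.40),
    Candidate("paper3", "geneC", 0.80),
    Candidate("paper4", "geneD", 0.70),
]
queue = triage(candidates, threshold=0.75)
# A curator confirms the first queued candidate and rejects the second.
verdicts = [(queue[0], True), (queue[1], False)]
print([c.paper_id for c in queue], precision(verdicts))
```

Raising the threshold reduces curator workload at the cost of missing lower-scoring true positives; the measured precision tells you whether the automated step can be trusted more or less.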
Curated data types
Phenotypes
Expression Patterns
Sequence Features
Gene Interactions
Anatomy
Function
Pathways
Reagents
Human Disease Relevance
Reference datasets
Large-scale data at WormBase
• Proteomics (mass spec)
• Transcriptomics (splicing, UTRs)
• Expression (microarray, in vivo imaging)
• Interactions (physical, genetic)
• Perturbation: RNAi, systematic mutation
• Lineage and connectivity
Reference datasets
Broad reference datasets can fill knowledge gaps, but:
• Verification can be difficult.
• Relevance?
• Confidence?
• Utilization varies greatly.
Do we assess the quality of experimental design? Of external data?
Publication is the gold standard.
Revisit erroneous data: request corrections or clarifications when warranted.
Remaining backlog
Curation: Lessons Learned
• harder and consumes more time than expected
• more enriching to the final product than expected
• curation ensures data integrity and builds trust in the resource
Curation: Suggestions
• Start early to develop best practices.
• Automate as much as possible.
• Employ domain experts for high-value manual curation and to confirm the precision of automated curation.
• Expect the publication rate and new data types to exceed manual curation capacity (10% growth year over year).
• Refining curation will be an ongoing enterprise.
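The 10% year-over-year figure compounds. A minimal sketch, assuming incoming papers grow by a fixed fraction each year while manual curation capacity stays flat (all numbers hypothetical), shows how quickly a backlog builds:

```python
def project_backlog(papers_per_year, capacity_per_year, years,
                    growth=0.10, initial_backlog=0.0):
    """Yearly backlog when intake grows by `growth` each year and capacity is fixed."""
    backlog = initial_backlog
    intake = papers_per_year
    history = []
    for _ in range(years):
        # Papers not curated this year carry over; the backlog cannot go negative.
        backlog = max(0.0, backlog + intake - capacity_per_year)
        history.append(round(backlog))
        intake *= 1 + growth  # next year's intake
    return history


# Capacity exactly matches this year's intake; growth alone creates the backlog.
print(project_backlog(1000, 1000, years=5))
```

Even with capacity matched to the current year's intake, the backlog grows every year after the first; keeping up requires capacity (or automation) that grows at the same rate.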
What fundamentals have driven our workflow design?
What fundamentals have driven our design?
1. Ease of data modeling and loading
Emphasis on collecting and sharing data.
What fundamentals have driven our design?
2. Handling unknown unknowns
Yet-to-be-discovered …
- datatypes
- data relationships
Data model must be able to evolve.
What fundamentals have driven our design?
3. Ability to track supporting evidence, metadata, and provenance
Reproducibility and accountability.
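One way to keep evidence central is to make it part of each annotation record rather than a side table. A minimal sketch, assuming a simple in-memory model; the field names, identifiers, and the `"manual"`/`"automated"` labels are hypothetical and not WormBase's actual schema:

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Evidence:
    source: str       # paper or dataset identifier (hypothetical format)
    curator: str      # who made or confirmed the statement
    curated_on: date  # when it was recorded
    method: str       # "manual" or "automated"


@dataclass
class Annotation:
    subject: str      # stable gene identifier
    term: str         # e.g. a phenotype or anatomy term
    evidence: list = field(default_factory=list)

    def add_evidence(self, ev: Evidence) -> None:
        """Attach evidence, ignoring exact duplicates."""
        if ev not in self.evidence:
            self.evidence.append(ev)

    def sources(self) -> set:
        return {ev.source for ev in self.evidence}

    def manually_confirmed(self) -> bool:
        return any(ev.method == "manual" for ev in self.evidence)


ann = Annotation(subject="WBGene00000001", term="example_term")
auto = Evidence("paper:0001", "pipeline", date(2012, 5, 1), "automated")
ann.add_evidence(auto)
ann.add_evidence(auto)  # exact duplicate, ignored
ann.add_evidence(Evidence("paper:0001", "curator_b", date(2012, 6, 1), "manual"))
print(len(ann.evidence), ann.sources(), ann.manually_confirmed())
```

Because every statement carries its own source, curator, and date, a questionable annotation can be traced back and re-checked rather than silently trusted.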
What fundamentals have driven our design?
4. Coping with high-connectivity data
For example, what happens to downstream annotations (orthology, proteomics, expression, etc.) if two genes are merged?
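A toy model of the merge problem, assuming annotations are keyed by a stable gene ID and that merges are recorded so retired IDs still resolve. The class and the `GENE_A`/`GENE_B` identifiers are hypothetical; a real system would also need to re-run the analyses that the moved annotations depend on, which is why `merge` returns them:

```python
from collections import defaultdict


class GeneStore:
    """Toy store: annotations keyed by gene ID plus a record of merges."""

    def __init__(self):
        self.annotations = defaultdict(list)  # gene_id -> [(data_type, value), ...]
        self.merged_into = {}                 # retired_id -> surviving_id

    def resolve(self, gene_id):
        """Follow recorded merges so that retired IDs still resolve."""
        while gene_id in self.merged_into:
            gene_id = self.merged_into[gene_id]
        return gene_id

    def add(self, gene_id, data_type, value):
        self.annotations[self.resolve(gene_id)].append((data_type, value))

    def merge(self, retired, survivor):
        """Move all annotations from `retired` to `survivor`.

        Returns the moved annotations so downstream analyses can be re-checked.
        """
        retired, survivor = self.resolve(retired), self.resolve(survivor)
        if retired == survivor:
            return []
        moved = self.annotations.pop(retired, [])
        self.annotations[survivor].extend(moved)
        self.merged_into[retired] = survivor
        return moved


store = GeneStore()
store.add("GENE_A", "orthology", "example_ortholog")
store.add("GENE_A", "expression", "pharynx")
store.add("GENE_B", "proteomics", "example_peptide")
moved = store.merge("GENE_A", "GENE_B")
print(store.resolve("GENE_A"), len(store.annotations["GENE_B"]), moved)
```

Keeping the merge history, rather than deleting the retired ID, lets links that still use the old identifier resolve to the surviving gene.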
What fundamentals have driven our design?
5. Finding a suitable refresh rate
How often will you update analyses?
Datasets evolve. New data becomes available. Analyses need to be updated.
How tolerant will your community be of stale data?
What fundamentals have driven our design?
5. Finding a suitable refresh rate
Release interval over time: 1 week (2001) -> 2 weeks (2002) -> 3 weeks (2005) -> 1 month (2008) -> 2 months (2011)
Balance of stability, rate of new data, cost/time of analysis, churn.
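The trade-off can be put in rough numbers. Assuming new data arrives at a steady rate and waits for the next release, an interval of T days gives about 365/T releases per year and an average delay of T/2 days before new data appears, while each release has to absorb T days' worth of data. A small sketch (the intake rate is hypothetical):

```python
def release_tradeoff(interval_days, new_items_per_day):
    """Rough trade-offs for a given release interval.

    Assumes new data arrives at a steady rate and waits for the next release.
    """
    releases_per_year = 365 / interval_days
    mean_delay_days = interval_days / 2  # average wait from arrival to release
    items_per_release = new_items_per_day * interval_days
    return releases_per_year, mean_delay_days, items_per_release


# Intervals from the slide, with a hypothetical intake of 20 new items per day.
for label, days in [("1 week", 7), ("2 weeks", 14), ("3 weeks", 21),
                    ("1 month", 30), ("2 months", 60)]:
    per_year, delay, batch = release_tradeoff(days, new_items_per_day=20)
    print(f"{label:>8}: {per_year:5.1f} releases/yr, "
          f"~{delay:4.1f} days average delay, {batch:5.0f} items per release")
```

Longer intervals mean fewer full rebuilds (less cost and churn) at the price of staler data, which is the balance the slide describes.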
Design: Lessons Learned
1. A flexible model/workflow is essential.
2. Evidence and metadata collection needs to be central to the process.
3. High-connectivity data presents unique challenges.
4. We needed to adjust release frequency.
Design: Suggestions
1. Build flexibility into both the data model and workflow.
2. Be aware of the consequences of changing high-connectivity data.
3. Refresh frequency is a balance of user needs, resources, and rate of change.
Integration & Interoperability
Suggestions for integrating with organismal databases (easy)
• Liaise with organismal databases early and often!
• Use stable identifiers! Most organism databases have them. Please?
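As a minimal sketch of keying cross-references on stable identifiers rather than on gene names (which can change), assuming WormBase gene IDs of the form `WBGene` followed by eight digits; the name-to-ID table and the gene name `gene-x` are hypothetical:

```python
import re

# Assumed format: "WBGene" followed by exactly eight digits.
WB_GENE_ID = re.compile(r"WBGene\d{8}")


def is_stable_gene_id(identifier):
    return WB_GENE_ID.fullmatch(identifier.strip()) is not None


def link_key(identifier, name_to_id):
    """Return the stable ID to link on, mapping gene names through a lookup table."""
    identifier = identifier.strip()
    if is_stable_gene_id(identifier):
        return identifier
    try:
        return name_to_id[identifier]
    except KeyError:
        raise KeyError(f"no stable identifier known for {identifier!r}") from None


names = {"gene-x": "WBGene00000001"}  # hypothetical name-to-ID table
print(link_key("gene-x", names), link_key(" WBGene00000001 ", names))
```

Links built this way survive gene renames, because only the lookup table, not every stored cross-reference, has to change.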
Suggestions for integrating with organismal databases (harder)
Reciprocal data exchange and cross-links
Cross-links alone are boring and do not engage users.
Without some supporting context, cross-links do not increase interoperability.
Suggestions for integrating with organismal databases (hardest)
Avoid direct data import
Except for core scaffolding features (e.g., genomes, genes), use APIs to fetch and embed functional data.
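A minimal sketch of fetching a field on demand instead of importing a copy. The URL layout `{base}/gene/{id}/{field}`, the base URL, and the assumption that the response is a JSON object keyed by field name are all hypothetical; a partner database's own API documentation defines its actual endpoints:

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def field_url(base, gene_id, field_name):
    """Build a URL for one field of one gene (hypothetical URL layout)."""
    return f"{base.rstrip('/')}/gene/{quote(gene_id)}/{quote(field_name)}"


def extract_field(payload, field_name):
    """Pull one field from a JSON response shaped like {"<field>": ...} (assumed)."""
    return json.loads(payload).get(field_name)


def fetch_field(base, gene_id, field_name, timeout=10):
    """Fetch at display time, so the embedded data is as current as the source."""
    with urlopen(field_url(base, gene_id, field_name), timeout=timeout) as resp:
        return extract_field(resp.read().decode("utf-8"), field_name)


# Offline illustration with a canned response instead of a live request.
url = field_url("https://api.example.org/", "WBGene00000001", "concise_description")
text = extract_field('{"concise_description": "Example description."}', "concise_description")
print(url, text)
```

Embedding at display time avoids keeping a second copy of the partner's functional data in sync; the cost is a runtime dependency on the partner's API, so a cache and a graceful fallback are worth adding in practice.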
Interoperability Suggestions
1. Provide data in (multiple) common formats
2. API (RESTful) with JSON and XML delivery
3. Data files programmatically accessible: simple is better (FTP), with no registration barrier or fancy web-based download scheme.
4. Consistent, shared identifiers
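To illustrate serving one record in more than one common format, here is a sketch using only the Python standard library. It assumes flat records whose keys are valid XML element names; the record itself is hypothetical:

```python
import json
import xml.etree.ElementTree as ET


def to_json(record):
    return json.dumps(record, sort_keys=True)


def to_xml(record, root_tag="gene"):
    """Serialize a flat dict as <root_tag><key>value</key>...</root_tag>."""
    root = ET.Element(root_tag)
    for key, value in sorted(record.items()):
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


record = {"id": "WBGene00000001", "name": "gene-x", "species": "C. elegans"}
as_json = to_json(record)
as_xml = to_xml(record)
print(as_json)
print(as_xml)
```

Generating both formats from the same in-memory record keeps them consistent; nested data would need a richer mapping than this flat one.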
If you build it, will they come?
[Chart: page views vs. time, 2001 to 2013; y-axis from 0 to 80,000,000 page views]
Nurture Your Community
Collect feedback: chat, Twitter, Google Alerts, mailing lists, conferences, webinars, surveys.
Measure: web logs, CloudWatch, Google Analytics.
Set standards: data quality, curation, submission, help desk response times.
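Web logs are often the cheapest metric to start with. A minimal sketch, assuming logs in the Common Log Format with English month abbreviations; counting distinct client addresses only approximates unique users (shared proxies, NAT, and crawlers all distort it), and the sample lines are made up:

```python
import re
from collections import defaultdict
from datetime import datetime

# Common Log Format: host ident authuser [dd/Mon/yyyy:HH:MM:SS zone] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" (?P<status>\d{3}) \S+'
)


def monthly_stats(lines):
    """Return {"YYYY-MM": (page_views, distinct_clients)} for successful requests."""
    views = defaultdict(int)
    clients = defaultdict(set)
    for line in lines:
        m = CLF.match(line)
        if not m or not m["status"].startswith("2"):
            continue  # skip malformed lines and non-2xx responses
        month = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z").strftime("%Y-%m")
        views[month] += 1
        clients[month].add(m["host"])
    return {month: (views[month], len(clients[month])) for month in views}


sample = [
    '1.2.3.4 - - [08/Apr/2020:10:00:00 -0700] "GET /gene/WBGene00000001 HTTP/1.1" 200 512',
    '1.2.3.4 - - [08/Apr/2020:10:05:00 -0700] "GET / HTTP/1.1" 200 1024',
    '5.6.7.8 - - [09/Apr/2020:11:00:00 -0700] "GET /missing HTTP/1.1" 404 0',
    '9.9.9.9 - - [01/May/2020:09:00:00 -0700] "GET / HTTP/1.1" 200 100',
    'not a log line',
]
stats = monthly_stats(sample)
print(stats)
```

In practice you would also filter known crawlers and static assets (images, scripts) before counting page views.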
Metrics of success
Small user communities, niche domains.
Providing annotation or feedback is a low priority for busy scientists.
Positive feedback is rare, but you’ll know when users don’t like something!
Not easy to measure.
Suggested Metrics
• Page Views
• Citation Rate
• Downloads
• Queries & Resolutions
• Rate / precision of curation
• Database size / objects / submissions
Performance Metrics
Acknowledgments
Paul Sternberg
Juancarlos Chan
Wen Chen
Chris Grove
Raymond Lee
Ranjana Kishore
Cecilia Nakamura
Daniela Raciti
Gary Schindelman
Mary Ann Tuli
Kimberly Van Auken
Xiaodong Wang
Karen Yook
Hans-Michael Muller
Yuling Li
James Done
Lincoln Stein
Sibyl Gao
Todd Harris
Matt Berriman
Paul Kersey
Paul Davis
Thomas Done
Kevin Howe
Michael Paulini
Gary Williams
@tharris
@wormbase