Navigating the Neuroscience Data Landscape Maryann Martone, Ph. D. University of California, San Diego

Upload: lauren-scott

Post on 13-Jan-2016


Page 1: Navigating the Neuroscience Data Landscape Maryann Martone, Ph. D. University of California, San Diego

Navigating the Neuroscience Data Landscape

Maryann Martone, Ph.D.
University of California, San Diego

Page 2

“Neural Choreography”

“A grand challenge in neuroscience is to elucidate brain function in relation to its multiple layers of organization that operate at different spatial and temporal scales. Central to this effort is tackling “neural choreography” -- the integrated functioning of neurons into brain circuits--their spatial organization, local and long-distance connections, their temporal orchestration, and their dynamic features. Neural choreography cannot be understood via a purely reductionist approach. Rather, it entails the convergent use of analytical and synthetic tools to gather, analyze and mine information from each level of analysis, and capture the emergence of new layers of function (or dysfunction) as we move from studying genes and proteins, to cells, circuits, thought, and behavior....

However, the neuroscience community is not yet fully engaged in exploiting the rich array of data currently available, nor is it adequately poised to capitalize on the forthcoming data explosion.”

Akil et al., Science, Feb 11, 2011

Page 3

NIF is an initiative of the NIH Blueprint consortium of institutes

What types of resources (data, tools, materials, services) are available to the neuroscience community?

How many are there?

What domains do they cover? What domains do they not cover?

Where are they?
• Web sites
• Databases
• Literature
• Supplementary material
• PDF files
• Desk drawers

Who uses them?

Who creates them?

How can we find them?

How can we make them better in the future?

http://neuinfo.org

Page 4

How many resources are there?

• NIF Registry: A catalog of neuroscience-relevant resources
• > 4800 currently listed
• > 2000 databases
• And we are finding more every day

Page 5

The Neuroscience Information Framework: Discovery and utilization of web-based resources for neuroscience

A portal for finding and using neuroscience resources

A consistent framework for describing resources

Provides simultaneous search of multiple types of information, organized by category

Supported by an expansive ontology for neuroscience

Utilizes advanced technologies to search the “hidden web”

http://neuinfo.org

UCSD, Yale, Cal Tech, George Mason, Washington Univ

Supported by NIH Blueprint

Literature

Database Federation

Registry

Page 6

What are the connections of the hippocampus?

Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”

• Query expansion: synonyms and related concepts
• Boolean queries
• Data sources categorized by “data type” and level of nervous system
• Common views across multiple sources
• Tutorials for using full resource when getting there from NIF
• Link back to record in original source
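
The query expansion on this slide can be sketched in a few lines of Python. This is only an illustration, not NIF's implementation: the `SYNONYMS` table and the `expand_query` function are hypothetical names, and the only data used are the three hippocampus labels shown above.

```python
# Hypothetical synonym table; the three hippocampus labels come from the slide.
SYNONYMS = {
    "hippocampus": ["Hippocampus", "Cornu Ammonis", "Ammon's horn"],
}


def expand_query(term: str) -> str:
    """Return a Boolean OR query over a term and its known synonyms."""
    labels = SYNONYMS.get(term.lower(), [term])
    # Quote multi-word labels so a search engine treats them as phrases.
    quoted = [f'"{label}"' if " " in label else label for label in labels]
    return " OR ".join(quoted)


print(expand_query("hippocampus"))
```

A term with no entry in the table is passed through unchanged, so the expansion never narrows what the user asked for.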

Page 7

Results are organized within a common framework

Connects to

Synapsed with

Synapsed by

Input region

innervates

Axon innervates

Projects to

Cellular contact

Subcellular contact

Source site

Target site

Each resource implements a different, though related, model; in many cases the systems are complex and difficult to learn

Page 8

The scourge of neuroanatomical nomenclature

• NIF Connectivity: 6 databases containing connectivity primary data or claims
  • Brain Architecture Management System (rodent)
  • Connectome Wiki (human)
  • Brain Maps (various)
  • CoCoMac (primate cortex)
  • UCLA Multimodal database (Human fMRI)
  • Avian Brain Connectivity Database (Bird)
• Total: 1800 unique brain terms (excluding Avian)
• Number of exact terms used in > 1 database: 42
• Number of synonym matches: 99
• Number of partonomy matches: 385

The INCF is working with NIF to develop semantic and spatial strategies for translating anatomy across information systems
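
The exact-match and synonym-match counts reported on this slide could be computed along the following lines. This is a minimal sketch under stated assumptions: the three toy vocabularies, the `preferred` synonym table, and the `shared_terms` helper are all hypothetical and stand in for the real database term lists, which are not given here.

```python
from collections import Counter

# Toy term lists (hypothetical); real inputs would be each database's
# brain-region vocabulary.
databases = {
    "db_a": {"hippocampus", "striatum", "cerebral cortex"},
    "db_b": {"Ammon's horn", "striatum", "neocortex"},
    "db_c": {"cornu ammonis", "olfactory bulb"},
}

# Hypothetical synonym table mapping a variant label to a preferred label.
preferred = {
    "ammon's horn": "hippocampus",
    "cornu ammonis": "hippocampus",
}


def shared_terms(dbs, normalize=lambda t: t):
    """Return the terms that appear in more than one database after normalization."""
    counts = Counter()
    for terms in dbs.values():
        # Build a set first so a term counts at most once per database.
        counts.update({normalize(t.lower()) for t in terms})
    return {term for term, n in counts.items() if n > 1}


exact = shared_terms(databases)
with_synonyms = shared_terms(databases, lambda t: preferred.get(t, t))
synonym_only = with_synonyms - exact

print(sorted(exact))
print(sorted(synonym_only))
```

Partonomy matches would need a part-of hierarchy in addition to the synonym table, which is why they are left out of this sketch.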

Page 9

What is an ontology?

[Figure: Brain has a Cerebellum; Cerebellum has a Purkinje Cell Layer; Purkinje Cell Layer has a Purkinje cell; Purkinje cell is a neuron]

Ontology: an explicit, formal representation of concepts and the relationships among them within a particular domain that expresses human knowledge in a machine-readable form
• Branch of philosophy: a theory of what is
• e.g., Gene ontologies

Provide universals for navigating across different data sources
• Semantic “index”

Provide the basis for concept-based queries to probe and mine data
• Perform reasoning
• Link data through relationships, not just one-to-one mappings
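
To make "perform reasoning" concrete, here is a small sketch that encodes the brain-to-neuron example from this slide as two relationship tables and answers a concept-based query over them. The dictionary layout and the function names `parts_of` and `neurons_in` are assumptions made for illustration; real ontologies are usually stored in formats such as OWL and queried with dedicated reasoners.

```python
# Edges taken from the slide's example: three "has a" links and one "is a" link.
HAS_A = {
    "Brain": ["Cerebellum"],
    "Cerebellum": ["Purkinje Cell Layer"],
    "Purkinje Cell Layer": ["Purkinje cell"],
}
IS_A = {"Purkinje cell": "neuron"}


def parts_of(structure):
    """Return every direct and indirect part of a structure (transitive 'has a')."""
    found = []
    stack = list(HAS_A.get(structure, []))
    while stack:
        part = stack.pop()
        if part not in found:
            found.append(part)
            stack.extend(HAS_A.get(part, []))
    return found


def neurons_in(structure):
    """Concept-based query: parts of a structure that are classified as neurons."""
    return [p for p in parts_of(structure) if IS_A.get(p) == "neuron"]


print(parts_of("Brain"))
print(neurons_in("Cerebellum"))
```

The query for neurons in the cerebellum succeeds even though no single record says "the cerebellum contains neurons"; the answer follows from chaining the stored relationships, which is what a one-to-one mapping cannot do.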

Page 10

PONS program

Structural Lexicon Taskforce
• Concentrate on Human, Non-human Primate, Rat and Mouse
• Define structural concepts from level of organ to macromolecular complexes
• Provide a set of criteria by which structures can be identified

Neuronal Registry Taskforce
• Establish conventions for naming new types of neurons
• Establish a standard set of properties to define neurons
• Create a Neuron Registry for registering new types of neurons

Deployment and representation (Alan Ruttenberg)
• Brought together ontologists working across scales

Courtesy of Chris Mungall, Lawrence Berkeley Labs

***Not about imposing a single view of anatomy; about making concepts computable and being able to translate among views

Page 11

NeuroLex Wiki

http://neurolex.org

Stephen Larson

• Provide a simple framework for defining the concepts required
  • Cell, Part of brain, subcellular structure, molecule
• Community based:
  • Avian neuroanatomy
  • Fly neurons (England)
  • Neuroimaging terms
  • Brain regions identified by text mining
• Creating a computable index for neuroscience data
• INCF working to coordinate Wiki efforts underway at Allen Institute, Blue Brain and Neurolex

Demo D03

Page 12

Comparison of traffic to NIF Portal vs Neurolex

5000 hits 15000 hits

Wiki is readily indexed by search engines

Page 13

Neurons in Neurolex

INCF building a knowledge base of neurons and their properties via the Neurolex Wiki

Led by Dr. Gordon Shepherd

Consistent and parseable naming scheme

Knowledge is readily accessible, editable and computable

Stephen Larson

Page 14

NIF data federation

[Figure: Percentage of data records per data type. Labels: Images, Drugs, Antibodies, Grants, Pathways, Animals, Connectivity, Brain activation foci, Microarray (98%)]

Primary data, secondary data, claims, repositories

Recently added: BioNOT literature mining tool; Retraction Watch blog

Page 15

What do you mean by data?

Databases come in many shapes and sizes

Primary data:
• Data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL)

Secondary data:
• Data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD), gene expression levels (Allen Brain Atlas); brain connectivity statements (BAMS)

Tertiary data:
• Claims and assertions about the meaning of data
• E.g., gene upregulation/downregulation, brain activation as a function of task

Registries:
• Metadata
• Pointers to data sets or materials stored elsewhere

Data aggregators:
• Aggregate data of the same type from multiple sources, e.g., Cell Image Library, SUMSdb, Brede

Single source:
• Data acquired within a single context, e.g., Allen Brain Atlas

Page 16

NIF landscape analysis

[Figure: axes “Brain region” and “Data source”; labeled regions: Striatum, Hypothalamus, Olfactory bulb, Cerebral cortex, Brain]

Vadim Astakhov, Kepler Workflow Engine

Page 17

How much of the landscape do we have?

Query for “reference” brain structures and their parts in NIF Connectivity database

Page 18

NIF Reports: Male vs Female

Gender bias

NIF can start to answer interesting questions about neuroscience research, not just about neuroscience

Page 19

Embracing duplication: Data Mash ups

• ~300 PMIDs were common between Brede and SUMSdb
• Same information; value added

Same data; different aspects

Page 20

Same data: different analysis

Chronic vs acute morphine in striatum

Drug Related Gene database: extracted statements from figures, tables and supplementary data from published article

Gemma: Reanalyzed microarray results from GEO using different algorithms

Both provide results of increased or decreased expression as a function of experimental paradigm
• 4 strains of mice
• 3 conditions: chronic morphine, acute morphine, saline

Mined NIF for all references to GEO IDs: found small number where the same dataset was represented in two or more databases

http://www.chibi.ubc.ca/Gemma/home.html

Page 21

How easy was it to compare?

Gemma: Gene ID + Gene Symbol
DRG: Gene name + Probe ID

Gemma: Increased expression/decreased expression
DRG: Increased expression/decreased expression

But... Gemma presented results relative to baseline chronic morphine; DRG with respect to saline, so direction of change is opposite in the 2 databases

Analysis:
• 1370 statements from Gemma regarding gene expression as a function of chronic morphine
• 617 were consistent with DRG; over half of the claims of the paper were not confirmed in this analysis
• Results for 1 gene were opposite in DRG and Gemma
• 45 did not have enough information provided in the paper to make a judgment
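
The baseline problem described on this slide can be illustrated with a short sketch. It assumes both sources compare the same two conditions (chronic morphine and saline) and differ only in which one is treated as baseline, so that switching baselines simply inverts the direction of change. The gene names and calls are made up; only the counts 617 and 1370 come from the slide.

```python
# Directions of change are encoded as +1 (increased) or -1 (decreased).
# Gene names and values below are made up for illustration only.
gemma = {"geneA": +1, "geneB": -1, "geneC": +1}  # relative to chronic morphine
drg = {"geneA": -1, "geneB": +1, "geneC": +1}    # relative to saline


def flip_baseline(calls):
    """Re-express calls against the opposite baseline by inverting each direction."""
    return {gene: -direction for gene, direction in calls.items()}


def compare(a, b):
    """Split the genes present in both sources into consistent and opposite calls."""
    shared = a.keys() & b.keys()
    consistent = sorted(g for g in shared if a[g] == b[g])
    opposite = sorted(g for g in shared if a[g] != b[g])
    return consistent, opposite


consistent, opposite = compare(flip_baseline(gemma), drg)
print(consistent, opposite)

# Share of Gemma statements consistent with DRG, using the counts on the slide.
consistent_fraction = 617 / 1370
print(f"{consistent_fraction:.1%}")
```

Without the `flip_baseline` step, every gene on which the two sources actually agree would be reported as a disagreement. The fraction works out to about 45%, which matches the slide's statement that over half of the claims were not confirmed.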

NIF annotation standard

Page 22

Grabbing the long tail of small data

Analysis of NIF shows multiple databases with similar scope and content

Many contain partially overlapping data

Data “flows” from one resource to the next
• Data is reinterpreted, reanalyzed or added to

When does it become something else?

Is duplication good or bad?

Page 23

Phases of NIF

2006-2008: A survey of what was out there

2008-2009: Strategy for resource discovery
• NIF Registry vs NIF data federation
• Ingestion of data contained within different technology platforms, e.g., XML vs relational vs RDF
• Effective search across semantically diverse sources
  • NIFSTD ontologies

2009-2011: Strategy for data integration
• Unified views across common sources
• Mapping of content to NIF vocabularies

2011-present: Data analytics
• Uniform external data references

Page 24

Data, not just stories about them!

47/50 major preclinical published cancer studies could not be replicated

“The scientific community assumes that the claims in a preclinical study can be taken at face value -- that although there might be some errors in detail, the main message of the paper can be relied on and the data will, for the most part, stand the test of time. Unfortunately, this is not always the case.”

Getting data out sooner in a form where they can be exposed to many eyes and many analyses, and easily compared, may allow us to expose errors and develop better metrics to evaluate the validity of data

Begley and Ellis, 29 MARCH 2012 | VOL 483 | NATURE | 531

“There are no guidelines that require all data sets to be reported in a paper; often, original data are removed during the peer review and publication process.”

Page 25

A global view of data

You (and the machine) have to be able to find it
• Accessible through the web
• Annotations

You have to be able to use it
• Data type specified and in a usable form

You have to know what the data mean
• Some semantics
• Context: Experimental metadata
• Provenance: Where did the data come from?

Reporting neuroscience data within a consistent framework helps enormously

Page 26

NIF team (past and present)

Jeff Grethe, UCSD, Co Investigator, Interim PI
Amarnath Gupta, UCSD, Co Investigator
Anita Bandrowski, NIF Project Leader
Gordon Shepherd, Yale University
Perry Miller
Luis Marenco
Rixin Wang
David Van Essen, Washington University
Erin Reid
Paul Sternberg, Cal Tech
Arun Rangarajan
Hans Michael Muller
Yuling Li
Giorgio Ascoli, George Mason University
Sridevi Polavarum

Fahim Imam, NIF Ontology Engineer
Larry Lui
Andrea Arnaud Stagg
Jonathan Cachat
Jennifer Lawrence
Lee Hornbrook
Binh Ngo
Vadim Astakhov
Xufei Qian
Chris Condit
Mark Ellisman
Stephen Larson
Willie Wong
Tim Clark, Harvard University
Paolo Ciccarese
Karen Skinner, NIH, Program Officer

Page 27

Concept-based search: search by meaning

Search Google: GABAergic neuron

Search NIF: GABAergic neuron

NIF automatically searches for types of GABAergic neurons

Types of GABAergic neurons
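
A concept-based search of this kind can be approximated by expanding the query term to all of its subtypes before matching. The sketch below is an assumption-laden illustration, not NIF's search engine: the `SUBCLASSES` hierarchy, the sample records, and the function names are hypothetical, and matching is reduced to a simple substring test.

```python
# Hypothetical is-a hierarchy; the subtype names are illustrative and are
# not taken from NIF's actual ontology.
SUBCLASSES = {
    "GABAergic neuron": ["Purkinje cell", "GABAergic interneuron"],
    "GABAergic interneuron": ["Basket cell", "Stellate cell"],
}


def expand_concept(concept):
    """Return the concept followed by all of its direct and indirect subtypes."""
    terms = [concept]
    for child in SUBCLASSES.get(concept, []):
        terms.extend(expand_concept(child))
    return terms


def concept_search(concept, records):
    """Return records whose text mentions the concept or any of its subtypes."""
    terms = [t.lower() for t in expand_concept(concept)]
    return [r for r in records if any(t in r.lower() for t in terms)]


# Made-up record titles for illustration.
records = [
    "Purkinje cell dendritic spine counts",
    "Stellate cell firing rates",
    "Pyramidal cell morphology",
]
print(concept_search("GABAergic neuron", records))
```

A plain keyword search for "GABAergic neuron" would return none of these records, since none contains that phrase; the expanded search finds the two that mention a subtype.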