
Citances and What should our UI look like?

Marti Hearst
SIMS, UC Berkeley
http://biotext.berkeley.edu

Supported by NSF DBI-0317510 and a gift from Genentech

Post on 22-Dec-2015



Acquiring Labeled Data using Citances

A discovery is made …

A paper is written …

That paper is cited …

and cited …

and cited …

… as the evidence for some fact(s) F.

Each of these, in turn, is cited for some fact(s) …

… until it is the case that all important facts in the field can be found in citation sentences alone!

Citances

Nearly every statement in a bioscience journal article is backed up with a citation.

It is quite common for papers to be cited 30 to 100 times.

The text around the citation tends to state biological facts. (Call these citances.)

Different citances will state the same facts in different ways …

… so can we use these for creating models of language expressing semantic relations?

Using Citances

Potential uses of citation sentences (citances):

• creation of training and testing data for semantic analysis
• synonym set creation
• database curation
• document summarization
• information retrieval generally

Issues for Processing Citances

Text span
• Identification of the appropriate phrase, clause, or sentence that constitutes a citance.
• Correct mapping of citations when shown as lists or groups (e.g., “[22-25]”).

Grouping citances by topic
• Citances that cite the same document should be grouped by the facts they state.

Normalizing or paraphrasing citances
• For IR, summarization, learning synonyms, relation extraction, question answering, and machine translation.
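Mapping a grouped citation such as “[22-25]” back to the individual cited documents can be sketched as follows. This is a minimal illustration that assumes numeric bracket-style citations (hyphen or en dash for ranges, commas between items); author-year styles are not handled, and the function names are hypothetical.

```python
import re

# Matches a bracketed numeric citation such as "[22-25]" or "[3, 7-9, 12]".
# Assumption: numeric bracket style only; author-year citations are not handled.
CITATION_RE = re.compile(
    r"\[(\d+(?:\s*[-–]\s*\d+)?(?:\s*,\s*\d+(?:\s*[-–]\s*\d+)?)*)\]"
)


def expand_citation_group(group: str) -> list[int]:
    """Expand the inside of one bracket, e.g. "22-25" -> [22, 23, 24, 25]."""
    refs: list[int] = []
    for part in group.split(","):
        bounds = re.split(r"\s*[-–]\s*", part.strip())
        if len(bounds) == 2:
            # A range: include both endpoints.
            refs.extend(range(int(bounds[0]), int(bounds[1]) + 1))
        else:
            refs.append(int(bounds[0]))
    return refs


def cited_references(sentence: str) -> list[int]:
    """Return every reference number cited in a sentence, in order of appearance."""
    refs: list[int] = []
    for match in CITATION_RE.finditer(sentence):
        refs.extend(expand_citation_group(match.group(1)))
    return refs


print(cited_references("Bim is induced after NGF withdrawal [22-25]."))
# [22, 23, 24, 25]
```

Once each citance is linked to every document it cites, citances pointing at the same document can be collected for the grouping step.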

Citances: Some preliminary results

• Citances to a document align well with a hand-built curation.
• Citances are good candidates for paraphrase creation.

Paraphrase Creation Algorithm

1. Extract the sentences that cite the target.
2. Mark the NEs of interest (genes/proteins, MeSH terms) and normalize.
3. Dependency parse (MiniPar).
4. For each parse:
   For each pair of NEs of interest:
   i. Extract the path between them.
   ii. Create a paraphrase from the path.
5. Rank the candidates for a given pair of NEs.
6. Select only the ones above a threshold.
7. Generalize.
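Steps 4 to 6 of the algorithm above can be sketched as follows, assuming the dependency parse is already available as (head, dependent) token-index pairs. The slides use MiniPar; this sketch calls no parser, and the example parse is hand-written and hypothetical. The function names are illustrative, and ranking candidates by how many citances produce the same string is an assumed scoring rule, since the slides do not say how candidates are ranked. Grammatical repair and step 7 (generalization) are not covered.

```python
from collections import Counter, deque


def dependency_path(edges, source, target):
    """Shortest path of token indices between two NE tokens (step 4.i).

    edges: (head, dependent) index pairs. The tree is searched as an
    undirected graph, so the path can climb to a shared head and back down.
    Returns None if the two tokens are not connected.
    """
    neighbors = {}
    for head, dep in edges:
        neighbors.setdefault(head, []).append(dep)
        neighbors.setdefault(dep, []).append(head)
    previous = {source: None}  # breadth-first search back-pointers
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nxt in neighbors.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    if target not in previous:
        return None
    path = []
    node = target
    while node is not None:  # walk the back-pointers from target to source
        path.append(node)
        node = previous[node]
    return path[::-1]


def path_words(tokens, path):
    """Render a path as words in original sentence order (step 4.ii, simplified)."""
    return " ".join(tokens[i] for i in sorted(path))


def select_paraphrases(candidates, threshold):
    """Steps 5-6: rank candidate strings by how many citances produced them,
    then keep those whose count reaches the threshold (assumed scoring)."""
    counts = Counter(candidates)
    return [(text, n) for text, n in counts.most_common() if n >= threshold]


tokens = ["NGF", "withdrawal", "induces", "Bim"]
edges = [(1, 0), (2, 1), (2, 3)]  # hypothetical hand-written parse, not MiniPar output
print(path_words(tokens, dependency_path(edges, 0, 3)))  # NGF withdrawal induces Bim
```

Using breadth-first search on the undirected tree gives the unique path between the two entities, which is the span a paraphrase is built from.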

Creating a Paraphrase

Given the path from the dependency parse:

1. Restore the original word order.
2. Add words to improve grammaticality.

For example:

• Bim … shown … be … following nerve growth factor withdrawal.

• Bim [has] [been] shown [to] be [upregulated] following nerve growth factor withdrawal.
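The word-order step can be sketched as follows: sort the path's token indices and mark each skipped stretch of the sentence with “…”. The sentence and path indices below are hypothetical, chosen only so the output reproduces the first bullet above; the function name is illustrative, and the second step (adding words for grammaticality) is not attempted here.

```python
def restore_word_order(tokens, path, gap="…"):
    """Put the path's tokens back in sentence order, marking skipped words with `gap`."""
    ordered = sorted(set(path))
    pieces = []
    for position, index in enumerate(ordered):
        if position > 0 and index - ordered[position - 1] > 1:
            # One or more sentence words between these tokens were not on the path.
            pieces.append(gap)
        pieces.append(tokens[index])
    return " ".join(pieces)


# Hypothetical sentence and path indices (in the arbitrary order a path search might return).
tokens = "Bim has been shown to be upregulated following nerve growth factor withdrawal".split()
print(restore_word_order(tokens, [11, 10, 9, 8, 7, 5, 3, 0]))
# Bim … shown … be … following nerve growth factor withdrawal
```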

Sample Sentences

NGF withdrawal from sympathetic neurons induces Bim, which then contributes to death.

Nerve growth factor withdrawal induces the expression of Bim and mediates Bax dependent cytochrome c release and apoptosis.

The proapoptotic Bcl-2 family member Bim is strongly induced in sympathetic neurons in response to NGF withdrawal.

In neurons, the BH3 only Bcl2 member, Bim, and JNK are both implicated in apoptosis caused by nerve growth factor deprivation.

Their Paraphrases

NGF withdrawal induces Bim.

Nerve growth factor withdrawal induces the expression of Bim.

Bim has been shown to be upregulated following nerve growth factor withdrawal.

Bim implicated in apoptosis caused by nerve growth factor deprivation.

They all paraphrase: Bim is induced after NGF withdrawal.
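Recognizing that these four paraphrases concern the same pair of entities depends on the normalization in step 2. The sketch below maps surface forms to canonical IDs with a small synonym table and then groups sentences by the entity pair they mention. The table, the IDs, and the function names are hypothetical; a real system would draw on resources such as MeSH or a gene/protein normalizer. Grouping by entity pair alone is a simplification, since two different facts about the same pair would fall into one group.

```python
import re
from collections import defaultdict

# Hypothetical hand-written synonym table mapping surface forms to canonical IDs.
SYNONYMS = {
    "ngf withdrawal": "NGF_withdrawal",
    "nerve growth factor withdrawal": "NGF_withdrawal",
    "nerve growth factor deprivation": "NGF_withdrawal",
    "bim": "Bim",
}


def normalized_entities(sentence):
    """Return the set of canonical entity IDs mentioned in a sentence."""
    text = sentence.lower()
    found = set()
    for surface, canonical in SYNONYMS.items():
        # Whole-word match so that, e.g., "bim" does not fire inside a longer word.
        if re.search(r"\b" + re.escape(surface) + r"\b", text):
            found.add(canonical)
    return found


def group_by_entity_pair(sentences):
    """Group sentences by the sorted pair of canonical entities they mention."""
    groups = defaultdict(list)
    for sentence in sentences:
        entities = sorted(normalized_entities(sentence))
        if len(entities) == 2:  # for simplicity, keep only sentences with exactly one pair
            groups[tuple(entities)].append(sentence)
    return dict(groups)


paraphrases = [
    "NGF withdrawal induces Bim.",
    "Nerve growth factor withdrawal induces the expression of Bim.",
    "Bim has been shown to be upregulated following nerve growth factor withdrawal.",
    "Bim implicated in apoptosis caused by nerve growth factor deprivation.",
]
for pair, members in group_by_entity_pair(paraphrases).items():
    print(pair, len(members))
# ('Bim', 'NGF_withdrawal') 4
```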

BioText User Interface

Discussion