The BioText Project
SIMS Affiliates Meeting
Nov 14, 2003
Marti Hearst
Associate Professor
SIMS, UC Berkeley
Project sponsored by NSF DBI-0317510, ARDA AQUAINT, and a gift from Genentech
BioText Project Goals
• Provide fast, flexible, intelligent access to information for use in biosciences applications.
  – Better search results
  – Text mining
• Focus on
  – Textual Information
  – Tightly integrated with other resources
    • Ontologies
    • Record-based databases
People
• Project Leaders:
  – PI: Marti Hearst
  – Co-PI: Adam Arkin
• Computational Linguistics
  – Barbara Rosario
  – Preslav Nakov
• Database Research
  – Ariel Schwartz
  – Gaurav Bhalotia (graduated)
• User Interface / Information Retrieval
  – Kevin Li
  – Dr. Emilia Stoica
• Bioscience
  – Dr. TingTing Zhang
Outline
• Main Goals
  – Text Mining Examples
  – System Architecture
  – Apoptosis problem statement
• Recent results in
  – Abbreviation definition recognition
  – Semantic relation recognition (from text)
  – Search User Interfaces
  – Hierarchical grouping of journals
Text Mining Example 1
• How to discover new information …
• … as opposed to discovering which statistical patterns characterize occurrence of known information.
• Method:
  – Use large text collections to gather evidence to support (or refute) hypotheses
  – Make Connections
  – Gather Evidence
Etiology Example
• Don Swanson example, 1991
• Goal: find cause of disease
  – Magnesium-migraine connection
• Given
  – medical titles and abstracts
  – a problem (incurable rare disease)
  – some medical expertise
• Find causal links among titles
  – symptoms
  – drugs
  – results
Gathering Evidence
[Diagram: migraine; intermediate terms stress, CCB, PA, SCD; each intermediate term paired with magnesium]
Gathering Evidence
[Diagram: migraine and magnesium, with the intermediate terms stress, CCB, PA, SCD]
Swanson’s Linking Approach
• Two of his hypotheses have received some experimental verification.
• His technique
  – Only partially automated
  – Required medical expertise
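The linking idea can be illustrated with a minimal sketch: given documents reduced to sets of key terms, look for intermediate terms that co-occur with the starting term in some document and with the target term in another. The function name `linking_terms` and the toy documents are assumptions made up for illustration; this is not Swanson's actual procedure, which the slide notes was only partially automated and required medical expertise.

```python
def linking_terms(docs, a_term, c_term):
    """Return terms that co-occur with a_term in at least one document
    and with c_term in at least one document."""
    with_a, with_c = set(), set()
    for terms in docs:
        terms = set(terms)
        if a_term in terms:
            with_a |= terms - {a_term}
        if c_term in terms:
            with_c |= terms - {c_term}
    # Candidate links are the terms seen alongside both endpoints.
    return sorted((with_a & with_c) - {a_term, c_term})


# Toy "titles" reduced to key terms (invented; labels borrowed from the diagram).
toy_docs = [
    {"migraine", "stress"},
    {"stress", "magnesium"},
    {"migraine", "CCB"},
    {"CCB", "magnesium"},
    {"migraine", "aura"},  # no path to magnesium, so "aura" is not a link
]

links = linking_terms(toy_docs, "migraine", "magnesium")
print(links)
```

A person with domain expertise would still need to judge which of the returned candidates are plausible causal links.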
Text Mining Example 2:
• How to find functions of genes?
  – Have the genetic sequence
  – Don’t know what it does
  – But …
    • Know which genes it coexpresses with
    • Some of these have known function
  – So … infer function based on function of co-expressed genes
• This problem was suggested by Michael Walker and others at Incyte Pharmaceuticals
Gene Co-expression: Role in the genetic pathway
[Diagram: two possible pathway arrangements. Labels in the first: g?, PSA, Kall., PAP. Labels in the second: h?, PSA, Kall., PAP, g?]
Other possibilities as well
Make use of the literature
• Look up what is known about the other genes.
• Different articles in different collections
• Look for commonalities
  – Similar topics indicated by Subject Descriptors
  – Similar words in titles and abstracts
    adenocarcinoma, neoplasm, prostate, prostatic neoplasms, tumor markers, antibodies ...
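As a rough sketch of "look for commonalities", one could count how many of the co-expressed genes have at least one article carrying a given subject descriptor and keep the descriptors shared by several genes. The function name `shared_descriptors`, the `min_genes` threshold, and the assignment of descriptors to genes below are all assumptions for illustration; only the gene labels and descriptor strings come from the slides.

```python
from collections import Counter


def shared_descriptors(articles_by_gene, min_genes=2):
    """Return descriptors that appear in the literature of at least
    min_genes different genes."""
    counts = Counter()
    for descriptor_sets in articles_by_gene.values():
        # Count each descriptor once per gene, however many articles mention it.
        counts.update(set().union(*descriptor_sets))
    return sorted(d for d, n in counts.items() if n >= min_genes)


# One list of per-article descriptor sets for each known gene (invented data).
toy_articles = {
    "PSA": [{"prostate", "tumor markers"}, {"prostatic neoplasms"}],
    "Kall.": [{"prostate", "antibodies"}],
    "PAP": [{"prostatic neoplasms", "prostate"}],
}

common = shared_descriptors(toy_articles)
print(common)
```

Descriptors shared across the known genes then become candidate topics for a hypothesis about the unknown gene.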
Formulate a Hypothesis
• Hypothesis: mystery gene has to do with regulation of expression of genes leading to prostate cancer
• New tack: do some lab tests
  – See if mystery gene is similar in molecular structure to the others
  – If so, it might do some of the same things they do
Outline
• Main Goals
  – Text Mining Examples
  – System Architecture
  – Apoptosis problem statement
• Recent results in
  – Abbreviation definition recognition
  – Semantic relation recognition (from text)
  – Search User Interfaces
  – Hierarchical grouping of journals
BioText: Architecture
Sophisticated Text Analysis
Annotations in Database
Improved Search Interface
Recent Result (Schwartz & Hearst 03)
• Fast, simple algorithm for recognizing abbreviation definitions.
  – Simpler and faster than the rest
  – Higher precision and recall
  – Idea: Work backwards from the end
• Examples:
  – In eukaryotes, the key to transcriptional regulation of the Heat Shock Response is the Heat Shock Transcription Factor (HSF).
  – Gcn5-related N-acetyltransferase (GNAT)
• Idea: use redundancy across abstracts to figure out abbreviation meaning even when definition is not present.
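The "work backwards from the end" idea can be sketched as follows: starting from the last character of the abbreviation and the last character of the text preceding the parenthesis, move right to left and match each abbreviation character in turn, requiring the first abbreviation character to begin a word. This is a minimal sketch under those assumptions; the function name `find_long_form` is hypothetical, and details such as how much preceding text to consider are left out, so it should not be read as the exact published implementation.

```python
def find_long_form(short_form, candidate):
    """Return the shortest suffix of `candidate` (cut at a word start) whose
    characters match `short_form` right to left, or None if there is none."""
    s = len(short_form) - 1
    l = len(candidate) - 1
    while s >= 0:
        ch = short_form[s].lower()
        if not ch.isalnum():
            s -= 1  # skip punctuation inside the abbreviation
            continue
        # Move left until the character matches; the first character of the
        # abbreviation must additionally sit at the start of a word.
        while (l >= 0 and candidate[l].lower() != ch) or (
            s == 0 and l > 0 and candidate[l - 1].isalnum()
        ):
            l -= 1
        if l < 0:
            return None
        l -= 1
        s -= 1
    # Cut at the beginning of the word containing the first matched character.
    start = candidate.rfind(" ", 0, l + 1) + 1
    return candidate[start:]


text = ("In eukaryotes, the key to transcriptional regulation of the "
        "Heat Shock Response is the Heat Shock Transcription Factor")
print(find_long_form("HSF", text))
print(find_long_form("GNAT", "Gcn5-related N-acetyltransferase"))
```

On the two examples from the slide, the sketch recovers "Heat Shock Transcription Factor" for HSF and the full phrase "Gcn5-related N-acetyltransferase" for GNAT.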
BioText: A Two-Sided Approach
[Diagram: resources SwissProt, Blast, MeSH, GO, WordNet, Medline, Journal Full Text; the two sides are "Sophisticated Database Design & Algorithms" and "Empirical Computational Linguistics Algorithms"]
Apoptosis Network
[Diagram labels: Death Receptors Signaling; Survival Factors Signaling; Ca++ Signaling; P53 pathway; ER Stress; Genotoxic Stress; Loss of Attachment, Cell Cycle stress, etc.; Initiator Caspases (8, 10); Caspase 12; Caspase 9; Effector Caspases (3, 6, 7); Apaf 1; IAPs; NFkB; Mitochondria; Cytochrome c; Bax, Bak; Bcl-2 like; BH3 only; Smac; AIF; Apoptosis]
Slide courtesy TingTing Zhang
The issues (courtesy TingTing Zhang):
• The network nodes are deduced from reading and processing of experimental knowledge by experts. Every month >1000 apoptosis papers are published.
• The supporting experimental data are gathered in different organs, tissues, cells using various techniques.
• There are various levels of uncertainty associated with different techniques used to answer certain questions.
• Depending on the expression patterns for the players in the network, the observation may or may not be extended to other contexts.
• We need to keep track of ALL the information in order to understand the system better.
Simple cases:
• Mouse Bim proteins (isoforms EL, L, S) bind to human Bcl-2 (bacteriophage screening using cDNA expression library from T-Lymphoma cell line KO52DA20).
• Human BimEL protein is 89% identical to mouse BimEL; human BimL is 85% identical to mouse BimL (hybridization of mouse bim cDNA to human fetal spleen and peripheral blood cDNA library).
• Bim mRNA is detected in B and T lymphoid cells (Northern blot analysis of mouse KO52DA20, WEHI 703, WEHI 707, WEHI7.1, CH1, WEHI231, WEHI415, B6.23.16BW2 cell extracts).
• BimL protein interacts with Bcl-2 OR Bcl-XL OR Bcl-w proteins (immunoprecipitation (anti-Bcl-2 OR Bcl-XL OR Bcl-w) followed by Western blot (anti-EEtag) using extracts of human 293T cells co-transfected with EE-tagged BimL AND (bcl-2 OR bcl-XL OR bcl-w) plasmids).
• BimL deleted of the BH3 domain does not bind to Bcl-2 OR Bcl-XL OR Bcl-w proteins (under the experimental conditions mentioned above).
Computational Language Goals
• Recognizing and annotating entities within textual documents
• Identifying semantic relations among entities
• To (eventually) be used in tandem with semi-automated reasoning systems.
Main Ideas for NLP Approach
• Assign Semantics using
  – Statistics
  – Hierarchical Lexical Ontologies to generalize
  – Redundancy in the data
• Build up Layers of Representation
  – Syntactic and Semantic
  – Use these in a feedback loop
Computational Linguistics Goals
• Mark up text with semantic relations
Recent Result: Descent of Hierarchy
• Idea:
  – Use the top levels of a lexical hierarchy to identify semantic relations
• Hypothesis:
  – A particular semantic relation holds between all 2-word Noun Compounds that can be categorized by a MeSH pair.
Definition
• NC: Any sequence of nouns that itself functions as a noun
  – asthma hospitalizations
  – health care personnel hand wash
• Technical text is rich with NCs
  Open-labeled long-term study of the subcutaneous sumatriptan efficacy and tolerability in acute migraine treatment.
NCs: Three tasks
• Identification
• Syntactic analysis (attachments)
  – [Baseline [headache frequency]]
  – [[Tension headache] patient]
• Our Goal: Semantic analysis
  – Headache treatment → treatment for headache
  – Corticosteroid treatment → treatment that uses corticosteroid
Main Idea:
• Top-level MeSH categories can be used to indicate which relations hold between the nouns in a noun compound
• headache recurrence
  – C23.888.592.612.441 C23.550.291.937
• headache pain
  – C23.888.592.612.441 G11.561.796.444
• breast cancer cells
  – A01.236 C04 A11
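A small sketch of how a noun compound's MeSH tree numbers might be reduced to a category pair: each tree number is truncated to its first few dot-separated segments, and the truncated codes form the key under which a relation would be looked up. The helper names `truncate` and `category_pair`, and the convention that `levels=1` means the first segment only, are assumptions for illustration.

```python
def truncate(tree_number, levels=1):
    """Keep only the first `levels` dot-separated segments of a tree number."""
    return ".".join(tree_number.split(".")[:levels])


def category_pair(tree_numbers, levels=1):
    """Map the tree numbers of a compound's nouns to truncated categories."""
    return tuple(truncate(t, levels) for t in tree_numbers)


headache_recurrence = ["C23.888.592.612.441", "C23.550.291.937"]
headache_pain = ["C23.888.592.612.441", "G11.561.796.444"]

print(category_pair(headache_recurrence))            # top level only
print(category_pair(headache_pain))
print(category_pair(headache_recurrence, levels=2))  # descend one level
```

Descending a level, as in the last call, separates compounds that share the same top-level pair when that pair alone does not determine a single relation.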
Linguistic Motivation
Can cast NC into head-modifier relation, and assume head noun has an argument and qualia structure.
  – (used-in): kitchen knife
  – (made-of): steel knife
  – (instrument-for): carving knife
  – (used-on): putty knife
  – (used-by): butcher’s knife
Distribution of Frequent Category Pairs
How Far to Descend?
• Anatomy: 250 CPs
  – 187 (75%) remain first level
  – 56 (22%) descend one level
  – 7 (3%) descend two levels
• Natural Science (H01): 21 CPs
  – 1 (4%) remain first level
  – 8 (39%) descend one level
  – 12 (57%) descend two levels
• Neoplasm (C04): 3 CPs
  – 3 (100%) descend one level
Evaluation
• Apply the rules to a test set
• Accuracy:
  – Anatomy: 91% accurate
  – Natural Science: 79%
  – Diseases: 100%
• Total:
  – 89.6% via intra-category averaging
  – 90.8% via extra-category averaging
Summary of NC Work
• Lexical hierarchy useful for inferring semantic relations
• Works because semantics are constrained and word sense ambiguity is not too much of a problem
• Can it be extended to other types of relations?
  – Preliminary results on one set of relations are promising.
Database Research Issues
• Efficiently and effectively combining
  – Relational databases & Text
  – Hierarchical Ontologies
  – Layers of Annotations
Interface Issues
• Create intuitive, appealing interfaces that are better than what’s currently out there.
• Start with existing assigned metadata
• As text analysis improves, incorporate the results into the interface.
Some Recent Work
• Organizing BioScience Journal Names
  – Currently there are > 3500
• Idea:
  – Group them into faceted hierarchies semi-automatically
  – Using clustering of title terms, synonym similarity via WordNet, and other techniques
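To give a flavor of grouping by title terms, the sketch below builds an index from each content word in a journal name to the journals containing it and keeps the terms shared by at least two journals. It is a deliberately simple stand-in for real clustering and omits the WordNet synonym step entirely; the function name `group_by_title_terms`, the stop-word list, and the sample journal names are assumptions chosen for illustration.

```python
from collections import defaultdict

# Words too generic to define a group (an assumed, hand-picked list).
STOP_WORDS = {"journal", "of", "the", "and", "letters"}


def group_by_title_terms(names, min_size=2):
    """Group journal names under each title term shared by >= min_size names."""
    groups = defaultdict(list)
    for name in names:
        for term in set(name.lower().split()) - STOP_WORDS:
            groups[term].append(name)
    return {term: js for term, js in groups.items() if len(js) >= min_size}


journals = [
    "Journal of Cell Biology",
    "Molecular Cell",
    "Journal of Immunology",
    "Immunology Letters",
    "Molecular Immunology",
]

groups = group_by_title_terms(journals)
for term in sorted(groups):
    print(term, "->", groups[term])
```

A journal can land in more than one group (here "Molecular Immunology" appears under both of its terms), which is in the spirit of facets rather than a single strict tree.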
Summary
• BioText aims to improve access to bioscience information via
  – Sophisticated language analysis
  – Integration of results into
    • Annotated database
    • Flexible user interface
• Eventual goal
  – Semi-automated mining and discovery
There’s lots to do!
For more information:
biotext.berkeley.edu