biomiss: language diversity of computing

Post on 12-Apr-2017

The Language Diversity of Computing

Or, how to talk with a computer.

Jeremy Yang (Mgr., Systems & Programming)

Translational Informatics Div.
Dept. of Internal Medicine
University of New Mexico

BioMISS -- Thursday, Oct 15, 2015

Language Diversity Examples

Python Perl Fortran C R

C++ Java Basic SQL Sparql

XML XSD XPath URLs bash

HTML HTTP ASCII UTF-8 regex

Scala ICD-10 Ruby OWL RDF

A Working Definition of “Language”

● Coherent symbology (symbolic system)

Languages: Some major advances

Timeline, 1950 to 2010:

● FORTRAN (1953)
● COBOL (1960)
● C (1969)
● C++ (1979)
● SQL (1979)
● Perl (1987)
● Python (1989)
● HTML (1990)
● Java (1995)
● XML (1997)
● RDF (1999)
● Sparql (2008)

Language merit vs. elitism

Why do we care about languages?

● Compatibility
● Efficiency
● Usability
● Knowledge representation
● Intelligence
● Evolution

Naturellement!

℅ Prof Harald Sack, Hasso Plattner Institute, U. Potsdam, MOOC: “Semantic Web Technologies”

Programming paradigms

Object Oriented
● classes
● instances
● methods
● ~ nouns

Functional
● functions
● routines
● parameters
● ~ verbs

Programming paradigms are language paradigms.
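One way to see the noun/verb contrast is to compute the same quantity both ways. The sketch below is illustrative only: the `Molecule` class, the `molecular_mass` function, and the small table of rounded atomic masses are assumptions introduced for this example, not material from the talk.

```python
from dataclasses import dataclass
from functools import reduce

# Rounded standard atomic masses, just enough for the example.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "O": 15.999}


# Object oriented: the "noun" (a Molecule) owns its data and its methods.
@dataclass
class Molecule:
    name: str
    atoms: dict  # element symbol -> atom count

    def mass(self) -> float:
        return sum(ATOMIC_MASS[el] * n for el, n in self.atoms.items())


# Functional: the "verb" (molecular_mass) is a pure function of its parameters.
def molecular_mass(atoms: dict) -> float:
    return reduce(
        lambda total, item: total + ATOMIC_MASS[item[0]] * item[1],
        atoms.items(),
        0.0,
    )


water = Molecule("water", {"H": 2, "O": 1})
oo_result = water.mass()                       # ask the noun
fn_result = molecular_mass({"H": 2, "O": 1})   # apply the verb
```

Both calls give the same number; what differs is whether the sentence is built around the thing or around the action.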

Object Oriented Example:

CDK = Chemistry Development Kit

Open source Java package & API

Computers have “evolved” from numerical calculators to knowledge processors.

Knowledge representation and processing via language!

Italian Music Terms

Choice of language should be guided by the domain.

Q: So what is the problem?
A: Language gaps

CODE

JARGON

MEANING

“Interpretation”

MATH

Q: So what is the problem?
A: Standards (so many!)

“Why can’t my iPhone talk to my ...”

● TV
● Audio system
● Car
● Medical records

Q: So what is the problem?

A: Language shapes, empowers, limits thought. (Sapir-Whorf Hypothesis, aka Linguistic Relativity)

Q: So what is the problem?
A: Abstraction

● Overgeneralizing
● Reality is concrete!
● But: abstraction organizes knowledge
● (a feature, not a bug!)

“We think in generalities, but we live in detail.” -- Alfred North Whitehead

Abstraction: Shakespeare quotes

“Full of sound and fury, signifying nothing.”

Abstraction: Ray Hudson Quotes

"On to this one quicker than a jackrabbit on a hot date. Look at this finish! That is beyond world class."

"Braver than a matador in a pink tutu he was."

"Racing Santander’s butcher men tried to hack down Xavi. Xavi dancing over the combine harvesters that are coming after him."

“He could make an onion cry.” (on Lionel Messi) "Where the insane becomes the routine with this man. He is nothing less than a ball whisperer."

“You campaign in poetry. You govern in prose.” - Mario Cuomo

But maybe all language is poetic.

Languages of Biomedical Knowledge

Which cirrhosis? Specificity?

http://apps.who.int/classifications/icd10

Translation and mapping terms

story

history

Our project: Illuminating the Druggable Genome (IDG)

$4.9M

Illuminating the Druggable Genome Knowledge Management Center (IDG-KMC)

Translational Informatics Division
Chief: Tudor Oprea, MD, PhD

IDG-KMC Workflow

IDG-KMC Collaborator Network

Slide ℅ Tudor Oprea

Heterogeneous data integration. Language diversity.

IDG-KMC Language Challenge: Case #1: Drug Nomenclature

http://pasilla.health.unm.edu/tomcat/drugdb
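The drug nomenclature problem can be pictured as many names pointing at one substance. The sketch below is a toy illustration under stated assumptions: the `SYNONYMS` table and the `canonical_name` helper are hypothetical and do not describe how the drug database at the URL above works.

```python
# Hypothetical synonym table: several names, one canonical drug name.
SYNONYMS = {
    "acetaminophen": "paracetamol",
    "paracetamol": "paracetamol",
    "tylenol": "paracetamol",
    "aspirin": "acetylsalicylic acid",
    "acetylsalicylic acid": "acetylsalicylic acid",
}


def canonical_name(name):
    """Map a drug name to its canonical form, or None if it is unknown."""
    key = " ".join(name.lower().split())  # normalize case and whitespace
    return SYNONYMS.get(key)


names = ["Tylenol", "ACETAMINOPHEN", "  Acetylsalicylic   acid ", "unobtainium"]
resolved = [canonical_name(n) for n in names]
```

Returning `None` for an unknown name, rather than guessing, keeps the language gap visible instead of hiding it.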

IDG-KMC Language Challenge: Case #2: Disease Nomenclature

ICD
● The International Classification of Diseases (ICD) is the standard diagnostic tool for epidemiology, health management and clinical purposes.
● WHO
● Clinical emphasis
● Procedures (CM)
● EMR
● Versions

Disease Ontology
● The mission of the Disease Ontology (DO) is to provide an open source ontology for the integration of biomedical data that is associated with human disease.
● Academic network
● Research emphasis
● Community driven
● Continual updates

Disease nomenclature
● Nosology, classification, ontology
● 17k codes in ICD-9. 155k codes in ICD-10.
● Implicit: Disease model of medicine

My recent Dx: Otitis

Disease vs. Condition vs. Symptom vs. Phenotype

℅ WebMD

IDG-KMC: Gene expression vs. Tissues; Different sources, tissue terms.

IDG-KMC: TCRD - Target Central Research Db

+------------+------------+--------+------+------------------------------------------------------------------+--------+-------+
| doid       | Disease    | zscore | conf | Protein                                                          | idgfam | tdl   |
+------------+------------+--------+------+------------------------------------------------------------------+--------+-------+
| DOID:13189 | Gout       |  3.512 |  1.8 | Alpha-protein kinase 1                                           | Kinase | Tbio  |
| DOID:13189 | Gout       |  3.214 |  1.6 | Serine/threonine-protein kinase SIK1                             | Kinase | Tchem |
| DOID:13189 | Gout       |  2.922 |  1.5 | Melanocortin receptor 3                                          | GPCR   | Tchem |
| DOID:13189 | Gout       |  2.797 |  1.4 | Taste receptor type 2 member 30                                  | GPCR   | Tbio  |
| DOID:13189 | Gout       |  2.576 |  1.3 | Taste receptor type 2 member 16                                  | GPCR   | Tbio  |
| DOID:13189 | Gout       |  2.379 |  1.2 | Hepatocyte nuclear factor 4-gamma                                | NR     | Tbio  |
| DOID:13189 | Gout       |  2.441 |  1.2 | Tyrosine-protein kinase SYK                                      | Kinase | Tchem |
| DOID:13189 | Gout       |  1.948 |  1.0 | cGMP-dependent protein kinase 2                                  | Kinase | Tchem |
| DOID:13189 | Gout       |  1.798 |  0.9 | Pannexin-1                                                       | IC     | Tbio  |
| DOID:13189 | Gout       |  1.517 |  0.8 | Taste receptor type 2 member 38                                  | GPCR   | Tbio  |
| DOID:13189 | Gout       |  1.565 |  0.8 | Transient receptor potential cation channel subfamily A member 1 | IC     | Tclin |
| DOID:13189 | Gout       |  1.531 |  0.8 | Transient receptor potential cation channel subfamily V member 1 | IC     | Tclin |
| DOID:13189 | Gout       |  1.388 |  0.7 | Adenosine kinase                                                 | Kinase | Tchem |
| DOID:13189 | Gout       |  1.427 |  0.7 | Interleukin-1 receptor-associated kinase 1                       | Kinase | Tchem |
| DOID:13189 | Gout       |  1.375 |  0.7 | Transient receptor potential cation channel subfamily M member 3 | IC     | Tbio  |
| DOID:13189 | Gout       |  1.255 |  0.6 | Free fatty acid receptor 4                                       | GPCR   | Tchem |
| DOID:13189 | Gout       |  1.231 |  0.6 | P2X purinoceptor 2                                               | IC     | Tbio  |
| DOID:13189 | Gout       |  1.198 |  0.6 | Proto-oncogene tyrosine-protein kinase Src                       | Kinase | Tclin |
| DOID:13189 | Gout       |  1.108 |  0.6 | Tribbles homolog 1                                               | Kinase | Tbio  |
| DOID:13189 | Gout       |  1.093 |  0.5 | Activin receptor type-1B                                         | Kinase | Tchem |
| DOID:13189 | Gout       |  1.048 |  0.5 | Transient receptor potential cation channel subfamily V member 2 | IC     | Tbio  |
+------------+------------+--------+------+------------------------------------------------------------------+--------+-------+

Disease-gene associations via literature text mining.
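Rows like those in the TCRD listing above can be filtered and ranked once they are loaded as records. The following is a minimal sketch, not the project's actual code: the field names mirror the column headers, the five rows are copied from the listing, and the `Association` type and `top_by_family` helper are hypothetical names introduced here.

```python
from typing import NamedTuple


class Association(NamedTuple):
    doid: str
    disease: str
    zscore: float
    conf: float
    protein: str
    idgfam: str
    tdl: str


# A few rows copied from the listing (not the full result set).
rows = [
    Association("DOID:13189", "Gout", 3.512, 1.8, "Alpha-protein kinase 1", "Kinase", "Tbio"),
    Association("DOID:13189", "Gout", 3.214, 1.6, "Serine/threonine-protein kinase SIK1", "Kinase", "Tchem"),
    Association("DOID:13189", "Gout", 2.922, 1.5, "Melanocortin receptor 3", "GPCR", "Tchem"),
    Association("DOID:13189", "Gout", 1.565, 0.8,
                "Transient receptor potential cation channel subfamily A member 1", "IC", "Tclin"),
    Association("DOID:13189", "Gout", 1.198, 0.6, "Proto-oncogene tyrosine-protein kinase Src", "Kinase", "Tclin"),
]


def top_by_family(records, idgfam, limit=3):
    """Return up to `limit` records of one target family, highest zscore first."""
    hits = [r for r in records if r.idgfam == idgfam]
    return sorted(hits, key=lambda r: r.zscore, reverse=True)[:limit]


kinases = [r.protein for r in top_by_family(rows, "Kinase")]
```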

Text mining, named entity recognition, term frequency
Natural language processing, Google, Watson, Siri, and the state of the art
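At their simplest, term frequency and dictionary-based named entity recognition look like the sketch below. It is a toy under stated assumptions: the `DICTIONARY` of terms, the helper names, and the example sentence are made up for illustration, and real systems are far more sophisticated.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary mapping surface terms to entity types.
DICTIONARY = {
    "gout": "Disease",
    "otitis": "Disease",
    "pannexin-1": "Protein",
    "adenosine kinase": "Protein",
}


def find_entities(text):
    """Return one (term, type) pair per dictionary hit, ignoring case."""
    lowered = text.lower()
    found = []
    for term, etype in DICTIONARY.items():
        # The term must not be embedded in a longer word or hyphenated token.
        pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
        for _ in re.finditer(pattern, lowered):
            found.append((term, etype))
    return found


def term_frequency(text):
    """Count lowercase tokens, keeping hyphenated tokens such as 'pannexin-1' whole."""
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return Counter(tokens)


# Made-up example text, not taken from any publication.
sentence = ("Gout was mentioned with Pannexin-1. "
            "Gout was also mentioned with adenosine kinase.")
entity_counts = Counter(find_entities(sentence))
counts = term_frequency(sentence)
```

Counting co-mentions like these across many documents is the raw material for the kind of disease-gene scores mentioned earlier, though the scoring itself is not shown here.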

Language Diversity of Computers

Final Thought:

“Can we talk?”*

℅ Joan Rivers, 1933-2014