biomiss: language diversity of computing

The Language Diversity of Computing. Or, how to talk with a computer. Jeremy Yang (Mgr., Systems & Programming), Translational Informatics Div., Dept. of Internal Medicine, University of New Mexico. BioMISS -- Thursday, Oct 15, 2015

Upload: jeremy-yang

Post on 12-Apr-2017


TRANSCRIPT

Page 1: BioMISS: Language Diversity of Computing

The Language Diversity of Computing

Or, how to talk with a computer.

Jeremy Yang (Mgr., Systems & Programming)

Translational Informatics Div.
Dept. of Internal Medicine
University of New Mexico

BioMISS -- Thursday, Oct 15, 2015

Page 2: BioMISS: Language Diversity of Computing

Language Diversity Examples

Python Perl Fortran C R

C++ Java Basic SQL Sparql

XML XSD XPath URLs bash

HTML HTTP ASCII UTF-8 regex

Scala ICD-10 Ruby OWL RDF


Page 3: BioMISS: Language Diversity of Computing

A Working Definition of “Language”

● Coherent symbology (symbolic system)


Page 4: BioMISS: Language Diversity of Computing

Languages: Some major advances

(Timeline, 1950 to 2010)

● FORTRAN (1953)
● COBOL (1960)
● C (1969)
● C++ (1979)
● SQL (1979)
● Perl (1987)
● Python (1989)
● HTML (1990)
● Java (1995)
● XML (1997)
● RDF (1999)
● Sparql (2008)

Page 5: BioMISS: Language Diversity of Computing

Language merit vs. elitism


Page 6: BioMISS: Language Diversity of Computing

Why do we care about languages?

● Compatibility
● Efficiency
● Usability
● Knowledge representation
● Intelligence
● Evolution

Naturellement!

Page 7: BioMISS: Language Diversity of Computing


℅ Prof Harald Sack, Hasso Plattner Institute, U. Potsdam, MOOC: “Semantic Web Technologies”

Page 8: BioMISS: Language Diversity of Computing

Programming paradigms

Object Oriented
● classes
● instances
● methods
● ~ nouns

Functional
● functions
● routines
● parameters
● ~ verbs

Programming paradigms are language paradigms.
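
The noun/verb contrast above can be sketched in a few lines of Python. This is only an illustration: the `Molecule` class, the `count_atoms` function, and the water example are hypothetical names chosen here, not taken from the talk.

```python
# Object-oriented style: the "noun" (a class) owns the data and its methods.
class Molecule:
    def __init__(self, name, atoms):
        self.name = name
        self.atoms = atoms

    def atom_count(self):
        """Method: behaviour attached to an instance."""
        return len(self.atoms)


# Function-centred style: the "verb" stands alone and takes parameters.
def count_atoms(atoms):
    """Function: behaviour applied to whatever data is passed in."""
    return len(atoms)


water = Molecule("water", ["H", "H", "O"])   # an instance of the class
oo_result = water.atom_count()               # ask the noun
fn_result = count_atoms(["H", "H", "O"])     # apply the verb
```

Both calls give the same answer; what differs is whether the code reads as "a molecule that can count its atoms" or "counting, applied to atoms".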

Page 9: BioMISS: Language Diversity of Computing


Object Oriented Example:

CDK = Chemistry Development Kit

Open source Java package & API

Computers have “evolved” from numerical calculators to knowledge processors.

Knowledge representation and processing via language!

Page 10: BioMISS: Language Diversity of Computing


Italian Music Terms

Choice of language should be guided by the domain.

Page 11: BioMISS: Language Diversity of Computing

Q: So what is the problem?
A: Language gaps

CODE

JARGON

MEANING

“Interpretation”

MATH


Page 12: BioMISS: Language Diversity of Computing

Q: So what is the problem?
A: Standards (so many!)

“Why can’t my iPhone talk to my ...”

● TV
● Audio system
● Car
● Medical records

Page 13: BioMISS: Language Diversity of Computing

Q: So what is the problem?

A: Language shapes, empowers, limits thought. (Sapir-Whorf Hypothesis, aka Linguistic Relativity)


Page 14: BioMISS: Language Diversity of Computing

Q: So what is the problem?
A: Abstraction

● Overgeneralizing
● Reality is concrete!
● But: abstraction organizes knowledge (a feature, not a bug!)

“We think in generalities, but we live in detail.” -- Alfred North Whitehead

Page 15: BioMISS: Language Diversity of Computing


Abstraction: Shakespeare quotes

“Full of sound and fury, signifying nothing.”

Page 16: BioMISS: Language Diversity of Computing

Abstraction: Ray Hudson Quotes

"On to this one quicker than a jackrabbit on a hot date. Look at this finish! That is beyond world class."

"Braver than a matador in a pink tutu he was."

"Racing Santander’s butcher men tried to hack down Xavi. Xavi dancing over the combine harvesters that are coming after him."

“He could make an onion cry.” (on Lionel Messi) "Where the insane becomes the routine with this man. He is nothing less than a ball whisperer."

Page 17: BioMISS: Language Diversity of Computing


“You campaign in poetry. You govern in prose.” - Mario Cuomo

But maybe all language is poetic.

Page 18: BioMISS: Language Diversity of Computing

Languages of Biomedical Knowledge


Page 19: BioMISS: Language Diversity of Computing

Which cirrhosis? Specificity?

http://apps.who.int/classifications/icd10

Page 20: BioMISS: Language Diversity of Computing

Translation and mapping terms


story

history

Page 21: BioMISS: Language Diversity of Computing

Our project: Illuminating the Druggable Genome (IDG)

$4.9M

Page 22: BioMISS: Language Diversity of Computing

Illuminating the Druggable Genome
Knowledge Management Center (IDG-KMC)

Translational Informatics Division
Chief: Tudor Oprea, MD, PhD

IDG-KMC Workflow

Page 23: BioMISS: Language Diversity of Computing

IDG-KMC Collaborator Network


Page 24: BioMISS: Language Diversity of Computing

Slide ℅ Tudor Oprea


Heterogeneous data integration. Language diversity.

Page 25: BioMISS: Language Diversity of Computing

IDG-KMC Language Challenge:
Case #1: Drug Nomenclature

http://pasilla.health.unm.edu/tomcat/drugdb

Page 26: BioMISS: Language Diversity of Computing

IDG-KMC Language Challenge:
Case #2: Disease Nomenclature

Page 27: BioMISS: Language Diversity of Computing

ICD
● The International Classification of Diseases (ICD) is the standard diagnostic tool for epidemiology, health management and clinical purposes.
● WHO
● Clinical emphasis
● Procedures (CM)
● EMR
● Versions

Disease Ontology
● The mission of the Disease Ontology (DO) is to provide an open source ontology for the integration of biomedical data that is associated with human disease.
● Academic network
● Research emphasis
● Community driven
● Continual updates

Page 28: BioMISS: Language Diversity of Computing

Disease nomenclature
● Nosology, classification, ontology
● 17k codes in ICD-9. 155k codes in ICD-10.
● Implicit: Disease model of medicine

Page 29: BioMISS: Language Diversity of Computing

My recent Dx: Otitis

Disease vs. Condition vs. Symptom vs. Phenotype


℅ WebMD

Page 30: BioMISS: Language Diversity of Computing


IDG KMC: Gene expression vs. Tissues; Different sources, tissue terms.

Page 31: BioMISS: Language Diversity of Computing

IDG-KMC: TCRD - Target Central Research Db

+------------+------------+--------+------+------------------------------------------------------------------+--------+-------+
| doid       | Disease    | zscore | conf | Protein                                                          | idgfam | tdl   |
+------------+------------+--------+------+------------------------------------------------------------------+--------+-------+
| DOID:13189 | Gout       |  3.512 |  1.8 | Alpha-protein kinase 1                                           | Kinase | Tbio  |
| DOID:13189 | Gout       |  3.214 |  1.6 | Serine/threonine-protein kinase SIK1                             | Kinase | Tchem |
| DOID:13189 | Gout       |  2.922 |  1.5 | Melanocortin receptor 3                                          | GPCR   | Tchem |
| DOID:13189 | Gout       |  2.797 |  1.4 | Taste receptor type 2 member 30                                  | GPCR   | Tbio  |
| DOID:13189 | Gout       |  2.576 |  1.3 | Taste receptor type 2 member 16                                  | GPCR   | Tbio  |
| DOID:13189 | Gout       |  2.379 |  1.2 | Hepatocyte nuclear factor 4-gamma                                | NR     | Tbio  |
| DOID:13189 | Gout       |  2.441 |  1.2 | Tyrosine-protein kinase SYK                                      | Kinase | Tchem |
| DOID:13189 | Gout       |  1.948 |  1.0 | cGMP-dependent protein kinase 2                                  | Kinase | Tchem |
| DOID:13189 | Gout       |  1.798 |  0.9 | Pannexin-1                                                       | IC     | Tbio  |
| DOID:13189 | Gout       |  1.517 |  0.8 | Taste receptor type 2 member 38                                  | GPCR   | Tbio  |
| DOID:13189 | Gout       |  1.565 |  0.8 | Transient receptor potential cation channel subfamily A member 1 | IC     | Tclin |
| DOID:13189 | Gout       |  1.531 |  0.8 | Transient receptor potential cation channel subfamily V member 1 | IC     | Tclin |
| DOID:13189 | Gout       |  1.388 |  0.7 | Adenosine kinase                                                 | Kinase | Tchem |
| DOID:13189 | Gout       |  1.427 |  0.7 | Interleukin-1 receptor-associated kinase 1                       | Kinase | Tchem |
| DOID:13189 | Gout       |  1.375 |  0.7 | Transient receptor potential cation channel subfamily M member 3 | IC     | Tbio  |
| DOID:13189 | Gout       |  1.255 |  0.6 | Free fatty acid receptor 4                                       | GPCR   | Tchem |
| DOID:13189 | Gout       |  1.231 |  0.6 | P2X purinoceptor 2                                               | IC     | Tbio  |
| DOID:13189 | Gout       |  1.198 |  0.6 | Proto-oncogene tyrosine-protein kinase Src                       | Kinase | Tclin |
| DOID:13189 | Gout       |  1.108 |  0.6 | Tribbles homolog 1                                               | Kinase | Tbio  |
| DOID:13189 | Gout       |  1.093 |  0.5 | Activin receptor type-1B                                         | Kinase | Tchem |
| DOID:13189 | Gout       |  1.048 |  0.5 | Transient receptor potential cation channel subfamily V member 2 | IC     | Tbio  |
+------------+------------+--------+------+------------------------------------------------------------------+--------+-------+

Disease-gene associations via literature text mining.
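
As a rough illustration of working with rows like these once they are out of the database, the sketch below copies the first six rows of the slide's table into Python and summarizes them by protein family and by tdl value. The `Association` record type and the variable names are assumptions made for this example; they are not TCRD's actual schema or API.

```python
from collections import Counter, namedtuple

# Hypothetical record type mirroring the column headers on the slide.
Association = namedtuple(
    "Association", "doid disease zscore conf protein idgfam tdl"
)

# First six rows, transcribed from the table above.
rows = [
    Association("DOID:13189", "Gout", 3.512, 1.8, "Alpha-protein kinase 1", "Kinase", "Tbio"),
    Association("DOID:13189", "Gout", 3.214, 1.6, "Serine/threonine-protein kinase SIK1", "Kinase", "Tchem"),
    Association("DOID:13189", "Gout", 2.922, 1.5, "Melanocortin receptor 3", "GPCR", "Tchem"),
    Association("DOID:13189", "Gout", 2.797, 1.4, "Taste receptor type 2 member 30", "GPCR", "Tbio"),
    Association("DOID:13189", "Gout", 2.576, 1.3, "Taste receptor type 2 member 16", "GPCR", "Tbio"),
    Association("DOID:13189", "Gout", 2.379, 1.2, "Hepatocyte nuclear factor 4-gamma", "NR", "Tbio"),
]

# How many associated proteins fall in each family?
family_counts = Counter(r.idgfam for r in rows)

# Which proteins carry the "Tchem" value in the tdl column?
tchem_proteins = [r.protein for r in rows if r.tdl == "Tchem"]
```

Even this small summary depends on every source agreeing on what "Gout", "Kinase", and "Tchem" mean, which is the language problem the talk is about.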

Page 32: BioMISS: Language Diversity of Computing

Text mining, named entity recognition, term frequency

Natural language processing, Google, Watson, Siri, and the state of the art
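
Term frequency, the simplest of the techniques named above, can be sketched as follows. This is a minimal illustration, not the method used in the IDG-KMC pipeline: the `term_frequency` function, the sample sentence, and the term list are all invented for this example.

```python
import re
from collections import Counter


def term_frequency(text, vocabulary):
    """Count whole-word, case-insensitive occurrences of each term in text."""
    counts = Counter()
    lowered = text.lower()
    for term in vocabulary:
        # \b anchors keep a term from matching inside a longer word.
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        counts[term] = len(re.findall(pattern, lowered))
    return counts


# Hypothetical text and disease term list, made up for illustration only.
sample_text = (
    "Gout was mentioned in the first note. "
    "The second note mentioned gout and otitis."
)
disease_terms = ["gout", "otitis", "cirrhosis"]

tf = term_frequency(sample_text, disease_terms)
```

Matching exact strings like this is where the nomenclature problem bites: a synonym or an alternate spelling of a disease name would simply be counted as zero.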

Page 33: BioMISS: Language Diversity of Computing

Language Diversity of Computers

Final Thought:

“Can we talk?”*

℅ Joan Rivers, 1933-2014