
Page 1: From Informatics to Bioinformatics

Show & Tell

Limsoon Wong
Kent Ridge Digital Labs
Singapore

Page 2: What is Bioinformatics?

Page 3: What are the Themes of Bioinformatics?

Bioinformatics = Data Mgmt + Knowledge Discovery

Data Mgmt = Integration + Transformation + Cleansing

Knowledge Discovery = Statistics + Algorithms + Databases

Page 4: What are the Benefits of Bioinformatics?

To the patient: better drug, better treatment

To the pharma: save time, save cost, make more $

To the scientist: better science

Page 5: Data Integration

A DOE “impossible query”: for each gene on a given cytogenetic band, find its non-human homologs.

source   type     location    remarks
GDB      Sybase   Baltimore   Flat tables; SQL joins; location info
Entrez   ASN.1    Bethesda    Nested tables; keywords; homolog info

Page 6: Data Integration Results

sybase-add (#name: "GDB", ...);

create view L from locus_cyto_location using GDB;

create view E from object_genbank_eref using GDB;

select
  #accn: g.#genbank_ref, #nonhuman-homologs: H
from
  L as c, E as g,
  (select u
   from g.#genbank_ref.na-get-homolog-summary as u
   where not(u.#title string-islike "%Human%") andalso
         not(u.#title string-islike "%H.sapien%")) as H
where
  c.#chrom_num = "22" andalso
  g.#object_id = c.#locus_id andalso
  not (H = { });

• Using Kleisli:
  • Clear
  • Succinct
  • Efficient
  • Handles
    • heterogeneity
    • complexity
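The Kleisli query on this slide reads as a join between two flat GDB tables with a nested lookup against Entrez for each matching gene. The Python sketch below mirrors that shape on toy data, only to make the nesting explicit. Every row, accession, and the `get_homolog_summary` helper are hypothetical stand-ins; this is not how Kleisli itself evaluates the query.

```python
# Toy stand-ins for the two GDB tables named in the query.
# All rows and accessions are invented for illustration.
locus_cyto_location = [
    {"locus_id": 1, "chrom_num": "22"},
    {"locus_id": 2, "chrom_num": "7"},
    {"locus_id": 3, "chrom_num": "22"},
]
object_genbank_eref = [
    {"object_id": 1, "genbank_ref": "ACC001"},
    {"object_id": 2, "genbank_ref": "ACC002"},
    {"object_id": 3, "genbank_ref": "ACC003"},
]

# Hypothetical homolog summaries, keyed by accession.
HOMOLOG_SUMMARIES = {
    "ACC001": [{"title": "Mouse gene A"}, {"title": "Human gene A"}],
    "ACC002": [{"title": "Rat gene B"}],
    "ACC003": [{"title": "H.sapiens gene C"}],
}


def get_homolog_summary(accession):
    """Hypothetical stand-in for the remote Entrez homolog lookup."""
    return HOMOLOG_SUMMARIES.get(accession, [])


def nonhuman_homologs(accession):
    """Inner subquery: drop summaries whose title mentions human."""
    return [
        u for u in get_homolog_summary(accession)
        if "Human" not in u["title"] and "H.sapien" not in u["title"]
    ]


def homologs_on_chromosome(chrom):
    """Outer query: join the two tables, attach the nested homolog set,
    and keep only genes whose homolog set is non-empty."""
    results = []
    for c in locus_cyto_location:
        if c["chrom_num"] != chrom:
            continue
        for g in object_genbank_eref:
            if g["object_id"] != c["locus_id"]:
                continue
            h = nonhuman_homologs(g["genbank_ref"])
            if h:
                results.append({"accn": g["genbank_ref"],
                                "nonhuman_homologs": h})
    return results


result = homologs_on_chromosome("22")
```

Each output record keeps the homolog list nested inside it, which is the part that a flat SQL join over the two sources cannot express directly.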

Page 7: Data Warehousing

Motivation
• efficiency
• availability
• “denial of service”
• data cleansing

Requirements
• efficient to query
• easy to update
• model data naturally

{(#uid: 6138971,
  #title: "Homo sapiens adrenergic ...",
  #accession: "NM_001619",
  #organism: "Homo sapiens",
  #taxon: 9606,
  #lineage: ["Eukaryota", "Metazoa", …],
  #seq: "CTCGGCCTCGGGCGCGGC...",
  #feature: {
    (#name: "source",
     #continuous: true,
     #position: [
       (#accn: "NM_001619",
        #start: 0, #end: 3602,
        #negative: false)],
     #anno: [
       (#anno_name: "organism",
        #descr: "Homo sapiens"), …] ), …})}

Page 8: Data Warehousing Results

A relational DBMS on its own is insufficient because it forces us to fragment data into 3NF.

Kleisli turns a flat relational DBMS into a nested relational DBMS. It can use flat relational DBMSs such as Sybase, Oracle, and MySQL as its updatable complex object store. It can even use all of these systems simultaneously!

! Log in
oracle-cplobj-add (#name: "db", ...);

! Define table
create table GP (#uid: "NUMBER", #detail: "LONG") using db;

! Populate table with GenPept reports
select #uid: x.#uid, #detail: x
into GP
from aa-get-seqfeat-general "PTP" as x
using db;

! Map GP to that table
create view GP from GP using db;

! Run a query to get title of 131470
select x.#detail.#title
from GP as x
where x.#uid = 131470;
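The idea of using a flat table as a complex object store can be sketched outside Kleisli as well. The Python example below keeps a two-column table, `GP(uid, detail)`, in an in-memory SQLite database and stores each nested record serialized in the `detail` column. SQLite and JSON are swapped in here purely for illustration; the slide does not say how Kleisli encodes objects in Oracle, and the record contents are invented.

```python
import json
import sqlite3

# A hypothetical nested record; the field values are invented.
record = {
    "uid": 131470,
    "title": "hypothetical example title",
    "feature": [
        {"name": "source", "position": [{"start": 0, "end": 10}]},
    ],
}

# Define the flat table: one key column, one column for the whole object.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GP (uid INTEGER PRIMARY KEY, detail TEXT)")

# Populate it with the serialized nested record.
conn.execute(
    "INSERT INTO GP (uid, detail) VALUES (?, ?)",
    (record["uid"], json.dumps(record)),
)
conn.commit()


def get_title(connection, uid):
    """Fetch the stored object by key and project out its nested title."""
    row = connection.execute(
        "SELECT detail FROM GP WHERE uid = ?", (uid,)
    ).fetchone()
    if row is None:
        return None
    return json.loads(row[0])["title"]


title = get_title(conn, 131470)
```

The relational engine only sees a key and an opaque value, so the record never has to be fragmented into normalized tables; the layer on top restores the nesting when it reads the value back.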

Page 9: Epitope Prediction

TRAP-559AA

MNHLGNVKYLVIVFLIFFDLFLVNGRDVQNNIVDEIKYSEEVCNDQVDLYLLMDCSGSIRRHNWVNHAVPLAMKLIQQLNLNDNAIHLYVNVFSNNAKEIIRLHSDASKNKEKALIIIRSLLSTNLPYGRTNLTDALLQVRKHLNDRINRENANQLVVILTDGIPDSIQDSLKESRKLSDRGVKIAVFGIGQGINVAFNRFLVGCHPSDGKCNLYADSAWENVKNVIGPFMKAVCVEVEKTASCGVWDEWSPCSVTCGKGTRSRKREILHEGCTSEIQEQCEEERCPPKWEPLDVPDEPEDDQPRPRGDNSSVQKPEENIIDNNPQEPSPNPEEGKDENPNGFDLDENPENPPNPDIPEQKPNIPEDSEKEVPSDVPKNPEDDREENFDIPKKPENKHDNQNNLPNDKSDRNIPYSPLPPKVLDNERKQSDPQSQDNNGNRHVPNSEDRETRPHGRNNENRSYNRKYNDTPKHPEREEHEKPDNNKKKGESDNKYKIAGGIAGGLALLACAGLAYKFVVPGAATPYAGEPAPFDETLGEEDKDLDEPEQFRLPEENEWN
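An epitope predictor needs candidate peptides to score. The slide does not say how candidates are drawn from the sequence, so the sketch below assumes the simplest option: every overlapping window of 9 residues, a common length for HLA class I peptides. The window length and the `peptide_windows` helper are assumptions for illustration, and only the first 11 residues of the sequence above are used.

```python
def peptide_windows(sequence, length=9):
    """Return (1-based start position, peptide) for every overlapping
    window of the given length. Returns an empty list when the sequence
    is shorter than the window."""
    return [
        (i + 1, sequence[i:i + length])
        for i in range(len(sequence) - length + 1)
    ]


# First 11 residues of the TRAP sequence shown on the slide.
prefix = "MNHLGNVKYLV"
windows = peptide_windows(prefix)
# windows == [(1, "MNHLGNVKY"), (2, "NHLGNVKYL"), (3, "HLGNVKYLV")]
```

A sequence of n residues yields n - 9 + 1 such candidates, each of which a model (the ANN or the BIMAS matrix named on the next slide) would then score.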

Page 10: Epitope Prediction Results

Prediction by our ANN model for HLA-A11
• 29 predictions
• 22 epitopes
• 76% specificity

Prediction by BIMAS matrix for HLA-A*1101

[Figure: number of experimental binders by BIMAS rank (axis marks at 1, 66, 100): 19 (52.8%), 5 (13.9%), 12 (33.3%)]
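The percentages on this slide can be re-derived from the counts. The short check below assumes that the 76% figure is 22 epitopes out of 29 predictions, and that the three binder counts partition a single set of experimental binders; neither assumption is stated explicitly on the slide.

```python
# Counts as reported on the slide.
predictions = 29
epitopes = 22

# Fraction of ANN predictions that were confirmed epitopes.
confirmed_fraction = epitopes / predictions
confirmed_percent = round(100 * confirmed_fraction)

# Experimental binders in the three BIMAS rank groups.
binders = [19, 5, 12]
total_binders = sum(binders)
binder_percents = [round(100 * b / total_binders, 1) for b in binders]
```

Under these assumptions the counts reproduce the slide's figures: 22 of 29 rounds to 76%, and 19, 5, and 12 out of 36 give 52.8%, 13.9%, and 33.3%.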

Page 11: Gene Expression Analysis

Clustering gene expression profiles

Classifying gene expression profiles

find stable differentially expressed genes

Page 12: Gene Expression Analysis Results

The Discovery System
• Correlation test
• Voter selection
• Class prediction

Page 13: Protein Interaction Extraction

“What are the protein-protein interaction pathways from the latest reported discoveries?”

Page 14: Protein Interaction Extraction Results

Rule-based system for processing free texts in scientific abstracts

Specialized in
• extracting protein names
• extracting protein-protein interactions
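To give a feel for what a rule-based extractor does, here is a deliberately tiny Python sketch. It is not the system described on the slide: the protein-name pattern (letters followed by digits), the verb list, and the example text are all hypothetical, and real abstracts need far richer rules.

```python
import re

# Hypothetical rule 1: a protein name is letters followed by at least one
# digit, e.g. "p53" or "Cdc25". Real naming is much messier than this.
PROTEIN_NAME = r"[A-Za-z]+[0-9]+[A-Za-z0-9]*"

# Hypothetical rule 2: a small list of interaction verbs, longest first
# so that "binds to" is preferred over "binds".
INTERACTION_VERBS = sorted(
    ["interacts with", "binds to", "binds", "phosphorylates",
     "activates", "inhibits"],
    key=len,
    reverse=True,
)

INTERACTION_RULE = re.compile(
    r"\b({name})\s+({verbs})\s+({name})\b".format(
        name=PROTEIN_NAME,
        verbs="|".join(re.escape(v) for v in INTERACTION_VERBS),
    )
)


def extract_interactions(text):
    """Return (protein, verb, protein) triples matched by the rule."""
    return [m.groups() for m in INTERACTION_RULE.finditer(text)]


# Invented example text in the style of an abstract.
abstract = "Mdm2 binds to p53 and inhibits its activity. Cdc25 activates Cdk1."
interactions = extract_interactions(abstract)
```

On the invented text this yields two triples, one for each sentence; "inhibits its activity" is skipped because "its" does not match the name rule. Chaining such triples across many abstracts is one way to approach the pathway question on the previous slide.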

Page 15: Transcription Start Prediction

Page 16: Transcription Start Prediction Results

Page 17: Medical Record Analysis

Looking for patterns that are
• valid
• novel
• useful
• understandable

age  sex  chol  ecg   heart  sick
49   M    266   Hyp   171    N
64   M    211   Norm  144    N
58   F    283   Hyp   162    N
58   M    284   Hyp   160    Y
58   M    224   Abn   173    Y

Page 18: Medical Record Analysis Results

DeEPs, a novel “emerging pattern” method

Beats C4.5, CBA, LB, NB, TAN in 21 out of 32 UCI benchmarks

Works for gene expressions
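An emerging pattern is a combination of attribute values whose support grows sharply from one class to another. The sketch below computes that growth rate on the five records from the Medical Record Analysis slide. It illustrates the underlying notion only and is not an implementation of DeEPs; the two patterns tested are chosen here for illustration.

```python
import math

# The five records from the Medical Record Analysis slide.
COLUMNS = ("age", "sex", "chol", "ecg", "heart", "sick")
ROWS = [
    (49, "M", 266, "Hyp", 171, "N"),
    (64, "M", 211, "Norm", 144, "N"),
    (58, "F", 283, "Hyp", 162, "N"),
    (58, "M", 284, "Hyp", 160, "Y"),
    (58, "M", 224, "Abn", 173, "Y"),
]
records = [dict(zip(COLUMNS, row)) for row in ROWS]


def support(pattern, label):
    """Fraction of records with the given 'sick' label that contain
    every attribute=value pair in the pattern."""
    group = [r for r in records if r["sick"] == label]
    hits = [r for r in group if all(r[k] == v for k, v in pattern.items())]
    return len(hits) / len(group)


def growth_rate(pattern, source="N", target="Y"):
    """Support in the target class divided by support in the source class.
    Infinite when the pattern occurs only in the target class, and 0.0
    when it occurs in neither."""
    s_source = support(pattern, source)
    s_target = support(pattern, target)
    if s_source == 0:
        return math.inf if s_target > 0 else 0.0
    return s_target / s_source


# {age=58, sex=M}: support 0/3 among N, 2/2 among Y.
gr_age_sex = growth_rate({"age": 58, "sex": "M"})
# {ecg=Hyp}: support 2/3 among N, 1/2 among Y.
gr_ecg = growth_rate({"ecg": "Hyp"})
```

On this toy table the pattern {age=58, sex=M} appears in both sick patients and in none of the others, so its growth rate is infinite, while {ecg=Hyp} is slightly more common among the non-sick and has a growth rate below 1. With only five records this says nothing clinical; it only shows how the measure separates the two patterns.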

Page 19: Behind the Scene

Research: Vladimir Bajic, Vladimir Brusic, Jinyan Li, See-Kiong Ng, Limsoon Wong, Louxin Zhang

Business: Peter Saunders

Industry Assignees: Hao Han (gX), Rahul Despande (MC)

Engineering: Allen Chong, Judice Koh, SPT Krishnan, Seng Hong Seah, Guanglan Zhang, Zhuo Zhang

Students: Huiqing Liu, Song Zhu, Kun Yu