future biology is semantic information...

33
Future biology is Semantic information science Tetsuro Toyoda, ph.D Director of Bioinformatics And Systems Engineering Division, RIKEN, Japan 1 Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011. Nature of LifeInformation Nature of Replication Selfreplication2 Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011. - 3 -

Upload: others

Post on 14-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

1

Future biology is Semanticinformation science

Tetsuro Toyoda, ph.DDirector of

Bioinformatics And Systems Engineering Division, RIKEN,

Japan11

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

Nature of Life=Information

Nature of Replication

(Self‐replication)

2

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 3 -

Page 2: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

2

Life evolving 

through replications and selectionsVarious replications(Fertilized eggs)

Replication of genome information(DNA replication)

Various groups of individuals

Evolution Cycle (About 3.8billion years)

Natural Selection3

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

Genome is blueprint of Life(Genetic Information )

Genomic information is recorded by combined DNA blocks.

DNA blocks are structured by only 4 kinds of blocks: A T GDNA blocks are structured by only 4 kinds of blocks: A, T, G, and C.

To map sequences of DNA blocks is Genome sequencing. 

Sequenced genomic information is represented by character information as shown below. ATGCTAGTCGCGTGCTAGTCGATCTCGTGCA・・・・・・

Human’s genome represented by about 3 billion letters.

Genome of a model organism (A. thaliana ) is represented by about 100 million letters.

Image of DNA blocks4

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 4 -

Page 3: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

3

Biological diversity resulting from diversity of Genomic information

Informatics viewpoint

Sequences of DNA blocks evolve↓means  

Information is evolved(Biological diversity)

Informatics viewpoint

Material viewpoint

DNA of human and microorganisms are consisted of the same 4 kinds of DNA blocks (G,A,C, and T).↓

Any living organism hasn’t evolved as a material. (Common among living organisms)

Material viewpoint

5

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

Nature of Life = Genomic InformationRecording media for information Expression of information (phenotype)

Digital I

CD、DVD

nform

ation

Gen

o

Copied CD(Copied information)

omic In

form

ation

Cell

DNA

Copied DNA(Copied information) 6

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 5 -

Page 4: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

4

Invention of the machines decoding genomic information from DNA.

DInfo

CD、DVD

The invention allows people to deal with genomic information with computer.

Digital rm

ation

Gen

omic 

Inform

ation

DNA

DNASequencer

DNA Synthesizer

7

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

Future biology is information science.Future biologist should learn computer and mathematics

DNASequencer

DNA Synthesizer

8

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 6 -

Page 5: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

5

SciNetS is our platform on the web

f G d ifor Genome design

Part 2

Bioinformatics And Systems Engineering Division, RIKEN,

Japan99

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

What is SciNetS?

What is SciNetS?

10

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 7 -

Page 6: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

6

SciNetS Purpose, Methods and System

Purpose-Methods-System

SciNetS: Science for the Planet and SocietySciNetS: Science for the Planet and Society

SciNetS: Science, Innovation, Education, Technology, Synergy

SciNetS: Scientist's Networking SystemSciNetS: Scientist s Networking System

11

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

Science for the Planet and SocietySciNetS

Science for the Planet and Society

Purpose‐Science for the Planet and Society

ySciNetS

Science for the Planet and SocietySciNetS

"The purpose of Scinets is nothing less than to provide the tools so that humankind can adapt the planet's environment and itself so that all living things and human society may survive in the face of an uncertain future."

Tetsuro ToyodaDirector of BASE Division

The cloning of species such as rice to acquire stronger, or more environmentally friendly, crops could be made easier thanks to the intelligent search engine such as PosMed-plus. 12

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 8 -

Page 7: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

7

ScienceScinets

InnovationScinets

Methods and Disciplines

Scinets

EducationScinets

TechnologyScinets

SynergyScinets

"SciNetS creates new synergy for the disciplines of science, technology and education, as a collaboration engine and incubator for innovation"

Tetsuro ToyodaDirector of BASE 13

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

Scientist'sSciNets

NetworkingSciNets

System Infrastructure for Collaboration

System: Infrastructure for CollaborationThe Scinets Scientist's Networking System is a web system built on the next generation Semantic Web standard.

Making use of the latest in cloud computing technology the system is capable of simultaneously

SystemSciNetS

What databases are needed to realize rational organism design? That is the question we attempt to answer.”

Tetsuro Toyoda technology, the system is capable of simultaneously hosting upwards of thousands of virtual laboratories, enabling every individual user to access and control each database and data item.

Scinets is developed and operated by theBASE Bioinformatics And Systems Engineering divisionof RIKEN Institute of Physical and Chemical Science.

Tetsuro Toyoda Director of BASE

14

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 9 -

Page 8: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

8

Virtual Labs

Virtual labs on the Scinets system enable each lab researcher to easily create and publish databases without the need to maintain individual web servers.

This information infrastructure is beneficial for researchers conducting international collaborative research.

15

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

Virtual Labs - Integrated information infrastructure

These Virtual labs are used, for example, as integrated life science information infrastructure incorporating functions including a programming environment, a digital lab notes system, and information resources for biomass engineering research activities.

SciNetS contained 644 tables and 7.5 million rows of SemanticTable data as of December 2010.

Integrated Database of MammalsThe integrated databases of mammals in Scinets is developed to secure sustainability, accessibility, utility and publicity of the data produced from multiple large-scale programs promoted by RIKEN.

16

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 10 -

Page 9: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

9

• Plant Integrated Database

• Mammalian Integrated Database

• Integrated Database of Protein Experiments and

Current Virtual Lab Platforms in Scinets

• Integrated Database of Protein Experiments and Structure

• SemanticTable

• GenoCon International Rational Genome Design Contest “where students and researchers compete in designing genome base sequences.”

• Many otherso Personal databaseso Repository for electronic laboratory notes of

unreported datao Joint research or ‘medical clouds’—an electronic

health chart network among medical specialists and clinicians.

17

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

As an integrated database, Scinets provides a versatile database container compatible with international standards. The system facilitates data sharing and also controls data disclosure.

Using the semantic web, Scinets provides a ‘total incubation infrastructure system’ for life science-related databases.

A Total Incubation Center for Databases

18

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 11 -

Page 10: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

10

This container automatically enables the standardization and collection of data.

Automatic Standardized Data Collection

19

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

In this fashion Virtual laboratories can create control and evaluate their activities with fine control over what, when and how they are disclosed.

Balanced Control, Evaluation and Disclosure

This creates a new balance where data can be integrated into existing databases, and spread worldwide.

During semantic inference, a user gains access to a Semantic-JSON repository by invoking Semantic-JSON URIs, consisting of an internal ID and a command name, along with step-by-step receipt of a fragment of Semantic Web data in the JSON format.

20

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 12 -

Page 11: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

11

The semantic web is an extension of the widely used internet.

Though the Internet is suitable for use by people, they can read, understand and search for information by following

Scinets is key to analyzing the data

hyperlinks to documents. Automated computer-based data analysis is limited because links don't define the relationships between documents.

In the semantic web, all data has meaning, and every link refers to the relationships between the data, enabling computers to search data effectively for automated data analysis, including doing

Scinets also integrates semantic social networking tools such as Facebook for its users to connect and collaborate with each other.

y , g gthe science itself.

21

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

“A database is not merely a container for data, it is also a place where even life can evolve.We can create useful biological resources from information

Is even a place where life can evolve

resources by selecting useful genes from databases, designing new genomes, and returning them to the world of living organisms."

Tetsuro ToyodaDirector of BASE

22

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 13 -

Page 12: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

12

The benefit of statistical inference plus explicit semantic relationsKeyword search genetic study data => Ranked candidate genes

Process Huge, Complex, Distributed Data

Global databases such as MEDLINE Scinets web service PosMed

(http://omicspace.riken.jp)

Produces practical in silico positional cloning system as a web service integrating variousOmics entities and biomedical document sets.

Neural Network Processing

23

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

Through GenoCon Open platform for rational genome design –

Develop Researchers to playDevelop Researchers to play significant roles in new industries.

Part 3

Bioinformatics And Systems Engineering Division, RIKEN,

Japan2424

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 14 -

Page 13: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

13

By applying Semantic Web for Systems and Synthetic Biology

Creates Useful bio-resources from info-resources

25

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

Fossil Resources Biomass

CO2

Glyco

sy CO2

CO2

CO2Crude

oilApplications

of Renewable Resource

Plants

Sustainable society supported by Green Biotechnology

Petrochemistry process

Distillation

Heavy O

il

Gas oil

Kerosene

Gasoline

Naphtha

Bioprocess

Syngas

Oil, etc

Sugar

Development of

Innovation Technology

Creation of New Industry

ylation

EnergyChemicalProducts Materials

21st Century - Sustainable Society~ Carbon-neutral ~

20th Century - Consumptive Society

New Industry BioMaterials

ChemicalProductsEnergy

↓Search for resourcesDevelopment of oilfield (Fossil resources)

↓Search for resourcesDevelopment of genome (Biological Resource)

26

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 15 -

Page 14: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

14

Toward an age of Rational Design using genomic data.

27

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

Open platform for rational genome design

GenoCon International Rational Genome 

Design Contest offers contestants the chance to compete in technologies for rational genome design using a browser‐based programming environment provided at http://genocon.org

28

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 16 -

Page 15: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

15

Optimize genome design for plants that can absorb and resolve formaldehyde to prevent sick house syndrome.

• Introduce two enzymes, extracted from bacteria, into Calvin cycle of plants

• Enhancement of formaldehyde-resistance by shepherd's-purse and Tobacco.

AB: Gene introduction strain

Formaldehyde tolerance designed by Prof. Izui is the inaugural challenge in GenoCon 29

shepherd‘s-purse

AB: Gene introduction strainCK: Control

Tobacco

PHI

HPS Hexulose 6-phosphate synthase

Hexulose 6-phosphate isomerase

Introduced enzymes Patent applied by Prof. Izui,

Kyoto/Kinki University 2929

Various Plant data of Arabidopsis thalianaintegrated in RIKEN SciNetS

Resources from RIKEN• RIKEN RARGE Database• RIKEN Ds transposon line• RIKEN Activation tagging line • RIKEN Arabidopsis Fox‐hunting LineRIKEN Arabidopsis Fox‐hunting Line• RIKEN Arabidopsis Phenotype • RIKEN A.thaliana gene family• RIKEN Arabidopsis Protein record• RIKEN OmicBrowse – Arabidopsis

Resources opened by TAIR • Locus, PO annotation, GO annotation

↓Standardized data generated automatically

Supported by Japan’s National Project of database integration30

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 17 -

Page 16: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

16

RIKEN provides a web‐based programming interface for a user to process each data items in our database cloud, where all the programs and files the user has created are stored. 

Programming on a web browser

31Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

Design freely and Test safely

32

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 18 -

Page 17: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

17

Process Flow for Open Optimization Research

Outsourcer/Organizer(Premium user of the platform)Call for designs within the scope

of his/her patented invention ↑ Initial example of DNA sequenceNot yet optimal design

Coverage protected by the patent

Contributor/Contestants(Free users of the platform)

Submit their designs to the contest

Evaluators(Funded from the beneficiary)

Evaluates submitted designs First

SecondThird

Designed examples submitted by the contestants

Outsourcer/Organizer(The beneficiary)

Obtains better designs protected within the scope of the original patent

Initial DNA sequence

↓ A better example of DNA sequence is discovered

Coverage protected by the patent

Contestants enjoy the contest and outsourcer/organizer obtain better examples of DNA sequences.

Designs that have been ranked experimentally

33

Incentives for participants in Open Optimization Research

Outsourcer/Organizer/Sponsor(The beneficiary)

Call for designs within the scope of his/her patented inventionand obtain optimal designs

Contestants/Researchers/Students(Contributors)

• Enjoy the contest experience• Learn from their experience

R i d

Open Optimization Research

p g

Researchers/Technical partners

(Evaluators)Obtain numerous examples

f i tifi t di

•Receive awards•Contribute to community

PR Sponsor(The beneficiary)

• Education related advertisement

GenoCon Platform (SciNetS)Provides infrastructure for contest and shares designs

and programs from researchers around the world

for scientific studies to create publications with

funding from the beneficiary

• Education-related advertisement• Learning material business

34

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 19 -

Page 18: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

18

SciNetS Semantic Web Case Study: y

Mouse Genome Project.

Part 4

Bioinformatics And Systems Engineering Division, RIKEN,

Japan3535

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

Cloud computing is Internet‐based computing, whereby shared resources, software and information are provided to computers and other devices on‐

demand, like the electricity grid.

Cloud connectingmultiple resources

on demand 

Data resources integrated X‐ray protein dataNext‐generation sequencer data

Supercomputer36

- 20 -

Page 19: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

19

Emergence of new publication media for databases

37

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

SciNetS (Scientist’s Networking System)

38

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 21 -

Page 20: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

20

N b f d t b h t d

Number of databases (Virtual Labs) hosted is larger than the number of servers in the system.

Databases > Server computers > System

The many database projects provided by SciNetS

Scale out  (a single system hosting numerous databases)

Number of databases hostedby RIKEN SciNeS

10,000

1 10 100 1,00010,000100,0001,000,000

Number of databases to be integrated

Hosting databases separatelyhas a limitation

Available at http://database.riken.jp

→ SciNetS is expected to function as a new academic medium for publishing individual databases as entire sets of research results.

39

More than 200 Virtual laboratories integrated on SciNetS

40

- 22 -

Page 21: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

21

All the SciNetS Databases contain meta data, a Semantic Web

Mouse Phenotype Database

“Meta data” (RDF) includes semantic links among data items and

archives

items and resources

“Data resources” include experimental results or statistical results 41

A Semantic web of Mouse databases

42

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 23 -

Page 22: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

22

Searching 31 mammalian databases and literature

43

Allele Name

Link to equivalent record in MGI

Link to Gdf5 gene in MGI

Link to other alleles in MGI

Container automatically enables the standardization and collection of data

Link to other alleles in MGI

Image (mouse over to magnify) Mutant mouse conveying this allele

Phenotype annotation in MGI

Link to CDT‐DB for Gdf5 gene expression

Detailed Phenotype annotation in RIKEN

44

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 24 -

Page 23: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

23

InterPhenome: International Data Sharing of Mouse Phenome

45

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

Importing a Phenotype Database to Protégé 

The URI of this database

1. Copy the URI

2. Click “OK”

1. Copy the URI

46

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 25 -

Page 24: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

24

Individuals

Access with a URI

Browsing Mouse Phenotype Data imported in Protégé

Class Tree

Semantic Network

Retrieve RDF

Properties47

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

http://semantic‐json.org

Semantic‐JSON: API for accessing SemanticWeb data with JSON

48

- 26 -

Page 25: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

25

Sample Operation in Mathematica

Load the Semantic‐JSON package

Set the server’s URL

Obtain labels of two individuals

Semantic‐JSON supports most languages used by Bio‐informaticians

Obtain the shortest paths between two individuals

Similarly, Semantic‐JSON is available in popular languages including Java, Ruby, Perl, Python.

49

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

RIKEN provides a web‐based programming interface for a user to process each data items in our database cloud, where all the programs and files the user has created are stored. 

Web‐based programming interface 

50

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 27 -

Page 26: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

26

“Superbrain” solving biological problems by simulating artificial brain understanding the knowledge of biology

Disease-related genetic interval

RIKEN Super Petaflop Computer

For example, a problem of genetics

Estimate candidate genes

Estimate

RIKEN Super Petaflop Computer

Superbrain

Collective Intelligence

1951

Limit of general search through database

Proposition: “My job is a jail”

job jail

Boolean understands that job ≠ jail52

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 28 -

Page 27: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

27

Open‐endedness in the brain 

Proposition: “My job is a jail”

job jail

Restriction?

Brain tries to understand beyond the framework of general concept

53

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

Developing a brain type database(Superbrain)

Gene1 Ke ord Search b ser

hitGene1Momentary

Documents including Omics information,

literature, etc.~10 million Documentron

Concepts Unit~100 thousands

neuron

Document file for browse~10 million

Documentron

Concepts Unit~100 thousands

neuron

Large scale system which includes over 10 million documents and concepts for each neuron. 

Research Result

Display the hit document by

rounding up concept units such as

1

Gene3

Gene5

Keyword Search by user

keywords(Diabetes)

Conditions(chromosomal region)

hit

hit

1

Gene4

Momentary inference !

↑All calculations take within 1 second→Realization of high‐speed web research

ontology class, candidate gene, etc.

Inference layer

(attractor)

Concept layer(statistical test)

Expression layer

(Document summary )

Input layer(text search)

Out put Layer(web Service)

54

- 29 -

Page 28: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

28

New high speed technology calculates tens of thousands of spreadsheets

hit

The technology, creating 3-dimension spreadsheets for every gene (X=1,2,,,several tens of thousands of )

at once.

Including Excluding

Documentation files including literatures

(~10 millions)

Gene name , concept units, etc.

(~100 thousands)

(Calculating signal processing among neurons on brain-type database)

GeneX

hit

hit

gsearch word

gsearch word

Include

Gene X“A “units of Documents

“B “units of Documents

Exclude Gene X

“C “units of Documents

“D “units of Documents

Classifications by types of document database

Keyword search by users.

↑based on this table, statistical inferences such as  calculation of statistic, Fisher’s exact test, can be done.

各種概念とキーワードとの関連性を網羅的かつ、高速・統計的に評価できる

Correspondence relationship which curators associated

genes with documents.

(Medline, OMIM, PPI,Bio resources 等)

44↑All calculations within 1 second→Realization of high‐speed web research55

Ranking results of candidate genes↓

Keyword :  type 2 diabetesGenome Condition: gene of no.6 chromosome

Searching for candidate genes by positional cloning.

Conditions of compound retrieval

Search example: Inference of causative gene by positional cloning.

Interval

Rough mapping based on crossing experiment

(~10Mbp wide interval)

Too many genes

Inference of candidate gene

Identification of causative gene variation

So far contributed to 65 achievements of searching researches on disease-causing genes.

56

- 30 -

Page 29: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

29

Semantic Table

Part 5

Bioinformatics And Systems Engineering Division, RIKEN,

Japan5757

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

Introduction

SemanticTable.org

58

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 31 -

Page 30: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

30

About SemanticTable.org• A "Table" is the most common type of data format used in biological research. The Semantic Web, on the other hand, where knowledge is represented by statements comprising a subject, predicate and object, is also becoming a major container for data accessible by automatic computational reasoners; i.e. computers.

Here we present "SemanticTable.org", a data-mart server providing a huge amount of biological semantic-

About Semantic Table

Here we present SemanticTable.org , a data mart server providing a huge amount of biological semanticweb data as simple table-style data files that are reversely convertible to the semantic-web format.The SemanticTable.org server contains 644 tables and 7.5 million rows of SemanticTable data as of December 2010.

Conversion from table data files to semantic-web data files and vice versa are supported by our web site (http://semantictable.org) and the World Wide Web Consortium (W3C) web site RDFa Distiller and Parser page at (http://www.w3.org/2007/08/pyRdfa/).

• We propose SemanticTable, based on the standard RDFa specification, as the most understandable way for biologists not familiar with semantic-web specifications to utilize semantic-web data.

59

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

Introduction>> Biological data formats used in life science research .

• Biologists like to use the table format for their research results

• Since most biological research results are essentially structured kinds of data, however, it is appropriate to represent these research results as Semantic Web graphs consisting of subject-property-object triples.

>> Characteristics of table data and semantic data

Introduction

>> Characteristics of table data and semantic data .

60

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 32 -

Page 31: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

31

What is the Semantic Table• SemanticTable is a data container with advantages of both tables and Semantic Web graphs:

1. It provides a table view for biological data convertible to TSV format (tab-separated-value) text table data popular among biologists for computer processing..

2 It is convertible to explicit Semantic Web graph such as RDF that a computer can read and

What is the Semantic Table?

2. It is convertible to explicit Semantic Web graph such as RDF that a computer can read and process intelligently.

61

Explicit Correspondence Semantic Table and Semantic Web Graph

This table presents implicitly the following semantic meanings, which may be understood by people, but not by computers

Yellow row - Table Header

Correspondence between Semantic Table and Semantic Web Graph

Yellow row Table Header(Class or property name of data)

Blue Row - Table Data

Green Column - Subjects

Red column - Objects

- The first row means:“The Person named Michael is 23 years of age and his gender is Male”

- The second row means: “ The Person named Jennifer is 27 years of age and her gender is Female”

For Intelligent processing by a computer, the table data needs to be presented with explicit semantic data as follows in computer- readable Semantic Web graph format.

62Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 33 -

Page 32: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

32

Semantic Table MergingSemanticTable SemanticTable

Merging Semantic Tables

Using Semantic data could allow automatic drag and drop field

rearrangement by a program or browser running on a PC.

SemanticTable Merge Program Features:

RDF RDF

・Merging Multiple Class Semantic Tables・Subject Column Swapping・Dumping to Text

SemanticTable

join

63

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

Join the SemanticTable Consortium

C ll f d l t j i th ti t t d th h• Call for developers to join the consortium to create and author such programs; the fundamental semantic groundwork is ready to use.

• Open collaboration for the standard will jumpstart the new platform.

• Open source programs for SemanticTable run on cloud-computing technology, as do the programs for spreadsheet tables.

• We hope the SemanticTable format will be distributed broadly and freely for general use, as well as for scientific use.

Bioinformatics And Systems Engineering (BASE) division, RIKEN Yokohama Institute of Chemical and Physical Science

www.base.riken.jp/english

[email protected] 64

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 34 -

Page 33: Future biology is Semantic information sciences-web.sfc.keio.ac.jp/conference2011/0101-toyoda.pdf · DNA blocks are structured by only 4 kinds of blocks: A, T, G, and C. To map sequences

2011/3/4

33

End

Thank you very much!

Bioinformatics And Systems Engineering Division, RIKEN,

Japan6565

Copyright © Tetsuro Toyoda, March 4, 2011. Presentation at Semantic Web Conference 2011.

- 35 -