Luigi Abruzzese, Luciano Bonvissuto, Giuseppe Carluccio, Mario Ceresa, Michele Garbugli, Davide Lo Pinto, Luca Di Rienzo
Residenza Universitaria Torrescalla
Milano -Italy
Scientific discovery is one of the most characterizing activity of the human mind
Can computers emulate human mind in scientific discovery?
Or are computers only a strong help in this activity?
Artificial Intelligence
Theoreticalfoundations
Methodologies Techniques
Artificial Intelligence
Hardware Software
Performancesof human mind
PHENOMENON
MODEL
LAW
ABDUCTION
INDUCTION
DEDUCTION
ADDUCTION
Philosophical background
Historical analysis
The early days of artificial intelligence saw various attempts to automate creative tasks of scientific and mathematical inference. Perhaps the earliest examples (on electronic computers) of symbolic mathematical or scientific inference were master’s theses at MIT (J.F. Nolan) and at Temple (H. G. Kahrimanian) in 1953 on analytical differentiation in the calculus. Starting in the 1960’s, Lederberg invented an algorythm for generating molecular structures efficiently, which led to the Stanford Dendral project whose goal was to elucidate molecularstructure on the basis of mass spectograms and other experimental evidence.
Space-state search
BACON DISCOVERS KEPLER’S THIRD LAW:
The squares of the periods of planets are proportional to the cubes of the mean radii of their orbits: P^2 / D^3.
BACON
• AI program developed in 1970s.
• BACON is provided with knowledge of certain mathematical relationships.
• It carries out a search through the space of possible compositions of those relationships
Heuristics of BACON
DmPn =constant
In conclusion:
• BACON does not know what it has discovered. It is BACON creators who comprehend the significance of the discovery.
So who is the real discoverer, human or machine?
Computer(Artificial
Intelligence)
Mathematicaldefinitions
anddemonstrations
The Four Color Theorem
First demonstration
1922 Franklin max 25 regions
Heesch
Reducibilityand
discharging
Appel e
Haken
1476particular cases
1200 hours processing
a not rigorous demonstration
The demonstration couldn’t be verified by a human brain
Proving something to the people would mean persuading
a sufficient number of qualified people. If we accept this kind of definition, in the future it will be possible that calculators will help men
in the discovery of the new laws of math
FROM AUTOMATED DISCOVERY PROGRAMS TO COMPUTATIONAL
SCIENTIFIC DISCOVERY SYSTEMS
• Up to the 80’s automated discovery programs discovered laws already known
• The state space approach leads to the explosion of possibilities
• Only the scientists can introduce heuristics
that can limit the number of possible states• Who really makes the discovery : man or
machines?
RECENT RESEARCH
• The idea of totally automated discoveries is abandoned
• The new trend is towards computer supported scientific discovery
• The new goal is to obtain really new discoveries, that can be published on specialized literature
MACHINE LEARNING
Different kinds of machine learning :
• Supervised learning
• Unsupervised learning
• Reinforcement learning
SUPERVISED LEARNING
• Ensemble of elemental units combined in a reticular structure
• Net elements, called neurons, are organized in layers and are tightly interconnected
• To each link is associated a weight that represents a kind of inner knowledge
MAIN FEATURES• Learning• Prediction
TRAINING• Training-set of examples (as input/output)• Learning Algorithm• Weights “calibration”
GOAL• Generalization of training results based on
test-set ( prediction )
TRAINING PHASE
Formal neuron structure
NEURAL NETS
NEURAL NETSThe endeavour of emulating the real neural nets leads to conceive various
kinds of nets that can be classified according to some parameters :
• Use
• Learning algorithms
• Links structure
The most known models are :
• Feedforward nets (with back-propagation algorithm,most used)
• Associative nets
• Stochastic nets
• Self-organizing nets (Kohonen, also unsupervised)
• Genetic nets ( models from Darwin evolution theory)
UNSUPERVISED LEARNING
DATA MINING Process of exploration and analysis ,with automatic and
semi-automatic tools, of vast amount of data, oriented to discover significant structures and rules and to develop predictive or explicative models of a specific phenomenon.
We have many techniques of data mining :
Decisional trees , data warehouse , clustering , associative rules and temporal sequences......
CLUSTERING
Cluster: Objects/data collection – Similar compared with each object in the same cluster – Different from other clusters objects
CLUSTERING ANALYSIS :To group objects together in cluster
• Clustering is defined as unsupervised classification: It doesn’t use any background knowledge on studied data set
TIPICAL APPLICATIONS• As stand-alone tool to try to understand how data are distributed (for ex. in genic expression data analysis,astronomic data
elaboration....)• As preprocessing pass for other algorithms
REINFORCEMENT LEARNING
• System acts directly on problem making attempts
• A teacher “rewards” or “punishes”
the system through a numerical
signal of reinforce , depending on
system instant behaviour
Rewards and punishments
An example : GOLEM ( built by Muggleton and Feng, 1992 )
PROBLEM : Prediction of secondary structure of protein from the _ ________ sequence of amino acids.
• A traditional method used for discovering the secondary structure is X-ray crystallography , but a crystal structure determination may require one or more man-year.
• In general, other techniques also used for this problem are costly,time-consuming and often limitated by some proteins parameters (like size ...).
• From this the need of computational systems support.
GOLEM : problem description
The two main substructures of proteins are : • α – helix structure• β – filamants structure
GOLEM :
• Restricts the field of his analysis to α– helix proteins• Attempts to predict, from primary _structure, if a particular residue (amino acid) belongs or not to the α– helix _type.
β – filaments structure _
α – helix structure
GOLEM FUNCTIONING (1)
TRAINING SET : 12 proteins ,non homologous, with well known structures
(LEARNING) of α– helix type , comprising 1612 residues.
+ BACKGROUND KNOWLEDGE
=SMALL SET OF RULES used for predicting which residues belong to α–helix
_ proteins.
TEST SET : 4 proteins (structure known) , α–helix type, comprising 416 _ __ residues
ACCURACY(on test set): 81% ( ±2 )
GOLEM FUNCTIONING (2)
• Information coded in 1 or 2 parts predicates
Ex: α(155C,105) means that a particular protein (155c) residue (in 105
___ position) is a α–helix type .
• Preferential research toward residues that show particular links characters with the others (data mining)
• Research of rules carried out with an iterative procedure that involves
a bootstrapping learning process.
• Then the rules generated by GOLEM can be considered hypothesis about
the ways through which α–helix form in nature.They define the pattern of relations that ,if present in the sequence of residues, indicates that a specific residue could be part of a α–helix .
GOLEM RESULTS• One of the rules produced by GOLEM ,concerning protein structure is for example the RULE 12 :
There is a α–helix residue in the protein A in position B if : 1 – The residue in B -2 is not proline 2 – The residue in B -1 is neither aromatic nor proline 3 – The residue in B is big, neither aromatic and nor lysine 4 – The residue in B +1 is hydrophobic and not lysine 5 – The residue in B +2 is neither aromatic nor proline 6 – The residue in B +3 is neither aromatic nor proline, and or small or polar and, 7 – The residue in B +4 is hydrophobic and not lysine
This rule has an ACCURACY of 95% in training and of 81% on test set.This rule was not known before GOLEM discovered it and it has contributed to one of the most important actual problem of natural sciences.That’s why we can credit to GOLEM the discover of a natural law.
CONCLUSION
We have presented some attempts of creating artificial intelligence programs that make scientific discoveries
From the historical analysis we have shown that the first idea of A.I programs that autonomously discover scientific laws has been abandoned
The new trend is that of computer supported scientific discovery in which Artificial Intelligence is a useful and sometimes necessary tool for scientific research