Predictive Science
TRANSCRIPT
Predictive Science: a Tautology
Peter Nordin
The Answer is:
The asymmetry of similarity
- What thing is this like?
- And what is this like?
A heuristic measure of amount of information: Shannon's guessing game
1. Pony?
2. Cow?
3. Dog?
...
345. Pegasus!
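A minimal sketch of the guessing game as an information measure, in Python (illustrative; the hypothesis ordering is an assumption). The later the correct answer turns up, the more bits it carries:

import math

# Toy guessing game: hypotheses are tried from most to least plausible.
hypotheses = ["pony", "cow", "dog", "horse", "unicorn", "pegasus"]

def guess_rank(target):
    """Number of guesses needed until the target is named."""
    return hypotheses.index(target) + 1

rank = guess_rank("pegasus")
print(f"found on guess #{rank}: about {math.log2(rank):.2f} bits")
# An answer found only on guess #345 carries about log2(345) ~ 8.4 bits:
print(f"guess #345: about {math.log2(345):.1f} bits")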
Science is Prediction
- When does the next solar eclipse in Europe occur?
- The next solar eclipse in Europe will happen on August 12, 2026.
Science is Compression
The Model, Science, and Prediction
The Turkey and the issue with inductive predictions (1)
The Turkey and the issue with inductive predictions (2)
Mandatory Reading
All Real Science is
Predictive Science
- Predict when the sun will set tomorrow
- Predict if you will be sick or well by taking this medicine
- Predict what will happen in this project if this methodology is used
How to predict anything:
1. Collect facts
2. Find a short model fitting all the facts
3. Extrapolate that model into the future; the model's probability is given by its length
4. Meta loop: collect and include facts about your model-finding adventures, go to step 2, and use the result for planning
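A toy sketch of steps 1-3 in Python (an illustration, not the author's procedure): the facts are noisy samples, the candidate models are polynomials, and "length of model" is approximated by a crude two-part MDL score; the data and scoring constants are assumptions.

import numpy as np

# Step 1: collect facts (here, noisy samples of a hidden linear law).
rng = np.random.default_rng(0)
x = np.arange(20.0)
y = 3.0 * x + 1.0 + rng.normal(0, 0.5, x.size)

# Step 2: score candidate models by fit cost plus parameter cost (bits).
def mdl_score(degree):
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    fit_cost = 0.5 * x.size * np.log2(np.mean(resid ** 2) + 1e-12)
    param_cost = 0.5 * (degree + 1) * np.log2(x.size)
    return fit_cost + param_cost, coeffs

best_score, best_coeffs = min((mdl_score(d) for d in range(6)),
                              key=lambda t: t[0])

# Step 3: extrapolate the shortest adequate model into the future.
print("predicted y(25) =", np.polyval(best_coeffs, 25.0))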
Companies and
Prediction
- A company is a collection of people predicting risk from actions
- No risk, no gain
Recent progress
Recent advances: Universal Learning Algorithms
There is a theoretically optimal way of predicting the future, given the past. It can be used to define an optimal (though noncomputable) rational agent that maximizes its expected reward in almost arbitrary environments sampled from computable probability distributions.
Recent advances: All Scientists
Physicists, economists, and other scientists make predictions based on observations. So does everybody in daily life. Did you know that there is a theoretically optimal way of predicting? Every scientist should know about it.
Normally we do not know the true conditional probability distribution p(next event | past). But assume we do know that p is in some set P of distributions. Choose a fixed weight w_q for each q in P such that the w_q add up to 1 (for simplicity, let P be countable). Then construct the Bayes mix M(x) = Sum_q w_q q(x), and predict using M instead of the optimal but unknown p.
How wrong is it to do that? The recent exciting work of Marcus Hutter (funded through Juergen Schmidhuber's SNF research grant "Unification of Universal Induction and Sequential Decision Theory") provides general and sharp loss bounds: let L_M(n) and L_p(n) be the total expected losses of the M-predictor and the p-predictor, respectively, for the first n events. Then L_M(n) - L_p(n) is at most of the order of sqrt[L_p(n)]. That is, M is not much worse than p. And in general, no other predictor can do better than that! In particular, if p is deterministic, then the M-predictor soon won't make any errors any more.
If P contains ALL computable distributions, then M becomes the celebrated enumerable universal prior. That is, after decades of somewhat stagnating research, we now have sharp loss bounds for Ray Solomonoff's universal (but incomputable) induction scheme (1964, 1978).
Alternatively, reduce M to what you get if you just add up the weighted estimated future finance-data probabilities generated by 1000 commercial stock-market prediction software packages. If only one of them happens to work fine (but you do not know which), you should still get rich.
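A minimal sketch of the Bayes mix in Python (not from the slides; the finite class of Bernoulli coins, the uniform weights, and the variable names are illustrative assumptions). For the log-loss used here, M's regret against the true p is bounded by log2(1/w_p):

import numpy as np

# Toy Bayes mix M(x) = Sum_q w_q q(x): P is a finite grid of
# Bernoulli(theta) coins with uniform weights w_q.
rng = np.random.default_rng(1)
thetas = np.linspace(0.05, 0.95, 19)
w = np.full(len(thetas), 1.0 / len(thetas))       # weights w_q sum to 1

true_theta = 0.7                                  # the unknown p is in P
xs = (rng.random(2000) < true_theta).astype(int)

loss_M = loss_p = 0.0
posterior = w.copy()
for x in xs:
    lik = np.where(x == 1, thetas, 1.0 - thetas)  # each q's probability of x
    m = float(posterior @ lik)                    # M's predictive probability
    p = true_theta if x == 1 else 1.0 - true_theta
    loss_M -= np.log2(m)
    loss_p -= np.log2(p)
    posterior *= lik
    posterior /= posterior.sum()                  # Bayes update of the mix

print(f"log-loss of M minus log-loss of true p: {loss_M - loss_p:.2f} bits")
# Stays bounded (at most about log2(19) ~ 4.25 bits for uniform weights).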
Intelligence
- Is compression
- If used for prediction
Art?
Theory Pyramid
Undecidable stuff etc.
Optimal Cognition
Algorithmic Information Theory
Optimal prediction
Experimental planning
Turing-complete representations
Bayes etc.
Multivariate distribution stats
Single-variable distribution stats
Agent
Formal Agent Model
Gödel machine
Artificial Intelligence
- The information-theoretic, statistical, and philosophical foundations of Artificial Intelligence
Universal AI
Universal Artificial Intelligence = Decision Theory + Universal Induction
Decision Theory = Probability + Utility Theory
Universal Induction = Ockham + Bayes + Turing
Pieces of the puzzle
- Philosophical issues: the common principle behind their solution is Occam's simplicity principle. Based on Occam's and Epicurus' principles, Bayesian probability theory, and Turing's universal machine, Solomonoff developed a formal theory of induction.
- The sequential/online setup considered in this presentation, placed into the wider machine-learning context.
What is AI?
- Informal definition of (artificial) intelligence?
- Intelligence measures an agent's ability to achieve goals in a wide range of environments.
- Emergent: features such as the ability to learn and adapt, or to understand, are implicit in the above definition, as these capacities enable an agent to succeed in a wide range of environments.
- The science of Artificial Intelligence is concerned with the construction of intelligent systems/artifacts/agents and their analysis.
The Hierarchy
- Induction → Prediction → Decision → Action
- Having or acquiring or learning or inducing a model of the environment an agent interacts with allows the agent to make predictions and utilize them in its decision process of finding a good next action.
- Induction infers general models from specific observations/facts/data, usually exhibiting regularities or properties or relations in the latter.
- Example. Induction: find a model of the world economy.
- Prediction: use the model for predicting the future stock market.
- Decision: decide whether to invest assets in stocks or bonds.
- Action: trading large quantities of stocks influences the market.
Sequence
- Example 2: Digits of a Computable Number
- Extend 14159265358979323846264338327950288419716939937?
- Looks random?! Frequency estimate: n = length of sequence, k_i = number of occurrences of digit i. The probability of the next digit being i is estimated as k_i/n. Asymptotically, k_i/n → 1/10 (seems to be true).
- But we have the strong feeling that (i.e. with high probability) the next digit will be 5, because the previous digits were the expansion of pi.
- Conclusion: We prefer answer 5, since we see more structure in the sequence than just random digits.
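A quick check of the frequency estimate, using only the digit string from the slide:

from collections import Counter

digits = "14159265358979323846264338327950288419716939937"
counts = Counter(digits)
n = len(digits)
for d in sorted(counts):
    print(d, f"{counts[d] / n:.2f}")   # each k_i/n hovers near 1/10
# The frequencies look uniform, yet the string is maximally structured:
# a short program prints pi, so the sequence carries very little information.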
Sequence 2
- Example 3: Number Sequences
- Sequence: x1, x2, x3, x4, x5, ... = 1, 2, 3, 4, ?, ...
- x5 = 5, since x_i = i for i = 1..4.
- x5 = 29, since x_i = i^4 - 10i^3 + 35i^2 - 49i + 24.
- Conclusion: We prefer 5, since the linear relation involves fewer arbitrary parameters than the 4th-order polynomial.
- Sequence: 2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,?
- 61, since this is the next prime.
- 60, since this is the order of the next simple group.
- Conclusion: We prefer answer 61, since primes are a more familiar concept than simple groups. (See the On-Line Encyclopedia of Integer Sequences.)
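A two-line check (illustrative) that both hypotheses reproduce 1, 2, 3, 4 and then diverge at x5:

def poly(i):
    return i ** 4 - 10 * i ** 3 + 35 * i ** 2 - 49 * i + 24

print([i for i in range(1, 6)])          # linear rule:    [1, 2, 3, 4, 5]
print([poly(i) for i in range(1, 6)])    # 4th-order rule: [1, 2, 3, 4, 29]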
Occam?
- Occam's Razor to the Rescue
- Is there a unique principle which allows us to formally arrive at a prediction which coincides (always?) with our intuitive guess, or, even better, which is (in some sense) most likely the best or correct answer?
- Yes! Occam's razor: use the simplest explanation consistent with past data (and use it for prediction). Works! For the examples presented and for many more. Actually, Occam's razor can serve as a foundation of machine learning in general, and is even a fundamental principle (or maybe even the mere definition) of science.
- Problem: not a formal/mathematical objective principle. What is simple for one may be complicated for another.
Blue Emeralds?
- Grue Emerald Paradox
- Hypothesis 1: All emeralds are green.
- Hypothesis 2: All emeralds found until year 2010 are green; thereafter all emeralds are blue.
- Which hypothesis is more plausible? H1! Justification?
- Occam's razor: take the simplest hypothesis consistent with the data. It is the most important principle in machine learning and science.
View on probabilities
- Uncertainty and Probability
- The aim of probability theory is to describe uncertainty. Sources/interpretations of uncertainty:
- Frequentist: probabilities are relative frequencies (e.g., the relative frequency of tossing heads).
- Objectivist: probabilities are real aspects of the world (e.g., the probability that some atom decays in the next hour).
- Subjectivist: probabilities describe an agent's degree of belief (e.g., it is (im)plausible that extraterrestrials exist).
What we need
- Kolmogorov complexity
- Universal Distribution
- Inductive Learning
Principle of Indifference (Epicurus)
- Keep all hypotheses that are consistent with the facts
Occam's Razor
- Among all hypotheses consistent with the facts, choose the simplest
- Newton's rule #1 for doing natural philosophy:
- "We are to admit no more causes of natural things than such as are both true and sufficient to explain their appearances"
Question
- What does "simplest" mean?
- How to define simplicity?
- Can a thing be simple under one definition and not under another?
Bayes' Rule
- P(H|D) = P(D|H) * P(H) / P(D)
- P(H) is often considered the initial degree of belief in H
- In essence, Bayes' rule is a mapping from the prior probability P(H) to the posterior probability P(H|D), determined by the data D
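A minimal sketch of that mapping for a binary hypothesis (the numbers are made up for illustration):

def posterior(prior_h, lik_d_given_h, lik_d_given_not_h):
    """Bayes' rule: P(H|D) = P(D|H) * P(H) / P(D)."""
    p_d = lik_d_given_h * prior_h + lik_d_given_not_h * (1 - prior_h)
    return lik_d_given_h * prior_h / p_d

# Prior belief 0.3; the data is twice as likely under H as under not-H.
print(posterior(0.3, 0.8, 0.4))   # posterior rises to ~0.46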
How to get P(H)
- By the law of large numbers, we can get P(H|D) if we use many examples
- But we want as much information as possible from only a limited number of data points
- P(H) may be unknown, uncomputable; it may not even exist
- Can we find a single probability distribution to use as the prior in every case, with approximately the same result as if we had used the real distribution?
Hume on Induction
- Induction is impossible, because we can only reach conclusions by using known data and methods.
- So the conclusion is logically already contained in the starting configuration.
Only one algorithm?
Solomonoff's Theory of Induction
- Maintain all hypotheses consistent with the data
- Incorporate Occam's Razor: assign the simplest hypotheses the highest probability
- Use Bayes' rule
Kolmogorov Complexity
- k(s) is the length of the shortest program which, on no input, prints out s
- k(s) <= n + c for any string s of length n
- k(s) is objective (programming-language independent) by the Invariance Theorem
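k(s) itself cannot be computed (as noted later), but any off-the-shelf compressor yields an upper bound, k(s) <= compressed length + c, which already separates structured from random-looking strings. A sketch using zlib (the example strings are assumptions):

import random
import zlib

structured = b"01" * 500                                  # very regular
random_ish = bytes(random.Random(0).randrange(256) for _ in range(1000))

for name, s in [("structured", structured), ("random-ish", random_ish)]:
    print(f"{name}: {len(s)} bytes -> {len(zlib.compress(s, 9))} compressed")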
Universal Distribution
- P(s) = 2^(-k(s))
- We use k(s) to describe the complexity of an object. By Occam's Razor, the simplest should have the highest probability.
Problem: Sum_s P(s) > 1
- For every n, there exists an n-bit string s with k(s) ~ log n, so P(s) = 2^(-log n) = 1/n
- Summing over n: 1/2 + 1/3 + ... > 1, so P is not a probability distribution
Levin's improvement
- Use prefix-free programs
- A set of programs, no one of which is a prefix of any other
- Kraft's inequality: let l_1, l_2, ... be a sequence of natural numbers. There is a prefix code with this sequence as the lengths of its binary code words iff Sum_n 2^(-l_n) <= 1
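A small sketch checking both directions of the statement on toy codes (the example codes are assumptions):

def kraft_sum(lengths):
    return sum(2.0 ** -l for l in lengths)

def is_prefix_free(codes):
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

codes = ["0", "10", "110", "111"]                   # a prefix-free code
print(is_prefix_free(codes), kraft_sum(len(c) for c in codes))  # True 1.0
print(kraft_sum([1, 1, 2]))  # 1.25 > 1: no prefix-free code has these lengths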
Multiplicative domination
- Levin proved that there exists a constant c such that c * P(s) >= p(s) for every computable distribution p, where c depends on p but not on s
- If the true prior distribution is computable, then using the single fixed universal distribution P is almost as good as using the true distribution itself
- Turing's thesis: the universal Turing machine can compute all intuitively computable functions
- Kolmogorov's thesis: Kolmogorov complexity gives the shortest description length among all description lengths that can be effectively approximated, according to intuition
- Levin's thesis: the universal distribution gives the largest probability among all distributions that can be effectively approximated, according to intuition
Universal Bet
- Street gambler Bob tosses a coin and offers:
- Next is heads (1): Bob gives Alice $2
- Next is tails (0): Alice pays Bob $1
- Is Bob honest?
- Side bet: flip the coin 1000 times, record the result as a string s
- Alice pays $1, Bob pays Alice $2^(1000-k(s))
- Good offer: Sum_{|s|=1000} 2^(-1000) * 2^(1000-k(s)) = Sum_{|s|=1000} 2^(-k(s)) <= 1 (by Kraft's inequality for the prefix-free k), so Bob's expected payout never exceeds Alice's $1 stake
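A toy simulation of the side bet (illustrative). k(s) is uncomputable, so the zlib-compressed length stands in as an upper bound on k(s); this underestimates the exponent, which only makes the demo conservative:

import random
import zlib

def payout_exponent(bits):
    s = "".join(map(str, bits)).encode()
    k_upper = 8 * len(zlib.compress(s, 9))   # crude upper bound on k(s), bits
    return len(bits) - k_upper               # log2 of Bob's payout to Alice

rng = random.Random(0)
fair = [rng.randrange(2) for _ in range(1000)]
rigged = [0] * 1000            # a cheated coin leaves a compressible record
print("fair coin:   payout about 2 **", payout_exponent(fair))    # negative
print("rigged coin: payout about 2 **", payout_exponent(rigged))  # huge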
Notice
- The Kolmogorov complexity of a string is non-computable
Conclusion
- Kolmogorov complexity: optimal effective description of objects
- Universal Distribution: optimal effective probability of objects
- Both are objective and absolute
The most neutral possible prior
- Suppose we want a prior so neutral that it never rules out a model
- Possible, if we limit ourselves to computable models
- Mix all (computable) priors p_i, with weights w_i that decline fairly fast: m(x) = Sum_i w_i p_i(x)
- Then this mixture multiplicatively dominates all priors
- Though neutral priors will mean slow learning
- Such m(x) are universal priors
The most neutral possible coding language
- Universal programming languages (Java, Matlab, UTMs, etc.)
- K(x) = length of the shortest program in Java, Matlab, or on a UTM that generates x (K is uncomputable)
- Invariance theorem: for any languages L1, L2 there exists a constant c such that for all x, |K_L1(x) - K_L2(x)| <= c
- Mathematically justifies talk of K(x), not K_Java(x) or K_Matlab(x)
So does this mean that choice of language doesn't matter?
- Not quite!
- c can be large
- And, for any L1 and c0, there exist an L2 and x such that |K_L1(x) - K_L2(x)| >= c0
- The problem of the one-instruction code for the entire data set
- But Kolmogorov complexity can be made concrete
Compact Universal Turing machines
- 210 bits (lambda calculus); 272 bits (combinators)
- Not much room to hide, here!
Neutral priors and Kolmogorov complexity
- A key result: K(x) = -log2 m(x) + O(1), where m is a universal prior
- Analogous to Shannon's source coding theorem
- And for any computable q: K(x) <= -log2 q(x) + O(1), for typical x drawn from q(x)
- Any data x that is likely under any sensible probability distribution has low K(x)
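A rough numerical illustration of the second bound (assumptions: q is a biased coin with p = 0.9, and zlib serves as a weak stand-in for K, so expect some slack above the ideal):

import math
import random
import zlib

rng = random.Random(2)
p, n = 0.9, 8000
x = "".join("1" if rng.random() < p else "0" for _ in range(n))

h = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))   # entropy per symbol
packed = int(x, 2).to_bytes((n + 7) // 8, "big")       # 1 bit per symbol
print(f"-log2 q(x) ~ n*H(p) = {n * h:.0f} bits")
print(f"zlib upper bound:     {8 * len(zlib.compress(packed, 9))} bits")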
Prediction by simplicity
- Find the shortest program/explanation for the current corpus (binary string)
- Predict using that program
- Strictly, use a weighted sum of explanations, weighted by brevity
Prediction is possible (Solomonoff, 1978)
Summed error has a finite bound:
- s_j is the summed squared error between prediction and true probability on item j
- So prediction converges (faster than 1/(n log n)) for corpus size n
- Computability assumptions only (no stationarity needed)
Summary so far
- Simplicity/Occam: close and deep connections with Bayes
- Defines a universal prior (i.e., based on simplicity)
- Can be made concrete
- General prediction results
- A convenient dual framework to Bayes, when codes are easier than probabilities
Methods
Infrastructure