Rules of Probabilistic Reasoning:
Rules of Reasoning with probabilities. This lemma
in effect gives a new theory of, and a new approach to, probability.
Fortunately for those who are familiar with probability theory, the
standard theorems are preserved.
1. Introduction: One family of theories concerning theoretical
probabilities in the given sense is Bayesianism, which is based on
Bayes' Theorem or, equivalently, on the observation that conditional
probability may in principle explain how one can learn from experience.
In the following discussion some knowledge about probability theory,
logic and Bayesianism is presupposed. A
useful and readable introduction to the latter is Howson & Urbach:
'Scientific Reasoning - The Bayesian Approach'.
Bayesianism is attractive but comes with several problems, such as
the assumption of omniscience (it seems as if one should know all
consequences of one's theories, and all probabilities of all possible
consequences); the problem of priors (how does one arrive at the
probability of theories); the problem of predictions (how does
one arrive at the probability of the predictions of theories); the
problem of old evidence (what if one finds that a theory entails
something one knew already, but wasn't aware that the theory entailed);
and problems with the subjective interpretation of probability
(that many Bayesians presuppose).
2. Preliminaries: There are various answers to these problems by Bayesians, but the
present lemma sketches another approach, which is based on taking time
seriously, including the fact that actual theories are developed in time,
by actual people, and do not arrive ready-made in scientific journals
with all their logical consequences. It seeks to answer the above
problems by some postulates that serve as a basis for Rules of Reasoning
with probabilities. Stating these postulates requires a few
preliminary assumptions about time, propositions and notation.
First, it is assumed that all propositions to be considered have a
temporal index, which satisfies the assumptions of a temporal logic.
Here only the skeleton is used, namely the temporal index,
written as a suffix that is supposed to range over moments or
stretches of time, with (P).t+x later than (P).t if 'x' is
positive and '(P)' a proposition, and (P).t-x earlier
than (P).t. Here '(P).t' is read as 'P at t',
and it is also tacitly assumed that the temporal suffix distributes
properly over logical connectives, as e.g. in (P&Q).t
IFF ((P).t&(Q).t), for this should be implied by
the presupposed temporal logic.
Second, it is assumed that propositions come in two kinds: theoretical
propositions, indicated by the predicate 'THE', that may contain all
manner of generalizations, abstract entities, hypothetical entities etc.,
and empirical propositions, indicated by the predicate 'EMP',
that must state a claim that is decidable by some experience, and thus
cannot be equivalent to a generalization that goes beyond experience
etc.
The predictions of an empirical theory are supposed to include
empirical propositions. Much may be said here, e.g. of a
methodological nature, about how to ascertain that something is an
empirical proposition, but these problems will not be further
discussed, as they are not very relevant to what this lemma is about.
Third, we assume standard predicate logic and standard probability
theory, and some basic knowledge about intervals and functions.
Intervals will be rendered as '[X,Y]' with 'X' for the lower boundary
and 'Y' for the upper boundary, while 'f(Z)' is used for the phrase 'is
a function of Z' i.e. depends on Z. Thus 'x e [10-1/2.f(10),
10+1/2.f(10)]' says that x belongs to the interval that depends on f(10)
and lies symmetrically around 10, and if f(10)=10 this means x is
between 5 and 15 inclusive. Both intervals and functions could be left
out, but are included for clarity and generality respectively.
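The interval notation can be sketched in Python as follows. This is only a minimal illustration: the names symmetric_interval and contains are hypothetical and not part of the lemma, and 'f' is simply passed in as an ordinary function.

```python
from typing import Callable, Tuple

def symmetric_interval(center: float, f: Callable[[float], float]) -> Tuple[float, float]:
    """Return the bounds of [center - 1/2.f(center), center + 1/2.f(center)]."""
    half_width = f(center) / 2
    return (center - half_width, center + half_width)

def contains(interval: Tuple[float, float], x: float) -> bool:
    """'x e [X,Y]' with both boundaries included."""
    low, high = interval
    return low <= x <= high

# The example from the text: if f(10) = 10 the interval runs from 5 to 15.
bounds = symmetric_interval(10, lambda z: z)
print(bounds)  # (5.0, 15.0)
```

Here contains(bounds, 5) and contains(bounds, 15) both hold, since the boundaries are inclusive.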
Fourth, it is both convenient and adequate to explicitly include
reference to the background knowledge that is assumed. This is written
as 'K', and it is assumed that p(K).t = 1 if t=now: What one
counts presently as background knowledge is presently certain (but may
be found false tomorrow in some respect). Note this implies for any Q at
t=now that p(Q|K).t = p(Q).t.
3. Postulates for real probability: Now the postulates are as follows, and will be briefly explained
and discussed after their statement:
A. Real probabilities are proportions of cardinal numbers of sets in some
domain D.
(Di)(D)( Di inc D -->
   p(Di) = #(Di) : #(D) &
   p(Di|Dj) = #(Di∩Dj) : #(Dj) )
This is a new interpretation of what probabilities are: proportions
of cardinal numbers. This has the great advantage that, thus defined,
probabilities exist objectively if the sets they are derived from exist
objectively, since such sets have cardinal numbers. This is also why
they are called real probabilities.
The justification is that proportions as defined go a long way
towards the standard axioms for probability theory but need some
supplementation to reason with probabilities. (See: Measurement of reality by
truth and probability)
Also the definition has the disadvantage that often the cardinalities
of sets are not known or only imperfectly known, and anyway different
people may have different ideas about them.
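For finite sets whose cardinal numbers are known, postulate A can be sketched directly in Python. The domain and the subsets below are hypothetical examples, and the function names p and p_given are chosen only to mirror the notation of the postulate.

```python
from fractions import Fraction

def p(subset: set, domain: set) -> Fraction:
    """p(Di) = #(Di) : #(D), for Di included in a finite, non-empty D."""
    if not subset <= domain:
        raise ValueError("subset must be included in the domain")
    return Fraction(len(subset), len(domain))

def p_given(di: set, dj: set) -> Fraction:
    """p(Di|Dj) = #(Di∩Dj) : #(Dj), for a non-empty Dj."""
    return Fraction(len(di & dj), len(dj))

D = set(range(1, 13))                  # hypothetical domain of 12 things
even = {d for d in D if d % 2 == 0}    # 6 elements
small = {d for d in D if d <= 4}       # 4 elements

print(p(even, D))                      # 1/2
print(p_given(even, small))            # 1/2
print(p(even | small, D))              # 2/3
```

Proportions defined in this way behave like standard probabilities on such an example: p(even | small, D) equals p(even, D) + p(small, D) - p(even & small, D). Also, if the whole domain is taken as what is certain, p_given(even, D) equals p(even, D), which parallels the earlier note that p(Q|K).t = p(Q).t when p(K).t = 1.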
A convenient concept that can be defined here is that of a
random set:
B. A set D is a random set iff for every element of the set the
probability that it belongs to any subset of the set equals the
probability of the subset in the set:
RandomSet(D) IFF (Di inc D)(d e D)( p(d e Di) = p(Di) )
Normally it is easy to make a random set D' for a given set D: One useful way is to
put each name of each of the elements in D on
a slip of paper (of the same size and kind for all names); put these slips in
a vase, bowl or urn; thoroughly shake it; and blindly select from it.
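The procedure with slips of paper can be imitated by a small simulation. This is a sketch under stated assumptions: Python's pseudo-random generator stands in for the shaking and blind selection, each slip is put back after it is drawn, and the function name draw_frequency is hypothetical.

```python
import random

def draw_frequency(domain: set, subset: set, draws: int, seed: int = 0) -> float:
    """Fraction of blind draws from the domain that land in the subset."""
    rng = random.Random(seed)       # fixed seed, so the run is repeatable
    slips = sorted(domain)          # one slip per element of the domain
    hits = sum(rng.choice(slips) in subset for _ in range(draws))
    return hits / draws

D = set(range(100))                 # hypothetical domain
Di = set(range(25))                 # subset with p(Di) = 25 : 100
freq = draw_frequency(D, Di, draws=20_000)
print(freq)                         # expected to lie close to 0.25
```

With many draws the frequency with which a drawn element belongs to Di should approach p(Di), which is what the definition of a random set requires of every subset.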
C. Probabilities are objective, but there are personal probabilities:
What a person a believes an objective probability to be.
aB(p(Di)=d) IFF p(a,Di)=d
Thus, a personal probability is no more and no other than a personal belief about
what a real probability is or might be. The personal probability exists, if it
does, because a person has thought about what the real probability might be; the
real probability exists because real sets have cardinal numbers.
Note that the two are neatly kept apart by the notations: p(a,Di)=d for the
personal probability of a, and p(Di)=x for the real probability. This allows
us to say that a, with p(a,Di)=d, believes truly iff p(Di)=d, i.e. iff his
personal probability for Di is the same as the real one. And all that
'aB(p(Di)=d)' means is that 'a believes that the probability of Di
equals d'.
The justification of the postulate is that there is evidently a need for it.
This immediately introduces the temporal complication and relativization I spoke
of above:
Real probabilities may exist in time, but this they do in their own sense
that need not be discussed here, whereas personal probabilities do exist in time
and depend on one's evidence and knowledge.
4. Postulates for personal probability: How personal probabilities
exist in time and depend on one's evidence and
knowledge needs some assumptions to get straight, first for
deductions:
D. Deductions are independent of time once made:
(T)(P)(K)(t)(x>0) (K & T |= P).t |= (K & T |= P).t+x
That is: If P is a prediction that has been deduced from background
knowledge K and theory T at t, then this remains so at any later time.
This seems obviously true if, as is assumed, predictions always are
deductions from K&T, or in other words, if one
insists that the relation between an explanation and what it explains is
deductive.
Next, there is an assumption that explains how to arrive at
conditional probabilities:
E. Probabilities of predictions P from theories T&K with background
knowledge K are deductions from T&K:
(T)(P)(K)(q)(t) ( p(P | T & K).t = q IFF ( T & K |= p(P)=q ).t )
That is: The statement of a conditional probability of P given
T&K at any time amounts to a deduction of the probability of
P from T&K. The fact that conditional probability is explained in terms
of a deductive theory means that the probability of the prediction must depend
on assumptions made in T or K.
The justification of this postulate is that it gives a neat and
intuitive explanation for conditional probabilities of the stated kind, which
also says where the probabilities of empirical propositions come from: from
assumptions in one's theories or background knowledge about the unconditional
probabilities of events. And in the end, given
the earlier assumptions above, these depend on what one knows or assumes about
the cardinal numbers of the sets of things one theorizes about.
Also, the last two assumptions have the great benefit of
explaining why Bayes' Theorem would work and how it is to be used in time and in
general, which is not clear from standard probability theory. For we have the
following theorem:
T1: p(P).t+x=1 --> p(T|K).t+x = p(P|T&K).t * p(T|K).t : p(P|K).t
Proof:
(1) Suppose p(P).t+x=1
(2) p(T|P&K).t+x = p(T|K).t+x                            by (1, PT)
(3) p(T|P&K).t = p(T|K).t+x                              by (2, C, D)
(4) p(T|P&K).t = p(P|T&K).t * p(T&K).t : p(P&K).t        by PT
(5) p(T|K).t+x = p(P|T&K).t * p(T|K).t : p(P|K).t        by (2-4), since p(K).t=1
This explains the working of
Bayes' Theorem in time: If at a later time we verify a prediction of a
theory we can recalculate the probability of the theory using probabilities from
an earlier time (presumably the last time for which we have the required
evidence). Note that the new probability of the theory is the same as the
probability of the theory was at the earlier time if and only if in fact the
theory was irrelevant to the prediction at that earlier time.
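The recalculation described by T1 can be sketched as a small function. The numbers are hypothetical, and the function name updated_theory_probability is chosen only for this illustration; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def updated_theory_probability(prior: Fraction, likelihood: Fraction,
                               p_prediction: Fraction) -> Fraction:
    """T1: p(T|K).t+x = p(P|T&K).t * p(T|K).t : p(P|K).t, once p(P).t+x = 1."""
    if p_prediction == 0:
        raise ValueError("p(P|K).t must be positive")
    return likelihood * prior / p_prediction

prior = Fraction(1, 5)          # p(T|K).t, hypothetical
likelihood = Fraction(9, 10)    # p(P|T&K).t, hypothetical
p_pred = Fraction(3, 10)        # p(P|K).t, hypothetical

print(updated_theory_probability(prior, likelihood, p_pred))  # 3/5

# If the theory was irrelevant to the prediction, p(P|T&K).t = p(P|K).t,
# and the probability of the theory does not change.
print(updated_theory_probability(prior, p_pred, p_pred))      # 1/5
```

In the first case the verified prediction raises the probability of the theory from 1/5 to 3/5; in the second the probability stays at 1/5, as the note on irrelevance says.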
F. Empirical probabilities depend on empirical samples.
(P)(K)(t) (P e EMP).t |=
p(P|K).t e [freq[P|K]-1/2.f(freq[P|K]), freq[P|K]+1/2.f(freq[P|K])].t
That is: The probability of an empirical proposition at t given
background knowledge K falls in an interval that depends on the empirical
frequency of P on K at t. It is here, with '1/2.f(freq[P|K])',
that the interval notation and the functional notation mentioned above are used.
The simplest case is that the dependency is identity, and then p(P|K) is supposed to be somewhere in a
symmetrical interval around freq[P|K].
The functional relation that determines the size of the interval
may depend on the size of the sample, for example. Also, it is useful to have an
interval-estimate rather than a point-estimate, if only to account for
uncertainties and for statistical estimates.
The justification here is that one needs, for empirical
theories, real and intersubjectively valid empirical evidence, and the only good
kind that one has here is such as is based on empirical samples (that have been
collected in proper methodological ways).
Note also that frequencies are not probabilities: They are
summaries of evidence for probabilities, that in the end are no more than a list
of particular data of what has been found in experience or experiment.
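An interval-estimate of the kind postulate F proposes can be sketched as follows. Several things here are assumptions of the sketch and not claims of the lemma: the width function 2/sqrt(n) is merely one hypothetical way to let the interval shrink with the size of the sample, 'f' is given the sample size as a second argument, and the bounds are clipped to [0, 1].

```python
import math
from typing import Callable, Tuple

def empirical_interval(successes: int, trials: int,
                       f: Callable[[float, int], float]) -> Tuple[float, float]:
    """[freq - 1/2.f, freq + 1/2.f] around the sample frequency (trials > 0)."""
    freq = successes / trials
    half_width = f(freq, trials) / 2
    # Clipping to [0, 1] is an assumption of this sketch.
    return (max(0.0, freq - half_width), min(1.0, freq + half_width))

def shrinking_width(freq: float, n: int) -> float:
    """Hypothetical width that narrows as the sample grows."""
    return 2 / math.sqrt(n)

small_sample = empirical_interval(30, 100, shrinking_width)   # roughly 0.2 to 0.4
large_sample = empirical_interval(120, 400, shrinking_width)  # roughly 0.25 to 0.35
print(small_sample, large_sample)
```

Both samples have the same frequency of 0.3, but the larger sample yields the narrower interval. Passing lambda freq, n: freq instead gives the simplest case mentioned above, where the dependency is identity.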
G. Theoretical probabilities depend on their least probable proper
consequence.
(T)(K)(t) (T e THE).t |=
p(T|K).t e [Min(T|K)-1/2.f(Min(T|K)), Min(T|K)+1/2.f(Min(T|K))].t
This postulate for theoretical propositions is similar to the former
postulate for empirical propositions, in that it proposes an interval
within which the probability falls that depends functionally on a
quantity. But the quantity is not a frequency, for this one cannot have
for theories. Instead, the quantity proposed, written as 'Min(T|K).t', is
the least probable of the known proper consequences of T&K at t, where
the proper consequences of T&K are the Q that are entailed by (T&K) at t
and fail to be entailed by (~T&K) at t.
Formally, using '=d' for 'is by definition' and noting that p(K).t=1,
which makes Min(T&K).t = Min(T|K).t, we define:
pc(K&T).t  =d {Q: (K&T |= Q).t & ~((K&~T) |= Q).t}
Min(T|K).t =d {q: p(Q).t=q & Q e pc(K&T).t &
                  (S)(S e pc(K&T).t --> p(Q).t<=p(S).t)}
The justification here is that the least probable of the known proper
consequences of T&K at t is something one can intersubjectively agree on
at t: It depends on the known evidence and on what can be deduced from
T&K. And it is consistent with probability theory in that it is the
maximum T is capable of, given K, for the probability of a theory cannot be
higher than that of its least probable proper consequence (since if T |= P
then p(T) <= p(P)).
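The definitions of pc(K&T) and Min(T|K) can be tried out on a toy model. The model is an assumption of this sketch, not of the lemma: propositions are represented as sets of equally weighted possible cases, so that A |= Q becomes set inclusion, and the "known" consequences are simply a hypothetical list of candidates.

```python
from fractions import Fraction

def prob(event: set, weights: dict) -> Fraction:
    """Probability of an event as the sum of the weights of its cases."""
    return sum((weights[w] for w in event), Fraction(0))

def proper_consequences(K: set, T: set, known: dict) -> dict:
    """The known Q entailed by K&T but not by K&~T."""
    kt, k_not_t = K & T, K - T
    return {name: Q for name, Q in known.items()
            if kt <= Q and not k_not_t <= Q}

def min_consequence(K: set, T: set, known: dict, weights: dict) -> Fraction:
    """Min(T|K): the probability of the least probable known proper consequence."""
    pcs = proper_consequences(K, T, known)
    return min(prob(Q, weights) for Q in pcs.values())

cases = set(range(1, 7))                      # six hypothetical cases
weights = {w: Fraction(1, 6) for w in cases}
K = cases                                     # p(K) = 1
T = {1, 2}
known = {"Q1": {1, 2, 3}, "Q2": {1, 2, 3, 4, 5},
         "Q3": cases, "Q4": {2, 3}}

print(sorted(proper_consequences(K, T, known)))   # ['Q1', 'Q2']
print(min_consequence(K, T, known, weights))      # 1/2
print(prob(T & K, weights))                       # 1/3
```

Q3 is not a proper consequence because K&~T entails it as well, and Q4 is not entailed by K&T at all. The probability of T, here 1/3, does not exceed Min(T|K), here 1/2, in line with the remark that a theory cannot be more probable than its least probable proper consequence.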
H. What is irrelevant to a theory and a prediction of it is also irrelevant
if the theory and the prediction are true:
(T)(P)(X)(t)( (T irr X).t & (P irr X).t |= (P irr X | T).t )
This postulate H enables one to test theories given predictions by
enabling one to abstract from irrelevant circumstances, which always
exist.
It involves a definition of irrelevance that extends the standard
probabilistic definition of independence while relying on some of the
above assumptions:
(A irr B).t =d (x)( (A |= p(B)=x).t IFF (~A |= p(B)=x).t ) &
               (y)( (B |= p(A)=y).t IFF (~B |= p(A)=y).t )
This implies the standard properties of independence using postulates
A and B above. Also used in the above postulate H is conditional
irrelevance, which is defined similarly:
(A irr B | C).t =d (x)( (C&A |= p(B)=x).t IFF (C&~A |= p(B)=x).t ) &
                   (y)( (C&B |= p(A)=y).t IFF (C&~B |= p(A)=y).t )
The justification of postulate H is that it enables
induction, that is, in effect, that it enables the testing of
theories by their predictions. This cannot be done properly if the
result of an experiment may depend on any fact that also happens to be
true. The postulate involves no more than the assumption that a theory should also
entail whatever is relevant to its predictions.
The reasoning that postulate H enables is this, given that a
prediction P from a theory T gets verified in a context where there
also happens some event X such that (T irr X).t and (P irr X).t:
p(T|P&X).t = p(P|T&X).t*p(T&X).t : p(P&X).t               by PT
           = p(P|T&X).t*p(T).t*p(X).t : (p(P).t*p(X).t)   by T irr X and P irr X
           = p(P|T&X).t*p(T).t : p(P).t                   by algebra
           = p(P|T).t*p(T).t : p(P).t                     by H
           = p(P&T).t : p(P).t                            by PT
           = p(T|P).t                                     by PT
And thus one can inductively confirm a theory and abstract from all
manner of irrelevant circumstances. See:
Problem of Induction.
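The outcome of the derivation for postulate H can be checked numerically on a small example. Note the assumptions: the joint probabilities are hypothetical, and ordinary probabilistic independence of X from T and P is used as a stand-in for the irrelevance relation 'irr', which in the lemma is defined in terms of entailment.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint probabilities of (T, P): the theory makes P likely.
p_tp = {(True, True): Fraction(9, 50), (True, False): Fraction(1, 50),
        (False, True): Fraction(6, 50), (False, False): Fraction(34, 50)}
# X is an irrelevant circumstance, here independent of both T and P.
p_x = {True: Fraction(1, 4), False: Fraction(3, 4)}

joint = {(t, p, x): p_tp[(t, p)] * p_x[x]
         for t, p, x in product([True, False], repeat=3)}

def prob(pred) -> Fraction:
    """Probability of the cases (t, p, x) for which pred holds."""
    return sum((pr for case, pr in joint.items() if pred(*case)), Fraction(0))

def cond(pred, given) -> Fraction:
    """Conditional probability of pred given 'given'."""
    return prob(lambda t, p, x: pred(t, p, x) and given(t, p, x)) / prob(given)

lhs = cond(lambda t, p, x: t, lambda t, p, x: p and x)   # p(T|P&X)
rhs = cond(lambda t, p, x: t, lambda t, p, x: p)         # p(T|P)
print(lhs, rhs)  # 3/5 3/5
```

In this example p(P|T&X) also equals p(P|T), which is the conditional irrelevance that postulate H asserts, so the irrelevant circumstance X drops out of the confirmation of T by P.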