Bayesian Conditionalization: The
application of the following theorem of elementary probability theory:
p(T|P)=p(P|T)*p(T):p(P) to revise p(T) to p(T|P) if one learns that P is
true.
There are quite a few conceptual problems in using this theorem in
this way, and
the present lemma articulates one approach to dissolving these problems. One
key idea is the distinction between degrees of belief and probabilities,
which are brought together on the common footing of
proportions. Other assumptions
that relate proportions, probabilities and degrees of belief are made in
Section 1.
Sections:
1. Probabilities, degrees of belief and proportions
2. Derivation of Bayesian Conditionalization for degrees of belief
3. A simple example
4. The reason for this lemma
5. Alternative and further axioms
6. Putting it all together
1. Probabilities, degrees of belief and proportions
I start with articulating a number of assumptions about
probabilities, degrees of belief and proportions. Some knowledge about
probabilities and
proportions is presupposed in what
follows, and can be gotten by way of the links. Likewise, I presuppose
some elementary knowledge of standard logic.
Ax.1: Probabilities are proportions.
Proportions are taken as ratios of cardinal numbers of subsets
in sets, and include conditional proportions as in probability theory.
Ax.2: Degrees of belief are proportions.
Therefore probabilities and degrees of belief share the formalities
and properties of proportions. Since degrees of belief may alter with
time they involve a reference to time, and Ax.2 accordingly may be
formalized thus:
(Ax.2) (t)(a)(X) ( ps(a,X).t e PROPORTION )
Here t is a temporal index, a names a person, and X a belief of the person,
and so ps(a,X).t is the degree of belief of a in X at time t.
Ax.3: Degrees of belief follow beliefs in probabilities.
In other words: A person's degree of belief in X - if the person is rational
- conforms to his belief about the probability of X. The reason for this
assumption is that one's beliefs about the probabilities are what one believes
and have a degree. Writing 'aB(pr(X)=y).t' for 'a believes at t that the
probability of X is y' Ax.3 may be formalized thus:
(Ax.3) (t)(a)(X)(y) ( ps(a,X).t = y IFF aB( pr(X)=y).t )
Next, we have an assumption similar to Ax.3 for conditional degrees of
belief, but without assuming these are derived from conditional
probabilities:
Ax.4: Conditional degrees of belief are beliefs in probabilities from
hypotheses.
This is an explanation of what conditional degrees of belief are that can be
formalized thus:
(Ax.4) (t)(a)(X)(Y)(z) ( ps(a,Y|X).t = z IFF aB(X |- pr(Y)=z).t )
One particular point about aB(X |- pr(Y)=z).t is that what a
believes and assumes about X allows a at t to deduce that pr(Y)=z if X is true.
This may involve rather a lot of assumptions in X, but that is as may be, and is
also the place where these assumptions should be.
Ax.5: Beliefs in probabilities from hypotheses once adopted remain adopted until revised.
This is a natural assumption about beliefs in probabilities from hypotheses:
Assumptions do not depend on time but on oneself and such hypotheses as one has,
and one retains them until one revises them, and revises them when revising the evidence. Ax.5 can be formalized thus:
(Ax.5) (t)(a)(X)(Y) ( aB(X |- Y).t-1 |- aB(X |- Y).t )
Note what I have not assumed: that degrees of belief are the same as
probabilities. What I have assumed, and what goes quite a long way in
that direction, is that one can infer degrees of belief given beliefs in
probabilities. This follows from Ax.3, which can be seen as a version of
David Lewis' so-called Principal Principle.
Apart from the notation, there are alternatives for some of the above
assumptions. I mention two cases.
Thus, one possible weakening of Ax.3 is to assume that degrees of belief
depend functionally on beliefs about probability, e.g. as a linear function,
that allows for qualifications relating to the quality of the evidence one has.
However, it seems generally most sensible to use Ax.3 as stated for one's
calculations, and only after one has made them and has revised one's degrees of
belief to qualify the result with reference to the quality of the evidence, if
this is necessary. A general assumption that may enter here is
Ax.3A: Degrees of belief follow beliefs in probabilities and never
exceed them.
The reason is that one's probabilities state such guesses and such
evidence as one has, and therefore are in the nature of the best one can
do for the moment, given what one believes.
A possible additional assumption in Ax.5 is that also ~(aB~(X |- Y)).t.
The revised axiom accordingly may be written as
(Ax.5A) (t)(a)(X)(Y) ( aB(X |- Y).t-1 & ~(aB~(X |- Y)).t |- aB(X |- Y).t )
This may be taken as saying that if a at t-1 believes that X entails Y and at
t a does not believe that Y does not follow from X, then at t a believes that X
entails Y. Thus the hypothesis at t affirms that one has not revised one's
estimate at t-1, and the conclusion is that one believes at t what one believed
at t-1.
2. Derivation of Bayesian Conditionalization for degrees of belief
Given these assumptions, the following theorem can be proved, where I avoid
universal quantifiers and use free variables instead (that allow the inference
of universal quantifiers):
T1. ps(a,E).t=1 & ps(a,E|C).t-1 = x & ps(a,E).t-1 = y & ps(a,C).t-1 = z
|- ps(a,C).t = ps(a,C|E).t-1 = x*z:y
This asserts that Bayesian Conditionalization for degrees of belief follows from the usual hypotheses involved in
Bayesian conditionals. Note there is an explicit reference to time, and the new
evidence at t is that ps(a,E).t=1. The conclusion is a new
degree of belief in C at t that differs from the old one at t-1 unless x=y.
One reason for this introduction of temporal indexes is that it makes a lot
of intuitive sense when speaking of degrees of belief. Another reason is given
in the next section.
Here is the proof of T1 with some comments:
(1) ps(a,E).t = 1   by AI
(2) ps(a,E|C).t-1 = x   by AI
(3) ps(a,E).t-1 = y   by AI
(4) ps(a,C).t-1 = z   by AI
This lists all assumptions of the theorem to be proved.
(5) ps(a,C|E).t-1 = ps(a,E|C).t-1 * ps(a,C).t-1 : ps(a,E).t-1   by PT, A1-4
(6) ps(a,C|E).t-1 = x*z : y   by 2,3,4,5
Note that (5) follows from A1-4: Its left-hand side equals its right-hand
side in probability theory and therefore both sides are the same as degrees of
belief, since ratios of the same quantities are the same, and degrees of belief
follow probabilities. And then (6) follows from the assumptions.
(7) ps(a,C|E).t-1 = ps(a,C|E).t by 2-4, A5
This follows from A5 since all assumptions to derive ps(a, C|E).t-1
have been made. Note that this line is a crucial step for the proof.
(8) ps(a,C|E).t = ps(a,C&E).t : ps(a,E).t   by A2-3
This follows since one's degrees of beliefs are proportions that follow one's
beliefs in probabilities: Both terms on the right hand side are equal to their
corresponding probabilities, and therefore their quotient equals the conditional
degree of belief on the left hand side.
(9) ps(a,C|E).t = ps(a,C).t   by 1,8
And this follows as ps(a,E).t=1, whence ps(a,C&E).t = ps(a,C).t,
and so we have the desired conclusion
(10) ps(a,C).t = x*z : y   by 6,7,9
QED. But I can say and prove more about conditionalizing:
(11) aB(pr(E)=1).t   by 1, A3
(12) aB( C |- pr(E)=x ).t-1   by 2, A4
Here degrees of beliefs and beliefs in probabilities are interchanged. Now,
taking a few things for granted here that anyway make intuitive sense concerning
inferences with propositions of the form 'aB(X).t' and 'aB(pr(X)=y).t'
(13) aB(E).t by 11
This follows from aB(pr(E)=1).t . Since there also is ps(a,C|E).t =
x*z : y we have
(14) aB(E |- pr(C)=x*z : y ).t   by 9,10, A4
and now one can infer a revision of one's degree of belief in the probability of C
by ordinary Modus Ponens, which derives aB(Y).t from aB(X).t & aB(X |- Y).t :
(15) aB(pr(C)=x*z : y ).t   by 13, 14
And thus Bayesian Conditionalization may be related to and explained in terms
of ordinary conditionalization, given the above assumptions. For clearly what we
can also prove is the counterpart of T1 in terms of beliefs:
T2. aB(pr(E)=1).t & aB(C |- pr(E)=x).t-1 & aB(pr(E)=y).t-1 & aB(pr(C)=z).t-1
|- aB(pr(C)=x*z:y).t
The proof of this can be gleaned from the foregoing proof, and is
simply a matter of using the assumptions that have been made that
relate degrees of beliefs and probabilities. It may be objected here
that the reasoning with propositional attitudes has not been clarified,
but all that is required here are assumptions to the following effect:
Ax.6 aB(pr(X)=1).t |- aB(X).t
Ax.7 aB(X |- Y).t |- (aB(X).t |- aB(Y).t)
These are sufficient for the above inferences
(13)-(15) and are intuitively obvious and unobjectionable.
3. A simple example
It may be good to give
a schematic example of the sort of reasoning outlined above. Suppose
there is a disease C which is not very common, which usually but not
always comes with symptoms E, that without the disease are rare. Suppose
then that the probabilities are as follows - where I have written
everything to make sense also as percentages, since that is intuitively
helpful:
Table 1
         C     ~C
E        9      1     10
~E       1     89     90
        10     90    100
We need not concern ourselves here with how the probabilities were
precisely established, but what does matter is that C and E and their
complements refer to classes of cases, such as numbers of incidence of
the disease and the symptoms. It follows from the above assumptions
that one's degrees of belief are numerically the same as the
probabilities one believes. Accordingly, one's degree of belief that
someone has C is 1/10. Note that this does not concern a class of
cases, but the application of a known probability to a particular case,
with a degree of belief in return that is numerically the same as a
probability, on the strength of the assumptions we made about the
relations between proportions, probabilities and degrees of belief.
Now suppose that one finds that this particular person does show the
symptoms E. Then one's degree of belief that the person has the disease
C by the above reasoning changes from 1/10 to 9/10. Note that nothing
changes in the probabilities: They remain just as they were, and indeed
may be used again for other cases of possible C of other persons. There
is in the present approach no revised probability: There only is a
revised degree of belief given new evidence. If there are priors and
posteriors they are not in probability but in degrees of belief, and
indeed in section 2 the prior corresponds to ps(a,C).t-1 = z
and the posterior to ps(a,C).t = x*z : y.
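The arithmetic of this example can be checked with a short sketch. It is only an illustration: the dictionary `counts` and the helper `proportion` are names assumed here, not part of the lemma, and the counts are simply the four inner cells of Table 1.

```python
from fractions import Fraction

# Inner cells of Table 1, read as cases per 100 (C = disease, E = symptoms).
counts = {("C", "E"): 9, ("~C", "E"): 1, ("C", "~E"): 1, ("~C", "~E"): 89}
total = sum(counts.values())

def proportion(pred):
    """Proportion of all cases whose (disease, symptom) key satisfies pred."""
    return Fraction(sum(n for key, n in counts.items() if pred(key)), total)

z = proportion(lambda k: k[0] == "C")             # ps(a,C).t-1, the prior
y = proportion(lambda k: k[1] == "E")             # ps(a,E).t-1
x = proportion(lambda k: k == ("C", "E")) / z     # ps(a,E|C).t-1

posterior = x * z / y                             # ps(a,C).t = x*z : y
print(z, posterior)                               # 1/10 9/10
```

Exact fractions are used so that the result can be compared with the 1/10 and 9/10 of the text without rounding.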
4. The reason for this lemma
The main reason for the lemma on Bayesian Conditionalization derives from two
beliefs I have:
- (A) The principle of Bayesian
Conditionalization is the best approach towards a logic of scientific
inference: Only something like it explains how we can learn from the
evidence and from experience and how we can revise our degrees of
belief systematically and rationally in the light of such evidence as
we have.
- (B) The usual accounts of Bayesian
Conditionalization are for various reasons mistaken: In particular,
what conditional probabilities are is not clearly articulated in
standard approaches; the lack of reference in the standard account to
what evidence one has at what time is confusing; and indeed the
standard account confuses degrees of belief and probabilities
systematically.
There is a lot of literature on the topic. Useful texts for (A) are
Howson & Urbach and
Adams.
Useful texts for (B) are the same plus
Stegmüller and Lewis. The last concerns 'A Subjectivist's Guide to
Objective Chance' in
Jeffrey Ed.
What is new in the present proposal are the axiomatic assumptions
in Section 1 and the proof in Section 2.
Apart from what was said in Section 1 about each assumption, one basic
conviction that motivates all assumptions is that degrees of belief
are not probabilities, but must conform to them if one is
rational and one's belief in the probabilities is rational. And the
reason degrees of belief look like probabilities and behave like them
is that both degrees of belief and probabilities are proportions.
But in the present approach, the proportion that the probabilities
measure and express concerns the real facts of the matter, whereas the
proportion that the degrees of belief measure and express
concerns the beliefs of a person about the application of his beliefs
about the probabilities to some specific case.
5. Alternative and further axioms
The axioms in section 1 seem intuitive, but here is
an alternative set that also incorporates the distinction between
theoretical and empirical propositions, and an explicit
reference to presumed background knowledge K. It will be
assumed that ps(a,K).t = 1, but clearly one may have different
background knowledge at different times.
In order to write one of the axioms in a fairly clear way we also need
a definition, namely the definition of proper consequence of
K&X, which we write as 'K&X |< Y' and define as follows:
aB(K&X |< Y).t =def aB( (K&X |- Y) & ~(K&~X |- Y) ).t
Thus, Y is a proper consequence of K&X at t if it follows at t
from K&X but not from K&~X.
Now the axioms for degrees of belief are these:
Ax1: ps(a, X|K).t = y & X e EMP IFF aB( K |- p(X)=y ).t
Ax2: ps(a, X|K).t = y & X e THE IFF aB( (EY)(Ez)( (K&X |< p(Y)=z) & y=z ) ).t &
     aB( ~(EZ)(Ez)( (K&X |< p(Z)=z) & z<y ) ).t
Ax3: ps(a, Y|X&K).t = z IFF aB( K&X |- p(Y)=z ).t
Ax4: ps(a, X&Y|K).t = ps(a, Y|X&K).t * ps(a, X|K).t
Ax5: ps(a, X|K).t = ps(a, X&Y|K).t + ps(a, X&~Y|K).t
Ax6: ps(a, Y|X&K).t-1 = ps(a, Y|X&K).t
Ax1 can be seen as defining when one's degree of belief in X at t given
background knowledge K equals y in case X is an
empirical proposition:
Precisely if one believes that the probability of X at t is y given K.
Here one may rely on frequencies and sampling for one's beliefs in
probabilities, for the proposition X is supposed to be empirical.
Ax2 can be seen as defining when one's degree
of belief in X at t given background knowledge K equals y in case X is a
theoretical proposition. It is formulated as it is to insist that the Y and
Z that are used are proper consequences of K&X.
On this understanding one's degree of belief in
theory X at t given background knowledge K equals y
precisely if one believes that there is a proposition Y
at t that is a proper consequence of K&X that has probability y given K&X and
one believes there is not a proposition Z at t that is a proper consequence of
K&X that has a probability z such that z is smaller than y.
In brief: One's degree of belief in a theoretical proposition X
at t given K equals the probability of the least probable proper consequence of
X given K that one believes at t.
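Read in this way, Ax2 amounts to taking a minimum over the proper consequences one has beliefs about. The following is a minimal sketch of that reading only: the function name, the labels Y1 to Y3 and their probabilities are hypothetical, and the sketch takes for granted that the listed propositions really are proper consequences of K&X.

```python
from fractions import Fraction

def degree_of_belief_in_theory(believed_consequences):
    """Brief reading of Ax2: the degree of belief in a theoretical X given K
    is the believed probability of its least probable proper consequence.

    believed_consequences maps a label for each proper consequence Y of K&X
    to the probability z that a believes Y has given K&X.
    """
    if not believed_consequences:
        raise ValueError("Ax2 needs at least one believed proper consequence")
    return min(believed_consequences.values())

# Hypothetical beliefs of a at t about three proper consequences of K&X.
example = {"Y1": Fraction(9, 10), "Y2": Fraction(3, 4), "Y3": Fraction(19, 20)}
print(degree_of_belief_in_theory(example))   # 3/4
```

On this reading, adding a further believed proper consequence can lower one's degree of belief in the theory but never raise it.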
Ax3 can be seen as defining when one's degree
of belief in Y at t given K and X is z: Precisely if one believes that the
probability of Y at t is z given K and X.
These three axioms accordingly generate degrees
of belief from beliefs in probabilities. They all are relative to
time, which is best taken as some sort of interval, like 'today' or 'this
hour that I am thinking about this problem' (and not an infinitesimally small
now): At a later time, one may know more or believe less or differently.
Ax4 can be seen as defining one's degree of
belief in X&Y at t given K: This equals the product of one's degree of belief in
Y at t given X&K and one's degree of belief in X at t given K.
The degrees of belief on the right side of Ax4
can be obtained by way of Ax1-Ax3.
Ax5 can be seen as defining one's degree of belief in X at t
given K in general, whether X is theoretical or empirical: This equals
the sum of one's degree of belief in X&Y at t given K and one's degree
of belief in X&~Y at t given K for any Y.
The degrees of belief on the right side of Ax5
can be obtained by way of Ax1-Ax4.
Ax6 can be seen as imposing a consistency-condition on conditional degrees of
belief in Y given K&X in time: These conditional degrees of belief must be the same at
t as at t-1.
Note that by Ax3 what Ax6 says is this:
aB( K&X |- p(Y)=z ).t = aB( K&X |- p(Y)=z ).t-1
and thus one plausible ground for Ax6 is that deductions are valid or not
irrespective of time: They do not depend on time but on logic and assumptions.
Now all one needs in the present terms and notations for recalculating
one's degrees of belief given one's beliefs in probabilities are these
(*) aB( K |- p(T) = t ).t-1
    aB( K&T |- p(F) = h ).t-1
    aB( K&~T |- p(F) = g ).t-1
together with either of aB( K |- p(F) = 1 ).t or aB( K |- p(F) = 0 ).t,
where the t inside the formulas is the believed probability of T and not
the temporal index. For the given axioms allow one to calculate
respectively
ps(a, T | F&K ).t = (h*t) : (h*t + g*(1-t)) if one believes F is true
ps(a, T | ~F&K ).t = ((1-h)*t) : ((1-h)*t + (1-g)*(1-t)) if one believes ~F is true.
And of course both calculations are correct whatever one believes about F,
but one can believe - logically speaking - at most one of two contradictory
alternatives.
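The two formulas can be written out as a small sketch. The function names are assumed here for illustration. The sample values reuse Table 1 on the assumption that T plays the role of the disease C and F that of the symptoms E, so that t = 1/10, h = 9/10 and g = 1/90.

```python
from fractions import Fraction

def posterior_if_F(t, h, g):
    """ps(a, T|F&K).t = (h*t) : (h*t + g*(1-t)); undefined if the divisor is 0."""
    return (h * t) / (h * t + g * (1 - t))

def posterior_if_not_F(t, h, g):
    """ps(a, T|~F&K).t = ((1-h)*t) : ((1-h)*t + (1-g)*(1-t))."""
    return ((1 - h) * t) / ((1 - h) * t + (1 - g) * (1 - t))

# Values taken from Table 1 under the assumed mapping T = C, F = E.
t, h, g = Fraction(1, 10), Fraction(9, 10), Fraction(1, 90)
print(posterior_if_F(t, h, g))       # 9/10
print(posterior_if_not_F(t, h, g))   # 1/90
```

Both values can be computed beforehand, as the text says, but only one of them becomes one's degree of belief in T, depending on which of F and ~F one comes to believe.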
The algebra required for these calculations can
be gleaned from the following table, in which ps(a,K).t
= 1. The degrees of belief that are presupposed in (*), whence they
follow by the axioms given in this section, are ps(a,T&K).t,
ps(a, F|T&K).t and ps(a, F|~T&K).t, and all others can
be derived from these:
Table 2
K     T                              ~T
F     ps(a, F|T&K).t*ps(a,T&K).t     ps(a, F|~T&K).t*ps(a,~T&K).t     ps(a,F&K).t
~F    ps(a, ~F|T&K).t*ps(a,T&K).t    ps(a, ~F|~T&K).t*ps(a,~T&K).t    ps(a,~F&K).t
      ps(a,T&K).t                    ps(a,~T&K).t                     1
And to conclude, here are the patterns of inference that the present note
provides axioms for.
First, in case p(F) = 1 at t:
aB( K |- p(T) = t ).t-1
aB( K&T |- p(F) = h ).t-1
aB( K&~T |- p(F) = g ).t-1
aB( K |- p(F) = 1 ).t
----------------------------------------
aB( K |- p(T) = (h*t) : (h*t + g*(1-t)) ).t
And in case p(F) = 0 at t:
aB( K |- p(T) = t ).t-1
aB( K&T |- p(F) = h ).t-1
aB( K&~T |- p(F) = g ).t-1
aB( K |- p(F) = 0 ).t
---------------------------------------------------
aB( K |- p(T) = ((1-h)*t) : ((1-h)*t + (1-g)*(1-t)) ).t
6. Putting it all together again
The previous section involved background knowledge K because this is
realistic and generally present, but algebraically it makes no difference and
can be left out without any difference in calculated values, and that is the
plan we follow in the present section where we put the bits together.
What generally happens when using Bayesian Conditionalization with theories
and verified or falsified predictions can be summarized in tabular form as
follows.
First, at time t-2 all we have is a
real or possible fact F with some probability we believe. In order to
put it all in tabular form we start here with this:
Table 3 - at time t-2
aB
F                 pr(a,F).t-2
~F                pr(a,~F).t-2
                  1
It makes sense to remark that the times we shall refer to are t-2,
t-1 and t, and these are best conceived as intervals or periods in
which we consider our facts and hypotheses, and try to arrive at some
new conclusions.
Second, having the mere possible fact F with a
believed probability at t-2 we introduce a
new theory T at t-1 and so we have ps(a, F|~T).t-1 as
corresponding to the original for F: ps(a, F|~T).t-1 = ps(a,F).t-2.
Hence in general ps(a,F).t-1 will differ from ps(a,F).t-2.
What we get at t-1 before finding out about F is accordingly this, where
our assumptions are ps(a, F|T).t-1, ps(a,T).t-1 and ps(a, F).t-2:
Table 4 - possibilities at t-1
K     T.t-1                          ~T.t-1
F     ps(a, F|T).t-1*ps(a,T).t-1     ps(a, F).t-2*ps(a,~T).t-1     ps(a,F).t-1
~F    ps(a, ~F|T).t-1*ps(a,T).t-1    ps(a, ~F).t-2*ps(a,~T).t-1    ps(a,~F).t-1
      ps(a,T).t-1                    ps(a,~T).t-1                  1
Here it makes sense to introduce some abbreviatory notation:
ps(a,T).t-1 = t
ps(a, F|T).t-1 = h
ps(a, F|~T).t-1 = ps(a,F).t-2 = f
We can calculate all degrees of belief in the last table from these, based on
the assumption that indeed they are proportions, like probabilities, and thus
the same algebra applies. Putting in these abbreviations, using '~x' for '1-x'
we get at t-1 the following, with juxtaposition for multiplication:
Table 5 - fractions at t-1
K     T.t-1    ~T.t-1
F     (ht)     (f~t)     (ht+f~t)
~F    (~ht)    (~f~t)    (~ht+~f~t)
      (t)      (~t)      1
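The cells of Table 5 can be computed from the three abbreviations alone. The sketch below is an illustration under assumed sample values (t = 1/4, h = 4/5, f = 1/5 are hypothetical), and `table5` is a name introduced here.

```python
from fractions import Fraction

def table5(t, h, f):
    """Inner cells and margins of Table 5 from t, h and f, writing ~x as 1-x."""
    not_t, not_h, not_f = 1 - t, 1 - h, 1 - f
    cells = {
        ("F", "T"): h * t,            # (ht)
        ("F", "~T"): f * not_t,       # (f~t)
        ("~F", "T"): not_h * t,       # (~ht)
        ("~F", "~T"): not_f * not_t,  # (~f~t)
    }
    margins = {
        "F": cells[("F", "T")] + cells[("F", "~T")],
        "~F": cells[("~F", "T")] + cells[("~F", "~T")],
        "T": cells[("F", "T")] + cells[("~F", "T")],
        "~T": cells[("F", "~T")] + cells[("~F", "~T")],
    }
    return cells, margins

# Hypothetical values for ps(a,T).t-1, ps(a,F|T).t-1 and ps(a,F).t-2.
cells, margins = table5(Fraction(1, 4), Fraction(4, 5), Fraction(1, 5))
print(margins["F"], margins["~F"])   # 7/20 13/20
print(sum(cells.values()))           # 1
```

That the four inner cells sum to 1 and that the column margins return t and ~t is what it means here for the degrees of belief to obey the algebra of proportions.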
Third, at t we find that F, and we
recalculate both ps(a, T).t and ps(a, F).t using
Bayesian Conditionalization i.e. ps(a,T).t = ps(a, F|T).t-1*ps(a,T).t-1
: ps(a,F).t-1 for T and similarly in the other cases - and
note the temporal indexes. The result in terms of our abbreviations
looks as follows, if we work all possibilities out by algebra and
Bayesian Conditionalization:
Table 6 - fractions at t after Bayesian Conditionalization
K     T.t                 ~T.t
F     h(ht):(ht+f~t)      (ht+f~t)(f~t):(ht+f~t)      (h(ht)+(f~t)(ht+f~t)):(ht+f~t)
~F    ~h(ht):(ht+f~t)     (~ht+~f~t)(f~t):(ht+f~t)    (~h(ht)+(f~t)(~ht+~f~t)):(ht+f~t)
      (ht):(ht+f~t)       (f~t):(ht+f~t)              1
There is a new degree of belief for T at t, namely
(ht):(ht+f~t), and also a new degree of belief
for ~T at t and for F at t.
Note first, also with respect to the example in
section 3, that there is a difference between the case (A) of
applying statistics about a disease and symptoms to a patient and (B) of testing a
theory with a prediction. For (A) you may have empirical
probabilities for all cases, but not for (B). Also, in case of (A) you
can plausibly say that you apply probabilities based on classes of cases of
patients to a particular patient. The new T and F are then applicable to that
patient. But in case of (B) this mode of proceeding is not so plausible.
It is the second case I am really concerned
with, in principle. Here are five points about it, that summarize some of the
above points and add some:
- We started with pr(a,F).t-2 and no hypothesis at t-2.
- At t-1 we introduced three hypotheses: ps(a, F|T).t-1 and ps(a,T).t-1,
and we put ps(a, F).t-2 = ps(a, F|~T).t-1.
The reason for this last hypothesis is that it is what we started with
at t-2. We obtain a new calculated ps(a, F).t-1 = ps(a, F|T).t-1*ps(a,T).t-1
+ ps(a, F).t-2*ps(a,~T).t-1 = ht+f~t in abbreviated notation. Since in
abbreviated notation this means ht+f~t = ht+f(1-t) = t(h-f)+f we have some
clues to how this differs from f or when it is the same.
- At t we can calculate ps(a,T).t using Bayesian Conditionalization,
on the hypothesis that F is true at t. This uses the ps(a, F) at t-1 i.e.
ht+f~t. But we also can calculate ps(a,~T).t and then ps(a, F).t
and ps(a, ~F).t. This is more complicated but it also is
basic algebra, and requires no new data, and uses what was given or
assumed at t-1.
- At t we can also calculate ps(a,T).t using Bayesian Conditionalization, on
the hypothesis that F is false at t: ps(a, T|~F).t-1
= ps(a, ~F|T).t-1*ps(a,T).t-1 : ps(a,~F).t-1 = ~ht : (~ht+~f~t). See Table 5.
- And we can likewise calculate ps(a, ~T|F).t
and ps(a, ~T|~F).t, and the algebraical results can
again be gleaned from Table 5.
So at t one arrives at what amounts to the
above table in general if one has explicitly calculated everything using
Bayesian Conditionalization, but of course with specific fractions for specific
cases.
One interesting fact about the last table is that
it charts the alternatives of a hypothesis, and indeed the true real
frequency of F, if any, which the alternative hypotheses attempt to catch, is
not the marginal ht+f~t which sums both.
Next, a remark about the stability of the
conditionals, which in the present reconstruction amounts to the stability of h
i.e. ps(a, F|T).t-1 = ps(a, F|T).t and of f i.e. ps(a,
F|~T).t-1 = ps(a, F|~T).t = ps(a, F).t-2 =
pr(a, F).t-2. In fact, these conditionals should be stable
intuitively, for they are hypotheses about reality: We may get rid of one
of them - T or ~T - by the evidence, once we have found it.
It remains to consider Bayesian
Conditionalization and degrees of belief.
First about degrees of belief.
Seen as degrees of belief, the new
fractions calculated with Bayesian Conditionalization differ from real
probabilities based on frequencies, which is what one may have started from at
t-2: pr(F).t-2=f.
They are proportions like probabilities, and
indeed degrees of belief may be fairly called personal or subjective
probabilities. But they are not like ordinary frequency based
probabilities, because they are hypothetical: One or the other of T and
~T is false, and the fractions in the cells in the column used for it are purely
hypothetical. Indeed, at most one of T and ~T is true and so at least one of the
columns for T and ~T is purely speculative and merely corresponds to one's
degrees of belief, and not to any frequency one can establish directly.
And indeed this is unlike the case of a patient
with a disease and symptoms mentioned in section 3, for which
the four inner cells in the table can in principle all be established, in that one
may be able to find people with the disease with and without the symptoms and
people without the disease with and without the symptoms. Here, by contrast, all one has
are the symptoms, figuratively speaking, and two conflicting hypotheses to
account for these facts.
In fact, the degrees of belief derive from the
hypothesis T that was started at t-1 to account for F (or for another fact X
that is relevant for F).
Second about Bayesian Conditionalization.
In the above last table, Bayesian
Conditionalization corresponds to the move from ps(a, T).t-1=(t)
to ps(a, T).t=(ht):(ht+f~t). It
is a recalculation of one's degree of belief in T upon finding that F.
The big question now is: Is this move
probabilistic? Well - it surely is
proportionalistic: ps(a,T).t = ps(a,T|F).t =
ps(a, F|T).t-1*ps(a,T).t-1 : ps(a, F).t-1. But note
that in fact at t ps(a,T|F).t = ps(a, F|T).t*ps(a,T).t : ps(a, F).t
= ps(a, F|T).t-1*ps(a,T).t since ps(a, F).t
= 1. Therefore also, ps(a,T|F).t = ps(a,T).t - but then
that is useless.
Note also that for Bayesian Conditionalization
given that conditional degrees of belief remain the same in time we also have ps(a,
F|T).t = ps(a, F|T).t-1 = (ht):t
= h, as is correct.
So this differs from probability theory:
The Bayesian Conditionalization step corresponds to recalculating at t
using the numbers for t-1 and not for t as would happen in
ordinary proportional algebra including ordinary probability theory.
The Bayesian Conditionalization move does
follow given that conditional degrees of belief remain the same in time, and
given that degrees of belief are proportions, and indeed then the previously
useless step seems to become useful and seems to correspond to
conditionalization.
Now this move is quite plausible for degrees
of belief - for what else can one rationally use but one's last best
hypothetical estimates? - but not for probabilities as real frequencies,
since conditional frequencies may well change in time.
How plausible it is for degrees of belief can be illustrated by doing our
earlier theorem for explicit beliefs, using six assumptions and logic:
The assumptions are:
A0. ps is proportional.
A1. aB( pr(X)=z ).t IFF ps(a,X).t = z
A2. aB( Y |- pr(X)=z ).t IFF ps(a,X|Y).t = z
A3. aB( Y |- pr(X)=z ).t-1 --> aB( Y |- pr(X)=z ).t
A4. aB( X |- Y ).t --> ( aB( X ).t |- aB( Y ).t )
A5. aB( pr(X)=1 ).t IFF aB( X ).t
A0 guarantees that personal probabilities a.k.a. degrees of belief are
proportions, like ordinary probabilities.
A1 and A2 convert between beliefs in probabilities and personal
probabilities, and guarantee that one's personal probability is numerically the
same as one's belief in the probability. Note that A2 imposes a particular,
natural and simple interpretation on conditional degree of belief: it is belief
in a probability based on an assumption.
A3 insists that conditional beliefs remain constant in time once established
or assumed, e.g. on the ground that what is involved is a relation of
deducibility.
A4 says that if one believes that X implies Y at t then if one believes X at
t one believes Y at t.
A5 converts between belief that X has probability 1 and the belief that X is
true.
For beliefs in probabilities the theorem and argument now follow. The logic
assumed apart from the assumptions made is standard First-Order Predicate Logic
with identity.
T. aB( T |- pr(F)=h ).t-1 & aB( ~T |- pr(F)=f ).t-1 & aB( pr(T)=e ).t-1 &
aB( pr(F)=1 ).t |- aB( pr(T)=(he):(he+f~e) ).t
This is the theorem to be proved, all in terms
of beliefs about probabilities, all relativized to times.
AI 1. aB( T |- pr(F)=h ).t-1
AI 2. aB( ~T |- pr(F)=f ).t-1
AI 3. aB( pr(T)=e ).t-1
AI 4. aB( pr(F)=1 ).t
The assumptions of the proof. Notice that at (4) a new fact is recorded, that
was not so at t-1.
1,A2 5. ps(a,F|T).t-1=h
2, A2 6. ps(a,F|~T).t-1=f
3, A1 7. ps(a,T).t-1=e
The inference of the personal probabilities for
a Bayesian Conditionalization.
A0 8. ps(a,F).t-1 = ps(a,F&T).t-1 + ps(a,F&~T).t-1
8,A0 9. = ps(a,F|T).t-1*ps(a,T).t-1 + ps(a,F|~T).t-1*ps(a,~T).t-1
5,6,7,9 10. = (he + f~e)
The calculation of ps(a,F).t-1. It
is A0 i.e. the assumption that personal probabilities are
proportions, like ordinary probabilities, that
allows these arguments, together with the assumptions of the proof at (10).
A0 11. ps(a,T|F).t-1 = ps(a,F|T).t-1 * ps(a,T).t-1 : ps(a,F).t-1
5,7,10 12. = he : (he + f~e)
The calculation of ps(a,T|F).t-1.
4,A1 13. ps(a,F).t = 1
A0 14. ps(a,T|F).t = ps(a,T&F).t : ps(a,F).t
13,14,A0 15. = ps(a,T).t
The new information at t simplifies ps(a,T|F).t
to ps(a,T).t (a fact often slurred over in expositions about Bayesian
Conditionalization).
A3,A2 16. ps(a,T|F).t-1
= ps(a,T|F).t
15,16 17. ps(a,T).t = ps(a,T|F).t-1
Here the new personal probability at t for T
has been derived, which was already calculated at (12) from the assumptions of
the theorem. And in fact ps(a,T).t has been calculated from the
latest relevant information that a had available, namely at t-1 listed by the
assumptions (1)-(3).
12,A2 18. aB( F |- pr(T)=he : (he + f~e) ).t-1
18,A3 19. aB( F |- pr(T)=he : (he + f~e) ).t
Converting personal probabilities into a
person's beliefs about probabilities.
4,A5 20. aB( F ).t
Converting a belief in a probability of 1 into
belief of truth.
19,A4 21. aB( F ).t |- aB( pr(T)=he : (he + f~e) ).t
Converting belief in a conditional into a conditional between beliefs.
20,21,MP 22. aB( pr(T)=he : (he + f~e) ).t
Deriving the conclusion of the theorem, that
was to be proved, by ordinary modus ponens: A new probability for T derived in
conformity with Bayesian reasoning. QED.
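The steps of this proof can be traced in a small sketch that keeps a's beliefs in a table. Everything about the representation is an assumption made for illustration: a belief aB( Y |- pr(X)=z ) is stored under the key (Y, X), an unconditional belief aB( pr(X)=z ) under (None, X), and the values for h, f and e are hypothetical.

```python
from fractions import Fraction

def conditionalize(beliefs_prev, theory="T", fact="F"):
    """Derive a's beliefs at t from those at t-1 on learning that pr(fact)=1."""
    h = beliefs_prev[(theory, fact)]            # premise 1: aB( T |- pr(F)=h ).t-1
    f = beliefs_prev[("~" + theory, fact)]      # premise 2: aB( ~T |- pr(F)=f ).t-1
    e = beliefs_prev[(None, theory)]            # premise 3: aB( pr(T)=e ).t-1
    ps_fact_prev = h * e + f * (1 - e)          # steps 8-10: he + f~e
    ps_theory_given_fact = h * e / ps_fact_prev # steps 11-12: he : (he + f~e)

    # A3: only the conditional beliefs are carried over unchanged from t-1.
    beliefs_now = {k: v for k, v in beliefs_prev.items() if k[0] is not None}
    beliefs_now[(fact, theory)] = ps_theory_given_fact  # steps 18-19
    beliefs_now[(None, fact)] = Fraction(1)             # premise 4: aB( pr(F)=1 ).t
    # A5 and A4 (steps 20-22): a believes F, so the conclusion is detached.
    beliefs_now[(None, theory)] = beliefs_now[(fact, theory)]
    return beliefs_now

# Hypothetical beliefs at t-1: h = 4/5, f = 1/5, e = 1/4.
at_t_minus_1 = {("T", "F"): Fraction(4, 5), ("~T", "F"): Fraction(1, 5),
                (None, "T"): Fraction(1, 4)}
at_t = conditionalize(at_t_minus_1)
print(at_t[(None, "T")])   # 4/7
```

The sketch makes visible that the new unconditional belief about T at t is computed wholly from numbers held at t-1, while the conditional beliefs h and f pass through unchanged, as A3 requires.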
So this seems a plausible new interpretation
of and explanation for Bayesian Conditionalization: It concerns degrees of
belief, that are proportions like probabilities, with conditional
degrees of belief supposed constant in time, and corresponding to believed
conditions for probabilities. And the reason for that constancy
is that it concerns hypotheses, in which the degree of belief does not
vary directly with time but varies with evidence.