Persistent Wondering: probability

I have almost finished reading Elliott Sober's The Nature of Selection. It is a complex book, which inspired many marginal notes and a number of journal entries, and which I am sure I will need to come back to more than once to fully appreciate. With some temerity, perhaps, I have decided to address a couple of issues related to this book in my next couple of posts on this blog. This week, I am going to reflect on an idea about causality that Sober puts forth. Next week, hopefully, I will revisit the evolution of altruism, which I have discussed before (9/28/09).

Sober's causality claim relates specifically to what he calls "population level" causality, as distinguished from "individual level" causality. An example he uses to illustrate this difference is: suppose a golfer is trying to sink a putt, and a squirrel runs by after he hits the ball, and kicks it. Improbably, the ball deflects off of some obstruction, but sinks in the hole, anyway. From an individual level, we would wish to say that the squirrel's kick caused the ball to sink in the hole, because it started a chain of events that resulted in the ball sinking. But from a population level, we would not wish to say that "squirrel kicks sink balls," because we are convinced that, usually, this would not happen.

On the population level, Sober first points out, non-controversially, that causality is not implied by correlation. My own favorite example illustrating this truism is the theory that fire fighters cause damage at fires. It is observed that damage at fires is positively correlated with the number of fire fighters that arrive at the scene. If correlation is taken as proof of causation, the conclusion is that fire fighters cause damage. In fact, of course, the number of fire fighters and the amount of damage are correlated because they have a common background cause - the intensity of the fire.
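To make the fire fighter example concrete, here is a minimal sketch in Python. All of the numbers are hypothetical, chosen only for illustration, and the names (`P_HIGH`, `P_MANY`, `P_DAMAGE`, `joint`, `p_damage_given`) are my own. The model assumes that crew size and damage each depend only on the intensity of the fire, so that crew size has no effect on damage at all.

```python
from itertools import product

# Hypothetical numbers, chosen only for illustration.
P_HIGH = 0.3                        # P(fire is intense)
P_MANY = {True: 0.9, False: 0.1}    # P(many fire fighters | intense?)
P_DAMAGE = {True: 0.8, False: 0.1}  # P(heavy damage | intense?)

def joint(intense, many, damage):
    """Joint probability; crew size and damage depend only on intensity."""
    p = P_HIGH if intense else 1 - P_HIGH
    p *= P_MANY[intense] if many else 1 - P_MANY[intense]
    p *= P_DAMAGE[intense] if damage else 1 - P_DAMAGE[intense]
    return p

def p_damage_given(many, intense=None):
    """P(heavy damage | crew size), optionally also conditioning on intensity."""
    levels = [True, False] if intense is None else [intense]
    num = sum(joint(i, many, True) for i in levels)
    den = sum(joint(i, many, d) for i, d in product(levels, [True, False]))
    return num / den

# Ignoring intensity, damage looks far more likely when many fire fighters arrive.
print(p_damage_given(True), p_damage_given(False))

# Holding intensity fixed, crew size makes no difference to the damage.
for intense in (True, False):
    print(intense, p_damage_given(True, intense), p_damage_given(False, intense))
```

The first line of output shows the raw correlation; the last two show that it disappears once the common background cause is held fixed.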

Sober believes that by strengthening the criteria, it is possible to derive a probability-based definition of population-level causation. The rule he argues for is this: an event x is a (positive) causal factor of an event y if the probability of y given x is greater than or equal to the probability of y given (not-x) under all possible background conditions, and the inequality is a strict inequality in at least one condition. In other words (or, rather, in symbols):

x CF y <=> ∀z (P(y|x ⋅ z) ≥ P(y|!x ⋅ z)) ⋅ ∃z (P(y|x ⋅ z) > P(y|!x ⋅ z))

Here I am defining the relation "CF" given by "x is a causal factor of y". I will be using the dot operator both for the logical "and" between propositions and the probabilistic conjunction of events, and also using ! for a logical "not" and for negating an event (the event !x is the event that x does not occur). This should be contextually unambiguous. Hopefully your browser will display the symbols properly - if not, you may need to change your character set to "UTF-8". I've tested it with both Internet Explorer 7 and Firefox 3.5.6. (Firefox, annoyingly, sticks extra space above and below all equations, leading to very ugly page renderings.)

The reason for the universal quantifier on the first part of the above expression is interesting. Sober argues that it is not enough for a cause to increase the likelihood of an event in most circumstances but decrease it in others (even in a minority of cases). If you allow any negative cases, he argues, the causality claim reduces to P(y|x) > P(y|!x), which simply represents correlation. So to have a definition of causality stronger than mere correlation, the first event must raise the probability of the second event, or be neutral, in all circumstances, and must strictly raise it (strict inequality) in at least one.
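As a rough sketch of how the rule works, suppose there are only finitely many background conditions and that we somehow already know the conditional probabilities under each one. The function name `is_causal_factor`, the labels z1 to z3, and the numbers are all hypothetical; this is only my reading of the rule as stated above, not anything taken from Sober's text.

```python
def is_causal_factor(p_y_given_x, p_y_given_not_x):
    """Check the rule over a finite set of background conditions.

    Both arguments map a background-condition label z to a probability:
    P(y | x . z) and P(y | !x . z) respectively.
    """
    backgrounds = p_y_given_x.keys()
    # Universal part: x never lowers the probability of y.
    never_lowers = all(p_y_given_x[z] >= p_y_given_not_x[z] for z in backgrounds)
    # Existential part: x strictly raises it in at least one condition.
    sometimes_raises = any(p_y_given_x[z] > p_y_given_not_x[z] for z in backgrounds)
    return never_lowers and sometimes_raises

# Hypothetical probabilities under three background conditions.
with_x = {"z1": 0.6, "z2": 0.5, "z3": 0.9}
without_x = {"z1": 0.4, "z2": 0.5, "z3": 0.7}
print(is_causal_factor(with_x, without_x))   # True

# Lowering the probability in a single condition defeats the claim.
with_x["z2"] = 0.45
print(is_causal_factor(with_x, without_x))   # False
```

Note that the sketch takes the list of background conditions as given; it says nothing about which conditions are admissible.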

Sober raises some issues with the above formulation regarding the causal independence of the background issues from the factor being examined. Specifically, he requires that the "background events" (z) not be "causally relevant" (either positively or negatively) to the proposed cause being investigated (x). If they are, this leads to undefined conditional probabilities of the form P(y|x ⋅ !x). Conceptually, this represents a form of double counting. Recasting this requirement in the quantificational form gives something like:

x CF y <=> ∀z (z ∈ B => P(y|x ⋅ z) ≥ P(y|!x ⋅ z)) ⋅ ∃z (z ∈ B ⋅ (P(y|x ⋅ z) > P(y|!x ⋅ z)))

B = {z: !(x CF z) ⋅ !(z CF x) ⋅ !(x CF !z) ⋅ !(z CF !x) ⋅ (z ≠ x) ⋅ (z ≠ y)}

The last two inequalities are necessary to avoid undefined conditional probabilities in P(y|!x ⋅ x), and because the strict inequality P(y|x ⋅ y) > P(y|!x ⋅ y) is always false.

The set notation above is kind of nasty, because it carries the "free variables" x and y outside of the expression. But we can eliminate this notation by expanding the independence criterion in place (although it gets a little unwieldy):

x CF y <=> ∀z ((!(x CF z) ⋅ !(z CF x) ⋅ !(x CF !z) ⋅ !(z CF !x) ⋅ (z ≠ x) ⋅ (z ≠ y)) => P(y|x ⋅ z) ≥ P(y|!x ⋅ z))
⋅ ∃z ((!(x CF z) ⋅ !(z CF x) ⋅ !(x CF !z) ⋅ !(z CF !x) ⋅ (z ≠ x) ⋅ (z ≠ y)) ⋅ (P(y|x ⋅ z) > P(y|!x ⋅ z)))

Now that certainly looks circular. I hesitate only because I'm not 100% certain that it is impossible to iteratively expand the "CF" terms, at least in a finite universe. I took a stab at it in a universe containing only 4 events {x, y, B1, B2}, but the problem quickly exceeded my limited powers of symbolic manipulation. But, anyway, I think the definition is circular.
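The circularity can at least be made visible mechanically. The sketch below is a simplification of my own: it records only which "CF" claims the definition of a given claim refers to, and it tracks only the un-negated terms CF(a, z) and CF(z, a), leaving out the terms involving !z and !a. The names `EVENTS`, `depends_on` and `find_cycle` are hypothetical. It shows that a naive expansion loops; it does not show that no cleverer expansion exists.

```python
EVENTS = ["x", "y", "B1", "B2"]

def depends_on(pair):
    """CF claims that the definition of CF(a, b) refers to directly.

    Simplification: only the un-negated terms CF(a, z) and CF(z, a) are
    tracked; the negated terms CF(a, !z) and CF(z, !a) are left out.
    """
    a, b = pair
    deps = []
    for z in EVENTS:
        if z not in (a, b):
            deps += [(a, z), (z, a)]
    return deps

def find_cycle(start):
    """Depth-first search for a chain of references that returns to a claim already on the path."""
    path = []

    def visit(pair):
        if pair in path:
            return path[path.index(pair):] + [pair]
        path.append(pair)
        for dep in depends_on(pair):
            cycle = visit(dep)
            if cycle:
                return cycle
        path.pop()
        return None

    return visit(start)

print(find_cycle(("x", "y")))
# [('x', 'y'), ('x', 'B1'), ('x', 'y')]
```

In words: to decide whether x is a causal factor of y, we must first decide whether x is a causal factor of B1, and to decide that, we must already know whether x is a causal factor of y.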

Actually, I don't know why I even bother to fret over it, since Sober himself admits that it is a circular definition - but he argues that it is a useful definition, anyway. He references a 1979 Noûs article by N. Cartwright, which supposedly goes into this in more depth. It would be interesting to read that, but I have no easy way of tracking it down.

I am not going to dispute that a definition can be conceptually useful, even if circular, outside certain strictly formal contexts. But I think we need to ask, in each case, where the circularity comes from, and why it is necessary and/or useful. In this case, I have a suspicion that it is because we have an underlying, intuitive definition of causation that has nothing to do with the definition that is being attempted here. This is also reflected in my sense that this idea is only useful if we "prune" it somehow, as suggested by my reference to a "finite universe" above. For another example of pruning, I think we are only interested in background conditions that have some causal effect, themselves - totally neutral conditions are not interesting. In other words:

B = {z: !(x CF z) ⋅ !(z CF x) ⋅ !(x CF !z) ⋅ !(z CF !x) ⋅ (z ≠ x) ⋅ (z ≠ y) ⋅ (z CF y)}

But how do we prune the universe of possible events, other than by applying some other, a priori, theory of causation? And in that case, how is Sober's causation test any different than just using correlation as an empirical test of the a priori theory?

I'd be the first to admit that my reasoning above is a little mushy. But a specific example, I think, shows that Sober's definition doesn't quite jibe with our intuitive ideas about causation, and may not ultimately be satisfactory as a definition of population-level causation.

Imagine a rectangular pool table, with the long axis oriented north-south. From time to time, billiard balls are introduced approximately on the 1/3rd line (the imaginary line dividing the southern 1/3 of the table from the northern 2/3). Some of these balls will be struck with a cue. The horizontal angle with which the cue strikes the ball is normally distributed such that 90% of the strikes fall within ±70 degrees of the mean direction, which is due north. There is friction in the table, and spin (variation in the incident angle of the cue with respect to the radial angles from the center of the balls to the point of impact), and the impulse imparted by the cue is finite, so that a ball may strike a side wall, or other obstruction, and come to rest before hitting the north wall. As balls accumulate, they may strike, or be struck by, other balls. Impacts are approximately elastic (with frictional/damping losses). Additionally, a number of bumpers are introduced between the 1/3rd line and the north wall. The precise position of the bumpers is varied, from time to time. Usually, when a ball strikes a bumper, it will be a glancing blow, and the ball will continue in a generally northerly direction; however, occasionally the impact will be square enough that the ball will rebound to the south. This rebound may be sufficient to carry the ball south of the 1/3rd line (e.g., if the bumper is close to the line).

Finally, the entire table will be lifted and tilted, from time to time (but infrequently), either to north or to south, with the conditional probability of a southward tilt, given that a tilt occurs, equal to 50%. The tilt is of finite (temporal) duration - i.e., there is a probability greater than zero but less than 1 that a ball on the table will reach the south wall during a south tilt. Note that any ball which has ended up south of the 1/3rd line due to a rebound has a greater probability of touching the south wall during a southward tilt than it did when it was first introduced into the game.

When a ball strikes either the north or south wall, it is removed from the game. Its probability of striking the other wall, at that point, is zero.

It is hard to say that, in this game, the impact of the cue is not a "causal factor" in increasing the percentage of the ball population that touches the north wall, even though there are some members of "B" (combinations of bumper location, ball position, cue angle, other factors) under which the impact will, in fact, REDUCE the probability of reaching the north wall. But by Sober's definition of causality we would, in fact, need to make that claim.
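A toy calculation illustrates the tension. The three background conditions, their weights, and the probabilities below are entirely made up; they are not derived from any simulation of the table, and the names `BACKGROUNDS`, `marginal` and `lowering` are hypothetical. The sketch also assumes that how often each background condition occurs does not depend on whether the ball is struck.

```python
# Hypothetical background conditions:
#   label: (weight, P(north wall | cue . z), P(north wall | no cue . z))
BACKGROUNDS = {
    "open path north":             (0.70, 0.80, 0.05),
    "glancing bumper":             (0.25, 0.60, 0.05),
    "square bumper near 1/3 line": (0.05, 0.02, 0.05),
}

def marginal(column):
    """Weighted average of one probability column over all background conditions."""
    return sum(row[0] * row[column] for row in BACKGROUNDS.values())

p_north_cue = marginal(1)
p_north_no_cue = marginal(2)

# Background conditions in which being struck LOWERS the chance of reaching the north wall.
lowering = [z for z, (_, with_cue, without_cue) in BACKGROUNDS.items()
            if with_cue < without_cue]

print(p_north_cue > p_north_no_cue)  # True: overall, the cue helps a great deal
print(lowering)                      # ['square bumper near 1/3 line']
```

With these numbers the cue raises the overall chance of reaching the north wall enormously, yet the single rare condition in which it lowers that chance is enough to disqualify it as a causal factor under the definition.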

I find myself inclined to throw out Sober's definition, or rather, to view it only as a sort of statistical test of some underlying idea of causality. I'm inclined, further, to view population-level causation as just an aggregation of individual causation-events (including those that have actually occurred, plus hypotheticals). "Squirrel kicks cause balls to sink sometimes, but most of the time they don't." So squirrel kicks are not considered a "cause" of successful putts, at the population level.

I don't think this has a negative effect on Sober's substantive arguments about the nature of selection. His arguments about "units of selection", for example, depend on the distinction between "selection of" some kind of entity and "selection for" some specific quality or trait. The question is, at what level does the cause of the selection operate? One (admittedly artificial) example he uses for illustration is to postulate several groups of otherwise similar organisms which are homogeneous within each group with respect to some quality - say tallness - but vary between groups. Suppose some predator differentially picks off the shorter organisms. Is this an example of group selection (the predator is selecting organisms from groups of short organisms), or individual selection (the predator selects shorter organisms)? The question cannot be resolved strictly by looking at results, because in either case there is selection of the same organisms. One needs to look at the cause - what trait is being selected for? Does the predator simply favor shorter animals? Or does it avoid tall groups of animals? A test of the causal assumptions would be to examine what would happen if a shorter organism happened to be found in a taller group. Would it be subject to the same level of predation as if it were in a group of short organisms? In that case, the individual selection model would be supported. Or would it have the same security as its taller group members? In that case, a group selection mechanism is indicated. Of course, this test might be empirically impossible, if this were a real-life example, but it illustrates the role that a concept of causality plays in determining the unit of selection. I see no way, though, in which this argument depends on Sober's specific formulation of his law for population-level causality.

P.S. I am not 100% certain how I feel about the explanatory necessity of the concept of "causation". For instance, if I say "an object subject to a given force F will experience an acceleration inversely proportional to its mass", does it add anything useful to the explanation to say "the force causes the object to accelerate"? The idea of cause is important to Sober, and he bases a lot of his "units of selection" arguments on the concept. I have no dogmatic objection to this, but I can't help but wonder if the concept of causation isn't somehow reducible.

Reading the discussion of “Deterministic and Stochastic Processes”, in Elliott Sober’s book, The Nature of Selection, has me musing on Laplace’s demon. The brilliant 19th Century mathematician Pierre-Simon Laplace was convinced (despite his own pioneering work in probability) that the world was at root deterministic. His famous expression of this, as cited (in translation) in Sober is:

“Given for one instant an intelligence which could comprehend all the forces by which nature is animated and the respective situation of the beings who compose it – an intelligence sufficiently vast to submit these data to analysis – it would embrace in the same formula the movements of the greatest bodies of the universe and those of the lightest atom; for it, nothing would be uncertain, and the future, and the past, would be present to its eyes.”

This hypothetical vast intelligence has come to be known as “Laplace’s demon”.

Is Laplace’s demon, i.e., some being which could know everything there is to know about the universe, even theoretically possible? (In a philosophical sort of definition of “theoretically”.) To address that question, we have to address at least a little of the question “What is knowledge?” That’s rather a big question for an amateur philosopher, but I’ll see what I can do.

First of all, whatever we mean by “knowledge”, it seems clear that it is not synonymous with “representation”. If we were to imagine an infinitely “true” mirror that perfectly reflected the incident light, or a computer disk copier that made perfect copies of one disk onto another, we would not say that the mirror, or the second disk had “knowledge”. What we call “knowledge” involves, I believe, a process of abstraction and interpretation. Abstraction involves making a representation of part of the world – the part we think relevant for our analysis. Interpretation is adding meaning. (I will not, at the moment, try to give a meaning for “meaning” – it is just whatever we add to a representation in order to possess knowledge. For that matter, I won’t discuss the nature of “representation”, either.) If I know a fox is in my chicken coop, I do not know exactly how many kilograms the fox weighs, or exactly where each chicken is relative to the position of the fox, but I know if I don’t get out there, fast, I am going to lose some chickens. Knowledge therefore involves both subtraction and addition. We subtract data that we believe irrelevant to our analysis (abstraction), and we add meaning by a process of interpretation.

So what is “exact” knowledge – that is knowledge that could “comprehend all the forces by which nature is animated and the respective situation of the beings who compose it”? It would seem to require a complete representation, leaving nothing out – i.e., perfect representation plus meaning, instead of abstraction plus meaning. It is therefore a purely additive process. But, as I discussed in “Creating the world” (11/5/09), a representation must be represented somewhere. Whatever mind or computing device is doing the knowing must have at least as many data storage locations as there are data to be represented – and in fact, must have more, since exact knowledge is additive. But if exact knowledge is to leave nothing out – if it is to incorporate every possible thing that could have any influence whatsoever, on the thing known, then it seems the knower must also know itself, and this leads to an infinite regress.

Maybe we could try a different formulation of “exact” knowledge. Thinking of the way “infinity” is usually represented in mathematics, we might try to define exact knowledge by some “limit” formulation, saying that, however much data we have already represented about the thing to be known, we can always if necessary represent more, so that, without ever claiming to have represented everything, we can always represent “as much as we please”. This theory seems to require that the thing to be known is in some sense “small” with respect to the knower, but it stops short of requiring infinite regress. But is it good enough for Laplace’s demon? It seems that with this definition of “exact knowledge” all we are saying is that we can make the probability that we have missed some important piece of information arbitrarily small. There always remains some non-zero probability that some important fact we haven’t considered can come crashing in and invalidate our model. It seems that Laplace’s demon is qualitatively in the same relation to determinism vs. uncertainty as the rest of us – it just has a really big brain, so it can know a lot more.

Another approach is to assume two separate universes – this approach might appeal to someone who still clings to some form of Cartesian dualism. The knower exists in a universe that is not part of the universe in which the known resides. But in order for this to escape the problem of regress, it must be impossible for the knowing universe to affect the known – the knower must be a pure observer, only – otherwise, the knower must still know itself, in order to know all possible influences. We must also posit some form of one-way communication in which information can pass from our universe into the other without even the information-bearing entities themselves being in any way touched or affected.

It seems that even if such a knower-in-a-separate-universe were to exist, this could be of absolutely no interest to beings in our universe. This separate universe theory is of the same sort as the theory of pure, philosophical solipsism – the theory that only I exist, and everything else that seems to exist is merely a phantasm in my own mind. Each theory is completely untestable, as is inescapably implied by its own hypotheses. Beyond stating such a theory, and noting its inherent untestability, not much of interest can be said.

The above questions seem to render the thought experiment of Laplace’s demon useless as an argument for the determinacy of the universe. Does this imply that the universe is not determinate (i.e. that it is inherently stochastic)? Or could it be that it is determinate, but that this determinacy is unknowable? Personally, I find it hard to render the concept of “unknowably determinate” coherent, but perhaps some philosopher cleverer than I can do so.

Note that I’m not even discussing the possible implications of quantum mechanics. Quantum mechanics holds, of course, that there is inherent uncertainty at the level of the most fundamental constituents of the universe. (Quantum mechanics, by the way, although cast in the most abstruse mathematics – math well beyond my feeble capabilities – is at root an empirically based theory, developed not from abstract philosophical considerations, but in an attempt to explain some otherwise extremely intractable experimental data.) Quantum mechanics is sometimes – although not necessarily – held to imply that some “built-in” level of uncertainty exists at the macroscopic level, also.

Note that the above discussion is not about human limitations, or whether “exact knowledge” is possible to a human mind. The infinite regress problem does not say “no human brain can hold all this information”, it asks “how could the universe contain complete knowledge of itself?” Similarly, the “knowledge as a limit” idea is not about capabilities of human intelligence – in fact, I would argue that no human mind could even come close to getting “as close as we please” (in this quasi-mathematical sense) to exact knowledge about any real world problem. Rather, this is an argument about “theoretical” possibility, as I say above. I admit, I’m not really sure exactly what such a “theoretical” possibility means, except that if something is not “theoretically” possible, then it darn tootin’ is not a practical possibility, either.

Postscript: After writing this essay, I happened to look up the Wikipedia article on Laplace’s demon (http://en.wikipedia.org/wiki/Laplace's_demon). Some of the objections I make in this essay were covered therein (if somewhat more tersely), and physical arguments, including quantum mechanical, were gone into more deeply. The Wikipedia article did not mention the “knowledge as a limit” idea, nor the fact that if the demon were in an alternate universe, some form of one-way (and only one-way) communication between universes would be necessary.

Sunday, December 20, 2009

Sober's causation

Sunday, November 22, 2009

Laplace’s demon

David Reading
