Saturday, December 26, 2009

Altruism and human evolution

The song "Easy to Be Hard" from the 1967 musical "Hair" posed the question: "How can people be so heartless?" Ever since Darwin, though, biologists have struggled with the opposite question: "How can people be so nice?" If traits arise through intense competition between living organisms in a war of all against all, how does a tendency to altruism arise in a population? What benefit does altruistic behavior confer that contributes to the survival and reproductive success of the altruist?

How we define "altruism" is crucial to the form that the explanation will take, although the underlying facts, of course, are not changed by the definition. There is no question but that a heritable, genetic trait, in order to be selected for by natural selection, must provide some differential reproductive advantage to the gene complex coding for that trait. So if we define "altruism" as describing traits that benefit ONLY others, and confer NO adaptive advantage on either the organism bearing them or the gene complex coding for them, then necessarily genes are selfish, and altruism cannot arise by natural selection. If such "altruistic" behavior is in fact not only neutral, but places the bearer at a selective disadvantage, then it cannot long survive, even if it arises by chance.

But this is a trivial result of the nature of selection, and a particular definition of "altruistic", and leaves a lot of real-life facts unexplained. Nature is full of examples of behavior that SEEMS altruistic, in the sense that it seems to be for the benefit of others, and of immediate cost to the organism exhibiting the behavior. Simply stating that there must be some hidden benefit to the organism, or at least to some of the gene complexes which code for the nature of the organism, does not explain how each particular behavior could develop.

So, rather than sacrifice a perfectly good word to preserve a trivial result (and make do, in consequence, with cumbersome phrases like "seemingly altruistic"), I prefer to define "altruistic behavior" simply as behavior that seems to benefit primarily others, at an immediate net cost to the altruistic individual. This definition of altruistic defines "fuzzier" classes of behaviors and organisms than the alternate definition, but classes that have the decided advantage that the constituent behaviors and organisms actually exist.

Elliott Sober’s book The Nature of Selection makes a strong connection between the evolution of altruistic behavior and the process of "group selection". Group selection is a controversial concept among biologists, part of what Sober refers to as the "unit of selection" debate. The question is: on what entities, or "units", does natural selection work? The oldest tradition, starting with Darwin, focuses on the organism (or phenotype). Individual organisms vary in traits, and have different rates of survival and reproduction deriving from those traits, causing certain types of organisms, with certain combinations of traits, to be "selected" as being better adapted to their environment, and others to die out.

Other candidates for the unit of selection exist. Many biologists have strongly argued for the individual gene as the only proper unit to be considered. Others extend this to complexes of genes. More recently, some theorists have argued for selection at the level of the group or "deme", or even at the species level. I won’t go into detail on all this. Sober argues strongly for a multi-level view; i.e., that any of these may be, from time to time, the level at which selection operates, and the particulars must be carefully examined in each case. In particular, he argues that the "genes only" school arises as a trivial result of the fact that the mathematics in each case can be reduced to calculations based on the averaged relative fitness of each gene. But to really understand the process, he believes you have to look at causality – at what level are the actual natural forces shaping evolution being applied?

The way Sober defines "group selection", there must be a trait or property definable only at the group level which confers a selective advantage on individual organisms. There is actually some flexibility in whether the "benchmark" of selection is the individual organism, a particular gene or gene complex, or the group, but focusing on the organism will do for now. The crucial feature is that the property that confers the selective advantage must be definable only at the group level. For instance, predators may avoid large individuals, but show even more aversion to groups of large animals; in that case, being a member of a group of organisms with a large average size will confer a selective advantage to each organism in the group, irrespective of its own, individual size. There may be other selective effects in operation. For example, if smaller size happens to confer some selective advantage on organisms competing with other organisms within the group, then there will be two countervailing evolutionary forces at work, and the result is indeterminate (in the mathematical sense that one must look to other factors for a resolution), but this does not alter the fact that there is a group selective force at work.

Altruism seems, at least in many cases, to be a situation where group selection and individual selection act at cross purposes. Since altruists help other group members, there is a significant selective advantage to being a member of a group containing a large number of altruists. But this advantage accrues to selfish individuals, as well as to altruistic ones. Since selfish individuals gain the selective advantages of group membership without the personal costs, they may have a higher "fitness" value for intra-group competition. Which one wins out – whether selfish organisms or altruistic ones will predominate or "go to fixation" (one trait driving the other into extinction) – will depend on the relative strength of the two forces. The existence of individual (organism-level) selection as a countervailing force leads to an inherent instability in the fitness of altruism based on group selection, absent other forces.
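A toy model can make this tension concrete. The sketch below is my own illustration (not a model from Sober's book): every group member gains a benefit proportional to the group's altruist fraction, while altruists alone pay a cost, and groups are re-formed by random assortment each generation. All parameter values are arbitrary.

```python
import random

def simulate(n_groups=200, group_size=20, b=0.5, c=0.1,
             init_altruist_freq=0.5, generations=50, seed=1):
    """Toy two-level selection model. Individuals are 1 (altruist)
    or 0 (selfish). Each member of a group gains b * p, where p is
    the group's altruist fraction; altruists additionally pay c.
    Reproduction is global, weighted by fitness, so altruist-rich
    groups out-produce altruist-poor ones (group selection), while
    selfish members out-compete altruists within any one group
    (individual selection). Returns the final altruist frequency.
    """
    rng = random.Random(seed)
    n = n_groups * group_size
    pool = [1] * int(n * init_altruist_freq)
    pool += [0] * (n - len(pool))
    for _ in range(generations):
        rng.shuffle(pool)  # random assortment into groups
        weights = []
        for start in range(0, n, group_size):
            g = pool[start:start + group_size]
            p = sum(g) / len(g)  # this group's altruist fraction
            for x in g:
                weights.append(1.0 + b * p - (c if x else 0.0))
        pool = rng.choices(pool, weights=weights, k=n)
    return sum(pool) / len(pool)
```

With these (arbitrary) numbers, random assortment leaves only binomial variance between groups, so the within-group cost wins and the altruist frequency declines; larger benefits, smaller groups, or non-random assortment of altruists (e.g., by kinship) tip the balance the other way.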

I am mostly concerned with the evolution of altruistic behavior within primate species, and in particular within our own. I think group selection, as Sober defines it, is a large factor in this, but I think other factors can be identified, reinforcing what otherwise might seem to be an inadequately strong effect. One such factor is kin selection. Kin selection is a more powerful force than group selection, because an individual’s genes benefit more directly from the individual’s actions. This effect is strongest when I act for the benefit of my (biological) offspring. Even if I sacrifice my life for my children, this may well preserve my genotype much more effectively than had I failed to take the risk. The force of kin selection is somewhat weaker when I act for the benefit of siblings, who on average share half of my genetic material, and so on with decreasing force through first cousins, second cousins, etc. I suspect that altruistic behavior may have originated, very early in the evolution of primate lineages, as a generalization of kin-selective behavior. Once it evolved, the group-selection advantages – the increase in fitness experienced by each individual member of a cooperative group – would have exercised at least a weak selective effect.
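The "decreasing force" of kin selection can be put in numbers via Hamilton's rule (the act is favored when r · b > c, with r the coefficient of relatedness), using the standard coefficients for diploid organisms. The helper function below is my own illustration:

```python
# Average coefficients of relatedness, r, under standard diploid genetics
RELATEDNESS = {
    "offspring": 0.5,
    "full sibling": 0.5,       # on average half of one's genes
    "half sibling": 0.25,
    "first cousin": 0.125,
    "second cousin": 0.03125,
}

def favored_by_kin_selection(benefit, cost, relation):
    """Hamilton's rule: an altruistic act is favored when r * b > c,
    where b is the benefit to the recipient and c the cost to the actor."""
    return RELATEDNESS[relation] * benefit > cost

# The same act can pass the test for close kin and fail for distant kin:
print(favored_by_kin_selection(3, 1, "full sibling"))  # True  (0.5 * 3 > 1)
print(favored_by_kin_selection(3, 1, "first cousin"))  # False (0.125 * 3 < 1)
```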

I think sex selection (selection by females of certain preferred traits in sexual partners) may also have played a part in the evolution of certain types of altruistic behavior, particularly in hominid lineages. Humans are unique in that our infants require a great deal of care – more care, and for a longer period, than other primates, possibly more than any other animal. Stephen Jay Gould has connected this to the neotenous nature of our species – we retain, as adults, features that are infantile in other apes – for instance, we continue to learn at a higher rate through much of our lives. But we are also born at a less developed stage. This may be in part because of our large brain size – a more fully developed brain and skull would be too large to pass through the birth canal.

For whatever reason, the result has been that human infants require much care, over a long period. This has led to other evolutionary changes; for example, it very likely led to the "always on" nature of human female sexuality. Other primate females, unlike human women, are interested in sex, and sexually interesting to males, only during the period in which they are fertile. By making sexuality a constant, hominid females could attract males to serve as ongoing helpmates, if not actually helping much with child rearing tasks (then or now), at least supplementing the female’s efforts in other areas, such as providing food and protection when the female was engaged with the child. This would have led to a tendency on the part of females to select for at least certain types of altruistic behavior, those that could fall under the rubric of being a "good provider". Constant sexuality, and the resulting pair-bonding, also led to another revolution – awareness of paternity, thereby extending the possible scope of kin selection.

Another factor I think played a big part in the evolution of altruistic behavior, not only in human lineages but in other primates as well, at least those most closely related to us, such as gorillas and chimpanzees, is something we might call police action. Part of primate altruism involves cooperation and sharing – but it is not strictly necessary that these benefits be equally distributed between altruists and selfish individuals. Groups of individuals can choose NOT to share with individuals they don’t feel to be deserving. Groups can also band together to limit the power of otherwise dominant individuals who are perceived as abusing their power, and in extreme cases can even drive a selfish individual from the overall group. The evolution of policing behaviors would strongly reinforce the group-selective benefits deriving from the evolution of altruism, by lessening the selective advantage enjoyed by selfish individuals within the group, and the combination of traits would thus tend to be much more stable than altruism alone.

I’m no population geneticist, and I’m not going to be able to put together mathematical models to demonstrate all this, but it seems to me that these factors – kin selection, group selection, sex selection, and the evolution of cooperating behavior complexes (policing) – together make up a sufficiently powerful set of forces to ensure the evolution by natural selection of human altruism. The details I’ve laid out may not be quite right, but something like this history must have occurred – because, indisputably, human altruism exists.

The lyric from "Hair" teaches us one important lesson: how natural all this seems to us. Because (saving a few who have imbibed too much Libertarian philosophy), we all tend to respond in the way the lyric suggests – we are surprised and startled by "heartlessness". By and large, in our day-to-day activities, we humans are more likely to go out of our way to be nice to each other than to be mean. Granted, exemplary occurrences of altruism are remarkable, and inspire awe and admiration, but it is meanness that shocks us, nags at our consciousness, and leaves us with the conviction that something must be done.

Sunday, December 20, 2009

Sober's causation

I have almost finished reading Elliott Sober's The Nature of Selection. It is a complex book, which inspired many marginal notes and a number of journal entries, and which I am sure I will need to come back to more than once to fully appreciate. With some temerity, perhaps, I have decided to address a couple of issues related to this book in my next couple of posts on this blog. This week, I am going to reflect on an idea about causality that Sober puts forth. Next week, hopefully, I will revisit the evolution of altruism, which I have discussed before (9/28/09).

Sober's causality claim relates specifically to what he calls "population level" causality, as distinguished from "individual level" causality. An example he uses to illustrate this difference is: suppose a golfer is trying to sink a putt, and a squirrel runs by after he hits the ball, and kicks it. Improbably, the ball deflects off of some obstruction, but sinks in the hole, anyway. From an individual level, we would wish to say that the squirrel's kick caused the ball to sink in the hole, because it started a chain of events that resulted in the ball sinking. But from a population level, we would not wish to say that "squirrel kicks sink balls," because we are convinced that, usually, this would not happen.

On the population level, Sober first points out, non-controversially, that causality is not implied by correlation. My own favorite example illustrating this truism is the theory that fire fighters cause damage at fires. It is observed that damage at fires is positively correlated with the number of fire fighters that arrive at the scene. If correlation is taken as proof of causation, the conclusion is that fire fighters cause damage. In fact, of course, the number of fire fighters and the amount of damage are correlated because they have a common background cause - the intensity of the fire.

Sober believes that by strengthening the criteria, it is possible to derive a probability-based definition of population-level causation. The rule he argues for is this: an event x is a (positive) causal factor of an event y if the probability of y given x is greater than or equal to the probability of y given (not-x) under all possible background conditions, and the inequality is a strict inequality in at least one condition. In other words (or, rather, in symbols):

x CF y <=> ∀z (P(y|x ⋅ z) ≥ P(y|!x ⋅ z)) ⋅ ∃z (P(y|x ⋅ z) > P(y|!x ⋅ z))

Here I am defining the relation "CF" given by "x is a causal factor of y". I will be using the dot operator both for the logical "and" between propositions and the probabilistic conjunction of events, and also using ! for a logical "not" and for negating an event (the event !x is the event that x does not occur). This should be contextually unambiguous. Hopefully your browser will display the symbols properly - if not, you may need to change your character set to "UTF-8". I've tested it with both Internet Explorer 7 and Firefox 3.5.6. (Firefox, annoyingly, sticks extra space above and below all equations, leading to very ugly page renderings.)

The reason for the universal quantifier in the first part of the above expression is interesting. Sober argues that it is not enough for a cause to increase the likelihood of an event in most circumstances, but decrease it in others (even in a minority of cases). If you allow any negative cases, he argues, the causality claim reduces to P(y|x) > P(y|!x), which simply represents correlation. So to have a definition of causality stronger than mere correlation, the first event must raise the probability of the second event, or be neutral, in all circumstances, and must positively raise it (strict inequality) in at least one.
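Stated operationally, the condition is mechanical to check. The sketch below is my own paraphrase in code (the function and variable names are mine, not Sober's); it takes a table giving P(y | x-state, z) for each background condition z:

```python
def is_causal_factor(p_y, backgrounds):
    """Sober's condition: x is a (positive) causal factor of y iff
    P(y|x,z) >= P(y|!x,z) for every background condition z, with
    strict inequality for at least one z.

    p_y[(x_state, z)] holds P(y | x_state, z), where x_state is
    True for x occurring and False for !x.
    """
    weak = all(p_y[(True, z)] >= p_y[(False, z)] for z in backgrounds)
    strict = any(p_y[(True, z)] > p_y[(False, z)] for z in backgrounds)
    return weak and strict

# A made-up example with two background conditions, z1 and z2:
p_y = {(True, "z1"): 0.9, (False, "z1"): 0.4,  # x raises P(y) under z1
       (True, "z2"): 0.2, (False, "z2"): 0.2}  # x is neutral under z2
print(is_causal_factor(p_y, ["z1", "z2"]))     # True

p_y[(True, "z2")] = 0.1                        # now x lowers P(y) under z2
print(is_causal_factor(p_y, ["z1", "z2"]))     # False
```

Note that a single background condition under which x lowers the probability of y is enough to defeat the claim, no matter how much x raises it elsewhere.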

Sober raises some issues with the above formulation regarding the causal independence of the background conditions from the factor being examined. Specifically, he requires that the background events (z) not be "causally relevant" (either positively or negatively) to the proposed cause being investigated (x). If they are, this leads to undefined conditional probabilities of the form P(y|x ⋅ !x). Conceptually, this represents a form of double counting. Recasting this requirement in quantificational form gives something like:

x CF y <=> ∀z (z ∈ B => P(y|x ⋅ z) ≥ P(y|!x ⋅ z)) ⋅ ∃z (z ∈ B ⋅ (P(y|x ⋅ z) > P(y|!x ⋅ z)))

B = {z: !(x CF z) ⋅ !(z CF x) ⋅ !(x CF !z) ⋅ !(z CF !x) ⋅ (z ≠ x) ⋅ (z ≠ y)}

The last two inequalities are necessary to avoid undefined conditional probabilities in P(y|!x ⋅ x), and because the strict inequality P(y|x ⋅ y) > P(y|!x ⋅ y) is always false.

The set notation above is kind of nasty, because it carries the "free variables" x and y outside of the expression. But we can eliminate this notation by expanding the independence criterion in place (although it gets a little unwieldy):

x CF y <=> ∀z ((!(x CF z) ⋅ !(z CF x) ⋅ !(x CF !z) ⋅ !(z CF !x) ⋅ (z ≠ x) ⋅ (z ≠ y)) => P(y|x ⋅ z) ≥ P(y|!x ⋅ z))
⋅ ∃z ((!(x CF z) ⋅ !(z CF x) ⋅ !(x CF !z) ⋅ !(z CF !x) ⋅ (z ≠ x) ⋅ (z ≠ y)) ⋅ (P(y|x ⋅ z) > P(y|!x ⋅ z)))

Now that certainly looks circular. I hesitate only because I'm not 100% certain that it is impossible to iteratively expand the "CF" terms, at least in a finite universe. I took a stab at it in a universe containing only 4 events {x, y, B1, B2}, but the problem quickly exceeded my limited powers of symbolic manipulation. But, anyway, I think the definition is circular.

Actually, I don't know why I even bother to fret over it, since Sober himself admits that it is a circular definition – but he argues that it is a useful definition, anyway. He references a 1979 Noûs article by N. Cartwright, which supposedly goes into this in more depth. It would be interesting to read that, but I have no easy way of tracking it down.

I am not going to dispute that a definition can be conceptually useful, even if circular, outside certain strictly formal contexts. But I think we need to ask in each case where the circularity comes from, and why it is necessary and/or useful. In this case, I have a suspicion that it is because we have an underlying, intuitive definition of causation that has nothing to do with the definition that is being attempted here. This is also reflected in my sense that this idea is only useful if we "prune" it somehow, as suggested by my reference to a "finite universe" above. For another example of pruning, I think we are only interested in background conditions that have some causal effect themselves – totally neutral conditions are not interesting. In other words:

B = {z: !(x CF z) ⋅ !(z CF x) ⋅ !(x CF !z) ⋅ !(z CF !x) ⋅ (z ≠ x) ⋅ (z ≠ y) ⋅ (z CF y)}

But how do we prune the universe of possible events, other than by applying some other, a priori, theory of causation? And in that case, how is Sober's causation test any different from just using correlation as an empirical test of the a priori theory?

I'd be the first to admit that my reasoning above is a little mushy. But a specific example, I think, shows that Sober's definition doesn't quite jibe with our intuitive ideas about causation, and may not ultimately be satisfactory as a definition of population-level causation.

Imagine a rectangular pool table, with the long axis oriented north-south. From time to time, billiard balls are introduced approximately on the 1/3rd line (the imaginary line dividing the southern 1/3 of the table from the northern 2/3). Some of these balls will be struck with a cue. The horizontal angle with which the cue strikes the ball is normally distributed such that 90% of the variation is within ± 70 degrees of the mean, which is to the north. There is friction in the table, and spin (variation in the incident angle of the cue with respect to the radial angles from the center of the balls to the point of impact), and the impulse imparted by the cue is finite, so that a ball may strike a side wall, or other obstruction, and come to rest before hitting the north wall. As balls accumulate, they may strike, or be struck by, other balls. Impacts are approximately elastic (with frictional/damping losses). Additionally, a number of bumpers are introduced between the 1/3rd line and the north wall. The precise position of the bumpers is varied, from time to time. Usually, when a ball strikes a bumper, it will be a glancing blow, and the ball will continue in a generally northerly direction; however, occasionally the impact will be square enough that the ball will rebound to the south. This rebound may be sufficient to carry the ball south of the 1/3rd line (e.g., if the obstruction is close to the line).

Finally, the entire table will be lifted and tilted, from time to time (but infrequently), either to north or to south, with the conditional probability of a southward tilt, given that a tilt occurs, equal to 50%. The tilt is of finite (temporal) duration - i.e., there is a probability greater than zero but less than 1 that a ball on the table will reach the south wall during a south tilt. Note that any ball which has ended up south of the 1/3rd line due to a rebound has a greater probability of touching the south wall during a southward tilt than it did when it was first introduced into the game.

When a ball strikes either the north or south wall, it is removed from the game. Its probability of striking the other wall, at that point, is zero.

It is hard to say that, in this game, the impact of the cue is not a "causal factor" in increasing the percentage of the ball population that touches the north wall, even though there are some members of "B" (combinations of bumper location, ball position, cue angle, and other factors) under which the impact will, in fact, REDUCE the probability of reaching the north wall. But by Sober's definition of causality we would, in fact, need to make that claim.
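An abstract version of the table, with made-up numbers (mine, not Sober's), shows the conflict directly: one rare background condition under which the cue lowers the probability of reaching the north wall is enough to make Sober's condition fail, even though the cue raises that probability dramatically on average.

```python
# z = True stands for the rare background condition "ball lined up square
# with a bumper near the 1/3rd line", under which a cue strike rebounds
# the ball to the south. All probabilities below are invented for
# illustration.
P_Z = 0.05
P_NORTH = {                # P(reaches north wall | cue struck?, z)
    (True,  False): 0.80,
    (False, False): 0.10,
    (True,  True):  0.05,  # square hit: likely rebound to the south
    (False, True):  0.10,
}

# Averaged over backgrounds, striking the ball strongly raises the
# probability of reaching the north wall:
p_cue = (1 - P_Z) * P_NORTH[(True, False)] + P_Z * P_NORTH[(True, True)]
p_no_cue = (1 - P_Z) * P_NORTH[(False, False)] + P_Z * P_NORTH[(False, True)]
print(p_cue > p_no_cue)       # True: 0.7625 vs 0.10

# ...yet Sober's definition denies causal-factor status, because one
# background condition shows a decrease:
sober_says_causal = all(P_NORTH[(True, z)] >= P_NORTH[(False, z)]
                        for z in (False, True))
print(sober_says_causal)      # False
```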

I find myself inclined to throw out Sober's definition, or rather, to view it only as a sort of statistical test of some underlying idea of causality. I'm inclined, further, to view population-level causation as just an aggregation of individual causation-events (including those that have actually occurred, plus hypotheticals). "Squirrel kicks cause balls to sink sometimes, but most of the time they don't." So squirrel kicks are not considered a "cause" of successful putts, at the population level.

I don't think this has a negative effect on Sober's substantive arguments about the nature of selection. His arguments about "units of selection", for example, depend on the distinction between "selection of" some kind of entity and "selection for" some specific quality or trait. The question is, at what level does the cause of the selection operate? One (admittedly artificial) example he uses for illustration is to postulate several groups of otherwise similar organisms which are homogeneous within each group with respect to some quality - say tallness - but vary between groups. Suppose some predator differentially picks off the shorter organisms. Is this an example of group selection (the predator is selecting organisms from groups of short organisms), or individual selection (the predator selects shorter organisms)? The question cannot be resolved strictly by looking at results, because in either case there is selection of the same organisms. One needs to look at the cause - what trait is being selected for? Does the predator simply favor shorter animals? Or does it avoid tall groups of animals? A test of the causal assumptions would be to examine what would happen if a shorter organism happened to be found in a taller group. Would it be subject to the same level of predation as if it were in a group of small organisms? In that case, the individual selection model would be supported. Or would it have the same security as its taller group members? In that case, a group selection mechanism is indicated. Of course, this test might be empirically impossible, if this were a real-life example, but it illustrates the role that a concept of causality plays in determining the unit of selection. I see no way, though, in which this argument depends on Sober's specific formulation of his law for population-level causality.

P.S. I am not 100% certain how I feel about the explanatory necessity of the concept of "causation". For instance, if I say "an object subject to a given force F will experience an acceleration proportional to its mass", does it add anything useful to the explanation to say "the force causes the object to accelerate"? The idea of cause is important to Sober, and he bases a lot of his "units of selection" arguments on the concept. I have no dogmatic objection to this, but I can't help but wonder if the concept of causation isn't somehow reducible.

Saturday, December 5, 2009


I have two fundamental epistemological premises: that the evidence of my experience is the best available (really only available) data I have for learning about the world, and that most people who study, think, speak and write about the world are not intentionally lying. These seem to be pragmatically a minimal set. I don’t see how one can practically set forth on the project of learning without them.

Note that the use of the words “most” and “intentionally” in the second premise imply two corollaries: some people are lying, and some people may be unintentionally stating mistruths. (In fact, I might argue that we are all unintentionally stating mistruths to a greater or lesser extent, but that would be more of a theorem than a premise.) Also, saying experience is the “best available” data doesn’t imply that it yields infallible insight.

The two premises do not form a complete (i.e., sufficient) set. All they really say is that I can trust what I experience, and what people tell me about what they experienced (including second or third hand, etc. reports) – but with a grain or two of salt. They don’t say anything about how to come up with that grain of salt, or to know how many grains to apply. They don’t, in other words, tell me how to distinguish the veracity of conclusions I draw from these sources, or how to distinguish between competing theories. They don’t specify any rules of inference, at all.

I’m afraid all I can say about making distinctions is, “It’s ad hoc.” I am no Descartes, to offer a single unified answer to the question of how to distinguish true ideas from false ones. Certainly, I do not believe that because I can hold some idea “clearly and distinctly” that it must be true (although it might suggest truthfulness prima facie). Instead, it’s more a matter of how well does an idea “fit in” with the other body of ideas I have constructed, over time, from the same evidence. “Consistency”, in a word. But how do I decide if an idea is consistent? Certainly not by the law of the excluded middle. I am quite convinced that it is possible for a thing to be both A and not A. The clearest examples come from human emotions: do I want to spend a month’s vacation in Venice this year, even though the press of work before and after will be terrible, it will cost a lot of money, my Italian is rusty, and I will have to find a house-sitter and/or worry about my pets and everything else in my house? I do, but I don’t. Fuzzy logic may offer better (if inherently less certain) models. But I am convinced that real antinomies can also be supported, as matters of fact (at least as humans perceive fact), in the real world. Or at least, I’m not convinced that they can’t.

Ad hoc. I know it when I see it. Maybe. More or less. (I do, but I don’t.) Kind of like Descartes, perhaps, except I substitute “vague and fuzzy” for “clear and distinct”?

This could be depressing, if, like many philosophers past, I desired the nature of my mind (or soul) to approach some ideal of perfection – to make me like a god. But I don’t believe in gods. Rather than being depressed at failing to approach a fictional divinity, I prefer to celebrate the humanness of it all. Because this messy, ad hoc, but often very effective process of distinction is the stuff of life, after all, and quintessentially human, if only because humans, by and large, do it exceptionally well. Not that we do it infallibly – there are a lot of people in the world who are dead certain of things about which I am certain they are dead wrong. But by and large, in the billions of tiny, every day distinctions and decisions we make over the course of our lives, we do mostly pretty well.

We do this, of course, because we’ve been programmed that way by natural selection. Our brains have evolved to do a job, and they do it rather well (just as flies fly very well, and frogs do an excellent job of catching them). We have certain decision making processes built into our equipment. By studying our thinking in a natural scientific sort of way, it is possible to get clues as to what they are. Philosophers, I think, who have tried to set rules of thought start with some biological rule and then codify it – so clarity and distinctness counts, biologically, as evidence, and so, on some level, does the excluded middle. But we can’t stop there. We move on to fuzzy logic, paradigmatic categories... and who knows how far beyond?

My guess is that, as in most things, the brain works by having a bunch of rules, without any necessary regard as to whether they are consistent in any a priori theoretical sense. Different rules are stimulated by a particular experience, others suppressed, memory of past experience and feedback loops are brought into play, until the system “settles” in some state (“settles” is a relative term for a system that is constantly in motion), and we feel that this “makes sense” or doesn’t. This is what I mean by “consistent” with the rest of my body of knowledge. It is, in fact, the biological basis of, in a sense the definition of, “consistency”. The rules exist because, at some time in the past, they have been found helpful in negotiating the world – they have been empirically proven. They may be “hard coded” rules, proven in the dim historical past of our heritage, but, again like most things in the brain, the “hard coded” rules can be modified, and new rules created, by our individual experience. And such learned rules may be passed on to subsequent generations via the “Lamarckian” evolutionary process represented by our culture and its systems of education.

Thinking in this natural historical way about distinction and rules of inference, etc., may not “prove” validity, in the sense that philosophers have traditionally sought such proofs. But it may give pretty damn’ good evidence of empirical functionality. And, I would argue that this empirical, matter-of-fact kind of “proof” is most suitable to our real-life existence as human beings in a material world, even if it fails for some fictional existence as souls aspiring to a divine one. If philosophy is the pursuit of the “good” and if good must be good for something, then this is the kind of knowledge and truth that is “good for humans”.