Sunday, 14 May 2017

Reading The Biogeography Part 1: Southwest Pacific Area Cladogram

Southwest Pacific Area Cladogram
Starting with area cladistics, I have today finished reading Ung V, Michaux B, Leschen RAB, 2017. A comprehensive vicariant model for Southwest Pacific biotas. Australian Systematic Botany 29: 424-439.
I am hoping to learn more about the reasoning behind panbiogeography and area cladistics by going through the relevant papers in the recent special issue of Australian Systematic Botany.
To the best of my understanding, the key steps of the study can be summarised in a very bare-bones fashion as follows. The authors...
  1. State that very little is still known about area relationships, as most research focuses on ancestral area inference for individual taxa.
  2. Summarise at length - over nearly four pages - the geological and tectonic history of the region. I cannot judge any of this at all and will consequently take it as given, although it is puzzling that no reference seems to be provided for the claim that the now largely submerged region had much more dry land when it broke away from Gondwana.
  3. Divide the study region into areas - the details don't matter for present purposes.
  4. Compile 76 phylogenies for plant and animal taxa occurring in the study region, and replace the species with the combination of areas in which they occur.
  5. Discuss the 'problems' of incongruence between the area relationships in these individual phylogenies, of terminal taxa occurring in several areas (which they call MASTs, for "multiple areas on a single terminal"), and of the same area occurring in different branches of a phylogeny, which results in what they call "taxonomic paralogy" and "paralogous nodes". They decide to exclude these confounding nodes and to use only "paralogy-free subtrees", obtained with a "transparent method" that I had not heard of before.
  6. Turn the trees into "three-item statements" and use those to produce a consensus area cladogram.
  7. Present the consensus area cladogram.
  8. Argue that one larger area that they had hypothesised is not a "real biogeographic entity" because it is paraphyletic on the area cladogram.
  9. Argue that New Caledonia's "highly endemic flora and fauna are ancient" because of its "basal" position on the area cladogram. I am not sure that this follows, and am a bit concerned about the potential for scala naturae thinking here, but that is beside the point for now.
  10. Agree with panbiogeographer Michael Heads that any and all time-calibrated phylogenies are unreliable. Then they proceed to a lengthy attempt at time-calibrating their area cladogram based on plate tectonics.
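Step 6 is easier to follow with an example. A three-item statement such as (A,(B,C)) records only that areas B and C are more closely related to each other than either is to A. Here is a minimal Python sketch of how a single rooted area cladogram breaks down into such statements, assuming that every terminal is already a single area (that is, after the paralogy-free subtrees have been extracted). The areas A to D and the nested-tuple tree format are hypothetical, and the sketch does not attempt the weighting or consensus steps, let alone reproduce the authors' actual software.

```python
from itertools import combinations


def leaves(node):
    """Return the set of area names below a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def three_item_statements(tree):
    """Decompose a rooted area cladogram, written as nested tuples, into
    three-item statements (outgroup, frozenset({x, y})), meaning that areas
    x and y are more closely related to each other than either is to outgroup."""
    statements = set()

    def walk(node):
        if isinstance(node, str):
            return
        below_node = leaves(node)
        for child in node:
            inside = leaves(child)
            # An area under this node but outside this child joins any pair
            # inside the child only at this node, so it is their outgroup.
            for x, y in combinations(sorted(inside), 2):
                for outgroup in below_node - inside:
                    statements.add((outgroup, frozenset((x, y))))
            walk(child)

    walk(tree)
    return statements


# Hypothetical, fully resolved area cladogram: (A,(B,(C,D)))
example = ("A", ("B", ("C", "D")))
for outgroup, pair in sorted(three_item_statements(example),
                             key=lambda s: (s[0], sorted(s[1]))):
    print(f"({outgroup},({','.join(sorted(pair))}))")
```

Running it lists the four statements (A,(B,C)), (A,(B,D)), (A,(C,D)) and (B,(C,D)), one per triplet of areas. Statements from many such trees are what the consensus area cladogram is then built from.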
I would like to explore in a bit more depth items #1, #5 and #10.

Does the concept of area relationships even make sense?

I cannot say that this paper has me convinced. To quote a few sentences where the authors themselves discuss problems:
In real-world situations, individual areagrams are unlikely to be congruent with each other and the problem, therefore, arises as to how best to deal with this incongruency [sic]. The main sources of incongruency [sic] are the occurrence of widespread taxa (multiple areas on a single terminal, or MASTs, for short), redundant areas (resulting in taxonomic paralogy), missing areas and inadequate methods of analysis (dos Santos 2011). Redundancy, the repeated occurrence of the same area in different branches on the areagram is nigh on universal and results in paralogous nodes. [...] [These] yield no information about area relationships and obscure the real relationships between areas.
Honestly, when I read this, I am drawn to a very different conclusion from the authors' decision to exclude all "paralogous nodes": maybe there is so much noise because stuff moves around too much. In other words, the concept of an areagram or area cladogram makes exactly as much sense as trying to force the members of a single sexually reproducing animal population into a phylogenetic tree. Where there is no phylogenetic structure, phylogenetic trees are not an appropriate representation of the data.

Another issue I wonder about is the use of the term paralogy in this context. The word comes from gene evolution. Imagine that a gene was duplicated in a distant ancestral species, and that the two copies, A and B, subsequently evolved different functions. (This is, of course, one of the main ways in which new genes come into existence.) All descendant species inherit both genes. If we now look at a set of descendant species and want to figure out their relationships, we need to make sure that we compare only the A copies or only the B copies. Comparing the A copy from one descendant with the B copy from another would mislead the analysis, because those two copies split at the duplication and not when the species diverged; the A and B copies are called paralogues of each other, and the A copies from different species are called orthologues of each other.
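The gene-tree logic can be made concrete. Under the standard definition, two genes are orthologues if their most recent common ancestor sits at a speciation event, and paralogues if it sits at a duplication event. The sketch below encodes the hypothetical A/B example as a small gene tree with labelled events; the tree format, the species names sp1 and sp2 and the helper functions are my own invention for illustration.

```python
# Hypothetical gene tree as (event, children); leaves are "copy_species" strings.
# A duplication produced copies A and B; a later speciation split sp1 from sp2,
# and each species inherited both copies.
GENE_TREE = ("duplication", [
    ("speciation", ["A_sp1", "A_sp2"]),  # descendants of copy A
    ("speciation", ["B_sp1", "B_sp2"]),  # descendants of copy B
])


def contains(node, gene):
    """True if gene is a leaf somewhere below node."""
    if isinstance(node, str):
        return node == gene
    _, children = node
    return any(contains(child, gene) for child in children)


def lca_event(node, gene1, gene2):
    """Event label at the most recent common ancestor of two genes within node."""
    _, children = node
    for child in children:
        if not isinstance(child, str) and contains(child, gene1) and contains(child, gene2):
            return lca_event(child, gene1, gene2)
    return node[0]


def relationship(tree, gene1, gene2):
    """Classify a gene pair by the event at which the two genes split."""
    return "orthologues" if lca_event(tree, gene1, gene2) == "speciation" else "paralogues"


print(relationship(GENE_TREE, "A_sp1", "A_sp2"))  # orthologues
print(relationship(GENE_TREE, "A_sp1", "B_sp2"))  # paralogues
```

Comparing A_sp1 with B_sp2 traces their split back to the duplication rather than to the speciation, which is exactly the comparison that would mislead an analysis of species relationships.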

What I do not understand is how the situation in areagrams is supposed to be equivalent enough to justify using the same terminology. Areas are not genes that are inherited by species lineages. At best, it is the other way around: if the assumptions of area cladistics are true (which I doubt), then species lineages are comparable to genes inherited by areas. The equivalent of mistaking two paralogues for orthologues in genetics would then be to treat two species lineages in different areas as orthologues although they had already diverged before continental breakup.

But the authors use the word in the former sense, applying it to areas on a phylogeny rather than to lineages in areas. This use of genetic terminology is rather confusing, I have to say.

What is the problem with time-calibrated phylogenies?

The open access de Queiroz paper in the same issue does a good job of discussing Heads' and the present authors' criticism of molecular dating, so just very quickly, there are two arguments here:

First, that
using substitution rates derived from modern taxa and then applying them over evolutionary time, often to groups only distantly related, is not justifiable
This is true as far as it goes, but the problem is that, to the best of my understanding, for the conclusions favoured by Heads to be realistic, substitution rates would have to be off by an utterly unrealistic factor. We are talking about cases where he sees a divergence as having happened tens of millions of years ago when the molecular data say a few million years. And why would we assume such massive shifts, conveniently in just the direction needed to make vicariance a viable explanation, and in the absence of any other argument? Sorry to say, but that looks a bit like ad-hoccery to me.
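To put rough numbers on "an utterly unrealistic factor": under the simplest strict molecular clock, the genetic distance d between two lineages grows as d = 2rt, because both lineages accumulate substitutions at rate r for t million years after they split. For a fixed observed distance, the inferred age and the assumed rate are therefore inversely proportional. The distance and rate below are hypothetical, and real dating analyses use relaxed clocks rather than this back-of-the-envelope formula, but the proportionality is what matters here.

```python
def divergence_time(distance, rate):
    """Strict-clock age (Myr) of a split. Both daughter lineages accumulate
    substitutions independently, so distance = 2 * rate * time."""
    return distance / (2 * rate)


def required_rate(distance, time):
    """Rate (substitutions per site per lineage per Myr) needed for a given age."""
    return distance / (2 * time)


d = 0.06   # hypothetical pairwise distance, substitutions per site
r = 0.01   # hypothetical rate, substitutions per site per lineage per Myr

print(f"Clock age: {divergence_time(d, r):.1f} Myr")
print(f"Rate must be {r / required_rate(d, 30):.0f}x slower for a 30 Myr split")
```

Pushing the hypothetical 3 Myr split back to 30 Myr requires the rate to have been about ten times slower than assumed, across the whole lineage and for its entire history.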

I hope this is not taken to be too inflammatory, but it reminds me of those young-earth creationists who are worried about the starlight problem and then argue that a few thousand years ago the speed of light must have been orders of magnitude higher. There is, indeed, a very practical parallel: just as the creationists in question do not take into account what such a change would do to other physical parameters (E = mc², meaning that our planet would have been incinerated), nobody in this case seems to consider what a massively different mutation rate would have done to the biology of the affected species.

The second argument is that
the same can be said for dating phylogenies using the age of the oldest fossil, which, despite giving only a minimum age for divergence, becomes a maximum estimate by proxy (Heads 2014b)
As has been discussed at length in rebuttals of Heads, including again in the aforementioned de Queiroz contribution, this is half nonsense and half, let me say, odd. It is nonsense in the sense that fossils are indeed used as minimum ages, not as maximum ages. I have myself recently used the function chronos from the R package ape to time-calibrate trees, and you simply tell the analysis that a divergence is no younger than so and so, and that's that. Admittedly, you generally also want to set some realistic maximum age for the entire tree, but that can be far older than any minimum age you set. In fact, I wrote a blog post about this stuff not too long ago.
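The logic of a fossil calibration is simple enough to write down. The following Python sketch is not the interface of chronos or of any other program; the class, the node names and the ages are hypothetical. A fossil contributes only a lower bound on the age of the node it calibrates, while the upper bound stays open unless there is an independent reason for one, such as a generous cap on the root.

```python
import math
from dataclasses import dataclass


@dataclass
class Calibration:
    node: str
    age_min: float               # Myr; e.g. the age of the oldest fossil
    age_max: float = math.inf    # Myr; open-ended unless there is a reason to cap it


def satisfies(node_ages, calibrations):
    """True if every calibrated node age falls within its [age_min, age_max] bounds."""
    return all(c.age_min <= node_ages[c.node] <= c.age_max for c in calibrations)


calibrations = [
    Calibration("genus_crown", age_min=20.0),           # hypothetical fossil: no younger than 20 Myr
    Calibration("root", age_min=20.0, age_max=100.0),   # hypothetical generous cap on the whole tree
]

print(satisfies({"genus_crown": 45.0, "root": 80.0}, calibrations))  # True
print(satisfies({"genus_crown": 15.0, "root": 80.0}, calibrations))  # False
```

A crown age of 45 Myr, more than twice the fossil age, passes; only ages younger than the fossil, or older than the cap on the root, are excluded. Nothing in this setup turns the fossil into a maximum.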

In Bayesian analyses, admittedly, there will necessarily be a limit to how much older than the fossil the results can realistically be, because calibration is usually done with priors. The user sets a prior probability distribution for the time of divergence; because those probabilities have to add up to 100% over all possible times, they will become so close to zero as to make no difference if we only go far enough back in time. It is, after all, impossible to stretch 100% out over infinitely many years and still have 10% per million years left.

But here is where the argument also gets distinctly odd. What Bayesian phylogeneticists do in practice is to put a relatively high probability around the age of the fossil and then have it peter out towards the past. The question now is: what else would one do? Is it not eminently reasonable to assume that the further into the past we go from the known existence of a lineage, the less likely it is that it already existed? Surely it is reasonable to assume that if the oldest known fossil of a plant genus is from 20 Mya, then it is quite likely that the genus already existed around, say, 21 Mya, a bit less likely that it existed 30 Mya, still less likely that it existed 50 Mya, and vanishingly unlikely that it existed as long as 200 Mya?
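One common way of encoding "likely a bit older than the fossil, increasingly unlikely to be much older" is an offset exponential prior: no probability at all of the lineage being younger than the fossil, the highest density at the fossil's age, and exponential decay into the past. Bayesian dating programs commonly offer this and similar shapes, such as offset lognormal priors. The Python sketch below uses the hypothetical 20 Mya fossil from the paragraph above and a mean of 10 Myr of additional age, which is my own arbitrary choice, and computes the prior probability that the genus is older than a given age.

```python
import math


def prob_older_than(t, fossil_age, mean_extra):
    """Prior probability that the divergence is older than t (Mya) under an
    offset exponential prior: hard minimum at fossil_age, density highest there
    and decaying into the past, with mean age fossil_age + mean_extra."""
    if t <= fossil_age:
        return 1.0  # the lineage is at least as old as its oldest fossil
    return math.exp(-(t - fossil_age) / mean_extra)


for t in (21, 30, 50, 200):
    print(f"P(older than {t} Mya) = {prob_older_than(t, 20.0, 10.0):.3g}")
```

With these assumptions the probabilities come out at roughly 0.90, 0.37, 0.05 and 1.5 × 10⁻⁸, mirroring the intuition for 21, 30, 50 and 200 Mya. The density at the fossil's age is exactly 10% per million years here, and because the whole distribution must integrate to one, that density has already halved about 7 Myr further back; a longer tail would only come at the price of a lower density near the fossil.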

The problem with time-calibrating a tree based on plate tectonics is, in turn, that it front-loads the analysis with the assumption that there is no dispersal between areas. For the purposes of the debate about vicariance versus dispersal, it is circular reasoning.

But to end on a positive note, despite approvingly citing panbiogeographers, the authors of the present paper do not actually seem to argue that dispersal between areas is impossible; they merely kick out the data that I would interpret as showing such dispersal in order to infer the 'real area relationships'. Admittedly, that could be seen as equivalent to kicking out all the genes I share with my father to claim that my genetic relationship with my mother is the 'real' one, but well, it still makes more sense to me than hostility to the mere possibility of dispersal!
