Why we shouldn’t use formation counts to correct the fossil record

New paper out in Palaeontology: it’s Open Access, so read it here.

We use empirical and simulated data, along with information transfer statistics, to show that using formation counts as a sampling proxy to model bias in the fossil record can lead to erroneous biodiversity estimates. These estimates may be further from the truth than raw diversity extracted directly from the fossil record. If the paper is a little technical for your taste, try reading the summary below:

What is a sampling proxy?

A sampling proxy is a metric that represents the collecting effort of palaeontologists. It should cover some or all of the geological and human factors that can introduce error into interpretations of data from the fossil record (Benton et al., 2011).

What are formation counts?

A formation count is simply the number of named geological formations per time period. A count of named formations can be used as a proxy for sampling because it supposedly provides a summary of rock volume, habitat heterogeneity, geographic and temporal dispersion, and research effort (Benson & Upchurch, 2013). In our study, we quantify formation counts in four different ways: (1) total fossiliferous formations = the total number of named geological formations that contain fossils per time period; (2) clade-bearing formations = the total number of named geological formations that contain fossils of the clade whose diversity we are interested in; (3) wider clade-bearing formations = the total number of named geological formations that contain fossils of the wider clade to which our clade of interest belongs; (4) potential clade-bearing formations = the total number of named geological formations that could potentially yield fossils of our clade of interest.
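To make the four definitions concrete, here is a minimal sketch of how they could be tallied from a list of fossil occurrences. The records, taxon names, formation names and the `POTENTIAL` set are all invented for illustration and are not data from the paper. In particular, deciding which formations could "potentially" yield a clade is a judgement call that this toy example simply assumes has already been made.

```python
from collections import defaultdict

# Invented occurrence records: (time bin, formation name, taxon group).
OCCURRENCES = [
    ("bin1", "Fm A", "ammonite"),
    ("bin1", "Fm A", "bivalve"),
    ("bin1", "Fm B", "nautiloid"),
    ("bin1", "Fm C", "bivalve"),
    ("bin2", "Fm D", "ammonite"),
    ("bin2", "Fm E", "brachiopod"),
]

CLADE = {"ammonite"}                     # hypothetical clade of interest
WIDER_CLADE = {"ammonite", "nautiloid"}  # hypothetical wider clade containing it
# Hypothetical set of formations judged able to yield the clade of interest.
POTENTIAL = {"Fm A", "Fm B", "Fm D", "Fm E"}


def formation_counts(occurrences, taxa=None, formations=None):
    """Count distinct formations per time bin, with optional filters."""
    seen = defaultdict(set)
    for time_bin, formation, taxon in occurrences:
        if taxa is not None and taxon not in taxa:
            continue
        if formations is not None and formation not in formations:
            continue
        seen[time_bin].add(formation)
    return {time_bin: len(names) for time_bin, names in sorted(seen.items())}


total_fossiliferous = formation_counts(OCCURRENCES)
clade_bearing = formation_counts(OCCURRENCES, taxa=CLADE)
wider_clade_bearing = formation_counts(OCCURRENCES, taxa=WIDER_CLADE)
potential_clade_bearing = formation_counts(OCCURRENCES, formations=POTENTIAL)
```

Note that the clade-bearing count can only include a formation once a fossil of the clade has been found in it, which ties that count directly to the diversity data it is meant to correct.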

What is residual modelling?

Residual modelling (Smith & McGowan, 2007; Lloyd, 2012) is a method for removing sampling signal from a diversity data set. It is commonly used in palaeontology, particularly when sample sizes are too small for subsampling methods like rarefaction or Shareholder Quorum Subsampling. The relationship between a sampling proxy and diversity is obtained via a regression model. The model represents a scenario where sampling perfectly predicts diversity. The residuals, i.e. the remainder of the diversity data once the sampling signal has been accounted for, can then be interpreted as a biological signal in the absence of any sampling error.
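The core idea can be sketched in a few lines. This is a simplified illustration using ordinary least-squares regression of raw diversity on a formation count. It is not the exact procedure of Smith & McGowan (2007) or Lloyd (2012), and the numbers and function names are invented.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residual_diversity(proxy, diversity):
    """Observed diversity minus the diversity predicted from the proxy."""
    slope, intercept = fit_line(proxy, diversity)
    return [d - (intercept + slope * p) for p, d in zip(proxy, diversity)]


# Invented values, one per time bin.
formations = [4, 6, 5, 9, 7]
raw_diversity = [10, 15, 11, 22, 16]

residuals = residual_diversity(formations, raw_diversity)
```

Positive residuals mark time bins with more diversity than the proxy alone would predict, and negative residuals mark bins with less. Whether those departures are biologically meaningful depends entirely on whether the proxy really measures sampling.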

Why use simulated data?

By using simulated data, we have knowledge of both true and raw sampled diversity. This allows us to compare modelled, raw and true diversity through a time series. This is impossible in the real fossil record.
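As a toy illustration of why a known truth is useful, the sketch below generates a "true" diversity curve and then degrades it with a sampling rate that varies between time bins. The simulation design, parameter values and function names are assumptions made for this example and are much simpler than the simulations in the paper.

```python
import random


def simulate(n_bins=20, seed=1):
    """Return (true, raw) diversity per time bin from a toy simulation."""
    rng = random.Random(seed)
    true_diversity, raw_diversity = [], []
    richness = 50
    for _ in range(n_bins):
        # True richness follows a bounded random walk.
        richness = max(5, richness + rng.randint(-8, 8))
        # Each taxon is found with a bin-specific sampling probability.
        p_sample = rng.uniform(0.2, 0.9)
        found = sum(1 for _ in range(richness) if rng.random() < p_sample)
        true_diversity.append(richness)
        raw_diversity.append(found)
    return true_diversity, raw_diversity


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


true_div, raw_div = simulate()
raw_vs_true = pearson(raw_div, true_div)
```

Any corrected diversity curve can be scored against `true_div` in the same way, so a correction method can be judged on whether it moves the estimate closer to the truth than the raw data or further from it.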

What did we find?

In empirical marine fossil data, there are close correlations between clade-bearing formations and diversity across all clades. However, information transfer analyses show that diversity predicts formation counts just as well as formation counts predict diversity. This shows that close correlation between formation counts and diversity in the fossil record is more likely a result of information redundancy than strong evidence for sampling bias.

In a simulated fossil record, despite close correlation between clade-bearing formations and diversity, modelled residual diversity based on clade-bearing formations is less accurate than raw diversity. All other types of formation count yield residual diversity estimates that are a slight improvement on raw diversity.

Simulated fossil data. True diversity (red solid), raw diversity (red dashed), modelled diversity based on potential clade-bearing formations (black solid), modelled diversity based on clade-bearing formations (black dashed).

It is evident that strong correlation between a sampling proxy and raw diversity is not necessarily strong evidence for sampling bias. However, the simulation results suggest that residual modelling can be used as a conservative method for correcting for sampling bias in the fossil record, but only if the correct sampling proxy is used. The most commonly used forms of the formation count sampling proxy, clade-bearing formations and total fossiliferous formations, produce the worst results. The best performing sampling proxy is potential clade-bearing formations, but this metric is difficult to define in the real fossil record.

It is inadvisable to use formation counts as a sampling proxy to correct raw diversity from the fossil record. Doing so rests on the spurious assumption that they are good representations of sampling regimes simply because they correlate closely with raw diversity.

Read the full, open access, original article: Dunhill, A.M., Hannisdal, B., Brocklehurst, N. and Benton, M.J. On formation-based sampling proxies and why they should not be used to correct the fossil record. Palaeontology, DOI: 10.1111/pala.12331.

Supplementary data available at: https://doi.org/10.5061/dryad.rb86d

 

