This Guest Blog is rather different. It’s about statistical analysis and written by a respected scientist, Professor Jeremy Greenwood. In his Guest Blog Professor Greenwood criticises the statistical basis of the UK government’s opposition to a ban on neonicotinoid pesticides during the tenure of Owen Paterson as Secretary of State for Environment, Food and Rural Affairs.
This paper is considerably more technically demanding than my usual scribblings about Hen Harriers or farmland birds. The reason it is being published here is that it touches on a subject of importance and of current debate. It’s the type of thing that is difficult to publish in the scientific or statistical literature, and even more difficult to publish quickly enough for it to have an impact on the policy debate. This blog’s readership will be interested in the punchline of the paper, which is that one of the main published scientific papers on which the UK government’s objection to a ban on neonicotinoid pesticides was based is deeply flawed and essentially worthless as a basis for policy decisions. Note that Professor Greenwood asked to see the original dataset to perform further statistical analysis but was refused access; in my experience, this is contrary to scientific etiquette and procedure.
Professor Greenwood is quite blunt in his assessment of the paper by Pilling and others. He states that the decision not to carry out a formal statistical analysis is ‘absolutely unacceptable’, that ‘the experiment has told us nothing’ and that ‘It is difficult to understand how the work came to be published in a refereed journal.’ This is blunter than Godfray et al. in a very recent paper which reviews the same paper (and others), and blunter than the Parliamentary Office of Science and Technology briefing on the subject.
I’m grateful to Professor Greenwood for this paper. I am happy to consider responses to it from Syngenta or from Defra (or actually from anyone else), perhaps from the Defra Chief Scientist Ian Boyd, if they would like to comment on or correct anything that they see as an error.
Professor Jeremy Greenwood is a former Director of the British Trust for Ornithology and is an Honorary Professor attached to the Centre of Ecological and Environmental Monitoring at St Andrews.
1. Use and environmental distribution
Neonicotinoids are the most widely used insecticides in the world. They are toxic to most arthropods. They are widely applied as seed dressings because they act systemically, protecting all parts of the crop.
Neonicotinoids can persist and accumulate in soils. They leach into waterways. They are found in the nectar and pollen of treated crops. Concentrations in soils, waterways, field margin plants, nectar and pollen have commonly been found to be at levels sufficient to control crop pests; they commonly exceed the LC50 (the concentration which kills 50% of individuals) for beneficial organisms.
2. Apparent effects on bees
There have been many studies in the lab and, increasingly, in combinations of lab and field conditions on the possible impacts of neonicotinoids on both honey bees and bumble bees. They indicate that concentrations in nectar and pollen in crops are not lethal to bees but are sometimes sufficient to reduce their ability to learn, to forage and to find their way back to the hive. There is evidence that they can reduce the survival of colonies. The general conclusion of these studies is that, together with such things as the loss of flower-rich habitat and the Varroa mite, neonicotinoid use may have caused declines in bee populations.
The manufacturers say that only studies carried out entirely in the field are valid. This is a curious reversal of arguments in respect of the impact of organochlorine pesticides on birds in the 1960s, when manufacturers said field studies were of no value and that all effects had to be demonstrated in the lab.
3. Following a scientific report by the European Food Safety Authority, the EC has imposed a 2-year moratorium on the use of three neonics from 1 December last year. The UK government opposed this on the grounds that there was no evidence of deleterious effects from purely field trials.
4. One study under field conditions was done in Britain, by the Food & Environment Research Agency (Fera), an Executive Agency of Defra.
Thompson H, Harrington P, Wilkins W, Pietravalle S, Sweet D, Jones A. 2013 Effects of neonicotinoid seed treatments on bumble bee colonies under field conditions. See http://www.fera.co.uk/ccss/documents/defraBumbleBeeReportPS2371V4a.pdf. (Note that this work has not appeared in the peer-reviewed literature).
To determine the effects of neonics on bumblebees, colonies were placed adjacent to oilseed rape fields in which the seed had been treated with one of two neonics or with none. Unfortunately, there was no difference in the amount of neonics picked up by bees adjacent to treated and untreated fields. This was probably because the bees foraged beyond the immediately adjacent fields, so those next to untreated fields picked up neonics from treated fields. Thus the experiment was incapable of showing anything useful about the effects of neonic exposure on the colonies.
The lead scientist involved in the Fera research (Dr Helen Thompson) moved to Syngenta (a major manufacturer of neonics) shortly after the report on it was made public (Damian Carrington, The Guardian, Friday 26 July 2013).
5. Another study under field conditions was done in France, by Syngenta.
Pilling E, Campbell P, Coulson M, Ruddle N, Tornier I. 2013 A four-year field program investigating long-term effects of repeated exposure of honey bee colonies to flowering crops treated with thiamethoxam. PLoS ONE 8, e77193. doi:10.1371/journal.pone.0077193.
Honeybee colonies were placed beside thiamethoxam-treated or control fields of maize (three replicates) or oilseed rape (two replicates). Bees from treatment hives had higher concentrations of insecticide residues. But the authors stated that the results “show no evidence of detrimental effects on colonies that were repeatedly exposed over a four-year period to thiamethoxam residues in pollen and nectar, following seed treatment of oilseed rape and maize.” This is true, but their conclusion that the measures of colony performance “were similar between treatment and control colonies” does not follow. They conducted no statistical analyses of their data, nor did they present their results in a way that would allow others to make a proper assessment of their claim.
6. Hand-waving is no substitute for formal statistical analysis.
The reason that formal statistical analysis was not undertaken was said to be that, because the sample sizes were so small, “such an analysis would lack the power to detect anything other than very large treatment effects, and it is clear from a simple inspection of the results that no large treatment effects were present. Therefore a formal statistical analysis … would be potentially misleading”. This is absolutely unacceptable.
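The claim that a formal analysis “would lack the power to detect anything other than very large treatment effects” can be quantified rather than asserted. The sketch below is purely hypothetical. It assumes normally distributed colony scores with a control mean of 100, a between-colony standard deviation (`sd`) of 20 units, three colonies per group (as in the maize part of the design), and a pooled two-sample t-test at the 5% level. None of these figures come from Pilling et al., and the names `simulated_power` and `true_reduction` are illustrative only.

```python
import random

N_PER_GROUP = 3  # hypothetical: three colonies per treatment group
# Two-sided 5% critical value of Student's t on 4 degrees of freedom,
# i.e. (3 - 1) + (3 - 1), as used by a pooled two-sample t-test.
T_CRIT_DF4 = 2.776


def mean(xs):
    return sum(xs) / len(xs)


def sample_var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def pooled_t(a, b):
    """Pooled-variance two-sample t statistic for mean(a) - mean(b)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * sample_var(a) + (nb - 1) * sample_var(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5


def simulated_power(true_reduction, sd, reps=10_000, seed=1):
    """Share of simulated experiments reaching p < 0.05 (two-sided).

    Hypothetical model: control scores ~ Normal(100, sd) and treated
    scores ~ Normal(100 - true_reduction, sd), independently per colony.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        control = [rng.gauss(100.0, sd) for _ in range(N_PER_GROUP)]
        treated = [rng.gauss(100.0 - true_reduction, sd) for _ in range(N_PER_GROUP)]
        if abs(pooled_t(treated, control)) > T_CRIT_DF4:
            hits += 1
    return hits / reps


for reduction in (10, 20, 40):
    print(f"true reduction {reduction}: power about {simulated_power(reduction, sd=20):.2f}")
```

Repeating the simulation over plausible values of `sd` and `true_reduction` amounts to a power analysis. Low power is a number that can be calculated and reported alongside the analysis. It is not a reason to skip the analysis, and it tells the reader how little a non-significant result can mean.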
The usual way to analyse experiments such as that reported by Pilling et al. is to conduct a significance test of the difference between the means of the treated and untreated subjects. In such a test, one typically sets up a null hypothesis that there is no underlying difference between treated and untreated subjects (colonies in this case), any difference apparent in the experiment being just a result of chance. Should the analysis indicate that the difference found in the experiment is too great to be reasonably attributed to chance, the result is considered significant and one rejects the null hypothesis. If, in contrast, the difference could reasonably be attributed to chance, the result is labelled non-significant, and in practice many people then accept the null hypothesis. However, in addition to the null hypothesis being true, there are two other reasons why a test result may not be significant: that there is great individual variation between colonies subject to the same treatment, or that there were too few replicate colonies. Forgetting these other two possibilities, people commonly interpret a non-significant result to mean that the null hypothesis is true. When sample sizes are as small as those used by Pilling et al., such an interpretation would certainly be misleading; but it is the interpretation of the result, not the analysis itself, that would be misleading. In fact, accepting the null hypothesis on the strength of a non-significant result is an error. The correct interpretation is that the experiment has told us nothing. This might be embarrassing to those who designed the experiment, but the risk of embarrassment is no reason for not doing the right thing.
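The point about too few replicates can be made concrete with an exact permutation test of the difference in means. This is a swapped-in technique chosen only because it needs no distributional assumptions; Pilling et al. did no test of any kind. The colony values are invented. Under the null hypothesis the labels “treated” and “control” are arbitrary, so every split of the pooled colonies into groups of the observed sizes is equally likely. With three colonies per group (the maize design) there are only 20 such splits. Even when every treated colony scores far below every control colony, the two-sided p-value is therefore 2/20 = 0.1. With two per group (the oilseed rape design) it is 2/6.

```python
from itertools import combinations


def mean(xs):
    return sum(xs) / len(xs)


def permutation_p_value(treated, control):
    """Exact two-sided permutation test of the difference in group means.

    Enumerates every way of splitting the pooled colonies into a group the
    size of `treated` and a group the size of `control`, and returns the
    share of splits whose absolute mean difference is at least as large as
    the one actually observed.
    """
    pooled = list(treated) + list(control)
    k = len(treated)
    observed = abs(mean(treated) - mean(control))
    as_extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), k):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-9:
            as_extreme += 1
    return as_extreme / total


# Invented numbers: complete separation of treated and control colonies.
print(permutation_p_value([60.0, 62.0, 65.0], [100.0, 104.0, 110.0]))  # prints 0.1
print(permutation_p_value([60.0, 62.0], [100.0, 104.0]))  # prints 0.3333333333333333
```

With designs this small, this particular test cannot reach p < 0.05 however large the true effect. A non-significant result from it therefore says nothing about whether the null hypothesis is true.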
To avoid being misled by wrongly interpreted significance tests, Pilling et al. used “expert interpretation of the data by scientists with experience in undertaking such trials”. But however experienced the experts are, they have no more data before them than would be used in a statistical analysis. Expert interpretation of the results of a statistical analysis is important, but without the formal analysis it becomes an exercise in mere hand-waving and obfuscation.
Professional statisticians nowadays tend not to carry out significance tests. They prefer to estimate the magnitude of the difference between treated and untreated subjects and to place confidence limits on the estimate. The limits provide an idea of the range of values in which the underlying true difference between treated and untreated is likely to lie, so the approach is more informative than the simple significant/non-significant dichotomy of an hypothesis test. Pilling et al. should have adopted this approach. Alternatively, if they wished to use significance tests, they should have done what the originators of hypothesis testing, Jerzy Neyman and Egon Pearson, advised: set up both a null hypothesis and an alternative hypothesis. For the bee experiment, the latter might have taken the form “There is a difference in colony performance of X% or more.” There are then four possible outcomes, three of which tell us something useful:
a. Both tests significant: “There is a difference but it is less than X%”.
b. Null significant, alternative not: “There is a difference and it could be X% or more”.
c. Alternative significant, null not: “Any difference is less than X%; it may be zero”.
d. Both non-significant: “There may be no difference or it may be X% or more”.
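The four outcomes can be read off a single confidence interval for the treated-minus-control difference, which ties the two recommended approaches together. The null hypothesis is rejected when the interval excludes zero. The alternative “a difference of X or more” is rejected when the whole interval lies between −X and +X. This is a minimal sketch under stated assumptions, not the method of any paper discussed here. The colony scores are hypothetical and scaled so that the control mean is near 100, so a difference of X units reads as roughly X%. The interval is a pooled-variance t interval, and the alternative is taken to cover a difference of X in either direction. A 95% interval is used for both checks, which makes the “less than X%” check slightly conservative; a formal equivalence test at the 5% level would use a 90% interval. The function names are illustrative.

```python
import math

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def mean(xs):
    return sum(xs) / len(xs)


def sample_var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def diff_ci(treated, control):
    """Treated-minus-control difference in means with a pooled 95% t interval."""
    nt, nc = len(treated), len(control)
    df = nt + nc - 2  # this sketch's lookup table only covers df 1 to 10
    sp2 = ((nt - 1) * sample_var(treated) + (nc - 1) * sample_var(control)) / df
    se = math.sqrt(sp2 * (1 / nt + 1 / nc))
    d = mean(treated) - mean(control)
    half_width = T_CRIT_95[df] * se
    return d, d - half_width, d + half_width


def two_hypothesis_verdict(lo, hi, x):
    """Map an interval (lo, hi) onto outcomes a to d for threshold x."""
    null_rejected = lo > 0 or hi < 0   # interval excludes zero
    alt_rejected = -x < lo and hi < x  # interval lies wholly inside (-x, x)
    if null_rejected and alt_rejected:
        return f"a. There is a difference but it is less than {x:g}%"
    if null_rejected:
        return f"b. There is a difference and it could be {x:g}% or more"
    if alt_rejected:
        return f"c. Any difference is less than {x:g}%; it may be zero"
    return f"d. There may be no difference or it may be {x:g}% or more"


treated = [88.0, 97.0, 105.0]   # hypothetical scores for 3 treated colonies
control = [95.0, 102.0, 108.0]  # hypothetical scores for 3 control colonies
diff, lo, hi = diff_ci(treated, control)
print(f"difference {diff:+.1f}, 95% CI {lo:.1f} to {hi:.1f}")  # difference -5.0, 95% CI -22.2 to 12.2
print(two_hypothesis_verdict(lo, hi, x=10.0))  # d. There may be no difference or it may be 10% or more
```

With three colonies per group, the interval in this invented example runs from a reduction of about 22 to an increase of about 12. The result lands in outcome d: the experiment has told us nothing.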
7. Inadequate reporting of the data.
Pilling et al. present graphs that give an idea of the magnitude of the differences between treated and control colonies, and they tell us how many replicates they used. All we need in order to judge whether their conclusion, that the measures of colony performance “were similar between treatment and control colonies”, is justified is an idea of the extent of the differences between the two or three replicates. Unfortunately, their graphs do not reveal this. I have asked for a data set on which to test formal analyses but have been refused access.
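The importance of replicate-level figures can be shown with two invented data sets. Both would plot identically as group means (control 100, treated 80, three colonies each), but they differ in how much the replicates vary. The names `tight` and `loose` and all values are hypothetical. The sketch uses the pooled-variance standard error of the difference in means.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def sample_var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def se_of_difference(treated, control):
    """Pooled-variance standard error of mean(treated) - mean(control)."""
    nt, nc = len(treated), len(control)
    sp2 = ((nt - 1) * sample_var(treated) + (nc - 1) * sample_var(control)) / (nt + nc - 2)
    return math.sqrt(sp2 * (1 / nt + 1 / nc))


# Hypothetical data: identical group means, very different replicate spread.
tight = {"treated": [79.0, 80.0, 81.0], "control": [99.0, 100.0, 101.0]}
loose = {"treated": [50.0, 80.0, 110.0], "control": [70.0, 100.0, 130.0]}

for name, data in (("tight", tight), ("loose", loose)):
    diff = mean(data["treated"]) - mean(data["control"])
    se = se_of_difference(data["treated"], data["control"])
    print(f"{name}: difference {diff:+.1f}, SE {se:.2f}, t = {diff / se:.2f}")
```

In the first case the 20-unit difference is about 24 standard errors. In the second it is less than one (|t| ≈ 0.82, against a 5% critical value of 2.776 on 4 degrees of freedom). The same means support opposite conclusions, which is why the variation between replicates has to be reported.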
With only two or three replicates, no formal statistical analysis, and results published in a form so aggregated that it is impossible to assess the variation between replicates, the conclusion that “mortality, foraging behavior, colony strength, colony weight, brood development and food storage levels were similar between treatment and control colonies” is a clear overstatement, unless one defines ‘similar’ so loosely as to be scientifically meaningless. It is difficult to understand how the work came to be published in a refereed journal.