Four Dollars, Almost Five: May 2012

If I were to tell you that a medical study was done on the efficacy of prayer in which the authors presented data showing that there was no significant difference in mortality, you would probably agree that it is reasonable to conclude that, in this particular study, prayer did not save lives. That is because it is the correct conclusion as borne out by the data, and reasonable people tend to arrive at reasonable conclusions.

Unfortunately, John Fraser is not a reasonable person. He is far from it. He engages in what is known as 'massaging the data' - an intellectually dishonest approach in which data is erroneously interpreted in order to confirm a preconceived belief.

Data does not like being massaged, John Fraser!

The story goes like this...

About six months ago, over on the Premier forums, John Fraser spent a lot of time and effort trying to claim that a study from 1988 by Randolph Byrd showed that prayer saves lives, despite the data from the study directly contradicting his claim. Not only that, but he also continually tried to bully his way through the debate by claiming that I didn't understand what I was talking about, and he accused me of 'bluster' and 'hand-waving' a number of times. On the contrary, I am far more qualified in the areas of both medical research and medical publishing (two areas that the debate focused on) than he could ever hope to be, having worked in both fields for a combined total of over 10 years.

Go and read the full exchange if you want, but it was quite long, and veered off onto many tangents. I cannot possibly cover all of the problems with the paper itself, or with John Fraser's arguments, but the important details are presented below:

The paper in question was entitled "Positive Therapeutic Effects of Intercessory Prayer in a Coronary Care Unit Population" (Southern Medical Journal 81, no. 7 (1988):826-29). Straight off the bat, you will notice that there is no mention in the title of prayer having any effect on mortality, but instead it simply states that it has 'positive effects'. In medical publishing, it is the norm to include the primary finding of a study in the title in order to attract readers, e.g. "Drug X associated with increased survival in disease Y". If prayer had been shown to save lives in this study, you can be sure that it would have been specifically mentioned in the title.

The study itself separated patients into two groups, one of which received prayer from 3-7 unrelated Christians, while the other did not. This poses an immediate problem. Regardless of any biases we might have, I think we can all agree that in a study of the efficacy of prayer, it is quite important that the amount of prayer received by the patients in both groups is known and verified. How else can an effect be determined? All we know for sure is that the 'prayed for' group definitely received prayer - we cannot say that the 'not prayed for' group did not receive prayer. Thus, this critical variable in the study was not adequately controlled for. Byrd even admits this in the paper:

Several points concerning the present study should be mentioned. First, prayer by and for the control group (by persons not in conjunction with this study) could not be accounted for. Nor was there any attempt to limit prayer among the controls. Such action would certainly be unethical and probably impossible to achieve. Therefore, “pure” groups were not attained in this study — all of one group and part of the other had access to the intervention under study. [Emphasis mine]

Any single patient in the control group or test group could have had a large family or church congregation praying for them, or perhaps not. Who knows? Although it is likely that on average the 'prayer group' did indeed receive more prayer, we have no way of verifying this - the point being that such uncertainty casts doubt on the conclusions of the study. I suggested ways in which this might have been addressed, such as the patient or their families filling out a questionnaire about how religious they are or how often they pray, etc.

Following all the praying, the investigators compared the outcome of the patients in both groups by measuring 29 different variables, one of which was mortality. This information is contained in Table 2, which I have reproduced here (note that two different versions of Table 2 can be found online, in which the p-values are different, but the overall result remains the same regardless of which is used):

In total, out of 29 different categories tested, differences were found in only 6. Note that no difference was observed in overall mortality - I've circled this in red in the above table. NS stands for 'not significant'. This means that although the numbers in the two groups might be marginally different, this difference can be attributed to chance alone. Remember that these are not my interpretations of the results. These are the results as presented by the author himself. However, this didn't stop John Fraser from re-interpreting and massaging the data, as we will see below.

As a side point, 6 effects did show differences, including need for diuretics, need for antibiotics, need for respiratory intubation and/or ventilation, congestive heart failure, pneumonia, and cardiopulmonary arrest. But on further inspection it becomes apparent that there are several causal relationships between these 6 effects (e.g. patients with pneumonia will obviously also need to take antibiotics), meaning that they cannot really be taken as 6 independent results, an approach that was taken by Byrd (multivariate analysis). If the interrelationships between the 6 effects had been taken into account, perhaps the significance of the result would have come under threat. This issue was not addressed in the paper.

Furthermore, when a large number of variables are measured in a study, the chance of detecting false positives becomes an issue. Byrd realised this and so he used a severity score system as an additional measure. Unfortunately, this system not only suffers from the same criticism of interrelated variables, but also seems to be completely arbitrary, as no reference is given to indicate that it is a widely used scoring system.
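To put a rough number on the false-positive concern, here is a minimal sketch. It assumes each of the 29 variables is tested independently at a conventional 0.05 threshold. Neither assumption comes from the paper, and as noted above the variables are clearly not independent, so treat this as an illustration of the general problem rather than a reanalysis of Byrd's data.

```python
# Assumptions (not taken from the paper): 29 independent tests,
# each judged at a significance threshold of 0.05.
N_TESTS = 29
ALPHA = 0.05


def familywise_error_rate(n_tests: int, alpha: float) -> float:
    """Chance of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Average number of false positives if no real effect exists anywhere."""
    return n_tests * alpha


if __name__ == "__main__":
    print(f"Chance of at least one false positive: "
          f"{familywise_error_rate(N_TESTS, ALPHA):.2f}")
    print(f"Expected number of false positives: "
          f"{expected_false_positives(N_TESTS, ALPHA):.2f}")
```

Under these assumptions, there is roughly a 77% chance that at least one of the 29 comparisons comes up 'significant' purely by chance, which is why a handful of positive results out of 29 needs to be read with caution.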

But back to the main topic of the debate - whether or not prayer saved lives. My contention throughout the whole exchange was that it did not, and my conclusion is clearly backed up by the data in Table 2. John Fraser, on the other hand, was adamant that the study shows that prayer did have an effect on mortality. His rationale was that since there was a significant difference in the rate of cardiac arrests, and since cardiac arrests generally lead to death, then fewer cardiac arrests must mean fewer deaths. I summed up his whole argument as follows:

[John Fraser is] saying that prayer did have an effect on overall mortality because in order to determine the real change in overall mortality, you have to compare the actual situation (13 deaths) with what deaths would have been in the prayer group had there not been a difference in cardiac arrests (21-22 deaths). You are calculating this on the basis that 70% of cardiac arrests lead to death, so if the number of cardiac arrests is increased to the same as that in the control group (from 3 to 14, a difference of 11), then 70% of 11 is 7-8 extra deaths, hence you add this to 13 to give 21-22 overall deaths. When you compare 21 deaths to the original 13 deaths, since it is an increase greater than 50%, you conclude that the difference in cardiac arrests did indeed cause a significant difference in overall mortality. Since prayer was associated with the difference in cardiac arrests, therefore, prayer was also associated with the difference in overall mortality.
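The arithmetic in that summary can be laid out as a short sketch. The counts and the 70% figure are the ones quoted above; the function and variable names are just illustrative.

```python
# Figures as quoted in the summary of the argument above.
ARRESTS_PRAYER = 3
ARRESTS_CONTROL = 14
DEATHS_PRAYER = 13
ASSUMED_FATALITY_RATE = 0.70  # the national-average figure used in the argument


def projected_deaths(deaths: int, arrests: int,
                     target_arrests: int, fatality_rate: float) -> float:
    """Deaths projected if arrests rose to target_arrests,
    with each extra arrest fatal at the given rate."""
    extra_arrests = target_arrests - arrests
    return deaths + fatality_rate * extra_arrests


if __name__ == "__main__":
    projection = projected_deaths(DEATHS_PRAYER, ARRESTS_PRAYER,
                                  ARRESTS_CONTROL, ASSUMED_FATALITY_RATE)
    print(f"Extra arrests: {ARRESTS_CONTROL - ARRESTS_PRAYER}")
    print(f"Projected deaths: {projection:.1f}")
```

Done exactly, 70% of 11 is 7.7, which puts the projection at about 20.7 deaths, slightly under the 21-22 quoted in the summary. The rounding makes no difference to the argument either way.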

John Fraser agreed that I had accurately summed up his argument. But can you spot where he is going wrong? It might not be obvious to someone unfamiliar with interpreting clinical data.

In short, he has ignored the results of the control group, and is trying to compare the 'prayer group' to an imaginary control group that he has massaged into existence based on national average rates of death from cardiac arrest (70% is the figure he used). I tried my best to explain this error to him:

Firstly, the 70% mortality rate that you quote – where did you get that from? Presumably it is an overall average for death by cardiac arrest in US hospitals, or something similar to that. The problem is that you cannot simply apply the overall average of all hospitals to one single hospital. That may sound paradoxical but let me explain. In a statistical study of data from a single population, you cannot apply the combined average of data from multiple populations; you have to apply an average that is appropriate for that population. If the study had incorporated 10 hospitals, for example, then you could apply the average rate from those 10 hospitals. If the study incorporated a large number of hospitals taken from a diverse range of geographical locations in order to give an accurate sampling of the whole country, then you could perhaps use your 70% average, although it would still be more accurate to only take the average of the participating hospitals. So the only average rate you can use to apply to a study done in a single hospital is the average rate from that single hospital. In this case, it would be the average rate of mortality from cardiac arrest in San Francisco General.

Think about it for a minute and you will realise why. Some hospitals will have better cardiac units than others, some will have better surgeons, and some will be dealing with populations that are more or less capable to survive following a cardiac arrest. A sample population in San Francisco would have much more ethnic diversity than a sample population from, say, Alabama. Since there are many, many other variables like this that must be considered, it is simply wrong to apply a standard 70% national average to a study done in a single hospital. That is precisely why controlled conditions are needed in a study – in order to control for the variables that might affect how this particular population, in this particular hospital, survive a cardiac arrest when compared to the national average. Hence the authors included a control group. That is why they compared the test group to the control group and it just so happens that they found an insignificant result for mortality difference. Instead of doing this, you are trying to ignore the control group completely, thus ignoring the very thing you are trying to determine – what the mortality rate would have been if the cardiac rate had not been decreased. That is the very reason the control group is included, to provide that information so that the test group has a comparator. Instead of doing this, you are trying to compare the test group to itself, in the case that the number of cardiac arrests matched the control group and the national average mortality rate for cardiac arrest is applied. This is fundamentally wrong because doing that completely violates the controlled conditions that are at the very heart of the study’s design.

I then tried to explain it again using the numbers from the study:

The prayer group had 3 cardiac arrests and 13 deaths, while the control group had 14 cardiac arrests and 17 deaths. You are claiming that had the prayer had no effect on cardiac arrest such that the number matched that of the control group, the total number of deaths would have been 21-22 (based on the national average). But you don’t need to use the national average as I have explained above – in fact, it is erroneous to do so because it violates the controlled conditions of this particular population in this particular hospital. You do not need to even calculate numbers for the situation in which prayer had no effect on cardiac arrests, because it is the same as the situation in which no prayer was given – this is provided for us already by the control group. It is important to note here that when I say no prayer was given, I mean no prayer from the 3-7 recruited intercessors. All patients presumably received some amount of prayer.

So if the prayer group had the same amount of cardiac arrests as the control group (14), they would have had approximately 17 deaths, as indicated by the control group, and not the 21-22 that you calculated based on the national average. Hence your comparison of 21-22 vs 13 is not a valid representation of the effect of cardiac arrests on overall mortality. The real comparison is 17 vs 13, because 17 is the number of deaths that occur in this particular population in the event that 14 cardiac arrests occur, as indicated by the control group. When 17 is compared to 13 statistically, it is not significant, as presented in Table 2. Presumably this means that the San Francisco General has a better than average cardiac unit, or that the population itself has an inherently lower rate of death following cardiac arrest, but that is just speculating on an unknown, and we do not need to worry about that because this aspect of the experiment was controlled for. Please note that this does not mean that other aspects of the experiment were properly controlled for, such as the actual amount of prayer received.
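For anyone who wants to check the "17 vs 13" comparison themselves, here is a rough sketch using a two-proportion z-test. Two things here are assumptions on my part: the group sizes (192 in the prayer group and 201 in the control group are the figures commonly reported for this study, but they are not given in Table 2 as reproduced here, so check them against the paper), and the choice of test, which is a simple stand-in and not necessarily the method Byrd used.

```python
import math

# ASSUMED group sizes (commonly reported for this study; verify against the paper).
N_PRAYER = 192
N_CONTROL = 201


def two_proportion_z_test(events_a: int, n_a: int,
                          events_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Event counts as quoted in the text: deaths 13 vs 17, cardiac arrests 3 vs 14.
z_deaths, p_deaths = two_proportion_z_test(13, N_PRAYER, 17, N_CONTROL)
z_arrests, p_arrests = two_proportion_z_test(3, N_PRAYER, 14, N_CONTROL)

if __name__ == "__main__":
    print(f"Mortality:       z = {z_deaths:.2f}, p = {p_deaths:.3f}")
    print(f"Cardiac arrests: z = {z_arrests:.2f}, p = {p_arrests:.3f}")
```

With those assumed group sizes, the mortality difference comes out well above the conventional 0.05 threshold, while the cardiac arrest difference falls below it, which matches the pattern reported in Table 2. The normal approximation is crude when one group has only 3 events, so an exact test would be the better choice for anything more than a sanity check.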

I appreciate that the above analysis might not be completely digestible for everyone, so here is an analogy:

Imagine that the average national rate of death for men with certain cardiac problems is 20%. A study is set up to test if a new drug will help reduce this death rate. Two groups of men with these cardiac problems are established, one group of which is given the drug (test) and the other a placebo (control). A year later, 7% of the test group have died, while 9% of the control group have died (not a significant difference). Therefore, in the specific conditions used in this study, only 9% of control patients died, which is much lower than expected given that the national average is 20%. If you compare the 7% death rate in the test group with the national average of 20%, then you will conclude that the drug saved lives. But you would be wrong to do so. The only number you can compare the 7% to is the rate in the control group, i.e. 9%, and so you are forced to conclude that the drug did not save lives (in this particular study). There might be unusual circumstances that resulted in such a low death rate in the control group, which are also causing the low death rate in the test group, meaning that the effect might not be attributable to the drug at all.

This is the mistake that John Fraser is making. He is comparing the number of deaths in the test group to the number of deaths there should have been in the absence of prayer on the basis of national average death rates from cardiac arrests. But this data is already provided by the control group, and the difference between the two groups is not significant, as shown in Table 2.

Needless to say, John Fraser did not accept this explanation and continued to attack strawmen. For example, he couldn't understand that my position was actually NOT based on national average mortality rates at all; it was simply based on a comparison of the test group to the control group, as it should be. Eventually I had to bring the exchange to a close, and since he clearly could not understand where he was going wrong, I tried to convince him in a completely different way:

As I said I'm happy to leave it here. On the basis of your summation, you actually don't understand my argument fully. I don't see any point in me trying to go over it again as if you don't have it by now, you will never have it. Whether you purposefully mean to misrepresent it or not I can't say, but I doubt it.
Since I can't convince you that the data in this study shows that prayer has no effect on overall mortality, I suggest you give a copy of the paper ... to your doctor or any healthcare professional and ask what their conclusion would be.

That was six months ago, and to date John Fraser has not presented any evidence that a healthcare professional agrees with his conclusions (despite claiming that he would do further research into this). I'm not surprised - even the authors of the study didn't agree with his conclusions. The irony of this whole exchange was that prior to my involvement, John Fraser was claiming that skeptics are unwilling to admit defeat, even in the face of overwhelming evidence against their position. I would encourage everyone to take another look at the data for 'mortality' in Table 2 above to see just whose position was supported by the evidence in this case.

In John Fraser's parting comment, he seemed to have somewhat changed his tune:

...I already knew (even before we talked) that there was no statistically significant difference in overall mortality. What you and perhaps some of these other sources ignore is that there WAS a statistically significant difference in cardiac arrests, and that due to the seriously detrimental consequences of cardiac arrest, this would have to have an affect on the mortality rate even if for some reason it didn't affect the overall mortality rate enough to show up as statistically significant.

Wow! It only took about 5 pages of comments for him to actually admit this. Even so, he is still convinced that prayer saved lives in this study, a conclusion which is not presented in the data, is not shared by the authors, and according to John Fraser is 'ignored' in a series of follow-up reviews on the efficacy of prayer (strange that so many medical professionals agree with my interpretation of the data, whereas John Fraser is the only person that can see the truth). In fact, the word 'mortality' is not found anywhere in the discussion of the article, which seems strange if prayer did actually have any effect on it.

From his final comment, it is clear that John Fraser is fixated on the idea that fewer cardiac arrests must mean fewer deaths. Even if we grant him that point, looking at the study as a whole it becomes clear that although prayer may have saved some people from dying from cardiac arrest, this apparent positive effect was counterbalanced by increased deaths from other causes, since there was no significant difference in overall mortality. So claiming that prayer saved lives on the basis of this study is akin to saying that food laced with cyanide can save lives, because feeding hungry people with such food will prevent them from dying from starvation - never mind the annoying fact that they die anyway from cyanide poisoning. Byrd doesn't mention this anomaly at all in his results or discussion, almost as though it is being ignored. But when you take this into account, it seems that prayer simply alters the way you will die, rather than saving you from death.

Hardly a medical breakthrough.