For Mathew Crawford
In an exchange in the comments of Mr. Crawford’s article found here (https://roundingtheearth.substack.com/p/estimating-vaccine-induced-mortality), I have called into question a graph Mr. Crawford posts here (https://roundingtheearth.substack.com/p/systemic-covid-19-vaccine-efficacy-fe3). Using the data that one can acquire here (https://github.com/owid/covid-19-data/blob/master/public/data/README.md), I tried to repeat Mr. Crawford’s analysis. In the exchange in the comments of Mr. Crawford’s article, Mr. Crawford has called for me to publish my results. These are my results and the R code I used. I have begun work in attempting to answer his questions to the best the data set would allow. This article will be updated as those results are made available.
# libraries loaded — — — — — — — — library(tidyverse)
library(lubridate)
library(brms)
library(modelsummary)#
# loading data — — — — — — — — — — — — — url <- “https://covid.ourworldindata.org/data/owid-covid-data.csv"
covid_data <- read.csv(url)covid_data <- covid_data %>%
mutate(date = as.Date(date))
The first task performed was to plot correlations over time of a few variables. The following R code performs the trend of correlations of case and vaccines:
covid_data %>%
group_by(date) %>%
summarize(New_cases_vax = cor(new_cases, new_vaccinations, use = "p"),
new_cases_vaxper100 = cor(new_cases, people_vaccinated_per_hundred, use = "p"),
new_cases_fullvaxper100 = cor(new_cases, people_fully_vaccinated_per_hundred, use = "p"),
new_casesperMil_vaxper100 = cor(new_cases_per_million, people_vaccinated_per_hundred, use = "p"),
new_casesperMil_fullvaxper100 = cor(new_cases_per_million, people_fully_vaccinated_per_hundred, use = "p")) %>%
pivot_longer(!date,
names_to = "corr_type",
values_to = "correlation") %>%
ggplot(aes(x = date, y = correlation, colour = as.factor(corr_type))) +
geom_point() +
geom_smooth() +
geom_hline(yintercept = 0) +
scale_x_date(date_labels = "%Y %m %d",
limit=c(as.Date("2021-01-01"),as.Date("2021-11-23")))
The graph below is the result. I have bolded the 0 correlation line to make things easier to see. We see that the correlation trend closest to what Mr. Crawford has is that between the cases per capita vs people (fully or not) vaccinated per capita. However, our data is systematically lower than what Mr. Crawford has identified. Finally, the correlation of new cases and people vaccinated per capita is mostly negative. It is only in the last couple of months, as the vaccination drive in the developed world has stalled, that the correlation has gone positive. We also see that absolute new cases vs absolute new vaccinations has a correlation near 1.

So, what is the appropriate correlation to use to assess the question Mr. Crawford means to answer: does the vaccine cause cases/deaths? Let’s hold off on this for a moment. Let’s look at the mortality data.
covid_data %>%
group_by(date) %>%
summarize(new_deaths_vax = cor(new_deaths, new_vaccinations, use = "p"),
new_deaths_vaxper100 = cor(new_deaths, people_vaccinated_per_hundred, use = "p"),
new_deaths_fullvaxper100 = cor(new_deaths, people_fully_vaccinated_per_hundred, use = "p"),
new_deathsperMil_vaxper100 = cor(new_deaths_per_million, people_vaccinated_per_hundred, use = "p"),
new_deathsperMil_fullvaxper100 = cor(new_deaths_per_million, people_fully_vaccinated_per_hundred, use = "p")) %>%
pivot_longer(!date,
names_to = "corr_type",
values_to = "correlation") %>%
ggplot(aes(x = date, y = correlation, colour = as.factor(corr_type))) +
geom_point() +
geom_smooth() +
geom_hline(yintercept = 0) +
geom_hline(yintercept = 0.05) +
geom_hline(yintercept = 0.15) +
geom_hline(yintercept = -0.05) +
scale_x_date(date_labels = "%Y %m %d",
limit=c(as.Date("2021-01-01"),as.Date("2021-11-23"))) +
theme(legend.position = "bottom")
The above produces the correlation trends of mortality and vaccines. In this plot I have added black lines for a correlation of 0, 0.05, -0.05, and 0.15. In Mr. Crawford’s graph, he identifies 5 months in which the correlation is above 0.15. I find a few such points for each correlation, but they tend to be slightly removed from the loess trend. By and large, the correlations that I have calculated are significantly different from those that Mr. Crawford has calculated. My code is above. Mr. Crawford should release how he has done the same.

Now, the important question, do any of these correlations bear on the question of if the vaccine causes cases or deaths. The answer is no. Mr. Crawford, in the second article of his cited, only presents his versions of the above correlation trends and seems to consider the positive correlations (in his graph, not mine) are sufficient evidence that the vaccine causes cases and deaths. This is guilty of that old bugbear of stats 101 students: X and Y are correlated therefore X causes Y. This is not a valid inference. Correlation does not entail a specific causation. Correlation suggests there is a causal relationship somewhere, but it never entails or offers evidence that X causes Y specifically.
The issue is there could be confounders mediating the relationship between X and Y, between vaccination and cases/death, there could be any number of upstream variables causing both. In Mr. Crawford’s work, it is assumed that there are none. In our exchanges, he has laid it on me to show that there are confounders as a precondition of taking the possibility seriously that there are. In other words, Mr. Crawford has put the burden of proof on me. I have to show that there is a confounder for him to accept that his analysis is inappropriate. This is bad statistics. This is lying with statistics. If you make an assumption in your analysis, it should be an obviously innocuous one, a standard one (like linear effects), or you should present evidence to support your assumption. The assumption that there are no confounders is not ever a standard assumption in a statistical analysis of causal relationships nor an innocuous one. Thus, gold standard applied statistics would have Mr. Crawford give us evidence that his assumption is plausible, at the very least.
How would we tease apart the causal effect of vaccination on cases given the data set? We should actually model the raw data, not look at the correlation trends across time. We should take care of any immediately plausible confounders, like age and affluence. My first attempt uses only the people vaccinated per hundred. I use a Bayesian estimation technique via the brms package in R with default priors. The R code is as follows with the coefficient estimates in the table:
The data available is longitudinal in countries. Thus, a mixed effect model will likely be advantageous, in particular looking at how the vaccine’s effects is different by country. So, I will end up testing 4 models: 1) a simple mixed effect model with just the people vaccinated per 100 with both the intercept and slope having mixed effects; 2) the previous model with an AR(1) specification; 3) several controls will be added; 4) all the controls that can be added will be, with exceptions taking into account multicollinearity concerns. The R code for the four models is as follows.
# simple model
fit_simple_cases <- brm(new_cases_per_million ~
people_vaccinated_per_hundred +
(1 + people_vaccinated_per_hundred | location),
data = covid_data,
warmup = 1000,
iter = 2000,
chains = 4,
inits = "random",
cores = 4,
control = list(max_treedepth = 20))# ar(1) model
fit_ar1 <- brm(new_cases_per_million ~
ar(time = date,
gr = location,
p=1) +
people_vaccinated_per_hundred +
(1 + people_vaccinated_per_hundred | location),
data = covid_data,
warmup = 1000,
iter = 2000,
chains = 4,
inits = "random",
cores = 4)# with controls
fit_control_cases <- brm(new_cases_per_million ~
ar(time = date,
gr = location,
p=1) +
people_vaccinated_per_hundred +
median_age +
gdp_per_capita +
life_expectancy +
human_development_index +
extreme_poverty +
(1 + people_vaccinated_per_hundred | location),
data = covid_data,
warmup = 1000,
iter = 2000,
chains = 4,
inits = "random",
cores = 4)# with more controls
fit_control_cases2 <- brm(new_cases_per_million ~
ar(time = date,
gr = location,
p=1) +
people_vaccinated_per_hundred +
median_age +
gdp_per_capita +
life_expectancy +
human_development_index +
extreme_poverty +
population_density +
stringency_index +
cardiovasc_death_rate +
diabetes_prevalence +
female_smokers +
male_smokers +
handwashing_facilities +
hospital_beds_per_thousand +
(1 + people_vaccinated_per_hundred | location),
data = covid_data,
warmup = 1000,
iter = 2000,
chains = 4,
inits = "random",
cores = 4)modelsummary(models = list(Simple = fit_simple_cases,
AR1 = fit_ar1,
Control1 = fit_control_cases,
Control2 = fit_control_cases2),
coef_map = c("(Intercept)", "people_vaccinated_per_hundred", "sd__(Intercept)", "sd__people_vaccinated_per_hundred", "cor__(Intercept).people_vaccinated_per_hundred"),
gof_omit = "algorithm|pss",
statistic = "conf.int")

The table only displays the variables common to all models. The most important thing to note in this table is the point estimate and confidence interval for the effect of vaccine proportion and the “sd_” estimates that follow. The “sd_” estimates tell us about the standard deviation of the random effect coefficients, the country specific effects. What we see is that vaccine effects are very variable across countries; sd__people_vaccinated_per_hundred is close to 3 or greater in all models while the effect of people_vaccinated_per_hundred is near 0.5 or less. Thus, we can expect to find a country in our data set that has an effect of vaccine near 5 and another with an effect near -3.5, given our simple model. In addition, the average effect of people_vaccinated_per_hundred is not identified as different from 0; the confidence intervals all span 0. Thus, given our data, these models cannot claim that the effect of vaccine proportion is different from 0. This is starkly inconsistent with Mr. Crawford’s claim that the vaccine is causing cases. What about for death?
Do these models answer Mr. Crawford’s questions: does the vaccine cause cases or deaths?