Monday, April 20, 2020

Problems with the New Antibody Study from Santa Clara

I've been watching for results from various SARS-CoV-2 antibody testing projects with great interest, so I was glad to see that a Santa Clara County antibody survey just published interim results, here: https://www.medrxiv.org/…/10…/2020.04.14.20062463v1.full.pdf
I was extremely annoyed, however, on reading the results to find out that they recruited for their study using a Facebook ad! This makes their study population essentially self-selected and hence, in my view, practically worthless. I have no idea, either, how you would go about trying to correct for this self-selection bias.
The paper they cited as justification for this practice largely touted Facebook as a "cost effective" way of getting a mostly representative study population--which might be true when the population whose size you are trying to gauge has no vested interest in participating. Here, though, anyone who suspects they had Covid-19 has every incentive to sign up for a very hard-to-obtain, much-sought-after test. So yeah . . . "cost effective," except that they just wasted over 3,000 perfectly good antibody tests on a study with a massive bias problem that we can't realistically quantify or correct for.

More Detailed Criticism

Imagine you are someone who had flu-like symptoms a few weeks or a month ago. Now, like everyone in the world, you're wondering, "gee, was that really Covid-19? I bet I had Covid-19 and didn't even know it!" So if you see a Facebook ad saying "hey, participate in this antibody testing study!", you are highly motivated to say "me! me! me! yes, test me!" On the other hand, if you have not had any flu-like symptoms in the past two months, you are only motivated to participate in the study (which involves getting in your car and driving somewhere to get your blood drawn) if you understand the public health importance of figuring out how many asymptomatic cases there are. Which some people do, but a lot of people don't.

So by the design of the study, the population they are actually studying is probably much more representative of people in Santa Clara County who have had flu or cold symptoms recently than it is of people in Santa Clara County in general. And then of course you are going to way over-sample people who actually did have Covid-19 compared to the rest of the population. The claimed "50 to 85 times as many people" number is meaningless because of this oversampling.

Now, I admit, with something as high profile as Covid-19 antibody testing, it'll be hard to completely eliminate the self-selection bias--but it's not impossible. First of all, I would not use a Facebook ad campaign to recruit volunteers. I would start with a home survey sent to a randomized set of addresses (as Dr. Streeck did in his antibody testing in Gangelt). And I would not say in the survey that it was specifically for antibody testing, just that it was a study on Covid-19.

Then I would follow up with an explanation: "OK, so now we would like to do antibody testing on you," and I would explain the importance to public health of getting as high a participation rate as possible--really sell it hard and try to get everyone you selected to participate. This takes advantage of the attention you've already captured, because it's a lot easier to get people to go along with something they've already partially bought into. And I really don't think it would be hard to get very high participation rates from a truly random sample if you approached it this way.

Can Self-Selection Bias Be Corrected For?

If you start a study with a survey of a pre-selected set of randomized addresses, then you get to report what percentage of people didn't participate in your survey. Which means you get to quantify the potential self-selection bias: something like, "since 30% of the people we selected chose not to be tested, the selection bias could be at most such-and-such percent."
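To illustrate what that quantification could look like (this is my own sketch, not anything from the study), here is the crudest possible version in Python: assume the worst cases, that the non-responders were either all negative or all positive, and you get hard bounds on the true prevalence. The 3% and 70% figures are made-up numbers for the example.

# My own illustration: worst-case bounds on true prevalence when some of the
# randomly selected addresses choose not to participate.

def prevalence_bounds(observed_prevalence, response_rate):
    """Return (low, high) bounds on the true prevalence.

    observed_prevalence: fraction of responders who test positive
    response_rate: fraction of the randomly selected sample that participated

    The bounds take the extremes: every non-responder was negative (low)
    or every non-responder was positive (high).
    """
    nonresponse = 1.0 - response_rate
    low = observed_prevalence * response_rate
    high = observed_prevalence * response_rate + nonresponse
    return low, high

# Example: 3% of responders test positive and 70% of selected addresses responded.
print(prevalence_bounds(0.03, 0.70))  # roughly (0.021, 0.321)

The bounds are wide, of course, but the point is that a random-address design at least lets you state them; a Facebook ad campaign doesn't even give you a denominator.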

The only way to do this with a Facebook ad campaign is to report the total number of people who saw your ad but didn't click on it . . . which is kind of a dubious number, given that it's really hard to tell with internet ads how many people actually look at them. But the study didn't even report how many views the ad campaign got, or how many people clicked on the ad, just the total number of people who filled out the initial online survey. So they're not even trying to quantify the massive self-selection bias. This is super shoddy, in my opinion.

How Bad Could the Effect of the Bias Be?

I did some simple calculations in a Google spreadsheet to try to quantify how bad a self-selection bias could be: https://docs.google.com/spreadsheets/d/1JfYxfak6uY4Bd1vBA-HGOYn_OECfGvrIp-H2ygPqn5M/edit?usp=sharing
The goal was to figure out what range of true infection rates could actually produce the results the Santa Clara study obtained. My question was this:

This study was reporting an interim result of an infection fatality rate for Santa Clara County of 0.12-0.2%. How big would the self-selection effect have to be for the true IFR to actually be 1%?

For this, you first have to decide on a percentage of people in Santa Clara County who might think, "hey, I might have had Covid-19 in the past two months." I first set this percentage at 20%, which I think is very generous considering the estimated percentage of the population that gets the flu over an entire flu season is only about 10%--I'm sort of adding in some people who had cold symptoms as well. It's a guesstimate; let's go with it.

Once you have this percentage, you can start playing with a multiplier that represents how much more likely it is that people in that specific group (people who have reason to suspect they might have had Covid) would respond to the Facebook ad campaign compared with people who have no reason to think they might have had Covid. The spreadsheet will then tell you how many people you would expect to test positive for Covid in the study under those assumptions. You then adjust your numbers until it matches the number of people in the study who actually tested positive for Covid (50), and that tells you how large the self-selection bias would need to have been.
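To make the structure concrete, here is a minimal sketch of the kind of calculation the spreadsheet does, written in Python. The county population, death count, and number of tests are round placeholder figures of my own, and the sketch assumes that everyone who truly had Covid falls in the "suspect" group and ignores test error, so it won't reproduce the spreadsheet's numbers exactly.

# A minimal sketch of the spreadsheet's calculation. The constants below are
# placeholder assumptions for illustration, not figures from the study.

POPULATION = 1_900_000   # rough Santa Clara County population (assumed)
DEATHS = 100             # rough Covid-19 death count at the time (assumed)
N_TESTED = 3_300         # roughly the number of antibody tests in the study

def expected_positives(target_ifr, suspect_fraction, response_multiplier):
    """Expected positive tests under a simple self-selection model.

    target_ifr          : the infection fatality rate being tested (e.g. 0.01)
    suspect_fraction    : share of the county who think they might have had Covid
    response_multiplier : how much more likely a "suspect" is to answer the ad
    """
    true_infections = DEATHS / target_ifr
    true_prevalence = true_infections / POPULATION

    # Simplifying assumption: all true infections sit inside the suspect group.
    prevalence_among_suspects = true_prevalence / suspect_fraction

    # Share of respondents who come from the suspect group, given that suspects
    # respond `response_multiplier` times as readily as everyone else.
    weight_suspects = suspect_fraction * response_multiplier
    weight_others = 1.0 - suspect_fraction
    respondent_share_suspects = weight_suspects / (weight_suspects + weight_others)

    return N_TESTED * respondent_share_suspects * prevalence_among_suspects

# Turn the multiplier dial until the output crosses the 50 positives the study saw.
for m in (1, 2, 3, 4, 5, 6):
    print(m, round(expected_positives(0.01, 0.20, m), 1))

With these placeholder inputs the dial crosses 50 a little above a multiplier of 5; the exact value is sensitive to the assumed death count and population, which is why this sketch and the spreadsheet don't land on precisely the same numbers.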

In order to achieve a target IFR of 1% (roughly five to eight times the study's number), and assuming that a full 20% of all people in Santa Clara had some reason to believe they'd had Covid (again, I feel that's very generous), I need these people to be 4.6 times more likely to respond to the Facebook ad than people who have no reason to suspect they had Covid. I think it's entirely reasonable to think that people who believe they might have had Covid would be up to 5 times more likely to respond to such a survey.

If I decrease my 20% estimate and say instead that just 5% of all people in Santa Clara had some reason to think they had Covid earlier, then I only need these people to be 2.9 times more likely to respond to the Facebook ad in order to get the 50 positive tests the study obtained.
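Using the same expected_positives sketch from above, the 5% scenario looks like this (again with my placeholder inputs, so expect the rough ballpark rather than the exact 2.9):

# Same model, but only 5% of the county has any reason to suspect prior Covid.
for m in (2, 3, 4):
    print(m, round(expected_positives(0.01, 0.05, m), 1))
# The output crosses 50 positives a little above a multiplier of 3.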

Interestingly, after going through this exercise, I think it does open up a way that the self-selection bias could be at least partially detected. The study took a survey of the respondents, and I assume they asked some basic questions, including whether they had any cold or flu symptoms in the past few months--although the preliminary report does not say that they collected this sort of survey information, so maybe I'm assuming too much. But assuming they did, they could compare the percentage of respondents reporting previous symptoms with the percentage of the general population who actually had undiagnosed flu-like illnesses. This should track with the self-selection bias, I think.
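Here is a sketch of how that check could plug back into the model above. The numbers are purely hypothetical, since the report doesn't give symptom data: the idea is that if we knew how enriched the respondents were for people with recent symptoms relative to the county baseline, that enrichment would pin down the response multiplier directly.

# Hypothetical check, reusing the assumptions of the model above: invert the
# response model to recover the multiplier from the observed enrichment.

def multiplier_from_enrichment(enrichment, suspect_fraction):
    """enrichment = (share of respondents in the suspect group) divided by
    (the suspect group's share of the population). Solving
    enrichment = m / (s*m + 1 - s) for m gives the formula below."""
    s = suspect_fraction
    return enrichment * (1 - s) / (1 - enrichment * s)

# Hypothetical: respondents report recent symptoms at 2.5x the county baseline,
# and 20% of the county are "suspects" -> implied multiplier of 4.0.
print(multiplier_from_enrichment(2.5, 0.20))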

Postscript

After I went through my own analysis, I discovered that someone has published a "peer review" of this study that comes to some similar conclusions: https://medium.com/@balajis/peer-review-of-covid-19-antibody-seroprevalence-in-santa-clara-county-california-1f6382258c25. He doesn't do the same sort of sensitivity analysis that I do, but he also raises a separate concern based on the false-positive rate of the test used. The review is worth reading.
