For the first method of data analysis, I note that the official Covid death tally can be surmised to be composed of two series of numbers: the people each day who die of some unrelated cause but happen to be infected with Covid, and the people each day who actually die of Covid. And *both* of these series of numbers will be related to a third series: the number of people diagnosed with Covid each day. However, the two types of people who die each day will each have a *different* relationship to this number.
For the people who die of some completely unrelated cause, the number of those people--who just happen to also have Covid--will be directly proportional to how many people currently have Covid in the population. If a lot of people happen to have Covid at some time, a lot of people who die *at that time* will also happen to have Covid by coincidence. If few people have Covid at that time, few people will die coincidentally also having Covid. So if you plotted the number of people who have Covid at any particular time on the same graph as the number of people who die "with" Covid at any particular time, the second curve would trace the same shape as the first (just scaled down).
The same thing is true of people who die "from" Covid--*except* for the important fact that this curve would not only trace the infection curve, but would also be time shifted. It takes some time after you are diagnosed with Covid to actually die of Covid. So if a lot of people are diagnosed with Covid at a particular time, then *later on* a lot of people will die from Covid--but not right away.
This time dependency represents a difference between the two types of people that we are surmising compose the total official death tally of Covid. We should then be able to separate out roughly how many people fall into each category by doing a time-dependent analysis.
My Analysis
Here was my approach, using publicly available datasets and a custom Python program:
I assumed that the number of "deaths with" (the coincidental deaths) included in the official death tally was some fairly constant percentage of the total deaths (seeing as I couldn't think of any good reason for this to change over time). I also assumed that the number of these deaths over time would be directly proportional to the number of Covid cases at the time. I could therefore generate a time series that represented those deaths by taking the time series of confirmed cases per day and scaling it down until the number of deaths it represented equaled a given percentage of the total official death tally.
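The scaling step above amounts to one multiplication. A minimal sketch, using synthetic data in place of the public datasets (the case counts and the 25% target here are made-up illustration values, not figures from the post):

```python
import numpy as np

# Synthetic daily case counts standing in for the real public dataset.
rng = np.random.default_rng(0)
cases = rng.poisson(1000, size=120).astype(float)  # confirmed cases per day
official_deaths_total = 5000.0                     # total official death tally

def spurious_deaths(cases, deaths_total, pct):
    """Scale the case curve down so that it sums to `pct` of the
    total official death tally (the hypothesized coincidental deaths)."""
    scale = (pct * deaths_total) / cases.sum()
    return cases * scale

# E.g. the hypothesis that 25% of the official tally is coincidental:
sp = spurious_deaths(cases, official_deaths_total, 0.25)
print(round(sp.sum(), 1))  # 1250.0, i.e. 25% of 5000
```

The resulting series keeps the exact shape of the case curve, which encodes the assumption that coincidental deaths track current prevalence with no delay.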
I made this target percentage (the percentage of deaths in the official tally which are "spurious") a variable so that I could generate multiple time series of spurious (or coincidental) deaths per day corresponding to any target magnitude of this effect I wanted.
For each iteration of my run, I would generate the "spurious" deaths that would correspond to a given magnitude. I then subtracted these deaths from the official tally. The hypothesis of this particular run would be that the remaining deaths were the deaths caused "by" Covid, and should therefore match the Covid infection curve, but with a time delay. I then scaled these deaths up to match the infection curve and found the best time delay which caused the death and infection curves to match.
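The post doesn't specify how the "best time delay" was found; one plausible reading is a brute-force search over candidate shifts, scoring each by squared error after rescaling. A sketch of that step on synthetic curves (a Gaussian infection wave and deaths built as 1% of it delayed by 20 days, both invented for illustration):

```python
import numpy as np

# Synthetic stand-ins for the real series: a bell-shaped infection curve
# and a death curve equal to 1% of it, delayed by 20 days.
t = np.arange(200)
infections = 5000 * np.exp(-((t - 80) / 25.0) ** 2)
deaths = 0.01 * np.roll(infections, 20)
deaths[:20] = 0.0  # remove the wrap-around from np.roll

def best_shift(deaths, infections, max_lag=40):
    """Return the back-shift (in days) that, after scaling the deaths up
    to the infections' magnitude, minimizes the squared error."""
    best_lag, best_err = 0, np.inf
    for lag in range(max_lag + 1):
        shifted = deaths[lag:]             # deaths moved back by `lag` days
        ref = infections[:len(shifted)]
        scale = ref.sum() / shifted.sum()  # scale deaths up to infections
        err = np.sum((shifted * scale - ref) ** 2)
        if err < best_err:
            best_lag, best_err = lag, err
    return best_lag

print(best_shift(deaths, infections))  # recovers the built-in 20-day delay
```

Any reasonable goodness-of-fit measure (correlation, absolute error) would work in place of squared error; the point is just that shift and scale are fit jointly.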
By doing this for a target "spurious" death percentage of 0%, 10%, 25% and 50%, I figured I could see which rate of "deaths with" resulted in the best final match between time-shifted deaths and the original infections. That is, the closer my arbitrary percent of "deaths with" ended up being to reality, the better the remaining deaths would correspond to the infections that actually caused them.
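Putting the pieces together, the sweep over target percentages can be sketched end-to-end. The data here is synthetic and constructed so that 0% of deaths are spurious, so the comparison should favor the 0% hypothesis; the fitting procedure is my guess at the post's method, not its actual code:

```python
import numpy as np

# Synthetic data in which 0% of the official tally is spurious: deaths
# are simply the infection curve lagged 20 days at a 1% fatality rate.
t = np.arange(200)
infections = 5000 * np.exp(-((t - 80) / 25.0) ** 2)
official = 0.01 * np.roll(infections, 20)
official[:20] = 0.0

def fit_error(deaths, infections, max_lag=40):
    """Best achievable squared error after scaling and back-shifting."""
    errs = []
    for lag in range(max_lag + 1):
        shifted = deaths[lag:]
        ref = infections[:len(shifted)]
        if shifted.sum() <= 0:
            continue
        scale = ref.sum() / shifted.sum()
        errs.append(np.sum((shifted * scale - ref) ** 2))
    return min(errs)

results = {}
for pct in (0.0, 0.10, 0.25, 0.50):
    # Hypothesized coincidental deaths: case curve scaled to pct of tally.
    spurious = infections * (pct * official.sum() / infections.sum())
    residual = official - spurious  # the hypothesized "deaths from Covid"
    results[pct] = fit_error(residual, infections)

best_pct = min(results, key=results.get)
print(best_pct)  # 0.0 -- the 0% hypothesis fits best on this synthetic data
```

On real data the comparison is visual (the graphs below) rather than a single error number, but the logic is the same: the wrong spurious percentage leaves a residual that no shift-and-scale can make match the infections.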
The results were as follows (orange is scaled-up deaths, blue is infections):
Periods of Rapid Infection Growth
Another Important Factor: Amount of Time Shift
For the hypothesis that 0% of the total death tally is spurious, I had to shift the deaths back 20 days to get them to match up with the infections properly. I had to increase this by a few days for each subsequent graph, all the way up to 30 days of time shift for the graph where I assume 50% of the total death tally is spurious.
Here it is important to note that the average time from infection to death has been established independently from case studies, and it is typically given as something in the range of 18 days. This also argues against positing that the total percentage of spurious deaths goes very far above 0%--it's another way the hypothesis produces unrealistic results the larger this percentage gets.
Some Closing Comments on this Analysis
2. One objection might be raised: suppose there were other causes of overreporting aside from purely coincidental deaths? This analysis doesn't rule those out per se; however, given how well the time-shifted deaths match the infections (when scaled), those causes of overreporting would have to be somehow time-matched to actual Covid deaths. That is, the overreporting would get worse when *actual deaths from Covid* go up (not just Covid infections) and get better when those deaths go down. I have not yet been able to think of a cause of overreporting that would be proportional to correct reporting in such a way.
This could happen on a regular basis a certain percentage of the time and it would not show up as an anomaly on this kind of comparison graph, since the deaths are simply scaled up to match the infections anyway. That would be one time-matched factor causing deaths to be *underreported*, and others could easily be thought of.
This means that this particular analysis does not offer any sort of cap on how much the official death tally might be under-representing the actual death toll of Covid. More on this point in Part 3.