Monday, March 30, 2020

The Imperial College Study: Part 1

[Links to the full series]

Part 1
Part 2
Part 3A
Part 3B

------------

A little while ago I posted a link to this study by Imperial College London, which projects how the U.S. and Great Britain can expect the Covid-19 epidemic to progress over the next two years under various scenarios involving different levels of social isolation: https://www.imperial.ac.uk/…/Imperial-College-COVID19-NPI-m…
I now want to do a series of posts explaining this study and the current research behind it, because I think it may be the most influential study currently informing government decisions. I encourage you to read the study yourself, but I won't assume you have.

1. What is the nature of this study?

This study presents the results of a series of computer simulations run by epidemiologists. Specifically, it is an application of pre-existing "microsimulation" (https://en.wikipedia.org/wiki/Microsimulation) software, with its parameters set to simulate Covid-19 in a virtual population. A "microsimulation" is a computer model in which the behavior of a population is modeled by creating millions of individual virtual people and having them move around in a virtual world according to a set of behavioral rules. If you've ever played any of the SimCity games or any of the Tycoon games, you've played with a simple, small-scale microsimulation. Many people at this point have seen the *extremely* crude microsimulation recently published in an article by the Washington Post: https://www.washingtonpost.com/…/20…/world/corona-simulator/.
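
To make that concrete, here is a toy microsimulation of my own in Python--much closer in spirit to the Washington Post's simulator than to the Imperial College model, and with every number invented: a couple thousand virtual people wander a small grid, and the infection jumps between people who end up in the same place.

    import random

    random.seed(1)
    GRID, N_PEOPLE, N_DAYS = 50, 2000, 60       # world size, population, days simulated
    P_TRANSMIT, DAYS_SICK = 0.15, 7             # per-contact infection chance, illness length

    # each virtual person is a tiny record: a position, a disease state, a sickness clock
    people = [{"x": random.randrange(GRID), "y": random.randrange(GRID),
               "state": "S", "days_ill": 0} for _ in range(N_PEOPLE)]
    for p in random.sample(people, 5):          # seed a handful of initial infections
        p["state"] = "I"

    for day in range(N_DAYS):
        for p in people:                        # behavioral rule: wander one step at random
            p["x"] = (p["x"] + random.choice((-1, 0, 1))) % GRID
            p["y"] = (p["y"] + random.choice((-1, 0, 1))) % GRID
        cells = {}                              # group everyone who ended up in the same cell
        for p in people:
            cells.setdefault((p["x"], p["y"]), []).append(p)
        for group in cells.values():            # infection can jump within a shared cell
            if any(p["state"] == "I" for p in group):
                for p in group:
                    if p["state"] == "S" and random.random() < P_TRANSMIT:
                        p["state"] = "I"
        for p in people:                        # disease progression: recover after a week
            if p["state"] == "I":
                p["days_ill"] += 1
                if p["days_ill"] >= DAYS_SICK:
                    p["state"] = "R"
        print(day, sum(p["state"] == "I" for p in people))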

In epidemiology, these things are incredibly useful because you can simulate the spread and effects of a disease much more accurately than by just talking in broad percentages. You set up your population according to the actual characteristics of the real world: children will congregate in schools on school days but stay home on weekends, old people will live in nursing-home clusters, working-age people will congregate in businesses and travel a range of distances to work using public transportation, all according to real-life percentages that you enter, and so forth. Your virtual population is created with a range of ages and existing health conditions that you also enter, based on real-world numbers that are correct for the population and time period you are modeling.
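
In practice, that setup step amounts to feeding the model a stack of tables and letting it draw virtual people from them. A rough sketch of the idea, where the age shares, household sizes, and daytime locations are invented placeholders rather than the study's actual inputs:

    import random

    rng = random.Random(0)

    # invented stand-ins for the real-world tables: the actual model would use census
    # age structure, household sizes, and school/workplace data for the country modeled
    AGE_BANDS = ["0-18", "19-64", "65+"]
    AGE_WEIGHTS = [0.24, 0.59, 0.17]                # share of the population in each band
    HOUSEHOLD_SIZES = [1, 2, 3, 4, 5]
    HOUSEHOLD_WEIGHTS = [0.28, 0.35, 0.15, 0.15, 0.07]

    def make_person():
        """Draw one virtual person whose traits follow the entered distributions."""
        age_band = rng.choices(AGE_BANDS, AGE_WEIGHTS)[0]
        if age_band == "0-18":
            daytime_place = "school"                # children congregate in schools on weekdays
        elif age_band == "19-64":
            daytime_place = "workplace"             # workers commute to businesses
        else:
            daytime_place = rng.choice(["home", "care_home"])
        return {"age_band": age_band,
                "household_size": rng.choices(HOUSEHOLD_SIZES, HOUSEHOLD_WEIGHTS)[0],
                "daytime_place": daytime_place}

    population = [make_person() for _ in range(100000)]
    print(population[0])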

Then you characterize your disease: how long is the incubation period? What is the range of severity of symptoms, and does it vary by age or by the existence of co-morbidities? What is the range of infectiousness of the disease, and how does it change over the timeline of the progression of symptoms? How close does one person need to be to another to spread the disease? How much does having the disease itself limit the movement of an individual, and therefore the likelihood that they will continue contacting other people?
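
That characterization also largely boils down to a table of numbers. Something like the following, where every figure is a made-up stand-in for the estimates the modelers would actually defend:

    # every number below is a placeholder, not a real estimate for Covid-19
    DISEASE = {
        "incubation_days": 5,                           # time from exposure to symptoms
        # relative infectiousness on each day after symptoms begin
        "infectiousness_by_day": [1.0, 1.0, 0.8, 0.6, 0.4, 0.2, 0.1],
        "contact_radius_meters": 2.0,                   # how close counts as a contact
        # chance that a case in each age band needs hospital care
        "p_hospitalized_by_age": {"0-18": 0.01, "19-64": 0.05, "65+": 0.20},
        # how much a symptomatic person cuts back their movement
        "movement_reduction_when_ill": 0.75,
    }

    def infectiousness(days_since_symptoms):
        """Relative infectiousness for a case this many days into symptoms."""
        profile = DISEASE["infectiousness_by_day"]
        if days_since_symptoms < 0 or days_since_symptoms >= len(profile):
            return 0.0
        return profile[days_since_symptoms]

    print(infectiousness(2))                            # 0.8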

Then you introduce the disease into the virtual population, play the simulation forward a number of times with different random seeds, and get a composite, averaged picture of what happens.
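
Here is a rough sketch of that overall loop--characterize the disease, seed it, play the scenario forward several times with different random seeds, and average the results. The model inside it is deliberately simplified (no geography, invented parameters); the shape of the pipeline is the point.

    import random
    from statistics import mean

    def run_once(seed, n_people=2000, n_days=150, contacts_per_day=13,
                 p_transmit=0.03, incubation_days=5, infectious_days=7):
        """One simulated epidemic; returns the number of infectious people each day."""
        rng = random.Random(seed)
        state = ["S"] * n_people                # S, E (incubating), I (infectious), R (recovered)
        timer = [0] * n_people                  # days left in the current E or I stage
        for i in rng.sample(range(n_people), 10):       # introduce ten initial cases
            state[i], timer[i] = "I", infectious_days
        curve = []
        for _ in range(n_days):
            infectious = [i for i in range(n_people) if state[i] == "I"]
            for i in infectious:                # each case meets a few random people per day
                for j in rng.sample(range(n_people), contacts_per_day):
                    if state[j] == "S" and rng.random() < p_transmit:
                        state[j], timer[j] = "E", incubation_days
            for i in range(n_people):           # advance everyone's disease clock
                if state[i] in ("E", "I"):
                    timer[i] -= 1
                    if timer[i] <= 0:
                        state[i] = "I" if state[i] == "E" else "R"
                        timer[i] = infectious_days if state[i] == "I" else 0
            curve.append(len(infectious))
        return curve

    # the same scenario, played forward several times with different random seeds,
    # then averaged into one composite curve
    runs = [run_once(seed) for seed in range(10)]
    average_curve = [mean(day) for day in zip(*runs)]
    peak = max(average_curve)
    print(f"average peak: about {peak:.0f} infectious people on day {average_curve.index(peak)}")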

2. What are some advantages to this approach?


There are a number of great advantages to this approach in epidemiology. A lot of statements about how diseases spread are really just abstractions of exactly this sort of real-life behavior. We say that diseases spread exponentially in the early phases of an epidemic, but this is just a rough mathematical approximation of the behavior of a self-replicating virus as it spreads through networks of people. A good microsimulation takes a more exact accounting of real-world social distances and movement. There will be a real-world mix of dense urban centers and more spread-out suburban neighborhoods, there will be an age and health spread in the simulated population that matches the real world, sick people will slow down their movements, and so on.

Some people have doubted that Covid-19 will behave as badly in the U.S. as it has in China and in Italy. They've raised potential differences between us and them that could matter: different baseline health of the population, different population density, different levels of cultural contact, different standards of hygiene, different access to health care. If the microsimulation is a good match to a given population, none of these objections would apply to its projections.

Furthermore, the type of human and virus behavior that needs to be modeled in order to get a realistic simulation is not actually that complex. Basically, the only human behaviors that need to be about right are movement and proximity. Viruses are also pretty simple organisms, and it doesn't take too much data to get reasonably accurate knowledge of the average behavior of an infection in humans in the aggregate. So I think the results of these simulations tend to be fairly robust.

Another advantage is that you can use the same model to test multiple scenarios. Not sure of the exact R0 of your virus? You can run your simulation across a range of values and see how they affect the outcome.
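
For example, here is what such a sweep might look like on a deliberately crude stand-in model (again, every number is invented): the same simulation run across a range of plausible R0 values, reporting how much of the population ends up infected under each.

    import random

    def attack_rate(r0, n_people=2000, infectious_days=7, n_runs=5):
        """Fraction of the population ever infected, averaged over several random runs."""
        p_daily = r0 / infectious_days / n_people   # per-day chance one case infects one given person
        results = []
        for seed in range(n_runs):
            rng = random.Random(seed)
            susceptible, infectious, recovered = n_people - 5, 5, 0
            while infectious > 0:
                p_infected_today = 1 - (1 - p_daily) ** infectious
                new_cases = sum(rng.random() < p_infected_today for _ in range(susceptible))
                new_recoveries = sum(rng.random() < 1 / infectious_days for _ in range(infectious))
                susceptible -= new_cases
                infectious += new_cases - new_recoveries
                recovered += new_recoveries
            results.append(recovered / n_people)
        return sum(results) / len(results)

    for r0 in (1.5, 2.0, 2.4, 3.0):                 # unsure of the true value? sweep a range
        print(f"R0 = {r0}: roughly {attack_rate(r0):.0%} of the population infected overall")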

3. What are some disadvantages to this approach?


Not a lot, but I can think of a few. First, as with any computer simulation, the results are only as good as the underlying assumptions and the underlying programming. This can cause it to miss some things. For example, we are now seeing an absolutely unprecedented level of public awareness of and discussion about the Covid-19 pandemic. This by itself is likely to change societal behavior--has the model taken this sort of public awareness, and the consequent fear of spreading infection, into account? I don't know. Also, if the underlying characteristics of the virus are not correctly understood, then you have a case of "garbage in, garbage out". Yes, you can run the simulation on a range of inputs, but if your understanding of the parameters is far out of line with reality, the results won't be very helpful.

Second, the results these types of simulations provide are very specific and realistic. But "realistic" is not the same thing as "real," and I think these simulations tend to produce a bit of over-confidence in their results, because it looks like you're observing reality when you are only roughly simulating it.

4. How good a fit to reality is the model that was used in this study?


That, I am not sure of. However, I think there is good reason to trust that it's a fairly good fit. In epidemiology, we have an opportunity to run such models and test their output every year, because we use them to predict the movements of seasonal flu. We've also used them to model known outbreaks from the past to see whether the results they produce match known historical outcomes.

I'll stop my first post here. In a subsequent post, I'll get into the assumptions which this particular study used: how it characterized the virus and how it justified those characterizations.
