Why COVID-19 death predictions will always be wrong


Following is a transcript of the video.

Narrator: At the end of March, the White House announced that it was predicting somewhere between 100,000 and 240,000 US deaths from COVID-19, a huge drop from a couple of weeks earlier, when the predictions were more like a couple million deaths. This led to a lot of confusion about where the White House's numbers came from and why the predictions shifted.

So let's break down where those numbers are coming from. To make any prediction about how many people will die from a virus, you have to first know how many people will get it, and that's why this number keeps coming up.

Emily Ricotta: The basic reproductive number, or R0, is, in a totally susceptible population, how many people would one person go on to infect.

Narrator: This is Emily Ricotta, a research fellow with the US National Institutes of Health.


Ricotta: So, if you drop one infected person into a totally susceptible population, how many more cases are you gonna see?

Narrator: As you can probably imagine, R0 is pretty important for predicting how bad an outbreak will be. If R0 is less than one, over time you'll end up with fewer and fewer new cases and the disease will die out. If R0 is more than one, that's when you can start to run into some problems, depending on how severe the illness is. Even an R0 of 2 gets more than 1,000 people infected only nine links down the chain.
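The arithmetic behind that chain can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption that is not from the video: every case infects exactly R0 others, and nobody is immune or infected twice. The function name `cumulative_infections` is made up for this example.

```python
def cumulative_infections(r0: float, links: int) -> float:
    """Total people infected after `links` links in the chain, starting from one case.

    Simplifying assumption: each case infects exactly `r0` new people.
    """
    total = 0.0
    new_cases = 1.0  # link 0: the first infected person
    for _ in range(links + 1):
        total += new_cases
        new_cases *= r0  # each case in this link infects r0 people in the next
    return total


print(cumulative_infections(2, 9))    # 1023.0
print(cumulative_infections(0.5, 9))  # stays below 2: the chain fizzles out
```

With an R0 of 2 the total passes 1,000 by the ninth link, while an R0 below one never gets past a handful of cases.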

Ricotta: So, the common cold. If it has an R0 of 2, how does that change policy? It doesn't, right? But if I have a pathogen that has an R0 of 2 and it's killing 1%, 10%, 50% of the people it infects, I'm going to respond much differently.

Narrator: R0 should be pretty simple to calculate. It's based on three main things: transmissibility, or how likely it is that you'll be infected through contact with someone who has the disease; average rate of contact, or how many people the average infected person will come in contact with over time; and finally duration of infectiousness, which is just how long the person spreading the disease is contagious for. Getting those factors involves a whole bunch of calculus that takes into account things like how many people at any given time are susceptible to infection and how many are actually infected. This is what some of the simplest equations look like. Of course, just having the equations isn't enough.
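As a rough sketch of what those equations do, the Python below multiplies the three factors into an R0 and then steps through a textbook susceptible-infected-recovered (SIR) model. The SIR model is a standard teaching example, not necessarily the one shown in the video, and every input value here (0.04, 10 contacts a day, 6 days, a population of 1 million) is a hypothetical number chosen only so that R0 comes out near 2.4.

```python
def basic_reproduction_number(transmissibility: float,
                              contacts_per_day: float,
                              infectious_days: float) -> float:
    """R0 as the product of the three factors named in the transcript."""
    return transmissibility * contacts_per_day * infectious_days


def simulate_sir(r0: float, infectious_days: float, population: float,
                 initial_infected: float, days: float, dt: float = 0.1):
    """Step a basic SIR model forward in time with small Euler steps.

    Returns the final (susceptible, infected, recovered) counts.
    """
    gamma = 1.0 / infectious_days  # recovery rate per day
    beta = r0 * gamma              # infection rate per day
    s = population - initial_infected
    i = float(initial_infected)
    r = 0.0
    for _ in range(int(round(days / dt))):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
    return s, i, r


# Hypothetical inputs, chosen for illustration only.
r0 = basic_reproduction_number(0.04, 10, 6)
s, i, r = simulate_sir(r0, infectious_days=6, population=1_000_000,
                       initial_infected=1, days=365)
print(f"R0 = {r0:.1f}, share ever infected = {r / 1_000_000:.0%}")
```

Changing any one of the three inputs changes R0, which is why the same pathogen can behave so differently from one place to the next.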

Ricotta: The thing that I want to emphasize about R0 is that it is very specific to the time of the outbreak, the place, the population. So there's never really just one R0 for a pathogen.

Narrator: And those three factors? You just don't know that information with a new outbreak.

Ricotta: The earlier that you are building these models in an outbreak, the harder it is, because you have more educated guesses and data that's not as specific to the outbreak as you'd want.


Narrator: Normally, scientists can estimate things like transmissibility based on data from previous outbreaks, but for the early predictions of the spread of COVID-19, all scientists had were the numbers from Wuhan, China, along with the data we've collected about other types of coronaviruses that infect humans.

Ricotta: What we do with it, especially at the early stages of an outbreak, is that we take data from the endemic coronaviruses, and we take data from what we saw in SARS, we take data from what we saw in MERS, and we say, OK, let's see what happens if we make COVID, you know, spread the same as an endemic coronavirus. How many people does that infect? And then as we progress and we get more modern data, we start feeding that into the model and updating it as we go.

Narrator: The early estimates for R0, based on the initial outbreak in Wuhan, were 2.2 to 2.7, so more than the flu. Which brings us to the more detailed models for COVID-19 that we were talking about earlier.

Ricotta: R0 is gonna be different depending on where you are. So if I drop an infected person into the middle of New York City and I drop an infected person into the middle of rural America, R0s are gonna be two very different things because the number of people that are gonna come in contact with each other are very different.

Narrator: The report that predicted millions of deaths in the US alone, which was put together by researchers at Imperial College London, used 2.4 as the average R0 for the coronavirus. That was based on transmission rates reported early on in China. From other data, they estimated things like the percentage of cases where the patient needed to be hospitalized. They predicted that 30% of those hospitalized would need critical care, like a ventilator, based on the rates among early cases.


Other factors needed more guesswork. For example, that half of the people who needed critical care would die, a number they landed on based on input from clinical experts. When they used those numbers to model the epidemic in the US and Great Britain, they predicted that about 2.2 million people in the US and 510,000 people in Great Britain would die without policies to slow the spread. Those numbers aren't as relevant anymore, though, since most countries did adopt physical-distancing measures. One major problem with all this is that testing data might not reflect how many people actually have COVID-19. Tests haven't been widely available in many places, including the US, so plenty of people without symptoms or with mild symptoms aren't being tested, and that totally throws off the number for R0. If you don't know how many people have been infected, you can't really calculate transmissibility.
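The severity assumptions described for the Imperial College report (30% of hospitalized patients needing critical care, half of those dying) chain together as simple multiplication. The sketch below is not the report's actual model. The starting figure of 1,000 hospitalized patients is a hypothetical number used only to make the arithmetic concrete.

```python
CRITICAL_CARE_SHARE = 0.30   # share of hospitalized patients needing critical care
CRITICAL_CARE_DEATHS = 0.50  # share of critical-care patients who die


def expected_outcomes(hospitalized: float) -> tuple[float, float]:
    """Return (critical-care patients, deaths) for a number of hospitalizations."""
    critical = hospitalized * CRITICAL_CARE_SHARE
    deaths = critical * CRITICAL_CARE_DEATHS
    return critical, deaths


# Hypothetical input: 1,000 hospitalized patients.
critical, deaths = expected_outcomes(1_000)
print(f"{critical:.0f} in critical care, {deaths:.0f} deaths")
```

Because each step multiplies the one before it, a small error in any single percentage carries through to the final death estimate.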

That's why at least one major group, the Institute for Health Metrics and Evaluation, or IHME, makes its predictions based on reported deaths instead of R0. The group is based out of Washington, and we know that the US government at least is somewhat referencing its model, which might explain why the White House predictions were so different. The IHME team figured that death rates, while not 100% perfect, would still be a more accurate statistic than the number of people who have COVID-19. People with the most severe cases usually have been getting tested, which means we have at least a semi-accurate idea of how many people are dying specifically from the disease. By analyzing the pattern of death rates in Wuhan, they were able to come up with a mathematical formula for how different physical-distancing measures, like closing schools, affected the number of deaths. Then they applied that model to hot spots in the US, as well as the country as a whole, taking into account average death rates for different age groups since populations can differ in that aspect. Basing their model on death rates was also a useful way to predict the number of people who would need to be hospitalized. If you're predicting 100 deaths and previous data is saying 10% of those hospitalized die, you can work backwards and guess that roughly 1,000 people were probably hospitalized. When the White House released its 100,000-to-240,000 range for the number of deaths in the US, it cited the IHME model as a main source it was looking at. Like with the models based on R0, the IHME predictions continue to change as the pandemic continues. If anything is clear from all of these different models, it's that none of them can be perfectly accurate.
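The working-backwards step in the narrator's example, from predicted deaths to hospitalizations, is a single division. The sketch below only reproduces that example. The function name is made up, and it is not the IHME's actual method.

```python
def estimate_hospitalizations(predicted_deaths: float,
                              hospital_fatality_rate: float) -> float:
    """Work backwards from deaths to hospitalizations.

    `hospital_fatality_rate` is the share of hospitalized patients who die,
    taken from previous data (10% in the narrator's example).
    """
    if not 0 < hospital_fatality_rate <= 1:
        raise ValueError("fatality rate must be between 0 and 1")
    return predicted_deaths / hospital_fatality_rate


# The narrator's example: 100 deaths at a 10% fatality rate.
print(round(estimate_hospitalizations(100, 0.10)))  # 1000
```

The estimate is only as good as the fatality rate fed into it, which is one reason these predictions shift as new data comes in.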

Ricotta: These numbers aren't static. They're constantly evolving, especially as we get new data, especially as we are put into new situations.

Narrator: Researchers don't really expect to get all of the hard and fast numbers that would allow them to calculate an exact R0, but for our models to be as accurate as possible, we need more reliable data, and the best way to get that data is by testing more people so we have better info on how many people catch the virus and when. That way, with some clever math, we can get the basic information we need to see what works for slowing COVID-19's spread, and then we can use those strategies to help keep people safe.

Read the original article on Business Insider