Many sources now show total deaths in the USA as having already surpassed 50k. Even those sources are undercounts, we are sure. (NYC has experienced 4000 more deaths than would be expected for the period of the pandemic, even after subtracting deaths confirmed to be a result of COVID-19.) Nevertheless, data collated by the European Centre for Disease Prevention and Control (ECDC) are considered just as “official” and yet aren’t the same as those presented by the USA CDC. Since I started off with the ECDC data, I have to continue to use it for consistency’s sake, but it is interesting to note that it appears to be somewhat behind.
For instance, today it is reporting an official death toll just a few dozen below 50,000:
However, data on the USA from a Johns Hopkins University website is being widely used by US media, and it shows more than 50,000 deaths as of this morning, according to CBS News. It’s not entirely clear why data retrieved through the ECDC 6 hours later is showing fewer deaths than Johns Hopkins reported this morning, but the discrepancies are interesting. In particular, it’s interesting that 2 days ago I said it would take 3 days to reach the 50k threshold, yet after only 2 days the ECDC data puts us just 37 deaths away and Johns Hopkins puts us over that number already. Why is that, exactly? Well, we don’t know, but you can look at the numbers of deaths reported each day on the right side of that table above and see that the deaths ECDC recorded on April 24 (3,179) far exceed the number recorded and reported yesterday (which, IIRC, was 1,700-something).
While I haven’t yet found the reason why reported deaths surged, it’s possible that this is purely bureaucratic (e.g. some deaths that happened earlier were simply not communicated promptly). France had a misleading jump in deaths when it corrected its official tally to include confirmed deaths that occurred in assisted living facilities. (Prior to that it only officially reported deaths in hospitals, even when people who died outside of hospitals had been tested and their deaths were confirmed to be COVID-19 related.) If this is not a one-time bump, it suggests that about 3 weeks ago there was an uptick in transmission. But that was April 3rd, when red states were just beginning to enact shelter-in-place regimes and before the recent protests and resistance to sheltering in place. It seems unlikely that April 3rd would be a time of increasing transmission, so ultimately I think this will be explained as random variability and/or a correction resulting from deaths being reported more than a day after they happened.
If it isn’t, of course, that’s bad. But for now I think that the surge past 50,000 that is being widely reported doesn’t yet indicate a resurgence of transmission and illness.
P.S. The Johns Hopkins data is difficult to use, which is another reason to stick with the ECDC data. It’s presented as deaths in different jurisdictions, which is fine, but for countries other than the US & Canada it lumps everything together into whole-nation numbers for places like Italy, while in the US & Canada it reports by city or county. This means that while the Johns Hopkins data might be just as accurate as the ECDC data, if not more so, it is much more difficult for me to use it to track deaths in the USA as a whole, which was my original aim. JHU obviously put a lot of work into their website, but the extra work to add a sum for Canada (and possibly each province) & a sum for the USA (and possibly each state) is trivial compared to all the work necessary for their end users to sum the various jurisdictions by hand. So, y’know, fuck that. I won’t be using them.
So let this serve as a reminder to everyone: if you’re creating a data visualization for the general public, don’t forget to include tools that make it easy to aggregate related data.
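To give a sense of how small that aggregation step is for a publisher, here is a minimal sketch in Python. It is not how JHU or the ECDC actually structure or serve their data: the column names (`country`, `jurisdiction`, `deaths`), the `national_totals` helper, and the sample rows are all made up for illustration.

```python
import csv
import io
from collections import defaultdict


def national_totals(rows, country_key="country", deaths_key="deaths"):
    """Sum per-jurisdiction death counts into one total per country.

    `rows` is any iterable of dicts. The key names are hypothetical
    and would need to match whatever the real data file uses.
    """
    totals = defaultdict(int)
    for row in rows:
        totals[row[country_key]] += int(row[deaths_key])
    return dict(totals)


# Made-up placeholder rows (NOT real figures): some countries are split
# by county or province, others arrive as a single whole-nation row.
SAMPLE = """country,jurisdiction,deaths
US,County A,10
US,County B,5
Canada,Province X,3
Italy,,7
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
totals = national_totals(rows)
print(totals)
```

A roll-up like this, run once on the publisher's side, would spare every reader from summing the jurisdictions by hand.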
jrkrideau says
I sometimes think that the creators of data pages go out of their way to annoy us. Perhaps there is a yearly award?