Frequently Asked Questions
Where did you source your data from?
Most U.S. state-level data comes from the COVID Tracking Project by The Atlantic. Country-level data comes from Our World in Data (OWID). You can learn more about where the data comes from in the data section of the website.
How can I trust your data?
You don't have to! All of the code for this website, including data sourcing and custom calculations, is open source and auditable by anyone on GitHub here. If you're comfortable running a couple of commands in the terminal, you can even run the site yourself with your own data.
Why are only certain regions shown?
This was primarily a matter of convenience. There are websites that can give you COVID tracking data for nearly every country in the world and even regional-level data for many of those (such as states and counties in the US). Since the point of this project is to show how narratives can be shaped using data, most of the comparisons were chosen because of how often they are discussed together (e.g. NY compared to Florida) or because they offer more similar situations (strain of the virus, demographics, geography, etc.).
In particular, some countries don't make as much sense to compare for data purposes. For example, different areas of the world were hit with different strains of the virus, some being more deadly than others (notably the European strain, which is what primarily affected the U.S.).
New Zealand is often cited as a country that handled the pandemic well. However, its situation (incredibly remote, an island that is easier to block travel to and from, and much more sparsely populated than other parts of the world) makes it a less useful comparison for what countries in Europe and North America faced.
What if I think another region, comparison, or narrative should be shown?
What is "smoothed" data?
For "smoothed" data, I took the rolling 7-day average for any given date. Some data sources provide this out of the box; for others it had to be calculated manually. The reason to use this is that data reporting is not always consistent over time. For example, some data points get underreported on weekends. Averaging over the previous 7 days still gives an accurate (possibly more accurate) picture of the outcomes, and the smoothed line is easier to read in a graph.
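As a rough illustration of the smoothing described above, here is a minimal TypeScript sketch of a trailing 7-day average. The `DailyPoint` shape and the `rollingAverage` name are assumptions made for this example, not the site's actual code. Averaging over a shorter window for the first six days is just one possible way to handle the start of a series.

```typescript
// Hypothetical shape for one day's reported value (not taken from the site's code).
interface DailyPoint {
  date: string; // ISO date, e.g. "2020-07-01"
  value: number;
}

// Trailing rolling average: each output point averages the current day and
// up to (windowSize - 1) preceding days. The first few points use a shorter
// window because fewer preceding days exist.
function rollingAverage(points: DailyPoint[], windowSize = 7): DailyPoint[] {
  return points.map((point, i) => {
    const start = Math.max(0, i - windowSize + 1);
    const recent = points.slice(start, i + 1);
    const sum = recent.reduce((total, p) => total + p.value, 0);
    return { date: point.date, value: sum / recent.length };
  });
}
```

With this approach, a weekend dip in reporting is spread across the surrounding week instead of showing up as a sharp drop in the chart.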
What are some other narratives you plan on adding?
My next priorities are the narratives around lockdowns, mobility, and possibly mask usage as these seem to be some of the most contentious, political, and, as a result, likely manipulated narratives.
I would also like to look more into the economic ramifications of policy decisions, particularly how they compare and correlate with health outcomes.
Data isn't my specialty, though, and these are trickier things to graph, especially since the data is not as precise and the impacts are harder to display graphically. If this is something you're interested in and have experience with, feel free to reach out!
Why are some of the colors so terrible?
The colors are generated using a random color generator seeded by an item from the data being displayed, making the color deterministic and consistent. The generator is supposed to pick from an aesthetically pleasing palette, but it's not perfect. Picking the colors programmatically rather than manually assigning colors to each chart, line, and bar means I can easily remove and add new data sets with minimal effort, but it also means that sometimes there are some jarring color choices.
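To show what "seeded" means in practice, here is a minimal TypeScript sketch that derives a color from a string such as a region name. It uses a simple string hash mapped to an HSL hue rather than the generator the site actually uses, and the `hashSeed` and `colorForSeed` names and the fixed saturation and lightness values are assumptions made for this example.

```typescript
// Hypothetical helper: hash a seed string to a non-negative 32-bit integer.
function hashSeed(seed: string): number {
  let hash = 0;
  for (let i = 0; i < seed.length; i++) {
    // Multiply by 31 and add the character code, truncating to 32 bits each step.
    hash = (Math.imul(hash, 31) + seed.charCodeAt(i)) | 0;
  }
  // Reinterpret as unsigned so the result is never negative.
  return hash >>> 0;
}

// Map the hash to a hue between 0 and 359 and return a CSS color string.
// Saturation and lightness are fixed, illustrative values.
function colorForSeed(seed: string): string {
  const hue = hashSeed(seed) % 360;
  return `hsl(${hue}, 65%, 50%)`;
}
```

The same seed always produces the same color, so a region keeps its color across charts and rebuilds. Nothing stops two seeds from landing on similar or clashing hues, which is the trade-off described above.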
How often is the data updated?
There's currently no set schedule, but every chart should make clear the time frame being used. If it's missing somewhere, please let me know on Twitter or GitHub, or you can submit a pull request with a fix. For any raw data that is kept in JSON form on GitHub, you can also see the last date a file was updated here or here. Sometimes, though, shorter selective time frames are deliberate, as this can be a common way to shape a narrative with data.
What's the tech stack you use?
Everything is written in TypeScript for no other reason than that JavaScript is the language I'm most comfortable with, performance wasn't a huge issue, and I built this as a side project without much time to work with more data-friendly languages.
The site is built and deployed as a serverless app using Gatsby. Data pre-processing is done in Node.js, and the site is hosted and deployed automatically using Vercel. Visualizations are rendered using Recharts, a wrapper library around D3.