COVID-19 Visualization
This is a special blog post, but after making a major decision yesterday about how to compute bounds for the ConstraintSolver, I promise the next post will be about the ConstraintSolver again.
Given recent events, I wanted to create a visualization of the COVID-19 outbreak and play around with the publicly available data a little bit.
I’m a huge fan of data visualization, especially on maps.
First of all we need to get the data, and then I want to visualize the total number of cases over time. This can easily be modified to show only active cases or, which I’m also interested in, the number of cases per 100,000 people. I’m planning to update the visualization at the end of the post regularly as new data arrives.
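The per-100,000 figure mentioned above is just the case count scaled by the population. A minimal sketch, where the function name `cases_per_100k` and the numbers are made up for illustration and are not taken from the dataset:

```julia
# Cases per 100,000 people: scale the case count by the population.
# Multiplying before dividing keeps the arithmetic exact for round numbers.
cases_per_100k(cases, population) = cases * 100_000 / population

# Hypothetical values, for illustration only.
example_cases = 500
example_population = 10_000_000

println(cases_per_100k(example_cases, example_population))
```

The population per country is not part of the case data, so it would have to come from a separate source.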
Getting the data
Most people probably look at this map to check the current status, but I’m not a huge fan of the circles. I wanted to create an actual overlay over each country instead: it’s quite easy to get the shapefiles of the countries, whereas the ones for the provinces would take more time and effort.
They use the data from this repo, which I also use.
The goal after downloading the data:
download("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv", "covid.csv")
is to combine all cases from Mainland China into one row, and the same for all other countries of course.
To get there, we start by removing some columns we don’t need and renaming the Country/Region column.
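using DataFrames, CSV
function summarize_data()
    # Written against the CSV.jl API of the time; newer versions expect a sink,
    # e.g. CSV.read("covid.csv", DataFrame)
    df = CSV.read("covid.csv"; copycols=true)
    # Drop the columns we don't need
    select!(df, Not(Symbol("Province/State")))
    select!(df, Not(:Lat))
    select!(df, Not(:Long))
    # Use a column name that is easier to type
    rename!(df, Symbol("Country/Region") => :Country)
end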
I also don’t have a shapefile for some countries later on. One of them is Hong Kong, which I count as China here. Hopefully that’s okay for this visualization…
for row in eachrow(df)
    if row[:Country] == "Hong Kong"
        row[:Country] = "Mainland China"
    end
end
The next step is to sum up the cases grouped by :Country, which can be done using:
adf = aggregate(df, :Country, sum)
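To make clear what this grouping step produces, here is a small sketch in plain Julia on made-up rows. It is not how `aggregate` works internally, and the helper name `sum_by_country` is hypothetical; it only shows the same effect of collapsing all rows of a country into one row of sums. As far as I know, newer DataFrames.jl releases no longer ship `aggregate`, and `combine` together with `groupby` plays a similar role there.

```julia
# Toy rows: (country, cases per day). The numbers are made up.
rows = [
    ("A", [1, 2]),
    ("B", [5, 5]),
    ("A", [10, 20]),
]

# Sum the per-day case vectors of all rows that share a country.
function sum_by_country(rows)
    totals = Dict{String, Vector{Int}}()
    for (country, cases) in rows
        if haskey(totals, country)
            totals[country] .+= cases      # add element-wise to the running sum
        else
            totals[country] = copy(cases)  # copy so the input is not mutated
        end
    end
    return totals
end

totals = sum_by_country(rows)
```

Here the two rows for "A" end up as a single entry holding the element-wise sum of both.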
I also renamed some countries to match the shapefiles later, and renamed the dates to my preferred format.
# All columns except the first one (:Country) are dates
dates = names(adf)[2:end]
for date in dates
    # Strip the last four characters (the suffix added by the aggregation)
    col_name = string(date)[1:end-4]
    # Turn month/day/year into day.month.year
    parts = split(col_name, "/")
    col_name = "$(parts[2]).$(parts[1]).$(parts[3])"
    rename!(adf, date => Symbol(col_name))
end
# Use the country names that the shapefiles expect
for row in eachrow(adf)
    if row[:Country] == "Mainland China"
        row[:Country] = "China"
    elseif row[:Country] == "US"
        row[:Country] = "United States"
    elseif row[:Country] == "UK"
        row[:Country] = "United Kingdom"
    end
end
CSV.write("summarized.csv", adf)
Now we have a CSV file with one row per country and one column per date, holding the number of cases on that date.
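The date renaming above can be checked in isolation. This is a minimal sketch that assumes the aggregated column names look like `1/22/20_sum`, i.e. month/day/year followed by a four-character suffix; the function name `reformat_date_column` is made up for this example.

```julia
# Convert a column name like "1/22/20_sum" (month/day/year plus an assumed
# "_sum" suffix) into "22.1.20" (day.month.year).
function reformat_date_column(name::AbstractString)
    date_part = name[1:end-4]                 # drop the trailing four characters
    month, day, year = split(date_part, "/")  # the source order is month/day/year
    return "$(day).$(month).$(year)"
end

println(reformat_date_column("1/22/20_sum"))
```

Keeping this in a small function makes it easy to try out a different date format without touching the data frame.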
Visualization
I’m using yet another Julia plotting library here, after using Plots.jl and trying out Makie.jl elsewhere. This time I use Luxor.jl, as it has an example of a world map.
Let’s include all the libraries we need first:
using Shapefile, Luxor
using DataFrames, CSV
using ColorSchemes
using Dates
include(joinpath(dirname(pathof(Luxor)), "readshapefiles.jl"))
I’m not a huge fan of the include(), but that’s what was used in the documentation of Luxor so maybe…