One thing I've wanted to visualize from the hbg-crime.org dataset is
which times of day see the most and least crime, and in which parts of
the city. Julia's Gadfly plotting package makes that easy.
First, pull down the current dataset:
$ wget http://hbg-crime.org/reports.csv
Then launch Julia, and import all the libraries we'll be using.
using DataFrames
using Datetime
using Gadfly
We'll read the reports into a DataFrame:
data = readtable("reports.csv")
Then we need to convert the time of the report into an hour of the
day, from 0 (midnight to 1:00 am) to 23 (11:00 pm to midnight):
formatter = "yyyy-MM-ddTHH:mm:ss"
function hourofday(d::String)
Datetime.hour(Datetime.datetime(formatter, d))
end
@vectorize_1arg String hourofday
@transform(data, Hour => hourofday(End))
We're just creating a quick function that takes a String timestamp,
converts it to a DateTime, then extracts the hour; after that, we
just vectorize that function and apply it to the "End" column from the
data.
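For anyone following along on a current Julia release, here is a minimal sketch of the same hour extraction using the standard-library Dates module, which is where the Datetime package's functionality ended up. The sample timestamps are made up, and the format string assumes the reports use the same layout as the pattern above; note that Dates writes month as lowercase m and minute as uppercase M.

```julia
using Dates

# Assumed timestamp layout, equivalent to the "yyyy-MM-ddTHH:mm:ss" pattern
# used earlier. In Dates, `m` is month and `M` is minute.
const TIMESTAMP_FORMAT = dateformat"yyyy-mm-ddTHH:MM:SS"

# Parse a timestamp string and return its hour of day (0 through 23).
hourofday(ts::AbstractString) = hour(DateTime(ts, TIMESTAMP_FORMAT))

# Made-up sample values, not taken from the real dataset.
sample = ["2014-03-01T01:45:00", "2014-03-01T23:05:10"]

# Broadcasting with the dot syntax replaces the old @vectorize_1arg macro.
hours = hourofday.(sample)   # → [1, 23]
```

Broadcasting means there is no separate vectorization step: the scalar function is applied element by element wherever it is called with a dot.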
The last thing we need is to group those results by Neighborhood and
Hour, counting the reports in each group:
results = by(data, ["Neighborhood", "Hour"], nrow)
complete_cases!(results)
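To make the grouping step concrete, here is a rough sketch of what it boils down to, written in plain Julia without DataFrames: count the reports for each (neighborhood, hour) pair and skip anything without a neighborhood. The `reports` vector and its values are hypothetical stand-ins for the real columns.

```julia
# Hypothetical (neighborhood, hour) pairs; `missing` marks a report that
# has no neighborhood assigned.
reports = [
    ("Downtown", 1),
    ("Downtown", 1),
    ("Uptown", 14),
    (missing, 2),
    ("Uptown", 1),
]

# Number of reports for each (neighborhood, hour) pair.
counts = Dict{Tuple{String,Int},Int}()

for (neighborhood, hr) in reports
    # Skip incomplete rows, which has the same effect as dropping them afterwards.
    ismissing(neighborhood) && continue
    key = (neighborhood, hr)
    counts[key] = get(counts, key, 0) + 1
end

counts[("Downtown", 1)]   # → 2
```

Each key in `counts` corresponds to one row of the grouped results, and the value is the row count that ends up on the y axis of the plot.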
The complete_cases! function just strips out all of the rows with
missing values (the non-classified data), as they tend to give Gadfly
some problems. Speaking of which, all that's left is to create the plot
and draw it to an SVG file:
p = plot(results, y="x1", x="Hour", color="Neighborhood", Guide.XLabel("Hour of Day"), Guide.YLabel("Number of Reports"), Geom.bar(position=:dodge))
draw(SVG("results.svg", 6inch, 6inch), p)
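The color= attribute tells Gadfly to group the bars by the
"Neighborhood" column and give each neighborhood its own color, while
position=:dodge places those bars side by side within each hour
instead of stacking them.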
Crime spikes everywhere after dark and decreases during the day, but
unsurprisingly Downtown sees a disproportionate spike around 1:00-2:00
am when the bars let out.
Full source is available on GitHub.