Recently I've been digging into Julia for
technical and statistical computing. It's still in its early stages
and things break frequently, but I find it a lot more amenable to my
style of thinking than the more established technical languages like
Matlab/Octave or R.
I wanted to take the full hbg-crime.org
dataset and see which neighborhoods experienced crimes at which hours.
The full CSV set has timestamps
and classification by neighborhood already, so we just need to convert
the timestamp into an hour of the day (0 through 23) and then group
them by neighborhood.
To read and manipulate the data, we're going to use
DataFrames — it gives
us the NA
type for missing values (not uncommon in the dataset while
I'm still tuning the geolocation) and nice functions for dealing with
columns directly. For the timestamps, we're going to use
Datetime.
using DataFrames
using Datetime
Let's read the data in:
julia> data = readtable("reports.csv")
DataFrame with 868 rows, 7 columns
Columns:
Start 306 non-null values
End 868 non-null values
Description 868 non-null values
Address 868 non-null values
Lat 841 non-null values
Lon 841 non-null values
Neighborhood 802 non-null values
Then let's add a function that will map a column of timestamps into a
column of integer hours:
formatter = "yyyy-MM-ddTHH:mm:ss"
function eachHour(m)
map(h -> Datetime.hour(Datetime.datetime(formatter, h)), m)
end
To create a new table with a column called "Hour" with the results of
that function in it, we're going to use the @transform
macro:
withHours = @transform(data, Hour => eachHour(End))
Now to group the results down we're going to use the by
function
from DataFrames'
Split-Apply-Combine:
results = by(withHours, ["Neighborhood", "Hour"], nrow)
We're just passing by
a source, a list of columns we want to split
on, and a function to apply to their combination (nrow
, which just
counts them). And the results are just what we wanted:
julia> results
119x3 DataFrame:
Neighborhood hr x1
[1,] NA 0 4
[2,] "allison-hill" 0 18
[3,] "downtown" 0 5
[4,] "midtown" 0 10
[5,] "uptown" 0 13
[6,] NA 1 3
[7,] "allison-hill" 1 20
[8,] "downtown" 1 9
[9,] "midtown" 1 1
[10,] "uptown" 1 5
[11,] NA 2 4
[12,] "allison-hill" 2 24
[13,] "downtown" 2 7
[14,] "midtown" 2 6
[15,] "uptown" 2 4
[16,] NA 3 2
[17,] "allison-hill" 3 13
[18,] "downtown" 3 1
[19,] "midtown" 3 6
[20,] "uptown" 3 5
:
[100,] "uptown" 19 7
[101,] "allison-hill" 20 22
[102,] "downtown" 20 1
[103,] "midtown" 20 1
[104,] "uptown" 20 7
[105,] NA 21 3
[106,] "allison-hill" 21 20
[107,] "downtown" 21 2
[108,] "midtown" 21 1
[109,] "uptown" 21 6
[110,] NA 22 5
[111,] "allison-hill" 22 12
[112,] "downtown" 22 4
[113,] "midtown" 22 6
[114,] "uptown" 22 7
[115,] NA 23 1
[116,] "allison-hill" 23 17
[117,] "downtown" 23 2
[118,] "midtown" 23 5
[119,] "uptown" 23 8