Yesterday we looked at Julia’s support for tabular data, which can be represented by a DataFrame. The TimeSeries package implements another common data type: time series. We’ll begin by loading the TimeSeries package, but we’ll also add the Quandl package, which provides an interface to a rich source of time series data from Quandl.
julia> using TimeSeries
julia> using Quandl
Next we’ll get our hands on some data from Yahoo Finance, served through Quandl. By default these data will be of type TimeArray, although it is possible to explicitly request a DataFrame instead.
julia> google = quandl("YAHOO/GOOGL");                     # GOOGL at (default) daily intervals
julia> typeof(google)
TimeArray{Float64,2,DataType} (constructor with 1 method)
julia> apple = quandl("YAHOO/AAPL", frequency = :weekly);  # AAPL at weekly intervals
julia> mmm = quandl("YAHOO/MMM", from = "2015-07-01");     # MMM starting at 2015-07-01
julia> rht = quandl("YAHOO/RHT", format = "DataFrame");    # As a DataFrame
julia> typeof(rht)
DataFrame (constructor with 11 methods)
Taking a closer look at one of the TimeArray objects, we find that it actually consists of multiple data series, each represented by a separate column. The colnames field gives the names of the component series, while the timestamp and values fields provide access to the data themselves. We’ll see more convenient means for accessing those data in a moment.
julia> google
100x6 TimeArray{Float64,2,DataType} 2015-04-24 to 2015-09-15
             Open    High    Low     Close   Volume   Adjusted Close
2015-04-24 | 580.05  584.7   568.35  573.66  4608400  573.66
2015-04-27 | 572.77  575.52  562.3   566.12  2403100  566.12
2015-04-28 | 564.32  567.83  560.96  564.37  1858900  564.37
2015-04-29 | 560.51  565.84  559.0   561.39  1681100  561.39
⋮
2015-09-10 | 643.9   654.9   641.7   651.08  1384600  651.08
2015-09-11 | 650.21  655.31  647.41  655.3   1736100  655.3
2015-09-14 | 655.63  655.92  649.5   652.47  1497100  652.47
2015-09-15 | 656.71  668.85  653.34  665.07  1761800  665.07
julia> names(google)
4-element Array{Symbol,1}:
 :timestamp
 :values
 :colnames
 :meta
julia> google.colnames
6-element Array{UTF8String,1}:
 "Open"
 "High"
 "Low"
 "Close"
 "Volume"
 "Adjusted Close"
julia> google.timestamp[1:5]
5-element Array{Date,1}:
 2015-04-24
 2015-04-27
 2015-04-28
 2015-04-29
 2015-04-30
julia> google.values[1:5,:]
5x6 Array{Float64,2}:
 580.05  584.7   568.35  573.66  4.6084e6  573.66
 572.77  575.52  562.3   566.12  2.4031e6  566.12
 564.32  567.83  560.96  564.37  1.8589e6  564.37
 560.51  565.84  559.0   561.39  1.6811e6  561.39
 558.56  561.11  546.72  548.77  2.362e6   548.77
The TimeArray type caters for a full range of indexing operations, which allow you to slice and dice the data to your exact requirements: by row number, by date range or by column name. In addition, to() and from() extract the subset of the data before or after a specified instant.
julia> google[1:5]
5x6 TimeArray{Float64,2,DataType} 2015-04-24 to 2015-04-30
             Open    High    Low     Close   Volume   Adjusted Close
2015-04-24 | 580.05  584.7   568.35  573.66  4608400  573.66
2015-04-27 | 572.77  575.52  562.3   566.12  2403100  566.12
2015-04-28 | 564.32  567.83  560.96  564.37  1858900  564.37
2015-04-29 | 560.51  565.84  559.0   561.39  1681100  561.39
2015-04-30 | 558.56  561.11  546.72  548.77  2362000  548.77
julia> google[[Date(2015,8,7):Date(2015,8,12)]]
4x6 TimeArray{Float64,2,DataType} 2015-08-07 to 2015-08-12
             Open    High    Low     Close   Volume   Adjusted Close
2015-08-07 | 667.78  668.8   658.87  664.39  1374100  664.39
2015-08-10 | 667.09  671.62  660.23  663.14  1403900  663.14
2015-08-11 | 699.58  704.0   684.32  690.3   5264100  690.3
2015-08-12 | 694.49  696.0   680.51  691.47  2924900  691.47
julia> google["High","Low"]
100x2 TimeArray{Float64,2,DataType} 2015-04-24 to 2015-09-15
             High    Low
2015-04-24 | 584.7   568.35
2015-04-27 | 575.52  562.3
2015-04-28 | 567.83  560.96
2015-04-29 | 565.84  559.0
⋮
2015-09-10 | 654.9   641.7
2015-09-11 | 655.31  647.41
2015-09-14 | 655.92  649.5
2015-09-15 | 668.85  653.34
julia> google["Close"][3:5]
3x1 TimeArray{Float64,1,DataType} 2015-04-28 to 2015-04-30
             Close
2015-04-28 | 564.37
2015-04-29 | 561.39
2015-04-30 | 548.77
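The transcript above does not exercise to() and from(), so here is a rough sketch of the idea in plain Julia. It deliberately avoids the TimeSeries API: rows_from and rows_to are hypothetical helpers, the cut-offs are assumed to be inclusive, and the dates and closing prices are copied from the first five rows shown earlier. It is written for a current Julia release (hence `using Dates`), not the older version that produced the transcripts.

```julia
using Dates

# First five dates and closing prices, copied from the output above.
dates  = [Date(2015,4,24), Date(2015,4,27), Date(2015,4,28),
          Date(2015,4,29), Date(2015,4,30)]
closes = [573.66, 566.12, 564.37, 561.39, 548.77]

# Hypothetical helpers (NOT part of TimeSeries): indices of the rows that
# fall on or after, or on or before, a given date (inclusive by assumption).
rows_from(dates, d) = findall(t -> t >= d, dates)
rows_to(dates, d)   = findall(t -> t <= d, dates)

cutoff = Date(2015,4,28)
later   = rows_from(dates, cutoff)   # rows from the cut-off onwards
earlier = rows_to(dates, cutoff)     # rows up to the cut-off

println(dates[later], " => ", closes[later])
println(dates[earlier], " => ", closes[earlier])
```

The package versions operate on the whole TimeArray at once, so timestamps and values stay aligned without the manual index bookkeeping used here.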
We can shift observations forward or backward in time using lag() or lead().
julia> lag(google[1:5])
4x6 TimeArray{Float64,2,DataType} 2015-04-27 to 2015-04-30
             Open    High    Low     Close   Volume   Adjusted Close
2015-04-27 | 580.05  584.7   568.35  573.66  4608400  573.66
2015-04-28 | 572.77  575.52  562.3   566.12  2403100  566.12
2015-04-29 | 564.32  567.83  560.96  564.37  1858900  564.37
2015-04-30 | 560.51  565.84  559.0   561.39  1681100  561.39
julia> lead(google[1:5], 3)
2x6 TimeArray{Float64,2,DataType} 2015-04-24 to 2015-04-27
             Open    High    Low     Close   Volume   Adjusted Close
2015-04-24 | 560.51  565.84  559.0   561.39  1681100  561.39
2015-04-27 | 558.56  561.11  546.72  548.77  2362000  548.77
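To make the shifting explicit, the behaviour seen in the output above can be imitated with plain vectors. This is only an illustration inferred from that output, not the package's implementation: lag_pairs and lead_pairs are hypothetical helpers, and the code targets a current Julia release.

```julia
using Dates

# First five dates and closing prices, copied from the output above.
dates  = [Date(2015,4,24), Date(2015,4,27), Date(2015,4,28),
          Date(2015,4,29), Date(2015,4,30)]
closes = [573.66, 566.12, 564.37, 561.39, 548.77]

# Hypothetical helpers (NOT part of TimeSeries).
# Lagging by n pairs each date with the value observed n rows earlier,
# so the first n dates drop out.
lag_pairs(dates, values, n = 1)  = collect(zip(dates[n+1:end], values[1:end-n]))
# Leading by n pairs each date with the value observed n rows later,
# so the last n dates drop out.
lead_pairs(dates, values, n = 1) = collect(zip(dates[1:end-n], values[n+1:end]))

lagged = lag_pairs(dates, closes)       # 4 rows, starting 2015-04-27
led    = lead_pairs(dates, closes, 3)   # 2 rows, ending 2015-04-27

foreach(println, lagged)
foreach(println, led)
```

In both cases the result is shorter than the input by the size of the shift, which is why the lagged array above has 4 rows and the array led by 3 has only 2.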
We can also calculate the percentage change between successive observations, here as log returns.
julia> percentchange(google["Close"], method = "log")
99x1 TimeArray{Float64,1,DataType} 2015-04-27 to 2015-09-15
             Close
2015-04-27 | -0.0132
2015-04-28 | -0.0031
2015-04-29 | -0.0053
2015-04-30 | -0.0227
⋮
2015-09-10 | 0.0119
2015-09-11 | 0.0065
2015-09-14 | -0.0043
2015-09-15 | 0.0191
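For reference, the arithmetic behind log returns is easy to reproduce by hand. The sketch below uses only base Julia (current syntax, no TimeSeries calls) on the first five closing prices shown earlier. It assumes the "log" method means the difference of natural logarithms of consecutive prices, which is consistent with the figures printed above.

```julia
# First five closing prices, copied from the output earlier in the post.
closes = [573.66, 566.12, 564.37, 561.39, 548.77]

# Simple return: (p[t] - p[t-1]) / p[t-1]
simple_returns = diff(closes) ./ closes[1:end-1]

# Log return: log(p[t]) - log(p[t-1]), equivalently log(p[t] / p[t-1])
log_returns = diff(log.(closes))

println(round.(simple_returns, digits = 4))
println(round.(log_returns, digits = 4))
```

For small day-to-day moves the two measures are nearly identical, with the log return always slightly below the simple return. Log returns have the convenient property that they add up across consecutive periods.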
Well, that’s the core functionality in TimeSeries. There are also methods for aggregation and moving window operations, as well as time series merging. You can check out some examples in the documentation as well as on GitHub. Finally, watch the video below from JuliaCon 2014.
The post #MonthOfJulia Day 15: Time Series appeared first on Exegetic Analytics.