The Numerical Algorithms Group (NAG) and the NAG Library Introduction
The Numerical Algorithms Group (NAG) provides expertise in numerical engineering, by delivering high-quality computational software,
consulting services and high performance computing services. For over four decades NAG have collaborated with world-leading researchers
in academia and industry to create powerful, reliable and flexible software which is relied on by tens of thousands of individual users,
as well as numerous independent software vendors. As a not-for-profit company,
NAG reinvests surpluses into the research and development of its products, services, staff and its collaborations.
The NAG Library is the world’s largest collection of robust, documented, tested and maintained numerical and statistical algorithms.
It is callable from many different programming languages and environments such as .NET, C++, Java®, MATLAB®, Python® and Visual Basic®.
Some users of the Julia language already call the NAG Library into their applications.
This blog demonstrates that the NAG Library is also callable directly from Julia, using one of the routines from the NAG Library as an example.
This blog post is also available on the NAG website.
Introduction to Correlation Matrices
Correlations provide helpful information when working with possibly mutually related data sources, such as the value of stocks, options, precious metals or others. In the case of three or more data sources this produces a matrix known as a “correlation matrix” which has a special structure, shown in Julia code below:
Correlation (or, to give its full name, “Pearson’s product-moment correlation coefficient”),
is a measure of the linear relationship between two random variables. One of the features
that makes it particularly useful is that it is invariant to location-scale
transformations of the variables, giving values on the interval [-1,1]: 1 being perfect
positive linear relationship, 0 being no linear relationship (though, it should be
emphasised, not necessarily independent), and -1 being a perfect negative relationship.
A correlation matrix is then the set of pairwise correlations between multiple random
variables, where the (i,j)th element is the correlation between the ith and the
jth variable. Mathematically, a correlation matrix must satsify the following
properties:
-
it must be symmetric,
-
the diagonal elements must be 1, and
-
it must be positive semidefinite, which is equivalent to all eigenvalues being non-negative.
Estimating Correlation Matrices
Of course, we never get to observe the correlation matrix directly, instead it must be
estimated from data. Given a matrix Y
, where each row is a joint observation of the
variables, the empirical correlation matrix can be computed using the
cor
function in Julia.
The resulting matrix should satisfy the three properties above, though some small
deviations may arise due to numerical error in extreme cases:
The structure of correlations produces a matrix that should be symmetric, have unit diagonal and be
positive semidefinite. In
applications however it can happen that approximate correlations lead
to accordingly approximate correlation matrices which are symmetric and
have unit diagonal, but also have nontrivial negative eigenvalues, meaning
it is not positive semidefinite.
Larger deviations from positive definiteness may arise when the correlations are computed
from incomplete data: for example, two stock symbols may not have price data on the exact
same days.
The available approaches to handling the situation of missing data can be broken down into two broad categories;
model the missing values and use that model to generate a full set of data,
hence guaranteeing a semi-definitive correlation matrix; or calculate an approximate correlation matrix from the available data and adjust it to ensure that it is definite.
The first of these approaches is beyond the scope of this article and we only deal with the second:
calculating an approximate correlation matrix from existing data, adjusting if required.
This approach has the advantage of being applicable when an approximate correlation matrix is indefinite for a reason other than missing data,
for example, due to rounding during its calculation.
In the example given below, based on real financial data, it was not possible to calculate a full correlation matrix due to missing data.
Pairwise correlations between the stocks were calculated using as much data as possible, which yielded a matrix with a negative eigenvalue.
It is useful therefore to correlate incomplete data and then clean the resulting
correlation matrix by finding the “nearest” correlation matrix, and then use the nearest correlation
matrix as a proxy for the original correlation matrix in workflows which require those properties.
Nearest Correlation Matrix
To fix the problem of indefiniteness in an approximation correlation matrix, one
can define an optimization problem which fixes it appropriately without reducing
the original value of the true correlations. One could for example simply find the closest matrix in the Frobenius norm to an input
matrix, but subject to the constraints that the output is a correlation matrix.
In practice this may not be an acceptable solution, so there are alternative
formulations which enforce important properties
of the approximate correlation matrix input. NAG has implemented many of these different
approaches and through a partnership with Julia Computing a Julia
interface has been developed to access these nearest correlation routines in the NAG Library. The next
section will first detail the routines and following this there is
an explanation of how they can be accessed through Julia.
NAG Routines
The first and simplest routine in the NAG Library for nearest correlation matrices
solves an optimization problem and accepts an arbitrary matrix as input, this is
g02aa
, which solves the following optimization problem
where X
is a valid correlation matrix. An algorithm devised in 1 uses alternating projection to solve this optimization problem,
which has the virtue of being simple to implement, but tends to converge slowly. Newton’s algorithm
can have rapid convergence and was investigated in 2 along with suitable preconditioners
for the linear systems it yields. This algorithm has been implemented by NAG
Some common situations may make this simple approach less
appealing however. For example suppose only a few data pairs have problematic missing data,
then only a small sub-block of the approximate correlation matrix needs to be fixed and
a likely-small change added to the rest of the matrix to preserve the properties it had
to begin with. Thus targeting of a sub-block can be formulated as follows:
this was solved by Higham et al. in 3 and is implemented by NAG as routine g02an. A similar routine “g02ap” which will be released with the NAG Library Mark 26 can keep arbitrary entries of a matrix fixed, rather than a full sub-block, which affords the user more flexibility.
If fixing a sub-block or fixed entries of the input matrix does not model well the additional information
a user may have, then one can also weight fixed rows and columns in the optimization, formulated as follows:
where W
is an input diagonal matrix of weights.
The NAG routine g02ab
incorporates this weighted formulation.
Alternatively if weighting whole rows and columns is not adequate one can optionally weight
only specified entries with the routine g02aj,
(note this is a very expensive formulation).
Finally one may know that the output should have a k-factor structure, which means
that the off-diagonal entries of the output can be low-rank approximated.
The NAG Library also incorporates this additional information through the
reformulation
This optimization problem is solved in the NAG routine g02ae and produces a nearest correlation matrix which respects k-factor structure specified by one.
In summary, the NAG Library contains the following nearest correlation routines:
g02aa
Basic problem using Newton’s method.g02ab
Incorporates row/column weights.g02aj
Incorporates entrywise weights.g02an
Incorporates fixed submatrix.g02ap
Incorporates fixed entries.g02ae
Incorporates k-factor structure specified by user.
As a final remark it is worth noting that g02ab,
g02aj,
and g02ap
can also optionally include a bound on the lowest eigenvalue. This might be desirable for example
to clamp the smallest eigenvalue to zero rather than allowing a possible roundoff to be negative. Bounding the smallest eigenvalue can also help in enforcing a positive definite
output, rather than only positive semidefinite.
What follows will detail a NAG partnership with Julia Computing which makes these routines much easier to utilize from a Julia environment.
Julia Interface
Julia’s ccall
function provides an easy to use, “no boilerplate,” interface for calling out to shared C and Fortran libraries, such as the NAG C Library and NAG Fortran Library. The ccall
function allows for calling into compiled shared libraries directly without the need to write any “glue” code, perform low-level code generation to create function interfaces or even recompile a binary. Full documentation on the use of ccall
can be found in the “Calling C and Fortran Code” section of the Julia Language documentation.
For this project, we developed a set of Julia wrappers to the above set of nearest correlation matrix routines. Development of these wrappers consisted of the following components:
- a Julia
NAGdemo
module containing all of the Julia code for the interface. The module includes all Juliatype
definitions,const
values,function
definitions andexport
definitions for the Julia/NAG interface. - a constant string (
LIBNAGC_NAG
) pointing to the location of the NAG shared library distributed with a NAG C Library installation. - a function
nag_licence_query
/nag_license_query
that tests execution of the simple NAG routinea00acc
as a method of performing proper license checkout for the NAG Library. - translation of entries in the nag_types.h header file included with the NAG C Library into corresponding Julia objects.
- definition of proper error handling routines that can capture, store and print any errors in Julia that occur within the NAG C Library routines.
- definition of Julia wrapper functions to NAG’s nearest correlation matrix routines.
Below we will discuss each of these items in turn.
Defining a NAGdemo
module
In this post we define a simple NAGdemo module, included in full at the bottom of the page. NAGdemo is a Julia module much like any other Julia module, providing a standard mechanism for the organization of Julia code, as well as namespace resolution for all functions defined as part of the module when called from external Julia code. The list of exported functions in the module enumerate those functions that do not need to be called with full namespace resolution when the NAGdemo
module is loaded into a Julia session via a using NAGdemo
statement.
Defining a constant for the location of the NAG shared library
The constant string LIBNAGC_NAG
contains the location of the NAG shared library which is a required argument for all uses of ccall
that will call into the NAG C Library from Julia.
Testing access to a NAG licence
The nag_licence_query
\ nag_license_query
function provides our first, and simplest, example of calling into the NAG C Library from Julia using ccall
.
This particular use of ccall
takes 3 arguments:
- a tuple
(:a00acc, LIBNAGC_NAG)
containing a symbol for the name of the exported function to be accessed,:a00acc
, and theconst
string,LIBNAGC_NAG
, pointing to the location of the NAG shared library. The NAG routine:a00acc
outputs information about the version of the NAG C Library in use. - the output type,
Cint
, of the NAG C routine being called. In the case of:a00acc
, the output type is a C integer. The routine will return0
if a licence is acquired and the function executes successfully, or1
if the routine was unsuccessful. - a tuple of the types for all input arguments, In this particular case,
:a00acc
does not take any input arguments, so this tuple argument is empty.
Translating the NAG header file
The header file nag_types.h is a large file containing many C enums, typedefs and pre-processor macros. As the contents of this header file are widely used throughout the NAG C Library when referring to the input and output types of most functions, the definitions included in this header, or at least the atomic types and values that are represented by these definitions, need to be present in Julia when wrapping routines via ccall
.
While there are packages within the Julia ecosystem, such as Clang.jl, that can assist in automation of wrapping C libraries in certain situations, a manual translation of the necessary portions of nag_types.h file into corresponding Julia code is not difficult, and is sufficient for this exercise. Each of the enum entries within nag_types.h are fundamentally integer constant values and this Nearest Correlation Matrix demonstration only requires one of those enum values:
NagInt
is an alias for an appropriately sized integer value on Windows or Unix platforms, as needed for use of the NAG C Library in each operating system environment for appropriate 32-bit or 64-bit flavours as necessary.
Handling errors from NAG routines
The following block of Julia code defines a NagError
type that inherits from Julia’s abstract Exception
type, as well as various functions for handling and storing any errors returned by any NAG routine as Julia exceptions:
The call to reset_nag_error()
on load of the NAGdemo
module sets our custom error_handler
routine to collect and handle any errors thrown by NAG routines, as well as throw those errors as Julia exceptions. The last_nag_error()
function resets the stored list of NagError
values.
Creating Julia wrappers
For each of the NAG routines for which a Julia function API is to be created we start with inspection of the appropriate NAG function signature to determine the datatypes of all arguments.
With a ccall
invocation defined we can subsequently make a determination of which arguments to a NAG routine can be considered as purely input arguments, which arguments could be considered as purely outputs and which arguments could be considered both input and output arguments. This classification can help in the design of a user-facing Julia API.
Pure input arguments are those arguments which are passed to the NAG routine but are not modified in any way. Pure output arguments are those arguments for which memory is allocated prior to calling the NAG routine, but for which the initial values in that memory location have no importance the memory is only needed for returning an output value. Arguments that are both inputs and outputs are those arguments for which the initial value is both used by the NAG routine and modified before being returned.
Understanding the context of each argument in the NAG C API also helps to determine which of the C arguments might be able to have default values assigned via Julia keyword arguments, which of the C arguments might be able to infer their values from other input arguments at the Julia level and which of the C arguments might not need to be present at all within a user-facing Julia API.
For each of the nearest correlation routines considered in this exercise, the routine returns an output of type Void
.
For the g02aac
routine, its C API is the following:
Constructing a ccall
command for this API signature requires mapping the type of each input argument from C type to the corresponding Julia data type. The following table provides that mapping:
argument | C data type | Julia data type |
---|---|---|
order | Nag_OrderType | NagInt |
g | double[] | Ptr{Float64} |
pdg | Integer | NagInt |
n | Integer | NagInt |
errtol | double | Float64 |
maxits | Integer | NagInt |
maxit | Integer | NagInt |
x | double[] | Ptr{Float64} |
pdx | Integer | NagInt |
iter | Integer* | Ptr{NagInt} |
feval | Integer* | Ptr{NagInt} |
nrmgrd | double* | Ptr{Float64} |
fail | NagError* | Ptr{Void} |
With each of these variables defined in the current Julia scope a pure ccall
into NAG’s g02aac
routine would look like the following block of code:
As stated previously, for the design of our initial Julia API, we need to make a few decisions about which of the arguments to ccall
should be Julia inputs and which values should be returned as outputs.
In this proof of concept project, we made the following decisions for a first level Julia wrapper to g02aac
named nag_nearest_correlation!
:
- the scalar arguments being passed to
ccall
by value (e.g.order
,pdg
,n
,errtol
,maxits
,maxit
,pdx
) will be arguments tonag_nearest_correlation!
- all array arguments (e.g.
g
andx
) will be arguments tonag_nearest_correlation!
. Argumentg
is actually a pure input argument, as its value is not modified, while argumentx
could be considered a pure output argument as its memory is only used to store the output correlation matrix. We will return to this point below. - the scalar arguments being passed by reference into the
ccall
(e.g.iter
,feval
,nrmgrd
) are being used to track and report internal behaviour of the NAG routine. These pointers to scalar values will be created within the Julia routine and their values returned as Julia integer or floating point values as needed.
The initial implementation of the nag_nearest_correlation!
Julia function subsequently took the following form:
Using the same data defined above, we can call this routine as follows:
Now if we decide that we would like to provide an interface that does not require one to manage the memory allocation for x
(and also only provides the nearest correlation matrix x
as output), we can define the following method nag_nearest_correlation
that allocates the memory for x
internally and then calls nag_nearest_correlation!
.
But the values we have assigned for errtol
, maxits
and maxit
are essentially default values for these arguments. Consequently, we can assign these values in keyword arguments in our nag_nearest_correlation
function signature:
Furthermore, the values for arguments pdg
, n
and pdx
can be deduced from the size of the input array g
and can also be removed from the function signature for nag_nearest_correlation
as follows:
Calling this method now involves only the following code:
Having seen the entire process for construction of a user-friendly wrapper for the nag_nearest_correlation
function below are the full contents of our NAGdemo
module:
Summary
By using Julia’s features for calling external libraries we have provided easy interfaces into the NAG nearest correlation matrix routines. This example shows that the numerical computing capabilities of Julia can be easily extended to incorporate pre-existing industry grade libraries. For users of Julia, NAG integration allows the possibility of including widely utilized, specialized libraries developed outside of the Julia ecosystem, and for users of the NAG Library it means their existing workflows depending on NAG need not be disrupted by a switch to Julia.
In summary: the NAG Library, which is the worlds’ largest collection of robust and commercially maintained numerical and statistical routines and Julia, which is
a high-productivity, high-performance programming language is a winning combination.
If you are interested in a more extensive integration of Julia and the NAG Library then please contact Julia Computing at info@juliacomputing.com and NAG at infodesk@nag.com.
Trademark Usage
Java® is a registered trademark of Oracle Corporation
MATLAB® is a registered trademark of The MathWorks, Inc.
Python® is a registered trademark of the Python Software Foundation
Visual Basic® is a registered trademark of Microsoft Corporation
References
-
N. J. Higham. “Computing the nearest correlation matrix – A problem from finance”. IMA J. Numer. Anal., 22(3):329–343, 2002. doi: 10.1093/imanum/22.3.329. ↩
-
R. Borsdorf and N. J. Higham. “A preconditioned (Newton) algorithm for the nearest correlation matrix”. IMA J. of Numer. Anal., 30(1):94–107, 2010. doi: 10.1093/imanum/drn085. ↩
-
N. J. Higham, N. Strabić, and V. Šego, “Restoring definiteness via shrinking, with an application to correlation matrices with a fixed block”. SIAM Rev., 58(2), 245–263, 2016. doi:10.1137/140996112. ↩