Author Archives: Bradley Setzler

UPDATE: A Comparison of Programming Languages in Economics

By: Bradley Setzler

Re-posted from: https://juliaeconomics.com/2014/06/16/update-a-comparison-of-programming-languages-in-economics/

Here, you can find the latest version of the paper and all of the codes used in A Comparison of Programming Languages in Economics, by S. Boragan Aruoba and Jesús Fernández-Villaverde.

This is the paper that convinced me to give Julia a try.


Bradley J. Setzler

Bootstrapping and Non-parametric p-values in Julia

By: Bradley Setzler

Re-posted from: http://juliaeconomics.com/2014/06/16/bootstrapping-and-hypothesis-tests-in-julia/

* The script to reproduce the results of this tutorial in Julia is located here.

Suppose that we wish to test whether the parameter estimates of \beta are statistically different from zero, and whether the estimate of \sigma^2 is different from one, for the OLS parameters defined in a previous post. Suppose further that we do not know how to compute the standard errors of the MLE analytically; the MLE estimates themselves were presented in the most recent post.

We decide to bootstrap by resampling cases in order to estimate the standard errors. This means that we treat the sample of N individuals as if it were a population from which we randomly draw B subsamples, each of size M<N. This produces a sample of MLEs of size B, that is, it provides an empirical approximation to the distribution of the MLEs. From the empirical approximation, we can compare the full-sample point MLE to the MLE distribution under the null hypothesis.

Unfortunately, Julia does not have built-in bootstrapping yet, but I found a way to do it in one line with the randperm function.
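For example, with N individuals in the sample and a desired subsample size M (both defined below), a single line draws the random index for one subsample; this is the same line that appears inside the bootstrap loop later in this post:

theIndex = randperm(N)[1:M]   # randomly permute 1:N, then keep the first M positions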


Wrapper Functions and the Likelihood of Interest

Now that we have a bootstrap index, we define the log-likelihood as a function of x and y, which are any subsets of X and Y, respectively.

using Distributions

function loglike(rho,y,x)
   beta = rho[1:4]
   sigma2 = exp(rho[5])            # rho[5] stores log(sigma^2), so exponentiate back to levels
   residual = y - x*beta
   dist = Normal(0, sqrt(sigma2))  # Normal expects a standard deviation, hence the square root
   contributions = logpdf(dist,residual)
   loglikelihood = sum(contributions)
   return -loglikelihood           # negative because the optimizer below minimizes
end

Then, if we wish to evaluate loglike across various subsets of x and y, we use what is called a wrapper: a function that fixes the values of x and y and passes them to loglike, leaving only rho free. For example, the following function evaluates loglike at x=X and y=Y:

function wrapLoglike(rho)
   return loglike(rho,Y,X)
end

We do this because we want the optimizer to find the optimal \rho, holding x and y fixed, but we also want to be able to adjust x and y to suit our purposes. The wrapper function allows the user to modify x and y, but tells the optimizer not to mess with them.

Tip: Use wrapper functions to manage arguments of your objective function that are not supposed to be accessed by the optimizer. Give the optimizer functions with only one argument — the parameters over which it is supposed to optimize your objective function.
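Equivalently, the wrapper can be written inline as an anonymous function. The following one-line sketch has the same effect as wrapLoglike above (the name wrappedLoglike is introduced here only for illustration):

wrappedLoglike = rho -> loglike(rho,Y,X)   # fixes the data arguments, leaving only rho free

Either form gives the optimizer a function of rho alone.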


Bootstrapping the OLS MLE

Now, we will draw a random index with the randperm function, take the corresponding random subset of individuals from the data, feed that subset into loglike through a wrapper, and let the optimizer find the parameters that maximize the subsample likelihood. We repeat this process in a loop, so that we obtain the MLE for each subset. The following loop stores the MLE from each subsample in a row of the matrix samples, using B=1,000 bootstrap samples, each containing half (M) of the available individuals:

M = div(N,2)           # subsample size: half of the available sample
B = 1000               # number of bootstrap replications
samples = zeros(B,5)   # each row will hold one bootstrap MLE
for b=1:B
   theIndex = randperm(N)[1:M]   # random subset of M individuals, drawn without replacement
   x = X[theIndex,:]
   y = Y[theIndex,:]
   function wrapLoglike(rho)
      return loglike(rho,y,x)
   end
   samples[b,:] = optimize(wrapLoglike,params0,method=:cg).minimum   # params0 as in the MLE tutorial
end
samples[:,5] = exp(samples[:,5])   # convert the variance column back from log-units to levels

The resulting matrix contains 1,000 samples of the MLE. As always, we must remember to exponentiate the variance estimates, because they were stored in log-units.


Bootstrapping for Non-parametric p-values

Estimates of the standard errors of the MLE estimates can be obtained by computing the standard deviation of each column,

bootstrapSE = std(samples,1)

where the 1 indicates that the standard deviation is computed along the first dimension, that is, down each column, yielding one standard error per parameter.

Standard errors like these can be used directly for hypothesis testing under parametric assumptions. For example, if we assume that an MLE is normally distributed (appealing to the large sample size), then we reject the null hypothesis that the parameter equals a particular value whenever the estimate differs from that value by at least 1.96 standard errors.

However, we can make fewer assumptions by using non-parametric p-values. The following code creates the distribution implied by the null hypothesis that \beta_0 =0, \beta_1=0, \beta_2=0, \beta_3=0, \sigma^2=1: it subtracts the column mean from each bootstrap distribution (thus imposing a mean of zero) and then adds 1 to the \sigma^2 column (thus imposing a mean of one). The result is called nullDistribution.
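As an aside, the parametric test just described takes only a couple of lines. The following sketch is not part of the original script; it introduces the names nullValues, zStatistics, and rejectAt5Percent purely for illustration, and assumes the full-sample MLE vector from the previous tutorial and the bootstrapSE just computed:

nullValues = [0, 0, 0, 0, 1]                        # hypothesized values: each beta is 0, sigma^2 is 1
zStatistics = (MLE - nullValues)./vec(bootstrapSE)  # estimates measured in standard-error units
rejectAt5Percent = abs(zStatistics) .> 1.96         # two-sided rejection at the 5% level

Returning to the non-parametric approach, the following block constructs nullDistribution as just described: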

nullDistribution = copy(samples)   # copy so that the original bootstrap draws are not overwritten
pvalues = ones(5)
for i=1:5
   nullDistribution[:,i] = nullDistribution[:,i]-mean(nullDistribution[:,i])   # impose a mean of zero
end
nullDistribution[:,5] = 1 + nullDistribution[:,5]   # shift the variance column so its mean is one

The non-parametric p-value (for two-sided hypothesis testing) is the fraction of draws from the null distribution whose absolute value exceeds the absolute value of the full-sample MLE.

pvalues = [mean(abs(MLE[i]).<abs(nullDistribution[:,i])) for i=1:5]

If we are interested in one-sided hypothesis testing, the following code would test the null hypothesis \beta_0 =0 against the alternative that \beta_0>0:

pvalues = [mean(MLE[i].<nullDistribution[:,i]) for i=1:5]

Conversely, the following code would test the null hypothesis \beta_0 =0 against the alternative that \beta_0<0:

pvalues = [mean(MLE[i].>nullDistribution[:,i]) for i=1:5]

Thus, two-sided testing uses the absolute value (abs), and one-sided testing only requires that we choose the right comparison operator (.> or .<).


Results

The resulting bootstrap standard errors are,

julia> bootstrapSE
1x5 Array{Float64,2}:
0.0316747 0.0312018 0.0327237 0.0306751 0.0208552

and the non-parametric two-sided p-value estimates are,

julia> pvalues
5-element Array{Any,1}:
0.001
0.0
0.0
0.727
0.305

Thus, we reject the null hypotheses for the first three parameters and conclude that \beta_0 \neq 0, \beta_1 \neq 0, \beta_2 \neq 0, but find insufficient evidence to reject the null hypotheses that \beta_3 =0 and \sigma^2=1. Comparing these conclusions to trueParams, we see that the non-parametric p-values performed well.


Bradley J. Setzler

 

Maximum Likelihood Estimation (MLE) in Julia: The OLS Example

By: Bradley Setzler

Re-posted from: http://juliaeconomics.com/2014/06/16/numerical-maximum-likelihood-the-ols-example/

* The script to reproduce the results of this tutorial in Julia is located here.

We continue working with OLS, using the model and data generating process presented in the previous post. Recall that,

\epsilon|X \sim \mathcal{N}\left(0,\sigma^2\right) \implies Y|X \sim \mathcal{N}\left( X\beta, \sigma^2 \right),

or, equivalently,

\left(Y-X\beta\right)|X \sim \mathcal{N}\left( 0, \sigma^2 \right),

which is a more convenient expression because, conditional on X, the distribution of Y-X\beta does not depend on X. Denote the parameter vector by \rho \equiv [\beta, \sigma^2]. We will now see how to obtain the MLE \hat\rho of \rho. By Bayes’ Rule and independence across individuals (i), the likelihood of \rho satisfies,

\mathcal{L}\left(\rho|Y,X\right) \propto \prod_{i=1}^N \phi\left( Y_i - X_i\beta;\, 0, \sigma^2 \right),

where \phi is the normal probability density function (PDF) with mean zero and variance \sigma^2. The MLE \hat\rho is the \arg\max of this expression, and we will show how to find it using a numerical search algorithm in Julia.
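Taking logarithms turns this product into a sum of individual contributions, which is exactly what the code below accumulates:

\log\mathcal{L}\left(\rho|Y,X\right) = \text{const} - \frac{N}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^N \left(Y_i - X_i\beta\right)^2,

and, because the logarithm is monotone, maximizing the log-likelihood is equivalent to maximizing the likelihood itself.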


Computing the Log-Likelihood of OLS

First, we define the log-likelihood in Julia as follows (we are using the data X and Y generated in the previous post):

using Distributions

function loglike(rho)
   beta = rho[1:4]
   sigma2 = exp(rho[5])            # rho[5] stores log(sigma^2), so exponentiate back to levels
   residual = Y-X*beta
   dist = Normal(0, sqrt(sigma2))  # Normal expects a standard deviation, hence the square root
   contributions = logpdf(dist,residual)
   loglikelihood = sum(contributions)
   return -loglikelihood           # negative because optimize() minimizes
end

This code first collects \beta (beta) and \sigma^2 (sigma2) from \rho (rho), uses the implied standard deviation \sqrt{\sigma^2} to initialize the appropriate normal distribution (dist), evaluates the log of the normal density at each of the residuals, Y_i-X_i\beta (residual), and returns the negative of the sum of the individual contributions to the log-likelihood (contributions).

Tip: Always remember to use the negative of the likelihood (or log-likelihood) that you wish to maximize, because the optimize command is a minimizer by default. Since the \arg\max is the same when maximizing some function and minimizing the negative of the function, we need the negative sign in order to maximize the likelihood with a minimizer.

The only confusing part of this function is that sigma2 is read from rho as an exponential. This is strictly a unit conversion — it means that \sigma^2 was stored in \rho in log-units, so we must exponentiate to return it to levels, that is, \rho is defined as \rho \equiv [\beta, \log\left(\sigma^2\right)]. This is a common approach in numerical optimization to solve a practical problem: the numerical optimizer tries out many possible values for \rho in its effort to search for the MLE, and it may naively try out a negative value of \sigma^2, which would be a logical contradiction that would crash the code. By storing \sigma^2 in \log units, we make it perfectly acceptable for the optimizer to try out a negative value of the parameter, because a parameter that is negative in log-units is non-negative in level-units.

Tip: Always restrict variances so that the optimizer cannot try negative values. We have used log-units to achieve this goal.
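For concreteness, here is the transformation in isolation (a toy illustration, not part of the tutorial script):

rho5 = -0.5          # the optimizer is free to propose any real value for the fifth parameter
sigma2 = exp(rho5)   # about 0.61, strictly positive, so the implied normal distribution is well-defined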


Maximizing the Likelihood of OLS

Now, we are ready to find the MLE, \hat\rho. To do this, we will use the Optim package, which I previously showed how to install.

using Optim

params0 = [.1,.2,.3,.4,.5]                       # arbitrary starting values for the search
optimum = optimize(loglike,params0,method=:cg)   # minimize the negative log-likelihood
MLE = optimum.minimum                            # the minimizer, i.e., the MLE of rho
MLE[5] = exp(MLE[5])                             # convert log(sigma^2) back to sigma^2
println(MLE)

This says to optimize the function loglike, starting from the point params0, which is chosen somewhat arbitrarily. Numerical search algorithms have to start somewhere, and params0 serves as an initial guess of the optimum. Of course, the best possible guess would be trueParams, which would leave the optimizer much less work to do, but in practice we do not know the true parameters, so the optimizer must do the work for us. Notice that, at the end, we exponentiate the fifth element of the estimate because the optimizer returns \sigma^2 in log-units, as discussed above.
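A quick sanity check, which is not part of the original script, is to restart the optimizer from a different arbitrary starting point and confirm that it reaches the same optimum:

params1 = zeros(5)                                # another arbitrary initial guess
optimum1 = optimize(loglike,params1,method=:cg)
println(optimum1.minimum)                         # should match MLE, up to the log-units of the fifth entry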


Results

Using the same random seed as before, the algorithm returns the estimates,

julia> MLE
5-element Array{Float64,1}:
0.112163
0.476432
-0.290571
0.010831
1.01085

which are very close to trueParams and the true variance of \epsilon, 1. The Optim package has various optimizers from which to choose; we were using the conjugate gradient (cg) algorithm above. If we replace this with the Nelder-Mead algorithm (nelder_mead), we obtain the almost-identical estimates,

julia> MLE
5-element Array{Float64,1}:
0.112161
0.476435
-0.290572
0.0108343
1.01085
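For reference, switching algorithms requires changing only the method argument; the following sketch uses the same conventions as the code above:

optimum = optimize(loglike,params0,method=:nelder_mead)
MLE = optimum.minimum
MLE[5] = exp(MLE[5])   # convert the variance back to level-units, as before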

In future posts, we will see cases where the choice of optimizer makes a major difference in the quality of the numerical MLE as well as the speed of computation, but for this simple example, the choice of optimizer does not much matter.


Bradley J. Setzler