L1-Regularization

By: Sören Dobberschütz

Re-posted from: https://tensorflowjulia.blogspot.com/2018/08/l1-regularization.html

The next programming exercise in the Machine Learning Crash Course is about L1-regularization and sparsity. In principle, one can add a regularization term to the train_linear_classifier_model function from the previous file:
    y = feature_columns*m + b                                      # model output
    loss = -reduce_mean(log(y + ϵ).*target_columns + log(1 - y + ϵ).*(1 - target_columns))   # log loss, with ϵ guarding against log(0)
    regularization = regularization_strength*reduce_sum(abs(m))    # L1 penalty on the weights

    optimizer_function = loss + regularization                     # combined objective to minimize
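
For concreteness, here is a minimal sketch of how this combined objective could be handed to one of the built-in optimizers; the variable names and the learning rate are illustrative, and sess, feature_columns and target_columns are assumed to be set up as in the previous file:

    my_optimizer = train.GradientDescentOptimizer(0.05)                 # any built-in optimizer
    my_training_step = train.minimize(my_optimizer, optimizer_function)
    # one step inside the training loop would then look roughly like
    # run(sess, my_training_step, Dict(feature_columns => batch_features, target_columns => batch_targets))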

Unfortunately, with this setup, all optimizers that are implemented in TensorFlow.jl still produce a non-sparse model. This is because gradient-descent-style updates essentially never set a weight to exactly zero: they keep stepping across zero instead of landing on it.
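
A small plain-Julia illustration (not TensorFlow.jl code; the objective, regularization strength and learning rate are made up) shows the effect: subgradient descent on a one-dimensional L1-regularized problem whose exact minimizer is zero keeps stepping across zero instead of landing on it.

    # Subgradient descent on f(w) = 0.5*(w - 1)^2 + λ*|w|; for λ = 2 the minimizer is exactly w = 0.
    let w = 1.0, λ = 2.0, η = 0.1
        for _ in 1:1000
            g = (w - 1.0) + λ*sign(w)   # a subgradient of the regularized objective
            w -= η*g
        end
        @show w   # hovers in a small band around 0, but is never exactly 0.0
    end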

To obtain a sparse set of weights, a special class of optimizers is needed. The original exercise uses the FTRL optimizer (“Follow the Regularized Leader”). This optimizer was suggested in this paper and is effective at driving weights to exactly zero while maintaining good model accuracy.
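
Roughly speaking, a key ingredient of such optimizers is a shrinkage (proximal) step with a closed-form update that can set a weight to exactly zero, instead of relying on gradient steps alone. Here is a minimal plain-Julia sketch of that idea (proximal gradient descent with soft-thresholding on the same toy problem as above, not the FTRL algorithm itself):

    soft_threshold(w, t) = sign(w)*max(abs(w) - t, 0.0)   # proximal operator of t*|w|

    let w = 1.0, λ = 2.0, η = 0.1
        for _ in 1:1000
            g = w - 1.0                          # gradient of the smooth part 0.5*(w - 1)^2
            w = soft_threshold(w - η*g, η*λ)     # gradient step, then shrink small weights to exactly 0
        end
        @show w   # exactly 0.0
    end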

I am not very familiar with implementing custom optimizers in TensorFlow.jl myself. If you have any suggestions on how to do that, I would be very interested – just leave a comment!