Note: This post discusses cutting edge cryptographic techniques. It is intended to give a view into research at Julia Computing. Do not use any examples in this blog post for production applications. Always consult a professional cryptographer before using cryptography.
TL;DR: click here to go directly to the package that implements the magic and here for the code that we’ll be talking about in this blog post.
Introduction
Suppose you have just developed a spiffy new machine learning model (using Flux.jl of course) and now want to start deploying
it for your users. How do you go about doing that? Probably the simplest thing would
be to just ship your model to your users and let them run it locally on their data.
However, there are a number of problems with this approach:
- ML models are large and the user’s device may not have enough storage or computation to actually run the model.
- ML models are often updated frequently and you may not want to send the large model across the network that often.
- Developing ML models takes a lot of time and computational resources, which you may want to recover by charging your users for making use of your model.
The solution that usually comes next is to expose the model as an API in the cloud. These machine-learning-as-a-service offerings have sprung up en masse over the past few years, with every major cloud platform offering such services to the enterprising developer. The dilemma for potential users of such products is obvious: user data is now processed on some remote server that may not necessarily be trustworthy. This has clear ethical and legal ramifications that limit the areas where such solutions can be effective. In regulated industries such as medicine or finance in particular,
sending patient or financial data to third parties for processing is often a no-go. Can we do better?
As it turns out we can! Recent breakthroughs in cryptography have made it practical to perform computation on data without ever decrypting it. In our example, the user would send encrypted data (e.g. images) to the cloud API, which would run the machine learning model and then return
the encrypted answer. At no point is the user data decrypted; in particular, the cloud provider neither has access to the original image nor is able to decrypt the prediction it computed. How is this possible? Let’s find out by building a machine learning service for handwriting recognition of encrypted images (from the MNIST dataset).
HE generally
The ability to compute on encrypted data is generally referred to as “secure computation” and is a fairly large area of research, with many different cryptographic approaches and techniques for a plethora of different application scenarios. For our example, we will be focusing on a technique known as “homomorphic encryption”. In a homomorphic encryption system, we generally have the following operations available:
pub_key, eval_key, priv_key = keygen()
encrypted = encrypt(pub_key, plaintext)
decrypted = decrypt(priv_key, encrypted)
encrypted′ = eval(eval_key, f, encrypted)
The first three are fairly straightforward and should be familiar to anyone who has used any sort of asymmetric cryptography before (as you did when you connected to this blog post via TLS). The last operation is where the magic is. It evaluates some function f on the encrypted value and returns another encrypted value corresponding to the result of evaluating f on the underlying plaintext. It is this property that gives homomorphic computation its name. Evaluation commutes with the encryption operation:
f(decrypt(priv_key, encrypted)) == decrypt(priv_key, eval(eval_key, f, encrypted))
(Equivalently it is possible to evaluate arbitrary homomorphisms f
on the encrypted value).
Which functions f are supported depends on the cryptographic scheme and on the operations it provides. If only a single kind of f is supported (e.g. f = +), we call an encryption scheme “partially homomorphic”. If f can be any complete set of gates out of which we can build arbitrary circuits, we call the computation “somewhat homomorphic” if the size of the circuit is limited or “fully homomorphic” if the size of the circuit is unlimited. It is often possible to turn “somewhat” into “fully” homomorphic encryption through a technique known as bootstrapping, though that is beyond the scope of this blog post. Fully homomorphic encryption is a fairly recent discovery, with the first viable (though not practical) scheme published by Craig Gentry in 2009. There are several more recent (and practical) FHE schemes. More importantly, there are software packages that implement them efficiently. The two most commonly used ones are probably Microsoft SEAL and PALISADE. In addition, I recently open sourced a pure Julia implementation of these algorithms. For our purposes we will be using the CKKS scheme as implemented in the latter.
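Before we move on to CKKS, a quick aside to make the “partially homomorphic” notion concrete. The following toy sketch (deliberately tiny, completely insecure, and not using ToyFHE at all) shows that textbook RSA is multiplicatively homomorphic: multiplying two ciphertexts yields a ciphertext of the product of the plaintexts, but no other homomorphic operation is available.
# Textbook RSA with toy parameters - do NOT use this for anything real
p, q = 61, 53                      # tiny primes, purely for illustration
n = p * q                          # public modulus
e = 17                             # public exponent
d = invmod(e, lcm(p - 1, q - 1))   # private exponent

rsa_encrypt(m) = powermod(m, e, n)
rsa_decrypt(c) = powermod(c, d, n)

m1, m2 = 7, 9
c1, c2 = rsa_encrypt(m1), rsa_encrypt(m2)

# Multiplying the ciphertexts corresponds to multiplying the plaintexts
@assert rsa_decrypt(mod(c1 * c2, n)) == mod(m1 * m2, n)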
CKKS High Level
CKKS (named after Cheon-Kim-Kim-Song, the authors of the 2016 paper that proposed it) is a homomorphic encryption scheme that allows homomorphic evaluation of the following primitive operations:
- Elementwise addition of length n vectors of complex numbers
- Elementwise multiplication of length n complex vectors
- Rotation (in the circshift sense) of elements in the vector
- Complex conjugation of vector elements

The parameter n here depends on the desired security and precision and is generally relatively high. For our example it will be 4096 (higher numbers are more secure, but also more expensive, scaling as roughly n log n).
Additionally, computations using CKKS are noisy. As a result, computational results are
only approximate, and care must be taken to ensure that computations are carried out with sufficient precision that the noise does not affect the correctness of the result.
That said, these restrictions are not all that unusual for developers of machine learning packages. Special-purpose accelerators like GPUs also generally operate on vectors of numbers. Likewise, for many developers floating point numbers can sometimes feel noisy due to the effects of algorithm selection, multithreading, etc. (I want to emphasize that there is a crucial difference here in that floating point arithmetic is inherently deterministic, even if it sometimes doesn’t appear that way due to the complexity of the implementation, while the CKKS primitives really are noisy, but perhaps this helps users appreciate that noisiness is not as scary as it might at first appear).
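As a tiny illustration of how deterministic floating point arithmetic can still look noisy, simply reassociating a sum changes the last bits of the result:
julia> (0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)
false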
With that in mind, let’s see how we can perform these operations in Julia (note: these are highly insecure parameter choices, the purpose of these operations is to illustrate
usage of the library at the REPL)
julia> using ToyFHE
# Let's play with 8 element vectors
julia> N = 8;
# Choose some parameters - we'll talk about it later
julia> ℛ = NegacyclicRing(2N, (40, 40, 40))
ℤ₁₃₂₉₂₂₇₉₉₇₅₆₈₀₈₁₄₅₇₄₀₂₇₀₁₂₀₇₁₀₄₂₄₈₂₅₇/(x¹⁶ + 1)
# We'll use CKKS
julia> params = CKKSParams(ℛ)
CKKS parameters
# We need to pick a scaling factor for our numbers - again we'll talk about that later
julia> Tscale = FixedRational{2^40}
FixedRational{1099511627776,T} where T
# Let's start with a plain Vector of zeros
julia> plain = CKKSEncoding{Tscale}(zero(ℛ))
8-element CKKSEncoding{FixedRational{1099511627776,T} where T} with indices 0:7:
0.0 + 0.0im
0.0 + 0.0im
0.0 + 0.0im
0.0 + 0.0im
0.0 + 0.0im
0.0 + 0.0im
0.0 + 0.0im
0.0 + 0.0im
# Ok, we're ready to get started, but first we'll need some keys
julia> kp = keygen(params)
CKKS key pair
julia> kp.priv
CKKS private key
julia> kp.pub
CKKS public key
# Alright, let's encrypt some things:
julia> foreach(i->plain[i] = i+1, 0:7); plain
8-element CKKSEncoding{FixedRational{1099511627776,T} where T} with indices 0:7:
1.0 + 0.0im
2.0 + 0.0im
3.0 + 0.0im
4.0 + 0.0im
5.0 + 0.0im
6.0 + 0.0im
7.0 + 0.0im
8.0 + 0.0im
julia> c = encrypt(kp.pub, plain)
CKKS ciphertext (length 2, encoding CKKSEncoding{FixedRational{1099511627776,T} where T})
# And decrypt it again
julia> decrypt(kp.priv, c)
8-element CKKSEncoding{FixedRational{1099511627776,T} where T} with indices 0:7:
0.9999999999995506 - 2.7335193113350057e-16im
1.9999999999989408 - 3.885780586188048e-16im
3.000000000000205 + 1.6772825551165524e-16im
4.000000000000538 - 3.885780586188048e-16im
4.999999999998865 + 8.382500573679615e-17im
6.000000000000185 + 4.996003610813204e-16im
7.000000000001043 - 2.0024593503998215e-16im
8.000000000000673 + 4.996003610813204e-16im
# Note that we had some noise. Let's go through all the primitive operations we'll need:
julia> decrypt(kp.priv, c+c)
8-element CKKSEncoding{FixedRational{1099511627776,T} where T} with indices 0:7:
1.9999999999991012 - 5.467038622670011e-16im
3.9999999999978817 - 7.771561172376096e-16im
6.00000000000041 + 3.354565110233105e-16im
8.000000000001076 - 7.771561172376096e-16im
9.99999999999773 + 1.676500114735923e-16im
12.00000000000037 + 9.992007221626409e-16im
14.000000000002085 - 4.004918700799643e-16im
16.000000000001346 + 9.992007221626409e-16im
julia> csq = c*c
CKKS ciphertext (length 3, encoding CKKSEncoding{FixedRational{1208925819614629174706176,T} where T})
julia> decrypt(kp.priv, csq)
8-element CKKSEncoding{FixedRational{1208925819614629174706176,T} where T} with indices 0:7:
0.9999999999991012 - 2.350516767363621e-15im
3.9999999999957616 - 5.773159728050814e-15im
9.000000000001226 - 2.534464540987068e-15im
16.000000000004306 - 2.220446049250313e-15im
24.99999999998865 + 2.0903753311370056e-15im
36.00000000000222 + 4.884981308350689e-15im
49.000000000014595 + 1.0182491378134327e-15im
64.00000000001077 + 4.884981308350689e-15im
That was easy! The eagle eyed reader may have noticed that csq
looks a bit different from the previous ciphertext. In particular,
it is a “length 3” ciphertext and the scale is much larger. What these
are and what they do is a bit too complicated for this point in the blog post,
but suffice it to say, we want to get these back down before we do further
computation, or we’ll run out of “space” in the ciphertext. Luckily, there is a way to bring each of the two quantities that grew back down:
# To get back down to length 2, we need to `keyswitch` (aka
# relinearize), which requires an evaluation key. Generating
# this requires the private key. In a real application we would
# have generated this up front and sent it along with the encrypted
# data, but since we have the private key, we can just do it now.
julia> ek = keygen(EvalMultKey, kp.priv)
CKKS multiplication key
julia> csq_length2 = keyswitch(ek, csq)
CKKS ciphertext (length 2, encoding CKKSEncoding{FixedRational{1208925819614629174706176,T} where T})
# Getting the scale back down is done using modswitching.
julia> csq_smaller = modswitch(csq_length2)
CKKS ciphertext (length 2, encoding CKKSEncoding{FixedRational{1.099511626783e12,T} where T})
# And it still decrypts correctly (though note we've lost some precision)
julia> decrypt(kp.priv, csq_smaller)
8-element CKKSEncoding{FixedRational{1.099511626783e12,T} where T} with indices 0:7:
0.9999999999802469 - 5.005163520332181e-11im
3.9999999999957723 - 1.0468514951188039e-11im
8.999999999998249 - 4.7588542623100616e-12im
16.000000000023014 - 1.0413447889166631e-11im
24.999999999955193 - 6.187833723406491e-12im
36.000000000002345 + 1.860733715346631e-13im
49.00000000001647 - 1.442396043149794e-12im
63.999999999988695 - 1.0722489563648028e-10im
Additionally, modswitching (short for modulus switching) reduces the size
of the ciphertext modulus, so we can’t just keep doing this indefinitely.
(In the terminology from above, we’re using a SHE scheme):
julia> ℛ # Remember the ring we initially created
ℤ₁₃₂₉₂₂₇₉₉₇₅₆₈₀₈₁₄₅₇₄₀₂₇₀₁₂₀₇₁₀₄₂₄₈₂₅₇/(x¹⁶ + 1)
julia> ToyFHE.ring(csq_smaller) # It shrunk!
ℤ₁₂₀₈₉₂₅₈₂₀₁₄₄₅₉₃₇₇₉₃₃₁₅₅₃/(x¹⁶ + 1)
There’s one last operation we’ll need: rotations. Like keyswitching
above, this requires an evaluation key (also called a Galois key):
julia> gk = keygen(GaloisKey, kp.priv; steps=2)
CKKS galois key (element 25)
julia> decrypt(kp.priv, circshift(c, gk))
8-element CKKSEncoding{FixedRational{1099511627776,T} where T} with indices 0:7:
7.000000000001042 + 5.68459112632516e-16im
8.000000000000673 + 5.551115123125783e-17im
0.999999999999551 - 2.308655353580721e-16im
1.9999999999989408 + 2.7755575615628914e-16im
3.000000000000205 - 6.009767921608429e-16im
4.000000000000538 + 5.551115123125783e-17im
4.999999999998865 + 4.133860996136768e-17im
6.000000000000185 - 1.6653345369377348e-16im
# And let's compare to doing the same on the plaintext
julia> circshift(plain, 2)
8-element OffsetArray(::Array{Complex{Float64},1}, 0:7) with eltype Complex{Float64} with indices 0:7:
7.0 + 0.0im
8.0 + 0.0im
1.0 + 0.0im
2.0 + 0.0im
3.0 + 0.0im
4.0 + 0.0im
5.0 + 0.0im
6.0 + 0.0im
Alright, we’ve covered the basic usage of the HE library.
Before we get started thinking about how to perform neural
network inference using these primitives, let’s look at and
train the neural network we’ll be using.
The machine learning model
If you’re not familiar with machine learning, or the Flux.jl
machine learning library, I’d recommend a quick detour to
the Flux.jl documentation or our free Introduction to Machine Learning course on JuliaAcademy, since we’ll only be discussing the changes for running the model on encrypted data.
Our starting point is the convolutional neural network example in the Flux model zoo. We’ll keep the training loop, data preparation, etc. the same and just tweak the model slightly. The model we’ll use is:
using Flux

# Flatten the 8x8x4xN convolution output into a 256xN matrix by stacking the four channels
function reshape_and_vcat(x)
let y=reshape(x, 64, 4, size(x, 4))
vcat((y[:,i,:] for i=axes(y,2))...)
end
end
model = Chain(
# First convolution, operating upon a 28x28 image
Conv((7, 7), 1=>4, stride=(3,3), x->x.^2),
reshape_and_vcat,
Dense(256, 64, x->x.^2),
Dense(64, 10),
)
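As a quick sanity check of the shapes (assuming Flux is loaded and using random data in place of MNIST), a batch of 64 28×28 grayscale images should map to 10 class scores per image:
julia> size(model(rand(Float32, 28, 28, 1, 64)))
(10, 64)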
This is essentially the same model as the one used in the paper “Secure Outsourced Matrix Computation and Application to Neural Networks”, which uses the same cryptographic scheme for the same demo, with two differences: 1) They also encrypt the model, which we neglect here for simplicity and 2) We have bias vectors after every layer (which is what Flux will do by default), which I’m not sure was the case for the model evaluated in the paper. Perhaps because of 2), the test set accuracy of our model is slightly higher (98.6% vs 98.1%), but this may of course also just come down to hyperparameter differences.
An unusual feature (for those coming from a machine learning background) is the x.^2 activation function. More common choices here would be something like tanh or relu or something fancier. However, while those functions (relu in particular) are cheap to evaluate on plaintext values, they would be quite expensive to evaluate on encrypted values (we’d basically have to evaluate a polynomial approximation). Luckily x.^2 works fine for our purposes.
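To get a feel for what such a polynomial approximation would entail, here is a small plaintext-only sketch (not part of our pipeline; the name approx_tanh is made up for illustration) that fits a degree-3 polynomial to tanh by least squares. Homomorphically, we would only be able to evaluate the polynomial, never the original function:
# Fit a degree-3 polynomial to tanh on [-3, 3] by least squares
xs = collect(range(-3, 3, length=201))
V = hcat((xs .^ k for k in 0:3)...)      # Vandermonde-style design matrix
coeffs = V \ tanh.(xs)                   # least-squares coefficients
approx_tanh(x) = sum(coeffs[k+1] * x^k for k in 0:3)

# Worst-case error of the approximation on the interval
maximum(abs.(approx_tanh.(xs) .- tanh.(xs)))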
The rest of the training loop is basically the same. The softmax
was removed from the model
in favor of a logitcrossentropy
loss function (though of course we could have kept it and
just evaluated the softmax after decryption on the client). The full code to train
this model is on GitHub and completes in a few minutes on any recent GPU.
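For reference, here is a condensed sketch of what such a training loop looks like with Flux. The data below is a random stand-in for MNIST and the exact API names may differ between Flux versions; see the linked GitHub code for the version that was actually used:
using Flux
using Flux: onehotbatch, logitcrossentropy

# Placeholder data standing in for the MNIST training set
train_imgs = rand(Float32, 28, 28, 1, 64)
train_labels = onehotbatch(rand(0:9, 64), 0:9)

loss(x, y) = logitcrossentropy(model(x), y)
opt = ADAM()
Flux.train!(loss, Flux.params(model), [(train_imgs, train_labels)], opt)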
Performing the operations efficiently
Alright, now that we know what we need to do, let’s take stock of what operations we need to be able to do:
- Convolutions
- Elementwise Squaring
- Matrix Multiply
Squaring is trivial, we already saw that above, so let’s tackle the other two in order.
Throughout we’ll be assuming that we’re working with a batch size of 64 (you may note that the model parameters and batch size were strategically chosen to take good advantage of a 4096-element vector, which is what we get from realistic parameter choices: 64 convolution windows per image times 64 images per batch is exactly 4096 slots).
Convolution
Let us recall how convolution works. We take some window (in our case 7×7) of the original input array and multiply each element in the window by the corresponding element of the convolution mask.
Then we move the window over by some amount (in our case, the stride is 3, so we move over by 3 elements)
and repeat the process (with the same convolution mask). This process is illustrated in the following animation (source) for a 3×3 convolution with stride (2, 2)
(the blue array is the input, the green array the output):
Additionally, we have convolutions into 4 different “channels” (all this means is that we repeat the convolution 3 more times with different convolution masks).
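If you prefer code to prose, here is a minimal, unoptimized plaintext sketch of a single-channel strided convolution matching the description above (the name naive_conv is just for illustration; no padding, no kernel flipping):
function naive_conv(input::AbstractMatrix, mask::AbstractMatrix, stride::Int)
    kx, ky = size(mask)
    ox = (size(input, 1) - kx) ÷ stride + 1
    oy = (size(input, 2) - ky) ÷ stride + 1
    out = zeros(eltype(input), ox, oy)
    for i in 1:ox, j in 1:oy
        # Extract the window, multiply elementwise by the mask and sum
        window = input[(i-1)*stride .+ (1:kx), (j-1)*stride .+ (1:ky)]
        out[i, j] = sum(window .* mask)
    end
    out
end

naive_conv(rand(28, 28), rand(7, 7), 3)   # gives an 8×8 output, as in our model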
Alright, so now that we know what we’re doing let’s figure out how to do it. We’re in luck in that the convolution is the first thing in our model. As a result, we can do some preprocessing on the client before encrypting the data (without needing the model weights) to save us some work. In particular, we’ll do the following:
- Precompute each convolution window (i.e. 7×7 extraction from the original images), giving us 64 7×7 matrices per input image (note: for 7×7 windows with stride 3 there are 8×8 convolution windows to evaluate per 28×28 input image)
- Collect the same position in each window into one vector, i.e. we’ll have a 64-element vector for each image or a 64×64 element vector for a batch of 64 (i.e. a total of 49 64×64 matrices)
- Encrypt that
The convolution then simply becomes a scalar multiplication of the whole matrix with the appropriate mask element, and by summing all 49 such products we get the result of the convolution. An implementation of this strategy (on the plaintext) may look like:
using OffsetArrays               # provides OffsetArray
using Base.Iterators: product

function public_preprocess(batch)
ka = OffsetArray(0:7, 0:7)
# Create feature extracted matrix
I = [[batch[i′*3 .+ (1:7), j′*3 .+ (1:7), 1, k] for i′=ka, j′=ka] for k = 1:64]
# Reshape into the ciphertext
Iᵢⱼ = [[I[k][l...][i,j] for k=1:64, l=product(ka, ka)] for i=1:7, j=1:7]
end
Iᵢⱼ = public_preprocess(batch)
# Evaluate the convolution
weights = model.layers[1].weight
conv_weights = reverse(reverse(weights, dims=1), dims=2)
conved = [sum(Iᵢⱼ[i,j]*conv_weights[i,j,1,channel] for i=1:7, j=1:7) for channel = 1:4]
conved = map(((x,b),)->x .+ b, zip(conved, model.layers[1].bias))
which (modulo a reordering of the dimensions) gives the same answer as model.layers[1](batch), but using only operations that we will also be able to perform on encrypted data.
Adding the encryption operations, we have:
Iᵢⱼ = public_preprocess(batch)
C_Iᵢⱼ = map(Iᵢⱼ) do Iij
plain = CKKSEncoding{Tscale}(zero(plaintext_space(ckks_params)))
plain .= OffsetArray(vec(Iij), 0:(N÷2-1))
encrypt(kp, plain)
end
weights = model.layers[1].weight
conv_weights = reverse(reverse(weights, dims=1), dims=2)
conved3 = [sum(C_Iᵢⱼ[i,j]*conv_weights[i,j,1,channel] for i=1:7, j=1:7) for channel = 1:4]
conved2 = map(((x,b),)->x .+ b, zip(conved3, model.layers[1].bias))
conved1 = map(ToyFHE.modswitch, conved2)
Note that a keyswitch isn’t required because the weights are public, so we didn’t expand the length of the ciphertext.
Matrix multiply
Moving on to matrix multiply, we take advantage of the fact that we can rotate elements in the vector to effect a re-ordering of the multiplication indices. In particular, consider a row-major ordering of matrix elements in the vector. Then, if we shift the vector by a multiple of the row-size, we get the effect of rotating the columns, which is a sufficient primitive for implementing matrix multiply (of square matrices at least). Let’s try it:
using LinearAlgebra: diag
using Base.Iterators: partition

function matmul_square_reordered(weights, x)
sum(1:size(weights, 1)) do k
# We rotate the columns of the LHS and take the diagonal
weight_diag = diag(circshift(weights, (0,(k-1))))
# We rotate the rows of the RHS
x_rotated = circshift(x, (k-1,0))
# We do an elementwise, broadcast multiply
weight_diag .* x_rotated
end
end
function matmul_reordered(weights, x)
sum(partition(1:256, 64)) do range
matmul_square_reordered(weights[:, range], x[range, :])
end
end
fc1_weights = model.layers[3].W
x = rand(Float64, 256, 64)
@assert (fc1_weights*x) ≈ matmul_reordered(fc1_weights, x)
Of course for general matrix multiply, we may want something fancier, but
it’ll do for now.
Making it nicer
At this point, we’ve managed to get everything together and indeed it
works. For reference, here it is in all its glory (omitting setup for parameter
selection and the like):
# Evaluation keys (in a real deployment the client would generate these
# and send them along with the encrypted data)
ek = keygen(EvalMultKey, kp.priv)
gk = keygen(GaloisKey, kp.priv; steps=64)
Iᵢⱼ = public_preprocess(batch)
C_Iᵢⱼ = map(Iᵢⱼ) do Iij
plain = CKKSEncoding{Tscale}(zero(plaintext_space(ckks_params)))
plain .= OffsetArray(vec(Iij), 0:(N÷2-1))
encrypt(kp, plain)
end
weights = model.layers[1].weight
conv_weights = reverse(reverse(weights, dims=1), dims=2)
conved3 = [sum(C_Iᵢⱼ[i,j]*conv_weights[i,j,1,channel] for i=1:7, j=1:7) for channel = 1:4]
conved2 = map(((x,b),)->x .+ b, zip(conved3, model.layers[1].bias))
conved1 = map(ToyFHE.modswitch, conved2)
# Apply the first x.^2 activation, then relinearize and modswitch to bring
# the ciphertext length and scale back down
Csqed1 = map(x->x*x, conved1)
Csqed1 = map(x->keyswitch(ek, x), Csqed1)
Csqed1 = map(ToyFHE.modswitch, Csqed1)
# Encrypted matrix multiply using the rotation (diagonal) trick from above
function encrypted_matmul(gk, weights, x::ToyFHE.CipherText)
result = repeat(diag(weights), inner=64).*x
rotated = x
for k = 2:64
rotated = ToyFHE.rotate(gk, rotated)
result += repeat(diag(circshift(weights, (0,(k-1)))), inner=64) .* rotated
end
result
end
# First fully connected layer (Dense(256, 64))
fq1_weights = model.layers[3].W
Cfq1 = sum(enumerate(partition(1:256, 64))) do (i,range)
encrypted_matmul(gk, fq1_weights[:, range], Csqed1[i])
end
Cfq1 = Cfq1 .+ OffsetArray(repeat(model.layers[3].b, inner=64), 0:4095)
Cfq1 = modswitch(Cfq1)
# Second x.^2 activation, again followed by keyswitch and modswitch
Csqed2 = Cfq1*Cfq1
Csqed2 = keyswitch(ek, Csqed2)
Csqed2 = modswitch(Csqed2)
# Pad the rectangular weight matrix with zero rows so we can reuse the square matmul
function naive_rectangular_matmul(gk, weights, x)
@assert size(weights, 1) < size(weights, 2)
weights = vcat(weights, zeros(eltype(weights), size(weights, 2)-size(weights, 1), size(weights, 2)))
encrypted_matmul(gk, weights, x)
end
# Output layer (Dense(64, 10))
fq2_weights = model.layers[4].W
Cresult = naive_rectangular_matmul(gk, fq2_weights, Csqed2)
Cresult = Cresult .+ OffsetArray(repeat(vcat(model.layers[4].b, zeros(54)), inner=64), 0:4095)
Not very pretty to look at, but hopefully if you have made it this
far in the blog post, you should be able to understand each step in the sequence.
Now, let’s turn our attention to thinking about some abstractions that
would make all this easier. We’re now leaving the realm of cryptography
and machine learning and arriving at programming language design, so let’s
take advantage of the fact that Julia allows powerful abstractions and go through
the exercise of building some. For example, we could encapsulate the whole
convolution extraction process as a custom array type:
using BlockArrays
using NNlib
using OffsetArrays
using Base.Iterators: product
"""
ExplodedConvArray{T, Dims, Storage} <: AbstractArray{T, 4}
Represents an `nxmx1xb` array of images, but rearranged into a
series of convolution windows. Evaluating a convolution compatible
with `Dims` on this array is achievable through a sequence of
scalar multiplications and sums on the underlying storage.
"""
struct ExplodedConvArray{T, Dims, Storage} <: AbstractArray{T, 4}
# sx*sy matrix of b*(dx*dy) matrices of extracted elements
# where (sx, sy) = kernel_size(Dims)
# (dx, dy) = output_size(DenseConvDims(...))
cdims::Dims
x::Matrix{Storage}
function ExplodedConvArray{T, Dims, Storage}(cdims::Dims, storage::Matrix{Storage}) where {T, Dims, Storage}
@assert all(==(size(storage[1])), size.(storage))
new{T, Dims, Storage}(cdims, storage)
end
end
Base.size(ex::ExplodedConvArray) = (NNlib.input_size(ex.cdims)..., 1, size(ex.x[1], 1))
function ExplodedConvArray{T}(cdims, batch::AbstractArray{T, 4}) where {T}
x, y = NNlib.output_size(cdims)
kx, ky = NNlib.kernel_size(cdims)
stridex, stridey = NNlib.stride(cdims)
kax = OffsetArray(0:x-1, 0:x-1)
kay = OffsetArray(0:y-1, 0:y-1)
I = [[batch[i′*stridex .+ (1:kx), j′*stridey .+ (1:ky), 1, k] for i′=kax, j′=kay] for k = 1:size(batch, 4)]
Iᵢⱼ = [[I[k][l...][i,j] for k=1:size(batch, 4), l=product(kax, kay)] for (i,j) in product(1:kx, 1:ky)]
ExplodedConvArray{T, typeof(cdims), eltype(Iᵢⱼ)}(cdims, Iᵢⱼ)
end
function NNlib.conv(x::ExplodedConvArray{<:Any, Dims}, weights::AbstractArray{<:Any, 4}, cdims::Dims) where {Dims<:ConvDims}
blocks = reshape([ Base.ReshapedArray(sum(x.x[i,j]*weights[i,j,1,channel] for i=1:7, j=1:7), (NNlib.output_size(cdims)...,1,size(x, 4)), ()) for channel = 1:4 ],(1,1,4,1))
BlockArrays._BlockArray(blocks, BlockArrays.BlockSizes([8], [8], [1,1,1,1], [64]))
end
Note that here we made use of BlockArrays to represent an 8x8x4x64 array as four 8x8x1x64 arrays, as in the original code. Ok, so now we already have a much nicer representation of the first step, at least on unencrypted arrays:
julia> cdims = DenseConvDims(batch, model.layers[1].weight; stride=(3,3), padding=(0,0,0,0), dilation=(1,1))
DenseConvDims: (28, 28, 1) * (7, 7) -> (8, 8, 4), stride: (3, 3) pad: (0, 0, 0, 0), dil: (1, 1), flip: false
julia> a = ExplodedConvArray{eltype(batch)}(cdims, batch);
julia> model(a)
10×64 Array{Float32,2}:
[snip]
How do we bring this into the encrypted world? Well, we need to do two things:
- We want to encrypt a struct (ExplodedConvArray) in such a way that we get a ciphertext for each field. Then, operations on this encrypted struct work by looking up what the function would have done on the original struct and simply doing the same homomorphically.
- We want to intercept certain operations to be done differently in the encrypted context.
Luckily, Julia provides an abstraction that lets us do both: a compiler plugin using the Cassette.jl mechanism. How this works and how to use it is a bit of a complicated story, so I will omit it from this blog post, but briefly, you can define a Context (say Encrypted) and then define rules for how operations under this context work. For example, the rules for the second requirement might be written as:
# Define Matrix multiplication between an array and an encrypted block array
function (*::Encrypted{typeof(*)})(a::Array{T, 2}, b::Encrypted{<:BlockArray{T, 2}}) where {T}
sum(a*b for (i,range) in enumerate(partition(1:size(a, 2), size(b.blocks[1], 1))))
end
# Define Matrix multiplication between an array and an encrypted array
function (*::Encrypted{typeof(*)})(a::Array{T, 2}, b::Encrypted{Array{T, 2}}) where {T}
result = repeat(diag(a), inner=size(a, 1)).*b
rotated = b
for k = 2:size(a, 2)
rotated = ToyFHE.rotate(GaloisKey(*), rotated)
result += repeat(diag(circshift(a, (0,(k-1)))), inner=size(a, 1)) .* rotated
end
result
end
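If you have not seen Cassette before, here is a minimal, self-contained sketch of just the mechanism (entirely separate from the encryption machinery above; the context name SquareCtx and the function f are made up for illustration): we define a context and a rule that changes what one particular function does when code runs under that context.
using Cassette

Cassette.@context SquareCtx

# Under SquareCtx, any call to `sin` is replaced by squaring its argument
Cassette.overdub(::SquareCtx, ::typeof(sin), x) = x^2

f(x) = sin(x) + 1

f(3.0)                                 # ordinary call: sin(3.0) + 1
Cassette.overdub(SquareCtx(), f, 3.0)  # under the context: 3.0^2 + 1 == 10.0
The Encrypted context sketched above works the same way, except that its rules dispatch on encrypted types and the context carries the evaluation and Galois keys.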
The end result of all of this is that the user should be able to write the whole thing above with minimal manual work:
kp = keygen(ckks_params)
ek = keygen(EvalMultKey, kp.priv)
gk = keygen(GaloisKey, kp.priv; steps=64)
# Create evaluation context
ctx = Encrypted(ek, gk)
# Do public preprocessing
batch = ExplodedConvArray{eltype(batch)}(cdims, batch);
# Run on encrypted data under the encryption context
Cresult = ctx(model)(encrypt(kp.pub, batch))
# Decrypt the answer
decrypt(kp, Cresult)
Of course, even that may not be optimal. The parameters of the cryptosystem
(e.g. the ring ℛ
, when to modswitch, keyswitch, etc) represent a tradeoff
between precision of the answer, security and performance and depend strongly
on the code being run. In general, one would want the compiler to analyze the code
it’s about to run encrypted, suggest parameters for a given security level and
desired precision and then generate the code with minimal manual work by the user.
Conclusion
Achieving the dream of automatically executing arbitrary computations securely is a tall order for any system, but Julia’s metaprogramming capabilities and friendly syntax make it
well suited as a development platform. Some attempts at this have already been made
by the RAMPARTS collaboration (paper, JuliaCon talk), which compiles simple Julia code
to the PALISADE FHE library. Julia Computing is collaborating with the experts behind RAMPARTS on Verona, the recently announced next generation version of that system. Only in the past year or so has the performance of homomorphic encryption systems
reached the point where it is possible to actually evaluate interesting computations at speeds approaching practical usability. The floodgates are open. With new advances in algorithms, software and hardware, homomorphic encryption is sure to become a mainstream technology to
protect the privacy of millions of users.
If you would like to understand more deeply how everything works, I have tried to make sure
that the ToyFHE repository is readable. There
is also some documentation that I’m hoping gives a somewhat approachable introduction to the cryptography involved. Of course much work remains to be done. If you are interested in this kind of work or have interesting applications, do not hesitate to get in touch.