By: Picaud Vincent
Re-posted from: https://pixorblog.wordpress.com/2016/07/17/direct-convolution/
For small kernels, direct convolution beats the FFT-based approach. I present here a basic implementation. This implementation allows us to compute

$$\gamma[k] = \sum_{i \in \Omega^\alpha} \alpha[i]\,\beta[k + \lambda i] \qquad (1)$$
From time to time we will use the shorthand notation $\gamma = \alpha \ast_\lambda \beta$ for Eq. (1).
An arbitrary stride $\lambda$ has been introduced so that Eq. (1) covers:
- convolution ($\lambda = -1$),
- cross-correlation ($\lambda = +1$),
- the stationary wavelet transform, the so-called “à trous” algorithm ($\lambda = \pm 2^j$).
Also note that with a proper boundary extension (periodic and zero padding, essentially), changing the sign of $\lambda$ gives the adjoint operator: $\langle \alpha \ast_\lambda \beta,\, \gamma \rangle = \langle \beta,\, \alpha \ast_{-\lambda}\, \gamma \rangle$.
Disclaimer
Maybe the following is overwhelmingly detailed for a simple task like Eq. (1), but I have found it worthwhile to write this down once and for all. Maybe it can be useful to someone else.
Some notations
We note $\Omega^\alpha$ our vector domain (or support); for instance
$$\Omega^\alpha = [i^\alpha_{\min},\, i^\alpha_{\max}]$$
means that $\alpha_i$ is defined for $i \in [i^\alpha_{\min}, i^\alpha_{\max}]$ (all intervals here are integer intervals $\{a, a+1, \dots, b\}$).
To get interval lower/upper bounds we use the notation $i^\alpha_{\min} = \min(\Omega^\alpha)$ and $i^\alpha_{\max} = \max(\Omega^\alpha)$.
We denote by $\lambda\Omega$ the scaled domain defined by:
$$\lambda\Omega = [\min(\lambda\,\Omega),\ \max(\lambda\,\Omega)] = \begin{cases} [\lambda\, i_{\min},\ \lambda\, i_{\max}] & \text{if } \lambda > 0 \\ [\lambda\, i_{\max},\ \lambda\, i_{\min}] & \text{if } \lambda < 0 \end{cases}$$
where $i_{\min} = \min(\Omega)$ and $i_{\max} = \max(\Omega)$.
Finally we use the relative complement of $B$ with respect to the set $A$, defined by
$$A \setminus B = \{ x \in A : x \notin B \}.$$
This set is not necessarily connected; however, since we are working in $\mathbb{Z}$, it is sufficient to introduce its left and right parts (which can be empty):
$$(A \setminus B)_{\text{left}} = [\min(A),\ \min(\max(A), \min(B)-1)], \qquad (A \setminus B)_{\text{right}} = [\max(\min(A), \max(B)+1),\ \max(A)].$$
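A quick worked toy example (for illustration only): with $A = [1, 10]$ and $B = [4, 12]$ we get $(A\setminus B)_{\text{left}} = [1, 3]$ and $(A\setminus B)_{\text{right}} = \emptyset$ (since $\max(\min(A), \max(B)+1) = 13 > 10$); and with $\lambda = -2$ and $\Omega = [-1, 2]$ we get $\lambda\Omega = [-4, 2]$.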
Goal
Given two vectors $\alpha$, $\beta$ defined on the domains $\Omega^\alpha$, $\Omega^\beta$, we want to define and implement an algorithm that computes
$$\gamma[k] = \sum_{i \in \Omega^\alpha} \alpha[i]\,\beta[k + \lambda i]$$
for $k \in \Omega^\gamma$.
First step, no boundary extension
We need to define the domain $\Omega^{\gamma_1}$ on which the computation does not violate the domain of definition of $\beta$. This can be expressed as
$$\Omega^{\gamma_1} = \{ k \ :\ k + \lambda i \in \Omega^\beta,\ \forall i \in \Omega^\alpha \}.$$
Let's write the details:
$$k + \lambda i \in \Omega^\beta,\ \forall i \in \Omega^\alpha \iff i^\beta_{\min} \le k + \min(\lambda\Omega^\alpha) \ \text{ and } \ k + \max(\lambda\Omega^\alpha) \le i^\beta_{\max},$$
hence we have
$$\Omega^{\gamma_1} = [\, i^\beta_{\min} - \min(\lambda\Omega^\alpha),\ i^\beta_{\max} - \max(\lambda\Omega^\alpha)\, ].$$
Thus the computation of Eq. (1) is split into two parts:
- one part, $k \in \Omega^\gamma \cap \Omega^{\gamma_1}$, free of boundary effects,
- one part, $k \in \Omega^\gamma \setminus \Omega^{\gamma_1}$, that requires a boundary extension.
The algorithm takes the following form:
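Schematically, it first fills the boundary-free part, then the left and right parts using the chosen boundary extension. Here is a simplified, runnable Julia sketch of that structure, for illustration only (zero padding only, everything 1-based; the name `direct_conv_sketch` is not part of the package, and the general implementation with arbitrary domains and boundary extensions is given further below):

```julia
# Simplified sketch: γ computed on the same support as β, zero padding only.
function direct_conv_sketch(α, Ωα::UnitRange, λ::Int, β)
    @assert λ != 0 && length(α) == length(Ωα)
    Ωβ  = 1:length(β)
    Ωγ  = 1:length(β)
    γ   = zeros(eltype(β), length(β))
    λΩα = λ > 0 ? (λ*first(Ωα):λ*last(Ωα)) : (λ*last(Ωα):λ*first(Ωα))
    # Ωγ1: indices k for which every β[k + λ*i], i ∈ Ωα, stays inside Ωβ
    Ωγ1 = intersect(Ωγ, (first(Ωβ) - first(λΩα)):(last(Ωβ) - last(λΩα)))
    for k in Ωγ1, (n, i) in enumerate(Ωα)           # boundary-free part
        γ[k] += α[n] * β[k + λ*i]
    end
    Ωγ_left  = first(Ωγ):min(last(Ωγ), first(Ωγ1) - 1)
    Ωγ_right = max(first(Ωγ), last(Ωγ1) + 1):last(Ωγ)
    for part in (Ωγ_left, Ωγ_right), k in part, (n, i) in enumerate(Ωα)
        j = k + λ*i                                  # may fall outside Ωβ …
        γ[k] += j in Ωβ ? α[n] * β[j] : zero(eltype(β))  # … zero padding here
    end
    return γ
end
```

For example, `direct_conv_sketch([1.0, 1.0, 1.0], -1:1, 1, collect(1.0:5.0))` returns the zero-padded moving sum `[3.0, 6.0, 9.0, 12.0, 9.0]`.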
Second step, boundary extensions
Usually we define some classical boundary extensions. These extensions $\tilde\beta$ are computed from the components of $\beta$ and sometimes come with a validity condition. For better clarity I give explicit lower/upper bounds, with $k_{\min} = \min(\Omega^\beta)$ and $k_{\max} = \max(\Omega^\beta)$:
Left boundary ($k < k_{\min}$) | $\tilde\beta[k]$ | validity condition
---|---|---
Mirror | $\beta[2 k_{\min} - k]$ | $k \ge 2 k_{\min} - k_{\max}$
Periodic (or cyclic) | $\beta[k + (k_{\max} - k_{\min} + 1)]$ | $k \ge k_{\min} - (k_{\max} - k_{\min} + 1)$
Constant | $\beta[k_{\min}]$ | none
Zero padding | $0$ | none
Right boundary ($k > k_{\max}$) | $\tilde\beta[k]$ | validity condition
---|---|---
Mirror | $\beta[2 k_{\max} - k]$ | $k \le 2 k_{\max} - k_{\min}$
Periodic (or cyclic) | $\beta[k - (k_{\max} - k_{\min} + 1)]$ | $k \le k_{\max} + (k_{\max} - k_{\min} + 1)$
Constant | $\beta[k_{\max}]$ | none
Zero padding | $0$ | none
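A quick toy illustration of why these conditions matter: take $\Omega^\beta = [1, 4]$ and suppose the filter support makes us request $\tilde\beta[-5]$. The left mirror formula gives $\beta[2\cdot 1 - (-5)] = \beta[7]$, which is itself outside $\Omega^\beta$; the formula is only valid for $k \ge 2 k_{\min} - k_{\max} = -2$. The same kind of failure happens with the periodic formula as soon as the requested index is more than one period away.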
As we want something general, we have to get rid of these validity conditions.
Periodic case
Starting from a vector $\beta$ defined on $[0, N-1]$ we want to define a periodic function $\tilde\beta$ of period $N$. This function must fulfill the relation
$$\tilde\beta[k + nN] = \tilde\beta[k], \quad \forall n \in \mathbb{Z}.$$
We can do that by considering $\tilde\beta[k] = \beta[\operatorname{mod}(k, N)]$ where
$$\operatorname{mod}(k, N) = k - N \left\lfloor \frac{k}{N} \right\rfloor$$
and $\operatorname{mod}$ is the modulus function associated with a floored division.
For a vector defined on an arbitrary domain $\Omega^\beta = [k_{\min}, k_{\max}]$, we first translate the indices using $k \mapsto k - k_{\min}$ and then translate them back using $k \mapsto k + k_{\min}$. Putting all together, we build a periodized vector
$$\tilde\beta[k] = \beta[\, k_{\min} + \operatorname{mod}(k - k_{\min},\ N)\, ]$$
where $N = k_{\max} - k_{\min} + 1$ is the period.
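As a quick sanity check (toy example, not part of the package code): in Julia, `mod` is indeed the floored-division version, while `rem` truncates toward zero, so the formula above can be used directly:

```julia
β = [10.0, 20.0, 30.0, 40.0]        # kmin = 1, kmax = 4, period N = 4
periodic(k) = β[1 + mod(k - 1, 4)]  # kmin + mod(k - kmin, N)

mod(-1, 4), rem(-1, 4)                   # (3, -1): mod is floored, rem is not
periodic(0), periodic(5), periodic(-3)   # (40.0, 10.0, 10.0)
```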
Mirror Symmetry case
Starting from a vector $\beta$ defined on $[0, N-1]$ we can extend it by mirror symmetry on $[0, 2(N-1)]$ using $\tilde\beta[k] = \beta[M(k)]$ with
$$M(k) = (N-1) - |\,N-1-k\,|.$$
The resulting vector fulfills the relation $\tilde\beta[(N-1)+k] = \tilde\beta[(N-1)-k]$ for $k \in [0, N-1]$.
To get a “global” definition we then periodize it on $\mathbb{Z}$ with period $2(N-1)$ (attention, $2(N-1)$ and not $2N-1$, otherwise the $\beta[0]$ component is duplicated!).
For an arbitrary domain $\Omega^\beta = [k_{\min}, k_{\max}]$ we use an index translation as in the periodic case. Putting everything together we get:
$$\tilde\beta[k] = \beta\big[\, k_{\max} - |\, k_{\max} - k_{\min} - \operatorname{mod}(k - k_{\min},\ 2(k_{\max}-k_{\min}))\,|\, \big]$$
where $2(k_{\max}-k_{\min})$ is the period.
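A quick check of this last formula (same toy vector as above, for illustration only):

```julia
β = [10.0, 20.0, 30.0, 40.0]                     # kmin = 1, kmax = 4
mirror(k) = β[4 - abs(4 - 1 - mod(k - 1, 2*(4 - 1)))]

[mirror(k) for k in -2:7]
# [40.0, 30.0, 20.0, 10.0, 20.0, 30.0, 40.0, 30.0, 20.0, 10.0]
```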
Boundary extensions
To use the algorithm with boundary extensions, you only have to replace $\beta[k + \lambda i]$ by
$$\tilde\beta[k + \lambda i] = \Phi(\beta,\, k + \lambda i)$$
whenever $k + \lambda i \notin \Omega^\beta$, where $\Phi$ is the boundary extension you have chosen (periodic, mirror, constant or zero padding). You do not have to take care of any validity condition: these formulas are general.
Implementation
This is a straightforward implementation following the presented formulas as closely as possible. I did not try to optimize it, as this would have obscured the presentation. Some ideas for optimization: reverse/reorder the for loops (to access memory in the right order), use SIMD, use C++ meta-programming with loop unrolling for fixed filter sizes, or specialize with respect to the vector type (Vector / StridedVector).
Preamble
Index translation / domain definition
There is however one last thing we have to explain. In languages like Julia or C we manipulate arrays with a fixed starting index: $1$ in Julia and Fortran, $0$ in C and C++.
For this reason we do not manipulate $\alpha$ on $\Omega^\alpha$ but another, translated array $\tilde\alpha$ starting at index $1$ (Julia) or $0$ (C++).
To cover all cases, I assume that the starting index is denoted by $\tilde i_0$.
The array $\tilde\alpha$ is defined by:
$$\tilde\alpha[\tilde i] = \alpha[\tilde i + i^\alpha_{\min} - \tilde i_0], \qquad \tilde i \in \tilde\Omega^\alpha = [\tilde i_0,\ \tilde i_0 + |\Omega^\alpha| - 1].$$
Hence we must modify the initial Eq. (1) to use $\tilde\alpha$ instead of $\alpha$.
With $i = \tilde i + i^\alpha_{\min} - \tilde i_0$ we have
$$\alpha[i] = \tilde\alpha[\tilde i]$$
and
$$k + \lambda i = k + \lambda \tilde i + \lambda\,(i^\alpha_{\min} - \tilde i_0).$$
Thus, Eq. (1) becomes:
$$\gamma[k] = \sum_{\tilde i \in \tilde\Omega^\alpha} \tilde\alpha[\tilde i]\ \beta[\, k + \lambda \tilde i + \lambda\,(i^\alpha_{\min} - \tilde i_0)\, ].$$
The other arrays are less problematic:
- For the $\beta$ array, which is our input array, we implicitly use $\Omega^\beta = [\tilde i_0,\ \tilde i_0 + \mathrm{length}(\beta) - 1]$. This does not reduce the generality of the subroutine.
- For $\gamma$, which is the output array, as for $\beta$ we assume it is defined on $[\tilde i_0,\ \tilde i_0 + \mathrm{length}(\gamma) - 1]$, but we provide $\Omega^\gamma$ to define the components we want to compute. The other components, $k \notin \Omega^\gamma$, will remain unmodified by the subroutine.
Definition of $\Omega^\alpha$
As we have seen before, the convolution subroutine will take $\tilde\alpha$ as argument, but we also need $\Omega^\alpha$. For the driver subroutine we do not directly provide this interval because its length is redundant with the length of $\tilde\alpha$. Instead we provide an offset; $\Omega^\alpha$ is deduced from:
$$\Omega^\alpha = [\, -\text{offset},\ \mathrm{length}(\tilde\alpha) - \text{offset} - 1\, ].$$
Note: this definition does not depend on $\tilde i_0$.
With offset $= 0$ you are in the “usual situation” $\Omega^\alpha = [0,\ \mathrm{length}(\tilde\alpha) - 1]$. If you have a window of size $2p+1$, taking offset $= p$ centers the window (index $0$ is the middle of the window). The figure below gives the graphical representation of an arbitrary case: a filter of a given size, with a given offset and stride $\lambda$.
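For instance (the helper `Ωα_from_offset` is only for illustration and is not part of the package, but it uses the same rule as the `direct_conv!` UI wrapper below):

```julia
# Ωα deduced from the filter length and the offset
Ωα_from_offset(len, offset) = UnitRange(-offset, len - offset - 1)

Ωα_from_offset(5, 0)   # 0:4  -> the "usual situation"
Ωα_from_offset(5, 2)   # -2:2 -> centered window of size 2p+1 with p = 2
Ωα_from_offset(4, 1)   # -1:2
```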
Julia
Auxiliary subroutines
We start by defining the basic operations on sets:
```julia
function scale(λ::Int64,Ω::UnitRange)
    ifelse(λ>0,
           UnitRange(λ*start(Ω),λ*last(Ω)),
           UnitRange(λ*last(Ω),λ*start(Ω)))
end

function compute_Ωγ1(Ωα::UnitRange, λ::Int64, Ωβ::UnitRange)
    λΩα = scale(λ,Ωα)
    UnitRange(start(Ωβ)-start(λΩα),
              last(Ωβ)-last(λΩα))
end

# Left & Right relative complements A\B
#
function relelativeComplement_left(A::UnitRange, B::UnitRange)
    UnitRange(start(A),
              min(last(A),start(B)-1))
end

function relelativeComplement_right(A::UnitRange, B::UnitRange)
    UnitRange(max(start(A),last(B)+1),
              last(A))
end
```
Boundary extensions
We then define the boundary extensions. Nothing special there; we only had to check that the Julia mod(x,y) function is the floored-division version (as opposed to the rem(x,y) function, which is the rounded-toward-zero version).
```julia
const tilde_i0 = Int64(1)

function boundaryExtension_zeroPadding{T}(β::StridedVector{T}, k::Int64)
    kmin = tilde_i0
    kmax = length(β) + kmin - 1
    if (k>=kmin)&&(k<=kmax)
        β[k]
    else
        T(0)
    end
end

function boundaryExtension_constant{T}(β::StridedVector{T}, k::Int64)
    kmin = tilde_i0
    kmax = length(β) + kmin - 1
    if k<kmin
        β[kmin]
    elseif k<=kmax
        β[k]
    else
        β[kmax]
    end
end

function boundaryExtension_periodic{T}(β::StridedVector{T}, k::Int64)
    kmin = tilde_i0
    kmax = length(β) + kmin - 1
    β[kmin+mod(k-kmin,1+kmax-kmin)]
end

function boundaryExtension_mirror{T}(β::StridedVector{T}, k::Int64)
    kmin = tilde_i0
    kmax = length(β) + kmin - 1
    β[kmax-abs(kmax-kmin-mod(k-kmin,2*(kmax-kmin)))]
end

# For the user interface
#
boundaryExtension = Dict(:ZeroPadding=>boundaryExtension_zeroPadding,
                         :Constant=>boundaryExtension_constant,
                         :Periodic=>boundaryExtension_periodic,
                         :Mirror=>boundaryExtension_mirror)
```
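A quick usage example (not from the original code, just reproducing the hand-computed values from the periodic and mirror sections above):

```julia
β = [10.0, 20.0, 30.0, 40.0]

boundaryExtension_periodic(β, 0),  boundaryExtension_periodic(β, 5)   # (40.0, 10.0)
boundaryExtension_mirror(β, 0),    boundaryExtension_mirror(β, 6)     # (20.0, 20.0)
boundaryExtension_constant(β, -3), boundaryExtension_constant(β, 9)   # (10.0, 40.0)
boundaryExtension_zeroPadding(β, 0)                                   # 0.0
```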
Main subroutine
Finally we define the main subroutine. Its arguments have been defined in the preamble. I just added @simd and @inbounds annotations because they have a significant impact on performance (see the end of this post).
```julia
function direct_conv!{T}(tilde_α::StridedVector{T},
                         Ωα::UnitRange,
                         λ::Int64,
                         β::StridedVector{T},
                         γ::StridedVector{T},
                         Ωγ::UnitRange,
                         LeftBoundary::Symbol,
                         RightBoundary::Symbol)
    # Sanity check
    @assert λ!=0
    @assert length(tilde_α)==length(Ωα)
    @assert (start(Ωγ)>=1)&&(last(Ωγ)<=length(γ))

    # Initialization
    Ωβ = UnitRange(1,length(β))
    tilde_Ωα = 1:length(Ωα)

    for k in Ωγ
        γ[k]=0
    end

    rΩγ1=intersect(Ωγ,compute_Ωγ1(Ωα,λ,Ωβ))

    # rΩγ1 part: no boundary effect
    #
    β_offset = λ*(start(Ωα)-tilde_i0)
    @simd for k in rΩγ1
        for i in tilde_Ωα
            @inbounds γ[k]+=tilde_α[i]*β[k+λ*i+β_offset]
        end
    end

    # Left part
    #
    rΩγ1_left = relelativeComplement_left(Ωγ,rΩγ1)
    Φ_left = boundaryExtension[LeftBoundary]

    for k in rΩγ1_left
        for i in tilde_Ωα
            γ[k]+=tilde_α[i]*Φ_left(β,k+λ*i+β_offset)
        end
    end

    # Right part
    #
    rΩγ1_right = relelativeComplement_right(Ωγ,rΩγ1)
    Φ_right = boundaryExtension[RightBoundary]

    for k in rΩγ1_right
        for i in tilde_Ωα
            γ[k]+=tilde_α[i]*Φ_right(β,k+λ*i+β_offset)
        end
    end
end

# Some UI functions, γ inplace modification
#
function direct_conv!{T}(tilde_α::StridedVector{T},
                         α_offset::Int64,
                         λ::Int64,
                         β::StridedVector{T},
                         γ::StridedVector{T},
                         Ωγ::UnitRange,
                         LeftBoundary::Symbol,
                         RightBoundary::Symbol)
    Ωα = UnitRange(-α_offset, length(tilde_α)-α_offset-1)

    direct_conv!(tilde_α, Ωα, λ, β, γ, Ωγ, LeftBoundary, RightBoundary)
end

# Some UI functions, allocates γ
#
function direct_conv{T}(tilde_α::StridedVector{T},
                        α_offset::Int64,
                        λ::Int64,
                        β::StridedVector{T},
                        LeftBoundary::Symbol,
                        RightBoundary::Symbol)
    γ = Array{T,1}(length(β))

    direct_conv!(tilde_α, α_offset, λ, β, γ, UnitRange(1,length(γ)),
                 LeftBoundary, RightBoundary)

    γ
end
```
In C/C++
As this post is already long I will not provide complete code here. The only trap is to use the right mod function.
The C/C++ modulus operator % is tied to integer division: only the relation D % d = D - d*(D/d) is guaranteed. Since the x86 idiv instruction truncates toward zero, C/C++ follows this direction (it is even mandated since C99/C++11), which is not the floored version we need here.
To be sure, we explicitly use our own floored-mod function:
```cpp
// Floored mod
int modF(int D, int d)
{
  int r = D % d;  // '%' truncates toward zero (follows integer division)

  if ((r > 0 && d < 0) || (r < 0 && d > 0))
    r = r + d;

  return r;
}
```
For more background you can read about the different conventions of the modulo operation (floored, truncated, Euclidean).
Usage examples
Basic usage
Beware that, due to the asymmetric roles of $\alpha$ and $\beta$ (only $\beta$ is boundary-extended), the proposed approach does not preserve all the mathematical properties of the underlying operator:
- Commutativity: $\alpha \ast \beta = \beta \ast \alpha$ holds only for ZeroPadding
- Adjoint operator: $\langle \alpha \ast_\lambda \beta,\, \gamma\rangle = \langle \beta,\, \alpha \ast_{-\lambda}\, \gamma\rangle$ holds only for ZeroPadding and Periodic
- I have assumed real arrays (not complex ones): some conjugations are missing
- Not considered here, but the extension to n-dimensional & separable filters is immediate
```julia
push!(LOAD_PATH,"./")
using DirectConv

α=rand(4);
β=rand(10);

# Check adjoint operator
# -> restricted to ZeroPadding & Periodic
#    (asymmetric role of α and β)
#
vβ=rand(length(β))

d1=dot(direct_conv(α,2,-3,vβ,:ZeroPadding,:ZeroPadding),β)
d2=dot(direct_conv(α,2,+3,β,:ZeroPadding,:ZeroPadding),vβ)
@assert abs(d1-d2)<sqrt(eps())

d1=dot(direct_conv(α,-1,-3,vβ,:Periodic,:Periodic),β)
d2=dot(direct_conv(α,-1,+3,β,:Periodic,:Periodic),vβ)
@assert abs(d1-d2)<sqrt(eps())

# Check commutativity
# -> λ = -1 (convolution) and
#    restricted to ZeroPadding
#    (asymmetric role of α and β)
#
v1=zeros(20)
v2=zeros(20)
direct_conv!(α,0,-1, β,v1,UnitRange(1,20),:ZeroPadding,:ZeroPadding)
direct_conv!(β,0,-1, α,v2,UnitRange(1,20),:ZeroPadding,:ZeroPadding)
@assert (norm(v1-v2)<sqrt(eps()))

# Check Interval splitting
# (should work for any boundary extension type)
#
γ=direct_conv(α,3,2,β,:Mirror,:Periodic) # global computation
Γ=zeros(length(γ))
Ω1=UnitRange(1:3)
Ω2=UnitRange(4:length(γ))
direct_conv!(α,3,2,β,Γ,Ω1,:Mirror,:Periodic) # compute on Ω1
direct_conv!(α,3,2,β,Γ,Ω2,:Mirror,:Periodic) # compute on Ω2
@assert (norm(γ-Γ)<sqrt(eps()))
```
Performance?
In a previous post I gave a short derivation of the Savitzky-Golay filters. I used an FFT-based convolution to apply the filters. It is interesting to compare the performance of the presented direct approach vs. the FFT-based one.
```julia
push!(LOAD_PATH,"./")
using DirectConv

function apply_filter{T}(filter::StridedVector{T},signal::StridedVector{T})
    @assert isodd(length(filter))
    halfWindow = round(Int,(length(filter)-1)/2)
    padded_signal = [signal[1]*ones(halfWindow);
                     signal;
                     signal[end]*ones(halfWindow)]
    filter_cross_signal = conv(filter[end:-1:1], padded_signal)
    filter_cross_signal[2*halfWindow+1:end-2*halfWindow]
end

# Now we can create a (very) rough benchmark
M=Array(Float64,0,3)
β=rand(1000000);

for halfWidth in 1:2:40
    α=rand(2*halfWidth+1);

    fft_t0 = time()
    fft_v = apply_filter(α,β)
    fft_t1 = time()

    direct_t0 = time()
    direct_v = direct_conv(α,halfWidth,1,β,:Constant,:Constant)
    direct_t1 = time()

    @assert (norm(fft_v-direct_v)<sqrt(eps()))

    M=vcat(M, Float64[length(α) (fft_t1-fft_t0)*1e3 (direct_t1-direct_t0)*1e3])
end

M
```
We see that for small filters the direct method can easily be 10 times faster than the FFT approach!
Conclusion: for small filters, use a direct approach!
Discussion
Optimization/performance
If I have time I will try to benchmark two basic implementations, a Julia one vs. a C/C++ one. I am a beginner in the Julia language; with C++ I am more at home.
I would be curious to see the difference between a basic implementation and an optimized one in Julia, just to see how much the optimizations obfuscate (or not) the initial code and what the performance gain is. In C++ you generally end up with a lot of boilerplate code (meta-programming).
Applications
The basic Eq. (1) is a common tool that can be used for:
- deconvolution procedures,
- decimated and undecimated wavelet transforms,
For wavelet transforms, especially the undecimated one, AFAIK Eq. (1) is really the right choice. I will certainly write some posts on this stuff.
Some extra reading:
- The FFT way: Algorithms for Efficient Computation of Convolution, K. Pavel
- The Winograd’s minimal filtering algorithms way: Fast Algorithms for Convolutional Neural Networks, A. Lavin, S. Gray
- The OpenCL/GPU way: Case study: High performance convolution using OpenCL __local memory
Code
The code is on github.
Complement: more domains
The $\Omega^{\gamma_2}$ domain
We have introduced the $\Omega^{\gamma_1}$ domain, the set of $\gamma$ components whose computation does not violate the domain of definition of $\beta$ (given $\Omega^\alpha$, $\Omega^\beta$ and $\lambda$).
To be exhaustive we can introduce the $\Omega^{\gamma_2}$ domain, the set of $\gamma$ components whose computation uses at least one $\beta_j$, $j \in \Omega^\beta$.
This domain is:
$$\Omega^{\gamma_2} = \{ k \ :\ \exists\, i \in \Omega^\alpha,\ k + \lambda i \in \Omega^\beta \};$$
following arguments similar to those used for $\Omega^{\gamma_1}$ we get:
$$\Omega^{\gamma_2} = [\, i^\beta_{\min} - \max(\lambda\Omega^\alpha),\ i^\beta_{\max} - \min(\lambda\Omega^\alpha)\, ].$$
The $\Omega^{\beta_1}$ domain
We can also ask the “dual” question: given $\Omega^\alpha$ and $\Omega^\gamma$, what is the domain $\Omega^{\beta_1}$ of the $\beta$ components involved in the computation of $\gamma[k]$, $k \in \Omega^\gamma$?
By definition, this domain must fulfill the following relation:
$$k + \lambda i \in \Omega^{\beta_1}, \quad \forall k \in \Omega^\gamma,\ \forall i \in \Omega^\alpha,$$
hence, using the previous result with the roles of the domains exchanged,
$$\Omega^\gamma \subset [\, \min(\Omega^{\beta_1}) - \min(\lambda\Omega^\alpha),\ \max(\Omega^{\beta_1}) - \max(\lambda\Omega^\alpha)\, ],$$
which gives:
$$\Omega^{\beta_1} = [\, k^\gamma_{\min} + \min(\lambda\Omega^\alpha),\ k^\gamma_{\max} + \max(\lambda\Omega^\alpha)\, ].$$
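These two extra domains can be computed with the same ingredients as `compute_Ωγ1`; here is a possible sketch (the names `compute_Ωγ2` and `compute_Ωβ1` are hypothetical and are not part of the posted package):

```julia
# Hypothetical helpers, in the spirit of compute_Ωγ1 (not in the DirectConv code)
function compute_Ωγ2(Ωα::UnitRange, λ::Int64, Ωβ::UnitRange)
    λΩα = scale(λ,Ωα)
    UnitRange(start(Ωβ)-last(λΩα), last(Ωβ)-start(λΩα))
end

function compute_Ωβ1(Ωα::UnitRange, λ::Int64, Ωγ::UnitRange)
    λΩα = scale(λ,Ωα)
    UnitRange(start(Ωγ)+start(λΩα), last(Ωγ)+last(λΩα))
end
```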