Super fast pattern search: The FFT trick

By: Júlio Hoffimann

Re-posted from: https://juliohm.github.io/science/fft-trick/

In today’s post, I would like to document an old trick in image processing sometimes referred to as “The FFT trick” for exhaustive pattern search.

Although I would like to give credit to the person who introduced the trick in the literature, I don’t have this information with me. Please leave a comment if you have references.

Learning with a sliding window

Filtering an image with a sliding window is a routine operation in various disciplines like image processing, computer vision, geostatistics, and deep learning (e.g. convolutional neural networks), to name a few.

Sliding window on my breakfast.

Depending on the application, this operation serves different purposes. It is used everywhere as illustrated in the table:

| Field of study | Typical window | Purposes |
| --- | --- | --- |
| Image processing | Sobel kernel: $$\small\begin{bmatrix}1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1\end{bmatrix}$$ Gaussian blur: $$\small\frac{1}{16}\begin{bmatrix}1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1\end{bmatrix}$$ | Edge detection, blurring, denoising |
| Computer vision | Same as in image processing | Image descriptors (e.g. GIST, HoG), invariance and alignment |
| Geostatistics | A spatial pattern | Pattern search |
| Deep learning | Random weights | Convolutional neural networks |
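
For concreteness, here are the two image-processing windows from the table written as plain Julia arrays (just a sketch of the objects involved):

sobel = [1 0 -1;
         2 0 -2;
         1 0 -1]

gaussian_blur = (1/16) * [1 2 1;
                          2 4 2;
                          1 2 1]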

BRAINSTORM: Why sliding windows? Couldn’t we learn more by using a different technique? What does a sliding window miss in terms of learning? :thinking:

Given its importance, the scientific community has devoted a great deal of time to optimizing software and hardware for filtering images (or arrays) with sliding windows.

This post is about a clever trick that exploits legacy high-performance filtering algorithms to speed up pattern search on very large images.

FFT filtering

Consider a first naive implementation of the filtering operation in Julia that loops over the input image, extracts a subimage, multiplies it with the window entrywise, and adds the results:

function naivefilter(image::AbstractArray, window::AbstractArray)
  nx, ny = size(image)
  nx, ny = size(image)
  wx, wy = size(window)

  # entrywise product of the window with each subimage, summed up
  [sum(window .* image[i:i+wx-1,j:j+wy-1]) for i=1:nx-wx+1, j=1:ny-wy+1]
end

It works well, but is relatively slow by industry standards. Compare it to the efficient implementation available in Images.jl:

using Images
using BenchmarkTools

image = zeros(100,100)
window = zeros(3,3)

@benchmark naivefilter(image, window)
@benchmark imfilter(image, window)

# BenchmarkTools.Trial: (naivefilter)
#   memory estimate:  7.99 MiB
#   allocs estimate:  192093
#   --------------
#   minimum time:     6.512 ms (0.00% GC)
#   median time:      6.638 ms (0.00% GC)
#   mean time:        7.209 ms (7.67% GC)
#   maximum time:     10.569 ms (15.77% GC)
#   --------------
#   samples:          693
#   evals/sample:     1
#   time tolerance:   5.00%
#   memory tolerance: 1.00%
#
# BenchmarkTools.Trial: (imfilter)
#   memory estimate:  204.03 KiB
#   allocs estimate:  268
#   --------------
#   minimum time:     378.637 μs (0.00% GC)
#   median time:      388.839 μs (0.00% GC)
#   mean time:        406.210 μs (2.51% GC)
#   maximum time:     2.035 ms (78.63% GC)
#   --------------
#   samples:          10000
#   evals/sample:     1
#   time tolerance:   5.00%
#   memory tolerance: 1.00%

From the output, we notice ms turning into μs and MiB turning into KiB. Besides the cache-friendly code that Tim Holy and others have written, I would like to comment on another optimization that Images.jl does that is not available in other image processing packages and programming languages.

Image filtering can alternatively be implemented with Fast Fourier Transforms (FFTs). The details of the algorithm and the theory behind it are beyond the scope of this post. The message that I want to convey is that depending on the window size, particularly for large windows, the FFT algorithm is much faster.

In Images.jl you can call the FFT algorithm explicitly by using:

imfilter(image, window, Algorithm.FFT())

but you don’t need to: Images.jl automatically picks the faster algorithm depending on the window size you pass to it! :raised_hands:

The switch point between algorithms was set by means of benchmarks on a few modern processors. Feel free to submit a pull request if you feel that the switch point isn’t quite right for your hardware.
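
For instance, you can force each algorithm explicitly and benchmark them on your own hardware. A sketch (the window size is my choice for illustration; `Algorithm.FIR()` selects the direct algorithm):

using Images
using BenchmarkTools

image = rand(1000, 1000)
window = rand(50, 50) # large window, where FFT typically wins

@benchmark imfilter($image, $window, Algorithm.FIR()) # direct filtering
@benchmark imfilter($image, $window, Algorithm.FFT()) # FFT-based filtering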

Now that we’ve seen that image filtering is a super efficient operation (especially in Julia), let’s exploit it to find patterns in images.

Given an image $$A$$ and a spatial pattern $$B$$, an exhaustive pattern search consists of computing a distance (or dissimilarity) between $$B$$ and all subimages of $$A$$ of the same size. The subimages for which the distance is smallest are called matches. If the computed distance is zero, the match is called a perfect match.
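
As a baseline, a naive exhaustive search looks just like `naivefilter`, but accumulates squared differences instead of products (a sketch; the name `naivesearch` is mine, not from any package):

function naivesearch(A::AbstractMatrix, B::AbstractMatrix)
  nx, ny = size(A)
  wx, wy = size(B)

  # squared Euclidean distance between B and every subimage of A
  [sum(abs2, A[i:i+wx-1,j:j+wy-1] .- B) for i=1:nx-wx+1, j=1:ny-wy+1]
end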

Now comes the trick. Consider the squared Euclidean distance and relax the notation $$A - B$$ to mean the entrywise difference between a subimage of $$A$$ and the pattern $$B$$.

Loop over all subimages $$A_i$$ of $$A$$ and compute

$$D_i = \sum (A_i - B)^2$$

where $$A_i$$ and $$B$$ have the same size. It turns out that we can implement this operation by rewriting

$$D_i = \sum A_i^2 - 2\sum A_i B + \sum B^2$$

and replacing the original inefficient loop with two FFT filterings plus one reduction:

A² = imfilter(A.^2, ones(size(B))) # ∑Aᵢ² for every subimage, via filtering
AB = imfilter(A, B)                # ∑AᵢB for every subimage, via filtering
B² = sum(abs2, B)                  # ∑B², a single reduction

D = abs.(A² .- 2AB .+ B²) # here "abs" takes care of floating point errors
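
Wrapping everything up, a minimal sketch of the trick might look like this (`fftsearch` and the `argmin` usage are my own illustration; boundary handling is glossed over, since imfilter pads the borders):

using Images

function fftsearch(A::AbstractMatrix, B::AbstractMatrix)
  A² = imfilter(A.^2, ones(size(B)))
  AB = imfilter(A, B)
  B² = sum(abs2, B)

  D = abs.(A² .- 2AB .+ B²)
  D, argmin(D) # distance map and center of the best match
end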

Isn’t that awesome? This trick was crucial in my ImageQuilting.jl package; it allowed me to process large 3D images in minutes instead of hours.

Finally, I would like to add that high-performance GPU implementations of FFT filtering are available in Julia. You can set an option in your package to use CLFFT.jl and have the fastest software on the market. :blush:

Flipping coins in continuous time: The Poisson process

By: Júlio Hoffimann

Re-posted from: https://juliohm.github.io/science/coin-flipping/

In this post, I would like to contrast two ways of thinking about a coin flip process: one that is very intuitive, and one that can be very useful for certain applications. This insight is inspired by a stochastic modeling class by Ramesh Johari.

Discrete-time coin flip process

Suppose that I ask you to simulate a sequence of i.i.d. (independent and identically distributed) coin flips in a computer, where the probability of flipping a coin and observing a head ($$H$$) is $$p$$ and the probability of flipping a coin and observing a tail ($$T$$) is $$1-p$$:

$$H, T, T, T, H, H, T, \ldots$$

There are at least two ways of doing this:

Solution A, the most popular by far, is to go to your favorite programming language (e.g. Julia :blush:) and sample a sequence of $$\mathrm{Bernoulli}(p)$$ random variables:

using Distributions

p = .5  # probability of head
N = 100 # number of coin flips

coin_flips = rand(Bernoulli(p), N)
# 1, 0, 0, 0, 1, 1, ...

Think of a head (or $$1$$) as a successful coin flip. For example, you earn $1 whenever the head comes up and nothing otherwise. Keep track of the times at which you had a successful outcome.

Now comes the trick (Solution B). In a coin flip process, the interarrival times between successes are i.i.d. $$\mathrm{Geometric}(p)$$. Instead of simulating the coin flips directly, simulate the times at which a head will occur:

using Distributions

p = .5 # probability of head
N = 100 # number of coin flips

T = rand(Geometric(p), N)   # interarrival times (tails before each head)
head_times = cumsum(T .+ 1) # times at which a head occurs

# construct process in a single shot
coin_flips = zeros(Int, N)
head_times = head_times[head_times .≤ N]
coin_flips[head_times] .= 1
# 1, 0, 1, 1, 0, 0, ...

Why is Solution B relevant?

In other words, why would you care about a solution that is more verbose and somewhat less intuitive?

Imagine that the probability of success is extremely low. Would you like your computer to spend hours generating tails (or $$0$$s) before it can generate a head? For some applications, this is a critical question.
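
To see the difference, consider a tiny success probability (the numbers below are my own illustration):

using Distributions

p = 1e-6 # extremely low probability of head

T = rand(Geometric(p), 10)  # ten interarrival times, drawn instantly
head_times = cumsum(T .+ 1) # the first ten head times, with no tails simulated
# Solution A would generate on the order of 10^6 tails per head instead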

More than that, sometimes Solution A is not available…

Continuous-time coin flip process

The Poisson process is the equivalent of the coin flip process in continuous time. Imagine that you divide real time (a real number) into small intervals of equal length $$\delta$$:

If a coin is to be flipped at each of these marks in the limit when $$\delta \to 0$$, then Solution A is not available! There are uncountably many coin flips in the interval $$[0, t]$$.

However, simulating this process in a computer is possible with Solution B. For a Poisson process with success rate $$\lambda$$, the interarrival times are i.i.d. $$\mathrm{Exponential}(\lambda)$$.
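
A minimal sketch in Julia, assuming rate $$\lambda$$ (note that `Exponential` in Distributions.jl is parameterized by the scale $$1/\lambda$$):

using Distributions

λ = 2.0 # success rate (heads per unit of time)
N = 100 # number of heads to simulate

T = rand(Exponential(1/λ), N) # i.i.d. interarrival times
head_times = cumsum(T)        # continuous times at which heads occur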