The cord tying problem

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/08/03/laces.html

Introduction

During the JuliaCon 2023 conference I got several
suggestions for writing some more puzzle-solving posts.
Therefore this week I want to present a problem that I have recently
learned from Paweł Prałat:

Assume that you have an even number n of cords of equal length
lying in a bunch. They are arranged in such a way that they are
approximately straight so that you see one end of each cord in one
region (call it left) and the other end in another region
(call it right). Imagine, for example, that the cords were wrapped
around in their middles. However, this unfortunately
means, you cannot distinguish which end belongs to which cord.
Your task is to tie the ends of the cords in such a way that after
removing the middle wrapping they form a single big loop. Assuming that
you tie the cords randomly (left and right ends separately)
compute the probability that you are going to succeed.

The initial setup of the cords (before we start tying them) is
shown on a figure below (I took the image from this source):

Cords

As usual, we are going to solve it analytically and computationally.

The post was written using Julia 1.9.2 and DataFrames.jl 1.6.1.

Analytical solution

Denote by p(n) the probability that we succeed with n cords.

Notice that we can assume that we can first tie n right
ends of the cords in any order.

Now let us analyze tying the left ends of the cords.
We assume that they are tied randomly.
We start tying the left ends by randomly picking cord
a and tying it to a random cord b.
Let us ask when such a tie is a success and when it is a failure.

If n = 2 we know we succeeded. We have just created a loop
using two cords. Thus p(2) = 1.

If n > 2 what we want to avoid is a situation when the cords
a and b are already tied on the right side. Why?
Because then we would create a loop
that would be smaller than the required loop including all cords.
The probability that we pick a wrong end of a cord is 1/(n-1)
as we have n-1 ends to choose from and one of them is bad.
Thus we succeed with probability (n-2)/(n-1).

Now observe that, assuming we succeeded, we have just tied three original
cords into a one longer cord. Thus we are left with a situation
when we have n-2 cords and the problem has the same structure to
what we started with.
So we get that for n > 2 we can computep(n) = p(n-2)*(n-2)/(n-1).

This means that we can write (using Julia code notation):

p(n) = prod((i-1)/i for i in n-1:-2:3)

Note that this function works correctly only for even n
that is at least 4. We will make its more careful implementation
in the computational solution section below.

However, first, let us ask ourselves if the formula for this function
can be simplified. Indeed we see that it is equivalent to:

prod(i-1 for i in n-1:-2:3) / prod(i for i in n-1:-2:3)

which in turn can be rewritten as:

prod(i-1 for i in n-1:-2:3)^2 / prod(i for i in n-1:-1:2)

Now observe that the numerator is just 2^(n-2)*factorial(n÷2-1)^2
(remember that n is even)
and the denominator is factorial(n-1). Thus the formula further
simplifies to:

2^(n-2)*factorial(n÷2-1)^2 / factorial(n-1)

And finally:

2^(n-2) / ((n-1)*binomial(n-2, n÷2-1))

Now from Stirling’s approximation we know that:

binomial(n-2, n÷2-1) ~ 2^(2n-4) / sqrt((n-2)*pi)

so for sufficiently large n:

p(n) ~ sqrt((n/2-1)*pi) / (n-1)

Thus we learn that the probability of getting a full circle
is of order O(1/sqrt(n)) for large n.

Let us now check these results computationally.

Computational solution

First start with a more careful implementation of the functions
computing p(n):

function connect_exact(n::Integer)
    @assert n > 3 && iseven(n)
    return prod((i-1)/i for i in n-1:-2:3)
end

function connect_approx(n::Integer)
    @assert n > 3 && iseven(n)
    return sqrt(pi * (n / 2 - 1)) / (n - 1)
end

Let us check how close the exact and approximate formulas are.
Let us compute percentage deviation of the approximation
from the exact result:

julia> [1 - connect_approx(n) / connect_exact(n) for n in 4:2:28]
13-element Vector{Float64}:
 0.11377307454724206
 0.060014397013374854
 0.040631211300167
 0.030689300286045995
 0.024649922854770412
 0.02059439568578203
 0.017683822837349372
 0.015493594528168564
 0.013785863139806342
 0.012417071173843053
 0.011295454766000912
 0.01035962441429672
 0.00956696079055186

We see that approximation is slightly below the exact number and that
the percentage deviation decreases as n goes up. With n=28 we are
below 1% error.

Let us check some larger value of n:

julia> 1 - connect_approx(10000) / connect_exact(10000)
2.5004688326446534e-5

We see that the values are now really close.
If you were afraid that we might be hitting numeric computation
issues with connect_approx since we are multiplying a lot of
values, we can easily switch to a more precise computation with Julia:

julia> 1 - connect_approx(big(10000)) / connect_exact(big(10000))
2.500468833607760982625749174941669517305288399515417883351990349709823151295288e-05

We see that using normal Float64 was enough for this range of values
of n to get enough accuracy.

But what if we were not sure if our derivation of the formula for p(n)
was correct? We can use simulation to check it.

Here is the implementation of a simulator:

function connect_sim(n::Integer)
    @assert iseven(n) && n > 3
    left = randperm(n)
    neis2 = zeros(Int, n)
    for i in 1:2:n
        neis2[left[i]] = left[i+1]
        neis2[left[i+1]] = left[i]
    end
    prev = 1
    loc = 2
    visited = 2
    while true
        nei1 = isodd(loc) ? loc+1 : loc-1
        nei2 = neis2[loc]
        loc, prev = (prev == nei1 ? nei2 : nei1), loc
        loc == 1 && return visited == n
        visited += 1
    end
end

The the code we assume that we numbered the cords from 1 to n
and that in the right part they are connected 1-2, 3-4, …
(note that we can always re-number them to get this).

The neis2 keeps the information about connections on left.
To get a random connection pattern we first draw a random n-element
permutation and store it in the left variable. Then we assume that
the connections are formed by cords left[1]-left[2], left[3]-left[4], …
and store these connections in the neis2 vector.

Now we are ready to check if this connection pattern is good, that
is, it creates one big loop. To do this we start from cord 1
and assume that we first moved to cord 2. The current location of
our travel is kept in variable loc. Then from each cord we move
either on right or on left to the next cord. The nei1 variable keeps
cords neighbor on right and nei2 on left. We keep track in the prev
variable which cord we have visited last. Using this information
we know which move we should make next. Notice that since we started from
1 we eventually have to reach it. The number of steps taken to reach
1 is tracked by the visited variable. If when loc == 1 we have
that visited == n this means that we have formed a big cycle and
we return true. Otherwise we return false.

Let us check if our simulation indeed returns values close to theoretical
ones. For this we will record the mean of 100,000 runs of our simulation
(and here the power of Julia shines – it is not a problem to run that many
samples). We check the results for the values of n we investigated above:

using DataFrames
using Random
using Statistics
connect_sim_mean(n) =
    mean(connect_sim(n) for _ in 1:100_000)
Random.seed!(1234)
df = DataFrame(n=[4:2:28; 10_000])
transform(df, :n .=> ByRow.([connect_exact,
                             connect_approx,
                             connect_sim_mean]))

The results of running this code are given below:

14×4 DataFrame
 Row │ n      n_connect_exact  n_connect_approx  n_connect_sim_mean
     │ Int64  Float64          Float64           Float64
─────┼──────────────────────────────────────────────────────────────
   1 │     4        0.666667          0.590818              0.66662
   2 │     6        0.533333          0.501326              0.53622
   3 │     8        0.457143          0.438569              0.45843
   4 │    10        0.406349          0.393879              0.40671
   5 │    12        0.369408          0.360302              0.36978
   6 │    14        0.340992          0.33397               0.33996
   7 │    16        0.31826           0.312631              0.31743
   8 │    18        0.299538          0.294897              0.29848
   9 │    20        0.283773          0.279861              0.28399
  10 │    22        0.27026           0.266904              0.26879
  11 │    24        0.25851           0.25559               0.25528
  12 │    26        0.248169          0.245598              0.24755
  13 │    28        0.238978          0.236692              0.24052
  14 │ 10000        0.0125335         0.0125331             0.01266

We can see that simulation results match the exact calculations well.

Conclusions

I hope you liked the puzzle and the solution. Next week I plan to
present the results of some experiments involving machine learning
models in Julia.