By: Fabian Becker
Re-posted from: https://geekmonkey.org/lwm-julia-bitwise-operators/
In part 3, we learned about structs and binary I/O in Julia. We looked into how the PNG format stores metadata and how it is represented using chunks. Each chunk consisted of four components: length, type, data, and a four-byte CRC.
CRC stands for "Cyclic Redundancy Check," and these checks are commonly used in digital networks and storage devices to detect transmission or read errors in data. Sometimes one or more bits can get flipped during data transmission, and CRCs are one of the tools employed to detect this.
CRC is a so-called error detection algorithm. The simplest algorithm in this class is called the parity bit. The idea with the parity bit is that you count all the 1s in a binary string, and if the number of 1s is even, the parity bit is also even. With this technique, you can detect any odd-number bit flips but are basically out of luck when an even number of bits are flipped.
In this article, we'll look at implementing CRC32 using Julia bitwise operators. The Wikipedia page is a great starting point if you want to learn a bit more before jumping into the code.
Bitwise Operators in Julia
Like all other modern programming languages, Julia comes with bitwise operators. They apply to all primitive integer types. I don't think it's necessary for me to cover all of them here, so I'll focus on the ones we need for this post:
AND denoted in Julia by an ampersand &
is true only if both operands are true.
a b | a & b
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1
Bitshift is an operation that moves a sequence of bits either left or right. In Julia, the operator for this is >>
or <<
depending on whether you want to shift the bits right or left. Shifting left by n bits has the effect of multiplying a binary number by \(2^n\). Likewise shifting right by n bits will do the inverse and divide by \(2^n\). When used, it's important to know that the first operand is always the bits operated on, and the second denotes the number of bits to shift. 2 << 5
means "shift the binary representation of 2 (10) left by 5 bits (100000).
XOR, also called exclusive or is an operator that's true if and only if the operands are different. Most languages use the ^
(caret) to symbolise XOR, Julia uses ⊻
(which can be entered using LaTeX notation \xor
) but also offers the xor
function.
a b | a ⊻ b
-----------------
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 1
With the operators covered, let's now looks at implementing the CRC verification algorithm for our PNG chunks.
Bitwise CRC32
The simplest implementation of the CRC algorithm is operating bitwise. CRC is essentially an implementation of polynomial division where our data is the dividend, and the result of the division is the CRC for our data. Don't worry, we don't have to get super mathematical here – in binary polynomial division can easily be done using XOR.
For CRC32 the generator polynomial, or divisor, is $$x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1$$ which can also be written in hex as 0x04C11DB7. In practice this is often reversed since it makes the algorithm easier to implement in software. The resulting constant is 0xEDB88320.
The above implementation appears in one of my favorite algorithm books written by Henry S. Warren Jr., called Hacker's Delight. It features many other algorithms and techniques, and its most recent edition has a whole section on CRC32.
We can start by converting the C implementation from Hacker's Delight to Julia. Julia strings contain variable-length characters, so it is necessary to ensure they are converted to UInt8
first, since the algorithm can only operate on individual bytes.
function crc32(message::Vector{UInt8})
crc = 0xFFFFFFFF
for byte in message
crc = crc ⊻ byte
for _ in 1:8
mask = -UInt32(crc & 1)
crc = (crc >> 1) ⊻ (0xEDB88320 & mask)
end
end
~crc
end
We can trivially test this by picking any chunk from a PNG file and passing the type and data fields in as the message. The easiest one I found was the IEND chunk since it doesn't even have any data. This will always have the same CRC of 0xAE426082. Running the following snippet, we can see our code working:
message = "IEND"
crc = crc32(Vector{UInt8}(message)) # Returns a UInt32
print(string(crc, base=16)) # Prints: ae426082
Done! Or are we…?
Bytewise CRC32
For PNG, there's a reference implementation available in C that operates byte-wise instead of bitwise, allowing for a drastic speedup in CRC computation. We can adapt it to work in Julia as well.
The reference implementation uses a pre-built table to speed up performance. This table stores all possible combinations for a single byte resulting in a table of 256 elements. There are other implementations out there that extend this to 2 bytes or 65,536 elements.
function maketable()
crc_table = Vector{UInt32}(undef, 256)
for n in 0:255
c = convert(UInt32, n)
for _ in 1:8
if c & 1 == 1
c = 0xEDB88320 ⊻ (c >> 1)
else
c = c >> 1
end
end
crc_table[n+1] = c
end
crc_table
end
Note that since Julia arrays are 1-indexed, we have to adapt the algorithm to take care of this.
Since the table is reusable (as long as we use the same generator polynomial), it's advisable to cache it somehow. One way of doing this is using a const variable in the global scope:
const crc_table = maketable()
The main CRC32 algorithm can now use this table to conveniently iterate over the input data byte-by-byte and save many compute cycles. Note that we had to + 1
the lookup here (as compared with the C version) to accommodate Julia's 1-indexed arrays.
function crc32(data::Vector{UInt8}, crc::UInt32)
c = crc
for byte in data
c = crc_table[((c ⊻ UInt8(byte)) & 0xff) + 1] ⊻ (c >> 8)
end
return c
end
With some simple multi-dispatch, we can even add a version that takes Julia strings:
crc32(data::AbstractString, crc::UInt32) = crc32(Vector{UInt8}(data), crc)
Finally, we can now apply this to our PNG chunk. The PNG spec mentions that the CRC is computed over the bytes from the type and data field and not the length field. Since we're still dealing with Vector{UInt8}
we can use vcat
to concatenate those two fields.
Additionally, the PNG spec demands that we initialize the CRC with 0xFFFFFFFF and XOR the result again afterward. We can implement a crc32
function for our PNGChunk
type as follows:
crc32(c::PNGChunk) = crc32(vcat(c.type, c.data), 0xFFFFFFFF) ⊻ 0xFFFFFFFF
We can now validate the crc32 for each chunk against the CRC we read from the file. First, we need to fix the byte order and ensure we're comparing values of the same type. The CRC in our PNGChunk is represented as a Vector{UInt8}
with the least significant byte coming first. We can reverse the order using reverse
and turn the vector into a single UInt32
by using reinterpret
again.
isvalid(c::PNGChunk) = crc32(c) == reinterpret(UInt32, reverse(c.crc))[1]
I've modified our Base.show
function from last time to output the data a bit more clearly, and it now looks like this:
function Base.show(io::IO, c::PNGChunk)
println(io, "Length: ", length(c))
println(io, "Type: ", type(c))
println(io, "Data: ", datastr(c))
println(io, "CRC: ", crc32(c), "\t", isvalid(c) ? "OK" : "INVALID")
println(io, "-----")
end
That's it – we have successfully implemented a basic CRC32 algorithm in Julia.
Julia comes with a core package called CRC32c using the same algorithm but uses a different generator polynomial, so we, unfortunately, can't use it. There is, however an excellent third-party library called CRC.jl that you might want to check out.
If you like articles like this one, please consider subscribing to my free newsletter, where at least once a week, I send out my latest work covering Julia, Python, Machine Learning, and other tech.
You can also follow me on Twitter.