
Commit 4df2986

Merge pull request #10 from pedropark99/fix/figure-base64
Fixes over Figure 3.1 about the base64 encoder
2 parents 23736d3 + 046ff0a commit 4df2986

27 files changed, +366 -113 lines changed

Chapters/01-base64.qmd

+31 -39
@@ -73,14 +73,6 @@ The bulletpoints below summarises the base64 scale:
 
 
 
-Everytime that the base64 algorithm needs to fill some gap (which always occur at the end of
-the input string) with a group of 6 bits filled with only zeros (`000000`), this group is automatically
-mapped to the character `=`. Because this group of 6 bits is meaningless, they represent nothing,
-they are just filling the gap. As a result, the base64 algorithm maps this meaningless group
-to the character `=`, which represents the end of meaningful characters in the sequence.
-This characteristic is explained in more details at @sec-base64-encoder-algo.
-
-
 
 ### Creating the scale as a lookup table {#sec-base64-table}
 
@@ -96,8 +88,9 @@ from the array where you stored all the possible characters in the base64 scale.
 directly from memory.
 
 We can start building a Zig struct to store our base64 decoder/encoder logic.
-We start with the `Base64` struct below. You can see that, for now, we only have an `init()` function,
-to create a new instance of a `Base64` object, and, a `_char_at()` function, which is a
+We start with the `Base64` struct below. You can see that, for now, we only have one single data member in this
+struct, i.e. the member `_table`, which represents our lookup table. We also have an `init()` method,
+to create a new instance of a `Base64` object, and, a `_char_at()` method, which is a
 "get chat at index ..." type of function.
 
 
@@ -123,7 +116,7 @@ const Base64 = struct {
 ```
 
 
-In other words, the `_char_at()` function is responsible for getting the character in the lookup table (i.e. the `_table` variable) that
+In other words, the `_char_at()` method is responsible for getting the character in the lookup table (i.e. the `_table` struct data member) that
 corresponds to a particular index in the "base64 scale". So, in the example below, we know that
 the character that corresponds to the index 28 in the "base64 scale" is the character "c".
 
@@ -149,7 +142,7 @@ The algorithm behind a base64 encoder usually works on a window of 3 bytes. Beca
 8 bits, so, 3 bytes forms a set of $8 \times 3 = 24$ bits. This is desirable for the base64 algorithm, because
 24 bits is divisble by 6, which form a set of 4 groups of 6 bits each.
 
-So the base64 algorithm work by converting 3 bytes at a time
+So the base64 algorithm works by converting 3 bytes at a time
 into 4 characters in the base64 scale. It keeps iterating through the input string,
 3 bytes at a time, and converting them into the base64 scale, producing 4 characters
 per iteration. It keeps iterating, and producing these "new characters"
@@ -158,42 +151,41 @@ until it hits the end of the input string.
 Now you may think, what if you have a particular string that have a number of bytes
 that is not divisible by 3? What happens? For example, if you have a string
 that contains only two characters/bytes, such as "Hi". How the
-algorithm behaves in such situation? You find the answer at @fig-base64-algo1.
+algorithm would behave in such situation? You find the answer at @fig-base64-algo1.
 You can see at @fig-base64-algo1 that the string "Hi", when converted to base64,
 becomes the string "SGk=":
 
 ![The logic behind a base64 encoder](./../Figures/base64-encoder-flow.png){#fig-base64-algo1}
 
-In the example of the string "Hi" we have 2 bytes, or, 16 bits in total. So, we lack a full byte (8 bits)
-to complete the window of 24 bits that the base64 algorithm likes to work on. In essence,
-everytime that the algorithm does not meet this requirement, it simply add extra zeros
-until it fills the space that it needs.
+Taking the string "Hi" as an example, we have 2 bytes, or, 16 bits in total. So, we lack a full byte (8 bits)
+to complete the window of 24 bits that the base64 algorithm likes to work on. The first thing that
+the algorithm does, is to check how to divide the input bytes into groups of 6 bits.
 
+If the algorithm notice that there is a group of 6 bits that, have some bits in it, but, at the same time, it is not full
+(in other words, $0 < nbits < 6$, being $nbits$ the number of bits), meaning that, it lacks
+some bits to fill the 6-bits requirement, the algorithm simply add extra zeros in this group
+to fill the space that it needs.
 That is why at @fig-base64-algo1, on the third group after the 6-bit transformation,
-2 extra zeros were added to fill the gap in this group, and also, the fourth group (which is the last 6-bit group)
-is entirely made by zeros that were added by the algorithm.
-
-So every time that the base64 algorithm can't produce a full group of 6 bits, it
-simply fills the gap in this group with zeros, until it get's the 6 bits that it needs.
-
-Is worth mentioning that, everytime that the algorithm produces a group of 6 bits that
-is entirely composed by these extra zeros added by the algorithm, then, this group of 6 bits is automatically mapped to
-the character `=` (equal sign). However, notice that a group of 6-bit entirely made by **extra zeros**,
-is different than a group of 6-bit entirely made by **zeros**.
-
-In other words, if the algorithm produces a 6-bit group made by zeros, without
-needing to include extra-zeros to fill any gap, then, this "group of zeros" is interpreted as is. In binary,
-the 6-bit group `000000` simply means zero. So, if we give the index zero to the function `_char_at()`,
-this zero index is mapped to the first character in the base64 scale, which is "A".
-
-So be aware of this important distinction. A group of "extra-zeros" that are "filling the gap"
-is different than a group of actual zeros that were calculated by the 6-bit transformation.
-As an example, if you give the string "0" as input to a base64 encoder, this string is
+2 extra zeros were added to fill the gap in this group.
+
+So, when we have a 6-bit group that is not completely full, like the third group, extra zeros
+are added to fill the gap. But what about when an entire 6-bit group is empty, or, it
+simply doesn't exist? This is the case of the fourth 6-bit group exposed at
+@fig-base64-algo1.
+
+This fourth group is necessary, because the algorithm works on 4 groups of 6 bits.
+But the input string does not have enough bytes to create a fourth 6-bit group.
+Every time that this happens, where a entire group of 6 bits is empty,
+this group becomes a "padding group". Every "padding group" is mapped to
+the character `=` (equal sign), which represents "null", or, the end
+of meaninful characters in the sequence.
+Hence, everytime that the algorithm produces a "padding group", this group is mapped to `=`.
+
+As another example, if you give the string "0" as input to a base64 encoder, this string is
 translated into the base64 sequence "MA==".
-
 The character "0" is, in binary, the sequence `00110000`[^zero-note]. So, with the 6-bit transformation
 exposed at @fig-base64-algo1, this single character would produce these two 6-bit groups: `001100`, `000000`.
-The other two 6-bit groups are entirely made by extra-zeros, and that is why the last
+The remaining two 6-bit groups become "padding groups". That is why the last
 two characters in the output sequence (MA==) are `==`.
 
 
@@ -779,7 +771,7 @@ So, the steps to produce the 3 bytes in the output are:
 
 
 Before we continue, let's try to visualize how these transformations make the original bytes that we had
-before the encoding process. First, think back at the 6-bit transformation performed by the encoder exposed at #sec-encoder-logic.
+before the encoding process. First, think back at the 6-bit transformation performed by the encoder exposed at @sec-encoder-logic.
 The first byte in the output of the encoder is produced by moving the bits in the first byte of the input two positions to the right.
 
 So, if for example the first byte in the input of the encoder was the sequence `ABCDEFGH`, then, the first byte in the output of the encoder would be
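A minimal Zig sketch can check the padding rule described in the updated chapter text. A 3-byte window holding n real bytes fills n + 1 six-bit groups. Each remaining group is a "padding group" mapped to `=`.

The sketch makes several assumptions:

- It targets Zig 0.11 or later, because it uses the one-argument `@intCast`/`@truncate` forms.
- The free function `encode` and the constant `table` are hypothetical names. They are not part of the chapter's `Base64` struct.
- It packs each 3-byte window into a single 24-bit integer instead of shifting the input bytes one at a time.

Treat it as an illustration, not as the book's implementation.

```zig
const std = @import("std");

// Standard base64 alphabet (the "base64 scale"): index 0 is 'A', index 28 is 'c'.
const table = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Hypothetical helper, not the chapter's `Base64` struct. Encodes `input` into
/// `out`, which must hold at least 4 * ceil(input.len / 3) bytes, and returns
/// the written slice.
fn encode(out: []u8, input: []const u8) []u8 {
    var o: usize = 0;
    var i: usize = 0;
    while (i < input.len) : (i += 3) {
        // Number of real bytes in this window: 1, 2 or 3.
        const n: usize = @min(3, input.len - i);

        // Pack the window into 24 bits; missing bytes contribute zero bits,
        // which is how a partially filled 6-bit group gets its extra zeros.
        var buf: u32 = 0;
        var k: usize = 0;
        while (k < 3) : (k += 1) {
            const byte: u32 = if (k < n) input[i + k] else 0;
            buf = (buf << 8) | byte;
        }

        // n real bytes (8 * n bits) fill n + 1 groups of 6 bits; every
        // remaining group is a "padding group" and becomes '='.
        var g: usize = 0;
        while (g < 4) : (g += 1) {
            if (g <= n) {
                // For g == 0 this equals input[i] >> 2, the first 6-bit group.
                const shift: u5 = @intCast(18 - 6 * g);
                const idx: u6 = @truncate(buf >> shift);
                out[o + g] = table[idx];
            } else {
                out[o + g] = '=';
            }
        }
        o += 4;
    }
    return out[0..o];
}

pub fn main() void {
    var a: [8]u8 = undefined;
    var b: [8]u8 = undefined;
    // Prints "SGk= MA==" to stderr.
    std.debug.print("{s} {s}\n", .{ encode(&a, "Hi"), encode(&b, "0") });
}
```

With the standard alphabet, the sketch reproduces both results quoted in the diff: "Hi" becomes "SGk=" (one padding group) and "0" becomes "MA==" (two padding groups).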
-1.86 KB
Binary file not shown.
-1.4 KB
Binary file not shown.

Figures/base64-decoder-bit-shift.png

-28.5 KB

Figures/base64-decoder-flow.png

-114 KB

Figures/base64-encoder-bit-shift.png

-30.4 KB

Figures/base64-encoder-flow.png

-121 KB

_freeze/Chapters/01-base64/execute-results/html.json

+2 -2
Large diffs are not rendered by default.

0 commit comments
