
Commit 4ba5361

Add 'Serialize and Deserialize Binary Tree'
1 parent 30726b9

4 files changed, +214 -0 lines

README.md

Lines changed: 2 additions & 0 deletions
@@ -64,6 +64,8 @@ A collection of LeetCode solutions

 [Same Tree](./src/same_tree.py)

+[Serialize and Deserialize Binary Tree](./src/serialize_and_deserialize_binary_tree.py)
+
 [String Compression](./src/string_compression.py)

 [Subtree of Another Tree](./src/two_sum.py)

TODO.md

Lines changed: 2 additions & 0 deletions
@@ -14,3 +14,5 @@
 - [ ] Add self-balancing BST approach to 'Find Median from Data Stream'
 - [ ] Add iterative without parent pointers approach to 'Lowest Common Ancestor of a Binary Tree'
 - [ ] Add recursive approach to 'Binary Tree Right Side View'
+- [ ] Move `utils.py` to `src/`
+- [ ] Use `serialize_binary_tree()` and `deserialize_binary_tree()`
src/serialize_and_deserialize_binary_tree.py

Lines changed: 170 additions & 0 deletions
@@ -0,0 +1,170 @@
r"""
297. Serialize and Deserialize Binary Tree

https://leetcode.com/problems/serialize-and-deserialize-binary-tree

NOTES
* Serialization and deserialization have real-world applications in software
engineering, which makes this a great problem!

/!\ NOTE /!\
This solution description comprises my original thought process, backtracking
when a solution was nonviable, and a final correct solution. I've retroactively
provided additional context in order to call attention to flaws in my original
approach.

---

Serialization is the process of converting information (e.g., a data structure
or object) into a sequence of bits so that it can be stored on disk or in
memory, or transmitted over a network. Deserialization is the reverse process:
reconstructing the original information from its serialized form.

Serialization and deserialization are performed by a codec (short for
coder/decoder), which encodes data into the serialized representation and
decodes it back; depending on the encoding, it may also compress the data.

Let's recall from 'Construct Binary Tree from Preorder and Inorder Traversal'
that:

>No single traversal order (pre-order, post-order, or in-order) uniquely
identifies the structure of a tree...

/!\ NOTE /!\
This statement is only true when we do not account for gaps (missing children)
in the tree. Recording gaps alongside node values in a pre-order, post-order,
or level-order traversal gives us enough information to uniquely determine the
structure of the tree (in-order is the exception: even with gaps, a root with a
left child and a root with a right child can produce the same sequence).
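A minimal sketch of the note above, using a hypothetical `Node` dataclass and `preorder` helper (neither is part of this repo): two trees with the same plain pre-order traversal become distinguishable once gaps are recorded as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    val: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def preorder(node: Optional[Node], with_gaps: bool) -> list:
    # Plain pre-order skips missing children; with_gaps records each as None.
    if node is None:
        return [None] if with_gaps else []
    return (
        [node.val]
        + preorder(node.left, with_gaps)
        + preorder(node.right, with_gaps)
    )


left_leaning = Node(1, left=Node(2))    # 2 is the left child of 1
right_leaning = Node(1, right=Node(2))  # 2 is the right child of 1

print(preorder(left_leaning, False), preorder(right_leaning, False))
print(preorder(left_leaning, True), preorder(right_leaning, True))
```

Plain pre-order gives `[1, 2]` for both trees, while the gap-aware traversal gives `[1, 2, None, None, None]` and `[1, None, 2, None, None]`.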

Therefore (ignoring gaps), the serialized tree will need to store both the
pre-order and in-order traversals in order to uniquely reconstruct the binary
tree.
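For reference, a minimal sketch of that reconstruction using one common recursive approach, assuming unique values as in 'Construct Binary Tree from Preorder and Inorder Traversal'. `Node`, `build`, and `inorder_of` are hypothetical names: the next pre-order value is the root of the current subtree, and its position in the in-order list splits the left and right subtrees.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    val: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(preorder: list[int], inorder: list[int]) -> Optional[Node]:
    # Assumes unique values, so each value has exactly one in-order position.
    index = {val: i for i, val in enumerate(inorder)}
    it = iter(preorder)

    def helper(lo: int, hi: int) -> Optional[Node]:
        # Build the subtree whose in-order slice is inorder[lo:hi].
        if lo >= hi:
            return None
        root = Node(next(it))
        mid = index[root.val]
        root.left = helper(lo, mid)  # left subtree is consumed first, as in pre-order
        root.right = helper(mid + 1, hi)
        return root

    return helper(0, len(inorder))


def inorder_of(node: Optional[Node]) -> list[int]:
    if node is None:
        return []
    return inorder_of(node.left) + [node.val] + inorder_of(node.right)


tree = build([3, 9, 20, 15, 7], [9, 3, 15, 20, 7])
print(inorder_of(tree))  # [9, 3, 15, 20, 7]
```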

Each node has a value between -1000 and 1000 (inclusive), which means we will
need to represent 2001 distinct numbers. To find the minimum number of bits
required to represent all possible node values, we need the smallest n where
2^n ≥ 2001, since each bit pattern must uniquely identify a number. Taking the
base-2 logarithm of both sides gives:

n ≥ log₂(2001) ≈ 10.97

Therefore, rounding up, we need 11 bits (2^11 = 2048 patterns) to represent
2001 values. In two's complement, 11 bits cover the range -1024 to 1023, which
contains -1000 to 1000 and leaves the patterns for 1001 to 1023 (and for -1024
to -1001) unused. Further leveraging this constraint on node values, we can
create a simple codec by concatenating the pre-order and in-order traversals.
A special sequence, 01111111111 (1023 in base-10 and in two's complement),
marks the end of the pre-order sequence and the start of the in-order
sequence.
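A minimal sketch of that first (ultimately flawed) codec, assuming values within [-1000, 1000]. `encode`, `decode`, `pack`, and `unpack` are hypothetical helpers, and the "bits" are a `str` of '0'/'1' characters, as in the `Codec` below, rather than packed bytes.

```python
import math

BITS = 11
# 0b01111111111; never a node value because values stay within [-1000, 1000].
NULL_MARKER = 1023


def encode(value: int) -> str:
    # Masking to the low 11 bits yields the two's complement bit pattern.
    return format(value & ((1 << BITS) - 1), f"0{BITS}b")


def decode(bits: str) -> int:
    value = int(bits, 2)
    # A leading 1 marks a negative number in two's complement.
    if bits[0] == "1":
        value -= 1 << BITS
    return value


def pack(preorder: list[int], inorder: list[int]) -> str:
    # Pre-order values, then the separator, then in-order values.
    return "".join(encode(v) for v in [*preorder, NULL_MARKER, *inorder])


def unpack(data: str) -> tuple[list[int], list[int]]:
    values = [decode(data[i : i + BITS]) for i in range(0, len(data), BITS)]
    sep = values.index(NULL_MARKER)
    return values[:sep], values[sep + 1 :]


n = math.ceil(math.log2(2001))
print(n)  # 11
print(-(1 << (n - 1)), (1 << (n - 1)) - 1)  # -1024 1023
print(encode(-1000), encode(1000))  # 10000011000 01111101000
print(unpack(pack([1, -2, 3], [-2, 1, 3])))  # ([1, -2, 3], [-2, 1, 3])
```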

/!\ NOTE /!\
This approach only works for trees whose node values are unique. Looking back
at 'Construct Binary Tree from Preorder and Inorder Traversal', this was one of
the problem constraints:

>preorder and inorder consist of *unique* values.
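To see why uniqueness matters, here is a minimal sketch (hypothetical `Node`, `preorder`, and `inorder` helpers) of two differently shaped trees whose pre-order and in-order traversals are identical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    val: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def preorder(node: Optional[Node]) -> list[int]:
    return [] if node is None else [node.val] + preorder(node.left) + preorder(node.right)


def inorder(node: Optional[Node]) -> list[int]:
    return [] if node is None else inorder(node.left) + [node.val] + inorder(node.right)


a = Node(2, left=Node(2))   # the second 2 hangs to the left
b = Node(2, right=Node(2))  # the second 2 hangs to the right

print(preorder(a) == preorder(b), inorder(a) == inorder(b))  # True True
print(a == b)  # False: the shapes differ
```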

So, the correct solution uses either a depth-first or a breadth-first
traversal that also records gaps: each null child is written as a null marker.
Here, we can reuse the special sequence above (1023) as the null marker. An
added complication of this approach is that the deserialization logic must
account for the fact that the serialization does not include every gap:
children of a null node are never written.
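A minimal sketch of the breadth-first variant, emitting a Python list (with `None` as the null marker) instead of bits. `Node` and `level_order_with_gaps` are hypothetical names, and the traversal is meant to mirror the one in `Codec.serialize`. Because only children of non-null nodes are written, a tree with n nodes always produces exactly 2n + 1 entries: the root plus two child slots per node.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    val: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def level_order_with_gaps(root: Optional[Node]) -> list[Optional[int]]:
    if root is None:
        return []
    out: list[Optional[int]] = []
    q: deque[Optional[Node]] = deque([root])
    while q:
        node = q.popleft()
        if node is None:
            out.append(None)  # a gap; a null node's children are never enqueued
            continue
        out.append(node.val)
        q.append(node.left)
        q.append(node.right)
    return out


# The example tree used in these notes: [1, 2, 3, None, None, 4, 5, 6, 7]
root = Node(1, Node(2), Node(3, Node(4, Node(6), Node(7)), Node(5)))
vals = level_order_with_gaps(root)
print(vals)
print(len(vals))  # 15, i.e. 2 * 7 + 1
```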

/!\ NOTE /!\
Creating a serialization that accounts for all gaps in the tree, essentially
padding it out to a complete binary tree, exceeds the time limit: a tree of
height h can need up to 2^h - 1 slots, which grows exponentially for skewed
trees.
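A quick back-of-the-envelope sketch of that blow-up, comparing a heap-style layout that reserves every position down to the deepest level (hypothetical `complete_slots` helper) with the gap-aware level order (hypothetical `gap_slots` helper), for a degenerate tree in which every node has only a right child:

```python
def complete_slots(height: int) -> int:
    # Every position down to the deepest level:
    # 1 + 2 + 4 + ... + 2**(height - 1) = 2**height - 1 slots.
    return (1 << height) - 1


def gap_slots(node_count: int) -> int:
    # Level order with gaps: the root plus two child slots per non-null node.
    return 2 * node_count + 1


# A right-skewed chain of 30 nodes has height 30.
print(complete_slots(30), gap_slots(30))  # 1073741823 61
```

The heap-style count doubles with every extra level, while the gap-aware count grows by only two entries per node.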

In the end, I learned a new algorithm for serializing and deserializing binary
trees: a level-order (breadth-first) traversal that records gaps only for the
children of non-null nodes. This is the same level-order format LeetCode uses
to represent binary trees.

Example (● marks a null child):

[1, 2, 3, None, None, 4, 5, 6, 7]

              1
            /   \
           2     3
          ● ●   / \
               4   5
              / \ ● ●
             6   7
            ● ● ● ●
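The bracketed list above drops the trailing nulls, matching the LeetCode-style lists used in the tests. A minimal sketch (hypothetical `trim_trailing_none` helper) of going from the full level-order output, which ends with six null markers for the missing children of the leaves 5, 6, and 7, to that form:

```python
from typing import Optional


def trim_trailing_none(vals: list[Optional[int]]) -> list[Optional[int]]:
    # Trailing gaps add no structural information, so they can be dropped.
    end = len(vals)
    while end > 0 and vals[end - 1] is None:
        end -= 1
    return vals[:end]


full = [1, 2, 3, None, None, 4, 5, 6, 7] + [None] * 6
print(trim_trailing_none(full))  # [1, 2, 3, None, None, 4, 5, 6, 7]
```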
"""

from collections import deque

from src.classes import TreeNode


class Codec:
    BITS = 11
    NULL_MARKER = 1023  # 01111111111 in two's complement

    def serialize(self, root: TreeNode | None) -> str:
        """
        Encodes a tree into a string.
        """
        if not root:
            return ""

        # Collect 11-bit chunks in a list and join once at the end, rather
        # than concatenating strings inside the loop.
        chunks: list[str] = []
        q: deque[TreeNode | None] = deque([root])

        while q:
            curr: TreeNode | None = q.popleft()
            if curr is None:
                chunks.append(format(self.NULL_MARKER, "011b"))
                continue
            # Masking to the low 11 bits yields the two's complement bit
            # pattern, so negative values are encoded without a "-" sign.
            chunks.append(format(curr.val & ((1 << self.BITS) - 1), "011b"))
            q.append(curr.left)
            q.append(curr.right)

        return "".join(chunks)

    def deserialize(self, data: str) -> TreeNode | None:
        """
        Decodes a string into a tree.
        """
        vals: list[int | None] = []

        # Iterate over the data in 11 bit chunks.
        for i in range(0, len(data), self.BITS):
            bits = data[i : i + self.BITS]
            value = int(bits, 2)
            # If the leftmost bit is 1, the value is negative, so convert from
            # unsigned to two's complement by subtracting 2^11 (2048).
            if bits[0] == "1":
                value -= 1 << self.BITS
            vals.append(None if value == self.NULL_MARKER else value)

        if not vals:
            return None

        root = TreeNode(val=vals[0])
        q: deque[TreeNode] = deque([root])
        i = 1

        # The crucial property of this algorithm is that i increments by 2
        # every iteration, while nodes are only enqueued if vals[i] is not
        # None. This keeps our index into vals aligned with the possible left
        # and right children of the current node under consideration, so the
        # tree structure can be determined without storing the children of
        # null nodes.
        while q and i < len(vals):
            curr: TreeNode = q.popleft()
            if vals[i] is not None:
                curr.left = TreeNode(val=vals[i])
                q.append(curr.left)
            i += 1
            # Bounds check guards against input whose trailing null markers
            # were trimmed.
            if i < len(vals) and vals[i] is not None:
                curr.right = TreeNode(val=vals[i])
                q.append(curr.right)
            i += 1

        return root
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
"""
297. Serialize and Deserialize Binary Tree

https://leetcode.com/problems/serialize-and-deserialize-binary-tree
"""

from unittest import TestCase

from src.serialize_and_deserialize_binary_tree import Codec
from tests.utils import deserialize_binary_tree, serialize_binary_tree


class TestSolution(TestCase):
    def test_1(self):
        exp = [1, 2, 3, None, None, 4, 5]
        root = deserialize_binary_tree([1, 2, 3, None, None, 4, 5])
        s = Codec()
        d = Codec()
        assert serialize_binary_tree(root=d.deserialize(s.serialize(root))) == exp

    def test_2(self):
        exp = []
        root = deserialize_binary_tree([])
        s = Codec()
        d = Codec()
        assert serialize_binary_tree(root=d.deserialize(s.serialize(root))) == exp

    def test_3(self):
        exp = [1, 2, 2]
        root = deserialize_binary_tree([1, 2, 2])
        s = Codec()
        d = Codec()
        assert serialize_binary_tree(root=d.deserialize(s.serialize(root))) == exp

    def test_4(self):
        exp = [1, 2, 3, None, None, 4, 5, 6, 7]
        root = deserialize_binary_tree([1, 2, 3, None, None, 4, 5, 6, 7])
        s = Codec()
        d = Codec()
        assert serialize_binary_tree(root=d.deserialize(s.serialize(root))) == exp
