Commit a9b2d4d

Add 'Longest Common Subsequence'
1 parent f105ca4 commit a9b2d4d

3 files changed, +159 -81 lines changed

README.md

Lines changed: 2 additions & 0 deletions
@@ -22,6 +22,8 @@ A collection of LeetCode solutions
2222

2323
[Linked List Cycle](./src/linked_list_cycle.py)
2424

25+
[Longest Common Subsequence](./src/longest_common_subsequence.py)
26+
2527
[Maximum Depth of Binary Tree](./src/maximum_depth_of_binary_tree.py)
2628

2729
[Maximum Subarray](./src/maximum_subarray.py)

src/longest_common_subsequence.py

Lines changed: 138 additions & 76 deletions
@@ -6,43 +6,42 @@
66
NOTES
77
* Use dynamic programming (2D) or recursion.
88
9-
* A common subsequence is a sequence of letters that appears in both strings.
10-
Not every letter in the strings has to be used, but letters cannot be
11-
rearranged. In essence, a subsequence of a string 's' is a string we get by
12-
deleting some letters in 's'.
13-
14-
* The most obvious approach would be to iterate through each subsequence of
15-
the first string and check whether or not it is also a subsequence of the
16-
second string. This, however, will require exponential time to run. The
17-
number of subsequences in a string is up to 2^L, where L is the length of
18-
the string.
19-
20-
* There are a couple of strategies we use to design a tractable
21-
(non-exponential) algorithm for an optimization problem:
22-
23-
1. Identifying a greedy algorithm
24-
2. Dynamic programming
25-
26-
* There is no guarantee that either is possible. Additionally, greedy
27-
algorithms are strictly less common than dynamic programming algorithms and
28-
are often more difficult to identify. However, if a greedy algorithm
29-
exists, then it will almost always be better than a dynamic programming
30-
one. You should, therefore, at least give some thought to the potential
31-
existence of a greedy algorithm before jumping straight into dynamic
32-
programming.
33-
34-
* Recall that there are two different techniques we can use to implement a
35-
dynamic programming solution; memoization and tabulation.
36-
37-
* Memoization is where we add caching to a function (that has no side
38-
effects). In dynamic programming, it is typically used on recursive
39-
functions for a top-down solution that starts with the initial problem
40-
and then recursively calls itself to solve smaller problems.
41-
42-
* Tabulation uses a table to keep track of subproblem results and works
43-
in a bottom-up manner: solving the smallest subproblems before the
44-
large ones, in an iterative manner. Often, people use the words
45-
"tabulation" and "dynamic programming" interchangeably.
9+
A common subsequence is a sequence of letters that appears in both strings. Not
10+
every letter in the string has to be used, but letters cannot be rearranged. In
11+
essence, a subsequence of a string 's' is a string we get by deleting some
12+
letters in 's'.
13+
14+
The most obvious approach would be to iterate through each subsequence of the
15+
first string and check whether or not it is also a subsequence of the second
16+
string. This, however, will require exponential time to run. The number of
17+
subsequences in a string is up to 2^L, where L is the length of the string.
18+
19+
There are a couple of strategies we can use to design a tractable
20+
(non-exponential) algorithm for an optimization problem:
21+
22+
1. Identifying a greedy algorithm
23+
2. Dynamic programming
24+
25+
There is no guarantee that either is possible. Additionally, greedy algorithms
26+
are strictly less common than dynamic programming algorithms and are often more
27+
difficult to identify. However, if a greedy algorithm exists, then it will
28+
almost always be better than a dynamic programming one. You should, therefore,
29+
at least give some thought to the potential existence of a greedy algorithm
30+
before jumping straight into dynamic programming.
31+
32+
Recall that there are two different techniques we can use to implement a
33+
dynamic programming solution: tabulation and memoization.
34+
35+
* Tabulation uses a table to keep track of subproblem results and works in a
36+
bottom-up manner: solving the smallest subproblems before the large ones,
37+
in an iterative manner. Often, people use the words "tabulation" and
38+
"dynamic programming" interchangeably.
39+
40+
* Memoization is where we add caching to a function (that has no side
41+
effects). In dynamic programming, it is typically used on recursive
42+
functions for a top-down solution that starts with the initial problem and
43+
then recursively calls itself to solve smaller problems. Memoization is
44+
useful when a problem has overlapping subproblems.
4645
"""
4746

4847

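Reviewer note: the brute-force approach described in the NOTES above can be sketched roughly as follows. This is only an illustration of why that approach is exponential, not code from this commit; `is_subsequence` and `lcs_brute_force` are hypothetical helper names.

```python
from itertools import combinations


def is_subsequence(sub: str, s: str) -> bool:
    """Return True if `sub` can be obtained from `s` by deleting letters."""
    it = iter(s)
    # `ch in it` advances the iterator, so the letters must appear in order.
    return all(ch in it for ch in sub)


def lcs_brute_force(text1: str, text2: str) -> int:
    """Try subsequences of `text1`, longest first (hypothetical helper).

    There are up to 2^L subsequences of a string of length L, so this is
    only usable on very short inputs.
    """
    for length in range(len(text1), 0, -1):
        for indices in combinations(range(len(text1)), length):
            candidate = "".join(text1[i] for i in indices)
            if is_subsequence(candidate, text2):
                return length
    return 0
```

Trying the longest candidates first lets the search stop at the first hit, but the worst case (no common letters) still visits every non-empty subsequence of `text1`.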
@@ -60,44 +59,98 @@ class Solution:
6059
subproblems, the smaller ones that they depend on will already have been
6160
solved. The best way to do this is to use a 2D array.
6261
63-
Remembering back to the memoization solution, there were two cases.
62+
There are two cases when considering the optimal solution of the
63+
subproblem:
6464
6565
1. The first letters of both strings are the same.
66-
2. The first letter of both strings are *not* the same.
66+
2. The first letters of both strings are not the same.
6767
"""
6868

6969
def longestCommonSubsequence(self, text1: str, text2: str) -> int:
70+
"""
71+
Given two strings, `text1` and `text2`, compute the length of their
72+
Longest Common Subsequence (LCS).
73+
"""
74+
# Create an m×n matrix initialized to 0s, where m is the number of rows
75+
# (|text1| + 1) and n is the number of columns (|text2| + 1).
76+
#
77+
# dp[0...m, 0] and dp[0, 0...n] are set to 0. This represents our base
78+
# case:
79+
#
80+
# If either sequence is empty, the LCS length is 0.
81+
m, n = len(text1) + 1, len(text2) + 1
82+
dp: list[list[int]] = [[0 for j in range(n)] for i in range(m)]
83+
84+
# Fill the remaining matrix for all remaining prefixes. For each
85+
# position dp[i][j], we calculate:
86+
#
87+
# If text1[i-1] == text2[j-1], dp[i][j] = 1 + dp[i-1][j-1]
88+
#
89+
# This means we include the current matching character and add 1 to the
90+
# previous LCS length.
91+
#
92+
# Else, dp[i][j] = max(dp[i-1][j], dp[i][j-1])
93+
#
94+
# This means we take the maximum LCS length when excluding either the
95+
# current character from sequence `text1` or sequence `text2`.
96+
for i in range(1, m):
97+
for j in range(1, n):
98+
if text1[i - 1] == text2[j - 1]:
99+
dp[i][j] = 1 + dp[i - 1][j - 1]
100+
else:
101+
dp[i][j] = max(
102+
dp[i - 1][j], # Exclude the character at position i in `text1`
103+
dp[i][j - 1], # Exclude the character at position j in `text2`
104+
)
105+
# NOTE: dp[m - 1][n - 1] holds the LCS length of the full strings (0 if either is empty).
106+
return dp[m - 1][n - 1]
107+
108+
109+
class AlternativeSolution:
110+
"""
111+
Typically, the length of the Longest Common Subsequence (LCS) is given by
112+
the last cell, dp[m - 1][n - 1]. However, we can also solve the problem in reverse.
113+
This results in the solution being located at dp[0][0]. Though slightly
114+
less intuitive, this allows us to use the same indices for the string and
115+
matrix.
116+
"""
117+
118+
def longestCommonSubsequence(self, text1: str, text2: str) -> int:
119+
"""
120+
Given two strings, `text1` and `text2`, compute the length of their
121+
Longest Common Subsequence (LCS) using a reverse iteration approach.
122+
"""
70123
# Initializing the table to 0 allows us to calculate the current
71124
# subproblem from previous subproblems.
72125
#
73-
# a b c d e - i →
74-
# a 0 0 0 0 0 0 j
75-
# c 0 0 0 0 0 0 ↓
76-
# e 0 0 0 0 0 0
77-
# - 0 0 0 0 0 0
78-
#
79-
# a b c d e - i →
80-
# a 3 2 2 1 1 0 j
81-
# c 2 2 2 1 1 0 ↓
82-
# e 1 1 1 1 1 0
83-
# - 0 0 0 0 0 0
126+
# a c e - j →
127+
# a 0 0 0 0 i
128+
# b 0 0 0 0 ↓
129+
# c 0 0 0 0
130+
# d 0 0 0 0
131+
# e 0 0 0 0
132+
# - 0 0 0 0
84133
#
85-
# where a,a is (0,0) and e,e is (5,3) (for i,j).
86-
col, row = len(text1) + 1, len(text2) + 1
87-
dp: list[list[int]] = [[0 for _ in range(col)] for _ in range(row)]
134+
# a c e - j →
135+
# a 3 2 1 0 i
136+
# b 2 2 1 0 ↓
137+
# c 2 2 1 0
138+
# d 1 1 1 0
139+
# e 1 1 1 0
140+
# - 0 0 0 0
141+
m, n = len(text1) + 1, len(text2) + 1
142+
dp: list[list[int]] = [[0 for j in range(n)] for i in range(m)]
88143

89144
# Iterate over the table in reverse (rows bottom to top; within each row, columns right to left).
90-
for i in reversed(range(len(text2))):
91-
for j in reversed(range(len(text1))):
145+
for i in reversed(range(len(text1))):
146+
for j in reversed(range(len(text2))):
92147
# 1. The first letters of both strings are the same.
93-
if text1[j] == text2[i]:
148+
if text1[i] == text2[j]:
94149
dp[i][j] = 1 + dp[i + 1][j + 1]
95150
# 2. The first letters of both strings are *not* the same.
96151
else:
97152
dp[i][j] = max(dp[i][j + 1], dp[i + 1][j])
98-
# NOTE: Uncomment to print the result of the table.
99-
# for r in dp:
100-
# print(r)
153+
101154
return dp[0][0]
102155

103156

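Reviewer note: the two tabulation orders in this hunk can be compared side by side with a small standalone sketch. The function names `forward_table` and `reverse_table` are hypothetical; the recurrences are the ones shown in `Solution` and `AlternativeSolution`, but the functions return the whole table so it can be inspected.

```python
def forward_table(text1: str, text2: str) -> list[list[int]]:
    """dp[i][j] = LCS length of the prefixes text1[:i] and text2[:j]."""
    m, n = len(text1) + 1, len(text2) + 1
    dp = [[0] * n for _ in range(m)]
    for i in range(1, m):
        for j in range(1, n):
            if text1[i - 1] == text2[j - 1]:
                dp[i][j] = 1 + dp[i - 1][j - 1]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp


def reverse_table(text1: str, text2: str) -> list[list[int]]:
    """dp[i][j] = LCS length of the suffixes text1[i:] and text2[j:]."""
    m, n = len(text1) + 1, len(text2) + 1
    dp = [[0] * n for _ in range(m)]
    for i in reversed(range(len(text1))):
        for j in reversed(range(len(text2))):
            if text1[i] == text2[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i][j + 1], dp[i + 1][j])
    return dp


if __name__ == "__main__":
    # Print the reverse table row by row for the example used in the comments.
    for row in reverse_table("abcde", "ace"):
        print(row)
```

Both tables hold the same answer in opposite corners: `forward_table(a, b)[-1][-1]` and `reverse_table(a, b)[0][0]`.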
@@ -110,30 +163,34 @@ class MemoizationSolution:
110163
"""
111164

112165
def longestCommonSubsequence(self, text1: str, text2: str) -> int:
166+
"""
167+
Given two strings, `text1` and `text2`, compute the length of their
168+
Longest Common Subsequence (LCS) using memoization.
169+
"""
113170
# Initializing the memoization table to -1 allows us to determine
114171
# whether or not the value has been calculated.
115172
#
116-
# a b c d e i
117-
# a . . . . . j
118-
# c . . . . . ↓
119-
# e . . . . .
120-
#
121-
# where a,a is (0,0) and e,e is (5,3) (for i,j).
122-
col, row = len(text1), len(text2)
123-
memo: list[list[int]] = [[-1 for _ in range(col)] for _ in range(row)]
173+
# a c e j
174+
# a . . . i
175+
# b . . . ↓
176+
# c . . .
177+
# d . . .
178+
# e . . .
179+
m, n = len(text1), len(text2)
180+
memo: list[list[int]] = [[-1 for j in range(n)] for i in range(m)]
124181

125182
def lcs(s1: str, s2: str, memo: list[list[int]]) -> int:
126-
col, row = len(memo[0]), len(memo)
127-
if s1 == "" or s2 == "":
183+
if not s1 or not s2:
128184
return 0
185+
# Calculate current position in memo table
186+
i, j = len(memo) - len(s1), len(memo[0]) - len(s2)
129187
# Check whether we've already solved the given subproblem.
130-
i, j = row - len(s2), col - len(s1)
131188
if memo[i][j] != -1:
132189
return memo[i][j]
133190
if s1[0] == s2[0]:
134191
memo[i][j] = 1 + lcs(s1[1:], s2[1:], memo)
135192
else:
136-
memo[i][j] = max(lcs(s1[0:], s2[1:], memo), lcs(s1[1:], s2[0:], memo))
193+
memo[i][j] = max(lcs(s1[1:], s2, memo), lcs(s1, s2[1:], memo))
137194
return memo[i][j]
138195

139196
return lcs(text1, text2, memo)
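
Reviewer note: as a possible alternative to the explicit `memo` table, the same top-down idea can be sketched with `functools.lru_cache` and index arguments instead of string slices. This swaps in a different caching mechanism from the one in this commit; `lcs_memo` is a hypothetical name.

```python
from functools import lru_cache


def lcs_memo(text1: str, text2: str) -> int:
    """Top-down LCS length, cached on the (i, j) suffix start positions."""

    @lru_cache(maxsize=None)
    def lcs(i: int, j: int) -> int:
        # Base case: one of the suffixes is empty.
        if i == len(text1) or j == len(text2):
            return 0
        # Case 1: the first letters of both suffixes match.
        if text1[i] == text2[j]:
            return 1 + lcs(i + 1, j + 1)
        # Case 2: drop the first letter of one suffix or the other.
        return max(lcs(i + 1, j), lcs(i, j + 1))

    return lcs(0, 0)
```

Passing indices avoids creating a new string on every call. The recursion can still go as deep as `len(text1) + len(text2)`, so long inputs may hit Python's default recursion limit.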
@@ -167,18 +224,23 @@ class RecursiveSolution:
167224
168225
Finally, we formalize the above cases in code.
169226
170-
This solution is O(M x N), where where `M` is the length of the first
171-
string and `N` is the length of the second string.
227+
This solution is O(2^(M + N)), where `M` is the length of the first string
228+
and `N` is the length of the second string.
172229
173-
NOTE: This solution exceeds the time limit.
230+
NOTE: Though *technically* correct, this solution exceeds the time limit,
231+
since it does not account for overlapping subproblems.
174232
"""
175233

176234
def longestCommonSubsequence(self, text1: str, text2: str) -> int:
235+
"""
236+
Given two strings, `text1` and `text2`, compute the length of their
237+
Longest Common Subsequence (LCS) using pure recursion.
238+
"""
177239
def lcs(s1: str, s2: str) -> int:
178-
if s1 == "" or s2 == "":
240+
if not s1 or not s2:
179241
return 0
180242
if s1[0] == s2[0]:
181243
return 1 + lcs(s1[1:], s2[1:])
182-
return max(lcs(s1[0:], s2[1:]), lcs(s1[1:], s2[0:]))
244+
return max(lcs(s1[1:], s2), lcs(s1, s2[1:]))
183245

184246
return lcs(text1, text2)
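
Reviewer note: the overlapping-subproblems claim in the `RecursiveSolution` docstring can be checked with a small instrumented sketch. It uses indices rather than slices and counts how often the plain recursion is entered versus how many distinct `(i, j)` subproblems exist; `count_calls` is a hypothetical helper, not part of this commit.

```python
def count_calls(text1: str, text2: str) -> tuple[int, int, int]:
    """Return (LCS length, total recursive calls, distinct subproblems)."""
    calls = 0
    seen: set[tuple[int, int]] = set()

    def lcs(i: int, j: int) -> int:
        nonlocal calls
        calls += 1
        seen.add((i, j))
        if i == len(text1) or j == len(text2):
            return 0
        if text1[i] == text2[j]:
            return 1 + lcs(i + 1, j + 1)
        return max(lcs(i + 1, j), lcs(i, j + 1))

    return lcs(0, 0), calls, len(seen)


if __name__ == "__main__":
    # Two strings with no letters in common force the branching case every time.
    length, calls, distinct = count_calls("abcdefgh", "ijklmnop")
    print(length, calls, distinct)
```

The number of distinct subproblems is bounded by `(M + 1) * (N + 1)`, while the call count grows much faster when few letters match. That gap is what memoization and tabulation remove.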

tests/test_longest_common_subsequence.py

Lines changed: 19 additions & 5 deletions
@@ -6,11 +6,7 @@
66

77
from unittest import TestCase
88

9-
from src.longest_common_subsequence import (
10-
MemoizationSolution,
11-
RecursiveSolution,
12-
Solution,
13-
)
9+
from src.longest_common_subsequence import AlternativeSolution, MemoizationSolution, RecursiveSolution, Solution
1410

1511

1612
class TestSolution(TestCase):
@@ -31,6 +27,24 @@ def test_4(self):
3127
assert Solution().longestCommonSubsequence("pmjghexybyrgzczy", "hafcdqbgncrcbihkd") == exp
3228

3329

30+
class TestAlternativeSolution(TestCase):
31+
def test_1(self):
32+
exp = 3
33+
assert AlternativeSolution().longestCommonSubsequence("abcde", "ace") == exp
34+
35+
def test_2(self):
36+
exp = 3
37+
assert AlternativeSolution().longestCommonSubsequence("abc", "abc") == exp
38+
39+
def test_3(self):
40+
exp = 0
41+
assert AlternativeSolution().longestCommonSubsequence("abc", "def") == exp
42+
43+
def test_4(self):
44+
exp = 4
45+
assert AlternativeSolution().longestCommonSubsequence("pmjghexybyrgzczy", "hafcdqbgncrcbihkd") == exp
46+
47+
3448
class TestMemoizationSolution(TestCase):
3549
def test_1(self):
3650
exp = 3
