  NOTES
  * Use dynamic programming (2D) or recursion.

- * A common subsequence is a sequence of letters that appears in both strings.
-   Not every letter in the strings has to be used, but letters cannot be
-   rearranged. In essence, a subsequence of a string 's' is a string we get by
-   deleting some letters in 's'.
-
- * The most obvious approach would be to iterate through each subsequence of
-   the first string and check whether or not it is also a subsequence of the
-   second string. This, however, will require exponential time to run. The
-   number of subsequences in a string is up to 2^L, where L is the length of
-   the string.
-
- * There are a couple of strategies we use to design a tractable
-   (non-exponential) algorithm for an optimization problem:
-
-     1. Identifying a greedy algorithm
-     2. Dynamic programming
-
- * There is no guarantee that either is possible. Additionally, greedy
-   algorithms are strictly less common than dynamic programming algorithms and
-   are often more difficult to identify. However, if a greedy algorithm
-   exists, then it will almost always be better than a dynamic programming
-   one. You should, therefore, at least give some thought to the potential
-   existence of a greedy algorithm before jumping straight into dynamic
-   programming.
-
- * Recall that there are two different techniques we can use to implement a
-   dynamic programming solution; memoization and tabulation.
-
-     * Memoization is where we add caching to a function (that has no side
-       effects). In dynamic programming, it is typically used on recursive
-       functions for a top-down solution that starts with the initial problem
-       and then recursively calls itself to solve smaller problems.
-
-     * Tabulation uses a table to keep track of subproblem results and works
-       in a bottom-up manner: solving the smallest subproblems before the
-       large ones, in an iterative manner. Often, people use the words
-       "tabulation" and "dynamic programming" interchangeably.
+ A common subsequence is a sequence of letters that appears in both strings. Not
+ every letter in the string has to be used, but letters cannot be rearranged. In
+ essence, a subsequence of a string 's' is a string we get by deleting some
+ letters in 's'.
+
+ The most obvious approach would be to iterate through each subsequence of the
+ first string and check whether or not it is also a subsequence of the second
+ string. This, however, will require exponential time to run. The number of
+ subsequences in a string is up to 2^L, where L is the length of the string.
+
+ There are a couple of strategies we can use to design a tractable
+ (non-exponential) algorithm for an optimization problem:
+
+     1. Identifying a greedy algorithm
+     2. Dynamic programming
+
+ There is no guarantee that either is possible. Additionally, greedy algorithms
+ are strictly less common than dynamic programming algorithms and are often more
+ difficult to identify. However, if a greedy algorithm exists, then it will
+ almost always be better than a dynamic programming one. You should, therefore,
+ at least give some thought to the potential existence of a greedy algorithm
+ before jumping straight into dynamic programming.
+
+ Recall that there are two different techniques we can use to implement a
+ dynamic programming solution: tabulation and memoization.
+
+ * Tabulation uses a table to keep track of subproblem results and works in a
+   bottom-up manner: solving the smallest subproblems before the large ones,
+   in an iterative manner. Often, people use the words "tabulation" and
+   "dynamic programming" interchangeably.
+
+ * Memoization is where we add caching to a function (that has no side
+   effects). In dynamic programming, it is typically used on recursive
+   functions for a top-down solution that starts with the initial problem and
+   then recursively calls itself to solve smaller problems. Memoization is
+   useful when a problem has overlapping subproblems.
  """
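
The exponential brute-force approach described in the notes can be sketched roughly as follows. This is an illustration only and not part of the commit: the helper names `is_subsequence` and `brute_force_lcs` are hypothetical, and the sketch assumes plain Python strings as input.

```python
from itertools import combinations


def is_subsequence(candidate: str, text: str) -> bool:
    """Return True if `candidate` can be formed by deleting letters from `text`."""
    remaining = iter(text)
    # `ch in remaining` consumes the iterator, so letters must appear in order.
    return all(ch in remaining for ch in candidate)


def brute_force_lcs(text1: str, text2: str) -> int:
    """Hypothetical helper: try subsequences of `text1`, longest first.

    There are up to 2^L subsequences of a string of length L, so this takes
    exponential time in the worst case.
    """
    for length in range(len(text1), 0, -1):
        for indices in combinations(range(len(text1)), length):
            candidate = "".join(text1[i] for i in indices)
            if is_subsequence(candidate, text2):
                return length
    return 0
```

For example, `brute_force_lcs("abcde", "ace")` finds the common subsequence `ace`, while two strings with no letters in common force the loops to visit every non-empty subsequence of the first string before giving up.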
@@ -60,44 +59,98 @@ class Solution:
      subproblems, the smaller ones that they depend on will already have been
      solved. The best way to do this is to use a 2D array.

-     Remembering back to the memoization solution, there were two cases.
+     There are two cases when considering the optimal solution of the
+     subproblem:

      1. The first letter of both strings are the same.
-     2. The first letter of both strings are *not* the same.
+     2. The first letter of both strings are not the same.
      """

      def longestCommonSubsequence(self, text1: str, text2: str) -> int:
+         """
+         Given two strings, `text1` and `text2`, compute the length of their
+         Longest Common Subsequence (LCS).
+         """
+         # Create an m×n matrix initialized to 0s, where m is the number of rows
+         # (|text1| + 1) and n is the number of columns (|text2| + 1).
+         #
+         # dp[0...m, 0] and dp[0, 0...n] are set to 0. This represents our base
+         # case:
+         #
+         # If either sequence is empty, the LCS length is 0.
+         m, n = len(text1) + 1, len(text2) + 1
+         dp: list[list[int]] = [[0 for j in range(n)] for i in range(m)]
+
+         # Fill the remaining matrix for all remaining prefixes. For each
+         # position dp[i][j], we calculate:
+         #
+         # If text1[i-1] == text2[j-1], dp[i][j] = 1 + dp[i-1][j-1]
+         #
+         # This means we include the current matching character and add 1 to the
+         # previous LCS length.
+         #
+         # Else, dp[i][j] = max(dp[i-1][j], dp[i][j-1])
+         #
+         # This means we take the maximum LCS length when excluding either the
+         # current character from sequence `text1` or sequence `text2`.
+         for i in range(1, m):
+             for j in range(1, n):
+                 if text1[i - 1] == text2[j - 1]:
+                     dp[i][j] = 1 + dp[i - 1][j - 1]
+                 else:
+                     dp[i][j] = max(
+                         dp[i - 1][j],  # Exclude the character at position i in `text1`
+                         dp[i][j - 1],  # Exclude the character at position j in `text2`
+                     )
+         # NOTE: dp[m - 1][n - 1] is equivalent to dp[i][j].
+         return dp[m - 1][n - 1]
+
+
+ class AlternativeSolution:
+     """
+     Typically, the length of the Longest Common Subsequence (LCS) is given by
+     the value of dp[i][j], however, we can also solve the problem in reverse.
+     This results in the solution being located at dp[0][0]. Though slightly
+     less intuitive, this allows us to use the same indices for the string and
+     matrix.
+     """
+
+     def longestCommonSubsequence(self, text1: str, text2: str) -> int:
+         """
+         Given two strings, `text1` and `text2`, compute the length of their
+         Longest Common Subsequence (LCS) using a reverse iteration approach.
+         """
          # Initializing the table to 0 allows us to calculate the current
          # subproblem from previous subproblems.
          #
-         #     a b c d e -   i →
-         #   a 0 0 0 0 0 0   j
-         #   c 0 0 0 0 0 0   ↓
-         #   e 0 0 0 0 0 0
-         #   - 0 0 0 0 0 0
-         #
-         #     a b c d e -   i →
-         #   a 3 2 2 1 1 0   j
-         #   c 2 2 2 1 1 0   ↓
-         #   e 1 1 1 1 1 0
-         #   - 0 0 0 0 0 0
+         #     a c e -   j →
+         #   a 0 0 0 0   i
+         #   b 0 0 0 0   ↓
+         #   c 0 0 0 0
+         #   d 0 0 0 0
+         #   e 0 0 0 0
+         #   - 0 0 0 0
          #
-         # where a,a is (0,0) and e,e is (5,3) (for i,j).
-         col, row = len(text1) + 1, len(text2) + 1
-         dp: list[list[int]] = [[0 for _ in range(col)] for _ in range(row)]
+         #     a c e -   j →
+         #   a 3 2 1 0   i
+         #   b 2 2 1 0   ↓
+         #   c 2 2 1 0
+         #   d 1 1 1 0
+         #   e 1 1 1 0
+         #   - 0 0 0 0
+         m, n = len(text1) + 1, len(text2) + 1
+         dp: list[list[int]] = [[0 for j in range(n)] for i in range(m)]

          # Iterate over the table in reverse (first by column, then by row).
-         for i in reversed(range(len(text2))):
-             for j in reversed(range(len(text1))):
+         for i in reversed(range(len(text1))):
+             for j in reversed(range(len(text2))):
                  # 1. The first letter of both strings are the same.
-                 if text1[j] == text2[i]:
+                 if text1[i] == text2[j]:
                      dp[i][j] = 1 + dp[i + 1][j + 1]
                  # 2. The first letter of both strings are *not* the same.
                  else:
                      dp[i][j] = max(dp[i][j + 1], dp[i + 1][j])
-         # NOTE: Uncomment to print the result of the table.
-         # for r in dp:
-         #     print(r)
+
          return dp[0][0]
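
As a sanity check on the reverse-iteration table shown in the comments, the sketch below rebuilds the full table for the example strings `abcde` and `ace`. The function name `reverse_lcs_table` is an assumption introduced for illustration. It mirrors the loop in `AlternativeSolution` but returns the whole table rather than only `dp[0][0]`.

```python
def reverse_lcs_table(text1: str, text2: str) -> list[list[int]]:
    """Hypothetical helper: build the reverse-iteration LCS table.

    dp[i][j] holds the LCS length of the suffixes text1[i:] and text2[j:].
    The extra last row and column stay 0 (an empty suffix has LCS length 0).
    """
    m, n = len(text1) + 1, len(text2) + 1
    dp = [[0 for _ in range(n)] for _ in range(m)]
    for i in reversed(range(len(text1))):
        for j in reversed(range(len(text2))):
            if text1[i] == text2[j]:
                # Matching first letters: extend the diagonal subproblem.
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                # Otherwise drop the first letter of one string or the other.
                dp[i][j] = max(dp[i][j + 1], dp[i + 1][j])
    return dp


if __name__ == "__main__":
    for row in reverse_lcs_table("abcde", "ace"):
        print(row)
```

With `text1` indexing the rows and `text2` the columns, the first row comes out as `[3, 2, 1, 0]`, matching the `a` row of the filled table in the added comments.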
@@ -110,30 +163,34 @@ class MemoizationSolution:
      """

      def longestCommonSubsequence(self, text1: str, text2: str) -> int:
+         """
+         Given two strings, `text1` and `text2`, compute the length of their
+         Longest Common Subsequence (LCS) using memoization.
+         """
          # Initializing the memoization table to -1 allows us to determine
          # whether or not the value has been calculated.
          #
-         #     a b c d e   i →
-         #   a . . . . .   j
-         #   c . . . . .   ↓
-         #   e . . . . .
-         #
-         # where a,a is (0,0) and e,e is (5,3) (for i,j).
-         col, row = len(text1), len(text2)
-         memo: list[list[int]] = [[-1 for _ in range(col)] for _ in range(row)]
+         #     a c e   j →
+         #   a . . .   i
+         #   b . . .   ↓
+         #   c . . .
+         #   d . . .
+         #   e . . .
+         m, n = len(text1), len(text2)
+         memo: list[list[int]] = [[-1 for j in range(n)] for i in range(m)]

          def lcs(s1: str, s2: str, memo: list[list[int]]) -> int:
-             col, row = len(memo[0]), len(memo)
-             if s1 == "" or s2 == "":
+             if not s1 or not s2:
                  return 0
+             # Calculate current position in memo table
+             i, j = len(memo) - len(s1), len(memo[0]) - len(s2)
              # Check whether we've already solved the given subproblem.
-             i, j = row - len(s2), col - len(s1)
              if memo[i][j] != -1:
                  return memo[i][j]
              if s1[0] == s2[0]:
                  memo[i][j] = 1 + lcs(s1[1:], s2[1:], memo)
              else:
-                 memo[i][j] = max(lcs(s1[0:], s2[1:], memo), lcs(s1[1:], s2[0:], memo))
+                 memo[i][j] = max(lcs(s1[1:], s2, memo), lcs(s1, s2[1:], memo))
              return memo[i][j]

          return lcs(text1, text2, memo)
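
For comparison, the same top-down idea can be sketched with the standard library's `functools.lru_cache` in place of the explicit `memo` table used in the commit. This is a swapped-in technique, not the author's implementation: the function name `lcs_length` is hypothetical, and the sketch assumes the recursion works on index pairs `(i, j)` rather than on string slices.

```python
from functools import lru_cache


def lcs_length(text1: str, text2: str) -> int:
    """Hypothetical helper: top-down LCS length with a cached inner function."""

    @lru_cache(maxsize=None)
    def lcs(i: int, j: int) -> int:
        # Base case: one of the remaining suffixes is empty.
        if i == len(text1) or j == len(text2):
            return 0
        # Case 1: the first letters of both suffixes match.
        if text1[i] == text2[j]:
            return 1 + lcs(i + 1, j + 1)
        # Case 2: they differ, so skip a letter in one string or the other.
        return max(lcs(i + 1, j), lcs(i, j + 1))

    return lcs(0, 0)
```

Caching on indices avoids creating a new string slice on every call, and each `(i, j)` pair is computed at most once. Very long inputs could still hit Python's recursion limit, which is one reason the tabulated solutions may be preferable in practice.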
@@ -167,18 +224,23 @@ class RecursiveSolution:
      Finally, we formalize the above cases in code.

-     This solution is O(M x N), where where `M` is the length of the first
-     string and `N` is the length of the second string.
+     This solution is O(2^(M + N)), where `M` is the length of the first string
+     and `N` is the length of the second string.

-     NOTE: This solution exceeds the time limit.
+     NOTE: Though *technically* correct, this solution exceeds the time limit,
+     since it does not account for overlapping subproblems.
      """

      def longestCommonSubsequence(self, text1: str, text2: str) -> int:
+         """
+         Given two strings, `text1` and `text2`, compute the length of their
+         Longest Common Subsequence (LCS) using pure recursion.
+         """
          def lcs(s1: str, s2: str) -> int:
-             if s1 == "" or s2 == "":
+             if not s1 or not s2:
                  return 0
              if s1[0] == s2[0]:
                  return 1 + lcs(s1[1:], s2[1:])
-             return max(lcs(s1[0:], s2[1:]), lcs(s1[1:], s2[0:]))
+             return max(lcs(s1[1:], s2), lcs(s1, s2[1:]))

          return lcs(text1, text2)
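
To see why the pure recursion exceeds the time limit, one rough experiment is to count how many calls it makes. The sketch below wraps the same recursion with a call counter. The name `count_recursive_calls` is hypothetical and exists only for this illustration.

```python
def count_recursive_calls(text1: str, text2: str) -> tuple[int, int]:
    """Hypothetical helper: return (LCS length, number of recursive calls)."""
    calls = 0

    def lcs(s1: str, s2: str) -> int:
        nonlocal calls
        calls += 1
        if not s1 or not s2:
            return 0
        if s1[0] == s2[0]:
            return 1 + lcs(s1[1:], s2[1:])
        # No match: both branches are explored, and many of the resulting
        # (s1, s2) pairs are reached again through other paths.
        return max(lcs(s1[1:], s2), lcs(s1, s2[1:]))

    return lcs(text1, text2), calls


if __name__ == "__main__":
    length, calls = count_recursive_calls("aaaaaaaa", "bbbbbbbb")
    print(length, calls)
```

Two strings with no letters in common are the worst case, since every call branches twice. For two such strings of length 8 there are only 9 x 9 = 81 distinct suffix pairs, yet the recursion makes far more calls than that, which is the overlap that memoization or tabulation removes.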