[feature] add md versions of others notebooks, update readme
gitgik committed Feb 25, 2021
1 parent b834a32 commit 7a4cffb
Showing 8 changed files with 662 additions and 7 deletions.
14 changes: 7 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
@@ -161,10 +161,10 @@ A catalogue of data structures implementation + algorithms and coding problems a

## Others

- [Hit Counter](others/hit_counter.ipynb)
- [Job Scheduler](others/job_scheduler.ipynb)
- [Kaprekar's constant](others/kaprekar's_constant.ipynb)
- [Regular expression matching](others/regex.ipynb)
- [Stable marriage problem](others/stable_marriage_problem.ipynb)
- [Url on the browser](others/url_browser_explanation.ipynb)
- [Word sense disambiguation](others/word_sense_disambiguation.ipynb)
- [Hit Counter](others/hit_counter.md)
- [Job Scheduler](others/job_scheduler.md)
- [Kaprekar's constant](others/kaprekar's_constant.md)
- [Regular expression matching](others/regex.md)
- [Stable marriage problem](others/stable_marriage_problem.md)
- [Url on the browser](others/url_browser_explanation.md)
- [Word sense disambiguation](others/word_sense_disambiguation.md)
105 changes: 105 additions & 0 deletions others/hit_counter.md
@@ -0,0 +1,105 @@
## Hit Counter

Design and implement a HitCounter class that keeps track of requests (or hits). It should support the following operations:

- record(timestamp): records a hit that happened at timestamp
- total(): returns the total number of hits recorded
- range(lower, upper): returns the number of hits that occurred between timestamps lower and upper (inclusive)

What if our system has limited memory?


```python
class HitCounter:
    def __init__(self):
        self.hits = []  # every recorded timestamp, in arrival order

    def record(self, timestamp):
        self.hits.append(timestamp)

    def total(self):
        return len(self.hits)

    def range(self, lower, upper):
        # linear scan: count hits with lower <= timestamp <= upper
        count = 0
        for hit in self.hits:
            if lower <= hit <= upper:
                count += 1
        return count
```

Here record and total are constant time operations. Range takes O(N) time.

One tradeoff we can make is to keep the hits in a sorted list or BST. This allows the range operation to take O(log N) time. We can use Python's [bisect](https://docs.python.org/3/library/bisect.html) to maintain sortedness. The cost is that record is no longer constant time: inserting into the middle of a Python list shifts the later elements, which is O(N) in the worst case.



```python
import bisect


class HitCounter:
    def __init__(self):
        self.hits = []  # timestamps, kept in sorted order

    def record(self, timestamp):
        bisect.insort_left(self.hits, timestamp)

    def total(self):
        return len(self.hits)

    def range(self, lower, upper):
        # index of the first hit >= lower
        low = bisect.bisect_left(self.hits, lower)
        # index just past the last hit <= upper
        high = bisect.bisect_right(self.hits, upper)
        return high - low
```
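To see why the difference of the two bisect indices gives an inclusive count, here is a small sketch on a hand-made sorted list. The sample timestamps are made up for illustration.

```python
import bisect

# hypothetical sorted timestamps, including a duplicate
hits = [3, 5, 5, 8, 13]

# first index whose value is >= 5
low = bisect.bisect_left(hits, 5)
# index just past the last value <= 8
high = bisect.bisect_right(hits, 8)

# hits[low:high] is exactly the hits with 5 <= timestamp <= 8
count = high - low
print(low, high, count)
```

Using `bisect_left` for the lower bound and `bisect_right` for the upper bound is what makes both ends inclusive, even when a boundary timestamp appears more than once.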

While this is time efficient, it'll still take a lot of space because we are still saving each timestamp into the list.

We can sacrifice accuracy for memory by grouping timestamps into minutes or hours. We'll lose accuracy around the borders of each group but use up to a constant factor less space.

For our solution, we'll keep track of each group in a tuple, where the first item is a timestamp (in minutes) and the second item is the number of hits occurring within that minute. We'll keep the list of tuples sorted by minute so that record can find the right group with a binary search in O(log N) time.

```
tuple = (minute, hits_within_this_minute)
```


```python
import bisect
from math import floor

class HitCounter:
    def __init__(self):
        self.hits = []  # sorted list of (minute, hits_within_this_minute)
        self.counter = 0

    def record(self, timestamp):
        self.counter += 1

        minute = floor(timestamp / 60)

        # (minute,) sorts before (minute, count) for any count, so this finds
        # the tuple for this minute, or the position where it should go
        idx = bisect.bisect_left(self.hits, (minute,))

        if idx < len(self.hits) and self.hits[idx][0] == minute:
            self.hits[idx] = (minute, self.hits[idx][1] + 1)
        else:
            self.hits.insert(idx, (minute, 1))

    def total(self):
        return self.counter

    def range(self, lower, higher):
        lo = floor(lower / 60)
        hi = floor(higher / 60)
        lo_idx = bisect.bisect_left(self.hits, (lo,))
        # index of the first tuple whose minute is past hi
        hi_idx = bisect.bisect_left(self.hits, (hi + 1,))

        # sum the counts of each tuple whose minute lies in [lo, hi]
        return sum(count for _, count in self.hits[lo_idx:hi_idx])
```
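The loss of accuracy around group borders can be made concrete with a small sketch. It compares an exact count with a per-minute count for the same query; the timestamps and the helper name `approx_range` are made up for illustration.

```python
from collections import Counter

# hypothetical hit timestamps, in seconds
timestamps = [5, 59, 61, 119, 125]

# per-minute buckets: minute -> number of hits in that minute
per_minute = Counter(t // 60 for t in timestamps)

def approx_range(lower, upper):
    """Count hits in every minute bucket touched by [lower, upper]."""
    lo, hi = lower // 60, upper // 60
    return sum(count for minute, count in per_minute.items() if lo <= minute <= hi)

# exact answer for the query [60, 70]: only the hit at 61 qualifies
exact = sum(1 for t in timestamps if 60 <= t <= 70)
# bucketed answer: the whole of minute 1 is counted, including the hit at 119
approx = approx_range(60, 70)
print(exact, approx)
```

The bucketed count can only overestimate here, because a query that touches part of a minute is charged for all hits in that minute.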


70 changes: 70 additions & 0 deletions others/job_scheduler.md
@@ -0,0 +1,70 @@
## Job Scheduler

Implement a job scheduler that takes in a function f and an integer N, and calls the function after N milliseconds.


### 1st Approach
There are many ways to do this. The most straightforward solution is to spin off a new thread for each function we want to delay, sleep for N milliseconds in that thread, then run the function.



```python
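import threading
from time import sleep

class Scheduler:
    def __init__(self):
        pass

    def delay(self, func, n):
        # the thread calls its target with no arguments, so n comes from the closure
        def sleep_then_call():
            sleep(n / 1000)  # n is in milliseconds, sleep() takes seconds
            func()

        t = threading.Thread(target=sleep_then_call)
        t.start()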
```
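Python's standard library packages the same thread-per-job pattern as `threading.Timer`. The sketch below wraps it in a hypothetical `delay` helper (the helper name and the sample job are assumptions for illustration); it has the same one-thread-per-call cost as the class above.

```python
import threading

def delay(func, n, *args):
    """Call func(*args) after roughly n milliseconds on its own Timer thread."""
    timer = threading.Timer(n / 1000, func, args=args)  # Timer takes seconds
    timer.start()
    return timer

results = []
timer = delay(results.append, 20, "done")
timer.join()  # Timer is a Thread subclass, so we can wait for it to finish
print(results)
```

Returning the timer also lets the caller cancel a job that has not fired yet with `timer.cancel()`.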

### 2nd Approach
While this works, there's a huge problem with our logic: we spin off a new thread each time we call delay! The number of threads will easily grow as we have more functions to schedule.

We can solve this by having one dedicated thread to call functions, and storing functions we need to call in some data structure, say a list.

Then do polling to check when to run a function. We can store each function along with a unix epoch timestamp that tells when it should run.

After checking the list for any jobs that are due to run, we run them and remove them from the list.


```python
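import threading
from time import sleep, time

class Scheduler:
    def __init__(self):
        self.functions = []  # saves tuples of (function, time-to-run-it in ms)
        self.lock = threading.Lock()  # guards self.functions across threads
        t = threading.Thread(target=self.poll)
        t.start()

    def poll(self):
        while True:
            now = time() * 1000.  # change from sec to ms
            with self.lock:
                # split the jobs so that one due exactly now is run, not dropped
                due_now = [function for function, due in self.functions if due <= now]
                self.functions = [(function, due) for (function, due) in self.functions if due > now]
            # run outside the lock so delay() is never blocked by a slow job
            for function in due_now:
                function()
            sleep(0.01)

    def delay(self, function, n):
        with self.lock:
            self.functions.append((function, time() * 1000 + n))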
```

You can go further by:
- Extending the scheduler to accept functions with arguments
- Using a heap instead of a list, so the next job to run is found more efficiently
- Replacing polling with a way to wait until a function is due, say a condition variable
- Using a thread pool to run jobs on more than one thread without the chance of starvation (when one job is not able to run because another is still running)
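
Several of these extensions can be combined in one sketch. This is a minimal illustration, not a definitive implementation: the class name `HeapScheduler` and its internals are assumptions, it uses `heapq` for the job queue and `threading.Condition` in place of polling, and it still runs jobs on a single worker thread.

```python
import heapq
import itertools
import threading
from time import monotonic, sleep

class HeapScheduler:
    def __init__(self):
        self._jobs = []  # min-heap of (due, seq, func, args, kwargs)
        self._seq = itertools.count()  # tie-breaker so functions are never compared
        self._cond = threading.Condition()
        # daemon thread (an assumption) so the worker does not keep the program alive
        threading.Thread(target=self._run, daemon=True).start()

    def delay(self, func, n, *args, **kwargs):
        due = monotonic() + n / 1000  # n is in milliseconds
        with self._cond:
            heapq.heappush(self._jobs, (due, next(self._seq), func, args, kwargs))
            self._cond.notify()  # wake the worker: the earliest job may have changed

    def _run(self):
        while True:
            with self._cond:
                while True:
                    if not self._jobs:
                        self._cond.wait()  # nothing scheduled: sleep until notified
                        continue
                    remaining = self._jobs[0][0] - monotonic()
                    if remaining <= 0:
                        break  # the earliest job is due
                    self._cond.wait(remaining)  # sleep until due or until notified
                _, _, func, args, kwargs = heapq.heappop(self._jobs)
            func(*args, **kwargs)  # run outside the lock

results = []
scheduler = HeapScheduler()
scheduler.delay(results.append, 50, "a")
scheduler.delay(results.append, 10, "b")
sleep(0.3)  # give the worker time to run both jobs
print(results)
```

The worker sleeps exactly until the earliest job is due, and a newly added job wakes it up so an earlier deadline is never missed. `monotonic()` is used instead of `time()` so that system clock adjustments do not shift the deadlines.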


81 changes: 81 additions & 0 deletions others/kaprekar's_constant.md
@@ -0,0 +1,81 @@
## Kaprekar's constant

The number 6174 is known as Kaprekar's constant, after the mathematician who discovered an associated property: for all four-digit numbers with at least two distinct digits, repeatedly applying a simple procedure eventually results in this value.

The procedure is as follows:

- For a given input x, create two new numbers that consist of the digits in x in ascending and descending order.
- Subtract the smaller number from the larger number.

For example, this algorithm terminates in three steps when starting from 1234:

```js
4321 - 1234 = 3087
8730 - 0378 = 8352
8532 - 2358 = 6174
```
Write a function that returns how many steps this will take for a given input N.

## Solution
To solve this imperatively, we can implement a while loop that continually runs the procedure described above until obtaining the number 6174.

For each iteration of the loop we will increment a counter for the number of steps, and return this value at the end.

We also use a helper function that prepends zeros if necessary so that the number always remains four digits long, before creating the ascending and descending integers.


```python
def get_digits(n):
    # left-pad with zeros so the number always remains four digits long
    return str(n).zfill(4)

def count_steps(n):
    # assumes n has at least two distinct digits; otherwise the loop never ends
    count = 0
    while n != 6174:
        n = int(''.join(sorted(get_digits(n), reverse=True))) - int(''.join(sorted(get_digits(n))))
        count += 1
    return count
```


```python
count_steps(12)
```




3




```python
### Recursive solution
def count_steps(n, steps=0):
    if n == 6174:
        return steps
    num = int(''.join(sorted(get_digits(n), reverse=True))) - int(''.join(sorted(get_digits(n))))

    return count_steps(num, steps + 1)
```


```python
count_steps(1234)
```




3
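
To check a result against the worked example, it can help to record the intermediate values rather than only the step count. The sketch below does this; the function name `kaprekar_trace` and the up-front check for repeated digits are assumptions added for illustration.

```python
def kaprekar_trace(n):
    """Return the values produced by each step on the way to 6174."""
    if len(set(str(n).zfill(4))) < 2:
        # e.g. 1111 collapses to 0 and would never reach 6174
        raise ValueError("need at least two distinct digits")
    values = []
    while n != 6174:
        digits = sorted(str(n).zfill(4))  # ascending order, zero-padded
        n = int("".join(reversed(digits))) - int("".join(digits))
        values.append(n)
    return values

trace = kaprekar_trace(1234)
print(trace)  # [3087, 8352, 6174]
```

The length of the returned list is the step count, so `len(kaprekar_trace(1234))` agrees with `count_steps(1234)`.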




74 changes: 74 additions & 0 deletions others/regex.md
@@ -0,0 +1,74 @@
## Implement Regular Expression Matching

Implement regular expression matching with the following special characters:

- `.` (period) which matches any single character
- `*` (asterisk) which matches zero or more of the preceding element

That is, implement a function that takes in a string and a valid regular expression and returns whether or not the string matches the regular expression.

For example, given the regular expression "ra." and the string "ray", your function should return true. The same regular expression on the string "raymond" should return false.

Given the regular expression ".*at" and the string "chat", your function should return true. The same regular expression on the string "chats" should return false.


```python
### Approach

# Base case: if r == '', return s == ''.
# Otherwise, if the second character of r is not an asterisk (*), match the
# first characters of r and s. If they match, return matches(s[1:], r[1:]).
# If they don't, return False.
# If the second character of r is an asterisk, first try matching zero
# occurrences of the starred element, then try consuming one more matching
# character of s at a time.

# helper function that checks whether the first characters match
def matches_first_char(s, r):
    return len(s) > 0 and (s[0] == r[0] or r[0] == ".")

def matches(s, r):
    # base case
    if r == "":
        return s == ""

    # The first char in the regex r is not followed by a *
    if len(r) == 1 or r[1] != "*":
        if matches_first_char(s, r):
            return matches(s[1:], r[1:])
        else:
            return False

    # The first char in r is followed by a *
    if matches(s, r[2:]):
        # Try zero length
        return True

    # If it doesn't match straight away, try globbing until
    # the first character of the string doesn't match anymore.
    i = 0
    while matches_first_char(s[i:], r):
        if matches(s[i+1:], r[2:]):
            return True
        i += 1
    return False
```


```python
r = "tx."
s = "txt"
matches(s, r)
```




True



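As written, this backtracking approach can take exponential time in the worst case, since the same pairs of suffixes of s and r may be re-examined many times. Caching the result for each pair of suffixes brings this down to **O(len(s) * len(r))** time and space.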
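
One way to keep the work bounded by the number of (position in s, position in r) pairs is to memoize on indices instead of slicing. The following is a minimal sketch under that assumption; the name `matches_memo` is hypothetical and it relies on `functools.lru_cache` for the cache.

```python
from functools import lru_cache

def matches_memo(s, r):
    @lru_cache(maxsize=None)
    def go(i, j):
        # does s[i:] match r[j:]?
        if j == len(r):
            return i == len(s)
        first = i < len(s) and r[j] in (s[i], ".")
        if j + 1 < len(r) and r[j + 1] == "*":
            # either skip the starred element, or consume one char and stay on it
            return go(i, j + 2) or (first and go(i + 1, j))
        return first and go(i + 1, j + 1)

    return go(0, 0)

print(matches_memo("ray", "ra."), matches_memo("raymond", "ra."))
print(matches_memo("chat", ".*at"), matches_memo("chats", ".*at"))
```

Each `(i, j)` pair is computed at most once and does constant work besides its recursive calls, which is where the O(len(s) * len(r)) bound comes from. Very long inputs could still hit Python's recursion limit, so an iterative table would be the safer choice there.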

Fun fact: Stephen Kleene introduced the * operator in regular expressions and as such, it is sometimes referred to as the Kleene star.

