Implement eager, streaming punct fixer #21
As of this commit, users can import PunctFixStreamer, which accepts unfinished segments as input and returns partial results that can be trusted to correspond to a subset of the final result.
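To illustrate what "partial results that are a subset of the final result" means, here is a minimal sketch. It does not use or reproduce PunctFixStreamer: `ToyStreamer`, its `feed`, `partial` and `finalize` methods, the `lookahead` parameter and the comma rule are all hypothetical, and a fixed lookahead is swapped in where the real class relies on model predictions. The input segments are the ones from the test in this PR.

```python
from typing import List, Optional


class ToyStreamer:
    """Hypothetical stand-in; NOT the PunctFixStreamer implementation."""

    def __init__(self, lookahead: int = 2) -> None:
        if lookahead < 1:
            raise ValueError("lookahead must be at least 1")
        self.lookahead = lookahead
        self.buffer: List[str] = []
        self.n_final = 0  # number of leading words whose output is settled

    def _render(self, i: int, is_last: bool) -> str:
        word = self.buffer[i]
        if i == 0:
            word = word.capitalize()
        if is_last:
            return word + "."
        # Toy rule that needs the NEXT word, which is why output must wait.
        if self.buffer[i + 1] == "for":
            return word + ","
        return word

    def partial(self) -> str:
        return " ".join(self._render(i, False) for i in range(self.n_final))

    def feed(self, segment: str) -> Optional[str]:
        """Add an unfinished segment; return the settled text if it grew."""
        self.buffer.extend(segment.split())
        settled = max(self.n_final, len(self.buffer) - self.lookahead)
        if settled == self.n_final:
            return None
        self.n_final = settled
        return self.partial()

    def finalize(self) -> str:
        """No more input is coming, so every word can be settled."""
        n = len(self.buffer)
        self.n_final = n
        return " ".join(self._render(i, i == n - 1) for i in range(n))


streamer = ToyStreamer(lookahead=2)
partials = []
for segment in ("en dag bliver vi sku glade", "for", "at vi nu kan",
                "sætte punktummer"):
    out = streamer.feed(segment)
    if out is not None:
        partials.append(out)
final = streamer.finalize()

# Every partial result is a prefix of the final result.
assert all(final.startswith(p) for p in partials)
```

The point of the sketch is only the contract: text returned early is never revised later, so a caller can display or store it immediately.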
a4ee05f to 18f01ac
LGTM!
Very few comments. I have not suggested changes, as I didn't want to block the merge.
The code is very well documented, which helped a lot with the streaming part.
All tests pass, so I see no issues 👍
For the future: upgrade Python and switch to pytest.
and the partial, finalized text if there has been updates to it.
"""
self.buffer.extend(
    self.punct_fixer.init_word_prediction_list(
How large can this buffer get, memory-wise?
At most as long as one entire text (plus some storage for labels), so it does not use any more memory than the normal punct fixer; it just holds that memory for longer.
"""
Reset internal state.
"""
self.buffer = []
Does this clear all memory from the buffer? Just curious.
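As a general Python note on the question above, and not a statement about the rest of this PR's code: assigning `self.buffer = []` rebinds the attribute to a new empty list. The old list is freed once nothing else refers to it, but it survives if another reference exists. The sketch below shows this with a hypothetical `ToyBuffer` class; only the reset pattern is taken from the quoted diff.

```python
from typing import List


class ToyBuffer:
    """Hypothetical stand-in for the streamer's state."""

    def __init__(self) -> None:
        self.buffer: List[str] = []

    def feed(self, segment: str) -> None:
        self.buffer.extend(segment.split())

    def reset(self) -> None:
        # Same pattern as in the quoted diff: rebind to a fresh list.
        self.buffer = []


state = ToyBuffer()
state.feed("en dag bliver vi")
alias = state.buffer  # a second reference to the same list object
state.reset()

# The attribute now points at a new, empty list ...
assert state.buffer == []
# ... while the old list lives on for as long as `alias` refers to it.
assert alias == ["en", "dag", "bliver", "vi"]
```

If nothing outside the object holds a reference to the old list, rebinding is enough for the memory to be reclaimed; `self.buffer.clear()` would instead empty the list in place for every holder of a reference.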
@@ -206,22 +207,22 @@ def test_do_normalize(self):
for model_input in ("hejsa, mand", " hejsa mand", "hejsa mand",
"Hejsa mand", "hejsa mand", " hejsa mand", " hejsa, Mand",
"hejsa % mand ! % "):
- actual_output = self.model._split_input_text(model_input)
+ actual_output = self.model.split_input_text(model_input)
self.assertEqual(actual_output, expected_output)
Switch to pytest instead of unittest? 😊
Yes, I agree. I have opened an issue for that: #23
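For reference, a rough sketch of how the quoted normalization test could look in pytest style, where a plain function with bare `assert` statements replaces the `unittest.TestCase` method. Two things here are assumptions: the `split_input_text` function below is a hypothetical stand-in for the model method, and `["hejsa", "mand"]` is a guess at `expected_output`, which the quoted diff does not show.

```python
import re
from typing import List


def split_input_text(text: str) -> List[str]:
    """Hypothetical stand-in: lowercase, drop punctuation, split into words."""
    return re.findall(r"\w+", text.lower())


def test_split_input_text_normalizes() -> None:
    # Assumed expected value; the real one is defined outside the quoted hunk.
    expected_output = ["hejsa", "mand"]
    for model_input in ("hejsa, mand", " hejsa mand", "Hejsa mand",
                        " hejsa, Mand", "hejsa % mand ! % "):
        # pytest reports both sides of a failing bare assert.
        assert split_input_text(model_input) == expected_output
```

pytest collects any function named `test_*`, so no base class or `self.assertEqual` is needed.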
self.assertEqual(actual_output, expected_output)

def test_sample02(self):
model_inputs = "en dag bliver vi sku glade", "for", "at vi nu kan", "sætte punktummer ",\
sgu 🙃
Grammar Babba 🙌😀
I almost find it cute that we still have @Rasmusafj's old, funny texts here :P