Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement eager, streaming punct fixer #21

Merged
merged 1 commit into from
Dec 8, 2023
Merged

Conversation

sorenmulli
Copy link
Member

As of this commit, users can import the PunctFixStreamer which allows for inputting unfinished segments and getting partial results which can be trusted as corresponding to a subset of the final result

As of this commit, users can import the PunctFixStreamer which allows
for inputting unfinished segments and getting partial results which can
be trusted as corresponding to a subset of the final result
@sorenmulli sorenmulli marked this pull request as ready for review December 6, 2023 14:10
@sorenmulli sorenmulli requested a review from kshll6 December 6, 2023 14:10
Copy link
Contributor

@kshll6 kshll6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTB!
Very few comments, I have not suggested changes, as I didn't want to block a merge.

The code is very well documented, which helped a lot on the streaming part.

I see all tests are cleared, so I see no issues 👍

For future: Upgrade Python and change to pytest

and the partial, finalized text if there has been updates to it.
"""
self.buffer.extend(
self.punct_fixer.init_word_prediction_list(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How large can this buffer get? - just memory wise

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as one entire text, (+ some storage for labels) so it would not use any more memory than normal punctfixer, it just keeps the memory for longer

"""
Reset internal state.
"""
self.buffer = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this clear all memory from the buffer? just curious

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! Trust the collector!
billede

@@ -206,22 +207,22 @@ def test_do_normalize(self):
for model_input in ("hejsa, mand", " hejsa mand", "hejsa mand",
"Hejsa mand", "hejsa mand", " hejsa mand", " hejsa, Mand",
"hejsa % mand ! % "):
actual_output = self.model._split_input_text(model_input)
actual_output = self.model.split_input_text(model_input)
self.assertEqual(actual_output, expected_output)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switch to pytest instead of unittest? 😊

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree - I have made an issue for that #23

self.assertEqual(actual_output, expected_output)

def test_sample02(self):
model_inputs = "en dag bliver vi sku glade", "for", "at vi nu kan", "sætte punktummer ",\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgu 🙃

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammatik Babba 🙌😀

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost find it cute that we still have @Rasmusafj old, funny texts here :P

@sorenmulli
Copy link
Member Author

Thanks! I have made issues for new Python version and for switching to Pytest! :) #23 #22

@sorenmulli sorenmulli merged commit 00546c1 into main Dec 8, 2023
2 checks passed
@sorenmulli sorenmulli deleted the feature/streaming branch December 8, 2023 08:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants