Implement eager, streaming punct fixer #21

sorenmulli · 2023-12-06T14:07:08Z

As of this commit, users can import the PunctFixStreamer, which accepts unfinished segments and returns partial results that are guaranteed to correspond to a subset of the final result.
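To illustrate what "partial results that are a subset of the final result" means, here is a minimal sketch. It assumes a word can be treated as final once a fixed number of further words (lookahead) have arrived after it, and that segments break at word boundaries. ToyStreamer, toy_punctuate and lookahead are hypothetical names, not the PunctFixStreamer API.

```python
from typing import Callable, List, Optional


def toy_punctuate(words: List[str]) -> List[str]:
    """Hypothetical stand-in for the model: capitalise the first word and
    put a full stop after the last one. The last word's output changes
    whenever another word is appended, so it must not be emitted early."""
    out = list(words)
    if out:
        out[0] = out[0].capitalize()
        out[-1] = out[-1] + "."
    return out


class ToyStreamer:
    """Eager streaming sketch: only emit words whose output can no longer change."""

    def __init__(self, punctuate: Callable[[List[str]], List[str]], lookahead: int = 1):
        self.punctuate = punctuate
        self.lookahead = lookahead  # assumed right-context a word needs before it is final
        self.buffer: List[str] = []  # every raw word seen so far
        self.n_final = 0  # number of leading words already emitted as final

    def __call__(self, segment: str) -> Optional[str]:
        """Add a segment; return the finalized prefix if it grew, else None."""
        self.buffer.extend(segment.split())
        n_ready = max(len(self.buffer) - self.lookahead, 0)
        if n_ready <= self.n_final:
            return None  # nothing new became final
        self.n_final = n_ready
        return " ".join(self.punctuate(self.buffer)[: self.n_final])

    def finalize(self) -> str:
        """End of stream: every word is final; return the full text and reset."""
        text = " ".join(self.punctuate(self.buffer))
        self.buffer = []
        self.n_final = 0
        return text


streamer = ToyStreamer(toy_punctuate, lookahead=1)
print(streamer("en dag bliver"))  # En dag
print(streamer("vi"))             # En dag bliver
print(streamer.finalize())        # En dag bliver vi.
```

Every partial output is a prefix of the final text, which is the property the streamer promises. For simplicity the sketch re-runs the stand-in model on the whole buffer on every call.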

kshll6

LGTM!
Very few comments. I have not requested changes, as I didn't want to block the merge.

The code is very well documented, which helped a lot with the streaming part.

All tests pass, so I see no issues 👍

For the future: upgrade Python and switch to pytest.

kshll6 · 2023-12-06T16:43:23Z

punctfix/streaming.py

+        and the partial, finalized text if there has been updates to it.
+        """
+        self.buffer.extend(
+            self.punct_fixer.init_word_prediction_list(


How large can this buffer get, memory-wise?

As large as one entire text (plus some storage for labels), so it does not use any more memory than the normal PunctFixer; it just holds on to that memory for longer.
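A small sketch of why the buffer is bounded by the text length: it holds one entry per word seen so far, each with some room for labels, and only shrinks on reset. WordPrediction and BufferDemo are hypothetical stand-ins; the real buffer is filled via init_word_prediction_list, whose item type is not shown here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WordPrediction:
    # Hypothetical per-word record: the raw word plus storage for its labels.
    word: str
    labels: List[str] = field(default_factory=list)


class BufferDemo:
    def __init__(self) -> None:
        self.buffer: List[WordPrediction] = []

    def feed(self, segment: str) -> None:
        # The buffer only grows: one entry per word of the text so far.
        self.buffer.extend(WordPrediction(w) for w in segment.split())

    def reset(self) -> None:
        self.buffer = []


demo = BufferDemo()
for seg in ("en dag bliver vi sku glade", "for", "at vi nu kan", "sætte punktummer "):
    demo.feed(seg)
print(len(demo.buffer))  # 13
```

Feeding the segments visible in test_sample02 leaves 13 entries, one per word, which matches the answer above: memory scales with one whole text until the buffer is reset.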

kshll6 · 2023-12-06T16:48:20Z

punctfix/streaming.py

+        """
+        Reset internal state.
+        """
+        self.buffer = []


Does this free all the memory held by the buffer? Just curious.

Yep! Trust the garbage collector!
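A quick way to check the "trust the garbage collector" answer: rebinding the attribute to a new empty list drops the last reference to the old list, so its items become collectable. Item and Holder are hypothetical stand-ins, and a weak reference is used to observe an item without keeping it alive.

```python
import gc
import weakref


class Item:
    """Stand-in for whatever objects the streamer keeps in its buffer."""

    def __init__(self, word: str) -> None:
        self.word = word


class Holder:
    def __init__(self) -> None:
        self.buffer = [Item("hejsa"), Item("mand")]

    def reset(self) -> None:
        # Rebinding drops this object's reference to the old list; if nothing
        # else refers to it, the list and its items can be freed.
        self.buffer = []


holder = Holder()
probe = weakref.ref(holder.buffer[0])  # observe an item without keeping it alive
holder.reset()
gc.collect()  # CPython already frees it via refcounting; this covers other runtimes
print(probe() is None)  # True
```

The caveat is that memory is only released if no caller still holds a reference to the old list or its items. Using self.buffer.clear() instead would empty the list in place, which also affects any such aliases.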

kshll6 · 2023-12-06T16:50:57Z

tests/test_punctuation.py

@@ -206,22 +207,22 @@ def test_do_normalize(self):
        for model_input in ("hejsa, mand", " hejsa mand", "hejsa mand",
                "Hejsa mand", "hejsa  mand", "  hejsa mand", "  hejsa, Mand",
                "hejsa % mand ! % "):
-            actual_output = self.model._split_input_text(model_input)
+            actual_output = self.model.split_input_text(model_input)
            self.assertEqual(actual_output, expected_output)


Switch to pytest instead of unittest? 😊

Yes, I agree. I have made an issue for that: #23

kshll6 · 2023-12-06T16:57:28Z

tests/test_punctuation.py

+        self.assertEqual(actual_output, expected_output)
+
+    def test_sample02(self):
+        model_inputs = "en dag bliver vi sku glade", "for", "at vi nu kan", "sætte punktummer ",\


Grammar Babba 🙌😀

I almost find it cute that we still have @Rasmusafj's old, funny texts here :P

sorenmulli · 2023-12-08T08:12:46Z

Thanks! I have made issues for a new Python version and for switching to pytest! :) #23 #22

Implement eager, streaming punct fixer

18f01ac


sorenmulli force-pushed the feature/streaming branch from a4ee05f to 18f01ac on December 6, 2023 14:09

sorenmulli marked this pull request as ready for review December 6, 2023 14:10

sorenmulli requested a review from kshll6 December 6, 2023 14:10

kshll6 reviewed Dec 6, 2023


sorenmulli merged commit 00546c1 into main Dec 8, 2023
2 checks passed

sorenmulli deleted the feature/streaming branch December 8, 2023 08:12
