I occasionally see SRTs in which 1 or 2 captions begin with the Line Separator character, u2028. Those captions get incorrectly parsed as blank.
I believe the character originates in Word and is carried over when a transcript is copy-pasted into YouTube to use YouTube's transcript auto-timing function.
This character seems to act as a normal line break when in the middle or end of a caption; the issue only arises when it is the first character of the caption.
I think the parser should ignore this character.
VLC, for the record, ignores it and displays the caption normally.
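One plausible explanation for the behavior described above, which I have not confirmed against PySRT's source, is that Python's built-in str.splitlines() treats u2028 as a line boundary. A caption that starts with it would then yield an empty first line, just like a caption that starts with \n. A minimal sketch of that effect, using made-up caption text:

```python
# U+2028 (Line Separator) is one of the boundaries recognized by str.splitlines().
LINE_SEPARATOR = "\u2028"

# Hypothetical caption texts, modeled on the example SRT in this issue.
leading = LINE_SEPARATOR + "This caption starts with the character"
middle = "This caption has a separator here:" + LINE_SEPARATOR + "which is fine."

leading_lines = leading.splitlines()
middle_lines = middle.splitlines()

# The leading separator produces an empty first line, which a parser could
# mistake for a blank caption. The mid-caption separator just splits the text.
print(leading_lines)
print(middle_lines)
```

If the parser splits its input this way, the empty first element would explain why only captions that begin with the character are affected.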
Gotchas:
It may make sense to pre-process the file, replacing u2028 with a more compatible line break like \n. We should be careful, though, not to inadvertently trigger the blank line state outlined in Issue 71 by having a caption start with \n.
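The pre-processing idea could be sketched as follows. This is only an illustration: normalize_line_separators is a hypothetical helper, not part of PySRT, and it assumes the file has already been decoded to a string with \n line endings. It drops any u2028 at the start or end of a line, so no caption ends up starting with a line break, and turns the remaining ones into \n.

```python
import re

LINE_SEPARATOR = "\u2028"
# One or more consecutive separators collapse into a single line break,
# so the replacement never creates an empty line inside a caption.
_SEPARATOR_RUN = re.compile(LINE_SEPARATOR + "+")


def normalize_line_separators(srt_text: str) -> str:
    """Hypothetical helper: make U+2028 safe before handing text to a parser."""
    cleaned = []
    for line in srt_text.split("\n"):
        # Drop separators at either edge of the line; replacing them with
        # "\n" would introduce the blank line described in Issue 71.
        line = line.strip(LINE_SEPARATOR)
        # Separators in the middle of a line become ordinary line breaks.
        cleaned.append(_SEPARATOR_RUN.sub("\n", line))
    return "\n".join(cleaned)


sample = "1\n00:00:08,330 --> 00:00:13,653\n\u2028First line\u2028second line\n"
result = normalize_line_separators(sample)
print(repr(result))
```

The cleaned string could then be passed to the parser in place of the raw file contents.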
Example SRT that exhibits this problem:
1
00:00:08,330 --> 00:00:13,653
This caption starts with the character
u2028, which causes PySRT to see it as blank.

2
00:00:13,653 --> 00:00:18,305
This caption has a u2028 here: which does not cause issues.

3
00:00:18,305 --> 00:00:22,906
This caption starts with a normal line break; VLC
and PySRT show it as blank as per Issue 71.
Output:
Caption 1: VLC displays the caption, PySRT parses it as blank
Caption 2: VLC and PySRT display the caption
Caption 3: VLC and PySRT show the caption as blank