Inconsistent results when changing the -B parameter #261

Open
@abolibibelot1980


I found out about xdelta recently and ran some tests with it, on video files in particular, in order to keep a reference to the original file after it has been remuxed: by adding subtitles or an audio track, or simply by remuxing into another container. For files with a straightforward structure (a single header followed by the video / audio streams), it works very well: the generated “diff” file is as small as can be with the default settings.

But I'm having trouble with TS (Transport Stream) files, which have a specific structure: as I understand it, the video / audio data is split into many small chunks, each with its own header, which is meant to prevent playback issues when parts of the stream aren't transmitted properly. When such a file is remuxed to MP4 or MKV, the video / audio streams are extracted from those small chunks and placed contiguously in the new container; the remuxed file is therefore significantly smaller because of the reduced “overhead”, and playback is generally smoother (no lag when seeking to a random timecode).

I have many TS files from television recordings which I converted or want to convert to MP4 or MKV, but I'd like to keep a reference to the original TS file before deleting it, as I know from experience that unexpected issues can show up later on (for instance, a video editor could refuse the remuxed file, or a glitch could cause a video / audio desynchronization in the remuxed file that cannot be fixed without going back to the original TS file). This is where xdelta comes in (it was suggested here for such purposes).

I first used an old version, 3.0t, which I happened to find bundled with a “patch” for an animated movie in MKV. Below I note the beginning of each file name (which is the date of broadcast) and the size of the “diff” file obtained with xdelta. Using the basic command xdelta3 -e -s "remuxed file" "original file" "diff file", I got these results:

201410150005 ... .mp4 [remuxed file] 370,479,951 bytes
201410150005 ... .ts [original file] 393,272,500 bytes
201410150005 ... .ts.diff 22,100,120 bytes

201410142230 ... .mp4 [remuxed file] 492,882,417 bytes
201410142230 ... .ts [original file] 523,175,988 bytes
201410142230 ... .ts.diff 29,532,754 bytes

201410122320 ... .mp4 [remuxed file] 649,361,203 bytes
201410122320 ... .ts [original file] 689,051,020 bytes
201410122320 ... .ts.diff 162,539,264 bytes
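For reference, the invocation used for the three results above can be written as a small Python helper. This is only a sketch: `build_encode_command` is a hypothetical name, and it merely assembles the argument list (`-e` to encode, `-s` for the source, and optionally `-B` for the source window size) without running xdelta3.

```python
from typing import List, Optional


def build_encode_command(
    source: str, target: str, diff: str, window: Optional[int] = None
) -> List[str]:
    """Assemble an xdelta3 encode command line as an argument list.

    source: the remuxed file (MP4/MKV), used as the reference
    target: the original TS file to be reconstructed later
    diff:   the output "diff" file
    window: optional value for -B; None leaves xdelta3's default in place
    """
    cmd = ["xdelta3", "-e"]
    if window is not None:
        cmd += ["-B", str(window)]
    cmd += ["-s", source, target, diff]
    return cmd


# Placeholder file names, for illustration only.
print(" ".join(build_encode_command("remuxed.mp4", "original.ts", "original.ts.diff")))
```

The returned list can be passed to `subprocess.run(..., check=True)` on a machine where xdelta3 is on the PATH.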

For the first two, the size of the “diff” file seemed acceptable (under 6% of the TS file), but for the third one it was obviously too big (about 24% of the TS file); so, based on the advice given in this discussion, I used the -B option, setting it to the size of the source file. With -B 649361203 I got:

201410122320 ... .ts.diff [-B 649361203] 38,242,658 bytes

Much better indeed. Then I processed the first two files likewise, setting the -B parameter to the size of the corresponding TS file, figuring that it would shrink the size of the resulting “diff” file some more.

201410150005 ... .ts.diff [-B 393272500] 34,096,675 bytes

201410142230 ... .ts.diff [-B 523175988] 48,801,030 bytes

But the opposite happened: the “diff” files obtained from those two files were significantly bigger!
Then I tested one of those files with decreasing values of the -B parameter (decreasing powers of 2):

201410142230 ... .ts.diff [xdelta 3.0t -B 536870912] 48,801,030 bytes => same as with the file's size (makes sense, as both values exceed the size of the source file)
201410142230 ... .ts.diff [xdelta 3.0t -B 268435456] 29,687,747 bytes
201410142230 ... .ts.diff [xdelta 3.0t -B 134217728] 29,786,624 bytes
201410142230 ... .ts.diff [xdelta 3.0t -B 67108864] 29,532,754 bytes => default value of “-B”
201410142230 ... .ts.diff [xdelta 3.0t -B 33554432] 257,993,708 bytes

It turns out that for this file, the “sweet spot” is around the default value. Using a very large value (i.e. one that covers the whole source file) yields a bigger “diff” file, and using a value smaller than the default yields a much bigger “diff” file.
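To make that comparison explicit, here is a minimal Python sketch that takes the sizes measured above for 201410142230 with xdelta 3.0t and reports which -B value gave the smallest “diff”, and how much worse the others were. The helper names `best_window` and `penalty` are my own, chosen for illustration.

```python
from typing import Dict

# "diff" sizes in bytes measured with xdelta 3.0t, keyed by the -B value used.
measured: Dict[int, int] = {
    536870912: 48801030,
    268435456: 29687747,
    134217728: 29786624,
    67108864: 29532754,   # default value of -B
    33554432: 257993708,
}


def best_window(sizes: Dict[int, int]) -> int:
    """Return the -B value that produced the smallest diff."""
    return min(sizes, key=sizes.get)


def penalty(sizes: Dict[int, int]) -> Dict[int, float]:
    """Return each diff size as a multiple of the smallest one."""
    smallest = min(sizes.values())
    return {window: size / smallest for window, size in sizes.items()}


best = best_window(measured)
for window, ratio in sorted(penalty(measured).items(), reverse=True):
    marker = " <= smallest" if window == best else ""
    # bit_length() - 1 gives the exponent because every key is a power of 2.
    print(f"-B 2^{window.bit_length() - 1}: {ratio:.2f}x{marker}")
```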

I ran those tests again with xdelta 3.0.11, which seems to be the most recent stable release. Overall performance was much improved (the “diff” files are consistently smaller), yet I saw the same pattern with regard to the relative sizes of the “diff” files obtained with various values of the -B parameter.

201410150005 ... .ts.diff [xdelta 3.0.11 default] 7,533,148 bytes
201410150005 ... .ts.diff [xdelta 3.0.11 -B 393272500] 11,925,966 bytes
201410142230 ... .ts.diff [xdelta 3.0.11 default] 10,116,897 bytes
201410142230 ... .ts.diff [xdelta 3.0.11 -B 523175988] 15,497,312 bytes

Bigger “diff” file with -B set to the size of the TS file for those two.

201410122320 ... .ts.diff [xdelta 3.0.11 default] 156,164,290 bytes
201410122320 ... .ts.diff [xdelta 3.0.11 -B 649361203] 30,213,812 bytes

Much smaller “diff” file with -B set to the size of the source file for that one.

201410142230 ... .ts.diff [xdelta 3.0.11 -B 536870912] 15,497,312 bytes => same as with -B 523175988
201410142230 ... .ts.diff [xdelta 3.0.11 -B 268435456] 10,306,754 bytes
201410142230 ... .ts.diff [xdelta 3.0.11 -B 134217728] 10,375,873 bytes
201410142230 ... .ts.diff [xdelta 3.0.11 -B 67108864] 10,116,897 bytes => default value of “-B”
201410142230 ... .ts.diff [xdelta 3.0.11 -B 33554432] 250,575,115 bytes

Smaller sizes than with the older version, but the same pattern: the smallest size is obtained with the default value, setting -B large enough to cover the whole source file yields a bigger “diff” file, and setting it to a value lower than the default yields a much bigger “diff” file.

How could those results be explained?
And how could I batch process a whole directory with hundreds of TS files, if there's seemingly no way of finding an optimal setting for all of them?
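Failing a single optimal setting, one brute-force workaround would be to encode each file twice, once with the default -B and once with -B set to the size of the remuxed file, and keep whichever “diff” is smaller. The Python sketch below does that under several assumptions of mine: each TS file sits next to a remuxed file with the same base name, `batch_encode` and `run_xdelta3` are hypothetical helper names, and xdelta3 is on the PATH. It costs twice the encoding time and does not explain the results.

```python
import os
import subprocess
from pathlib import Path
from typing import Callable, Dict, Optional, Tuple

Encoder = Callable[[Path, Path, Path, Optional[int]], None]


def run_xdelta3(source: Path, target: Path, diff: Path, window: Optional[int]) -> None:
    """Encode target against source; window=None keeps xdelta3's default -B."""
    cmd = ["xdelta3", "-e", "-f"]  # -f overwrites an existing output file
    if window is not None:
        cmd += ["-B", str(window)]
    cmd += ["-s", str(source), str(target), str(diff)]
    subprocess.run(cmd, check=True)


def batch_encode(
    directory: Path, remux_ext: str = ".mp4", encode: Encoder = run_xdelta3
) -> Dict[str, Tuple[Optional[int], int]]:
    """For each *.ts file with a matching remuxed file, keep the smaller diff.

    Returns {ts file name: (-B value used or None for default, diff size)}.
    """
    results: Dict[str, Tuple[Optional[int], int]] = {}
    for ts in sorted(Path(directory).glob("*.ts")):
        remuxed = ts.with_suffix(remux_ext)
        if not remuxed.exists():
            continue  # nothing to diff against
        diff = ts.with_name(ts.name + ".diff")
        trial = ts.with_name(ts.name + ".diff.trial")
        best: Optional[Tuple[Optional[int], int]] = None
        # Candidates: the default window, then one covering the whole source.
        for window in (None, remuxed.stat().st_size):
            encode(remuxed, ts, trial, window)
            size = trial.stat().st_size
            if best is None or size < best[1]:
                os.replace(trial, diff)  # keep this attempt
                best = (window, size)
            else:
                trial.unlink()  # discard the larger attempt
        results[ts.name] = best
    return results
```

Calling `batch_encode(Path("recordings"))`, where `recordings` is a placeholder directory name, would leave one `.ts.diff` file per recording and return the -B value that was kept for each.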

I could upload the video files used for those tests if necessary, but they're quite big and I have a rather slow upload speed, so I would prefer to get some feedback first.

Thanks.

Gabriel, France
