Selective node splitting

So @gloryknight came up with a very interesting and simple heuristic that apparently works really well on many JSON files: whenever a `,` is encountered (the character is configurable, so it could also be a space or newline), LZ is forced to stop the current substring and start anew (with a leading comma), _except_ when the current substring already starts with a comma. See these two pull requests:

https://github.com/JobLeonard/lz-string/pull/3

https://github.com/JobLeonard/lz-string/pull/4

The reason this is effective is a bit subtle: imagine that we have a string we are scanning through, and the next set of characters will be `abcdefg`. Furthermore, our dictionary already has the substrings `abc`, `abcd` and `defg` (plus the necessary substrings to get to this point), but _not_ `efg`. Obviously, the ideal combination of tokens would be `abc` + `defg`. Instead, because LZ always greedily takes the longest match it can find, we'll get `abcd` + `e` + `f` + `g`. This can happen quite often in LZ.

So how to avoid this? Well, I guess gloryknight's insight was that not all characters are created equal here; they can have special functions. One of those is as *separator characters*. Think of natural language: our words are separated by spaces, so if we split on the space character (and similar separators like newlines, dots, commas) we would converge on identical substrings much quicker.
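To make the `abcdefg` example concrete, here is a minimal sketch of greedy longest-match tokenization against a fixed dictionary. The function name `greedyTokenize` is made up for this illustration, and unlike real LZ the dictionary does not grow while scanning; it only shows why the greedy choice of `abcd` leaves `e`, `f` and `g` stranded.

```javascript
// Illustration only: greedy longest-match over a FIXED dictionary.
// Real LZ builds its dictionary while scanning; this just isolates the
// "longest match wins" behaviour described above.
function greedyTokenize(input, dictionary) {
  const tokens = [];
  let i = 0;
  while (i < input.length) {
    // A single character is always available as a fallback token.
    let best = input[i];
    for (const entry of dictionary) {
      if (entry.length > best.length && input.startsWith(entry, i)) {
        best = entry;
      }
    }
    tokens.push(best);
    i += best.length;
  }
  return tokens;
}

const dictionary = new Set(["abc", "abcd", "defg"]);
const tokens = greedyTokenize("abcdefg", dictionary);
console.log(tokens.join(" + ")); // abcd + e + f + g
```

The ideal split `abc` + `defg` needs two tokens, while the greedy scan needs four, because once `abcd` is consumed there is no dictionary entry starting at `e`.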

Since LZString is most commonly used when compressing JSON, which strips out all unnecessary whitespace, the `,` is the option that seems to improve compression performance (although maybe `{` and `:` also make sense, or maybe all three?). In his tests it gave significant compression benefits at a small perf cost.

The best bit? This is perfectly backwards compatible with previous code: the output can be decompressed by the same function as before.
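Here is a rough sketch of why the decompressor does not need to change. It uses a plain textbook LZW over arrays of codes, not lz-string's actual bit-packed format, and it is not the code from the pull requests; the names `lzwCompress` and `lzwDecompress` and the `separator` parameter are made up for this illustration. It also assumes every input character has a char code below 256. The only change is on the compression side: a separator forces the current substring to end unless that substring itself starts with the separator.

```javascript
// Textbook LZW with an optional separator-splitting rule (hypothetical sketch).
// Assumes all characters of `input` have char codes below 256.
function lzwCompress(input, separator = null) {
  const dict = new Map();
  for (let i = 0; i < 256; i++) dict.set(String.fromCharCode(i), i);
  let nextCode = 256;
  const out = [];
  let w = "";
  for (let i = 0; i < input.length; i++) {
    const c = input[i];
    const wc = w + c;
    // Force a split on the separator, except when the current substring
    // is empty or already starts with the separator.
    const forceSplit =
      separator !== null && c === separator && w !== "" && !w.startsWith(separator);
    if (dict.has(wc) && !forceSplit) {
      w = wc;
    } else {
      out.push(dict.get(w));
      // Always assign a fresh code, even if wc already has one: the decoder
      // adds exactly one entry per code it reads, so both sides stay in sync.
      dict.set(wc, nextCode++);
      w = c; // start anew, possibly with a leading separator
    }
  }
  if (w !== "") out.push(dict.get(w));
  return out;
}

// Unmodified textbook LZW decoder: it knows nothing about separators.
function lzwDecompress(codes) {
  if (codes.length === 0) return "";
  const dict = [];
  for (let i = 0; i < 256; i++) dict.push(String.fromCharCode(i));
  let prev = dict[codes[0]];
  let result = prev;
  for (let i = 1; i < codes.length; i++) {
    const code = codes[i];
    // code === dict.length is the usual "entry not added yet" corner case.
    const entry = code < dict.length ? dict[code] : prev + prev[0];
    dict.push(prev + entry[0]);
    result += entry;
    prev = entry;
  }
  return result;
}

const sample = JSON.stringify([
  { id: 1, name: "alpha", tags: ["x", "y"] },
  { id: 2, name: "beta", tags: ["x", "z"] },
  { id: 3, name: "gamma", tags: ["y", "z"] },
]);
const plain = lzwCompress(sample);
const split = lzwCompress(sample, ",");
console.log(lzwDecompress(plain) === sample, lzwDecompress(split) === sample);
console.log("codes without splitting:", plain.length, "with splitting:", split.length);
```

Both outputs go through the same `lzwDecompress`, because the decoder only ever replays "previous string plus first character of the current one", whatever rule the compressor used to decide where a substring ends. Whether splitting actually yields fewer codes depends on the input; a sample this small says nothing about real-world gains.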

Selective node splitting #120
