Skip to content

Printing containers (lex, list) is now overly verbose #2894

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
JonnyOThan opened this issue Mar 10, 2021 · 7 comments · May be fixed by #2909
Open

Printing containers (lex, list) is now overly verbose #2894

JonnyOThan opened this issue Mar 10, 2021 · 7 comments · May be fixed by #2909

Comments

@JonnyOThan
Copy link
Contributor

As a side effect of the work to make single primitives serializable, printing a list or a lexicon on the terminal now takes twice as much vertical screen space as it used to. This makes it far less likely to see the thing you're looking for and much less usable than before. For example, I would commonly print a module to see all of its fields and events, but now it takes way too much space and can't fit on a screen.

@Dunbaratu
Copy link
Member

Interesting datapoint:
print ship:parts. shows lists the old way - one per line.
Not saying this isn't wrong. I'm just pointing out that when I look at this later, knowing that it is different may help identify what changed, by looking at how that one differs from the other LIST containers.

@Dunbaratu
Copy link
Member

Dunbaratu commented Mar 27, 2021

Right now the ToString() for any EnumerableValue type makes use of SafeSerializationMgr.Serialize() to do the dirty work of remembering the indent levels recursively so things nested in things get printed right. The problem is that this algorithm makes use of an object's Dump if it's available instead of that object's ToString(). So anything that is Serializable gets its ToString() bypassed, instead being printed as if it was getting serialized, one field per line, regardless of what its ToString() says.

For example:

SET seeThroughBlue to RGBA(0,0,1,0.5).
set myList to LIST(seeThroughBlue).
print myList. // prints the zeroth item as:  ["R"] = 0\n    ["G"] = 0\n    ["B"} = 1\n    ["A"] = 0
print myList[0].  // prints "RGBA(0, 0, 1, 0.05)", using RgbaColor.ToString().

Even though the zeroth item in myList should look the same whether printed inside myList's print or directly, it doesn't.

Looking at the code it seems the logic that was used was to piggyback the ToString() for containers on top of the serialization system, but just provide a second formatter for it - one that dumps lines without the JSON syntax, called TerminalFormatter.

This is buried under so many layers of muck that it's a pain to fix. As long as the container ToString() is built on top of the Dump system, that causes it to bypass ToString() and instead favor a field-by-field dumping, which is appropriate for something like JSON but wrong for something like PRINT FOO.

But there is one reason that this was done - and that's that C#'s provided ToString() system never implemented a good way to indent things nested in things. ToString() takes zero args so you can't recurse through a collection of things telling all its items' ToString()s that they need to indent themselves. This is needed if their ToString() is a multiline string. Because let's say you are four spaces indented in, then your ToString() that would normally be

"this\nis my\mstring"

should instead be

"    this\n    is my\n    string"

So to handle that indenting, the system got piggybacked on top of the Dump that was designed for JSON.

So to fix this properly, I think I need to do this, as it will greatly simplify things:

  1. Just implement a new public string Structure.ToStringIndented(int level). Its default behavior will be to just call the object's vanilla ToString() and parse through its output, finding the line breaks so it can indent spaces at the start of each line of the string. This will handle any "leaf node" Structure type which isn't a container of other Structure items but could hypothetically print a formatted multi-line output of itself.

  2. For those Structure types which are containers of other Structure items, override this public string Structure.ToStringIndented(int level) so when it wants to recursively call the inner item's ToStrings it really calls its ToStringIndented(level + 1)'s instead.

  3. Since doing ToString() of a container is really the only place that TerminalFormatter ever got used, just remove it entirely after steps 1 and 2 above are working.

@thexa4
Copy link
Contributor

thexa4 commented Mar 27, 2021

One advantage of using the dumper is that it allows tracking cycles (#2882). If we go back to using ToString the PR will not be able to prevent crashes when (accidentally) printing recursive stuctures.

@Dunbaratu
Copy link
Member

That's a valid point, but the Dumper is so awful looking that I'd rather implement a separate check for printing than for serializing than continue using the Dumper for terminal output. The real problem I found with the Dumper is that the heuristic rule "should I use ToString() or not here?" isn't as simple as "yes if container, no if not".

For example, technically a "range" is an enumerated type (just one where when you iterate over it it's manufacturing one value at a time on the fly rather than actually having all of them stored, but it iterates like it was a collection of numbers). If I have the rule "use this one rule for printing all enumerable things in a nice way" and it hits a Range(), then it's going to go, "okay, I'll print all the values then..." unless I make an exception for that one type. It's the need to make exceptions for how different types print that makes it better to have them inherit much like ToString does, so they can each define how they print. (For example, a queue shouldn't really number the items in the list since you can't get at the values out of order. It should really name the zeroth item "front" and the N-1th item "back", and not print indexes for them like it does now. (i.e. you can't really access element 4 of a queue by doing myqueue[4] like the dump format we have now implies.)

But you do bring up a very good point, that along with this fix should come a stock detection for max recurse depth so it doesn't break KSP.

@Dunbaratu
Copy link
Member

@thexa4 The system I'm implementing for this contains a spot where all the printing funnels through one base Structure.ToStringIndent(int level) method eventually. Do you think it will be sufficient to try to prevent infinite recursion by merely putting a max limit on how big that int level is allowed to get before it throws an exception? It wouldn't be as clever as your PR is about detecting loops by marking visited nodes as you go, but that's one of the areas where I think a print foo should be very different from a writejson(foo, filename) - when just printing the value out it would seem to be correct to print multiple copies of an object if its referenced multiple times in a list or lex, and incorrect to try dumping something indented like 30 times deep so it slams the terminal with a giant buffer of scrollback.

@thexa4
Copy link
Contributor

thexa4 commented Mar 27, 2021

I agree, the only downside with printing a bounded depth of multiple copies is if you have large repeated objects.

While looking at the dumper code I think it might be salvageable though. If you want I can submit a PR using the dumper method?

@Dunbaratu
Copy link
Member

@thexa4 said:

While looking at the dumper code I think it might be salvageable though. If you want I can submit a PR using the dumper method?

If you think you can, sure. To make it clear, I wasn't trying to get rid of dumper entirely, just stop trying to make it both for serializing and for ToString. The Dumper logic would still be used when writing JSON out.

I am about ready to make a PR for what I did with my way to fix it. I was going to wait until I'd tested it further before I post the PR, but I'll post what I have now so you can look at it and compare results to what you're going to try, then we can accept whichever one of the two we like more.

Dunbaratu added a commit to Dunbaratu/KOS-1 that referenced this issue Mar 27, 2021
Fixes KSP-KOS#2894

There are going to be two proposals for how to fix KSP-KOS#2894. (Read the
comments for the issue).

This is one of them.

In this proposal, We stop borrowing the Dump infrastructure for
creating ToString()s.  The Dump infrastructure was designed for
when you want to serialize, meaing every single field must be
visited and written with nothing 'missing', so values can be
saved and restored.  I found that layering on top of that
system while still preserving the notion of ToString() being meant
for human consumption was kind of hard to wrap my head around
so I went with this idea.

This PR is being made before this is fully tested, so it can be
shown to other people who are proposing alternate solutions.  I
also want to show what this output ends up looking like when you
try it.  Note the code still has some "eraseme" things in it
at this point, which is how I remind myself that an idea wasn't
done yet.  (I have a check I do before a final PR, usually, where
I 'grep' all my code for the string "eraseme" to highlight places
I stuck something in temporarily during testing.  Here I'm using
that to remind myself of stuff that isn't done yet in this PR.

While I think this produces good output, I am willing to be convinced
other approaches might work too, so I'm open to trying to preserve
using Dumper for indented ToString()s if it can be done.
@Dunbaratu Dunbaratu linked a pull request Mar 27, 2021 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants