Skip to content

Conversation

@andychu
Copy link
Contributor

@andychu andychu commented Aug 27, 2025

No description provided.

@andychu andychu requested a review from akinomyoga August 27, 2025 02:18
@andychu
Copy link
Contributor Author

andychu commented Aug 27, 2025

#2353

Sorry for the delay. This PR has been based on #2349, but after #2349 is merged, you modified the code of #2349 in commit 0f31a51. As far as I remember, there was a reason for how I implemented #2349, and I think the simplification applied in 0f31a51 should have broken it, although it was not covered by the spec test cases. Then, I didn't have time to re-examine #2349 before working on this PR. I'm now busy and also am going to be busy for the next few months, so I don't have time to do the investigation again.

Regarding this comment, I think the original code in your PR was equivalent to this one line replace()

And it doesn't break the tests

So I wonder if you think this change should be merged, or if GlobUnescapeBackslash() is not accounting for something, and we are missing test cases?


I was thinking we should add either

  • static types
  • strict runtime assertions about what set of characters is allowed in the various escaping / unescaping modes

but I didn't get to look into that yet

@andychu
Copy link
Contributor Author

andychu commented Aug 27, 2025

another POSIX shell quirk to write about?

In test/spec.sh glob, I noticed that mksh/ksh/yash disagree with the rest of the shells on 2 test cases you added.

(note: ksh/yash are NOT active by default, I added them locally)

 32     298     pass    pass    BUG     pass    BUG     BUG     pass    \ in unquoted substitutions does not match a backslash
 33     333     pass    pass    pass    pass    pass    pass    pass    \ in unquoted substitutions is preserved
 34     344     pass    pass    pass    pass    pass    pass    pass    \ in unquoted substitutions is preserved with set -o noglob
 35     354     pass    pass    pass    pass    pass    pass    pass    \ in unquoted substitutions is preserved without glob matching
 36     367     pass    pass    BUG     pass    BUG     BUG     pass    \ in unquoted substitutions escapes globchars

I am satisfied with what we have, since it matches the common shells

But which behavior is POSIX? (this is not even "bashix", it's POSIX - https://github.com/bashix-spec/bashix?tab=readme-ov-file)


My ambition for OSH was to be an "executable spec", in a high level language like Python

Basically the idea would be that if anyone else writes a shell in the future, they should base it on OSH, because we figured out all these corner cases and exact behavior, which POSIX doesn't

I am not sure if we lived up to that ambition, since we still have some questions, and the code is more complex than I'd like. (i.e. I am still not sure about the double-escape-double-unescape algorithm, maybe it was the wrong thing, but it does work now)

So questions like this could be something we write about

@akinomyoga
Copy link
Collaborator

No, this change is the complete opposite of what it should be.

First of all, the renaming of GlobUnescapeUnquotedSubstitution => GlobUnescapeBackslash performed in 0f31a51 is wrong. The purpose of the function is not just to reverse \@. The function sees both \@ and other backslash escape sequences, and it needs to properly handle all of them.

@akinomyoga
Copy link
Collaborator

akinomyoga commented Aug 27, 2025

If you want to apply any fixes soon, please just revert 0f31a51 and 9041a6f. They are wrong.

@andychu
Copy link
Contributor Author

andychu commented Aug 27, 2025

OK but what was confusing is that in the original code there was:

        if mylib.ByteEquals(c, '\\') and i != n - 1:
            # Suppressed this to fix bug #698, #628 is still there.

            assert i != n - 1, 'Trailing backslash: %r' % s

            i += 1
            c2 = mylib.ByteAt(s, i)

            if mylib.ByteInSet(c2, GLOB_META_CHARS):
                unescaped.append(c_backslash)
                unescaped.append(c2)

But that is just doing the identity transformation

  • it sees \, and then it sees say * which is part of GLOB_META_CHARS
  • and then it translates that into c_backslash and * again - the exact same thing

That is why I changed it

And then I realized it can be changed to replace() without breaking tests


I thought that you copied the form of GlobEscape() in GlobEscapeUnquotedSubstitution(), but then didn't simplify it

But I guess the problem is that we should have a test case to show if this is wrong

Right now it passes all tests


I agree it is a bit "suspicious", but I don't know exactly why

So that is why I sent this, to ask a question -- not saying how the code should be

@andychu
Copy link
Contributor Author

andychu commented Aug 27, 2025

I don't mind reverting those changes, but it would help if we have a test case to show why they are wrong


I also think the code before those changes can be simplified ? But we could do that after reverting it

@andychu
Copy link
Contributor Author

andychu commented Aug 27, 2025

Also in

0f31a51#diff-3c590faf773bb408e6224c81321d122f1c1c96ece22f49025969535932c853ec

I added unit tests for that specific function (not spec tests)

I think this is a good way to have more fine-grained assertions and agreement on what the function is supposed to do

i.e. is the input allowed to contain \@ \* \\, and is the output allowed to contain \@ \* \\ or a single trailing backslash \ ? I think the unit tests could help us be very precise


And likewise for the other 3 functions -- we have

  • GlobEscape() / GlobUnescape()
  • GlobEscapeX() / GlobUnescapeX() - we can rename these back if you think it is wrong

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants