Skip to content

Conversation

@khwilliamson
Copy link
Contributor

@khwilliamson khwilliamson commented Oct 30, 2025

This p.r. adds the capability to undefine macros that are visible to XS code but shouldn't be. This can be used to stop macro namespace pollution by perl.

It works by changing embed.h to have two modes, controlled by a #ifdef that is set by perl.h. perl.h now #includes embed.h twice. The first time works as it always has. The second sets the #ifdef, and causes embed.h to #undef the macros that shouldn't be visible. This call is just before perl.h returns to its includer, so that these macros have come and gone before the file that #included perl.h is affected by them. It comes after the inline headers get included, so they have access to all the symbols that are defined.

The list of macros is determined by the visibility given by the apidoc lines documenting them, plus several exception lists that allow a symbol to be visible even though it is not documented as such.

In this p.r., the main exception list contains everything that is currently visible outside the Perl core, so this should not break any code. But it means that visibility control is established for future changes to our code base. New macros will not be visible except when documented as needing to be such. We can no longer inadvertently add new names to pollute the user's.

I expect that over time, the exception list will become smaller, as we go through it and remove the items that really shouldn't be visible. We can then see via smoking if someone is actually using them, and either decide that these should be visible, or work with the module author for another way to accomplish their needs. (I would hope this would lead to proper documentation of the ones that need to be visible.)

There are currently four lists of symbols.

One list is for symbols that are used by libc functions, and that Perl may redefine (usually so that code doesn't have to know if it is running on a platform that is lacking the given feature.) The algorithm added here catches most of these and keeps them visible, but there are a few items that currently must be manually listed.

A second list is of symbols that the re extension to Perl requires, but no one else needs to. This list is currently empty, as everything initially is in the main exception list.

A third list is for items that other Perl extensions require, but no one else needs to. This list is currently empty, as everything initially is in the main exception list.

The final list is for items that currently are visible to the whole world. It contains thousands of items. This list should be examined for:

  1. Names that shouldn't be so visible; and
  2. Names that need to remain visible but should be changed so they are less likely to clash with anything the user might come up with.

I have wanted this ability to happen for a long time; and now things have come together to enable it.

This allows us to have a clear-cut boundary with CPAN.

It means you can add macros that have internal-only use without having to worry about making them likely not to clash with user names.

It shows precisely in one place what our names are that are visible to CPAN.

  • This set of changes requires a perldelta entry, and I need help writing it.

@jkeenan
Copy link
Contributor

jkeenan commented Oct 30, 2025

This p.r. undefines all macros that are visible to XS code but shouldn't be. This stops macro namespace pollution by perl.

Can you cite some examples of where "we" (Perl programmers in general, or, at a minimum, CPAN authors) have gotten into trouble with this?

@khwilliamson khwilliamson force-pushed the regain_control branch 9 times, most recently from 1f76cdb to 63bdd5f Compare October 31, 2025 00:07
@jkeenan
Copy link
Contributor

jkeenan commented Oct 31, 2025

CI tests on Windows are failing. One aspect of these failures is that where we are expecting an escaped backslash (\\) in a statement's output, we're not getting even an unescaped slash.

@khwilliamson
Copy link
Contributor Author

Getting control of our boundaries is a big deal. That's why we have EXPORT in perl code.

In the past, we've run into trouble with people using interfaces that weren't supposed to be public, and then complaining if we changed them. The first big step to controlling that that I can think of is @xenu adding the ability to restrict visibility of non-public functions. I fairly recently extended that to functions that are visible to perl extensions, though I think there are bypasses to that if someone really wants to. This step extends this to macros. Again, a determined programmer can defeat these. But it's harder to do.

But we have willy-nilly created symbols that infringe on the XS writer's name space. Writers have had to work around this at times. There are modules that people can load that try to undo this. This p.r. has us undoing much of that intrusion.
See https://stackoverflow.com/questions/22903542/what-is-namespace-pollution.

A recent example of a minor issue is #23614 (comment)
This p.r. causes that to be a non-issue, for the moment anyway.

We as a project need to have a discussion about what sorts of symbols can intrude. I have always operated under the assumption that we could reserve symbols ending in an underscore for our use. I don't know where I got that idea; probably because others who preceded me had made that assumption. But this needs to be clarified.,

This p.r. brings out into the open for examination just what macros are visible to XS writers.

@tonycoz
Copy link
Contributor

tonycoz commented Nov 3, 2025

A recent example of a minor issue is #23614 (comment)
This p.r. causes that to be a non-issue, for the moment anyway.

It doesn't.

Setting PERL_DO_UNDEFS makes the perl API essentially unusable.

As a quick example it undefs SvSetSV_and(), which is used by the API SvSetSV() and SvSetMagicSV() macros.

It also undefs many flag check macros and accessors that are API (even if not in embed.fnc) and are accessible as functions, eg CVf_METHOD, most CV flag checks, GvIOn(),

My complaint in the above comment was that the o1_ and type1_ macros are too short and too global, which could be fixed by making OpTYPE_set() an inline function as @Leont suggested, or as I described in that comment. This change nukes from orbit.

@khwilliamson
Copy link
Contributor Author

This p.r. causes that to be a non-issue, for the moment anyway.

It doesn't.

What I meant is that this p.r. undefines that symbol, so it is not visible to the outside world. And if someone tried to use the macro that needs this symbol, it wouldn't compile. This would alert us to the problem. Whether or not this is a good enough symbol name is outside the scope of this p.r., so the discussion about that should be elsewhere, and yes it does need to be discussed and resolved.

And if this p.r. were to be merged as-is, the API would be unusable. I'm sorry if it came across that that is what I expected to happen. The essence of this p.r., without the individual elements, is to gain control of what we expose to the XS writer's name space. I strongly believe that this is something we should do, and I have wanted to have it done for a long time. Maybe I should have labelled this as a POC.

The code added here looks for symbols that are exposed to the XS namespace that we don't document as existing, and symbols whose names are definitely reserved for perl's use. It undefs the rest unless they are on one of the exception lists. The exception lists could be expanded to include everything it finds, causing nothing to be undefined., Then nothing would break, but new phantom symbols that get created would be caught and have to be resolved.

This is how we've done similar things in the past, diag.t is to keep people from adding undocumented messages to new code, but has an exception list so that it could go into blead without breaking things. That list has been whittled down over time.

And we can look through that list all in one place and see if there are symbols that need to be exposed as a consequence of some other macro using them, but whose names are such that they should be changed to avoid potential conflicts.

And it means that I don't have to be so careful in creating symbols inside headers that are not going to be used outside; as they will get automatically undefined.

@tonycoz
Copy link
Contributor

tonycoz commented Nov 5, 2025

OpTYPE_set() was moved to op.h and isn't guarded by PERL_CORE. I know @leonerd uses it, (example) but he also defines it if not present. So I think the intent was to make it public, which would require the o1_ and type1_ macros to be public.

There are many other names that seem to me at first glance shouldn't be undefined by default:

  • AMGf_* - used by the amagic_call(), try_amagic_bin(), try_amagic_un(), tryAMAGICun_MG() or tryAMAGICbin_MG() API.
  • [ap]THX[aox]_? - are for backward compatibility for old XS
  • the FEATURE related macros are only defined if you #include feature.h, which isn't done by default.
  • the case_*_SBOX32 names should probably by undefined in sbox32_hash.h
  • cBINOP*, cCOP, cGVOP*, cLISTOP, cLOOP, cLOGOP, cMETHOD* and many other c[A-Z]* are API I think (usable by custom ops)
  • Cv* are all API I think
  • CX_* are API, see perl5240delta

(up to "C")

@khwilliamson
Copy link
Contributor Author

I have revised this p.r. to not undefine anything that wasn't already visible to the outside world, and changed the description. It by itself shouldn't break anything, but set us up for gaining control of the namespace.

Prior to this commit, a commented-out prototype was created for all
macros that we have argument and return type information for.  This
might be useful information for a reader of proto.h.

This commits stops doing this for private macros.  It makes a future
commit slightly easier, and I'm unsure of the usefulness of this anyway.
But I could be persuaded otherwise.
Where 'weird' is defined as meaning something where the normal rules
don't apply, so something we generate is unlikely to be correct.

This only affects one element, and it uses aTHX in a way that is
incompatible with it being automated.
These keywords all need another word to indicate the parameter type.
Previously only 'struct' was considered to have.

This changed showed an error in one entry embed.fnc, which is also
corrected in this commit.
These macros are not for external use, so don't need a Perl_ prefix
This is required for the next few commits that start automatically
creating long Perl_name functions for the elements in embed.fnc that are
macros and don't already have them in the source.

Only macros can take a parameter that has to be a literal string, so
don't fit with the next few commits.  This is the only case in embed.fnc
like that, so I'm deferring dealing with it for now.
embed.h Outdated
#if defined(PERL_DO_UNDEFS)
# if !defined(PERL_CORE)
# undef do_aexec
# undef free_c_backtrace
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing defines free_c_backtrace, only Perl_free_c_backtrace is defined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bug in embed.fnc. I'll add a commit to fix it

regen/embed.pl Outdated
# For all modules that aren't deliberating using particular names, all the
# other symbols on it are namespace pollutants.

my @unexpectedly_used_outside_core = qw(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@unexpectedly_used_outside_core seems like a bad name, many symbols are API and expected to be used outside of core (like AMGf_assign) and many don't seem to be used outside of core at all.

Copy link
Contributor Author

@khwilliamson khwilliamson Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to convey that being on this list is not good. Being here is a sign that something is defective elsewhere. For example, if AMGf_assign were properly documented, it wouldn't need to be here, because the regen software would pick it up. (It turns out that in this case, it is documented, just not marked as such to autodoc. So it's an easy fix to mark it; I'll issue a separate p.r. for that.)

I'm changing the name to overrides_to_be_visible_outside_core but am certainly open to other suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some names listed aren't visible outside of code, like ref (I suspect it was last visible over 20 years ago). I haven't looked for others.

Many of them shouldn't be visible outside of core (like the /^case_\d+_SBOX32$/ names).

And there's a bunch which should be visible outside of core.

Maybe @unresolved_visibility_overrides, since ideally the list will become empty as:

  • stuff is documented
  • we undef stuff like the case_128_SBOX32 names
  • the header scanning code catches up

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to your proposed name

I am working on catching the header scanning code up. So far it removes about 1500 symbols from the list that aren't actually visible outside core. It doesn't yet catch ref

A function can't return something of that type, but this has always been
a macro, so this hasn't been caught.
This commit adds the capability to undefine macros that are visible to
XS code but shouldn't be.  This can be used to stop macro namespace
pollution by perl.

It works by changing embed.h to have two modes, controlled by a #ifdef
that is set by perl.h.  perl.h now #includes embed.h twice.  The first
time works as it always has.  The second sets the #ifdef, and causes
embed.h to #undef the macros that shouldn't be visible.  This call is
just before perl.h returns to its includer, so that these macros have
come and gone before the file that #included perl.h is affected by them.
It comes after the inline headers get included, so they have access to
all the symbols that are defined.

The list of macros is determined by the visibility given by the apidoc
lines documenting them, plus several exception lists that allow a symbol
to be visible even though it is not documented as such.

In this commit, the main exception list contains everything that is
currently visible outside the Perl core, so this should not break any
code.  But it means that the visibility control is established for
future changes to our code base.  New macros will not be visible except
when documented as needing to be such.  We can no longer inadvertently
add new names to pollute the user's.

I expect that over time, the exception list will become smaller, as we
go through it and remove the items that really shouldn't be visible.  We
can then see via smoking if someone is actually using them, and either
decide that these should be visible, or work with the module author for
another way to accomplish their needs.  (I would hope this would lead to
proper documentation of the ones that need to be visible.)

There are currently four lists of symbols.

One list is for symbols that are used by libc functions, and that Perl
may redefine (usually so that code doesn't have to know if it is running
on a platform that is lacking the given feature.)  The algorithm added
here catches most of these and keeps them visible, but there are a few
items that currently must be manually listed.

A second list is of symbols that the re extension to Perl requires, but
no one else needs to.  This list is currently empty, as everything
initially is in the main exception list.

A third list is for items that other Perl extensions require, but no one
else needs to.  This list is currently empty, as everything initially is
in the main exception list.

The final list is for items that currently are visible to the whole
world.  It contains thousands of items.  This list should be examined
for:

    1) Names that shouldn't be so visible; and
    2) Names that need to remain visible but should be changed so they
       are less likely to clash with anything the user might come up
       with.

I have wanted this ability to happen for a long time; and now things
have come together to enable it.

This allows us to have a clear-cut boundary with CPAN.

It means you can add macros that have internal-only use without having
to worry about making them likely not to clash with user names.

It shows precisely in one place what our names are that are visible to
CPAN.
@tonycoz
Copy link
Contributor

tonycoz commented Nov 26, 2025

I no longer see a block of undefs conditional on PERL_DO_UNDEFS in embed.h nor any code to generate them in regen/embed.pl. (checked in 7b66af1)

@khwilliamson
Copy link
Contributor Author

I wanted to make sure this p.r. didn't actually change anything so it can't possibly break anything. But the code is there. To check, I just now removed ref from the override list, and regenerated. I got the following diff in embed.h

 26c26,30
 < #if !defined(PERL_DO_UNDEFS)
 ---
 > #if defined(PERL_DO_UNDEFS)
 > # if !defined(PERL_CORE)
 > #   undef ref
 > # endif
 > #else

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants