Skip to content

Conversation

lb90
Copy link

@lb90 lb90 commented Oct 14, 2025

_Locinfo changes the locale temporarily and then reverts to the previous locale on destruction. The sequence of setlocale calls are as follows:

  1. oldlocname = setlocale(LC_ALL, nullptr) to query the locale string
  2. setlocale(LC_ALL, newlocname) to set the temporary locale
  3. setlocale(LC_ALL, oldlocname) to restore the previous locale

However there's a catch here: the fully-qualified locale names returned by setlocale are not always ASCII strings (more on that below). This creates challenges because the oldlocname is encoded depending on the "outer" locale, while the setlocale call at point 3) expects an encoding which follows the "inner" locale, and the two may not match.

To solve this issue, use the wide variant of setlocale: _wsetlocale. This way all strings are UTF-16 and there's no issue with varying narrow string encodings.

Addendum:

Actually, the C RunTime library does its best to use ASCII strings! It queries the english name of the locale using GetLocaleInfoEx. MSDN says that the returned string is always ASCII [1], but that's not always the case [2].

Fixes #5780

References:

  1. https://learn.microsoft.com/en-us/windows/win32/intl/locale-senglish-constants
  2. https://developercommunity.visualstudio.com/t/GetLocaleInfoEx-w-LOCALE_SENGLISHLANGUA/10981789

I made a similar change in libc++: llvm/llvm-project#160479

_LocInfo changes the locale temporarily and then reverts to the
previous locale on destruction. The sequence of setlocale calls
look as follows:

1. oldlocname = setlocale(LC_ALL, nullptr) to query the locale string
2. setlocale(LC_ALL, newlocname) to set the temporary locale
3. setlocale(LC_ALL, oldlocname) to restore the previous locale

However there's a catch here: the fully-qualified locale names
returned by setlocale are not always ASCII strings (more on that
below). This creates challenges because the oldlocname is encoded
depending on the "outer" locale, while the setlocale call at point
3) expects an encoding which depend on the "inner" locale, and the
two may not match.

To solve this issue, use the wide variant of setlocale: _wsetlocale.
This way all strings are UTF-16 and there's no issue with varying
narrow string encodings.

Addendum:

Actually, the C RunTime library does its best to use ASCII strings!
It queries the english name of the locale using GetLocaleInfoEx.
MSDN says that the returned string is always ASCII [1], but that's
not always the case [2].

Fixes microsoft#5780

References:
 1. https://learn.microsoft.com/en-us/windows/win32/intl/locale-senglish-constants
 2. https://developercommunity.visualstudio.com/t/GetLocaleInfoEx-w-LOCALE_SENGLISHLANGUA/10981789
@lb90 lb90 requested a review from a team as a code owner October 14, 2025 12:17
@github-project-automation github-project-automation bot moved this to Initial Review in STL Code Reviews Oct 14, 2025
stl/inc/xlocinfo Outdated
_Yarn<wchar_t> _W_Days; // wide weekday names
_Yarn<wchar_t> _W_Months; // wide month names
_Yarn<char> _Oldlocname; // old locale name to revert to on destruction
_Yarn<wchar_t> _Oldlocname; // old locale name to revert to on destruction
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this breaks ABI: If the linker ends up choosing old _Locinfo::_Locinfo_ctor and new _Locinfo::_Locinfo_dtor or vice versa, one piece of code assumes that _Oldlocname holds a char* and the other assumes it holds a wchar_t*.

The changed class must receive a new name.

But we can't just outright remove the old class and change function signatures to use the new name, because it seems the old name appears in some exported symbols.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this is an ABI break. The _CRTIMP2_PURE_IMPORT in extern "C++" class _CRTIMP2_PURE_IMPORT _Locinfo indicates that the entire class is separately compiled, and its member functions live in locale.cpp and locale0.cpp which are being modified here. (This is an old, bad practice that we've strongly moved away from. When separately compiled code is necessary, we prefer to use "flat C exports" that accept plain C arguments. This gives us the most flexibility in the headers.)

This case is very interesting because aside from the definition in <xlocinfo>, _Oldlocname is never mentioned by the headers. It's only manipulated by locale.cpp and locale0.cpp. There's an important fact: There Is Only One STL (dll or static lib), and we never have to worry about mismatch between the STL's separately compiled components, only between outdated user code and newer STL dll/libs.

So while we can't mess with _Locinfo's name (at most, we could add a new name, but then we'd need to maintain a separate implementation, and I'm not sure if it could be header-only, or if it could be injected into the import lib without causing headaches especially for /clr and /clr:pure), and we can't change _Oldlocname's type without also changing _Locinfo's name, I think we could change _Oldlocname's content. Specifically, if _Locinfo::_Locinfo_ctor() stored UTF-8, then _Locinfo::_Locinfo_dtor() could decode and use it. Old user binaries running against the new STL DLL (or old user objects/libs linked against the new STL static LIB) would get matching new _Locinfo::_Locinfo_ctor() / _Locinfo::_Locinfo_dtor() talking UTF-8 to each other, but at no point would the user code actually care about the content of _Oldlocname, only its type of _Yarn<char>.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lb90 I opened lb90#1 for this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic, thanks a lot!

stl/inc/xlocinfo Outdated
_Yarn<wchar_t> _W_Days; // wide weekday names
_Yarn<wchar_t> _W_Months; // wide month names
_Yarn<char> _Oldlocname; // old locale name to revert to on destruction
_Yarn<wchar_t> _Oldlocname; // old locale name to revert to on destruction
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this is an ABI break. The _CRTIMP2_PURE_IMPORT in extern "C++" class _CRTIMP2_PURE_IMPORT _Locinfo indicates that the entire class is separately compiled, and its member functions live in locale.cpp and locale0.cpp which are being modified here. (This is an old, bad practice that we've strongly moved away from. When separately compiled code is necessary, we prefer to use "flat C exports" that accept plain C arguments. This gives us the most flexibility in the headers.)

This case is very interesting because aside from the definition in <xlocinfo>, _Oldlocname is never mentioned by the headers. It's only manipulated by locale.cpp and locale0.cpp. There's an important fact: There Is Only One STL (dll or static lib), and we never have to worry about mismatch between the STL's separately compiled components, only between outdated user code and newer STL dll/libs.

So while we can't mess with _Locinfo's name (at most, we could add a new name, but then we'd need to maintain a separate implementation, and I'm not sure if it could be header-only, or if it could be injected into the import lib without causing headaches especially for /clr and /clr:pure), and we can't change _Oldlocname's type without also changing _Locinfo's name, I think we could change _Oldlocname's content. Specifically, if _Locinfo::_Locinfo_ctor() stored UTF-8, then _Locinfo::_Locinfo_dtor() could decode and use it. Old user binaries running against the new STL DLL (or old user objects/libs linked against the new STL static LIB) would get matching new _Locinfo::_Locinfo_ctor() / _Locinfo::_Locinfo_dtor() talking UTF-8 to each other, but at no point would the user code actually care about the content of _Oldlocname, only its type of _Yarn<char>.

@github-project-automation github-project-automation bot moved this from Initial Review to Work In Progress in STL Code Reviews Oct 14, 2025
@StephanTLavavej
Copy link
Member

Also, what about _Yarn<char> _Newlocname? It appears to get its content from setlocale. That one is trickier because it is exposed to the headers via _Getname().

@StephanTLavavej StephanTLavavej added the bug Something isn't working label Oct 14, 2025
@muellerj2
Copy link
Contributor

Also, what about _Yarn _Newlocname? It appears to get its content from setlocale. That one is trickier because it is exposed to the headers via _Getname().

Its value is effectively returned by locale:name(), so there might be some users out there relying on its encoding.

@frederick-vs-ja
Copy link
Contributor

Is it possible to add test cases?

@lb90
Copy link
Author

lb90 commented Oct 16, 2025

Is it possible to add test cases?

Sure :)

P.S: will get back to this within 1 or 2 days...

@StephanTLavavej StephanTLavavej changed the title _LocInfo: Use _wsetlocale to query and restore locales _Locinfo: Use _wsetlocale to query and restore locales Oct 16, 2025
@lb90 lb90 force-pushed the fix-issue-5780 branch 2 times, most recently from 12632f1 to 8f157f4 Compare October 17, 2025 17:10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to update /tests/std/test.lst to list the new test (in sorted order) to make sure this test is picked by internal (non-Github) harness.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, indeed! Fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

Status: Work In Progress

Development

Successfully merging this pull request may close these issues.

<iostream>: Printing with std::cout changes the global locale permanently

5 participants