-
Notifications
You must be signed in to change notification settings - Fork 1.6k
_Locinfo
: Use _wsetlocale
to query and restore locales
#5781
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
_LocInfo changes the locale temporarily and then reverts to the previous locale on destruction. The sequence of setlocale calls look as follows: 1. oldlocname = setlocale(LC_ALL, nullptr) to query the locale string 2. setlocale(LC_ALL, newlocname) to set the temporary locale 3. setlocale(LC_ALL, oldlocname) to restore the previous locale However there's a catch here: the fully-qualified locale names returned by setlocale are not always ASCII strings (more on that below). This creates challenges because the oldlocname is encoded depending on the "outer" locale, while the setlocale call at point 3) expects an encoding which depend on the "inner" locale, and the two may not match. To solve this issue, use the wide variant of setlocale: _wsetlocale. This way all strings are UTF-16 and there's no issue with varying narrow string encodings. Addendum: Actually, the C RunTime library does its best to use ASCII strings! It queries the english name of the locale using GetLocaleInfoEx. MSDN says that the returned string is always ASCII [1], but that's not always the case [2]. Fixes microsoft#5780 References: 1. https://learn.microsoft.com/en-us/windows/win32/intl/locale-senglish-constants 2. https://developercommunity.visualstudio.com/t/GetLocaleInfoEx-w-LOCALE_SENGLISHLANGUA/10981789
stl/inc/xlocinfo
Outdated
_Yarn<wchar_t> _W_Days; // wide weekday names | ||
_Yarn<wchar_t> _W_Months; // wide month names | ||
_Yarn<char> _Oldlocname; // old locale name to revert to on destruction | ||
_Yarn<wchar_t> _Oldlocname; // old locale name to revert to on destruction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this breaks ABI: If the linker ends up choosing old _Locinfo::_Locinfo_ctor
and new _Locinfo::_Locinfo_dtor
or vice versa, one piece of code assumes that _Oldlocname
holds a char*
and the other assumes it holds a wchar_t*
.
The changed class must receive a new name.
But we can't just outright remove the old class and change function signatures to use the new name, because it seems the old name appears in some exported symbols.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, this is an ABI break. The _CRTIMP2_PURE_IMPORT
in extern "C++" class _CRTIMP2_PURE_IMPORT _Locinfo
indicates that the entire class is separately compiled, and its member functions live in locale.cpp
and locale0.cpp
which are being modified here. (This is an old, bad practice that we've strongly moved away from. When separately compiled code is necessary, we prefer to use "flat C exports" that accept plain C arguments. This gives us the most flexibility in the headers.)
This case is very interesting because aside from the definition in <xlocinfo>
, _Oldlocname
is never mentioned by the headers. It's only manipulated by locale.cpp
and locale0.cpp
. There's an important fact: There Is Only One STL (dll or static lib), and we never have to worry about mismatch between the STL's separately compiled components, only between outdated user code and newer STL dll/libs.
So while we can't mess with _Locinfo
's name (at most, we could add a new name, but then we'd need to maintain a separate implementation, and I'm not sure if it could be header-only, or if it could be injected into the import lib without causing headaches especially for /clr
and /clr:pure
), and we can't change _Oldlocname
's type without also changing _Locinfo
's name, I think we could change _Oldlocname
's content. Specifically, if _Locinfo::_Locinfo_ctor()
stored UTF-8, then _Locinfo::_Locinfo_dtor()
could decode and use it. Old user binaries running against the new STL DLL (or old user objects/libs linked against the new STL static LIB) would get matching new _Locinfo::_Locinfo_ctor()
/ _Locinfo::_Locinfo_dtor()
talking UTF-8 to each other, but at no point would the user code actually care about the content of _Oldlocname
, only its type of _Yarn<char>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic, thanks a lot!
stl/inc/xlocinfo
Outdated
_Yarn<wchar_t> _W_Days; // wide weekday names | ||
_Yarn<wchar_t> _W_Months; // wide month names | ||
_Yarn<char> _Oldlocname; // old locale name to revert to on destruction | ||
_Yarn<wchar_t> _Oldlocname; // old locale name to revert to on destruction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, this is an ABI break. The _CRTIMP2_PURE_IMPORT
in extern "C++" class _CRTIMP2_PURE_IMPORT _Locinfo
indicates that the entire class is separately compiled, and its member functions live in locale.cpp
and locale0.cpp
which are being modified here. (This is an old, bad practice that we've strongly moved away from. When separately compiled code is necessary, we prefer to use "flat C exports" that accept plain C arguments. This gives us the most flexibility in the headers.)
This case is very interesting because aside from the definition in <xlocinfo>
, _Oldlocname
is never mentioned by the headers. It's only manipulated by locale.cpp
and locale0.cpp
. There's an important fact: There Is Only One STL (dll or static lib), and we never have to worry about mismatch between the STL's separately compiled components, only between outdated user code and newer STL dll/libs.
So while we can't mess with _Locinfo
's name (at most, we could add a new name, but then we'd need to maintain a separate implementation, and I'm not sure if it could be header-only, or if it could be injected into the import lib without causing headaches especially for /clr
and /clr:pure
), and we can't change _Oldlocname
's type without also changing _Locinfo
's name, I think we could change _Oldlocname
's content. Specifically, if _Locinfo::_Locinfo_ctor()
stored UTF-8, then _Locinfo::_Locinfo_dtor()
could decode and use it. Old user binaries running against the new STL DLL (or old user objects/libs linked against the new STL static LIB) would get matching new _Locinfo::_Locinfo_ctor()
/ _Locinfo::_Locinfo_dtor()
talking UTF-8 to each other, but at no point would the user code actually care about the content of _Oldlocname
, only its type of _Yarn<char>
.
Also, what about |
Its value is effectively returned by |
Is it possible to add test cases? |
Keep ABI for `_Locinfo`
Sure :) P.S: will get back to this within 1 or 2 days... |
_Locinfo
: Use _wsetlocale
to query and restore locales
12632f1
to
8f157f4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to update /tests/std/test.lst
to list the new test (in sorted order) to make sure this test is picked by internal (non-Github) harness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, indeed! Fixed
_Locinfo
changes the locale temporarily and then reverts to the previous locale on destruction. The sequence ofsetlocale
calls are as follows:oldlocname = setlocale(LC_ALL, nullptr)
to query the locale stringsetlocale(LC_ALL, newlocname)
to set the temporary localesetlocale(LC_ALL, oldlocname)
to restore the previous localeHowever there's a catch here: the fully-qualified locale names returned by setlocale are not always ASCII strings (more on that below). This creates challenges because the oldlocname is encoded depending on the "outer" locale, while the
setlocale
call at point 3) expects an encoding which follows the "inner" locale, and the two may not match.To solve this issue, use the wide variant of setlocale: _wsetlocale. This way all strings are UTF-16 and there's no issue with varying narrow string encodings.
Addendum:
Actually, the C RunTime library does its best to use ASCII strings! It queries the english name of the locale using
GetLocaleInfoEx
. MSDN says that the returned string is always ASCII [1], but that's not always the case [2].Fixes #5780
References:
I made a similar change in
libc++
: llvm/llvm-project#160479