-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[C23] Fix typeof handling in enum declarations #146394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
We have a parsing helper function which parses either a parenthesized expression or a parenthesized type name. This is used when parsing a unary operator such as sizeof, for example. The problem this solves is when that construct is ambiguous. Consider: enum E : typeof(int) { F }; After we've parsed the 'typeof', what ParseParenExpression() is responsible for is '(int) { F }' which looks like a compound literal expression when it's actually the parens and operand for 'typeof' followed by the enumerator list for the enumeration declaration. Then consider: sizeof (int){ 0 }; After we've parsed 'sizeof', ParseParenExpression is responsible for parsing something grammatically similar to the problematic case. The solution is to recognize that the expression form of 'typeof' is required to have parentheses. So we know the open and close parens that ParseParenExpression handles must be part of the grammar production for the operator, not part of the operand expression itself. Fixes llvm#146351
@llvm/pr-subscribers-clang Author: Aaron Ballman (AaronBallman) ChangesWe have a parsing helper function which parses either a parenthesized expression or a parenthesized type name. This is used when parsing a unary operator such as sizeof, for example. The problem this solves is when that construct is ambiguous. Consider:
After we've parsed the 'typeof', what ParseParenExpression() is responsible for is '(int) { F }' which looks like a compound literal expression when it's actually the parens and operand for 'typeof' followed by the enumerator list for the enumeration declaration. Then consider:
After we've parsed 'sizeof', ParseParenExpression is responsible for parsing something grammatically similar to the problematic case. The solution is to recognize that the expression form of 'typeof' is required to have parentheses. So we know the open and close parens that ParseParenExpression handles must be part of the grammar production for the operator, not part of the operand expression itself. Fixes #146351 Full diff: https://github.com/llvm/llvm-project/pull/146394.diff 5 Files Affected:
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 45b5f77545361..1b12061dad85b 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -304,6 +304,8 @@ C23 Feature Support
which clarified how Clang is handling underspecified object declarations.
- Clang now accepts single variadic parameter in type-name. It's a part of
`WG14 N2975 <https://open-std.org/JTC1/SC22/WG14/www/docs/n2975.pdf>`_
+- Fixed a bug with handling the type operand form of ``typeof`` when it is used
+ to specify a fixed underlying type for an enumeration. #GH146351
C11 Feature Support
^^^^^^^^^^^^^^^^^^^
diff --git a/clang/include/clang/Parse/Parser.h b/clang/include/clang/Parse/Parser.h
index a47e23ffbd357..ced9613b9ab9b 100644
--- a/clang/include/clang/Parse/Parser.h
+++ b/clang/include/clang/Parse/Parser.h
@@ -4184,7 +4184,12 @@ class Parser : public CodeCompletionHandler {
/// ParseParenExpression - This parses the unit that starts with a '(' token,
/// based on what is allowed by ExprType. The actual thing parsed is returned
/// in ExprType. If stopIfCastExpr is true, it will only return the parsed
- /// type, not the parsed cast-expression.
+ /// type, not the parsed cast-expression. If ParenKnownToBeNonCast is true,
+ /// the initial open paren and its matching close paren are known to be part
+ /// of another grammar production and not part of the operand. e.g., the
+ /// typeof and typeof_unqual operators in C. If it is false, then the function
+ /// has to parse the parens to determine whether they're part of a cast or
+ /// compound literal expression rather than a parenthesized type.
///
/// \verbatim
/// primary-expression: [C99 6.5.1]
@@ -4210,6 +4215,7 @@ class Parser : public CodeCompletionHandler {
/// \endverbatim
ExprResult ParseParenExpression(ParenParseOption &ExprType,
bool stopIfCastExpr, bool isTypeCast,
+ bool ParenKnownToBeNonCast,
ParsedType &CastTy,
SourceLocation &RParenLoc);
diff --git a/clang/lib/Parse/ParseExpr.cpp b/clang/lib/Parse/ParseExpr.cpp
index 3cf3d4ea7d705..c6731fca7bfa7 100644
--- a/clang/lib/Parse/ParseExpr.cpp
+++ b/clang/lib/Parse/ParseExpr.cpp
@@ -758,7 +758,8 @@ ExprResult Parser::ParseCastExpression(CastParseKind ParseKind,
ParsedType CastTy;
SourceLocation RParenLoc;
Res = ParseParenExpression(ParenExprType, false /*stopIfCastExr*/,
- isTypeCast == TypeCastState::IsTypeCast, CastTy,
+ isTypeCast == TypeCastState::IsTypeCast,
+ false /*ParenKnownToBeNonCast*/, CastTy,
RParenLoc);
// FIXME: What should we do if a vector literal is followed by a
@@ -2110,12 +2111,24 @@ Parser::ParseExprAfterUnaryExprOrTypeTrait(const Token &OpTok,
// If it starts with a '(', we know that it is either a parenthesized
// type-name, or it is a unary-expression that starts with a compound
// literal, or starts with a primary-expression that is a parenthesized
- // expression.
+ // expression. Most unary operators have an expression form without parens
+ // as part of the grammar for the operator, and a type form with the parens
+ // as part of the grammar for the operator. However, typeof and
+ // typeof_unqual require parens for both forms. This means that we *know*
+ // that the open and close parens cannot be part of a cast expression,
+ // which means we definitely are not parsing a compound literal expression.
+ // This disambiguates a case like enum E : typeof(int) { }; where we've
+ // parsed typeof and need to handle the (int){} tokens properly despite
+ // them looking like a compound literal, as in sizeof (int){}; where the
+ // parens could be part of a parenthesized type name or for a cast
+ // expression of some kind.
+ bool ParenKnownToBeNonCast =
+ OpTok.isOneOf(tok::kw_typeof, tok::kw_typeof_unqual);
ParenParseOption ExprType = ParenParseOption::CastExpr;
SourceLocation LParenLoc = Tok.getLocation(), RParenLoc;
- Operand = ParseParenExpression(ExprType, true/*stopIfCastExpr*/,
- false, CastTy, RParenLoc);
+ Operand = ParseParenExpression(ExprType, true /*stopIfCastExpr*/, false,
+ ParenKnownToBeNonCast, CastTy, RParenLoc);
CastRange = SourceRange(LParenLoc, RParenLoc);
// If ParseParenExpression parsed a '(typename)' sequence only, then this is
@@ -2589,10 +2602,11 @@ bool Parser::tryParseOpenMPArrayShapingCastPart() {
return !ErrorFound;
}
-ExprResult
-Parser::ParseParenExpression(ParenParseOption &ExprType, bool stopIfCastExpr,
- bool isTypeCast, ParsedType &CastTy,
- SourceLocation &RParenLoc) {
+ExprResult Parser::ParseParenExpression(ParenParseOption &ExprType,
+ bool stopIfCastExpr, bool isTypeCast,
+ bool ParenKnownToBeNonCast,
+ ParsedType &CastTy,
+ SourceLocation &RParenLoc) {
assert(Tok.is(tok::l_paren) && "Not a paren expr!");
ColonProtectionRAIIObject ColonProtection(*this, false);
BalancedDelimiterTracker T(*this, tok::l_paren);
@@ -2747,7 +2761,7 @@ Parser::ParseParenExpression(ParenParseOption &ExprType, bool stopIfCastExpr,
T.consumeClose();
ColonProtection.restore();
RParenLoc = T.getCloseLocation();
- if (Tok.is(tok::l_brace)) {
+ if (!ParenKnownToBeNonCast && Tok.is(tok::l_brace)) {
ExprType = ParenParseOption::CompoundLiteral;
TypeResult Ty;
{
@@ -2757,7 +2771,7 @@ Parser::ParseParenExpression(ParenParseOption &ExprType, bool stopIfCastExpr,
return ParseCompoundLiteralExpression(Ty.get(), OpenLoc, RParenLoc);
}
- if (Tok.is(tok::l_paren)) {
+ if (!ParenKnownToBeNonCast && Tok.is(tok::l_paren)) {
// This could be OpenCL vector Literals
if (getLangOpts().OpenCL)
{
diff --git a/clang/lib/Parse/Parser.cpp b/clang/lib/Parse/Parser.cpp
index 18f399aca59e8..2f201cc49cf0b 100644
--- a/clang/lib/Parse/Parser.cpp
+++ b/clang/lib/Parse/Parser.cpp
@@ -1607,7 +1607,8 @@ ExprResult Parser::ParseAsmStringLiteral(bool ForAsmLabel) {
EnterExpressionEvaluationContext ConstantEvaluated(
Actions, Sema::ExpressionEvaluationContext::ConstantEvaluated);
AsmString = ParseParenExpression(ExprType, true /*stopIfCastExpr*/, false,
- CastTy, RParenLoc);
+ false /*ParenKnownToBeNonCast*/, CastTy,
+ RParenLoc);
if (!AsmString.isInvalid())
AsmString = Actions.ActOnConstantExpression(AsmString);
diff --git a/clang/test/Parser/c2x-typeof.c b/clang/test/Parser/c2x-typeof.c
index 9c836dfa6d823..245c127d71e03 100644
--- a/clang/test/Parser/c2x-typeof.c
+++ b/clang/test/Parser/c2x-typeof.c
@@ -42,3 +42,13 @@ _Static_assert(__builtin_offsetof(typeof(s), i) == 0);
_Static_assert(__builtin_offsetof(typeof_unqual(struct S), i) == 0);
_Static_assert(__builtin_offsetof(typeof_unqual(s), i) == 0);
+// Show that typeof and typeof_unqual can be used in the underlying type of an
+// enumeration even when given the type form. Note, this can look like a
+// compound literal expression, which caused GH146351.
+enum E3 : typeof(int) { ThirdZero }; // (int) {}; is not a compound literal!
+enum E4 : typeof_unqual(int) { FourthZero }; // Same here
+
+// Ensure that this invalid construct is diagnosed instead of being treated
+// as typeof((int){ 0 }).
+typeof(int) { 0 } x; // expected-error {{a type specifier is required for all declarations}} \
+ expected-error {{expected identifier or '('}}
|
clang/include/clang/Parse/Parser.h
Outdated
@@ -4210,6 +4215,7 @@ class Parser : public CodeCompletionHandler { | |||
/// \endverbatim | |||
ExprResult ParseParenExpression(ParenParseOption &ExprType, | |||
bool stopIfCastExpr, bool isTypeCast, | |||
bool ParenKnownToBeNonCast, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like it might be semantically simpler to invert the value of this and call it just MightBeCast
? Reasoning about the fact that passing false
here means that ‘it’s not known to be a non-cast’ is a bit headache-inducing at least for me...
Actually on that note, TypeCastState
is a thing. Can we just merge this parameter with the isTypeCast
one and make it a TypeCastState
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about introducing a new enumeration here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually on that note, TypeCastState is a thing. Can we just merge this parameter with the isTypeCast one and make it a TypeCastState?
TypeCastState
is cursed in that it doesn't mean what you might think it means: 77e21fc
So it doesn't mean "this is, is not, or maybe a cast expression", it means "if there's a typo, should we allow type names or not?" I verified this is not vestigial code after delayed typo correction was removed; it's still used for typo correction today. We probably should rename some stuff to make this far more clear though, because your suggestion is what I initially tried to implement and broke approximately every test in the suite.
How about introducing a new enumeration here?
Given that there's three bools in a row, yeah, that's not a bad idea.
How about this for a solution:
- Rename
TypeCastState
toTypoCorrectionTypeBehavior
- Rename
NotTypeCast
,IsTypeCast
, andMaybeTypeCast
toAllowNonTypes,
AllowTypes, and
AllowBoth` - Add a new enumeration
TypeCastState
with membersIsTypeCast
,IsNotTypeCast
, andUnknown
- Change
ParseParenExpression
to take both aTypoCorrectionTypeBehavior
and aTypeCastState
.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This solution looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it doesn't mean "this is, is not, or maybe a cast expression", it means "if there's a typo, should we allow type names or not?"
Ok, I like to say I’m bad at naming, but that is actually awful. ;Þ
How about this for a solution:
- Rename
TypeCastState
toTypoCorrectionTypeBehavior
- Rename
NotTypeCast
,IsTypeCast
, andMaybeTypeCast
toAllowNonTypes,
AllowTypes, and
AllowBoth`- Add a new enumeration
TypeCastState
with membersIsTypeCast
,IsNotTypeCast
, andUnknown
- Change
ParseParenExpression
to take both aTypoCorrectionTypeBehavior
and aTypeCastState
.
I was about to suggest this so sgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up picking different names for TypeCastState
because that was still confusing.
@@ -2850,7 +2870,8 @@ Parser::ParseParenExpression(ParenParseOption &ExprType, bool stopIfCastExpr, | |||
isFoldOperator(NextToken().getKind())) { | |||
ExprType = ParenParseOption::FoldExpr; | |||
return ParseFoldExpression(ExprResult(), T); | |||
} else if (isTypeCast) { | |||
} else if (CorrectionBehavior == TypoCorrectionTypeBehavior::AllowTypes) { | |||
// FIXME: This should not be predicated on typo correction behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is worth noting!
This was an unfortunate surprise. However, it's a preexisting defect that I don't have the bandwidth to correct right now.
The old TypeCastState
enumeration was introduced to help with typo correction (77e21fc) and we accidentally ended up using it for parsing decisions:
llvm-project/clang/lib/Parse/ParseExpr.cpp
Line 761 in a63f572
isTypeCast == TypeCastState::IsTypeCast, CastTy, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not surprised that that happened given the name of the enum...
You can test this locally with the following command:git-clang-format --diff HEAD~1 HEAD --extensions h,cpp,c -- clang/include/clang/Parse/Parser.h clang/lib/Parse/ParseExpr.cpp clang/lib/Parse/ParseExprCXX.cpp clang/lib/Parse/ParseOpenMP.cpp clang/lib/Parse/ParseTemplate.cpp clang/lib/Parse/Parser.cpp clang/test/Parser/c2x-typeof.c View the diff from clang-format here.diff --git a/clang/include/clang/Parse/Parser.h b/clang/include/clang/Parse/Parser.h
index 647e65aa7..683934321 100644
--- a/clang/include/clang/Parse/Parser.h
+++ b/clang/include/clang/Parse/Parser.h
@@ -127,7 +127,7 @@ enum class ParenParseOption {
/// expression, etc.
enum class ParenExprKind {
PartOfOperator, // typeof(int)
- Unknown, // sizeof(int) or sizeof (int)1.0f, or compound literal, etc
+ Unknown, // sizeof(int) or sizeof (int)1.0f, or compound literal, etc
};
/// Describes the behavior that should be taken for an __if_exists
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more thing I thought of: What about C++ edge cases, e.g. both Clang and GCC accept this... which doesn’t quite seem right to me (https://godbolt.org/z/K58eEvG9P):
typeof(int){} x; // Probably parsed as typeof(int{})
/*AllowTypes=*/CorrectionBehavior != | ||
TypoCorrectionTypeBehavior::AllowNonTypes, | ||
/*AllowNonTypes=*/CorrectionBehavior != | ||
TypoCorrectionTypeBehavior::AllowTypes); | ||
Validator.IsAddressOfOperand = isAddressOfOperand; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like at this point the CastExpressionIdValidator
should just store the TypoCorrectionTypeBehavior
instead of splitting it into two flags—or am I missing something here?
@@ -2850,7 +2870,8 @@ Parser::ParseParenExpression(ParenParseOption &ExprType, bool stopIfCastExpr, | |||
isFoldOperator(NextToken().getKind())) { | |||
ExprType = ParenParseOption::FoldExpr; | |||
return ParseFoldExpression(ExprResult(), T); | |||
} else if (isTypeCast) { | |||
} else if (CorrectionBehavior == TypoCorrectionTypeBehavior::AllowTypes) { | |||
// FIXME: This should not be predicated on typo correction behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not surprised that that happened given the name of the enum...
Actually, I’m not sure what we think this is supposed to be; I haven’t checked. |
Actually, that’s just a compound literal; I forgot we supported those in C++. |
We have a parsing helper function which parses either a parenthesized expression or a parenthesized type name. This is used when parsing a unary operator such as sizeof, for example.
The problem this solves is when that construct is ambiguous. Consider:
After we've parsed the 'typeof', what ParseParenExpression() is responsible for is '(int) { F }' which looks like a compound literal expression when it's actually the parens and operand for 'typeof' followed by the enumerator list for the enumeration declaration. Then consider:
After we've parsed 'sizeof', ParseParenExpression is responsible for parsing something grammatically similar to the problematic case.
The solution is to recognize that the expression form of 'typeof' is required to have parentheses. So we know the open and close parens that ParseParenExpression handles must be part of the grammar production for the operator, not part of the operand expression itself.
Fixes #146351