-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[red-knot] Decompose bool
to Literal[True, False]
in unions and intersections
#15738
base: main
Are you sure you want to change the base?
Conversation
a347b0d
to
ad54eba
Compare
8b6a5fd
to
b32181d
Compare
This comment was marked as resolved.
This comment was marked as resolved.
b32181d
to
0ff14bf
Compare
reveal_type(x) # revealed: bool | float | slice | ||
reveal_type(x) # revealed: bool | slice | float |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order here changes because I'm using swap_remove
in Type::display
to ensure that Literal[True, False]
is displayed as bool
to users. But maybe it's okay to use remove()
in Type::display()
rather than swap_remove
? remove()
is O(n)
rather than O(1)
, but displaying a type shouldn't really be performance-sensitive: it's usually only done as part of printing a diagnostic to the terminal
class P: | ||
def __lt__(self, other: "P") -> bool: | ||
return True | ||
|
||
def __le__(self, other: "P") -> bool: | ||
return True | ||
|
||
def __gt__(self, other: "P") -> bool: | ||
return True | ||
|
||
def __ge__(self, other: "P") -> bool: | ||
return True | ||
|
||
class Q(P): ... | ||
|
||
def _(x: P, y: Q): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this because it doesn't look like we've implemented comparisons between unions inside tuples yet, and bool
is now internally represented as a union, which meant that many reveal_type
assertions in this test became TODOs rather than bool
s. Testing whether we knew how to compare a bool
with a bool
didn't seem to me to be the point of the test.
KnownClass::Bool.to_instance(db).is_subtype_of(db, target) | ||
KnownClass::Int.to_instance(db).is_subtype_of(db, target) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a stack overflow for a looong time here before I realised I had to change this... was tearing my hair out 🙃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you say more why this is necessary? And maybe add a comment (unless I'm missing something obvious)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first step in Type::is_subtype_of()
is now (with this PR) to normalize bool
to Literal[True] | Literal[False]
for both self
and target
. So if self
is bool
, self.is_subtype_of(target)
delegates to whether Literal[True] | Literal[False]
is a subtype of target
. That takes us to the Type::Union
branch: a union type is a subtype of target
iff all its members are a subtype of target
. So then we try the first member of the union Literal[True] | Literal[False]
, and ask the question: "Is Literal[True]
a subtype of target
? If target
is not equivalent to Literal[True]
, that takes us to this branch, and this branch says "Literal[True]
is a subtype of target
iff bool
is a subtype of target
". So then we ask the question "Is bool
a subtype of target
?", which is the very first question we started off with. And bool
is normalized to Literal[True] | Literal[False]
... the cycle repeats indefinitely, resulting in infinite recursion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
I still don't understand why it's correct though? There are types which are supertypes of bool
, but not of int
. bool | None
or ~Literal[2]
, for example. So why is it correct to replace the bool <: target
check with a int <: target
check? Because all special cases have been handled above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't understand why it's correct though?
It might not be -- I have been known to make mistakes 😜
There are types which are supertypes of
bool
, but not ofint
.bool | None
or~Literal[2]
, for example. So why is it correct to replace thebool <: target
check with aint <: target
check? Because all special cases have been handled above?
Yeah, I think our set-theoretic types such as unions and intersections should be handled in the branches above? I just added more test cases demonstrating this.
(I will add a comment to the code reflecting this conversation, but first I'm experimenting with alternative ways of structuring the code that might make this clearer in the first place!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(#15773 is the "alternative way of structuring the code", for anybody else reading this thread)
0ff14bf
to
6991676
Compare
One solution to this might be to add a #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct NormalizedType<'db>(Type<'db>);
impl<'db> NormalizedType<'db> {
/// Return true if this type is a [subtype of] type `target`.
pub(crate) fn is_subtype_of(self, db: &'db dyn Db, target: Self) -> bool {}
/// Return true if this type is [assignable to] type `target`.
pub(crate) fn is_assignable_to(self, db: &'db dyn Db, target: Self) -> bool {}
/// Return true if this type is [equivalent to] type `other`.
pub(crate) fn is_equivalent_to(self, db: &'db dyn Db, other: Self) -> bool {}
/// Returns true if this type and `other` are gradual equivalent.
pub(crate) fn is_gradual_equivalent_to(self, db: &'db dyn Db, other: Self) -> bool {}
/// Return true if this type and `other` have no common elements.
pub(crate) fn is_disjoint_from(self, db: &'db dyn Db, other: Self) -> bool {}
}
impl<'db> Deref for NormalizedType<'db> {
type Target = Type<'db>;
fn deref(&self) -> &Self::Target {
&self.0
}
}
impl<'db> From<NormalizedType<'db>> for Type<'db> {
fn from(value: NormalizedType<'db>) -> Self {
value.0
}
} This is quite attractive in some ways. It would be a very big change to make, though, and it would make our API more awkward to use in several places. I'm not sure it would be worth it. |
126aaaf
to
7318ff9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I suggested normalizing bool
to Literal[True, False]
, and I appreciate you exploring what that would look like! But having seen what it would look like, it doesn't feel to me like the right direction.
The cases where this simplification applies are very narrow: unions of bool
(or an enum, in future -- I'm going to call these sealed types) with a larger type that only partially overlaps with bool
(or the enum). Since bool
and enums are both very narrow types, with a small/fixed set of singleton inhabitants, there are very few larger types that will partially overlap. Some types (int
or object
) will fully overlap (they are super-types), but I think in fact AlwaysTruthy
and AlwaysFalsy
are the only partially overlapping types?
So while decomposing is attractive in its conceptual simplicity (and in its code simplicity!), it means lots of operations on bools in unions become more expensive, because we have to check against two types instead of one (and this will be even worse with enums, possibly making the approach not even workable for them at all.) And then in the effort to limit some of that inefficiency (by decomposing only in unions/intersections), we lose a fair bit of the conceptual and code simplicity. So we are kind of letting the tail wag the dog, by paying extra perf cost for all uses of bool
intersections, in order to support the less-common case where Always{Truthy,Falsy}
is also in that union.
I think we do want to normalize these cases (by which I mean, where a sealed type partially overlaps with another type in a union) such that we decompose the sealed type and eliminate the overlap (that is, AlwaysFalse | Literal[True]
instead of AlwaysFalse | bool
). This seems better than trying to encode the equivalence in is_equivalent_to
.
But given that the cases where this applies are so narrow, and we already compare every new union element to every existing union element, I think we can do this as (yet another) special case in UnionBuilder
, and that will perform better than always decomposing.
The rule would be that whenever we see bool
in a union with a type that it is not disjoint from but also not a subtype of (I phrase it in this general way so that we also handle intersections containing Always{Truthy,False}
being in a union with bool
), we replace bool
in that union with its literal inhabitant that is disjoint from the other type.
WDYT?
I think this could work! I tried something like what you outline above initially, and struggled to get it to work, but I think at that stage I still didn't understand the full extent of the problem here. Revisiting it might yield different results now that I am Older and Wiser.
I agree that we need some kind of normalization here rather than attempting some ugly (and expensive) special casing in |
I think there might be a more general problem here where we should understand the equivalence I understand that there is a desire to always find a canonical representation, but I'm afraid this might be too much to ask for in a set-theoretic type system? Enforcing non-overlapping union elements can be achieved, but there are many ways to do that — the two variants It is obviously desirable to have a "simple" and unique representation when we present types to users, at least in common cases. But as a general property, I'm not sure if that is achievable. |
We've had some PRs that move us in this direction, but I think this is the key question we need to answer — do we want
...because if so, you would typically avoid that combinatorial explosion by ensuring that the primitives in the normal form are all disjoint. In this case, it's |
I was mostly thinking about non-primitives, such as unrelated classes class C1: ...
class C2: ...
...
class Cn: ... How would you bring the union |
Yes, (one) problem with trying to avoid all overlaps is that a typical nominal instance type (perhaps the most common type) is not disjoint from another unrelated one, thanks to subclassing and multiple inheritance. So avoiding all overlaps in unions seems impractical. I agree that I'm not sure it's feasible in general to have a single normalized representation of every type. (It is important that we maintain DNF in the sense that we always normalize to a union of intersections, never a more complex tree of unions and intersections. But that doesn't necessarily imply a fully normalized form in terms of the possible combinations of elements.) My suggestion above was much more narrow and targeted: that we'd avoid overlaps with sealed types and prefer decomposing the sealed type instead. I think this is more tractable. But Alex makes a good point that this might be the wrong direction; perhaps we should prefer the overlap and use the full sealed type whenever it is correct to do so.
To me this looks like a case where we'd ideally prefer to achieve this understanding via an eager simplification of |
Summary
Our union builder currently has a problem due to a bad interaction of several features we've implemented:
bool ≡ Literal[True, False]
(and therefore that for anyT
,bool | T ≡ Literal[True, False] | T
; therefore, whenever it seesLiteral[True, False]
in a union, it eagerly simplifies this union to the simpler typebool
.U
to the unionS | T
is a no-op ifU
is a subtype of eitherS
orT
Literal[True]
is a subtype ofAlwaysTruthy
and~AlwaysFalsy
Literal[Falsy]
is a subtype ofAlwaysFalsy
and~AlwaysTruthy
bool
is neither a subtype ofAlwaysTruthy
nor a subtype ofAlwaysFalsy
Putting all these features together means that the union builder can produce different unions depending on the order in which elements are inserted into the union. Consider a union consisting of
Literal[True]
,Literal[False]
andAlwaysTruthy
. If they're inserted in that order, then the unions is built like this:But if they're inserted in a slightly different order, a different union is constructed entirely, which red-knot is not capable of understanding to be equivalent to the one above:
There are the following set of equivalent type pairs that can currently be constructed under our current model but are not understood to be equivalent by red-knot:
AlwaysTruthy | bool ≡ AlwaysTruthy | Literal[False]
AlwaysFalsy | bool ≡ AlwaysFalsy | Literal[True]
~AlwaysTruthy | bool ≡ ~AlwaysTruthy | Literal[True]
~AlwaysFalsy | bool ≡ ~AlwaysFalsy | Literal[False]
This PR refactors the union builder to ensure that unions are never constructed that contain
bool
as an element; instead,bool
is always decomposed toLiteral[True, False]
. The same is done for our intersection builder. Doing this makes it much easier to ensure that unions such as the ones above always have the same set of elements no matter which order they are inserted into the union; a lot of complexity is removed frombuilder.rs
in this PR. However, it has the (significant) drawback that in various type-relational methods such asType::is_equivalent_to()
,Type::is_subtype_of()
andType::is_assignable_to()
, we have to remember to normalizebool
intoLiteral[True, False]
before comparing the type with any other type, since the other type might be a union, and we must ensure thatLiteral[True, False]
is understood to be equivalent tobool
.Since it would be confusing for users if
bool
was displayed asLiteral[True, False]
, some logic is added toType::display
so thatLiteral[True, False]
is replaced withbool
in unions before unions are printed to the terminal in diagnostics.This PR fixes #15513, and allows us to promote two flaky property tests to stable. However, it currently shows up as having a 1% performance regression on both red-knot benchmarks on codspeed: https://codspeed.io/astral-sh/ruff/branches/alex%2Ftruthy-unions-5.
What about literal strings?
As well as the above invariants for unions containing
AlwaysTruthy/AlwaysFalsy
andbool
, there are also equivalences forLiteralString
that we do not yet understand. The fact that we do not understand these is at least partly responsible for the flakiness of several other property tests that are not marked as stable:AlwaysTruthy | LiteralString ≡ AlwaysTruthy | Literal[""]
AlwaysFalsy | LiteralString ≡ AlwaysFalsy | LiteralString & ~Literal[""]
~AlwaysTruthy | LiteralString ≡ ~AlwaysTruthy | Literal[""]
~AlwaysFalsy | LiteralString ≡ ~AlwaysFalsy | LiteralString & ~Literal[""]
I'm not sure if this problem is solvable in the same way without adding a new
Type::TruthyLiteralString
variant. Anyway, this PR does not attempt to solve this problem. Instead, some failing tests are added with a TODO.Test Plan
main
QUICKCHECK_TESTS=1000000 cargo test --release -p red_knot_python_semantic -- --ignored types::property_tests::stable
QUICKCHECK_TESTS=5000000 cargo test --release -p red_knot_python_semantic -- --ignored types::property_tests::stable::union_equivalence_not_order_dependent
QUICKCHECK_TESTS=3000000 cargo test --release -p red_knot_python_semantic -- --ignored types::property_tests::stable::intersection_equivalence_not_order_dependent