Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

load accepts Sequence rather than Iterable (rejects generators) #2795

Merged
merged 6 commits into from
Jan 31, 2025

Conversation

sloria
Copy link
Member

@sloria sloria commented Jan 19, 2025

addresses #1898

from marshmallow import Schema, fields, validates_schema


class MySchema(Schema):
    fld = fields.String()

    @validates_schema
    def _validate(self, data, **kwargs):
        print("schema validator called")


MySchema().load(({"fld": "abc"}, {"fld": "def"}), many=True)
# schema validator called
# schema validator called


def gen():
    yield from ({"fld": "abc"}, {"fld": "def"})


MySchema().load(gen(), many=True)
# Before this change, silently fails to call schema validator
# After this change, an error is raised:
# marshmallow.exceptions.ValidationError: {"_schema": ["Invalid input type."]}

Caveat: for some reason, mypy doesn't error when passing a generator. it seems that generators are covariant with Sequence and are therefore equivalent for type-checking purposes. not sure if there is a better way to handle this.

def gen():
    yield from ({"fld": "abc"}, {"fld": "def"})


assert isinstance(gen(), Sequence) is False
MySchema().load(gen(), many=True)  # mypy accepts this

@sloria
Copy link
Member Author

sloria commented Jan 19, 2025

ah, looks like this breaks handling of sets. will investigate later

update: fixed

@sloria
Copy link
Member Author

sloria commented Jan 20, 2025

this is ready for review. i thought this was a breaking change since it changes what we're accepting as valid input. but one could argue this is bugfix, since non-sequence inputs are silently failing to call schema validators. should this be backported to v3?

@sloria sloria requested a review from lafrech January 20, 2025 00:55
@lafrech
Copy link
Member

lafrech commented Jan 26, 2025

i thought this was a breaking change since it changes what we're accepting as valid input. but one could argue this is bugfix, since non-sequence inputs are silently failing to call schema validators. should this be backported to v3?

Arguable. I don't really mind either way, but considering the move to v4 should be easy, I'm tempted to keep things as is in v3 and expect users to move fast to v4.

Did we consider actually supporting generators? Is the rationale that this is costly to do in marshmallow so we prefer users unpack their generators themselves?

@sloria
Copy link
Member Author

sloria commented Jan 27, 2025

Did we consider actually supporting generators? Is the rationale that this is costly to do in marshmallow so we prefer users unpack their generators themselves?

we've never explicitly supported generators. the immediate rationale for this is that handling generators is currently broken but it fails silently (schema validators aren't called)

@lafrech
Copy link
Member

lafrech commented Jan 27, 2025

Yes, I got that.

Just wondering if we could solve this the other way by actually supporting the use case, and at what cost.

This was investigated by @sirosen in #1898 (comment) and I don't think the downside he points to (post_load receives a list) is really an issue.

@sloria
Copy link
Member Author

sloria commented Jan 27, 2025

gotcha. i'd rather avoid adding another runtime type-check, for the reason sirosen pointed out (pre_load wouldn't actually receive the "raw" input, which is pretty unintuitive) as well as the performance hit.

@sirosen
Copy link
Contributor

sirosen commented Jan 27, 2025

My opinion circling back on this is that Sequence is what's really wanted and to agree with myself from 2022.

If you want to pass in a generator, you can make the decision between these two basic approaches:

  • type-ignore and accept that you've violated the marshmallow contract -- it might work but you won't get support for it if it breaks1
  • you can list() or otherwise convert to a sequence type yourself (as the caller)

I would say no (real) support is being lost, but that it's not important enough to be worth any extra work backporting to v3. That's just based on it being in this state for a few years and that issue getting relatively low traffic.

Footnotes

  1. EDIT: I just saw that the current PR would catch this too. So really we're pushing people to convert any iterator to a sequence, but I think that's good because they won't be surprised N years later when their validators don't work.

@sloria sloria merged commit 9cddff6 into dev Jan 31, 2025
9 checks passed
@sloria sloria deleted the load-typing branch January 31, 2025 03:05
@sloria
Copy link
Member Author

sloria commented Feb 4, 2025

Having different load/validate signatures for v3 and v4 does add a bit of pain for libs that support both. see the workaround needed here: marshmallow-code/marshmallow-sqlalchemy#659

maybe this is reason enough to backport this to v3? wdyt @lafrech @sirosen

@lafrech
Copy link
Member

lafrech commented Feb 7, 2025

I'm tied.

There may be users in the wild passing generators who never got any issue because they don't use validates_schema. Their code will be broken. The case was never meant to be supported, but not explicitly forbidden AFAIK so we may argue it is not technically a breaking change but it may come as a surprise and make a few unhappy users. Hopefully the fix is simple (cast to list). It might be harder to implement if the call happens in a 3rd-party lib, though, but I have no example in mind of a lib doing this in such a way that the user can't cast to list in their code.

Since the move to v4 is easy, I figured I'd drop v3 in the same go in flask-smorest (unless it is zero effort to support both, which may be the case since there are no type hints in flask-smorest). I agree the workaround you had to come up with is a bit painful. Do you think many libraries are impacted? Looks like you didn't face this issue when updating webargs. EOLing MA3 soon after MA4 ships would help ensuring libraries move to v4 and drop v3 quickly.

@sirosen
Copy link
Contributor

sirosen commented Feb 7, 2025

I haven't given it really deep thought, but I think I'm fine following either path.

One thing about backporting the typing fix to v3 is that users have two viable workarounds until they're ready to jump to v4:

  • # type: ignore
  • pin to an older marshmallow v3 version

I think that's totally acceptable for a typing-only change.

I can't imagine that we're going to want to make many more changes on v3 at all1, so I'm mostly looking at any changes to v3 as "things we should do to aid transition". I'm not sure this one is that significant? But I'm not against it by any means.

Footnotes

  1. I haven't had any time for marshmallow dev for a few months! So maybe I shouldn't say "we". 😉

@lafrech
Copy link
Member

lafrech commented Feb 10, 2025

IIUC, this is not a typing-only change. This actually prevents users from passing generators anymore. It works without this change, except for validates_schema, although it was never explicitly supported. So # type: ignore doesn't make it work.

# After this change, an error is raised:
# marshmallow.exceptions.ValidationError: {"_schema": ["Invalid input type."]}

@sloria
Copy link
Member Author

sloria commented Feb 11, 2025

@lafrech is correct; it's a bit more than a typing-only change.

i'll hold off on backporting this for now. the workaround i had to do in ma-sqlalchemy isn't too bad and will only be needed in libraries that have a Schema class that overrides load/validate.

@sloria
Copy link
Member Author

sloria commented Feb 11, 2025

so I'm mostly looking at any changes to v3 as "things we should do to aid transition".

👍 yep, i'm thinking the same

EOLing MA3 soon after MA4 ships would help ensuring libraries move to v4 and drop v3 quickly.

i think supporting v3 for 3-6 months (bug fixes only) after the v4 release would be a reasonable timeline. most/all of the marshmallow-code libs are already supporting v4 now, so the maintenance burden for us should be relatively low. i want to give other popular ones like marshmallow-dataclass and airflow enough time to migrate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants