
Conversation

@gaogaotiantian (Contributor) commented Nov 27, 2025

What changes were proposed in this pull request?

Enabled flake8 F811 check on our repo and fixed reported issues.

Why are the changes needed?

I know upgrading the lint system is a pain, but we should not just put it aside forever. Our pinned flake8 version is not even usable on Python 3.12+.

During this "lint fix", I actually discovered a few real bugs - most of them are silently disabled unit tests: another test method in the same class has the same name (probably due to copy/paste), so the earlier definition is shadowed and never runs. I think this result supports the idea that we should take lint more seriously.

About functions.log, we got it wrong. It's not that @overload does not work properly - it's that we have two log functions in that gigantic file. The former one is dead code, so I just removed it.
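
For illustration, here is a minimal sketch (not actual Spark code) of the two patterns F811 reports - a test method whose name is repeated in the same class, so the earlier test silently never runs, and a module-level function that is redefined later in the same file:

    import unittest

    class ExampleTest(unittest.TestCase):
        def test_add(self):             # never runs: the name is rebound below
            self.assertEqual(1 + 1, 2)

        def test_add(self):             # F811: redefinition of unused 'test_add'
            self.assertEqual(2 + 2, 4)

    def log(value):                     # dead code: shadowed by the later definition
        return value

    def log(base, value=None):          # F811: redefinition of unused 'log'
        return (base, value)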

Again, I really think we should upgrade our lint system. I'm trying to do it slowly - piece by piece, so that people's daily workflow is not impacted too much.

I hope we can eventually move to a place where all linters are updated and people can be more confident about their changes.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Ran flake8 on the major directories. CI should give a more comprehensive result.
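
For reference, roughly the kind of local invocation used (the target directory here is an assumption; the project's own lint scripts may wrap this differently):

    flake8 --select=F811 python/pyspark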

Was this patch authored or co-authored using generative AI tooling?

No

:class:`~pyspark.sql.Column`
natural logarithm of the given value.

Examples

@zhengruifeng (Contributor) commented Nov 28, 2025

let's move the examples to the remaining log, to keep the doctest coverage

@gaogaotiantian (Author) replied:

The remaining log has its own examples and I think the coverage is similar (if not greater):

    Examples
    --------
    Example 1: Specify both base number and the input value

    >>> from pyspark.sql import functions as sf
    >>> df = spark.sql("SELECT * FROM VALUES (1), (2), (4) AS t(value)")
    >>> df.select("*", sf.log(2.0, df.value)).show()
    +-----+---------------+
    |value|LOG(2.0, value)|
    +-----+---------------+
    |    1|            0.0|
    |    2|            1.0|
    |    4|            2.0|
    +-----+---------------+

    Example 2: Return NULL for invalid input values

    >>> from pyspark.sql import functions as sf
    >>> df = spark.sql("SELECT * FROM VALUES (1), (2), (0), (-1), (NULL) AS t(value)")
    >>> df.select("*", sf.log(3.0, df.value)).show()
    +-----+------------------+
    |value|   LOG(3.0, value)|
    +-----+------------------+
    |    1|               0.0|
    |    2|0.6309297535714...|
    |    0|              NULL|
    |   -1|              NULL|
    | NULL|              NULL|
    +-----+------------------+

    Example 3: Specify only the input value (Natural logarithm)

    >>> from pyspark.sql import functions as sf
    >>> df = spark.sql("SELECT * FROM VALUES (1), (2), (4) AS t(value)")
    >>> df.select("*", sf.log(df.value)).show()
    +-----+------------------+
    |value|         ln(value)|
    +-----+------------------+
    |    1|               0.0|
    |    2|0.6931471805599...|
    |    4|1.3862943611198...|
    +-----+------------------+

messageParameters={"error_msg": error_msg},
)

def test_list_row_unequal_schema(self):
Contributor:

why is this test being removed?

@gaogaotiantian (Author) replied:

This one is a bit special. There's another test with the same name at line 1593, and I believe they do very similar things. However, this test is failing (it never shows up because it was shadowed by the other one). My guess is that the other one is the newer version that was supposed to replace this old one.

expected[r][e] == result[r][e], f"{expected[r][e]} == {result[r][e]}"
)

def test_createDataFrame_pandas_with_struct_type(self):
Contributor:

why remove this one?

@gaogaotiantian (Author) replied:

There is another one at line 986 that is exactly the same.

@HyukjinKwon (Member):

Merged to master.

@gaogaotiantian deleted the flake8-f811 branch on December 2, 2025 03:15
