
Conversation

@gaogaotiantian (Contributor) commented Nov 27, 2025

What changes were proposed in this pull request?

Enabled flake8 F811 check on our repo and fixed reported issues.

Why are the changes needed?

I know upgrading the lint system is a pain, but we should not just put it aside forever. Our pinned flake8 version is not even usable on Python 3.12+.

During this "lint fix", I actually discovered a few real bugs - most of them are silently disabled unit tests: another test method in the same class has the same name (probably due to copy/paste), so the earlier definition is shadowed and never runs. I think this result supports the idea that we should take lint more seriously.

About functions.log, we got it wrong. It's not that @overload does not work properly - it's that we have two log functions in that gigantic file. The former one is dead code, so I just removed it.
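
For illustration, here is a minimal sketch (not actual Spark code) of the two patterns F811 reports - a test method whose name is repeated in the same class, so the earlier test silently never runs, and a module-level function that is redefined later in the same file:

    import unittest

    class ExampleTest(unittest.TestCase):
        def test_add(self):             # never runs: the name is rebound below
            self.assertEqual(1 + 1, 2)

        def test_add(self):             # F811: redefinition of unused 'test_add'
            self.assertEqual(2 + 2, 4)

    def log(value):                     # dead code: shadowed by the later definition
        return value

    def log(base, value=None):          # F811: redefinition of unused 'log'
        return (base, value)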

Again, I really think we should upgrade our lint system. I'm trying to do it slowly - piece by piece, so that people's daily workflow is not impacted too much.

I hope we can eventually move to a place where all linters are updated and people can be more confident about their changes.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Ran flake8 on the major directories. CI should give a more comprehensive result.
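
For reference, roughly the kind of local invocation used (the target directory here is an assumption; the project's own lint scripts may wrap this differently):

    flake8 --select=F811 python/pyspark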

Was this patch authored or co-authored using generative AI tooling?

No

:class:`~pyspark.sql.Column`
natural logarithm of the given value.

Examples

@zhengruifeng (Contributor) commented Nov 28, 2025

let's move the examples to the remaining log, to keep the doctest coverage

@gaogaotiantian (Author) replied:

The remaining log has its own examples and I think the coverage is similar (if not greater):

    Examples
    --------
    Example 1: Specify both base number and the input value

    >>> from pyspark.sql import functions as sf
    >>> df = spark.sql("SELECT * FROM VALUES (1), (2), (4) AS t(value)")
    >>> df.select("*", sf.log(2.0, df.value)).show()
    +-----+---------------+
    |value|LOG(2.0, value)|
    +-----+---------------+
    |    1|            0.0|
    |    2|            1.0|
    |    4|            2.0|
    +-----+---------------+

    Example 2: Return NULL for invalid input values

    >>> from pyspark.sql import functions as sf
    >>> df = spark.sql("SELECT * FROM VALUES (1), (2), (0), (-1), (NULL) AS t(value)")
    >>> df.select("*", sf.log(3.0, df.value)).show()
    +-----+------------------+
    |value|   LOG(3.0, value)|
    +-----+------------------+
    |    1|               0.0|
    |    2|0.6309297535714...|
    |    0|              NULL|
    |   -1|              NULL|
    | NULL|              NULL|
    +-----+------------------+

    Example 3: Specify only the input value (Natural logarithm)

    >>> from pyspark.sql import functions as sf
    >>> df = spark.sql("SELECT * FROM VALUES (1), (2), (4) AS t(value)")
    >>> df.select("*", sf.log(df.value)).show()
    +-----+------------------+
    |value|         ln(value)|
    +-----+------------------+
    |    1|               0.0|
    |    2|0.6931471805599...|
    |    4|1.3862943611198...|
    +-----+------------------+

messageParameters={"error_msg": error_msg},
)

def test_list_row_unequal_schema(self):
Contributor:

why is this test being removed?

@gaogaotiantian (Author) replied:

This one is a bit special. There's another test with the same name at line 1593, and I believe they do very similar things. However, this test is failing (it never shows up because it was shadowed by the other one). My guess is that the other one is the newer version that was supposed to replace this old one.

expected[r][e] == result[r][e], f"{expected[r][e]} == {result[r][e]}"
)

def test_createDataFrame_pandas_with_struct_type(self):
Contributor:

why remove this one?

@gaogaotiantian (Author) replied:

There is another one at line 986 that is exactly the same.

@HyukjinKwon (Member):

Merged to master.

@gaogaotiantian deleted the flake8-f811 branch on December 2, 2025 03:15
