-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-54544][PYTHON] Enable flake8 F811 check #53253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| :class:`~pyspark.sql.Column` | ||
| natural logarithm of the given value. | ||
|
|
||
| Examples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's move the examples to the remaining log, to keep the doctest coverage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The remaining log has its examples and I think the coverage is similar (if not more):
Examples
--------
Example 1: Specify both base number and the input value
>>> from pyspark.sql import functions as sf
>>> df = spark.sql("SELECT * FROM VALUES (1), (2), (4) AS t(value)")
>>> df.select("*", sf.log(2.0, df.value)).show()
+-----+---------------+
|value|LOG(2.0, value)|
+-----+---------------+
| 1| 0.0|
| 2| 1.0|
| 4| 2.0|
+-----+---------------+
Example 2: Return NULL for invalid input values
>>> from pyspark.sql import functions as sf
>>> df = spark.sql("SELECT * FROM VALUES (1), (2), (0), (-1), (NULL) AS t(value)")
>>> df.select("*", sf.log(3.0, df.value)).show()
+-----+------------------+
|value| LOG(3.0, value)|
+-----+------------------+
| 1| 0.0|
| 2|0.6309297535714...|
| 0| NULL|
| -1| NULL|
| NULL| NULL|
+-----+------------------+
Example 3: Specify only the input value (Natural logarithm)
>>> from pyspark.sql import functions as sf
>>> df = spark.sql("SELECT * FROM VALUES (1), (2), (4) AS t(value)")
>>> df.select("*", sf.log(df.value)).show()
+-----+------------------+
|value| ln(value)|
+-----+------------------+
| 1| 0.0|
| 2|0.6931471805599...|
| 4|1.3862943611198...|
+-----+------------------+
| messageParameters={"error_msg": error_msg}, | ||
| ) | ||
|
|
||
| def test_list_row_unequal_schema(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why removing this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is a bit special. There's another test with the same name at line 1593. I believe they are doing very similar things. However, this test is failing (it never shows up because it was shielded by the other one). I would guess the other one is the new one that is supposed to replace this old one.
| expected[r][e] == result[r][e], f"{expected[r][e]} == {result[r][e]}" | ||
| ) | ||
|
|
||
| def test_createDataFrame_pandas_with_struct_type(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why remove this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is another one at line 986 that is exactly the same.
|
Merged to master. |
What changes were proposed in this pull request?
Enabled flake8 F811 check on our repo and fixed reported issues.
Why are the changes needed?
I know upgrading lint system is a pain, but we should not just put it aside forever. Our pinned
flake8version is not even usable on Python3.12+.During this "lint fix", I actually discovered a few real bugs - most of them are silently disabled unittests because there is a test method that has the same name (probably due to copy/paste). I think this result supported the idea that we should take lint more seriously.
About
functions.log, we got it wrong. It's not because@overloaddoes not work properly - it's because we have twologfunction in that gigantic file. The former one is dead. I just removed that one.Again, I really think we should upgrade our lint system. I'm trying to do it slowly - piece by piece, so that people's daily workflow is not impacted too much.
I hope we can eventually move to a place where all linters are updated and people can be more confident about their changes.
Does this PR introduce any user-facing change?
No
How was this patch tested?
flake8test on major directories. CI should give more a comprehensive result.Was this patch authored or co-authored using generative AI tooling?
No