bug: list_databases throws an exception with the PySpark backend #10854

Closed
1 task done
kyrre opened this issue Feb 17, 2025 · 5 comments · Fixed by #10877
Labels
bug Incorrect behavior inside of ibis pyspark The Apache PySpark backend

Comments

@kyrre

kyrre commented Feb 17, 2025

What happened?

Executing list_databases using the PySpark backend throws an exception.

import ibis
# `spark` is an existing SparkSession
con = ibis.pyspark.connect(spark)
con.list_databases(catalog="old_security_logs")

Stack trace:

ValueError: 'namespace' is not in list

During handling of the above exception, another exception occurred:
PySparkAttributeError                     Traceback (most recent call last)
File <command-8138415741171468>, line 3
      1 # List the databases in a given catalog
      2 ## NOTE NOTE NOTE: a BUG prevents this from working, but according to the documentation this is how the API is used
----> 3 con.list_databases(catalog="old_security_logs")
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.12/site-packages/ibis/backends/pyspark/__init__.py:360, in Backend.list_databases(self, like, catalog)
    355 def list_databases(
    356     self, *, like: str | None = None, catalog: str | None = None
    357 ) -> list[str]:
    358     with self._active_catalog(catalog):
    359         databases = [
--> 360             db.namespace for db in self._session.sql("SHOW DATABASES").collect()
    361         ]
    362     return self._filter_with_like(databases, like)
File /databricks/spark/python/pyspark/sql/types.py:3116, in Row.__getattr__(self, item)
   3112     raise PySparkAttributeError(
   3113         errorClass="ATTRIBUTE_NOT_SUPPORTED", messageParameters={"attr_name": item}
   3114     )
   3115 except ValueError:
-> 3116     raise PySparkAttributeError(
   3117         errorClass="ATTRIBUTE_NOT_SUPPORTED", messageParameters={"attr_name": item}
   3118     )
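
The traceback shows the failure coming from attribute access by field name (`db.namespace`) on the rows returned by `SHOW DATABASES`. As a possible stopgap, the rows could be read by position instead of by name. The sketch below is only an illustration: it assumes `SHOW DATABASES` returns a single column holding the database name, the helper `database_names` is hypothetical and not part of ibis, and the `namedtuple` rows are stand-ins for PySpark `Row` objects.

```python
from collections import namedtuple


def database_names(rows):
    """Return the first column of each row, whatever that column is called."""
    return [row[0] for row in rows]


# Stand-ins for pyspark.sql.Row objects (hypothetical data). The two row
# types differ only in the name of their single field.
NamespaceRow = namedtuple("NamespaceRow", ["namespace"])
DatabaseNameRow = namedtuple("DatabaseNameRow", ["databaseName"])

rows_a = [NamespaceRow("default"), NamespaceRow("logs")]
rows_b = [DatabaseNameRow("default"), DatabaseNameRow("logs")]

names_a = database_names(rows_a)
names_b = database_names(rows_b)
```

With a real session this would presumably be called as `database_names(spark.sql("SHOW DATABASES").collect())`, since a PySpark `Row` also supports positional indexing.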

What version of ibis are you using?

10.0.0

What backend(s) are you using, if any?

PySpark

Relevant log output

Code of Conduct

  • I agree to follow this project's Code of Conduct
@kyrre kyrre added the bug Incorrect behavior inside of ibis label Feb 17, 2025
@cpcloud
Member

cpcloud commented Feb 21, 2025

Thanks for the report! This does indeed look like a bug.

Working on a fix now.

@cpcloud cpcloud added the pyspark The Apache PySpark backend label Feb 21, 2025
@cpcloud
Member

cpcloud commented Feb 21, 2025

@kyrre Can you show what version of pyspark you're using? And are you using Spark Connect or is this all local pyspark?

@kyrre
Author

kyrre commented Feb 21, 2025

> @kyrre Can you show what version of pyspark you're using? And are you using Spark Connect or is this all local pyspark?

It failed both when using spark-connect and a local pyspark session. The version of pyspark is 3.5.2 on Databricks clusters with runtime 16.2 and 16.1.

@cpcloud
Member

cpcloud commented Feb 21, 2025

@kyrre In the meantime while I work on a test for this, would you mind giving #10877 a try to see if that fixes the issue for you?

@cpcloud
Member

cpcloud commented Feb 21, 2025

Ok, I managed to reproduce the problem with databricks connect:

[Image not preserved]

For some reason when using databricks-connect the field name from SHOW DATABASES is databaseName instead of namespace, but it looks like you can use the catalog.listDatabases() API to get uniform output, and that's what I did in my PR.

I can't seem to reproduce this without using databricks-connect, so I'm going to leave the PR as is.
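
For readers who want to see the idea, here is a minimal sketch of listing database names through the `catalog.listDatabases()` API mentioned above instead of parsing `SHOW DATABASES` rows. This is not the code from the PR: the helper `list_database_names` is hypothetical, it ignores the `catalog` and `like` arguments that ibis supports, and the fake session built from `SimpleNamespace` only stands in for a real `SparkSession` so the example runs without Spark.

```python
from types import SimpleNamespace


def list_database_names(session):
    """Collect database names from the session's catalog API.

    Each object returned by listDatabases() exposes a `name` attribute,
    so the result does not depend on SQL result column names.
    """
    return [db.name for db in session.catalog.listDatabases()]


# Hypothetical stand-in for a SparkSession with two databases.
fake_session = SimpleNamespace(
    catalog=SimpleNamespace(
        listDatabases=lambda: [
            SimpleNamespace(name="default"),
            SimpleNamespace(name="logs"),
        ]
    )
)

names = list_database_names(fake_session)
```

With a real session, the call would be `list_database_names(spark)`.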

Projects
Status: done
2 participants