Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pylibcudf.Scalar.from_py for construction from Python strings, bool, int, float #17898

Open
wants to merge 3 commits into
base: branch-25.04
Choose a base branch
from

Conversation

mroeschke
Copy link
Contributor

Description

Towards #17054

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke mroeschke added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change pylibcudf Issues specific to the pylibcudf package labels Feb 1, 2025
@mroeschke mroeschke self-assigned this Feb 1, 2025
@mroeschke mroeschke requested a review from a team as a code owner February 1, 2025 02:12
@github-actions github-actions bot added the Python Affects Python cuDF API. label Feb 1, 2025
@mroeschke mroeschke changed the title Add pylibcudfScalar.from_py for construction from Python strings, bool, int, float Add pylibcudf.Scalar.from_py for construction from Python strings, bool, int, float Feb 1, 2025
Comment on lines +100 to +116
if isinstance(py_val, py_bool):
dtype = DataType(type_id.BOOL8)
c_val = make_numeric_scalar(dtype.c_obj)
(<numeric_scalar[cbool]*>c_val.get()).set_value(py_val)
elif isinstance(py_val, int):
dtype = DataType(type_id.INT64)
c_val = make_numeric_scalar(dtype.c_obj)
(<numeric_scalar[int64_t]*>c_val.get()).set_value(py_val)
elif isinstance(py_val, float):
dtype = DataType(type_id.FLOAT64)
c_val = make_numeric_scalar(dtype.c_obj)
(<numeric_scalar[double]*>c_val.get()).set_value(py_val)
elif isinstance(py_val, str):
dtype = DataType(type_id.STRING)
c_val = make_string_scalar(py_val.encode())
else:
raise NotImplementedError(f"{type(py_val).__name__} is not supported.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rewrite this pattern with singledispatch? I anticipate this cascade eventually getting arbitrarily large and we'll start paying the price of failing through earlier cases to get to later ones. That also gives you the benefit of (since each overload is a separate function) being able to cdef the c_val type accordingly in each case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, how would you handle creating the final Scalar with multiple dispatched functions? I'm referring to this logic

        cdef Scalar s = Scalar.__new__(Scalar)
        s.c_obj.swap(c_val)
        s._data_type = dtype
        return s

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh would this just go in a separate method so it can be used in each dispatched method?

Copy link
Contributor

@Matt711 Matt711 Feb 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eg.

@staticmethod
def _finalize_scalar(c_obj, dtype):
    cdef Scalar s = Scalar.__new__(Scalar)
    s.c_obj.swap(c_obj)
    s._data_type = dtype
    return s

Copy link
Contributor

@Matt711 Matt711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, a couple non blocking questions


@classmethod
def from_py(cls, py_val):
"""Convert a Python standard library object to a Scalar.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we expand this doc string if this method is public?

dtype = DataType(type_id.STRING)
c_val = make_string_scalar(py_val.encode())
else:
raise NotImplementedError(f"{type(py_val).__name__} is not supported.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is a nitpick (or incorrect) but should we distinguish between a type (that can be converted to a scalar) but is not implemented yet and a type that cannot be converted to a scalar? If its the latter, then we should raise TypeError, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a nitpick, but not incorrect. You're right, that would be optimal. The difficulty is in separating the two. We could have this fallback case be a TypeError and then add another NotImplementedError case that checks isinstance on a bunch of types that we expect to support. It'll be very difficult to capture all of them exhaustively, though. We'll probably have some things falling through to TypeError that we actually want to support for a while if we do this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change pylibcudf Issues specific to the pylibcudf package Python Affects Python cuDF API.
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

3 participants