String encoding in structured metadata arrays #3097
Comments
Does the numpy 2.0 StringDtype alter this logic much? I'm considering implementing access to the |
From the docs I can't see that |
OK, can we park strings for a bit and just raise an error if the schema has a string field? I want to play with this myself on the SC2 data to make sure it all scales. |
I'm not sure we need to error - the current code deals with strings, it's just that they have the |
Right but we'd like that to be a string rather than bytes. Raising an error makes sure we don't forget this and release accidentally. |
#3091 introduced returning a numpy structured array from a compatible metadata buffer. The `StructCodec` allows the string encoding to be specified in the schema; however, numpy only supports bytes and UTF-32 strings, in the `S` and `U` dtypes respectively. #3091 therefore returns structured arrays with only the `S` dtype for each string field.

At the cost of a copy and some shuffling, it would be possible to decode these to the user's specification in the schema using `numpy.char.decode`, then reassigning the decoded string array back into the structured array. This could be implemented as an option with a boolean flag `decode_strings` to `structured_array_from_buffer`. However, as `ts.X_metadata` is a property, the flag couldn't be set when retrieving the array, so the options are either an additional property on the ts (`ts.X_metadata_string_decode`?) or leaving the user to do this via the lower-level code.
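
For illustration, here is a rough sketch of what the user-side decode could look like with plain numpy. The helper name `decode_string_fields` and the `encodings` mapping (field name to codec name) are hypothetical and not part of any existing API; the example array just stands in for whatever the metadata property returns. Because a `U` column cannot be stored in an `S` field, this sketch builds a new structured array with an updated dtype instead of assigning in place.

```python
import numpy as np


def decode_string_fields(arr, encodings):
    """Return a copy of `arr` with the named `S` fields decoded to `U` fields.

    `encodings` maps field name -> codec name (hypothetical interface).
    """
    decoded = {}
    new_descr = []
    for name in arr.dtype.names:
        field_dtype = arr.dtype[name]
        if name in encodings and field_dtype.kind == "S":
            # np.char.decode calls bytes.decode element-wise and returns a U array.
            decoded[name] = np.char.decode(arr[name], encodings[name])
            new_descr.append((name, decoded[name].dtype))
        else:
            new_descr.append((name, field_dtype))

    # Allocate the output with the widened dtype and copy every column across.
    out = np.empty(arr.shape, dtype=new_descr)
    for name in arr.dtype.names:
        out[name] = decoded.get(name, arr[name])
    return out


# Stand-in for a structured metadata array with a bytes (`S`) string field.
raw = np.array(
    [(1, "Zoë".encode("utf-8")), (2, b"Bob")],
    dtype=[("id", "<i4"), ("name", "S8")],
)
out = decode_string_fields(raw, {"name": "utf-8"})
```

This costs one full copy of the array plus a temporary per decoded field, which is the "copy and some shuffling" mentioned above.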