Packing on an object-dtyped column may fail if items are not comparable #210

Open
@hombit

Bug report

Today @mi-dai and I were packing SALT3 data into nested pandas. When we ran nf.add_nested(src, on='SNID', name='lc'), it failed because the SNID column had object dtype and contained heterogeneous data: numbers and strings. A reproducible example:

from nested_pandas import NestedFrame

nf = NestedFrame({"SNID": [1, 'abc']})
nf.add_nested(NestedFrame({"SNID": [1, 1, 'abc', 'abc'], "x": [1, 2, 3, 4]}), name='lc', on='SNID')

It fails inside our code, essentially because sorting an index of mixed, non-comparable types does not work in pandas:

nf.set_index('SNID').sort_index()
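
The same failure can be reproduced with plain pandas, without nested_pandas. This is a minimal sketch; the frame `df` and the variable `sort_failed` are illustrative names, not part of the original report.

```python
import pandas as pd

# Object-dtype key column mixing int and str values, as in the report.
df = pd.DataFrame({"SNID": [1, 1, "abc", "abc"], "x": [1, 2, 3, 4]})

try:
    df.set_index("SNID").sort_index()
    sort_failed = False
except TypeError as err:
    # Python 3 does not define an ordering between int and str,
    # so sorting the mixed index raises.
    sort_failed = True
    print(f"sort_index failed: {err}")
```

The error comes from comparing the index values themselves, so it appears whenever the key column holds values that cannot be ordered against each other.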

I'm not sure whether this is our problem. We could handle this case by catching the TypeError and casting the values of the on column to str. Or we could do nothing, since we would rather not change the user's data or work around pandas behavior.
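
The first option could look roughly like the sketch below. It is written against plain pandas only; the helper name `sort_index_with_str_fallback` is hypothetical and does not exist in nested_pandas, and the actual code path inside add_nested may differ.

```python
import pandas as pd


def sort_index_with_str_fallback(df: pd.DataFrame, on: str) -> pd.DataFrame:
    """Index `df` by column `on` and sort, casting keys to str if needed.

    Hypothetical helper, shown only to illustrate the proposed fallback.
    """
    indexed = df.set_index(on)
    try:
        return indexed.sort_index()
    except TypeError:
        # Keys are not mutually comparable (e.g. int vs. str):
        # compare them by their string representation instead.
        indexed.index = indexed.index.astype(str)
        return indexed.sort_index()


mixed = pd.DataFrame({"SNID": ["abc", 1, "abc", 1], "x": [1, 2, 3, 4]})
packed = sort_index_with_str_fallback(mixed, on="SNID")

# Homogeneous keys take the normal path and keep their dtype.
ints = pd.DataFrame({"SNID": [2, 1], "x": [10, 20]})
sorted_ints = sort_index_with_str_fallback(ints, on="SNID")
```

The fallback does change the user's data, which is the concern raised above: after the cast, the integer 1 and the string '1' become the same key, and the resulting index holds strings rather than the original values.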

Before submitting
Please check the following:

  • I have described the situation in which the bug arose, including what code was executed, information about my environment, and any applicable data others will need to reproduce the problem.
  • I have included available evidence of the unexpected behavior (including error messages, screenshots, and/or plots) as well as a description of what I expected instead.
  • If I have a solution in mind, I have provided an explanation and/or pseudocode and/or task list.

Labels: bug (Something isn't working)
