Allow filtering via masking #273

@nevencaplar

I wanted to reproduce the following pandas workflow (all_forced_sources_w19 is a flat pandas DataFrame of sources):

# Identify all columns whose name contains 'flag'
flag_cols = [col for col in all_forced_sources_w19.columns if 'flag' in col.lower()]

# Exclude rows where any flag column is True
flag_mask = ~(all_forced_sources_w19[flag_cols].any(axis=1))  # True where all flags are False
all_forced_sources_w19_clean = all_forced_sources_w19[flag_mask]
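
For reference, the workflow above can be run end to end on a small toy table. This is only a sketch: the column names and values below are invented for illustration and are not taken from the real all_forced_sources_w19 data.

```python
import pandas as pd

# Toy stand-in for all_forced_sources_w19; columns and values are hypothetical.
sources = pd.DataFrame(
    {
        "psfFlux": [1.0, 2.0, 3.0, 4.0],
        "psfFlux_flag": [False, True, False, False],
        "pixelFlags_bad": [False, False, False, True],
    }
)

# Columns whose name contains 'flag' (case-insensitive)
flag_cols = [col for col in sources.columns if "flag" in col.lower()]

# True for rows where every flag column is False
flag_mask = ~sources[flag_cols].any(axis=1)

# Keep only the unflagged rows
sources_clean = sources[flag_mask]
```

Note that the substring match is case-insensitive, so a column such as the hypothetical pixelFlags_bad is also picked up as a flag column.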

The solution we found via .query was something like:

# Identify flag columns among the nested fields
flag_cols = [col for col in dia_object_lc_computed.diaObjectForcedSource.nest.fields if 'flag' in col.lower()]
# Build the condition string, e.g.
# "diaObjectForcedSource.flag1 == False & diaObjectForcedSource.flag2 == False & ..."
query_str = " & ".join([f"diaObjectForcedSource.{col} == False" for col in flag_cols])
# Apply the query directly to the nested column
dia_object_lc_computed_filtered = dia_object_lc_computed.query(query_str)
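
The string-building approach can be checked against the boolean mask on a flat table using plain pandas DataFrame.query. This is a minimal sketch under stated assumptions: the columns are hypothetical, there is no nested-column prefix, and it does not exercise the nested query path itself.

```python
import pandas as pd

# Hypothetical flat table; not the real forced-source data.
sources = pd.DataFrame(
    {
        "psfFlux": [1.0, 2.0, 3.0, 4.0],
        "psfFlux_flag": [False, True, False, False],
        "pixelFlags_bad": [False, False, False, True],
    }
)

flag_cols = [col for col in sources.columns if "flag" in col.lower()]

# Same pattern as the workaround, minus the nested-column prefix.
# In pandas query strings, & has the precedence of `and`.
query_str = " & ".join(f"{col} == False" for col in flag_cols)

via_query = sources.query(query_str)
via_mask = sources[~sources[flag_cols].any(axis=1)]
```

On this toy table both routes select the same rows, which is why a direct mask would be a natural replacement for the generated query string.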

I wish I could mask more directly, with something like:

flag_mask = ~(dia_object_lc_computed.diaObjectForcedSource.nest.fields[flag_cols].any(axis=1))
dia_object_lc_computed.diaObjectForcedSource = dia_object_lc_computed.diaObjectForcedSource[flag_mask]
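
To make the intended semantics concrete, here is a sketch of the same per-source mask written against a flat table whose index repeats the parent object id. It uses plain pandas only; the diaObjectId index name, the columns, and the values are assumptions for illustration, and this is not the library's implementation.

```python
import pandas as pd

# Hypothetical flattened sources: one row per source, indexed by parent object.
flat = pd.DataFrame(
    {
        "psfFlux": [1.0, 2.0, 3.0, 4.0, 5.0],
        "psfFlux_flag": [False, True, False, True, True],
        "pixelFlags_bad": [False, False, False, False, True],
    },
    index=pd.Index([10, 10, 11, 12, 12], name="diaObjectId"),
)

flag_cols = [col for col in flat.columns if "flag" in col.lower()]

# One boolean per source row: True where every flag is False
flag_mask = ~flat[flag_cols].any(axis=1)

# Masking sources leaves the parent ids on the index intact
flat_clean = flat[flag_mask]

# Number of surviving sources per object; objects with none are absent here
remaining = flat_clean.groupby(level="diaObjectId").size()
```

In this toy case object 12 loses all of its sources. Whether such an object should be kept with an empty nested entry or dropped is a design choice the request leaves open.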

Labels: enhancement (New feature or request)
