Description
I wanted to reproduce the following pandas workflow (all_forced_sources_w19 is a flat pandas DataFrame of sources):
# Identify all columns whose name contains "flag"
flag_cols = [col for col in all_forced_sources_w19.columns if 'flag' in col.lower()]
# Exclude rows where any flag column is True
flag_mask = ~(all_forced_sources_w19[flag_cols].any(axis=1))  # True where all flags are False
all_forced_sources_w19_clean = all_forced_sources_w19[flag_mask]
The solution we found via .query was something like:
# Identify flag columns among the nested fields
flag_cols = [col for col in dia_object_lc_computed.diaObjectForcedSource.nest.fields if 'flag' in col.lower()]
# Build the condition string, e.g. "diaObjectForcedSource.flag1 == False & diaObjectForcedSource.flag2 == False & ..."
query_str = " & ".join([f"diaObjectForcedSource.{col} == False" for col in flag_cols])
# Apply the query directly to the nested column
dia_object_lc_computed_filtered = dia_object_lc_computed.query(query_str)
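For reference, here is a minimal sketch in plain pandas (a flat DataFrame, not the nested one) suggesting that the query-string approach and the boolean mask select the same rows. The column names and values are made up for illustration, and each condition is wrapped in parentheses only to make the grouping explicit.

```python
import pandas as pd

# Hypothetical toy data standing in for a flat source table.
df = pd.DataFrame({
    "psfFlux": [1.0, 2.0, 3.0, 4.0],
    "pixelFlags_bad": [False, True, False, False],
    "psfFlux_flag": [False, False, False, True],
})

# Columns whose name contains "flag" (case-insensitive).
flag_cols = [col for col in df.columns if "flag" in col.lower()]

# Mask approach: keep rows where no flag column is True.
flag_mask = ~df[flag_cols].any(axis=1)
clean_by_mask = df[flag_mask]

# Query approach: require every flag column to be False.
query_str = " & ".join(f"({col} == False)" for col in flag_cols)
clean_by_query = df.query(query_str)
```

Both results keep only the rows with no flag set, so the query string is a workable stand-in for the mask when direct masking of the nested column is not available.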
I wish I could mask more directly, with something like:
flag_mask = ~(dia_object_lc_computed.diaObjectForcedSource.nest.fields[flag_cols].any(axis=1))
dia_object_lc_computed.diaObjectForcedSource = dia_object_lc_computed.diaObjectForcedSource[flag_mask]