Description
I am not able to provide a simple reproducing test case, but the following fails:
from category_encoders import CountEncoder

# 1: works as expected (min_group_size as a fraction, no normalization)
CountEncoder(
    drop_invariant=True,
    normalize=False,
    min_group_size=0.03,
    combine_min_nan_groups=True
).fit_transform(df[['a single column']])

# 2: works as expected (min_group_size as an absolute count, normalized)
CountEncoder(
    drop_invariant=True,
    normalize=True,
    min_group_size=3,
    combine_min_nan_groups=True
).fit_transform(df[['a single column']])

# 3: doesn't work, returns an empty DataFrame with the same number of rows
CountEncoder(
    drop_invariant=True,
    normalize=True,
    min_group_size=0.03,
    combine_min_nan_groups=True
).fit_transform(df[['a single column']])
Weirdly enough, I can't reproduce any of it with a simple test DataFrame like the one below, and I wasn't able to pinpoint anything specific about my df column (dtype=object):
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': ['A', 'A', 'A', 'B', 'B', np.nan],
    'b': ['A', 'A', 'B', 'B', 'B', np.nan]
})
Case #3 works when switching drop_invariant from True to False, but that switch doesn't change the results of #1 and #2.
- Conclusion: drop_invariant has an effect on the result, but only with normalize=True and min_group_size given as a fraction (0.03).
This does not make sense to me, but I think it needs to be investigated.
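One thing that might be worth checking during such an investigation is how much smaller the variance of the encoded column becomes once it is normalized. The sketch below uses only the standard library and does not call category_encoders. The `count_encode` helper and the example column are hypothetical, it ignores min_group_size and combine_min_nan_groups, and whether drop_invariant relies on a variance threshold is an assumption that has not been checked against the library source.

```python
from collections import Counter
from statistics import pvariance


def count_encode(values, normalize=False):
    """Replace each value by its count, or by its relative frequency."""
    counts = Counter(values)
    n = len(values)
    return [counts[v] / n if normalize else counts[v] for v in values]


# Hypothetical column: one dominant category plus a few rare ones.
column = ["A"] * 990 + ["B"] * 6 + ["C"] * 4

raw = count_encode(column)                   # raw counts
freq = count_encode(column, normalize=True)  # frequencies in [0, 1]

# Normalizing divides every value by len(column), so the variance
# shrinks by a factor of len(column) ** 2.
print(pvariance(raw))
print(pvariance(freq))
```

If an invariance check compares the column variance against a fixed cutoff, a normalized column could fall below it while the raw counts do not, which would be consistent with the behaviour described above but is not confirmed here.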
As a workaround, I am not using normalize=True and instead apply a StandardScaler to the output.
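A minimal, standard-library sketch of that workaround: count-encode without normalization, then standardize the result to zero mean and unit variance (the same transformation a StandardScaler applies, using the population standard deviation). The `count_encode` and `standardize` helpers are hypothetical stand-ins, not the library's API, and None is used in place of np.nan for the missing value.

```python
from collections import Counter
from statistics import mean, pstdev


def count_encode(values):
    """Replace each value by the number of times it occurs."""
    counts = Counter(values)
    return [counts[v] for v in values]


def standardize(xs):
    """Scale to zero mean and unit variance (population std)."""
    mu = mean(xs)
    sigma = pstdev(xs)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in xs]
    return [(x - mu) / sigma for x in xs]


# Column 'a' from the test DataFrame, with None standing in for NaN.
column = ["A", "A", "A", "B", "B", None]

scaled = standardize(count_encode(column))
print(scaled)
```

In a real pipeline the equivalent would be CountEncoder(normalize=False) followed by a scaler, with the scaler fitted on the training data only.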