Description
Some features contain mostly numeric values, except for a few entries that contain a number followed by a string, e.g. "3 days" or "4 years". Technically we can still use these entries if we split them and keep only the number.
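As a rough sketch of the splitting idea, the helper below pulls the leading number out of an entry such as "3 days". The function name `leading_number` and the regular expression are assumptions made for illustration, not code from this project, and the sketch assumes the number always comes first in the entry.

```python
import re
from typing import Optional

# Optional sign, digits, and an optional decimal part at the start of the text.
_LEADING_NUMBER = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)")


def leading_number(value: object) -> Optional[float]:
    """Return the number at the start of `value`, or None if there is none.

    Hypothetical helper: "3 days" -> 3.0, "4 years" -> 4.0.
    """
    if value is None:
        return None
    match = _LEADING_NUMBER.match(str(value))
    return float(match.group(1)) if match else None
```

Entries with no leading number come back as `None`, so they can still be reviewed by hand rather than being silently converted.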
We have a list of features that need to be "fixed" manually. The way we are doing this is to convert each column to numbers and then check what the entries that became NULL were originally. If we can make use of the column, we fix those entries or remove them; otherwise we drop the entire column.
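One way to run this check, assuming the data is held in pandas, is sketched below. The function name `coercion_failures` and the example values are made up for illustration; only the column name comes from this issue.

```python
import pandas as pd


def coercion_failures(column: pd.Series) -> pd.Series:
    """Return the original entries that were present but failed numeric conversion."""
    # errors="coerce" turns anything that cannot be parsed into NaN.
    numeric = pd.to_numeric(column, errors="coerce")
    # Keep only entries that became NaN but were not missing to begin with.
    failed = numeric.isna() & column.notna()
    return column[failed]


# Made-up values for illustration only.
example = pd.Series(
    ["3", "4 years", None, "7.5", "3 days"],
    name="ices_totalyearsused",
    dtype=object,
)
failures = coercion_failures(example)
print(failures.tolist())
```

Looking at the returned entries for each column should make it easier to decide whether they can be fixed, should be removed, or whether the column should be dropped.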
Here is the list of features, in the format (column, number of NULL values):
('ices_daysfriendsconflict', 374),
('ices_daysrelativesconflict', 246),
('ices_regularuse', 203),
('ices_totalyearsused', 151),
('ices_osathour', 99),
('ices_currentdosetime', 63),
('ices_dosetime3', 58),
('ices_dosetime6', 49),
('ices_mealhour', 32),
('ices_weight', 14),
('ices_dosetime9', 13),
('ices_timessex', 12),
('ices_height', 8),
('ices_dosetime12', 7),
('ices_lengthosat', 6),
('ices_selldrug', 6),
('ices_ofvehicleday', 5),
('ices_numsex', 4),
('ices_formunempl', 4),
('ices_propertyday', 4),
('ices_fromvehicleday', 4),
('ices_length12', 4),
('ices_numpaidwork', 3),
('ices_b5cocaine', 2),
('ices_b7amph', 2),
('ices_length3', 2),
('ices_length9', 2),
('ices_b2heroin', 1),
('ices_b4benzo', 1),
('ices_b6crack', 1),
('ices_b8cannabis', 1),
('ices_length6', 1)