Hello,
I have a fairly large gene expression matrix (~20k genes, 2k samples) with, luckily, just a handful of NA values.
At the moment, affinityprop just stops and reports a parsing error if the input contains an NA value anywhere.
I dropped the small number of affected rows (<20 out of 20k) and kept going, so this is not urgent by any means.
At the same time, I expect to get data with far more NAs, so figuring out how to handle them better (if possible) is important.
Apart from just dropping either rows (genes at this point) or columns (patient data), is there some other way of clustering such data?
I am reluctant to replace NAs with, say, the median expression value for the gene, since this may interfere with the clustering.
Would it be possible to get something like an input data parsing log:
X rows out of Y with at least one NA value
P columns out of Q with at least one NA value
dropping X rows / bailing out if X/Y > some_threshold (10%? 1/3?)
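
To make the request concrete, here is a minimal sketch of the kind of check I have in mind, written in plain Python. It is only an illustration of the proposed log, not how affinityprop parses its input; the names `parse_cell`, `summarize_missing` and `max_row_fraction` are made up for this example, and the 10% default is just one of the thresholds suggested above.

```python
import math


def parse_cell(text):
    """Convert one text field to a float; treat 'NA' or an empty field as missing."""
    text = text.strip()
    if text == "" or text.upper() == "NA":
        return math.nan
    return float(text)


def summarize_missing(rows, max_row_fraction=0.10):
    """Log NA counts, then drop rows with NAs or bail out above the threshold.

    rows: list of equal-length lists of floats (math.nan marks a missing value).
    max_row_fraction: hypothetical threshold on the share of rows with NAs.
    Returns (kept_rows, log_lines).
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0

    # Indices of rows and columns that contain at least one missing value.
    bad_rows = [i for i, row in enumerate(rows)
                if any(math.isnan(v) for v in row)]
    bad_cols = [j for j in range(n_cols)
                if any(math.isnan(row[j]) for row in rows)]

    log = [
        f"{len(bad_rows)} rows out of {n_rows} with at least one NA value",
        f"{len(bad_cols)} columns out of {n_cols} with at least one NA value",
    ]

    # Bail out if too large a share of the rows would have to be dropped.
    if n_rows and len(bad_rows) / n_rows > max_row_fraction:
        raise ValueError(
            f"{len(bad_rows)}/{n_rows} rows contain NA values, "
            f"above the threshold of {max_row_fraction}"
        )

    bad = set(bad_rows)
    kept = [row for i, row in enumerate(rows) if i not in bad]
    log.append(f"dropping {len(bad_rows)} rows")
    return kept, log
```

Calling `summarize_missing(matrix, max_row_fraction=1/3)` on a parsed matrix would return the rows without NAs together with the log lines, or raise an error if more than a third of the rows are affected.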
Best wishes,
Darek Kedra