
handling NA values in the input #42

@darked89


Hello,

I have a fairly large gene expression matrix (~20k genes, 2k samples) with luckily just a handful of NA values.
At the moment affinityprop simply stops with a parsing error if the input contains an NA value anywhere.

I just dropped a small number (<20) of the 20k rows and kept going, so this is not urgent by any means.
At the same time, I expect to get data with far more NAs, so figuring out how to handle these better (if possible) is important.

Apart from just dropping either rows (genes at this point) or columns (patient data), is there some other way of clustering such data?
I am reluctant to replace NAs with, say, the median expression value for the gene, since this may interfere with the clustering.
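
To make the question concrete, here is a minimal sketch of one alternative to dropping or imputing: compare two rows using only the positions where both have a value. This is only an illustration, not something affinityprop is known to do; the function name `masked_neg_sq_distance` and the use of `None` for a missing value are assumptions made for the example. Averaging over the shared positions keeps scores comparable between pairs with different numbers of NAs, which is a design choice and not an established recommendation.

```python
def masked_neg_sq_distance(a, b):
    """Negative mean squared difference over positions observed in both rows.

    Missing values are represented as None (an assumption of this sketch).
    Returns None when the two rows share no observed positions.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    return -sum((x - y) ** 2 for x, y in shared) / len(shared)


# Hypothetical expression vectors for two genes, with one NA in the first.
gene_a = [1.0, None, 3.0]
gene_b = [2.0, 5.0, 5.0]
similarity = masked_neg_sq_distance(gene_a, gene_b)
```

Whether such pairwise scores could be fed into the clustering as a precomputed similarity matrix is a separate question that this sketch does not settle.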

Would it be possible to get something like an input data parsing log:

X rows out of Y with at least one NA value
P columns out of Q with at least one NA value

dropping X rows / bailing out if X/Y > some_threshold (10%? 1/3?)
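
The log and threshold described above could be sketched roughly as follows, as a preprocessing step outside affinityprop. This is a hypothetical illustration: the names `na_report` and `drop_na_rows`, the set of NA markers, and the default threshold of 10% are all assumptions, and rows are taken to be lists of already-split string tokens of equal length.

```python
NA_TOKENS = {"NA", "NaN", "nan", ""}  # assumed NA markers, adjust as needed


def is_na(token):
    """True if the token is one of the assumed NA markers."""
    return token.strip() in NA_TOKENS


def na_report(rows):
    """Return (rows_with_na, n_rows, cols_with_na, n_cols) for a table of tokens."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    rows_with_na = sum(1 for row in rows if any(is_na(t) for t in row))
    cols_with_na = sum(
        1 for j in range(n_cols) if any(is_na(row[j]) for row in rows)
    )
    return rows_with_na, n_rows, cols_with_na, n_cols


def drop_na_rows(rows, max_fraction=0.10):
    """Print the NA summary, then drop rows with NAs or bail out above the threshold."""
    rows_with_na, n_rows, cols_with_na, n_cols = na_report(rows)
    print(f"{rows_with_na} rows out of {n_rows} with at least one NA value")
    print(f"{cols_with_na} columns out of {n_cols} with at least one NA value")
    if n_rows and rows_with_na / n_rows > max_fraction:
        raise ValueError(
            f"{rows_with_na}/{n_rows} rows contain NA, above threshold {max_fraction}"
        )
    return [row for row in rows if not any(is_na(t) for t in row)]


# Tiny made-up table: 4 rows, 3 columns, NAs in two rows and two columns.
table = [
    ["1", "2", "3"],
    ["4", "NA", "6"],
    ["7", "8", "9"],
    ["NA", "1", "2"],
]
clean = drop_na_rows(table, max_fraction=0.5)
```

With the threshold at 0.5 the two affected rows are dropped; with the default of 0.10 the same table would raise instead.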

Best wishes,

Darek Kedra

Labels: enhancement (new feature or request)