handling NA values in the input

Hello,

I have a fairly large gene expression matrix (~20k genes, 2k samples) with, luckily, just a handful of NA values.
At the moment, `affinityprop` simply stops and reports a parsing error if the input contains an NA value anywhere.
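For illustration only, here is a minimal Python sketch (not `affinityprop`'s actual code) of a parser that records NA tokens as missing values instead of failing. It assumes a tab-separated file with a header row of sample names and the gene name in the first field of every other line. `parse_matrix` and the `NA_TOKENS` set are hypothetical names.

```python
import csv
import io
import math

# Hypothetical set of tokens treated as missing; adjust to the actual files.
NA_TOKENS = {"", "NA", "N/A", "NaN", "nan", "null"}


def parse_matrix(text, delimiter="\t"):
    """Parse a delimited expression matrix, mapping NA tokens to None.

    Assumes a header row of column (sample) names and the row (gene) name
    in the first field of every other line.
    Returns (column_names, row_names, rows).
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    row_names, rows = [], []
    for line_no, fields in enumerate(reader, start=2):
        if not fields:  # skip blank lines
            continue
        row_names.append(fields[0])
        values = []
        for token in fields[1:]:
            token = token.strip()
            if token in NA_TOKENS:
                values.append(None)
                continue
            try:
                x = float(token)
            except ValueError:
                # Genuinely malformed input is still reported as an error.
                raise ValueError(f"line {line_no}: cannot parse {token!r}") from None
            # Catch other NaN spellings that float() accepts, e.g. "NAN".
            values.append(None if math.isnan(x) else x)
        rows.append(values)
    return header[1:], row_names, rows
```

Missing cells come back as `None`, so a later step can decide whether to drop, skip, or bail out, while a non-numeric token such as `abc` still raises an error.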

For now I simply dropped the affected rows (fewer than 20 out of 20k) and kept going, so this is by no means urgent.
However, I expect to get data with many more NAs, so figuring out how to handle them better (if possible) is important.

Apart from simply dropping rows (genes, in this case) or columns (patient samples), is there another way to cluster such data?
I am reluctant to replace NAs with, say, the median expression value of the gene, since this may distort the clustering.
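One alternative to both dropping and imputing is to compute each pairwise similarity from only the positions observed in both rows ("pairwise-complete" comparison). R's `dist()` documents this approach for missing values: excluded columns are skipped and the sum is scaled up proportionally to the number of columns used. Below is a minimal Python sketch of that idea, assuming the negative squared Euclidean similarity commonly used with affinity propagation and `None` for missing values. `pairwise_similarity`, `similarity_matrix` and `min_overlap` are hypothetical names; nothing here implies `affinityprop` supports this.

```python
def pairwise_similarity(a, b, min_overlap=1):
    """Negative squared Euclidean distance between rows a and b, using only
    positions where both values are present (None marks a missing value).

    The partial sum is scaled by n / n_used so that pairs with different
    amounts of missing data stay comparable. Returns None if fewer than
    min_overlap positions are shared.
    """
    n = len(a)
    used = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(used) < max(min_overlap, 1):
        return None
    ssd = sum((x - y) ** 2 for x, y in used)
    return -ssd * n / len(used)


def similarity_matrix(rows, min_overlap=1):
    """Symmetric matrix of pairwise similarities.

    The diagonal (the affinity propagation "preference") is left as None,
    to be filled in separately, e.g. with the median similarity.
    """
    k = len(rows)
    s = [[None] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            s[i][j] = s[j][i] = pairwise_similarity(rows[i], rows[j], min_overlap)
    return s
```

The double loop is only for clarity; at 20k rows it would be far too slow, and a vectorised version would be needed. Pairs that share only a few observed positions get noisy similarities, which is what `min_overlap` is meant to guard against.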

Would it be possible to get something like an input parsing log, e.g.:
```
X rows out of Y with at least one NA value
P columns out of Q with at least one NA value

dropping X rows / bailing out if X/Y > some_threshold (10%? 1/3?)
```
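The requested log could look roughly like the following Python sketch. It assumes the matrix has already been parsed into a list of rows with `None` for NA. `na_report` and `max_row_fraction` are hypothetical names, and the 10% default is just the first threshold suggested above.

```python
def na_report(rows, max_row_fraction=0.1):
    """Print an NA summary for a matrix (list of rows, None = missing).

    Returns the indices of the complete rows to keep, or raises
    ValueError if the fraction of incomplete rows exceeds max_row_fraction.
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    bad_rows = [i for i, row in enumerate(rows) if any(v is None for v in row)]
    bad_cols = [j for j in range(n_cols) if any(row[j] is None for row in rows)]
    print(f"{len(bad_rows)} rows out of {n_rows} with at least one NA value")
    print(f"{len(bad_cols)} columns out of {n_cols} with at least one NA value")
    # Bail out if too large a share of rows would have to be dropped.
    if n_rows and len(bad_rows) / n_rows > max_row_fraction:
        raise ValueError(
            f"bailing out: {len(bad_rows)}/{n_rows} rows contain NA "
            f"(threshold {max_row_fraction:.0%})"
        )
    print(f"dropping {len(bad_rows)} rows")
    bad = set(bad_rows)
    return [i for i in range(n_rows) if i not in bad]
```

Returning indices rather than the filtered rows keeps the gene names aligned with whatever subset survives.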
Best wishes,  

Darek Kedra


handling NA values in the input #42
