Dask error in GRNBoost2 #12

Open
jamesrhowe opened this issue Dec 2, 2018 · 0 comments


@jamesrhowe

Hello,
I incorporated pySCENIC into my workflow a while ago, and it's been working pretty well for me. I last ran it a month or two ago, but when I went to run my code again, I started getting a bunch of Dask errors out of the blue.

```python
if __name__ == '__main__':  # required to run outside a Jupyter notebook

    import os  # used to interface with the operating system on a basic level
    import glob  # finds pathnames matching a Unix shell-style pattern
    import pickle  # allows serialization of objects
    import pandas as pd  # required for data array manipulation

    from dask.diagnostics import ProgressBar  # creates progress bar to check completion
    from distributed import Client, LocalCluster

    from arboreto.utils import load_tf_names
    from arboreto.algo import grnboost2

    from pyscenic.rnkdb import FeatherRankingDatabase as RankingDatabase  # imports cisTarget database metadata
    from pyscenic.utils import modules_from_adjacencies, load_motifs  # creates modules from GENIE3 adjacencies
    from pyscenic.prune import prune2df, df2regulons
    from pyscenic.aucell import aucell

    # load paths for repeatedly invoked files
    motifs_filename = os.path.join("motifs.csv")
    regulons_filename = os.path.join("regulons.p")
    TF_list_filename = os.path.join("mm_tfs.txt")

    # build the TF list from the raw annotation files
    tf_raw = pd.read_csv("TF_import.txt", delimiter="\t",
                         error_bad_lines=False, encoding="ISO-8859-1")
    # keep unique, non-missing TF annotations
    tfs = tf_raw[["Gene ID", "Evidence Strength"]].drop_duplicates().dropna()
    tfs["ID"] = list(map(int, tfs["Gene ID"]))
    conv_tfs = pd.read_csv("TF_conversion.txt", delimiter="\t")

    def extract_symbol(name):
        # return the text between the last '(' and the last ')'
        s_idx = name.rfind('(')
        e_idx = name.rfind(')')
        return name[s_idx + 1:e_idx]

    # write one symbol per line, with no header row
    conv_tfs["Gene Name"].apply(extract_symbol).to_csv(TF_list_filename, index=False, header=False)
    tf_names = load_tf_names(TF_list_filename)

    # transpose so rows are cells and columns are genes (the layout arboreto expects)
    ex_matrix = pd.read_csv("GENIE3_import.csv", header=0, index_col=0).T

    databases_glob = os.path.join("mm10__*.feather")
    db_fnames = glob.glob(databases_glob)

    def name(fname):
        # database name = file name up to the first '.'
        return os.path.basename(fname).split(".")[0]

    dbs = [RankingDatabase(fname=fname, name=name(fname)) for fname in db_fnames]

    client = Client(LocalCluster())

    adjacencies = grnboost2(ex_matrix, tf_names=tf_names, verbose=True, client_or_address=client)
    modules = list(modules_from_adjacencies(adjacencies, ex_matrix))
```
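A side note on the `extract_symbol` helper in the script: if a "Gene Name" value has no parentheses, both `rfind` calls return `-1`, so `name[s_idx+1:e_idx]` becomes `name[0:-1]` and silently drops the last character (e.g. `"Sox2"` becomes `"Sox"`). The sketch below is a more defensive variant, assuming the column holds values shaped like `Full name (Symbol)`; that format is inferred from the parsing logic, and the example strings are hypothetical.

```python
def extract_symbol(name: str) -> str:
    """Return the text inside the last '(...)' pair, or the whole name if there is none.

    Assumes values look like 'Full gene name (Symbol)' (inferred, hypothetical format).
    """
    s_idx = name.rfind("(")
    e_idx = name.rfind(")")
    # No usable parenthesised suffix: fall back to the stripped input
    # instead of slicing name[0:-1] and losing the last character.
    if s_idx == -1 or e_idx <= s_idx:
        return name.strip()
    return name[s_idx + 1:e_idx].strip()


if __name__ == "__main__":
    print(extract_symbol("POU domain, class 5, transcription factor 1 (Pou5f1)"))
    print(extract_symbol("Sox2"))
```

Because it keeps the same name and signature, it can be dropped into the script in place of the original function.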

And then I get this error upon running it:

    adjacencies = grnboost2(ex_matrix, tf_names = tf_names, verbose = True, client_or_address=client)
    preparing dask client
    parsing input
    creating dask graph
    ~/site-packages/arboreto/algo.py:214: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
      expression_matrix = expression_data.as_matrix()
    12 partitions
    computing dask graph
    distributed.protocol.core - CRITICAL - Failed to deserialize

I don't know much about Dask. Do you have any idea what could be causing these errors?
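Since the same script ran without errors a month or two earlier, one plausible first step (not a confirmed fix) is to record which versions of the packages involved are installed now, so a change in the environment can be ruled in or out. The sketch below assumes Python 3.8+ for `importlib.metadata`; the helper name `installed_versions` and the package list are illustrative choices. With a running client, `distributed`'s `Client.get_versions(check=True)` can additionally compare the versions seen by the client, scheduler, and workers.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Packages touched by the failing call (illustrative list, adjust as needed).
DEFAULT_PACKAGES = ("dask", "distributed", "pandas", "arboreto", "pyscenic")


def installed_versions(packages: Iterable[str] = DEFAULT_PACKAGES) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version string, or None if it is not installed."""
    versions: Dict[str, Optional[str]] = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = None
    return versions


if __name__ == "__main__":
    for pkg, ver in installed_versions().items():
        print(f"{pkg:12s} {ver or 'not installed'}")
```

Pasting this output into the issue makes it easier to compare against the versions that worked previously.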
