Dask error in GRNBoost2 #12

jamesrhowe · 2018-12-02T01:11:44Z

Hello,
I incorporated pySCENIC into my workflow a while ago, and it's been working pretty well for me. I last used it a month or two ago, but when I ran my code again this time, I suddenly started getting a bunch of Dask errors.

```python
if __name__ == '__main__':  # required to run outside a Jupyter notebook

    import os  # used to interface with the operating system on a basic level
    import glob  # finds pathnames matching UNIX-style patterns
    import pickle  # allows serialization of objects
    import pandas as pd  # required for data array manipulation

    from dask.diagnostics import ProgressBar  # creates a progress bar to check completion
    from distributed import Client, LocalCluster

    from arboreto.utils import load_tf_names
    from arboreto.algo import grnboost2

    from pyscenic.rnkdb import FeatherRankingDatabase as RankingDatabase  # imports cisTarget database metadata
    from pyscenic.utils import modules_from_adjacencies, load_motifs  # creates modules from GENIE3 adjacencies
    from pyscenic.prune import prune2df, df2regulons
    from pyscenic.aucell import aucell

    # load paths for repeatedly invoked files
    motifs_filename = os.path.join("motifs.csv")
    regulons_filename = os.path.join("regulons.p")
    TF_list_filename = os.path.join("mm_tfs.txt")

    # create the TF list if not made yet
    tf_raw = pd.read_csv("TF_import.txt", delimiter="\t",
                         error_bad_lines=False, encoding="ISO-8859-1")
    tfs = tf_raw[["Gene ID", "Evidence Strength"]].drop_duplicates().dropna()  # drop duplicate and missing annotations
    tfs["ID"] = list(map(int, tfs["Gene ID"]))
    conv_tfs = pd.read_csv("TF_conversion.txt", delimiter="\t")

    def extract_symbol(name):
        # return the text between the last '(' and the last ')', i.e. the gene symbol
        s_idx = name.rfind('(')
        e_idx = name.rfind(')')
        return name[s_idx + 1:e_idx]

    # write one symbol per line, without a header row, so load_tf_names reads only TF names
    conv_tfs["Gene Name"].apply(extract_symbol).to_csv(TF_list_filename, index=False, header=False)
    tf_names = load_tf_names(TF_list_filename)
    ex_matrix = pd.read_csv("GENIE3_import.csv", header=0, index_col=0).T  # cells x genes
    databases_glob = os.path.join("mm10__*.feather")
    db_fnames = glob.glob(databases_glob)

    def name(fname):
        # database name = file name without its directory and extensions
        return os.path.basename(fname).split(".")[0]

    dbs = [RankingDatabase(fname=fname, name=name(fname)) for fname in db_fnames]

    client = Client(LocalCluster())

    adjacencies = grnboost2(ex_matrix, tf_names=tf_names, verbose=True, client_or_address=client)
    modules = list(modules_from_adjacencies(adjacencies, ex_matrix))
```
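The two small helpers in the script are pure string manipulation, so they can be checked in isolation, without Dask. The sketch below copies `extract_symbol` and `name` from the script; the sample strings are hypothetical inputs, since the actual layout of `TF_conversion.txt` and of the database file names is an assumption here.

```python
import os


def extract_symbol(name):
    # Copied from the script: text between the last '(' and the last ')'.
    s_idx = name.rfind('(')
    e_idx = name.rfind(')')
    return name[s_idx + 1:e_idx]


def name(fname):
    # Copied from the script: file name without its directory and
    # without everything after the first '.'.
    return os.path.basename(fname).split(".")[0]


# Hypothetical inputs, for illustration only.
print(extract_symbol("zinc finger protein 57 (Zfp57)"))  # Zfp57
print(name("/data/mm10__refseq-r80__10kb_up_and_down_tss.mc9nr.feather"))
# mm10__refseq-r80__10kb_up_and_down_tss
```

Note that a name without parentheses is not rejected: both `rfind` calls return -1, so the slice becomes `name[0:-1]` and silently drops the last character, which would put a truncated symbol into `mm_tfs.txt`.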

And then I get this error upon running it:

```
adjacencies = grnboost2(ex_matrix, tf_names = tf_names, verbose = True, client_or_address=client)
preparing dask client
parsing input
creating dask graph
~/site-packages/arboreto/algo.py:214: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  expression_matrix = expression_data.as_matrix()
12 partitions
computing dask graph
distributed.protocol.core - CRITICAL - Failed to deserialize
```
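The `FutureWarning` in the log is only a warning and does not by itself stop execution: it says that `DataFrame.as_matrix()`, which arboreto calls at `algo.py:214`, is deprecated in favour of `.values`. In current pandas, `as_matrix` has been removed altogether and `.to_numpy()` is the recommended spelling. A minimal sketch of the equivalent conversion, on a small made-up cells × genes frame (the gene names and values are hypothetical):

```python
import numpy as np
import pandas as pd

# Made-up expression matrix: rows are cells, columns are genes.
ex_matrix = pd.DataFrame(
    [[0.0, 1.5, 3.0], [2.0, 0.0, 0.5]],
    index=["cell_1", "cell_2"],
    columns=["Sox2", "Pou5f1", "Nanog"],
)

# Replacement for the deprecated ex_matrix.as_matrix():
expression_matrix = ex_matrix.to_numpy()  # or ex_matrix.values on older pandas

print(type(expression_matrix).__name__, expression_matrix.shape)  # ndarray (2, 3)
```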

I don't know much about Dask. Do you have an idea what could be causing these errors?
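Because the same script worked a month or two earlier, one plausible but unconfirmed explanation is that a package upgrade changed the environment underneath it. A quick way to capture the installed versions of the packages involved, so they can be included in the report, is sketched below. It uses only the standard-library `importlib.metadata` (Python 3.8+); the package list is an assumption and can be extended.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages involved in the failing call; this list is an assumption, extend as needed.
PACKAGES = ["dask", "distributed", "cloudpickle", "msgpack", "pandas", "arboreto", "pyscenic"]


def collect_versions(packages):
    """Return {package: installed version string, or None if not installed}."""
    found = {}
    for pkg in packages:
        try:
            found[pkg] = version(pkg)
        except PackageNotFoundError:
            found[pkg] = None
    return found


if __name__ == "__main__":
    for pkg, ver in collect_versions(PACKAGES).items():
        print(f"{pkg:12} {ver or 'not installed'}")
```

Pasting this output into the issue lets maintainers compare it against the combinations of `dask`, `distributed`, and `arboreto` known to work together.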


bramvds mentioned this issue on Dec 4, 2018, in aertslab/pySCENIC#28, "distributed.core - ERROR" (open).
