
error of 'distributed' when running GRNboost on server without internet connection #8

Open
WeiCSong opened this issue Sep 16, 2018 · 1 comment

Comments

@WeiCSong

Hi arboreto author,
I'm trying to run GRNBoost on a supercomputer server that has no internet connection. My code:

import pandas as pd
from arboreto.utils import load_tf_names
from arboreto.algo import grnboost2

# The main guard is required because worker processes re-import this module.
if __name__ == '__main__':
    in_file = '1.1_exprMatrix_filtered_t.txt'
    tf_file = '1.2_inputTFs.txt'
    out_file = 'net1_grn_output.tsv'

    # Expression matrix: tab-separated, one column per gene.
    ex_matrix = pd.read_csv(in_file, sep='\t')
    tf_names = load_tf_names(tf_file)

    network = grnboost2(expression_data=ex_matrix,
                        tf_names=tf_names)
    network.to_csv(out_file, sep='\t', index=False, header=False)

pandas and arboreto were installed successfully before I submitted this job. I got the following error message:

/lustre/home/acct-bmelgn/bmelgn-3/.conda/envs/mypython3/lib/python3.7/site-packages/distributed/utils.py:134: RuntimeWarning: Couldn't detect a suitable IP address for reaching '8.8.8.8', defaulting to '127.0.0.1': [Errno 101] Network is unreachable

I followed the example at https://arboreto.readthedocs.io/en/latest/, which does not import 'distributed'. But the error message seems to say that 'distributed' is trying to connect to the internet. I wonder whether 'distributed' can be avoided when I run GRNBoost. Do you have any suggestions for running arboreto on a server? Thanks for your help.

PS: At first I followed the example at https://arboreto.readthedocs.io/en/latest/examples.html, which does import 'distributed'. Now I use the code listed above (which also comes from your examples), which seems to have nothing to do with 'distributed'.

@tmoerman
Contributor

Hi @goubegou,
thanks for raising this issue. This might indeed be annoying for multiple users.

The current implementation uses distributed even when no explicit Client is specified: implicitly, a Client connected to a LocalCluster instance is used. A while ago, I filed an issue as a reminder for a future improvement: decouple arboreto from distributed and use the dask multiprocessing scheduler on a single node instead.
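To picture the implicit behaviour described above, here is a rough stand-alone sketch. It is not arboreto's or distributed's code: `FakeLocalCluster`, `FakeClient`, `resolve_client` and the scheduler address are hypothetical stand-ins, and the only assumption is that "no client given" means "create a local cluster and connect to it".

```python
class FakeLocalCluster:
    """Hypothetical stand-in for a cluster running on the local machine."""
    address = "127.0.0.1"


class FakeClient:
    """Hypothetical stand-in for a scheduler client."""

    def __init__(self, target):
        self.target = target


def resolve_client(client=None):
    """Return (client, created_implicitly).

    With no client given, a local cluster is created behind the scenes,
    which is why the scheduling library is involved even though the
    calling script never imports it.
    """
    if client is None:
        return FakeClient(FakeLocalCluster()), True
    return client, False


# No client passed: one is created implicitly.
implicit_client, implicit = resolve_client()

# Client passed explicitly (the address is a made-up example).
explicit_client, explicit = resolve_client(FakeClient("tcp://scheduler:8786"))

print(implicit, explicit)  # True False
```

So a script that only calls the inference function can still end up starting cluster machinery in the background.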

Does this error crash the program completely or does it continue despite the error?
I'll look into it when I find the time.
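Note that the message above is reported as a RuntimeWarning, not raised as an exception. The sketch below is not the actual distributed code. It is a simplified, stdlib-only illustration, assuming the probe works by opening a UDP socket toward a public address and falling back to 127.0.0.1 when no route exists. `probe_local_ip` and its parameters are hypothetical names.

```python
import socket
import warnings


def probe_local_ip(target="8.8.8.8", port=80, fallback="127.0.0.1"):
    """Guess the local IP used to reach `target`; warn and fall back on failure."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only selects a route.
        sock.connect((target, port))
        return sock.getsockname()[0]
    except OSError as exc:
        # On an isolated node this branch runs: a warning is issued and the
        # function still returns a usable address instead of raising.
        warnings.warn(
            f"Couldn't detect a suitable IP address for reaching {target!r}, "
            f"defaulting to {fallback!r}: {exc}",
            RuntimeWarning,
        )
        return fallback
    finally:
        sock.close()


# If the warning is only noise on an offline node, it can be filtered by its
# message prefix without hiding other RuntimeWarnings.
with warnings.catch_warnings():
    warnings.filterwarnings(
        "ignore",
        message="Couldn't detect a suitable IP address",
        category=RuntimeWarning,
    )
    ip = probe_local_ip()

print(ip)
```

Under that assumption, falling back to the loopback address should be harmless for a single-node run, since all workers live on the same machine.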

PS: A similar issue has been filed on the distributed GitHub page.
