oNMF call uses max number of cores #12

wmacnair · 2020-11-26T10:20:59Z

Hi

I'm trying to use popalign on a cluster, and finding that I can't stop it from using all available computing power. This is not making me any friends...!

Once I get to the PA.onmf call, every core on the cluster jumps to 100%. This is despite manually setting pop['ncores'] = 8. I've also tried multiple other things without success (see below, where I have thrown in everything I've been able to find)...

Thanks for any help,
Will

import os
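# Cap native thread pools at 8; these are set before numpy, torch, or popalign are imported
os.environ["OMP_NUM_THREADS"]           = "8" # shell equivalent: export OMP_NUM_THREADS=8
os.environ["OPENBLAS_NUM_THREADS"]      = "8" # shell equivalent: export OPENBLAS_NUM_THREADS=8
os.environ["MKL_NUM_THREADS"]           = "8" # shell equivalent: export MKL_NUM_THREADS=8
os.environ["VECLIB_MAXIMUM_THREADS"]    = "8" # shell equivalent: export VECLIB_MAXIMUM_THREADS=8
os.environ["NUMEXPR_NUM_THREADS"]       = "8" # shell equivalent: export NUMEXPR_NUM_THREADS=8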

import popalign as PA
import pickle
import numpy as np
from scipy import io as sio
from multiprocessing import Pool
import torch

torch.set_num_threads(8)

def main(types_list):
    pool        = Pool(processes=8)
    
    save_dir    = 'output/ms27_popalign'
    genes_f     = os.path.join(save_dir, 'oligo_features.tsv')
    samples     = {t: os.path.join(save_dir, t + '.mtx') for t in types_list}

    print('loading samples')
    pop = PA.load_samples(samples=samples, genes=genes_f, outputfolder=save_dir)
    # pop = load_samples_hack(samples=samples, genes=genes_f, outputfolder=save_dir)
    pop['ncores'] = 8
    print('normalizing')
    PA.normalize(pop)
    print('identifying HVGs')
    PA.plot_gene_filter(pop, offset=1.3)
    PA.filter(pop, remove_ribsomal=False, remove_mitochondrial=False)
    print('running oNMF')
    PA.onmf(pop, ncells=5000, nfeats=np.arange(2,20,2).tolist(), nreps=3, niter=200)
    print('running GSEA')
    PA.choose_featureset(pop, alpha = 3, multiplier=3)
    # PA.gsea(pop, geneset='c5bp')

    # save
    print('saving')
    pop_f       = os.path.join(save_dir, 'popalign_obj.p')
    with open(pop_f, "wb") as f:  # close the file handle once the dump finishes
        pickle.dump(pop, f)


sisichen-dev · 2020-11-26T18:33:33Z

Hey Will,

Thanks for letting us know about this issue! I'm going to have to consult someone else about this because I'm not an expert on Python multiprocessing. We've had other issues on AWS servers due to different versions of dependencies, so it could be something specific to your cluster.

Just to let you know, we're on holiday in the US until next week, so I probably won't get a response until then. In the meantime, do you need to run the code on a software cluster, or is it something you can run locally on your machine? How large is your dataset? I can usually run the code on up to 50-100k cells on my MacBook Pro.

Sisi

wmacnair · 2020-11-27T07:52:53Z

Hey Sisi

Thanks for getting back so quickly. In the end I tried it out on the desktop and it ran fine in a few minutes, so there's no real urgency to fixing this, at least from my side. I'm also not familiar with Python multiprocessing, so I was hoping to learn something ;)

Enjoy Thanksgiving!
Will
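
For anyone who lands here with the same symptom: the thread never pins down why PA.onmf ignores the limits, so what follows is only a sketch of a general workaround, not a confirmed fix for popalign. It uses the standard library alone. The helper names `cap_threads` and `pin_to_cores` are made up for illustration. The first repeats the environment-variable approach from the original post; the second restricts the whole process to a fixed set of cores with `os.sched_setaffinity`, which exists only on Linux, so the function returns `None` elsewhere.

```python
import os

# Same thread-count variables as in the original post.
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "VECLIB_MAXIMUM_THREADS",
    "NUMEXPR_NUM_THREADS",
)


def cap_threads(n):
    """Set the thread-count variables; call before importing numpy, torch, etc."""
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n)


def pin_to_cores(n):
    """Restrict this process to at most n of the cores it is currently allowed.

    Returns the resulting set of core ids, or None where the OS does not
    provide os.sched_setaffinity (it is Linux-only).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = sorted(os.sched_getaffinity(0))  # cores this process may use now
    os.sched_setaffinity(0, allowed[:n])       # keep only the first n of them
    return os.sched_getaffinity(0)


cap_threads(8)
cores = pin_to_cores(8)
print("cores in use:", cores)
```

Pinning does not reduce how many threads or worker processes a library starts; it only confines them to the chosen cores. Threads and child processes created afterwards inherit the mask, so the remaining cores on a shared node should stay free. On a managed cluster, the scheduler's own core-request options are usually the better place to set this.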
