Question Regarding precompute_conversion Runtime and Multi-threading #1


Open

QianhuiWan opened this issue May 11, 2025 · 7 comments

@QianhuiWan

Hello, I have been running precompute_conversion for CHM13, but it has taken more than 3 days and has only produced about 4GB of output so far. I expect the total output to be around 30GB.

Is there any parameter available to enable multi-threading for this function?

Thank you so much.
Best regards,
Qianhui

@juanfmacias
Contributor

Hi Qianhui,

Thank you for your interest!

It is indeed a very slow process as currently implemented. There is a parallelized implementation, BUT it is very much not resource friendly. When I wrote this I expected precompute_conversion would be run very rarely, so I allowed it to be pretty slow.
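
For illustration only, here is a minimal sketch of how a per-path precomputation like this could be fanned out across processes with Python's multiprocessing. `convert_path`, the shard naming, and the path list are hypothetical stand-ins, not the actual precompute_conversion internals:

```python
# Rough sketch only (hypothetical, not the actual precompute_conversion
# internals): fan a per-path conversion out across worker processes.
from multiprocessing import Pool

def convert_path(path_name):
    """Placeholder for the expensive per-path conversion step."""
    # Writing one shard per path keeps workers from sharing a file handle.
    out_file = f"{path_name}.precomputed.tsv"
    with open(out_file, "w") as fh:
        fh.write(f"# precomputed conversion for {path_name}\n")
    return out_file

if __name__ == "__main__":
    # Hypothetical work units: one per reference path/chromosome.
    paths = [f"chr{n}" for n in list(range(1, 23)) + ["X", "Y"]]
    # Each worker holds at most one path's data at a time, which bounds peak
    # RAM; raising `processes` trades memory for wall-clock time.
    with Pool(processes=8) as pool:
        shards = pool.map(convert_path, paths)
    print(f"wrote {len(shards)} shards")
```

The memory-hungriness mentioned above is the usual trade-off with this pattern: every extra worker holds its own path's data in RAM at once.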

I am actively working on improving the speed and efficiency. I am hopeful I'll have a much faster release this coming week.

Best,
Juan

@QianhuiWan
Author

Hi Juan, that's very helpful. I think the HPC I'm currently using has a time limit of one week per job. I'll try setting the longest allowed time so the job can finish. Thank you so much!

Best,
Qianhui

@juanfmacias
Contributor

OK, so it is much, much faster now and less resource intensive, both the pre-computation and the later processing of real data. It should now take less than a day. I am putting together the updates and new docs.

What graph are you using exactly?

@QianhuiWan
Author

QianhuiWan commented May 19, 2025

Cool, thank you so much! I am using the hprc_v1_1_mc_chm13.gfa graph genome for now. I noticed that writing to disk becomes slower after the output reaches about 5GB. Could this be due to accumulated memory (RAM) usage?
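
If the slowdown does come from accumulating output in RAM before writing, the difference looks roughly like this minimal sketch; `generate_records` and the file layout are hypothetical stand-ins, not the tool's actual code:

```python
# Minimal sketch contrasting two output patterns; generate_records() is a
# hypothetical stand-in for the conversion's output stream.
def generate_records():
    for i in range(1_000_000):
        yield f"node_{i}\t{i}\n"

def buffered_write(path):
    """Holds every record in RAM and writes at the end; memory grows with
    output size, and heavy paging can make late writes crawl."""
    records = list(generate_records())
    with open(path, "w") as fh:
        fh.writelines(records)

def streaming_write(path):
    """Writes each record as it is produced; memory stays constant and the
    write rate should not degrade as the file grows."""
    with open(path, "w") as fh:
        for record in generate_records():
            fh.write(record)

if __name__ == "__main__":
    streaming_write("precomputed.tsv")
```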

@juanfmacias
Contributor

juanfmacias commented May 27, 2025

OK, so for the sake of practicality, as I am updating the docs, I went ahead and ran the process on hprc_v1_1_mc_chm13, so you can use that output directly without having to regenerate it yourself. I've updated the docs to describe how to generate the files needed to surject real results onto it, and the new method for running that surjection. I am setting up the FTP so you can download the essential file. Let me know if the new docs make sense to you.

@juanfmacias
Contributor

There are some hangups around how best to host these files. While we work that out, if you email me I will send you a Box link where you can directly access the database files you need. My email is at the bottom of the README page.

@QianhuiWan
Author

Great, that would be very helpful. I'm sending you an email now. Thank you!
