High memory use when using Python and threads #855
Comments
@lh3 I or might be able to spare some time to dig through the Cython (though @marcus1487 is more of a Cython person than me). valgrind gave me quite a bit of noise when I quickly ran it yesterday.
I've looked at this a little today. If I modify the program to not reuse the
@lh3 Am I correct in thinking the minimap2 program does not use persistent
Sorry, I don't use Python threads and I don't know how Python threads handle global and thread-local memory. Anyway, a
Actually, a ThreadBuffer may shrink. The following block means if the size of the buffer is larger than
Lines 367 to 378 in 06fedaa
After a bit more prodding from both myself and @jts, I'm fairly well convinced that the high memory use I've observed is simply an accumulation in the size of the thread buffer, nothing untoward in Python or Cython. I do occasionally see sizeable (i.e. 1 GB) deallocations. If the example Python program is changed to periodically use a new ThreadBuffer in each thread, or not pass one to
The part that I am still perplexed by is why this happens in the Python program but not in minimap2 when applied to the same dataset. My theory is that it might simply come down to how work is processed by the thread pools in the two cases, and therefore how often the allocation cap is hit and the thread buffer reset.
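The idea of periodically using a new ThreadBuffer in each thread can be sketched as below. This is only an illustration, not code from align.py or bonito: the class name `RenewingBuffer`, the `factory` argument, and the `max_uses` threshold are all hypothetical, and the right renewal interval would have to be found by experiment.

```python
import threading


class RenewingBuffer:
    """Hand out one buffer per thread, replacing it after `max_uses` requests.

    `factory` is any zero-argument callable that builds a fresh buffer
    (hypothetical helper; the threshold is an assumption, not a tuned value).
    """

    def __init__(self, factory, max_uses=1000):
        self._factory = factory
        self._max_uses = max_uses
        self._local = threading.local()  # separate state for every thread

    def get(self):
        state = self._local
        # Create a buffer on first use in this thread, or once the
        # current one has been handed out `max_uses` times.
        if getattr(state, "buf", None) is None or state.uses >= self._max_uses:
            state.buf = self._factory()
            state.uses = 0
        state.uses += 1
        return state.buf
```

With mappy this would presumably be used as `pool = RenewingBuffer(mappy.ThreadBuffer)` followed by `aligner.map(seq, buf=pool.get())` inside each worker. The old buffer's memory can only be released once nothing else holds a reference to it.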
After studying things more, I'm relatively well satisfied that in a sense this is the intended behaviour of the code and not a bug per se. (I will change the title of this issue to reflect this.) I have datasets where for aligning HG002 reads to GRCh38 and using the
(using minimap2 v2.27: `minimap2 -t 64 -a -x map-ont grch38.fastq.gz reads.fastq.gz`)
If I disable use of kalloc I see much more stable memory usage, and no loss in performance. This raises the question: when does the use of kalloc outperform vanilla use of malloc in minimap2?
Malloc performance is system dependent. When minimap2 was developed in 2018, kalloc gave a considerable performance improvement, on our server, over glibc (CentOS 6), musl and rpmalloc, and a minor improvement over tcmalloc and jemalloc. Similarly for bwa-mem, some users and I observed a large performance increase with tcmalloc, but other users didn't see this. Minimap2 does frequent heap allocation per read and across threads, and allocators are usually sensitive to this pattern. It is safer to enable kalloc for consistent performance across systems.
One thing I may try is to reset kalloc much more frequently, for example, reset per million query bases. The resetting logic is currently implemented here:
Lines 362 to 373 in 9b0ff24
Resetting for every read would look like:

    if (b->km) {
        km_destroy(b->km);
        b->km = km_init();
    }
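For Python callers, a rough analogue of "reset per million query bases" is to replace the buffer once a budget of query bases has been consumed. The sketch below is not minimap2's C logic: `BaseBudgetBuffer`, `factory`, and `max_bases` are hypothetical names, and it assumes one instance is owned by a single worker thread.

```python
class BaseBudgetBuffer:
    """Replace the buffer after roughly `max_bases` query bases.

    Hypothetical helper: `factory` is a zero-argument callable that
    returns a fresh buffer; one instance should belong to one thread.
    """

    def __init__(self, factory, max_bases=1_000_000):
        self._factory = factory
        self._max_bases = max_bases
        self._buf = factory()
        self._bases = 0  # bases processed with the current buffer

    def for_query(self, seq):
        # Swap in a new buffer once the budget has been spent.
        if self._bases >= self._max_bases:
            self._buf = self._factory()
            self._bases = 0
        self._bases += len(seq)
        return self._buf
```

A budget in bases rather than reads follows the suggestion above more closely, since long reads use up the budget faster than short ones.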
Looking at the source code, I realized another way to control kalloc resetting frequency is to add
For what it's worth, I'm using:
so nothing blazingly new, but not terribly crusty either. Maybe I'll spend my evening going into the weeds of glibc changes. I've got a few experiments running including setting
By the way, I noticed that the define
Setting
Tomorrow I may look at the Python code to see if it can be made to expose these options.
The program align.py uses mappy to align reads in Python using multiple worker threads. After loading the index the memory usage jumps up quickly to >20 GB and then continues to climb steadily through 40 GB and beyond.
This issue was first discovered in bonito and isolated to mappy. The data flow in the example mirrors that in bonito but reduced to using only Python stdlib functionality.
mappy: v2.24
pysam: v0.18 (just for optionally reading fastq inputs)
python: v3.8.6
Run the program, creating query sequences from the index on the fly
or using a directory containing *.fastq* files:
The inputs I am using are available in the AWS S3 bucket at:
I've not fully ascertained if using lots of threads exacerbates the problem or simply makes the symptom apparent more quickly.
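To compare runs with different thread counts, it may help to log the process's peak memory from inside the program. The helper below is a minimal sketch using only the stdlib `resource` module (Unix only). The function name `peak_rss_mib` is made up for illustration and is not part of align.py.

```python
import resource
import sys


def peak_rss_mib():
    """Return the peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor
```

Calling this every few thousand reads shows how quickly the high-water mark rises. Because it is a peak value, it never decreases, so it will not show later deallocations.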