Skip to content

Weeks_10and11

Michal J. Gajda edited this page Oct 28, 2020 · 3 revisions
alternate text

(source: http://learnyouahaskell.com,
cc Miran Lipovača)

#10th Week

By now, the current version of the program is faster and need much less memory than the original version. To be able to run this program on a standard computer, it is necessary that the memory consumption decreases a bit more.

##Work To Do For the purpose of being able to run the program on a standard machine, Judy arrays will be included in the program. Judy arrays are a high performance data structure with low memory consumption and they are sorted by default, since the transalign program requires the positions in the alignment to be sorted.

##Work Done

Judy arrays are a time- and memory-efficient data structure. Its elements are sorted by default. This is an important point for transalign since when filling the dynamic programming matrix, the elements need to be sorted.
One of the least efficient parts of transalign was the group_al''' method. Originally, group_al''' was implemented using an IntMap. To make this part more efficient, the IntMap was first replaced by a HashMap during the 9th week. During the last week, Judy arrays were implemented in the code replacing the HashMap. This caused a significant improvement in the running time of the program, since the running time decreased from a total of 1704.68 seconds to 724.29 seconds in total. Also the memory consumption decreased from around 34 GB to around 30 GB. This can be seen in the following profiling statistic and the time and space profiling plots.

###Profiling statistics and plots

1,110,112,107,088 bytes allocated in the heap  
125,637,745,960 bytes copied during GC  
9,567,909,120 bytes maximum residency (29 sample(s))  
4,740,818,072 bytes maximum slop  
30622 MB total memory in use (0 MB lost due to fragmentation)  

                                Tot time (elapsed)  Avg pause  Max pause  
Gen  0     265443 colls,     0 par   120.74s   127.53s     0.0005s    0.6264s  
Gen  1        29 colls,     0 par   22.93s   37.35s     1.2880s    7.4612s  

TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)  

SPARKS: 327 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 327 fizzled)  

INIT    time    0.00s  (  0.00s elapsed)  
MUT     time  580.59s  (590.03s elapsed)  
GC      time  143.67s  (164.88s elapsed)  
RP      time    0.00s  (  0.00s elapsed)  
PROF    time    0.00s  (  0.00s elapsed)  
EXIT    time    0.02s  (  0.02s elapsed)  
Total   time  724.29s  (754.93s elapsed)  

Alloc rate    1,912,044,677 bytes per MUT second  

Productivity  80.2% of total user, 76.9% of total elapsed  
alternate text
alternate text

#11th Week

The running time of transalign reached an acceptable value. The memory consumption could be decreased further to be able to use the program on a standard computer.

##Work To Do

Work a bit more on the memory consumption and finally work on the parallelization.

##Work Done

This week, I worked on the parallelization. I tried to include parMap, parBuffer and parListChunk at different points in the program. I got many different result which could never increase the memory usage and just sometimes had a better running time than the current version. Since our aim is to increase the memory consumption significantly, I decided not to parallelize the program right now. There might be some possibilities but the program is rather complex.

I included parBuffer in merge_aligns, since it has the lowest memory consumtion and an acceptable low running time. Since we focus on the memory consumption, parBuffer is the best solution.

alternate text

#Next Weeks

Here, you can find the entry for the next two weeks.

#References Judy arrays on Wikipedia
Judy arrays on hackage

Back to main page.
Previous weeks.

Clone this wiki locally