-
Notifications
You must be signed in to change notification settings - Fork 1
Weeks_1and2
(source: http://learnyouahaskell.com,
cc Miran Lipovača)
19th of May: official start of Google Summer of Code!
Hey ho, lets go!
#First Week
In my original timeline, this is the milestone for the first two weeks:
Milestone 1: Have a complete analysis of the program’s time and space profiling statistics.
##Work To Do
Analysis of time and space consumption of the program (Profiling) to see which parts of the code need the most time and memory. When those parts will be found, think about a possibility to change the code in a way that the time and space consumption will be improved.
##Work Done
I did some profiling runs to see the productivity of the program, the time spend for garbage collection and allocation rate. Additionally, a detailed profiling run shows these information for each of the methods separately. Here, you can see a summary of the results.
Overview:
INIT time 0.00s ( 0.00s elapsed)
MUT time 699.10s (729.01s elapsed)
GC time 1169.81s (1329.56s elapsed)
RP time 0.00s ( 0.00s elapsed)
PROF time 0.00s ( 0.00s elapsed)
EXIT time 0.08s ( 1.59s elapsed)
Total time 1868.99s (2060.17s elapsed)
%GC time 62.6% (64.5% elapsed)
Alloc rate 1,694,144,469 bytes per MUT second
Productivity 37.4% of total user, 33.9% of total elapsed
It can be seen that the program uses most of the time for garbage collection which lowers its productivity a lot. Thus, the amount of memory which is allocated during runtime needs to be reduced.
Detailed statistic:
transalign_prof +RTS -p -RTS 452.xml U50_vs_SP.xml
total time = 753.48 secs (753482 ticks @ 1000 us, 1 processor)
total alloc = 782,868,820,008 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
group_al'''.merge Align 39.0 43.7
trans_align Align 38.4 9.7
group_al'''.toMap Align 8.3 26.7
group_al''' Align 6.3 6.0
readAlignmentCache BlastCache 1.6 5.0
group_al'''.toList Align 1.1 1.3
merge1 Align 1.1 1.6
readAlignments Blast 0.9 1.9
readAlignmentCache.unconvert BlastCache 0.4 1.3
In the detailed analysis, a list is given which shows the methods having a large time and space consumption. In this way, I see which methods have to be changed. There are some more important methods which should be changed, but they are not listed in this summary.
#Second Week
In my original timeline, this is the milestone for the first two weeks:
Milestone 1: Have a complete analysis of the program’s time and space profiling statistics.
##Work To Do
I will look at the methods which need the largest amounts of time and space. I will think about how to change those methods or if it is necessary to rewrite them.
##Work Done
I did some space profiling statistics and plots to see which methods have the highest space consumption. In the space profiling statistics, it could be seen that a large part of the allocation is due to lists. That's why I also did a tracing to see the size of the lists.
In the folling subsections you can see plots for the standard profiling showing different methods and a profiling plot showing which constructors allocate the highest amount of space. Afterwards, you see the summary of the tracing.
At the top of the plots, you can see which compiler flags I used to generate those statistics:
- +RTS: linkage with a non-trivial runtime system
- -p: profiling
- -K100M: limitation of the stack space
- -hc: standard heap profiling
- -hd: profiling showing allocation by constructors
The plots were done using the Haskell package hp2pretty. As an input, hp2pretty takes the output file of the profiling run, the .hp file. See also the references for some more information.
###Standard heap profiling
If you look at the keys on the right, it can be seen that the first, second and fourth entry is due to reading the input files. The third entry and all others from the fifth entry on are from the 'real' program, mostly from the methods trans_align and group_al'''.
###Profiling showing the allocating constructors The keys on the right shows as a first entry the list constructor since lists are used very often. The second and third entry are due to reading the input. The following two entry occur in the transalign program as positions of the alignment.
###Tracing
The current version of transalign uses sparse matrices for the alignment algorithm. As the time and space profiling showed, lists are one of the causes for high time and space consumption. Tracing shows the sizes of the lists while the program is running.
One of the traces is done in the group_al''' method. This method uses lists to group together positions in the alignment. The output of the tracing is a table showing for each call of the group_al''' method the sizes of the lists. The list has the type [Pos,[(Pos,Score)]]. In the first column, the length of the outer list is given. The second column show the sum of the lengths of the inner lists. In this way, a small number would show a sparsa matrix, a high number shows that there are many lists which are not empty.
The table shows a summary of the output table done with R. The average values are 1829 and 164982 which indicates that the matrix is not a sparse matrix.
length y sum $ map (length snd) y
Min. : 5 Min. : 5
1st Qu.: 448 1st Qu.: 1351
Median :1881 Median : 5336
Mean :1829 Mean : 164982
3rd Qu.:2967 3rd Qu.: 31296
Max. :3954 Max. :11056220
The second table gives an analysis of the output of the transalign program started with a test set. This test set was also used for the profiling. Again, the summary is done with R.
The output of the program is a table showing the scores, lengths and positions of the aligned sequences of query and target (these columns are not shown here). p0, pn, q0 and qn are the start and end positions on both sequences. For this test set alignments with an average length of 412 were found.
score length score/length
Min. : 0.79 Min. : 5.0 Min. :0.0800
1st Qu.: 48.23 1st Qu.: 169.5 1st Qu.:0.1900
Median : 89.06 Median : 343.0 Median :0.2800
Mean : 123.25 Mean : 412.4 Mean :0.2987
3rd Qu.: 131.25 3rd Qu.: 503.5 3rd Qu.:0.3900
Max. :1369.80 Max. :2634.0 Max. :1.2200
p0 pn q0 qn
Min. : 1 Min. : 2458 Min. : 1.0 Min. : 20.0
1st Qu.: 4546 1st Qu.: 8524 1st Qu.: 47.0 1st Qu.: 490.5
Median : 7825 Median :10792 Median : 121.0 Median : 673.0
Mean : 7567 Mean :10353 Mean : 281.1 Mean : 1292.6
3rd Qu.:10231 3rd Qu.:12646 3rd Qu.: 292.5 3rd Qu.: 1255.5
Max. :12856 Max. :12913 Max. :2732.0 Max. :35213.0
#Next Weeks
Here, you can find the entry for the next two weeks.
#References
Profiling
Compiler flags for profiling
hp2pretty
Back to main page.