Performance enhancements of conditional logit #81
In the past, I've used Pylogit (specifically the `MNL` model) on a large dataset of 200 million rows. I have noticed two bottlenecks:

1. `weights_per_obs` is not always kept sparse, causing a 200 mln x 200 mln dense numpy array to be created; see also issue "Sparse to Dense" #79.
2. `dh_dv` for a conditional logit represents an identity matrix but is coded as a `csr_matrix`. This causes the calculation `dh_dv.dot(design)` to be relatively slow even though its result is trivially `design`.

To remedy the first bottleneck, I used the same solution proposed in issue #79.

For the second bottleneck, I made an efficient `identity_matrix` class (derived from scipy's `spmatrix`). When such an identity matrix `I` is multiplied with `A` using `I.dot(A)`, we get `A` again.

I've run a benchmark by making a script that estimates an `MNL` on the usual Swiss-Metro dataset. I ran the `line-profiler` on some of the critical functions, namely `calc_gradient` and `calc_fisher_info_matrix`. In summary, this change reduced the computation time of `calc_gradient` by 26% (from 0.080697s to 0.059372s), and that of `calc_fisher_info_matrix` by 99% (!) (from 0.906896s to 0.0062323s).

Profiling results are attached.
profile_before.txt
profile_after.txt
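
For readers who want the gist of the second fix without opening the diff, here is a minimal sketch of the idea. It is not the `identity_matrix` class from this PR: the name `IdentityLike` is hypothetical, and it deliberately does not subclass scipy's `spmatrix`. It only shows that a `dot` which returns its argument gives the same result as multiplying by an explicit CSR identity.

```python
import numpy as np
from scipy import sparse


class IdentityLike:
    """Hypothetical stand-in for an identity matrix that stores no entries."""

    def __init__(self, n):
        self.shape = (n, n)

    def dot(self, other):
        # I.dot(A) == A, so only validate the shape and hand A back unchanged.
        if other.shape[0] != self.shape[1]:
            raise ValueError("dimension mismatch")
        return other


# Small random design matrix, purely for illustration.
rng = np.random.default_rng(0)
design = rng.normal(size=(1000, 5))

explicit = sparse.identity(1000, format="csr")  # identity stored as a csr_matrix
shortcut = IdentityLike(1000)

# Both products hold the same values as `design`.
same_values = np.array_equal(explicit.dot(design), shortcut.dot(design))
# The shortcut does no arithmetic and allocates nothing: it returns `design` itself.
no_copy = shortcut.dot(design) is design
```

Because the shortcut returns the very same object, callers that later modify the result in place would also modify `design`; a real implementation may want to document that or return a copy.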