django>=1.8 required, speed up bulk_update, new feature #54

Merged: 5 commits, May 19, 2017

Collaborator

@arnau126 arnau126 commented May 18, 2017

Hi @aykut,

First of all, I know this is a huge PR with a lot of changes, so there's no rush, and feel free to reject any of the changes. Any comments would be appreciated.

Commit 79549fe "Drop support for django<1.8":

  • Drop support for django<1.8, and add django>=1.8 as an installation requirement.
  • Update python3.4 to python3.5.

Commit 301eaa5 "Rename module from bulk_update to django_bulk_update":

  • To avoid confusion between the project name and the package name (this is what open source projects usually do).

Commit 9cf8c70 "Remove deprecated code.":

  • Just remove some deprecated code.

Commit 0818448 "Refactor 'bulk_update'.":

  • I've refactored the bulk_update function in order to speed it up.

I ran a speed test with a model called Lead, which has 50 fields.

First I commented out the line that executes the query, so the test measures only the time it takes to build the query, not the time it takes to execute it:

            lenpks += len(pks)
            del values, pks

            # connection.cursor().execute(sql, parameters)
    return lenpks
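For readers unfamiliar with what "building the query" involves, here is a minimal sketch of one way to assemble a single UPDATE statement with one CASE expression per column in a single pass over the objects. It is an illustration only, not the code from this PR: the function `build_bulk_update_sql`, the table and column names, and the use of plain dicts in place of model instances are all hypothetical.

```python
def build_bulk_update_sql(table, pk_name, rows, fields):
    """Build one UPDATE ... SET col = CASE ... END statement.

    `rows` is a list of dicts (hypothetical stand-ins for model instances).
    Returns (sql, parameters); nothing is executed.
    """
    if not rows or not fields:
        raise ValueError("rows and fields must be non-empty")

    # One list of WHEN clauses and one list of parameters per field, filled
    # by appending, so the work is proportional to len(rows) * len(fields).
    when_clauses = {field: [] for field in fields}
    when_params = {field: [] for field in fields}
    pks = []
    for row in rows:
        pk = row[pk_name]
        pks.append(pk)
        for field in fields:
            when_clauses[field].append("WHEN %s THEN %s")
            when_params[field].extend([pk, row[field]])

    assignments = []
    parameters = []
    for field in fields:
        assignments.append(
            '"{0}" = CASE "{1}" {2} END'.format(
                field, pk_name, " ".join(when_clauses[field])
            )
        )
        parameters.extend(when_params[field])

    placeholders = ", ".join(["%s"] * len(pks))
    sql = 'UPDATE "{0}" SET {1} WHERE "{2}" IN ({3})'.format(
        table, ", ".join(assignments), pk_name, placeholders
    )
    parameters.extend(pks)  # parameters for the WHERE ... IN (...) clause
    return sql, parameters


rows = [{"id": 1, "name": "ann", "age": 30}, {"id": 2, "name": "bob", "age": 41}]
sql, parameters = build_bulk_update_sql("lead", "id", rows, ["name", "age"])
print(sql)
print(parameters)
```

Because the statement is returned instead of executed, timing a call like this isolates the query-building cost in the same way as commenting out the `execute` line.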

Results with the current function:

In [5]: %timeit Lead.objects.bulk_update(Lead.objects.all()[:1000])
1 loop, best of 3: 553 ms per loop

In [6]: %timeit Lead.objects.bulk_update(Lead.objects.all()[:5000])
1 loop, best of 3: 9.06 s per loop

In [7]: %timeit Lead.objects.bulk_update(Lead.objects.all()[:10000])
1 loop, best of 3: 34.6 s per loop

and with the refactored function:

In [5]: %timeit Lead.objects.bulk_update(Lead.objects.all()[:1000])
1 loop, best of 3: 369 ms per loop

In [6]: %timeit Lead.objects.bulk_update(Lead.objects.all()[:5000])
1 loop, best of 3: 1.79 s per loop

In [7]: %timeit Lead.objects.bulk_update(Lead.objects.all()[:10000])
1 loop, best of 3: 3.62 s per loop

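Conclusion: the new version is faster, and its running time grows linearly with the number of objects to update. The current version grows much faster than linearly: doubling from 5000 to 10000 objects makes it about 3.8 times slower.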
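As a quick check of that conclusion against the timings reported above, the slope between the 1000-object and 10000-object runs on a log-log scale estimates the growth exponent (about 1 for linear growth, about 2 for quadratic). The helper name `growth_exponent` is introduced here for illustration only; the numbers are copied from the %timeit output.

```python
import math

# Timings in seconds, copied from the %timeit results in this comment.
sizes = [1000, 5000, 10000]
current = [0.553, 9.06, 34.6]
refactored = [0.369, 1.79, 3.62]


def growth_exponent(n_small, t_small, n_large, t_large):
    """Estimate k in t ~ n**k from two measurements."""
    return math.log(t_large / t_small) / math.log(n_large / n_small)


k_current = growth_exponent(sizes[0], current[0], sizes[-1], current[-1])
k_refactored = growth_exponent(sizes[0], refactored[0], sizes[-1], refactored[-1])

# Speedup of the refactored function at each size.
speedups = [old / new for old, new in zip(current, refactored)]

print("current exponent:    %.2f" % k_current)
print("refactored exponent: %.2f" % k_refactored)
print("speedups:", ["%.1fx" % s for s in speedups])
```

The estimated exponent is about 1.8 for the current function and about 1.0 for the refactored one, and the speedup rises from roughly 1.5x at 1000 objects to roughly 9.6x at 10000.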

Commit 1b67304 "Make bulk_update work with django expressions like the 'F expressions'.":

  • New feature: use Django expressions such as F, Func, and Concat to update object values.
    For example:
from django.db.models import F, Func, Value
from django.db.models.functions import Concat

people = Person.objects.all()
for idx, person in enumerate(people):
    person.name = Func(F('name'), function='UPPER')
    person.age = F('age') - idx
    person.text = Concat(F('slug'), Value('@'), F('name'))
Person.objects.bulk_update(people)

Owner

@aykut aykut left a comment

This is great, thanks a lot.

Owner

aykut commented May 19, 2017

Hi @arnau126,

Thanks a lot for your work. 👍

@aykut aykut merged commit 2094211 into aykut:master May 19, 2017
@aykut aykut mentioned this pull request May 22, 2017