Migrate more models' IDs to bigint #9492
Really good description of the issue 💯
stsewd added a commit that referenced this issue on Oct 18, 2022
We could disable search indexing while we do the migration, but I don't think that should be required: we have 11M records, and migrating the SphinxDomain model (~56M records) took 15 min.

```python
In [7]: ImportedFile.objects.count()
Out[7]: 11527437
```

So some 3 min of not being able to index new versions doesn't seem bad... There are two things that could happen:

- The query times out and we don't index that version.
- The query waits until the migration is done, and nothing gets lost.

But if we disable search indexing, we definitely won't index new versions. We don't use those models outside search indexing, so doc serving and such shouldn't be affected.

ref #9492
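The "some 3 min" figure comes from scaling the SphinxDomain timing by row count. A minimal sketch of that back-of-the-envelope estimate, assuming the cost grows linearly with the number of rows (index rebuilds and foreign keys can make the real cost non-linear); `estimate_migration_minutes` is a hypothetical helper, not part of the codebase:

```python
def estimate_migration_minutes(rows, ref_rows=56_000_000, ref_minutes=15.0):
    """Estimate how long changing a primary key column takes.

    Assumes the cost is linear in the number of rows (an assumption).
    The defaults come from the SphinxDomain migration: ~56M rows in ~15 min.
    """
    if ref_rows <= 0:
        raise ValueError("ref_rows must be positive")
    return rows * ref_minutes / ref_rows


# ImportedFile row count from the shell session above.
imported_files = 11_527_437
print(f"ImportedFile: ~{estimate_migration_minutes(imported_files):.1f} min")
```

Since the ~56M reference count is itself approximate, treat the result only as an order of magnitude.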
stsewd added a commit that referenced this issue on Oct 18, 2022
stsewd added a commit that referenced this issue on Oct 18, 2022
How to deploy

We create page views on 404s and on page views (duh), so while we do the migration this may slow down doc serving (especially on .com, where we have this feature enabled for everyone). To avoid that, we need to disable page views during the migration. Luckily we already have a feature flag for that: https://github.com/readthedocs/readthedocs.org/blob/a09bc1a976a93bcc3f987fa0a052901f0065619f/readthedocs/projects/models.py#L1897-L1900

ref #9492
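The deploy plan relies on a kill switch around the page view write. A stdlib-only sketch of that gating, where the flag name `disable_page_views`, the set of enabled flags, and the list standing in for the page views table are all hypothetical, not the actual Read the Docs code:

```python
# Hypothetical flag name; the real flag lives in readthedocs/projects/models.py.
DISABLE_PAGE_VIEWS = "disable_page_views"


def record_page_view(enabled_flags, page_views, project_slug, path):
    """Record a page view unless the (hypothetical) kill switch is enabled.

    `page_views` is a plain list standing in for the page views table.
    While the migration holds a lock on that table, skipping the write
    means serving the page never waits on the lock.
    """
    if DISABLE_PAGE_VIEWS in enabled_flags:
        return False  # serve the page, but don't touch the locked table
    page_views.append((project_slug, path))
    return True
```

The page is served either way; only the bookkeeping write is skipped while the flag is on.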
stsewd added a commit that referenced this issue on Oct 18, 2022
We have 5M records, so migration shouldn't take that long (1-2 min?), and we use a task to create the records, so this shouldn't affect search.

```python
In [1]: SearchQuery.objects.count()
Out[1]: 5062590
```

Ref #9492
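The argument that search isn't affected rests on the write happening outside the request. A minimal sketch of that pattern, assuming a `deque` as a stand-in for the real task queue and a list as a stand-in for the SearchQuery table (both hypothetical): the search call answers right away, and only the worker step would wait on a locked table.

```python
from collections import deque

pending_tasks = deque()  # hypothetical stand-in for the task queue
search_queries = []      # hypothetical stand-in for the SearchQuery table


def record_search_query(query):
    """Worker side: the only code that writes to the SearchQuery table."""
    search_queries.append(query)


def search(query, index):
    """Request side: answer from the index and defer the write to a task."""
    results = [doc for doc in index if query in doc]
    pending_tasks.append((record_search_query, query))  # enqueue, don't write
    return results


def run_worker():
    """Drain queued tasks; during the migration, this is the part that waits."""
    while pending_tasks:
        func, arg = pending_tasks.popleft()
        func(arg)
```

Calling `search("bigint", ["bigint migration", "page views"])` returns `["bigint migration"]` without touching `search_queries`; the record only appears after `run_worker()`.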
stsewd added a commit that referenced this issue on May 25, 2023
stsewd added a commit that referenced this issue on Sep 26, 2023
* SearchQuery: use BigAutoField for primary key

  We have 5M records, so migration shouldn't take that long (1-2 min?), and we use a task to create the records, so this shouldn't affect search.

  ```python
  In [1]: SearchQuery.objects.count()
  Out[1]: 5062590
  ```

  Ref #9492

* Linter
The only "big" table still missing the migration is ImportedFile, currently at 32%. Since we no longer create a record for each HTML page, its growth rate should slow down now. The open PR to migrate that ID is #9669.
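How to calculate the percentage:

```python
max_int = 2**31 - 1  # largest value of a signed 32-bit integer column
# Replace Model with the model to check, e.g. ImportedFile.
current_id = Model.objects.order_by('id').last().id
current_id * 100 / max_int
```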
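To check several tables at once, the same formula can be wrapped in small helpers. `id_usage_percent` and `tables_needing_bigint` are hypothetical names, and the default 1% threshold only mirrors the "less than 1%" cutoff mentioned for the other important tables in the issue description:

```python
MAX_INT = 2**31 - 1  # largest value a signed 32-bit "integer" column can hold


def id_usage_percent(current_id, max_id=MAX_INT):
    """Share of the 32-bit ID space already consumed, as a percentage."""
    return current_id * 100 / max_id


def tables_needing_bigint(latest_ids, threshold=1.0):
    """Return (name, percent) pairs above `threshold`, highest usage first.

    `latest_ids` maps a model name to its current max ID, e.g. collected in a
    Django shell with Model.objects.order_by("id").last().id for each model.
    """
    usage = ((name, id_usage_percent(i)) for name, i in latest_ids.items())
    return sorted(
        ((name, pct) for name, pct in usage if pct > threshold),
        key=lambda item: item[1],
        reverse=True,
    )
```

Note that `.last()` returns `None` for an empty table, so those models need to be skipped when collecting `latest_ids`.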
stsewd added a commit that referenced this issue on Jan 30, 2025
These are the tables that can grow quite large (the percentage is how much of the largest possible ID, 2^31 - 1, we have already consumed):
The other important tables (projects, versions, etc.) are below 1%, so we are fine there.
We already experienced this with the SphinxDomain table (#9482, #9483). The migration took around 15 min, and we temporarily disabled all access to those models so queries wouldn't hang until the migration completed. The remaining models are still small, so I think we should be fine without temporarily disabling them.
We should also make sure to use a bigint primary key for all new models (Django's `startapp` template already does this by setting `default_auto_field` to `BigAutoField` in the generated app config). We can't change the global default (`DEFAULT_AUTO_FIELD`), since that would change the ID type of existing models and may require some downtime...
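For a sense of the headroom involved: Django's `AutoField` is a 32-bit integer column, and `BigAutoField` is a 64-bit `bigint` on PostgreSQL. A stdlib-only illustration, using `struct` packing as a stand-in for the column types (`fits_in` is a hypothetical helper):

```python
import struct

# Signed 32-bit ("integer") vs signed 64-bit ("bigint") upper limits.
INT_MAX = 2**31 - 1
BIGINT_MAX = 2**63 - 1


def fits_in(fmt, value):
    """True if `value` can be packed with the given struct format code."""
    try:
        struct.pack(fmt, value)
    except struct.error:
        return False
    return True


next_id = INT_MAX + 1  # the first ID a 32-bit column cannot store
print(fits_in("<i", next_id))  # False: overflows a 32-bit integer
print(fits_in("<q", next_id))  # True: fits easily in a 64-bit bigint
```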
And these are the numbers for .com