Migrate more model's IDs to bigint #9492

stsewd · 2022-08-10T23:01:39Z

These are the tables that can grow quite large (the percentage is the share of the largest possible ID, 2^31 - 1, that we have already consumed):

ImportedFile (22%)
PageView (9%)
SearchQuery (2%)

The other important tables (projects, versions, etc) are less than 1%, so we are fine there.

We already experienced this for the SphinxDomain table (#9482, #9483). The migration took around 15 min, and we temporarily disabled all access to those models so requests wouldn't hang until the migration was completed. The current tables are still small, so I think we should be fine without temporarily disabling them.

We should also make sure to use a bigint for all new models (Django's `startapp` already does this for new apps). We can't change the global default, since that would change the ID type of existing models and may require some downtime...

And these are the numbers for .com

SphinxDomain (17%)
ImportedFile (4%)
PageView (1%)
SearchQuery (0.18%)
Auditlog (0.07%)

humitos · 2022-08-11T07:32:53Z

Really good description of the issue 💯

We could disable search indexing while we do the migration, but I don't think that should be required. We have 11M records, and migrating the SphinxDomain model took 15 min with ~56M.

```python
In [7]: ImportedFile.objects.count()
Out[7]: 11527437
```

So some 3 min of not being able to index new versions doesn't seem bad... There are two things that could happen:

- The query times out and we don't index that version.
- The query waits till the migration is done, nothing gets lost.

But if we disable search indexing we definitely won't index new versions. We don't use those models outside search indexing, so doc serving and such shouldn't be affected.

Ref #9492
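The "some 3 min" figure can be reproduced by scaling the SphinxDomain migration linearly by row count. This is only a rough sketch: it assumes migration time is proportional to the number of rows, which ignores row width, indexes, and database load, and `estimate_migration_minutes` is a hypothetical helper, not something in the codebase.

```python
def estimate_migration_minutes(rows, ref_rows=56_000_000, ref_minutes=15.0):
    """Scale a reference migration time linearly by row count.

    The defaults are the SphinxDomain numbers quoted in this thread
    (~56M rows migrated in about 15 min). Linear scaling is an assumption.
    """
    if ref_rows <= 0:
        raise ValueError("ref_rows must be positive")
    return rows * ref_minutes / ref_rows


# Row count reported by ImportedFile.objects.count() in this thread.
imported_file_rows = 11_527_437
print(f"{estimate_migration_minutes(imported_file_rows):.1f} min")  # prints "3.1 min"
```

The same helper can be pointed at any other table's row count to get a first guess before scheduling a migration.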


How to deploy

We create page views on 404s and on page views (duh), so running the migration may slow down doc serving (especially on .com, where we have this feature enabled for everyone). To avoid that, we need to disable page views while we do the migration. Luckily we already have a feature flag for that: https://github.com/readthedocs/readthedocs.org/blob/a09bc1a976a93bcc3f987fa0a052901f0065619f/readthedocs/projects/models.py#L1897-L1900

Ref #9492

We have 5M records, so the migration shouldn't take that long (1-2 min?), and we use a task to create the records, so this shouldn't affect search.

```python
In [1]: SearchQuery.objects.count()
Out[1]: 5062590
```

Ref #9492



stsewd · 2023-09-27T21:03:23Z

The only "big" table that still needs to be migrated is ImportedFile, currently at 32%. Since we are no longer creating a record for each HTML page, the growth rate should slow down now.

Open PR to migrate that id is at #9669.
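For a rough sense of urgency, the two ImportedFile percentages in this thread (22% on 2022-08-10 and 32% on 2023-09-27) can be extrapolated linearly. This is a sketch under loud assumptions: both figures refer to the same table on the same site, both are rounded, and growth stays constant. Since the growth rate is expected to slow down, the result is closer to a lower bound on the time left than a forecast. `days_until_exhausted` is a hypothetical helper.

```python
from datetime import date


def days_until_exhausted(first_date, first_percent, last_date, last_percent):
    """Linearly extrapolate when ID usage would reach 100%.

    Returns the number of days after last_date, assuming the average
    growth rate between the two observations stays constant.
    """
    elapsed_days = (last_date - first_date).days
    rate_per_day = (last_percent - first_percent) / elapsed_days
    if rate_per_day <= 0:
        raise ValueError("usage is not growing; cannot extrapolate")
    return (100 - last_percent) / rate_per_day


# Observations taken from the comments in this thread (rounded percentages).
days_left = days_until_exhausted(date(2022, 8, 10), 22, date(2023, 9, 27), 32)
print(f"about {days_left / 365:.1f} years at the old growth rate")
```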

stsewd · 2023-09-27T21:05:43Z

How to calculate the percent:

```python
# `Model` stands for whichever model is being checked.
max_int = 2**31 - 1
current_id = Model.objects.order_by('id').last().id
current_id * 100 / max_int
```
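The same calculation can be written without the ORM, which also makes it easy to see how much headroom a bigint key gives. This is a minimal sketch: the helper names are hypothetical, and the ID derived from the 32% figure is only approximate because the percentage is rounded.

```python
INT32_MAX = 2**31 - 1  # largest ID for a 32-bit integer primary key
INT64_MAX = 2**63 - 1  # largest ID for a 64-bit (bigint) primary key


def percent_used(current_id, max_id=INT32_MAX):
    """Percentage of the ID space consumed by the highest ID seen so far."""
    return current_id * 100 / max_id


def id_for_percent(percent, max_id=INT32_MAX):
    """Approximate highest ID corresponding to a reported percentage."""
    return int(max_id * percent / 100)


# ImportedFile was reported at 32% of the 32-bit range.
approx_id = id_for_percent(32)
print(approx_id)                                    # 687194767
print(f"{percent_used(approx_id):.2f}% of int32")   # 32.00% of int32
print(f"{percent_used(approx_id, INT64_MAX):.2e}% of int64")
```

With a 64-bit key the same ID consumes a vanishingly small fraction of the range, which is why the migration removes the concern rather than just postponing it.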


stsewd added the Improvement (minor improvement to code) and Accepted (accepted issue on our roadmap) labels on Aug 18, 2022

stsewd mentioned this issue Oct 18, 2022

ImportedFile: use BigAutoField for primary key #9669

Merged

stsewd mentioned this issue Oct 18, 2022

PageView: use BigAutoField for primary key #9670

Merged

stsewd mentioned this issue Oct 18, 2022

SearchQuery: use BigAutoField for primary key #9671

Merged
