Models for open data simplified database #1423

Open
wants to merge 14 commits into
base: master

Luisella21
Contributor

@Luisella21 Luisella21 commented Mar 4, 2025

TP2000-1740 Open data database

There is a desire to use TAP data in open data analysis. The data currently published contain all the historical information, which makes analysis complex.

This PR creates a set of 'current' data that can be exported every day and used to create reports within TAP.

It creates a shadow model of each tracked model in TAP. The shadow model contains only the approved and current version.
Where feasible, models have been merged: for instance, descriptions are in the described model, not in a different one.
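
As a rough illustration of the idea (not the code in this PR), the sketch below picks the latest approved version of each record and folds its description into the same row. All class and field names (`TrackedVersion`, `Description`, `ShadowRow`, `build_shadow`) are hypothetical, and it assumes that a record whose latest approved version is a deletion should not appear in the shadow data.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class TrackedVersion:
    record_id: int   # identity shared by every version of one record (assumed)
    version: int     # higher number = more recent version (assumed)
    approved: bool
    deleted: bool    # True if this version marks the record as deleted
    code: str


@dataclass(frozen=True)
class Description:
    record_id: int
    version: int
    text: str


@dataclass(frozen=True)
class ShadowRow:
    record_id: int
    code: str
    description: Optional[str]  # merged in from the separate description model


def build_shadow(
    versions: Iterable[TrackedVersion],
    descriptions: Iterable[Description],
) -> List[ShadowRow]:
    """Keep only the latest approved version of each record, with its description."""
    latest = {}
    for v in versions:
        if not v.approved:
            continue  # drafts and unapproved edits never reach the shadow data
        current = latest.get(v.record_id)
        if current is None or v.version > current.version:
            latest[v.record_id] = v

    latest_desc = {}
    for d in descriptions:
        current = latest_desc.get(d.record_id)
        if current is None or d.version > current.version:
            latest_desc[d.record_id] = d

    rows = []
    for record_id in sorted(latest):
        v = latest[record_id]
        if v.deleted:
            continue  # assumption: deleted records are not "current"
        d = latest_desc.get(record_id)
        rows.append(ShadowRow(record_id, v.code, d.text if d else None))
    return rows
```

The result is one flat row per live record, which is the shape a report writer would query without needing to know about version history.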

The tables supporting the models are created in a different partition of the database, so they can easily be excluded from the daily backup of the database.

There is a mechanism to populate the data: this will eventually run daily, but scheduling it is not part of the current PR.
Some raw SQL is used to copy the data (to improve performance). Even so, the copy process on localhost takes a couple of hours. This may or may not be a problem; it is too early to know.
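
For readers unfamiliar with the approach, here is a minimal sketch of a set-based copy using a single `INSERT ... SELECT`, which avoids loading rows into Python one at a time. It uses the standard-library `sqlite3` module purely so that it runs anywhere; the table and column names (`tracked_item`, `reporting_item`) are made up for illustration and are not the tables or SQL in this PR.

```python
import sqlite3

# One statement copies the latest approved version of every record.
COPY_SQL = """
INSERT INTO reporting_item (record_id, code, description)
SELECT t.record_id, t.code, t.description
FROM tracked_item AS t
WHERE t.approved = 1
  AND t.version = (
      SELECT MAX(t2.version)
      FROM tracked_item AS t2
      WHERE t2.record_id = t.record_id AND t2.approved = 1
  )
"""


def create_demo_tables(conn: sqlite3.Connection) -> None:
    """Create hypothetical source and reporting tables with a few sample rows."""
    conn.execute(
        "CREATE TABLE tracked_item ("
        "record_id INTEGER, version INTEGER, approved INTEGER, "
        "code TEXT, description TEXT)"
    )
    conn.execute(
        "CREATE TABLE reporting_item ("
        "record_id INTEGER PRIMARY KEY, code TEXT, description TEXT)"
    )
    conn.executemany(
        "INSERT INTO tracked_item VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, 1, "A", "old"),
            (1, 2, 1, "B", "new"),
            (1, 3, 0, "C", "draft"),  # unapproved, must not be copied
            (2, 1, 1, "X", None),
        ],
    )
    conn.commit()


def refresh_reporting(conn: sqlite3.Connection) -> int:
    """Empty the reporting table and repopulate it in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("DELETE FROM reporting_item")
        conn.execute(COPY_SQL)
    return conn.execute("SELECT COUNT(*) FROM reporting_item").fetchone()[0]
```

Running the delete and the copy in one transaction means a failed refresh leaves the previous data in place rather than an empty table.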

  • Requires migrations: Yes
  • Requires dependency updates: No

@Luisella21 Luisella21 requested a review from a team as a code owner March 4, 2025 17:03
@TomMacca TomMacca self-assigned this Mar 18, 2025
@TomMacca
Contributor

TomMacca commented Mar 18, 2025

Really nice work @Luisella21

One possible addition would be to save the date of the simplified snapshot somewhere in the new reporting schema. Passing it along in the schema itself would make it easier for the people building reports to spot when something has gone wrong with the transfer or snapshot. Maybe:

  • A snapshot_date column on all the tables?
  • Or it could be in the table names themselves?
  • Or a metadata table?
    🤔
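
To make the metadata-table option concrete, here is a small sketch using the standard-library `sqlite3` module so it is runnable as-is. The table name `reporting_snapshot_metadata` and the helper names are hypothetical, and the one-day staleness threshold is just an assumption based on the snapshot being refreshed daily.

```python
import sqlite3
from datetime import date, timedelta


def record_snapshot(conn: sqlite3.Connection, snapshot_date: date) -> None:
    """Store the date of the latest snapshot in a single-row metadata table."""
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS reporting_snapshot_metadata ("
            "id INTEGER PRIMARY KEY CHECK (id = 1), "  # enforce a single row
            "snapshot_date TEXT NOT NULL)"
        )
        conn.execute(
            "INSERT OR REPLACE INTO reporting_snapshot_metadata "
            "(id, snapshot_date) VALUES (1, ?)",
            (snapshot_date.isoformat(),),
        )


def is_stale(conn: sqlite3.Connection, today: date, max_age_days: int = 1) -> bool:
    """Return True if the snapshot is older than max_age_days or missing.

    Assumes record_snapshot has created the table at least once.
    """
    row = conn.execute(
        "SELECT snapshot_date FROM reporting_snapshot_metadata WHERE id = 1"
    ).fetchone()
    if row is None:
        return True
    return today - date.fromisoformat(row[0]) > timedelta(days=max_age_days)
```

A single metadata row keeps the data tables unchanged, whereas a `snapshot_date` column on every table would let each table be checked independently.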

What do you think?
Another question: I take it the plan is to overwrite the reporting schema each day in TAP?

Contributor

@TomMacca TomMacca left a comment

Approved from me, looks great @Luisella21

2 participants