Persistent storage solution & refactor #24

Merged
merged 15 commits into from
Feb 12, 2025

Conversation

Sarahtonein
Collaborator

What's new

  • Persistent storage solution in the form of a Postgres DB
  • Version history of files
  • Scripts to migrate, query, and clear the database
  • Compatibility with both Heroku and Docker, in the event we change platforms
  • Refactor of code into a more domain-driven architecture style
  • Removal of old files / unnecessary code
  • Fixed a bug where creations / updates were heavily delayed or missed

How to test

  • Configure .env as per .env.example
  • docker-compose up -d db
  • pipenv shell
  • python -m src.scripts.migrate
  • python -m src.scripts.query

Once the migration has completed, you can run either heroku local or Docker.
NOTE:

The existing db.json is a bit outdated, and it is formatted in a way that left some things titled inappropriately. This will result in extra Discord notifications.
One suggestion is to migrate the staging Postgres DB into production, so that production continues from a point-in-time snapshot and the number of Discord notifications (if any) is reduced.

@Sarahtonein Sarahtonein requested a review from bagelface February 7, 2025 05:24
# Create async engine
engine = None
if os.getenv("ENV") != "TEST":
    url = get_database_url().replace("postgres://", "postgresql+asyncpg://", 1)
Contributor

is there a reason you're using "asyncpg"? the Heroku docs call for "psycopg2" https://devcenter.heroku.com/articles/connecting-heroku-postgres#connecting-in-python so this should be: url = get_database_url().replace("postgres://", "postgresql+psycopg2://", 1)

Collaborator Author

Was using this as it's supposedly more efficient, but will go with what the Heroku docs stipulate

Collaborator Author

Was using this as it's supposedly more efficient, but will go with what the Heroku docs stipulate

So I can use psycopg2 with Heroku locally for async operations, but when using it in Docker it doesn't work.

If we aren't concerned about it we can move on; otherwise we'll need to use something like asyncpg alongside it for cross-compatibility, I think.

bot_1  | sqlalchemy.exc.InvalidRequestError: The asyncio extension requires an async driver to be used. The loaded 'psycopg2' is not async.

Contributor

In what context are we going to use Docker? If it's just for testing, we could have an environment variable dictate what the database URL prefix should be. Also, I'm curious whether it would work if we left the driver unspecified and just used postgres://
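
The idea of letting an environment variable dictate the URL prefix could look roughly like the sketch below. It is only an illustration: the variable name DB_TARGET, the mapping, and the function build_database_url are hypothetical and not part of this PR. It relies on the standard library only.

```python
import os
from typing import Optional

# Hypothetical mapping from deployment target to SQLAlchemy URL prefix.
# Neither the target names nor the DB_TARGET variable come from the PR.
DRIVER_BY_TARGET = {
    "heroku": "postgresql+psycopg2://",
    "docker": "postgresql+asyncpg://",
}


def build_database_url(raw_url: str, target: Optional[str] = None) -> str:
    """Rewrite a postgres:// style URL with the driver chosen for the target."""
    target = (target or os.getenv("DB_TARGET", "heroku")).lower()
    prefix = DRIVER_BY_TARGET.get(target)
    if prefix is None:
        raise ValueError(f"Unknown DB_TARGET: {target!r}")
    # Handle both the short form handed out by Heroku and the long form.
    for scheme in ("postgres://", "postgresql://"):
        if raw_url.startswith(scheme):
            return prefix + raw_url[len(scheme):]
    # The URL already names an explicit driver; leave it untouched.
    return raw_url
```

With something like this, the existing .replace("postgres://", ...) call would be swapped for build_database_url(get_database_url()), and the driver could be changed per environment without touching code. On the second question: as far as I know, SQLAlchemy 1.4 and later reject the bare postgres:// scheme, so some rewrite is still needed.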

Collaborator Author

In what context are we going to use Docker? If it's just for testing, we could have an environment variable dictate what the database URL prefix should be. Also, I'm curious whether it would work if we left the driver unspecified and just used postgres://

I was thinking in terms of 'backward' compatibility, so to speak, in the event we moved back to Docker. After more thought this seems unnecessary, so I'll continue with psycopg2 and disregard the need for Docker to be used for the Notion integration.

Contributor

@bagelface bagelface left a comment

Lots of code changes so didn't scrutinize it super closely, but overall everything looks good. A few requested changes.

Made a new dev branch. Please update to target that branch, so we can add the other changes you mentioned in your Discord message and get them all merged in to main in one go (it will auto deploy).

As for migration, I think if the notifications aren't too crazy, we just run it the same way on production as we do on staging and let it "catch up". But if we see that it's an insane amount of notifications, then I'm ok with migrating "db.json -> postgres" on staging and then "postgres -> postgres" on production.

…nsitive information

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
if os.getenv("ENV") != "TEST":
    url = get_database_url().replace("postgres://", "postgresql+asyncpg://", 1)
    sanitized_url = url.split('@')[0].split(':')[0] + ':***@' + url.split('@')[1]
    print(f"Connecting to {sanitized_url}")

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs sensitive data (password) as clear text.

Copilot Autofix AI 11 days ago

To fix the problem, we should avoid logging any part of the database URL that contains sensitive information. Instead, we can log a generic message indicating that the connection to the database is being attempted without including the URL. This way, we maintain the functionality of logging the connection attempt without exposing any sensitive information.

  • Remove the logging of sanitized_url and replace it with a generic message.
  • Update the code in src/infrastructure/config/database.py to reflect this change.
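
If some connection detail is still wanted in the logs, one alternative to a fully generic message is to parse the URL and log only its non-secret parts. The sketch below is an illustration under that assumption; describe_connection is a hypothetical helper, not code from this PR, and a scanner may still flag it because the value is derived from the credential-bearing URL. The generic message in the patch remains the most conservative option.

```python
from urllib.parse import urlsplit


def describe_connection(url: str) -> str:
    """Return host, port, and database name only; never user or password."""
    parts = urlsplit(url)
    host = parts.hostname or "unknown-host"
    port = f":{parts.port}" if parts.port else ""
    database = parts.path.lstrip("/") or "unknown-db"
    return f"{host}{port}/{database}"


if __name__ == "__main__":
    # Example URL with made-up credentials, for illustration only.
    example = "postgresql+asyncpg://user:secret@db.example.com:5432/app"
    print(f"Connecting to {describe_connection(example)}")
```

Compared with splitting on '@' and ':' by hand, urlsplit also copes with passwords that contain those characters, provided they are percent-encoded in the URL.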
Suggested changeset 1
src/infrastructure/config/database.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/infrastructure/config/database.py b/src/infrastructure/config/database.py
--- a/src/infrastructure/config/database.py
+++ b/src/infrastructure/config/database.py
@@ -25,4 +25,3 @@
     url = get_database_url().replace("postgres://", "postgresql+asyncpg://", 1)
-    sanitized_url = url.split('@')[0].split(':')[0] + ':***@' + url.split('@')[1]
-    print(f"Connecting to {sanitized_url}")
+    print("Attempting to connect to the database...")
 
EOF

@Sarahtonein Sarahtonein committed this autofix suggestion 11 days ago.
@Sarahtonein Sarahtonein changed the base branch from main to dev February 10, 2025 02:20
@Sarahtonein
Collaborator Author

Sarahtonein commented Feb 10, 2025

Lots of code changes so didn't scrutinize it super closely, but overall everything looks good. A few requested changes.

Made a new dev branch. Please update to target that branch, so we can add the other changes you mentioned in your Discord message and get them all merged in to main in one go (it will auto deploy).

As for migration, I think if the notifications aren't too crazy, we just run it the same way on production as we do on staging and let it "catch up". But if we see that it's an insane amount of notifications, then I'm ok with migrating "db.json -> postgres" on staging and then "postgres -> postgres" on production.

Because of the seemingly missed updates / creates in the TinyDB implementation, there would be quite a lot of notifications (50-100+ iirc); I can demonstrate this in the test/staging server. I think we should proceed with the postgres -> postgres option between staging and prod.
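
For reference, the "db.json -> postgres" step on staging starts with reading the old file. The sketch below assumes db.json uses TinyDB's default JSON layout (a top-level object keyed by table name, each holding documents keyed by a numeric string id). The function name iter_tinydb_records is hypothetical, and the actual insert into Postgres is left out because the schema is not shown in this conversation.

```python
import json
from typing import Any, Dict, Iterator, Tuple


def iter_tinydb_records(raw: str) -> Iterator[Tuple[str, int, Dict[str, Any]]]:
    """Yield (table_name, doc_id, document) from TinyDB-style JSON text."""
    data = json.loads(raw)
    for table_name, documents in data.items():
        for doc_id, document in documents.items():
            # TinyDB stores document ids as string keys; convert for the DB.
            yield table_name, int(doc_id), document


if __name__ == "__main__":
    # Made-up sample content, for illustration only.
    sample = '{"_default": {"1": {"title": "A"}, "2": {"title": "B"}}}'
    for table, doc_id, doc in iter_tinydb_records(sample):
        print(table, doc_id, doc)
```

Each yielded tuple could then be mapped onto whatever rows the migrate script writes, which keeps the file parsing separate from the database layer.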

Sarahtonein and others added 2 commits February 10, 2025 06:15
…nsitive information

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@bagelface bagelface merged commit 2f829f9 into dev Feb 12, 2025
@bagelface bagelface deleted the state branch February 12, 2025 21:31