fast sync MySQL runs into deadlocks #412
Comments
As in my other issue, I worked around this by disabling fast sync manually in the code to force traditional Singer sync, which worked fine, presumably because it works in much smaller batches. It would be nice to have a way to disable fast sync on certain tables (or globally) for these situations with a large table on an actively used database.
Maybe you're right, but it would also be great to know what's causing the deadlock in Redshift. Btw, the Redshift connections are not shared across the fastsync processes; do you think that could cause any issue in Redshift? Also, do you have a specific error message from Redshift, and do you see this problem only on very active tables? I'd like to reproduce this problem.
Sorry, I realized I was not specific enough in my original issue report. The deadlock is happening on the MySQL side when selecting the data. We are actually reading from an AWS read replica of a fairly but not hugely active table (just guessing, but maybe a few updates per minute). This read replica was configured to itself have row replication enabled so that PPW can use it as a source.

I just skimmed the fast sync code at the time I was having the issue, but it seems like it works by selecting large chunks of the data. My assumption is that an UPDATE is happening at the same time and deadlocking.

This doesn't seem like an easy problem to solve. I read into isolation levels, but it seems a little risky: if the fast sync gets a dirty read, will a future Singer incremental sync pick up the change from row replication?
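For illustration only, here is a minimal sketch (not pipelinewise's actual fastsync code) of running a bulk export under a relaxed isolation level, which is the idea discussed above. The driver, connection details, and table name are placeholders, and whether a dirty read here would later be reconciled by the Singer incremental sync is exactly the open question.

    import pymysql  # assumption: any MySQL client library would work the same way

    # Connection details are placeholders, not taken from the issue.
    conn = pymysql.connect(host="replica-host", user="ppw",
                           password="secret", database="appdb")
    try:
        with conn.cursor() as cur:
            # Relax the isolation level for this session only. The trade-off
            # worried about above is that the export may then see
            # uncommitted ("dirty") rows.
            cur.execute("SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED")
            cur.execute("SELECT * FROM big_table")  # stands in for the large fastsync export
            while True:
                rows = cur.fetchmany(10000)  # pull the result in smaller batches
                if not rows:
                    break
                # ... write rows to the staging file / target here ...
    finally:
        conn.close()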
On Thu, May 28, 2020 at 2:05 AM Peter Kosztolanyi wrote:
Maybe you're right, but it would also be great to know what's causing the deadlock in Redshift.
FastSync loads tables in parallel, using the same number of processes as the number of CPU cores found on the system. This is subject to change, and we'd like to introduce the parallelism and max_parallelism options for FastSync that are already available in every PPW target component.
Btw, the Redshift connections are not shared across the fastsync processes.
- Do you think that could cause any issue in Redshift?
- Do you have a specific error message from Redshift?
- Do you have this problem only on very active tables?
I'd like to reproduce this problem.
Hi,
FastSync parallelism level is now configurable via a new option. The tradeoff mentioned by @danielerapati is valid: performance drops as the fastsync parallelism level is lowered. It would be nice if we could reproduce the problem somehow to see what's causing the deadlock in the source db.
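For context, a minimal sketch of what a configurable parallelism cap could look like, assuming a multiprocessing-style loader; the option name fastsync_parallelism and the function names are illustrative, not pipelinewise's actual API:

    import os
    from multiprocessing import Pool

    def sync_table(table_name):
        """Placeholder for the per-table fastsync export/load work."""
        ...

    def run_fastsync(tables, fastsync_parallelism=None):
        # Default to one worker per CPU core (the behaviour described earlier),
        # but let the config cap it to reduce concurrent load on the source
        # database and the target.
        workers = fastsync_parallelism or os.cpu_count()
        with Pool(processes=workers) as pool:
            pool.map(sync_table, tables)

Calling run_fastsync(tables, fastsync_parallelism=2), for example, would limit the load to two concurrent table exports instead of one per CPU core.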
@JasonSanDiego, can you share with me your code change to disable fastsync? I'm experiencing a similar issue and managed to 'fool' pipelinewise into not fast-syncing most of my tables by creating bookmarks in state.json, but for some reason it still attempts fast sync for a few tables.
For anyone else encountering this issue, I found a way to disable fastsync: in pipelinewise.py, in the run_tap function, find this line (line 1003 at the time of writing):

    if len(fastsync_stream_ids) > 0:

and change it to:

    if 1 == 0:
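To show why that single change is enough, here is a rough, paraphrased sketch of the logic around that line (not the actual pipelinewise source; the surrounding structure will differ by version):

    # inside run_tap(), paraphrased
    if 1 == 0:  # was: if len(fastsync_stream_ids) > 0:
        # FastSync branch: bulk-exports whole tables in parallel.
        # With the condition forced to False, this branch is never taken,
        # so every stream falls through to the traditional Singer sync path.
        ...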
As we have also run into issues with fastsync, I have started a PR to allow turning off fastsync without having to edit and install custom source code, so this should become easier in the future: #697
I am trying to sync MySQL to Redshift and finding that fast sync fails due to deadlocks. We are pointing pipelinewise at a MySQL read replica of a fairly active production DB. Unfortunately, the pipelinewise job fails pretty consistently while trying to fast sync one of our larger tables.
Is there some way to disable fast sync and use traditional Singer sync only? I couldn't find any way in the documentation, and I'm now crawling through the code looking for a way to comment it out as a test.