Skip to content

Add detach replica in shutdown function when error occurs during apply segment of journal #8688

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
23 changes: 16 additions & 7 deletions src/remote/server/ReplServer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -384,11 +384,11 @@ namespace
provider->attachDatabase(&localStatus, m_config->dbName.c_str(),
dpb.getBufferLength(), dpb.getBuffer());
localStatus.check();
m_attachment.assignRefNoIncr(att);
m_attachment = att;

const auto repl = m_attachment->createReplicator(&localStatus);
localStatus.check();
m_replicator.assignRefNoIncr(repl);
m_replicator = repl;

fb_assert(!m_sequence);

Expand Down Expand Up @@ -416,8 +416,17 @@ namespace

void shutdown()
{
m_replicator = nullptr;
m_attachment = nullptr;
FbLocalStatus localStatus;
if (m_replicator)
{
m_replicator->close(&localStatus);
m_replicator = nullptr;
}
if (m_attachment)
{
m_attachment->detach(&localStatus);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both m_replicator and m_attachment are RefPtr's and nullifying them causes a release() call which does the proper cleanup (similar to close/detach) under the hood. So I don't see what is actually fixed. What do I miss?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that for m_replicator close is called during release and it is not necessary to do this (I added it for the company with m_attachment, which is the problem).
There detach is not called during release, only destroy, in which requests, events, transactions, etc. are released and the handle is also removed, but the connection to the DB itself remain.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. YAttachment::destroy() calls destroy2() which nullifies next which in turn calls release() for the underlying provider. So AFAIU the connection should be closed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right.
The problem occurs when using an external engine, even without a replication error.
It uses the context of the current replication connection and increases the reference counter, and when the main connection is nullified. In the JAttachment::release function the counter is not equal to 0 and freeEngineData is not called, in which the following functions should be called in the chain:

purge_attachment
release_attachment
dbb->dbb_extManager->closeAttachment
delete attInfo
~ExternalContextImpl
externalAttachment->release // In which the counter should be reduced and the external engine connection should be deleted.

This behavior occurs in RedDatabase, but I am sure that it will be the same in Firebird, since the code in this part is not different, but I cannot check, since there are problems with configuring jaybird for Firebird.

Calling the detach function on the main connection directly calls internalDetach and freeEngineData, which closes the connection and releases all resources.

Copy link
Member

@dyemanov dyemanov Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this smells like a problem to me -- the fact that the external engine confuses the reference counter of the object created by user, thus making the whole ref-count-based management unreliable.

Nevertheless, I don't mind fixing this particular issue the way you suggest. However, it seems to me that nullifying the reference (and thus possibly calling release()) is unnecessary after close()/detach(). I have a feeling that the code should look something like this:

			if (m_replicator.hasData())
			{
				m_replicator->close(&localStatus);
				m_replicator.clear();
			}

			if (m_attachment.hasData())
			{
				m_attachment->detach(&localStatus);
				m_attachment.clear();
			}

Could you please double-check this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to reproduce the issue using sample UDR ?

Yes, I got a reproducible case with UDR on Firebird. I didn't think about it, thanks.

It is enough to setup asynchronous replication and add a procedure on the master:

create or alter procedure gen_rows (
  start_n integer not null,
  end_n integer not null
) returns (
  n integer not null
)
external name 'udrcpp_example!gen_rows'
engine udr;

In this case, the segment will be applied on the replica, but the connection will remain active, if after that add the segment again using the external engine, a second connection will appear, and so on.

And maybe these two pointers should not be RefPtr's at all, given they're used by a single object only and destroyed explicitly...

Remade without using RefPtr.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to reproduce the issue using sample UDR ?

Yes, I got a reproducible case with UDR on Firebird. I didn't think about it, thanks.

Could you provide it here ? With all necessary steps, please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you provide it here ? With all necessary steps, please.

Setup replication.conf:

database = d:\repl\master.fdb
{
  journal_directory = d:\repl\journal
  journal_archive_directory = d:\repl\archive
  journal_archive_timeout = 10
}
database = d:\repl\replica.fdb
{
  journal_source_directory = d:\repl\replica
}

isql> create database 'd:\repl\master.fdb';

Copy DB to d:\repl\replica.fdb and execute:
gfix -repl read_only d:\repl\replica.fdb

On master DB:
isql> alter database enable publication;
isql> alter database include all to publication;

Connect to master DB and create procedure:

create or alter procedure gen_rows (
  start_n integer not null,
  end_n integer not null
) returns (
  n integer not null
)
external name 'udrcpp_example!gen_rows'
engine udr;
commit;

Disconnect or wait for archive segment 1.
Create new same procedure on master for create archive 2.

Copy segment 1 to d:\repl\replica\, wait applied the segment, connect to replica and execute:
isql> select count(*) from mon$attachments;
isql> commit;

got 4 attachments:

  • current attachment,
  • garbage collector,
  • cache writer,
  • dead applier attachment.

Copy segment 2 and wait applied, after execute same query and got 5 attachments: current, garbage collector, cache writer, dead applier attachment (x2).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

Create new same procedure on master for create archive 2.

Do you mean recreate or create with a new name ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean recreate or create with a new name ?

I create it with a new name, but I think it can be recreated.

m_attachment = nullptr;
}
m_sequence = 0;
m_connected = false;
}
Expand All @@ -437,7 +446,7 @@ namespace

bool isShutdown() const
{
return (m_attachment == NULL);
return (m_attachment == nullptr);
}

const PathName& getDirectory() const
Expand Down Expand Up @@ -495,8 +504,8 @@ namespace

private:
AutoPtr<const Replication::Config> m_config;
RefPtr<IAttachment> m_attachment;
RefPtr<IReplicator> m_replicator;
IAttachment* m_attachment;
IReplicator* m_replicator;
FB_UINT64 m_sequence;
bool m_connected;
string m_lastError;
Expand Down
Loading